Exploratory data analysis


In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
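
To make "summarizing main characteristics" a bit more concrete, here is a minimal sketch using only the Python standard library. The function names `summarize` and `text_histogram` and the sample values are made up for illustration; they are not taken from the notebooks linked in this post. A real analysis would more likely use a plotting library for the visual part.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a numeric sample."""
    # quantiles(n=4) returns the three quartile cut points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }

def text_histogram(values, bins=5):
    """Build a crude text histogram: one line per equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [f"{lo + i * width:6.1f} | " + "#" * c for i, c in enumerate(counts)]

# Made-up sample, for illustration only
sample = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9, 14]
print(summarize(sample))
print("\n".join(text_histogram(sample)))
```

Even this small summary makes the lone large value stand out from the rest, which is the kind of observation EDA is meant to surface.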

A statistical model may or may not be used, but EDA is primarily for seeing what the data can tell us beyond formal modeling or hypothesis testing. John Tukey promoted exploratory data analysis to encourage statisticians to explore their data and possibly formulate hypotheses that could lead to new data collection and experiments.

EDA is different from initial data analysis (IDA), which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and transforming variables as needed. EDA encompasses IDA.
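
As a rough illustration of the IDA steps mentioned above (handling missing values and transforming variables), here is a small standard-library sketch. The records, the column names `dose` and `response`, and the helper functions are all hypothetical. Median imputation and a log transform are just two common choices, not the only ones, and not necessarily what the linked notebooks do.

```python
import math
import statistics

# Made-up records; None marks a missing value
records = [
    {"dose": 10.0, "response": 1.2},
    {"dose": 20.0, "response": None},
    {"dose": None, "response": 3.1},
    {"dose": 40.0, "response": 4.5},
]

def count_missing(rows):
    """Count None entries per column in a list of dict records."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts

def impute_median(rows, column):
    """Return a copy of rows with None in `column` replaced by the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def log_transform(rows, column):
    """Return a copy of rows with a natural-log transform applied to `column`."""
    # Assumes every value in the column is present and strictly positive
    return [{**r, column: math.log(r[column])} for r in rows]

print(count_missing(records))
cleaned = impute_median(records, "dose")
transformed = log_transform(cleaned, "dose")
print(transformed)
```

Each helper returns a new list instead of modifying `records`, so the raw data stays available for comparison during exploration.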

For a good example of automated EDA, see this notebook on IBM Cloud:

https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/b021d2c8-585c-4af7-98fb-9c7d950fb9d1/view?access_token=92f2f438d92d81ce251a8aaecc4e3e35373bc978cee3f0c03ff59ec0e6749757

The same example is also available on GitHub:

https://github.com/dspub/Probability-and-Statistics-for-Data-Science/blob/master/Automated%20Exploratory%20Data%20Analysis.ipynb
