Descriptive Statistics and Exploratory Data Analysis (EDA)
Broadly speaking, before any analysis of the data begins, a business situation falls into one of two groups:
- The business already has a hypothesis about its products or services and would like to test that hypothesis using its data.
- The business has collected data as part of its operations but does not yet know its significance, or has only hunches, without concrete evidence, about hypotheses that may be hidden in the data.
Statistical techniques are used in both scenarios. Statistics refers to the mathematical concepts that are applied to data to help understand it.
There are very few things which we know, which are not capable of being reduc’d to a Mathematical Reasoning, ... and where a Mathematical Reasoning can be had, it’s as great folly to make use of any other, as to grope for a thing in the dark when you have a Candle standing by you
John Arbuthnot (1692)
Statistical methods can be divided into:
- Inferential Statistics - applied when you already have a hypothesis and want to test it using advanced models. With inferential statistics, you are trying to reach conclusions that extend beyond the sample data on hand, perhaps to the entire population. You make judgements or infer outcomes based on the results of that hypothesis testing.
- Descriptive Statistics - applied when you are exploring data with neither a hypothesis nor concrete evidence for one in mind. You start with descriptive summaries and charts and try to discover whether the data suggests a hypothesis worth testing.
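To make the distinction concrete, here is a minimal Python sketch using only the standard library. The data set (`daily_orders`), the helper names `describe` and `one_sample_t`, and the hypothesized mean of 12 are all made up for illustration. The first function only summarizes the sample; the second computes a one-sample t statistic, one simple example of an inferential calculation that asks whether the sample is consistent with a claim about the wider population.

```python
import math
import statistics

# Hypothetical sample: daily order counts for a small shop (made-up numbers).
daily_orders = [12, 15, 11, 14, 13, 18, 16, 12, 15, 14]


def describe(values):
    """Descriptive statistics: summarize the sample itself; no hypothesis needed."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def one_sample_t(values, hypothesized_mean):
    """Inferential statistics: t statistic for the claim that the
    population mean equals hypothesized_mean."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - hypothesized_mean) / standard_error


summary = describe(daily_orders)
t_stat = one_sample_t(daily_orders, hypothesized_mean=12)

print(summary)
print(f"t statistic against a hypothesized mean of 12: {t_stat:.2f}")
```

The summary describes only the ten days that were observed. The t statistic is a step toward a statement about all days, which is why it needs a hypothesis (here, the assumed mean of 12) before it can be computed.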
The field of inferential statistics is vast and is beyond the scope of this eBook. However, you will learn some key concepts in data collection that apply when you already have a hypothesis.
In this eBook, you will learn some basic descriptive statistical techniques used in Exploratory Data Analysis (EDA) to help a budding Data Analyst come up to speed.