Statistics Big Picture
Statistics provides a way of organizing data to extract information on a wider and more objective basis than personal experience allows.
- Data Gathering
- Data Understanding
- Data Analysis/Interpretation
- Data Presentation
Population & Sample
Census: Gathering data from the whole population of interest.
For example, elections, 10-year census, etc.
Survey: Gathering data from a sample in order to draw conclusions about the population.
For example, opinion polls, quality control checks in manufacturing units, etc.
Data Gathering or Sampling Techniques
There are five types of sampling techniques:
1. Convenience Sampling
Convenience sampling is a type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand. This type of sampling is most useful for pilot testing.
Eg: Online polls, asking your friends, etc.
2. Random Sampling
Each member of the population has an equal chance of being selected.
3. Systematic Random Sampling
Members are selected at a fixed interval from an ordered list or stream, ideally from a random starting point.
Eg: A supermarket surveys every 10th or 15th customer entering the store.
4. Stratified Sampling
Divide the population into several relevant strata and then sample from each stratum.
Eg: For getting an opinion on demonetization, one choice of strata might be the states. We then randomly select 20 people from every state.
5. Cluster Sampling
Divide the population into groups or clusters. Then select one or a few clusters at random and survey everyone in the chosen clusters.
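The five techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 customers, the "region" field and all function names are made up for the example, and only the standard library's random module is used.

```python
import random

rng = random.Random(42)  # fixed seed so repeated runs pick the same members

# Hypothetical population: 100 customers, each tagged with a made-up region.
REGIONS = ["North", "South", "East", "West"]
population = [{"id": i, "region": REGIONS[i % 4]} for i in range(100)]


def convenience_sample(items, n):
    """Take whoever is closest to hand: here, simply the first n members."""
    return items[:n]


def simple_random_sample(items, n):
    """Every member has an equal chance of being selected."""
    return rng.sample(items, n)


def systematic_sample(items, step):
    """Start at a random offset, then take every `step`-th member."""
    start = rng.randrange(step)
    return items[start::step]


def group_by(items, key):
    """Helper: split members into groups (strata or clusters) by a field."""
    groups = {}
    for item in items:
        groups.setdefault(item[key], []).append(item)
    return groups


def stratified_sample(items, key, n_per_stratum):
    """Draw a random sample from every stratum."""
    strata = group_by(items, key)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(items, key, n_clusters):
    """Pick a few whole clusters at random and keep everyone in them."""
    clusters = group_by(items, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


print(len(convenience_sample(population, 10)))
print(len(simple_random_sample(population, 10)))
print([c["id"] for c in systematic_sample(population, 10)])
print({k: len(v) for k, v in stratified_sample(population, "region", 5).items()})
print(len(cluster_sample(population, "region", 1)))
```

Note the difference between the last two: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.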
Parameter & Statistic
Parameter: A descriptive measure of the population. For example, population mean, population variance, population standard deviation, etc.
Statistic: A descriptive measure of the sample. For example, sample mean, sample variance, sample standard deviation, etc.
Population parameters:
Mean – μ
Variance – σ²
Standard Deviation – σ
Sample statistics:
Mean – x̄
Variance – s²
Standard Deviation – s
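To make the distinction concrete, here is a minimal sketch using Python's standard statistics module. The population of 10,000 simulated heights and the sample size of 50 are assumptions chosen purely for illustration. The population functions (pvariance, pstdev) divide by N, while the sample functions (variance, stdev) divide by n - 1.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm (illustrative only).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A simple random sample of 50 members drawn from that population.
sample = rng.sample(population, 50)

# Parameters: descriptive measures of the whole population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Statistics: descriptive measures of the sample.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # divides by n - 1
s = statistics.stdev(sample)

print(f"Population: mean={mu:.2f}, variance={sigma2:.2f}, sd={sigma:.2f}")
print(f"Sample:     mean={x_bar:.2f}, variance={s2:.2f}, sd={s:.2f}")
```

The sample values will usually land close to the population values without matching them exactly. That gap is why a statistic is treated as an estimate of the corresponding parameter.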
Descriptive & Inferential Statistics
Descriptive Statistics: Data gathered about a group is used to reach conclusions about that same group.
Inferential Statistics: Data gathered from a sample, and the statistics computed from it, are used to reach conclusions about the population from which the sample was taken. Also known as Inductive Statistics.
These are the basic terms needed for statistics in data science.