Sampling
In order to perform statistical calculations, we must have data. Generally in statistics we take a sample from a given population. The sample should represent the population.
In our earlier ZingM pill experiment, the 10 participants made up the sample drawn from the population.
One more question has to be answered when sampling: is our data random?
A truly random sample closely represents the population, so the statistics you derive from it will tend to closely reflect the population parameters.
More on these new terms:
- Population: The population is all the individuals in the group being studied. Population does not mean the entire population on earth! It means the complete list of potential participants for the study. Suppose you are doing a survey in a school to find out whether students are happy with the school administration: the entire student body of that school is the population. In practice, however, you will randomly draw a sample from the population to conduct your survey.
- Parameter: A parameter describes a characteristic of the population. Example: the average age of all students in a school is 29.
- Statistic: A statistic describes a characteristic of the sample. Example: the average age of 45 randomly sampled students is 28.
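The difference between a parameter and a statistic can be sketched in a few lines of Python. The ages below are made-up data generated only for illustration, and the names population_ages and sample_ages are hypothetical; the point is simply that one number comes from everyone and the other from a sample of 45.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 1,000 students (made-up data).
population_ages = [random.randint(18, 40) for _ in range(1000)]

# Parameter: computed from every member of the population.
parameter_mean = statistics.mean(population_ages)

# Statistic: computed from a random sample of 45 students.
sample_ages = random.sample(population_ages, 45)
statistic_mean = statistics.mean(sample_ages)

print(f"parameter (population mean age): {parameter_mean:.2f}")
print(f"statistic (sample mean age):     {statistic_mean:.2f}")
```

The two printed values should be close but usually not identical, which is exactly the gap between a statistic and the parameter it estimates.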
Here are some common sampling techniques:
- SRS (Simple Random Sample) - This is the most basic random sample, where every member of the population has an equal chance of being selected for the sample.
- Stratified Random Sample - This sampling technique divides a population into groups based on a certain characteristic (the memory test population could be divided into female and male groups). An SRS is then taken from each group. This method ensures every group is represented in the sample, but it is harder to execute.
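- Cluster Sample - Unlike a stratified sample, a cluster sample starts from groups (clusters) that already exist, such as classrooms or neighborhoods, rather than groups defined by a characteristic of interest. A random selection of whole clusters is taken, and the members of the chosen clusters form the sample. This sampling method is often the easiest to carry out but is usually less precise.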
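The three techniques above can be sketched with Python's standard random module. This is a minimal illustration, not a definitive implementation: the population of 100 students, the gender and classroom fields, and the function names are all hypothetical. Cluster sampling is shown in its usual textbook form, where whole groups are picked at random and everyone in them is kept.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 students, each with a gender and a classroom.
population = [
    {"id": i, "gender": "F" if i % 2 == 0 else "M", "classroom": i // 10}
    for i in range(100)
]

def simple_random_sample(units, n):
    """SRS: every unit has the same chance of being picked."""
    return random.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Split units into strata by `key`, then take an SRS from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, key, n_clusters):
    """Pick whole clusters at random and keep every unit in them."""
    clusters = sorted({unit[key] for unit in units})
    chosen = set(random.sample(clusters, n_clusters))
    return [unit for unit in units if unit[key] in chosen]

srs = simple_random_sample(population, 10)
stratified = stratified_sample(population, "gender", 5)
clustered = cluster_sample(population, "classroom", 2)

print(len(srs), len(stratified), len(clustered))
```

Note that the stratified sample is guaranteed to contain five students of each gender, while the SRS may happen to be lopsided, and the cluster sample only ever touches two classrooms.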
However, you will sometimes see studies done on samples that are not adequately random, so their results may not reflect the population. Beware of:
- Convenience Samples - The sample is whatever is easiest to collect. (If I wanted a sample to run my memory test, the easiest option would be to just ask the students in my class!)
- Voluntary Samples - Participants choose for themselves whether to join the sample (as in any internet survey).
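A small simulation can show why a convenience sample is risky. Everything here is an assumption made for illustration: the scores are made up, and the sketch supposes that my own class has practiced the memory test and so scores about 10 points higher than the other students.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical memory-test scores (made-up data): my class of 30 has
# practiced the test, so its scores run higher than everyone else's.
my_class = [random.gauss(80, 5) for _ in range(30)]
other_students = [random.gauss(70, 5) for _ in range(470)]
population = my_class + other_students

population_mean = statistics.mean(population)

# Convenience sample: just the students in my class.
convenience_mean = statistics.mean(my_class)

# Simple random sample of the same size from the whole population.
random_mean = statistics.mean(random.sample(population, 30))

print(f"population mean:         {population_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
```

Under these assumptions the convenience sample overstates the population mean, while the random sample of the same size lands much closer to it.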
Foundation for Random Theory
In The Logic of Chance, John Venn offered this picture: in a shower, no one can guess where a drop will fall at any instant, but we know that if we put out a sheet of paper it will gradually become uniformly spotted, and if we were to mark out any two equal areas on the paper, these would gradually be struck equally often.
This is the foundation for obtaining random sampling of data!
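The raindrop picture can be simulated as a rough sketch. The sheet of paper is modeled as a unit square, each drop lands at a uniformly random point, and the two marked regions are equal-sized squares whose positions are chosen arbitrarily; the function name count_drops is hypothetical.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def count_drops(n_drops):
    """Drop n_drops uniformly on a unit square; count hits in two equal areas."""
    first = second = 0
    for _ in range(n_drops):
        x, y = random.random(), random.random()
        # Two equal-sized regions (each 0.2 x 0.2), placed arbitrarily.
        if 0.1 <= x < 0.3 and 0.1 <= y < 0.3:
            first += 1
        elif 0.6 <= x < 0.8 and 0.5 <= y < 0.7:
            second += 1
    return first, second

for n in (100, 10_000, 200_000):
    a, b = count_drops(n)
    # max(b, 1) avoids dividing by zero when very few drops have fallen.
    print(f"{n:>7} drops: area 1 = {a}, area 2 = {b}, ratio = {a / max(b, 1):.3f}")
```

With only a few drops the two counts can differ a lot, but as the number of drops grows the ratio should settle near 1, which is the behavior the passage describes.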
A Few Other Statistical Terms
- Treatment: In an experiment, the manner in which researchers handle subjects is called a treatment. Researchers are specifically interested in how different treatments might yield differing results.
- Observational Study: In an observational study, a researcher watches a group of subjects and records what happens without introducing a treatment.