2. Central Limit Theorem and Standard Error: A Journey to Inner Serenity with Statistics

JoeWebDesigns
Apr 24, 2022

Intro

This article uses a real-life question to help you understand: i) what the Central Limit Theorem (CLT) is; ii) why it is so important; and iii) how it can be applied to the real world to make meaningful statistical inferences.

The All-Important Question

  • What is the mean height of English males?

Useful Definitions

  • With Replacement: When we sample at random and a data point has been drawn and recorded, it stays eligible to be drawn again later (we conceptually put it back in the hat).
  • Without Replacement: When we sample at random and a data point has been drawn and recorded, it can never be drawn again (we conceptually tear it up and do not put it back in the hat).
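
In Python's standard library, the two schemes map onto `random.choices` (with replacement) and `random.sample` (without replacement). Here is a small sketch that uses a hypothetical hat of five labelled slips, chosen purely for illustration:

```python
import random

# Hypothetical "hat" of five labelled slips, for illustration only
hat = ["A", "B", "C", "D", "E"]

rng = random.Random(42)  # seeded so the draws are reproducible

# With replacement: each slip goes back in the hat, so repeats are possible
with_replacement = rng.choices(hat, k=5)

# Without replacement: each slip is torn up once drawn, so no repeats
without_replacement = rng.sample(hat, k=5)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

Because slips are never put back, `rng.sample(hat, k=6)` would raise a `ValueError`, whereas `rng.choices(hat, k=6)` would work fine.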

What is CLT?

Different populations can take on different shapes. A population's distribution may be normal, left-skewed, right-skewed, uniform, or something else. Regardless of shape, the CLT applies, provided the population has a finite variance. So what does it say? Suppose we take many random samples from our population of English males and compute the mean of each sample. Two conditions need to hold. First, each sample must be sufficiently large (a common rule of thumb is n > 30). Second, the observations must be independent, which sampling with replacement guarantees. If both hold, a histogram of the sample means (with a suitable bin size) will be approximately normal, and the approximation improves as n grows. Even more remarkable, this distribution of sample means is centred on the population mean. Yes, you read that correctly: the population mean…
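
To see this without real height data, here is a minimal simulation sketch under stated assumptions. The population below is a made-up, right-skewed exponential population with mean 10, not English male heights. The sample size n = 40 and the number of samples are arbitrary choices. Each sample is drawn with replacement, its mean is recorded, and a crude text histogram of the means is printed.

```python
import random
import statistics

rng = random.Random(0)  # seeded so the run is reproducible

# Assumption: a strongly right-skewed, non-normal population
# (exponential with mean 10), standing in for "any shape"
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
mu = statistics.fmean(population)

n = 40               # size of each sample (above the n > 30 rule of thumb)
num_samples = 5_000  # how many samples we draw

# Draw each sample with replacement and record its mean
sample_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(num_samples)
]

print(f"Population mean:          {mu:.2f}")
print(f"Mean of the sample means: {statistics.fmean(sample_means):.2f}")

# Crude text histogram of the sample means (one '#' per 50 means)
lo, hi = min(sample_means), max(sample_means)
bins = 12
width = (hi - lo) / bins
counts = [0] * bins
for m in sample_means:
    idx = min(int((m - lo) / width), bins - 1)  # clamp the maximum into the last bin
    counts[idx] += 1
for i, c in enumerate(counts):
    print(f"{lo + i * width:6.2f} | {'#' * (c // 50)}")
```

Running it should show the mean of the sample means landing very close to the population mean, with a roughly bell-shaped histogram, even though the population itself is strongly skewed.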

Why is CLT so Important?

No matter the shape of the population distribution, we can take samples and use them to estimate the population mean (as long as the CLT conditions stated above are met). We can also quantify how far the sample means typically fall from the population mean using the standard error (SE), which equals σ/√n for samples of size n. More on this in the next section…
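
The standard error of the mean can be written as follows. Here σ is the population standard deviation and n is the size of each sample. Since σ is rarely known in practice, it is usually estimated with the sample standard deviation S. As an illustration with made-up numbers, σ = 7 cm and n = 49 give SE = 7/√49 = 1 cm.

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{n}}, \qquad \widehat{\mathrm{SE}} = \frac{S}{\sqrt{n}}
```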

Normally Distributed Population vs Normally Distributed Sample vs Normal Distribution of Sample Means (Important Distinction)

For the CLT, we are interested in Case 3 only, but it is important to understand the distinction between all three cases below to become a better-informed statistician:

  • Case 1 — Normally Distributed Population: A normally distributed population is characterised by two parameters: μ (the population mean) and σ (the population standard deviation). The former gives the central value of the population's distribution, and the latter measures how dispersed the population's data are around μ. One important use of σ is the 68–95–99.7 rule, which states that approximately 68%, 95% and 99.7% of the population data lie within ±1σ, ±2σ and ±3σ of the population mean μ, respectively.
  • Case 2 — Normally Distributed Sample: A sample whose data are (approximately) normally distributed can be characterised by two statistics: x_bar (the sample mean) and S (the sample standard deviation). The former gives the central value of the sample's distribution, and the latter measures how dispersed the sample's data are around x_bar. The 68–95–99.7 rule then applies in the same way: approximately 68%, 95% and 99.7% of the sample data lie within ±1S, ±2S and ±3S of the sample mean x_bar, respectively.
  • Case 3 — Normal Distribution of Sample Means: When sample means are (approximately) normally distributed around the population mean μ, their dispersion is measured by the SE (standard error). The SE equals σ/√n, where n is the size of each sample, and is estimated by S/√n when σ is unknown. One important use of the SE is the 68–95–99.7 rule, which states that approximately 68%, 95% and 99.7% of the sample means lie within ±1SE, ±2SE and ±3SE of the population mean μ, respectively.
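
The three cases can be checked numerically. This sketch assumes a hypothetical normal population of heights with μ = 175 cm and σ = 7 cm (illustrative values, not survey data) and samples of size n = 36. It prints the exact normal coverages within ±1, ±2 and ±3 standard deviations (about 68.3%, 95.4% and 99.7%) using Python's `statistics.NormalDist`. It then summarises one sample by x_bar and S (Case 2), and finally checks what fraction of 10,000 sample means fall within μ ± k·SE (Case 3).

```python
import random
import statistics
from statistics import NormalDist

# Exact coverage of a normal distribution within ±k standard deviations
std_normal = NormalDist()  # mean 0, standard deviation 1
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

# Case 1: hypothetical population parameters (cm), for illustration only
mu, sigma = 175.0, 7.0
n = 36                   # size of each sample
se = sigma / n ** 0.5    # Case 3: standard error of the mean, sigma / sqrt(n)

rng = random.Random(1)   # seeded so the run is reproducible

# Case 2: one sample, summarised by its statistics x_bar and S
sample = [rng.gauss(mu, sigma) for _ in range(n)]
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
print(f"One sample: x_bar = {x_bar:.2f} cm, S = {s:.2f} cm")

# Case 3: many sample means, checked against mu ± k * SE
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(10_000)
]
observed = {
    k: sum(abs(m - mu) <= k * se for m in sample_means) / len(sample_means)
    for k in (1, 2, 3)
}
for k in (1, 2, 3):
    print(f"within ±{k} SE: expected {coverage[k]:.1%}, observed {observed[k]:.1%}")
```

The observed fractions should land close to the expected ones, since 10,000 sample means is a fairly large simulation.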

How Can CLT Be Applied To Our Real World Question?

Once we have estimated the mean height of English males from the central value of our distribution of sample means (Case 3), we can calculate the standard error. We can then use the 68–95–99.7 rule to judge how plausible it is that a particular sample came from the English male population. If the sample's mean falls outside μ ± 3SE, there are three possibilities: i) the sample is one of the rare extreme cases (only about 0.3% of sample means lie this far out, roughly 0.15% in each tail); ii) the sample is biased; or iii) the sample does not come from this population at all. Further analysis is needed to determine which explanation holds…
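
The check described above can be sketched as a z-score, which measures how many standard errors a sample mean lies from μ. All figures below are hypothetical: an assumed μ = 175 cm and σ = 7 cm, and one sample of n = 49 men with mean 178.5 cm. `z_score` and `two_sided_tail` are illustrative helper names, not from any library.

```python
from statistics import NormalDist

def z_score(x_bar, mu, sigma, n):
    """Number of standard errors between a sample mean x_bar and the population mean mu."""
    se = sigma / n ** 0.5
    return (x_bar - mu) / se

def two_sided_tail(z):
    """Share of sample means expected at least |z| SEs from mu, counting both tails."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical figures for illustration only (not real survey data):
# assumed population mean and SD in cm, and one new sample of 49 men
mu, sigma, n = 175.0, 7.0, 49
x_bar = 178.5

z = z_score(x_bar, mu, sigma, n)
tail = two_sided_tail(z)
outside = abs(z) > 3  # True if x_bar lies outside mu ± 3SE

print(f"z = {z:.2f}, two-sided tail = {tail:.4%}, outside mu ± 3SE: {outside}")
```

With these made-up numbers SE = 1 cm, so the sample mean sits 3.5 SE above μ and is flagged. The code cannot tell you which of the three explanations applies. That is where the further analysis comes in.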

Takeaway Messages

  • The CLT (Central Limit Theorem) is extremely useful for making inferences about a population: it states that sample means will be approximately normally distributed around the population mean (even if the population itself is not normally distributed, provided its variance is finite)
  • For the CLT to work well, use independent random samples (e.g. sample with replacement) of sufficient size; n > 30 is a common rule of thumb
  • The SE (standard error), σ/√n, measures the dispersion of sample means around their central value, i.e. the population mean μ
  • The 68–95–99.7 rule states that approximately 68%, 95% and 99.7% of the sample means lie within ±1SE, ±2SE and ±3SE of the population mean μ, respectively

Useful Resources

  • Udemy: Statistics Courses (5 to 30+ hours per course)
  • Medium: A Journey to Inner Serenity with Statistics by Joseph Wheatley
  • Books: Find one that suits your needs and is tailored towards your learning style
