4. ANOVA and the f-distribution: A Journey to Inner Serenity with Statistics

JoeWebDesigns
8 min read · May 18, 2022

Intro

This article aims to help you understand: i) what ANOVA (Analysis of Variance) is and when to use it; ii) which distribution to use; iii) how to carry out ANOVA testing; and iv) how to arrive at a meaningful conclusion. Once you understand the fundamentals of ANOVA and the situations in which it applies, all you need to do is carry out the test and interpret the results!

Symbols

  • 𝜇 (Population Mean)
  • x_bar (Sample Mean)

What is ANOVA?

ANOVA is a statistical technique used to examine whether the means of two or more groups are significantly different from each other. We can carry out ANOVA testing by making use of the f-distribution.

f-distribution vs z-distribution vs t-distribution

Which distribution should we use for ANOVA? z-tests and t-tests rely on the z-distribution and t-distribution respectively to assess the probability that two samples come from the same population (FYI: both distributions are symmetrical). The context and the information available determine whether we use the former or the latter test. On the other hand, f-tests rely on the f-distribution (a positively skewed distribution), which is used when we want to know: i) the probability that two samples come from populations that have the same variance; or ii) the probability that three or more samples come from the same population. For ANOVA, the f-distribution is the distribution we care about, and we will include examples of ii) later on in this article.

Types of ANOVA Test

  • One Way ANOVA: use this test when you want to know whether there are statistical differences between the means of three or more independent groups.
  • Two Way ANOVA: same as One Way ANOVA except you can further divide the data by looking across the different groups and separating the values into blocks. The different blocks represent changes to a separate variable which may (or may not) have an impact on the data. There are two types of Two Way ANOVA: i) without replication and ii) with replication. The difference between these will easily be understood in the examples later.

One Way ANOVA Method (3 Sample Test)

a) Million Dollar Question: Do Sample A, Sample B and Sample C belong to the same population?

b) Define your null hypothesis Hₒ and your alternative hypothesis Hₐ.

c) Choose an appropriate C.L (confidence level) and hence determine your significance level 𝛼, where 𝛼 = 1-C.L

d) Calculate the sample means (mean of Sample A, mean of Sample B and mean of Sample C).

e) Calculate the F-value. What is that? ANOVA considers two types of variance: i) Between Groups (how far the group means stray from the overall mean) and ii) Within Groups (how far the individual values stray from their respective group mean). The F-value is the ratio of i) to ii) and can easily be calculated using software like Excel (see examples later).

f) We then need to calculate F-crit. But how do we do this? F-crit can be found using an F-table for the significance level 𝛼 defined in step c) AND for the degrees of freedom unique to the problem: k - 1 between groups and N - k within groups, where k is the number of groups and N is the total number of observations. F-crit can also be very easily calculated using Excel (see examples later).

g) We then compare the F-value with F-crit, which leads to one of the following outcomes:

  • If F < F-crit, we fail to reject the null hypothesis.
  • If F > F-crit, we reject the null hypothesis in favour of the alternative hypothesis, i.e. we conclude that Sample A, Sample B and Sample C do not all belong to the same population.
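Steps d) to g) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the article's own calculation: the three samples are made-up numbers, the helper names `one_way_f` and `f_crit_three_groups` are hypothetical, and the F-crit shortcut only works when there are exactly three groups (2 between-groups degrees of freedom), where the upper tail of the f-distribution has a simple closed form. For any other number of groups, a library routine such as `scipy.stats.f.ppf(1 - alpha, df_between, df_within)` is the usual choice.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between Groups: how far each group mean strays from the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within Groups: how far each value strays from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def f_crit_three_groups(alpha, df_within):
    """Critical value of F(2, df_within) at significance level alpha.

    Valid ONLY for 2 numerator degrees of freedom, where
    P(F > x) = (1 + 2 * x / df_within) ** (-df_within / 2).
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


# Made-up samples, for illustration only
samples = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
alpha = 0.05  # significance level, i.e. 1 - C.L

f_value, df_between, df_within = one_way_f(samples)
f_crit = f_crit_three_groups(alpha, df_within)
reject_null = f_value > f_crit
print(f"F={f_value:.2f}, F-crit={f_crit:.2f}, reject H0: {reject_null}")
```

With these made-up samples, F is 7 and F-crit is roughly 5.14, so the null hypothesis would be rejected at the 95% confidence level.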

One Way ANOVA Example

A company sells different types of fertiliser i.e. Fertiliser A, Fertiliser B and Fertiliser C. They decide to conduct an experiment to determine whether the time taken for Plant K to reach maturity is dependent on the fertiliser being used. The data has been shown below.

a) Q: Are Fertiliser A, Fertiliser B and Fertiliser C statistically similar or statistically different with respect to Plant K’s time to maturity?

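b) Null Hypothesis: Hₒ: 𝜇_A = 𝜇_B = 𝜇_C (the mean time to maturity is the same for all three fertilisers). Alternative Hypothesis: Hₐ: at least one of the means differs.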

c) We will choose a C.L of 0.95 (95%), and hence our significance level 𝛼 =0.05 (5%)

d), e) and f) We will do our analysis in Excel:

  • In the Data tab of the ribbon, click Data Analysis, and choose Anova: Single Factor. Select OK.
  • Then input: i) your data, ii) your calculated significance level and iii) where you want your output table to be located. Then click OK.
  • You will then see the output table below.

g) From the output table shown above, we can see that F < F-crit. Therefore we fail to reject the null hypothesis, and hence for a confidence level of 95%, we cannot statistically differentiate between the fertilisers in terms of the time taken for Plant K to reach maturity.

Important Note: If we decided to lower our C.L to 0.9 (90%), then 𝛼 = 0.1 (10%), and after following d), e), f) and g) through, we now find that F=3.99 and F-crit=3.46 (F itself does not depend on 𝛼; only F-crit changes). Now the tables have turned: F > F-crit, and hence we reject the null hypothesis and find that there is a statistical difference between the fertilisers. This is a great example of how changing the C.L has an impact on our end result, reinforcing the point that our output has a confidence level attached to it. As a general rule, the higher the stakes of the decision riding on the data, the higher the confidence level we should test at.
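One way to see why the decision flips is to look at the p-value, i.e. the probability of seeing an F at least this large if the null hypothesis holds. The sketch below is a small illustration under an assumption the article does not state: that the fertiliser data has 3 groups and 9 plants in total, giving degrees of freedom of 2 and 6 (this is consistent with the F-crit of 3.46 quoted above). The helper name is hypothetical, and the closed form only holds for 2 numerator degrees of freedom.

```python
def p_value_three_groups(f_value, df_within):
    """Upper-tail probability P(F > f_value) for F(2, df_within).

    Valid ONLY when the between-groups degrees of freedom are 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


f_value = 3.99  # F reported in the article
df_within = 6   # ASSUMPTION: 9 plants in 3 groups, so N - k = 6

p = p_value_three_groups(f_value, df_within)
for alpha in (0.05, 0.10):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"alpha={alpha:.2f}: p={p:.3f} -> {decision}")
```

Under that assumption the p-value comes out at roughly 0.079, which sits between 0.05 and 0.10. Comparing p with 𝛼 is equivalent to comparing F with F-crit, so the test fails to reject at 95% but rejects at 90%.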

Two Way ANOVA Example (without Replication)

The same company decides to do a similar experiment, but in addition to changing the fertiliser used, they also subject Plant K to different humidities. The data has been shown below.

IMPORTANT NOTE: The data above is classed as ANOVA ‘without replication’ because within each fertiliser group, we only have one plant subjected to a given humidity.

a) Q: Are Fertiliser A, Fertiliser B and Fertiliser C statistically similar or statistically different with respect to Plant K’s time to maturity?

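b) Null Hypothesis: Hₒ: 𝜇_A = 𝜇_B = 𝜇_C (the mean time to maturity is the same for all three fertilisers). Alternative Hypothesis: Hₐ: at least one of the means differs.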

c) We will choose a C.L of 0.95 (95%), and hence our significance level 𝛼 =0.05 (5%)

d), e) and f) We will do our analysis in Excel:

  • In the Data tab of the ribbon, click Data Analysis, and choose Anova: Two-Factor Without Replication. Select OK.
  • Then input: i) your data, ii) your calculated significance level and iii) where you want your output table to be located. Then click OK.
  • You will then see the output table below.

g) From the output table shown above, we can see that F < F-crit. Therefore we fail to reject the null hypothesis, and hence for a confidence level of 95%, we cannot statistically differentiate between the fertilisers in terms of the time taken for Plant K to reach maturity.

Important Note: If we decided to lower our C.L to 0.9 (90%), then 𝛼 = 0.1 (10%), and after following d), e), f) and g) through, we find that F=2.23 and F-crit=4.32, where F < F-crit. We therefore reach the same conclusion as before: at a 90% confidence level, we still cannot statistically differentiate between the fertilisers in terms of the time taken for Plant K to reach maturity. If we kept lowering our C.L, we would eventually reach a point where F becomes greater than F-crit. Why don’t you have a play with this yourself?
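The Two Way ANOVA (without Replication) calculation can also be sketched in plain Python. The table below is invented for illustration and is not the article's data: rows are humidity levels, columns are Fertilisers A, B and C, with one plant per cell. The function name `two_way_no_replication` is hypothetical.

```python
from statistics import mean


def two_way_no_replication(table):
    """Return F-values and error df for a rows x columns table (one value per cell)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]

    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)  # fertiliser
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)  # humidity
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # what neither factor explains

    df_cols, df_rows = n_cols - 1, n_rows - 1
    df_error = df_cols * df_rows
    ms_error = ss_error / df_error
    return {
        "f_columns": (ss_cols / df_cols) / ms_error,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "df_error": df_error,
    }


# Hypothetical days to maturity: rows = humidity levels, columns = A, B, C
table = [
    [10, 12, 14],
    [11, 13, 18],
    [12, 14, 13],
]
result = two_way_no_replication(table)
print(result)
```

With these invented numbers, F for the fertiliser columns is 4 and F for the humidity rows is 1, each on 2 and 4 degrees of freedom. A 3 x 3 layout like this one is consistent with the F-crit of 4.32 quoted above for 𝛼 = 0.1, although the article does not state its table size.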

Two Way ANOVA Example (with Replication)

Similar to Two Way ANOVA (without Replication), but this time we have the data set below. Can you spot the difference? Well, within each fertiliser group, we have more than one plant subjected to a given humidity. Note: we have also removed the 80% Humidity block and changed the values slightly, but that is simply to make this example easier to understand :)

a) Q: Are Fertiliser A, Fertiliser B and Fertiliser C statistically similar or statistically different with respect to Plant K’s time to maturity?

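b) Null Hypothesis: Hₒ: 𝜇_A = 𝜇_B = 𝜇_C (the mean time to maturity is the same for all three fertilisers). Alternative Hypothesis: Hₐ: at least one of the means differs.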

c) We will choose a C.L of 0.95 (95%), and hence our significance level 𝛼 =0.05 (5%)

d), e) and f) We will do our analysis in Excel:

  • In the Data tab of the ribbon, click Data Analysis, and choose Anova: Two-Factor With Replication. Select OK.
  • Then input: i) your data, ii) the number of rows per sample, iii) your calculated significance level and iv) where you want your output table to be located. Then click OK.
  • You will then see the output table below.

g) From the output table shown above, we can see that F > F-crit. Therefore we reject the null hypothesis, and hence at a confidence level of 95%, we conclude that the fertiliser type has a statistically significant effect on the time taken for Plant K to reach maturity.
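A minimal Python sketch of the With Replication case follows, assuming a balanced design (the same number of plants in every cell). The data is invented for illustration and is not the article's table: two hypothetical humidity levels, three fertilisers and two plants per cell. The function name `two_way_with_replication` is also hypothetical.

```python
from statistics import mean


def two_way_with_replication(cells):
    """cells[humidity][fertiliser] -> list of replicate values (balanced design)."""
    hums = list(cells)
    ferts = list(cells[hums[0]])
    a, b = len(hums), len(ferts)
    r = len(cells[hums[0]][ferts[0]])  # replicates per cell

    grand = mean(x for h in hums for f in ferts for x in cells[h][f])
    hum_mean = {h: mean(x for f in ferts for x in cells[h][f]) for h in hums}
    fert_mean = {f: mean(x for h in hums for x in cells[h][f]) for f in ferts}
    cell_mean = {(h, f): mean(cells[h][f]) for h in hums for f in ferts}

    ss_fert = r * a * sum((fert_mean[f] - grand) ** 2 for f in ferts)
    ss_hum = r * b * sum((hum_mean[h] - grand) ** 2 for h in hums)
    ss_inter = r * sum(
        (cell_mean[h, f] - hum_mean[h] - fert_mean[f] + grand) ** 2
        for h in hums for f in ferts
    )
    ss_within = sum(
        (x - cell_mean[h, f]) ** 2
        for h in hums for f in ferts for x in cells[h][f]
    )

    df_fert, df_hum = b - 1, a - 1
    df_inter = df_fert * df_hum
    df_within = a * b * (r - 1)
    ms_within = ss_within / df_within
    return {
        "f_fertiliser": (ss_fert / df_fert) / ms_within,
        "f_humidity": (ss_hum / df_hum) / ms_within,
        "f_interaction": (ss_inter / df_inter) / ms_within,
        "df_within": df_within,
    }


# Hypothetical days to maturity, two plants per humidity/fertiliser cell
cells = {
    "low":  {"A": [10, 12], "B": [14, 16], "C": [18, 20]},
    "high": {"A": [11, 13], "B": [15, 17], "C": [22, 24]},
}
result = two_way_with_replication(cells)
print(result)
```

With these invented numbers, F for fertiliser is 45.5 on 2 and 6 degrees of freedom, far above the critical value of about 5.14 at 𝛼 = 0.05, so the fertiliser effect would be judged significant. Having more than one plant per cell is what makes the Within and Interaction terms estimable, which is why Excel's With Replication output also includes an Interaction row (`f_interaction` here).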

Takeaway Messages

  • Use ANOVA when you want to test whether two or more groups are statistically different from each other.
  • Use the f-distribution for ANOVA testing, but also be aware of the differences between an f-distribution, a z-distribution and a t-distribution.
  • Be cognisant of the different types of ANOVA tests: i) One-Way, ii) Two-Way (without Replication), and iii) Two-Way (with Replication).
  • Excel or Python will be your greatest friends when carrying out ANOVA testing. By all means try it out by hand, but at your own peril!
  • A strong understanding of hypothesis testing is needed to fully appreciate ANOVA testing in all its magnificence, so here is a link to an article should you need to brush up on your knowledge: ‘https://medium.com/@joe-1297/3-hypothesis-testing-z-tests-vs-t-tests-a-journey-to-inner-serenity-with-statistics-121dba31aa9a’.

Check Out My Medium Page

Medium Articles: https://medium.com/@joe-1297
