3. Hypothesis Testing (z-tests vs t-tests): A Journey to Inner Serenity with Statistics

May 5, 2022

Intro

This article uses examples to help you understand: i) What is hypothesis testing and why do we use it? ii) What types of hypothesis test are available and which test should we use? iii) How do we perform a hypothesis test? And iv) How do we arrive at a meaningful conclusion? Once you understand the fundamentals and know which avenue to explore, you will be well on the road to competence!

Useful Definitions

Null Hypothesis Hₒ: Statistical statement that suggests no statistical deviation from the status quo.
Alternative Hypothesis Hₐ: Statistical statement that suggests a deviation from the status quo.
C.L (Confidence Level): A percentage, equal to 1 − 𝛼, that reflects the level of certainty we have in our final conclusion. Defining this is important as it determines the critical value, and therefore how wide the rejection region is.
Left-tailed Test: The rejection region for Hₒ lies in the left tail of the test statistic's distribution (e.g. the normal or t-distribution).
Right-tailed Test: The rejection region for Hₒ lies in the right tail of the test statistic's distribution.
Two-tailed Test: The rejection region for Hₒ is split between the left and right tails of the test statistic's distribution.

Symbols

μ (Population Mean)
𝛼 (Significance Level)
σ (Population Standard Deviation)
x_bar (Sample Mean)
s (Sample Standard Deviation)
df (Degrees of Freedom)
n (Sample Size)

Hypothesis Testing — What is it and Why do we do it?

Hypothesis testing is a method of statistical inference used to determine whether the data we have supports a particular hypothesis. It allows us to draw meaningful conclusions about a whole population when we only have access to a sample of it. Amazing, huh?

Hypothesis Testing — Overall Method

To perform a hypothesis test we must do the following:

Define the claim or question we want to find an answer to.
Define the null hypothesis Hₒ and the alternative hypothesis Hₐ based on the claim / question we want to answer. There are only ever two possible outcomes of a hypothesis test: i) we reject Hₒ in favour of Hₐ (the data supports Hₐ), or ii) we fail to reject Hₒ (the evidence is not strong enough to reject it, which does not prove that Hₒ is true). As a result, we must be careful in how we define Hₒ and Hₐ, and we can follow some rules to do this correctly: i) if we are testing something that can be assumed to be true (status quo), Hₒ can reflect this assumption; ii) if we are testing something we want to show is true (but can’t assume), we make it Hₐ and put its opposite in Hₒ; iii) Hₒ should contain an equality (=, ≤, ≥); iv) Hₐ should not contain an equality (≠, <, >). You will see in the examples sections later on how these rules can be applied.
Determine an appropriate C.L (confidence level) for the test, for example: i) 90%/0.9, ii) 95%/0.95 (most commonly used) or iii) 99%/0.99. After selecting a suitable C.L, find the corresponding significance level 𝛼 = 1 − C.L, i.e. i) 0.1, ii) 0.05 or iii) 0.01 respectively.
Determine an appropriate test statistic, i.e. do we use a z-statistic, t-statistic, F-statistic or other? What about all the different types of t-statistic you can choose from? It is easy to blur your understanding here and be unsure of which to use, but don’t stress, we will cover this clearly later on with some useful examples. After choosing an appropriate test statistic, look back at Hₐ to determine whether you are carrying out: i) a left-tailed test (Hₐ contains <), ii) a right-tailed test (Hₐ contains >), or iii) a two-tailed test (Hₐ contains ≠). You will understand what this really means and how this can be used in the examples sections later.
Once we know what test statistic to use and whether we are using a left-tailed, right-tailed or two-tailed test, we then need to: i) calculate the test statistic, and ii) find the critical value using a table (or Excel). There are two equivalent ways to reach a decision: i) the Traditional Method (comparing the test statistic with the critical value), or ii) the P-value Method (comparing the p-value, i.e. the probability of obtaining a test statistic at least as extreme as the one observed if Hₒ were true, with 𝛼). We will show how both lead to the same conclusion in the examples sections later.
Two Possible Results: i) If the Traditional Method shows that the test statistic lies beyond the critical value (inside the rejection region), or the P-value Method gives P < 𝛼, then we reject Hₒ and conclude that the data supports Hₐ. ii) If the test statistic does not lie beyond the critical value (outside the rejection region), or P ≥ 𝛼, then we fail to reject Hₒ (we cannot say whether Hₒ is true).
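The decision step above can be sketched in Python for a z-statistic. This is a minimal illustration assuming a standard normal test statistic; the helper name `z_test_decision` and its `tail` labels are hypothetical, not from the article. It uses the standard library's `statistics.NormalDist` in place of a printed z-table and applies both the Traditional Method and the P-value Method side by side.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_decision(z: float, confidence_level: float, tail: str) -> tuple[bool, bool]:
    """Return (reject_by_critical_value, reject_by_p_value) for a z-statistic.

    tail is "left", "right" or "two" (hypothetical labels for this sketch).
    """
    alpha = 1 - confidence_level  # significance level, e.g. 1 - 0.95 = 0.05

    if tail == "left":
        critical = STD_NORMAL.inv_cdf(alpha)          # about -1.645 for alpha = 0.05
        reject_traditional = z < critical
        p_value = STD_NORMAL.cdf(z)
    elif tail == "right":
        critical = STD_NORMAL.inv_cdf(1 - alpha)      # about +1.645 for alpha = 0.05
        reject_traditional = z > critical
        p_value = 1 - STD_NORMAL.cdf(z)
    elif tail == "two":
        critical = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
        reject_traditional = abs(z) > critical
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")

    reject_p_value = p_value < alpha
    return reject_traditional, reject_p_value


print(z_test_decision(2.1, 0.95, "right"))  # (True, True)
print(z_test_decision(1.2, 0.95, "two"))    # (False, False)
```

Away from the exact boundary the two methods always agree, because the p-value falls below 𝛼 exactly when the statistic lies beyond the critical value.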

Hypothesis Testing — Test Statistics

The test statistics we have available to carry out our hypothesis tests include:

z-test (example covered later): As a common rule of thumb, use this test when we know σ (population standard deviation) AND our sample size is n ≥ 30.
One Sample t-test (example covered later): Use this test when we do not know σ (so the sample standard deviation s is used in its place) OR, under the same rule of thumb, when we do know σ but our sample size is n < 30.
Independent Two Sample t-test (example covered later): Use this test when we want to know whether the unknown population means of two independent groups are equal or not.
Paired Sample t-test (not covered here but many examples can be found online): Use this test to compare two sets of measurements taken from the same people or objects, to determine whether the mean difference between them is zero. Examples include: i) iron content in blood before and after taking a pill, or ii) class test scores before and after taking a course. In both cases, the same people were tested, but we want to determine whether i) the pill or ii) the course did or did not result in a change to iron levels or test scores respectively.
F-test (to be covered in a separate article).
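The selection rules above can be written as a small decision function. This sketch encodes only the rules of thumb listed in this section (the function name and arguments are hypothetical); it leaves out the F-test and does not check assumptions such as normality.

```python
def choose_test(n_groups: int, paired: bool, sigma_known: bool, n: int) -> str:
    """Pick a test using the rules of thumb listed above (illustrative only)."""
    if n_groups == 2:
        # Same people/objects measured twice -> paired; otherwise independent groups.
        return "paired sample t-test" if paired else "independent two sample t-test"
    if sigma_known and n >= 30:
        # Population standard deviation known and a large enough sample.
        return "z-test"
    # sigma unknown, or sigma known but n < 30 (the rule of thumb used here).
    return "one sample t-test"


print(choose_test(n_groups=1, paired=False, sigma_known=True, n=40))   # z-test
print(choose_test(n_groups=1, paired=False, sigma_known=False, n=10))  # one sample t-test
```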

Hypothesis Testing — z-test Example

Company A ran a major sales campaign to increase the number of kettles that they could sell each day. After their campaign ended, they wanted to see if there had been a statistically significant increase in the number of sales. Before the sales campaign the mean number of sales per day was 145.6, with a standard deviation of 5.7. Over 40 days of post-campaign sales, their mean number of sales was 151.6. Question: Was there a statistically significant increase in the number of sales as a result of the sales campaign? Assumptions: i) the number of kettles sold per day is normally distributed around a central mean value; and ii) kettle sales are mostly independent of season (time of year) and other external variables.
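A minimal Python sketch of this z-test, assuming Hₒ: μ ≤ 145.6 and Hₐ: μ > 145.6 (a right-tailed test at 𝛼 = 0.05, matching the 95% confidence stated in the conclusion). These hypotheses are inferred from the question rather than quoted, and the pre-campaign standard deviation of 5.7 is treated as the known σ.

```python
import math
from statistics import NormalDist

# Figures from the kettle example
mu_0 = 145.6   # pre-campaign mean sales per day (value under H0)
sigma = 5.7    # pre-campaign standard deviation, treated as the population sigma
x_bar = 151.6  # post-campaign sample mean
n = 40         # number of post-campaign days sampled
alpha = 0.05   # assumed significance level (95% confidence)

# Assumed hypotheses: H0: mu <= 145.6, Ha: mu > 145.6 -> right-tailed z-test
standard_error = sigma / math.sqrt(n)
z = (x_bar - mu_0) / standard_error

z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
p_value = 1 - NormalDist().cdf(z)             # area in the right tail beyond z

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if z > z_critical else "Fail to reject H0")
```

With these inputs z comes out at about 6.66, far beyond the critical value of about 1.645, and the p-value is far below 0.05, so both methods reject Hₒ.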

Meaningful Conclusion: As a result of rejecting Hₒ in favour of Hₐ, we can say with 95% confidence that the sales campaign did result in a statistically significant increase in the number of kettles sold per day.

Hypothesis Testing — One Sample t-test Example

Company B predicts that the mean toasting time of their seeded bread using a standard toaster is 47.4 seconds. Company B took a random sample of 10 slices, which produced a mean toasting time of 41.6s with a standard deviation of 3.3s. Question: Was Company B’s prediction correct? Assumptions: i) toasting time is normally distributed around a central mean value; and ii) the same toaster was used throughout the experiment.
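A minimal Python sketch of this one sample t-test, assuming Hₒ: μ = 47.4 and Hₐ: μ ≠ 47.4 (a two-tailed test at 𝛼 = 0.01, matching the 99% confidence in the conclusion). The hypotheses are inferred from the question rather than quoted. Python's standard library has no t-distribution, so the critical value 3.250 (df = 9, 0.005 in each tail) is read from a t-table, as the article suggests.

```python
import math

# Figures from the toasting example
mu_0 = 47.4   # predicted mean toasting time in seconds (value under H0)
x_bar = 41.6  # sample mean
s = 3.3       # sample standard deviation
n = 10        # number of slices

# Assumed hypotheses: H0: mu = 47.4, Ha: mu != 47.4 -> two-tailed one sample t-test
df = n - 1
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-tailed critical value for alpha = 0.01 and df = 9, taken from a t-table
t_critical = 3.250

print(f"t = {t:.2f}, df = {df}, critical value = ±{t_critical}")
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")
```

Here t is about -5.56 and |t| > 3.250, so Hₒ is rejected using the Traditional Method.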

Meaningful Conclusion: As a result of rejecting Hₒ in favour of Hₐ, we can say with 99% confidence that the population mean toasting time is not equal to 47.4s.

Hypothesis Testing — Independent Two Sample t-test Example

A company has decided that, due to unfortunate circumstances, they will need to shut down one of their two shops (one in Cambridge, and one in Oxford), but which should they choose? Both shops have the same running costs, and their sales revenue across the last 14 weeks can be directly compared. Question: Does one shop generate statistically significantly more revenue than the other? Assumptions: i) the sales revenue is normally distributed around a central mean value; ii) due to geographical separation, the financial performance of one shop is independent of the other.
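The weekly revenue figures for the two shops are not listed here, so this Python sketch only shows how an independent two sample t-statistic can be computed. It uses the pooled-variance (Student's) form, which assumes both populations share the same variance; the article does not state which form was used, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_two_sample_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's independent two sample t-statistic with pooled variance.

    Assumes equal population variances. Returns (t, degrees of freedom).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / standard_error
    return t, df
```

With 14 weeks of revenue per shop, df = 14 + 14 - 2 = 26, and the resulting t would be compared with the t-table critical value for the chosen 𝛼 and tail. If the equal-variance assumption is doubtful, Welch's t-test, which does not pool the variances, is a common alternative.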

Meaningful Conclusion: As a result of failing to reject Hₒ, we cannot say whether Shop B statistically generates more revenue than Shop A.

Type I and Type II Errors

Type I Error: Rejecting a null hypothesis that is actually true (a false positive). The probability of making this error is 𝛼.

Type II Error: Failing to reject a null hypothesis that is actually false (a false negative).
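A quick simulation makes the link between 𝛼 and Type I errors concrete: if Hₒ really is true and we test at 𝛼 = 0.05, we should wrongly reject it in roughly 5% of repeated samples. This Python sketch borrows the kettle example's pre-campaign figures as an assumed true population; the population, seed and number of trials are illustrative choices, not data from the article.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

mu_0, sigma, n = 145.6, 5.7, 40  # assumed true population, so H0 is true here
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed test
trials = 20_000

false_rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 (mu = mu_0) really is true
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu_0) / (sigma / math.sqrt(n))
    if z > z_critical:
        false_rejections += 1  # rejecting a true H0 is a Type I error

type_i_rate = false_rejections / trials
print(f"Estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The estimated rate should land close to 0.05; lowering 𝛼 reduces Type I errors but, for a fixed sample size, makes Type II errors more likely.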

Takeaway Messages

Understand the question / claim you want to answer / test and determine the level of confidence you need in your end result.
Write down your null and alternative hypotheses correctly using the guidance / rules outlined in this article.
Choose the statistical test most appropriate for your situation and then carry out the test correctly.
Arrive at a meaningful conclusion, with an eye on how your conclusion can lead to informed decisions being made.
Be aware of Type I and Type II Errors.

Useful Resources

z-table link -> http://www.z-table.com
t-table link -> https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf
Further reading on Type I and Type II Errors here -> https://www.simplypsychology.org/type_I_and_type_II_errors.html

Written by JoeWebDesigns