Z-test Calculator


This Z-test calculator is a tool that helps you perform a one-sample Z-test on a population mean. Two forms of this test exist - a two-tailed Z-test and a one-tailed Z-test - and you can use whichever suits your needs. You can also choose whether the calculator should determine the p-value from the Z-test or whether you'd rather use the critical value approach!

Read on to learn more about the Z-test in statistics and, in particular, when to use Z-tests, what the Z-test formula is, and whether to use a Z-test vs. a t-test. As a bonus, we give some step-by-step examples of how to perform Z-tests!

Or you may also check our t-statistic calculator, where you can learn about another essential statistic. If you are also interested in the F-test, check our F-statistic calculator.

What is a Z-test?

A one-sample Z-test is one of the most popular location tests. The null hypothesis is that the population mean is equal to a given number, μ₀:

H₀: μ = μ₀

We perform a two-tailed Z-test if we want to test whether the population mean is not μ₀:

H₁: μ ≠ μ₀

and a one-tailed Z-test if we want to test whether the population mean is less/greater than μ₀:

H₁: μ < μ₀ or H₁: μ > μ₀

Let us now discuss the assumptions of a one-sample Z-test.

When do I use Z-tests?

You may use a Z-test if your sample consists of independent data points and:

the data is normally distributed, and you know the population variance; or

the sample is large, and the data follows a distribution which has a finite mean and variance. In this case, you don't need to know the population variance.

The reason these two possibilities exist is that we want a test statistic that follows the standard normal distribution N(0, 1). In the former case, it follows that distribution exactly; in the latter, it does so approximately, thanks to the central limit theorem.

The question remains, "When is my sample considered large?" Well, there's no universal criterion. In general, the more data points you have, the better the approximation works. Statistics textbooks recommend having no fewer than 50 data points, while 30 is considered the bare minimum.

Z-test formula

Let x₁, ..., xₙ be an independent sample following the normal distribution N(μ, σ²), i.e., with mean equal to μ and variance equal to σ².

We pose the null hypothesis, H₀: μ = μ₀.

We define the test statistic, Z, as:

Z = (x̄ − μ₀) · √n / σ

where:

x̄ is the sample mean, i.e., x̄ = (x₁ + ... + xₙ) / n;

μ₀ is the mean postulated in H₀;

n is the sample size; and

σ is the population standard deviation.
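The formula above can be sketched in a few lines of Python; the helper name `z_statistic` is ours, and the numbers in the usage line reuse the juice-bottle example that appears later in this article (x̄ = 980 ml, μ₀ = 1000 ml, n = 9, σ = 30 ml):

```python
from math import sqrt

def z_statistic(x_bar, mu0, n, sigma):
    """Z = (x̄ − μ₀) · √n / σ, the one-sample Z-test statistic."""
    return (x_bar - mu0) * sqrt(n) / sigma

z = z_statistic(x_bar=980, mu0=1000, n=9, sigma=30)
print(z)  # -2.0
```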

In what follows, the uppercase Z stands for the test statistic (treated as a random variable), while the lowercase z denotes an actual value of Z, computed for a given sample drawn from N(μ, σ²).

If H₀ holds, then the sum Sₙ = x₁ + ... + xₙ follows the normal distribution, with mean nμ₀ and variance nσ². As Z is the standardization (z-score) of Sₙ/n, we can conclude that the test statistic Z follows the standard normal distribution N(0, 1), provided that H₀ is true. By the way, we have the z-score calculator if you want to focus on this value alone.

If our data does not follow a normal distribution, or if the population standard deviation is unknown (and thus in the formula for Z we substitute the population standard deviation σ with the sample standard deviation), then the test statistic Z is not necessarily normal. However, if the sample is sufficiently large, then the central limit theorem guarantees that Z is approximately N(0, 1).

In the sections below, we will explain how to use the value of the test statistic, z, to decide whether or not you should reject the null hypothesis. Two approaches can be used to arrive at that decision: the p-value approach and the critical value approach - and we cover both of them! Which one should you use? In the past, the critical value approach was more popular because it was difficult to calculate the p-value from a Z-test. With modern computers, however, we can do it fairly easily and with decent precision. In general, you are strongly advised to report the p-value of your tests!

p-value from Z-test

Formally, the p-value is the smallest level of significance at which the null hypothesis could be rejected. More intuitively, the p-value answers the question: provided that I live in a world where the null hypothesis holds, how probable is it that the value of the test statistic will be at least as extreme as the z-value I've got for my sample? Hence, a small p-value means that your result is very improbable under the null hypothesis, and so there is strong evidence against the null hypothesis - the smaller the p-value, the stronger the evidence.

To find the p-value, you have to calculate the probability that the test statistic, Z, is at least as extreme as the value we've actually observed, z, provided that the null hypothesis is true. (The probability of an event calculated under the assumption that H₀ is true will be denoted as Pr(event | H₀).) It is the alternative hypothesis which determines what "more extreme" means:

  • Two-tailed Z-test: extreme values are those whose absolute value exceeds |z|, so those smaller than −|z| or greater than |z|. Therefore, we have:

p-value = Pr(Z ≤ −|z| | H₀) + Pr(Z ≥ |z| | H₀)

The symmetry of the normal distribution gives:

p-value = 2 · Pr(Z ≥ |z| | H₀)

  • Left-tailed Z-test: extreme values are those smaller than z, so p-value = Pr(Z ≤ z | H₀).
  • Right-tailed Z-test: extreme values are those greater than z, so p-value = Pr(Z ≥ z | H₀).

To compute these probabilities, we can use the cumulative distribution function (cdf) of N(0, 1), which for a real number x is defined as:

Φ(x) = Pr(Z ≤ x)

Also, p-values can be nicely depicted as the area under the probability density function (pdf) of N(0, 1), because the cdf value Φ(x) equals the area under the pdf to the left of x.
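The three p-value formulas above translate directly into code. Python's standard library exposes Φ via `statistics.NormalDist().cdf`, so no third-party package is needed; the function name `p_value` is ours:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # Φ, the cdf of N(0, 1)

def p_value(z, tail="two"):
    """p-value of a one-sample Z-test for a given test statistic z."""
    if tail == "two":    # Pr(Z ≤ −|z|) + Pr(Z ≥ |z|) = 2 · (1 − Φ(|z|))
        return 2 * (1 - Phi(abs(z)))
    if tail == "left":   # Pr(Z ≤ z) = Φ(z)
        return Phi(z)
    return 1 - Phi(z)    # right tail: Pr(Z ≥ z) = 1 − Φ(z)
```

For example, `p_value(-2.0, "left")` gives approximately 0.0228, matching the juice-bottle example later in the article.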

Two-tailed Z-test and one-tailed Z-test

With all the knowledge you've got from the previous section, you're ready to learn about Z-tests.

  • Two-tailed Z-test:

From the fact that Φ(−z) = 1 − Φ(z), we deduce that

p-value = 2 · (1 − Φ(|z|))

The p-value is the area under the probability density function (pdf) both to the left of −|z| and to the right of |z|:

two-tailed p value

  • Left-tailed Z-test:

The p-value is the area under the pdf to the left of our z:

p-value = Φ(z)

left-tailed p value

  • Right-tailed Z-test:

The p-value is the area under the pdf to the right of z:

p-value = 1 − Φ(z)

right-tailed p value

The decision as to whether or not you should reject the null hypothesis can now be made at any significance level, α, you desire!

if the p-value is less than or equal to α, the null hypothesis is rejected at this significance level; and

if the p-value is greater than α, then there is not enough evidence to reject the null hypothesis at this significance level.

Z-test critical values & critical regions

The critical value approach involves comparing the value of the test statistic obtained for our sample, z, to the so-called critical values. These values constitute the boundaries of regions where the test statistic is highly improbable to lie. Those regions are often referred to as the critical regions, or rejection regions. The decision of whether or not you should reject the null hypothesis is then based on whether or not our z belongs to the critical region.

The critical regions depend on the significance level, α, of the test, and on the alternative hypothesis. The choice of α is arbitrary; in practice, the values 0.1, 0.05, and 0.01 are most commonly used.

Once we agree on the value of α, we can easily determine the critical regions of the Z-test:

  • Two-tailed Z-test: (−∞, −Φ⁻¹(1 − α/2)] ∪ [Φ⁻¹(1 − α/2), ∞)
  • Left-tailed Z-test: (−∞, Φ⁻¹(α)]
  • Right-tailed Z-test: [Φ⁻¹(1 − α), ∞)

To decide the fate of H₀, check whether or not your z falls in the critical region:

If yes, then reject H₀ and accept H₁; and

If no, then there is not enough evidence to reject H₀.

As you see, the formulae for the critical values of Z-tests involve the inverse, Φ⁻¹, of the cumulative distribution function (cdf) of N(0, 1).
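These critical values can be computed with the standard library's `NormalDist().inv_cdf`, which plays the role of Φ⁻¹; the helper name `critical_values` is ours:

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf  # Φ⁻¹, quantile function of N(0, 1)

def critical_values(alpha, tail="two"):
    """Boundaries of the Z-test rejection region at significance level α."""
    if tail == "two":   # reject when |z| ≥ Φ⁻¹(1 − α/2)
        c = Phi_inv(1 - alpha / 2)
        return (-c, c)
    if tail == "left":  # reject when z ≤ Φ⁻¹(α)
        return Phi_inv(alpha)
    return Phi_inv(1 - alpha)  # right tail: reject when z ≥ Φ⁻¹(1 − α)
```

At α = 0.05 this returns the familiar values: roughly ±1.96 for the two-tailed test and 1.645 for the right-tailed one.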

How to use the one-sample Z-test calculator?

Our calculator takes care of all the complicated steps for you:

Choose the alternative hypothesis: two-tailed or left/right-tailed.

In our Z-test calculator, you can decide whether to use the p-value or the critical regions approach. In the latter case, set the significance level, α.

Enter the value of the test statistic, z. If you don't know it, then you can enter some data that will allow us to calculate z for you:

  • sample mean x̄ (if you have raw data, go to the average calculator to determine the mean);
  • tested mean μ₀;
  • sample size n; and
  • population standard deviation σ (or sample standard deviation if your sample is large).

Results appear immediately below the calculator.

If you want to find z based on the p-value, please remember that in the case of two-tailed tests there are two possible values of z: one positive and one negative, and they are opposite numbers. This Z-test calculator returns the positive value in such a case. In order to find the other possible value of z for a given p-value, just take the number opposite to the value of z displayed by the calculator.
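Recovering z from a given p-value is a one-liner per tail; as with the calculator, the two-tailed case below returns the positive root, and the function name `z_from_p` is ours:

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf

def z_from_p(p, tail="two"):
    """Test statistic corresponding to a given p-value.

    For a two-tailed test this returns the positive value;
    its negative is the other valid solution."""
    if tail == "two":
        return Phi_inv(1 - p / 2)
    if tail == "left":
        return Phi_inv(p)
    return Phi_inv(1 - p)  # right tail
```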

Z-test examples

To make sure that you've fully understood the essence of the Z-test, let's go through some examples:

  • The volume dispensed by a bottle-filling machine follows a normal distribution. Its standard deviation, as declared by the manufacturer, is equal to 30 ml. A juice seller claims that the volume poured into each bottle is, on average, one liter, i.e., 1000 ml, but we suspect that in fact the average volume is smaller than that...

Formally, the hypotheses that we set are the following:

H₀: μ = 1000 ml

H₁: μ < 1000 ml

We went to a shop and bought a sample of 9 bottles. After carefully measuring the volume of juice in each bottle, we've obtained the following sample (in milliliters):

1020, 970, 1000, 980, 1010, 930, 950, 980, 980.

Sample size: n = 9;

Sample mean: x̄ = 980 ml;

Population standard deviation: σ = 30 ml;

Test statistic: z = (980 − 1000) · √9 / 30 = −2;

And, therefore, p-value = Φ(−2) ≈ 0.0228.

As 0.0228 < 0.05, we conclude that our suspicions aren't groundless; at the most common significance level, 0.05, we would reject the producer's claim, H₀, and accept the alternative hypothesis, H₁.
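The whole example can be reproduced with a few lines of Python; `statistics.NormalDist` from the standard library supplies Φ:

```python
from math import sqrt
from statistics import NormalDist

sample = [1020, 970, 1000, 980, 1010, 930, 950, 980, 980]
mu0, sigma = 1000, 30  # claimed mean and declared population sd (ml)

x_bar = sum(sample) / len(sample)              # 980.0
z = (x_bar - mu0) * sqrt(len(sample)) / sigma  # -2.0
p = NormalDist().cdf(z)                        # left-tailed p-value ≈ 0.0228
print(x_bar, z, round(p, 4))
```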

We tossed a coin 50 times. We got 20 tails and 30 heads. Is there sufficient evidence to claim that the coin is biased?

Clearly, our data follows the Bernoulli distribution, with some success probability p and variance σ² = p(1 − p). However, the sample is large, so we can safely perform a Z-test. We adopt the convention that getting tails is a success.

Let us state the null and alternative hypotheses:

H₀: p = 0.5 (the coin is fair - the probability of tails is 0.5)

H₁: p ≠ 0.5 (the coin is biased - the probability of tails differs from 0.5)

In our sample we have 20 successes (denoted by ones) and 30 failures (denoted by zeros), so:

Sample size n = 50;

Sample mean x̄ = 20/50 = 0.4;

Population standard deviation is given by σ = √(0.5 × 0.5) (because 0.5 is the proportion p hypothesized in H₀). Hence, σ = 0.5;

  • And, therefore, z = (0.4 − 0.5) · √50 / 0.5 ≈ −1.41, and p-value = 2 · (1 − Φ(1.41)) ≈ 0.1573.

Since 0.1573 > 0.1, we don't have enough evidence to reject the claim that the coin is fair, even at a significance level as large as 0.1. In that case, you may safely toss it to your Witcher or use the coin flip probability calculator to find your chances of getting, e.g., 10 heads in a row (which are extremely low!).
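A quick check of the coin example, again using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

n, tails = 50, 20
x_bar = tails / n                       # sample proportion: 0.4
p0 = 0.5                                # proportion under H₀
sigma = sqrt(p0 * (1 - p0))             # 0.5

z = (x_bar - p0) * sqrt(n) / sigma      # ≈ -1.4142
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value ≈ 0.1573
print(round(z, 4), round(p, 4))
```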

What is the difference between a Z-test and a t-test?

We use a t-test for testing the population mean of a normally distributed dataset with an unknown population standard deviation. We get it by replacing the population standard deviation in the Z-test statistic formula with the sample standard deviation, which means that this new test statistic follows (provided that H₀ holds) the t-Student distribution with n − 1 degrees of freedom instead of N(0, 1).

When should I use t-test over the Z-test?

For large samples, the t-Student distribution with n − 1 degrees of freedom approaches N(0, 1). Hence, as long as there is a sufficient number of data points (at least 30), it does not really matter whether you use the Z-test or the t-test, since the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test instead of the Z-test.

How do I calculate the Z test statistic?

To calculate the Z test statistic:

  • Compute the arithmetic mean of your sample.
  • From this mean, subtract the mean postulated in the null hypothesis.
  • Multiply by the square root of the sample size.
  • Divide by the population standard deviation.
  • That's it, you've just computed the Z test statistic!


StatAnalytica

Step-by-step guide to hypothesis testing in statistics


Hypothesis testing in statistics helps us use data to make informed decisions. It starts with an assumption or guess about a group or population—something we believe might be true. We then collect sample data to check if there is enough evidence to support or reject that guess. This method is useful in many fields, like science, business, and healthcare, where decisions need to be based on facts.

Learning how to do hypothesis testing in statistics step-by-step can help you better understand data and make smarter choices, even when things are uncertain. This guide will take you through each step, from creating your hypothesis to making sense of the results, so you can see how it works in practical situations.

What is Hypothesis Testing?


Hypothesis testing is a method for determining whether data supports a certain idea or assumption about a larger group. It starts by making a guess, like an average or a proportion, and then uses a small sample of data to see if that guess seems true or not.

For example, if a company wants to know if its new product is more popular than its old one, it can use hypothesis testing. They start with a statement like “The new product is not more popular than the old one” (this is the null hypothesis) and compare it with “The new product is more popular” (this is the alternative hypothesis). Then, they look at customer feedback to see if there’s enough evidence to reject the first statement and support the second one.

Simply put, hypothesis testing is a way to use data to help make decisions and understand what the data is really telling us, even when we don’t have all the answers.

Importance Of Hypothesis Testing In Decision-Making And Data Analysis

Hypothesis testing is important because it helps us make smart choices and understand data better. Here’s why it’s useful:

  • Reduces Guesswork : It helps us see if our guesses or ideas are likely correct, even when we don’t have all the details.
  • Uses Real Data : Instead of just guessing, it checks if our ideas match up with real data, which makes our decisions more reliable.
  • Avoids Errors : It helps us avoid mistakes by carefully checking if our ideas are right so we don’t make costly errors.
  • Shows What to Do Next : It tells us if our ideas work or not, helping us decide whether to keep, change, or drop something. For example, a company might test a new ad and decide what to do based on the results.
  • Confirms Research Findings : It makes sure that research results are accurate and not just random chance so that we can trust the findings.

Here’s a simple guide to understanding hypothesis testing, with an example:

1. Set Up Your Hypotheses

Explanation: Start by defining two statements:

  • Null Hypothesis (H0): This is the idea that there is no change or effect. It’s what you assume is true.
  • Alternative Hypothesis (H1): This is what you want to test. It suggests there is a change or effect.

Example: Suppose a company says their new batteries last an average of 500 hours. To check this:

  • Null Hypothesis (H0): The average battery life is 500 hours.
  • Alternative Hypothesis (H1): The average battery life is not 500 hours.

2. Choose the Test

Explanation: Pick a statistical test that fits your data and your hypotheses. Different tests are used for various kinds of data.

Example: Since you’re comparing the average battery life, you use a one-sample t-test .

3. Set the Significance Level

Explanation: Decide how much risk you’re willing to take if you make a wrong decision. This is called the significance level, often set at 0.05 or 5%.

Example: You choose a significance level of 0.05, meaning you’re okay with a 5% chance of being wrong.

4. Gather and Analyze Data

Explanation: Collect your data and perform the test. Calculate the test statistic to see how far your sample result is from what you assumed.

Example: You test 30 batteries and find they last an average of 485 hours. You then calculate how this average compares to the claimed 500 hours using the t-test.

5. Find the p-Value

Explanation: The p-value tells you the probability of getting a result as extreme as yours if the null hypothesis is true.

Example: You find a p-value of 0.0001. This means there’s a very small chance (0.01%) of seeing a sample average at least as far from 500 hours as 485 hours, if the true average really is 500 hours.

6. Make Your Decision

Explanation: Compare the p-value to your significance level. If the p-value is smaller, you reject the null hypothesis. If it’s larger, you do not reject it.

Example: Since 0.0001 is much less than 0.05, you reject the null hypothesis. This means the data suggests the average battery life is different from 500 hours.

7. Report Your Findings

Explanation: Summarize what the results mean. State whether you rejected the null hypothesis and what that implies.

Example: You conclude that the average battery life is likely different from 500 hours. This suggests the company’s claim might not be accurate.

Hypothesis testing is a way to use data to check if your guesses or assumptions are likely true. By following these steps—setting up your hypotheses, choosing the right test, deciding on a significance level, analyzing your data, finding the p-value, making a decision, and reporting results—you can determine if your data supports or challenges your initial idea.
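Step 4 of the battery example can be sketched in code. The guide gives n = 30 and a sample mean of 485 hours but not the sample standard deviation, so s = 20 hours is a hypothetical value assumed here purely for illustration; the p-value would then come from the t distribution with n − 1 = 29 degrees of freedom (e.g., via a statistics library), which is omitted to keep the sketch dependency-free:

```python
from math import sqrt

# Battery example from the steps above; the sample standard deviation is
# NOT given in the guide, so s = 20 hours is a hypothetical value.
n, x_bar, mu0, s = 30, 485, 500, 20

# Step 4: test statistic for a one-sample t-test
t = (x_bar - mu0) / (s / sqrt(n))
print(round(t, 2))  # ≈ -4.11
```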

Understanding Hypothesis Testing: A Simple Explanation

Hypothesis testing is a way to use data to make decisions. Here’s a straightforward guide:

1. What are the Null and Alternative Hypotheses?

  • Null Hypothesis (H0): This is your starting assumption. It says that nothing has changed or that there is no effect. It’s what you assume to be true until your data shows otherwise. Example: If a company says their batteries last 500 hours, the null hypothesis is: “The average battery life is 500 hours.” This means you think the claim is correct unless you find evidence to prove otherwise.
  • Alternative Hypothesis (H1): This is what you want to find out. It suggests that there is an effect or a difference. It’s what you are testing to see if it might be true. Example: To test the company’s claim, you might say: “The average battery life is not 500 hours.” This means you think the average battery life might be different from what the company says.

2. One-Tailed vs. Two-Tailed Tests

  • One-Tailed Test: This test checks for an effect in only one direction. You use it when you’re only interested in finding out if something is either more or less than a specific value. Example: If you think the battery lasts longer than 500 hours, you would use a one-tailed test to see if the battery life is significantly more than 500 hours.
  • Two-Tailed Test: This test checks for an effect in both directions. Use this when you want to see if something is different from a specific value, whether it’s more or less. Example: If you want to see if the battery life is different from 500 hours, whether it’s more or less, you would use a two-tailed test. This checks for any significant difference, regardless of the direction.

3. Common Misunderstandings

  • Misunderstanding: a non-significant result proves the null hypothesis. Clarification: Hypothesis testing doesn’t prove that the null hypothesis is true. It just helps you decide if you should reject it. If there isn’t enough evidence against it, you don’t reject it, but that doesn’t mean it’s definitely true.
  • Misunderstanding: a small p-value proves the null hypothesis is false. Clarification: A small p-value shows that your data is unlikely if the null hypothesis is true. It suggests that the alternative hypothesis might be right, but it doesn’t prove the null hypothesis is false.
  • Misunderstanding: the significance level can be picked casually or after seeing the results. Clarification: The significance level (alpha) is a set threshold, like 0.05, that reflects how much risk you’re willing to take of making a wrong decision. It should be chosen carefully, in advance, not randomly.
  • Misunderstanding: a significant result guarantees a correct conclusion. Clarification: Hypothesis testing helps you make decisions based on data, but it doesn’t guarantee your results are correct. The quality of your data and the right choice of test affect how reliable your results are.

Benefits and Limitations of Hypothesis Testing

Benefits

  • Clear Decisions: Hypothesis testing helps you make clear decisions based on data. It shows whether the evidence supports or goes against your initial idea.
  • Objective Analysis: It relies on data rather than personal opinions, so your decisions are based on facts rather than feelings.
  • Concrete Numbers: You get specific numbers, like p-values, to understand how strong the evidence is against your idea.
  • Control Risk: You can set a risk level (alpha level) to manage the chance of making an error, which helps avoid incorrect conclusions.
  • Widely Used: It can be used in many areas, from science and business to social studies and engineering, making it a versatile tool.

Limitations

  • Sample Size Matters: The results can be affected by the size of the sample. Small samples might give unreliable results, while large samples might find differences that aren’t meaningful in real life.
  • Risk of Misinterpretation: A small p-value means the results are unlikely if the null hypothesis is true, but it doesn’t show how important the effect is.
  • Needs Assumptions: Hypothesis testing requires certain conditions, like data being normally distributed . If these aren’t met, the results might not be accurate.
  • Simple Decisions: It often results in a basic yes or no decision without giving detailed information about the size or impact of the effect.
  • Can Be Misused: Sometimes, people misuse hypothesis testing, tweaking data to get a desired result or focusing only on whether the result is statistically significant.
  • No Absolute Proof: Hypothesis testing doesn’t prove that your hypothesis is true. It only helps you decide if there’s enough evidence to reject the null hypothesis, so the conclusions are based on likelihood, not certainty.

Final Thoughts 

Hypothesis testing helps you make decisions based on data. It involves setting up your initial idea, picking a significance level, doing the test, and looking at the results. By following these steps, you can make sure your conclusions are based on solid information, not just guesses.

This approach lets you see if the evidence supports or contradicts your initial idea, helping you make better decisions. But remember that hypothesis testing isn’t perfect. Things like sample size and assumptions can affect the results, so it’s important to be aware of these limitations.

In simple terms, using a step-by-step guide for hypothesis testing is a great way to better understand your data. Follow the steps carefully and keep in mind the method’s limits.

What is the difference between one-tailed and two-tailed tests?

 A one-tailed test assesses the probability of the observed data in one direction (either greater than or less than a certain value). In contrast, a two-tailed test looks at both directions (greater than and less than) to detect any significant deviation from the null hypothesis.

How do you choose the appropriate test for hypothesis testing?

The choice of test depends on the type of data you have and the hypotheses you are testing. Common tests include t-tests, chi-square tests, and ANOVA. For more details about ANOVA, you may read Complete Details on What is ANOVA in Statistics. It’s important to match the test to the data characteristics and the research question.

What is the role of sample size in hypothesis testing?  

Sample size affects the reliability of hypothesis testing. Larger samples provide more reliable estimates and can detect smaller effects, while smaller samples may lead to less accurate results and reduced power.

Can hypothesis testing prove that a hypothesis is true?  

Hypothesis testing cannot prove that a hypothesis is true. It can only provide evidence to support or reject the null hypothesis. A result can indicate whether the data is consistent with the null hypothesis or not, but it does not prove the alternative hypothesis with certainty.


Z-Test for Statistical Hypothesis Testing Explained

The Z-test is a statistical hypothesis test in which the distribution of the test statistic we are measuring, like the mean, follows a normal distribution.

Egor Howell


There are multiple types of Z-tests; however, we’ll focus on the simplest and best-known one, the one-sample mean test. This is used to determine whether the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.  

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistic is from the population’s mean.

Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science . Therefore, it’s an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A sample size that’s greater than 30. This is because we want to ensure our sample mean comes from a distribution that is approximately normal. By the central limit theorem, the distribution of the sample mean can be approximated as normal once it is based on more than roughly 30 data points.
  • The standard deviation and mean of the population are known .
  • The sample data is collected/acquired randomly .

More on Data Science:   What Is Bootstrapping Statistics?

Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistics. 

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0 . This is what you believe to be true about the population, which could be the mean of the population, μ_0 :

H_0: μ = μ_0

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1 . This is what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

H_1: μ ≠ μ_0

3. Choose Your Critical Value

Then, choose your significance level, α , which determines whether you accept or reject the null hypothesis. Typically for a Z-test we would use a significance level of 5 percent, which for a two-tailed test corresponds to critical values of z = ±1.96 standard deviations from the population’s mean in the normal distribution:

This critical value is based on confidence intervals.

4. Calculate Your Z-Test Statistic

Compute the Z-test statistic using the sample mean, μ_1 , the population mean, μ_0 , the number of data points in the sample, n , and the population’s standard deviation, σ :

Z = (μ_1 − μ_0) · √n / σ

If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, then we reject the null hypothesis in favor of the alternate hypothesis, because the sample’s mean differs from the population mean by a statistically significant amount.

Another way to think about this: if the sample mean is very far away from the population mean, either the alternate hypothesis is true or the sample is a complete anomaly.

More on Data Science: Basic Probability Theory and Statistics Terms to Know

Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than other schools. It takes a sample of 50 students whose average IQ measures to be 110. The population, or the rest of the schools, has an average IQ of 100 and standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

H_0: μ = 100

H_1: μ > 100

Where we are saying that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test, as we suspect our sample mean is greater than the population’s mean. So, choosing a significance level of 5 percent, which for a one-tailed test equals a critical Z-score of 1.645 , we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.

If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test, as shown in the figure above. We would then only reject the null hypothesis if our Z-test statistic were less than -1.645.

Computing our Z-test statistic, we see:

z = (110 − 100) · √50 / 20 ≈ 3.54

Therefore, since 3.54 exceeds the critical value, we have sufficient evidence to reject the null hypothesis, and the school’s claim is supported.
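The school-IQ example can be checked with a short sketch; here we compute the exact one-tailed 5 percent critical value, Φ⁻¹(0.95) ≈ 1.645 (the conclusion is the same against any nearby threshold, since z ≈ 3.54):

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, mu0, sigma = 50, 110, 100, 20

z = (x_bar - mu0) * sqrt(n) / sigma  # ≈ 3.54
z_crit = NormalDist().inv_cdf(0.95)  # right-tailed, α = 0.05 → ≈ 1.645
reject = z > z_crit                  # True → reject H_0
print(round(z, 2), round(z_crit, 3), reject)
```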

Hope you enjoyed this article on Z-tests. In this post, we only addressed the most simple case, the one-sample mean test. However, there are other types of tests, but they all follow the same process just with some small nuances.  


Z Test: Definition & Two Proportion Z-Test

What is a Z test?


For example, if someone said they had found a new drug that cures cancer, you would want to be sure it was probably true. A hypothesis test will tell you if it's probably true, or probably not true. A Z test is used when your data is approximately normally distributed (i.e. the data has the shape of a bell curve when you graph it).

When can you run a Z Test?

Several different types of tests are used in statistics (i.e. f test , chi square test , t test ). You would use a Z test if:

  • Your sample size is greater than 30 . Otherwise, use a t test .
  • Data points should be independent from each other. In other words, one data point isn’t related or doesn’t affect another data point.
  • Your data should be normally distributed . However, for large sample sizes (over 30) this doesn’t always matter.
  • Your data should be randomly selected from a population, where each item has an equal chance of being selected.
  • Sample sizes should be equal if at all possible.

How do I run a Z Test?

Running a Z test on your data requires five steps:

  • State the null hypothesis and alternate hypothesis .
  • Choose an alpha level .
  • Find the critical value of z in a z table .
  • Calculate the z test statistic (see below).
  • Compare the test statistic to the critical z value and decide if you should support or reject the null hypothesis .
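The five steps above can be sketched as a small Python function, here using Python's standard-library `statistics.NormalDist` for the critical value (a minimal right-tailed sketch; the function and variable names are my own, not from a particular library):

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sample, right-tailed z-test: returns (z, critical value, reject?)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)      # step 4: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)   # step 3: critical value for alpha
    return z, z_crit, z > z_crit               # step 5: compare and decide

# The school IQ example from earlier in this document:
z, z_crit, reject = z_test(xbar=110, mu0=100, sigma=20, n=50)
print(round(z, 2), round(z_crit, 3), reject)   # 3.54 1.645 True
```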

You could perform all these steps by hand. For example, you could find a critical value by hand, or calculate a z value by hand. You could also use technology, for example:

  • Two sample z test in Excel .
  • Find a critical z value on the TI 83 .
  • Find a critical value on the TI 89 (left-tail) .

Two Proportion Z-Test


A Two Proportion Z-Test (or Z-interval) allows you to estimate the true difference between the proportions of two independent groups, to a given confidence level.

There are a few familiar conditions that need to be met for the Two Proportion Z-Interval to be valid.

  • The groups must be independent. Subjects can be in one group or the other, but not both – like teens and adults.
  • The data must be selected randomly and independently from a homogenous population. A survey is a common example.
  • The population should be at least ten times bigger than the sample size. If the population is teenagers for example, there should be at least ten times as many total teenagers as the number of teenagers being surveyed.
  • The null hypothesis (H0) for the test is that the proportions are the same.
  • The alternate hypothesis (H1) is that the proportions are not the same.

Example question: let’s say you’re testing two flu drugs A and B. Drug A works on 41 people out of a sample of 195. Drug B works on 351 people in a sample of 605. Are the two drugs comparable? Use a 5% alpha level .

Step 1: Find the two proportions:

  • P1 = 41/195 = 0.21 (that’s 21%)
  • P2 = 351/605 = 0.58 (that’s 58%).

Set these numbers aside for a moment.

Step 2: Find the overall sample proportion . The numerator will be the total number of “positive” results for the two samples and the denominator is the total number of people in the two samples.

  • p = (41 + 351) / (195 + 605) = 0.49.

Set this number aside for a moment.

Step 3: Insert the numbers into the two-proportion z-test formula:

Z = (P1 − P2) / √( p(1 − p)(1/n1 + 1/n2) )

Solving the formula with P1 = 0.21, P2 = 0.58, p = 0.49, n1 = 195 and n2 = 605, we get Z = −8.99, an absolute value of 8.99.

Step 4: Find the z-score associated with the chosen alpha level in a z-table. For a two-tailed test at the 5% alpha level, the critical value is 1.96. We need to find out if our calculated z-score falls into the “rejection region” beyond this critical value.

Step 5: Compare the calculated z-score from Step 3 with the table z-score from Step 4. If the calculated z-score is larger in absolute value, you can reject the null hypothesis.

8.99 > 1.96, so we can reject the null hypothesis .
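The pooled-proportion calculation can be sketched in Python (a minimal sketch of the two-proportion z-test described above; names are illustrative):

```python
from math import sqrt

x1, n1 = 41, 195       # drug A: successes, sample size
x2, n2 = 351, 605      # drug B: successes, sample size

p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)                      # pooled sample proportion
se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # pooled standard error
z = (p1 - p2) / se
print(round(abs(z), 2))                        # 8.99
```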

Example 2:  Suppose that in a survey of 700 women and 700 men, 35% of women and 30% of men indicated that they support a particular presidential candidate. Let’s say we wanted to find the true difference in proportions of these two groups to a 95% confidence interval .

At first glance the survey indicates that women support the candidate more than men by about 5% . However, for this statistical inference to be valid we need to construct a range of values to a given confidence interval.

To do this, we use the formula for the Two Proportion Z-Interval:

(P1 − P2) ± z* √( P1(1 − P1)/n1 + P2(1 − P2)/n2 )

Plugging in values (P1 = 0.35, P2 = 0.30, n1 = n2 = 700, z* = 1.96), we find the true difference in proportions to be

0.05 ± 1.96 × 0.025 = 0.05 ± 0.049

Based on the results of the survey, we are 95% confident that the difference in proportions of women and men that support the presidential candidate is between about 0% and 10%.
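A Python sketch of the interval arithmetic above (using the standard-library `statistics.NormalDist` for the 95% multiplier; variable names are my own):

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.35, 700     # proportion of women supporting the candidate
p2, n2 = 0.30, 700     # proportion of men supporting the candidate

z_star = NormalDist().inv_cdf(0.975)                 # 1.96 for 95% confidence
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled standard error
lo = (p1 - p2) - z_star * se
hi = (p1 - p2) + z_star * se
print(round(lo, 3), round(hi, 3))                    # 0.001 0.099
```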


Introduction to Statistics and Data Analysis

Chapter 6: Hypothesis testing: the z-test

We’ve all had the experience of standing at a crosswalk, waiting and staring at a pedestrian traffic light showing the little red man. You’re waiting for the little green man so you can cross. After a little while you’re still waiting and there aren’t any cars around. You might think ‘this light is really taking a long time’, but you continue waiting. Minutes pass and there’s still no little green man. At some point you come to the conclusion that the light is broken and you’ll never see that little green man. You cross on the little red man when it’s clear.

You may not have known this but you just conducted a hypothesis test. When you arrived at the crosswalk, you assumed that the light was functioning properly, although you will always entertain the possibility that it’s broken. In terms of hypothesis testing, your ‘null hypothesis’ is that the light is working and your ‘alternative hypothesis’ is that it’s broken. As time passes, it seems less and less likely that the light is working properly. Eventually, the probability of the light working given how long you’ve been waiting becomes so low that you reject the null hypothesis in favor of the alternative hypothesis.

This sort of reasoning is the backbone of hypothesis testing and inferential statistics. It’s also the point in the course where we turn the corner from descriptive statistics to inferential statistics. Rather than describing our data in terms of means and plots, we will now start using our data to make inferences, or generalizations, about the population that our samples are drawn from. In this course we’ll focus on standard hypothesis testing where we set up a null hypothesis and determine the probability of our observed data under the assumption that the null hypothesis is true (the much maligned p-value). If this probability is small enough, then we conclude that our data suggests that the null hypothesis is false, so we reject it.

In this chapter, we’ll introduce hypothesis testing with examples from a ‘z-test’, when we’re comparing a single mean to what we’d expect from a population with known mean and standard deviation. In this case, we can convert our observed mean into a z-score for the standard normal distribution. Hence the name z-test.

It’s time to introduce the hypothesis test flow chart. It’s pretty self-explanatory, even if you’re not familiar with all of these hypothesis tests. The z-test is (1) based on means, (2) with only one mean, and (3) where we know \(\sigma\), the standard deviation of the population. Here’s how to find the z-test in the flow chart:

[Figure: the hypothesis test flow chart, with the path to the z-test highlighted.]

6.1 Women’s height example

Let’s work with the example from the end of the last chapter where we started with the fact that the heights of US women have a mean of 63 inches and a standard deviation of 2.5 inches. We calculated that the average height of the 122 women in Psych 315 is 64.7 inches. We then used the central limit theorem and calculated that the probability of a random sample of 122 heights from this population having a mean of 64.7 or greater is \(2.4868996 \times 10^{-14}\). This is a very, very small number.

Here’s how we do it using R:
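The original R listing is not reproduced in this copy, but the same calculation can be sketched in Python (using `math.erfc`, which stays accurate in the far upper tail; the exact tail probability printed may differ slightly from the value quoted above, depending on rounding):

```python
from math import erfc, sqrt

mu, sigma = 63, 2.5    # US women's mean height (inches) and standard deviation
xbar, n = 64.7, 122    # our sample mean and sample size

z = (xbar - mu) / (sigma / sqrt(n))   # z-score of the observed sample mean
p = 0.5 * erfc(z / sqrt(2))           # P(sample mean >= 64.7) under H0
print(round(z, 2), p)                 # z is about 7.51; p is on the order of 10**-14
```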

Let’s think of our sample as a random sample of UW psychology students, which is a reasonable assumption since all psychology students have to take a statistics class. What does this sample say about the psychology students that are women at UW compared to the US population? It could be that these psychology students at UW have the same mean and standard deviation as the US population, but our sample just happens to have an unusual number of tall women, but we calculated that the probability of this happening is really low. Instead, it makes more sense that the population that we’re drawing from has a mean that’s greater than the US population mean. Notice that we’re making a conclusion about the whole population of women psychology students based on our one sample.

Using the terminology of hypothesis testing, we first assumed the null hypothesis that UW women psych students have the same mean (and standard deviation) as the US population. The null hypothesis is written as:

\[ H_{0}: \mu = 63 \] In this example, our alternative hypothesis is that the mean of our population is larger than the mean of null hypothesis population. We write this as:

\[ H_{A}: \mu > 63 \]

Next, after obtaining a random sample and calculating the mean, we calculate the probability of drawing a mean this large (or larger) from the null hypothesis distribution.

If this probability is low enough, we reject the null hypothesis in favor of the alternative hypothesis. When our probability allows us to reject the null hypothesis, we say that our observed results are ‘statistically significant’.

In statistics terms, we never say we ‘accept the alternative hypothesis’ as true. All we can say is that we don’t think the null hypothesis is true. I know it’s subtle, but in science we can never prove that a hypothesis is true or not. There’s always the possibility that we just happened to grab an unusual sample from the null hypothesis distribution.

6.2 The hated p<.05

The probability that we obtain our observed mean or greater given that the null hypothesis is true is called the p-value. How improbable is improbable enough to reject the null hypothesis? The p-value for our example above on women’s heights is astronomically low, so it’s clear that we should reject \(H_{0}\) .

The p-value that’s on the border of rejection is called the alpha ( \(\alpha\) ) value. We reject \(H_{0}\) when our p-value is less than \(\alpha\) .

You probably know that the most common value of alpha is \(\alpha = .05\) .

The first publication of this value dates back to Sir Ronald Fisher, in his seminal 1925 book Statistical Methods for Research Workers where he states:

“It is convenient to take this point as a limit in judging whether a deviation is considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.” (p. 47)

If you read the chapter on the normal distribution, then you should know that 95% of the area under the normal distribution lies within \(\pm\) two standard deviations of the mean. So the probability of obtaining a sample that exceeds two standard deviations from the mean (in either direction) is .05.

6.3 IQ example

Let’s do an example using IQ scores. IQ scores are normalized to have a mean of 100 and a standard deviation of 15 points. Because they’re normalized, they are a rare example of a population which has a known mean and standard deviation. In the next chapter we’ll discuss the t-test, which is used in the more common situation when we don’t know the population standard deviation.

Suppose you have the suspicion that graduate students have higher IQ’s than the general population. You have enough time to go and measure the IQ’s of 25 randomly sampled grad students and obtain a mean of 105. Is this difference between the observed mean and 100 statistically significant using an alpha value of \(\alpha = 0.05\)?

Here the null hypothesis is:

\[ H_{0}: \mu = 100\]

And the alternative hypothesis is:

\[ H_{A}: \mu > 100 \]

We know that the parameters for the null hypothesis are:

\[ \mu = 100 \] and \[ \sigma = 15 \]

From this, we can calculate the probability of observing our mean of 105 or higher using the central limit theorem and what we know about the normal distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \] From this, we can calculate the probability of our observed mean using R’s ‘pnorm’ function. Here’s how to do the whole thing in R.
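The R listing is not shown in this copy; an equivalent sketch in Python, using the standard-library `statistics.NormalDist` in place of pnorm:

```python
from statistics import NormalDist

mu, sigma, n = 100, 15, 25   # null hypothesis parameters and sample size
xbar = 105                   # observed sample mean

se = sigma / n ** 0.5                  # sigma_xbar = 3
p = 1 - NormalDist(mu, se).cdf(xbar)   # P(sample mean >= 105) under H0
print(round(p, 4))                     # 0.0478
```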

Since our p-value of 0.0478 is (just barely) less than our chosen criterion of \(\alpha = 0.05\), we reject \(H_{0}\) for this (contrived) example and conclude that our observed mean of 105 is significantly greater than 100, so our study suggests that the average graduate student has a higher IQ than the overall population.

You should feel uncomfortable making such a hard, binary decision for such a borderline case. After all, if we had chosen our second favorite value of alpha, \(\alpha = .01\) , we would have failed to reject \(H_{0}\) . This discomfort is a primary reason why statisticians are moving away from this discrete decision making process. Later on we’ll discuss where things are going, including reporting effect sizes, and using confidence intervals.

6.4 Alpha values vs. critical values

Using R’s qnorm function, we can find the z-score for which only 5% of the area lies above:
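The qnorm call itself is missing from this copy; the equivalent in Python, sketched with `statistics.NormalDist`:

```python
from statistics import NormalDist

# The z-score below which 95% of the area lies (so 5% lies above it)
z_crit = NormalDist().inv_cdf(0.95)
print(round(z_crit, 6))   # 1.644854
```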

So the probability of a randomly sampled z-score exceeding 1.644854 is less than 5%. It follows that if we convert our observed mean into z-score values, we will reject \(H_{0}\) if and only if our z-score is greater than 1.644854. This value is called the ‘critical value’ because it lies on the boundary between rejecting and failing to reject \(H_{0}\) .

In our last example, the z-score for our observed mean is:

\[ z = \frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}} = \frac{105 - 100}{3} = 1.67 \] Our z-score is just barely greater than the critical value of 1.644854, which makes sense because our p-value is just barely less than 0.05.

Sometimes you’ll see textbooks compare critical values to observed scores for the decision-making process in hypothesis testing. This dates back to the days when computers were less available and we had to rely on tables instead. There wasn’t enough space in a book to hold complete tables, so you couldn’t look up a p-value for any observed value. Instead, only critical values for specific values of alpha were included. If you look at really old papers, you’ll see statistics reported as \(p<.05\) or \(p<.01\) instead of actual p-values for this reason.

It may help to visualize the relationship between p-values, alpha values and critical values like this:

[Figure: the standard normal distribution, showing the p-value, alpha, critical value and rejection region.]

The red shaded region is the upper 5% of the standard normal distribution, which starts at the critical value of z=1.644854. This is sometimes called the ‘rejection region’. The blue vertical line is drawn at our observed value of z=1.67. You can see that the blue line falls just inside the rejection region, so we reject \(H_{0}\)!

6.5 One vs. two-tailed tests

Recall that our alternative hypothesis was to reject if our mean IQ was significantly greater than the null hypothesis mean: \(H_{A}: \mu > 100\) . This implies that the situation where \(\mu < 100\) is never even in consideration, which is weird. In science, we’re trying to understand the true state of the world. Although we have a hunch that grad student IQ’s are higher than average, there is always the possibility that they are lower than average. If our sample came up with an IQ well below 100, we’d simply fail to reject \(H_{0}\) and move on. This feels like throwing out important information.

The test we just ran is called a ‘one-tailed’ test because we only reject \(H_{0}\) if our results fall in just one tail of the population distribution.

Instead, it might make more sense to reject \(H_{0}\) if we get either an unusually large or small score. This means we need two critical values - one above and one below zero. At first thought you might think we just duplicate our critical value from a one-tailed test on the other side. But that would double the area of the rejection region. That’s not a good thing, because if \(H_{0}\) is true, there’s actually a \(2\alpha\) probability that we’ll draw a score in the rejection region.

Instead, we divide the area into two tails, each containing an area of \(\frac{\alpha}{2}\) . For \(\alpha\) = 0.05, we can find the critical value of z with qnorm:
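Again the qnorm call is missing from this copy; the equivalent Python sketch:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value: the z-score with area alpha/2 above it
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))   # 1.96
```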

So with a two-tailed test at \(\alpha = 0.05\) we reject \(H_{0}\) if our observed z-score is either above z = 1.96 or less than -1.96. This is that value around 2 that Sir Ronald Fisher was talking about!

Here’s what the critical regions and observed value of z looks like for our example with a two-tailed test:

[Figure: the two-tailed critical regions beyond ±1.96, with the observed value z = 1.67 marked.]

You can see that splitting the area of \(\alpha = 0.05\) into two halves increased the critical value in the positive direction from 1.64 to 1.96, making it harder to reject \(H_{0}\) . For our example, this changes our decision: our observed value of z = 1.67 no longer falls into the rejection region, so now we fail to reject \(H_{0}\) .

If we now fail to reject \(H_{0}\), what about the p-value? Remember, for a one-tailed test, p = \(\alpha\) if our observed z-score lands right on the critical value of z. The same should be true for a two-tailed test. But the critical z-score moved so that the area above it is \(\frac{\alpha}{2}\). So for a two-tailed test, in order to have a p-value of \(\alpha\) when our z-score lands right on the critical value, we need to double the p-value we’d get for a one-tailed test.

For our example, the p-value for the one tailed test was \(p=0.0478\) . So if we use a two-tailed test, our p-value is \((2)(0.0478) = 0.0956\) . This value is greater than \(\alpha\) = 0.05, which makes sense because we just showed above that we fail to reject \(H_{0}\) with a two tailed test.
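Sketching that doubling in Python for the IQ example:

```python
from statistics import NormalDist

z = 5 / 3                                # observed z from the IQ example
p_one = 1 - NormalDist().cdf(z)          # one-tailed p-value
p_two = 2 * p_one                        # two-tailed p-value doubles it
print(round(p_one, 4), round(p_two, 4))  # 0.0478 0.0956
```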

Which is the right test, one-tailed or two-tailed? Ideally, as scientists, we should be agnostic about the results of our experiment. But in reality, we all know that the results are more interesting if they are statistically significant. So you can imagine that for this example, given a choice between one and two-tailed, we’d choose a one-tailed test so that we can reject \(H_{0}\) .

There are two problems with this. First, we should never adjust our choice of hypothesis test after we observe the data. That would be an example of ‘p-hacking’, a topic we’ll discuss later. Second, most statisticians these days strongly recommend against one-tailed tests. The only reason for a one-tailed test is if there is no logical or physical possibility for a population mean to fall below the null hypothesis mean.


Statistics By Jim

Making statistics intuitive

Z-score: Definition, Formula, and Uses

By Jim Frost

A z-score measures the distance between a data point and the mean using standard deviations. Z-scores can be positive or negative. The sign tells you whether the observation is above or below the mean. For example, a z-score of +2 indicates that the data point falls two standard deviations above the mean, while a -2 signifies it is two standard deviations below the mean. A z-score of zero equals the mean. Statisticians also refer to z-scores as standard scores , and I’ll use those terms interchangeably.

Standardizing the raw data by transforming them into z-scores provides the following benefits:

  • Understand where a data point fits into a distribution.
  • Compare observations between dissimilar variables.
  • Identify outliers
  • Calculate probabilities and percentiles using the standard normal distribution.

In this post, I cover all these uses for z-scores along with using z-tables, z-score calculators, and I show you how to do it all in Excel.

How to Find a Z-score

To calculate z-scores, take the raw measurements, subtract the mean, and divide by the standard deviation.

The formula for finding z-scores is the following:

z = (X − μ) / σ

X represents the data point of interest. Mu and sigma represent the mean and standard deviation for the population from which you drew your sample . Alternatively, use the sample mean and standard deviation when you do not know the population values.

Z-scores follow the distribution of the original data. Consequently, when the original data follow the normal distribution, so do the corresponding z-scores. Specifically, the z-scores follow the standard normal distribution, which has a mean of 0 and a standard deviation of 1. However, skewed data will produce z-scores that are similarly skewed.

In this post, I include graphs of z-scores using the standard normal distribution because they bring the concepts to life. Additionally, z-scores are most valuable when your data are normally distributed. However, be aware that when your data are nonnormal, the z-scores are also nonnormal, and the interpretations might not be valid.

Learn how Z-scores are an integral part of hypothesis testing with Z Tests !

Related posts : The Mean in Statistics and Standard Deviation

Using Z-scores to Understand How an Observation Fits into a Distribution

Z-scores help you understand where a specific observation falls within a distribution. Sometimes the raw test scores are not informative. For instance, SAT, ACT, and GRE scores do not have real-world interpretations on their own. An SAT score of 1340 is not fundamentally meaningful. Many psychological metrics are simply sums or averages of responses to a survey. For these cases, you need to know how an individual score compares to the entire distribution of scores. For example, if your standard score for any of these tests is a +2, that’s far above the mean. Now that’s helpful!

In other cases, the measurement units are meaningful, but you want to see the relative standing. For example, if a baby weighs five kilograms, you might wonder how her weight compares to others. For a one-month-old baby girl, that equates to a z-score of 0.74. She weighs more than average, but not by a full standard deviation. Now you understand where she fits in with her cohort!

In all these cases, you’re using standard scores to compare an observation to the average. You’re placing that value within an entire distribution.

When your data are normally distributed, you can graph z-scores on the standard normal distribution, which is a particular form of the normal distribution. The mean occurs at the peak with a z-score of zero. Above average z-scores are on the right half of the distribution and below average values are on the left. The graph below shows where the baby’s z-score of 0.74 fits in the population.

[Figure: the standard normal distribution, with the baby's z-score of 0.74 marked.]

Analysts often convert standard scores to percentiles, which I cover later in this post.

Related post : Understanding the Normal Distribution

Using Standard Scores to Compare Different Types of Variables

Z-scores allow you to take data points drawn from populations with different means and standard deviations and place them on a common scale. This standard scale lets you compare observations for different types of variables that would otherwise be difficult. That’s why z-scores are also known as standard scores, and the process of transforming raw data to z-scores is called standardization. It lets you compare data points across variables that have different distributions.

In other words, you can compare apples to oranges. Isn’t statistics grand!

Imagine we literally need to compare apples to oranges. Specifically, we’ll compare their weights. We have a 110-gram apple and a 100-gram orange.

By comparing the raw values, it’s easy to see the apple weighs slightly more than the orange. However, let’s compare their z-scores. To do this, we need to know the means and standard deviations for the populations of apples and oranges. Assume that apples and oranges follow a normal distribution with the following properties:

                      Apples   Oranges
Mean weight (grams)      100       140
Standard deviation        15        25

Let’s calculate the Z-scores for our apple and orange!

Apple = (110-100) / 15 = 0.667

Orange = (100-140) / 25 = -1.6

The apple’s positive z-score (0.667) signifies that it is heavier than the average apple. It’s not an extreme value, but it is above the mean. Conversely, the orange has a markedly negative Z-score (-1.6). It’s well below the mean weight for oranges. I’ve positioned these standard scores in the standard normal distribution below.
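As a quick check, the two standardizations above can be sketched in Python (the helper function is my own, for illustration):

```python
def z_score(x, mu, sigma):
    """Standardize a raw value: distance from the mean in standard deviations."""
    return (x - mu) / sigma

apple = z_score(110, mu=100, sigma=15)
orange = z_score(100, mu=140, sigma=25)
print(round(apple, 3), round(orange, 2))   # 0.667 -1.6
```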

[Figure: the apple and orange z-scores placed on the standard normal distribution.]

Our apple is a bit heavier than average, while the orange is puny! Using z-scores, we learned where each fruit falls within its distribution and how they compare.

Using Z-scores to Detect Outliers

Z-scores can quantify the unusualness of an observation. Raw data values that are far from the average are unusual and potential outliers. Consequently, we’re looking for high absolute z-scores.

The standard cutoff values for finding outliers are z-scores of +/-3 or more extreme. The standard normal distribution plot below displays the distribution of z-scores. Z-scores beyond the cutoff are so unusual you can hardly see the shading under the curve.

[Figure: the standard normal distribution with the outlier cutoffs of ±3 shaded.]

In populations that follow a normal distribution, Z-score values outside +/- 3 have a probability of 0.0027 (2 * 0.00135), approximately 1 in 370 observations. However, if your data don’t follow a normal distribution, this approach might not be correct.

For the example dataset, I display the raw data points and their z-scores. I circled an observation that is a potential outlier.

[Figure: datasheet of raw data points and their z-scores, with the potential outlier circled.]

Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n − 1)/√n.

Samples with ten or fewer data points cannot have Z-scores that exceed the cutoff value of +/-3.

Additionally, an outlier’s presence throws off the z-scores because it inflates the mean and standard deviation. Notice how all z-scores are negative except the outlier’s value. If we calculated Z-scores without the outlier, they’d be different! If your dataset contains outliers, z-values appear to be less extreme (i.e., closer to zero).

Related post : Five Ways to Find Outliers

Using Z-tables to Calculate Probabilities and Percentiles

The standard normal distribution is a probability distribution. Consequently, if you have only the mean and standard deviation, and you can reasonably assume your data follow the normal distribution (at least approximately), you can easily use z-scores to calculate probabilities and percentiles. Typically, you’ll use online calculators, Excel, or statistical software for these calculations. We’ll get to that.

But first I’ll show you the old-fashioned way of doing that by hand using z-tables.

Let’s go back to the z-score for our apple (0.667) from before. We’ll use it to calculate its weight percentile. A percentile is the proportion of a population that falls below a value. Consequently, we need to find the area under the standard normal distribution curve corresponding to the range of z-scores less than 0.667. In the portion of the z-table below, I’ll use the standard score that is closest to our apple, which is 0.65.

[Figure: a portion of a z-table of standard scores.]

Click here for a full Z-table and illustrated instructions for using it !

Related post : Understanding Probability Distributions and Probability Fundamentals

The Nuts and Bolts of Using Z-tables

Using these tables to calculate probabilities requires that you understand the properties of the normal distribution. While the tables provide an answer, it might not be the answer you need. However, by applying your knowledge of the normal distribution, you can find your answer!

For example, the table indicates that the area of the curve between -0.65 and +0.65 is 48.43%. Unfortunately, that’s not what we want to know. We need to find the area that is less than a z-score of 0.65.

We know that the two halves of the normal distribution are symmetrical, which helps us solve our problem. The z-table tells us that the area for the range from -0.65 and +0.65 is 48.43%. Because of the symmetry, the interval from 0 to +0.65 must be half of that: 48.43/2 = 24.215%. Additionally, the area for all scores less than zero is half (50%) of the distribution.

Therefore, the area for all z-scores up to 0.65 = 50% + 24.215% = 74.215%

That’s how you convert standard scores to percentiles. Our apple is at approximately the 74th percentile.

If you want to calculate the probability for values falling between ranges of standard scores, calculate the percentile for each z-score and then subtract them.

For example, the probability of a z-score between 0.40 and 0.65 equals the difference between the percentiles for z = 0.65 and z = 0.40. We calculated the percentile for z = 0.65 above (74.215%). Using the same method, the percentile for z = 0.40 is 65.540%. Now we subtract the percentiles.

74.215% – 65.540% = 8.675%

The probability of an observation having a z-score between 0.40 and 0.65 is 8.675%.
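You can check the table arithmetic with Python's `statistics.NormalDist`, which computes the exact areas rather than table lookups (so the result differs from the table-based 8.675% by a rounding hair):

```python
from statistics import NormalDist

phi = NormalDist().cdf            # standard normal CDF
prob = phi(0.65) - phi(0.40)      # P(0.40 < Z < 0.65)
print(round(prob * 100, 2))       # about 8.67 percent
```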

Using only simple math and a z-table, you can easily find the probabilities that you need!

Alternatively, use the Empirical Rule to find probabilities for values in a normal distribution using ranges based on standard deviations.

Related post : Percentiles: Interpretations and Calculations

Using Z-score Calculators

In this day and age, you’ll probably use software and online z-score calculators for these probability calculations. Statistical software produced the probability distribution plot below. It displays the apple’s percentile with a graphical representation of the area under the standard normal distribution curve. Graphing is a great way to get an intuitive feel for what you’re calculating using standard scores.

The percentile is a tad different because we used the z-score of 0.65 in the table while the software uses the more precise value of 0.667.

[Figure: probability distribution plot displaying the apple's percentile as the shaded area under the standard normal curve.]

Alternatively, you can enter z-scores into calculators, like this one.

If you enter the z-score value of 0.667, the left-tail p-value matches the shaded region in the probability plot above (0.7476). The right-tail value (0.2524) equals all values above our z-score, which is equivalent to the unshaded region in the graph. Unsurprisingly, those values add to 1 because you’re covering the entire distribution.

How to Find Z-scores in Excel

You can calculate z-scores and their probabilities in Excel. Let’s work through an example. We’ll return to our apple example and start by calculating standard scores for values in a dataset. I have all the data and formulas in this Excel file: Z-scores .

To find z-scores using Excel, you’ll need to either calculate the sample mean and standard deviation or use population reference values. In this example, I use the sample estimates . If you need to use population values supplied to you, enter them into the spreadsheet rather than calculating them.

My apple weight data are in cells A2:A21.

To calculate the mean and standard deviation, I use the following Excel functions:

  • Mean: =AVERAGE(A2:A21)
  • Standard deviation (sample): =STDEV.S(A2:A21)

Then, in column B, I use the following Excel formula to calculate the z-scores:

=(A2-A$24)/A$26

Cell A24 is where I have the mean, and A26 has the standard deviation. This formula takes a data value in column A, subtracts the mean, and then divides by the standard deviation.

I copied that formula for all rows from B2:B21 and Excel displays z-scores for all data points.
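The same mean/standard-deviation/z-score pipeline can be reproduced outside Excel. A short Python sketch with made-up weights (the post's actual apple data are in the linked spreadsheet, so these numbers are illustrative only); `statistics.stdev` matches Excel's STDEV.S:

```python
from statistics import mean, stdev

# Hypothetical weights, standing in for the A2:A21 column.
weights = [150, 162, 168, 155, 171, 160, 158, 166]

m = mean(weights)
s = stdev(weights)  # sample standard deviation, like STDEV.S

# Equivalent of the =(A2-A$24)/A$26 formula, applied to every row.
z_scores = [(x - m) / s for x in weights]

print([round(z, 2) for z in z_scores])
```

A quick sanity check on any standardized column: the z-scores themselves have mean 0 and sample standard deviation 1.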

Using Excel to Calculate Probabilities for Standard Scores

Next, I use Excel’s NORM.S.DIST function to calculate the probabilities associated with z-scores. I work with the standard score from our apple example, 0.667.

The NORM.S.DIST(Z, Cumulative) function provides either the cumulative distribution function (TRUE) or the probability density function (FALSE) for the z-score you specify. The probability density is the height value in the z-table earlier in this post, and it corresponds to the y-axis value on a probability distribution plot for the z-score. We’ll use the cumulative function, which calculates the cumulative probability for all z-scores less than the value we specify.

In the function, we need to specify the z-value (0.667) and use the TRUE parameter to obtain the cumulative probability.

I’ll enter the following:

= NORM.S.DIST(0.667,TRUE)

Excel displays 0.747613933, matching the output in the probability distribution plot above.

If you want to find the probability for values greater than the z-score, remember that the values above and below it must sum to 1. Therefore, subtract from 1 to calculate probabilities for larger values:

=1 - NORM.S.DIST(0.667,TRUE)

Excel displays 0.252386067.

Here’s what my spreadsheet looks like.

Excel spreadsheet that calculates z-scores and uses them to find probabilities.

Reader Interactions


July 10, 2024 at 10:50 am

Very well explained.

I had noticed Z-chart (control chart) in Minitab and never really thought about what it was for until I had a client that was making small runs (5 or 6 at a time) on a machine with different parts. They wanted to do SPC. Being flummoxed, I rooted around and found the solution: Z-chart! There are various options for how to tell Minitab to handle the variation, based on your subject matter knowledge. It’s a great tool for seeing the underlying process behavior rather than simply the part variation. I found it so useful, I added it to my LSS training reference.



June 2, 2022 at 12:05 am

Hi Jim. Can we use standardized factor scores (in principal component analysis) for regression analysis? If so, how should we interpret the results of regression analysis as standardized scores have mean = zero and SD = 1? Thanks,

June 2, 2022 at 10:59 pm

Hi, I think you might be mixing several things together.

Yes, you can use standardized scores as independent variables in regression . Click the link to learn more about that, including how to interpret the results. However, that is different than principal components.

But you can also perform regression on principal components in a process known as partial least squares regression. Again, that’s a different procedure than using standardized scores. It’s a useful process when you have highly correlated predictors and/or more predictors than observations.


May 12, 2022 at 4:19 pm

How to get (n−1) / √ n in the statement below? “Caution: Z-scores can be misleading in small datasets because the maximum z-score is limited to (n−1) / √ n.”

May 12, 2022 at 5:11 pm

Hi Vincent,

In the outliers section of this post, click the link to the Five Ways to Detect Outliers post. In it, I provide a reference for that.



Chapter 10: Hypothesis Testing with Z

Setting up the hypotheses

When setting up the hypotheses with z, the parameter is the population mean (in the previous chapter's examples, the null value used was 0). With z, the null hypothesis is often a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the US, as our null value and test for differences against that. For now, we will focus on testing the value of a single mean against what we expect from the population.

Using birthweight as an example, our null hypothesis takes the form: H 0 : μ = 7.47 Notice that we are testing the value for μ, the population parameter, NOT the sample statistic ̅X (or M). We are referring to the data right now in raw form (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population, drawing from our sample observations. For the research question, the sample mean is what we observe, and it serves as the comparison against this set point.

As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.

If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. O ur alternative hypothesis takes the form based on the research question itself. In our example with birthweight, this could be presented as H A : μ > 7.47 or H A : μ < 7.47. 

Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birthweight example, this could be set as H A : μ ≠ 7.47

In working with data for this course we will need to set a critical value of the test statistic for alpha (α) for use of test statistic tables in the back of the book. This is determining the critical rejection region that has a set critical value based on α.

Determining Critical Value from α

We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .


Figure 1. A two-tailed test for a non-directional hypothesis; area C is the critical rejection region.

When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .


Figure 2. A one-tailed test for a directional hypothesis (predicting an increase); area C is the critical rejection region.

Determining Cutoff Scores with Two-Tailed Tests

Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note that we will discuss effect size later, which addresses this limitation of null hypothesis significance testing (NHST).

When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test uses a cutoff that is just as extreme in probability terms, but only one side of the distribution is considered.


Figure 3. Critical value differences in one and two-tail tests. Photo Credit

Let’s review the set critical values for Z.

We discussed z-scores and probability in chapter 8.  If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.

  • A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
  • A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
  • A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
  • A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.
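Rather than memorizing these boundaries, you can recover them from the inverse of the standard normal CDF. A sketch using Python's `statistics.NormalDist` (not part of the chapter; the results match the list above after rounding, with the 1.64 entries more precisely 1.645):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, two_tailed=True):
    """Critical boundary |z| for a given significance level."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

print(round(critical_z(0.05, two_tailed=True), 2))   # 1.96
print(round(critical_z(0.05, two_tailed=False), 3))  # 1.645
print(round(critical_z(0.01, two_tailed=True), 3))   # 2.576
print(round(critical_z(0.01, two_tailed=False), 2))  # 2.33
```

Note how the two-tailed call splits α across both tails before inverting, which is exactly the "divide the significance percentage between the two tails" rule discussed below.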

Review: Critical values, p-values, and significance level

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.


Figure 4: The rejection region for a one-tailed test

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 leaves 0.0505 in the tail and z = 1.65 leaves 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.


Figure 5: Two-tailed rejection region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.

Calculate the test statistic: Z

Now that we understand setting up the hypothesis and determining the outcome, let’s examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to the hypothesis test is the comparison of the population and sample means. To make our calculation and determine where the sample falls in the hypothesized distribution, we calculate Z for the sample data.

Make a decision

To decide whether to reject the null hypothesis, we compare our sample’s Z score to the Z score that marks our critical boundary. If our sample Z score falls inside the rejection region of the comparison distribution (is greater than the z-score critical boundary) we reject the null hypothesis.

The formula for our z-statistic has not changed:

z = (X̄ − μ) / (σ / √n)

To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.

Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.


Figure 6. Relation between α, z obt , and p
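The relation between α, the obtained z, and p can be sketched as a small helper: compute the tail area beyond the obtained statistic and compare it to α. This is an illustrative sketch, not code from the chapter:

```python
from statistics import NormalDist

std_normal = NormalDist()

def one_tailed_decision(z_obt, alpha=0.05):
    """Return (p_value, reject) for an upper one-tailed z-test."""
    p_value = 1 - std_normal.cdf(z_obt)  # area beyond z_obt in the upper tail
    return p_value, p_value < alpha

p, reject = one_tailed_decision(2.5)
print(round(p, 4), reject)  # 0.0062 True
```

With z = 2.5 the tail area (about 0.006) is well inside the 5% rejection region, so the null is rejected, just as drawing z = 2.5 on the figure would show.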

When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Review: Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and though the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.
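The four steps can be strung together in one function. A sketch for a two-tailed one-sample z-test (the function name and the demo numbers are my own, not the chapter's):

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def one_sample_z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test. Returns (z, p, reject)."""
    # Step 2: critical value implied by alpha, split across two tails
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 3: test statistic
    z = (sample_mean - mu) / (sigma / sqrt(n))
    # Step 4: decision via the two-tailed p-value
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p, abs(z) > z_crit

# Made-up numbers purely for illustration:
z, p, reject = one_sample_z_test(sample_mean=80, mu=78, sigma=12, n=36)
print(round(z, 2), round(p, 4), reject)  # 1.0 0.3173 False
```

Step 1, stating the hypotheses in words and symbols, still happens on paper; the code only automates the mechanical steps 2 through 4.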

Example: Movie Popcorn

Let’s see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn’t want bags overfilled or underfilled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.

Our manager is looking for a difference in the mean cups of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00

Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.

Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00. Now let’s do the alternative:

H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense.


Figure 7: Rejection region for z* = ±1.96

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average cups of this employee’s popcorn bags is ̅X = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:

z = (7.75 − 8.00) / (0.50 / √25) = −0.25 / 0.10 = −2.50

So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:


Figure 8: Test statistic location

Looking at Figure 8, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample mean we calculated (the exact location doesn’t matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. Here, we would conclude that, based on 25 bags, the average fill (̅X = 7.75 cups) is statistically significantly different from the intended 8.00 cups, z = -2.50, p < .05. We don’t know the exact p-value, but we do know that because we rejected the null, it must be less than α.

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.

For mean differences like we calculated here, our effect size is Cohen’s d :

d = (X̄ − μ) / σ

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weaknesses of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size.

d Interpretation
0.0 – 0.2 negligible
0.2 – 0.5 small
0.5 – 0.8 medium
0.8 and above large

Table 1. Interpretation of Cohen’s d
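Cohen's d and Table 1's bands can be sketched together in a few lines. Applied to the popcorn data, d = (7.75 − 8.00)/0.50 = −0.50, right at the small/medium boundary (this helper and its threshold handling are my own, following Table 1):

```python
def cohens_d(sample_mean, mu, sigma):
    """Standardized mean difference for a one-sample comparison."""
    return (sample_mean - mu) / sigma

def interpret_d(d):
    """Label |d| using the bands from Table 1."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

d = cohens_d(7.75, 8.00, 0.50)
print(d, interpret_d(d))  # -0.5 medium
```

Note that d uses σ alone, not σ/√n, so unlike z it does not grow with sample size; that is what makes it a measure of practical rather than statistical significance.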

Example: Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than claimed H A : μ > 74


Now that you have everything set up, you spend one week collecting temperature data:

Day        Temp
Monday     77
Tuesday    76
Wednesday  74
Thursday   78
Friday     78

You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):

z = (76.60 − 74.00) / (1.00 / √5) = 2.60 / 0.45 = 5.78

This value falls so far into the tail that it cannot even be plotted on the distribution!


Figure 9: Obtained z-statistic

You compare your obtained z-statistic, z = 5.78, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (𝑋̅ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.78, p < .05.

d = (76.60 − 74.00) / 1 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!
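Redoing the temperature example's arithmetic in code is a useful cross-check. At full precision the statistic is z ≈ 5.81; the chapter's slightly smaller value comes from rounding the standard error to 0.45 (a sketch, not the textbook's code):

```python
from math import sqrt
from statistics import mean

temps = [77, 76, 74, 78, 78]  # Monday through Friday
mu, sigma = 74.0, 1.0

x_bar = mean(temps)
z = (x_bar - mu) / (sigma / sqrt(len(temps)))
d = (x_bar - mu) / sigma  # Cohen's d

print(x_bar)        # 76.6
print(round(z, 2))  # 5.81 at full precision
print(round(d, 2))  # 2.6
```

Either way, z is far past the 1.645 boundary and d is far past the 0.8 "large" threshold, so the conclusion is unchanged.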

Example: Different Significance Level

First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value: H 0 : The average score does not differ from the population H 0 : μ = 60

We will assume a two-tailed test: H A : The average score does differ H A : μ ≠ 60

We have seen the critical values for z-tests at the α = 0.05 level of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting off 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.

We can now calculate our test statistic. The average of 10 scores is M = 60.40 with a µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:

z = (60.40 − 60.00) / (10 / √10) = 0.40 / 3.16 = 0.13

Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like: Based on 10 scores, the sample mean (M = 60.40) is not statistically significantly different from the population mean, z = 0.13, p > 0.01.

Notice two things about the end of the conclusion. First, we wrote that p is greater than instead of p is less than, as we did in the previous two examples. This is because we failed to reject the null hypothesis. We don’t know exactly what the p-value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians suggest always calculating an effect size, even for nonsignificant results, to help judge the possibility of a Type II error. Although the result was not significant here, d = (60.4 − 60)/10 = 0.04, which suggests a negligible effect (and not a likely Type II error).

Review Considerations in Hypothesis Testing

Errors in hypothesis testing

Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.

Statistical Power

The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.

Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
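The interplay of effect size, sample size, and power can be sketched for an upper one-tailed one-sample z-test. The formula below is standard, though the chapter does not derive it: under the alternative, the z-statistic is centered at d·√n instead of 0, so power is the chance it still clears the critical boundary:

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def z_test_power(d, n, alpha=0.05):
    """Power of an upper one-tailed one-sample z-test for effect size d."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centered at d * sqrt(n).
    return 1 - std_normal.cdf(z_crit - d * sqrt(n))

# A medium effect (d = 0.5) with n = 25 lands almost exactly on the
# conventional 0.80 adequacy guideline; doubling n raises power further.
print(round(z_test_power(0.5, 25), 2))  # 0.80
print(round(z_test_power(0.5, 50), 2))  # 0.97
```

This makes the two levers in the paragraph above concrete: power rises with both d (relationship strength) and n (sample size).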

Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.

It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.

Learning Objectives

Having read the chapter, students should be able to:

  • Conduct a hypothesis test using a z-score statistic, locate the critical region, and make a statistical decision.
  • Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.

Exercises – Ch. 10

  • List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
  • For each of the following obtained z-statistics, decide whether you would reject or fail to reject the null hypothesis: (a) z = 1.99, two-tailed test at α = 0.05; (b) z = 1.99, two-tailed test at α = 0.01; (c) z = 1.99, one-tailed test at α = 0.05
  • You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
  • A study examines self-esteem and depression in teenagers.  A sample of 25 teens with a low self-esteem are given the Beck Depression Inventory.  The average score for the group is 20.9.  For the general population, the average score is 18.3 with σ = 12.  Use a two-tail test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
  • You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.

Answers to Odd- Numbered Exercises – Ch. 10

1. List hypotheses. Determine critical region. Calculate z.  Compare z to critical region. Draw Conclusion.  We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important

5. Step 1: H 0 : μ = 42 “My average tips do not differ from other servers”, H A : μ ≠ 42 “My average tips do differ from others”

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Z-test: Formula, Types, Examples

Z-test is especially useful when you have a large sample size and know the population’s standard deviation. Different tests are used in statistics to compare distinct samples or groups and make conclusions about populations. These tests, also referred to as statistical tests, concentrate on examining the probability or possibility of acquiring the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis.

Table of Contents

  • What is Z-test?
  • Z-test Formula
  • When to Use Z-test
  • Hypothesis Testing
  • Steps to Perform Z-test
  • Types of Z-test
  • Practice Problems

Z-test

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

The Z-test can also be defined as a statistical method used to determine whether the distribution of a test statistic can be approximated by the normal distribution. It is also the method used to determine whether two sample means differ significantly when their variances are known and the sample sizes are large (≥ 30).

The Z-test compares the difference between the sample mean and the population mean, taking into account the standard deviation of the sampling distribution. The resulting Z-score represents the number of standard deviations by which the sample mean deviates from the population mean. This Z-score is also known as the Z-statistic and can be formulated as:

Z = (x̄ − μ) / σ

  • x̄: mean of the sample.
  • μ: mean of the population.
  • σ: standard deviation of the population.

The Z-test assumes that the test statistic (Z-score) follows a standard normal distribution.

For example, suppose the average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then the Z-score for Delhi will be:

Z = (x̄ − μ) / σ = (300 − 200) / 5 = 20

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).
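
This arithmetic can be checked with a short Python sketch (values taken directly from the example above, in thousands):

```python
# Z-score for the Delhi example: how many population standard deviations
# the Delhi mean lies from the India-wide mean (values in thousands).
sample_mean = 300      # average family annual income in Delhi
population_mean = 200  # average family annual income in India
population_std = 5     # population standard deviation

z_score = (sample_mean - population_mean) / population_std
print(z_score)  # 20.0
```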

When to Use Z-test

  • The sample size should be greater than 30; otherwise, we should use the t-test.
  • Samples should be drawn at random from the population.
  • The standard deviation of the population should be known.
  • Samples drawn from the population should be independent of each other.
  • The data should be normally distributed; however, for a large sample size, the sampling distribution of the mean is approximately normal because of the central limit theorem.

Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

  • Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H 0 .
  • Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value different from the claimed value. It is denoted by H A .
  • Level of significance: The probability of rejecting the null hypothesis when it is in fact true. Since 100% accuracy is not possible in most experiments, we select a level of significance. It is denoted by alpha (α).
Steps to Perform Z-test

  • First, identify the null and alternate hypotheses.
  • Determine the level of significance (α).
  • Compute the test statistic Z = (x̄ − μ) / (σ/√n), where n is the sample size.
  • Find the critical value of z from the z-table for the chosen α.
  • Compare the test statistic with the critical value and decide whether to reject or fail to reject the null hypothesis.
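
As a minimal sketch of the critical-value step, the inverse CDF of the standard normal gives the cutoff for a chosen α. This uses the standard library's statistics.NormalDist; scipy.stats.norm.ppf, used in the code examples later in this article, returns the same values.

```python
# Finding critical z-values for a given alpha.
# NormalDist().inv_cdf is the inverse CDF of the standard normal.
from statistics import NormalDist

alpha = 0.05
z_crit_one_tailed = NormalDist().inv_cdf(1 - alpha)      # right-tailed test
z_crit_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed test
print(round(z_crit_one_tailed, 3))  # 1.645
print(round(z_crit_two_tailed, 3))  # 1.96
```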

Left-tailed Test

In this test, the region of rejection is located at the extreme left of the distribution. Here the null hypothesis is that the population mean is greater than or equal to the claimed value, and the alternative is that it is less.


Right-tailed Test

In this test, the region of rejection is located at the extreme right of the distribution. Here the null hypothesis is that the population mean is less than or equal to the claimed value, and the alternative is that it is greater.


One-Tailed Test

A school principal claims that students of the school are more intelligent than average. The IQ scores of 50 students are calculated, and the average turns out to be 110. The population mean IQ is 100 and the standard deviation is 15. State whether the principal's claim is right or not at a 5% significance level.

  • First, we define the null hypothesis and the alternate hypothesis. Our null hypothesis is H 0 : μ = 100, and our alternate hypothesis is H A : μ > 100.
  • State the level of significance. Here, the level of significance is given in the question (α = 0.05); if not given, we generally take α = 0.05.
  • Now, we compute the Z-score with x̄ = 110, μ = 100, σ = 15, n = 50: Z = (x̄ − μ) / (σ/√n) = (110 − 100) / (15/√50) = 10 / 2.12 = 4.71
  • Now, we look it up in the z-table. For α = 0.05, the critical z-score for a right-tailed test is 1.645.
  • Here 4.71 > 1.645, so we reject the null hypothesis.
  • If the z-test statistic were less than the critical z-score, we would not reject the null hypothesis.

Code Implementations of One-Tailed Z-Test

# Import the necessary libraries
import numpy as np
import scipy.stats as stats

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

# Compute the z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)

# Approach 1: Using Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)

# Hypothesis
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

# Approach 2: Using P-value
# P-value: probability of observing a z-score at least this extreme under H0
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)

# Hypothesis
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Z-Score : 4.714045207910317
Critical Z-Score : 1.6448536269514722
Reject Null Hypothesis
p-value : 1.2142337364462463e-06
Reject Null Hypothesis

Two-tailed test

In this test, our region of rejection is located to both extremes of the distribution. Here our null hypothesis is that the claimed value is equal to the mean population value.


Two-sample z-test

In this test, we are given two normally distributed and independent populations, and samples are drawn at random from both. Here, μ₁ and μ₂ are the population means, and X̄₁ and X̄₂ are the observed sample means. Our null hypothesis is:

H₀: μ₁ − μ₂ = 0

and alternative hypothesis

H₁: μ₁ − μ₂ ≠ 0

and the formula for calculating the z-test score:

Z = ((X̄₁ − X̄₂) − (μ₁ − μ₂)) / √(σ₁²/n₁ + σ₂²/n₂)

where σ₁ and σ₂ are the population standard deviations, and n₁ and n₂ are the sample sizes of the populations corresponding to μ₁ and μ₂.

There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, while Group B has studied online classes. After the examination, the score of each student comes. Now we want to determine whether the online or offline classes are better.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10
Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Step 1: Null & Alternate Hypothesis

  • Null Hypothesis: There is no significant difference in the mean scores between the online and offline classes (μ₁ − μ₂ = 0).
  • Alternate Hypothesis: There is a significant difference in the mean scores between the online and offline classes (μ₁ − μ₂ ≠ 0).

Step 2: Significance Level

  • Significance level: 5% (α = 0.05)

Step 3: Z-Score

Z = ((x̄₁ − x̄₂) − (μ₁ − μ₂)) / √(σ₁²/n₁ + σ₂²/n₂)
  = ((75 − 80) − 0) / √(10²/50 + 12²/60)
  = −5 / √(2 + 2.4)
  = −5 / 2.0976
  = −2.384

Step 4: Look up the critical Z-score in the Z-table for α/2 = 0.025

  •  Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-score value

  • |Z-score| = 2.384 > critical Z-score = 1.96
  • Reject the null hypothesis. There is a significant difference between the online and offline classes.

Code Implementation of Two-sample Z-test

import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0

# Set the significance level
alpha = 0.05

# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2))
print('Z-Score:', np.abs(z_score))

# Approach 1: Using Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha / 2)
print('Critical Z-Score:', z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant difference between the online and offline classes.")

# Approach 2: Using P-value (two-tailed)
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant difference between the online and offline classes.")

Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis. There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis. There is a significant difference between the online and offline classes.

Solved Examples

Example 1: One-sample Z-test

Problem: A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?

Solution:
Step 1: State the hypotheses
H₀: μ = 12 (null hypothesis)
H₁: μ ≠ 12 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄ – μ) / (σ / √n) = (11.8 – 12) / (0.5 / √100) = -0.2 / 0.05 = -4
Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96
Step 4: Compare Z-score with critical value
|-4| > 1.96, so we reject the null hypothesis.
Conclusion: There is sufficient evidence to refute the company’s claim about battery life.
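
The same calculation can be sketched in Python (numbers taken from the problem statement):

```python
import math

# One-sample Z-test for the battery-life example.
sample_mean, population_mean = 11.8, 12
population_std, n = 0.5, 100

z = (sample_mean - population_mean) / (population_std / math.sqrt(n))
print(round(z, 6))  # -4.0

# |z| = 4 exceeds the two-tailed critical value 1.96, so H0 is rejected.
reject_h0 = abs(z) > 1.96
print(reject_h0)  # True
```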

Example 2: Two-sample Z-test

Problem: A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?

Step 1: State the hypotheses
H₀: μ₁ – μ₂ = 0 (null hypothesis)
H₁: μ₁ – μ₂ ≠ 0 (alternative hypothesis)
Step 2: Calculate the Z-score
Z = (x̄₁ – x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂)) = (15 – 13) / √((3²/50) + (4²/60)) = 2 / √(0.18 + 0.2667) = 2 / 0.6683 = 2.99
Step 3: Find the critical value (two-tailed test at 1% significance)
Z₀.₀₀₅ = ±2.576
Step 4: Compare Z-score with critical value
2.99 > 2.576, so we reject the null hypothesis.
Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.

Problem 3: A polling company claims that 60% of voters support a new policy. In a sample of 1000 voters, 570 support the policy. At a 5% significance level, is there evidence to support the company’s claim?

Step 1: State the hypotheses
H₀: p = 0.60 (null hypothesis)
H₁: p ≠ 0.60 (alternative hypothesis)
Step 2: Calculate the Z-score
p̂ = 570/1000 = 0.57 (sample proportion)
Z = (p̂ – p) / √(p(1-p)/n) = (0.57 – 0.60) / √(0.60(1-0.60)/1000) = -0.03 / 0.0155 = -1.94
Step 3: Find the critical value (two-tailed test at 5% significance)
Z₀.₀₂₅ = ±1.96
Step 4: Compare Z-score with critical value
|-1.94| < 1.96, so we fail to reject the null hypothesis.
Conclusion: There is not enough evidence to refute the polling company’s claim at the 5% significance level.
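
A quick sketch of Problem 3 in Python, using the one-proportion Z formula above:

```python
import math

# One-sample Z-test for a proportion (Problem 3).
p0, n, successes = 0.60, 1000, 570
p_hat = successes / n  # 0.57

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 2))  # -1.94

# |z| < 1.96, so we fail to reject H0 at the 5% level.
print(abs(z) > 1.96)  # False
```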

Problem 4: A manufacturer claims that their light bulbs last an average of 1000 hours. A sample of 100 bulbs has a mean life of 985 hours. The population standard deviation is known to be 50 hours. At a 5% significance level, is there evidence to reject the manufacturer’s claim?

Solution:
H₀: μ = 1000
H₁: μ ≠ 1000
Z = (x̄ – μ) / (σ / √n) = (985 – 1000) / (50 / √100) = -15 / 5 = -3
Critical value (α = 0.05, two-tailed): ±1.96
|-3| > 1.96, so reject H₀.
Conclusion: There is sufficient evidence to reject the manufacturer’s claim at the 5% significance level.

Example 5: Two factories produce semiconductors. Factory A’s chips have a mean resistance of 100 ohms with a standard deviation of 5 ohms. Factory B’s chips have a mean resistance of 98 ohms with a standard deviation of 4 ohms. Samples of 50 chips from each factory are tested. At a 1% significance level, is there a difference in mean resistance between the two factories?

H₀: μA – μB = 0
H₁: μA – μB ≠ 0
Z = (x̄A – x̄B) / √((σA²/nA) + (σB²/nB)) = (100 – 98) / √((5²/50) + (4²/50)) = 2 / √(0.5 + 0.32) = 2 / 0.906 = 2.21
Critical value (α = 0.01, two-tailed): ±2.576
|2.21| < 2.576, so fail to reject H₀.
Conclusion: There is not enough evidence to conclude a difference in mean resistance at the 1% significance level.

Problem 6: A political analyst claims that 40% of voters in a certain district support a new tax policy. In a random sample of 500 voters, 220 support the policy. At a 5% significance level, is there evidence to reject the analyst’s claim?

H₀: p = 0.40
H₁: p ≠ 0.40
p̂ = 220/500 = 0.44
Z = (p̂ – p) / √(p(1-p)/n) = (0.44 – 0.40) / √(0.40(1-0.40)/500) = 0.04 / 0.0219 = 1.83
Critical value (α = 0.05, two-tailed): ±1.96
|1.83| < 1.96, so fail to reject H₀.
Conclusion: There is not enough evidence to reject the analyst’s claim at the 5% significance level.

Problem 7: Two advertising methods are compared. Method A results in 150 sales out of 1000 contacts. Method B results in 180 sales out of 1200 contacts. At a 5% significance level, is there a difference in the effectiveness of the two methods?

H₀: pA – pB = 0
H₁: pA – pB ≠ 0
p̂A = 150/1000 = 0.15
p̂B = 180/1200 = 0.15
p̂ = (150 + 180) / (1000 + 1200) = 0.15 (pooled proportion)
Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB)) = (0.15 – 0.15) / 0.0153 = 0
Critical value (α = 0.05, two-tailed): ±1.96
|0| < 1.96, so fail to reject H₀.
Conclusion: There is no significant difference in the effectiveness of the two advertising methods at the 5% significance level.
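
The pooled two-proportion calculation in Problem 7 can be sketched as:

```python
import math

# Two-sample Z-test for proportions with a pooled estimate (Problem 7).
x_a, n_a = 150, 1000   # Method A: sales, contacts
x_b, n_b = 180, 1200   # Method B: sales, contacts

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)  # pooled proportion under H0

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
print(z)  # 0.0
```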

Problem 8: A new treatment for a disease is tested in two cities. In City A, 120 out of 400 patients recover. In City B, 140 out of 500 patients recover. At a 5% significance level, is there a difference in the recovery rates between the two cities?

H₀: pA – pB = 0
H₁: pA – pB ≠ 0
p̂A = 120/400 = 0.30
p̂B = 140/500 = 0.28
p̂ = (120 + 140) / (400 + 500) = 0.2889 (pooled proportion)
Z = (p̂A – p̂B) / √(p̂(1-p̂)(1/nA + 1/nB)) = (0.30 – 0.28) / 0.0304 = 0.66
Critical value (α = 0.05, two-tailed): ±1.96
|0.66| < 1.96, so fail to reject H₀.
Conclusion: There is not enough evidence to conclude a difference in recovery rates between the two cities at the 5% significance level.

Problem 10: A company claims that their product weighs 500 grams on average. A sample of 64 products has a mean weight of 498 grams. The population standard deviation is known to be 8 grams. At a 1% significance level, is there evidence to reject the company’s claim?

H₀: μ = 500
H₁: μ ≠ 500
Z = (x̄ – μ) / (σ / √n) = (498 – 500) / (8 / √64) = -2 / 1 = -2
Critical value (α = 0.01, two-tailed): ±2.576
|-2| < 2.576, so fail to reject H₀.
Conclusion: There is not enough evidence to reject the company’s claim at the 1% significance level.

Practice Problems

1). A cereal company claims that their boxes contain an average of 350 grams of cereal. A consumer group tests 100 boxes and finds a mean weight of 345 grams with a known population standard deviation of 15 grams. At a 5% significance level, is there evidence to refute the company’s claim?

2). A study compares the effect of two different diets on cholesterol levels. Diet A is tested on 50 people, resulting in a mean reduction of 25 mg/dL with a standard deviation of 8 mg/dL. Diet B is tested on 60 people, resulting in a mean reduction of 22 mg/dL with a standard deviation of 7 mg/dL. At a 1% significance level, is there a significant difference between the two diets?

3). A politician claims that 60% of voters in her district support her re-election. In a random sample of 1000 voters, 570 support her. At a 5% significance level, is there evidence to reject the politician’s claim?

4). Two different teaching methods are compared. Method A results in 80 students passing out of 120 students. Method B results in 90 students passing out of 150 students. At a 5% significance level, is there a difference in the effectiveness of the two methods?

5). A company claims that their new energy-saving light bulbs last an average of 10,000 hours. A sample of 64 bulbs has a mean life of 9,800 hours. The population standard deviation is known to be 500 hours. At a 1% significance level, is there evidence to reject the company’s claim?

6). The mean salary of employees in a large corporation is said to be $75,000 per year. A union representative suspects this is too high and surveys 100 randomly selected employees, finding a mean salary of $72,500. The population standard deviation is known to be $8,000. At a 5% significance level, is there evidence to support the union representative’s suspicion?

7). Two factories produce computer chips. Factory A’s chips have a mean processing speed of 3.2 GHz with a standard deviation of 0.2 GHz. Factory B’s chips have a mean processing speed of 3.3 GHz with a standard deviation of 0.25 GHz. Samples of 100 chips from each factory are tested. At a 5% significance level, is there a difference in mean processing speed between the two factories?

8). A new vaccine is claimed to be 90% effective. In a clinical trial with 500 participants, 440 develop immunity. At a 1% significance level, is there evidence to reject the claim about the vaccine’s effectiveness?

9). Two different advertising campaigns are tested. Campaign A results in 250 sales out of 2000 views. Campaign B results in 300 sales out of 2500 views. At a 5% significance level, is there a difference in the effectiveness of the two campaigns?

10). A quality control manager claims that the defect rate in a production line is 5%. In a sample of 1000 items, 65 are found to be defective. At a 5% significance level, is there evidence to suggest that the actual defect rate is different from the claimed 5%?

Type 1 error and Type II error

  • Type I error: A Type I error occurs when we reject the null hypothesis even though it is true. This error is denoted by alpha (α).
  • Type II error: A Type II error occurs when we fail to reject the null hypothesis even though it is false. This error is denoted by beta (β).
                                  Null Hypothesis is TRUE          Null Hypothesis is FALSE
Reject Null Hypothesis            Type I Error (False Positive)    Correct decision
Fail to Reject Null Hypothesis    Correct decision                 Type II Error (False Negative)
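
The meaning of α as the Type I error rate can be illustrated with a small simulation: when H0 is actually true, a two-tailed Z-test at α = 0.05 should reject in roughly 5% of repeated experiments. This is a sketch using only the standard library; the seed and trial count are arbitrary choices.

```python
import random
import statistics

# Simulate many experiments in which H0 is TRUE (data really comes from
# N(mu, sigma)) and count how often a two-tailed z-test falsely rejects.
random.seed(0)
mu, sigma, n, trials = 100, 15, 50, 10000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu) / (sigma / n ** 0.5)
    if abs(z) > 1.96:  # two-tailed rejection region at alpha = 0.05
        rejections += 1

type1_rate = rejections / trials
print(round(type1_rate, 3))  # close to 0.05
```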

Z-tests are statistical tools used to determine whether there is a significant difference between a sample statistic and a population parameter, or between two population parameters. They are applicable when dealing with large sample sizes (typically n > 30) and known population standard deviations. Z-tests can be used for analyzing means or proportions in both one-sample and two-sample scenarios. The process involves stating hypotheses, calculating a Z-score, comparing it to a critical value based on the chosen significance level (often 5% or 1%), and then making a decision to reject or fail to reject the null hypothesis.

What is the main limitation of the z-test?

The limitation of Z-Tests is that we don’t usually know the population standard deviation. What we do is: When we don’t know the population’s variability, we assume that the sample’s variability is a good basis for estimating the population’s variability.

What is the minimum sample for z-test?

A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.

What is the application of z-test?

The z-test is used to determine whether there is a significant difference between the means of two independent samples. It can also be used to compare a population proportion to an assumed proportion, or to determine the difference between the proportions of two samples.

What is the theory of the z-test?

The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.


Z-tests for Hypothesis testing: Formula & Examples

Z-tests are statistical hypothesis testing techniques used to determine whether a null hypothesis comparing sample means or proportions with those of a population can be rejected at a given significance level, based on the z-statistic or z-score. As a data scientist, you must get a good understanding of z-tests and their applications for testing the hypotheses behind your statistical models. In this blog post, we will give an overview of the different types of z-tests and related concepts with the help of examples. You may also want to check my post on hypothesis testing titled – Hypothesis testing explained with examples

Table of Contents

What are Z-tests & Z-statistics?

Z-tests can be defined as statistical hypothesis testing techniques used to evaluate claims made about population parameters such as the mean and proportion. A Z-test uses sample data to test a hypothesis about a population parameter. There are different types of Z-tests used to estimate a population mean or proportion, or to perform hypothesis testing related to the means or proportions of one or two samples.

Different types of Z-tests 

The following types of Z-tests are used to perform different kinds of hypothesis testing.

Different types of Z-test - One sample and two samples

  • One-sample Z-test for means
  • Two-sample Z-test for means
  • One-sample Z-test for proportions
  • Two-sample Z-test for proportions

Four quantities are involved in a Z-test for the different hypothesis-testing scenarios. They are as follows:

  • The sample, which is assumed to be (approximately) normally distributed;
  • The test statistic (Z), which is calculated from the sample data;
  • The type of Z-test chosen for the hypothesis being tested;
  • A significance level, or “alpha”, usually set at 0.05 but which can also take values such as 0.01 or 0.1.

When to use Z-test – Explained with examples

The following are different scenarios when Z-test can be used:

  • Compare a sample or a single group with the population with respect to the parameter, mean. This is called a one-sample Z-test for means. For example, whether the students of a particular school score marks in Mathematics that are statistically significantly different from those of other schools. This can also be thought of as a hypothesis test to check whether the sample belongs to the population or not.
  • Compare two groups with respect to the population parameter, mean. This is called a two-sample Z-test for means. For example, you want to compare class X students from different schools and determine if students of one school are better than others based on their Mathematics scores.
  • Compare the hypothesized proportion of a population to a theoretical proportion. For example, whether the unemployment rate of a given state is different from the well-established rate for the country.
  • Compare the proportion of one population with the proportion of another. For example, whether the efficacy rates of vaccination in two different populations are statistically significantly different.

Z-test Interview Questions 

Here is a list of a few interview questions you may expect in your data scientist interviews:

  • What is Z-test?
  • What is Z-statistics or Z-score?
  • When to use Z-test vs other tests such as T-test or Chi-square test?
  • What is Z-distribution?
  • What is the difference between Z-distribution and T-distribution?
  • What is sampling distribution?
  • What are different types of Z-tests?
  • Explain different types of Z-tests with the help of real-world examples?
  • What’s the difference between a two-sample Z-test for means and a two-sample Z-test for proportions? Explain with one example each.
  • As data scientists, give some scenarios when you would like to use Z-test when building machine learning models?

Ajitesh Kumar

Learn Math and Stats with Dr. G


Two-Tailed z-test Hypothesis Test By Hand

Running a Two-Tailed z-test Hypothesis Test by Hand


Suppose it is up to you to determine if a certain state (Michigan) receives a significantly different amount of public school funding (per student) than the USA average. You know that the USA mean public school yearly funding is $6800 per student per year, with a standard deviation of $400.

Next, suppose you collect a sample (n = 100) from Michigan and determine that the sample mean for Michigan (per student per year) is $6873.

Use the z-test and the correct Ho and Ha to run a hypothesis test to determine if Michigan receives a significantly different amount of funding for public school education (per student per year).

NOTE: This entire example works the same way if you have a dataset. Using the dataset, you would need to first calculate the sample mean. To run a z-test, it is generally expected that you have a larger sample size (30 or more) and that you have information about the population mean and standard deviation. If you do not have this information, it is sometimes best to use the t-test.

Step 1: Set up your hypothesis

Hypothesis: The mean per student per year funding in Michigan is significantly different than the average per student per year funding over the entire USA.

Step 2: Create Ho and Ha

NOTE: There are many ways to write out Ho.

Ho: mean per student per year funding for Michigan = mean per student per year funding for the USA

This can also be written as the following. Ho: Michigan mean – Population mean = 0

Ha: mean per student per year funding for Michigan ≠ mean per student per year funding for the USA

NOTICE1: The Ha in this example is TWO-TAILED because we are interested in seeing if Michigan is significantly different than the population mean. In a two-tailed test, the Ha contains a NOT EQUAL and the test will see if there is a significant difference (greater or smaller).

NOTICE2: The Ho is the null hypothesis and so always contains the equal sign as it is the case for which there is no significant difference between the two groups.

Step 3: Calculate the z-test statistic

Now, calculate the test statistic. In this example, we are using the z-test and are doing this by hand. However, there are many applications that run such tests. This Site has several examples under the Stats Apps link.

z = (sample mean – population mean) / [population standard deviation/sqrt(n)]

z = (6873 – 6800) / [400/sqrt(100)]

z = 73 / [400/10]

z = 73 / [40]

So, the z-test result, also called the test statistic, is 1.825.
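
The arithmetic above can be double-checked with a few lines of Python (values taken directly from this example):

```python
from math import sqrt

# Michigan public-school funding example (values from the text above)
sample_mean = 6873   # Michigan sample mean, $ per student per year
pop_mean = 6800      # USA population mean
pop_sd = 400         # population standard deviation
n = 100              # sample size

# z = (sample mean - population mean) / [population SD / sqrt(n)]
standard_error = pop_sd / sqrt(n)             # 400 / 10 = 40
z = (sample_mean - pop_mean) / standard_error
print(round(z, 3))  # 1.825
```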

Step 4: Using the z-table, determine the rejection regions for your z-test. To do this, you must first select an alpha value. The alpha value is the percentage chance that you will reject the null (choose to go with your Ha research hypothesis as your conclusion) when in fact Ho is really true (and your research Ha should not be selected). This is also called a Type I error (choosing Ha when Ho is actually correct).

The smaller the alpha, the smaller the percentage of error, BUT also the smaller the rejection regions and the more difficult it is to reject Ho.

Most research uses alpha at .05, which creates only a 5% chance of Type I error. However, in cases such as medical research, the alpha is set much smaller.

In our case, we will use alpha = .05

This is a TWO-TAILED test; therefore, the rejection regions are bounded by +1.96 and –1.96.

NOTE: From the z-table, the critical values for a two-tailed z-test at alpha = .05 are +/- 1.96

Step 5: Create a conclusion

Our z-test result is 1.825

Because 1.825 < 1.96, it is NOT inside the rejection region.

Recall that the rejection regions for a two-tailed test with alpha set to .05 are any value above 1.96 OR any value below –1.96. Because 1.825 is not above 1.96 or below –1.96, it is NOT in the rejection region.

Therefore, this result is NOT significant. We CANNOT reject Ho. We CANNOT conclude that there is a significant difference between the funding for Michigan and the average funding for the USA.
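
Steps 4 and 5 can also be reproduced programmatically. The sketch below uses Python's standard-library NormalDist (not part of the original example) to find both the critical value and the two-tailed p-value:

```python
from statistics import NormalDist

z = 1.825           # test statistic from Step 3
alpha = 0.05
snd = NormalDist()  # standard normal distribution: mean 0, SD 1

# Two-tailed critical value: the z cutting off alpha/2 in each tail
z_crit = snd.inv_cdf(1 - alpha / 2)      # approximately 1.96

# Two-tailed p-value: probability of a result at least this extreme
p_value = 2 * (1 - snd.cdf(abs(z)))      # approximately 0.068

if abs(z) > z_crit:
    print("Reject Ho: significant difference")
else:
    print("Fail to reject Ho: no significant difference")
```

Both routes agree: the p-value (about 0.068) exceeds alpha = .05, just as 1.825 falls short of the 1.96 cutoff.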

http://www.ascd.org/publications/educational-leadership/may02/vol59/num08/Unequal-School-Funding-in-the-United-States.aspx

Z-Score: Definition, Formula, Calculation & Interpretation

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

A z-score is a statistical measure that describes the position of a raw score in terms of its distance from the mean, measured in standard deviation units. A positive z-score indicates that the value lies above the mean, while a negative z-score indicates that the value lies below the mean.

It is also known as a standard score because it allows scores on different variables to be compared by standardizing the distribution. A standard normal distribution (SND) is a normally shaped distribution with a mean of 0 and a standard deviation (SD) of 1 (see Fig. 1).

Figure 1: The standard normal distribution (Gaussian bell curve).

Why Are Z-Scores Important?

It is useful to standardize the values (raw scores) of a  normal distribution  by converting them into z-scores because:
  • Probability estimation : Z-scores can be used to estimate the probability of a particular data point occurring within a normal distribution. By converting z-scores to percentiles or using a standard normal distribution table, you can determine the likelihood of a value being above or below a certain threshold.
  • Hypothesis testing : Z-scores are used in hypothesis testing to determine the significance of results. By comparing the z-score of a sample statistic to critical values, you can decide whether to reject or fail to reject a null hypothesis.
  • Comparing datasets : Z-scores allow you to compare data points from different datasets by standardizing the values. This is useful when the datasets have different scales or units.
  • Identifying outliers : Z-scores help identify outliers, which are data points significantly different from the rest of the dataset. Typically, data points with z-scores greater than 3 or less than -3 are considered potential outliers and may warrant further investigation.

How To Calculate

The formula for calculating a z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation.

As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation.

Z score formula

When the population mean and the population standard deviation are unknown, the standard score may be calculated using the sample mean (x̄) and sample standard deviation (s) as estimates of the population values.

To calculate a z-score, follow these steps:

  • Identify the individual score ( x ) you want to convert to a z-score.
  • Determine the mean ( μ or mu ) of the dataset. The mean is the average of all the scores.
  • Calculate the standard deviation ( σ or sigma ) of the dataset. The standard deviation measures how spread out the scores are from the mean.
  • Subtract the mean ( μ ) from the individual score ( x ). This will give you the difference between the score and the mean.
  • Divide the difference you calculated in step 4 by the standard deviation ( σ ). The result is the z-score.
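
The five steps above collapse into a one-line function; a minimal Python sketch:

```python
def z_score(x, mu, sigma):
    """How many standard deviations the raw score x lies from the mean mu."""
    return (x - mu) / sigma

# Example: a raw score of 130 on a test with mean 100 and SD 15
print(z_score(130, 100, 15))  # 2.0
```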

Interpretation

The value of the z-score tells you how many standard deviations you are away from the mean. A larger absolute value indicates a greater distance from the mean.
  • Positive z-score : If a z-score is positive, it indicates that the data point is above the mean. For example, a z-score of 1.5 means the data point is 1.5 standard deviations above the mean.
  • Negative z-score : If a z-score is negative, it indicates that the data point is below the mean. For example, a z-score of -2 means the data point is 2 standard deviations below the mean.
  • Zero z-score : A z-score of zero indicates that the data point is equal to the mean.

Another way to interpret z-scores is by creating a standard normal distribution, also known as the z-score distribution, or probability distribution (see Fig. 3).

Probability Estimation

When working with z-scores, the data is assumed to follow a standard normal distribution with a mean of 0 and a standard deviation of 1. This allows for the use of standard normal distribution tables or calculators to determine probabilities.

The z-score tells us how many standard deviations a data point is from the mean. Once we know the z-score, we can estimate the probability of a data point falling within a specific range or being above or below a certain value.

In a standard normal distribution, there’s a handy rule called the empirical rule, or the 68-95-99.7 rule. This rule states that:

  • Approximately 68% of the data falls within one standard deviation of the mean (z-scores between -1 and 1).
  • Around 95% of the data falls within two standard deviations of the mean (z-scores between -2 and 2).
  • Nearly 99.7% of the data falls within three standard deviations of the mean (z-scores between -3 and 3).

Figure 3 shows the proportion of a standard normal distribution in percentages. As you can see, there’s a 95% probability of randomly selecting a score between -1.96 and +1.96 standard deviations from the mean.

Proportion of a Standard Normal Distribution (SND) in %

Using the standard normal distribution, researchers can calculate the probability of randomly obtaining a score from the sample. For example, there’s a 68% chance of randomly selecting a score between -1 and +1 standard deviations from the mean.
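
These percentages are easy to verify numerically; a sketch using Python's standard library (not part of the original article):

```python
from statistics import NormalDist

snd = NormalDist()  # standard normal: mean 0, SD 1

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations
for k in (1, 2, 3):
    prob = snd.cdf(k) - snd.cdf(-k)
    print(f"within {k} SD: {prob:.1%}")
# within 1 SD: 68.3%; within 2 SD: 95.4%; within 3 SD: 99.7%

# The exact z cutting off the middle 95%, as quoted above
print(round(snd.inv_cdf(0.975), 2))  # 1.96
```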

Hypothesis Testing

Using a z-score table lets you quickly determine the probability associated with a specific value in a dataset, helping you make decisions and draw conclusions based on your data.

  • If you have a one-tailed test, you will look for the area to the left (for a left-tailed test) or right (for a right-tailed test) of your z-score.
  • If you have a two-tailed test, you will look for the area in both tails combined.

The significance level (α) is the probability threshold for rejecting the null hypothesis. Common significance levels are 0.01, 0.05, and 0.10. The critical values are the z-scores that correspond to the chosen significance level. These values can be found using a standard normal distribution table or calculator.

A Z-score table shows the percentage of values (usually a decimal figure) to the left of a given Z-score on a standard normal distribution.

Z table

1. Identify the parts of the z-score :

  • Split the z-score into its first two digits (ones and tenths) and its second decimal digit (hundredths)
  • For example, if your z-score is 1.24, the row part is 1.2 and the column part is 0.04

2. Find the corresponding probability in the z-score table :

  • Z-score tables are usually organized with the z-score to one decimal place (e.g., 1.2) in the leftmost column and the second decimal digit (e.g., 0.04) across the top row
  • Locate the row that matches the first part of your z-score
  • Move across that row until you reach the column that matches the second decimal digit
  • The value at the intersection of the row and column is the probability (area under the curve to the left of your z-score)

3. Interpret the probability :

  • For a left-tailed test, the probability you found in the table is your p-value
  • For a right-tailed test, subtract the probability you found from 1 to get your p-value
  • For a two-tailed test, if your z-score is positive, subtract the probability you found from 1 and then double the result to get your p-value; if your z-score is negative, simply double the probability you found
  • Compare the probability to your chosen alpha level (0.05 or 0.01). If the probability is less than the alpha level, the result is considered statistically significant

In statistical analysis, if there is less than a 5% chance of randomly selecting a particular raw score, it is considered a statistically significant result. This means the result is unlikely to have occurred by chance alone and is more likely to be a real effect or difference.
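
The table-lookup rules above can be expressed as a small Python helper (a standard-library sketch; the function name is my own):

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value from a z statistic; tail is 'left', 'right', or 'two'."""
    left_area = NormalDist().cdf(z)   # the z-table entry: area LEFT of z
    if tail == "left":
        return left_area
    if tail == "right":
        return 1 - left_area
    # two-tailed: double the area in the more extreme tail
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_value(1.24, "left"), 4))  # 0.8925, the table entry for z = 1.24
print(round(p_value(1.24, "two"), 4))   # 0.215
```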

Practice Problems for Z-Scores

Calculate the z-scores for the following:

Sample Questions

  • Scores on a psychological well-being scale range from 1 to 10, with an average score of 6 and a standard deviation of 2. What is the z-score for a person who scored 4?
  • On a measure of anxiety, a group of participants show a mean score of 35 with a standard deviation of 5. What is the z-score corresponding to a score of 30?
  • A depression inventory has an average score of 50 with a standard deviation of 10. What is the z-score corresponding to a score of 70?
  • In a study on sleep, participants report an average of 7 hours of sleep per night, with a standard deviation of 1 hour. What is the z-score for a person reporting 5 hours of sleep?
  • On a memory test, the average score is 100, with a standard deviation of 15. What is the z-score corresponding to a score of 85?
  • A happiness scale has an average score of 75 with a standard deviation of 10. What is the z-score corresponding to a score of 95?
  • An intelligence test has a mean score of 100 with a standard deviation of 15. What is the z-score that corresponds to a score of 130?

Answers for Sample Questions

Double-check your answers with these solutions. Remember, for each problem, you subtract the average from your value, then divide by how much values typically vary (the standard deviation).

  • Z-score = (4 – 6)/2 = -1
  • Z-score = (30 – 35)/5 = -1
  • Z-score = (70 – 50)/10 = 2
  • Z-score = (5 – 7)/1 = -2
  • Z-score = (85 – 100)/15 = -1
  • Z-score = (95 – 75)/10 = 2
  • Z-score = (130 – 100)/15 = 2
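
All seven answers can be verified at once with a short snippet (the triples below restate the questions above):

```python
# (raw score, mean, SD) for each practice problem, in order
problems = [(4, 6, 2), (30, 35, 5), (70, 50, 10), (5, 7, 1),
            (85, 100, 15), (95, 75, 10), (130, 100, 15)]

z_scores = [(x - mu) / sd for x, mu, sd in problems]
print(z_scores)  # [-1.0, -1.0, 2.0, -2.0, -1.0, 2.0, 2.0]
```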

Calculating a Raw Score

Sometimes, we know a z-score and want to find the corresponding raw score. The formula for converting a z-score back into a raw score is given below:

X = (z)(SD) + mean

As the formula shows, the z-score and standard deviation are multiplied together, and this figure is added to the mean.

Check that your answer makes sense: if we have a negative z-score, the corresponding raw score should be less than the mean, and a positive z-score must correspond to a raw score higher than the mean.
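
The conversion, with its sanity check, as a short Python sketch:

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: X = (z)(SD) + mean."""
    return z * sd + mean

print(raw_score(2, 100, 15))   # 130, above the mean, as a positive z requires
print(raw_score(-1, 6, 2))     # 4, below the mean, as a negative z requires
```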

Calculating a Z-Score using Excel

To calculate the z-score of a specific value, x, first, you must calculate the mean of the sample by using the AVERAGE formula.

For example, if the range of scores in your sample begins at cell A1 and ends at cell A20, the formula =AVERAGE(A1:A20) returns the average of those numbers.

Next, you must calculate the standard deviation of the sample by using the STDEV.S formula. For example, if the range of scores in your sample begins at cell A1 and ends at cell A20, the formula =STDEV.S(A1:A20) returns the standard deviation of those numbers.

Now to calculate the z-score, type the following formula in an empty cell: = (x – mean) / [standard deviation].

To make things easier, instead of writing the mean and SD values in the formula, you could use the cell values corresponding to these values. For example, = (A12 – B1) / [C1].

Then, to calculate the probability of a SMALLER z-score, which is the probability of observing a value less than x (the area under the curve to the LEFT of x), type =NORMSDIST(z) into a blank cell, replacing z with the z-score you calculated (or with the cell that contains it).

To find the probability of a LARGER z-score, which is the probability of observing a value greater than x (the area under the curve to the RIGHT of x), type =1-NORMSDIST(z).
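
For readers working outside Excel, the same pipeline can be sketched in Python using only the standard library (the sample values below are made up for illustration, standing in for cells A1:A20):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample, standing in for cells A1:A20
scores = [52, 48, 61, 55, 47, 58, 50, 53, 49, 57,
          60, 46, 54, 51, 56, 59, 45, 62, 50, 52]

x = 61                           # the value to standardize
m = mean(scores)                 # like =AVERAGE(A1:A20)
s = stdev(scores)                # like =STDEV.S(A1:A20), the sample SD
z = (x - m) / s                  # like =(x - mean) / SD

p_smaller = NormalDist().cdf(z)  # like =NORMSDIST(z), area to the LEFT of x
p_larger = 1 - p_smaller         # like =1-NORMSDIST(z), area to the RIGHT
print(round(z, 3), round(p_smaller, 3), round(p_larger, 3))
```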

Frequently Asked Questions

Can z-scores be used with any type of data, regardless of distribution?

Z-scores are commonly used to standardize and compare data across different distributions. They are most appropriate for data that follows a roughly symmetric and bell-shaped distribution.

However, they can still provide useful insights for other types of data, as long as certain assumptions are met. Yet, for highly skewed or non-normal distributions, alternative methods may be more appropriate.

It’s important to consider the characteristics of the data and the goals of the analysis when determining whether z-scores are suitable or if other approaches should be considered.

How can understanding z-scores contribute to better research and statistical analysis in psychology?

Understanding z-scores enhances research and statistical analysis in psychology. Z-scores standardize data for meaningful comparisons, identify outliers, and assess likelihood.

They aid in interpreting practical significance, applying statistical tests, and making accurate conclusions. Z-scores provide a common metric, facilitating communication of findings.

By using z-scores, researchers improve rigor, objectivity, and clarity in their work, leading to better understanding and knowledge in psychology.

Can a z-score be used to determine the likelihood of an event occurring?

No, a z-score itself cannot directly determine the likelihood of an event occurring. However, it provides information about the relative position of a data point within a distribution.

By converting data to z-scores, researchers can assess how unusual or extreme a value is compared to the rest of the distribution. This can help estimate the probability or likelihood of obtaining a particular score or more extreme values.

So, while z-scores provide insights into the relative rarity of an event, they do not directly determine the likelihood of the event occurring on their own.

Further Information

  • How to Use a Z-Table (Standard Normal Table) to Calculate the Percentage of Scores Above or Below the Z-Score
  • Z-Score Table (for positive or negative scores)
  • Statistics for Psychology Book Download

5.5 Introduction to Hypothesis Tests

One job of a statistician is to make statistical inferences about populations based on samples taken from the population. Confidence intervals are one way to estimate a population parameter.

Another way to make a statistical inference is to make a decision about a parameter. For instance, a car dealership advertises that its new small truck gets 35 miles per gallon on average. A tutoring service claims that its method of tutoring helps 90% of its students get an A or a B. A company says that female managers in their company earn an average of $60,000 per year. A statistician may want to make a decision about or evaluate these claims. A hypothesis test can be used to do this.

A hypothesis test involves collecting data from a sample and evaluating the data. Then the statistician makes a decision as to whether or not there is sufficient evidence to reject the null hypothesis based upon analyses of the data.

In this section, you will conduct hypothesis tests on single means when the population standard deviation is known.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will perform some variation of these steps:

  • Define hypotheses.
  • Collect and/or use the sample data to determine the correct distribution to use.
  • Calculate test statistic.
  • Make a decision.
  • Write a conclusion.

Defining your hypotheses

The actual test begins by considering two hypotheses: the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.

The null hypothesis ( H 0 ) is often a statement of the accepted historical value or norm. This is your starting point that you must assume from the beginning in order to show an effect exists.

The alternative hypothesis ( H a ) is a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 .

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision . There are two options for a decision. They are “reject H 0 ” if the sample information favors the alternative hypothesis or “do not reject H 0 ” or “decline to reject H 0 ” if the sample information is insufficient to reject the null hypothesis.

The following table shows mathematical symbols used in H 0 and H a :

Figure 5.12: Null and alternative hypotheses
  • H0: equal (=); Ha: not equal (≠), greater than (>), or less than (<)
  • H0: equal (=); Ha: less than (<)
  • H0: equal (=); Ha: more than (>)

NOTE: H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol in the alternative hypothesis depends on the wording of the hypothesis test. Despite this, many researchers may use =, ≤, or ≥ in the null hypothesis. This practice is acceptable because our only decision is to reject or not reject the null hypothesis.

We want to test whether the mean GPA of students in American colleges is 2.0 (out of 4.0). The null hypothesis is: H 0 : μ = 2.0. What is the alternative hypothesis?

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

Using the Sample to Test the Null Hypothesis

Once you have defined your hypotheses, the next step in the process is to collect sample data. In a classroom context, the data or summary statistics will usually be given to you.

Then you will have to determine the correct distribution to perform the hypothesis test, given the assumptions you are able to make about the situation. Right now, we are demonstrating these ideas in a test for a mean when the population standard deviation is known using the z distribution. We will see other scenarios in the future.

Calculating a Test Statistic

Next you will start evaluating the data. This begins with calculating your test statistic, which is a measure of the distance between what you observed and what you are assuming to be true. In this context, your test statistic, z₀, quantifies the number of standard errors between the sample mean, x̄, and the hypothesized population mean, μ₀. Calculating the test statistic is analogous to the previously discussed process of standardizing observations with z-scores:

z₀ = (x̄ − μ₀) / (σ/√n)

where μ₀ is the value assumed to be true in the null hypothesis, σ is the population standard deviation, and n is the sample size.

Making a Decision

Once you have your test statistic, there are two methods to use it to make your decision:

  • Critical value method (discussed further in later chapters)
  • p -value method (our current focus)

p -Value Method

To find a p -value , we use the test statistic to calculate the actual probability of getting the test result. Formally, the p -value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme or more extreme as the results obtained from the given sample.

A large p -value calculated from the data indicates that we should not reject the null hypothesis. The smaller the p -value, the more unlikely the outcome and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it.

Draw a graph that shows the p -value. The hypothesis test is easier to perform if you use a graph because you see the problem more clearly.

Suppose a baker claims that his bread height is more than 15 cm on average. Several of his customers do not believe him. To persuade his customers that he is right, the baker decides to do a hypothesis test. He bakes ten loaves of bread. The mean height of the sample loaves is 17 cm. The baker knows from baking hundreds of loaves of bread that the standard deviation for the height is 0.5 cm and the distribution of heights is normal.

The null hypothesis could be H 0 : μ ≤ 15.

The alternate hypothesis is H a : μ > 15.

The words “is more than” calls for the use of the > symbol, so “ μ > 15″ goes into the alternate hypothesis. The null hypothesis must contradict the alternate hypothesis.

Suppose the null hypothesis is true (the mean height of the loaves is no more than 15 cm). Then, is the mean height (17 cm) calculated from the sample unexpectedly large? The hypothesis test works by asking how unlikely the sample mean would be if the null hypothesis were true. The graph shows how far out the sample mean is on the normal curve. The p -value is the probability that, if we were to take other samples, any other sample mean would fall at least as far out as 17 cm.

This means that the p -value is the probability that a sample mean is the same or greater than 17 cm when the population mean is, in fact, 15 cm. We can calculate this probability using the normal distribution for means.

Figure 5.13: Normal distribution of mean bread heights, with 15 cm as the population mean and 17 cm as the point used to determine the p-value.

A p -value of approximately zero tells us that it is highly unlikely that a loaf of bread rises no more than 15 cm on average. That is, almost 0% of all loaves of bread would be at least as high as 17 cm purely by CHANCE had the population mean height really been 15 cm. Because the outcome of 17 cm is so unlikely (meaning it is happening NOT by chance alone), we conclude that the evidence is strongly against the null hypothesis that the mean height would be at most 15 cm. There is sufficient evidence that the true mean height for the population of the baker’s loaves of bread is greater than 15 cm.

A normal distribution has a standard deviation of one. We want to verify a claim that the mean is greater than 12. A sample of 36 is taken with a sample mean of 12.5.

Find the p -value.
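
One way to check your answer numerically (a standard-library sketch that solves the exercise above): the test statistic is z = (12.5 - 12)/(1/√36) = 3, and the claim "greater than 12" makes this a right-tailed test.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n = 12.5, 12, 1, 36
z = (x_bar - mu0) / (sigma / sqrt(n))   # 0.5 / (1/6) = 3.0
p_value = 1 - NormalDist().cdf(z)       # right-tailed: P(Z >= 3)
print(round(z, 1), round(p_value, 4))   # 3.0 0.0013
```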

Decision and Conclusion

A systematic way to decide whether to reject or not reject the null hypothesis is to compare the p -value and a preset or preconceived α (also called a significance level ). A preset α is the probability of a type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem. If there is no given preconceived α , then use α = 0.05.

When you make a decision to reject or not reject H 0 , do as follows:

  • If α > p -value, reject H 0 . The results of the sample data are statistically significant . You can say there is sufficient evidence to conclude that H 0 is an incorrect belief and that the alternative hypothesis, H a , may be correct.
  • If α ≤ p -value, fail to reject H 0 . The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, H a , may be correct.

After you make your decision, write a thoughtful conclusion in the context of the scenario incorporating the hypotheses.

NOTE: When you “do not reject H 0 ,” it does not mean that you should believe that H 0 is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of H o .

When using the p -value to evaluate a hypothesis test, the following rhymes can come in handy:

If the p -value is low, the null must go.

If the p -value is high, the null must fly.

This memory aid relates a p -value less than the established alpha (“the p -value is low”) as rejecting the null hypothesis and, likewise, relates a p -value higher than the established alpha (“the p -value is high”) as not rejecting the null hypothesis.

Fill in the blanks:

  • Reject the null hypothesis when              .
  • The results of the sample data             .
  • Do not reject the null hypothesis when             .

It’s a Boy Genetics Labs claim their procedures improve the chances of a boy being born. The results for a test of a single population proportion are as follows:

  • H 0 : p = 0.50, H a : p > 0.50
  • p -value = 0.025

Interpret the results and state a conclusion in simple, non-technical terms.

Figure References

Figure 5.11: Alora Griffiths (2019). dalmatian puppy near man in blue shorts kneeling. Unsplash license. https://unsplash.com/photos/7aRQZtLsvqw

Figure 5.13: Kindred Grey (2020). Bread height probability. CC BY-SA 4.0.

A decision-making procedure for determining whether sample evidence supports a hypothesis

The claim that is assumed to be true and is tested in a hypothesis test

A working hypothesis that is contradictory to the null hypothesis

A measure of the difference between observations and the hypothesized (or claimed) value

The probability that an event will occur, assuming the null hypothesis is true

Probability that a true null hypothesis will be rejected, also known as type I error and denoted by α

Finding sufficient evidence that the observed effect is not just due to variability, often from rejecting the null hypothesis

Significant Statistics Copyright © 2024 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


  20. 5.5 Introduction to Hypothesis Tests

    A hypothesis test can be used to do this. A hypothesis test involves collecting data from a sample and evaluating the data. Then the statistician makes a decision as to whether or not there is sufficient evidence to reject the null hypothesis based upon analyses of the data. In this section, you will conduct hypothesis tests on single means ...

  21. Two Sample Z-Test: Definition, Formula, and Example

    Learn how to perform a two sample z-test to compare two population means with known standard deviations. See the formula, assumptions, and a step-by-step example with p-value and conclusion.

  22. Khan Academy

    Learn how to calculate z-scores, which measure how many standard deviations a data point is from the mean in a distribution. Watch a video example and read the comments from other learners who ask questions and share their insights.

  23. An example of how to z score and hypothesis testing

    A short video on hypothesis testing with z scores and setting up a rejection regions.Playlist on Z scoreshttp://www.youtube.com/course?list=EC6157D8E20C15149...