Descriptive Statistics

Hypothesis test, online statistics calculator.

On Statisty you can statistically analyse your data online. Simply copy your own data into the table above and select the variables you want to analyse.


Statisty is thus free statistical software that performs your calculations directly online. In contrast to SPSS, JASP or Excel, nothing needs to be installed in order to statistically evaluate your data.

Depending on how many variables you click on and what scale level they have, the appropriate tests are calculated.

  • One-sample t-test
  • Independent t-test
  • Paired t-test
  • Binomial test
  • Chi-square test
  • One-way ANOVA
  • Two-way ANOVA
  • Repeated measures ANOVA
  • Two-way ANOVA with repeated measures
  • Mann-Whitney U test
  • Wilcoxon signed-rank test
  • Kruskal-Wallis test
  • Friedman test
  • Correlation analysis
  • Pearson correlation
  • Spearman correlation
  • Simple linear regression
  • Multiple linear regression
  • Logistic regression

Statistics App

The results are then displayed clearly. First you get the descriptive statistics and then the appropriate hypothesis test. Of course, you can also calculate a linear regression or a logistic regression.


If you like, also have a look at the Online Statistics Calculator at DATAtab.


The Ultimate App Store Test (Part 1): Building Hypotheses


As an ASO professional, having the ability to craft quality app store marketing hypotheses is an essential skill; you’ll struggle to succeed in the field of app store optimization without it. Why? Because a strong hypothesis is the foundation of every successful ASO test.

We found that app store tests with strong hypotheses are 90% more likely to be ‘successful’ – industry lingo for producing an actionable insight that drives a conversion rate improvement. Having a strong hypothesis helps you avoid having to explain why time and resources were spent on a test that failed to produce useful learnings. 

Once you master the art of hypothesizing, you’ll be able to focus your app store page testing efforts and achieve higher conversion rates in less time. 

In this article, we’ll teach you what makes a good hypothesis (with examples!), how to conduct thorough hypothesis research, and what to avoid when hypothesizing. Let’s get started!

Check out the other articles in our Ultimate App Store Test series: Part 2: Creative Design, Part 3: Driving Traffic, and Part 4: Analyzing Results.


A hypothesis – the definition

In terms of app store optimization, a hypothesis is a testable statement that can be proven or disproven through testing and will reveal actual learnings about a target audience – information that will teach you something about why your app was downloaded, or not.

What separates a good hypothesis from a mediocre one? 

If you hope to master the testing cycle, as we call it here at Storemaven, you’re going to need the answer to this question. Fortunately, we have it for you! A good app store hypothesis has the following qualities.


Insightful

What will your hypothesis tell you about your users? All great app store page hypotheses give developers new insight into their target audiences.

For example, if you were to hypothesize that “A yellow icon will generate more clicks than a blue icon,” and test your theory, you wouldn’t actually learn anything substantial about your users. Maybe you discover that yes, a yellow icon does generate more clicks, but until you know why this is the case, the information is useless to you.

Make sure all of your hypotheses reveal new information about your app store visitors so that you can use those details to improve your optimization efforts.

Distinct

Does your current hypothesis closely resemble previous hypotheses you’ve made? If the answer is yes, you should develop a more distinct hypothesis that will uncover new insights. Many ASO managers run tests that are too nuanced to be helpful.

Here’s an example of two distinct hypotheses:

  • Showing my app’s ease of use will prompt engagement.
  • Competitor differentiation messaging will outperform use case messaging.

See how these two hypotheses will reveal completely different things about an app’s target audience? Make sure your hypotheses are similarly distinct.

Well-Researched

Lastly, good hypotheses are well-researched.

Dig through your competitors’ creative assets using the ASO Tool Box, which will allow you to easily preview and download their icons, screenshots, etc. for analysis. Read through your reviews to learn what your users like and don’t like about your app. Take a look at your analytics to discover which app features are most engaged with.

Then take this wealth of information and use it to develop well-informed hypotheses that pertain to your exact situation in terms of ASO.

Four Great Hypothesis Examples

Since your hypotheses are key to the success of your app store page testing efforts, let’s take a moment to look at four great hypothesis examples. This way you can get a feel for what separates a quality hypothesis from a mediocre one.

1. Social Proof

In competitive markets, social proof can help differentiate one app from another. But of course you won’t actually know if this is true for your app until you test it. Here’s an example of a quality social-proof-oriented hypothesis:

“Including social proof elements in the first impression screen will boost install rates.”


2. Feature-Oriented

Why should a user download your app? Because it’s chock-full of amazing features! Test this theory with a well-crafted, feature-oriented hypothesis such as:

“Highlighting specific and highly used features of our app will increase our conversion rate (CVR).”


3. Characters

Awesome characters are a huge draw for many app store users. Once they see that a game will allow them to play a fantastical hero, they’re more likely to whip out their wallets and purchase it ASAP. If you’ve developed a character-driven game, you might hypothesize something like:

“Users prefer to see screenshots that feature our cast of characters.”


4. Gameplay Mechanics

How do users play your game? Many Apple App Store and Google Play Store users quickly scan through apps looking for gameplay mechanics that tickle their fancy. If this describes your audience, you might consider testing a gameplay mechanics hypothesis:

“A clear view of gameplay mechanics in the first impression will boost conversions.”


We’re just illustrating what a good hypothesis looks like to inspire you to create your own: don’t simply copy the hypotheses above! Do your research and make sure your hypotheses relate to your specific audience.

A Framework For Solid Hypothesis Research

When it comes to hypotheses, the battle’s won in the research phase. With that in mind, let’s look at a three-step process you can use to properly research your app-based theories.

1. Know Your Users

When crafting hypotheses, always start with in-depth audience research. Who are your users and where do they live? Why do they need an app or game like yours? And what specific features/gameplay elements get them most excited to hit the download button?

Answers to these questions are essential, but they’re not always easy to obtain. After all, you can’t post a “How Did You Hear About Us?” form in the Apple or Google Play stores.

Fortunately, we have a few tips for you:

  • Understand User Types: There are two types of app store users, Decisive and Explorative. Decisive users choose to download an app (or not) after viewing the first impression screen. Explorative users take time to investigate apps thoroughly before deciding to install them. Based on our research, 68.4% of Apple App Store users and 76.4% of Google Play users are decisive. This means that in all likelihood, your users will only view your first impression screen, which is helpful to know.
  • Study Your Past Marketing Initiatives: Do you run Facebook ads or Adword campaigns? Have you ever hosted a webinar? Do you post content on your website to try and attract your target audience? If you answered “yes” to any of these questions, you have a wealth of audience information at your fingertips! Analyze your data to discover who’s clicking on your ads, attending your webinars, visiting your website, etc. Then use these details to paint a clearer picture of your user base.
  • Use a Tool Like Storemaven: Our team will create an exact replica of your Apple App Store or Google Play store page, put it in a sandbox environment, and drive actual traffic to it. This will allow you to learn exactly who your audience is and how they interact with your page via expert audience tracking. 

During your user research, do your best to uncover the age, gender, location, and interests of your audience. These crucial bits of information will give you the insights you need to craft better app store pages and convert more visitors into lifelong customers.

Note: your audience will likely change over time. Because of this, you need to study your users on a regular basis to make sure you’re targeting them properly.

2. Research Your Market

Next, you need to research your market, which means taking a good, hard look at your direct competitors. Who are they? What do their app store pages look like? How do they differentiate themselves? Do they feature videos? What kind of screenshots do they use?

Also, when attempting to increase conversion rates in markets that are culturally far removed from those you know, it’s crucial to research the local culture, habits, and ways of communication.  

There are a couple of different ways to do market research:

  • The Manual Way: If you’re short on funds but have time to spare, you can manually research your competitors. Simply type the keywords you target into the Apple App Store and Google Play store search bars and see which apps appear. Or view the rankings in your app’s category. Or type [Keyword] + App into Google and see what the search engine digs up. Basically, use the free tools at your disposal to uncover competitors and then manually analyze their app store pages for insights.
  • The Faster Way: If you’re looking for a quicker and more accurate way to research your competition, try using a tool like ASO Tool Box or Appfigures. ASO Tool Box will allow you to easily download and view your competitors’ creative assets so that you can understand how they assemble their app pages. ASO Tool Box is a free tool, and Appfigures pricing depends on the features required. With Appfigures you can optimize for more organic keywords and get insights into keyword ranks. You can also identify more keywords that will increase your views and get you more downloads.

Remember, you’ll need to do competitor research for every market you’re in. There’s a good chance your competition in the U.S. will be different than it is in India, for example.

3. Analyze Your Data

As you research your hypotheses and begin running tests, it’s important to analyze the data you receive back. Was your initial hypothesis correct? If so, why? If not, why? Most importantly, how will your findings inform your next hypothesis and the one after that?

You should always question your hypothesis and realize that there are unforeseen factors at play. For example, if your downloads went through the roof in April 2020, is this because of a new ASO strategy or simply because COVID-19 resulted in your target users having more time on their hands during lockdowns?

It’s your job to find out! Use your data to create informed hypotheses. Then test them and tap into the results to craft better hypotheses in the future that help boost your conversion rates.

How NOT to Develop a Hypothesis

So far we’ve covered what makes a good hypothesis, explored a few quality hypothesis examples, and given you a three-step process for hypothesis research. To help, we want to finish with some things to avoid. Do NOT do the following:


Don’t Play the Copycat Game

While market research is important, you shouldn’t copy and paste any competitor’s complete strategy. Your app or game is (hopefully!) unique, which means it will require a unique ASO strategy to properly optimize its app store page.

Use competitive research as a starting point. Take what’s working for similar apps and adjust it to fit your needs and specific user base. But always use your invaluable brain and any available data points to determine which elements to copy and which to toss by the wayside.

Don’t Neglect Your Previous Insights

Like we mentioned above, your previous hypotheses should inform your future ones. If you neglect the insights you gleaned in the past, you’ll waste a lot of time and money. Instead, use what you’ve already learned to inform your approach to new hypotheses and tests.

At the same time, don’t get too married to specific ideas. Things change quickly in ASO and you should always question your hypotheses to find true answers.

Don’t be Limited by Brand Guidelines

ASO managers can run into problems when they don’t have full control over creative assets.

For example, you might hypothesize that a character-based logo will outperform a text-based one. But if your company’s branding completely revolves around its text-based logo, you might face internal resistance when attempting to change it.

Do your best to work with other departments. Show them the potential benefits of adjusting your logo and assure them that if the change doesn’t produce significant gains, you’ll switch it back immediately. Or, simply run your tests in a sandbox environment like Storemaven so that you can hypothesize about anything you like without touching your actual app page!

Don’t Forget About Your Why

Finally, don’t ever test a hypothesis that isn’t buoyed by a strong why. If you don’t have a specific reason for hypothesizing about something, don’t move forward with it.

So many ASO managers make mistakes because they test for the sake of testing. It’s much better to hypothesize about something that could result in significant gains for your company. Then test the hypothesis and act on your findings in an appropriate way.

In Conclusion: Your aim is to always build better hypotheses

Your hypotheses make up the foundation of your app store testing process. Without them, you’ll be unfocused and waste a lot of valuable time and energy.

Take the time to research your users and market. Then use your findings to craft informed hypotheses, test them, and use the results to improve future hypotheses. If you can do that, you’ll be able to run accurate tests that provide important insights.

Happy? Feeling more confident about the dos and don’ts? Well, we’re not stopping there! Keep following us for part 2 in our Ultimate Test series, where we’ll focus on the creative design process . Stay tuned.


StatAnalytica

Step-by-step guide to hypothesis testing in statistics


Hypothesis testing in statistics helps us use data to make informed decisions. It starts with an assumption or guess about a group or population—something we believe might be true. We then collect sample data to check if there is enough evidence to support or reject that guess. This method is useful in many fields, like science, business, and healthcare, where decisions need to be based on facts.

Learning how to do hypothesis testing in statistics step-by-step can help you better understand data and make smarter choices, even when things are uncertain. This guide will take you through each step, from creating your hypothesis to making sense of the results, so you can see how it works in practical situations.

What is Hypothesis Testing?


Hypothesis testing is a method for determining whether data supports a certain idea or assumption about a larger group. It starts by making a guess, like an average or a proportion, and then uses a small sample of data to see if that guess seems true or not.

For example, if a company wants to know if its new product is more popular than its old one, it can use hypothesis testing. They start with a statement like “The new product is not more popular than the old one” (this is the null hypothesis) and compare it with “The new product is more popular” (this is the alternative hypothesis). Then, they look at customer feedback to see if there’s enough evidence to reject the first statement and support the second one.

Simply put, hypothesis testing is a way to use data to help make decisions and understand what the data is really telling us, even when we don’t have all the answers.

Importance Of Hypothesis Testing In Decision-Making And Data Analysis

Hypothesis testing is important because it helps us make smart choices and understand data better. Here’s why it’s useful:

  • Reduces Guesswork : It helps us see if our guesses or ideas are likely correct, even when we don’t have all the details.
  • Uses Real Data : Instead of just guessing, it checks if our ideas match up with real data, which makes our decisions more reliable.
  • Avoids Errors : It helps us avoid mistakes by carefully checking if our ideas are right so we don’t make costly errors.
  • Shows What to Do Next : It tells us if our ideas work or not, helping us decide whether to keep, change, or drop something. For example, a company might test a new ad and decide what to do based on the results.
  • Confirms Research Findings : It makes sure that research results are accurate and not just random chance so that we can trust the findings.

Here’s a simple guide to understanding hypothesis testing, with an example:

1. Set Up Your Hypotheses

Explanation: Start by defining two statements:

  • Null Hypothesis (H0): This is the idea that there is no change or effect. It’s what you assume is true.
  • Alternative Hypothesis (H1): This is what you want to test. It suggests there is a change or effect.

Example: Suppose a company says their new batteries last an average of 500 hours. To check this:

  • Null Hypothesis (H0): The average battery life is 500 hours.
  • Alternative Hypothesis (H1): The average battery life is not 500 hours.

2. Choose the Test

Explanation: Pick a statistical test that fits your data and your hypotheses. Different tests are used for various kinds of data.

Example: Since you’re comparing the average battery life, you use a one-sample t-test.

3. Set the Significance Level

Explanation: Decide how much risk you’re willing to take if you make a wrong decision. This is called the significance level, often set at 0.05 or 5%.

Example: You choose a significance level of 0.05, meaning you’re okay with a 5% chance of being wrong.

4. Gather and Analyze Data

Explanation: Collect your data and perform the test. Calculate the test statistic to see how far your sample result is from what you assumed.

Example: You test 30 batteries and find they last an average of 485 hours. You then calculate how this average compares to the claimed 500 hours using the t-test.
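The article doesn’t report the sample standard deviation, so the figure s = 20 hours below is an assumption made purely for illustration. With it, the t statistic from this step can be computed directly:

```python
import math

from scipy import stats

n = 30       # number of batteries tested
x_bar = 485  # observed sample mean (hours)
mu_0 = 500   # claimed population mean under H0
s = 20       # sample standard deviation -- assumed here for illustration

# t statistic: how many standard errors the sample mean lies from the claimed mean
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n - 1)

print(f"t = {t:.2f}, p = {p:.4f}")
```

Because s is assumed, the resulting p-value will not exactly match the 0.0001 quoted in the text; the mechanics of the calculation are what matter here.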

5. Find the p-Value

Explanation: The p-value tells you the probability of getting a result as extreme as yours if the null hypothesis is true.

Example: You find a p-value of 0.0001. This means there’s a very small chance (0.01%) of getting a sample average at least as far from 500 hours as 485 hours if the true average is 500 hours.

6. Make Your Decision

Explanation: Compare the p-value to your significance level. If the p-value is smaller, you reject the null hypothesis. If it’s larger, you do not reject it.

Example: Since 0.0001 is much less than 0.05, you reject the null hypothesis. This means the data suggests the average battery life is different from 500 hours.

7. Report Your Findings

Explanation: Summarize what the results mean. State whether you rejected the null hypothesis and what that implies.

Example: You conclude that the average battery life is likely different from 500 hours. This suggests the company’s claim might not be accurate.

Hypothesis testing is a way to use data to check if your guesses or assumptions are likely true. By following these steps—setting up your hypotheses, choosing the right test, deciding on a significance level, analyzing your data, finding the p-value, making a decision, and reporting results—you can determine if your data supports or challenges your initial idea.
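Putting the seven steps together, here is a minimal sketch in Python using SciPy. The 30 battery lifetimes are simulated (the article only reports their mean, so the spread is an assumption), which means the exact t and p values will differ from those quoted above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Step 1 -- hypotheses: H0: mean = 500 h, H1: mean != 500 h
claimed_mean = 500

# Step 3 -- significance level
alpha = 0.05

# Step 4 -- data: 30 simulated lifetimes centred near 485 h (illustrative figures)
sample = rng.normal(loc=485, scale=15, size=30)

# Steps 2 and 5 -- one-sample, two-tailed t-test and its p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=claimed_mean)
print(f"mean = {sample.mean():.1f} h, t = {t_stat:.2f}, p = {p_value:.4g}")

# Steps 6 and 7 -- decision and conclusion
if p_value < alpha:
    print("Reject H0: the average battery life appears to differ from 500 h.")
else:
    print("Fail to reject H0: not enough evidence of a difference.")
```

With real data, only `sample` and `claimed_mean` need to change; the rest of the procedure is identical.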

Understanding Hypothesis Testing: A Simple Explanation

Hypothesis testing is a way to use data to make decisions. Here’s a straightforward guide:

1. What are the Null and Alternative Hypotheses?

  • Null Hypothesis (H0): This is your starting assumption. It says that nothing has changed or that there is no effect. It’s what you assume to be true until your data shows otherwise. Example: If a company says their batteries last 500 hours, the null hypothesis is: “The average battery life is 500 hours.” This means you think the claim is correct unless you find evidence to prove otherwise.
  • Alternative Hypothesis (H1): This is what you want to find out. It suggests that there is an effect or a difference. It’s what you are testing to see if it might be true. Example: To test the company’s claim, you might say: “The average battery life is not 500 hours.” This means you think the average battery life might be different from what the company says.

2. One-Tailed vs. Two-Tailed Tests

  • One-Tailed Test: This test checks for an effect in only one direction. You use it when you’re only interested in finding out if something is either more or less than a specific value. Example: If you think the battery lasts longer than 500 hours, you would use a one-tailed test to see if the battery life is significantly more than 500 hours.
  • Two-Tailed Test: This test checks for an effect in both directions. Use this when you want to see if something is different from a specific value, whether it’s more or less. Example: If you want to see if the battery life is different from 500 hours, whether it’s more or less, you would use a two-tailed test. This checks for any significant difference, regardless of the direction.
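The difference between the two tests is easy to see with the `alternative` argument of SciPy’s one-sample t-test (available in SciPy 1.6 and later); the lifetimes below are simulated, assumed figures for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated lifetimes whose true mean sits above the claimed 500 h (illustrative)
sample = rng.normal(loc=515, scale=15, size=30)

# Two-tailed test: H1 is "mean != 500" (a difference in either direction counts)
t_two, p_two = stats.ttest_1samp(sample, popmean=500, alternative="two-sided")

# One-tailed test: H1 is "mean > 500" (only an increase counts as evidence)
t_one, p_one = stats.ttest_1samp(sample, popmean=500, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# When the observed mean lies on the hypothesized side (here, above 500),
# the one-tailed p-value is half the two-tailed one.
```

Note that the direction of a one-tailed test should be fixed before looking at the data; choosing the direction after seeing the results inflates the apparent significance.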

3. Common Misunderstandings

  • Misconception: failing to reject the null hypothesis proves it is true. Clarification: Hypothesis testing doesn’t prove that the null hypothesis is true. It just helps you decide if you should reject it. If there isn’t enough evidence against it, you don’t reject it, but that doesn’t mean it’s definitely true.
  • Misconception: a small p-value proves the null hypothesis is false. Clarification: A small p-value shows that your data is unlikely if the null hypothesis is true. It suggests that the alternative hypothesis might be right, but it doesn’t prove the null hypothesis is false.
  • Misconception: the significance level can be picked arbitrarily. Clarification: The significance level (alpha) is a set threshold, like 0.05, that helps you decide how much risk you’re willing to take of making a wrong decision. It should be chosen carefully, not randomly.
  • Misconception: a significant result guarantees a correct conclusion. Clarification: Hypothesis testing helps you make decisions based on data, but it doesn’t guarantee your results are correct. The quality of your data and the right choice of test affect how reliable your results are.

Benefits and Limitations of Hypothesis Testing

Benefits

  • Clear Decisions: Hypothesis testing helps you make clear decisions based on data. It shows whether the evidence supports or goes against your initial idea.
  • Objective Analysis: It relies on data rather than personal opinions, so your decisions are based on facts rather than feelings.
  • Concrete Numbers: You get specific numbers, like p-values, to understand how strong the evidence is against your idea.
  • Control Risk: You can set a risk level (alpha level) to manage the chance of making an error, which helps avoid incorrect conclusions.
  • Widely Used: It can be used in many areas, from science and business to social studies and engineering, making it a versatile tool.

Limitations

  • Sample Size Matters: The results can be affected by the size of the sample. Small samples might give unreliable results, while large samples might find differences that aren’t meaningful in real life.
  • Risk of Misinterpretation: A small p-value means the results are unlikely if the null hypothesis is true, but it doesn’t show how important the effect is.
  • Needs Assumptions: Hypothesis testing requires certain conditions, like data being normally distributed. If these aren’t met, the results might not be accurate.
  • Simple Decisions: It often results in a basic yes or no decision without giving detailed information about the size or impact of the effect.
  • Can Be Misused: Sometimes, people misuse hypothesis testing, tweaking data to get a desired result or focusing only on whether the result is statistically significant.
  • No Absolute Proof: Hypothesis testing doesn’t prove that your hypothesis is true. It only helps you decide if there’s enough evidence to reject the null hypothesis, so the conclusions are based on likelihood, not certainty.

Final Thoughts 

Hypothesis testing helps you make decisions based on data. It involves setting up your initial idea, picking a significance level, doing the test, and looking at the results. By following these steps, you can make sure your conclusions are based on solid information, not just guesses.

This approach lets you see if the evidence supports or contradicts your initial idea, helping you make better decisions. But remember that hypothesis testing isn’t perfect. Things like sample size and assumptions can affect the results, so it’s important to be aware of these limitations.

In simple terms, using a step-by-step guide for hypothesis testing is a great way to better understand your data. Follow the steps carefully and keep in mind the method’s limits.

What is the difference between one-tailed and two-tailed tests?

 A one-tailed test assesses the probability of the observed data in one direction (either greater than or less than a certain value). In contrast, a two-tailed test looks at both directions (greater than and less than) to detect any significant deviation from the null hypothesis.

How do you choose the appropriate test for hypothesis testing?

The choice of test depends on the type of data you have and the hypotheses you are testing. Common tests include t-tests, chi-square tests, and ANOVA. For more details about ANOVA, you may read Complete Details on What is ANOVA in Statistics. It’s important to match the test to the data characteristics and the research question.

What is the role of sample size in hypothesis testing?  

Sample size affects the reliability of hypothesis testing. Larger samples provide more reliable estimates and can detect smaller effects, while smaller samples may lead to less accurate results and reduced power.
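To see the effect of sample size concretely, here is a small simulation; all figures (a true mean of 490 hours against a claimed 500, standard deviation 20) are assumptions chosen for illustration. It estimates how often a t-test at alpha = 0.05 correctly rejects the null:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def estimated_power(n, true_mean, claimed_mean=500, sd=20, alpha=0.05, reps=2000):
    """Estimate the power of a two-tailed one-sample t-test by simulation:
    the fraction of simulated samples of size n that lead to rejecting H0."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(true_mean, sd, n)
        if stats.ttest_1samp(sample, claimed_mean).pvalue < alpha:
            rejections += 1
    return rejections / reps

p_small = estimated_power(n=10, true_mean=490)   # small sample: low power
p_large = estimated_power(n=100, true_mean=490)  # larger sample, same true effect
print(f"power with n=10: {p_small:.2f}, with n=100: {p_large:.2f}")
```

The same real effect that a large sample detects almost every time is missed most of the time by a small one.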

Can hypothesis testing prove that a hypothesis is true?  

Hypothesis testing cannot prove that a hypothesis is true. It can only provide evidence to support or reject the null hypothesis. A result can indicate whether the data is consistent with the null hypothesis or not, but it does not prove the alternative hypothesis with certainty.


Stats and R

Hypothesis test by hand


Descriptive versus inferential statistics

Contents:

  • Motivations and limitations
  • Step #1: Stating the null and alternative hypothesis
  • Step #2: Computing the test statistic
  • Step #3: Finding the critical value
  • Why don’t we accept \(H_0\)?
  • Step #3: Computing the p-value
  • Step #4: Concluding and interpreting the results
  • Step #2: Computing the confidence interval
  • Step #3: Concluding and interpreting the results
  • Which method to choose?


Remember that descriptive statistics is the branch of statistics aiming at describing and summarizing a set of data in the best possible manner, that is, by reducing it down to a few meaningful key measures and visualizations—with as little loss of information as possible. In other words, the branch of descriptive statistics helps to have a better understanding and a clear image about a set of observations thanks to summary statistics and graphics. With descriptive statistics, there is no uncertainty because we describe only the group of observations that we decided to work on and no attempt is made to generalize the observed characteristics to another or to a larger group of observations.

Inferential statistics, on the other hand, is the branch of statistics that uses a random sample of data taken from a population to make inferences, i.e., to draw conclusions about the population of interest (see the difference between population and sample if you need a refresher on the two concepts). In other words, information from the sample is used to make generalizations about the parameter of interest in the population.

The two most important tools used in the domain of inferential statistics are:

  • hypothesis test (which is the main subject of the present article), and
  • confidence interval (which is briefly discussed in this section)

Via my teaching tasks, I realized that many students (especially in introductory statistics classes) struggle to perform hypothesis tests and interpret the results. It seems to me that these students often encounter difficulties mainly because hypothesis testing is rather unclear and abstract to them.

One of the reasons it looks abstract to them is because they do not understand the final goal of hypothesis testing—the “why” behind this tool. They often do inferential statistics without understanding the reasoning behind it, as if they were following a cooking recipe which does not require any thinking. However, as soon as they understand the principle underlying hypothesis testing, it is much easier for them to apply the concepts and solve the exercises.

For this reason, I thought it would be useful to write an article on the goal of hypothesis tests (the “why?”), the context in which they should be used (the “when?”), how they work (the “how?”) and how to interpret the results (the “so what?”). Like anything else in statistics, it becomes much easier to apply a concept in practice when we understand what we are testing or what we are trying to demonstrate beforehand.

In this article, I present—as comprehensibly as possible—the different steps required to perform and conclude a hypothesis test by hand.

These steps are illustrated with a basic example. This will build the theoretical foundations of hypothesis testing, which will in turn be of great help for the understanding of most statistical tests .

Hypothesis tests come in many forms and can be used for many parameters or research questions. The steps I present in this article are not, unfortunately, applicable to all hypothesis tests.

They are, however, appropriate for at least the most common hypothesis tests—the tests on:

  • one mean: \(\mu\)
  • two means with independent samples: \(\mu_1\) and \(\mu_2\)
  • two means with paired samples: \(\mu_D\)
  • one proportion: \(p\)
  • two proportions: \(p_1\) and \(p_2\)
  • one variance: \(\sigma^2\)
  • two variances: \(\sigma^2_1\) and \(\sigma^2_2\)

The good news is that the principles behind these statistical tests (and many more) are exactly the same. So if you understand the intuition and the process for one of them, all the others pretty much follow.

Unlike descriptive statistics, where we only describe the data at hand, hypothesis tests use a subset of observations, referred to as a sample, to draw conclusions about a population.

One may wonder why we would try to “guess” or make inferences about a parameter of a population based on a sample, instead of simply collecting data for the entire population, computing the statistics we are interested in and making decisions based upon that.

The main reason we use a sample instead of the entire population is that, most of the time, collecting data on the entire population is practically impossible, too complex, too expensive, too time-consuming, or a combination of any of these. 1

So the overall objective of a hypothesis test is to draw conclusions in order to confirm or refute a belief about a population, based on a smaller group of observations.

In practice, we take some measurements of the variable of interest—representing the sample(s)—and we check whether our measurements are likely or not given our assumption (our belief). Based on the probability of observing the sample(s) we have, we decide whether we can trust our belief or not.

Hypothesis tests have many practical applications.

Here are different situations illustrating when the tests mentioned above would be appropriate:

  • One mean: suppose that a health professional would like to test whether the mean weight of Belgian adults is different than 80 kg (176.4 lbs).
  • Independent samples: suppose that a physiotherapist would like to test the effectiveness of a new treatment by measuring the mean response time (in seconds) for patients in a control group and patients in a treatment group, where patients in the two groups are different.
  • Paired samples: suppose that a physiotherapist would like to test the effectiveness of a new treatment by measuring the mean response time (in seconds) before and after a treatment, where patients are measured twice—before and after treatment, so patients are the same in the 2 samples.
  • One proportion: suppose that a political pundit would like to test whether the proportion of citizens who are going to vote for a specific candidate is smaller than 30%.
  • Two proportions: suppose that a doctor would like to test whether the proportion of smokers is different between professional and amateur athletes.
  • One variance: suppose that an engineer would like to test whether a voltmeter has a lower variability than what is imposed by the safety standards.
  • Two variances: suppose that, in a factory, two production lines work independently from each other. The financial manager would like to test whether the costs of the weekly maintenance of these two machines have the same variance. Note that a test on two variances is also often performed to verify the assumption of equal variances, which is required for several other statistical tests, such as the Student’s t-test for instance.

Of course, this is a non-exhaustive list of potential applications and many research questions can be answered thanks to a hypothesis test.

One important point to remember is that in hypothesis testing we are always interested in the population and not in the sample. The sample is used for the aim of drawing conclusions about the population, so we always test in terms of the population.

Usually, hypothesis tests are used to answer research questions in confirmatory analyses . Confirmatory analyses refer to statistical analyses where hypotheses—deduced from theory—are defined beforehand (preferably before data collection). In this approach, the researcher has a specific idea about the variables under consideration and she is trying to see if her idea, specified as hypotheses, is supported by data.

On the other hand, hypothesis tests are rarely used in exploratory analyses. 2 Exploratory analyses aim to uncover possible relationships between the variables under investigation. In this approach, the researcher does not have any clear theory-driven assumptions or ideas in mind before data collection. This is the reason exploratory analyses are sometimes referred to as hypothesis-generating analyses—they are used to create some hypotheses, which in turn may be tested via confirmatory analyses at a later stage.

There are, to my knowledge, 3 different methods to perform a hypothesis test:

  • Method A: comparing the test statistic with the critical value
  • Method B: comparing the p-value with the significance level \(\alpha\)
  • Method C: comparing the target parameter with the confidence interval

Although the process for these 3 approaches may slightly differ, they all lead to the exact same conclusions. Using one method or another is, therefore, more often than not a matter of personal choice or a matter of context. See this section to know which method I use depending on the context.

I present the 3 methods in the following sections, starting with, in my opinion, the most comprehensive one when it comes to doing it by hand: comparing the test statistic with the critical value.

For the three methods, I will explain the required steps to perform a hypothesis test from a general point of view and illustrate them with the following situation: 3

Suppose a health professional would like to test whether the mean weight of Belgian adults is different than 80 kg.

Note that, as for most hypothesis tests, the test we are going to use as example below requires some assumptions. Since the aim of the present article is to explain a hypothesis test, we assume that all assumptions are met. For the interested reader, see the assumptions (and how to verify them) for this type of hypothesis test in the article presenting the one-sample t-test .

Method A, which consists in comparing the test statistic with the critical value, boils down to the following 4 steps:

  • Stating the null and alternative hypothesis
  • Computing the test statistic
  • Finding the critical value
  • Concluding and interpreting the results

Each step is detailed below.

As discussed before, a hypothesis test first requires an idea, that is, an assumption about a phenomenon. This assumption, referred to as a hypothesis, is derived from the theory and/or the research question.

Since a hypothesis test is used to confirm or refute a prior belief, we need to formulate our belief so that there is a null and an alternative hypothesis . Those hypotheses must be mutually exclusive , which means that they cannot be true at the same time. This is step #1.

In the context of our scenario, the null and alternative hypothesis are thus:

  • Null hypothesis \(H_0: \mu = 80\)
  • Alternative hypothesis \(H_1: \mu \ne 80\)

When stating the null and alternative hypothesis, bear in mind the following three points:

  • We are always interested in the population and not in the sample. This is the reason \(H_0\) and \(H_1\) will always be written in terms of the population and not in terms of the sample (in this case, \(\mu\) and not \(\bar{x}\) ).
  • The assumption we would like to test is often the alternative hypothesis. If the researcher wanted to test whether the mean weight of Belgian adults was less than 80 kg, she would have stated \(H_0: \mu = 80\) (or equivalently, \(H_0: \mu \ge 80\) ) and \(H_1: \mu < 80\) . 4 Do not mix the null with the alternative hypothesis, or the conclusions will be diametrically opposed!
  • The null hypothesis is often the status quo. For instance, suppose that a doctor wants to test whether the new treatment A is more efficient than the old treatment B. The status quo is that the new and old treatments are equally efficient. Assuming a larger value is better, she will then write \(H_0: \mu_A = \mu_B\) (or equivalently, \(H_0: \mu_A - \mu_B = 0\)) and \(H_1: \mu_A > \mu_B\) (or equivalently, \(H_1: \mu_A - \mu_B > 0\)). Conversely, if lower values are better, she would have written \(H_0: \mu_A = \mu_B\) (or equivalently, \(H_0: \mu_A - \mu_B = 0\)) and \(H_1: \mu_A < \mu_B\) (or equivalently, \(H_1: \mu_A - \mu_B < 0\)).

The test statistic (often called t-stat ) is, in some sense, a metric indicating how extreme the observations are compared to the null hypothesis . The higher the t-stat (in absolute value), the more extreme the observations are.

There are several formulas to compute the t-stat, with one formula for each type of hypothesis test—one or two means, one or two proportions, one or two variances. This means that there is a formula to compute the t-stat for a hypothesis test on one mean, another formula for a test on two means, another for a test on one proportion, etc. 5

The only difficulty in this second step is to choose the appropriate formula. As soon as you know which formula to use based on the type of test, you simply have to apply it to the data. For the interested reader, see the different formulas to compute the t-stat for the most common tests in this Shiny app .

Luckily, formulas for hypothesis tests on one and two means, and one and two proportions follow the same structure.

Computing the test statistic for these tests is similar to scaling a random variable (a process also known as “standardization” or “normalization”), which consists in subtracting the mean from that random variable and dividing the result by the standard deviation:

\[Z = \frac{X - \mu}{\sigma}\]

For these 4 hypothesis tests (one/two means and one/two proportions), computing the test statistic is like scaling the estimator (computed from the sample) corresponding to the parameter of interest (in the population). So we basically subtract the target parameter from the point estimator and then divide the result by the standard error (which is equivalent to the standard deviation but for an estimator).

If this is unclear, here is how the test statistic (denoted \(t_{obs}\) ) is computed in our scenario (assuming that the variance of the population is unknown):

\[t_{obs} = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}\]

  • \(\bar{x}\) is the sample mean (i.e., the estimator)
  • \(\mu\) is the mean under the null hypothesis (i.e., the target parameter)
  • \(s\) is the sample standard deviation
  • \(n\) is the sample size
  • \(\frac{s}{\sqrt{n}}\) is the standard error

Notice the similarity between the formula of this test statistic and the formula used to standardize a random variable. This structure is the same for a test on two means, one proportion and two proportions, except that the estimator, the parameter and the standard error are, of course, slightly different for each type of test.

Suppose that in our case we have a sample mean of 71 kg ( \(\bar{x}\) = 71), a sample standard deviation of 13 kg ( \(s\) = 13) and a sample size of 10 adults ( \(n\) = 10). Remember that the population mean (the mean under the null hypothesis) is 80 kg ( \(\mu\) = 80).

The t-stat is thus:

\[t_{obs} = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{71 - 80}{\frac{13}{\sqrt{10}}} = -2.189\]

Although formulas are different depending on which parameter you are testing, the value found for the test statistic gives us an indication on how extreme our observations are.

We keep this value of -2.189 in mind because it will be used again in step #4.
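As a quick check of the arithmetic above, the test statistic can be recomputed in a few lines of Python (a sketch using only the sample summaries given in the text):

```python
import math

# Sample summaries from the example
x_bar = 71   # sample mean (kg)
mu_0 = 80    # mean under the null hypothesis (kg)
s = 13       # sample standard deviation (kg)
n = 10       # sample size

standard_error = s / math.sqrt(n)
t_obs = (x_bar - mu_0) / standard_error
print(round(t_obs, 3))  # -2.189
```

Any of the other t-stat formulas mentioned above would follow the same pattern: estimator minus target parameter, divided by the standard error.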

Although the t-stat gives us an indication of how extreme our observations are, we cannot tell whether this “score of extremity” is too extreme or not based on its value only.

So, at this point, we cannot yet tell whether our data are too extreme or not. For this, we need to compare our t-stat with a threshold—referred to as the critical value—given by the probability distribution tables (and which can, of course, also be found with R).

In the same way that the formula to compute the t-stat is different for each parameter of interest, the underlying probability distribution—and thus the statistical table—on which the critical value is based is also different for each target parameter. This means that, in addition to choosing the appropriate formula to compute the t-stat, we also need to select the appropriate probability distribution depending on the parameter we are testing.

Luckily, there are only 4 different probability distributions for the 6 hypothesis tests covered in this article (one/two means, one/two proportions and one/two variances):

  • the standard Normal distribution, used for the tests on:
    • one and two means with known population variance(s)
    • two paired samples where the variance of the difference between the 2 samples \(\sigma^2_D\) is known
    • one and two proportions (given that some assumptions are met)
  • the Student distribution, used for the tests on:
    • one and two means with unknown population variance(s)
    • two paired samples where the variance of the difference between the 2 samples \(\sigma^2_D\) is unknown
  • the Chi-square distribution, used for the test on one variance
  • the Fisher distribution, used for the test on two variances

Each probability distribution also has its own parameters (up to two parameters for the 4 distributions considered here), defining its shape and/or location. The parameter(s) of a probability distribution can be seen as its DNA: the distribution is entirely defined by its parameter(s).

Let us take our initial scenario as an example: a health professional would like to test whether the mean weight of Belgian adults is different than 80 kg.

The underlying probability distribution of a test on one mean is either the standard Normal or the Student distribution, depending on whether the variance of the population (not sample variance!) is known or unknown: 6

  • If the population variance is known \(\rightarrow\) the standard Normal distribution is used
  • If the population variance is unknown \(\rightarrow\) the Student distribution is used

If no population variance is explicitly given, you can assume that it is unknown since you cannot compute it based on a sample. If you could compute it, that would mean you have access to the entire population and there is, in this case, no point in performing a hypothesis test (you could simply use some descriptive statistics to confirm or refute your belief).

In our example, no population variance is specified so it is assumed to be unknown. We therefore use the Student distribution.

The Student distribution has one parameter which defines it; the number of degrees of freedom. The number of degrees of freedom depends on the type of hypothesis test. For instance, the number of degrees of freedom for a test on one mean is equal to the number of observations minus one ( \(n\) - 1). Without going too far into the details, the - 1 comes from the fact that there is one quantity which is estimated (i.e., the mean). 7 The sample size being equal to 10 in our example, the degrees of freedom is equal to \(n\) - 1 = 10 - 1 = 9.

There is only one last element missing to find the critical value: the significance level . The significance level , denoted \(\alpha\) , is the probability of wrongly rejecting the null hypothesis, so the probability of rejecting the null hypothesis although it is in reality true . In this sense, it is an error (type I error, as opposed to the type II error 8 ) that we accept to deal with, in order to be able to draw conclusions about a population based on a subset of it.

As you may have read in many statistical textbooks, the significance level is very often set to 5%. 9 In some fields (such as medicine or engineering, among others), the significance level is also sometimes set to 1% to decrease the error rate.

It is best to specify the significance level before performing a hypothesis test to avoid the temptation to set it in accordance with the results (the temptation is even bigger when the results are on the edge of being significant). As I always tell my students, you cannot “guess” nor compute the significance level. Therefore, if it is not explicitly specified, you can safely assume it is 5%. In our case, we did not indicate it, so we take \(\alpha\) = 5% = 0.05.

Furthermore, in our example, we want to test whether the mean weight of Belgian adults is different than 80 kg. Since we do not specify the direction of the test, it is a two-sided test . If we wanted to test that the mean weight was less than 80 kg ( \(H_1: \mu <\) 80) or greater than 80 kg ( \(H_1: \mu >\) 80), we would have done a one-sided test.

Make sure that you perform the correct test (two-sided or one-sided) because it has an impact on how to find the critical value (see more in the following paragraphs).

So now that we know the appropriate distribution (Student distribution), its parameter (degrees of freedom (df) = 9), the significance level ( \(\alpha\) = 0.05) and the direction (two-sided), we have all we need to find the critical value in the statistical tables :

[Figure: Student's distribution table]

By looking at the row df = 9 and the column \(t_{.025}\) in the Student’s distribution table, we find a critical value of:

\[t_{n-1; \alpha / 2} = t_{9; 0.025} = 2.262\]

One may wonder why we take \(t_{\alpha/2} = t_{.025}\) and not \(t_\alpha = t_{.05}\) since the significance level is 0.05. The reason is that we are doing a two-sided test ( \(H_1: \mu \ne\) 80), so the error rate of 0.05 must be divided in 2 to find the critical value to the right of the distribution. Since the Student’s distribution is symmetric, the critical value to the left of the distribution is simply -2.262.

Visually, the error rate of 0.05 is partitioned into two parts:

  • 0.025 to the left of -2.262 and
  • 0.025 to the right of 2.262

[Figure: Student distribution with df = 9; rejection regions of 0.025 in each tail, beyond -2.262 and 2.262]

We keep in mind these critical values of -2.262 and 2.262 for the fourth and last step.

Note that the red shaded areas in the previous plot are also known as the rejection regions. More on that in the following section.

These critical values can also be found in R, thanks to the qt() function: qt(0.975, df = 9) returns 2.262 (we ask for the 0.975 quantile because \(\alpha/2\) = 0.025 lies in the right tail).

The qt() function is used for the Student’s distribution ( q stands for quantile and t for Student). There are other functions accompanying the different distributions:

  • qnorm() for the Normal distribution
  • qchisq() for the Chi-square distribution
  • qf() for the Fisher distribution
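For readers working in Python rather than R, the same critical value can be obtained with scipy.stats (this assumes the scipy package is installed; `scipy.stats.t.ppf` plays the role of R's qt()):

```python
from scipy.stats import t

alpha = 0.05
df = 9  # n - 1

# Two-sided test: alpha/2 in each tail, so we need the 1 - alpha/2 quantile
t_crit = t.ppf(1 - alpha / 2, df)
print(round(t_crit, 3))  # 2.262
```

By symmetry of the Student distribution, the left critical value is simply `-t_crit`.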

In this fourth and last step, all we have to do is to compare the test statistic (computed in step #2) with the critical values (found in step #3) in order to conclude the hypothesis test .

The only two possibilities when concluding a hypothesis test are:

  • Rejection of the null hypothesis
  • Non-rejection of the null hypothesis

In our example of adult weight, remember that:

  • the t-stat is -2.189
  • the critical values are -2.262 and 2.262

Also remember that:

  • the t-stat gives an indication on how extreme our sample is compared to the null hypothesis
  • the critical values are the threshold from which the t-stat is considered as too extreme

To compare the t-stat with the critical values, I always recommend plotting them:

[Figure: t-stat of -2.189 plotted against the critical values -2.262 and 2.262 and the shaded rejection regions]

These two critical values form the rejection regions (the red shaded areas):

  • from \(- \infty\) to -2.262, and
  • from 2.262 to \(\infty\)

If the t-stat lies within one of the rejection regions, we reject the null hypothesis . On the contrary, if the t-stat does not lie within any of the rejection regions, we do not reject the null hypothesis .

As we can see from the above plot, the t-stat is less extreme than the critical value and therefore does not lie within any of the rejection regions. In conclusion, we do not reject the null hypothesis that \(\mu = 80\) .

This is the conclusion in statistical terms, but it is meaningless without a proper interpretation. So it is good practice to also interpret the result in the context of the problem:

At the 5% significance level, we do not reject the hypothesis that the mean weight of Belgian adults is 80 kg.

From a more philosophical (but still very important) perspective, note that we wrote “we do not reject the null hypothesis” and “we do not reject the hypothesis that the mean weight of Belgian adults is equal to 80 kg”. We did not write “we accept the null hypothesis” nor “the mean weight of Belgian adults is 80 kg”.

The reason is due to the fact that, in hypothesis testing, we conclude something about the population based on a sample. There is, therefore, always some uncertainty and we cannot be 100% sure that our conclusion is correct.

Perhaps it is the case that the mean weight of Belgian adults is in reality different than 80 kg, but we failed to prove it based on the data at hand. It may be the case that if we had more observations, we would have rejected the null hypothesis (since all else being equal, a larger sample size implies a more extreme t-stat). Or, it may be the case that even with more observations, we would not have rejected the null hypothesis because the mean weight of Belgian adults is in reality close to 80 kg. We cannot distinguish between the two.

So we can just say that we did not find enough evidence against the hypothesis that the mean weight of Belgian adults is 80 kg, but we do not conclude that the mean is equal to 80 kg.

If the difference is still not clear to you, the following example may help. Suppose a person is suspected of having committed a crime. This person is either innocent—the null hypothesis—or guilty—the alternative hypothesis. In the attempt to find out whether the suspect committed the crime, the police collects as much information and evidence as possible. This is similar to the researcher collecting data to form a sample. Then the judge, based on the collected evidence, decides whether the suspect is considered innocent or guilty. If there is enough evidence that the suspect committed the crime, the judge will conclude that the suspect is guilty. In other words, she will reject the null hypothesis of the suspect being innocent because there is enough evidence that the suspect committed the crime.

This is similar to the t-stat being more extreme than the critical value: we have enough information (based on the sample) to say that the null hypothesis is unlikely because our data would be too extreme if the null hypothesis were true. Since the sample cannot be “wrong” (it corresponds to the collected data), the only remaining possibility is that the null hypothesis is in fact wrong. This is the reason we write “we reject the null hypothesis”.

On the other hand, if there is not enough evidence that the suspect committed the crime (or no evidence at all), the judge will conclude that the suspect is considered not guilty. In other words, she will not reject the null hypothesis of the suspect being innocent. But even if she concludes that the suspect is considered not guilty, she will never be 100% sure that he is really innocent.

It may be the case that:

  • the suspect did not commit the crime, or
  • the suspect committed the crime but the police was not able to collect enough information against the suspect.

In the former case the suspect is really innocent, whereas in the latter case the suspect is guilty but the police and the judge failed to prove it because they failed to find enough evidence against him. Similar to hypothesis testing, the judge has to conclude the case by considering the suspect not guilty, without being able to distinguish between the two.

This is the main reason we write “we do not reject the null hypothesis” or “we fail to reject the null hypothesis” (you may even read in some textbooks conclusion such as “there is no sufficient evidence in the data to reject the null hypothesis”), and we do not write “we accept the null hypothesis”.

I hope this metaphor helped you to understand the reason why we reject the null hypothesis instead of accepting it.

In the following sections, we present two other methods used in hypothesis testing.

These methods will result in the exact same conclusion: non-rejection of the null hypothesis, that is, we do not reject the hypothesis that the mean weight of Belgian adults is 80 kg. They are thus presented only in case you prefer to use them over the first one.

Method B, which consists in computing the p -value and comparing this p -value with the significance level \(\alpha\) , boils down to the following 4 steps:

  • Stating the null and alternative hypothesis
  • Computing the test statistic
  • Computing the p-value
  • Concluding and interpreting the results

In this second method, which uses the p-value, the first and second steps are the same as in the first method.

The null and alternative hypotheses remain the same:

  • \(H_0: \mu = 80\)
  • \(H_1: \mu \ne 80\)

Remember that the formula for the t-stat is different depending on the type of hypothesis test (one or two means, one or two proportions, one or two variances). In our case of one mean with unknown variance, the test statistic is the same as the one computed in method A: \(t_{obs} = -2.189\).

The p-value is the probability (so it goes from 0 to 1) of observing a sample at least as extreme as the one we observed if the null hypothesis were true. In some sense, it gives you an indication of how likely your null hypothesis is . It is also defined as the smallest level of significance for which the data indicate rejection of the null hypothesis.

For more information about the p -value, I recommend reading this note about the p -value and the significance level \(\alpha\) .

Formally, the p -value is the area beyond the test statistic. Since we are doing a two-sided test, the p -value is thus the sum of the area above 2.189 and below -2.189.

Visually, the p -value is the sum of the two blue shaded areas in the following plot:

[Figure: the p-value as the sum of the two shaded tail areas, below -2.189 and above 2.189]

The p-value can be computed with precision in R with the pt() function: for our two-sided test, 2 * pt(-2.189, df = 9) gives the p-value.

The p -value is 0.0563, which indicates that there is a 5.63% chance to observe a sample at least as extreme as the one observed if the null hypothesis were true. This already gives us a hint on whether our t-stat is too extreme or not (and thus whether our null hypothesis is likely or not), but we formally conclude in step #4.

Like the qt() function to find the critical value, we use pt() to find the p -value because the underlying distribution is the Student’s distribution.

Use pnorm() , pchisq() and pf() for the Normal, Chi-square and Fisher distribution, respectively. See also this Shiny app to compute the p -value given a certain t-stat for most probability distributions.

If you do not have access to a computer (during exams for example) you will not be able to compute the p -value precisely, but you can bound it using the statistical table referring to your test.

In our case, we use the Student distribution and we look at the row df = 9 (since df = n - 1):

[Figure: Student's distribution table, row df = 9]

  • The test statistic is -2.189
  • We take the absolute value, which gives 2.189
  • The value 2.189 is between 1.833 and 2.262 (highlighted in blue in the above table)
  • The area to the right of 1.833 is 0.05
  • The area to the right of 2.262 is 0.025
  • So the area to the right of 2.189 must be between 0.025 and 0.05
  • Since the Student distribution is symmetric, the area to the left of -2.189 must also be between 0.025 and 0.05
  • Therefore, the sum of the two areas must be between 0.05 and 0.10
  • In other words, the p-value is between 0.05 and 0.10 (i.e., 0.05 < p-value < 0.10)

Although we could not compute it precisely, it is enough to conclude our hypothesis test in the last step.

The final step is now simply to compare the p-value (computed in step #3) with the significance level \(\alpha\). As for all statistical tests :

  • If the p -value is smaller than \(\alpha\) ( p -value < 0.05) \(\rightarrow H_0\) is unlikely \(\rightarrow\) we reject the null hypothesis
  • If the p -value is greater than or equal to \(\alpha\) ( p -value \(\ge\) 0.05) \(\rightarrow H_0\) is likely \(\rightarrow\) we do not reject the null hypothesis

No matter if we take into consideration the exact p -value (i.e., 0.0563) or the bounded one (0.05 < p -value < 0.10), it is larger than 0.05, so we do not reject the null hypothesis. 10 In the context of the problem, we do not reject the null hypothesis that the mean weight of Belgian adults is 80 kg.

Remember that rejecting (or not rejecting) a null hypothesis at the significance level \(\alpha\) using the critical value method (method A) is equivalent to rejecting (or not rejecting) the null hypothesis when the p -value is lower (equal or greater) than \(\alpha\) (method B).

This is the reason we find the exact same conclusion as with method A, and why you should too if you use both methods on the same data and with the same significance level.

Method C, which consists in computing the confidence interval and comparing this confidence interval with the target parameter (the parameter under the null hypothesis), boils down to the following 3 steps:

  • Stating the null and alternative hypothesis
  • Computing the confidence interval
  • Concluding and interpreting the results

In this last method, which uses the confidence interval, the first step is the same as in the first two methods.

Like hypothesis testing, confidence intervals are a well-known tool in inferential statistics.

A confidence interval is an estimation procedure which produces an interval (i.e., a range of values) containing the true parameter with a certain (usually high) probability.

In the same way that there is a formula for each type of hypothesis test when computing the test statistic, there exists a formula for each type of confidence interval. Formulas for the different types of confidence intervals can be found in this Shiny app .

Here is the formula for a confidence interval on one mean \(\mu\) (with unknown population variance):

\[ (1-\alpha)\text{% CI for } \mu = \bar{x} \pm t_{\alpha/2, n - 1} \frac{s}{\sqrt{n}} \]

where \(t_{\alpha/2, n - 1}\) is found in the Student distribution table (and is similar to the critical value found in step #3 of method A).

Given our data and with \(\alpha\) = 0.05, we have:

\[ \begin{aligned} 95\text{% CI for } \mu &= \bar{x} \pm t_{\alpha/2, n - 1} \frac{s}{\sqrt{n}} \\ &= 71 \pm 2.262 \frac{13}{\sqrt{10}} \\ &= [61.70; 80.30] \end{aligned} \]

The 95% confidence interval for \(\mu\) is [61.70; 80.30] kg. But what does a 95% confidence interval mean?

We know that this estimation procedure has a 95% probability of producing an interval containing the true mean \(\mu\). In other words, if we construct many confidence intervals (with different samples of the same size), 95% of them will, on average, include the mean of the population (the true parameter). So on average, 5% of these confidence intervals will not cover the true mean.

If you wish to decrease this last percentage, you can decrease the significance level (set \(\alpha\) = 0.01 or 0.02 for instance). All else being equal, this will increase the range of the confidence interval and thus increase the probability that it includes the true parameter.
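The interval computed above can be reproduced with a short Python sketch (scipy assumed available):

```python
import math
from scipy.stats import t

x_bar, s, n = 71, 13, 10
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, n - 1)   # about 2.262
margin = t_crit * s / math.sqrt(n)

ci_lower = x_bar - margin
ci_upper = x_bar + margin
print(f"95% CI: [{ci_lower:.2f}; {ci_upper:.2f}]")  # [61.70; 80.30]

# Method C decision: is the hypothesized value inside the interval?
contains_mu_0 = ci_lower <= 80 <= ci_upper
print(contains_mu_0)  # True: we do not reject H0
```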

The final step is simply to compare the confidence interval (constructed in step #2) with the value of the target parameter (the value under the null hypothesis, mentioned in step #1):

  • If the confidence interval does not include the hypothesized value \(\rightarrow H_0\) is unlikely \(\rightarrow\) we reject the null hypothesis
  • If the confidence interval includes the hypothesized value \(\rightarrow H_0\) is likely \(\rightarrow\) we do not reject the null hypothesis

In our example:

  • the hypothesized value is 80 (since \(H_0: \mu\) = 80)
  • 80 is included in the 95% confidence interval since it goes from 61.70 to 80.30 kg
  • So we do not reject the null hypothesis

In the terms of the problem, we do not reject the hypothesis that the mean weight of Belgian adults is 80 kg.

As you can see, the conclusion is equivalent to that obtained with the critical value method (method A) and the p -value method (method B). Again, this must be the case since we use the same data and the same significance level \(\alpha\) for all three methods.

All three methods give the same conclusion. However, each method has its own advantage, so I usually select the most convenient one depending on the situation:

  • Method A (critical value): in my opinion, the easiest and most straightforward of the three when I do not have access to R.
  • Method B ( p -value): in addition to knowing whether the null hypothesis is rejected or not, computing the exact p -value can be very convenient, so I tend to use this method when I have access to R.
  • Method C (confidence interval): if I need to test several hypothesized values , I tend to choose this method because I can construct one single confidence interval and compare it to as many values as I want. For example, with our 95% confidence interval [61.70; 80.30], I know that any value below 61.70 kg or above 80.30 kg will be rejected, without running a separate test for each value.
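The convenience of method C is easy to demonstrate in code: one interval, computed once, screens any number of hypothesized values. A sketch in Python (assuming scipy; the candidate values are hypothetical):

```python
from math import sqrt
from scipy import stats

# 95% CI from the example: x-bar = 71, s = 13, n = 10
n, xbar, s, alpha = 10, 71, 13, 0.05
margin = stats.t.ppf(1 - alpha / 2, n - 1) * s / sqrt(n)
lo, hi = xbar - margin, xbar + margin  # ~ [61.70, 80.30]

# One interval screens any number of hypothesized means (values are illustrative)
for mu0 in (60, 75, 80, 85):
    decision = "reject H0" if not (lo <= mu0 <= hi) else "do not reject H0"
    print(mu0, decision)
```

Each candidate value is simply compared against the interval endpoints, so no additional test statistic needs to be computed per value.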

In this article, we reviewed the goals of hypothesis testing and the settings in which it is used. We then showed how to do a hypothesis test by hand through three different methods (A. critical value , B. p -value and C. confidence interval ). We also showed how to interpret the results in the context of the initial problem.

Although all three methods give the exact same conclusion when using the same data and the same significance level (otherwise there is a mistake somewhere), I also presented my personal preferences when it comes to choosing one method over the other two.

Thanks for reading.

I hope this article helped you to understand the structure of a hypothesis test done by hand. I remind you that, at least for the 6 hypothesis tests covered in this article, the formulas differ, but the structure and the reasoning behind them remain the same. So you basically need to know which formulas to use, and then simply follow the steps described in this article.

For the interested reader, I created two accompanying Shiny apps:

  • Hypothesis testing and confidence intervals : after entering your data, the app illustrates all the steps in order to conclude the test and compute a confidence interval. See more information in this article .
  • How to read statistical tables : the app helps you to compute the p -value given a t-stat for most probability distributions. See more information in this article .

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

Suppose a researcher wants to test whether Belgian women are taller than French women. Suppose a health professional would like to know whether the proportion of smokers is different among athletes and non-athletes. It would take way too long to measure the height of all Belgian and French women and to ask all athletes and non-athletes their smoking habits. So most of the time, decisions are based on a representative sample of the population and not on the whole population. If we could measure the entire population in a reasonable time frame, we would not do any inferential statistics. ↩︎

Don’t get me wrong, this does not mean that hypothesis tests are never used in exploratory analyses. It is just much less frequent in exploratory research than in confirmatory research. ↩︎

You may see more or fewer steps in other articles or textbooks, depending on whether these steps are detailed or concise. Hypothesis testing should, however, follow the same process regardless of the number of steps. ↩︎

For one-sided tests, writing \(H_0: \mu = 80\) or \(H_0: \mu \ge 80\) are both correct. The point is that the null and alternative hypothesis must be mutually exclusive since you are testing one hypothesis against the other, so both cannot be true at the same time. ↩︎

To be complete, there are even different formulas within each type of test, depending on whether some assumptions are met or not. For the interested reader, see all the different scenarios and thus the different formulas for a test on one mean and on two means . ↩︎

There is more uncertainty if the population variance is unknown than if it is known, and this greater uncertainty is taken into account by using the Student distribution instead of the standard Normal distribution. Also note that as the sample size increases, the degrees of freedom of the Student distribution increase and the two distributions become more and more similar. For large sample sizes (usually from \(n >\) 30), the Student distribution becomes so close to the standard Normal distribution that, even if the population variance is unknown, the standard Normal distribution can be used. ↩︎
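The convergence described in this footnote can be checked numerically. A quick sketch in Python (assuming scipy):

```python
from scipy import stats

# 97.5% quantiles of the Student distribution approach the Normal value as df grows
z = stats.norm.ppf(0.975)  # ~ 1.960
for df in (5, 30, 100, 1000):
    # Heavier tails at small df give larger critical values
    print(df, round(stats.t.ppf(0.975, df), 3))
```

At 5 degrees of freedom the Student quantile is noticeably larger than 1.96; by 1000 degrees of freedom the two are practically indistinguishable.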

For a test on two independent samples, the degrees of freedom is \(n_1 + n_2 - 2\) , where \(n_1\) and \(n_2\) are the size of the first and second sample, respectively. Note the - 2 due to the fact that in this case, two quantities are estimated. ↩︎

The type II error is the probability of not rejecting the null hypothesis although it is in reality false. ↩︎

Whether this is a good or a bad standard is a question that comes up often and is debatable. This is, however, beyond the scope of the article. ↩︎

Again, p -values found via a statistical table or via R must be coherent. ↩︎


Statistics Teacher


Developing the Theory of Hypothesis Testing: An Exploration


Craig Lazarski, Cary Academy

There are many concepts associated with hypothesis testing, but it all comes down to variation. How unusual is the variation we observe in a sample?

Students can often lose sight of this basic idea once they have learned the various procedures introduced in an introductory statistics course. Further, they may blindly follow the procedure and never question the impact of the sample size or magnitude of variation on the conclusion they draw.

I developed an R Shiny app inspired by a dice activity in which students try to estimate the distribution of dice produced by six companies and determine if they are fair. The app allows students to test hypotheses and provides the test statistics, p -values, and sampling distribution based on chosen sample size so they can explore the effect sample size has on p -values. It also allows students to explore the power of a test by repeatedly testing the same hypothesis with randomly sampled data and examining visual evidence of the rejection rate. Through this activity, students develop an intuitive understanding of the power of a hypothesis test and improve their understanding of how and why a hypothesis test produces results.

This task was inspired by a simple question: Can you tell if a die is fair? To anyone versed in basic statistical principles, this is a straight-forward task. You should roll the die as many times as possible, and then the law of large numbers will take over and reveal the true distribution. When students approach this task, the results can be surprising.
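Before turning to the app, the law of large numbers itself is easy to demonstrate with a quick simulation (Python standard library only; this is an illustrative sketch, not the app's code):

```python
import random
from collections import Counter

random.seed(42)

# Roll a fair six-sided die and watch the face proportions approach 1/6
for n in (100, 1_000, 100_000):
    counts = Counter(random.randint(1, 6) for _ in range(n))
    props = {face: counts[face] / n for face in range(1, 7)}
    # Largest deviation of any face's observed proportion from 1/6
    print(n, round(max(abs(p - 1 / 6) for p in props.values()), 4))
```

With 100 rolls the deviations are large enough to mask a modest bias; with 100,000 rolls the proportions settle tightly around the true distribution.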

Day 1: Classroom Data Exploration

To explore this question, I directed students to the first page of the Shiny app. I asked each student to explore one dice manufacturing company. To start, I showed the students how to select a company and choose a sample size. Next, I asked them to describe the results presented by the app, but I did not give them further directions.

After allowing a few minutes for exploration, we came together for a joint discussion and I asked, “Who thinks they found a company that has fair dice?” One student responded that their company, High Rollers, was fair. Then I asked if anyone else in class had this company and, if so, did they come to the same conclusion? Another student responded that they found the company to be unfair.

Analyzing Student Responses for Group Discussion When I asked the students to explain their results and how they came to their conclusions, I got the following responses:

Student 1: I set the sample size to 100 and looked at the graph. The bars were relatively close to the same heights. Then I repeated this three times and the results were similar, so I concluded the die was fair.

Figure 1: Student 1 histogram with 100 samples

Student 2: I set the sample size to 10,000 and looked at the graph. The bars were different and, when I repeated this, it consistently showed a similar pattern each time.

Figure 2: Student 2 histogram with 10,000 samples

Student 1 considered the variation and believed what they were seeing was natural variation one should expect when rolling the die 100 times. Since the bars changed for each sample but the overall pattern seemed to be around a single value for each bar, they concluded it was fair.

Student 2 discovered the law of large numbers. They recognized that the results from sample to sample were very consistent and the bars were not equal. Student 1, after hearing Student 2’s explanation, quickly recanted their theory.

Once we discussed the law of large numbers in more detail, I asked again, “Who thinks they have a fair company?” Another student responded that Pips and Dots is a fair company, and they also used the law of large numbers as the basis for their conclusion. Figure 3 is a graph of 10,000 rolls of a Pips and Dots die using the simulation.

Figure 3: A graph of 10,000 rolls showing Pips and Dots is the only company with a fair distribution


The students all seemed fairly convinced that this company produced a fair die, but I asked them if they were concerned that not all the bars were exactly equal. Could it be that the die is only slightly unfair? How can we measure such deviations and decide? Essentially, I was leading my students to the need for a hypothesis test in this scenario.

We took the results above for the fair die, and I developed the goodness of fit procedure with my students. These students had already completed a lesson on one- and two-proportion z-tests, so hypothesis testing was not a new concept. Our hypothesis test led us to the correct conclusion that they were fair.
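The goodness of fit procedure developed with the class can be reproduced in a few lines. A sketch in Python using scipy's `chisquare` (the counts below are hypothetical, not the app's data):

```python
from scipy.stats import chisquare

# Hypothetical counts from 600 rolls of a die that looks roughly fair
observed = [90, 110, 100, 95, 105, 100]  # one count per face
# Default null hypothesis: all faces equally likely (expected = 100 each)
stat, p = chisquare(observed)
print(round(stat, 2), round(p, 3))  # statistic 2.5, p well above 0.05
```

With a p-value far above 0.05, the test does not reject the hypothesis that the die is fair, even though the bars are not exactly equal.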

End of Day 1: Student Homework Assignment

For homework, I asked students to log in to the Shiny app from home and use the second tab—“Chi Square Analysis”—to complete their homework. This second tab is an extension of the first.

Background on the Chi Square Analysis Logic and Code Just like the first tab, I use the same distribution for each of the dice manufacturing companies throughout every tab in the app. These distributions can be configured through a simple text file, company_weights.csv, which is loaded when the Shiny app is launched.

The only options on this tab are to again select a manufacturer and to set the sample size, indicating how many rolls the student wishes to simulate. The output now includes two graphs. The first is a modified version of the histogram exactly as shown on the first tab, but it has an additional line drawn to indicate the expected count for each face if the die were fair.

I added a second graph that plots the test statistic against a chi square distribution with five degrees of freedom that models the rolls of a six-sided fair die and indicates the rejection region. Now, each time the student clicks the “Update Sample Size” button, the test statistic is plotted on this second graph, allowing the students to see if it lies in the rejection region (see Figure 4).

Note that the alpha level is fixed at 0.05. Further, the actual test statistic and p -value (rounded to four decimal places) are provided under the summary information.
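The fixed rejection region can be computed directly: with \(\alpha\) = 0.05 and 5 degrees of freedom, its boundary is the 95th percentile of the chi square distribution (Python, assuming scipy):

```python
from scipy.stats import chi2

# Rejection region boundary for a goodness of fit test on a six-sided die:
# alpha = 0.05, df = 6 - 1 = 5
crit = chi2.ppf(0.95, df=5)
print(round(crit, 2))  # 11.07 -- test statistics beyond this fall in the rejection region
```

Any plotted test statistic to the right of this value leads to rejecting the null hypothesis of fairness.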

Figure 4: Example of chi square analysis plotting our test statistic


Tasks Left for Student Completion Students were asked to respond to the following questions for this exploration:

  • Do you believe the dice produced by Dice Dice Baby are fair? Without doing a goodness of fit test, what evidence is there for your conclusion?
  • What is the minimum sample size that can be used to perform a goodness of fit test for this company if we want to ensure the expected counts are at least 5?
  • Use the app to generate a sample using the minimum sample size and conduct a goodness of fit test.
  • Use the app to generate a sample of 600. Conduct a goodness of fit test using this data.
  • The Dice Dice Baby company is not producing fair dice. Go to the second tab on the app, “Chi-Square Analysis.” Using the Dice Dice Baby company, run several tests using a sample size of 30 and note the p -value and conclusion you would make. (You will find the test statistic and p -value under the frequency table.) Repeat using a sample size of 600 and note the p -value and conclusion you would make. Make a conjecture about the role sample size plays in a hypothesis test.
  • Use the app to determine the minimum sample size necessary so the test reliably concludes that the dice are not fair. Reliably means that, in most cases, the test will correctly determine the dice are unfair.

These questions asked students to perform a goodness of fit test on a company they thought to be unfair. Using a small sample size, the student should reach a fail to reject conclusion, while a large sample size would lead them to reject. Using the app, the students are able to quickly execute many iterations of this activity to ensure the response they are seeing isn’t a fluke.
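The sample-size effect this homework targets can be shown deterministically: hold the pattern of deviation from fairness fixed and scale the number of rolls (Python, assuming scipy; the biased proportions are hypothetical):

```python
from scipy.stats import chisquare

# Hypothetical biased die: face 1 comes up 25% of the time, the rest 15% each.
# Same proportions, different numbers of rolls:
small = [15, 9, 9, 9, 9, 9]        # n = 60 rolls
large = [150, 90, 90, 90, 90, 90]  # n = 600 rolls

stat_s, p_s = chisquare(small)  # statistic 3.0 -> large p, fail to reject
stat_l, p_l = chisquare(large)  # statistic 30.0 -> tiny p, reject
print(round(p_s, 3), p_l)
```

The chi square statistic scales linearly with the sample size when the observed proportions stay the same, so the identical deviation pattern is invisible at 60 rolls and overwhelming at 600.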

Sample Responses from Homework Task Question 5 asks students to explain what they observed. Sample responses include the following:

Student 1: “The smaller the sample size, the larger the p -value and therefore the more likely it is we would fail to reject the null hypothesis of each face being rolled equally. The larger the sample size, the smaller the p -value and the more likely we will reject the null hypothesis.”

Student 2: “I believe that the greater the sample size, the more ‘consistency’ there will be in p -values within samples.”

Student 3: “The smaller the sample size, the more scattered and unreliable the results are in a hypothesis test.”

Student 4: “According to these tests, a greater sample size is more likely to produce accurate and precise results, which would help with the certainty with which we can claim to reject or fail to reject the null hypothesis.”

The responses indicate the students were developing an understanding that a hypothesis test is not infallible. They were seeing that the sample size has a direct effect on a test correctly identifying a false null hypothesis. Essentially, I took away their blind faith in hypothesis testing. Simply reaching a conclusion does not mean the student has reached the correct conclusion.

The misconception that a hypothesis test always provides the correct conclusion is a vital one to correct, and this interactive, visual exploration of test statistics makes that readily apparent to the student.

Task 3: Power of Tests

In the “Power” tab of the Shiny app is the third and final task for the students to complete. This task can be explored after an in-class discussion with the students about their homework in the previous section. The goal of this task is to restore the students’ faith in hypothesis testing by exploring the power of a test.

Technical Notes

The app was developed in R using ggplot2 and the Shiny library for web applications. The source code is complete and licensed under the GPL v2.0 (free to use, code modifications and improvements should be shared with the author).

The demo app has been deployed on the free-to-use shinyapps.io platform. If you are unable to run the demo, it is most likely because the free version allows for a limited number of compute hours per user per month. You can also download the code directly from git and run it on your own machine in RStudio.

For classroom use, two classes with 18 and nine students, respectively, used approximately 20 compute hours on the shinyapps.io platform to complete all the tasks. The free version of the platform gives only 25 compute hours per month, so this will run out quickly for a large class.

For the remainder of my classes, we used ShinyProxy to serve the application to the rest of our students. ShinyProxy allows you to set up a server that will auto-scale and support hundreds of students concurrently, but must be hosted on your own web server.

To accomplish this, a docker image must be built, and the docker set-up files can also be found in the GitHub repository. See the Dockerfile and Rprofile.site files for configuration. An experienced IT person should be able to configure this in 1–2 hours.

To begin, the students will again select a manufacturing company, choose a sample size, and perform a set number of hypothesis tests (number of simulations). Since the students had already explored the underlying distribution of the manufacturers in the first two tasks, I gave them the true distribution of each manufacturer just below the “Update Sample Size” button. As students change the manufacturer in the drop-down selector, the true distribution shown in the left panel updates automatically.

After some manual exploration, I decided to set the initial sample size to 10 and the number of simulations to 100. These parameters are intentionally not large enough to show the true power of the hypothesis test, and it is up to the students to adjust the numbers until they reach their own conclusions about the power of the test.

Just like tab two, we displayed the computed test statistics using a dot plot; however, the graph no longer had the chi square distribution overlaid as it did on the homework task.

After students clicked the “Update Sample Size” button, a distribution of the test statistics for the number of tests was displayed. The distribution was separated by a reject region that indicated those test statistics that would lead to a conclusion of reject versus those that would lead to a fail to reject conclusion. An example using dice from Dice Dice Baby with a sample size of 100 and running 100 simulations is shown in Figure 5.


Figure 5: Power of a test example

Below this graph, I also presented students with the type II error and power rate (see Figure 6) in the Shiny app.

Figure 6: Power decision rates


The power was calculated as the number of observed test statistics that led to rejecting the null hypothesis out of the total number of simulations run.
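This rejection-rate calculation amounts to a Monte Carlo estimate of power. A sketch in Python (assuming numpy and scipy; the unfair distribution below is hypothetical, not the app's actual configuration):

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(1)

# Hypothetical unfair die: face 1 at 25%, the rest at 15% each
probs = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]
n_rolls, n_sims, alpha = 600, 200, 0.05

rejections = 0
for _ in range(n_sims):
    rolls = rng.choice(6, size=n_rolls, p=probs)   # one simulated sample
    counts = np.bincount(rolls, minlength=6)
    _, p = chisquare(counts)                        # goodness of fit vs. fair die
    if p < alpha:
        rejections += 1

# Proportion of tests that correctly reject the (false) null hypothesis
power = rejections / n_sims
print(power)
```

With 600 rolls per sample and this degree of unfairness, nearly every simulated test rejects, which matches the high power students observe in the app at large sample sizes.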

Power Analysis: Guided Activity

Prior to starting the power tasks, I asked the students if they found anything strange or unexpected from their homework exploration in Task 2. One student responded that they had. The student explained that the hypothesis did not always reject when it should and changing the sample size helped. When the sample size was increased, the test more consistently rejected.

After the discussion, I asked students to work in small groups through the Task 3 activity on the “Power” tab and try to answer all the questions. As they worked collaboratively, I walked around listening and observing their work. I noted students were quickly developing an understanding of the power of a test and what a type II error was. Following are one group of students’ responses:

Dice R Us is a company close to being fair. What size sample is needed for the power to be near 90 percent?
The power is near 90 percent around 575 rolls for Dice R Us.

Dice Dice Baby is a company that is far from being fair. What size sample is needed for the power to be near 90 percent?
The power is near 90 percent around 200 rolls for Dice Dice Baby.

For the two companies you just explored, how is the sample size related to the power of the test?
For both companies that were just explored, the power of the test increases with sample size.

How is the degree to which the truth varies from the null hypothesis related to the power of the test?
If the truth is far from the null hypothesis, the power will be much higher, as it will be easier for the test to reject the null hypothesis.

Type II error is also presented. What is its relation to power? Can you explain what a type II error is?
Type II error is the opposite of power, meaning it’s the proportion of simulations that failed to reject the null hypothesis even when it was false.

The students above are clearly showing a deep understanding of both power and type II errors. They understand the role sample size has, as well as the magnitude of the difference from the null hypothesis.

After the exploration, we discussed the answers to the above questions as a class and students showed similar understanding.

Finally, we addressed group responses to the last question in this task, which asks:

Pips and Dots is a company that produced fair dice. Evaluate the power of the test for this company. Can you explain what you are seeing?

The purpose of this question is to challenge students’ understanding of everything they just demonstrated in their prior exploration. The company being analyzed, Pips and Dots, is the company that produced fair dice. Therefore, the test should never reject and, in theory, the question of power is irrelevant.

However, the app still generated a value for power as shown in Figure 7.

Figure 7: Test statistics of the “fair dice” company


Following are examples of the students’ responses to this question:

Group 1: Since the dice were fair, the power is very low, as it is much harder to reject a null hypothesis that is correct.

Group 2: The power is extremely low and, as the sample size increases, the power only gets lower because when a company is fair, the company should not reject the null, and the power is the chance that it will correctly reject the null.

Group 3: The power stays low even when the sample size is high. The reason it stays low is because why would you reject a correct null?

These responses led to a discussion about the nuance of power in which I specified that we only calculate the power of a test for specific alternatives. When we assume the null is false, we then attempt to determine what the chance is that our test will catch it.

Students understood the power was irrelevant in this case. Even more impressive, a few students recognized the power was describing a mistake. The power was calculating how often the test incorrectly rejected a true null hypothesis, which is similar to a type I error.

It was important that I noted the power calculated was not a type I error, but the connection they made cemented their understanding of the three elements used to evaluate the quality of a hypothesis test.

Personally, my understanding was also affected. One student asked if this was related to the confidence level when constructing a confidence interval. I had never made this connection before, and it is absolutely true! Just as the confidence level evaluates how often an interval will capture a parameter, the power of a test evaluates how often a hypothesis test will correctly reject a false null.

My conclusion after completing this activity is that the students developed an intuitive understanding of what it means to have a test with high power and the types of mistakes that are possible when completing any hypothesis test.

In the past, I have observed students could often parrot back the technical definitions but had trouble interpreting them in their proper context. After completing this activity, students were able to easily identify errors in context, in addition to the power of a test. And they demonstrated an ability to think more critically about the procedures they were employing.

Statistics Teacher (ST) is an online journal published by the American Statistical Association (ASA) – National Council of Teachers of Mathematics (NCTM) Joint Committee on Curriculum in Statistics and Probability for Grades K-12. ST supports the teaching and learning of statistics through education articles, lesson plans, announcements, professional development opportunities, technology, assessment, and classroom resources. Authors should use this form to submit articles or lesson plans.



Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H 0 ) and an alternate hypothesis (H a or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.


After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women.
  • H a : Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

For example, a t test comparing the height of men and women will give you:

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.
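A sketch of such a test in Python (assuming numpy and scipy; the height data are simulated purely for illustration):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Simulated heights in cm (means and spread are assumed, not real survey data)
men = rng.normal(175, 7, size=50)
women = rng.normal(165, 7, size=50)

# Estimated difference in average height between the two groups
diff = men.mean() - women.mean()

# Two-sided two-sample t test: small p-value -> difference unlikely under H0
t_stat, p_value = ttest_ind(men, women)
print(round(diff, 1), p_value)
```

The two outputs correspond exactly to the two bullets above: an effect estimate and a p-value quantifying how surprising it would be under the null hypothesis.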

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 9, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


Hypothesis Testing Calculator


The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

$\sigma$ known: $ z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} $
$\sigma$ unknown: $ t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} $
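These two formulas are easy to compute directly. A minimal Python sketch, using made-up sample numbers (n = 36, sample mean 52, hypothesized mean 50):

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """Test statistic when the population standard deviation sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def t_statistic(xbar, mu0, s, n):
    """Test statistic when sigma is unknown; s is the sample standard
    deviation, and the t distribution has n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample: n = 36, sample mean 52, testing mu0 = 50
print(z_statistic(52, 50, 6, 36))    # 2.0
print(t_statistic(52, 50, 6.5, 36))  # ~1.846
```

The only difference between the two cases is which standard deviation appears in the denominator, which is why switching between them in the calculator changes only that one input.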

Next, the test statistic is used to conduct the test using either the p-value approach or the critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis ($H_a$). A less-than sign in the alternative hypothesis indicates a lower tail test, a greater-than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

Lower Tail Test: $H_0 \colon \mu \geq \mu_0$, $H_a \colon \mu < \mu_0$
Upper Tail Test: $H_0 \colon \mu \leq \mu_0$, $H_a \colon \mu > \mu_0$
Two-Tailed Test: $H_0 \colon \mu = \mu_0$, $H_a \colon \mu \neq \mu_0$

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
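For a z test, these tail probabilities can be computed without any third-party libraries. A sketch using the standard normal distribution from Python's `statistics` module (the test statistic value is made up):

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF

z = 2.0        # hypothetical observed test statistic (sigma known)
alpha = 0.05

p_lower = phi(z)               # lower tail test: P(Z <= z)
p_upper = 1 - phi(z)           # upper tail test: P(Z >= z)
p_two = 2 * (1 - phi(abs(z)))  # two-tailed test

print(round(p_upper, 4))  # 0.0228: reject H0 in an upper tail test at alpha = 0.05
print(round(p_two, 4))    # 0.0455: reject H0 in a two-tailed test at alpha = 0.05
```

For a t test you would use the CDF of the t distribution with n − 1 degrees of freedom instead; that distribution is not in the standard library, so a package such as SciPy is typically used there.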

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.

Lower Tail Test: If $z \leq -z_\alpha$ (or $t \leq -t_\alpha$), reject $H_0$.
Upper Tail Test: If $z \geq z_\alpha$ (or $t \geq t_\alpha$), reject $H_0$.
Two-Tailed Test: If $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$ (likewise for $t$ with $t_{\alpha/2}$), reject $H_0$.
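For the z case, the critical values come from the inverse CDF (quantile function) of the standard normal distribution. A stdlib-only sketch, again with a made-up test statistic:

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf  # standard normal quantile function
alpha = 0.05

z_crit_lower = inv(alpha)        # lower tail: reject H0 if z <= this
z_crit_upper = inv(1 - alpha)    # upper tail: reject H0 if z >= this
z_crit_two = inv(1 - alpha / 2)  # two-tailed: reject H0 if |z| >= this

print(round(z_crit_upper, 3))  # 1.645
print(round(z_crit_two, 3))    # 1.96

z = 2.0  # hypothetical observed test statistic
print(z >= z_crit_upper)     # True: reject H0 in an upper tail test
print(abs(z) >= z_crit_two)  # True: reject H0 in a two-tailed test
```

Note that the two approaches always agree: comparing the p-value to α is equivalent to comparing the test statistic to the critical value.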

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

Conclusion \ Condition    $H_0$ True       $H_a$ True
Accept $H_0$              Correct          Type II Error
Reject $H_0$              Type I Error     Correct

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.
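This agreement between a two-tailed test and a confidence interval can be checked directly. A stdlib-only sketch with made-up summary numbers (n = 36, sample mean 52, known σ = 6, hypothesized mean 50):

```python
from statistics import NormalDist

nd = NormalDist()
n, xbar, sigma, mu0, alpha = 36, 52.0, 6.0, 50.0, 0.05

z_half = nd.inv_cdf(1 - alpha / 2)   # ~1.96 for a 95% interval
margin = z_half * sigma / n ** 0.5
ci = (xbar - margin, xbar + margin)  # 95% confidence interval

z = (xbar - mu0) / (sigma / n ** 0.5)
p_two = 2 * (1 - nd.cdf(abs(z)))     # two-tailed p-value

# The two-tailed test rejects H0 exactly when mu0 falls outside the interval
print(p_two <= alpha)               # True
print(not (ci[0] <= mu0 <= ci[1]))  # True: the two conclusions agree
```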


5.6 Hypothesis Tests in Depth

Establishing the parameter of interest, type of distribution to use, the test statistic, and p -value can help you figure out how to go about a hypothesis test. However, there are several other factors you should consider when interpreting the results.

Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. Remember that your assumption is just an assumption; it is not a fact, and it may or may not be true. But your sample data are real and are showing you a fact that seems to contradict your assumption.


Errors in Hypothesis Tests

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis H 0 and the decision to reject or not. The outcomes are summarized in the following table:

Figure 5.14: Type I and type II errors

Action                H0 is actually true      H0 is actually false
Do not reject H0      Correct outcome          Type II error
Reject H0             Type I error             Correct outcome

The four possible outcomes in the table are:

  • The decision is not to reject H 0 when H 0 is true (correct decision).
  • The decision is to reject H 0 when H 0 is true (incorrect decision known as a type I error ).
  • The decision is not to reject H 0 when, in fact, H 0 is false (incorrect decision known as a type II error ).
  • The decision is to reject H 0 when H 0 is false (correct decision whose probability is called the power of the test).

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

α = probability of a type I error = P (type I error) = probability of rejecting the null hypothesis when the null hypothesis is true. These are also known as false positives. We know that α is often determined in advance, and α = 0.05 is widely accepted. In that case, you are saying, “We are OK making this type of error in 5% of samples.” The p-value, by comparison, is the probability, computed assuming the null hypothesis is true, of observing a result at least as extreme as the one in your sample.

β = probability of a type II error = P (type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false. These are also known as false negatives.

The power of a test is 1 – β .

Ideally, α and β should each be as small as possible because they are probabilities of errors, but they are rarely zero. We also want the power to be as close to one as possible. For a fixed α, increasing the sample size reduces β and therefore increases the power of the test.
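The effect of sample size on β and power can be made concrete. For an upper tail z test, β equals Φ(z_α − (μ_true − μ₀)/(σ/√n)), where Φ is the standard normal CDF. A stdlib sketch with hypothetical numbers (μ₀ = 50, true mean 52, σ = 6):

```python
from statistics import NormalDist

nd = NormalDist()

def power_upper_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper tail z test when the true mean is mu_true."""
    z_crit = nd.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return 1 - nd.cdf(z_crit - shift)

# For a fixed alpha, power grows (beta shrinks) as the sample size grows:
for n in (25, 50, 100):
    print(n, round(power_upper_tail_z(50, 52, 6, n), 3))
```

With these made-up numbers, power rises from roughly 0.5 at n = 25 to above 0.95 at n = 100, which is why sample-size planning is usually framed as choosing n to hit a target power.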

Suppose the null hypothesis, H 0 , is that Frank’s rock climbing equipment is safe.

Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe. Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the type II error, in which Frank thinks his rock climbing equipment is safe, so he goes ahead and uses it.

Suppose the null hypothesis, H 0 , is that the blood cultures contain no traces of pathogen X . State the type I and type II errors.

Statistical Significance vs. Practical Significance

When the sample size becomes larger, point estimates become more precise and any real differences in the mean and null value become easier to detect and recognize. Even a very small difference would likely be detected if we took a large enough sample. Sometimes, researchers will take such large samples that even the slightest difference is detected, even differences where there is no practical value. In such cases, we still say the difference is statistically significant , but it is not practically significant.

For example, an online experiment might identify that placing additional ads on a movie review website statistically significantly increases viewership of a TV show by 0.001%, but this increase might not have any practical value.
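This effect of sample size is easy to demonstrate numerically. Assuming a two-tailed one-sample z test and a made-up scenario (a 0.05-point shift on a scale where σ = 10), the same tiny effect that is invisible at n = 100 becomes “statistically significant” at n = 4,000,000:

```python
import math

def p_two_tailed(xbar, mu0, sigma, n):
    """Two-tailed normal p-value; erfc(|z|/sqrt(2)) equals 2*(1 - Phi(|z|))."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return math.erfc(abs(z) / math.sqrt(2))

p_small = p_two_tailed(100.05, 100.0, 10.0, 100)
p_large = p_two_tailed(100.05, 100.0, 10.0, 4_000_000)

print(round(p_small, 3))  # ~0.96: nowhere near significant
print(p_large)            # effectively zero: "significant" but practically meaningless
```

The underlying effect (0.005 standard deviations) is identical in both cases; only the sample size changed.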

One role of a data scientist in conducting a study often includes planning the size of the study. The data scientist might first consult experts or scientific literature to learn what would be the smallest meaningful difference from the null value. She also would obtain other information, such as a very rough estimate of the true proportion p, so that she could roughly estimate the standard error. From here, she could suggest a sample size that is sufficiently large to detect the real difference if it is meaningful. While larger sample sizes may still be used, these calculations are especially helpful when considering costs or potential risks, such as possible health impacts to volunteers in a medical study.



Significant Statistics Copyright © 2024 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Statistics By Jim

Making statistics intuitive

Statistical Hypothesis Testing Overview

By Jim Frost 59 Comments

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide .

Why You Should Perform Statistical Hypothesis Testing

[Figure: mean drug scores by group. Hypothesis testing determines whether the difference between the means is statistically significant.]

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly. For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sampling error.

Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error .

For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sampling error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!

Let’s cover some basic hypothesis testing terms that you need to know.

Background information : Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Hypothesis Testing

Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sampling error to determine which hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.

The effect is the difference between the population value and the null hypothesis value. The effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect.

Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which it evaluates for statistical significance. Learn more about Test Statistics .

An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance .

Null Hypothesis

The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H 0 .

In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defects in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.

You can think of the null as the default theory that requires sufficiently strong evidence against it in order to reject it.

For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Related post : Understanding the Null Hypothesis in More Detail

Alternative Hypothesis

The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H 1 or H A .

For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.
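As an illustration of that 2-sample setup, the pooled t statistic can be computed by hand. A stdlib-only sketch with made-up control and treatment measurements (this is a generic textbook formula, not the author's own code):

```python
import math

def two_sample_t(a, b):
    """Pooled two-sample t statistic for H0: the two population means are equal."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical measurements for two groups (all numbers made up)
control = [48.2, 51.0, 49.5, 50.3, 52.1, 47.8, 50.9, 49.4]
treatment = [55.1, 53.8, 56.4, 54.0, 57.2, 55.9, 54.6, 56.8]

t = two_sample_t(treatment, control)
# df = 8 + 8 - 2 = 14; the two-tailed critical value at alpha = 0.05 is ~2.145
print(round(t, 2), abs(t) > 2.145)  # the statistic is large: reject equal means
```

In practice you would use a tested library routine (for example, SciPy's `ttest_ind`), which also returns the p-value.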

You can specify either a one- or two-tailed alternative hypothesis:

If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is H A : μ ≠ 0, the test can detect differences both greater than and less than the null value.

A one-tailed alternative has more power to detect an effect but it can test for a difference in only one direction. For example, H A : μ > 0 can only test for differences that are greater than zero.
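The extra power of the one-tailed test, and its blindness in the other direction, are both visible in the p-values. Assuming a z test (so the tail areas come from the standard normal), a stdlib sketch with a made-up test statistic:

```python
import math

def p_upper_tail(z):
    """p-value for the one-tailed alternative H_A: mu > mu0."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_tailed(z):
    """p-value for the two-tailed alternative H_A: mu != mu0."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.8  # hypothetical observed z statistic
print(round(p_upper_tail(z), 4))  # 0.0359: significant at alpha = 0.05
print(round(p_two_tailed(z), 4))  # 0.0719: not significant at alpha = 0.05

# But the one-tailed test cannot detect an effect in the other direction:
print(round(p_upper_tail(-1.8), 4))  # 0.9641
```

This is why the choice between one- and two-tailed alternatives must be made before looking at the data, based on whether only one direction of effect matters.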

Related posts : Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained

P-Values

P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use P-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.

Related post : Interpreting P-values Correctly

Significance Level (Alpha)

The significance level, also known as alpha or α, is an evidence threshold that you set before collecting data. It is the probability of rejecting the null hypothesis when the null hypothesis is actually true.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to be able to reject the null hypothesis at the population level.

Related posts : Graphical Approach to Significance Levels and P-values and Conceptual Approach to Understanding Significance Levels

Types of Errors in Hypothesis Testing

Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.

  • False positives: You reject a null that is true. Statisticians call this a Type I error . The Type I error rate equals your significance level or alpha (α).
  • False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size , noisy data, or a small effect size. The type II error rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly infers that a sample effect exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to a Type II error. Power = 1 – β. Learn more about Power in Statistics .

Related posts : Types of Errors in Hypothesis Testing and Estimating a Good Sample Size for Your Study Using Power Analysis

Which Type of Hypothesis Test is Right for You?

There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.

To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data . This background research is necessary before you begin a study.

Related Post : Hypothesis Tests for Continuous, Binary, and Count Data

Statistical tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics !

If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:

  • How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
  • Fatality Rates in Star Trek . This example shows how to use hypothesis testing with categorical data.
  • Busting Myths About the Battle of the Sexes . A fun example based on a Mythbusters episode that assess continuous data using several different tests.
  • Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.


Reader Interactions


January 14, 2024 at 8:43 am

Hello professor Jim, how are you doing! Pls. What are the properties of a population and their examples? Thanks for your time and understanding.


January 14, 2024 at 12:57 pm

Please read my post about Populations vs. Samples for more information and examples.

Also, please note there is a search bar in the upper-right margin of my website. Use that to search for topics.


July 5, 2023 at 7:05 am

Hello, I have a question as I read your post. You say in p-values section

“P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null.”

But according to your definition of effect, the null states that an effect does not exist, correct? So what I assume you want to say is that “P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is **incorrect**.”

July 6, 2023 at 5:18 am

Hi Shrinivas,

The correct definition of p-value is that it is a probability that exists in the context of a true null hypothesis. So, the quotation is correct in stating “if the null hypothesis is correct.”

Essentially, the p-value tells you the likelihood of your observed results (or more extreme) if the null hypothesis is true. It gives you an idea of whether your results are surprising or unusual if there is no effect.

Hence, with sufficiently low p-values, you reject the null hypothesis because it’s telling you that your sample results were unlikely to have occurred if there was no effect in the population.

I hope that helps make it more clear. If not, let me know I’ll attempt to clarify!


May 8, 2023 at 12:47 am

Thanks a lot Ny best regards

May 7, 2023 at 11:15 pm

Hi Jim Can you tell me something about size effect? Thanks

May 8, 2023 at 12:29 am

Here’s a post that I’ve written about Effect Sizes that will hopefully tell you what you need to know. Please read that. Then, if you have any more specific questions about effect sizes, please post them there. Thanks!


January 7, 2023 at 4:19 pm

Hi Jim, I have only read two pages so far but I am really amazed because in few paragraphs you made me clearly understand the concepts of months of courses I received in biostatistics! Thanks so much for this work you have done it helps a lot!

January 10, 2023 at 3:25 pm

Thanks so much!


June 17, 2021 at 1:45 pm

Can you help in the following question: Rocinante36 is priced at ₹7 lakh and has been designed to deliver a mileage of 22 km/litre and a top speed of 140 km/hr. Formulate the null and alternative hypotheses for mileage and top speed to check whether the new models are performing as per the desired design specifications.


April 19, 2021 at 1:51 pm

Its indeed great to read your work statistics.

I have a doubt regarding the one sample t-test. So as per your book on hypothesis testing with reference to page no 45, you have mentioned the difference between “the sample mean and the hypothesised mean is statistically significant”. So as per my understanding it should be quoted like “the difference between the population mean and the hypothesised mean is statistically significant”. The catch here is the hypothesised mean represents the sample mean.

Please help me understand this.

Regards Rajat

April 19, 2021 at 3:46 pm

Thanks for buying my book. I’m so glad it’s been helpful!

The test is performed on the sample but the results apply to the population. Hence, if the difference between the sample mean (observed in your study) and the hypothesized mean is statistically significant, that suggests that population does not equal the hypothesized mean.

For one sample tests, the hypothesized mean is not the sample mean. It is a mean that you want to use for the test value. It usually represents a value that is important to your research. In other words, it’s a value that you pick for some theoretical/practical reasons. You pick it because you want to determine whether the population mean is different from that particular value.

I hope that helps!


November 5, 2020 at 6:24 am

Jim, you are such a magnificent statistician/economist/econometrician/data scientist etc whatever profession. Your work inspires and simplifies the lives of so many researchers around the world. I truly admire you and your work. I will buy a copy of each book you have on statistics or econometrics. Keep doing the good work. Remain ever blessed

November 6, 2020 at 9:47 pm

Hi Renatus,

Thanks so much for you very kind comments. You made my day!! I’m so glad that my website has been helpful. And, thanks so much for supporting my books! 🙂


November 2, 2020 at 9:32 pm

Hi Jim, I hope you are aware of 2019 American Statistical Association’s official statement on Statistical Significance: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 In case you do not bother reading the full article, may I quote you the core message here: “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way."

With best wishes,

November 3, 2020 at 2:09 am

I’m definitely aware of the debate surrounding how to use p-values most effectively. However, I need to correct you on one point. The link you provide is NOT a statement by the American Statistical Association. It is an editorial by several authors.

There is considerable debate over this issue. There are problems with p-values. However, as the authors state themselves, much of the problem is over people’s mindsets about how to use p-values and their incorrect interpretations about what statistical significance does and does not mean.

If you were to read my website more thoroughly, you’d be aware that I share many of their concerns and I address them in multiple posts. One of the authors’ key points is the need to be thoughtful and conduct thoughtful research and analysis. I emphasize this aspect in multiple posts on this topic. I’ll ask you to read the following three because they all address some of the authors’ concerns and suggestions. But you might run across others to read as well.

  • Five Tips for Using P-values to Avoid Being Misled
  • How to Interpret P-values Correctly
  • P-values and the Reproducibility of Experimental Results


September 24, 2020 at 11:52 pm

HI Jim, i just want you to know that you made explanation for Statistics so simple! I should say lesser and fewer words that reduce the complexity. All the best! 🙂

September 25, 2020 at 1:03 am

Thanks, Rene! Your kind words mean a lot to me! I’m so glad it has been helpful!


September 23, 2020 at 2:21 am

Honestly, I never understood stats during my entire M.Ed course and was another nightmare for me. But how easily you have explained each concept, I have understood stats way beyond my imagination. Thank you so much for helping ignorant research scholars like us. Looking forward to get hardcopy of your book. Kindly tell is it available through flipkart?

September 24, 2020 at 11:14 pm

I’m so happy to hear that my website has been helpful!

I checked on flipkart and it appears like my books are not available there. I’m never exactly sure where they’re available due to the vagaries of different distribution channels. They are available on Amazon in India.

  • Introduction to Statistics: An Intuitive Guide (Amazon IN)
  • Hypothesis Testing: An Intuitive Guide (Amazon IN)

July 26, 2020 at 11:57 am

Dear Jim, I am a teacher from India. I don’t have any background in statistics, and still I should say that in a single read I can follow your explanations. I teach my entire biostatistics class for botany graduates with your explanations. Thanks a lot. May I know how I can obtain your books in India?

July 28, 2020 at 12:31 am

Right now my books are only available as ebooks from my website. However, soon I’ll have some exciting news about other ways to obtain it. Stay tuned! I’ll announce it on my email list. If you’re not already on it, you can sign up using the form that is in the right margin of my website.

June 22, 2020 at 2:02 pm

Also can you please let me if this book covers topics like EDA and principal component analysis?

June 22, 2020 at 2:07 pm

This book doesn’t cover principal components analysis. Although, I wouldn’t really classify that as a hypothesis test. In the future, I might write a multivariate analysis book that would cover this and others. But, that’s well down the road.

My Introduction to Statistics covers EDA. That’s the largely graphical look at your data that you often do prior to hypothesis testing. The Introduction book perfectly leads right into the Hypothesis Testing book.

June 22, 2020 at 1:45 pm

Thanks for the detailed explanation. It does clear my doubts. I saw that your book related to hypothesis testing has the topics that I am studying currently. I am looking forward to purchasing it.

Regards, Take Care

June 19, 2020 at 1:03 pm

For this particular article I did not understand a couple of statements, and it would be great if you could help: 1) “If sample error causes the observed difference, the next time someone performs the same experiment the results might be different.” 2) “If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics.”

I discovered your articles by chance and now I keep coming back to read and understand statistical concepts. These articles are very informative and easy to digest. Thanks for simplifying things.

June 20, 2020 at 9:53 pm

I’m so happy to hear that you’ve found my website to be helpful!

To answer your questions, keep in mind that a central tenet of inferential statistics is that the random sample a study drew was only one of an infinite number of possible samples it could’ve drawn. Each random sample produces different results. Most results will cluster around the population value, assuming the study used good methodology. However, random sampling error always exists and makes it so that population estimates from a sample almost never exactly equal the correct population value.

So, imagine that we’re studying a medication and comparing the treatment and control groups. Suppose that the medicine is truly not effective and that the population difference between the treatment and control groups is zero (i.e., no difference). Despite the true difference being zero, most sample estimates will show some degree of either a positive or negative effect thanks to random sampling error. So, just because a study has an observed difference does not mean that a difference exists at the population level. So, on to your questions:

1. If the observed difference is just random error, then it makes sense that if you collected another random sample, the difference could change. It could change from negative to positive, positive to negative, more extreme, less extreme, etc. However, if the difference exists at the population level, most random samples drawn from the population will reflect that difference. If the medicine has an effect, most random samples will reflect that fact and not bounce around on both sides of zero as much.

2. This is closely related to the previous answer. Suppose there is no difference at the population level, but you approve the medicine because of the observed effects in a sample. Even though your random sample showed an effect (which was really random error), that effect doesn’t exist. So, when you start using the medicine on a larger scale, people won’t benefit from it. That’s why it’s important to separate out what is easily explained by random error versus what is not easily explained by it.
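A quick simulation makes this concrete (a hypothetical sketch in Python/NumPy; the population values are made up). Both groups are drawn from the same population, so the true difference is zero, yet individual samples still show nonzero observed differences:

```python
import numpy as np

rng = np.random.default_rng(42)

# Both groups come from the SAME population (true difference = 0),
# so any observed difference is pure random sampling error.
observed_diffs = []
for _ in range(1000):
    treatment = rng.normal(loc=100, scale=15, size=30)
    control = rng.normal(loc=100, scale=15, size=30)
    observed_diffs.append(treatment.mean() - control.mean())

observed_diffs = np.array(observed_diffs)
print(f"Mean of the observed differences:   {observed_diffs.mean():.2f}")
print(f"Spread of the observed differences: {observed_diffs.std():.2f}")
```

The observed differences cluster around zero but spread out on both sides of it, which is exactly why a single sample’s difference can’t be taken at face value.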

I think reading my post about how hypothesis tests work will help clarify this process. Also, in about 24 hours (as I write this), I’ll be releasing my new ebook about Hypothesis Testing!

May 29, 2020 at 5:23 am

Hi Jim, I really enjoy your blog. Can you please point me to where on your blog you discuss subgroup analysis and how it is done? I need to use nonparametric and parametric statistical methods for my work and also do subgroup analysis in order to identify potential groups of patients that may benefit more from using a treatment than other groups.

May 29, 2020 at 2:12 pm

Hi, I don’t have a specific article about subgroup analysis. However, subgroup analysis is just the dividing up of a larger sample into subgroups and then analyzing those subgroups separately. You can use the various analyses I write about on the subgroups.

Alternatively, you can include the subgroups in regression analysis as an indicator variable, entered as both a main effect and an interaction effect, to see how the relationships vary by subgroup without needing to subdivide your data. I write about that approach in my article about comparing regression lines. It’s my preferred approach when possible.
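As a sketch of that approach (hypothetical data, plain NumPy least squares rather than dedicated regression software), the subgroup indicator enters the model both as a main effect and in an interaction with the predictor:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical data: y depends on x, with a different intercept and
# slope in each subgroup (g = 0 or 1).
n = 200
x = rng.uniform(0, 10, n)
g = rng.integers(0, 2, n)
y = 2 + 1.0 * x + 3.0 * g + 0.5 * x * g + rng.normal(0, 1, n)

# One model for both subgroups: intercept, x, group indicator (main
# effect), and x-by-group interaction.
X = np.column_stack([np.ones(n), x, g, x * g])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, x, group, x*group:", np.round(coef, 2))
```

The group coefficient estimates how the subgroups’ intercepts differ, and the interaction coefficient estimates how their slopes differ, without splitting the data.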

April 19, 2020 at 7:58 am

Sir, is a confidence interval part of estimation?

April 17, 2020 at 3:36 pm

Sir, can you please briefly explain alternatives to hypothesis testing? I am unable to find the answer.

April 18, 2020 at 1:22 am

Assuming you want to draw conclusions about populations by using samples (i.e., inferential statistics ), you can use confidence intervals and bootstrap methods as alternatives to the traditional hypothesis testing methods.
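For example, a percentile bootstrap confidence interval needs nothing beyond resampling (a minimal sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=40)  # hypothetical data

# Percentile bootstrap: resample with replacement many times, then take
# the middle 95% of the resampled means as the confidence interval.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lower:.1f}, {upper:.1f})")
```

If the interval excludes the null value you care about, that plays a role similar to a significant hypothesis test, while also showing the precision of the estimate.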

March 9, 2020 at 10:01 pm

Hi Jim, could you please help with activities that can best teach the concepts of hypothesis testing through simulation? Also, do you have any question set that would enhance students’ intuition for why hypothesis testing is worth learning as a topic in introductory statistics? Thanks.

March 5, 2020 at 3:48 pm

Hi Jim, I’m studying multiple hypothesis testing & was wondering if you had any material that would be relevant. I’m more trying to understand how testing multiple samples simultaneously affects your results & more on the Bonferroni Correction

March 5, 2020 at 4:05 pm

I write about multiple comparisons (aka post hoc tests) in the ANOVA context. I don’t talk about Bonferroni corrections specifically, but I cover related types of corrections. I’m not sure if that exactly addresses what you want to know, but it’s probably the closest thing I have already written. I hope it helps!

January 14, 2020 at 9:03 pm

Thank you! Have a great day/evening.

January 13, 2020 at 7:10 pm

Any help would be greatly appreciated. What is the difference between The Hypothesis Test and The Statistical Test of Hypothesis?

January 14, 2020 at 11:02 am

They sound like the same thing to me. Unless this is specialized terminology for a particular field or the author was intending something specific, I’d guess they’re one and the same.

April 1, 2019 at 10:00 am

so these are the only two forms of Hypothesis used in statistical testing?

April 1, 2019 at 10:02 am

Are you referring to the null and alternative hypotheses? If so, yes, those are the standard hypotheses in a statistical hypothesis test.

April 1, 2019 at 9:57 am

Yeah, very insightful post. Thanks for the write-up.

October 27, 2018 at 11:09 pm

Hi there, I am an upcoming statistician. Out of all the blogs that I have read, I have found this one the most useful as far as my problem is concerned. Thanks so much.

October 27, 2018 at 11:14 pm

Hi Stano, you’re very welcome! Thanks for your kind words. They mean a lot! I’m happy to hear that my posts were able to help you. I’m sure you will be a fantastic statistician. Best of luck with your studies!

October 26, 2018 at 11:39 am

Dear Jim, thank you very much for your explanations! I have a question. Can I use a t-test to compare two samples in case each of them has a right bias?

October 26, 2018 at 12:00 pm

Hi Tetyana,

You’re very welcome!

The term “right bias” is not a standard term. Do you by chance mean right skewed distributions? In other words, if you plot the distribution for each group on a histogram they have longer right tails? These are not the symmetrical bell-shape curves of the normal distribution.

If that’s the case, yes you can as long as you exceed a specific sample size within each group. I include a table that contains these sample size requirements in my post about nonparametric vs parametric analyses .

Bias in statistics refers to cases where an estimate of a value is systematically higher or lower than the true value. If this is the case, you might be able to use t-tests, but you’d need to be sure to understand the nature of the bias so you would understand what the results are really indicating.
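To illustrate the skewed-data case (a hypothetical sketch using Python’s scipy with simulated right-skewed data), you can run the t-test alongside its nonparametric counterpart, the Mann-Whitney U test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two right-skewed (lognormal) groups; the second is shifted upward.
group_a = rng.lognormal(mean=0.0, sigma=0.5, size=40)
group_b = rng.lognormal(mean=0.4, sigma=0.5, size=40)

# With about 40 observations per group, the t-test is generally robust
# to this degree of skew; the Mann-Whitney U test is the nonparametric
# alternative when sample sizes are too small for that robustness.
t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)
print(f"t-test p-value:       {t_p:.4f}")
print(f"Mann-Whitney p-value: {u_p:.4f}")
```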

I hope this helps!

April 2, 2018 at 7:28 am

Simple and up to the point 👍 Thank you so much.

April 2, 2018 at 11:11 am

Hi Kalpana, thanks! And I’m glad it was helpful!

March 26, 2018 at 8:41 am

Am I correct if I say: Alpha – the probability of wrongly rejecting the null hypothesis. P-value – the probability of wrongly accepting the null hypothesis.

March 28, 2018 at 3:14 pm

You’re correct about alpha. Alpha is the probability of rejecting the null hypothesis when the null is true.

Unfortunately, your definition of the p-value is a bit off. The p-value has a fairly convoluted definition. It is the probability of obtaining the effect observed in a sample, or more extreme, if the null hypothesis is true. The p-value does NOT indicate the probability that either the null or alternative is true or false. Although, those are very common misinterpretations. To learn more, read my post about how to interpret p-values correctly .
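That definition can be checked by simulation (a hypothetical sketch in Python/scipy): draw many samples from a population where the null really is true and count how often the test statistic is at least as extreme as the one observed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# One observed sample, tested against the null hypothesis mu = 100.
sample = rng.normal(loc=103, scale=15, size=25)
t_obs, p_obs = stats.ttest_1samp(sample, popmean=100)

# Simulate the null: draw samples from a population where mu really IS
# 100 and record each sample's t-statistic.
null_ts = np.array([
    stats.ttest_1samp(rng.normal(100, 15, 25), 100).statistic
    for _ in range(5000)
])
simulated_p = np.mean(np.abs(null_ts) >= abs(t_obs))
print(f"Analytical p-value: {p_obs:.3f}, simulated: {simulated_p:.3f}")
```

The two values agree closely, and neither says anything about the probability that the null itself is true or false.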

March 2, 2018 at 6:10 pm

I recently started reading your blog and it is very helpful for understanding each concept of statistical tests in an easy way, with some good examples. I also recommend that other people go through all these posts, especially people who don’t have a statistical background and face many problems while studying statistical analysis.

Thank you for such good blogs.

March 3, 2018 at 10:12 pm

Hi Amit, I’m so glad that my blog posts have been helpful for you! It means a lot to me that you took the time to write such a nice comment! Also, thanks for recommending my blog to others! I try really hard to write posts about statistics that are easy to understand.

January 17, 2018 at 7:03 am

I recently started reading your blog and I find it very interesting. I am learning statistics on my own, and I generally do many Google searches to understand the concepts. So this blog is quite helpful for me, as it has most of the content I am looking for.

January 17, 2018 at 3:56 pm

Hi Shashank, thank you! And, I’m very glad to hear that my blog is helpful!

January 2, 2018 at 2:28 pm

Thank you very much, sir.

January 2, 2018 at 2:36 pm

You’re very welcome, Hiral!

November 21, 2017 at 12:43 pm

Thank you so much, sir… your posts always help me to become a #statistician.

November 21, 2017 at 2:40 pm

Hi Sachin, you’re very welcome! I’m happy that you find my posts to be helpful!

November 19, 2017 at 8:22 pm

Great post as usual, but it would be nice to see an example.

November 19, 2017 at 8:27 pm

Thank you! At the end of this post, I have links to four other posts that show examples of hypothesis tests in action. You’ll find what you’re looking for in those posts!


Hypothesis Tests

Statistics and Machine Learning Toolbox™ provides parametric and nonparametric hypothesis tests to help you determine if your sample data comes from a population with particular characteristics.

Distribution tests, such as Anderson-Darling and one-sample Kolmogorov-Smirnov, test whether sample data comes from a population with a particular distribution. Test whether two sets of sample data have the same distribution using tests such as two-sample Kolmogorov-Smirnov.
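Outside the toolbox, the same distribution tests are widely available; as a rough sketch of the equivalents in Python’s scipy (illustrative data only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(loc=0, scale=1, size=200)
other = rng.normal(loc=0, scale=1, size=200)

# One-sample Kolmogorov-Smirnov: does `data` come from a standard normal?
ks_one = stats.kstest(data, "norm")

# Two-sample Kolmogorov-Smirnov: do the two samples share a distribution?
ks_two = stats.ks_2samp(data, other)
print(f"One-sample KS p-value: {ks_one.pvalue:.3f}")
print(f"Two-sample KS p-value: {ks_two.pvalue:.3f}")
```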

Location tests, such as z -test and one-sample t -test, test whether sample data comes from a population with a particular mean or median. Test two or more sets of sample data for the same location value using a two-sample t -test or multiple comparison test.
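A comparable sketch of the location tests (again Python’s scipy with made-up data rather than the toolbox itself):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
before = rng.normal(10, 2, size=30)
after = before + rng.normal(1, 1, size=30)  # paired measurements

# One-sample t-test: is the mean of `before` equal to 10?
one = stats.ttest_1samp(before, popmean=10)

# Two-sample t-test: do two independent groups share a mean?
two = stats.ttest_ind(before, rng.normal(11, 2, size=30))

# Paired t-test: before/after measurements on the same units.
paired = stats.ttest_rel(before, after)
print(one.pvalue, two.pvalue, paired.pvalue)
```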

Dispersion tests, such as Chi-square variance, test whether sample data comes from a population with a particular variance. Compare the variances of two or more sample data sets using a two-sample F -test or multiple-sample test.
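And a sketch of the dispersion tests (Python’s scipy with illustrative data; the two-sample F-test is computed directly from the variance ratio):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(0, 1.0, size=50)
g2 = rng.normal(0, 1.5, size=50)

# Bartlett's test compares variances across two or more groups
# (assumes normality); Levene's test is the robust alternative.
b_stat, b_p = stats.bartlett(g1, g2)
l_stat, l_p = stats.levene(g1, g2)

# Two-sample F-test from the ratio of sample variances.
f = np.var(g1, ddof=1) / np.var(g2, ddof=1)
f_p = 2 * min(stats.f.cdf(f, g1.size - 1, g2.size - 1),
              stats.f.sf(f, g1.size - 1, g2.size - 1))
print(f"Bartlett p = {b_p:.4f}, Levene p = {l_p:.4f}, F-test p = {f_p:.4f}")
```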

Determine additional features of sample data by cross-tabulating and conducting a run test for randomness, and determine the sample size and power for a hypothesis test.

Distribution Tests

Anderson-Darling test
Chi-square goodness-of-fit test
Cross-tabulation
Durbin-Watson test with residual inputs
Fisher’s exact test
Jarque-Bera test
One-sample Kolmogorov-Smirnov test
Two-sample Kolmogorov-Smirnov test
Lilliefors test
Run test for randomness

Location Tests

Friedman’s test
Kruskal-Wallis test
Multiple comparison test
Wilcoxon rank sum test
Sample size and power of test
Wilcoxon signed rank test
Sign test
One-sample and paired-sample t-test
Two-sample t-test
z-test

Dispersion Tests

Ansari-Bradley test
Bartlett’s test
Sample size and power of test
Chi-square variance test
Two-sample F-test for equal variances
Multiple-sample tests for equal variances

Estimation Statistics

One-sample or two-sample effect size computations
Gardner-Altman plot for two-sample effect size

Batch Drift Detection

Detect drift.

Detect drifts between baseline and target data using permutation testing

Access Test Results

Diagnostics information for batch drift detection

Examine Test Results

Summary table for object
Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for data drift detection
Compute histogram bin counts for specified variables in baseline and target data for drift detection
Plot p-values and confidence intervals for variables tested for data drift
Plot empirical cumulative distribution function (ecdf) of a variable specified for data drift detection
Plot histogram of a variable specified for data drift detection
Plot histogram of permutation results for a variable specified for data drift detection

View hypothesis tests of distributions and statistics.

Hypothesis testing is a common method of drawing inferences about a population based on statistical evidence from a sample.

All hypothesis tests share the same basic terminology and structure.

Different hypothesis tests make different assumptions about the distribution of the random variable being sampled in the data.

Featured Examples

Selecting a Sample Size

Determine the number of samples or observations needed to carry out a statistical test. This example illustrates sample size calculations for a simple problem, then shows how to use the sampsizepwr function to compute power and sample size for two more realistic problems. Finally, it illustrates the use of Statistics and Machine Learning Toolbox™ functions to compute the required sample size for a test that the sampsizepwr function does not support.
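The simple-problem calculation can be sketched without the toolbox using the usual normal approximation (all values here are illustrative; sampsizepwr itself uses exact distributions, so its answers can differ slightly):

```python
import math
from scipy import stats

# Normal-approximation sample size per group for a two-sample test:
# n ≈ 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the effect
# size in standard-deviation units.
alpha, power, d = 0.05, 0.80, 0.5

z_alpha = stats.norm.ppf(1 - alpha / 2)
z_power = stats.norm.ppf(power)
n = 2 * ((z_alpha + z_power) / d) ** 2
print(f"Approximate sample size per group: {math.ceil(n)}")  # 63
```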
