Understanding P-values | Definition and Examples

Published on July 16, 2020 by Rebecca Bevans. Revised on June 22, 2023.

The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis.

Table of contents

  • What is a null hypothesis?
  • What exactly is a p value?
  • How do you calculate the p value?
  • P values and statistical significance
  • Reporting p values
  • Caution when using p values
  • Other interesting articles
  • Frequently asked questions about p values

All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.

For example, in a two-tailed t test, the null hypothesis is that the difference between two groups is zero.

  • Null hypothesis (H0): there is no difference in longevity between the two groups.
  • Alternative hypothesis (HA or H1): there is a difference in longevity between the two groups.


The p value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.

The p value tells you how often you would expect to see a test statistic as extreme or more extreme than the one calculated by your statistical test if the null hypothesis of that test were true. The p value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.

The p value is a proportion: if your p value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis was true.

P values are usually automatically calculated by your statistical program (R, SPSS, etc.).

You can also find tables for estimating the p value of your test statistic online. These tables show, based on the test statistic and degrees of freedom (number of observations minus number of independent variables) of your test, how frequently you would expect to see that test statistic under the null hypothesis.
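These table lookups can also be done in code. Below is a minimal sketch in Python using scipy; the test statistic and degrees of freedom are made-up values, purely for illustration:

```python
from scipy import stats

# Hypothetical values for illustration: a t statistic of 2.5 from a
# test with 20 degrees of freedom.
t_stat = 2.5
df = 20

# Two-tailed p value: the probability of seeing a t statistic at least
# this far from zero (in either direction) if the null hypothesis were true.
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(round(p_value, 4))
```

Here `stats.t.sf` gives the upper-tail area of the t distribution, which is exactly what a printed p-value table approximates.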

The calculation of the p value depends on the statistical test you are using to test your hypothesis:

  • Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
  • The number of independent variables you include in your test changes how large or small the test statistic needs to be to generate the same p value.

No matter what test you use, the p value always describes the same thing: how often you can expect to see a test statistic as extreme or more extreme than the one calculated from your test.

P values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.

Statistical significance is another way of saying that the p value of a statistical test is small enough to reject the null hypothesis of the test.

How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic as extreme as the one calculated by your test only 5% of the time. But the threshold depends on your field of study – some fields prefer thresholds of 0.01, or even 0.001.

The threshold value for determining statistical significance is also known as the alpha value.


P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context – for example, the correlation coefficient in a linear regression, or the average difference between treatment groups in a t test.

P values are often interpreted as your risk of rejecting the null hypothesis of your test when the null hypothesis is actually true.

In reality, the risk of rejecting the null hypothesis is often higher than the p value, especially when looking at a single study or when using small sample sizes. This is because the smaller your frame of reference, the greater the chance that you stumble across a statistically significant pattern completely by accident.

P values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p value can only tell you whether or not the null hypothesis is supported. It cannot tell you whether your alternative hypothesis is true, or why.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient
  • Null hypothesis

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

A p value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

P values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p value tables for the relevant test statistic.

P values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data are likely to occur less than 5% of the time under the null hypothesis.

When the p value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Does a significant p value mean your alternative hypothesis is true? No. The p value only tells you how likely the data you have observed are to have occurred under the null hypothesis.

If the p value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

Cite this Scribbr article

Bevans, R. (2023, June 22). Understanding P-values | Definition and Examples. Scribbr. Retrieved August 5, 2024, from https://www.scribbr.com/statistics/p-value/


P-Value in Statistical Hypothesis Tests: What is it?

P Value Definition

A p value is used in hypothesis testing to help you decide whether to reject the null hypothesis. The p value summarizes the evidence against the null hypothesis: the smaller the p value, the stronger the evidence that you should reject it.

P values are expressed as decimals, although it may be easier to understand them if you convert them to percentages. For example, a p value of 0.0254 is 2.54%. This means there is a 2.54% chance of seeing results at least as extreme as yours if the null hypothesis were true (i.e., if your results were due to chance alone). That's pretty small. On the other hand, a large p value of .9 (90%) means that results like yours would be very common under the null hypothesis, giving you no reason to attribute them to anything in your experiment. Therefore, the smaller the p value, the more important ("significant") your results.

When you run a hypothesis test , you compare the p value from your test to the alpha level you selected when you ran the test. Alpha levels can also be written as percentages.


P Value vs Alpha level

Alpha levels are controlled by the researcher and are related to confidence levels . You get an alpha level by subtracting your confidence level from 100%. For example, if you want to be 98 percent confident in your research, the alpha level would be 2% (100% – 98%). When you run the hypothesis test, the test will give you a value for p. Compare that value to your chosen alpha level. For example, let’s say you chose an alpha level of 5% (0.05). If the results from the test give you:

  • A small p (≤ 0.05): reject the null hypothesis. This is strong evidence against the null hypothesis.
  • A large p (> 0.05): the evidence against the null hypothesis is weak, so you do not reject the null.
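The comparison of p to alpha is purely mechanical, as this small sketch shows (the function name and example values are arbitrary):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p value to a significance level and phrase the decision.

    'Fail to reject' is deliberate: a large p value is weak evidence
    against the null hypothesis, not evidence that the null is true.
    """
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))  # small p: reject
print(decide(0.40))  # large p: fail to reject
```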


What if I Don’t Have an Alpha Level?

In an ideal world, you’ll have an alpha level. But if you do not, you can still use the following rough guidelines in deciding whether to support or reject the null hypothesis:

  • If p > .10 → “not significant”
  • If p ≤ .10 → “marginally significant”
  • If p ≤ .05 → “significant”
  • If p ≤ .01 → “highly significant.”

How to Calculate a P Value on the TI 83

Example question: The average wait time to see an E.R. doctor is said to be 150 minutes. You think the wait time is actually less. You take a random sample of 30 people and find their average wait is 148 minutes with a standard deviation of 5 minutes. Assume the distribution is normal. Find the p value for this test.

  • Press STAT, then arrow over to TESTS.
  • Press ENTER for Z-Test.
  • Arrow over to Stats. Press ENTER.
  • Arrow down to μ0 and type 150. This is our null hypothesis mean.
  • Arrow down to σ. Type in your standard deviation: 5.
  • Arrow down to xbar. Type in your sample mean: 148.
  • Arrow down to n. Type in your sample size: 30.
  • Arrow to <μ0 for a left-tailed test. Press ENTER.
  • Arrow down to Calculate. Press ENTER. P is given as .014, or about 1.4%.

The probability that you would get a sample mean of 148 minutes is tiny, so you should reject the null hypothesis.

Note : If you don’t want to run a test, you could also use the TI 83 NormCDF function to get the area (which is the same thing as the probability value).
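The same left-tailed z test can be reproduced outside the calculator. A sketch in Python using scipy, plugging in the numbers from the example question:

```python
import math
from scipy import stats

mu0, xbar, sigma, n = 150, 148, 5, 30

# Test statistic: how many standard errors the sample mean
# falls below the hypothesized mean.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Left-tailed test: p is the area under the normal curve to the left of z.
p_value = stats.norm.cdf(z)
print(round(z, 3), round(p_value, 3))  # z ≈ -2.191, p ≈ 0.014
```

The result agrees with the calculator's output of .014.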


P-Value And Statistical Significance: What It Is & Why It Matters

Saul McLeod, PhD


Olivia Guy-Evans, MSc


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., if the null hypothesis were true).

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less likely the results occurred by random chance alone, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo.

If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variation exists.

Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach exactly zero, because there is always a slim possibility, however improbable, that the observed results occurred by random chance.
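This behavior is easy to see in simulation. The sketch below uses made-up pain scores and scipy's two-sample t test: when the drug and placebo groups are drawn from the same distribution the p-value tends to be large, and when the drug genuinely lowers pain it becomes very small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated pain scores on a 0-10 scale (hypothetical numbers).
placebo = rng.normal(loc=5.0, scale=1.0, size=50)

# No real effect: "drug" scores drawn from the same distribution.
no_effect = rng.normal(loc=5.0, scale=1.0, size=50)
_, p_null = stats.ttest_ind(placebo, no_effect)

# Real effect: drug lowers mean pain by 1.5 points.
real_effect = rng.normal(loc=3.5, scale=1.0, size=50)
_, p_effect = stats.ttest_ind(placebo, real_effect)

print(p_null > p_effect)  # the genuine effect yields the far smaller p value
```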

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

This indicates strong evidence against the null hypothesis, as there is less than a 5% probability of obtaining results this extreme if the null hypothesis were correct.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant and indicates that the data do not provide strong evidence against the null hypothesis.

This means we retain (fail to reject) the null hypothesis. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: when the p-value is above your threshold of significance, it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test


Two-Tailed Test

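For a symmetric null distribution, the two-tailed p-value is simply twice the one-tailed p-value for the same test statistic. A quick sketch with an arbitrary z statistic:

```python
from scipy import stats

z = 1.8  # arbitrary test statistic, purely for illustration

# One-tailed (right tail): area under the normal curve beyond z in one direction.
p_one = stats.norm.sf(z)

# Two-tailed: area beyond |z| in both directions.
p_two = 2 * stats.norm.sf(abs(z))

print(round(p_one, 4), round(p_two, 4))
```

A one-tailed test should only be used when the direction of the effect was specified in advance.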

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.

Understanding the Statistical Test

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an analysis of variance (ANOVA). Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
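A sketch of this choice in Python with scipy; the drug scores are simulated, hypothetical numbers. The last lines also show the arithmetic behind the warning about multiple pairwise comparisons:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical pain-relief scores for three drugs (arbitrary parameters).
drug_a = rng.normal(4.0, 1.0, 30)
drug_b = rng.normal(4.5, 1.0, 30)
drug_c = rng.normal(5.0, 1.0, 30)

# Three or more groups: one-way ANOVA tests them jointly.
f_stat, p_anova = stats.f_oneway(drug_a, drug_b, drug_c)
print("ANOVA p value:", round(p_anova, 4))

# Why not three pairwise t tests? If each comparison uses alpha = 0.05
# and the null is true for all of them, the chance of at least one
# false positive is 1 - (1 - alpha)^3, not 5%.
alpha, comparisons = 0.05, 3
familywise_error = 1 - (1 - alpha) ** comparisons
print("family-wise error rate:", round(familywise_error, 4))  # ≈ 0.1426
```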

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5, SD = 0.8) compared to those in the placebo group (M = 5.2, SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36, p < .001).
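A reported p-value can be double-checked from the t statistic and degrees of freedom quoted in parentheses. A quick sketch using scipy:

```python
from scipy import stats

# Values quoted in the example report: t(98) = -9.36.
t_stat, df = -9.36, 98

# Two-tailed p value for the reported t statistic.
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(p_value < 0.001)  # → True, consistent with the reported p < .001
```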

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p-value not enough?

A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the data would be unlikely to occur (e.g., less than 5% of the time) if the null hypothesis were true.

To understand the strength of the difference between the two groups (control vs. experimental), a researcher needs to calculate the effect size.
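For example, Cohen's d expresses the difference between two group means in standard-deviation units. A minimal sketch using the summary statistics quoted in the reporting example above (M = 3.5, SD = 0.8 versus M = 5.2, SD = 0.7); equal group sizes are assumed:

```python
import math

# Summary statistics from the drug vs. placebo reporting example.
mean_drug, sd_drug = 3.5, 0.8
mean_placebo, sd_placebo = 5.2, 0.7

# Pooled standard deviation (equal group sizes assumed).
sd_pooled = math.sqrt((sd_drug**2 + sd_placebo**2) / 2)

# Cohen's d: the mean difference in standard-deviation units.
d = (mean_drug - mean_placebo) / sd_pooled
print(round(abs(d), 2))  # a |d| above 0.8 is conventionally a large effect
```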

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

No, not all p-values below 0.05 are considered statistically significant. The threshold of 0.05 is commonly used, but it’s just a convention. Statistical significance depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
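A simulation makes this concrete. The sketch below tests the same true effect (a made-up mean shift of 0.2 standard deviations) with a small and a large sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
effect = 0.2  # small true difference in means, in SD units

def p_for_sample_size(n):
    """Run one two-sample t test at sample size n and return its p value."""
    group_a = rng.normal(0.0, 1.0, n)
    group_b = rng.normal(effect, 1.0, n)
    return stats.ttest_ind(group_a, group_b).pvalue

# The identical true effect, tested with a small and a large sample.
p_small = p_for_sample_size(20)
p_large = p_for_sample_size(5000)
print(p_small, p_large)  # the large sample yields the far smaller p value
```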

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual value is simply too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. Report p values less than 0.001 as p < .001.

Further Information

  • P Value Calculator From T Score
  • P-Value Calculator For Chi-Square
  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
  • Criticism of using the “p < 0.05” threshold.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download



Statistics By Jim

Making statistics intuitive

How Hypothesis Tests Work: Significance Levels (Alpha) and P values

By Jim Frost

Hypothesis testing is a vital process in inferential statistics where the goal is to use sample data to draw conclusions about an entire population . In the testing process, you use significance levels and p-values to determine whether the test results are statistically significant.

You hear about results being statistically significant all of the time. But, what do significance levels, P values, and statistical significance actually represent? Why do we even need to use hypothesis tests in statistics?

In this post, I answer all of these questions. I use graphs and concepts to explain how hypothesis tests function in order to provide a more intuitive explanation. This helps you move on to understanding your statistical results.

Hypothesis Test Example Scenario

To start, I’ll demonstrate why we need to use hypothesis tests using an example.

A researcher is studying fuel expenditures for families and wants to determine if the monthly cost has changed since last year when the average was $260 per month. The researcher draws a random sample of 25 families and enters their monthly costs for this year into statistical software. You can download the CSV data file: FuelsCosts. Below are the descriptive statistics for this year.

Table of descriptive statistics for our fuel cost example.

We’ll build on this example to answer the research question and show how hypothesis tests work.

Descriptive Statistics Alone Won’t Answer the Question

The researcher collected a random sample and found that this year’s sample mean (330.6) is greater than last year’s mean (260). Why perform a hypothesis test at all? We can see that this year’s mean is higher by $70! Isn’t that different?

Regrettably, the situation isn’t as clear as you might think because we’re analyzing a sample instead of the full population. There are huge benefits when working with samples because it is usually impossible to collect data from an entire population. However, the tradeoff for working with a manageable sample is that we need to account for sampling error.

The sampling error is the gap between the sample statistic and the population parameter. For our example, the sample statistic is the sample mean, which is 330.6. The population parameter is μ, or mu, which is the average of the entire population. Unfortunately, the value of the population parameter is not only unknown but usually unknowable. Learn more about Sampling Error .

We obtained a sample mean of 330.6. However, it’s conceivable that, due to sampling error, the mean of the population might be only 260. If the researcher drew another random sample, the next sample mean might be closer to 260. It’s impossible to assess this possibility by looking at only the sample mean. Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. We need to use a hypothesis test to determine the likelihood of obtaining our sample mean if the population mean is 260.

Background information : The Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

A Sampling Distribution Determines Whether Our Sample Mean is Unlikely

It is very unlikely for any sample mean to equal the population mean because of sampling error. In our case, the sample mean of 330.6 is almost definitely not equal to the population mean for fuel expenditures.

If we could obtain a substantial number of random samples and calculate the sample mean for each sample, we’d observe a broad spectrum of sample means. We’d even be able to graph the distribution of sample means from this process.

This type of distribution is called a sampling distribution. You obtain a sampling distribution by drawing many random samples of the same size from the same population. Why the heck would we do this?

Because sampling distributions allow you to determine the likelihood of obtaining your sample statistic and they’re crucial for performing hypothesis tests.

Luckily, we don’t need to go to the trouble of collecting numerous random samples! We can estimate the sampling distribution using the t-distribution, our sample size, and the variability in our sample.

We want to find out if the average fuel expenditure this year (330.6) is different from last year (260). To answer this question, we’ll graph the sampling distribution based on the assumption that the mean fuel cost for the entire population has not changed and is still 260. In statistics, we call this lack of effect, or no change, the null hypothesis . We use the null hypothesis value as the basis of comparison for our observed sample value.

Sampling distributions and t-distributions are types of probability distributions.

Related posts : Sampling Distributions and Understanding Probability Distributions

Graphing our Sample Mean in the Context of the Sampling Distribution

The graph below shows which sample means are more likely and less likely if the population mean is 260. We can place our sample mean in this distribution. This larger context helps us see how unlikely our sample mean is if the null hypothesis is true (μ = 260).

Sampling distribution of means for our fuel cost data.

The graph displays the estimated distribution of sample means. The most likely values are near 260 because the plot assumes that this is the true population mean. However, given random sampling error, it would not be surprising to observe sample means ranging from 167 to 352. If the population mean is still 260, our observed sample mean (330.6) isn’t the most likely value, but it’s not completely implausible either.
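You can approximate a sampling distribution like this by brute force. A simulation sketch in Python; the population standard deviation is not given in this excerpt, so the value below is an assumption purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

pop_mean = 260   # null hypothesis mean from the example
pop_sd = 155     # ASSUMED population SD (not stated in the text)
n = 25           # sample size from the example

# Draw many random samples of size n and record each sample mean.
sample_means = rng.normal(pop_mean, pop_sd, size=(100_000, n)).mean(axis=1)

# The sample means center on the population mean, and their spread
# (the standard error) is approximately pop_sd / sqrt(n).
print(round(sample_means.mean()), round(sample_means.std(), 1))
```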

The Role of Hypothesis Tests

The sampling distribution shows us that we are relatively unlikely to obtain a sample mean of 330.6 if the population mean is 260. Is our sample mean so unlikely that we can reject the notion that the population mean is 260?

In statistics, we call this rejecting the null hypothesis. If we reject the null for our example, the difference between the sample mean (330.6) and 260 is statistically significant. In other words, the sample data favor the hypothesis that the population average does not equal 260.

However, look at the sampling distribution chart again. Notice that there is no special location on the curve where you can definitively draw this conclusion. There is only a consistent decrease in the likelihood of observing sample means that are farther from the null hypothesis value. Where do we decide a sample mean is far away enough?

To answer this question, we’ll need more tools—hypothesis tests! The hypothesis testing procedure quantifies the unusualness of our sample with a probability and then compares it to an evidentiary standard. This process allows you to make an objective decision about the strength of the evidence.

We’re going to add the tools we need to make this decision to the graph—significance levels and p-values!

These tools allow us to test these two hypotheses:

  • Null hypothesis: The population mean equals the null hypothesis mean (260).
  • Alternative hypothesis: The population mean does not equal the null hypothesis mean (260).

Related post : Hypothesis Testing Overview

What are Significance Levels (Alpha)?

A significance level, also known as alpha or α, is an evidentiary standard that a researcher sets before the study. It defines how strongly the sample evidence must contradict the null hypothesis before you can reject the null hypothesis for the entire population. The strength of the evidence is defined by the probability of rejecting a null hypothesis that is true. In other words, it is the probability that you say there is an effect when there is no effect.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Lower significance levels require stronger sample evidence to be able to reject the null hypothesis. For example, to be statistically significant at the 0.01 significance level requires more substantial evidence than the 0.05 significance level. However, there is a tradeoff in hypothesis tests. Lower significance levels also reduce the power of a hypothesis test to detect a difference that does exist.
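A small simulation can illustrate this tradeoff. The true mean of 330, population SD of 155, and sample size of 25 below are assumed values chosen to mirror the article's example, not its actual data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A real effect exists here by construction: the true mean is 330, not 260.
# The SD of 155 and n of 25 are assumptions mirroring the example.
samples = rng.normal(330, 155, size=(20_000, 25))
p_values = stats.ttest_1samp(samples, popmean=260, axis=1).pvalue

# Power = fraction of samples that detect the real difference from 260.
power_05 = (p_values < 0.05).mean()
power_01 = (p_values < 0.01).mean()
print(power_05 > power_01)   # lowering alpha lowers power
```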

The technical nature of these types of questions can make your head spin. A picture can bring these ideas to life!

To learn a more conceptual approach to significance levels, see my post about Understanding Significance Levels .

Graphing Significance Levels as Critical Regions

On the probability distribution plot, the significance level defines how far the sample value must be from the null value before we can reject the null. The percentage of the area under the curve that is shaded equals the probability that the sample value will fall in those regions if the null hypothesis is correct.

To represent a significance level of 0.05, I’ll shade 5% of the distribution furthest from the null value.

Graph that displays a two-tailed critical region for a significance level of 0.05.

The two shaded regions in the graph are equidistant from the central value of the null hypothesis. Each region has a probability of 0.025; together they sum to our desired total of 0.05. These shaded areas are called the critical region for a two-tailed hypothesis test.

The critical region defines sample values that are improbable enough to warrant rejecting the null hypothesis. If the null hypothesis is correct and the population mean is 260, random samples (n=25) from this population have means that fall in the critical region 5% of the time.
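A hedged sketch of how these boundaries can be computed, assuming the 24 degrees of freedom (n = 25) from the example and a standard error of roughly 31 inferred from the graph:

```python
from scipy import stats

# Assumed reconstruction of the graph: df = 24 (n = 25), standard error
# ~31, null mean 260, two-tailed alpha of 0.05.
df, se, null_mean, alpha = 24, 31, 260, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical t-value
lower = null_mean - t_crit * se
upper = null_mean + t_crit * se
print(round(lower, 1), round(upper, 1))   # critical region boundaries

# The sample mean of 330.6 lies beyond the upper boundary.
print(330.6 > upper)
```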

Our sample mean is statistically significant at the 0.05 level because it falls in the critical region.

Related posts : One-Tailed and Two-Tailed Tests Explained , What Are Critical Values? , and T-distribution Table of Critical Values

Comparing Significance Levels

Let’s redo this hypothesis test using the other common significance level of 0.01 to see how it compares.

Chart that shows a two-tailed critical region for a significance level of 0.01.

This time the sum of the two shaded regions equals our new significance level of 0.01. The mean of our sample does not fall within the critical region. Consequently, we fail to reject the null hypothesis. We have the exact same sample data and the same difference between the sample mean and the null hypothesis value, but a different test result.

What happened? By specifying a lower significance level, we set a higher bar for the sample evidence. As the graph shows, lower significance levels move the critical regions further away from the null value. Consequently, lower significance levels require more extreme sample means to be statistically significant.
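The same sketch, run at both significance levels, shows the boundary moving outward (again assuming df = 24 and a standard error of roughly 31 inferred from the graphs):

```python
from scipy import stats

# Assumed setup mirroring the article's graphs: df = 24, standard error
# ~31, null mean 260, observed sample mean 330.6.
df, se, null_mean, sample_mean = 24, 31, 260, 330.6

for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    upper = null_mean + t_crit * se
    # Lower alpha pushes the critical boundary farther from the null value.
    print(alpha, round(upper, 1), sample_mean > upper)
```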

You must set the significance level before conducting a study. You don’t want the temptation of choosing a level after the study that yields significant results. The only reason I compared the two significance levels was to illustrate the effects and explain the differing results.

The graphical version of the 1-sample t-test we created allows us to determine statistical significance without assessing the P value. Typically, you need to compare the P value to the significance level to make this determination.

Related post : Step-by-Step Instructions for How to Do t-Tests in Excel

What Are P values?

P values are the probability that a sample will have an effect at least as extreme as the effect observed in your sample if the null hypothesis is correct.

This tortuous, technical definition for P values can make your head spin. Let’s graph it!

First, we need to calculate the effect that is present in our sample. The effect is the distance between the sample value and null value: 330.6 – 260 = 70.6. Next, I’ll shade the regions on both sides of the distribution that are at least as far away as 70.6 from the null (260 +/- 70.6). This process graphs the probability of observing a sample mean at least as extreme as our sample mean.

Probability distribution plot shows how our sample mean has a p-value of 0.031.

The total probability of the two shaded regions is 0.03112. If the null hypothesis value (260) is true and you drew many random samples, you’d expect sample means to fall in the shaded regions about 3.1% of the time. In other words, you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true. That’s the P value!
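A sketch of that calculation, assuming the effect of 70.6, 24 degrees of freedom, and a standard error of roughly 30.8 inferred from the article's figures:

```python
from scipy import stats

# Assumed reconstruction: effect = 330.6 - 260 = 70.6, df = 24, and a
# standard error of roughly 30.8 inferred from the reported p-value.
effect, se, df = 70.6, 30.8, 24

t_stat = effect / se                   # distance from the null in SE units
one_tail = stats.t.sf(t_stat, df)      # area in one shaded tail
p_value = 2 * one_tail                 # both tails: "at least as extreme"
print(round(p_value, 3))               # close to the article's 0.031
```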

Learn more about How to Find the P Value .

Using P values and Significance Levels Together

If your P value is less than or equal to your alpha level, reject the null hypothesis.

The P value results are consistent with our graphical representation. The P value of 0.03112 is significant at the alpha level of 0.05 but not 0.01. Again, in practice, you pick one significance level before the experiment and stick with it!

Using the significance level of 0.05, the sample effect is statistically significant. Our data support the alternative hypothesis, which states that the population mean doesn’t equal 260. We can conclude that mean fuel expenditures have increased since last year.

P values are very frequently misinterpreted as the probability of rejecting a null hypothesis that is actually true. This interpretation is wrong! To understand why, please read my post: How to Interpret P-values Correctly .

Discussion about Statistically Significant Results

Hypothesis tests determine whether your sample data provide sufficient evidence to reject the null hypothesis for the entire population. To perform this test, the procedure compares your sample statistic to the null value and determines whether it is sufficiently rare. “Sufficiently rare” is defined in a hypothesis test by:

  • Assuming that the null hypothesis is true—the graphs center on the null value.
  • The significance (alpha) level—how far out from the null value is the critical region?
  • The sample statistic—is it within the critical region?

There is no special significance level that correctly determines which studies have real population effects 100% of the time. The traditional significance levels of 0.05 and 0.01 are attempts to manage the tradeoff between having a low probability of rejecting a true null hypothesis and having adequate power to detect an effect if one actually exists.

The significance level is the rate at which you incorrectly reject null hypotheses that are actually true ( type I error ). For example, among all studies that use a significance level of 0.05 and in which the null hypothesis is correct, you can expect 5% to have sample statistics that fall in the critical region. When this error occurs, you aren't aware that the null hypothesis is correct, but you'll reject it because the p-value is less than 0.05.

This error does not indicate that the researcher made a mistake. As the graphs show, you can observe extreme sample statistics due to sampling error alone. It's the luck of the draw!
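A quick simulation shows this luck of the draw in action. The population SD of 155 and sample size of 25 are assumptions chosen to mirror the example; the null is true here by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# The null is true by construction: the population mean really is 260.
# The SD of 155 and n of 25 are assumptions, not from the article.
samples = rng.normal(260, 155, size=(20_000, 25))
p_values = stats.ttest_1samp(samples, popmean=260, axis=1).pvalue

# About 5% of samples land in the critical region purely by chance.
false_positive_rate = (p_values < 0.05).mean()
print(round(false_positive_rate, 3))
```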

Related posts : Statistical Significance: Definition & Meaning and Types of Errors in Hypothesis Testing

Hypothesis tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sampling error. Using significance levels and P values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

Keep in mind that statistical significance doesn’t necessarily mean that the effect is important in a practical, real-world sense. For more information, read my post about Practical vs. Statistical Significance .

If you like this post, read the companion post: How Hypothesis Tests Work: Confidence Intervals and Confidence Levels .

You can also read my other posts that describe how other tests work:

  • How t-Tests Work
  • How the F-test works in ANOVA
  • How Chi-Squared Tests of Independence Work

To see an alternative approach to traditional hypothesis testing that does not use probability distributions and test statistics, learn about bootstrapping in statistics !


Reader Interactions

December 11, 2022 at 10:56 am

A very easy way to grasp the level of significance & p-value: 1. A teacher gives a student an assignment and asks how much error he will make doing it. The student replies that his error can be ≤ 5% (this is the level of significance). After he completes the assignment, the teacher checks his error, which is ≤ 5% (maybe 4%, 3%, 2%, or even less; this is the p-value), meaning his results are significant. Otherwise, his error is > 5% (maybe 6%, 7%, 8%, or even more; this is the p-value), meaning his results are non-significant. 2. The teacher gives another assignment, and this time the student says his error can be ≤ 1% (the level of significance). After completion, an error ≤ 1% (maybe 0.9%, 0.8%, 0.7%, or even less; the p-value) means his results are significant, while an error > 1% (maybe 1.1%, 1.5%, 2%, or even more; the p-value) means his results are non-significant. Whether a p-value is significant mainly depends on the level of significance.

December 11, 2022 at 7:50 pm

I think that approach helps explain how to determine statistical significance–is the p-value less than or equal to the significance level. However, it doesn’t really explain what statistical significance means. I find that comparing the p-value to the significance level is the easy part. Knowing what it means and how to choose your significance level is the harder part!

December 3, 2022 at 5:54 pm

What would you say to someone who believes that a p-value higher than the level of significance (alpha) means the null hypothesis has been proven? Should you support that statement or deny it?

December 3, 2022 at 10:18 pm

Hi Emmanuel,

When the p-value is greater than the significance level, you fail to reject the null hypothesis . That is different than proving it. To learn why and what it means, click the link to read a post that I’ve written that will answer your question!

April 19, 2021 at 12:27 am

Thank you so much Sir

April 18, 2021 at 2:37 pm

Hi sir, your blogs are very helpful for clearing up the concepts of statistics; as a researcher I find them very useful. I have some queries:

1. In many research papers I have seen authors use the statement "means or values are statistically at par at p = 0.05" when they do some pairwise comparison between the treatments (a kind of post hoc) using some value of CD (critical difference), or we can say LSD, which is calculated using alpha, not p. So from this article I think this should be alpha = 0.05 or 5%, not p = 0.05. Earlier I thought p and alpha were the same; p itself is compared with alpha 0.05. Correct me if I am wrong.

2. When we can draw a conclusion using critical values (CV), which are based on alpha values in different tests (e.g., in the F test, the CV is F(0.05, t-1, error df) when alpha is 0.05, which is the table value of F and is compared with the calculated F to draw the conclusion), then why do we go for p values and draw conclusions based on p values? Many online software packages do not even give a p value; they just mention the CD (LSD).

3. Can you please help me interpret the interaction in a two-factor analysis (Factor A × Factor B) in ANOVA?

Thank You so much!

(Commenting again as I have not seen my comment in comment list; don’t know why)

April 18, 2021 at 10:57 pm

Hi Himanshu,

I manually approve comments so there will be some time lag involved before they show up.

Regarding your first question, yes, you’re correct. Test results are significant at particular significance levels or alpha. They should not use p to define the significance level. You’re also correct in that you compare p to alpha.

Critical values are a different (but related) approach for determining significance. It was more common before computer analysis took off because it reduced the calculations. Using this approach in its simplest form, you only know whether a result is significant or not at the given alpha. You just determine whether the test statistic falls within a critical region. However, it is ok to supplement this type of result with the actual p-value. Knowing the precise p-value provides additional information that significant/not significant does not provide. The critical value and p-value approaches will always agree too. For more information about why the exact p-value is useful, read my post about Five Tips for Interpreting P-values .
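A small sketch of that agreement, using a t-statistic of about 2.29 and 24 degrees of freedom reconstructed from the fuel-cost example:

```python
from scipy import stats

# Assumed values reconstructed from the post's example: t ~ 2.29, df = 24.
t_stat, df, alpha = 2.29, 24, 0.05

critical_value = stats.t.ppf(1 - alpha / 2, df)   # table lookup, in code
p_value = 2 * stats.t.sf(abs(t_stat), df)         # two-tailed p-value

# Both decision rules flag the same results as significant.
print(abs(t_stat) > critical_value, p_value < alpha)
```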

Finally, I’ve written about two-way ANOVA in my post, How to do Two-Way ANOVA in Excel . Additionally, I write about it in my Hypothesis Testing ebook .

January 28, 2021 at 3:12 pm

Thank you for your answer, Jim, I really appreciate it. I’m taking a Coursera stats course and online learning without being able to ask questions of a real teacher is not my forte!

You’re right, I don’t think I’m ready for that calculation! However, I think I’m struggling with something far more basic, perhaps even the interpretation of the t-table? I’m just not sure how you came up with the p-value as .03112, with the 24 degrees of freedom. When I pull up a t-table and look at the 24-degrees of freedom row, I’m not sure how any of those numbers correspond with your answer? Either the single tail of 0.01556 or the combined of 0.03112. What am I not getting? (which, frankly, could be a lot!!) Again, thank you SO much for your time.

January 28, 2021 at 11:19 pm

Ah ok, I see! First, let me point you to several posts I’ve written about t-values and the t-distribution. I don’t cover those in this post because I wanted to present a simplified version that just uses the data in its regular units. The basic idea is that the hypothesis tests actually convert all your raw data down into one value for a test statistic, such as the t-value. And then it uses that test statistic to determine whether your results are statistically significant. To be significant, the t-value must exceed a critical value, which is what you lookup in the table. Although, nowadays you’d typically let your software just tell you.

So, read the following two posts, which cover several aspects of t-values and distributions. And then, if you have more questions after that, you can post them. But you'll have a lot more information about them, and probably some of your questions will be answered! T-values T-distributions

January 27, 2021 at 3:10 pm

Jim, just found your website and really appreciate your thoughtful, thorough way of explaining things. I feel very dumb, but I’m struggling with p-values and was hoping you could help me.

Here’s the section that’s getting me confused:

“First, we need to calculate the effect that is present in our sample. The effect is the distance between the sample value and null value: 330.6 – 260 = 70.6. Next, I’ll shade the regions on both sides of the distribution that are at least as far away as 70.6 from the null (260 +/- 70.6). This process graphs the probability of observing a sample mean at least as extreme as our sample mean.

** I’m good up to this point. Draw the picture, do the subtraction, shade the regions. BUT, I’m not sure how to figure out the area of the shaded region — even with a T-table. When I look at the T-table on 24 df, I’m not sure what to do with those numbers, as none of them seem to correspond in any way to what I’m looking at in the problem. In the end, I have no idea how you calculated each shaded area being 0.01556.

I feel like there’s a (very simple) step that everyone else knows how to do, but for some reason I’m missing it.

Again, dumb question, but I’d love your help clarifying that.

thank you, Sara

January 27, 2021 at 9:51 pm

That’s not a dumb question at all. I actually don’t show or explain the calculations for figuring out the area. The reason for that is the same reason why students never calculate the critical t-values for their tests; instead, you look them up in tables or use statistical software. The common reason for all of that is that calculating these values is extremely complicated! It’s best to let software do that for you or, when looking up critical values, use the tables!

The principle, though, is that the percentage of the area under the curve equals the probability that values will fall within that range.

Equation for t-distribution

And then, for this example, you’d need to figure out the area under the curve for particular ranges!
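In practice, "letting the software do it" amounts to one call to the t-distribution's survival function. The t-value of about 2.29 and df = 24 below are reconstructed from the example, not exact:

```python
from scipy import stats

# Assumed reconstruction of the example: t ~ 2.29 with 24 degrees of
# freedom. The survival function integrates the curve's tail for us.
tail_area = stats.t.sf(2.29, df=24)
print(round(tail_area, 4))        # one shaded tail, near 0.0156
print(round(2 * tail_area, 3))    # both tails, near the 0.031 p-value
```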

January 15, 2021 at 10:57 am

Hi Jim, I have a question related to hypothesis tests. In medical imaging, there are different ways to measure signal intensity (from a tumor lesion, for example). For the same 100 patients, I tested 4 different ways to measure tumor captation relative to an injected dose. So for the 100 patients, I got 4 linear regressions (relation between injected dose and measured quantity at tumor sites), i.e., an output of 4 equations:

  • Condition A: output = -0.034308 + 0.0006602*input
  • Condition B: output = 0.0117631 + 0.0005425*input
  • Condition C: output = 0.0087871 + 0.0005563*input
  • Condition D: output = 0.001911 + 0.0006255*input

My question: I want to compare the 4 methods to find the best one (compared to the others). Is a hypothesis test right for this? And if yes, I cannot find a test to perform it. Can you suggest software? I usually use JMP for my stats, but I am open to other software.

Thanks for your time, G

November 16, 2020 at 5:42 am

Thank you very much for writing about this topic!

Your explanation helped me make more sense of why we reject the null hypothesis when p value < significance level.

Kind greetings, Jalal

September 25, 2020 at 1:04 pm

Hi Jim, Your explanations are so helpful! Thank you. I wondered about your first graph. I see that the mean of the graph is 260 from the null hypothesis, and it looks like the standard deviation of the graph is about 31. Where did you get 31 from? Thank you

September 25, 2020 at 4:08 pm

Hi Michelle,

That is a great question. Very observant. And it gets to how these tests work. The hypothesis test that I’m illustrating here is the one-sample t-test. And this graph illustrates the sampling distribution for the t-test. T-tests use the t-distribution to determine the sampling distribution. For the t-distribution, you need to specify the degrees of freedom, which entirely defines the distribution (i.e., it’s the only parameter). For 1-sample t-tests, the degrees of freedom equal the number of observations minus 1. This dataset has 25 observations. Hence, the 24 DF you see in the graph.

Unlike the normal distribution, there is no standard deviation parameter. Instead, the degrees of freedom determines the spread of the curve. Typically, with t-tests, you’ll see results discussed in terms of t-values, both for your sample and for defining the critical regions. However, for this introductory example, I’ve converted the t-values into the raw data units (t-value * SE mean).

So, the standard deviation you’re seeing in the graph is a result of the spread of the underlying t-distribution that has 24 degrees of freedom and then applying the conversion from t-values to raw values.
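A sketch of that conversion, using an assumed standard error of the mean of roughly 30.8:

```python
from scipy import stats

# Assumed values: df = 24 from the example; the standard error of the mean
# (~30.8) is inferred from the graphs rather than stated in the post.
df, se_mean, null_mean = 24, 30.8, 260

t_crit = stats.t.ppf(0.975, df)               # critical t-value, alpha = 0.05
raw_boundary = null_mean + t_crit * se_mean   # same boundary in raw units
print(round(t_crit, 3), round(raw_boundary, 1))
```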

September 10, 2020 at 8:19 am

Your blog is incredible.

I am having difficulty understanding why the phrase ‘as extreme as’ is required in the definition of p-value (“P values are the probability that a sample will have an effect at least as extreme as the effect observed in your sample if the null hypothesis is correct.”)

Why can’t P-Values simply be defined as “The probability of sample observation if the null hypothesis is correct?”

In your other blog titled ‘Interpreting P values’ you have explained p-values as “P-values indicate the believability of the devil’s advocate case that the null hypothesis is correct given the sample data”. I understand (or accept) this explanation. How does one move from this definition to one that contains the phrase ‘as extreme as’?

September 11, 2020 at 5:05 pm

Thanks so much for your kind words! I’m glad that my website has been helpful!

The key to understanding the “at least as extreme” wording lies in the probability plots for p-values. Using probability plots for continuous data, you can calculate probabilities, but only for ranges of values. I discuss this in my post about understanding probability distributions . In a nutshell, we need a range of values for these probabilities because the probabilities are derived from the area under a distribution curve. A single value just produces a line on these graphs rather than an area. Those ranges are the shaded regions in the probability plots. For p-values, the range corresponds to the “at least as extreme” wording. That’s where it comes from. We need a range to calculate a probability. We can’t use the single value of the observed effect because it doesn’t produce an area under the curve.

I hope that helps! I think this is a particularly confusing part of understanding p-values that most people don’t understand.

August 7, 2020 at 5:45 pm

Hi Jim, thanks for the post.

Could you please clarify the following excerpt from ‘Graphing Significance Levels as Critical Regions’:

“The percentage of the area under the curve that is shaded equals the probability that the sample value will fall in those regions if the null hypothesis is correct.”

I’m not sure if I understood this correctly. If the sample value falls in one of the shaded regions, doesn’t that mean the null hypothesis can be rejected, and hence is not correct?

August 7, 2020 at 10:23 pm

Think of it this way. There are two basic reasons why a sample value could fall in a critical region:

  • The null hypothesis is correct and random chance caused the sample value to be unusual.
  • The null hypothesis is not correct.

You don’t know which one is true. Remember, just because you reject the null hypothesis it doesn’t mean the null is false. However, by using hypothesis tests to determine statistical significance, you control the chances of #1 occurring. The rate at which #1 occurs equals your significance level. On the other hand, you don’t know the probability of the sample value falling in a critical region if the alternative hypothesis is correct (#2). It depends on the precise distribution for the alternative hypothesis, and you usually don’t know that, which is why you’re testing the hypotheses in the first place!

I hope I answered the question you were asking. If not, feel free to ask follow up questions. Also, this ties into how to interpret p-values . It’s not exactly straightforward. Click the link to learn more.

June 4, 2020 at 6:17 am

Hi Jim, thank you very much for your answer. You helped me a lot!

June 3, 2020 at 5:23 pm

Hi, thanks for this post. I’ve been learning a lot from you. My question is regarding lack of fit. The p-value of my lack-of-fit test is really low, making my lack of fit significant, meaning my model does not fit well. Is my case a “false negative,” given that my pure error is really low, which affects the computation of the lack of fit? So it means my model is good? Below I show some information that I hope helps clarify my question.

                SumSq      DF   MeanSq     F        pValue
Total           1246.5     18   69.25
Model           1241.7      6   206.94     514.43   9.3841e-14
 . Linear       1196.6      3   398.87     991.53   1.2318e-14
 . Nonlinear    45.046      3   15.015     37.326   2.3092e-06
Residual        4.8274     12   0.40228
 . Lack of fit  4.7388      7   0.67698    38.238   0.0004787
 . Pure error   0.088521    5   0.017704

June 3, 2020 at 7:53 pm

As you say, a low p-value for a lack-of-fit test indicates that the model doesn’t fit your data adequately. This is a positive result for the test, which means it can’t be a “false negative.” At best, it could be a false positive, meaning that your data actually fit the model well despite the low p-value.

I’d recommend graphing the residuals and looking for patterns . There is probably a relationship between variables that you’re not modeling correctly, such as curvature or interaction effects. There’s no way to diagnose the specific nature of the lack-of-fit problem by using the statistical output. You’ll need the graphs.

If there are no patterns in the residual plots, then your lack-of-fit results might be a false positive.

I hope this helps!

May 30, 2020 at 6:23 am

First of all, I have to say there are not many resources that explain a complicated topic in an easier manner.

My question is, how do we arrive at “if p value is less than alpha, we reject the null hypothesis.”

Is this covered in a separate article I could read?

Thanks Shekhar

May 25, 2020 at 12:21 pm

Hi Jim, terrific website, blog, and after this I’m ordering your book. One of my biggest challenges is nomenclature, definitions, context, and formulating the hypotheses. Here’s one I want to be doubly sure I understand. From above you write: ” These tools allow us to test these two hypotheses:

Null hypothesis: The population mean equals the null hypothesis mean (260). Alternative hypothesis: The population mean does not equal the null hypothesis mean (260). ” I keep thinking that 260 is the population mean mu, the underlying population (that we never really know exactly), and that the null hypothesis is comparing mu to x-bar (the sample mean of the 25 families randomly sampled, with mean = sample mean = x-bar = 330.6).

So is the following incorrect, and if so, why? Null hypothesis: The population mean mu=260 equals the null hypothesis mean x-bar (330.6). Alternative hypothesis: The population mean mu=260 does not equal the null hypothesis mean x-bar (330.6).

And my thinking is that usually the formulation of null and alternative hypotheses is “test value” = “mu current of underlying population”, whereas I read the formulation on the webpage above to be the reverse.

Any comments appreciated. Many Thanks,

May 26, 2020 at 8:56 pm

The null hypothesis states that the population value equals the null value. Now, I know that’s not particularly helpful! But the null value varies based on the test and context. So, in this example, we’re setting the null value at $260, which was the mean from the previous year. So, our null hypothesis states:

Null: the population mean (mu) = 260. Alternative: the population mean ≠ 260.

These hypothesis statements are about the population parameter. For this type of one-sample analysis, the target or reference value you specify is the null hypothesis value. Additionally, you don’t include the sample estimate in these statements, which is the X-bar portion you tacked on at the end. It’s strictly about the value of the population parameter you’re testing. You don’t know the value of the underlying distribution. However, given the mutually exclusive nature of the null and alternative hypothesis, you know one or the other is correct. The null states that mu equals 260 while the alternative states that it doesn’t equal 260. The data help you decide, which brings us to . . .

However, the procedure does compare our sample data to the null hypothesis value, which is how it determines how strong our evidence is against the null hypothesis.

I hope I answered your question. If not, please let me know!

May 8, 2020 at 6:00 pm

Using the interpretation “you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true” really ties our heads in a knot. However, the reverse interpretation is much more intuitive and easier: we will observe a sample effect of at least 70.6 about 96.9% of the time if the null is false (that is, our hypothesis is true).

May 8, 2020 at 7:25 pm

Your phrasing really isn’t any simpler. And it has the additional misfortune of being incorrect.

What you’re essentially doing is creating a one-sided confidence interval by using the p-value from a two-sided test. That’s incorrect in two ways.

  • Don’t mix and match one-sided and two-sided test results.
  • Confidence levels are determined by the significance level, not p-values.

So, what you need is a two-sided 95% CI (1 − alpha). You could then state that the results are statistically significant and that you have 95% confidence that the population effect is between X and Y. If you want a lower bound as you propose, you’ll need to use a one-sided hypothesis test with a 95% lower bound. That will give you a different value for the lower bound than the one you used.

I like confidence intervals. As I write elsewhere, I think they’re easier to understand and provide more information than a binary test result. But, you need to use them correctly!

One other point. When you are talking about p-values, it’s always under the assumption that the null hypothesis is correct. You *never* state anything about the p-value in relation to the null being false (i.e. alternative is true). But, if you want to use the type of phrasing you suggest, use it in the context of CIs and incorporate the points I cover above.

February 10, 2020 at 11:13 am

Thank you very much, professor, for sharing your knowledge. Special greetings from Colombia.

August 6, 2019 at 11:46 pm

I found this really helpful. Also, can you help me out?

I’m a little confused. Can you tell me if the level of significance and p-value are comparable or not, and if they are, what does it mean if p-value < LS? Do we reject the null hypothesis or accept it?

August 7, 2019 at 12:49 am

Hi Divyanshu,

Yes, you compare the p-value to the significance level. When the p-value is less than the significance level (alpha), your results are statistically significant and you reject the null hypothesis.

I’d suggest re-reading the “Using P values and Significance Levels Together” section near the end of this post more closely. That describes the process. The next section describes what it all means.

July 1, 2019 at 4:19 am

Sure. I will use it only in my classrooms, and only offline, with due credit to your original page. I will encourage my students to visit your blog. I have purchased your eBook on regression; it is immensely useful.

July 1, 2019 at 9:52 am

Hi Narasimha, that sounds perfect. Thanks for buying my ebook as well. I’m thrilled to hear that you’ve found it to be helpful!

June 28, 2019 at 6:22 am

I have benefited a lot from your writings…. Can I share them with my students in the classroom?

June 30, 2019 at 8:44 pm

Hi Narasimha,

Yes, you can certainly share with your students. Please attribute my original page. And please don’t copy whole sections of my posts onto another webpage as that can be bad with Google! Thanks!


February 11, 2019 at 7:46 pm

Hello, great site and my apologies if the answer to the following question exists already.

I’ve always wondered why we put the sampling distribution about the null hypothesis rather than simply leave it about the observed mean. I can see mathematically we are measuring the same distance from the null and basically can draw the same conclusions.

For example we take a sample (say 50 people) we gather an observation (mean wage) estimate the standard error in that observation and so can build a sampling distribution about the observed mean. That sampling distribution contains a confidence interval, where say, i am 95% confident the true mean lies (i.e. in repeated sampling the true mean would reside within this interval 95% of the time).

When i use this for a hyp-test, am i right in saying that we place the sampling dist over the reference level simply because it’s mathematically equivalent and it just seems easier to gauge how far the observation is from 0 via t-stats or its likelihood via p-values?

It seems more natural to me to look at it the other way around. leave the sampling distribution on the observed value, and then look where the null sits…if it’s too far left or right then it is unlikely the true population parameter is what we believed it to be, because if the null were true it would only occur ~ 5% of the time in repeated samples…so perhaps we need to change our opinion.

Can i interpret a hyp-test that way? Or do i have a misconception?

February 12, 2019 at 8:25 pm

The short answer is that, yes, you can draw the interval around the sample mean instead. And, that is, in fact, how you construct confidence intervals. The distance around the null hypothesis for hypothesis tests and the distance around the sample for confidence intervals are the same distance, which is why the results will always agree as long as you use corresponding alpha levels and confidence levels (e.g., alpha 0.05 with a 95% confidence level). I write about how this works in a post about confidence intervals .

I prefer confidence intervals for a number of reasons. They’ll indicate whether you have significant results if they exclude the null value and they indicate the precision of the effect size estimate. Corresponding with what you’re saying, it’s easier to gauge how far a confidence interval is from the null value (often zero) whereas a p-value doesn’t provide that information. See Practical versus Statistical Significance .

So, you don’t have any misconception at all! Just refer to it as a confidence interval rather than a hypothesis test, but, of course, they are very closely related.
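The equivalence described above can be sketched numerically. The snippet below uses a large-sample normal (z) approximation rather than the t-distribution the post uses, purely to keep it self-contained; the data are made up for illustration:

```python
import math

def mean_ci_and_test(xs, null_value, z=1.96):
    """95% CI around the sample mean, plus the equivalent two-sided
    z-test against the null value (normal approximation)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    se = sd / math.sqrt(n)
    ci = (mean - z * se, mean + z * se)
    # Two-sided p-value for H0: population mean == null_value
    z_stat = (mean - null_value) / se
    p = math.erfc(abs(z_stat) / math.sqrt(2))
    return ci, p

xs = [10, 12, 11, 13, 9, 11, 12, 10]
print(mean_ci_and_test(xs, 0))   # CI excludes 0, p is tiny
print(mean_ci_and_test(xs, 11))  # CI contains 11, p is large
```

Because both constructions use the same distance (z × standard error), the CI excludes the null value exactly when the p-value falls below the corresponding alpha.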


January 9, 2019 at 10:37 pm

Hi Jim, Nice Article.. I have a question… I read the Central limit theorem article before this article…

Coming to this article: during almost every hypothesis test, we draw a normal distribution curve assuming there is a sampling distribution (and then we go for the test statistic, p value, etc.). Do we draw a normal distribution curve for hypothesis tests because of the central limit theorem?

Thanks in advance, Surya

January 10, 2019 at 1:57 am

These distributions are actually t-distributions, which are different from the normal distribution. T-distributions have only one parameter: the degrees of freedom. As the DF increases, the t-distribution tightens up. Around 25 degrees of freedom, the t-distribution approximates the normal distribution. Depending on the type of t-test, this corresponds to a sample size of 26 or 27. Similarly, the sampling distribution of the means also approximates the normal distribution at around these sample sizes. With a large enough sample size, both the t-distribution and the sampling distribution converge to a normal distribution largely regardless of the underlying population distribution. So, yes, the central limit theorem plays a strong role in this.

It’s more accurate to say that central limit theorem causes the sampling distribution of the means to converge on the same distribution that the t-test uses, which allows you to assume that the test produces valid results. But, technically, the t-test is based on the t-distribution.

Problems can occur if the underlying distribution is non-normal and you have a small sample size. In that case, the sampling distribution of the means won’t approximate the t-distribution that the t-test uses. However, the test results will assume that it does and produce results based on that–which is why it causes problems!
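The convergence described above is easy to see by comparing critical values. This sketch assumes scipy is available; the specific DF values are illustrative:

```python
from scipy import stats

# Two-sided critical value at alpha = 0.05 for the t-distribution,
# compared with the normal distribution's ~1.96, as DF grows.
for df in (5, 25, 100):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

The t critical value shrinks toward the normal's as DF increases, which is why the two distributions become interchangeable at moderate sample sizes.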


November 19, 2018 at 9:15 am

Dear Jim! Thank you very much for your explanation. I need your help to understand my data. I have two samples (about 300 observations) with skewed distributions. I did the t-test and obtained the p-value, which is quite small. Can I draw the conclusion that the effect size is small even when the distribution of my data is not normal? Thank you

November 19, 2018 at 9:34 am

Hi Tetyana,

First, when you say that your p-value is small and that you want to “draw the conclusion that the effect size is small,” I assume that you mean statistically significant. When the p-value is low, the null hypothesis must go! In other words, you reject the null and conclude that there is a statistically significant effect–not a small effect.

Now, back to the question at hand! Yes, when you have a sufficiently large sample size, t-tests are robust to departures from normality. For a 2-sample t-test, you should have at least 15 observations per group, which you exceed by quite a bit. So, yes, you can reliably conclude that your results are statistically significant!

You can thank the central limit theorem! 🙂
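The robustness point can be demonstrated with simulated data. The sketch below (assuming scipy is available; the exponential samples and the 0.5 shift are invented for illustration) runs a 2-sample t-test on two heavily skewed samples of the commenter's size:

```python
import random
from scipy import stats

random.seed(1)
# Two skewed (exponential) samples of n = 300 each; samples this
# large make the t-test robust to the non-normality.
a = [random.expovariate(1.0) for _ in range(300)]        # mean near 1.0
b = [random.expovariate(1.0) + 0.5 for _ in range(300)]  # mean near 1.5
t_stat, p = stats.ttest_ind(a, b)
print(t_stat, p)  # the real 0.5 difference is detected despite the skew
```

With n = 300 per group, the central limit theorem makes the sampling distribution of the mean difference approximately normal even though the raw data are far from it.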


September 10, 2018 at 12:18 am

Hello Jim, I am very sorry; I have very elementary of knowledge of stats. So, would you please explain how you got a p- value of 0.03112 in the above calculation/t-test? By looking at a chart? Would you also explain how you got the information that “you will observe sample effects at least as large as 70.6 about 3.1% of the time if the null is true”?
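The commenter's question is about mechanics: values like 0.03112 come from the tail area of the t-distribution beyond the observed t statistic, which software computes (tables only approximate it). The specific figures (0.03112 and 70.6) belong to the post's fuel-cost example and are not recomputed here; the t statistic and DF below are illustrative stand-ins, assuming scipy:

```python
from scipy import stats

def two_tailed_p(t_stat, df):
    """Two-tailed p-value: the area in both tails of the t-distribution
    beyond +/- t_stat. Statistical software performs this calculation;
    printed tables only bracket the answer."""
    return 2 * stats.t.sf(abs(t_stat), df)

# e.g., a t statistic of 2.29 with 24 degrees of freedom:
print(round(two_tailed_p(2.29, 24), 4))
```

"3.1% of the time if the null is true" is just this tail area restated in words: 3.1% of random samples would produce an effect at least that far from the null value.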


July 6, 2018 at 7:02 am

A quick question regarding your use of two-tailed critical regions in the article above: why? I mean, what is a real-world scenario that would warrant a two-tailed test of any kind (z, t, etc.)? And if there are none, why keep using the two-tailed scenario as an example, instead of the one-tailed which is both more intuitive and applicable to most if not all practical situations. Just curious, as one person attempting to educate people on stats to another (my take on the one vs. two-tailed tests can be seen here: http://blog.analytics-toolkit.com/2017/one-tailed-two-tailed-tests-significance-ab-testing/ )

Thanks, Georgi

July 6, 2018 at 12:05 pm

There’s the appropriate time and place for both one-tailed and two-tailed tests. I plan to write a post on this issue specifically, so I’ll keep my comments here brief.

So much of statistics is context sensitive. People often want concrete rules for how to do things in statistics but that’s often hard to provide because the answer depends on the context, goals, etc. The question of whether to use a one-tailed or two-tailed test falls firmly in this category of it depends.

I did read the article you wrote. I'll say that I can see how in the context of A/B testing specifically there might be a propensity to use one-tailed tests. You only care about improvements. There's probably not too much downside in only caring about one direction. In fact, in a post where I compare different tests and different options , I suggest using a one-tailed test for a similar type of case involving defects. So, I'm onboard with the idea of using one-tailed tests when they're appropriate. However, I do think that two-tailed tests should be considered the default choice and that you need good reasons to move to a one-tailed test. Again, your A/B testing area might supply those reasons on a regular basis, but I can't make that a blanket statement for all research areas.

I think your article mischaracterizes some of the pros and cons of both types of tests. Just a couple of for instances. In a two-tailed test, you don’t have to take the same action regardless of which direction the results are significant (example below). And, yes, you can determine the direction of the effect in a two-tailed test. You simply look at the estimated effect. Is it positive or negative?

On the other hand, I do agree that one-tailed tests don’t increase the overall Type I error. However, there is a big caveat for that. In a two-tailed test, the Type I error rate is evenly split in both tails. For a one-tailed test, the overall Type I error rate does not change, but the Type I errors are redistributed so they all occur in the direction that you are interested in rather than being split between the positive and negative directions. In other words, you’ll have twice as many Type I errors in the specific direction that you’re interested in. That’s not good.

My big concerns with one-tailed tests are that it makes it easier to obtain the results that you want to obtain. And, all of the Type I errors (false positives) are in that direction too. It’s just not a good combination.

To answer your question about when you might want to use two-tailed tests, there are plenty of reasons. For one, you might want to avoid the situation I describe above. Additionally, in a lot of scientific research, the researchers truly are interested in detecting effects in either direction for the sake of science. Even in cases with a practical application, you might want to learn about effects in either direction.

For example, I was involved in a research study that looked at the effects of an exercise intervention on bone density. The idea was that it might be a good way to prevent osteoporosis. I used a two-tailed test. Obviously, we were hoping for a positive effect. However, we'd be very interested in knowing whether there was a negative effect too. And, this illustrates how you can take different actions based on the two directions. If there was a positive effect, you could recommend that as a good approach and try to promote its use. If there was a negative effect, you'd issue a warning not to do that intervention. You have the potential for learning both what is good and what is bad. The extra false positives would've caused problems because we'd think there'd be health benefits for participants when those benefits don't actually exist. Also, if we had performed only a one-tailed test and didn't obtain significant results, we'd learn that it wasn't a positive effect, but we would not know whether it was actually detrimental or not.

Here’s when I’d say it’s OK to use a one-tailed test. Consider a one-tailed test when you’re in situation where you truly only need to know whether an effect exists in one direction, and the extra Type I errors in that direction are an acceptable risk (false positives don’t cause problems), and there’s no benefit in determining whether an effect exists in the other direction. Those conditions really restrict when one-tailed tests are the best choice. Again, those restrictions might not be relevant for your specific field, but as for the usage of statistics as a whole, they’re absolutely crucial to consider.
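The mechanical difference between the two test types is just which tail areas count toward the p-value. This sketch (illustrative t statistic and DF, assuming scipy) shows why a one-tailed test is "easier" to pass in the hypothesized direction:

```python
from scipy import stats

def p_values(t_stat, df):
    """One- and two-tailed p-values for the same t statistic.
    For a positive t_stat, the one-tailed p (testing effect > 0)
    is exactly half the two-tailed p."""
    one = stats.t.sf(t_stat, df)            # H1: effect > 0
    two = 2 * stats.t.sf(abs(t_stat), df)   # H1: effect != 0
    return one, two

one, two = p_values(1.8, 30)
print(one, two)  # one-tailed clears alpha = 0.05, two-tailed does not
```

This halving is precisely the redistribution of Type I errors discussed above: the same overall error rate, but all of it concentrated in the direction you care about.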

On the other hand, according to this article, two-tailed tests might be important in A/B testing !


March 30, 2018 at 5:29 am

Dear Sir, please confirm if there is an inadvertent mistake in interpretation as, “We can conclude that mean fuel expenditures have increased since last year.” Our null hypothesis is =260. If found significant, it implies two possibilities – both increase and decrease. Please let us know if we are mistaken here. Many Thanks!

March 30, 2018 at 9:59 am

Hi Khalid, the null hypothesis as it is defined for this test represents the mean monthly expenditure for the previous year (260). The mean expenditure for the current year is 330.6, whereas it was 260 for the previous year. Consequently, the mean has increased from 260 to 330.6 over the course of a year. The p-value indicates that this increase is statistically significant. This finding does not suggest both an increase and a decrease–just an increase. Keep in mind that a significant result prompts us to reject the null hypothesis. So, we reject the null that the mean equals 260.

Let's explore the other possible findings to be sure that this makes sense. Suppose the sample mean had been closer to 260 and the p-value had been greater than the significance level. Those results would not have been statistically significant, and the conclusion we'd draw is that we have insufficient evidence to conclude that mean fuel expenditures have changed since the previous year.

If the sample mean had been less than the null hypothesis value (260) and the p-value had been statistically significant, we'd conclude that mean fuel expenditures had decreased and that the decrease was statistically significant.

When you interpret the results, you have to be sure to understand what the null hypothesis represents. In this case, it represents the mean monthly expenditure for the previous year and we’re comparing this year’s mean to it–hence our sample suggests an increase.
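The logic of this exchange maps directly onto a one-sample t-test. The expenditure values below are hypothetical (the post's actual data are not reproduced here); they are chosen so the sample mean is close to the 330.6 discussed above, and the sketch assumes scipy:

```python
from scipy import stats

# Hypothetical monthly fuel expenditures for the current year.
# H0: population mean == 260 (last year's mean).
expenditures = [310, 355, 290, 340, 370, 325, 305, 350]
t_stat, p = stats.ttest_1samp(expenditures, popmean=260)
print(t_stat, p)
# A significant result combined with a sample mean above 260 is
# evidence of an increase specifically: the test tells you the
# difference is real, and the sample mean tells you the direction.
```

As the reply explains, the significant p-value rejects "mean = 260"; the direction of the change comes from looking at which side of 260 the sample mean falls on.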

P-Value: What It Is, How to Calculate It, and Why It Matters


Yarilet Perez is an experienced multimedia journalist and fact-checker with a Master of Science in Journalism. She has worked in multiple cities covering breaking news, politics, education, and more. Her expertise is in personal finance and investing, and real estate.


In statistics, a p-value indicates the likelihood of obtaining a value equal to or more extreme than the observed result if the null hypothesis is true.

The p-value serves as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

P-value is often used to promote credibility for studies or reports by government agencies. For example, the U.S. Census Bureau stipulates that any analysis with a p-value greater than 0.10 must be accompanied by a statement that the difference is not statistically different from zero. The Census Bureau also has standards in place stipulating which p-values are acceptable for various publications.

Key Takeaways

  • A p-value is a statistical measurement used to validate a hypothesis against observed data.
  • A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value of 0.05 or lower is generally considered statistically significant.
  • P-value can serve as an alternative to—or in addition to—preselected confidence levels for hypothesis testing.

Jessica Olah / Investopedia

P-values are usually calculated using statistical software or p-value tables based on the assumed or known probability distribution of the specific statistic tested. While the sample size influences the reliability of the observed data, the p-value approach to hypothesis testing specifically involves calculating the p-value based on the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic. A greater difference between the two values corresponds to a lower p-value.

Mathematically, the p-value is calculated using integral calculus: it is the area under the probability distribution curve for all values of the test statistic that are at least as far from the reference value as the observed value, relative to the total area under the curve. Standard deviations, which quantify the dispersion of data points from the mean, are instrumental in this calculation.

The calculation for a p-value varies based on the type of test performed. The three test types describe the location on the probability distribution curve: lower-tailed test, upper-tailed test, or two-tailed test . In each case, the degrees of freedom play a crucial role in determining the shape of the distribution and thus, the calculation of the p-value.

In a nutshell, the greater the difference between two observed values, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.
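The "area under the curve" calculation above can be sketched for the standard normal case without any statistical library, using the complementary error function (the scaling of `erfc` to normal tail areas is a standard identity):

```python
import math

def normal_tail_p(z, tails=2):
    """Area under the standard normal curve at least as far from
    zero as z: the integral the article describes, evaluated via
    the complementary error function."""
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return tails * one_tail

print(round(normal_tail_p(1.96), 4))  # the familiar ~0.05 threshold
```

A larger |z| (a bigger standardized difference between the observed and reference values) leaves less area in the tails, which is exactly the "greater difference, lower p-value" relationship stated above.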

The P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. This determination relies heavily on the test statistic, which summarizes the information from the sample relevant to the hypothesis being tested. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value of the population parameter stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

Even a low p-value is not necessarily proof of a real effect, since there is still a possibility that the observed data are the result of chance. Only repeated experiments or studies can build confidence that a relationship is genuine.

For example, suppose a study comparing returns from two particular assets was undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ.

If one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and if the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%), then the first researcher would find that the two assets have a difference that is statistically significant , while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow readers to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. Independent observers could note the p-value and decide for themselves whether that represents a statistically significant difference or not.
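The scenario in this example, where the same p-value yields opposite conclusions under different significance levels, can be written out as a small sketch (illustrative only):

```python
def conclusions(p_value, alphas):
    """How the same p-value reads under different significance levels."""
    return {alpha: ("significant" if p_value < alpha else "not significant")
            for alpha in alphas}

# The article's example: p = 0.08 judged at the 90% confidence level
# (alpha = 0.10) versus the 95% confidence level (alpha = 0.05).
print(conclusions(0.08, (0.10, 0.05)))
```

Reporting the raw p-value (0.08) instead of only "significant/not significant" lets each reader apply whichever threshold they consider appropriate.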

Example of P-Value

An investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index . To determine this, the investor conducts a two-tailed test.

The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent—if the investor conducted a one-tailed test , the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

The p-value hypothesis test does not necessarily make use of a preselected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.

Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.

Although this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios relative to a benchmark such as the S&P 500.

For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with a lower p-value, will actually show consistently different results.

Is a 0.05 P-Value Significant?

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected. A p-value greater than 0.05 means that deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

What Does a P-Value of 0.001 Mean?

A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then there would be a one-in-1,000 chance of observing results at least as extreme. This leads the observer to reject the null hypothesis because either a highly rare data result has been observed or the null hypothesis is incorrect.

How Can You Use P-Value to Compare 2 Different Results of a Hypothesis Test?

If you have two different results, one with a p-value of 0.04 and one with a p-value of 0.06, the result with a p-value of 0.04 will be considered more statistically significant than the p-value of 0.06. Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value. Both are statistically significant, but the 0.001 example provides an even stronger case against the null hypothesis than the 0.04.

The Bottom Line

The p-value is used to measure the significance of observational data. When researchers identify an apparent relationship between two variables, there is always a possibility that this correlation might be a coincidence. A p-value calculation helps determine if the observed relationship could arise as a result of chance.

U.S. Census Bureau. “ Statistical Quality Standard E1: Analyzing Data .”

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Hypothesis Testing, P Values, Confidence Intervals, and Significance

Jacob Shreffler ; Martin R. Huecker .

Affiliations

Last Update: March 13, 2023 .

  • Definition/Introduction

Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a research hypothesis is tested and the results are provided, typically with p values, confidence intervals, or both. Additionally, statistical or research significance is estimated or determined by the investigators. Unfortunately, healthcare providers may have different comfort levels in interpreting these findings, which may affect how appropriately the data are applied.

  • Issues of Concern

Without a foundational understanding of hypothesis testing, p values, confidence intervals, and the difference between statistical and clinical significance, healthcare providers may struggle to make clinical decisions without relying purely on the level of significance deemed appropriate by the research investigators. Therefore, an overview of these concepts is provided to allow medical professionals to use their expertise to determine whether results are reported sufficiently and whether the study outcomes are clinically appropriate to be applied in healthcare practice.

Hypothesis Testing

Investigators conducting studies need research questions and hypotheses to guide analyses. Starting with broad research questions (RQs), investigators then identify a gap in current clinical practice or research. Any research problem or statement is grounded in a better understanding of relationships between two or more variables. For this article, we will use the following research question example:

Research Question: Is Drug 23 an effective treatment for Disease A?

Research questions do not directly imply specific guesses or predictions; we must formulate research hypotheses. A hypothesis is a predetermined declaration regarding the research question in which the investigator(s) makes a precise, educated guess about a study outcome. This is sometimes called the alternative hypothesis and ultimately allows the researcher to take a stance based on experience or insight from medical literature. An example of a hypothesis is below.

Research Hypothesis: Drug 23 will significantly reduce symptoms associated with Disease A compared to Drug 22.

The null hypothesis states that there is no statistical difference between groups based on the stated research hypothesis.

Researchers should be aware of journal recommendations when considering how to report p values, and manuscripts should remain internally consistent.

Regarding p values: as the number of individuals enrolled in a study (the sample size) increases, the likelihood of finding a statistically significant effect increases. With very large sample sizes, the p-value can be very low even for small differences in the reduction of symptoms for Disease A between Drug 23 and Drug 22. The null hypothesis is deemed true until a study presents significant data to support rejecting it. Based on the results, the investigators will either reject the null hypothesis (if they find significant differences or associations) or fail to reject the null hypothesis (if they cannot provide proof that there are significant differences or associations).

To test a hypothesis, researchers obtain data on a representative sample to determine whether to reject or fail to reject a null hypothesis. In most research studies, it is not feasible to obtain data for an entire population. Using a sampling procedure allows for statistical inference, though this involves a certain possibility of error. [1]  When determining whether to reject or fail to reject the null hypothesis, mistakes can be made: Type I and Type II errors. Though it is impossible to ensure that these errors have not occurred, researchers should limit the possibilities of these faults. [2]

Significance

Significance is a term to describe the substantive importance of medical research. Statistical significance describes how unlikely the observed results would be if chance alone were at work. [3]  Healthcare providers should always delineate statistical significance from clinical significance, a common error when reviewing biomedical research. [4]  When conceptualizing findings reported as either significant or not significant, healthcare providers should not simply accept researchers' results or conclusions without considering the clinical significance. Healthcare professionals should consider the clinical importance of findings and understand both p values and confidence intervals so they do not have to rely on the researchers to determine the level of significance. [5]  One criterion often used to determine statistical significance is the utilization of p values.

P values are used in research to determine whether the sample estimate is significantly different from a hypothesized value. The p-value is the probability that the observed effect within the study would have occurred by chance if, in reality, there was no true effect. Conventionally, data yielding a p<0.05 or p<0.01 is considered statistically significant. While some have debated that the 0.05 level should be lowered, it is still universally practiced. [6]  Hypothesis testing alone, however, does not tell us the size of the effect.

An example of findings reported with p values are below:

Statement: Drug 23 reduced patients' symptoms compared to Drug 22. Patients who received Drug 23 (n=100) were 2.1 times less likely than patients who received Drug 22 (n = 100) to experience symptoms of Disease A, p<0.05.

Statement: Individuals who were prescribed Drug 23 experienced fewer symptoms (M = 1.3, SD = 0.7) compared to individuals who were prescribed Drug 22 (M = 5.3, SD = 1.9). This finding was statistically significant, p = 0.02.

For either statement, if the threshold had been set at 0.05, the null hypothesis (that there was no relationship) should be rejected, and we should conclude significant differences. Noticeably, as can be seen in the two statements above, some researchers will report findings with < or >, while others will provide an exact p-value (e.g., p = 0.000001), but never zero. [6]  When examining research, readers should understand how p values are reported. The best practice is to report all p values for all variables within a study design, rather than only providing p values for variables with significant findings. [7]  The inclusion of all p values provides evidence for study validity and limits suspicion of selective reporting/data mining.

While researchers have historically used p values, experts who find p values problematic encourage the use of confidence intervals. [8]  P-values alone do not allow us to understand the size or the extent of the differences or associations. [3]  In March 2016, the American Statistical Association (ASA) released a statement on p values, noting that scientific decision-making and conclusions should not be based on a fixed p-value threshold (e.g., 0.05). They recommend focusing on the significance of results in the context of study design, quality of measurements, and validity of data. Ultimately, the ASA statement noted that in isolation, a p-value does not provide strong evidence. [9]

When conceptualizing clinical work, healthcare professionals should consider p values with a concurrent appraisal of study design validity. For example, a p-value from a double-blinded randomized clinical trial (designed to minimize bias) should be weighted higher than one from a retrospective observational study. [7]  The p-value debate has smoldered since the 1950s [10] , and replacement with confidence intervals has been suggested since the 1980s. [11]

Confidence Intervals

A confidence interval provides a range of values that, with a given level of confidence (e.g., 95%), is likely to include the true value of the statistical parameter in a targeted population. [12]  Most research uses a 95% CI, but investigators can set any level (e.g., 90% CI, 99% CI). [13]  A CI provides a range with the lower bound and upper bound limits of a difference or association that would be plausible for a population. [14]  Therefore, a CI of 95% indicates that if a study were carried out 100 times, the range would contain the true value in 95 of them. [15]  Confidence intervals provide more evidence regarding the precision of an estimate than p-values do. [6]

In consideration of the similar research example provided above, one could make the following statement with 95% CI:

Statement: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22; there was a mean difference between the two groups in days to recovery of 4.2 days (95% CI: 1.9 – 7.8).

It is important to note that the width of the CI is affected by the standard error and the sample size; reducing the sample size results in a less precise CI (a wider interval). [14] A larger width indicates a smaller sample size or larger variability. [16] Researchers generally want the CI to be as precise as possible; for example, a 95% CI of 1.43 – 1.47 is much more precise than the one provided in the example above. In research and clinical practice, CIs provide valuable information on whether the interval includes or excludes any clinically significant values. [14]
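To illustrate how such an interval arises from data, here is a minimal Python sketch using entirely hypothetical recovery-time samples and a normal-approximation 95% CI (for samples this small, a t-based interval would be preferable):

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means of two independent
    samples, using a normal (z) approximation to the standard error."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

# Hypothetical days-to-recovery data for two drug groups (invented numbers)
drug_22 = [9, 10, 8, 11, 12, 10, 9, 11]
drug_23 = [5, 6, 7, 5, 6, 6, 7, 5]
low, high = mean_diff_ci(drug_22, drug_23)
print(f"95% CI for the mean difference: {low:.2f} to {high:.2f} days")
```

Because the whole interval lies above zero, this hypothetical result would exclude the null value for a difference in means.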

CIs for differences are often judged against a null value (zero for differential comparisons and 1 for ratios). However, CIs provide more information than that. [15] Consider this example: a hospital implements a new protocol that reduced wait time for patients in the emergency department by an average of 25 minutes (95% CI: -2.5 – 41 minutes). Because the range crosses zero, implementing this protocol in other populations could result in longer wait times; however, the range extends much further on the positive side. Thus, while a p-value used to detect statistical significance here may read "not significant," individuals should examine this range, consider the study design, and weigh whether the protocol is still worth piloting in their workplace.

Like p-values, 95% CIs cannot control for researchers' errors (e.g., study bias or improper data analysis). [14] When deciding whether to report p-values or CIs, researchers should examine journal preferences; when in doubt, reporting both may be beneficial. [13] An example is below:

Reporting both: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22, p = 0.009. The mean difference in days to recovery between the two groups was 4.2 days (95% CI: 1.9 – 7.8).

  • Clinical Significance

Recall that clinical significance and statistical significance are two different concepts. Healthcare providers should remember that a study with statistically significant differences and a large sample size may be of no interest to clinicians, whereas a study with a smaller sample size and statistically non-significant results could still impact clinical practice. [14] Additionally, as previously mentioned, a non-significant finding may reflect the study design itself rather than the relationships between variables.

Healthcare providers using evidence-based medicine to inform practice should use clinical judgment to determine the practical importance of studies through careful evaluation of the design, sample size, power, likelihood of type I and type II errors, data analysis, and reporting of statistical findings (p values, 95% CI, or both). [4] Interestingly, some experts have called for the phrases "statistically significant" and "not significant" to be excluded from published work, as statistical significance has never been and will never be equivalent to clinical significance. [17]

Deciding what is clinically significant can be challenging and depends on the provider's experience and, especially, the severity of the disease. Providers should use their knowledge and experience to determine the meaningfulness of study results, making inferences not only from researchers' significant or non-significant findings but also from their own understanding of study limitations and practical implications.

  • Nursing, Allied Health, and Interprofessional Team Interventions

All physicians, nurses, pharmacists, and other healthcare professionals should strive to understand the concepts in this chapter. These individuals should maintain the ability to review and incorporate new literature for evidence-based and safe care. 


Disclosure: Jacob Shreffler declares no relevant financial relationships with ineligible companies.

Disclosure: Martin Huecker declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Shreffler J, Huecker MR. Hypothesis Testing, P Values, Confidence Intervals, and Significance. [Updated 2023 Mar 13]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


p-value Calculator

  • What is p-value?
  • How do I calculate p-value from test statistic?
  • How to interpret p-value
  • How to use the p-value calculator to find p-value from test statistic
  • How do I find p-value from Z-score?
  • How do I find p-value from t?
  • p-value from chi-square score (χ² score)
  • p-value from F-score

Welcome to our p-value calculator! You will never again have to wonder how to find the p-value, as here you can determine the one-sided and two-sided p-values from test statistics, following all the most popular distributions: normal, t-Student, chi-squared, and Snedecor's F.

P-values appear all over science, yet many people find the concept a bit intimidating. Don't worry – in this article, we will explain not only what the p-value is but also how to interpret p-values correctly . Have you ever been curious about how to calculate the p-value by hand? We provide you with all the necessary formulae as well!

🙋 If you want to revise some basics from statistics, our normal distribution calculator is an excellent place to start.

Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample . It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H 0 is true !

More intuitively, p-value answers the question:

Assuming that I live in a world where the null hypothesis holds, how probable is it that, for another sample, the test I'm performing will generate a value at least as extreme as the one I observed for the sample I already have?

It is the alternative hypothesis that determines what "extreme" actually means , so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H 0 ) is the probability of an event, calculated under the assumption that H 0 is true:

Left-tailed test: p-value = Pr(S ≤ x | H 0 )

Right-tailed test: p-value = Pr(S ≥ x | H 0 )

Two-tailed test:

p-value = 2 × min{Pr(S ≤ x | H 0 ), Pr(S ≥ x | H 0 )}

(By min{a,b} , we denote the smaller number out of a and b .)

If the distribution of the test statistic under H 0 is symmetric about 0 , then: p-value = 2 × Pr(S ≥ |x| | H 0 )

or, equivalently: p-value = 2 × Pr(S ≤ -|x| | H 0 )

As a picture is worth a thousand words, let us illustrate these definitions. Here, we use the fact that the probability can be neatly depicted as the area under the density curve for a given distribution. We give two sets of pictures: one for a symmetric distribution and the other for a skewed (non-symmetric) distribution.

  • Symmetric case: normal distribution:

p-values for symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

  • Non-symmetric case: chi-squared distribution:

p-values for non-symmetric distribution — left-tailed, right-tailed, and two-tailed tests.

In the last picture (two-tailed p-value for skewed distribution), the area of the left-hand side is equal to the area of the right-hand side.

To determine the p-value, you need to know the distribution of your test statistic under the assumption that the null hypothesis is true . Then, with the help of the cumulative distribution function ( cdf ) of this distribution, we can express the probability of the test statistics being at least as extreme as its value x for the sample:

Left-tailed test:

p-value = cdf(x) .

Right-tailed test:

p-value = 1 - cdf(x) .

Two-tailed test:

p-value = 2 × min{cdf(x) , 1 - cdf(x)} .

If the distribution of the test statistic under H 0 is symmetric about 0 , then a two-sided p-value can be simplified to p-value = 2 × cdf(-|x|) , or, equivalently, as p-value = 2 - 2 × cdf(|x|) .

The probability distributions that are most widespread in hypothesis testing tend to have complicated cdf formulae, and finding the p-value by hand may not be possible. You'll likely need to resort to a computer or to a statistical table, where people have gathered approximate cdf values.
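Even without a statistical table, the cdf-based formulae above are easy to evaluate in software. The sketch below is a minimal Python illustration that uses the standard normal distribution as the null distribution (its cdf is computed via the standard library's error function); any other cdf could be plugged in the same way:

```python
import math

def norm_cdf(x):
    """cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(x, cdf, tail="two-tailed"):
    """p-value of a test statistic x, given the cdf of its null distribution."""
    if tail == "left":
        return cdf(x)
    if tail == "right":
        return 1.0 - cdf(x)
    # two-tailed: double the smaller tail probability
    return 2.0 * min(cdf(x), 1.0 - cdf(x))

print(p_value(1.96, norm_cdf))           # ~0.05 (two-tailed)
print(p_value(1.96, norm_cdf, "right"))  # ~0.025
```

The two-tailed value for x = 1.96 recovers the familiar 0.05 significance boundary of the standard normal distribution.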

Well, you now know how to calculate the p-value, but… why do you need to calculate this number in the first place? In hypothesis testing, the p-value approach is an alternative to the critical value approach . Recall that the latter requires researchers to pre-set the significance level, α, which is the probability of rejecting the null hypothesis when it is true (so of type I error ). Once you have your p-value, you just need to compare it with any given α to quickly decide whether or not to reject the null hypothesis at that significance level, α. For details, check the next section, where we explain how to interpret p-values.

As we have mentioned above, the p-value answers the following question: assuming the null hypothesis holds, how probable is it that the test statistic takes a value at least as extreme as the one observed for your sample?

What does that mean for you? Well, there are two possibilities:

  • A high p-value means that your data is highly compatible with the null hypothesis; and
  • A small p-value provides evidence against the null hypothesis , as it means that your result would be very improbable if the null hypothesis were true.

However, it may happen that the null hypothesis is true, but your sample is highly unusual! For example, imagine we studied the effect of a new drug and got a p-value of 0.03 . This means that in 3% of similar studies, random chance alone would still be able to produce the value of the test statistic that we obtained, or a value even more extreme, even if the drug had no effect at all!

The question "what is p-value" can also be answered as follows: p-value is the smallest level of significance at which the null hypothesis would be rejected. So, if you now want to make a decision on the null hypothesis at some significance level α , just compare your p-value with α :

  • If p-value ≤ α , then you reject the null hypothesis and accept the alternative hypothesis; and
  • If p-value > α , then you don't have enough evidence to reject the null hypothesis.
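This decision rule is trivial to express in code. A minimal sketch (the default threshold and the return strings are illustrative choices):

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with a pre-set significance level alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))              # rejects at alpha = 0.05
print(decide(0.03, alpha=0.01))  # fails to reject at alpha = 0.01
```

The same p-value of 0.03 leads to opposite decisions at the two levels, which is why α must be fixed before the data are analyzed.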

Obviously, the fate of the null hypothesis depends on α . For instance, if the p-value was 0.03 , we would reject the null hypothesis at a significance level of 0.05 , but not at a level of 0.01 . That's why the significance level should be stated in advance and not adapted conveniently after the p-value has been established! A significance level of 0.05 is the most common value, but there's nothing magical about it. Here, you can see what too strong a faith in the 0.05 threshold can lead to. It's always best to report the p-value, and allow the reader to make their own conclusions.

Also, bear in mind that subject area expertise (and common sense) is crucial. Otherwise, by mindlessly applying statistical principles, you can easily arrive at statistically significant results even when the underlying conclusion is completely untrue.

As our p-value calculator is here at your service, you no longer need to wonder how to find p-value from all those complicated test statistics! Here are the steps you need to follow:

Pick the alternative hypothesis : two-tailed, right-tailed, or left-tailed.

Tell us the distribution of your test statistic under the null hypothesis: is it N(0,1), t-Student, chi-squared, or Snedecor's F? If you are unsure, check the sections below, as they are devoted to these distributions.

If needed, specify the degrees of freedom of the test statistic's distribution.

Enter the value of test statistic computed for your data sample.

Our calculator determines the p-value from the test statistic and provides the decision to be made about the null hypothesis. The standard significance level is 0.05 by default.

Go to the advanced mode if you need to increase the precision with which the calculations are performed or change the significance level .

In terms of the cumulative distribution function (cdf) of the standard normal distribution, which is traditionally denoted by Φ , the p-value is given by the following formulae:

Left-tailed z-test:

p-value = Φ(Z score )

Right-tailed z-test:

p-value = 1 - Φ(Z score )

Two-tailed z-test:

p-value = 2 × Φ(−|Z score |)

p-value = 2 - 2 × Φ(|Z score |)

🙋 To learn more about Z-tests, head to Omni's Z-test calculator .

We use the Z-score if the test statistic approximately follows the standard normal distribution N(0,1) . Thanks to the central limit theorem, you can count on the approximation if you have a large sample (say at least 50 data points) and treat your distribution as normal.

A Z-test most often refers to testing the population mean , or the difference between two population means, in particular between two proportions. You can also find Z-tests in maximum likelihood estimations.

The p-value from the t-score is given by the following formulae, in which cdf t,d stands for the cumulative distribution function of the t-Student distribution with d degrees of freedom:

Left-tailed t-test:

p-value = cdf t,d (t score )

Right-tailed t-test:

p-value = 1 - cdf t,d (t score )

Two-tailed t-test:

p-value = 2 × cdf t,d (−|t score |)

p-value = 2 - 2 × cdf t,d (|t score |)

Use the t-score option if your test statistic follows the t-Student distribution . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails – the exact shape depends on the parameter called the degrees of freedom . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from the normal distribution N(0,1).

The most common t-tests are those for population means with an unknown population standard deviation, or for the difference between means of two populations , with either equal or unequal yet unknown population standard deviations. There's also a t-test for paired (dependent) samples .

🙋 To get more insights into t-statistics, we recommend using our t-test calculator .
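The t-score formulae above can be evaluated directly, assuming the SciPy library is available (its scipy.stats.t object exposes the t-Student cdf and the survival function sf, i.e. 1 - cdf):

```python
from scipy import stats

def t_p_value(t_score, df, tail="two-tailed"):
    """p-value from a t statistic with df degrees of freedom."""
    if tail == "left":
        return stats.t.cdf(t_score, df)
    if tail == "right":
        return stats.t.sf(t_score, df)  # sf = 1 - cdf
    return 2.0 * stats.t.sf(abs(t_score), df)

# e.g., t = 2.0 with 10 degrees of freedom, two-tailed
print(round(t_p_value(2.0, 10), 4))
```

For a t-score of 2.0 with 10 degrees of freedom, the two-tailed p-value is a little above 0.07, noticeably larger than the ~0.046 a standard normal would give, reflecting the heavier tails.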

Use the χ²-score option when performing a test in which the test statistic follows the χ²-distribution .

This distribution arises if, for example, you take the sum of squared variables, each following the normal distribution N(0,1). Remember to check the number of degrees of freedom of the χ²-distribution of your test statistic!

How to find the p-value from chi-square-score ? You can do it with the help of the following formulae, in which cdf χ²,d denotes the cumulative distribution function of the χ²-distribution with d degrees of freedom:

Left-tailed χ²-test:

p-value = cdf χ²,d (χ² score )

Right-tailed χ²-test:

p-value = 1 - cdf χ²,d (χ² score )

Remember that χ²-tests for goodness-of-fit and independence are right-tailed tests! (see below)

Two-tailed χ²-test:

p-value = 2 × min{cdf χ²,d (χ² score ), 1 - cdf χ²,d (χ² score )}

(By min{a,b} , we denote the smaller of the numbers a and b .)

The most popular tests which lead to a χ²-score are the following:

Testing whether the variance of normally distributed data has some pre-determined value. In this case, the test statistic has the χ²-distribution with n - 1 degrees of freedom, where n is the sample size. This can be a one-tailed or two-tailed test .

Goodness-of-fit test checks whether the empirical (sample) distribution agrees with some expected probability distribution. In this case, the test statistic follows the χ²-distribution with k - 1 degrees of freedom, where k is the number of classes into which the sample is divided. This is a right-tailed test .

Independence test is used to determine if there is a statistically significant relationship between two variables. In this case, its test statistic is based on the contingency table and follows the χ²-distribution with (r - 1)(c - 1) degrees of freedom, where r is the number of rows, and c is the number of columns in this contingency table. This also is a right-tailed test .
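As a sketch of the right-tailed goodness-of-fit test described above, assuming SciPy is available (the die-roll counts below are invented for illustration):

```python
from scipy import stats

# Goodness-of-fit: are 120 die rolls compatible with a fair die?
observed = [25, 17, 15, 23, 24, 16]   # hypothetical counts for faces 1..6
expected = [120 / 6] * 6              # 20 rolls per face under H0

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # k - 1 = 5 degrees of freedom
p = stats.chi2.sf(chi2_stat, df)      # right-tailed: 1 - cdf

print(f"chi2 = {chi2_stat:.2f}, p = {p:.3f}")
```

Here the statistic works out to 5.0 on 5 degrees of freedom, giving a large p-value, so these hypothetical counts provide no evidence against the fair-die null.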

Finally, the F-score option should be used when you perform a test in which the test statistic follows the F-distribution , also known as the Fisher–Snedecor distribution. The exact shape of an F-distribution depends on two degrees of freedom .

To see where those degrees of freedom come from, consider the independent random variables X and Y , which both follow the χ²-distributions with d 1 and d 2 degrees of freedom, respectively. In that case, the ratio (X/d 1 )/(Y/d 2 ) follows the F-distribution, with (d 1 , d 2 ) -degrees of freedom. For this reason, the two parameters d 1 and d 2 are also called the numerator and denominator degrees of freedom .

The p-value from F-score is given by the following formulae, where we let cdf F,d1,d2 denote the cumulative distribution function of the F-distribution, with (d 1 , d 2 ) -degrees of freedom:

Left-tailed F-test:

p-value = cdf F,d1,d2 (F score )

Right-tailed F-test:

p-value = 1 - cdf F,d1,d2 (F score )

Two-tailed F-test:

p-value = 2 × min{cdf F,d1,d2 (F score ), 1 - cdf F,d1,d2 (F score )}

Below we list the most important tests that produce F-scores. All of them are right-tailed tests .

A test for the equality of variances in two normally distributed populations . Its test statistic follows the F-distribution with (n - 1, m - 1) -degrees of freedom, where n and m are the respective sample sizes.

ANOVA is used to test the equality of means in three or more groups that come from normally distributed populations with equal variances. We arrive at the F-distribution with (k - 1, n - k) -degrees of freedom, where k is the number of groups, and n is the total sample size (in all groups together).

A test for overall significance of regression analysis . The test statistic has an F-distribution with (k - 1, n - k) -degrees of freedom, where n is the sample size, and k is the number of variables (including the intercept).

Once the above test has established the presence of a linear relationship in your data sample, you can calculate the coefficient of determination, R 2 , which indicates the strength of this relationship . You can do it by hand or use our coefficient of determination calculator .

A test to compare two nested regression models . The test statistic follows the F-distribution with (k 2 - k 1 , n - k 2 ) -degrees of freedom, where k 1 and k 2 are the numbers of variables in the smaller and bigger models, respectively, and n is the sample size.

You may notice that the F-test of an overall significance is a particular form of the F-test for comparing two nested models: it tests whether our model does significantly better than the model with no predictors (i.e., the intercept-only model).
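A sketch of the F-score formulae in Python, assuming SciPy is available; it also checks the textbook identity that a right-tailed F-test with (1, d2) degrees of freedom and F = t² reproduces a two-tailed t-test with d2 degrees of freedom:

```python
from scipy import stats

def f_p_value(f_score, d1, d2):
    """Right-tailed p-value from an F statistic with (d1, d2) df."""
    return stats.f.sf(f_score, d1, d2)  # sf = 1 - cdf

# Sanity check: F(1, d2) is the square of t(d2), so a right-tailed
# F-test with F = t^2 matches a two-tailed t-test.
p_f = f_p_value(2.0 ** 2, 1, 10)
p_t = 2 * stats.t.sf(2.0, 10)
print(p_f, p_t)
```

The two printed values coincide, which is a useful consistency check when implementing these tests by hand.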

Can p-value be negative?

No, the p-value cannot be negative. This is because probabilities cannot be negative, and the p-value is the probability of the test statistic satisfying certain conditions.

What does a high p-value mean?

A high p-value means that under the null hypothesis, there's a high probability that for another sample, the test statistic will generate a value at least as extreme as the one observed in the sample you already have. A high p-value doesn't allow you to reject the null hypothesis.

What does a low p-value mean?

A low p-value means that under the null hypothesis, there's little probability that for another sample, the test statistic will generate a value at least as extreme as the one observed for the sample you already have. A low p-value is evidence in favor of the alternative hypothesis – it allows you to reject the null hypothesis.


The p value – definition and interpretation of p-values in statistics

This article examines the most common statistic reported in scientific papers and used in applied statistical analyses – the p -value . The article goes through the definition illustrated with examples, discusses its utility, interpretation, and common misinterpretations of observed statistical significance and significance levels. It is structured as follows:

  • What does ‘p‘ in ‘p-value’ stand for?
  • What does p measure and how to interpret it
  • A p-value only makes sense under a specified null hypothesis
  • How to calculate a p-value
  • A practical example
  • p-values as convenient summary statistics
  • Quantifying the relative uncertainty of data
  • Easy comparison of different statistical tests
  • p-value interpretation in outcomes of experiments (randomized controlled trials)
  • p-value interpretation in regressions and correlations of observational data
  • Common misinterpretations: mistaking statistical significance with practical significance; treating the significance level as likelihood for the observed effect; treating p-values as likelihoods attached to hypotheses; believing a high p-value means the null hypothesis is true; believing lack of statistical significance suggests a small effect size

p-value definition and meaning

The technical definition of the p -value is (based on [4,5,6]):

A p -value is the probability that the data-generating mechanism corresponding to a specified null hypothesis produces an outcome as extreme as, or more extreme than, the one observed.

However, it is only straightforward to understand for those already familiar in detail with terms such as ‘probability’, ‘null hypothesis’, ‘data generating mechanism’, ‘extreme outcome’. These, in turn, require knowledge of what a ‘hypothesis’, a ‘statistical model’ and ‘statistic’ mean, and so on. While some of these will be explained on a cursory level in the following paragraphs, those looking for deeper understanding should consider consulting the following glossary definitions: statistical model , hypothesis , null hypothesis , statistic .

A slightly less technical and therefore more accessible definition is:

A p -value quantifies how likely it is to erroneously reject a specific statistical hypothesis, were it true, based on a given set of data.

Let us break these down and examine several examples to make both of these definitions make sense.

p stands for p robability where probability means the frequency with which an event occurs under certain assumptions. The most common example is the frequency with which a coin lands heads under the assumption that it is equally balanced (a fair coin toss ). That frequency is 0.5 (50%).

Capital ‘P’ stands for probability in general, whereas lowercase ‘ p ‘ refers to the probability of a particular data realization. To expand on the coin toss example: P would stand for the probability of heads in general, whereas p could refer to the probability of landing a series of five heads in a row, or the probability of landing less than or equal to 38 heads out of 100 coin flips.

Given that it was established that p stands for probability, it is easy to figure out it measures a sort of probability.

In everyday language the term ‘probability’ might be used as synonymous to ‘chance’, ‘likelihood’, ‘odds’, e.g. there is 90% probability that it will rain tomorrow. However, in statistics one cannot speak of ‘probability’ without specifying a mechanism which generates the observed data. A simple example of such a mechanism is a device which produces fair coin tosses. A statistical model based on this data-generating mechanism can be put forth and under that model the probability of 38 or less heads out of 100 tosses can be estimated to be 1.05%, for example by using a binomial calculator . The p -value against the model of a fair coin would be ~0.01 (rounding it to 0.01 from hereon for the purposes of the article).

The way to interpret that p -value is: observing 38 heads or less out of the 100 tosses could have happened in only 1% of infinitely many series of 100 fair coin tosses. The null hypothesis in this case is defined as the coin being fair, therefore having a 50% chance for heads and 50% chance for tails on each toss.

Assuming the null hypothesis is true allows the comparison of the observed data to what would have been expected under the null. It turns out the particular observation of 38/100 heads is a rather improbable and thus surprising outcome under the assumption of the null hypothesis. This is measured by the low p -value which also accounts for more extreme outcomes such as 37/100, 36/100, and so on all the way to 0/100.
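This tail probability can be computed exactly with a short script using only Python's standard library:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Left-tailed p-value for 38 heads in 100 tosses under the fair-coin null:
# accounts for 38/100 and every more extreme outcome down to 0/100.
p = binom_cdf(38, 100, 0.5)
print(f"p-value = {p:.4f}")  # ~0.0105
```

This reproduces the ~1% figure quoted above without resorting to a normal approximation.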

If one had a predefined level of statistical significance at 0.05, then one would claim that the outcome is statistically significant , since its p -value of 0.01 meets the 0.05 significance level (0.01 ≤ 0.05). The relationship between p -values, the significance level ( p -value threshold), and the statistical significance of an outcome is illustrated in this graph:

P-value and significance level explained

In fact, had the significance threshold been set at 0.01 or any value above it, the outcome would have been statistically significant; therefore, it is usually said that with a p -value of 0.01, the outcome is statistically significant at any level of 0.01 or above .

Continuing with the interpretation: were one to reject the null hypothesis based on this p -value of 0.01, they would be acting as if a significance level of 0.01 or lower provides sufficient evidence against the hypothesis of the coin being fair. One could interpret this as a rule for a long-run series of experiments and inferences . In such a series, by using this p -value threshold one would incorrectly reject the fair coin hypothesis in at most 1 out of 100 cases, regardless of whether the coin is actually fair in any one of them. An incorrect rejection of the null is often called a type I error as opposed to a type II error which is to incorrectly fail to reject a null.

A more intuitive interpretation proceeds without reference to hypothetical long-runs. This second interpretation comes in the form of a strong argument from coincidence :

  • there was a low probability (0.01 or 1%) that something would have happened assuming the null was true
  • it did happen so it has to be an unusual (to the extent that the p -value is low) coincidence that it happened
  • this warrants the conclusion to reject the null hypothesis

( source ). It stems from the concept of severe testing as developed by Prof. Deborah Mayo in her various works [1,2,3,4,5] and reflects an error-probabilistic approach to inference.

A p -value only makes sense under a specified null hypothesis

It is important to understand why a specified ‘null hypothesis’ should always accompany any reported p -value and why p-values are crucial in so-called Null Hypothesis Statistical Tests (NHST) . Statistical significance only makes sense when referring to a particular statistical model which in turn corresponds to a given null hypothesis. A p -value calculation has a statistical model and a statistical null hypothesis defined within it as prerequisites, and a statistical null is only interesting because of some tightly related substantive null such as ‘this treatment does not improve outcomes’. The relationship is shown in the chart below:

The relationship between a substantive hypothesis to a statistical model, significance threshold and p-value

In the coin example, the substantive null that is interesting to (potentially) reject is the claim that the coin is fair. It translates to a statistical null hypothesis (model) with the following key properties:

  • heads having 50% chance and tails having 50% chance, on each toss
  • independence of each toss from any other toss. The outcome of any given coin toss does not depend on past or future coin tosses.
  • homogeneity of the coin behavior over time (the true chance does not change across infinitely many tosses)
  • a binomial error distribution

The resulting p -value of 0.01 from the coin toss experiment should be interpreted as the probability only under these particular assumptions.

What happens, however, if someone is interested in rejecting the claim that the coin is somewhat biased against heads? To be precise: the claim that it has a true frequency of heads of 40% or less (hence 60% or more for tails) is the one they are looking to reject with a certain evidential threshold.

The p -value needs to be recalculated under their null hypothesis, so now the same 38 heads out of 100 tosses result in a p -value of ~0.38 ( calculation ). If they were interested in rejecting such a null hypothesis, then these data provide poor evidence against it, since a 38/100 outcome would not be unusual at all if it were in fact true (an outcome at least as extreme occurs with probability 38%).

Similarly, the p -value needs to be recalculated for a claim of bias in the other direction, say that the coin produces heads with a frequency of 60% or more. The probability of observing 38 or fewer out of 100 under this null hypothesis is so extremely small ( p -value ~= 0.000007364 or 7.364 x 10 -6 in standard form , calculation ) that maintaining a claim for 60/40 bias in favor of heads becomes near-impossible for most practical purposes.
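These recalculated tail probabilities can be verified with an exact binomial computation in Python (standard library only); the same observed data yield wildly different p-values under the three nulls:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Same observation (38 heads in 100 tosses), judged against different nulls:
print(binom_cdf(38, 100, 0.5))  # ~0.01: surprising if the coin is fair
print(binom_cdf(38, 100, 0.4))  # ~0.38: unsurprising if heads chance is 40%
print(binom_cdf(38, 100, 0.6))  # tiny (~7e-6): near-impossible at 60% heads
```

This makes concrete the point that a p-value is meaningless without the null hypothesis it was computed under.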

A p -value can be calculated for any frequentist statistical test. Common types of statistical tests include tests for:

  • absolute difference in proportions;
  • absolute difference in means;
  • relative difference in means or proportions;
  • goodness-of-fit;
  • homogeneity;
  • independence;
  • analysis of variance (ANOVA);

and others. Different statistics are computed depending on the error distribution of the parameter of interest in each case, e.g. a t -value, z -value, chi-square (Χ²) value, F -value, and so on.

p -values can then be calculated based on the cumulative distribution functions (CDFs) of these statistics whereas pre-test significance thresholds (critical values) can be computed based on the inverses of these functions. You can try these by plugging different inputs in our critical value calculator , and also by consulting its documentation.

In its generic form, a p -value formula can be written down as:

p = P(d(X) ≥ d(x₀); H₀)

where P stands for probability, d(X) is a test statistic (distance function) of a random variable X , x₀ is the observed realization of X , and H₀ is the selected null hypothesis. The semicolon means ‘assuming’. The probability is evaluated through the cumulative distribution function of the test statistic under the relevant error distribution. In its generic form, a distance function equation can be written as:

d(x₀) = (x̄ − μ₀) / (σ / √n)

where x̄ is the arithmetic mean of the observed values, μ₀ is a hypothetical or expected mean to which x̄ is compared, σ is the standard deviation, and n is the sample size. The result of a distance function will often be expressed in a standardized form – the number of standard errors between the observed value and the expected value.
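As a sketch of how this standardized distance turns into a p -value, the snippet below computes the distance function for illustrative numbers (a sample mean of 105 against a null mean of 100, σ = 15, n = 36 – all hypothetical) and converts it to a one-sided p -value assuming a normal error distribution:

```python
from math import erfc, sqrt

def z_score(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Standardized distance of the sample mean from the null mean."""
    return (xbar - mu0) / (sigma / sqrt(n))

def one_sided_p(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

# Hypothetical sample: mean 105 vs. null mean 100, sigma 15, n = 36
z = z_score(105, 100, 15, 36)   # 2.0 standard errors above the null
p = one_sided_p(z)              # ~0.0228
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

The same two-step pattern – standardize the observed distance, then evaluate a tail probability – underlies the calculators referenced below, with the normal CDF swapped out for the appropriate distribution.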

The p -value calculation is different in each case and so a different formula will be applied depending on circumstances. You can see examples in the p -values reported in our statistical calculators, such as the statistical significance calculator for difference of means or proportions , the Chi-square calculator , the risk ratio calculator , odds ratio calculator , hazard ratio calculator , and the normality calculator .

A very fresh (as of late 2020) example of the application of p -values in scientific hypothesis testing can be found in the recently concluded COVID-19 clinical trials. Multiple vaccines for the virus which spread from China in late 2019 and early 2020 have been tested on tens of thousands of volunteers split randomly into two groups – one gets the vaccine and the other gets a placebo. This is called a randomized controlled trial (RCT). The main parameter of interest is the difference between the rates of infections in the two groups. An appropriate test is a z-test for the difference of proportions, but the same data can be examined in terms of risk ratios or odds ratios.

The null hypothesis in many of these medical trials is that the vaccine is at most 30% efficient – the claim the researchers hope to reject. A statistical model can be built about the expected difference in proportions if the vaccine’s efficiency is 30% or less, and then the actual observed data from a medical trial can be compared to that null hypothesis. Most trials set their significance level at the minimum required by the regulatory bodies (FDA, EMA, etc.), which is usually set at 0.05 . So, if the p -value from a vaccine trial is calculated to be below 0.05, the outcome would be statistically significant and the null hypothesis of the vaccine being less than or equal to 30% efficient would be rejected.

Let us say a vaccine trial results in a p -value of 0.0001 against that null hypothesis. As this is highly unlikely under the assumption of the null hypothesis being true, it provides very strong evidence against the hypothesis that the tested treatment has less than 30% efficiency.

However, many regulators stated that they require at least 50% proven efficiency. They posit a different null hypothesis, so the p -value presented before these bodies needs to be recalculated against it. This p -value would be somewhat larger, since 50% is a higher null value than 30%. But given that the observed effects of the first vaccines to finalize their trials are around 95%, with 95% confidence interval bounds hovering around 90%, the p -value against a null hypothesis stating that the vaccine’s efficiency is 50% or less is likely to remain highly statistically significant, say at 0.001 . Such an outcome is to be interpreted as follows: had the efficiency been 50% or below, such an extreme outcome would most likely not have been observed, therefore one can proceed to reject the claim that the vaccine has efficiency of 50% or less at a significance level of 0.001 .
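A rough version of such an efficacy test can be sketched in Python. The trial counts below are entirely hypothetical (they do not come from any real trial), and the test is an approximate z-test on the log risk ratio, a common textbook approach for comparing two proportions:

```python
from math import log, sqrt, erfc

# Hypothetical trial counts (illustrative only, not from any real trial):
n_vax, n_plc = 10_000, 10_000
cases_vax, cases_plc = 10, 100

rr = (cases_vax / n_vax) / (cases_plc / n_plc)   # observed risk ratio
efficacy = 1 - rr                                 # observed vaccine efficacy

# Null hypothesis: efficacy <= 30%, i.e. risk ratio >= 0.7
rr_null = 0.7

# Approximate standard error of log(RR) (delta method)
se_log_rr = sqrt(1/cases_vax - 1/n_vax + 1/cases_plc - 1/n_plc)
z = (log(rr) - log(rr_null)) / se_log_rr
p = 0.5 * erfc(-z / sqrt(2))   # one-sided: evidence that RR < 0.7

print(f"efficacy = {efficacy:.0%}, z = {z:.2f}, one-sided p = {p:.2e}")
```

With 90% observed efficacy on these made-up counts, the p -value against the 30%-efficacy null is far below any conventional threshold, mirroring the reasoning in the text.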

While this example is fictitious in that it doesn’t reference any particular experiment, it should serve as a good illustration of how null hypothesis statistical testing (NHST) operates based on p -values and significance thresholds.

The utility of p -values and statistical significance

It is not often appreciated how much utility p-values bring to the practice of performing statistical tests for scientific and business purposes.

Quantifying relative uncertainty of data

First and foremost, p -values are a convenient expression of the uncertainty in the data with respect to a given claim. They quantify how unexpected a given observation is, assuming the claim being put to the test is true. If the p -value is low, the probability of observing such data under the null hypothesis is low, meaning the data cast serious doubt on that hypothesis. Therefore, anyone defending the substantive claim which corresponds to the statistical null hypothesis would be pressed to concede that their position is untenable in the face of such data.

If the p -value is high, the data are compatible with the null hypothesis; we are not in a position to reject it, hence the corresponding claim can still be maintained.

As evident by the generic p -value formula and the equation for a distance function which is a part of it, a p -value incorporates information about:

  • the observed effect size relative to the null effect size
  • the sample size of the test
  • the variance and error distribution of the statistic of interest

It would be much more complicated to communicate the outcomes of a statistical test if one had to communicate all three pieces of information. Instead, by way of a single value on the scale of 0 to 1 one can communicate how surprising an outcome is. This value is affected by any change in any of these variables.

A further useful quality is that, provided the minimal assumptions behind significance tests are met, a p -value from one statistical test can easily and directly be compared to another: the strength of the statistical evidence offered by the data relative to a null hypothesis of interest is the same in two tests if they have approximately equal p -values.

This is especially useful in conducting meta-analyses of various sorts, or for combining evidence from multiple tests.

p -value interpretation in outcomes of experiments

When a p -value is calculated for the outcome of a randomized controlled experiment, it is used to assess the strength of evidence against a null hypothesis of interest, such as that a given intervention does not have a positive effect. If H 0 : μ 0 ≤ 0% and the observed effect is μ 1 = 30% and the calculated p -value is 0.025, this can be used to reject the claim H 0 : μ 0 ≤ 0% at any significance level ≥ 0.025. This, in turn, allows us to claim that H 1 , a complementary hypothesis called the ‘alternative hypothesis’, is in fact true. In this case since H 0 : μ 0 ≤ 0% then H 1 : μ 1 > 0% in order to exhaust the parameter space, as illustrated below:

Composite null versus composite alternative hypothesis in NHST

A claim such as the above corresponds to what is called a one-sided null hypothesis . There could be a point null as well, for example the claim that an intervention has no effect whatsoever translates to H 0 : μ 0 = 0%. In such a case the corresponding p -value refers to that point null and hence should be interpreted as rejecting the claim of the effect being exactly zero. For those interested in the differences between point null hypotheses and one-sided hypotheses the articles on onesided.org should be an interesting read. TLDR: most of the time you’d want to reject a directional claim and hence a one-tailed p -value should be reported [8] .

These finer points aside, after observing a low enough p -value, one can claim the rejection of the null and hence the adoption of the complementary alternative hypothesis as true. The alternative hypothesis is simply a negation of the null and is therefore a composite claim such as ‘there is a positive effect’ or ‘there is some non-zero effect’. Note that any inference about a particular effect size within the alternative space has not been tested and hence claiming it has probability equal to p calculated against a zero effect null hypothesis (a.k.a. the nil hypothesis) does not make sense.
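The contrast between a directional (one-sided) null and a point null can be made concrete in code. Assuming a normally distributed test statistic and an illustrative standardized effect of z = 2.2, the one-tailed p -value (against, say, H₀: μ ≤ 0) is exactly half the two-tailed p -value (against H₀: μ = 0):

```python
from math import erfc, sqrt

def p_one_tailed(z: float) -> float:
    """P(Z >= z): tests a directional null such as mu <= 0."""
    return 0.5 * erfc(z / sqrt(2))

def p_two_tailed(z: float) -> float:
    """2 * P(Z >= |z|): tests a point null such as mu = 0."""
    return erfc(abs(z) / sqrt(2))

z = 2.2  # illustrative standardized effect, not from any real study
print(f"one-tailed p = {p_one_tailed(z):.4f}")
print(f"two-tailed p = {p_two_tailed(z):.4f}")
```

The same data can thus clear a significance threshold against a directional null while narrowly missing it against a point null, which is why the claim being tested should fix the form of the hypothesis before the p -value is computed.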

p – value interpretation in regressions and correlations of observational data

When performing statistical analyses of observational data, p -values are often calculated alongside regression coefficients and correlation coefficients. Such a p -value measures how surprising the observed correlation or regression coefficient would be if the variable of interest were in fact orthogonal to the outcome variable. That is – how likely it would be to observe the apparent relationship if there were no actual relationship between the variable and the outcome variable.

Our correlation calculator outputs both p -values and confidence intervals for the calculated coefficients and is an easy way to explore the concept in the case of correlations. Extrapolating to regressions is then straightforward.

Misinterpretations of statistically significant p -values

There are several common misinterpretations [7] of p -values and statistical significance and no calculator can save one from falling for them. The following errors are often committed when a result is seen as statistically significant.

A result may be highly statistically significant (e.g. p -value 0.0001) but it might still have no practical consequences due to a trivial effect size. This often happens with overpowered designs, but it can also happen in a properly designed statistical test. This error can be avoided by always reporting the effect size and confidence intervals around it.

Observing a highly significant result, say a p -value of 0.01, does not mean that the observed difference is likely to be the true difference. In fact, the probability of that is much, much smaller. Remember that statistical significance has a strict meaning in the NHST framework.

For example, if the observed effect size μ 1 from an intervention is 20% improvement in some outcome and a p -value against the null hypothesis of μ 0 ≤ 0% has been calculated to be 0.01, it does not mean that one can reject μ 0 ≤ 20% with a p -value of 0.01. In fact, the p -value against μ 0 ≤ 20% would be 0.5, which is not statistically significant by any measure.

To make claims about a particular effect size it is recommended to use confidence intervals or severity, or both.

Another frequent error is stating that a p -value of 0.02 means that there is 98% probability that the alternative hypothesis is true or that there is 2% probability that the null hypothesis is true . This is a logical error.

By design, even if the null hypothesis is true, p -values equal to or lower than 0.02 would be observed exactly 2% of the time, so one cannot use the fact that a low p -value has been observed to argue there is only 2% probability that the null hypothesis is true. Frequentist and error-statistical methods do not allow one to attach probabilities to hypotheses or claims, only to events [4] . Doing so requires an exhaustive list of hypotheses and prior probabilities attached to them which goes firmly into decision-making territory. Put in Bayesian terms, the p -value is not a posterior probability.
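This "by design" behavior can be verified with a short simulation (a sketch with made-up data: repeated z-tests of H₀: μ = 0 on samples that really do come from a zero-mean normal distribution). Under a true null, p -values are uniformly distributed, so p ≤ 0.05 occurs about 5% of the time regardless of how "convincing" any individual low p -value looks:

```python
import random
from math import erfc, sqrt

random.seed(42)

def two_tailed_p(sample: list) -> float:
    """z-test of H0: mu = 0, with known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n) / (1 / sqrt(n))
    return erfc(abs(z) / sqrt(2))

alpha, sims, n = 0.05, 10_000, 30
false_positives = sum(
    two_tailed_p([random.gauss(0, 1) for _ in range(n)]) <= alpha
    for _ in range(sims)
)
frac = false_positives / sims
print(f"fraction of p <= {alpha} under a true null: {frac:.3f}")
```

The fraction hovers around α, not around the individual p -values observed – a direct illustration of why a low p -value is not the probability that the null is true.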

Misinterpretations of statistically non-significant outcomes

Statistically non-significant p -values – that is, p greater than the specified significance threshold α (alpha) – can lead to a different set of misinterpretations. Due to the ubiquitous use of p -values, these are committed often as well.

Treating a high p -value as evidence, by itself, that the null hypothesis is true is a common mistake. For example, after observing p = 0.2, one may claim this is evidence that there is no effect, e.g. no difference between two means.

However, it is trivial to demonstrate why it is wrong to interpret a high p -value as providing support for the null hypothesis. Take a simple experiment in which one measures only 2 (two) people or objects in the control and treatment groups. The p -value for this test of significance will surely not be statistically significant. Does that mean that the intervention is ineffective? Of course not, since that claim has not been tested severely enough. Using a statistic such as severity can completely eliminate this error [4,5] .

A more detailed response would say that failure to observe a statistically significant result, given that the test has enough statistical power, can be used to argue for accepting the null hypothesis to the extent warranted by the power and with reference to the minimum detectable effect for which it was calculated. For example, if the statistical test had 99% power to detect an effect of size μ 1 at level α and it failed to reach significance, then it could be argued that it is quite unlikely that there exists an effect of size μ 1 or greater, as in that case one would most likely have observed a significant p -value.

This is a softer version of the above mistake: instead of claiming support for the null hypothesis, a high (non-significant) p -value is taken, by itself, as indicating that the effect size must be small.

This is a mistake since the test might have simply lacked power to exclude many effects of meaningful size. Examining confidence intervals and performing severity calculations against particular hypothesized effect sizes would be a way to avoid this issue.

References:

[1] Mayo, D.G. 1983. “An Objective Theory of Statistical Testing.” Synthese 57 (3): 297–340. DOI:10.1007/BF01064701.
[2] Mayo, D.G. 1996. “Error and the Growth of Experimental Knowledge.” Chicago, Illinois: University of Chicago Press. DOI:10.1080/106351599260247.
[3] Mayo, D.G., and A. Spanos. 2006. “Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction.” The British Journal for the Philosophy of Science 57 (2): 323–357. DOI:10.1093/bjps/axl003.
[4] Mayo, D.G., and A. Spanos. 2011. “Error Statistics.” In Handbook of Philosophy of Science, Volume 7 – Philosophy of Statistics, 1–46. Elsevier.
[5] Mayo, D.G. 2018. “Statistical Inference as Severe Testing.” Cambridge: Cambridge University Press. ISBN: 978-1107664647.
[6] Georgiev, G.Z. 2019. “Statistical Methods in Online A/B Testing.” ISBN: 978-1694079725.
[7] Greenland, S. et al. 2016. “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” European Journal of Epidemiology 31:337–350. DOI:10.1007/s10654-016-0149-3.
[8] Georgiev, G.Z. 2018. “Directional claims require directional (statistical) hypotheses.” [online, accessed Dec 07, 2020, at https://www.onesided.org/articles/directional-claims-require-directional-hypotheses.php]


An applied statistician, data analyst, and optimizer by calling, Georgi has expertise in web analytics, statistics, design of experiments, and business risk management. He covers a variety of topics where mathematical models and statistics are useful. Georgi is also the author of “Statistical Methods in Online A/B Testing”.



What Is P-Value in Statistical Hypothesis?

Few statistical estimates are as significant as the p-value. The p-value or probability value is a number, calculated from a statistical test , that describes how likely your results would have occurred if the null hypothesis were true. A p-value less than 0.05 is conventionally considered statistically significant, while a higher value means the null hypothesis cannot be rejected – not that it has been shown to be true. So, what is the p-value exactly, and why is it so important?

In statistical hypothesis testing , the p-value or probability value can be defined as the probability that a real-valued test statistic is at least as extreme as the value actually obtained. The p-value shows how likely it is that your set of observations could have occurred under the null hypothesis. P-values are used in statistical hypothesis testing to determine whether to reject the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis.


P-values are expressed as decimals and can be converted into percentages. For example, a p-value of 0.0237 is 2.37%, which means there's a 2.37% chance of observing results at least as extreme as yours if the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis.

In a hypothesis test, you can compare the p value from your test with the alpha level selected while running the test. Now, let’s try to understand what is P-Value vs Alpha level.    

A p-value indicates the probability of getting an effect at least as extreme as the one actually observed in the sample data, assuming the null hypothesis is true.

An alpha level will tell you the probability of wrongly rejecting a true null hypothesis. The level is selected by the researcher and obtained by subtracting your confidence level from 100%. For instance, if you are 95% confident in your research, the alpha level will be 5% (0.05).

When you run the hypothesis test, if you get:

  • A small p value (<=0.05), you should reject the null hypothesis
  • A large p value (>0.05), you should not reject the null hypothesis

In addition to the P-value, you can use other values given by your test to determine if your null hypothesis is true. 

For example, if you run an F-test to compare two variances in Excel, you will obtain a p-value, an f-critical value, and an f-value. Compare the f-value with the f-critical value: if the f-value exceeds the f-critical value, you should reject the null hypothesis.

P-Values are usually calculated using p-value tables or spreadsheets, or calculated automatically using statistical software like R, SPSS, etc. 

Depending on the test statistic and degrees of freedom (the number of observations minus the number of independent variables) of your test, you can find out from the tables how frequently you can expect the test statistic to take a given value under the null hypothesis.

How to calculate P-value depends on which statistical test you’re using to test your hypothesis.  

  • Every statistical test uses different assumptions and generates different statistics. Select the test method that best suits your data and matches the effect or relationship being tested.
  • The number of independent variables included in your test determines how big or small the test statistic should be in order to generate the same p-value. 

Regardless of what statistical test you are using, the p-value will always denote the same thing – how frequently you can expect to get a test statistic as extreme or even more extreme than the one given by your test. 

In the P-Value approach to hypothesis testing, a calculated probability is used to decide if there’s evidence to reject the null hypothesis, also known as the conjecture. The conjecture is the initial claim about a data population, while the alternative hypothesis ascertains if the observed population parameter differs from the population parameter value according to the conjecture. 

Effectively, the significance level is declared in advance to determine how small the P-value needs to be such that the null hypothesis is rejected.  The levels of significance vary from one researcher to another; so it can get difficult for readers to compare results from two different tests. That is when P-value makes things easier. 

Readers could interpret the statistical significance by referring to the reported P-value of the hypothesis test. This is known as the P-value approach to hypothesis testing. Using this, readers could decide for themselves whether the p value represents a statistically significant difference.  

The level of statistical significance is usually represented as a P-value between 0 and 1. The smaller the p-value, the more likely it is that you would reject the null hypothesis. 

  • A p-value ≤ 0.05 is conventionally considered statistically significant. It denotes strong evidence against the null hypothesis, since data this extreme would occur less than 5% of the time if the null were true. So, we reject the null hypothesis and accept the alternative hypothesis.
  • However, even if the p-value is lower than your threshold of significance and the null hypothesis can be rejected, it does not mean that there is a 95% probability of the alternative hypothesis being true.
  • A p-value > 0.05 is not statistically significant. It indicates that the evidence is insufficient to reject the null hypothesis. Thus, we retain the null hypothesis – but we cannot accept it; we can only reject or fail to reject it.

A statistically significant result does not prove a research hypothesis to be correct. Instead, it provides support for or provides evidence for the hypothesis. 

  • You should report exact p-values up to two or three decimal places.
  • For p-values less than .001, report them as p < .001.
  • Do not use a 0 before the decimal point, since p cannot equal 1: write p = .001, not p = 0.001.
  • Make sure p is always italicized and there is a space on either side of the = sign.
  • It is impossible to get p = .000; such results should be written as p < .001.

An investor says that the performance of their investment portfolio is equivalent to that of the Standard & Poor’s (S&P) 500 Index. He performs a two-tailed test to determine this. 

The null hypothesis here says that the portfolio’s returns are equivalent to the returns of S&P 500, while the alternative hypothesis says that the returns of the portfolio and the returns of the S&P 500 are not equivalent.  

The p-value hypothesis test gives a measure of how much evidence is present to reject the null hypothesis. The smaller the p value, the higher the evidence against null hypothesis. 

Therefore, if the investor gets a P value of .001, it indicates strong evidence against null hypothesis. So he confidently deduces that the portfolio’s returns and the S&P 500’s returns are not equivalent.

1. What does P-value mean?

P-Value or probability value is a number that denotes the likelihood of your data having occurred under the null hypothesis of your statistical test. 

2. What does p < 0.05 mean?

A P-value less than 0.05 is deemed to be statistically significant, meaning the null hypothesis should be rejected in such a case. A P-Value greater than 0.05 is not considered to be statistically significant, meaning the null hypothesis should not be rejected. 

3. What is P-value and how is it calculated?

The p-value or probability value is a number, calculated from a statistical test, that tells how likely it is that your results would have occurred under the null hypothesis of the test.  

P-values are usually automatically calculated using statistical software. They can also be calculated using p-value tables for the relevant statistical test. P values are calculated based on the null distribution of the test statistic. In case the test statistic is far from the mean of the null distribution, the p-value obtained is small. It indicates that the test statistic is unlikely to have occurred under the null hypothesis. 

4. What is p-value in research?

P values are used in hypothesis testing to help determine whether the null hypothesis should be rejected. It plays a major role when results of research are discussed. Hypothesis testing is a statistical methodology frequently used in medical and clinical research studies. 

5. Why is the p-value significant?

Statistical significance is a term that researchers use to say that it is not likely that their observations could have occurred if the null hypothesis were true. The level of statistical significance is usually represented as a P-value or probability value between 0 and 1. The smaller the p-value, the more likely it is that you would reject the null hypothesis. 

6. What is null hypothesis and what is p-value?

A null hypothesis is a kind of statistical hypothesis that suggests that there is no statistical significance in a set of given observations. It says there is no relationship between your variables.   

P-value or probability value is a number, calculated from a statistical test, that tells how likely it is that your results would have occurred under the null hypothesis of the test.   

The p-value is used to determine the significance of observational data. Whenever researchers notice an apparent relation between two variables, a p-value calculation helps ascertain whether the observed relationship happened as a result of chance.


How to Find p Value from Test Statistic

P-values are widely used in statistics and are important for many hypothesis tests. But how do you find a p-value? The method can vary depending on the specific test, but there’s a general process you can follow. In this article, you’ll learn how to find the p-value, get an overview of the general steps for all hypothesis tests, and see a detailed example of how to calculate a p-value.

Hypothesis tests check if a claim about a population is true. This claim is called the null hypothesis (H0). The alternative hypothesis (Ha) is what you would believe if the null hypothesis is false. Knowing how to find the p-value is crucial in testing because it helps you decide if the null hypothesis is likely true or not.

Understanding p-value and Test Statistic

The p-value is calculated using the test statistic’s sampling distribution under the null hypothesis, the sample data, and the type of test being conducted (lower-tailed, upper-tailed, or two-sided test).

The p-value for:

  • A lower-tailed test: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
  • An upper-tailed test: p-value = P(TS ≥ ts | H0 is true) = 1 – cdf(ts)
  • A two-sided test (assuming the distribution of the test statistic under H0 is symmetric about 0): p-value = 2 · P(TS ≥ |ts| | H0 is true) = 2 · (1 – cdf(|ts|))

where:

  • P is the probability of an event;
  • TS is the test statistic;
  • ts is the observed value of the test statistic calculated from your sample;
  • cdf() is the cumulative distribution function of the distribution of the test statistic (TS) under the null hypothesis.
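The three cases above can be expressed directly in code. The sketch below assumes a standard normal null distribution for the test statistic; for other tests you would swap in the appropriate cdf (t, chi-square, F, etc.):

```python
from math import erfc, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2))

def p_value(ts: float, tail: str) -> float:
    """p-value for an observed test statistic ts under the chosen tail."""
    if tail == "lower":
        return norm_cdf(ts)                 # P(TS <= ts)
    if tail == "upper":
        return 1 - norm_cdf(ts)             # P(TS >= ts)
    if tail == "two-sided":
        return 2 * (1 - norm_cdf(abs(ts)))  # symmetric about 0
    raise ValueError(f"unknown tail: {tail}")

print(f"{p_value(1.96, 'two-sided'):.4f}")  # the familiar ~0.05 boundary
```

Note how the two-sided p-value for ts = 1.96 lands at the conventional 0.05 threshold, which is exactly why 1.96 is so often quoted as a critical value.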

Test Statistic

A test statistic measures how closely your data matches the distribution predicted by the null hypothesis of the statistical test you’re using.

  • Distribution of data shows how often each observation occurs and can be described by its central tendency and variation around that central tendency. Different statistical tests predict different types of distributions, so it’s important to choose the right test for your hypothesis.
  • Test statistic sums up your observed data into a single number using central tendency, variation, sample size, and the number of predictor variables in your model.

Usually, the test statistic is calculated as the pattern in your data (like the correlation between variables or the difference between groups) divided by the variance in the data (such as the standard deviation).

Test Statistic Example

You are testing the relationship between temperature and flowering date for a type of apple tree. You use long-term data tracking temperature and flowering dates from the past 25 years by randomly sampling 100 trees each year in an experimental field.

  • Null Hypothesis (H 0 ) : There is no correlation between temperature and flowering date.
  • Alternative Hypothesis (H1) : There is a correlation between temperature and flowering date.

To test this hypothesis, you perform a regression test, which generates a t-value as its test statistic. The t-value compares the observed correlation between these variables to the null hypothesis of zero correlation.

Here are steps to help calculate the p-value for a data sample:

Step-1: State Null and Alternative Hypotheses

Start by looking at your data and forming a null and alternative hypothesis. For example, you might hypothesize that the mean “μ” is 10. Thus, the alternative hypothesis is that the mean “μ” is not 10. You can write these as:

H 0 : μ = 10

H 1 : μ ≠ 10

In these hypotheses:

  • H0 is the null hypothesis.
  • H1 is the alternative hypothesis.
  • μ is the hypothesized mean.
  • ≠ means does not equal.

Step-2: Use a t-test and its Formula

After setting your hypotheses, calculate the test statistic “t” using your data set. The formula is:

t = (x̄ – μ) / (s / √n)
  • t is the test statistic.
  • x̄ is the sample mean.
  • s is the standard deviation of the sample.
  • n is the sample size.

Standard deviation measures how spread out the data points in a set are around their mean: a small standard deviation means most observations lie close to the mean.

Step-3: Use a t-distribution table to find the p-value

After calculating “t,” find the p-value using a t-distribution table, which you can find online. The table lists common significance levels, such as 0.01, 0.05, and 0.1, which set how strong the evidence must be before you reject the null hypothesis. To use the table, you also need the degrees of freedom: subtract 1 from your sample size “n.”

For example:

10 – 1 = 9

Use this number and your chosen significance level to find the corresponding value in the table.

If you have a one-tailed distribution, this value is your p-value. For a two-tailed distribution, which is more common, multiply this value by two to get your p-value.

Here’s an example of calculating the p-value based on a known set of data:

Emma wants to know if the average number of hours students study each week is 15 hours. She gathers data from a sample of students and finds that the sample mean is 13 hours, with a standard deviation of 3 hours. She decides to perform a two-tailed t-test to find the p-value at a 0.05 significance level to determine if 15 hours is the true mean. She forms the following hypotheses:

  • H 0 : μ = 15 hours
  • H 1 : μ ≠ 15 hours

After forming her hypotheses, she calculates the test statistic and takes its absolute value, “|t|,” like this:

  • t = (13 – 15) / (3 / √20)
  • t = (-2) / (0.67082)
  • |t| ≈ 2.98

Using this t-value, she turns to a t-distribution table. With a sample size of 20, she subtracts 1 to get the degrees of freedom:

  • 20 – 1 = 19

With 19 degrees of freedom, her |t| of 2.98 falls between the table's critical values for one-tailed probabilities of 0.005 and 0.001. Averaging these two gives roughly 0.003; because her test is two-tailed, she multiplies by 2 to get an approximate p-value of 0.006 (table averaging like this is only a rough approximation). Since the p-value is less than the 0.05 significance level, she rejects the null hypothesis and concludes that the average number of hours students study each week is not 15 hours.
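Emma's table lookup can be checked in software. The sketch below uses only the Python standard library: the two-tailed p-value is twice the area under the t density beyond |t|, approximated here by numerical integration rather than a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, upper=50.0, steps=50_000):
    """Two-tailed p-value: twice the area under the t density beyond |t|,
    approximated with the trapezoidal rule."""
    a = abs(t)
    h = (upper - a) / steps
    tail = 0.5 * (t_pdf(a, df) + t_pdf(upper, df))
    tail += sum(t_pdf(a + i * h, df) for i in range(1, steps))
    return 2 * h * tail

# Emma's example: sample mean 13, hypothesized mean 15, s = 3, n = 20
t_stat = (13 - 15) / (3 / math.sqrt(20))
p_value = two_tailed_p(t_stat, df=19)
print(round(abs(t_stat), 2), round(p_value, 4))  # |t| = 2.98, p roughly 0.008
```

The exact area comes out slightly above the table-averaged 0.006, which illustrates why table interpolation is only an approximation; either way the p-value is well below 0.05.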

P-values can be calculated using p-value tables, spreadsheets, or statistical software like R or SPSS. Using the test statistic and the degrees of freedom (number of observations minus the number of independent variables), you can find out how often a test statistic at least that extreme would occur under the null hypothesis.

The method to calculate a p-value depends on the statistical test you’re using. Different statistical tests have different assumptions and produce different statistics. Choose the test method that best fits your data and the effect or relationship you’re testing. The number of independent variables in your test affects the size of the test statistic needed to produce the same p-value.

No matter what statistical test you use, the p-value always indicates how often you can expect to get a test statistic as extreme or more extreme than the one from your test.

The p-value is important in many engineering fields, from electrical to civil engineering. It helps engineers test prototype reliability, validate experimental results, and optimize systems, supporting statistically informed decisions.

  • Electrical Engineering: Electrical engineers use P-Values to test the efficiency of electrical devices, compare different models’ performance, and validate results from complex circuit simulations.
  • Civil Engineering: In civil engineering, P-Values help validate the strength of construction materials, assess new design methods’ effectiveness, and analyze various structural designs’ safety.

Knowing how to calculate and interpret p-values is important for making sound decisions based on statistical tests. Whether you work in electrical engineering, civil engineering, or another field, p-values tell you how compatible your results are with the null hypothesis. By learning the steps to find p-values and choosing the right statistical tests, you can evaluate your hypotheses and make confident, data-based decisions.

Frequently Asked Questions

How do you find the p-value from a test statistic on a calculator?

You can get a p-value by performing an inference test. On a TI-83/84-style calculator, press the STAT key, then arrow right to the TESTS menu. Choose the appropriate test from the list, enter your numbers, and the calculator will return the p-value.

How do you find the p-value from the F test statistic?

To find the p-value for an F statistic, consult an F table using the degrees of freedom given in the ANOVA table (provided as part of the SPSS regression output): the numerator degrees of freedom and the denominator degrees of freedom, Df2. The denominator degrees of freedom determine which row of the table to use.

What is the formula for the p-value of the t-test?

p-value = P(T ≥ t* | H₀ is true). In other words, the p-value is the probability under H₀ of observing a test statistic at least as extreme as what was observed. If the test statistic has a continuous distribution, then under H₀ the p-value is uniformly distributed between 0 and 1.

What is the formula for test statistic?

For a z-test, the test statistic is z = (x̄ − μ) / (σ / √n), and for a t-test the test statistic is t = (x̄ − μ) / (s / √n), where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, s is the sample standard deviation, and n is the sample size.

How to find the p-value of the test statistic?

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test). The p-value for a lower-tailed test is given by: p-value = P(TS ≤ ts | H₀ is true) = cdf(ts).


Hypothesis Testing

Key Topics:

  • Basic approach
  • Null and alternative hypothesis
  • Decision making and the p -value
  • Z-test & Nonparametric alternative

Basic approach to hypothesis testing

  • State a model describing the relationship between the explanatory variables and the outcome variable(s) in the population and the nature of the variability. State all of your assumptions .
  • Specify the null and alternative hypotheses in terms of the parameters of the model.
  • Invent a test statistic that will tend to be different under the null and alternative hypotheses.
  • Using the assumptions of step 1, find the theoretical sampling distribution of the statistic under the null hypothesis of step 2. Ideally the form of the sampling distribution should be one of the “standard distributions” (e.g., normal, t, binomial...)
  • Calculate a p -value , as the area under the sampling distribution more extreme than your statistic. Depends on the form of the alternative hypothesis.
  • Choose your acceptable type 1 error rate (alpha) and apply the decision rule : reject the null hypothesis if the p-value is less than alpha, otherwise do not reject.
Z-test

  • Assume data are independently sampled from a normal distribution with unknown mean μ and known variance σ². Make an initial assumption, μ₀.
  • Two-sided hypotheses: H₀: μ = μ₀ vs. Hₐ: μ ≠ μ₀
  • One-sided hypotheses: H₀: μ ≤ μ₀ vs. Hₐ: μ > μ₀, or H₀: μ ≥ μ₀ vs. Hₐ: μ < μ₀
  • z-statistic: \(\frac{\bar{X}-\mu_0}{\sigma / \sqrt{n}}\)
  • general form is: (estimate - value we are testing)/(st.dev of the estimate)
  • z-statistic follows the N(0,1) distribution
  • p-value: 2 × the area above |z| (two-sided), the area above z, or the area below z (one-sided), or
  • compare the statistic to a critical value: |z| ≥ z α/2, z ≥ z α, or z ≤ -z α
  • Choose the acceptable level of alpha = 0.05; what do we conclude?
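The z-test recipe above can be written as a short function using only the Python standard library. As a sketch, the numbers plugged in below come from the student-height example later in these notes (x̄ = 66.4630, μ₀ = 65, σ = 3, n = 54):

```python
import math
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test with known sigma; returns (z, p-value)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))  # 2 x area above |z|
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)             # area above z
    else:
        p = std_normal.cdf(z)                 # area below z
    return z, p

z, p = one_sample_z(66.4630, 65, 3, 54)
print(round(z, 2), round(p, 4))  # z = 3.58 and a p-value well below 0.05
```

Swapping the `alternative` argument switches between the two-sided and one-sided p-value computations listed above.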

Making the Decision

It is either likely or unlikely that we would collect the evidence we did given the initial assumption. (Note: “likely” or “unlikely” is measured by calculating a probability!)

If it is likely , then we “ do not reject ” our initial assumption. There is not enough evidence to do otherwise.

If it is unlikely , then:

  • either our initial assumption is correct and we experienced an unusual event or,
  • our initial assumption is incorrect

In statistics, if it is unlikely, we decide to “ reject ” our initial assumption.

Example: Criminal Trial Analogy

First, state 2 hypotheses, the null hypothesis (“H 0 ”) and the alternative hypothesis (“H A ”)

  • H 0 : Defendant is not guilty.
  • H A : Defendant is guilty.

Usually the H 0 is a statement of “no effect”, or “no change”, or “chance only” about a population parameter.

While the H A , depending on the situation, is that there is a difference, trend, effect, or a relationship with respect to a population parameter.

  • It can be one-sided or two-sided.
  • In a two-sided test we only care whether there is a difference, not its direction. In a one-sided test we care about a particular direction of the relationship: we want to know if the value is strictly larger or smaller.

Then, collect evidence, such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc. (In statistics, the data are the evidence.)

Next, you make your initial assumption.

  • Defendant is innocent until proven guilty.

In statistics, we always assume the null hypothesis is true .

Then, make a decision based on the available evidence.

  • If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis . (Behave as if defendant is guilty.)
  • If there is not enough evidence, do not reject the null hypothesis . (Behave as if defendant is not guilty.)

If the observed outcome, e.g., a sample statistic, is surprising under the assumption that the null hypothesis is true, but more probable if the alternative is true, then this outcome is evidence against H 0 and in favor of H A .

An observed effect so large that it would rarely occur by chance is called statistically significant (i.e., not likely to happen by chance).

Using the p -value to make the decision

The p -value represents how likely we would be to observe such an extreme sample if the null hypothesis were true. The p -value is a probability computed assuming the null hypothesis is true, that the test statistic would take a value as extreme or more extreme than that actually observed. Since it's a probability, it is a number between 0 and 1. The closer the number is to 0 means the event is “unlikely.” So if p -value is “small,” (typically, less than 0.05), we can then reject the null hypothesis.

Significance level and p -value

Significance level, α, is a decisive value for the p-value. In this context, significant does not mean “important”; it means “not likely to have happened just by chance”.

α is the maximum probability of rejecting the null hypothesis when the null hypothesis is true. If α = 1 we always reject the null, if α = 0 we never reject the null hypothesis. In articles, journals, etc… you may read: “The results were significant ( p <0.05).” So if p =0.03, it's significant at the level of α = 0.05 but not at the level of α = 0.01. If we reject the H 0 at the level of α = 0.05 (which corresponds to 95% CI), we are saying that if H 0 is true, the observed phenomenon would happen no more than 5% of the time (that is 1 in 20). If we choose to compare the p -value to α = 0.01, we are insisting on a stronger evidence!

Neither the decision to reject nor the decision not to reject H₀ proves the null hypothesis or the alternative hypothesis. We merely state there is enough evidence to behave one way or the other. This is always true in statistics!

So, what kind of error could we make? No matter what decision we make, there is always a chance we made an error.

Errors in Criminal Trial:

Errors in Hypothesis Testing

Type I error (False positive): The null hypothesis is rejected when it is true.

  • α is the maximum probability of making a Type I error.

Type II error (False negative): The null hypothesis is not rejected when it is false.

  • β is the probability of making a Type II error

There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!

The power of a statistical test is its probability of rejecting the null hypothesis if the null hypothesis is false. That is, power is the ability to correctly reject H 0 and detect a significant effect. In other words, power is one minus the type II error risk.

\(\text{Power} = 1-\beta = P\left(\text{reject } H_0 \mid H_0 \text{ is false}\right)\)

Which error is worse?

Type I = you are innocent, yet accused of cheating on the test. Type II = you cheated on the test, but you are found innocent.

This depends on the context of the problem too. But in most cases scientists try to be “conservative”; it's worse to make a spurious discovery than to fail to make a good one. Our goal is to increase the power of the test, that is, to minimize the length of the CI.

We need to keep in mind:

  • the effect of the sample size,
  • the correctness of the underlying assumptions about the population,
  • statistical vs. practical significance, etc…

(see the handout). To study the tradeoffs between the sample size, α, and Type II error we can use power and operating characteristic curves.
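As a sketch of such a tradeoff, the power of a two-sided one-sample z-test against a specific alternative mean can be computed directly with the standard library. The means, sigma, and sample sizes below are hypothetical illustration values:

```python
import math
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu_true."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)                # rejection cutoff
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # standardized effect
    # P(reject H0 | true mean = mu_true): probability the observed z
    # lands outside (-z_crit, z_crit) under the shifted distribution
    return nd.cdf(-z_crit - shift) + (1 - nd.cdf(z_crit - shift))

# hypothetical setup: detect a true mean of 66 when H0 says 65, sigma = 3
power_54 = z_test_power(65, 66, 3, n=54)
power_100 = z_test_power(65, 66, 3, n=100)
print(round(power_54, 2), round(power_100, 2))  # power grows with sample size
```

Varying `n` (or `alpha`) in this function traces out exactly the power curves mentioned above.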

Assume data are independently sampled from a normal distribution with unknown mean μ and known variance σ² = 9. Make an initial assumption that μ = 65.

Specify the hypotheses: H₀: μ = 65 vs. Hₐ: μ ≠ 65

z-statistic: 3.58

The z-statistic follows the N(0,1) distribution.

The p-value, approximately 0.0003, indicates that, if the average height in the population is 65 inches, it is very unlikely that a sample of 54 students would have an average height of 66.4630.

Alpha = 0.05. Decision: p-value < alpha, thus reject the null hypothesis.

Conclude that the average height is not equal to 65.

What type of error might we have made?

Type I error is claiming that average student height is not 65 inches, when it really is. Type II error is failing to claim that the average student height is not 65in when it is.

We rejected the null hypothesis, i.e., claimed that the height is not 65, thus making potentially a Type I error. But sometimes the p -value is too low because of the large sample size, and we may have statistical significance but not really practical significance! That's why most statisticians are much more comfortable with using CI than tests.

Based on the CI only, how do you know that you should reject the null hypothesis?

The 95% CI is (65.6628, 67.2631) ...

What about practical and statistical significance now? Is there another reason to suspect this test, and the p-value calculations?

There is a need for a further generalization. What if we can't assume that σ is known? In this case we would use s (the sample standard deviation) to estimate σ.

If the sample is very large, we can treat σ as known by assuming that σ = s . According to the law of large numbers, this is not too bad a thing to do. But if the sample is small, the fact that we have to estimate both the standard deviation and the mean adds extra uncertainty to our inference. In practice this means that we need a larger multiplier for the standard error.

We need a one-sample t-test.

One sample t -test

  • Assume data are independently sampled from a normal distribution with unknown mean μ and variance σ². Make an initial assumption, μ₀.
  • Two-sided hypotheses: H₀: μ = μ₀ vs. Hₐ: μ ≠ μ₀
  • One-sided hypotheses: H₀: μ ≤ μ₀ vs. Hₐ: μ > μ₀, or H₀: μ ≥ μ₀ vs. Hₐ: μ < μ₀
  • t-statistic: \(\frac{\bar{X}-\mu_0}{s / \sqrt{n}}\) where s is a sample st.dev.
  • t-statistic follows the t-distribution with df = n - 1
  • Alpha = 0.05; what do we conclude?
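The t-statistic above is straightforward to compute with the standard library. A minimal sketch, using a hypothetical small sample of heights and testing H₀: μ = 65:

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """One-sample t statistic and its degrees of freedom (df = n - 1)."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1

heights = [64, 66, 65, 67, 66, 68, 65, 66]  # hypothetical sample
t, df = one_sample_t(heights, 65)
print(round(t, 2), df)  # compare |t| to the t-table critical value at df = 7
```

Note that `stdev` is the sample standard deviation s (divisor n - 1), which is exactly what the t-statistic formula calls for.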

Testing for the population proportion

Let's go back to our CNN poll. Assume we have a SRS of 1,017 adults.

We are interested in testing the following hypotheses: H₀: p = 0.50 vs. Hₐ: p > 0.50

What is the test statistic?

If alpha = 0.05, what do we conclude?

We will see more details in the next lesson on proportions, then distributions, and possible tests.
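As a preview, the upper-tailed proportion test standardizes the difference between the sample proportion and 0.50 by the standard error under H₀. The observed poll proportion is not stated above, so the sketch below uses a hypothetical value of 0.54 for illustration:

```python
import math
from statistics import NormalDist

def prop_z_test(phat, p0, n):
    """Upper-tailed one-sample proportion z-test: returns (z, p-value)."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (phat - p0) / se
    p = 1 - NormalDist().cdf(z)        # area above z
    return z, p

# hypothetical poll result: 54% of the 1,017 sampled adults said yes
z, p = prop_z_test(0.54, 0.50, 1017)
print(round(z, 2), round(p, 4))
```

With alpha = 0.05, a p-value below 0.05 would lead us to reject H₀: p = 0.50 in favor of p > 0.50.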

  • Correspondence
  • Open access
  • Published: 09 August 2024

Untargeted lipidomics reveals racial differences in lipid species among women

  • Ghazaleh Pourali 1   na1 ,
  • Liang Li 2   na1 ,
  • Kayla R. Getz 1 ,
  • Myung Sik Jeon 1 , 3 ,
  • Jingqin Luo 1 , 3 ,
  • Chongliang Luo 1 , 3 &
  • Adetunji T. Toriola 1 , 4  

Biomarker Research volume  12 , Article number:  79 ( 2024 ) Cite this article


Understanding the biological mechanisms underlying racial differences in diseases is crucial to developing targeted prevention and treatment. There is, however, limited knowledge of the impact of race on lipids. To address this, we performed comprehensive lipidomics analyses to evaluate racial differences in lipid species among 506 non-Hispanic White (NHW) and 163 non-Hispanic Black (NHB) women. Plasma lipidomic profiling quantified 982 lipid species. We used multivariable linear regression models, adjusted for confounders, to identify racial differences in lipid species and corrected for multiple testing using a Bonferroni-adjusted p-value < 10 –5 . We identified 248 lipid species that were significantly associated with race. NHB women had lower levels of several lipid species, most notably in the triacylglycerols sub-pathway ( N  = 198 out of 518) with 46 lipid species exhibiting an absolute percentage difference ≥ 50% lower in NHB compared with NHW women. We report several novel differences in lipid species between NHW and NHB women, which may underlie racial differences in health and have implications for disease prevention.

To the editor,

Non-Hispanic Black populations (NHBs) experience higher rates of chronic diseases, including metabolic disorders compared with non-Hispanic White populations (NHWs) [ 1 ]. While social determinants of health contribute to these differences, evaluating possible biological contributors to the observed racial differences is crucial to designing effective health interventions. Lipids play key roles in cellular functions and diseases [ 2 ]. Due to a lack of data, we performed the first comprehensive analysis of lipidomic profiles in NHB and NHW women to determine whether racial differences exist. This is critical to understanding the biological basis of racial disparities in disease incidence and outcomes.

Our study included 669 women [506 NHW (75.6%) and 163 NHB (24.4%)] recruited at Washington University School of Medicine, St. Louis, MO. A detailed description of this study population has been published [ 3 ]. Our lipidomic profiling, performed at Metabolon, identified 982 lipid species within 3 super-pathways and 14 sub-pathways. We excluded from the analyses 125 of the 982 lipid species that were missing in ≥ 300 of the women. We investigated the associations of lipid species with race using linear regression models, adjusting for confounders, and accounted for multiple testing by applying the Bonferroni correction. The study methods are detailed in the Supplementary Methods. Participants' characteristics are described in Table S1. NHB women had higher BMI (34.6 kg/m²) compared with NHW women (28.7 kg/m²) ( p -value < 0.001). Additionally, a higher proportion of NHB women reported no alcohol consumption (52.8%) compared to NHW women (21.5%) ( p -value < 0.001).

We observed significant racial differences at the super-pathway, sub-pathway, and species levels in 248 lipid species out of the 857 lipid species (28.9%) (Bonferroni-adjusted p-value < 10 –5 ) (Fig.  1 , Table S2). These species belong to triacylglycerols (TAG, n  = 198), diacylglycerols (DAG, n  = 19), phosphatidylcholines (PC, n  = 14), phosphatidylethanolamines (PE, n  = 4), cholesteryl esters (CE, n  = 3), lysophosphatidylcholines (LPC, n  = 3), lysophosphatidylethanolamines (LPE, n  = 3), sphingomyelins (SM, n  = 2), ceramides (CER, n  = 1), and phosphatidylinositols (PI, n  = 1) sub-pathways. All lipid species except TAG58:10-FA20:4 were lower in NHB women. Forty-six lipid species [TAG ( n  = 45) and DAG ( n  = 1)] exhibited an absolute percentage difference ≥ 50% (Table  1 ). The top 5 species with the largest absolute percentage differences were all lower in NHB than in NHW: TAG46:2-FA16:1 (60.9%, p  = 2.8 × 10 –12 ), TAG44:0-FA14:0 (59.8%, p  = 9.5 × 10 –6 ), TAG47:2-FA16:1 (59.8%, p  = 4.6 × 10 –16 ), TAG46:1-FA16:1 (59.3%, p  = 1.9 × 10 –9 ), and TAG44:1-FA14:0 (58.8%, p  = 4.8 × 10 –6 ). At absolute percentage difference thresholds of ≥ 40%, ≥ 30%, and ≥ 20%, we identified 120, 201, and 238 lipid species, respectively, that were significantly associated with race. (Table S3). At the lipid sub-pathway level, 4 out of 14 lipid sub-pathways including TAG, DAG, LPE, and PC, were significantly lower in NHB women compared with NHW women (Table S4). TAG sub-pathway showed significant enrichment ( p -value of 2.1 × 10 –14 ), with 198 out of the 518 total species in this sub-pathway significantly different between NHB and NHW women (Table S5).

figure 1

Associations of Race with Lipid Species by Super-pathway. Multivariable linear regression analysis between the log-transformed lipid species and race (non-Hispanic Black, non-Hispanic White) was performed, adjusted for age, BMI at enrollment, alcohol consumption, and education. Multiple hypothesis testing was corrected using Bonferroni correction. Percentage differences were back-transformed linear regression coefficients, calculated as \({100\times (e}^{\beta }-1)\) , with a 95% confidence interval. Volcano plots of log Bonferroni-adjusted p -values were generated against the percentage differences. A  Associations of race with lipid species in neutral complex lipids super-pathway. Names of neutral complex lipid species with Bonferroni-adjusted p-values < \({10}^{-10}\) or percentage differences > 15% were labeled. B  Associations of race with lipid species in phospholipid super-pathway. Names of phospholipid species with Bonferroni-adjusted p -values < \({10}^{-5}\) or percentage differences > 15% were labeled. C Associations of race with lipid species in sphingolipid super-pathway. Names of sphingolipid species with Bonferroni-adjusted p -values < \({10}^{-5}\) or percentage differences > 15% were labeled. BMI, body mass index; NHB, non-Hispanic Black; NHW, non-Hispanic White; CE, cholesteryl ester; CER, ceramide; DAG, diacylglycerol; DCER, dihydroceramide; HCER, hexosylceramide; LCER, lactosylceramide; LPC, lysophosphatidylcholine; LPE, lysophosphatidylethanolamine; MAG, monoacylglycerol; PC, phosphatidylcholine; PE, phosphatidylethanolamine; PI, phosphatidylinositol; SM, sphingomyelin; TAG, triacylglycerol
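The two calculations described in the legend, the Bonferroni-style multiple-testing correction and the back-transformation of log-scale coefficients into percentage differences, can be sketched as follows. The coefficient `beta` below is a hypothetical illustration value, not a number taken from the study's data:

```python
import math

# Bonferroni-style per-test threshold for 857 tested lipid species at alpha = 0.05
alpha, n_tests = 0.05, 857
per_test_threshold = alpha / n_tests  # about 5.8e-5; the study's p < 1e-5 cutoff is stricter

# back-transform a log-scale regression coefficient into a percentage
# difference, as in the figure legend: 100 * (e^beta - 1)
beta = -0.94  # hypothetical coefficient for a lipid species lower in NHB women
pct_difference = 100 * (math.exp(beta) - 1)
print(round(per_test_threshold, 6), round(pct_difference, 1))
```

A negative back-transformed percentage corresponds to a lipid species with lower levels in NHB than in NHW women.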

This is the first study to comprehensively use untargeted lipidomic profiling to identify differences in lipid species between NHW and NHB women. Other studies that have compared lipid levels in NHB individuals with NHW individuals analyzed only a limited number of lipids, such as cholesterol, LDL, and HDL [ 4 , 5 ]. Our approach reveals novel insights that have not been captured in prior studies. A recent study of 175 women and 175 men using a metabolomics approach identified racial differences in some lipids [ 6 ]. Our study population of 669 women is much larger and we analyzed a broader range of lipid species which provides greater power to detect smaller differences. This broader coverage, particularly in TAGs, allows for a more comprehensive assessment of racial differences in lipid species and pathways.

The only lipid species higher in NHB women was TAG58:10-FA20:4. Our finding is similar to a prior report of higher levels of long-chain polyunsaturated TAGs (C56-C60 with 8–12 double-bonds) in NHBs [ 7 ]. Similar to the report by Cai et al. [ 8 ], our data confirms lower levels of PC(16:0/18:1) in NHB women. Unlike their study, which employed pooled samples from only 35 individuals and targeted 193 lipids, we analyzed individual samples using Metabolon's untargeted approach along with robust data analysis and adjustment for confounders. This comprehensive approach minimizes bias due to sample pooling and allows for a robust evaluation of racial differences in lipids. Prior research associates low levels of specific PC species, including PC(33:1)[PC(15:0/18:1)] and PC(33:2)[PC(15:0/18:2)], with increased risk of coronary artery disease and myocardial infarction [ 9 ]. Notably, these same PC species were found to be significantly lower in NHB women in our study, which warrants further investigation to determine if these reduced PC levels contribute to health disparities in NHB women.

Our findings highlight the importance of evaluating racial differences in molecular markers relevant to health and can inform targeted interventions and personalized disease prevention. Future studies should confirm whether these differences contribute to racial health disparities and disease susceptibility.

Availability of data and materials

Data described in the manuscript, code book, and analytic code will be made available upon request and approval.

Abbreviations

  • NHW: Non-Hispanic White
  • NHB: Non-Hispanic Black
  • CE: Cholesteryl esters
  • DAG: Diacylglycerols
  • TAG: Triacylglycerols
  • MAG: Monoacylglycerols
  • PC: Phosphatidylcholines
  • PI: Phosphatidylinositols
  • PE: Phosphatidylethanolamines
  • LPC: Lysophosphatidylcholines
  • LPE: Lysophosphatidylethanolamines
  • DCER: Dihydroceramides
  • HCER: Hexosylceramides
  • LCER: Lactosylceramides
  • SM: Sphingomyelins

Carnethon MR, Pu J, Howard G, Albert MA, Anderson CAM, Bertoni AG, et al. Cardiovascular Health in African Americans: A Scientific Statement From the American Heart Association. Circulation. 2017;136(21):e393–423.


Hornburg D, Wu S, Moqri M, Zhou X, Contrepois K, Bararpour N, et al. Dynamic lipidome alterations associated with human health, disease and ageing. Nat Metab. 2023;5(9):1578–94.


Getz KR, Jeon MS, Luo C, Luo J, Toriola AT. Lipidome of mammographic breast density in premenopausal women. Breast Cancer Res. 2023;25(1):121.

McIntosh MS, Kumar V, Kalynych C, Lott M, Hsi A, Chang JL, et al. Racial Differences in Blood Lipids Lead to Underestimation of Cardiovascular Risk in Black Women in a Nested observational Study. Glob Adv Health Med. 2013;2(2):76–9.


Frank AT, Zhao B, Jose PO, Azar KM, Fortmann SP, Palaniappan LP. Racial/ethnic differences in dyslipidemia patterns. Circulation. 2014;129(5):570–9.


Butler FM, Utt J, Mathew RO, Casiano CA, Montgomery S, Wiafe SA, et al. Plasma metabolomics profiles in Black and White participants of the Adventist Health Study-2 cohort. BMC Med. 2023;21(1):408.

Hu J, Yao J, Deng S, Balasubramanian R, Jiménez MC, Li J, et al. Differences in Metabolomic Profiles Between Black and White Women and Risk of Coronary Heart Disease: an Observational Study of Women From Four US Cohorts. Circ Res. 2022;131(7):601–15.

Cai X, Perttula K, Sk P, Hubbard AE, Dk N, Rappaport SM. Untargeted lipidomic profiling of human plasma reveals differences due to race, gender and smoking status. Metabolomics. 2014;4:1–8.


Sutter I, Klingenberg R, Othman A, Rohrer L, Landmesser U, Heg D, et al. Decreased phosphatidylcholine plasmalogens – A putative novel lipid signature in patients with stable coronary artery disease and acute myocardial infarction. Atherosclerosis. 2016;246:130–40.

Download references

Acknowledgments

We would like to thank all of the women who participated in this study.

This work was supported by funding from NIH/NCI (R01CA246592 to A.T. Toriola). The content is solely the responsibility of the authors and does not represent the official view of the NIH.

Author information

Ghazaleh Pourali and Liang Li contributed equally to this work.

Authors and Affiliations

Department of Surgery, Division of Public Health Sciences, Washington University School of Medicine, 660 South Euclid Avenue, Campus, Box 8100, St. Louis, MO, 63110, USA

Ghazaleh Pourali, Kayla R. Getz, Myung Sik Jeon, Jingqin Luo, Chongliang Luo & Adetunji T. Toriola

Institute for Informatics, Data Science & Biostatistics, School of Medicine, Washington University, St. Louis, MO, USA

Siteman Cancer Center Biostatistics Shared Resource, Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA

Myung Sik Jeon, Jingqin Luo & Chongliang Luo

Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA

Adetunji T. Toriola


Contributions

ATT conceptualized the study design, acquired the data, and provided critical overview. GP, LL, and MSJ analyzed the data. KRG, GP, JL, CL, and ATT interpreted the data. GP wrote the first draft of the manuscript and all authors provided critical revision of the manuscript.

Corresponding author

Correspondence to Adetunji T. Toriola .

Ethics declarations

Ethics approval and consent to participate.

The study was performed in accordance with the Declaration of Helsinki and was approved by the Washington University Institutional Review Board. All participants provided written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Pourali, G., Li, L., Getz, K.R. et al. Untargeted lipidomics reveals racial differences in lipid species among women. Biomark Res 12 , 79 (2024). https://doi.org/10.1186/s40364-024-00635-4

Download citation

Received : 14 June 2024

Accepted : 05 August 2024

Published : 09 August 2024

DOI : https://doi.org/10.1186/s40364-024-00635-4


  • Body Mass Index

Biomarker Research

ISSN: 2050-7771



Generalization ability of bagging and boosting type deep learning models in evapotranspiration estimation.

p value of testing a hypothesis




COMMENTS

  1. S.3.2 Hypothesis Testing (P-Value Approach)

    The P-value is, therefore, the area under a t(n−1) = t(14) curve to the left of −2.5 and to the right of 2.5. It can be shown using statistical software that the P-value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually. Note that the P-value for a two-tailed test is always two times the P-value for either of the one-tailed tests.

  2. Understanding P-values

    The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true. P values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis.

  3. P-Value in Statistical Hypothesis Tests: What is it?

    A p value is used in hypothesis testing to help you support or reject the null hypothesis. The p value is the evidence against a null hypothesis: the smaller the p-value, the stronger the evidence that you should reject the null hypothesis. P values are expressed as decimals, although it may be easier to understand what they are if you convert ...

  4. How to Find the P value: Process and Calculations

    This chart has two shaded regions because we performed a two-tailed test. Each region has a probability of 0.01559. When you sum them, you obtain the p-value of 0.03118. In other words, the likelihood of a t-value falling in either shaded region when the null hypothesis is true is 0.03118. I showed you how to find the p value for a t-test.

  5. Understanding P-Values and Statistical Significance

    A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., that the null hypothesis is true). The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p -value, the less likely the results occurred by random chance, and the ...

  6. Interpreting P values

    Here is the technical definition of P values: P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true. Let's go back to our hypothetical medication study. Suppose the hypothesis test generates a P value of 0.03.

  7. An Explanation of P-Values and Statistical Significance

    The textbook definition of a p-value is: A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true. For example, suppose a factory claims that they produce tires that have a mean weight of 200 pounds. An auditor hypothesizes that the true mean weight ...

  8. How Hypothesis Tests Work: Significance Levels (Alpha) and P values

    Using P values and Significance Levels Together. If your P value is less than or equal to your alpha level, reject the null hypothesis. The P value results are consistent with our graphical representation. The P value of 0.03112 is significant at the alpha level of 0.05 but not 0.01.

  9. p-value

    The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic. The lower the p-value is, the lower the probability of getting that result if the null hypothesis were true. A result is said to be statistically ...

  10. 9.3

    The test statistic is, therefore: Z = (p̂ − p0) / √(p0(1 − p0)/n) = (0.853 − 0.90) / √(0.90(0.10)/150) = −1.92, and the rejection region is Z < −1.645. Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence at the α = 0.05 level to conclude that the rate has ...

  11. P-Value: What It Is, How to Calculate It, and Why It Matters

    P-Value: The p-value is the level of marginal significance within a statistical hypothesis test representing the probability of the occurrence of a given event. The p-value is used as an ...

  12. 9.5: The p value of a test

    9.5: The p value of a test. In one sense, our hypothesis test is complete; we've constructed a test statistic, figured out its sampling distribution if the null hypothesis is true, and then constructed the critical region for the test. Nevertheless, I've actually omitted the most important number of all: the p value.

  13. Hypothesis testing and p-values (video)

    In this video there was no critical value set for this experiment. In the last seconds of the video, Sal briefly mentions a p-value of 5% (0.05), which would have a critical of value of z = (+/-) 1.96. Since the experiment produced a z-score of 3, which is more extreme than 1.96, we reject the null hypothesis.

  14. Hypothesis Testing, P Values, Confidence Intervals, and Significance

    Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a research hypothesis is tested with results provided, typically with p values, confidence intervals, or both. Additionally, statistical or research significance is estimated or determined by the investigators. Unfortunately, healthcare providers may have different comfort levels in interpreting ...

  15. p-value Calculator

    Formally, the p-value is the probability that the test statistic will produce values at least as extreme as the value it produced for your sample. It is crucial to remember that this probability is calculated under the assumption that the null hypothesis H0 is true! More intuitively, the p-value answers the question: Assuming that I live in a world where the null hypothesis holds, how probable is ...

  16. P-Value: Comprehensive Guide to Understand, Apply, and Interpret

    The p-value is a crucial concept in statistical hypothesis testing, providing a quantitative measure of the strength of evidence against the null hypothesis. It guides decision-making by comparing the p-value to a chosen significance level, typically 0.05.

  17. Introduction to Hypothesis Testing

    The p-value tells us the strength of the evidence against the null hypothesis. If the p-value is less than the significance level, we reject the null hypothesis. 5. Interpret the results. Interpret the results of the hypothesis test in the context of the question being asked. The Two Types of Decision Errors

  18. S.3.2 Hypothesis Testing (P-Value Approach)

    The P -value approach involves determining "likely" or "unlikely" by determining the probability — assuming the null hypothesis was true — of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P -value is small, say less than (or equal to) α, then it is "unlikely."

  19. The p value

    In its generic form, a p-value formula can be written down as: p = P(d(X) ≥ d(x0); H0), where P stands for probability, d(X) is a test statistic (distance function) of a random variable X, x0 is a typical realization of X, and H0 is the selected null hypothesis. The semi-colon means 'assuming'.

  20. P-values and significance tests (video)

    Transcript. We compare a P-value to a significance level to make a conclusion in a significance test. Given the null hypothesis is true, a p-value is the probability of getting a result as or more extreme than the sample result by random chance alone. If a p-value is lower than our significance level, we reject the null hypothesis.

  21. What Is P-Value in Statistical Hypothesis?

    The p-value or probability value is a number, calculated from a statistical test, that tells how likely it is that your results would have occurred under the null hypothesis of the test. P-values are usually automatically calculated using statistical software. They can also be calculated using p-value tables for the relevant statistical test.

  22. How to Find p Value from Test Statistic

    Understanding p-value and Test Statistic. The p-value is calculated using the test statistic's sampling distribution under the null hypothesis, the sample data, and the type of test being conducted (lower-tailed, upper-tailed, or two-sided).

  23. Hypothesis Testing

    The p-value is a probability, computed assuming the null hypothesis is true, that the test statistic would take a value as extreme or more extreme than that actually observed. Since it's a probability, it is a number between 0 and 1. The closer the number is to 0, the more "unlikely" the event. So if the p-value is "small" (typically, less ...

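The two-tailed t-test example above (the S.3.2 excerpt: t = ±2.5 with 14 degrees of freedom, p ≈ 0.0254) can be reproduced numerically. Below is a minimal Python sketch using only the standard library; it approximates the tail of the Student's t distribution by integrating its density with Simpson's rule. In practice a statistics package would give you this tail probability directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, upper=60.0, n=20000):
    """P(T > t), approximated with Simpson's rule on [t, upper].

    The density decays polynomially, so truncating at `upper` loses a
    negligible amount of probability for moderate df.
    """
    h = (upper - t) / n
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

# Two-tailed p value for a test statistic of t = 2.5 with 14 degrees of freedom:
# double the one-tailed area, exactly as the excerpt describes.
p = 2 * t_tail(2.5, 14)
print(round(p, 4))  # ≈ 0.0254, matching the worked example
```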
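The one-proportion z-test worked above (p̂ = 0.853, p₀ = 0.90, n = 150, lower-tailed at α = 0.05) is equally easy to check. A short Python sketch, using math.erf from the standard library for the standard normal CDF:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p0, p_hat, n, alpha = 0.90, 0.853, 150, 0.05

# Test statistic for H0: p = p0 against the lower-tailed HA: p < p0
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = normal_cdf(z)  # area in the lower tail

print(round(z, 2))      # -1.92, as in the worked example
print(p_value < alpha)  # True -> reject H0 at the 0.05 level
```

Comparing p_value to α here gives the same decision as comparing z to the critical value −1.645: the two formulations of the rejection rule are equivalent.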
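The generic definition p = P(d(X) ≥ d(x0); H0) also suggests a simulation-based reading: generate many datasets under the null hypothesis and count how often the distance function is at least as extreme as the one observed. The coin-tossing setup below (60 heads in 100 tosses) is a hypothetical illustration of mine, not taken from the excerpts above.

```python
import random

# H0 says the coin is fair; we observed 60 heads in n = 100 tosses.
# The distance function is d(X) = |heads - n/2|; the Monte Carlo p value is
# the fraction of H0-simulated datasets at least as extreme as the observed one.
random.seed(1)
n, observed_heads, trials = 100, 60, 100_000
d_obs = abs(observed_heads - n / 2)

extreme = 0
for _ in range(trials):
    # n fair coin flips drawn as n random bits; counting the 1-bits gives
    # a Binomial(n, 0.5) draw.
    heads = bin(random.getrandbits(n)).count("1")
    if abs(heads - n / 2) >= d_obs:
        extreme += 1

p_value = extreme / trials
print(round(p_value, 3))  # close to the exact two-sided binomial p value of about 0.057
```

For a small n like this the exact p value could be computed by summing binomial probabilities directly; the simulation version is shown because it matches the generic "assuming H0, how often is the data at least this extreme?" definition word for word.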