25 Confounding Variable Examples

By Chris Drew (PhD)

Confounding variables are variables that ‘confound’ (meaning to confuse) the data in a study. In scholarly terms, we say that they are extraneous variables that correlate (positively or negatively) with both the dependent variable and the independent variable (Scharrer & Ramasubramanian, 2021).

These variables present a challenge in research as they can obscure the potential relationships between the variables under examination, leading to spurious correlations and the famous third variable problem.

Accurately isolating and controlling confounding variables is thus crucial in maximizing the validity of an experiment or study, primarily when trying to determine cause-effect relationships between variables (Knapp, 2017; Nestor & Schutt, 2018).


Confounding Variables Examples

1. IQ and Reading Ability: A study could find a positive correlation between children’s IQ and reading ability. However, the socioeconomic status of the families could be a confounding variable, as children from wealthier families could have more access to books and educational resources.

2. Coffee Intake and Heart Disease: A research finding suggests a positive correlation between coffee intake and heart disease. But the variable ‘exercise’ could confound the situation, as those who drink a lot of coffee might also do less exercise.

3. Medication and Recovery Time: A study posits a link between a specific medication and faster recovery time from a disease. However, the overall health of the patient, which can significantly affect recovery, serves as a confounding variable.

4. Unemployment and Mental Health: There seems to be a relationship between unemployment and poor mental health. However, the confounding variable can be the quality of the support network, as unemployed individuals with robust emotional support might have better mental health.

5. Exercise and Stress Levels: A study might show a negative correlation between exercise levels and stress. But sleep patterns could act as a confounder, as individuals who exercise more might also have better sleep, which in turn could lower stress levels.

6. Height and Self-esteem: A study claims a positive correlation between height and self-esteem. In this case, attractiveness can confound the result, as taller people might sometimes be judged by society as more attractive, leading to higher self-esteem.

7. Class Attendance and Grades: Research indicates that students who attend classes regularly have better grades. However, a student’s intrinsic motivation to learn could be a confounding variable, as these students might not only attend class but also study more outside of class.

8. Age and Job Satisfaction: A study might suggest that older employees are more satisfied with their jobs. In this scenario, job position could be a confounder, as older employees might occupy higher, more gratifying positions in the company.

9. Light Exposure and Depression: Research on seasonal depression might show a connection between reduced light exposure in winter and increased depression rates. However, physical activity (which tends to decrease in winter) could confound these results.

10. Parents’ Education and Children’s Success at School: A study states that children of highly educated parents perform better at school. However, a confounding variable might be the parents’ income, which could allow for a range of educational resources.

11. Physical Exercise and Academic Performance: A positive correlation may be found between daily physical exercise and academic performance. However, time management skills can be a potential confounder, as students with good time management skills might be more likely to fit regular exercise into their schedule and also keep up with their academic work efficiently.

12. Daily Screen Time and Obesity: Research suggests a link between extensive daily screen time and obesity. But the confounding variable could be the lack of physical activity, which is often associated with both increased screen time and obesity.

13. Breakfast Consumption and Academic Performance: It might be suggested that students who eat breakfast regularly perform better academically. However, the confounding factor could be the overall nutritional status of the students, as those who eat breakfast regularly may also follow healthier eating habits that boost their academic performance.

14. Population Density and Disease Transmission: A study may show higher disease transmission rates in densely populated areas. Still, public health infrastructure could be a confounding variable, as densely populated areas with poor health facilities might witness even higher transmission rates.

15. Age and Skin Cancer: A study might suggest that older individuals are at a higher risk of skin cancer. However, exposure to sunlight, a major factor contributing to skin cancer, may confound the relationship, with individuals exposed to more sunlight over time having a greater risk.

16. Working Hours and Job Satisfaction: A hypothetical study indicates that employees working longer hours report lower job satisfaction levels. However, the job’s intrinsic interest could be a confounder, as someone who finds their job genuinely interesting might report higher satisfaction levels despite working long hours.

17. Sugar Consumption and Tooth Decay: Sugar intake is linked to tooth decay rates. However, dental hygiene practice is a typical confounding variable: individuals who consume a lot of sugar but maintain good oral hygiene might show lower tooth decay rates.

18. Farm Exposure and Respiratory Illness: A study observes a relationship between farm exposure and reduced respiratory illnesses. Yet a healthier overall lifestyle associated with living in rural areas might confound these results.

19. Outdoor Activities and Mental Health: Research might suggest a link between participating in outdoor activities and improved mental health. However, pre-existing physical health could be a confounding variable, as those enjoying good physical health could be more likely to participate in frequent outdoor activities, thereby resulting in improved mental health.

20. Pet Ownership and Happiness: A study shows that pet owners report higher levels of happiness. However, family dynamics can serve as a confounding variable, as the presence of pets might be linked to a more active and happier family life.

21. Vitamin D Levels and Depression: Research indicates a correlation between low vitamin D levels and depression. However, sunlight exposure might act as a confounding variable, as it affects both vitamin D levels and mood.

22. Employee Training and Organizational Performance: A positive relationship might be found between the level of employee training and organizational performance. Still, the organization’s leadership quality could confound these results, being significant in both successful employee training implementation and high organizational performance.

23. Social Media Use and Loneliness: There appears to be a positive correlation between high social media use and feelings of loneliness. However, personal temperament can be a confounding variable, as individuals with certain temperaments may spend more time on social media and feel more isolated.

24. Respiratory Illnesses and Air Pollution: Studies indicate that areas with higher air pollution have more respiratory illnesses. However, the time spent outdoors could be a confounding variable, as those spending more time outside in polluted areas have a higher exposure to pollutants.

25. Maternal Age and Birth Complications: Advanced maternal age is linked to increased risk of birth complications. Yet health conditions such as hypertension, more common in older women, could confound these results.

Types of Confounding Variables

Confounding variables come in many forms, including order effects, participant variability, the social desirability effect, the Hawthorne effect, demand characteristics, and evaluation apprehension (Parker & Berman, 2016).

  • Order Effects refer to the impact on a participant’s performance or behavior brought on by the order in which the experimental tasks are presented (Riegelman, 2020). The learning or performance of one task could influence the performance or understanding of subsequent tasks (for example, an experiment with multiple language assessments, German followed by French, could produce different results if tested in the reverse order).
  • Participant Variability tackles the inconsistencies stemming from unique characteristics or behaviors of individual participants, which could inadvertently impact the results. Physical fitness levels among participants in an exercise study could greatly influence the results.
  • Social Desirability Effect comes into play when participants modify their responses to be more socially acceptable, often leading to bias in self-reporting studies. For instance, in a study measuring dietary habits, participants might overreport healthy food consumption and underreport unhealthy food choices to align with what they perceive as socially desirable.
  • Hawthorne Effect constitutes a type of observer effect where individuals modify their behavior in response to being observed during a study (Nestor & Schutt, 2018; Riegelman, 2020). In a job efficiency study, employees may work harder just because they know they’re being observed.
  • Demand Characteristics include cues that might inadvertently inform participants of the experiment’s purpose or anticipated results, resulting in biased outcomes (Lock et al., 2020). If participants in a product testing study deduce the product being promoted, it might alter their responses.
  • Evaluation Apprehension could affect the findings of a study when participants’ anxiety about being evaluated leads them to alter their behavior (Boniface, 2019; Knapp, 2017). This is common in performance studies where participants know their results will be judged or compared.

Confounding variables can complicate and potentially distort the results of experiments and studies. Yet, by accurately recognizing and controlling for these confounding variables, researchers can ensure more valid findings and more precise observations about the relationships between variables. Understanding the nature and impact of confounding variables and the inherent challenges in isolating them is crucial for anyone engaged in rigorous research.

References

Boniface, D. R. (2019). Experiment Design and Statistical Methods for Behavioural and Social Research. CRC Press.

Knapp, H. (2017). Intermediate Statistics Using SPSS. SAGE Publications.

Lock, R. H., Lock, P. F., Morgan, K. L., Lock, E. F., & Lock, D. F. (2020). Statistics: Unlocking the Power of Data (3rd ed.). Wiley.

Nestor, P. G., & Schutt, R. K. (2018). Research Methods in Psychology: Investigating Human Behavior. SAGE Publications.

Parker, R. A., & Berman, N. G. (2016). Planning Clinical Research. Cambridge University Press.

Riegelman, R. K. (2020). Studying a Study and Testing a Test (7th ed.). Wolters Kluwer Health.

Scharrer, E., & Ramasubramanian, S. (2021). Quantitative Research Methods in Communication: The Power of Numbers for Social Justice. Taylor & Francis.


What Is a Confounding Variable? Definition and Examples

A confounding variable leads to a false association between the independent and dependent variable.

A confounding variable is a variable that influences both the independent variable and dependent variable and leads to a false correlation between them. A confounding variable is also called a confounder, confounding factor, or lurking variable. Because confounding variables often exist in experiments, correlation does not mean causation. In other words, when you see a change in the independent variable and a change in the dependent variable, you can’t be certain the two variables are related.

Here are examples of confounding variables, a look at the difference between a confounder and a mediator, and ways to reduce the risk of confounding variables leading to incorrect conclusions.

Positive and Negative Confounding

Sometimes confounding points to a false cause-and-effect relationship, while other times it masks a true effect.

  • Positive Confounding: Positive confounding overestimates the relationship between the independent and dependent variables. It biases results away from the null hypothesis.
  • Negative Confounding: Negative confounding underestimates the relationship between the independent and dependent variables. It biases results toward the null hypothesis.

Confounding Variable Examples

  • In a study where the independent variable is ice cream sales and the dependent variable is shark attacks, a researcher sees that increased sales go hand-in-hand with shark attacks. The confounding variable is the heat index. When it’s hotter, more people buy ice cream and more people go swimming in (shark-infested) waters. There’s no causal relationship between people buying ice cream and getting attacked by sharks.
  • Real Positive Confounding Example: A 1981 Harvard study linked drinking coffee to pancreatic cancer. Smoking was the confounding variable in this study. Many of the coffee drinkers in the study also smoked. When the data was adjusted for smoking, the link between coffee consumption (the independent variable) and pancreatic cancer incidence (the dependent variable) vanished.
  • Real Negative Confounding Example: In a 2008 study of the toxicity (dependent variable) of methylmercury in fish and seafood (independent variable), researchers found the beneficial nutrients in the food (confounding variable) counteracted some of the negative effects of mercury toxicity.

Correlation does not imply causation. If you’re unconvinced, check out the spurious correlations compiled by Tyler Vigen.
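To see the ice cream example in numbers, here is a minimal simulation sketch (Python; all coefficients are invented for illustration). A single heat variable drives both series, so they correlate even though neither causes the other:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 365

# Invented numbers: heat drives both ice cream sales and shark attacks.
heat = rng.normal(75, 10, n)                   # daily heat index
ice_cream = 2.0 * heat + rng.normal(0, 15, n)  # sales rise with heat
sharks = 0.05 * heat + rng.normal(0, 1, n)     # attacks rise with heat

# The two variables correlate despite having no causal link.
print(np.corrcoef(ice_cream, sharks)[0, 1])    # noticeably positive

# Removing heat's (known, simulated) contribution kills the association;
# in a real study you would regress it out or stratify by heat instead.
print(np.corrcoef(ice_cream - 2.0 * heat, sharks - 0.05 * heat)[0, 1])  # near zero
```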

How to Reduce the Risk of Confounding

The first step to reduce the risk of confounding variables affecting your experiment is to try to identify anything that might affect the study. It’s a good idea to check the literature or at least ask other researchers about confounders. Otherwise, you’re likely to find out about them during peer review!

When you design an experiment, consider these techniques for reducing the effect of confounding variables:

  • Introduce control variables. For example, if you think age is a confounder, only test within a certain age group. If temperature is a potential confounder, control it.
  • Be consistent about time. Take data at the same time of day. Repeat experiments at the same time of year. Don’t vary the duration of treatments within a single experiment.
  • When possible, use double blinding. In a double-blind experiment, neither the researcher nor the subject knows whether or not a treatment was applied.
  • Randomize. Select control group subjects and test subjects randomly, rather than having the researcher choose the group or (in human experiments) letting the subjects select participation. A short simulation of this idea follows this list.
  • Use case controls or matching. If you suspect confounding variables, match the test subject and control as much as possible. In human experiments, you might select subjects of the same age, sex, ethnicity, education, diet, etc. For animal and plant studies, you’d use pure lines. In chemical studies, use samples from the same supplier and batch.
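Here is a rough sketch of why randomization works (Python; the numbers are invented). When subjects self-select into treatment, a confounder such as age tracks treatment status; under random assignment, that correlation all but disappears:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
age = rng.normal(40, 12, n)  # a potential confounder

# Self-selection: suppose older people opt into the treatment more often.
p_join = 1 / (1 + np.exp(-(age - 40) / 5))
self_selected = (rng.random(n) < p_join).astype(float)

# Random assignment: treatment is a coin flip, independent of age.
randomized = (rng.random(n) < 0.5).astype(float)

print(np.corrcoef(age, self_selected)[0, 1])  # clearly positive
print(np.corrcoef(age, randomized)[0, 1])     # near zero
```

Because the randomized assignment is uncorrelated with age, any age effect washes out of the group comparison instead of masquerading as a treatment effect.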

Confounder vs Mediator or Effect Modifier

A confounder affects both the independent and dependent variables. In contrast, a mediator or effect modifier does not affect the independent variable, but does modify the effect the independent variable has on the dependent variable. For example, in a test of drug effectiveness, the drug may be more effective in children than adults. In this case, age is an effect modifier. Age doesn’t affect the drug itself, so it is not a confounder.

Confounder vs Bias

In a way, a confounding variable results in bias in that it distorts the outcome of an experiment. However, bias usually refers to a type of systematic error from experimental design, data collection, or data analysis. An experiment can contain bias without being affected by a confounding variable.

  • Confounding Variable: A factor that affects both the independent and dependent variables, leading to a false association between them.
  • Effect Modifier: A variable that positively or negatively modifies the effect of the independent variable on the dependent variable.
  • Bias: A systematic error that masks the true effect of the independent variable on the dependent variable.


Confounding Variables in Psychology: Definition & Examples

By Julia Simkus

A confounding variable is an unmeasured third variable that influences, or “confounds,” the relationship between an independent and a dependent variable by suggesting the presence of a spurious correlation.

Confounding Variables in Research

Due to the presence of confounding variables in research, we should never assume that a correlation between two variables implies causation.

When an extraneous variable has not been properly controlled and interferes with the dependent variable (i.e., results), it is called a confounding variable.


For example, suppose there is an association between an independent variable (IV) and a dependent variable (DV), but that association exists only because both variables are affected by a third variable (C). The association between the IV and DV is then spurious.

Variable C would be considered the confounding variable in this example. We would say that the IV and DV are confounded by C whenever C causally influences both the IV and the DV.

In order to accurately estimate the effect of the IV on the DV, the researcher must reduce the effects of C.

If you identify a causal relationship between the independent variable and the dependent variable, that relationship might not actually exist because it could be affected by the presence of a confounding variable.

Even if the cause-and-effect relationship does exist, the confounding variable still might overestimate or underestimate the impact of the independent variable on the dependent variable.

Reducing Confounding Variables

It is important to identify all possible confounding variables and consider their impact on your research design in order to ensure the internal validity of your results.

Here are some techniques to reduce the effects of these confounding variables:
  • Random allocation: Randomization helps eliminate the impact of confounding variables. You can randomly assign half of your subjects to a treatment group and the other half to a control group. This ensures that confounders have the same effect on both groups, so they cannot correlate with your independent variable.
  • Control variables: This involves restricting the treatment group to include only subjects with the same potential for confounding factors. For example, you can restrict your subject pool by age, sex, demographic, level of education, or weight to ensure that these variables are the same among all subjects and thus cannot confound the cause-and-effect relationship at hand.
  • Within-subjects design: In a within-subjects design, all participants take part in every condition (see the sketch after this list).
  • Case-control studies: Case-control studies assign confounders to both groups (the experimental group and the control group) equally.
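To illustrate the within-subjects idea, here is a minimal sketch (Python with SciPy; the effect sizes are invented). Because each participant serves as their own control, stable person-level traits drop out of the paired comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 30

# A stable person-level trait that varies a lot between people.
baseline = rng.normal(100, 15, n)

# Each person is measured in both conditions; treatment adds a small true effect.
control = baseline + rng.normal(0, 5, n)
treatment = baseline + 3 + rng.normal(0, 5, n)

# Between-subjects test: person-to-person variation swamps the effect.
print(stats.ttest_ind(treatment, control).pvalue)  # typically not significant

# Within-subjects (paired) test: each person is their own control.
print(stats.ttest_rel(treatment, control).pvalue)  # typically significant
```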

Suppose we wanted to measure the effects of caloric intake (IV) on weight (DV). We would have to try to ensure that confounding variables did not affect the results. These variables could include the following:

  • Metabolic rate: If you have a faster metabolism, you tend to burn calories more quickly.
  • Age: Age can affect weight gain differently, as younger individuals tend to burn calories more quickly than older individuals.
  • Physical activity: Those who exercise or are more active will burn more calories and could weigh less, even if they consume more.
  • Height: Taller individuals tend to need to consume more calories in order to gain weight.
  • Sex: Men and women have different caloric needs to maintain a certain weight.
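Confounders like these are often handled statistically. The sketch below (Python with statsmodels; the variable names and coefficients are hypothetical) shows how adjusting for physical activity can change, and even flip the sign of, the apparent effect of caloric intake on weight:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Hypothetical world: more active people eat more but weigh less.
activity = rng.normal(0, 1, n)
calories = 2200 + 150 * activity + rng.normal(0, 100, n)
weight = 70 + 0.01 * (calories - 2200) - 4 * activity + rng.normal(0, 3, n)

# Naive model: activity's negative effect gets folded into calories,
# making extra calories look like they reduce weight.
naive = sm.OLS(weight, sm.add_constant(calories)).fit()

# Adjusted model: controlling for activity recovers the true slope (~0.01).
X = sm.add_constant(np.column_stack([calories, activity]))
adjusted = sm.OLS(weight, X).fit()

print(naive.params)     # biased, negative slope on calories
print(adjusted.params)  # positive slope on calories, negative on activity
```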

Frequently asked questions

1. What is a confounding variable in psychology?

A confounding variable in psychology is an extraneous factor that interferes with the relationship between an experiment’s independent and dependent variables. It’s not the variable of interest but can influence the outcome, leading to inaccurate conclusions about the relationship being studied.

For instance, if studying the impact of studying time on test scores, a confounding variable might be a student’s inherent aptitude or previous knowledge.

2. What is the difference between an extraneous variable and a confounding variable?

A confounding variable is a type of extraneous variable. Confounding variables affect both the independent and dependent variables. They influence the dependent variable directly and either correlate with or causally affect the independent variable.

An extraneous variable is any variable that you are not investigating that can influence the dependent variable.

3. What is Confounding Bias?

Confounding bias is a bias that is the result of having confounding variables in your study design. If the observed association overestimates the effect of the independent variable on the dependent variable, this is known as a positive confounding bias.

If the observed association underestimates the effect of the independent variable on the dependent variable, this is known as a negative confounding bias.



Confounding Variable: Definition & Examples

By Jim Frost

Confounding Variable Definition

In studies examining possible causal links, a confounding variable is an unaccounted factor that impacts both the potential cause and effect and can distort the results. Recognizing and addressing these variables in your experimental design is crucial for producing valid findings. Statisticians also refer to confounding variables that cause bias as confounders, omitted variables, and lurking variables.

[Diagram: how confounding works]

A confounding variable systematically influences both an independent and dependent variable in a manner that changes the apparent relationship between them. Failing to account for a confounding variable can bias your results, leading to erroneous interpretations. This bias can produce the following problems:

  • Overestimate the strength of an effect.
  • Underestimate the strength of an effect.
  • Change the direction of an effect.
  • Mask an effect that actually exists.
  • Create spurious correlations.

Additionally, confounding variables reduce an experiment’s internal validity, thereby reducing its ability to make causal inferences about treatment effects. You don’t want any of these problems!

In this post, you’ll learn about confounding variables, the problems they cause, and how to minimize their effects. I’ll provide plenty of examples along the way!

What is a Confounding Variable?

Confounding variables bias the results when researchers don’t account for them. How can variables you don’t measure affect the results for variables that you record? At first glance, this problem might not make sense.

Confounding variables influence both the independent and dependent variable, distorting the observed relationship between them. To be a confounding variable, the following two conditions must exist:

  • It must correlate with the dependent variable.
  • It must correlate with at least one independent variable in the experiment.

The diagram below illustrates these two conditions. There must be non-zero correlations (r) on all three sides of the triangle. X1 is the independent variable of interest while Y is the dependent variable. X2 is the confounding variable.

[Diagram: the conditions for a confounding variable to produce bias]

The correlation structure can cause confounding variables to bias the results that appear in your statistical output. In short, the amount of bias depends on the strength of these correlations. Strong correlations produce greater bias. If the relationships are weak, the bias might not be severe. If any of the correlations are zero, the extraneous variable won’t produce bias even if the researchers don’t control for it.

Leaving a confounding variable out of a regression model can produce omitted variable bias.
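A minimal numerical sketch of that bias (Python; the coefficients are invented) uses the classic omitted-variable-bias result for one included predictor and one omitted confounder: the biased slope is roughly the true coefficient plus the confounder's coefficient times the slope from regressing the confounder on the included variable:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)      # confounder correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)  # true effect of x1 is 1.0

# Slope from regressing y on x1 alone (the confounder x2 is omitted):
b_omitted = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
print(b_omitted)  # roughly 1.0 + 2.0 * 0.6 = 2.2, far from the true 1.0

# Omitted-variable-bias formula: true beta1 + beta2 * (slope of x2 on x1).
delta = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(1.0 + 2.0 * delta)  # matches the biased estimate above
```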

Confounding Variable Examples

Exercise and Weight Loss

In a study examining the relationship between regular exercise and weight loss, diet is a confounding variable. People who exercise are likely to have other healthy habits that affect weight loss, such as diet. Without controlling for dietary habits, it’s unclear whether weight loss is due to exercise, changes in diet, or both.

Education and Income Level

When researching the correlation between the level of education and income, geographic location can be a confounding variable. Different regions may have varying economic opportunities, influencing income levels irrespective of education. Without controlling for location, you can’t be sure if education or location is driving income.

Exercise and Bone Density

I used to work in a biomechanics lab. For a bone density study, we measured various characteristics including the subjects’ activity levels, their weights, and bone densities among many others. Bone growth theories suggest that a positive correlation between activity level and bone density likely exists. Higher activity should produce greater bone density.

Early in the study, I wanted to validate our initial data quickly by using simple regression analysis to assess the relationship between activity and bone density. There should be a positive relationship. To my great surprise, there was no relationship at all!

Long story short, a confounding variable was hiding a significant positive correlation between activity and bone density. The offending variable was the subjects’ weights because it correlates with both the independent (activity) and dependent variable (bone density), thus allowing it to bias the results.

After including weight in the regression model, the results indicated that both activity and weight are statistically significant and positively correlate with bone density. Accounting for the confounding variable revealed the true relationship!

The diagram below shows the signs of the correlations between the variables. In the next section, I’ll explain how the confounder (Weight) hid the true relationship.

[Diagram: the bone density model]

Related post: Identifying Independent and Dependent Variables

How the Confounder Hid the Relationship

The diagram for the Activity and Bone Density study indicates the conditions exist for the confounding variable (Weight) to bias the results because all three sides of the triangle have non-zero correlations. Let’s find out how leaving the confounding variable of weight out of the model masked the relationship between activity and bone density.

The correlation structure produces two opposing effects of activity. More active subjects get a bone density boost directly. However, they also tend to weigh less, which reduces bone density.

When I fit a regression model with only activity, the model had to attribute both opposing effects to activity alone. Hence, the zero correlation. However, when I fit the model with both activity and weight, it could assign the opposing effects to each variable separately.

Now imagine if we didn’t have the weight data. We wouldn’t have discovered the positive correlation between activity and bone density. Hence, the example shows the importance of controlling confounding variables, which leads to the next section!
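The masking pattern is easy to reproduce with simulated data. The sketch below (Python with statsmodels) only mimics the sign structure described above with invented coefficients; it is not the actual study data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200

activity = rng.normal(size=n)
weight = -0.8 * activity + 0.6 * rng.normal(size=n)  # more active -> lighter
density = 1.0 * activity + 1.2 * weight + 0.5 * rng.normal(size=n)

# Activity alone: the direct (+) and indirect (-) paths nearly cancel.
alone = sm.OLS(density, sm.add_constant(activity)).fit()
print(alone.params[1])  # near zero

# Activity plus weight: both true positive relationships emerge.
both = sm.OLS(density, sm.add_constant(np.column_stack([activity, weight]))).fit()
print(both.params[1:])  # close to the true values 1.0 and 1.2
```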

Reducing the Effect of Confounding Variables

As you saw above, accounting for the influence of confounding variables is essential to ensure your findings’ validity. Here are four methods to reduce their effects.

Restriction

Restriction involves limiting the study population to a specific group or criteria to eliminate confounding variables.

For example, in a study on the effects of caffeine on heart rate, researchers might restrict participants to non-smokers. This restriction eliminates smoking as a confounder that can influence heart rate.

Matching

This process involves pairing subjects by matching characteristics pertinent to the study. Then, researchers randomly assign one individual from each pair to the control group and the other to the experimental group. This randomness helps eliminate bias, ensuring a balanced and fair comparison between groups. This process controls confounding variables by equalizing them between groups. The goal is to create groups as similar as possible except for the experimental treatment.

For example, in a study examining the impact of a new education method on student performance, researchers match students on age, socioeconomic status, and baseline academic performance to control these potential confounders.

Learn more about Matched Pairs Design: Use & Examples.
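A bare-bones sketch of the pairing step (Python; the covariate and group size are hypothetical): sort subjects on the matching variable, pair neighbors, then flip a coin within each pair:

```python
import numpy as np

rng = np.random.default_rng(11)
ages = rng.integers(18, 65, size=20)  # matching covariate for 20 subjects

# Sort subjects by age and pair adjacent ones so each pair is similar.
order = np.argsort(ages)
pairs = order.reshape(-1, 2)

treatment, control = [], []
for a, b in pairs:
    if rng.random() < 0.5:  # randomly assign within each pair
        a, b = b, a
    treatment.append(a)
    control.append(b)

print("treatment subjects:", sorted(treatment))
print("control subjects:  ", sorted(control))
```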

Random Assignment

Randomly assigning subjects to the control and treatment groups helps ensure that the groups are statistically similar, minimizing the influence of confounding variables.

For example, in clinical trials for a new medication, participants are randomly assigned to either the treatment or control group. This random assignment helps evenly distribute variables such as age, gender, and health status across both groups.

Learn more about Random Assignment in Experiments .

Statistical Control

Statistical control involves using analytical techniques to adjust for the effect of confounding variables in the analysis phase. Researchers can use methods like regression analysis to control potential confounders.

For example, I showed you how I controlled for weight as a confounding variable in the bone density study. Including weight in the regression model revealed the genuine relationship between activity and bone density.

Learn more about controlling confounders by using regression analysis.

By incorporating these strategies into research design and analysis, researchers can significantly reduce the impact of confounding variables, leading to more accurate results.

If you aren’t careful, the hidden hazards of a confounding variable can completely flip the results of your experiment!



Reader Interactions


January 15, 2024 at 10:02 am

To address this potential problem, I collect all the possible variables and create a correlation matrix to identify all the correlations, their direction, and their statistical significance, before regression.


January 15, 2024 at 2:54 pm

That’s a great practice for understanding the underlying correlation structure of your data. Definitely a good thing to do along with graphing the scatterplots for all those pairs because they’re good at displaying curved relationships that might not register with Pearson’s correlation.

It’s been a while since I worked on the bone density study, but I’m sure I created that correlation & scatterplot matrix to get the lay of the land.

A couple of caveats:

Those correlations are pairwise relationships, equivalent to one predictor for a response (but without the directionality). So, those correlations can be affected by a confounding variable just like a simple regression model. Going back to the example in my post, if I did a pairwise correlation between all variables, including activity and bone density, that would’ve still been essentially zero–affected by the weight confounder in the same way as the regression model. At least with a correlation matrix, you’d be able to piece together that weight was a confounder likely affecting the other correlation.

And a confounder can exist outside your dataset. You might not have even measured a confounder, so it won’t be in your correlation matrix, but it can still impact your results. Hence, it’s always good to consider variables that you didn’t record as well.

I’m guessing you know all that, I’m more spelling it out for other readers.

And if I remember correctly, your background is more with randomized experiments. The random assignment process should break any correlation between a confounder and the treatment, making it essentially zero. Consequently, randomized experiments tend to prevent confounding variables from affecting the results.
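For anyone who wants to try that check, a minimal sketch looks something like this (Python with pandas; the data here are simulated stand-ins, not the study's):

```python
import numpy as np
import pandas as pd

# Simulated stand-in data shaped like the bone density example.
rng = np.random.default_rng(9)
activity = rng.normal(size=100)
weight = -0.7 * activity + 0.7 * rng.normal(size=100)
density = activity + 1.2 * weight + 0.5 * rng.normal(size=100)
df = pd.DataFrame({"activity": activity, "weight": weight, "density": density})

print(df.corr())                       # pairwise Pearson correlations
axes = pd.plotting.scatter_matrix(df)  # pairwise scatterplots (needs matplotlib)
```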


July 17, 2023 at 11:11 am

Hi Jim, in multivariate regression I have always removed variables that aren’t significant. However, recently a reviewer said that this approach is unjustified. Is there a consensus about this? A reference article? Thanks, Ray

July 17, 2023 at 4:52 pm

Hi Raymond,

I don’t have an article handy to refer you to. But based on what happens to models when you retain and exclude variables, I recommend the following approach.

Deciding whether to eliminate an insignificant independent variable from a regression model requires a thorough understanding of the theoretical implications related to that variable. If there’s strong theoretical justification for its inclusion, it might be advisable to keep it within the model, despite its insignificance.

Maintaining an insignificant variable in the model does not typically degrade its overall performance. On the contrary, removing a theoretically justified but insignificant variable can lead to biased outcomes for the remaining independent variables, a situation known as omitted variable bias. Therefore, it can be beneficial to retain an insignificant variable within the model.

It’s vital to consider two major aspects when making this decision. Firstly, whether there’s strong theoretical support for retaining the insignificant variable, and secondly, whether excluding it has a significant impact on the coefficient estimates of the remaining variables. In short, if you remove an insignificant variable and the other coefficients change, you need to assess the situation.

If there are no theoretical reasons to retain an insignificant variable and removing it doesn’t appear to bias the result, then you probably should remove it because it might increase the precision of your model somewhat.

Consequently, I advise “considering” the removal of insignificant independent variables from the model, instead of asserting that you “should” remove them, as this decision depends on the aforementioned factors and is not a hard-and-fast rule. Of course, when you do the write-up, explain your reasoning for including insignificant variables along with everything else.


January 16, 2023 at 5:31 pm

Thank you very much! That helped a lot.

January 15, 2023 at 9:12 am

Thank you for the interesting post. I would like to ask a question because I think I am very much stuck in a discipline mismatch. I come from economics, but I am now working in the social sciences field.

You describe the conditions for confounding bias: 1) there is a correlation between x1 and x2 (the OVB), 2) x1 associates with y, 3) x2 associates with y. I interpret 1) as meaning that sometimes x1 may determine x2 or the contrary.

However, I recently read a social stats paper in which they define confounding bias differently. 2) and 3) still hold, but 1) says that x2 –> x1, not the contrary. So the direction of the relationship cannot go the other way around. Otherwise that would be mediation…

I am a bit confused and think that this could be due to the different disciplines, but I would be interested in knowing what you think.

Thank you. Best, Vero

January 16, 2023 at 12:56 am

Hi Veronica,

Some of your notation looks garbled in the comment, but I think I get the gist of your question. Unfortunately, the comments section doesn’t handle formatting well!

So, X1 and X2 are explanatory variables while Y is the outcome. The two X variables correlate with each other and the Y variable. In this scenario, yes, if you exclude X2, it will cause some degree of omitted variable bias. It is a confounding variable. The degree of bias depends on the collective strength of all three correlations.

Now, as for the question of the direction of the relationship between X1 and X2, that doesn’t matter statistically. As long as the correlation is there, the potential for confounding bias exists. This is true whether the relationship between X1 and X2 is causal in either direction or totally non-causal. It just depends on the set of correlations existing.

I think you’re correct in that this is a difference between disciplines.

The social sciences define a mediator variable as explaining the process by which two variables are related, which gets to your point about the direction of a causal relationship. When X1 –> X2, I’d say that the social sciences would call that a mediator variable AND that X2 is still a confounder that will cause bias if it is omitted from the model. Both things are true.

I hope that helps!


October 10, 2022 at 11:07 am

Thanks in advance for your awesome content.

Regarding the question brought up by Lucy, I want to ask the following: if introducing variables reduces the bias (because the model controls for it), why don’t we just insert all variables at once to see the real impact of each variable?

Let’s say I have a dataset of 150 observations and I want to study the impact of 20 variables (dummies and continuous). Is it advantageous to introduce everything at once and see which variables are significant? I got the idea that introducing variables is always positive because it forces the model to show the real effects (of course, I am talking about theoretically grounded variables), but are there any caveats to doing so? Is it possible that some variables may in fact “hide” the significance of others because they will overshadow the other regressors? Usually it is said that, if the significance changes when introducing a variable, it was due to confounding. My question now is: is it possible that confounding was not the case and, in fact, the significance is just being hidden due to the presence of a much stronger predictor?

October 10, 2022 at 8:10 pm

In some ways, you’re correct. Generally speaking, it is better to include too many variables than too few. However, there is a cost for including more variables than necessary, particularly when they’re not significant. Adding more variables than needed increases the model’s variance, which reduces statistical power and precision of the estimates. Ideally, you want a balance of all the necessary variables, no more, and no less. I write about this tradeoff in my post about selecting the best model. That should answer a lot of your questions.

I think the approach of starting with a model with all possible variables has merit. You can always start removing the ones that are not significant. Just do that by removing one at a time, starting with the least significant. Watch for any abrupt changes in coefficient signs and p-values as you remove each one.

As for caveats, there are rules of thumb as to how many independent variables you can include in a model based on how many observations you have. If you include too many, you can run into overfitting, which can produce whacky results. Read my post about overfitting models for information about that. So, in some cases, you just won’t be able to add all the potential variables at once, but that depends on the number of variables versus the number of observations. The overfitting post describes that.

And, to answer your last question, overfitting is another case where adding variables can change the significance that’s not due to confounding.


January 20, 2022 at 8:10 am

Thanks for the clear explanation, it was really helpful! I do have a question regarding this sentence: “The important takeaway here is that leaving out a confounding variable not only reduces the goodness-of-fit (larger residuals), but it can also bias the coefficient estimates.”

Is it always the case that leaving out a confounding variable leads to a lesser fit? I was thinking about the case of positive bias: say variables x and y are both negatively correlated with the dependent variable, but x and y are positively correlated with each other. If a high value for x is caused by a high value of y, both variables ‘convey the information’ of variable y. So adding variable x to a model wouldn’t add any additional information, and thus wouldn’t improve the fit of the model.

Am I making a mistake in my reasoning somewhere? Or does leaving out a confounding variable not lead to a worse fit in this case?

Thanks again for the article! Sterre

January 20, 2022 at 2:20 pm

Think about it this way. In general, adding an IV always causes R-squared to increase to some degree–even when it’s only a chance correlation. That still applies when you add a confounding variable. However, with a confounding variable, you know it’s an appropriate variable to add.

Yes, the correlation with the IV in the model might capture some of the confounder’s explanatory power, but you can also be sure that adding it will cause the model to fit better. And, again, it’s an entirely appropriate variable to include because of its relationship with the DV (i.e., you’re not adding it just to artificially inflate R-squared/goodness-of-fit). Additionally, unless there’s a perfect correlation between the included IV and the confounder, the included IV can’t contain all the confounder’s information. But, if there was a perfect correlation, you wouldn’t be able to add both anyway.

There are cases where you might not want to include the confounder. If you’re mainly interested in making predictions and don’t need to understand the role of each IV, you might not need to include the confounder if your model makes sufficiently precise predictions. That’s particularly true if the confounder is difficult/expensive to measure.

Alternatively, if there is a very high, but not perfect, correlation between the included IV and the confounder, adding the confounder might introduce too much multicollinearity, which causes its own problems. So, you might be willing to take the tradeoff of exchanging multicollinearity issues for omitted variable bias. However, that’s a very specific weighing of pros and cons given the relative degree of severity of both problems for your specific model. So, there’s no general advice for which way to go. It’s also important to note that there are other types of regression analysis (Ridge and LASSO) that can effectively handle multicollinearity, although at the cost of introducing a slight bias. Another possibility to balance!

But, to your main question, yes, if you add the confounder, you can expect the model fit to improve to some degree. It may or may not be an improvement that’s important in a practical sense. Even if the fit isn’t notably better, it’s often worthwhile adding the confounder to address the bias.


May 2, 2021 at 4:23 pm

Jim, this was a great article, but I do not understand the table. I am sure it is easy, and I am missing something basic. What does it mean to be included and omitted (negative correlation, etc.) in the 2×2 table? I cannot wrap my head around the titles and corresponding scenarios. Thanks, John

May 3, 2021 at 9:39 pm

When I refer to “included” and “omitted,” I’m talking about whether the variable in question is an independent variable IN the model (included) or a potential independent variable that is NOT in the model (omitted). After all, we’re talking about omitted variable bias, which is the bias caused by leaving an important variable out of the model.

The table allows you to determine the direction in which the coefficient estimate is being biased if you can determine the direction of the correlation between several variables.

In the example, I’m looking at a model where Activity (the included IV) predicts the bone density of the individual (the DV). The omitted confounder is weight. So, now we just need to assess the relationships between those variables to determine the direction of the bias. I explain the process of using the table with this example in the paragraph below the table, so I won’t retype it here. But, if you don’t understand something I write there, PLEASE let me know and I’ll help clarify it!

In the example, Activity = Included, Weight = Omitted, and Dependent = Bone Density. I use the signs from the triangle diagram, which appears a ways before the table and lists these three variables, to determine the column and row to use.

Again, I’m not sure which part is tripping you up!


April 27, 2021 at 2:23 am

Thank you, Jim! The two groups are both people with illness, only different because the illnesses occur at different ages. The first illness group is of younger age, around 30; the other of older age, around 45. Overlap of ages between these groups is very minimal. By control group, I meant a third group of healthy people without illness that has ages uniformly distributed across the range represented in the two patient groups, so the group factor would then have three levels. I was thinking this could reduce the previous problem of directly comparing the young and old patient groups, where adding age as a covariate can cause a collinearity problem.

April 28, 2021 at 10:42 pm

Ah, ok. I didn’t realize that both groups had an illness. Usually a control group won’t have a condition.

I really wouldn’t worry about the type of multicollinearity you’re referring to. You’d want to include those two groups and age plus the interaction term, which you could remove if it’s not significant. If the two groups were completely distinct in age and had a decent gap between them, there would be other model estimation problems to worry about, but that doesn’t seem to be the case. If age is a factor in this study area, you definitely don’t want to exclude it. Including it allows you to control for it. Otherwise, if you leave it out, the age effect will get rolled into the groups and, thereby, bias your results. Including age is particularly important in your case because you know the groups are unbalanced in age. You don’t want the model to attribute the difference in outcomes to the illness condition when it’s actually age that is unbalanced between those two conditions. I’d go so far as to say that your model urgently needs you to include age!

That said, I would collect a true control group that has healthy people and ideally a broad range of ages that covers both groups. That will give you several benefits. Right now, you won’t know how your illness groups compare to a healthy group. You’ll only know how they compare to each other. Having that third group will allow you to compare each illness group to the healthy group. I’m assuming that’s useful information. Plus, having a full range of ages will allow the model to produce a better estimate of the age effect.

April 26, 2021 at 6:51 am

Hi Jim, thanks a lot for your intuitive explanations!

I want to study the effect of two groups of patients (X1) on y (a test performance score) in a GLM framework. Age (X2) and Education (X3) are potential confounders on y.

However, it’s not possible to match these two groups for age, as the illnesses occur in different age groups: one group is younger than the other. Hence, the mean ages are significantly different between these groups.

I’m afraid adding age as a covariate could potentially cause a multicollinearity problem, since age differs significantly between groups, and make the estimation of the group effect (β1) erroneous, although it might improve the model. Is recruiting a control group with an age distribution comparable to the pooled patient groups, hence with a mean age midway between the two patient groups, a good idea to improve the statistical power of the study? In this case my group factor X1 will have three levels. Can this reduce the multicollinearity problem to an extent, as the ages of patients in the two patient groups are approximately represented in the control group also? Should I add an interaction term of Age*Group in the GLM to account for the age difference between groups? Thank you in advance. -Mohan

April 26, 2021 at 11:13 pm

I’d at least try including age to see what happens. If there’s any overlap in age between the two groups, I think you’ll be ok. Even if there is no overlap, age is obviously a crucial variable. My guess would be that excluding it from the model is doing more harm, given that it’s clearly important.

I’m a bit confused by what you’re suggesting for the control group. Isn’t one of your groups those individuals with the condition and the other without it?

It does sound possible that there would be an interaction effect in this case. I’d definitely try fitting it and see what the results are! That interaction term would show whether the relationship between age and test score is different between the groups.


April 26, 2021 at 12:44 am

In the paragraph below the table, both weight and activity are referred to as included variables.

April 26, 2021 at 12:50 am

Hi Joshua, yes, you’re correct! A big thanks! I’ve corrected the text. In that example, activity is the included variable, weight is the omitted variable, and bone density is the dependent variable.


April 24, 2021 at 1:06 pm

Hi, Jim. Great article. However, is that a typo in the direction-of-omitted-variable-bias table? For the rows, it makes more sense to me if they were “correlation between dependent and omitted variables” instead of “between dependent and included variables”.

April 25, 2021 at 11:21 pm

No, that’s not a typo!


April 22, 2021 at 9:53 am

Please let me know if this summary makes sense. Again, thanks for the great posts!

Scenario 1: There are 10 IVs. They are modeled using OLS. We get the regression coefficients.

Scenario 2: One of the IVs is removed. It is not a confounder. The only impact is on the residuals (they increase). The coefficients obtained in Scenario 1 remain intact. Is that correct?

Scenario 3: The IV that was removed in Scenario 2 is placed back into the mix. This time, another IV is removed. Now this one’s a confounder. OLS modeling is re-run. There are 3 results.

1) The residuals increase, because the confounder is correlated with the dependent variable. 2) The coefficient of the other IV, to which this removed confounder is correlated, changes. 3) The coefficients of the other IVs remain intact.

Are these 3 scenarios an accurate summary, Jim? A reply would be much appreciated!

Again, do keep up the good work.

April 25, 2021 at 11:26 pm

Yes, that all sounds right on! 🙂

April 22, 2021 at 8:37 am

Great post, Jim!

Probably a basic question, but I would appreciate your answer on this, since we have encountered it in practical scenarios. Thanks in advance.

What if we know of a variable that should be included on the IV side, but we don’t have data for it? We know (from domain expertise) that it is correlated with the dependent variable, but it is not correlated with any of the IVs… In other words, it is not a confounding variable in the strictest sense of the term (since it is not correlated with any of the IVs).

How do we account for such variables?

Would the solution here again be to use proxy variables? In other words, can we consider proxy variables to be a workaround for not just confounders, but also non-confounders of the above type?

Thanks again!

April 23, 2021 at 11:20 pm

I discuss several methods in this article. The one I’d recommend if at all possible is identifying a proxy variable that stands in for the important variable that you don’t have. It sounds like in your case it’s not a confounder. So, it’s probably not biasing your other coefficients. However, your model is missing important information. You might be able to improve the precision using a proxy variable.


March 19, 2021 at 10:45 am

Hi Jim, that article is helping me a lot during my research project, thank you so for that! However, there is one question for which I couldn’t find a satisfactory answer on the internet, so I hope that maybe you can shed some light on this: In my panel regression, I have my main independent variable on “Policy Uncertainty”, that catpures uncertainty related to the possible impact of future government policies. It is based on an index that has a mean of 100. My dependent variable is whether a firm has received funding in quarter t (Yes = 1, No = 0), thus I want to estimate the impact of policy uncertainty on the likelihood of receiving external funding. In my baseline regression, the coefficient on policy uncertainty is insignificant, suggesting that policy uncertainty has no impact. When I now add a proxy for uncertainty related finincial markets (e.g. implied stock market volatitily), then policy uncertainty becomes significant at the 1% level and the market uncertainty proxy is statistically significant at the 1% level too! The correlation between both is rather low, 0.2. Furthermore, both have opposite signs (poilcy uncertainty is positively associated with the likelihood of receiving funding), additionally, the magnitude of the coefficients is comparable.

Now I am wondering what this tells me. Did the variable on policy uncertainty previously capture the effect of market uncertainty before I included the latter in the regression? Would be great if you could help 🙂

March 19, 2021 at 2:56 pm

Thanks for writing with the interesting questions!

First, I’ll assume you’re using binary logistic regression because you have a binary dependent variable. For logistic regression, you don’t interpret the coefficients the same way as you do for, say, least squares regression. Typically, you’ll assess the odds ratios to understand each IV’s relationship to the binary DV.
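To make that concrete, here is a minimal, hypothetical sketch (assuming numpy and statsmodels; the index values are invented stand-ins for the policy and market measures):

```python
# Fit a binary logistic model and read odds ratios instead of raw coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
policy = rng.normal(100, 15, n)   # hypothetical policy-uncertainty index
market = rng.normal(20, 5, n)     # hypothetical market-volatility proxy

# Simulated truth: opposite signs, as in the question above.
logit_p = -0.5 + 0.02 * (policy - 100) - 0.05 * (market - 20)
funded = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(np.column_stack([policy, market]))
res = sm.Logit(funded, X).fit(disp=0)

# exp(coefficient) = odds ratio: the multiplicative change in the odds of
# funding per one-unit increase in an IV, holding the other constant.
print(np.exp(res.params[1:]))
```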

On to your example. It’s entirely possible that leaving out market uncertainty was causing omitted variable bias in the policy uncertainty coefficient. That might be what is happening. But, the positive sign of one and the negative sign of the other could be cancelling each other out when you only include the one. That is what happens in the example I use in this post. However, for that type of bias/confounding, you’d expect there to be a correlation between the two IVs, and you say it is low.

Another possibility is the fact that for each variable in a model, the significance refers to the adjusted SS for the variable, which factors in all the other variables before entering the variable in question. So, the policy uncertainty in the model with market volatility is significant after accounting for the variance that the other variables explain, including market volatility. For the model without market volatility, the policy uncertainty is not significant in that different pool of remaining variability. Given the low correlation (0.2) between those two IVs, I’d lean towards this explanation. If there were a stronger correlation between policy and market uncertainty, I’d lean towards omitted variable bias.

Also be sure that your model doesn’t have any other type of problem, such as overfitting or patterns in the residual plots. Those can cause weird things to happen with the coefficients.

It can be unnerving when the significance of one variable depends entirely on the presence of another variable. It makes choosing the correct model difficult! I’d let theory be your guide. I write about that towards the end of my post about selecting the correct regression model. That’s written in the context of least squares regression, but the same ideas about theory and other research apply here.

You should definitely investigate this mystery further!


February 11, 2021 at 12:31 am

Thank you for this blog. I have a question: If two independent variables are correlated, can we not convert one into the other and replace it in the model? For example, if Y = X1 + X2, and X2 = -0.5*X1, then Y = 0.5*X1. However, I don’t see that as a suggestion in the blog. The blog mentions that activity is related to weight, but then somehow both are finally included in the model, rather than replacing one with the other. Would this not help with multicollinearity, too? I am sure I am missing something here that you can see, but I am unable to figure out what. Can you please help?

Regards, Kushal Jain

February 11, 2021 at 4:45 pm

Why would you want to convert one to another? Typically, you want to understand the relationship between each independent variable and the dependent variable. In the model I talk about, I’d want to know the relationship between both activity and weight with bone density. Converting activity to weight does not help with that.

And, I’m not understanding what you mean by “then somehow both are finally included in the model.” You just include both variables in the model the normal way.

There’s no benefit to converting the variables as you describe and there are reasons not to do that!


November 25, 2020 at 2:22 pm

Hi Jim, I have been trying to figure out covariates for a study we are doing for some time. My colleague believes that if two covariates have a high correlation (>20%) then one should be removed from the model. I’m assuming this is true unless both are correlated to the dependent variable, per your discussion above? Also, what do you think about selecting covariates by using the 10% change method? Any thoughts would be helpful. We’ve had a heck of a time selecting covariates for this study. Thanks, Erin

November 27, 2020 at 2:06 am

It’s usually OK to have covariates with a correlation greater than 20%. The exact value depends on the number of covariates and the strength of their correlations, but 20% is low and almost never a problem. When covariates are correlated, it’s known as multicollinearity. And, there’s a special measure known as the VIF that determines whether you have an excessive amount of correlation amongst your covariates. I have a post that discusses multicollinearity and how to detect and correct it.
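As a minimal sketch of that check (assuming statsmodels; the data here are simulated, and a common rule of thumb flags VIFs above roughly 5-10):

```python
# Compute a VIF for each covariate to gauge multicollinearity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.2 * x1 + rng.normal(size=n)   # covariate mildly correlated with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF = 1 / (1 - R^2) from regressing each covariate on all the others.
for i in range(1, X.shape[1]):       # index 0 is the constant column
    print(f"x{i}: VIF = {variance_inflation_factor(X, i):.2f}")
```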

I have not used the 10% change method myself. However, I would suggest using that method only as one point of information. I’d really place more emphasis on theory and understanding the subject area. However, observing how much a coefficient changes can provide useful information about whether bias is a problem or not. In general, if you’re uncertain, I’d err on the side of unnecessarily including a covariate rather than leaving it out. There are usually fewer problems associated with having an additional variable than with omitting one. However, keep an eye on the VIFs as you do that. And, having a number of unnecessary variables could lead to problems if taken to an extreme or if you have a really small sample size.

I wrote a post about model selection, where I give some practical tips. Overall, I suggest using a mix of theory, subject-area knowledge, and statistical approaches. I’d suggest reading that. It’s not specifically about controlling for confounders, but the same principles apply. Also, I’d highly recommend reading about what researchers performing similar studies have done if that’s at all possible. They might have already addressed that issue!


November 5, 2020 at 6:29 am

Hi Jim, I’m not sure whether my problem fits under this category or not, so apologies if not. I am looking at whether an inflammatory biomarker (independent variable) correlates with a measure of cognitive function (dependent variable). It does if it’s just a simple linear regression; however, the biomarker (independent variable) is affected by age, sex, and whether you’re a smoker or not. Correcting for these 3 covariables in the model shows that actually there is no correlation between the biomarker and cognitive function. I assume this was the correct thing to do but wanted to make sure, seeing as a) none of the 3 covariables correlate with/predict my dependent variable, and b) as age correlates highly with the biomarker, does this not introduce collinearity? Thanks! Charlotte

November 6, 2020 at 9:46 pm

Hi Charlotte,

Yes, it sounds like you did the right thing. Including the other variables in the model allows the model to control for them.

The collinearity (aka multicollinearity or correlation between independent variables) between age and the biomarker is a potential concern. However, a little correlation, or a moderate amount of correlation is fine. What you really need to do is to assess the VIFs for your independent variables. I discuss VIFs and multicollinearity in my post about multicollinearity . So, your next step should be to determine whether you have problematic levels of multicollinearity.

One symptom of multicollinearity is a lack of statistical significance, which your model is experiencing. So, it would be good to check.

Actually, I’m noticing that at least several of your independent variables are binary: smoker, gender. Is the biomarker also binary? Present or not present? If so, that doesn’t change the rationale for including the other variables in the model, but it does mean VIFs won’t detect the multicollinearity.


October 28, 2020 at 9:33 pm

Thanks for the clarification, Jim. Best regards.

October 24, 2020 at 11:30 pm

I think the section on “Predicting the Direction of Omitted Variable Bias” has a typo on the first column, first two rows. It should state:

*Omitted* and Dependent: Negative Correlation

*Omitted* and Dependent: Positive Correlation

This makes it consistent with the two required conditions for omitted variable bias to occur:

The *omitted* variable must correlate with the dependent variable. The omitted variable must correlate with at least one independent variable that is in the regression model.

October 25, 2020 at 12:24 am

Hi Humberto,

Thanks for the close reading of my article! The table is correct as it is, but you are also correct. Let’s see why!

There are the following two requirements for omitted variable bias to exist:

  • The omitted variable must correlate with an IV in the model.
  • That IV must correlate with the DV.

The table accurately depicts both those conditions. The columns indicate the relationship between the IV (included) and omitted variable. The rows indicate the nature of the relationship between the IV and DV.

If both those conditions are true, you can then infer that there is a correlation between the omitted variable and the dependent variable and the nature of the correlation, as you indicate. I could include that in the table, but it is redundant information.

We’re thinking along the same lines and portraying the same overall picture. Alas, I’d need to use a three-dimensional matrix to portray those three conditions! Fortunately, using the two conditions that I show in the table, we can still determine the direction of bias. And you could use those two relationships to determine the relationship between the omitted variable and dependent variable if you so wanted. However, that information doesn’t change our understanding of the direction of bias because it’s redundant with information already in the table.

Thanks for the great comment and it’s always beneficial thinking through these things using a different perspective!


August 14, 2020 at 3:00 am

Thank you for the intuitive explanation, Jim! I would like to ask a query. Suppose I have two groups: one with a recently diagnosed lung disease and another with chronic lung disease, where I would like to do an independent t-test for the amount of lung damage. It happens that the two groups also differ significantly in their mean age. The group with recently diagnosed disease has a lower mean age than the group with chronic disease. Also, theory says age can cause some lung damage as a normal course too. So if I include age as a covariate in the model, won’t it regress out the effect on the DV and give an underestimated effect, as the covariate (age) significantly correlates with the DV (lung damage)? How do we address this confounding effect of correlation between only the IV and DV? Should it be by having a control group without lung disease? If so, can one control group help? Or should there be two control groups, age-matched to the two study groups? Thank you in advance.

August 15, 2020 at 3:46 pm

Hi Vineeth,

First, yes, if you know age is a factor, you should include it as a covariate in the model. It won’t “regress out” the true effect between the two groups. I would think of it a little differently.

You have two groups and you suspect that something caused those two groups to have differing amounts of lung damage. You also know that age plays a role. And those groups have different ages. So, if you look only at the groups without factoring in age, the effect of age is still present but the model is incorrectly attributing it to the groups. In your case, it will make the effect look larger.

When you include age, yes, it will reduce the effect size between the groups, but it’ll reveal the correct effect by accounting for age. So, yes, in your case, it’ll make the group difference look smaller, but don’t think of it as “regressing out” the effect; instead, it is removing the bias in the other results. In other words, you’re improving the quality of your results.

When you look at your model results for, say, the grouping variable, the model is already controlling for the age variable. So, you’re left with what you need: the effect between the IV and DV after accounting for the other variables in the model, such as age.

A control group for any experiment is always a good idea if you can manage one. However, it’s not always possible. I write about these experimental design issues (randomized experiments, observational studies, how to design a good experiment, etc.), among other topics, in my Introduction to Statistics ebook, which you might consider. It’s also just now available in print on Amazon!


August 12, 2020 at 7:04 am

I was wondering whether it’s correct to check the correlation between the independent variables and the error term in order to check for endogeneity. If we assume that there is endogeneity, then the estimated errors aren’t correct, and so the correlation between the independent variables and those errors doesn’t say much. Am I missing something here?

best regards,


July 15, 2020 at 1:57 pm

I wanted to look at the effects of confounders on my study, but I’m not sure what analysis(es) to use for dichotomous covariates. I have one categorical IV with two levels, two continuous DVs, and then the two dichotomous confounding variables. It was hard to find information for categorical covariates online. Thanks in advance Jim!


May 8, 2020 at 10:04 am

Thank you for your nice blog. I still have a question. Let’s say I want to determine the effect of one independent variable on a dependent variable with a linear regression analysis. I have selected a number of potential variables for this relationship based on the literature, such as age, gender, health status, and education level. How can I check (with statistical analyses) whether these are indeed confounders? I would like to know which of them I should control for in my linear regression analysis. Can I create a correlation matrix beforehand to see if the potential confounder is correlated with both my independent and dependent variables? And what threshold for the correlation coefficient should be used here? Is it every correlation coefficient except zero (for instance, 0.004)? Are there scientific articles/books that endorse this threshold? Or is it maybe better to use a “change-in-estimate” criterion to see if my regression coefficient changes by a particular amount after adding my potential confounder to the linear regression model? What would be the threshold here?

I hope my question is clear. Thanks in advance!


April 29, 2020 at 2:47 am

Thanks for a wonderful website! I love your example with the bone density, which does not appear to be correlated with physical activity if looked at alone, and needs to have weight added as an explanatory variable to make both of them appear significantly correlated with bone density. I would love to use this example in my class, as I think it is very important to understand that there are situations where a single-parameter model can lead you badly astray (here into thinking activity is not correlated with bone density). Of course, I could make up some numbers for my students, but it would be even nicer if I could give them your real data. Could you by any chance make a file of real measurements of bone density, physical activity, and weight available? I would be very grateful, and I suppose a lot of other teachers/students would be too!

best regards Martin

April 30, 2020 at 5:06 pm

When I wrote this post, I wanted to share the data. Unfortunately, it seems like I no longer have it. If I uncover it, I’ll add it to the post.


February 8, 2020 at 1:45 pm

The work you have done is amazing, and I’ve learned so much through this website. I am at a beginner level in SPSS and I would be grateful if you could answer my question. I have found that a medical treatment results in worse quality of life. But I know from crosstabs that people taking this treatment present more severe disease (a continuous variable) that also correlates with quality of life. How can I test whether it is the treatment or the severity that worsens quality of life?

February 8, 2020 at 3:16 pm

Hi Evangelia,

Thanks so much for your kind words, I really appreciate them! And, I’m glad my website has been helpful!

That’s a great question and a valid concern to have. Fortunately, in a regression model, the solution is very simple. Just include both the treatment and severity of the disease in the model as independent variables. Doing that allows the model to hold disease severity constant (i.e., controls for it) while it estimates the effect of the treatment.

Conversely, if you did not include severity of the disease in the model, and it correlates with both the treatment and quality of life, it is uncontrolled and will be a confounding variable. In other words, if you don’t include severity of disease, the estimate for the relationship between treatment and quality of life will be biased.

We can use the table in this post for estimating the direction of bias. Based on what you wrote, I’ll assume that the treatment condition and severity have a positive correlation. Those taking the treatment present a more severe disease. And, that the treatment condition has a negative correlation with quality of life. Those on the treatment have a lower quality of life for the reasons you indicated. That puts us in the top-right quadrant of the table, which indicates that if you do not include severity of disease as an IV, the treatment effect will be underestimated.

Again, simply including disease severity in your model will reduce the bias!
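A toy sketch of that logic (assuming numpy and statsmodels; the effect sizes and variable names are hypothetical, not taken from the study above):

```python
# Confounded treatment effect: biased without severity, corrected with it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 2_000
severity = rng.normal(size=n)
# Sicker patients are more likely to be on the treatment (the confounding):
treatment = rng.binomial(1, 1 / (1 + np.exp(-severity)))
# Simulated truth: treatment helps (+0.5); severity hurts (-1.0).
qol = 0.5 * treatment - 1.0 * severity + rng.normal(size=n)

naive = sm.OLS(qol, sm.add_constant(treatment)).fit()
both = sm.OLS(qol, sm.add_constant(np.column_stack([treatment, severity]))).fit()

print(naive.params[1])   # biased low: the treatment even looks harmful
print(both.params[1])    # ~0.5 once severity is held constant
```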


December 7, 2019 at 7:32 pm

Just a question about what you said about power. Will adding more independent variables to a regression model cause a loss of power (at a fixed sample size)? Or does it depend on the type of independent variable added: confounder vs. non-confounder?


November 1, 2019 at 8:54 pm

You mention: “Suppose you have a regression model with two significant independent variables, X1 and X2. These independent variables correlate with each other and the dependent variable.” How is it possible for two random variables (in this case the two factors) to correlate with each other if they are independent? If two random variables are independent, then the covariance is zero and therefore the correlation is zero.

Corr(X1, X2) = Cov(X1, X2) / (sqrt(Var(X1)) * sqrt(Var(X2)))
Cov(X1, X2) = E[X1*X2] - E[X1]*E[X2]
If X1 and X2 are independent, then E[X1*X2] = E[X1]*E[X2], and therefore the covariance is zero.

November 4, 2019 at 9:07 am

Ah, there’s a bit of confusion here. The explanatory variables in a regression model are often referred to as independent variables, as well as predictors, x-variables, inputs, etc. I was using “independent variable” as the name. You’re correct, if they were independent in the sense that you describe them, there would be no correlation. Ideally, there would be no correlation between them in a regression model. However, they can, in fact, be correlated. If that correlation is too strong, it will cause problems with the model.

“Independent variable” in the regression context refers to the predictors and describes their ideal state. In practice, they’ll often have some degree of correlation.

I hope this helps!


April 8, 2019 at 12:33 pm

Ah! Enlightenment!

I had taken your statement about the correlation of the independent variable with the residuals to be a statement about the computed value of the correlation between them, that is, that cor(X1, resid) was nonzero. I believe that (in a model with a constant term) this is impossible.

But I think I get now that you were using the term more loosely, referring to a (nonlinear) pattern appearing between the values of X1 and the corresponding residuals, in the same way as you would see a parabolic pattern in a scatterplot of residuals versus X if you tried to make a linear fit of quadratic data. The linear correlation between X and the residuals would still compute out, numerically, to zero, so X1 and the residuals would technically be uncorrelated, but they would not be statistically independent. If the residuals show a nonlinear pattern when plotted against X, look for a lurker.

The Albany example was very helpful. Thanks so much for digging it up!
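A quick numerical check of that distinction (a sketch assuming numpy): with a constant term in the model, the fitted residuals come out numerically uncorrelated with every included regressor, yet they do correlate with the omitted one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Fit y on x1 only (omitting x2): least squares with a constant term.
X = np.column_stack([np.ones(n), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(np.corrcoef(x1, resid)[0, 1])   # ~0: forced by the normal equations
print(np.corrcoef(x2, resid)[0, 1])   # clearly nonzero: residuals track the omitted IV
```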

April 8, 2019 at 8:38 am

Hi, Jim! Thanks very much for your speedy reply!

I appreciate the clarity that you aim for in your writing, and I’m sorry if I wasn’t clear in my post. Let me try again, being a bit more precise, hopefully without getting too technical.

My problem is that I think that the very process used in finding the OLS coefficients (minimizing the sum of squared residuals) results in a regression equation that satisfies two properties. First, the sum (or mean) of the resulting residuals is zero. Second, for any regressor Xi, Xi is orthogonal to the vector of residuals, which in turn forces the covariance of the residuals with any regressor to be zero. Certainly, the true error terms need not sum to zero, nor need they be uncorrelated with a regressor…but if I understand correctly, these properties of the _residuals_ are an automatic consequence of fitting OLS to a data set, regardless of whether the actual error terms are correlated with the regressor or not.

I’ve found a number of sources that seem to say this–one online example is on page two here: https://www.stat.berkeley.edu/~aditya/resources/LectureSIX.pdf . I’ll be happy to provide others on request.

I’ve also generated a number of my own data sets with correlated regressors X1 and X2 and Y values generated by a X1 + b X2 + (error), where a and b are constants and (error) is a normally distributed error term of fixed variance, independently chosen for each point in the data set. In each case, leaving X2 out of the model still left me with zero correlation between X1 and the residuals, although there was a correlation between X1 and the true error terms, of course.

If I have it wrong, I’d love to see a data set that demonstrates what you’re talking about. If you don’t have time to find one (which I certainly understand), I’d be quite happy with any reference you might point me to that talks about this kind of correlation between residuals and one of the regressors in OLS, in any context.

Thanks again for your help, and for making regression more comprehensible to so many people.

Scott Stevens

April 8, 2019 at 10:59 am

Unfortunately, the analysis doesn’t fix all possible problems with the residuals. It is possible to specify models where the residuals exhibit various problems. You mention that residuals will sum to zero. However, if you specify a model without a constant, the residuals won’t necessarily sum to zero (read about that here). If you have a time series model, it’s possible to have autocorrelation in the residuals if you leave out important variables. If you specify a model that doesn’t adequately model curvature in the data, you’ll see patterns in the residuals.

In a similar vein, if you leave out an important variable that is correlated both with the DV and another IV in the model, you can have residuals that correlate with an IV. The standard practice is to graph the residuals by the independent variable to look for that relationship because it might have a curved shape which indicates a relationship but not necessarily a linear one that correlation would detect.

As for references, any regression textbook should cover this assumption. Again, it’ll refer to error, but the key is to remember that residuals are the proxy for error.

Here’s a reference from the University of Albany about Omitted Variable Bias that goes into it in more detail from the standpoint of residuals and includes an example of graphing the residuals by the omitted variable.

April 7, 2019 at 11:17 am

Hi, Jim. I very much enjoy how you make regression more accessible, and I like to use your approaches with my own students. I’m confused, though by the matter brought up by SFDude.

I certainly see how the _error_ term in a regression model will be correlated with an independent variable when a confounding variable is omitted, but it seems to me that the normal equations that define the regression coefficients assure that an independent variable in the model will always be uncorrelated with the _residuals_ of that model, regardless of whether an omitted confounding variable exists or not. Certainly, “X1 correlates with X2, and X2 correlates with the residuals. Ergo, variable X1 correlates with the residuals” would not hold for any three variables X1 and X2 and R. For example, if A and B are independent, then “A correlates with A + B, A + B correlates with B. Ergo, A correlates with B” is a false statement.

If I’m missing something here, I’d very much appreciate a data set that demonstrates the kind of correlation between an independent variable and the residuals of the model that it seems you’re talking about.

Thanks! Scott Stevens

April 7, 2019 at 6:28 pm

Thanks for writing. And, I’m glad to hear that you find my website helpful!

The key thing to remember is that while the OLS assumptions refer to the error, we can’t directly observe the true error. So, we use the residuals as estimates of the error. If the error is correlated with an omitted variable, we’d expect the residuals to be correlated as well in approximately the same manner. Omitted variable bias is a real condition, and that description is simply getting deep into the nuts and bolts of how it works. But, it’s the accepted explanation. You can read it in textbooks. While the assumptions refer to error, we can only assess the residuals instead. They’re the best we’ve got!

When you say A and B are “independent”, if you mean they are not correlated, I’d agree that removing a truly uncorrelated variable from the model does not cause this type of bias. I mention that in this post. This bias only occurs when independent variables are correlated with each other to some degree, and with the dependent variable, and you exclude one of the IVs.

I guess I’m not exactly sure which part is causing the difficulty? The regression equations can’t ensure that the residuals are uncorrelated if the model is specified in such a way that it causes them to be correlated. It’s just like in time series regression models, where you have to be on the lookout for autocorrelation (correlated residuals) because the model doesn’t account for time-order effects. Incorrectly specified models can and do cause problems with the residuals, including residuals that are correlated with other variables and with themselves.

I’ll have to see if I can find a dataset with this condition.


March 10, 2019 at 10:41 am

Hi Jim, I am involved in a study looking into a number of clinical parameters, like platelet count and haemoglobin, for patients who underwent an emergency change of a mechanical circulatory support device due to thrombosis or clotting of the device. The purpose is to see whether there is a trend in these parameters in the time frame of 3 days before and 3 days after the change, and to establish whether these parameters could be used as predictors of the event. My concern is that there is no control group for this study, but I don’t see the need to look for a trend in a group which never had the event itself. Will not having a control group be considered a weakness of this study? Also, what would be the best statistical test for this? I was thinking of the generalized linear model. I would really appreciate your guidance here. Thank you


February 20, 2019 at 8:49 am

I’m looking at a published paper that develops clinical prediction rules using logistic regression, in order to help primary care doctors decide whom to refer to breast clinics for further investigation. The dependent variable is simply whether breast cancer is found to be present or not. The independent variables include 11 symptoms and age in (mostly) ten-year increments (six separate age bands). The age bands were decided before the logistic regression was carried out. The paper goes on to use the data to create a scoring system based on symptoms and age. If this scoring system were used, then above a certain score a woman would be referred, and below that score she would not be.

The total sample size is 6590 women referred to a breast clinic, of whom 320 were found to have breast cancer. The sample itself is very skewed. In younger women, breast cancer is rare, and so in some categories the numbers are very low. For instance, in the 18-29 age band there are 62 women referred, of whom 8 have breast cancer, and in the 30-39 age band there are 755 women referred, of whom only one has breast cancer. So my first question is: if there are fewer individuals in particular categories than symptoms, can the paper still use logistic regression to predict whom to refer to a breast clinic based on a scoring system that includes both age and symptoms? My second question is: if there are meant to be at least 10 individuals per variable in logistic regression, are the numbers of women with breast cancer in these age groups too small for logistic regression to apply?

When I look at the total number of women in the sample (6590) and then the total number of symptoms (8616), there is a discrepancy. This means that some women have had more than one symptom recorded (or, from the symptoms’ point of view, some women have been recorded more than once). So my third question is: does this mean that some of the independent variables are not actually independent of each other? (There is around a 30%-32% discrepancy in all categories. How significant is this?)

There are lots of other problems with the paper (the fact that the authors only look at referred women, rather than all the symptomatic women that a primary care doctor sees, is a case in point), but I’d like to know whether the statistics are flawed too. If there are any other questions I need to ask about the data, please do let me know.

With very best wishes,

Ms Susan Mitchell

February 20, 2019 at 11:23 pm

Offhand, I don’t see anything that screams to me that there is a definite problem. I’d have to read the study to be really sure. Here’s some thoughts.

I’m not in the medical field, but I’ve heard talks by people in that field, and it sounds like this is a fairly common use for binary logistic regression. The analyst creates a model where you indicate which characteristics, risk factors, etc. apply to an individual. Then, the model predicts the probability of an outcome for them. I’ve seen similar models for surgical success, death, etc. The idea is that it’s fairly easy to use because someone can just enter the characteristics of the patient and the model spits out a probability. For any model of this type, you’d really have to check the residuals and see all the output to determine how well the model fits the data. But, there’s nothing inherently wrong with this approach.

I don’t see a problem with the sample size (6590) and the number of IVs (12). That’s actually a very good ratio of observations per IV.

It’s OK that there are fewer individuals in some categories. It’s better if you have a fairly equal number, but it’s not a show stopper. Categories with fewer observations will have less precise estimates, which can potentially reduce the precision of the model. You’d have to see how well the model fits the data to really know how well it works out. But, yes, if you have an extremely low number of individuals with a particular symptom, you won’t get as precise an estimate for that symptom’s effect. You might see a wider CI for its odds ratio. But, it’s hard to say without seeing all of that output and how the numbers break down by symptom. And, it’s possible that they selected characteristics that apply to a sufficient number of women. Again, I wouldn’t be able to say. It’s an issue to consider for sure.

As for the number of symptoms versus the number of women, it’s OK that a woman can have more than one symptom. Each symptom is in its own column and will be coded with a 1 or 0. A row corresponds to one woman, and she’ll have a 1 for each characteristic that she has and 0s for the ones that she does not have. It’s possible these symptoms are correlated. These are categorical variables, so you couldn’t use Pearson’s correlation; you’d need to use something like the chi-square test of independence. And, some correlation is okay. Only very high correlation would be problematic. Again, I can’t say whether that’s a problem in this study or not because it depends on the degree of correlation. It might be, but it’s not necessarily a problem. You’d hope that the study strategically included a good set of IVs that aren’t overly correlated.

Regarding the referred women vs. symptomatic women, that comes down to the population being sampled and how generalizable the results are. Not being familiar with the field, I don’t have a good sense of how that affects generalizability, but yes, that would be a concern to consider.

So, I don’t see anything that shouts to me that it’s a definite problem. But, as with any regression model, it would come down to the usual assessments of how well the model fits the data. You mention issues that could be concerns, but again, it depends on the specifics.

Sorry I couldn’t provide more detailed thoughts, but evaluating these things requires really specific information. Still, the general approach for this study seems sound to me.


February 17, 2019 at 3:48 pm

I have a question: how well can we evaluate whether a regression equation fits the data by examining the R-squared statistic, and test the statistical significance of the whole regression equation using the F-test?

February 18, 2019 at 4:56 pm

I have two blog posts that will be perfect for you!

Interpreting R-squared
Interpreting the F-test of Overall Significance

If you have questions about either one, please post it in the comments section of the corresponding post. But, I think those posts will go a long way in answering your questions!


January 18, 2019 at 7:00 pm

Mr. Frost, I know I need to run a regression model; however, I’m still unsure which one. I’m examining the effects of alcohol use on teenagers with 4 confounders.

January 19, 2019 at 6:47 pm

Hi Dahlia, to make the decision, I’d need to know what types of variables they all are (continuous, categorical, binary, etc.). However, if the outcome (the effect of alcohol) is a continuous variable, then OLS linear regression is a great place to start!

Best of luck with your analysis!


January 5, 2019 at 2:39 am

Thank you very much Jim,

Very helpful, thank you. I think my problem is really the number of observations (25). Yes, I have read that post also, and I always keep the theory in mind when analyzing the IVs.

My main objective is to show the existing relationship between X2 and Y, which is also supported by the literature; however, if I do not control for X1, I will never be sure whether the effect I have found is due to X2 or X1, because X1 and X2 are correlated.

I think correlation alone would be OK, since my number of observations is limited, and using regression limits the number of IVs that can be included in the model, which may force me to leave some other IVs out of the model, which is also bad.

Thank you again

Best regards!

January 4, 2019 at 9:40 am

Thank you for this very good post.

However, I have a question. What should I do if the IVs X1 and X2 are correlated (say 0.75) and both are correlated with Y (the DV) at 0.60? When I include X1 and X2 in the same model, X2 is not statistically significant, but when they are entered separately, each becomes statistically significant. On the other hand, the model with only X1 has higher explanatory power than the model with only X2.

Note: In individual models both meet the OLS assumptions, but together X2 becomes statistically non-significant (using stepwise regression, X2 is removed from the model). What does this mean? In addition, I know from the literature that X2 affects Y, but I am testing X1, and X1 is showing a better fit than X2.

Thank you in advance, I hope you understand my question!

January 4, 2019 at 3:15 pm

Yes, I understand completely! This situation isn’t too unusual. The underlying problem is that because the two IVs are correlated, they’re supplying a similar type of predictive information. There isn’t enough unique predictive information for both of them to be statistically significant. If you had a larger sample size, it’s possible that both would be significant. Also, keep in mind that correlation is a pairwise measure and doesn’t account for other variables. When you include both IVs in the model, the relationship between each IV and the DV is determined after accounting for the other variables in the model. That’s why you can see a pairwise correlation but not a relationship in a regression model.

I know you’ve read a number of my posts, but I’m not sure if you’ve read the one about model specification. In that post, a key point I make is not to use statistical measures alone to determine which IVs to leave in the model. If theory suggests that X2 should be included, you have a very strong case for including it even if it’s not significant when X1 is in the model–just be sure to include that discussion in your write-up.

Conversely, just because X2 seems to provide a better fit statistically and is significant with or without X1 doesn’t mean you must include it in the model. Those are strong signs that you should consider including a variable. However, as always, use theory as a guide and document the rationale for the decisions you make.

For your case, you might consider including both IVs in the model. If they’re both supplying similar information and X2 is justified by theory, chances are that X1 is as well. Again, document your rationale. If you include both, check the VIFs to be sure that you don’t have problematic levels of multicollinearity. If those are the only two IVs in your model, that won’t be problematic given the correlations you describe. But, it could be problematic if you have more IVs in the model that are also correlated with X1 and X2.

Another thing to look at is whether the coefficients for X1 and X2 vary greatly depending on whether you have one or both of the IVs in the model. If they don’t change much, that’s nice and simple. However, if they do change quite a bit, then you need to determine which coefficient values are likely to be closer to the correct values, because that corresponds to the choice about which IVs to include! I sound like a broken record, but if this is a factor, document your rationale and decisions.

I hope that helps! Best of luck with your analysis!


November 28, 2018 at 11:30 pm

Another great post! Thank you for truly making statistics intuitive. I learned a lot of this material back in school, but am only now understanding them more conceptually thanks to you. Super useful for my work in analytics. Please keep it up!

November 29, 2018 at 8:54 am

Thanks, Patrick! It’s great to hear that it was helpful!


November 12, 2018 at 12:54 pm

I think there may be a typo here – “These are important variables that the statistical model does include and, therefore, cannot control.” Shouldn’t it be “does not include”, if I understand correctly?

November 12, 2018 at 1:19 pm

Thanks, Jayant! Good eagle eyes! That is indeed a typo. I will fix it. Thanks for pointing it out!


November 3, 2018 at 12:07 pm

Mr. Jim, thank you for making me understand econometrics. I thought that an omitted variable is excluded from the model and that is why it under/overestimates the coefficients. Somewhere in this article you mentioned that they are still included in the model but not controlled for. I find that very confusing; would you be able to clarify? Thanks a lot.

November 3, 2018 at 2:26 pm

You’re definitely correct. Omitted variable bias occurs when you exclude a variable from the model. If I gave the impression that it’s included, please let me know where in the text because I want to clarify that! Thanks!

By excluding the variable, the model does not control for it, which biases the results. When you include a previously excluded variable, the model can now control for it and the bias goes away. Maybe I wrote that in a confusing way?

Thanks! I always strive to make my posts as clear as possible, so I’ll think about how to explain this better.

September 28, 2018 at 4:31 pm

In addition to mean square error and adjusted R-squared, I use Cp, IC, HQC, and SBIC to decide the number of independent variables in multiple regression.

September 28, 2018 at 4:39 pm

I think there are a variety of good measures. I’d also add predicted R-squared, as long as you use them in conjunction with subject-area expertise. As I mention in this post, the entire set of estimated relationships must make theoretical sense. If they don’t, the statistical measures are not important.
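A brief sketch of that kind of comparison (assuming statsmodels; the data and candidate models are invented, and lower AIC/BIC or higher adjusted R-squared is only a guide, with theory getting the final say):

```python
# Compare candidate models with adjusted R-squared, AIC, and BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 300
x1, x2, noise_iv = rng.normal(size=(3, n))
y = 2 * x1 + 1 * x2 + rng.normal(size=n)      # noise_iv is truly irrelevant

for name, cols in [("x1 only", [x1]),
                   ("x1 + x2", [x1, x2]),
                   ("x1 + x2 + noise", [x1, x2, noise_iv])]:
    res = sm.OLS(y, sm.add_constant(np.column_stack(cols))).fit()
    print(f"{name}: adj R2 = {res.rsquared_adj:.3f}, "
          f"AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")
```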

September 28, 2018 at 4:13 pm

I have to read the article you named. Having said that, caution should be exercised when regression models describe systems or processes that are not in statistical control. Also, some processes have physical bounds that a regression model does not capture, so calculated predicted values may have no physical meaning. Further, models built from narrow ranges of the independent variables may not be applicable outside those ranges.

September 28, 2018 at 4:19 pm

Hi Stan, those are all great points, and true. They all illustrate how you need to use your subject-area knowledge in conjunction with statistical analyses.

I talk about the issue of not going outside the range of the data, amongst other issues, in my post about Using Regression to Make Predictions.

I also agree about statistical control, which I think is underappreciated outside of the quality improvement arena. I’ve written about this in a post about using control charts with hypothesis tests.

September 28, 2018 at 2:30 pm

Valid confidence/prediction intervals are important if the regression model represents a process that is being characterized. When the prediction intervals are too wide, the model’s validity and utility are in question.

September 28, 2018 at 2:49 pm

You’re definitely correct! If the model doesn’t fit the data, your predictions are worthless. One minor caveat that I’d add to your comment.

The prediction intervals can be too wide to be useful, yet the model might still be valid. They’re really two separate assessments: model validity and degree of precision. I write about this in several posts, including Understanding Precision in Prediction.

September 26, 2018 at 9:13 am

Jim, does centering any independent explanatory variable require centering them all? Should the dependent variable be centered along with the explanatory variables? I always make a normal probability plot of the deleted residuals as one test of the prediction capability of the fitted model. It is remarkable how good models give good normal probability plots. I also use the Shapiro-Wilk test to assess the deleted residuals for normality. Stan Alekman

September 26, 2018 at 9:46 am

Yes, you should center all of the continuous independent variables if your goal is to reduce multicollinearity and/or to be able to interpret the intercept. I’ve never seen a reason to center the dependent variable.
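A minimal sketch of why centering helps (assuming numpy and statsmodels; the structural multicollinearity here comes from an interaction term, and the numbers are invented):

```python
# Centering continuous IVs before forming an interaction term lowers the VIFs.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 1_000
a = rng.normal(50, 10, n)
b = rng.normal(30, 5, n)

def vifs(*cols):
    X = sm.add_constant(np.column_stack(cols))
    return [round(variance_inflation_factor(X, i), 1)
            for i in range(1, X.shape[1])]

print(vifs(a, b, a * b))              # raw: the product inflates the VIFs
ac, bc = a - a.mean(), b - b.mean()
print(vifs(ac, bc, ac * bc))          # centered: VIFs near 1
```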

It’s funny that you mention that about normally distributed residuals! I, too, have been impressed with how frequently that occurs even with fairly simple models. I’ve recently written a post about OLS assumptions and I mention how normal residuals are sort of optional. They only need to be normally distributed if you want to perform hypothesis tests and have valid confidence/prediction intervals. Most analysts want at least the hypothesis tests!


September 25, 2018 at 2:32 am

Hey Jim, your blogs are really helpful for me in learning data science. Here is the question from my assignment:

You have built a classification model with 90% accuracy, but your client is not happy because the false positive rate was very high. What would you do? Can we do something about it via precision or recall?

That is the whole question; nothing is given in the background, though it should have been!


September 25, 2018 at 1:20 am

Thank you, Jim. Really interesting!

September 25, 2018 at 1:26 am

Hi Brahim, you’re very welcome! I’m glad it was interesting!


September 24, 2018 at 10:30 pm

Hey Jim, you are awesome.

September 24, 2018 at 11:04 pm

Aw, MG, thanks so much!! 🙂


September 24, 2018 at 10:59 am

Thanks for another great article, Jim!

Q: Could you expand with a specific plot example to explain more clearly, this statement: “We know that for omitted variable bias to exist, an independent variable must correlate with the residuals. Consequently, we can plot the residuals by the variables in our model. If we see a relationship in the plot, rather than random scatter, it both tells us that there is a problem and points us towards the solution. We know which independent variable correlates with the confounding variable.”

Thanks! SFdude

September 24, 2018 at 11:48 am

Hi, thanks!

I’ll try to find a good example plot to include soon. Basically, you’re looking for any non-random pattern. For example, the residuals might tend to either increase or decrease as the value of the independent variable increases. That relationship can follow a straight line or display curvature, depending on the nature of the relationship.
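In the meantime, here is a toy sketch of such a plot (assuming numpy and matplotlib); the residuals are graphed against the omitted confounder, where the non-random pattern is easiest to see:

```python
# Residual plot revealing an omitted confounder.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)     # the omitted confounder
y = x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])  # the model leaves x2 out
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

plt.scatter(x2, resid, s=5)            # a clear upward drift, not random scatter
plt.xlabel("omitted variable x2")
plt.ylabel("residual")
plt.show()
```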


September 24, 2018 at 1:37 am

It’s been a long time since I heard from you, Jim. Missed your stats!

September 24, 2018 at 9:53 am

Hi Saketh, thanks, you’re too kind! I try to post here every two weeks at least. Occasionally, weekly!


Confounding Variable: Simple Definition and Example


What is a Confounding Variable?

A confounding variable is an “extra” variable that you didn’t account for. Confounding variables can ruin an experiment and give you useless results. They can suggest there is correlation when in fact there isn’t. They can even introduce bias. That’s why it’s important to know what a confounding variable is, and how to avoid getting them into your experiment in the first place.


In an experiment, the independent variable typically has an effect on your dependent variable. For example, if you are researching whether lack of exercise leads to weight gain, then lack of exercise is your independent variable and weight gain is your dependent variable. Confounding variables are any other variables that also have an effect on your dependent variable. They are like extra independent variables that have a hidden effect on your dependent variable. Confounding variables can cause two major problems:

  • Increase variance
  • Introduce bias.

Let’s say you test 200 volunteers (100 men and 100 women). You find that lack of exercise leads to weight gain. One problem with your experiment is that it lacks any control variables, for example, the use of placebos or random assignment to groups. So you really can’t say for sure whether lack of exercise leads to weight gain. One confounding variable is how much people eat. It’s also possible that men eat more than women; this could make sex a confounding variable. Nothing was mentioned about starting weight, occupation, or age either. A poor study design like this could lead to bias. For example, if all of the women in the study were middle-aged, and all of the men were aged 16, age would have a direct effect on weight gain. That makes age a confounding variable.

Confounding Bias

Technically, confounding isn’t a true bias, because bias is usually a result of errors in data collection or measurement. However, one definition of bias is “…the tendency of a statistic to overestimate or underestimate a parameter”, so in this sense, confounding is a type of bias.

Confounding bias is the result of having confounding variables in your model. It has a direction, depending on whether it over- or underestimates the effects in your model:

  • Positive confounding is when the observed association is biased away from the null. In other words, it overestimates the effect.
  • Negative confounding is when the observed association is biased toward the null. In other words, it underestimates the effect.

How to Reduce Confounding Variables

Make sure you identify all of the possible confounding variables in your study. Make a list of everything you can think of and, one by one, consider whether those listed items might influence the outcome of your study. Usually, someone has done a similar study before you, so check the academic databases for ideas about what to include on your list. Once you have figured out the variables, use one of the following techniques to reduce the effect of those confounding variables:

  • Bias can be eliminated with random samples.
  • Introduce control variables to control for confounding variables. For example, you could control for age by only measuring 30 year olds.
  • Within-subjects designs test the same subjects each time. Anything could happen to the test subject in the “between” period, so this doesn’t make for perfect immunity from confounding variables.
  • Counterbalancing can be used if you have paired designs. In counterbalancing, half of the group is measured under condition 1 and half is measured under condition 2.




1.4.1 - Confounding Variables

Randomized experiments are typically preferred over observational studies or experimental studies that lack randomization because they allow for more control. A common problem in studies without randomization is that there may be other variables influencing the results. These are known as confounding variables. A confounding variable is related to both the explanatory variable and the response variable.

Confounding variable: a characteristic that varies between cases and is related to both the explanatory and response variables; also known as a lurking variable or a third variable.

Example: Ice Cream & Home Invasions

There is a positive relationship between ice cream sales and home invasions (i.e., as ice cream sales increase throughout the year so do home invasions). It is clear that increases in ice cream sales do not cause home invasions to increase, and home invasions do not cause an increase in ice cream sales. There is a third variable at play here: outdoor temperature. When the weather is warmer both ice cream sales and home invasions increase. In this case, outdoor temperature is a  confounding variable  because it is related to both ice cream sales and home invasions. 

Example: Weight & Preferred Beverage

Research question: Do adults who prefer to drink beer, wine, and water differ in terms of their mean weights?

Data were collected from a sample of World Campus students to address the research question above. The researchers found that adults who preferred beer tended to weigh more than those who preferred wine.

A confounding variable in this study was gender identity. Those who identified as men were more likely to prefer beer and those who identified as women were more likely to prefer wine. In the sample, men weighed more than women on average.


Confounding Variables in Psychology Research



Confounding variables are external factors (typically a third variable) in research that can interfere with the relationship between the dependent and independent variables.

At a Glance

A confounding variable alters the risk of the condition being studied and confuses the “true” relationship between the variables. The role of confounding variables in research is critical to understanding the causes of all kinds of physical, mental, and behavioral conditions and phenomena.

Real World Examples of Confounding Variables

Typical examples of confounding variables often relate to demographics and social and economic outcomes.

For instance, people who are relatively low in socioeconomic status during childhood tend to do worse financially, on average, than others when they reach adulthood, explains Glenn Geher, PhD, professor of psychology at the State University of New York at New Paltz and author of “Own Your Psychology Major!” While we could simply think this is because poverty begets poverty, he says there are other variables that are conflated with poverty.

People with lower economic means tend to have less access to high quality education, which is also related to fiscal success in adulthood, Geher explained. Furthermore, poverty is often associated with limited access to healthcare and, thus, with increased risk of adverse health outcomes. These factors can also play roles in fiscal success in adulthood.

“The bottom line here is that when looking to find factors that predict adult economic success, there are many variables that predict this outcome, and so many of these factors are confounded with one another,” Geher said. 

The Impact of Confounding Variables on Research

Psychology researchers must be diligent in controlling for confounding variables, because if they are not, they may draw inaccurate conclusions.

For example, during a research project, Geher’s team found the number of stitches one received in childhood predicted one’s sexual activity in adulthood.

However, Geher said, “to conclude that getting stitches causes promiscuous behavior would be unwarranted and odd. In fact, it is much more likely that childhood health outcomes, such as getting stitches, predict environmental instability during childhood, which has been found to indirectly bear on adult sexual and relationship outcomes.”

In other words, the number of stitches is confounded with environmental instability in childhood. It's not that the number of stitches is directly correlated with sexual activity.

Another example that shows confounding variables is the idea that there is a positive correlation between ice cream sales and homicide rates. However, in fact, both these variables are confounded with time of year, said Geher. “They are both higher in summer when days are longer, days are hotter, and people are more likely to encounter others in social contexts because in the winter when it is cold people are more likely to stay home—so they are less likely to buy ice cream cones and to kill others,” he said. 

Both of these are examples of how it is in the best interest of researchers to ensure that they control for confounding variables to increase the likelihood that their conclusions are truly warranted.

Universal confounding variables across research on a particular topic can also be influential. In an evaluation of confounding variables that assessed the effect of alcohol consumption on the risk of ischemic heart disease, researchers found a large variation in the confounders considered across observational studies.

While 85 of the 87 studies that the researchers analyzed made a connection between alcohol and ischemic heart disease, the confounding variables that could influence ischemic heart disease included smoking, age, BMI, height, and/or weight. This means that these factors, not just alcohol, could also have affected heart disease.

While most studies mentioned or alluded to “confounding” in their Abstract or Discussion sections, only one stated that their main findings were likely to be affected by confounding variables. The authors concluded that almost all studies ignored or eventually dismissed confounding variables in their conclusions.

Because study results and interpretations may be affected by the mix of potential confounders included within models, the researchers suggest that “efforts are necessary to standardize approaches for selecting and accounting for confounders in observational studies.”

Techniques to Identify Confounding Variables

The best way to control for confounding variables is to conduct “true experimental research,” which means researchers experimentally manipulate a variable that they think causes a certain outcome. They typically do this by randomly assigning study participants to different levels of the first variable, which is referred to as the “independent variable.”

For example, if researchers want to determine if, separate from other factors, receiving a full high-quality education, including a four-year college degree from a respected school, causes positive fiscal outcomes in adulthood, they would need to find a pool of participants, such as a group of young adults from the same broad socioeconomic group as one another. Once the group is selected, half of them would need to be randomly assigned to receive a free, high-quality education and the other half would need to be randomly assigned to not receive such an education.

“This methodology would allow you to see if there are different fiscal outcomes on average for the two groups later in life and, if so, you could reasonably conclude that the cause of the differential fiscal outcomes is found in the educational differences across the two groups,” said Geher. “You can draw this conclusion because you randomly assigned the participants to these different groups—a process that naturally controls for confounding variables.”

However, with this process, different problems emerge. For instance, it would not be ethical or practical to randomly assign some participants to a “high-quality education” group and others to a “no-education” group.

“[Controlling] confounding variables via experimental manipulation is not always feasible,” Geher said. 

Because of this, there are also statistical ways to try to control for confounding variables, such as “partial correlation,” which looks at a correlation between two variables (e.g., childhood SES and adulthood SES) while factoring out the effects of a potential confounding variable (e.g., educational attainment).

However, statistical control is only as effective as the measurement of the confounders themselves, and adjusting for an inappropriately chosen variable can introduce bias rather than remove it.

“This statistically oriented process is definitely not considered the gold standard compared with true experimental procedures, but often, it is the best you can do given ethical and/or practical constraints,” said Geher.
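
To make the idea concrete, here is a minimal sketch of partial correlation in Python with simulated data. The variable names (education, childhood_ses, adult_ses) and all effect sizes are illustrative assumptions; this is a generic demonstration of the technique Geher describes, not his actual analysis.

```python
# A minimal sketch of partial correlation, assuming simulated data;
# variable names and effect sizes are illustrative, not from any real study.
import numpy as np

def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect
    of a potential confounder z from both."""
    x_resid = x - np.polyval(np.polyfit(z, x, 1), z)  # residuals of x given z
    y_resid = y - np.polyval(np.polyfit(z, y, 1), z)  # residuals of y given z
    return np.corrcoef(x_resid, y_resid)[0, 1]

rng = np.random.default_rng(0)
education = rng.normal(size=500)                        # potential confounder
childhood_ses = 0.7 * education + rng.normal(size=500)  # driven by education
adult_ses = 0.7 * education + rng.normal(size=500)      # driven by education

print(np.corrcoef(childhood_ses, adult_ses)[0, 1])               # inflated raw correlation
print(partial_correlation(childhood_ses, adult_ses, education))  # falls toward zero
```

Here the raw correlation between the simulated childhood and adult SES variables is driven entirely by the education variable, so factoring education out collapses the correlation toward zero.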

The Importance of Addressing Confounding Variables in Research

Controlling for confounding variables is critical in research primarily because it allows researchers to make sure that they are drawing valid and accurate conclusions. 

“If you don’t correct for confounding variables, you put yourself at risk for drawing conclusions regarding relationships between variables that are simply wrong (at the worst) or incomplete (at the best),” said Geher.

Controlling for confounding variables is among the basic skill sets of the social and behavioral sciences, he added.

The Role of Confounding Variables in Valid Research

Human behavior is highly complex and any single action often has a broad array of variables that underlie it. 

“Understanding the concept of confounding variables, as well as how to control for these variables, makes for better behavioral science with conclusions that are, simply, more valid than research that does not effectively take confounding variables into account,” Geher said.

Wallach JD, Serghiou S, Chu L, et al. Evaluation of confounding in epidemiologic studies assessing alcohol consumption on the risk of ischemic heart disease. BMC Med Res Methodol. 2020;20(1):64. https://doi.org/10.1186/s12874-020-0914-6

Pourhoseingholi MA, Baghestani AR, Vahedi M. How to control confounding effects by statistical analysis. Gastroenterol Hepatol Bed Bench. 2012;5(2):79-83. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4017459/

By Cathy Cassata, a freelance writer who specializes in stories about health, mental health, medical news, and inspirational people.

What is a Confounding Variable? (Definition & Example)

In any experiment, there are two main variables:

The independent variable:  the variable that an experimenter changes or controls so that they can observe the effects on the dependent variable.

The dependent variable:  the variable being measured in an experiment that is “dependent” on the independent variable.


Researchers are often interested in understanding how changes in the independent variable affect the dependent variable.

However, sometimes there is a third variable that is not accounted for that can affect the relationship between the two variables under study.

Confounding variable

This type of variable is known as a confounding variable and it can  confound the results of a study and make it appear that there exists some type of cause-and-effect relationship between two variables that doesn’t actually exist.

Confounding variable: A variable that is not included in an experiment, yet affects the relationship between the two variables in an experiment.   This type of variable can confound the results of an experiment and lead to unreliable findings.

For example, suppose a researcher collects data on ice cream sales and shark attacks and finds that the two variables are highly correlated. Does this mean that increased ice cream sales cause more shark attacks?

That’s unlikely. The more likely cause is the confounding variable of temperature. When it is warmer outside, more people buy ice cream and more people go in the ocean.


Requirements for Confounding Variables

In order for a variable to be a confounding variable, it must meet the following requirements:

1. It must be correlated with the independent variable.

In the previous example, temperature was correlated with the independent variable of ice cream sales. In particular, warmer temperatures are associated with higher ice cream sales and cooler temperatures are associated with lower sales.

2. It must have a causal relationship with the dependent variable.

In the previous example, temperature had a direct causal effect on the number of shark attacks. In particular, warmer temperatures cause more people to go into the ocean which directly increases the probability of shark attacks occurring.
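
Both requirements can be seen in a short simulation. The sketch below, with entirely made-up numbers, generates a temperature series that drives both ice cream sales and shark attacks, producing a strong correlation between two variables that never influence each other.

```python
# A minimal sketch of a confounder meeting both requirements;
# all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.normal(25, 5, size=365)                         # the confounder
ice_cream_sales = 10 * temperature + rng.normal(0, 10, size=365)  # requirement 1: correlated
shark_attacks = 0.3 * temperature + rng.normal(0, 1, size=365)    # requirement 2: caused

# Sales and attacks correlate strongly even though neither causes the other.
print(np.corrcoef(ice_cream_sales, shark_attacks)[0, 1])
```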

Why Are Confounding Variables Problematic?

Confounding variables are problematic for two reasons:

1. Confounding variables can make it seem that cause-and-effect relationships exist when they don’t.

In our previous example, the confounding variable of temperature made it seem like there existed a cause-and-effect relationship between ice cream sales and shark attacks.

However, we know that ice cream sales don’t cause shark attacks. The confounding variable of temperature just made it seem this way.

2. Confounding variables can mask the true cause-and-effect relationship between variables.

Suppose we’re studying the ability of exercise to reduce blood pressure. One potential confounding variable is starting weight, which is correlated with exercise and has a direct causal effect on blood pressure.


While increased exercise may lead to reduced blood pressure, an individual’s starting weight also has a big impact on the relationship between these two variables.

Confounding Variables & Internal Validity

In technical terms, confounding variables affect the  internal validity of a study, which refers to how valid it is to attribute any changes in the dependent variable to changes in the independent variable.

When confounding variables are present, we can’t always say with complete confidence that the changes we observe in the dependent variable are a direct result of changes in the independent variable.

How to Reduce the Effect of Confounding Variables

There are several ways to reduce the effect of confounding variables, including the following methods:

1. Random Assignment

Random assignment refers to the process of randomly assigning individuals in a study to either a treatment group or a control group.

For example, suppose we want to study the effect of a new pill on blood pressure. If we recruit 100 individuals to participate in the study then we might use a random number generator to randomly assign 50 individuals to a control group (no pill) and 50 individuals to a treatment group (new pill).

By using random assignment, we increase the chances that the two groups will have roughly similar characteristics, which means that any difference we observe between the two groups can be attributed to the treatment.

This means the study should have internal validity  – it’s valid to attribute any differences in blood pressure between the groups to the pill itself as opposed to differences between the individuals in the groups.
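
A sketch of that procedure, assuming 100 hypothetical participant IDs, might look like this:

```python
# A minimal sketch of random assignment for the pill example;
# participant IDs are hypothetical.
import random

random.seed(42)                      # seeded only so the demo is reproducible
participants = list(range(100))      # hypothetical participant IDs
random.shuffle(participants)

control = participants[:50]          # no pill
treatment = participants[50:]        # new pill

# By chance alone, confounders such as age, weight, or diet should now
# be spread roughly evenly across the two groups.
```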

2. Blocking

Blocking refers to the practice of dividing individuals in a study into “blocks” based on some value of a confounding variable to eliminate the effect of the confounding variable.

For example, suppose researchers want to understand the effect that a new diet has on weight loss. The independent variable is the new diet and the dependent variable is the amount of weight loss.

However, a confounding variable that will likely cause variation in weight loss is gender. It’s likely that the gender of an individual will affect the amount of weight they’ll lose, regardless of whether the new diet works or not.

One way to handle this problem is to place individuals into one of two blocks:

  • Males
  • Females

Then, within each block, we would randomly assign individuals to one of two treatments:

  • The new diet
  • A standard diet

By doing this, the variation within each block would be much lower compared to the variation among all individuals and we would be able to gain a better understanding of how the new diet affects weight loss while controlling for gender.
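
A minimal sketch of blocking, assuming a hypothetical list of participants with a recorded gender, might look like this:

```python
# A minimal sketch of blocking (randomizing within gender blocks);
# the participant records are hypothetical.
import random
from collections import defaultdict

random.seed(1)
participants = [{"id": i, "gender": random.choice(["M", "F"])}
                for i in range(40)]

# Step 1: divide individuals into blocks by the confounding variable.
blocks = defaultdict(list)
for p in participants:
    blocks[p["gender"]].append(p)

# Step 2: randomly assign individuals to the two diets *within* each block.
for members in blocks.values():
    random.shuffle(members)
    half = len(members) // 2
    for p in members[:half]:
        p["diet"] = "new"
    for p in members[half:]:
        p["diet"] = "standard"
```

Comparing weight loss between diets within each block removes gender-driven variation from the comparison.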

3. Matching

A matched pairs design is a type of experimental design in which we “match” individuals based on values of potential confounding variables.

For example, suppose researchers want to know how a new diet affects weight loss compared to a standard diet. Two potential confounding variables in this situation are age and gender.

To account for this, researchers recruit 100 subjects, then group the subjects into 50 pairs based on their age and gender. For example:

  • A 25-year-old male will be paired with another 25-year-old male, since they “match” in terms of age and gender.
  • A 30-year-old female will be paired with another 30-year-old female since they also match on age and gender, and so on.

Then, within each pair, one subject will randomly be assigned to follow the new diet for 30 days and the other subject will be assigned to follow the standard diet for 30 days.

At the end of the 30 days, researchers will measure the total weight loss for each subject.


By using this type of design, researchers can be confident that any differences in weight loss can be attributed to the type of diet used rather than the confounding variables age and gender.

This type of design does have a few drawbacks, including:

1. Losing two subjects if one drops out.  If one subject decides to drop out of the study, you actually lose two subjects since you no longer have a complete pair.

2. Time-consuming to find matches . It can be quite time-consuming to find subjects who match on certain variables, such as gender and age.

3. Impossible to match subjects perfectly . No matter how hard you try, there will always be some variation within the subjects in each pair.

However, if a study has the resources available to implement this design it can be highly effective at eliminating the effects of confounding variables.
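
For illustration, here is a minimal sketch of matched-pairs assignment, assuming simulated subjects whose ages happen to fall on a few exact values so that perfect matches exist (which, as noted above, real studies can rarely achieve):

```python
# A minimal sketch of a matched-pairs design; subjects, ages, and
# genders are simulated assumptions.
import random
from itertools import groupby

random.seed(7)
subjects = [{"id": i,
             "age": random.choice([25, 30, 35]),
             "gender": random.choice(["M", "F"])}
            for i in range(100)]

# Group subjects that "match" on age and gender, then pair them up.
match_key = lambda s: (s["age"], s["gender"])
subjects.sort(key=match_key)

pairs = []
for _, group in groupby(subjects, key=match_key):
    group = list(group)
    pairs.extend(zip(group[0::2], group[1::2]))  # any unmatched leftover is dropped

# Within each pair, randomly assign one subject to each diet.
for a, b in pairs:
    first, second = random.sample([a, b], 2)
    first["diet"], second["diet"] = "new", "standard"
```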


Enago Academy

Demystifying the Role of Confounding Variables in Research


In the realm of scientific research, the pursuit of knowledge often involves complex investigations, meticulous data collection, and rigorous statistical analysis. Achieving accurate and reliable results is paramount in research. Therefore, researchers strive to design experiments and studies that can isolate and scrutinize the specific variables they aim to investigate. However, some hidden factors can obscure the true relationships between variables and lead to erroneous conclusions. These covert culprits are known as confounding variables, which, in their elusive nature, have the potential to skew results and hinder the quest for truth.


What Are Confounding Variables

Confounding variables, also referred to as confounders or lurking variables, are variables that affect both the exposure and the outcome of a study but are not the variables of primary interest. They act as unmeasured third variables, extraneous factors that interfere with the interpretation of the relationship between the independent and dependent variables. Confounding variables in statistics can be categorical, ordinal, or continuous. Common types of confounding include selection bias, information bias, time-related confounding, and age-related confounding.

Additionally, the term “confounding bias” is used to describe a systematic error or distortion in research findings that occurs when a confounder is not properly accounted for in a study. This can lead to misleading conclusions about the relationship between the independent variable(s) and the dependent variable, potentially introducing bias into the study’s results.

Key Characteristics of Confounding Variables

Key characteristics of confounding variables (confounding factors) are summarized in the infographic below.

[Infographic: Characteristics of confounding variables]

Confounding factors can distort the relationship between independent and dependent variables in research. Thus, recognizing, controlling, and addressing them is essential to ensure the accuracy and validity of study findings.

Effects of Confounding Variables

Confounding variables play a crucial role in the internal validity of a study. Understanding their effects is necessary for producing credible, applicable, and ethically sound research.

Here are some impacts of confounding variables in research.

1. Lack of Attribution of Cause and Effect

  • Confounding variables can lead researchers to erroneously attribute causation where there is none.
  • This happens when a confounder is mistaken for the independent variable, causing researchers to believe that a relationship exists between variables even when it does not.

2. Overestimate or Underestimate Effects

  • Confounding variables can distort the magnitude and direction of relationships between variables.
  • Additionally, they can either inflate or diminish it, leading to inaccurate assessments of the true impact.
  • Furthermore, they can also hide genuine associations between variables.

3. Distort Results

  • Confounding variables can create false associations between variables.
  • In these cases, the observed relationship is driven by the confounder rather than any meaningful connection between the independent and dependent variables.
  • This distorts the relationship between the variables of interest, leading to incorrect conclusions.

4. Reduce Precision and Reliability

  • Confounding variables can introduce noise and variability in the data.
  • This can make it challenging to detect genuine effects or differences.
  • Furthermore, the results of a study may not generalize well to other populations or contexts as the impact of the confounders might be specific to the study sample or conditions.

5. Introduce Bias

  • If confounding variables are not properly addressed, the conclusions drawn from a study can be biased.
  • These biased conclusions can have real-world implications, especially in fields like medicine, public policy, and social sciences.
  • Studies plagued by confounding variables have reduced internal validity, which can hinder scientific progress and the development of effective interventions.

6. Introduce Ethical Implications

  • In certain cases, failing to control for confounding variables can have ethical implications.
  • For instance, if a study erroneously concludes that a particular group is more prone to a disease due to a confounder, it may lead to stigmatization or discrimination.

Researchers must identify these variables and employ rigorous methods to account for them, ensuring that their findings accurately reflect the relationships they intend to study.

Why Account for Confounding Variables

Accounting for confounding variables is crucial in research, as it helps researchers obtain more accurate results with broader applicability. Furthermore, controlling confounders helps maintain internal validity and establishes causal relationships between variables.

Accounting for confounding variables also provides sounder guidance for health interventions and policies, and it demonstrates scientific rigor and a commitment to producing high-quality, unbiased research. Researchers also have an ethical responsibility to accurately report and interpret their findings.

Researchers must recognize the potential impact of confounders and take adequate steps to identify and measure them to control for their effects to ensure the integrity and reliability of their research findings.

How to Identify Confounding Variables

Recognizing confounding variables is a crucial step in research. Researchers can employ various strategies to identify potential confounders.


Strategies to Control Confounding Variables

Controlling confounding variables helps researchers build more robust studies, and employing appropriate strategies to mitigate them is necessary for reliable and accurate research reporting.

1. Randomization

Randomly assigning subjects to experimental and control groups can help distribute confounding variables evenly, reducing their impact.

2. Matching

Matching subjects based on key characteristics can minimize the influence of confounding variables. For example, in a drug trial, matching participants by age, gender, and baseline health status can help control for these factors.

3. Statistical Control

Advanced statistical techniques like multiple regression analysis can help account for the influence of known confounding variables in data analysis.
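
As a concrete illustration, the sketch below simulates the coffee-and-heart-disease example and adjusts for exercise by including it as a covariate in an ordinary least squares model. The variable names and effect sizes are assumptions made up for the demo.

```python
# A minimal sketch of statistical control via multiple regression;
# data are simulated and all effect sizes are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
exercise = rng.normal(size=n)                      # the confounder
coffee = -0.5 * exercise + rng.normal(size=n)      # heavy drinkers exercise less
heart_risk = -0.8 * exercise + rng.normal(size=n)  # risk driven by exercise only

# Naive model: coffee looks harmful purely through its link to exercise.
naive = sm.OLS(heart_risk, sm.add_constant(coffee)).fit()

# Adjusted model: adding the confounder as a covariate shrinks the
# coffee coefficient toward its true value of zero.
X = sm.add_constant(np.column_stack([coffee, exercise]))
adjusted = sm.OLS(heart_risk, X).fit()
print(naive.params[1], adjusted.params[1])
```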

4. Conduct Sensitivity Analysis

Researchers should test the robustness of their findings by conducting sensitivity analyses, systematically varying assumptions about confounding variables to assess their impact on results.
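
One simple form of sensitivity analysis is to refit the same model under several confounder sets and watch how the exposure estimate moves, as in the sketch below (all variables are simulated assumptions):

```python
# A minimal sketch of a sensitivity analysis over confounder sets;
# the data are simulated assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 800
df = pd.DataFrame({"exercise": rng.normal(size=n), "age": rng.normal(size=n)})
df["coffee"] = -0.5 * df["exercise"] + rng.normal(size=n)
df["heart_risk"] = -0.8 * df["exercise"] + 0.3 * df["age"] + rng.normal(size=n)

for spec in ["heart_risk ~ coffee",
             "heart_risk ~ coffee + exercise",
             "heart_risk ~ coffee + exercise + age"]:
    fit = smf.ols(spec, data=df).fit()
    print(f"{spec:40s} coffee estimate = {fit.params['coffee']:+.3f}")

# A coffee estimate that swings widely across specifications is a
# warning that the finding is sensitive to confounder choice.
```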

Although these measures can control confounding variables effectively, addressing them ethically is also crucial to maintaining research integrity.

Examples of Confounding Variables

Here are some examples of confounding variables:

1. Smoking and Lung Cancer:

In a study investigating the link between smoking and lung cancer, age can be a confounding variable. Older individuals are more likely to both smoke and develop lung cancer. Therefore, if age is not controlled for in the study, it could falsely suggest a stronger association between smoking and lung cancer than actually exists.

2. Education and Income:

Suppose a study is examining the relationship between education level and income. Occupation and years of experience could be confounding variables, because certain jobs pay more regardless of education. Without considering occupation and experience, the study might reach an incorrect conclusion.

3. Coffee Consumption and Heart Disease:

When studying the relationship between coffee consumption and heart disease, exercise and lifestyle habits can be confounding variables. Unhealthy behaviors like smoking, poor diet, and lack of physical activity can contribute to heart disease. Failing to control for these factors could erroneously attribute heart disease risk solely to coffee consumption.

Controlling confounding variables through study design or statistical techniques is essential to ensure that research findings accurately represent the relationships being studied.

Statistical Approaches When Reporting And Discussing Confounding Variables

Statistical approaches for reporting and discussing confounding variables are essential to ensure the transparency, rigor, and validity of research findings. Here are some key statistical approaches and strategies to consider when dealing with confounding variables:

1. Descriptive Statistics

  • Begin by providing descriptive statistics for the confounding variables.
  • This includes measures such as mean, median, standard deviation, and frequency distribution.
  • This information helps to understand the characteristics of the confounders in your study.

2. Bivariate Analysis

  • Conduct bivariate analyses to examine the unadjusted relationships between the independent variable(s) and the dependent variable, as well as between the independent variable(s) and the confounding variables.

3. Stratification

  • Stratify your analysis by levels or categories of the confounding variable.
  • This allows you to examine the relationship between the independent variable and the dependent variable within each stratum.
  • It can help identify whether the effect of the independent variable varies across different levels of the confounder (a minimal sketch follows below).
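
A minimal stratification sketch, using a simulated smoking/coffee/disease example (all numbers are assumptions), might look like this:

```python
# A minimal sketch of stratified analysis; data are simulated assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 600
smoker = rng.integers(0, 2, size=n)            # confounder defining the strata
coffee = 2.0 * smoker + rng.normal(size=n)     # exposure tied to smoking
disease = 3.0 * smoker + rng.normal(size=n)    # outcome driven by smoking

df = pd.DataFrame({"smoker": smoker, "coffee": coffee, "disease": disease})

# The crude (pooled) association looks substantial...
print("crude:", df["coffee"].corr(df["disease"]))

# ...but within each stratum of the confounder it largely vanishes.
for level, stratum in df.groupby("smoker"):
    print("stratum", level, stratum["coffee"].corr(stratum["disease"]))
```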

4. Multivariate Analysis

  • Use multivariate statistical techniques, such as regression analysis , to control for confounding variables.
  • In regression analysis, you can include the confounding variables as covariates in the model.
  • This helps to isolate the effect of the independent variable(s) while holding the confounders constant.

5. Interaction Testing

  • Investigate potential interactions between the independent variable(s) and the confounding variable.
  • Interaction terms in regression models can help determine whether the effect of the independent variable(s) varies based on different levels of the confounder.
  • Interaction tests assess whether the relationship between the independent variable and the dependent variable is modified by the confounder (see the sketch below).
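
The sketch below fits such a model on simulated data (names and effect sizes are assumptions), with the `exercise:age` term carrying the interaction:

```python
# A minimal sketch of interaction testing; data are simulated assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({"exercise": rng.normal(size=n), "age": rng.normal(size=n)})
# An outcome in which the effect of exercise depends on age (a true interaction).
df["blood_pressure"] = (-1.0 * df["exercise"] + 0.5 * df["age"]
                        + 0.4 * df["exercise"] * df["age"]
                        + rng.normal(size=n))

# The exercise:age coefficient tests whether age modifies the exercise effect.
model = smf.ols("blood_pressure ~ exercise + age + exercise:age", data=df).fit()
print(model.params)
```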

6. Model Fit and Goodness of Fit

  • Assess the fit of your statistical model. This includes checking for goodness-of-fit statistics and examining diagnostic plots.
  • A well-fitting model is important for reliable results.

7. Graphical Representation

  • Utilize graphical representations , such as scatter plots, bar charts, or forest plots, to visualize the relationships between variables and the impact of confounding variables on your results.

These statistical approaches help researchers control for confounding variables and provide a comprehensive understanding of the relationships between variables in their studies. Thorough and transparent reporting and discussion of confounding variables involve a combination of sound statistical analysis and a strong research design. Reporting these variables transparently is also an ethical obligation.

Ethical Considerations While Dealing with Confounding Variables

Ethical considerations play a significant role in dealing with confounding variables in research. Addressing confounding variables ethically is essential to ensure that research is conducted with integrity, transparency, and respect for participants and the broader community. Here are some ethical considerations to keep in mind:

1. Disclosure and Transparency

  • Researchers are ethically obliged to disclose potential confounding variables, as well as their plans for addressing them, in research proposals, publications, and presentations.
  • Moreover, transparent reporting allows readers to assess the study’s validity and the potential impact of confounding.

2. Informed Consent

  • When participants are involved in a study, they should be fully informed about the research objectives , procedures, and potential sources of bias, including confounding variables.
  • Informed consent should include explanations of how confounders will be addressed and why it is important.

3. Minimizing Harm

  • Researchers should take steps to minimize any potential harm to participants that may result from addressing confounding variables.
  • This includes ensuring that data collection and analysis procedures do not cause undue distress or discomfort.

4. Fair and Equitable Treatment

  • Researchers must ensure that the methods used to control for confounding variables are fair and equitable.
  • This means that any adjustments or controls should be applied consistently to all participants or groups in a study to avoid bias or discrimination.

5. Respect for Autonomy

  • Ethical research respects the autonomy of participants.
  • This includes allowing participants to withdraw from the study at any time if they feel uncomfortable with the research process or have concerns about how confounding variables are being managed.

6. Consider Community Impact

  • Consider the broader impact of research on the community.
  • Addressing confounding variables can help ensure that research results are accurate and relevant to the community, ultimately contributing to better-informed decisions and policies.

7. Avoiding Misleading Results

  • Ethical research avoids producing results that are misleading due to unaddressed confounding variables.
  • Misleading results can have serious consequences in fields like medicine and public health, where policies and treatments are based on research findings.

8. Ethical Oversight

  • Research involving human participants often requires ethical review and oversight by institutional review boards or ethics committees.
  • Researchers should follow the guidance and recommendations of these oversight bodies when dealing with confounding variables.

9. Continual Evaluation

  • Ethical research involves ongoing evaluation of the impact of confounding variables and the effectiveness of strategies to control them.
  • Additionally, researchers should be prepared to adjust their methods if necessary to maintain ethical standards.

Researchers must uphold these ethical principles to maintain the trust and credibility of their work within the scientific community and society at large.

The quest for knowledge is defined not solely by the variables you aim to study, but also by the diligence with which you address the complexities of confounding variables. That diligence fosters clearer, more accurate research reporting that is reliable and sound.

What are your experiences dealing with confounding variables? Share your views and ideas with the community on Enago Academy’s Open Platform and grow your connections with like-minded academics.

Frequently Asked Questions

Confounding bias is a type of bias that occurs when a third variable (a confounding variable), not considered in the research design or analysis, is related to both the dependent variable (the outcome of interest) and the independent variable (the factor being studied), leading to erroneous conclusions in research and statistical analysis.

Controlling for confounding variables is a crucial aspect of designing and analyzing research studies. Some methods to control confounding variables are: 1. Randomization 2. Matching 3. Stratification 4. Multivariable Regression Analysis 5. Propensity Score Matching 6. Cohort Studies 7. Restriction 8. Sensitivity Analysis 9. Review Existing Literature 10. Expert Consultation

Some common types of confounding are selection bias, information bias, time-related confounding, age-related confounding, residual confounding, and reverse causation.

Confounding variables affect the credibility, applicability, and ethical soundness of a study. Their effects include: 1. Lack of Attribution of Cause and Effect 2. Overestimate or Underestimate Effects 3. Distort Results 4. Reduce Precision and Reliability 5. Introduce Bias 6. Introduce Ethical Implications. To produce valid research, researchers must identify and rigorously account for confounding variables, ensuring that their findings accurately reflect the relationships they intend to study.

Identifying confounding variables is a critical step in research design and analysis. Here are some strategies and approaches to help identify potential confounding variables: 1. Literature Review 2. Subject Matter Knowledge 3. Theoretical Framework 4. Pilot Testing 5. Consultation 6. Hypothesis Testing 7. Directed Acyclic Graphs (DAGs) 8. Statistical Software 9. Expert Review


Research Method


Confounding Variable – Definition, Method and Examples


Definition:

A confounding variable is an extraneous variable that is not the main variable of interest in a study but can affect the outcome of the study. Confounding variables can obscure or distort the true relationship between the independent and dependent variables being studied.

Confounding Variable Control Methods

Methods for controlling confounding variables in research are as follows:

Randomization

Randomization is a powerful method for controlling confounding variables in experimental research. By randomly assigning participants to different groups, researchers can ensure that any extraneous factors that could influence the outcome variable are evenly distributed across the groups.

Matching

Matching is a method used in observational studies to control for confounding variables. In this method, researchers match participants on one or more variables that could influence the outcome variable, such as age or gender.

Statistical Analysis

Statistical analysis is used to control for confounding variables in both experimental and observational studies. This can be achieved through the use of regression analysis, which allows researchers to control for the effects of confounding variables on the outcome variable.

Restriction

Restriction involves limiting the range of values for the confounding variable. For example, researchers might only include participants within a certain age range to control for age-related differences.
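
As a minimal sketch (with hypothetical participant records), restriction can be as simple as a filter at enrollment:

```python
# A minimal sketch of restriction; participant records are hypothetical.
participants = [
    {"id": 1, "age": 24}, {"id": 2, "age": 31},
    {"id": 3, "age": 29}, {"id": 4, "age": 47},
]

# Enroll only a narrow age band so age cannot vary enough to confound
# the exposure-outcome relationship.
eligible = [p for p in participants if 25 <= p["age"] <= 35]
print(eligible)  # keeps ids 2 and 3
```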

Stratification

Stratification involves dividing the sample into subgroups based on the confounding variable. Researchers can then compare the outcome variable across the subgroups to determine if the relationship holds for each subgroup.

Design Control

Design control refers to the process of carefully designing the study to minimize the potential for confounding variables. This can involve selecting a representative sample, controlling for extraneous variables, and using appropriate measures to assess the outcome variable.

Confounding Variable Examples

Confounding Variable Examples are as follows:

  • Age : Suppose that a study is investigating the effect of a new teaching method on student performance in a particular subject. If the students’ ages are not controlled for, age could be a confounding variable as older students may perform better due to greater maturity or prior knowledge.
  • Gender : Suppose a study is investigating the effect of a new medication on blood pressure. If the study does not control for gender, gender could be a confounding variable as women generally have lower blood pressure than men.
  • Socioeconomic status : Suppose a study is investigating the relationship between physical activity and health outcomes. If the study does not control for socioeconomic status, it could be a confounding variable as people with higher socioeconomic status may have better access to facilities for exercise and better nutrition.
  • Time of day: Suppose a study is investigating the effect of caffeine on alertness. If the study is conducted at different times of day, time of day could be a confounding variable as individuals may naturally be more alert at certain times of the day.
  • Environmental factors : Suppose a study is investigating the effect of a new air purifier on asthma symptoms. If the study does not control for environmental factors such as pollen or pollution levels, they could be a confounding variable as these factors could affect asthma symptoms independent of the air purifier.
  • Placebo effect : Suppose a study is investigating the effect of a new drug on pain relief. If the study does not control for the placebo effect, it could be a confounding variable as participants may experience a reduction in pain simply due to the belief that they are receiving a treatment.

Applications of Confounding Variable

Here are some applications of confounding variables:

  • Control for Confounding Variables : In experimental research, researchers try to control for confounding variables by holding them constant or statistically adjusting for them in the analysis. This helps to isolate the effects of the independent variable on the dependent variable.
  • Identifying Alternative Explanations : Confounding variables can help researchers identify alternative explanations for their findings. By examining the potential confounding variables, researchers can better understand the factors that may be contributing to the relationship between the independent and dependent variables.
  • Generalizability : Researchers can use confounding variables to improve the generalizability of their findings. By including a diverse range of participants and controlling for potential confounding variables, researchers can better understand how their findings apply to different populations.
  • Real-world Applications : Understanding confounding variables can have real-world applications. For example, in medical research, understanding the potential confounding variables can help clinicians better understand the effectiveness of treatments and improve patient outcomes.
  • Improving Study Design : By considering the potential confounding variables, researchers can improve the design of their studies to reduce the potential for confounding variables to impact their findings.

When to Identify Confounding Variables

Identifying confounding variables is an essential step in designing and conducting research. Confounding variables are factors that may impact the relationship between the independent variable and the dependent variable, and they can potentially distort the study’s results. Here are some key points to consider when identifying confounding variables:

  • Before conducting the study: Researchers should identify potential confounding variables before the study begins. This allows them to design the study to control for or adjust for confounding variables to ensure that the results are reliable and valid.
  • During data collection: As researchers collect data, they may identify additional confounding variables that were not anticipated during the study’s design. In such cases, researchers may need to modify the study’s design or analysis to account for the newly identified confounding variables.
  • Statistical analysis: During the analysis, researchers should examine the relationship between the independent and dependent variables while controlling for potential confounding variables. This helps to isolate the effects of the independent variable on the dependent variable.
  • Reporting results: Researchers should report the potential confounding variables that were identified and how they were controlled for or adjusted for in the analysis. This helps other researchers to interpret and replicate the findings accurately.

Purpose of Confounding Variable

The purpose of identifying and controlling for confounding variables in research is to ensure that the relationship between the independent variable and the dependent variable is accurately measured. Confounding variables can introduce bias into a study, making it difficult to determine the true relationship between the variables of interest. By identifying and controlling for confounding variables, researchers can:

  • Improve the validity of the study: Confounding variables can introduce bias into a study, making it difficult to determine whether the results accurately reflect the relationship between the independent and dependent variables. By controlling for confounding variables, researchers can ensure that the results of their study are valid and accurately reflect the relationship between the variables of interest.
  • Improve the reliability of the study: Confounding variables can also affect the reliability of a study by making it more difficult to replicate the results. By controlling for confounding variables, researchers can ensure that their study is reliable and can be replicated by others.
  • Improve the generalizability of the study: Confounding variables can also affect the generalizability of a study by making it difficult to apply the results to other populations. By controlling for confounding variables, researchers can improve the generalizability of their study and increase the likelihood that the results can be applied to other populations.

Characteristics of Confounding Variable

Here are some characteristics of confounding variables:

  • Related to both the independent and dependent variables: Confounding variables are related to both the independent and dependent variables, meaning that they have an impact on both of these variables.
  • Associated with the outcome variable: Confounding variables are associated with the outcome variable or the dependent variable. This means that they can potentially affect the results of the study and make it difficult to determine the true relationship between the independent and dependent variables.
  • Not part of the study’s design: Confounding variables are not part of the study’s design, meaning that they are not intentionally measured or manipulated by the researcher.
  • Can introduce bias: Confounding variables can introduce bias into a study, making it difficult to determine the true effect of the independent variable on the dependent variable.
  • Can be controlled for: While confounding variables cannot be eliminated, they can be controlled for in the study’s design or statistical analysis. This helps to ensure that the true relationship between the independent and dependent variables is accurately measured.
  • Can affect generalizability : Confounding variables can also affect the generalizability of a study, making it difficult to apply the results to other populations or settings.

Advantages of Controlling Confounding Variables

Here are some advantages of controlling for confounding variables:

  • Improved accuracy of results: By controlling for confounding variables, researchers can improve the accuracy of their results. By isolating the effect of the independent variable on the dependent variable, researchers can determine the true relationship between these variables and avoid any distortions introduced by confounding variables.
  • More reliable results: Controlling for confounding variables can also lead to more reliable results. By minimizing the impact of confounding variables on the study, researchers can increase the likelihood that their findings are accurate and can be replicated by others.
  • Greater generalizability: Controlling for confounding variables can also increase the generalizability of the study. By minimizing the impact of confounding variables, researchers can increase the likelihood that their findings are applicable to other populations or settings.
  • Improved study design: The process of identifying and controlling for confounding variables can also improve the overall study design. By considering potential confounding variables during the study design phase, researchers can develop more robust studies that are better able to isolate the effect of the independent variable on the dependent variable.

Limitations in Controlling Confounding Variables

  • Identification : One limitation of confounding variables is that they may be difficult to identify. Confounding variables can come from a variety of sources and may be difficult to measure or control for in a study.
  • Time and resource constraints: Controlling for confounding variables can also be time-consuming and resource-intensive. This can limit the ability of researchers to fully control for all potential confounding variables.
  • Reduced sample size : Controlling for confounding variables may also require a larger sample size, which can be costly and time-consuming.
  • Limitations of statistical methods : While statistical methods can be used to control for confounding variables, there are limitations to these methods. For example, some statistical methods assume that the relationship between the independent and dependent variables is linear, which may not always be the case.
  • Potential for overadjustment : Controlling for too many confounding variables can also lead to overadjustment, where the relationship between the independent and dependent variables is obscured.

Disadvantages of Confounding Variable

Some disadvantages of confounding variables are as follows:

  • They can obscure or distort the true relationship between the independent and dependent variables, making it difficult to draw accurate conclusions.
  • They can make it challenging to replicate research findings because the confounding variable may not be accounted for in subsequent studies.
  • They can lead to incorrect conclusions about causality, as the observed relationship between the independent and dependent variables may be due to the confounding variable and not the independent variable.
  • They can reduce the precision of estimates and increase the variability of results.
  • They can lead to false associations or overestimation of the effect size of the independent variable.
  • They can also limit the generalizability of research findings to other populations or settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Confounding Variable in Psychology (Examples + Definition)


Research mostly focuses on four types of variables: dependent, independent, extraneous, and confounding variables. Confounding variables play a huge role in the relationship between dependent and independent variables.

A confounding variable is an external factor, one that was not considered in the study design, that alters the relationship between the dependent and independent variables and thereby influences the outcome of a study.

Confounding variables can be tricky as they have the ability to muddy study results. However, there is a way to try and stop that from happening. Here is what you need to know about confounding variables, including helpful examples.

All About Confounding Variables

Confounding variables are one type of extraneous variable, and they raise one main question that needs to be addressed: how does one know that the change observed in the dependent variable is caused by the independent variable?

Confounding variables are generally defined as elements showing that the independent variable is not the only one influencing the dependent variable. Simply put, it is a factor that is related to both the independent and dependent variables but was excluded from your analysis.

In a statistical model, two variables are said to be confounded if their separate effects cannot be estimated independently from the data.

For instance, consider a study of memory recall: how many items a group of people can recall, and how accurate that recall is, when half of the group first underwent memory training while the other half did not. The participants are recruited and randomly placed in two different groups.

One group receives memory training and the other does not. In this case, it would seem clear that the independent variable affects the dependent variable, as the training would make the first group recall more items, and more accurately.

However, the confounding variable here could be age. Suppose the two training groups were not recruited according to a specific age, and their ages do not match. In that case, differences in the number of items recalled and the accuracy cannot be attributed to the training offered to the one group.

Different ages mean different cognitive capacities, which can produce large differences in recall that are independent of the memory training.

The confounder's link to the variables of interest need not be causal; it can arise by coincidence. For instance, you might be looking at whether a certain treatment helps people recover from colds more quickly: 50 individuals with colds receive the treatment, another 50 do not, and all are tracked.

However, suppose that by coincidence you administered the treatment to 30 individuals suffering from common colds and 20 individuals with more severe flu, while the control group contained 30 individuals with flu and 20 with colds. You might conclude that the medication is quite effective, when in fact the treated group was simply less ill and already on the road to recovery.

How Do Confounding Variables Interfere In Studies?

When confounding variables interfere in a study, it becomes quite challenging to isolate whether changes in the dependent variable were caused by the independent variable, or whether some other factor cast the same effect on the dependent variable that the independent variable would have.

How Can You Ensure Confounding Variables Don't Ruin Studies?

Confounding variables are additional variables that we neglected to account for during the experimentation. Because they tend to raise variation and create bias, confounding variables can render our results meaningless. To avoid this happening, simply include a control variable in your study.

For instance, if you are determining whether a lack of physical activity causes weight gain, age would be a confounding variable in the study, because it also affects weight gain. By including a control variable, for example holding age fixed across participants, you can reduce the impact of the confounding variable.

Observational studies are more problematic when confounding variables are present. The other half of the solution is to examine populations both qualitatively and quantitatively using all available metrics, which may make you aware of potential confounding factors.

Why Study Results Are Adjusted For Confounding Variables

Unsurprisingly, results observed in studies of dependent and independent variables are often adjusted for confounding factors such as age and sex. So what exactly does adjustment mean? Suppose a study were conducted with height as the independent variable and weight as the dependent variable; you would see a strong relationship between the two.

However, it is widely known that sex typically influences both height and weight, so it would be a confounding factor here. The individuals should therefore be split according to their sex. When this is done, the relationship between height and weight within both groups is still strong, but not as strong as before.

Therefore, this means that the relationship is adjusted for sex. Simply put, the confounding variable, which is sex, is kept fixed. However, as we know, according to the definition of confounding variables, there is more than one factor to adjust for, as the definition states that it is any factor in relation to both the dependent and independent variables.

Therefore, it makes sense that in practice, studies adjust for many factors all in one go. Here is an example that considers other confounding variables to give you an idea of what that would be like. Suppose this study deals with dependent and independent variables of exercise and heart disease.

In that case, we are aware that a person's age can influence both how often they exercise and their risk of heart disease. Therefore, unless age is considered, it would interfere with understanding the relationship between the two variables. Because of that interference, age is known as a confounder. There are many possible confounders; others include diet and smoking.

What to keep in mind is that these factors may not directly affect the exercise and heart disease variables. Still, they are linked to both the variables we are interested in. Therefore, the goal would be to estimate the relation between the two variables of interest while keeping the confounding variables fixed.

How is that done? Say there is a group of individuals. Data must be collected on all of the factors that affect them. Afterward, a statistical method must be used. The suitable method for this is regression analysis. This method is important if you do not want confounding factors rendering your results meaningless.

This is because we can then estimate the relationship between two variables of interest while keeping any confounding variables fixed. This is known as adjustment. As much as adjustment helps account for confounding factors, it can also be inadequate, because there may be other confounding variables that we simply don't know about and, therefore, have not measured.

There may also be confounding factors that get measured incorrectly. For example, people can lie when asked about something they feel uncomfortable discussing. They may understate how much they smoke or not fully disclose what they eat. Although adjustment has its faults, it is still essential when dealing with confounding factors.

Are Confounding Variables The Same As Extraneous Variables?

One mustn't confuse the extraneous variable with the confounding variable. To make it clear: an extraneous variable is any variable that relates to either the dependent or the independent variable of primary focus but that we did not take into account when the study was designed.

This may seem tricky, as the definitions are almost identical. The key word is 'and'. All confounding variables are extraneous variables, but not all extraneous variables are confounders.

That is, extraneous variables are variables relating to either the independent or the dependent variable, whereas confounding variables are variables relating to both the independent and the dependent variable.

With confounding variables, you can also say that perhaps something other than the variable we are interested in is what caused the effect on the dependent variable. Basically: could some other variable be an alternative explanation for the study's findings?

Furthermore, the variable must relate to both the dependent and independent variables. That is, the confounding variable must be associated with the dependent variable and with at least one independent variable.

Example Of Extraneous And Confounding Variables In Action

The same study will show how the extraneous variables differ from the confounding variables.

Suppose there is a study on whether people with children stay in rented apartments longer than people who don't have children. One extraneous variable is fatigue level: raising a kid isn't a joke, and parents are bound to be more exhausted than those who do not have kids.

Another extraneous variable could be the age of tenants with children compared to tenants without children; typically, parents are older adults rather than teenagers. Both of the extraneous variables mentioned so far relate only to the independent variable.

When you take a look at what kind of extraneous variables could influence the dependent variable, then you would be looking at a job that requires relocation or the apartment building no longer being safe. As you can see, all the extraneous variables mentioned only relate to either the dependent or the independent variable, not both.

In contrast, confounding variables relate to both the dependent and independent variables. Let's look at how this is possible using the same study. These are the same variables of interest, having children, how long they stay in an apartment, and the same extraneous variables.

For this part, you simply need to determine whether any of the extraneous variables are also confounding variables. For example, fatigue level is related to whether people have children, but is it also related to the dependent variable? Is it related to how long someone will live in an apartment complex? The answer is no: tiredness is in no way related to how long someone will live in an apartment.

Therefore, this is not a confounding variable, as it does not relate to both the dependent and independent variables. Take relocating for a new job, for example. It is related to the dependent variable. However, is it also related to the independent variable? Not exactly. Relocating for a job will influence moving out.

Still, it doesn't influence whether someone has children as they may have those children already. Therefore, this is just an extraneous variable. If you look at age, it's clear that it is related to whether or not an individual has children. However, is it related to how long someone lives in an apartment? Absolutely.

Simply put, whether a family decides to stay in a complex may not have to do with the fact that they are parents. It could be because those parents have aged and have no desire to live in an apartment anymore.

It relates to both the dependent and independent variables, making it a confounding variable. Telling the difference between the extraneous and confounding variables may be tricky, but it gets better when you keep the definitions in mind and practice using various examples.

As you can see, extraneous and confounding variables are not the same, although they are similar. Knowing the difference will help you compile a useful list of factors for your studies. Being aware of confounding variables when conducting studies goes a long way toward ensuring your results are not distorted so much that you cannot tell which factor caused the changes.




Confounding Variables | Definition, Examples & Controls

Published on 4 May 2022 by Lauren Thomas. Revised on 12 April 2023.

In research that investigates a potential cause-and-effect relationship, a confounding variable is an unmeasured third variable that influences both the supposed cause and the supposed effect.

It’s important to consider potential confounding variables and account for them in your research design to ensure your results are valid .

What is a confounding variable?

Confounding variables (aka confounders or confounding factors) are a type of extraneous variable related to a study’s independent and dependent variables . A variable must meet two conditions to be a confounder:

  • It must be correlated with the independent variable. This may be a causal relationship, but it does not have to be.
  • It must be causally related to the dependent variable.

Why confounding variables matter

To ensure the internal validity of your research, you must account for confounding variables. If you fail to do so, your results may not reflect the actual relationship between the variables that you are interested in.

For instance, you may find a cause-and-effect relationship that does not actually exist, because the effect you measure is caused by the confounding variable (and not by your independent variable).

Even if you correctly identify a cause-and-effect relationship, confounding variables can result in over- or underestimating the impact of your independent variable on your dependent variable.

How to reduce the impact of confounding variables

There are several methods of accounting for confounding variables. You can use the following methods when studying any type of subjects (humans, animals, plants, chemicals, etc.). Each method has its own advantages and disadvantages.

Restriction

In this method, you restrict your treatment group by only including subjects with the same values of potential confounding factors.

Since these values do not differ among the subjects of your study, they cannot correlate with your independent variable and thus cannot confound the cause-and-effect relationship you are studying.

  • Advantage: relatively easy to implement
  • Disadvantage: restricts your sample a great deal
  • Disadvantage: you might fail to consider other potential confounders
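
As a rough illustration of restriction, here is a pandas sketch; the DataFrame and its columns are invented for the example:

```python
import pandas as pd

# Hypothetical sample where 'age_group' is a suspected confounder
df = pd.DataFrame({
    "age_group": ["young", "old", "old", "young", "old", "old"],
    "treated":   [1, 0, 1, 0, 1, 0],
    "outcome":   [3.2, 5.1, 4.8, 2.9, 5.0, 4.7],
})

# Restriction: keep only subjects with the same value of the confounder,
# so it can no longer correlate with the treatment. Note how much the
# sample shrinks in the process.
restricted = df[df["age_group"] == "old"]
```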

Matching

In this method, you select a comparison group that matches the treatment group. Each member of the comparison group should have a counterpart in the treatment group with the same values of potential confounders, but different independent variable values.

This allows you to eliminate the possibility that differences in confounding variables cause the variation in outcomes between the treatment and comparison group. If you have accounted for any potential confounders, you can thus conclude that the difference in the independent variable must be the cause of the variation in the dependent variable.

  • Advantage: allows you to include more subjects than restriction
  • Disadvantage: can prove difficult to implement, since you need pairs of subjects that match on every potential confounding variable
  • Disadvantage: other variables that you cannot match on might also be confounding variables
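
Below is a minimal sketch of exact matching in pandas, assuming a hypothetical data set with a single categorical confounder; real matching procedures (for example, propensity-score matching on many confounders) are considerably more involved:

```python
import pandas as pd

# Hypothetical subjects with one potential confounder, 'age_group'
df = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6],
    "treated":   [1, 1, 1, 0, 0, 0],
    "age_group": ["young", "old", "old", "young", "old", "young"],
})

treatment = df[df["treated"] == 1]
comparison = df[df["treated"] == 0]

# Exact matching: pair each treated subject with a comparison subject
# that has the same confounder value but a different treatment value.
# (This simple version matches with replacement.)
pairs = treatment.merge(comparison, on="age_group",
                        suffixes=("_treated", "_comparison"))
matched = pairs.groupby("id_treated").first().reset_index()
```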

Statistical control

If you have already collected the data, you can include the possible confounders as control variables in your regression models ; in this way, you will control for the impact of the confounding variable.

Any effect that the potential confounding variable has on the dependent variable will show up in the results of the regression and allow you to separate the impact of the independent variable.

  • Advantage: easy to implement
  • Advantage: can be performed after data collection
  • Disadvantage: you can only control for variables that you observe directly, and other confounding variables you have not accounted for might remain

Randomisation

Another way to minimise the impact of confounding variables is to randomise the values of your independent variable. For instance, if some of your participants are assigned to a treatment group while others are in a control group , you can randomly assign participants to each group.

Randomisation ensures that with a sufficiently large sample, all potential confounding variables (even those you cannot directly observe in your study) will have the same average value between different groups. Since these variables do not differ by group assignment, they cannot correlate with your independent variable and thus cannot confound your study.

Since this method allows you to account for all potential confounding variables, which is nearly impossible to do otherwise, it is often considered to be the best way to reduce the impact of confounding variables.

  • Advantage: allows you to account for all possible confounding variables, including ones that you may not observe directly
  • Advantage: considered the best method for minimising the impact of confounding variables
  • Disadvantage: most difficult to carry out
  • Disadvantage: must be implemented prior to beginning data collection
  • Disadvantage: you must ensure that only those in the treatment (and not control) group receive the treatment
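
A minimal sketch of random assignment in Python (the participant IDs and group sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
participant_ids = np.arange(100)

# Shuffle participants and split them into two groups. With a large
# enough sample, confounders (measured or not) balance out across
# groups on average.
shuffled = rng.permutation(participant_ids)
treatment_group = shuffled[:50]
control_group = shuffled[50:]
```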

A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control, and randomisation.

In restriction , you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching , you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable .

In statistical control , you include potential confounders as variables in your regression .

In randomisation , you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.


Confounding variable: Everything you need to know

Last updated 27 February 2023. Reviewed by Cathy Heath.


This article will discuss what a confounding variable is and provide examples and guidelines on ensuring this type of variable doesn't muddle your research.


  • What's the difference between confounding, independent, and dependent variables?

When defining a confounding variable (also known as a confounder), looking at the dependent and independent variables is helpful. These are critical elements in many studies. 

A dependent variable is an element in a scientific experiment or study. This is the thing you’re measuring or observing to see how it responds to changes in another variable, known as the independent variable. 

In other words, the dependent variable's value hinges on the independent variable's value. An independent variable is an element in an experiment that the researchers manipulate. 

Imagine you’re evaluating the effectiveness of a tutoring program on students' math scores. You may choose the quality or length of the tutoring program as the independent variable and the students' scores as the dependent variable. 

This type of experiment usually has a control group and a treatment group. 

The treatment group is the group receiving the treatment, which would be tutoring in this example. It could also be a medication or type of therapy. The control group does not receive the treatment, which is necessary to measure the treatment’s effectiveness.

  • What is a confounding variable?

A confounding variable relates to an experiment's dependent and independent variables. Confounding variables can be difficult to see because you usually don't bring them into an experiment deliberately. However, they can affect the outcome.

Returning to the example of a tutoring program, researchers should look for possible confounding variables. A simple confounding variable could be parental involvement. Perhaps the parents are more actively involved in their children’s education, so they signed them up for tutoring. 

If students achieved higher math scores after tutoring, you could argue that the involved parents played a role rather than the tutoring program. 

Why do confounding variables matter?

So, why are confounding variables problematic? Confounders can influence the conclusions of an experiment in ways that the researchers do not intend. 

Confounding variables can create false associations. A researcher may believe that Variable A leads to Conclusion B, but it’s actually the confounder causing the change.

Experimenters look for ways to reduce or eliminate confounding variables, as they can make it challenging to arrive at clear conclusions. 

If you were studying the tutoring program, you'd have to assign students to the program randomly rather than let their parents sign them up. This effectively eliminates the confounding variable of parental involvement. 

  • How can confounding variables harm research?

Confounding variables can seem rather abstract. However, they have important implications for research. There are several documented cases of confounding variables causing doubt or invalidating well-known studies. Let's look at some examples. 

Relationship between alcohol consumption and lung cancer

A study found that drinking alcohol more than doubles the risk of lung cancer. However, this study did not consider a confounding variable: A high percentage of people who consume alcohol also smoke. 

As we know, scientists have ascertained that smoking increases the risk of lung cancer. So, once researchers controlled studies for smoking, they discovered the link between alcohol and lung cancer didn’t exist. 

Does obesity lower the risk of death for heart patients?

One study found that people with obesity have a lower risk of mortality after a heart attack . In this case, researchers ignored age as the confounding factor. 

Heart disease is more prevalent among older people. Meanwhile, obesity is associated with a shorter life span overall. Also, people with obesity and heart disease tend to be younger than those who aren't obese. 

Once researchers controlled the studies for age, the association between obesity and higher survival disappeared.

Requirements for confounding variables

For something to be considered a confounding variable, it must meet two criteria: 

It must correlate with the independent variable 

It must causally relate to the dependent variable

If we use the example of correlating alcohol consumption with lung cancer, the consumption of alcohol is the independent variable, while cancer rates are the dependent variable. 

As we saw, smoking is a confounder, as it meets both conditions: There's a correlation between smoking and consuming alcohol, and it’s causally related to lung cancer. 

  • Confounding variables vs. selection bias

People sometimes confuse confounding variables with selection bias, another factor that often invalidates research. While these two can appear similar, they are not the same. 

What is bias in research?

Some type of bias is common in all experiments and research. Several types of bias may come from researchers or study participants. 

Bias is a significant issue in the social sciences, where attitudes and demographics (e.g., race, income, gender) can play a significant role in experiments, polls, and survey results. 

However, bias can also occur in medicine and the hard (natural) sciences, such as astronomy and meteorology. 

The following are some common types of bias.

Selection bias

Selection bias relates to the people you choose for research. It’s a common issue when researchers conduct studies at colleges and universities, where the population tends to be young and middle class. If you surveyed the reading habits of college students, you could not generalize this and assume it's true of the population in general. 

Similar issues can occur with medical research. If you studied the effectiveness of a flu vaccine, a young population could distort the results, as older adults are more likely to experience severe flu symptoms. 

Data or measurement collecting bias

This involves the data collection methods researchers use. Are you thinking of calling people on landlines to conduct a survey? As most younger people no longer use landlines, your collection method would skew your data. Collecting data on a website or via email excludes people who don't have regular internet access, including unsheltered people and older adults. 

Procedural bias

This bias occurs if you compel people to participate in research or pressure them to answer questions quickly. For example, if a company forces its employees to complete a survey during their break or lunch hour, they may fill out a form as fast as possible. 

Why confounding is different from bias

Qualifying Health uses the above example of the possible correlation between alcohol use disorder and cancer to illustrate how bias and confounding differ. As we saw, smoking is the confounding factor in this correlation, as many with alcohol use disorder also smoke. 

However, studying people who smoke and have alcoholism is not a type of selection bias as long as participants in the research are not disproportionately smokers. 

For example, Statista reports that about twice as many men as women use tobacco products. A study that disproportionately targets men would be a type of selection bias. 

As the article in Qualifying Health summarizes it, bias leads to false conclusions because the research hasn't used the correct sample type. With confounding, on the other hand, the correlation between variables is real but not necessarily causal. For example, people with alcohol use disorder have higher than average cancer rates, but it's unclear whether alcohol is the cause.

  • Confounding variables and Simpson's Paradox

Simpson's Paradox is an issue that can perplex researchers, producing results that seem contradictory. It occurs when a trend that appears in separate groups (e.g., men and women) reverses or disappears once the groups' data are combined.

Simpson's Paradox results from confounding variables you’ve not identified. 

A famous example of Simpson's Paradox occurred when the University of California at Berkeley was sued for gender discrimination . Acceptance rates revealed that 44% of male applicants were accepted compared to 35% of female applicants. However, when researchers broke data down into separate departments at Berkeley, they found the acceptance rate for women was equal to or higher than for men in most cases. 

How can we explain these contradictory results? Looking at the data in isolation, the university appears more likely to accept men. However, women applied disproportionately to departments with low acceptance rates, while men typically applied to departments with high acceptance rates. 

Once we consider the confounding variable of the specific department applications, the data tells a different story. 
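
The reversal is easy to reproduce. The sketch below uses invented admissions numbers (not Berkeley's actual figures) to show how each department can favour women while the aggregate appears to favour men:

```python
import pandas as pd

# Invented data illustrating Simpson's Paradox (not Berkeley's figures)
data = pd.DataFrame({
    "dept":     ["A", "A", "B", "B"],
    "gender":   ["men", "women", "men", "women"],
    "applied":  [80, 20, 20, 80],
    "accepted": [50, 15, 5, 24],
})

# Within each department, women have the higher acceptance rate...
by_dept = data.assign(rate=data["accepted"] / data["applied"])
print(by_dept[["dept", "gender", "rate"]])  # A: 0.625 vs 0.75, B: 0.25 vs 0.30

# ...but aggregating over departments hides the confounder (department
# choice) and reverses the conclusion
overall = data.groupby("gender")[["applied", "accepted"]].sum()
print(overall["accepted"] / overall["applied"])  # men: 0.55, women: 0.39
```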

  • How to reduce the impact of confounding variables

Several techniques minimize the effect of confounding variables in research.

Randomization

Randomization works well for studies such as clinical trials for medications, as each subject should have an equal chance of receiving a particular treatment. This means a confounding variable is more likely to distribute across the groups evenly. 

For example, if a study was comparing the effectiveness of two drugs, researchers could randomly give their subjects Drug A, Drug B, or a placebo. 

Matching

Matching can be effective when you know your study’s confounding variables. You match an equal number of participants exposed or not exposed to the confounding variable. Studies often use siblings because they have similar genetics and family backgrounds. Twins are especially useful as they share identical genes.

If you studied the effects of a specific behavior, such as smoking, you could compare a set of twins where one twin was a smoker and the other a non-smoker. This type of study eliminates confounding factors such as age differences, economic disparity, geography, and others. One drawback of matching is that it will only work if you are aware of the confounding variables.

Restricting enrollment

One of the simplest ways to control for confounding variables is to limit study enrollment to people equally affected by the confounder. 

For example, many medical conditions disproportionately affect older people. Age would therefore be a confounding variable if a study included people of all different ages. To prevent this, you can confine enrollment to older participants. 

The main drawbacks of restriction are that you must be aware of the confounders and the subjects' status relative to the confounder. Some study participants may have unknown health conditions or may not be truthful about their behavior. 

With factors such as age, the range may be too large to eliminate confounders, although you could confine a study to people over 65. However, if you enroll people aged 65–80, the risk factors could still vary significantly within that range. 

Include confounders as control variables

Controlling for confounders is possible by using them as control variables when performing regression analysis. With regression analysis, you include all the variables that impact the study, including dependent and independent variables. If you can identify confounding variables, you should also take these into account. 

The main challenge is to identify as many potential confounders as possible.

Imagine you’re studying the correlation between soda and obesity. Many potential confounders could affect the results, such as age, gender, other health conditions, and participants' overall diet. By considering these, you can avoid them influencing the results unduly. 

How do you identify a confounding variable?

While researchers receive training to account for confounders, the main challenge is identifying them in the first place. One of the best ways to identify confounders is to study previous research on similar topics, which may have identified relevant confounders. 

It's also helpful to carefully consider differences in participants in a study. Quantifying differences in age, income, and behavior rather than reducing everything to simple binaries can control confounders. 

Grouping people by age can be imprecise if the ranges are too large. For experiments focusing on the effects of food, drink, and drugs, learning how much participants consume may be very significant to the results. Simply asking if participants consume soda or not leaves a large margin for error: Some people drink a few ounces a month, while others drink gallons a week.

  • Don’t overlook confounding variables

Researchers need to be aware of confounding variables to ensure their experiments are valid. It also prevents them from coming to incorrect conclusions, as confounders can cause you to attribute causation to the wrong factors. To ensure this doesn't happen, uncover as many potential confounders as possible and take the appropriate precautions.

Should you be using a customer insights hub?

Do you want to discover previous research faster?

Do you share your research findings with others?

Do you analyze research data?

Start for free today, add your research, and get to key insights faster

Editor’s picks

Last updated: 18 April 2023

Last updated: 27 February 2023

Last updated: 6 February 2023

Last updated: 5 February 2023

Last updated: 16 April 2023

Last updated: 9 March 2023

Last updated: 30 April 2024

Last updated: 12 December 2023

Last updated: 11 March 2024

Last updated: 4 July 2024

Last updated: 6 March 2024

Last updated: 5 March 2024

Last updated: 13 May 2024

Latest articles

Related topics, .css-je19u9{-webkit-align-items:flex-end;-webkit-box-align:flex-end;-ms-flex-align:flex-end;align-items:flex-end;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;-webkit-box-flex-wrap:wrap;-webkit-flex-wrap:wrap;-ms-flex-wrap:wrap;flex-wrap:wrap;-webkit-box-pack:center;-ms-flex-pack:center;-webkit-justify-content:center;justify-content:center;row-gap:0;text-align:center;max-width:671px;}@media (max-width: 1079px){.css-je19u9{max-width:400px;}.css-je19u9>span{white-space:pre;}}@media (max-width: 799px){.css-je19u9{max-width:400px;}.css-je19u9>span{white-space:pre;}} decide what to .css-1kiodld{max-height:56px;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}@media (max-width: 1079px){.css-1kiodld{display:none;}} build next, decide what to build next, log in or sign up.

Get started for free


How to control confounding effects by statistical analysis

Mohamad Amin Pourhoseingholi

1 Department of Biostatistics, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ahmad Reza Baghestani

2 Department of Mathematics, Islamic Azad University - South Tehran Branch, Iran

Mohsen Vahedi

3 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran

A confounder is a variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship. There are various ways to exclude or control confounding variables, including randomization, restriction and matching. But all these methods are applicable at the time of study design. When experimental designs are premature, impractical, or impossible, researchers must rely on statistical methods to adjust for potentially confounding effects. Statistical models (especially regression models) are flexible enough to eliminate the effects of confounders.

Introduction

Confounding variables or confounders are often defined as variables that correlate (positively or negatively) with both the dependent variable and the independent variable (1). A confounder is an extraneous variable whose presence affects the variables being studied, so that the results do not reflect the actual relationship between the variables under study.

The aim of major epidemiological studies is to search for the causes of diseases, based on associations with various risk factors. There may also be other factors that are associated with the exposure and affect the risk of developing the disease, and they will distort the observed association between the disease and the exposure under study. A hypothetical example would be a study of the relation between coffee drinking and lung cancer. If people who entered the study as coffee drinkers were also more likely to be cigarette smokers, and the study measured only coffee drinking but not smoking, the results might seem to show that coffee drinking increases the risk of lung cancer, which may not be true. However, if a confounding factor (in this example, smoking) is recognized, adjustments can be made in the study design or data analysis so that the effect of the confounder is removed from the final results. Simpson's paradox is another classic example of confounding (2): it refers to the reversal of the direction of an association when data from several groups are combined to form a single group.

Researchers therefore need to account for these variables, either through experimental design before data gathering, or through statistical analysis afterwards. In this case the researchers are said to account for their effects to avoid a false positive (Type I) error: a false conclusion that the dependent variable is in a causal relationship with the independent variable. Thus, confounding is a major threat to the validity of inferences made about cause and effect (internal validity). There are various ways to modify a study design to actively exclude or control confounding variables (3), including randomization, restriction and matching.

Randomization is the random assignment of study subjects to exposure categories, breaking any links between exposure and confounders. This reduces the potential for confounding by generating groups that are fairly comparable with respect to known and unknown confounding variables.

Restriction eliminates variation in the confounder (for example, if an investigator only selects subjects of the same age or same sex, then the study eliminates confounding by sex or age group). Matching involves selection of a comparison group with respect to the distribution of one or more potential confounders.

Matching is commonly used in case-control studies (for example, if age and sex are the matching variables, then a 45 year old male case is matched to a male control of the same age).

But all these methods mentioned above are applicable at the time of study design and before the process of data gathering. When experimental designs are premature, impractical, or impossible, researchers must rely on statistical methods to adjust for potentially confounding effects ( 4 ).

Statistical Analysis to eliminate confounding effects

Unlike selection or information bias, confounding is one type of bias that can be adjusted for after data gathering, using statistical models. To control for confounding in the analyses, investigators should measure the confounders in the study. Researchers usually do this by collecting data on all known, previously identified confounders. There are essentially two options for dealing with confounders at the analysis stage: stratification and multivariate methods.

1. Stratification

The objective of stratification is to fix the level of the confounders and produce groups within which the confounder does not vary, and then to evaluate the exposure-outcome association within each stratum of the confounder. Within each stratum, the confounder cannot confound, because it does not vary across the exposure-outcome groups.

After stratification, the Mantel-Haenszel (M-H) estimator can be employed to provide an adjusted result across strata. If there is a difference between the crude result and the adjusted result (produced from the strata), confounding is likely. If the crude result does not differ from the adjusted result, then confounding is unlikely.

2. Multivariate Models

Stratified analysis works best when there are not a lot of strata and only one or two confounders have to be controlled. If the number of potential confounders or the number of their levels is large, multivariate analysis offers the only solution.

Multivariate models can handle large numbers of covariates (including confounders) simultaneously. For example, in a study that aimed to measure the relation between body mass index and dyspepsia, one could control for other covariates such as age, sex, smoking, alcohol, and ethnicity in the same model.

2.1. Logistic Regression

Logistic regression is a mathematical model that produces results that can be interpreted as an odds ratio, and it is easy to run in any statistical package. The special thing about logistic regression is that it can control for numerous confounders (if the sample size is large enough). Thus logistic regression yields an odds ratio that is controlled for multiple confounders. This is known as the adjusted odds ratio, because its value has been adjusted for the other covariates (including confounders).
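
As a rough sketch of how an adjusted odds ratio can be obtained in practice, here is a Python example using statsmodels on simulated data; the variable names and effect sizes are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000

# Hypothetical confounder (say, smoking) driving exposure and outcome
smoking = rng.binomial(1, 0.4, n)
exposure = rng.binomial(1, 0.2 + 0.5 * smoking)
outcome = rng.binomial(1, 0.05 + 0.25 * smoking)  # no true exposure effect

# Logistic regression with the confounder included as a covariate
X = sm.add_constant(np.column_stack([exposure, smoking]))
model = sm.Logit(outcome, X).fit(disp=0)

# exp(coefficient) is the adjusted odds ratio for the exposure
print(round(np.exp(model.params[1]), 2))  # close to 1 after adjustment
```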

2.2. Linear Regression

Linear regression analysis is another statistical model that can be used to examine the association between multiple covariates and a numeric outcome. This model can be employed as a multiple linear regression to see through confounding and isolate the relationship of interest (5). For example, in research on the relationship between LDL cholesterol level and age, multiple linear regression lets you answer the question: how does LDL level vary with age, after accounting for blood sugar and lipids (the confounding factors)? In multiple linear regression (as with logistic regression), investigators can include many covariates at one time. The process of accounting for covariates is also called adjustment, and comparing the results of simple and multiple linear regressions clarifies how much the confounders in the model distort the relationship between exposure and outcome.

2.3. Analysis of Covariance

The Analysis of Covariance (ANCOVA) is a type of Analysis of Variance (ANOVA) that is used to control for potential confounding variables. ANCOVA is a statistical linear model with a continuous outcome variable (quantitative, scaled) and two or more predictor variables where at least one is continuous (quantitative, scaled) and at least one is categorical (nominal, non-scaled). ANCOVA is a combination of ANOVA and linear regression. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative covariates (confounders) account. The inclusion of this analysis can increase the statistical power.
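
A minimal ANCOVA sketch using statsmodels' formula interface; the data and variable names are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
n = 120

# Hypothetical data: a categorical treatment group, a continuous
# covariate (the potential confounder), and a continuous outcome
df = pd.DataFrame({
    "group": rng.choice(["control", "treated"], size=n),
    "age":   rng.uniform(20, 70, size=n),
})
df["outcome"] = (0.1 * df["age"]
                 + 1.5 * (df["group"] == "treated")
                 + rng.normal(size=n))

# ANCOVA: test the group effect after removing the variance that the
# quantitative covariate (age) accounts for
model = smf.ols("outcome ~ C(group) + age", data=df).fit()
print(anova_lm(model, typ=2))
```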


Practical example

Suppose that, in a cross-sectional study, we are examining the relation between infection with Helicobacter pylori (HP) and dyspepsia symptoms. The study was conducted on 550 persons with positive HP and 440 persons without HP. The results appear in the 2×2 crude table (Table 1), which indicates that the relation between HP infection and dyspepsia is a reverse association (OR = 0.60, 95% CI: 0.42-0.94). Now suppose that weight can be a potential confounder in this study. So we break the crude table down into two strata according to the weight of subjects (normal weight or overweight) and then calculate ORs for each stratum. If the stratum-specific ORs are similar to the crude OR, there is no potential impact from confounding factors. In this example, the ORs differ between strata (for the normal weight group OR = 0.80, 95% CI: 0.38-1.69, and for the overweight group OR = 1.60, 95% CI: 0.79-3.27).

Table 1. The crude contingency table of association between H. pylori and dyspepsia:

                Dyspepsia (positive)    Dyspepsia (negative)
HP positive     50                      500
HP negative     60                      380

Table 2. The contingency table of association between H. pylori and dyspepsia for persons in the normal weight group:

                Dyspepsia (positive)    Dyspepsia (negative)
HP positive     10                      50
HP negative     50                      200

Table 3. The contingency table of association between H. pylori and dyspepsia for persons in the overweight group:

                Dyspepsia (positive)    Dyspepsia (negative)
HP positive     40                      450
HP negative     10                      180

This shows that there is a potential confounding effect, introduced by weight, in this study. This example is a type of Simpson's paradox; therefore the crude OR is not justified for this study. We calculated the Mantel-Haenszel (M-H) estimator as an alternative statistical analysis to remove the confounding effect (OR = 1.16, 95% CI: 0.71-1.90). A logistic regression model (in which weight is included in the multiple model) can also be used to control for the confounder; its result is similar to the M-H estimator (OR = 1.15, 95% CI: 0.71-1.89).
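
The stratum-specific and Mantel-Haenszel odds ratios can be reproduced directly from the tables above. Here is a sketch using statsmodels' contingency-table tools (note that the crude OR computed from Table 1 comes out near 0.63 rather than the 0.60 quoted):

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable, Table2x2

# 2x2 tables from Tables 1-3: rows = HP positive / HP negative,
# columns = dyspepsia positive / dyspepsia negative
crude = np.array([[50, 500], [60, 380]])
normal_weight = np.array([[10, 50], [50, 200]])
overweight = np.array([[40, 450], [10, 180]])

print(Table2x2(crude).oddsratio)          # ~0.63 (crude, confounded)
print(Table2x2(normal_weight).oddsratio)  # 0.80
print(Table2x2(overweight).oddsratio)     # 1.60

# Mantel-Haenszel pooled odds ratio across the weight strata
mh = StratifiedTable([normal_weight, overweight])
print(mh.oddsratio_pooled)                # ~1.16, the adjusted result
```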

The results of this example clearly indicate that if the impact of confounders is not accounted for in the analysis, the results can mislead researchers into unjustified conclusions.

Confounders are common causes of both treatment/exposure and of response/outcome. Confounding is better taken care of by randomization at the design stage of the research ( 6 ).

A successful randomization minimizes confounding by unmeasured as well as measured factors, whereas statistical control addresses only measured confounders and can introduce confounding through inappropriate control (7-9).

Confounding can persist, even after adjustment. In many studies, confounders are not adjusted for because they were not measured during the process of data gathering. In some situations, confounder variables are measured with error, or their categories are improperly defined (for example, age categories too coarse to capture age's confounding nature) (10). There is also the possibility that variables controlled as confounders were actually not confounders.

Before applying a statistical correction method, one has to decide which factors are confounders. This is sometimes a complex issue (11-13). Common strategies for deciding whether a variable is a confounder that should be adjusted for rely mostly on statistical criteria. The research strategy should instead be based on knowledge of the field and on a conceptual framework and causal model, so expert judgment should be involved in evaluating confounders. Statistical models (especially regression models) are a flexible way of investigating the separate or joint effects of several risk factors for disease or ill health (14). But researchers should note that wrong assumptions about the form of the relationship between confounder and disease can lead to wrong conclusions about exposure effects too.

( Please cite as: Pourhoseingholi MA, Baghestani AR, Vahedi M. How to control confounding effects by statistical analysis. Gastroenterol Hepatol Bed Bench 2012;5(2):79-83.)

5 Real-World Examples of Confounding [With References]

An association between 2 variables X and Y cannot be interpreted as causal if it can be attributed to an alternative mechanism.

Confounding is an example of such a mechanism: it alters the relationship between X and Y and therefore leads to an over- or underestimation of the true effect between them.

In its simplest form, it is due to a third variable — the confounder C — that represents a common cause:

Confounding representation in a causal directed acyclic graph

In order to uncover the true relationship between X and Y, we can use statistical techniques to control/adjust for that confounder. [If you are interested, I suggest: 7 Different Ways to Control for Confounding ]

Here’s a list of 5 real-world examples where confounding explains part of, or the entire, relationship between 2 variables:

Example 1: Confounding by smoking

Description:

Alcohol consumption is associated with a higher risk of lung cancer . This association is non-causal; it is due to the confounding effect of smoking .

Visual representation:

Causal diagram representing the confounding effect of smoking on alcohol and lung cancer

Explanation:

Alcohol consumers also tend to be smokers. And since smoking is a known cause of lung cancer, this explains the high prevalence of lung cancer in the group of alcohol consumers.

Real-world evidence:

Zang and Wynder have shown that alcohol consumption increases the odds of lung cancer by 2.4 times. However, this effect disappeared when they controlled for smoking.

Example 2: Confounding by heart disease

Low blood pressure is associated with a higher risk of mortality . This association is non-causal; it is due to the confounding effect of heart disease .

Causal diagram representing the confounding effect of heart disease on low blood pressure and mortality

A low blood pressure is, in many cases, a marker of a more serious heart disease, which explains the high risk of mortality in the group of people with low blood pressure.

Busby and colleagues demonstrated that among elderly people, those with low diastolic blood pressure have almost twice the risk of dying than those with normal blood pressure. This effect disappeared when they controlled for confounding factors such as the presence of heart disease.

Example 3: Confounding by severity

The administration route of corticosteroids for the treatment of asthma is associated with the risk of hospitalization . This association is non-causal; it is due to the confounding effect of asthma severity .

Causal diagram representing confounding by severity

Severe cases of asthma require intravenous treatment with corticosteroids. These patients also have a higher risk of being hospitalized, not because they were treated intravenously, but because they have a more serious illness.

Mild cases of asthma tend to be treated orally. Also, these patients are less likely to be hospitalized.

This explains the higher risk of hospitalization in the group of intravenously treated patients.

An observational study by Clark and colleagues reported that asthmatic patients treated with intravenous corticosteroids had 2.6 times the odds of being hospitalized compared with orally treated patients.

This effect disappeared after controlling for confounding by severity, among other factors, through the use of randomized controlled trials. In fact, a meta-analysis by Rowe and colleagues found that intravenous and oral corticosteroids are equally effective for treating asthma.

Example 4: Confounding by indication

Acetaminophen use is associated with a higher risk of mortality . This association is non-causal; it is due to the confounding effect of a serious disease such as cancer.

Causal diagram representing confounding by indication

Serious illnesses such as cancer require analgesic treatment, of which acetaminophen is one. So, patients who use acetaminophen will have a higher mortality rate. But this is due to their underlying illness and not to the use of acetaminophen itself.

Lipworth and colleagues conducted a cohort study where they followed 49,890 persons who were prescribed acetaminophen. They showed that their mortality rate was nearly doubled during the first year of follow-up (i.e. during the first year after acetaminophen indication), before dropping in later years.

This just reflects the fact that analgesics, such as acetaminophen, may be prescribed for people with serious illness:

  • Those who had a terminal illness died within the first year of follow-up — this explains the high mortality rate during the first year.
  • Those who did not die within the first year of follow-up probably had some minor issue that required an acetaminophen prescription — this explains the drop in the mortality rate after 1 year.

In this case, to eliminate the effect of confounding by indication, we can only use the mortality rate after 1 year of acetaminophen indication. This is called applying a lag-time to the exposure (you can learn more about it in this article on protopathic bias ).
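
As a rough sketch, applying a lag-time can be as simple as excluding events that fall inside the lag window; the data below are invented for illustration:

```python
import pandas as pd

# Hypothetical follow-up data: time (in years) from acetaminophen
# prescription to death or end of follow-up
df = pd.DataFrame({
    "patient_id":     [1, 2, 3, 4],
    "years_to_event": [0.4, 2.5, 0.8, 3.1],
    "died":           [1, 1, 0, 1],
})

# Apply a 1-year lag to the exposure: ignore events in the first year,
# when deaths mostly reflect the illness that prompted the prescription
lag_years = 1.0
lagged = df[df["years_to_event"] > lag_years]
mortality_rate = lagged["died"].mean()
```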

Example 5: Confounding by age

In patients with heart failure, obesity is associated with a lower risk of mortality after a heart attack. This association is non-causal (i.e. does not provide evidence that obesity protects against death from heart attacks), it is due to the confounding effect of age .

Causal diagram representing the confounding effect of age on obesity and mortality

Heart disease is more prevalent in the elderly population. In this subgroup, obesity is less prevalent among those over 75 years (probably due to the shorter life expectancy among those with obesity [ source ]). As a consequence, the average obese patient with heart failure will be younger than the average non-obese patient with heart failure.

So, obese patients will have a lower mortality risk, not because of the protecting effect of obesity itself, but because of their younger age.

Wu and colleagues found that obese patients with heart failure have 18% less risk of mortality after a heart attack. This association disappears after controlling for age.

Further reading

  • An Example of Identifying and Adjusting for Confounding
  • 4 Simple Ways to Identify Confounding
  • 7 Different Ways to Control for Confounding
  • Front-Door Criterion to Adjust for Unmeasured Confounding
  • Why Confounding is Not a Type of Bias


Extraneous Variables | Examples, Types & Controls

Published on April 2, 2021 by Pritha Bhandari. Revised on June 22, 2023.

In an experiment , an extraneous variable is any variable that you’re not investigating that can potentially affect the outcomes of your research study.

If left uncontrolled, extraneous variables can lead to inaccurate conclusions about the relationship between independent and dependent variables. They can also introduce a variety of research biases to your work, particularly selection bias.

Example research questions where extraneous variables lurk:

  • Is memory capacity related to test performance?
  • Does sleep deprivation affect driving ability?
  • Does light exposure improve learning ability in mice?

Why do extraneous variables matter?

Extraneous variables can threaten the internal validity of your study by providing alternative explanations for your results.

When not accounted for, this type of variable can also introduce many biases to your research, particularly types of selection bias such as:

  • Sampling bias or ascertainment bias : when some members of the intended population are less likely to be included than others.
  • Attrition bias : when participants who drop out of a study are systematically different from those who stay.
  • Survivorship bias : when researchers draw conclusions by only focusing on examples of successful individuals (the “survivors”) rather than the group as a whole.
  • Nonresponse bias : when people who don’t respond to a survey are different in significant ways from those who do.
  • Undercoverage bias : when some members of your population are not represented in the sample.

In an experiment , you manipulate an independent variable to study its effects on a dependent variable.

You recruit students from a university to participate in the study. You manipulate the independent variable by splitting participants into two groups:

  • Participants in the experimental group are asked to wear a lab coat during the study.
  • Participants in the control group are asked to wear a casual coat during the study.

When extraneous variables are uncontrolled, it’s hard to determine the exact effects of the independent variable on the dependent variable, because the effects of extraneous variables may mask them.

Uncontrolled extraneous variables can also make it seem as though there is a true effect of the independent variable in an experiment when there’s actually none.

  • Participant’s major (e.g., STEM or humanities)
  • Participant’s interest in science
  • Demographic variables such as gender or educational background
  • Time of day of testing
  • Experiment environment or setting

Controlling extraneous variables is an important aspect of experimental design . When you control an extraneous variable, you turn it into a control variable .

Extraneous vs. confounding variables

A confounding variable is a type of extraneous variable that is associated with both the independent and dependent variables.

  • An extraneous variable is anything that could influence the dependent variable.
  • A confounding variable influences the dependent variable, and also correlates with or causally affects the independent variable.

In a conceptual framework diagram, you can draw an arrow from a confounder to the independent variable as well as to the dependent variable. You can draw an arrow from extraneous variables to a dependent variable.

Extraneous vs confounding variables

People who work in labs would regularly wear lab coats and may have higher scientific knowledge in general. Therefore, it’s unlikely that your manipulation will increase scientific reasoning abilities for these participants.

Variables that only impact on scientific reasoning are extraneous variables. These include participants’ interests in science and undergraduate majors. While interest in science may affect scientific reasoning ability, it’s not necessarily related to wearing a lab coat.

Example of extraneous vs confounding variables

Demand characteristics

Demand characteristics are cues that encourage participants to conform to researchers’ behavioral expectations.

Sometimes, participants can infer the intentions behind a research study from the materials or experimental settings, and use these hints to act in ways that are consistent with study hypotheses. These demand characteristics can bias the study outcomes and reduce the external validity , or generalizability , of the results.

You can avoid demand characteristics by making it difficult for participants to guess the aim of your study. Ask participants to perform unrelated filler tasks or fill out plausibly relevant surveys to lead them away from the true nature of the study.

Experimenter effects

Experimenter effects are unintentional actions by researchers that can influence study outcomes.

There are two main types of experimenter effects:

  • Experimenters’ interactions with participants can unintentionally affect their behaviours.
  • Errors in measurement, observation, analysis, or interpretation may change the study results.

To avoid experimenter effects, you can implement masking (blinding) to hide the condition assignment from participants and experimenters. In a double-blind study, researchers won’t be able to bias participants towards acting in expected ways or selectively interpret results to suit their hypotheses .

Situational variables

Situational variables, such as lighting or temperature, can alter participants’ behaviors in study environments. These factors are sources of random error or random variation in your measurements.

To understand the true relationship between independent and dependent variables, you’ll need to reduce or eliminate the effect of situational factors on your study outcomes.

To avoid situational variables from influencing study outcomes, it’s best to hold variables constant throughout the study or statistically account for them in your analyses.

Participant variables

A participant variable is any characteristic or aspect of a participant’s background that could affect study results, even though it’s not the focus of an experiment.

Participant variables can include sex, gender identity, age, educational attainment, marital status, religious affiliation, etc.

Since these individual differences between participants may lead to different outcomes, it’s important to measure and analyze these variables.

To control participant variables, you should aim to use random assignment to divide your sample into control and experimental groups. Random assignment makes your groups comparable by evenly distributing participant characteristics between them.


An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

There are 4 main types of extraneous variables :

  • Demand characteristics : environmental cues that encourage participants to conform to researchers’ expectations.
  • Experimenter effects : unintentional actions by researchers that influence study outcomes.
  • Situational variables : environmental variables that alter participants’ behaviors.
  • Participant variables : any characteristic or aspect of a participant’s background that could affect study results.

Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity .

If you don’t control relevant extraneous variables , they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable .

“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.

Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs . That way, you can isolate the control variable’s effects from the relationship between the variables of interest.



Famous, easy-to-understand examples of a confounding variable invalidating a study

Are there any well-known statistical studies that were originally published and thought to be valid, but later had to be thrown out due to a confounding variable that wasn't taken into account? I'm looking for something easy to understand that could be explained to and appreciated by a quantitative literacy class with no prerequisites.

  • experiment-design
  • confounding
  • observational-study
  • What is the difference between "quantitative literacy" and "numeracy" as a state of mind? – Henry (Oct 23, 2019)
  • The Stanford Marshmallow Experiment is the first one that came to my mind. – Apollys supports Monica (Oct 25, 2019)
  • I think of all the public opinion polls that assume no correlation between a person's opinion and their willingness to answer questions from strangers (or their willingness to tolerate a phone call interrupting whatever). – WGroleau (Oct 26, 2019)
  • It's just the preferred terminology. But I guess really to me numeracy means having a great grasp of number sense itself, whereas in our QL class we try to teach a smattering of lots of things, but probably none of them too well (such as statistics and logic). – NathanLite (Oct 28, 2019)

12 Answers

Coffee drinking & lung cancer.

My favorite example is that supposedly, "coffee drinkers have a greater risk of lung cancer", despite most coffee drinkers... well... drinking coffee, rather than inhaling it.

There have been various studies about this, but the consensus remains that studies reaching this conclusion usually just have a larger proportion of smokers among the coffee drinkers than among the non-drinkers. In other words, the effect of smoking confounds the effect of coffee consumption if it is not included in the model. The most recent article on this I could find is a meta-analysis by Vania Galarraga and Paolo Boffetta (2016). $^\dagger$
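A toy simulation (all numbers invented, not real epidemiology) makes the mechanism easy to show a class: when smoking raises both coffee drinking and cancer risk, coffee looks dangerous in the crude comparison but harmless within each smoking stratum:

```python
# Toy simulation of smoking confounding a coffee-cancer association.
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
smoker = rng.random(n) < 0.3
# Smokers are assumed to drink coffee more often; coffee itself
# has NO effect on cancer risk in this simulation.
coffee = rng.random(n) < np.where(smoker, 0.8, 0.4)
cancer = rng.random(n) < np.where(smoker, 0.05, 0.005)

def rate(mask):
    return cancer[mask].mean()

print(f"crude: coffee {rate(coffee):.4f} vs no coffee {rate(~coffee):.4f}")
print(f"smokers: {rate(coffee & smoker):.4f} vs {rate(~coffee & smoker):.4f}")
print(f"non-smokers: {rate(coffee & ~smoker):.4f} vs {rate(~coffee & ~smoker):.4f}")
# The crude comparison makes coffee look risky; within each
# smoking stratum the coffee "effect" disappears.
```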

The Obesity Paradox

Another example that plagues clinical research is the claim that obesity can be beneficial for certain diseases. Specifically, many articles, still to this day (just do a quick search for "obesity paradox" on PubMed and be amazed), claim the following:

  • While a higher BMI increases the risk of diabetes, cardiovascular disease and certain types of cancer, once a patient already has the disease, a higher BMI is associated with lower rates of major adverse events or death.

Why does this happen? Obesity is defined as excess fat negatively affecting health, yet we classify obesity based on BMI. BMI is just calculated as:

$$\text{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2},$$

so the most direct way to combat obesity is through weight loss (or by growing taller somehow).

Regimens that focus on loss of weight rather than fat tend to result in a proportionally large loss of muscle. This is likely what causes lower BMI to be associated with a higher rate of major adverse events.

Because many studies do not include measures of body fat (percentage), but only BMI as a proxy, the amount of body fat confounds the effect of BMI on health.
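The same point can be made with a toy simulation (every quantity below is invented for illustration): body fat raises event risk and muscle lowers it, but a BMI-style index lumps the two together, so the low-BMI half of the sample, dominated here by people with less muscle, shows more events:

```python
# Toy sketch of the "obesity paradox" as a body-composition artefact.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
fat = rng.normal(25, 3, n)      # kg body fat (invented)
muscle = rng.normal(30, 8, n)   # kg muscle; varies more, e.g. illness-driven loss
bmi_proxy = (fat + muscle) / 3  # crude stand-in for BMI

# Fat raises event risk, muscle lowers it (invented effect sizes).
risk = np.clip(0.05 + 0.002 * fat - 0.002 * muscle, 0, 1)
event = rng.random(n) < risk

low_bmi = bmi_proxy < np.median(bmi_proxy)
print(f"event rate, low-BMI half:  {event[low_bmi].mean():.4f}")
print(f"event rate, high-BMI half: {event[~low_bmi].mean():.4f}")
# Lower BMI comes out associated with MORE events, even though fat
# itself is harmful: the paradox comes from BMI hiding composition.
```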

A nice review of this phenomenon was written by Steven G. Chrysant (2018). $^\ddagger$ He ends with:

[B]ased on the recent evidence, the obesity paradox is a misnomer and could convey the wrong message to the general public that obesity is not bad.

Followed by:

Journals [should] no longer accept articles about the 'obesity paradox'.

$\dagger$: Galarraga, V., & Boffetta, P. (2016). Coffee drinking and risk of lung cancer: A meta-analysis. Cancer Epidemiology, Biomarkers & Prevention, 25(6), 951–957. DOI: 10.1158/1055-9965.EPI-15-0727

$\ddagger$: Chrysant, S. G. (2018). Obesity is bad regardless of the obesity paradox for hypertension and heart disease. Journal of Clinical Hypertension (Greenwich), 20(5), 842–846. DOI: 10.1111/jch.13281

Examples of (poor) studies claiming to have demonstrated the obesity paradox:

  • McAuley et al. (2018): Exercise Capacity and the Obesity Paradox in Heart Failure: The FIT (Henry Ford Exercise Testing) Project
  • Weatherald et al. (2018): The association between body mass index and obesity with survival in pulmonary arterial hypertension
  • Patel et al. (2018): The obesity paradox: the protective effect of obesity on right ventricular function using echocardiographic strain imaging in patients with pulmonary hypertension

Articles refuting the obesity paradox as a mere confounding effect of body fat:

  • Lin et al. (2017): Impact of Misclassification of Obesity by Body Mass Index on Mortality in Patients With CKD
  • Leggio et al. (2018): High body mass index, healthy metabolic profile and low visceral adipose tissue: The paradox is to call it obesity again
  • Medina-Inojosa et al. (2018): Association Between Adiposity and Lean Mass With Long-Term Cardiovascular Events in Patients With Coronary Artery Disease: No Paradox
  • Flegal & Ioannidis (2018): The Obesity Paradox: A Misleading Term That Should Be Abandoned

Articles about the obesity paradox in cancer:

  • Cespedes et al. (2018): The Obesity Paradox in Cancer: How Important Is Muscle?
  • Caan et al. (2018): The Importance of Body Composition in Explaining the Overweight Paradox in Cancer-Counterpoint
  • Another example I recall from our classes was the correlation of the number of fire engines involved in fighting a fire with the damage of the fire. The confounding variable there was that the bigger the fire, the bigger the damage, but also the more engines were needed to fight the fire. I can't recall how serious those studies were or if it was just a toy example. – JAD (Oct 24, 2019)
  • On the obesity paradox: older subjects with higher BMI suffer fewer broken bones (hips etc.) through falls, it seems. Do they fall less? Are they 'cushioned' by fat (the paradox explanation)? Have they got less atrophied muscles and other health problems, and thus fewer falls? Etc. – user3445853 (Oct 24, 2019)
  • To clarify, is the explanation of the obesity paradox that some people who are diagnosed with a disease try to lose weight, and consequently lose muscle, which has a negative effect, while others who do not try to lose weight, and therefore remain obese, do not suffer this effect? – StackOverthrow (Oct 24, 2019)
  • On the obesity paradox, consider that many cancers cause the patient to lose a lot of weight, and "unexplained weight loss" is one of the worrying symptoms that should lead a person to see a doctor. I'm convinced that this also has something to do with the paradox. – Noctiphobia (Oct 24, 2019)
  • I think I remember reading a "joke" study that determined that in football, being ahead in the 3rd quarter was highly correlated with winning the game. The "takeaway" for coaches was that to win more games, they should score more points than the other team. – emory (Oct 25, 2019)

You might want to introduce Simpson's Paradox.

The first example on that page is the UC Berkeley gender bias case, where it was thought that there was gender bias (towards males) in admissions when looking at overall acceptance rates, but this was eliminated or reversed when investigated by department. The confounding variable of department picked up on a gender difference in applying to more competitive departments.

  • Another takeaway from that should be to investigate whether this tendency (for women, say) to apply to more competitive departments is a more general phenomenon. Anybody have references/links about that? – kjetil b halvorsen ♦ (Dec 1, 2019)
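A toy version of the Berkeley reversal (the numbers below are invented; only the pattern matters): women apply disproportionately to the harder department, so their overall acceptance rate is lower even though it is equal or higher within each department:

```python
# Invented admissions data in the spirit of the Berkeley case.
# department: (men_applied, men_admitted, women_applied, women_admitted)
applications = {
    "easy dept": (800, 480, 100, 65),   # 60% of men, 65% of women admitted
    "hard dept": (200, 40, 900, 200),   # 20% of men, ~22% of women admitted
}

men_app = sum(v[0] for v in applications.values())
men_adm = sum(v[1] for v in applications.values())
women_app = sum(v[2] for v in applications.values())
women_adm = sum(v[3] for v in applications.values())

print(f"overall: men {men_adm / men_app:.1%}, women {women_adm / women_app:.1%}")
for dept, (ma, madm, wa, wadm) in applications.items():
    print(f"{dept}: men {madm / ma:.1%}, women {wadm / wa:.1%}")
# Overall, men are admitted at roughly twice the rate of women,
# yet women do as well or better within every department.
```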

Power Lines and Cancer

After an initial study finding a link between living next to high-voltage transmission lines and cancer, follow-up studies found that when you include income in the model, the effect of the power lines goes away.

Living next to power lines is a moderately accurate predictor of low household income / wealth. Put bluntly, there aren't as many fancy mansions next to transmission lines as elsewhere.

There is a correlation between poverty and cancer. When comparisons were made between households in similar income brackets close to and far away from transmission lines, the effect of transmission lines disappeared.

In this case, the confounding variable was household income/wealth, which correlated with both distance to the nearest high-voltage line and cancer rates.

Background reading.

  • As I recall, the author of these studies was later shown to have outright falsified his data; the results were discredited and he is now prevented from doing research in this field. – meh (Oct 29, 2019)
  • Interesting. Do you have links? – Jason (Oct 29, 2019)
  • @aginensky The original study found correlation (as Jason said), but incorrectly claimed causation without sufficient data or evidence. The link does reference another researcher who did falsify a study in 1992, though (who was later fired in 1999 when the falsification was discovered). – Graham (Dec 2, 2019)

Consider the following examples. I am not sure they are necessarily very famous but they help to demonstrate the potential negative effects of confounding variables.

Say one is studying the relation between birth order (1st child, 2nd child, etc.) and the presence of Down Syndrome in the child. In this scenario, maternal age would be a confounding variable:

  • Higher maternal age is directly associated with Down Syndrome in the child, regardless of birth order (a mother having her 1st vs. 3rd child at age 50 confers the same risk).
  • Maternal age is directly associated with birth order (the 2nd child, except in the case of twins, is born when the mother is older than she was for the birth of the 1st child).
  • Maternal age is not a consequence of birth order (having a 2nd child does not change the mother's age).

More examples

In risk assessments, factors such as age, gender, and educational level often affect health status and so should be controlled. Beyond these factors, researchers may not consider or have access to data on other causal factors. An example is the study of the effects of tobacco smoking on human health. Smoking, drinking alcohol, and diet are related lifestyle activities. A risk assessment that looks at the effects of smoking but does not control for alcohol consumption or diet may overestimate the risk of smoking (Tjønneland, Grønbaek, Stripp, & Overvad, 1999). Smoking and confounding are reviewed in occupational risk assessments, such as the safety of coal mining (Axelson, 1989). When there is not a large sample population of non-smokers or non-drinkers in a particular occupation, the risk assessment may be biased towards finding a negative effect on health.

References: https://en.wikipedia.org/wiki/Confounding

Tjønneland, A., Grønbaek, M., Stripp, C., & Overvad, K. (1999). Wine intake and diet in a random sample of 48763 Danish men and women. The American Journal of Clinical Nutrition, 69(1), 49-54.

Axelson, O. (1989). Confounding from smoking in occupational epidemiology. British Journal of Industrial Medicine, 46(8), 505-507.

There was one study about diet that looked at diets in different countries and concluded that meat caused all sorts of problems (e.g. heart disease), but failed to account for the average lifespan in each country: the countries that ate very little meat also had lower life expectancies, and the problems that meat "caused" were ones linked to age.

I don't have citations for this - I read about it about 25 years ago - but maybe someone will remember or maybe you can find it.

  • Are you sure it wasn't fat rather than meat? The infamous "seven countries study" that started the don't-eat-fat myth not only ignored a few variables but strangely omitted data from many countries. – WGroleau (Oct 26, 2019)
  • I'm sort of sure it was meat. There could easily have been multiple studies. Diet is notorious for bad statistics. – Peter Flom (Oct 26, 2019)

I'm not sure it entirely counts as a confounding variable so much as confounding situations, but animals' abilities to find their way through a maze may qualify.

As described in this ScienceDirect summary, studies of rats (or other animals) in mazes were popular for a large part of the 20th century, and continue today to some extent. One possible purpose is to study the subject's ability to remember a maze which it has previously run; another popular purpose is to study any bias in the subject's choices of whether to turn left or right at junctions, in a maze which the subject has not previously run.

It should be immediately clear that if the subject has forgotten the maze, then any inherent bias in choice of route will be a confounding factor. If the "right" direction coincides with the subject's bias, then they could find their way in spite of not remembering the route.

In addition to this, studies found various other confounding features exist which might not have been considered. The height of walls and width of passages are factors, for example. And if another subject has previously navigated the maze, subjects which rely strongly on their sense of smell (mice and dogs, for instance) may find their way simply by tracking the previous subject's scent. Even the construction of the maze may be an issue - animals tend to be less happy to run over "hollow-sounding" floors.

Many animal maze studies ended up finding confounding factors instead of the intended study results. More disturbingly, according to Richard Feynman, the studies reporting these confounding factors were not picked up by researchers at the time. As a result, we simply don't know whether any animal maze studies carried out around this time have any validity whatsoever. That's decades' worth of high-end research at the finest universities around the world, by the finest psychologists and animal behaviourists, and every last shred of that work had, at best, to be taken with a very large pinch of salt. Later researchers had to go back and duplicate all this work to find out what was actually valid and what wasn't repeatable.

  • If I read your link correctly, it appears that Richard Feynman, who is not a researcher in ethology, animal behaviour or psychology, commented in a talk (not an academic conference or publication of any kind) on some unidentified unpublished "study" that may or may not have shown what he says it showed, and then concluded that researchers in the field do not care about this because they do not talk about "Mr. Young". And that's supposed to be an example of scientific rigor or evidence that researchers in the field do not care about confounding factors in general? – Gala (Oct 26, 2019)
  • @Gala Not that they did not care, but that they did not know. Feynman's point wasn't about scientific rigor from individual scientists; he had no doubt they did the best they could. His point was that the scientific community as a whole was substandard at passing on this important information so that other people don't fall into the same trap. He was using this as a nice hook to hang his lesson on, but there are so many examples of this (Mendel, for instance). – Graham (Oct 31, 2019)

There was a great study of mobile phone use and brain cancer. Most people with a lateral brain cancer, when asked which hand they hold their phone in, answer the diseased side. This seemed to show that phone use caused cancer.

However, maybe the answers are informed by hindsight. Someone thought of a great test for this. The sample was big enough to include some people with two cancers. So you could ask: does the declared side of phone use influence the risk of a cancer on the other side of the brain? It was actually protective, thus showing the hindsight bias in the original result.

Sorry, I don't have the reference.

'Statistics' by Freedman, Pisani, and Purves has a number of examples in the first couple of chapters. My personal favorite is the claim that ice cream causes polio. The confounding variable is that both are prevalent in the summertime, when young children are out, about, and spreading polio. The book is "Statistics" (Fourth Edition) by David Freedman, Robert Pisani, and Roger Purves.

This is not a study, but a gallery of spurious correlations that could be appreciated by a quantitative literacy class. The downside of this is the lack of an explanation (aside from chance).

See: Subversive Subjects: Rule-Breaking and Deception in Clinical Trials https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520402/

Hormone replacement therapy and heart disease?

https://www.teachepi.org/wp-content/uploads/OldTE/documents/courses/bfiles/The%20B%20Files_File1_HRT_Final_Complete.pdf

The benefits were determined by observation, and essentially it appears that the people who chose to undergo HRT had higher socioeconomic status, healthier lifestyles, etc.

(So one could argue about confounding vs. the limits of observational studies.)

There are lots of good examples in Howard Wainer's books. In particular, Chapter 1, "The most dangerous equation", in "Picturing the Uncertain World: How to Understand, Communicate, and Control Uncertainty through Graphical Display".

Examples include:

The small schools movement. People noticed that some small schools had better performance than large schools, so money was spent to reduce school size. It turned out that some small schools also had worse performance than large schools. It was largely an artefact of extreme outcomes showing up in small samples.

Kidney cancer rates (this example is also used in Daniel Kahneman's "Thinking, Fast and Slow"; see the start of Chapter 10). The lowest kidney cancer rates are found in rural, sparsely populated counties. These low rates must be because of the clean-living rural lifestyle. But wait: the counties with the highest incidence of kidney cancer are also rural and sparsely populated. That must be because of the lack of access to good medical care and too much drinking. Of course, the extremes are actually an artefact of the small populations.
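A quick simulation of that artefact (the rate and county sizes are made up): give every county the same true cancer rate, and the small counties still produce both the highest and the lowest observed rates:

```python
# Sketch: extreme observed rates are an artefact of small populations.
import numpy as np

rng = np.random.default_rng(7)
true_rate = 1e-4                 # same true rate everywhere (invented)
n_counties = 5_000
small = rng.binomial(1_000, true_rate, n_counties) / 1_000
large = rng.binomial(1_000_000, true_rate, n_counties) / 1_000_000

print(f"small counties: min {small.min():.6f}, max {small.max():.6f}")
print(f"large counties: min {large.min():.6f}, max {large.max():.6f}")
# Small counties span from zero to many times the true rate;
# large counties all sit close to 0.0001.
```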


