Experimental Design – Types, Methods, Guide
Experimental design is a structured approach used to conduct scientific experiments. It enables researchers to explore cause-and-effect relationships by controlling variables and testing hypotheses. This guide explores the types of experimental designs, common methods, and best practices for planning and conducting experiments.
Experimental Design
Experimental design refers to the process of planning a study to test a hypothesis, where variables are manipulated to observe their effects on outcomes. By carefully controlling conditions, researchers can determine whether specific factors cause changes in a dependent variable.
Key Characteristics of Experimental Design:
- Manipulation of Variables: The researcher intentionally changes one or more independent variables.
- Control of Extraneous Factors: Other variables are kept constant to avoid interference.
- Randomization: Subjects are often randomly assigned to groups to reduce bias.
- Replication: Repeating the experiment or having multiple subjects helps verify results.
Purpose of Experimental Design
The primary purpose of experimental design is to establish causal relationships by controlling for extraneous factors and reducing bias. Experimental designs help:
- Test Hypotheses: Determine if there is a significant effect of independent variables on dependent variables.
- Control Confounding Variables: Minimize the impact of variables that could distort results.
- Generate Reproducible Results: Provide a structured approach that allows other researchers to replicate findings.
Types of Experimental Designs
Experimental designs can vary based on the number of variables, the assignment of participants, and the purpose of the experiment. Here are some common types:
1. Pre-Experimental Designs
These designs are exploratory and lack random assignment, often used when strict control is not feasible. They provide initial insights but are less rigorous in establishing causality.
- Example: A training program is provided, and participants' knowledge is tested afterward, without a pretest.
- Example: A group is tested on reading skills, receives instruction, and is tested again to measure improvement.
2. True Experimental Designs
True experiments involve random assignment of participants to control or experimental groups, providing high levels of control over variables.
- Example: A new drug's efficacy is tested with patients randomly assigned to receive the drug or a placebo.
- Example: Two groups are observed after one group receives a treatment, and the other receives no intervention.
3. Quasi-Experimental Designs
Quasi-experiments lack random assignment but still aim to determine causality by comparing groups or time periods. They are often used when randomization isn’t possible, such as in natural or field experiments.
- Example: Schools receive different curricula, and students' test scores are compared before and after implementation.
- Example: Traffic accident rates are recorded for a city before and after a new speed limit is enforced.
4. Factorial Designs
Factorial designs test the effects of multiple independent variables simultaneously. This design is useful for studying the interactions between variables.
- Example: Studying how caffeine (variable 1) and sleep deprivation (variable 2) affect memory performance.
- Example: An experiment studying the impact of age, gender, and education level on technology usage.
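To make the structure of a factorial design concrete, here is a minimal Python sketch that enumerates the conditions of a 2 × 2 design by crossing the levels of two factors. The factor names and levels are hypothetical, loosely echoing the first example above:

```python
from itertools import product

# Hypothetical 2x2 factorial: two independent variables with two levels each.
factors = {
    "caffeine": ["none", "200mg"],
    "sleep": ["rested", "deprived"],
}

# Cross the factor levels to enumerate every experimental condition.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, cond in enumerate(conditions, start=1):
    print(f"Condition {i}: {cond}")
# A 2x2 design yields 4 conditions; adding a third two-level factor would yield 8.
```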
5. Repeated Measures Design
In repeated measures designs, the same participants are exposed to different conditions or treatments. This design is valuable for studying changes within subjects over time.
- Example: Measuring reaction time in participants before, during, and after caffeine consumption.
- Example: Testing two medications, with each participant receiving both but in a different sequence.
Methods for Implementing Experimental Designs
1. Randomization
- Purpose: Ensures each participant has an equal chance of being assigned to any group, reducing selection bias.
- Method: Use random number generators or assignment software to allocate participants randomly (a short code sketch follows this list).
2. Blinding
- Purpose: Prevents participants or researchers from knowing which group (experimental or control) participants belong to, reducing bias.
- Method: Implement single-blind (participants unaware) or double-blind (both participants and researchers unaware) procedures.
3. Control Groups
- Purpose: Provides a baseline for comparison, showing what would happen without the intervention.
- Method: Include a group that does not receive the treatment but otherwise undergoes the same conditions.
4. Counterbalancing
- Purpose: Controls for order effects in repeated measures designs by varying the order of treatments.
- Method: Assign different sequences to participants, ensuring that each condition appears equally often in each position (also illustrated in the sketch below).
5. Replication
- Purpose: Ensures reliability by repeating the experiment or including multiple participants within groups.
- Method: Increase sample size or repeat studies with different samples or in different settings.
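As a concrete illustration of the randomization and counterbalancing methods above, here is a minimal Python sketch; the participant labels, group sizes, and condition names are all hypothetical:

```python
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants
rng = random.Random(42)  # fixed seed so the allocation can be reproduced

# Randomization: shuffle the roster, then split it evenly into two groups.
roster = participants[:]
rng.shuffle(roster)
treatment, control = roster[:10], roster[10:]

# Counterbalancing: alternate the order of two conditions (A then B vs. B then A)
# across participants so each sequence occurs equally often.
orders = [("A", "B"), ("B", "A")]
schedule = {p: orders[i % 2] for i, p in enumerate(roster)}

print("Treatment group:", treatment)
print("Control group:", control)
print("Condition order for P01:", schedule["P01"])
```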
Steps to Conduct an Experimental Design
1. Define the Research Question and Hypothesis: Clearly state what you intend to discover or prove through the experiment. A strong hypothesis guides the experiment's design and variable selection.
2. Identify the Variables:
- Independent Variable (IV): The factor manipulated by the researcher (e.g., amount of sleep).
- Dependent Variable (DV): The outcome measured (e.g., reaction time).
- Control Variables: Factors kept constant to prevent interference with results (e.g., time of day for testing).
3. Select a Design: Choose a design type that aligns with your research question, hypothesis, and available resources, for example an RCT for a medical study or a factorial design for complex interactions.
4. Form the Groups: Randomly assign participants to experimental or control groups. Ensure control groups are similar to experimental groups in all respects except for the treatment received.
5. Minimize Bias: Randomize the assignment and, if possible, apply blinding to reduce potential bias.
6. Run the Experiment: Follow a consistent procedure for each group, collecting data systematically. Record observations and manage any unexpected events or variables that arise.
7. Analyze the Data: Use appropriate statistical methods to test for significant differences between groups, such as t-tests, ANOVA, or regression analysis (a minimal example follows this list).
8. Interpret the Results: Determine whether the results support your hypothesis and analyze any trends, patterns, or unexpected findings. Discuss possible limitations and implications of your results.
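For the data-analysis step, the sketch below compares two groups with an independent-samples t-test using SciPy. The reaction-time numbers are simulated purely for illustration and do not come from any real study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated reaction times (ms): the treated group is assumed ~20 ms faster.
control = rng.normal(loc=350, scale=30, size=30)
treatment = rng.normal(loc=330, scale=30, size=30)

# Independent-samples t-test for a difference in group means.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# With three or more groups, a one-way ANOVA plays the same role:
# stats.f_oneway(group1, group2, group3)
```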
Examples of Experimental Design in Research
- Medicine: Testing a new drug's effectiveness through a randomized controlled trial, where one group receives the drug and another receives a placebo.
- Psychology: Studying the effect of sleep deprivation on memory using a within-subject design, where participants are tested under different sleep conditions.
- Education: Comparing teaching methods in a quasi-experimental design by measuring students' performance before and after implementing a new curriculum.
- Marketing: Using a factorial design to examine the effects of advertisement type and frequency on consumer purchase behavior.
- Environmental Science: Testing the impact of a pollution reduction policy through a time series design, recording pollution levels before and after implementation.
Experimental design is fundamental to conducting rigorous and reliable research, offering a systematic approach to exploring causal relationships. With various types of designs and methods, researchers can choose the most appropriate setup to answer their research questions effectively. By applying best practices, controlling variables, and selecting suitable statistical methods, experimental design supports meaningful insights across scientific, medical, and social research fields.
10 Experimental research
Experimental research—often considered to be the 'gold standard' in research designs—is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity (causality) due to its ability to link cause and effect through treatment manipulation, while controlling for the spurious effects of extraneous variables.
Experimental research is best suited for explanatory research—rather than for descriptive or exploratory research—where the goal of the study is to examine cause-effect relationships. It also works well for research that involves a relatively limited and well-defined set of independent variables that can either be manipulated or controlled. Experimental research can be conducted in laboratory or field settings. Laboratory experiments, conducted in laboratory (artificial) settings, tend to be high in internal validity, but this comes at the cost of low external validity (generalisability), because the artificial (laboratory) setting in which the study is conducted may not reflect the real world. Field experiments are conducted in field settings such as in a real organisation, and are high in both internal and external validity. But such experiments are relatively rare, because of the difficulties associated with manipulating treatments and controlling for extraneous effects in a field setting.
Experimental research can be grouped into two broad categories: true experimental designs and quasi-experimental designs. Both designs require treatment manipulation, but while true experiments also require random assignment, quasi-experiments do not. Sometimes, we also refer to non-experimental research, which is not really a research design, but an all-inclusive term that includes all types of research that do not employ treatment manipulation or random assignment, such as survey research, observational research, and correlational studies.
Basic concepts
Treatment and control groups. In experimental research, some subjects are administered one or more experimental stimuli, called treatments (the treatment group), while other subjects are not given such a stimulus (the control group). The treatment may be considered successful if subjects in the treatment group rate more favourably on outcome variables than control group subjects. Multiple levels of experimental stimulus may be administered, in which case, there may be more than one treatment group. For example, in order to test the effects of a new drug intended to treat a certain medical condition like dementia, if a sample of dementia patients is randomly divided into three groups, with the first group receiving a high dosage of the drug, the second group receiving a low dosage, and the third group receiving a placebo such as a sugar pill (control group), then the first two groups are experimental groups and the third group is a control group. After administering the drug for a period of time, if the condition of the experimental group subjects improved significantly more than the control group subjects, we can say that the drug is effective. We can also compare the conditions of the high and low dosage experimental groups to determine if the high dose is more effective than the low dose.
Treatment manipulation. Treatments are the unique feature of experimental research that sets this design apart from all other research methods. Treatment manipulation helps control for the 'cause' in cause-effect relationships. Naturally, the validity of experimental research depends on how well the treatment was manipulated. Treatment manipulation must be checked using pretests and pilot tests prior to the experimental study. Any measurements conducted before the treatment is administered are called pretest measures, while those conducted after the treatment are posttest measures.
Random selection and assignment. Random selection is the process of randomly drawing a sample from a population or a sampling frame. This approach is typically employed in survey research, and ensures that each unit in the population has a positive chance of being selected into the sample. Random assignment, however, is a process of randomly assigning subjects to experimental or control groups. This is a standard practice in true experimental research to ensure that treatment groups are similar (equivalent) to each other and to the control group prior to treatment administration. Random selection is related to sampling, and is therefore more closely related to the external validity (generalisability) of findings. However, random assignment is related to design, and is therefore most related to internal validity. It is possible to have both random selection and random assignment in well-designed experimental research, but quasi-experimental research, by definition, lacks random assignment (and typically random selection as well). The sketch below illustrates the difference between the two procedures.
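This minimal Python sketch uses a hypothetical sampling frame of 1,000 units; it is illustrative only:

```python
import random

rng = random.Random(1)

# Random selection: draw a sample from a hypothetical sampling frame.
# This supports external validity (generalisability).
population = [f"unit_{i}" for i in range(1000)]
sample = rng.sample(population, k=40)

# Random assignment: allocate the sampled units to treatment or control.
# This supports internal validity (group equivalence before treatment).
rng.shuffle(sample)
treatment_group, control_group = sample[:20], sample[20:]

print(len(treatment_group), len(control_group))  # 20 20
```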
Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.
History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.
Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.
Testing threat is a threat in pre-post designs where subjects' posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.
Instrumentation threat, which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.
Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such that the dropouts were mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.
Regression threat—also called regression to the mean—refers to the statistical tendency of a group's overall performance to regress toward the mean during a posttest rather than in the anticipated direction. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean) because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
Two-group experimental designs
Pretest-posttest control group design. In this design, subjects are randomly assigned to treatment and control groups and subjected to an initial (pretest) measurement of the dependent variables of interest; the treatment group is then administered a treatment (representing the independent variable of interest), and the dependent variables are measured again (posttest). The notation of this design is shown in Figure 10.1.
Statistical analysis of this design involves a simple analysis of variance (ANOVA) between the treatment and control groups. The pretest-posttest design handles several threats to internal validity, such as maturation, testing, and regression, since these threats can be expected to influence both treatment and control groups in a similar (random) manner. The selection threat is controlled via random assignment. However, additional threats to internal validity may exist. For instance, mortality can be a problem if there are differential dropout rates between the two groups, and the pretest measurement may bias the posttest measurement—especially if the pretest introduces unusual topics or content.
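As a sketch of this analysis, the following Python code simulates pretest and posttest scores for two randomly assigned groups and compares their gain scores with a one-way ANOVA (with exactly two groups, this is equivalent to an independent t-test); all numbers are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical pretest/posttest scores for two randomly assigned groups of 25.
pre_t = rng.normal(50, 10, 25)
post_t = pre_t + rng.normal(8, 5, 25)   # treated group: assumed larger gain
pre_c = rng.normal(50, 10, 25)
post_c = pre_c + rng.normal(2, 5, 25)   # control group: assumed smaller gain

# Compare gain scores (posttest minus pretest) between the two groups.
f_stat, p_value = stats.f_oneway(post_t - pre_t, post_c - pre_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```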
Posttest-only control group design. This design is a simpler version of the pretest-posttest design where pretest measurements are omitted. The design notation is shown in Figure 10.2.
The treatment effect is measured simply as the difference in the posttest scores between the two groups: E = (O1 − O2), where O1 and O2 denote the posttest observations of the treatment and control groups, respectively.
The appropriate statistical analysis of this design is also a two-group analysis of variance (ANOVA). The simplicity of this design makes it more attractive than the pretest-posttest design in terms of internal validity. This design controls for maturation, testing, regression, selection, and pretest-posttest interaction, though the mortality threat may continue to exist.
Covariance design. Measures of dependent variables may be influenced by extraneous variables called covariates, which are not of central interest to the study but should be controlled to allow more accurate detection of the effects of the independent variables. A covariance design is a special type of pretest-posttest control group design in which the pretest measure is a measurement of the covariates of interest rather than of the dependent variables. Because the pretest measure is not a measurement of the dependent variable, but rather a covariate, the treatment effect is measured as the difference in the posttest scores between the treatment and control groups: E = (O1 − O2).
Due to the presence of covariates, the appropriate statistical analysis of this design is a two-group analysis of covariance (ANCOVA). This design has all the advantages of the posttest-only design, but with greater internal validity because covariates are controlled. Covariance designs can also be extended to pretest-posttest control group designs.
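A minimal ANCOVA sketch, assuming invented data and expressing the model with the statsmodels formula API: the coefficient on the treatment indicator estimates the treatment effect after adjusting for the covariate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 60

# Hypothetical data: a covariate measured before treatment, a binary
# treatment indicator, and a posttest score influenced by both.
covariate = rng.normal(100, 15, n)
treated = np.repeat([0, 1], n // 2)
posttest = 0.5 * covariate + 5 * treated + rng.normal(0, 5, n)
df = pd.DataFrame({"posttest": posttest, "treated": treated, "covariate": covariate})

# ANCOVA as a linear model: the `treated` coefficient is the adjusted effect.
model = smf.ols("posttest ~ treated + covariate", data=df).fit()
print(model.params["treated"], model.pvalues["treated"])
```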
Factorial designs
Two-group designs are inadequate if your research requires manipulation of two or more independent variables (treatments). In such cases, you would need four or higher-group designs. Such designs, quite popular in experimental research, are commonly called factorial designs. Each independent variable in this design is called a factor, and each subdivision of a factor is called a level. Factorial designs enable the researcher to examine not only the individual effect of each treatment on the dependent variables (called main effects), but also their joint effect (called interaction effects).
In a factorial design, a main effect is said to exist if the dependent variable shows a significant difference between multiple levels of one factor, at all levels of other factors. No change in the dependent variable across factor levels is the null case (baseline), from which main effects are evaluated. For example, in a 2 × 2 design crossing instructional type (in-class versus online) with instructional time (1.5 versus 3 hours/week), you may see a main effect of instructional type, instructional time, or both on learning outcomes. An interaction effect exists when the effect of differences in one factor depends upon the level of a second factor. In this example, if the effect of instructional type on learning outcomes is greater for 3 hours/week of instructional time than for 1.5 hours/week, then there is an interaction effect between instructional type and instructional time on learning outcomes. Note that interaction effects dominate main effects: it is not meaningful to interpret main effects if interaction effects are significant.
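The sketch below simulates this hypothetical instructional-type by instructional-time example and runs a two-way ANOVA with an interaction term via statsmodels; the group means and effect sizes are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(11)

# Hypothetical 2x2 factorial: instructional type x weekly instructional time.
rows = []
for itype in ["in_class", "online"]:
    for hours in [1.5, 3.0]:
        bump = 5 if (itype == "in_class" and hours == 3.0) else 0  # interaction
        for score in rng.normal(70 + bump, 8, 20):
            rows.append({"itype": itype, "hours": hours, "score": score})
df = pd.DataFrame(rows)

# C(itype) * C(hours) expands to both main effects plus their interaction.
model = smf.ols("score ~ C(itype) * C(hours)", data=df).fit()
print(anova_lm(model, typ=2))
```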
Hybrid experimental designs
Hybrid designs are those that are formed by combining features of more established designs. Three such hybrid designs are the randomised blocks design, the Solomon four-group design, and the switched replications design.
Randomised block design. This is a variation of the posttest-only or pretest-posttest control group design where the subject population can be grouped into relatively homogeneous subgroups (called blocks ) within which the experiment is replicated. For instance, if you want to replicate the same posttest-only design among university students and full-time working professionals (two homogeneous blocks), subjects in both blocks are randomly split between the treatment group (receiving the same treatment) and the control group (see Figure 10.5). The purpose of this design is to reduce the ‘noise’ or variance in data that may be attributable to differences between the blocks so that the actual effect of interest can be detected more accurately.
Solomon four-group design. In this design, the sample is divided into two treatment groups and two control groups. One treatment group and one control group receive the pretest, and the other two groups do not. This design represents a combination of posttest-only and pretest-posttest control group design, and is intended to test for the potential biasing effect of pretest measurement on posttest measures that tends to occur in pretest-posttest designs, but not in posttest-only designs. The design notation is shown in Figure 10.6.
Switched replication design. This is a two-group design implemented in two phases with three waves of measurement. The treatment group in the first phase serves as the control group in the second phase, and the control group in the first phase becomes the treatment group in the second phase, as illustrated in Figure 10.7. In other words, the original design is repeated or replicated temporally with treatment/control roles switched between the two groups. By the end of the study, all participants will have received the treatment either during the first or the second phase. This design is most feasible in organisational contexts where organisational programs (e.g., employee training) are implemented in a phased manner or are repeated at regular intervals.
Quasi-experimental designs
Quasi-experimental designs are almost identical to true experimental designs, but lack one key ingredient: random assignment. For instance, one entire class section or one organisation is used as the treatment group, while another section of the same class or a different organisation in the same industry is used as the control group. This lack of random assignment potentially results in groups that are non-equivalent, such as one group possessing greater mastery of certain content than the other group, say by virtue of having a better teacher in a previous semester, which introduces the possibility of selection bias. Quasi-experimental designs are therefore inferior to true experimental designs in internal validity due to the presence of a variety of selection-related threats such as selection-maturation threat (the treatment and control groups maturing at different rates), selection-history threat (the treatment and control groups being differentially impacted by extraneous or historical events), selection-regression threat (the treatment and control groups regressing toward the mean between pretest and posttest at different rates), selection-instrumentation threat (the treatment and control groups responding differently to the measurement), selection-testing threat (the treatment and control groups responding differently to the pretest), and selection-mortality threat (the treatment and control groups demonstrating differential dropout rates). Given these selection threats, it is generally preferable to avoid quasi-experimental designs to the greatest extent possible.
In addition, there are quite a few unique non-equivalent designs without corresponding true experimental design cousins. Some of the more useful of these designs are discussed next.
Regression discontinuity (RD) design. This is a non-equivalent pretest-posttest design where subjects are assigned to the treatment or control group based on a cut-off score on a preprogram measure. For instance, patients who are severely ill may be assigned to a treatment group to test the efficacy of a new drug or treatment protocol, while those who are mildly ill are assigned to the control group. In another example, students who are lagging behind on standardised test scores may be selected for a remedial curriculum program intended to improve their performance, while those who score high on such tests are not selected for the remedial program.
Because of the use of a cut-off score, it is possible that the observed results may be a function of the cut-off score rather than the treatment, which introduces a new threat to internal validity. However, using the cut-off score also ensures that limited or costly resources are distributed to people who need them the most, rather than randomly across a population, while simultaneously allowing a quasi-experimental treatment. The control group scores in the RD design do not serve as a benchmark for comparing treatment group scores, given the systematic non-equivalence between the two groups. Rather, if there is no discontinuity between pretest and posttest scores in the control group, but such a discontinuity persists in the treatment group, then this discontinuity is viewed as evidence of the treatment effect.
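One common way to estimate that treatment effect is to regress the outcome on the treatment indicator and the cutoff-centered assignment score, reading the effect off the jump at the cutoff. The following is a minimal sketch with simulated data (cutoff, slopes, and effect size are all invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n, cutoff = 200, 50.0

# Hypothetical RD data: units scoring below the cutoff on the preprogram
# measure receive the treatment.
pretest = rng.uniform(0, 100, n)
treated = (pretest < cutoff).astype(int)
posttest = 20 + 0.6 * pretest + 8 * treated + rng.normal(0, 4, n)

df = pd.DataFrame({
    "posttest": posttest,
    "treated": treated,
    "centered": pretest - cutoff,  # center the running variable at the cutoff
})

# `treated` estimates the discontinuity at the cutoff; the interaction term
# allows the regression slope to differ on each side of the cutoff.
model = smf.ols("posttest ~ treated * centered", data=df).fit()
print(model.params["treated"])
```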
Proxy pretest design. This design, shown in Figure 10.11, looks very similar to the standard nonequivalent groups (NEGD) pretest-posttest design, with one critical difference: the pretest score is collected after the treatment is administered. A typical application of this design is when a researcher is brought in to test the efficacy of a program (e.g., an educational program) after the program has already started and pretest data are not available. Under such circumstances, the best option for the researcher is often to use a different prerecorded measure, such as students' grade point average before the start of the program, as a proxy for pretest data. A variation of the proxy pretest design is to use subjects' posttest recollection of pretest data, which may be subject to recall bias, but nevertheless may provide a measure of perceived gain or change in the dependent variable.
Separate pretest-posttest samples design. This design is useful when it is not possible to collect pretest and posttest data from the same subjects. As shown in Figure 10.12, there are four groups in this design, but two groups come from a single non-equivalent group, while the other two groups come from a different non-equivalent group. For instance, say you want to test customer satisfaction with a new online service that is implemented in one city but not in another. In this case, customers in the first city serve as the treatment group and those in the second city constitute the control group. If it is not possible to obtain pretest and posttest measures from the same customers, you can measure customer satisfaction at one point in time, implement the new service program, and measure customer satisfaction (with a different set of customers) after the program is implemented. Customer satisfaction is also measured in the control group at the same times as in the treatment group, but without the new program implementation. The design is not particularly strong, because you cannot examine changes in any specific customer's satisfaction score before and after the implementation; you can only examine average satisfaction scores. Despite the lower internal validity, this design may still be a useful way of collecting quasi-experimental data when pretest and posttest data are not available from the same subjects.
Nonequivalent dependent variable (NEDV) design. This is a single-group pre-post quasi-experimental design with two outcome measures, where one measure is theoretically expected to be influenced by the treatment and the other is not. An interesting variation of the NEDV design is a pattern-matching NEDV design, which employs multiple outcome variables and a theory that explains how much each variable will be affected by the treatment. The researcher can then examine if the theoretical prediction is matched in actual observations. This pattern-matching technique, based on the degree of correspondence between theoretical and observed patterns, is a powerful way of alleviating internal validity concerns in the original NEDV design.
Perils of experimental research
Experimental research is one of the most difficult of research designs, and should not be taken lightly. This type of research is often beset with a multitude of methodological problems. First, though experimental research requires theories for framing hypotheses for testing, much of current experimental research is atheoretical. Without theories, the hypotheses being tested tend to be ad hoc, possibly illogical, and meaningless. Second, many of the measurement instruments used in experimental research are not tested for reliability and validity, and are incomparable across studies. Consequently, results generated using such instruments are also incomparable. Third, experimental research often uses inappropriate research designs, such as irrelevant dependent variables, no interaction effects, no experimental controls, and non-equivalent stimuli across treatment groups. Findings from such studies tend to lack internal validity and are highly suspect. Fourth, the treatments (tasks) used in experimental research may be diverse, incomparable, and inconsistent across studies, and sometimes inappropriate for the subject population. For instance, undergraduate student subjects are often asked to pretend that they are marketing managers and to perform a complex budget allocation task in which they have no experience or expertise. The use of such inappropriate tasks introduces new threats to internal validity (i.e., subjects' performance may be an artefact of the content or difficulty of the task setting), generates findings that are non-interpretable and meaningless, and makes integration of findings across studies impossible.
The design of proper experimental treatments is a very important task in experimental design, because the treatment is the raison d’etre of the experimental method, and must never be rushed or neglected. To design an adequate and appropriate task, researchers should use prevalidated tasks if available, conduct treatment manipulation checks to check for the adequacy of such tasks (by debriefing subjects after performing the assigned task), conduct pilot tests (repeatedly, if necessary), and if in doubt, use tasks that are simple and familiar for the respondent sample rather than tasks that are complex or unfamiliar.
In summary, this chapter introduced key concepts in the experimental design research method and introduced a variety of true experimental and quasi-experimental designs. Although these designs vary widely in internal validity, designs with less internal validity should not be overlooked and may sometimes be useful under specific circumstances and empirical contingencies.
Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Experimentation in Scientific Research: Variables and controls in practice
by Anthony Carpi, Ph.D., Anne E. Egger, Ph.D.
Did you know that experimental design was developed more than a thousand years ago by a Middle Eastern scientist who studied light? All of us use a form of experimental research in our day to day lives when we try to find the spot with the best cell phone reception, try out new cooking recipes, and more. Scientific experiments are built on similar principles.
Experimentation is a research method in which one or more variables are consciously manipulated and the outcome or effect of that manipulation on other variables is observed.
Experimental designs often make use of controls that provide a measure of variability within a system and a check for sources of error.
Experimental methods are commonly applied to determine causal relationships or to quantify the magnitude of response of a variable.
Anyone who has used a cellular phone knows that certain situations require a bit of research: If you suddenly find yourself in an area with poor phone reception, you might move a bit to the left or right, walk a few steps forward or back, or even hold the phone over your head to get a better signal. While the actions of a cell phone user might seem obvious, the person seeking cell phone reception is actually performing a scientific experiment: consciously manipulating one component (the location of the cell phone) and observing the effect of that action on another component (the phone's reception). Scientific experiments are obviously a bit more complicated, and generally involve more rigorous use of controls , but they draw on the same type of reasoning that we use in many everyday situations. In fact, the earliest documented scientific experiments were devised to answer a very common everyday question: how vision works.
- A brief history of experimental methods
Figure 1: Alhazen (965-ca.1039) as pictured on an Iraqi 10,000-dinar note
One of the first ideas regarding how human vision works came from the Greek philosopher Empedocles around 450 BCE. Empedocles reasoned that the Greek goddess Aphrodite had lit a fire in the human eye, and vision was possible because light rays from this fire emanated from the eye, illuminating objects around us. While a number of people challenged this proposal, the idea that light radiated from the human eye proved surprisingly persistent until around 1,000 CE, when a Middle Eastern scientist advanced our knowledge of the nature of light and, in so doing, developed a new and more rigorous approach to scientific research. Abū 'Alī al-Hasan ibn al-Hasan ibn al-Haytham, also known as Alhazen, was born in 965 CE in the Arabian city of Basra in what is present-day Iraq. He began his scientific studies in physics, mathematics, and other sciences after reading the works of several Greek philosophers. One of Alhazen's most significant contributions was a seven-volume work on optics titled Kitab al-Manazir (later translated to Latin as Opticae Thesaurus Alhazeni – Alhazen's Book of Optics). Beyond the contributions this book made to the field of optics, it was a remarkable work in that it based conclusions on experimental evidence rather than abstract reasoning – the first major publication to do so. Alhazen's contributions have proved so significant that his likeness was immortalized on the 2003 10,000-dinar note issued by Iraq (Figure 1).
Alhazen invested significant time studying light, color, shadows, rainbows, and other optical phenomena. Among this work was a study in which he stood in a darkened room with a small hole in one wall. Outside of the room, he hung two lanterns at different heights. Alhazen observed that the light from each lantern illuminated a different spot in the room, and each lighted spot formed a direct line with the hole and one of the lanterns outside the room. He also found that covering a lantern caused the spot it illuminated to darken, and exposing the lantern caused the spot to reappear. Thus, Alhazen provided some of the first experimental evidence that light does not emanate from the human eye but rather is emitted by certain objects (like lanterns) and travels from these objects in straight lines. Alhazen's experiment may seem simplistic today, but his methodology was groundbreaking: He developed a hypothesis based on observations of physical relationships (that light comes from objects), and then designed an experiment to test that hypothesis. Despite the simplicity of the method, Alhazen's experiment was a critical step in refuting the long-standing theory that light emanated from the human eye, and it was a major event in the development of modern scientific research methodology.
- Experimentation as a scientific research method
Experimentation is one scientific research method, perhaps the most recognizable, in a spectrum of methods that also includes description, comparison, and modeling (see our Description, Comparison, and Modeling modules). While all of these methods share in common a scientific approach, experimentation is unique in that it involves the conscious manipulation of certain aspects of a real system and the observation of the effects of that manipulation. You could solve a cell phone reception problem by walking around a neighborhood until you see a cell phone tower, observing other cell phone users to see where those people who get the best reception are standing, or looking on the web for a map of cell phone signal coverage. All of these methods could also provide answers, but by moving around and testing reception yourself, you are experimenting.
- Variables: Independent and dependent
In the experimental method, a condition or a parameter, generally referred to as a variable, is consciously manipulated (often referred to as a treatment) and the outcome or effect of that manipulation is observed on other variables. Variables are given different names depending on whether they are the ones manipulated or the ones observed:
- Independent variable refers to a condition within an experiment that is manipulated by the scientist.
- Dependent variable refers to an event or outcome of an experiment that might be affected by the manipulation of the independent variable.
Scientific experimentation helps to determine the nature of the relationship between independent and dependent variables. While it is often difficult, or sometimes impossible, to manipulate a single variable in an experiment, scientists often work to minimize the number of variables being manipulated. For example, as we move from one location to another to get better cell reception, we likely change the orientation of our body, perhaps from south-facing to east-facing, or we hold the cell phone at a different angle. Which variable affected reception: location, orientation, or angle of the phone? It is critical that scientists understand which aspects of their experiment they are manipulating so that they can accurately determine the impacts of that manipulation. In order to constrain the possible outcomes of an experimental procedure, most scientific experiments use a system of controls.
- Controls: Negative, positive, and placebos
In a controlled study, a scientist essentially runs two (or more) parallel and simultaneous experiments: a treatment group, in which the effect of an experimental manipulation is observed on a dependent variable, and a control group, which uses all of the same conditions as the first with the exception of the actual treatment. Controls can fall into one of two groups: negative controls and positive controls.
In a negative control, the control group is exposed to all of the experimental conditions except for the actual treatment. The need to match all experimental conditions exactly is so great that, for example, in a trial for a new drug, the negative control group will be given a pill or liquid that looks exactly like the drug, except that it will not contain the drug itself, a control often referred to as a placebo. Negative controls allow scientists to measure the natural variability of the dependent variable(s), provide a means of measuring error in the experiment, and also provide a baseline to measure against the experimental treatment.
Some experimental designs also make use of positive controls. A positive control is run as a parallel experiment and generally involves the use of an alternative treatment that the researcher knows will have an effect on the dependent variable. For example, when testing the effectiveness of a new drug for pain relief, a scientist might administer a placebo to one group of patients as a negative control, and a known treatment like aspirin to a separate group of individuals as a positive control, since the pain-relieving properties of aspirin are well documented. In both cases, the controls allow scientists to quantify background variability and reject alternative hypotheses that might otherwise explain the effect of the treatment on the dependent variable.
- Experimentation in practice: The case of Louis Pasteur
Well-controlled experiments generally provide strong evidence of causality, demonstrating whether the manipulation of one variable causes a response in another variable. For example, as early as the 6th century BCE, Anaximander, a Greek philosopher, speculated that life could be formed from a mixture of sea water, mud, and sunlight. The idea probably stemmed from the observation of worms, mosquitoes, and other insects "magically" appearing in mudflats and other shallow areas. While the suggestion was challenged on a number of occasions, the idea that living microorganisms could be spontaneously generated from air persisted until the middle of the 19th century.
In the 1750s, John Needham, an English clergyman and naturalist, claimed to have proved that spontaneous generation does occur when he showed that microorganisms flourished in certain foods such as soup broth, even after they had been briefly boiled and covered. Several years later, the Italian abbot and biologist Lazzaro Spallanzani boiled soup broth for over an hour and then placed bowls of this soup in different conditions, sealing some and leaving others exposed to air. Spallanzani found that microorganisms grew in the soup exposed to air but were absent from the sealed soup. He therefore challenged Needham's conclusions and hypothesized that microorganisms suspended in air settled onto the exposed soup but not the sealed soup, and rejected the idea of spontaneous generation.
Needham countered, arguing that the growth of bacteria in the soup was not due to microbes settling onto the soup from the air, but rather because spontaneous generation required contact with an intangible "life force" in the air itself. He proposed that Spallanzani's extensive boiling destroyed the "life force" present in the soup, preventing spontaneous generation in the sealed bowls but allowing air to replenish the life force in the open bowls. For several decades, scientists continued to debate the spontaneous generation theory of life, with support for the theory coming from several notable scientists including Félix Pouchet and Henry Bastian. Pouchet, Director of the Rouen Museum of Natural History in France, and Bastian, a well-known British bacteriologist, argued that living organisms could spontaneously arise from chemical processes such as fermentation and putrefaction. The debate became so heated that in 1860, the French Academy of Sciences established the Alhumbert Prize of 2,500 francs for the first person who could conclusively resolve the conflict. In 1864, Louis Pasteur achieved that result with a series of well-controlled experiments, and in doing so claimed the Alhumbert Prize.
Pasteur prepared for his experiments by studying the work of others that came before him. In fact, in April 1861 Pasteur wrote to Pouchet to obtain a research description that Pouchet had published. In this letter, Pasteur writes:
Paris, April 3, 1861
Dear Colleague,
The difference of our opinions on the famous question of spontaneous generation does not prevent me from esteeming highly your labor and praiseworthy efforts... The sincerity of these sentiments...permits me to have recourse to your obligingness in full confidence. I read with great care everything that you write on the subject that occupies both of us. Now, I cannot obtain a brochure that I understand you have just published.... I would be happy to have a copy of it because I am at present editing the totality of my observations, where naturally I criticize your assertions.
L. Pasteur (Porter, 1961)
Pasteur received the brochure from Pouchet several days later and went on to conduct his own experiments. In these, he repeated Spallanzani's method of boiling soup broth, but he divided the broth into portions and exposed these portions to different controlled conditions. Some broth was placed in flasks that had straight necks that were open to the air, some broth was placed in sealed flasks that were not open to the air, and some broth was placed into a specially designed set of swan-necked flasks, in which the broth would be open to the air but the air would have to travel a curved path before reaching the broth, thus preventing anything that might be present in the air from simply settling onto the soup (Figure 2). Pasteur then observed the response of the dependent variable (the growth of microorganisms) in response to the independent variable (the design of the flask). Pasteur's experiments contained both positive controls (samples in the straight-necked flasks that he knew would become contaminated with microorganisms) and negative controls (samples in the sealed flasks that he knew would remain sterile). If spontaneous generation did indeed occur upon exposure to air, Pasteur hypothesized, microorganisms would be found in both the swan-neck flasks and the straight-necked flasks, but not in the sealed flasks. Instead, Pasteur found that microorganisms appeared in the straight-necked flasks, but not in the sealed flasks or the swan-necked flasks.
Figure 2: Pasteur's drawings of the flasks he used (Pasteur, 1861). Fig. 25 D, C, and B (top) show various sealed flasks (negative controls); Fig. 26 (bottom right) illustrates a straight-necked flask directly open to the atmosphere (positive control); and Fig. 25 A (bottom left) illustrates the specially designed swan-necked flask (treatment group).
By using controls and replicating his experiment (he used more than one of each type of flask), Pasteur was able to answer many of the questions that still surrounded the issue of spontaneous generation. Pasteur said of his experimental design, "I affirm with the most perfect sincerity that I have never had a single experiment, arranged as I have just explained, which gave me a doubtful result" (Porter, 1961). Pasteur's work helped refute the theory of spontaneous generation – his experiments showed that air alone was not the cause of bacterial growth in the flask, and his research supported the hypothesis that live microorganisms suspended in air could settle onto the broth in open-necked flasks via gravity.
- Experimentation across disciplines
Experiments are used across all scientific disciplines to investigate a multitude of questions. In some cases, scientific experiments are used for exploratory purposes in which the scientist does not know what the dependent variable is. In this type of experiment, the scientist will manipulate an independent variable and observe what the effect of the manipulation is in order to identify a dependent variable (or variables). Exploratory experiments are sometimes used in nutritional biology when scientists probe the function and purpose of dietary nutrients. In one approach, a scientist will expose one group of animals to a normal diet, and a second group to a similar diet except that it is lacking a specific vitamin or nutrient. The researcher will then observe the two groups to see what specific physiological changes or medical problems arise in the group lacking the nutrient being studied.
Scientific experiments are also commonly used to quantify the magnitude of a relationship between two or more variables. For example, in the fields of pharmacology and toxicology, scientific experiments are used to determine the dose-response relationship of a new drug or chemical. In these approaches, researchers perform a series of experiments in which a population of organisms, such as laboratory mice, is separated into groups and each group is exposed to a different amount of the drug or chemical of interest. The analysis of the data that result from these experiments (see our Data Analysis and Interpretation module) involves comparing the degree of the organism's response to the dose of the substance administered.
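Such dose-response experiments are often summarized by fitting a sigmoidal curve and estimating the EC50, the dose that produces half of the maximal response. Here is a minimal Python sketch using a simplified Hill equation; the doses and response rates are invented, not taken from any published study:

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, top, ec50, slope):
    """Simplified Hill equation: a common sigmoidal dose-response model."""
    return top * dose**slope / (ec50**slope + dose**slope)

# Hypothetical data: fraction of animals responding at each dose.
doses = np.array([1, 3, 10, 30, 100, 300], dtype=float)
response = np.array([0.05, 0.12, 0.30, 0.62, 0.85, 0.93])

# Fit the curve and report the estimated parameters.
(top, ec50, slope), _ = curve_fit(hill, doses, response, p0=[1.0, 30.0, 1.0])
print(f"top={top:.2f}, EC50={ec50:.1f}, slope={slope:.2f}")
```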
In this context, experiments can provide additional evidence to complement other research methods. For example, in the 1950s a great debate ensued over whether or not the chemicals in cigarette smoke cause cancer. Several researchers had conducted comparative studies (see our Comparison in Scientific Research module) that indicated that patients who smoked had a higher probability of developing lung cancer when compared to nonsmokers. Comparative studies differ slightly from experimental methods in that you do not consciously manipulate a variable; rather, you observe differences between two or more groups depending on whether or not they fall into a treatment or control group. Cigarette companies and lobbyists criticized these studies, suggesting that the relationship between smoking and lung cancer was coincidental. Several researchers noted the need for a clear dose-response study; however, the difficulties in getting cigarette smoke into the lungs of laboratory animals prevented this research. In the mid-1950s, Ernest Wynder and colleagues had an ingenious idea: They condensed the chemicals from cigarette smoke into a liquid and applied this in various doses to the skin of groups of mice. The researchers published data from a dose-response experiment of the effect of tobacco smoke condensate on mice (Wynder et al., 1957).
As seen in Figure 3, the researchers found a positive relationship between the amount of condensate applied to the skin of mice and the number of cancers that developed. The graph shows the results of a study in which different groups of mice were exposed to increasing amounts of cigarette tar. The black dots indicate the percentage of each sample group of mice that developed cancer for a given amount of cigarette smoke "condensate" applied to their skin. The vertical lines are error bars, showing the amount of uncertainty. The graph shows generally increasing cancer rates with greater exposure. This study was one of the first pieces of experimental evidence in the cigarette smoking debate, and it helped strengthen the case for cigarette smoke as the causative agent in lung cancer in smokers.
Figure 3: Percentage of mice with cancer versus the amount of cigarette smoke "condensate" applied to their skin (source: Wynder et al., 1957).
Sometimes experimental approaches and other research methods are not clearly distinct, or scientists may even use multiple research approaches in combination. For example, at 1:52 a.m. EDT on July 4, 2005, scientists with the National Aeronautics and Space Administration (NASA) conducted a study in which a 370 kg spacecraft named Deep Impact was purposely slammed into passing comet Tempel 1. A nearby spacecraft observed the impact and radioed data back to Earth. The research was partially descriptive in that it documented the chemical composition of the comet, but it was also partly experimental in that the effect of slamming the Deep Impact probe into the comet on the volatilization of previously undetected compounds, such as water, was assessed (A'Hearn et al., 2005). It is particularly common that experimentation and description overlap: Another example is Jane Goodall's research on the behavior of chimpanzees, which can be read in our Description in Scientific Research module.
- Limitations of experimental methods
Figure 4: An image of comet Tempel 1, 67 seconds after collision with the Deep Impact impactor. Image credit: NASA/JPL-Caltech/UMD http://deepimpact.umd.edu/gallery/HRI_937_1.html
While scientific experiments provide invaluable data regarding causal relationships, they do have limitations. One criticism of experiments is that they do not necessarily represent real-world situations. In order to clearly identify the relationship between an independent variable and a dependent variable, experiments are designed so that many other contributing variables are fixed or eliminated. For example, in an experiment designed to quantify the effect of vitamin A dose on the metabolism of beta-carotene in humans, Shawna Lemke and colleagues had to precisely control the diet of their human volunteers (Lemke et al., 2003). They asked their participants to limit their intake of foods rich in vitamin A and further asked that they maintain a precise log of all foods eaten for 1 week prior to their study. At the time of their study, they controlled their participants' diet by feeding them all the same meals, described in the methods section of their research article in this way:
Meals were controlled for time and content on the dose administration day. Lunch was served at 5.5 h postdosing and consisted of a frozen dinner (Enchiladas, Amy's Kitchen, Petaluma, CA), a blueberry bagel with jelly, 1 apple and 1 banana, and a large chocolate chunk cookie (Pepperidge Farm). Dinner was served 10.5 h post dose and consisted of a frozen dinner (Chinese Stir Fry, Amy's Kitchen) plus the bagel and fruit taken for lunch.
While this is an important aspect of making an experiment manageable and informative, it is often not representative of the real world, in which many variables may change at once, including the foods you eat. Still, experimental research is an excellent way of determining relationships between variables that can be later validated in real world settings through descriptive or comparative studies.
Design is critical to the success or failure of an experiment. Slight variations in the experimental set-up could strongly affect the outcome being measured. For example, during the 1950s, a number of experiments were conducted to evaluate the toxicity in mammals of the metal molybdenum, using rats as experimental subjects. Unexpectedly, these experiments seemed to indicate that the type of cage the rats were housed in affected the toxicity of molybdenum. In response, G. Brinkman and Russell Miller set up an experiment to investigate this observation (Brinkman & Miller, 1961). Brinkman and Miller fed two groups of rats a normal diet that was supplemented with 200 parts per million (ppm) of molybdenum. One group of rats was housed in galvanized steel (steel coated with zinc to reduce corrosion) cages and the second group was housed in stainless steel cages. Rats housed in the galvanized steel cages suffered more from molybdenum toxicity than the other group: They had higher concentrations of molybdenum in their livers and lower blood hemoglobin levels. It was then shown that when the rats chewed on their cages, those housed in the galvanized metal cages absorbed zinc plated onto the metal bars, and zinc is now known to affect the toxicity of molybdenum. In order to control for zinc exposure, then, stainless steel cages needed to be used for all rats.
Scientists also have an obligation to adhere to ethical limits in designing and conducting experiments. During World War II, doctors working in Nazi Germany conducted many heinous experiments using human subjects. Among them was an experiment meant to identify effective treatments for hypothermia in humans, in which concentration camp prisoners were forced to sit in ice water or left naked outdoors in freezing temperatures and then re-warmed by various means. Many of the exposed victims froze to death or suffered permanent injuries. As a result of the Nazi experiments and other unethical research, strict scientific ethical standards have been adopted by the United States and other governments, and by the scientific community at large. Among other things, ethical standards (see our Scientific Ethics module) require that the benefits of research outweigh the risks to human subjects, and those who participate do so voluntarily and only after they have been made fully aware of all the risks posed by the research. These guidelines have far-reaching effects: While the clearest indication of causation in the cigarette smoke and lung cancer debate would have been to design an experiment in which one group of people was asked to take up smoking and another group was asked to refrain from smoking, it would be highly unethical for a scientist to purposefully expose a group of healthy people to a suspected cancer-causing agent. As an alternative, comparative studies (see our Comparison in Scientific Research module) were initiated in humans, and experimental studies focused on animal subjects. The combination of these and other studies provided even stronger evidence of the link between smoking and lung cancer than either one method alone would have.
- Experimentation in modern practice
Like all scientific research, the results of experiments are shared with the scientific community, are built upon, and inspire additional experiments and research. For example, once Alhazen established that light given off by objects enters the human eye, the natural question that was asked was "What is the nature of light that enters the human eye?" Two common theories about the nature of light were debated for many years. Sir Isaac Newton was among the principal proponents of a theory suggesting that light was made of small particles. The English naturalist Robert Hooke (who held the interesting title of Curator of Experiments at the Royal Society of London) supported a different theory stating that light was a type of wave, like sound waves. In 1801, Thomas Young conducted a now classic scientific experiment that helped resolve this controversy. Young, like Alhazen, worked in a darkened room and allowed light to enter only through a small hole in a window shade. Young refocused the beam of light with mirrors and split the beam with a paper-thin card. The split light beams were then projected onto a screen, and formed an alternating light and dark banding pattern (Figure 5) – a sign that light was indeed a wave (see our Light I: Particle or Wave? module).
Figure 5: Young's depiction of the results of his experiment (Young, 1845). The dark spot represents the card held in front of a window slit, producing two parallel beams of light. The light and dark bands represent the brighter and darker bands he observed.
Approximately 100 years later, in 1905, new experiments led Albert Einstein to conclude that light exhibits properties of both waves and particles. Einstein's dual wave-particle theory is now generally accepted by scientists.
Experiments continue to help refine our understanding of light even today. In addition to his wave-particle theory, Einstein also proposed that the speed of light was unchanging and absolute. Yet in 1998 a group of scientists led by Lene Hau showed that light could be slowed from its normal speed of 3 × 10⁸ meters per second to a mere 17 meters per second with a special experimental apparatus (Hau et al., 1999). The series of experiments that began with Alhazen's work 1,000 years ago has led to a progressively deeper understanding of the nature of light. Although the tools with which scientists conduct experiments may have become more complex, the principles behind controlled experiments are remarkably similar to those used by Pasteur and Alhazen hundreds of years ago.