Experiment Definition in Science – What Is a Science Experiment?

In science, an experiment is simply a test of a hypothesis in the scientific method. It is a controlled examination of cause and effect. Here is a look at what a science experiment is (and is not), the key factors in an experiment, examples, and types of experiments.

Experiment Definition in Science

By definition, an experiment is a procedure that tests a hypothesis. A hypothesis, in turn, is a prediction of cause and effect or the predicted outcome of changing one factor of a situation. Both the hypothesis and experiment are components of the scientific method. The steps of the scientific method are:

  • Make observations.
  • Ask a question or identify a problem.
  • State a hypothesis.
  • Perform an experiment that tests the hypothesis.
  • Based on the results of the experiment, either accept or reject the hypothesis.
  • Draw conclusions and report the outcome of the experiment.

Key Parts of an Experiment

The two key parts of an experiment are the independent and dependent variables. The independent variable is the one factor that you control or change in an experiment. The dependent variable is the factor that you measure that responds to the independent variable. An experiment often includes other types of variables, but at its heart, it’s all about the relationship between the independent and dependent variables.

Examples of Experiments

Fertilizer and Plant Size

For example, you think a certain fertilizer helps plants grow better. You’ve watched your plants grow and they seem to do better when they have the fertilizer compared to when they don’t. But, observations are only the beginning of science. So, you state a hypothesis: Adding fertilizer increases plant size. Note, you could have stated the hypothesis in different ways. Maybe you think the fertilizer increases plant mass or fruit production, for example. However you state the hypothesis, it includes both the independent and dependent variables. In this case, the independent variable is the presence or absence of fertilizer. The dependent variable is the response to the independent variable, which is the size of the plants.

Now that you have a hypothesis, the next step is designing an experiment that tests it. Experimental design is very important because the way you conduct an experiment influences its outcome. For example, if you use too little fertilizer, you may see no effect from the treatment. Or, if you dump an entire container of fertilizer on a plant, you could kill it! So, recording the steps of the experiment helps you judge its outcome and aids others who come after you and examine your work. Other factors that might influence your results include the species of plant and the duration of the treatment. Record any conditions that might affect the outcome. Ideally, you want the only difference between your two groups of plants to be whether or not they receive fertilizer. Then, measure the height of the plants and see if there is a difference between the two groups.
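
To make the comparison concrete, here is a minimal Python sketch of how the two groups of plant heights might be compared once the measurements are in. The height values, group sizes, and the 0.05 significance level are hypothetical assumptions rather than data from this article, and the sketch assumes SciPy is installed.

    from scipy import stats  # assumes SciPy is available

    # Hypothetical plant heights (cm) measured after several weeks of growth.
    fertilized = [24.1, 25.3, 26.0, 23.8, 25.9, 24.7]
    unfertilized = [21.9, 22.4, 23.1, 21.5, 22.8, 22.0]

    # Welch's t-test: does the independent variable (fertilizer) change the
    # dependent variable (plant height)?
    t_stat, p_value = stats.ttest_ind(fertilized, unfertilized, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    alpha = 0.05  # assumed significance threshold
    if p_value < alpha:
        print("Reject the null hypothesis: fertilizer appears to affect height.")
    else:
        print("Fail to reject the null hypothesis at this sample size.")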

Salt and Cookies

You don’t need a lab for an experiment. For example, consider a baking experiment. Let’s say you like the flavor of salt in your cookies, but you’re pretty sure the batch you made using extra salt fell a bit flat. If you double the amount of salt in a recipe, does it affect the size of the cookies? Here, the independent variable is the amount of salt in the recipe and the dependent variable is cookie size.

Test this hypothesis with an experiment. Bake cookies using the normal recipe (your control group) and bake some using twice the salt (the experimental group). Make sure it’s the exact same recipe. Bake the cookies at the same temperature and for the same time. Only change the amount of salt in the recipe. Then measure the height or diameter of the cookies and decide whether to accept or reject the hypothesis.
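
As a rough sketch of the cookie comparison (the height values below are made up for illustration), the two batches could be summarized with nothing more than the standard library; a formal significance test would follow the same pattern as the fertilizer example above.

    from statistics import mean

    # Hypothetical cookie heights (cm) for the two batches.
    control = [1.42, 1.38, 1.45, 1.40, 1.39]      # normal recipe (control group)
    double_salt = [1.21, 1.26, 1.19, 1.24, 1.23]  # experimental group: twice the salt

    diff = mean(control) - mean(double_salt)
    print(f"Normal-recipe mean height: {mean(control):.2f} cm")
    print(f"Double-salt mean height:   {mean(double_salt):.2f} cm")
    print(f"Difference: {diff:.2f} cm (a positive value means the double-salt cookies baked flatter)")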

Examples of Things That Are Not Experiments

Based on the examples of experiments, you should see what is not an experiment:

  • Making observations does not constitute an experiment. Initial observations often lead to an experiment, but are not a substitute for one.
  • Making a model is not an experiment.
  • Neither is making a poster.
  • Just trying something to see what happens is not an experiment. You need a hypothesis or prediction about the outcome.
  • Changing a lot of things at once isn’t an experiment. An experiment has only one independent variable and one dependent variable. However, you might suspect the independent variable also affects a separate variable. If so, you design a new experiment to test this.

Types of Experiments

There are three main types of experiments: controlled experiments, natural experiments, and field experiments.

  • Controlled experiment: A controlled experiment compares two groups of samples that differ only in the independent variable. For example, a drug trial compares the effect of a group taking a placebo (control group) against those getting the drug (the treatment group). Experiments in a lab or home generally are controlled experiments.
  • Natural experiment: Another name for a natural experiment is a quasi-experiment. In this type of experiment, the researcher does not directly control the independent variable, plus there may be other variables at play. Here, the goal is establishing a correlation between the independent and dependent variables. For example, in the formation of new elements a scientist hypothesizes that a certain collision between particles creates a new atom. But, other outcomes may be possible. Or, perhaps only decay products are observed that indicate the element, and not the new atom itself. Many fields of science rely on natural experiments, since controlled experiments aren’t always possible.
  • Field experiment: While a controlled experiment takes place in a lab or other controlled setting, a field experiment occurs in a natural setting. Some phenomena cannot be readily studied in a lab or else the setting exerts an influence that affects the results. So, a field experiment may have higher validity. However, since the setting is not controlled, it is also subject to external factors and potential contamination. For example, if you study whether a certain plumage color affects bird mate selection, a field experiment in a natural environment eliminates the stressors of an artificial environment. Yet, other factors that could be controlled in a lab may influence results. For example, nutrition and health are controlled in a lab, but not in the field.

Experimental Design

Experimental design is an important part of research methodology, with direct implications for the confirmation and reliability of scientific studies. It is the scientific, logical, and planned way of arranging tests and deciding how they will be conducted so that hypotheses can be tested and sound conclusions drawn. It is the procedure followed to control the variables and conditions that may influence a study's outcome, reducing bias, improving the effectiveness of data collection, and ultimately improving the quality of the results.

What is Experimental Design?

Experimental design simply refers to the strategy employed in conducting experiments to test hypotheses and arrive at valid conclusions. The process comprises formulating research questions, selecting variables, specifying the conditions of the experiment, and establishing a protocol for data collection and analysis. Its importance lies in its potential to prevent bias, reduce variability, and increase the precision of results, thereby achieving high internal validity. With a sound experimental design, researchers can generate valid results that generalize to other settings, advancing knowledge in various fields.

Definition of Experimental Design

Experimental design is a systematic method of implementing experiments in which one manipulates variables in a structured way in order to test hypotheses and draw conclusions based on empirical evidence.

Types of Experimental Design

Experimental design encompasses various approaches to conducting research studies, each tailored to address specific research questions and objectives. The primary types of experimental design include:

  • Pre-experimental Research Design
  • True Experimental Research Design
  • Quasi-Experimental Research Design
  • Statistical Experimental Design

Pre-experimental Research Design

A preliminary approach where groups are observed after implementing cause and effect factors to determine the need for further investigation. It is often employed when limited information is available or when researchers seek to gain initial insights into a topic. Pre-experimental designs lack random assignment and control groups, making it difficult to establish causal relationships.

Classifications:

  • One-Shot Case Study
  • One-Group Pretest-Posttest Design
  • Static-Group Comparison

True-experimental Research Design

The true-experimental research design involves the random assignment of participants to experimental and control groups to establish cause-and-effect relationships between variables. It is used to determine the impact of an intervention or treatment on the outcome of interest. True-experimental designs satisfy the following factors (a random-assignment sketch follows the list):

Factors to Satisfy:

  • Random Assignment
  • Control Group
  • Experimental Group
  • Pretest-Posttest Measures
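
A minimal sketch of the random-assignment step is shown below; the participant IDs, group sizes, and fixed seed are illustrative assumptions rather than part of the source text.

    import random

    participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants

    random.seed(42)               # fixed seed only so the illustration is reproducible
    random.shuffle(participants)  # random assignment: order no longer reflects recruitment

    midpoint = len(participants) // 2
    control_group = participants[:midpoint]        # receives no treatment
    experimental_group = participants[midpoint:]   # receives the intervention

    print("Control group:     ", control_group)
    print("Experimental group:", experimental_group)
    # Pretest and posttest measures would then be recorded for every participant
    # in both groups and compared after the intervention.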

Quasi-Experimental Design

A quasi-experimental design is an alternative to the true-experimental design when the random assignment of participants to the groups is not possible or desirable. It allows for comparisons between groups without random assignment, providing valuable insights into causal relationships in real-world settings. Quasi-experimental designs are typically used when random assignment of participants cannot be done or would not be ethical, for example, in an educational or community-based intervention.

Statistical Experimental Design

Statistical experimental design, also known as design of experiments (DOE), is a branch of statistics that focuses on planning, conducting, analyzing, and interpreting controlled tests to evaluate the factors that may influence a particular outcome or process. The primary goal is to determine cause-and-effect relationships and to identify the optimal conditions for achieving desired results. The details are discussed below.

Design of Experiments: Goals & Settings

The goals and settings for design of experiments are as follows:

  • Identifying Research Objectives: Clearly defining the goals and hypotheses of the experiment is crucial for designing an effective study.
  • Selecting Appropriate Variables: Determining the independent, dependent, and control variables based on the research question.
  • Considering Experimental Conditions: Identifying the settings and constraints under which the experiment will be conducted.
  • Ensuring Validity and Reliability: Designing the experiment to minimize threats to internal and external validity.

Developing an Experimental Design

Developing an experimental design involves a systematic process of planning and structuring the study to achieve the research objectives. Here are the key steps (a minimal end-to-end sketch follows the list):

  • Define the research question and hypotheses
  • Identify the independent and dependent variables
  • Determine the experimental conditions and treatments
  • Select the appropriate experimental design (e.g., completely randomized, randomized block, factorial)
  • Determine the sample size and sampling method
  • Establish protocols for data collection and analysis
  • Conduct a pilot study to test the feasibility and refine the design
  • Implement the experiment and collect data
  • Analyze the data using appropriate statistical methods
  • Interpret the results and draw conclusions
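
As an end-to-end sketch of these steps, the fragment below sets up a completely randomized two-treatment design, collects simulated placeholder measurements, and analyzes them with a two-sample t-test. The unit count, the simulated effect size, and the choice of test are assumptions made for illustration, and SciPy is assumed to be available.

    import random
    from scipy import stats  # assumes SciPy is available

    # Variables: treatment (A or B) is the independent variable; the measured
    # response is the dependent variable.
    random.seed(0)
    units = list(range(24))                    # 24 hypothetical experimental units
    random.shuffle(units)                      # completely randomized assignment
    group_a, group_b = units[:12], units[12:]

    def measure(unit, treatment):
        """Placeholder data collection: a real study would record observations."""
        response = 10.0 + random.gauss(0, 1.0)
        return response + (1.5 if treatment == "A" else 0.0)  # assumed effect of A

    responses_a = [measure(u, "A") for u in group_a]
    responses_b = [measure(u, "B") for u in group_b]

    # Analysis with an appropriate statistical method (here, a two-sample t-test).
    t_stat, p_value = stats.ttest_ind(responses_a, responses_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")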

Preplanning, Defining, and Operationalizing for Design of Experiments

Preplanning, defining, and operationalizing are crucial steps in the design of experiments. Preplanning involves identifying the research objectives, selecting variables, and determining the experimental conditions. Defining refers to clearly stating the research question, hypotheses, and operational definitions of the variables. Operationalizing involves translating the conceptual definitions into measurable terms and establishing protocols for data collection.

For example, in a study investigating the effect of different fertilizers on plant growth, the researcher would preplan by selecting the independent variable (fertilizer type), dependent variable (plant height), and control variables (soil type, sunlight exposure). The research question would be defined as “Does the type of fertilizer affect the height of plants?” The operational definitions would include specific methods for measuring plant height and applying the fertilizers.

Randomized Block Design

Randomized block design is an experimental approach where subjects or units are grouped into blocks based on a known source of variability, such as location, time, or individual characteristics. The treatments are then randomly assigned to the units within each block. This design helps control for confounding factors, reduce experimental error, and increase the precision of estimates. By blocking, researchers can account for systematic differences between groups and focus on the effects of the treatments being studied.

Consider a study investigating the effectiveness of two teaching methods (A and B) on student performance. The steps involved in a randomized block design would include (see the sketch after this list):

  • Identifying blocks based on student ability levels.
  • Randomly assigning students within each block to either method A or B.
  • Conducting the teaching interventions.
  • Analyzing the results within each block to account for variability.
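
Here is a minimal sketch of the blocking and within-block randomization described above; the student names, block labels, and fixed seed are hypothetical.

    import random

    # Hypothetical students grouped into blocks by ability level.
    blocks = {
        "high":   ["Ana", "Ben", "Cara", "Dev"],
        "medium": ["Eli", "Fay", "Gus", "Hana"],
        "low":    ["Ivan", "Jo", "Kim", "Lee"],
    }

    random.seed(1)  # fixed seed only so the illustration is reproducible
    assignment = {}
    for block, students in blocks.items():
        shuffled = students[:]
        random.shuffle(shuffled)
        half = len(shuffled) // 2
        # Within each block, half the students get Method A and half get Method B.
        for student in shuffled[:half]:
            assignment[student] = ("A", block)
        for student in shuffled[half:]:
            assignment[student] = ("B", block)

    for student, (method, block) in assignment.items():
        print(f"{student}: method {method} (block: {block})")
    # Results would then be analyzed within each block to account for variability.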

Completely Randomized Design

A completely randomized design is a straightforward experimental approach where treatments are randomly assigned to experimental units without any specific blocking. This design is suitable when there are no known sources of variability that need to be controlled for. In a completely randomized design, all units have an equal chance of receiving any treatment, and the treatments are distributed independently. This design is simple to implement and analyze but may be less efficient than a randomized block design when there are known sources of variability.

Between-Subjects vs Within-Subjects Experimental Designs

A detailed comparison of between-subjects and within-subjects designs is tabulated below:

Between-Subjects                                            | Within-Subjects
Each participant experiences only one condition             | Each participant experiences all conditions
Typically includes a control group for comparison           | Does not involve a control group, as participants serve as their own control
Requires a larger sample size for statistical power         | Requires a smaller sample size for statistical power
Less susceptible to order effects                           | More susceptible to order effects
More impacted by individual differences among participants  | Less impacted by individual differences among participants
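
The choice of design also changes the analysis: a between-subjects comparison is usually analyzed with an independent-samples test, while a within-subjects comparison pairs each participant's scores across conditions. A minimal sketch with hypothetical scores (SciPy assumed):

    from scipy import stats  # assumes SciPy is available

    # Between-subjects: two separate groups of participants (hypothetical scores).
    group_a = [72, 68, 75, 70, 74]
    group_b = [78, 80, 76, 82, 79]
    t_between, p_between = stats.ttest_ind(group_a, group_b)

    # Within-subjects: the same participants measured under both conditions,
    # so the scores are paired by participant.
    condition_1 = [72, 68, 75, 70, 74]
    condition_2 = [75, 71, 77, 73, 78]
    t_within, p_within = stats.ttest_rel(condition_1, condition_2)

    print(f"Between-subjects: t = {t_between:.2f}, p = {p_between:.4f}")
    print(f"Within-subjects:  t = {t_within:.2f}, p = {p_within:.4f}")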

Design of Experiments Examples

Examples of design of experiments are as follows:

Between-Subjects Design Example:

In a study comparing the effectiveness of two teaching methods on student performance, one group of students (Group A) is taught using Method 1, while another group (Group B) is taught using Method 2. The performance of both groups is then compared to determine the impact of the teaching methods on student outcomes.

Within-Subjects Design Example:

In a study assessing the effects of different exercise routines on fitness levels, each participant undergoes all exercise routines over a period of time. Participants’ fitness levels are measured before and after each routine to evaluate the impact of the exercises on their fitness levels.

Application of Experimental Design

The applications of experimental design are as follows:

  • Product Testing:  Experimental design is used to evaluate the effectiveness of new products or interventions.
  • Medical Research:  It helps in testing the efficacy of treatments and interventions in controlled settings.
  • Agricultural Studies:  Experimental design is crucial in testing new farming techniques or crop varieties.
  • Psychological Experiments:  It is employed to study human behavior and cognitive processes.
  • Quality Control:  Experimental design aids in optimizing processes and improving product quality.

In scientific research, experimental design is a crucial procedure that outlines an effective strategy for carrying out a meaningful experiment and drawing correct conclusions. Proper control and coordination in conducting experiments increase reliability and validity and allow knowledge to advance across various fields. Applying sound experimental design principles is essential for ensuring that experimental outcomes are impactful and valid.

Also, Check

  • What is Hypothesis
  • Null Hypothesis
  • Real-life Applications of Hypothesis Testing

FAQs on Experimental Design

What is experimental design in math?

Experimental design refers to the planning of experiments: how data will be gathered, how variables will be controlled, and how sensible conclusions will be drawn from the outcomes.

What are the advantages of the experimental method in math?

The advantages of the experimental method include control of variables, establishment of cause-and-effect relationships, and the use of statistical tools for proper data analysis.

What is the main purpose of experimental design?

The goal of experimental design is to describe the nature of variables and examine how changes in one or more variables impact the outcome of the experiment.

What are the limitations of experimental design?

Limitations include potential biases, the complexity of controlling all variables, ethical considerations, and the fact that some experiments can be costly or impractical.

What are the statistical tools used in experimental design?

Statistical tools utilized in experimental design include ANOVA, regression analysis, t-tests, chi-square tests, and factorial designs.


Experiments in Computing: A Survey

Matti Tedre

1 Department of Computer and Systems Sciences, Stockholm University, 16440 Kista, Sweden

Nella Moisseinen

2 Faculty of Behavioural Sciences, University of Helsinki, 00014 Helsinki, Finland

Experiments play a central role in science. The role of experiments in computing is, however, unclear. Questions about the relevance of experiments in computing attracted little attention until the 1980s. As the discipline then saw a push towards experimental computer science, a variety of technically, theoretically, and empirically oriented views on experiments emerged. As a consequence of those debates, today's computing fields use experiments and experiment terminology in a variety of ways. This paper analyzes experimentation debates in computing. It presents five ways in which debaters have conceptualized experiments in computing: feasibility experiment, trial experiment, field experiment, comparison experiment, and controlled experiment. This paper has three aims: to clarify experiment terminology in computing; to contribute to disciplinary self-understanding of computing; and, due to computing's centrality in other fields, to promote understanding of experiments in modern science in general.

1. Introduction

After the birth of the stored-program paradigm in the mid-1940s, computing as a discipline started to form up. The first step in the discipline creation was to separate it from the fields that gave birth to it, especially from mathematics and electrical engineering. In the 1960s and the 1970s the field was divided over a debate concerning the mathematical nature of computing (e.g., [ 1 – 6 ]). There were a variety of formal, theory-oriented views of computing as a discipline. Some theoretically proficient computer scientists emphasized the mathematical analysis of algorithms for the general conclusions such analysis could provide [ 7 – 12 ]. Another group focused on developing a mathematical theory of program construction [ 13 – 18 ]. The most vehement advocates of a mathematical theory of computing went as far as to suggest that programming as an activity is fully reducible to mathematics [ 19 ]. In the theoretical advocates' visions of the discipline, the role of empirical work and experimentation was often ambiguous, as it was rarely, if ever, discussed in detail.

Another debate that characterized the development of computing as a discipline was concerned with the field's engineering character. Engineering aspects of computing were, for several decades, effectively kept out of the academic debate about computing as a discipline; despite the fact that the first computers were built in universities, they were used for applied sciences, and the development of early computing in universities had a strong engineering character [ 20 – 23 ]. The late 1960s, however, saw a new turn in these debates when software engineering was brought to limelight [ 24 ]—and harshly criticized [ 25 ]. For decades, software engineering remained a target of sustained criticism. Software engineers were accused of basing their work on a combination of anecdotal evidence and human authority [ 26 ]. What is more, meta-analyses of literature found that a large portion of software engineering articles failed to experimentally validate their results [ 27 – 29 ]. Lacking experimentation was one of the commonly criticized aspects of software engineering.

A third debate about the essence of computing as a discipline was concerned with the scientific character of computing. There were arguments over whether computing is a science or not, and there were arguments over what computing might be a science of [ 30 ]. In one of the influential early defenses of the scientific nature of computer science it was argued that computer science is the study of computers and phenomena surrounding them [ 31 ]. Other proposals for the subject matter of computing included, for instance, information, algorithms, classes of computations, programming, complexity, and procedures [ 5 , 32 – 37 ].

Arguments that looked at the subject matter of computing never managed to settle the debate over the scientific character of computing. But over the course of time, the focus of the “science” debates shifted from subjects to activities. It became increasingly common to argue that computing is indeed science—not by virtue of its subject matter but by virtue of its method of inquiry .

The methodology question entered computing debates gradually. Many early arguments for computing as a science glossed over methodological questions. Although some descriptions of the “axiomatic” or “mathematical” sciences of computation compared computing with natural sciences (e.g., [ 16 ]), they rarely discussed either the relevance of the scientific method to computing or the role of experiments in the field. Similarly, one of the first descriptions of computing as an empirical science, by Newell et al. [ 31 ], was vague about methods and empirical approaches in the science of computing. The methodology question was finally brought into limelight by the experimental computer science debate, when a campaign for “rejuvenating experimental computer science” started at the turn of the 1980s [ 38 – 41 ].

The view that computing is an inseparable combination of three very different intellectual traditions—theory, engineering, and empirical science [ 42 ]—complicates many debates about computing. One such debate is the “experimental computer science” debate. The words “experiment” and “experimental” are understood very differently between the traditions, which makes it difficult to grasp the intent of each argument on experimental computer science. This paper presents a survey of arguments about experimental computer science and shows that at least five different uses of the terms “experiment” and “experimental” can be found in the computing literature. This paper is a survey of how terminology is actually used and not of how it should be used. For instance, in the engineering tradition experimentation terminology is used much more loosely than in the tradition of experiment-based science. In short, the paper seeks an answer to the question, “What do computer scientists mean when they talk about experiments in computer science?”

2. Experimentation in Computing

Among researchers in computing disciplines there is wide support for views of computing as an empirical or experimental science. However, the terms empirical and experimental are not always used coherently. In sciences in general, it is relatively common to see the term “empirical” used to refer to research that relies on observation-based collection of primary data. The term “empirical research” stands in contrast with theoretical and analytical research. In many fields of science the term “experimental” goes deeper than “empirical” and refers to a specific kind of research, where controlled experiments are used for testing hypotheses. However, in the field of computing the term “experimental” has been used in a much wider range of meanings.

The role of experimentation in computing became a hot topic when Feldman and Sutherland [ 40 ] published their report entitled “Rejuvenating Experimental Computer Science.” That report recommended that universities and the U.S. government should recognize and support experimental computer science. Denning [ 38 ] joined ranks with the Feldman committee and wrote that no scientific discipline can be productive in the long term if its experimenters merely build components. Also the ACM Executive Committee, which included Denning, agreed with the Feldman committee in that experimental computer science was undervalued at the time [ 41 ].

The “rejuvenating” report marked a shift of focus in methodology debates from the roles of theory and subject matter to the amount and methodological quality of empirical work in computing. The following decades saw numerous descriptive and normative arguments on the role of empirical and experimental research in computing. While some described how computer scientists actually work, others prescribed how they should work. Several studies compared research reports in computing with those in other fields—usually natural sciences or established branches of engineering [ 28 , 45 ]. In those studies, it was a common finding that researchers in computing fields experiment significantly less than researchers in many other disciplines do [ 28 , 45 ].

Over the course of time, many authority figures in computing advised computer scientists to experiment more [ 46 , 47 ]. Given that much of that encouragement was due to inspiration from other fields, it is interesting to look at the computing side of the story. In particular, what do computer scientists from different backgrounds mean by “experimental computer science?” This section presents, firstly, the context of the experimental science debate through four viewpoints: empirical dimensions of computing, subjects of experimentation, experimental activities, and various terminological and classification viewpoints. Secondly, this section outlines critical viewpoints to experiments in computing, as presented in computing literature.

2.1. Experimentation in Computing Context

2.1.1. Empirical Dimensions of Computing

All the different accounts of experiments in computing—from controlled experiments to experimental algorithmics—fall into the broader category of empirical work. Computing and empirical research have been coupled in the literature in various ways, of which one particular perspective is discussed below. Computing and computers are, for one thing, subjects of research. Second, they are instruments of research. Third, they may be both at once.

One popular way of discussing computing and experimentation is to see computers and phenomena around them as a subject of research (e.g., [ 31 ]). There is a rich body of experimental work on computers, programming languages, interfaces, users, and algorithms, just to name a few. Some experiments are done in a very controlled manner, while some authors refer to their exploratory work as “experimental.” Viewing computing, computers, and phenomena surrounding them as a subject of inquiry opens doors for a variety of views on experimentation, and this paper looks at that aspect of experiments in computing.

Another popular way of discussing experimentation in computing is through seeing computers as research instruments in other fields. The history of computing and computers as instruments for experiments (simulations) in other fields is a long-established one. In his introduction to the famous 1946 Moore School lectures, Stibitz [ 48 ] argued that digital computers are an incredible laboratory where “ the ingredients of every experiment are perfectly isolated .” Stibitz wrote that computers offer unlimited precision and an unlimited supply of instruments for research. Later on, the first modern computers were used for applied sciences, such as ballistics calculations, warfare [ 49 , page 214], meteorology, astronomy [ 50 , page 189], and quantum physics [ 25 , page 122]. Progress in modern science is so heavily dependent on computing that different authors have called the increased dependence “algorithmization” of sciences [ 51 ], “the age of computer simulation” [ 52 ], and even an “info-computational” view of the world [ 53 ]. Computing has introduced a plethora of tools for other sciences—take, for instance, virtual experiments, simulations, heuristic models, and neural networks [ 52 ]. Viewing computing as an instrument of research paints another image on experimentation, different from viewing computing as a subject of research.

Various kinds of models also pervade the field of computing. One can easily consider specifications, program texts, and programming languages to be certain kinds of models [ 54 , 55 ]. The experiment is a central part of validating models or testing the fit between the model and the world [ 56 ]. However, when computer models are used as tools, one should ask which discipline is actually being studied. Colburn [ 43 ] used computational models in genetics as an example. Is the programmer actually doing genetics or computer science? In many computational sciences joint work benefits both computing and the field where it is applied [ 57 ]. That is, computing can at once be a tool and a subject of study. This paper, however, does not focus on the instrumental aspect of computing but on research of computing for computing's sake.

2.1.2. Subjects and Topics

Another angle at describing the context of experimentation in computing is to look at its subjects and topics. As there is already a good number of arguments about experimental computer science, one can borrow examples directly from the literature. In his discussion on experiments in computer science, Denning [ 39 ] brought up research on memory policies in time sharing and research on queuing networks. Freeman [ 58 ] proposed examples of a robot competition, research of data-intensive supercomputing, and research of future network architectures. Feldman and Sutherland [ 40 ] included advanced applications of computers. Gustedt et al. [ 59 ] highlighted research on, for instance, grid computing, parallel computing, large-scale distributed systems, and various other projects in large-scale computing. Basili and Zelkowitz [ 60 ] mentioned software engineering and high-end computing. Various authors from Chaitin to Zuse have argued that nature itself calculates [ 61 , 62 ]. In the end, subject as such is not of importance [ 63 ]. Any subject can be studied scientifically, and many can be studied experimentally.

2.1.3. Activities

One can also take a look at what kind of activities the term “experimental computer science” might cover. In the original “rejuvenating” report [ 40 ], experimenting in computer science was characterized as exploration (page 498), construction and testing (page 499), hypothesis-testing, demonstration, and modeling (page 500). Denning listed modeling, simulation, measurement, validation, prototyping, testing, performance analysis, and comparisons [ 64 ]. Other participants of the debate mentioned, for example, measuring, testing, making hypotheses, observing, collecting data, classifying, and sustaining or refuting hypotheses [ 39 , 41 ]. As a prime example of experimental computer science, Denning [ 39 ] referred to performance analysis—the construction, validation, and empirical evaluation of computer systems. Belady [ 65 ] wrote that his experimental computer science involved building prototypes, observing, organizing observations, and formalizing them into models. All the activities above are central to science, but they are central to different kinds of science.

At the end of the 1980s the famous report “Computing as a discipline” by Denning et al. [ 42 ] raised modeling as one of the three cornerstones of computing. In that report, experiments played a role similar to their role in natural sciences. Denning et al. described the cycle of work on the science side of computing through four steps, (1) Form a hypothesis, (2) construct a model and make a prediction, (3) design an experiment and collect data, and (4) analyze results. Freeman [ 58 ] dropped the hypothesis part and advocated a view of experimentation in computing based on a cycle of observation, measurement, and analysis of results. Gelernter [ 66 , page 44] emphasized the generalizability of results, he explicitly noted the deductive and inductive phases of research, and he argued that computing is indeed a science insofar as its combination of theoretical foundations and experiments allows the making and proving of general statements.

One unique formulation of an experiment-like procedure in computing—one with automated and repeatable experiments—can be found in the cycle of test-driven development ( Figure 1 ; see, e.g., [ 67 ]). In test-driven development, each cycle in software construction starts with writing a test for an added software feature. The procedure continues with running all the tests and seeing the previously added test fail, writing code that implements the wanted feature, and running the tests again to see if the newly written code really implements the desired functionality. In other words, the programmer starts from a certain functionality requirement, designs an automated experiment that is aimed at testing that functionality, and implements code that passes all the new and previous tests.

Figure 1: Cycle of work in test-driven development.
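
To illustrate the cycle in Figure 1 (the function name and required behavior below are hypothetical, not taken from the paper), a single test-driven step in Python might look like this: the test is written and run first, it fails while slugify is missing or incomplete, and the implementation is then written until the test passes.

    import unittest

    def slugify(title):
        """Implementation written only after the test below was seen to fail."""
        return "-".join(title.lower().split())

    class TestSlugify(unittest.TestCase):
        # The automated "experiment": a repeatable check of one required behavior.
        def test_spaces_become_hyphens(self):
            self.assertEqual(slugify("Experimental Computer Science"),
                             "experimental-computer-science")

    if __name__ == "__main__":
        unittest.main()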

In the field of software engineering there is a rich history of discussions on experimental methods—including highly influential accounts like that of Basili, Selby, and Hutchens [ 68 ]—although terminology in those discussions is often used differently from what the stalwart proponents of experimental computer science advocated. Zelkowitz and Wallace [ 28 , 29 ] categorized “experimental approaches” in software engineering into three categories: observational methods, which collect data throughout the project; historical methods, which collect data from already completed projects; and controlled methods, which attempt to increase the statistical validity of results by providing multiple instances of observations. Of observational methods, they listed project monitoring, case study, assertion, and field study [ 28 ]. Of historical methods, they listed literature search, legacy data, lessons learned, and static analysis. Of controlled methods, they listed replicated experiment, synthetic environment experiments, dynamic analysis, and simulation. It is important to note that Zelkowitz and Wallace [ 28 , 29 ] did not call their lists “empirical” but “experimental” models and approaches. They argued that their categories cover the previously presented taxonomies, such as the nine variants of quantitative and qualitative experiments described by Kitchenham [ 69 ] as well as the six types identified by Basili [ 70 ]. Again, the descriptions of experimentation in software engineering are all central to science but to different kinds of science.

On the broader level, Morrison and Snodgrass [ 71 ] wrote that debugging is one aspect of the scientific method that computer scientists do well. Different from Dijkstra [ 13 ], who opposed debugging as “ putting the cart before the horse ,” Morrison and Snodgrass described debugging as “ one of the purest forms of empirical investigation .” There are indeed various attempts to describe debugging as a “science of debugging” [ 44 , 72 , 73 ]. One of the pioneering works in the philosophy of experiment, by Hacking [ 74 ], named “debugging” as a central element in modern experimentation—although its meaning in the context that Hacking discussed is different from its meaning in computing. Also other modern views of the scientific method include debugging, under different names, in the cycle of scientific research (e.g., [ 75 ]). The literature on the philosophy of engineering takes that aspect of research further through, for instance, parameter variation: the repeated measurement of a device's performance while systematically adjusting the device's parameters or its conditions of operation [ 76 , page 139].

Colburn [ 43 ] sketched another formulation of experiment-based work in computer science in the form of “solution engineering.” In various branches of computer science the usual scenario includes rigorous requirements, and the task of the computer scientist is to engineer an algorithmic solution. Table 1 presents Bartley's [ 44 ] description of debugging, in parallel with Colburn's [ 43 ] “solution engineering” and a simplified three-step view of the scientific method.

Table 1: Analogy between the scientific method, Colburn's [ 43 ] “solution engineering,” and Bartley's [ 44 ] view of debugging.

The scientific method                                                        | Solution engineering                                                             | Debugging
Formulate a hypothesis for explaining a phenomenon                           | Formulate an algorithm for solving a problem                                     | Make a guess as to what causes an identified bug
Test the hypothesis by conducting an experiment                              | Test the algorithm by writing and running a program                              | Test the guess by, for instance, tracing the program states
Accept or reject the hypothesis by evaluating the results of the experiment  | Accept or reject the algorithm by evaluating the results of running the program  | Accept or reject the guess by evaluating the program states

In Colburn's analogy in Table 1 , what is being tested in the scientific method is not the experiment but the hypothesis. The experiment is a tool for testing the hypothesis. Similarly, in Colburn's analogy, what is being tested in problem solving in computer science is not the program but the algorithm. The program is written in order to test the algorithm. In this analogy, writing a program is analogous to constructing a test situation. Khalil and Levy [ 35 ] made a similar analogy as they wrote, “ programming is to computer science what the laboratory is to the physical sciences .”

Although solution engineering presents another view of experimentation in computing disciplines, it has been argued that an experiment in science can never test an isolated hypothesis but the whole theoretical group: assumptions, auxiliary hypotheses, and indeed the whole test situation [ 77 , 78 ]. Similarly, running a program cannot accept or reject an algorithm alone, but it can only accept or reject the whole test system—including, for example, the operating system, hardware, quality of data, and contingent environmental interference. It can never be ruled out that the algorithm and the corresponding program were fine but something else in the test system caused wrong results—and it cannot be ruled out that the program was incorrect but, due to a problem with the test system, it contingently yielded right results.

2.1.4. Terminology and Classifications

There have also been analyses of experimentation terminology in computing. Feitelson [ 79 ] distinguished between three uses of the term “experimental computer science.” He argued that the most prominent use of the term is to use it as a counterpart to theoretical computer science. The second use of the term, according to Feitelson [ 79 ], is as a part of a feedback loop for the development of models, systems, and various other elements of computer science. Feitelson's third notion referred to the adoption of scientific experimental methods for the evaluation of computer systems. Gustedt, Jeannot, and Quinson presented four examples from large-scale systems: in situ experiments, emulation, benchmarking, and simulation [ 59 ].

Amigoni et al. [ 80 ] analyzed experimental activities in mobile robotics and classified them according to their purposes, the data sets they employ, and their measured quantities, be they intrinsic or extrinsic. Regarding purposes , they found demonstrations, gathering insight into a system's behavior, assessing limits of applicability, and comparing systems. Regarding data sets , they found publicly available instances, as well as uses of different environments. Regarding measured quantities , they found a number of measures, ranging from analytical (in fact nonmeasured, such as time complexity) to empirical (such as accuracy and robustness).

To summarize, the context in which experimental approaches in computing are discussed is extremely broad. Right or wrong, experimentation terminology is by no means used in the same way it is used in, for instance, physics [ 81 – 83 ], biology [ 84 ], or chemistry. There are various views on the role of computing regarding experiments, there is a diversity of opinions on methods applicable, there are various examples of appropriate subjects and topics, and there are many existing analyses of experimentation in computing. However, although there are many advocates of experimentation in computing, various critical viewpoints can also be found in the literature.

2.2. Critique of Experimentation

Although the general atmosphere in disciplinary debates of computing has become positive towards experimental computer science, the identity of the field is still in a state of flux, and there is a notable history of critical views towards experiments and experimentation language in computing. Some critics argued that the role or the nature of experiments differs between computing and natural sciences [ 85 , 86 ]. Others disputed the centrality of experiments in computing [ 87 ]. Yet others claimed that in computing experiments are not done right or are not articulated right [ 28 , 29 ].

The mathematical reductionists, for one, had reservations about experimentation in computing. In his famous argument for programming as a mathematical activity, Hoare [ 19 ] complained that, because computers and programs are not constructed with mathematical rigor, the only way of finding out what they do is by experiment. He wrote that such experiments in computing certainly are not mathematics, and that because their findings often can not be generalized, “ unfortunately, they are not even science ” [ 19 ]. Hoare's answer at the time was to rigorously prove that a system will work as planned. Fletcher [ 87 ] criticized some authors' preoccupation with experimentation and noted that without the theoretical idea of Turing equivalence of all computers there would be no academic discipline of computing but just eclectic knowledge about particular machines. Many others who advocated variants of “mathematical” or “axiomatic” approaches to computing never made their stance towards experiments clear (e.g., [ 16 ]).

The second source of objections was concerned with the differences between experiment in natural sciences and in computing. Emphasizing the view that computing is a constructive discipline, Hartmanis [ 85 ] argued that experimentation in computer science is different from the natural sciences, as it focuses “ more on the how than the what .” He wrote that whereas advancements in natural sciences are documented by dramatic experiments, in computer science—which Hartmanis [ 88 ] called the “ engineering of mathematics ”—advancements are documented by dramatic demonstrations. The role of experiments in computing, according to Hartmanis and Lin [ 86 ], is to uncover practical issues with theoretical work instead of proving those theories wrong—quite a different view compared to an idealized view of the role of experiments in science (as described in, for instance, the old falsificationist, hypothetico-deductive, and deductive-nomological models of science [ 89 – 91 ].)

Hartmanis [ 92 ] claimed that there are three differences between natural sciences and computing: in computing theories do not compete with each other as explanations of the fundamental nature of information; in computing anomalies in experimental results do not lead to revision of theories, and in computing there is no history of critical experiments that decide between the validity of competing theories. Hartmanis' [ 86 , 92 ] views faced immediate criticism. Loui [ 93 ] responded that, instead of calling computing a new species among sciences, it would be more appropriate to call computer science a new species of engineering. Stewart [ 94 ] responded by writing that computer scientists should strive to make computer science similar to the natural sciences. Dijkstra [ 95 ] responded that it is ridiculous to support computer science and engineering as a “ laboratory discipline ( i.e., with both theoretical and experimental components )” if the material taught in computing has a half-life of five years. Finally, even if one accepted the controversial claim that computing has no history of critical experiments that decide between theories, there surely is a history of critical demonstrations that have decided between competing techniques and guided technical development efforts.

The third common type of objection was concerned with the artificial nature of data and subject matter of computing. McKee [ 96 ] noted that in natural sciences research is based on observations (data), which scientists can explain, predict, and replicate. In the field of computing, McKee continued that there is no data beyond the computer and programs, which behave exactly as they were designed to behave. In a similar manner, also Brooks [ 57 ] argued that computer science is not a science but a synthetic, engineering discipline. The role of experimentation in a synthetic discipline is different from its role in natural sciences (see [ 86 , 97 ]).

The fourth common objection was concerned with terminology. The careless use of experimental terminology—not experiments per se—has been criticized by various authors (e.g., [ 60 , 79 ]). A meta-analysis by Zelkowitz and Wallace [ 28 , 29 ] revealed that terms “experiment” and “effective” were often used loosely or ambiguously. The authors wrote, “ Researchers write papers that explain some new technology; then they perform “experiments” to show how effective the technology is .” Zelkowitz and Wallace's central concern was the same as Denning's [ 38 ]. It is not science to develop something and say that it seemed to work well.

One could add a fifth objection related to the normative claims that advocates of experimentation sometimes made. Many of those authors who urged computer scientists to experiment more failed to justify why computer scientists should aspire to work like scientists or engineers in other fields do. One might justly ask, “If the subject matter of computer science is different from the other sciences, on what grounds should its methods be the same?” Computing is a unique field that introduces an array of novel techniques, so perhaps some room should be left for uniqueness in methodological sense, too.

In addition to the objections, Gustedt et al. [ 59 ] proposed various assumptions that may explain the lack of experimenting in computing: insufficient funding for experimenting, “missing disposability of dedicated experimental environments,” lack of appreciation of work-intensive experimental results, and lack of methods and tools. Similarly, Tichy [ 27 ] suggested eight (mis)beliefs that he believed to explain why experiments are not more popular: “Traditional scientific method is not applicable,” “The current level of experimentation is good enough,” “Experiments cost too much,” “Demonstrations will suffice,” “There's too much noise in the way,” “Experimentation will slow progress,” “Technology changes too fast,” and “You'll never get it published.” Also Denning [ 38 ] objected against three hypothetical misconceptions about experimental computer science: “It is not novel to repeat an experiment,” “mathematics is the antithesis of experiment,” and “tinkering is experimental science.”

3. Five Views on Experimental Computer Science

Discussions about experimental computer science, as presented in the section above, are complicated by the various uses of the terms “to experiment” (the verb), “an experiment” (the noun), “experimentation” (the noun), “experimental” (the adjective), and the myriad derivatives of those words. The confusion was visible already in the “rejuvenating” report, and, while a lot of effort has been spent on clarifying the concepts (e.g., [ 28 , 39 , 45 , 98 ]), there is still no agreement on experimentation terminology. This chapter presents five different uses of the term “experiment,” each relatively common in the computing literature. It should be noted that this chapter passes no judgment on “correct” uses of experimentation terminology; it only describes how it has been used in the literature.

3.1. Feasibility Experiment

The first and loosest use of the term “experiment” can be found in many texts that report and describe new techniques and tools. Typically, in those texts, it is not known if task t can be automated efficiently, reliably, feasibly, cost-efficiently, or by meeting some other simple criterion. A demonstration of experimental (novel, untested, and newly implemented) technology shows that it can indeed be done. Including the terms “demonstration” and “experimental” in the same sentence may sound like a forced marriage of two incompatible concepts, but in the computing literature “experiment” is indeed sometimes used nearly synonymously with “demonstration,” “proof of concept,” or “feasibility proof” as the following examples demonstrate.

Hartmanis and Lin [ 86 , pages 213-214] wrote that in computer science and engineering theories develop over years of practice, with “ experiments largely establishing the feasibility of new systems .” Plaice [ 99 ] wrote, in ACM Computing Surveys , that the development of large software systems exemplifies experimentation in computer science—“ and experimentation is the correct word, because we often have no idea what these tools will offer until they are actually used .” He continued to describe that what constitutes an experiment is that a scientist “ carefully defines what must be done and then carefully sets out to do it .” Feitelson [ 79 ] identified the “demonstration of feasibility” view as one of the three common views to experimental computer science. Feitelson also noted that the “demonstration of feasibility” experiments in applied computer science are largely divorced from theoretical computer science [ 79 ].

The ACM FCRC Workshop on Experimental Computer Science ( http://people.csail.mit.edu/rudolph/expcs.pdf (retrieved January 30, 2013)) involved “experimental engineering” that produces new “ techniques, insights, and understanding that come from building and using computer systems .” Hartmanis [ 85 ], though, wanted to make the difference between experiments and demonstrations explicit, calling for computing researchers to acknowledge the central role of demonstrations in the discipline. In their description of experimental computer science Basili and Zelkowitz [ 60 ], too, criticized the “demonstration” view of experimentation in computing: “ experimentation generally means the ability to build a tool or system—more an existence proof than experiment .”

3.2. Trial Experiment

The second use of the term “experiment” in computing goes further than demonstrations of feasibility. The trial experiment evaluates various aspects of the system using some predefined set of variables. Typically, in those studies, it is not known how well a new system s meets its specifications or how well it performs. A trial (or test, or experiment) is designed to evaluate (or test, or experiment with) the qualities of the system s . Those tests are often laboratory based but can also be conducted in the actual context of use with various limitations.

Of Gustedt et al.'s [ 59 ] four-way categorization of experiments ( in situ experiments, emulation, benchmarking, and simulation), the ones that permit the most abstraction—emulation, simulation, and benchmarking—fall into the trial experiment category. Emulation runs a real application in a model environment, simulation runs a model (limited functionality) application in a model environment, and benchmarking evaluates a model application in a real environment [ 59 ]. Similar “toy-versus-real” distinctions are made in descriptions of experimentation in software engineering [ 100 ].

McCracken et al. [ 41 ] wrote that experimental research is about “ not only the construction of new kinds of computers and software systems, but also the measurement and testing ” of those systems. Furthermore, trial experiments are not a privilege of the applied side of computing. Glass [ 47 ] proposed that formal theory needs to be validated by experiments, and Fletcher [ 87 ] wrote that theoretical computer scientists may “ resort to trial runs because the problem is mathematically intractable .” Many types of validation of computational models of phenomena fall under trial experiments.

3.3. Field Experiment

A third common use of the term “experiment” is similar to trial experiments in that it is also concerned with evaluating a system's performance against some set of measures. However, the field experiment takes the system out of the laboratory. Typically, in those studies, it is not known how well a system fulfills its intended purpose and requirements in its sociotechnical context of use. The system is tested in a live environment and measured for things such as performance, usability attributes, or robustness. The term “field experiment” is used in, for instance, information systems [101], while Gustedt et al. [59] used the term “in situ experiments”: real applications executed at the real scale using real hardware.

The experimental computer science debates involve various examples of field experiments. A robot car race is an oft-used example of a field experiment, or “experimentation under real-world conditions” [58]. In the DARPA Grand Challenge, driverless vehicles compete with each other in finding their way through various types of environments. A common downside to the field experiment is diminished reproducibility that is brought about by the large number of variables and limited control in live environments. Yet, as they are often quasi-experiments or limited-control experiments, field experiments offer more control than case studies or surveys do [101].

3.4. Comparison Experiment

A fourth common use of the term “experiment” refers to comparison between solutions. Many branches of computing research are concerned with looking for the “best” solution for a specific problem [87] or developing a new way of doing things “better” in one way or another. Typically, in reports of those studies, it is not known if (or rather, “not shown that”) system A outperforms system B with data set d and parameters p. An experiment is set up to measure and compare A(d, p) and B(d, p), and the report shows that the new system beats its predecessors in terms of a set of criteria C. Johnson [10] called that type of experimental analysis “horse race papers.” Fletcher [87] argued that many brands of experimental computer science are most applicable to that type of research (Fletcher referred to [45, 47]).
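
To make the setup concrete, the following is a minimal sketch of such a “horse race” comparison. The two systems, the data set, and the criteria are invented for illustration and are not drawn from the surveyed literature: a proposed system A and a baseline B are run on the same data set d with the same parameters p, and judged against a criteria set C consisting of runtime and correctness.

```python
import random
import time

def system_a(data, p):
    """Hypothetical proposed system A: Python's built-in sort stands in for the new method."""
    return sorted(data)

def system_b(data, p):
    """Hypothetical baseline B: a plain insertion sort."""
    out = list(data)
    for i in range(1, len(out)):
        key, j = out[i], i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

def compare(d, p, trials=5):
    """Measure both systems on identical inputs; criteria C = {mean runtime, correctness}."""
    report = {}
    for name, system in (("A", system_a), ("B", system_b)):
        times, correct = [], True
        for _ in range(trials):
            start = time.perf_counter()
            result = system(d, p)
            times.append(time.perf_counter() - start)
            correct = correct and result == sorted(d)
        report[name] = {"mean_runtime_s": sum(times) / trials, "correct": correct}
    return report

random.seed(0)
d = [random.random() for _ in range(5000)]
print(compare(d, p=None))
```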

However, although comparison experiments seem “objective” in many ways, they are, in fact, susceptible to bias in a number of ways [79, 102]. It has been noted that often such experiments do not follow the standard precautions against experimenter bias, such as the blinding principle [87]. The researcher should not be able to choose B, d, C, or p favorably for his or her own system A. Zelkowitz and Wallace [28, 29] argued that “All too often the experiment is a weak example favoring the proposed technology over alternatives.” There again, many fields of computing have introduced standard tests, input data, and expected outputs, against which competing solutions can be compared (e.g., [103]).

3.5. Controlled Experiment

A fifth common use of the term “experiment” refers to the controlled experiment. The controlled experiment is the gold standard of scientific research in many fields of science—especially when researchers aim at eliminating confounding causes—and it typically enables generalization and prediction. There are numerous uses for the controlled experiment setup; for instance, it is often used for situations where it is not known if two or more variables are associated, or if x causes y.

In many arguments for experimental computer science, by “experiment” the author explicitly or implicitly means “controlled experiment,” but not always for the same reasons. Peisert [104] advocated controlled experiments for research on computer security, arguing that they promote generalizability and better justified claims about products. Morrison and Snodgrass [71] wanted to see more generalizable results in software development. Schorr [105] argued that software and systems, with their increased user interaction, have grown too large for any method other than controlled experiments. Curtis [106] and Pfleeger [107] emphasized the role of controlled experiments in software engineering due to their potential for probabilistic knowledge about causality and increased confidence about what exactly in technical interventions caused the change. Feitelson [108] promoted evaluations under controlled conditions for all applied computer science.
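
As a purely illustrative sketch of what such a controlled setup can look like in a software-engineering setting (the scenario, group sizes, and all numbers are invented, not taken from the cited authors): participants are randomly assigned to a treatment group using a new tool or a control group using the old one, a dependent variable is measured, and a permutation test is used to judge whether the observed difference is unlikely under chance alone.

```python
import random
import statistics

# Hypothetical controlled experiment (all numbers invented): 20 participants are
# randomly assigned to use either a new debugging tool (treatment) or the old
# one (control); the dependent variable is task completion time in minutes.
random.seed(42)
participants = [f"p{i:02d}" for i in range(20)]
random.shuffle(participants)                       # random assignment to groups
treatment_group, control_group = participants[:10], participants[10:]

# Simulated measurements of the dependent variable for each group.
treatment_times = [random.gauss(32, 5) for _ in treatment_group]   # new tool
control_times = [random.gauss(38, 5) for _ in control_group]       # old tool

observed_diff = statistics.mean(control_times) - statistics.mean(treatment_times)

# Permutation test: how often would a random relabelling of the measurements
# produce a group difference at least as large as the one observed?
pooled = treatment_times + control_times
extreme = 0
n_perm = 10_000
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[10:]) - statistics.mean(pooled[:10])
    if diff >= observed_diff:
        extreme += 1

print(f"observed difference: {observed_diff:.1f} min, p ≈ {extreme / n_perm:.3f}")
```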

4. Discussion

Experiments played a central part in the development of modern science, and over the centuries experiments also evolved. In modern science experiments play many roles, often in relation to theory but also independently of it [74]. In scientific practice, the relationship between theory and experiments has always been fluid, and the many faces of experiment benefit scientific investigation in different ways at different stages of research [82]. Different fields employ experiment in different ways, and the fit between experiment, apparatus, and theory varies between disciplines [75].

The spectrum of experiments is fully visible in computing fields. The breakthroughs in computing happened at a junction of various fields, such as mathematical logic, electrical engineering, and materials science. Since the birth of the stored-program paradigm, computing has adopted methods from an even broader variety of fields. As the disciplines that gave birth to computing each have reserved a very different role for experiments, it is unsurprising that the computing literature uses experimentation terminology in a variety of ways. Sometimes the term refers to empirical research in general, sometimes to evaluation strategies, sometimes to proofs of concept, and sometimes to controlled experiments. The philosophy of experiment reveals some diversity of experimental terminology in fields other than computing, too.

The role of experiments in computing disciplines has been highly debated since the “rejuvenating experimental computer science” report in 1979. A large number of position statements on experimental computer science have advocated a variety of views of experiments, each with their own aims, methods, and assumptions. Experiment terminology has also played a key rhetorical role in debates about the future directions of computing as a discipline. As experiments are historically central to the sciences, in visions of computing as a discipline it is less risky to adopt and redefine the term “experiment” than to ignore it. The ambiguity of methodology terminology in computing parallels the situation in the philosophy of science, where experiments remained an unopened black box until the 1980s [109].

The disciplinary understanding of computing requires a naturalistic view of experiments in the field. There surely is a place for the many normative arguments on experiments in computing that have been structured around idealized views of science and the experiment. But there are also good reasons to challenge those idealized and received views. How scientists experiment has changed greatly since the days of Galileo and Bacon, as has the role of experiments in the philosophy of science. The form and function of experiments have never been rigid, and the experiment has never been a mere judge between right and wrong theories. Experiment is a multidimensional phenomenon, and it is important that those dimensions are appropriately analyzed and discussed in computing, too. Also, insofar as experimentation language in computing needs clarification, it is of great help to understand the different ways in which experiments have been conceived in computing.

Methodological surveys and meta-analyses of computing research have already revealed a great diversity of views concerning empirical methods in computing, as well as what is called “experiments” in computing. Many of those views are similar to the epistemological strategies of researchers identified in the philosophy of experiment [81, 82]. Representing and intervening—the two new characteristics of experimentation in modern science [110]—are also at the very heart of modern computing, but their manifestations in computing deserve deeper analysis, especially in the age of simulation and virtual experiments.

Perhaps the use of experimentation terminology in computing should be made stricter and brought in line with some strict definitions of experimental science. Or perhaps our terminology needs to reflect what is really going on in computing and other disciplines. Either way, it is a matter of disciplinary self-understanding to take computing seriously, in its own right, and to study the discipline of computing from a nonidealized, naturalistic viewpoint. This short survey presents five faces of experiments in computing—the feasibility experiment, the trial, the field experiment, the comparison, and the controlled experiment. There is a lot more to experiments in computing than meets the eye, and we believe that their study can benefit both computing as a discipline and our general understanding of experiments in science.

Acknowledgments

This text is based on an invited talk at European Computer Science Summit 2012, Workshop on the Role and Relevance of Experimentation in Informatics, coordinated by Viola Schiaffonati and chaired by Fabio A. Schreiber, Francesco Bruschi, Jan Van Leeuwen, and Letizia Tanca. The authors would like to thank the workshop organizers and participants, as well as the anonymous peer reviewers, for their ideas and input. This research received funding from the Academy of Finland grant no. 132572.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Computer Simulations in Science

Computer simulation was pioneered as a scientific tool in meteorology and nuclear physics in the period directly following World War II, and since then has become indispensable in a growing number of disciplines. The list of sciences that make extensive use of computer simulation has grown to include astrophysics, particle physics, materials science, engineering, fluid mechanics, climate science, evolutionary biology, ecology, economics, decision theory, medicine, sociology, epidemiology, and many others. There are even a few disciplines, such as chaos theory and complexity theory, whose very existence has emerged alongside the development of the computational models they study.

After a slow start, philosophers of science have begun to devote more attention to the role of computer simulation in science. Several areas of philosophical interest in computer simulation have emerged: What is the structure of the epistemology of computer simulation? What is the relationship between computer simulation and experiment? Does computer simulation raise issues for the philosophy of science that are not fully covered by recent work on models more generally? What does computer simulation teach us about emergence? About the structure of scientific theories? About the role (if any) of fictions in scientific modeling?

1. What Is Computer Simulation?

No single definition of computer simulation is appropriate. In the first place, the term is used in both a narrow and a broad sense. In the second place, one might want to understand the term from more than one point of view.

1.1 A Narrow Definition

In its narrowest sense, a computer simulation is a program that is run on a computer and that uses step-by-step methods to explore the approximate behavior of a mathematical model. Usually this is a model of a real-world system (although the system in question might be an imaginary or hypothetical one). Such a computer program is a computer simulation model . One run of the program on the computer is a computer simulation of the system. The algorithm takes as its input a specification of the system’s state (the value of all of its variables) at some time t. It then calculates the system’s state at time t+1. From the values characterizing that second state, it then calculates the system’s state at time t+2, and so on. When run on a computer, the algorithm thus produces a numerical picture of the evolution of the system’s state, as it is conceptualized in the model.
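
A minimal sketch of a simulation in this narrow sense (the model, Newton's law of cooling, and all numbers here are chosen purely for illustration): the program holds the state of the system at time t and repeatedly computes the state at t+1, producing a numerical picture of the state's evolution.

```python
# Toy computer simulation model in the narrow sense: explicit time stepping of a
# mathematical model. Here the model is Newton's law of cooling,
# dT/dt = -k * (T - T_ambient); the parameter values are illustrative only.

def step(temperature, k=0.1, ambient=20.0, dt=1.0):
    """Compute the system's state at time t+1 from its state at time t."""
    return temperature + dt * (-k * (temperature - ambient))

state = 90.0                 # initial state: the object starts at 90 degrees
trajectory = [state]
for t in range(30):          # one run of the program = one simulation of the system
    state = step(state)
    trajectory.append(state)

print([round(T, 1) for T in trajectory[:6]], "...")
```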

This sequence of values for the model variables can be saved as a large collection of “data” and is often viewed on a computer screen using methods of visualization. Often, but certainly not always, the methods of visualization are designed to mimic the output of some scientific instrument—so that the simulation appears to be measuring a system of interest.

Sometimes the step-by-step methods of computer simulation are used because the model of interest contains continuous (differential) equations (which specify continuous rates of change in time) that cannot be solved analytically—either in principle or perhaps only in practice. This underwrites the spirit of the following definition given by Paul Humphreys: “any computer-implemented method for exploring the properties of mathematical models where analytic methods are not available” (1991, 500). But even as a narrow definition, this one should be read carefully, and not be taken to suggest that simulations are only used when there are analytically unsolvable equations in the model. Computer simulations are often used either because the original model itself contains discrete equations—which can be directly implemented in an algorithm suitable for simulation—or because the original model consists of something better described as rules of evolution than as equations .

In the former case, when equations are being “discretized” (the turning of equations that describe continuous rates of change into discrete equations), it should be emphasized that, although it is common to speak of simulations “solving” those equations, a discretization can at best only find something which approximates the solution of continuous equations, to some desired degree of accuracy. Finally, when speaking of “a computer simulation” in the narrowest sense, we should be speaking of a particular implementation of the algorithm on a particular digital computer, written in a particular language, using a particular compiler, etc. There are cases in which different results can be obtained as a result of variations in any of these particulars.

1.2 A Broad Definition

More broadly, we can think of computer simulation as a comprehensive method for studying systems. In this broader sense of the term, it refers to an entire process. This process includes choosing a model; finding a way of implementing that model in a form that can be run on a computer; calculating the output of the algorithm; and visualizing and studying the resultant data. The method includes this entire process—used to make inferences about the target system that one tries to model—as well as the procedures used to sanction those inferences. This is more or less the definition of computer simulation studies in Winsberg 2003 (111). “Successful simulation studies do more than compute numbers. They make use of a variety of techniques to draw inferences from these numbers. Simulations make creative use of calculational techniques that can only be motivated extra-mathematically and extra-theoretically. As such, unlike simple computations that can be carried out on a computer, the results of simulations are not automatically reliable. Much effort and expertise goes into deciding which simulation results are reliable and which are not.” When philosophers of science write about computer simulation, and make claims about what epistemological or methodological properties “computer simulations” have, they usually mean the term to be understood in this broad sense of a computer simulation study.

1.3 An Alternative Point of View

Both of the above definitions take computer simulation to be fundamentally about using a computer to solve, or to approximately solve, the mathematical equations of a model that is meant to represent some system—either real or hypothetical. Another approach is to try to define “simulation” independently of the notion of computer simulation, and then to define “computer simulation” compositionally: as a simulation that is carried out by a programmed digital computer. On this approach, a simulation is any system that is believed, or hoped, to have dynamical behavior that is similar enough to some other system such that the former can be studied to learn about the latter.

For example, if we study some object because we believe it is sufficiently dynamically similar to a basin of fluid for us to learn about basins of fluid by studying it, then it provides a simulation of basins of fluid. This is in line with the definition of simulation we find in Hartmann: it is something that “imitates one process by another process. In this definition the term ‘process’ refers solely to some object or system whose state changes in time” (1996, 83). Hughes (1999) objected that Hartmann’s definition ruled out simulations that imitate a system’s structure rather than its dynamics. Humphreys revised his definition of simulation to accord with the remarks of Hartmann and Hughes as follows:

System S provides a core simulation of an object or process B just in case S is a concrete computational device that produces, via a temporal process, solutions to a computational model … that correctly represents B, either dynamically or statically. If in addition the computational model used by S correctly represents the structure of the real system R, then S provides a core simulation of system R with respect to B. (2004, p. 110)

(Note that Humphreys is here defining computer simulation, not simulation generally, but he is doing it in the spirit of defining a compositional term.) It should be noted that Humphreys’ definitions make simulation out to be a success term, and that seems unfortunate. A better definition would be one that, like the one in the last section, included a word like “believed” or “hoped” to address this issue.

In most philosophical discussions of computer simulation, the more useful concept is the one defined in 1.2. The exception is when it is explicitly the goal of the discussion to understand computer simulation as an example of simulation more generally (see section 5). Examples of simulations that are not computer simulations include the famous physical model of the San Francisco Bay (Huggins & Schultz 1973). This is a working hydraulic scale model of the San Francisco Bay and Sacramento-San Joaquin River Delta System built in the 1950s by the Army Corps of Engineers to study possible engineering interventions in the Bay.

Another nice example, which is discussed extensively in Dardashti et al. (2015, 2019), is the use of acoustic “dumb holes” made out of Bose-Einstein condensates to study the behavior of black holes. Physicist Bill Unruh noted that in certain fluids, something akin to a black hole would arise if there were regions of the fluid that were moving so fast that waves would have to move faster than the speed of sound (something they cannot do) in order to escape from them (Unruh 1981). Such regions would in effect have sonic event horizons. Unruh called such a physical setup a “dumb hole” (“dumb” as in “mute”) and proposed that it could be studied in order to learn things we do not know about black holes. For some time, this proposal was viewed as nothing more than a clever idea, but physicists have recently come to realize that, using Bose-Einstein condensates, they can actually build and study dumb holes in the laboratory. It is clear why we should think of such a setup as a simulation: the dumb hole simulates the black hole. Instead of finding a computer program to simulate the black holes, physicists find a fluid dynamical setup for which they believe they have a good model and for which that model has fundamental mathematical similarities to the model of the systems of interest. They observe the behavior of the fluid setup in the laboratory in order to make inferences about the black holes. The point, then, of the definitions of simulation in this section is to try to understand in what sense computer simulation and these sorts of activities are species of the same genus. We might then be in a better position to understand why a simulation in the sense of 1.3, which happens to be run on a computer, overlaps with a simulation in the sense of 1.2. We will come back to this in section 5.

Barberousse et al. (2009), however, have been critical of this analogy. They point out that computer simulations do not work the way Unruh’s simulation works. It is not the case that the computer as a material object and the target system follow the same differential equations. A good reference about simulations that are not computer simulations is Trenholme 1994.

2. Types of Computer Simulations

Two types of computer simulation are often distinguished: equation-based simulations and agent-based (or individual-based) simulations. Computer simulations of both types are used for three different general sorts of purposes: prediction (both pointwise and global/qualitative), understanding, and exploratory or heuristic purposes.

2.1 Equation-Based Simulations

Equation-based simulations are most commonly used in the physical sciences and other sciences where there is governing theory that can guide the construction of mathematical models based on differential equations. I use the term “equation based” here to refer to simulations based on the kinds of global equations we associate with physical theories—as opposed to “rules of evolution” (which are discussed in the next section). Equation-based simulations can either be particle-based, where there are n many discrete bodies and a set of differential equations governing their interaction, or they can be field-based, where there is a set of equations governing the time evolution of a continuous medium or field. An example of the former is a simulation of galaxy formation, in which the gravitational interaction between a finite collection of discrete bodies is discretized in time and space. An example of the latter is the simulation of a fluid, such as a meteorological system like a severe storm. Here the system is treated as a continuous medium—a fluid—and a field representing the distribution of the relevant variables in space is discretized in space and then updated in discrete intervals of time.
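
A schematic sketch of the particle-based case (the masses, units, and two-body setup are arbitrary illustrations, not a real galaxy-formation code): a finite collection of discrete bodies interacts gravitationally, and the governing differential equations are discretized in time.

```python
import math

# Illustrative particle-based, equation-based simulation: two bodies interacting
# under Newtonian gravity, with the governing differential equations discretized
# in time. Units, masses, and initial conditions are arbitrary toy values.
G = 1.0
bodies = [
    {"m": 100.0, "x": 0.0,  "y": 0.0, "vx": 0.0, "vy": 0.0},   # heavy central body
    {"m": 1.0,   "x": 10.0, "y": 0.0, "vx": 0.0, "vy": 3.0},   # light orbiting body
]

def accelerations(bodies):
    """Evaluate the gravitational acceleration on each body from all the others."""
    acc = []
    for i, bi in enumerate(bodies):
        ax = ay = 0.0
        for j, bj in enumerate(bodies):
            if i == j:
                continue
            dx, dy = bj["x"] - bi["x"], bj["y"] - bi["y"]
            r = math.hypot(dx, dy)
            ax += G * bj["m"] * dx / r**3
            ay += G * bj["m"] * dy / r**3
        acc.append((ax, ay))
    return acc

dt = 0.01
for step in range(1000):                  # advance the discretized equations in time
    for b, (ax, ay) in zip(bodies, accelerations(bodies)):
        b["vx"] += ax * dt
        b["vy"] += ay * dt
        b["x"] += b["vx"] * dt
        b["y"] += b["vy"] * dt

print(round(bodies[1]["x"], 2), round(bodies[1]["y"], 2))
```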

2.2 Agent-Based Simulations

Agent-based simulations are most common in the social and behavioral sciences, though we also find them in such disciplines as artificial life, epidemiology, ecology, and any discipline in which the networked interaction of many individuals is being studied. Agent-based simulations are similar to particle-based simulations in that they represent the behavior of n-many discrete individuals. But unlike equation-based particle simulations, there are no global differential equations that govern the motions of the individuals. Rather, in agent-based simulations, the behavior of the individuals is dictated by their own local rules.

To give one example: a famous and groundbreaking agent-based simulation was Thomas Schelling’s (1971) model of “segregation.” The agents in his simulation were individuals who “lived” on a chessboard. The individuals were divided into two groups in the society (e.g. two different races, boys and girls, smokers and non-smokers, etc.) Each square on the board represented a house, with at most one person per house. An individual is happy if he/she has a certain percent of neighbors of his/her own group. Happy agents stay where they are, unhappy agents move to free locations. Schelling found that the board quickly evolved into a strongly segregated location pattern if the agents’ “happiness rules” were specified so that segregation was heavily favored. Surprisingly, however, he also found that initially integrated boards tipped into full segregation even if the agents’ happiness rules expressed only a mild preference for having neighbors of their own type.
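
A bare-bones sketch in the spirit of Schelling's model follows; the grid size, the share of empty cells, and the 30% happiness threshold are illustrative choices rather than Schelling's original parameters. Each agent checks its neighbors, and unhappy agents move to free cells.

```python
import random

# Bare-bones sketch of a Schelling-style model (parameters are illustrative).
# Cells hold None (empty) or a group label 1 or 2.
random.seed(1)
SIZE = 20            # 20 x 20 board
THRESHOLD = 0.3      # an agent is happy if >= 30% of its neighbors share its group

def new_board():
    board = {}
    for x in range(SIZE):
        for y in range(SIZE):
            r = random.random()
            board[(x, y)] = None if r < 0.2 else (1 if r < 0.6 else 2)
    return board

def neighbor_groups(board, x, y):
    cells = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return [board[c] for c in cells if c in board and board[c] is not None]

def unhappy(board, x, y):
    group, nbrs = board[(x, y)], neighbor_groups(board, x, y)
    if not nbrs:
        return False
    return sum(1 for n in nbrs if n == group) / len(nbrs) < THRESHOLD

board = new_board()
for sweep in range(50):
    empties = [c for c, v in board.items() if v is None]
    movers = [c for c, v in board.items() if v is not None and unhappy(board, *c)]
    random.shuffle(movers)
    for cell in movers:                      # unhappy agents move to free locations
        if not empties:
            break
        target = empties.pop(random.randrange(len(empties)))
        board[target], board[cell] = board[cell], None
        empties.append(cell)

happy = sum(1 for c, v in board.items() if v is not None and not unhappy(board, *c))
total = sum(1 for v in board.values() if v is not None)
print(f"{happy} of {total} agents end up happy")
```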

2.3 Multiscale Simulations

In section 2.1 we discussed equation-based models that are based on particle methods and those that are based on field methods. But some simulation models are hybrids of different kinds of modeling methods. Multiscale simulation models, in particular, couple together modeling elements from different scales of description. A good example of this would be a model that simulates the dynamics of bulk matter by treating the material as a field undergoing stress and strain at a relatively coarse level of description, but which zooms into particular regions of the material where important small-scale effects are taking place, and models those smaller regions with relatively more fine-grained modeling methods. Such methods might rely on molecular dynamics, or quantum mechanics, or both—each of which is a more fine-grained description of matter than is offered by treating the material as a field. Multiscale simulation methods can be further broken down into serial multiscale and parallel multiscale methods. The more traditional method is serial multiscale modeling. The idea here is to choose a region, simulate it at the lower level of description, summarize the results into a set of parameters digestible by the higher-level model, and pass them up into the part of the algorithm calculating at the higher level.
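
The serial hand-off just described can be sketched schematically as follows; both "models" here are toy stand-ins with made-up numbers, and the point is only the flow of information from the fine scale up to the coarse scale.

```python
import random

def fine_grained_model(region_value, n_samples=1000, seed=0):
    """Stand-in for an expensive lower-level simulation of one chosen region."""
    rng = random.Random(seed)
    return [region_value + rng.gauss(0.0, 0.05) for _ in range(n_samples)]

def summarize(samples):
    """Digest the fine-scale results into one parameter the coarse model can use."""
    return sum(samples) / len(samples)

def coarse_step(field, effective_params, diffusion=0.1):
    """Coarse-level update: simple relaxation toward neighbors, with the
    fine-scale summary folded in as an effective source term."""
    new_field = field[:]
    for i in range(1, len(field) - 1):
        laplacian = field[i - 1] - 2 * field[i] + field[i + 1]
        new_field[i] = field[i] + diffusion * laplacian + 0.01 * effective_params.get(i, 0.0)
    return new_field

field = [0.0] * 10
field[5] = 1.0                                  # region where small-scale effects matter
samples = fine_grained_model(field[5])          # 1. simulate the region at the lower level
effective = {5: summarize(samples)}             # 2. summarize into digestible parameters
field = coarse_step(field, effective)           # 3. pass them up to the higher-level model
print([round(v, 3) for v in field])
```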

Serial multiscale methods are not effective when the different scales are strongly coupled together. When the different scales interact strongly to produce the observed behavior, what is required is an approach that simulates each region simultaneously. This is called parallel multiscale modeling. Parallel multiscale modeling is the foundation of a nearly ubiquitous simulation method: so called “sub-grid” modeling. Sub-grid modeling refers to the representation of important small-scale physical processes that occur at length-scales that cannot be adequately resolved on the grid size of a particular simulation. (Remember that many simulations discretize continuous equations, so they have a relatively arbitrary finite “grid size.”) In the study of turbulence in fluids, for example, a common practical strategy for calculation is to account for the missing small-scale vortices (or eddies ) that fall inside the grid cells. This is done by adding to the large-scale motion an eddy viscosity that characterizes the transport and dissipation of energy in the smaller-scale flow—or any such feature that occurs at too small a scale to be captured by the grid.

In climate science and kindred disciplines, sub-grid modeling is called “parameterization.” This, again, refers to the method of replacing processes—ones that are too small-scale or complex to be physically represented in the model—by a simpler mathematical description. This is as opposed to other processes—e.g., large-scale flow of the atmosphere—that are calculated at the grid level in accordance with the basic theory. It is called “parameterization” because various non-physical parameters are needed to drive the highly approximative algorithms that compute the sub-grid values. Examples of parameterization in climate simulations include the descent rate of raindrops, the rate of atmospheric radiative transfer, and the rate of cloud formation. For example, the average cloudiness over a 100 km² grid box is not cleanly related to the average humidity over the box. Nonetheless, as the average humidity increases, average cloudiness will also increase—hence there could be a parameter linking average cloudiness to average humidity inside a grid box. Even though modern-day parameterizations of cloud formation are more sophisticated than this, the basic idea is well illustrated by the example. The use of sub-grid modeling methods in simulation has important consequences for understanding the structure of the epistemology of simulation. This will be discussed in greater detail in section 4.
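
As a concrete illustration of a parameterization, here is a simplified, Sundqvist-style relation between grid-box mean relative humidity and diagnosed cloud fraction. The critical-humidity value below is exactly the kind of tunable, non-physical parameter the text mentions; the numbers are examples only.

```python
import math

# Illustrative parameterization: grid-box cloud fraction is estimated from
# grid-box mean relative humidity via a simple tunable relation (a simplified
# Sundqvist-type scheme). RH_CRIT is a non-physical, tuned parameter.
RH_CRIT = 0.75   # below this grid-box mean relative humidity, no cloud is diagnosed

def cloud_fraction(relative_humidity):
    """Diagnose sub-grid cloud cover from resolved (grid-scale) humidity."""
    if relative_humidity <= RH_CRIT:
        return 0.0
    if relative_humidity >= 1.0:
        return 1.0
    return 1.0 - math.sqrt((1.0 - relative_humidity) / (1.0 - RH_CRIT))

for rh in (0.6, 0.8, 0.9, 0.99):
    print(f"RH = {rh:.2f} -> cloud fraction ≈ {cloud_fraction(rh):.2f}")
```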

Sub-grid modeling methods can be contrasted with another kind of parallel multiscale model where the sub-grid algorithms are more theoretically principled, but are motivated by a theory at a different level of description. In the example of the simulation of bulk matter mentioned above, the algorithm driving the smaller level of description is not built by the seat of the pants. The algorithm driving the smaller level is actually more theoretically principled than the higher level in the sense that the physics is more fundamental: quantum mechanics or molecular dynamics vs. continuum mechanics. These kinds of multiscale models, in other words, cobble together the resources of theories at different levels of description. So they provide interesting examples that provoke our thinking about intertheoretic relationships, and that challenge the widely held view that an inconsistent set of laws can have no models.

2.4 Monte Carlo Simulations

In the scientific literature, there is another large class of computer simulations called Monte Carlo (MC) simulations. MC simulations are computer algorithms that use randomness to calculate the properties of a mathematical model, where the randomness of the algorithm is not a feature of the target model. A nice example is the use of a random algorithm to calculate the value of π. If you draw a unit square on a piece of paper and inscribe a circle in it, and then randomly drop a collection of objects inside the square, the proportion of objects that land in the circle would be roughly equal to π/4. A computer simulation that simulated a procedure like that would be called an MC simulation for calculating π.
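
The π example can be written down in a few lines (the point count and seed are arbitrary); note that the randomness belongs to the algorithm, not to anything in the target of the calculation.

```python
import random

# Monte Carlo estimate of pi: randomly "drop" points into a unit square with an
# inscribed circle; the fraction landing inside the circle approaches pi/4.
def estimate_pi(n_points=100_000, seed=0):
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:   # inside the inscribed circle
            inside += 1
    return 4 * inside / n_points

print(estimate_pi())   # roughly 3.14
```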

Many philosophers of science have deviated from ordinary scientific language here and have shied away from thinking of MC simulations as genuine simulations. Grüne-Yanoff and Weirich (2010) offer the following reasoning: “The Monte Carlo approach does not have a mimetic purpose: It imitates the deterministic system not in order to serve as a surrogate that is investigated in its stead but only in order to offer an alternative computation of the deterministic system’s properties” (p.30). This shows that MC simulations do not fit any of the above definitions aptly. On the other hand, the divide between philosophers and ordinary language can perhaps be squared by noting that MC simulations simulate an imaginary process that might be used for calculating something relevant to studying some other process. Suppose I am modeling a planetary orbit and for my calculation I need to know the value of π. If I do the MC simulation mentioned in the last paragraph, I am simulating the process of randomly dropping objects into a square, but what I am modeling is a planetary orbit. This is the sense in which MC simulations are simulations, but they are not simulations of the systems they are being used to study. However, as Beisbart and Norton (2012) point out, some MC simulations (viz. those that use MC techniques to solve stochastic dynamical equations referring to a physical system) are in fact simulations of the systems they study.

3. Purposes of Simulation

There are three general categories of purposes to which computer simulations can be put. Simulations can be used for heuristic purposes, for the purpose of predicting data that we do not have, and for generating understanding of data that we do already have.

Under the category of heuristic models, simulations can be further subdivided into those used to communicate knowledge to others, and those used to represent information to ourselves. When Watson and Crick played with tin plates and wire, they were doing the latter at first, and the former when they showed the results to others. When the Army Corps built the model of the San Francisco Bay to convince the voting population that a particular intervention was dangerous, they were using it for this kind of heuristic purpose. Computer simulations can be used for both of these kinds of purposes—to explore features of possible representational structures, or to communicate knowledge to others. For example: computer simulations of natural processes, such as bacterial reproduction, tectonic shifting, chemical reactions, and evolution, have all been used in classroom settings to help students visualize hidden structure in phenomena and processes that are impractical, impossible, or costly to illustrate in a “wet” laboratory setting.

Another broad class of purposes to which computer simulations can be put is in telling us about how we should expect some system in the real world to behave under a particular set of circumstances. Loosely speaking: computer simulation can be used for prediction. We can use models to predict the future, or to retrodict the past; we can use them to make precise predictions or loose and general ones. With regard to the relative precision of the predictions we make with simulations, we can be slightly more fine-grained in our taxonomy. There are a) Point predictions: Where will the planet Mars be on October 21st, 2300? b) “Qualitative” or global or systemic predictions: Is the orbit of this planet stable? What scaling law emerges in these kinds of systems? What is the fractal dimension of the attractor for systems of this kind? and c) Range predictions: It is 66% likely that the global mean surface temperature will increase by between 2 and 5 degrees C by the year 2100; it is “highly likely” that sea level will rise by at least two feet; it is “implausible” that the thermohaline circulation will shut down in the next 50 years.

Finally, simulations can be used to understand systems and their behavior. If we already have data telling us how some system behaves, we can use computer simulation to answer questions about how these events could possibly have occurred; or about how those events actually did occur.

When thinking about the topic of the next section, the epistemology of computer simulations, we should also keep in mind that the procedures needed to sanction the results of simulations will often depend, in large part, on which of the above kinds of purposes the simulation is being put to.

4. The Epistemology of Computer Simulations

As computer simulation methods have gained importance in more and more disciplines, the issue of their trustworthiness for generating new knowledge has grown, especially when simulations are expected to be counted as epistemic peers with experiments and traditional analytic theoretical methods. The relevant question is always whether or not the results of a particular computer simulation are accurate enough for their intended purpose. If a simulation is being used to forecast weather, does it predict the variables we are interested in to a degree of accuracy that is sufficient to meet the needs of its consumers? If a simulation of the atmosphere above a Midwestern plain is being used to understand the structure of a severe thunderstorm, do we have confidence that the structures in the flow—the ones that will play an explanatory role in our account of why the storm sometimes splits in two, or why it sometimes forms tornados—are being depicted accurately enough to support our confidence in the explanation? If a simulation is being used in engineering and design, are the predictions made by the simulation reliable enough to sanction a particular choice of design parameters, or to sanction our belief that a particular design of airplane wing will function? Assuming that the answer to these questions is sometimes “yes”, i.e. that these kinds of inferences are at least sometimes justified, the central philosophical question is: what justifies them? More generally, how can the claim that a simulation is good enough for its intended purpose be evaluated? These are the central questions of the epistemology of computer simulation (EOCS).

4.1 Novel Features of EOCS

Given that confirmation theory is one of the traditional topics in philosophy of science, it might seem obvious that the latter would have the resources to begin to approach these questions. Winsberg (1999), however, argued that when it comes to topics related to the credentialing of knowledge claims, philosophy of science has traditionally concerned itself with the justification of theories, not their application. Most simulation, on the other hand, to the extent that it makes use of theory, tends to make use of well-established theory. EOCS, in other words, is rarely about testing the basic theories that may go into the simulation, and most often about establishing the credibility of the hypotheses that are, in part, the result of applications of those theories.

Winsberg (2001) argued that, unlike the epistemological issues that take center stage in traditional confirmation theory, an adequate EOCS must meet three conditions. In particular, it must take account of the fact that the knowledge produced by computer simulations is the result of inferences that are downward, motley, and autonomous.

Downward. EOCS must reflect the fact that in a large number of cases, accepted scientific theories are the starting point for the construction of computer simulation models and play an important role in the justification of inferences from simulation results to conclusions about real-world target systems. The word “downward” was meant to signal the fact that, unlike most scientific inferences that have traditionally interested philosophers, which move up from observation instances to theories, here we have inferences that are drawn (in part) from high theory, down to particular features of phenomena.

Motley. EOCS must take into account that simulation results nevertheless typically depend not just on theory but on many other model ingredients and resources as well, including parameterizations (discussed above), numerical solution methods, mathematical tricks, approximations and idealizations, outright fictions, ad hoc assumptions, function libraries, compilers and computer hardware, and perhaps most importantly, the blood, sweat, and tears of much trial and error.

Autonomous. EOCS must take into account the autonomy of the knowledge produced by simulation in the sense that the knowledge produced by simulation cannot be sanctioned entirely by comparison with observation. Simulations are usually employed to study phenomena where data are sparse. In these circumstances, simulations are meant to replace experiments and observations as sources of data about the world because the relevant experiments or observations are out of reach, for principled, practical, or ethical reasons.

Parker (2013) has made the point that the usefulness of these conditions is somewhat compromised by the fact that they are overly focused on simulation in the physical sciences and other disciplines where simulation is theory-driven and equation-based. This seems correct. In the social and behavioral sciences, and other disciplines where agent-based simulations (see 2.2) are more the norm, and where models are built in the absence of established and quantitative theories, EOCS probably ought to be characterized in other terms.

For instance, some social scientists who use agent-based simulation pursue a methodology in which social phenomena (for example an observed pattern like segregation) are explained, or accounted for, by generating similar looking phenomena in their simulations (Epstein and Axtell 1996; Epstein 1999). But this raises its own sorts of epistemological questions. What exactly has been accomplished, what kind of knowledge has been acquired, when an observed social phenomenon is more or less reproduced by an agent-based simulation? Does this count as an explanation of the phenomenon? A possible explanation? (see, e.g., Grüne-Yanoff 2007). Giuseppe Primiero (2019) argues that there is a whole domain of “artificial sciences” built around agent-based and multi-agent-system-based simulations, and that it requires its own epistemology, one where validation cannot be defined by comparison with an existing real-world system but must be defined vis-à-vis an intended system.

It is also fair to say, as Parker (2013) does, that the conditions outlined above pay insufficient attention to the various and differing purposes for which simulations are used (as discussed in section 3). If we are using a simulation to make detailed quantitative predictions about the future behavior of a target system, the epistemology of such inferences might require more stringent standards than those that are involved when the inferences being made are about the general, qualitative behavior of a whole class of systems. Indeed, it is also fair to say that much more work could be done in classifying the kinds of purposes to which computer simulations are put and the constraints those purposes place on the structure of their epistemology.

Frigg and Reiss (2009) argued that none of these three conditions is new to computer simulation. They argued that ordinary ‘paper and pencil’ modeling incorporates these features. Indeed, they argued that computer simulation could not possibly raise new epistemological issues because the epistemological issues could be cleanly divided into the question of the appropriateness of the model underlying the simulation, which is an issue that is identical to the epistemological issues that arise in ordinary modeling, and the question of the correctness of the solution to the model equations delivered by the simulation, which is a mathematical question, and not one related to the epistemology of science. On the first point, Winsberg (2009b) replied that it was the simultaneous confluence of all three features that was new to simulation. We will return to the second point in section 4.3.

4.2 EOCS and the Epistemology of Experiment

Some of the work on EOCS has developed analogies between computer simulation and experiment in order to draw on recent work in the epistemology of experiment, particularly the work of Allan Franklin; see the entry on experiments in physics.

In his work on the epistemology of experiment, Franklin (1986, 1989) identified a number of strategies that experimenters use to increase rational confidence in their results. Weissart (1997) and Parker (2008a) argued for various forms of analogy between these strategies and a number of strategies available to simulationists to sanction their results. The most detailed analysis of these relationships is to be found in Parker 2008a, where she also uses these analogies to highlight weaknesses in current approaches to simulation model evaluation.

Winsberg (2003) also makes use of Ian Hacking’s (1983, 1988, 1992) work on the philosophy of experiment. One of Hacking’s central insights about experiment is captured in his slogan that experiments ‘have a life of their own’ (1992: 306). Hacking intended to convey two things with this slogan. The first was a reaction against the unstable picture of science that comes, for example, from Kuhn. Hacking (1992) suggests that experimental results can remain stable even in the face of dramatic changes in the other parts of science. The second, related, point he intended to convey was that ‘experiments are organic, develop, change, and yet retain a certain long-term development which makes us talk about repeating and replicating experiments’ (1992: 307). Some of the techniques that simulationists use to construct their models get credentialed in much the same way that Hacking says that instruments and experimental procedures and methods do; the credentials develop over an extended period of time and become deeply tradition-bound. In Hacking’s language, the techniques and sets of assumptions that simulationists use become ‘self-vindicating’. Perhaps a better expression would be that they carry their own credentials. This provides a response to the problem posed in 4.1, of understanding how simulation could have a viable epistemology despite the motley and autonomous nature of its inferences.

Drawing inspiration from another philosopher of experiment (Mayo 1996), Parker (2008b) suggests a remedy to some of the shortcomings in current approaches to simulation model evaluation. In this work, Parker suggests that Mayo’s error-statistical approach to understanding the traditional experiment—which makes use of the notion of a “severe test”—could shed light on the epistemology of simulation. The central question of the epistemology of simulation from an error-statistical perspective becomes: ‘What warrants our taking a computer simulation to be a severe test of some hypothesis about the natural world? That is, what warrants our concluding that the simulation would be unlikely to give the results that it in fact gave, if the hypothesis of interest were false?’ (2008b, 380). Parker believes that too much of what passes for simulation model evaluation lacks rigor and structure because it:

consists in little more than side-by-side comparisons of simulation output and observational data, with little or no explicit argumentation concerning what, if anything, these comparisons indicate about the capacity of the model to provide evidence for specific scientific hypotheses of interest. (2008b, 381)

Drawing explicitly upon Mayo’s (1996) work, she argues that what the epistemology of simulation ought to be doing, instead, is offering some account of the ‘canonical errors’ that can arise, as well as strategies for probing for their presence.

4.3 Verification and Validation

Practitioners of simulation, particularly in engineering contexts, in weapons testing, and in climate science, tend to conceptualize EOCS in terms of verification and validation. Verification is said to be the process of determining whether the output of the simulation approximates the true solutions to the differential equations of the original model. Validation, on the other hand, is said to be the process of determining whether the chosen model is a good enough representation of the real-world system for the purpose of the simulation. The literature on verification and validation from engineers and scientists is enormous, and it is beginning to receive some attention from philosophers.

Verification can be divided into solution verification and code verification. The former verifies that the output of the intended algorithm approximates the true solutions to the differential equations of the original model. The latter verifies that the code, as written, carries out the intended algorithm. Code verification has been mostly ignored by philosophers of science, probably because it has been seen as more of a problem in computer science than in empirical science—perhaps a mistake. Part of solution verification consists in comparing computed output with analytic solutions (so-called “benchmark solutions”). Though this method can of course help to make the case for the results of a computer simulation, it is by itself inadequate, since simulations are often used precisely because analytic solutions are unavailable for the regions of solution space that are of interest. Other indirect techniques are available, the most important of which is probably checking to see whether and at what rate computed output converges to a stable solution as the time and spatial resolution of the discretization grid gets finer.
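
A toy illustration of solution verification against a benchmark (the model, dy/dt = -y with y(0) = 1, and the step sizes are chosen only for illustration): the computed output is compared with the known analytic solution at several resolutions, and the error should shrink as the discretization is refined.

```python
import math

def euler_solve(dt, t_end=1.0):
    """Explicit Euler integration of dy/dt = -y with y(0) = 1."""
    n_steps = round(t_end / dt)
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y

benchmark = math.exp(-1.0)           # analytic ("benchmark") solution at t = 1
for dt in (0.1, 0.05, 0.025, 0.0125):
    error = abs(euler_solve(dt) - benchmark)
    print(f"dt = {dt:<7} error = {error:.5f}")   # error roughly halves as dt halves
```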

The principal strategy of validation involves comparing model output with observable data. Again, of course, this strategy is limited in most cases, where simulations are being run because observable data are sparse. But complex strategies can be employed, including comparing the output of subsystems of a simulation to relevant experiments (Parker, 2013; Oberkampf and Roy 2010).

The concepts of verification and validation have drawn some criticism from philosophers. Oreskes et al. 1994, a very widely cited article, was mostly critical of the terminology, arguing that “validity,” in particular, is a property that only applies to logical arguments, and that hence the term, when applied to models, might lead to overconfidence.

Winsberg (2010, 2018, p. 155) has argued that the conceptual division between verification and validation can be misleading if it is taken to suggest that there is one set of methods which can, by itself, show that we have solved the equations right, and another set of methods which can, by itself, show that we have got the right equations. He also argued that it is misleading to think that the epistemology of simulation is cleanly divided into an empirical part (validation) and a mathematical (and computer science) part (verification). But this misleading idea often follows discussions of verification and validation, both in the work of practitioners and in that of philosophers.

Here is the standard line from a practitioner, Roy: “Verification deals with mathematics and addresses the correctness of the numerical solution to a given model. Validation, on the other hand, deals with physics and addresses the appropriateness of the model in reproducing experimental data. Verification can be thought of as solving the chosen equations correctly, while validation is choosing the correct equations in the first place” (Roy 2005).

Some philosophers have put this distinction to work in arguments about the philosophical novelty of simulation. We first raised this issue in section 4.1, where Frigg and Reiss argued that simulation could have no epistemologically novel features, since it contained two distinct components: a component that is identical to the epistemology of ordinary modeling, and a component that is entirely mathematical. “We should distinguish two different notions of reliability here, answering two different questions. First, are the solutions that the computer provides close enough to the actual (but unavailable) solutions to be useful?…this is a purely mathematical question and falls within the class of problems we have just mentioned. So, there is nothing new here from a philosophical point of view and the question is indeed one of number crunching. Second, do the computational models that are the basis of the simulations represent the target system correctly? That is, are the simulation results externally valid? This is a serious question, but one that is independent of the first problem, and one that equally arises in connection with models that do not involve intractable mathematics and ordinary experiments” (Frigg and Reiss 2009).

But verification and validation are not, strictly speaking, so cleanly separable. That is because most methods of validation, by themselves, are much too weak to establish the validity of a simulation. And most model equations chosen for simulation are not in any straightforward sense “the right equations”; they are not the model equations we would choose in an ideal world. We have good reason to think, in other words, that there are model equations out there that enjoy better empirical support, in the abstract. The equations we choose often reflect a compromise between what we think best describes the phenomena and computational tractability. So the equations that are chosen are rarely well “validated” on their own. If we want to understand why simulation results are taken to be credible, we have to look at the epistemology of simulation as an integrated whole, not as cleanly divided into verification and validation—each of which, on its own, would look inadequate to the task.

So one point is that verification and validation are not independently successful and separable activities. But the other point is that there are not two independent entities onto which these activities can be directed: a model chosen to be discretized, and a method for discretizing it. Once one recognizes that the equations to be “solved” are sometimes chosen so as to cancel out discretization errors, etc. (Lenhard 2007 has a very nice example of this involving the Arakawa operator), this latter distinction gets harder to maintain. So success is achieved in simulation with a kind of back-and-forth, trial-and-error, piecemeal adjustment between model and method of calculation. And when this is the case, it is hard even to know what it means to say that a simulation is separately verified and validated.

None of this is to argue that V&V is not a useful distinction; the point is rather that scientists should not overinflate a pragmatically useful distinction into a clean methodological dictate that misrepresents the messiness of their own practice. Collaterally, Frigg and Reiss’s argument for the absence of epistemological novelty in simulation fails for just this reason. It is not “a purely mathematical question” whether the solutions that the computer provides are close enough to the actual (but unavailable) solutions to be useful. At least not in this respect: it is not a question that can be answered, as a pragmatic matter, entirely using mathematical methods. And hence it is an empirical/epistemological issue that does not arise in ordinary modeling.

4.4 EOCS and Epistemic Entitlement

A major strand of ordinary (outside of the philosophy of science) epistemology is to emphasize the degree to which it is a condition for the possibility of knowledge that we rely on our senses and the testimony of other people in a way that we cannot ourselves justify. According to Tyler Burge (1993, 1998), beliefs in the results of these two processes are warranted but not justified. Rather, according to Burge, we are entitled to these beliefs. “[W]e are entitled to rely, other things equal, on perception, memory, deductive and inductive reasoning, and on…the word of others” (1993, p. 458). Beliefs to which a believer is entitled are those that are unsupported by evidence available to the believer, but which the believer is nevertheless warranted in believing.

Some work in EOCS has developed analogies between computer simulation and the kinds of knowledge-producing practices Burge associates with entitlement (see especially Barberousse and Vorms, 2014, and Beisbart, 2017). This is, in some ways, a natural outgrowth of Burge’s argument that we should view computer-assisted proofs in this way (1998). Computer simulations are extremely complex, often the result of the epistemic labor of a diverse set of scientists and other experts, and perhaps most importantly, epistemically opaque (Humphreys, 2004). Because of these features, Beisbart argues that it is reasonable to treat computer simulations in the same way that we treat our senses and the testimony of others: simply as things that can be trusted on the assumption that everything is working smoothly (Beisbart, 2017).

Symons and Alvarado (2019) argue that there is a fundamental problem with this approach to EOCS, and it has to do with a feature of computer-aided proof that was crucial to Burge’s original account: that of being a ‘transparent conveyor’. “It is very important to note, for example, that Burge’s account of content preservation and transparent conveying requires that the recipient already has reason not to doubt the source” (p. 13). But Symons and Alvarado point to many of the properties of computer simulations (drawing from Winsberg 2010 and Ruphy 2015) in virtue of which they fail to have these properties. Lenhard and Küster 2019 is also relevant here, as they argue that there are many features of computer simulation that make them difficult to reproduce and that therefore undermine some of the stability that would be required for them to be transparent conveyors. For these reasons and others having to do with many of the features discussed in 4.2 and 4.3, Symons and Alvarado argue that it is implausible that we should view computer simulation as a basic epistemic practice on a par with sense perception, memory, testimony, or the like.

4.5 Pragmatic Approaches to EOCS

Another approach to EOCS is to ground it in the practical aspects of the craft of modeling and simulation. According to this view, in other words, the best account we can give of the reasons we have for believing the results of computer simulation studies is to have trust in the practical skills and craft of the modelers that use them. A good example of this kind of account is Hubig and Kaminski (2017). The epistemological goal of this kind of work is to identify the locus of our trust in simulations in practical aspects of the craft of modeling and simulation, rather than in any features of the models themselves. Resch et al. (2017) argue that a good part of the reason we should trust simulations is not because of the simulations themselves, but because of the interpretive artistry of those who employ their art and skill to interpret simulation outputs. Symons and Alvarado (2019) are also critical of this approach, arguing that “Part of the task of the epistemology of computer simulation is to explain the difference between the contemporary scientist’s position in relation to epistemically opaque computer simulations…” (p. 7) and the believers in a mechanical oracle’s relation to their oracles. Pragmatic and epistemic considerations, according to Symons and Alvarado, co-exist, and they are not possible competitors for the correct explanation of our trust in simulations: the epistemic reasons are ultimately what explain and ground the pragmatic ones.

Working scientists sometimes describe simulation studies in experimental terms. The connection between simulation and experiment probably goes back as far as von Neumann, who, when advocating very early on for the use of computers in physics, noted that many difficult experiments had to be conducted merely to determine facts that ought, in principle, to be derivable from theory. Once von Neumann’s vision became a reality, and some of these experiments began to be replaced by simulations, it became somewhat natural to view them as versions of experiment. A representative passage can be found in a popular book on simulation:

A simulation that accurately mimics a complex phenomenon contains a wealth of information about that phenomenon. Variables such as temperature, pressure, humidity, and wind velocity are evaluated at thousands of points by the supercomputer as it simulates the development of a storm, for example. Such data, which far exceed anything that could be gained from launching a fleet of weather balloons, reveals intimate details of what is going on in the storm cloud. (Kaufmann and Smarr 1993, 4)

The idea of “in silico” experiments becomes even more plausible when a simulation study is designed to learn what happens to a system as a result of various possible interventions: What would happen to the global climate if x amount of carbon were added to the atmosphere? What will happen to this airplane wing if it is subjected to such-and-such strain? How would traffic patterns change if an onramp is added at this location?

Philosophers, consequently, have begun to consider in what sense, if any, computer simulations are like experiments and in what sense they differ. A related issue is the question of when a process that fundamentally involves computer simulation can count as a measurement (Parker, 2017). A number of views have emerged in the literature, centered around defending and criticizing two theses:

The identity thesis. Computer simulation studies are literally instances of experiments.

The epistemological dependence thesis. The identity thesis would (if it were true) be a good reason (weak version), the best reason (stronger version), or the only reason (strongest version; it is a necessary condition) to believe that simulations can provide warrants for belief in the hypotheses that they support. A consequence of the strongest version is that only if the identity thesis is true is there reason to believe that simulations can confer warrant for believing in hypotheses.

The central idea behind the epistemological dependence thesis is that experiments are the canonical entities that play a central role in warranting our belief in scientific hypotheses, and that therefore the degree to which we ought to think that simulations can also play a role in warranting such beliefs depends on the extent to which they can be identified as a kind of experiment.

One can find philosophers arguing for the identity thesis as early as Humphreys 1995 and Hughes 1999. And there is at least implicit support for (the stronger) version of the epistemological dependence thesis in Hughes. The earliest explicit argument in favor of the epistemological dependence thesis, however, is in Norton and Suppe 2001. According to Norton and Suppe, simulations can warrant belief precisely because they literally are experiments. They have a detailed story to tell about in what sense they are experiments, and how this is all supposed to work. According to Norton and Suppe, a valid simulation is one in which certain formal relations (what they call ‘realization’) hold between a base model, the modeled physical system itself, and the computer running the algorithm. When the proper conditions are met, ‘a simulation can be used as an instrument for probing or detecting real world phenomena. Empirical data about real phenomena are produced under conditions of experimental control’ (p. 73).

One problem with this story is that the formal conditions that they set out are much too strict. It is unlikely that there are very many real examples of computer simulations that meet their strict standards. Simulation is almost always a far more idealizing and approximating enterprise. So, if simulations are experiments, it is probably not in the way that Norton and Suppe imagined.

More generally, the identity thesis has drawn fire from other quarters.

Gilbert and Troitzsch argued that “[t]he major difference is that while in an experiment, one is controlling the actual object of interest (for example, in a chemistry experiment, the chemicals under investigation), in a simulation one is experimenting with a model rather than the phenomenon itself.” (Gilbert and Troitzsch 1999, 13). But this doesn’t seem right. Many (Guala 2002, 2008, Morgan 2003, Parker 2009a, Winsberg 2009a) have pointed to problems with the claim. If Gilbert and Troitzsch mean that simulationists manipulate models in the sense of abstract objects, then the claim is difficult to understand—how do we manipulate an abstract entity? If, on the other hand, they simply mean to point to the fact that the physical object that simulationists manipulate—a digital computer—is not the actual object of interest, then it is not clear why this differs from ordinary experiments.

It is false that real experiments always manipulate exactly their targets of interest. In fact, in both real experiments and simulations, there is a complex relationship between what is manipulated in the investigation on the one hand, and the real-world systems that are the targets of the investigation on the other. In cases of both experiment and simulation, therefore, it takes an argument of some substance to establish the ‘external validity’ of the investigation – to establish that what is learned about the system being manipulated is applicable to the system of interest. Mendel, for example, manipulated pea plants, but he was interested in learning about the phenomenon of heritability generally. The idea of a model organism in biology makes this idea perspicuous. We experiment on Caenorhabditis elegans because we are interested in understanding how organisms in general use genes to control development and genealogy. We experiment on Drosophila melanogaster because it provides a useful model of mutations and genetic inheritance. But the idea is not limited to biology. Galileo experimented with inclined planes because he was interested in how objects fall and how they would behave in the absence of interfering forces—phenomena that the inclined plane experiments did not even actually instantiate.

Of course, this view about experiments is not uncontested. It is true that, quite often, experimentalists infer something about a system distinct from the system they interfere with. However, it is not clear whether this inference is a proper part of the original experiment. Peschard (2010) mounts a criticism along these lines, and hence can be seen as a defender of Gilbert and Troitzsch. Peschard argues that the fundamental assumption of their critics—that in experimentation, just as in simulation, what is manipulated is a system standing in for a target system—is confused. It confuses, Peschard argues, the epistemic target of an experiment with its epistemic motivation. She argues that while the epistemic motivation for doing experiments on C. elegans might be quite far-reaching, the proper epistemic target for any such experiment is the worm itself. In a simulation, according to Peschard, however, the epistemic target is never the digital computer itself. Thus, simulation is distinct from experiment, according to her, in that its epistemic target (as opposed to merely its epistemic motivation) is distinct from the object being manipulated. Roush (2017) can also be seen as a defender of the Gilbert and Troitzsch line, but Roush appeals to sameness of natural kinds as the crucial feature that separates experiments and simulations. Other opponents of the identity thesis include Giere (2009) and Beisbart and Norton (2012).

It is not clear how to adjudicate this dispute, and it seems to revolve primarily around a difference of emphasis. One can emphasize the difference between experiment and simulation, following Gilbert and Troitzsch and Peschard, by insisting that experiments teach us first about their epistemic targets and only secondarily allow inferences to the behavior of other systems. (I.e., experiments on worms teach us, in the first instance, about worms, and only secondarily allow us to make inferences about genetic control more generally.) This would make them conceptually different from computer simulations, which are not thought to teach us, in the first instance, about the behavior of computers, and only in the second instance about storms, or galaxies, or whatever.

Or one can emphasize similarity in the opposite way. One can emphasize the degree to which experimental targets are always chosen as surrogates for what’s really of interest. Morrison (2009) is probably the most forceful proponent of emphasizing this aspect of the similarity between experiment and simulation. She argues that most experimental practice, and indeed most measurement practice, involves the same kinds of modeling practices as simulations. In any case, pace Peschard, nothing but a debate about nomenclature—and maybe an appeal to the ordinary language use of scientists; not always the most compelling kind of argument—would prevent us from saying that the epistemic target of a storm simulation is the computer, and that the storm is merely the epistemic motivation for studying the computer.

Be that as it may, many philosophers of simulation, including those discussed in this section, have chosen the latter path—partly as a way of drawing attention to ways in which the message lurking behind Gilbert and Troitzsch’s quoted claim paints an overly simplistic picture of experiment. It does seem overly simplistic to paint a picture according to which experiment gets a direct grip on the world, whereas simulation’s situation is exactly opposite. And this is the picture one seems to get from the Gilbert and Troitzsch quotation. Peschard’s more sophisticated picture involving a distinction between epistemic targets and epistemic motivations goes a long way towards smoothing over those concerns without pushing us into the territory of thinking that simulation and experiment are exactly the same in this regard.

Still, despite rejecting Gilbert and Troitzsch’s characterization of the difference between simulation and experiment, Guala and Morgan both reject the identity thesis. Drawing on the work of Simon (1969), Guala argues that simulations differ fundamentally from experiments in that the object of manipulation in an experiment bears a material similarity to the target of interest, whereas in a simulation the similarity between object and target is merely formal. Interestingly, while Morgan accepts this argument against the identity thesis, she seems to hold to a version of the epistemological dependency thesis. She argues, in other words, that the difference between experiments and simulations identified by Guala implies that simulations are epistemologically inferior to real experiments – that they have intrinsically less power to warrant belief in hypotheses about the real world because they are not experiments.

A defense of the epistemic power of simulations against Morgan’s (2002) argument could come in the form of a defense of the identity thesis, or in the form of a rejection of the epistemological dependency thesis. On the former front, there seem to be two problems with Guala’s (2002) argument against the identity thesis. The first is that the notion of material similarity here is too weak, and the second is that the notion of mere formal similarity is too vague, to do the required work. Consider, for example, the fact that it is not uncommon, in the engineering sciences, to use simulation methods to study the behavior of systems fabricated out of silicon. The engineer wants to learn about the properties of different design possibilities for a silicon device, so she develops a computational model of the device and runs a simulation of its behavior on a digital computer. There are deep material similarities between, and some of the same material causes are at work in, the central processor of the computer and the silicon device being studied. On Guala’s line of reasoning, this should mark this as an example of a real experiment, but that seems wrong. The peculiarities of this example illustrate the problem rather starkly, but the problem is in fact quite general: any two systems bear some material similarities to each other and some differences.

On the flip side, the idea that the existence of a formal similarity between two material entities could mark anything interesting is conceptually confused. Given any two sufficiently complex entities, there are many ways in which they are formally identical, not to mention similar. There are also ways in which they are formally completely different. Now, we can speak loosely, and say that two things bear a formal similarity, but what we really mean is that our best formal representations of the two entities have formal similarities. In any case, there appear to be good grounds for rejecting both the Gilbert and Troitzsch and the Morgan and Guala grounds for distinguishing experiments and simulations.

Returning to the defense of the epistemic power of simulations, there are also grounds for rejecting the epistemological dependence thesis. As Parker (2009a) points out, what matters, in both experiment and simulation, is that there be relevant similarities between the system investigated and the target system. When the relevant background knowledge is in place, a simulation can provide more reliable knowledge of a system than an experiment. A computer simulation of the solar system, based on our most sophisticated models of celestial dynamics, will produce better representations of the planets’ orbits than any experiment.
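To make that last claim concrete, here is a minimal sketch of the kind of celestial-dynamics simulation at issue. It is a toy of my own, under simplified assumptions (a single planet, Newtonian two-body gravity, velocity-Verlet integration), not an example from the literature discussed here; the point is only that a few lines of code propagate an orbit to arbitrary future times, something no laboratory manipulation of the solar system could deliver.

```python
import math

# Toy two-body "solar system" simulation (illustrative assumptions only).
# Units: distance in AU, time in years, so G*M_sun = 4*pi^2 AU^3/yr^2.
GM = 4.0 * math.pi ** 2

def accel(x, y):
    """Gravitational acceleration of the planet toward the Sun at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3

# Earth-like initial conditions: 1 AU from the Sun, circular orbital speed 2*pi AU/yr.
x, y = 1.0, 0.0
vx, vy = 0.0, 2.0 * math.pi
dt, steps = 0.0005, 2000              # one simulated year of velocity-Verlet steps

ax, ay = accel(x, y)
for _ in range(steps):
    x += vx * dt + 0.5 * ax * dt * dt
    y += vy * dt + 0.5 * ay * dt * dt
    ax_new, ay_new = accel(x, y)
    vx += 0.5 * (ax + ax_new) * dt
    vy += 0.5 * (ay + ay_new) * dt
    ax, ay = ax_new, ay_new

print(f"after one simulated year the planet is back near (1, 0): ({x:.4f}, {y:.4f})")
```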

Parke (2014) argues against the epistemological dependency thesis by undermining two premises that she believes support it: first, that experiments generate greater inferential power than simulations, and second, that simulations cannot surprise us in the same way that experiments can. The argument that simulations cannot surprise us comes from Morgan (2005). Pace Morgan, Parke argues that simulationists are often surprised by their simulations, both because they are not computationally omniscient, and because they are not always the sole creators of the models and code they use. She argues, moreover, that ‘[d]ifferences in researcher’s epistemic states, alone, seem like the wrong grounds for tracking a distinction between experiment and simulation’ (258). Adrian Currie (2018) defends Morgan’s original intuition by making two friendly amendments. He argues that the distinction Morgan was really after was between two different kinds of surprise, and in particular concerns what the source of the surprise is: surprise that comes from bringing our theoretical knowledge into contact with the world is distinctive of experiment. He also more carefully defines surprise in a non-psychological way, such that it is a “quality the attainment of which constitutes genuine epistemic progress” (p. 640).

Paul Humphreys (2004) has argued that computer simulations have profound implications for our understanding of the structure of theories; he argues that they reveal inadequacies with both the semantic and syntactic views of scientific theories. This claim has drawn sharp fire from Roman Frigg and Julian Reiss (2009). Frigg and Reiss argue that whether a model admits of analytic solution or not has no bearing on how it relates to the world. They use the example of the double pendulum to show this. Whether or not the pendulum’s inner fulcrum is held fixed (a fact which will determine whether the relevant model is analytically solvable) has no bearing on the semantics of the elements of the model. From this, they conclude that the semantics of a model, or how it relates to the world, is unaffected by whether or not the model is analytically solvable.

This was not responsive, however, to the most charitable reading of what Humphreys was pointing at. The syntactic and semantic views of theories, after all, were not just accounts of how our abstract scientific representations relate to the world. More particularly, they were not stories about the relation between particular models and the world, but rather about the relation between theories and the world, and the role, if any, that models played in that relation.

They were also stories that had a lot to say about where the philosophically interesting action is when it comes to scientific theorizing. The syntactic view suggested that scientific practice could be adequately rationally reconstructed by thinking of theories as axiomatic systems, and, more importantly, that logical deduction was a useful regulative ideal for thinking about how inferences from theory to the world are drawn. The syntactic view also, by omission, made it fairly clear that modeling played, if anything, only a heuristic role in science. (This was a feature of the syntactic view of theories that Frederick Suppe, one of its most ardent critics, often railed against.) Theories themselves had nothing to do with models, and theories could be compared directly to the world, without any important role for modeling to play.

The semantic view of theories, on the other hand, did emphasize an important role for models, but it also urged that theories were non-linguistic entities. It urged philosophers not to be distracted by the contingencies of the particular form of linguistic expression a theory might be found in, say, a particular textbook.

Computer simulations, however, do seem to illustrate that both of these themes were misguided. It was profoundly wrong to think that logical deduction was the right tool for rationally reconstructing the process of theory application. Computer simulations show that there are methods of theory application that vastly outstrip the inferential power of logical deduction. The space of solutions, for example, that is available via logical deduction from the theory of fluids is microscopic compared with the space of applications that can be explored via computer simulation. On the flip side, computer simulations seem to reveal that, as Humphreys (2004) has urged, syntax matters. It was wrong, it turns out, to suggest, as the semantic view did, that the particular linguistic form in which a scientific theory is expressed is philosophically uninteresting. The syntax of the theory’s expression will have a deep effect on what inferences can be drawn from it, what kinds of idealizations will work well with it, etc. Humphreys put the point as follows: “the specific syntactic representation used is often crucial to the solvability of the theory’s equations” (Humphreys 2009, p.620). The theory of fluids can be used to emphasize this point: whether we express that theory in Eulerian or Lagrangian form will deeply affect what, in practice, we can calculate and how; it will affect what idealizations, approximations, and calculational techniques will be effective and reliable in which circumstances. So the epistemology of computer simulation needs to be sensitive to the particular syntactic formulation of a theory, and how well that particular formulation has been credentialed. Hence, it does seem right to emphasize, as Humphreys (2004) did, that computer simulations have revealed inadequacies with both the syntactic and semantic theories.
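To illustrate what the two formulations amount to (a standard textbook contrast, added here for illustration rather than drawn from the entry itself), the same fluid dynamics can be written in Eulerian form, which tracks the velocity field u, pressure p, density ρ, and viscosity ν at fixed spatial points, or in Lagrangian form, which follows material parcels; the two are linked by the material derivative:

```latex
% Eulerian form: evolve the velocity field u(x, t) at fixed spatial points
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u}

% Lagrangian form: follow a fluid parcel, using the material derivative D/Dt
\frac{D\mathbf{u}}{Dt} = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u},
\qquad
\frac{D}{Dt} \;\equiv\; \frac{\partial}{\partial t} + \mathbf{u}\cdot\nabla
```

Analytically the two forms are equivalent, but discretizing the first leads naturally to grid-based codes, while discretizing the second leads to particle-based methods, and the two families rely on quite different idealizations, approximations, and error-control techniques.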

Paul Humphreys (2004) and Mark Bedau (1997, 2011) have argued that philosophers interested in the topic of emergence can learn a great deal by looking at computer simulation. Philosophers interested in this topic should consult the entry on emergent properties , where the contributions of all these philosophers have been discussed.

The connection between emergence and simulation was perhaps best articulated by Bedau in his (2011). Bedau argued that any conception of emergence must meet the twin hallmarks of explaining how the whole depends on its parts and how the whole is independent of its parts. He argues that philosophers often focus on what he calls “strong” emergence, which posits brute downward causation that is irreducible in principle. But he argues that this is a mistake. He focuses instead on what he calls “weak” emergence, which allows for reducibility of wholes to parts in principle but not in practice . Systems that produce emergent properties are mere mechanisms, but the mechanisms are very complex (they have very many independently interacting parts). As a result, there is no way to figure out exactly what will happen given a specific set of initial and boundary conditions, except to “crawl the causal web”. It is here that the connection to computer simulation arises. Weakly emergent properties are characteristic of complex systems in nature. And it is also characteristic of complex computer simulations that there is no way to predict what they will do except to let them run. Weak emergence explains, according to Bedau, why computer simulations play a central role in the science of complex systems. The best way to understand and predict how real complex systems behave is to simulate them by crawling the micro-causal web, and see what happens.
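As a toy illustration of “crawling the causal web” (my own sketch, not an example from Bedau or Humphreys), consider an elementary cellular automaton such as Rule 110: each cell’s update rule is trivial, yet for a generic initial condition the only practical way to learn the pattern many steps ahead is to run the rule step by step.

```python
# Toy sketch of weak emergence: Rule 110, an elementary cellular automaton.
# Each cell's next state depends only on itself and its two neighbors, yet
# in general there is no shortcut for predicting the pattern except running it.
RULE = 110
WIDTH, STEPS = 64, 32

row = [0] * WIDTH
row[WIDTH // 2] = 1                      # a single "on" cell as initial condition

for _ in range(STEPS):
    print("".join("#" if c else "." for c in row))
    # Neighborhood value left*4 + center*2 + right selects a bit of the rule number.
    row = [
        (RULE >> ((row[(i - 1) % WIDTH] << 2) | (row[i] << 1) | row[(i + 1) % WIDTH])) & 1
        for i in range(WIDTH)
    ]
```

Nothing in the code is mysterious; the point is just that the macro-level pattern is, in Bedau’s sense, derivable from the micro-dynamics only by simulating them.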

Models of course involve idealizations. But it has been argued that some kinds of idealization, which play an especially prominent role in the kinds of modeling involved in computer simulation, are special—to the point that they deserve the title of “fiction.” This section will discuss attempts to define fictions and explore their role in computer simulation.

There are two different lines of thinking about the role of fictions in science. According to one, all models are fictions. This line of thinking is motivated by considering the role, for example, of “the ideal pendulum” in science. Scientists, it is argued, often make claims about these sorts of entities (e.g., “the ideal pendulum has a period proportional to the square-root of its length”) but they are nowhere to be found in the real world; hence they must be fictional entities. This line of argument about fictional entities in science does not connect up in any special way with computer simulation—readers interested in this topic should consult the entry on scientific representation.

Another line of thinking about fictions is concerned with the question of what sorts of representations in science ought to be regarded as fictional. Here, the concern is not so much about the ontology of scientific model entities, but about the representational character of various postulated model entities. Here, Winsberg (2009c) has argued that fictions do have a special connection to computer simulations. Or rather, that some computer simulations contain elements that best typify what we might call fictional representations in science, even if those representations are not uniquely present in simulations.

He notes that the first conception of a fiction—mentioned above—which makes “any representation that contradicts reality a fiction” (p. 179), doesn’t correspond to our ordinary use of the term: a rough map is not fiction. He then proposes an alternative definition: nonfiction is offered as a “good enough” guide to some part of the world (p. 181); fiction is not. But the definition needs to be refined. Take the fable of the grasshopper and the ant. Although the fable offers lessons about how the world is, it is still fiction because it is “a useful guide to the way the world is in some general sense” rather than a specific guide to the way a part of the world is, its “prima facie representational target”, a singing grasshopper and toiling ant. Nonfictions, on the other hand, “point to a certain part of the world” and are a guide to that part of the world (p. 181).

These kinds of fictional components of models are paradigmatically exemplified in certain computer simulations. Two of his examples are the “silogen atom” and “artificial viscosity.” Silogen atoms appear in certain nanomechanical models of cracks in silicon—a species of the kind of multiscale models that blend quantum mechanics and molecular mechanics mentioned in section 2.3. The silogen-containing models of crack propagation in silicon work by describing the crack itself using quantum mechanics and the region immediately surrounding the crack using classical molecular dynamics. To bring together the modeling frameworks in the two regions, the boundary gets treated as if it contains ‘silogen’ atoms, which have a mixture of the properties of silicon and those of hydrogen. Silogen atoms are fictions. They are not offered as even a ‘good enough’ description of the atoms at the boundary—their prima facie representational targets. But they are used so that the overall model can be hoped to get things right. Thus the overall model is not a fiction, but one of its components is. Artificial viscosity is a similar sort of example. Fluids with abrupt shocks are difficult to model on a computational grid because the abrupt shock hides inside a single grid cell, and cannot be resolved by such an algorithm. Artificial viscosity is a technique that pretends that the fluid is highly viscous—a fiction—right where the shock is, so that the shock becomes less abrupt and blurs over several grid cells. Getting the viscosity, and hence the thickness of the shock, wrong helps to get the overall model to work “well enough.” Again, the overall model of the fluid is not a fiction; it is a reliable enough guide to the behavior of the fluid. But the component called artificial viscosity is a fiction—it is not being used to reliably model the shock. It is being incorporated into a larger modeling framework so as to make that larger framework “reliable enough.”
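To see the fictional component at work, here is a minimal sketch of my own, in which the one-dimensional inviscid Burgers equation stands in for a real fluid code; the coefficients and the simple gradient-based viscosity are illustrative assumptions, not the von Neumann–Richtmyer scheme as actually implemented. An artificial viscosity that switches on only where gradients are steep smears the shock over a few grid cells so that a naive finite-difference scheme can cope with it.

```python
import numpy as np

# Toy illustration of artificial viscosity (inviscid Burgers' equation, u_t + u u_x = 0).
# A step profile steepens into a shock that a naive centered-difference scheme cannot
# resolve; the fictional viscosity, active only where the gradient is steep, blurs the
# shock over a few grid cells so the scheme stays usable.
nx = 200
dx = 1.0 / nx
x = np.linspace(0.0, 1.0, nx)
u = np.where(x < 0.5, 1.0, 0.0)        # step initial condition
dt = 0.1 * dx                          # conservative time step
C_art = 4.0                            # strength of the artificial viscosity

for _ in range(400):
    du = np.gradient(u, dx)
    nu_art = C_art * dx**2 * np.abs(du)         # large only near the shock: the "fiction"
    advection = u * du                          # u u_x
    diffusion = np.gradient(nu_art * du, dx)    # (nu_art u_x)_x, the added fictional term
    u = u + dt * (diffusion - advection)

# u now shows the shock front smeared over several cells instead of a spurious overshoot.
```

The viscosity coefficient is deliberately wrong about the fluid: it is tuned to the grid spacing rather than to any physical property, which is exactly what makes it a fiction in the sense at issue while leaving the overall model “reliable enough.”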

This account has drawn two sorts of criticisms. Toon (2010) has argued that this definition of a fiction is too narrow. He gives examples of historical fictions like I, Claudius , and Schindler’s Ark , which he argues are fictions, despite the fact that “they are offered as ‘good enough’ guides to those people, places and events in certain respects and we are entitled to take them as such.” (p. 286–7). Toon, presumably, supports a broader conception of the role of fictions in science, then, according to which they do not play a particularly prominent or heightened role in computer simulation.

Gordon Purves (forthcoming) argues that there are examples of fictions in computational models (his example is so-called “imaginary cracks”), and elsewhere, that do not meet the strict requirements discussed above. Unlike Toon, however, he also wants to delineate fictional modeling elements from non-fictional ones. His principal criticism of the account sketched above is that it makes fictionhood a matter of social norms of use; Purves argues that we ought to be able to settle whether or not some piece of modeling is a fiction in the absence of such norms. Thus, he wants to find an intrinsic characterization of a scientific fiction. His proposal takes as constitutive of model fictions that they fail to have the characteristic that Laymon (1985) called “piecewise improvability” (PI). PI is a characteristic of many models that are idealizations; it says that as you de-idealize, your model becomes more and more accurate. As you de-idealize a silogen atom, however, you do not get a more and more accurate simulation of a silicon crack. Purves takes this failure of PI to be constitutive of fictions, rather than merely symptomatic of them.

  • Barberousse, A., and P. Ludwig, 2009. “Models as Fictions,” in Fictions in Science: Philosophical Essays on Modeling and Idealization , M. Suarez (ed.), London: Routledge, 56–73.
  • Barberousse, A., and Vorms, M. 2014. “About the warrants of computer-based empirical knowledge,” Synthese , 191(15): 3595–3620.
  • Bedau, M.A., 2011. “Weak emergence and computer simulation,” in P. Humphreys and C. Imbert (eds.), Models, Simulations, and Representations , New York: Routledge, 91–114.
  • –––, 1997. “Weak Emergence,” Noûs (Supplement 11), 31: 375–399.
  • Beisbart, C. and J. Norton, 2012. “Why Monte Carlo Simulations are Inferences and not Experiments,” in International Studies in Philosophy of Science , 26: 403–422.
  • Beisbart, C., 2017. “Advancing knowledge through computer simulations? A socratic exercise,” in M. Resch, A. Kaminski, & P. Gehring (eds.), The Science and Art of Simulation (Volume I), Cham: Springer, pp. 153–174.
  • Burge, T., 1993. “Content preservation,” The Philosophical Review , 102(4): 457–488.
  • –––, 1998. “Computer proof, apriori knowledge, and other minds: The sixth philosophical perspectives lecture,” Noûs , 32(S12): 1–37.
  • Currie, Adrian, 2018. “The argument from surprise,” Canadian Journal of Philosophy , 48(5): 639–661
  • Dardashti, R., Thebault, K., and Winsberg, E., 2015. “Confirmation via analogue simulation: what dumb holes could tell us about gravity,” in British Journal for the Philosophy of Science , 68(1): 55–89
  • Dardashti, R., Hartmann, S., Thebault, K., and Winsberg, E., 2019. “Hawking radiation and analogue experiments: A Bayesian analysis,” in Studies in History and Philosophy of Modern Physics , 67: 1–11.
  • Epstein, J., and R. Axtell, 1996. Growing artificial societies: Social science from the bottom-up , Cambridge, MA: MIT Press.
  • Epstein, J., 1999. “Agent-based computational models and generative social science,” Complexity , 4(5): 41–57.
  • Franklin, A., 1996. The Neglect of Experiment , Cambridge: Cambridge University Press.
  • –––, 1989. “The Epistemology of Experiment,” The Uses of Experiment , D. Gooding, T. Pinch and S. Schaffer (eds.), Cambridge: Cambridge University Press, 437–60.
  • Frigg, R., and J. Reiss, 2009. “The philosophy of simulation: Hot new issues or same old stew,” Synthese , 169: 593–613.
  • Giere, R. N., 2009. “Is Computer Simulation Changing the Face of Experimentation?,” Philosophical Studies , 143: 59–62
  • Gilbert, N., and K. Troitzsch, 1999. Simulation for the Social Scientist , Philadelphia, PA: Open University Press.
  • Grüne-Yanoff, T., 2007. “Bounded Rationality,” Philosophy Compass , 2(3): 534–563.
  • Grüne-Yanoff, T. and Weirich, P., 2010. “Philosophy of Simulation,” Simulation and Gaming: An Interdisciplinary Journal , 41(1): 1–31.
  • Guala, F., 2002. “Models, Simulations, and Experiments,” Model-Based Reasoning: Science, Technology, Values , L. Magnani and N. Nersessian (eds.), New York: Kluwer, 59–74.
  • –––, 2008. “Paradigmatic Experiments: The Ultimatum Game from Testing to Measurement Device,” Philosophy of Science , 75: 658–669.
  • Hacking, I., 1983. Representing and Intervening: Introductory Topics in the Philosophy of Natural Science , Cambridge: Cambridge University Press.
  • –––, 1988. “On the Stability of the Laboratory Sciences,” The Journal of Philosophy , 85: 507–15.
  • –––, 1992. “Do Thought Experiments have a Life of Their Own?” PSA (Volume 2), A. Fine, M. Forbes and K. Okruhlik (eds.), East Lansing: The Philosophy of Science Association, 302–10.
  • Hartmann, S., 1996. “The World as a Process: Simulations in the Natural and Social Sciences,” in R. Hegselmann, et al. (eds.), Modelling and Simulation in the Social Sciences from the Philosophy of Science Point of View , Dordrecht: Kluwer, 77–100.
  • Hubig, C, & Kaminski, A., 2017. “Outlines of a pragmatic theory of truth and error in computer simulation,” in M. Resch, A. Kaminski, & P. Gehring (eds.), The Science and Art of Simulation (Volume I), Cham: Springer, pp. 121–136.
  • Hughes, R., 1999. “The Ising Model, Computer Simulation, and Universal Physics,” in M. Morgan and M. Morrison (eds.), Models as Mediators , Cambridge: Cambridge University Press.
  • Huggins, E. M., and E. A. Schultz, 1967. “San Francisco bay in a warehouse,” Journal of the Institute of Environmental Sciences and Technology , 10(5): 9–16.
  • Humphreys, P., 1990. “Computer Simulation,” in A. Fine, M. Forbes, and L. Wessels (eds.), PSA 1990 (Volume 2), East Lansing, MI: The Philosophy of Science Association, 497–506.
  • –––, 1995. “Computational science and scientific method,” in Minds and Machines , 5(1): 499–512.
  • –––, 2004. Extending ourselves: Computational science, empiricism, and scientific method , New York: Oxford University Press.
  • –––, 2009. “The philosophical novelty of computer simulation methods,” Synthese , 169: 615–626.
  • Kaufmann, W. J., and L. L. Smarr, 1993. Supercomputing and the Transformation of Science , New York: Scientific American Library.
  • Laymon, R., 1985. “Idealizations and the testing of theories by experimentation,” in Observation, Experiment and Hypothesis in Modern Physical Science , P. Achinstein and O. Hannaway (eds.), Cambridge, MA: MIT Press, 147–73.
  • Lenhard, J., 2007. “Computer simulation: The cooperation between experimenting and modeling,” Philosophy of Science , 74: 176–94.
  • –––, 2019. Calculated Surprises: A Philosophy of Computer Simulation , Oxford: Oxford University Press
  • Lenhard, J., and U. Küster, 2019. “Reproducibility and the Concept of Numerical Solution,” Minds and Machines , 29(1): 19–36.
  • Morgan, M., 2003. “Experiments without material intervention: Model experiments, virtual experiments and virtually experiments,” in The Philosophy of Scientific Experimentation , H. Radder (ed.), Pittsburgh, PA: University of Pittsburgh Press, 216–35.
  • Morrison, M., 2009. “Models, measurement and computer simulation: The changing face of experimentation,” Philosophical Studies , 143: 33–57.
  • Norton, S., and F. Suppe, 2001. “Why atmospheric modeling is good science,” in Changing the Atmosphere: Expert Knowledge and Environmental Governance , C. Miller and P. Edwards (eds.), Cambridge, MA: MIT Press, 88–133.
  • Oberkampf, W. and C. Roy, 2010. Verification and Validation in Scientific Computing , Cambridge: Cambridge University Press.
  • Oreskes, N., with K. Shrader-Frechette and K. Belitz, 1994. “Verification, Validation and Confirmation of Numerical Models in the Earth Sciences,” Science , 263(5147): 641–646.
  • Parke, E., 2014. “Experiments, Simulations, and Epistemic Privilege,” Philosophy of Science , 81(4): 516–36.
  • Parker, W., 2008a. “Franklin, Holmes and the Epistemology of Computer Simulation,” International Studies in the Philosophy of Science , 22(2): 165–83.
  • –––, 2008b. “Computer Simulation through an Error-Statistical Lens,” Synthese , 163(3): 371–84.
  • –––, 2009a. “Does Matter Really Matter? Computer Simulations, Experiments and Materiality,” Synthese , 169(3): 483–96.
  • –––, 2013. “Computer Simulation,” in S. Psillos and M. Curd (eds.), The Routledge Companion to Philosophy of Science , 2nd Edition, London: Routledge.
  • –––, 2017. “Computer Simulation, Measurement, and Data Assimilation,” British Journal for the Philosophy of Science , 68(1): 273–304.
  • Peschard, I., 2010. “Modeling and Experimenting,” in P. Humphreys and C. Imbert (eds), Models, Simulations, and Representations , London: Routledge, 42–61.
  • Primiero, G., 2019. “A Minimalist Epistemology for Agent-Based Simulations in the Artificial Sciences,” Minds and Machines , 29(1): 127–148.
  • Purves, G.M., forthcoming. “Finding truth in fictions: identifying non-fictions in imaginary cracks,” Synthese .
  • Resch, M. M., Kaminski, A., & Gehring, P. (eds.), 2017. The science and art of simulation I: Exploring-understanding-knowing , Berlin: Springer.
  • Roush, S., 2015. “The epistemic superiority of experiment to simulation,” Synthese , 169: 1–24.
  • Roy, S., 2005. “Recent advances in numerical methods for fluid dynamics and heat transfer,” Journal of Fluid Engineering , 127(4): 629–30.
  • Ruphy, S., 2015. “Computer simulations: A new mode of scientific inquiry?” in S. O. Hansen (ed.), The Role of Technology in Science: Philosophical Perspectives , Dordrecht: Springer, pp. 131–149
  • Schelling, T. C., 1971. “Dynamic Models of Segregation,” Journal of Mathematical Sociology , 1: 143–186.
  • Simon, H., 1969. The Sciences of the Artificial , Boston, MA: MIT Press.
  • Symons, J., & Alvarado, R., 2019. “Epistemic Entitlements and the Practice of Computer Simulation,” Minds and Machines , 29(1): 37–60.
  • Toon, A., 2010. “Novel Approaches to Models,” Metascience , 19(2): 285–288.
  • Trenholme R., 1994. “Analog Simulation,” Philosophy of Science , 61: 115–131.
  • Unruh, W. G., 1981. “Experimental black-hole evaporation?” Physical Review Letters , 46(21): 1351–53.
  • Winsberg, E., 2018. Philosophy and Climate Science , Cambridge: Cambridge University Press
  • –––, 2010. Science in the Age of Computer Simulation , Chicago: The University of Chicago Press.
  • –––, 2009a. “A Tale of Two Methods,” Synthese , 169(3): 575–92
  • –––, 2009b. “Computer Simulation and the Philosophy of Science,” Philosophy Compass , 4/5: 835–845.
  • –––, 2009c. “A Function for Fictions: Expanding the scope of science,” in Fictions in Science: Philosophical Essays on Modeling and Idealization , M. Suarez (ed.), London: Routledge.
  • –––, 2006. “Handshaking Your Way to the Top: Inconsistency and falsification in intertheoretic reduction,” Philosophy of Science , 73: 582–594.
  • –––, 2003. “Simulated Experiments: Methodology for a Virtual World,” Philosophy of Science , 70: 105–125.
  • –––, 2001. “Simulations, Models, and Theories: Complex Physical Systems and their Representations,” Philosophy of Science , 68: S442–S454.
  • –––, 1999. “Sanctioning Models: The Epistemology of Simulation,” Science in Context , 12(3): 275–92.

Computational Modeling

  • What is computational modeling?
  • How is computational modeling used to study complex systems?
  • How can computational modeling improve medical care and research?
  • How are NIBIB-funded researchers using computational modeling to improve health?

Computational modeling is the use of computers to simulate and study complex systems using mathematics, physics and computer science. A computational model contains numerous variables that characterize the system being studied. Simulation is done by adjusting the variables alone or in combination and observing the outcomes. Computer modeling allows scientists to conduct thousands of simulated experiments by computer. The thousands of computer experiments identify the handful of laboratory experiments that are most likely to solve the problem being studied.
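As a minimal illustration of adjusting variables alone or in combination and observing the outcomes (a generic sketch of my own, not NIBIB software; the logistic-growth model and parameter values are arbitrary), a computational model can be run over a grid of settings and the outcomes tabulated:

```python
import itertools

# Generic sketch of computational modeling as a parameter sweep:
# a simple model (logistic population growth) with two adjustable variables.
def simulate(growth_rate, initial_pop, capacity=1000.0, years=20):
    """Run the model forward and return the final population."""
    pop = initial_pop
    for _ in range(years):
        pop += growth_rate * pop * (1.0 - pop / capacity)
    return pop

growth_rates = [0.1, 0.3, 0.5]
initial_pops = [10.0, 100.0]

# Thousands of such runs would be trivial; here we sweep 6 combinations.
for r, p0 in itertools.product(growth_rates, initial_pops):
    print(f"growth_rate={r:.1f}, initial_pop={p0:>5.0f} -> final_pop={simulate(r, p0):7.1f}")
```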

Today’s computational models can study a biological system at multiple levels. Models of how disease develops include molecular processes, cell to cell interactions, and how those changes affect tissues and organs. Studying systems at multiple levels is known as multiscale modeling (MSM).

Weather forecasting models make predictions based on numerous atmospheric factors. Accurate weather predictions can protect life and property and help utility companies plan for power increases that occur with extreme climate shifts.

Flight simulators use complex equations that govern how aircraft fly and react to factors such as turbulence, air density, and precipitation. Simulators are used to train pilots, design aircraft, and study how aircraft are affected as conditions change.

Earthquake simulations aim to save lives, buildings, and infrastructure. Computational models predict how the composition, and motion of structures interact with the underlying surfaces to affect what happens during an earthquake.

Tracking infectious diseases. Computational models are being used to track infectious diseases in populations, identify the most effective interventions, and monitor and adjust interventions to reduce the spread of disease. Identifying and implementing interventions that curb the spread of disease are critical for saving lives and reducing stress on the healthcare system during infectious disease pandemics.

Clinical decision support. Computational models intelligently gather, filter, analyze and present health information to provide guidance to doctors for disease treatment based on detailed characteristics of each patient. The systems help to provide informed and consistent care of a patient as they transfer to appropriate hospital facilities and departments and receive various tests during their course of treatment.

Predicting drug side effects. Researchers use computational modeling to help design drugs that will be the safest for patients and least likely to have side effects. The approach can reduce the many years needed to develop a safe and effective medication.

Modeling infectious disease spread to identify effective interventions. Modeling infectious diseases accurately relies on numerous large sets of data. For example, evaluation of the efficacy of social distancing on the spread of flu-like illness must include information on friendships and interactions of individuals, as well as standard biometric and demographic data. NIBIB-funded researchers are developing new computational tools that can incorporate newly available data sets into models designed to identify the best courses of action and the most effective interventions during pandemic spread of infectious disease and other public health emergencies.
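A minimal sketch of the kind of intervention comparison described here, using a textbook SIR model of my own rather than the researchers’ actual tools (the contact and recovery rates below are illustrative assumptions): the same model is run with and without a social-distancing-style reduction in the contact rate, and the resulting epidemics are compared.

```python
# Toy SIR epidemic model (not the NIBIB tools): compare an uncontrolled outbreak
# with one in which social distancing reduces the contact rate beta by 40%.
def run_sir(beta, gamma=0.1, population=1_000_000, initial_infected=10, days=365):
    s, i, r = population - initial_infected, float(initial_infected), 0.0
    peak = i
    for _ in range(days):                      # simple daily (Euler) updates
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, r                             # peak infections, total ever infected

for label, beta in [("no intervention", 0.30), ("social distancing", 0.18)]:
    peak, total = run_sir(beta)
    print(f"{label:>18}: peak infected = {peak:,.0f}, total infected = {total:,.0f}")
```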

Tracking viral evolution during spread of infectious disease. RNA viruses such as HIV, hepatitis B, and coronavirus continually mutate to develop drug resistance, escape immune response, and establish new infections. Samples of sequenced pathogens from thousands of infected individuals can be used to identify millions of evolving viral variants. NIBIB-funded researchers are creating computational tools to incorporate this important data into infectious disease analysis by health care professionals. The new tools will be created in partnership with the CDC and made available online to researchers and health care workers. The project will enhance worldwide disease surveillance and treatment and enable development of more effective disease eradication strategies. 

Transforming wireless health data into improved health and healthcare. Health monitoring devices at hospitals, and wearable sensors such as smartwatches generate vast amounts of health data in real-time. Data-driven medical care promises to be fast, accurate, and less expensive, but the continual data streams currently overwhelm the ability to use the information. NIBIB-funded researchers are developing computational models that convert streaming health data into a useful form. The new models will provide real-time physiological monitoring for clinical decision making at the Nationwide Children’s Hospital. A team of mathematicians, biomedical informaticians, and hospital staff will generate publicly shared data and software. The project will leverage the $11 billion wireless health market to significantly improve healthcare.

Human and machine learning for customized control of assistive robots. The more severe a person’s motor impairment, the more challenging it is to operate assistive machines such as powered wheelchairs and robotic arms. Available controls such as sip-and-puff devices are not adequate for persons with severe paralysis. NIBIB-funded researchers are engineering a system to enable people with tetraplegia to control a robotic arm while promoting exercise and maintenance of residual motor skills. The technology uses body-machine interfaces that respond to minimal movement in limbs, head, tongue, shoulders, and eyes. Initially, when the user moves, machine learning augments the signal to perform a task with a robotic arm. Help is scaled back as the machine transfers control to the progressively skilled user. The approach aims to empower people with severe paralysis and provide an interface to safely learn to control robotic assistants.

Are computer simulations experiments? And if not, how are they related to each other?

  • Original Paper in the European Journal for Philosophy of Science
  • Published: 24 July 2017
  • Volume 8, pages 171–204 (2018)

  • Claus Beisbart (ORCID: 0000-0003-2731-6200)

Computer simulations and experiments share many important features. One way of explaining the similarities is to say that computer simulations just are experiments. This claim is quite popular in the literature. The aim of this paper is to argue against the claim and to develop an alternative explanation of why computer simulations resemble experiments. To this purpose, experiment is characterized in terms of an intervention on a system and of the observation of the reaction. Thus, if computer simulations are experiments, either the computer hardware or the target system must be intervened on and observed. I argue against the first option using the non-observation argument, among others. The second option is excluded by e.g. the over-control argument, which stresses epistemological differences between experiments and simulations. To account for the similarities between experiments and computer simulations, I propose to say that computer simulations can model possible experiments and do in fact often do so.

See Efstathiou et al. ( 1985 ) and Bertschinger ( 1998 ) and Dolag et al. ( 2008 ) for simulations in cosmology.

E.g. Gramelsberger ( 2010 ).

E.g. Naumova et al. ( 2008 ).

Keller ( 2003 ), p. 203, in scare quotes.

Humphreys ( 1994 ), p. 103.

Dowling ( 1999 ), p. 261, in scare quotes.

Authors as different as Winsberg ( 1999 ), p. 277, Stöckler ( 2000 ), p. 366 and Barberousse et al. ( 2009 ), pp. 558–559 agree that this is an important task for a philosophical treatment of CS.

Consult (Imbert 2017 ), Sec. 3.5 for a very useful overview of the recent debate.

But see fn. 68 for a short remark on analog simulations.

I can draw on a rich philosophical literature about experiments. See Hacking ( 1983 ), Part B, Janich ( 1995 ), Morrison ( 1998 ), Heidelberger ( 2005 ), Radder ( 2009 ), Bogen ( 2010 ) and Franklin ( 2010 ) for introductory pieces or reviews about scientific experiments. Radder ( 2003 ) is a recent collection in the philosophy of experiment. See Falkenburg ( 2007 ), particularly Ch. 2, for a recent account of experimentation with applications to particle physics. Biological experiments are philosophically analyzed by Weber ( 2005 ), experiments in economics by Sugden ( 2005 ). For studies about modern experiments see also Knorr-Cetina ( 1981 ) and Rheinberger ( 1997 ).

See e.g. Janich ( 1995 ) and Parker ( 2009 ), p. 487 for the two components of experimentation. Causal interference in experiments is put into a historical perspective by Tiles ( 1993 ).

Kant himself is eager to stress the conceptual work necessary to ask nature a well-defined question, but this is not important in what follows; our focus concerning the idea that nature is asked a question is rather on the causal interference with the system that is observed.

See Hüttemann ( 2000 ) and Falkenburg ( 2007 ), Ch. 2 for a related discussion.

In the example of the potter, I cannot exclude that the potter runs an experiment, if additional conditions are fulfilled. But even if she does, the respective experiment would not count as scientific . Scientific experiments are embedded in a broader scientific practice. As a consequence, the epistemic difference an experiment is supposed to make is more pronounced.

See Guala ( 2002 ) and Winsberg ( 2009b ), pp. 52–53.

See e.g. Balzer ( 1997 ), p. 139.

See Shapere ( 1982 ) and Hacking ( 1983 ), Chs. 10–11, Falkenburg ( 2007 ), pp. 65–71 and Humphreys ( 2004 ), Ch. 1 for broad accounts of observation.

A recent monograph about theory-ladenness is Adam ( 2002 ).

See Peschard ( forthcoming ), Sec. 1 for a similar account of experiment.

For instance, Zimmerman ( 2003 ) calls his study a natural experiment. See Brown and Fehige ( 2017 ) for a recent overview of thought experiments

One may of course argue that natural and thought experiments are not really experiments, but this is not the place to do so.

I’m grateful to an anonymous referee for pointing me to this fact.

See Barberousse et al. ( 2009 ) and Humphreys ( 2013 ) for useful discussions of the notion of data in the context of CSs.

Cf. the “identity thesis” mentioned by Winsberg ( 2009a ), p. 840.

Similar claims as CE and CE+ figure in J. Norton’s reduction of thought experiments to arguments, see Norton ( 1996 ); cf. Beisbart ( 2012 ).

So far, the focus of the philosophy of computer simulations was on knowledge.

To be fair, I should mention that the paper by Morrison provides also indications that she does not fully support CE T . For one thing, the wordings of her central claims are very cautious; she never says that CSs are experiments, but rather e.g. that there are no reasons to maintain the distinction (e.g. p. 55). This claim seems also to be restricted to some “contexts” (p. 33). She further admits that computer simulations do not involve the manipulation of the target system (fn. 16 on p. 55). This concession does not seem to matter much for her argument; so, maybe, she does not think that intervention is crucial for experiment.

CE, CE+ and CME try to clarify the relationship between experiments and simulations. But what exactly do they mean by CSs? There are broadly two ways of conceiving CSs depending on whether a CS is supposed to be one run with a simulation program or whether it is what Parker ( 2009 ), p. 488 calls a “computer simulation study”, which also includes writing the program, testing it, etc. (e.g. Frigg and Reiss 2009 , p. 596; Parker 2009 , p. 488). Analogous questions can be raised about experiments too, e.g. is the construction of the detector used in an experiment part of the latter or not? In what follows, I will not rely upon any specific proposal as to what is included in an experiment or a CS. Rather, I will assume that experiments and CSs are identified in a similar way such that the claims under consideration have a chance of being true. For instance, when we discuss CE, it would be too uncharitable to assume that experiments include detector building and similar activities, while a computer simulation is simply one run of a simulation program. What is important though for my argument is that every experiment includes an intervention on the object of the experiment.

Cf. Hughes ( 1999 ), p. 137.

Note that we are here not talking about observing in the sense of looking at. Observation on this interpretation does not suffice for experimenting.

See Rechenberg ( 2000 ), Chs. 2–3 for a brief description of the hardware of computers; the details do not matter for my purposes.

Barberousse et al. ( 2009 ) seem to agree with my claim that the working scientist does not observe the hardware, for they write that

“the running computer is not observed qua physical system, but qua computer, i.e., qua symbolic system” (p. 564).

I’m grateful to an anonymous referee for raising this objection.

See Press et al. ( 2007 ), p. 9 for an example of a suitable program.

Such an argument is suggested by Imbert ( 2017 ), Sec. 3.5.

This is so if intervention in the second condition is meant to be intentional. If this is not so, then the argument needs the third condition which makes it very likely that the intervention is intentional.

The example of a simulation that is parallelized is also used by Barberousse et al. ( 2009 , p. 565), albeit in a different argument.

My arguments against CE H from this section can easily be generalized to show that computations carried out on computers (and not just simulations) do not include an experiment on the hardware.

Some measurement apparatuses used in experiments function in a similar fashion. We do not observe certain measurement apparatuses, but rather use them to observe something else (cf. fn. 2 on p. 7 above). To do so we have to trust the instruments. There is thus a close parallel between computers as instruments and instruments in experimentation. Cf. Humphreys ( 2004 ), Ch. 1 and Parker ( 2010 ).

For the purposes of this paper, we need not engage with Morrison’s argument in detail because Giere ( 2009 ) has made a convincing case against it.

My first two arguments against CE T may be summarized by the claim by Arnold ( 2013 , pp. 58–59) that CSs do not operate on the target itself.

Here, the last inferential step excludes the possibility that CSs include an experiment on the target even though the results of CSs are not experimental. This possibility can indeed be dismissed as being far-fetched and useless. Even if it were realized, we could not directly appeal to experimentation to explain the epistemic power of CSs.

Properly speaking, it is the results of CSs (or experiments) that are (not) over-controlled; but for convenience, I will sometimes say that CSs (experiments) are (not) over-controlled. See fn. 42 for a justification.

To elaborate a bit: The argument starts from what the working scientist wants. The reason is that the third condition on experiment is cast in terms of what the experimenter wants; it does not require that the experimenter be successful. The argument further assumes that the scientist is rational in that she only wants something that she takes to be possible and that she draws immediate consequences of her beliefs. For the objective absence of over-control, it is further assumed that the scientist correctly believes that she can learn about a reaction of the system. We can grant the additional assumptions because proponents of CE T will not want to save their claim by claiming that scientists are not rational or that they do not know basic things about the concrete setting. If we don’t find the additional assumptions convincing, we may restrict ourselves to successful experiments in which the goal mentioned in the third condition is in fact fulfilled. Proponents of CE T will not want to exclude such experiments. We can then argue that over-control would prevent success.

Note though that the notion of a reaction does a lot of work for the argument since we understand reaction in a way that excludes objective over-control. It may be objected that, in the conditions on experiment, “reaction” may instead be taken to be any consequence of the intervention. The assumption that experiments exclude over-control would then need additional justification. I think that the discussion provided in this section does provide this justification.

There may be exceptions. For instance, I may use a system that is known to follow certain dynamical equations to identify a solution to these equations. Here I need to know the equations, otherwise I can’t interpret the experiment in the way I wish to. To save my claim that experiments are not over-controlled, I may either deny that we are really talking about an experiment here. Alternatively, I may claim that knowledge of the equations is not needed for the experiment proper, but only for an inference that is based upon it.

See e.g. Beisbart and Norton ( 2012 ).

This is also what Hughes ( 1999 ), p. 142 claims. I am here speaking of combinations of assumptions because Duhem ( 1954 , Ch. 10, §§2–3) has taught us that many hypotheses cannot be tested or refuted in isolation.

Here, simulations are distinguished from arguments to the effect that the simulations get it right. When arguments of this type are part and parcel of simulations, the latter can of course empirically refute a set of assumptions, as experiments can. But it is trivial that CSs in this extremely thick sense can do this. This trivial point cannot be the rationale for CE T . – Note also that purely mathematical theories may be falsified using computer simulations. But mathematical theories are not my concern here.

What then are the conditions under which a CS can replace an experiment? Well, it can do so if the CS is known to reflect the intervention and the reaction of the system experimented on in a sufficiently faithful way. What sufficiently faithful representation means depends on the aspects that the working scientists are interested in and on the desired level of accuracy.

Barberousse et al. ( 2009 ), pp. 562 and 565 make a similar point concerning the hardware of a computer. Their target is not so much an alleged experimental status of simulations as the idea that the physicality of simulations is crucial for their epistemic power.

That CSs model experiments is sometimes assumed in the sciences, too, see e.g. Haasl and Payseur ( 2011 ), p. 161. The claim also makes sense of the following remark by Metropolis and Ulam ( 1949 ) about Monte Carlo simulations:

“These experiments will of course be performed not with any physical apparatus, but theoretically.” (p. 337).

As I shall show below, my argument also goes through for an alternative account of modeling that does not assume similarity.

See also Beisbart ( 2014 ) for the various ways in which computer simulations are related to models.

This analysis is focused on deterministic simulations, but can be generalized to Monte-Carlo simulations. The latter produce many sample trajectories of the computer model. Each sample trajectory arises from initial conditions subject to quasi-intervention and produces outputs that may be quasi-observed.

I have here concentrated on the conditions on experimentation introduced in Section 2 above. These conditions have not been shown to be sufficient. This is not a problem because we are here not interested in the claim that an experiment is indeed run, but only in the proposition that an experiment is modeled. Now a model need not fully reflect its target. Thus, not every condition on experiment needs to be reflected in the model, as long as crucial aspects of experiments are represented. This clearly seems to be the case.

A model in which all initial conditions and parameter values are fixed seems highly artificial, and even in this case, one may vary some of the model assumptions, which would suffice for quasi-intervention.

My claim that a CS can be or even is a modeled experiment is not meant to imply that this CS can be or is an experiment. A modeled experiment is not an experiment as fake snow is not snow.

If a CS does not actually model a possible experiment because it is supposed to reflect the way the target system does behave as a matter of fact, we may still say that it models a natural experiment. In such an experiment, no intervention is needed simply because the system that is observed happens to fulfill the conditions that are at the focus of the inquiry.

Some similarities on the list from Section 3 also apply to CSs for which CME 2 does not hold true. We can explain such similarities by saying that the simulations model only the behavior of a system as a reaction to certain conditions rather than also an intervention in which the system is subjected to the conditions.

See e.g. Mainzer ( 1995 ), p. 467.

It would be too much to claim that my proposal provides an independent explanation of the similarities listed. The reason is that some similarities are built into the proposal. What the proposal does though is to re-organize the similarities in a useful way. Of course, CE (plus, maybe, CE+) potentially reorganizes the similarities too, but CE and CE+ have been rejected on independent grounds.

They may do so by imitating natural experiments, which are not experiments according to our partial explication.

See also Imbert ( 2017 ), Sec. 5.2 for a similar strategy.

All this was very helpfully pointed out by an anonymous referee who also noted that the problem is not just restricted to my account, but rather affects any view that embraces the following two claims: i. Experiments are not over-controlled, while CSs are. ii. CSs can replace experiments.

In a similar way, Nagel ( 1986 ), p. 93 criticizes Berkeley’s so-called master argument because Berkeley confuses something that is needed to create an image with the content of the image.

Can my main thesis, viz. that CSs can, and do often do, model possible experiments be generalized to what is called analog simulation? I here take it that, in an analog simulation, the target and a physical model of it may be described using the same type of dynamical equations. The dynamics of the model then is investigated to learn about the dynamics of the target (Trenholme 1994 ). For instance, electric circuits may be used to study a fluid in this way (see Kroes ( 1989 ) for this example). Now, my claim that CSs can model possible experiments and often do so seems to apply to such analog simulations, too. For instance, if I set up an electric circuit to model a particular fluid, I’m modeling an experiment on the fluid. Note, however, the following difference between analog and computer simulations: If a possible experiment on the target is modeled in an analog simulation, this is a real experiment on the model system (the analogue). As I have argued above in Section  5 , this is not so in computer simulations.

Adam, M. (2002). Theoriebeladenheit und Objektivität. Zur Rolle von Beobachtungen in den Naturwissenschaften . Frankfurt am Main und London: Ontos.

Arnold, E. (2013). Experiments and simulations: Do they fuse? In Durán, J.M., & Arnold, E. (Eds.) Computer simulations and the changing face of scientific experimentation (pp. 46–75). Newcastle upon Tyne: Cambridge Scholars Publishing.

Balzer, W. (1997). Die Wissenschaft und ihre Methoden . Freiburg und München: Karl Alber.

Barberousse, A., Franceschelli, S., & Imbert, C. (2009). Computer simulations as experiments. Synthese , 169 , 557–574.

Barker-Plummer, D. (2016). Turing machines. In Zalta, E.N. (Ed.), The Stanford encyclopedia of philosophy. Winter 2016 edn, Metaphysics Research Lab, Stanford University .

Baumberger, C. (2011). Understanding and its relation to knowledge. In Löffler, C.J.W. (Ed.) Epistemology: contexts, values, disagreement. Papers of the 34th international Wittgenstein symposium (pp. 16–18). Austrian Ludwig Wittgenstein Society.

Beisbart, C. (2012). How can computer simulations produce new knowledge? European Journal for Philosophy of Science , 2(2012), 395–434.

Beisbart, C. (2014). Are we Sims? How computer simulations represent and what this means for the simulation argument. The Monist , 97/3 , 399–417.

Beisbart, C., & Norton, J.D. (2012). Why Monte Carlo simulations are inferences and not experiments. International Studies in the Philosophy of Science , 26 , 403–422.

Bertschinger, E. (1998). Simulations of structure formation in the Universe. Annual Review of Astronomy and Astrophysics , 36 , 599–654.

Binder, K., & Heermann, D. (2010). Monte Carlo simulation in statistical physics: An introduction, graduate texts in physics . Berlin: Springer Verlag.

Bogen, J. (2010). Theory and observation in science. In Zalta, E.N. (Ed.), The stanford encyclopedia of philosophy . Spring 2010 edn. http://plato.stanford.edu/archives/spr2010/entries/science-theory-observation/ .

Brown, J.R., & Fehige, Y. (2017). Thought experiments. In Zalta, E.N. (Ed.), The stanford encyclopedia of philosophy. Summer 2017 edn .

Carnap, R. (1962). Logical foundations of probability , 2nd edn. Chicago: University of Chicago Press.

Casti, J.L. (1997). Would-be worlds. How simulation is changing the frontiers of science . New York: Wiley.

Dolag, K., Borgani, S., Schindler, S., Diaferio, A., & Bykov, A.M. (2008). Simulation techniques for cosmological simulations. Space Science Reviews , 134 , 229–268. arXiv: 0801.1023v1 .

Dowling, D. (1999). Experimenting on theories. Science in Context , 12/2 , 261–273.

Duhem, P.M.M. (1954). The aim and structure of physical theory, Princeton science library . Princeton, NJ: Princeton University Press.

Durán, J.M. (2013). The use of the materiality argument in the literature on computer simulations. In Durán, J.M., & Arnold, E. (Eds.), Computer simulations and the changing face of scientific experimentation (pp. 76–98). Newcastle upon Tyne: Cambridge Scholars Publishing.

Efstathiou, G., Davis, M., White, S.D.M., & Frenk, C.S. (1985). Numerical techniques for large cosmological N-body simulations. Ap J Suppl , 57 , 241–260.

Falkenburg, B. (2007). Particle metaphysics. A critical account of subatomic reality . Heidelberg: Springer.

Franklin, A. (2010). Experiment in physics. In Zalta, E.N. (Ed.), The Stanford encyclopedia of philosophy. Spring 2010 edn .

Frigg, R.P., & Reiss, J. (2009). The philosophy of simulation: Hot new issues or same old stew? Synthese, 169 , 593–613.

Fritzson, P. (2004). Principles of object-oriented modeling and simulation with Modelica 2.1 . IEEE Press.

Giere, R.N. (2004). How models are used to represent. Philosophy of Science , 71 , 742–752.

Giere, R.N. (2009). Is computer simulation changing the face of experimentation? Philosophical Studies, 143 (1), 59–62.

Gillespie, D.T. (1976). A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics , 22 , 403–434.

Goodman, N. (1968). Languages of art: An approach to a theory of symbols . Indianapolis: Bobbs-Merrill.

Gramelsberger, G. (2010). Computerexperimente. Zum Wandel der Wissenschaft im Zeitalter des Computers . Transcript, Bielefeld.

Guala, F. (2002). Models, simulations, and experiments. In Magnani, L., & Nersessian, N. (Eds.), Model-based reasoning: science, technology, values (pp. 59–74). New York: Kluwer.

Guillemot, H. (2010). Connections between simulations and observation in climate computer modeling. Scientist's practices and bottom-up epistemology lessons. Studies In History and Philosophy of Science Part B: Studies In History and Philosophy of Modern Physics , 41 , 242–252. Special Issue: Modelling and simulation in the atmospheric and climate sciences.

Haasl, R.J., & Payseur, B.A. (2011). Multi-locus inference of population structure: a comparison between single nucleotide polymorphisms and microsatellites. Heredity , 106 , 158–171.

Hacking, I. (1983). Representing and intervening . Cambridge: Cambridge University Press.

Hasty, J., McMillen, D., Isaacs, F., & Collins, J.J. (2001). Computational studies of gene regulatory networks: In numero molecular biology. Nature Reviews Genetics , 2 , 268–279.

Heidelberger, M. (2005). Experimentation and instrumentation. In Borchert, D. (Ed.), Encyclopedia of philosophy. Appendix (pp. 12–20). New York: Macmillan.

Hughes, R.I.G. (1997). Models and representation. Philosophy of Science (Proceedings) , 64 , S325–S336.

Hughes, R.I.G. (1999). The Ising model, computer simulation, and universal physics. In Morgan, M.S., & Morrison, M. (Eds.), Models as mediators. Perspectives on natural and social sciences (pp. 97–145). Cambridge: Cambridge University Press.

Humphreys, P. (1990). Computer simulations. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association , 1990 , 497–506.

Humphreys, P. (1994). Numerical experimentation. In Humphreys, P. (Ed.), Patrick Suppes. Scientific philosopher (Vol. 2, pp. 103–118). Dordrecht: Kluwer.

Humphreys, P. (2004). Extending ourselves: Computational science, empiricism, and scientific method . New York: Oxford University Press.

Humphreys, P.W. (2013). What are data about? In Durán, J.M., & Arnold, E. (Eds.) Computer simulations and the changing face of scientific experimentation (pp. 12–28). Newcastle upon Tyne: Cambridge Scholars Publishing.

Hüttemann, A. (2000). Natur und Labor. Über die Grenzen der Gültigkeit von Naturgesetzen. Philosophia Naturalis , 37 , 269–285.

Imbert, C. (2017). Computer simulations and computational models in science. In Magnani, L., & Bertolotti, T. (Eds.), Springer handbook of model-based science (Ch. 34, pp. 733–779). Cham: Springer.

Janich, P. (1995). Experiment. In Mittelstraß, J. (Ed.), Enzyklopädie Philosophie und Wissenschaftstheorie. Band 1, Metzler, Stuttgart (pp. 621–622).

Kant, I. (1998). Critique of pure reason . Cambridge: Cambridge University Press. translated by P. Guyer and A. W. Wood; Cambridge Edition of the Works of Kant.

Keller, E.F. (2003). Models, simulation, and computer experiments. In Radder, H. (Ed.), The philosophy of scientific experimentation (pp. 198–215). Pittsburgh: University of Pittsburgh Press.

Knorr-Cetina, K. (1981). The manufacture of knowledge: An essay on the constructivist and contextual nature of science . Pergamon international library of science, technology, engineering, and social studies, Pergamon Press.

Kroes, P. (1989). Structural analogies between physical systems. British Journal for the Philosophy of Science , 40 , 145–154.

Küppers, G., & Lenhard, J. (2005). Computersimulationen: Modellierungen 2. Ordnung. Journal for General Philosophy of Science , 36 (2), 305–329.

Lim, S., McKee, J.L., Woloszyn, L., Amit, Y., Feedman, D.J., Sheinberg, D.L., & Brunel, N. (2015). Inferring learning rules from distributions of firing rates in cortical neurons. Nature Neuroscience , 18 , 1804–1810.

Mainzer, K. (1995). Computer – neue Flügel des Geistes? Die Evolution computergestützter Technik, Wissenschaft, Kultur und Philosophie , 2nd edn. Berlin, New York: de Gruyter Verlag.

Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical Association , 44 (247), 335–341.

Michelson, A.A. (1881). The relative motion of the earth and the luminiferous ether. American Journal of Science , 22 , 120–129.

Michelson, A.A., & Morley, E.W. (1887). On the relative motion of the earth and the luminiferous ether. American Journal of Science , 34 , 333–345.

Morgan, M.S. (2002). Model experiments and models in experiments. In Magnani, L., & Nersessian, N. (Eds.), Model-based reasoning: science, technology, values (pp. 41–58). New York: Kluwer.

Morgan, M.S. (2003). Experimentation without material intervention: Model experiments, virtual experiments, and virtually experiments. In Radder, H. (Ed.), The philosophy of scientific experimentation (pp. 216–235). Pittsburgh: University of Pittsburgh Press.

Morgan, M.S. (2005). Experiments versus models: New phenomena, inference and surprise. Journal of Economic Methodology , 12 (2), 317–329.

Morrison, M. (1998). Experiment. In Craig, E. (Ed.) Routledge encyclopedia of philosophy (Vol. III, pp. 514–518). London: Routledge and Kegan.

Morrison, M. (2009). Models, measurement and computer simulation: The changing face of experimentation. Philosophical Studies , 143 , 33–57.

Nagel, T. (1986). The view from nowhere . Oxford: Oxford University Press.

Naumova, E.N., Gorski, J., & Naumov, Y.N. (2008). Simulation studies for a multistage dynamic process of immune memory response to influenza: Experiment in silico. Annales Zoologici Fennici , 45 , 369–384.

Norton, J.D. (1996). Are thought experiments just what you thought? Canadian Journal of Philosophy, 26 , 333–366.

Norton, S.D., & Suppe, F. (2001). Why atmospheric modeling is good science. In Edwards, P., & Miller, C. (Eds.), Changing the atmosphere (pp. 67–106). Cambridge, MA: MIT Press.

Parker, W.S. (2008). Franklin, Holmes, and the epistemology of computer simulation. International Studies in the Philosophy of Science , 22 (2), 165–183.

Parker, W.S. (2009). Does matter really matter? Computer simulations, experiments, and materiality. Synthese , 169 (3), 483–496.

Parker, W.S. (2010). An instrument for what? Digital computers, simulation and scientific practice. Spontaneous Generations , 4 (1), 39–44.

Peschard, I. (forthcoming). Is simulation a substitute for experimentation? In Vaienti, S., & Livet, P. (Eds.) Simulations and networks . Aix-Marseille: Presses Universitaires d’Aix-Marseille. Here quoted after the preprint http://d30056166.purehost.com/Is_simulation_an_epistemic%20_substitute.pdf .

Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (2007). Numerical recipes. The art of scientific computing , 3rd edn. New York: Cambridge University Press.

Radder, H. (2009). The philosophy of scientific experimentation: A review. Automatic Experimentation 1. open access; http://www.aejournal.net/content/1/1/2 .

Radder, H. (Ed.) (2003). The philosophy of scientific experimentation . Pittsburgh: University of Pittsburgh Press.

Rechenberg, P. (2000). Was ist Informatik? Eine allgemeinverständliche Einführung , 3rd edn. München: Hanser.

Rheinberger, H.J. (1997). Toward a history of epistemic things: Synthesizing proteins in the test tube . Writing science, Stanford University Press.

Rohrlich, F. (1990). Computer simulation in the physical sciences. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association , 1990 , 507–518.

Scholz, O. R. (2004). Bild, Darstellung, Zeichen. Philosophische Theorien bildlicher Darstellung , 2nd edn. Frankfurt am Main: Vittorio Klostermann.

Shapere, D. (1982). The concept of observation in science and philosophy. Philosophy of Science , 49 (4), 485–525.

Skaf, R.E., & Imbert, C. (2013). Unfolding in the empirical sciences: experiments, thought experiments and computer simulations. Synthese , 190 (16), 3451–3474.

Stöckler, M. (2000). On modeling and simulations as instruments for the study of complex systems. In Carrier, M., Massey, G.J., & Ruetsche, L. (Eds.), Science at the century’s end: Philosophical questions on the progress and limits of science (pp. 355–373). Pittsburgh, PA: University of Pittsburgh Press.

Suárez, M. (2003). Scientific representation: Against similarity and isomorphism. International Studies in the Philosophy of Science , 17 , 225–244.

Suárez, M. (2004). An inferential conception of scientific representation. Philosophy of Science , 71 , 767–779.

Sugden, R. (Ed.) (2005). Experiment, theory, world: A symposium on the role of experiments in economics , Vol. 12/2. London: Routledge. Special issue of Journal of Economic Methodology.

Tiles, J.E. (1993). Experiment as intervention. British Journal for the Philosophy of Science , 44 (3), 463–475.

Trenholme, R. (1994). Analog simulation. Philosophy of Science , 61 (1), 115–131.

Turing, A. (1937). On computable numbers, with an application to the entscheidungsproblem, Proceedings of the London mathematical society (Vol. s2–42, no. 1) .

Weber, M. (2005). Philosophy of experimental biology . Cambridge: Cambridge University Press.

Weisberg, M. (2007). Who is a modeler? British Journal for Philosophy of Science, 58 , 207–233.

Winsberg, E. (1999). Sanctioning models. The epistemology of simulation. Science in Context , 12 , 275–292.

Winsberg, E. (2003). Simulated experiments: Methodology for a virtual world. Philosophy of Science , 70 , 105–125.

Winsberg, E. (2009a). Computer simulation and the philosophy of science. Philosophy Compass , 4/5 , 835–845.

Winsberg, E. (2009b). A tale of two methods. Here quoted from Winsberg (2010), Ch. 4, pp. 49–71.

Winsberg, E. (2010). Science in the age of computer simulations . Chicago: University of Chicago Press.

Zimmerman, D.J. (2003). Peer effects in academic outcomes: Evidence from a natural experiment. The Review of Economics and Statistics , 85 (1), 9–23.

Acknowledgments

Thanks to Christoph Baumberger and Trude Hirsch Hadorn for extremely useful comments on an earlier version of this manuscript. I’m also very grateful for detailed and helpful comments and criticisms by two anonymous referees. One of them provided extensive, constructive and extremely helpful comments even about a revised version of this paper – thanks a lot for this!

Author information

Authors and affiliations.

Institut für Philosophie, Universität Bern, Länggassstr. 49a, CH-3000, Bern 9, Switzerland

Claus Beisbart

Corresponding author

Correspondence to Claus Beisbart .

About this article

Beisbart, C. Are computer simulations experiments? And if not, how are they related to each other?. Euro Jnl Phil Sci 8 , 171–204 (2018). https://doi.org/10.1007/s13194-017-0181-5

Received : 19 August 2016

Accepted : 28 June 2017

Published : 24 July 2017

Issue Date : May 2018

Keywords

  • Computer simulation
  • Intervention
  • Experimental control
  • Observation
Welcome to Lab 2.0 where computers replace experimental science

Timothy Gould, Lecturer in Physics, Griffith University

Disclosure statement

Timothy Gould does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

We spend our lives surrounded by hi-tech materials and chemicals that make our batteries, solar cells and mobile phones work. But developing new technologies requires time-consuming, expensive and even dangerous experiments.

Luckily we now have a secret weapon that allows us to save time, money and risk by avoiding some of these experiments: computers.

Thanks to Moore’s law and a number of developments in physics, chemistry, computer science and mathematics over the past 50 years (leading to Nobel Prizes in Chemistry in 1998 and 2013 ) we can now carry out many experiments entirely on computers using modelling.

This lets us test chemicals, drugs and hi-tech materials on a computer before ever making them in a lab, which saves time and money and reduces risks. But to dispense with labs entirely we need computer models that will reliably give us the right answers. That’s a difficult task.

A grand challenge

Why so difficult? Because chemistry is governed by the quantum mechanics of interacting electrons – usually described by Schrödinger's equation – which requires enormous amounts of memory and time to model.

For example, to study the interaction of three water molecules, we need to store around 10⁸⁰ pieces of data, and do at least 10³²⁰ mathematical operations .

This basically means that when the universe ends we’d still be waiting for an answer. This is somewhat of a bottleneck.

But this bottleneck was broken by three major advances that allow modern computer models to approximate reality pretty well without taking billions of years.

Firstly, Pierre Hohenberg, Walter Kohn and Lu Jeu Sham turned the interaction problem on its head in the 1960s, greatly simplifying and improving theory .

They showed that the electronic density – a quantum mechanical probability that is fairly easy to calculate – is all you need to determine all properties of any quantum system.

This is a truly remarkable result. In the case of three water molecules, their approach needs only 3,000 pieces of data and around 100 billion maths operations.

Secondly, in the 1970s John Pople and co-workers found a very clever way to simplify the computing method by employing mathematical and computational shortcuts.

This lets us use just 300 pieces of data for three water molecules. Calculations need around 100 million operations, which would take a 1975 supercomputer two seconds but can be solved 500 times in a second on a modern phone .
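
To see what those savings mean in practice, here is a rough back-of-the-envelope comparison using the operation counts quoted above. The assumed machine speed of 10⁹ operations per second is an illustration, not a claim about any particular computer.

```python
import math

# Order-of-magnitude comparison for three water molecules, using the
# operation counts quoted above. The assumed machine speed of 10^9
# operations per second is illustrative only.
approaches = {
    "Brute-force Schrodinger equation": 320,               # ~10^320 operations
    "Hohenberg-Kohn-Sham density functional theory": 11,   # ~100 billion operations
    "Pople-style shortcuts": 8,                             # ~100 million operations
}

log10_ops_per_second = 9                      # assumption: 10^9 operations/second
log10_seconds_per_year = math.log10(3.156e7)  # seconds in a year

for name, log10_ops in approaches.items():
    log10_years = log10_ops - log10_ops_per_second - log10_seconds_per_year
    print(f"{name}: ~10^{log10_ops} operations -> ~10^{log10_years:.1f} years")
```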

And finally, the 1990s saw a bunch of people come up with some simple methods to approximate very complex interaction physics with surprisingly high accuracy.

Modern computer models are now mostly fast and mostly accurate, most of the time, for most chemistry.

Quantum mechanical modelling takes off

As a result, computer modelling has transformed chemistry. A quick glance through any recent chemistry journal shows that many experimental papers now include results from modelling.

experiment definition computer

Density functional theory (the technical name for the most common modelling method) is a feature in more than 15,000 scientific papers published in 2015. Its impact will only continue to grow as computers and theory improve.

Modelling is now used to uncover chemical mechanisms , to reveal details about systems that are hidden from experiments , and to propose novel materials that can later be made in a lab .

In a particularly exciting case, computers were able to predict that a molecule C₃H⁺ (propynylidynium) was responsible for some strange astronomical observations .

C₃H⁺ had never before been seen on Earth. When it was later made in a lab it behaved just as the modelling predicted.

New challenges need new solutions

However, the rise of graphene exposed a major flaw in existing models.

Graphene and similar 2D materials do not stick together in the same way as most chemicals. They are instead held together by what are known as van der Waals forces that are not included in standard models, making them fail in 2D systems.

This failure has led to a surge of interest in computer modelling of van der Waals forces.

For example, I was involved in an international project that used sophisticated modelling to determine the energy gained by forming graphite out of layers of graphene. This energy still cannot be determined by experiments.

Even more usefully, 2D materials can potentially be stacked like LEGO , offering vast technological promise. But there are basically an infinite number of ways to arrange these stacks.

We recently developed a fast and reliable model so that a computer can churn through different arrangements very quickly to find the best stacks for a given purpose. This would be impossible in a real lab.
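
As a purely illustrative sketch of that kind of computational screening: enumerate the orderings of a handful of layers and rank them with a scoring function. The layer names and pairwise scores below are invented placeholders, not real materials data or the model described above.

```python
from itertools import permutations

# Toy illustration of screening stacked 2-D materials: enumerate every
# ordering of a few layers and rank them with a made-up scoring function.
layers = ["graphene", "hBN", "MoS2", "WSe2"]

# Hypothetical "compatibility" score for adjacent layer pairs (symmetric).
pair_score = {
    frozenset(["graphene", "hBN"]): 3.0,
    frozenset(["graphene", "MoS2"]): 1.5,
    frozenset(["hBN", "MoS2"]): 2.0,
    frozenset(["MoS2", "WSe2"]): 2.5,
}

def stack_score(stack):
    """Sum the compatibility of each pair of adjacent layers (0 if unknown)."""
    return sum(pair_score.get(frozenset(pair), 0.0)
               for pair in zip(stack, stack[1:]))

orderings = list(permutations(layers))
best = max(orderings, key=stack_score)
print(f"checked {len(orderings)} stackings; best: {' / '.join(best)} "
      f"(score {stack_score(best):.1f})")
```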

On another front, electrical charge transfer in solar cells is also difficult to study with existing techniques, making the models unreliable for an important field of green technology.

Even worse, highly promising (but dangerous) lead based perovskite solar cells involve van der Waals forces and charge transfer together, as shown by some colleagues and me .

A substantial effort is underway to deal with this difficult problem, and the equally difficult (and related) magnetism and conduction problems.

Things will only get better

The ultimate goal of computer modelling is to replace experiments almost entirely. We can then build experiments on a computer in the same way people build things in Minecraft.

The computer would model the real world to allow us to save real time and money and avoid real dangerous experiments.

For example, the Titan supercomputer (pictured top) has recently been used to study non-icing surface materials at the molecular level to improve the efficiency of wind power turbines in cold climates.

This ultimate goal was almost met in the 1990s until the experimental scientists came up with graphene and perovskites that showed flaws in existing theories. Researchers like me continue to study, anticipate and fix these flaws so that computers can replace more challenging experiments.

Perhaps the 2020s will be the last decade when experiments are carried out before knowing what the answer will be. That is certainly a model worth striving for.

Definition of experiment

 (Entry 1 of 2)

Definition of experiment  (Entry 2 of 2)

intransitive verb

  • experimentation

Word History

Middle English, "testing, proof, remedy," borrowed from Anglo-French esperiment, borrowed from Latin experīmentum "testing, experience, proof," from experīrī "to put to the test, attempt, have experience of, undergo" + -mentum -ment — more at experience entry 1

verbal derivative of experiment entry 1

14th century, in the meaning defined at sense 1a

1787, in the meaning defined above

Phrases Containing experiment

  • control experiment
  • controlled experiment
  • experiment station
  • pre-experiment
  • thought experiment

Cite this Entry

“Experiment.” Merriam-Webster.com Dictionary , Merriam-Webster, https://www.merriam-webster.com/dictionary/experiment. Accessed 19 Sep. 2024.

Stretching the Traditional Notion of Experiment in Computing: Explorative Experiments

Affiliation.

  • 1 Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milan, Italy. [email protected].
  • PMID: 26018042
  • DOI: 10.1007/s11948-015-9655-z

Experimentation represents today a 'hot' topic in computing. While experiments made with the support of computers, such as computer simulations, have received increasing attention from philosophers of science and technology, questions such as "what does it mean to do experiments in computer science and engineering, and what are their benefits?" emerged only recently as central in the debate over the disciplinary status of the field. In this work we aim at showing, also by means of paradigmatic examples, how the traditional notion of controlled experiment should be revised to take into account a part of the experimental practice in computing, along the lines of experimentation as exploration. Taking inspiration from the discussion of exploratory experimentation in the philosophy of science (experimentation that is not theory-driven), we advance the idea of explorative experiments that, although not new, can contribute to enlarging the debate about the nature and role of experimental methods in computing. In order to further refine this concept we recast explorative experiments as socio-technical experiments that test new technologies in their socio-technical contexts. We suggest that, when experiments are explorative, control should be understood in an a posteriori form, in opposition to the a priori form that usually takes place in traditional experimental contexts.

Keywords: Computing; Experiment; Experimental control; Explorative experiment.

What Is a Controlled Experiment? | Definitions & Examples

Published on April 19, 2021 by Pritha Bhandari . Revised on June 22, 2023.

In experiments , researchers manipulate independent variables to test their effects on dependent variables. In a controlled experiment , all variables other than the independent variable are controlled or held constant so they don’t influence the dependent variable.

Controlling variables can involve:

  • holding variables at a constant or restricted level (e.g., keeping room temperature fixed).
  • measuring variables to statistically control for them in your analyses.
  • balancing variables across your experiment through randomization (e.g., using a random order of tasks).

Table of contents

  • Why does control matter in experiments?
  • Methods of control
  • Problems with controlled experiments
  • Other interesting articles
  • Frequently asked questions about controlled experiments

Control in experiments is critical for internal validity , which allows you to establish a cause-and-effect relationship between variables. Strong validity also helps you avoid research biases , particularly ones related to issues with generalizability (like sampling bias and selection bias .)

Example: You want to test whether the color used in advertising affects how much people are willing to pay for fast food.

  • Your independent variable is the color used in advertising.
  • Your dependent variable is the price that participants are willing to pay for a standard fast food meal.

Extraneous variables are factors that you’re not interested in studying, but that can still influence the dependent variable. For strong internal validity, you need to remove their effects from your experiment.

In this example, extraneous variables include:

  • Design and description of the meal,
  • Study environment (e.g., temperature or lighting),
  • Participant’s frequency of buying fast food,
  • Participant’s familiarity with the specific fast food brand,
  • Participant’s socioeconomic status.

You can control some variables by standardizing your data collection procedures. All participants should be tested in the same environment with identical materials. Only the independent variable (e.g., ad color) should be systematically changed between groups.

Other extraneous variables can be controlled through your sampling procedures . Ideally, you’ll select a sample that’s representative of your target population by using relevant inclusion and exclusion criteria (e.g., including participants from a specific income bracket, and not including participants with color blindness).

By measuring extraneous participant variables (e.g., age or gender) that may affect your experimental results, you can also include them in later analyses.
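
As a minimal sketch of what including a measured variable "in later analyses" can look like, the toy regression below estimates the treatment effect while controlling for participants' age; every number in it is invented for illustration.

```python
import numpy as np

# Toy illustration of statistically controlling for a measured extraneous
# variable: regress the outcome on a treatment indicator plus age.
# Every number here is invented for illustration.
rng = np.random.default_rng(0)

n = 200
treatment = rng.integers(0, 2, size=n)        # 0 = control, 1 = experimental
age = rng.normal(35, 10, size=n)              # measured extraneous variable
# Simulated outcome: willingness to pay depends on treatment and age, plus noise.
price = 8.0 + 1.5 * treatment + 0.05 * age + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), treatment, age])   # intercept, treatment, age
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"estimated treatment effect, controlling for age: {coef[1]:.2f}")
```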

After gathering your participants, you’ll need to place them into groups to test different independent variable treatments. The types of groups and method of assigning participants to groups will help you implement control in your experiment.

Control groups

Controlled experiments require control groups . Control groups allow you to test a comparable treatment, no treatment, or a fake treatment (e.g., a placebo to control for a placebo effect ), and compare the outcome with your experimental treatment.

You can assess whether it’s your treatment specifically that caused the outcomes, or whether time or any other treatment might have resulted in the same effects.

To test the effect of colors in advertising, each participant is placed in one of two groups:

  • A control group that’s presented with red advertisements for a fast food meal.
  • An experimental group that’s presented with green advertisements for the same fast food meal.

Random assignment

To avoid systematic differences and selection bias between the participants in your control and treatment groups, you should use random assignment .

This helps ensure that any extraneous participant variables are evenly distributed, allowing for a valid comparison between groups .

Random assignment is a hallmark of a “true experiment”—it differentiates true experiments from quasi-experiments .
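
A minimal sketch of random assignment in code (the participant IDs and group labels are placeholders):

```python
import random

# Randomly assign participants to a control or an experimental group so that
# extraneous participant variables are, on average, balanced across groups.
participants = [f"P{i:03d}" for i in range(1, 41)]

random.seed(42)                  # fixed seed so the assignment is reproducible
random.shuffle(participants)

half = len(participants) // 2
groups = {
    "control (red ads)": participants[:half],
    "experimental (green ads)": participants[half:],
}

for label, members in groups.items():
    print(label, "->", len(members), "participants, e.g.", members[:3])
```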

Masking (blinding)

Masking in experiments means hiding condition assignment from participants or researchers—or, in a double-blind study , from both. It’s often used in clinical studies that test new treatments or drugs and is critical for avoiding several types of research bias .

Sometimes, researchers may unintentionally encourage participants to behave in ways that support their hypotheses , leading to observer bias . In other cases, cues in the study environment may signal the goal of the experiment to participants and influence their responses. These are called demand characteristics . If participants behave a particular way due to awareness of being observed (called a Hawthorne effect ), your results could be invalidated.

Using masking means that participants don’t know whether they’re in the control group or the experimental group. This helps you control biases from participants or researchers that could influence your study results.

You use an online survey form to present the advertisements to participants, and you leave the room while each participant completes the survey on the computer so that you can’t tell which condition each participant was in.

Although controlled experiments are the strongest way to test causal relationships, they also involve some challenges.

Difficult to control all variables

Especially in research with human participants, it’s impossible to hold all extraneous variables constant, because every individual has different experiences that may influence their perception, attitudes, or behaviors.

But measuring or restricting extraneous variables allows you to limit their influence or statistically control for them in your study.

Risk of low external validity

Controlled experiments have disadvantages when it comes to external validity —the extent to which your results can be generalized to broad populations and settings.

The more controlled your experiment is, the less it resembles real world contexts. That makes it harder to apply your findings outside of a controlled setting.

There’s always a tradeoff between internal and external validity . It’s important to consider your research aims when deciding whether to prioritize control or generalizability in your experiment.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Student’s  t -distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Prospective cohort study

Research bias

  • Implicit bias
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic
  • Social desirability bias

In a controlled experiment , all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables .

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bhandari, P. (2023, June 22). What Is a Controlled Experiment? | Definitions & Examples. Scribbr. Retrieved September 18, 2024, from https://www.scribbr.com/methodology/controlled-experiment/

What Is a Computer Simulation and How Does It Work? A Deep Dive.

Whether it’s predicting how viruses spread or forecasting weather conditions, computer simulations replicate real-world events so we can plan for the future.

Mike Thomas

Computer simulations are programs that run various mathematical scenarios to determine the potential scope or impact that a particular scenario could have.

For example, simulations help car manufacturers to virtually crash test their new lines of vehicles. Instead of physically crashing dozens of new cars, researchers run simulations to see all possible scenarios that could occur to both the vehicle and passengers in a multitude of accidents. These simulations determine if the car is safe enough to drive.

The idea is that computer simulations allow researchers to replicate possible real-world events — ranging from the spread of infectious diseases to impending hurricanes — so we can save time and money planning for the future.

What Is a Computer Simulation?

Put simply, computer simulations are computer programs that model a real-life scenario or product and test many possible outcomes against it.

“If it’s too large-scale, too expensive or too risky to work with the real system itself — that’s why we use computer simulation,” Barry Nelson, a professor of engineering at Northwestern University, said. “Simulation allows you to create data or systems that are conceptual, that people want to build, want to consider or want to change. I sometimes say that simulation is data analytics for systems that don’t yet exist.”

Computer Simulation Definition

A computer simulation uses mathematical equations to model possible real-world scenarios, products or settings and create various responses to them. It works by duplicating the real-life system and its functions; once the simulation is up and running, it creates a record of what is being modeled and how it responds, which is then translated into data.

Although computer simulations are typically used to test potential real-world scenarios, they can be more theoretical too. In 2016, scientists at Argonne National Laboratory near Chicago, Illinois concluded that it would take only a couple of months for zombies to overrun the city and wipe out its population.

Fortunately, we now have “the knowledge to develop an actionable program to train the population to both better defend themselves against zombies and also take offensive actions that are the most effective,” Chick Macal, an Argonne senior systems engineer, told Built In. Phew.

Zombies aren’t real, but infectious diseases are. Macal and his co-researchers wanted to predict how more plausible infectious diseases might spread, and to determine the most effective methods of intervention and policy action. Their research relied on what’s called agent-based computer modeling and simulation. This method has allowed researchers in all types of academic disciplines and commercial industries to figure out how things (equipment, viruses, etc.) would function or act in certain environments without having to physically replicate those conditions. In the case of Macal and his cohorts, that means no humans living — or undead — were harmed in the course of their work. 
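
To give a flavor of the agent-based approach, here is a heavily simplified, purely illustrative sketch: synthetic agents follow a couple of contact-and-recovery rules and are stepped forward day by day. It is not the Argonne team's model, and every parameter is invented.

```python
import random

# Toy agent-based simulation of an infection spreading through a synthetic
# population. Each agent follows simple contact and recovery rules; every
# parameter is invented purely to illustrate the approach.
random.seed(1)

N_AGENTS = 1000
CONTACTS_PER_DAY = 5
TRANSMISSION_PROB = 0.03
RECOVERY_DAYS = 7

state = ["S"] * N_AGENTS          # "S" susceptible, "I" infected, "R" recovered
days_infected = [0] * N_AGENTS
state[0] = "I"                    # one initial case

for day in range(60):
    currently_infected = [i for i, s in enumerate(state) if s == "I"]
    for i in currently_infected:
        # Each infected agent meets a few randomly chosen agents per day.
        for j in random.sample(range(N_AGENTS), CONTACTS_PER_DAY):
            if state[j] == "S" and random.random() < TRANSMISSION_PROB:
                state[j] = "I"
        days_infected[i] += 1
        if days_infected[i] >= RECOVERY_DAYS:
            state[i] = "R"
    if day % 15 == 0:
        print(f"day {day:2d}: infected {state.count('I'):4d}, "
              f"recovered {state.count('R'):4d}")
```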

Macal’s colleague, computational scientist Jonathan Ozik, described this part of their work as the “computational discovery of effective interventions,” and it’s especially good at working with a particular population of people. An added benefit, he said, is that “we can do these experiments without worrying about the cost of experiments or even ethical and privacy considerations,” because the populations they study are synthetic, mathematical representations, not the real thing.

How Do Computer Simulations Work? 

Computer simulation is a step-by-step process in which a computer simulation program is modeled after a real-world system (a system can be a car, a building or even a tumor). In order to replicate the system and possible outcomes, the simulation uses mathematical equations to create an algorithm that defines the system’s state, or the combination of different possible variables.

If you’re simulating a car crash, for example, the simulation’s algorithm can be used to test what would happen if there was a storm during the crash against what happens when the weather is more mild. 

The simulation calculates the system’s state at a given time (t), then it moves to t+1 and so on. Once the simulation is complete, the sequence of variables is saved as a large dataset, which can then be translated into a visualization.
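
In skeletal form, that state-at-t, then t+1 loop looks something like the sketch below; the update rule (a trivially simple cooling object) is a stand-in for whatever equations a real simulation would solve.

```python
# Skeleton of a time-stepping simulation: compute the state at time t, move to
# t + 1, and record every state for later analysis or visualization. The update
# rule here (an object cooling toward room temperature) is only a placeholder
# for a real system's equations.

def step(state, dt):
    """Advance the system by one time step."""
    cooling_rate = 0.1                     # placeholder parameter
    temperature = state["temperature"]
    return {"temperature": temperature - cooling_rate * (temperature - 20.0) * dt}

state = {"temperature": 90.0}              # initial condition
dt = 1.0                                   # time step
history = [state]                          # the dataset the simulation produces

for t in range(50):
    state = step(state, dt)
    history.append(state)

print("final temperature:", round(history[-1]["temperature"], 2))
```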

“We’re not interested in simply extrapolating into the future,” Macal said. “We’re interested in looking at all the uncertainties as well as different parameters that characterize the model, and doing thousands or millions of simulations of all the different possibilities and trying to understand which interventions would be most robust. And this is where high-performance computing comes in.”

Thanks to the robust data-crunching powers of supercomputers , simulation is more advanced than ever — and evolving at a rapid pace. 

The computational resources at their disposal, Ozik said, allow researchers “to fully explore the behaviors that these models can exhibit rather than just applying ad hoc approaches to find certain interesting behaviors that might reflect some aspect of reality.”

Which is to say, the simulations are much broader, and therefore even more realistic — at least from a hypothetical perspective. 

Computer Simulations in the Real World 

Plenty of simulations are done with far less computing power than Argonne possesses. Alison Bridger, department chair of meteorology and climate science at San Jose State University in California, said on-site cluster computers are strong enough to run the climate simulation models she builds. Cloud computing services like those offered by Amazon (AWS) and Microsoft (Azure) are gradually gaining a foothold in the space as well.

Along with nuclear physics, meteorology was one of the first disciplines to make use of computer simulation after World War II. And climate modeling, Bridger said, “is like a close cousin of weather forecasting. Back in the 1960s, people used early weather forecasting models to predict the climate. Before you can predict the weather, you have to be able to properly reproduce it with your model.”

Bridger’s work employs a widely used “local scale” model called WRF, which stands for Weather Research and Forecasting and can produce “reasonably good simulations of weather on the scale of, say, Northern Illinois — so Chicago up to Green Bay and down into the central part of the state. It will forecast things like high and low temperatures, rain and so forth. And it’s typically only run to simulate 24, 48 or 72 hours of weather.”

In further explaining her process, Bridger employs the imagery of a cube centered over Chicago that’s roughly a kilometer east-west by a kilometer north-south. The goal is to predict the temperature in the cube’s center and extrapolate that reading to the entire thing. There are also, in her telling, additional cubes surrounding the initial one “stacked up all the way to the top of the atmosphere” whose future temperatures will be predicted in various time increments — in an hour, in 12 hours, in one day, in three days and so on. 

Next, temperature-affecting variables are added to the mix, such as amount of sunshine, cloud cover, natural disasters like wildfires and manmade pollution. It’s then a matter of applying the laws of physics to determine a variety of weather-related events: rising and falling temperatures, amount of wind and rain.
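
A drastically simplified sketch of that cube picture: a short column of grid cells, each holding a temperature, stepped forward in hourly increments by mixing with its neighbors plus a crude day/night heating term. The grid size, time step and coefficients are invented for illustration and are not taken from WRF.

```python
import math

# Drastically simplified "weather" model: a short column of grid cells, each
# holding a temperature, stepped forward in one-hour increments. Every cell
# mixes with its neighbors; the surface cell gets a crude day/night heating
# term. Grid size, time step and coefficients are invented for illustration.
N_CELLS = 10            # stacked cells, from the surface (0) upward
MIX = 0.2               # fraction of the neighbor average blended in per hour
temps = [15.0 - 1.5 * k for k in range(N_CELLS)]   # cooler with altitude

for hour in range(72):                                     # three simulated days
    sunshine = max(0.0, math.sin(2 * math.pi * hour / 24))  # day/night cycle
    new_temps = temps[:]
    for k in range(N_CELLS):
        below = temps[k - 1] if k > 0 else temps[k]
        above = temps[k + 1] if k < N_CELLS - 1 else temps[k]
        new_temps[k] = temps[k] + MIX * (0.5 * (below + above) - temps[k])
    new_temps[0] += 2.0 * sunshine - 0.5   # daytime heating minus steady cooling
    temps = new_temps

print("surface temperature after 72 hours:", round(temps[0], 1), "C")
```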

Computer simulations can be used for much more than climate and weather predictions.

6 Examples of Computer Simulations

Whether scientists want to better understand healthcare responses or even explore black holes, computer simulation allows for important research opportunities. Here are six that stand out:

1. Responding To Pandemics

Along with Ozik and their fellow researcher Nick Collier, Macal also worked on a modeling and simulation project that determined what might happen if the deadly Ebola virus — which spread through West Africa from 2013 through 2016, with devastating effects — were to impact the U.S. population. Part of that process involved visiting Chicago hospitals to learn about Ebola-related procedures, then incorporating those procedures into their models.

2. Improving Cancer Treatment

Other Argonne scientists have used modeling and simulation to improve cancer treatment through predictive medicine, finding out how various patients and tumors respond to different drugs.

And one 2019 study found positive results in simulating breast cancer tumors. For the study, researchers built a computer simulation that modeled tumors from four different patients under 12-week therapy treatments. After two of the simulated tumors didn’t respond to treatment, they concluded that more frequent, lower doses of chemotherapy could reduce a low proliferative tumor, while lower doses of antiangiogenic agents helped poorly perfused tumors respond to drug treatment better.  

3. Predicting Health Code Violations

In Chicago, the city’s Department of Public Health uses computer modeling and simulation to predict where critical violations might pop up first. Those restaurants are then bumped to the top of a 15,000-establishment list that’s overseen by only three dozen inspectors. And apparently it’s working: One simulation yielded 14 percent more violations, which ideally means earlier inspection and a lower chance of patrons getting sick from poorly refrigerated fish.

4. Understanding Our Relationship with Religion and Crisis

Computer simulation is being used in interesting ways at Boston University. Wesley Wildman, a professor of philosophy, theology and ethics, uses computer simulation to study — as he put it in an article for The Conversation — “how religion interacts with complex human minds, including in processes such as managing reactions to terrifying events.”

In order to do so, he and his team designed a world and filled it with computer-controlled characters, or “agents,” that are “programmed to follow rules and tendencies identified in humans through psychological experiments, ethnographic observation and social analysis.” 

Then they observed what happened when their agents were tested against real-world examples like a massive earthquake that struck Christchurch, New Zealand in 2011.

“The better our agents mimic the behavior of real humans in those sorts of circumstances,” Wildman said, “the more closely aligned the model is with reality, and the more comfortable we are saying humans are likely to behave the way the agents did in new and unexplored situations.”

5. Researching Earthquakes

In Germany, a team at the Leibniz Supercomputing Centre performed earthquake simulations using the devastating Indian Ocean earthquake of 2004, which spurred a massive tsunami, as their starting point.

According to Professor Michael Bader of Germany’s Institut für Informatik, they wanted to “better understand the entire process of why some earthquakes and resulting tsunamis are so much bigger than others. Sometimes we see relatively small tsunamis when earthquakes are large, or surprisingly large tsunamis connected with relatively small earthquakes. Simulation is one of the tools to get insight into these events.”

But it’s far from perfect. New York Times reporter Sheri Fink detailed how a Seattle-based disaster response startup called One Concern developed an earthquake simulation that failed to include many densely populated commercial structures in its test-runs “because damage calculations relied largely on residential census data.” The potential real-world result of this faulty predictive model: Rescuers might not have known the location of many victims in need.

6. Exploring Black Holes 

In 2022, researchers built a black hole simulation by modeling a single-file chain of atoms to mimic the event horizon of a black hole. This led to the team observing an analogue of Hawking radiation, the theorized emission of particles at a black hole’s event horizon whose temperature is predicted to be inversely proportional to the black hole’s mass.

Although the research is still in its early stages, this could potentially help scientists understand and resolve the differences between the general theory of relativity and quantum mechanics.

Uses of Computer Simulation in Different Industries 

In the past 75 years, computer modeling and simulation has evolved from a primarily scientific tool to something industry has embraced for the purposes of optimization and profitability. 

“Industry is embracing simulation at a faster rate than ever before and connecting it to what I would call data analytics for things like scheduling and supply chain management,” Macal said. “Industry is trying to simulate everything they do because they realize it’s cheaper and quicker than actually building a prototype system.”

When Northwestern’s Nelson spoke with Built In, he had recently returned from the annual Applied Probability Conference. There, the simulation applications discussed included but weren’t limited to the following: aviation modeling, cybersecurity, environmental sustainability and risk, financial risk management, healthcare, logistics, supply chain and transportation, semiconductor manufacturing, military applications, networking communications, project management and construction.

“Frequently, companies that use simulation want to optimize system performance in some sense,” Nelson said, using a car company that wants to build a new assembly plant or decide what vehicles to bring to market as an example. 

“So optimization is a key to lots of business in industry, but optimal solutions are often brittle,” he continued. “By which I mean, if small issues about the assumptions or the modeling approximations you made are wrong, then suddenly something that appeared to be optimal in your model can be catastrophically bad.”

Nelson added: “When people build mathematical and computer models, even though the model may have been built from data, they treat it as if the model is correct and therefore the solution that [results] is optimal. What we try to do is continue to incorporate in the model the uncertainty that was created when we built it.”

The financial crisis of 2008, Nelson said, is one instance where model risk was detrimentally downplayed.

“The financial industry uses a tremendous number of very sophisticated mathematical computer modeling [methods],” he said. “And it’s quite clear that the correlations among various financial instruments and securities and so on were kind of ignored, so we got cascading failures.”

Such cautionary tales, however, don’t mean that those who create the mathematical and computer models on which simulations are based must strive for perfection, because no model is perfect and models drive progress. Demanding perfection, Nelson said, “would paralyze us. But as we start to make more life-critical decisions based on models, then it does become more important to account for risks.”


The Future of Computer Simulations 

Imagine this: It’s years from now and someone you know has been diagnosed with a cancerous tumor. But instead of immediately bombarding them with radiation and highly toxic chemotherapy drugs and hoping for the best, doctors instead perform tests from which they create a mathematical, virtual twin of that person’s malignant growth. The digital replica is then subjected to computational interventions in the form of millions or even billions of simulations that quickly determine the most effective form of treatment.

It’s less fantastical than it sounds.

“Recent developments in cancer-specific ‘big data’ and experimental technologies, coupled with advances in data analysis and high-performance computational capabilities, are creating unprecedented opportunities to advance understanding of cancer at greater and more precise scales,” the National Cancer Institute reported.

Other revolutionary developments with far-reaching impact are already being implemented. 

As Los Alamos National Laboratory physicist Justin Smith told Science Daily, “we can now model materials and molecular dynamics billions of times faster compared to conventional quantum methods, while retaining the same level of accuracy.”

That’s good news for drug developers, whose researchers study molecular movement in order to see what’s suitable for use in pharmaceutical manufacturing, as well as patients who are all too often caught up in a detrimental guessing game when it comes to treatment.

Penn State researchers working in tandem with colleagues at the University of Almeria in Spain developed “a computer model that can help forecasters recognize potential severe storms more quickly and accurately.” As Steve Wistar, a senior forensic meteorologist at AccuWeather, explained, the tool could lead to better forecasts because he and his fellow forecasters will have “a snapshot of the most complete look of the atmosphere.”

And so, while we may or may not be living in a computer-simulated world, the world is being transformed by computer simulation. As computers get faster and research methods more refined, there’s no telling where it might lead.

Mudi Yang, a cosmos-simulating high school senior from Nashville, put it eloquently when he said, “Computer simulations gave us the ability to create virtual worlds, and those virtual worlds allowed us to better understand our real one.”

34. Experimental Methods in Human-Computer Interaction

  • 34.0.1 Introduction

Reflecting its roots in Psychology, Human-Computer Interaction (HCI) has always given experiments an important role. For example, early experimental HCI papers include English et al. (1968), Card, Moran and Newell (1980) and Malone (1982). From these early days, the psychology-style experiment developed to become the basis for usability tests where different designs are compared through controlled studies (see, for example, Preece, Rogers and Sharp, 2010) and measured for important differences in the key aspects of usability, that is, efficiency, effectiveness and satisfaction (Hornbaek and Law, 2007). This was embodied in the proposed engineering approach to HCI, such as Dowell and Long (1989), where new interfaces would be engineered through the application of scientifically established knowledge about human factors to achieve predictable and measurable effects when people used the interfaces.

Alongside this though, HCI has always recognised the far-reaching nature of technology in human lives and so has also increasingly been influenced by other disciplines such as the social sciences, cultural studies and more recently the arts and humanities. These different influences have brought their own research methods such as ethnomethodology (Suchman, 1995) and critical readings (Blythe et al. 2008). Additionally, the vision of engineering interfaces based on scientific principles has foundered as HCI has moved from merely worrying about the productivity of users to being concerned with the wider effects of systems on user experiences (McCarthy and Wright, 2005), including fun (Blythe et al. 2006), pleasure (Jordan, 2001), immersion (Jennett et al. 2008), and so on.

Nonetheless, experiments have maintained a steady presence within HCI and adapted, where appropriate, to address these more wide-ranging aims. They still embody the principle of gathering well-founded data to provide rigorous, empirically validated insights into interactions but made suitable for wider research questions. This can be seen, in part through moves within the HCI community to develop an orientation to Interaction Science, of which theories and experiments form the cornerstone (Howes et al., 2014).

Also, with increasing maturity, there is now a generation of HCI researchers who have only ever been HCI researchers, unlike in the early years where people moved into HCI having done their early research in entirely different disciplines, such as Psychology, Physics, English, Mathematics and so on. Thus, texts are appearing that aim to describe the research methods and, in particular, the experimental methods that these second-generation HCI researchers need. Overviews of experimental methods are given in Cairns and Cox (2008a), Lazar et al. (2009) and Gergle and Tan (2014). Purchase (2012) is entirely about the design and analysis of experiments in HCI. Additionally, there are texts that draw strongly on experimental methods such as Tullis and Albert (2008) and Sauro and Lewis (2012) in order to demonstrate how to effectively measure user interactions with a view to improving the design of interactive systems. These texts therefore also understandably provide a lot of details and insights about the design and conduct of experiments in HCI even if not in a purely research context.

There are also research articles that focus in on specific aspects of experiments to help address the problems of experimental methods that are specific to their deployment in HCI. These research articles address diverse topics like the quality of statistical analysis (Cairns, 2007), new statistical methods (Wobbrock et al., 2011) and the practicalities of running good experiments (Hornbaek, 2013).

This sets a challenge for the goal of this chapter. Experiments are clearly an important approach in the modern HCI researcher’s toolkit but there are several existing resources that describe experimental methods well and in more detail than is possible here. There are already overview chapters (including one by me!) that provide a starting resource for new researchers. What can this chapter usefully add? First, it should be noted that this is an Encyclopaedia chapter and therefore if this is an important topic in HCI, it ought to be represented. This chapter aims to represent this topic. Secondly, I think there is a perspective on conducting experiments in HCI that has not been explicitly addressed and that is the one which I wish to present here. This is not to say that my esteemed colleagues already writing in this area are unaware of anything in this chapter but rather that this is my particular synthesis based on my own experiences of using experimental methods and more particularly of teaching students to use them.

Understandably, students partly struggle to learn experimental methods because they are new and unfamiliar but it seems some of the problems arise because they do not fully understand why they need to do things the way they are told to (whether by me or by textbooks). There seems to be a ritual mystique around experiments that is not made wholly clear. This is a problem noted in other disciplines, like Psychology where students and researchers ritually follow the experimental method (Gigerenzer, 2004), but fail to identify when the method is invalid or ineffective. The aim of this chapter is thus to communicate concisely some of the core ideas behind the experimental method and to demonstrate how these play out in the conduct of actual experimental research in HCI. From this perspective, I identify three pillars that underpin good experiments. These are:

Experimental design

Statistical analysis

Experimental write-up

If any of these pillars is weak, then the contribution of an experiment is weak. It is only when all three work together that we can have confidence in the research findings arising from an experiment.

This chapter discusses each of these pillars in turn and along the way shows how each is related to the other. Before doing this though, it is necessary to think about what experiments are. This is philosophy. Too much philosophy is crippling (at least for an HCI researcher though obviously not for philosophers). But an unconsidered life is not worth living (Plato, 1956, The Apology). So here goes.

  • 34.0.2 A Little Philosophy of Science

Though there is much to disagree about what science is, and indeed whether scientific method is in fact a coherent, meaningful term (Feyerabend, 2010), there is general agreement that science is about theories and experiments. Theories are statements expressed mathematically (“E = mc²”) or linguistically (“digital games fulfil people’s need for autonomy”) that describe something about how the world works and therefore can be used to predict to some extent what may happen in certain situations (Chalmers, 1999, chap. 4). Experiments are somewhat harder to define as they depend on the domain, the research questions and even the scientist doing them. As such they are full of craft skill and experience but basically they can be understood as tests or trials of ideas.

Historically, theory has dominated the philosophy of science with much concern about what a theory is and how we know it is true. And this is reflected in the dominant influences in philosophy of science such as Popper (1959) and Kuhn (1962). Much of their thought is on understanding the nature of scientific theory. In these approaches, experiments are tests of theories that are able to undermine and even destroy theories but are essentially subservient to them. If this really were the case, this would be a big problem for HCI. It is hard to point within HCI to large, substantial and robust theories that can easily predict how people will interact with a particular system and what the outcome will be. Fitts’ Law (MacKenzie, 1992) is a good example of a well validated theory in HCI but, first, its specific implications for any differences in say button layout are so small as to be negligible (Thimbleby, 2013), and secondly it is not even interpreted correctly when it is used to inform design. For instance, it is often used to justify putting things at the edges of screens because then buttons effectively have infinite width (eg Smith, 2012) but updates of Fitts’ Law to modern interfaces show that both height and width are important (MacKenzie and Buxton, 1992).

Just in case you are in doubt about the status of theory within HCI, contrast this with psychology where there are many well-established theories that predict people’s behaviour in a wide variety of contexts: perception and inattentional blindness (Simons and Chabris, 1999); anchoring biases in decision making (Kahneman, 2012); embodied cognition (Wilson, 2002) and so on. This is not to say that these theories play out so clearly in our ordinary day-to-day lives but they do appear robustly in the lab and often outside of it as well. HCI would be hard pushed to demonstrate any such substantial theory of interactions.

Fortunately for HCI, more recently, it has been recognised that experiments have their own value independently of theories. This approach is termed new experimentalism, for example Chalmers (1999), and reflects the fact that sometimes experiments exist because of an idea but not necessarily one which has support from any existing theory and in some cases, quite the opposite. A classic example of this is Faraday’s first demonstration of the electric motor. It was known in Faraday’s time that there was some interaction between electrical currents and magnetic fields but it was Faraday who clearly isolated this effect by showing that a suspended wire would consistently rotate around a magnet when an electrical current was passed through it (Hacking, 1983). It took another 60 years for Maxwell to define the full theory of electromagnetism that would account for this even though that theory immediately seemed obviously flawed as it predicted a constant speed of light regardless of the motion of the observer (which turned out to be true!).

If experiments are not testing theories then what are they? In one sense, they simply isolate a phenomenon of interest (Hacking, 1983). How we identified that phenomenon in the first place (by theory, hunch or blind chance) is irrelevant. Once we have isolated it and can reliably reproduce it, that experiment stands as something that requires explanation, much like Faraday’s motor did. Few experiments though really are pure chance. Instead, they arise from a concerted effort by one or more people to identify something interesting going on, to isolate a recognised phenomenon.

A more sophisticated account of experiments’ role in science is that an experiment is a severe test of an idea (Mayo, 1996). In this formulation, we may have an idea, be it an established theory or some hunch about how things work, and we set up a situation where that idea is able to be severely tested. For example, we may believe that digital games are able to improve the outcomes of psychotherapy. So we need to set up a situation in which it ought to be clear whether or not digital games have improved the outcomes of psychotherapy and that the cause of any such improvement can be directly attributed to the digital games. This has immediate consequences for how such an experiment might look: fixed style of therapy; possibly even a fixed therapist or set of therapists; comparable patients; clear assessment of outcomes; and so on. But nonetheless, having levelled the playing field, we expect digital games to demonstrate their improvements. The experiment isolates the phenomenon of interest (in this example, digital games in the context of psychotherapy) and sets it up so that if it is going to fail, it is obviously going to fail. Or if it succeeds then it is clear that the digital game is the sole cause of success. And so it is by such careful experiments that the idea is severely tested and each experiment therefore provides evidence in support of it.

The other thing to note in this definition of severe testing is that, in setting up the experiment, there is a prediction that is being tested and the prediction has structure, namely that, in situations represented by the experiment, the outcome will be a certain way. This prediction is central to any experiment (Abelson 1995; Cairns and Cox 2008a) but it should also be noted that it is a causal prediction: when X happens, Y will follow. Both these points are important in how experimental methods are defined and how statistical analysis is conducted.

Another immediate implication of seeing experiments as severe tests is that no single experiment can be enough to test all the implications of a single idea (at least not one with any claim to have more than the narrowest impact). One experiment is only one test. There may be other tests where the idea might fail. Or other refinements of the test that are more severe and therefore better tests. Experiments do not live in isolation but need to form a cluster around a particular idea and although passing tests is important for establishing the robustness of an idea, it is only evidence in support of the idea and never proof that the idea is correct.

As with any branch of philosophy, Mayo’s notion of severe testing is not the last word on what experiments are nor on how they fit with science (Mayo and Spanos, 2010). However, it clearly has a good basis and, more pragmatically, I have found it to be very useful in the context of HCI. Many HCI experiments do not have recourse to theory: the contexts, tasks and devices under consideration are too complex to fall under a simple, well-accepted theory. Instead, researchers tend to argue for why a particular interaction should lead to a particular outcome in certain circumstances and in doing so hope to advance understanding of these sorts of interactions. Perhaps this in time will lead to a theory but perhaps it may only help designers with some solid information. A traditional usability test done in the form of an experiment is the best example of this. The researchers are not interested in a formal theory but simply how something works in some particular system as it will be used in some particular context. The interaction being predicted is being put under a severe test, where it might not play out as hoped, and when it passes the test then there is good evidence that the interaction is as understood and so merits further use or investigation.

What is also interesting for the purposes of this chapter is that the notion of severe testing does explain many features (and problems) of experimental methods in HCI even though the methods used pre-date this philosophy. It is also worth noting that none of the authors mentioned who have previously written about experiments in HCI (including myself) have adopted the notion of experiments as severe tests. Nonetheless, the people who developed, deployed and advocated such experiments seemed to know what was important in a good experiment even if they did not have the philosophy to say why explicitly.

Having positioned experiments philosophically in HCI, each of the three pillars of experimental research is now considered in turn.

  • 34.0.3 Experimental Design

Experimental design is basically the description of what exactly will go on in an experiment. Computer scientists often use the acronym GIGO, garbage in, garbage out, to describe the output from programmes based on poor quality data. So it is with experiments: the data from an experiment cannot be any better than the experimental design that produced it. Fortunately, HCI people are good at understanding the complexity of the design process which combines innovation, craft skill and sound knowledge (Lawson, 1997). Experimental design is no different in that regard from any other sort of design. What is perhaps deceptive to a new researcher is that experimental designs are written up in papers as objective descriptions of a state of affairs and the complex processes that led to that particular experiment are glossed over. In particular, the fact that an experimental design may have been iterated on paper, tested in a pilot or even failed entirely and iterated further may only merit the briefest of mentions, if any.

Given that this craft element of experimental design is rarely seen except in text books (eg Purchase, 2012), it can be hard to perceive the key features that are needed in a good design or the thoughts that need to go into the design process. But these features are important because in the process of moving from a big idea to an experiment, there are many choices that need to be made and, as any designer will tell you, correct choices are not always obvious. At the same time, there are definitely choices that are wrong but this is not obvious when you are new to experimental design.

The starting point of an experiment as a severe test is to set up a situation that tests an idea and that idea must be causal: one thing should influence another. In HCI-style experiments, this is also expressed as seeing the effect of the independent variable on the dependent variable. The independent variable is what is in the experimenters’ control and the experimenter explicitly manipulates it. The dependent variable is the numerical measure of what the outcome of the manipulation should be: the data that the experimenter gathers.

However, holding in mind GIGO, it is vital to ensure that the data coming out of the experiment is of the highest quality possible. In the world of experiments, quality is equated with validity and generally four types of validity are important (Yin, 2003; Harris, 2008):

Construct validity

Internal validity

External validity

Ecological validity

The ordering here is not essential but can be understood as moving from the details of the experiment to the wider world. They of course are all necessary and relate to each other in ensuring the overall quality of an experiment as a severe test. There are also other types of validity and slightly different ways of slicing them up but these four provide a firm foundation.

  • 34.0.3.1 Construct validity

Construct validity is about making sure that you are measuring what you think you are measuring. If this is not the case, despite measuring a dependent variable, the experiment is not testing the idea it is meant to. Accurate, meaningful measurement is therefore at the core of experiments in HCI.

Construct validity may seem trivial at first glance but in fact it is easy to mistake not only what a measure is but also what it means. Take something that is relatively uncontroversial to measure but relevant to HCI: time. Time, for our purposes, is well-defined and easily measured using a stopwatch or even clocks internal to the system being used. So where an experiment is looking at efficiency, the time it takes a person to complete a task may seem relatively uncontroversial: it is the time from being told to start the task to the time at which they stop it. But, thinking about this in the context of particular tasks, even time on task can be hard to specify. For example, suppose you are doing an experiment looking at people’s use of heating controllers, which are notoriously difficult to use (Cox and Young, 2000). The task might be something like setting the heating to come on every weekday at 6am and to go off at 9am. It is clear when people start the task but they may stop the task when they think they have completed it but not when they have actually completed it. For instance they may have set it to come on every day at 6am or only on Mondays, both of which are potentially easier or harder tasks depending on the controller design. To use the measured time as task completion time would therefore be flawed. And even when people have completed all the steps of a task, they may have made a mistake so that the timer comes on at 6pm not 6am. The process of correctly checking the time could add to the completion time, particularly if checking can only be done through backtracking through the process. So an experimenter may choose to only consider those people who completed the task correctly but that could mean throwing away data that has potentially important insights for people who want to design better heating controllers. The people who did complete the task accurately may not be reliable representatives of the wider population which introduces another problem into the measurement. So what is the right measure of time in this context?

Consider further the measurement of time of resumption after interruptions or initiation of an action after moving to a different device. These are interesting questions about the use of interactive devices but it is not simply the case of setting a stopwatch going. The experiment needs to be designed to make it clear what are meaningful points at which to start and stop timing and researchers need to make careful decisions about this, for example, Andersen et al. (2012).

Increasingly, HCI is not just concerned with objective measures like time but subjective measures related to user experience like appreciation of aesthetics, enjoyment of a game or frustration with a website. Though there is some move to use objective measures like eye-tracking (Cox et al., 2006) or physiological measures (Ravaja et al., 2006) these still need to be traced back to the actual experiences of users which can only be done through asking them about their experiences.

In some cases, a simple naïve question, “how immersed were you in that game (1-10)?”, can provide some insights but there is always the risk that participants are not answering the question in front of them but are in fact answering something they can answer like how much they enjoyed the game (Kahneman, 2012) or what they think they are meant to answer (Field and Hole, 2003). Even so, questions of this sort are regularly seen in HCI and should be treated with doubt about their construct validity: how do we know that they really are measuring what they say they are?

Questionnaires are therefore useful tools for standardising and quantifying a wide range of subjective experiences. The idea behind a questionnaire is that people are not able to directly express their experiences accurately: one person’s immersion is another person’s enjoyment. Instead, within people’s thoughts there are internal constructs, also called latent variables, that are common to people’s thinking generally but hard for them to access directly. They are nonetheless believed to be meaningful and moreover can be measured. Such measurements can be compared between people provided they can be meaningfully revealed. Questionnaires aim to ask questions about different aspects of a single internal construct indirectly through the thoughts which people are able to access. By combining the answers to these accessible thoughts, it is possible to assign meaningful numerical values to the (inaccessible) latent variables. For example, while a person may not really understand what is meant (academically) by immersion, they may be able to more reliably report that they felt as if they were removed from their everyday concerns, that they lost track of time and so on (Jennett et al. 2008). A questionnaire consisting of these items is more likely to build up a reliable picture of immersion.
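As a hedged sketch of the arithmetic involved (the item wordings and responses below are invented; real instruments such as published immersion questionnaires have their own items and validation history), several items that each probe an accessible aspect of the construct are combined, here simply averaged, into one score per participant, and the items' internal consistency can be checked with Cronbach's alpha:

```python
import numpy as np

# Rows are participants, columns are 1-5 Likert responses to items that all
# target the same latent construct (e.g. "I lost track of time",
# "I felt removed from my everyday concerns", ...). Data are made up.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

# One construct score per participant: the mean of their item responses.
construct_scores = responses.mean(axis=1)
print("scores per participant:", construct_scores)

# Cronbach's alpha: a standard check that the items hang together.
k = responses.shape[1]
item_vars = responses.var(axis=0, ddof=1).sum()
total_var = responses.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # conventionally above about 0.7 is acceptable
```

Internal consistency alone does not establish construct validity, which is why questionnaire development remains a research challenge in its own right, as the next paragraph discusses.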

Simply having a questionnaire is not enough to guarantee construct validity. Good questionnaire design is its own research challenge (Kline, 2000). It initially needs a great deal of care just to produce a questionnaire that has potential and then even more work to demonstrate that it has relevance in realistic contexts. HCI understandably relies a lot on questionnaires to help get access to subjective experiences but many questionnaires are not as robust as claimed. A questionnaire designed specifically for an experiment suffers the same faults as direct questions: social desirability, lack of consistent meaning and answering a different (easier) question. Moreover, a worse crime is to use an experiment to demonstrate the validity of a new questionnaire: “this experiment shows that gestural interfaces are easier to learn and moreover validates our questionnaire as a measure of ease of learning” is a circular argument showing only that the experiment showed something was different about gestural interfaces. Even where questionnaires have been designed with more care, there is still a big issue in HCI as to whether or not they are still sufficiently well-designed (Cairns, 2013).

Overall then, whether the measure to be used is about objective variables or subjective experiences, every experiment needs a measure of the effect of the experimental manipulation. All such measures have the potential to be flawed so care needs to be taken at the very least to use a measure that is plausible and justifiable.

  • 34.0.3.2 Internal validity

If an experiment is set up to severely test whether influence X really does affect outcome Y then it needs to be clear that any systematic changes in Y (now that we know we really are measuring Y thanks to construct validity) are wholly due to the change in X. This is the issue of internal validity.

The most obvious threat to internal validity comes from confounding variables. These are things other than X that might potentially influence Y, so that when Y changes we cannot be sure that it is only X causing Y to change, and consequently our experiment is not severe. Consider an experiment to test the effect of information architecture of a company’s website on people’s trust in the company. In manipulating information architecture, an enthusiastic researcher might “make improvements” to other aspects of the website such as use of colour or logos. Thus, when it comes to interpreting the results of the measure of trust in the company, any differences in trust between the different versions of the website might be due to the improved aesthetic qualities of the website and not the revised organisation of information on the website. The revised aesthetics is a confounding variable. There may be more subtle effects though that even a careful researcher cannot avoid. The information architecture may be being revised in order to better categorise the services of the company but in doing so the revisions might result in navigation bars being shorter. This makes it quicker for users to make choices and hence improves their sense of progress, resulting in a better user experience all round including their sense of trust in the company. This is not a consequence of improved information architecture but simply shortening of menus – any reasonable shortening of menus might have been equally good!

Confounding variables can creep into an experiment in all sorts of ways that are nothing to do with the experimental manipulation. Some are easier to spot than others, for instance, having all men in one condition of the experiment and all women in another means that the sex of the participant is a confounding variable. A less obvious example is having only Google Chrome users in one condition and Firefox users in the other. This cannot be noticed at all unless the experimenter asks specifically about browser usage. In both cases, say in an experiment about speed of entering text on mobiles, there may be no apparent reason why such differences should be relevant but they cannot be ruled out as the reason for any systematic variation in outcomes. They are therefore potential confounds. Other confounds might be: time of day at which participants take part in the experiment; familiarity with the task, hardware or software being used; how the experimenter greets participants; the rooms in which the experiment is done; and so on. The list is potentially endless.

There are two general approaches to removing confounds. The first is to randomize participants between conditions of the experiment. The second is to experimentally control for potential confounds. In randomization, it is assumed that if people are randomly allocated to experimental conditions there is no potential for a systematic difference between the conditions to appear. Nonetheless, if you suspect there might be potential confounds, such as experience with a particular web browser then asking about this is a good idea, if only to discount it as a confound. Furthermore, it also gives the opportunity to add in such confounds as factors in the statistical analysis and so exercise statistical control over the confounds. In experimental control though, the approach is to remove variation between participants that might have confounding effects: experiments are all conducted in the same room; only men are asked to participate in the study (Nordin et al., 2014); people are screened for technology use before being allowed to participate; and so on. Even with experimental control, it is not possible to remove all possible confounds but only some of the worse ones that the experimenter could think of in advance.
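As a minimal sketch of randomisation in practice (the participant list and the recorded "browser" attribute are invented for the example), participants are shuffled and split between the two conditions, while a suspected confound is still recorded so it can be discounted, or entered into the analysis as a factor, later:

```python
import random

random.seed(42)

# Invented participant pool; in a real study this would be the sign-up list.
participants = [f"P{i:02d}" for i in range(1, 21)]

# Record a suspected confound even though allocation is random, so it can
# be discounted or used for statistical control in the analysis.
browser = {p: random.choice(["Chrome", "Firefox"]) for p in participants}

# Random allocation: shuffle, then split into the two experimental conditions.
random.shuffle(participants)
condition_a = participants[:10]
condition_b = participants[10:]

for name, group in (("A", condition_a), ("B", condition_b)):
    chrome_users = sum(browser[p] == "Chrome" for p in group)
    print(f"condition {name}: {group}")
    print(f"  Chrome users: {chrome_users} of {len(group)}")
```

Experimental control, by contrast, would remove the variation altogether, for example by recruiting only users of one browser, at a cost to how widely the results generalise.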

Aside from confounds, there is a further threat to internal validity which arises from what I call experimental drift. In the process of developing any experiment, there are necessarily decisions that must be made about the participants used, the tasks they do, how the system is set up and so on. All of these decisions must have a degree of pragmatism about them: it is no good having a wonderful idea of how pre-industrial cultures are able to engage with touchscreen interfaces if you are unable to contact anyone from a pre-industrial culture! However, in certain contexts, what starts off as a sensible idea for an experiment is eroded through the practical concerns of devising the experiment. This is particularly prevalent in HCI in studies intended to examine some influence on the user-centred design process. For example, a very good question is whether usability design patterns (Tidwell, 2005) improve the usability of interactive systems. So of course the ideal experiment sets up a situation where the same system is developed by two equally skilled teams but one uses design patterns and one does not. The two designs are then evaluated for usability. There are very few commercial companies that would expend the effort and cost of having two teams develop the same system. So some non-commercial context is preferred. Fortunately, many researchers are in universities so using the resources at hand, students are excellent replacements for commercial design teams and moreover multiple teams could be set up to do the same design task. Such students are most motivated if in fact they do the work as part of a module on HCI and furthermore as the assessment for that module and design patterns may be especially taught to some but not other students to see how they are used by the students. But already it is clear that there is a big move away from professional design teams using patterns to novice designers (students) using a technique that they may have only just learned. Aside from all the confounding variables that might be introduced along the way, even if the experiment gives the desired result, does it really show that when design patterns are used, the usability of the end-product improves? The experiment is at best only obliquely testing the original proposed causal idea.

Internal validity then can only be maintained by a process of vigilance whereby an initial experimental design is iteratively reviewed for its overall coherence but also possible confounds that the design might be introducing.

  • 34.0.3.3 External validity

Experiments naturally reduce from a general question with wide applicability to the specifics of the experiment actually done. The question then becomes to what extent the results of the experiment do have the intended wider applicability. This is the external validity or generalisability of the experiment.

In all experiments, the external validity of an experiment is a matter of judgment. For a typical HCI experiment, certain people were asked to do certain tasks in a particular context. The external validity of the experiment is the extent to which the results of the experiment generalise to other people doing other tasks in other contexts. To illustrate this, consider an experiment on the effect of accelerometer-based controls on the experience of playing digital games on mobile phones. The natural generalisation is from the sample of players to a wider audience of players in general. But what constitutes the wider audience? It depends very much on the people who participated: what was the range of their experience of games; their experience of accelerometer-based controls; of mobile devices; what was the sort of person who took part, be it young, old, men, women, children; and so on. A very large sample has the potential for greater generalisation but even so if that sample was collected from the undergraduate population of a university, which is very common (Sears, 1986) then generalising out to non-undergraduates may be unsound.

The experiment could also generalise to other mobile phones like the ones used in the experiment so if the study used iPhones, Samsung Galaxy phones (of a similar size) might be a reasonable generalisation. But would the results apply to iPads, Kindles, non-phones like PS Vita or Nintendo DS? Similarly, would the results apply to other games as well? Usually such experiments use one or two games, much as all HCI experiments use a small set of tasks for participants to complete. Do the experimental findings therefore apply to all games? Or just games like the ones used? In which case how much like?

What is interesting is that in regard to this generalisation across devices and tasks, we rarely place the same emphasis in HCI as generalising across people. A true experiment that wished to generalise to games should actually sample through the population of games just as we normally sample through the population of people. And then this should be factored into the analysis. But this is almost never done and the only example that I know of is Hassenzahl and Monk (2010) where they aim to overcome previous methodological problems in relating aesthetics and perceived usability by sampling through both participants and products. And this is just one aspect of the task. With things like accelerometer-based interactions, as with many interactive modalities, there are parameters and settings that could be tweaked that may have a marked effect on people’s experiences of the interactions.

The true extent of external validity, therefore, can be very hard to judge. My feeling is that we are perhaps a bit keen when using other people’s work to say that it has wider relevance than might be warranted in order that we can use their work in our own. All experiments are of course intended to be particular instances of a more general phenomenon but it would seem there is room in HCI to consider generalising all three aspects (H, C and I not just H) much more explicitly in the design of experiments.

  • 34.0.3.4 Ecological Validity

External validity is concerned with the extent to which the results of the experiment apply to other contexts but only inasmuch that if similar but not identical experiments were run then they would produce the same results. By contrast, ecological validity is concerned with the extent to which the experimental findings would have relevance in the real world in which people find themselves using interactive systems as part of their daily life.

A simple example of this might be to consider different versions of a map App to see whether people are able to use them to navigate better, let’s say, around the tourist sights of a city. Obviously for the purposes of an experiment, participants would be given a navigation task which might be at a real tourist destination (something that would be easy for me living in the historically rich city of York). For example, we might ask participants to navigate from the train station to the city castle. This seems reasonable but is it really how tourists use map Apps? Perhaps some tourists set off with such a destination in mind and allow themselves to be distracted and attracted elsewhere along the way. Other tourists might simply be happy to wander aimlessly without anything but a vague idea for a destination until such time as they need to get back to their hotel, at which point they draw on the App not only to help them navigate but to tell them where they are. Or perhaps they wouldn’t use the App at all but instead prefer a traditional tourist guidebook that would give lots of information and especially tailored maps all in one handy package!

This relevance of experiments to real use is what ecological validity is all about in HCI and it is often carefully discussed. Many studies do take ecological validity seriously and strive to conduct studies in the most realistic context possible. The most extreme example is the kind of testing exemplified by Google where different users simply see different interfaces. This is not a situation set up to see how people use these interfaces, it is people using these interfaces. It could not be more ecologically valid. However, the consequence of high ecological validity is often the loss of experimental control. In the context of real people using real systems, there can be many other factors that potentially influence the dependent variable. That is, ecologically valid studies have many potential confounds. Google may not need to worry about such things but where effects are small or the aim is to develop underlying theories of interaction, more modest experiments will always need to make a trade-off between internal and ecological validity.

  • 34.0.3.5 Validity as a whole

For all experiments, there is always compromise across all aspects of validity, at the very least ecological, because in order to achieve experimental control, some aspects of the real world need to be constrained. Even then, though, an experiment does not have to be completely unrealistic; for example, Anna Cox (personal communication) has set up a well-devised experiment to look at how people manage email but naturally allowed for an in-the-wild style of study in order to establish better ecological validity.

Depending on the complexity of the idea being tested in the experiment, different compromises need to be made. Internal validity can come at a cost to external validity. Too much experimental control results in the inability to generalise to a wide variety of tasks or even systems. Not enough experimental control and internal validity can be damaged. A general sample of the population is desirable but it can produce a lot of natural variation in an experiment that can mask any systematic differences that represent the goal of the experiment. Some aspects of user experience can simply be very difficult to measure, like culture or trust, so you may have to rely on a weak or, at best, a poorly validated construct in order to make progress in this area.

At the end of the day, all experiments are less than ideal. There are always compromises in validity and a researcher can only be honest and explicit about what those compromises were. And as already noted, one experiment is never enough so perhaps the best path is to acknowledge the compromises and do better (or at least something different) in the next experiment.

  • 34.0.4 Statistical Analysis

Anyone who has a reputation (however modest) for being good at statistics will tell you that, on a more or less regular basis, they are approached with the following sort of request: “I’ve just run this experiment but I can’t work out how to analyse the data.” The situation arises in HCI (and Psychology in my experience) because the learning of the necessary statistical methods is not easy. This makes it time-consuming and effortful and furthermore tangential to the actual work of an HCI researcher – they didn’t undertake HCI research in order to learn a lot of statistics! Whereas a researcher may (relatively) quickly learn to see what might make a sensible experiment, it can take a lot more time to be confident about what would constitute appropriate statistical analysis (Hulsizer & Woolf, 2009).

The problem though is that the statistical analysis is not a ritual “tag-on” to experimental methods (Gigerenzer, 2004). It is essential for an experiment to succeed. In the face of natural variation, as exhibited by people when using interactive devices to achieve a variety of goals, there can be no certainty that the experimental manipulation really did influence the dependent variable in some systematic way. One way to see this is to consider an experiment with two conditions where each participant is scored on a test out of 100. You would be (and ought to be) very surprised if the mean score of 20 participants in each condition came out to be identical: you would suspect a cut-and-paste error at the very least. When the mean scores of a study come out as different between the conditions, this is to be expected. Statistics are needed to help identify when such differences are systematic and meaningfully related to the independent variable and when they are just natural variation that you might expect to see all the time.

Even when statistics can be done and have produced a strong result, there cannot necessarily be any certainty. Natural variation between two different random samples can occasionally be strong enough to look just like a systematic variation. It is just bad luck for the experimenter. A strong statistical result can at best be evidence in support of the experimental aims. Conversely, if an experimenter conducts a lot of tests on different aspects of the data then, by chance, some of them are quite likely to come out as indicating systematic variation. Statistics alone do not provide good evidence.
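The point can be made with a small simulation (purely illustrative, with invented numbers): both "conditions" below are drawn from exactly the same population, so any difference in means is nothing but natural variation, and yet repeating such null comparisons many times still throws up "significant" results at roughly the conventional 5 percent rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two conditions of 20 participants each, scored out of 100, drawn from the
# SAME population: there is genuinely nothing going on.
a = rng.normal(loc=60, scale=15, size=20)
b = rng.normal(loc=60, scale=15, size=20)
print(f"mean A = {a.mean():.1f}, mean B = {b.mean():.1f}  (different, by chance alone)")

# Now run many such null comparisons and count how often p < 0.05.
false_positives = 0
n_tests = 1000
for _ in range(n_tests):
    x = rng.normal(60, 15, 20)
    y = rng.normal(60, 15, 20)
    _, p = stats.ttest_ind(x, y)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} null comparisons were 'significant' "
      f"({100 * false_positives / n_tests:.1f}%, expected around 5%)")
```

This is why a pile of tests run on whatever the data happens to contain is not good evidence, whereas a single planned test of a stated prediction can be.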

The strength of a statistical analysis comes from the notion of an experiment as a severe test. An experiment is set up to severely test a particular idea so that if the idea were incorrect, the experiment would be very likely to reveal so. The statistical analysis should serve to support the severe test by directly addressing the aim of the study and any other analysis does not really support the test (however interesting it might turn out to be).

Consider for example a study to look at the use of gestures to control music players in cars. Gestures might improve safety by not requiring drivers to attend to small screens or small buttons when selecting which track or station to listen to while driving. So an experiment is devised where drivers try out different styles of interaction while their lane-keeping performance is monitored (in simulation!). In the analysis, it is found that there was not a statistically significant difference between interaction styles but based on a good insight, the experimenter wondered if this might be due to differences in the handed-ness of the participants. Sure enough, when the dominant hand of the participants was factored into the analysis, the results were significant and moreover clearer to interpret. Result!

But this experiment is not a severe test of handed-ness differences in gestural interactions while driving because if the experimenter were really interested in such differences then there would have been a planned, not incidental, manipulation in the experiment. The experiment would not have looked like this one. For instance, the sample would deliberately have tried to balance across right- and left-handed people. But once this has been addressed, it may also be relevant as to which side of the road people are used to driving because that determines which hand is expected to be free for gear changes and therefore other hand-gestures. The experimental design ought to factor this in as well. And so the experiment to test this idea is now quite different from the original experiment where handed-ness was incidental.

This explicitly builds on what I have previously called the gold standard statistical argument (Cairns & Cox, 2008). In that account, the aim of the experiment is important because prediction makes unlikely events interesting. Without prediction, unlikely events are just things that happen from time to time. This is the principle behind many magic tricks where very unlikely events, such as a hand of four aces or guessing the card a person has chosen in secret, which might happen by chance occasionally, are done on the first go in front of an audience. An experiment might exhibit the outcome by chance but it carries weight because it happened on the first go when the experimenter said it would. Under severe testing, it is even stronger than this though because the prediction is the foundation of the structure of the experiment. With a different prediction, a different experiment entirely would have been done.

What is interesting is that this has not always been apparent even to experienced researchers (Cohen, 1994). There is a lot of criticism of null hypothesis significance testing (NHST) which is the usual style of testing most people think of when talking about statistics. In NHST, an experiment has an alternative hypothesis which is the causal prediction being investigated in the experiment. It is very hard to “prove” that the alternative hypothesis holds in the face of natural variation so instead, experimenters put forward a null hypothesis that the prediction does not hold and there is no effect. This assumption is then used to calculate how likely it is to get the data from the experiment if there really is nothing going on. This results in the classic p value beloved of statistics. The p value is the probability of getting the data purely by chance, that is if the null hypothesis holds and the prediction, the alternative hypothesis, is wrong. The criticism of this approach is that the null hypothesis and not the alternative hypothesis is used in the statistical calculations and so we learn nothing about the probability of our prediction which is surely what we are really interested in (Cohen 1994).

But this is not logically sound. The fallacy lies in that the whole experiment is devised around the prediction: the null hypothesis is the only place where the counter to the prediction is considered. (There is also a lot of woolly thinking around the meaning of probabilities in the context of experiments which is explicitly related to this but that’s for another time.)

Coming back to the researcher who has devised the experiment but not the statistical analysis, it is clear that they are in a mess for a lot of reasons. First, if they do not know what to analyse, then it suggests that the idea that the experiment can test is not properly articulated. Secondly, it also suggests that perhaps they are looking for a “significant result” which the experiment simply is not in a position to give, often described as fishing. Where the idea is well articulated but the analysis did not give the desired result, the remedy may simply be helping them to see that the “failure” of a well-devised experiment is of value in its own right. It’s not as cool as a significant result but it can be as insightful. Fishing for significance is not the answer.

Given the challenges of setting up an experiment as a severe test and the narrowness of the results it can provide, there is a fall-back position for researchers, namely, that experiments can be exploring the possible influences of many factors on particular interactions or experiential outcomes. In which case, the experiment is not a severe test because there is no particular idea that is being put under duress. Therefore, statistics cannot possibly provide evidence in support of any particular factor influencing the outcome of the experiment. What statistics can indicate is where some results are perhaps more unlikely than might be expected and therefore worthy of further investigation. This may seem like a rather weak “out” for more complex experimental designs but for researchers who say it does not matter, my response would be to ask them what philosophy of experiment they are therefore using. In the face of natural variation and uncertainty, what constitutes evidence is not easy to determine.

  • 34.0.4.1 Safe analysis

Knowing there are pretty dangerous pitfalls in statistical analysis and that statistics is a challenging area to master, I recommend a solution well known to engineers: keep it simple stupid (KISS). According to severe testing, the experiment should put under duress an idea about how the world works to see if the idea is able to predict what will happen. But as was seen in the previous section, an actual experiment must narrow down many things to produce something that is only a test of a single aspect of the idea. No experiment can test the whole idea (Mayo, 1996). So recognising that, it is always better to have several experiments. There is no need to expect one experiment to deliver everything. Furthermore, if each separate experiment offers clear and unambiguous results then this is a lot better than one big experiment that has a complex design and an even more complex analysis. Simpler experiments are actually more severe tests because there is nowhere to hide: something either clearly passes the test or it does not.

So what are the simple designs that lead to simple, safe statistics? Here are some simple rules that I would recommend to any researcher:

No more than two independent variables, ideally one

At most three conditions per independent variable, ideally two

Only one primary dependent variable (though you should measure other stuff to account for alternative explanations, accidental confounds, experimental manipulation etc)

These seem very restrictive but there is good reason for this: the statistical tests for designs like this are very straightforward. There are well established parametric and non-parametric tests that are easy to understand both in terms of applying them and in terms of interpreting them and the ones for two conditions are even easier than the ones for more than two. It may seem unduly restrictive but keep in mind the idea of a severe test: if there are more variables, independent or dependent, what exactly is being severely tested? And if there are lots of conditions, is there a clear prediction about what will happen in each of the conditions? If not, then the experiment is not testing a well-formed prediction.
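As a concrete, purely illustrative sketch of what such a simple analysis looks like in practice, here is a one-independent-variable, two-condition design analysed in Python with scipy; the data, variable names and choice of measure are mine, not from any real study.

```python
# A minimal sketch of the "safe" analysis for the simplest design: one
# independent variable with two conditions (between participants) and one
# primary dependent variable. Data and names are illustrative only.
import numpy as np
from scipy import stats

# Task completion times in seconds for two independent groups of participants.
condition_a = np.array([12.1, 14.3, 11.8, 15.0, 13.2, 12.7, 14.8, 13.5])
condition_b = np.array([16.4, 15.9, 17.2, 14.9, 16.8, 18.1, 15.5, 17.0])

# Parametric route: independent-samples t-test (assumes roughly normal data).
t_stat, p_t = stats.ttest_ind(condition_a, condition_b)

# Non-parametric counterpart: Mann-Whitney U test (no normality assumption).
u_stat, p_u = stats.mannwhitneyu(condition_a, condition_b, alternative="two-sided")

print(f"t-test:       t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p_u:.4f}")
```

Either test gives a single, directly interpretable answer to the single question the design asks, which is exactly the point of keeping the design simple.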

Let’s see how this might work in practice. Suppose you are interested in the best layout of interfaces on touchscreens for use by older people. Even just thinking about buttons, there are various factors that are potentially important: button size, button spacing, text on the buttons, the use of a grid or a staggered layout and so on. If you do not know anything at all about this area, the first thing would be to find out whether any of these factors alone influence the use of touchscreen devices by older people. Take, for instance, button size. The overall goal might be to determine which button size is best. But if you don't even know how big an effect button size can have, then why try out a dozen? Try out the extreme ends, very small and very large, of what might be considered reasonable sizes. And actually, when you think about it, there is not really the opportunity for a huge range of sizes on most tablets or smartphones. If the extreme ends do not make an appreciable difference, then sizes that are more similar are not going to be any better. And if button size does make a difference, only then is it worth seeing how it might interact with button spacing.

The question remains of what is meant by “best” in a layout. Is it speed and accuracy? These two things are generally measured together as there is often a speed/accuracy trade-off (Salthouse, 1979). But even so, in the context of this work, which is more important? It may be accuracy, provided the speed effect is modest. So there's the primary dependent variable, with speed simply offering a possible guard against explanations that have nothing to do with button size.

But of course, as good HCI researchers, we always worry about user experience. Which layout is preferred by users? That's an entirely different issue. It may, of course, be measured alongside speed and accuracy, but if one layout is highly accurate but less preferred, which is best? And if accuracy is irrelevant, then do not measure it in the first place. It's a red herring!

The temptation is to devise a complicated experiment that could look at all of these things at once, but the analysis instantly escalates and also opens up the possibility of over-testing (Cairns, 2007). By contrast, it should be clear that a series of experiments, each targeting different variables, is able to give far less ambiguous answers. Moreover, if several different experiments build up a consistent picture then there is far more evidence for a “best layout”, because one experiment is always open to the possibility of just getting a good result by chance. This is less the case with lots of experiments, and even less so if the experiments are different from each other.

  • 34.0.4.2 Interpreting analysis

Having devised an experiment that is simple to analyse, it is important also not to fall at the final hurdle of interpreting the results of the statistical tests. Under the traditional NHST style of statistical analysis, every test ultimately produces a test statistic and a p value. The p value is the probability of the experiment having given the result it did purely by chance. The threshold of significance is almost always 0.05, so that if the probability of a chance outcome is less than 1 in 20, the experimental result is declared to be significant. There are other important thresholds, as summarised in Table 1. These thresholds are all purely conventional and without specific scientific meaning outside of the convention, but within the convention, if a test produces p < 0.05 then it is deemed to be significant and the experiment has “worked.” That is, the experiment is providing evidence in support of the alternative hypothesis. If p comes out at more than 0.05 then the experiment has not worked and the null hypothesis is deemed the more likely.

But this is not really a fair picture. Because the thresholds are conventionalised, we need to recognise the apparent arbitrariness of the conventions and not make such black and white interpretations. In particular, if p is close to 0.05 but slightly bigger, there is the possibility that the experiment is working as intended but that some issue, such as a small sample size, a source of noise in the data or an insufficiently focused task, means the experiment is not able to give an unambiguous answer. Failure to meet the 0.05 threshold is not an indication that nothing is going on. Indeed, an insignificant result does not show that nothing is going on, only that this experiment did not show what was expected to happen. So where p < 0.1 it is usual to consider the result as approaching significance, or marginally significant, and it should be interpreted, very cautiously, as potentially becoming interesting.

p value             Conventional label                                 Inflexible interpretation
0.1 < p             Not significant                                    Prediction is not true
0.05 < p ≤ 0.1      Marginally significant, approaching significance   Prediction is likely to be true but might not be
0.01 < p ≤ 0.05     Significant                                        Prediction is pretty sure to be true
0.001 < p ≤ 0.01    Significant, highly significant                    Prediction is really true
p ≤ 0.001           Significant, highly significant                    Prediction is really true

Table 1: The conventional thresholds for interpreting p values and their inflexible interpretations
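For completeness, the convention in Table 1 can be written down mechanically; the function below is simply my own encoding of those labels and, as the surrounding discussion stresses, it should not be read as black and white.

```python
def conventional_label(p: float) -> str:
    # The purely conventional labels of Table 1; they carry no scientific
    # meaning outside the convention and should not be applied rigidly.
    if p <= 0.01:
        return "significant (highly significant)"
    if p <= 0.05:
        return "significant"
    if p <= 0.1:
        return "marginally significant / approaching significance"
    return "not significant"
```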

At the other extreme, where p is very small, say 0.001 or less, this is not an indication that the result is any more convincing or important than any other p value below 0.05. Such values can certainly come out by chance (one-in-a-million chances occur a lot more often than you might think, depending on how you frame them) but if an effect was expected in the experiment then such small values ought to appear as well. This is why there is a move, within psychology at least, to also always report effect sizes where possible. Effect sizes are measures of importance because they indicate how much effect the experimental manipulation had on the differences seen in the data. Unlike p values, though, there is no convention for what makes an important effect size; it depends on the type of study being done and what sort of effects are important.

One measure of effect that is used alongside t-tests is Cohen’s d. This statistic basically compares the difference in means between two experimental conditions against the base level of variation seen in participants. This variation is represented by the pooled standard deviation in the data collected but do not worry if you are not sure what that means (yet). A d value of 1 means (roughly) that the experimental conditions have as much effect as 1 standard deviation between participants, that is, the average variation between participants. That’s a really big effect because the experimental manipulation is easily perceived compared to natural variation. By contrast, d=0.1 means that the average variation between participants is a lot larger than the variation caused by the experimental conditions so the effect is not easily seen. What is interesting is that with a large sample it is possible, in fact common, to have p<0.001 but a small d value. This should be interpreted as a systematic difference in conditions but one which is easily swamped by other factors. Unlike p values, the importance of such small effects is heavily context dependent and so requires the experimenter to make an explicit interpretation.
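As a sketch of how Cohen's d is usually calculated, here is the standard pooled-standard-deviation form in Python; the data names refer to the illustrative arrays in the earlier sketch and are not from any real study.

```python
import numpy as np

def cohens_d(group_a, group_b):
    # Cohen's d: the difference in means divided by the pooled standard
    # deviation, i.e. the effect of the manipulation measured in units of
    # the natural variation between participants.
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# For example, reusing the illustrative completion times from above:
# cohens_d(condition_a, condition_b)  # negative d here: condition_a was faster
```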

Thinking this through further, a small sample that gives a significant effect is also more likely to demonstrate a big effect. That is, a smaller sample that gives significance is potentially more convincing than a larger sample! This is counter to much advice on sample sizes in experiments but is a consequence of thinking about experiments as severe tests (Mayo and Spanos, 2010). If an idea is able to predict a robust effect then it ought to be seen in small samples when put under a severe test. What small samples do threaten is generalizability and the risk of confounds due to consistent differences that appear by chance within a small number of participants. Even so, where these differences are not seen, small samples provide more convincing evidence!
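A quick back-of-the-envelope calculation makes the point. Assuming a two-sided independent-samples t-test with equal group sizes (my assumption for the sake of illustration), the observed effect size needed to reach p < 0.05 shrinks as the sample grows, so any significant result from a small sample necessarily comes with a large observed effect.

```python
import math
from scipy import stats

# Smallest observed Cohen's d that can reach p < 0.05 (two-sided) in an
# independent-samples t-test with n participants per condition. For equal
# groups, t = d * sqrt(n / 2), so d = t_critical * sqrt(2 / n).
for n in (8, 20, 50, 200):
    df = 2 * n - 2
    t_crit = stats.t.ppf(0.975, df)
    d_min = t_crit * math.sqrt(2 / n)
    print(f"n = {n:>3} per group: |d| must exceed {d_min:.2f} to be significant")
```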

The real snag with effect sizes is that they are only easily generated for parametric tests, that is, situations where the data follow the classic bell curve of the normal distribution. But many situations in HCI do not produce such data. For example, data on the number of errors that people make typically has what is called a long-tailed distribution, with most participants making very few errors, a smaller number making several errors and one or two people making a lot of errors. Such data can only be robustly analysed using non-parametric tests, and these do not provide easy measures of effect size. There are emerging measures of effect size that can be used in non-parametric situations (Vargha and Delaney, 2000), but they are not yet widely used and therefore not widely understood. Hopefully this is merely a matter of time, and perhaps something that the HCI community, particularly within computer science, could be in a position to lead.
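One such measure is the Vargha-Delaney A statistic, which is, roughly, the probability that a randomly chosen value from one condition is larger than a randomly chosen value from the other. Below is a sketch of a direct (if inefficient) implementation, with made-up error counts of my own for illustration.

```python
def vargha_delaney_a(group_a, group_b):
    # A = P(value from group_a > value from group_b), counting ties as half.
    # A = 0.5 means no effect; values near 0 or 1 indicate large effects.
    greater = ties = 0
    for x in group_a:
        for y in group_b:
            if x > y:
                greater += 1
            elif x == y:
                ties += 1
    return (greater + 0.5 * ties) / (len(group_a) * len(group_b))

# Illustrative long-tailed error counts for two conditions.
errors_a = [0, 0, 1, 0, 2, 1, 0, 7]
errors_b = [1, 2, 3, 1, 0, 4, 2, 9]
print(vargha_delaney_a(errors_a, errors_b))
```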

Overall then, statistical analysis should be planned as part of the experiment and wherever possible simpler designs preferred over more complex ones so that the interpretation is clear. Moreover, where possible, effect sizes offer a meaningful way to assess the importance of a result, particularly when results are very significant or marginally significant.

  • 34.0.5 Experimental Write-up

It should be clear why both experimental design and statistical analysis are important pillars in the development of experiments. It may not be so clear why write-up is sufficiently important to be considered a pillar. Of course, a report in the form of an article or dissertation is essential to communicate the experiment and gain recognition for the work done. Hornbæk (2013) articulates many of the important features of a write-up (and I recommend the reader also read that article alongside this) in terms of enabling the reader to follow the chain of evidence being presented and to value it. To be clear though, people do not write up experiments in order for them to be replicated: a researcher who replicates an experiment may struggle to get the replication published even when replication seems essential (Ritchie et al., 2012). Hornbæk (ibid) suggests that a write-up can help experimental design but I would go further. An experimental write-up is an important pillar in the development of experiments because it forces a commitment to ideas, and from that the experiment, its validity, analysis and meaning all become scrutable.

Contrast this with describing an experiment in an academic conversation such as a supervision. Some of the details may yet be fluid and revised as a result of the conversation. This is the advantage of dialogue, as Socrates knew (Plato, 1973, Phaedrus). It is through talking and engaging with a person that their ideas and thoughts are best revealed. However, a dialogue is ephemeral and, YouTube notwithstanding, is generally only accessible to those present at the time. By writing the experimental description down, the details are usually presented in full (or at least substantially) and made available to others (usually supervisors) to understand and critique. As a consequence, it is possible to critique the experiment for validity in a measured and considered way.

If an experiment is found to be lacking in validity after it has been conducted then there is a problem. The data cannot be relied on to be meaningful. Or put another way, the test was not severe and so contributes little to our understanding. But I would argue that an experimental write-up not only could be written before conducting any experiment but should be, because without a write-up in advance, how can a researcher be sure what the experiment is, let alone whether it is good?

What particularly supports writing up in advance is that all experimental reports have a more or less established structure:

Title and Abstract

Motivation/Literature Review

Experimental Method

Results

Discussion

And within the Experimental Method section, there are also clearly defined subsections:

Hypothesis – the idea being tested

Participants – a summary of who took part

Design – the independent and dependent variables

Materials – what is used in the experiment

Tasks – what participants are asked to do

Procedure – what is done in the experiment

Tasks are not always considered separately but this is a particular issue for HCI, as discussed with reference to External Validity above, because the Tasks may in fact be the subject of study rather than the participants.

The literature review perhaps has a special place in the experimental write-up in that it is not about the experiment itself but rather about what it might mean. A literature review in this sense serves two purposes: first, to motivate the experiment as having some value; secondly, to explain enough to readers that they can understand what the experiment is showing. The motivation for value comes from doing work that is either important or interesting. For instance, an experiment looking to reduce errors could be important because it could save time, money or even lives. Or it might be interesting simply because lots of people are looking at that particular problem, for instance, whether new game controllers lead to better user experiences. In the best cases, the experiment is both important and interesting by solving problems that people in the research community want solving and also by having an impact outside of the research community in ways that other people, communities or society value.

A literature review in itself, though, does not say what an experiment ought to be: there is no single necessary experiment because, like any designed artefact, an experiment must fit the context in which it is designed. Instead, a literature review scopes what a good experiment should look like by defining the gap in knowledge that a suitable experiment could fill. How an experimenter fills that gap is a matter of design. In fact, a literature review is not even necessary for doing a good quality experiment, but without one the risk is that you redo an existing experiment or, worse, do an experiment that no-one else is interested in.

There are many good textbooks on experimental write-ups. My favourite is Harris (2008) but that is purely a matter of taste. Such books give clear guidelines not only as to the structure of an experimental report, in terms of these headings, but also what should be expected under each heading. Rather than rehearse these things here, the goal is to relate the typical experimental write-up of the method section back to the previous two pillars of experimental work as described here.

Construct validity, or “are you measuring what you think you are measuring”, appears primarily in the design when the dependent variable is specified. This must relate to the concepts that are under scrutiny either through being obvious (time is a measure of efficiency), through comparison with other studies or because a case has been made for why it is a valid measure. Both of the latter ought to have been clearly established in the literature review. In particular, as has been mentioned, an experiment that is used to validate a questionnaire and test a different idea at the same time does not make sense. Also, having a valid construct means that the statistical analysis must only focus on that construct. There is no equivocation about what constitutes the severe test when it comes to the statistics.

Internal validity is a concern for confounds and the relevance of the experiment. Confounds typically become visible in the materials, tasks and procedure sections, where problems may arise from either what is used in the experiment or what is done in the experiment. Participants may not be guided correctly, or there may be issues in training, which are usually represented somewhere in the materials or procedure sections. Furthermore, participants themselves may be a source of confounds because some are more experienced than others in a way that is relevant to the experimental tasks. Initially, these sorts of problems may not be apparent, but in writing up these sections, doubts should not only be allowed to creep in but should be actively encouraged.

Internal validity also permeates the statistical analysis. Where an experiment has a clear causal connection to establish, the statistics should all be related to establishing whether or not that connection exists, at least in the data gathered. Other analysis may be interesting but does not contribute to the internal argument of the study nor can it constitute a severe test of a different idea.

External validity is about how well this experiment is representative of other experiments and therefore to what extent the results of this experiment generalise and would be seen in other, similar experiments. Participants and tasks are obvious directions in which to generalise and it is these that need to be clear. It is not necessary to specifically articulate what the expected generalisation is (at least not in the method section) but rather to leave it to readers to make this call for themselves. But at the point of devising an experiment, a researcher should be able to ask themselves to what extent the experiment has a credible generalisation.

Ecological validity is the least easy to specify in a write-up and essentially has to be a judgment call over the entire structure of the experiment. The design and procedure sections basically describe what happens in the experiment. In writing up an experiment, there may be specific references to ecological validity for particular design decisions, particularly where it is being traded against the other forms of validity. In some cases, though, it is not necessary to make any explicit reference to ecological validity but to let it be understood through the experiment as a whole. The reader must decide for themselves to what extent the experiment is sufficiently relevant to real-world interactions between people and systems.

The Discussion section is where a researcher is able to defend their choices and acknowledge the limitations of the experiment done. Usually, the discussion is considered only after the results are in because, naturally, you cannot discuss the results till you know what they are. But this is not quite true. An experiment, strictly speaking, produces only one of two results: there is a significant difference in relation to the idea being tested or there is not. Significance is a great result, but what would the discussion be if the result were not significant? Normally, the discussion in that case looks at the limitations of the experiment and suggests where there may be weaknesses leading to an unexpected null result. Yet this could be written in advance in (pessimistic) anticipation of just such a result. So why not do that first and then make the experiment better? And if the experiment cannot be made better, then this can be thrown out as a challenge to future researchers to see if they can do better because, when all is said and done, every experiment has its limitations.

It may of course be the case that there is no effect to be seen in the experiment despite a cogent argument, presumably made earlier in the report, that there should be one (otherwise why do this experiment?). This highlights, though, a particular weakness of experiments. They are very good at showing when things are causally related but poor at demonstrating the absence of an effect. An absence of effect could be because the effect is hard to see, not because it is actually absent. Weaknesses in experimental design, measurement of variables or variability between participants could all account for failing to see an effect, and there cannot be any certainty. The only situation where a null result may carry more weight is where the effect has previously been strongly established but has failed to re-appear in some new context. Even then, elements of the experiment may be preventing the effect from being seen. Where experiments are close to replications of other studies, though, null effects may start to get interesting, in which case, once again, the discussion around the null result can be written in advance.

Where the results are significant, the experiment is still limited. It is only one test of one aspect of some larger idea. So where are the limitations that lead to the further work? What were the compromises made in order to produce this particular experiment? How might they be mitigated in future? What else could be done to test this same aspect of the idea a different way? And what other aspects ought to be tested too? It is perfectly possible to have two versions of the discussion ready to insert into a report depending on either outcome of the analysis.

There are of course subtleties in any experiment that produce unexpected outcomes. Yes, the result is significant, but the participants did strange things, or were all heavy gamers, or hated iPhones. These need more careful discussion and really can only be written after the data has been gathered. But even accounting for this, most of the experimental write-up can be constructed before running a single participant: the literature review, the experimental method, the discussion and even the skeleton of the results, because the statistical analysis should also be known in advance. Moreover, if the experiment is not convincing to you or a colleague, then no time has been wasted in gathering useless data. Make the experiment have a convincing write-up before doing the study and it is much more likely to be a good experiment.

  • 34.0.5.1 Summary

Each aspect of a write-up is necessary for a reason. The Method section reveals validity. The Discussion section accounts for the compromises made in validity and examines whether they were acceptable. The Results section provides the analysis to support the Discussion. Together they make any experiment scrutable to others but, more usefully, they can be used ahead of doing an experiment to make it scrutable to the experimenter.

A general maxim that I use when writing up is that I need to be so clear that if I am wrong it is obvious. I believe that it is through such transparency and commitment to honesty that science is able to advance (Feynman, 1992). How this plays out in experimental write-ups is that if the experiment is giving a wrong result (albeit one I cannot see), then the diligent reader should be able to see it for me. The intention of any researcher is to do the best experiment possible but even with the best will in the world, this is not always achievable. The write-up is about acknowledging this not only to the reader but also to yourself.

  • 34.0.6 Summing up the experimental pillars

Experiments are an important method in the HCI researcher’s toolkit. They have a certain “look and feel” about them which is easy to identify and also moderately easy to emulate. The problem is that if the formal, apparently ritualistic, structures of an experiment are observed without understanding the purpose of the formalities, there is a real risk of producing an experiment that is unable to provide a useful research contribution, much like Cargo Cult Science (Feynman, 1992).

I have presented here what I think are key pillars in the construction of an effective experiment to reveal the reason for the formalities against the backdrop of experiments as severe tests of ideas. The experimental design is constructed to make the test valid and the write-up makes the validity scrutable. The analysis reveals whether the data supports the idea under test and so is an essential component of the experimental design. Furthermore, the write-up can be used as a constructive tool to allow a researcher to enter a dialogue with themselves and others about the effectiveness of the experiment before it has been carried out.

Of course, not every problem can be solved with an experiment, but where there is a clearly articulated idea about how the world works and how one thing influences another, an experiment can be a way to show this that is both rigorous and defensible. The strength of an experiment, though, comes from three things working together: the experimental design to produce good data; the statistical analysis to produce clear interpretations; and the write-up to present the findings. Without any one of these, an experiment is not able to make a reliable contribution to knowledge. This chapter goes a step further and holds that, by taking each of these seriously before running an experiment, it is possible to produce better, more rigorous and more defensible experiments in Human-Computer Interaction.

  • 34.0.7 References

Abelson, R.J. (1995) Statistics as Principled Argument. Lawrence Erlbaum Assoc.

Andersen, E., O'Rourke, E., Liu, Y-E., Snider, R., Lowdermilk, J., Truong, D., Cooper, S. and Popovic, Z. (2012) The impact of tutorials on games of varying complexity. Proc. of ACM CHI 2012, ACM Press, 59-68.

Blythe, M., Bardzell, J., Bardzell, S. and Blackwell, A. (2008) Critical issues in interaction design. Proc. of BCS HCI 2008 vol. 2, BCS, 183-184.

Blythe, M., Overbeeke, K., Monk, A.F., Wright, P.C. (2003) Funology: from usability to enjoyment. Kluwer Academic Publishers.

Cairns, P. (2007) HCI... not as it should be: inferential statistics in HCI research. Proc. of BCS HCI 2007 vol. 1, BCS, 195-201.

Cairns, P. (2013) A commentary on short questionnaires for assessing usability. Interacting with Computers, 25(4), 327-330.

Cairns, P. and Cox, A.L. (2008a) Using statistics in usability research. In Cairns and Cox (2008b).

Cairns, P., Cox, A.L., eds (2008b) Research Methods for Human-Computer Interaction. Cambridge University Press.

Card, S.K., Moran, T.P. and Newell, A. (1980) The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23(7), 396-410.

Chalmers, A.F. (1999) What is this thing called science? 3rd edn. Open University Press.

Cohen, J. (1994) The earth is round (p < .05). American Psychologist, 49(12), 997-1003.

Cox, A.L., Cairns, P., Berthouze, N. and Jennett, C. (2006) The use of eyetracking for measuring immersion. In: Proceedings of the workshop What have eye movements told us so far, and what is next? at CogSci 2006, the Twenty-Eighth Annual Meeting of the Cognitive Science Society, Vancouver, Canada, July 26-29, 2006.

Cox, A.L. and Young, R.M. (2000) Device-oriented and task-oriented exploratory learning of interactive devices. Proc. of ICCM, 70-77.

Dowell, J., Long, J. (1989) Towards a conception for an engineering discipline of human factors. Ergonomics, 32(11), 1513-1535.

English, W.K., Engelbart, D.C. and Berman, M.L. (1967) Display-selection techniques for text manipulation. IEEE Trans. Hum. Factors Electron., HFE-8, 5-15.

Feyerabend, P.K. (2010) Against Method, 4th edn. Verso.

Feynman, R.P. (1992) Surely you’re joking, Mr Feynman. Vintage.

Field, A., Hole, G. (2003) How to Design and Report Experiments. Sage Publications, London.

Gergle, D. and Tan, D. (2014) Experimental Research in HCI. In Olson, J.S. and Kellogg, W. (eds) Ways of Knowing in HCI. Springer, 191-227.

Gigerenzer, G. (2004) Mindless statistics. The Journal of Socio-Economics, 33(5), 587-606.

Hacking, I. (1983) Representing and intervening. Cambridge University Press.

Harris, P. (2008) Designing and Reporting Experiments in Psychology, 3rd edn. Open University Press.

Hassenzahl, M., Monk, A. (2010) The Inference of Perceived Usability From Beauty. Human–Computer Interaction, 25(3), 235-260.

Hornbæk, K. (2013) Some whys and hows of Experiments in Human-Computer Interaction. Foundations and Trends in Human–Computer Interaction, 5(4), 299-373.

Hornbæk, K. and Law, E. (2007) Meta-analysis of Correlations among Usability Measures. Proc. of ACM CHI 2007, ACM Press, 617-626.

Hulsizer, M.R., Woolf, L.M. (2009) A guide to teaching statistics. Wiley-Blackwell.

Jennett, C., Cox, A.L., Cairns, P., Dhoparee, S., Epps, A., Tijs, T., Walton, A. (2008) Measuring and Defining the Experience of Immersion in Games. International Journal of Human Computer Studies, 66(9), 641-661.

Jordan, P. (2002) Designing Pleasurable Products. CRC Press.

Kahneman, D. (2012) Thinking Fast and Slow. Penguin.

Kline, P. (2000) A Psychometrics Primer. Free Association Books.

Kuhn, T.S. (1996) The Structure of Scientific Revolutions, 3rd edn. University of Chicago Press.

Lawson, B. (1997) How designers think: the process demystified, 3rd edn. Architectural Press.

Lazar, J., Feng, J.H., Hochheiser, H. (2009) Research Methods in Human-Computer Interaction. John Wiley & Sons.

MacKenzie, I.S. (1992) Fitts' law as a research and design tool in human-computer interaction. Human-Computer Interaction, 7(1), 91-139.

MacKenzie, I.S., Buxton, W. (1992) Extending Fitts’ law to 2D tasks. Proc. of ACM CHI 1992, 219-226.

Malone, T.W. (1982) Heuristics for designing enjoyable user interfaces: Lessons from computer games. Proc. of ACM CHI 1982, ACM Press, 63-68.

Mayo, D. (1996) Error and the Growth of Experimental Knowledge. University of Chicago Press.

Mayo, D., Spanos, A., eds (2010) Error and Inference. Cambridge University Press.

McCarthy, J., Wright, P. (2007) Technology as Experience. MIT Press.

Nordin, A.I., Cairns, P., Hudson, M., Alonso, A., Calvillo Gamez, E.H. (2014) The effect of surroundings on gaming experience. In Proc. of the 9th Foundations of Digital Games.

Plato (1954) The Last Days of Socrates. Penguin.

Plato (1973) Phaedrus and Letters VII and VIII. Penguin.

Popper, K. (1977) The logic of scientific discovery. Routledge.

Purchase, H. (2012) Experimental Human-Computer Interaction. Cambridge University Press.

Ravaja, N., Saari, T., Salminen, J., Kallinen, K. (2006) Phasic Emotional Reactions to Video Game Events: A Psychophysiological Investigation. Media Psychology, 8(4), 343-367.

Ritchie, S.J., Wiseman, R., French, C.C. (2012) Failing the Future: Three Unsuccessful Attempts to Replicate Bem's ‘Retroactive Facilitation of Recall’ Effect. PLoS ONE, 7(3): e33423.

Salthouse, T.A. (1979) Adult age and the speed-accuracy trade-off. Ergonomics, 22(7), 811-821.

Sauro, J., Lewis, J.R. (2012) Quantifying the user experience. Morgan Kaufmann.

Sears, D.O. (1986) College sophomores in the laboratory: Influences of a narrow data base on social psychology's view of human nature. Journal of Personality and Social Psychology, 51(3), 515-530.

Sharp, H., Preece, J., Rogers, Y. (2010) Interaction Design, 3rd edn. John Wiley & Sons.

Simons, D.J., Chabris, C.F. (1999) Gorillas in our midst. Perception, 28, 1059-1074.

Smith, J. (2012) Applying Fitts’ Law to Mobile Interface Design. http://webdesign.tutsplus.com/articles/applying-fitts-law-to-mobile-interface-design--webdesign-6919, accessed 28th April 2014.

Suchman, L. (1987) Plans and Situated Actions, 2nd edn. Cambridge University Press.

Thimbleby, H. (2013) Action Graphs and User Performance Analysis. International Journal of Human-Computer Studies, 71(3), 276-302.

Tidwell, J. (2005) Designing Interfaces: Patterns for Effective Interaction Design. O’Reilly.

Tullis, T., Albert, W. (2010) Measuring the User Experience. Morgan Kaufmann.

Vargha, A., Delaney, H.D. (2000) A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101-132.

Wilson, M. (2002) Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625-636.

Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins, J.J. (2011) The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. Proc. of ACM CHI 2011, ACM Press, 143-146.

Yin, R.K. (2003) Case Study Research: Design and Methods, 3rd edn. Sage Publications.


Encyclopedia Britannica


A computer is a machine that can store and process information. Most computers rely on a binary system, which uses two variables, 0 and 1, to complete tasks such as storing data, calculating algorithms, and displaying information. Computers come in many different shapes and sizes, from handheld smartphones to supercomputers weighing more than 300 tons.

Many people throughout history are credited with developing early prototypes that led to the modern computer. During World War II, physicist John Mauchly, engineer J. Presper Eckert, Jr., and their colleagues at the University of Pennsylvania designed the first programmable general-purpose electronic digital computer, the Electronic Numerical Integrator and Computer (ENIAC).

What is the most powerful computer in the world?

As of November 2021 the most powerful computer in the world is the Japanese supercomputer Fugaku, developed by RIKEN and Fujitsu. It has been used to model COVID-19 simulations.

How do programming languages work?

Popular modern programming languages, such as JavaScript and Python, work through multiple forms of programming paradigms. Functional programming, which uses mathematical functions to give outputs based on data input, is one of the more common ways code is used to provide instructions for a computer.

The most powerful computers can perform extremely complex tasks, such as simulating nuclear weapon experiments and predicting the development of climate change. Quantum computers, machines that handle a large number of calculations through quantum parallelism (derived from superposition), would be able to perform even more complex tasks.

A computer’s ability to gain consciousness is a widely debated topic. Some argue that consciousness depends on self-awareness and the ability to think, which means that computers are conscious because they recognize their environment and can process data. Others believe that human consciousness can never be replicated by physical processes.

Computer artificial intelligence’s impact on society is widely debated. Many argue that AI improves the quality of everyday life by doing routine and even complicated tasks better than humans can, making life simpler, safer, and more efficient. Others argue that AI poses dangerous privacy risks, exacerbates racism by standardizing people, and costs workers their jobs, leading to greater unemployment.


computer, device for processing, storing, and displaying information.

Computer once meant a person who did computations, but now the term almost universally refers to automated electronic machinery. The first section of this article focuses on modern digital electronic computers and their design, constituent parts, and applications. The second section covers the history of computing. For details on computer architecture, software, and theory, see computer science.

Computing basics

The first computers were used primarily for numerical calculations. However, as any information can be numerically encoded, people soon realized that computers are capable of general-purpose information processing. Their capacity to handle large amounts of data has extended the range and accuracy of weather forecasting. Their speed has allowed them to make decisions about routing telephone connections through a network and to control mechanical systems such as automobiles, nuclear reactors, and robotic surgical tools. They are also cheap enough to be embedded in everyday appliances and to make clothes dryers and rice cookers “smart.” Computers have allowed us to pose and answer questions that were difficult to pursue in the past. These questions might be about DNA sequences in genes, patterns of activity in a consumer market, or all the uses of a word in texts that have been stored in a database. Increasingly, computers can also learn and adapt as they operate by using processes such as machine learning.


Computers also have limitations, some of which are theoretical. For example, there are undecidable propositions whose truth cannot be determined within a given set of rules, such as the logical structure of a computer. Because no universal algorithmic method can exist to identify such propositions, a computer asked to obtain the truth of such a proposition will (unless forcibly interrupted) continue indefinitely, a condition known as the “halting problem.” (See Turing machine.) Other limitations reflect current technology. For example, although computers have progressed greatly in terms of processing data and using artificial intelligence algorithms, they are limited by their incapacity to think in a more holistic fashion. Computers may imitate humans, quite effectively even, but imitation may not replace the human element in social interaction. Ethical concerns also limit computers, because computers rely on data, rather than a moral compass or human conscience, to make decisions.

Analog computers use continuous physical magnitudes to represent quantitative information. At first they represented quantities with mechanical components (see differential analyzer and integrator), but after World War II voltages were used; by the 1960s digital computers had largely replaced them. Nonetheless, analog computers, and some hybrid digital-analog systems, continued in use through the 1960s in tasks such as aircraft and spaceflight simulation.


One advantage of analog computation is that it may be relatively simple to design and build an analog computer to solve a single problem. Another advantage is that analog computers can frequently represent and solve a problem in “real time”; that is, the computation proceeds at the same rate as the system being modeled by it. Their main disadvantages are that analog representations are limited in precision—typically a few decimal places but fewer in complex mechanisms—and general-purpose devices are expensive and not easily programmed.

Digital computers

In contrast to analog computers, digital computers represent information in discrete form, generally as sequences of 0s and 1s (binary digits, or bits). The modern era of digital computers began in the late 1930s and early 1940s in the United States, Britain, and Germany. The first devices used switches operated by electromagnets (relays). Their programs were stored on punched paper tape or cards, and they had limited internal data storage. For historical developments, see the section Invention of the modern computer.

During the 1950s and ’60s, Unisys (maker of the UNIVAC computer), International Business Machines Corporation (IBM), and other companies made large, expensive computers of increasing power. They were used by major corporations and government research laboratories, typically as the sole computer in the organization. In 1959 the IBM 1401 computer rented for $8,000 per month (early IBM machines were almost always leased rather than sold), and in 1964 the largest IBM S/360 computer cost several million dollars.

These computers came to be called mainframes, though the term did not become common until smaller computers were built. Mainframe computers were characterized by having (for their time) large storage capabilities, fast components, and powerful computational abilities. They were highly reliable, and, because they frequently served vital needs in an organization, they were sometimes designed with redundant components that let them survive partial failures. Because they were complex systems, they were operated by a staff of systems programmers, who alone had access to the computer. Other users submitted “batch jobs” to be run one at a time on the mainframe.

Such systems remain important today, though they are no longer the sole, or even primary, central computing resource of an organization, which will typically have hundreds or thousands of personal computers (PCs). Mainframes now provide high-capacity data storage for Internet servers, or, through time-sharing techniques, they allow hundreds or thousands of users to run programs simultaneously. Because of their current roles, these computers are now called servers rather than mainframes.

What Is a Controlled Experiment?

Definition and Example


A controlled experiment is one in which everything is held constant except for one variable. Usually, a set of data is taken to be a control group, which is commonly the normal or usual state, and one or more other groups are examined where all conditions are identical to the control group and to each other except for one variable.

Sometimes it's necessary to change more than one variable, but all of the other experimental conditions will be controlled so that only the variables being examined change, and what is measured is how much those variables change or the way in which they change.

Controlled Experiment

  • A controlled experiment is simply an experiment in which all factors are held constant except for one: the independent variable.
  • A common type of controlled experiment compares a control group against an experimental group. All variables are identical between the two groups except for the factor being tested.
  • The advantage of a controlled experiment is that it is easier to eliminate uncertainty about the significance of the results.

Example of a Controlled Experiment

Let's say you want to know if the type of soil affects how long it takes a seed to germinate, and you decide to set up a controlled experiment to answer the question. You might take five identical pots, fill each with a different type of soil, plant identical bean seeds in each pot, place the pots in a sunny window, water them equally, and measure how long it takes for the seeds in each pot to sprout.

This is a controlled experiment because your goal is to keep every variable constant except the type of soil you use. You control these features.

Why Controlled Experiments Are Important

The big advantage of a controlled experiment is that you can eliminate much of the uncertainty about your results. If you couldn't control each variable, you might end up with a confusing outcome.

For example, if you planted different types of seeds in each of the pots, trying to determine if soil type affected germination, you might find some types of seeds germinate faster than others. You wouldn't be able to say, with any degree of certainty, that the rate of germination was due to the type of soil. It could just as well have been due to the type of seeds.

Or, if you had placed some pots in a sunny window and some in the shade or watered some pots more than others, you could get mixed results. The value of a controlled experiment is that it yields a high degree of confidence in the outcome. You know which variable caused or did not cause a change.

Are All Experiments Controlled?

No, they are not. It's still possible to obtain useful data from uncontrolled experiments, but it's harder to draw conclusions based on the data.

An example of an area where controlled experiments are difficult is human testing. Say you want to know if a new diet pill helps with weight loss. You can collect a sample of people, give each of them the pill, and measure their weight. You can try to control as many variables as possible, such as how much exercise they get or how many calories they eat.

However, you will have several uncontrolled variables, which may include age, gender, genetic predisposition toward a high or low metabolism, how overweight they were before starting the test, whether they inadvertently eat something that interacts with the drug, etc.

Scientists try to record as much data as possible when conducting uncontrolled experiments, so they can see additional factors that may be affecting their results. Although it is harder to draw conclusions from uncontrolled experiments, new patterns often emerge that would not have been observable in a controlled experiment.

For example, you may notice the diet drug seems to work for female subjects, but not for male subjects, and this may lead to further experimentation and a possible breakthrough. If you had only been able to perform a controlled experiment, perhaps on male clones alone, you would have missed this connection.



    Consequently, powerful computer simulations are as true as real models of reality. They are always conceptual representations of an undergrounded level of physical events. The only true experiment should experiment with all related research properties that we wish to study and that are present in the universe, all together and at the same time.

  17. What Is a Controlled Experiment?

    Random assignment is a hallmark of a "true experiment"—it differentiates true experiments from quasi-experiments. Example: Random assignment To divide your sample into groups, you assign a unique number to each participant. You use a computer program to randomly place each number into either a control group or an experimental group.

  18. Computer Simulations: Definition, Examples, Uses

    Computer simulation is a step-by-step process in which a computer simulation program is modeled after a real-world system (a system can be a car, a building or even a tumor). In order to replicate the system and possible outcomes, the simulation uses mathematical equations to create an algorithm that defines the system's state, or the ...

  19. Computer Science Science Experiments (64 results)

    Fun science experiments to explore everything from kitchen chemistry to DIY mini drones. Easy to set up and perfect for home or school. Browse the collection and see what you want to try first! From cell phones to social media, computer science is a part of your daily life. Everything from traffic lights to medical devices requires both ...

  20. Experiment

    An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. Experiments vary greatly in goal and scale but always rely on repeatable procedure and logical analysis of the results.

  21. Experimental Methods in Human-Computer Interaction

    The other thing to note in this definition of severe testing is that, in setting up the experiment, there is a prediction that is being tested and the prediction has structure, namely that, in situations represented by the experiment, the outcome will be a certain way. ... Some whys and hows of Experiments in Human-Computer Interaction ...

  22. Computer

    computer, device for processing, storing, and displaying information. Computer once meant a person who did computations, but now the term almost universally refers to automated electronic machinery. The first section of this article focuses on modern digital electronic computers and their design, constituent parts, and applications.

  23. What Is a Controlled Experiment?

    Controlled Experiment. A controlled experiment is simply an experiment in which all factors are held constant except for one: the independent variable. A common type of controlled experiment compares a control group against an experimental group. All variables are identical between the two groups except for the factor being tested.