What Is Ecological Validity? | Definition & Examples

Published on September 9, 2022 by Kassiani Nikolopoulou. Revised on June 22, 2023.

Ecological validity measures how generalizable experimental findings are to the real world, such as situations or settings typical of everyday life. It is a subtype of external validity.

If a test has high ecological validity, it can be generalized to other real-life situations, while tests with low ecological validity cannot.

Suppose, for example, that you want to know how people respond to a video that is normally watched on a plane, but you have participants watch it at home. Using this approach, your findings would have low ecological validity: the experience of watching the video at home is vastly different from watching it on a plane.

Ecological validity is often applied in experimental studies of human behavior and cognition, such as in psychology and related fields.

Table of contents

  • What is ecological validity?
  • Assessing ecological validity
  • Ecological validity vs. external validity
  • Examples of ecological validity
  • Limitations of ecological validity
  • Other interesting articles
  • Frequently asked questions

What is ecological validity?

Ecological validity assesses the validity of a study’s findings based on the environment or setting in which the study took place. If you have reason to suspect that the study’s environment may have influenced the generalizability of its results or led to research bias, the study’s ecological validity may be questioned.


Assessing ecological validity

To assess the ecological validity of a study, you must critically examine the setting where it took place. It’s not as cut-and-dried as “the experiment took place in a lab, therefore it lacks ecological validity.” Rather, it’s more about pointing out what can prevent results from one environment or setting from being successfully applied to another.

The following questions can help you assess ecological validity:

  • What environment is the study taking place in?
  • To what other environment(s) are you trying to apply these conclusions?
  • How are these two different, or similar?

It’s important to keep in mind that research studies conducted in a lab setting don’t necessarily lack ecological validity. And generalizability does not depend on ecological validity alone—you need to consider other factors, too, such as population validity.

Ecological validity vs. external validity

External validity examines whether study findings can be generalized beyond the sample. In other words, it analyzes whether you can apply what you’ve found in your study to other populations, situations, or variables.

Ecological validity, on the other hand, examines specifically whether study findings can be generalized to real-life settings. It is thus a subtype of external validity.

To assess external validity in a clinical study, for example, you would ask whether the findings can be generalized to patients whose characteristics differ in some way from those of the study sample. This could mean patients who are treated differently, or patients who have longer-term follow-ups.

Examples of ecological validity

Measuring ecological validity shows you the degree to which results obtained from research or experiments are representative of conditions in the real world. Here are a few examples.

Suppose you survey people on a university campus, asking whether they would rather:

  • Walk in the rain to get to their destination, an auditorium in an adjacent building
  • Walk inside, but for a much greater distance, to get to the same auditorium

Your findings show that 83% of the participants would choose to walk the shorter distance in the rain.

You can argue that your study findings can be generalized to the real world for two reasons:

  • The setting in which your test took place is an everyday setting, a campus.
  • The dilemma you presented participants with can easily occur in everyday life.

Now, you can feel confident generalizing about what most people in a similar situation would do. In this case, they would likely prefer to walk through the rain in order to reach their destination faster, rather than taking a longer route and staying dry.

When results obtained from research or (controlled) experiments are not representative of conditions in the real world, the study findings are characterized by low ecological validity.

Suppose, for example, that you first observe a group of players gambling in a lab setting, recording their verbal outbursts and the monetary risks they take. After a few weeks, you observe the same players in a natural setting. Your results show that the players have more or less the same number of verbal outbursts in both settings. However, they are more willing to take monetary risks in the lab. Given that the players were aware that they were participating in an experiment, and there was no real money involved, this is not a surprising outcome.

Limitations of ecological validity

Ecological validity has a few limitations to be aware of.

Laboratory environments

Often, research studies in fields like psychology are conducted in laboratories, with the goal of better understanding human behavior. Ideally, an experiment like this will produce generalizable results—meaning that it predicts behavior outside the laboratory. If so, the study shows evidence of ecological validity.

However, laboratories are controlled environments. Distractions are minimized so that study participants can focus on the task at hand, clear instructions are provided, and researchers ensure that equipment works. Additionally, lab experiments risk having demand characteristics, or cues that point to the study’s objectives. These cues may lead participants to alter their behavior.

As these are all conditions that are usually not present in real life, they may compromise the study’s ecological validity.

Lack of standard measurements

There is no consensus about a standard definition of ecological validity; in fact, multiple definitions exist. As a result, there are no agreed-upon standards for measuring ecological validity. This leads some researchers to question the usefulness of ecological validity, arguing that being specific about what behavior or context you are testing is sufficient to mitigate research bias.

Before addressing ecological validity in your dissertation or research paper, it is important to find out how your teacher, department, or field of study defines it.

Tradeoff with internal validity

As mentioned above, controlled laboratory environments are not always a good fit for high ecological validity. However, controlled environments are better for establishing the cause-and-effect relationships needed for high internal validity, where it’s ideal for circumstances to be as identical as possible.

This can lead to a tradeoff between the almost-unnatural setting needed to establish internal validity and the approximation of real life needed for ecological validity. While a natural environment yields high ecological validity, it comes with the risk of more external factors influencing the relationship between different types of variables, leading to low internal validity.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Frequently asked questions

The purpose of theory-testing mode is to find evidence in order to disprove, refine, or support a theory. As such, generalizability is not the aim of theory-testing mode.

Due to this, the priority of researchers in theory-testing mode is to eliminate alternative causes for relationships between variables. In other words, they prioritize internal validity over external validity, including ecological validity.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).
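
To make the distinction concrete, here is a minimal simulation sketch in Python. The construct, noise level, and sample size are arbitrary illustrative assumptions; the point is only that reliability is agreement between repeated measurements, while validity is agreement with the thing being measured.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical construct we want to measure (e.g., an ability score).
true_score = rng.normal(100, 15, size=500)

# Two administrations of the same imperfect test (assumed noise level).
test_1 = true_score + rng.normal(0, 5, size=500)
test_2 = true_score + rng.normal(0, 5, size=500)

# Reliability ~ consistency: agreement between repeated measurements.
reliability = np.corrcoef(test_1, test_2)[0, 1]

# Validity ~ accuracy: agreement with the construct itself.
validity = np.corrcoef(test_1, true_score)[0, 1]

print(f"Test-retest reliability: {reliability:.2f}")
print(f"Validity against the construct: {validity:.2f}")
```

A measure can score high on reliability while validity stays low (for example, if the test consistently tracked something other than the intended construct), which is why the two are assessed separately.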

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables.

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design.



Conceptual Analysis

The ‘Real-World Approach’ and Its Problems: A Critique of the Term Ecological Validity

Gijs A. Holleman

  • 1 Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands
  • 2 Department of Developmental Psychology, Utrecht University, Utrecht, Netherlands
  • 3 Brain Center, University Medical Center Utrecht, Utrecht, Netherlands

A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have typically conducted their research in experimental research settings, a.k.a. the ‘psychologist’s laboratory.’ Critics have often questioned whether psychology’s laboratory experiments permit generalizable results. This is known as the ‘real-world or the lab’-dilemma. To bridge the gap between lab and life, many researchers have called for experiments with more ‘ecological validity’ to ensure that experiments more closely resemble and generalize to the ‘real-world.’ However, researchers seldom explain what they mean by this term, nor how more ecological validity should be achieved. In our opinion, the popular concept of ecological validity is ill-formed, lacks specificity, and falls short of addressing the problem of generalizability. To move beyond the ‘real-world or the lab’-dilemma, we believe that researchers in psychological science should always specify the particular context of cognitive and behavioral functioning in which they are interested, instead of advocating that experiments should be more ‘ecologically valid’ in order to generalize to the ‘real-world.’ We believe this will be a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior.

Introduction

A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have traditionally conducted experiments in specialized research settings, a.k.a. the ‘psychologist’s laboratory’ ( Danziger, 1994 ; Hatfield, 2002 ). Over the course of psychology’s history, critics have often questioned whether psychology’s lab-based experiments permit the generalization of results beyond the laboratory settings within which these results are typically obtained. In response, many researchers have advocated for more ‘ecologically valid’ experiments, as opposed to the so-called ‘conventional’ laboratory methods ( Neisser, 1976 ; Aanstoos, 1991 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). In recent years, several technological advances (e.g., virtual reality, wearable eye trackers, mobile EEG devices, fNIRS, biosensors, etc.) have further galvanized researchers to emphasize the importance of studying human cognition and behavior in the ‘real-world,’ as new technologies will aid researchers in overcoming some of the inherent limitations of laboratory experiments ( Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ).

In this article, we will argue that the general aspiration of researchers to understand human cognition and behavior in the ‘real-world’ by conducting experiments that are more ‘ecologically valid’ (henceforth referred to as the ‘real-world approach’) is not without its problems. Most notably, we will argue that the popular term ‘ecological validity,’ which is widely used nowadays by researchers to discuss whether experimental research resembles and generalizes to the ‘real-world,’ is shrouded in both conceptual and methodological confusion. As we ourselves are interested in cognitive and behavioral functioning in the context of people’s everyday experience, and conduct experiments across various ‘laboratory’ and ‘real-world’ environments, we have seen how the uncritical use of the term ‘ecological validity’ can lead to rather misleading and counterproductive discussions. This not only holds for how this concept is used in many scholarly articles and textbooks, but also in presentations and discussions of experimental research at conferences, during the review process, and when talking with students about experimental design and the analysis of evidence.

Although the usage of the term ecological validity has previously been criticized by several scholars ( Hammond, 1998 ; Schmuckler, 2001 ; cf. Araujo et al., 2007 ; Dunlosky et al., 2009 ), we think that these critiques have largely been overlooked. Therefore, it will be necessary to cover some of the same ground. The contribution of this article is threefold. First, we extend the critique of the term ecological validity and apply it to the field of social attention. Second, we scrutinize some of the assumptions that guide the contemporary framework of ecological validity, specifically those regarding artificiality–naturality and simplicity–complexity. Finally, our article is meant to educate a new generation of students and researchers on the historical roots and conceptual issues of the term ecological validity. This article consists of four parts. First, we will provide a brief history of the so-called ‘real-world or the lab’-dilemma and discuss several definitions and interpretations of the term ecological validity. Second, we will go into the historical roots of the concept of ecological validity and describe how the original meaning of this concept has transformed significantly. Third, we will scrutinize the prevailing assumptions that seems to guide how researchers are currently using the term ecological validity. Finally, we will apply our conceptual analysis to a specific field of study, namely the field of social attention. In recent years, this field has been particularly concerned with issues of ecological validity and generalizability. Therefore, the field of social attention offers an exemplary case to explain how the uncritical use of the terms ‘ecological validity’ and the ‘real-world’ may lead to misleading and counterproductive conclusions.

A Brief History of the ‘Real-World or the Lab’-Dilemma

The popular story of psychology (or the broader ‘cognitive sciences’) has it that “psychology became a science by rising from the ‘armchair’ of speculation and uncontrolled observation, and entering the laboratory to undertake controlled observation and measurement” (Hatfield, 2002, p. 208). The ‘psychologist’s laboratory’, a special room furnished with all kinds of lab paraphernalia and sophisticated equipment, has been regarded as the celebrated vehicle of psychology’s journey into sciencehood (Danziger, 1994; Goodwin, 2015). However, despite psychologists’ long tradition of laboratory experimentation (for a history and discussion, see Gillis and Schneider, 1966), there have also been many critical voices saying that psychology’s laboratory experiments are too limited in scope to study how people function in daily life. For example, Brunswik (1943, p. 262) once wrote that experimental psychology was limited to “narrow-spanning problems of artificially isolated proximal or peripheral technicalities of mediation which are not representative of the larger patterns of life”. Barker (1968, p. 3) wrote that “it is impossible to create in the laboratory the frequency, duration, scope and magnitude of some important human conditions.” Neisser (1976, p. 34) wrote that “contemporary studies of cognitive processes usually use stimulus material that is abstract, discontinuous, and only marginally real.” Bronfenbrenner (1977, p. 513) wrote that “many of these experiments involve situations that are unfamiliar, artificial, and short-lived and that call for unusual behaviors that are difficult to generalize to other settings.” Kingstone et al. (2008, p. 355) declared that “the research performed in labs, and the findings they generate, are in principle and in practice unlikely to be of relevance to the more complex situations that people experience in everyday life,” and Shamay-Tsoory and Mendelsohn (2019, p. 1) stated that “conventional experimental psychological approaches have mainly focused on investigating behavior of individuals as isolated agents situated in artificial, sensory, and socially deprived environments, limiting our understanding of naturalistic cognitive, emotional, and social phenomena.”

According to these scholars, psychological science is faced with a gloomy predicament: findings and results based on highly controlled and systematically designed laboratory experiments may not be a great discovery but only a “mere laboratory curiosity” ( Gibson, 1970 , pp. 426–427). As Anderson et al. (1999 , p. 3) put it: “A common truism has been that … laboratory studies are good at telling whether or not some manipulation of an independent variable causes changes in the dependent variable, but many scholars assume that these results do not generalize to the “real-world.” The general concern is that, due to the ‘artificiality’ and ‘simplicity’ of the laboratory, some (if not many) lab-based experiments do not adequately represent the ‘naturality’ and ‘complexity’ of psychological phenomena in everyday life (see Figure 1 ). This problem has become familiar to psychologists as the ‘real-world or the lab’-dilemma ( Hammond and Stewart, 2001 ). At the heart of psychology’s ‘real-world or the lab’-dilemma lies a pernicious methodological choice: “Should it [psychological science] pursue the goal of generality by demanding that research be generalizable to “real life” (aka the “real-world”), or should it pursue generalizability by holding onto its traditional laboratory research paradigm?” ( Hammond and Stewart, 2001 , p. 7).


Figure 1. Examples of historical and contemporary laboratory rooms and field experiments. (A) A laboratory room from the early 20th century. A participant is seated in front of a ‘disc tachistoscope,’ an apparatus to display visual images (adapted from Hilton, 1920). (B) A picture of a field experiment by J. J. Gibson. Observers had to judge the size of an object in the distance (adapted from Gibson, 1950). (C) A 21st century eye tracking laboratory. A participant is seated in front of an SMI Hi-Speed tower-mounted eye tracker (based on Valtakari et al., 2019). (D) A wearable eye-tracker (barely visible) is used to measure gaze behavior while participants walked through corridors with human crowds (Hessels et al., 2020). Copyright statement – Panels (A,B). All photographs are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. The photograph from W. Hilton’s book: Applied Psychology: Driving Power of Thought (Original date of publication, 1920). Retrieved April 1, 2020, from http://www.gutenberg.org/files/33076/33076-h/33076-h.htm. The photograph from J. J. Gibson’s book: The Perception of the Visual World (Original date of publication, 1950, Figure 74, p. 184) was retrieved from a copy of the Utrecht University library. (C,D) Photographs are owned by the authors and the people depicted in the images gave consent for publication.

Although psychological science is comprised of many specialized research areas, the goal to understand human cognition and behavior in the ‘real-world’ has become a critically acclaimed goal for psychologists and cognitive scientists of all stripes. Indeed, examples of the ‘real-world or the lab’-dilemma can be found not only in various ‘applied’ fields of psychology, such as ergonomics ( Hoc, 2001 ), clinical (neuro)psychology ( Wilson, 1993 ; Parsons, 2015 ), educational psychology ( Dunlosky et al., 2009 ), sport psychology ( Davids, 1988 ), marketing and consumer psychology ( Smith et al., 1998 ), and the psychology of driving ( Rogers et al., 2005 ), but also in the so-called ‘basic’ fields of psychological science, such as the study of perception ( Brunswik, 1956 ; Gibson, 1979/2014 ), attention ( Simons and Levin, 1998 ; Peelen and Kastner, 2014 ), memory ( Banaji and Crowder, 1989 ; Neisser, 1991 ; Cohen and Conway, 2007 ), social cognition ( Schilbach et al., 2013 ; Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ), judgment-and-decision making ( Koehler, 1996 ), and child development ( Lewkowicz, 2001 ; Schmuckler, 2001 ; Adolph, 2019 ).

The ‘Real-World Approach’: A Call for Ecological Validity

In the past decades, researchers have often discussed how they may overcome some of the limitations of laboratory-based experiments. Perhaps the largest common denominator of what we call the ‘real-world approach’ is a strong emphasis on ‘ecological validity.’ Over the past decades, the term ecological validity has made its appearance whenever researchers became concerned with the potential limitations of laboratory experiments (see e.g., Jenkins, 1974 ; Neisser, 1976 ; Banaji and Crowder, 1989 ; Aanstoos, 1991 ; Koehler, 1996 ; Smilek et al., 2006 ; Risko et al., 2012 ; Schilbach, 2015 ; Caruana et al., 2017 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). As Neisser (1976 , p. 33) famously put it:

“The concept of ecological validity has become familiar to psychologists. It reminds them that the artificial situation created for an experiment may differ from the everyday world in crucial ways. When this is so, the results may be irrelevant to the phenomena that one would really like to explain.”

The main problem, according to Neisser and many others, is that experiments in psychological science are generally “lacking in ecological validity” ( Neisser, 1976 , p. 7; Smilek et al., 2006 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). Aanstoos (1991 , p. 77) even referred to this problem as the “ecological validity crisis.” To counter this problem, many researchers have called for studies with ‘more’ or ‘greater’ ecological validity. For example, Koehler (1996 , p. 1) advocated for a “more ecologically valid research program,” Schilbach (2015 , p. 130) argued for “the inclusion of more ecologically valid conditions,” and Smilek et al. (2006 , p. 104) suggested that “in order for results to generalize to real-world scenarios we need to use tasks with greater ecological validity.” Clearly, ecological validity is regarded as an important feature of experimental research by researchers who pursue the ‘real-world approach.’ However, in our opinion, and we are not alone in this regard (see also Hammond, 1998 ; Araujo et al., 2007 ; Dunlosky et al., 2009 ), this notion of ecological validity has caused considerable confusion. To foreshadow some of our criticism of ecological validity, we will show that this concept has largely been detached from its original parentage (cf. Brunswik, 1949 ), and is now host to different interpretations guided by questionable assumptions (for a history, see Hammond, 1998 ). Worst of all, the concept is often wielded as a blunt weapon to criticize and dismiss experiments, even though researchers seldom make explicit what definition of ecological validity they use or by which set of criteria they have evaluated a study’s ecological validity (as previously pointed out by Hammond, 1998 ; Schmuckler, 2001 ; Dunlosky et al., 2009 ).

The Big Umbrella of Ecological Validity

In past decades, the concept of ecological validity has been related to various facets of psychological research, for example, the ecological validity of stimuli ( Neisser, 1976 ; Risko et al., 2012 ; Jack and Schyns, 2017 ), the ecological validity of tasks ( Smilek et al., 2006 ; Krakauer et al., 2017 ), the ecological validity of conditions ( Schilbach, 2015 ; Blanco-Elorrieta and Pylkkänen, 2018 ), the ecological validity of research settings ( Bronfenbrenner, 1977 ; Schmuckler, 2001 ), the ecological validity of results ( Eaton and Clore, 1975 ; Greenwald, 1976 ; Silverstein and Stang, 1976 ), the ecological validity of theories ( Neisser, 1976 ), the ecological validity of research designs ( Rogers et al., 2005 ), the ecological validity of methods ( Banaji and Crowder, 1989 ), the ecological validity of phenomena ( Johnston et al., 2014 ), the ecological validity of data ( Aspland and Gardner, 2003 ), and the ecological validity of paradigms ( Macdonald and Tatler, 2013 ; Schilbach et al., 2013 ). However, despite the popular usage of this term, specific definitions and requirements of ecological validity are not always clear.

A closer look at the literature suggests that different definitions and interpretations are used by researchers. Let’s consider some examples of the literature where researchers have been more explicit in their definitions of ecological validity. For example, Ashcraft and Radvansky (2009 , p. 511) defined ecological validity as: “The hotly debated principle that research must resemble the situations and task demands that are characteristic of the real-world rather than rely on artificial laboratory settings and tasks so that results will generalize to the real-world, that is, will have ecological validity.” Another influential definition of ecological validity was given by Bronfenbrenner (1977) , who defined ecological validity as “the extent to which the environment experienced by the subjects in a scientific investigation has the properties it is supposed or assumed to have by the investigator” (p. 516). In Bronfenbrenner’s view, a study’s ecological validity should not be predicated on the extent to which the research context resembles or is carried out in a ‘real-life’ environment. Instead, theoretical considerations should guide one’s methodological decisions on what type of research context is most appropriate given one’s focus of inquiry. For example, if one is interested in the behavioral responses of children when they are placed in a ‘strange situation’ then a laboratory room may be adequately suited for that particular research goal. However, if one is interested in how children behave within their home environment, then a laboratory room may not be the most suitable research context. As Bronfenbrenner (1977 , p. 516) remarked: “Specifically, so far as young children are concerned, the results indicate that the strangeness of the laboratory situation tends to increase anxiety and other negative feeling states and to decrease manifestations of social competence.”

Ecological validity has also been used interchangeably with (or regarded as a necessary component of) ‘external validity’ ( Berkowitz and Donnerstein, 1982 ; Mook, 1983 ; Hoc, 2001 ). The concept of external validity typically refers to whether a given study result or conclusion, usually obtained under one set of conditions and with one group of participants, can also be generalized to other people, tasks, and situations ( Campbell, 1957 ). For example, in the literature on neuropsychological assessment and rehabilitation, ecological validity has primarily been conceptualized as “ … the degree to which clinical tests of cognitive functioning predict functional impairment” ( Higginson et al., 2000 , p. 185). In this field, there has been much discussion about whether the neuropsychological tests used by clinicians accurately predict cognitive and behavioral impairments in everyday life ( Heinrichs, 1990 ; Wilson, 1993 ). One major concern is that the test materials are either too abstract or too general to adequately represent the kind of problems that people with cognitive and neurological impairments encounter in their daily routines, for example, while cooking or buying food at the supermarket. In response, various efforts have been made to increase the ecological validity of neuropsychological tests, for example, by developing performance measures with relevance for everyday tasks and activities ( Shallice and Burgess, 1991 ; Alderman et al., 2003 ), by combining and correlating tests results with behavioral observations and self-reports ( Wilson, 1993 ; Higginson et al., 2000 ), and by using Virtual Reality (VR) applications to create test situations in which a patient’s cognitive and functional impairments are likely to be expressed ( Parsons, 2015 ; Parsons et al., 2017 ).

The Historical Roots of Ecological Validity

As we have seen, definitions and interpretations of ecological validity may not only differ among researchers, but also across various subfields within psychology. As such, it is not always clear how the concept should be interpreted. Interestingly, the term ecological validity used to have a very precise meaning when it was first introduced to psychological science by Brunswik (1949 , 1952 , 1955 , 1956) . Brunswik coined the term ‘ecological validity’ to describe the correlation between a proximal sensory cue (e.g., retinal stimulation) and a distal object-variable (e.g., object in the environment). In Brunswik’s terminology, ecological validity refers to a measure (a correlation coefficient) that describes a probabilistic relationship between the distal and proximal layers of an organism-environment system. According to Brunswik (1955) : “A correlation between ecological variables, one which is capable of standing in this manner as a probability cue for the other, may thus be labeled “ecological validity”” (p. 199). Brunswik (1952) believed psychology to primarily be a science of organism-environment relations in which the “organism has to cope with an environment full of uncertainties” (p. 22). In Brunswik’s ‘lens model’ ( Brunswik, 1952 ), the ecological validities of perceptual cues indicate the potential utility of these cues for the organism to achieve its behavioral goals. Note that Brunswik’s concept of ecological validity is very different from how the term is generally used nowadays, namely to discuss and evaluate whether some laboratory-based experiments resemble and generalize to the ‘real-world’ (cf. Neisser, 1976 ; Smilek et al., 2006 ; Ashcraft and Radvansky, 2009 ; Shamay-Tsoory and Mendelsohn, 2019 ).
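
In Brunswik’s original, statistical sense, then, ecological validity is simply a correlation coefficient. The following minimal Python sketch illustrates the idea with simulated data; the choice of variables, the linear cue–object relationship, and the noise level are our illustrative assumptions, not Brunswik’s.

```python
import numpy as np

rng = np.random.default_rng(42)

# A distal object-variable, e.g., the true sizes of objects in the
# organism's environment (simulated for illustration).
distal_size = rng.uniform(1.0, 10.0, size=1000)

# A proximal sensory cue, e.g., retinal image size, which tracks the
# distal variable only probabilistically; the noise stands in for the
# environmental "uncertainty" Brunswik emphasized.
retinal_cue = distal_size + rng.normal(0.0, 2.0, size=1000)

# Brunswik's "ecological validity" of this cue: the correlation between
# the proximal cue and the distal variable it stands for.
ecological_validity = np.corrcoef(retinal_cue, distal_size)[0, 1]
print(f"Ecological validity of the cue: {ecological_validity:.2f}")
```

On this reading, a cue’s ecological validity is a property of the organism-environment system, not a verdict on whether an experiment resembles everyday life.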

The erosion and distortion of Brunswik’s definition of ecological validity has been documented by several scholars (e.g., Hammond, 1998 ; Araujo et al., 2007 ; Holleman et al., in press ). As explained by Hammond (1998) , the original definition of ecological validity, as Brunswik (1949 , 1952) introduced it, has been conflated with Brunswik’s ‘representative design’ of experiments ( Brunswik, 1955 , 1956 ). Representative design was Brunswik’s methodological program for psychological science to achieve generalizability of results. To achieve this, researchers should not only conduct proper sampling on the side of the subjects, by sampling subjects who are representative of a specific ‘target population’ (e.g., children, patients), but researchers should also sample stimuli, tasks, and situations which are representative of a specific ‘target ecology.’ As such, an experiment may be treated as a sample of this ‘target ecology.’ By virtue of sampling theory, researchers may then determine whether results can be generalized to the intended conditions. In short, representative design requires researchers to first specify the conditions toward which they intend to generalize their findings, and then specify how those conditions are represented in the experimental arrangement ( Brunswik, 1956 ). For more in-depth discussions on representative design, see Hammond and Stewart (2001) ; Dhami et al. (2004) , and Hogarth (2005) .
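
Read this way, representative design is essentially a sampling recipe: define the target ecology, then sample experimental situations from it just as one samples subjects from a target population. The sketch below is our own schematic rendering of that idea under made-up situation frequencies, not a procedure Brunswik specified.

```python
import random

random.seed(0)

# A hypothetical target ecology: situations to which the findings are
# meant to generalize, with assumed everyday frequencies.
target_ecology = {
    "conversation with one partner": 0.50,
    "small group discussion": 0.30,
    "addressing an audience": 0.15,
    "video call": 0.05,
}

# Representative design: draw experimental situations in proportion to
# their frequency in the target ecology, rather than fixing a single
# researcher-designed situation.
sessions = random.choices(
    population=list(target_ecology),
    weights=list(target_ecology.values()),
    k=20,  # twenty experimental sessions
)
for situation in sorted(set(sessions)):
    print(f"{situation}: {sessions.count(situation)} sessions")
```

The hard part, of course, is the first step: specifying the target ecology and its frequencies at all, which is exactly what the phrase ‘the real-world’ leaves undone.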

A Systematic Approach to Ecological Validity?

The current lack of terminological precision surrounding ecological validity is, to say the least, problematic. There seems to be no agreed upon definition in the literature, nor any means of classification to determine or evaluate a study’s ecological validity. This seems to be at odds with the relative ease by which researchers routinely invoke this concept to discuss the limitations and shortcomings of laboratory experiments. All the while, researchers seldom make clear how they have determined a study’s ecological (in)validity. As Schmuckler (2001 , p. 419) pointed out: “One consequence of this problem is that concerns with ecological validity can be raised in most experimental situations.” To overcome these problems, several scholars have emphasized the need for a more systematic approach to ecological validity ( Lewkowicz, 2001 ; Schmuckler, 2001 ; Kingstone et al., 2008 ; Risko et al., 2012 ). For example, Lewkowicz (2001 , p. 443) wrote that: “What is missing is an independent, objective, and operational definition of the concept of ecological validity that makes it possible to quantify a stimulus or event as more or less ecologically valid.” According to Schmuckler (2001) , ecological validity can be evaluated on at least three dimensions: (1) the nature of the stimuli ; (2) the nature of task, behavior, or response ; (3) the nature of the research context . Researchers have primarily discussed these dimensions in terms of their artificiality–naturality (e.g., Hoc, 2001 ; Schmuckler, 2001 ; Risko et al., 2012 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ), and their simplicity–complexity (e.g., Kingstone et al., 2008 ; Peelen and Kastner, 2014 ; Lappi, 2015 ). As such, a general framework can be construed where stimuli, tasks, behaviors, and research contexts can be evaluated on a continuum of artificiality–naturality and simplicity–complexity (see also Risko et al., 2012 ; Lappi, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). At one extreme is the laboratory, characterized by its artificiality and simplicity. At the other extreme is the ‘real-world,’ characterized by its naturality and complexity. According to this multidimensional framework, researchers may determine a study’s overall ecological validity by combining (e.g., averaging or summing) the main components of ecological validity (i.e., stimuli, tasks/behaviors, research context) in terms of their relative artificiality–naturality and simplicity–complexity. However, while many researchers have conceptualized ecological validity alongside these dimensions, we think there are several problems to consider. Since the dimensions of this framework are supposedly important to determine the ecological validity of experimental research, this then raises the question of how researchers can judge the artificiality–naturality and simplicity–complexity of particular experiments. This question will be explored in the following sections.
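
To see why the authors find this framework intractable, it helps to spell out what ‘combining the main components’ would require in practice. The deliberately naive sketch below is our own construction, not a published scoring procedure; every rating in it is exactly the kind of unanchored subjective judgment that the following sections call into question.

```python
# A naive "overall ecological validity" score, as the contemporary
# framework seems to imply: rate each component on naturality and
# complexity (0 = artificial/simple, 1 = natural/complex) and average.
# All ratings below are arbitrary judgments -- which is the problem.
components = {
    # component: (naturality rating, complexity rating)
    "stimuli": (0.2, 0.3),
    "task/behavior": (0.4, 0.5),
    "research context": (0.1, 0.2),
}

component_scores = [sum(r) / len(r) for r in components.values()]
overall = sum(component_scores) / len(component_scores)
print(f"'Overall ecological validity': {overall:.2f}  # on what scale?")
```

Without agreed-upon anchors for the ratings, or a justification for averaging rather than any other combination rule, the resulting number carries no evidential weight.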

Artificiality – Naturality

The contrast between ‘artificiality’ and ‘naturality’ is a particularly prominent point of discussion in the ‘real-world or the lab’-dilemma and when researchers talk about the ecological validity of experimental research practices ( Hoc, 2001 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ). According to Hoc (2001 , pp. 282–283), ‘artificial’ situations are “those that are specifically designed for research” and ‘natural’ situations are “the target situations to be understood by research” . Importantly, Hoc (2001) notes that this distinction is made from the perspective of the researcher. However, this artificiality–naturality distinction should also be considered from the subject’s point of view. For example, according to Sonkusare et al. (2019) : “naturalistic paradigms can be heuristically defined as those that employ the rich, multimodal dynamic stimuli that represent our daily lived experience, such as film clips, TV advertisements, news items, and spoken narratives, or that embody relatively unconstrained interactions with other agents, gaming environments, or virtual realities” (p. 700). Furthermore, researchers have long recognized that artificiality arises when the experimental methods employed by researchers interfere with the naturality of the psychological phenomena one aims to study. Consequently, there is always an inherent trade-off between the degree of artificiality imposed by the experimental conditions and the naturality of the phenomena under scientific investigation ( Brunswik, 1956 ; Barker, 1968 ; Banaji and Crowder, 1989 ; Kingstone et al., 2008 ; Risko et al., 2012 ; Caruana et al., 2017 ). However, as Winograd (1988) has previously remarked, it remains difficult to “draw a line where artificiality ends and ecological validity … for real events begins” (p. 18).

Interestingly, discussions on the naturality–artificiality of experimental methods have a long pedigree in psychological science. By the end of the 19th century, Thorndike (1899) and Mills (1899) already argued fiercely about what methodology should be favored to study the behavior of cats. Mills dismissed Thorndike’s work because of the artificiality of the experimental methods employed by Thorndike (see Figure 2), whereas Thorndike regarded the ethological approach favored by Mills as a collection of uncritical observations and anecdotes. Mills (1899, p. 264) wrote that: “Dr. Thorndike … has given the impression that I have not made experiments, or ‘crucial experiments’ … I may remark that a laboratory as ordinarily understood is not well suited for making psychological experiments on animals”. Mills’ point was that: “cats placed in small enclosures … cannot be expected to act naturally. Thus, nothing about their normal behavior can be determined from their behavior in highly artificial, abnormal surroundings” (Goodwin, 2015, p. 200). In response to Mills, Thorndike (1899, p. 414) replied: “Professor Mills does not argue in concrete terms, does not criticize concrete unfitness in the situations I devised for the animals. He simply names them unnatural.” Thorndike clearly did not accept Mills’ charge on the artificiality of his experimental arrangements to study the behavior of cats because Mills did not define what should be considered natural behavior in the first place.


Figure 2. A ‘puzzle box’ devised by Thorndike (1899, 2017) to study the learning behavior of cats. A hungry cat is placed in a box which can be opened if the cat pushes a latch. A food reward (‘positive reinforcer’) will be obtained by the cat if it figures out how to escape from the box. Thorndike discovered that after several trials, the time it takes the cat to escape from the box decreases. Experiments with puzzle boxes remain popular today to study the cognitive capacities of animals; for example, see Richter et al. (2016) for a study with octopuses. Copyright statement – Image created and owned by author IH and is based on E. L. Thorndike’s book: Animal Intelligence (Original date of publication, 1911, Figure 1, p. 30).

We think that this historical discussion between Thorndike and Mills is illuminating, because it characterizes the heart of the discussion on ecological validity nowadays. Namely, what exactly did Mills consider to be ‘natural’ or ‘normal’ behavior? And how did Mills determine that Thorndike’s experiments failed to capture the ‘natural’ behavior of cats? Following Thorndike’s point on the matter, we think that researchers cannot readily determine the naturality–artificiality of any given experimental arrangement, at least not without specifying what is entailed by these ascriptions. As Dunlosky et al. (2009 , p. 431) previously remarked: “A naturalistic setting guarantees nothing, especially given that “naturalistic” is never unpacked – what does it mean?”. Indeed, our survey of the literature also shows that the historical discussion between Thorndike and Mills is by no means a discussion of the past. In fact, we regularly encounter discussions on the ‘artificiality’ and ‘naturality’ of experimental setups, the presentation of stimuli, the behavior of participants, or the specific tasks and procedures used in experiments – not only in the literature, but also among our colleagues and reviewers. We must often ask for the specifics, because such remarks typically remain undefined by those who toss them around.

Simplicity – Complexity

The contemporary framework of ecological validity also posits that the laboratory and the ‘real-world’ are inversely proportional in terms of their simplicity–complexity. Many researchers have lamented that laboratory experiments have a ‘reductionistic’ tendency to simplify the complexity of the psychological phenomena under study (e.g., Neisser, 1976 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). For example, Sonkusare et al. (2019 , p. 699) stated that “the ecological validity of these abstract, laboratory-style experiments is debatable, as in many ways they do not resemble the complexity and dynamics of stimuli and behaviors in real-life.” But what exactly is meant by complexity? Let’s consider some examples from the literature. In the field of social attention, researchers have often used schematic images, photographs and videos of people and social scenes as stimuli to study the cognitive, behavioral, and physiological processes of face perception, gaze following and joint attention ( Langton et al., 2000 ; Frischen et al., 2007 ; Puce and Bertenthal, 2015 ). However, in recent years, there has been considerable debate that such stimuli are not ‘ecologically valid’ because they do not “capture the complexity of real social situations” ( Birmingham et al., 2012 , p. 30). While we agree that looking at a photographic image of a person’s face is different from looking at a living and breathing person, in what ways do these situations differ in complexity? Do these scholars mean that looking at a ‘live’ person is more complex than looking at a picture of that person? Or do they mean that the former is more complex than the latter from the perspective of the researcher who wants to understand the cognitive, behavioral, and physiological processes of face perception and social attention?

To take another example, Gabor patches are often used as stimuli by experimental psychologists to study ‘low-level visual processing’ (see Figure 3). Experimental psychologists use Gabor patches as visual stimuli because they offer a high degree of experimental control over various stimulus parameters (e.g., spatial frequency bandwidths, orientation, contrast, size, location). Gabor patches can be described with mathematical precision (i.e., “Gaussian-windowed sinusoidal gratings,” Fredericksen et al., 1997, p. 1), and their spatial properties are considered to be a good representation of the receptive field profiles in the primary visual cortex. While Gabor patches may be considered ‘simple’ to researchers who study the relation between low-level visual processing and neural activity in terms of orientation-tuning and hemodynamic response functions, they also point to the yet-to-be-explained ‘complexity’ of the many possible relations between other cognitive processes and patterns of neural activity in the brain. On the other hand, a naïve participant (who likely has no clue about what researchers have discovered about low-level visual processing) may describe these Gabor patches as blurry, kind of stripy, zebra-like circles, and think that they are incredibly boring to look at for many trials while lying quietly in an MRI scanner.


Figure 3. Are Gabor patches simple or complex compared to a picture of zebras? (A) A Gabor patch. (B) A photograph of zebras. The uniquely striped patterns of zebras make them instantly familiar to humans, whereas the question of why zebras have such beautiful stripes remains the topic of much discussion among biologists; see, e.g., Caro and Stankowich (2015) and Larison et al. (2015). Copyright statement – Images are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. Image of Gabor patch was adapted from Todorović (2016, May 30). Retrieved April 1, 2020, from http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/. Photograph of zebras was made by Ajay Lalu and has been made publicly available by the owner for non-profit purposes via Pixabay. Retrieved on April 1, 2020, from https://pixabay.com/nl/users/ajaylalu-1897335/.

Our point here is that simplicity–complexity is in the eye of the beholder. Who is to say what is more simple or complex? Physicists, computer scientists, information theorists, and evolutionary biologists have developed various definitions and measures of complexity (e.g., physical complexity, computational complexity, effective complexity, algorithmic complexity, statistical complexity, structural complexity, functional complexity, etc.), typically expressed in strictly mathematical terms ( Edmonds, 1995 ; Gell-Mann, 1995 ; Adami, 2002 ). But what definitions and measures of complexity are used by psychologists and cognitive scientists? Researchers in psychological science seem to have more loosely used the term complexity, for example, to describe a wide range of biological, behavioral, cognitive, social, and cultural phenomena, which typically contain lots of many’s (i.e., many parts, many variables, many degrees of freedom). Researchers may refer to various phenomena as ‘complex’ because they are simply not (yet) understood, as in “the brain is too complex for us to understand” ( Edmonds, 1995 , p. 4). Yet, such intuitive notions of complexity, whether they are caused by ignorance or whether they are used to describe something’s size, number, or variety ( Edmonds, 1995 ), are not very helpful to evaluate the simplicity–complexity of stimuli, tasks, and situations, nor do such notions provide any formula by which these components can be summed to determine the total ecological validity of a given study. According to Gell-Mann (1995 , p. 16):

“As measures of something like complexity for an entity in the real-world, all such quantities are to some extent context-dependent or even subjective. They depend on the coarse graining (level of detail) of the description of the entity, on the previous knowledge and understanding of the world that is assumed, on the language employed, on the coding method used for conversion from that language into a string of bits, and on the particular idealized computer chosen as a standard.”
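
One way to feel the force of this point is to try any off-the-shelf complexity proxy. The sketch below uses compressed length as a crude stand-in for algorithmic complexity; the proxy and the one-line descriptions are our own illustrative choices, not measures proposed in the cited literature, and the verdict depends entirely on how each stimulus is encoded as a string.

```python
import zlib

def compressed_length(description: str) -> int:
    """Crude complexity proxy: bytes after zlib compression."""
    return len(zlib.compress(description.encode("utf-8")))

# One-line verbal encodings of the two stimuli from Figure 3 (our own
# arbitrary descriptions; a pixel array would give different numbers).
stimuli = {
    "Gabor patch": "sinusoidal grating, 4 cyc/deg, 45 deg, Gaussian window",
    "zebra photo": "two zebras grazing on a sunlit savanna, tails flicking",
}

for name, description in stimuli.items():
    print(f"{name}: {compressed_length(description)} bytes compressed")
```

Both descriptions compress to similar sizes, yet the underlying pixel arrays, neural responses, or evolutionary explanations would rank very differently: as Gell-Mann notes, the measure depends on the coarse graining and coding chosen.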

The ‘Real World’ or the ‘Laboratory’: Psychology’s False Dilemma?

We have discussed several problems with how researchers have used the term ‘ecological validity’. In short, the concept of ecological validity has transformed significantly over the past several decades since it was introduced by Brunswik (1949). It has lost most of its former theoretical and methodological cohesion (for a history, see Hammond, 1998), and the definitions and requirements of ecological validity used by researchers nowadays are seldom made explicit. As such, some experiments may be regarded as ‘ecologically valid’ by one researcher while they can be casually dismissed as ‘ecologically invalid’ by others. A closer look at the literature suggests that many researchers seem to assume that everyone understands what is meant by this term, while in fact the concept of ecological validity is seldom defined. Consequently, the concept of ecological validity is primarily used nowadays to make hand-waving statements about whether some (lab-based) experiments resemble ‘real life,’ or whether some results obtained in the laboratory may or may not generalize to the ‘real-world.’

In our opinion, the contemporary framework of ecological validity ultimately falls short of providing researchers with a tractable research program. Researchers seem to base their judgments of ecological validity primarily upon their own theoretical assumptions and considerations about the so-called artificiality–naturality and simplicity–complexity of experimental situations, typically in the absence of a more formal set of criteria. As such, while we certainly sympathize with the ‘call for ecological validity’, insofar as it has motivated researchers to be critical about the limitations of experimental methods, we also think that the uncritical use of the term has caused a lot of confusion, and in some cases has even been counterproductive. Perhaps the most problematic consequence of using the term ecological validity as an easy substitute for the ‘real-world’ was pointed out by Hammond (1998) , who commented that:

“There is, of course, no such thing as a “real-world.” It has been assigned no properties, and no definition; it is used simply because of the absence of a theory of tasks or other environments, and thus does not responsibly offer a frame of reference for the generalization.”

In Hammond’s view, the aim to understand cognitive and behavioral functioning in the ‘real-world’ is basically pointless if one does not first define this notion of the ‘real-world.’ As such, researchers have locked themselves “onto the horns of a false dilemma” ( Hammond and Stewart, 2001 , p. 7). Thus, in order to talk sensibly about whether some results can be generalized to particular situations beyond the experimental conditions in which they were obtained, researchers first need to specify the range and distributions of the variables and conditions to which their results are supposed to apply. Since the notion of the ‘real-world’ patently lacks specificity, the phrase inevitably hampers researchers in specifying the range and boundary conditions of cognitive and behavioral functioning in any given research context, and thus precludes them from getting at the context-specific and context-generic principles of cognition and behavior (see also Kruglanski, 1975 ; Simons et al., 2017 ).

The Nature of the Environment?

Instead of trying to understand cognitive and behavioral functioning in the ‘real-world’, we agree completely with Hammond (1998) that the charge of researchers is always to specify and describe the particular context of behavior in which they are interested. Ultimately, the real challenge for researchers is to develop a theory of how specific environmental contexts are related to various forms of cognitive and behavioral functioning. But what constitutes a psychologist’s theory of the environment? Researchers in psychological science are typically concerned with the nature of the organism, yet the nature of the environment and its relation to cognitive and behavioral functioning has received considerably less attention from a theoretical point of view ( Barker, 1966 ; Heft, 2013 ). Interestingly, several scholars have dedicated themselves to precisely this question, and their theories of cognition and behavior include a clear perspective on the nature of the environment.

According to Tolman and Brunswik (1935) , the nature of the environment, as it appears to the organism, is full of uncertainties. The organism perceives the environment as an array of proximal ‘cues’ and ‘signs’ (i.e., information sources), which are the ‘local representatives’ of various distal objects and events in the organism’s environment. To function more or less efficiently, the organism needs to accumulate, combine, and substitute the information it derives from the available ‘cues’ and ‘signs,’ so that it can adequately adjust its means to achieve its behavioral goals (e.g., finding food or shelter). However, since the environment is inherently probabilistic and only partly predictable, the organism continually needs to adjust its assumptions about the state of the environment based on the available information sources.

Another example is given by Barker (1968) , whose concept of ‘behavior settings’ (see also Heft, 2001 ) is key in describing how the environment shapes the frequency and occurrence of human cognition and behavior. Importantly, behavior settings are the product of the collective actions of a group of individuals. Their geographical location can be specified (e.g., the supermarket, the cinema), and they have clear temporal and physical boundaries (e.g., opening hours, a door to enter and exit the building). Behavior settings are ‘independent’ of an individual’s subjective experience, yet what goes on inside any behavior setting is characterized by a high degree of interdependency and equivalence of actions between individuals (e.g., most people inside a supermarket are shopping for groceries, and people in cinemas are watching movies).

A third ‘classic’ example of a theory of the environment can be found in J. J. Gibson’s book The Ecological Approach to Visual Perception (1979/2014). According to Gibson, there exists a strong mutuality and reciprocity between the organism and its environment. He introduced the concept of ‘affordances’ to explain how the inherent ‘meaning’ of things (i.e., their functional significance to the individual) can be directly perceived by an individual perceiver, and how this ‘information’ shapes the possibilities for potential actions and experiences. For example, a sufficiently firm and smooth surface may be walk-on-able, run-on-able, or dance-on-able, whereas a rough surface cluttered with obstacles does not afford such actions ( Heft, 2001 ). In short, affordances are properties of an organism-environment system. They are perceiver-relative functional qualities of an object, event, or place in the environment, and they depend on the particular features of the environment and their relationships with the functional capabilities of a particular individual (for more in-depth discussions, see e.g., Heft, 2001 ; Stoffregen, 2003 ).
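Tolman and Brunswik’s idea of combining probabilistic cues admits a simple formalization (later developed in Brunswik’s lens model). The sketch below is a minimal, hypothetical illustration: a simulated organism estimates a distal variable from three noisy proximal cues by weighting each cue by its assumed validity. All numbers are invented for illustration and do not come from the cited work.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# The distal variable (e.g., food availability) and three proximal cues.
# Each cue is a noisy, probabilistic 'local representative' of the distal
# state; the validities below are assumptions chosen for illustration.
distal = rng.normal(size=n)
validities = np.array([0.8, 0.5, 0.2])
cues = distal[:, None] * validities + rng.normal(scale=0.6, size=(n, 3))

# A simple linear cue-combination policy: weight each cue by its validity.
estimate = cues @ validities / validities.sum()

# 'Achievement': how well the estimate tracks the actual state of the
# only partly predictable environment.
achievement = np.corrcoef(distal, estimate)[0, 1]
print(f"correlation between estimate and distal state: {achievement:.2f}")
```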

In order to describe and specify the environment and its relation to cognitive and behavioral functioning, we may draw on these scholars to guide us in a more specific direction. While we do not specifically recommend any one of these perspectives, we think they are illuminating because these scholars motivate us to ask questions such as: What is the specific functional context of the cognitive and behavioral processes one is interested in? What are the relevant variables and conditions in this context, given one’s focus of inquiry and level of analysis? What do we know, or assume to know, about the range and distribution of these variables and conditions? And how can these variables and conditions be represented in experimental designs to study specific patterns of cognitive and behavioral functioning? In order to answer some of these questions, several researchers have emphasized the importance of first observing how people behave in everyday situations prior to experimentation. For example, Kingstone et al. (2008) advocated an approach called Cognitive Ethology , which proposes that researchers should first observe how people behave in everyday situations before moving into the laboratory. In a similar vein, Adolph (2019) proposes that researchers should start with a rich description of the behaviors they are interested in, in order to first identify the “essential invariants” of these behaviors (p. 187).

The Field of Social Attention: Away From the Real-World and Toward Specificity About Context

To exemplify how some of the ideas outlined above may be useful to researchers, we will apply them to a research topic of our own interest: social attention. The field of social attention, as briefly discussed previously, is primarily concerned with how attention is influenced by socially relevant objects, events, and situations, most notably interactions with other social agents. In recent decades, it has been argued extensively that the experimental arrangements used by researchers in this field need more ‘ecological validity’ in order to adequately study the relevant characteristics of social attention in the ‘real-world’ ( Risko et al., 2012 , 2016 ; Schilbach et al., 2013 ; Caruana et al., 2017 ; Macdonald and Tatler, 2018 ; Shamay-Tsoory and Mendelsohn, 2019 ). In the light of these concerns, several researchers have advocated studying “real-world social attention” ( Risko et al., 2016 , p. 1) and “real-world social interaction” ( Macdonald and Tatler, 2018 , p. 1; see also Shamay-Tsoory and Mendelsohn, 2019 ). One example is the study by Macdonald and Tatler (2018) , which investigated how social roles given to participants influenced their social gaze behavior during a collaborative task: baking a cake together. Participants were either given no explicit social roles, or they were given a ‘Chef’ or ‘Gatherer’ role. Macdonald and Tatler (2018) showed that, regardless of whether social roles were assigned, participants did not gaze at their cake-baking partners very often while carrying out the task. After comparing their results with other so-called ‘real-world interaction studies’ (e.g., Laidlaw et al., 2011 ; Wu et al., 2013 ), the authors stated that: “we are not able to generalize about the specific amount of partner gaze during any given real-world interaction” ( Macdonald and Tatler, 2018 , p. 2171). We think this statement clearly illustrates how the use of ‘real-world’ and ‘real life’ labels may lead to misleading and potentially counterproductive conclusions, as it seems to imply that ‘real-world interactions’ constitute a clearly defined category of behaviors. As argued previously, they do not. Instead, statements about generalizability need to be considered within a more constrained and carefully defined context (cf. Brunswik, 1956 ; Simons et al., 2017 ). This would make it clearer what researchers are talking about, instead of subsuming studies under the big umbrella of the ‘real-world.’ For example, if the goal is to study how the cognitive and behavioral processes of social attention are influenced by different contexts and situations, researchers need to specify social gaze behavior as a function of these different contexts and situations.

Thus, instead of studying ‘real-world’ social attention in the context of ‘real-world’ social interactions, researchers should first try to describe and understand cake-baking attention ( Macdonald and Tatler, 2018 ), sharing-a-meal attention ( Wu et al., 2013 ), waiting-room attention ( Laidlaw et al., 2011 ), walking-on-campus attention ( Foulsham et al., 2011 ), Lego-block-building attention ( Macdonald and Tatler, 2013 ), playing-word-games attention ( Ho et al., 2015 ), interviewee attention ( Freeth et al., 2013 ), and garage-sale attention ( Rubo and Gamer, 2018 ). By doing so, we may begin to understand the context-generic and context-specific aspects of attentional processes, allowing for a more sophisticated theory of social attention. These examples not only show the wide variety of behavioral tasks and contexts that can be studied in relation to social attention; they also show that uncritical references to ‘ecological validity’, a.k.a. ‘real-worldliness’, are not very helpful for specifying the relevant characteristics of particular behavioral contexts.
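Specifying gaze as a function of context can be made operational quite directly. The sketch below is a minimal, hypothetical example of the kind of descriptive analysis implied here: given fixation records labeled by behavioral context, it computes the proportion of dwell time spent on the interaction partner per context. All column names and values are invented; none come from the studies cited above.

```python
import pandas as pd

# Hypothetical fixation records: one row per fixation, labeled with the
# behavioral context and whether gaze landed on the interaction partner.
fixations = pd.DataFrame({
    "context":     ["cake-baking"] * 4 + ["waiting-room"] * 4,
    "on_partner":  [0, 1, 0, 0, 1, 1, 0, 1],
    "duration_ms": [320, 180, 450, 260, 210, 390, 300, 240],
})

# Proportion of dwell time on the partner, per context: a descriptive
# starting point for comparing gaze across behavioral contexts.
dwell = (
    fixations
    .assign(partner_ms=lambda d: d["on_partner"] * d["duration_ms"])
    .groupby("context")[["partner_ms", "duration_ms"]]
    .sum()
)
dwell["prop_on_partner"] = dwell["partner_ms"] / dwell["duration_ms"]
print(dwell)
```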

There are also good examples where researchers have been more explicit about the specific characteristics of the social situations they are interested in. Researchers in the field of social attention have, for example, tried to unravel the different functions of gaze behavior. One important function of gaze is to acquire visual information from the world; within a social context, however, gaze may also signal important information to others, which may be used to initiate and facilitate social interaction (see e.g., Gobel et al., 2015 ; Risko et al., 2016 ). In a series of experiments, researchers have systematically varied whether, and to what degree, social interaction between two people was possible, and measured how gaze was modulated as a function of the social context ( Laidlaw et al., 2011 ; Gobel et al., 2015 ; Gregory and Antolin, 2019 ; Holleman et al., 2020 ). In other studies, researchers have been explicit about the task demands and social contexts that elicit specific patterns of gaze behavior, for example, in the context of face-to-face interactions and conversational exchanges ( Ho et al., 2015 ; Hessels et al., 2019 ). We think that if researchers were more explicit in their descriptions of task demands and social contexts in relation to gaze, this could prove a solid basis for a more sophisticated theory of social attention, though such work remains challenging (for a recent review, see Hessels, in press ).

We have argued that the ‘real-world approach’ and its call for ecological validity have several problems. The concept of ecological validity itself is seldom defined, and interpretations differ among researchers. We believe that references to ecological validity and the ‘real-world’ become superfluous once researchers clearly specify and describe the particular contexts of behavior in which they are interested. This is a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior. As a final note, we hope that editors and reviewers will safeguard journals from publishing papers in which terms such as ‘ecological validity’ and the ‘real-world’ are used without specification.

Author Contributions

GH and RH drafted the manuscript. RH, IH, and CK edited and revised the manuscript.

Funding

GH and RH were supported by the Consortium on Individual Development (CID). CID is funded through the Gravitation programme of the Dutch Ministry of Education, Culture, and Science and the NWO (grant no. 024.001.003 awarded to author CK).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Aanstoos, C. M. (1991). Experimental psychology and the challenge of real life. Am. Psychol. 1:77. doi: 10.1037/0003-066x.46.1.77

Adami, C. (2002). What is complexity? Bioessays 24, 1085–1094.

Adolph, K. E. (2019). “Ecological validity: mistaking the lab for real life,” in My Biggest Research Mistake: Adventures and Misadventures in Psychological Research , Ed. R. Sternberg (New York, NY: Sage), 187–190. doi: 10.4135/9781071802601.n58

Alderman, N., Burgess, P. W., Knight, C., and Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. J. Int. Neuropsychol. Soc. 9, 31–44. doi: 10.1017/s1355617703910046

Anderson, C. A., Lindsay, J. J., and Bushman, B. J. (1999). Research in the psychological laboratory: truth or triviality? Curr. Direct. Psychol. Sci. 8, 3–9. doi: 10.1111/1467-8721.00002

Araujo, D., Davids, K., and Passos, P. (2007). Ecological validity, representative design, and correspondence between experimental task constraints and behavioral setting: comment on Rogers, Kadar, and Costall (2005). Ecol. Psychol. 19, 69–78. doi: 10.1080/10407410709336951

Ashcraft, M., and Radvansky, G. (2009). Cognition , 5th Edn. Upper Saddle River, NJ: Pearson Education, Inc.

Aspland, H., and Gardner, F. (2003). Observational measures of parent-child interaction: an introductory review. Child Adolesc. Mental Health 8, 136–143. doi: 10.1111/1475-3588.00061

Banaji, M. R., and Crowder, R. G. (1989). The bankruptcy of everyday memory. Am. Psychol. 44:1185. doi: 10.1037/0003-066x.44.9.1185

Barker, R. G. (1966). “On the nature of the environment,” in The Psychology of Egon Brunswik , ed. K. R. Hammond (New York: Holt, Rinehart and Winston).

Barker, R. G. (1968). Ecological Psychology: Concepts and Methods for Studying the Environment of Human Behavior. Stanford, CA: Stanford University Press.

Berkowitz, L., and Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. Am. Psychol. 37:245. doi: 10.1037/0003-066x.37.3.245

Birmingham, E., Ristic, J., and Kingstone, A. (2012). “Investigating social attention: a case for increasing stimulus complexity in the laboratory,” in Cognitive Neuroscience, Development, and Psychopathology: Typical and Atypical Developmental Trajectories of Attention , eds J. A. Burack, J. T. Enns, and N. A. Fox (Oxford University Press), 251–276. doi: 10.1093/acprof:oso/9780195315455.003.0010

Blanco-Elorrieta, E., and Pylkkänen, L. (2018). Ecological validity in bilingualism research and the bilingual advantage. Trends Cogn. Sci. 22, 1117–1126. doi: 10.1016/j.tics.2018.10.001

Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. Am. Psychol. 32:513. doi: 10.1037/0003-066x.32.7.513

Brunswik, E. (1943). Organismic achievement and environmental probability. Psychol. Rev. 50:255. doi: 10.1037/h0060889

Brunswik, E. (1949). Remarks on functionalism in perception. J. Pers. 18, 56–65. doi: 10.1111/j.1467-6494.1949.tb01233.x

Brunswik, E. (1952). The Conceptual Framework of Psychology. Chicago: University of Chicago Press.

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychol. Rev. 62:193. doi: 10.1037/h0047470

Brunswik, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley: University of California Press.

Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54:297. doi: 10.1037/h0040950

Caro, T., and Stankowich, T. (2015). Concordance on zebra stripes: a comment on Larison et al. (2015). R. Soc. Open Sci. 2:150323. doi: 10.1098/rsos.150323

Caruana, N., McArthur, G., Woolgar, A., and Brock, J. (2017). Simulating social interactions for the experimental investigation of joint attention. Neurosci. Biobehav. Rev. 74, 115–125. doi: 10.1016/j.neubiorev.2016.12.022

Cohen, G., and Conway, M. A. (2007). Memory in the Real World. Abingdon: Psychology Press.

Danziger, K. (1994). Constructing the Subject: Historical Origins of Psychological Research. Cambridge: Cambridge University Press.

Davids, K. (1988). Ecological validity in understanding sport performance: some problems of definition. Quest 40, 126–136. doi: 10.1080/00336297.1988.10483894

Dhami, M. K., Hertwig, R., and Hoffrage, U. (2004). The role of representative design in an ecological approach to cognition. Psychol. Bull. 130:959. doi: 10.1037/0033-2909.130.6.959

Dunlosky, J., Bottiroli, S., and Hartwig, M. (2009). “Sins committed in the name of ecological validity: A call for representative design in education science,” in Handbook of Metacognition in Education , eds D. J. Hacker, J. Dunlosky, and A. C. Graesser (Abingdon: Routledge), 442–452.

Eaton, W. O., and Clore, G. L. (1975). Interracial imitation at a summer camp. J. Pers. Soc. Psychol. 32:1099. doi: 10.1037/0022-3514.32.6.1099

Edmonds, B. (1995). “What is complexity?-the philosophy of complexity per se with application to some examples in evolution,” in The Evolution of Complexity , Ed. J. T. Bonner (Dordrecht: Kluwer).

Foulsham, T., Walker, E., and Kingstone, A. (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vis. Res. 51, 1920–1931. doi: 10.1016/j.visres.2011.07.002

Fredericksen, R., Bex, P. J., and Verstraten, F. A. (1997). How big is a Gabor patch, and why should we care? JOSA A 14, 1–12.

Freeth, M., Foulsham, T., and Kingstone, A. (2013). What affects social attention? Social presence, eye contact and autistic traits. PLoS One 8:e53286. doi: 10.1371/journal.pone.0053286

Frischen, A., Bayliss, A. P., and Tipper, S. P. (2007). Gaze cueing of attention: visual attention, social cognition, and individual differences. Psychol. Bull. 133:694. doi: 10.1037/0033-2909.133.4.694

Gell-Mann, M. (1995). What is complexity? Remarks on simplicity and complexity by the Nobel Prize-winning author of The Quark and the Jaguar. Complexity 1, 16–19. doi: 10.1002/cplx.6130010105

Gibson, J. J. (1950). The Perception of the Visual World. Cambridge: Houghton Mifflin Company.

Gibson, J. J. (1970). On the relation between hallucination and perception. Leonardo 3, 425–427.

Gibson, J. J. (2014). The Ecological Approach to Visual Perception: Classic Edition. New York, NY: Psychology Press. (Original date of publication 1979).

Gillis, J., and Schneider, C. (1966). “The historical preconditions of representative design,” in The Psychology of Egon Brunswik , ed. K. R. Hammond (New York, NY: Holt, Rinehart & Winston, Inc), 204–236.

Gobel, M. S., Kim, H. S., and Richardson, D. C. (2015). The dual function of social gaze. Cognition 136, 359–364. doi: 10.1016/j.cognition.2014.11.040

Goodwin, C. J. (2015). A History of Modern Psychology , 5 Edn. Hoboken, NJ: John Wiley & Sons.

Greenwald, A. G. (1976). Within-subjects designs: to use or not to use? Psychol. Bull. 83:314. doi: 10.1037/0033-2909.83.2.314

Gregory, N. J., and Antolin, J. V. (2019). Does social presence or the potential for interaction reduce social gaze in online social scenarios? Introducing the “live lab” paradigm. Q. J. Exp. Psychol. 72, 779–791. doi: 10.1177/1747021818772812

Hammond, K. R. (1998). Ecological Validity: Then and Now. Available online at: http://www.brunswik.org/notes/essay2.html (accessed April 1, 2020).

Hammond, K. R., and Stewart, T. R. (2001). The Essential Brunswik: Beginnings, Explications, Applications. New York, NY: Oxford University Press.

Hatfield, G. (2002). Psychology, philosophy, and cognitive science: reflections on the history and philosophy of experimental psychology. Mind Lang. 17, 207–232. doi: 10.1111/1468-0017.00196

Heft, H. (2001). Ecological Psychology in Context: James Gibson, Roger Barker, and the Legacy of William James’s Radical Empiricism. Hove: Psychology Press.

Heft, H. (2013). An ecological approach to psychology. Rev. Gen. Psychol. 17, 162–167. doi: 10.1037/a0032928

Heinrichs, R. W. (1990). Current and emergent applications of neuropsychological assessment: problems of validity and utility. Prof. Psychol. 21:171. doi: 10.1037/0735-7028.21.3.171

Hessels, R. S. (in press). How does gaze to faces support face-to-face interaction? A review and perspective. Psychonom. Bull. Rev. doi: 10.31219/osf.io/8zta5

Hessels, R. S., Holleman, G. A., Kingstone, A., Hooge, I. T. C., and Kemner, C. (2019). Gaze allocation in face-to-face communication is affected primarily by task structure and social context, not stimulus-driven factors. Cognition 184, 28–43. doi: 10.1016/j.cognition.2018.12.005

Hessels, R. S., van Doorn, A. J., Benjamins, J. S., Holleman, G. A., and Hooge, I. T. C. (2020). Task-related gaze control in human crowd navigation. Attent. Percept. Psychophys. doi: 10.3758/s13414-019-01952-9 [Online ahead of print]

Higginson, C. I., Arnett, P. A., and Voss, W. D. (2000). The ecological validity of clinical tests of memory and attention in multiple sclerosis. Arch. Clin. Neuropsychol. 15, 185–204. doi: 10.1016/s0887-6177(99)00004-9

Hilton, W. (1920). Applied Psychology: Driving Power of Thought. The Society of Applied Psychology . Available online at: http://www.gutenberg.org/files/33076/33076-h/33076-h.htm (accessed April 1, 2020).

Ho, S., Foulsham, T., and Kingstone, A. (2015). Speaking and listening with the eyes: gaze signaling during dyadic interactions. PLoS One 10:e0136905. doi: 10.1371/journal.pone.0136905

Hoc, J.-M. (2001). Towards ecological validity of research in cognitive ergonomics. Theor. Issues Ergon. Sci. 2, 278–288. doi: 10.1371/journal.pone.0184488

Hogarth, R. M. (2005). The challenge of representative design in psychology and economics. J. Econ. Methodol. 12, 253–263. doi: 10.1177/0269216311399663

Holleman, G. A., Hessels, R. S., Kemner, C., and Hooge, I. T. C. (2020). Implying social interaction and its influence on gaze behavior to the eyes. PLoS One 15:e0229203. doi: 10.1371/journal.pone.0229203

Holleman, G. A., Hooge, I. T. C., Kemner, C., and Hessels, R. S. (in press). The reality of ‘real-life’ neuroscience: a commentary on Shamay-Tsoory & Mendelsohn. Perspect. Psychol. Sci.

Jack, R. E., and Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annu. Rev. Psychol. 68, 269–297. doi: 10.1146/annurev-psych-010416-044242

Jenkins, J. J. (1974). Remember that old theory of memory? Well, forget it. Am. Psychol. 29:785. doi: 10.1037/h0037399

Johnston, P., Molyneux, R., and Young, A. W. (2014). The N170 observed ‘in the wild’: robust event-related potentials to faces in cluttered dynamic visual scenes. Soc. Cogn. Affect. Neurosci. 10, 938–944. doi: 10.1093/scan/nsu136

Kingstone, A., Smilek, D., and Eastwood, J. D. (2008). Cognitive ethology: a new approach for studying human cognition. Br. J. Psychol. 99, 317–340. doi: 10.1348/000712607x251243

Koehler, J. J. (1996). The base rate fallacy reconsidered: descriptive, normative, and methodological challenges. Behav. Brain Sci. 19, 1–17. doi: 10.1017/s0140525x00041157

Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., and Poeppel, D. (2017). Neuroscience needs behavior: correcting a reductionist bias. Neuron 93, 480–490. doi: 10.1016/j.neuron.2016.12.041

Kruglanski, A. W. (1975). The two meanings of external invalidity. Hum. Relat. 28, 653–659. doi: 10.1177/001872677502800704

Laidlaw, K. E., Foulsham, T., Kuhn, G., and Kingstone, A. (2011). Potential social interactions are important to social attention. Proc. Natl. Acad. Sci. U.S.A. 108, 5548–5553. doi: 10.1073/pnas.1017022108

Langton, S. R., Watt, R. J., and Bruce, V. (2000). Do the eyes have it? Cues to the direction of social attention. Trends Cogn. Sci. 4, 50–59. doi: 10.1016/s1364-6613(99)01436-9

Lappi, O. (2015). Eye tracking in the wild: the good, the bad and the ugly. J. Eye Mov. Res. 8:1. doi: 10.1016/j.dcn.2019.100710

Larison, B., Harrigan, R. J., Thomassen, H. A., Rubenstein, D. I., Chan-Golston, A. M., Li, E., et al. (2015). How the zebra got its stripes: a problem with too many solutions. R. Soc. Open Sci. 2:140452. doi: 10.1098/rsos.140452

Lewkowicz, D. J. (2001). The concept of ecological validity: what are its limitations and is it bad to be invalid? Infancy 2, 437–450. doi: 10.1207/s15327078in0204_03

Macdonald, R. G., and Tatler, B. W. (2013). Do as eye say: gaze cueing and language in a real-world social interaction. J. Vis. 13, 1–12. doi: 10.1167/13.4.6

Macdonald, R. G., and Tatler, B. W. (2018). Gaze in a real-world social interaction: a dual eye-tracking study. Q. J. Exp. Psychol. 71, 2162–2173. doi: 10.1177/1747021817739221

Mills, W. (1899). The nature of animal intelligence and the methods of investigating it. Psychol. Rev. 6:262. doi: 10.1037/h0074808

Mook, D. G. (1983). In defense of external invalidity. Am. Psychol. 38:379. doi: 10.1037/0003-066x.38.4.379

Neisser, U. (1976). Cognition and Reality: Principles and Implications Of Cognitive Psychology. San Fransisco, CA: W. H. Freeman and Company.

Neisser, U. (1991). A case of misplaced nostalgia. Am. Psychol. 46:34–36. doi: 10.1037/0003-066x.46.1.34

Osborne-Crowley, K. (2020). Social cognition in the real world: reconnecting the study of social cognition with social reality. Rev. Gen. Psychol. 1–15. doi: 10.4324/9781315648156-1

Parsons, T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Front. Hum. Neurosci. 9:660. doi: 10.3389/fnhum.2015.00660

Parsons, T. D., Carlew, A. R., Magtoto, J., and Stonecipher, K. (2017). The potential of function-led virtual environments for ecologically valid measures of executive function in experimental and clinical neuropsychology. Neuropsychol. Rehabil. 27, 777–807. doi: 10.1080/09602011.2015.1109524

Peelen, M. V., and Kastner, S. (2014). Attention in the real world: toward understanding its neural basis. Trends Cogn. Sci. 18, 242–250. doi: 10.1016/j.tics.2014.02.004

Puce, A., and Bertenthal, B. I. (2015). The Many Faces of Social Attention: Behavioral and Neural Measures , eds A. Puce and B. I. Bertenthal (Switzerland: Springer).

Richter, J. N., Hochner, B., and Kuba, M. J. (2016). Pull or push? Octopuses solve a puzzle problem. PLoS One 11:e0152048. doi: 10.1371/journal.pone.0152048

Risko, E. F., Laidlaw, K., Freeth, M., Foulsham, T., and Kingstone, A. (2012). Social attention with real versus reel stimuli: toward an empirical approach to concerns about ecological validity. Front. Hum. Neurosci. 6:143. doi: 10.3389/fnhum.2012.00143

Risko, E. F., Richardson, D. C., and Kingstone, A. (2016). Breaking the fourth wall of cognitive science: real-world social attention and the dual function of gaze. Curr. Direct. Psychol. Sci. 25, 70–74. doi: 10.1177/0963721415617806

Rogers, S. D., Kadar, E. E., and Costall, A. (2005). Gaze patterns in the visual control of straight-road driving and braking as a function of speed and expertise. Ecol. Psychol. 17, 19–38. doi: 10.1207/s15326969eco1701_2

Rubo, M., and Gamer, M. (2018). “Virtual reality as a proxy for real-life social attention?,” Paper presented at the Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. New York, NY.

Schilbach, L. (2015). Eye to eye, face to face and brain to brain: novel approaches to study the behavioral dynamics and neural mechanisms of social interactions. Curr. Opin. Behav. Sci. 3, 130–135. doi: 10.1016/j.cobeha.2015.03.006

Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., et al. (2013). Toward a second-person neuroscience. Behav. Brain Sci. 36, 393–414. doi: 10.1017/s0140525x12000660

Schmuckler, M. A. (2001). What is ecological validity? A dimensional analysis. Infancy 2, 419–436. doi: 10.1207/s15327078in0204_02

Shallice, T., and Burgess, P. W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain 114, 727–741. doi: 10.1093/brain/114.2.727

Shamay-Tsoory, S. G., and Mendelsohn, A. (2019). Real-life neuroscience: an ecological approach to brain and behavior research. Perspect. Psychol. Sci. 14, 841–859. doi: 10.1177/1745691619856350

Silverstein, C. H., and Stang, D. J. (1976). Seating position and interaction in triads: a field study. Sociometry 39, 166–170.

Simons, D. J., and Levin, D. T. (1998). Failure to detect changes to people during a real-world interaction. Psychonom. Bull. Rev. 5, 644–649. doi: 10.3758/bf03208840

Simons, D. J., Shoda, Y., and Lindsay, D. S. (2017). Constraints on generality (COG): a proposed addition to all empirical papers. Perspect. Psychol. Sci. 12, 1123–1128. doi: 10.1177/1745691617708630

Smilek, D., Birmingham, E., Cameron, D., Bischof, W., and Kingstone, A. (2006). Cognitive ethology and exploring attention in real-world scenes. Brain Res. 1080, 101–119. doi: 10.1016/j.brainres.2005.12.090

Smith, P. W., Feinberg, R. A., and Burns, D. J. (1998). An examination of classical conditioning principles in an ecologically valid advertising context. J. Market. Theor. Pract. 6, 63–72. doi: 10.1080/10696679.1998.11501789

Sonkusare, S., Breakspear, M., and Guo, C. (2019). Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23, 699–714. doi: 10.1016/j.tics.2019.05.004

Stoffregen, T. A. (2003). Affordances as properties of the animal-environment system. Ecol. Psychol. 15, 115–134. doi: 10.1016/j.humov.2019.01.002

Thorndike, E. (1899). A reply to “The nature of animal intelligence and the methods of investigating it”. Psychol. Rev. 6, 412–420. doi: 10.1037/h0073289

Thorndike, E. (2017). Animal Intelligence: Experimental Studies. Abingdon: Routledge.

Todorović, A. (2016, May 30). What’s in a Gabor patch? Available online at: http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/ (accessed April 1, 2020).

Tolman, E. C., and Brunswik, E. (1935). The organism and the causal texture of the environment. Psychol. Rev. 42:43. doi: 10.1037/h0062156

Valtakari, N. V., Hooge, I. T. C., Benjamins, J. S., and Keizer, A. (2019). An eye-tracking approach to Autonomous sensory meridian response (ASMR): the physiology and nature of tingles in relation to the pupil. PLoS One 14:e226692. doi: 10.1371/journal.pone.0226692

Wilson, B. A. (1993). Ecological validity of neuropsychological assessment: do neuropsychological indexes predict performance in everyday activities? Appl. Prevent. Psychol. 2, 209–215. doi: 10.1016/s0962-1849(05)80091-5

Winograd, E. (1988). “Continuities between ecological and laboratory approaches to memory,” in Emory Symposia in Cognition, 2. Remembering Reconsidered: Ecological and Traditional Approaches to the Study of Memory eds U. Neisser and E. Winograd (Cambridge: Cambridge University Press), 11–20. doi: 10.1017/cbo9780511664014.003

Wu, D. W.-L., Bischof, W. F., and Kingstone, A. (2013). Looking while eating: the importance of social context to social attention. Sci. Rep. 3:2356. doi: 10.1038/srep02356

Keywords: ecological validity, experiments, real-world approach, generalizability, definitions

Citation: Holleman GA, Hooge ITC, Kemner C and Hessels RS (2020) The ‘Real-World Approach’ and Its Problems: A Critique of the Term Ecological Validity. Front. Psychol. 11:721. doi: 10.3389/fpsyg.2020.00721

Received: 24 January 2020; Accepted: 25 March 2020; Published: 30 April 2020.

Copyright © 2020 Holleman, Hooge, Kemner and Hessels. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Gijs A. Holleman, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Experimental Realism

Experimental Realism Definition

Experimental realism is the extent to which situations created in social psychology experiments are real and impactful to participants.

Experimental Realism Versus Mundane Realism

Aronson and Carlsmith distinguished between experimental realism and mundane realism. Experimental realism refers to the extent to which participants experience the experimental situation as intended. Mundane realism refers to the extent to which the experimental situation is similar to situations people are likely to encounter outside of the laboratory. Social psychologists are generally more concerned with experimental realism than with mundane realism in their research because participants must find the situation attention-grabbing and convincing in order for the experiment to elicit targeted sets of beliefs, motives, and affective states necessary to test the research hypothesis. A study that accomplishes this can provide much important insight, independent of how much mundane realism it possesses.

For instance, in Stanley Milgram’s classic investigation of obedience, participants were instructed to administer a series of electric shocks to an unseen confederate (though no shocks were actually delivered). As part of a supposed learning study, participants acted as “teachers” and were instructed by the experimenter to administer shocks of increasing intensity for every wrong response offered by the confederate. The events of this study were highly artificial; it is certainly far from normal to administer shocks to another human being under the instruction of an experimental psychologist. Yet, rather than questioning the reality of the situation, participants became extremely invested in it. Because participants took the experimental reality seriously, they responded naturally, shuddering and laughing nervously as they obeyed and administered increasing levels of shock. Due to the high impact of this study, an otherwise sterile laboratory setting was transformed into a legitimate testing-ground for examining human obedience.

The Importance of Experimental Realism

Experimental realism is of central importance to experimental social psychology. To capture the essence of important social psychological phenomena within laboratory settings, it is often necessary to use deception to construct events that seem real, nontrivial, and impactful to participants, within the bounds of ethical considerations. When this is accomplished, participants are unlikely to be suspicious of deceptive tactics in experiments, allowing researchers to investigate the psychological variables they want to study.

  • Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (2nd ed., Vol. 2, pp. 1-79). Reading, MA: Addison-Wesley.

Ecological Validity in Psychology: Definition & Examples

Chris Drew (PhD)

Ecological validity is a subset of external validity , specifically focusing on the extent to which behaviors observed and recorded in a study can be expected to occur in real-world settings (Nestor & Schutt, 2018).

To grasp the concept better, think about a study on eating habits conducted in a laboratory with measured portions. The controlled environment of a lab, with allocated meals, differs drastically from a typical day for most people.

Therefore, even if the study presents robust data about people’s eating habits under these specific circumstances, it has limited ecological validity. It may not accurately indicate how people would eat in their own homes and routine conditions. The essence here is how well the study emulates real-life situations. 

Ecological Validity Definition in Psychology

Ecological validity is defined in the below quote from the Encyclopedia of Social Psychology (Baumeister & Vohs, 2007):

“Ecological validity is the extent to which research findings would generalize to settings typical of everyday life.”

Maintaining ecological validity is a critical consideration in research designs (Hammersley, 2019). Failing to consider this can limit the generalizability of the findings and their application to real-world scenarios. The best research designs aim to balance internal and external validity, including ecological validity, while acknowledging the trade-offs between control and realism.

High Ecological Validity Examples

1. Studying Shopping Habits A researcher could observe consumer behavior during a Black Friday sale in a large retail store. Observing customers’ decision-making processes, reactions to discounts, crowd navigation, and purchase patterns in this real-world retail environment provides high ecological validity, as it accurately reflects behaviors in a natural shopping scenario.

2. Classroom Learning Let’s consider a longitudinal study that observes students’ note-taking habits during actual classes over a semester. The researchers could investigate how these habits relate to performance on tests. This study would possess excellent ecological validity since it reflects real classroom conditions instead of contrived lab settings (Monahan, 2013).

3. Internet Use A researcher interested in studying online behavior could install software to track participants’ internet usage over several months. These data, reflecting the participants’ actual browsing patterns, website preferences, and time spent online, provide a high degree of ecological validity (Boase, 2013).

4. Workplace Dynamics For example, researchers studying teamwork and collaboration could observe an existing team in a corporation working on a long-term project. By observing real interactions, conflicts, problem-solving, and workflow in a natural work environment, the study maintains strong ecological validity.

5. Physical Activity A two-week study using fitness trackers to accurately record participants’ daily physical activity levels, sleep patterns, and heart rates would have stronger ecological validity compared to research relying solely on participants’ memory or honesty in self-report surveys. Such direct monitoring reflects actual behavior and routines, eliminating possible false reporting (Troiano et al., 2008).

Low Ecological Validity Examples

1. A Memory Test Laboratory studies often use tasks which aren’t representative of real-world memory usage. For example, a study might ask participants to memorize a random sequence of ten numbers and recall them after a certain time period. This scenario has low ecological validity because it isolates memory from its everyday context where people usually remember meaningful information like phone numbers or addresses in a familiar sequence.

2. A Sleep Study A study that investigates sleep patterns by having participants sleep in a lab under unfamiliar conditions, connected to various machines to monitor their sleep, has low ecological validity. This scenario varies substantially from how we naturally sleep in our comfortable, familiar home environment. Consequently, data collected this way may not accurately represent usual sleep patterns and could be influenced by factors such as anxiety or discomfort in the lab environment.

3. Emotional Response Study Let’s consider a research study that uses images or videos to elicit emotional responses from participants in a controlled laboratory setting. This study may have low ecological validity as it doesn’t account for the complex, nuanced emotional and stress responses that occur in real-life situations, which are often influenced by a variety of unpredictable elements, personal relationships, or specific contexts (Coan et al., 2006).

4. Food Preference Study Research that seeks to understand people’s food preferences by offering food samples in a lab may also have low ecological validity. People’s food choices in the real world can be influenced by factors such as ambiance, company, mood, and cultural preferences – factors which a controlled lab environment can’t replicate.

5. Group Dynamics Study A study of group dynamics in a work team would have low ecological validity if it involved only strangers working together in a one-off, contrived task. A more ecologically valid study may observe established work teams over some time as they work on meaningful, long-term projects.

Limitations of Ecological Validity

As crucial as ecological validity is in research, it does have its set of limitations that one needs to be aware of.

  • Time Constraints: Studies with a high degree of ecological validity often demand much more time, effort, and resources than controlled laboratory studies (Nestor & Schutt, 2018). Researchers must manage unpredictable variables that naturally occur in real-world contexts, necessitating a higher degree of flexibility during the data collection process.
  • Need for Control: Maintaining control over confounding variables becomes much more complex outside of a laboratory setting (Kazdin, 2010). In natural environments, countless factors could influence the results.
  • Struggles to Determine Cause and Effect: The more control you have over your dependent and independent variables , the more you can infer cause and effect. This is why experiments (which have lower ecological validity) tend to be able to infer causality while observational studies (which tend to have higher ecological validity) are merely descriptive. I discussed this in detail in my recent article on experimental vs observational studies .
  • Generalizability: Results from studies with high ecological validity can sometimes lack generalizability. This happens because these studies are conducted in specific real-world settings – the behaviors observed might not be applicable to different contexts (Shadish et al., 2011). 
  • Ethical Concerns: Ethical considerations might often limit what can be examined in real-world settings, making it challenging to observe certain behaviors in their natural environments, for example, emotions after traumatic events, or reactions to sensitive topics (Kenny, 2019).

While ecological validity offers valuable insights, it’s crucial to interpret results with an understanding of these limitations. 

Ecological Validity vs External Validity

Ecological validity is a sub-type of external validity . As Andrade (2018) argues:

“External validity examines whether the study findings can be generalized to other contexts. Ecological validity examines, specifically, whether the study findings can be generalized to real-life settings; thus ecological validity is a subtype of external validity.” (Andrade, 2018, p. 498)

External validity is the extent to which results of a study can be generalized to other contexts, including different settings, people, and times.

This type of validity answers the question:

Can we apply the findings of this study beyond the specific conditions under which the research was conducted? (Mook, 2010).

For instance, if an experimental medication was successfully tested in a controlled laboratory environment with a specific age group, external validity would assess the likelihood of its effectiveness in a broader demographic and in various environments.

Ecological validity is a more specific term. It specifically addresses the realism of the research setting and procedures, assessing if the conditions of a study are reflective of real-world circumstances.

Consider a study on group dynamics conducted within an office setting with actual coworkers versus a simulated office setting with strangers acting as coworkers. The former would have higher ecological validity due to its more natural environment and interactions.

Therefore, while external validity looks at the scope of generalization, ecological validity focuses on the truthfulness of the methodology to real-life situations (Bortoli, 2018).

Ecological Validity vs Mundane Realism

Ecological validity and mundane realism are two crucial dimensions of validity in the realm of psychological research, each possessing its unique characteristics and implications (Nestor & Schutt, 2018).

Ecological validity refers to the degree to which the behaviors observed in a research setting reflect the behaviors that occur in natural settings. Emphasizing the relationship between conditions of the study and real-world situations, it regards the extent to which a study’s findings can be applied to everyday life (Kazdin, 2010).

Mundane realism, on the other hand, is a form of ecological validity that focuses more strictly on the surface resemblance between the research scenario and situations people might encounter in their everyday lives (Aronson, 2013).

Mundane realism relates to the physical and superficial characteristics of the research environment.

For example, a laboratory study of people’s behavior at a virtual workplace, using a realistic workplace setting with actual office equipment, would count as a study with high mundane realism. Its focus is the extent to which the experimental situation, materials, and tasks are similar to circumstances people encounter in real life.

Both ecological validity and mundane realism strive to increase the applicability of research findings to real-world scenarios, but they do so with different orientations. Ecological validity focuses on the functional relationship of the behavior (how well the research behavior predicts real-world behavior), whereas mundane realism is concerned with the superficially similar features of the experiment and the real-world situation (Bargh, 2002).

Andrade, C. (2018). Internal, external, and ecological validity in research design, conduct, and evaluation. Indian journal of psychological medicine , 40 (5), 498-499.

Aronson, E., Wilson T.D., Akert R.M. (2013). Social Psychology (8th Edition) . London: Prentice Hall.

Bargh, J. A. (2002). Losing Consciousness: Automatic Influences on Consumer Judgment, Behavior, and Motivation. Journal of Consumer Research, 29(3), 280-285. doi: https://doi.org/10.1086/341577

Baumeister, R. F. & Vohs, K. D. (2007). Encyclopedia of social psychology . New York: Sage.

Boase, J., & Ling, R. (2013). Measuring mobile phone use: Self-report versus log data. Journal of Computer-Mediated Communication, 18 (4), 508-519. doi: https://doi.org/10.1111/jcc4.12021

Coan, J. A., Schaefer, H. S., & Davidson, R. J. (2006). Lending a hand: Social regulation of the neural response to threat. Psychological science , 17 (12), 1032-1039. doi: https://doi.org/10.1111/j.1467-9280.2006.01832.x

Gehl, J. (2010). Cities for People . Island Press.

Hammersley, M. (2019). Reflections on Linguistic Repertoire: Revolutionary Science, Breaching Experiments and Ecological Validity. Qualitative Research, 19 (1), 7–22. doi:10.1177/1468794118765086

Kazdin, A. E. (2021). Research design in clinical psychology . Cambridge: Cambridge University Press.

Kenny, D. A. (2019). Enhancing validity in psychological research. The American Psychologist , 74 (9), 1018–1028. https://doi.org/10.1037/amp0000531

Monahan, T., McArdle, G., & Bertolotto, M. (2013). Virtual reality for collaborative e-learning. Computers & Education, 60 (1), 1-15. Doi: https://doi.org/10.1016/j.compedu.2006.12.008

Mook, D. G. (2010). Classic experiments in psychology. Westport, Conn.: Greenwood Press.

Nestor, P. G., & Schutt, R. K. (2018). Research methods in psychology: Investigating human behavior . New York: Sage Publications.

Oldham, G. R., & Fried, Y. (2016). Job design research and theory: Past, present and future. Organizational behavior and human decision processes , 136 , 20-35. doi: https://doi.org/10.1016/j.obhdp.2016.05.002

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2011). Experimental and quasi-experimental designs for generalized causal inference . Boston: Houghton Mifflin.

Troiano, R. P., Berrigan, D., Dodd, K. W., Masse, L. C., Tilert, T., & McDowell, M. (2008). Physical activity in the United States measured by accelerometer. Medicine and science in sports and exercise , 40 (1), 181.

Research Article

Evaluating the perceived affective qualities of urban soundscapes through audiovisual experiments

  • Maria Luiza de Ulhôa Carvalho, 
  • Margret Sibylle Engel, 
  • Bruno M. Fazenda, 
  • William J. Davies

  • Published: September 5, 2024
  • https://doi.org/10.1371/journal.pone.0306261

The study of perceived affective qualities (PAQs) in soundscape assessments has increased in recent years, with methods varying from in-situ to laboratory. Through technological advances, virtual reality (VR) has facilitated evaluations of multiple locations in the same experiment. In this paper, VR reproductions of different urban sites were presented in an online and a laboratory environment, testing three locations in Greater Manchester (‘Park’, ‘Plaza’, and pedestrian ‘Street’) at two population densities (empty and busy) using the ISO/TS 12913–2 (2018) soundscape PAQs. The studied areas had audio and video recordings prepared for 360° video and binaural audio VR reproductions. The aims were to observe population density effects within locations (Wilcoxon test) and variations between locations (Mann-Whitney U test) within methods. Population density and comparisons among locations demonstrated a significant effect on most PAQs. Results also suggested that big cities can present homogeneous sounds, composing a ‘blended’ urban soundscape, independently of functionality. These findings can support urban design in a low-cost approach, where urban planners can test different scenarios and interventions.
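As a concrete illustration of the two nonparametric tests named above, the sketch below runs a Wilcoxon signed-rank test (paired, within-location) and a Mann-Whitney U test (independent, between-location) on invented 5-point PAQ ratings. The numbers are placeholders, not the study’s data.

```python
from scipy.stats import wilcoxon, mannwhitneyu

# Hypothetical 'pleasant' ratings on a 5-point scale (invented values).
park_empty  = [4, 5, 4, 3, 4, 5, 4, 4]
park_busy   = [3, 4, 3, 3, 2, 4, 3, 3]   # same raters, busy condition
street_busy = [2, 3, 2, 2, 3, 2, 1, 2]   # different location, independent raters

# Within-location effect of population density (paired samples):
stat, p = wilcoxon(park_empty, park_busy)
print(f"Wilcoxon, Park empty vs. busy: W = {stat}, p = {p:.3f}")

# Between-location comparison at the same density (independent samples):
stat, p = mannwhitneyu(park_busy, street_busy, alternative="two-sided")
print(f"Mann-Whitney U, Park vs. Street (busy): U = {stat}, p = {p:.3f}")
```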

Citation: Carvalho MLdU, Engel MS, Fazenda BM, Davies WJ (2024) Evaluating the perceived affective qualities of urban soundscapes through audiovisual experiments. PLoS ONE 19(9): e0306261. https://doi.org/10.1371/journal.pone.0306261

Editor: Shazia Khalid, National University of Medical Sciences, PAKISTAN

Received: October 31, 2023; Accepted: June 13, 2024; Published: September 5, 2024

Copyright: © 2024 Carvalho et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data (scores for the perceived affective qualities - PAQs) are within the manuscript and its Supporting Information files.

Funding: The work was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 and the Universidade Federal de Goiás pole Goiânia, Brazil. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Since the publication of ISO/TS 12913–2 [ 1 ], the characterisation of the affective attributes of the sonic environment has increased significantly over the years [ 2 – 7 ]. These affective attributes, or Perceived Affective Qualities (PAQs), originated from the research of Axelsson et al. [ 8 ]. They help to detect the sound qualities of an investigated area, resulting in tools for urban sound management, effective urban planning, and noise control [ 9 ]. Studies point out that understanding emotional responses to the soundscape supports design decisions [ 10 ], a better opportunity to achieve users’ satisfaction [ 11 ], and quality of life [ 12 ].

Regarding the emotional assessment of acoustic environments, the work of Axelsson et al. [ 8 ] has been the reference for soundscape research. Their model was based on Russell’s circumplex affective model for environments [ 13 ]. Axelsson et al. [ 8 ] synthesised the semantic scales into a two-dimensional space constructed from pleasantness and eventfulness, which was later adopted as the PAQs in method A of the standard ISO/TS 12913–2 [ 1 ]. Rotating these two axes by 45 degrees yields additional diagonal dimensions, mixtures of the orthogonal pleasant and eventful axes. Thus, the standard ISO/TS 12913–2 introduces and describes the resulting eight attributes as four pairs: ‘eventful-uneventful’, ‘pleasant-annoying’, ‘vibrant-monotonous’, and ‘calm-chaotic’. However, this model is still under investigation and validation in other languages through the Soundscape Attributes Translation Project [ 14 ]. Moreover, soundscape investigators lack consensus in identifying the origins and effects of emotional responses to sounds [ 4 , 15 , 16 ]. To assess these scales, researchers use self-reports, where people rate these sounds through methods ranging from in-situ experiments to laboratory experiments, including virtual reality (VR).
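The 45-degree rotation can be written down compactly. The sketch below projects the eight PAQ ratings onto the pleasantness and eventfulness axes; the weighting follows the convention later formalized in ISO/TS 12913-3, which is an assumption here (the text above cites only Part 2), so treat it as one common variant rather than the definitive formula.

```python
import math

def pleasantness_eventfulness(paq: dict) -> tuple[float, float]:
    """Project eight PAQ ratings onto pleasantness/eventfulness via the
    45-degree rotation: diagonal attributes contribute with weight cos(45°)."""
    c = math.cos(math.radians(45))
    p = (paq["pleasant"] - paq["annoying"]
         + c * (paq["calm"] - paq["chaotic"])
         + c * (paq["vibrant"] - paq["monotonous"]))
    e = (paq["eventful"] - paq["uneventful"]
         + c * (paq["chaotic"] - paq["calm"])
         + c * (paq["vibrant"] - paq["monotonous"]))
    return p, e

# Invented 5-point ratings for one location:
ratings = {"pleasant": 4, "annoying": 2, "calm": 4, "chaotic": 1,
           "vibrant": 3, "monotonous": 2, "eventful": 3, "uneventful": 2}
print(pleasantness_eventfulness(ratings))
```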

The main methods for subjective data collection in soundscape studies have been soundwalks, interviews, listening tests, and focus groups [ 17 ]. ISO/TS 12913–2 suggests the first two [ 1 ]. However, the systematic review by Engel et al. [ 17 ] demonstrated that most recent studies use listening tests on the topic of ‘soundscape quality’, using semantic differential tools to evaluate stimuli from parks, squares, shopping areas, and traffic, with students and academic staff as participants [ 17 ]. These controlled experiments take place in acoustically treated rooms with calibrated audio reproduction systems [ 18 ]. Such studies allow the investigation of various aspects influencing auditory codification and perception [ 19 ], guarantee purity and control of factors [ 18 ], and enable analyses of complex interactions or distinct effects [ 20 ]. In the laboratory, there are several listening experiment modalities: with or without visual material [ 21 ], from simple (mono) [ 22 ] to complex audio reproduction (spatial audio) [ 23 ], and multimodal (different sensory stimuli), potentially implemented through Virtual Reality (VR) experiments.

Furthermore, VR technology can facilitate the evaluation of multiple locations in the same experiment under safe conditions [ 18 ] and in a more engaging way [ 24 ], allowing observations of the effects on presence, realism, involvement, distraction level, and auditory aspects [ 25 ]. Participants are immersed in realistic scenarios, giving them a ‘sense of presence’ [ 26 ] that approximates the experience of being in the real place. Auditory, visual, tactile, and olfactory stimuli can enhance the multimodal experience. Regarding the virtual sonic environment, reproduction formats vary from mono to spatial audio [ 27 ]. Binaural audio played over headphones and ambisonic audio played over loudspeakers are the main forms of audio reproduction in soundscape studies. In the studies of Sun et al. [ 28 , 29 ], which tested spatial audio through headphones and loudspeakers in a VR experiment, participants’ subjective responses demonstrated that the sense of immersion and realism was not affected by the type of audio reproduction.

Nevertheless, field and VR laboratory tests should sustain the experiment’s ‘ecological validity’. To guarantee this condition, the laboratory reproduction of real-life audiovisual stimuli should create a sense of immersion and realism similar to the original scenery [ 30 ]. If similarities are maintained between real and VR reproductions, laboratory experiments can support research with controlled factors. However, controlled settings may also amplify effects and bias conclusions; thus, outcomes should be interpreted cautiously [ 6 ]. So far, most studies have confirmed similar soundscape perceptions between in-situ and laboratory VR listening experiments [ 6 , 31 – 33 ], pointing to VR methods as a good strategy for soundscape research.

Another self-report data collection method is the online experiment, which increased significantly during COVID-19. For example, purchases on the Lucid platform for online research data collection tripled from 2019 to 2020 [ 34 ]. The drawbacks of online experiments include reduced attentiveness [ 34 ], the lack of control over the audio reproduction and system calibration used by participants [ 32 ], the absence of assistants during the experiment, and unreliable responses from participants due to their individual contexts, among others [ 35 ]. The advantages of a web-based approach in soundscape studies include a higher number of participants, ease of sharing, and engagement of citizens in sound design and urban planning.

Regarding urban sound design, ‘local experts’ (people who live in and use the studied location [ 36 ]), local authorities, planners, designers, and anyone else related to the site should discuss their interests to define activities for the urban place [ 37 ]. Diversity in activities tends to create a more dynamic atmosphere in urban places. In these circumstances, acoustic zoning consists of separating sound sources in space, time, or both [ 37 ]. Bento Coelho describes in his soundscape design process that a sound catalogue or sound identity map should be developed, in which sounds are correlated to functions, activities, other senses, and the preferred sounds of the place [ 38 ]. Additionally, the appropriateness [ 7 ] and expectations [ 39 ] of the sonic environment should converge towards a coherent soundscape. The guidelines above can delimit acoustic zones based on sound sources, avoiding ‘lo-fi’ soundscapes. The latter describes sounds that cannot be easily located within an obscure population of sounds [ 40 ], which may represent a ‘blended’ sonic environment. Its opposite is the ‘hi-fi’ soundscape, with a clear distinction between foreground and background sounds [ 40 ], making it simple to identify the predominant sound source in the sonic environment.

The acoustically delimited zones can correlate with the characteristics and functions of locations. Urban soundscape studies cover sites ranging from natural places and public areas to squares, pedestrian streets, and shopping areas [ 17 ]. However, vibrant places are less studied. These relate to pleasant and eventful attributes linked to busy contexts with specific human activities [ 41 ]. Previous works confirm that the ‘presence of people’ leads to the ‘eventful’ dimension and may define a vibrant experience [ 3 , 29 ]. Most soundscape studies investigate parks, where natural sounds indicate psychological restoration [ 42 ], places for de-stressing [ 5 , 42 ], and improvement in the evaluation of the sonic environment [ 43 ]. These locations may represent pleasant places that can foster feelings of joy and help the public fulfil self-selected activities.

Based on the factors presented, this work adopts two VR experiments: an online experiment, the Manchester Soundscape Experiment Online (MCR online), carried out in 2020, and a laboratory experiment, the Laboratory VR Soundscape Experiment (LAB VR), carried out in 2022, both using spatial audio and 360° video recordings. Participants were exposed to three urban sites (Peel Park, an urban park; Market Street, a pedestrian street; and Piccadilly Gardens, a plaza) in two population densities (empty and busy), followed by a self-report of the soundscape PAQs. The investigated hypotheses are stated below. The Wilcoxon signed-rank test is applied for comparisons within each experiment between the empty and busy conditions of the same location. In this case, the null and alternative hypotheses are:

  • H 01 = The perceptual response (PAQs) will not change between population densities in the same location and experiment; and
  • H a1 = The perceptual response (PAQs) will change between population densities in the same location and experiment.

The Mann–Whitney U test is applied to compare the different soundscape locations within each data collection method, with the following hypotheses:

  • H 02 = The perceptual response (PAQs) will not change across the different urban locations for each data collection method; and
  • H a2 = The perceptual response (PAQs) will change across the different urban locations for each data collection method.

The PAQs of ISO/TS 12913–2 [ 1 ] were selected as the subjective responses given their international standardisation. The aim is to observe the PAQ results from the two perspectives above. The first concerns an evaluation within each experiment, where differences between the two population densities are analysed. The second investigates the variation between locations for each experimental method. The findings are intended to enhance comprehension of how people perceive the studied urban soundscape conditions through different VR methods, supporting urban sound design and future urban development appraisal [ 44 ].

2. Materials and methods

Fig 1 illustrates the investigated areas, defined according to a previous study by Carvalho et al. [ 45 ]. They were derived from a structured interview designed to identify locations representing the four quadrants of the ISO/TS 12913–2 [ 1 ] PAQ model (‘vibrant’, ‘calm’, ‘monotonous’, and ‘chaotic’).

Fig 1. The top illustrates all locations on the Manchester map. The middle row shows the ‘Street’ map, pictures of empty and busy conditions, the ‘Plaza’ map, and pictures of empty and busy conditions. The bottom row illustrates the ‘Park’ map, pictures of empty and busy conditions, north, and the UK map with Manchester’s position. The yellow dots are the evaluated sites. The areas shaded in blue are the areas studied. Pictures taken by Carvalho between 2019 and 2020.

https://doi.org/10.1371/journal.pone.0306261.g001

2.1 Study areas

Piccadilly Gardens (a popular plaza in the city centre) represented the ‘vibrant’ attribute and is called ‘Plaza’ from now on in the paper. Peel Park (a park at the University of Salford) exemplified the ‘calm’ attribute and is referred to as ‘Park’ hereafter. A bus stop (a common bus stop in front of the University of Salford) corresponded to the ‘monotonous’ attribute, and Market Street (a pedestrian commercial street) was selected for the ‘chaotic’ attribute and is hereinafter referred to as ‘Street’. The bus stop was excluded because the LAB VR experiment did not use this condition.

Piccadilly Gardens is the largest public space in central Manchester, with 1.49 ha and various functions such as crossing, eating places, children’s play, and space for small and large events [ 46 ]. A contemporary redesign turned the garden into a plaza in 2002 [ 46 ], adding a water fountain, a playground, a café, a barrier by Japanese architect Tadao Ando that also serves as protection for the central plaza, grass areas, and trees where people sit on sunny days. The location is bounded by Piccadilly Street to the north, Mosley Street to the west, Parker Street to the south, and the One Piccadilly Gardens building to the east. The constant sound source in both population densities was the water fountain. In the empty condition, the fountain sound was predominant, but mechanical sounds were also present in the background. In the busy condition, the predominant sound was a rich presence of human sounds, such as chatting and children shouting, while traffic sounds from nearby trams and their brakes were audible in the background.

Peel Park covers 9.40 ha and is one of the oldest public parks in the world, dating from 1846 [ 47 ]. Today, it is integrated with the Peel Park Campus of the University of Salford and includes walking paths, tall scattered trees, a playground, sculptures, a garden with flowerbeds, large green areas, and benches. The park is bounded by the student accommodation and the access to the David Lewis Sports Ground to the north; the River Irwell, with a bridge to The Meadow (a public green space) and a housing area, to the east; the Maxwell Building and the Salford Museum and Art Gallery to the south; and University House, the Clifford Library, and the Cockcroft Building to the west. The local population uses the location for ‘passive’ recreation, exercise, and crossing to other sites. The constant sound source in both population densities was nature sounds, specifically bird calls. In the empty condition, four different bird calls were predominant and could be identified: ‘Pica Pica’, ‘Eurasian Wren’, ‘Redwing’, and ‘Eurasian Treecreeper’. In the busy condition, bird calls could not be recognised because human sounds masked them, placing the nature sounds in the background, while the predominant foreground sounds were children talking, shouting, and playing football.

Market Street is approximately 370 metres long, with a 280-metre pedestrian zone occupying around 0.91 ha. It runs from Exchange Street on the west to High Street on the east. The pedestrian zone lies between High Street and Corporation Street, with primarily commercial activities such as clothes and shoe stores, banks, grocery stores, street food huts, gyms, bookstores, mobile phone stores, pharmacies, and coffee shops, plus three entrances to the Manchester Arndale shopping centre. Where the street is open to traffic, commercial activities relate more to beauty products, confectionery, stationery, clothing and footwear, and coffee shops, with access to the Royal Exchange Building. The constant sound source in both population densities was the ‘hoot’ of the nearby tram. In the empty condition, the predominant sounds were mechanical, such as snaps of machinery at different rhythms and frequency intervals; traffic and chatter were also present. In the busy condition, snaps were still present, but the predominant sounds were human-made, such as babble and footsteps.

2.2 Audiovisual preparation

Two different sets of footage of the same studied areas were used in the two methods: an online VR questionnaire (MCR online) and a laboratory VR experiment (LAB VR). The audiovisual stimuli were recorded separately for each experiment because participants in the MCR online complained about the video resolution; new recordings were therefore made with a higher-resolution camera for the LAB VR. Nevertheless, all recordings were made at the same positions. The study was conducted and approved by the Research, Innovation and Academic Engagement Ethical Approval Panel of the University of Salford (protocol code STR1819-31). Fig 2 illustrates the workflow for constructing the VR environments for the experiments.

Fig 2. Each column represents a stage.

https://doi.org/10.1371/journal.pone.0306261.g002

A SoundField ST250 microphone and a BSWA 308 sound pressure level meter were used, with recordings at a sampling rate of 44.1 kHz. For the MCR online, the microphone was plugged into a Zoom H6 Handy Recorder for the audio, and a Ricoh Theta S camera was used for the 360° video. For the LAB VR, the microphone was plugged into an Edirol R-44 recorder, and an Insta360 Pro 2 camera was used for the 360° video.

Given ethical approval restrictions, a sign warning ‘Filming in progress’ was displayed with the equipment before recordings for public awareness. With a previously calibrated sound pressure level meter, a one-minute sample of the A-weighted equivalent continuous sound pressure level (L Aeq,60 ) was registered so that the laboratory reproduction levels could be adjusted to the field levels. After starting the microphone and camera, the researcher clapped in front of the equipment to allow later audiovisual alignment.
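For readers unfamiliar with the metric, the sketch below shows how an equivalent continuous level such as L Aeq,60 can be computed from a calibrated pressure signal. It is a minimal illustration, assuming the signal is already A-weighted and expressed in pascals; the signal here is a random placeholder, not the authors’ recordings.

```python
import numpy as np

def l_eq(pressure_pa: np.ndarray, p_ref: float = 20e-6) -> float:
    """Equivalent continuous sound pressure level (dB) of a pressure signal.

    Assumes `pressure_pa` is a calibrated, already A-weighted signal in
    pascals; the A-weighting filter itself is omitted for brevity.
    """
    return 10.0 * np.log10(np.mean(pressure_pa ** 2) / p_ref ** 2)

fs = 44100                                           # sampling rate used in the paper
signal = np.random.normal(0.0, 0.02, size=60 * fs)   # placeholder 60 s signal
print(f"LAeq,60 ~ {l_eq(signal):.1f} dB(A)")
```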

Recordings were made in the early hours (4 to 6 am) of a weekday for the empty condition, and in the afternoon (2 to 4 pm) at the weekend for the busy condition. On arrival, the recording positions were set up so as not to interrupt circulation. The experimenter blended into the scenery, and the recordings lasted 10 to 12 minutes [ 29 ]. These procedures resembled those of the ‘Urban Soundscapes of the World’ project group [ 28 , 29 , 48 ].

Video files were transformed into equirectangular format (MCR online) or stitched together (LAB VR). Audio and video stimuli were synchronised in time using the initial clap, then verified and corrected when necessary. In the MCR online, the selected audiovisual stimuli had a 30-second duration, following a previous study [ 49 ]. The stimulus duration was changed to 8 seconds in the LAB VR, using an fMRI soundscape experiment as reference [ 50 ], because of a physiological test in another stage of the experiment.

Population density was calculated from the footage to select the audiovisual stimuli. The people-counting criteria followed a previous study that measured the number of individuals in a selected frame [ 51 ]. Surveys with ten participants were used to validate the footage selected for the empty and busy conditions; when the criteria were not met, new stimuli were selected. A descriptive analysis of the sound events, foreground, and background sounds was performed on the footage of the empty and busy conditions to select fragments rich in soundscape diversity [ 52 ], identity [ 53 ], character [ 54 ], and sound signals [ 40 ]. The LAB VR also had controlled sound signals, such as the water fountain at the ‘Plaza’, the tram hoot at the ‘Street’, and the bird calls at the ‘Park’, in both empty and busy conditions.
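The paper’s counting criteria follow [ 51 ] and are not given in code; as one plausible way to automate such a per-frame head count, the sketch below uses OpenCV’s pretrained HOG pedestrian detector. The file name and detector parameters are hypothetical.

```python
import cv2

# Hypothetical per-frame people count using OpenCV's pretrained
# HOG + linear-SVM pedestrian detector (one of several possible approaches;
# the paper itself follows the manual criteria of [51]).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("selected_frame.png")  # hypothetical frame exported from the footage
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
print(f"People detected in frame: {len(boxes)}")
```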

Audio files were calibrated to the field sound levels using a pre-calibrated high-frequency Head and Torso Simulator (HATS) connected to Brüel & Kjær PULSE software [ 6 ]. Audiovisual stimuli were aligned by rotating the audio through the azimuth angle θ using the first-order ambisonics equations, i.e., rotating the front-back (X) and left-right (Y) channels of the B-format (WXYZ) recordings [ 22 ]. The audio and video files were rendered into 3D head-tracked stimuli for VR reproduction. Stimulus reproduction was tested through the final experimental VR and headphone setup, recorded for calibration, verified at each step, and corrected when necessary.
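As a minimal sketch of the yaw rotation mentioned above: in first-order B-format, a rotation about the vertical axis leaves W (omnidirectional) and Z (vertical) untouched and mixes X and Y through a 2-D rotation. The function below illustrates this under an assumed sign convention (rotating the sound field counterclockwise by θ); the exact convention used in the paper is not specified.

```python
import numpy as np

def rotate_bformat_yaw(w, x, y, z, theta_rad):
    """Yaw-rotate a first-order B-format (WXYZ) signal by theta_rad.

    W and Z are invariant under yaw; X (front-back) and Y (left-right)
    mix through a standard 2-D rotation. Flip the sign of theta_rad to
    rotate the listener instead of the sound field.
    """
    x_rot = np.cos(theta_rad) * x - np.sin(theta_rad) * y
    y_rot = np.sin(theta_rad) * x + np.cos(theta_rad) * y
    return w, x_rot, y_rot, z

# Example: align a recording by rotating the field 30 degrees.
n = 44100
w, x, y, z = (np.zeros(n) for _ in range(4))  # placeholder channels
w, x, y, z = rotate_bformat_yaw(w, x, y, z, np.deg2rad(30.0))
```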

2.3 Participants and experimental procedures

Participants in both experiments were recruited through the mailing list of the Acoustics Research Centre of Salford, representing people with connections to the University of Salford, all above 18 years old. The MCR online also had respondents recruited by convenience sampling over the internet on social networks such as Facebook, Instagram, Twitter, and LinkedIn; they participated voluntarily from August 26 to November 30, 2020. LAB VR participants received £25 in Amazon vouchers as compensation and were recruited from June 27 to August 5, 2022.

The conditions were three locations (‘Park’, ‘Plaza’, and ‘Street’) in two population densities (empty and busy), each rated on the eight PAQ questions. The MCR online had 80 individuals rating the ‘Plaza’ and ‘Street’ (80 × 2 sites × 2 densities × 8 PAQs = 2,560 responses) and 75 assessing the ‘Park’ (75 × 2 densities × 8 PAQs = 1,200 responses). The LAB VR had 36 participants (36 × 3 sites × 2 densities × 8 PAQs = 1,728 responses).

At the beginning of both experiments, participants signed a written consent form and received an information sheet describing the experiment and its procedure. Since the MCR online also had Brazilian participants, the questionnaires were translated into Portuguese. Subjects were divided into two groups to reduce experimental time: ‘Plaza’ and ‘Street’, and ‘Park’ and a bus stop. Participants were advised to use headphones and, when using mobile phones, to rotate the device into landscape orientation for better performance.

In the LAB VR, tests were conducted inside a semi-anechoic chamber at the Acoustics Research Centre of the University of Salford, Manchester, UK. Since cases of COVID were still occurring (July 2022), an email detailing a COVID protocol was sent before arrival. Participants sat in the centre of the semi-anechoic chamber, watched a short video explaining the research, answered the general information questions, and completed a training session. They watched the six audiovisual stimuli through a VIVE HMD with Beyerdynamic DT 1990 Pro headphones as many times as they wished and answered the subjective questions presented on a laptop.

Questionnaires were developed on an online platform. For the MCR online, the questionnaire began with a written consent form. General questions covered demographics (gender, age, nationality, and residency), auditory health (evidence of hearing loss and tinnitus), and digital settings (the audio and video systems used during the experiment). Questions were answered after watching each video and were phrased: ‘Please, slide to the word that best describes the sounds you just heard. To the left (-) is NEGATIVE, and to the right (+) is POSITIVE.’ The paired PAQs, each presented with three synonyms, were ‘unpleasant-pleasant’, ‘uneventful-eventful’, ‘chaotic-calm’, and ‘monotonous-vibrant’. Scores were given through a slider ranging from -10 to +10, from negative to positive semantic values.

In the LAB VR, videos and questions were presented randomly. General questions covered demographics and auditory health (as in the MCR online), the number of languages spoken, education level, and acoustic or musical background (none, a little, moderate, or expert). The experimental questions were formulated: ‘To what extent do you think the sound environment you just experienced was. . . 0 = Not at all, 50 = Neutral, and 100 = Extremely’. The PAQs ‘pleasant’, ‘calm’, ‘uneventful’, ‘monotonous’, ‘annoying’, ‘chaotic’, ‘eventful’, and ‘vibrant’ were presented individually and rated through a slider. In both experiments, a final open question collected feedback on the experiment.

2.4 Statistical analysis

Since the two experiments used different scales, the MCR online results were processed by separating the paired PAQs and rescaling the -10 to +10 ratings to scores from zero (0) to one hundred (100), while the LAB VR data were kept on their original scale. A summary of the collected data is presented in Table 1 . The statistical analysis included the Wilcoxon signed-rank test for comparisons of the empty and busy conditions within the same location, and the Mann–Whitney U test for comparing the different locations at the same population density, both within the same experiment. Since comparisons were only between two conditions and data were collected on a continuous scale, a correction for multiple comparisons (Bonferroni) was considered unnecessary. Significant group differences were tested with the statistical package IBM SPSS Statistics 29.0.1.0®.

Table 1. https://doi.org/10.1371/journal.pone.0306261.t001
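As an illustration of the analysis described above, the sketch below rescales hypothetical MCR online slider ratings onto the 0–100 scale and runs both tests with SciPy. All scores are invented for demonstration; the paper’s actual analysis was performed in SPSS.

```python
import numpy as np
from scipy import stats

def rescale_mcr(scores):
    """Map MCR online slider ratings from [-10, +10] onto [0, 100]."""
    return (np.asarray(scores, dtype=float) + 10.0) * 5.0

# Hypothetical paired ratings (same participants, empty vs. busy, one PAQ):
empty = rescale_mcr([-4, -2, 0, 1, -3, -5, 2, -1])
busy = rescale_mcr([3, 5, 2, 6, 4, 1, 7, 2])
w_stat, w_p = stats.wilcoxon(empty, busy)          # within-location comparison

# Hypothetical independent ratings (two locations, same density):
park = rescale_mcr([5, 6, 4, 7, 5, 3, 6, 4])
plaza = rescale_mcr([1, 2, 0, 3, 1, -1, 2, 0])
u_stat, u_p = stats.mannwhitneyu(park, plaza, alternative="two-sided")

print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.3f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.3f}")
```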

3. Results

3.1 Descriptive analysis of participants

Table 2 presents the demographic information for the MCR online and LAB VR experiments. The MCR online experiment ran from August to November 2020. The 155 participants came from 63 countries: 52% from Brazil, 12% from the UK, and 14% from other parts of the world, including Europe, Africa, North and South America, Asia, and the Middle East. In Group 1, 80% used a computer screen and 20% a smartphone to watch the videos, while 76% used headphones and 24% external audio to reproduce the audio signals during the experiment; 89% declared no hearing loss and 11% some hearing loss; 77% reported no tinnitus and 23% signs of tinnitus [ 45 ]. In Group 2, 86% used a computer screen and 14% a smartphone, while 65% used headphones and 35% external audio; 90% declared no hearing loss and 10% some hearing loss; 81% reported no tinnitus and 19% signs of tinnitus [ 55 ].

Table 2. https://doi.org/10.1371/journal.pone.0306261.t002

For the LAB VR, participants came from 11 countries, with 47% from the United Kingdom, 17% from India, and 36% from other parts of the world, including Europe, Africa, South America, and Asia. 97% declared no hearing loss and 3% mild hearing loss; 83% reported no tinnitus and 17% infrequent or regular signs of tinnitus.

The MCR online counted 4.3 times more participants (N = 155) than the LAB VR (N = 36). In summary, over 50% of MCR online participants were Brazilian, followed by 12% British, with a predominant age range of 26 to 35 years (35%) and a balanced gender distribution.

3.2 Descriptive analysis of auditory stimuli

The acoustic and psychoacoustic characteristics of the auditory stimuli for each tested scenario are presented in Tables 3 and 4 . For the MCR online, 17 visits were made to Peel Park, Piccadilly Gardens, and Market Street from January to December 2019, on days with no precipitation, to collect recordings of the empty and busy conditions. For the LAB VR, nine visits for field recordings were made from December 2020 to July 2021, on days with no forecast precipitation, covering the empty and busy conditions at Piccadilly Gardens (‘Plaza’), Market Street (‘Street’), and Peel Park (‘Park’).

Table 3. https://doi.org/10.1371/journal.pone.0306261.t003

Table 4. Loudness (N), Sharpness (S), Roughness (R), Fluctuation Strength (FS), and Tonality (T). https://doi.org/10.1371/journal.pone.0306261.t004

As observed in Table 3 , the highest 1-min L Aeq value in the MCR online was for the ‘Plaza’ busy scenario, with 70 dB(A), while the lowest was for the ‘Park’ empty scenario, with 46 dB(A). In the LAB VR, the highest value was for the ‘Plaza’ empty scenario, with 64.5 dB(A), and the lowest for the ‘Park’ empty scenario, with 47.1 dB(A).

Table 4 shows the psychoacoustic metrics of each scenario’s auditory stimuli used in the LAB VR. The greatest values are observed at the ‘Plaza’ busy for Loudness (N = 23.01 sone), Sharpness (S = 1.84 acum), and Tonality (T = 0.25 tu); at the ‘Park’ empty for Roughness (R = 0.03 asper); at the ‘Park’ busy for Roughness (R = 0.03 asper) and Tonality (T = 0.25 tu); and at the ‘Street’ busy for Roughness (R = 0.03 asper) and Fluctuation Strength (FS = 0.04 vacil). The smallest values are observed at the ‘Street’ empty for Loudness (N = 10.61 sone), Sharpness (S = 1.31 acum), Roughness (R = 0.02 asper), Fluctuation Strength (FS = 0.02 vacil), and Tonality (T = 0.02 tu). Equally small values were observed for Sharpness (S = 1.31 acum) at the ‘Park’ busy; for Roughness (R = 0.02 asper) at the ‘Plaza’ busy; and for Roughness (R = 0.02 asper) and Fluctuation Strength (FS = 0.02 vacil) at the ‘Plaza’ empty.

3.3 Wilcoxon signed-ranks test results for busy versus empty conditions

The Wilcoxon signed-rank test evaluated how the spaces were rated in the busy and empty conditions for each location and data collection method. Table 5 shows the results; the test suits two related samples with a non-normal distribution, and significant p-values indicate differences between samples. 85.4% (41) of the PAQ comparisons presented significant differences between the empty and busy conditions in the studied locations, while 14.6% (7) showed an unexpected similarity. Fig 3 shows a set of boxplots for each studied area and data collection method, allowing comparison of the results in the busy and empty conditions. It also marks the significance level of the Wilcoxon signed-rank test, using * for p-values below 0.05 and ** for p-values below 0.001. In the boxplots, busy conditions show a wider distribution on qualities such as ‘calm’, ‘eventful’, ‘pleasant’, and ‘vibrant’ in all samples (3a-3f), while in empty conditions ratings concentrate around the neutral answer. A narrower distribution is also observed for the negative qualities ‘uneventful’ and ‘monotonous’.

Fig 3. Columns for ‘Plaza’ (3a & 3d), ‘Park’ (3b & 3e), and ‘Street’ (3c & 3f); rows for MCR online (3a-3c) and LAB VR (3d-3f). * for significant p-value at < .05, and ** for significant p-value at < .001.

https://doi.org/10.1371/journal.pone.0306261.g003

Table 5. * represents the p-value for 2-tailed significance. https://doi.org/10.1371/journal.pone.0306261.t005

As observed in Table 5 , the significant results for the MCR online dataset between busy and empty, in descending order, were as follows: the ‘eventful’ PAQ in the ‘Street’ (Z = -7.16, p<0.001); the ‘vibrant’ PAQ in the ‘Plaza’ (Z = -6.888, p<0.001); the ‘uneventful’ PAQ in the ‘Street’ (Z = -6.647, p<0.001); the ‘calm’ PAQ in the ‘Park’ (Z = -6.645, p<0.001); the ‘monotonous’ PAQ in the ‘Street’ (Z = -6.629, p<0.001); the ‘pleasant’ PAQ in the ‘Park’ (Z = -5.791, p<0.001); the ‘chaotic’ PAQ in the ‘Street’ (Z = -4.626, p<0.001); and the ‘annoying’ PAQ in the ‘Plaza’ (Z = -3.685, p<0.001).

As also observed in Table 5 , the PAQ with non-significant values in both the MCR online and LAB VR was ‘annoying’, with scores around zero in all studied areas except the ‘Plaza’ in the MCR online. Non-significant ratings were also observed for ‘pleasant’, with scores around 50 at the ‘Plaza’, and for ‘vibrant’, with neutral scores at the ‘Street’. These non-significant p-values indicate no perceived acoustic differences between the empty and busy conditions for those qualities.

For the LAB VR dataset, the largest differences between busy and empty, in descending order, were as follows: the ‘vibrant’ PAQ at the ‘Plaza’ (Z = -4.611, p<0.001); the ‘uneventful’ PAQ at the ‘Street’ (Z = -4.577, p<0.001); the ‘eventful’ PAQ at the ‘Park’ (Z = -4.263, p<0.001); the ‘monotonous’ PAQ at the ‘Street’ (Z = -4.229, p<0.001); the ‘calm’ PAQ at the ‘Park’ (Z = -4.227, p<0.001); the ‘chaotic’ PAQ at the ‘Street’ (Z = -3.99, p<0.001); and the ‘pleasant’ PAQ at the ‘Street’ (Z = -3.359, p<0.05).

3.4 Mann-Whitney U test results for comparison between locations

The Mann-Whitney U test compared the same population density condition across different locations within each data collection method. Table 6 shows the results; the test suits two independent samples with a non-normal distribution, and significant p-values indicate differences between locations. Some PAQs showed no differences among locations (p > 0.05). Figs 4 and 5 show the sets of boxplots for each location comparison and data collection method, allowing comparison of the results in the busy and empty conditions. They also mark the significance level of the Mann-Whitney U tests, using * for p-values below 0.05 and ** for p-values below 0.001.

Fig 4. Columns for comparisons of ‘Plaza’ vs. ‘Street’ (4a & 4d), ‘Park’ vs. ‘Street’ (4b & 4e), and ‘Park’ vs. ‘Plaza’ (4c & 4f); rows for empty (4a-4c) and busy (4d-4f) conditions. * for significant p-value at < .05, and ** for significant p-value at < .001.

https://doi.org/10.1371/journal.pone.0306261.g004

Fig 5. Columns for comparisons of ‘Plaza’ vs. ‘Street’ (5a & 5d), ‘Park’ vs. ‘Street’ (5b & 5e), and ‘Park’ vs. ‘Plaza’ (5c & 5f); rows for empty (5a-5c) and busy (5d-5f) conditions. * for significant p-value at < .05, and ** for significant p-value at < .001.

https://doi.org/10.1371/journal.pone.0306261.g005

Table 6. https://doi.org/10.1371/journal.pone.0306261.t006

For the MCR online, 64.6% (31) of the PAQ comparisons presented significant differences between locations, and 35.4% (17) had similar results. Fig 4 shows the results from the MCR online. In the ‘Plaza’ vs. ‘Street’ comparison, the empty condition shows a higher dispersion of results on the attribute ‘calm’ ( Fig 4A ), while in the busy condition the dispersion occurs on ‘vibrant’, ‘eventful’, ‘annoying’, ‘chaotic’, and ‘pleasant’ ( Fig 4D ). For the ‘Park’ vs. ‘Street’ comparison, the dispersion of responses in the empty condition falls on the ‘calm’, ‘monotonous’, and ‘uneventful’ attributes ( Fig 4B ), while for the busy condition it falls on the ‘eventful’, ‘pleasant’, ‘vibrant’, ‘annoying’, and ‘chaotic’ attributes ( Fig 4E ). In the ‘Park’ vs. ‘Plaza’ comparison, the attributes with greater dispersion in the empty condition are ‘calm’, ‘monotonous’, and ‘uneventful’ ( Fig 4C ), and in the busy condition ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ ( Fig 4F ).

Derived from Table 6 , the significant U values for each location comparison are presented in descending order. In the MCR online dataset, the greatest differences were as follows: for ‘Street’ vs. ‘Park’ busy, the ‘uneventful’ PAQ (U = 2754.5, p<0.05); for ‘Plaza’ vs. ‘Park’ busy, the ‘chaotic’ PAQ (U = 2471.5, p<0.05); for the same locations in the empty condition, the ‘monotonous’ PAQ (U = 2424.0, p<0.05); for ‘Plaza’ vs. ‘Street’ busy, the ‘calm’ PAQ (U = 2405.0, p<0.05); and for ‘Street’ vs. ‘Park’ empty, the ‘eventful’ PAQ (U = 2374.0, p<0.05).

Regarding the non-significant results, also presented in Fig 4 for the MCR online, ratings around zero were observed for several PAQs: ‘uneventful’ in ‘Plaza’ vs. ‘Street’ ( Fig 4D ) and ‘Park’ vs. ‘Plaza’ ( Fig 4F ), both in the busy condition; ‘eventful’ in ‘Plaza’ vs. ‘Street’ ( Fig 4A ) and ‘Park’ vs. ‘Plaza’ ( Fig 4C ), both in the empty condition; ‘annoying’ in ‘Park’ vs. ‘Plaza’ in both conditions ( Fig 4C and 4F ); and ‘calm’ and ‘chaotic’ in ‘Park’ vs. ‘Plaza’ in the empty condition ( Fig 4C ). Additionally, the ‘eventful’ scale had similar scores of around 50 for ‘Plaza’ vs. ‘Street’ in the busy condition ( Fig 4D ). For the ‘uneventful’ scale, the comparisons ‘Plaza’ vs. ‘Street’ ( Fig 4A ) and ‘Park’ vs. ‘Plaza’ ( Fig 4C ) in the empty condition had values around 20. The ‘pleasant’ PAQ scored around 60 and 25 in ‘Park’ vs. ‘Plaza’ for the empty ( Fig 4C ) and busy ( Fig 4F ) conditions, respectively. The ‘calm’ scores were around 60 in ‘Park’ vs. ‘Plaza’ in the empty condition ( Fig 4C ), and the ‘vibrant’ scores were around 25 in ‘Park’ vs. ‘Plaza’ in the busy condition ( Fig 4F ).

For the LAB VR, 62.5% (30) of the PAQ comparisons presented significant differences between locations, and 37.5% (18) had similar results. Fig 5 shows the results from the LAB VR. In the ‘Plaza’ vs. ‘Street’ comparison, the dispersion occurs on the attributes ‘calm’, ‘monotonous’, and ‘uneventful’ in the empty condition ( Fig 5A ), and on ‘pleasant’, ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ in the busy condition ( Fig 5D ). In the ‘Park’ vs. ‘Street’ comparison, the dispersion falls on ‘calm’, ‘monotonous’, and ‘uneventful’ in the empty condition ( Fig 5B ), and on ‘vibrant’, ‘chaotic’, and ‘annoying’ in the busy condition ( Fig 5E ). Finally, in the ‘Park’ vs. ‘Plaza’ comparison, the attributes with higher dispersion in the empty condition are ‘calm’, ‘pleasant’, ‘monotonous’, and ‘uneventful’ ( Fig 5C ), while in the busy condition ( Fig 5F ) the dispersion is observed on the ‘eventful’, ‘vibrant’, ‘annoying’, and ‘chaotic’ scales.

Derived from Table 6 , the significant U values for each location comparison, in descending order, were as follows: the ‘chaotic’ PAQ in ‘Street’ vs. ‘Park’ empty (U = 563.0, p<0.05); the ‘annoying’ PAQ in ‘Plaza’ vs. ‘Park’ busy (U = 506.5, p<0.05); the ‘uneventful’ PAQ in ‘Plaza’ vs. ‘Park’ empty (U = 473.5, p<0.05); the ‘monotonous’ PAQ in ‘Plaza’ vs. ‘Street’ empty (U = 457.5, p<0.05); the ‘monotonous’ PAQ in ‘Street’ vs. ‘Park’ busy (U = 365.0, p<0.001); and the ‘calm’ PAQ in ‘Plaza’ vs. ‘Street’ busy (U = 333.5, p<0.001).

Meanwhile, for the non-significant results, also shown in Fig 5 , ratings around zero were observed for several PAQs: ‘uneventful’ in ‘Park’ vs. ‘Street’ ( Fig 5E ) and ‘Park’ vs. ‘Plaza’ ( Fig 5F ), both in the busy condition; ‘monotonous’ in ‘Park’ vs. ‘Plaza’ in both conditions ( Fig 5C and 5F ); ‘chaotic’ in ‘Plaza’ vs. ‘Street’ empty; and ‘eventful’ in ‘Plaza’ vs. ‘Park’ empty. Four of the six location comparisons had scores around zero for the ‘annoying’ attribute: ‘Street’ vs. ‘Park’ empty, ‘Plaza’ vs. ‘Park’ empty, and ‘Plaza’ vs. ‘Street’ in both conditions ( Fig 5A and 5D ). Two comparisons scored around 50, for the ‘pleasant’ and ‘eventful’ scales in ‘Park’ vs. ‘Plaza’ busy ( Fig 5F ). Two comparisons scored around 40, for the ‘calm’ attribute in ‘Plaza’ vs. ‘Street’ empty ( Fig 5A ) and the ‘pleasant’ scale in ‘Plaza’ vs. ‘Street’ busy ( Fig 5D ). A score of around 30 appeared for ‘pleasant’ in ‘Plaza’ vs. ‘Street’ empty ( Fig 5A ). Meanwhile, the ‘uneventful’ score in ‘Park’ vs. ‘Street’ in the empty condition ( Fig 5B ) was around -50, and the ‘vibrant’ scale was around 10 and 60 in ‘Park’ vs. ‘Plaza’ for the empty ( Fig 5C ) and busy ( Fig 5F ) conditions, respectively.

4. Discussion

When testing the first hypothesis (H 01 ) regarding different population densities at the same site and experiment, the Wilcoxon signed-rank test demonstrated that 85% of comparisons were significantly different. The PAQs for ‘calm’, ‘eventful’, ‘pleasant’, ‘chaotic’, ‘monotonous’, and ‘uneventful’ led to rejecting the null hypothesis; that is, they changed with the number of people in the scenario ( Fig 3A–3F ). The ‘annoying’ PAQ in the ‘Plaza’ for the MCR online ( Fig 3A ), the ‘vibrant’ PAQ for all locations in the MCR online ( Fig 3A–3C ), and the same attribute at the ‘Park’ in the LAB VR ( Fig 3E ) also differed significantly with population density. Regarding the ‘Plaza’, the results corroborate the strategic urban plan from 2016 to turn Piccadilly Gardens (‘Plaza’) into a vibrant location [ 56 ]. These similar results may indicate that the two experimental methods were equivalent, given that recording positions, methods, and locations were the same, only at different moments. That is, perceptions of calmness always changed with population density at the ‘Park’, as did perceptions of eventfulness, pleasantness, uneventfulness, chaos, and monotony at the pedestrian street (‘Street’). This observation suggests that these attributes may be sound qualities to consider when studying similar locations.

In the ‘Plaza’, there was a constant water fountain sound. This sound could mask the background traffic noise, producing a positive sensation that could explain the unchanged pleasant rating. This masking effect was also observed in a study on environmental noise [ 57 ]. Similar results for the ‘pleasant’ and ‘vibrant’ qualities of water features showed no differences between laboratory and online experiments at three Naples waterfront sites [ 32 ]. This finding corroborates the concept of using water sound as a tool [ 58 , 59 ] to support urban sound management and planning [ 9 , 38 ].

When testing the second hypothesis (H 02 ) regarding differences among urban locations at the same population density and experimental method, the Mann-Whitney test presented 63% and 58% significant differences for the MCR online and the LAB VR, respectively. The ‘calm’ PAQ was significantly different in four location comparisons for the MCR online ( Fig 4A, 4B, 4D and 4E ) and in five for the LAB VR ( Fig 5B–5F ), supporting rejection of the null hypothesis. This tendency indicates that the ‘calm’ soundscape quality may be easier to assess, since quiet areas are the opposite of noise pollution. However, there is a common misconception about the definition of ‘calm’, which is easily confused with ‘quiet’: ‘calm’ denotes pleasant and harmonic sound sources, while ‘quiet’ refers to the absence of sound sources; calmness is more associated with silence, relaxation, and a tranquil area [ 60 ]. In addition, for the empty locations, resemblances among scores may be expected, given that early hours may evoke similar perceptions. The tendency towards similar results was unexpected for the comparison between the park and the plaza ( Fig 4F ), given that different space functionalities may indicate different soundscape ‘characters’, as observed by Bento Coelho [ 38 ] and Siebein [ 53 ].

In both experiments, neutral responses, considered here as values around zero, were observed in 56% of the Wilcoxon signed-rank comparisons, and in 54% and 44% of the Mann-Whitney comparisons for the MCR online and LAB VR, respectively (Figs 3 – 5 ). Such behaviour might be related to neutral emotions, which are also common in public opinion polls, because people avoid conflicting issues, especially when indifferent to or unfamiliar with the research topic or location [ 61 , 62 ]. Furthermore, neutrality may stem from a lack of familiarity with a location due to the absence of retrieved sound memory [ 63 ]. Since semantic memory consists of facts, concepts, data, general information, and knowledge [ 64 ], individuals’ opinions must be grounded in these elements to interpret and rate the sonic environment [ 65 ]. For example, in the Wilcoxon signed-rank test for the busy condition, the ‘monotonous’ and ‘uneventful’ scales were around zero at the same compared locations in both methods ( Fig 3 ). Meanwhile, in the Mann-Whitney test, unexpected similarities were observed in the MCR online in half of the compared locations for the ‘monotonous’ scale, with values above zero ( Fig 4 ). Similar zero scores were observed in the location comparisons for the ‘chaotic’, ‘annoying’, and ‘eventful’ qualities in the ‘Plaza’ vs. ‘Park’ empty in both experimental methods (Figs 4 and 5 ).

Another explanation for the neutrality of responses may be the uniformity of soundscapes, which gives an impression of ‘blended’ sounds. This could be termed a ‘blended urban soundscape’, common in big cities due to similar sound sources across differently functioning landscapes, and also identified by Schafer as a ‘lo-fi’ sound [ 40 ]. When the environment is excessively urbanised, with a population exceeding three million inhabitants, the sonic environment becomes somewhat normalised, so that people do not identify differences among the diverse urban soundscapes. These urban sonic environments are dominated by traffic and human-made sounds, constantly present in the background, while natural sounds have become rare. Such noise could cause neurological stress in the population, who become anaesthetised by overwhelming urban sounds. As Le Van Quyen [ 66 ] recommended, urban citizens should practice a ‘mental detox’, which includes spending time in a quiet environment. This principle reinforces the importance of maintaining and preserving quiet areas. It is also important to note that ‘blended soundscapes’ should be avoided when designing urban sound zones, to give character [ 38 , 53 ] and create diversity [ 67 ] within each site.

Another factor may be socio-cultural differences, since over 50% of the MCR online participants were Brazilian Portuguese speakers. Some PAQ English words may not have a common equivalent in Brazilian Portuguese, as observed by Antunes et al. [ 68 ]. Such translation inconsistencies were also encountered in countries participating in the SATP group [ 14 ], as observed in the Indonesian study [ 15 ]. Therefore, further investigations should continue to consolidate the English terminology [ 4 ] so that translations can improve. However, even though responses tended towards neutrality, the psychoacoustic indicators for the ‘Plaza’ busy scene showed higher values of loudness, sharpness, and tonality due to the sound source characteristics of the location. The most common sound sources there were the water sound from the fountain, children playing and shouting (sharpness, loudness, and tonality), tram circulation and tram brake sounds (sharpness and tonality), and babble (loudness) [ 17 , 69 ]. Most psychoacoustic indicators at the other locations and densities presented similar results, corroborating the characteristics of the ‘blended’ soundscapes.

Limitations of this work include the uncontrolled audio levels and varied smartphone audio reproduction in the online experiment, the participants’ lack of familiarity with the study areas, ‘social desirability’, in which participants wish to please the researcher [ 70 ], and the ‘experimenter effect’, where individuals must use their critical thinking in a way they never had to before [ 71 ]. It is recommended to adjust audio levels to the field sound levels at the beginning of an online experiment [ 72 ]. When smartphones are used in online experiments, it is also recommended to ask participants to report the brand of the device so that the factory calibration of its loudspeakers can be verified.

5. Conclusions

This work aimed to observe the PAQ results regarding differences between the two population densities at each location, and comparisons among locations for each experimental method. The study highlighted significant effects of population density and of location comparisons on the subjective responses. Still, the neutrality of many results did not help to characterise the soundscape diversity of a megalopolis. Meanwhile, the test of the second hypothesis revealed unexpected similarities among locations within each experimental method. Such behaviour was discussed and could be related to the participants’ unfamiliarity with the locations and to the homogeneity of the urban sonic environment, characterised here as ‘blended urban soundscapes’.

Based on the identified ‘blended soundscapes’, the findings highlight the importance of managing and planning the sonic environment through the clear delimitation of acoustic zones in line with the functionality of the space. Furthermore, soundscape tools should be investigated to increase the diversity of sound sources, enhancing the sonic environment with elements such as masking, biophony, noise reduction, noise barriers, selection of urban materials, and sound art installations, among others.

Future work includes evaluating cities with lower population densities to highlight the PAQs, avoid ‘blended’ soundscapes, and enrich the sonic environment for VR experiments. Further neurological evaluations should include more objective metrics for assessing cognitive responses to urban soundscapes and for understanding how socio-cultural differences are reflected in VR experiments. These VR findings can support urban design in a low-cost approach, where urban planners can test different scenarios and interventions.

Supporting information

https://doi.org/10.1371/journal.pone.0306261.s001

Acknowledgments

The authors thank participants and the Acoustic Research Centre staff from the University of Salford, UK for their contributions.

  • 1. International Organization for Standardization. ISO/TS 12913–2. Acoustics–Soundscape. Part 2: Methods and measurements in soundscape studies. Geneva, Switzerland. 2018.
  • 14. Aletta F, et al. Soundscape assessment: Towards a validated translation of perceptual attributes in different languages. In: Inter-noise and noise-con congress and conference proceedings 2020 Oct 12 (Vol. 261, No. 3, pp. 3137–3146). Institute of Noise Control Engineering.
  • 27. Rumsey F. Spatial audio. Routledge; 2012 Sep 10.
  • 28. Sun K, Botteldooren D, De Coensel B. Realism and immersion in the reproduction of audio-visual recordings for urban soundscape evaluation. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2018 Dec 18 (Vol. 258, No. 4, pp. 3432–3441). Institute of Noise Control Engineering.
  • 38. Coelho JB. Approaches to urban soundscape management, planning, and design. Soundscape and the built environment. Jian Kang & Brigitte Schulte-Fortkamp (editors). CRC Press, 2016: 197–214. Boca Raton, USA. https://doi.org/10.1201/b19145-11
  • 40. Schafer RM. The soundscape: Our sonic environment and the tuning of the world. Simon and Schuster; 1993 Oct 1.
  • 44. Sanchez GM, Alves S, Botteldooren D. Urban sound planning: an essential component in urbanism and landscape architecture. In: Handbook of research on perception-driven approaches to urban assessment and design 2018 (pp. 1–22). IGI Global.
  • 48. De Coensel B, Sun K, Botteldooren D. Urban Soundscapes of the World: Selection and reproduction of urban acoustic environments with soundscape in mind. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2017 Dec 7 (Vol. 255, No. 2, pp. 5407–5413). Institute of Noise Control Engineering.
  • 53. Siebein GW. ‘Creating and Designing Soundscape’, in Kang J. et al. (eds) Soundscape of European Cities and Landscapes—COST. 2013, Oxford: Soundscape-COST, pp. 158–162.
  • 55. Carvalho ML, Davies WJ, Fazenda B. Manchester Soundscape Experiment Online 2020: an overview. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2023 Feb 1 (Vol. 265, No. 1, pp. 5993–6001). Institute of Noise Control Engineering.
  • 63. Engel MS, Carvalho ML, Davies WJ. The influence of memories on soundscape perception responses. In: DAGA 2022 Proceedings. 2022. DAGA Stuttgart, pp. 1–4.
  • 72. Sudarsono AS, Sarwono J. The Development of a Web-Based Urban Soundscape Evaluation System. In: IOP Conference Series: Earth and Environmental Science 2018 May (Vol. 158, No. 1, p. 012052). IOP Publishing. https://doi.org/10.1088/1755-1315/158/1/012052


Ecological Validity, External Validity, and Mundane Realism in Hearing Science

Affiliation: Hearing Sciences - Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom. PMID: 35030552. DOI: 10.1097/AUD.0000000000001202

Tests of hearing function are typically conducted in conditions very different from those in which people need to hear and communicate. Even when test conditions are more similar, they cannot represent the diversity of situations that may be encountered by individuals in daily life. As a consequence, it is necessary to consider external validity: the extent to which findings are likely to generalize to conditions beyond those in which data are collected. External validity has long been a concern in many fields and has led to the development of theories and methods aimed at improving generalizability of laboratory findings. Within hearing science, along with related fields, efforts to address generalizability have come to focus heavily on realism: the extent to which laboratory conditions are similar to conditions found in everyday settings of interest. In fact, it seems that realism is now tacitly equated with generalizability. The term that has recently been applied to this approach by many researchers is ecological validity. Recent usage of the term ecological validity within hearing science, as well as other fields, is problematic for three related reasons: (i) it encourages the conflation of the separate concepts of realism and validity; (ii) it diverts attention from the need for methods of quantifying generalization directly; and (iii) it masks a useful longstanding definition of ecological validity within the field of ecological psychology. The definition of ecological validity first used within ecological psychology (the correlation between cues received at the peripheral nervous system and the identity of distant objects or events in the environment) is entirely different from its current usage in hearing science and many related fields. However, as part of an experimental approach known as representative design, the original concept of ecological validity can play a valuable role in facilitating generalizability. This paper will argue that separate existing terms should be used when referring to realism and generalizability, and that the definition of ecological validity provided by the Lens Model may be a valuable conceptual tool within hearing science.

Copyright © 2022 Wolters Kluwer Health, Inc. All rights reserved.


Conflict of interest statement

The authors have no conflict of interest to disclose.

  • Araújo D., Davids K., Passos P. (2007). Ecological validity, representative design, and correspondence between experimental task constraints and behavioral setting: Comment on Rogers, Kadar, and Costall (2005). Ecol Psychol, 19, 69–78. https://doi.org/10.1080/10407410709336951
  • Aronson E., Carlsmith J. (1968). Experimentation in social psychology. In: G. Lindzey, E. Aronson (Eds.), Handbook of Social Psychology (Vol. 2). Addison Wesley.
  • Bänziger T., Scherer K. R. (2005). The role of intonation in emotional expressions. Speech Commun, 46, 252–267. https://doi.org/10.1016/j.specom.2005.02.016
  • Blascovich J., Loomis J., Beall A. C., Swinth K. R., Hoyt C. L., Bailenson J. N. (2002). Immersive virtual environment technology as a methodological tool for social psychology. Psychol Inquiry, 13, 103–124. https://doi.org/10.1207/S15327965PLI1302_01
  • Bregman A. S. (1994). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.

Grants and funding

  • MRF_MRF-049-0004-F-BEEC-C0899/MRF/MRF/United Kingdom



The ‘Real-World Approach’ and Its Problems: A Critique of the Term Ecological Validity

Gijs A. Holleman, Ignace T. C. Hooge, Chantal Kemner, and Roy S. Hessels

1 Department of Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, Netherlands
2 Department of Developmental Psychology, Utrecht University, Utrecht, Netherlands
3 Brain Center, University Medical Center Utrecht, Utrecht, Netherlands

A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have typically conducted their research in experimental research settings, a.k.a. the ‘psychologist’s laboratory.’ Critics have often questioned whether psychology’s laboratory experiments permit generalizable results. This is known as the ‘real-world or the lab’-dilemma. To bridge the gap between lab and life, many researchers have called for experiments with more ‘ecological validity’ to ensure that experiments more closely resemble and generalize to the ‘real-world.’ However, researchers seldom explain what they mean with this term, nor how more ecological validity should be achieved. In our opinion, the popular concept of ecological validity is ill-formed, lacks specificity, and falls short of addressing the problem of generalizability. To move beyond the ‘real-world or the lab’-dilemma, we believe that researchers in psychological science should always specify the particular context of cognitive and behavioral functioning in which they are interested, instead of advocating that experiments should be more ‘ecologically valid’ in order to generalize to the ‘real-world.’ We believe this will be a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior.

Introduction

A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have traditionally conducted experiments in specialized research settings, a.k.a. the ‘psychologist’s laboratory’ ( Danziger, 1994 ; Hatfield, 2002 ). Over the course of psychology’s history, critics have often questioned whether psychology’s lab-based experiments permit the generalization of results beyond the laboratory settings within which these results are typically obtained. In response, many researchers have advocated for more ‘ecologically valid’ experiments, as opposed to the so-called ‘conventional’ laboratory methods ( Neisser, 1976 ; Aanstoos, 1991 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). In recent years, several technological advances (e.g., virtual reality, wearable eye trackers, mobile EEG devices, fNIRS, biosensors, etc.) have further galvanized researchers to emphasize the importance of studying human cognition and behavior in the ‘real-world,’ as new technologies will aid researchers in overcoming some of the inherent limitations of laboratory experiments ( Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ).

In this article, we will argue that the general aspiration of researchers to understand human cognition and behavior in the ‘real-world’ by conducting experiments that are more ‘ecologically valid’ (henceforth referred to as the ‘real-world approach’) is not without its problems. Most notably, we will argue that the popular term ‘ecological validity,’ which is widely used nowadays by researchers to discuss whether experimental research resembles and generalizes to the ‘real-world,’ is shrouded in both conceptual and methodological confusion. As we ourselves are interested in cognitive and behavioral functioning in the context of people’s everyday experience, and conduct experiments across various ‘laboratory’ and ‘real-world’ environments, we have seen how the uncritical use of the term ‘ecological validity’ can lead to rather misleading and counterproductive discussions. This holds not only for how this concept is used in many scholarly articles and textbooks, but also for presentations and discussions of experimental research at conferences, during the review process, and in conversations with students about experimental design and the analysis of evidence.

Although the usage of the term ecological validity has previously been criticized by several scholars ( Hammond, 1998 ; Schmuckler, 2001 ; cf. Araujo et al., 2007 ; Dunlosky et al., 2009 ), we think that these critiques have largely been overlooked. Therefore, it will be necessary to cover some of the same ground. The contribution of this article is threefold. First, we extend the critique of the term ecological validity and apply it to the field of social attention. Second, we scrutinize some of the assumptions that guide the contemporary framework of ecological validity, specifically those regarding artificiality–naturality and simplicity–complexity. Finally, our article is meant to educate a new generation of students and researchers on the historical roots and conceptual issues of the term ecological validity. This article consists of four parts. First, we will provide a brief history of the so-called ‘real-world or the lab’-dilemma and discuss several definitions and interpretations of the term ecological validity. Second, we will go into the historical roots of the concept of ecological validity and describe how the original meaning of this concept has transformed significantly. Third, we will scrutinize the prevailing assumptions that seem to guide how researchers are currently using the term ecological validity. Finally, we will apply our conceptual analysis to a specific field of study, namely the field of social attention. In recent years, this field has been particularly concerned with issues of ecological validity and generalizability. Therefore, the field of social attention offers an exemplary case to explain how the uncritical use of the terms ‘ecological validity’ and the ‘real-world’ may lead to misleading and counterproductive conclusions.

A Brief History of the ‘Real-World or the Lab’-Dilemma

The popular story of psychology (or the broader ‘cognitive sciences’) has it that “psychology became a science by rising from the ‘armchair’ of speculation and uncontrolled observation, and entering the laboratory to undertake controlled observation and measurement” ( Hatfield, 2002 , p. 208). The ‘psychologist’s laboratory’, a special room furnished with all kinds of lab paraphernalia and sophisticated equipment, has been regarded as the celebrated vehicle of psychology’s journey into sciencehood ( Danziger, 1994 ; Goodwin, 2015 ). However, despite psychologists’ long tradition of laboratory experimentation (for a history and discussion, see Gillis and Schneider, 1966 ), there have also been many critical voices saying that psychology’s laboratory experiments are too limited in scope to study how people function in daily life. For example, Brunswik (1943 , p. 262) once wrote that experimental psychology was limited to “narrow-spanning problems of artificially isolated proximal or peripheral technicalities of mediation which are not representative of the larger patterns of life”. Barker (1968 , p. 3) wrote that “it is impossible to create in the laboratory the frequency, duration, scope and magnitude of some important human conditions.” Neisser (1976 , p. 34) wrote that “contemporary studies of cognitive processes usually use stimulus material that is abstract, discontinuous, and only marginally real.” Bronfenbrenner (1977 , p. 513) wrote that “many of these experiments involve situations that are unfamiliar, artificial, and short-lived and that call for unusual behaviors that are difficult to generalize to other settings.” Kingstone et al. (2008 , p. 355) declared that “the research performed in labs, and the findings they generate, are in principle and in practice unlikely to be of relevance to the more complex situations that people experience in everyday life,” and Shamay-Tsoory and Mendelsohn (2019 , p. 1) stated that “conventional experimental psychological approaches have mainly focused on investigating behavior of individuals as isolated agents situated in artificial, sensory, and socially deprived environments, limiting our understanding of naturalistic cognitive, emotional, and social phenomena.”

According to these scholars, psychological science is faced with a gloomy predicament: findings and results based on highly controlled and systematically designed laboratory experiments may amount to no great discovery but only a “mere laboratory curiosity” ( Gibson, 1970 , pp. 426–427). As Anderson et al. (1999 , p. 3) put it: “A common truism has been that … laboratory studies are good at telling whether or not some manipulation of an independent variable causes changes in the dependent variable, but many scholars assume that these results do not generalize to the ‘real-world.’” The general concern is that, due to the ‘artificiality’ and ‘simplicity’ of the laboratory, some (if not many) lab-based experiments do not adequately represent the ‘naturality’ and ‘complexity’ of psychological phenomena in everyday life (see Figure 1 ). This problem has become familiar to psychologists as the ‘real-world or the lab’-dilemma ( Hammond and Stewart, 2001 ). At the heart of psychology’s ‘real-world or the lab’-dilemma lies a pernicious methodological choice: “Should it [psychological science] pursue the goal of generality by demanding that research be generalizable to ‘real life’ (aka the ‘real-world’), or should it pursue generalizability by holding onto its traditional laboratory research paradigm?” ( Hammond and Stewart, 2001 , p. 7).

Figure 1.

Examples of historical and contemporary laboratory rooms and field experiments. (A) A laboratory room from the early 20th century. A participant is seated in front of a ‘disc tachistoscope,’ an apparatus to display visual images (adapted from Hilton, 1920 ). (B) A picture of a field experiment by J. J. Gibson. Observers had to judge the size of an object in the distance (adapted from Gibson, 1950 ). (C) A 21st century eye-tracking laboratory. A participant is seated in front of an SMI Hi-Speed tower-mounted eye tracker (based on Valtakari et al., 2019 ). (D) A wearable eye tracker (barely visible) is used to measure gaze behavior while participants walked through corridors with human crowds ( Hessels et al., 2020 ). Copyright statement – Panels (A,B) : all photographs are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. The photograph from W. Hilton’s book Applied Psychology: Driving Power of Thought (original date of publication, 1920) was retrieved April 1, 2020, from http://www.gutenberg.org/files/33076/33076-h/33076-h.htm . The photograph from J. J. Gibson’s book The Perception of the Visual World (original date of publication, 1950, Figure 74, p. 184) was retrieved from a copy of the Utrecht University library. Panels (C,D) : photographs are owned by the authors and the people depicted in the images gave consent for publication.

Although psychological science is comprised of many specialized research areas, the goal to understand human cognition and behavior in the ‘real-world’ has become a critically acclaimed goal for psychologists and cognitive scientists of all stripes. Indeed, examples of the ‘real-world or the lab’-dilemma can be found not only in various ‘applied’ fields of psychology, such as ergonomics ( Hoc, 2001 ), clinical (neuro)psychology ( Wilson, 1993 ; Parsons, 2015 ), educational psychology ( Dunlosky et al., 2009 ), sport psychology ( Davids, 1988 ), marketing and consumer psychology ( Smith et al., 1998 ), and the psychology of driving ( Rogers et al., 2005 ), but also in the so-called ‘basic’ fields of psychological science, such as the study of perception ( Brunswik, 1956 ; Gibson, 1979/2014 ), attention ( Simons and Levin, 1998 ; Peelen and Kastner, 2014 ), memory ( Banaji and Crowder, 1989 ; Neisser, 1991 ; Cohen and Conway, 2007 ), social cognition ( Schilbach et al., 2013 ; Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ), judgment-and-decision making ( Koehler, 1996 ), and child development ( Lewkowicz, 2001 ; Schmuckler, 2001 ; Adolph, 2019 ).

The ‘Real-World Approach’: A Call for Ecological Validity

In the past decades, researchers have often discussed how they may overcome some of the limitations of laboratory-based experiments. Perhaps the largest common denominator of what we call the ‘real-world approach’ is a strong emphasis on ‘ecological validity.’ Over the past decades, the term ecological validity has made its appearance whenever researchers became concerned with the potential limitations of laboratory experiments (see e.g., Jenkins, 1974 ; Neisser, 1976 ; Banaji and Crowder, 1989 ; Aanstoos, 1991 ; Koehler, 1996 ; Smilek et al., 2006 ; Risko et al., 2012 ; Schilbach, 2015 ; Caruana et al., 2017 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). As Neisser (1976 , p. 33) famously put it:

“The concept of ecological validity has become familiar to psychologists. It reminds them that the artificial situation created for an experiment may differ from the everyday world in crucial ways. When this is so, the results may be irrelevant to the phenomena that one would really like to explain.”

The main problem, according to Neisser and many others, is that experiments in psychological science are generally “lacking in ecological validity” ( Neisser, 1976 , p. 7; Smilek et al., 2006 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). Aanstoos (1991 , p. 77) even referred to this problem as the “ecological validity crisis.” To counter this problem, many researchers have called for studies with ‘more’ or ‘greater’ ecological validity. For example, Koehler (1996 , p. 1) advocated for a “more ecologically valid research program,” Schilbach (2015 , p. 130) argued for “the inclusion of more ecologically valid conditions,” and Smilek et al. (2006 , p. 104) suggested that “in order for results to generalize to real-world scenarios we need to use tasks with greater ecological validity.” Clearly, ecological validity is regarded as an important feature of experimental research by researchers who pursue the ‘real-world approach.’ However, in our opinion, and we are not alone in this regard (see also Hammond, 1998 ; Araujo et al., 2007 ; Dunlosky et al., 2009 ), this notion of ecological validity has caused considerable confusion. To foreshadow some of our criticism of ecological validity, we will show that this concept has largely been detached from its original parentage (cf. Brunswik, 1949 ), and is now host to different interpretations guided by questionable assumptions (for a history, see Hammond, 1998 ). Worst of all, the concept is often wielded as a blunt weapon to criticize and dismiss experiments, even though researchers seldom make explicit what definition of ecological validity they use or by which set of criteria they have evaluated a study’s ecological validity (as previously pointed out by Hammond, 1998 ; Schmuckler, 2001 ; Dunlosky et al., 2009 ).

The Big Umbrella of Ecological Validity

In past decades, the concept of ecological validity has been related to various facets of psychological research, for example, the ecological validity of stimuli ( Neisser, 1976 ; Risko et al., 2012 ; Jack and Schyns, 2017 ), the ecological validity of tasks ( Smilek et al., 2006 ; Krakauer et al., 2017 ), the ecological validity of conditions ( Schilbach, 2015 ; Blanco-Elorrieta and Pylkkänen, 2018 ), the ecological validity of research settings ( Bronfenbrenner, 1977 ; Schmuckler, 2001 ), the ecological validity of results ( Eaton and Clore, 1975 ; Greenwald, 1976 ; Silverstein and Stang, 1976 ), the ecological validity of theories ( Neisser, 1976 ), the ecological validity of research designs ( Rogers et al., 2005 ), the ecological validity of methods ( Banaji and Crowder, 1989 ), the ecological validity of phenomena ( Johnston et al., 2014 ), the ecological validity of data ( Aspland and Gardner, 2003 ), and the ecological validity of paradigms ( Macdonald and Tatler, 2013 ; Schilbach et al., 2013 ). However, despite the popular usage of this term, specific definitions and requirements of ecological validity are not always clear.

A closer look at the literature suggests that different definitions and interpretations are used by researchers. Let’s consider some examples from the literature in which researchers have been more explicit in their definitions of ecological validity. For example, Ashcraft and Radvansky (2009 , p. 511) defined ecological validity as: “The hotly debated principle that research must resemble the situations and task demands that are characteristic of the real-world rather than rely on artificial laboratory settings and tasks so that results will generalize to the real-world, that is, will have ecological validity.” Another influential definition of ecological validity was given by Bronfenbrenner (1977) , who defined ecological validity as “the extent to which the environment experienced by the subjects in a scientific investigation has the properties it is supposed or assumed to have by the investigator” (p. 516). In Bronfenbrenner’s view, a study’s ecological validity should not be predicated on the extent to which the research context resembles or is carried out in a ‘real-life’ environment. Instead, theoretical considerations should guide one’s methodological decisions on what type of research context is most appropriate given one’s focus of inquiry. For example, if one is interested in the behavioral responses of children when they are placed in a ‘strange situation,’ then a laboratory room may be adequately suited for that particular research goal. However, if one is interested in how children behave within their home environment, then a laboratory room may not be the most suitable research context. As Bronfenbrenner (1977 , p. 516) remarked: “Specifically, so far as young children are concerned, the results indicate that the strangeness of the laboratory situation tends to increase anxiety and other negative feeling states and to decrease manifestations of social competence.”

Ecological validity has also been used interchangeably with (or regarded as a necessary component of) ‘external validity’ ( Berkowitz and Donnerstein, 1982 ; Mook, 1983 ; Hoc, 2001 ). The concept of external validity typically refers to whether a given study result or conclusion, usually obtained under one set of conditions and with one group of participants, can also be generalized to other people, tasks, and situations ( Campbell, 1957 ). For example, in the literature on neuropsychological assessment and rehabilitation, ecological validity has primarily been conceptualized as “ … the degree to which clinical tests of cognitive functioning predict functional impairment” ( Higginson et al., 2000 , p. 185). In this field, there has been much discussion about whether the neuropsychological tests used by clinicians accurately predict cognitive and behavioral impairments in everyday life ( Heinrichs, 1990 ; Wilson, 1993 ). One major concern is that the test materials are either too abstract or too general to adequately represent the kind of problems that people with cognitive and neurological impairments encounter in their daily routines, for example, while cooking or buying food at the supermarket. In response, various efforts have been made to increase the ecological validity of neuropsychological tests, for example, by developing performance measures with relevance for everyday tasks and activities ( Shallice and Burgess, 1991 ; Alderman et al., 2003 ), by combining and correlating tests results with behavioral observations and self-reports ( Wilson, 1993 ; Higginson et al., 2000 ), and by using Virtual Reality (VR) applications to create test situations in which a patient’s cognitive and functional impairments are likely to be expressed ( Parsons, 2015 ; Parsons et al., 2017 ).

The Historical Roots of Ecological Validity

As we have seen, definitions and interpretations of ecological validity may not only differ among researchers, but also across various subfields within psychology. As such, it is not always clear how the concept should be interpreted. Interestingly, the term ecological validity used to have a very precise meaning when it was first introduced to psychological science by Brunswik (1949 , 1952 , 1955 , 1956) . Brunswik coined the term ‘ecological validity’ to describe the correlation between a proximal sensory cue (e.g., retinal stimulation) and a distal object-variable (e.g., object in the environment). In Brunswik’s terminology, ecological validity refers to a measure (a correlation coefficient) that describes a probabilistic relationship between the distal and proximal layers of an organism-environment system. According to Brunswik (1955) : “A correlation between ecological variables, one which is capable of standing in this manner as a probability cue for the other, may thus be labeled “ecological validity”” (p. 199). Brunswik (1952) believed psychology to primarily be a science of organism-environment relations in which the “organism has to cope with an environment full of uncertainties” (p. 22). In Brunswik’s ‘lens model’ ( Brunswik, 1952 ), the ecological validities of perceptual cues indicate the potential utility of these cues for the organism to achieve its behavioral goals. Note that Brunswik’s concept of ecological validity is very different from how the term is generally used nowadays, namely to discuss and evaluate whether some laboratory-based experiments resemble and generalize to the ‘real-world’ (cf. Neisser, 1976 ; Smilek et al., 2006 ; Ashcraft and Radvansky, 2009 ; Shamay-Tsoory and Mendelsohn, 2019 ).
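To make Brunswik’s original usage concrete, the following minimal sketch (our illustration, not from Brunswik or from the literature discussed here) computes the ecological validity of a proximal cue as a correlation coefficient. The simulated ecology of object sizes and viewing distances is entirely invented for the example.

```python
# A minimal sketch of Brunswik's 'ecological validity' as a correlation
# between a distal variable and a proximal cue. The simulated ecology
# below is purely illustrative.
import numpy as np

rng = np.random.default_rng(seed=42)

n = 1000
distal_size = rng.uniform(0.5, 2.0, n)   # physical object sizes (m)
distance = rng.uniform(2.0, 20.0, n)     # viewing distances vary (m)

# The proximal cue (retinal image size) shrinks with distance, so it is
# only a probabilistic indicator of the distal object size.
proximal_cue = distal_size / distance

# Brunswik's ecological validity of this cue: its correlation with the
# distal variable across the sampled ecology.
ecological_validity = np.corrcoef(distal_size, proximal_cue)[0, 1]
print(f"Ecological validity of the retinal-size cue: {ecological_validity:.2f}")
```

In Brunswik’s terms, a cue with a validity near 1 is a highly trustworthy ‘local representative’ of the distal variable, whereas a validity near 0 marks a cue that offers the organism little to go on.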

The erosion and distortion of Brunswik’s definition of ecological validity has been documented by several scholars (e.g., Hammond, 1998 ; Araujo et al., 2007 ; Holleman et al., in press ). As explained by Hammond (1998) , the original definition of ecological validity, as Brunswik (1949 , 1952) introduced it, has been conflated with Brunswik’s ‘representative design’ of experiments ( Brunswik, 1955 , 1956 ). Representative design was Brunswik’s methodological program for psychological science to achieve generalizability of results. To achieve this, researchers should not only conduct proper sampling on the side of the subjects, by sampling subjects who are representative of a specific ‘target population’ (e.g., children, patients), but researchers should also sample stimuli, tasks, and situations which are representative of a specific ‘target ecology.’ As such, an experiment may be treated as a sample of this ‘target ecology.’ By virtue of sampling theory, researchers may then determine whether results can be generalized to the intended conditions. In short, representative design requires researchers to first specify the conditions toward which they intend to generalize their findings, and then specify how those conditions are represented in the experimental arrangement ( Brunswik, 1956 ). For more in-depth discussions on representative design, see Hammond and Stewart (2001) ; Dhami et al. (2004) , and Hogarth (2005) .
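As a toy illustration of this sampling logic, consider the sketch below. The ‘target ecology’ and its frequencies are entirely hypothetical; the point is only that stimuli and situations are sampled in proportion to a specified ecology rather than hand-picked.

```python
# A toy sketch of Brunswik's 'representative design': the experiment is
# treated as a sample of a specified target ecology. All frequencies here
# are hypothetical, for illustration only.
import random

random.seed(1)

# Suppose the target ecology is 'encounters during a campus walk',
# catalogued with (made-up) relative frequencies of occurrence.
target_ecology = {
    "stranger_at_a_distance": 0.70,
    "acquaintance_passing_by": 0.20,
    "friend_stopping_to_talk": 0.10,
}

# Sample experimental trials in proportion to the ecology, rather than
# presenting each situation equally often, as a systematic design would.
trials = random.choices(
    population=list(target_ecology),
    weights=list(target_ecology.values()),
    k=20,
)
print(trials)
```

By sampling theory, results from such an experiment can then be generalized to the ecology from which the situations were drawn, and to nothing more.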

A Systematic Approach to Ecological Validity?

The current lack of terminological precision surrounding ecological validity is, to say the least, problematic. There seems to be no agreed-upon definition in the literature, nor any means of classification to determine or evaluate a study’s ecological validity. This seems to be at odds with the relative ease with which researchers routinely invoke this concept to discuss the limitations and shortcomings of laboratory experiments. All the while, researchers seldom make clear how they have determined a study’s ecological (in)validity. As Schmuckler (2001 , p. 419) pointed out: “One consequence of this problem is that concerns with ecological validity can be raised in most experimental situations.” To overcome these problems, several scholars have emphasized the need for a more systematic approach to ecological validity ( Lewkowicz, 2001 ; Schmuckler, 2001 ; Kingstone et al., 2008 ; Risko et al., 2012 ). For example, Lewkowicz (2001 , p. 443) wrote that: “What is missing is an independent, objective, and operational definition of the concept of ecological validity that makes it possible to quantify a stimulus or event as more or less ecologically valid.” According to Schmuckler (2001) , ecological validity can be evaluated on at least three dimensions: (1) the nature of the stimuli; (2) the nature of the task, behavior, or response; (3) the nature of the research context. Researchers have primarily discussed these dimensions in terms of their artificiality–naturality (e.g., Hoc, 2001 ; Schmuckler, 2001 ; Risko et al., 2012 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ), and their simplicity–complexity (e.g., Kingstone et al., 2008 ; Peelen and Kastner, 2014 ; Lappi, 2015 ). As such, a general framework can be construed where stimuli, tasks, behaviors, and research contexts can be evaluated on a continuum of artificiality–naturality and simplicity–complexity (see also Risko et al., 2012 ; Lappi, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). At one extreme is the laboratory, characterized by its artificiality and simplicity. At the other extreme is the ‘real-world,’ characterized by its naturality and complexity. According to this multidimensional framework, researchers may determine a study’s overall ecological validity by combining (e.g., averaging or summing) the main components of ecological validity (i.e., stimuli, tasks/behaviors, research context) in terms of their relative artificiality–naturality and simplicity–complexity. However, while many researchers have conceptualized ecological validity alongside these dimensions, we think there are several problems to consider. Since the dimensions of this framework are supposedly important to determine the ecological validity of experimental research, this raises the question of how researchers can judge the artificiality–naturality and simplicity–complexity of particular experiments. This question will be explored in the following sections.
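To see why this framework is hard to operationalize, consider the following hypothetical scoring scheme. Every number and the combination rule (here, a plain average) are choices we made up for illustration; as argued above, the literature offers no agreed criteria for making them.

```python
# A hypothetical operationalization of the multidimensional framework:
# rate each component on artificiality-naturality and simplicity-complexity,
# then combine the ratings. All values and the averaging rule are arbitrary,
# which is precisely the problem discussed in the text.
from statistics import mean

# Ratings on a 0 (artificial/simple) to 1 (natural/complex) scale -- made up.
study_ratings = {
    "stimuli":          {"naturality": 0.2, "complexity": 0.3},
    "task_or_behavior": {"naturality": 0.4, "complexity": 0.5},
    "research_context": {"naturality": 0.1, "complexity": 0.2},
}

component_scores = {name: mean(dims.values()) for name, dims in study_ratings.items()}
overall = mean(component_scores.values())

print(component_scores)
print(f"'Overall ecological validity': {overall:.2f} -- but by whose criteria?")
```

The sketch makes the framework’s hidden assumptions explicit: someone must decide how to rate each dimension and how to weigh the components, and no formula for either is given in the literature.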

Artificiality – Naturality

The contrast between ‘artificiality’ and ‘naturality’ is a particularly prominent point of discussion in the ‘real-world or the lab’-dilemma and when researchers talk about the ecological validity of experimental research practices ( Hoc, 2001 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ). According to Hoc (2001 , pp. 282–283), ‘artificial’ situations are “those that are specifically designed for research” and ‘natural’ situations are “the target situations to be understood by research” . Importantly, Hoc (2001) notes that this distinction is made from the perspective of the researcher. However, this artificiality–naturality distinction should also be considered from the subject’s point of view. For example, according to Sonkusare et al. (2019) : “naturalistic paradigms can be heuristically defined as those that employ the rich, multimodal dynamic stimuli that represent our daily lived experience, such as film clips, TV advertisements, news items, and spoken narratives, or that embody relatively unconstrained interactions with other agents, gaming environments, or virtual realities” (p. 700). Furthermore, researchers have long recognized that artificiality arises when the experimental methods employed by researchers interfere with the naturality of the psychological phenomena one aims to study. Consequently, there is always an inherent trade-off between the degree of artificiality imposed by the experimental conditions and the naturality of the phenomena under scientific investigation ( Brunswik, 1956 ; Barker, 1968 ; Banaji and Crowder, 1989 ; Kingstone et al., 2008 ; Risko et al., 2012 ; Caruana et al., 2017 ). However, as Winograd (1988) has previously remarked, it remains difficult to “draw a line where artificiality ends and ecological validity … for real events begins” (p. 18).

Interestingly, discussions on the naturality–artificiality of experimental methods have a long pedigree in psychological science. By the end of the 19th century, Thorndike (1899) and Mills (1899) already argued fiercely about what methodology should be favored to study the behavior of cats. Mills dismissed Thorndike’s work because of the artificiality of the experimental methods employed by Thorndike (see Figure 2 ), whereas Thorndike regarded the ethological approach favored by Mills as a collection of uncritical observations and anecdotes. Mills (1899 , p. 264) wrote that: “Dr. Thorndike … has given the impression that I have not made experiments, or ‘crucial experiments’ … I may remark that a laboratory as ordinarily understood is not well suited for making psychological experiments on animals”. Mills’ point was that: “cats placed in small enclosures … cannot be expected to act naturally. Thus, nothing about their normal behavior can be determined from their behavior in highly artificial, abnormal surroundings” ( Goodwin, 2015 , p. 200). In response to Mills, Thorndike (1899 , p. 414) replied: “Professor Mills does not argue in concrete terms, does not criticize concrete unfitness in the situations I devised for the animals. He simply names them unnatural.” Thorndike clearly did not accept Mills’ charge on the artificiality of his experimental arrangements to study the behavior of cats because Mills did not define what should be considered natural behavior in the first place.

Figure 2.

A ‘puzzle box’ devised by Thorndike (1899 , 2017) to study learning behavior of cats. A hungry cat is placed in a box which can be opened if the cat pushes a latch. A food reward (‘positive reinforcer’) will be obtained by the cat if it figures out how to escape from the box. Thorndike discovered that after several trials, the time it takes the cat to escape from the box decreases. Experiments with puzzle boxes remain popular today to study the cognitive capacities of animals, for example, see Richter et al. (2016) for a study with octopuses. Copyright statement – Image created and owned by author IH and is based on E. L. Thorndike’s book: Animal Intelligence (Original date of publication, 1911, Figure 1, p. 30).

We think that this historical discussion between Thorndike and Mills is illuminating, because it characterizes the heart of the discussion on ecological validity nowadays. Namely, what exactly did Mills consider to be ‘natural’ or ‘normal’ behavior? And how did Mills determine that Thorndike’s experiments failed to capture the ‘natural’ behavior of cats? Following Thorndike’s point on the matter, we think that researchers cannot readily determine the naturality–artificiality of any given experimental arrangement, at least not without specifying what is entailed by these ascriptions. As Dunlosky et al. (2009 , p. 431) previously remarked: “A naturalistic setting guarantees nothing, especially given that “naturalistic” is never unpacked – what does it mean?”. Indeed, our survey of the literature also shows that the historical discussion between Thorndike and Mills is by no means a discussion of the past. In fact, we regularly encounter discussions on the ‘artificiality’ and ‘naturality’ of experimental setups, the presentation of stimuli, the behavior of participants, or the specific tasks and procedures used in experiments – not only in the literature, but also among our colleagues and reviewers. We must often ask for the specifics, because such remarks typically remain undefined by those who toss them around.

Simplicity – Complexity

The contemporary framework of ecological validity also posits that the laboratory and the ‘real-world’ are inversely proportional in terms of their simplicity–complexity. Many researchers have lamented that laboratory experiments have a ‘reductionistic’ tendency to simplify the complexity of the psychological phenomena under study (e.g., Neisser, 1976 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). For example, Sonkusare et al. (2019 , p. 699) stated that “the ecological validity of these abstract, laboratory-style experiments is debatable, as in many ways they do not resemble the complexity and dynamics of stimuli and behaviors in real-life.” But what exactly is meant by complexity? Let’s consider some examples from the literature. In the field of social attention, researchers have often used schematic images, photographs and videos of people and social scenes as stimuli to study the cognitive, behavioral, and physiological processes of face perception, gaze following and joint attention ( Langton et al., 2000 ; Frischen et al., 2007 ; Puce and Bertenthal, 2015 ). However, in recent years, it has often been argued that such stimuli are not ‘ecologically valid’ because they do not “capture the complexity of real social situations” ( Birmingham et al., 2012 , p. 30). While we agree that looking at a photographic image of a person’s face is different from looking at a living and breathing person, in what ways do these situations differ in complexity? Do these scholars mean that looking at a ‘live’ person is more complex than looking at a picture of that person? Or do they mean that the former is more complex than the latter from the perspective of the researcher who wants to understand the cognitive, behavioral, and physiological processes of face perception and social attention?

To take another example, Gabor patches are often used as stimuli by experimental psychologists to study ‘low-level visual processing’ (see Figure 3 ). Experimental psychologists use Gabor patches as visual stimuli because they offer a high degree of experimental control over various stimulus parameters (e.g., spatial frequency bandwidths, orientation, contrast, size, location). Gabor patches can be described with mathematical precision (i.e., “Gaussian-windowed sinusoidal gratings,” Fredericksen et al., 1997 , p. 1), and their spatial properties are considered to be a good representation of the receptive field profiles in the primary visual cortex. While Gabor patches may be considered ‘simple’ to researchers who study the relation between low-level visual processing and neural activity in terms of orientation-tuning and hemodynamic response functions, they also point to the yet-to-be-explained ‘complexity’ of the many possible relations between other cognitive processes and patterns of neural activity in the brain. On the other hand, a naïve participant (who likely has no clue about what researchers have discovered about low-level visual processing) may describe these Gabor patches as blurry, kind of stripy, zebra-like circles, and think that they are incredibly boring to look at for many trials while lying quietly in an MRI scanner.
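For readers unfamiliar with these stimuli, the sketch below generates a Gabor patch as a Gaussian-windowed sinusoidal grating, with the stimulus parameters mentioned above under direct numerical control. The parameter values are illustrative and not taken from any particular study.

```python
# A minimal sketch of a Gabor patch: a sinusoidal grating (the carrier)
# multiplied by a Gaussian envelope (the window). Parameter values are
# illustrative only.
import numpy as np

def gabor_patch(size=256, spatial_freq=8.0, orientation_deg=45.0,
                contrast=1.0, sigma=0.15):
    """Return a size x size Gabor patch with values in [-contrast, contrast]."""
    coords = np.linspace(-0.5, 0.5, size)
    x, y = np.meshgrid(coords, coords)

    # Rotate the coordinate frame to set the grating's orientation.
    theta = np.deg2rad(orientation_deg)
    x_rot = x * np.cos(theta) + y * np.sin(theta)

    carrier = np.sin(2.0 * np.pi * spatial_freq * x_rot)   # sinusoidal grating
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # Gaussian window
    return contrast * carrier * envelope

patch = gabor_patch()
print(patch.shape, float(patch.min()), float(patch.max()))
```

Each argument maps onto one of the stimulus parameters listed above (spatial frequency, orientation, contrast, size), which is exactly what makes the stimulus so attractive from the standpoint of experimental control.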

Figure 3.

Are Gabor patches simple or complex compared to a picture of zebras? (A) A Gabor patch. (B) A photograph of zebras. The uniquely striped patterns of zebras make them familiar to humans, whereas the question of why zebras have such beautiful stripes remains the topic of much discussion among biologists, see e.g., Caro and Stankowich (2015) and Larison et al. (2015) . Copyright statement – Images are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. The image of the Gabor patch was adapted from Todorović (2016 , May 30). Retrieved April 1, 2020, from http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/ . The photograph of zebras was made by Ajay Lalu and has been made publicly available by the owner for non-profit purposes via Pixabay . Retrieved on April 1, 2020, from https://pixabay.com/nl/users/ajaylalu-1897335/ .

Our point here is that simplicity–complexity is in the eye of the beholder. Who is to say what is more simple or complex? Physicists, computer scientists, information theorists, and evolutionary biologists have developed various definitions and measures of complexity (e.g., physical complexity, computational complexity, effective complexity, algorithmic complexity, statistical complexity, structural complexity, functional complexity, etc.), typically expressed in strictly mathematical terms ( Edmonds, 1995 ; Gell-Mann, 1995 ; Adami, 2002 ). But what definitions and measures of complexity are used by psychologists and cognitive scientists? Researchers in psychological science seem to have used the term complexity more loosely, for example, to describe a wide range of biological, behavioral, cognitive, social, and cultural phenomena, which typically contain lots of many’s (i.e., many parts, many variables, many degrees of freedom). Researchers may refer to various phenomena as ‘complex’ because they are simply not (yet) understood, as in “the brain is too complex for us to understand” ( Edmonds, 1995 , p. 4). Yet, such intuitive notions of complexity, whether they are caused by ignorance or whether they are used to describe something’s size, number, or variety ( Edmonds, 1995 ), are not very helpful to evaluate the simplicity–complexity of stimuli, tasks, and situations, nor do such notions provide any formula by which these components can be summed to determine the total ecological validity of a given study. According to Gell-Mann (1995 , p. 16):

“As measures of something like complexity for an entity in the real-world, all such quantities are to some extent context-dependent or even subjective. They depend on the coarse graining (level of detail) of the description of the entity, on the previous knowledge and understanding of the world that is assumed, on the language employed, on the coding method used for conversion from that language into a string of bits, and on the particular idealized computer chosen as a standard.”
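Gell-Mann’s point can be made tangible with a toy example: approximating algorithmic complexity by compressed length. The signals and the choice of compressor below are arbitrary, and swapping either changes the numbers, which is exactly the context-dependence the quote describes.

```python
# A toy approximation of algorithmic complexity: the length of a signal's
# compressed description. The signals and the compressor (zlib) are
# arbitrary choices; a different coding method yields different numbers.
import os
import zlib

signals = {
    "uniform":  b"a" * 1000,       # maximally regular
    "periodic": b"ab" * 500,       # regular, with minimal structure
    "random":   os.urandom(1000),  # essentially incompressible noise
}

for name, signal in signals.items():
    compressed = zlib.compress(signal, level=9)
    print(f"{name:>8}: raw = {len(signal)} bytes, compressed = {len(compressed)} bytes")
```

No comparable convention exists for scoring the ‘complexity’ of a social scene or an experimental task, which is why intuitive appeals to complexity carry so little evaluative force.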

The ‘Real World’ or the ‘Laboratory’: Psychology’s False Dilemma?

We have discussed several problems with how researchers have used the term ‘ecological validity’. In short, the concept of ecological validity has transformed significantly over the past several decades since it was introduced by Brunswik (1949) . It has lost most of its former theoretical and methodological cohesion (for a history, see Hammond, 1998 ), and the definitions and requirements of ecological validity used by researchers nowadays are seldom made explicit. As such, some experiments may be regarded as ‘ecologically valid’ by one researcher while they can be casually dismissed as ‘ecologically invalid’ by others. A closer look at the literature suggests that many researchers seem to assume that everyone understands what is meant by this term, while in fact the concept of ecological validity is seldom defined. As such, the concept of ecological validity is primarily used nowadays to make hand-waving statements about whether some (lab-based) experiments resemble ‘real life,’ or whether some results obtained in the laboratory may or may not generalize to the ‘real-world.’

In our opinion, the contemporary framework of ecological validity eventually falls short of providing researchers with a tractable research program. Researchers seem to primarily base their judgments of ecological validity upon their own particular theoretical assumptions and considerations about the so-called artificiality–naturality and simplicity–complexity of experimental situations, typically in the absence of a more formal set of criteria. As such, while we certainly sympathize with the ‘call for ecological validity,’ insofar as it has motivated researchers to be critical about the limitations of experimental methods, we also think that the uncritical use of the term ecological validity has caused a lot of confusion, and in some cases has even been counterproductive. Perhaps the most problematic consequence of using the term ecological validity as an easy substitute for the ‘real-world’ was previously pointed out by Hammond (1998) . He commented that:

“There is, of course, no such thing as a “real-world.” It has been assigned no properties, and no definition; it is used simply because of the absence of a theory of tasks or other environments, and thus does not responsibly offer a frame of reference for the generalization” .

In Hammond’s view, the aim to understand cognitive and behavioral functioning in the ‘real-world’ is basically pointless if one does not first define this notion of the ‘real-world.’ As such, researchers have locked themselves “onto the horns of a false dilemma” ( Hammond and Stewart, 2001 , p. 7). Thus, in order to talk sensibly about whether some results can also be generalized to particular situations beyond the experimental conditions in which those results were obtained, researchers first need to specify the range and distributions of the variables and conditions to which their results are supposed to be applicable. Since the notion of the ‘real-world’ patently lacks specificity, this phrase inevitably prevents researchers from specifying the range and boundary conditions of cognitive and behavioral functioning in any given research context, and thus precludes one from getting at the context-specific and context-generic principles of cognition and behavior (see also Kruglanski, 1975 ; Simons et al., 2017 ).

The Nature of the Environment?

Instead of trying to understand cognitive and behavioral functioning in the ‘real-world,’ we completely agree with Hammond (1998) that the charge of researchers is to always specify and describe the particular context of behavior in which one is interested. Ultimately, the real challenge for researchers is to develop a theory of how specific environmental contexts are related to various forms of cognitive and behavioral functioning. But what constitutes a psychologist’s theory of the environment? Researchers in psychological science are typically concerned with the nature of the organism, yet the nature of the environment and its relation to cognitive and behavioral functioning has received considerably less attention from a theoretical point of view ( Barker, 1966 ; Heft, 2013 ). Interestingly, there have been several scholars who have dedicated themselves to precisely this question, and whose theories of cognition and behavior included a clear perspective on the nature of the environment.

According to Tolman and Brunswik (1935) , the nature of the environment, as it appears to the organism, is full of uncertainties. The organism perceives the environment as an array of proximal ‘cues’ and ‘signs’ (i.e., information sources), which are the ‘local representatives’ of various distal objects and events in the organism’s environment. To function more or less efficiently, the organism needs to accumulate, combine, and substitute the information it derives from the available ‘cues’ and ‘signs,’ so that it can adequately adjust its means to achieve its behavioral goals (e.g., finding food or shelter). However, since the environment is inherently probabilistic and only partly predictable, the organism continually needs to adjust its assumptions about the state of the environment based on the available information sources. Another example is given by Barker (1968) , whose concept of ‘behavior settings’ (see also Heft, 2001 ) is key in describing how the environment shapes the frequency and occurrence of human cognition and behavior. Important to behavior settings is that they are the product of the collective actions of a group of individuals. Their geographical location can be specified (e.g., the supermarket, the cinema, etc.), and they have clear temporal and physical boundaries (e.g., opening hours, a door to enter and exit the building). Behavior settings are ‘independent’ of an individual’s subjective experience, yet what goes on inside any behavior setting is characterized by a high degree of interdependency and equivalence of actions between individuals (e.g., most people who are inside a supermarket are shopping for groceries and people in cinemas are watching movies). Another ‘classic’ example of a theory of the environment can be found in J. J. Gibson’s book The Ecological Approach to Visual Perception (1979/2014). According to Gibson, there exists a strong mutuality and reciprocity between the organism and its environment. He introduced the concept of ‘affordances’ to explain how the inherent ‘meaning’ of things (i.e., functional significance to the individual) can be directly perceived by an individual perceiver and how this ‘information’ shapes the possibilities for potential actions and experiences. For example, a sufficiently firm and smooth surface may be walk-on-able, run-on-able, or dance-on-able, whereas a rough surface cluttered with obstacles does not afford such actions ( Heft, 2001 ). In short, affordances are properties of an organism-environment system. They are perceiver-relative functional qualities of an object, event or place in the environment and they are dependent on the particular features of the environment and their relationships with the functional capabilities of a particular individual (for more in-depth discussions, see e.g., Heft, 2001 ; Stoffregen, 2003 ).

In order to describe and specify the environment and its relation to cognitive and behavioral functioning, we may draw on these scholars to guide us in a more specific direction. While we do not specifically recommend any of these perspectives, we think they are illuminating because these scholars motivate us to ask questions such as: What is the specific functional context of the cognitive and behavioral processes one is interested in? What are the relevant variables and conditions in this context given one’s focus of inquiry and level of analysis? What do we know or assume to know about the range and distribution of these variables and conditions? And how can these variables and conditions be represented in experimental designs to study specific patterns of cognitive and behavioral functioning? In order to answer some of these questions, several researchers have emphasized the importance of first observing how people behave in everyday situations prior to experimentation. For example, Kingstone et al. (2008) advocated for an approach called Cognitive Ethology , which proposes that researchers should first observe how people behave in everyday situations before moving into the laboratory. In a similar vein, Adolph (2019) proposes that researchers should start with a rich description of the behaviors they are interested in, in order to first identify the “essential invariants” of these behaviors (p. 187).

The Field of Social Attention: Away From the Real-World and Toward Specificity About Context

To exemplify how some of the ideas outlined above may be useful to researchers, we will apply these ideas to a research topic of our interest: social attention. The field of social attention, as briefly discussed previously, is primarily focused on how attention is influenced by socially relevant objects, events, and situations, most notably, interactions with other social agents. In recent decades, it has been argued extensively that the experimental arrangements used by researchers in this field need more ‘ecological validity’ in order to adequately study the relevant characteristics of social attention in the ‘real-world’ ( Risko et al., 2012 , 2016 ; Schilbach et al., 2013 ; Caruana et al., 2017 ; Macdonald and Tatler, 2018 ; Shamay-Tsoory and Mendelsohn, 2019 ). In the light of these concerns, several researchers have advocated studying “real-world social attention” ( Risko et al., 2016 , p. 1) and “real-world social interaction” ( Macdonald and Tatler, 2018 , p. 1; see also Shamay-Tsoory and Mendelsohn, 2019 ). One example is the study by Macdonald and Tatler (2018) , who investigated how social roles given to participants influenced their social gaze behavior during a collaborative task: baking a cake together. Participants were either not given explicit social roles, or they were given a ‘Chef’ or ‘Gatherer’ role. Macdonald and Tatler (2018) showed that, regardless of whether social roles were assigned or not, participants did not gaze at their cake-baking partners very often while carrying out the task. After comparing their results with other so-called ‘real-world interaction studies’ (e.g., Laidlaw et al., 2011 ; Wu et al., 2013 ), the authors stated that: “we are not able to generalize about the specific amount of partner gaze during any given real-world interaction” ( Macdonald and Tatler, 2018 , p. 2171). We think that this statement clearly illustrates how the use of ‘real-world’ and ‘real life’ labels may lead to misleading and potentially counterproductive conclusions, as it seems to imply that ‘real-world interactions’ encompass a clearly defined category of behaviors. However, as argued previously, these so-called ‘real-world interactions’ are not a clearly defined category of behaviors. Instead, statements about generalizability need to be considered within a more constrained and carefully defined context (cf. Brunswik, 1956 ; Simons et al., 2017 ). This would make it clearer what researchers are talking about instead of subsuming studies under the big umbrella of the ‘real-world.’ For example, if the goal is to study how the cognitive and behavioral processes of social attention are influenced by different contexts and situations, researchers need to specify social gaze behavior as a function of these different contexts and situations.

Thus, instead of studying ‘real-world’ social attention in the context of ‘real-world’ social interactions, researchers should first try to describe and understand cake-baking attention ( Macdonald and Tatler, 2018 ), sharing-a-meal attention ( Wu et al., 2013 ), waiting-room attention ( Laidlaw et al., 2011 ), walking-on-campus attention ( Foulsham et al., 2011 ), Lego-block-building attention ( Macdonald and Tatler, 2013 ), playing-word-games attention ( Ho et al., 2015 ), interviewee-attention ( Freeth et al., 2013 ), and garage-sale attention ( Rubo and Gamer, 2018 ). By doing so, we may begin to understand the context-generic and context-specific aspects of attentional processes, allowing for a more sophisticated theory of social attention. These examples not only show the wide variety of behavioral tasks and contexts that are possible to study in relation to social attention, they also show that uncritical references to ‘ecological validity’ a.k.a. ‘real-worldliness’ are not very helpful to specify the relevant characteristics of particular behavioral contexts.

There are also good examples where researchers have been more explicit about the specific characteristics of the social situations that they are interested in. Researchers in the field of social attention have, for example, tried to unravel the different functions of gaze behavior. One important function of gaze behavior is to acquire visual information from the world; however, within a social context, gaze may also signal important information to others, which may be used to initiate and facilitate social interaction (see e.g., Gobel et al., 2015 ; Risko et al., 2016 ). In a series of experiments, researchers have systematically varied whether, and to what degree, social interaction between two people was possible, and have measured how gaze was modulated as a function of the social context ( Laidlaw et al., 2011 ; Gobel et al., 2015 ; Gregory and Antolin, 2019 ; Holleman et al., 2020 ). In other studies, researchers have been explicit about the task demands and social contexts that elicit specific patterns of gaze behavior, for example, in the context of face-to-face interactions and conversational exchanges ( Ho et al., 2015 ; Hessels et al., 2019 ). We think that, if researchers were more explicit in their descriptions of task demands and social contexts in relation to gaze, this could prove to be a solid basis for a more sophisticated theory of social attention, yet such work remains challenging (for a recent review, see Hessels, in press ).

We have argued that the ‘real-world approach’ and its call for ecological validity have several problems. The concept of ecological validity itself is seldom defined, and interpretations differ among researchers. We believe that references to ecological validity and the ‘real-world’ can become superfluous if researchers clearly specify and describe the particular contexts of behavior in which they are interested. This will be a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior. As a final note, we hope that editors and reviewers will safeguard journals from publishing papers in which terms such as ‘ecological validity’ and the ‘real-world’ are used without specification.

Author Contributions

GH and RH drafted the manuscript. RH, IH, and CK edited and revised the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding. GH and RH were supported by the Consortium on Individual Development (CID). CID is funded through the Gravitation programme of the Dutch Ministry of Education, Culture, and Science and the NWO (grant no. 024.001.003 awarded to author CK).

  • Aanstoos C. M. (1991). Experimental psychology and the challenge of real life. Am. Psychol. 46, 77. doi: 10.1037/0003-066x.46.1.77
  • Adami C. (2002). What is complexity? Bioessays 24, 1085–1094.
  • Adolph K. E. (2019). “Ecological validity: mistaking the lab for real life,” in My Biggest Research Mistake: Adventures and Misadventures in Psychological Research, ed. Sternberg R. (New York, NY: Sage), 187–190. doi: 10.4135/9781071802601.n58
  • Alderman N., Burgess P. W., Knight C., Henman C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. J. Int. Neuropsychol. Soc. 9, 31–44. doi: 10.1017/s1355617703910046
  • Anderson C. A., Lindsay J. J., Bushman B. J. (1999). Research in the psychological laboratory: truth or triviality? Curr. Direct. Psychol. Sci. 8, 3–9. doi: 10.1111/1467-8721.00002
  • Araujo D., Davids K., Passos P. (2007). Ecological validity, representative design, and correspondence between experimental task constraints and behavioral setting: comment on Rogers, Kadar, and Costall (2005). Ecol. Psychol. 19, 69–78. doi: 10.1080/10407410709336951
  • Ashcraft M., Radvansky G. (2009). Cognition, 5th Edn. Upper Saddle River, NJ: Pearson Education, Inc.
  • Aspland H., Gardner F. (2003). Observational measures of parent-child interaction: an introductory review. Child Adolesc. Mental Health 8, 136–143. doi: 10.1111/1475-3588.00061
  • Banaji M. R., Crowder R. G. (1989). The bankruptcy of everyday memory. Am. Psychol. 44, 1185. doi: 10.1037/0003-066x.44.9.1185
  • Barker R. G. (1966). “On the nature of the environment,” in The Psychology of Egon Brunswik, ed. Hammond K. R. (New York, NY: Holt, Rinehart and Winston).
  • Barker R. G. (1968). Ecological Psychology: Concepts and Methods for Studying the Environment of Human Behavior. Stanford, CA: Stanford University Press.
  • Berkowitz L., Donnerstein E. (1982). External validity is more than skin deep: some answers to criticisms of laboratory experiments. Am. Psychol. 37, 245. doi: 10.1037/0003-066x.37.3.245
  • Birmingham E., Ristic J., Kingstone A. (2012). “Investigating social attention: a case for increasing stimulus complexity in the laboratory,” in Cognitive Neuroscience, Development, and Psychopathology: Typical and Atypical Developmental Trajectories of Attention, eds Burack J. A., Enns J. T., Fox N. A. (Oxford: Oxford University Press), 251–276. doi: 10.1093/acprof:oso/9780195315455.003.0010
  • Blanco-Elorrieta E., Pylkkänen L. (2018). Ecological validity in bilingualism research and the bilingual advantage. Trends Cogn. Sci. 22, 1117–1126. doi: 10.1016/j.tics.2018.10.001
  • Bronfenbrenner U. (1977). Toward an experimental ecology of human development. Am. Psychol. 32, 513. doi: 10.1037/0003-066x.32.7.513
  • Brunswik E. (1943). Organismic achievement and environmental probability. Psychol. Rev. 50, 255. doi: 10.1037/h0060889
  • Brunswik E. (1949). Remarks on functionalism in perception. J. Pers. 18, 56–65. doi: 10.1111/j.1467-6494.1949.tb01233.x
  • Brunswik E. (1952). The Conceptual Framework of Psychology. Chicago: University of Chicago Press.
  • Brunswik E. (1955). Representative design and probabilistic theory in a functional psychology. Psychol. Rev. 62, 193. doi: 10.1037/h0047470
  • Brunswik E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley: University of California Press.
  • Campbell D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54, 297. doi: 10.1037/h0040950
  • Caro T., Stankowich T. (2015). Concordance on zebra stripes: a comment on Larison et al. (2015). R. Soc. Open Sci. 2:150323. doi: 10.1098/rsos.150323
  • Caruana N., McArthur G., Woolgar A., Brock J. (2017). Simulating social interactions for the experimental investigation of joint attention. Neurosci. Biobehav. Rev. 74, 115–125. doi: 10.1016/j.neubiorev.2016.12.022
  • Cohen G., Conway M. A. (2007). Memory in the Real World. Abingdon: Psychology Press.
  • Danziger K. (1994). Constructing the Subject: Historical Origins of Psychological Research. Cambridge: Cambridge University Press.
  • Davids K. (1988). Ecological validity in understanding sport performance: some problems of definition. Quest 40, 126–136. doi: 10.1080/00336297.1988.10483894
  • Dhami M. K., Hertwig R., Hoffrage U. (2004). The role of representative design in an ecological approach to cognition. Psychol. Bull. 130, 959. doi: 10.1037/0033-2909.130.6.959
  • Dunlosky J., Bottiroli S., Hartwig M. (2009). “Sins committed in the name of ecological validity: a call for representative design in education science,” in Handbook of Metacognition in Education, eds Hacker D. J., Dunlosky J., Graesser A. C. (Abingdon: Routledge), 442–452.
  • Eaton W. O., Clore G. L. (1975). Interracial imitation at a summer camp. J. Pers. Soc. Psychol. 32, 1099. doi: 10.1037/0022-3514.32.6.1099
  • Edmonds B. (1995). “What is complexity? The philosophy of complexity per se with application to some examples in evolution,” in The Evolution of Complexity, ed. Bonner J. T. (Dordrecht: Kluwer).
  • Foulsham T., Walker E., Kingstone A. (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vis. Res. 51, 1920–1931. doi: 10.1016/j.visres.2011.07.002
  • Fredericksen R., Bex P. J., Verstraten F. A. (1997). How big is a Gabor patch, and why should we care? JOSA A 14, 1–12.
  • Freeth M., Foulsham T., Kingstone A. (2013). What affects social attention? Social presence, eye contact and autistic traits. PLoS One 8:e53286. doi: 10.1371/journal.pone.0053286
  • Frischen A., Bayliss A. P., Tipper S. P. (2007). Gaze cueing of attention: visual attention, social cognition, and individual differences. Psychol. Bull. 133, 694. doi: 10.1037/0033-2909.133.4.694
  • Gell-Mann M. (1995). What is complexity? Remarks on simplicity and complexity by the Nobel Prize-winning author of The Quark and the Jaguar. Complexity 1, 16–19. doi: 10.1002/cplx.6130010105
  • Gibson J. J. (1950). The Perception of the Visual World. Cambridge: Houghton Mifflin Company.
  • Gibson J. J. (1970). On the relation between hallucination and perception. Leonardo 3, 425–427.
  • Gibson J. J. (2014). The Ecological Approach to Visual Perception: Classic Edition. New York, NY: Psychology Press. (Original work published 1979).
  • Gillis J., Schneider C. (1966). “The historical preconditions of representative design,” in The Psychology of Egon Brunswik, ed. Hammond K. R. (New York, NY: Holt, Rinehart & Winston, Inc.), 204–236.
  • Gobel M. S., Kim H. S., Richardson D. C. (2015). The dual function of social gaze. Cognition 136, 359–364. doi: 10.1016/j.cognition.2014.11.040
  • Goodwin C. J. (2015). A History of Modern Psychology, 5th Edn. Hoboken, NJ: John Wiley & Sons.
  • Greenwald A. G. (1976). Within-subjects designs: to use or not to use? Psychol. Bull. 83, 314. doi: 10.1037/0033-2909.83.2.314
  • Gregory N. J., Antolin J. V. (2019). Does social presence or the potential for interaction reduce social gaze in online social scenarios? Introducing the “live lab” paradigm. Q. J. Exp. Psychol. 72, 779–791. doi: 10.1177/1747021818772812
  • Hammond K. R. (1998). Ecological Validity: Then and Now. Available online at: http://www.brunswik.org/notes/essay2.html (accessed April 1, 2020).
  • Hammond K. R., Stewart T. R. (2001). The Essential Brunswik: Beginnings, Explications, Applications. New York, NY: Oxford University Press.
  • Hatfield G. (2002). Psychology, philosophy, and cognitive science: reflections on the history and philosophy of experimental psychology. Mind Lang. 17, 207–232. doi: 10.1111/1468-0017.00196
  • Heft H. (2001). Ecological Psychology in Context: James Gibson, Roger Barker, and the Legacy of William James’s Radical Empiricism. Hove: Psychology Press.
  • Heft H. (2013). An ecological approach to psychology. Rev. Gen. Psychol. 17, 162–167. doi: 10.1037/a0032928
  • Heinrichs R. W. (1990). Current and emergent applications of neuropsychological assessment: problems of validity and utility. Prof. Psychol. 21, 171. doi: 10.1037/0735-7028.21.3.171
  • Hessels R. S. (in press). How does gaze to faces support face-to-face interaction? A review and perspective. Psychonom. Bull. Rev. doi: 10.31219/osf.io/8zta5
  • Hessels R. S., Holleman G. A., Kingstone A., Hooge I. T. C., Kemner C. (2019). Gaze allocation in face-to-face communication is affected primarily by task structure and social context, not stimulus-driven factors. Cognition 184 28–43. 10.1016/j.cognition.2018.12.005 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hessels R. S., van Doorn A. J., Benjamins J. S., Holleman G. A., Hooge I. T. C. (2020). Task-related gaze control in human crowd navigation. Attent. Percept. Psychophys. 10.3758/s13414-019-01952-9 [Online ahead of print] [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Higginson C. I., Arnett P. A., Voss W. D. (2000). The ecological validity of clinical tests of memory and attention in multiple sclerosis. Arch. Clin. Neuropsychol. 15 185–204. 10.1016/s0887-6177(99)00004-9 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hilton W. (1920). Applied Psychology: Driving Power of Thought. The Society of Applied Psychology Available online at: http://www.gutenberg.org/files/33076/33076-h/33076-h.htm (accessed April 1, 2020). [ Google Scholar ]
  • Ho S., Foulsham T., Kingstone A. (2015). Speaking and listening with the eyes: gaze signaling during dyadic interactions. PLoS One 10 : e0136905 . 10.1371/journal.pone.0136905 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hoc J.-M. (2001). Towards ecological validity of research in cognitive ergonomics. Theor. Issues Ergon. Sci. 2 278–288. 10.1371/journal.pone.0184488 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hogarth R. M. (2005). The challenge of representative design in psychology and economics. J. Econ. Methodol. 12 253–263. 10.1177/0269216311399663 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Holleman G. A., Hessels R. S., Kemner C., Hooge I. T. C. (2020). Implying social interaction and its influence on gaze behavior to the eyes. PLoS One 15 : e0229203 . 10.1371/journal.pone.0229203 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Holleman G. A., Hooge I. T. C., Kemner C., Hessels R. S. (in press). The reality of ‘real-life’ neuroscience: a commentary on Shamay-Tsoory & Mendelsohn. Perspect. Psychol. Sci. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Jack R. E., Schyns P. G. (2017). Toward a social psychophysics of face communication. Annu. Rev. Psychol. 68 269–297. 10.1146/annurev-psych-010416-044242 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jenkins J. J. (1974). Remember that old theory of memory? Well, forget it. Am. Psychol. 29 : 785 10.1037/h0037399 [ CrossRef ] [ Google Scholar ]
  • Johnston P., Molyneux R., Young A. W. (2014). The N170 observed ‘in the wild’: robust event-related potentials to faces in cluttered dynamic visual scenes. Soc. Cogn. Affect. Neurosci. 10 938–944. 10.1093/scan/nsu136 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kingstone A., Smilek D., Eastwood J. D. (2008). Cognitive ethology: a new approach for studying human cognition. Br. J. Psychol. 99 317–340. 10.1348/000712607x251243 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Koehler J. J. (1996). The base rate fallacy reconsidered: descriptive, normative, and methodological challenges. Behav. Brain Sci. 19 1–17. 10.1017/s0140525x00041157 [ CrossRef ] [ Google Scholar ]
  • Krakauer J. W., Ghazanfar A. A., Gomez-Marin A., MacIver M. A., Poeppel D. (2017). Neuroscience needs behavior: correcting a reductionist bias. Neuron 93 480–490. 10.1016/j.neuron.2016.12.041 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kruglanski A. W. (1975). The two meanings of external invalidity. Hum. Relat. 28 653–659. 10.1177/001872677502800704 [ CrossRef ] [ Google Scholar ]
  • Laidlaw K. E., Foulsham T., Kuhn G., Kingstone A. (2011). Potential social interactions are important to social attention. Proc. Natl. Acad. Sci. U.S.A. 108 5548–5553. 10.1073/pnas.1017022108 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Langton S. R., Watt R. J., Bruce V. (2000). Do the eyes have it? Cues to the direction of social attention. Trends Cogn. Sci. 4 50–59. 10.1016/s1364-6613(99)01436-9 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lappi O. (2015). Eye tracking in the wild: the good, the bad and the ugly. J. Eye Mov. Res. 8 : 1 . 10.1016/j.dcn.2019.100710 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Larison B., Harrigan R. J., Thomassen H. A., Rubenstein D. I., Chan-Golston A. M., Li E., et al. (2015). How the zebra got its stripes: a problem with too many solutions. R. Soc. Open Science 2 : 140452 . 10.1098/rsos.140452 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lewkowicz D. J. (2001). The concept of ecological validity: what are its limitations and is it bad to be invalid? Infancy 2 437–450. 10.1207/s15327078in0204_03 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Macdonald R. G., Tatler B. W. (2013). Do as eye say: gaze cueing and language in a real-world social interaction. J. Vis. 13 1–12. 10.1167/13.4.6 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Macdonald R. G., Tatler B. W. (2018). Gaze in a real-world social interaction: a dual eye-tracking study. Q. J. Exp. Psychol. 71 2162–2173. 10.1177/1747021817739221 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mills W. (1899). The nature of animal intelligence and the methods of investigating it. Psychol. Rev. 6 : 262 10.1037/h0074808 [ CrossRef ] [ Google Scholar ]
  • Mook D. G. (1983). In defense of external invalidity. Am. Psychol. 38 : 379 10.1037/0003-066x.38.4.379 [ CrossRef ] [ Google Scholar ]
  • Neisser U. (1976). Cognition and Reality: Principles and Implications Of Cognitive Psychology. San Fransisco, CA: W. H. Freeman and Company. [ Google Scholar ]
  • Neisser U. (1991). A case of misplaced nostalgia. Am. Psychol. 46 :34–36. 10.1037/0003-066x.46.1.34 [ CrossRef ] [ Google Scholar ]
  • Osborne-Crowley K. (2020). Social cognition in the real world: reconnecting the study of social cognition with social reality. Rev. Gen. Psychol. 1–15. 10.4324/9781315648156-1 [ CrossRef ] [ Google Scholar ]
  • Parsons T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Front. Hum. Neurosci. 9 : 660 . 10.3389/fnhum.2015.00660 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parsons T. D., Carlew A. R., Magtoto J., Stonecipher K. (2017). The potential of function-led virtual environments for ecologically valid measures of executive function in experimental and clinical neuropsychology. Neuropsychol. Rehabil. 27 777–807. 10.1080/09602011.2015.1109524 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Peelen M. V., Kastner S. (2014). Attention in the real world: toward understanding its neural basis. Trends Cogn. Sci. 18 242–250. 10.1016/j.tics.2014.02.004 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Puce A., Bertenthal B. I. (2015). The Many Faces of Social Attention: Behavioral and Neural Measures , eds Puce A., Bertenthal B. I. (Switzerland: Springer; ). [ Google Scholar ]
  • Richter J. N., Hochner B., Kuba M. J. (2016). Pull or push? Octopuses solve a puzzle problem. PLoS One 11 : e0152048 . 10.1371/journal.pone.0152048 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Risko E. F., Laidlaw K., Freeth M., Foulsham T., Kingstone A. (2012). Social attention with real versus reel stimuli: toward an empirical approach to concerns about ecological validity. Front. Hum. Neurosci. 6 : 143 . 10.3389/fnhum.2012.00143 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Risko E. F., Richardson D. C., Kingstone A. (2016). Breaking the fourth wall of cognitive science: real-world social attention and the dual function of gaze. Curr. Direct. Psychol. Sci. 25 70–74. 10.1177/0963721415617806 [ CrossRef ] [ Google Scholar ]
  • Rogers S. D., Kadar E. E., Costall A. (2005). Gaze patterns in the visual control of straight-road driving and braking as a function of speed and expertise. Ecol. Psychol. 17 19–38. 10.1207/s15326969eco1701_2 [ CrossRef ] [ Google Scholar ]
  • Rubo M., Gamer M. (2018). “ Virtual reality as a proxy for real-life social attention? ,” Paper presented at the Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. New York, NY. [ Google Scholar ]
  • Schilbach L. (2015). Eye to eye, face to face and brain to brain: novel approaches to study the behavioral dynamics and neural mechanisms of social interactions. Curr. Opin. Behav. Sci. 3 130–135. 10.1016/j.cobeha.2015.03.006 [ CrossRef ] [ Google Scholar ]
  • Schilbach L., Timmermans B., Reddy V., Costall A., Bente G., Schlicht T., et al. (2013). Toward a second-person neuroscience. Behav. Brain Sci. 36 393–414. 10.1017/s0140525x12000660 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmuckler M. A. (2001). What is ecological validity? A dimensional analysis. Infancy 2 419–436. 10.1207/s15327078in0204_02 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shallice T., Burgess P. W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain 114 727–741. 10.1093/brain/114.2.727 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shamay-Tsoory S. G., Mendelsohn A. (2019). Real-life neuroscience: an ecological approach to brain and behavior research. Perspect. Psychol. Sci. 14 841–859. 10.1177/1745691619856350 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Silverstein C. H., Stang D. J. (1976). Seating position and interaction in triads: a field study. Sociometry 39 166–170. [ Google Scholar ]
  • Simons D. J., Levin D. T. (1998). Failure to detect changes to people during a real-world interaction. Psychonom. Bull. Rev. 5 644–649. 10.3758/bf03208840 [ CrossRef ] [ Google Scholar ]
  • Simons D. J., Shoda Y., Lindsay D. S. (2017). Constraints on generality (COG): a proposed addition to all empirical papers. Perspect. Psychol. Sci. 12 1123–1128. 10.1177/1745691617708630 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smilek D., Birmingham E., Cameron D., Bischof W., Kingstone A. (2006). Cognitive ethology and exploring attention in real-world scenes. Brain Res. 1080 101–119. 10.1016/j.brainres.2005.12.090 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Smith P. W., Feinberg R. A., Burns D. J. (1998). An examination of classical conditioning principles in an ecologically valid advertising context. J. Market. Theor. Pract. 6 63–72. 10.1080/10696679.1998.11501789 [ CrossRef ] [ Google Scholar ]
  • Sonkusare S., Breakspear M., Guo C. (2019). Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23 699–714. 10.1016/j.tics.2019.05.004 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stoffregen T. A. (2003). Affordances as properties of the animal-environment system. Ecol. Psychol. 15 115–134. 10.1016/j.humov.2019.01.002 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Thorndike E. (1899). A reply to “The nature of animal intelligence and the methods of investigating it”. Psychol. Rev. 6 412–420. 10.1037/h0073289 [ CrossRef ] [ Google Scholar ]
  • Thorndike E. (2017). Animal Intelligence: Experimental Studies. Abingdon: Routledge. [ Google Scholar ]
  • Todorović A. (2016) What’s in a Gabor Patch? Vol. 2016 Available online at: http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/ (accessed April 1, 2020). [ Google Scholar ]
  • Tolman E. C., Brunswik E. (1935). The organism and the causal texture of the environment. Psychol. Rev. 42 : 43 10.1037/h0062156 [ CrossRef ] [ Google Scholar ]
  • Valtakari N. V., Hooge I. T. C., Benjamins J. S., Keizer A. (2019). An eye-tracking approach to Autonomous sensory meridian response (ASMR): the physiology and nature of tingles in relation to the pupil. PLoS One 14 : e226692 . 10.1371/journal.pone.0226692 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilson B. A. (1993). Ecological validity of neuropsychological assessment: do neuropsychological indexes predict performance in everyday activities? Appl. Prevent. Psychol. 2 209–215. 10.1016/s0962-1849(05)80091-5 [ CrossRef ] [ Google Scholar ]
  • Winograd E. (1988). “ Continuities between ecological and laboratory approaches to memory ,” in Emory Symposia in Cognition, 2. Remembering Reconsidered: Ecological and Traditional Approaches to the Study of Memory eds Neisser U., Winograd E. (Cambridge: Cambridge University Press; ), 11–20. 10.1017/cbo9780511664014.003 [ CrossRef ] [ Google Scholar ]
  • Wu D. W.-L., Bischof W. F., Kingstone A. (2013). Looking while eating: the importance of social context to social attention. Sci. Rep. 3 : 2356 . 10.1038/srep02356 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]


Ecological Validity, External Validity, and Mundane Realism in Hearing Science

Beechey, Timothy

Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom.

Received February 10, 2021; accepted November 30, 2021; published online ahead of print January 13, 2022.

The author has no conflict of interest to disclose.

Address for correspondence: Timothy Beechey, Hearing Sciences – Scottish Section, Level 3 New Lister Building, Glasgow Royal Infirmary, Glasgow G31 2ER, United Kingdom. E-mail: [email protected]

Tests of hearing function are typically conducted in conditions very different from those in which people need to hear and communicate. Even when test conditions are more similar, they cannot represent the diversity of situations that may be encountered by individuals in daily life. As a consequence, it is necessary to consider external validity: the extent to which findings are likely to generalize to conditions beyond those in which data are collected. External validity has long been a concern in many fields and has led to the development of theories and methods aimed at improving generalizability of laboratory findings. Within hearing science, along with related fields, efforts to address generalizability have come to focus heavily on realism: the extent to which laboratory conditions are similar to conditions found in everyday settings of interest. In fact, it seems that realism is now tacitly equated with generalizability. The term that has recently been applied to this approach by many researchers is ecological validity . Recent usage of the term ecological validity within hearing science, as well as other fields, is problematic for three related reasons: (i) it encourages the conflation of the separate concepts of realism and validity; (ii) it diverts attention from the need for methods of quantifying generalization directly; and (iii) it masks a useful longstanding definition of ecological validity within the field of ecological psychology. The definition of ecological validity first used within ecological psychology—the correlation between cues received at the peripheral nervous system and the identity of distant objects or events in the environment—is entirely different from its current usage in hearing science and many related fields. However, as part of an experimental approach known as representative design , the original concept of ecological validity can play a valuable role in facilitating generalizability. This paper will argue that separate existing terms should be used when referring to realism and generalizability, and that the definition of ecological validity provided by the Lens Model may be a valuable conceptual tool within hearing science.


It is not real until it feels real: Testing a new method for simulation of eyewitness experience with virtual reality technology and equipment

Open access | Published: 28 July 2023 | Volume 56, pages 4336–4350 (2024)


  • Kaja Glomb, ORCID: orcid.org/0000-0001-5083-0385
  • Przemysław Piotrowski, ORCID: orcid.org/0000-0002-3163-3228
  • Izabela Anna Romanowska, ORCID: orcid.org/0000-0002-9487-2111


Laboratory research in the psychology of witness testimony is often criticized for its lack of ecological validity, including the use of unrealistic artificial stimuli to test memory performance. The purpose of our study is to present a method that can provide an intermediary between laboratory research and field studies or naturalistic experiments that are difficult to control and administer. It uses Video-360° technology and virtual reality (VR) equipment, which cuts subjects off from external stimuli and gives them control over the visual field. This can potentially increase the realism of the eyewitness's experience. To test the method, we conducted an experiment comparing the immersion effect, emotional response, and memory performance between subjects who watched a video presenting a mock crime on a head-mounted display (VR goggles; n  = 57) and a screen ( n  = 50). The results suggest that, compared to those who watched the video on a screen, the VR group had a deeper sense of immersion, that is, of being part of the scene presented. At the same time, they were not distracted or cognitively overloaded by the more complex virtual environment, and remembered just as much detail about the crime as those viewing it on the screen. Additionally, we noted significant differences between subjects in ratings of emotions felt during the video. This may suggest that the two formats evoke different types of discrete emotions. Overall, the results confirm the usefulness of the proposed method in witness research.


Introduction

For many decades, forensic experts have drawn attention to the limited scope for drawing inferences about the real experiences of eyewitnesses from results obtained in laboratory studies. One of the most fundamental issues is the lack of ecological validity of such experiments (Chae, 2010; McKenna et al., 1992; Wagstaff et al., 2003; Yuille & Wells, 1991). During laboratory experiments, stimulus manipulation does not evoke states that mimic the experiences of real eyewitnesses. Participants remain in a safe space, are rarely surprised by stimuli, and do not confront unexpected events. It is therefore possible that their reactions to short films, slides, narratives, or recordings presenting a crime are the product of rational thought rather than instinctive responses. Thus, there is no shortage of voices in the literature encouraging more field research and analysis based on real crime cases (e.g., Yuille, 2013). This type of study, however, has its own challenges, related to the limited ability to control for confounding variables and the need to rigorously repeat the procedure in situ, in settings that are more complex and unpredictable (Grzyb & Doliński, 2021). Moreover, such research is demanding to organize and administer, which, given the heavy emphasis on increasing sample sizes in psychological studies, can make it time-consuming and cumbersome (Doliński, 2018). As a result, the contribution of field and naturalistic experiments is very limited. In preparing this paper, we analyzed 1400 publications indexed in Google Scholar (search term: eyewitness testimony 'field study'), examining the abstracts and method sections of the empirical articles. We found that the vast majority fall within a single area of interest: the effects of alcohol and other psychoactive substances on witness memory. This may suggest that, for most psychologists, field experiments are a last resort, used essentially only when a safer, better-controlled laboratory alternative is not available.

With this in mind, the purpose of this study is to test a method that employs elements of virtual reality (VR) and its equipment for experimental manipulation. We believe that this procedure could provide an intermediate point between laboratory research and naturalistic or field experiments, as it allows exposure to more realistic stimuli. Empirical work on eyewitness testimony already makes use of VR and Video-360° display equipment. For example, Kloft et al. (2020), in their study on false memory, used virtual reality equipment and digital imagery. They simulated two criminal events in which the subjects played the role of an uninvolved witness to a physical attack on a policeman, or the perpetrator of a theft in a bar. The scene was created with digitally generated graphics; the perpetrator, victims, and bystanders therefore resembled game avatars. Since it is unclear whether fully digital characters can imitate humans closely enough to produce effects similar to the experience of watching a real person being harmed, this type of manipulation will not necessarily be adequate for the study of emotions and phenomena typical of social situations. After all, as we know from games research, how believable a so-called NPC (non-playable character) is depends largely on perceptual cues (e.g., Warpefelt, 2016); characters that look, move, express emotions, and behave unnaturally may not evoke psychological reactions similar to those evoked by humans.

At this stage of the use of VR in eyewitness testimony research, however, the main obstacle is not so much the potential inadequacy of the stimuli as the lack of methodological analysis of its effectiveness in inducing the desired psychological states. Controlling for realism with a few questions about the "realness" of the environment (e.g., Romeo et al., 2019), while important, does not allow us to fully determine the extent of immersion in the stimulus, and therefore subjects' engagement with the virtual world. Nor does it provide a way to identify the aspects of the method that can compete with more traditional research methods used in the psychology of witness testimony. We therefore decided to conduct a systematic study focused on VR, which appears essential to understanding the psychological states evoked by this medium. Our aim was to investigate the capability offered by virtual reality technology with respect not only to the realism of the experience but also to its potential consequences for emotion and cognition.

To the best of our knowledge, this paper presents the results of the first methodological analysis of the potential of VR in eyewitness testimony research, and our study is set firmly in this field. We do not ignore the body of work demonstrating the capability of virtual reality to evoke emotions and arousal (e.g., Hofmann et al., 2021, who studied the subjective emotions and cortical α activity evoked by riding a rollercoaster in VR) or a sense of presence (e.g., Barreda-Ángeles et al., 2021, who used a design similar to ours while investigating journalistic pieces in terms of immersion and cognitive processing). However, with such a specific medium and research subject, we believe it is essential to address the particular context. As Yuille and Wells (1991) argue, for psychological lab research to be generalizable to real-life situations and to serve, for example, expert witnesses, it is essential to consider the contextual equivalence of the real eyewitness experience and the study. In our view, press materials and rollercoaster rides do not reflect this context; thus, our ability to infer the utility of VR in the witness-testimony paradigm from them is limited.

Crucial limitations of laboratory experiments in eyewitness testimony

The discussion regarding the generalizability of laboratory research on memory has been ongoing for many decades, and any attempt to summarize it deserves a separate article. No less intense is the debate over the validity of laboratory research on eyewitness testimony; for some experts, overreliance on laboratory studies is the reason many psychological phenomena are poorly recognized in the forensic field. An extreme position has been presented by Yuille (2013), who argues that "the context of the laboratory is so different from the context of many crimes, particularly violent crimes, that using the lab to study memory in the forensic context is pointless" (p. 9).

One key criticism of laboratory research on eyewitness testimony is that it often uses highly controlled and artificial stimuli, such as photographs or videos of staged events, rather than live events. These stimuli bear little resemblance to real situations, where witnesses often encounter more complex and dynamic stimuli for which they are not prepared. Processing simplified stimuli, or stimuli narrowly focused on a specific aspect of reality, appears to be less prone to the distortions that arise from the high demands that observing a crime places on a real witness, even when the witness's level of involvement is minimal. As a result, lab-based research can be expected to suggest better witness memory performance than may occur under conditions of higher distraction (Lane, 2006).

Other important aspects relate to the inability to simulate a sense of threat and fear in the lab, and the consequences (or lack thereof) that lab eyewitnesses suffer for making mistakes. From the point of view of this paper, however, the critique concerning the conditions for processing and encoding information is crucial: it is this lack of naturalness of the stimulus that we address with this research. With this in mind, the goal of our research was to verify an experimental method using VR elements to simulate the experience of an eyewitness. We believe that this method may overcome the limitations of typical witness testimony research, and has the potential to create a stimulus-rich, close-to-real experience while maintaining high control and replicability of the procedure.

How can virtual reality help experimental psychologists?

The definition of virtual reality is a subject of debate among experts, who do not always agree on the criteria that constitute VR. Since a full discussion of this topic is beyond the scope of this paper, we focus solely on the criteria that justify the choice of this medium for psychological research. They relate primarily to the capacity users have in this environment and the degree of influence they have on it. Many experts would agree to use the term virtual reality only if the user has the ability to move, interfere, and change certain elements of that environment (Kardong-Edgren et al., 2019). For the purposes of this study, however, we adopted a less rigorous criterion: virtual reality is a digital space in which the user's movements are tracked, and their environment is continuously rendered and displayed according to those movements. Its purpose is to replace the signals coming from the real environment with digital ones (Fox et al., 2009). Therefore, a medium that adapts to the user's point of view and cuts off their access to real existing stimuli can be considered virtual reality.

These criteria are met by Video-360° (also called spherical video). Although the ability to influence the environment is limited to changing the field of view, the realism of this medium gives it an undeniable advantage over the strictly digital environment for psychological research. Video-360° uses recordings of real people in a real space. Therefore, researchers need not fear the effects that arise with realistic computer-generated characters (e.g., the uncanny valley; Tinwell et al., 2011).

Some definitions of VR focus not so much on the technology itself as on the user and their experience. For example, as highlighted by Jerald (2015, p. 45): "VR is about psychologically being in a place different from where one is physically located, where that place may be a replica of the real world or may be an imaginary world that does not exist and could never exist." This definition refers, implicitly, to psychological phenomena reflecting a sense of presence, transportation, or immersion in a particular medium: the state of being absorbed by the environment, feeling part of it, and experiencing it (Rigby et al., 2019). These are related terms, but their meanings vary across the broad field of human-computer and human-media interaction. In this paper, we use the term immersion, primarily to ensure consistency between theoretical terms and research methods. It is crucial to underline that the immersion effect is, in our view, an index of the simulation's realism, and this is the focal point of this study: by realism we mean not so much the accuracy with which some fragment of reality is reproduced as the realism of the user's subjective experience. Similarly to Steuer et al. (1995), we believe that a sense of immersion can enhance the overall viewing experience, making it feel more real and lifelike. As a result, we can expect psychological states and behaviors similar to those in real life, since the medium is capable of invoking the illusions of place (a sensation of being in a real place) and plausibility (the illusion that the scenario being depicted is actually occurring) (Slater, 2009).

These main criteria (the ability of users to change their point of view, isolation from external stimuli, and the capacity for an immersion effect) are components of the simulation that better imitate the real-life experience of an eyewitness. However, these are not the only benefits of using VR in experimental procedures. It also automates the procedure, so that it is consistent and unaffected by unexpected external events (in contrast to a staged crime). More complex systems also offer performance recording, which provides insight into what the subject is doing in the environment, e.g., via eye tracking. Thus, increased realism does not come at the expense of procedural rigor or experimental control.

Current study: Variables and hypotheses

Taking into account the nature of witness testimony research and, above all, the need to increase the ecological validity of the research while maintaining the rigor of the experimental procedure, we formulated the following hypotheses.

Our main dependent variable is immersion, an effect that can be described as being absorbed by a given medium (a game, a movie, or even a book). Thus, in this research, immersion serves as the operationalization of the realism of the experience. We expect that [H1] video watched on head-mounted displays (HMDs) creates a stronger sense of immersion in the scene than video watched on a screen. The verification of this hypothesis is crucial for this study. If participants have a greater sense of being present in the created scene and the impression that they are in the space in which a crime is taking place, we would consider that the simulation has fulfilled its primary role: to increase the realism of the simulated experience typical of an eyewitness. In addition to the main effect, we also expect differences in one of the subscales, Transportation. This subscale reflects a psychological state in which the distance between the observer and the scene is shortened, so that the observer feels as if they are part of the events being presented. Achieving such a state seems to fulfill the previously mentioned definition of VR proposed by Jerald (2015), which outlines a psychological "transfer" to a created reality.

A secondary issue concerns the consequences of increased immersion. As our objective was to develop a stimulus manipulation suitable for eyewitness testimony research, we assumed [H2] that subjects who watched the scene on an HMD would feel stronger emotions than those who watched the same video on a screen. In particular, we expected higher ratings of negative emotions accompanied by higher arousal. We therefore expected our experiment to be in line with other studies suggesting an increased emotional response and arousal in VR (see, e.g., Estupiñán et al., 2013; Tian et al., 2021).

Due to the stimulus-rich environment, playing videos on head-mounted displays can also have negative consequences in terms of distraction and difficulty focusing on the scene presented. One of the challenges of creating any narrative in the Video-360° format is to attract and direct attention to the focal actions, as the VR viewer has a much larger field of vision to explore (Dooley, 2017). As a result, participants in the experiment may ignore the events that are presented and focus on something completely different. Another problem related to immersive media such as VR is visual fatigue and cognitive overload, which can lead to impairment of certain cognitive functions (Frederiksen et al., 2020; Souchet et al., 2022a; for a review, see Souchet et al., 2022b). In fact, some empirical studies suggest the existence of this effect, although the material presented was quite different from that prepared for this research (Barreda-Ángeles et al., 2021).

It is therefore necessary to examine whether attention processes, and the memory processes that depend on them, are in any way impaired in this stimulus-rich environment. As the purpose of the study was to test an experimental method suitable for research in eyewitness testimony, we chose long-term (episodic) memory as a measure of cognitive functioning; this type of memory is the main focus of research in this area. The theoretical and empirical rationale for investigating the relationship between attentional processes and long-term memory is substantial. Prominent models of information processing treat attention, working memory, and long-term memory as interconnected systems (for example, the embedded-processes model proposed by Cowan, 1995, 1999). Additionally, neuroscientific research provides evidence of the interaction between attention and long-term memory (for a review, see Chun & Turk-Browne, 2007). Hence, we set out to explore potential differences in event recollection. If the proposed simulation proved to be a valid research method, [H3] we would expect similar memory functioning in both groups.

Materials and method

Participants.

A total of 115 subjects participated in the study (76 female). Ultimately, due to incomplete questionnaires and device or recording malfunctions, 107 subjects (M age = 22.18; SD age = 2.74) were eligible for the final analysis. The experimental group (VR equipment) included 57 subjects (38 female), while the control group (flat screen) included 50 subjects (35 female). The groups did not differ in terms of age (t(104) = .422; p = .674). As compensation for participation in the study, subjects were offered a 15-minute VR gaming session and an individual personality profile.

Materials and apparatus

Experimental manipulation.

The video presenting a staged criminal incident prepared for the experiment was shot using Video-360° technology, which allows the full perceptual field to be observed. It lasts about three minutes and presents a scene in a pub with an outdoor garden. The criminal incident involves two perpetrators, male and female. They rob a girl who is sitting next to them. To carry out the theft, the male perpetrator turns to the victim and asks her for directions; at the same time, the female perpetrator approaches the table, takes a tablet and a wallet, and walks away from the scene. When the girl realizes that her belongings have been stolen and tries to run after the female perpetrator, the male stops her by pushing her onto a chair and knocking the rest of the items off the table.

Video display equipment

We used an HP Omen laptop computer with a 15″ screen and HP Reverb G1 goggles (a head-mounted display). We used the HMD in the experimental condition and the computer screen in the control condition. In the experimental condition, subjects were able to view the full 360-degree perceptual field, while subjects watching the movie on the screen viewed a slice of that scene, covering the central visual field, adapted to the flat screen. A comparison of the image observed by subjects in the two groups is presented in Fig. 1.

Figure 1. Comparison of perceptual fields accessible to subjects under the two conditions. At the top, the 360-degree view as seen in the HMD; at the bottom, the scene as shown on a 2D screen.

Post-event emotional ratings

The Geneva Emotion Wheel (GEW; Sacharin et al., 2012) was used to determine the valence and intensity of emotions experienced by participants while watching the film. This is a self-report measure consisting of discrete emotion labels, corresponding to emotion domains, arranged in a circle. The arrangement of emotion terms reflects two underlying dimensions: valence (negative to positive) and control (low to high). The response options correspond to different levels of intensity for each emotion family, from low intensity (1) to high intensity (5). Subjects can also indicate that they did not feel a particular emotion (0), and they can independently label the name of an emotion they experienced.

Psychophysiological measurements

To assess arousal, we measured electrodermal activity (EDA). A wireless Shimmer3 GSR+ unit (worn as a wristband on the nondominant hand) and two EDA electrodes were used. The unit was calibrated with a sampling rate of 512 Hz. Subjects were asked to take a comfortable position, place their forearms on the desk, and attempt to minimize hand movement while watching the video. Preprocessing and further data analysis were performed in Python using pyEDA (Hossein Aqajari et al., 2021).
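The paper does not reproduce the pyEDA pipeline itself. As a rough, hypothetical sketch of the kind of preprocessing described above (not the authors' code; the 1 Hz low-pass cutoff and the number of seconds skipped for the novelty effect are assumed values), a baseline-corrected mean EDA per subject might be computed as follows:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 512  # sampling rate in Hz, as reported for the Shimmer3 GSR+ unit

def clean_eda(signal, fs=FS, cutoff=1.0):
    """Low-pass filter raw EDA. A 1 Hz cutoff is a common choice for
    skin-conductance signals (an assumption, not taken from the paper)."""
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, signal)

def baseline_corrected_mean(raw_video, raw_baseline, fs=FS,
                            skip_s=10, window_s=165):
    """Mean EDA over a 165-s video segment minus the subject's own
    baseline mean, mirroring the within-subject correction described
    in the text (skip_s, the seconds dropped for the novelty effect,
    is a placeholder value)."""
    video = clean_eda(np.asarray(raw_video, dtype=float), fs)
    base = clean_eda(np.asarray(raw_baseline, dtype=float), fs)
    seg = video[skip_s * fs:(skip_s + window_s) * fs]
    return seg.mean() - base[:window_s * fs].mean()
```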

Immersion assessment

To measure immersion in the videos, we used the Immersive Experience Questionnaire for Film and TV (Film IEQ) developed by Rigby et al. (2019). The questionnaire was translated into Polish. It consists of 24 items and four factors: Captivation, Real-World Dissociation, Comprehension, and Transportation. The overall score on the questionnaire determines the strength of the immersion effect. Participants were asked to indicate on a seven-point scale how much they agreed with each statement.

Post-event memory performance

In this study, we analyzed episodic memory using a delayed free-recall procedure. For the purpose of the study, we created an index comprising the number of correctly remembered details about the event. The list of details considered includes information on the course of the event and the appearance and behavior of the perpetrators. It was developed by two competent judges who were unrelated to the project and not involved in the psychology of witness testimony. They were asked to watch the video (in the 2D variant) and then, immediately afterwards, to record all the information about the scene and the appearance and behavior of the people they watched. Based on the two lists we received, we created one covering all the noticeable details. We treated each detail as a bit of information, which we then scored (if the information was given) in the subjects' responses. The maximum the subjects could report was 83 bits of information.

Due to differences between the videos in the size of the perceptual field, we only included information common to both conditions in the analysis.
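The scoring itself amounts to checklist tallying. A minimal sketch of how a coded transcript could be scored against such a list (the checklist entries below are invented for the example; the real 83-item list is not reproduced in the paper):

```python
# Hypothetical fragment of the checklist; the real 83-item list compiled
# by the two judges is not reproduced in the paper.
CHECKLIST = {
    "male perpetrator asks victim for directions",
    "female perpetrator takes a tablet",
    "female perpetrator takes a wallet",
    "victim pushed onto a chair",
}

def score_recall(coded_details: set[str]) -> int:
    """One point per checklist detail present in the coded transcript;
    the maximum equals the checklist length (83 in the study)."""
    return len(coded_details & CHECKLIST)

# Example: a transcript coded with two correct details scores 2.
print(score_recall({"female perpetrator takes a wallet",
                    "victim pushed onto a chair"}))
```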

The experiment was conducted in a between-subjects design. Subjects were randomly assigned to the experimental or control condition at the time of enrollment. The conditions differed in the type of equipment used to display the video. In the experimental conditions, subjects watched a 360-degree video played on head-mounted displays; in the control conditions, we used the traditional method of playing the video, i.e., on a flat screen.

Due to the health concerns of the subjects and the resulting sanitary rigor, the experiment was conducted in individual sessions. The whole procedure took about 1 h (+ c. 15–20 min for the game session offered as compensation). It included the following steps:

Preparation and baseline measurement (relaxing video) of electrodermal activity.

Exposure to stimulus (HMD versus flat screen) and electrodermal activity measurement.

Emotion self-report. Immediately after the video ended, subjects were asked to rate the intensity of the emotions they had felt while watching it. We wanted to measure them soon after the film ended, so that the emotions would still be vivid and easy to evaluate.

Immersion measurement (self-report).

Filler task designed to delay the memory testing, allowing us to study long-term memory rather than working memory. The participants completed questionnaires, the results of which will not be reported here.

Free recall memory task. Respondents were asked three questions: (1) Tell all you remember about the scene in a pub that you just watched, both about how the scene unfolded and about the people who participated in it. (2) Do you remember anything about the appearance of the main characters? (3) Is that all you remember about the film? The task format, i.e., three questions, was developed after a pilot study showed that subjects, when asked to describe "everything they remember," limited themselves to a very schematic and brief description of the events. As very short descriptions do not allow for a reliable comparative analysis, we decided to expand the task and ask three questions. As our study is concerned with eyewitness testimony, the question about the perpetrators' appearance was crucial (this type of information is often collected by investigators to identify perpetrators). We also added the third question in case a subject remembered something about the perpetrators' behavior after recalling their appearance. Subjects' responses were recorded using a voice recorder. The recordings were then transcribed and coded for analysis of the amount of information provided. The time interval between memory encoding and recollection was set at 25 min.

The procedure was positively reviewed and approved by the Research Ethics Committee at the Institute of Applied Psychology at the Jagiellonian University before its application (decision number 56/2019 dated 25 November 2019).

For statistical analysis, we used PS Imago (IBM SPSS Statistics 28), JASP 0.16.4.0, and Python 3.10. The default software was SPSS; we note explicitly when analyses were performed with other tools.

Hypothesis 1. Videos watched on HMD are more immersive than those watched on screen

Our main objective was to verify the hypothesis of deeper immersion for video viewed on an HMD. To examine this, we used the Film IEQ to measure the overall immersion effect and its components. We were most interested in the main effect, but we also expected to see a difference in Transportation. The subjects' ratings and the between-subjects comparison are presented in Table 1.

A comparison using a one-tailed (given the directional hypothesis) t-test for independent samples showed that participants who watched the video on the HMD rated immersion (t(105) = 2.756; p = .003; d = .534) and two of its components, Captivation (t(105) = 2.963; p = .002; d = .574) and Transportation (t(105) = 1.963; p = .026; d = .380), higher. Ratings for the two other factors, Comprehension (t(105) = –.553; p = .291) and Dissociation (t(105) = .132; p = .189), did not differ between conditions. Given that we primarily expected a significant difference in the main effect, we consider Hypothesis 1 to be confirmed.
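As an illustration, this kind of one-tailed independent-samples comparison with an accompanying Cohen's d can be reproduced in Python with SciPy (a sketch; `vr_scores` and `screen_scores` stand in for the per-subject Film IEQ totals, which are not published in the text):

```python
import numpy as np
from scipy import stats

def one_tailed_comparison(vr_scores, screen_scores):
    """Independent-samples Student's t-test, one-tailed
    (H1: VR > screen), plus Cohen's d with a pooled SD."""
    vr = np.asarray(vr_scores, dtype=float)
    sc = np.asarray(screen_scores, dtype=float)
    t, p = stats.ttest_ind(vr, sc, alternative="greater")
    n1, n2 = len(vr), len(sc)
    pooled_sd = np.sqrt(((n1 - 1) * vr.var(ddof=1)
                         + (n2 - 1) * sc.var(ddof=1)) / (n1 + n2 - 2))
    d = (vr.mean() - sc.mean()) / pooled_sd
    return t, p, d
```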

Hypothesis 2. Video watched on HMD evokes stronger emotions and higher arousal

Our second hypothesis concerns the potential consequences of the immersion effect, i.e., stronger emotional responses. In this experiment, we examined subjects' ratings of emotions in terms of intensity and valence, as well as psychophysiological arousal. The first two aspects were examined using self-reports (GEW), while arousal was operationalized as electrodermal activity (EDA).

Post-event emotional self-ratings

To answer the question of whether video played on an HMD evokes stronger emotions than video played on a screen, we analyzed the answers subjects gave on the GEW. We first analyzed all discrete emotion labels and compared them between conditions (Table 2). The analysis indicates that the only emotion that subjects in the experimental group (VR) rated higher was guilt (t(79.53*) = 2.753; p = .004; d = .520; one-tailed significance). Moreover, contrary to our directional hypothesis, participants who watched the video on the screen rated hate (t(89.69*) = –2.368; p = .010; d = .455) and anger (t(104.97*) = –2.928; p = .002; d = .562) higher. These emotions, rather than fear, are to be expected after watching a criminal incident (theft and assault), as the subjects were not at risk of any harm.

In the second step, we created general indices of emotion domains, in line with the theoretical background of the method (Scherer, 2005). Each index is the averaged rating of the emotions belonging to one of the quarters of the GEW (negative valence, low control; negative valence, high control; positive valence, low control; positive valence, high control). As can be seen in Table 3, there is a significant (one-tailed) difference (t(105) = –1.762; p = .040; d = .349) between conditions with respect to ratings of emotions with negative valence and high control. This is a consequence of the higher ratings for anger and hate that comprise this domain. Notably, however, the result is the opposite of the one we assumed.

We began the EDA analysis by checking the data for any recording errors or artifacts that might strongly distort the measurement. As we did not identify such records, we performed the analysis using the calculation method proposed by Hossein Aqajari et al. ( 2021 ). First, we examined the mean level of electrodermal activity recorded when subjects watched the video. This allowed us to determine the overall arousal induced by the medium. Figure 2 presents the filtered electrodermal activity. To eliminate individual differences in perspiration, we compared the measurements recorded during the crime video with baseline measurements recorded during the preparation for the experiment. We compared two segments lasting 165 seconds, omitting the first seconds of the video because of the potential novelty effect that may cause arousal.

Figure 2. Filtered electrodermal response recorded while watching the video; between-subjects comparison.

Table 4 presents the results of our analysis ("mean activity"). Although we did not obtain a significant difference between the conditions (t(105) = 1.553; p = .062) in the one-tailed test (used owing to the directional hypothesis of higher arousal in the experimental condition), we can describe these results as being on the verge of significance.

The second step of our analysis was to compare only the end of the video, that is, the several seconds (18 s) during which the crime occurred, because we wanted to isolate the arousal caused by the crime stimulus itself rather than by the entire video. The results are presented in Table 4 ("max. peak"). To verify the hypothesis of stronger arousal when the crime itself was observed on the HMD rather than on the screen, we compared the maximum amplitude peak between conditions. Once again, the one-tailed t-test was close to significance (t(91.98) = 1.529; p = .065). Summarizing the analyses performed to verify Hypothesis 2, we can cautiously conclude that subjects in the experimental condition were more aroused than those in the control group. At the same time, they generally rated negative emotions with high control lower than those who watched the film on the screen, although they felt stronger guilt than subjects in the control condition. We therefore consider these results inconclusive.

Hypothesis 3. Video displayed in VR HMD is not more distracting than video played on screen

We considered post-event memory performance as a measure of distraction. We assumed that distraction would be indicated by a lower number of correctly reported pieces of information about the crime scene. Thus, to compare recollection between conditions, we used an index covering the number of details accurately remembered by the subjects. Table 5 presents the results. The t-test revealed that the conditions did not differ in the number of correctly remembered details (t(105) = .073; p = .942). However, as our hypothesis stated that there are no differences in recollection, we also decided to use Bayesian statistics and apply the Bayes factor (BF) in the interpretation. The BF is the ratio of the probability of the observed data under two competing models (the null and the alternative hypothesis; Masson, 2011). We performed the analysis in JASP and adopted the interpretation of the factor according to Andraszewicz et al. (2015): BF01 of 1–3 = anecdotal evidence for the null hypothesis; 3–10 = moderate evidence; 10–30 = strong evidence. The Bayesian independent-samples t-test shows moderate evidence for H0 (that is, no difference in the number of correctly recalled bits of information about the event; BF01 = 4.866).
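For readers working in Python rather than JASP, a comparable default JZS Bayes factor can be derived from the reported t statistic and group sizes, for example with pingouin's `bayesfactor_ttest` (whether its default Cauchy prior, r = .707, matches the JASP settings used in the study is an assumption):

```python
import pingouin as pg

# t statistic and group sizes from the memory comparison reported above
bf10 = float(pg.bayesfactor_ttest(t=0.073, nx=57, ny=50))
bf01 = 1 / bf10  # evidence in favor of the null (no group difference)
print(f"BF01 = {bf01:.2f}")  # values of 3-10 count as moderate support for H0
```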

In addition, to investigate more subtle aspects of recollection, we also analyzed misreports. We took into account both types of errors: (1) distortions, i.e., all bits of information involving details that were present in the video but were incorrectly reported (e.g., the wrong color of pants, misremembered behavior), and (2) additions, i.e., all bits of information that were absent from the video but were reported by subjects. As can be seen in Table 6, the mean number of both types of errors, as well as their sum (Σ distortions + additions), is similar in the VR and Screen conditions. The between-subjects comparison also showed no statistical difference in the number of errors; however, in the case of distortions there is only anecdotal evidence for the null hypothesis (BF01 = 2.46). Finally, we investigated the overall accuracy of recollection and compared the rates between conditions. Following Evans and Fisher (2011), we define the recollection rate as the number of accurately provided details of the event (see Table 5) divided by the sum of accurate details and errors (see Table 6). The rates, as shown in Table 7, are almost identical for the two conditions, and the between-subjects comparison indicates no difference in the accuracy of recall (t(105) = .127; p = .899). The Bayesian t-test provides moderate evidence for the null hypothesis (BF = 4.84). Considering all the above analyses performed for Hypothesis 3, we conclude that it has been confirmed.
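Since the rate is a simple proportion, a worked example may help (the numbers are invented for illustration):

```python
def recollection_rate(accurate: int, distortions: int, additions: int) -> float:
    """Accuracy of recall after Evans and Fisher (2011):
    accurate details / (accurate details + all errors)."""
    return accurate / (accurate + distortions + additions)

# Illustrative numbers only: 30 accurate details, 4 distortions, 2 additions
print(round(recollection_rate(30, 4, 2), 3))  # 0.833
```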

In the experiment (N = 107) in which we compared two types of video display devices (a head-mounted display and a flat screen), and thus two formats of video recording (Video-360° and 2D video), we obtained results suggesting that our proposed method may be a more realistic alternative to traditional stimulus manipulations using videos. We infer the higher realism of the subjects' experiences primarily from the difference in the immersion effect evoked during the stimulus manipulation. We observed higher ratings of immersion and two of its factors (Captivation and Transportation) among people who watched the video on the HMD; thus, we believe that this medium offers researchers the potential to elicit in subjects a sense of being highly engrossed in a mediated experience. Our results suggest that the VR group felt more involved in the video and were more motivated to watch it (Captivation). Furthermore, there are some arguments in favor of the notion that, while watching a criminal incident on the HMD, subjects felt as if they were experiencing the events themselves and were located in the world portrayed in the video (Transportation). These differences between conditions indicate that the proposed method increases the realism of the experience and shortens the distance between the observer and the scene.

The results of our study can be related to the concept of two different types of realism in laboratory research introduced by Aronson and Carlsmith (1968) and developed by Wilson et al. (2010). These researchers proposed assessing laboratory research in terms of experimental and mundane realism. The former refers to subjects' involvement in the situation created in the laboratory and the authentic experiences evoked during the task, while the latter is defined as the similarity of the experimental situation to events that might happen in real life. The results of our study support the argument that VR may enhance both types of realism. On the one hand, subjects in the VR group were more engaged in the experiment, as evidenced by higher scores in Captivation; on the other hand, they felt as if they were part of the crime event (Transportation), which appears to satisfy the definition of mundane realism. Therefore, we believe that studies using VR for stimulus presentation are less vulnerable to the charge of “artificiality” commonly leveled against traditional laboratory research on eyewitness testimony.

In contrast to immersion, we obtained inconclusive results when comparing subjects' emotional responses between conditions. On the one hand, we can argue (with some caution) that subjects in the experimental condition were slightly more aroused than those in the control condition, although the results are only on the cusp of significance in one-tailed tests. To evaluate the uncertainty associated with these results, we conducted additional analyses using bootstrapping simulations. Their results (see Appendix, Supplementary Analysis B) provide additional support for the notion that subjects' arousal was higher while watching the crime scene in VR than on the screen. First, parametric bootstrapping (resampling 10,000 times) demonstrated a significant difference between the conditions in the change in arousal from the baseline measurement to the arousal experienced while viewing the offense. Second, the permutation test showed that although the maximum arousal registered during the last scene (the actual crime) was comparable between conditions, this holds only for low and medium amplitudes. For the most responsive subjects, the crime scene viewed in VR was significantly more arousing than the scene presented on the screen. These results suggest that experimental manipulation in VR may be recommended particularly for strong emotional stimuli and/or populations with a low arousal threshold. Our study thus indirectly supports the finding of Slater et al. (2006), who showed a significant increase in arousal in an anxiety-inducing situation experienced in VR by phobia-sensitive subjects.
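
To illustrate the general logic of such robustness checks, the sketch below implements a label-permutation test of the between-condition difference in arousal change; the parametric bootstrap mentioned above follows the same resampling idea but draws from distributions fitted to each group. This is a hypothetical illustration, not the code behind Supplementary Analysis B.

```python
# Sketch of a permutation test for the VR-vs-Screen difference in arousal
# change (viewing minus baseline). All data here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
delta_vr = rng.normal(0.6, 1.0, size=54)      # hypothetical VR arousal changes
delta_screen = rng.normal(0.2, 1.0, size=53)  # hypothetical Screen arousal changes

observed = delta_vr.mean() - delta_screen.mean()

# Shuffle condition labels 10,000 times and recompute the mean difference
pooled = np.concatenate([delta_vr, delta_screen])
n_vr = len(delta_vr)
perm_diffs = np.empty(10_000)
for i in range(perm_diffs.size):
    rng.shuffle(pooled)  # random reassignment of subjects to conditions
    perm_diffs[i] = pooled[:n_vr].mean() - pooled[n_vr:].mean()

# One-tailed p: how often a random labeling beats the observed difference
p_one_tailed = (perm_diffs >= observed).mean()
print(f"observed diff = {observed:.3f}, one-tailed p = {p_one_tailed:.4f}")
```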

On the other hand, we obtained rather surprising results in the ratings of the intensity of discrete emotions. They indicate stronger anger and hate felt by subjects in the control condition and more intense guilt felt by those in the experimental group.

First, it is necessary to address the discrepancy between the two measurements of the components of emotion (subjective feeling and psychophysiological measurement). This inconsistency can be explained theoretically, and parallels can be found in other empirical studies (e.g., Mauss et al., 2004; Chivers et al., 2010; Ciuk et al., 2015). People struggle to identify and evaluate the intensity of emotions for various reasons. Labeling a specific emotion may be difficult because emotions can change quickly during the emotional process (Scherer, 1987). Moreover, some stimuli may elicit emotions that are more complex and multifaceted than those captured by simple measurement methods associated with discrete emotions.

However, this discrepancy does not explain why subjects watching the crime in VR felt stronger guilt, while those watching the video on screen rated anger and hate higher. As our research is the first attempt at a methodological analysis of stimulus manipulation using Video-360° and VR equipment to compare discrete emotions, we include two alternative explanations in discussing the results; we consider them a starting point for future research on the proposed technique in witness testimony studies.

First, it should be considered that the emotion ratings adequately represent the subjects' emotional experience, and that these two media therefore do elicit different kinds of emotions. Drawing on the theory of emotions, it is possible to formulate explanations of which aspects of the experimental manipulation may have acted as their antecedents.

Guilt

Although we commonly think of shame and guilt as feelings we experience as a result of our own actions, feelings of self-condemnation can sometimes result from acts committed by others. In such situations, we can speak of so-called vicarious guilt, as Lickel et al. (2005) defined it. This concept assumes that personal causality is not always a prerequisite for the experience of guilt, and that certain conditions may induce it nonetheless. Referring to Lickel's research, it thus seems possible that subjects who experienced increased transportation to the crime event and immersion in the scene could have felt stronger vicarious guilt due to the sense of control over the situation induced by virtual reality. Perhaps, while watching the crime, they felt that they could have done something: helped the victim catch the perpetrators, or even stopped them before the crime occurred. Importantly, the intensity of this guilt was not high, possibly because the emotion was triggered by the behaviors and actions of someone else rather than by their own. This explanation, however, needs further verification with methods capable of discriminating between different types of guilt.

Anger and hate

These two emotions are closely related in content and are sometimes considered together (e.g., Bernier & Dozier, 2002; Frijda, 1986; Power & Dalgleish, 2015). Anger is often defined as a modal/basic emotion. By signaling significance at the individual–environment interface, it organizes a response to the stimulus, which often takes the form of aggression. However, anger is not necessarily a response to a stimulus directly related to the individual's self; it can also be triggered by aversive environmental stimuli, such as unpleasant sights, smells, and extreme thermal sensations (Berkowitz, 1990). In this sense, it appears more similar to hate than to a modal emotion that prepares for a fight. After all, one way to understand hate, on an individual rather than a group level, is to define it as a strong feeling of intense or passionate dislike for someone or something. When considering hate, we most commonly refer to an emotion aroused by the frustration of needs or an unpleasant sensory experience (Brogaard, 2020), but this emotion also has links to the moral evaluation of certain behaviors (Pretus Gomez et al., 2022). In this sense, hate and anger are emotions that could be evoked by a video presenting two individuals committing a crime and behaving in an irritating manner. Juxtaposing the self-report ratings with the psychophysiological measurements (lower arousal in the control condition), we can conclude that the video probably did not evoke violent, highly arousing emotions or trigger a fight-or-flight response; rather, it elicited a moral emotion based on an evaluation of the culprits' behavior. This line of reasoning, however, requires additional research to provide a more in-depth understanding of the subjects' emotional states. Methods based on a free-response format (e.g., the Geneva Affect Label Coder; K. R. Scherer, 2005) or a focused interview may be most useful.

However, why these emotions were felt more intensely when the crime was seen on the screen is more challenging to explain. Perhaps this format allowed the subjects to focus more on the course of the events they were watching. They had no influence on the visual field, so they could only follow what the perpetrators did. As a result, the perpetrators' attitude, conversation, and actions may have evoked stronger emotions. Such an explanation would be consistent with research indicating that shifting attention from an emotion-eliciting stimulus to its background significantly reduces emotional experience (e.g., Dolcos et al., 2020). Moreover, the ability to shift attention is one of the theoretical factors mediating emotional experience: it is necessary for regulating emotions and, therefore, for maintaining desirable emotional states (Wadlinger & Isaacowitz, 2011).

This format-driven focus on a key part of the scene, while reducing the ecological validity of the “witnessing” experience, may also have made the scene less ambivalent and simpler to interpret. Meanwhile, the scene viewed on the HMD gave the subjects some control over the experience: although they remained static (they could not change their seats), they could look away and see how others were reacting. Perhaps, too, the incident was more surprising or startling, which was not captured by the self-report method we used. As a result of being present in the space with other eyewitnesses, the subjects' responses may have been influenced by how the other people present in the pub behaved (the characters expressed surprise and incomprehension of what had happened, both in their reactions and verbally). After all, as Erber and Erber (2000) stated, people are often compelled to regulate affective states according to the demands of the situation, and social appropriateness (especially when interacting with strangers) is one of the most prominent motives for self-regulation. Thus, in the experimental condition, the emotions evoked were perhaps those more appropriate to the situation: not so much anger and hatred toward the perpetrators as surprise that the robbery happened at all. However, this interpretation requires verification to determine the intensity of the surprise felt in a VR environment.

An alternative explanation for the results can also be offered; it concerns not so much the control group as the experimental one. Researchers investigating the immersion effect, in particular presence in a virtual environment, draw attention to the essentially hedonic nature of this experience. As Murray (2017, p. 98) states, “The experience of being transported to an elaborately simulated place is pleasurable in itself, regardless of the fantasy content.” Accepting this explanation, it can be argued that the pleasurable nature of being in a simulated space among other people on a warm summer day may have suppressed negative emotions in the experimental group. This explanation is all the more plausible given that the study took place during a period of sanitary restrictions (related to the COVID-19 pandemic), which limited opportunities for social participation (see footnote 3). On the other hand, the comparison of consciously reported positive emotions showed no difference between the groups, although subjects who watched the video on the HMD rated them slightly higher than those who watched it on a screen (M(VR) = .91, SD = 1.57; M(Screen) = .54, SD = 1.28; t(104.45) = 1.348; p = .090, one-tailed).

Our study also demonstrated that the proposed simulation method does not affect memory processes, indicating that a full Video-360° stimulus environment is neither more distracting nor conducive to cognitive exhaustion. Thus, our research does not confirm the results of Barreda-Ángeles et al. (2021), who observed that a virtual reality environment can harm focused attention, recognition, and cued recall of information. This discrepancy is likely due to a significant difference in the content presented. While our study aimed to present a realistic crime scene, and therefore a video that can be used in the psychology of eyewitness testimony, Barreda-Ángeles et al. used journalistic excerpts with specific narration and editing. Although virtual reality can cause cognitive fatigue when the task itself is also performed in this environment, when the experience is multimodal in nature, or when the quality of the simulation produces negative phenomena such as simulator sickness or interference with the visual sense (Nash et al., 2000; Souchet et al., 2022b), we believe that our Video-360° was easy to process. The scene presented in this research appears realistic and coherent, and was thus processed fluently; it serves not so much as a content carrier but as a presence in the environment itself. However, the scene in VR still directs attention and forces concentration on the elements chosen by the developer, so in this respect it remains a proxy of the witness experience, for which greater memory disruption would be expected (Ihlebæk et al., 2003).

Limitations and future research

Although our study indicates that virtual reality has important potential applications in memory research and contributes to the research methodology of the psychology of eyewitness testimony, it is not free from limitations. As the study compared a video presented on a screen with one mediated by virtual reality equipment, the ability to draw inferences about the ecological validity of the method is still limited. For the method to be considered more ecologically valid, a comparison with a natural experiment would be necessary. Nevertheless, based on the results, we can infer a higher realism of the witnesses' experience: a deeper sense of being a real observer of the crime rather than a viewer of a crime film.

Another limitation is the relatively modest sample size, which probably explains why some analyses did not yield significant results and remained only on the verge of significance. However, the research was conducted during a period of sanitary restrictions, which not only made it harder to access potential participants but also slowed the research process. Research using virtual reality equipment required subjects to be present in the laboratory and could not be carried out over the Internet. We therefore decided to conduct the experiment within the scheduled project period, even at the cost of a smaller sample size.

We believe the research should be repeated, not only because of the small sample size but also because of the surprising results of the emotional response analysis, which ran contrary to our hypotheses. Given that the tool we chose to measure the intensity of emotions did not capture surprise, we cannot be certain that our account of the coherence between the subjects' reactions and those of the other “eyewitnesses” presented in the film is adequate. Future research should therefore compare startle responses; this would provide a stronger argument that the behavior and reactions elicited by the VR simulation are realistic. Optimally, similar comparisons should also be made between the VR experiment and a naturalistic one. Moreover, a more in-depth analysis of the subjects' emotional states and their experience of observing the crime is also necessary, ideally one that allows subjects to describe their states without the researcher suggesting how to label them. Accounts of more complex phenomenological experiences could then be compared to actual witnesses' emotional states, which may provide a key argument for recognizing the proposed method as a valid simulation of witnesses' experiences.

Furthermore, the potentially pleasant nature of VR-mediated experiences should also be examined. As mentioned above, one possible explanation for the lower ratings of negative emotions in VR is their suppression by the pleasurable nature of virtual reality. To determine whether an experimental manipulation mediated by VR in fact evokes different emotions than one performed using a traditional method, the experiment should be repeated during a period of ordinary access to social life. Another way to test this is to prepare a different stimulus that is not as easily associated with pleasure and leisure.

Given the above, we believe that our study represents an important step in the development of an ecologically valid experimental method. It can potentially benefit not only the psychology of witness testimony but also more general studies of other mental functions and behaviors, allowing them to be set in a more realistic context without losing control of the procedure.

Footnotes

1. One person in the experimental group did not report age.

2. The video has been deposited in a repository and is available for noncommercial use by researchers under a CC BY-NC-ND 4.0 license (DOI: 10.26106/r0av-bn42; https://ruj.uj.edu.pl/xmlui/handle/item/308227?locale-attribute=en).

3. We consider this explanation plausible also in light of a qualitative assessment of the reports that subjects spontaneously produced after watching the film. A portion of the participants, when asked to describe the scene, also highlighted their own emotions and described their experience; some statements included accounts of the pleasure they felt during the simulation. Since free-format description of emotional states was not part of the procedure, we did not analyze these reports systematically, but we consider this explanation plausible and in need of verification by more sensitive methods.

References

Andraszewicz, S., Scheibehenne, B., Rieskamp, J., Grasman, R., Verhagen, J., & Wagenmakers, E.-J. (2015). An introduction to Bayesian hypothesis testing for management research. Journal of Management, 41(2), 521–543. https://doi.org/10.1177/0149206314560412


Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (2nd ed., Vol. 2, pp. 1–79). Addison-Wesley.


Barreda-Ángeles, M., Aleix-Guillaume, S., & Pereda-Baños, A. (2021). Virtual reality storytelling as a double-edged sword: Immersive presentation of nonfiction 360°-video is associated with impaired cognitive information processing. Communication Monographs, 88 (2), 154–173. https://doi.org/10.1080/03637751.2020.1803496

Berkowitz, L. (1990). On the formation and regulation of anger and aggression: A cognitive-neoassociationistic analysis. American Psychologist, 45(4), 494–503.

Bernier, A., & Dozier, M. (2002). The client-counselor match and the corrective emotional experience: Evidence from interpersonal and attachment research. Psychotherapy: Theory, Research, Practice, Training, 39(1), 32–43. https://doi.org/10.1037/0033-3204.39.1.32

Brogaard, B. (2020). Hatred: Understanding Our Most Dangerous Emotion . Oxford University Press.


Chae, Y. (2010). Application of Laboratory Research on Eyewitness Testimony. Journal of Forensic Psychology Practice, 10 (3), 252–261. https://doi.org/10.1080/15228930903550608

Chivers, M. L., Seto, M. C., Lalumière, M. L., Laan, E., & Grimbos, T. (2010). Agreement of Self-Reported and Genital Measures of Sexual Arousal in Men and Women: A Meta-Analysis. Archives of Sexual Behavior, 39 (1), 5–56. https://doi.org/10.1007/s10508-009-9556-9


Chun, M. M., & Turk-Browne, N. B. (2007). Interactions between attention and memory. Current Opinion in Neurobiology, 17 (2), 177–184. https://doi.org/10.1016/j.conb.2007.03.005


Ciuk, D., Troy, A., & Jones, M. (2015). Measuring Emotion: Self-Reports vs. Physiological Indicators. SSRN Electronic Journal . https://doi.org/10.2139/ssrn.2595359

Cowan, N. (1995). Attention and memory: An integrated framework . Clarendon Press.

Cowan, N. (1999). An embedded-processes model of working memory. In A. Miyake & P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control (1st ed., pp. 62–101). Cambridge University Press. https://doi.org/10.1017/CBO9781139174909

Dolcos, F., Katsumi, Y., Shen, C., Bogdan, P. C., Jun, S., Larsen, R., ..., Dolcos, S. (2020). The Impact of Focused Attention on Emotional Experience: A Functional MRI Investigation. Cognitive, Affective, & Behavioral Neuroscience, 20 (5), 1011–1026. https://doi.org/10.3758/s13415-020-00816-2

Doliński, D. (2018). Is Psychology Still a Science of Behaviour? Social Psychological Bulletin, 13 (2), e25025. https://doi.org/10.5964/spb.v13i2.25025

Dooley, K. (2017). Storytelling with virtual reality in 360-degrees: A new screen grammar. Studies in Australasian Cinema, 11 (3), 161–171. https://doi.org/10.1080/17503175.2017.1387357

Erber, R., & Erber, M. W. (2000). The Self-Regulation of Moods: Second Thoughts on the Importance of Happiness in Everyday Life. Psychological Inquiry, 11 (3), 142–148. https://doi.org/10.1207/S15327965PLI1103_02

Estupiñán, S., Rebelo, F., Noriega, P., Ferreira, C., & Duarte, E. (2013). Can Virtual Reality Increase Emotional Responses (Arousal and Valence)? A Pilot Study. Lecture Notes in Computer Science, 8518 , 541–549.

Evans, J. R., & Fisher, R. P. (2011). Eyewitness memory: Balancing the accuracy, precision and quantity of information through metacognitive monitoring and control. Applied Cognitive Psychology, 25 (3), 501–508. https://doi.org/10.1002/acp.1722

Fox, J., Arena, D., & Bailenson, J. (2009). Virtual Reality: A Survival Guide for the Social Scientist. Journal of Media Psychology: Theories, Methods, and Applications, 21 , 95–113. https://doi.org/10.1027/1864-1105.21.3.95

Frederiksen, J. G., Sørensen, S. M. D., Konge, L., Svendsen, M. B. S., Nobel-Jørgensen, M., Bjerrum, F., & Andersen, S. A. W. (2020). Cognitive load and performance in immersive virtual reality versus conventional virtual reality simulation training of laparoscopic surgery: A randomized trial. Surgical Endoscopy, 34 (3), 1244–1252. https://doi.org/10.1007/s00464-019-06887-8

Frijda, N. H. (1986). The Emotions . Cambridge University Press.

Grzyb, T., & Dolinski, D. (2021). The Field Study in Social Psychology: How to Conduct Research Outside of a Laboratory Setting? Routledge. https://doi.org/10.4324/9781003092995

Hofmann, S. M., Klotzsche, F., Mariola, A., Nikulin, V., Villringer, A., & Gaebler, M. (2021). Decoding subjective emotional arousal from EEG during an immersive virtual reality experience. ELife, 10 , e64812. https://doi.org/10.7554/eLife.64812

Hossein Aqajari, S. A., Naeini, E. K., Mehrabadi, M. A., Labbaf, S., Dutt, N., & Rahmani, A. M. (2021). pyEDA: An Open-Source Python Toolkit for Pre-processing and Feature Extraction of Electrodermal Activity. Procedia Computer Science, 184 , 99–106. https://doi.org/10.1016/j.procs.2021.03.021

Ihlebæk, C., Løve, T., Erik Eilertsen, D., & Magnussen, S. (2003). Memory for a staged criminal event witnessed live and on video. Memory, 11 (3), 319–327. https://doi.org/10.1080/09658210244000018

Jerald, J. (2015). The VR Book: Human-Centered Design for Virtual Reality . Morgan & Claypool.

Kardong-Edgren, S., Farra, S. L., Alinier, G., & Young, H. M. (2019). A Call to Unify Definitions of Virtual Reality. Clinical Simulation in Nursing, 31 , 28–34. https://doi.org/10.1016/j.ecns.2019.02.006

Kloft, L., Otgaar, H., Blokland, A., Monds, L. A., Toennes, S. W., Loftus, E. F., & Ramaekers, J. G. (2020). Cannabis increases susceptibility to false memory. Proceedings of the National Academy of Sciences, 117 (9), 4585–4589. https://doi.org/10.1073/pnas.1920162117

Lane, S. M. (2006). Dividing attention during a witnessed event increases eyewitness suggestibility. Applied Cognitive Psychology, 20 (2), 199–212. https://doi.org/10.1002/acp.1177

Lickel, B., Schmader, T., Curtis, M., Scarnier, M., & Ames, D. R. (2005). Vicarious Shame and Guilt. Group Processes & Intergroup Relations, 8 (2), 145–157. https://doi.org/10.1177/1368430205051064

Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43 (3), 679–690. https://doi.org/10.3758/s13428-010-0049-5

Mauss, I., Wilhelm, F., & Gross, J. (2004). Is there less to social anxiety than meets the eye? Emotion experience, expression, and bodily responding. Cognition & Emotion, 18 (5), 631–642. https://doi.org/10.1080/02699930341000112

McKenna, J., Treadway, M., & McCloskey, M. E. (1992). Expert Psychological Testimony on Eyewitness Reliability: Selling Psychology Before Its Time . Taylor & Francis.

Murray, J. H. (2017). Hamlet on the Holodeck, updated edition: The Future of Narrative in Cyberspace . MIT Press.

Nash, E. B., Edwards, G. W., Thompson, J. A., & Barfield, W. (2000). A Review of Presence and Performance in Virtual Environments. International Journal of Human-Computer Interaction, 12 (1), 1–41. https://doi.org/10.1207/S15327590IJHC1201_1

Power, M., & Dalgleish, T. (2015). Cognition and emotion: From order to disorder (3rd ed.). Psychology Press. https://doi.org/10.4324/9781315708744

Pretus Gomez, C. H., Ray, J. L., Granot, Y., Cunningham, W. A., & Van Bavel, J. J. (2022). The psychology of hate: Moral concerns differentiate hate from dislike. European Journal of Social Psychology.

Rigby, J. M., Brumby, D. P., Gould, S. J. J., & Cox, A. L. (2019). Development of a Questionnaire to Measure Immersion in Video Media: The Film IEQ. In Proceedings of the 2019 ACM International Conference on Interactive Experiences for TV and Online Video (pp. 35–46). https://doi.org/10.1145/3317697.3323361


Romeo, T., Otgaar, H., Smeets, T., Landstrom, S., & Boerboom, D. (2019). The impact of lying about a traumatic virtual reality experience on memory. Memory & Cognition, 47 (3), 485–495. https://doi.org/10.3758/s13421-018-0885-6

Sacharin, V., Schlegel, K., & Scherer, K. R. (2012). Geneva Emotion Wheel Rating Study . https://archive-ouverte.unige.ch/unige:97849

Scherer, K. (1987). Toward a dynamic theory of emotion: The component process model of affective states . https://www.semanticscholar.org/paper/Toward-a-dynamic-theory-of-emotion-%3A-The-component-Scherer/4c23c3099b3926d4b02819f2af196a86d2ef16a1

Scherer, K. R. (2005). What are emotions? And how can they be measured? Social Science Information, 44 (4), 695–729. https://doi.org/10.1177/0539018405058216

Slater, M. (2009). Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society B: Biological Sciences, 364 (1535), 3549–3557. https://doi.org/10.1098/rstb.2009.0138

Slater, M., Pertaub, D.-P., Barker, C., & Clark, D. M. (2006). An Experimental Study on Fear of Public Speaking Using a Virtual Environment. CyberPsychology & Behavior, 9 (5), 627–633. https://doi.org/10.1089/cpb.2006.9.627

Souchet, A. D., Lourdeaux, D., Pagani, A., & Rebenitsch, L. (2022a). A narrative review of immersive virtual reality’s ergonomics and risks at the workplace: Cybersickness, visual fatigue, muscular fatigue, acute stress, and mental overload. Virtual Reality. https://doi.org/10.1007/s10055-022-00672-0

Souchet, A. D., Philippe, S., Lourdeaux, D., & Leroy, L. (2022b). Measuring Visual Fatigue and Cognitive Load via Eye Tracking while Learning with Virtual Reality Head-Mounted Displays: A Review. International Journal of Human–Computer Interaction, 38 (9), 801–824. https://doi.org/10.1080/10447318.2021.1976509

Steuer, J. (1995). Defining virtual reality: Dimensions determining telepresence. In F. Biocca & M. R. Levy (Eds.), Communication in the age of virtual reality (pp. 33–56). Lawrence Erlbaum Associates.

Tian, F., Hua, M., Zhang, W., Li, Y., & Yang, X. (2021). Emotional arousal in 2D versus 3D virtual reality environments. PLOS ONE, 16 (9), e0256211. https://doi.org/10.1371/journal.pone.0256211

Tinwell, A., Grimshaw, M., Nabi, D. A., & Williams, A. (2011). Facial expression of emotion and perception of the Uncanny Valley in virtual characters. Computers in Human Behavior, 27 (2), 741–749. https://doi.org/10.1016/j.chb.2010.10.018

Wadlinger, H. A., & Isaacowitz, D. M. (2011). Fixing Our Focus: Training Attention to Regulate Emotion. Personality and Social Psychology Review, 15 (1), 75–102. https://doi.org/10.1177/1088868310365565

Wagstaff, G. F., MaCveigh, J., Boston, R., Scott, L., Brunas-Wagstaff, J., & Cole, J. (2003). Can Laboratory Findings on Eyewitness Testimony Be Generalized to the Real World? An Archival Analysis of the Influence of Violence, Weapon Presence, and Age on Eyewitness Accuracy. The Journal of Psychology, 137 (1), 17–28. https://doi.org/10.1080/00223980309600596

Warpefelt, H. (2016). The Non-Player Character: Exploring the believability of NPC presentation and behavior .

Wilson, T. D., Aronson, E., & Carlsmith, K. (2010). The Art of Laboratory Experimentation. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), Handbook of Social Psychology (1st ed.). Wiley. https://doi.org/10.1002/9780470561119.socpsy001002

Yuille, J. C. (2013). The challenge for forensic memory research: Methodolotry. In Applied issues in investigative interviewing, eyewitness memory, and credibility assessment (pp. 3–18). Springer.

Yuille, J. C., & Wells, G. L. (1991). Concerns about the application of research findings: The issue of ecological validity (p. 128). American Psychological Association. https://doi.org/10.1037/10097-007


Acknowledgments

The study was partially funded by a mini-grant for Ph.D. students financed by the Faculty of Management and Social Communication, Jagiellonian University in Kraków.

The authors thank HP Poland Inc. for providing the VR equipment free of charge for testing and research purposes.

We also thank Maciej Bernaś, without whom the Video-360° produced for the study would not have been created, and certainly not in such professional form.

Open practices statement

The experiment was not preregistered.

The data (DOI 10.17605/OSF.IO/G73V5) supporting this research are available on the Open Science Framework website under a CC BY 4.0 International license.

Link to the data: https://osf.io/g73v5/

The Video-360° (DOI: 10.26106/r0av-bn42), which is the basis of the experimental procedure, is deposited in an open repository and is available for noncommercial use by researchers under a CC BY-NC-ND 4.0 license.

Link to the video: https://uj.rodbuk.pl/dataset.xhtml?persistentId=doi:10.26106/r0av-bn42

Author information

Authors and Affiliations

Faculty of Management and Social Communication, Jagiellonian University in Krakow, Kraków, Poland

Kaja Glomb & Przemysław Piotrowski

Aarhus Institute of Advanced Studies, Aarhus University, Aarhus C, Denmark

Izabela Anna Romanowska


Corresponding author

Correspondence to Kaja Glomb .

Ethics declarations

Conflict of interest.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

(DOCX 1595 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Glomb, K., Piotrowski, P. & Romanowska, I.A. It is not real until it feels real: Testing a new method for simulation of eyewitness experience with virtual reality technology and equipment. Behav Res 56 , 4336–4350 (2024). https://doi.org/10.3758/s13428-023-02186-2


Accepted : 28 June 2023

Published : 28 July 2023

Issue Date : August 2024

DOI : https://doi.org/10.3758/s13428-023-02186-2


Keywords

  • Eyewitness testimony
  • Methodology
  • Ecological validity
