While it is often treated as a certainty, Group 1 information is not actually so. Previous research results that may be used as Group 1 information are reported either qualitatively, with no measure of the probability of their being right, or quantitatively, via a statistically derived ‘ p ’ value (the chance of their being incorrect), which is always greater than zero [ 23 ]. (The author is aware that the definition and use of p values is disputed, e.g., Sorkin et al. [ 24 ], and that a liberty is taken by describing and applying them to the discussion in this rather general manner, but the issue is too complex to be addressed here.) Assuming that p = 0 for this pre-existing information does not usually cause serious issues with the design and outcomes of causal research as long as p is small enough, but this is not always so. Structural Equation Modelling (SEM) is one widely used instance where it can give rise to significant validity issues in research reporting [ 25 ]. The quote below is from an article specifically written to defend the validity of SEM as a tool of causal research:
“As we explained in the last section, researchers do not derive causal relations from an SEM. Rather, the SEM represents and relies upon the causal assumptions of the researcher. These assumptions derive from the research design, prior studies, scientific knowledge, logical arguments, temporal priorities, and other evidence that the researcher can marshal in support of them. The credibility of the SEM depends on the credibility of the causal assumptions in each application.” [ 26 ] (p. 309)
Thus, an SEM model relies upon a covariance matrix dataset, which contains no causal information whatsoever, combined with the ‘credible’ causal assumptions of the researcher, which are normally made ‘credible’ and supported by cited results from prior research. Bollen and Pearl acknowledge this credibility generation later on the same page of their article. When a researcher constructing an SEM model puts an assumption-based arrow on a covariance-based relationship, they are assuming that p = 0 for that relationship. In fact, p is never zero, and is never reported as such by prior primary research. It may be a very low number, but even so, the accumulated risk of the entire model being wrong can become significant if the SEM model is large and many such assumptions are made within it.
A recent article in ‘Nutrients’ [ 27 ] (Figure 6, p. 18) presents an SEM with 78 unidirectional arrows. Leaving all other matters aside, what is the chance of this model being ‘right’ with regard to just the causal direction of all 78 arrows? If one sanguinely assumes a p value of 0.01 for all 78 individual causal assumptions, and a similar level of p for the research itself, the probability of the model being ‘right’ can be calculated as 0.99^79 ≈ 45%. A model that is more likely to be wrong than right is not a marginal outcome, and this follows from a universally accepted probability calculation [ 28 ] and from an authoritative account written in support of SEM that describes how SEM uses information with a non-zero p value to establish causality [ 26 ]. It becomes even more alarming when one considers that, once published, such research can then be used as a ‘credible’ secondary causal assumption input to further related SEM-based primary research, with its reliability/validity as Group 1 ‘prior research’ readjusted up from 45% to 100%.
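The compound probability can be checked in a few lines. This is a minimal sketch that adopts the assumptions made in the text, namely that the 79 causal assumptions are independent and that each is correct with probability 1 − p:

```python
def model_reliability(n_assumptions: int, p: float) -> float:
    """Probability that a set of independent causal assumptions is jointly
    'right', when each individual assumption is right with probability 1 - p."""
    return (1 - p) ** n_assumptions

# 78 arrows plus the research itself, each with an assumed p of 0.01.
print(round(model_reliability(79, 0.01), 3))
```

Under these assumptions the model is roughly as likely to be wrong as right; the exact figure depends entirely on the per-assumption p, which is itself an assumption.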
The conclusion is that ‘certainty’ in research is never actually so, and that consequently the more ‘certainty’ a researcher includes in their theoretical development, the less ‘certain’ the platform from which they launch their own research becomes. This is not an issue restricted to SEM-based research; SEM just makes the process and its consequences manifest. It follows that theoretical simplicity closely equates to theoretical and research reliability.
Identifying and acquiring specific information that we know we do not know is the basis of any contribution made by either experimental or observational causal research. These Group 2 relationships will thus be clearly defined by the researcher, and an enormous literature exists as to how such relationships may then be studied by either approach, and how the risk relating to the reliability of any conclusions may be quantified by statistics and expressed as a p value [ 29 ].
Typically, Group 2 relationships will be few in number in any causal research exercise because a trade-off exists between the number of variables that may be studied and the amount of data required to generate a statistically significant result with regard to any conclusions drawn [ 30 , 31 , 32 ]. The amount of data required usually increases exponentially, as does the number of potential interactions between the variables [ 30 , 31 , 32 ]. So, for example, a 4^2 full factorial with four levels of each variable and 30 observations in each cell would require 480 observations to fully compare the relationships between two independent variables and one dependent variable. By contrast, a 4^4 full factorial would require 7680 observations to study the relationships between four independent variables and one dependent variable to the same standard.
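The exponential growth in required observations can be reproduced directly (the four-level, 30-per-cell figures are those used in the text):

```python
def full_factorial_observations(levels: int, factors: int, per_cell: int) -> int:
    """Observations required by a full factorial design: one cell per
    treatment combination, each cell replicated `per_cell` times."""
    return (levels ** factors) * per_cell

print(full_factorial_observations(4, 2, 30))  # 480: two independent variables
print(full_factorial_observations(4, 4, 30))  # 7680: four independent variables
```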
This has led to the development of techniques that use less data to achieve the same level of statistical significance to express the risk related to multiple causal relationships [ 33 , 34 ]. Unsurprisingly, these techniques, such as Conjoint Analysis, have proved extremely popular with researchers [ 35 , 36 ]. However, there is no ‘free lunch’; once again there is a trade-off. Conjoint Analysis, for example, is based upon a fractional factorial design [ 37 ]. The researcher specifies which relationships are of interest, and the programme removes the parts of the full factorial array that are not relevant to those relationships [ 36 ]. As with any fractional factorial design, the researcher thus chooses to ignore the excluded relationships, usually via the (credible) assumption that their main effects and interactions are not significant [ 38 ].
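The mechanics of this trade-off can be sketched in miniature. The following is an illustrative half-fraction of a 2^3 factorial (defining relation I = ABC), not the specific array any conjoint package would generate; it shows both the saving in runs and the aliasing assumption that pays for it:

```python
from itertools import product

# Full 2^3 factorial: every combination of three two-level factors (-1/+1).
full = list(product((-1, 1), repeat=3))

# Half fraction defined by the generator I = ABC: keep runs where A*B*C = +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

print(len(full), len(half))  # the 8 runs are reduced to 4

# The cost of the saving: in every retained run A equals B*C, so the main
# effect of A is aliased with (indistinguishable from) the BC interaction.
assert all(a == b * c for (a, b, c) in half)
```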
By doing so the researcher chooses to not know something that they do not know. These relationships are removed from the risk calculations relating to the variables that are of interest to the researcher. They and their effects on the research outcomes do not however disappear! They are transformed from visible Group 2 knowledge (risk) to invisible Group 3 knowledge (uncertainty). If the researcher’s assumptions are wrong and these excluded relationships are significant, then they have the potential to significantly distort the outcomes of the apparently authoritative analysis of the risk related to the visible Group 2 relationships that are eventually reported by the researcher. While techniques such as Conjoint Analysis that routinely rely upon highly fractionated fractional factorial designs are vulnerable in this regard [ 38 ], it is rarely acknowledged with regard to results that rely upon them. As with the SEM example above, the p value associated with the conclusion is routinely readjusted to zero on citation, and it thus graduates to the status of Group 1 knowledge (certainty).
This category of knowledge, as Donald Rumsfeld observed, is the one that creates most difficulty. It is also invariably the largest category of knowledge in any ‘living’ research environment, and it is at its most complex in human research environments. Its impact on data cannot be separated or quantified, and thus must be treated as uncertainty rather than risk.
To illustrate this, take the situation where a researcher wishes to study the causal relationship between fructose intake and attention span for adolescents. The sample will be 480 adolescents aged between 12 and 16. For each adolescent, measures for fructose intake and attention span are to be established by the researcher.
The researcher may also presume that factors other than fructose intake will have an effect on attention span, and they may seek to capture and control for the impact of these ‘extraneous’ variables by a variety of methods, such as high-order factorials and ANOVA, conjoint analysis or linear mixed model designs. Whatever method is used, the capacity to include additional variables is always restricted by the amount of information relating to the impact of an independent variable set that can be extracted from any dataset, and by the conclusions relating to them that can have a meaningful measure of risk attached via a p value.
Thus, in this case the researcher designs the research to capture the main effects of three other extraneous independent variables in addition to fructose intake: parental education, household income and the child’s gender. These relationships thus become Group 2 information.
This accounts for four variables that might well significantly impact upon the relationship between fructose intake and attention span, but it leaves many others uncontrolled for and unaccounted for within the research environment. These Group 3 uncertainty inputs (variables) may include, but are by no means restricted to, the diet of the household (which includes many individual aspects), the number of siblings in the household, the school that the adolescent attends and the level of physical activity, among many others. These Group 3 uncertainty variables may be collinear with one or more of the Group 2 variables, they may be negatively collinear with them, or they may be simply unconnected (random).
To take ‘school attended’ for example: if the sample is drawn from a small number of otherwise equivalent schools, one of which has a ‘crusading’ attitude to attention span, this Group 3 variable is likely to have a significant impact upon the dataset, depending upon how it ends up distributed within the groups. If its effect is ‘random’ in relation to any one of the Group 2 variables, it will end up in the error term, increasing the possibility of a Type II error with regard to that Group 2 variable (as it might be with regard to gender if the crusading school is coeducational). If its impact is collinear with any one of the Group 2 variables, then its effect will end up in the variation that is attached to that variable, thus increasing the possibility of a Type I error (as it certainly will be with regard to gender if the crusading school is single sex).
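This mechanism can be demonstrated with a deliberately simple numerical sketch (the figures are invented for illustration, not data): a hidden ‘school’ effect of the same size either inflates the error term when it falls randomly across treatment groups, or inflates the apparent treatment effect when it is collinear with one group:

```python
def f_ratio(groups):
    """One-way ANOVA F ratio: MS(between) / MS(within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

base = [7, 9, 11, 13]                   # within-group 'background' variation
clean = [base, [x + 2 for x in base]]   # a true Group 2 effect of +2

# The same +5 Group 3 effect falling 'randomly' within both groups:
# it inflates the error term and depresses F (towards a Type II error).
random_like = [[12, 9, 16, 13], [14, 11, 18, 15]]

# The same +5 Group 3 effect collinear with the second treatment group:
# it inflates the between-group variation and F (towards a Type I error).
collinear = [base, [x + 2 + 5 for x in base]]

print(f_ratio(random_like), f_ratio(clean), f_ratio(collinear))
```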
The key issue here is that the researcher simply does not know about these Group 3 uncertainty variables and their effects. Their ignorance of them is either absolute, or it is qualified because they have been forced to exclude them from the analysis. A researcher will be very fortunate indeed if one or more of these Group 3 uncertainty variables within their chosen human research environment does not have the capacity to significantly impact upon their research results. The author, for example, had an experimental research exercise on olive oil intake destroyed by a completely unsuspected but very strong aversion to Spanish olive oil within the research population. The incorporation of Spanish origin into the packaging of one of the four branded products involved (treated as an extraneous variable with which the ‘Spanish effect’ was fully collinear) produced a massive main effect for package treatment, and substantial primary and secondary interactions with other Group 2 variables, that rendered the dataset useless.
Group 3 uncertainty variables will always be present in any living environment. Because they are unknown and uncontrolled for, they cannot be reduced to risk by any statistical technique. Consequently, the uncertainty that they generate has the capacity to affect the reliability of both experimental and observational studies to a significant degree. To illustrate this, the fructose and attention span causal example introduced above will be used. Table 2 shows how the Group 3 uncertainty variable (school attended) would affect a comparable experimental and observational study if its impact was significant.
Table 2. The impact of Group 3 uncertainty variables on experimental and observational research outcomes.
| Experimental Study | Observational Study |
| --- | --- |
| 2^4 factorial design—480 subjects recruited as eight matched groups of 60 on the basis of parental education, household income and gender. Within each group, 30 are randomly allocated to a high fructose diet and 30 to a low one, and attention span observed. | 480 subjects recruited as eight matched groups of 60 on the basis of parental education, household income and gender. Each group of 60 is divided into two groups of 30 (high and low) on the basis of their reported fructose consumption, and attention span observed. |
| The school attended effect will uniformly increase variation within the two randomly allocated experimental groups for high and low fructose diet. This increase in variation will end up in the error term of the analysis of variance, reducing the F ratio for fructose intake (trending towards a Type II error). As the groups for parental education, household income and the child’s gender are not randomly allocated, the school effect will either end up in the error term of the analysis of variance, thereby depressing the F ratio for parental education, income and gender if it is not collinear, or it will end up in the variation that is related to these variables, thus increasing the F ratio if it is collinear. Therefore, results could trend towards a Type I or Type II error with regard to any or all of these Group 2 variables, depending on the level and nature of the collinearity between them and the Group 3 variable. The school effect would be likely to be strongly collinear with all three of these Group 2 variables if the crusading school was perceived to be the ‘good’ school in the area. | The school attended variable will impact upon the parental education, household income and child’s gender variables exactly as it does in the experimental design opposite. The impact of the school attended variable upon the fructose intake variable will depend upon its degree of collinearity with it. If it is not collinear, then the allocation to the two groups will effectively be random, and the variation will end up in the error term, depressing the F ratio for fructose intake and tending towards a Type II error. If school attended has any collinearity with fructose intake, then the allocation will not be random, and the impact of school attended will be apportioned into the variation associated with fructose intake. Depending on whether the effect of school attended is complementary or anticomplementary to the effect of fructose intake, the result is a trend towards either a Type I error (increased F ratio) or a Type II error (suppressed F ratio). |
Experiments are distinguished from observational studies by the capacity of the researcher to randomly allocate to treatment conditions that they control. Table 2 shows that randomisation may confer a significant advantage over non-randomly allocated observation in an equivalent causal research situation. However, Table 2 also shows that while experimentation may confer advantage over observation in comparable situations, it is a case of ‘may’, and not ‘will’. Randomisation does not confer infallibility, because researcher knowledge and control relate only to Group 2 variables and the random allocation of subjects to them. Control does not extend to any Group 3 variable and is thus not absolute in any human research situation. The outcome is that significant uncertainty, unlike significant risk, cannot be eliminated by random allocation.
Therefore, it is perfectly possible to design an experiment that is less reliable than an observational exercise when investigating causal relationships. Because it cannot be eliminated, how the uncertainty that is generated by Group 3 variables is managed at the design phase of research is one aspect that can significantly impact upon the reliability of causal research that is conducted using either experimental or observational techniques. Perhaps more than any other, it is this aspect of agricultural research method, the management of uncertainty, and the generation of the ‘clean’ data by design that can minimise uncertainty, that has failed to transfer to human research disciplines.
The development of modern, systematic experimental technique for living environments is usually associated with the publication of ‘The Design of Experiments’ and ‘Statistical Methods for Research Workers’ by Sir Ronald Fisher [ 30 , 38 , 39 ]. Although Fisher’s work is most heavily recognised and cited for its role in risk reduction and the manipulation of Group 2 variables via random allocation between treatments, Fisher was also well aware of the potential impact of Group 3 variables and uncertainty on experimental reliability. In order to design ‘effective’ experimental research that dealt with the issue of Group 3 variables and uncertainty, Fisher proposed two ‘main’ principles:
“… the problem of designing economical and effective field experiments is reduced to two main principles (i) the division of the experimental area into plots as small as possible …; (ii) the use of [experimental] arrangements which eliminate a maximum fraction of soil heterogeneity, and yet provide a valid estimate of residual errors.” [ 40 ] (p. 510)
The overall objective of Fisher’s principles is very simple. They aim to minimise the contribution of Group 3 variation to the mean square for error in the analysis of variance table, because the mean square for error forms the denominator of the fraction that is used to calculate the F ratio for significance for any Group 2 variable, while the mean square for the variance of that Group 2 variable forms the numerator. Therefore, reducing Group 3 variation increases Group 2 ‘F’ ratios and thus their significance in the ANOVA table as expressed by a ‘ p ’ value. Fisher’s principles achieve this by increasing sample homogeneity, which is in turn achieved by reducing plot (sample) size.
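As a purely illustrative arithmetic sketch (the mean squares are invented numbers, not data), Group 3 heterogeneity entering the error term depresses the F ratio even when the treatment effect itself is unchanged:

```python
ms_group2 = 40.0   # mean square for a Group 2 treatment effect (numerator)
ms_error = 10.0    # error mean square with little Group 3 contamination

f_clean = ms_group2 / ms_error   # F = 4.0

# The same treatment effect, but with Group 3 heterogeneity adding to the
# error term: the denominator grows, so the F ratio (and significance) falls.
ms_error_contaminated = ms_error + 10.0
f_contaminated = ms_group2 / ms_error_contaminated   # F = 2.0

print(f_clean, f_contaminated)
```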
Fisher’s second principle for experimental design for theory testing is also closely aligned with the much older and more general principle of parsimony in scientific theory generation known as ‘Occam’s Razor’, which is usually stated as: “Entities are not to be multiplied without necessity” (Non sunt multiplicanda entia sine necessitate) [ 41 ] (p. 483). Occam’s Razor, like Fisher’s principles, is not a ‘hard’ rule, but a general principle to be considered when conducting scientific research [ 42 ].
This is as far as Fisher ever went with regard to these two ‘main’ principles for dealing with Group 3 variation and uncertainty. Exactly why they were not developed further in his writing is a mystery, but Fisher may have assumed that these principles were so obvious to his audience of primarily agricultural researchers that no further development was necessary, and that the orally transmitted experimental ‘method’ discussed earlier in this article would suffice to ensure that these two principles were applied consistently to any experimental research design.
The author’s personal experience is that Fisher’s assumptions were justified with regard to agricultural research, but not the medical, biological and social sciences to which his experimental techniques were later transferred without their accompanying method. To a certain degree this may be because the application of Fisher’s principles for the reduction of experimental uncertainty is easier to visualise and understand in its original agricultural context, and so the principles will be initially explained in that context here ( Figure 1 ).
Figure 1. Fisher’s principles and Group 3 variables in the experimental environment.
Figure 1 a shows a living environment, in this case an agricultural research paddock. On first inspection it might appear to be flat and uniform, but it actually has significant non-uniformities within it with regard to soil, elevation, slope, sunlight and wind. The researcher either does not know about these non-uniformities (e.g., the old watercourse) or simply has to put up with them (slope, elevation and wind) in certain circumstances. These are all Group 3 variables in any research design. While Fisher used the term ‘soil heterogeneity’ for the input he wished to eliminate, he would have been more correct to use the term ‘environmental heterogeneity’.
In Figure 1 b, a 3 × 4 fractionally replicated Latin Square experiment has been set up that is able to separate the main effects of three independent Group 2 variables, with the ability to detect the presence of non-additivity (interaction) between them [ 44 ]. The experiment follows Fisher’s first principle in that the individual plots (samples) are as small as it is possible to make them without creating significant ‘edge effects’ [ 43 ]. It also follows Fisher’s second principle in that this form of fractionally replicated Latin Square is the most efficient design for dealing with this set of three Group 2 variables and simple non-additivity [ 5 ]. In Figure 1 b the researcher has used the small size to avoid non-uniformity of sun and wind, and they have also fortuitously avoided any variations due to the river bed, even if they were not aware of it.
In Figure 1 c the researcher has breached Fisher’s first principle: the plot sizes of the experiment have been increased beyond the minimum on the basis of the ‘bigger the sample the better’ philosophy that dominates most experimental and observational research design. This increase in plot size may reduce random measurement error, thus reducing the proportion of variance ending up in the error term and potentially increasing the F ratios for the Group 2 variables. However, the increase in accuracy will be subject to diminishing returns.
Furthermore, the design now includes all the variations in Group 3 variables in the environment. This may do one of two things. Firstly, variation generated by the Group 3 variables may simply increase apparent random variation, which will reduce the F ratio and induce a Type II error. Secondly, as is shown in this case, Group 3 variation may fortuitously create an apparently systematic variation via collinearity with a Group 2 variable. As the old watercourse is under all the ‘level I’ treatments for the third Group 2 independent variable, all the variation due to this Group 3 variable will become collinear with that of the third Group 2 independent variable. This will apparently increase the F ratio for that variable, and simultaneously reduce that for the Youden and Hunter test for non-additivity of effects, thereby creating a significant potential for a Type II error. (The Youden and Hunter test for non-additivity [ 44 ] estimates experimental error directly by comparing replications of some treatment conditions in the design. Non-additivity is then estimated via the residual variation in the ANOVA table. In this case, the three main design plots for Group 2 Variable 3, treatment level I, are all in the watercourse, while the single replication at this level is in the bottom left corner of the design on the elevated slope. This replicated level I plot is likely to return a significantly different result from the three main plots, thus erroneously increasing the test’s estimate of overall error, and concomitantly erroneously reducing its estimate of non-additivity.)
In Figure 1 d the researcher, who is only interested in three Group 2 main effects and the presence or not of interaction between them, has breached Fisher’s second principle by using a less efficient ‘overkill’ design for this specific purpose. They are using a 3 × 3 × 3 full factorial, but with the initial small plot size. This design has theoretically greater statistical power with regard to Group 2 variation, and also has the capacity to identify and quantify the first and second order interactions between the variables—information that they do not need. The outcome of this is the same as breaching Fisher’s first principle, in that major variations in Group 3 variables are incorporated into the enlarged dataset that is required by this design. It is purely a matter of chance as to whether this Group 3 variation will compromise the result by increasing apparent random error, but this risk increases with increasing sample size. The randomisation of plots over the larger area makes a Type I error much less likely, but the chance of a Type II error is still significantly increased.
The design of an experiment that breached both of Fisher’s principles by using both the larger design and the larger plot size cannot be shown in Figure 1 as it would be too large, but the experiment’s dataset would inevitably incorporate even greater Group 3 variation than is shown in the figure, with predictably dire results for the reliability of any research analysis of the Group 2 variables.
It is important to note that Fisher’s principles do not dictate that all experiments should be exceedingly small. Scale does endow greater reliability, but not as a simple matter of course: scale must be achieved via replication of individual exercises that do conform to Fisher’s principles. Internal ‘intra-study’ replication, where a small-sample experimental exercise is repeated multiple times to contribute to a single result, does not breach Fisher’s principles, and it increases accuracy, power and observable reliability. It is thus standard agricultural research practice. Intra-study replications in agricultural research routinely occur on a very large scale [ 45 ], but they are rare in human research disciplines [ 46 , 47 ]. The process is shown in Figure 1 e, where the experiment from Figure 1 b is replicated three times. With this design, variation in environment can be partitioned in the analysis of variance table as a sum of squares for replication. A large/significant figure in this category (likely in the scenario shown in Figure 1 e) may cause the researcher to conduct further investigations as to the potential impact of Group 3 variables on the overall result.
Figure 1 f shows a situation that arises in human rather than agricultural research, but places it into the same context as the other examples. In agricultural research, participation of the selected population is normally one hundred percent. In human research this is very rarely the case, and participation rates normally fall well below this level. Figure 1 f shows a situation where only around 25% of the potentially available research population is participating as a sample.
Fractional participation rates proportionately increase the effective size (shown by the dotted lines) of the actual plots from which the sample is drawn. The reported sample numbers would make this look like the situation in Figure 1 b, but when it is laid out as in Figure 1 f, it can be seen that the actual situation is more analogous to Figure 1 c, with a very large underlying research population that incorporates the same level of Group 3 variance as Figure 1 c, but without the advantage of greater actual sample size, thereby magnifying the potential effect of Group 3 variables beyond that in Figure 1 c. The outcome is an effective breach of Fisher’s first principle, and an increased chance that both Type I and Type II errors will occur.
Subject participation rate is therefore a crucial factor when assessing the potential impact of Group 3 variables on experimental research reliability. This derivative of Fisher’s first principle holds whether the experimental analysis of Group 2 variation is based upon a randomised sample or not.
Moving forward from these specific agricultural examples, the general application of Fisher’s principles with regard to the sample size used in any experiment can be visualised as in Figure 2 .
Figure 2. Graphical representation of the interaction of risk, uncertainty and unreliability as a function of experimental sample size.
As sample size increases, then ‘ceteris paribus’, the risk (R) of making a Type I or II error with regard to any Group 2 variable decreases geometrically, and is expressed via statistics in a precise and authoritative manner by a ‘ p ’ value. As a consequence of this precision, this risk can be represented by a fine ‘hard’ solid line (R) in Figure 2 .
By contrast, the uncertainty that is generated by the influence of Group 3 variables within the sample increases as the sample size itself increases. Unlike risk, it cannot be analysed, and no specific source or probability can be assigned to it—yet its increase in any living environment is inevitable as sample size increases. As it is fundamentally amorphous in nature it cannot be expressed as a ‘hard’ line, but is shown as a shaded area (U) in Figure 2 .
The overall unreliability of research (T) is the sum of these two inputs. It is not expressed as a line in Figure 2 , but as a shape that starts as a hard black line when the sample size is small and risk is the dominant input, and widens into a shaded area as sample size increases and uncertainty becomes the dominant input. The shape of the unreliability plot (T) is significant. As risk reduces geometrically, and uncertainty increases at least linearly with sample size, unreliability (T) takes the form of an arc, with a specific minimum point ‘O’ on the sample size axis where risk and uncertainty contribute equally to unreliability.
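The shape of the arc can be reproduced with a toy model. This is purely illustrative: the functional forms and the parameters a, k and c are assumptions chosen only to mimic geometric risk decay and linear uncertainty growth, not measured quantities:

```python
import math

def unreliability(n: int, a: float = 1.0, k: float = 0.05, c: float = 0.002) -> float:
    """Toy model T(n) = R(n) + U(n): risk R decays geometrically with
    sample size n, while Group 3 uncertainty U grows linearly."""
    return a * math.exp(-k * n) + c * n

# The sum of a falling and a rising component produces an arc with a
# single minimum: the point 'O' of Figure 2.
optimum = min(range(1, 301), key=unreliability)
print(optimum)
```

The point of the sketch is qualitative: in a real study U cannot be measured, so while a minimum must exist, its location cannot be computed.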
This indicates that there is a theoretical ‘optimal’ sample size at which unreliability is at its lowest, which is represented by the point (O) at the bottom of the arc (T). ‘O’, however, is not the optimal size for any experimental design. The point at which sample size reaches ‘O’ is also the point at which uncertainty becomes the dominant contributor to overall experimental unreliability. Moreover, as uncertainty is amorphous, the exact or even approximate location of ‘O’, and the sample size that corresponds to it, cannot be reliably established by the researcher.
Given that ‘O’ cannot be reliably located, then the researcher must endeavour to stay well on the right side of it. It is clear from Figure 2 that, if there is a choice that is to be made between them, then it is better to favour risk over uncertainty, and to design an experiment that has specific risk contributing the maximum, and amorphous uncertainty the minimum, amount to its overall experimental unreliability for a given and acceptable value of p .
The logical reaction of any experimental designer to this conclusion is to ‘hug’ the risk line (R). This means that the minimum sample size that is required to achieve an acceptable (not minimal) level of experimental risk is selected, and further scale is achieved by replication of the entire exercise. This point is represented by the vertical dotted line ‘S1’ for p = 0.10, if the designer takes this to be the required level of risk for the experiment. If the designer reduces p to 0.05 and increases the sample accordingly, then they reduce the apparent risk, but they do not know with any certainty whether they are doing the same for overall unreliability, as uncertainty is now contributing more to the overall unreliability of the experiment (line S2). If risk is further reduced to p = 0.01, then the geometric increase in the sample size required increases the impact of Group 3 variable derived uncertainty to the point that it generates an apparently lower risk experiment that actually has a significantly higher (but amorphous and hidden) level of overall unreliability (represented by the double-headed arrow on line S3).
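The growth in sample size as the required p falls can be illustrated with a standard normal-approximation power calculation. This is a generic sketch, not the calculation behind Figure 2; the 80% power and 0.5 standardised effect size are illustrative assumptions:

```python
import math
from statistics import NormalDist

def n_per_group(alpha: float, power: float = 0.8, effect: float = 0.5) -> int:
    """Approximate per-group sample size for a two-sample comparison of
    means at two-sided significance level alpha, via the normal model."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect) ** 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, n_per_group(alpha))  # roughly 50, 63 and 94 subjects per group
```

Each tightening of p demands a disproportionately larger sample, which is exactly the movement from S1 towards S3 described above.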
It is this logical design reaction to the situation outlined in Figure 2 that is expressed by Fisher in his two principles. It should be noted that the required risk is the cardinal input. The acceptable level of risk must be established first, and this choice should be driven by the research objectives and not by the research design process. Fisher’s principles are then applied to minimise the contribution of uncertainty to experimental designs that are capable of achieving that level of risk.
All the foregoing remarks apply equally to randomised experimental research and to observational research that uses any form of organised comparison as the basis for its conclusions. Indeed, many observational research designs are classical experimental designs in all facets bar the randomisation of their treatment conditions.
In both cases, poor design that does not address the potential contribution of Group 1 (certainty) and Group 3 (uncertainty) variation to the data can produce a highly unreliable research outcome that nevertheless reports a low level of risk. This outcome is made even more undesirable when the unreliable result is authoritatively presented as low-risk on the basis of a design and statistical analysis that focusses purely on the contribution of Group 2 (risk) variation to the data. The situation is further aggravated if the practice becomes widespread, and if there is a lack of routine testing of such unreliable results via either intra-study or inter-study replication.
The answer to this problem is the application of method to reduce uncertainty and thus unreliability—Fisher’s two principles form only a small part of this body of method. At present, method is widely considered to be of little importance. As Gershon et al. note [ 15 ], “Methods of observational studies tend to be difficult to understand…” Method is indeed difficult to report, as it is both complex and case specific. My personal experience is that I have struggled to retain any methodological commentary in any article that I have published in the human research literature—it is just not perceived to be important by reviewers and editors, and thus presumably not worth understanding. Consequently, deletion is its routine fate.
One of the main barriers to the use, reporting and propagation of good method is that it is a flexible rather than a fixed entity. While the techniques from Figure 1, such as the Latin Square or ANOVA, may be applied to thousands of research exercises via a single, specific set of written rules, method is applied to research designs on a case-by-case basis via flexible and often unwritten guidelines. This is why ‘Fisher’s principles’ are principles and not rules. Thus, this article concludes by developing Fisher’s principles into a set of four methodological ‘principles’ for conducting observational research in nutrition—and for subsequently engaging with editors and reviewers:
Randomisation confers an advantage over observation in specific situations rather than absolute infallibility. Therefore, a researcher may make a reasonable choice between them when designing an experiment to maximise reliability.
Many observational studies are conducted because random allocation is not possible. If this is the case, then the use of observation may not need to be justified. If, however, the researcher faces the option of either a randomised or observational approach, then they need to look very carefully at whether the random design actually offers the prospect of a more reliable result. Ceteris paribus it does, but if randomisation is going to require a larger/less efficient design, or makes recruitment more difficult, thereby increasing the effective size of the individual samples, then the level of uncertainty will be increased within the results to the degree that a reduction in reliability might reasonably be assumed. An observational approach may thus be justified via Fisher’s first or second principles.
Theoretical simplicity confers reliability. Therefore simpler theories and designs should be favoured.
All theoretical development involves an assumption of certainty for inputs when reality falls (slightly) short of this. This is not an issue when the inputs and assumptions related to the research theory are few, but can become an issue if a large number are involved.
There is no free lunch in science. The more hypotheses that the researcher seeks to test, the larger and more elaborate the research design and sample will have to be. Elaborate instruments make more assumptions and also tend to reduce participation, thus increasing effective individual sample size. All of these increase the level of uncertainty, and thus unreliability, for any research exercise.
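The ‘no free lunch’ point can be made concrete with the familywise error rate: if m independent hypotheses are each tested at level α, the chance of at least one false positive is 1 − (1 − α)^m. The numbers below are illustrative:

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive across m independent
    tests, each conducted at significance level alpha."""
    return 1 - (1 - alpha) ** m

# Twenty hypotheses, each tested at alpha = 0.05:
fwe = familywise_error(0.05, 20)   # roughly a 64% chance of a false positive
```

This is the same accumulation-of-risk mechanism noted earlier for large SEM models: each added hypothesis or assumption chips away at the reliability of the exercise as a whole.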
The researcher should therefore use the simplest theory and related research design that is capable of addressing their specific research objectives.
There is an optimal sample size for maximum reliability—big is not always better. Therefore, the minimum sample size necessary to achieve a predetermined level of risk for any individual exercise should be selected.
The researcher should aim to use the smallest and most homogenous sample that is capable of delivering the required level of risk for a specific research design derived from Principle 2 above. Using a larger sample than is absolutely required inevitably decreases the level of homogeneity within the sample that can be achieved by the researcher, and thereby increases the uncertainty of Group 3 variables that are outside the control or awareness of the researcher. Unlike risk, uncertainty cannot be estimated, so the logical approach is not to increase sample size beyond the point at which risk is at the required level.
Scale is achieved by intra-study replication—more is always better. Therefore, multiple replications should be the norm in observational research exercises.
While there is an optimal sample size to an individual experimental/observational research exercise, the same does not apply to the research sample as a whole if scale is achieved by intra-study replication. Any observational exercise should be fully replicated at least once, and preferably multiple times within any study that is being prepared for publication. Replication can be captured within a statistical exercise and can thus be used to significantly reduce the estimate of risk related to Group 2 variables.
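One standard way to capture replication within a statistical exercise is Fisher's combined probability test, which pools the p-values of k independent replicates into a single chi-squared statistic with 2k degrees of freedom. The replicate p-values below are illustrative:

```python
import math
from scipy.stats import chi2

p_reps = [0.09, 0.06, 0.11]                    # illustrative replicate p-values
x2 = -2 * sum(math.log(p) for p in p_reps)     # Fisher's combined statistic
p_combined = chi2.sf(x2, df=2 * len(p_reps))   # upper tail, 2k degrees of freedom
```

In this sketch, no replicate is individually significant at 0.05, yet the combined evidence is, which is one way replication can significantly reduce the estimate of Group 2 risk.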
Far more importantly for observational researchers, stability under replication also provides a subjective test of the overall reliability of their research, and thus of the potential uncertainty generated by Group 3 variables. A simple observational exercise that conforms with Principles 1–3 and is replicated three times with demonstrated stability has far more value, and thus a far higher chance of being published, than a single, more elaborate and ‘messy’ observational exercise that might occupy the same resource and dataset.
Clearly, the research may not be stable to replication. However, this would be an important finding in and of itself, and the result may allow the researcher to develop some useful conclusions as to why it occurred, what its implications are, and which Group 3 variable might be responsible for it. The work thus remains publishable. This is a better situation than that faced by the author of the single large and messy exercise noted above—the Group 3 variation would go undetected in their data. Consequently, the outcome would be an inconclusive/unpublishable result and potentially a Type 1 error.
Observational researchers will always have to face challenges with regard to the perceived reliability of their research. As they defend their work it is important for them to note that random designs are not infallible and that observational designs are therefore not necessarily less reliable than their randomised counterparts. Observation thus represents a logical path to reliability in many circumstances. If they follow the four principles above, then their work should have a demonstrably adequate level of reliability to survive these challenges and to make a contribution to the research literature.
Publishing experimental research of this type that takes a balanced approach to maximising experimental reliability by minimising both risk and uncertainty is likely to remain a challenging process in the immediate future. This is largely due to an unbalanced focus by reviewers, book authors and editors on statistical techniques that focus on the reduction of risk over any other source of experimental error [ 48 ].
Perhaps the key conclusion is that replication is an essential aspect of both randomised and observational research. The human research literature remains a highly hostile environment for inter-study replications of any type. Hopefully this will change. In the interim, however, intra-study replication faces no such barriers, and confers massive advantages, particularly on observational researchers. Some may approach replication with trepidation. After forty years of commercial and academic research experience in both agricultural and human environments, my observation is that those who design replication-based research exercises that conform to Fisher’s principles have much to gain and little to fear from it.
One reviewer raised an important point with regard to the application of Fisher’s principles to two important nutritional variables:
“There are some features on methods of data collection in nutritional studies that require attention, for example recall bias or within individual variation. The authors did not mention these at all.”
The author operates in food marketing, where both of these issues can cause major problems. There are significant differences between them. Recall bias, as its name suggests, is a systematic variation, where a reported phenomenon is consistently either magnified or reduced upon recollection within a sample. Bias of any type is a real issue when an absolute measure of a phenomenon is required (e.g., total sugar intake). However, due to its systematic nature, it would not necessarily be an issue if the research exercise involves a comparison between two closely comparable sample groups to measure the impact of an independent variable upon total sugar intake (e.g., an experiment/observational exercise where the impact of education on total sugar intake is studied by recruiting two groups with high and low education, and then asking them to report their sugar intake). If the two groups were comparable in their systematic recall bias, then the recall effect would cancel out between the samples and would disappear in the analysis of the impact of education upon total sugar intake.
However, this requires that the two groups are truly comparable with regard to their bias. The chances of this occurring are increased in both random allocation (experimental) and systematic allocation (observational) environments if the sample sizes are kept as small as possible while all efforts are taken to achieve homogeneity within them. Response bias is a Group 3 (uncertainty) variable. If the population from which the two samples above are drawn increases in size, then the two samples will inevitably become less homogenous in their characteristics. This also applies to their bias, which thus ceases to be homogenous response bias, and instead becomes increasingly random response variation—the impact of which, along with all the other Group 3 uncertainty variables, now ends up in the error term of any analysis, thus decreasing the research reliability (see Figure 2 ). Response bias can thus best be managed using Fisher’s principles.
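The cancellation of a shared systematic bias in a two-group comparison can be sketched in a few lines. All numbers here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30
true_effect = 10.0   # hypothetical effect of education on sugar intake
bias = -15.0         # shared systematic under-reporting (recall bias)

group_a = rng.normal(100.0, 5.0, n)                # true intakes, group A
group_b = rng.normal(100.0 + true_effect, 5.0, n)  # true intakes, group B

# Both groups under-report by the same systematic amount:
diff_true = group_b.mean() - group_a.mean()
diff_reported = (group_b + bias).mean() - (group_a + bias).mean()
# The shared bias cancels exactly in the between-group difference.
```

The sketch only works because the bias term is identical across groups; once the bias varies between or within groups, it migrates into the error term as described above.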
Similar comments can be made about within-individual variation. The fact that people are not consistent in their behaviour is a massive issue in both nutrition and food marketing research. However, this seemingly random variation usually reflects distinct and predictable patterns driven by both time and circumstance/opportunity. For example, you consistently eat different food for breakfast and dinner (a temporal pattern). You also consistently tend to eat more, and less responsibly, if you go out to eat (a circumstance/opportunity pattern). If time/circumstance/opportunity can be tightened up enough and made homogenous within a group, then this seemingly random within-individual variation becomes a consistent within-individual bias, and can be eliminated as a factor between study groups in the manner shown above.
Thus, within-individual variation is a Group 3 (uncertainty) variable, and it too can be managed via Fisher’s principles. Although most research recruits demographically homogenous samples, far less attention is paid to recruiting samples that are also temporally and environmentally homogenous, for example by recruiting at the same time and location. This temporal and environmental uniformity has the effect of turning a significant proportion of within-consumer variation into within-consumer bias for any sample. The effect of this bias is then eliminated by the experimental/observational comparison. The small experiments/observational exercises are then replicated as many times as necessary to create the required sample size and Group 2 risk.
This research received no external funding.
The author declares no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Experimental research—often considered to be the ‘gold standard’ in research designs—is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity (causality), due to its ability to link cause and effect through treatment manipulation while controlling for the spurious effects of extraneous variables.
Experimental research is best suited for explanatory research—rather than for descriptive or exploratory research—where the goal of the study is to examine cause-effect relationships. It also works well for research that involves a relatively limited and well-defined set of independent variables that can either be manipulated or controlled. Experimental research can be conducted in laboratory or field settings. Laboratory experiments , conducted in laboratory (artificial) settings, tend to be high in internal validity, but this comes at the cost of low external validity (generalisability), because the artificial (laboratory) setting in which the study is conducted may not reflect the real world. Field experiments are conducted in field settings such as in a real organisation, and are high in both internal and external validity. But such experiments are relatively rare, because of the difficulties associated with manipulating treatments and controlling for extraneous effects in a field setting.
Experimental research can be grouped into two broad categories: true experimental designs and quasi-experimental designs. Both designs require treatment manipulation, but while true experiments also require random assignment, quasi-experiments do not. Sometimes, we also refer to non-experimental research, which is not really a research design, but an all-inclusive term that includes all types of research that do not employ treatment manipulation or random assignment, such as survey research, observational research, and correlational studies.
Treatment and control groups. In experimental research, some subjects are administered one or more experimental stimuli, called treatments (the treatment group ), while other subjects are not given such a stimulus (the control group ). The treatment may be considered successful if subjects in the treatment group rate more favourably on outcome variables than control group subjects. Multiple levels of experimental stimulus may be administered, in which case, there may be more than one treatment group. For example, in order to test the effects of a new drug intended to treat a certain medical condition like dementia, if a sample of dementia patients is randomly divided into three groups, with the first group receiving a high dosage of the drug, the second group receiving a low dosage, and the third group receiving a placebo such as a sugar pill (control group), then the first two groups are experimental groups and the third group is a control group. After administering the drug for a period of time, if the condition of the experimental group subjects improved significantly more than the control group subjects, we can say that the drug is effective. We can also compare the conditions of the high and low dosage experimental groups to determine if the high dose is more effective than the low dose.
Treatment manipulation. Treatments are the unique feature of experimental research that sets this design apart from all other research methods. Treatment manipulation helps control for the ‘cause’ in cause-effect relationships. Naturally, the validity of experimental research depends on how well the treatment was manipulated. Treatment manipulation must be checked using pretests and pilot tests prior to the experimental study. Any measurements conducted before the treatment is administered are called pretest measures, while those conducted after the treatment are posttest measures.
Random selection and assignment. Random selection is the process of randomly drawing a sample from a population or a sampling frame. This approach is typically employed in survey research, and ensures that each unit in the population has a positive chance of being selected into the sample. Random assignment, however, is a process of randomly assigning subjects to experimental or control groups. This is a standard practice in true experimental research to ensure that treatment groups are similar (equivalent) to each other and to the control group prior to treatment administration. Random selection is related to sampling, and is therefore more closely related to the external validity (generalisability) of findings. However, random assignment is related to design, and is therefore most related to internal validity. It is possible to have both random selection and random assignment in well-designed experimental research, but quasi-experimental research involves neither random selection nor random assignment.
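Random assignment itself is mechanically simple. A minimal Python sketch (the subject labels and group sizes are hypothetical): shuffle the sampled subjects, then split them into treatment and control groups, so that every subject has an equal chance of landing in either group.

```python
import random

random.seed(7)

# Hypothetical pool of 30 subjects already selected into the sample.
subjects = [f"S{i:02d}" for i in range(30)]

# Random assignment: shuffle the pool, then split it evenly.
random.shuffle(subjects)
treatment = subjects[:15]
control = subjects[15:]

print("treatment group:", treatment)
print("control group:  ", control)
```

Note that this sketch covers only random assignment; random selection is the separate, earlier step of drawing the 30 subjects from the population in the first place.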
Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.
History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.
Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.
Testing threat is a threat in pre-post designs where subjects’ posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.
Instrumentation threat, which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.
Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such that the dropouts were mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.
Regression threat—also called regression to the mean—refers to the statistical tendency of a group’s extreme scores to regress toward the mean on a posttest. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean), because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
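Regression to the mean can be demonstrated with a small simulation (all distributions here are assumed for illustration). Each subject has a stable ‘true ability’, and each test adds independent measurement noise; subjects selected for extreme pretest scores then drift back toward the population mean on the posttest even though nothing was done to them.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: true ability ~ N(50, 10), test noise sd 10.
true_ability = [random.gauss(50, 10) for _ in range(5000)]
pretest = [a + random.gauss(0, 10) for a in true_ability]
posttest = [a + random.gauss(0, 10) for a in true_ability]

# Select the top 10% of pretest scorers (an extreme, non-random subgroup).
cutoff = sorted(pretest)[int(0.9 * len(pretest))]
top = [i for i, s in enumerate(pretest) if s >= cutoff]

pre_mean = statistics.mean(pretest[i] for i in top)
post_mean = statistics.mean(posttest[i] for i in top)

# With no treatment at all, the subgroup's posttest mean falls back
# toward the population mean of 50.
print("pretest mean of top scorers:", round(pre_mean, 1))
print("posttest mean of same group:", round(post_mean, 1))
```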
Pretest-posttest control group design. In this design, subjects are randomly assigned to treatment and control groups and subjected to an initial (pretest) measurement of the dependent variables of interest; the treatment group is then administered a treatment (representing the independent variable of interest), and the dependent variables are measured again (posttest). The notation of this design is shown in Figure 10.1.
Statistical analysis of this design involves a simple analysis of variance (ANOVA) between the treatment and control groups. The pretest-posttest design handles several threats to internal validity, such as maturation, testing, and regression, since these threats can be expected to influence both treatment and control groups in a similar (random) manner. The selection threat is controlled via random assignment. However, additional threats to internal validity may exist. For instance, mortality can be a problem if there are differential dropout rates between the two groups, and the pretest measurement may bias the posttest measurement—especially if the pretest introduces unusual topics or content.
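The two-group ANOVA for this design can be computed directly from its definition. The sketch below is a hypothetical example: the gain scores (posttest minus pretest), effect size, and group sizes are all assumed, and the F statistic is built by hand from the between-group and within-group sums of squares.

```python
import random
import statistics

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups (df = 1, n_a + n_b - 2)."""
    all_scores = group_a + group_b
    grand = statistics.mean(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2
                     for g in (group_a, group_b))
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in (group_a, group_b) for x in g)
    ms_between = ss_between / 1                     # k - 1 = 1
    ms_within = ss_within / (len(all_scores) - 2)   # N - k
    return ms_between / ms_within

random.seed(3)
# Hypothetical gain scores: treatment gains ~5 points, control ~0, noise sd 2.
treatment_gain = [5 + random.gauss(0, 2) for _ in range(20)]
control_gain = [0 + random.gauss(0, 2) for _ in range(20)]

f_stat = anova_f(treatment_gain, control_gain)
# An F well above the .05 critical value (about 4.1 for df 1, 38)
# suggests a treatment effect.
print("F =", round(f_stat, 1))
```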
Posttest-only control group design. This design is a simpler version of the pretest-posttest design in which pretest measurements are omitted. The design notation is shown in Figure 10.2.
The treatment effect is measured simply as the difference in the posttest scores between the two groups.
The appropriate statistical analysis of this design is also a two-group analysis of variance (ANOVA). The simplicity of this design makes it more attractive than the pretest-posttest design in terms of internal validity. This design controls for maturation, testing, regression, selection, and pretest-posttest interaction, though the mortality threat may continue to exist.
In a covariance design, the pretest measure is not a measurement of the dependent variable, but rather a covariate, and the treatment effect is measured as the difference in the posttest scores between the treatment and control groups after adjusting for this covariate.
Due to the presence of a covariate, the right statistical analysis of this design is a two-group analysis of covariance (ANCOVA). This design has all the advantages of the posttest-only design, but with improved internal validity due to the controlling of covariates. Covariance designs can also be extended to pretest-posttest control group designs.
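The covariate adjustment at the heart of ANCOVA can be illustrated with a short simulation. This is a simplified sketch, not a full ANCOVA: all numbers are assumed (a 5-point treatment effect, a covariate slope of 0.8), and the adjustment simply moves each group's posttest mean to a common covariate value using a pooled regression slope.

```python
import random
import statistics

random.seed(5)
n = 50
# Hypothetical data: posttest depends on the pretest covariate plus
# an assumed 5-point treatment effect.
pre_t = [random.gauss(50, 10) for _ in range(n)]
pre_c = [random.gauss(50, 10) for _ in range(n)]
post_t = [0.8 * x + 5 + random.gauss(0, 3) for x in pre_t]
post_c = [0.8 * x + random.gauss(0, 3) for x in pre_c]

def slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Pooled within-group slope of posttest on the covariate
# (simple average; groups are of equal size here).
b = (slope(pre_t, post_t) + slope(pre_c, post_c)) / 2
grand_pre = statistics.mean(pre_t + pre_c)

# Adjust each group's posttest mean to the common covariate value.
adj_t = statistics.mean(post_t) - b * (statistics.mean(pre_t) - grand_pre)
adj_c = statistics.mean(post_c) - b * (statistics.mean(pre_c) - grand_pre)
effect = adj_t - adj_c   # estimate of the assumed 5-point treatment effect
print("adjusted treatment effect:", round(effect, 1))
```

The adjustment removes variance attributable to pre-existing differences on the covariate, which is exactly what buys the design its additional internal validity.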
Two-group designs are inadequate if your research requires manipulation of two or more independent variables (treatments). In such cases, you would need four or higher-group designs. Such designs, quite popular in experimental research, are commonly called factorial designs. Each independent variable in this design is called a factor, and each subdivision of a factor is called a level. Factorial designs enable the researcher to examine not only the individual effect of each treatment on the dependent variables (called main effects), but also their joint effect (called interaction effects).
In a factorial design, a main effect is said to exist if the dependent variable shows a significant difference between multiple levels of one factor, at all levels of other factors. No change in the dependent variable across factor levels is the null case (baseline), from which main effects are evaluated. For example, in a design crossing instructional type with instructional time, you may see a main effect of instructional type, instructional time, or both on learning outcomes. An interaction effect exists when the effect of differences in one factor depends upon the level of a second factor. In this example, if the effect of instructional type on learning outcomes is greater for three hours/week of instructional time than for one and a half hours/week, then we can say that there is an interaction effect between instructional type and instructional time on learning outcomes. Note that interaction effects dominate main effects, and it is not meaningful to interpret main effects if interaction effects are significant.
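Main and interaction effects can be computed directly from cell means. Below is a minimal 2 x 2 sketch with hypothetical learning-outcome scores for instructional type crossed with instructional time; the interaction contrast is the time effect for one type minus the time effect for the other.

```python
import statistics

# Hypothetical scores for a 2 x 2 factorial design:
# factor A = instructional type, factor B = hours of instruction per week.
cells = {
    ("traditional", 1.5): [60, 62, 58, 61],
    ("traditional", 3.0): [63, 65, 64, 62],
    ("online", 1.5): [61, 63, 60, 62],
    ("online", 3.0): [72, 74, 71, 73],
}
m = {k: statistics.mean(v) for k, v in cells.items()}

# Main effect of instructional time, averaged over instructional type.
main_time = ((m[("traditional", 3.0)] + m[("online", 3.0)]) / 2
             - (m[("traditional", 1.5)] + m[("online", 1.5)]) / 2)

# Interaction: the time effect for online minus the time effect
# for traditional instruction. Nonzero means the factors interact.
interaction = ((m[("online", 3.0)] - m[("online", 1.5)])
               - (m[("traditional", 3.0)] - m[("traditional", 1.5)]))

print("main effect of time:", main_time)
print("interaction contrast:", interaction)
```

In these (made-up) data the extra instructional time helps far more in the online condition, so the interaction contrast is large and positive, and interpreting the main effect of time on its own would be misleading.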
Hybrid designs are those that are formed by combining features of more established designs. Three such hybrid designs are the randomised block design, the Solomon four-group design, and the switched replication design.
Randomised block design. This is a variation of the posttest-only or pretest-posttest control group design where the subject population can be grouped into relatively homogeneous subgroups (called blocks) within which the experiment is replicated. For instance, if you want to replicate the same posttest-only design among university students and full-time working professionals (two homogeneous blocks), subjects in both blocks are randomly split between the treatment group (receiving the same treatment) and the control group (see Figure 10.5). The purpose of this design is to reduce the ‘noise’ or variance in data that may be attributable to differences between the blocks so that the actual effect of interest can be detected more accurately.
Solomon four-group design. In this design, the sample is divided into two treatment groups and two control groups. One treatment group and one control group receive the pretest, and the other two groups do not. This design represents a combination of the posttest-only and pretest-posttest control group designs, and is intended to test for the potential biasing effect of pretest measurement on posttest measures, which tends to occur in pretest-posttest designs but not in posttest-only designs. The design notation is shown in Figure 10.6.
Switched replication design. This is a two-group design implemented in two phases with three waves of measurement. The treatment group in the first phase serves as the control group in the second phase, and the control group in the first phase becomes the treatment group in the second phase, as illustrated in Figure 10.7. In other words, the original design is repeated or replicated temporally with treatment/control roles switched between the two groups. By the end of the study, all participants will have received the treatment either during the first or the second phase. This design is most feasible in organisational contexts where organisational programs (e.g., employee training) are implemented in a phased manner or are repeated at regular intervals.
Quasi-experimental designs are almost identical to true experimental designs, but lack one key ingredient: random assignment. For instance, one entire class section or one organisation is used as the treatment group, while another section of the same class or a different organisation in the same industry is used as the control group. This lack of random assignment potentially results in groups that are non-equivalent, such as one group possessing greater mastery of certain content than the other group, say by virtue of having a better teacher in a previous semester, which introduces the possibility of selection bias. Quasi-experimental designs are therefore inferior to true experimental designs in internal validity due to the presence of a variety of selection-related threats, such as selection-maturation threat (the treatment and control groups maturing at different rates), selection-history threat (the treatment and control groups being differentially impacted by extraneous or historical events), selection-regression threat (the treatment and control groups regressing toward the mean between pretest and posttest at different rates), selection-instrumentation threat (the treatment and control groups responding differently to the measurement), selection-testing (the treatment and control groups responding differently to the pretest), and selection-mortality (the treatment and control groups demonstrating differential dropout rates). Given these selection threats, it is generally preferable to avoid quasi-experimental designs to the greatest extent possible.
In addition, there are quite a few unique non-equivalent designs without corresponding true experimental design cousins. Some of the more useful of these designs are discussed next.
Regression discontinuity (RD) design. This is a non-equivalent pretest-posttest design where subjects are assigned to the treatment or control group based on a cut-off score on a preprogram measure. For instance, patients who are severely ill may be assigned to a treatment group to test the efficacy of a new drug or treatment protocol and those who are mildly ill are assigned to the control group. In another example, students who are lagging behind on standardised test scores may be selected for a remedial curriculum program intended to improve their performance, while those who score high on such tests are not selected for the remedial program.
Because of the use of a cut-off score, it is possible that the observed results may be a function of the cut-off score rather than the treatment, which introduces a new threat to internal validity. However, using the cut-off score also ensures that limited or costly resources are distributed to people who need them the most, rather than randomly across a population, while simultaneously allowing a quasi-experimental treatment. The control group scores in the RD design do not serve as a benchmark for comparing treatment group scores, given the systematic non-equivalence between the two groups. Rather, if there is no discontinuity between pretest and posttest scores in the control group, but such a discontinuity persists in the treatment group, then this discontinuity is viewed as evidence of the treatment effect.
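The discontinuity logic can be sketched with a small simulation (the severity scores, cut-off, and 10-point effect size are all hypothetical): fit a regression line on each side of the cut-off and measure the jump between the two fitted lines at the cut-off itself.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return b, my - b * mx

random.seed(11)
cutoff = 50
# Hypothetical pre-program severity scores; subjects below the cut-off
# receive the treatment, as in the severely-ill-patients example.
pre = [random.uniform(0, 100) for _ in range(400)]
treated = [x < cutoff for x in pre]
# Outcome tracks the pre-program score, plus an assumed 10-point effect.
post = [0.9 * x + (10 if t else 0) + random.gauss(0, 2)
        for x, t in zip(pre, treated)]

t_xs = [x for x, t in zip(pre, treated) if t]
t_ys = [y for y, t in zip(post, treated) if t]
c_xs = [x for x, t in zip(pre, treated) if not t]
c_ys = [y for y, t in zip(post, treated) if not t]

bt, at = fit_line(t_xs, t_ys)
bc, ac = fit_line(c_xs, c_ys)

# The jump between the two fitted lines at the cut-off estimates the effect.
gap = (bt * cutoff + at) - (bc * cutoff + ac)
print("estimated discontinuity at cut-off:", round(gap, 1))
```

A gap near zero would suggest no treatment effect; a persistent discontinuity, as here, is the evidence of effect that the RD design looks for.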
Proxy pretest design. This design, shown in Figure 10.11, looks very similar to the standard pretest-posttest non-equivalent groups design (NEGD), with one critical difference: the pretest score is collected after the treatment is administered. A typical application of this design is when a researcher is brought in to test the efficacy of a program (e.g., an educational program) after the program has already started and pretest data is not available. Under such circumstances, the best option for the researcher is often to use a different prerecorded measure, such as students’ grade point average before the start of the program, as a proxy for pretest data. A variation of the proxy pretest design is to use subjects’ posttest recollection of pretest data, which may be subject to recall bias, but nevertheless may provide a measure of perceived gain or change in the dependent variable.
Separate pretest-posttest samples design. This design is useful if it is not possible to collect pretest and posttest data from the same subjects for some reason. As shown in Figure 10.12, there are four groups in this design, but two groups come from a single non-equivalent group, while the other two groups come from a different non-equivalent group. For instance, say you want to test customer satisfaction with a new online service that is implemented in one city but not in another. In this case, customers in the first city serve as the treatment group and those in the second city constitute the control group. If it is not possible to obtain pretest and posttest measures from the same customers, you can measure customer satisfaction at one point in time, implement the new service program, and measure customer satisfaction (with a different set of customers) after the program is implemented. Customer satisfaction is also measured in the control group at the same times as in the treatment group, but without the new program implementation. The design is not particularly strong, because you cannot examine changes in any specific customer’s satisfaction score before and after the implementation; you can only compare average customer satisfaction scores. Despite the lower internal validity, this design may still be a useful way of collecting quasi-experimental data when pretest and posttest data is not available from the same subjects.
An interesting variation of the non-equivalent dependent variable (NEDV) design is the pattern-matching NEDV design, which employs multiple outcome variables and a theory that explains how much each variable will be affected by the treatment. The researcher can then examine if the theoretical prediction is matched in actual observations. This pattern-matching technique—based on the degree of correspondence between theoretical and observed patterns—is a powerful way of alleviating internal validity concerns in the original NEDV design.
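The degree of correspondence between theoretical and observed patterns can be quantified with a simple correlation across outcome variables. In the sketch below, both the predicted and observed effect sizes for five outcome variables are hypothetical numbers chosen for illustration.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Hypothetical theoretical predictions of how strongly the treatment should
# affect five outcome variables, and the observed effect sizes.
predicted = [3.0, 2.0, 1.0, 0.5, 0.0]
observed = [2.8, 2.1, 1.2, 0.4, 0.1]

r = pearson_r(predicted, observed)
# A correlation close to 1 indicates that the observed pattern matches
# the theoretical prediction, supporting the causal interpretation.
print("pattern-match correlation:", round(r, 2))
```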
Experimental research is one of the most difficult of research designs, and should not be taken lightly. This type of research is often beset with a multitude of methodological problems. First, though experimental research requires theories for framing hypotheses for testing, much of current experimental research is atheoretical. Without theories, the hypotheses being tested tend to be ad hoc, possibly illogical, and meaningless. Second, many of the measurement instruments used in experimental research are not tested for reliability and validity, and are incomparable across studies. Consequently, results generated using such instruments are also incomparable. Third, experimental research often uses inappropriate research designs, such as irrelevant dependent variables, no interaction effects, no experimental controls, and non-equivalent stimuli across treatment groups. Findings from such studies tend to lack internal validity and are highly suspect. Fourth, the treatments (tasks) used in experimental research may be diverse, incomparable, and inconsistent across studies, and sometimes inappropriate for the subject population. For instance, undergraduate student subjects are often asked to pretend that they are marketing managers and asked to perform a complex budget allocation task in which they have no experience or expertise. The use of such inappropriate tasks introduces new threats to internal validity (i.e., subjects’ performance may be an artefact of the content or difficulty of the task setting), generates findings that are uninterpretable and meaningless, and makes integration of findings across studies impossible.
The design of proper experimental treatments is a very important task in experimental design, because the treatment is the raison d’être of the experimental method, and must never be rushed or neglected. To design an adequate and appropriate task, researchers should use prevalidated tasks if available, conduct treatment manipulation checks to check for the adequacy of such tasks (by debriefing subjects after performing the assigned task), conduct pilot tests (repeatedly, if necessary), and if in doubt, use tasks that are simple and familiar for the respondent sample rather than tasks that are complex or unfamiliar.
In summary, this chapter introduced key concepts in the experimental design research method and introduced a variety of true experimental and quasi-experimental designs. Although these designs vary widely in internal validity, designs with less internal validity should not be overlooked and may sometimes be useful under specific circumstances and empirical contingencies.
Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Get Your ALL ACCESS Shop Pass here →
Science doesn’t need to be complicated. These easy science experiments below are awesome for kids! They are visually stimulating, hands-on, and sensory-rich, making them fun to do and perfect for teaching simple science concepts at home or in the classroom.
Click on the titles below for the full supplies list and easy step-by-step instructions. Have fun trying these experiments at home or in the classroom, or even use them for your next science fair project!
Can you make a balloon inflate on its own? Grab a few basic kitchen ingredients and test them out! Try amazing chemistry for kids at your fingertips.
Enjoy learning about the basics of color mixing up to the density of liquids with this simple water density experiment . There are even more ways to explore rainbows here with walking water, prisms, and more.
This color-changing magic milk experiment will explode your dish with color. Add dish soap and food coloring to milk for cool chemistry!
Not all kids’ science experiments involve chemical reactions. Watch how a seed grows , which provides a window into the amazing field of biology .
One of our favorite science experiments is a naked egg or rubber egg experiment . Can you make your egg bounce? What happened to the shell?
Find out how to make corn dance with this easy experiment. Also, check out our dancing raisins and dancing cranberries.
Growing borax crystals is easy and a great way to learn about solutions. You could also grow sugar crystals , eggshell geodes , or salt crystals .
It is great for learning about what happens when you mix oil and water. a homemade lava lamp is a cool science experiment kids will want to do repeatedly!
Who doesn’t like doing science with candy? Try this classic Skittles science experiment and explore why the colors don’t mix when added to water.
Watch your kids’ faces light up, and their eyes widen when you test out cool chemistry with a lemon volcano using common household items, baking soda, and vinegar.
Kid tested, STEM approved! Making a popsicle stick catapult is a fantastic way to dive into hands-on physics and engineering.
Grab this free science experiments challenge calendar and have fun with science right away. Use the clickable links to see how to set up each science project.
💡Want to turn one of these fun and easy science experiments into a science fair project? Then, you will want to check out these helpful resources.
Are you looking for a specific topic? Check out these additional resources below. Each topic includes easy-to-understand information, everyday examples, and additional hands-on activities and experiments.
While many experiments can be performed by various age groups, the best science experiments for specific age groups are listed below.
Kids are curious and always looking to explore, discover, check out, and experiment to discover why things do what they do, move as they move, or change as they change! My son is now 13, and we started with simple science activities around three years of age with simple baking soda science.
Here are great tips for making science experiments enjoyable at home or in the classroom.
Safety first: Always prioritize safety. Use kid-friendly materials, supervise the experiments, and handle potentially hazardous substances yourself.
Start with simple experiments: Begin with basic experiments (find tons below) that require minimal setup and materials, gradually increasing complexity as kids gain confidence.
Use everyday items: Utilize common household items like vinegar and baking soda , food coloring, or balloons to make the experiments accessible and cost-effective.
Hands-on approach: Encourage kids to actively participate in the experiments rather than just observing. Let them touch, mix, and check out reactions up close.
Make predictions: Ask kids to predict the outcome before starting an experiment. This stimulates critical thinking and introduces the concept of hypothesis and the scientific method.
Record observations: Have a science journal or notebook where kids can record their observations, draw pictures, and write down their thoughts. Learn more about observing in science. We also have many printable science worksheets .
Theme-based experiments: Organize experiments around a theme, such as water , air , magnets , or plants . Even holidays and seasons make fun themes!
Kitchen science : Perform experiments in the kitchen, such as making ice cream using salt and ice or learning about density by layering different liquids.
Create a science lab: Set up a dedicated space for science experiments, and let kids decorate it with science-themed posters and drawings.
Outdoor experiments: Take some experiments outside to explore nature, study bugs, or learn about plants and soil.
DIY science kits: Prepare science experiment kits with labeled containers and ingredients, making it easy for kids to conduct experiments independently. Check out our DIY science list and STEM kits.
Make it a group effort: Group experiments can be more fun, allowing kids to learn together and share their excitement. Most of our science activities are classroom friendly!
Science shows or documentaries: Watch age-appropriate science shows or documentaries to introduce kids to scientific concepts entertainingly. Hello Bill Nye and the Magic Schoolbus! You can also check out National Geographic, the Discovery Channel, and NASA!
Ask open-ended questions: Encourage critical thinking by asking open-ended questions that prompt kids to think deeper about what they are experiencing.
Celebrate successes: Praise kids for their efforts and discoveries, no matter how small, to foster a positive attitude towards science and learning.
The scientific method is a way scientists figure out how things work. First, they ask a question about something they want to know. Then, they research to learn what’s already known about it. After that, they make a prediction called a hypothesis.
Next comes the fun part – they test their hypothesis by doing experiments. They carefully observe what happens during the experiments and write down all the details. Learn more about variables in experiments here.
Once they finish their experiments, they look at the results and decide if their hypothesis is right or wrong. If it’s wrong, they devise a new hypothesis and try again. If it’s right, they share their findings with others. That’s how scientists learn new things and make our world better!
Go ahead and introduce the scientific method and get kids started recording their observations and making conclusions. Read more about the scientific method for kids .
STEM activities include science, technology, engineering, and mathematics. In addition to our kids’ science experiments, we have lots of fun STEM activities for you to try. Check out these STEM ideas below.
If you’re looking to grab all of our printable science projects in one convenient place plus exclusive worksheets and bonuses like a STEAM Project pack, our Science Project Pack is what you need! Over 300+ Pages!
~ projects to try now ~.
IMAGES
VIDEO
COMMENTS
6. Experimental research allows cause and effect to be determined. The manipulation of variables allows for researchers to be able to look at various cause-and-effect relationships that a product, theory, or idea can produce. It is a process which allows researchers to dig deeper into what is possible, showing how the various variable ...
List of Advantages of Experimental Research. 1. It gives researchers a high level of control. When people conduct experimental research, they can manipulate the variables so they can create a setting that lets them observe the phenomena they want. They can remove or control other factors that may affect the overall results, which means they can ...
Benefits of science. The process of science is a way of building knowledge about the universe — constructing new ideas that illuminate the world around us. Those ideas are inherently tentative, but as they cycle through the process of science again and again and are tested and retested in different ways, we become increasingly confident in them.
Advantages of Experimental Research When talking about this research, we can think of human life. Babies do their own rudimentary experiments (such as putting objects in their mouths) to learn about the world around them, while older children and teens do experiments at school to learn more about science.
A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results. Keywords: scientific writing, scholarly communication. Study, experimental, or research design is the backbone of good research.
Experimentation in practice: The case of Louis Pasteur. Well-controlled experiments generally provide strong evidence of causality, demonstrating whether the manipulation of one variable causes a response in another variable. For example, as early as the 6th century BCE, Anaximander, a Greek philosopher, speculated that life could be formed from a mixture of sea water, mud, and sunlight.
Chapter 10 Experimental Research. Experimental research, often considered to be the "gold standard" in research designs, is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels ...
Abstract. Practicing and studying automated experimentation may benefit from philosophical reflection on experimental science in general. This paper reviews the relevant literature and discusses central issues in the philosophy of scientific experimentation. The first two sections present brief accounts of the rise of experimental science and ...
The Advantages of Experimental Research. 1. A High Level Of Control. With experimental research groups, the people conducting the research have a very high level of control over their variables. By isolating and determining what they are looking for, they have a great advantage in finding accurate results. 2.
Experimental science is the queen of sciences and the goal of all speculation. Roger Bacon (1214-1294) Experiments are part of the scientific method that helps to decide the fate of two or more competing hypotheses or explanations on a phenomenon. The term 'experiment' arises from Latin, Experiri, which means, 'to try'.
1. Introduction

'Does A cause B?' is one of the most common questions asked within nutrition research. Usually 'A' is a dietary pattern, and 'B' is a health, development or morbidity outcome []. In agricultural nutrition, the standard approach to such questions is to use a randomised experimental design []. These research tools were in fact developed within agricultural ...