| Section | Item no. | Recommendation |
|---|---|---|
| Title and abstract | 1 | Explicitly state that this is a “descriptive study” in the title or the abstract. |
| | 2 | Summarize the target population and provide an informative and balanced summary of estimated disease occurrence in the abstract. |
| Introduction | | |
| Background/rationale | 3 | State the motivation for the study, including, where relevant, the action that might be informed by the results. |
| Objectives | 4 | State the descriptive estimand, explicitly including: (a) the target population (who would be affected by any decisions made as a result of the study?); (b) the health state to be summarized; (c) the measure of occurrence; and (d) any stratification variables, if applicable. |
| Methods | | |
| Study design | 5 | (a) State whether the study is cross-sectional or longitudinal. (b) Restate the measure of occurrence being targeted. (c) If the study is longitudinal, specify the time origin and follow-up period for the measure of occurrence; if the study is cross-sectional, specify the time anchor at which the health state is summarized for individuals. |
| Setting | 6 | Describe any relevant features of the place and time in which the target population resides and across which data were collected. |
| Participants | 7 | (a) Describe the target population thoroughly in terms of person, place, and time. (b) Describe sampling into the study population (whether sampling was explicit or implicit, e.g., by inclusion in an administrative database); this includes eligibility criteria (see recommendations on data sources in item 10 below). (c) Describe any restrictions on the analytical sample. |
| Outcome(s) | 8 | (a) State when and how the outcome is measured. (b) Include estimates or discussion of the sensitivity and specificity of the study outcome definition relative to the gold standard. (c) List secondary outcomes or competing events of interest. |
| Covariates | 9 | Specify any stratification or adjustment variables; clearly define how variables were collected or constructed. |
| Data sources/measurement | 10 | Clearly delineate any inclusion/exclusion criteria for membership in the data source, including the original purpose for which the data were collected, if not for the study at hand. |
| Bias | 11 | Describe any assumptions or methods used to extrapolate data from the analytical sample to the study population and from the study population to the target population. |
| Statistical methods | 12 | (a) Describe the primary statistical methods used to estimate the measure of disease occurrence being targeted; discuss assumptions of that method in light of data limitations (e.g., assumption of independent censoring for people lost to follow-up). (b) If any adjustment/standardization will be done, state the goal of such adjustment. |
| Results | | |
| Participants | 13 | Report numbers of individuals at each study stage (this is likely to be approximate for the target population); consider summarizing this information in a flow diagram. |
| Descriptive data | 14 | (a) Report on the characteristics of the analytical sample in a “Table 1.” (b) Indicate the number of participants with missing data for each variable used in the analysis. (c) If any weighting or imputation is done to reconstruct the study sample or target populations, include columns for those populations. |
| Outcome data | 15 | (a) Present an overall (unstratified) estimate of the measure of occurrence of interest. (b) Report “crude” (raw data in the analytical sample) and (if applicable) “corrected” (after any weighting or imputation) estimates. |
| Other analyses | 16 | Present prespecified stratum-specific or adjusted/standardized results. |
| Discussion | | |
| Key results | 17 | Summarize key results with reference to the study objectives. |
| Limitations | 18 | Summarize potential sources of selection bias and measurement error and any attempts to mitigate these biases. Discuss both the direction and magnitude of any potential bias. Integrating quantitative bias analysis into the study to guide these discussions is encouraged. |
| Interpretation | 19 | (a) Avoid causal interpretations of descriptive results; avoid overinterpreting stratum-specific differences in measures of occurrence. (b) Describe how results of this study might inform or improve public health or clinical practice. |
We define a descriptive epidemiologic question as one that aims to quantify some feature of the health of a population and, often, to characterize the distribution of that feature across the population. The estimand for causal analyses is a contrast of potential outcomes in a single population, where the potential outcomes are those we would expect to observe under some hypothetical intervention ( 1 , 4 – 7 ). The fundamental problem of causal inference is that we cannot observe all of these potential outcomes ( 8 ). The estimand for descriptive analyses is a function of the outcomes that occurred for everyone in the target population. The estimation challenge for descriptive analyses is that we may not completely observe all of the actual outcomes. A descriptive analysis might be cross-sectional or longitudinal; it might concern a dichotomous, categorical, or continuous outcome; and it might attempt to summarize the outcome in any number of ways (e.g., median time to some event, mean value, etc.). While much discussion focuses on the most common scenarios (e.g., dichotomous outcomes), this framework is intended to be applied to descriptive analyses for any combination of study designs, outcomes, and estimands.
We start with the premise that good epidemiologic questions are impactful and well-defined. An impactful question, if answered, would lead to knowledge that could inform action in the population it concerns ( 7 ). A well-defined question should be stated with enough specificity and clarity that answering it is at least theoretically possible.
A well-defined research question (causal or descriptive) states: 1) the target population, characterized by person and place, and anchored in time; 2) the outcome, event, or health state or characteristic; and 3) the measure of occurrence that will be used to summarize the outcome (e.g., incidence, prevalence, average time to event, etc.). A causal question requires specifying additional components, such as exposures and covariates that are thought to be confounders, effect modifiers, or mediators. For descriptive questions, consideration of additional variables is optional, but if auxiliary variables will be considered, a well-defined descriptive question will 4) prespecify any other variables of interest and how they will be considered (e.g., to characterize the population, as a stratification factor to characterize the outcome distribution, or as a “nuisance” variable that we would like to adjust for or standardize over). For a descriptive question, indiscriminate adjustment for these other variables can lead to uninterpretable results that may mislead ( 9 ); as such, researchers should be clear as to the purpose of adjustment in descriptive studies, understand the implications of such adjustments, and be cautious in interpreting adjusted statistics ( 10 ).
Example : We illustrate application of this framework to description of one portion of the human immunodeficiency virus (HIV) care continuum ( 11 ): What was the prevalence of viral suppression on December 31, 2019, among adults living with HIV who had been linked to HIV care (i.e., saw a clinician who was aware of their HIV status and had the ability to prescribe antiretroviral therapy) in the United States? We will explore specific components of this question to make it more well-defined (and tie those components to analytical decisions) below.
For a descriptive question, we define the target population as the group in which we would like to characterize the distribution of the outcome. The choice of target population is directly linked to the purpose of asking the question. The target population might be, for example, the population for which we will be providing public health services. The target population is not necessarily enumerated (in contrast to a cohort or a sample), but we do need to be able to define membership in terms of person, place, and time (here, time is used to define membership in the target population and does not relate directly to measurement of the outcome). For our example question, the target population is everyone living in the United States ( place ) who was aged ≥18 years, was infected and diagnosed with HIV, and attended ≥1 clinical visit for HIV care with a clinician who was aware of their infection and could prescribe antiretroviral medication ( person ) before December 31, 2019, and was alive through December 31, 2019 ( time ).
A well-defined question specifies the target population a priori. When data are available on a full census of the target population (e.g., through administrative records or public health surveillance), no sampling is needed. However, when data on the entire population cannot be obtained, we rely on data from a sample of the target population or a population that we hope is sufficiently representative of the target population with respect to both measured and unmeasured characteristics. The study sample is the enumerated set of individuals whose information is captured in a data set, among whom we attempt to measure occurrence of the outcome (after inclusion and exclusion criteria have been applied, if data were not collected using these criteria (e.g., administrative data)). Many descriptive and causal questions are answered using convenience samples without a clear sampling frame (e.g., people recruited using Web-based surveys, frequent clinic attendees, or people who sought medical care in a particular hospital system) and implicitly assume that the study sample is a random sample (perhaps conditional on covariates with known sampling probabilities) of the target population. Achieving a representative sample may involve considerable work and may be very resource-intensive ( 12 ). However, use of convenience samples often results in study samples that are different from the target population in unmeasurable ways, particularly when subjects must actively seek out or opt into participation ( 13 ).
On the topic of sampling and selection, it is also useful to define the analytical sample as a proper subset of the study sample in which disease occurrence is measured given practical limitations (e.g., excluding individuals in the study sample who are missing information on the outcome). We might use information from the analytical sample to attempt to quantify disease occurrence in the study sample, but we must rely on assumptions to do so (e.g., assuming data are missing at random and imputing missing data or reweighting study participants with complete data). For valid inferences, the incidence of the outcome in the sample must be able to stand in for the incidence in the target population. Here, the “sample” is either the analytical sample or the study sample represented by the analytical sample after any attempts to handle missing data. Given the many practical challenges enumerated above, the samples we rely on in our studies are rarely representative of the target population. If the distribution of risk factors for the health state differs between the study sample and the target population, we have a lack of generalizability ( 14 – 16 ); the absolute value (risk, prevalence, rate) of the outcome in the sample will differ from what we would have observed in the target population. Without applying quantitative approaches to generalize data from the sample to the target population, descriptive results will be biased. Except in special cases (e.g., when the selected estimand is the one scale on which effect measure modification is absent), if absolute measures differ between the sample and the target, most contrasts of the outcome across exposure groups in the sample will also be biased for the same contrasts in the target population (causal results will be biased) ( 14 – 16 ). If the underlying joint distribution of all causes of the outcome differs between the analytical sample and the study sample, we have selection bias ( 17 , 18 ). 
To recover an estimand relevant to the target population from an analytical sample with a different distribution of causes of the outcome, stratification and standardization methods may be appropriate.
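As a toy illustration of such standardization (all prevalences and counts below are invented, not drawn from any study), a stratum-specific prevalence estimated in an analytical sample can be reweighted to the stratum distribution of the target population:

```python
# Hypothetical example: standardizing stratum-specific prevalence estimates
# from an analytical sample to a target population whose stratum
# distribution is known (e.g., from surveillance counts).
# All numbers are invented for illustration.

# Prevalence of the outcome by (hypothetical) age stratum in the analytical sample
sample_prevalence = {"18-39": 0.80, "40-59": 0.88, "60+": 0.92}

# Stratum sizes in the analytical sample vs. the target population
sample_n = {"18-39": 500, "40-59": 300, "60+": 200}
target_n = {"18-39": 200_000, "40-59": 450_000, "60+": 350_000}

def standardize(prev, weights):
    """Weighted average of stratum-specific prevalences."""
    total = sum(weights.values())
    return sum(prev[s] * weights[s] / total for s in prev)

crude = standardize(sample_prevalence, sample_n)         # sample-weighted
standardized = standardize(sample_prevalence, target_n)  # target-weighted
print(f"crude: {crude:.3f}, standardized: {standardized:.3f}")
```

Because the invented analytical sample overrepresents the youngest (lowest-prevalence) stratum, the crude and standardized estimates differ; this recovers the target-population prevalence only if the stratum-specific prevalences transport from sample to target.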
Example : Recall that the target population is everyone living in the United States who had been linked to clinical care for HIV before December 31, 2019. There is mandated reporting in the United States of new HIV diagnoses and HIV viral load test results to public health surveillance agencies under national notifiable disease regulations, and the Centers for Disease Control and Prevention aggregates these data from all states and dependent areas. This might seem like a census of the target population. However, despite these mandates, not all diagnoses are reported, and people who move across state lines may be double-counted because of challenges with deduplication. Thus, the number of people with HIV infection may be inaccurate. Additionally, data rely on HIV viral load and CD4 cell-count laboratory tests as a proxy for clinical visits, and the proxy is imperfect ( 19 , 20 ); thus, we cannot accurately apply the second inclusion criterion for target population membership: linkage to clinical care. Alternatively, we might use data from the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) ( 21 ) or another clinical cohort of people with HIV who have been linked to care. However, clinical cohort studies are often nested within academic medical centers, where the quality of care and wraparound services may differ (and thus the probability of the outcome, viral suppression, may differ), and may have stricter enrollment criteria (to preserve study resources) than we have used to define linkage to care for our target population.
There are other options for study samples we might try to leverage. We might even choose to estimate the parameter of interest in multiple samples and triangulate the results. The point is that there is rarely a single, perfect, existing study sample that can stand in for the target population. Therefore, if we wish to use existing data, identifying ways in which the study sample and the target population differ provides a framework for thinking about sources of bias and how we might adjust the estimate for better inferences.
A theme of many threats to descriptive and causal epidemiologic inference is that they can often be cast as missing-data problems ( 22 ). The ideal data set for answering our descriptive epidemiologic question includes a row for everyone in the target population and columns with values for the outcome and any covariates of interest. When the study sample is not a census of the target population, anyone in the target population who is not in the study sample will have missing data in some, if not all, columns. Indeed, without a clear sampling frame, we do not even know how many rows are missing from our ideal data set (and we cannot quantify the amount of missing data from this ideal study). Analyzing the study sample as if it were a random sample of the target population is akin to assuming that data are missing completely at random. If, instead, it is plausible to assume that data are missing at random conditional on covariates that are available for target population members who were not selected for the study sample, we could reweight or standardize the study sample to represent the full target population.
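A minimal numerical sketch of this reweighting logic, using an invented covariate (urban vs. rural residence) and invented counts: if selection into the study sample depends only on that measured covariate, the crude sample prevalence is biased for the target population, but inverse-probability-of-sampling weights recover it.

```python
# Hypothetical illustration (all numbers invented). Selection into the study
# sample depends on residence, and the outcome prevalence differs by
# residence, so the crude sample estimate is biased; weighting by the
# inverse of the covariate-specific sampling probability corrects this
# under the missing-at-random assumption described above.

# per stratum: n in target, n with outcome in target, n sampled, n with outcome in sample
strata = {
    "urban": dict(target_n=600, target_y=540, sample_n=300, sample_y=270),
    "rural": dict(target_n=400, target_y=300, sample_n=40,  sample_y=30),
}

true_prev = (sum(s["target_y"] for s in strata.values())
             / sum(s["target_n"] for s in strata.values()))
crude_prev = (sum(s["sample_y"] for s in strata.values())
              / sum(s["sample_n"] for s in strata.values()))

# Inverse-probability-of-sampling weight: 1 / (sample_n / target_n)
num = sum(s["sample_y"] * s["target_n"] / s["sample_n"] for s in strata.values())
den = sum(s["sample_n"] * s["target_n"] / s["sample_n"] for s in strata.values())
weighted_prev = num / den

print(f"true {true_prev:.3f}, crude {crude_prev:.3f}, weighted {weighted_prev:.3f}")
```

The correction works here only because the sampling fractions are known within strata; with a convenience sample and no sampling frame, those fractions are themselves unknown.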
Example : The surveillance data include everyone in the target population (age ≥18 years, alive, diagnosed with HIV, and ≥1 HIV care visit before December 31, 2019), but they also include some people who are not in the target population (they include people who did not make ≥1 HIV care visit with a clinician who might prescribe antiretroviral medications), and we are unable to definitively identify people in the surveillance data who do not meet the inclusion criteria for the target population (we have to rely on laboratory tests as a proxy for clinical visits) ( 19 ). However, the surveillance data likely are closer to representing the target population than the NA-ACCORD data (which do not include everyone in the target population, although they do not include anyone who should be excluded from the target population). Therefore, we might use surveillance data for our primary analyses, but we might conduct secondary analyses that leverage the relative strengths of the different study samples and, for example, reweight NA-ACCORD data that include visits to resemble the target population implied by the surveillance data.
To describe the occurrence, frequency, or relative frequency of an outcome, we need an unambiguous definition of that outcome, and we must be able to apply that definition in our data. In the absence of a gold standard or the ability to apply that gold standard due to data or resource constraints, we must understand how imperfect sensitivity and specificity might affect our results. Measurement error has previously been described as a missing-data problem ( 22 ) in which the true outcome is missing and we overwrite that missing value with a mismeasured outcome. To the extent to which the mismeasured outcome is a poor substitute for the true outcome, our inferences will be biased.
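Where estimates of sensitivity and specificity are available, the standard Rogan-Gladen correction makes concrete how misclassification distorts an apparent prevalence; the numbers below are invented for illustration.

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct an apparent prevalence for outcome misclassification
    (Rogan-Gladen estimator), truncated to the [0, 1] interval."""
    corrected = (apparent_prev + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(corrected, 0.0), 1.0)

# Invented numbers: an outcome definition with 90% sensitivity and 95%
# specificity yields an apparent prevalence of 0.50; the corrected estimate
# is higher because false negatives outnumber false positives here.
print(rogan_gladen(0.50, 0.90, 0.95))
```

The estimator assumes nondifferential misclassification and known (not estimated-with-error) sensitivity and specificity; in practice these are themselves uncertain, which motivates the quantitative bias analysis mentioned in the checklist.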
Example : Our outcome is “viral suppression” on December 31, 2019, but there is no single, standard threshold for suppression. Prior studies have used plasma HIV RNA levels of <20, <50, <200, or <400 copies/mL ( 23 ). Lower thresholds will result in a lower estimate of the prevalence of viral suppression; for example, in an HIV clinical cohort in Baltimore, Maryland, the proportion of patients estimated to have a suppressed viral load in a given year from 2010 to 2018 was 75% if the threshold for suppression was set at <20 copies/mL but 89% if the threshold was set at <400 copies/mL ( 24 ). Failure to suppress viral load below a lower threshold may also be a more sensitive indicator of subsequent morbidity and mortality ( 24 – 28 ), but suppression below a higher threshold is more relevant as an indicator of an individual’s transmission potential ( 29 , 30 ), so our choice of threshold may depend on how our results will be used. Additionally, not everyone in either of our candidate study samples will have had a viral load measurement on December 31, 2019, exactly. Typically, researchers accept viral loads measured within a time window around some key date as indicative of the viral load on that key date. We must decide how wide a window we are willing to use to answer our question. The width we are willing to tolerate might depend on how frequently we anticipate viral load changes in the population. A wider window risks assigning a viral load value to December 31 that is inaccurate because viral load has changed since measurement, while a narrower window will result in a larger proportion of the cohort with a missing viral-load value.
We have multiple options for measures of occurrence, and like the proverbial blind men feeling the elephant, our choice of measure of occurrence might give us only part of the complete picture about the distribution of the outcome in the target population. Incidence tells us something about how frequently an event occurs over time. There are multiple measures of incidence; in the interest of space, we will restrict our discussion to risks and rates. If individuals are not followed over time and the event can recur, it may be difficult to distinguish the number of affected individuals from the number of events. Prevalent outcomes are often not of interest in causal investigations, as temporality is more challenging to determine and reverse causation is a potential problem. In addition, survival bias might affect results when considering prevalent exposures ( 31 , 32 ). Finally, prevalence is a function of the incidence of the condition and its duration, such that, if incidence is what is relevant to the question at hand, prevalence might be a misleading proxy. However, for descriptive questions designed to inform public-health planning for secondary or tertiary prevention measures, prevalence might be the most relevant measure of occurrence, as it reflects the population of people who might access those services.
Risk (the proportion of people free from disease at baseline who develop the outcome during the study period) is the foundation of many causal epidemiologic studies ( 33 ), particularly as the target trial framework ( 1 ) has gained in popularity. Risk is arguably the most easily interpretable measure of disease occurrence for the general public ( 33 ). We discuss rates (the number of events divided by a sum of person-time) as an alternative measure of incidence below. Two complications for obtaining valid estimates of either measure of incidence, however, are competing events and incompletely observed person-time (left-truncation and right-censoring).
Competing events are events that preclude the event of interest from occurring and are theoretical if not practical problems for all outcomes other than all-cause mortality ( 34 ). In the presence of competing events, we have the option to report the conditional or unconditional risk (i.e., cumulative incidence function) ( 35 ). The conditional risk is the proportion of people free from disease at baseline that we would expect to develop the outcome during the study period if all competing events were prevented without changing the hazard of the event of interest; it is the risk “conditional” on removal of the competing event. It is estimated by censoring persons who experience a competing event and is the first and sometimes only estimand of risk that students of epidemiology are taught ( 36 ). It is also implied by the exponential formula for converting rates to risks. However, complete removal of the competing event is a hypothetical intervention, and the conditional risk is the risk under that often-infeasible intervention. If our goal is to describe the world as it exists, absent hypothetical interventions, the cumulative incidence function is recommended when the number of competing events is nontrivial ( 37 ). The cumulative incidence function (or, though the term is less commonly used, the unconditional risk) is the proportion of people free from disease at baseline who would develop the outcome of interest during the study period in the real world, in which a competing event might remove them from follow-up and preclude them from ever developing the outcome of interest.
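The contrast between these two estimands can be made concrete with a small nonparametric sketch; the event times and types below are invented, and the estimators are the standard complement of Kaplan-Meier (competing events censored) and the Aalen-Johansen cumulative incidence function.

```python
# Invented data contrasting the "conditional risk" (competing events
# censored; complement of Kaplan-Meier) with the cumulative incidence
# function (Aalen-Johansen), which treats the competing event as
# precluding the event of interest.

data = [  # (time, status): "event", "competing", or "censored"
    (1, "event"), (2, "competing"), (3, "event"), (3, "censored"),
    (4, "competing"), (5, "event"), (6, "censored"), (7, "event"),
    (8, "censored"), (9, "competing"),
]

def conditional_risk(data):
    """1 - Kaplan-Meier survival, censoring competing events."""
    surv = 1.0
    for t in sorted({t for t, _ in data}):
        d = sum(1 for ti, s in data if ti == t and s == "event")
        at_risk = sum(1 for ti, _ in data if ti >= t)
        surv *= 1 - d / at_risk
    return 1 - surv

def cumulative_incidence(data):
    """Aalen-Johansen estimator for the event of interest."""
    cif, surv = 0.0, 1.0  # surv = probability of being free of *any* event
    for t in sorted({t for t, _ in data}):
        at_risk = sum(1 for ti, _ in data if ti >= t)
        d_event = sum(1 for ti, s in data if ti == t and s == "event")
        d_comp = sum(1 for ti, s in data if ti == t and s == "competing")
        cif += surv * d_event / at_risk  # only still-event-free people can fail
        surv *= 1 - (d_event + d_comp) / at_risk
    return cif

print(conditional_risk(data), cumulative_incidence(data))
```

As the text argues, the conditional risk exceeds the cumulative incidence function here, because censoring the competing events effectively imagines a world in which those individuals remained at risk.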
Risks can be calculated in the presence of late entries (left-truncation) and loss to follow-up (right-censoring) under strong assumptions about independence between entering/leaving the study and risk of the outcome ( 38 , 39 ). These assumptions effectively impute outcomes for people who did not survive to enroll in the study sample and for people who are censored ( 38 ). We can adjust for possible associations between censoring and the outcome (and resultant selection bias) using inverse probability of censoring weights ( 40 ). However, the resultant risks are interpretable as the risk that would have been observed if no one were lost to follow-up (a hypothetical intervention), and they will differ from the natural course if loss to follow-up was associated with the outcome in ways not captured by covariates in the weight model or if loss to follow-up itself directly altered the risk of the outcome ( 18 , 40 ).
Finally, rates may occasionally be a useful measure of incidence as an alternative to risks, especially for descriptive studies. Risks are only defined relative to a population free of, and biologically at risk for, the outcome at a particular time origin. When we would like to describe incidence across a time metric along which not all people were biologically at risk at the time origin, rates can appropriately exclude person-time not at risk and allow for reporting of smoothed incidence estimates. For example, when describing temporal trends for the incidence of HIV diagnoses since the beginning of the epidemic in the 1980s, there will be people who were not born (not at risk for the outcome) in the 1980s who should be counted in the target population in the 2010s. Perhaps in an idealized descriptive study, we would report the daily risk of HIV diagnosis restricted to people who were alive and at risk for HIV diagnosis at the start of each day. However, across 3 decades this may be computationally intensive and impractical given the granularity of data collection and reporting. We might instead report weekly, monthly, or yearly HIV diagnosis risk, but the wider the time interval across which we measure risk becomes, the greater the number of people in our target population who are not at risk at the start of the interval. How should we treat people born in December 1990 when calculating the risk of HIV diagnosis in 1990? In contrast, if we are willing to assume that the rate of HIV diagnosis across a calendar year is approximately constant, or if we assume that the average rate is a reasonable representation of the incidence in that year, rates could appropriately exclude person-time in which people are not biologically at risk. The assumption of a constant rate or the acceptability of an average rate for answering the study question should be plausible across the time intervals chosen, or time should be further discretized.
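A sketch of the 1990 scenario above, with invented dates, shows how a rate can exclude person-time during which an individual was not yet at risk, something a risk calculation anchored at January 1 cannot do.

```python
# Invented mini-example: a 1990 diagnosis rate that excludes person-time
# before an individual becomes at risk (e.g., before birth), and that stops
# counting person-time at diagnosis. All dates are hypothetical.
from datetime import date

YEAR_START, YEAR_END = date(1990, 1, 1), date(1991, 1, 1)

# (date at-risk period begins, date of diagnosis or None)
people = [
    (date(1960, 5, 1), date(1990, 7, 1)),  # at risk from Jan 1, diagnosed July 1
    (date(1985, 3, 2), None),              # at risk all year, not diagnosed
    (date(1990, 12, 1), None),             # born Dec 1990: only 31 days at risk
]

events = 0
person_years = 0.0
for start, dx in people:
    t0 = max(start, YEAR_START)            # clip at-risk time to 1990
    t1 = min(dx or YEAR_END, YEAR_END)     # stop at diagnosis or year end
    if t1 > t0:
        person_years += (t1 - t0).days / 365.25
    if dx and YEAR_START <= dx < YEAR_END:
        events += 1

rate = events / person_years  # diagnoses per person-year during 1990
print(f"{events} event / {person_years:.2f} person-years = {rate:.2f}")
```

The person born in December contributes a month of person-time to the denominator rather than being either excluded entirely or counted as at risk for the full year.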
Another benefit of rates is that they are straightforward to estimate when we do not have individual-level data, which is more common in descriptive analyses than in causal or predictive epidemiologic analyses. For example, rates are the standard measure of incidence used for notifiable diseases, where health departments count case reports to get the numerator and use midyear census estimates for the denominator.
Example : We have clearly specified in our research question that we are interested in the prevalence of viral suppression on December 31, 2019. People in our study sample with no viral load measurement in 2019 are lost to follow-up. Viral suppression is influenced by access to health care and is only possible if people are receiving antiretroviral therapy (except, in rare cases, for elite controllers) ( 41 ). In this setting, people who are lost to follow-up may have transferred to another clinic and may still be receiving treatment (if we are using NA-ACCORD data) or may have moved out of the jurisdiction (if we are using surveillance data), and we might assume that they have the same probability of viral suppression as people with a viral load measurement (censoring is appropriate; equivalently, we can restrict analyses to people with a measured viral load) ( 24 ). Alternatively, people who do not have a viral load measurement may have dropped out of clinical care and may not have access to antiretroviral therapy. The probability of viral suppression among these individuals is near 0 (we might think of loss to follow-up as a competing event and assign a value of “not suppressed” to persons who are lost to follow-up) ( 42 ). Understanding the assumptions and implications of different analytical decisions for these people is critical for making the right inference about the prevalence of the outcome.
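The two analytical choices just described bound the estimate; a short sketch with invented counts makes the contrast explicit.

```python
# Invented numbers illustrating the two assumptions above for people with no
# viral load measured in the window: (a) restrict to those measured (assumes
# the unmeasured resemble the measured), vs. (b) classify everyone unmeasured
# as "not suppressed" (treats dropout as implying non-suppression).

n_total = 1000       # hypothetical study sample
n_measured = 850     # had >=1 viral load in the measurement window
n_suppressed = 722   # suppressed among those measured

prev_restricted = n_suppressed / n_measured  # assumption (a)
prev_worst_case = n_suppressed / n_total     # assumption (b)
print(f"(a) {prev_restricted:.3f}  (b) {prev_worst_case:.3f}")
```

The truth plausibly lies between these two estimates, and the gap between them grows with the fraction of the sample missing a measurement, which is one reason the choice of measurement window width matters.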
When describing the prevalence or incidence of an outcome, we sometimes want to characterize the people who got the outcome according to covariates. Alternatively, we may want to account for nuisance variables, such as factors that differ between the study sample and the target population or between groups we plan to stratify by. When characterizing groups with the highest incidence of the outcome, bivariate results can make it challenging to understand how covariates interact to determine the distribution of disease. For example, if the prevalence of viral suppression is lower for cisgender women than for cisgender men and lower for Black patients than for White patients ( 43 ), what would we expect to see regarding the prevalence of viral suppression for cisgender White women relative to cisgender Black men? Stratifying on multiple variables simultaneously might be helpful in this setting, or we may want to employ theoretical models (e.g., conceptual frameworks for how variables influence risk of the outcome) or statistical strategies (e.g., supervised machine learning) to identify the most important variables if there are not enough data to stratify on all variables of interest. Conversely, when trying to understand whether one covariate is associated with the distribution of disease independently or merely because of its correlation with another covariate, a common approach is to put all covariates into a single model. However, this approach can lead to incorrect interpretations of the results and inappropriate recommendations for actions ( 44 ). Adjustment implies an intervention on the data and a distortion of reality—for example, “Would Black people still have lower prevalence of viral suppression if they had the same distribution of HIV acquisition risk factors as White people?”. 
Inappropriate adjustment may understate the magnitude of disparities ( 45 ), and adjusted statistics are prone to being interpreted causally, which could lead to inappropriate recommendations ( 9 ). We endorse reporting and primary interpretation of unadjusted results for descriptive studies, with clear justification and careful interpretation in cases where adjustments are made.
Descriptive epidemiologic studies seek to characterize what is happening in the world to inform public health priorities, target interventions, and occasionally contrast with counterfactual scenarios to estimate intervention effects ( 46 , 47 ). Descriptive studies have value in their own right and not merely as stepping stones toward causal inference. Characterizing what is happening in the world requires that we be very clear about the particular slice of the world and the specific outcome we hope to study. Lack of generalizability and selection bias can distort descriptive studies when study participation is associated with the outcome. Measurement error can bias descriptive studies when we do not use, or there is no, gold-standard measure of the outcome. Different measures of occurrence will provide different pictures of what is happening in the world. Censoring people who have a competing event or adjusting for covariates implies interventions on the data such that the results are a distorted version of reality. These are all basic epidemiologic principles that also affect the success of our attempts at causal effect estimation. Performing rigorous descriptive studies that accurately estimate a parameter of interest and are interpretable to clinicians and policy-makers will improve public health.
Author affiliations: Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States (Catherine R. Lesko); Departments of Epidemiology and Global Health, School of Public Health, Boston University, Boston, Massachusetts, United States (Matthew P. Fox); and Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States (Jessie K. Edwards).
This work was supported by grants K01 AA028193, K01 AI125087, and R01 AI157758 from the National Institutes of Health.
Conflict of interest: none declared.
Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–764.
Google Scholar
Petersen ML , van der Laan MJ . Causal models and learning from data: integrating causal modeling and statistical estimation . Epidemiology . 2014 ; 25 ( 3 ): 418 – 426 .
von Elm E , Altman DG , Egger M , et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies . Int J Surg . 2014 ; 12 ( 12 ): 1495 – 1499 .
Robins JM . Data, design, and background knowledge in etiologic inference . Epidemiology . 2001 ; 12 ( 3 ): 313 – 320 .
Rubin DB . The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials . Stat Med . 2007 ; 26 ( 1 ): 20 – 36 .
Petersen ML . Commentary: applying a causal road map in settings with time-dependent confounding . Epidemiology . 2014 ; 25 ( 6 ): 898 – 901 .
Fox MP , Edwards JK , Platt R , et al. The critical importance of asking good questions: the role of epidemiology doctoral training programs . Am J Epidemiol . 2020 ; 189 ( 4 ): 261 – 264 .
Holland PW . Statistics and causal inference . J Am Stat Assoc . 1986 ; 81 ( 396 ): 945 – 960 .
Tennant PWG , Murray EJ . The quest for timely insights into COVID-19 should not come at the cost of scientific rigor . Epidemiology . 2021 ; 32 ( 1 ):e2.
Kaufman JS . Statistics, adjusted statistics, and maladjusted statistics . Am J Law Med . 2017 ; 43 ( 2-3 ): 193 – 208 .
Gardner EM , McLees MP , Steiner JF , et al. The spectrum of engagement in HIV care and its relevance to test-and-treat strategies for prevention of HIV infection . Clin Infect Dis . 2011 ; 52 ( 6 ): 793 – 800 .
Lee KK , Fitts MS , Conigrave JH , et al. Recruiting a representative sample of urban South Australian Aboriginal adults for a survey on alcohol consumption . BMC Med Res Methodol . 2020 ; 20 ( 1 ): 183 .
Offord C . How (not) to do an antibody survey for SARS-CoV-2. Scientist . https://www.the-scientist.com/news-opinion/how-not-to-do-an-antibody-survey-for-sars-cov-2-67488 . Published April 28, 2020 . Accessed April 8, 2022 .
Lesko CR , Buchanan AL , Westreich D , et al. Generalizing study results: a potential outcomes perspective . Epidemiology . 2017 ; 28 ( 4 ): 553 – 561 .
Dahabreh IJ , Robertson SE , Tchetgen EJ , et al. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals . Biometrics . 2019 ; 75 ( 2 ): 685 – 694 .
Cole SR , Stuart EA . Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 Trial . Am J Epidemiol . 2010 ; 172 ( 1 ): 107 – 115 .
Westreich D . Berkson’s bias, selection bias, and missing data . Epidemiology . 2012 ; 23 ( 1 ): 159 – 164 .
Hernán MA . Invited commentary: selection bias without colliders . Am J Epidemiol . 2017 ; 185 ( 11 ): 1048 – 1050 .
Rebeiro PF , Althoff KN , Lau B , et al. Laboratory measures as proxies for primary care encounters: implications for quantifying clinical retention among HIV-infected adults in North America . Am J Epidemiol . 2015 ; 182 ( 11 ): 952 – 960 .
Lesko CR , Sampson LA , Miller WC , et al. Measuring the HIV care continuum using public health surveillance data in the United States . J Acquir Immune Defic Syndr . 2015 ; 70 ( 5 ): 489 – 494 .
Gange SJ , Kitahata MM , Saag MS , et al. Cohort profile: the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) . Int J Epidemiol . 2007 ; 36 ( 2 ): 294 – 301 .
Edwards JK , Cole SR , Westreich D . All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework . Int J Epidemiol . 2015 ; 44 ( 4 ): 1452 – 1459 .
McMahon JH , Elliott JH , Bertagnolio S , et al. Viral suppression after 12 months of antiretroviral therapy in low- and middle-income countries: a systematic review . Bull World Health Organ . 2013 ; 91 ( 5 ): 377 – 385E .
Lesko CR , Chander G , Moore RD , et al. Variation in estimated viral suppression associated with the definition of viral suppression used . AIDS . 2020 ; 34 ( 10 ): 1519 – 1526 .
Hermans LE , Moorhouse M , Carmona S , et al. Effect of HIV-1 low-level viraemia during antiretroviral therapy on treatment outcomes in WHO-guided South African treatment programmes: a multicentre cohort study . Lancet Infect Dis . 2018 ; 18 ( 2 ): 188 – 197 .
Elvstam O , Medstrand P , Yilmaz A , et al. Virological failure and all-cause mortality in HIV-positive adults with low-level viremia during antiretroviral treatment . PLoS One . 2017 ; 12 ( 7 ):e0180761.
Antiretroviral Therapy Cohort Collaboration , Vandenhende MA , Ingle S , et al. Impact of low-level viremia on clinical and virological outcomes in treated HIV-1-infected patients . AIDS . 2015 ; 29 ( 3 ): 373 – 383 .
Laprise C , de Pokomandy A , Baril J-G , et al. Virologic failure following persistent low-level viremia in a cohort of HIV-positive patients: results from 12 years of observation . Clin Infect Dis . 2013 ; 57 ( 10 ): 1489 – 1496 .
Lesko CR , Lau B , Chander G , et al. Time spent with HIV viral load >1500 copies/mL among persons engaged in continuity HIV care in an urban clinic in the United States, 2010–2015 . AIDS Behav . 2018 ; 22 ( 11 ): 3443 – 3450 .
Quinn TC , Wawer MJ , Sewankambo N , et al. Viral load and heterosexual transmission of human immunodeficiency virus type 1. Rakai Project Study Group . N Engl J Med . 2000 ; 342 ( 13 ): 921 – 929 .
Prentice RL , Chlebowski RT , Stefanick ML , et al. Estrogen plus progestin therapy and breast cancer in recently postmenopausal women . Am J Epidemiol . 2008 ; 167 ( 10 ): 1207 – 1216 .
Lund JL , Richardson DB , Stürmer T . The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application . Curr Epidemiol Rep . 2015 ; 2 ( 4 ): 221 – 228 .
Cole SR , Hudgens MG , Brookhart MA , et al. Risk . Am J Epidemiol . 2015 ; 181 ( 4 ): 246 – 250 .
Lau B , Cole SR , Gange SJ . Competing risk regression models for epidemiologic data . Am J Epidemiol . 2009 ; 170 ( 2 ): 244 – 256 .
Edwards JK , Hester LL , Gokhale M , et al. Methodologic issues when estimating risks in pharmacoepidemiology . Curr Epidemiol Rep . 2016 ; 3 ( 4 ): 285 – 296 .
Rothman KJ , Lash TL , VanderWeele TJ , et al. Measures of occurrence. In: Modern Epidemiology . 4th ed. Philadelphia, PA : Wolters Kluwer N.V. ; 2021 : 53 – 77 .
Google Preview
Cole SR , Lau B , Eron JJ , et al. Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy . Am J Epidemiol . 2015 ; 181 ( 4 ): 238 – 245 .
Cole SR , Edwards JK , Naimi AI , et al. Hidden imputations and the Kaplan-Meier estimator . Am J Epidemiol . 2020 ; 189 ( 11 ): 1408 – 1411 .
Lesko CR , Edwards JK , Cole SR , et al. When to censor? Am J Epidemiol . 2018 ; 187 ( 3 ): 623 – 632 .
Howe CJ , Cole SR , Lau B , et al. Selection bias due to loss to follow up in cohort studies . Epidemiology . 2016 ; 27 ( 1 ): 91 – 97 .
Okulicz JF , Marconi VC , Landrum ML , et al. Clinical outcomes of elite controllers, viremic controllers, and long-term nonprogressors in the US Department of Defense HIV Natural History Study . J Infect Dis . 2009 ; 200 ( 11 ): 1714 – 1723 .
Edwards JK , Lesko CR , Herce ME , et al. Gone but not lost: implications for estimating HIV care outcomes when loss to clinic is not loss to care . Epidemiology . 2020 ; 31 ( 4 ): 570 – 577 .
Centers for Disease Control and Prevention . Monitoring Selected National HIV Prevention and Care Objectives by Using HIV Surveillance Data—United States and 6 Dependent Areas, 2019 . ( HIV Surveillance Supplemental Report , vol. 26, no. 2) . Atlanta, GA : Centers for Disease Control and Prevention ; 2021 . https://www.cdc.gov/hiv/pdf/library/reports/surveillance/cdc-hiv-surveillance-report-vol-26-no-2.pdf . Accessed November 29, 2021 .
Westreich D , Greenland S . The table 2 fallacy: presenting and interpreting confounder and modifier coefficients . Am J Epidemiol . 2013 ; 177 ( 4 ): 292 – 298 .
Zalla LC , Martin CL , Edwards JK , et al. A geography of risk: structural racism and COVID-19 mortality in the United States . Am J Epidemiol . 2021 ; 190 ( 8 ): 1439 – 1446 .
Westreich D . From exposures to population interventions: pregnancy and response to HIV therapy . Am J Epidemiol . 2014 ; 179 ( 7 ): 797 – 806 .
Edwards JK , Cole SR , Lesko CR , et al. An illustration of inverse probability weighting to estimate policy-relevant causal effects . Am J Epidemiol . 2016 ; 184 ( 4 ): 336 – 344 .
Part of the book series: Emerging Topics in Statistics and Biostatistics (ETSB)
Epidemiology is essential for education, research, and practice in public health and medicine. As a scientific discipline, epidemiology covers four major tasks: descriptive, etiological, translational, and methodological epidemiology. Descriptive epidemiology aims to quantify the distribution of medical, health, or behavioral issues among people residing in a geographic area over time; etiological epidemiology is devoted to understanding the causes and influential factors of any medical, health, or behavioral issue from onset through progression and prognosis; translational epidemiology focuses on translating findings from descriptive and etiological epidemiology into interventions for disease prevention, treatment, and health promotion; and methodological epidemiology strives to develop new methods, and to use existing methods innovatively, to address challenges in epidemiological research and practice.
Numbers speak louder than words.
Authors and affiliations.
Department of Epidemiology, University of Florida, Gainesville, FL, USA
Xinguang Chen
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Chen, X. (2021). Introduction to Quantitative Epidemiology. In: Quantitative Epidemiology. Emerging Topics in Statistics and Biostatistics. Springer, Cham.
DOI: https://doi.org/10.1007/978-3-030-83852-2_1
Published: 22 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83851-5
Online ISBN: 978-3-030-83852-2
eBook Packages: Mathematics and Statistics (R0)
Basis of diagnosis | Cases |
---|---|
Diagnosed by clinician and confirmed by pathologist | 53 |
Diagnosed by clinician and not confirmed by pathologist | 21 |
First diagnosed post mortem | 22 |

Occupational group | % |
---|---|
Farmers (self-employed) | 82% |
Professionals | 77% |
Skilled manual workers | 69% |
Labourers | 63% |
Armed forces | 42% |
BMC Infectious Diseases, volume 21, Article number: 525 (2021)
Navigating the rapidly growing body of scientific literature on the SARS-CoV-2 pandemic is challenging, and ongoing critical appraisal of this output is essential. We aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.
Nine databases (Medline, EMBASE, Cochrane Library, CINAHL, Web of Sciences, PDQ-Evidence, WHO’s Global Research, LILACS, and Epistemonikos) were searched from December 1, 2019, to March 24, 2020. Systematic reviews analyzing primary studies of COVID-19 were included. Two authors independently undertook screening, selection, extraction (data on clinical symptoms, prevalence, pharmacological and non-pharmacological interventions, diagnostic test assessment, laboratory, and radiological findings), and quality assessment (AMSTAR 2). A meta-analysis was performed of the prevalence of clinical outcomes.
Eighteen systematic reviews were included; one was empty (it did not identify any relevant studies). Using AMSTAR 2, confidence in the results of all 18 reviews was rated as “critically low”. Identified symptoms of COVID-19 (ranges of point estimates) were: fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%), and gastrointestinal complaints (5–9%). Severe symptoms were more common in men. Elevated C-reactive protein and lactate dehydrogenase, and slightly elevated aspartate and alanine aminotransferase, were commonly described. Thrombocytopenia and elevated levels of procalcitonin and cardiac troponin I were associated with severe disease. A frequent finding on chest imaging was uni- or bilateral multilobar ground-glass opacity. A single review investigated the impact of medication (chloroquine) but found no verifiable clinical data. All-cause mortality ranged from 0.3 to 13.9%.
In this overview of systematic reviews, we analyzed evidence from the first 18 systematic reviews that were published after the emergence of COVID-19. However, confidence in the results of all reviews was “critically low”. Thus, systematic reviews that were published early on in the pandemic were of questionable usefulness. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards.
Peer Review reports
The spread of the “Severe Acute Respiratory Coronavirus 2” (SARS-CoV-2), the causal agent of COVID-19, was characterized as a pandemic by the World Health Organization (WHO) in March 2020 and has triggered an international public health emergency [ 1 ]. The numbers of confirmed cases and deaths due to COVID-19 are rapidly escalating, counting in millions [ 2 ], causing massive economic strain, and escalating healthcare and public health expenses [ 3 , 4 ].
The research community has responded by publishing an impressive number of scientific reports related to COVID-19. The world was alerted to the new disease at the beginning of 2020 [ 1 ], and by mid-March 2020, more than 2000 articles had been published on COVID-19 in scholarly journals, with 25% of them containing original data [ 5 ]. The living map of COVID-19 evidence, curated by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre), contained more than 40,000 records by February 2021 [ 6 ]. More than 100,000 records on PubMed were labeled as “SARS-CoV-2 literature, sequence, and clinical content” by February 2021 [ 7 ].
Due to publication speed, the research community has voiced concerns regarding the quality and reproducibility of evidence produced during the COVID-19 pandemic, warning of the potential damaging approach of “publish first, retract later” [ 8 ]. It appears that these concerns are not unfounded, as it has been reported that COVID-19 articles were overrepresented in the pool of retracted articles in 2020 [ 9 ]. These concerns about inadequate evidence are of major importance because they can lead to poor clinical practice and inappropriate policies [ 10 ].
Systematic reviews are a cornerstone of today’s evidence-informed decision-making. By synthesizing all relevant evidence regarding a particular topic, systematic reviews reflect the current scientific knowledge. Systematic reviews are considered to be at the highest level in the hierarchy of evidence and should be used to make informed decisions. However, with high numbers of systematic reviews of different scope and methodological quality being published, overviews of multiple systematic reviews that assess their methodological quality are essential [ 11 , 12 , 13 ]. An overview of systematic reviews helps identify and organize the literature and highlights areas of priority in decision-making.
In this overview of systematic reviews, we aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.
Research question.
This overview’s primary objective was to summarize and critically appraise systematic reviews that assessed any type of primary clinical data from patients infected with SARS-CoV-2. Our research question was purposefully broad because we wanted to analyze as many systematic reviews as possible that were available early following the COVID-19 outbreak.
We conducted an overview of systematic reviews. The idea for this overview originated in a protocol for a systematic review submitted to PROSPERO (CRD42020170623), which indicated a plan to conduct an overview.
Overviews of systematic reviews use explicit and systematic methods for searching and identifying multiple systematic reviews addressing related research questions in the same field to extract and analyze evidence across important outcomes. Overviews of systematic reviews are in principle similar to systematic reviews of interventions, but the unit of analysis is a systematic review [ 14 , 15 , 16 ].
We used the overview methodology instead of other evidence synthesis methods to allow us to collate and appraise multiple systematic reviews on this topic, and to extract and analyze their results across relevant topics [ 17 ]. The overview and meta-analysis of systematic reviews allowed us to investigate the methodological quality of included studies, summarize results, and identify specific areas of available or limited evidence, thereby strengthening the current understanding of this novel disease and guiding future research [ 13 ].
A reporting guideline for overviews of reviews is currently under development, i.e., Preferred Reporting Items for Overviews of Reviews (PRIOR) [ 18 ]. As the PRIOR checklist is still not published, this study was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 statement [ 19 ]. The methodology used in this review was adapted from the Cochrane Handbook for Systematic Reviews of Interventions and also followed established methodological considerations for analyzing existing systematic reviews [ 14 ].
Approval of a research ethics committee was not necessary as the study analyzed only publicly available articles.
Systematic reviews were included if they analyzed primary data from patients infected with SARS-CoV-2 as confirmed by RT-PCR or another pre-specified diagnostic technique. Eligible reviews covered all topics related to COVID-19 including, but not limited to, those that reported clinical symptoms, diagnostic methods, therapeutic interventions, laboratory findings, or radiological results. Both full manuscripts and abbreviated versions, such as letters, were eligible.
No restrictions were imposed on the design of the primary studies included within the systematic reviews, the last search date, whether the review included meta-analyses or language. Reviews related to SARS-CoV-2 and other coronaviruses were eligible, but from those reviews, we analyzed only data related to SARS-CoV-2.
No consensus definition exists for a systematic review [ 20 ], and debates continue about the defining characteristics of a systematic review [ 21 ]. Cochrane’s guidance for overviews of reviews recommends setting pre-established criteria for making decisions around inclusion [ 14 ]. That is supported by a recent scoping review about guidance for overviews of systematic reviews [ 22 ].
Thus, for this study, we defined a systematic review as a research report which searched for primary research studies on a specific topic using an explicit search strategy, had a detailed description of the methods with explicit inclusion criteria provided, and provided a summary of the included studies either in narrative or quantitative format (such as a meta-analysis). Cochrane and non-Cochrane systematic reviews were considered eligible for inclusion, with or without meta-analysis, and regardless of the study design, language restriction and methodology of the included primary studies. To be eligible for inclusion, reviews had to be clearly analyzing data related to SARS-CoV-2 (associated or not with other viruses). We excluded narrative reviews without those characteristics as these are less likely to be replicable and are more prone to bias.
Scoping reviews and rapid reviews were eligible for inclusion in this overview if they met our pre-defined inclusion criteria noted above. We included reviews that addressed SARS-CoV-2 and other coronaviruses if they reported separate data regarding SARS-CoV-2.
Nine databases were searched for eligible records published between December 1, 2019, and March 24, 2020: Cochrane Database of Systematic Reviews via Cochrane Library, PubMed, EMBASE, CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Sciences, LILACS (Latin American and Caribbean Health Sciences Literature), PDQ-Evidence, WHO’s Global Research on Coronavirus Disease (COVID-19), and Epistemonikos.
The comprehensive search strategy for each database is provided in Additional file 1 and was designed and conducted in collaboration with an information specialist. All retrieved records were primarily processed in EndNote, where duplicates were removed, and records were then imported into the Covidence platform [ 23 ]. In addition to database searches, we screened reference lists of reviews included after screening records retrieved via databases.
All searches, screening of titles and abstracts, and record selection, were performed independently by two investigators using the Covidence platform [ 23 ]. Articles deemed potentially eligible were retrieved for full-text screening carried out independently by two investigators. Discrepancies at all stages were resolved by consensus. During the screening, records published in languages other than English were translated by a native/fluent speaker.
We custom designed a data extraction table for this study, which was piloted by two authors independently. Data extraction was performed independently by two authors. Conflicts were resolved by consensus or by consulting a third researcher.
We extracted the following data: article identification data (authors’ name and journal of publication), search period, number of databases searched, population or settings considered, main results and outcomes observed, and number of participants. From Web of Science (Clarivate Analytics, Philadelphia, PA, USA), we extracted journal rank (quartile) and Journal Impact Factor (JIF).
We categorized the following as primary outcomes: all-cause mortality, need for and length of mechanical ventilation, length of hospitalization (in days), admission to intensive care unit (yes/no), and length of stay in the intensive care unit.
The following outcomes were categorized as exploratory: diagnostic methods used for detection of the virus, male to female ratio, clinical symptoms, pharmacological and non-pharmacological interventions, laboratory findings (full blood count, liver enzymes, C-reactive protein, d-dimer, albumin, lipid profile, serum electrolytes, blood vitamin levels, glucose levels, and any other important biomarkers), and radiological findings (using radiography, computed tomography, magnetic resonance imaging or ultrasound).
We also collected data on reporting guidelines and requirements for the publication of systematic reviews and meta-analyses from journal websites where included reviews were published.
Two researchers independently assessed the reviews’ quality using the “A MeaSurement Tool to Assess Systematic Reviews 2 (AMSTAR 2)”. We acknowledge that the AMSTAR 2 was created as “a critical appraisal tool for systematic reviews that include randomized or non-randomized studies of healthcare interventions, or both” [ 24 ]. However, since AMSTAR 2 was designed for systematic reviews of intervention trials, and we included additional types of systematic reviews, we adjusted some AMSTAR 2 ratings and reported these in Additional file 2 .
Adherence to each item was rated as follows: yes, partial yes, no, or not applicable (such as when a meta-analysis was not conducted). The overall confidence in the results of the review is rated as “critically low”, “low”, “moderate” or “high”, according to the AMSTAR 2 guidance based on seven critical domains, which are items 2, 4, 7, 9, 11, 13, 15 as defined by AMSTAR 2 authors [ 24 ]. We reported our adherence ratings for transparency of our decision with accompanying explanations, for each item, in each included review.
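The mapping from AMSTAR 2 domain ratings to an overall confidence level can be expressed as a small decision rule. The sketch below is our illustrative reconstruction of the published AMSTAR 2 guidance (rating driven by the number of flaws in the seven critical domains), not part of the review's actual workflow; the function name and inputs are hypothetical.

```python
def amstar2_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Illustrative sketch of the AMSTAR 2 overall-confidence rule.

    critical_flaws: number of failed critical domains (items 2, 4, 7, 9,
    11, 13, 15 in the assessment described above).
    noncritical_weaknesses: number of weaknesses in non-critical items.
    """
    if critical_flaws > 1:
        return "critically low"   # more than one critical flaw
    if critical_flaws == 1:
        return "low"              # one critical flaw, with or without other weaknesses
    if noncritical_weaknesses > 1:
        return "moderate"         # no critical flaws, multiple non-critical weaknesses
    return "high"                 # at most one non-critical weakness
```

Under this rule, any review failing two or more critical domains is rated “critically low”, which is consistent with all 18 included reviews receiving that rating.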
One of the included systematic reviews was conducted by some members of this author team [ 25 ]. This review was initially assessed independently by two authors who were not co-authors of that review to prevent the risk of bias in assessing this study.
For data synthesis, we prepared a table summarizing each systematic review. Graphs illustrating the mortality rate and clinical symptoms were created. We then prepared a narrative summary of the methods, findings, study strengths, and limitations.
For the analysis of the prevalence of clinical outcomes, we extracted the number of events and the total number of patients. For reviews that did not perform a meta-analysis, we conducted proportional meta-analyses in RStudio© with the “meta” package (version 4.9–6), using the “metaprop” function, excluding case studies because of their absence of variance. For these reviews, we present pooled proportions with 95% confidence intervals, obtained by the inverse-variance method under a random-effects model with the DerSimonian-Laird estimator for τ². Data were transformed using the Freeman-Tukey double-arcsine transformation. Confidence intervals for individual studies were calculated with the Clopper-Pearson method. Forest plots were created in RStudio© with the “metafor” package (version 2.1–0) and the “forest” function.
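The pooling just described (Freeman-Tukey double-arcsine transformation with a DerSimonian-Laird random-effects model) can be sketched in a few lines. The Python below is an illustrative reimplementation, not the authors' actual R “meta”-package code; for simplicity it back-transforms with sin²(t/2) rather than Miller's exact inverse, and it omits the Clopper-Pearson intervals for individual studies.

```python
import math

def ft_transform(events, n):
    """Freeman-Tukey double-arcsine transform and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1.0)))
         + math.asin(math.sqrt((events + 1.0) / (n + 1.0))))
    return t, 1.0 / (n + 0.5)

def pool_proportions(counts):
    """DerSimonian-Laird random-effects pooling of proportions.

    counts: list of (events, total) pairs, one per study.
    Returns (pooled proportion, (95% CI lower, 95% CI upper)).
    """
    ys, vs = zip(*(ft_transform(e, n) for e, n in counts))
    w = [1.0 / v for v in vs]                       # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(counts) - 1
    tau2 = max(0.0, (q - df) / (sw - sum(wi ** 2 for wi in w) / sw))
    wr = [1.0 / (v + tau2) for v in vs]             # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(wr, ys)) / sum(wr)
    se = math.sqrt(1.0 / sum(wr))
    back = lambda t: math.sin(max(t, 0.0) / 2.0) ** 2  # simple back-transform
    return back(y_re), (back(y_re - 1.96 * se), back(y_re + 1.96 * se))
```

For example, pooling three hypothetical fever prevalences of 82/100, 90/100, and 88/100 yields a pooled proportion near the middle of that range with a CI reflecting between-study heterogeneity via τ².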
Some of the included systematic reviews that address the same or similar research questions may include the same primary studies in overviews. Including such overlapping reviews may introduce bias when outcome data from the same primary study are included in the analyses of an overview multiple times. Thus, in summaries of evidence, multiple-counting of the same outcome data will give data from some primary studies too much influence [ 14 ]. In this overview, we did not exclude overlapping systematic reviews because, according to Cochrane’s guidance, it may be appropriate to include all relevant reviews’ results if the purpose of the overview is to present and describe the current body of evidence on a topic [ 14 ]. To avoid any bias in summary estimates associated with overlapping reviews, we generated forest plots showing data from individual systematic reviews, but the results were not pooled because some primary studies were included in multiple reviews.
Our search retrieved 1063 publications, of which 175 were duplicates. Most publications ( n = 860) were excluded during title and abstract screening. Of the 28 studies selected for full-text screening, 10 were excluded for the reasons described in Additional file 3 , and 18 were included in the final analysis (Fig. 1 ) [ 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 ]. Reference list screening did not retrieve any additional systematic reviews.
PRISMA flow diagram
Summary features of 18 systematic reviews are presented in Table 1 . They were published in 14 different journals. Only four of these journals had specific requirements for systematic reviews (with or without meta-analysis): European Journal of Internal Medicine, Journal of Clinical Medicine, Ultrasound in Obstetrics and Gynecology, and Clinical Research in Cardiology . Two journals reported that they published only invited reviews ( Journal of Medical Virology and Clinica Chimica Acta ). Three systematic reviews in our study were published as letters; one was labeled as a scoping review and another as a rapid review (Table 2 ).
All reviews were published in English, in first-quartile (Q1) journals, with journal impact factors (JIF) ranging from 1.692 to 6.062. One review was empty, meaning that its search did not identify any eligible primary studies [ 36 ]. The remaining 17 reviews included 269 unique primary studies; the majority ( n = 211; 78%) appeared in only one of the reviews (range: 1 to 12 reviews per primary study). The primary studies were published between December 2019 and March 18, 2020, and comprised case reports, case series, cohort studies, and other observational designs. Only one review included randomized clinical trials [ 38 ]. The literature search dates of the included reviews extended from 2019 (entire year) up to March 9, 2020. Ten of the systematic reviews included meta-analyses. The list of primary studies found in the included systematic reviews, together with the number of reviews in which each primary study was included, is shown in Additional file 4 .
Most of the reviews analyzed data from patients with COVID-19 who developed pneumonia, acute respiratory distress syndrome (ARDS), or other related complications. One review evaluated the effectiveness of surgical masks in preventing transmission of the virus [ 36 ], one focused on pediatric patients [ 34 ], and one investigated COVID-19 in pregnant women [ 37 ]. Most reviews assessed clinical symptoms, laboratory findings, or radiological results.
The summary of findings from individual reviews is shown in Table 2 . Overall, all-cause mortality ranged from 0.3 to 13.9% (Fig. 2 ).
A meta-analysis of the prevalence of mortality
Seven reviews described the main clinical manifestations of COVID-19 [ 26 , 28 , 29 , 34 , 35 , 39 , 41 ]. Three of them provided only a narrative discussion of symptoms [ 26 , 34 , 35 ]. In the reviews that performed a statistical analysis of symptom incidence, the ranges of point estimates in patients with COVID-19 were: fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%), and gastrointestinal disorders such as diarrhea, nausea, or vomiting (5–9%); one review also reported dizziness (12.1%) (Figs. 3 , 4 , 5 , 6 , 7 , 8 and 9 ). Three reviews assessed cough with and without sputum together; only one review assessed sputum production itself (28.5%).
A meta-analysis of the prevalence of fever
A meta-analysis of the prevalence of cough
A meta-analysis of the prevalence of dyspnea
A meta-analysis of the prevalence of fatigue or myalgia
A meta-analysis of the prevalence of headache
A meta-analysis of the prevalence of gastrointestinal disorders
A meta-analysis of the prevalence of sore throat
Three reviews described the methodologies, protocols, and tools used to establish the diagnosis of COVID-19 [ 26 , 34 , 38 ]. The most commonly mentioned diagnostic method in the included studies was detection of SARS-CoV-2 nucleic acid by RT-PCR assay on respiratory swabs (nasal or pharyngeal) or blood specimens. These diagnostic tests have been widely used, but their precise sensitivity and specificity remain unknown. One review included a Chinese study that relied on clinical diagnosis without laboratory confirmation of SARS-CoV-2 infection (patients were diagnosed with COVID-19 if they presented with at least two suggestive symptoms together with laboratory and chest radiography abnormalities) [ 34 ].
Pharmacological and non-pharmacological interventions (supportive therapies) used in treating patients with COVID-19 were reported in five reviews [ 25 , 27 , 34 , 35 , 38 ]. Antivirals used empirically for COVID-19 treatment were reported in seven reviews [ 25 , 27 , 34 , 35 , 37 , 38 , 41 ]; the most commonly used were protease inhibitors (lopinavir, ritonavir, darunavir), a nucleoside reverse transcriptase inhibitor (tenofovir), nucleotide analogs (remdesivir, galidesivir, ganciclovir), and neuraminidase inhibitors (oseltamivir). Umifenovir, a membrane fusion inhibitor, was investigated in two studies [ 25 , 35 ]. Supportive interventions analyzed included different types of oxygen supplementation and breathing support (invasive or non-invasive ventilation) [ 25 ]. The use of antibiotics, both empirically and to treat secondary pneumonia, was reported in six studies [ 25 , 26 , 27 , 34 , 35 , 38 ]. One review specifically assessed evidence on the efficacy and safety of the anti-malaria drug chloroquine [ 27 ]; it identified 23 ongoing trials investigating the potential of chloroquine as a therapeutic option for COVID-19, but no verifiable clinical outcome data. The use of mesenchymal stem cells, antifungals, and glucocorticoids was described in four reviews [ 25 , 34 , 35 , 38 ].
Of the 18 reviews included in this overview, eight analyzed laboratory parameters in patients with COVID-19 [ 25 , 29 , 30 , 32 , 33 , 34 , 35 , 39 ]; elevated C-reactive protein, lymphocytopenia, elevated lactate dehydrogenase, and slightly elevated aspartate and alanine aminotransferases (AST, ALT) were commonly described in those eight reviews. Lippi et al. assessed cardiac troponin I (cTnI) [ 30 ], procalcitonin [ 32 ], and platelet count [ 33 ] in COVID-19 patients. Elevated levels of procalcitonin [ 32 ] and cTnI [ 30 ] were more likely to be associated with a severe disease course (requiring intensive care unit admission and intubation). Furthermore, thrombocytopenia was frequently observed in patients with complicated COVID-19 infections [ 33 ].
Chest imaging (chest radiography and/or computed tomography) features were assessed in six reviews, all of which described a frequent pattern of local or bilateral multilobar ground-glass opacity [ 25 , 34 , 35 , 39 , 40 , 41 ]. Those six reviews showed that septal thickening, bronchiectasis, pleural and cardiac effusions, halo signs, and pneumothorax were observed in patients suffering from COVID-19.
Table 3 shows the detailed results of the quality assessment of 18 systematic reviews, including the assessment of individual items and summary assessment. A detailed explanation for each decision in each review is available in Additional file 5 .
Using AMSTAR 2 criteria, confidence in the results of all 18 reviews was rated as “critically low” (Table 3 ). Common methodological drawbacks were: omission of prospective protocol submission or publication; use of an inappropriate search strategy; lack of independent, duplicate literature screening and data extraction (or unclear methodology); absence of an explanation for heterogeneity among the included studies; and lack of reasons for study exclusion (or unclear rationale).
Risk of bias assessment based on a reported methodological tool, together with quality of evidence appraisal following the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) method, was reported in only one review [ 25 ]. Five reviews presented a table summarizing bias, using various risk of bias tools [ 25 , 29 , 39 , 40 , 41 ]. One review analyzed “study quality” [ 37 ]. One review mentioned risk of bias assessment in its methods but did not provide any related analysis [ 28 ].
This overview of systematic reviews analyzed the first 18 systematic reviews published after the onset of the COVID-19 pandemic, up to March 24, 2020, whose primary studies involved more than 60,000 patients. Using AMSTAR 2, we judged our confidence in all of those reviews to be “critically low”. Ten reviews included meta-analyses. The reviews presented data on clinical manifestations, laboratory and radiological findings, and interventions. We found no systematic reviews on the utility of diagnostic tests.
Symptoms were reported in seven reviews; most patients had fever, cough, dyspnea, myalgia or muscle fatigue, and gastrointestinal disorders such as diarrhea, nausea, or vomiting. Olfactory dysfunction (anosmia or dysosmia) has been described in patients with COVID-19 [ 43 ]; however, it was not reported in any of the reviews included in this overview. During the SARS outbreak in 2002, there were reports of impairment of the sense of smell associated with the disease [ 44 , 45 ].
The reported mortality rates ranged from 0.3 to 14% in the included reviews. Mortality estimates are influenced by the transmissibility rate (basic reproduction number), the availability of diagnostic tools, notification policies, asymptomatic presentations of the disease, resources for disease prevention and control, and treatment facilities; such variability in the mortality rate fits the pattern of emerging infectious diseases [ 46 ]. Furthermore, the reported case counts did not account for asymptomatic cases, mild cases in which individuals did not seek medical treatment, or the fact that many countries had limited access to diagnostic tests or implemented testing policies later than others. Given the lack of reviews assessing diagnostic testing (sensitivity, specificity, and predictive values of RT-PCR or immunoglobulin tests) and the preponderance of studies that assessed only symptomatic individuals, considerable imprecision surrounded the mortality rates calculated in the early stage of the COVID-19 pandemic.
Few reviews included treatment data. Those reviews described studies considered to be at a very low level of evidence: usually small, retrospective studies with very heterogeneous populations. Eight reviews analyzed laboratory parameters; these could have been useful for clinicians attending patients with suspected COVID-19 in emergency services worldwide, for example in identifying which patients need to be reassessed more frequently.
All systematic reviews scored poorly on the AMSTAR 2 critical appraisal tool for systematic reviews. Most of the original studies included in the reviews were case series and case reports, which lowers the quality of the evidence. This has major implications for clinical practice and for the use of these reviews in evidence-based practice and policy: clinicians, patients, and policymakers can only have the highest confidence in systematic review findings if high-quality systematic review methodologies are employed. The urgent need for information during a pandemic does not justify poor-quality reporting.
We acknowledge that there are numerous challenges associated with analyzing COVID-19 data during a pandemic [ 47 ]. High-quality evidence syntheses are needed for decision-making, but each type of evidence synthesis comes with its own inherent challenges.
The creation of classic systematic reviews requires considerable time and effort; with massive research output, they quickly become outdated, and preparing updated versions also requires considerable time. A recent study showed that updates of non-Cochrane systematic reviews are published a median of 5 years after the publication of the previous version [ 48 ].
Authors may register a review and then abandon it [ 49 ], but the existence of a public record that is not updated may lead other authors to believe that the review is still ongoing. A quarter of Cochrane review protocols remain unpublished as completed systematic reviews 8 years after protocol publication [ 50 ].
Rapid reviews can be used to summarize the evidence, but they involve methodological sacrifices and simplifications to produce information promptly, with inconsistent methodological approaches [ 51 ]. However, rapid reviews are justified in times of public health emergencies, and even Cochrane has resorted to publishing rapid reviews in response to the COVID-19 crisis [ 52 ]. Rapid reviews were eligible for inclusion in this overview, but only one of the 18 reviews included in this study was labeled as a rapid review.
Ideally, COVID-19 evidence would be continually summarized in a series of high-quality living systematic reviews, a type of evidence synthesis defined as “ a systematic review which is continually updated, incorporating relevant new evidence as it becomes available ” [ 53 ]. However, conducting living systematic reviews requires considerable resources, calling into question the sustainability of such evidence syntheses over long periods [ 54 ].
Research reports about COVID-19 will contribute to research waste if they are poorly designed, poorly reported, or simply not necessary. In principle, systematic reviews should help reduce research waste as they usually provide recommendations for further research that is needed or may advise that sufficient evidence exists on a particular topic [ 55 ]. However, systematic reviews can also contribute to growing research waste when they are not needed, or poorly conducted and reported. Our present study clearly shows that most of the systematic reviews that were published early on in the COVID-19 pandemic could be categorized as research waste, as our confidence in their results is critically low.
Our study has some limitations. First, for the AMSTAR 2 assessment we relied on the information available in the publications; we did not contact study authors for clarifications or additional data. In three reviews, methodological quality appraisal was challenging because they were published as letters or labeled as rapid communications; various details about their review process were therefore not reported, leading to AMSTAR 2 questions being answered as “not reported” and resulting in low confidence scores. Full manuscripts might have provided additional information that could have increased confidence in the results. In other words, low scores may reflect incomplete reporting rather than low-quality review methods; the authors may have omitted methodological details to make their reviews available more rapidly and concisely. A general issue during a crisis is that speed and completeness must be balanced. However, maintaining high standards requires proper resourcing and commitment to ensure that users of systematic reviews can have high confidence in the results.
Furthermore, we used an adjusted AMSTAR 2 scoring, as the tool was designed for critical appraisal of reviews of interventions; despite these adjustments, some reviews may have received lower scores than warranted.
Another limitation of our study may be the inclusion of multiple overlapping reviews, as some of the included reviews analyzed the same primary studies. According to the Cochrane Handbook, including overlapping reviews may be appropriate when the review’s aim is “ to present and describe the current body of systematic review evidence on a topic ” [ 12 ], which was our aim. To avoid bias when summarizing evidence from overlapping reviews, we presented the forest plots without summary estimates. The forest plots serve to inform readers about the effect sizes for outcomes reported in each review.
Several authors of this study contributed to one of the identified reviews [ 25 ]. To reduce the risk of bias, its quality and limitations were initially assessed by two authors who had not co-authored the review in question.
Finally, we note that the systematic reviews included in our overview may have had issues that our analysis did not identify because we did not analyze their primary studies to verify the accuracy of the data and information they presented. We give two examples to substantiate this possibility. Lovato et al. wrote a commentary on the review of Sun et al. [ 41 ], in which they criticized the authors’ conclusion that sore throat is rare in COVID-19 patients [ 56 ]. Lovato et al. highlighted that multiple studies included in Sun et al. did not accurately describe participants’ clinical presentations, warning that only three studies clearly reported data on sore throat [ 56 ].
In another example, Leung [ 57 ] warned about the review of Li, L.Q. et al. [ 29 ]: “ it is possible that this statistic was computed using overlapped samples, therefore some patients were double counted ”. Li et al. responded to Leung that it is uncertain whether the data overlapped, as they used data from published articles and did not have access to the original data; they also reported that they requested original data and that they plan to re-do their analyses once they receive them; they also urged readers to treat the data with caution [ 58 ]. This points to the evolving nature of evidence during a crisis.
A strength of our study is that this overview adds to current knowledge by providing a comprehensive summary of all the evidence syntheses about COVID-19 available early after the onset of the pandemic. The overview followed strict methodological criteria, including a comprehensive and sensitive search strategy and a standard tool for the methodological appraisal of systematic reviews.
In conclusion, in this overview of systematic reviews we analyzed evidence from the first 18 systematic reviews published after the emergence of COVID-19. Confidence in the results of all of these reviews was “critically low”; thus, systematic reviews published early in the pandemic could be categorized as research waste. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards to provide patients, clinicians, and decision-makers with trustworthy evidence.
All data collected and analyzed within this study are available from the corresponding author on reasonable request.
World Health Organization. Timeline - COVID-19: Available at: https://www.who.int/news/item/29-06-2020-covidtimeline . Accessed 1 June 2021.
COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). Available at: https://coronavirus.jhu.edu/map.html . Accessed 1 June 2021.
Anzai A, Kobayashi T, Linton NM, Kinoshita R, Hayashi K, Suzuki A, et al. Assessing the Impact of Reduced Travel on Exportation Dynamics of Novel Coronavirus Infection (COVID-19). J Clin Med. 2020;9(2):601.
Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. https://doi.org/10.1126/science.aba9757 .
Fidahic M, Nujic D, Runjic R, Civljak M, Markotic F, Lovric Makaric Z, et al. Research methodology and characteristics of journal articles with original data, preprint articles and registered clinical trial protocols about COVID-19. BMC Med Res Methodol. 2020;20(1):161. https://doi.org/10.1186/s12874-020-01047-2 .
EPPI Centre . COVID-19: a living systematic map of the evidence. Available at: http://eppi.ioe.ac.uk/cms/Projects/DepartmentofHealthandSocialCare/Publishedreviews/COVID-19Livingsystematicmapoftheevidence/tabid/3765/Default.aspx . Accessed 1 June 2021.
NCBI SARS-CoV-2 Resources. Available at: https://www.ncbi.nlm.nih.gov/sars-cov-2/ . Accessed 1 June 2021.
Gustot T. Quality and reproducibility during the COVID-19 pandemic. JHEP Rep. 2020;2(4):100141. https://doi.org/10.1016/j.jhepr.2020.100141 .
Kodvanj I, et al. Publishing of COVID-19 preprints in peer-reviewed journals, preprinting trends, public discussion and quality issues. Preprint. bioRxiv 2020.11.23.394577. https://doi.org/10.1101/2020.11.23.394577 .
Dobler CC. Poor quality research and clinical practice during COVID-19. Breathe (Sheff). 2020;16(2):200112. https://doi.org/10.1183/20734735.0112-2020 .
Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. https://doi.org/10.1371/journal.pmed.1000326 .
Lunny C, Brennan SE, McDonald S, McKenzie JE. Toward a comprehensive evidence map of overview of systematic review methods: paper 1-purpose, eligibility, search and data extraction. Syst Rev. 2017;6(1):231. https://doi.org/10.1186/s13643-017-0617-1 .
Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane. 2020. Available from www.training.cochrane.org/handbook .
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions version 6.1 (updated September 2020). Cochrane. 2020; Available from www.training.cochrane.org/handbook .
Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. The impact of different inclusion decisions on the comprehensiveness and complexity of overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):18. https://doi.org/10.1186/s13643-018-0914-3 .
Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):29. https://doi.org/10.1186/s13643-018-0768-8 .
Hunt H, Pollock A, Campbell P, Estcourt L, Brunton G. An introduction to overviews of reviews: planning a relevant research question and objective for an overview. Syst Rev. 2018;7(1):39. https://doi.org/10.1186/s13643-018-0695-8 .
Pollock M, Fernandes RM, Pieper D, Tricco AC, Gates M, Gates A, et al. Preferred reporting items for overviews of reviews (PRIOR): a protocol for development of a reporting guideline for overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):335. https://doi.org/10.1186/s13643-019-1252-9 .
Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Open Med. 2009;3(3):e123–30.
Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19(1):203. https://doi.org/10.1186/s12874-019-0855-0 .
Puljak L. If there is only one author or only one database was searched, a study should not be called a systematic review. J Clin Epidemiol. 2017;91:4–5. https://doi.org/10.1016/j.jclinepi.2017.08.002 .
Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):254. https://doi.org/10.1186/s13643-020-01509-0 .
Covidence - systematic review software. Available at: https://www.covidence.org/ . Accessed 1 June 2021.
Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
Borges do Nascimento IJ, et al. Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis. J Clin Med. 2020;9(4):941.
Adhikari SP, Meng S, Wu YJ, Mao YP, Ye RX, Wang QZ, et al. Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review. Infect Dis Poverty. 2020;9(1):29. https://doi.org/10.1186/s40249-020-00646-x .
Cortegiani A, Ingoglia G, Ippolito M, Giarratano A, Einav S. A systematic review on the efficacy and safety of chloroquine for the treatment of COVID-19. J Crit Care. 2020;57:279–83. https://doi.org/10.1016/j.jcrc.2020.03.005 .
Li B, Yang J, Zhao F, Zhi L, Wang X, Liu L, et al. Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China. Clin Res Cardiol. 2020;109(5):531–8. https://doi.org/10.1007/s00392-020-01626-9 .
Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(6):577–83. https://doi.org/10.1002/jmv.25757 .
Lippi G, Lavie CJ, Sanchis-Gomar F. Cardiac troponin I in patients with coronavirus disease 2019 (COVID-19): evidence from a meta-analysis. Prog Cardiovasc Dis. 2020;63(3):390–1. https://doi.org/10.1016/j.pcad.2020.03.001 .
Lippi G, Henry BM. Active smoking is not associated with severity of coronavirus disease 2019 (COVID-19). Eur J Intern Med. 2020;75:107–8. https://doi.org/10.1016/j.ejim.2020.03.014 .
Lippi G, Plebani M. Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis. Clin Chim Acta. 2020;505:190–1. https://doi.org/10.1016/j.cca.2020.03.004 .
Lippi G, Plebani M, Henry BM. Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis. Clin Chim Acta. 2020;506:145–8. https://doi.org/10.1016/j.cca.2020.03.022 .
Ludvigsson JF. Systematic review of COVID-19 in children shows milder cases and a better prognosis than adults. Acta Paediatr. 2020;109(6):1088–95. https://doi.org/10.1111/apa.15270 .
Lupia T, Scabini S, Mornese Pinna S, di Perri G, de Rosa FG, Corcione S. 2019 novel coronavirus (2019-nCoV) outbreak: a new challenge. J Glob Antimicrob Resist. 2020;21:22–7. https://doi.org/10.1016/j.jgar.2020.02.021 .
Marasinghe KM. A systematic review investigating the effectiveness of face mask use in limiting the spread of COVID-19 among medically not diagnosed individuals: shedding light on current recommendations provided to individuals not medically diagnosed with COVID-19. Preprint. Research Square. 2020. https://doi.org/10.21203/rs.3.rs-16701/v1 .
Mullins E, Evans D, Viner RM, O’Brien P, Morris E. Coronavirus in pregnancy and delivery: rapid review. Ultrasound Obstet Gynecol. 2020;55(5):586–92. https://doi.org/10.1002/uog.22014 .
Pang J, Wang MX, Ang IYH, Tan SHX, Lewis RF, Chen JIP, et al. Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel coronavirus (2019-nCoV): a systematic review. J Clin Med. 2020;9(3):623.
Rodriguez-Morales AJ, Cardona-Ospina JA, Gutiérrez-Ocampo E, Villamizar-Peña R, Holguin-Rivera Y, Escalera-Antezana JP, et al. Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis. Travel Med Infect Dis. 2020;34:101623. https://doi.org/10.1016/j.tmaid.2020.101623 .
Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A. Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. AJR Am J Roentgenol. 2020;215(1):87–93. https://doi.org/10.2214/AJR.20.23034 .
Sun P, Qie S, Liu Z, Ren J, Li K, Xi J. Clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis. J Med Virol. 2020;92(6):612–7. https://doi.org/10.1002/jmv.25735 .
Yang J, Zheng Y, Gou X, Pu K, Chen Z, Guo Q, et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int J Infect Dis. 2020;94:91–5. https://doi.org/10.1016/j.ijid.2020.03.017 .
Bassetti M, Vena A, Giacobbe DR. The novel Chinese coronavirus (2019-nCoV) infections: challenges for fighting the storm. Eur J Clin Investig. 2020;50(3):e13209. https://doi.org/10.1111/eci.13209 .
Hwang CS. Olfactory neuropathy in severe acute respiratory syndrome: report of a case. Acta Neurol Taiwanica. 2006;15(1):26–8.
Suzuki M, Saito K, Min WP, Vladau C, Toida K, Itoh H, et al. Identification of viruses in patients with postviral olfactory dysfunction. Laryngoscope. 2007;117(2):272–7. https://doi.org/10.1097/01.mlg.0000249922.37381.1e .
Rajgor DD, Lee MH, Archuleta S, Bagdasarian N, Quek SC. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis. 2020;20(7):776–7. https://doi.org/10.1016/S1473-3099(20)30244-9 .
Wolkewitz M, Puljak L. Methodological challenges of analysing COVID-19 data during the pandemic. BMC Med Res Methodol. 2020;20(1):81. https://doi.org/10.1186/s12874-020-00972-6 .
Rombey T, Lochner V, Puljak L, Könsgen N, Mathes T, Pieper D. Epidemiology and reporting characteristics of non-Cochrane updates of systematic reviews: a cross-sectional study. Res Synth Methods. 2020;11(3):471–83. https://doi.org/10.1002/jrsm.1409 .
Runjic E, Rombey T, Pieper D, Puljak L. Half of systematic reviews about pain registered in PROSPERO were not published and the majority had inaccurate status. J Clin Epidemiol. 2019;116:114–21. https://doi.org/10.1016/j.jclinepi.2019.08.010 .
Runjic E, Behmen D, Pieper D, Mathes T, Tricco AC, Moher D, et al. Following Cochrane review protocols to completion 10 years later: a retrospective cohort study and author survey. J Clin Epidemiol. 2019;111:41–8. https://doi.org/10.1016/j.jclinepi.2019.03.006 .
Tricco AC, Antony J, Zarin W, Strifler L, Ghassemi M, Ivory J, et al. A scoping review of rapid review methods. BMC Med. 2015;13(1):224. https://doi.org/10.1186/s12916-015-0465-6 .
COVID-19 Rapid Reviews: Cochrane’s response so far. Available at: https://training.cochrane.org/resource/covid-19-rapid-reviews-cochrane-response-so-far . Accessed 1 June 2021.
Cochrane. Living systematic reviews. Available at: https://community.cochrane.org/review-production/production-resources/living-systematic-reviews . Accessed 1 June 2021.
Millard T, Synnot A, Elliott J, Green S, McDonald S, Turner T. Feasibility and acceptability of living systematic reviews: results from a mixed-methods evaluation. Syst Rev. 2019;8(1):325. https://doi.org/10.1186/s13643-019-1248-5 .
Babic A, Poklepovic Pericic T, Pieper D, Puljak L. How to decide whether a systematic review is stable and not in need of updating: analysis of Cochrane reviews. Res Synth Methods. 2020;11(6):884–90. https://doi.org/10.1002/jrsm.1451 .
Lovato A, Rossettini G, de Filippis C. Sore throat in COVID-19: comment on “clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis”. J Med Virol. 2020;92(7):714–5. https://doi.org/10.1002/jmv.25815 .
Leung C. Comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1431–2. https://doi.org/10.1002/jmv.25912 .
Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. Response to Char’s comment: comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1433. https://doi.org/10.1002/jmv.25924 .
We thank Catherine Henderson DPhil from Swanscoe Communications for pro bono medical writing and editing support. We acknowledge support from the Covidence Team, specifically Anneliese Arno. We thank the whole International Network of Coronavirus Disease 2019 (InterNetCOVID-19) for their commitment and involvement. Members of the InterNetCOVID-19 are listed in Additional file 6 . We thank Pavel Cerny and Roger Crosthwaite for guiding the team supervisor (IJBN) on human resources management.
This research received no external funding.
Authors and affiliations.
University Hospital and School of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Israel Júnior Borges do Nascimento & Milena Soriano Marcolino
Medical College of Wisconsin, Milwaukee, WI, USA
Israel Júnior Borges do Nascimento
Helene Fuld Health Trust National Institute for Evidence-based Practice in Nursing and Healthcare, College of Nursing, The Ohio State University, Columbus, OH, USA
Dónal P. O’Mathúna
School of Nursing, Psychotherapy and Community Health, Dublin City University, Dublin, Ireland
Department of Anesthesiology, Intensive Care and Pain Medicine, University of Münster, Münster, Germany
Thilo Caspar von Groote
Department of Sport and Health Science, Technische Universität München, Munich, Germany
Hebatullah Mohamed Abdulazeem
School of Health Sciences, Faculty of Health and Medicine, The University of Newcastle, Callaghan, Australia
Ishanka Weerasekara
Department of Physiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, Sri Lanka
Cochrane Croatia, University of Split, School of Medicine, Split, Croatia
Ana Marusic, Irena Zakarija-Grkovic & Tina Poklepovic Pericic
Center for Evidence-Based Medicine and Health Care, Catholic University of Croatia, Ilica 242, 10000, Zagreb, Croatia
Livia Puljak
Cochrane Brazil, Evidence-Based Health Program, Universidade Federal de São Paulo, São Paulo, Brazil
Vinicius Tassoni Civile & Alvaro Nagib Atallah
Yorkville University, Fredericton, New Brunswick, Canada
Santino Filoso
Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
Nicola Luigi Bragazzi
IJBN conceived the research idea and worked as a project coordinator. DPOM, TCVG, HMA, IW, AM, LP, VTC, IZG, TPP, ANA, SF, NLB and MSM were involved in data curation, formal analysis, investigation, methodology, and initial draft writing. All authors revised the manuscript critically for the content. The author(s) read and approved the final manuscript.
Correspondence to Livia Puljak .
Ethics approval and consent to participate.
Not required, as the data were based on published studies.
Not applicable.
The authors declare no conflict of interest.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Appendix 1. Search strategies used in the study.
Additional file 2. Adjusted scoring of AMSTAR 2 used in this study for systematic reviews of studies that did not analyze interventions.
Additional file 3. List of excluded studies, with reasons.
Additional file 4. Table of overlapping studies, containing the list of primary studies included, their visual overlap in individual systematic reviews, and the number of reviews in which each primary study was included.
Additional file 5. A detailed explanation of AMSTAR scoring for each item in each review.
Additional file 6. List of members and affiliates of the International Network of Coronavirus Disease 2019 (InterNetCOVID-19).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Borges do Nascimento, I.J., O’Mathúna, D.P., von Groote, T.C. et al. Coronavirus disease (COVID-19) pandemic: an overview of systematic reviews. BMC Infect Dis 21 , 525 (2021). https://doi.org/10.1186/s12879-021-06214-4
Received : 12 April 2020
Accepted : 19 May 2021
Published : 04 June 2021
ISSN: 1471-2334
Digital Commons @ USF, College of Public Health, Epidemiology and Biostatistics: Theses and Dissertations
Theses/Dissertations from 2023
Gender Differences in Episodic Memory in Later Life: The Mediating Role of Education , Sara Robinson
Nonparametric Estimation of Transition Probabilities in Illness-Death Model based on Ranked Set Sampling , Ying Ma
Bayesian Multivariate Joint Modeling for Skewed-longitudinal and Time-to-event Data , Lan Xu
Identifying Barriers and Facilitators to Improve Hepatitis C Virus Screening , Linh M. Duong
Quantifying the Impact of Chronic Stress on Racial Disparities in Cardiovascular Disease , Nnadozie Emechebe
A Review of American College Campus Tobacco or Smoke free Policies: A Case Study of a Large Urban University , Sarah E. Powell
Evolutionary Dynamics of Influenza Type B in the Presence of Vaccination: An Ecological Study , Lindsey J. Fiedler
Respiratory Infections and Risk for Development of Narcolepsy: Analysis of the Truven Health MarketScan Database (2008 to 2010) with Additional Assessment of Incidence and Prevalence , Darren Scheer
Multimodal Treatment and Neoadjuvant Chemotherapy Trends, Utilization and Survival Effects in Intrahepatic Cholangiocarcinoma – a Propensity Score Analysis , Ovie Utuama
Flowgraph Models for Clustered Multistate Time to Event Data , Kristin Hall
Impact of Obesity and Expression of Obesity-Related Genes in the Progression of Prostate Cancer in African American Men , Mmadili Nancy Ilozumba
Angiostrongylus cantonensis: Epidemiologic Review, Location-Specific Habitat Modelling, and Surveillance in Hillsborough County, Florida, U.S.A. , Brad Christian Perich
Strategies to Adjust for Response Bias in Clinical Trials: A Simulation Study , Victoria R. Swaidan
Sleep and Alzheimer’s disease: A critical examination of the risk that Sleep Problems or Disorders particularly Obstructive Sleep Apnea pose towards developing Alzheimer’s disease , Omonigho A. Michael Bubu
Deployment, Post-Traumatic Stress Disorder and Hypertensive Disorders of Pregnancy among U.S. Active-Duty Military Women , Michelle C. Nash
Ambient Ozone and Cadmium as Risk Factors For Congenital Diaphragmatic Hernia , Rema Ramakrishnan
Ambient Benzene and PM2.5 Exposure during Pregnancy: Examining the Impact of Exposure Assessment Decisions on Associations between Birth Defects and Air Pollution , Jean Paul Tanner
Bayesian inference on quantile regression-based mixed-effects joint models for longitudinal-survival data from AIDS studies , Hanze Zhang
Sleep Duration Patterns from Adolescence to Young Adulthood and their Impact on Asthma and Inflammation , Chighaf Bakour
Efficiency of an Unbalanced Design in Collecting Time to Event Data with Interval Censoring , Peiyao Cheng
Association between Folate Levels and Preterm Birth in Tampa, Florida , Carolyn Heeraman
HIV/STIs and Intimate Partner Violence: Results from the Togo 2013-2014 Demographic and Health Surveys , Anthony H. Nguyen
Incidence, Persistence, and Recurrence of Anogenital α-Mucosal HPV Infections (HPV 6, 11, 16, 18, 31, 33, 45, 52 and 58) , Shitaldas J. Pamnani
Factors Associated with Sexually Transmitted Infections (STIs) and Multiple STI Co-infections: Results from the EVRI HIV Prevention Preparedness Trial , Ubin Pokharel
Hidden Markov Chain Analysis: Impact of Misclassification on Effect of Covariates in Disease Progression and Regression , Haritha Polisetti
Association of Known and Unknown Oncoviruses with External Genital Lesion (EGL) Manifestations in a Multinational Cohort of Men , Shams Ur Rahman
Racial and Ethnic Differences in Low-Risk Cesarean Deliveries in Florida , Yuri Combo Vanda Sebastiao
The Effects of Personal and Family History of Cancer on the Development of Dementia in Japanese Americans: The KAME Project , Adam Lee Slotnick
Rhabdomyosarcoma Incidence and Survival in Whites, Blacks, and Hispanics from 1973-2013: Analysis from the Surveillance, Epidemiology, and End Results Program , Heather Tinsley
Assessment of the impact of Attention Deficit Hyperactivity Disorder on Type 1 Diabetes , Kellee Miller
Bayesian Inference on Longitudinal Semi-continuous Substance Abuse/Dependence Symptoms Data , Dongyuan Xing
Statistical Analysis and Modeling of PM2.5 Speciation Metals and Their Mixtures , Boubakari Ibrahimou
Elective Early Term Delivery and Adverse Infant Outcomes in a Population-Based Multiethnic Cohort , Jason Lee Salemi
Uncontrolled Hypertension and Associated Factors in Hypertensive Patients at the Primary Healthcare Center Luis H. Moreno, Panama: A Feasibility Study , Roderick Ramon Chen Camano
An Analysis of the Association between Animal Exposures and the Development of Type 1 Diabetes in the TEDDY Cohort , Callyn Hall
Multiple Calibrations in Integrative Data Analysis: A Simulation Study and Application to Multidimensional Family Therapy , Kristin Wynn Hall
Mother-to-Child Transmission of HIV and Congenital Syphilis: A Snapshot of an Epidemic in the Republic of Panama , Lorna Elizabeth Jenkins
A Latent Mixture Approach to Modeling Zero-Inflated Bivariate Ordinal Data , Rajendra Kadel
Associations of Perceived Stress, Sleep, and Human Papillomavirus in a Prospective Cohort of Men , Stephanie Kay Kolar
Influence of Maternal Thyroid Dysfunction on Infant Growth and Development , Ronee Elisha Wilson
Bayesian Inference on Mixed-effects Models with Skewed Distributions for HIV longitudinal Data , Ren Chen
Linear Mixed-Effects Models: Applications to the Behavioral Sciences and Adolescent Community Health , Lizmarie Gabriela Maldonado
Statistical Estimation of Physiologically-based Pharmacokinetic Models: Identifiability, Variation, and Uncertainty with an Illustration of Chronic Exposure to Dioxin and Dioxin-like-compounds , Zachary John Thompson
Evaluation of Repeated Biomarkers: Non-parametric Comparison of Areas under the Receiver Operating Curve Between Correlated Groups Using an Optimal Weighting Scheme , Ping Xu
The Natural History of Human Papillomavirus Related Condyloma In a Multinational Cohort of Men , Gabriella Anic
Characterization of the Serologic Responses to Plasmodium vivax DBPII Variants Among Inhabitants of Pursat Province, Cambodia , Samantha Jones Barnes
Disparities in Survival and Mortality among Infants with Congenital Aortic, Pulmonary, and Tricuspid Valve Defects by Maternal Race/Ethnicity and Infant Sex , Colleen Conklin
Case-Control Study of Sunlight Exposure and Cutaneous Human Papillomavirus Seroreactivity in Basal Cell and Squamous Cell Carcinomas of the Skin , Michelle R. Iannacone
Assessing the Relationship of Monocytes with Primary and Secondary Dengue Infection among Hospitalized Dengue Patients in Malaysia, 2010: A Cross-Sectional Study , Benjamin Glenn Klekamp
Gender Differences in Lung Cancer Treatment and Survival , Margaret Anne Kowski
An examination of diet, acculturation and risk factors for heart disease among Jamaican immigrants , Carol Renee Oladele
Indicators of Early Adult and Current Personality in Parkinson's Disease , Kelly Sullivan
Does Patient Dementia Limit the Use of Cardiac Catheterization in ST-Elevated Myocardial Infarction? , Marianne Chanti-Ketterl
Extending the Principal Stratification Method To Multi-Level Randomized Trials , Jing Guo
Serum Antibodies to Human Papillomavirus Type 6, 11, 16 and 18 and Their Role in the Natural History of HPV Infection in Men , Beibei Lu
Evaluation of Common Inherited Variants in Mitochondrial-Related and MicroRNA-Related Genes as Novel Risk Factors for Ovarian Cancer , Jennifer Permuth Wey
DNA Methylation and its Association with Prenatal Exposures and Pregnancy Outcomes , Jennifer Straughen
Cardiovascular risk factors for mild cognitive impairment , Michael Malek-Ahmadi
Additive Latent Variable (ALV) Modeling: Assessing Variation in Intervention Impact in Randomized Field Trials , Peter Ayo Toyinbo
A Comparison of Community-Based Centers versus University-Based Centers in Clinical Trial Performance , Cynthia R. Stockddale
Stephen Tyrer
1 Newcastle University, UK
2 University of Huddersfield, UK
Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research.
It is wise to be chary of surveys and polls, and always to read the figures carefully. In the heady excitement just before the vote on Scottish independence, many observers thought that allowing teenagers to vote would propel Scotland to self-government. After the referendum, the Straits Times in Singapore stated ‘Young people voted in droves to break up the centuries-old union’, based on an exit poll showing that 71% had voted ‘yes’ to independence. 1 That poll included only 14 people in this age bracket, 4 of whom had voted ‘no’. A later, more representative YouGov poll with a much larger sample reported that 51% of 16- to 24-year-olds had voted ‘no’. 2 However the first sample had been selected, its small size would have made it highly susceptible to sampling error. When a particular outcome is strongly expected, there is a strong inclination to believe any evidence that appears to confirm it.
Sampling for health-related research does not usually need to be as precise as sampling for political surveys, but in epidemiological investigations every effort should be made to select a representative sample. Often this is not achieved. Concern has been expressed for years about the number of prisoners who have mental health problems. In a 1979 study in the USA to estimate the prevalence of mental illness in prisoners, 33 male prisoners were selected and interviewed by a psychiatrist using an instrument called the Psychiatric Status Schedule. 3 Of those interviewed, 3% were diagnosed as having a mental disorder and 27% had a drug or alcohol problem. 4 The main problems with this paper are the number of people sampled and how they were selected. Although it is stated that the prisoners were selected at random, the number of prisoners selected for interview is on the low side, and the procedure for randomisation is not indicated. Female prisoners were not included. The prevalence of mental illness determined from a survey in one prison in one state in the USA cannot be extrapolated to the whole country, where there are more than six grades of prison according to the degree of security required. There is no indication in the paper of how the sampling procedure controlled for the proportion of inmates that were detained and those that were sentenced. Apart from sampling errors, justifiable criticism can also be made of the reliability of having only one psychiatrist review all the prisoners, the categorical method of diagnosis (mental disorder or drug or alcohol misuse) and the use of the Psychiatric Status Schedule, which is reported to have low consistency in many of its scales. Under these circumstances it is unsurprising that the estimate of the prevalence of mental disorder in this survey did not accord with a recent systematic review of studies over a 40-year period, which found that 14% of prisoners had a diagnosed psychiatric disorder. 5
When carrying out any survey of any type it is essential for the researcher to clearly define the target population that they wish to sample. On some occasions the population will be sufficiently small, and the researcher is able to include the entire population in the study. This is termed a census study. Much more frequently the population is too large for all its members to be contacted and so a sample is chosen to reflect the characteristics of the population from which it is drawn.
Sampling methods are described as either probability or non-probability methods ( Box 1 ). 6 In probability samples, each member of the population has a known, non-zero chance of being selected. Types of probability sampling include random, stratified and systematic sampling. Probability sampling is a more accurate method of determining the true characteristics of the population, but it is not perfect. Sampling error refers to the variation from the true population parameter that can result from random sampling. With true probability samples, sampling error is reduced by having larger samples. In non-probability sampling, the degree to which the sample differs from the population is unknown.
Box 1 Sampling methods
Census study: whole population under enquiry
Probability sampling: random; stratified; systematic
Non-probability sampling: convenience; judgement; quota; snowball
Qualitative research: purposive
To estimate how large the sample should be to reflect the total population, three quantities need to be determined: the desired confidence level for the estimate, a measure of the variance of the responses in the sample (the standard deviation), and the margin of allowable error. The calculation is not difficult and help can be readily accessed ( www.qualtrics.com/blog/determining-sample-size ).
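For estimating a proportion, the calculation above reduces to a one-line formula; the following is a minimal Python sketch (the function name and inputs are illustrative, not taken from any particular package):

```python
import math

def sample_size_for_proportion(confidence_z: float, expected_p: float,
                               margin_of_error: float) -> int:
    """Minimum sample size to estimate a population proportion.

    confidence_z: z-score for the desired confidence level (1.96 for 95%).
    expected_p: anticipated proportion (0.5 is the most conservative choice).
    margin_of_error: allowable error, e.g. 0.05 for +/-5 percentage points.
    """
    n = (confidence_z ** 2) * expected_p * (1 - expected_p) / margin_of_error ** 2
    return math.ceil(n)  # round up: a fractional respondent is not possible

# 95% confidence, unknown prevalence, +/-5% margin gives the familiar n = 385
print(sample_size_for_proportion(1.96, 0.5, 0.05))
```

Note that this ignores finite-population correction and anticipated non-response, both of which would modify the figure in practice.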
Random sampling.
In random sampling every member of the population has the same chance (probability) of being selected into the sample. Using a random sample it is possible to describe quantitatively the relationship between the sample and the underlying population, giving the range of values, called confidence intervals, in which the true population parameter is likely to lie. Random does not mean arbitrary. Choosing a random sample relies on an objective mechanism to select elements from the population. This is usually done by a computer, but rolling dice or using random numbers are also acceptable options.
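As a hypothetical sketch of both ideas, the snippet below draws a simple random sample from an invented sampling frame and attaches a normal-approximation confidence interval to a sample proportion (the frame size, seed and counts are all illustrative):

```python
import math
import random

rng = random.Random(42)                 # objective mechanism, seeded for reproducibility
population = list(range(1, 10001))      # hypothetical sampling frame of 10,000 member IDs
sample = rng.sample(population, k=400)  # simple random sample, drawn without replacement

# Suppose 120 of the 400 sampled individuals report the condition of interest.
p_hat = 120 / 400
se = math.sqrt(p_hat * (1 - p_hat) / len(sample))  # standard error of the proportion
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)        # 95% confidence interval
print(round(ci[0], 3), round(ci[1], 3))
```

The interval quantifies the sampling error discussed above: it is valid precisely because every member of the frame had the same chance of selection.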
Stratified sampling is often used when one or more of the strata (subsets of the population) have a low incidence relative to the other strata. It can also be used to reduce sampling error.
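A minimal sketch of stratified sampling with proportional allocation, using an invented two-stratum frame (the strata, sizes and seed are hypothetical):

```python
import random

def stratified_sample(frame, total_n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    strata = {}
    for record, stratum in frame:
        strata.setdefault(stratum, []).append((record, stratum))
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(frame))  # this stratum's share
        sample.extend(rng.sample(members, k))           # random within the stratum
    return sample

# Hypothetical frame: 750 urban and 250 rural residents.
frame = [(i, "rural" if i % 4 == 0 else "urban") for i in range(1, 1001)]
sample = stratified_sample(frame, total_n=100, rng=random.Random(0))
```

With disproportionate allocation (oversampling a rare stratum), the per-stratum `k` would be set directly instead, and estimates re-weighted afterwards.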
In systematic sampling every 5th, 10th, 20th or n-th record is selected from a list of population members. Provided the list is not ordered in a way related to the characteristics under study, this is effectively equivalent to random sampling.
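The procedure can be sketched in a few lines (the record list and step are hypothetical; the random start keeps every record's chance of selection equal):

```python
import random

def systematic_sample(records, step, rng):
    """Select every `step`-th record, starting at a random offset within the first interval."""
    start = rng.randrange(step)
    return records[start::step]

records = list(range(1000))  # hypothetical ordered list of population members
sample = systematic_sample(records, step=10, rng=random.Random(1))
```

If the list has a hidden periodicity that coincides with the step (e.g. wards listed in a repeating pattern), the sample can be badly biased, which is why the ordering caveat above matters.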
In non-probability sampling members are selected from the population in any form of non-random manner. Examples include convenience sampling, judgement sampling, quota sampling and snowball sampling.
Convenience, accidental or opportunistic sampling is used to obtain a cheap approximation of the truth. An easily accessible, non-random selection of the population under enquiry is chosen. A frequently used method is contacting people by email.
An extension of convenience sampling is judgement sampling. Thus, when carrying out a national enquiry on the frequency of depressive illness, one specific town and one rural area that are thought to be typical of the country as a whole may be selected. Ideally, the chosen sample needs to be representative of the entire population and this is difficult to determine.
Quota sampling is the non-probability equivalent of stratified sampling. In the first instance the investigator identifies the strata and their frequency in the population. Convenience sampling is then used to select the required number of participants from each stratum.
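A sketch of how quota filling differs from stratified sampling: the quotas are filled by whoever arrives first in a convenience stream, which is exactly where the bias creeps in (all names, strata and quotas below are invented):

```python
def quota_sample(stream, quotas):
    """Fill per-stratum quotas from a convenience stream of (person, stratum) pairs.

    Unlike stratified sampling, arrival order is not random: the earliest
    arrivals in each stratum fill its quota.
    """
    counts = {stratum: 0 for stratum in quotas}
    chosen = []
    for person, stratum in stream:
        if stratum in counts and counts[stratum] < quotas[stratum]:
            chosen.append(person)
            counts[stratum] += 1
        if counts == quotas:  # every quota filled; stop recruiting
            break
    return chosen

# Hypothetical stream of early repliers to an email invitation.
stream = [(f"p{i}", "consultant" if i % 3 == 0 else "trainee") for i in range(60)]
sample = quota_sample(stream, {"consultant": 5, "trainee": 10})
```

The resulting sample matches the strata frequencies, but within each stratum it remains a convenience sample.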
Snowball sampling is a special non-probability method used when there are difficulties in identifying members of the population or if the desired sample characteristic is rare. This technique relies on existing study participants recruiting future participants from among their acquaintances. It is often used when it is anticipated that individuals may be reluctant to be identified, for instance when surveying illegal drug users. Although inexpensive, major bias may result because a balanced cross-section of the population is not identified.
Which sampling method to use depends on the nature of the survey proposed. Epidemiological research requires a representative sample, but a great deal of health research does not need one. Service evaluations and randomised controlled trials (RCTs) do not require a survey design. In an RCT the main purpose is to compare groups within the sample, whose members are allocated to them randomly, such as treatment v. placebo. Similarly, health psychometrics (e.g. the design of health measures), experimental studies, theory-based research (e.g. testing or proposing a theory) and observational studies (e.g. looking for relationships between theoretical constructs, such as depression and self-esteem) are mostly conducted using opportunistic samples. Precisely accurate statistics may not be required.
Qualitative researchers are often concerned with what exists rather than how much, 7 and seek to delve into complex processes such as responding to long-term illnesses. Purposive sampling, one of the most common sampling strategies, groups participants according to pre-selected criteria relevant to a particular research question. There are more: Kuzel 8 identified 13 different forms of qualitative sampling strategy, including maximum variation, theory-driven, critical case and deviant case. One case is sufficient at times to illustrate a point. For example, Heyman et al 9 explored the experiences of a female patient who had ‘risked exploding’, according to a colorectal nurse, by absconding from hospital to have sexual intercourse with her boyfriend immediately after anal cancer surgery. The aim of the study was to understand why one particular individual had behaved in such a medically risky and highly unusual way. A recent introduction to qualitative research methodology is provided by Silverman. 10
When performing a survey there is a strong temptation to obtain information from as much of the population as possible in the belief that accuracy can be increased in this way. An example is given to show that this may be fallacious.
Many of us are interested in psychiatrists' views about service issues. A researcher wishes to find out the opinions of psychiatrists about policy regarding controlled drugs. A questionnaire is designed with a number of statements ranging from tighter control over existing drugs to decriminalisation of all unscheduled agents. Respondents have to select which statement best accords with their views. The researcher is also interested in the responses of grades of psychiatrist to see whether there are different attitudes about the issue between consultants and trainee psychiatrists. The Royal College of Psychiatrists holds the names of all psychiatrists in the UK, and the researcher is given access to this list. It is proposed that as many psychiatrists as possible are required, and so all the psychiatrists are contacted by email and asked their views. When all the questionnaires are returned online the response rate is 38% with 5128 psychiatrists completing the questionnaire. The analysis of the replies of this large number of people takes a good deal of time but this is completed after a few months and the paper is written. It is submitted to a prestigious psychiatric journal and is rejected. What were the reasons?
A proportion of the individuals would not have been contactable by email, and this group may have different attitudes from the rest. The nature of the responses of those individuals who failed to reply to the questionnaire, the majority, is unknown. They might have differed from respondents if, for instance, busier or more stressed psychiatrists were less likely to participate. As a result, the sample identified by the researcher may not have been representative and the findings cannot be safely generalised to all those working in this field. This is a non-probability sample and, as such, statistical inferences cannot be validly made from the results. Notwithstanding, the results of this survey are not valueless. Although they cannot be reliably generalised to the total population of psychiatrists, they could still be useful for piloting purposes. Certain questions on the survey could be refined and/or alternative questions included in a later enquiry.
In the example referred to above the sample size should be determined (see earlier) and the names of those selected for interview entered into a sampling frame. Attempts should be made to contact all those included to ensure that the results are representative. Multiple efforts must be made to persuade those selected to complete the survey questionnaire. If most of the initially identified sample do provide information, the results can be analysed statistically and valid conclusions can be drawn.
The researcher will need to decide whether to aim for a simple probability sample or to stratify the sample by predetermining the numbers to be selected randomly into relevant categories, for example, in this case, occupational grade (consultant, specialist registrar, etc.), gender. Stratification ensures that the sample is representative of the population with respect to the chosen population parameters if known; or, more commonly, to ensure that categories with smaller numbers in the population (e.g. associate specialists) are adequately represented for comparative purposes. An introduction to stratified and other forms of complex probability samples is provided by Bryman. 11
Selection bias can arise if an insufficient number of the individuals identified in the sampling frame complete the questionnaire. The greater the number of non-respondents, the more scope there is for the sample to be skewed in an unknown direction. As a rule of thumb, the researcher should aim for at least 60% completion by those selected from the sampling frame, and every effort should be made to achieve more than this. If the percentage of those completing the questionnaire is less than 100%, as it almost invariably will be, there are a number of strategies the investigator can adopt to manage non-response bias.
In the first instance, the non-respondents should be approached and asked again to complete the questionnaire. For those who again fail to respond, a third attempt should be made to urge them to reply. Comparisons can then be made between first-, second- and third-time responders. If the responses are similar, extra sampling may not be needed. If the responses of the late respondents are very different from the rest of the study, it may be necessary to contact more of the non-respondents. This depends on the proportion of respondents completing the survey: the larger the number, the better.
It may not be necessary to obtain more data as it has been shown that the observations of late responders are more like non-responders than are first-time responders, 12 so the responses of the late responders can be applied to those who failed to respond to the enquiry. This cannot be assumed, however, and late respondents in some surveys behave like earlier participants. 13
It has also been shown that if a small random sample of non-respondents is selected and all can be contacted and complete the survey, the results can be extrapolated to the remainder of the non-respondents. The relatively small number of 20 is considered sufficient for this purpose if all complete the questionnaire. 14 In practice, it is very difficult to ensure such a 100% response in a survey of this nature and this aim may not be achievable.
We hope this article will persuade the reader to examine the methods that have been used to perform surveys of opinions and other issues. Let us quote a final example. A Mail On Sunday poll in August 2011 showed that the majority of those surveyed backed the reintroduction of capital punishment. 15 One thousand people took part in this survey, which was said to be representative of British public opinion. The consumer panel from which these people were selected was contacted online, so those without email access were not included. Furthermore, members of this panel are paid for registering their interest and for each poll in which they give their opinion. They are possibly representative of the Daily Mail readership but not of the general population, whose views may or may not correspond to those of the sample.
Those intending to perform surveys can find more information in this document: www.sagepub.com/upm-data/40803_5.pdf . Those wishing to carry out surveys on psychiatric topics, particularly if involving the membership of the Royal College of Psychiatrists, should contact the College Registrar.
We thank Dr Jonathan Tyrer, Genetic Epidemiology Group, Department of Oncology, Cambridge University, for helpful advice on the manuscript.
Declaration of interest None.
Background Prevalence measures the occurrence of any health condition, exposure or other factors related to health. The experience of COVID-19, a new disease caused by SARS-CoV-2, has highlighted the importance of prevalence studies, for which issues of reporting and methodology have traditionally been neglected.
Objective This communication highlights key issues about risks of bias in the design and conduct of prevalence studies and in reporting them, using examples about SARS-CoV-2 and COVID-19.
Summary The two main domains of bias in prevalence studies are those related to the study population (selection bias) and the condition or risk factor being assessed (information bias). Sources of selection bias should be considered both at the time of the invitation to take part in a study and when assessing who participates and provides valid data (respondents and non-respondents). Information bias appears when there are systematic errors affecting the accuracy and reproducibility of the measurement of the condition or risk factor. Types of information bias include misclassification, observer and recall bias. When reporting prevalence studies, clear descriptions of the target population, study population, study setting and context, and clear definitions of the condition or risk factor and its measurement are essential. Without clear reporting, the risks of bias cannot be assessed properly. Bias in the findings of prevalence studies can affect decision-making and the spread of disease. The concepts discussed here can be applied to the assessment of prevalence for many other conditions.
Conclusions Efforts to strengthen methodological research and improve assessment of the risk of bias and the quality of reporting of studies of prevalence in all fields of research should continue beyond this pandemic.
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/ .
https://doi.org/10.1136/bmjopen-2022-061497
In introductory epidemiology, students learn about prevalence, an easy-to-understand concept, defined as ‘a proportion that measures disease occurrence of any type of health condition, exposure, or other factor related to health’, 1 or ‘the proportion of persons in a population who have a particular disease or attribute at a specified point in time or over a specified period.’ 2 Prevalence is an important measure for assessing the magnitude of health-related conditions, and studies of prevalence are an important source of information for estimating the burden of disease, injuries and risk factors. 3 Accurate information about prevalence enables health authorities to assess the health needs of a population, to develop prevention programmes and to prioritise resources to improve public health. 4 Perhaps owing to the apparent simplicity of the concept of prevalence, methodological developments to assess the quality of reporting, the potential for bias and the synthesis of prevalence estimates in meta-analysis have been neglected, 5 compared with the attention paid to methods for evidence from randomised controlled trials and comparative observational studies. 6 7
The COVID-19 pandemic has shown the need for epidemiological studies that describe and understand a new disease quickly but accurately. 8 Studies of prevalence have been an important source of evidence on active SARS-CoV-2 infection, on antibodies to SARS-CoV-2 and on the spectrum of SARS-CoV-2-related morbidity, and have helped in understanding factors related to infection and disease, informing national decisions about containment measures. 9–11 Accurate estimates of the prevalence of SARS-CoV-2 are crucial because they are used as an input for the estimation of other quantities, such as infection fatality ratios, which can be calculated indirectly using seroprevalence estimates. 12 Assessments of published studies have, however, highlighted methodological issues that affect study design, conduct, analysis, interpretation and reporting. 13–15 In addition, some questions about prevalence need to be addressed through systematic reviews and meta-epidemiological studies. A high proportion of published systematic reviews of prevalence, however, also have flaws in reporting and methodological quality. 5 16 Confidence in the results of systematic reviews is determined by the credibility of the primary studies and the methods used to synthesise them.
The objective of this communication is to highlight key issues about the risk of bias in studies that measure prevalence and about the quality of reporting, using examples about SARS-CoV-2 and COVID-19. We refer to prevalence at the level of a population, and not as a prediction at an individual level. The estimand is, therefore, ‘what proportion of the population is positive’ and not ‘what is the probability this person is positive.’ Although incidence and prevalence are related epidemiologically, we do not discuss incidence in this article because the study designs for measurement of the quantities differ. Bias is a systematic deviation of results or inferences from the underlying (unobserved) true values. 1 The risk of bias is a judgement about the degree to which the methods or findings of a study might underestimate or overestimate the true value in the target population, 7 in this case, the prevalence of a condition or risk factor. Quality of reporting refers to the completeness and transparency of the presentation of a research publication. 17 Risk of bias and quality of reporting are separate, but closely related, because it is only possible to assess the strengths and weaknesses of a study report if the methods and results are described adequately.
The two main domains of bias in prevalence studies are those related to the study population (selection bias) and the condition being assessed (information bias) ( figure 1 ). Biases involved in the design, conduct and analysis of a study affect its internal validity. Selection bias also affects external validity, the extent to which findings from a specific study can be generalised to a wider, target population in time and space. There are many names given to different biases, often addressing the same concept. For this communication, we use the names and definitions published in the Dictionary of Epidemiology. 1
Potential for selection bias and information bias in prevalence studies. Coloured lines relate to the coloured boxes, showing at which stage of study procedures selection bias (blue line) and information bias (purple line) can occur.
Selection bias relates to the representativeness of the sample used to estimate the prevalence in relation to the target population. The target population is the group of individuals to whom the findings, conclusions or inferences from a study can be generalised. 1 There are two steps in a prevalence study at which selection bias might occur: at the invitation to take part in the study and, among those invited, who takes part ( figure 1 ).
The probability of being invited to take part in a study should be the same for every person in the target population. Evaluation of selection bias at this stage should, therefore, account for the complexity of the strategy for identification of participants. For example, if participants are invited from people who have previously agreed to participate in a registry or cohort, each level of invitation that has contributed to the final setting should be judged for the increasing risk of self-selection. Those who are invited to take part might be defined by demographic characteristics, for example, children below 10 years or study setting (eg, hospitalised patients), or a random sample of the general population. The least biased method to select participants in a prevalence study is to sample at random from the target population. For example, the Real-time Assessment of Community Transmission (REACT) Studies to assess the prevalence of the virus, using molecular diagnostic tests (REACT-1) and antibodies (REACT-2), invite random samples of people, stratified by area, from the National Health Service patient list in England. 9 Those invited are close to a truly random sample because almost everyone in England is registered with a general practitioner. In some cases, criteria applied to the selection of a random sample might still result in considerable bias. For example, a seroprevalence study conducted in Spain did not include care home addresses, which could have excluded around 6% of the Spanish older population. 18 Excluding people in care home facilities might underestimate SARS-CoV-2 seroprevalence in older adults, if their risk of exposure was higher than the average in the general population. 13 Other methods of sampling are at risk of selection bias. For example, asking for volunteers through advertisements is liable to selection bias because not everyone has the same probability of seeing or replying to the advert.
For example, the use of social media to invite people to a drive-in test centre to estimate the population prevalence of antibodies to SARS-CoV-2, 19 or online adverts to assess mental health symptoms during the pandemic, excludes those without an internet connection or who do not use social media, such as older people. 20
Non-response bias occurs when people who have been invited but do not take part in a study differ systematically from those who take part, in ways that are associated with the condition of interest. 21 In the REACT-1 study, 22 for example, across four survey rounds, the investigators invited 2.4 million people; 596 000 returned swabs had a valid result (25%). The proportion of participants responding was lower in later rounds than in earlier ones, in men than in women, and in younger than in older age groups. If the sociodemographic characteristics of the target population are known, the observed results can be weighted statistically to represent the overall population, but might still be biased by unmeasurable characteristics that drive willingness to take part.
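Such statistical weighting can be sketched as simple post-stratification: each stratum's crude prevalence is weighted by that stratum's share of the target population rather than its share of respondents. The strata and counts below are hypothetical illustrations, not REACT-1 data.

```python
def weighted_prevalence(strata):
    """Post-stratified prevalence estimate.

    strata: list of (population_count, respondents, positives) tuples.
    Each stratum's crude prevalence is weighted by its share of the
    target population, not by its share of respondents.
    """
    total_pop = sum(pop for pop, _, _ in strata)
    return sum((pop / total_pop) * (pos / resp) for pop, resp, pos in strata)

# Hypothetical strata: younger people respond less often but are
# more likely to test positive.
strata = [
    (600_000, 60_000, 3_000),   # younger: 5% positive among respondents
    (400_000, 200_000, 4_000),  # older: 2% positive among respondents
]

crude = (3_000 + 4_000) / (60_000 + 200_000)  # ignores differential response
weighted = weighted_prevalence(strata)         # 0.6*0.05 + 0.4*0.02 = 0.038
```

Here the crude estimate (about 2.7%) understates the weighted estimate (3.8%) because the under-represented younger stratum has the higher prevalence; as noted above, weighting of this kind corrects only for measured characteristics.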
The direction of non-response bias is often not predictable (it can result in over- or underestimation of the true prevalence) because information about the motivation to take part in a study, or not, is not usually collected. 13 In a multicentre cross-sectional survey of the prevalence of PCR-determined SARS-CoV-2 in hospitals in England, the authors suggested that different selection biases could have had opposing effects. 23 For example, staff might have volunteered to take part if they were concerned that they might have been exposed to COVID-19. If such people were more likely than unexposed people to be tested, prevalence might be overestimated. Alternatively, workers in lower-paid jobs without financial support might have been less likely to take part than those at higher grades, because of the consequences for themselves or their contacts if found to be infected. If the less well paid jobs are also associated with a higher risk of exposure to SARS-CoV-2, the prevalence in the study population would be underestimated. Accorsi et al suggest that the risk of non-response bias in seroprevalence studies might be reduced by sampling from established and well-characterised cohorts with high levels of participation, in whom the characteristics of non-respondents are known. 13
As the proportion of invited people who do not take part in a study (non-respondents) increases, the probability of non-response bias might also increase if the topic of the study influences participation and hence the composition of the study population. 24 Empirical evidence of bias was found in a systematic review of sexually transmitted Chlamydia trachomatis infection; prevalence surveys with the lowest proportion of respondents found the highest prevalence of infection, suggesting selective participation by those with a high risk of being infected. 25 Whether or not there is a dose–response relationship between the proportion of non-respondents and the likelihood of SARS-CoV-2 infection is unclear. The risks of selection bias at the stages of invitation and participation can be interrelated and might oppose each other. In the REACT-1 study, 22 it is not clear whether the reduction in selection bias through random sampling outweighed the potential for selection bias owing to the high and increasing proportion of non-respondents over time, or vice versa.
Information bias occurs when there are systematic errors affecting the completeness or accuracy of the measurement of the condition or risk factor of interest. There are different types of information bias.
Misclassification bias refers to the incorrect classification of a participant as having, or not having, the condition of interest. Misclassification is an important source of measurement bias in prevalence studies because diagnostic tests are imperfect and might not distinguish clearly between those with and without the condition. 26 For diagnostic tests, the predictive values will also be influenced by the prevalence of the condition in the study population. Seroprevalence studies are essential for determining the proportion of a population that has been exposed to SARS-CoV-2 up to a given time point. Detection of antibodies is affected by the test type and manufacturer, the sample type, such as serum, dried blood spots, saliva, urine or others, 27 28 and the time of sampling after infection. Different diagnostic tests might also be used in participants in the same study population, but adjustment for test performance is not always appropriate because the characteristics derived from the studies in which the tests were validated might differ from the study population. 13 Accorsi et al have described this issue and other biases in the ascertainment of SARS-CoV-2 seroprevalence in detail. 13 Test accuracy can also change across populations, owing to the inherent characteristics of tests when clinical variability is present, 29 for example, when tests for SARS-CoV-2 detection are applied to people with or without symptoms.
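When estimates of a test's sensitivity and specificity are available from validation studies, an apparent (test-based) prevalence can be corrected for misclassification with the Rogan–Gladen estimator. The sketch below uses hypothetical numbers; as noted above, the adjustment is only as reliable as the validation estimates, which might not transfer to the study population.

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent prevalence for an
    imperfect test; the result is clipped to the [0, 1] range."""
    corrected = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(corrected, 0.0), 1.0)

# Hypothetical: 10% of samples test positive on an assay with
# 85% sensitivity and 99% specificity.
adjusted = rogan_gladen(0.10, 0.85, 0.99)  # (0.10 + 0.99 - 1) / 0.84, ~0.107
```

Note that when the apparent prevalence falls below the test's false-positive rate, the corrected value is clipped to zero rather than reported as negative.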
In a new disease, such as COVID-19, diagnostic criteria might not be standardised or might change over time. For example, accurate assessment of the prevalence of persistent asymptomatic SARS-CoV-2 infection requires a complete list of symptoms and follow-up for a sufficiently long duration to ensure that symptoms did not develop later. 15 30 In a prevalence study conducted in a care home in March 2020, patients were asked about typical and non-typical symptoms of COVID-19. However, symptoms such as anosmia or ageusia had not been reported in association with SARS-CoV-2 at that time, so patients with these as isolated symptoms could have been wrongly classified as asymptomatic. 15 31 Poor quality of data collection has also been found in studies estimating the prevalence of mental health problems during the pandemic. 32 The use of non-validated scales, or dichotomisation to define the cases using inappropriate or unclear thresholds, will bias the estimated prevalence of the condition. Misclassification may also occur in calculations of the prevalence of SARS-CoV-2 in contacts of diagnosed cases if not all contacts are tested, and it is assumed that individuals that were not tested were also uninfected. 13
Recall bias results in misclassification when the condition has been measured through surveys or questionnaires that rely on memory. A study that aimed to describe the characteristics and symptom profile of individuals with SARS-CoV-2 infection in the USA collected information about symptoms before, and for 14 days after, enrolment in the study. 33 The authors discuss the potential for recall bias when symptoms are collected retrospectively and when different people recollect different symptoms.
Observer bias occurs when an observer records a wrong measurement owing to lack of training or to subjectivity. 21 For example, a study in the USA found variation between 14 universities in the prevalence of clinical and subclinical myocarditis in competitive athletes with SARS-CoV-2 infection. 34 One of the diagnostic tools was cardiac magnetic resonance imaging, and the authors attributed some of the variability to differences in the protocols and in the expertise of assessors. To reduce the risk of observer bias, researchers should aim to use tools that minimise subjectivity and should standardise training procedures.
There is no agreed list of preferred items for reporting studies of prevalence. The published article or preprint is usually the only available record of a study to which most people, other than the investigators themselves, have access. The written report, therefore, needs to contain the information required to understand the possible biases and assess internal and external validity. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement is a widely used guideline, which includes recommendations for cross-sectional studies that examine associations between an exposure and outcome. 35 Table 1 shows selected items from the STROBE statement and recommendations for cross-sectional studies that are particularly relevant to the complete and transparent description of methods for studies of prevalence.
Items from the STROBE checklist for cross-sectional studies that are relevant for prevalence studies
First, clear definitions of the target population, study setting and eligibility criteria to select the study population are required (STROBE items 5, 6a). These issues affect assessment of external validity 36 because estimates of prevalence in a specific population and setting are often generalised more widely. 1 14 Second, the denominator used to calculate the prevalence should be clearly stated, with a description of each stage of the study showing the numbers of individuals eligible, included and analysed (STROBE item 13a, b). Accurate reports of the numbers and characteristics of those who take part (responders) or do not take part (non-responders) in the study are needed for the assessment of selection bias, but this information is not always available. 24 37 Poor reporting about the proportion of responders has been described as one of the main limitations of studies in systematic reviews of prevalence. 38 As with reports of studies of any design, the statistical methods applied to provide prevalence estimates, including methods used to address missing data (STROBE item 12c) and to account for the sampling strategy (STROBE item 12d) need to be reported clearly. 35 The setting, location and periods of enrolment and data collection (STROBE item 5) are particularly important for studies of SARS-CoV-2; the stage of the pandemic, preventive measures in place and virus variants in circulation should all be described because these affect the interpretation of estimates of prevalence. Third, it is crucial to provide a clear definition of the condition or risk factor of interest (STROBE item 7) and how it was measured (STROBE item 8), so that the risk of information bias can be assessed. The definition may be straightforward if there are objective criteria for ascertainment. For example, studies of the prevalence of active SARS-CoV-2 infection should report the diagnostic test, manufacturer, sample type and criteria for a positive result. 39 40
For new conditions that have not been fully characterised, such as post-COVID-19 condition, also known as ‘long COVID-19’, reporting of prevalence can be challenging. 41 42 The WHO produced a case definition 43 in October 2021, but this might take time to be adopted widely.
The COVID-19 pandemic has produced an enormous amount of research about a single disease, published over a short time period. 44 45 Authors who have assessed the body of research on COVID-19 have highlighted concerns about the risks of bias in different study designs, including studies of prevalence. 13 44 In systematic reviews of a single topic, the occurrence of asymptomatic SARS-CoV-2 infection, we observed high between-study heterogeneity, serious risks of bias and poor reporting in the measurement of prevalence. 30 Biased results from prevalence studies can have a direct impact at the levels of the individual, community, global health and policy-making. This communication describes concepts about risks of bias and provides examples that authors can apply to the assessment of prevalence for many other conditions. Future research should be conducted to investigate sources of bias in studies of prevalence and empirical evidence of their influence on estimates of prevalence. The development of a tool that can be adapted to assess the risk of bias in studies of prevalence, and an extension to the STROBE reporting guideline, specifically for studies of prevalence, would help to improve the quality of published studies of prevalence in all fields of research beyond this pandemic.
We would like to thank Yuly Barón, who created figure 1.
Contributors DB-G, GS and NL conceptualised the project. DB-G and NL wrote the manuscript. GS and NL provided feedback. GS and NL supervised the research. All authors edited the manuscript and approved the final manuscript.
Funding This work received support from the Swiss government excellence scholarship (grant number 2019.0774), the SSPH+ Global PhD Fellowship Programme in Public Health Sciences of the Swiss School of Public Health, and the Swiss National Science Foundation (project number 176233, and National Research Programme 78 COVID-19, project number 198418).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
BMC Medical Research Methodology, volume 22, Article number: 209 (2022)
Although books and articles guiding the methods of sample size calculation for prevalence studies are available, we aim to guide and assist researchers in calculating and reporting sample size using the calculators presented here.
We present and discuss the four parameters (namely level of confidence, precision, variability of the data, and anticipated loss) required for sample size calculation for prevalence studies. We mainly discuss choosing correct parameters with proper understanding, and reporting issues. We demonstrate the use of purpose-designed calculators that assist users in making properly informed decisions and preparing an appropriate report.
The two calculators run on free software (a spreadsheet and RStudio), benefiting researchers with limited resources. They will, hopefully, minimise errors in parameter selection, calculation and reporting. The calculators are available at: https://sites.google.com/view/sr-ln/ssc .
In quantitative research, when we take a sample from a study population (or eligible population) in order to save resources, there are two important statistical processes: using a probability sampling method (commonly known as “random sampling”) [ 1 ] and calculating an appropriate sample size [ 2 ]. Both are equally important to ensure a sample that is representative of the study population.
Just as a specific research objective needs a specific statistical analysis, it also needs a specific sample size calculation method. Even if two research objectives require a similar statistical analysis, the sample sizes might differ depending on the parameters used for the calculation. In this paper, we focus on objectives that estimate a prevalence or proportion, for example, the prevalence of obesity, smoking, heart disease, diabetes mellitus or any other disease in a study population. The method in this paper is not suitable for other types of objective, such as estimating a mean, comparing means, comparing proportions or regression analyses.
Books [ 3 , 4 ] and published articles [ 5 , 6 ] guiding the methods of sample size calculation for prevalence studies are available. Nevertheless, we observed that several parts of the sample size calculation process can be guided by software or a calculator, which can prevent incorrect calculation, incorrect use of the formula, incorrect parameters and incomplete reporting of the sample size.
Sample size software and calculators are extremely helpful and are available through commercial licences, such as Power Analysis & Sample Size (PASS) [ 7 ], or as free software, such as Epitools [ 8 ] and the “presize” package in R [ 9 ]. However, much confusion still exists, resulting in users incorrectly calculating the sample size of their studies [ 10 , 11 ], especially through the erroneous notion that one blanket formula can be used for all study designs [ 6 ]. In addition, users are expected to have some statistical knowledge to calculate and report the sample size calculation. Incorrect sample size calculation can introduce statistical errors that give rise to inaccurate results, which could be serious, particularly in medical research, where evidence from research studies is a cornerstone of medical practice [ 12 , 13 ]. Many reasons could be attributed to this confusion, inaccuracy and misunderstanding, in particular the complexity of available software and the corresponding guidelines [ 13 ].
Therefore, in this paper, we address these issues by introducing a user-friendly Excel calculator that guides users through the correct method and parameters step by step. The calculator also generates a publication-style report of an adequate sample size for the user’s study. We believe this will improve sample size calculation in future prevalence studies in medicine and the health sciences.
Method to calculate sample size
For an objective that estimates a prevalence, the sample size calculation formula is fairly simple and available in a number of books.
The following formula [ 2 ] shall be used:
n = Z^2 × P(1 − P) / d^2
where n = sample size,
Z = Z statistic for a level of confidence (1.96 for the 95% confidence level),
P = expected prevalence or proportion, and
d = precision.
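As a sketch, the formula n = Z²P(1 − P)/d² can be written directly in code. The paper's own calculators implement this computation in a spreadsheet and in R; the Python version below is ours, for illustration only.

```python
import math

def sample_size(p, d, z=1.96):
    """Minimum sample size to estimate a prevalence.

    p: expected prevalence (as a proportion, e.g. 0.30)
    d: absolute precision (e.g. 0.05 for +/- 5%)
    z: Z statistic for the confidence level (1.96 for 95%)
    """
    return math.ceil(z**2 * p * (1 - p) / d**2)

sample_size(0.30, 0.05)  # -> 323
sample_size(0.50, 0.05)  # -> 385
```

Rounding is always upward, since a fractional participant cannot be recruited and rounding down would fall short of the required precision.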
However, we do not encourage researchers to apply the formula by hand, as manual calculation invites human error. We can use available software and concentrate on carefully choosing appropriate parameters for the calculation.
The above formula indicates three parameters to be determined.
When we take a sample but wish to learn about the population from which it is taken (such as the prevalence of smoking), we will not know the exact population prevalence because we do not study every member of the population. However, the sample gives us an estimate with lower and upper limits (informally ‘a range’, but called an ‘interval’ in statistics) for the population prevalence. We calculate these lower and upper limits with a certain level of confidence. The commonly, almost universally, used level of confidence in medical and health fields is 95% (giving the 95% confidence interval, CI). In addition, most data analysis software reports results with 95% CIs by default. For these reasons, and to minimise errors by non-statisticians, we have fixed the level of confidence at 95% in the calculators presented here, without giving users a choice.
As mentioned above, we will not know the exact population prevalence because we do not study every member of the population. Therefore, the prevalence calculated from the sample could deviate (be in error) from the population prevalence. We call this deviation sampling error. We also know that the larger the sample size, the smaller the error in estimation. The error is expressed as the precision, also known as the ‘margin of error’.
Practically, the precision determines the width of the 95% confidence interval. If we choose an absolute precision of ± 2% in estimating a prevalence, we should expect the width of the 95% CI in the result to be 4% (example: 95% CI: 23%, 27%). If the absolute precision is ± 5%, we should expect a 95% CI width of 10% (example: 95% CI: 20%, 30%). The width of the CI is twice the precision. Details are presented in Table 1 .
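This relationship can be checked numerically: a sample sized for absolute precision d yields a 95% Wald confidence interval whose half-width, evaluated at the expected prevalence, is approximately d. The check below is a sketch using the standard Wald formula; the CI actually observed will depend on the observed prevalence.

```python
import math

# Size the sample for prevalence 30% with +/-5% precision, then
# compute the Wald CI half-width that sample would give.
z, p, d = 1.96, 0.30, 0.05
n = math.ceil(z**2 * p * (1 - p) / d**2)      # 323 participants
half_width = z * math.sqrt(p * (1 - p) / n)   # ~0.05, so CI width ~0.10
```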
Researchers have the opportunity to decide the precision (margin of error), and hence the width of the CI, that they wish to see in the results. Researchers normally wish for a narrower CI, but the narrower it is, the more expensive (larger) the sample will be. Even if researchers decide on a smaller sample size, they can foresee how poor the CI width will be in their results. This is therefore an informed decision to be made by researchers.
Practically, we give some recommendations for choosing a precision value (Table 2 ). In general, well-funded or large-scale studies aiming to gain the attention of policy makers should aim for a precision of 2 to 3%, whereas small-scale (or poorly funded) studies, for example undergraduate or master’s student research projects, may consider a precision of 4 to 5%. If the precision is larger than 5% (such as 10%) because of limited resources, researchers should consider the study a preliminary study.
However, the above recommendation applies to expected prevalences of 10 to 90%. When the expected prevalence is very small (less than 10%) or very large (more than 90%), we need to apply a much smaller precision. It is obvious that a precision of 5% is appropriate for an expected prevalence of 50%, but totally inappropriate for an expected prevalence of 2%.
We present details of precision for expected prevalence with examples in Table 2 .
The larger the variation in the data, the larger the sample size needed. This relationship can be explained with a simple analogy. When we cook soup and it is nearly finished, we stir it well before tasting. We need only a very small amount (a small sample) to taste it, because we have stirred it well and the variation is almost zero.
Practically, in estimating prevalence, the prevalence itself affects this variation and therefore the required sample size. The relationship between prevalence and sample size is presented in Fig. 1 .
Prevalence and Effect on Sample Size
Obviously, the research objective is to estimate the prevalence, so researchers do not know it in advance. Therefore, to calculate the sample size, we normally obtain an expected prevalence from the most recent published studies with a similar study population. If we cannot find suitable studies in the literature, we may consider conducting a pilot study.
When we find multiple suitable prevalence estimates in the literature, for example ranging from 15 to 30%, we should use the prevalence giving the highest sample size (in this case, 30%), in accordance with Fig. 1 , which shows that 30% requires the largest sample size in the 15 to 30% range. Similarly, if the prevalence ranges from 60 to 80% in the recent literature, we should use 60%, as it requires the largest sample size in that range.
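This rule amounts to picking, within the range reported in the literature, the prevalence closest to 50%, because it maximises P(1 − P) and hence the required sample size. A minimal sketch (the function name is our own):

```python
def worst_case_prevalence(low, high):
    """Prevalence within [low, high] requiring the largest sample
    size, i.e. the value closest to 0.5, where P*(1 - P) peaks."""
    if low <= 0.5 <= high:
        return 0.5
    return low if abs(low - 0.5) < abs(high - 0.5) else high

worst_case_prevalence(0.15, 0.30)  # -> 0.30
worst_case_prevalence(0.60, 0.80)  # -> 0.60
```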
We would like to caution that some books or guidelines suggest using an expected prevalence of 50% if no estimate can be found at all [ 2 , 14 , 15 ]. We discourage this practice. In Fig. 1 , we should note that a prevalence of 50% produces the largest sample size only within the 10 to 90% range of prevalence. The required sample size is much higher in the regions below 10% and above 90%. Therefore, the 50% shortcut should not be used. It is best to calculate the sample size with an appropriate expected prevalence. Researchers may determine a plausible range of expected prevalence and apply the recommendation in the previous paragraph.
For this illustration, we have drawn Fig. 1 using the precision for a small-scale study (Table 2 ): a fixed precision of 5% for expected prevalences between 10 and 90%, half of the expected prevalence when it is less than 10%, and half of (100 minus the expected prevalence) when it is greater than 90%.
We always lose some of the sample during the research process for several reasons, such as non-response, incomplete data and loss to follow-up. Researchers should estimate the loss from their past experience and inflate the calculated sample size accordingly. These losses (especially non-response, incomplete data and loss to follow-up) are closely related to the research area (for example, the non-response rate could be higher in studies of sexual or other sensitive issues) and to the population that researchers intend to study. We therefore recommend that researchers use the non-response rates of previous studies in similar research areas and similar populations.
Although we can enter any percentage of potential loss and inflate the sample size accordingly, this does not guarantee that the calculated sample size will yield a representative sample. In general, we recommend that a loss of less than 10% is acceptable. However, opinions differ on the acceptable percentage of loss or attrition [ 16 ], depending on the type of study. It is important to note that the higher the loss or attrition, the greater the compromise to the validity of the results.
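A common way to inflate the sample size for anticipated loss is to divide by (1 − loss), so that the expected number of valid responses still meets the calculated minimum. We show this adjustment as an illustration; the exact formula used by any particular calculator may differ.

```python
import math

def inflate_for_loss(n, loss):
    """Inflate a calculated sample size n for an anticipated
    proportion of loss (0 <= loss < 1), rounding up."""
    return math.ceil(n / (1 - loss))

inflate_for_loss(323, 0.10)  # -> 359: invite 359 to keep ~323 after 10% loss
```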
The sample size calculation should be reported reproducibly, meaning that all parameters used must be stated. Four parameters enter the calculation: the level of confidence (usually 95%), the expected prevalence (usually from the literature or a pilot study), the precision or margin of error (chosen by the researchers), and the anticipated loss (based on the researchers' experience). The name of the software or calculator should also be reported with a proper reference. The Scalex SP calculator drafts a report for the user to copy and use, ensuring that all necessary parameters are included.
Demonstration of the Scalex SP and ScalaR calculators: simple three steps for Scalex SP
Basically, the Scalex SP calculator (Scalex stands for ‘Sample Size Calculator using Excel’, and SP stands for ‘Single Proportion’; available at: https://sites.google.com/view/sr-ln/ssc ) guides users through three steps:
Step 1: Enter the “Expected Prevalence” as a percentage (> 0 to < 100).
Step 2: Enter the “Anticipated Loss” as a percentage (0 to < 100).
Step 3: After reviewing the Sample Size Table, decide on and enter a precision of your choice. Users may enter a precision not listed in the table (such as ± 2.5%). Scalex SP then produces a draft report for the user.
A major advantage of the Scalex SP calculator is its Sample Size Table (Fig. 3 ), in which users can see the sample sizes required for a range of precisions and foresee the CIs in their results. This helps users choose a precision in light of the available resources.
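The kind of table described above can be sketched as follows. This is a hypothetical Python re-implementation (not the authors' Excel workbook), assuming 95% confidence, division by (1 − loss) for anticipated loss, and ceiling rounding at each step; the published Fig. 3 values may differ slightly under other rounding conventions.

```python
import math

Z = 1.96  # z statistic for 95% confidence

def sample_size(p, d, loss=0.0):
    """Sample size for a single proportion, inflated for anticipated loss."""
    n = math.ceil(Z**2 * p * (1 - p) / d**2)
    return math.ceil(n / (1 - loss))

# Expected prevalence 30%, anticipated loss 10%, as in the worked example:
for d in (0.02, 0.03, 0.04, 0.05):
    print(f"precision +/-{d:.0%}: n = {sample_size(0.30, d, loss=0.10)}")
```

Under these assumptions, ± 3% precision yields n = 997, matching the worked example in the paper.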
Suppose we are going to conduct a study to estimate the prevalence of obesity among secondary school children in a district, and the literature gives an expected prevalence of 30%.
When we start Scalex SP, we see the interface shown in Fig. 2 . We enter 30 (30%) for Expected Prevalence. As previous studies in this population experienced 10% non-response, we enter 10% loss (see Fig. 3 ).
Scalex SP interface for Steps 1, 2 and 3
Scalex SP with Report
We then review the sample sizes given for the various precisions and decide on ± 3% precision, as it yields an acceptable 95% CI width (27%, 33%) and a manageable sample size ( n = 997).
We then enter 3 (3%) in Step 3, and Scalex SP produces the draft report shown in Fig. 3 .
The authors have also written an R script (ScalaR SP.R) which, with two command lines as in Fig. 4 (the script file must be stored in the R working directory), gives the same output as Scalex SP.
ScalaR SP—with report
(available at: https://sites.google.com/view/sr-ln/ssc ).
Example R command:
> ScalarSP(p = 0.3, d = 0.03, loss = 0.1)
where p is the expected prevalence, d is the precision (margin of error), and loss is the anticipated loss or attrition of the sample.
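The computation behind such a call can be mimicked as follows. This is a hypothetical Python analogue, not the authors' actual ScalaR SP script; it assumes 95% confidence, loss inflation by division by (1 − loss), and ceiling rounding, which reproduces the n = 997 of the worked example.

```python
import math

def scalar_sp_sketch(p, d, loss, z=1.96):
    """Hypothetical analogue of ScalarSP: returns (n before loss, n after loss)."""
    n0 = math.ceil(z**2 * p * (1 - p) / d**2)
    n = math.ceil(n0 / (1 - loss))
    return n0, n

n0, n = scalar_sp_sketch(p=0.3, d=0.03, loss=0.1)
print(f"Before loss: {n0}; after 10% anticipated loss: {n}")  # 897 and 997
```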
The Scalex calculator is intended for studies using simple random sampling, systematic sampling, or proportionate stratified random sampling. For other sampling methods, the calculated sample size should be multiplied by the design effect [ 14 ]. The design effect may be estimated from the literature if it is reported in similar previous studies; otherwise, estimating it is a complicated procedure involving data simulation.
The formula used in these calculators (reported in Para 2 above) assumes that the population is large and its size unknown. If the population size is known, a smaller sample size can be obtained from a different formula that incorporates the population size. However, researchers who use that formula should then analyse the data with a ‘finite population correction’ and ‘survey data analysis methods’ [ 17 ], rather than standard statistical analyses, to obtain valid results. We therefore take the safer approach of assuming that the population size is unknown, both when calculating the sample size and later in the data analysis. This is a limitation for users who wish to calculate a sample size for a known population size and apply a finite population correction in their analyses.
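For reference, the standard finite-population correction mentioned here can be sketched as below. This is the textbook adjustment n = n₀ / (1 + (n₀ − 1)/N), not a feature of the presented calculators; the example figures are illustrative.

```python
import math

def fpc_sample_size(n0, N):
    """Standard finite population correction: adjust an infinite-population
    sample size n0 for a known population of size N."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# n0 = 385 (p = 50%, d = +/-5%) drawn from a known population of N = 2000:
print(fpc_sample_size(385, 2000))  # 323
```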
The presented calculators are based on the Wald confidence interval, whose limitation is that the interval can extend below 0% or above 100% if the precision is specified inappropriately relative to the expected prevalence. Although we could offer users other interval methods, such as the exact confidence interval or the logit confidence interval, we instead prevent this issue by recommending the use of appropriate precision (Implementation Paragraph 2.1.2 and Table 2 ), which we consider a more intuitive approach, especially for users with limited statistical knowledge or skills. Nevertheless, with a single interval method (Wald), we report this as a limitation of the presented calculators.
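The overshoot described above is easy to demonstrate with the Wald interval itself (a minimal sketch with illustrative numbers, not taken from the paper):

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """95% Wald confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# A rare outcome in a modest sample: 2 cases out of 100 (p_hat = 2%).
lo, hi = wald_ci(0.02, 100)
print(lo, hi)  # the lower bound falls below 0: about -0.0074 to 0.0474
```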
With today's technology, researchers should not calculate sample sizes manually. Software or calculators help researchers minimize calculation errors and assist in reporting, although the use of correct parameters remains the responsibility of the user. In addition, calculators built with free software benefit researchers with limited resources.
The presented calculators, designed for prevalence studies, are available to the public at https://sites.google.com/view/sr-ln/ssc without permission being required. The authors will continue to develop Scalex calculators for other types of studies in the near future.
The presented calculators are beneficial because they incorporate non-response and other losses, indicate the anticipated 95% CI, list sample sizes for a range of precisions (thereby guiding an informed choice of precision), and draft a sample size calculation report for scientific reporting.
This paper also includes a number of cautions and recommendations for selecting parameters, especially expected prevalence, precision, and anticipated loss, so that researchers can conduct prevalence studies with more appropriate sample sizes.
Scalex SP calculator.
Project name: sample size calculator project.
Project home page: https://sites.google.com/view/sr-ln/ssc
Operating system(s): Windows.
Programming language: Excel-based.
License: no license required.
Any restrictions to use by non-academics: No restriction.
ScalaR calculator.
Programming language: R language.
This paper does not involve data; however, the free calculators are available at: https://sites.google.com/view/sr-ln/ssc .
Sample Size Calculator using Excel for Single Proportion
Sample Size Calculator using R & RStudio for Single Proportion
Power Analysis and Sample Size
Confidence Interval
Sample Size
Z Statistic
Expected prevalence or proportion
Cochran WG. Sampling Techniques. 3rd ed. New York: John Wiley & Sons; 1977.
Daniel WW, Cross CL. Biostatistics: A foundation for analysis in the health sciences. 10th ed. New York: John Wiley & Sons; 2013.
Verma JP, Verma P. Determining sample size and power in research studies. Singapore: Springer; 2020.
Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample size calculations in clinical research. New York: Chapman and Hall/CRC; 2017.
Vallejo A, Muniesa A, Ferreira C, de Blas I. New method to estimate the sample size for calculation of a proportion assuming binomial distribution. Res Vet Sci. 2013;95:405–9. https://doi.org/10.1016/j.rvsc.2013.04.005 .
Charan J, Biswas T. How to calculate sample size for different study designs in medical research? Indian J Psychol Med. 2013;35:121–6.
NCSS Statistical Software. Power Analysis & Sample Size (PASS). 2022.
Epitools. Epitools - Epidemiological calculators. 2022.
Haynes AG, Lenz A, Stalder O, Limacher A. presize: An R-package for precision-based sample size calculation in clinical research. J Open Source Softw. 2021;6:3118.
Patra P. Sample size in clinical research, the number we need. Int J Med Sci Public Heal. 2012;1:5–9.
Charan J, Kantharia N. How to calculate sample size in animal studies? J Pharmacol Pharmacother. 2013;4:303–6.
Pourhoseingholi MA, Vahedi M, Rahimzadeh M. Sample size calculation in medical studies. Gastroenterol Hepatol Bed Bench. 2013;6:14.
Serdar CC, Cihan M, Yücel D, Serdar MA. Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies. Biochem medica. 2021;31:10502. https://doi.org/10.11613/BM.2021.010502 .
Lwanga SK, Lemeshow S. Sample size determination in health studies: a practical manual. Geneva: World Health Organization; 1991.
Maple Tech IL. Calculator.net. 2019. https://www.calculator.net/sample-size-calculator.html?type=1&cl=95&ci=5&pp=50&ps=&x=120&y=21 . Accessed 19 Dec 2019.
Draugalis JR, Plaza CM. Best practices for survey research reports revisited: implications of target population, probability sampling, and response rate. Am J Pharm Educ. 2009;73:1–3.
Heeringa SG, West BT, Berglund PA. Applied survey data analysis (Second Edition). New York: Chapman and Hall/CRC; 2020.
No acknowledgment required.
This study is not funded by any funding agency.
Authors and affiliations.
PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Jalan Tungku Link, Brunei-Muara BE3119, Gadong, Brunei Darussalam
Lin Naing & Hanif Abdul Rahman
Faculty of Medicine, Bioscience and Nursing, MAHSA University, Bandar Saujana Putra, Jenjarom, Selangor, Malaysia
Rusli Bin Nordin
Centre of Advanced Research (CARe), Universiti Brunei Darussalam, Gadong, Brunei Darussalam
Hanif Abdul Rahman
School of Nursing and Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI, USA
Graduate Student, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
Yuwadi Thein Naing
LN contributed to the conception of the work, creation of the software, testing and further development of the software, and drafting and revision of the paper. RN contributed to the conception of the work, testing of the software, and drafting and revision of the paper. HAR contributed to the conception of the work, testing of the software, and drafting and revision of the paper. YTN contributed to the creation of the software, testing and further development of the software, and drafting and revision of the paper. The author(s) read and approved the final manuscript.
Correspondence to Lin Naing .
Ethics approval and consent to participate.
The study did not require ethics approval and consent to participate.
Not applicable.
The authors declare that they have no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Naing, L., Nordin, R.B., Abdul Rahman, H. et al. Sample size calculation for prevalence studies using Scalex and ScalaR calculators. BMC Med Res Methodol 22 , 209 (2022). https://doi.org/10.1186/s12874-022-01694-7
Received : 06 February 2022
Accepted : 22 July 2022
Published : 30 July 2022
ISSN: 1471-2288