Epidemiology Research Paper Topics

Navigating the complex terrain of epidemiology research paper topics can be a challenge for students pursuing a degree in health sciences. These topics extend across various disease patterns, health risks, and preventive measures, offering a broad spectrum for exploration and research. This page serves as a detailed guide, offering a comprehensive list of epidemiology research paper topics divided into ten distinct categories. Additionally, we offer expert advice on how to select an appropriate topic and tips on writing a stellar epidemiology research paper. iResearchNet further supports students with its bespoke writing services, which encompass custom research paper writing on any chosen topic.

100 Epidemiology Research Paper Topics

Epidemiology, as a cornerstone of public health, offers an extensive array of research paper topics, ranging from investigating disease outbreaks to studying preventive healthcare measures. We have divided the epidemiology research paper topics into ten key categories, each containing ten topics. This extensive, though not exhaustive, list should spark your curiosity and guide you toward an appealing research question.

  • The role of vaccination in controlling epidemics: A case study.
  • HIV/AIDS Epidemiology: A global perspective.
  • The impact of pandemic flu on global health.
  • Emerging infectious diseases: Analysis of risk factors.
  • The effect of environmental changes on the spread of infectious diseases.
  • Antimicrobial resistance: A ticking time bomb.
  • The role of epidemiology in understanding and controlling the spread of Ebola.
  • Vector-borne diseases: The growing challenge.
  • Epidemiology of Tuberculosis in developing countries.
  • Outbreak investigation of foodborne illnesses.
  • The rising burden of diabetes globally: An epidemiological analysis.
  • Cardiovascular diseases: Risk factors and preventive measures.
  • Lung cancer epidemiology: A comprehensive review.
  • The role of lifestyle in the epidemiology of obesity.
  • Alzheimer’s disease: Investigating the prevalence and risk factors.
  • Epidemiology of mental health disorders.
  • Osteoporosis: A silent epidemic.
  • Epidemiology of autoimmune diseases: A comprehensive review.
  • Stroke epidemiology: Prevalence, risk factors, and prevention.
  • Epidemiological study of chronic kidney disease.
  • Air pollution and respiratory diseases: An epidemiological perspective.
  • The impact of climate change on the spread of infectious diseases.
  • Occupational hazards and worker health: An epidemiological study.
  • Lead exposure and its impact on children’s health.
  • Waterborne diseases: The role of clean water in disease prevention.
  • Noise pollution and cardiovascular health: An epidemiological study.
  • The effects of pesticide exposure on human health.
  • Urbanization and its impact on health: An epidemiological approach.
  • The impact of natural disasters on public health.
  • Radiation exposure and cancer risk: An epidemiological perspective.
  • Socioeconomic status and health outcomes: An epidemiological study.
  • Epidemiology of drug abuse: Social and public health implications.
  • Health disparities in minority populations: An epidemiological analysis.
  • The impact of education level on health outcomes.
  • Homelessness and health: An epidemiological study.
  • Violence as a public health issue: An epidemiological perspective.
  • The effect of social media on mental health: An epidemiological approach.
  • The role of social determinants in maternal and child health.
  • The impact of migration on public health: An epidemiological study.
  • The role of social support in disease outcomes and recovery.
  • The epidemiology of dementia in the elderly.
  • Falls in the elderly: Risk factors and prevention.
  • The epidemiology of sarcopenia in older adults.
  • The role of lifestyle factors in healthy aging.
  • The impact of social isolation on health in the elderly.
  • Epidemiology of frailty in older adults.
  • The influence of aging on cancer epidemiology.
  • The role of epidemiology in understanding and managing age-related macular degeneration.
  • Epidemiology of osteoarthritis in the elderly.
  • The burden of cardiovascular diseases in the elderly.
  • The role of genetic factors in the epidemiology of breast cancer.
  • Genetic determinants of cardiovascular diseases.
  • Twin studies in genetic epidemiology: A review.
  • Genetic epidemiology of type 2 diabetes.
  • The role of genetic epidemiology in personalized medicine.
  • Genetic factors in Alzheimer’s disease: An epidemiological perspective.
  • The role of genomics in the epidemiology of infectious diseases.
  • Genetic epidemiology of autism spectrum disorder.
  • Understanding the genetics of obesity through epidemiological studies.
  • The influence of genetic factors on the epidemiology of mental disorders.
  • The role of diet in the prevention of chronic diseases: An epidemiological perspective.
  • The influence of nutrition on the epidemiology of cancer.
  • Understanding the role of nutritional factors in the epidemiology of diabetes.
  • Nutrition and cardiovascular health: An epidemiological review.
  • The impact of nutritional epidemiology on public health policy.
  • The role of nutrition in the epidemiology of mental health disorders.
  • Obesity and diet: An epidemiological study.
  • The impact of malnutrition on the health of developing nations.
  • Nutritional epidemiology and aging: A review.
  • Food insecurity and public health: An epidemiological study.
  • Maternal health and pregnancy outcomes: An epidemiological perspective.
  • The epidemiology of infertility: A review.
  • The role of epidemiology in understanding and preventing preterm birth.
  • The impact of reproductive health on global public health.
  • Epidemiological factors influencing maternal mortality.
  • The role of reproductive epidemiology in contraception and family planning.
  • The influence of environmental and occupational factors on reproductive health.
  • Sexually transmitted infections: An epidemiological review.
  • Epidemiology of ovarian cancer: Risk factors and prevention.
  • The role of epidemiology in understanding and preventing cervical cancer.
  • The epidemiology of childhood obesity: A comprehensive review.
  • Child mental health: An epidemiological perspective.
  • The impact of vaccination on child health: An epidemiological study.
  • Understanding the epidemiology of pediatric cancers.
  • The role of epidemiology in understanding and preventing child malnutrition.
  • Childhood asthma: An epidemiological review.
  • The influence of social factors on child health: An epidemiological perspective.
  • The epidemiology of birth defects: Risk factors and prevention.
  • The impact of the environment on children’s health: An epidemiological study.
  • The role of epidemiology in understanding and preventing sudden infant death syndrome (SIDS).
  • The role of pharmacoepidemiology in drug safety: A review.
  • Understanding drug interactions through pharmacoepidemiology.
  • The influence of pharmacoepidemiology on drug policy and regulation.
  • Drug-induced diseases: An epidemiological perspective.
  • The role of pharmacoepidemiology in the opioid crisis.
  • The impact of pharmacogenetics on the epidemiology of adverse drug reactions.
  • The use of real-world data in pharmacoepidemiology.
  • The influence of pharmacoepidemiology on the development of personalized medicine.
  • The epidemiology of medication adherence: A review.
  • The role of pharmacoepidemiology in the surveillance of drug abuse.

With this comprehensive list of epidemiology research paper topics, you should be able to identify an area of interest that aligns with your passion, your course requirements, and the overall impact on the health sector. The field of epidemiology is vast and constantly evolving, providing a wealth of opportunities for engaging, impactful, and original research.

Choosing Epidemiology Research Paper Topics

As you embark on your journey to write an epidemiology research paper, selecting the right topic is crucial. Your choice of topic will shape the direction of your study and determine the relevance and impact of your research. Here, we provide you with expert advice on how to choose epidemiology research paper topics that are engaging, focused, and contribute to the field of public health. Consider the following ten tips to guide you in this process:

  • Identify your research interests: Start by exploring epidemiology research paper topics that genuinely interest you. Consider your passion for specific areas within epidemiology, such as infectious diseases, chronic diseases, or environmental health. Choosing a topic aligned with your interests will fuel your motivation throughout the research process.
  • Define the scope: Narrow down the scope of your research topic to make it manageable and focused. A broad topic may be overwhelming and difficult to address comprehensively within the scope of a research paper. Refine your topic to a specific aspect or population to ensure depth and clarity.
  • Consult with your advisor: Engage in discussions with your advisor or mentors who have expertise in epidemiology. They can provide valuable guidance, suggest potential epidemiology research paper topics, and help you identify research gaps within the field. Their experience and insights can be instrumental in selecting a suitable topic.
  • Review current literature: Conduct a thorough review of existing literature in epidemiology to understand the latest trends, emerging issues, and ongoing debates. Identifying gaps or areas that require further exploration will help you develop research questions and select a topic that adds value to the existing knowledge.
  • Consider societal relevance: Choose a topic that has practical implications for public health and contributes to the well-being of communities. Epidemiology research paper topics related to pressing health concerns, emerging diseases, or interventions that can improve health outcomes are likely to be impactful and garner attention from the scientific community.
  • Access to data and resources: Ensure that you have access to relevant data sources, databases, and research tools necessary to conduct your study. Consider the feasibility of obtaining the required data and resources when selecting your topic. Availability of data will greatly influence the design and outcomes of your research.
  • Collaborate with experts: Collaborating with researchers or professionals in epidemiology can provide valuable insights and enhance the quality of your research. Engage in interdisciplinary collaborations to broaden the scope of your topic and gain diverse perspectives.
  • Address research gaps: Identify gaps in the existing literature and choose a topic that fills these gaps. By addressing unanswered research questions or challenging prevailing assumptions, you contribute to the advancement of knowledge and stimulate further research in the field.
  • Consider ethical implications: Evaluate the ethical considerations associated with your chosen topic. Ensure that your research design and data collection methods align with ethical guidelines and protect the rights and privacy of individuals or communities involved in the study.
  • Seek innovation: Explore innovative approaches or methodologies within epidemiology. Consider using advanced statistical techniques, incorporating new technologies (e.g., big data, machine learning), or employing novel study designs to bring a fresh perspective to your research topic.

In conclusion, selecting an appropriate epidemiology research paper topic requires thoughtful consideration and careful planning. By identifying your research interests, defining the scope, consulting with advisors, reviewing literature, and considering societal relevance and data availability, you can choose a topic that aligns with your passion, contributes to the field, and has the potential for meaningful impact. Remember to address research gaps, collaborate with experts, consider ethical implications, and seek innovative approaches to make your research stand out.

How to Write an Epidemiology Research Paper

Writing an epidemiology research paper requires a systematic approach to effectively communicate your findings and contribute to the field of public health. From formulating research questions to interpreting results, each step in the process plays a crucial role in producing a well-structured and impactful paper. In this section, we provide you with ten valuable tips on how to write an epidemiology research paper that is concise, clear, and scientifically sound.

  • Understand the structure: Familiarize yourself with the typical structure of a research paper in epidemiology. This includes the introduction, methods, results, and discussion sections. Each section serves a specific purpose and contributes to the overall coherence and clarity of your paper.
  • Conduct a thorough literature review: Before diving into your own research, conduct a comprehensive literature review to understand the existing knowledge and gaps in the field. This will help you position your research within the context of previous studies and identify the unique contributions of your work.
  • Formulate research questions: Clearly define your research questions based on the gaps identified in the literature. Your research questions should be specific, measurable, and aligned with the objectives of your study. They will guide your data collection and analysis.
  • Collect and analyze data: Utilize appropriate data collection methods and ensure the quality and reliability of your data. Employ rigorous statistical analysis techniques to draw meaningful conclusions from your data. Adhere to best practices in data handling and analysis to ensure the validity of your findings.
  • Interpret results objectively: Present your results in a clear and concise manner, using appropriate tables, graphs, or charts. Interpret your findings objectively and avoid overgeneralization or speculation. Discuss any limitations of your study that may have influenced the results.
  • Write a compelling introduction: Craft an engaging introduction that provides a concise overview of the research problem, rationale, and objectives. Clearly state the significance of your research and how it addresses existing gaps in knowledge. Hook your readers by highlighting the relevance of your study to public health.
  • Describe methods accurately: Provide a detailed description of your study design, data collection procedures, and statistical analysis methods. Include information on sample size, recruitment strategies, ethical considerations, and any adjustments made for confounding variables. This transparency ensures reproducibility and allows readers to assess the validity of your study.
  • Present results effectively: Organize your results section logically, presenting the key findings in a structured manner. Use clear and concise language to describe statistical analyses, effect sizes, and p-values. Supplement your text with visual aids such as tables or figures to enhance the understanding of your results.
  • Engage in critical discussion: Interpret your results in the context of existing literature and discuss their implications for public health. Analyze any unexpected or contradictory findings and propose potential explanations. Address the strengths and limitations of your study and suggest avenues for future research.
  • Conclude with impact: Craft a strong conclusion that summarizes the key findings and their significance. Emphasize the contributions of your research to the field of epidemiology and its potential implications for public health policy or practice. Avoid introducing new information in the conclusion and reiterate the main takeaways of your study.
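
To make the advice on reporting effect sizes more concrete, here is a minimal Python sketch that computes a risk ratio and a Wald-type 95% confidence interval from a 2x2 table. The function name `risk_ratio` and the counts are hypothetical and chosen only for illustration; a real analysis would typically rely on an established statistics package and a method suited to the study design.

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with the outcome      b: exposed without the outcome
    c: unexposed with the outcome    d: unexposed without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) under the usual large-sample approximation
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed had the outcome
rr, lower, upper = risk_ratio(30, 70, 15, 85)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Reporting the estimate together with its interval, rather than a p-value alone, lets readers judge both the size and the precision of the association.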

In conclusion, writing an epidemiology research paper requires a structured and systematic approach. By understanding the paper’s structure, conducting a thorough literature review, formulating clear research questions, collecting and analyzing data, and interpreting results objectively, you can produce a scientifically rigorous paper. Additionally, focus on writing a compelling introduction, accurately describing methods, presenting results effectively, engaging in critical discussion, and concluding with impact. Following these tips will enhance the clarity, coherence, and impact of your epidemiology research paper.

iResearchNet’s Writing Services

At iResearchNet, we understand the challenges that students face when writing an epidemiology research paper. To alleviate your burden and ensure the success of your academic endeavors, we offer a comprehensive range of writing services tailored specifically for students studying health sciences. With our expert degree-holding writers, custom written works, in-depth research, and customized formatting options, we strive to provide you with top-quality, customized solutions. Here are thirteen features of iResearchNet’s writing services:

  • Expert degree-holding writers: Our team consists of highly qualified writers with advanced degrees in epidemiology and related fields. They possess extensive knowledge and experience in writing research papers, ensuring that your paper is in capable hands.
  • Custom written works: We understand that each research paper is unique, requiring a customized approach. Our writers will develop a tailored research paper that meets your specific requirements, addressing your chosen topic and research questions.
  • In-depth research: Our writers conduct thorough and up-to-date research to ensure the scientific rigor and relevance of your paper. They have access to a wide range of academic resources, databases, and journals, enabling them to gather comprehensive information for your research topic.
  • Custom formatting: We offer custom formatting options in various citation styles, including APA, MLA, Chicago/Turabian, and Harvard. Our writers are well-versed in these formatting guidelines and will ensure that your paper adheres to the required style consistently.
  • Top quality: We prioritize delivering high-quality research papers that meet academic standards. Our writers adhere to rigorous quality control measures, conducting meticulous editing and proofreading to ensure the accuracy, coherence, and clarity of your paper.
  • Customized solutions: Our services are tailored to your unique needs. Whether you require assistance with selecting a research topic, writing specific sections, or an entire research paper, our writers can provide customized solutions that align with your requirements.
  • Flexible pricing: We understand that students have varying budgetary constraints. Therefore, we offer flexible pricing options to accommodate different financial situations. You can select a pricing plan that suits your budget while ensuring the highest quality of work.
  • Short deadlines: We recognize that students often face tight deadlines. With our short deadline option, you can place an order and receive a well-written research paper within as little as three hours. We prioritize promptness without compromising on quality.
  • Timely delivery: We value the importance of meeting deadlines. Our writers work diligently to ensure your research paper is delivered on time, allowing you sufficient time to review the content and make any necessary revisions.
  • 24/7 support: Our customer support team is available round-the-clock to assist you with any queries or concerns you may have. Whether you need updates on your order or have questions about our services, our dedicated support staff is here to provide prompt assistance.
  • Absolute privacy: We prioritize the confidentiality of our clients. Your personal information and order details are treated with the utmost privacy and are never shared with third parties. You can trust that your identity and academic integrity are safeguarded.
  • Easy order tracking: We provide a user-friendly platform that allows you to easily track the progress of your order. You can stay updated on the status of your research paper and communicate with your assigned writer conveniently.
  • Money-back guarantee: We are committed to ensuring your satisfaction with our services. If, for any reason, you are not completely satisfied with the delivered research paper, we offer a money-back guarantee to provide you peace of mind.

At iResearchNet, we are dedicated to supporting your academic success by providing top-quality writing services specifically tailored to epidemiology research papers. Our expert writers, customized solutions, in-depth research, and attention to detail will help you excel in your studies and achieve your goals. Place an order with iResearchNet today and experience the convenience and excellence of our services.

Unlock Your Potential with iResearchNet!

Are you feeling overwhelmed by the prospect of writing your epidemiology research paper? Don’t worry, iResearchNet is here to support you every step of the way. With our team of expert writers and a range of tailored services, we guarantee top-quality, customized solutions that will help you excel in your academic pursuits. Place your order today and experience the convenience and excellence of our writing services.

At iResearchNet, we understand the challenges faced by students in the health sciences. Our mission is to provide you with the highest level of support and expertise as you navigate the complexities of writing an epidemiology research paper. With our team of dedicated writers, custom-tailored solutions, and commitment to quality, we ensure that your research paper is in capable hands.

Don’t let the stress and time constraints of writing a research paper hold you back. Trust iResearchNet to deliver a well-written, scientifically sound, and impactful research paper that will impress your professors and contribute to the field of epidemiology. Our comprehensive writing services, combined with our commitment to meeting your unique requirements, make us the ideal partner for your academic success.

By choosing iResearchNet, you benefit from the knowledge and expertise of our degree-holding writers who are well-versed in the intricacies of epidemiology research. Our commitment to in-depth research, adherence to formatting guidelines, and emphasis on top-quality writing ensure that your research paper meets the highest standards of academic excellence.

We offer flexible pricing options to accommodate your budget, along with deadlines as short as three hours for urgent requests. Our 24/7 support team is available to address any inquiries or concerns you may have, providing you with peace of mind throughout the writing process. Rest assured that your privacy is our utmost priority, and we guarantee the absolute confidentiality of your personal information.

With iResearchNet, you can easily track the progress of your order and collaborate with your assigned writer, ensuring a seamless and personalized experience. We also offer a money-back guarantee to ensure your satisfaction with our services. Your success is our ultimate goal, and we are committed to delivering exceptional results.

Don’t let the stress of writing an epidemiology research paper weigh you down. Take advantage of iResearchNet’s expertise and unlock your full potential. Place your order today and experience the convenience, quality, and customized solutions that iResearchNet offers. Your academic success is just a click away!


A Framework for Descriptive Epidemiology


Catherine R Lesko, Matthew P Fox, Jessie K Edwards, A Framework for Descriptive Epidemiology, American Journal of Epidemiology, Volume 191, Issue 12, December 2022, Pages 2063–2070, https://doi.org/10.1093/aje/kwac115


In this paper, we propose a framework for thinking through the design and conduct of descriptive epidemiologic studies. A well-defined descriptive question aims to quantify and characterize some feature of the health of a population and must clearly state: 1) the target population, characterized by person and place, and anchored in time; 2) the outcome, event, or health state or characteristic; and 3) the measure of occurrence that will be used to summarize the outcome (e.g., incidence, prevalence, average time to event, etc.). Additionally, 4) any auxiliary variables will be prespecified and their roles as stratification factors (to characterize the outcome distribution) or nuisance variables (to be standardized over) will be stated. We illustrate application of this framework to describe the prevalence of viral suppression on December 31, 2019, among people living with human immunodeficiency virus (HIV) who had been linked to HIV care in the United States. Application of this framework highlights biases that may arise from missing data, especially 1) differences between the target population and the analytical sample; 2) measurement error; 3) competing events, late entries, loss to follow-up, and inappropriate interpretation of the chosen measure of outcome occurrence; and 4) inappropriate adjustment.
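
As a rough illustration of the two roles the abstract assigns to auxiliary variables, the following Python sketch computes stratum-specific prevalence (a stratification factor characterizing the outcome distribution) and a directly standardized prevalence (a nuisance variable standardized over). All counts, age groups, weights, and function names are hypothetical; they are not taken from the article or from any real cohort.

```python
# Hypothetical analytical sample: (number with the outcome, number observed) per age group
sample = {
    "18-39": (160, 200),
    "40-59": (425, 500),
    "60+": (270, 300),
}

# Hypothetical share of each age group in the chosen standard population (sums to 1)
standard_weights = {"18-39": 0.40, "40-59": 0.40, "60+": 0.20}

def stratum_prevalence(counts):
    """Prevalence within each stratum: cases divided by people observed."""
    return {group: cases / n for group, (cases, n) in counts.items()}

def crude_prevalence(counts):
    """Overall prevalence in the sample, ignoring the strata."""
    total_cases = sum(cases for cases, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    return total_cases / total_n

def standardized_prevalence(counts, weights):
    """Direct standardization: weight stratum prevalences by the standard population."""
    prevalence = stratum_prevalence(counts)
    return sum(weights[group] * prevalence[group] for group in counts)

by_stratum = stratum_prevalence(sample)
crude = crude_prevalence(sample)
standardized = standardized_prevalence(sample, standard_weights)

for group, value in by_stratum.items():
    print(f"{group}: {value:.3f}")
print(f"crude: {crude:.3f}, standardized: {standardized:.3f}")
```

In this made-up example the crude and standardized values differ because the sample is older than the standard population, which is the kind of discrepancy that prespecifying the role of each auxiliary variable is meant to surface.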

Abbreviations: HIV, human immunodeficiency virus; NA-ACCORD, North American AIDS Cohort Collaboration on Research and Design.

Editor’s note: An invited commentary on this article appears on page 2071, and the authors’ response appears on page 2073.

Epidemiologic questions arguably exist on a continuum from purely descriptive to purely causal. To be concise, we ignore prediction questions here. There are several frameworks intended to help guide causal analyses (1, 2), but the literature on theoretical and practical guidance for conducting descriptive analyses is limited. Here we present a framework for conducting descriptive epidemiologic studies. Many, if not all, of the considerations discussed in this framework apply to estimation of valid causal effects in a population, although they may be frequently overlooked. Where there may be differences in analytical decisions depending on the type of study question, we highlight them. We summarize guidance provided herein in Table 1 in the form of a checklist modeled after the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (3).

Table 1. Items That Should Be Included in Reports of Descriptive Studies

Title and abstract
  • Item 1: Explicitly state that this is a “descriptive study” in the title or the abstract.
  • Item 2: Summarize the target population and provide an informative and balanced summary of estimated disease occurrence in the abstract.

Introduction
  • Item 3, Background/rationale: State the motivation for the study, including, where relevant, the action that might be informed by the results.
  • Item 4, Objectives: State the descriptive estimand, explicitly including: (a) the target population (who would be affected by any decisions made as a result of the study?); (b) the health state to be summarized; (c) the measure of occurrence; and (d) any stratification variables, if applicable.

Methods
  • Item 5, Study design: (a) State whether the study is cross-sectional or longitudinal. (b) Restate the measure of occurrence being targeted. (c) If the study is longitudinal, specify the time origin and follow-up period for the measure of occurrence; if the study is cross-sectional, specify the time anchor at which the health state is summarized for individuals.
  • Item 6, Setting: Describe any relevant features of the place and time in which the target population resides and across which data were collected.
  • Item 7, Participants: (a) Describe the target population thoroughly in terms of person, place, and time. (b) Describe sampling into the study population (whether sampling was explicit or implicit, e.g., by inclusion in an administrative database); this includes eligibility criteria (see recommendations on data sources in item 10 below). (c) Describe any restrictions on the analytical sample.
  • Item 8, Outcome(s): (a) State when and how the outcome is measured. (b) Include estimates or discussion of the sensitivity and specificity of the study outcome definition relative to the gold standard. (c) List secondary outcomes or competing events of interest.
  • Item 9, Covariates: Specify any stratification or adjustment variables; clearly define how variables were collected or constructed.
  • Item 10, Data sources/measurement: Clearly delineate any inclusion/exclusion criteria for membership in the data source, including the original purpose for which the data were collected, if not for the study at hand.
  • Item 11, Bias: Describe any assumptions or methods used to extrapolate data from the analytical sample to the study population and from the study population to the target population.
  • Item 12, Statistical methods: (a) Describe the primary statistical methods used to estimate the measure of disease occurrence being targeted; discuss assumptions of that method in light of data limitations (e.g., assumption of independent censoring for people lost to follow-up). (b) If any adjustment/standardization will be done, state the goal of such adjustment.

Results
  • Item 13, Participants: Report numbers of individuals at each study stage (this is likely to be approximate for the target population); consider summarizing this information in a flow diagram.
  • Item 14, Descriptive data: (a) Report on the characteristics of the analytical sample in a “Table 1.” (b) Indicate the number of participants with missing data for each variable used in the analysis. (c) If any weighting or imputation is done to reconstruct the study sample or target populations, include columns for those populations.
  • Item 15, Outcome data: (a) Present an overall (unstratified) estimate of the measure of occurrence of interest. (b) Report “crude” (raw data in the analytical sample) and (if applicable) “corrected” (after any weighting or imputation) estimates.
  • Item 16, Other analyses: Present prespecified stratum-specific or adjusted/standardized results.

Discussion
  • Item 17, Key results: Summarize key results with reference to the study objectives.
  • Item 18, Limitations: Summarize potential sources of selection bias and measurement error and any attempts to mitigate these biases. Discuss both the direction and magnitude of any potential bias. Integrating quantitative bias analysis into the study to guide these discussions is encouraged.
  • Item 19, Interpretation: (a) Avoid causal interpretations of descriptive results; avoid overinterpreting stratum-specific differences in measures of occurrence. (b) Describe how results of this study might inform or improve public health or clinical practice.
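
The checklist asks authors to discuss the sensitivity and specificity of the outcome definition and encourages quantitative bias analysis. One simple way to see how such figures affect a prevalence estimate is the Rogan-Gladen correction, sketched below in Python. This estimator is not described in the checklist itself; it is offered here only as an assumed example of a measurement-error adjustment, and the input values are invented.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an observed prevalence for imperfect outcome measurement.

    Uses the Rogan-Gladen estimator:
        (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is truncated to the interval [0, 1].
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        # The measurement is no better than chance, so no correction is possible
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (apparent_prevalence + specificity - 1) / denominator
    return min(1.0, max(0.0, adjusted))

# Invented inputs: 30% observed prevalence, 90% sensitivity, 95% specificity
observed = 0.30
adjusted = rogan_gladen(observed, sensitivity=0.90, specificity=0.95)
print(f"observed: {observed:.3f}, adjusted: {adjusted:.3f}")
```

Repeating the calculation across a plausible range of sensitivity and specificity values gives a rough sense of the direction and magnitude of bias from outcome misclassification.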

We define a descriptive epidemiologic question as one that aims to quantify some feature of the health of a population and, often, to characterize the distribution of that feature across the population. The estimand for causal analyses is a contrast of potential outcomes in a single population, where the potential outcomes are those we would expect to observe under some hypothetical intervention (1, 4–7). The fundamental problem of causal inference is that we cannot observe all of these potential outcomes (8). The estimand for descriptive analyses is a function of the outcomes that occurred for everyone in the target population. The estimation challenge for descriptive analyses is that we may not completely observe all of the actual outcomes. A descriptive analysis might be cross-sectional or longitudinal; it might concern a dichotomous, categorical, or continuous outcome; and it might attempt to summarize the outcome in any number of ways (e.g., median time to some event, mean value, etc.). While much discussion focuses on the most common scenarios (e.g., dichotomous outcomes), this framework is intended to be applied to descriptive analyses for any combination of study designs, outcomes, and estimands.

We start with the premise that good epidemiologic questions are impactful and well-defined. An impactful question, if answered, would lead to knowledge that could inform action in the population it concerns (7). A well-defined question should be stated with enough specificity and clarity that answering it is at least theoretically possible.

A well-defined research question (causal or descriptive) states: 1) the target population, characterized by person and place, and anchored in time; 2) the outcome, event, or health state or characteristic; and 3) the measure of occurrence that will be used to summarize the outcome (e.g., incidence, prevalence, average time to event, etc.). A causal question requires specifying additional components, such as exposures and covariates that are thought to be confounders, effect modifiers, or mediators. For descriptive questions, consideration of additional variables is optional, but if auxiliary variables will be considered, a well-defined descriptive question will 4) prespecify any other variables of interest and how they will be considered (e.g., to characterize the population, as a stratification factor to characterize the outcome distribution, or as a “nuisance” variable that we would like to adjust for or standardize over). For a descriptive question, indiscriminate adjustment for these other variables can lead to uninterpretable results that may mislead (9); as such, researchers should be clear as to the purpose of adjustment in descriptive studies, understand the implications of such adjustments, and be cautious in interpreting adjusted statistics (10).

Example: We illustrate application of this framework to description of one portion of the human immunodeficiency virus (HIV) care continuum (11): What was the prevalence of viral suppression on December 31, 2019, among adults living with HIV who had been linked to HIV care (i.e., saw a clinician who was aware of their HIV status and had the ability to prescribe antiretroviral therapy) in the United States? We will explore specific components of this question to make it more well-defined (and tie those components to analytical decisions) below.

For a descriptive question, we define the target population as the group in which we would like to characterize the distribution of the outcome. The choice of target population is directly linked to the purpose of asking the question. The target population might be, for example, the population for which we will be providing public health services. The target population is not necessarily enumerated (in contrast to a cohort or a sample), but we do need to be able to define membership in terms of person, place, and time (here, time is used to define membership in the target population and does not relate directly to measurement of the outcome). For our example question, the target population is everyone living in the United States (place) who was aged ≥18 years, was infected and diagnosed with HIV, and attended ≥1 clinical visit for HIV care with a clinician who was aware of their infection and could prescribe antiretroviral medication (person) before December 31, 2019, and was alive through December 31, 2019 (time).

A well-defined question specifies the target population a priori. When data are available on a full census of the target population (e.g., through administrative records or public health surveillance), no sampling is needed. However, when data on the entire population cannot be obtained, we rely on data from a sample of the target population or a population that we hope is sufficiently representative of the target population with respect to both measured and unmeasured characteristics. The study sample is the enumerated set of individuals whose information is captured in a data set, among whom we attempt to measure occurrence of the outcome (after inclusion and exclusion criteria have been applied, if data were not collected using these criteria (e.g., administrative data)). Many descriptive and causal questions are answered using convenience samples without a clear sampling frame (e.g., people recruited using Web-based surveys, frequent clinic attendees, or people who sought medical care in a particular hospital system) and implicitly assume that the study sample is a random sample (perhaps conditional on covariates with known sampling probabilities) of the target population. Achieving a representative sample may involve considerable work and may be very resource-intensive (12). However, use of convenience samples often results in study samples that are different from the target population in unmeasurable ways, particularly when subjects must actively seek out or opt into participation (13).

On the topic of sampling and selection, it is also useful to define the analytical sample as a proper subset of the study sample in which disease occurrence is measured given practical limitations (e.g., excluding individuals in the study sample who are missing information on the outcome). We might use information from the analytical sample to attempt to quantify disease occurrence in the study sample, but we must rely on assumptions to do so (e.g., assuming data are missing at random and imputing missing data or reweighting study participants with complete data). For valid inferences, the incidence of the outcome in the sample must be able to stand in for the incidence in the target population. Here, the “sample” is either the analytical sample or the study sample represented by the analytical sample after any attempts to handle missing data. Given the many practical challenges enumerated above, the samples we rely on in our studies are rarely representative of the target population. If the distribution of risk factors for the health state differs between the study sample and the target population, we have a lack of generalizability (14–16); the absolute value (risk, prevalence, rate) of the outcome in the sample will differ from what we would have observed in the target population. Without applying quantitative approaches to generalize data from the sample to the target population, descriptive results will be biased. Except in special cases (e.g., when the selected estimand is the one scale on which effect measure modification is absent), if absolute measures differ between the sample and the target, most contrasts of the outcome across exposure groups in the sample will also be biased for the same contrasts in the target population (causal results will be biased) (14–16). If the underlying joint distribution of all causes of the outcome differs between the analytical sample and the study sample, we have selection bias (17, 18). To recover an estimand relevant to the target population from an analytical sample with a different distribution of causes of the outcome, stratification and standardization methods may be appropriate.

Example: Recall that the target population is everyone living in the United States who had been linked to clinical care for HIV before December 31, 2019. There is mandated reporting in the United States of new HIV diagnoses and HIV viral load test results to public health surveillance agencies under national notifiable disease regulations, and the Centers for Disease Control and Prevention aggregates these data from all states and dependent areas. This might seem like a census of the target population. However, despite these mandates, not all diagnoses are reported, and people who move across state lines may be double-counted because of challenges with deduplication. Thus, the number of people with HIV infection may be inaccurate. Additionally, data rely on HIV viral load and CD4 cell-count laboratory tests as a proxy for clinical visits, and the proxy is imperfect (19, 20); thus, we cannot accurately apply the second inclusion criterion for target population membership: linkage to clinical care. Alternatively, we might use data from the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) (21) or another clinical cohort of people with HIV who have been linked to care. However, clinical cohort studies are often nested within academic medical centers, where the quality of care and wraparound services may differ (and thus the probability of the outcome, viral suppression, may differ), and may have stricter enrollment criteria (to preserve study resources) than we have used to define linkage to care for our target population.

There are other options for study samples we might try to leverage. We might even choose to estimate the parameter of interest in multiple samples and triangulate the results. The point is that there is rarely a single, perfect, existing study sample that can stand in for the target population. Therefore, if we wish to use existing data, identifying ways in which the study sample and the target population differ provides a framework for thinking about sources of bias and how we might adjust the estimate for better inferences.

A theme of many threats to descriptive and causal epidemiologic inference is that they can often be cast as missing-data problems (22). The ideal data set for answering our descriptive epidemiologic question includes a row for everyone in the target population and columns with values for the outcome and any covariates of interest. When the study sample is not a census of the target population, anyone in the target population who is not in the study sample will have missing data in some, if not all, columns. Indeed, without a clear sampling frame, we do not even know how many rows are missing from our ideal data set (and we cannot quantify the amount of missing data from this ideal study). Analyzing the study sample as if it were a random sample of the target population is akin to assuming that data are missing completely at random. If, instead, it is plausible to assume that data are missing at random conditional on covariates that are available for target population members who were not selected for the study sample, we could reweight or standardize the study sample to represent the full target population.
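The reweighting idea can be illustrated with a small computation. The following is a minimal sketch, not the authors' analysis: the function name `poststratified_prevalence`, the age strata, and all counts are hypothetical, and it assumes that the outcome is missing at random within strata and that the stratum sizes of the target population are known.

```python
from collections import Counter


def poststratified_prevalence(sample, target_counts):
    """Reweight a study sample so its stratum shares match the target population.

    sample: list of (stratum, outcome) pairs, with outcome coded 0 or 1.
    target_counts: dict mapping stratum -> number of people in the target population.
    Assumes the outcome is missing at random within each stratum.
    """
    n_by_stratum = Counter(stratum for stratum, _ in sample)
    if set(n_by_stratum) != set(target_counts):
        raise ValueError("sample strata and target strata must match")
    target_total = sum(target_counts.values())
    weighted_sum = 0.0
    for stratum, outcome in sample:
        # Each person carries (target share of stratum) / (sample size of stratum).
        weight = target_counts[stratum] / target_total / n_by_stratum[stratum]
        weighted_sum += weight * outcome
    return weighted_sum


# Hypothetical study sample: older adults are over-represented relative to the target.
sample = (
    [("18-39", 1)] * 30 + [("18-39", 0)] * 20
    + [("40+", 1)] * 45 + [("40+", 0)] * 5
)
target_counts = {"18-39": 6000, "40+": 4000}

crude = sum(outcome for _, outcome in sample) / len(sample)
standardized = poststratified_prevalence(sample, target_counts)
print(f"crude: {crude:.2f}, standardized to target: {standardized:.2f}")
```

In this made-up example the crude estimate treats the sample as a simple random sample of the target population, while the standardized estimate gives more weight to the under-represented younger stratum.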

Example: The surveillance data include everyone in the target population (age ≥18 years, alive, diagnosed with HIV, and ≥1 HIV care visit before December 31, 2019), but they also include some people who are not in the target population (they include people who did not make ≥1 HIV care visit with a clinician who might prescribe antiretroviral medications), and we are unable to definitively identify people in the surveillance data who do not meet the inclusion criteria for the target population (we have to rely on laboratory tests as a proxy for clinical visits) (19). However, the surveillance data likely are closer to representing the target population than the NA-ACCORD data (which do not include everyone in the target population, although they do not include anyone who should be excluded from the target population). Therefore, we might use surveillance data for our primary analyses, but we might conduct secondary analyses that leverage the relative strengths of the different study samples and, for example, reweight NA-ACCORD data that include visits to resemble the target population implied by the surveillance data.

To describe the occurrence, frequency, or relative frequency of an outcome, we need an unambiguous definition of that outcome, and we must be able to apply that definition in our data. In the absence of a gold standard or the ability to apply that gold standard due to data or resource constraints, we must understand how imperfect sensitivity and specificity might affect our results. Measurement error has previously been described as a missing-data problem (22) in which the true outcome is missing and we overwrite that missing value with a mismeasured outcome. To the extent to which the mismeasured outcome is a poor substitute for the true outcome, our inferences will be biased.
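One simple way to see how imperfect sensitivity and specificity affect a prevalence estimate is a back-calculation often called the Rogan-Gladen correction. This technique is not named in the text and is offered here only as a sketch: the function name and the sensitivity, specificity, and observed prevalence values are hypothetical, and the correction assumes nondifferential misclassification with known sensitivity and specificity.

```python
def corrected_prevalence(observed_prevalence, sensitivity, specificity):
    """Back-calculate true prevalence from a misclassified outcome.

    Uses observed = Se * true + (1 - Sp) * (1 - true), solved for the true prevalence.
    Assumes sensitivity and specificity are known and apply to everyone equally.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (observed_prevalence + specificity - 1.0) / denominator
    # Sampling error can push the estimate outside [0, 1]; truncate to the valid range.
    return min(1.0, max(0.0, estimate))


# Hypothetical values for illustration only.
observed = 0.80
corrected = corrected_prevalence(observed, sensitivity=0.95, specificity=0.90)
print(f"observed: {observed:.3f}, corrected: {corrected:.3f}")
```

Repeating the calculation over a range of plausible sensitivity and specificity values gives a simple quantitative bias analysis of the kind the reporting checklist encourages.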

Example: Our outcome is “viral suppression” on December 31, 2019, but there is no single, standard threshold for suppression. Prior studies have used plasma HIV RNA levels of <20, <50, <200, or <400 copies/mL (23). Lower thresholds will result in a lower estimate of the prevalence of viral suppression; for example, in an HIV clinical cohort in Baltimore, Maryland, the proportion of patients estimated to have a suppressed viral load in a given year from 2010 to 2018 was 75% if the threshold for suppression was set at <20 copies/mL but 89% if the threshold was set at <400 copies/mL (24). Failure to suppress viral load below a lower threshold may also be a more sensitive indicator of subsequent morbidity and mortality (24–28), but suppression below a higher threshold is more relevant as an indicator of an individual’s transmission potential (29, 30), so our choice of threshold may depend on how our results will be used. Additionally, not everyone in either of our candidate study samples will have had a viral load measurement on December 31, 2019, exactly. Typically, researchers accept viral loads measured within a time window around some key date as indicative of the viral load on that key date. We must decide how wide a window we are willing to use to answer our question. The width we are willing to tolerate might depend on how frequently we anticipate viral load changes in the population. A wider window risks assigning a viral load value to December 31 that is inaccurate because viral load has changed since measurement, while a narrower window will result in a larger proportion of the cohort with a missing viral-load value.

We have multiple options for measures of occurrence, and like the proverbial blind men feeling the elephant, our choice of measure of occurrence might give us only part of the complete picture about the distribution of the outcome in the target population. Incidence tells us something about how frequently an event occurs over time. There are multiple measures of incidence; in the interest of space, we will restrict our discussion to risks and rates. If individuals are not followed over time and the event can recur, it may be difficult to distinguish the number of affected individuals from the number of events. Prevalent outcomes are often not of interest in causal investigations, as temporality is more challenging to determine and reverse causation is a potential problem. In addition, survival bias might affect results when considering prevalent exposures (31, 32). Finally, prevalence is a function of the incidence of the condition and its duration, such that, if incidence is what is relevant to the question at hand, prevalence might be a misleading proxy. However, for descriptive questions designed to inform public-health planning for secondary or tertiary prevention measures, prevalence might be the most relevant measure of occurrence, as it reflects the population of people who might access those services.
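The dependence of prevalence on incidence and duration can be written out explicitly. As a sketch under assumptions that the text does not state (a steady-state population, a constant incidence rate $I$, and a mean duration of the condition $\bar{D}$), the standard textbook relation for prevalence $P$ is:

```latex
\frac{P}{1 - P} = I \times \bar{D}
\qquad\Longrightarrow\qquad
P = \frac{I\,\bar{D}}{1 + I\,\bar{D}} \approx I\,\bar{D}
\quad \text{when } P \text{ is small.}
```

Under this relation, two populations with the same incidence rate can have very different prevalences if the condition lasts longer in one of them, which is why prevalence can mislead when incidence is the quantity of interest.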

Risk (the proportion of people free from disease at baseline who develop the outcome during the study period) is the foundation of many causal epidemiologic studies (33), particularly as the target trial framework (1) has gained in popularity. Risk is arguably the most easily interpretable measure of disease occurrence for the general public (33). We discuss rates (the number of events divided by a sum of person-time) as an alternative measure of incidence in a few paragraphs. Two complications for obtaining valid estimates of either measure of incidence, however, are competing events and incompletely observed person-time (left-truncation and right-censoring).

Competing events are events that preclude the event of interest from occurring and are theoretical if not practical problems for all outcomes other than all-cause mortality (34). In the presence of competing events, we have the option to report the conditional or unconditional risk (i.e., cumulative incidence function) (35). The conditional risk is the proportion of people free from disease at baseline that we would expect to develop the outcome during the study period if all competing events were prevented without changing the hazard of the event of interest; it is the risk “conditional” on removal of the competing event. It is estimated by censoring persons who experience a competing event and is the first and sometimes only estimand of risk that students of epidemiology are taught (36). It is also implied by the exponential formula for converting rates to risks. However, complete removal of the competing event is a hypothetical intervention, and the conditional risk is the risk under that often-infeasible intervention. If our goal is to describe the world as it exists, absent hypothetical interventions, the cumulative incidence function is recommended when the number of competing events is nontrivial (37). The cumulative incidence function (or, as is implied but is a less commonly used term, the unconditional risk) is the proportion of people free from disease at baseline who would develop the outcome of interest during the study period in the real world in which a competing event might remove them from follow-up and preclude them from ever developing the outcome of interest.
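The gap between the two estimands can be seen in a small computation. The following is a minimal sketch, not the authors' analysis: the function name `risks_by_time`, the status coding (0 = censored, 1 = event of interest, 2 = competing event), and the ten follow-up records are hypothetical. It computes the cumulative incidence function with the Aalen-Johansen estimator, and the conditional risk as one minus the Kaplan-Meier survival obtained when competing events are treated as censored.

```python
def risks_by_time(records, horizon):
    """Return (cumulative incidence, conditional risk) for the event of interest.

    records: list of (time, status); status 0 = censored, 1 = event of interest,
             2 = competing event.
    horizon: only events at or before this time are counted.
    """
    event_times = sorted({t for t, status in records if status != 0 and t <= horizon})
    overall_survival = 1.0  # probability of no event of either type so far
    km_survival = 1.0       # Kaplan-Meier survival with competing events censored
    cumulative_incidence = 0.0
    for t in event_times:
        at_risk = sum(1 for u, _ in records if u >= t)
        d_interest = sum(1 for u, s in records if u == t and s == 1)
        d_competing = sum(1 for u, s in records if u == t and s == 2)
        # Aalen-Johansen step: must be event-free just before t, then have the event at t.
        cumulative_incidence += overall_survival * d_interest / at_risk
        overall_survival *= 1.0 - (d_interest + d_competing) / at_risk
        km_survival *= 1.0 - d_interest / at_risk
    return cumulative_incidence, 1.0 - km_survival


# Hypothetical cohort of 10 people followed for 10 time units, with no loss to follow-up:
# 4 events of interest, 3 competing events, 3 people event-free at the end.
records = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 1), (7, 2), (8, 1),
           (10, 0), (10, 0), (10, 0)]

cif, conditional_risk = risks_by_time(records, horizon=10)
print(f"cumulative incidence: {cif:.3f}, conditional risk: {conditional_risk:.3f}")
```

Because nobody in this made-up cohort is lost to follow-up, the cumulative incidence matches the simple proportion who had the event of interest (4 of 10), whereas censoring at competing events yields a larger number (about 0.51) that describes a world in which the competing event has been removed.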

Risks can be calculated in the presence of late entries (left-truncation) and loss to follow-up (right-censoring) under strong assumptions about independence between entering/leaving the study and risk of the outcome (38, 39). Standard methods for handling left-truncation and right-censoring implicitly impute outcomes for people who did not survive to enroll in the study sample and for people who are censored (38). We can adjust for possible associations between censoring and the outcome (and resultant selection bias) using inverse probability of censoring weights (40). However, the resultant risks are interpretable as the risk that would have been observed if no one were lost to follow-up (a hypothetical intervention), and will be different from the natural course if loss to follow-up was associated with the outcome in ways not captured by covariates in the weight model or if loss to follow-up itself directly altered the risk of the outcome (18, 40).

Finally, rates may occasionally be a useful measure of incidence as an alternative to risks, especially for descriptive studies. Risks are only defined relative to a population free of, and biologically at risk for, the outcome at a particular time origin. When we would like to describe incidence across a time metric along which not all people were biologically at risk at the time origin, rates can appropriately exclude person-time not at risk and allow for reporting of smoothed incidence estimates. For example, when describing temporal trends for the incidence of HIV diagnoses since the beginning of the epidemic in the 1980s, there will be people who were not born (not at risk for the outcome) in the 1980s who should be counted in the target population in the 2010s. Perhaps in an idealized descriptive study, we would report the daily risk of HIV diagnosis restricted to people who were alive and at risk for HIV diagnosis at the start of each day. However, across 3 decades this may be computationally intensive and impractical given the granularity of data collection and reporting. We might instead report weekly, monthly, or yearly HIV diagnosis risk, but the wider the time interval across which we measure risk becomes, the greater the number of people in our target population who are not at risk at the start of the interval. How should we treat people born in December 1990 when calculating the risk of HIV diagnosis in 1990? In contrast, if we are willing to assume that the rate of HIV diagnosis across a calendar year is approximately constant, or if we assume that the average rate is a reasonable representation of the incidence in that year, rates could appropriately exclude person-time in which people are not biologically at risk. The assumption of a constant rate or the acceptability of an average rate for answering the study question should be plausible across the time intervals chosen, or time should be further discretized. Another benefit of rates is that they are straightforward to estimate when we do not have individual-level data, which is more common in descriptive analyses than in causal or predictive epidemiologic analyses. For example, rates are the standard measure of incidence used for notifiable diseases, where health departments count case reports to get the numerator and use midyear census estimates for the denominator.
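The surveillance-style rate calculation can be sketched in a few lines. This is an illustration only: the function names and the case and population counts are hypothetical, the midyear population is used as an approximation of person-years at risk over one calendar year, and the rate-to-risk conversion uses the exponential formula, which assumes a constant rate and no competing events.

```python
import math


def incidence_rate(cases, person_years, per=100_000):
    """Events divided by person-time, scaled to a convenient denominator."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * per


def risk_from_rate(rate_per_person_year, years):
    """Exponential formula: risk = 1 - exp(-rate * time), assuming a constant rate."""
    return 1.0 - math.exp(-rate_per_person_year * years)


# Hypothetical jurisdiction and year.
cases_reported = 250
midyear_population = 1_250_000  # stands in for person-years at risk during the year

rate = incidence_rate(cases_reported, midyear_population)
one_year_risk = risk_from_rate(cases_reported / midyear_population, years=1)
print(f"rate: {rate:.1f} per 100,000 person-years; implied 1-year risk: {one_year_risk:.6f}")
```

For a rare outcome over a short interval, the implied risk is nearly identical to the rate multiplied by time; the two diverge as the rate or the interval grows.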

Example: We have clearly specified in our research question that we are interested in the prevalence of viral suppression on December 31, 2019. People in our study sample with no viral load measurement in 2019 are lost to follow-up. Viral suppression is influenced by access to health care and is only possible if people are receiving antiretroviral therapy (except, in rare cases, for elite controllers) (41). In this setting, people who are lost to follow-up may have transferred to another clinic and may still be receiving treatment (if we are using NA-ACCORD data) or may have moved out of the jurisdiction (if we are using surveillance data), and we might assume that they have the same probability of viral suppression as people with a viral load measurement (censoring is appropriate; equivalently, we can restrict analyses to people with a measured viral load) (24). Alternatively, people who do not have a viral load measurement may have dropped out of clinical care and may not have access to antiretroviral therapy. The probability of viral suppression among these individuals is near 0 (we might think of loss to follow-up as a competing event and assign a value of “not suppressed” to persons who are lost to follow-up) (42). Understanding the assumptions and implications of different analytical decisions for these people is critical for making the right inference about the prevalence of the outcome.
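The two analytical choices in this example lead to different estimates from the same data, which a short calculation makes concrete. The counts and the function name below are hypothetical and serve only as a sketch of the two assumptions described above.

```python
def prevalence_under_assumptions(n_suppressed, n_not_suppressed, n_missing):
    """Return prevalence of viral suppression under two treatments of missing viral loads.

    complete_case: people without a measurement are assumed to resemble those with one.
    missing_as_not_suppressed: people without a measurement are assumed not suppressed.
    """
    n_measured = n_suppressed + n_not_suppressed
    n_total = n_measured + n_missing
    complete_case = n_suppressed / n_measured
    missing_as_not_suppressed = n_suppressed / n_total
    return complete_case, missing_as_not_suppressed


# Hypothetical study sample of 1,000 people.
complete_case, missing_as_not_suppressed = prevalence_under_assumptions(
    n_suppressed=700, n_not_suppressed=100, n_missing=200
)
print(f"complete case: {complete_case:.3f}")
print(f"missing counted as not suppressed: {missing_as_not_suppressed:.3f}")
```

If the truth lies somewhere between the two assumptions, the two numbers can be read as rough bounds on the prevalence in the study sample.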

When describing the prevalence or incidence of an outcome, we sometimes want to characterize the people who got the outcome according to covariates. Alternatively, we may want to account for nuisance variables, such as factors that differ between the study sample and the target population or between groups we plan to stratify by. When characterizing groups with the highest incidence of the outcome, bivariate results can make it challenging to understand how covariates interact to determine the distribution of disease. For example, if the prevalence of viral suppression is lower for cisgender women than for cisgender men and lower for Black patients than for White patients (43), what would we expect to see regarding the prevalence of viral suppression for cisgender White women relative to cisgender Black men? Stratifying on multiple variables simultaneously might be helpful in this setting, or we may want to employ theoretical models (e.g., conceptual frameworks for how variables influence risk of the outcome) or statistical strategies (e.g., supervised machine learning) to identify the most important variables if there are not enough data to stratify on all variables of interest. Conversely, when trying to understand whether one covariate is associated with the distribution of disease independently or merely because of its correlation with another covariate, a common approach is to put all covariates into a single model. However, this approach can lead to incorrect interpretations of the results and inappropriate recommendations for actions (44). Adjustment implies an intervention on the data and a distortion of reality; for example, an adjusted comparison answers the question, “Would Black people still have lower prevalence of viral suppression if they had the same distribution of HIV acquisition risk factors as White people?” Inappropriate adjustment may understate the magnitude of disparities (45), and adjusted statistics are prone to be interpreted causally, which could lead to inappropriate recommendations (9). We endorse reporting and primary interpretation of unadjusted results for descriptive studies and clear justification and proper interpretation in cases where adjustments are made.

Descriptive epidemiologic studies seek to characterize what is happening in the world to inform public health priorities, target interventions, and occasionally contrast with counterfactual scenarios to estimate intervention effects (46, 47). Descriptive studies have value in their own right and not merely as stepping stones toward causal inference. Characterizing what is happening in the world requires that we be very clear about the particular slice of the world and the specific outcome we hope to study. Generalizability and selection biases can bias descriptive studies when study participation is associated with the outcome. Measurement error can bias descriptive studies when we do not use, or there is no gold-standard measure of, the outcome. Different measures of occurrence will provide different pictures of what is happening in the world. Censoring people who have a competing event or adjusting for covariates implies interventions on the data such that the results are a distorted version of reality. These are all basic epidemiologic principles that also affect the success of our attempts at causal effect estimation. Performing rigorous descriptive studies that accurately estimate a parameter of interest and are interpretable to clinicians and policy-makers will improve public health.

Author affiliations: Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States (Catherine R. Lesko); Departments of Epidemiology and Global Health, School of Public Health, Boston University, Boston, Massachusetts, United States (Matthew P. Fox); and Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States (Jessie K. Edwards).

This work was supported by grants K01 AA028193, K01 AI125087, and R01 AI157758 from the National Institutes of Health.

Conflict of interest: none declared.

Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–764.

Petersen ML, van der Laan MJ. Causal models and learning from data: integrating causal modeling and statistical estimation. Epidemiology. 2014;25(3):418–426.

von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg. 2014;12(12):1495–1499.

Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12(3):313–320.

Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med. 2007;26(1):20–36.

Petersen ML. Commentary: applying a causal road map in settings with time-dependent confounding. Epidemiology. 2014;25(6):898–901.

Fox MP, Edwards JK, Platt R, et al. The critical importance of asking good questions: the role of epidemiology doctoral training programs. Am J Epidemiol. 2020;189(4):261–264.

Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–960.

Tennant PWG, Murray EJ. The quest for timely insights into COVID-19 should not come at the cost of scientific rigor. Epidemiology. 2021;32(1):e2.

Kaufman JS. Statistics, adjusted statistics, and maladjusted statistics. Am J Law Med. 2017;43(2-3):193–208.

Gardner EM, McLees MP, Steiner JF, et al. The spectrum of engagement in HIV care and its relevance to test-and-treat strategies for prevention of HIV infection. Clin Infect Dis. 2011;52(6):793–800.

Lee KK, Fitts MS, Conigrave JH, et al. Recruiting a representative sample of urban South Australian Aboriginal adults for a survey on alcohol consumption. BMC Med Res Methodol. 2020;20(1):183.

Offord C. How (not) to do an antibody survey for SARS-CoV-2. Scientist. https://www.the-scientist.com/news-opinion/how-not-to-do-an-antibody-survey-for-sars-cov-2-67488 . Published April 28, 2020. Accessed April 8, 2022.

Lesko CR, Buchanan AL, Westreich D, et al. Generalizing study results: a potential outcomes perspective. Epidemiology. 2017;28(4):553–561.

Dahabreh IJ, Robertson SE, Tchetgen EJ, et al. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics. 2019;75(2):685–694.

Cole SR, Stuart EA. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 Trial. Am J Epidemiol. 2010;172(1):107–115.

Westreich   D . Berkson’s bias, selection bias, and missing data . Epidemiology .   2012 ; 23 ( 1 ): 159 – 164 .

Hernán   MA . Invited commentary: selection bias without colliders . Am J Epidemiol .   2017 ; 185 ( 11 ): 1048 – 1050 .

Rebeiro   PF , Althoff   KN , Lau   B , et al.    Laboratory measures as proxies for primary care encounters: implications for quantifying clinical retention among HIV-infected adults in North America . Am J Epidemiol .   2015 ; 182 ( 11 ): 952 – 960 .

Lesko   CR , Sampson   LA , Miller   WC , et al.    Measuring the HIV care continuum using public health surveillance data in the United States . J Acquir Immune Defic Syndr .   2015 ; 70 ( 5 ): 489 – 494 .

Gange   SJ , Kitahata   MM , Saag   MS , et al.    Cohort profile: the North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) . Int J Epidemiol .   2007 ; 36 ( 2 ): 294 – 301 .

Edwards   JK , Cole   SR , Westreich   D . All your data are always missing: incorporating bias due to measurement error into the potential outcomes framework . Int J Epidemiol .   2015 ; 44 ( 4 ): 1452 – 1459 .

McMahon   JH , Elliott   JH , Bertagnolio   S , et al.    Viral suppression after 12 months of antiretroviral therapy in low- and middle-income countries: a systematic review . Bull World Health Organ .   2013 ; 91 ( 5 ): 377 – 385E .

Lesko   CR , Chander   G , Moore   RD , et al.    Variation in estimated viral suppression associated with the definition of viral suppression used . AIDS .   2020 ; 34 ( 10 ): 1519 – 1526 .

Hermans   LE , Moorhouse   M , Carmona   S , et al.    Effect of HIV-1 low-level viraemia during antiretroviral therapy on treatment outcomes in WHO-guided South African treatment programmes: a multicentre cohort study . Lancet Infect Dis .   2018 ; 18 ( 2 ): 188 – 197 .

Elvstam   O , Medstrand   P , Yilmaz   A , et al.    Virological failure and all-cause mortality in HIV-positive adults with low-level viremia during antiretroviral treatment . PLoS One .   2017 ; 12 ( 7 ):e0180761.

Antiretroviral Therapy Cohort Collaboration , Vandenhende   MA , Ingle   S , et al.    Impact of low-level viremia on clinical and virological outcomes in treated HIV-1-infected patients . AIDS .   2015 ; 29 ( 3 ): 373 – 383 .

Laprise   C , de   Pokomandy   A , Baril   J-G , et al.    Virologic failure following persistent low-level viremia in a cohort of HIV-positive patients: results from 12 years of observation . Clin Infect Dis .   2013 ; 57 ( 10 ): 1489 – 1496 .

Lesko   CR , Lau   B , Chander   G , et al.    Time spent with HIV viral load >1500 copies/mL among persons engaged in continuity HIV care in an urban clinic in the United States, 2010–2015 . AIDS Behav . 2018 ; 22 ( 11 ): 3443 – 3450 .

Quinn   TC , Wawer   MJ , Sewankambo   N , et al.    Viral load and heterosexual transmission of human immunodeficiency virus type 1. Rakai Project Study Group . N Engl J Med .   2000 ; 342 ( 13 ): 921 – 929 .

Prentice   RL , Chlebowski   RT , Stefanick   ML , et al.    Estrogen plus progestin therapy and breast cancer in recently postmenopausal women . Am J Epidemiol .   2008 ; 167 ( 10 ): 1207 – 1216 .

Lund   JL , Richardson   DB , Stürmer   T . The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application . Curr Epidemiol Rep .   2015 ; 2 ( 4 ): 221 – 228 .

Cole   SR , Hudgens   MG , Brookhart   MA , et al.    Risk . Am J Epidemiol .   2015 ; 181 ( 4 ): 246 – 250 .

Lau   B , Cole   SR , Gange   SJ . Competing risk regression models for epidemiologic data . Am J Epidemiol .   2009 ; 170 ( 2 ): 244 – 256 .

Edwards   JK , Hester   LL , Gokhale   M , et al.    Methodologic issues when estimating risks in pharmacoepidemiology . Curr Epidemiol Rep .   2016 ; 3 ( 4 ): 285 – 296 .

Rothman   KJ , Lash   TL , VanderWeele   TJ , et al.  Measures of occurrence. In: Modern Epidemiology . 4th ed.   Philadelphia, PA : Wolters Kluwer N.V. ; 2021 : 53 – 77 .

Google Preview

Cole   SR , Lau   B , Eron   JJ , et al.    Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy . Am J Epidemiol .   2015 ; 181 ( 4 ): 238 – 245 .

Cole   SR , Edwards   JK , Naimi   AI , et al.    Hidden imputations and the Kaplan-Meier estimator . Am J Epidemiol .   2020 ; 189 ( 11 ): 1408 – 1411 .

Lesko   CR , Edwards   JK , Cole   SR , et al.    When to censor?   Am J Epidemiol .   2018 ; 187 ( 3 ): 623 – 632 .

Howe   CJ , Cole   SR , Lau   B , et al.    Selection bias due to loss to follow up in cohort studies . Epidemiology .   2016 ; 27 ( 1 ): 91 – 97 .

Okulicz   JF , Marconi   VC , Landrum   ML , et al.    Clinical outcomes of elite controllers, viremic controllers, and long-term nonprogressors in the US Department of Defense HIV Natural History Study . J Infect Dis .   2009 ; 200 ( 11 ): 1714 – 1723 .

Edwards   JK , Lesko   CR , Herce   ME , et al.    Gone but not lost: implications for estimating HIV care outcomes when loss to clinic is not loss to care . Epidemiology .   2020 ; 31 ( 4 ): 570 – 577 .

Centers for Disease Control and Prevention . Monitoring Selected National HIV Prevention and Care Objectives by Using HIV Surveillance Data—United States and 6 Dependent Areas, 2019 . ( HIV Surveillance Supplemental Report , vol. 26, no. 2) . Atlanta, GA : Centers for Disease Control and Prevention ; 2021 . https://www.cdc.gov/hiv/pdf/library/reports/surveillance/cdc-hiv-surveillance-report-vol-26-no-2.pdf . Accessed November 29, 2021 .

Westreich   D , Greenland   S . The table 2 fallacy: presenting and interpreting confounder and modifier coefficients . Am J Epidemiol .   2013 ; 177 ( 4 ): 292 – 298 .

Zalla   LC , Martin   CL , Edwards   JK , et al.    A geography of risk: structural racism and COVID-19 mortality in the United States . Am J Epidemiol .   2021 ; 190 ( 8 ): 1439 – 1446 .

Westreich   D . From exposures to population interventions: pregnancy and response to HIV therapy . Am J Epidemiol .   2014 ; 179 ( 7 ): 797 – 806 .

Edwards   JK , Cole   SR , Lesko   CR , et al.    An illustration of inverse probability weighting to estimate policy-relevant causal effects . Am J Epidemiol .   2016 ; 184 ( 4 ): 336 – 344 .

  • epidemiology
  • epidemiologic studies
  • stratification
  • outcome measures
  • measurement error
  • missing data
  • viral suppression
  • data analysis

Introduction to Quantitative Epidemiology

  • First Online: 22 February 2022

  • Xinguang Chen 7  

Part of the book series: Emerging Topics in Statistics and Biostatistics ((ETSB))

Epidemiology is essential for education, research, and practice in public health and medicine. As a scientific discipline, epidemiology covers four major tasks: descriptive, etiological, translational, and methodological epidemiology. Descriptive epidemiology aims at quantifying the distribution of medical, health, or behavioral issues among people residing in a geographic area over time; etiological epidemiology is devoted to understanding the causes and influential factors of any medical, health, or behavioral issue, from onset to progression and prognosis; translational epidemiology focuses on translating findings from descriptive and etiological epidemiology into interventions for disease prevention, treatment, and health promotion; and methodological epidemiology strives to develop new methods and innovatively use existing methods to deal with challenges in epidemiological research and practice.

Numbers speak louder than words.

Author information

Authors and affiliations

Department of Epidemiology, University of Florida, Gainesville, FL, USA

Xinguang Chen

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Chen, X. (2021). Introduction to Quantitative Epidemiology. In: Quantitative Epidemiology. Emerging Topics in Statistics and Biostatistics . Springer, Cham. https://doi.org/10.1007/978-3-030-83852-2_1

DOI : https://doi.org/10.1007/978-3-030-83852-2_1

Published : 22 February 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-83851-5

Online ISBN : 978-3-030-83852-2

eBook Packages : Mathematics and Statistics Mathematics and Statistics (R0)

  • Chapter 1. What is epidemiology?
      Diagnosed by clinician and confirmed by pathologist: 53
      Diagnosed by clinician and not confirmed by pathologist: 21
      First diagnosed post mortem: 22
      Farmers (self employed): 82%
      Professionals: 77%
      Skilled manual workers: 69%
      Labourers: 63%
      Armed forces: 42%
  • Chapter 2. Quantifying disease in populations
  • Chapter 3. Comparing disease rates
  • Chapter 4. Measurement error and bias
  • Chapter 5. Planning and conducting a survey
  • Chapter 6. Ecological studies
  • Chapter 7. Longitudinal studies
  • Chapter 8. Case-control and cross sectional studies
  • Chapter 9. Experimental studies
  • Chapter 10. Screening
  • Chapter 11. Outbreaks of disease
  • Chapter 12. Reading epidemiological reports
  • Chapter 13. Further reading

Epidemiology articles within Scientific Reports

Article 11 August 2024 | Open Access

Comparative analysis of feature selection techniques for COVID-19 dataset

  • Farideh Mohtasham, MohamadAmin Pourhoseingholi & Mohammad Reza Zali

Article 09 August 2024 | Open Access

Assessing genetic and environmental components for pterygium: a nationwide study in Taiwan

  • Jiahn-Shing Lee, Wei-Min Chen & Lai-Chu See

A study on factors influencing delayed sputum conversion in newly diagnosed pulmonary tuberculosis based on bacteriology and genomics

  • Mengdi Pang, Xiaowei Dai & Chuanyou Li

Article 07 August 2024 | Open Access

Evidence of higher suicidal ideation among young adults in Canada during the COVID-19 pandemic

  • Guillaume Dubé, Robin Legault & Éric Lacourse

Article 06 August 2024 | Open Access

Asymptomatic infection and disappearance of clinical symptoms of COVID-19 infectors in China 2022–2023: a cross-sectional study

  • Kaige Zhang, Xiaoying Zhong & Hongcai Shang

A better performing algorithm for identification of implausible growth data from longitudinal pediatric medical records

  • Kylie K. Harrall, Sarah M. Bird & Deborah H. Glueck

Article 02 August 2024 | Open Access

Assessment of mental well-being and its socio-economic determinants among older adults in the Rohingya refugee camp of Bangladesh

  • Afsana Anwar, Nahida Akter & Sabuj Kanti Mistry

Article 01 August 2024 | Open Access

The public health impact of COVID-19 variants of concern on the effectiveness of contact tracing in Vermont, United States

  • François M. Castonguay, Brian F. Borah & Martin I. Meltzer

Adverse effects of meteorological factors and air pollutants on dry eye disease: a hospital-based retrospective cohort study

  • Yun-Hee Choi, Myung-Sun Song & Dong Hyun Kim

A methylation risk score for chronic kidney disease: a HyperGEN study

  • Alana C. Jones, Amit Patki & Marguerite R. Irvin

Inappropriate antibiotic access practices at the community level in Eastern Ethiopia

  • Dumessa Edessa, Fekede Asefa Kumsa & Lemessa Oljira

Article 31 July 2024 | Open Access

Missed Follow-up is associated with worse survival in stage I lung cancer: results from a large multi-site academic hospital system

  • Ethan M. Steele, Heather N. Burney & Jordan A. Holmes

Prevalence and incidence of venous thromboembolism in geriatric patients admitted to long-term care hospitals

  • Gernot Wagner, Daniel Steiner & Cihan Ay

Impact of radiotherapy on second primary lung cancer incidence and survival in esophageal cancer survivors

  • , Dinghang Chen
  •  &  Mingqiang Kang

Article 30 July 2024 | Open Access

Emotional and cognitive influences on alcohol consumption in middle-aged and elderly Tanzanians: a population-based study

  • , Patrick Kazonda
  •  &  Till Bärnighausen

Genetic predictors of blood pressure traits are associated with preeclampsia

  • Elizabeth A. Jasper, Jacklyn N. Hellwege & Digna R. Velez Edwards

The prevalence and determinants of diabetes mellitus and thyroid disorder comorbidity in Tabari cohort population

  • Mahmood Moosazadeh, Saeedeh Khakhki & Erfan Ghadirzadeh

Article 29 July 2024 | Open Access

Survey on nurse-physician communication gaps focusing on diagnostic concerns and reasons for silence

  • Taiju Miyagami, Takashi Watari & Toshio Naito

Effects of feeding patterns during the first 6 months on weight development of infants ages 0–12 months: a longitudinal study

  • Chun-ying Zhang & Ai-qun Huang

Article 26 July 2024 | Open Access

Intervention effect of targeted workplace closures may be approximated by single-layered networks in an individual-based model of COVID-19 control

  • Maximilian Richter, Melissa A. Penny & Andrew J. Shattock

Article 24 July 2024 | Open Access

Association of meeting the 24-h movement guidelines with high blood pressure in adolescents: a cross-sectional study

  • Diego G. D. Christofaro, Gerson Ferrari & Javier Brazo-Sayavera

Time series analysis of the interaction between ambient temperature and air pollution on hospitalizations for AECOPD in Ganzhou, China

  • Chenyang Shi, Jinyun Zhu & Yanbin Hao

Article 23 July 2024 | Open Access

Trends in poisoning associated with the use of insecticides for bed bug infestations: a 20-year retrospective study in France

  • Hervé Laborde-Castérot, Dominique Vodovar & Laurine Le Visage

The incidence and prevalence of ankylosing spondylitis in Thailand using ministry of public health database

  • Ajanee Mahakkanukrauh, Siraphop Suwannaroj & Chingching Foocharoen

Article 22 July 2024 | Open Access

Predictive factors for perinatal bacterial transmission from colonized mothers to delivered very-low-birth-weight infants: a retrospective cohort study

  • Jieun Hwang, Sumin Kim & Yun Sil Chang

Alcohol consumption among Iranian population based on the findings of STEPS survey 2021

  • Amirali Hajebi, Maryam Nasserinejad & Farshad Farzadfar

Article 20 July 2024 | Open Access

Interaction between ambient CO and temperature or relative humidity on the risk of stroke hospitalization

  •  &  Peifeng Liang

Article 19 July 2024 | Open Access

Foodborne bacteria in milk and milk products along the water buffalo milk chain in Bangladesh

  • Shuvo Singha, Gerrit Koop & Cristina Lecchi

Three years of COVID-19-related school restrictions and mental health of children and adolescents in Japan

  • , Naohisa Shobako
  •  &  Taisuke Nakata

Association between dietary vitamin A intake and risk of cardiometabolic multimorbidity

  •  &  Guiyuan Qiao

Article 18 July 2024 | Open Access

The association between visceral adiposity index and risk of type 2 diabetes mellitus

  • Haoran Zhou, Tianshu Li & Jie Yang

Article 17 July 2024 | Open Access

Pulmonary function trajectories in COVID-19 survivors with and without pre-existing respiratory disease

  • Debbie Gach, Rosanne J. H. C. G. Beijers & Frits H. M. van Osch

Association between eating speed and atherosclerosis in relation to growth differentiation factor-15 levels in older individuals in a cross-sectional study

  • Yuji Shimizu, Shin-Ya Kawashiri & Naomi Hayashida

Article 16 July 2024 | Open Access

At-admission prediction of mortality and pulmonary embolism in an international cohort of hospitalised patients with COVID-19 using statistical and machine learning methods

  • Munib Mesinovic, Xin Ci Wong & David Zucman

Modeling pediatric antibiotic use in an area of declining malaria prevalence

  • Lydia Helen Rautman, Daniel Eibach & Ralf Krumkamp

Article 15 July 2024 | Open Access

Assessment of domestic pig–bushpig ( Potamochoerus larvatus ) interactions through local knowledge in rural areas of Madagascar

  • Rianja Rakotoarivony, Daouda Kassie & Ferran Jori

Genomic surveillance of malaria parasites in an indigenous community in the Peruvian Amazon

  • Luis Cabrera-Sosa, Oscar Nolasco & Christopher Delgado-Ratto

Article 14 July 2024 | Open Access

Association between self-reported and objectively assessed physical functioning in the general population

  • Nicola Moser, Floran Sahiti & Caroline Morbach

Article 12 July 2024 | Open Access

Levels of Sex Hormones and Abdominal Muscle Composition in Men from The Multi-Ethnic Study of Atherosclerosis

  • Amar Osmancevic, Matthew Allison & Bledar Daka

Article 11 July 2024 | Open Access

The prevalence, risk factors, and outcomes of acute pulmonary embolism complicating sepsis and septic shock: a national inpatient sample analysis

  • Daisuke Hasegawa, Ryota Sato & David Steiger

Population based study on the progress in survival of primarily metastatic lung cancer patients in Germany

  • Therese Tzschoppe, Julia Ohlinger & Daniel Medenwald

Early mutational signatures and transmissibility of SARS-CoV-2 Gamma and Lambda variants in Chile

  • Karen Y. Oróstica, Sebastian B. Mohr & Seba Contreras

Article 10 July 2024 | Open Access

Seroprevalence of adeno-associated virus types 1, 2, 3, 4, 5, 6, 8, and 9 in a Basque cohort of healthy donors

  • Miguel Navarro-Oliveros, Ander Vidaurrazaga & Nicola G. A. Abrescia

Asymmetric affective polarization regarding COVID-19 vaccination in six European countries

  • Maximilian Filsinger & Markus Freitag

HIV-1 diversity and pre-treatment drug resistance in the era of integrase inhibitor among newly diagnosed ART-naïve adult patients in Luanda, Angola

  • Cruz S. Sebastião, Ana B. Abecasis & Victor Pimentel

Article 09 July 2024 | Open Access

Antidepressant drug use after intensive care: a nationwide cohort study

  • Erik von Oelreich, Jesper Eriksson & Anders Oldner

Article 08 July 2024 | Open Access

The impact of the COVID-19 pandemic on Staphylococcus aureus infections in pediatric patients admitted with community acquired pneumonia

  • , Liang Fang
  •  &  Fang Gong

Epidemiologic relationship between alcohol flushing and smoking in the Korean population: the Korea National Health and Nutrition Examination Survey

  • Hwa Jung Yook, Gyu-Na Lee & Young Min Park

Integrating socio-psychological factors in the SEIR model optimized by a genetic algorithm for COVID-19 trend analysis

  • Haonan Wang, Danhong Wu & Junhui Zhang

Article 06 July 2024 | Open Access

A health economic pilot study comparing two diabetic retinopathy screening strategies

  • Ellen Steffenssen Sauesund, Silvia N. W. Hertzberg & Goran Petrovski

  • Research article
  • Open access
  • Published: 04 June 2021

Coronavirus disease (COVID-19) pandemic: an overview of systematic reviews

  • Israel Júnior Borges do Nascimento 1 , 2 ,
  • Dónal P. O’Mathúna 3 , 4 ,
  • Thilo Caspar von Groote 5 ,
  • Hebatullah Mohamed Abdulazeem 6 ,
  • Ishanka Weerasekara 7 , 8 ,
  • Ana Marusic 9 ,
  • Livia Puljak   ORCID: orcid.org/0000-0002-8467-6061 10 ,
  • Vinicius Tassoni Civile 11 ,
  • Irena Zakarija-Grkovic 9 ,
  • Tina Poklepovic Pericic 9 ,
  • Alvaro Nagib Atallah 11 ,
  • Santino Filoso 12 ,
  • Nicola Luigi Bragazzi 13 &
  • Milena Soriano Marcolino 1

On behalf of the International Network of Coronavirus Disease 2019 (InterNetCOVID-19)

BMC Infectious Diseases volume 21, Article number: 525 (2021)

Background

Navigating the rapidly growing body of scientific literature on the SARS-CoV-2 pandemic is challenging, and ongoing critical appraisal of this output is essential. We aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.

Methods

Nine databases (Medline, EMBASE, Cochrane Library, CINAHL, Web of Science, PDQ-Evidence, WHO’s Global Research, LILACS, and Epistemonikos) were searched from December 1, 2019, to March 24, 2020. Systematic reviews analyzing primary studies of COVID-19 were included. Two authors independently undertook screening, selection, extraction (data on clinical symptoms, prevalence, pharmacological and non-pharmacological interventions, diagnostic test assessment, laboratory, and radiological findings), and quality assessment (AMSTAR 2). A meta-analysis was performed of the prevalence of clinical outcomes.

Results

Eighteen systematic reviews were included; one was empty (did not identify any relevant study). Using AMSTAR 2, confidence in the results of all 18 reviews was rated as “critically low”. Identified symptoms of COVID-19 were (range values of point estimates): fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%) and gastrointestinal complaints (5–9%). Severe symptoms were more common in men. Elevated C-reactive protein and lactate dehydrogenase, and slightly elevated aspartate and alanine aminotransferase, were commonly described. Thrombocytopenia and elevated levels of procalcitonin and cardiac troponin I were associated with severe disease. A frequent finding on chest imaging was uni- or bilateral multilobar ground-glass opacity. A single review investigated the impact of medication (chloroquine) but found no verifiable clinical data. All-cause mortality ranged from 0.3 to 13.9%.

Conclusions

In this overview of systematic reviews, we analyzed evidence from the first 18 systematic reviews that were published after the emergence of COVID-19. However, confidence in the results of all reviews was “critically low”. Thus, systematic reviews that were published early on in the pandemic were of questionable usefulness. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards.
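The abstract mentions a meta-analysis of the prevalence of clinical outcomes without stating the pooling method. As a rough illustration of what such a pooling step can look like, the sketch below combines study-level prevalences with inverse-variance weights on the logit scale (a fixed-effect model). The model choice, the 0.5 continuity correction, the function name `pooled_prevalence`, and the study counts are all assumptions made for this example; none of them is taken from the overview.

```python
import math


def pooled_prevalence(studies):
    """Fixed-effect, inverse-variance pooling of prevalences on the logit scale.

    studies: list of (cases, sample_size) tuples.
    Returns (pooled_prevalence, lower_95, upper_95).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for cases, n in studies:
        # A 0.5 continuity correction keeps the logit finite at 0% or 100%.
        c = cases + 0.5
        non_c = n - cases + 0.5
        logit = math.log(c / non_c)
        variance = 1 / c + 1 / non_c
        weight = 1 / variance
        weighted_sum += weight * logit
        total_weight += weight
    pooled_logit = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return (
        expit(pooled_logit),
        expit(pooled_logit - 1.96 * se),
        expit(pooled_logit + 1.96 * se),
    )


# Hypothetical counts of patients with fever in three primary studies.
example_studies = [(80, 100), (170, 200), (45, 50)]
estimate, lower, upper = pooled_prevalence(example_studies)
print(f"Pooled prevalence: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A random-effects model would usually be preferred when studies differ in setting and case mix, as early COVID-19 studies did; the fixed-effect version is shown only because it is the shortest correct instance of inverse-variance pooling.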

The spread of the “Severe Acute Respiratory Syndrome Coronavirus 2” (SARS-CoV-2), the causal agent of COVID-19, was characterized as a pandemic by the World Health Organization (WHO) in March 2020 and has triggered an international public health emergency [ 1 ]. The numbers of confirmed cases and deaths due to COVID-19 are rapidly escalating, counting in millions [ 2 ], causing massive economic strain and escalating healthcare and public health expenses [ 3 , 4 ].

The research community has responded by publishing an impressive number of scientific reports related to COVID-19. The world was alerted to the new disease at the beginning of 2020 [ 1 ], and by mid-March 2020, more than 2000 articles had been published on COVID-19 in scholarly journals, with 25% of them containing original data [ 5 ]. The living map of COVID-19 evidence, curated by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre), contained more than 40,000 records by February 2021 [ 6 ]. More than 100,000 records on PubMed were labeled as “SARS-CoV-2 literature, sequence, and clinical content” by February 2021 [ 7 ].

Due to publication speed, the research community has voiced concerns regarding the quality and reproducibility of evidence produced during the COVID-19 pandemic, warning of the potentially damaging approach of “publish first, retract later” [ 8 ]. It appears that these concerns are not unfounded, as it has been reported that COVID-19 articles were overrepresented in the pool of retracted articles in 2020 [ 9 ]. These concerns about inadequate evidence are of major importance because they can lead to poor clinical practice and inappropriate policies [ 10 ].

Systematic reviews are a cornerstone of today’s evidence-informed decision-making. By synthesizing all relevant evidence regarding a particular topic, systematic reviews reflect the current scientific knowledge. Systematic reviews are considered to be at the highest level in the hierarchy of evidence and should be used to make informed decisions. However, with high numbers of systematic reviews of different scope and methodological quality being published, overviews of multiple systematic reviews that assess their methodological quality are essential [ 11 , 12 , 13 ]. An overview of systematic reviews helps identify and organize the literature and highlights areas of priority in decision-making.

In this overview of systematic reviews, we aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic.

Methodology

Research question

This overview’s primary objective was to summarize and critically appraise systematic reviews that assessed any type of primary clinical data from patients infected with SARS-CoV-2. Our research question was purposefully broad because we wanted to analyze as many systematic reviews as possible that were available early following the COVID-19 outbreak.

Study design

We conducted an overview of systematic reviews. The idea for this overview originated in a protocol for a systematic review submitted to PROSPERO (CRD42020170623), which indicated a plan to conduct an overview.

Overviews of systematic reviews use explicit and systematic methods for searching and identifying multiple systematic reviews addressing related research questions in the same field to extract and analyze evidence across important outcomes. Overviews of systematic reviews are in principle similar to systematic reviews of interventions, but the unit of analysis is a systematic review [ 14 , 15 , 16 ].

We used the overview methodology instead of other evidence synthesis methods to allow us to collate and appraise multiple systematic reviews on this topic, and to extract and analyze their results across relevant topics [ 17 ]. The overview and meta-analysis of systematic reviews allowed us to investigate the methodological quality of included studies, summarize results, and identify specific areas of available or limited evidence, thereby strengthening the current understanding of this novel disease and guiding future research [ 13 ].

A reporting guideline for overviews of reviews is currently under development, i.e., Preferred Reporting Items for Overviews of Reviews (PRIOR) [ 18 ]. As the PRIOR checklist is still not published, this study was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 statement [ 19 ]. The methodology used in this review was adapted from the Cochrane Handbook for Systematic Reviews of Interventions and also followed established methodological considerations for analyzing existing systematic reviews [ 14 ].

Approval of a research ethics committee was not necessary as the study analyzed only publicly available articles.

Eligibility criteria

Systematic reviews were included if they analyzed primary data from patients infected with SARS-CoV-2 as confirmed by RT-PCR or another pre-specified diagnostic technique. Eligible reviews covered all topics related to COVID-19 including, but not limited to, those that reported clinical symptoms, diagnostic methods, therapeutic interventions, laboratory findings, or radiological results. Both full manuscripts and abbreviated versions, such as letters, were eligible.

No restrictions were imposed on the design of the primary studies included within the systematic reviews, the last search date, the language of publication, or whether the review included meta-analyses. Reviews related to SARS-CoV-2 and other coronaviruses were eligible, but from those reviews, we analyzed only data related to SARS-CoV-2.

No consensus definition exists for a systematic review [ 20 ], and debates continue about the defining characteristics of a systematic review [ 21 ]. Cochrane’s guidance for overviews of reviews recommends setting pre-established criteria for making decisions around inclusion [ 14 ]. That is supported by a recent scoping review about guidance for overviews of systematic reviews [ 22 ].

Thus, for this study, we defined a systematic review as a research report which searched for primary research studies on a specific topic using an explicit search strategy, had a detailed description of the methods with explicit inclusion criteria provided, and provided a summary of the included studies either in narrative or quantitative format (such as a meta-analysis). Cochrane and non-Cochrane systematic reviews were considered eligible for inclusion, with or without meta-analysis, and regardless of the study design, language restriction and methodology of the included primary studies. To be eligible for inclusion, reviews had to be clearly analyzing data related to SARS-CoV-2 (associated or not with other viruses). We excluded narrative reviews without those characteristics as these are less likely to be replicable and are more prone to bias.

Scoping reviews and rapid reviews were eligible for inclusion in this overview if they met our pre-defined inclusion criteria noted above. We included reviews that addressed SARS-CoV-2 and other coronaviruses if they reported separate data regarding SARS-CoV-2.

Information sources

Nine databases were searched for eligible records published between December 1, 2019, and March 24, 2020: Cochrane Database of Systematic Reviews via Cochrane Library, PubMed, EMBASE, CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, LILACS (Latin American and Caribbean Health Sciences Literature), PDQ-Evidence, WHO’s Global Research on Coronavirus Disease (COVID-19), and Epistemonikos.

The comprehensive search strategy for each database is provided in Additional file 1 and was designed and conducted in collaboration with an information specialist. All retrieved records were primarily processed in EndNote, where duplicates were removed, and records were then imported into the Covidence platform [ 23 ]. In addition to database searches, we screened reference lists of reviews included after screening records retrieved via databases.

Study selection

All searches, screening of titles and abstracts, and record selection, were performed independently by two investigators using the Covidence platform [ 23 ]. Articles deemed potentially eligible were retrieved for full-text screening carried out independently by two investigators. Discrepancies at all stages were resolved by consensus. During the screening, records published in languages other than English were translated by a native/fluent speaker.

Data collection process

We custom designed a data extraction table for this study, which was piloted by two authors independently. Data extraction was performed independently by two authors. Conflicts were resolved by consensus or by consulting a third researcher.

We extracted the following data: article identification data (authors’ name and journal of publication), search period, number of databases searched, population or settings considered, main results and outcomes observed, and number of participants. From Web of Science (Clarivate Analytics, Philadelphia, PA, USA), we extracted journal rank (quartile) and Journal Impact Factor (JIF).

We categorized the following as primary outcomes: all-cause mortality, need for and length of mechanical ventilation, length of hospitalization (in days), admission to intensive care unit (yes/no), and length of stay in the intensive care unit.

The following outcomes were categorized as exploratory: diagnostic methods used for detection of the virus, male to female ratio, clinical symptoms, pharmacological and non-pharmacological interventions, laboratory findings (full blood count, liver enzymes, C-reactive protein, d-dimer, albumin, lipid profile, serum electrolytes, blood vitamin levels, glucose levels, and any other important biomarkers), and radiological findings (using radiography, computed tomography, magnetic resonance imaging or ultrasound).

We also collected data on reporting guidelines and requirements for the publication of systematic reviews and meta-analyses from journal websites where included reviews were published.

Quality assessment in individual reviews

Two researchers independently assessed the reviews’ quality using the “A MeaSurement Tool to Assess Systematic Reviews 2 (AMSTAR 2)”. We acknowledge that the AMSTAR 2 was created as “a critical appraisal tool for systematic reviews that include randomized or non-randomized studies of healthcare interventions, or both” [ 24 ]. However, since AMSTAR 2 was designed for systematic reviews of intervention trials, and we included additional types of systematic reviews, we adjusted some AMSTAR 2 ratings and reported these in Additional file 2 .

Adherence to each item was rated as follows: yes, partial yes, no, or not applicable (such as when a meta-analysis was not conducted). The overall confidence in the results of the review is rated as “critically low”, “low”, “moderate” or “high”, according to the AMSTAR 2 guidance based on seven critical domains, which are items 2, 4, 7, 9, 11, 13, 15 as defined by AMSTAR 2 authors [ 24 ]. We reported our adherence ratings for transparency of our decision with accompanying explanations, for each item, in each included review.
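The rating rule described above can be sketched in a few lines of Python. This is an illustrative reading of the published AMSTAR 2 guidance, not the code used in this overview: the function name `overall_confidence` is hypothetical, and the sketch assumes that only a rating of “no” counts as a weakness, whereas in practice raters use judgement for “partial yes” answers.

```python
# Illustrative sketch of the AMSTAR 2 overall-confidence rule.
# Assumption: only a "no" rating counts as a weakness or flaw.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}  # critical domains named in the text


def overall_confidence(ratings):
    """Return the overall confidence for one review.

    `ratings` maps an AMSTAR 2 item number (1-16) to one of
    "yes", "partial yes", "no", or "not applicable".
    """
    flawed = {item for item, rating in ratings.items() if rating == "no"}
    critical_flaws = len(flawed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical review that fails two critical items (2 and 7).
example = {item: "yes" for item in range(1, 17)}
example[2] = "no"
example[7] = "no"
print(overall_confidence(example))
```

Under this rule a single critical flaw already caps the rating at “low”, which is why reviews with several unreported methodological details end up as “critically low”.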

One of the included systematic reviews was conducted by some members of this author team [ 25 ]. This review was initially assessed independently by two authors who were not co-authors of that review to prevent the risk of bias in assessing this study.

Synthesis of results

For data synthesis, we prepared a table summarizing each systematic review. Graphs illustrating the mortality rate and clinical symptoms were created. We then prepared a narrative summary of the methods, findings, study strengths, and limitations.

For analysis of the prevalence of clinical outcomes, we extracted data on the number of events and the total number of patients to perform proportional meta-analysis using RStudio© software, with the “meta” package (version 4.9–6) and its “metaprop” function, for reviews that did not perform a meta-analysis; case studies were excluded because of the absence of variance. For those reviews, we presented pooled results of proportions with their respective confidence intervals (95%) by the inverse variance method with a random-effects model, using the DerSimonian-Laird estimator for τ². We adjusted data using the Freeman-Tukey double arcsine transformation. Confidence intervals were calculated using the Clopper-Pearson method for individual studies. We created forest plots using the RStudio© software, with the “metafor” package (version 2.1–0) and “forest” function.
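To make the statistical steps above concrete, the following Python sketch walks through the same sequence using only the standard library: a Freeman-Tukey double arcsine transformation, inverse-variance pooling with a DerSimonian-Laird estimate of the between-study variance, and Clopper-Pearson intervals for individual studies. It is not the R code used in this overview. The function names and the example counts are hypothetical, and the pooled value is back-transformed with a simple sin² approximation, so results may differ slightly from those of dedicated meta-analysis software.

```python
import math


def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine transform of events/n and its variance."""
    t = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (4 * n + 2)


def pool_random_effects(studies):
    """Pool (events, n) pairs: inverse variance, DerSimonian-Laird tau^2."""
    ts, vs = zip(*(freeman_tukey(x, n) for x, n in studies))
    w = [1.0 / v for v in vs]
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum(w)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * ti for wi, ti in zip(w_re, ts)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def back(t):
        # Simplified sin^2 back-transformation (assumption; an approximation).
        return math.sin(min(max(t, 0.0), math.pi / 2)) ** 2

    return {"proportion": back(pooled),
            "ci": (back(pooled - 1.96 * se), back(pooled + 1.96 * se)),
            "tau2": tau2}


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(events, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for events/n, found by bisection."""
    def root(g):
        # g must be increasing in p on (0, 1).
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if events == 0 else root(
        lambda p: 1 - binom_cdf(events - 1, n, p) - alpha / 2)
    upper = 1.0 if events == n else root(
        lambda p: alpha / 2 - binom_cdf(events, n, p))
    return lower, upper


# Hypothetical (events, total) pairs, for illustration only.
example = [(82, 100), (45, 50), (170, 200)]
print(pool_random_effects(example))
for x, n in example:
    print(x, n, clopper_pearson(x, n))
```

The transformation stabilizes the variance of proportions near 0 or 1, which matters for symptoms that were reported in almost all or almost no patients.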

Managing overlapping systematic reviews

Some of the included systematic reviews that address the same or similar research questions may include the same primary studies in overviews. Including such overlapping reviews may introduce bias when outcome data from the same primary study are included in the analyses of an overview multiple times. Thus, in summaries of evidence, multiple-counting of the same outcome data will give data from some primary studies too much influence [ 14 ]. In this overview, we did not exclude overlapping systematic reviews because, according to Cochrane’s guidance, it may be appropriate to include all relevant reviews’ results if the purpose of the overview is to present and describe the current body of evidence on a topic [ 14 ]. To avoid any bias in summary estimates associated with overlapping reviews, we generated forest plots showing data from individual systematic reviews, but the results were not pooled because some primary studies were included in multiple reviews.
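The degree of overlap can be quantified by counting how many reviews include each primary study. The sketch below shows one way to do that; the review names and study identifiers are hypothetical placeholders and do not correspond to the reviews analyzed here.

```python
from collections import Counter

# Hypothetical mapping of review -> primary study identifiers (illustrative only).
reviews = {
    "review_A": {"study_1", "study_2", "study_3"},
    "review_B": {"study_2", "study_3", "study_4"},
    "review_C": {"study_3", "study_5"},
}


def inclusion_counts(reviews):
    """Count how many reviews include each primary study."""
    return Counter(study for studies in reviews.values() for study in studies)


counts = inclusion_counts(reviews)
unique = len(counts)
single = sum(1 for c in counts.values() if c == 1)
print(f"{unique} unique studies; {single} ({single / unique:.0%}) appear in one review only")
print("max reviews per study:", max(counts.values()))
```

A study counted in several reviews would contribute its outcome data several times to any pooled estimate, which is the reason pooling across reviews was avoided.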

Our search retrieved 1063 publications, of which 175 were duplicates. Most publications were excluded after the title and abstract analysis ( n = 860). Among the 28 studies selected for full-text screening, 10 were excluded for the reasons described in Additional file 3 , and 18 were included in the final analysis (Fig. 1 ) [ 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 ]. Reference list screening did not retrieve any additional systematic reviews.
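The selection counts reported above are internally consistent, as this short arithmetic check shows. The variable names are illustrative; the numbers are those given in the text.

```python
# Counts taken from the text; variable names are illustrative.
retrieved = 1063
duplicates = 175
excluded_title_abstract = 860
excluded_full_text = 10

screened = retrieved - duplicates               # records screened by title/abstract
full_text = screened - excluded_title_abstract  # records assessed in full text
included = full_text - excluded_full_text       # reviews in the final analysis

print(screened, full_text, included)
```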

Figure 1. PRISMA flow diagram

Characteristics of included reviews

Summary features of 18 systematic reviews are presented in Table 1 . They were published in 14 different journals. Only four of these journals had specific requirements for systematic reviews (with or without meta-analysis): European Journal of Internal Medicine, Journal of Clinical Medicine, Ultrasound in Obstetrics and Gynecology, and Clinical Research in Cardiology . Two journals reported that they published only invited reviews ( Journal of Medical Virology and Clinica Chimica Acta ). Three systematic reviews in our study were published as letters; one was labeled as a scoping review and another as a rapid review (Table 2 ).

All reviews were published in English, in first quartile (Q1) journals, with JIF ranging from 1.692 to 6.062. One review was empty, meaning that its search did not identify any relevant studies; i.e., no primary studies were included [ 36 ]. The remaining 17 reviews included 269 unique studies; the majority ( N = 211; 78%) were included in only a single review included in our study (range: 1 to 12). Primary studies included in the reviews were published between December 2019 and March 18, 2020, and comprised case reports, case series, cohorts, and other observational studies. We found only one review that included randomized clinical trials [ 38 ]. In the included reviews, systematic literature searches were performed from 2019 (entire year) up to March 9, 2020. Ten systematic reviews included meta-analyses. The list of primary studies found in the included systematic reviews is shown in Additional file 4 , as well as the number of reviews in which each primary study was included.

Population and study designs

Most of the reviews analyzed data from patients with COVID-19 who developed pneumonia, acute respiratory distress syndrome (ARDS), or any other correlated complication. One review aimed to evaluate the effectiveness of using surgical masks on preventing transmission of the virus [ 36 ], one review was focused on pediatric patients [ 34 ], and one review investigated COVID-19 in pregnant women [ 37 ]. Most reviews assessed clinical symptoms, laboratory findings, or radiological results.

Systematic review findings

The summary of findings from individual reviews is shown in Table 2 . Overall, all-cause mortality ranged from 0.3 to 13.9% (Fig. 2 ).

Figure 2. A meta-analysis of the prevalence of mortality

Clinical symptoms

Seven reviews described the main clinical manifestations of COVID-19 [ 26 , 28 , 29 , 34 , 35 , 39 , 41 ]. Three of them provided only a narrative discussion of symptoms [ 26 , 34 , 35 ]. In the reviews that performed a statistical analysis of the incidence of different clinical symptoms, symptoms in patients with COVID-19 were (range values of point estimates): fever (82–95%), cough with or without sputum (58–72%), dyspnea (26–59%), myalgia or muscle fatigue (29–51%), sore throat (10–13%), headache (8–12%), gastrointestinal disorders, such as diarrhea, nausea or vomiting (5.0–9.0%), and others (including, in one study only: dizziness 12.1%) (Figs. 3 , 4 , 5 , 6 , 7 , 8 and 9 ). Three reviews assessed cough with and without sputum together; only one review assessed sputum production itself (28.5%).

Figure 3. A meta-analysis of the prevalence of fever

Figure 4. A meta-analysis of the prevalence of cough

Figure 5. A meta-analysis of the prevalence of dyspnea

Figure 6. A meta-analysis of the prevalence of fatigue or myalgia

Figure 7. A meta-analysis of the prevalence of headache

Figure 8. A meta-analysis of the prevalence of gastrointestinal disorders

Figure 9. A meta-analysis of the prevalence of sore throat

Diagnostic aspects

Three reviews described methodologies, protocols, and tools used for establishing the diagnosis of COVID-19 [ 26 , 34 , 38 ]. The use of respiratory swabs (nasal or pharyngeal) or blood specimens to assess the presence of SARS-CoV-2 nucleic acid using RT-PCR assays was the most commonly used diagnostic method mentioned in the included studies. These diagnostic tests have been widely used, but their precise sensitivity and specificity remain unknown. One review included a Chinese study with clinical diagnosis with no confirmation of SARS-CoV-2 infection (patients were diagnosed with COVID-19 if they presented with at least two symptoms suggestive of COVID-19, together with laboratory and chest radiography abnormalities) [ 34 ].

Therapeutic possibilities

Pharmacological and non-pharmacological interventions (supportive therapies) used in treating patients with COVID-19 were reported in five reviews [ 25 , 27 , 34 , 35 , 38 ]. Antivirals used empirically for COVID-19 treatment were reported in seven reviews [ 25 , 27 , 34 , 35 , 37 , 38 , 41 ]; most commonly used were protease inhibitors (lopinavir, ritonavir, darunavir), nucleoside reverse transcriptase inhibitor (tenofovir), nucleotide analogs (remdesivir, galidesivir, ganciclovir), and neuraminidase inhibitors (oseltamivir). Umifenovir, a membrane fusion inhibitor, was investigated in two studies [ 25 , 35 ]. Possible supportive interventions analyzed were different types of oxygen supplementation and breathing support (invasive or non-invasive ventilation) [ 25 ]. The use of antibiotics, both empirically and to treat secondary pneumonia, was reported in six studies [ 25 , 26 , 27 , 34 , 35 , 38 ]. One review specifically assessed evidence on the efficacy and safety of the anti-malaria drug chloroquine [ 27 ]. It identified 23 ongoing trials investigating the potential of chloroquine as a therapeutic option for COVID-19, but no verifiable clinical outcomes data. The use of mesenchymal stem cells, antifungals, and glucocorticoids was described in four reviews [ 25 , 34 , 35 , 38 ].

Laboratory and radiological findings

Of the 18 reviews included in this overview, eight analyzed laboratory parameters in patients with COVID-19 [ 25 , 29 , 30 , 32 , 33 , 34 , 35 , 39 ]; elevated C-reactive protein levels, associated with lymphocytopenia, elevated lactate dehydrogenase, as well as slightly elevated aspartate and alanine aminotransferase (AST, ALT) were commonly described in those eight reviews. Lippi et al. assessed cardiac troponin I (cTnI) [ 30 ], procalcitonin [ 32 ], and platelet count [ 33 ] in COVID-19 patients. Elevated levels of procalcitonin [ 32 ] and cTnI [ 30 ] were more likely to be associated with a severe disease course (requiring intensive care unit admission and intubation). Furthermore, thrombocytopenia was frequently observed in patients with complicated COVID-19 infections [ 33 ].

Chest imaging (chest radiography and/or computed tomography) features were assessed in six reviews, all of which described a frequent pattern of local or bilateral multilobar ground-glass opacity [ 25 , 34 , 35 , 39 , 40 , 41 ]. Those six reviews showed that septal thickening, bronchiectasis, pleural and cardiac effusions, halo signs, and pneumothorax were observed in patients suffering from COVID-19.

Quality of evidence in individual systematic reviews

Table 3 shows the detailed results of the quality assessment of 18 systematic reviews, including the assessment of individual items and summary assessment. A detailed explanation for each decision in each review is available in Additional file 5 .

Using AMSTAR 2 criteria, confidence in the results of all 18 reviews was rated as “critically low” (Table 3 ). Common methodological drawbacks were: omission of prospective protocol submission or publication; use of inappropriate search strategy; lack of independent and dual literature screening and data-extraction (or methodology unclear); absence of an explanation for heterogeneity among the studies included; lack of reasons for study exclusion (or rationale unclear).

Risk of bias assessment, based on a reported methodological tool, and quality of evidence appraisal, in line with the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) method, were reported only in one review [ 25 ]. Five reviews presented a table summarizing bias, using various risk of bias tools [ 25 , 29 , 39 , 40 , 41 ]. One review analyzed “study quality” [ 37 ]. One review mentioned the risk of bias assessment in the methodology but did not provide any related analysis [ 28 ].

This overview of systematic reviews analyzed the first 18 systematic reviews published after the onset of the COVID-19 pandemic, up to March 24, 2020, with primary studies involving more than 60,000 patients. Using AMSTAR 2, we judged that our confidence in all those reviews was “critically low”. Ten reviews included meta-analyses. The reviews presented data on clinical manifestations, laboratory and radiological findings, and interventions. We found no systematic reviews on the utility of diagnostic tests.

Symptoms were reported in seven reviews; most of the patients had a fever, cough, dyspnea, myalgia or muscle fatigue, and gastrointestinal disorders such as diarrhea, nausea, or vomiting. Olfactory dysfunction (anosmia or dysosmia) has been described in patients with COVID-19 [ 43 ]; however, this was not reported in any of the reviews included in this overview. During the SARS outbreak in 2002, there were reports of impairment of the sense of smell associated with the disease [ 44 , 45 ].

The reported mortality rates ranged from 0.3 to 13.9% in the included reviews. Mortality estimates are influenced by the transmissibility rate (basic reproduction number), availability of diagnostic tools, notification policies, asymptomatic presentations of the disease, resources for disease prevention and control, and treatment facilities; variability in the mortality rate fits the pattern of emerging infectious diseases [ 46 ]. Furthermore, the reported cases did not account for asymptomatic cases, mild cases in which individuals did not seek medical treatment, or the fact that many countries had limited access to diagnostic tests or implemented testing policies later than others. Considering the lack of reviews assessing diagnostic testing (sensitivity, specificity, and predictive values of RT-PCR or immunoglobulin tests), and the preponderance of studies that assessed only symptomatic individuals, considerable imprecision around the calculated mortality rates existed in the early stage of the COVID-19 pandemic.

Few reviews included treatment data. Those reviews described studies considered to be at a very low level of evidence: usually small, retrospective studies with very heterogeneous populations. Eight reviews analyzed laboratory parameters; those reviews could have been useful for clinicians who attend patients suspected of COVID-19 in emergency services worldwide, for example in assessing which patients need to be reassessed more frequently.

All systematic reviews scored poorly on the AMSTAR 2 critical appraisal tool for systematic reviews. Most of the original studies included in the reviews were case series and case reports, impacting the quality of evidence. Such evidence has major implications for clinical practice and the use of these reviews in evidence-based practice and policy. Clinicians, patients, and policymakers can only have the highest confidence in systematic review findings if high-quality systematic review methodologies are employed. The urgent need for information during a pandemic does not justify poor quality reporting.

We acknowledge that there are numerous challenges associated with analyzing COVID-19 data during a pandemic [ 47 ]. High-quality evidence syntheses are needed for decision-making, but each type of evidence synthesis is associated with its inherent challenges.

The creation of classic systematic reviews requires considerable time and effort; with massive research output, they quickly become outdated, and preparing updated versions also requires considerable time. A recent study showed that updates of non-Cochrane systematic reviews are published a median of 5 years after the publication of the previous version [ 48 ].

Authors may register a review and then abandon it [ 49 ], but the existence of a public record that is not updated may lead other authors to believe that the review is still ongoing. A quarter of Cochrane review protocols remain unpublished as completed systematic reviews 8 years after protocol publication [ 50 ].

Rapid reviews can be used to summarize the evidence, but they involve methodological sacrifices and simplifications to produce information promptly, with inconsistent methodological approaches [ 51 ]. However, rapid reviews are justified in times of public health emergencies, and even Cochrane has resorted to publishing rapid reviews in response to the COVID-19 crisis [ 52 ]. Rapid reviews were eligible for inclusion in this overview, but only one of the 18 reviews included in this study was labeled as a rapid review.

Ideally, COVID-19 evidence would be continually summarized in a series of high-quality living systematic reviews, types of evidence synthesis defined as “ a systematic review which is continually updated, incorporating relevant new evidence as it becomes available ” [ 53 ]. However, conducting living systematic reviews requires considerable resources, calling into question the sustainability of such evidence synthesis over long periods [ 54 ].

Research reports about COVID-19 will contribute to research waste if they are poorly designed, poorly reported, or simply not necessary. In principle, systematic reviews should help reduce research waste as they usually provide recommendations for further research that is needed or may advise that sufficient evidence exists on a particular topic [ 55 ]. However, systematic reviews can also contribute to growing research waste when they are not needed, or poorly conducted and reported. Our present study clearly shows that most of the systematic reviews that were published early on in the COVID-19 pandemic could be categorized as research waste, as our confidence in their results is critically low.

Our study has some limitations. One is that for AMSTAR 2 assessment we relied on information available in publications; we did not attempt to contact study authors for clarifications or additional data. In three reviews, the methodological quality appraisal was challenging because they were published as letters, or labeled as rapid communications. As a result, various details about their review process were not included, leading to AMSTAR 2 questions being answered as “not reported”, resulting in low confidence scores. Full manuscripts might have provided additional information that could have led to higher confidence in the results. In other words, low scores could reflect incomplete reporting, not necessarily low-quality review methods. To make their review available more rapidly and more concisely, the authors may have omitted methodological details. A general issue during a crisis is that speed and completeness must be balanced. However, maintaining high standards requires proper resourcing and commitment to ensure that the users of systematic reviews can have high confidence in the results.

Furthermore, we used adjusted AMSTAR 2 scoring, as the tool was designed for critical appraisal of reviews of interventions. Some reviews may have received lower scores than actually warranted in spite of these adjustments.

Another limitation of our study may be the inclusion of multiple overlapping reviews, as some included reviews included the same primary studies. According to the Cochrane Handbook, including overlapping reviews may be appropriate when the review’s aim is “ to present and describe the current body of systematic review evidence on a topic ” [ 12 ], which was our aim. To avoid bias with summarizing evidence from overlapping reviews, we presented the forest plots without summary estimates. The forest plots serve to inform readers about the effect sizes for outcomes that were reported in each review.

Several authors from this study have contributed to one of the reviews identified [ 25 ]. To reduce the risk of any bias, two authors who did not co-author the review in question initially assessed its quality and limitations.

Finally, we note that the systematic reviews included in our overview may have had issues that our analysis did not identify because we did not analyze their primary studies to verify the accuracy of the data and information they presented. We give two examples to substantiate this possibility. Lovato et al. wrote a commentary on the review of Sun et al. [ 41 ], in which they criticized the authors’ conclusion that sore throat is rare in COVID-19 patients [ 56 ]. Lovato et al. highlighted that multiple studies included in Sun et al. did not accurately describe participants’ clinical presentations, warning that only three studies clearly reported data on sore throat [ 56 ].

In another example, Leung [ 57 ] warned about the review of Li, L.Q. et al. [ 29 ]: “ it is possible that this statistic was computed using overlapped samples, therefore some patients were double counted ”. Li et al. responded to Leung that it is uncertain whether the data overlapped, as they used data from published articles and did not have access to the original data; they also reported that they requested original data and that they plan to re-do their analyses once they receive them; they also urged readers to treat the data with caution [ 58 ]. This points to the evolving nature of evidence during a crisis.

Our study’s strength is that this overview adds to the current knowledge by providing a comprehensive summary of all the evidence syntheses about COVID-19 available early after the onset of the pandemic. This overview followed strict methodological criteria, including a comprehensive and sensitive search strategy and a standard tool for methodological appraisal of systematic reviews.

In conclusion, in this overview of systematic reviews, we analyzed evidence from the first 18 systematic reviews that were published after the emergence of COVID-19. However, confidence in the results of all the reviews was “critically low”. Thus, systematic reviews that were published early on in the pandemic could be categorized as research waste. Even during public health emergencies, studies and systematic reviews should adhere to established methodological standards to provide patients, clinicians, and decision-makers with trustworthy evidence.

Availability of data and materials

All data collected and analyzed within this study are available from the corresponding author on reasonable request.

World Health Organization. Timeline - COVID-19: Available at: https://www.who.int/news/item/29-06-2020-covidtimeline . Accessed 1 June 2021.

COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). Available at: https://coronavirus.jhu.edu/map.html . Accessed 1 June 2021.

Anzai A, Kobayashi T, Linton NM, Kinoshita R, Hayashi K, Suzuki A, et al. Assessing the Impact of Reduced Travel on Exportation Dynamics of Novel Coronavirus Infection (COVID-19). J Clin Med. 2020;9(2):601.

Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. https://doi.org/10.1126/science.aba9757 .

Fidahic M, Nujic D, Runjic R, Civljak M, Markotic F, Lovric Makaric Z, et al. Research methodology and characteristics of journal articles with original data, preprint articles and registered clinical trial protocols about COVID-19. BMC Med Res Methodol. 2020;20(1):161. https://doi.org/10.1186/s12874-020-01047-2 .

EPPI Centre . COVID-19: a living systematic map of the evidence. Available at: http://eppi.ioe.ac.uk/cms/Projects/DepartmentofHealthandSocialCare/Publishedreviews/COVID-19Livingsystematicmapoftheevidence/tabid/3765/Default.aspx . Accessed 1 June 2021.

NCBI SARS-CoV-2 Resources. Available at: https://www.ncbi.nlm.nih.gov/sars-cov-2/ . Accessed 1 June 2021.

Gustot T. Quality and reproducibility during the COVID-19 pandemic. JHEP Rep. 2020;2(4):100141. https://doi.org/10.1016/j.jhepr.2020.100141 .

Kodvanj I, et al. Publishing of COVID-19 preprints in peer-reviewed journals, preprinting trends, public discussion and quality issues. bioRxiv. 2020. Preprint article. https://doi.org/10.1101/2020.11.23.394577 .

Dobler CC. Poor quality research and clinical practice during COVID-19. Breathe (Sheff). 2020;16(2):200112. https://doi.org/10.1183/20734735.0112-2020 .

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. https://doi.org/10.1371/journal.pmed.1000326 .

Lunny C, Brennan SE, McDonald S, McKenzie JE. Toward a comprehensive evidence map of overview of systematic review methods: paper 1-purpose, eligibility, search and data extraction. Syst Rev. 2017;6(1):231. https://doi.org/10.1186/s13643-017-0617-1 .

Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane. 2020. Available from www.training.cochrane.org/handbook .

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions version 6.1 (updated September 2020). Cochrane. 2020; Available from www.training.cochrane.org/handbook .

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. The impact of different inclusion decisions on the comprehensiveness and complexity of overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):18. https://doi.org/10.1186/s13643-018-0914-3 .

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):29. https://doi.org/10.1186/s13643-018-0768-8 .

Hunt H, Pollock A, Campbell P, Estcourt L, Brunton G. An introduction to overviews of reviews: planning a relevant research question and objective for an overview. Syst Rev. 2018;7(1):39. https://doi.org/10.1186/s13643-018-0695-8 .

Pollock M, Fernandes RM, Pieper D, Tricco AC, Gates M, Gates A, et al. Preferred reporting items for overviews of reviews (PRIOR): a protocol for development of a reporting guideline for overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):335. https://doi.org/10.1186/s13643-019-1252-9 .

Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Open Med. 2009;3(3):e123–30.

Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19(1):203. https://doi.org/10.1186/s12874-019-0855-0 .

Puljak L. If there is only one author or only one database was searched, a study should not be called a systematic review. J Clin Epidemiol. 2017;91:4–5. https://doi.org/10.1016/j.jclinepi.2017.08.002 .

Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):254. https://doi.org/10.1186/s13643-020-01509-0 .

Covidence - systematic review software. Available at: https://www.covidence.org/ . Accessed 1 June 2021.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Borges do Nascimento IJ, et al. Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis. J Clin Med. 2020;9(4):941.

Adhikari SP, Meng S, Wu YJ, Mao YP, Ye RX, Wang QZ, et al. Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review. Infect Dis Poverty. 2020;9(1):29. https://doi.org/10.1186/s40249-020-00646-x .

Cortegiani A, Ingoglia G, Ippolito M, Giarratano A, Einav S. A systematic review on the efficacy and safety of chloroquine for the treatment of COVID-19. J Crit Care. 2020;57:279–83. https://doi.org/10.1016/j.jcrc.2020.03.005 .

Li B, Yang J, Zhao F, Zhi L, Wang X, Liu L, et al. Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China. Clin Res Cardiol. 2020;109(5):531–8. https://doi.org/10.1007/s00392-020-01626-9 .

Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(6):577–83. https://doi.org/10.1002/jmv.25757 .

Lippi G, Lavie CJ, Sanchis-Gomar F. Cardiac troponin I in patients with coronavirus disease 2019 (COVID-19): evidence from a meta-analysis. Prog Cardiovasc Dis. 2020;63(3):390–1. https://doi.org/10.1016/j.pcad.2020.03.001 .

Lippi G, Henry BM. Active smoking is not associated with severity of coronavirus disease 2019 (COVID-19). Eur J Intern Med. 2020;75:107–8. https://doi.org/10.1016/j.ejim.2020.03.014 .

Lippi G, Plebani M. Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): a meta-analysis. Clin Chim Acta. 2020;505:190–1. https://doi.org/10.1016/j.cca.2020.03.004 .

Lippi G, Plebani M, Henry BM. Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis. Clin Chim Acta. 2020;506:145–8. https://doi.org/10.1016/j.cca.2020.03.022 .

Ludvigsson JF. Systematic review of COVID-19 in children shows milder cases and a better prognosis than adults. Acta Paediatr. 2020;109(6):1088–95. https://doi.org/10.1111/apa.15270 .

Lupia T, Scabini S, Mornese Pinna S, di Perri G, de Rosa FG, Corcione S. 2019 novel coronavirus (2019-nCoV) outbreak: a new challenge. J Glob Antimicrob Resist. 2020;21:22–7. https://doi.org/10.1016/j.jgar.2020.02.021 .

Marasinghe KM. A systematic review investigating the effectiveness of face mask use in limiting the spread of COVID-19 among medically not diagnosed individuals: shedding light on current recommendations provided to individuals not medically diagnosed with COVID-19. Research Square. 2020. Preprint article. https://doi.org/10.21203/rs.3.rs-16701/v1 .

Mullins E, Evans D, Viner RM, O’Brien P, Morris E. Coronavirus in pregnancy and delivery: rapid review. Ultrasound Obstet Gynecol. 2020;55(5):586–92. https://doi.org/10.1002/uog.22014 .

Pang J, Wang MX, Ang IYH, Tan SHX, Lewis RF, Chen JIP, et al. Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel coronavirus (2019-nCoV): a systematic review. J Clin Med. 2020;9(3):623.

Rodriguez-Morales AJ, Cardona-Ospina JA, Gutiérrez-Ocampo E, Villamizar-Peña R, Holguin-Rivera Y, Escalera-Antezana JP, et al. Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis. Travel Med Infect Dis. 2020;34:101623. https://doi.org/10.1016/j.tmaid.2020.101623 .

Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A. Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. AJR Am J Roentgenol. 2020;215(1):87–93. https://doi.org/10.2214/AJR.20.23034 .

Sun P, Qie S, Liu Z, Ren J, Li K, Xi J. Clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis. J Med Virol. 2020;92(6):612–7. https://doi.org/10.1002/jmv.25735 .

Yang J, Zheng Y, Gou X, Pu K, Chen Z, Guo Q, et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int J Infect Dis. 2020;94:91–5. https://doi.org/10.1016/j.ijid.2020.03.017 .

Bassetti M, Vena A, Giacobbe DR. The novel Chinese coronavirus (2019-nCoV) infections: challenges for fighting the storm. Eur J Clin Investig. 2020;50(3):e13209. https://doi.org/10.1111/eci.13209 .

Hwang CS. Olfactory neuropathy in severe acute respiratory syndrome: report of a case. Acta Neurol Taiwanica. 2006;15(1):26–8.

Suzuki M, Saito K, Min WP, Vladau C, Toida K, Itoh H, et al. Identification of viruses in patients with postviral olfactory dysfunction. Laryngoscope. 2007;117(2):272–7. https://doi.org/10.1097/01.mlg.0000249922.37381.1e .

Rajgor DD, Lee MH, Archuleta S, Bagdasarian N, Quek SC. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis. 2020;20(7):776–7. https://doi.org/10.1016/S1473-3099(20)30244-9 .

Wolkewitz M, Puljak L. Methodological challenges of analysing COVID-19 data during the pandemic. BMC Med Res Methodol. 2020;20(1):81. https://doi.org/10.1186/s12874-020-00972-6 .

Rombey T, Lochner V, Puljak L, Könsgen N, Mathes T, Pieper D. Epidemiology and reporting characteristics of non-Cochrane updates of systematic reviews: a cross-sectional study. Res Synth Methods. 2020;11(3):471–83. https://doi.org/10.1002/jrsm.1409 .

Runjic E, Rombey T, Pieper D, Puljak L. Half of systematic reviews about pain registered in PROSPERO were not published and the majority had inaccurate status. J Clin Epidemiol. 2019;116:114–21. https://doi.org/10.1016/j.jclinepi.2019.08.010 .

Runjic E, Behmen D, Pieper D, Mathes T, Tricco AC, Moher D, et al. Following Cochrane review protocols to completion 10 years later: a retrospective cohort study and author survey. J Clin Epidemiol. 2019;111:41–8. https://doi.org/10.1016/j.jclinepi.2019.03.006 .

Tricco AC, Antony J, Zarin W, Strifler L, Ghassemi M, Ivory J, et al. A scoping review of rapid review methods. BMC Med. 2015;13(1):224. https://doi.org/10.1186/s12916-015-0465-6 .

COVID-19 Rapid Reviews: Cochrane’s response so far. Available at: https://training.cochrane.org/resource/covid-19-rapid-reviews-cochrane-response-so-far . Accessed 1 June 2021.

Cochrane. Living systematic reviews. Available at: https://community.cochrane.org/review-production/production-resources/living-systematic-reviews . Accessed 1 June 2021.

Millard T, Synnot A, Elliott J, Green S, McDonald S, Turner T. Feasibility and acceptability of living systematic reviews: results from a mixed-methods evaluation. Syst Rev. 2019;8(1):325. https://doi.org/10.1186/s13643-019-1248-5 .

Babic A, Poklepovic Pericic T, Pieper D, Puljak L. How to decide whether a systematic review is stable and not in need of updating: analysis of Cochrane reviews. Res Synth Methods. 2020;11(6):884–90. https://doi.org/10.1002/jrsm.1451 .

Lovato A, Rossettini G, de Filippis C. Sore throat in COVID-19: comment on “clinical characteristics of hospitalized patients with SARS-CoV-2 infection: a single arm meta-analysis”. J Med Virol. 2020;92(7):714–5. https://doi.org/10.1002/jmv.25815 .

Leung C. Comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1431–2. https://doi.org/10.1002/jmv.25912 .

Li LQ, Huang T, Wang YQ, Wang ZP, Liang Y, Huang TB, et al. Response to Char’s comment: comment on Li et al: COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis. J Med Virol. 2020;92(9):1433. https://doi.org/10.1002/jmv.25924 .

Acknowledgments

We thank Catherine Henderson DPhil from Swanscoe Communications for pro bono medical writing and editing support. We acknowledge support from the Covidence Team, specifically Anneliese Arno. We thank the whole International Network of Coronavirus Disease 2019 (InterNetCOVID-19) for their commitment and involvement. Members of the InterNetCOVID-19 are listed in Additional file 6 . We thank Pavel Cerny and Roger Crosthwaite for guiding the team supervisor (IJBN) on human resources management.

This research received no external funding.

Author information

Authors and affiliations.

University Hospital and School of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

Israel Júnior Borges do Nascimento & Milena Soriano Marcolino

Medical College of Wisconsin, Milwaukee, WI, USA

Israel Júnior Borges do Nascimento

Helene Fuld Health Trust National Institute for Evidence-based Practice in Nursing and Healthcare, College of Nursing, The Ohio State University, Columbus, OH, USA

Dónal P. O’Mathúna

School of Nursing, Psychotherapy and Community Health, Dublin City University, Dublin, Ireland

Department of Anesthesiology, Intensive Care and Pain Medicine, University of Münster, Münster, Germany

Thilo Caspar von Groote

Department of Sport and Health Science, Technische Universität München, Munich, Germany

Hebatullah Mohamed Abdulazeem

School of Health Sciences, Faculty of Health and Medicine, The University of Newcastle, Callaghan, Australia

Ishanka Weerasekara

Department of Physiotherapy, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, Sri Lanka

Cochrane Croatia, University of Split, School of Medicine, Split, Croatia

Ana Marusic, Irena Zakarija-Grkovic & Tina Poklepovic Pericic

Center for Evidence-Based Medicine and Health Care, Catholic University of Croatia, Ilica 242, 10000, Zagreb, Croatia

Livia Puljak

Cochrane Brazil, Evidence-Based Health Program, Universidade Federal de São Paulo, São Paulo, Brazil

Vinicius Tassoni Civile & Alvaro Nagib Atallah

Yorkville University, Fredericton, New Brunswick, Canada

Santino Filoso

Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada

Nicola Luigi Bragazzi

Contributions

IJBN conceived the research idea and worked as a project coordinator. DPOM, TCVG, HMA, IW, AM, LP, VTC, IZG, TPP, ANA, SF, NLB and MSM were involved in data curation, formal analysis, investigation, methodology, and initial draft writing. All authors revised the manuscript critically for the content. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Livia Puljak.

Ethics declarations

Ethics approval and consent to participate.

Not required as data was based on published studies.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix 1.

Search strategies used in the study.

Additional file 2: Appendix 2.

Adjusted scoring of AMSTAR 2 used in this study for systematic reviews of studies that did not analyze interventions.

Additional file 3: Appendix 3.

List of excluded studies, with reasons.

Additional file 4: Appendix 4.

Table of overlapping studies, containing the list of primary studies included, their visual overlap in individual systematic reviews, and the number in how many reviews each primary study was included.

Additional file 5: Appendix 5.

A detailed explanation of AMSTAR scoring for each item in each review.

Additional file 6: Appendix 6.

List of members and affiliates of International Network of Coronavirus Disease 2019 (InterNetCOVID-19).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Borges do Nascimento, I.J., O’Mathúna, D.P., von Groote, T.C. et al. Coronavirus disease (COVID-19) pandemic: an overview of systematic reviews. BMC Infect Dis 21 , 525 (2021). https://doi.org/10.1186/s12879-021-06214-4

Received : 12 April 2020

Accepted : 19 May 2021

Published : 04 June 2021

DOI : https://doi.org/10.1186/s12879-021-06214-4

  • Coronavirus
  • Evidence-based medicine
  • Infectious diseases

BMC Infectious Diseases

ISSN: 1471-2334

Digital Commons @ University of South Florida

Epidemiology and Biostatistics Theses and Dissertations

Theses/Dissertations from 2023

Gender Differences in Episodic Memory in Later Life: The Mediating Role of Education , Sara Robinson

Theses/Dissertations from 2022

Nonparametric Estimation of Transition Probabilities in Illness-Death Model based on Ranked Set Sampling , Ying Ma

Theses/Dissertations from 2021

Bayesian Multivariate Joint Modeling for Skewed-longitudinal and Time-to-event Data , Lan Xu

Theses/Dissertations from 2020

Identifying Barriers and Facilitators to Improve Hepatitis C Virus Screening , Linh M. Duong

Quantifying the Impact of Chronic Stress on Racial Disparities in Cardiovascular Disease , Nnadozie Emechebe

A Review of American College Campus Tobacco or Smoke free Policies: A Case Study of a Large Urban University , Sarah E. Powell

Theses/Dissertations from 2019

Evolutionary Dynamics of Influenza Type B in the Presence of Vaccination: An Ecological Study , Lindsey J. Fiedler

Respiratory Infections and Risk for Development of Narcolepsy: Analysis of the Truven Health MarketScan Database (2008 to 2010) with Additional Assessment of Incidence and Prevalence , Darren Scheer

Multimodal Treatment and Neoadjuvant Chemotherapy Trends, Utilization and Survival Effects in Intrahepatic Cholangiocarcinoma – a Propensity Score Analysis , Ovie Utuama

Theses/Dissertations from 2018

Flowgraph Models for Clustered Multistate Time to Event Data , Kristin Hall

Impact of Obesity and Expression of Obesity-Related Genes in the Progression of Prostate Cancer in African American Men , Mmadili Nancy Ilozumba

Angiostrongylus cantonensis: Epidemiologic Review, Location-Specific Habitat Modelling, and Surveillance in Hillsborough County, Florida, U.S.A. , Brad Christian Perich

Strategies to Adjust for Response Bias in Clinical Trials: A Simulation Study , Victoria R. Swaidan

Theses/Dissertations from 2017

Sleep and Alzheimer’s disease: A critical examination of the risk that Sleep Problems or Disorders particularly Obstructive Sleep Apnea pose towards developing Alzheimer’s disease , Omonigho A. Michael Bubu

Deployment, Post-Traumatic Stress Disorder and Hypertensive Disorders of Pregnancy among U.S. Active-Duty Military Women , Michelle C. Nash

Ambient Ozone and Cadmium as Risk Factors For Congenital Diaphragmatic Hernia , Rema Ramakrishnan

Ambient Benzene and PM2.5 Exposure during Pregnancy: Examining the Impact of Exposure Assessment Decisions on Associations between Birth Defects and Air Pollution , Jean Paul Tanner

Bayesian inference on quantile regression-based mixed-effects joint models for longitudinal-survival data from AIDS studies , Hanze Zhang

Theses/Dissertations from 2016

Sleep Duration Patterns from Adolescence to Young Adulthood and their Impact on Asthma and Inflammation , Chighaf Bakour

Efficiency of an Unbalanced Design in Collecting Time to Event Data with Interval Censoring , Peiyao Cheng

Association between Folate Levels and Preterm Birth in Tampa, Florida , Carolyn Heeraman

HIV/STIs and Intimate Partner Violence: Results from the Togo 2013-2014 Demographic and Health Surveys , Anthony H. Nguyen

Incidence, Persistence, and Recurrence of Anogenital α- Mucosal HPV Infections (HPV 6, 11, 16, 18, 31, 33, 45, 52 and 58) , Shitaldas J. Pamnani

Factors Associated with Sexually Transmitted Infections (STIs) and Multiple STI Co-infections: Results from the EVRI HIV Prevention Preparedness Trial , Ubin Pokharel

Hidden Markov Chain Analysis: Impact of Misclassification on Effect of Covariates in Disease Progression and Regression , Haritha Polisetti

Association of Known and Unknown Oncoviruses with External Genital Lesion (EGL) Manifestations in a Multinational Cohort of Men , Shams Ur Rahman

Racial and Ethnic Differences in Low-Risk Cesarean Deliveries in Florida , Yuri Combo Vanda Sebastiao

The Effects of Personal and Family History of Cancer on the Development of Dementia in Japanese Americans: The KAME Project , Adam Lee Slotnick

Rhabdomyosarcoma Incidence and Survival in Whites, Blacks, and Hispanics from 1973-2013: Analysis from the Surveillance, Epidemiology, and End Results Program , Heather Tinsley

Theses/Dissertations from 2015

Assessment of the impact of Attention Deficit Hyperactivity Disorder on Type 1 Diabetes , Kellee Miller

Bayesian Inference on Longitudinal Semi-continuous Substance Abuse/Dependence Symptoms Data , Dongyuan Xing

Theses/Dissertations from 2014

Statistical Analysis and Modeling of PM 2.5 Speciation Metals and Their Mixtures , Boubakari Ibrahimou

Elective Early Term Delivery and Adverse Infant Outcomes in a Population-Based Multiethnic Cohort , Jason Lee Salemi

Theses/Dissertations from 2013

Uncontrolled Hypertension and Associated Factors in Hypertensive Patients at the Primary Healthcare Center Luis H. Moreno, Panama: A Feasibility Study , Roderick Ramon Chen Camano

An Analysis of the Association between Animal Exposures and the Development of Type 1 Diabetes in the TEDDY Cohort , Callyn Hall

Multiple Calibrations in Integrative Data Analysis: A Simulation Study and Application to Multidimensional Family Therapy , Kristin Wynn Hall

Mother- to - Child Transmission of HIV and congenital syphilis: A snapshot of an Epidemic in the Republic of Panama , Lorna Elizabeth Jenkins

A Latent Mixture Approach to Modeling Zero-Inflated Bivariate Ordinal Data , Rajendra Kadel

Associations of Perceived Stress, Sleep, and Human Papillomavirus in a Prospective Cohort of Men , Stephanie Kay Kolar

Influence of Maternal Thyroid Dysfunction on Infant Growth and Development , Ronee Elisha Wilson

Theses/Dissertations from 2012

Bayesian Inference on Mixed-effects Models with Skewed Distributions for HIV longitudinal Data , Ren Chen

Linear Mixed-Effects Models: Applications to the Behavioral Sciences and Adolescent Community Health , Lizmarie Gabriela Maldonado

Statistical Estimation of Physiologically-based Pharmacokinetic Models: Identifiability, Variation, and Uncertainty with an Illustration of Chronic Exposure to Dioxin and Dioxin-like-compounds. , Zachary John Thompson

Evaluation of Repeated Biomarkers: Non-parametric Comparison of Areas under the Receiver Operating Curve Between Correlated Groups Using an Optimal Weighting Scheme , Ping Xu

Theses/Dissertations from 2011

The Natural History of Human Papillomavirus Related Condyloma In a Multinational Cohort of Men , Gabriella Anic

Characterization of the Serologic Responses to Plasmodium vivax DBPII Variants Among Inhabitants of Pursat Province, Cambodia , Samantha Jones Barnes

Disparities in Survival and Mortality among Infants with Congenital Aortic, Pulmonary, and Tricuspid Valve Defects by Maternal Race/Ethnicity and Infant Sex , Colleen Conklin

Case-Control Study of Sunlight Exposure and Cutaneous Human Papillomavirus Seroreactivity in Basal Cell and Squamous Cell Carcinomas of the Skin , Michelle R. Iannacone

Assessing the Relationship of Monocytes with Primary and Secondary Dengue Infection among Hospitalized Dengue Patients in Malaysia, 2010: A Cross-Sectional Study , Benjamin Glenn Klekamp

Gender Differences in Lung Cancer Treatment and Survival , Margaret Anne Kowski

An examination of diet, acculturation and risk factors for heart disease among Jamaican immigrants , Carol Renee Oladele

Indicators of Early Adult and Current Personality in Parkinson's Disease , Kelly Sullivan

Theses/Dissertations from 2010

Does Patient Dementia Limit the Use of Cardiac Catheterization in ST-Elevated Myocardial Infarction? , Marianne Chanti-Ketterl

Extending the Principal Stratification Method To Multi-Level Randomized Trials , Jing Guo

Serum Antibodies to Human Papillomavirus Type 6, 11, 16 and 18 and Their Role in the Natural History of HPV Infection in Men , Beibei Lu

Evaluation of Common Inherited Variants in Mitochondrial-Related and MicroRNA-Related Genes as Novel Risk Factors for Ovarian Cancer , Jennifer Permuth Wey

DNA Methylation and its Association with Prenatal Exposures and Pregnancy Outcomes , Jennifer Straughen

Theses/Dissertations from 2009

Cardiovascular risk factors for mild cognitive impairment , Michael Malek-Ahmadi

Additive Latent Variable (ALV) Modeling: Assessing Variation in Intervention Impact in Randomized Field Trials , Peter Ayo Toyinbo

Theses/Dissertations from 2008

A Comparison of Community-Based Centers versus University-Based Centers in Clinical Trial Performance , Cynthia R. Stockddale

BJPsych Bull, v.40(2); 2016 Apr

Sampling in epidemiological research: issues, hazards and pitfalls

Stephen Tyrer

1 Newcastle University, UK

2 University of Huddersfield, UK

Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research.

It is wise to be chary of surveys and polls, and to always read the figures carefully. In the heady excitement just before the vote on Scottish independence many observers thought that allowing teenagers to vote would propel Scotland to self-government. After the referendum, the Straits Times in Singapore stated ‘Young people voted in droves to break up the centuries-old union’, based on an exit poll that showed that 71% had voted ‘yes’ to independence. 1 This poll only included the very small number of 14 people in this age bracket, 4 of whom had voted ‘no’. A later, more representative YouGov poll with a much larger sample reported that 51% of 16- to 24-year-olds had voted ‘no’. 2 In whatever way the first sample had been selected, its small size would have made it highly susceptible to sampling error. When there is a strong expectation that a particular event is going to result there is a strong inclination to believe the anticipated outcome.
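The fragility of a 14-person poll can be made concrete with a confidence interval. The sketch below is illustrative only: it applies the Wilson score interval (a standard method chosen here, not one named in the editorial) to the 10 'yes' votes out of 14, and it assumes the 14 respondents can be treated as a simple random sample, which the editorial itself doubts.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# 14 young voters in the exit poll, 4 'no' votes, so 10 'yes' votes.
low, high = wilson_interval(successes=10, n=14)
print(f"Observed 'yes' share: {10 / 14:.0%}")
print(f"Approximate 95% interval: {low:.0%} to {high:.0%}")
```

Under these assumptions the interval is wide enough to include a minority 'yes' share, so the headline figure of 71% is compatible with the later, larger poll.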

Sampling in epidemiological studies

Sampling for health-related research does not usually need to be as precise as sampling for political surveys, but in epidemiological investigations every effort should be made to select a representative sample. Often this is not achieved. Concern has been expressed for years about the number of prisoners who have mental health problems. In a 1979 study in the USA to estimate the prevalence of mental illness in prisoners, 33 male prisoners were selected and interviewed by a psychiatrist using an instrument called the Psychiatric Status Schedule. 3 Of those interviewed, 3% were diagnosed as having a mental disorder and 27% had a drug or alcohol problem. 4 The main problems with this paper are the number of people sampled and how they were selected. Although it is stated that the prisoners were selected at random, the number selected for interview is low and the procedure for randomisation is not indicated. Female prisoners were not included. The prevalence of mental illness determined from a survey in one prison in one state in the USA cannot be extrapolated to the whole country, where there are more than six grades of prisons according to the degree of security required. There is no indication in the paper of how the sampling procedure controlled for the proportion of inmates who were detained and those who were sentenced. Apart from sampling errors, justifiable criticism can also be made of the reliance on a single psychiatrist to review all prisoners, the categorical method of diagnosis (mental disorder or drug or alcohol misuse) and the use of the Psychiatric Status Schedule, which is reported to have low consistency in many of its scales. Under these circumstances it is unsurprising that the estimate of prevalence of mental disorder in this survey did not accord with a recent systematic review examining studies over a 40-year period, which found that 14% of prisoners had a diagnosed psychiatric disorder. 5

When carrying out a survey of any type it is essential for the researcher to define clearly the target population that they wish to sample. On some occasions the population will be sufficiently small for the researcher to include the entire population in the study. This is termed a census study. Much more frequently the population is too large for all its members to be contacted and so a sample is chosen to reflect the characteristics of the population from which it is drawn.

Sampling methods

Sampling methods are described as either probability or non-probability methods ( Box 1 ). 6 In probability samples, each member of the population has a known, non-zero chance of being selected. Types of probability sampling include random sampling, stratified and systematic sampling. Probability sampling is a more accurate method of determining the true characteristics of the population but it is not perfect. Sampling error refers to the variations from the true population parameter which can result from random sampling. With true probability samples sampling error is reduced by having larger samples. In non-probability sampling, the degree to which the sample differs from the population is unknown.

Box 1 Sampling methods

Census study: whole population under enquiry

Probability sampling:

  • random
  • stratified
  • systematic

Non-probability sampling:

  • convenience
  • judgement
  • quota
  • snowball

Qualitative research: purposive

Sample size

To estimate how large the sample should be to reflect the total population, three quantities need to be determined: the confidence level required for the mean of the results, a measure of the variability of the responses (the standard deviation) and the allowable margin of error. The calculation is not difficult and help can be readily accessed ( www.qualtrics.com/blog/determining-sample-size ).
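As an illustration of the calculation described above, the following minimal sketch uses the usual normal-approximation formula n = (z × SD / margin)², and its analogue for a proportion. It assumes simple random sampling from a large population; the function names and the example values are hypothetical and are not taken from the article or the linked page.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided standard normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sd, margin, confidence=0.95):
    """Smallest n for which the margin of error of a sample mean is at most `margin`."""
    z = z_for_confidence(confidence)
    return math.ceil((z * sd / margin) ** 2)


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n for a proportion; p=0.5 is the most conservative assumption."""
    z = z_for_confidence(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs: SD of 10 points, allowable error of 2 points.
n_mean = sample_size_for_mean(sd=10, margin=2)
# Hypothetical inputs: proportion estimated to within 5 percentage points.
n_prop = sample_size_for_proportion(margin=0.05)
print(n_mean, n_prop)
```

Halving the allowable margin of error roughly quadruples the required sample, which is why the margin should be chosen before, not after, the survey is costed.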

Types of probability sampling

Random sampling.

In random sampling every member of the population has the same chance (probability) of being selected into the sample. Using a random sample it is possible to describe quantitatively the relationship between the sample and the underlying population, giving the range of values, called confidence intervals, in which the true population parameter is likely to lie. Random does not mean arbitrary. Choosing a random sample relies on an objective mechanism to select elements from the population. This is usually done by a computer, but rolling dice or using random numbers are also acceptable options.
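The sketch below shows one way a computer can act as the objective selection mechanism and how a confidence interval is then obtained from the sample. The population of scores is simulated purely for illustration, the helper names are hypothetical, and the interval uses a normal approximation, which is an assumption that is reasonable only for fairly large samples.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    centre = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    return centre - z * standard_error, centre + z * standard_error


# Simulated (hypothetical) population of 10,000 questionnaire scores.
population_rng = random.Random(1)
population = [population_rng.gauss(50, 10) for _ in range(10_000)]

sample = simple_random_sample(population, 400, seed=2)
low, high = mean_confidence_interval(sample)
print(f"sample mean {mean(sample):.1f}, 95% CI {low:.1f} to {high:.1f}")
```

Fixing the seed makes the selection reproducible, so the sampling procedure can be reported and audited.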

Stratified and systematic sampling

Stratified sampling is often used when one or more of the strata (subsets of the population) have a low incidence relative to the other strata. It can also be used to reduce sampling error.

In systematic sampling every 5th, 10th, 20th or n-th record is selected from a list of population members. It is effectively a form of random sampling, provided that the starting point is chosen at random and the list has no periodic ordering.
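Both methods can be sketched in a few lines. The register below is a made-up list in which one grade is rare, the function names are hypothetical, and the random starting point for the systematic sample is an assumption added here rather than something the text specifies.

```python
import random
from collections import defaultdict


def systematic_sample(records, step, seed=None):
    """Select every `step`-th record, starting at a random offset below `step`."""
    start = random.Random(seed).randrange(step)
    return records[start::step]


def stratified_sample(records, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical register: 1 in 10 members belongs to the rarer grade.
register = [
    (i, "associate specialist" if i % 10 == 0 else "consultant")
    for i in range(1000)
]

every_20th = systematic_sample(register, step=20, seed=1)
by_grade = stratified_sample(
    register,
    stratum_of=lambda record: record[1],
    n_per_stratum={"consultant": 50, "associate specialist": 50},
    seed=1,
)
print(len(every_20th), {grade: len(rows) for grade, rows in by_grade.items()})
```

Sampling 50 from each grade deliberately over-represents the rarer grade, so any estimate for the whole population would need to weight the strata back to their true sizes.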

Non-probability sampling

In non-probability sampling members are selected from the population in any form of non-random manner. Examples include convenience sampling, judgement sampling, quota sampling and snowball sampling.

Convenience sampling

Convenience (also called accidental or opportunistic) sampling is used to obtain a cheap estimate of the truth. An easily accessible, non-random selection of the population under enquiry is chosen. A frequently used method is contacting people by email.

Judgement sampling

Judgement sampling is an extension of convenience sampling. For example, when carrying out a national enquiry on the frequency of depressive illness, one specific town and one rural area that are thought to be typical of the country as a whole may be selected. Ideally, the chosen sample needs to be representative of the entire population, and this is difficult to determine.

Quota sampling

Quota sampling is the non-probability equivalent of stratified sampling. In the first instance the investigator identifies the strata and their frequency in the population. Convenience sampling is then used to select the required number of participants from each stratum.
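A minimal sketch of the two steps follows: quotas are fixed per stratum, then people are accepted in whatever order they happen to be reached until each quota is full. The data and function name are hypothetical. Note that nothing in the procedure is random, which is exactly why the result remains a non-probability sample.

```python
def quota_sample(arrivals, stratum_of, quotas):
    """Accept people in arrival order until every stratum's quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:
            chosen.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return chosen


# Hypothetical stream of easily reached respondents: (name, sex).
arrivals = [("a", "F"), ("b", "F"), ("c", "M"), ("d", "F"), ("e", "M")]
chosen = quota_sample(arrivals, stratum_of=lambda p: p[1], quotas={"F": 2, "M": 1})
print(chosen)
```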

Snowball sampling

Snowball sampling is a special non-probability method used when there are difficulties in identifying members of the population or if the desired sample characteristic is rare. This technique relies on existing study participants recruiting future participants from among their acquaintances. It is often used when it is anticipated that individuals may be reluctant to be identified, for instance when surveying illegal drug users. Although inexpensive, major bias may result because a balanced cross-section of the population is not identified.

Which sampling method to use?

Which sampling method to use depends on the nature of the survey proposed. Epidemiological research requires a representative sample, but a great deal of health research does not need one. Service evaluations and randomised controlled trials (RCTs) do not require a survey design. In an RCT the main purpose is to compare groups within the sample, such as treatment v. placebo, to which members are allocated at random. Similarly, health psychometrics (e.g. design of health measures), experimental studies, theory-based research studies (e.g. testing a theory or proposing a new theory) and observational studies (e.g. looking for relationships between theoretical constructs, such as depression and self-esteem) are mostly conducted using opportunistic samples. Precisely accurate statistics may not be required.

Qualitative researchers are often concerned with what exists rather than how much, 7 and seek to delve into complex processes such as responding to long-term illnesses. Purposive sampling, one of the most common sampling strategies, groups participants according to pre-selected criteria relevant to a particular research question. There are more: Kuzel 8 identified 13 different forms of qualitative sampling strategy, including maximum variation, theory-driven, critical case and deviant case. One case is sufficient at times to illustrate a point. For example, Heyman et al 9 explored the experiences of a female patient who had ‘risked exploding’, according to a colorectal nurse, by absconding from hospital to have sexual intercourse with her boyfriend immediately after anal cancer surgery. The aim of the study was to understand why one particular individual had behaved in such a medically risky and highly unusual way. A recent introduction to qualitative research methodology is provided by Silverman. 10

Hazards of non-probability sampling

When performing a survey there is a strong temptation to obtain information from as much of the population as possible in the belief that accuracy can be increased in this way. An example is given to show that this may be fallacious.

Many of us are interested in psychiatrists' views about service issues. A researcher wishes to find out the opinions of psychiatrists about policy regarding controlled drugs. A questionnaire is designed with a number of statements ranging from tighter control over existing drugs to decriminalisation of all unscheduled agents. Respondents have to select which statement best accords with their views. The researcher is also interested in the responses of different grades of psychiatrist, to see whether attitudes differ between consultants and trainee psychiatrists. The Royal College of Psychiatrists holds the names of all psychiatrists in the UK, and the researcher is given access to this list. The researcher reasons that as many psychiatrists as possible should be included, and so all the psychiatrists are contacted by email and asked their views. When all the questionnaires are returned online the response rate is 38%, with 5128 psychiatrists completing the questionnaire. The analysis of the replies of this large number of people takes a good deal of time but this is completed after a few months and the paper is written. It is submitted to a prestigious psychiatric journal and is rejected. What were the reasons?

A proportion of the individuals would not have been contactable by email, and this group may have different attitudes from the rest. The nature of the responses of those individuals who failed to reply to the questionnaire, the majority, is unknown. They might have differed from respondents if, for instance, busier or more stressed psychiatrists were less likely to participate. As a result, the sample identified by the researcher may not have been representative and the findings cannot be safely generalised to all those working in this field. This is a non-probability sample and, as such, statistical inferences cannot be validly made from the results. Notwithstanding, the results of this survey are not valueless. Although they cannot be reliably generalised to the total population of psychiatrists, they could still be useful for piloting purposes. Certain questions on the survey could be refined and/or alternative questions included in a later enquiry.

How to conduct a probability sample

In the example referred to above the sample size should be determined (see earlier) and the names of those selected for interview entered into a sampling frame. Attempts should be made to contact all those included to ensure that the results are representative. Multiple efforts must be made to persuade those selected to complete the survey questionnaire. If most of the initially identified sample do provide information, the results can be analysed statistically and valid conclusions can be drawn.

The researcher will need to decide whether to aim for a simple probability sample or to stratify the sample by predetermining the numbers to be selected randomly within relevant categories, in this case, for example, occupational grade (consultant, specialist registrar, etc.) and gender. Stratification can be used to ensure that the sample is representative of the population with respect to the chosen population parameters, if these are known, or, more commonly, to ensure that categories with smaller numbers in the population (e.g. associate specialists) are adequately represented for comparative purposes. An introduction to stratified and other forms of complex probability samples is provided by Bryman. 11

Selection bias

Selection bias can arise if a substantial number of the individuals identified in the sampling frame fail to complete the questionnaire. The greater the number of non-respondents, the more scope there is for the sample to be skewed in an unknown direction. As a rule of thumb, the researcher should aim for at least 60% completion by those selected from the sampling frame and every effort should be made to achieve more than this. If the percentage of those completing the questionnaire is less than 100%, as it almost invariably will be, there are a number of strategies the investigator can adopt to manage non-response bias.

Avoiding non-response bias

In the first instance, the non-respondents should be approached again and asked to complete the questionnaire. A third attempt should be made to urge those who still fail to respond. Comparisons can then be made between first-, second- and third-time responders. If the responses are similar then extra sampling may not be needed. If the responses of the late respondents are very different from those of the rest of the sample then it may be necessary to contact more of the non-respondents. This depends on the proportion of the sample completing the survey: the larger the proportion, the better.
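One simple way to compare first-, second- and third-time responders is a Pearson chi-square test on the counts of answers per wave. This is a sketch of that idea, not a method prescribed by the text: the counts are invented, the function name is hypothetical, and the closed-form p-value used here is valid only for 2 degrees of freedom (three waves by two answer categories).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts of [agree, disagree] among first-, second- and
# third-time responders.
waves = [[120, 80], [45, 35], [20, 30]]

statistic = chi_square_statistic(waves)
# For 2 degrees of freedom the chi-square tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

A small p-value would suggest that late responders answer differently, which is the situation in which further follow-up of non-respondents is most worthwhile.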

It may not be necessary to obtain more data, as it has been shown that late responders resemble non-responders more closely than first-time responders do, 12 so the responses of the late responders can be applied to those who failed to respond to the enquiry. This cannot be assumed, however, as late respondents in some surveys behave like earlier participants. 13

It has also been shown that if a small random sample of non-respondents is selected and all of them can be contacted and complete the survey, the results can be extrapolated to the remainder of the non-respondents. A number as small as 20 is considered to be sufficient for this purpose if all complete the questionnaire. 14 In practice, it is very difficult to ensure a 100% response in a survey of this nature and this aim may not be achievable.

We hope this article will persuade the reader to examine the methods that have been used to perform surveys of opinions and other issues. Let us quote a final example. A Mail On Sunday poll in August 2011 showed that the majority of those surveyed backed the reintroduction of capital punishment. 15 One thousand people took part in this survey, which was said to be representative of British public opinion. The consumer panel from which these people were selected was contacted online, so those without email access were not included. Furthermore, members of this panel are paid for registering their interest and for each poll in which they give their opinion. They are possibly representative of the Daily Mail readership but not of the general population, whose views may or may not correspond to those of the sample.

Those intending to perform surveys can find more information in this document: www.sagepub.com/upm-data/40803_5.pdf . Those wishing to carry out surveys on psychiatric topics, particularly if involving the membership of the Royal College of Psychiatrists, should contact the College Registrar.

Acknowledgments

We thank Dr Jonathan Tyrer, Genetic Epidemiology Group, Department of Oncology, Cambridge University, for helpful advice on the manuscript.

Declaration of interest None.

Studies of prevalence: how a basic epidemiology concept has gained recognition in the COVID-19 pandemic

BMJ Journals, Volume 12, Issue 10

  • Diana Buitrago-Garcia 1,2 ( http://orcid.org/0000-0001-9761-206X )
  • Georgia Salanti 1 ( http://orcid.org/0000-0002-3830-8508 )
  • Nicola Low 1 ( http://orcid.org/0000-0003-4817-8986 )
  • 1 Institute of Social and Preventive Medicine, University of Bern Faculty of Medicine, Bern, Switzerland
  • 2 Graduate School of Health Sciences, University of Bern, Bern, Switzerland
  • Correspondence to Professor Nicola Low; nicola.low{at}ispm.unibe.ch

Background Prevalence measures the occurrence of any health condition, exposure or other factors related to health. The experience of COVID-19, a new disease caused by SARS-CoV-2, has highlighted the importance of prevalence studies, for which issues of reporting and methodology have traditionally been neglected.

Objective This communication highlights key issues about risks of bias in the design and conduct of prevalence studies and in reporting them, using examples about SARS-CoV-2 and COVID-19.

Summary The two main domains of bias in prevalence studies are those related to the study population (selection bias) and the condition or risk factor being assessed (information bias). Sources of selection bias should be considered both at the time of the invitation to take part in a study and when assessing who participates and provides valid data (respondents and non-respondents). Information bias appears when there are systematic errors affecting the accuracy and reproducibility of the measurement of the condition or risk factor. Types of information bias include misclassification, observer and recall bias. When reporting prevalence studies, clear descriptions of the target population, study population, study setting and context, and clear definitions of the condition or risk factor and its measurement are essential. Without clear reporting, the risks of bias cannot be assessed properly. Bias in the findings of prevalence studies can, however, impact decision-making and the spread of disease. The concepts discussed here can be applied to the assessment of prevalence for many other conditions.

Conclusions Efforts to strengthen methodological research and improve assessment of the risk of bias and the quality of reporting of studies of prevalence in all fields of research should continue beyond this pandemic.

  • EPIDEMIOLOGY
  • STATISTICS & RESEARCH METHODS

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:  https://creativecommons.org/licenses/by/4.0/ .

https://doi.org/10.1136/bmjopen-2022-061497


In introductory epidemiology, students learn about prevalence, an easy to understand concept, defined as ‘a proportion that measures disease occurrence of any type of health condition, exposure, or other factor related to health’, 1 or ‘the proportion of persons in a population who have a particular disease or attribute at a specified point in time or over a specified period.’ 2 Prevalence is an important measure for assessing the magnitude of health-related conditions, and studies of prevalence are an important source of information for estimating the burden of disease, injuries and risk factors. 3 Accurate information about prevalence enables health authorities to assess the health needs of a population, to develop prevention programmes and prioritise resources to improve public health. 4 Perhaps, owing to the apparent simplicity of the concept of prevalence, methodological developments to assess the quality of reporting, the potential for bias and the synthesis of prevalence estimates in meta-analysis have been neglected, 5 when compared with the attention paid to methods relevant to evidence from randomised controlled trials and comparative observational studies. 6 7

The COVID-19 pandemic has shown the need for epidemiological studies to describe and understand a new disease quickly but accurately. 8 Studies of prevalence have been an important source of evidence to describe the occurrence of active SARS-CoV-2 infection and of antibodies to SARS-CoV-2 and the spectrum of SARS-CoV-2-related morbidity, and have helped to understand factors related to infection and disease to inform national decisions about containment measures. 9–11 Accurate estimates of prevalence of SARS-CoV-2 are crucial because they are used as an input for the estimation of other quantities, such as infection fatality ratios, which can be calculated indirectly using seroprevalence estimates. 12 Assessments of published studies have, however, highlighted methodological issues that affect study design, conduct, analysis, interpretation and reporting. 13–15 In addition, some questions about prevalence need to be addressed through systematic reviews and meta-epidemiological studies. A high proportion of published systematic reviews of prevalence, however, also have flaws in reporting and methodological quality. 5 16 Confidence in the results of systematic reviews is determined by the credibility of the primary studies and the methods used to synthesise them.
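To make the dependence on seroprevalence concrete, here is a deliberately simplified sketch of the indirect calculation: infections are approximated as seroprevalence multiplied by population size, and the infection fatality ratio is deaths divided by infections. All numbers are invented, the function name is hypothetical, and the sketch ignores delays between infection, seroconversion and death, as well as test error; these simplifications are assumptions made here.

```python
def infection_fatality_ratio(deaths, seroprevalence, population):
    """Deaths divided by the number of infections implied by seroprevalence."""
    infections = seroprevalence * population
    return deaths / infections


# Hypothetical setting: 1,000,000 people and 500 deaths.
unbiased = infection_fatality_ratio(500, seroprevalence=0.05, population=1_000_000)
# The same deaths, but seroprevalence overestimated through selection bias.
biased = infection_fatality_ratio(500, seroprevalence=0.10, population=1_000_000)
print(f"IFR {unbiased:.2%} with 5% seroprevalence, {biased:.2%} with 10%")
```

Because seroprevalence sits in the denominator, overestimating it by a factor of two halves the resulting infection fatality ratio.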

The objective of this communication is to highlight key issues about the risk of bias in studies that measure prevalence and about the quality of reporting, using examples about SARS-CoV-2 and COVID-19. We refer to prevalence at the level of a population, and not as a prediction at an individual level. The estimand is, therefore, ‘what proportion of the population is positive’ and not ‘what is the probability this person is positive.’ Although incidence and prevalence are related epidemiologically, we do not discuss incidence in this article because the study designs for measurement of the quantities differ. Bias is a systematic deviation of results or inferences from the underlying (unobserved) true values. 1 The risk of bias is a judgement about the degree to which the methods or findings of a study might underestimate or overestimate the true value in the target population, 7 in this case, the prevalence of a condition or risk factor. Quality of reporting refers to the completeness and transparency of the presentation of a research publication. 17 Risk of bias and quality of reporting are separate, but closely related, because it is only possible to assess the strengths and weaknesses of a study report if the methods and results are described adequately.

Bias in prevalence studies

The two main domains of bias in prevalence studies are those related to the study population (selection bias) and the condition being assessed (information bias) ( figure 1 ). Biases involved in the design, conduct and analysis of a study affect its internal validity. Selection bias also affects external validity, the extent to which findings from a specific study can be generalised to a wider, target population in time and space. There are many names given to different biases, often addressing the same concept. For this communication, we use the names and definitions published in the Dictionary of Epidemiology. 1

Figure 1 Potential for selection bias and information bias in prevalence studies. Coloured lines relate to the coloured boxes, showing at which stage of study procedures selection bias (blue line) and information bias (purple line) can occur.

Selection bias

Selection bias relates to the representativeness of the sample used to estimate the prevalence in relation to the target population. The target population is the group of individuals to whom the findings, conclusions or inferences from a study can be generalised. 1 There are two steps in a prevalence study at which selection bias might occur: at the invitation to take part in the study and, among those invited, who takes part ( figure 1 ).

Selection bias in the invitation to take part in the study

The probability of being invited to take part in a study should be the same for every person in the target population. Evaluation of selection bias at this stage should, therefore, account for the complexity of the strategy for identification of participants. For example, if participants are invited from people who have previously agreed to participate in a registry or cohort, each level of invitation that has contributed to the final setting should be judged for the increasing risk of self-selection. Those who are invited to take part might be defined by demographic characteristics (eg, children below 10 years) or study setting (eg, hospitalised patients), or might be a random sample of the general population. The least biased method to select participants in a prevalence study is to sample at random from the target population. For example, the Real-time Assessment of Community Transmission (REACT) Studies, which assess the prevalence of the virus using molecular diagnostic tests (REACT-1) and of antibodies (REACT-2), invite random samples of people, stratified by area, from the National Health Service patient list in England. 9 Those invited are close to a truly random sample because almost everyone in England is registered with a general practitioner. In some cases, criteria applied to the selection of a random sample might still result in considerable bias. For example, a seroprevalence study conducted in Spain did not include care home addresses, which could have excluded around 6% of the Spanish older population. 18 Excluding people in care home facilities might underestimate SARS-CoV-2 seroprevalence in older adults, if their risk of exposure was higher than the average in the general population. 13 Other methods of sampling are at risk of selection bias. Asking for volunteers through advertisements is liable to selection bias because not everyone has the same probability of seeing or replying to the advert. For example, the use of social media to invite people to a drive-in test centre to estimate the population prevalence of antibodies to SARS-CoV-2, 19 or online adverts to assess mental health symptoms during the pandemic, excludes those without an internet connection or who do not use social media, such as older people. 20

Selection bias related to who takes part in the study

Non-response bias occurs when people who have been invited but do not take part in a study differ systematically from those who take part, in ways that are associated with the condition of interest. 21 In the REACT-1 study, 22 for example, across four survey rounds, the investigators invited 2.4 million people; 596 000 swabs that were returned had a valid result (25%). The proportion of participants responding was lower in later than in earlier rounds, in men than in women and in younger than in older age groups. If the sociodemographic characteristics of the target population are known, the observed results could be weighted statistically to represent the overall population but might still be biased by unmeasurable characteristics that drive willingness to take part.
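The statistical weighting mentioned above can be sketched as post-stratification: the prevalence observed in each sociodemographic stratum is weighted by that stratum's share of the target population. The strata, counts and shares below are invented, the function name is hypothetical, and this is not a description of the weighting actually used in REACT-1.

```python
def weighted_prevalence(sample_counts, population_shares):
    """Post-stratified prevalence.

    sample_counts maps stratum -> (positives, number tested);
    population_shares maps stratum -> share of the target population.
    """
    total_share = sum(population_shares[s] for s in sample_counts)
    return sum(
        population_shares[s] / total_share * positives / tested
        for s, (positives, tested) in sample_counts.items()
    )


# Hypothetical survey in which younger people are under-represented.
sample_counts = {"younger": (30, 200), "older": (40, 800)}
population_shares = {"younger": 0.5, "older": 0.5}

crude = sum(p for p, _ in sample_counts.values()) / sum(
    n for _, n in sample_counts.values()
)
adjusted = weighted_prevalence(sample_counts, population_shares)
print(f"crude {crude:.1%}, weighted {adjusted:.1%}")
```

The weighting corrects only for characteristics that were measured; if willingness to take part depends on something unrecorded, the weighted estimate can still be biased.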

The direction of non-response bias is often not predictable (it can result in over- or underestimation of the true prevalence) because information about the motivation to take part in a study, or not, is not usually collected. 13 In a multicentre cross-sectional survey of the prevalence of PCR-determined SARS-CoV-2 in hospitals in England, the authors suggested that different selection biases could have had opposing effects. 23 For example, staff might have volunteered to take part if they were concerned that they might have been exposed to COVID-19. If such people were more likely than unexposed people to be tested, prevalence might be overestimated. Alternatively, workers in lower-paid jobs without financial support might have been less likely to take part than those at higher grades because of the consequences for themselves or their contacts if found to be infected. If the less well-paid jobs are also associated with a higher risk of exposure to SARS-CoV-2, the prevalence in the study population would be underestimated. Accorsi et al suggest that the risk of non-response bias in seroprevalence studies might be reduced by sampling from established and well-characterised cohorts with high levels of participation, in whom the characteristics of non-respondents are known. 13

As the proportion of invited people who do not take part in a study (non-respondents) increases, the probability of non-response bias might also increase if the topic of the study influences the probability of taking part and the composition of the study population. 24 Empirical evidence of bias was found in a systematic review of sexually transmitted Chlamydia trachomatis infection; prevalence surveys with the lowest proportion of respondents found the highest prevalence of infection, suggesting selective participation by those with a high risk of being infected. 25 Whether or not there is a dose–response relationship between the proportion of non-respondents and the likelihood of SARS-CoV-2 infection is unclear. The risks of selection bias at the stages of invitation and participation can be interrelated and might oppose each other. In the REACT-1 study, 22 it is not clear whether the reduction in selection bias through random sampling outweighed the potential for selection bias owing to the high and increasing proportion of non-respondents over time or vice versa.

Information bias

Information bias occurs when there are systematic errors affecting the completeness or accuracy of the measurement of the condition or risk factor of interest. There are different types of information bias.

Misclassification bias

This bias refers to the incorrect classification of a participant as having, or not having, the condition of interest. Misclassification is an important source of measurement bias in prevalence studies because diagnostic tests are imperfect and might not distinguish clearly between those with and without the condition. 26 For diagnostic tests, the predictive values will also be influenced by the prevalence of the condition in the study population. Seroprevalence studies are essential for determining the proportion of a population that has been exposed to SARS-CoV-2 up to a given time point. Detection of antibodies is affected by the test type and manufacturer, the sample type (such as serum, dried blood spots, saliva, urine or others), 27 28 and the time of sampling after infection. Different diagnostic tests might also be used in participants in the same study population, but adjustment for test performance is not always appropriate because the characteristics derived from studies in which the tests were validated might differ from the study population. 13 Accorsi et al have described this issue, and other biases in the ascertainment of SARS-CoV-2 seroprevalence, in detail. 13 Test accuracy can also change across populations, owing to the inherent characteristics of tests when clinical variability is present, 29 for example, when tests for SARS-CoV-2 detection are applied to people with or without symptoms.
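Two standard calculations illustrate these points. The first is the positive predictive value, which falls as prevalence falls even when sensitivity and specificity are fixed. The second is the Rogan-Gladen estimator, a commonly used adjustment of apparent prevalence for test error; it is named here as an illustration and is not a method the article itself prescribes. The sensitivity and specificity values are invented, and, as the text cautions, values taken from validation studies may not hold in the study population.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person with a positive test truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust the proportion testing positive for imperfect test accuracy."""
    adjusted = (apparent_prevalence + specificity - 1) / (
        sensitivity + specificity - 1
    )
    return min(1.0, max(0.0, adjusted))  # keep the estimate within 0 to 1


# Hypothetical antibody test: 90% sensitivity, 98% specificity.
ppv_low = positive_predictive_value(0.01, sensitivity=0.90, specificity=0.98)
ppv_high = positive_predictive_value(0.20, sensitivity=0.90, specificity=0.98)
adjusted = rogan_gladen(0.05, sensitivity=0.90, specificity=0.98)
print(f"PPV {ppv_low:.1%} at 1% prevalence, {ppv_high:.1%} at 20% prevalence")
print(f"apparent 5.0% -> adjusted {adjusted:.1%}")
```

At low prevalence most positive results from this hypothetical test would be false positives, so an unadjusted seroprevalence estimate would overstate the true proportion exposed.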

In a new disease, such as COVID-19, diagnostic criteria might not be standardised or might change over time. For example, accurate assessment of the prevalence of persistent asymptomatic SARS-CoV-2 infection requires a complete list of symptoms and follow-up for a sufficiently long duration to ensure that symptoms did not develop later. 15 30 In a prevalence study conducted in a care home in March 2020, patients were asked about typical and non-typical symptoms of COVID-19. However, symptoms such as anosmia or ageusia had not been reported in association with SARS-CoV-2 at that time, so patients with these as isolated symptoms could have been wrongly classified as asymptomatic. 15 31 Poor quality of data collection has also been found in studies estimating the prevalence of mental health problems during the pandemic. 32 The use of non-validated scales, or dichotomisation to define cases using inappropriate or unclear thresholds, will bias the estimated prevalence of the condition. Misclassification may also occur in calculations of the prevalence of SARS-CoV-2 in contacts of diagnosed cases if not all contacts are tested and it is assumed that individuals who were not tested were uninfected. 13

Recall bias

This bias results in misclassification when the condition has been measured through surveys or questionnaires that rely on memory. A study that aimed to describe the characteristics and symptom profile of individuals with SARS-CoV-2 infection in the USA collected information about symptoms before, and for 14 days after, being enrolled in the study. 33 The authors discuss the potential for recall bias when collecting symptoms retrospectively and if different people recollect different symptoms.

Observer bias

This bias occurs when an observer provides a wrong measurement due to lack of training or subjectivity. 21 For example, a study in the USA found variation between 14 universities in the prevalence of clinical and subclinical myocarditis in competitive athletes with SARS-CoV-2 infection. 34 One of the diagnostic tools was cardiac magnetic resonance imaging and authors attributed some of the variability to differences in the protocols and the expertise among assessors. To reduce the risk of observer bias, researchers should aim to use tools that minimise subjectivity and standardise training procedures.

Reporting studies of prevalence

There is no agreed list of preferred items for reporting studies of prevalence. The published article or preprint is usually the only available record of a study to which most people, other than the investigators themselves, have access. The written report, therefore, needs to contain the information required to understand the possible biases and assess internal and external validity. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement is a widely used guideline, which includes recommendations for cross-sectional studies that examine associations between an exposure and outcome. 35 Table 1 shows selected items from the STROBE statement and recommendations for cross-sectional studies that are particularly relevant to the complete and transparent description of methods for studies of prevalence.

Items from the STROBE checklist for cross-sectional studies that are relevant for prevalence studies

First, clear definitions of the target population, study setting and eligibility criteria to select the study population are required (STROBE items 5, 6a). These issues affect assessment of external validity 36 because estimates of prevalence in a specific population and setting are often generalised more widely. 1 14 Second, the denominator used to calculate the prevalence should be clearly stated, with a description of each stage of the study showing the numbers of individuals eligible, included and analysed (STROBE item 13a, b). Accurate reports of the numbers and characteristics of those who take part (responders) or do not take part (non-responders) in the study are needed for the assessment of selection bias, but this information is not always available. 24 37 Poor reporting about the proportion of responders has been described as one of the main limitations of studies in systematic reviews of prevalence. 38 As with reports of studies of any design, the statistical methods applied to provide prevalence estimates, including methods used to address missing data (STROBE item 12c) and to account for the sampling strategy (STROBE item 12d) need to be reported clearly. 35 The setting, location and periods of enrolment and data collection (STROBE item 5) are particularly important for studies of SARS-CoV-2; the stage of the pandemic, preventive measures in place and virus variants in circulation should all be described because these affect the interpretation of estimates of prevalence. Third, it is crucial to provide a clear definition of the condition or risk factor of interest (STROBE item 7) and how it was measured (STROBE item 8), so that the risk of information bias can be assessed. The definition may be straightforward if there are objective criteria for ascertainment. For example, studies of the prevalence of active SARS-CoV-2 infection should report the diagnostic test, manufacturer, sample type and criteria for a positive result. 39 40 For new conditions that have not been fully characterised, such as post-COVID-19 condition, also known as ‘long COVID-19’, reporting of prevalence can be challenging. 41 42 The WHO produced a case definition 43 in October 2021, but this might take time to be adopted widely.

The COVID-19 pandemic has produced an enormous amount of research about a single disease, published over a short time period. 44 45 Authors who have assessed the body of research on COVID-19 have highlighted concerns about the risks of bias in different study designs, including studies of prevalence. 13 44 In systematic reviews of a single topic, the occurrence of asymptomatic SARS-CoV-2 infection, we observed high between-study heterogeneity, serious risks of bias and poor reporting in the measurement of prevalence. 30 Biased results from prevalence studies can have a direct impact at the levels of the individual, community, global health and policy-making. This communication describes concepts about risks of bias and provides examples that authors can apply to the assessment of prevalence for many other conditions. Future research should be conducted to investigate sources of bias in studies of prevalence and empirical evidence of their influence on estimates of prevalence. The development of a tool that can be adapted to assess the risk of bias in studies of prevalence, and an extension to the STROBE reporting guideline, specifically for studies of prevalence, would help to improve the quality of published studies of prevalence in all fields of research beyond this pandemic.

Acknowledgments

We would like to thank Yuly Barón, who created figure 1.

Twitter @dianacarbg, @Geointheworld, @nicolamlow

Contributors DB-G, GS and NL conceptualised the project. DB-G and NL wrote the manuscript. GS and NL provided feedback. GS and NL supervised the research. All authors edited the manuscript and approved the final manuscript.

Funding This work received support from the Swiss government excellence scholarship (grant number 2019.0774), the SSPH+ Global PhD Fellowship Programme in Public Health Sciences of the Swiss School of Public Health, and the Swiss National Science Foundation (project number 176233, and National Research Programme 78 COVID-19, project number 198418).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

  • Open access
  • Published: 30 July 2022

Sample size calculation for prevalence studies using Scalex and ScalaR calculators

  • Lin Naing   ORCID: orcid.org/0000-0003-1723-9854 1 ,
  • Rusli Bin Nordin   ORCID: orcid.org/0000-0003-1878-3501 2 ,
  • Hanif Abdul Rahman   ORCID: orcid.org/0000-0003-3022-8690 1 , 3 , 4 &
  • Yuwadi Thein Naing   ORCID: orcid.org/0000-0001-7842-0927 5  

BMC Medical Research Methodology volume 22, Article number: 209 (2022)

Although books and articles guiding the methods of sample size calculation for prevalence studies are available, we aim to guide, assist and report sample size calculation using the present calculators.

We present and discuss four parameters (namely level of confidence, precision, variability of the data, and anticipated loss) required for sample size calculation for prevalence studies. We mainly discuss how to choose correct parameters with proper understanding, and how to report the calculation. We demonstrate the use of purposely designed calculators that assist users in making properly informed decisions and preparing an appropriate report.

The two calculators run on free software (Spreadsheet and RStudio), which benefits researchers with limited resources. They will, hopefully, minimize errors in parameter selection, calculation, and reporting. The calculators are available at: ( https://sites.google.com/view/sr-ln/ssc ).

In quantitative research, when we take a sample from a study population or eligible population in order to save our resources, there are two important statistical processes namely using a probability sampling method (commonly known as “random sampling”) [ 1 ], and calculating an appropriate sample size [ 2 ]. Both are equally important to ensure a good representative sample for the study population.

As we need a specific statistical analysis for a specific research objective, we also need a specific sample size calculation method for a specific research objective. Even if two research objectives require a similar statistical analysis, the sample size might differ depending on the parameters that we use for the calculation. In this paper, we focus on the objective of estimating a prevalence or proportion, for example, the prevalence of obesity, smoking, heart disease, diabetes mellitus or any other disease in a study population. The method in this paper is not suitable for other types of objectives, such as estimating a mean, comparing means, comparing proportions or regression analyses.

Books [ 3 , 4 ] and published articles [ 5 , 6 ] guiding the methods of sample size calculation for prevalence studies are available. Nevertheless, we observed that several parts of the sample size calculation process can be guided by software or a calculator, which can prevent incorrect calculation, incorrect use of formulae, incorrect parameters, and incomplete sample size reporting.

Sample size software and calculators are extremely helpful. They are available through commercial licenses, such as Power Analysis & Sample Size (PASS) [ 7 ], or as freely available software, such as Epitools [ 8 ] and the “presize” package in R [ 9 ]. However, a lot of confusion still exists, which results in users incorrectly calculating the sample size of their studies [ 10 , 11 ], especially the erroneous notion that one blanket formula can be used for all study designs [ 6 ]. In addition, users are expected to have some statistical knowledge to calculate and report the sample size. Incorrect sample size calculation could introduce statistical errors that give rise to inaccurate results, which could be serious, particularly in medical research, where evidence from research studies is the cornerstone of medical practice [ 12 , 13 ]. Many reasons could be given for this confusion, inaccuracy, and misunderstanding, in particular the complexity of the available software and the corresponding guidelines [ 13 ].

Therefore, in this paper, we address these issues by introducing a user-friendly Excel calculator that guides users step by step to the correct method and parameters. This calculator also generates a publication-style report of the adequate sample size for the user’s study. We believe that this will improve sample size calculation in future prevalence studies in medical and health sciences.

Implementation

Method to calculate sample size.

For an objective that estimates a prevalence, the sample size calculation formula is fairly simple and available in a number of books.

The following formula [ 2 ] shall be used:

$$ n = \frac{Z^{2} P (1 - P)}{d^{2}} $$

where n  = Sample size,

Z  = Z statistic for a level of confidence (1.96 for 95% confidence level),

P  = Expected prevalence or proportion, and

d  = Precision.

However, we do not encourage researchers to apply the formula by hand, as manual calculation is prone to human error. We can use available software instead, and concentrate on carefully choosing appropriate parameters for the calculation.
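As an illustration of letting software do the arithmetic, the following minimal Python sketch computes the standard single-proportion sample size, n = Z²P(1 − P)/d². It assumes that P and d are entered as proportions of one (30% as 0.30) and that the result is rounded up to a whole participant. The function name is hypothetical and the sketch is not the authors' Scalex or ScalaR code.

```python
import math

Z_95 = 1.96  # Z statistic for a 95% level of confidence, as stated in the text


def sample_size_single_proportion(p: float, d: float, z: float = Z_95) -> int:
    """Return n = Z^2 * P * (1 - P) / d^2, rounded up to a whole participant.

    Assumption: p and d are proportions of one (30% -> 0.30, 3% -> 0.03).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / d ** 2)


if __name__ == "__main__":
    # Expected prevalence 30%, precision +/- 3% (before any inflation for loss)
    print(sample_size_single_proportion(p=0.30, d=0.03))  # 897
    # Expected prevalence 50%, precision +/- 5%
    print(sample_size_single_proportion(p=0.50, d=0.05))  # 385
```

The first call gives the sample size before any allowance for anticipated loss, which is treated as a separate step.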

Appropriately choosing parameters

The above formula indicates three parameters to be determined.

Parameter 1: level of confidence

When we take a sample but wish to know about the population from which the sample is taken (such as its prevalence of smoking), we will not know the exact prevalence in the population because we do not study all of its members. However, the sample gives us an estimate with lower and upper limits (informally ‘a range’, but called an ‘interval’ in Statistics) for the population prevalence. We normally calculate these lower and upper limits, or interval, with a certain level of confidence. In medical and health fields, the level of confidence used for these intervals is almost always 95% (giving the 95% confidence interval, CI). In addition, most data analysis software gives results with 95% CIs by default. For these reasons, and also to minimize errors by non-statistician users, we have fixed the level of confidence at 95% in the presented calculators, without giving users a choice.

Parameter 2: precision

As mentioned above, we will not know the exact prevalence in the population because we do not study all of its members. Therefore, the prevalence we calculate from the sample could deviate from the population prevalence. We call this deviation sampling error. We also know that the larger the sample size, the smaller the error in estimation. This error is expressed as the precision, also known as the ‘margin of error’.

Practically, the precision reflects the width of 95% confidence interval. If we decide to choose an absolute precision of ± 2% in estimating a prevalence, we should expect, in the result, the width of 95% CI as 4% (example: 95% CI: 23%, 27%). If the absolute precision is ± 5% in estimating a prevalence, we should expect, in the result, the width of 95% CI as 10% (example: 95% CI: 20%, 30%). The width of the CI is twice that of the precision. Details are presented in Table 1 .

It is an opportunity for researchers to decide the precision (margin of error) and the width of the CI that they wish to see in the results. Normally, researchers wish to have narrower width of CI but the narrower it is, the more expensive (bigger sample size) it is going to be. Even if researchers decide to go for a smaller sample size, the researchers can also foresee or appreciate how poor CI width is going to be in their results. Therefore, this is an informed decision to be made by researchers.

Practically, we give some recommendations for choosing a precision value (Table 2 ). In general, well-funded studies or large scale studies, aiming to gain attention from policy makers, should aim for a precision of 2 to 3%, whereas small scale (or poorly-funded studies), for example, undergraduate or master student research projects, may consider a precision of 4 to 5%. If the precision is larger than 5% (such as 10%), due to limited resources, researchers should consider the study as a preliminary study.

However, the above recommendation applies to the expected prevalence of 10 to 90%. When the expected prevalence is too small (less than 10%) or too large (more than 90%), we need to apply much smaller precision. It is obvious that a precision of 5% is possible for an expected prevalence of 50%, but 5% precision is totally inappropriate for an expected prevalence of 2%.

We present details of precision for expected prevalence with examples in Table 2 .

Parameter 3: variability of the data

The larger the variation in the data, the larger the sample size needed. This relationship can be explained with a simple analogy. When we cook soup and it is nearly finished, we stir it well before we taste it. We need only a very small amount (a small sample size) to taste because we have stirred it well and the variation is almost zero.

In practice, when estimating a prevalence, the prevalence itself affects this variation and therefore affects the required sample size. The relationship between prevalence and sample size is presented in Fig.  1 .

Fig. 1 Prevalence and Effect on Sample Size

Obviously, estimating the prevalence is the research objective, so researchers do not know this prevalence in advance. Therefore, to calculate the sample size, we normally take it from the most recent published studies with a similar study population. If we cannot find suitable studies in the literature, we may consider conducting a pilot study.

When we find multiple suitable prevalence values in the literature, for example ranging from 15 to 30%, we should use the prevalence giving the highest sample size (in this case, 30%) in accordance with Fig.  1 , which shows that 30% requires the largest sample size in the range of 15 to 30% prevalence. Similarly, if the prevalence ranges from 60 to 80% in the recent literature, we should use 60% as it requires the largest sample size in that range.
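The choice within a reported range can be sketched in a few lines of Python. The sketch assumes a fixed precision across the whole range (as applies between 10 and 90% prevalence), in which case the sample size is proportional to P(1 − P) and the value closest to 50% needs the largest sample. The function name is hypothetical.

```python
def worst_case_prevalence(low: float, high: float) -> float:
    """Return the prevalence in [low, high] that maximises P * (1 - P).

    Assumption: the precision is the same for every value in the range,
    so the required sample size is proportional to P * (1 - P), which
    peaks at P = 0.5. Inputs are proportions of one.
    """
    if not 0.0 < low <= high < 1.0:
        raise ValueError("need 0 < low <= high < 1")
    if low <= 0.5 <= high:
        return 0.5
    # The whole range lies on one side of 0.5: take the end nearer to 0.5.
    return high if high < 0.5 else low


if __name__ == "__main__":
    print(worst_case_prevalence(0.15, 0.30))  # 0.3
    print(worst_case_prevalence(0.60, 0.80))  # 0.6
```

If the range extends below 10% or above 90%, where a smaller precision is recommended, this shortcut no longer holds and the sample size should be computed for each candidate value.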

We would like to caution that some books or guidelines suggest using an expected prevalence of 50% if the prevalence cannot be obtained at all [ 2 , 14 , 15 ]. We discourage this practice. In Fig.  1 , we should note that a prevalence of 50% produces the largest sample size only within the range of 10 to 90% prevalence. The required sample size can be much higher in the regions below 10% and above 90%. Therefore, the shortcut of a 50% prevalence should not be used. It is best to calculate the sample size with an appropriate expected prevalence. Researchers may find a possible range of expected prevalence and apply the recommendation in the previous paragraph.

For this illustration, we have drawn Fig.  1 using precision for small scale study (Table 2 ). It means that we use the precision of fixed 5% for the expected prevalence between 10 and 90%, half of the expected prevalence for the expected prevalence less than 10%, and half of the (100 minus expected prevalence) for the expected prevalence larger than 90%.
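The precision rule described for this illustration can be sketched as follows, to show how the required sample size changes with prevalence. This is a minimal Python sketch that assumes the standard formula n = Z²P(1 − P)/d² with Z = 1.96 and values expressed as proportions of one. The function names are hypothetical and the sketch does not reproduce the published figure.

```python
import math


def small_scale_precision(p: float) -> float:
    """Precision for a small-scale study, following the rule in the text.

    Fixed 5% between 10% and 90% prevalence, half of the prevalence below
    10%, and half of (100% minus prevalence) above 90%.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p < 0.10:
        return p / 2.0
    if p > 0.90:
        return (1.0 - p) / 2.0
    return 0.05


def sample_size(p: float, z: float = 1.96) -> int:
    """Sample size for prevalence p using the precision rule above."""
    d = small_scale_precision(p)
    return math.ceil(z ** 2 * p * (1.0 - p) / d ** 2)


if __name__ == "__main__":
    for pct in (1, 2, 5, 10, 30, 50, 70, 90, 95, 98, 99):
        p = pct / 100.0
        print(f"prevalence {pct:>2}%  precision {small_scale_precision(p):.3f}  n = {sample_size(p)}")
```

Under these assumptions, a prevalence of 2% needs 753 participants while 50% needs 385, which illustrates why 50% is not a safe default at the extremes.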

Parameter 4: anticipated loss

We always have loss in sample size during the research process due to several reasons, such as non-response, incomplete data, loss-to-follow up, etc. Researchers should estimate the loss with their past experience, and inflate the sample size in calculation accordingly. These losses (especially, non-response, incomplete data, and loss-to-follow up) are very much related to research areas (for example, non-response rate could be higher if we study sexual issues or other sensitive issues) and population that researchers intend to study. Therefore, we recommend researchers to use non-response rates of previous studies of similar research areas and in similar populations.

Although we can enter any percentage of potential loss and inflate the sample size, this does not guarantee that the achieved sample remains representative. In general, we would recommend that a loss of less than 10% is acceptable. However, there are different opinions on the acceptable percentage of loss or attrition [ 16 ] depending on the type of study. At the least, it is important to note that the higher the loss or attrition, the greater the compromise on the validity of the results.
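Inflating a calculated sample size for anticipated loss can be sketched as below. The text does not state the exact inflation rule used by the calculators. This sketch assumes the common approach of dividing by (1 − loss) and rounding up, which is consistent with the worked example in this paper (897 participants before loss becoming 997 with 10% loss). The function name is hypothetical.

```python
import math


def inflate_for_loss(n: int, loss: float) -> int:
    """Inflate sample size n so that n remain after a proportion `loss` is lost.

    Assumption: inflation is n / (1 - loss), rounded up. `loss` is a
    proportion of one (10% -> 0.10) and must be below 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    return math.ceil(n / (1.0 - loss))


if __name__ == "__main__":
    print(inflate_for_loss(897, 0.10))  # 997
```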

Sample size calculation report

The report of sample size should be reproducible. It means that all parameters used must be reported. There are four parameters namely, level of confidence (mostly 95%), expected prevalence (mostly from literature or pilot study), the precision or margin of error of estimate (decision by researchers) and anticipated loss (experience of researchers) used in the calculation. We should also include the name of the software or calculator with proper reference. Scalex SP calculator has incorporated the draft report for the user to copy and use. It ensures all necessary parameters used are included in the report.

Results and discussion

Demonstration of Scalex SP and ScalaR calculator

Simple three steps for Scalex SP

Basically, the Scalex SP calculator (Scalex stands for ‘Sample Size Calculator using Excel’, and SP stands for ‘Single Proportion’) (available at: https://sites.google.com/view/sr-ln/ssc ) guides the users in three steps:

Step 1: to type in “Expected Prevalence” in terms of per cent (> 0 to < 100).

Step 2: to type in “Anticipated Loss” in terms of per cent (0 to < 100).

Step 3: to decide and type in the precision of user choice after going through the Sample Size Table. Users may type a precision which is not listed in the table (such as ± 2.5%). Then, Scalex SP will give a draft report for the user.

A major advantage of the Scalex SP calculator is that it gives users a Sample Size Table (Fig.  3 ) in which they can see the sample sizes for a range of precisions and foresee the CIs in their results. It therefore helps users to select a precision in light of the available resources.

Example using Scalex SP

We are going to conduct a study to estimate the prevalence of obesity among secondary school children in a district. We managed to find the expected prevalence in the literature as 30%.

When we start the Scalex SP, we see the interface as in Fig.  2 . Then, we fill 30 (30%) for Expected Prevalence. As we experienced 10% non-response in this study population in previous studies, we fill 10% loss (see Fig.  3 ).

Fig. 2 Scalex SP interface for Step 1, 2 and 3

Fig. 3 Scalex SP with Report

Then, sample sizes given for various precisions are reviewed and we decide to use ± 3% precision as it gives us an acceptable width of 95% CI (27%, 33%), and the sample size ( n  = 997) is possible to manage.

Then, we fill in 3 (3%) in Step 3, and Scalex SP gives the draft report as in Fig.  3 .
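The kind of table reviewed in this example, listing sample sizes and anticipated 95% CIs for several precisions, can be approximated with the short Python sketch below. It assumes the standard formula with Z = 1.96, rounding up, and inflation for loss by dividing by (1 − loss). The layout, the list of precisions and the function name are hypothetical and are not taken from Scalex SP.

```python
import math


def sample_size_table(p, loss, precisions=(0.01, 0.02, 0.03, 0.04, 0.05), z=1.96):
    """Return one row per precision: anticipated CI limits and sample sizes.

    Assumptions: p, loss and precisions are proportions of one; the CI is
    the symmetric (Wald) interval p +/- d; loss inflation is n / (1 - loss).
    """
    rows = []
    for d in precisions:
        n = math.ceil(z ** 2 * p * (1.0 - p) / d ** 2)
        rows.append({
            "precision": d,
            "ci_low": round(p - d, 4),
            "ci_high": round(p + d, 4),
            "n": n,
            "n_with_loss": math.ceil(n / (1.0 - loss)),
        })
    return rows


if __name__ == "__main__":
    for row in sample_size_table(p=0.30, loss=0.10):
        print(row)
```

For an expected prevalence of 30%, 10% loss and a precision of ± 3%, the row shows an anticipated CI of 27% to 33% and a sample size of 997, in agreement with the example.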

ScalaR SP programme for R users

The authors have written an R script (ScalaR SP.R) which, with two command lines as in Fig.  4 , gives the same output as Scalex SP. The script file must be stored in the working directory.

Fig. 4 ScalaR SP with report (available at: https://sites.google.com/view/sr-ln/ssc ).

An example of the R command is as follows:

 > ScalarSP(p = 0.3, d = 0.03, loss = 0.1)

p = expected prevalence.

d = precision or margin of error.

loss = anticipated loss or attrition of sample size.

Other issues

The Scalex calculator is for studies using specific sampling methods, namely simple random sampling, systematic sampling, and proportionate stratified random sampling. For other sampling methods, the calculated sample size should be multiplied by the design effect [ 14 ]. The design effect can be taken from the literature if it is reported in previous similar studies. If not, estimating it is a complicated procedure involving data simulation.
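The adjustment for the design effect is a single multiplication, sketched below in Python. The design effect of 1.5 in the usage line is a made-up value for illustration only, and the function name is hypothetical. In practice the value should come from previous similar studies.

```python
import math


def apply_design_effect(n_srs: int, design_effect: float) -> int:
    """Multiply a simple-random-sampling sample size by the design effect.

    The result is rounded up to a whole participant.
    """
    if n_srs <= 0:
        raise ValueError("n_srs must be positive")
    if design_effect <= 0.0:
        raise ValueError("design_effect must be positive")
    return math.ceil(n_srs * design_effect)


if __name__ == "__main__":
    # Hypothetical design effect of 1.5 for a cluster sample
    print(apply_design_effect(897, 1.5))  # 1346
```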

Limitation of the presented calculators

The formula used in these calculators (reported in Para 2 above) assumes that the population is unknown and large. If the population is known, the required sample size could be smaller by using a different formula which has population size in the formula. However, if we use the formula with population size and obtain smaller sample size, researchers should analyse the data using ‘finite population correction’ and ‘survey data analysis method’ [ 17 ] instead of standard statistical analyses, to obtain valid results. Therefore, we consider a safer approach, that is, assuming that the population size is unknown both in calculating sample size and also later in data analyses. Therefore, it could be a limitation, if one would like to calculate a sample size with known population size and also using ‘finite population correction’ in their data analyses.
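To show how a known population size reduces the required sample size, the sketch below applies the standard finite population adjustment n = n0 / (1 + (n0 − 1)/N), where n0 is the sample size for a large population and N is the population size. This is a textbook formula and an assumption here, since the paper does not state which finite-population formula it has in mind. It is not part of the presented calculators, and the function name is hypothetical.

```python
import math


def finite_population_sample_size(n0: int, population: int) -> int:
    """Adjust a large-population sample size n0 for a known population size.

    Uses n = n0 / (1 + (n0 - 1) / N), rounded up.
    """
    if n0 <= 0 or population <= 0:
        raise ValueError("n0 and population must be positive")
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / population))


if __name__ == "__main__":
    # Hypothetical population of 5000 with n0 = 897
    print(finite_population_sample_size(897, 5000))  # 761
```

As the paragraph above cautions, a sample size reduced in this way calls for matching survey data analysis methods later on.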

The presented calculators have been designed using Wald’s confidence interval. The limitation of this confidence interval is that, it could go below 0% or above 100% in the confidence intervals if the users specify precision inappropriately in relation to the expected prevalence. Though we could give users a choice to consider other methods of confidence interval such as exact confidence interval, logit-confidence interval, etc. we prevent this issue by recommending the use of appropriate precision in Implementation Paragraph 2.1.2 and Table 2 . We consider this would be a more intuitive approach especially for users with limited statistical knowledge or skills. In any case, with a single method of confidence interval (Wald), we wish to report this limitation for the presented calculators.
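The limitation can be checked before a precision is chosen. The Python sketch below treats the anticipated interval as the symmetric limits P ± d, which is how precision is described in this paper, and flags precisions that would push a limit below 0% or above 100%. The function names are hypothetical.

```python
def anticipated_limits(p: float, d: float) -> tuple:
    """Return the anticipated lower and upper limits p - d and p + d."""
    return p - d, p + d


def precision_is_appropriate(p: float, d: float) -> bool:
    """True if the anticipated interval stays within 0 and 1 (0% and 100%)."""
    low, high = anticipated_limits(p, d)
    return low >= 0.0 and high <= 1.0


if __name__ == "__main__":
    print(precision_is_appropriate(0.50, 0.05))  # True
    print(precision_is_appropriate(0.02, 0.05))  # False: lower limit below 0%
    print(precision_is_appropriate(0.02, 0.01))  # True
```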

Conclusions

With technological advancement, researchers should not calculate sample sizes manually. Software or calculators should help researchers to minimize possible errors in calculation and to assist in reporting. However, the use of correct parameters remains the responsibility of users. In addition, calculators using free software will benefit researchers who have limited resources.

The presented calculators, designed for prevalence studies, are available to the public at ( https://sites.google.com/view/sr-ln/ssc ) without the need to ask permission. The authors will continue to use the Scalex calculator for other types of studies in the near future.

The presented calculators are beneficial because they incorporate non-response or other loss, indicate the anticipated 95% CI, give a list of sample sizes for a range of precisions and therefore guide an informed decision on precision, and finally draft a sample size calculation report for scientific reporting.

This paper also includes a number of cautions and recommendations for selecting parameters, especially expected prevalence, precision, and anticipated loss, so that researchers can conduct prevalence studies with more appropriate sample sizes.

Availability and requirements

Scalex SP calculator.

Project name: sample size calculator project.

Project home page: https://sites.google.com/view/sr-ln/ssc

Operating system(s): Windows.

Programming language: Excel-based.

License: no license required.

Any restrictions to use by non-academics: No restriction.

ScalaR calculator.

Programming language: R language.

Availability of data and materials

This paper doesn’t involve data. However, the free calculator is available here: ( https://sites.google.com/view/sr-ln/ssc ).

Abbreviations

Scalex SP: Sample Size Calculator using Excel for Single Proportion

ScalaR SP: Sample Size Calculator using R & RStudio for Single Proportion

PASS: Power Analysis and Sample Size

CI: Confidence Interval

n: Sample Size

Z: Z Statistic

P: Expected prevalence or proportion

References

1. Cochran WG. Sampling Techniques. 3rd ed. New York: John Wiley & Sons; 1977.

2. Daniel WW, Cross CL. Biostatistics: A foundation for analysis in the health sciences. 10th ed. New York: John Wiley & Sons; 2013.

3. Verma JP, Verma P. Determining sample size and power in research studies. Singapore: Springer; 2020.

4. Chow S-C, Shao J, Wang H, Lokhnygina Y. Sample size calculations in clinical research. New York: Chapman and Hall/CRC; 2017.

5. Vallejo A, Muniesa A, Ferreira C, de Blas I. New method to estimate the sample size for calculation of a proportion assuming binomial distribution. Res Vet Sci. 2013;95:405–9. https://doi.org/10.1016/j.rvsc.2013.04.005 .

6. Charan J, Biswas T. How to calculate sample size for different study designs in medical research? Indian J Psychol Med. 2013;35:121–6.

7. NCSS Statistical Software. Power Analysis & Sample Size (PASS). 2022.

8. Epitools. Epitools - Epidemiological calculators. 2022.

9. Haynes AG, Lenz A, Stalder O, Limacher A. presize: An R-package for precision-based sample size calculation in clinical research. J Open Source Softw. 2021;6:3118.

10. Patra P. Sample size in clinical research, the number we need. Int J Med Sci Public Heal. 2012;1:5–9.

11. Charan J, Kantharia N. How to calculate sample size in animal studies? J Pharmacol Pharmacother. 2013;4:303–6.

12. Pourhoseingholi MA, Vahedi M, Rahimzadeh M. Sample size calculation in medical studies. Gastroenterol Hepatol Bed Bench. 2013;6:14.

13. Serdar CC, Cihan M, Yücel D, Serdar MA. Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies. Biochem medica. 2021;31:10502. https://doi.org/10.11613/BM.2021.010502 .

14. Lwanga SK, Lemeshow S. Sample size determination in health studies: a practical manual. Geneva: World Health Organization; 1991.

15. Maple Tech IL. Calculator.net. 2019. https://www.calculator.net/sample-size-calculator.html?type=1&cl=95&ci=5&pp=50&ps=&x=120&y=21 . Accessed 19 Dec 2019.

16. Draugalis JR, Plaza CM. Best practices for survey research reports revisited: implications of target population, probability sampling, and response rate. Am J Pharm Educ. 2009;73:1–3.

17. Heeringa SG, West BT, Berglund PA. Applied survey data analysis (Second Edition). New York: Chapman and Hall/CRC; 2020.

Acknowledgements

No acknowledgment required.

This study is not funded by any funding agency.

Author information

Authors and affiliations.

PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Jalan Tungku Link, Brunei-Muara BE3119, Gadong, Brunei Darussalam

Lin Naing & Hanif Abdul Rahman

Faculty of Medicine, Bioscience and Nursing, MAHSA University, Bandar Saujana Putra, Jenjarom, Selangor, Malaysia

Rusli Bin Nordin

Centre of Advanced Research (CARe), Universiti Brunei Darussalam, Gadong, Brunei Darussalam

Hanif Abdul Rahman

School of Nursing and Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI, USA

Graduate Student, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia

Yuwadi Thein Naing

Contributions

LN contributed in the conception of the work, creating of the software, testing and further development of the software, drafting and revision of the paper. RN contributed in the conception of the work, testing the software, drafting and revision of the paper. HAR contributed in the conception of the work, testing the software, drafting and revision of the paper. YTN contributed in creating of the software, testing and further development of the software, and drafting and revision of the paper. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Lin Naing .

Ethics declarations

Ethics approval and consent to participate.

The study did not require ethics approval and consent to participate.

Consent for publication

Not applicable.

Competing interests

We do not have any competing interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Naing, L., Nordin, R.B., Abdul Rahman, H. et al. Sample size calculation for prevalence studies using Scalex and ScalaR calculators. BMC Med Res Methodol 22 , 209 (2022). https://doi.org/10.1186/s12874-022-01694-7

Received : 06 February 2022

Accepted : 22 July 2022

Published : 30 July 2022

DOI : https://doi.org/10.1186/s12874-022-01694-7

  • Sample size
  • Single proportion
  • Prevalence studies

BMC Medical Research Methodology

ISSN: 1471-2288

sample research paper epidemiology

IMAGES

  1. (PDF) Descriptive Epidemiology for Public Health Professionals Part 4

  2. USC

  3. Epidemiology: H1N1 Virus Essay Example

  4. MRSA Epidemiology Paper

  5. American Journal of Epidemiology Template

  6. Descriptive Epidemiology Paper Example
COMMENTS

  1. Epidemiology Research Paper

    View sample epidemiology research paper. Browse research paper examples for more inspiration. If you need a health research paper written according to all the academic standards, you can always turn to our experienced writers for help. This is how your paper can get an A! Feel free to contact our writing service for professional assistance.

  2. PDF Drafting a quantitative epidemiological research paper

    Background: Publishing papers is a key part of an effective strategy to disseminate research results and communicate with your peers. The number of papers published in journals is increasing, as is the competition in getting a paper accepted in journals, with increasingly high rejection rates.

  3. Epidemiology Research Paper Topics

    Epidemiology research paper topics related to pressing health concerns, emerging diseases, or interventions that can improve health outcomes are likely to be impactful and garner attention from the scientific community. ... Include information on sample size, recruitment strategies, ethical considerations, and any adjustments made for ...

  4. American Journal of Epidemiology

    The American Journal of Epidemiology is publishing timely, high-quality articles to further the scientific discourse about COVID-19 and the understanding of the pandemic. Explore the papers. An official journal of Johns Hopkins Bloomberg School of Public Health. Publishes empirical research findings, opinion pieces, and methodological.

  5. PDF Master's Thesis Guide

    This guide incorporates both Epidemiology Department and Graduate School requirements. Discussion includes: topic development, Human Subjects training, roles of the thesis committee and chair, formatting, writing and revising, and submission. Developing and Completing Your Epidemiology MS or MPH Thesis (and Surviving to Tell About It) Table of ...

  6. A Framework for Descriptive Epidemiology

    In this paper, we propose a framework for thinking through the design and conduct of descriptive epidemiologic studies. ... A well-defined research question (causal or descriptive) states: 1) the target population, characterized by person and place, and anchored in time; 2) the outcome, event, or health state or characteristic; and 3) the ...

  7. How to write a research paper

    In this issue, after an introductory paper by Kotz et al, Kotz and Cals publish the first of a series of monthly compact one-page papers, each highlighting an essential step in preparing and writing a research paper. This series, containing a total of 12 one-pagers, originates from a PhD student course organized at Maastricht University ...

  8. Epidemiology of COVID-19: An updated review

    Abstract. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a zoonotic infection, is responsible for the COVID-19 pandemic and is a recognized public health concern. However, the origin of the causative virus and its intermediate hosts are yet to be fully determined. SARS-CoV-2 contains nearly 30,000 letters of RNA that ...

  9. A global dataset of pandemic- and epidemic-prone disease outbreaks

    This paper presents a new dataset of infectious disease outbreaks collected from the Disease Outbreak News and the Coronavirus Dashboard produced by the World Health Organization. The dataset ...

  10. Data Analysis of Epidemiological Studies

    An important objective of epidemiological research is to identify risk factors for disease. Depending on the particular question being asked, cohort studies, case-control studies, or cross-sectional studies are conducted. ... the exposure and disease status are examined for a sample from a defined population at the same time point. The ...

  11. Introduction to Quantitative Epidemiology

    As an introduction to quantitative epidemiology, this chapter consists of 9 sections, covering key concepts and major tasks of epidemiology; the paradigm of quantitative epidemiology; population, study population, sample, and sampling methods; methods to identify a problem, frame a problem into a research question, defend a selected topic by considering significance, innovation, feasibility, and ...

  12. Who is in this study, anyway? Guidelines for a useful Table 1

    Abstract. Objective: Epidemiologic and clinical research papers often describe the study sample in the first table. If well-executed, this "Table 1" can illuminate potential threats to internal and external validity. However, little guidance exists on best practices for designing a Table 1, especially for complex study designs and analyses.

  13. Chapter 1. What is epidemiology?

    Epidemiology is the study of how often diseases occur in different groups of people and why. Epidemiological information is used to plan and evaluate strategies to prevent illness and as a guide to the management of patients in whom disease has already developed. Like the clinical findings and pathology, the epidemiology of a disease is an ...

  14. Epidemiology

    At-admission prediction of mortality and pulmonary embolism in an international cohort of hospitalised patients with COVID-19 using statistical and machine learning methods. Munib Mesinovic, Xin ...

  15. Coronavirus disease (COVID-19) pandemic: an overview of systematic

    Navigating the rapidly growing body of scientific literature on the SARS-CoV-2 pandemic is challenging, and ongoing critical appraisal of this output is essential. We aimed to summarize and critically appraise systematic reviews of coronavirus disease (COVID-19) in humans that were available at the beginning of the pandemic. Nine databases (Medline, EMBASE, Cochrane Library, CINAHL, Web of ...

  16. Epidemiology and Biostatistics Theses and Dissertations

    Theses/Dissertations from 2019: Evolutionary Dynamics of Influenza Type B in the Presence of Vaccination: An Ecological Study, Lindsey J. Fiedler. Respiratory Infections and Risk for Development of Narcolepsy: Analysis of the Truven Health MarketScan Database (2008 to 2010) with Additional Assessment of Incidence and Prevalence, Darren Scheer ...

  17. How to Design a (Good) Epidemiological Observational Study

    Objective To identify, characterize, and explore author guides on the role, format, and content of protocols for observational epidemiological studies, particularly cohort and case-control studies.

  18. Sampling in epidemiological research: issues, hazards and pitfalls

    Random sampling. In random sampling every member of the population has the same chance (probability) of being selected into the sample. Using a random sample it is possible to describe quantitatively the relationship between the sample and the underlying population, giving the range of values, called confidence intervals, in which the true population parameter is likely to lie.

  19. (PDF) Critical reading of epidemiological papers: A guide

    This paper attempts to give guidance to non-epidemiologists on how to read and evaluate the quality of epidemiologic studies and their results critically. Different methodological issues for ...

  20. Studies of prevalence: how a basic epidemiology concept has gained

    Keywords: epidemiology; COVID-19; statistics and research methods. In introductory epidemiology, students learn about prevalence, an easy to understand concept, defined as 'a proportion that measures disease occurrence of any type of health condition, exposure, or other factor related to health', or 'the proportion of persons in a population who have a particular disease or attribute at a specified ...

  21. Frontiers in Epidemiology

    Metabolic Diseases Promote Cardiovascular Diseases. Hong Wang. Professor Xiaofeng Yang, MD, PhD. Deyu Fang. Juncheng Wei. A journal for scientific exchange across the breadth of epidemiological research. It explores the use of data for investigating and predicting health outcomes, and assessing the health impact of cli...

  22. PDF MSc Epidemiology Thesis Research Proposal

    1. Acceptable - the thesis research can proceed as proposed. 2. Conditional acceptance - the thesis research can proceed after responding satisfactorily to comments made in the evaluation. A time limit will be imposed. 3. Not acceptable - the thesis research cannot proceed as proposed and requires re-submission of a new research proposal.

  23. Sample size calculation for prevalence studies using Scalex and ScalaR

    In quantitative research, when we take a sample from a study population or eligible population in order to save our resources, there are two important statistical processes, namely using a probability sampling method (commonly known as "random sampling") and calculating an appropriate sample size. Both are equally important to ensure a good representative sample for the study population.
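
The sample-size step described in the last entry can be illustrated with the widely taught single-proportion formula n = Z^2 * P * (1 - P) / d^2, where P is the expected prevalence, d is the desired absolute precision, and Z is the standard normal quantile for the chosen confidence level. The sketch below is a minimal Python illustration under those assumptions. The function name `prevalence_sample_size` is hypothetical, and the code is not a reproduction of the Scalex or ScalaR calculators, which may apply further adjustments (for example for finite populations or expected non-response).

```python
import math
from statistics import NormalDist


def prevalence_sample_size(expected_prevalence, precision, confidence=0.95):
    """Sample size for estimating a single proportion (hypothetical helper).

    Uses n = Z^2 * P * (1 - P) / d^2 with no finite-population or
    non-response adjustment; all inputs are proportions between 0 and 1.
    """
    if not 0 < expected_prevalence < 1:
        raise ValueError("expected_prevalence must be between 0 and 1")
    if not 0 < precision < 1:
        raise ValueError("precision must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    # Two-sided standard normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    n = z ** 2 * expected_prevalence * (1 - expected_prevalence) / precision ** 2

    # Round up so the achieved precision is at least as good as requested.
    return math.ceil(n)


if __name__ == "__main__":
    # Expected prevalence 50%, precision of +/- 5 percentage points, 95% confidence.
    print(prevalence_sample_size(0.5, 0.05))
```

With an expected prevalence of 50%, a precision of 5 percentage points and 95% confidence, the function returns 385. Choosing P = 0.5 gives the largest sample size for a given precision, which is why it is a common conservative default when no prior estimate of prevalence is available.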