Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney. Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

Throughout this guide, the running example is a systematic review by Boyle and colleagues, who answered the question “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Frequently asked questions about systematic reviews

What is a systematic review?

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias. The methods are repeatable, and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant studies
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative (qualitative), quantitative, or both.

Systematic review vs. meta-analysis

Systematic reviews often quantitatively synthesize the evidence using a meta-analysis. A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size.
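
To make the idea of combining results concrete, here is a minimal sketch of one common pooling method, an inverse-variance (fixed-effect) average. The guide does not prescribe this method, and the effect sizes and standard errors below are hypothetical numbers chosen only for illustration.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from three studies
effects = [0.2, 0.4, 0.3]
std_errors = [0.1, 0.2, 0.1]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

A real meta-analysis would normally be run in dedicated statistical software and would also consider whether a random-effects model is more appropriate.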

Systematic review vs. literature review

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Systematic review vs. scoping review

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.

When to conduct a systematic review

A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention, such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question, usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

Pros and cons of systematic reviews

Systematic reviews have many pros.

  • They minimize research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent, so they can be scrutinized by others.
  • They’re thorough: they summarize all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons.

  • They’re time-consuming.
  • They’re narrow in scope: they only answer the precise research question.

Step-by-step example of a systematic review

The seven steps for conducting a systematic review are explained below with an example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO:

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P?
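
As a small illustration of the template above, the sketch below fills the four PICO slots and assembles the question. The `Pico` class and its field names are hypothetical helpers, not part of any review tool; the example values are the eczema question used in this guide.

```python
from dataclasses import dataclass

@dataclass
class Pico:
    """Hypothetical container for the four PICO components."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def question(self) -> str:
        # Template: "What is the effectiveness of I versus C for O in P?"
        return (
            f"What is the effectiveness of {self.intervention} versus "
            f"{self.comparison} for {self.outcome} in {self.population}?"
        )

pico = Pico(
    population="patients with eczema",
    intervention="probiotics",
    comparison="no treatment, a placebo, or a non-probiotic treatment",
    outcome="reducing eczema symptoms and improving quality of life",
)
research_question = pico.question()
print(research_question)
```

Writing the components down separately like this also makes it easier to reuse them later as search concepts and selection criteria.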

Sometimes, you may want to include a fifth component, the type of study design. In this case, the acronym is PICOT.

  • Type of study design(s)

In the example review, Boyle and colleagues defined the components as follows:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo, or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomized controlled trials, a type of study design

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information: Provide the context of the research question, including why it’s important.
  • Research objective(s): Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee. This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov.

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus. Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant.
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD). In medicine, clinical trial registries are another important type of gray literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
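
The advice about synonyms and Boolean operators can be sketched as follows: synonyms for one concept are joined with OR, and the concepts are joined with AND. The `build_query` helper and the synonym lists are illustrative assumptions, and each database has its own query syntax, so treat the output as a starting point rather than a ready-made search string.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

# Hypothetical synonym lists for the two main concepts of the eczema example
concepts = [
    ["eczema", "atopic dermatitis"],
    ["probiotic", "probiotics", "lactobacillus"],
]

query = build_query(concepts)
print(query)
```

Keeping the synonym lists in one place makes it easy to record exactly which terms were searched in each database.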

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator.

In the example review, Boyle and colleagues searched the following sources:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol. The third person’s job is to break any ties.

To increase inter-rater reliability, ensure that everyone thoroughly understands the selection criteria before you begin.
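
If you want to put a number on inter-rater reliability, one widely used statistic (not named in this guide) is Cohen’s kappa, which compares the observed agreement between two raters with the agreement expected by chance. The sketch below assumes each reviewer’s decisions are stored as a list of labels in the same study order; the decisions shown are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of studies")
    n = len(rater_a)
    # Proportion of studies on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every study
    return (observed - expected) / (1 - expected)

# Hypothetical screening decisions for eight studies
reviewer_1 = ["include", "include", "exclude", "exclude",
              "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; a low value suggests the selection criteria need to be discussed again before screening continues.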

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts: Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram.
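
One way to keep such a record is a simple log with one row per study, which can then be tallied into the counts a flow diagram reports. This is a sketch under assumed conventions: the log layout, phase names, and exclusion reasons are hypothetical, not a required format.

```python
from collections import Counter

# Hypothetical screening log: (study ID, phase, decision, reason for exclusion)
screening_log = [
    ("S1", "title_abstract", "exclude", "not about eczema"),
    ("S2", "title_abstract", "exclude", "not a randomized trial"),
    ("S3", "full_text", "exclude", "no probiotic intervention"),
    ("S4", "full_text", "include", ""),
    ("S5", "full_text", "include", ""),
]

def summarize_screening(log):
    """Count records at each stage of the two-phase selection process."""
    excluded = Counter(phase for _, phase, decision, _ in log if decision == "exclude")
    reasons = Counter(reason for _, _, decision, reason in log if decision == "exclude")
    screened = len(log)
    full_texts_assessed = screened - excluded["title_abstract"]
    included = full_texts_assessed - excluded["full_text"]
    return {
        "records_screened": screened,
        "excluded_on_title_abstract": excluded["title_abstract"],
        "full_texts_assessed": full_texts_assessed,
        "excluded_on_full_text": excluded["full_text"],
        "studies_included": included,
        "exclusion_reasons": dict(reasons),
    }

summary = summarize_screening(screening_log)
for stage, count in summary.items():
    print(f"{stage}: {count}")
```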

Next, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide if any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results. The exact information will depend on your research question, but it might include the year, study design, sample size, context, research findings, and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgment of the quality of the evidence, including risk of bias.

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group.

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

In the example review, Boyle and colleagues collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

  • Narrative (qualitative): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative: Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis, which allows you to combine results from multiple studies into a summary result.

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract: A summary of the review
  • Introduction: Including the rationale and objectives
  • Methods: Including the selection criteria, search method, data extraction method, and synthesis method
  • Results: Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion: Including interpretation of the results and limitations of the review
  • Conclusion: The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist.

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews, and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

Frequently asked questions about systematic reviews

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

Cite this Scribbr article

Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/methodology/systematic-review/

How to write a systematic literature review [9 steps]

What is a systematic literature review?

A systematic literature review is a summary, analysis, and evaluation of all the existing research on a well-formulated and specific question.

Put simply, a systematic review is a study of studies that is popular in medical and healthcare research. In this guide, we will cover:

  • the definition of a systematic literature review
  • the purpose of a systematic literature review
  • the different types of systematic reviews
  • how to write a systematic literature review

Where are systematic literature reviews used?

Systematic literature reviews can be utilized in various contexts, but they’re often relied on in clinical or healthcare settings.

Medical professionals read systematic literature reviews to stay up-to-date in their field, and granting agencies sometimes need them to make sure there’s justification for further research in an area. They can even be used as the starting point for developing clinical practice guidelines.

What types of systematic literature reviews are there?

A classic systematic literature review can take different approaches:

  • Effectiveness reviews assess the extent to which a medical intervention or therapy achieves its intended effect. They’re the most common type of systematic literature review.
  • Diagnostic test accuracy reviews produce a summary of diagnostic test performance so that their accuracy can be determined before use by healthcare professionals.
  • Experiential (qualitative) reviews analyze human experiences in a cultural or social context. They can be used to assess the effectiveness of an intervention from a person-centric perspective.
  • Costs/economics evaluation reviews look at the cost implications of an intervention or procedure, to assess the resources needed to implement it.
  • Etiology/risk reviews usually try to determine to what degree a relationship exists between an exposure and a health outcome. This can be used to better inform healthcare planning and resource allocation.
  • Psychometric reviews assess the quality of health measurement tools so that the best instrument can be selected for use.
  • Prevalence/incidence reviews measure both the proportion of a population who have a disease, and how often the disease occurs.
  • Prognostic reviews examine the course of a disease and its potential outcomes.
  • Expert opinion/policy reviews are based around expert narrative or policy. They’re often used to complement, or in the absence of, quantitative data.
  • Methodology systematic reviews can be carried out to analyze any methodological issues in the design, conduct, or review of research studies.

How to write a systematic literature review

Writing a systematic literature review can feel like an overwhelming undertaking. After all, a review can often take 6 to 18 months to complete. Below we’ve prepared a step-by-step guide on how to write a systematic literature review.

  • Decide on your team.
  • Formulate your question.
  • Plan your research protocol.
  • Search for the literature.
  • Screen the literature.
  • Assess the quality of the studies.
  • Extract the data.
  • Analyze the results.
  • Interpret and present the results.

1. Decide on your team

When carrying out a systematic literature review, you should employ multiple reviewers in order to minimize bias and strengthen analysis. A minimum of two is a good rule of thumb, with a third to serve as a tiebreaker if needed.

You may also need to team up with a librarian to help with the search, literature screeners, a statistician to analyze the data, and the relevant subject experts.

2. Formulate your question

Define your answerable question. Then ask yourself, “Has someone written a systematic literature review on my question already?” If so, yours may not be needed. A librarian can help you answer this.

You should formulate a “well-built clinical question.” This is the process of generating a good search question. To do this, run through PICO:

  • Patient or Population or Problem/Disease : who or what is the question about? Are there factors about them (e.g. age, race) that could be relevant to the question you’re trying to answer?
  • Intervention : which main intervention or treatment are you considering for assessment?
  • Comparison(s) or Control : is there an alternative intervention or treatment you’re considering? Your systematic literature review doesn’t have to contain a comparison, but you’ll want to stipulate at this stage, either way.
  • Outcome(s) : what are you trying to measure or achieve? What’s the wider goal for the work you’ll be doing?

3. Plan your research protocol

Now you need a detailed strategy for how you’re going to search for and evaluate the studies relating to your question.

The protocol for your systematic literature review should include:

  • the objectives of your project
  • the specific methods and processes that you’ll use
  • the eligibility criteria of the individual studies
  • how you plan to extract data from individual studies
  • which analyses you’re going to carry out

For a full guide on how to systematically develop your protocol, take a look at the PRISMA checklist. PRISMA has been designed primarily to improve the reporting of systematic literature reviews and meta-analyses.

4. Search for the literature

When writing a systematic literature review, your goal is to find all of the relevant studies relating to your question, so you need to search thoroughly.

This is where your librarian will come in handy again. They should be able to help you formulate a detailed search strategy, and point you to all of the best databases for your topic.

The places to consider in your search are electronic scientific databases (the most popular are PubMed, MEDLINE, and Embase), controlled clinical trial registers, non-English literature, raw data from published trials, references listed in primary sources, and unpublished sources known to experts in the field.

Don’t miss out on “gray literature” sources: those sources outside of the usual academic publishing environment. They include:

  • non-peer-reviewed journals
  • pharmaceutical industry files
  • conference proceedings
  • pharmaceutical company websites
  • internal reports

Gray literature sources are more likely to contain negative conclusions, so including them improves the reliability of your findings.

As you search, you should document details such as:

  • The databases you search and which years they cover
  • The dates you first run the searches, and when they’re updated
  • Which strategies you use, including search terms
  • The numbers of results obtained
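
A search log covering the details listed above can be kept in a spreadsheet or a small script. The sketch below uses a hypothetical `SearchRecord` structure and writes it out as CSV; the databases, years, dates, search terms, and result counts are made-up placeholders.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date

@dataclass
class SearchRecord:
    """Hypothetical record of one database search."""
    database: str
    years_covered: str
    date_run: date
    search_terms: str
    results: int

# Placeholder entries for illustration only
records = [
    SearchRecord("PubMed", "1966-2024", date(2024, 1, 15),
                 '(eczema OR "atopic dermatitis") AND probiotics', 412),
    SearchRecord("Embase", "1974-2024", date(2024, 1, 16),
                 '(eczema OR "atopic dermatitis") AND probiotics', 530),
]

def to_csv(records):
    """Serialize the search log to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SearchRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()

log_csv = to_csv(records)
print(log_csv)
```

When a search is re-run later, adding a new row rather than overwriting the old one preserves the history of when each search was updated.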

5. Screen the literature

Screening should be performed by your two reviewers, using the criteria documented in your research protocol. The screening is done in two phases:

  • Pre-screening of all titles and abstracts, and selecting those appropriate
  • Screening of the full-text articles of the selected studies

Make sure reviewers keep a log of which studies they exclude, with reasons why.

6. Assess the quality of the studies

Your reviewers should evaluate the methodological quality of your chosen full-text articles. Make an assessment checklist that closely aligns with your research protocol, including a consistent scoring system, calculations of the quality of each study, and sensitivity analysis.

The kinds of questions you'll come up with are:

  • Were the participants really randomly allocated to their groups?
  • Were the groups similar in terms of prognostic factors?
  • Could the conclusions of the study have been influenced by bias?

7. Extract the data

Every step of the data extraction must be documented for transparency and replicability. Create a data extraction form and set your reviewers to work extracting data from the qualified studies.

Here’s a free detailed template for recording data extraction, from Dalhousie University. It should be adapted to your specific question.

8. Analyze the results

Establish a standard measure of outcome that can be applied to each study on the basis of its effect size.

Common measures of outcome depend on the type of data:

  • Binary outcomes (e.g. cured/not cured): odds ratio and risk ratio
  • Continuous outcomes (e.g. blood pressure): means, difference in means, and standardized difference in means
  • Survival or time-to-event data: hazard ratios
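
For binary outcomes, both measures can be computed from the number of events and the group sizes in each arm. The sketch below uses the standard definitions of the risk ratio and odds ratio; the function names and the trial counts are hypothetical.

```python
def risk_ratio(events_treated, n_treated, events_control, n_control):
    """Risk of the event in the treated group divided by risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)

def odds_ratio(events_treated, n_treated, events_control, n_control):
    """Odds of the event in the treated group divided by odds in the control group."""
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    return odds_treated / odds_control

# Hypothetical trial: 30 of 100 treated patients cured, 20 of 100 controls cured
rr = risk_ratio(30, 100, 20, 100)
or_ = odds_ratio(30, 100, 20, 100)
print(f"risk ratio = {rr:.2f}, odds ratio = {or_:.2f}")
```

Note that these simple formulas fail when a group has zero events or zero non-events; statistical packages usually apply a correction in those cases.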

Design a table and populate it with your results. Then plot the results as a forest plot, which provides a simple visual representation of variation between the studies.

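Then analyze the data for issues. These can include heterogeneity, which is when results vary between studies more than chance alone would explain; one visual sign is that studies’ lines (confidence intervals) in the forest plot don’t overlap with those of other studies. Again, record any excluded studies here for reference.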
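
Besides inspecting the forest plot, heterogeneity is often quantified. Two commonly reported statistics (not named in this guide) are Cochran’s Q and the I² percentage; the sketch below computes both from hypothetical effect sizes and standard errors, using inverse-variance weights.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q and I-squared (as a percentage) for a set of studies."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared

# Hypothetical studies with clearly different effects and equal precision
effects = [0.1, 0.5, 0.9]
std_errors = [0.1, 0.1, 0.1]

q, i_squared = heterogeneity(effects, std_errors)
print(f"Q = {q:.1f}, I-squared = {i_squared:.1f}%")
```

A high I² suggests that pooling the studies into a single estimate may be misleading and that subgroup analysis or a narrative synthesis deserves consideration.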

9. Interpret and present the results

Consider different factors when interpreting your results. These include limitations, strength of evidence, biases, applicability, economic effects, and implications for future practice or research.

Apply appropriate grading of your evidence and consider the strength of your recommendations.

It’s best to formulate a detailed plan for how you’ll present your systematic review results. Take a look at the guidelines for interpreting results from Cochrane.

Registering your systematic literature review

Before writing your systematic literature review, you can register it with OSF for additional guidance along the way. You could also register your completed work with PROSPERO.

A literature review simply provides a summary of the literature available on a topic. A systematic review, on the other hand, is more than just a summary. It also includes an analysis and evaluation of existing research. Put simply, it's a study of studies.

Systematic Reviews

What Makes a Systematic Review Different from Other Types of Reviews?

Reproduced from Grant, M. J. and Booth, A. (2009), A typology of reviews: an analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91–108. doi:10.1111/j.1471-1842.2009.00848.x

Critical review

  • Description: Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model
  • Search: Seeks to identify most significant items in the field
  • Appraisal: No formal quality assessment. Attempts to evaluate according to contribution
  • Synthesis: Typically narrative, perhaps conceptual or chronological
  • Analysis: Significant component: seeks to identify conceptual contribution to embody existing or derive new theory

Literature review

  • Description: Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness. May include research findings
  • Search: May or may not include comprehensive searching
  • Appraisal: May or may not include quality assessment
  • Synthesis: Typically narrative
  • Analysis: Analysis may be chronological, conceptual, thematic, etc.

Mapping review/systematic map

  • Description: Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature
  • Search: Completeness of searching determined by time/scope constraints
  • Appraisal: No formal quality assessment
  • Synthesis: May be graphical and tabular
  • Analysis: Characterizes quantity and quality of literature, perhaps by study design and other key features. May identify need for primary or secondary research

Meta-analysis

  • Description: Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results
  • Search: Aims for exhaustive, comprehensive searching. May use funnel plot to assess completeness
  • Appraisal: Quality assessment may determine inclusion/exclusion and/or sensitivity analyses
  • Synthesis: Graphical and tabular with narrative commentary
  • Analysis: Numerical analysis of measures of effect assuming absence of heterogeneity

Mixed studies review/mixed methods review

  • Description: Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies
  • Search: Requires either very sensitive search to retrieve all studies or separately conceived quantitative and qualitative strategies
  • Appraisal: Requires either a generic appraisal instrument or separate appraisal processes with corresponding checklists
  • Synthesis: Typically both components will be presented as narrative and in tables. May also employ graphical means of integrating quantitative and qualitative studies
  • Analysis: Analysis may characterise both literatures and look for correlations between characteristics or use gap analysis to identify aspects absent in one literature but missing in the other

Overview

  • Description: Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics
  • Search: May or may not include comprehensive searching (depends whether systematic overview or not)
  • Appraisal: May or may not include quality assessment (depends whether systematic overview or not)
  • Synthesis: Synthesis depends on whether systematic or not. Typically narrative but may include tabular features
  • Analysis: Analysis may be chronological, conceptual, thematic, etc.

Qualitative systematic review/qualitative evidence synthesis

  • Description: Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies
  • Search: May employ selective or purposive sampling
  • Appraisal: Quality assessment typically used to mediate messages not for inclusion/exclusion
  • Synthesis: Qualitative, narrative synthesis
  • Analysis: Thematic analysis, may include conceptual models

Rapid review

  • Description: Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research
  • Search: Completeness of searching determined by time constraints
  • Appraisal: Time-limited formal quality assessment
  • Synthesis: Typically narrative and tabular
  • Analysis: Quantities of literature and overall quality/direction of effect of literature

Scoping review

  • Description: Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research evidence (usually including ongoing research)
  • Search: Completeness of searching determined by time/scope constraints. May include research in progress
  • Appraisal: No formal quality assessment
  • Synthesis: Typically tabular with some narrative commentary
  • Analysis: Characterizes quantity and quality of literature, perhaps by study design and other key features. Attempts to specify a viable review

State-of-the-art review

  • Description: Tend to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives
  • Search: Aims for comprehensive searching of current literature
  • Appraisal: No formal quality assessment
  • Synthesis: Typically narrative, may have tabular accompaniment
  • Analysis: Current state of knowledge and priorities for future investigation and research
Seeks to systematically search for, appraise and synthesis research evidence, often adhering to guidelines on the conduct of a review Aims for exhaustive, comprehensive searching Quality assessment may determine inclusion/exclusion Typically narrative with tabular accompaniment What is known; recommendations for practice. What remains unknown; uncertainty around findings, recommendations for future research
Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis’ Aims for exhaustive, comprehensive searching May or may not include quality assessment Minimal narrative, tabular summary of studies What is known; recommendations for practice. Limitations
Attempt to include elements of systematic review process while stopping short of systematic review. Typically conducted as postgraduate student assignment May or may not include comprehensive searching May or may not include quality assessment Typically narrative with tabular accompaniment What is known; uncertainty around findings; limitations of methodology
Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results Identification of component reviews, but no search for primary studies Quality assessment of studies within component reviews and/or of reviews themselves Graphical and tabular with narrative commentary What is known; recommendations for practice. What remains unknown; recommendations for future research
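
The meta-analysis entry in the table above mentions numerical analysis of measures of effect "assuming absence of heterogeneity". One common way to do that is a fixed-effect, inverse-variance pooled estimate, sketched below in Python. The function name and the three study values are hypothetical and chosen only for illustration; this is not the method of any particular guideline or software package.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study effects with inverse-variance weights (fixed-effect model).

    Assumes every study estimates the same underlying effect, i.e. no
    between-study heterogeneity.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need at least one study and one standard error per effect")
    # Each study is weighted by the inverse of its variance (1 / SE^2).
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Approximate 95% confidence interval using the normal quantile 1.96.
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, ci = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

More precise studies (smaller standard errors) get larger weights, which is why the pooled estimate is more precise than any single study. When the studies disagree more than chance would explain, the no-heterogeneity assumption does not hold and a random-effects model is usually considered instead.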
  • Last Updated: Jul 23, 2024 3:40 PM
  • URL: https://guides.library.ucla.edu/systematicreviews

Carrying out systematic literature reviews: an introduction

Affiliation.

  • 1 Lecturer in Health Data Science, School of Health Sciences, University of Manchester, Manchester.
  • PMID: 31393770
  • DOI: 10.12968/bjon.2019.28.15.1008

Systematic reviews provide a synthesis of evidence for a specific topic of interest, summarising the results of multiple studies to aid in clinical decisions and resource allocation. They remain among the best forms of evidence, and reduce the bias inherent in other methods. A solid understanding of the systematic review process can be of benefit to nurses that carry out such reviews, and for those who make decisions based on them. An overview of the main steps involved in carrying out a systematic review is presented, including some of the common tools and frameworks utilised in this area. This should provide a good starting point for those that are considering embarking on such work, and to aid readers of such reviews in their understanding of the main review components, in order to appraise the quality of a review that may be used to inform subsequent clinical decision making.

Keywords: Health care education; Health care roles; Nursing education; Nursing evaluation research; Nursing research.


Systematic Reviews and Meta Analysis


Systematic review Q & A

What is a systematic review?

A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduces the risk of bias in identifying, selecting and analyzing relevant studies. A well-designed systematic review includes clear objectives, pre-selected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis and presentation of the findings of the included studies. A systematic review may include a meta-analysis.

For details about carrying out systematic reviews, see the Guides and Standards section of this guide.

Is my research topic appropriate for systematic review methods?

A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single intervention, exposure, or outcome, or a small set of related ones, will simplify the assessment of studies and the synthesis of the findings.

Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine, or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.

If not a systematic review, then what?

You might consider performing a scoping review. This framework allows iterative searching over a reduced number of data sources and does not require assessing individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets), but it may give you a means of dealing with a large set of results.

This tool can help you decide what kind of review is right for your question.

Can my student complete a systematic review during her summer project?

Probably not. Systematic reviews are a lot of work. Counting the time needed to create the protocol, build and run a quality search, collect all the papers, evaluate the studies that meet the inclusion criteria, and extract and analyze the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time, so it may take several weeks to complete and run a search. In addition, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.
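
To get a feel for the screening figure quoted above (1 hour per screener for every 100-200 records, with at least two screeners), here is a rough back-of-the-envelope sketch in Python. The function name and the 3,000-record search result are hypothetical; the estimate covers first-round screening only and ignores every other stage of the review.

```python
def screening_hours(n_records, screeners=2, records_per_hour=(100, 200)):
    """Estimate first-round screening time as a (low, high) range in hours.

    records_per_hour is the (slow, fast) screening rate per screener; the
    default mirrors the 100-200 records per hour quoted in the text.
    """
    slow, fast = records_per_hour
    # Every screener reads every record, so per-screener time does not
    # shrink when more screeners are added.
    per_screener = (n_records / fast, n_records / slow)
    total = (per_screener[0] * screeners, per_screener[1] * screeners)
    return per_screener, total


# Hypothetical search that returns 3,000 records after de-duplication.
per_screener, total = screening_hours(3000)
print(f"Per screener: {per_screener[0]:.0f} to {per_screener[1]:.0f} hours")
print(f"Team total:   {total[0]:.0f} to {total[1]:.0f} person-hours")
```

Under these assumptions, a 3,000-record search costs each of two screeners 15 to 30 hours before any full-text review or data extraction begins.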

How can I know if my topic has been reviewed already?

Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:

"neoadjuvant chemotherapy" AND systematic[sb]

The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:

"neoadjuvant chemotherapy" AND systematic[ti]

Any PRISMA-compliant systematic review will be captured by this method since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews do not include 'systematic' in the title, however. It's worth checking the Cochrane Database of Systematic Reviews independently.
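
The two searches above can also be generated for any topic and turned into request URLs. The sketch below is a minimal Python illustration that assumes the NCBI E-utilities ESearch endpoint with its `db`, `term`, `retmode` and `retmax` parameters; the helper names are hypothetical, and the code only builds strings and does not contact PubMed.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here; check NCBI's usage policy
# before sending requests).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def review_check_queries(topic):
    """Return the broad subset query and the narrower title-word query."""
    phrase = f'"{topic}"'
    return {
        "subset": f"{phrase} AND systematic[sb]",  # broad but noisy
        "title": f"{phrase} AND systematic[ti]",   # narrower, may miss some
    }


def esearch_url(term, retmax=20):
    """Build an ESearch URL for a PubMed query string."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


queries = review_check_queries("neoadjuvant chemotherapy")
url = esearch_url(queries["subset"])
for name, term in queries.items():
    print(f"{name}: {term}")
print(url)
```

Neither query covers Cochrane reviews reliably, so the Cochrane Database of Systematic Reviews still needs a separate check.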

You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators will register their protocols in PROSPERO, a registry of review protocols. Other published protocols as well as Cochrane Review protocols appear in the Cochrane Methodology Register, a part of the Cochrane Library.

  • Last Updated: Sep 4, 2024 4:04 PM
  • URL: https://guides.library.harvard.edu/meta-analysis
         



Annual Review of Psychology

Volume 70, 2019. Review Article

How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses

  • Andy P. Siddaway (1), Alex M. Wood (2), and Larry V. Hedges (3)
  • Affiliations: (1) Behavioural Science Centre, Stirling Management School, University of Stirling, Stirling FK9 4LA, United Kingdom; (2) Department of Psychological and Behavioural Science, London School of Economics and Political Science, London WC2A 2AE, United Kingdom; (3) Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA
  • Vol. 70:747-770 (Volume publication date January 2019) https://doi.org/10.1146/annurev-psych-010418-102803
  • First published as a Review in Advance on August 08, 2018
  • Copyright © 2019 by Annual Reviews. All rights reserved

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.

Systematic Literature Review or Literature Review?


As a researcher, you may be required to conduct a literature review. But what kind of review do you need to complete? Is it a systematic literature review or a standard literature review? In this article, we’ll outline the purpose of a systematic literature review, the difference between literature review and systematic review, and other important aspects of systematic literature reviews.

What is a Systematic Literature Review?

The purpose of a systematic literature review is simple: to provide a high-level overview of the evidence on a particular research question. The question itself is highly focused, so that the review of the literature stays closely matched to the topic at hand. For example, it might be a focused question about medical or clinical outcomes.

The components of a systematic literature review are quite different from the standard literature review research theses that most of us are used to (more on this below). And because of the specificity of the research question, typically a systematic literature review involves more than one primary author. There’s more work related to a systematic literature review, so it makes sense to divide the work among two or three (or even more) researchers.

Your systematic literature review will follow very clear and defined protocols that are decided on prior to any review. This involves extensive planning, and a deliberately designed search strategy that is in tune with the specific research question. Every aspect of a systematic literature review, including the research protocols, which databases are used, and dates of each search, must be transparent so that other researchers can be assured that the systematic literature review is comprehensive and focused.

Systematic literature reviews originated largely in medical science. Now they are used for any evidence-based research question. In addition to focus and transparency, other aspects of a quality systematic literature review include:

  • Clear and concise review and summary
  • Comprehensive coverage of the topic
  • Accessibility and equality of the research reviewed

Systematic Review vs Literature Review

The difference between literature review and systematic review comes back to the initial research question. Whereas the systematic review is very specific and focused, the standard literature review is much more general. The components of a literature review, for example, are similar to any other research paper. That is, it includes an introduction, description of the methods used, a discussion and conclusion, as well as a reference list or bibliography.

A systematic review, however, includes entirely different components that reflect the specificity of its research question, and the requirement for transparency and inclusion. For instance, the systematic review will include:

  • Eligibility criteria for included research
  • A description of the systematic research search strategy
  • An assessment of the validity of reviewed research
  • Interpretations of the results of research included in the review

As you can see, contrary to the general overview or summary of a topic, the systematic literature review includes much more detail and work to compile than a standard literature review. Indeed, it can take years to conduct and write a systematic literature review. But the information that practitioners and other researchers can glean from a systematic literature review is, by its very nature, exceptionally valuable.

This is not to diminish the value of the standard literature review, whose importance in research writing is discussed in a separate article. The two types of review simply answer different questions and therefore have different purposes and roles in the world of research and evidence-based writing.

Systematic Literature Review vs Meta Analysis

It would be understandable to think that a systematic literature review is the same thing as a meta-analysis. But whereas a systematic review gathers and appraises the research studies that answer a specific question, a meta-analysis is a statistical technique that combines the quantitative results of those studies and helps expose inconsistencies or discrepancies between them. A systematic review may or may not include a meta-analysis. For more about this topic, see our Systematic Review vs Meta-Analysis article.


What is a Systematic Literature Review?

A systematic literature review (SLR) is an independent academic method that aims to identify and evaluate all relevant literature on a topic in order to derive conclusions about the question under consideration. "Systematic reviews are undertaken to clarify the state of existing research and the implications that should be drawn from this." (Feak & Swales, 2009, p. 3) An SLR can demonstrate the current state of research on a topic, while identifying gaps and areas requiring further research with regard to a given research question. A formal methodological approach is pursued in order to reduce distortions caused by an overly restrictive selection of the available literature and to increase the reliability of the literature selected (Tranfield, Denyer & Smart, 2003). A special aspect in this regard is the fact that a research objective is defined for the search itself and the criteria for determining what is to be included and excluded are defined prior to conducting the search. The search is mainly performed in electronic literature databases (such as Business Source Complete or Web of Science), but also includes manual searches (reviews of reference lists in relevant sources) and the identification of literature not yet published in order to obtain a comprehensive overview of a research topic.

An SLR protocol documents all the information gathered and the steps taken as part of an SLR in order to make the selection process transparent and reproducible. The PRISMA flow diagram helps you make the selection process visible.

In an ideal scenario, experts from the respective research discipline, as well as experts working in the relevant field and in libraries, should be involved in setting the search terms. As a rule, the literature is selected by two or more reviewers working independently of one another. Both measures serve the purpose of increasing the objectivity of the literature selection. An SLR must, then, be more than merely a summary of a topic (Briner & Denyer, 2012). As such, it also distinguishes itself from “ordinary” surveys of the available literature. The following table shows the differences between an SLR and an “ordinary” literature review.


Differences to "common" literature reviews

Characteristic | SLR | Common literature overview
Independent research method | yes | no
Explicit formulation of the search objectives | yes | no
Identification of all publications on a topic | yes | no
Defined criteria for inclusion and exclusion of publications | yes | no
Description of search procedure | yes | no
Literature selection and information extraction by several persons | yes | no
Transparent quality evaluation of publications | yes | no

What are the objectives of SLRs?

  • Avoidance of research redundancies despite a growing amount of publications
  • Identification of research areas, gaps and methods
  • Input for evidence-based management, which allows management decisions to be based on scientific methods and findings
  • Identification of links between different areas of research

Process steps of an SLR

An SLR has several process steps, which are defined differently in the literature (Fink 2014, p. 4; Guba 2008; Tranfield et al. 2003). We distinguish the following steps, which are adapted to the economics and management research area:

1. Defining research questions

Briner & Denyer (2009, p. 347ff.) have developed the CIMO scheme to establish clearly formulated and answerable research questions in the field of economic sciences:

C – Context:  Which individuals, relationships, institutional frameworks and systems are being investigated?

I – Intervention:  The effects of which event, action or activity are being investigated?

M – Mechanisms:  Which mechanisms can explain the relationship between interventions and results? Under what conditions do these mechanisms take effect?

O – Outcomes:  What are the effects of the intervention? How are the results measured? What are intended and unintended effects?

The objective of the systematic literature review is used to formulate research questions such as “How can a project team be led effectively?”. Since there are numerous interpretations and constructs for “effective”, “leadership” and “project team”, these terms must be particularized.

With the aid of the scheme, the following concrete research questions can be derived with regard to this example:

Under what conditions (C) does leadership style (I) influence the performance of project teams (O)?

Which constructs have an effect upon the influence of leadership style (I) on a project team’s performance (O)?          

Research questions do not necessarily need to follow the CIMO scheme, but they should:

  • ... be formulated in a clear, focused and comprehensible manner and be answerable;
  • ... have been determined prior to carrying out the SLR;
  • ... consist of general and specific questions.

As early as this stage, the criteria for inclusion and exclusion are also defined. The selection of the criteria must be well-grounded. This may include conceptual factors such as geographical or temporal restrictions, congruent definitions of constructs, as well as quality criteria (e.g., journal impact factor > x).

2. Selecting databases and other research sources

The selection of sources must be described and explained in detail. The aim is to find a balance between the relevance of the sources (content-related fit) and the scope of the sources.

In the field of economic sciences, there are a number of literature databases that can be searched as part of an SLR. Some examples in this regard are:

  • Business Source Complete
  • ProQuest One Business
  • EconBiz        

Our video " Selecting the right databases " explains how to find relevant databases for your topic.

Literature databases are an important source of research for SLRs, as they can minimize distortions caused by an individual literature selection (selection bias), while offering advantages for a systematic search due to their data structure. The aim is to find all database entries on a topic and thus keep the retrieval bias low (tutorial on retrieval bias). Besides articles from scientific journals, it is important to include working papers, conference proceedings, etc. to reduce the publication bias (tutorial on publication bias).

Our online self-study course "Searching economic databases" explains steps 2 and 3.

3. Defining search terms

Once the literature databases and other research sources have been selected, search terms are defined. For this purpose, the research topic/questions is/are divided into blocks of terms of equal ranking. This approach is called the block-building method (Guba 2008, p. 63). The so-called document-term matrix, which lists topic blocks and search terms according to a scheme, is helpful in this regard. The aim is to identify as many different synonyms as possible for the partial terms. A precisely formulated research question facilitates the identification of relevant search terms. In addition, keywords from particularly relevant articles support the formulation of search terms.

A document-term matrix for the topic “The influence of management style on the performance of project teams” is shown in this example .

Identification of headwords and keywords

When setting search terms, a distinction must be made between subject headings and keywords, both of which are described below:

Keywords

  • appear in the title, abstract and/or text
  • sometimes specified by the author, but in most cases automatically generated
  • non-standardized
  • different spellings and forms (singular/plural) must be searched separately

Subject headings

  • describe the content
  • are generated by an editorial team
  • are listed in a standardized list (thesaurus)
  • may comprise various keywords
  • include different spellings
  • database-specific

Subject headings are a standardized list of words that are generated by the specialists in charge of some databases. This so-called index of subject headings (thesaurus) helps searchers find relevant articles, since the headwords indicate the content of a publication. By contrast, an ordinary keyword search does not necessarily result in a content-related fit, since the database also displays articles in which, for example, a word appears once in the abstract, even though the article’s content does not cover the topic.

Nevertheless, searches using both headwords and keywords should be conducted, since some articles may not yet have been assigned headwords, or errors may have occurred during the assignment of headwords. 

To add headwords to your search in the Business Source Complete database, please select the Thesaurus tab at the top. Here you can find headwords in a new search field and integrate them into your search query. In the search history, headwords are marked with the addition DE (descriptor).

The EconBiz database of the German National Library of Economics (ZBW – Leibniz Information Centre for Economics), which also contains German-language literature, has created its own index of subject headings with the STW Thesaurus for Economics . Headwords are integrated into the search by being used in the search query.

Since the indexes of subject headings divide terms into synonyms, generic terms and sub-aspects, they facilitate the creation of a document-term matrix. For this purpose it is advisable to specify in the document-term matrix the origin of the search terms (STW Thesaurus for Economics, Business Source Complete, etc.).

Searching in literature databases

Once the document-term matrix has been defined, the search in literature databases begins. It is recommended to enter each word of the document-term matrix individually into the database in order to obtain a good overview of the number of hits per word. Next, all the words contained in a block of terms are linked with the Boolean operator OR, forming the union of their hits. The resulting blocks are then linked with each other using the Boolean operator AND, which forms the intersection. In doing so, each block should be added individually in order to see to what degree the number of hits decreases.
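The block-building approach described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `build_query` and the term blocks are hypothetical, and the generic `AND`/`OR` syntax may need adapting to each database's own search language.

```python
def build_query(blocks):
    """Join synonyms with OR inside each block, then join the blocks with AND."""
    groups = []
    for terms in blocks:
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical term blocks for the leadership example used in this guide.
blocks = [
    ["leadership style", "management style"],
    ["project team", "project group"],
    ["performance", "effectiveness"],
]

# Add one block at a time to see how each AND narrows the query.
for n in range(1, len(blocks) + 1):
    print(build_query(blocks[:n]))
```

Running each intermediate query in the database and noting the hit count gives the overview of how strongly each additional block reduces the result set.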

Since the search query must be set up separately for each database, tools such as  LitSonar  have been developed to enable a systematic search across different databases. LitSonar was created by  Professor Dr. Ali Sunyaev (Institute of Applied Informatics and Formal Description Methods – AIFB) at the Karlsruhe Institute of Technology.

Advanced search

Certain database-specific commands can be used to refine a search, for example, by taking variable word endings into account (*) or specifying the distance between two words, etc. Our overview shows the most important search commands for our top databases.

Additional searches in sources other than literature databases

In addition to literature databases, other sources should also be searched. Fink (2014, p. 27) lists the following reasons for this:

  • the topic is new and not yet included in indexes of subject headings;
  • search terms are not used congruently in articles because uniform definitions do not exist;
  • some studies are still in the process of being published, or have been completed, but not published.

Therefore, further search strategies are manual search, bibliographic analysis, personal contacts and academic networks (Briner & Denyer, p. 349). Manual search means that you go through the source information of relevant articles and supplement your hit list accordingly. In addition, you should conduct a targeted search for so-called gray literature, that is, literature not distributed via the book trade, such as working papers from specialist areas and conference reports. By including different types of publications, the so-called publication bias (DBWM video “Understanding publication bias” ) – that is, distortions due to exclusive use of articles from peer-reviewed journals – should be kept to a minimum.

The PRESS checklist can help you check the correctness of your search terms.

4. Merging hits from different databases

In principle, large amounts of data can be easily collected, structured and sorted with data processing programs such as Excel. Another option is to use reference management programs such as EndNote, Citavi or Zotero. The Saxon State and University Library Dresden (SLUB Dresden) provides an  overview of current reference management programs  . Software for qualitative data analysis such as NVivo is equally suited for data processing. A comprehensive overview of the features of different tools that support the SLR process can be found in Bandara et al. (2015).

Our online self-study course "Managing literature with Citavi" shows you how to use the reference management software Citavi.

When conducting an SLR, you should specify for each hit the database from which it originates and the date on which the query was made. In addition, you should always indicate how many hits you have identified in the various databases or, for example, by manual search.

Exporting data from literature databases

Exporting from literature databases is very easy. In  Business Source Complete  , you must first click on the “Share” button in the hit list, then “Email a link to download exported results” at the very bottom and then select the appropriate format for the respective literature program.

Exporting data from the literature database  EconBiz  is somewhat more complex. Here you must first create a marked list and then select each hit individually and add it to the marked list. Afterwards, articles on the list can be exported.

After merging all hits from the various databases, duplicate entries (duplicates) are deleted.
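As a rough sketch of this merging step, the following Python code combines exports from several databases, records the database and query date for every hit, counts hits per database, and removes duplicates. The record layout, the function names, and the example DOI are assumptions made for illustration; real exports from reference managers will look different, and matching on DOI or normalized title is only one simple heuristic.

```python
import re


def normalize(title):
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_hits(exports):
    """exports: list of (database, query_date, records); each record is a dict
    with a 'title' and, optionally, a 'doi'."""
    merged, counts = {}, {}
    for database, query_date, records in exports:
        counts[database] = len(records)
        for rec in records:
            # Prefer the DOI as identity; fall back to the normalized title.
            key = (rec.get("doi") or "").lower() or normalize(rec["title"])
            entry = merged.setdefault(key, {**rec, "sources": []})
            entry["sources"].append((database, query_date))
    return list(merged.values()), counts


# Hypothetical exports; titles, dates and the DOI are invented.
exports = [
    ("Business Source Complete", "2024-05-01", [
        {"title": "Leading Project Teams", "doi": "10.1000/x1"},
        {"title": "Team Performance: A Review"},
    ]),
    ("EconBiz", "2024-05-02", [
        {"title": "Leading project teams", "doi": "10.1000/X1"},
        {"title": "Team performance - a review"},
    ]),
]

unique, counts = merge_hits(exports)
print(counts)       # hits per database before deduplication
print(len(unique))  # unique records after deduplication
```

A record that carries a DOI in one database but not in another would not be matched by this heuristic, so a manual check of the merged list remains advisable.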

5. Applying inclusion and exclusion criteria

All publications are evaluated in the literature management program applying the previously defined criteria for inclusion and exclusion. Only those sources that survive this selection process will subsequently be analyzed. The review process and inclusion criteria should be tested with a small sample and adjustments made if necessary before applying it to all articles. In the ideal case, even this selection would be carried out by more than one person, with each working independently of one another. It needs to be made clear how discrepancies between reviewers are dealt with. 

In a first step, the criteria for inclusion and exclusion are checked primarily against the title, abstract and subject headings in the databases, as well as the keywords provided by the authors of a publication. In a second step, the whole article or source is read.

You can create tag words for the inclusion and exclusion in your literature management tool to keep an overview.

In addition to the common literature management tools, you can also use software tools that have been developed to support SLRs. The central library of the university in Zurich has published an overview and evaluation of different tools based on a survey among researchers.

The selection process needs to be made transparent. The PRISMA flow diagram supports the visualization of the number of included / excluded studies.

Forward and backward search

Should it become apparent that the number of sources found is relatively small, or if you wish to proceed with particular thoroughness, a forward-and-backward search based on the sources found is recommended (Webster & Watson 2002, p. xvi). A backward search means going through the bibliographies of the sources found. A forward search, by contrast, identifies articles that have cited the relevant publications. The Web of Science and Scopus databases can be used to perform citation analyses.
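The two search directions can be illustrated on a small citation map. The sketch below assumes you already have, for each article, the list of works it cites (for example from an export of a citation database); the article labels and function names are hypothetical.

```python
def backward_search(seeds, references):
    """Works cited by the seed articles, i.e. their bibliographies."""
    found = set()
    for article in seeds:
        found.update(references.get(article, []))
    return found - set(seeds)


def forward_search(seeds, references):
    """Works whose bibliographies cite at least one seed article."""
    seeds = set(seeds)
    citing = {a for a, refs in references.items() if seeds & set(refs)}
    return citing - seeds


# Hypothetical citation map: article -> works it cites.
references = {
    "A": ["B", "C"],
    "D": ["A"],
    "E": ["C"],
}

print(sorted(backward_search(["A"], references)))
print(sorted(forward_search(["A"], references)))
```

Any new candidates found this way still have to pass the same inclusion and exclusion criteria as the hits from the database search.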

6. Perform the review

As the next step, the remaining titles are analyzed as to their content by reading them several times in full. Information is extracted according to defined criteria and the quality of the publications is evaluated. If the data extraction is carried out by more than one person, prior training helps ensure that the reviewers extract and assess the information consistently.

Depending on the research questions, there are different methods for data abstraction (content analysis, concept matrix, etc.). A so-called concept matrix can be used to structure the content of information (Webster & Watson 2002, p. xvii). The table below gives an example of a concept matrix according to Becker (2014).

Particularly in the field of economic sciences, the evaluation of a study’s quality cannot be performed according to a generally valid scheme, such as those existing in the field of medicine, for instance. Quality assessment therefore depends largely on the research questions.

Based on the findings of individual studies, a meta-level is then applied to try to understand what similarities and differences exist between the publications, what research gaps exist, etc. This may also result in the development of a theoretical model or reference framework.

Example concept matrix (Becker 2013) on the topic Business Process Management

Article | Pattern | Configuration | Similarities
Thom (2008) | x | |
Yang (2009) | x | | x
Rosa (2009) | | x | x
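A concept matrix like the one above can also be kept as a small data structure, which makes it easy to count how often each concept is covered. The following sketch is illustrative only: the coding of the three articles follows our reading of the example table, and the function names are hypothetical.

```python
concepts = ["Pattern", "Configuration", "Similarities"]

# Which concepts each article addresses, as read from the example table.
coding = {
    "Thom (2008)": {"Pattern"},
    "Yang (2009)": {"Pattern", "Similarities"},
    "Rosa (2009)": {"Configuration", "Similarities"},
}


def concept_matrix(coding, concepts):
    """One row per article, with an 'x' wherever the article covers a concept."""
    return [
        [article] + ["x" if c in covered else "" for c in concepts]
        for article, covered in coding.items()
    ]


def concept_counts(coding, concepts):
    """Number of articles covering each concept; low counts hint at gaps."""
    return {c: sum(c in covered for covered in coding.values()) for c in concepts}


for row in concept_matrix(coding, concepts):
    print(" | ".join(row))
print(concept_counts(coding, concepts))
```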

7. Synthesizing results

Once the review has been conducted, the results must be compiled and, on the basis of these, conclusions derived with regard to the research question (Fink 2014, p. 199ff.). This includes, for example, the following aspects:

  • historical development of topics (histogram, time series: when, and how frequently, did publications on the research topic appear?);
  • overview of journals, authors or specialist disciplines dealing with the topic;
  • comparison of applied statistical methods;
  • topics covered by research;
  • identifying research gaps;
  • developing a reference framework;
  • developing constructs;
  • performing a meta-analysis: comparison of the correlations of the results of different empirical studies (see for example Fink 2014, p. 203 on conducting meta-analyses)
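For the first aspect in the list above, the historical development of a topic, a simple count of publications per year is often enough to start with. The sketch below assumes each included publication is stored as a dictionary with a `year` field; the records and function names are invented for illustration.

```python
from collections import Counter


def publications_per_year(records):
    """Count included publications by year, sorted chronologically."""
    counts = Counter(rec["year"] for rec in records)
    return dict(sorted(counts.items()))


def text_histogram(counts):
    """Render the counts as a plain-text histogram, one line per year."""
    return "\n".join(f"{year} {'#' * n}" for year, n in counts.items())


# Hypothetical set of included publications.
records = [{"year": 2009}, {"year": 2008}, {"year": 2009}, {"year": 2011}]

per_year = publications_per_year(records)
print(text_histogram(per_year))
```

The same pattern works for journals, authors or disciplines by counting a different field.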

Publications about the method

Bandara, W., Furtmueller, E., Miskon, S., Gorbacheva, E., & Beekhuyzen, J. (2015). Achieving Rigor in Literature Reviews: Insights from Qualitative Data Analysis and Tool-Support.  Communications of the Association for Information Systems . 34(8), 154-204.

Booth, A., Papaioannou, D., and Sutton, A. (2012)  Systematic approaches to a successful literature review.  London: Sage.

Briner, R. B., & Denyer, D. (2012). Systematic Review and Evidence Synthesis as a Practice and Scholarship Tool. In Rousseau, D. M. (Ed.), The Oxford Handbook of Evidence-Based Management (pp. 112-129). Oxford: Oxford University Press.

Durach, C. F., Wieland, A., & Machuca, J. A. D. (2015). Antecedents and dimensions of supply chain robustness: a systematic literature review. International Journal of Physical Distribution & Logistics Management, 45(1/2), 118-137. doi: https://doi.org/10.1108/IJPDLM-05-2013-0133

Feak, C. B., & Swales, J. M. (2009). Telling a Research Story: Writing a Literature Review.  English in Today's Research World 2.  Ann Arbor: University of Michigan Press. doi:  10.3998/mpub.309338

Fink, A. (2014). Conducting Research Literature Reviews: From the Internet to Paper (4th ed.). Los Angeles, London, New Delhi, Singapore, Washington DC: Sage Publication.

Fisch, C., & Block, J. (2018). Six tips for your (systematic) literature review in business and management research. Management Review Quarterly, 68, 103–106. doi: 10.1007/s11301-018-0142-x

Guba, B. (2008). Systematische Literaturrecherche. Wiener Medizinische Wochenschrift, 158(1-2), pp. 62-69. doi: 10.1007/s10354-007-0500-0

Hart, C. Doing a literature review: releasing the social science research imagination. London: Sage.

Jesson, J. K., Matheson, L. & Lacey, F. (2011). Doing Your Literature Review: Traditional and Systematic Techniques. Los Angeles, London, New Delhi, Singapore, Washington DC: Sage Publication.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71.

Petticrew, M. and Roberts, H. (2006). Systematic Reviews in the Social Sciences: A Practical Guide. Oxford: Blackwell.

Ridley, D. (2012). The literature review: A step-by-step guide. 2nd edn. London: Sage.

Chang, W. and Taylor, S.A. (2016), The Effectiveness of Customer Participation in New Product Development: A Meta-Analysis,  Journal of Marketing , American Marketing Association, Los Angeles, CA, Vol. 80 No. 1, pp. 47–64.

Tranfield, D., Denyer, D. & Smart, P. (2003). Towards a methodology for developing evidence-informed management knowledge by means of systematic review. British Journal of Management, 14(3), pp. 207-222. doi: https://doi.org/10.1111/1467-8551.00375

Webster, J., & Watson, R. T. (2002). Analyzing the Past to Prepare for the Future: Writing a Literature Review.  Management Information Systems Quarterly , 26(2), xiii-xxiii.  http://www.jstor.org/stable/4132319

Durach, C. F., Wieland, A. & Machuca, Jose. A. D. (2015). Antecedents and dimensions of supply chain robustness: a systematic literature review. International Journal of Physical Distribution & Logistics Management, 45(1/2), 118 – 137.

What is particularly good about this example is that search terms were defined by a number of experts and the review was conducted by three researchers working independently of one another. Furthermore, the search terms used have been very well extracted and the procedure of the literature selection very well described.

On the downside, the restriction to English-language literature brings the language bias into play, even though the authors consider it to be insignificant for the subject area.

Bos-Nehles, A., Renkema, M. & Janssen, M. (2017). HRM and innovative work behaviour: a systematic literature review. Personnel Review, 46(7), pp. 1228-1253

  • Only very specific keywords used
  • No precise information on how the review process was carried out (who reviewed articles?)
  • Only journals with impact factor (publication bias)

Jia, F., Orzes, G., Sartor, M. & Nassimbeni, G. (2017). Global sourcing strategy and structure: towards a conceptual framework. International Journal of Operations & Production Management, 37(7), 840-864

  • Research questions are explicitly presented
  • Search string very detailed
  • Exact description of the review process
  • 2 persons conducted the review independently of each other

Franziska Klatt


Exploring Systematic Literature Reviews: A Comprehensive Guide for Graduate Researchers

Introduction

When conducting graduate-level research, a standard literature review might not always be sufficient, especially if you’re aiming for a high level of rigor and reproducibility. This is where systematic literature reviews come in. Unlike traditional reviews, a systematic literature review (SLR) follows a structured and detailed methodology to identify, assess, and synthesize relevant studies on a specific research question. This method is particularly valuable for fields such as medicine, social sciences, and education, where researchers must provide evidence-based conclusions. In this guide, we’ll walk you through the key steps of conducting a systematic literature review and how it differs from other review types.


What Is a Systematic Literature Review?

A systematic literature review (SLR) is a comprehensive approach to analyzing existing research. It follows a specific protocol and predefined criteria to ensure objectivity and transparency in the selection and evaluation of studies.

Key Elements of a Systematic Review:

Predefined Research Question: You must start with a clear, specific research question.

Inclusion and Exclusion Criteria: These rules are used to filter which studies are included.

Reproducibility: Another researcher should be able to replicate your review process.

Bias Reduction: Through systematic methods, you reduce selection bias compared to traditional reviews.

Why Conduct an SLR?

Systematic reviews are considered high-quality evidence because they minimize bias and offer a comprehensive analysis of existing research on a topic. SLRs are especially common in fields like healthcare, where decision-making must be based on all available evidence.

How to Define Your Research Question

The first and most crucial step in conducting an SLR is formulating a research question. Your question should be specific enough to allow for a focused review but broad enough to capture relevant studies.

Tip: Use the PICO Framework

One effective way to define a question is the PICO framework, often used in healthcare and social sciences:

P : Population (Who are you studying?)

I : Intervention (What is the intervention or focus?)

C : Comparison (Is there a comparison group or intervention?)

O : Outcome (What are you measuring?)

For example, your SLR question could be: “What are the effects of peer tutoring (I) on academic performance (O) for high school students (P) compared to traditional teaching methods (C)?”

Developing a Search Strategy

Once your research question is clear, the next step is to develop a detailed search strategy. This includes identifying databases, search terms, and filters to find relevant studies.

Tip: Use Multiple Databases

Common databases for systematic reviews include:

PubMed : For biomedical and life sciences.

PsycINFO : For psychology-related studies.

ERIC : For education research.

Scopus and Web of Science : For multidisciplinary research.

Tip: Create a Boolean Search Strategy

Use Boolean operators (AND, OR, NOT) to combine search terms. For example:

Peer tutoring AND academic performance

Peer tutoring OR collaborative learning

A well-structured search strategy helps ensure you capture as many relevant studies as possible.

Screening and Selecting Studies

Now comes one of the most labor-intensive steps—screening studies for inclusion. You'll need to carefully sift through the studies identified in your search to decide which ones to include based on your predefined inclusion and exclusion criteria.

Tip: Use a PRISMA Flow Diagram

A PRISMA flow diagram (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) helps you visually track the study selection process. It shows how many studies were identified, how many were excluded, and how many were included in the final review.
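The numbers that go into such a flow diagram follow from simple bookkeeping. The sketch below is a simplified illustration with only four stages; the full PRISMA 2020 diagram has more boxes, and the function name and all counts are invented.

```python
def prisma_counts(identified, duplicates, excluded_title_abstract, excluded_full_text):
    """Derive the main numbers of a simplified study-selection flow."""
    screened = identified - duplicates
    full_text_assessed = screened - excluded_title_abstract
    included = full_text_assessed - excluded_full_text
    if included < 0:
        raise ValueError("exclusions exceed the number of records identified")
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


# Hypothetical counts from a search across several databases.
flow = prisma_counts(
    identified=480,
    duplicates=95,
    excluded_title_abstract=310,
    excluded_full_text=52,
)
for stage, n in flow.items():
    print(f"{stage}: {n}")
```

Keeping these counts in one place while screening makes it much easier to fill in the diagram accurately at the end.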

Inclusion Criteria Example:

Studies published in peer-reviewed journals within the last 10 years.

Quantitative studies measuring specific academic outcomes.

Exclusion Criteria Example:

Studies not published in English.

Qualitative studies or theoretical articles that don’t report on measurable outcomes.
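Writing the criteria down as explicit rules helps apply them the same way to every study and keeps a record of why a study was excluded. The following sketch encodes the example criteria above; the field names (`peer_reviewed`, `year`, `language`, `design`) and the function `screen` are hypothetical, and real screening still requires human judgment.

```python
def screen(study, current_year):
    """Return (include, reasons) for one study record."""
    reasons = []
    if not study["peer_reviewed"]:
        reasons.append("not peer-reviewed")
    if current_year - study["year"] > 10:
        reasons.append("older than 10 years")
    if study["language"] != "English":
        reasons.append("not in English")
    if study["design"] != "quantitative":
        reasons.append("not a quantitative study")
    return (not reasons, reasons)


# Hypothetical study records.
studies = [
    {"id": "S1", "peer_reviewed": True, "year": 2020,
     "language": "English", "design": "quantitative"},
    {"id": "S2", "peer_reviewed": True, "year": 2010,
     "language": "German", "design": "quantitative"},
]

for study in studies:
    include, reasons = screen(study, current_year=2024)
    print(study["id"], "include" if include else "exclude", reasons)
```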

Data Extraction and Synthesis

Once the relevant studies are selected, the next step is extracting and synthesizing the data. In an SLR, data extraction is systematic and predefined to maintain objectivity.

Tip: Create a Data Extraction Form

A data extraction form helps you systematically collect key details from each study. For example:

Study characteristics: Author, year, publication, and sample size.

Intervention details: How the intervention was carried out.

Results: Primary outcomes and measures.
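One way to keep the form consistent across reviewers is to define it once as a fixed record and store every study in the same layout. The sketch below uses a Python dataclass and writes the records as CSV; the class name, the field names and the example study are assumptions for illustration, not a prescribed template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExtractionRecord:
    # Study characteristics
    author: str
    year: int
    publication: str
    sample_size: int
    # Intervention details
    intervention: str
    # Results
    primary_outcome: str
    outcome_measure: str


def to_csv(records):
    """Serialize extraction records to CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(ExtractionRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buffer.getvalue()


# Hypothetical study, invented for illustration.
records = [
    ExtractionRecord("Doe", 2021, "Example Journal", 120,
                     "weekly peer tutoring", "math achievement", "test score"),
]
print(to_csv(records))
```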

Synthesis Techniques:

Qualitative Synthesis: Summarizing themes or patterns across studies.

Quantitative Synthesis (Meta-Analysis): Statistically combining results from multiple studies to estimate overall effects.
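To make the idea of statistically combining results concrete, here is a minimal sketch of one common approach, inverse-variance fixed-effect pooling. This guide does not prescribe a particular method, so treat the choice as an assumption; the function name and the effect sizes are invented, and real meta-analyses usually also assess heterogeneity and may use a random-effects model instead.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.20, 0.40, 0.30]
std_errors = [0.10, 0.10, 0.20]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect {pooled:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```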

Reporting and Writing Up the Review

Finally, your systematic literature review needs to be written up clearly and transparently, detailing every step of your process. This is where your review differs from traditional reviews—the methodology section is critical in an SLR.

Key Sections of an SLR Report:

Introduction: Introduce your research question and justify why the review is necessary.

Methodology: Describe the databases searched, the keywords used, the inclusion/exclusion criteria, and how studies were selected.

Results: Present your findings, possibly using tables or figures to organize data.

Discussion: Discuss your results, highlighting trends, limitations, and gaps in the research.

Conclusion: Summarize the key takeaways and suggest areas for future research.

Tip: Follow PRISMA Guidelines

Ensure your review adheres to PRISMA guidelines to maintain quality and transparency. PRISMA provides a checklist for reporting systematic reviews and meta-analyses.

Conclusion: Elevate Your Research with Systematic Literature Reviews

A systematic literature review may require more time and effort than a traditional review, but it offers greater credibility and rigor. By following a structured, step-by-step process, you’ll provide a comprehensive and unbiased assessment of the literature that can significantly elevate the quality of your research.


Chapter 6: Systematic Review Overview

  • This ACIP GRADE handbook provides guidance to the ACIP workgroups on how to use the GRADE approach for assessing the certainty of evidence.

The evidence base must be identified and retrieved systematically before the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach is used to assess the certainty of the evidence and provide support for guideline judgements. A systematic review should be used to retrieve the best available evidence related to the Population, Intervention, Comparison, and Outcomes (PICO) question. All guidelines should be preceded by a systematic review to ensure that recommendations and judgements are supported by an extensive body of evidence that addresses the research question. This section provides an overview of the systematic review process, external to the GRADE assessment of the certainty of evidence.

Systematic methods should be used to identify and synthesize the evidence 1 . In contrast to narrative reviews, systematic methods address a specific question and apply a rigorous scientific approach to the selection, appraisal, and synthesis of relevant studies. A systematic approach requires documentation of the search strategy used to identify all relevant published and unpublished studies and the eligibility criteria for the selection of studies. Systematic methods reduce the risk of selective citation and improve the reliability and accuracy of decisions. The Cochrane handbook provides guidance on searching for studies, including gray literature and unpublished studies ( Chapter 4: Searching for and selecting studies ) 1 .

6.1 Identifying the evidence

Guidelines should be based on a systematic review of the evidence 2 3 . A published systematic review can be used to inform the guideline, or a new one can be conducted. Using a previously conducted systematic review saves the time and resources needed to conduct a review from scratch 3 . Additionally, if a Cochrane or other well-done systematic review exists on the topic of interest, the evidence is likely presented in a well-structured format and meets certain quality standards, thus providing a good evidence foundation for guidelines. As a result, systematic reviews do not need to be developed de novo if a high-quality review of the topic exists. Updating a relevant and recent high-quality review is usually less expensive and requires less time than conducting a review de novo. Databases such as the Cochrane Library, MEDLINE (through PubMed or OVID), and EMBASE can be searched to identify existing systematic reviews that address the PICO question of interest. Additionally, the International Prospective Register of Systematic Reviews (PROSPERO) database can be searched to check for completed or ongoing systematic reviews addressing the research question of interest 3 . It is important to base an evidence assessment and recommendations on a well-done systematic review to limit the bias that can arise from problems such as methods that cannot be replicated or the exclusion of relevant studies. The quality of a published systematic review can be assessed using A MeaSurement Tool to Assess systematic Reviews (AMSTAR 2) 3 . This instrument assesses the presence of the following characteristics in the review: relevancy to the PICO question; deviations from the protocol; study selection criteria; search strategy; data extraction process; risk of bias assessments for included studies; and appropriateness of both quantitative and qualitative synthesis 4 . A Risk of Bias in Systematic Reviews (ROBIS) assessment may also be performed 5 .

If a well-done systematic review is identified but the date of the last search is more than 6-12 months old, consider updating the search from the last date to ensure that all available evidence is captured to inform the guideline. In a well-done published systematic review, the search strategy will be provided, possibly as an online appendix or supplementary materials. Refer to the Evidence Retrieval section (6.3) for more information.

If a well-done published systematic review is not identified, then a de novo systematic review must be conducted. Once the PICO question(s) have been identified, conducting a systematic review includes the following steps:

  • Protocol development
  • Evidence retrieval and identification
  • Risk of bias assessment
  • A meta-analysis or narrative synthesis
  • Assessment of the certainty of evidence using GRADE

6.2 Protocol development

There are several in-depth resources available to support authors when developing a systematic review; therefore, this and following sections will refer to higher-level points and provide information on those resources. The Cochrane Handbook serves as a fundamental reference for the development of systematic reviews and the PRISMA guidance provides detailed information on reporting requirements. To improve transparency and reduce the potential for bias to be introduced into the systematic review process, a protocol should be developed a priori to outline the methods of the planned systematic review. If the methods in the final systematic review deviate from the protocol (as is not uncommon), this must be noted in the final review with a rationale. Protocol development aims to reduce potential bias and ensure transparency in the decisions and judgements made by the review team. Protocols should document the predetermined PICO and study inclusion/exclusion criteria without the influence of the outcomes available in published primary studies 6 . The Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P) framework can be used to guide the development of a systematic review 7 . Details on the PRISMA-P statement and checklist are available at https://www.prisma-statement.org/protocols . 7 If the intention is to publish the systematic review in a peer-reviewed journal separately from the guideline, consider registering the systematic review using PROSPERO before beginning the systematic review process 8 .

To ensure the review is done well and meets the needs of the guideline authors, it is important to consider what type of evidence will be searched and included at the protocol stage before the evidence is retrieved 9 . While randomized controlled trials (RCTs) are often considered gold standards for evidence, there are many reasons why authors will choose to include nonrandomized studies (NRS) in their searches:

  • To address baseline risks
  • When RCTs aren't feasible, ethical or readily available
  • When it is predicted that RCTs will have very serious concerns with indirectness (Refer to Table 12 for more information about Indirectness)

NRS can serve as complementary, sequential, or replacement evidence to RCTs depending on the situation 10 . Section 9 of this handbook provides detailed information about how to integrate NRS evidence. At the protocol stage it is important to consider whether or not NRS should be included.

The systematic review team will scope the available literature to develop a sense of whether the systematic review should be limited to RCTs alone or whether reliance on NRS may also be necessary. Once these inclusion and exclusion criteria have been established, the literature can be searched and retrieved systematically.

6.3 Evidence retrieval and identification

6.3a. Searching databases

An expert librarian or information specialist should be consulted to create a search strategy that is applied to all relevant databases to gather primary literature 1 . The following databases are widely used when conducting a systematic review: MEDLINE (via PubMed or OVID); EMBASE; and the Cochrane Central Register of Controlled Trials (CENTRAL). The details of each strategy as actually performed should be recorded, including the search terms (keywords and/or Medical Subject Headings [MeSH] terms), the date(s) on which the search was conducted and/or updated, and the publication dates of the literature covered.
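One way to keep the search details listed above in a consistent shape is a small record per database search. The sketch below is illustrative only: the class name `SearchRecord`, its field names, and the example values are assumptions made for this example and are not part of the handbook. The per-database record count is included because reporting guidance encourages reporting counts by source where feasible.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SearchRecord:
    """One executed database search (hypothetical structure)."""
    database: str                    # e.g. "MEDLINE", "EMBASE", "CENTRAL"
    platform: str                    # e.g. "PubMed" or "OVID"
    run_on: date                     # date the search was conducted or updated
    query: str                       # strategy exactly as it was run
    published_from: Optional[date]   # start of publication dates covered
    published_to: Optional[date]     # end of publication dates covered
    records_retrieved: int           # number of records returned

    def __post_init__(self) -> None:
        # Basic completeness checks so that gaps surface at entry time.
        if not self.query.strip():
            raise ValueError("record the search strategy exactly as it was run")
        if self.records_retrieved < 0:
            raise ValueError("records_retrieved cannot be negative")
        if (self.published_from and self.published_to
                and self.published_from > self.published_to):
            raise ValueError("publication date range is reversed")


# Illustrative values only; the query is a placeholder, not a real strategy.
example = SearchRecord(
    database="MEDLINE",
    platform="PubMed",
    run_on=date(2024, 1, 15),
    query="<search string exactly as run>",
    published_from=None,
    published_to=date(2024, 1, 15),
    records_retrieved=412,
)
print(asdict(example))
```

Storing one record per database and per update makes it straightforward to rerun a search from its last date when a review is refreshed.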

In addition to searching for evidence, references from studies included for the review should also be examined to add anything relevant missed by the searches. It is also useful to examine clinical trials registries maintained by the federal government ( www.clinicaltrials.gov ) and vaccine manufacturers, and to consult subject matter experts. Ongoing studies should be recorded as well so that if the review or guideline were to be updated, these studies can be assessed for inclusion.

6.3b. Screening to identify eligible studies

The criteria for including/excluding evidence identified by the search, and the reasons for including and excluding evidence, should be described (e.g., population characteristics, intervention, comparison, outcomes, study design, setting, language). Screening is typically conducted independently and in duplicate by at least two reviewers. Title and abstract screening is done first based on broader eligibility criteria; once relevant abstracts are selected, the full texts of those papers are pulled. Full-text screening is also usually conducted by two reviewers, independently and in duplicate, with more specific eligibility criteria to decide whether the paper answers the PICO question. At both the title and abstract stage and the full-text stage, disagreements between reviewers can be resolved through discussion or involvement of a third reviewer. The goal of the screening process is to sort through the literature and select the most relevant studies for the review. Covidence can be used to organize the systematic review and manage each step of the screening process. Other programs, such as DistillerSR or Rayyan, can also be used to manage the screening process 11 12 . The PRISMA Statement ( www.prisma-statement.org ) includes guidance on reporting the methods for evidence retrieval. A PRISMA flow diagram (Figure 3) presents the systematic review search process and results.

Figure 3. PRISMA flow diagram depicting the flow of information through the different phases of the systematic review evidence retrieval process, including the number of records identified, records included and excluded at each stage, and the reasons for exclusions.

References in this figure: 13

*Consider, if feasible to do so, reporting the number of records identified from each database or register searched (rather than the total number across all databases/registers).

**If automation tools were used, indicate how many records were excluded by a human and how many were excluded by automation tools.

From: Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71. For more information, visit: http://www.prisma-statement.org/
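The duplicate, independent screening described in Section 6.3b can be sketched in a few lines. This is a minimal illustration, not how Covidence, DistillerSR, or Rayyan work internally: the function name `reconcile`, the record identifiers, and the "include"/"exclude" labels are all assumptions for this example. Records on which the two reviewers disagree are set aside for discussion or a third reviewer, and the resulting counts are the kind of numbers a PRISMA flow diagram reports.

```python
from typing import Dict, List, Tuple

VALID = {"include", "exclude"}


def reconcile(
    reviewer_a: Dict[str, str],
    reviewer_b: Dict[str, str],
) -> Tuple[List[str], List[str], List[str]]:
    """Return (agreed includes, agreed excludes, conflicts) by record ID."""
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must screen the same set of records")
    included, excluded, conflicts = [], [], []
    for record_id in sorted(reviewer_a):
        a, b = reviewer_a[record_id], reviewer_b[record_id]
        if a not in VALID or b not in VALID:
            raise ValueError(f"unknown decision for {record_id}")
        if a != b:
            conflicts.append(record_id)   # resolve by discussion or third reviewer
        elif a == "include":
            included.append(record_id)
        else:
            excluded.append(record_id)
    return included, excluded, conflicts


# Hypothetical title/abstract decisions from two reviewers.
decisions_a = {"rec1": "include", "rec2": "exclude", "rec3": "include"}
decisions_b = {"rec1": "include", "rec2": "exclude", "rec3": "exclude"}
included, excluded, conflicts = reconcile(decisions_a, decisions_b)
print(len(included), "included;", len(excluded), "excluded;", len(conflicts), "to resolve")
```

The same function can be reused at the full-text stage with the more specific eligibility criteria applied by each reviewer.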

6.3c. Data extraction

Once included articles have been screened and selected, relevant information from the articles should be extracted systematically using a standardized and pilot-tested data extraction form. Table 3 provides an example of an ACIP data extraction form (data fields may differ by topic and scope); Microsoft Excel can be used to keep track of and extract relevant details about each study. Data extraction forms typically capture information about: 1) study details (author, publication year, title, funding source, etc.); 2) study characteristics (study design, geographical location, population, etc.); 3) study population (demographics, disease severity, etc.); 4) intervention and comparisons (e.g., type of vaccine/placebo/control, dose, number in series, etc.); 5) outcome measures. For example, for dichotomously reported outcomes, the number of people with the outcome per study arm and the total number of people in each study arm are noted. In contrast, for continuous outcomes, the total number of people in each study arm, the mean or median, and the standard deviation or standard error are extracted. This is the information needed to conduct a quantitative synthesis. If this information is not provided in the study, reviewers may want to contact the authors for more information or consult a statistician about alternative approaches to quantifying the data. After data extraction, risk of bias should be assessed using an appropriate tool described in Section 8.1 of this handbook.

Table 3. Example of a data extraction form for included studies

  • Author, Year; Name of reviewer; Date completed
  • Study characteristics: Study design; Number of participants enrolled*; Number of participants analyzed*; Loss to follow up (for each outcome); Country
  • Participants: Age; Sex (% female); Race/Ethnicity; Inclusion criteria; Exclusion criteria; Equivalence of baseline characteristics
  • Interventions: Intervention arm (Dose; Duration; Cointerventions); Comparison arm (Dose; Duration; Cointerventions)
  • Outcomes: Dichotomous: intervention arm n event/N, control arm n event/N
  • Other fields: Type of study (published/unpublished); Funding source; Study period; Reported subgroup analyses

*total and per group
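For a dichotomous outcome, the counts captured on the extraction form (events and total per arm) are enough to compute a study-level effect estimate. The sketch below uses the standard large-sample formula for a risk ratio and its confidence interval on the log scale; the handbook does not prescribe this formula, and the function name `risk_ratio` and the example counts are assumptions for illustration.

```python
import math


def risk_ratio(events_int, n_int, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a Wald-type CI computed on the log scale.

    Returns (rr, lower, upper, se_log_rr). Arms with zero events need a
    continuity correction or another method, so they are rejected here.
    """
    if min(events_int, events_ctl) <= 0:
        raise ValueError("zero-event arm: this simple formula does not apply")
    if events_int > n_int or events_ctl > n_ctl:
        raise ValueError("events cannot exceed the arm total")
    rr = (events_int / n_int) / (events_ctl / n_ctl)
    # Standard error of log(RR) from the four extracted counts.
    se_log_rr = math.sqrt(1 / events_int - 1 / n_int + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper, se_log_rr


# Hypothetical study: 10/100 events with the intervention, 20/100 with control.
rr, lower, upper, se = risk_ratio(10, 100, 20, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The log risk ratio and its standard error are the quantities typically carried forward into a meta-analysis.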

6.4 Conducting the meta-analysis

After the data have been extracted, they can, if appropriate, be statistically combined to produce a pooled estimate of the relative (e.g., risk ratio, odds ratio, hazard ratio) or absolute (e.g., mean difference, standardized mean difference) effect for the body of evidence of each outcome. A meta-analysis can be performed when at least two studies report on the same outcome. Several software programs are available that can be used to perform a meta-analysis, including R, STATA, and Review Manager (RevMan).

The results from a meta-analysis are presented in a forest plot, as shown in Figure 4. A forest plot presents the effect estimates and confidence intervals for each individual study and a pooled estimate of all the studies included in the meta-analysis 14 . For each study, a square represents the effect estimate and the horizontal line through the square indicates the confidence interval (CI; typically 95% CI). The area of the square reflects the weight given to the study in the analysis. The summary result is presented as a diamond at the bottom.

Figure 4. Estimates of effect for RCTs included in analysis for outcome of incidence of arthralgia (0-42 days)

References in this figure: 15

The two most popular statistical methods for conducting meta-analyses are the fixed-effects model and the random-effects model 14 . These two models typically generate similar effect estimates when used in meta-analyses. However, these models are not interchangeable, and each model makes a different assumption about the data being analyzed.

A fixed-effects model assumes that there is one true effect size that can be identified across all included studies; therefore, all observed differences between studies are attributed to sampling error. The fixed effect model is used when all the studies are assumed to share a common effect size 16 . Before using the fixed-effect model in a meta-analysis, consideration should be made as to whether the results will be applied to only the included studies. Since the fixed-effect model provides the pooled effect estimate for the population in the studies included in the analysis, it should not be used if the goal is to generalize the estimate to other populations.
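The common-effect assumption can be written compactly. The expressions below show the widely used inverse-variance form of the fixed-effect estimate; the handbook does not specify a particular estimator (others, such as Mantel-Haenszel, exist), and the symbols are notation chosen for this illustration: \hat{\theta}_i is the effect estimate from study i (for example, a log risk ratio), v_i is its within-study variance, and k is the number of studies.

```latex
\hat{\theta}_{\mathrm{FE}} = \frac{\sum_{i=1}^{k} w_i \,\hat{\theta}_i}{\sum_{i=1}^{k} w_i},
\qquad
w_i = \frac{1}{v_i},
\qquad
\mathrm{SE}\bigl(\hat{\theta}_{\mathrm{FE}}\bigr) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}
```

Because each weight depends only on within-study variance, larger and more precise studies dominate the pooled estimate, which is consistent with treating all between-study differences as sampling error.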

In contrast, under a random-effects model, some variability between the true effect sizes of the studies is accepted. These effect sizes are assumed to follow a normal distribution. The confidence intervals generated by the random-effects model are typically wider than those generated by the fixed-effect model, as they recognize that some variability in the findings can be due to differences between the primary studies. The weights of the studies are also more similar under the random-effects model. When variation in, for example, the participants or methods across the included studies is suspected, a random-effects model is suggested, because it weights the studies more evenly than the fixed-effect model does. The majority of analyses will meet the criteria for using a random-effects model. One caveat about the selection of models: when few studies are included in the analysis (<3), the random-effects model will produce an estimate of the between-study variance with poor precision. In this situation, a fixed-effect model is a more appropriate way to conduct the meta-analysis 17 .
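The contrast between the two models can be seen numerically. The sketch below pools study-level estimates with inverse-variance weights and then with the DerSimonian-Laird estimate of between-study variance. DerSimonian-Laird is one common random-effects estimator chosen here for illustration; the handbook does not name an estimator, and packages such as R, STATA, or RevMan offer this and other options. The function name `pool` and the input values are hypothetical.

```python
import math


def pool(effects, variances, z=1.96):
    """Fixed-effect and DerSimonian-Laird random-effects summaries.

    effects: study estimates on an additive scale (e.g. log risk ratios)
    variances: their within-study variances
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies, one variance per effect")

    # Fixed effect: weights are the inverse within-study variances.
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    est_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance (tau^2).
    q = sum(w * (y - est_fixed) ** 2 for w, y in zip(w_fixed, effects))
    c = sum_w - sum(w * w for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects: tau^2 is added to every study's variance.
    w_random = [1.0 / (v + tau2) for v in variances]
    sum_wr = sum(w_random)
    est_random = sum(w * y for w, y in zip(w_random, effects)) / sum_wr

    def summary(estimate, weights, total):
        se = math.sqrt(1.0 / total)
        return {
            "estimate": estimate,
            "ci": (estimate - z * se, estimate + z * se),
            "weights": [w / total for w in weights],  # relative weights
        }

    return {
        "tau2": tau2,
        "fixed": summary(est_fixed, w_fixed, sum_w),
        "random": summary(est_random, w_random, sum_wr),
    }


# Hypothetical log risk ratios and variances from three heterogeneous studies.
result = pool([-0.7, -0.1, 0.3], [0.04, 0.05, 0.06])
for model in ("fixed", "random"):
    low, high = result[model]["ci"]
    print(model, round(result[model]["estimate"], 3), (round(low, 3), round(high, 3)))
```

With heterogeneous inputs like these, the random-effects interval is wider and the relative weights are closer to one another. When the studies agree closely, the estimated between-study variance is zero and the two models coincide.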

  1. Lefebvre C, Glanville J, Briscoe S, et al. Chapter 4: Searching for and selecting studies. In: Higgins J, Thomas J, Chandler J, et al, eds. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane; 2022. www.training.cochrane.org/handbook
  2. Committee on Standards for Developing Trustworthy Clinical Practice Guidelines, Board on Health Care Services, Institute of Medicine. Clinical Practice Guidelines We Can Trust. National Academies Press; 2011.
  3. World Health Organization. WHO handbook for guideline development, 2nd ed. World Health Organization; 2014:167.
  4. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008. doi:10.1136/bmj.j4008
  5. University of Bristol. ROBIS tool.
  6. Lasserson T, Thomas J, Higgins J. Chapter 1: Starting a review. In: Higgins J, Thomas J, Chandler J, et al, eds. Cochrane Handbook for Systematic Reviews of Interventions version 6.3. 2022. www.training.cochrane.org/handbook
  7. Moher D, Shamseer L, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1. doi:10.1186/2046-4053-4-1
  8. PROSPERO. York.ac.uk. https://www.crd.york.ac.uk/PROSPERO/
  9. Cuello-Garcia CA, Santesso N, Morgan RL, et al. GRADE guidance 24: optimizing the integration of randomized and non-randomized studies of interventions in evidence syntheses and health guidelines. J Clin Epidemiol. 2022;142:200-208. doi:10.1016/j.jclinepi.2021.11.026
  10. Schünemann HJ, Tugwell P, Reeves BC, et al. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Research Synthesis Methods. 2013;4(1):49-62. doi:10.1002/jrsm.1078
  11. DistillerSR | Systematic Review and Literature Review Software. DistillerSR.
  12. Rayyan – Intelligent Systematic Review. https://www.rayyan.ai/
  13. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi:10.1136/bmj.n71
  14. Deeks J, Higgins J, Altman D. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins J, Thomas J, Chandler J, et al, eds. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane; 2022. www.training.cochrane.org/handbook
  15. Choi MJ, Cossaboom CM, Whitesell AN, et al. Use of ebola vaccine: recommendations of the Advisory Committee on Immunization Practices, United States, 2020. MMWR Recommendations and Reports. 2021;70(1):1.
  16. Borenstein M, Hedges LV, Higgins JP, Rothstein HR. Introduction to meta-analysis. John Wiley & Sons; 2021.
  17. Borenstein M, Hedges LV, Higgins JP, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods. 2010;1:97-111. doi:10.1002/jrsm.12

Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times when thousands of systematic reviews are published monthly [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Table 1. Guidance for development of evidence syntheses

  • Cochrane (formerly Cochrane Collaboration)
  • JBI (formerly Joanna Briggs Institute)
  • National Institute for Health and Care Excellence (NICE), United Kingdom
  • Scottish Intercollegiate Guidelines Network (SIGN), Scotland
  • Agency for Healthcare Research and Quality (AHRQ), United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Table 2.1 Types of traditional systematic reviews

| Review type | Topic assessed | Elements of research question (mnemonic) |
|---|---|---|
| Intervention | Benefits and harms of interventions used in healthcare. | Population, Intervention, Comparator, Outcome (PICO) |
| Diagnostic test accuracy | How well a diagnostic test performs in diagnosing and detecting a particular disease. | Population, Index test(s), and Target condition (PIT) |
| Qualitative: Cochrane | Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences. | Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE); Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER); Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPEcTiF) |
| Qualitative: JBI | Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities. | Population, the Phenomena of Interest, and the Context |
| Prognostic | Probable course or future outcome(s) of people with a health problem. | Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS) |
| Etiology and risk | The relationship (association) between certain factors (e.g., genetic, environmental) and the development of a disease or condition or other health outcome. | Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant (PEO) |
| Measurement properties | What is the most suitable instrument to measure a construct of interest in a specific study population? | Population, Instrument, Construct, Outcomes (PICO) |
| Prevalence and incidence | The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals? | Factor, disease, symptom or health Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the Population or groups at risk as well as the Context/location and time period where relevant (CoCoPop) |

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

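Table 2.2 Evidence syntheses published by Cochrane and JBI

| Cochrane (a) review type | n | % | JBI (b) review type | n | % |
|---|---|---|---|---|---|
| Intervention | 8572 | 96.3 | Effectiveness | 435 | 61.5 |
| Diagnostic | 176 | 1.9 | Diagnostic Test Accuracy | 9 | 1.3 |
| Overview | 64 | 0.7 | Umbrella | 4 | 0.6 |
| Methodology | 41 | 0.45 | Mixed Methods | 2 | 0.3 |
| Qualitative | 17 | 0.19 | Qualitative | 159 | 22.5 |
| Prognostic | 11 | 0.12 | Prevalence and Incidence | 6 | 0.8 |
| Rapid | 11 | 0.12 | Etiology and Risk | 7 | 1.0 |
| Prototype (c) | 8 | 0.08 | Measurement Properties | 3 | 0.4 |
| | | | Economic | 6 | 0.6 |
| | | | Text and Opinion | 1 | 0.14 |
| | | | Scoping | 43 | 6.0 |
| | | | Comprehensive (d) | 32 | 4.5 |
| Total | 8900 | | Total | 707 | |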

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis
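The percentages in the table follow directly from the counts. The short sketch below recomputes the share of each review type from the published counts, which is a convenient way to check a reconstructed table. The dictionary and function names are illustrative only; the counts are the ones reported above.

```python
# Counts copied from the table above; variable and function names are illustrative.
cochrane = {
    "Intervention": 8572, "Diagnostic": 176, "Overview": 64, "Methodology": 41,
    "Qualitative": 17, "Prognostic": 11, "Rapid": 11, "Prototype": 8,
}
jbi = {
    "Effectiveness": 435, "Diagnostic Test Accuracy": 9, "Umbrella": 4,
    "Mixed Methods": 2, "Qualitative": 159, "Prevalence and Incidence": 6,
    "Etiology and Risk": 7, "Measurement Properties": 3, "Economic": 6,
    "Text and Opinion": 1, "Scoping": 43, "Comprehensive": 32,
}


def shares(counts):
    """Return the total and each category's share of it, in percent."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


cochrane_total, cochrane_pct = shares(cochrane)
jbi_total, jbi_pct = shares(jbi)

for label, total, pct in (("Cochrane", cochrane_total, cochrane_pct),
                          ("JBI", jbi_total, jbi_pct)):
    print(f"{label}: total = {total}")
    for name, value in pct.items():
        print(f"  {name}: {value:.2f}%")
```

Recomputed shares may differ slightly from the published figures in the second decimal place, depending on how the original values were rounded.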

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence
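The distinctions in this scheme can be recorded as a handful of explicit fields during data extraction instead of a single design label. The sketch below is one hypothetical way to do that; the class and field names are assumptions for illustration and are not part of the published scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical extraction record built on the basic distinctions in the text."""
    is_primary: bool                   # primary vs secondary study
    data_type: Optional[str] = None    # "qualitative" or "quantitative"
    unit: Optional[str] = None         # "group" or "single-case"
    randomized: Optional[bool] = None  # True, False, or None if not recorded


def describe(study: StudyRecord) -> str:
    """Build a plain description from the recorded features."""
    if not study.is_primary:
        return "secondary study"
    labels = ["primary", study.data_type, study.unit]
    if study.randomized is not None:
        labels.append("randomized" if study.randomized else "non-randomized")
    # Skip any feature that was not recorded.
    return ", ".join(label for label in labels if label)


trial = StudyRecord(is_primary=True, data_type="quantitative",
                    unit="group", randomized=True)
registry = StudyRecord(is_primary=True, data_type="quantitative",
                       unit="group", randomized=False)
overview = StudyRecord(is_primary=False)

print(describe(trial))     # primary, quantitative, group, randomized
print(describe(registry))  # primary, quantitative, group, non-randomized
print(describe(overview))  # secondary study
```

Recording features separately keeps the information needed later for eligibility decisions and for choosing an appropriate RoB tool, even when the label used by the primary authors is ambiguous.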

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of intervention (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].
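As an illustration of reporting separate summary estimates, the sketch below pools hypothetical effect estimates by study design. It uses a simple fixed-effect inverse-variance average, chosen only for brevity; it is not a method prescribed by the text, and a random-effects model may be more appropriate in practice. All effect sizes and standard errors are invented for the example.

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (effect, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def pool_by_design(studies):
    """Group (design, effect, se) rows by design and pool each group separately."""
    groups = {}
    for design, effect, se in studies:
        groups.setdefault(design, []).append((effect, se))
    return {design: pool_fixed_effect(rows) for design, rows in groups.items()}


# Hypothetical data: (design, effect estimate, standard error).
studies = [
    ("RCT", 0.10, 0.10),
    ("RCT", 0.30, 0.10),
    ("NRSI", 0.50, 0.20),
    ("NRSI", 0.70, 0.20),
]

pooled = pool_by_design(studies)
for design, (effect, se) in pooled.items():
    print(f"{design}: pooled effect = {effect:.2f} (SE {se:.3f})")
```

Keeping the two summaries apart lets readers see whether the NRSI estimate diverges from the RCT estimate before any combined result is considered.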

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Table 3.1 Tools specifying standards for systematic reviews with and without meta-analysis

| Tool | Source |
|---|---|
| Quality of Reporting of Meta-analyses (QUOROM) Statement | Moher 1999 |
| Meta-analyses Of Observational Studies in Epidemiology (MOOSE) | Stroup 2000 |
| Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) | Moher 2009 |
| PRISMA 2020 (a) | Page 2021 |
| Overview Quality Assessment Questionnaire (OQAQ) | Oxman and Guyatt 1991 |
| Systematic Review Critical Appraisal Sheet | Centre for Evidence-based Medicine 2005 |
| A Measurement Tool to Assess Systematic Reviews (AMSTAR) | Shea 2007 |
| AMSTAR-2 (a) | Shea 2017 |
| Risk of Bias in Systematic Reviews (ROBIS) (a) | Whiting 2016 |

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems. A thoroughly reported systematic review may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake and impact and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

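Table 3.2 Comparison of AMSTAR-2 and ROBIS

| Characteristic | AMSTAR-2 | ROBIS |
|---|---|---|
| Guidance | Extensive | Extensive |
| Review types | Intervention | Intervention, diagnostic, etiology, prognostic (a) |
| Domains | 7 critical, 9 non-critical | 4 |
| Items: total number | 16 | 29 |
| Items: response options | Items #1, 3, 5, 6, 10, 13, 14, 16: rated Yes or No; Items #2, 4, 7, 8, 9 (b): rated Yes, Partial Yes, or No; Items #11 (b), 12, 15: rated Yes, No, or No meta-analysis conducted | 24 assessment items: rated Yes, Probably Yes, Probably No, No, or No Information; 5 items regarding level of concern: rated Low, High, or Unclear |
| Overall rating: construct | Confidence based on weaknesses in critical domains | Level of concern for risk of bias |
| Overall rating: categories | High, moderate, low, critically low | Low, high, unclear |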

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI
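To make concrete why a summed score is not the intended use, the sketch below derives an AMSTAR-2 overall confidence category from the number of critical flaws and non-critical weaknesses instead of from a total. The thresholds reflect our reading of the scheme suggested by the AMSTAR-2 developers and should be treated as an assumption to be checked against the tool's guidance document; the function name is hypothetical.

```python
def amstar2_overall_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Map counts of unmet domains to an overall confidence category.

    Thresholds are an assumption based on the developers' suggested scheme;
    verify them against the AMSTAR-2 guidance before relying on this.
    """
    if not 0 <= critical_flaws <= 7:
        raise ValueError("AMSTAR-2 has 7 critical domains")
    if not 0 <= noncritical_weaknesses <= 9:
        raise ValueError("AMSTAR-2 has 9 non-critical domains")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Two hypothetical reviews with the same number of unmet items (two each)
# receive different ratings because critical domains carry more weight.
print(amstar2_overall_confidence(critical_flaws=2, noncritical_weaknesses=0))
print(amstar2_overall_confidence(critical_flaws=0, noncritical_weaknesses=2))
```

A simple count would treat these two reviews as equivalent, which is the kind of misuse the developers caution against.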

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.
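During a pilot or calibration exercise, agreement between two independent appraisers can be summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa, a common choice that is not named in the text and is used here only as an example; the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot: two appraisers rate one item across four reviews.
appraiser_1 = ["yes", "yes", "no", "no"]
appraiser_2 = ["yes", "no", "no", "no"]

kappa = cohens_kappa(appraiser_1, appraiser_2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.50
```

Low agreement on particular items during piloting can point to where predefined decision rules are most needed before the full appraisal begins.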

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; as with reports about the actual uptake of these tools, more time is needed. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

Table 3.3 PRISMA extensions

| Description | Extension | Year |
|---|---|---|
| PRISMA for systematic reviews with a focus on health equity | PRISMA-E | 2012 |
| Reporting systematic reviews in journal and conference abstracts | PRISMA for Abstracts (a) | 2015; 2020 |
| PRISMA for systematic review protocols | PRISMA-P | 2015 |
| PRISMA for Network Meta-Analyses | PRISMA-NMA | 2015 |
| PRISMA for Individual Participant Data | PRISMA-IPD | 2015 |
| PRISMA for reviews including harms outcomes | PRISMA-Harms | 2016 |
| PRISMA for diagnostic test accuracy | PRISMA-DTA | 2018 |
| PRISMA for scoping reviews | PRISMA-ScR | 2018 |
| PRISMA for acupuncture | PRISMA-A | 2019 |
| PRISMA for reporting literature searches | PRISMA-S | 2021 |

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. However, PRISMA checklists evaluate how completely an element of review conduct was reported, but do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

Review component | AMSTAR-2 item | ROBIS item | Expectations | Reasoning
Methods for study selection | #5 | #2.5 | All three components must be done in duplicate, and methods fully described. | Helps to mitigate CoI and bias; also may improve accuracy.
Methods for data extraction | #6 | #3.1 | (as above) | (as above)
Methods for RoB assessment | NA | #3.5 | (as above) | (as above)
Study description | #8 | #3.2 | Research design features, components of research question (eg, PICO), setting, funding sources. | Allows readers to understand the individual studies in detail.
Sources of funding | #10 | NA | Identified for all included studies. | Can reveal CoI or bias.
Publication bias | #15* | #4.5 | Explored, diagrammed, and discussed. | Publication and other selective reporting biases are major threats to the validity of systematic reviews.
Author CoI | #16 | NA | Disclosed, with management strategies described. | If CoI is identified, management strategies must be described to ensure confidence in the review.

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular review’s methods, it does not necessarily make a research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

Acronym | Meaning
FINER a | feasible, interesting, novel, ethical, and relevant
SMART b | specific, measurable, attainable, relevant, timely
TOPICS + M c | time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Publication of a prospective protocol has been reported to be associated with attainment of AMSTAR standards in systematic reviews [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS is used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

 BMJ Open
 BioMed Central
 JMIR Research Protocols
 World Journal of Meta-analysis
 Cochrane
 JBI
 PROSPERO

 Research Registry - Registry of Systematic Reviews/Meta-Analyses

 International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)
 Center for Open Science
 Protocols.io
 Figshare
 Open Science Framework
 Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and less biased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trials may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trials [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant less of a delay; for example, in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note that the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB-2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
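The sensitivity check described above (comparing a pooled result with and without studies at high RoB) can be sketched in a few lines. This is only an illustration under stated assumptions: the pooling uses a simple inverse-variance weighted average, the study records and the field names `effect`, `variance`, and `rob` are hypothetical, and the RoB labels are taken as already assigned by the review team.

```python
def pool(studies):
    """Inverse-variance weighted average of the studies' effect estimates."""
    weights = [1.0 / s["variance"] for s in studies]
    weighted_sum = sum(w * s["effect"] for w, s in zip(weights, studies))
    return weighted_sum / sum(weights)


def rob_sensitivity(studies):
    """Compare the pooled estimate from all studies with the estimate
    obtained after excluding studies judged to be at high RoB."""
    not_high = [s for s in studies if s["rob"] != "high"]
    if not not_high:
        raise ValueError("no studies remain after excluding those at high RoB")
    all_estimate = pool(studies)
    restricted_estimate = pool(not_high)
    return {
        "all_studies": all_estimate,
        "excluding_high_rob": restricted_estimate,
        "difference": all_estimate - restricted_estimate,
        "n_excluded": len(studies) - len(not_high),
    }


# Hypothetical data for one outcome (illustrative values only)
example_studies = [
    {"effect": 0.2, "variance": 0.1, "rob": "low"},
    {"effect": 0.4, "variance": 0.1, "rob": "some concerns"},
    {"effect": 1.0, "variance": 0.1, "rob": "high"},
]
sensitivity = rob_sensitivity(example_studies)
```

A large `difference` suggests the summary estimate depends on the studies at high RoB; with few studies, as in this toy example, such a shift should be interpreted cautiously.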

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

Meta-analysis
Aggregate data; individual participant data | Weighted average of effect estimates | Pairwise comparisons of effect estimates, CI | Overall effect estimate, CI, P value; evaluation of heterogeneity | Forest plot with summary statistic for average effect estimate
Network | Variable | The interventions, which are compared directly and indirectly | Network diagram or graph, tabular presentations
 | | Comparisons of relative effects between any pair of interventions | Effect estimates for intervention pairings
 | | Summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity | Forest plot, other methods
 | | Treatment rankings (ie, probability that an intervention is among the best options) | Rankogram plot

Synthesis without meta-analysis
Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate) | Range and distribution of observed effects such as median, interquartile range, range | Box-and-whisker plot, bubble plot; forest plot (without summary effect estimate)
Combining P values | Combined P value, number of studies | Albatross plot (study sample size against P values per outcome)
Vote counting by direction of effect (eg, favors intervention over the comparator) | Proportion of studies with an effect in the direction of interest, CI, P value | Harvest plot, effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. If considered carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
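To make the idea of a weighted average of effect estimates concrete, the sketch below pools study estimates with inverse-variance weights under a fixed-effect (common-effect) model and reports Cochran's Q and I² as simple heterogeneity summaries. The choice of model, the function name, and the 1.96 multiplier for a 95% CI are assumptions made for illustration; random-effects models and other estimators are not shown, and this is not a substitute for the standard references cited above.

```python
import math


def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted average of study effect estimates.

    estimates: effect estimates on a common scale (eg, log odds ratios)
    variances: the corresponding within-study variances
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need estimates and variances from two or more studies")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    ci95 = (pooled - 1.96 * se, pooled + 1.96 * se)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: percentage of variability beyond what chance alone would explain
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "ci95": ci95, "Q": q, "I2": i2}


# Two hypothetical studies with equal precision
pooled_result = fixed_effect_pool([1.0, 3.0], [1.0, 1.0])
```

More precise studies (smaller variance) receive more weight, which is why the pooled standard error is smaller than that of any single study.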

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4 for details).
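As one example of transforming results to a comparable scale, continuous outcomes measured with different instruments are often converted to a standardized mean difference. The sketch below computes Hedges' g with its small-sample correction and an approximate variance. The source does not name a specific transformation, so the choice of Hedges' g, the function name, and the input values are assumptions for illustration only.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference (Hedges' g) and its approximate variance
    for two independent groups."""
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / pooled_sd
    var_d = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2))
    correction = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return correction * d, correction ** 2 * var_d


# Hypothetical group summaries: intervention vs comparator
g, var_g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
```

The resulting estimate and variance are unit-free, so studies that used different measurement scales can be entered into the same weighted average.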

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].
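The notion of an indirect comparison can be illustrated with the simplest case: two interventions, A and C, that were each compared with a common comparator B but never with each other. The sketch below applies an adjusted indirect comparison (often attributed to Bucher and colleagues), which is a building block of NMA rather than a full network model. The function name and input values are hypothetical, and the calculation assumes the two sets of trials are similar enough for the comparison to be meaningful.

```python
import math


def indirect_comparison(d_ab, var_ab, d_cb, var_cb):
    """Adjusted indirect estimate of A vs C through common comparator B.

    d_ab, d_cb: pooled effects of A vs B and C vs B on the same scale
                (eg, log odds ratios)
    var_ab, var_cb: variances of those pooled effects
    """
    d_ac = d_ab - d_cb
    # Uncertainty from both direct comparisons accumulates
    se_ac = math.sqrt(var_ab + var_cb)
    ci95 = (d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac)
    return d_ac, se_ac, ci95


# Hypothetical pooled log odds ratios against a shared comparator
d_ac, se_ac, ci_ac = indirect_comparison(-0.5, 0.04, -0.2, 0.05)
```

Because the variances add, an indirect estimate is always less precise than either of the direct comparisons it is built from, which is one reason these methods demand careful planning.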

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].
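One of the structured summaries listed in Table 4.4, the range and distribution of observed effects, can be produced with a few lines of code. The sketch below reports the median, interquartile range, and range of study effect estimates without combining them into an average effect. The function name and data are hypothetical, and the "inclusive" quartile method is an arbitrary choice among several conventions.

```python
import statistics


def summarize_effects(effects):
    """Describe the distribution of effect estimates from separate studies."""
    if len(effects) < 2:
        raise ValueError("need effect estimates from two or more studies")
    q1, median, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    return {
        "n_studies": len(effects),
        "median": median,
        "iqr": (q1, q3),
        "range": (min(effects), max(effects)),
    }


# Hypothetical effect estimates from five studies (same metric and outcome)
summary = summarize_effects([5, 1, 4, 2, 3])
```

Such a summary describes the observed effects but does not weight studies by precision, which is one of the limitations noted above.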

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
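A formal version of vote counting by direction of effect reports the proportion of studies favoring the intervention together with a CI and P value, rather than a narrative tally. The sketch below is one possible implementation: the exact two-sided binomial (sign) test against a proportion of 0.5 and the Wilson interval are choices made here for illustration, and the function name and input coding (+1 favors the intervention, -1 favors the comparator) are hypothetical.

```python
import math


def vote_count_by_direction(directions):
    """Vote counting based only on the direction of each study's effect."""
    votes = [d for d in directions if d != 0]
    n = len(votes)
    if n == 0:
        raise ValueError("no studies with a classifiable direction of effect")
    k = sum(1 for d in votes if d > 0)
    proportion = k / n

    # Exact two-sided binomial test of H0: proportion = 0.5
    probs = [math.comb(n, i) * 0.5 ** n for i in range(n + 1)]
    p_value = min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-9)))

    # Wilson 95% confidence interval for the proportion
    z = 1.96
    denom = 1 + z * z / n
    centre = (proportion + z * z / (2 * n)) / denom
    half = z * math.sqrt(proportion * (1 - proportion) / n + z * z / (4 * n * n)) / denom
    return {
        "favoring": k,
        "n_studies": n,
        "proportion": proportion,
        "ci95": (centre - half, centre + half),
        "p_value": p_value,
    }


# Hypothetical review: 7 of 10 studies favor the intervention
votes = vote_count_by_direction([1] * 7 + [-1] * 3)
```

Note that the direction of each effect is classified regardless of its statistical significance or size, in line with the acceptable method described above.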

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1).

GRADE criteria for rating certainty of evidence

Criteria for rating down | Criteria for rating up
Risk of bias | Large magnitude of effect
Imprecision | Dose–response gradient
Inconsistency | All residual confounding would decrease magnitude of effect (in situations with an effect)
Indirectness |
Publication bias |

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

 ⊕  ⊕  ⊕  ⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect
 ⊕  ⊕  ⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
 ⊕  ⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
 ⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]
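The rating logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GRADE Working Group's procedure: the starting levels (RCT evidence at "high", NRSI evidence at "low"), the one- or two-level steps, and all names (`OutcomeAssessment`, `certainty`) are assumptions introduced here, and the example outcomes are invented. GRADE judgments lie on a continuum and involve subjectivity, so a tally like this can at most record the reasoning, not replace it.

```python
from dataclasses import dataclass, field

# The four GRADE certainty ratings, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Assumed starting points (not stated in the text above).
START = {"RCT": 3, "NRSI": 1}


@dataclass
class OutcomeAssessment:
    """Hypothetical record of the judgments made for one outcome."""
    outcome: str
    study_type: str                                # "RCT" or "NRSI"
    rate_down: dict = field(default_factory=dict)  # criterion -> levels subtracted
    rate_up: dict = field(default_factory=dict)    # criterion -> levels added


def certainty(assessment: OutcomeAssessment) -> str:
    """Return the certainty rating implied by the recorded judgments."""
    score = START[assessment.study_type]
    score -= sum(assessment.rate_down.values())
    score += sum(assessment.rate_up.values())
    # Clamp to the four available ratings.
    score = max(0, min(len(LEVELS) - 1, score))
    return LEVELS[score]


# Each outcome receives its own rating, so one review can report several levels.
pain = OutcomeAssessment(
    "pain", "RCT", rate_down={"risk of bias": 1, "imprecision": 1}
)
mortality = OutcomeAssessment(
    "mortality", "NRSI", rate_up={"large magnitude of effect": 1}
)

for a in (pain, mortality):
    print(f"{a.outcome}: {certainty(a)}")
```

Keeping the criteria as named dictionary entries means the rationale for each step up or down stays attached to the final rating, which is what review authors are asked to report.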

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
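To give a sense of what an agreement value such as ~0.3 or ~0.7 measures, here is a minimal Python sketch of Cohen's kappa for two raters. The text above does not say which statistic the cited evaluation used, so the choice of unweighted kappa is an assumption, and the ratings below are invented for illustration. Certainty ratings are ordinal, so a weighted statistic may be more appropriate in practice.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters rating the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented certainty ratings from two raters for six outcomes.
rater_1 = ["high", "moderate", "low", "low", "very low", "moderate"]
rater_2 = ["high", "low", "low", "moderate", "very low", "moderate"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

A value of 0 indicates agreement no better than chance and 1 indicates perfect agreement.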

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

Cochrane , JBICochrane, JBICochraneCochrane, JBIJBIJBIJBICochrane, JBIJBI
 ProtocolPRISMA-P [ ]PRISMA-PPRISMA-PPRISMA-PPRISMA-PPRISMA-PPRISMA-PPRISMA-PPRISMA-P
 Systematic reviewPRISMA 2020 [ ]PRISMA-DTA [ ]PRISMA 2020

eMERGe [ ]

ENTREQ [ ]

PRISMA 2020PRISMA 2020PRISMA 2020PRIOR [ ]PRISMA-ScR [ ]
 Synthesis without MASWiM [ ]PRISMA-DTA [ ]SWiM eMERGe [ ] ENTREQ [ ] SWiM SWiM SWiM PRIOR [ ]

For RCTs: Cochrane RoB2 [ ]

For NRSI:

ROBINS-I [ ]

Other primary research

QUADAS-2[ ]

Factor review QUIPS [ ]

Model review PROBAST [ ]

CASP qualitative checklist [ ]

JBI Critical Appraisal Checklist [ ]

JBI checklist for studies reporting prevalence data [ ]

For NRSI: ROBINS-I [ ]

Other primary research

COSMIN RoB Checklist [ ]AMSTAR-2 [ ] or ROBIS [ ]Not required
GRADE [ ]GRADE adaptation GRADE adaptation

CERQual [ ]

ConQual [ ]

GRADE adaptation Risk factors GRADE adaptation

GRADE (for intervention reviews)

Risk factors

Not applicable

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

Systematic review: A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis: The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates: A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome: An event or measurement collected for participants in a study (such as quality of life, mortality).
Result: The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report: A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record: The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study: An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]
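As a small illustration of the definitions above, the following Python sketch pools study effect estimates and their variances into a single result (a point estimate with a 95% confidence interval). It assumes a fixed-effect, inverse-variance model and a normal approximation; the table does not prescribe any particular model, the function name `fixed_effect_pool` is hypothetical, and the input numbers are invented.

```python
import math


def fixed_effect_pool(estimates, variances):
    """Inverse-variance fixed-effect pooling of study effect estimates.

    Returns the pooled point estimate and an approximate 95% confidence
    interval, i.e. a "result" in the sense defined above.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate, and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = 1.96  # approximate 97.5th percentile of the standard normal
    return pooled, (pooled - z * se, pooled + z * se)


# Invented mean differences and variances from two studies.
pooled, (low, high) = fixed_effect_pool([-0.5, -0.3], [0.04, 0.04])
print(f"pooled estimate {pooled:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A random-effects model would add a between-study variance term to each weight; it is omitted here to keep the sketch short.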

Terminology suggestions for health care–related evidence syntheses

Preferred | Potentially problematic
Evidence synthesis with meta-analysis; Systematic review with meta-analysis | Meta-analysis
Overview or umbrella review | Systematic review of systematic reviews; Review of reviews; Meta-review
Randomized | Experimental
Non-randomized | Observational
Single case experimental design | Single-subject research; N-of-1 design
Case report or case series | Descriptive study
Methodological quality | Quality
Certainty of evidence | Quality of evidence; Grade of evidence; Level of evidence; Strength of evidence
Qualitative systematic review | Qualitative synthesis
Synthesis of qualitative data (a) | Qualitative synthesis
Synthesis without meta-analysis | Narrative synthesis (b), narrative summary; Qualitative synthesis; Descriptive synthesis, descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously reported and conducted.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews , and the Journal of Pediatric Rehabilitation Medicine .

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Systematic Review: Grey literature & supplementary searching


Grey literature

Finding grey literature, searching it systematically, and documenting your searches are time-consuming and challenging.

See the Grey literature library guide for further information on searching specific grey literature sources to identify relevant material, as well as using search engines such as Google/Google Scholar.

See the Moodle book MNHS: Systematically searching the grey literature for a comprehensive module on grey literature for systematic reviews.

Trial registers

Systematic reviews routinely search trials registers as a means of identifying additional unpublished and ongoing clinical trials and reducing the risk of reporting biases.

No single registry contains all studies. The Cochrane Handbook indicates that at a minimum the ICTRP (WHO International Clinical Trials Registry Platform, a meta-registry containing 17 registers) and ClinicalTrials.gov should be searched.

Cochrane Handbook, section 4.3.3 (Trials registers and trials results registers): Although there are many other trials registers, ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform (ICTRP) portal are considered to be the most important for searching to identify studies for a systematic review (Pansieri et al 2017). Research has shown that even though ClinicalTrials.gov is included in the WHO ICTRP Search Portal, not all ClinicalTrials.gov records can be successfully retrieved via searches of the ICTRP Search Portal (Glanville et al 2014, Knelangen et al 2018).

Both of these important sources are contained in Cochrane CENTRAL (Cochrane Central Register of Controlled Trials), which is updated monthly and searchable via the Ovid platform. This comprehensiveness makes CENTRAL a popular choice for systematic reviews. As noted above, however, authors of Cochrane reviews must search ClinicalTrials.gov and ICTRP directly instead, for optimum sensitivity.

If using CENTRAL and including non-randomised study designs in your review, you should search ClinicalTrials.gov as well, as CENTRAL only indexes RCTs/quasi-RCTs from this source.

You might also include a national trials register (or another relevant register) separately in order to be rigorous (to overcome limited search functionality and lag in updates) or inclusive for your context. See the full curated list of registers.

Supplementary search methods

Handsearching (manual searching)

While manual searching can still mean hand-searching print copies of journals for relevant studies, it more often involves browsing the online tables of contents of relevant issues, or the reference lists of relevant papers.

In systematic reviews, manual searching is considered an important method of uncovering papers that may not have been picked up in your database searches. This can include studies in journals not indexed by core databases, or papers not retrieved by your search strategy due to being poorly described or incorrectly indexed.

Citation searching

Citation searching comes under the umbrella of manual searching and is a search method that can be done forward or backward in time:

  • Forward citation searching retrieves records that have cited an item, also known as “cited by”. This provides you with more recently published articles that may be relevant for your topic. 
  • Backward citation searching retrieves records that an item has cited (these are located in the article's reference list). This is also known as snowballing: using known relevant articles to identify other key articles or search terms.

The main citation databases are Scopus, Web of Science and Google Scholar. 
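The two directions of citation searching can be illustrated with a short Python sketch. The citation data below are invented, and the function names are hypothetical; in practice the "cited by" and reference-list information would come from a citation database such as those named above, not from a hand-built dictionary.

```python
# Invented citation data: each key cites the items in its list.
REFERENCES = {
    "seed_2015": ["a_2009", "b_2012"],
    "c_2018": ["seed_2015", "b_2012"],
    "d_2021": ["seed_2015"],
    "b_2012": ["a_2009"],
}


def backward_citations(item, references):
    """Records that `item` has cited (its reference list)."""
    return sorted(references.get(item, []))


def forward_citations(item, references):
    """Records that have cited `item` (its "cited by" list)."""
    return sorted(src for src, cited in references.items() if item in cited)


def citation_candidates(seeds, references):
    """Candidates found one step backward or forward from known relevant items."""
    found = set()
    for seed in seeds:
        found.update(backward_citations(seed, references))
        found.update(forward_citations(seed, references))
    return sorted(found - set(seeds))


print(backward_citations("seed_2015", REFERENCES))   # older work the seed cites
print(forward_citations("seed_2015", REFERENCES))    # newer work citing the seed
print(citation_candidates(["seed_2015"], REFERENCES))
```

Every candidate found this way still has to be screened against the review's inclusion criteria, and the seeds and sources used should be documented like any other search.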


Review Article

Transcatheter Aortic Valve Replacement is Ready for Most Low-risk Patients: A Systematic Review of the Literature

Ahmad Jabri

Anas Alameh

Gennaro Giustino

Pedro Engel Gonzalez

Brian O’Neill

Rodrigo Bagur

Tiberio Frisoli

Dee Dee Wang

William W O’Neill

Pedro Villablanca


Transcatheter aortic valve replacement (TAVR) has undergone rapid expansion, emerging as a viable therapeutic option for low-risk patients in lieu of surgical aortic valve replacement. This paper aims to provide a review of the scientific evidence concerning TAVR in low-risk patients, encompassing both observational and clinical trial data. Furthermore, a substantial proportion of low-risk patients has a bicuspid aortic valve, which necessitates careful examination of the anatomic and clinical considerations pertinent to TAVR; these are highlighted in this review. Additionally, the review expands upon some of the unique challenges associated with alternate access in low-risk patients evaluated for TAVR. Last, this review outlines the pivotal role of a multidisciplinary heart team approach in the execution of all TAVR procedures and the authors’ vision of ‘minimalist TAVR’ as a new era in low-risk TAVR.

Transcatheter aortic valve replacement, transcatheter aortic valve intervention, low risk, systematic review

Disclosure: The authors have no conflicts of interest to declare.

Received: 07 November 2023

Accepted: 26 January 2024

Published online: 13 September 2024

Citation: Cardiac Failure Review 2024;10:e11.


DOI: https://doi.org/10.15420/cfr.2023.23

Correspondence Details: Pedro Villablanca, Henry Ford Hospital, Centre for Structural Heart Disease, CFP 4th Floor, 2799 W Grand Blvd, Detroit, MI 48202, US. E: [email protected]

This work is open access under the CC-BY-NC 4.0 License which allows users to copy, redistribute and make derivative works for non-commercial purposes, provided the original work is cited correctly.

Transcatheter aortic valve replacement (TAVR) has evolved as a transformative intervention, gaining widespread acceptance and expanding indications to include low-risk patient populations. 1–6 Notably, the annual volume of TAVR procedures in the US surpassed all forms of surgical aortic valve replacement (SAVR) in 2019. 1–3 This paradigm shift is underscored by the approval of SAPIEN 3 (Edwards) and Evolut (Medtronic) valves for low-surgical risk patients, signifying a potential decline in the age threshold for TAVR referrals. As the landscape of TAVR broadens, an increasing number of low-risk patients, including those with bicuspid aortic valves (BAV), will undergo this intervention, presenting unique challenges given their longer life expectancy with bioprosthetic valves. However, the long-term outlook for TAVR outcomes in low-risk patients remains uncertain.

In navigating the complexities of TAVR in low-risk patients, the review explores alternative access strategies and underscores the crucial role of the heart team. Furthermore, the discussion delves into the imperative to establish strategies for lifetime management in young, low-risk patients, emphasising the approach to the selection of the initial procedure. This involves not only optimising durability but also facilitating potential second and third reinterventions, potentially leading to scenarios involving TAVR–TAVR–TAVR.

As we navigate this dynamic landscape, the integration of scientific evidence, individual patient characteristics, and collaborative decision-making within the heart team will be pivotal in defining the trajectory of TAVR in low-risk patients. Additionally, the review anticipates the growing significance of the emerging concept of ‘minimalist TAVR’, projecting its role as the new era of TAVR expansion in the near future.

Scientific Rationale for TAVR in Low-risk Individuals

Over the past decade, evolving research has underscored the efficacy and safety of TAVR in comparison to SAVR in low-risk patients. Initial observational studies reported varied outcomes in this patient cohort ( Table 1 ). 7–13 Subsequently, prospective studies and data registries were initiated to further elucidate TAVR outcomes in low-risk patients. In a prospective study of 200 low-risk TAVR patients, favourable 1-year outcomes were observed, with low mortality rates (3%) and a low incidence of stroke (2.1%). Notably, 14% of TAVR recipients displayed hypo-attenuated leaflet thickening at 30 days, correlating with a numerically higher stroke rate (3.8% versus 1.9%; p=0.53), albeit without impacting valve haemodynamics at the 1-year mark. 11 Results from the GARY registry, encompassing 20,549 low-risk patients (6,062 TAVR, 14,487 SAVR), indicated comparable 1-year survival rates between groups. However, in-hospital and 30-day survival rates favoured TAVR over SAVR (98.5% versus 97.3%; p=0.003; 98.1% versus 97.1%; p=0.014, respectively). 8

Table 1: Observational Studies of Low-risk Transcatheter Aortic Valve Replacement

Serruys et al. analysed 254 patients (131 TAVR, 123 SAVR) and found that the composite endpoint of all-cause mortality or disabling stroke was lower in the TAVR group compared to SAVR among patients with a Society of Thoracic Surgeons (STS) score of < 3% (1.5% versus 6.5%, p=0.04). 9 Conversely, a study involving 3,402 low-risk patients demonstrated that following propensity matching, SAVR exhibited higher 3-year survival rates compared to TAVR (83.4% versus 72.0%, p=0.0015). This analysis also revealed superior freedom from major cardiac and cerebrovascular events with SAVR in comparison to TAVR (80.9% versus 67.3%, p<0.001). 12

Subsequent to these studies, randomised clinical trials conducted within the low-risk patient population have consistently demonstrated favourable outcomes with TAVR as a therapeutic option ( Table 2 ). 1,2,12,13

Table 2: Clinical Trials of Low-risk Transcatheter Aortic Valve Replacement

In the PARTNER 3 trial encompassing 1,000 patients (503 TAVR with balloon-expandable valves and 497 SAVR), the primary endpoint of death, stroke or rehospitalisation at 1 year significantly favoured TAVR over SAVR, meeting both superiority and non-inferiority margins (8.5% versus 15.1%; p<0.001 for non-inferiority, p=0.001 for superiority). 1

Similarly, in the Evolut Low Risk trial, involving 1,403 low-risk patients (725 TAVR with self-expanding valves, 678 SAVR), TAVR demonstrated non-inferiority to SAVR in terms of death or disabling stroke rates at 24 months (5.3% in TAVR versus 6.7% in SAVR). Additionally, TAVR displayed lower rates of acute kidney injury, bleeding events, and AF compared to SAVR, although with higher rates of aortic regurgitation and permanent pacemaker implantations. 2 In a recent 4-year report, the authors noted an all-cause mortality or disabling stroke of 10.7% in the TAVR group and 14.1% in the SAVR group (HR 0.74; 95% CI [0.54–1.00]; p=0.05), indicating a 26% relative reduction in the risk for death or disabling stroke with TAVR. Notably, indicators of valve performance (including AV reintervention, valve thrombosis, endocarditis) showed no discernible difference between the two groups.
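The 26% figure follows directly from the hazard ratio, and the same arithmetic applied to the confidence interval shows why the result sits at the edge of significance. The short sketch below only reworks the numbers quoted above; the function name is hypothetical, and reading 1 − HR as a relative reduction is the usual simplifying interpretation rather than a method taken from the trial report.

```python
def relative_reduction_pct(hazard_ratio: float) -> float:
    """Relative reduction in hazard (percent) implied by a hazard ratio."""
    return round((1.0 - hazard_ratio) * 100.0, 1)


# Values quoted in the text for the 4-year Evolut Low Risk report.
hr, ci_low, ci_high = 0.74, 0.54, 1.00

point_estimate = relative_reduction_pct(hr)       # reduction at the point estimate
largest_plausible = relative_reduction_pct(ci_low)   # lower CI bound of the HR
smallest_plausible = relative_reduction_pct(ci_high)  # upper CI bound of the HR

print(point_estimate, largest_plausible, smallest_plausible)
```

Because the upper bound of the interval is 1.00, the data are compatible with anything from no reduction to a reduction of nearly half, which is consistent with the reported p=0.05.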

Within the NOTION trial, encompassing 274 low-risk patients (139 TAVR, 135 SAVR recipients), all-cause mortality at 6 years was comparable between TAVR and SAVR, with similar outcomes persisting at 8 years. Notably, this trial provides the longest follow-up among randomised trials of low-risk TAVR. 12,13 Additionally, investigators documented significantly higher rates of structural valve deterioration (SVD) in SAVR compared to TAVR at 5 years (24.0% versus 4.8%; p<0.001) and at 8 years (28.3% versus 13.9%; p=0.0017).

Long-term Durability: A Closer Look

The long-term analysis of PARTNER 3 raises notable considerations. Beyond the first year, the initially favourable non-hierarchical composite primary endpoint in the TAVR group diminished, revealing a signal in the difference in mortality, primarily driven by non-cardiovascular deaths in the TAVR arm. 14 The 5-year primary endpoint rates for TAVR and surgery were 22.8% and 27.2%, respectively, compared to the 1-year rates (8.5% versus 15.1%; p<0.001) with no significant difference in a win-ratio analysis for a hierarchical composite endpoint. The incidence of stroke at 5 years appeared similar between the two groups, with most strokes being ischaemic, emphasising the continued significance of stroke as a serious complication of aortic-valve replacement. While statistically non-significant, the convergence of mortality curves at this timeframe prompts cautious consideration, especially given the initial assertions of TAVR superiority at 1 year.
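To make the narrowing of the PARTNER 3 treatment effect concrete, the absolute differences in the primary endpoint can be computed from the rates quoted above. This is a minimal sketch of that subtraction only; the helper name is hypothetical and no statistical testing is implied.

```python
def absolute_difference(savr_pct: float, tavr_pct: float) -> float:
    """Absolute difference in event rates (percentage points), SAVR minus TAVR."""
    return round(savr_pct - tavr_pct, 1)


# Primary endpoint rates quoted in the text (TAVR versus surgery).
one_year_gap = absolute_difference(15.1, 8.5)
five_year_gap = absolute_difference(27.2, 22.8)

# How much of the early advantage has been lost by 5 years.
narrowing = round(one_year_gap - five_year_gap, 1)

print(one_year_gap, five_year_gap, narrowing)
```

The gap shrinks from 6.6 to 4.4 percentage points, so roughly one-third of the 1-year absolute advantage is no longer present at 5 years.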

In contrast, 4-year data from the Evolut Low Risk trial indicate a persistent benefit with TAVR over time. While all-cause mortality rates were numerically higher with TAVR, the primary endpoint (death or disabling stroke) favoured TAVR, showing a 26% relative reduction in risk. 15 The absolute difference increased from 1.8% at 1 year to 3.4% at 4 years, pointing towards the possible benefits of TAVR in this low-risk population.

These findings underscore the need for continued vigilant monitoring in longer-term follow-up and a nuanced assessment of causes of death to elucidate the evolving landscape of TAVR outcomes in low-risk patients.

Bicuspid TAVR: A Distinct yet Substantial Subset of Low-risk Patients

Traditionally excluded from prior TAVR trials due to safety and outcome uncertainties, patients with BAV have been the focus of recent prospective studies and registries. 1–5,11,13,16 Severe aortic stenosis (AS) in BAV patients, marked by a younger age of onset and unique anatomical challenges, has been addressed in TAVR, with promising short- and intermediate-term success rates. However, this comes with a higher incidence of significant perivalvular regurgitation, a topic of ongoing discussion ( Table 3 ). 6,17–19

Table 3: Studies of Bicuspid Patients

In a study by Forrest et al., 150 low-risk patients with BAV stenosis undergoing TAVR with self-expanding valves (SEV) showed high device success and a low rate of death or disabling stroke at 30 days, independent of Sievers classification of bicuspid valve type. 6

Another analysis of 61 low-risk patients with BAV undergoing TAVR reported no mortalities or disabling strokes at 30 days. However, there was a 13.1% rate of new pacemaker implantation and a 10% incidence of hypo-attenuated leaflet thickening at 30 days, unrelated to clinical events. 17

When comparing bicuspid and tricuspid TAVR patients, an analysis of 932 bicuspid TAVR procedures from the TVT registry revealed comparable all-cause mortality and stroke rates at 30 days and 1 year, albeit with a slightly elevated 30-day risk for stroke in patients with BAV. 19 An analysis by Halim et al. of 5,412 low-risk TAVR procedures in BAV also demonstrated lower adjusted 1-year mortality in bicuspid TAVR compared to tricuspid TAVR, with a slightly higher incidence of residual moderate or severe aortic insufficiency in bicuspid TAVR. 18 The study also noted a higher device success and lower rates of significant aortic insufficiency with current-generation valves compared to older-generation valves.

Despite the promising results, careful patient selection and anatomical assessment are crucial due to the unique anatomical challenges associated with BAV, including asymmetric aortic annulus, eccentric heavy calcification, calcium distribution throughout the aorto-annular complex, raphe resistance to pre-dilatation, and aortic root dilatation. 20 These challenges may impact valve haemodynamics and durability, resulting in elevated transvalvular gradients, paravalvular leak (PVL), device malpositioning, and a higher rate of permanent pacemaker implantations. Due to these anatomical characteristics, valves are often implanted higher and anchored at the narrowest part of the commissural.

Currently, only observational data are available comparing SEV with balloon-expandable valves (BEV). The BEAT registry compared SAPIEN 3 with Evolut R/PRO in patients with BAV AS and confirmed favourable procedural results with both platforms. However, the SEV group exhibited a higher rate of moderate-to-severe PVL at 1 year, and BEV were associated with a more frequent occurrence of annular rupture. 21

Alternate Vascular Access for TAVR

The transfemoral (TF) vascular access route constitutes the primary approach in the majority of TAVR procedures, accounting for approximately 90% of all TAVR interventions, even in cases involving low-risk patients. 1–8 Historically, femoral access has been the standard access employed in randomised clinical trials of TAVR due to its well-established safety profile, consistent outcomes, and the familiarity of operators, with alternative access methods comprising a smaller fraction of these procedures. 1–5,11,13,16 In the Evolut Low Risk trial, the usage of alternative access was approximately 1%, and the absence of TF access served as an exclusion criterion in the PARTNER 3 trial.

It is noteworthy that alternative access approaches have been associated with increased mortality and stroke rates compared to patients with TF access, particularly in cases involving transapical, direct aortic, and transcaval routes. 22–24 While peripheral artery disease and significant vessel tortuosity typically prompt consideration of alternate access TAVR, operators often opt for peripheral vascular interventions to facilitate TF TAVR and avoid the necessity of alternative access procedures. 23 The emergence of intravascular lithotripsy as a novel technology for modifying heavily calcified arteries to accommodate larger-bore access, including TAVR delivery sheaths, is actively under investigation and holds promise, especially in patients with stenotic calcified iliofemoral vessels. 24

However, given the associated morbidity and mortality of alternative access TAVR, coupled with the exclusion of these patients from low-risk trials, SAVR should remain the preferred choice for low-surgical-risk patients lacking TF access, particularly in centres where alternative access procedures are not routinely performed. Furthermore, the most recent American College of Cardiology/American Heart Association (ACC/AHA 2020) valvular heart disease guidelines have recommended SAVR as the preferred treatment if vascular anatomy or other factors preclude TF-TAVR (class I). 25

Challenges to Low-risk TAVR

Permanent Pacemaker

Presently, the rates of new permanent pacemaker implantation (PPI) after TAVR vary widely, ranging from 2% to 36%. Meta-analyses have indicated an elevated risk of all-cause mortality at 1 year in patients requiring a new PPI. 26 Additionally, the requirement for PPI is associated with extended hospital stays and increased healthcare costs. Impingement onto the membranous septum by the TAVR valve is linked to a higher incidence of heart block. 26

The MInimizing Depth According to the membranous Septum (MIDAS) approach has been shown to significantly reduce the rate of PPI. 27 However, it is imperative to balance between the risk of heart block and the risk of upward migration of the TAVR valve. Consequently, the cusp-overlapping technique has been developed to better assess the true depth of TAVR implantation, which can be misleading when using the traditional co-planar view. 26,27 The cusp overlap view angle can be determined pre-procedurally through CT reconstruction and subsequently confirmed intraoperatively via fluoroscopy, typically employing a right anterior oblique (RAO)-caudal view. By employing this approach, the genuine depth of valve deployment can be accurately gauged during the implantation procedure. 28

This technique holds particular significance for SEV TAVR for two primary reasons. First, SEV TAVR tend to descend into the ventricular side during implantation, with the degree of descent varying depending on the specific valve used. Second, the gradual implantation process of SEV TAVR allows for more precise adjustments to the depth of implantation. Consequently, the cusp-overlapping technique aids in deploying the TAVR valve at the optimal desired position. 29

Paravalvular Leak

TAVR has been associated with an increased rate of PVL compared to SAVR, which in turn translates into higher mortality rates. 4,30 Suboptimal device implantation, valve annulus-prosthesis diameter size mismatch, and calcification in the device landing zone have been identified as the primary predictors of PVL. 31,32 Advances in valve technology, including the development of newer generation valves, pre-procedural multidetector computed tomography imaging for precise sizing, and improved sealing mechanisms, have contributed to a reduction in PVL rates. 33,34

For instance, the PARTNER 3 trial reported similar rates of moderate to severe PVL with the BEV compared to SAVR (0.6% versus 0.5%). 1 In contrast, the low-risk trial involving the Medtronic SEV demonstrated a higher incidence of moderate to severe PVL (3.5% versus 0.55%). 2 This observation aligns with prior studies that consistently show a higher occurrence of moderate to severe PVL with SEV compared to BEV. 3–5,11,16

In cases involving highly calcified anatomies, including calcification at the annular and left ventricular outflow tract (LVOT) areas, SAVR may be considered a reasonable option for low-risk patients. 34

Durability/Bioprosthetic Valvular Dysfunction

Long-term durability data for TAVR are limited, especially for patients <65 years of age, due to the predominant enrolment of patients >80 years of age in high- and intermediate-risk trials. This data gap underscores the need for a thorough understanding of TAVR durability across surgical risk levels. Younger patients, expected to live longer, face increased SVD risks, including heightened calcification concerns and microstructural alterations.

A study of 1,128 patients comparing supra-annular SEV TAVR and SAVR in intermediate- and high-risk patients showed a lower 5-year SVD incidence in SEV TAVR (2.57% versus 4.38%). In both groups, SVD was associated with a 50% greater risk of all-cause mortality or hospitalisation, underscoring its clinical significance. 35

In the PARTNER 1 trial, 5-year outcomes revealed instances of SVD in the TAVR group, necessitating reoperation, notably with moderate or severe aortic regurgitation. 36 Mortality rates were higher in subgroups with aortic regurgitation. In PARTNER 2, TAVR patients experienced more paravalvular aortic regurgitation, leading to increased hospitalisations and reinterventions, mainly due to aortic regurgitation or progressive stenosis. 5

In PARTNER 3, haemodynamic valve performance of both TAVR and surgical valves appeared similar at 2 years. 37 The 5-year incidence of bioprosthetic valve failure and reintervention was comparable, with a higher percentage of mild or greater paravalvular aortic regurgitation in TAVR, but without associated higher mortality. 14

The NOTION trial provided reassuring evidence of long-term durability in low-risk patients comparing TAVR to SAVR, though a substantial percentage in the SAVR arm received bioprostheses that were later withdrawn due to early SVD. 13 The Evolut Low Risk trial suggested SEV TAVR valves may have similar durability, with better valve haemodynamics and a lower incidence of prosthesis–patient mismatch (PPM) at 3 years. 2

Direct comparisons between SAVR-only and TAVR-only studies should be avoided due to varying definitions of SVD used in different studies. The trials mentioned lack sufficient long-term data, preventing definitive conclusions regarding long-term SVD. Ongoing long-term follow-up of low-risk trials is anticipated to provide more reliable data based on standardised definitions and further comparison between balloon-expandable and self-expanding valves.

Lifetime Management

With the increasing prevalence of valve-in-valve (ViV) procedures and the broader application of TAVR in younger, low-risk patients, the imperative to establish comprehensive strategies for lifetime management has grown. Despite the expansion of TAVR indications, data on TAVR in challenging anatomies remain limited. As attention shifts towards the lifetime management of AS in younger patients requiring early interventions, careful consideration of the initial procedure choice becomes paramount. The selection of the first intervention is pivotal, aiming not only for optimal durability but also for facilitating potential second and third reinterventions. This underscores the importance of tailoring strategies based on individual patient characteristics, anatomy, technical considerations, and considering centre and operator experience, as well as patient preferences.

TAVR as First Intervention

TAVR reinterventions involve two primary strategies: TAVR explantation with SAVR and repeat TAVR. Repeat TAVR emerges as a less invasive alternative to TAVR explantation, particularly favoured in high-risk patients. Percy et al. conducted a study comparing TAVR-in-TAVR with TAVR explantation, revealing lower 30-day mortality for TAVR-in-TAVR (6.2% versus 12.3%; p=0.05) and fewer major adverse cardiovascular events (RR for TAV explantation: 2.92; 95% CI [1.88–4.99]; p≤0.001). 38 However, 1-year mortality rates were similar (21.0% versus 20.8%; p=1.000), highlighting the need for further understanding of long-term outcomes.

In another analysis of the international Redo-TAVR registry, encompassing 212 TAVR-in-TAVR patients, both early and late-presenting groups exhibited comparable 30-day and 1-year mortality rates (5.4% versus 1.5%, p=0.427, and 16.4% versus 11.7%, p=0.34, respectively). 39 Periprocedural complications after TAVR-in-TAVR were minimal, with occurrences such as new PPI (9.6%), valve malposition (3.3%), stroke (1.4%), and coronary obstruction (0.9%), and notably, no reported deaths. Stratifying TAVR-in-TAVR outcomes by the type of TAVR (BEV and SEV) revealed no association with procedural safety or mortality, and TAVR-in-TAVR with SEV was associated with a lower residual gradient.

The EXPLANT-TAVR registry highlights challenges in TAVR explantation, driven mainly by endocarditis, SVD, PVL, and PPM, resulting in a 30-day mortality of 13% and a 1-year mortality of 28%. 40 Aortic root replacement was required in 13% of cases due to stent endothelialisation. Studies using the Society of Thoracic Surgeons National Database emphasise the complexity of TAVR explantation, with concomitant procedures in 63% of cases and an overall 30-day death rate of 18%. 41 The observed-to-expected mortality ratio for TAVR explant followed by isolated SAVR is 2.2. Despite its increasing prevalence, TAVR explantation remains a high-risk procedure demanding surgical expertise and exhibiting higher in-hospital mortality than standard redo-SAVR.

TAVR-in-TAVR, while a viable option, introduces its own challenges, including a >30% incidence of severe PPM and uncertainties regarding its impact on 1-year survival. 42 With the increasing use of TAVR technology in low-risk, younger patients, strategies to avoid SVD of transcatheter heart valves (THV) become crucial. Additionally, the feasibility of repeat TAVR may be limited in 10–20% of cases. 43,44 This is primarily related to the risk of sinus sequestration and coronary obstruction, particularly for supra-annular THV. Approximately one-quarter of TAVR-in-TAVR patients face a high probability of coronary obstruction, regardless of the valve type used in the first procedure, while another one-quarter exhibit aortic root anatomy suitable for any combination of THVs during TAVR-in-TAVR. 45

Currently, the reported incidence of redo-TAVR in TAVR cases is approximately 0.33–0.59%. 39,46 As TAVR becomes more prevalent in younger patients, this incidence is expected to rise. Redo-TAVR is a feasible option for patients experiencing SVD, including THV stenosis and regurgitation. However, it is not recommended for patients with infective endocarditis or PPM.

Coronary Access in TAVR and TAVR-in-TAVR

Coronary ostium obstruction during native valve TAVR is relatively uncommon, occurring in <1% of cases. 47 Coronary obstruction typically arises due to the displacement of native valve leaflets and any accompanying calcium deposits. Patients with coronary ostia positioned <10–11 mm above the lowest point of the associated sinus, effaced sinuses, and a narrow sinus of Valsalva to tubular ascending aorta are at a heightened risk of occlusion. The use of coronary stents to safeguard a coronary artery susceptible to post-TAVR obstruction has demonstrated favourable mid-term survival rates. However, long-term data on this approach remain unavailable. 48

Although coronary access is already more challenging after the initial TAVR, access after TAVR-in-TAVR, especially with SEV, is projected to be exceedingly difficult in most cases. The displaced leaflets of the first THV, positioned between two stent frames, often extend above the sinotubular junction (STJ). This creates a tube graft that holds open the first valve, posing risks to coronary circulation and access. 49 A study of TAVR-in-TAVR patients from the Redo-TAVR registry showed that 45.5% in the Evolut R/Evolut PRO group and 2.0% in the SAPIEN 3 group had high-risk features for sinus sequestration on CT, which included AV commissure level above the STJ and a close distance between THV and STJ (<2 mm). 49
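The two CT features named above can be expressed as a simple screening check. The sketch below is illustrative only: the class and function names are hypothetical, the 2 mm threshold is the one quoted in the text, and reporting each feature separately (rather than requiring both) is an assumption, since the text does not state how the features were combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CtMeasurements:
    """Hypothetical container for two pre-procedural CT findings."""
    commissure_above_stj: bool  # commissure level lies above the sinotubular junction
    thv_to_stj_mm: float        # distance between the THV frame and the STJ, in mm


def sequestration_risk_features(ct: CtMeasurements,
                                min_thv_to_stj_mm: float = 2.0) -> List[str]:
    """List which of the high-risk features described in the text are present."""
    features = []
    if ct.commissure_above_stj:
        features.append("commissure level above STJ")
    if ct.thv_to_stj_mm < min_thv_to_stj_mm:
        features.append(f"THV-to-STJ distance < {min_thv_to_stj_mm:g} mm")
    return features


# Example: a tall frame sitting close to the STJ versus a roomier aortic root.
tight_root = sequestration_risk_features(CtMeasurements(True, 1.2))
roomy_root = sequestration_risk_features(CtMeasurements(False, 4.5))
print(tight_root, roomy_root)
```

An empty list means neither feature was found on this simplified check; it says nothing about other anatomical risk factors such as coronary height or sinus width.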

Therefore, screening candidates using cardiac CT is crucial for identifying high-risk cases, particularly in younger patients in whom future procedures may be needed. This is particularly important because TAVR prostheses offer limited scope for modification and, unlike their surgical counterparts, are not amenable to fracture. The bioprosthetic or native Aortic Scallop Intentional Laceration to prevent Iatrogenic Coronary artery obstruction (BASILICA) technique has exhibited effectiveness in averting coronary obstruction in both native and bioprosthetic valves, thereby providing additional options for patients at risk. 50 It is worth noting that the patients included in this trial were classified as high and intermediate risk, and the procedure was conducted by experienced operators. 28 Other options available are TAVR explantation plus SAVR or the use of new emerging devices (ShortCut [Pi-Cardia]).

SAVR as First Intervention

The decision between redo-SAVR and TAVR within the surgical aortic valve (TAVR-in-SAVR) for individuals with degenerated surgical aortic bioprostheses is a nuanced process influenced by various factors. The considerations extend beyond short-term outcomes, encompassing factors such as age, surgical risk, life expectancy and anatomical considerations.

A 2021 meta-analysis involving a substantial cohort of 8,048 patients undergoing ViV-TAVR and 8,159 patients treated with redo-SAVR presented a comprehensive comparison of outcomes. 51 This analysis demonstrated no significant differences in perioperative rates of stroke, MI, major vascular complications, PVL, PPI, or 30-day readmission. Notably, ViV-TAVR was associated with lower rates of 30-day mortality (OR 0.52; 95% CI [0.39–0.68]; p<0.001) and major bleeding, and with shorter hospital stays. However, a crucial drawback emerged as ViV-TAVR was linked to significantly higher rates of severe post-procedural PPM compared to redo-SAVR. PPM following TAVR-in-SAVR is identified as an independent risk factor for future reinterventions and is associated with inferior long-term survival. The early mortality benefit of TAVR-in-SAVR over redo-SAVR is observed to diminish at 1 year, prompting consideration of the latter for potentially better long-term survival.

An analysis of 717 propensity score-matched pairs from a large French database also showed a lower rate of the composite endpoint (all-cause mortality, stroke, MI, major or life-threatening bleeding) at 30 days following TAVR-in-SAVR. 52 However, no significant differences between the two groups were noted on follow-up. Intriguingly, the incidence curves favouring redo-SAVR over TAVR-in-SAVR became apparent after approximately 1 year, possibly in line with the findings in the meta-analysis reported above.

In contrast, a comprehensive 5-year follow-up study by Hahn et al., part of the PARTNER 2 Aortic ViV registry, reported the outcomes of ViV-TAVR in patients at high surgical risk. 53 The study, encompassing 369 patients who underwent ViV-TAVR, revealed sustained valve performance up to 5 years, with low rates (6.6%) of haemodynamic valve deterioration or bioprosthetic valve failure.

Factors reported in the literature to correlate with worse outcomes after ViV-TAVR in patients with surgical prostheses include smaller degenerated valves and suboptimal implantation depth. 54 High implantation during ViV-TAVR was associated with lower gradients in both SEV and BEV. Additionally, in efforts to reduce PPM, which has been shown to correlate independently with mortality, bioprosthetic valve ring fracture has been proposed with promising results.

Collectively, these studies underscore the multifaceted nature of the decision-making process in choosing between redo-SAVR and TAVR-in-SAVR for patients with degenerated surgical aortic bioprostheses. Patient selection, careful evaluation of procedural risks, and an understanding of the long-term implications play pivotal roles in optimising the choice between these interventions.

Challenges and Future Directions

The inclusion of young patients in TAVR discussions portends a potential rise in triple valve interventions, with the possible emergence of TAVR-TAVR-TAVR scenarios. While theoretically feasible, this approach poses considerable limitations, encompassing increased risks of PPM, PVL, need for pacemaker implantation, and significant concerns regarding long-term durability, potential coronary obstruction, restricted future coronary access, and valve thrombosis.

In considering a ‘TAVR first’ strategy, it might be wise to target patients with a large aortic root and favourable coronary anatomy. However, existing drawbacks, including limited current evidence and unknown long-term efficacy in low-risk, young patients, necessitate further investigation. Alternatively, the ‘surgery first’ approach, whether in SAVR-SAVR-TAVR or SAVR-TAVR-TAVR scenarios, remains the gold standard for managing severe AS in low-risk patients below 75 years of age, particularly in the presence of small aortic root or low coronary ostia. This strategy minimises long-term mortality and morbidity risks associated with TAVR but requires attention to procedural characteristics, bioprosthesis choice and consideration of concomitant cardiac diseases ( Figure 1 ).

Figure 1: Special Considerations for Low-risk Transcatheter Aortic Valve Replacement

Heart Team Approach

The heart team approach serves as the fundamental cornerstone in numerous structural heart and coronary interventions, including TAVR. Typically comprised of a structuralist, structural imaging specialist, cardiovascular surgeon, cardiac anaesthesiologist, as well as nursing and ancillary staff involved in the TAVR procedure, this collaborative team plays a pivotal role. 25

In alignment with the most recent American College of Cardiology/American Heart Association (ACC/AHA) valvular heart disease guidelines, it is recommended that all patients with severe valvular heart disease being considered for intervention undergo evaluation by a multidisciplinary heart valve team (class I). Furthermore, the ACC/AHA guidelines advocate for consultation with or referral to a primary valve centre or a comprehensive valve centre for deliberation on treatment options, particularly in the context of asymptomatic patients with severe valve disease, patients who may benefit from valve repair instead of valve replacement, and those with multiple comorbidities (class IIa). 25

In the case of younger patients, the heart team’s role is crucial, particularly in the decision-making process for the initial intervention. The focus extends beyond achieving optimal durability to strategically planning for potential second and third reinterventions ( Figure 2 ). The heart team must carefully weigh various factors, including the patient’s age, anatomical considerations and long-term outcomes, to make informed decisions that align with the patient’s individualised needs and maximise the efficacy of subsequent reinterventions. For more complex TAVR procedures, the centre’s experience and procedural volume emerge as important factors influencing optimal outcomes in low-risk patients. 55

Figure 2: Lifetime Management Strategies in Transcatheter Aortic Valve Replacement

Minimalist TAVR

Over the past several years, we have witnessed the rapid advancement of TAVR, giving rise to the term ‘minimalist TAVR’, which is increasingly adopted within the structural heart disease community. 56,57 This term characterises TAVR procedures with less invasive peri-procedural approaches, facilitating expedited patient recovery. 57 Such procedures typically involve conscious sedation instead of general anaesthesia, resulting in a standard length of stay of approximately 48 hours. Patients are typically monitored on telemetry floors and often do not require intensive care unit beds.

The 3M TAVR study, conducted collaboratively with 13 North American centres spanning low-, medium-, and high-volume categories, has demonstrated the feasibility of implementing a consistent minimalist TAVR approach across diverse centres. This approach led to safe next-day discharge for 80.1% of participants and discharge within 48 hours for 89.5% of participants. 57 As TAVR increasingly becomes a viable option for lower-risk patients, we anticipate witnessing a greater adoption of the minimalist TAVR approach in clinical practice.

Future Directions

Several ongoing randomised clinical trials are actively investigating outcomes in low-risk patients with asymptomatic severe AS and those with moderate AS and left ventricular dysfunction. These trials, expected to conclude soon, are poised to contribute valuable insights to the prevailing body of research. Table 4 summarises these ongoing trials.

Table 4: Ongoing Randomised Clinical Trials on Outcomes of Transcatheter Aortic Valve Replacement in Low-risk Patients

TAVR has undergone rapid evolution in recent years, expanding its scope to include low-risk patients and other previously excluded patient groups. The future of TAVR is poised for further expansion, with a focus on the heart team approach, ongoing enhancements in valve design and durability, and the growing experience of operators. These trends point towards a potential future where minimalist TAVR becomes the standard of care.

  • Mack MJ, Leon MB, Thourani VH, et al. Transcatheter aortic-valve replacement with a balloon-expandable valve in low-risk patients. N Engl J Med 2019;380:1695–705.
  • Popma JJ, Deeb GM, Yakubov SJ, et al. Transcatheter aortic-valve replacement with a self-expanding valve in low-risk patients. N Engl J Med 2019;380:1706–15.
  • Carroll JD, Mack MJ, Vemulapalli S, et al. STS-ACC TVT registry of transcatheter aortic valve replacement. J Am Coll Cardiol 2020;76:2492–516.
  • Smith CR, Leon MB, Mack MJ, et al. Transcatheter versus surgical aortic-valve replacement in high-risk patients. N Engl J Med 2011;364:2187–98.
  • Leon MB, Smith CR, Mack MJ, et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med 2016;374:1609–20.
  • Forrest JK, Ramlawi B, Deeb GM, et al. Transcatheter aortic valve replacement in low-risk patients with bicuspid aortic valve stenosis. JAMA Cardiol 2021;6:50–7.
  • Rosato S, Santini F, Barbanti M, et al. Transcatheter aortic valve implantation compared with surgical aortic valve replacement in low-risk patients. Circ Cardiovasc Interv 2016;9:e003326.
  • Bekeredjian R, Szabo G, Balaban Ü, et al. Patients at low surgical risk as defined by the Society of Thoracic Surgeons Score undergoing isolated interventional or surgical aortic valve implantation: in-hospital data and 1-year results from the German Aortic Valve Registry (GARY). Eur Heart J 2019;40:1323–30.
  • Serruys PW, Mondolo R, Reardon M, et al. One-year outcomes of patients with severe aortic stenosis and an STS PROM of less than three percent in the SURTAVI trial. EuroIntervention 2018;14:877–83.
  • Finkelstein A, Rozenbaum Z, Halkin A, et al. Outcomes of transcatheter aortic valve implantation in patients with low versus intermediate to high surgical risk. Am J Cardiol 2019;123:644–9.
  • Waksman R, Corso PJ, Torguson R, et al. TAVR in low-risk patients: 1-year results from the LRT trial. JACC Cardiovasc Interv 2019;12:901–7.
  • Jørgensen TH, Thyregod HGH, Ihlemann N, et al. Eight-year outcomes for patients with aortic valve stenosis at low surgical risk randomized to transcatheter vs. surgical aortic valve replacement. Eur Heart J 2021;42:2912–9.
  • Thyregod HGH, Ihlemann N, Jørgensen TH, et al. Five-year clinical and echocardiographic outcomes from the NOTION randomized clinical trial in patients at lower surgical risk. Circulation 2019;139:2714–23.
  • Mack MJ, Leon MB, Thourani VH, et al. Transcatheter aortic-valve replacement in low-risk patients at five years. N Engl J Med 2023;389:1949–60.  Crossref | PubMed
  • Forrest JK, Deeb GM, Yakubov SJ, et al. 4-year outcomes of patients with aortic stenosis in the Evolut low risk trial. J Am Coll Cardiol 2023;82:2163–5.  Crossref | PubMed
  • Leon MB, Smith CR, Mack M, et al. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med 2010;363:1597–607.  Crossref | PubMed
  • Waksman R, Craig PE, Torguson R, et al. Transcatheter aortic valve replacement in low-risk patients with symptomatic severe bicuspid aortic valve stenosis. JACC Cardiovasc Interv 2020;13:1019–27.  Crossref | PubMed
  • Halim SA, Edwards FH, Dai D, et al. Outcomes of transcatheter aortic valve replacement in patients with bicuspid aortic valve disease: a report from the Society of Thoracic Surgeons/American College of Cardiology transcatheter valve therapy registry. Circulation 2020;141:1071–9.  Crossref | PubMed
  • Forrest JK, Kaple RK, Ramlawi B, et al. Transcatheter aortic valve replacement in bicuspid versus tricuspid aortic valves from the STS/ACC TVT registry. JACC Cardiovasc Interv 2020;13:1749–59.  Crossref | PubMed
  • Mylotte D, Lefevre T, Søndergaard L, et al. Transcatheter aortic valve replacement in bicuspid aortic valve disease. J Am Coll Cardiol 2014;64:2330–9.  Crossref | PubMed
  • Mangieri A, Tchetchè D, Kim WK, et al. Balloon versus self-expandable valve for the treatment of bicuspid aortic valve stenosis: insights from the BEAT International Collaborative Registry. Circ Cardiovasc Interv 2020;13:e008714.  Crossref | PubMed
  • Chandrasekhar J, Hibbert B, Ruel M, et al. Transfemoral vs non-transfemoral access for transcatheter aortic valve implantation: a systematic review and meta-analysis. Can J Cardiol 2015;31:1427–38.  Crossref | PubMed
  • Di Mario C, Goodwin M, Ristalli F, et al. A prospective registry of intravascular lithotripsy-enabled vascular access for transfemoral transcatheter aortic valve replacement. JACC Cardiovasc Interv 2019;12:502–4.  Crossref | PubMed
  • Banks A, Gaca J, Kiefer T. Review of alternative access in transcatheter aortic valve replacement. Cardiovasc Diagn Ther 2020;10:72–82.  Crossref | PubMed
  • Otto CM, Nishimura RA, Bonow RO, et al. 2020 ACC/AHA guideline for the management of patients with valvular heart disease: executive summary: a report of the American College of Cardiology/American Heart Association joint committee on clinical practice guidelines. Circulation 2021;143:e35–71.  Crossref | PubMed
  • van Rosendael PJ, Delgado V, Bax JJ. Pacemaker implantation rate after transcatheter aortic valve implantation with early and new-generation devices: a systematic review. Eur Heart J 2018;39:2003–13.  Crossref | PubMed
  • Jilaihawi H, Zhao Z, Du R, et al. Minimizing permanent pacemaker following repositionable self-expanding transcatheter aortic valve replacement. JACC Cardiovasc Interv 2019;12:1796–807.  Crossref | PubMed
  • Faroux L, Chen S, Muntané-Carol G, et al. Clinical impact of conduction disturbances in transcatheter aortic valve replacement recipients: a systematic review and meta-analysis. Eur Heart J 2020;41:2771–81.  Crossref | PubMed
  • Fadahunsi OO, Olowoyeye A, Ukaigwe A, et al. Incidence, predictors, and outcomes of permanent pacemaker implantation following transcatheter aortic valve replacement: analysis from the U.S. Society of Thoracic Surgeons/American College of Cardiology TVT registry. JACC Cardiovasc Interv 2016;9:2189–99.  Crossref | PubMed
  • Adams DH, Popma JJ, Reardon MJ, et al. Transcatheter aortic-valve replacement with a self-expanding prosthesis. N Engl J Med 2014;370:1790–8.  Crossref | PubMed
  • Athappan G, Patvardhan E, Tuzcu EM, et al. Incidence, predictors, and outcomes of aortic regurgitation after transcatheter aortic valve replacement: meta-analysis and systematic review of literature. J Am Coll Cardiol 2013;61:1585–95.  Crossref | PubMed
  • Reardon MJ, Van Mieghem NM, Popma JJ, et al. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med 2017;376:1321–31.  Crossref | PubMed
  • Finkelstein A, Rozenbaum Z, Zhitomirsky S, et al. Safety outcomes of new versus old generation transcatheter aortic valves. Catheter Cardiovasc Interv 2019;94:E44–53.  Crossref | PubMed
  • Seiffert M, Fujita B, Avanesov M, et al. Device landing zone calcification and its impact on residual regurgitation after transcatheter aortic valve implantation with different devices. Eur Heart J Cardiovasc Imaging 2016;17:576–84.  Crossref | PubMed
  • Reardon M. 5-year incidence, outcomes and predictors of structural valve deterioration of transcatheter and surgical aortic bioprostheses: insights from the CoreValve US Pivotal and SURTAVI trials. Presented at: ACC Annual Scientific Session, ACC22, Washington, DC, US, 4 April 2022.
  • Mack MJ, Leon MB, Smith CR, et al. 5-year outcomes of transcatheter aortic valve replacement or surgical aortic valve replacement for high surgical risk patients with aortic stenosis (PARTNER 1): a randomised controlled trial. Lancet 2015;385:2477–84.  Crossref | PubMed
  • Leon MB, Mack MJ, Hahn RT, et al. Outcomes 2 years after transcatheter aortic valve replacement in patients at low surgical risk. J Am Coll Cardiol 2021;77:1149–61.  Crossref | PubMed
  • Percy ED, Harloff MT, Hirji S, et al. Nationally representative repeat transcatheter aortic valve replacement outcomes: report from the Centers for Medicare and Medicaid Services. JACC Cardiovasc Interv 2021;14:1717–26.  Crossref | PubMed
  • Landes U, Webb JG, De Backer O, et al. Repeat transcatheter aortic valve replacement for transcatheter prosthesis dysfunction. J Am Coll Cardiol 2020;75:1882–93.  Crossref | PubMed
  • Bapat VN, Zaid S, Fukuhara S, et al. Surgical explantation after TAVR failure: mid-term outcomes from the EXPLANT-TAVR international registry. JACC Cardiovasc Interv 2021;14:1978–91.  Crossref | PubMed
  • Fukuhara S, Nguyen CTN, Yang B, et al. Surgical explantation of transcatheter aortic bioprostheses: balloon vs self-expandable devices. Ann Thorac Surg 2022;113:138–45.  Crossref | PubMed
  • Dvir D, Webb JG, Bleiziffer S, et al. Transcatheter aortic valve implantation in failed bioprosthetic surgical valves. JAMA 2014;312:162–70.  Crossref | PubMed
  • Forrestal BJ, Case BC, Yerasi C, et al. Risk of coronary obstruction and feasibility of coronary access after repeat transcatheter aortic valve replacement with the self-expanding Evolut valve: a computed tomography simulation study. Circ Cardiovasc Interv 2020;13:e009496.  Crossref | PubMed
  • Tang GHL, Zaid S, Gupta E, et al. Feasibility of repeat TAVR after SAPIEN 3 TAVR: A novel classification scheme and pilot angiographic study. JACC Cardiovasc Interv 2019;12:1290–2.  Crossref | PubMed
  • Russo G, Tang GHL, Sangiorgi G, et al. Lifetime management of aortic stenosis: transcatheter versus surgical treatment for young and low-risk patients. Circ Cardiovasc Interv 2022;15:915–27.  Crossref | PubMed
  • Tang GHL, Zaid S, Kleiman NS, et al. Explant vs redo-TAVR after transcatheter valve failure: mid-term outcomes from the EXPLANTORREDO-TAVR international registry. JACC Cardiovasc Interv 2023;16:927–41.  Crossref | PubMed
  • Ribeiro HB, Nombela-Franco L, Urena M, et al. Coronary obstruction following transcatheter aortic valve implantation: a systematic review. JACC Cardiovasc Interv 2013;6:452–61.  Crossref | PubMed
  • Palmerini T, Chakravarty T, Saia F, et al. Coronary protection to prevent coronary obstruction during TAVR: a multicenter international registry. JACC Cardiovasc Interv 2020;13:739–47.  Crossref | PubMed
  • Ochiai T, Oakley L, Sekhon N, et al. Risk of coronary obstruction due to sinus sequestration in redo transcatheter aortic valve replacement. JACC Cardiovasc Interv 2020;13:2617–27.  Crossref | PubMed
  • Khan JM, Greenbaum AB, Babaliaros VC, et al. BASILICA trial: one-year outcomes of transcatheter electrosurgical leaflet laceration to prevent TAVR coronary obstruction. Circ Cardiovasc Interv 2021;14:e010238.  Crossref | PubMed
  • Sá MPBO, Van den Eynde J, Simonato M, et al. Valve-in-valve transcatheter aortic valve replacement versus redo surgical aortic valve replacement: an updated meta-analysis. JACC Cardiovasc Interv 2021;14:211–20.  Crossref | PubMed
  • Woitek FJ, Stachel G, Kiefer P, et al. Treatment of failed aortic bioprostheses: an evaluation of conventional redo surgery and transfemoral transcatheter aortic valve-in-valve implantation. Int J Cardiol 2020;300:80–6.  Crossref | PubMed
  • Hahn RT, Webb J, Pibarot P, et al. 5-year follow-up from the PARTNER 2 aortic valve-in-valve registry for degenerated aortic surgical bioprostheses. JACC Cardiovasc Interv 2022;15:698–708.  Crossref | PubMed
  • Bleiziffer S, Simonato M, Webb JG, et al. Long-term outcomes after transcatheter aortic valve implantation in failed bioprosthetic valves. Eur Heart J 2020;41:2731–42.  Crossref | PubMed
  • Vemulapalli S, Carroll JD, Mack MJ, et al. Procedural volume and outcomes for transcatheter aortic-valve replacement. N Engl J Med 2019;380:2541–50.  Crossref | PubMed
  • Lauck SB, Sathananthan J, Park J, et al. Post-procedure protocol to facilitate next-day discharge: results of the multidisciplinary, multimodality but minimalist TAVR study. Catheter Cardiovasc Interv 2020;96:450–8.  Crossref | PubMed
  • Wood DA, Lauck SB, Cairns JA, et al. The Vancouver 3M (multidisciplinary, multimodality, but minimalist) clinical pathway facilitates safe next-day discharge home at low-, medium-, and high-volume transfemoral transcatheter aortic valve replacement centers: the 3M TAVR study. JACC Cardiovasc Interv 2019;12:459–69.  Crossref | PubMed

The SLR uses a comprehensive search strategy spanning digital libraries, databases, and AI-powered tools. The search, conducted until May 25th, 2024, focused on studies related to language modeling, particularly LLM optimization and acceleration. The ResearchRabbit and Rayyan AI tools supported data collection and study selection. The selection process applied strict inclusion criteria, focusing on large-scale language modeling techniques, including transformer-based models. Screening proceeded in two stages: (a) an initial screening based on eligibility and (b) a check against the inclusion criteria. The Rayyan platform’s “compute rating” function assisted in the final selection, with the authors double-checking excluded studies to ensure accuracy.
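The two-stage screening described above can be sketched roughly as follows. This is only an illustration: the `Study` fields, the eligibility rule, and the topic keywords are hypothetical placeholders, not the criteria the review authors actually used.

```python
from dataclasses import dataclass


@dataclass
class Study:
    title: str
    year: int
    peer_reviewed: bool
    topics: set


# Hypothetical inclusion keywords; the review's real criteria are not given here.
INCLUSION_TOPICS = {"llm optimization", "llm acceleration"}


def is_eligible(study: Study) -> bool:
    """Stage (a): basic eligibility (assumed rule: peer reviewed, 2017 or later)."""
    return study.peer_reviewed and study.year >= 2017


def meets_inclusion(study: Study) -> bool:
    """Stage (b): the study must cover at least one inclusion topic."""
    return bool(study.topics & INCLUSION_TOPICS)


def screen(studies):
    """Run both stages; return (included, excluded) so exclusions can be re-checked."""
    eligible = [s for s in studies if is_eligible(s)]
    included = [s for s in eligible if meets_inclusion(s)]
    excluded = [s for s in studies if s not in included]
    return included, excluded
```

Returning the excluded list alongside the included one mirrors the double-checking of excluded studies mentioned in the text.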

LLM training frameworks and libraries face major challenges due to the complexity and size of the models. Distributed training frameworks like Megatron-LM and CoLLiE tackle these issues by splitting models across multiple GPUs for parallel processing. System-level optimizations in frameworks like LightSeq2 and ByteTransformer improve efficiency and speed by raising GPU utilization and reducing memory usage. Memory management is another important factor; CoLLiE addresses it with 3D parallelism, distributing memory efficiently across training machines and GPUs.

These five key frameworks and libraries help overcome LLM training limitations: 

  • GPipe successfully trains large multilingual transformer models, outperforming individual smaller models. 
  • ByteTransformer demonstrates superior performance for BERT-like transformers across various benchmarks.
  • Megatron-LM enables the training of billion-parameter LLMs, achieving state-of-the-art results on NLP tasks with high throughput. 
  • LightSeq2 significantly accelerates transformer model training, enhancing performance by up to 308%. 
  • CoLLiE introduces collaborative LLM training, improving efficiency and effectiveness for large models like LLaMA-65B, without compromising overall performance.
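To make the idea of splitting a model across devices more concrete, here is a minimal, framework-free sketch of pipeline-style partitioning with micro-batches, the general idea behind tools such as GPipe. The function names are hypothetical and the even split by layer count is an assumption made for illustration; real frameworks balance stages by measured cost and also schedule the backward pass.

```python
def partition_layers(num_layers: int, num_devices: int):
    """Assign contiguous blocks of layer indices to devices, as evenly as possible."""
    base, extra = divmod(num_layers, num_devices)
    stages, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages


def forward_schedule(num_stages: int, num_microbatches: int):
    """For each clock tick, list the (stage, micro-batch) pairs that run in parallel.

    At tick t, stage s works on micro-batch t - s, so later stages start
    once earlier stages have handed over their first outputs.
    """
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        ticks.append(active)
    return ticks


if __name__ == "__main__":
    print(partition_layers(10, 4))
    for tick, work in enumerate(forward_schedule(2, 3)):
        print(tick, work)
```

With more micro-batches per batch, the ticks in which some stage sits idle make up a smaller share of the schedule, which is why micro-batching improves device utilization.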


For LLM inference frameworks and libraries, the major challenges are computational expense, resource constraints, and the need to balance speed, accuracy, and resource utilization. The review identifies hardware specialization, resource optimization, algorithmic improvements, and distributed inference as the main ways to address these challenges. Frameworks like Splitwise separate compute-intensive and memory-intensive phases onto specialized hardware, and FlexGen optimizes resource usage across CPU, GPU, and disk. Libraries like EET and LightSeq accelerate GPU inference through custom algorithms and memory management. These advancements deliver significant performance gains, with frameworks like DeepSpeed Inference and FlexGen achieving throughput increases and latency reductions.

Large language models (LLMs) face significant challenges during training optimization. These include (a) resource constraints that limit training and deployment on single devices due to high memory and computational needs, (b) the trade-off between efficient resource utilization and maintaining model performance, (c) memory bottlenecks when distributing LLMs across devices, (d) communication overhead during data exchange that can slow training, (e) hardware heterogeneity that complicates efficient utilization of diverse devices, and (f) scalability limits imposed by memory and communication constraints.

To overcome these challenges, diverse optimization techniques for LLMs have been developed: 

  • Algorithmic: Techniques like FlexGen enhance efficiency through optimized computations and specialized hardware kernels. 
  • Model partitioning: Techniques like GPipe allow processing across multiple devices, even with limited memory. 
  • Fine-tuning for efficiency: Techniques like AlphaTuning and LoRA enable fine-tuning large models on limited memory by reducing the number of trainable parameters.
  • Scheduler optimization: Techniques like TurboTransformers improve response throughput and task execution on GPUs.
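As a rough illustration of how a low-rank method like LoRA shrinks the trainable parameter count: a weight matrix of shape d × k has d·k entries, whereas two low-rank factors of shapes d × r and r × k have only r·(d + k). The dimensions below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable entries when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable entries in the two low-rank factors (d x r and r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8  # hypothetical layer size and rank
    full = full_finetune_params(d, k)
    low_rank = lora_params(d, k, r)
    print(full, low_rank, full // low_rank)
```

For these assumed sizes the low-rank factors hold 256 times fewer trainable values than the full matrix, which is where the memory savings during fine-tuning come from.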

Other optimizations include size reduction, parallelism strategies, memory optimization, heterogeneous optimization, and automatic parallelism.


While the SLR on large language model optimization techniques is thorough, it has some limitations. The search strategy may have missed relevant studies that used different terminology, and the limited database coverage may have caused significant research to be overlooked. These factors might affect the review’s completeness, especially regarding historical context and the latest advancements.

In this paper, researchers present a systematic literature review (SLR) that follows the PRISMA approach, analyzes 65 publications from 2017 to December 2023, and examines optimization and acceleration techniques for LLMs. It identifies challenges in training, inference, and system serving for LLMs with billions or trillions of parameters. The proposed taxonomy gives researchers a clear guide to the various optimization strategies. The review of libraries and frameworks supports efficient LLM training and deployment, and two case studies demonstrate practical approaches to optimizing model training and improving inference efficiency. Although recent advancements are promising, the study emphasizes that further research is needed to fully realize the potential of LLM optimization techniques.

All credit for this research goes to the researchers of this project.

Sajjad Ansari

Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a Tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.



COMMENTS

  1. How-to conduct a systematic literature review: A quick guide for

    Method details Overview. A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure [12]. An SLR updates the reader with current literature about a subject [6]. The goal is to review critical points of current knowledge on a ...

  2. Systematic reviews: Structure, form and content

    A systematic review collects secondary data, and is a synthesis of all available, relevant evidence which brings together all existing primary studies for review (Cochrane 2016). A systematic review differs from other types of literature review in several major ways.

  3. Systematic Review

    Systematic review vs. literature review. A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

  4. How to write a systematic literature review [9 steps]

    Screen the literature. Assess the quality of the studies. Extract the data. Analyze the results. Interpret and present the results. 1. Decide on your team. When carrying out a systematic literature review, you should employ multiple reviewers in order to minimize bias and strengthen analysis.

  5. Guidelines for writing a systematic review

    A Systematic Review (SR) is a synthesis of evidence that is identified and critically appraised to understand a specific topic. SRs are more comprehensive than a Literature Review, which most academics will be familiar with, as they follow a methodical process to identify and analyse existing literature (Cochrane, 2022).

  7. Guidance on Conducting a Systematic Literature Review

    Literature reviews establish the foundation of academic inquires. However, in the planning field, we lack rigorous systematic reviews. In this article, through a systematic search on the methodology of literature review, we categorize a typology of literature reviews, discuss steps in conducting a systematic literature review, and provide suggestions on how to enhance rigor in literature ...

  8. How to Do a Systematic Review: A Best Practice Guide for Conducting and

    The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information.

  9. How to Write a Systematic Review of the Literature

    This article provides a step-by-step approach to conducting and reporting systematic literature reviews (SLRs) in the domain of healthcare design and discusses some of the key quality issues associated with SLRs. SLR, as the name implies, is a systematic way of collecting, critically evaluating, int …

  10. PDF Systematic Literature Reviews: an Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular research question in a way that is transparent and reproducible, while seeking to include all published evidence on the topic and appraising the quality of this evidence. SRs have become a major methodology

  11. How to Write a Systematic Review of the Literature

    This article provides a step-by-step approach to conducting and reporting systematic literature reviews (SLRs) in the domain of healthcare design and discusses some of the key quality issues associated with SLRs. SLR, as the name implies, is a systematic way of collecting, critically evaluating, integrating, and presenting findings from across ...

  12. Research Guides: Systematic Reviews: Types of Literature Reviews

    Rapid review. Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research. Completeness of searching determined by time constraints. Time-limited formal quality assessment. Typically narrative and tabular.

  13. Systematically Reviewing the Literature: Building the Evidence for

    Systematic reviews that summarize the available information on a topic are an important part of evidence-based health care. There are both research and non-research reasons for undertaking a literature review. It is important to systematically review the literature when one would like to justify the need for a study, to update personal ...

  14. Carrying out systematic literature reviews: an introduction

    Systematic reviews provide a synthesis of evidence for a specific topic of interest, summarising the results of multiple studies to aid in clinical decisions and resource allocation. They remain among the best forms of evidence, and reduce the bias inherent in other methods. A solid understanding of the systematic review process can be of ...

  15. Systematic Reviews and Meta Analysis

    A systematic review is guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and pre-selected eligibility criteria reduce the risk of bias in identifying, selecting and analyzing relevant studies.

  16. Systematic Literature Reviews: An Introduction

    Systematic literature reviews (SRs) are a way of synthesising scientific evidence to answer a particular research question in a way that is transparent and reproducible, while seeking to include ...

  17. A guide to systematic literature reviews

    Systematic literature reviews (SLRs) are an effective way of mapping a research field and synthesizing research evidence. However, especially in communication research, SLRs often include diverse ...

  18. How to Do a Systematic Review: A Best Practice Guide ...

    Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to ...

  19. Systematic Literature Review or Literature Review

    The difference between literature review and systematic review comes back to the initial research question. Whereas the systematic review is very specific and focused, the standard literature review is much more general. The components of a literature review, for example, are similar to any other research paper.

  20. Description of the Systematic Literature Review Method

    A systematic literature review (SLR) is an independent academic method that aims to identify and evaluate all relevant literature on a topic in order to derive conclusions about the question under consideration. "Systematic reviews are undertaken to clarify the state of existing research and the implications that should be drawn from this."

  21. An overview of methodological approaches in systematic reviews

    1. INTRODUCTION. Evidence synthesis is a prerequisite for knowledge translation. 1 A well conducted systematic review (SR), often in conjunction with meta‐analyses (MA) when appropriate, is considered the "gold standard" of methods for synthesizing evidence related to a topic of interest. 2 The central strength of an SR is the transparency of the methods used to systematically search ...

  22. Exploring Systematic Literature Reviews: A Comprehensive Guide for

    A systematic literature review (SLR) goes beyond traditional reviews by following a structured, detailed methodology to gather, assess, and synthesize studies on a specific research question. This guide from WritersER explains the key steps involved in conducting an SLR, including defining your research question, developing a search strategy, screening studies, and reporting findings.

  23. How-to conduct a systematic literature review: A quick guide for

    Overview. A Systematic Literature Review (SLR) is a research methodology to collect, identify, and critically analyze the available research studies (e.g., articles, conference proceedings, books, dissertations) through a systematic procedure. An SLR updates the reader with current literature about a subject. The goal is to review critical points of current knowledge on a topic about research ...

  24. Chapter 6: Systematic Review Overview

    A systematic review should be used to retrieve the best available evidence related to the Population, Intervention, Comparison, and Outcomes (PICO) question. All guidelines should be preceded by a systematic review to ensure that recommendations and judgements are supported by an extensive body of evidence that addresses the research question.

  25. Continuance intention of online technologies: A systematic literature

    Given the dynamic nature of digital technologies, understanding why users intend to continue to use them or not is important for practitioners and academics alike. This paper presents an up-to-date Systematic Literature Review (SLR) of Continuance Intention (CI) for online technologies. The SLR classifies and analyses 147 relevant articles on CI in the field of online technology.

  26. Guidance to best tools and practices for systematic reviews

    Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ...

  27. Systematic Review: Grey literature & supplementary searching

    See the Moodle book MNHS: Systematically searching the grey literature for a comprehensive module on grey literature for systematic reviews. Trial registers. Systematic reviews routinely search trials registers as a means of identifying additional unpublished and ongoing clinical trials and reducing the risk of reporting biases.

  28. Guidelines for the Use of Literature Reviews in Master's Theses in

    A systematized literature review is a type of review that allows students to practice central elements of a systematic review (transparency, reproducibility), while omitting some of its more resource-intensive prerequisites, like conducting a comprehensive search or performing a quality assessment (Grant & Booth, 2009). However, the lack of ...

  29. TAVR is Ready for Most Low-risk Patients

    Serruys et al. analysed 254 patients (131 TAVR, 123 SAVR) and found that the composite endpoint of all-cause mortality or disabling stroke was lower in the TAVR group compared to SAVR among patients with a Society of Thoracic Surgeons (STS) score of < 3% (1.5% versus 6.5%, p=0.04). 9 Conversely, a study involving 3,402 low-risk patients demonstrated that following propensity matching, SAVR ...

  30. A Systematic Literature Review: Optimization and Acceleration

    These factors might impact the review's completeness, especially in the historical context and the latest advancements. In this paper, researchers introduced a systematic literature review (SLR) that analyzes 65 publications from 2017 to December 2023, following the PRISMA approach, and examined optimization and acceleration techniques for LLMs.