A 6-step guide to Data Quality Assessments (DQAs)

Monitoring experts responsible for data management and reporting are increasingly questioned about data validity, reliability, integrity, and timeliness. In short, people want to know whether they can trust the data. To answer these questions, a Data Quality Assessment (DQA) is conducted. A DQA can be carried out internally (by the project M&E team), or a donor agency can engage an external DQA expert to conduct the assessment.

A DQA is a systematic process for assessing the strengths and weaknesses of a dataset and informing users of the ‘health’ of the data. The assessment focuses mainly on the following aspects of data:

  • Validity: Does the data clearly and adequately represent the intended result? Would an expert third party agree that the indicator is a valid measure of the stated result?
  • Reliability: Are the indicator definition and the data collection and analysis processes clear, and are they applied consistently over time?
  • Integrity: Do the data collection, analysis and reporting processes have clear mechanisms in place to reduce manipulation?
  • Timeliness: Is the data sufficient, timely and current (recent) to influence management decision-making?

A DQA is a multi-stage process, with each stage having its own activities and deliverables. The following sections describe the six steps involved in carrying out a DQA.

If you like this article, don't forget to register for the ActivityInfo newsletter to receive new guides, articles and webinars on various M&E topics!

This guide is also available in French and Spanish

Step 1: Selection of indicators

Since a DQA is a time-consuming and resource-intensive process, experts advise selecting only a small number of indicators. Ideally, no more than three indicators should be selected per mission/DQA assignment, using the following criteria:

  • indicators of high importance, such as ‘the number of jobs created’
  • indicators reporting high progress over time (or those with high targets)
  • indicators which have not previously been through a DQA
  • indicators with suspected data quality issues (or unusual progress)
  • indicators which were previously assessed and whose data quality was rated as ‘poor’

Step 2: Review of available documents/datasets and preparation for the field-phase

In the second step, the DQA expert reviews previous DQA reports (if any) to understand the data collection and data management system, as well as earlier findings and recommendations. Moreover, any available reports, such as narrative progress reports, are also reviewed. In the case of an external DQA, the expert must also review datasets supplied by the project/organisation. The expert may also request (or obtain) the project M&E plan or guidelines to understand the M&E system. This information may be used to develop a DQA matrix, which includes the key questions, sub-questions, data sources for each DQA question, and the tools and methods to be used to answer those questions.
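For teams that keep their M&E documentation in digital form, a DQA matrix can be represented as a plain data structure. Below is a minimal sketch in Python; the field names and example content are illustrative assumptions, not a prescribed format:

    # One DQA matrix row: key question, sub-questions, data sources, and
    # the tools/methods used to answer them. Content is illustrative.
    dqa_matrix = [
        {
            "key_question": "Is the reported 'number of jobs created' valid?",
            "sub_questions": [
                "Does the indicator definition match the stated result?",
                "Would an expert third party accept the measure?",
            ],
            "data_sources": ["indicator reference sheet", "progress reports"],
            "tools_methods": ["document review", "key-informant interview"],
        },
        # ... one entry per DQA question
    ]

    for row in dqa_matrix:
        print(row["key_question"], "->", ", ".join(row["tools_methods"]))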

Step 3: Review/assessment of data collection and management system

Once the preparatory phase is over, the DQA expert arranges meetings with the relevant project staff (including the M&E team) to understand the data collection and management system. The focus should be on:

  • checking the M&E Plan (if available)
  • reviewing indicator meta-data (or indicator reference sheets)
  • assessing the adequacy of methods and tools
  • understanding the data flow process, roles and responsibilities (background/experience) of the team responsible for data collection and data management
  • understanding the tools and mechanisms in place to ensure the integrity of data

The expert may request supporting documents to triangulate the details given by the team in response to the above items.

Step 4: Review of data collection and management system implementation/operationalization

During this stage, the expert should focus on the following questions:

  • Has the data been collected and managed in conformity with the data collection system design?
  • Is the data collected and analysed on a sufficiently timely basis to influence management decision-making?
  • Are adequate data-checking procedures being conducted (excluding field-level verification and validation)?
  • Has the data been analysed and reported in conformity with the data collection system design?

The above questions are answered by reviewing the actual data and analysis. Supporting documents are consulted, and the system/database is checked to confirm that data collection and management operate in conformity with the system design.
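As an illustration of what automated data-checking might look like in practice, here is a minimal sketch, assuming each record carries a value and a reporting date; the specific rules shown (non-negative values, reporting within the period) are illustrative assumptions, not rules from this guide:

    from datetime import date

    # Flag records that break simple design rules; data and rules are illustrative.
    records = [
        {"indicator": "jobs_created", "value": 120, "reported": date(2023, 1, 10)},
        {"indicator": "jobs_created", "value": -5, "reported": date(2023, 4, 2)},
    ]
    period_end = date(2023, 3, 31)

    for rec in records:
        issues = []
        if rec["value"] < 0:
            issues.append("negative value")
        if rec["reported"] > period_end:
            issues.append("reported after the period end")
        if issues:
            print(rec["indicator"], "->", "; ".join(issues))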

Step 5: Verification and validation of data

At this stage, the expert carries out a verification exercise to validate the reported data. This is done by selecting a sample of the reported data and verifying it against supporting documents, as well as through physical (on-site) verification.
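For projects whose data sit in a database or export file, the sample itself can be drawn programmatically. A minimal sketch, assuming the reported records are available as a list and that a simple random sample of 30 records is an acceptable design (both assumptions):

    import random

    # Simulated reported records; in practice these come from the project database.
    reported_records = [{"id": i, "reported_value": 10 + i % 3} for i in range(500)]

    random.seed(42)  # reproducible sample for the field visit
    sample = random.sample(reported_records, k=30)

    # 'source_value' would be filled in from supporting documents during
    # field verification; here it is a placeholder.
    for rec in sample:
        rec["source_value"] = rec["reported_value"]

    matches = sum(r["reported_value"] == r["source_value"] for r in sample)
    print(f"{matches}/{len(sample)} sampled records match their source documents")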

Step 6: Compilation of the DQA report

Once the review and field phases are over, the DQA expert produces a report. Ideally, a DQA report should include the following:

  • Executive summary
  • Background / Introduction of the project
  • Indicators selected for the DQA:
    a. Process and methodology followed for the DQA
    b. Key findings (separately per indicator)
    c. Data flow (steps)
    d. Data management system design
    e. Implementation/operationalisation of the data management system design
  • Data verification/validation
  • Scores and overall rating per indicator (see the sketch after this list)
  • Recommendations (separately for each indicator)
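This guide does not prescribe a scoring scheme, but per-criterion scores are often rolled up into an overall rating per indicator. A minimal sketch, assuming a 1-4 score per quality criterion, equal weighting, and simple rating bands (all assumptions):

    # Illustrative per-criterion scores for one indicator (1 = poor, 4 = strong).
    scores = {"validity": 3, "reliability": 2, "integrity": 4, "timeliness": 3}

    overall = sum(scores.values()) / len(scores)

    if overall >= 3.5:
        rating = "good"
    elif overall >= 2.5:
        rating = "fair"
    else:
        rating = "poor"

    print(f"Overall score: {overall:.2f} -> rating: {rating}")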

In summary, the DQA process involves selecting indicators, reviewing available documents and datasets, assessing the data collection and management system, reviewing its implementation/operationalisation, verifying and validating the data, and compiling a DQA report. The report serves as a tool to identify weaknesses and strengths in the data collection and management system and to make recommendations for improvement.

The ActivityInfo team would like to thank the Education partner Maheed Ullah Fazli Wahid for this article. Maheed is a high-profile M&E expert with demonstrated experience in designing and managing M&E systems for multi-billion-dollar programmes focusing on humanitarian and development interventions. Over the past few years, Maheed has participated in more than 30 DQAs. Currently, he is the Senior M&E System Manager for the EU Facility for Refugees in Turkey (FRiT), a programme of over 100 projects covering sectors such as Education, Health, Livelihoods, Cash Distribution, Protection, Municipal Infrastructure, and Migration Management.



A Review of Data Quality Assessment Methods for Public Health Information Systems


High quality data and effective data quality assessment are required for accurately evaluating the impact of public health interventions and measuring public health outcomes. Data, data use, and data collection process, as the three dimensions of data quality, all need to be assessed for overall data quality assessment. We reviewed current data quality assessment methods. Relevant studies were identified in major databases and on well-known institutional websites. We found that the dimension of data was most frequently assessed. Completeness, accuracy, and timeliness were the three most-used attributes among a total of 49 attributes of data quality. The major quantitative assessment methods were descriptive surveys and data audits, whereas the common qualitative assessment methods were interviews and documentation review. The limitations of the reviewed studies included inattentiveness to data use and data collection process, inconsistency in the definition of attributes of data quality, failure to address data users’ concerns, and a lack of systematic procedures in data quality assessment. This review is limited by the coverage of the databases and the breadth of public health information systems. Further research could develop consistent data quality definitions and attributes. More research effort should be given to assessing the quality of data use and of the data collection process.

1. Introduction

Public health is “the science and art of preventing disease, prolonging life, and promoting physical health and efficiency through organized community efforts” [ 1 ]. The ultimate goal of public health is to improve health at the population level, and this is achieved through the collective mechanisms and actions of public health authorities within the government context [ 1 , 2 ]. Three functions of public health agencies have been defined: assessment of health status and health needs, policy development to serve the public interest, and assurance that necessary services are provided [ 2 , 3 ]. Since data, information and knowledge underpin these three functions, public health is inherently a data-intensive domain [ 3 , 4 ]. High quality data are the prerequisite for better information, better decision-making and better population health [ 5 ].

Public health data represent and reflect the health and wellbeing of the population, the determinants of health, public health interventions and system resources [ 6 ]. The data on health and wellbeing comprise measures of mortality, ill health, and disability. The levels and distribution of the determinants of health are measured in terms of biomedical, behavioral, socioeconomic and environmental risk factors. Data on public health interventions include prevention and health promotion activities, while those on system resources encompass material, funding, workforce, and other information [ 6 ].

Public health data are used to monitor trends in the health and wellbeing of the community and of health determinants. Also, they are used to assess the risks of adverse health effects associated with certain determinants, and the positive effects associated with protective factors. The data inform the development of public health policy and the establishment of priorities for investment in interventions aimed at modifying health determinants. They are also used to monitor and evaluate the implementation, cost and outcomes of public health interventions, and to implement surveillance of emerging health issues [ 6 ].

Thus, public health data can help public health agencies to make appropriate decisions, take effective and efficient action, and evaluate the outcomes [ 7 , 8 ]. For example, health indicators set up the goals for the relevant government-funded public health agencies [ 5 ]. Well-known health indicators are the Millennium Development Goals (MDGs) 2015 for the United Nations member states [ 9 ]; the European Core Health Indicators for member countries of the European Union [ 10 ]; “Healthy People” in the United States, which set up 10-year national objectives for improving the health of US citizens [ 11 ]; “Australia: The Healthiest Country by 2020” that battles lifestyle risk factors for chronic disease [ 12 ]; and “Healthy China 2020”, an important health strategy to improve the public’s health in China [ 13 ].

Public health data are generated from public health practice, with data sources being population-based and institution-based [ 5 , 6 ]. Population-based data are collected through censuses, civil registrations, and population surveys. Institution-based data are obtained from individual health records and administrative records of health institutions [ 5 ]. The data stored in public health information systems (PHIS) must first undergo collection, storage, processing, and compilation. The procured data can then be retrieved, analyzed, and disseminated. Finally, the data will be used for decision-making to guide public health practice [ 5 ]. Therefore, the data flows in a public health practice lifecycle consist of three phases: data, data collection process and use of data.

PHIS, whether paper-based or electronic, are the repositories of public health data. The systematic application of information and communication technologies (ICTs) to public health has seen the proliferation of computerized PHIS around the world [ 14 , 15 , 16 ]. These distributed systems collect coordinated, timely, and useful multi-source data, such as those collected by nation-wide PHIS from health and other sectors [ 17 ]. These systems are usually population-based, and recognized by government-owned public health agencies [ 18 ].

The computerized PHIS are developed with broad objectives, such as to provide alerts and early warning, support public health management, stimulate research, and to assist health status and trend analyses [ 19 ]. Significant advantages of PHIS are their capability of electronic data collection, as well as the transmission and interchange of data, to promote public health agencies’ timely access to information [ 15 , 20 ]. The automated mechanisms of numeric checks and alerts can improve validity and reliability of the data collected. These functions contribute to data management, thereby leading to the improvement in data quality [ 21 , 22 ].

Negative effects of poor data quality, however, have often been reported. For example, Australian researchers reported coding errors due to poor quality documentations in the clinical information systems. These errors had consequently led to inaccurate hospital performance measurement, inappropriate allocation of health funding, and failure in public health surveillance [ 23 ].

The establishment of information systems driven by the needs of single-disease programs may cause excessive data demand and fragmented PHIS systems, which undermine data quality [ 5 , 24 ]. Studies in China, the United Kingdom and Pakistan reported data users’ lack of trust in the quality of AIDS, cancer, and health management information systems due to unreliable or uncertain data [ 25 , 26 , 27 ].

Sound and reliable data quality assessment is thus vital to obtaining the high data quality that enhances users’ confidence in public health authorities and their performance [ 19 , 24 ]. As countries monitor and evaluate the performance and progress of established public health indicators, the need for data quality assessment in PHIS that store the performance- and progress-related data has never been greater [ 24 , 28 , 29 ]. Data quality assessment, which has been recommended for ensuring the quality of data in PHIS, has now gained widespread acceptance in routine public health practice [ 19 , 24 ].

Data quality in public health has different definitions from different perspectives. These include: “fit for use in the context of data users” [ 30 ], (p. 2); “timely and reliable data essential for public health core functions at all levels of government” [ 31 ], (p. 114) and “accurate, reliable, valid, and trusted data in integrated public health informatics networks” [ 32 ]. Whether the specific data quality requirements are met is usually measured along a certain number of data quality dimensions. A dimension of data quality represents or reflects an aspect or construct of data quality [ 33 ].

Data quality is recognized as a multi-dimensional concept across public health and other sectors [ 30 , 33 , 34 , 35 ]. Following the “information chain” perspective, Karr et al. used “three hyper-dimensions” ( i.e. , process, data and user) to group a set of conceptual dimensions of data quality [ 35 ]. Accordingly, the methods for assessment of data quality must be useful to assess these three dimensions [ 35 ]. We adopted the approach of Karr et al. because their typology provided a comprehensive perspective for classifying data quality assessment. However, we replace “process” by “data collection process” and “user” by “data use”. “Process” is a broad term and may be considered as the whole process of data flows, including data and use of data. “User” is a specific term related to data users or consumers and may ignore the use of data. To accurately reflect the data flows in the context of public health, we define the three dimensions of data quality as data, data use and data collection process. The dimension of data focuses on data values or data schemas at record/table level or database level [ 35 ]. The dimension of data use, related to use and user, is the degree and manner in which data are used [ 35 ]. The dimension of data collection process refers to the generation, assembly, description and maintenance of data [ 35 ] before data are stored in PHIS.

Data quality assessment methods are generally based on measurement theory [ 35 , 36 , 37 , 38 ]. Each dimension of data quality consists of a set of attributes. Each attribute characterizes a specific data quality requirement, thereby offering the standard for data quality assessment [ 35 ]. Each attribute can be measured by different methods; therefore, there is flexibility in the methods used to measure data quality [ 36 , 37 , 38 ]. As the three dimensions of data quality are embedded in the lifecycle of public health practice, we propose a conceptual framework for data quality assessment in PHIS ( Figure 1 ).

[Figure 1. Conceptual framework of data quality assessment in public health practice.]

Although data quality has always been an important topic in public health, we have identified a lack of systematic review of data quality assessment methods for PHIS. This is the motivation for this study because knowledge about current developments in methods for data quality assessment is essential for research and practice in public health informatics. This study aims to investigate and compare the methods for data quality assessment of PHIS so as to identify possible patterns and trends emerging over the first decade of the 21st century. We take a qualitative systematic review approach using our proposed conceptual framework.

2.1. Literature Search

We identified publications by searching several electronic bibliographic databases. These included Scopus, IEEE Xplore, Web of Science, ScienceDirect, PubMed, Cochrane Library and ProQuest. Because many public health institutes also published guidelines, frameworks, or instruments to guide the institutional approach to assessing data quality, some well-known institutions’ websites were also reviewed to search for relevant literature. The following words and MeSH headings were used individually or in combination: “data quality”, “information quality”, “public health”, “population health”, “information system *”, “assess *”, “evaluat *”. (“*” was used to find variations of some word stems.) The articles were confined to those published in English or Chinese.

The first author performed the literature search between June 2012 and October 2013. The inclusion criteria were peer-refereed empirical studies or institutional reports of data quality assessment in public health or PHIS during the period 2001–2013. The exclusion criteria were narrative reviews, expert opinion, correspondence and commentaries in the topic area. To improve coverage, a manual search of the literature was conducted to identify papers referenced by other publications, papers and well-known authors, and papers from personal databases.

2.2. Selection of Publications

Citations identified in the literature search were screened by title and abstract for decisions about inclusion or exclusion in this review. If there was uncertainty about the relevance of a citation, the full text was retrieved and checked. A total of 202 publications were identified and manually screened. If there was uncertainty about whether to include a publication, its relevance was checked by the fourth author. Finally, 39 publications that met the inclusion criteria were selected. The screening process is summarized in Figure 2 .

[Figure 2. Publication search process.]

2.3. Data Abstraction

The selected publications were stored in an EndNote library. Data extracted from the publications included author, year of publication, aim of data quality assessment, country and context of the study, function and scope of the PHIS, definition of data quality, methods for data quality assessment, study design, data collection methods, data collected, research procedure, methods for data analysis, key findings, conclusions and limitations.

The 39 publications were placed in two groups according to whether they were published by a public health institution at national or international level or by individual researchers. If the article was published by the former, it is referred to as an institutional publication, if by the latter, as a research paper.

Of the 39 publications reviewed, 32 were peer-refereed research papers and seven were published by public health institutions. The institutional publications are listed in Table 1 .

Institutional data quality assessment publications.

Acronym | Title | Institution
CDC’s Guidelines [ ] | Updated Guidelines for Evaluating Public Health Surveillance Systems | United States Centers for Diseases Control and Prevention
CIHI DQF [ ] | CIHI Data Quality Framework | Canadian Institute for Health Information
ME DQA [ , ] * | Data Quality Audit Tool | MEASURE Evaluation Project
ME PRISM [ , ] | Performance of Routine Information System Management, Version 3.1 | MEASURE Evaluation Project
WHO DQA [ , ] | The Immunization Data Quality Audit (DQA) Procedure; Immunization Data Quality Self-assessment (WHO DQS) Tool | Department of Immunization, Vaccines and Biologicals, World Health Organization
WHO DQRC [ ] | Guide to the Health Facility Data Quality Report Card | World Health Organization
WHO HMN [ ] | Assessing the National Health Information System: An Assessment Tool, Version 4.00 | Health Metrics Network, World Health Organization

* ME DQA is adopted by the Global Fund to Fight AIDS, Tuberculosis and Malaria.

Twenty-seven of the 39 reviewed publications were published between 2008 and 2013. There was a trend of increasing numbers of research papers per year, suggesting an increasing research focus on data quality with the wider adoption of computerised PHIS in recent years.

The results are organized as follows. First, the aims of the studies are given. This is followed by context and scope identified in Section 3.2 . Section 3.3 examines the methods for data quality assessment. A detailed summary of the findings concludes the results in Section 3.4 . For each section, a comparison between institutional publications and research papers was conducted, where this was possible and meaningful.

3.1. Aims of the Studies

The main aims of the studies are assessing the quality of data (19 publications [ 30 , 34 , 42 , 44 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 ]) and assessing the performance of the PHIS (17 publications [ 15 , 22 , 34 , 40 , 42 , 45 , 50 , 58 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 ]). Five studies assessed data use and explored the factors influencing data use [ 26 , 27 , 52 , 70 , 71 ]. Four studies investigated the facilitators and barriers for achieving high quality data and systems [ 22 , 40 , 59 , 65 ]. Three studies compared or developed methods for the improvement of data quality assessment or data exchange [ 54 , 56 , 72 ]. Finally two studies assessed data flow [ 30 , 70 ].

The institutions tended to focus on the PHIS system and the data [ 15 , 30 , 34 , 40 , 42 , 44 , 45 ]. Data use, comparison of different PHIS, identification of the factors related to poor data quality, and analysis of data flow were also reported in research papers [ 22 , 26 , 27 , 52 , 54 , 56 , 59 , 61 , 65 , 70 , 71 , 72 , 73 ].

3.2. Context and Scope of the Studies

The contexts of the studies were primarily confined to the public health domain, with other settings addressed occasionally.

Two types of public health context were covered in the institutional publications. The first included specific diseases and health events, such as AIDS, tuberculosis, malaria, and immunization [ 15 , 34 , 42 ]. The second was the public health system. This included public health project/program data management and reporting, routine health information systems, and PHIS under a national health institute [ 34 , 40 , 41 , 44 , 45 ].

Most research studies were conducted in disease-specific public health contexts. Ten were in the maternal and children’s health setting, e.g., immunization, childbirth, maternal health and hand-foot-mouth disease [ 47 , 53 , 56 , 57 , 58 , 68 , 69 , 70 , 72 , 73 ]. Another five were conducted in the context of HIV/AIDS prevention and care [ 48 , 49 , 63 , 65 , 67 ]. Two studies were related to tuberculosis [ 46 , 61 ]. Other contexts included multi-disease surveillance systems, primary health care, acute pesticide poisoning, road data or road safety, aboriginal health, monkey pox, and cancer [ 22 , 26 , 51 , 52 , 55 , 59 , 66 , 74 ]. In addition, clinical information management was studied in four research papers [ 50 , 54 , 62 , 71 ]. National health management information systems were studied in one publication [ 27 ].

The public health data from information systems operated by agencies other than public health were also assessed. They include the National Coronial Information System managed by the Victorian Department of Justice in Australia, women veteran mortality information maintained by the U.S. Department of Veterans’ Affairs, and military disability data from U.S. Navy Physical Evaluation Board [ 47 , 52 , 64 ].

The studies were conducted at different levels of the PHIS, including health facilities that deliver the health service and collect data (e.g., clinics, health units, or hospitals), and district, provincial and national levels where PHIS data are aggregated and managed. The institutions took a comprehensive approach targeting all levels of PHIS [ 15 , 30 , 34 , 40 , 42 , 44 , 45 ]. Twenty-seven research studies were conducted at a single level [ 22 , 26 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 59 , 61 , 62 , 63 , 64 , 66 , 68 , 69 , 70 , 71 , 72 , 73 , 74 ]. Of these, 14 were conducted at data collection and entry level. The other 13 studies assessed the PHIS at management level. Only five research papers covered more than one level of the system [ 27 , 58 , 60 , 65 , 67 ], two of which were multi-country studies [ 58 , 67 ]. Lin et al. studied the surveillance system at national level, provincial level, and at surveillance sites [ 65 ].

3.3. Methods for Data Quality Assessment

Analysis of methods for data quality assessment in the reviewed publications is presented in three sections, based on the dimensions of data quality that were covered: data, data use or data collection process. Seven perspectives were reviewed, including quality attributes for each dimension, major measurement indicators for each attribute, study design/method of assessment, data collection methods, data analysis methods, contributions and limitations.

3.3.1. Methods for Assessment of the Dimension of Data

In this section, the concept of data quality is a narrow one, meaning the quality of the dimension of data. All of the institutional publications and 28 research papers, a total of 35 articles, conducted assessment of the quality of data [ 15 , 22 , 30 , 34 , 40 , 42 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 60 , 61 , 62 , 63 , 64 , 65 , 66 , 67 , 68 , 69 , 72 , 73 , 74 ]. Matheson et al. introduced the attributes of data quality but did not give assessment methods [ 71 ]. Additional information is provided in Table A1 .

Quality Attributes of Data and Corresponding Measures

A total of 49 attributes were used in the studies to describe data quality, indicating its multi-dimensional nature. Completeness, accuracy and timeliness were the three attributes measured most often.

Completeness was the most-used attribute of data quality in 24 studies (5 institutional and 19 research publications) [ 15 , 22 , 34 , 40 , 42 , 44 , 46 , 48 , 49 , 50 , 51 , 54 , 57 , 61 , 62 , 63 , 64 , 65 , 66 , 68 , 69 , 72 , 73 , 74 ]. This was followed by accuracy, in 5 institutional and 16 research publications [ 15 , 30 , 34 , 40 , 42 , 46 , 48 , 49 , 50 , 51 , 52 , 53 , 56 , 57 , 58 , 63 , 64 , 65 , 69 , 72 , 74 ]. The third most-used attribute, timeliness, was measured in 5 institutional and 4 research publications [ 22 , 30 , 40 , 42 , 44 , 45 , 64 , 69 , 73 ].

The attributes of data quality are grouped into two types: those of good data quality and those of poor data quality (see Table 2 ).

Attributes of data quality.

Item | Attributes
High data quality (38) | Completeness, accuracy or positional accuracy, timeliness or up-datedness or currency, validity, periodicity, relevance, reliability, precision, integrity, confidentiality or data security, comparability, consistency or internal consistency or external consistency, concordance, granularity, repeatability, readily useableness or usability or utility, objectivity, ease with understanding, importance, reflecting actual sample, meeting data standards, use of standards, accessibility, transparency, representativeness, disaggregation, data collection method or adjustment methods or data management process or data management
Poor data quality (11) | Missing data, under-reporting, inconsistencies, data errors or calculation errors or errors in report forms or errors resulting from data entry, invalid data, illegible handwriting, non-standardization of vocabulary, and inappropriate fields

Inconsistencies in the definition of attributes were identified. The same attribute was sometimes given different meanings by different researchers. One example of this was “completeness”. Some institutions required conformity to the standard process of data entry, such as filling in data elements in the reporting forms [ 15 , 40 , 41 , 44 ]. Completeness was represented as the percentage of blank or unknown data, not zero/missing, or proportion of filling in all data elements in the facility report form [ 15 , 40 , 41 , 44 ]. The ME PRISM, instead, defined completeness as the proportion of facilities reporting in an administrative area [ 40 ]. The other definition of completeness was the correctness of data collection methods in ME DQA, i.e. , “complete list of eligible persons or units and not just a fraction of the list” [ 34 ].

Of the 19 research papers including completeness as an attribute, 12 measured the completeness of data elements as “no missing data or blank” [ 22 , 46 , 48 , 49 , 50 , 51 , 57 , 63 , 69 , 72 , 73 , 74 ]. Dixon et al. defined completeness as considering both filling in data elements and data collection methods [ 54 ]. Four studies measured completeness of data by the sample size and the percentage of health facilities that completed data reports [ 61 , 65 , 66 , 68 ]. The remaining two studies did not give precise definitions [ 51 , 64 ].
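Where completeness is defined at the level of data elements, as in most of the studies above, the computation is straightforward. A minimal sketch, assuming records are key-value pairs and that None and empty strings mark missing elements (an illustrative convention):

    # Element-level completeness: share of data elements that are not
    # blank or unknown. Data and missing-value convention are illustrative.
    records = [
        {"age": 34, "gender": "F", "lab_result": None},
        {"age": None, "gender": "M", "lab_result": "positive"},
        {"age": 51, "gender": "", "lab_result": "negative"},
    ]

    total = sum(len(r) for r in records)
    missing = sum(v in (None, "") for r in records for v in r.values())
    completeness = 100 * (total - missing) / total
    print(f"Completeness: {completeness:.1f}%")  # 6 of 9 elements filled: 66.7%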

On the other hand, different attributes could be given the same meaning. For example, the ME DQA defined accuracy as “validity”, which is one of two attributes of data quality in CDC’s Guidelines [ 15 , 34 ]. Makombe et al. considered that data were accurate if none of the examined variables in the site report was missing [ 49 ]. This is similar to the definition of completeness, as “no missing data” or “no blank of data elements” in the reports by other studies.

Study Design

Quantitative methods were used in all studies except that of Lowrance et al., who used only qualitative methods [ 63 ]. Retrospective, cross-sectional surveys were commonly used for quantitative studies. Pereira et al. conducted a multi-center randomized trial [ 72 ].

Qualitative methods, including review of publications and documentations, interviews with key informants, and field observations, were also used in 8 studies [ 34 , 45 , 50 , 57 , 61 , 65 , 69 , 72 ]. The purpose of the application of qualitative methods was primarily to provide the context of the findings from the quantitative data. For example, Hahn et al. conducted a multiple-case study in Kenya to describe clinical information systems and assess the quality of data. They audited a set of selected data tracer items, such as blood group and weight, to assess data completeness and accuracy. Meanwhile, they obtained end-users’ views of data quality from structured interviews with 44 staff members and qualitative in-depth interviews with 15 key informants [ 50 ].

The study subjects varied. In 22 publications, the study subjects were entirely data [ 15 , 42 , 44 , 46 , 47 , 48 , 49 , 51 , 52 , 53 , 54 , 55 , 56 , 58 , 59 , 60 , 64 , 66 , 67 , 68 , 73 , 74 ]; in another four publications, they were entirely users or stakeholders of the PHIS [ 30 , 45 , 62 , 63 ]. Three publications studied both the data and the users [ 22 , 50 , 72 ]. Study subjects also included data and documentation by Dai et al. [ 69 ]; data, documentation of instructions, and key informants in four studies [ 34 , 40 , 57 , 61 ]; and data, users, documentation of guidelines and protocols, and the data collection process by Lin et al. [ 65 ]. Both data and users as study subjects were reported in eight publications [ 22 , 34 , 40 , 50 , 57 , 61 , 65 , 72 ].

The sampling methods also varied. Only the study by Clayton et al. calculated sample size and statistical power [ 56 ]. Freestone et al. determined the sample size without explanation [ 52 ]. One study used two-stage sampling [ 56 ]. Ten studies used multi-stage sampling methods [ 22 , 34 , 42 , 48 , 52 , 55 , 56 , 58 , 68 , 72 ]. The rest used convenience or purposive sampling. The response rates were reported in two studies [ 62 , 72 ].

The data collection period ranged from one month to 16 years [ 67 , 74 ]. The study with the shortest time frame of one month had the maximum number of data records, 7.5 million [ 67 ], whereas the longest study, from 1970 to 1986, collected only 404 cases of disease [ 74 ]. The sample size of users ranged from 10 to 100 [ 45 , 61 ].

Data Collection Methods

Four methods were used individually or in combination in data collection. These were: field observation, interview, structured and semi-structured questionnaire survey, and auditing the existing data. Field observation was conducted using checklist and rating scales, or informal observations on workplace walkthroughs [ 34 , 40 , 50 , 65 ]. Open, semi-structured or structured interviews were used when the study subjects were users or stakeholders of the PHIS [ 30 , 40 , 45 , 50 , 57 , 61 , 62 , 63 , 65 ]. Auditing was used in directly examining existing datasets in PHIS, looking for certain data elements or variables. The benchmarks used for auditing included: in-house-defined data standards, international or national gold standards, and authoritative datasets [ 15 , 40 , 42 , 44 , 46 , 48 , 49 , 51 , 52 , 53 , 54 , 55 , 56 , 58 , 59 , 64 , 66 , 67 , 68 , 72 , 73 , 74 ]. The effect of auditing was enhanced by field observations to verify the accuracy of data sets [ 34 , 40 , 42 , 50 , 58 , 65 ].

Data Analysis Methods

Data analysis methods were determined by the purpose of the study and the types of data collected.

For the quantitative data, descriptive statistics were often used. For example, continuous data were usually analyzed as percentages, particularly for data about completeness and accuracy, to ascertain whether they reached the quality standards. This was the method used most often, appearing in 24 papers [ 22 , 34 , 40 , 42 , 44 , 46 , 47 , 48 , 49 , 50 , 52 , 53 , 54 , 55 , 56 , 57 , 58 , 59 , 64 , 65 , 66 , 68 , 72 , 73 ]. Plot charts, bubble scatter charts, and confidence intervals were also used in two studies [ 52 , 68 ]. Other common statistical techniques included correlation analysis, the Chi-square test, and the Mann–Whitney test [ 56 , 58 , 68 ]. The geographic information system technique was reported in 3 studies [ 51 , 52 , 74 ]. Seven studies reported the use of questionnaires or checklists with a Likert scale or a yes/no tick, as well as simple, summative and group scoring methods [ 30 , 34 , 40 , 45 , 58 , 61 , 62 ].

In the publications with data as the study subject, a certain number of data variables were selected, but the reason(s) for the selection was (were) not always given. They included elements of demographics such as age, gender, and birth date, and specific information such as laboratory testing results and disease codes. The minimum and maximum numbers of data variables were 1 and 30, respectively [ 58 , 59 ].

The qualitative data were transcribed first before semantic analysis by theme grouping methods [ 63 ].

3.3.2. Methods for Assessment of the Dimension of Data Use

Ten studies, including one institutional publication and nine research papers, are reviewed in this section [ 26 , 27 , 40 , 45 , 50 , 52 , 61 , 62 , 70 , 71 ]. Five studies were concerned with the assessment of data use and the factors influencing data use [ 26 , 27 , 52 , 70 , 71 ]. The other five included assessment of data use, but this was not always highlighted [ 40 , 45 , 50 , 61 , 62 ]. Details are given in Table A2 .

Quality Attributes of Data Use and Corresponding Measures

A total of 11 attributes were used to define the concept of data use. These were: trend in use, use of data or use of information, system use or usefulness of the system, intention to use, user satisfaction, information dissemination or dissemination of data, extent of data source recognition and use or specific uses of data, and existence and contents of formal information strategies and routines.

The measures fall into three categories: data use for the purpose of action, planning and research; strategies and mechanisms of data use; and awareness of data sources and data use.

The first category of measures was mentioned in eight studies [ 26 , 40 , 45 , 50 , 52 , 61 , 70 , 71 ]. For example, actioned requests from researchers, the number of summaries/reports produced, and the percentage of report use [ 40 , 52 , 71 ]. Freestone et al. calculated actioned requests from researchers who do not have access to the PHIS [ 52 ]. The measurement indicators in ME PRISM were report production and display of information. They were assessed by whether and how many reports containing data from the PHIS were compiled, issued, fed back and displayed for a set time frame [ 40 ]. Saeed et al. assessed the use of data by predefined criteria, including the availability of comprehensive information, whether data were used for planning and action at each level, and whether feedback was given to the lower organizational level of the public health system [ 61 ].

The second category of measures was assessed in five studies [ 26 , 27 , 45 , 61 , 70 ]. The criteria of the measurement included the availability of a feedback mechanism, policy and advocacy, the existence and the focus of formal information strategies, and routines of data use [ 26 , 45 , 70 ].

The third category measured users’ awareness of data use which was reported in two studies [ 26 , 62 ]. Petter and Fruhling applied the DeLone and McLean information systems success model [ 62 ]. They used the framework to evaluate system use, intention to use, and user satisfaction in 15 questions by considering the context of the PHIS, which was an emergency response medical information system. Wilkinson and McCarthy recommended examining whether the studied information systems were recognized by the users in order to assess the extent of data source recognition among respondents [ 26 ].

Three studies only used quantitative methods [ 40 , 52 , 62 ] and three studies only used qualitative methods [ 27 , 50 , 70 ]. The remaining four studies combined qualitative and quantitative methods [ 26 , 45 , 61 , 71 ]. Interviews, questionnaire surveys, reviews of documentation and abstracts of relevant data were used in the studies.

The sources of information for the study subjects included users and stakeholders, existing documents, and data from the PHIS. Study subjects were all users in six studies [ 26 , 27 , 45 , 50 , 62 , 70 ], and all data in the study by Freestone et al. [ 52 ]. Both user and documentation were study subjects in two studies [ 40 , 61 ], and together with data in another study [ 71 ]. Convenience or purposive sampling was generally used.

Among nine studies whose study subjects were users, structured and semi-structured questionnaire surveys, group discussions, and in-depth interviews were used to collect data. Use of self-assessment, face-to-face communication, telephone, internet telephony, online, email, facsimile and mail were reported in the studies. For example, Wilkinson and McCarthy used a standardized semi-structured questionnaire for telephone interviews with key informants [ 26 ]. Petter and Fruhling used an online survey as well as facsimile and mail to the PHIS users [ 62 ]. Qazi and Al administered in-depth, face-to-face and semi-structured interviews with an interview guide [ 27 ]. Saeed et al. predefined each criterion for data use and measured it by a 3-point Likert scale. They assessed each criterion through interviewing key informants and consulting stakeholders. Desk review of important documents, such as national strategic plans, guidelines, manuals, annual reports and databases, was also reported in their study [ 61 ].

Four studies assessing data use by data and documentation either queried information directly from the data in the studied PHIS, if applicable, or collected evidence from related documents such as reports, summaries, and guidelines [ 40 , 52 , 61 , 71 ]. The data to be collected included actioned requests, the number of data linked to action, and the number of data used for planning. Time for data collection varied without explanation, such as 12 months in ME PRISM or six years by Freestone et al. [ 40 , 52 ].

The data collected from qualitative studies were usually processed manually, organized thematically or chronologically. They were either analyzed by classification of answers, grouping by facility or respondent’s role, or categorization of verbatim notes into themes.

Various strategies were applied for quantitative data. For example, Wilkinson and McCarthy counted the same or similar responses to indicate the frequency of beliefs/examples across participants [ 26 ]. Data in their study were analyzed individually, by role, and at aggregated level. Some correlational analyses, such as Pearson’s r for parametric data and Spearman’s rho for non-parametric data, were conducted to identify possible relationships between data use, perceptions of data, and organizational factors. Petter and Fruhling conducted hypothesis testing on a structured questionnaire with a 7-point Likert scale for all quantitative questions [ 62 ]. Due to the small sample size of 64 usable responses, they used summative scales for each of the constructs. All of the items used for a specific construct were averaged to obtain a single value for that construct. Then, using this average score, each hypothesis was tested using simple regression.
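To make the summative-scale-then-regression procedure concrete, here is a minimal sketch; the data are simulated and the construct names are illustrative assumptions, not the instrument used in that study:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 64  # small sample, comparable to the 64 usable responses above

    # Three 7-point Likert items per construct, averaged into a summative scale.
    system_quality = rng.integers(1, 8, size=(n, 3)).mean(axis=1)
    satisfaction = rng.integers(1, 8, size=(n, 3)).mean(axis=1)

    # Simple regression of one construct score on the other.
    model = sm.OLS(satisfaction, sm.add_constant(system_quality)).fit()
    print(model.params, model.pvalues)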

3.3.3. Methods for Assessment of the Dimension of Data Collection Process

Although the aim of assessing data flow or the process of data collection was only stated in two studies, another 14 articles were found that implicitly assessed data collection process [ 22 , 30 , 34 , 40 , 42 , 45 , 50 , 52 , 55 , 58 , 59 , 60 , 65 , 67 , 69 , 70 ]. These articles were identified through a detailed content analysis. For example, data collection process assessment activities were sometimes initiated by identification of the causes of poor data quality [ 52 , 55 , 59 ]. Alternatively, data collection process was considered as a component of the evaluation of the effectiveness of the system [ 22 , 34 , 42 , 45 , 58 , 60 , 65 , 69 ]. Three studies led by two institutions, CIHI and the MEASURE Evaluation Project, assessed data collection process while conducting assessment of the quality of the data [ 30 , 40 , 50 ]. Details are given in Table A3 .

Quality Attributes of Data Collection Process and Corresponding Measures

A total of 23 attributes of data collection process were identified. These were: quality index or quality scores or functional areas, root causes for poor data quality, metadata or metadata documentation or data management or case detection, data flow or information flow chart or data transmission, data collection or routine data collection or data recording or data collection and recording processes or data collection procedures, data quality management or data quality control, statistical analysis or data compilation or data dissemination, feedback, and training.

Only four studies explicitly defined the attributes of the dimension of data collection process, two of them from institutions [ 40 , 45 , 52 , 70 ]. Data collection was the most-used attribute, appearing in six publications [ 34 , 40 , 52 , 65 , 67 , 69 , 70 ]. The next most-assessed attribute was data management processes or data control, reported in four publications [ 34 , 45 , 67 , 69 ].

Data collection process was sometimes considered a composite concept in six studies, four of them proposed by institutions [ 30 , 34 , 42 , 45 , 58 , 60 ]. For example, the quality index/score was composed of five attributes: recording practices, storing/reporting practices, monitoring and evaluation, denominators, and system design (the receipt, processing, storage and tabulation of the reported data) [ 42 , 58 , 60 ]. Metadata documentation or metadata dictionary cover dataset description, methodology, and data collection, capture, processing, compilation, documentation, storage, analysis and dissemination [ 30 , 45 ]. The ME DQA assessed five functional areas, including structures, functions and capabilities, indicator definitions and reporting guidelines, data collection and reporting forms and tools, data management processes, and links with the national reporting system [ 34 ].

Seven studies only used qualitative methods [ 50 , 52 , 55 , 59 , 65 , 69 , 70 ], five only conducted quantitative research [ 22 , 30 , 40 , 58 , 67 ], and four used both approaches [ 34 , 42 , 45 , 60 ]. Questionnaire surveys were reported in 10 papers [ 22 , 30 , 34 , 40 , 42 , 45 , 58 , 60 , 67 , 70 ]. Interviews were conducted in 3 studies [ 34 , 50 , 70 ]. Focus group approaches, including consultation, group discussion, or meeting with staff or stakeholders, were reported in four studies [ 45 , 52 , 59 , 65 ]. Review of documentation was conducted in five papers [ 34 , 40 , 52 , 55 , 69 ], and field observation was used in five studies [ 34 , 40 , 50 , 52 , 65 ].

Data Collection and Analysis Methods

The study subjects included managers or users of the PHIS, the documentation of instructions and guidelines of data management for the PHIS, and some procedures of data collection process. The study subjects were entirely users in eight studies [ 22 , 30 , 40 , 45 , 58 , 59 , 67 , 70 ]. Corriols et al. and Dai et al. only studied documentation such as evaluation reports on the PHIS including deficiency in the information flow chart and non-reporting by physicians [ 55 , 69 ]. Data collection process was studied in six publications [ 34 , 45 , 50 , 52 , 60 , 65 ]. Of these, four studies combined data collection procedures with users and documentation [ 34 , 42 , 52 , 65 ], while Hahn et al. only observed data collection procedures and Ronveaux et al. surveyed users and observed data collection procedures for a hypothetical population [ 50 , 60 ].

The data collection methods included field observation, questionnaire surveys, consensus development, and desk review of documentation. Field observations were conducted either in line with a checklist or in an informal way [ 34 , 40 , 50 , 52 , 60 , 65 ]. Lin et al. made field observations of the laboratory staff dealing with specimens and testing at the early stage of the data collection process [ 65 ]. Freestone et al. observed data coders’ activities during the process of data geocoding and entry [ 52 ]. Hahn et al. followed the work-through in study sites [ 50 ]. WHO DQA conducted field observations on sites of data collection, processing and entry [ 42 ], while Ronveaux et al. observed workers at the health-unit level who completed some data collection activities for 20 hypothetical children [ 60 ]. ME DQA made follow-up on-site assessment of off-site desk-reviewed documentation at each level of the PHIS [ 34 ].

Questionnaire surveys included semi-structured and structured ones [ 22 , 30 , 34 , 40 , 42 , 45 , 58 , 60 , 67 , 70 ]. The questionnaire data were collected by face-to-face interviews, except one online questionnaire survey study by Forster et al. [ 67 ]. Five studies used a multi-stage sampling method [ 22 , 34 , 42 , 58 , 60 ]. The rest surveyed convenience samples or samples chosen according to a particular guideline, which was sometimes not described [ 30 , 34 , 40 ].

Consensus development was mainly used in group discussion and meetings, guided by either structured questionnaires or data quality issues [ 45 , 59 ]. Ancker et al. held a series of weekly team meetings over about four months with key informants involved in data collection [ 59 ]. They explored the root causes of poor data quality in line with the issues identified from assessment results. WHO HMN organized group discussions with approximately 100 major stakeholders [ 45 ]. Five measures related to data collection process were contained in a 197-item questionnaire. The consensus to each measure was reached through self-assessment, individual or group scoring to yield a percentage rating [ 45 ].

Desk review of documentation was reported in six studies [ 34 , 52 , 55 , 65 , 69 , 70 ]. The documentation included guidelines, protocols, official evaluation reports and those provided by data management units. The procedures for appraisal and adoption of relevant information were not introduced in the studies.

Data analysis methods for quantitative studies were mainly descriptive statistics. Most papers did not present the methods for analysis of the qualitative data. Information retrieved from the qualitative study was usually triangulated with findings from quantitative data.

3.4. Summary of the Findings

Four major themes emerged from our detailed analysis of the results; they are summarized in this section.

The first theme is that the seven institutional and the 32 individual research publications differ in their approach to data quality assessment, in terms of aims, context and scope. First, the effectiveness of the PHIS was more of an institutional than a researcher’s interest: it was covered in all of the institutional publications but in only one-third of the research papers. Second, the disease-specific public health contexts covered by the United Nations’ MDGs, maternal health, children’s health, and HIV/AIDS, were the areas most often studied by researchers, whereas the institutions also paid attention to routine PHIS. Third, the institutions tended to evaluate all levels of data management, whereas most research studies focused on a single level of analysis, either record collection or management.

The second theme is that coverage of the three dimensions of data quality was not equal. The dimension of data was most frequently assessed (reported in 35 articles). Data use was explicitly assessed in five studies and data collection process in one. Implicit assessment of data use and data collection process was found in another five and 15 papers, respectively. The rationale for initiating these implicit assessments was usually to identify factors arising from either data use or data collection process while assessing the quality of data. Among studies that considered more than one dimension of data quality, 15 assessed both data and data collection process, seven assessed data and data use, and one assessed both data use and data collection process. Only four studies assessed all three dimensions of data quality.

The third emerging theme is a lack of clear definitions of the attributes and measurement indicators of each dimension of data quality. First, wide variation in the definition of key terms was identified, including different terms for the same attribute and the same term used to refer to distinct attributes. The definitions of attributes and their associated measures were sometimes based on intuition, prior experience, or the underlying objectives unique to the PHIS in a specific context.

Second, the attributes of the quality of data were more developed than those for the dimensions of data use and data collection process. Most definitions of data quality attributes and measures refer to the dimension of data as opposed to the other two dimensions, whose attributes were mostly vague or obscure. One clear gap is the absence of attributes for the dimension of data collection process.

Third, no consensus has been reached on which attributes should be measured. The number of attributes measured in the studies varied between one and eight, out of a total of 49 attributes identified. The attributes of data quality in public health were usually measured positively, that is, in terms of what good data quality is. The three most-used attributes of good data quality were completeness, accuracy, and timeliness. The institutions tended to assess more attributes of data quality than individual researchers: the number of attributes reported in research papers was never more than four, while the institutions assessed at least four.

The last theme is that the assessment methods lack systematic procedures. Quantitative data quality assessment primarily used descriptive surveys and data audits, while qualitative assessment relied primarily on interviews, documentation review and field observation. Both objective and subjective strategies were identified among the methods for assessing data quality. The objective approach applies quantifiable measurements directly to the data according to a set of data items/variables/elements/tracer items. The subjective approach measures the perceptions of the users and stakeholders of the PHIS. However, only a small minority of the reviewed studies used both types of assessment. Meanwhile, field verification of data quality is not yet routine practice: only five studies conducted field observations of the data or of the data collection process, and these were usually informal. The reliability and validity of the studies were rarely reported.
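To make the objective/subjective distinction concrete, here is a minimal sketch, not drawn from any reviewed study: the objective check computes completeness and timeliness directly from stored records, while the subjective check aggregates user ratings. All records, field names and ratings are hypothetical.

```python
import datetime as dt

# Objective check: compute completeness and timeliness from stored records.
# Subjective check: aggregate user ratings of the same data.
records = [
    {"id": 1, "result": "positive",
     "service": dt.date(2024, 1, 1), "reported": dt.date(2024, 1, 3)},
    {"id": 2, "result": None,  # missing tracer value
     "service": dt.date(2024, 1, 15), "reported": dt.date(2024, 2, 20)},
]

# Percentage of records with a non-missing tracer field.
completeness = 100 * sum(r["result"] is not None for r in records) / len(records)
# Percentage of records reported within 7 days of the service date.
timeliness = 100 * sum((r["reported"] - r["service"]).days <= 7 for r in records) / len(records)

# Subjective: mean of hypothetical 5-point user ratings of perceived quality.
user_ratings = [4, 3, 5, 4]
perceived_quality = sum(user_ratings) / len(user_ratings)

print(completeness, timeliness, perceived_quality)  # 50.0 50.0 4.0
```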

4. Discussion

Data are essential to public health. They represent and reflect public health practice. The broad application of data in PHIS for the evaluation of public health accountability and performance has raised the awareness of public health agencies of data quality, and of methods and approaches for its assessment. We systematically reviewed the current status of quality assessment for each of the three dimensions of data quality: data, data collection process and data use. The results suggest that the theory of measurement has been applied either explicitly or implicitly in the development of data quality assessment methods for PHIS. The majority of previous studies assessed data quality through a set of attributes using certain measures. Our findings, based on the proposed conceptual framework of data quality assessment for public health, also identified the gaps that exist in the methods included in this review.

The importance of systematic, scientific data quality assessment needs to be highlighted. All three dimensions of data quality, data, data use and data collection process, need to be systematically evaluated. To date, the three dimensions have not been given the same weight across the reviewed studies: the quality of data use and of the data collection process has not received adequate attention. This lack of recognition might reflect a lack of consensus on the dimensions of data quality. Because these three dimensions contribute equally to data quality, they should be given equal weight in data quality assessment. Further development of methods to assess the data collection process and data use is required.

Effort should also be directed towards clear conceptualisation and definition of the terms commonly used to describe and measure data quality, such as the dimensions and attributes of data quality. The lack of clear definitions of the key terms creates confusion and uncertainty and undermines the validity and reliability of data quality assessment methods. An ontology-based exploration and evaluation from the perspective of data users will be useful for future development in this field [33,75]. The two steps of conceptualizing data quality attributes and operationalizing the corresponding measures, as shown in our proposed conceptual framework, need to be taken into serious consideration and followed systematically.

Data quality assessment should use mixed methods (e.g., qualitative and quantitative assessment methods) to assess data from multiple sources (e.g., records, organisational documentation, data collection process and data users) and used at different levels of the organisation [33,35,36,38,75,76]. More precisely, we strongly suggest that subjective assessment of end-users' or customers' perspectives be an indispensable component of data quality assessment for PHIS. The importance of this strategy has long been articulated by researchers [33,75,76]. Objective assessment methods examine the data already collected and stored in the PHIS; many such methods have been developed, widely accepted and used in practice [38,76]. Subjective assessments, on the other hand, complement objective data quality assessment. For example, interviews are useful for identifying the root causes of poor data quality and for designing effective strategies to improve it. Meanwhile, field observation and validation are necessary wherever possible, because referencing the data against the real world gives data users confidence in the data quality and in applying the data to public health decision-making, action, and outcomes [52]. The validity of a study would be doubtful if the quality of the data could not be verified in the field [36], especially when the data come from a PHIS consisting of secondary data.
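As an illustration of the field-verification idea, the sketch below compares counts recounted from source documents during a site visit with the figures reported upstream in the PHIS, in the style of the WHO DQA verification ratio. The site names and counts are hypothetical.

```python
# Recounted source-document totals from a site visit, compared with the
# totals reported upstream in the PHIS. Sites and counts are hypothetical.
recounted = {"clinic_A": 118, "clinic_B": 115}  # counted on site
reported  = {"clinic_A": 120, "clinic_B": 110}  # stored in the PHIS

for site, counted in recounted.items():
    ratio = counted / reported[site]  # verification ratio
    if ratio < 1:
        status = "over-reporting"
    elif ratio > 1:
        status = "under-reporting"
    else:
        status = "exact match"
    print(f"{site}: verification ratio {ratio:.2f} ({status})")
```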

To increase the rigor of data quality assessment, the relevant statistical principles for sample size calculation, research design, measurement and analysis need to be adhered to. The use of convenience or purposively chosen sampling methods in 24 of the studies included in this review reduced the representativeness and generalizability of their findings. At the same time, reports of data quality assessments need to present the detailed procedures and methods used, the findings and the limitations. Relatively simple data analysis using only descriptive statistics can lead to the loss of useful supporting information.
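For instance, the standard formula for the sample size needed to estimate a proportion, n = z^2 * p(1-p) / e^2, can anchor a sampling design. The sketch below applies it with conventional defaults (maximum variance p = 0.5 and a 95% confidence level); these values are illustrative, not a recommendation from the reviewed studies.

```python
import math

# n = z^2 * p * (1 - p) / e^2 : sample size for estimating a proportion.
# Defaults assume maximum variance (p = 0.5) and a 95% confidence level.
def sample_size(p=0.5, margin=0.05, z=1.96):
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size())             # 385 records for +/-5% at 95% confidence
print(sample_size(margin=0.03))  # 1068 records for +/-3%
```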

Finally, to address the gaps identified in this review, we suggest re-prioritizing the orientation of data quality assessment in future studies. Data quality is influenced by technical, organizational, behavioural and environmental factors [35,41]. It spans large information-systems contexts, specific knowledge and multi-disciplinary techniques [33,35,75]. In the reviewed studies, data quality was frequently assessed as a component of the quality, effectiveness or performance of the PHIS. This may reflect that the major concern of public health, especially of the PHIS institutions, is managerial efficiency. It may also reflect differences in the resources available to, and the responsibilities of, institutions and individual researchers. However, when data quality assessment is hidden within other scopes, data management may be neglected, leaving enduring data quality problems in public health practice unnoticed. Data quality needs to be positioned at the forefront of public health as a distinct area that deserves specific scientific research and management investment.

While this review provides a detailed overview of data quality assessment issues, its coverage has some limitations, constrained by access to the databases and by the breadth of public health information systems, which makes systematic comparison among studies challenging. The search was limited by the lack of subject headings for data quality of PHIS among MeSH terms, which could have caused our search to miss some relevant publications. To compensate for this limitation, we searched well-known institutional publications and manually searched the references of each article retrieved.

Our classification process was primarily subjective, and some of the original researchers may disagree with our interpretations. Each assessment method has its own contributions and limitations, which makes such choices difficult. We have provided some examples of approaches to these issues.

In addition, our evaluation is limited by the incomplete presentation of details in some of the papers that we reviewed. A comprehensive data quality assessment method includes a set of guidelines and techniques that define a rational process for assessing data quality [37]. The detailed procedures for data analysis, data quality requirements analysis, and identification of critical attributes were rarely given in the reviewed papers. The lack of adequate detail in the original studies could have affected the validity of some of our conclusions.

5. Conclusions

Public health is a data-intensive field which needs high-quality data to support public health assessment and decision-making and to assure the health of communities. Data quality assessment is important for public health. In this review of the literature we have examined data quality assessment methods based on our proposed conceptual framework, which incorporates three dimensions into the assessment of overall data quality: data, data use and data collection process. We found that the dimension of the data themselves was assessed most frequently in previous studies. Most methods for data quality assessment evaluated a set of attributes using relevant measures; completeness, accuracy, and timeliness were the three most-assessed attributes. Quantitative data quality assessment primarily used descriptive surveys and data audits, while qualitative assessment relied primarily on interviews, documentation review and field observation.

We found that data use and data collection process have not been given adequate attention, although they are equally important factors in determining the quality of data. Other limitations of the previous studies were inconsistency in the definition of the attributes of data quality, failure to address data users' concerns, and a lack of triangulation across mixed methods for data quality assessment. The reliability and validity of the data quality assessments were rarely reported. These gaps suggest that future data quality assessment for public health needs to consider the three dimensions of data quality equally: data, data use and data collection process. More work is needed to develop clear and consistent definitions of data quality and systematic methods and approaches for its assessment.

The results of this review highlight the need for further development of data quality assessment methods. As suggested by our proposed conceptual framework, future data quality assessments need to pay equal attention to the three dimensions of data quality. Measuring the perceptions of end users or consumers of data quality will enrich our understanding of data quality issues. Clear conceptualization and scientific, systematic operationalization of the assessment will ensure the reliability and validity of the measurement of data quality. New theories on data quality assessment for PHIS may also be developed.

Acknowledgments

The authors wish to gratefully acknowledge the help of Madeleine Strong Cincotta in the final language editing of this paper.

Characteristics of methods for assessment of the data dimension reported in the 36 publications included in the review.

Authors, Year | Attributes and major measures | Study design | Data collection methods | Data analysis methods | Contribution | Limitations
Ancker 2011 [ ] | Percentage of missing data, inconsistencies and potential errors of different variables; number of duplicate records, number of non-standardized vocabulary entries, number of inappropriate fields | Quantitative audit of data attributes of a dataset | Selected one data set and used tools to query 30 variables; manually assessed data formats | Rates, percentages or counts | Identified data quality issues and their root causes | Needs a specific data query tool
Bosch-Capblanch 2009 [ ] | Accuracy: proportions in the relevant data set, such as the recounted number for an indicator divided by the number reported at the next tier in the reporting system; a ratio less than 100% indicates "over-reporting", a ratio over 100% suggests "under-reporting" | Quantitative audit of data accuracy by external auditors applying the WHO DQA in 41 countries | A multistage weighted representative random sampling procedure; field visits verifying the reported data; compared data collected in the field with the reports at the next tier | Percentage, median, inter-quartile range, 95% confidence intervals, ratio (verification factor quotient) adjusted and extrapolated | Systematic methodology to describe data quality and identify basic recording and reporting practices as key factors and good practices | Limited attributes; lack of verification of the source of actual data; excluded non-eligible districts
CDC 2001 [ ] | Completeness, accuracy: percentage of blank or unknown responses; ratio of recorded data values over true values | Quantitative audit of a dataset, a review of sampled data, a special record linkage, or a patient interview | Calculating the percentage of blank or unknown responses to items on recording forms, reviewing sampled data, conducting record linkage, or a patient interview | Descriptive statistics: percentages | Provides generic guidelines | Lack of detail on procedures; needs adjustment
Chiba 2012 [ ] | Completeness: percentage of complete data. Accuracy: 1 minus the percentage of the complete data that were illegible, wrongly coded, inappropriate or unrecognized. Relevance: comparing the data categories with those in the upper-level report to evaluate whether the data collected satisfied management information needs | Quantitative verification of data accuracy and completeness, and qualitative verification of data relevance, in a retrospective comparative case study | Purposive sampling; clinical visits; re-entered and audited 30 data categories of one year of data to evaluate accuracy and completeness; qualitatively examined data categories and instructions to assess the relevance, completeness and accuracy of the data; semi-structured interviews to capture factors that influence data quality | Descriptive statistics for accuracy and completeness of the data; qualitative data thematically grouped and analyzed by data categories, instructions, and key informants' views | Quantitative and qualitative verification of data quality; comparison of two hospitals increased generalizability of the findings | Consistency and timeliness were not assessed; data from the system could not be validated
CIHI 2009 [ ] | Accuracy: coverage, capture and collection, unit non-response, item (partial) non-response, measurement error, edit and imputation, processing and estimation. Timeliness: data currency at the time of release, documentation currency. Comparability: data dictionary standards, standardization, linkage, equivalency, historical comparability. Usability: accessibility, documentation, interpretability. Relevance: adaptability, value | Quantitative method, user survey questionnaire | Questionnaire asking users for ratings of each construct: met, not met, unknown or not applicable (or minimal or none, moderate, significant, or unknown); all levels of the system were taken into account in the assessment | Descriptive statistics for ratings by each criterion; the overall assessment for a criterion based on the worst assessment of the applicable levels | Data quality assessed from the user's perspective; provides comprehensive characteristics and criteria for each dimension of data quality (5 dimensions, 19 characteristics and 61 criteria) | Undefined survey procedures, including sample size; being an internal assessment, rating scores were used for internal purposes
Clayton 2013 [ ] | Accuracy: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) | Quantitative method to audit a dataset, with a power calculation, of 840 medical records | Two-stage sampling of study sites; abstracting records and auditing 25 data variables to assess the accuracy of the data reported in three data sources | Descriptive statistics calculated for each data source; summary measure of kappa values using the paired-sample Wilcoxon signed rank test | Accessed and linked three data sources (maternal medical charts, birth certificates and hospital discharge data) whose access is limited, using the medical chart as the gold standard | Limited generalizability of the findings; low sample size and limited representativeness
Corriols 2008 [ ] | Under-reporting: calculating the difference between registered cases and surveyed cases | Quantitative method: a nationwide cross-sectional survey | Four-stage consistent random sampling method across the country; face-to-face questionnaire interviews | Descriptive statistics to estimate national under-reporting from the survey results | Good representativeness of the study population | Lack of case diagnosis information and of the quality of the source of the data
Dai 2011 [ ] | Under-reporting, errors on report forms, errors resulting from data entry; completeness of information, accuracy, timeliness | Qualitative and quantitative methods, reviewing publications on the system and data from the system | Reviewing publications on the system and data from the system | Descriptive statistics for quantitative data and thematic grouping for qualitative data | Evaluated all existing sub-systems included in the system | Undefined review procedures; lack of verification of source data
Dixon 2011 [ ] | Completeness: the proportion of diagnosed cases and the proportion of fields in a case report | Quantitative method, auditing a dataset | Created a minimum data set of 18 key data elements; used structured query language (SQL) statements to calculate the percent completeness of each field across a total of 7.5 million laboratory reports | Descriptive statistics to calculate the difference between completeness scores across samples | Development of a method for evaluating the completeness of laboratory data | Needs a specific data query tool; only assessed completeness
Edmond 2011 [ ] | Completeness, illegible handwriting, calculation errors: the proportion of the consultation rates for two items, the proportion of illegible handwriting requiring clarification, and the proportion of calculation errors on the submitted record forms | Quantitative method: audit of the submitted record forms in the dataset | 3303 cards from five randomly selected weeks of each year between 2003 and 2009 | Descriptive statistics for the percentage of each data quality attribute | Random selection of the dataset | Only calculated completeness; no field verification of data accuracy
Ford 2007 [ ] | Accuracy: sensitivity, specificity and positive predictive values | Quantitative method using record linkage to audit a dataset, comparing the system with a gold standard (a statewide audit dataset) | Calculated data quality indicators for 18 data variables, compared with a statewide audit (gold standard), covering 2432 babies admitted to NICUs, 1994–1996 | Descriptive statistics with exact binomial confidence intervals for data quality attributes; comparison of the two datasets using the chi-square test | The findings are consistent with other validation studies comparing routinely collected population health data with medical records | Lack of verification of variations between the two datasets; inadequate representativeness
Forster 2008 [ ] | Missing data: the percentage of missing data | Quantitative method to audit a dataset | Assessed data quality of a set of six key variables; a global missing-data index was computed from the median of the percentages of missing data, and sites were ranked according to this index | Confidence interval (CI), Cronbach's alpha, multivariate logistic models, Spearman rank correlation coefficient | Directly examined associations between site characteristics and data quality | Convenience sample and uncertain generalizability
Freestone 2012 [ ] | Accuracy, consistency, granularity | Quantitative method to audit a dataset across three components: source documents, data extraction/transposition, and data cleaning | Systematic sampling of 200 cases, each geocoded and comparatively assessed for data quality with and without the influence of geocoding, using pre-selected criteria | Data quality measured by category (perfect, near perfect, poor); paired t-test for the 200 samples and chi-square test by year | Quantified data quality attributes under different factors | No reference type and no field verification (for historic data)
Frizzelle 2009 [ ] | Accuracy, completeness, currency: assessed by positional errors, generalizations incompatible with highly accurate geospatial locations, and whether datasets were updated with change | Quantitative method using geographic information systems (GIS): developed a custom road dataset for analyzing the data quality of four datasets | Developed a custom road dataset and compared it with four readily available public and commercial road datasets; developed three analytical measures to assess comparative data quality | Percentages, concordance coefficients and Pearson correlation coefficients | Exemplary for assessing the feasibility of readily available commercial or public road datasets; outlines the steps of developing a custom dataset | No field verification for historic data
Hahn 2013 [ ] | Completeness, accuracy: the percentage of correctly or completely transmitted items from the original data source to secondary data sources | A multiple case study using quantitative and qualitative approaches in 3 antenatal care clinics of two private and one public Kenyan hospital | Quantitative: selected 11 data tracer items, followed retrospectively and audited against an independently created gold standard. Qualitative: structured interviews and in-depth interviews to assess the subjective dimensions of data quality, using five-point scales for each statement; purposeful sampling of 44 staff for the survey and 15 staff for key-informant interviews | Quantitative data: manual review, descriptive statistics, Kruskal-Wallis test, Mann-Whitney U test for continuous measures. Qualitative data: processed manually, classified and grouped by facility and staff class | Combined different methods and viewed the information systems from different viewpoints, covering the quality of the PHIS and drawing suggestions for improving data quality from the qualitative results; likely to produce robust results in other settings |
Harper 2011 [ ] | Completeness: the proportion of filled fields on the reports. Validity: the proportion of written indicators matching the assigned standard; the proportion of incorrectly entered numbers; the proportion of illegible entries; the proportion of entries out of chronological order | Quantitative method to audit an electronic database of manually extracted entries of a reference syndrome from anonymized E-Book health registry entries | A random systematic sample of 10% of the extracted entries (beginning with a randomly chosen starting point, then interval sampling to check 10% of records), with an acceptable error rate of <5% | Descriptive statistics on attributes; to avoid bias, age and sex proportions were extracted from available records and compared with National Census data | Examined data quality using a reference syndrome, making it possible to provide informed recommendations; descriptive data analysis provides grounded and useful information for decision makers | No evaluation of data collection methods
Hills 2012 [ ] | Timeliness: the number of days between the Service Date and the Entry Date of submission of data to the system (three categories: ≤7 days, 8–30 days, ≥31 days). Completeness: the complete recording of data elements, calculated as the proportion of complete fields over the total number of fields | Quantitative method to audit a data set | Used 757,476 de-identified demographic records and 2,634,101 vaccination records from the system | Descriptive statistics on attributes | Large dataset provides statistically significant associations | Not able to examine two highly relevant components of data quality: vaccination record coverage completeness and accuracy
Lash 2012 [ ] | Completeness: the number of locations matching latitude and longitude coordinates. Positional accuracy: spatial resolution of the dataset. Concordance: the number of localities falling within the boundary. Repeatability: the georeferencing methodology | Georeferencing historic datasets; quantitative method on historic data with 404 recorded MPX cases in seven countries during 1970–1986 from 231 unique localities | Developed ecological niche models and maps of potential MPX distributions based on each of three occurrence data sets with different georeferencing efforts | Descriptive statistics on attributes and comparison of georeferencing match rates | Documents the difficulties and limitations of the available methods for georeferencing historic disease data in foreign locations with poor geographic reference information | Not able to examine the accuracy of the data source
Lin 2012 [ ] | Completeness: sufficient sample size. Accuracy: missing data or discrepancies between questionnaires and the database | Quantitative and qualitative methods: audited the data set by cross-checking 5% of questionnaires against the electronic database during field visits | Reviewed guidelines and protocols using a detailed checklist; purposive sampling; direct observation of data collection; cross-checking the database against the questionnaires | Descriptive statistics for attributes of data quality | Mixed methods to assess data quality | Unable to generalize the findings to the whole system
Litow and Krahl 2007 [ ] | Accuracy, use of standards, completeness, timeliness, and accessibility | Quantitative method based on a framework developed for assessment of PHIS | Exported and queried one year of data by 12 data items | Descriptive statistics for data quality attributes | Research on a Navy population for the public health applicability of the system; identified factors influencing data quality | Needs a framework, which was undefined in the research
Lowrance 2007 [ ] | Completeness, updatedness, accuracy | Qualitative method following CDC's Guidelines | Standardized interviews with 18 key informants during 12 site visits, and meetings with stakeholders from government, non-governmental and faith-based organizations | Thematic grouping of interview responses | Data quality qualitatively assessed by key informants and stakeholders | Lack of quantifiable information
Makombe 2008 [ ] | Completeness: filled fields. Accuracy: no missing examined variables, or a difference of less than 5% compared with the supervision report | Quantitative method to audit the quality of site reports as of the date of field supervisory visits | 6 case registration fields and 2 outcome data elements were examined | Descriptive statistics on attributes of data quality from site reports compared with supervision reports (the "gold standard") | Set thresholds for accuracy; examined the association between facility characteristics and data quality | Only assessed aggregated facility-level data rather than individual patient data
Mate 2009 [ ] | Completeness: no missing data in a period of time. Accuracy: the value in the database was within 10% of the gold-standard value, or the percentage deviation from expected for each data element compared with the gold-standard data set | Quantitative method to assess the attributes. Completeness: surveying six data elements in one year of data from all sample sites. Accuracy: surveying a random sample of sites over three months to assess variation across three steps in data collection and reporting | Extracted a one-year dataset to survey the completeness of six data elements; randomized sampling; parallel collection of raw data by on-site audit of the original data; reconstructed an objective, quality-assured "gold standard" report dataset; all clinical sites were surveyed for data completeness, and 99 sites were sampled for data accuracy | Descriptive statistics, using charts, the average magnitude of deviation from expected, and concordance analysis between the reported data and the reconstructed dataset | Large sample size; randomized sampling technique; use of an objective, quality-assured "gold standard" report generated by on-site audit of the original data to evaluate the accuracy of data elements reported in the PHIS; set thresholds for accuracy and errors | Sources of data were not verified
Matheson 2012 [ ] * | Missing data, invalid data, data cleaning, data management processes | Not conducted | N/A | N/A | N/A | Lack of specific metrics
ME DQA 2008 [ ] | Accuracy, reliability, precision, completeness, timeliness, integrity, confidentiality | Comprehensive audit using quantitative and qualitative methods, including in-depth verifications at the service delivery sites and follow-up verifications at the next level | 4 methods for selecting sites: purposive selection, restricted site design, stratified random sampling, random sampling; the time period corresponding to the most recent relevant reporting period for the IS. Five types of data verification: description, documentation review, trace and verification (recount), cross-checks, spot-checks. Observation, interviews and conversations with key data quality officials were used to collect data | Descriptive statistics on the accuracy, availability, completeness and timeliness of reported data, including the verification ratio, the percentage for each dimension, and differences between cross-checks | Two protocols, 6 phases and 17 steps for the audit; sampling on a limited scale considering the resources available and the level of precision desired; purposive "case by case" selection of 2–4 indicators; on-site audit visits tracing and verifying results from source documents at each level of the PHIS | Confined to specific disease contexts and standard program-level output indicators
ME PRISM 2010 [ ] | Relevance: comparing data collected against management information needs. Completeness: filling in all data elements in the form; the proportion of facilities reporting in an administrative area. Timeliness: submission of the reports by an accepted deadline. Accuracy: comparing data between facility records and reports, and between facility reports and administrative area databases | Quantitative method: questionnaire survey covering data completeness and transmission, data accuracy checks, data processing and analysis, and the respondents' perceptions about the use of registers, data collection forms and information technology | Non-anonymous interviews with identified name and title, including asking, manual counting, observation and recording results or circling "yes or no" | A data entry and analysis tool (DEAT); results described in quantitative rather than qualitative terms; yes/no tick checklist | A diagnostic tool that measures strengths and weaknesses in three dimensions of data quality; quantitative terms help set control limits and targets and allow monitoring over time | Indicators are not all-inclusive; the tool should be adapted to a given context, pre-tested and adjusted
Pereira 2012 [ ] | Completeness and accuracy of data fields; errors | Quantitative and qualitative methods, using primary (multi-center randomized trial) and secondary (observational convenience sample) studies | Field visits to a sample of clinics within each PHU to assess barcode readability, method efficiency and data quality; 64 clinic staff, representing 65% of all inventory staff members in 19 of the 21 participating PHUs, completed a survey examining perceptions of the method | Descriptive statistics: a weighted analysis method, histograms, 95% confidence intervals, F-test, bootstrap method, the two-proportion z-test; p values adjusted using the Benjamini-Hochberg method for controlling the false discovery rate (FDR) | The first study of its kind in an immunization setting | Lack of representativeness across multiple lot numbers; inaccurate data entry was not examined; observations were based on a convenience sample
Petter and Fruhling 2011 [ ] | Checklist of system quality and information quality | Quantitative method using the DeLone & McLean IS success model; survey with a structured questionnaire | Online survey, facsimile, and mail, using a 7-point Likert scale for all quantitative questions; a response rate of 42.7% with representative demographics | Summative score for each construct; each hypothesis tested using simple regression; mean, standard deviation and Spearman's correlation coefficients for analysis | Demonstrates the need to consider the context of the medical information system when using frameworks to evaluate the system | Inability to assess some correlational factors due to the small PHIS user base
Ronveaux 2005 [ ] | Consistency: the ratio of verified indicator values compared with written documentation at health facilities and districts | Quantitative method using standardized data quality audits (WHO DQAs) in 27 countries | Recounted data compared with reported data | Descriptive statistics | A quantitative indication of reporting consistency and quality; facilitates comparison of results over time or place | Similar to the WHO DQA
Saeed 2013 [ ] | Completeness, validity, data management: calculation of missing data and illegal values (outside a predetermined range); data management (data collection, entry, editing, analysis and feedback) | Quantitative and qualitative methods, including interviews, consultation and documentation review | Interviews with 10 key informants among directors, managers and officers; 1 or 2 staff interviewed at national level; consultation with stakeholders; document review of each system's strategic plan, guidelines, manuals, annual reports and databases at national level | Predefined scoring criteria for attributes: poor, average, or good | Comparison of two PHIS | Purposive sampling
Savas 2009 [ ] | Sensitivity, specificity and the kappa coefficient for inter-rater agreement | Quantitative method: audited the data set using cross-linkage techniques | Databases were deterministically cross-linked using female sex and social security numbers; deterministic and probabilistic linkage methods were also compared | Descriptive statistics | Combined electronic databases provide nearly complete ascertainment for a specific dataset | Missing data would affect the results through under-ascertainment
Van Hest 2008 [ ] | Accuracy and completeness of reported cases | Quantitative method: audited the data set using record-linkage and capture-recapture techniques | Record linkage, false-positive record identification and correction, and capture-recapture analysis across 3 data sources using a core set of identifiers | Descriptive statistics: number, proportion and distribution of cases, 95% ACI (approximate confidence interval), Zelterman's truncated model | Record linkage of TB data sources and cross-validation with additional TB-related datasets improves data accuracy as well as completeness of case ascertainment | Imperfect record linkage and false-positive records; violation of the underlying capture-recapture assumptions
Venkatarao 2012 [ ] | Timeliness: percentage of reports received on time every week. Completeness: percentage of reporting units sending reports every week | Quantitative method: field survey (questionnaire) with a 4-stage sampling method | 2 study instruments: the first focused on the components of disease surveillance; the second assessed the ability of the study subjects to identify cases through a syndromic approach | Descriptive statistical analysis | Two instruments, surveying both users and the dataset | Not able to assess the quality of the data source, such as accuracy
WHO DQA 2003 [ ] | Completeness of reporting, report availability, timeliness of reporting, verification factor | Quantitative method to audit selected indicators in the dataset; multi-stage sampling from a stratified sample representing the country's PHIS | Recounted data compared with reported data | Descriptive statistics | A systematic methodology to describe data quality in the collection, transmission and use of information, and to provide recommendations to address problems | Sample size and precision dictated by logistical and financial considerations
WHO DQRC 2013 [ ] | Completeness of reporting; internal consistency of reported data; external consistency of population data; external consistency of coverage rates | Quantitative method: a desk review of available data and a data verification component at national and sub-national levels | An accompanying Excel-based data quality assessment tool | Simple descriptive statistics: percentage, standard deviation | Easy to calculate | Needs the WHO DQA to complement assessment of the quality of the data source
WHO HMN 2008 [ ] | Data-collection method, timeliness, periodicity, consistency, representativeness, disaggregation, confidentiality, data security, and data accessibility | Quantitative and qualitative methods using 63 of 197 questions among around 100 major stakeholders | Consensus development method: group discussions, a self-assessment approach, individual (fewer than 14) or group scoring to yield a percentage rating for each category | An overall score for each question; quartiles for the overall report | Expert panel discussion; operational indicators with quality assessment criteria | Sample size dictated by logistical and financial considerations
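Several audits in the table above validate system records against a gold standard via record linkage (e.g., the Ford 2007 and Clayton 2013 rows) and report sensitivity, specificity, PPV and NPV. A minimal sketch of that computation follows; the binary values are hypothetical.

```python
# Hypothetical binary values: 1 = condition recorded, 0 = not recorded.
system = [1, 1, 0, 0, 1, 0, 1, 0]  # values held in the PHIS
gold   = [1, 0, 0, 0, 1, 1, 1, 0]  # verified gold-standard values

tp = sum(s == 1 and g == 1 for s, g in zip(system, gold))  # true positives
fp = sum(s == 1 and g == 0 for s, g in zip(system, gold))  # false positives
fn = sum(s == 0 and g == 1 for s, g in zip(system, gold))  # false negatives
tn = sum(s == 0 and g == 0 for s, g in zip(system, gold))  # true negatives

sensitivity = tp / (tp + fn)  # 0.75
specificity = tn / (tn + fp)  # 0.75
ppv = tp / (tp + fp)          # positive predictive value, 0.75
npv = tn / (tn + fn)          # negative predictive value, 0.75
print(sensitivity, specificity, ppv, npv)
```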

Characteristics of the methods for assessment of data use reported in the 10 publications included in the review.

Authors, Year | Attributes and major measures | Study design | Data collection methods | Data analysis methods | Contribution | Limitations
Freestone 2012 [ ] | Trends in use: actioned requests from researchers in a set period of time | Analysis of actioned requests from researchers over a period of time | Abstracted data from the database for the study period | Trend analysis of the proportion of requests | Quantifiable measures | Limited attributes
Hahn 2013 [ ] | Use of data: the usage of aggregated data for monitoring, information processing, finance and accounting, and long-term business decisions | Qualitative methods: structured interviews with a purposive sample of 44 staff and in-depth interviews with 15 key informants | Structured survey and key-informant interviews to assess five structured statements, using five-point scales for each statement | Responses processed manually, classified and grouped by facility and staff class | Identified indicators of data use | Lack of quantifiable results for assessment of data use
Iguiñiz-Romero and Palomino 2012 [ ] | Data use: data dissemination; identifying whether data are used for decision-making; the availability of feedback mechanisms | Qualitative exploratory study including interviews and a review of documentation | Open-ended, semi-structured questionnaire interviews with 15 key decision-makers; review of national documents and academic publications | Interview data recorded, transcribed, and organized thematically and chronologically; respondents identified by position but not by name | Most respondents held key positions; a long period of publications was reviewed | Purposive sample lacking representativeness
Matheson 2012 [ ] | Clinical use of data: the number of summaries produced. Use of data for local activities to improve care. Data entry: the number of active sites. Report use: the percentage of active sites using prebuilt queries to produce data for each type of report in a given month over time | Qualitative and quantitative methods: key-informant interviews, documentation review, database queries | Personal interviews by phone and through internet telephony, followed up in person or by email; SQL queries run against the central database; external events identified by reviewing news reports and through the authors' personal knowledge | Descriptive statistics, using charts of the number of clinics using the system in a given month and the percentage of active clinics | Multiple methods | Lack of verification of the data source
ME PRISM 2010 [ ] | Checklist of use of information: report production, display of information, discussion and decisions about use of information, promotion and use of information at each level | Quantitative method to complete a predesigned checklist diagnostic tool | Checklist and non-anonymous staff interviews, including asking, manual counting, observation and recording results or circling "yes or no" | Two-point Likert scores and descriptive statistics | Quantitative terms help set control limits and targets and allow monitoring over time |
Petter and Fruhling 2011 [ ] | System use, intention to use, user satisfaction | Quantitative methods using the DeLone & McLean IS success model; survey respondents with a response rate of 42.7% and representative demographics | Online survey with a structured questionnaire using a 7-point Likert scale for all quantitative questions, in addition to facsimile and mail | Summative score for each construct; each hypothesis tested using simple regression, in addition to mean, standard deviation and Spearman's correlation coefficients | Use is dictated by factors outside the user's control and is not a reasonable measure of IS success; quality does not affect the depth of use | Lack of objective assessments
Qazi and Ali 2011 [ ] | Use of data: non-use, misuse, disuse of data | Descriptive qualitative interviews | In-depth, face-to-face, semi-structured interviews with an interview guide; 26 managers (all men, ages 26 to 49; selected from federal level (2), provincial level (4) and seven selected districts (20) from all four provinces) | Data transcription; analysis based on categorization of verbatim notes into themes and a general description of the experience emerging from the statements | A qualitative study allows getting close to the people and situations being studied; identified a number of hurdles to data use | Convenience sample; only one type of stakeholder was covered
Saeed 2013 | Usefulness of the system: data linked to action, feedback at lower levels, data used for planning, detecting outbreaks, and for the development and conduct of studies | Quantitative and qualitative methods, including interviews, consultation and documentation review | Interviews with 10 key informants; consultation with stakeholders; document review of each system | Predefined scoring criteria for attributes: poor, average, or good | Mixed methods | Purposive sampling
WHO HMN 2008 [ ] | Information dissemination and use: demand and analysis, policy and advocacy, planning and priority-setting, resource allocation, implementation and action | Mixed quantitative and qualitative methods using 10 of 197 questions among stakeholders at national and subnational levels | Group discussions (100 major stakeholders), a self-assessment approach, individual (fewer than 14) or group scoring to yield a percentage rating for each category | An overall score for each question; quartiles for the overall report | Expert panel discussion; operational indicators with quality assessment criteria | Lack of field verification of data use
Wilkinson and McCarthy | Extent of data recognition and use, strategies and routines, specific uses, dissemination | Quantitative and qualitative methods using standardized semi-structured telephone interviews of key informants from the management teams of the system | Telephone structured questionnaire interviews with 68 key informants from 29 of the 34 management teams of the networks; response options for most questionnaire items were yes/no or five- or seven-point Likert and semantic differential scales | Quantitative and qualitative analysis of the survey results; qualitative data transcribed, ordered by question number and common themes, then content-analyzed to obtain frequencies and percentages; correlational analyses used Pearson's r for parametric data and Spearman's rho for non-parametric data | Quantification of qualitative data | Statistical analysis limited by the sample size (only 29 networks and 68 individual participants); weak statistical power to detect effects; mainly general trends reported
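Several of the data-use studies above (e.g., the Petter and Fruhling 2011 row) roll Likert-scale survey items into a summative score per construct. The sketch below illustrates only the summation step, with hypothetical ratings; the regression analyses those studies report are not reproduced here.

```python
# Hypothetical 7-point Likert ratings: three respondents x three items
# for a single construct (e.g., "user satisfaction").
responses = [[6, 5, 7], [4, 4, 5], [7, 6, 6]]

summative = [sum(items) for items in responses]  # [18, 13, 19]
mean_score = sum(summative) / len(summative)     # 16.7
print(summative, round(mean_score, 1))
```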

Characteristics of the methods for assessment of data collection process reported in the 16 publications included in the review.

Authors, Year | Attributes and major measures | Study design | Data collection methods | Data analysis methods | Contribution | Limitations
Ancker 2011 [ ] | Group discussion about root causes of poor data quality and strategies for solving the problems | Qualitative method: focus group discussion | A series of weekly team meetings over about 4 months with key informants involved in the data collection | Thematic grouping of each data quality issue | Initiated by and related to identified poor data quality issues | Implicitly focused; only analyzed causes, did not assess their magnitude
Bosch-Capblanch 2009 [ ] | Quality scores: recording and reporting of data, keeping of vaccine ledgers, and information system design | Quantitative method: user survey based on the WHO DQA, with a multistage weighted representative sampling procedure | Questionnaire based on a series of 19 questions and observations undertaken at each level (national, district and health units) | 1 point per question; average score, summary score, medians, inter-quartile ranges, confidence intervals, p value, bubble scatter chart, rho value | Combined with data quality | Implicitly focused; the number of questions surveyed was smaller than in the WHO DQA
CIHI 2009 [ ] | Metadata documentation: data holding description, methodology, data collection and capture, data processing, data analysis and dissemination, data storage, and documentation | Quantitative method: user survey | Questionnaire | Undefined | 7 categories, with subcategories and definitions and/or examples | Implicitly focused
Corriols 2008 [ ] | Identification of under-reporting reasons by reviewing the information flow chart and non-reporting by physicians | Qualitative method: documentation review | Review of the national reports on the system related to deficiencies in the information flow chart and non-reporting by physicians | Undefined | Initiated by identified data quality issues | Implicitly focused
Dai 2011 [ ] | Data collection, data quality management, statistical analysis and data dissemination | Qualitative method: documentation review | Document review | Thematic grouping | Desk review | Implicitly focused
Forster 2008 | Routine data collection, training and data quality control | Quantitative method: online survey | Questionnaire | Descriptive statistics | Examined associations between site characteristics and data quality | Implicitly focused; convenience sample
Freestone 2012 [ ] | Data collection and recording processes | Qualitative method: review of current processes for the identification, coding and geocoding of address or location data; staff consulted to establish and observe coder activities and entry processes | Review of the processes; consultation with staff; observation of coder activities and entry processes to identify potential causes of errors, which were then grouped thematically | Thematic grouping of data | Identified that each of the key elements of the geocoding process is a factor affecting geocoding quality | Differences in software and system settings need to be taken into account
Hahn 2013 [ ] | Data flow: the generation and transmission of health information | Qualitative method: workplace walkthroughs on 5 consecutive working days at each site | Informal observations of the generation and transmission of health information of all kinds for the selection of data flows | Undefined | Observation through walkthroughs | Undefined indicators
Iguiñiz-Romero and Palomino 2012 [ ] | Data flow or data collection process: data collectors, frequencies, data flow, data processing and sharing | Qualitative exploratory study including interviews and documentation review | Open-ended, semi-structured questionnaire interviews with 15 key decision-makers; review of national documents and academic publications | Data recorded, transcribed, and organized thematically and chronologically | Most respondents held key positions; a long period of publications was reviewed | Purposive sample
Lin 2012 [ ] | Data collection and reporting | Qualitative methods based on CDC's Guidelines | Review of guidelines and protocols using a detailed checklist; direct observation; focus group discussions and semi-structured interviews | Thematic grouping | Field visits and observations of data collection to identify impacts on data quality | Undefined indicators
ME DQA 2008 [ ] | Five functional areas: M&E structures, functions and capabilities; indicator definitions and reporting guidelines; data collection and reporting forms and tools; data management processes; and links with the national reporting system | Quantitative and qualitative methods: 13 system assessment summary questions based on 39 questions from the five functional areas; the system scored in combination with a comprehensive audit of data quality | Off-site desk review of documentation provided by the program/project; on-site follow-up assessments at each level of the IS, including observation, interviews, and consultations with key informants | Summary statistics based on the judgment of the audit team; three-point Likert scale for each response; average scores per site on a continuous scale from 0 to 3 | DQA protocol and system assessment protocol | Implicitly focused; the scores should be interpreted within the context of the interviews, documentation reviews, data verifications and observations made during the assessment
ME PRISM 2010 [ ] | Processes: data collection, transmission, processing, analysis, display, quality checking, feedback | Quantitative method: questionnaire survey covering data transmission, quality checking, processing and analysis, and the respondents' perceptions about the use of registers, data collection forms and information technology | Non-anonymous staff interviews with identified name and title, including asking, observation and circling "yes or no" | A data entry and analysis tool (DEAT); results described in quantitative rather than qualitative terms; yes/no tick checklist | A diagnostic tool; quantitative terms help set control limits and targets and allow monitoring over time | Indicators are not all-inclusive; the tool should be adapted, pre-tested and adjusted
Ronveaux 2005 [ ] | Quality index (QI): recording practices, storing/reporting practices, monitoring and evaluation, denominators used at district and national levels, and system design at national level | Quantitative and qualitative methods: external on-site evaluation after multi-stage sampling, based on the WHO DQA | Questionnaires and observations: surveys at national level (53 questions), district level (38 questions) and health-unit level (31 questions); workers at the health-unit level were observed and asked to complete 20 hypothetical practices | Descriptive statistics (aggregated scores, mean scores): 1 point per question or task observed; correlational analyses using zero-order Pearson correlation coefficients | | Implicitly focused; the chosen sample size and the precision of the results were dictated by logistical and financial considerations
Venkatarao 2012 [ ] | Accuracy of case detection, data recording, data compilation, data transmission | Quantitative method: field survey (questionnaire) using a 4-stage sampling method, conducted during May-June 2005 among 178 subjects | Questionnaires from 2 study instruments: the first focused on the components of disease surveillance; the second assessed the ability of the study subjects to identify cases through a syndromic approach | Descriptive statistical analysis | Assessment from the user's viewpoint | Implicitly focused; lack of field verification of the data collection process
WHO DQA 2003 [ ] | Quality questions checklist, quality index: five components: recording practices, storing/reporting practices, monitoring and evaluation, denominators, system design (the receipt, processing, storage and tabulation of the reported data) | Quantitative and qualitative method using questionnaire checklists for each of the three levels of the system (national, district, health unit), with 45, 38 and 31 questions respectively | Questionnaires and discussions; observations by walking around the health unit for field observation to validate the reported values | Percentage of items answered "yes"; the target is 100% for each component | Describes the quality of data collection and transmission | Implicitly focused; the chosen sample size was dictated by logistical and financial considerations
WHO HMN 2008 [ ] | Data management or metadata: a written set of procedures for data management including data collection, storage, cleaning, quality control, analysis and presentation for users; an integrated data warehouse; a metadata dictionary; unique identifier codes available | Mixed quantitative and qualitative methods using 5 of 197 questions at various national and subnational levels | Group discussions with around 100 major stakeholders, a self-assessment approach, individual (fewer than 14) or group scoring to yield a percentage rating for each category | An overall score for each question; quartiles for the overall report | Expert panel discussion; operational indicators with quality assessment criteria | Lack of field verification of the data collection process
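As a minimal illustration of the checklist scoring in the WHO DQA rows above, the sketch below computes the percentage of "yes" answers per component against the 100% target. The components and answers are hypothetical.

```python
# Hypothetical yes/no answers per checklist component.
checklist = {
    "recording practices": [True, True, False, True],
    "storing/reporting practices": [True, False, False],
}

for component, answers in checklist.items():
    score = 100 * sum(answers) / len(answers)  # percentage of "yes" answers
    print(f"{component}: {score:.0f}% (target 100%)")
```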

Author Contributions

PY conceptualized the study. HC developed the conceptual framework with the guidance of PY, and carried out the design of the study with all co-authors. HC collected the data, performed the data analysis and appraised all included papers as part of her PhD studies. PY reviewed the papers included and the data extracted. PY, DH and NW discussed the study; all participated in the synthesis processes. HC drafted the first manuscript. All authors made intellectual input through critical revision to the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.


DAS Slides: Data Quality Best Practices


About the Webinar

Tackling data quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many data quality problems extend across, and often beyond, an organization. Addressing these issues requires a holistic architectural approach combining people, process and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control data quality issues in your organization.

About the Speakers

Donna Burbank

Managing Director, Global Data Strategy, Ltd


Donna Burbank is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Ltd, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored several books on data management and is a regular contributor to industry publications. She can be reached at [email protected], and you can follow her on Twitter @donnaburbank.

Nigel Turner

Principal Information Management Consultant/EMEA, Global Data Strategy, Ltd


Nigel Turner has over 20 years of experience in Information Management (IM), with specialization in Information Strategy, Data Quality, Data Governance, and Master Data Management. He has created and led large IM & CRM consultancy and delivery practices in multiple consulting organizations, including British Telecommunications Group (BT), IPL, and FHO. Nigel also has experience in the data quality tools space as Vice President of Information Management Strategy at Harte Hanks Trillium Software, a leading global provider of Data Quality & Data Governance tools and consultancy, where he engaged with over 150 customer organizations from all parts of the globe. Nigel is a well-known thought leader in Information Management and has presented at many international conferences, in addition to writing numerous white papers and blogs on Information Management topics. Nigel provides education across the IM community, having lectured at Cardiff University on Data Governance and as an active member of DAMA International's mentoring program, which he was instrumental in founding. He can be reached at [email protected].



Collidu: Data Quality Template for PowerPoint and Google Slides (16 Editable Slides)

If you wish to deliver a compelling presentation on how Data Quality impacts organizational efficiency in today's data-driven landscape, our presentation template is the perfect solution. Lay your hands on this deck now and present important aspects related to this topic in a meaningful and impactful manner. Available for MS PowerPoint and Google Slides!

Quality experts and data scientists will find this deck helpful in showcasing the characteristics and elements of data quality. You can depict the key areas of data quality management. You can also demonstrate the stepwise process of improving the quality of data. Visualize the key factors for data quality and how they contribute to effective data management. Besides this, you can illustrate the process of data quality assessment.



Assessment of data quality


Presentation Transcript

Assessment of data quality. Mirza Muhammad Waqar. Contact: [email protected], +92-21-34650765-79 ext. 2257. RG610. Course: Introduction to RS & DIP.

Contents
  • Hard vs Soft Classification
  • Supervised Classification
  • Training Stage
  • Field Truthing
  • Inter-class vs Intra-class Variability
  • Classification Stage
  • Minimum Distance to Mean Classifier
  • Parallelepiped Classifier
  • Maximum Likelihood Classifier
  • Output Stage
  • Supervised vs Unsupervised Classification

Positional and Attribute Accuracies
  • Positional and attribute accuracies are the most critical factors in determining the quality of geographic data.
  • They can be quantified by checking sample data (a portion of the whole data set) against reference data.
  • The concepts and methods of spatial data quality are applicable to both raster and vector data.

Evaluation of Positional Accuracy
  • Made up of two elements: planimetric accuracy and height accuracy.
  • Planimetric accuracy is assessed by comparing the coordinates (x and y) of sample points on maps to the coordinates (x and y) of corresponding reference points.
  • Height accuracy involves comparison of elevation values of sample and reference data points.

Reference Data
  • To be used as a sample point, a point must be well defined, meaning it can be unambiguously identified both on the map and on the ground. Suitable points include:
  • Survey monuments
  • Bench marks
  • Road intersections
  • Corners of buildings
  • Lampposts
  • Fire hydrants, etc.

Reference Data
  • It is important for both the sample and reference data to be in the same map projection and based on the same datum.
  • The Accuracy Standards for Large-Scale Maps, however, specify that:
  • A minimum of 20 check points must be established throughout the area covered by the map.
  • These sample points should be spatially distributed such that at least 20% of the points are located in each quadrant of the map,
  • with individual points spaced at intervals equal to at least 10% of the diagonal of the map sheet.
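The quadrant rule lends itself to a quick automated check. Below is a minimal Python sketch (not from the slides; the function name, inputs and map-extent handling are hypothetical, and the 10%-of-diagonal spacing rule is not checked here):

    import numpy as np

    def meets_distribution_standard(points, x_range, y_range):
        # points: (n, 2) array of x, y check-point coordinates;
        # x_range, y_range: (min, max) extent of the map sheet.
        pts = np.asarray(points, dtype=float)
        n = len(pts)
        if n < 20:                      # at least 20 check points overall
            return False
        x_mid = (x_range[0] + x_range[1]) / 2.0
        y_mid = (y_range[0] + y_range[1]) / 2.0
        # at least 20% of the points must fall in each quadrant of the map sheet
        for in_x in (pts[:, 0] < x_mid, pts[:, 0] >= x_mid):
            for in_y in (pts[:, 1] < y_mid, pts[:, 1] >= y_mid):
                if np.sum(in_x & in_y) < 0.2 * n:
                    return False
        return True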

Standard for taking sample points (diagram slide)

Root Mean Square Error

The discrepancies between the coordinate values of the sample points and their corresponding reference coordinate values are used to compute the overall accuracy of the map, as represented by the root-mean-square error (RMSE). The RMSE is defined as the square root of the average of the squared discrepancies. The RMSEs for discrepancies in the X coordinate direction (rmsx), the Y coordinate direction (rmsy) and elevation (rmsz) are computed from:

rmsx = sqrt(Σ dx² / n)
rmsy = sqrt(Σ dy² / n)
rmsz = sqrt(Σ e² / n)

RMS for discrepancies

where
  • dx = discrepancy in the X coordinate direction = Xreference – Xsample
  • dy = discrepancy in the Y coordinate direction = Yreference – Ysample
  • e = discrepancy in elevation = Ereference – Esample
  • n = total number of points checked (sampled)

From rmsx and rmsy, a single RMSE of planimetry (rmsp) can be computed as follows:

rmsp = sqrt(rmsx² + rmsy²)
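Taken together, the formulas above reduce to a few lines of NumPy. A minimal sketch (hypothetical helper name; assumes the sample and reference points are already matched one-to-one):

    import numpy as np

    def rmse_components(sample_xy, ref_xy, sample_z=None, ref_z=None):
        sample_xy, ref_xy = np.asarray(sample_xy), np.asarray(ref_xy)
        dx = ref_xy[:, 0] - sample_xy[:, 0]           # X discrepancies
        dy = ref_xy[:, 1] - sample_xy[:, 1]           # Y discrepancies
        rms_x = np.sqrt(np.mean(dx ** 2))
        rms_y = np.sqrt(np.mean(dy ** 2))
        rms_p = np.sqrt(rms_x ** 2 + rms_y ** 2)      # single planimetric RMSE
        if sample_z is None:
            return rms_x, rms_y, rms_p
        e = np.asarray(ref_z) - np.asarray(sample_z)  # elevation discrepancies
        return rms_x, rms_y, rms_p, np.sqrt(np.mean(e ** 2))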

RMS as Overall Accuracy

The RMSEs of planimetry and elevation are now generally accepted as the overall accuracy of the map. The RMSE is used as the index to check against specific standards to determine the fitness for use of the map. The major drawback of the RMSE is that it provides information on only the overall accuracy. It does not give any indication of the spatial variation of the errors.

RMS as Overall Accuracy
  • For users who require such information, a map showing the positional discrepancies at the sample points can be generated.
  • Separate maps can be generated for discrepancies in easting and northing.
  • Alternatively, a map showing the vectors of the discrepancies at each point can be plotted.

Evaluation of Attribute Accuracy
  • Attribute accuracy is obtained by comparing values of sample spatial data units with reference data obtained either by field checks or from sources of data with a higher degree of accuracy.
  • These sample spatial units can be raster cells; raster image pixels; or sample points, lines, and polygons.

Error Matrix
  • An error matrix is constructed to show the frequency of discrepancies between encoded values (i.e., data values on a map or in a database) and their corresponding actual or reference values for a sample of locations.
  • The error matrix has been widely used as a method for assessing the classification accuracy of remotely sensed images.

Error/Confusion Matrix • An error matrix, also known as classification error matrix or confusion matrix, is a square array of values, which cross-tabulates the number of sample spatial data units assigned to a particular category relative to the actual category as verified by the reference data.

Error Matrix
  • Conventionally, the rows of the error matrix represent the categories of the classification of the database, while the columns indicate the classification of the reference data.
  • In the error matrix, the element ij represents the frequency of spatial data units assigned to category i that actually belong to category j.
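Under that convention, building the matrix is a simple cross-tabulation. A minimal sketch (hypothetical function name and labels):

    import numpy as np

    def error_matrix(encoded, reference, categories):
        # rows = encoded (map/database) categories, columns = reference categories
        index = {c: k for k, c in enumerate(categories)}
        m = np.zeros((len(categories), len(categories)), dtype=int)
        for enc, ref in zip(encoded, reference):
            m[index[enc], index[ref]] += 1   # element ij: assigned to i, actually j
        return m

    # hypothetical usage with three classes:
    m = error_matrix(list("AABCC"), list("ABBCC"), categories=["A", "B", "C"])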

An Error Matrix (image slide). Legend: A = Exposed soil, B = Cropland, C = Range, D = Sparse woodland, E = Forest, F = Water body.

Error Matrix
  • The numbers along the diagonal of the error matrix (i.e. when i = j) indicate the frequencies of correctly classified spatial data units in each category; the off-diagonal numbers (when i ≠ j) represent the frequencies of misclassification in the various categories.

Error Matrix
  • The error matrix is an effective way to describe the attribute accuracy of geographic data.
  • If, in a particular error matrix, all the nonzero entries lie on the diagonal, it indicates that no misclassification at the sample locations has occurred and an overall accuracy of 100% is obtained.

Commission or Omission • When misclassification occurs, it can be identified either as an error of commission or an error of omission. • Any misclassification is simultaneously an error of commission and an error of omission.

Error of Commission and Omission • Errors of commission, also known as errors of inclusion, are defined as wrongful inclusion of a sample location in a particular category due to misclassification. • When this happens, it means that the same sample location is omitted from another category in the reference data, which is an error of omission.

Commission vs Omission
  • Errors of commission are identified by the off-diagonal values across the rows.
  • Errors of omission, also known as errors of exclusion, are identified by the off-diagonal values down the columns.

An Error Matrix: Error of Commission (image slide; same legend as above).

An Error Matrix: Error of Omission (image slide; same legend as above).

Indices to check Accuracy
  • In addition to the interpretation of errors of commission and omission, the error matrix may also be used to compute a series of descriptive indices to quantify the attribute accuracy of the data. These include:
  • Overall accuracy
  • Producer's accuracy
  • User's accuracy

Overall Accuracy • The PCC (Percent Correctly Classified) index represents the overall accuracy of the data. • In the case of simple random sampling, the PCC is defined as the trace of the error matrix (i.e., the sum of the diagonal values) divided by n, the total number of sample locations.

Overall Accuracy

PCC = (Sd / n) * 100%

where
  • Sd = sum of values along the diagonal
  • n = total number of sample locations

PCC – Overall Accuracy (worked example; legend: A = Exposed soil, B = Cropland, C = Range, D = Sparse woodland, E = Forest, F = Water body)

PCC = (1+5+5+4+4+1) x 100 / 35 = 20 x 100 / 35 = 57.1%
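In code, the PCC is one line once the error matrix exists. A sketch with a hypothetical 3-class matrix (not the matrix from the slide, whose off-diagonal entries are not shown); note that any matrix with the slide's diagonal sum of 20 and n = 35 returns the same 57.1%:

    import numpy as np

    def pcc(matrix):
        # trace (sum of the diagonal) divided by total sample locations, times 100
        m = np.asarray(matrix)
        return np.trace(m) / m.sum() * 100.0

    example = np.array([[8, 1, 1],
                        [2, 7, 1],
                        [0, 2, 8]])      # hypothetical counts
    print(f"PCC = {pcc(example):.1f}%")  # 23 / 30 -> 76.7%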

Overall Accuracy • The maximum value of the PCC index is 100 when there is perfect agreement between the database and the reference data. The minimum value is 0, which indicates no agreement.

Deficiencies in PCC index • In the first place, since the sample points are randomly selected, the index is sensitive to the structure of the error matrix. This means that if one category of data dominates the sample (this occurs when the category covers a much larger area than others), the PCC index can be quite high even if the other classes are poorly classified.

Deficiencies in PCC index • Second, the computation of the PCC index does not take into account the chance agreements that might occur between sample and reference data. The index therefore always tends to overestimate the accuracy of the data. • Third, the PCC index does not differentiate between errors of omission and commission. Indices of these two types of errors are provided by the producer's accuracy and the user's accuracy.

Producer’s Accuracy • This is the probability of a sample spatial data unit being correctly classified and is a measure of the error of omission for the particular category to which the sample data belong. • The producer's accuracy is so-called because it indicates how accurate the classification is at the time when the data are produced.

Producer’s Accuracy

Producer’s accuracy is computed by:

Producer’s accuracy = (Ci / Ct) * 100

where
  • Ci = correctly classified sample locations in the column
  • Ct = total number of sample locations in the column

Error of omission = 100 – producer’s accuracy

User’s Accuracy • This is the probability that a spatial data unit classified on the map or image actually represents that particular category on the ground. • This index of attribute accuracy, which is actually a measure of the error of commission, is of more interest to the user than the producer of the data.

User’s Accuracy

User’s accuracy is computed by:

User’s accuracy = (Ri / Rt) * 100

where
  • Ri = correctly classified sample locations in the row
  • Rt = total number of sample locations in the row

Error of commission = 100 – user’s accuracy

An Error Matrix (worked example)

PCC = (1+5+5+4+4+1) x 100 / 35 = 57.1%

Producer’s accuracy: A = 1/1 = 100%; B = 5/10 = 50%; C = 5/9 = 55.6%; D = 4/7 = 57.1%; E = 4/7 = 57.1%; F = 1/1 = 100%

User’s accuracy: A = 1/3 = 33.3%; B = 5/10 = 50%; C = 5/9 = 55.6%; D = 4/8 = 50%; E = 4/4 = 100%; F = 1/1 = 100%

(A = Exposed soil, B = Cropland, C = Range, D = Sparse woodland, E = Forest, F = Water body)
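Both per-class indices divide the diagonal by the corresponding marginal totals. A minimal sketch following the row/column convention above (hypothetical function name; the slide's full matrix is not reproduced, since its off-diagonal entries are not shown):

    import numpy as np

    def producers_users_accuracy(matrix):
        m = np.asarray(matrix, dtype=float)
        diag = np.diag(m)
        producers = diag / m.sum(axis=0) * 100.0   # correct / column (reference) total
        users = diag / m.sum(axis=1) * 100.0       # correct / row (map) total
        return producers, users

    # errors of omission   = 100 - producers
    # errors of commission = 100 - users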

Kappa Coefficient (κ)
  • Another useful analytical technique is the computation of the kappa coefficient, or Kappa Index of Agreement (KIA).
  • It is capable of controlling the tendency of the PCC index to overestimate by incorporating all the off-diagonal values in its computation.
  • The use of the off-diagonal values in the computation of the kappa coefficients also makes them useful for testing the statistical significance of the differences between different error matrices.

Kappa Coefficient (κ)
  • The coefficient, first developed by Cohen (1960) for nominal-scale data, is:

K = (Po – Pc) / (1 – Pc)

  • Po is the proportion of agreement between the reference and sample data (the PCC expressed as a proportion); Pc is the proportion of agreement expected by chance, computed from the row and column marginals.
  • The kappa coefficient varies from a maximum of 1 (perfect agreement) down to 0 (agreement no better than chance); negative values indicate agreement worse than chance.
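A minimal NumPy sketch of this computation, with Pc taken from the row and column marginals in the standard way (hypothetical function name):

    import numpy as np

    def kappa(matrix):
        m = np.asarray(matrix, dtype=float)
        n = m.sum()
        p_o = np.trace(m) / n                                  # observed agreement (Po)
        p_c = np.sum(m.sum(axis=0) * m.sum(axis=1)) / n ** 2   # chance agreement (Pc)
        return (p_o - p_c) / (1.0 - p_c)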

Tau Coefficient
  • The kappa coefficient tends to overestimate the agreement between data sets.
  • Foody (1992) described a modified kappa coefficient, based on equal probability of group membership, that resembles, and is more properly derived from, the tau coefficient.

Tau Coefficient

τ = (Po – Pr) / (1 – Pr)

  • It was demonstrated that the tau coefficient, which is based on the a priori probabilities of group membership, provides an intuitive and relatively more precise quantitative measure of classification accuracy than the kappa coefficient, which is based on the a posteriori probabilities.
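A companion sketch for tau (hypothetical function name). With no priors supplied it uses the equal-probability form Pr = 1/q for q classes; the unequal-priors branch follows one common formulation that weights the a priori probabilities by the reference-column proportions, which should be checked against the cited literature before use:

    import numpy as np

    def tau(matrix, priors=None):
        m = np.asarray(matrix, dtype=float)
        n, q = m.sum(), m.shape[0]
        p_o = np.trace(m) / n                    # observed agreement (Po)
        if priors is None:
            p_r = 1.0 / q                        # equal a priori class probabilities
        else:
            p_r = float(np.dot(priors, m.sum(axis=0) / n))
        return (p_o - p_r) / (1.0 - p_r)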

Questions & Discussion



Association between different levels of suppressed viral load and the risk of sexual transmission of HIV among serodiscordant couples on antiretroviral therapy: a protocol for a two-step systematic review and individual participant data meta-analysis

(BMJ Open, Volume 14, Issue 8)

Pascal Djiadeu 1, Housne Begum 1, Chris Archibald 1, Taline Ekmekjian 2, Giovanna Busa 1, Jeffery Dansoh 1, Phu Van Nguyen 1, Annie Fleurant 1

1 Sexually Transmitted and Blood Borne Infections Surveillance Division, Public Health Agency of Canada, Ottawa, Ontario, Canada
2 PHAC Library, Office of the Chief Science Officer, Public Health Agency of Canada, Ottawa, Ontario, Canada

Correspondence to Dr Pascal Djiadeu; sti.secretariat-its{at}phac-aspc.gc.ca

Introduction HIV is a major global public health issue. The risk of sexual transmission of HIV in serodiscordant couples when the partner living with HIV maintains a suppressed viral load of <200 HIV RNA copies/mL has been found in systematic reviews to be negligible. A recent systematic review reported a similar risk of transmission for viral loads <1000 copies/mL, but quantitative transmission risk estimates were not provided. Precise estimates of the risk of sexual transmission at sustained viral load levels between 200 copies/mL and 1000 copies/mL remain a significant gap in the literature.

Methods and analysis A systematic search of various electronic databases for articles written in English or French will be conducted from January 2000 to October 2023, including MEDLINE, Embase, the Cochrane Central Register of Controlled Trials via Ovid and Scopus. The first step of the two-step meta-analysis will consist of a systematic review along with a meta-analysis, and the second step will use individual participant data for meta-analysis. Our primary outcome is the risk of sexual HIV transmission in serodiscordant couples where the partner living with HIV is on antiretroviral therapy. Our secondary outcome is the dose-response association between different levels of viral load and the risk of sexual HIV transmission. We will ascertain the risk of bias using the Risk Of Bias in Non-randomised Studies of Interventions (ROBINS-I) and Quality in Prognostic Studies (QUIPS) tools, the risk of publication bias using funnel plots and Egger’s test, and heterogeneity using I². A random-effects model will estimate the pooled incidence of sexual HIV transmission, and multivariate logistic regression will be used to assess the viral load dose-response relationships. The Grading of Recommendations, Assessment, Development and Evaluation system will determine the certainty of evidence.

Ethics and dissemination The meta-analysis will be conducted using deidentified data. No human subjects will be involved in the research. Findings will be disseminated through peer-reviewed publications, presentations and conferences.

PROSPERO registration number CRD42023476946.

  • HIV and AIDS
  • Sexually Transmitted Disease
  • Public health
  • Systematic Review
  • Epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2023-082254


STRENGTHS AND LIMITATIONS OF THE STUDY

The proposed individual participant data (IPD) meta-analysis will be conducted using raw data from individual participants with advantages, including greater quantity of data, more flexibility in analytic approaches, the ability to conduct subgroup analyses and improved ability to detect and address biases.

This innovative two-step meta-analysis will still collect and synthesise evidence and answer the research questions even if the IPD part is not feasible.

Studies collected in the review may have differences in the timing and frequency of viral load testing, adherence to antiretroviral therapy and patient follow-up, causing imprecision within the data.

There may be insufficient data across the full range of viral load levels to fully assess the association and/or potential lack of agreement by authors to share data for the IPD meta-analysis.

Introduction

Globally, an estimated 39 million people are currently living with HIV (PLHIV), of whom 29.8 million (76%) are on treatment and 21.2 million (54% of all PLHIV and 71% of PLHIV on treatment) are living with suppressed HIV. 1 Antiretroviral therapy (ART) can improve the lives of PLHIV and help protect their sexual partners from sexual HIV transmission. People who are on HIV treatment can achieve an undetectable viral load with effectively no risk of transmitting HIV to their sexual partners. 2 This concept is referred to as Undetectable equals Untransmittable, or U=U, 3 and it was initiated in 2016 by the Prevention Access Campaign, a health equity initiative with the goal of ending the HIV/AIDS pandemic and associated HIV-related stigma. 4 The U=U concept is based on a substantial body of scientific evidence demonstrating that for PLHIV who have achieved a sustained suppressed and undetectable viral load, there is effectively no risk of sexual HIV transmission. 5 Furthermore, treatment as prevention is one of the effective strategies to prevent HIV transmission, with high uptake of ART suggested as an effective approach to reduce HIV incidence. 6 7

A systematic review and meta-analysis published by the Public Health Agency of Canada (PHAC) in 2018, concluded, using criteria defined by the Canadian AIDS Society framework to characterise HIV transmission risk, 8 that the risk of sexual transmission of HIV is negligible when the PLHIV is on ART with a suppressed viral load of <200 copies of HIV RNA/ml with consecutive testing every 4–6 months. 2 A rapid review published by the Canadian Agency for Drugs and Technologies in Health (CADTH) in 2023 as well as a 2023 PHAC rapid communication confirmed these findings, with the PHAC report providing an estimated risk of HIV sexual transmission of 0.00 transmissions per 100 person-years (95% CI 0.00 to 0.10) in this specific situation. 9 10 In 2023, a systematic review by Broyles et al 11 concluded that the risk of sexual transmission of HIV is almost zero when the PLHIV is under ART and has a suppressed viral load of <1000 copies of HIV RNA, 11 but no quantitative risk estimate was calculated. Furthermore, the WHO concluded in its 2023 policy brief that PLHIV who have a suppressed but detectable viral load have almost zero or negligible risk of sexual transmission of HIV to their partner as long as they continue to take their ART as prescribed. 12 The WHO also revised the operational definition for undetected viral load from ‘≤ 50 copies/ml’ to ‘not detected by the test or sample type used’ and suppressed viral load from ‘≤200 copies/ml’ to ‘≤1000 copies/ml’ and recommended a viral suppression threshold of 1000 copies/mL because persistent viral load levels above 1000 copies/mL are associated with treatment failure. 12

Most of the literature demonstrating that a suppressed, undetectable viral load is associated with effectively no risk of sexual HIV transmission uses a viral load threshold of 200 copies/mL. 3 5 Precise estimates of the risk of sexual transmission at sustained viral load levels between 200 and 1000 copies/mL remain a significant gap in the literature. Addressing this gap by quantifying these risks is needed to evaluate the strength of the association between different viral load levels and the risk of HIV transmission, and to better understand considerations of viral load levels with respect to HIV treatment and prevention programmes.

The primary objective of this review is to quantify the risk estimate of HIV transmission and determine the association between different levels of viral load (primarily in the range 200–1000 copies/mL) and the risk of sexual HIV transmission among serodiscordant couples where the PLHIV is on ART.

The specific hypotheses include: (a) there will be a significant difference in the risk of sexual HIV transmission between viral load levels and (b) there will be a dose-response relationship between different viral load levels (200 copies/mL, 400 copies/mL, 1000 copies/mL or >1000 copies/mL) and risk of sexual HIV transmission.

Research questions

Q1: What is the risk of sexual transmission of HIV with a suppressed viral load <1000 copies/mL and at different levels of viral load >1000 copies/mL?

Q1.1: What is the risk of sexual HIV transmission in serodiscordant couples when the PLHIV is on ART with different levels of suppressed viral load between 200 and 1000 copies/mL (new potential evidence on the risk of HIV transmission with viral load <200 copies/mL will also be assessed and reported, if available)?

Q1.2: What is the risk of sexual HIV transmission in serodiscordant couples when the PLHIV is on ART with different levels of viral load >1000 copies/mL?

Q2: Is there a dose-response association between different levels of viral load and the risk of sexual HIV transmission?

Methods and analysis

Patient and public involvement

Neither patients nor the public were involved in designing this meta-analysis protocol.

Protocol guidance and registration

This systematic review will follow a two-step meta-analysis approach. First, a systematic review and meta-analysis will be conducted. Second, an individual participant data (IPD) meta-analysis will be performed if feasible. IPD is considered the gold standard of reviews and has several advantages compared with aggregate-data systematic reviews and meta-analyses. These advantages include a greater quantity of data, the ability to standardise outcomes across trials, more flexibility in analytic approaches, the ability to conduct subgroup/moderator analyses and an enhanced ability to detect and address biases. 13 This protocol is based on the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol (PRISMA-P) statement 14 (online supplemental appendix table 1). This systematic review and meta-analysis will follow the methodology outlined in the Cochrane Handbook for Systematic Reviews of Interventions. 15 16 The reporting of results will follow the PRISMA 2020 and PRISMA-IPD meta-analysis guidelines (online supplemental appendix tables 1, 2). 17 18 The IPD meta-analysis will use data from studies already published or ongoing on this topic. Although such an approach would produce the best theoretical result, there are some limitations with this method, 19 namely the potential for insufficient data across the full range of viral load levels and/or a potential lack of agreement by authors to share such data. Based on the results of full-text screening, we will select collaborators for IPD requests. If the proposed IPD meta-analysis is not possible, the systematic review will assess the extent to which these research questions can be answered from the existing published literature alone.

Supplemental material

Protocol registration

This study was registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 11 November 2023 under registration number CRD42023476946. Any future changes or modifications to the review procedures will be documented and updated in the PROSPERO registration.

Eligibility criteria and type of study

Original studies (randomised controlled trials and non-randomised studies), case reports and conference abstracts will be included if they report on longitudinal studies of couples with one partner living with HIV, document the number of HIV infections in previously seronegative sexual partners and provide information about viral load levels in the HIV-seropositive partner and/or use of ART. Studies that report any HIV infections in the seronegative partner will need to link those infections to the partner living with HIV through phylogenetic analysis, to rule out infection from outside the couple. Considering the difficulty of individualised randomisation in public health interventions, cluster randomised controlled trials (RCTs) and quasi-experimental studies with self-controls will also be considered for inclusion. Studies reporting a sex partner living with HIV who takes ART and has a viral load measurement will be included. Articles written in English or French, retrieved from electronic English and French databases with full-text access and published between 1 January 2000 and October 2023, will be included. 20 Studies involving condom use or pre-exposure prophylaxis will be excluded. Studies where HIV is not primarily transmitted through sex will also be excluded. Reviews, editorials, letters and conference proceedings without detailed results will be excluded. Search types and patterns are featured in online supplemental appendix 2.

Participants, type of interventions and outcomes of interest


Information sources and search strategy

A comprehensive and systematic search of the following databases will be conducted: MEDLINE, Embase, the Cochrane Central Register of Controlled Trials via Ovid and Scopus. The search strategy, developed by a health information professional in collaboration with the other authors, uses text words and relevant indexing to identify studies on viral load, ART and transmission of HIV between serodiscordant couples. The MEDLINE search strategy (see Appendix) will be applied to all databases with appropriate modifications. The search will be limited to publications from January 2000 to October 2023.

In addition, a thorough examination will be performed of the reference lists of identified relevant studies, experts in the field of HIV sexual transmission will be contacted to identify any additional studies or results, and ClinicalTrials.gov and International Clinical Trials Registry Platform will be examined to identify planned, ongoing or unpublished trials. To retrieve any grey literature, Google Scholar and Baidu Scholar will also be searched. Clinical trial registries will also be searched, including the US National Institutes of Health’s clinicaltrials.gov and Health Canada’s Clinical Trials Database. Search types and patterns are featured in online supplemental appendix table 3 .

Study selection

Articles will be imported and deduplicated using EndNote20 (Clarivate, Philadelphia, Pennsylvania, USA) and then imported into Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia) for screening. Reviewers (GB, JD and PVN) will do pilot screening with a sample of 100 abstracts to ensure consistency of use and clarity of the inclusion and exclusion criteria. To measure the inter-rater reliability, a Cohen’s kappa statistic will be used. Screening will begin when >70% agreement is achieved. 21 In duplicate, the authors (GB, JD and PVN) will conduct all screening, data extraction and quality assessment procedures. Disagreements will be resolved by consensus. Situations where consensus cannot be reached will be resolved by a third author who will arbitrate (PD and HB). Eligible articles identified by title and abstract screening based on inclusion criteria will be selected for full-text screening. Two independent reviewers will review the full texts. References of the included studies will be hand searched to identify additional relevant studies for inclusion. Conflicts between reviewers will be resolved through discussion, and if no resolution can be achieved, a third reviewer (PD and HB) will be consulted. In case of missing data or information, authors will be contacted.
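As a rough illustration of the planned inter-rater reliability check (not the protocol's own code; the screening decisions below are fabricated), Cohen's kappa and raw agreement can be computed with scikit-learn:

    from sklearn.metrics import cohen_kappa_score

    reviewer_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # include = 1, exclude = 0
    reviewer_b = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]

    kappa = cohen_kappa_score(reviewer_a, reviewer_b)
    agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / len(reviewer_a)
    print(f"kappa = {kappa:.2f}, raw agreement = {agreement:.0%}")  # proceed when > 70%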

A third reviewer (PD and HB) will confirm the excluded publications and their respective reasons for elimination. A PRISMA flow chart adapted from the PRISMA 2020 and the PRISMA IPD flow diagram ( figure 1 ) 17 18 will be used to show the process of study selection.


Figure 1. PRISMA flow diagram. IPD, individual participant data; PRISMA, Preferred Reporting Items for Systematic Review and Meta-Analysis.

Data extraction and management

After the full-text screening and study selection process, the selected studies will undergo data extraction, wherein information from the studies will be extracted after a thorough reading of the full text. The list of variables to be extracted is presented in table 1 . The data extraction form will be created using Microsoft Excel 2016. Data extraction will be conducted by two independent reviewers using the designed data extraction form. Following this process, the records extracted by the reviewers will be cross-checked, and any disputed points will be resolved through a third reviewer (PD and HB).

Table 1. List of variables for data extraction.

Risk of bias assessment

For non-RCTs, the Risk Of Bias in Non-randomised Studies of Interventions (ROBINS-I) tool will be used by the reviewers to determine the quality of the study. The ROBINS-I tool is concerned with evaluating the risk of bias in estimates of the effectiveness or safety (benefit or harm) of an intervention from studies that did not use randomisation to allocate interventions. 22 This will influence how the data are interpreted. For prognostic studies, the QUIPS tool will be used. 23 Biases will be rated as ‘critical risk’, ‘serious risk’, ‘moderate risk’, ‘low risk’ and ‘no information’.

The risk of publication bias will be assessed by visual inspection of funnel plots and by Egger’s test (with 10 or more included articles). 24

Data synthesis

Descriptive statistics from included studies will be extracted and summarised in tables. When there is a difference in data units across studies, we will perform data conversion for the meta-analysis. The main statistical analysis of the study will involve two steps:

Meta-analysis using data extracted from included studies

Incidence data will be summarised for meta-analysis. A pooled estimate of the incidence of sexual HIV transmission will be generated and reported with a 95% CI. Heterogeneity will be examined using the I² and H² statistics, 25 26 since they both relate to the percentage of variability that is due to true differences between studies (heterogeneity). I² will be quantified as low (≤25%), moderate (25%–50%) or high (>50%). A fixed-effect model will be used where heterogeneity is <50%. Where heterogeneity is >50%, we will use a random-effects model to examine the association between varying viral loads and the risk of HIV transmission among serodiscordant couples and create summary forest plots.
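As an illustration of how those statistics drive the model choice (a sketch only, with fabricated per-study values; the protocol's actual analyses will be run in R, RevMan or SPSS as stated below):

    import numpy as np

    effects = np.array([0.02, 0.00, 0.05, 0.01])     # per-study incidence estimates
    variances = np.array([0.001, 0.002, 0.001, 0.003])

    w = 1.0 / variances                              # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)         # fixed-effect pooled estimate
    q = np.sum(w * (effects - pooled) ** 2)          # Cochran's Q
    df = len(effects) - 1
    h2 = q / df                                      # H² statistic
    i2 = max(0.0, (q - df) / q) * 100.0              # I², % of true heterogeneity
    model = "random-effects" if i2 > 50 else "fixed-effect"
    print(f"I² = {i2:.1f}%, H² = {h2:.2f} -> use {model} model")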

The variation for moderate or higher heterogeneity will be explored by conducting meta-regression and sensitivity analyses, including sample size, study year and demographic characteristics, or excluding studies to examine heterogeneity. Furthermore, we will also attempt to explain the heterogeneity by conducting subgroup analyses to compare the risk of HIV transmission between groups, including gay, bisexual and other men-who-have-sex-with-men (gbMSM), women who have sex with women and heterosexuals.

The presence of publication bias will be assessed using a funnel plot and Egger’s test, provided we have at least 10 studies included in the meta-analysis. 24
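A sketch of Egger's regression test (illustrative only, with fabricated effects and standard errors): the standardised effect is regressed on precision, and a non-zero intercept suggests funnel-plot asymmetry:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    effects = rng.normal(0.1, 0.05, 12)          # fabricated study effects
    ses = rng.uniform(0.02, 0.10, 12)            # fabricated standard errors

    z = effects / ses                            # standardised effects
    precision = 1.0 / ses
    fit = sm.OLS(z, sm.add_constant(precision)).fit()
    print(f"Egger intercept p-value: {fit.pvalues[0]:.3f}")  # small p -> possible bias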

Statistical analysis using IPD

A data sharing agreement will be established outlining the nature of the project, collaboration and responsibilities of each party. Deidentified and anonymised participant data will be confidentially collected from collaborators. Descriptive analyses will be performed to examine the participants’ demographic characteristics.

We will analyse all the studies separately to compare our results with the original studies. Any discrepancies will be resolved. Analysis will include all study participants, following the intention-to-treat approach. Summary statistics will be presented as mean (SD) or median (IQR) for continuous variables and as percentages for categorical variables. Effect sizes will be computed for different thresholds of viral load. The χ² test will be used to evaluate the association of viral load with the risk of sexual HIV transmission by comparing the various viral load levels. We will also compute ORs and corresponding 95% CIs to assess the strength of the association of viral load with the risk of sexual HIV transmission. The level of statistical significance α will be 0.05 for all tests. An individual random-effects meta-analysis will be conducted to determine the overall effect of viral load on sexual HIV transmission. Furthermore, a multivariable logistic regression for binary outcomes will be used to predict the risk of HIV transmission among serodiscordant couples at different levels of viral load at the baseline level from each study. Additional adjustments for sociodemographic characteristics, including age, sex, education and location, will also be included. Effect sizes and standard errors, including covariate adjustment, can be obtained from this analysis, which could potentially address bias concerns.
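As a sketch of the association tests named above (the counts are entirely hypothetical), a 2x2 contingency table yields both the χ² test and an OR with its 95% CI:

    import numpy as np
    from scipy.stats import chi2_contingency

    #                  transmission  no transmission
    table = np.array([[ 4,  996],    # viral load 200-1000 copies/mL (fabricated)
                      [20,  980]])   # viral load  > 1000 copies/mL (fabricated)

    chi2, p, dof, expected = chi2_contingency(table)
    a, b = table[0]
    c, d = table[1]
    odds_ratio = (a * d) / (b * c)
    se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)           # SE of log(OR)
    ci = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)
    print(f"chi2 p = {p:.4f}, OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")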

Additionally, viral load levels will be categorised into a contingency table to investigate whether different viral load categories are associated with different levels of HIV transmission risk among sexual partners. A dose-response relationship will also be examined between different viral load levels in PLHIV and the incidence of HIV among their partners using multivariate logistic regression and incidence frequencies of sexual HIV transmission.

All analyses will be done in R V.4.2.3, RevMan and SPSS V.28, as needed.

Missing data

Missing data will be addressed depending on the specific characteristics of the missing data. An effort will be made to discuss with collaborating teams the possibility of collecting missing data from their studies. If the data are missing completely at random for the entire study, list-wise or pair-wise deletion will be performed to obtain valid and complete cases. However, this step may reduce the sample size and power of the study. For the remaining non-random missing data, multiple imputation by chained equations will be used. 27 In this method, missing values are imputed one variable at a time, each using a regression model conditioned on the other variables in the dataset.
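A minimal sketch of a MICE-style imputation using scikit-learn's IterativeImputer (the variables and values below are fabricated; the protocol does not specify an implementation):

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    X = np.array([[35.0, 250.0, 0.0],
                  [42.0, np.nan, 1.0],
                  [np.nan, 900.0, 0.0],
                  [29.0, 400.0, np.nan]])   # age, viral load, transmission flag

    imputer = IterativeImputer(max_iter=10, random_state=0)
    X_complete = imputer.fit_transform(X)   # each incomplete column is modelled on the others
    print(X_complete)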

Certainty of evidence

Summary of findings will be presented via tables, including tables for each of the prespecified outcomes (eg, number of cases of HIV transmission). The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system will be used to assess the certainty of evidence, considering the bias risk of the trials, consistency of effect, imprecision, indirectness, publication bias, dose response and residual confounding. 28

Ethics and dissemination

The meta-analysis will be conducted using deidentified and anonymised data. No human subjects will be directly involved in this research. Dissemination of results of this review will be done through peer-reviewed publications and presentations, as well as international conferences.

We understand that effort, resources and international cooperation are required to perform meta-analysis based on IPD. We will produce a meta-analysis based on the number of collaborators interested in this review and the quality of data collected. We will attempt to establish quantitative risk estimates of sexual HIV transmission at viral load levels between 200 copies/mL and 1000 copies/mL and potentially also at levels >1000 copies/mL. This two-step systematic review (SR) and individual participant data (IPD) meta-analysis will also evaluate the strength of the association of viral load to the risk of sexual HIV transmission. The findings of this SR and IPD meta-analysis will help patients, researchers and policymakers to better understand the risk of sexual HIV transmission in the context of ART and the associated considerations for HIV treatment and prevention programmes.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

Contributors PD, HB, CA and AF participated in the conception and design of the study. PD, HB, CA, and TE developed the search strategy and assessed the feasibility of the study. PD, HB, GB, JD and PVN wrote the manuscript. CA improved the manuscript. PD, HB, CA and AF are the guarantors. All the authors critically reviewed this manuscript and approved the final version.

Funding This research was funded, conducted and approved by the Public Health Agency of Canada.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

    Introduction HIV is a major global public health issue. The risk of sexual transmission of HIV in serodiscordant couples when the partner living with HIV maintains a suppressed viral load of <200 copies of HIV copies/mL has been found in systematic reviews to be negligible. A recent systematic review reported a similar risk of transmission for viral load<1000 copies/mL, but quantitative ...