Data quality is the degree to which data are fit for their intended use. In clinical research, intended use includes participant safety oversight, protocol compliance, statistical analysis, regulatory reporting, publication, data sharing, and long-term archival. Data quality is therefore not a single property and cannot be judged only by whether a dataset contains values. A dataset may be complete but inaccurate, accurate but late, valid but poorly documented, or internally consistent but not suitable for answering the study question.
In clinical research, data quality has scientific, ethical, and operational significance. Scientifically, poor-quality data can lead to biased estimates, incorrect conclusions, and invalid recommendations. Ethically, participants contribute data with the expectation that their information will be used responsibly and meaningfully. If data are unusable because of poor quality, the study may fail to honor that contribution. Operationally, poor-quality data consume time and resources through repeated queries, delayed analysis, extended monitoring, and postponed database lock.
Data quality management is the planned and systematic process of preventing, detecting, documenting, resolving, and learning from data quality problems. Prevention occurs through good protocol design, CRF design, REDCap validation, user training, and standard operating procedures. Detection occurs through data quality checks, reports, monitoring, source data verification, audit review, and statistical review. Resolution occurs through query management, corrections, documented explanations, and sometimes protocol or database amendments. Learning occurs when the study team uses patterns in quality problems to improve training, forms, workflows, or monitoring focus.
One of the most important ideas in this chapter is that data quality is not achieved at the end of a study. It is built throughout the study. A study that waits until final analysis to review missing outcomes, inconsistent dates, duplicate participants, or unresolved adverse event information will face delays and may discover problems too late to correct. A well-managed study reviews quality continuously and treats data quality as a shared responsibility across investigators, site staff, data managers, monitors, statisticians, and sponsors.
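To make continuous review concrete, the following is a minimal sketch of three of the checks named above (missing outcomes, inconsistent dates, and duplicate participants), written so it could be run on each data export rather than once at final analysis. The field names (`participant_id`, `enrollment_date`, `outcome_date`, `outcome`) and the one-record-per-participant layout are assumptions made for illustration. They do not come from any particular study database or from REDCap itself.

```python
from collections import Counter
from datetime import date


def find_quality_issues(records):
    """Return (participant_id, issue) tuples for three simple checks.

    Assumes one record per participant and hypothetical field names.
    """
    issues = []

    # Duplicate participants: the same ID appears on more than one record.
    id_counts = Counter(r["participant_id"] for r in records)
    for pid, count in id_counts.items():
        if count > 1:
            issues.append((pid, "duplicate participant"))

    for r in records:
        pid = r["participant_id"]

        # Missing outcome: the field is absent, None, or an empty string.
        if r.get("outcome") in (None, ""):
            issues.append((pid, "missing outcome"))

        # Inconsistent dates: an outcome recorded before enrollment.
        enrolled = r.get("enrollment_date")
        assessed = r.get("outcome_date")
        if enrolled and assessed and assessed < enrolled:
            issues.append((pid, "outcome date before enrollment date"))

    return issues


# Illustrative records only; these are not real study data.
records = [
    {"participant_id": "P001", "enrollment_date": date(2024, 1, 10),
     "outcome_date": date(2024, 2, 10), "outcome": "improved"},
    {"participant_id": "P002", "enrollment_date": date(2024, 1, 15),
     "outcome_date": date(2024, 1, 5), "outcome": "stable"},
    {"participant_id": "P003", "enrollment_date": date(2024, 1, 20),
     "outcome_date": None, "outcome": None},
    {"participant_id": "P001", "enrollment_date": date(2024, 1, 10),
     "outcome_date": date(2024, 2, 10), "outcome": "improved"},
]

issues = find_quality_issues(records)
for pid, issue in issues:
    print(pid, issue)
```

Each finding would normally feed the query management process instead of being corrected directly in the export. A real study would define its checks in the data management plan and would usually need many more of them than the three shown here.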