Clinical Research Data Management Course

High-quality clinical research data have several interrelated characteristics. Accuracy means that data correctly represent what was observed, measured, reported, or documented. Completeness means that required data are present or that missingness is clearly documented. Consistency means that data do not contradict related information elsewhere in the dataset. Timeliness means that data are collected, entered, reviewed, and available within expected timeframes. Validity means that values conform to predefined formats, ranges, and rules. Reliability means that measurements are reproducible under comparable conditions. Integrity means that data are protected against unauthorized or undocumented change.

These characteristics are not abstract ideals; they have practical implications for every part of a study. A temperature value of 370 degrees Celsius is invalid because it falls far outside the biologically plausible range. A record showing a male participant with a positive pregnancy status may indicate an inconsistency, a data entry error, or a special clinical circumstance requiring clarification. A missing primary outcome threatens completeness. A laboratory value entered three months after collection may raise concerns about timeliness.
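The examples above can be expressed as simple automated checks. The following is a minimal sketch, not a definitive implementation: the field names (`primary_outcome`, `temperature_c`, `sex`, `pregnancy_test`, `lab_collected_on`, `lab_entered_on`), the plausible temperature range, and the 30-day entry window are all assumptions chosen for illustration. A real study would take these from its protocol and data management plan.

```python
from datetime import date

# Hypothetical thresholds, for illustration only.
TEMP_RANGE_C = (30.0, 45.0)   # assumed plausible body temperature range
MAX_ENTRY_LAG_DAYS = 30       # assumed limit between collection and entry


def check_record(rec):
    """Return a list of (characteristic, message) queries for one record."""
    queries = []

    # Completeness: the primary outcome must be present.
    if rec.get("primary_outcome") is None:
        queries.append(("completeness", "primary outcome is missing"))

    # Validity: temperature must fall inside the plausible range.
    temp = rec.get("temperature_c")
    if temp is not None and not (TEMP_RANGE_C[0] <= temp <= TEMP_RANGE_C[1]):
        queries.append(("validity", f"temperature {temp} C is out of range"))

    # Consistency: raise a query for clarification rather than auto-correct,
    # because the combination may reflect a genuine clinical circumstance.
    if rec.get("sex") == "male" and rec.get("pregnancy_test") == "positive":
        queries.append(("consistency", "male with positive pregnancy status"))

    # Timeliness: compare the entry date with the collection date.
    collected = rec.get("lab_collected_on")
    entered = rec.get("lab_entered_on")
    if collected and entered:
        lag_days = (entered - collected).days
        if lag_days > MAX_ENTRY_LAG_DAYS:
            queries.append(("timeliness", f"entered {lag_days} days late"))

    return queries


record = {
    "primary_outcome": None,
    "temperature_c": 370.0,
    "sex": "male",
    "pregnancy_test": "positive",
    "lab_collected_on": date(2024, 1, 10),
    "lab_entered_on": date(2024, 4, 15),
}

for characteristic, message in check_record(record):
    print(f"{characteristic}: {message}")
```

Each check produces a query for review instead of changing the value. This keeps the original entry intact and leaves the decision to someone who can consult the source document.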

A dataset exported without a data dictionary may be difficult to interpret, even if the values themselves are accurate. Quality is also contextual. A variable may be complete yet of little use if it is collected in a form that does not support the analysis. For example, recording antimalarial treatment in a free-text field captures information, but the responses are hard to summarize until coding rules are applied. Similarly, a database may have strong validation rules and still produce poor data if staff are not trained, source documents are incomplete, or workflows encourage users to bypass required checks.
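To illustrate the free-text problem, the sketch below applies a small set of coding rules to antimalarial treatment entries so they can be counted. The rule table, the code labels, and the sample entries are hypothetical. A real study would use an agreed coding dictionary and route uncoded entries to manual review.

```python
from collections import Counter

# Hypothetical coding rules: each code maps to keywords that identify it.
CODING_RULES = {
    "ARTEMETHER_LUMEFANTRINE": ("artemether", "lumefantrine", "coartem"),
    "ARTESUNATE": ("artesunate",),
    "CHLOROQUINE": ("chloroquine",),
}


def code_treatment(free_text):
    """Map one free-text treatment entry to a code.

    Returns "MISSING" for blank entries and "UNCODED" when no rule or more
    than one rule matches, so those entries can be reviewed by a person.
    """
    text = free_text.strip().lower()
    if not text:
        return "MISSING"
    matches = [
        code
        for code, keywords in CODING_RULES.items()
        if any(keyword in text for keyword in keywords)
    ]
    return matches[0] if len(matches) == 1 else "UNCODED"


# Illustrative entries as they might appear in a free-text field.
entries = [
    "Coartem 6 doses",
    "artemether/lumefantrine",
    "IV artesunate",
    "chloroquin",   # misspelling: no rule matches
    "",             # blank entry
    "AL",           # abbreviation not covered by the rules
]

summary = Counter(code_treatment(entry) for entry in entries)
for code, count in summary.most_common():
    print(f"{code}: {count}")
```

In this sketch, half of the sample entries end up as uncoded or missing. That is the practical cost of free text: the information was captured, but it cannot be summarized without additional rules or manual effort.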