Data quality, provenance and transparency in real-world data: Aligning quality standards with data governance legal frameworks
Abstract
Numerous papers have discussed data quality and data protection independently, but there has been little discussion of how data quality relates to data protection and other data governance regulatory frameworks. This paper is a step towards addressing that gap and makes the case that data quality is relevant for data protection and legal compliance professionals. Real-world data in the context of healthcare refers to data that is routinely collected in the course of delivering healthcare. From a data protection regulatory perspective, Article 5 of the General Data Protection Regulation (GDPR) lists data accuracy as one of the principles for data processing. Article 10 of the recently adopted European Union Artificial Intelligence Act (EU AI Act) outlines requirements for data and data governance, specifically quality criteria for datasets used to train, test and validate high-risk AI models, to address concerns around algorithmic bias arising from biases in the training data. The Standards for Data Diversity, Inclusivity and Generalisability (STANDING) Together consensus recommendations for dataset curators on transparency in dataset documentation enable an informed assessment of the suitability of data, and an examination of biases, for the development of AI health technologies. The recommended documentation includes information on data provenance, modifications, sociodemographic composition and bias assessment findings. The Clinical Practice Research Datalink (CPRD) database is used to illustrate how these recommendations can be implemented in a practical way using unique identifiers such as digital object identifiers (DOIs), metadata, published data resource profiles with sociodemographic information, and data quality assessments based on validation and comparability studies.
There is considerable alignment on data quality between established scientific standards, medical product regulatory requirements and data governance legal requirements, as well as emerging international consensus, which will reduce the compliance burden on curators and users of real-world data. This article is also included in The Business & Management Collection, which can be accessed at https://hstalks.com/business/.
The full article is available to subscribers to the journal.
Authors' Biographies
Puja Myles is Director of the Medicines and Healthcare products Regulatory Agency’s (MHRA) specialist real-world data research services centre, Clinical Practice Research Datalink (CPRD). She initially joined the MHRA as Head of Observational Research, CPRD in 2017. Prior to this, she trained as a public health specialist and was a public health academic at the University of Nottingham, UK. Puja is a fellow of the Faculty of Public Health, UK, a senior fellow of the Higher Education Academy, UK and has a doctorate in epidemiology. Her areas of expertise include real-world data, data quality, synthetic data, artificial intelligence (AI), regulatory science, data governance and privacy implementation.
Eleanor Axson is a Senior Researcher-Senior Assessor with the Observational Research Team at the Clinical Practice Research Datalink (CPRD). She joined CPRD in 2020 and has been the Data Delivery Workstream Lead since 2024. Eleanor holds a PhD in clinical medicine research from Imperial College London and was a Research Assistant at the National Heart and Lung Institute at Imperial College London before joining CPRD. She has an MPH in epidemiology from the University of Michigan. Her areas of expertise include real-world data, observational research studies, ethnicity data and small area data.
Colin Mitchell is Head of Humanities at the PHG Foundation, a health policy unit and part of the University of Cambridge. The PHG Foundation’s multidisciplinary team works with health professionals, researchers and policy makers to explore the implications of emerging data and related technologies for healthcare and research. Colin leads the foundation’s work on legal and ethical issues arising from novel health technologies, biomedical research and data-driven innovation. He has a PhD in health law from the University of Amsterdam, a Master of Studies in legal research from the University of Oxford and a BA in law from the University of Cambridge.