Data Research
Goldacre Review
Tweet Thread By Researcher and Review Author Jess Morley
Goldacre Review Summary:
- NHS needs better data analysis in research and day to day work
- Look into:
- NHS-R
- NHS-Python
- AnalystX
- AphA
- UK FCI
Chapter One
Summary Recommendations from chapter one:
- Create NHS Analyst Service
- Embrace modern open working methods, with reproducible analytical pipelines
- Create an Open College for NHS Analysts
- Recognise value of knowledge management
- Get expert help from academia and industry
- Train non-analysts in how to be good customers of data teams
Chapter Two
- You should be using Reproducible Analytical Pipelines (RAPs)
- They recommend using:
- Version Control/GitHub
- Code Review
- Functions
- Unit Tests
- Libraries
- Documentation
- They recommend looking at:
- The Turing Way
- Design Principles for Government Digital Services
- We need to talk about the lack of investment in digital research infrastructure
Summary Recommendations from chapter two:
- Promote and resource Reproducible Analytical Pipelines as the minimum standard for academic/NHS data analysis
- Ensure all code for data curation and analysis paid for by the state is shared openly with appropriate technical documentation to all data users
- Recognise that software development is a central feature of all good work with data
- Bridge gap between health research and software development:
- Train academic researchers in contemporary computational data science techniques
- Open code is different to open data: it is reasonable for the NHS and government to do some analyses discreetly
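The RAP practices listed for Chapter Two (functions, unit tests, documentation) can be illustrated with a minimal Python sketch. The function `rate_per_1000` and its calculation are hypothetical examples chosen for illustration, not taken from the review: a small, documented function that fails loudly on bad input, plus unit tests that pin down its behaviour so a code reviewer can check any later change.

```python
"""A minimal RAP-style module: a documented function plus unit tests."""
import unittest


def rate_per_1000(events: int, population: int) -> float:
    """Return the number of events per 1,000 people.

    Raises ValueError for a non-positive population or a negative event
    count, so bad inputs fail loudly instead of producing nonsense rates.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if events < 0:
        raise ValueError("events cannot be negative")
    return events / population * 1000


class TestRatePer1000(unittest.TestCase):
    """Unit tests documenting the expected behaviour of rate_per_1000."""

    def test_simple_rate(self):
        self.assertAlmostEqual(rate_per_1000(25, 5000), 5.0)

    def test_zero_events(self):
        self.assertEqual(rate_per_1000(0, 100), 0.0)

    def test_rejects_bad_population(self):
        with self.assertRaises(ValueError):
            rate_per_1000(1, 0)
```

If saved as a file such as `rates.py` (a hypothetical name), the tests could be run with `python -m unittest rates`, and both the function and its tests would live together under version control.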
Chapter Three - Privacy
- Privacy
- Reidentification and leak of disclosive information is possible
- Other techniques to minimise risk:
- Pseudonymisation
- Data minimisation
- Removal of sensitive codes
- Sub-sampling
- Data perturbation & synthetic data
- Homomorphic encryption
- The current systems exist to protect privacy and are important, but they can also be extremely slow.
- The main shortcoming with the current system is that it relies on trust.
- Trusted Research Environments are the only realistic way to safely manage this expansion of datasets and researchers.
- For my own reference: the SAFEHAVEN work would count as a Trusted Research Environment? I think so.
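Three of the risk-reduction techniques above (pseudonymisation, data minimisation, removal of sensitive codes) can be sketched in Python. This is a minimal illustration under stated assumptions: the key, the field names (`nhs_number`, `year_of_birth`, `code`) and the "sensitive" codes are all hypothetical, and keyed hashing with HMAC-SHA256 is one common way to pseudonymise, not necessarily what the review proposes. As the notes point out, output like this can still be reidentified, which is part of the argument for TREs.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the data controller; in practice it
# would be stored securely, never alongside the data or in shared code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical codes treated as sensitive for this illustration.
SENSITIVE_CODES = {"SENSITIVE_1", "SENSITIVE_2"}

# Data minimisation: only the fields this (hypothetical) analysis needs.
FIELDS_NEEDED = ("year_of_birth", "code")


def pseudonymise(patient_id: str) -> str:
    """Replace a real identifier with a keyed hash (HMAC-SHA256).

    The same input always gives the same pseudonym, so records can still be
    linked, but the original ID cannot be recomputed without the key.
    """
    return hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_extract(records: list[dict]) -> list[dict]:
    """Drop sensitive codes, pseudonymise IDs and keep only needed fields."""
    extract = []
    for rec in records:
        if rec["code"] in SENSITIVE_CODES:
            continue  # removal of sensitive codes
        row = {"pseudo_id": pseudonymise(rec["nhs_number"])}
        row.update({field: rec[field] for field in FIELDS_NEEDED})
        extract.append(row)  # name and raw ID are never copied across
    return extract
```

Direct identifiers such as names never reach the extract, and the pseudonym stays stable across extracts only for as long as the same key is used.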
Chapter Four - Trusted Research Environments
This chapter recommends that Trusted Research Environments should be the standard way to approach data access
They describe them as:
Standard environments that share code and working practices.
They comprise 3 components:
1 - Service Wrapper: rules, regulations, governance etc.
2 - Generic Compute and Database
3 - Subject Specific Code: functions/libraries/documentation that can deliver specific NHS analyses and can be reused by everyone using the TRE
Summary Recommendations for Chapter Four:
- Build a small number of Secure TREs:
- Make them the norm for all analysis of NHS ER data
- Should be as few TREs as possible, with culture of openness and re-use around all code
- Use the enhanced privacy protections of these to create new, faster access rules and processes:
- e.g. ensure all TREs publish logs of all activity
- Map all current bulk flows of pseudonymised NHS data and shut them down, replacing them with TREs
- Use these TREs as an opportunity to drive modern/efficient/open approaches to data science
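The review recommends that TREs publish logs of all activity, but these notes don't say how such logs would be built. One possible approach, assumed here purely for illustration, is a hash-chained append-only log, so that any later edit to an earlier entry is detectable. The `ActivityLog` class and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


class ActivityLog:
    """Append-only log where each entry stores the hash of the previous one.

    Editing any earlier entry changes its hash and breaks the chain, so a
    published copy of the log is tamper-evident.
    """

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def record(self, user: str, project: str, action: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "project": project,
            "action": action,
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._hash(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _hash(entry: dict) -> str:
        # Hash every field except the entry's own hash, in a stable key order.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def verify(self) -> bool:
        """Recompute the chain and return False if any entry was altered."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._hash(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

Publishing the latest hash alongside the log would also let outsiders check that earlier entries have not been quietly rewritten and the chain recomputed.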
Chapter Five - Information Governance
- This chapter points out that the current rules are cumbersome
- With the recognition that they've ended up that way because historically we've fucked it up and not had strict enough rules
- They highlight that Patient/Public Involvement and Engagement is vital:
- Patients and relatives often know the most relevant questions to ask, more so than researchers without experience of the illness
Summary Recommendations for Chapter Five:
- Rationalise approvals: have a clear map of what is needed to get data approvals and get rid of the duplicate work
- Have a frank public conversation about commercial use of NHS data, after privacy has been fixed via TREs. Make sure the NHS and patients get a financial return when marketable products come from NHS data
- Have clear rules about the use of NHS patient records in performance management of NHS organisations
- Address problem of having too many data controllers. Either have one national organisation, or an approvals pool.
Chapter Six - Data Curation
- Things needed to create datasets include knowledge of:
- Clinical medicine
- Health Data
- SNOMED CT codes (a standardised clinical terminology used to code clinical findings, diagnoses and procedures in health records)
- Clinical Informatics
Summary recommendations:
- Use RAPs and code sharing
- Use an Open Library where all NHS data curation work can be shared
- Get some Data Pioneers to populate the open library with curation code
- Get some open competitive funding for code in this space
- Conduct this work in standard TRE settings
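Chapter Six's idea of shared, reusable curation code can be illustrated with a codelist-based function: a list of clinical codes defines a condition, and one function, run inside a TRE, finds the matching patients. This is a minimal sketch; the placeholder strings stand in for real SNOMED CT concept IDs, and `patients_with_condition` and `EXAMPLE_CONDITION_CODES` are hypothetical names.

```python
from collections.abc import Iterable

# Hypothetical codelist: placeholder strings stand in for real SNOMED CT
# concept IDs, which would come from a published, versioned codelist.
EXAMPLE_CONDITION_CODES = frozenset({"CODE_A", "CODE_B", "CODE_C"})


def patients_with_condition(
    events: Iterable[tuple[str, str]], codelist: frozenset[str]
) -> set[str]:
    """Return IDs of patients with at least one coded event in the codelist.

    `events` is an iterable of (patient_id, code) pairs, e.g. rows of coded
    health record data available inside a TRE.
    """
    return {patient_id for patient_id, code in events if code in codelist}
```

Keeping the codelist separate from the function means the same curation code can be reused for any condition by swapping in a different shared codelist.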
Chapter Seven - Strategy
- Use people with technical skills to manage complex technical problems, e.g.:
- Hire some senior developers/data architects/data scientists
- Build using new techniques
- Create some data pioneer groups
- Build TRE capacity
- Focus on platforms