Documents: TechReport (SDAPWG_2024) Safe Data Access Professionals Working Group | Handbook on Statistical Disclosure Control for Outputs | 2024-07-01
TechReport (Lowthain_2017) Lowthian, Philip / Ritchie, Felix | Ensuring the confidentiality of statistical outputs from the ADRN | 2017-04-01
Safe Data Control: Documents from the ONS course
[@SDAPWG_2024; @Lowthain_2017; @Eurostat_2026; @ONS_2026; @ONS_2026a]
5 Safes Framework
- Project
- People
- Setting
- Data
- Output
Project: Legal, feasible and ethical. Needs to have a clear public benefit and to consider any harm
People: Researchers trusted to work safely with data
Setting: Data security proportional to the sensitivity of the data it holds
Data: Proportional to its use
Output: Non-disclosive, safeguarding confidentiality
Project Application - Have it approved by UKSA. They ask:
- Is there public benefit?
- Is there analytical merit?
- Is the project feasible?
- Are privacy implications mitigated?
- Has it completed ethical review?
If you need to change your original data plan to a new data plan, the change needs to be made with a "change of scope request"
A count of 2 in a frequency table is as directly disclosive as a count of 1, because either of the two people could use their own information to work out the other's
So the minimum count for output from the ONS TRE is 10
Zeros are often considered disclosive too; it depends on whether you would expect a zero or not
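The threshold rule above can be sketched as a simple check over a frequency table. This is a minimal illustration, not an ONS tool: the function name `flag_low_counts`, the `zeros_disclosive` switch and the example counts are all made up for the example, and only the threshold of 10 comes from the notes.

```python
THRESHOLD = 10  # minimum count for output, as noted above for the ONS TRE


def flag_low_counts(table, threshold=THRESHOLD, zeros_disclosive=False):
    """Return the (row, column) keys of cells that need attention.

    `table` is a dict of dicts: {row_label: {column_label: count}}.
    Zeros are only flagged when `zeros_disclosive` is True, because
    whether a zero is disclosive depends on whether it is expected.
    """
    flagged = []
    for row, cells in table.items():
        for col, count in cells.items():
            if count == 0:
                if zeros_disclosive:
                    flagged.append((row, col))
            elif count < threshold:
                flagged.append((row, col))
    return flagged


# Hypothetical counts, for illustration only
example = {
    "Ward A": {"Condition X": 25, "Condition Y": 2},
    "Ward B": {"Condition X": 0, "Condition Y": 14},
}

print(flag_low_counts(example))  # [('Ward A', 'Condition Y')]
```

Setting `zeros_disclosive=True` would also flag the empty Ward B / Condition X cell.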
How do you deal with primary disclosive cells in a table?
- Cell suppression (with recalculated totals)
- Round to a fixed base (nearest 5, nearest 10, etc.)
- Don't report a frequency table; report ratios or growth rates instead
- Group some rows together
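Two of the options listed above, rounding and suppression with recalculated totals, can be sketched as follows. The helper names and the example counts are hypothetical, and the sketch assumes that every count below the threshold (zeros included) is suppressed, which is a simplification.

```python
def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`.

    Exact halves (possible for even bases such as 10) round up.
    """
    return base * ((count + base // 2) // base)


def suppress_and_retotal(counts, threshold=10):
    """Suppress counts below `threshold` and recalculate the total.

    Suppressed cells become None, and the total is the sum of the
    published cells only, so it cannot be used to back out a
    suppressed value.
    """
    published = {k: (v if v >= threshold else None) for k, v in counts.items()}
    total = sum(v for v in published.values() if v is not None)
    return published, total


# Hypothetical counts, for illustration only
counts = {"A": 25, "B": 3, "C": 14}

print({k: round_to_base(v) for k, v in counts.items()})  # {'A': 25, 'B': 5, 'C': 15}
print(suppress_and_retotal(counts))  # ({'A': 25, 'B': None, 'C': 14}, 39)
```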
Think about dominance in a dataset though:
- when one group makes up more than 43.75% of a dataset
- or when two groups together make up more than ~86%
Be worried about dominance when reporting low counts where the data contain outliers
When reporting a median, remember that the median can reflect a single individual's value, so you would probably need to round it afterwards, or bucket it
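The dominance thresholds noted above can be sketched as a check on the shares of the largest contributors. This is a rough illustration: the function name and example values are hypothetical, and 0.86 is used for the approximate "~86%" figure from the notes.

```python
def dominance_flags(contributions, top1_limit=0.4375, top2_limit=0.86):
    """Check whether one or two contributors dominate a total.

    `contributions` is a list of non-negative values, one per group.
    Returns the share of the largest group, the combined share of the
    two largest groups, and whether either exceeds its limit.
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    ordered = sorted(contributions, reverse=True)
    top1_share = ordered[0] / total
    top2_share = sum(ordered[:2]) / total
    return {
        "top1_share": top1_share,
        "top2_share": top2_share,
        "dominant": top1_share > top1_limit or top2_share > top2_limit,
    }


# Hypothetical contributions, for illustration only
print(dominance_flags([50, 30, 10, 10])["dominant"])  # True: largest group is 50%
print(dominance_flags([30, 30, 20, 20])["dominant"])  # False: 30% and 60%
```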
What about secondary disclosure? This is where combining two tables or datasets gives you something identifiable
Primary disclosure - You can see the disclosive values
Secondary disclosure - You can calculate the disclosive values
What if you need to suppress a value, but you still need accurate totals for one reason or another? You suppress a second value as well, so that nobody can tell what combination of values makes up the total
Remember you would need to do this for both the rows and the columns of a table
The second value plus your original suppressed cell would need to add up to at least twice the disclosure threshold
And remember that you need to ensure no combination is disclosive: ultimately, having one cell to suppress means a second has to go from the same column, a third from the same row, and then a fourth where the second's row meets the third's column
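The reason one suppressed cell is not enough when totals are published can be shown with a small sketch. The table, its totals and both function names are made up for illustration. The second function only checks that no row or column contains exactly one suppressed cell; it does not check the "twice the threshold" condition.

```python
def recover_single_suppressions(rows, row_totals):
    """Recover suppressed (None) cells by subtraction from row totals.

    Only rows with exactly one suppressed cell can be recovered this way.
    Returns {(row_index, column_index): recovered_value}.
    """
    recovered = {}
    for i, (row, total) in enumerate(zip(rows, row_totals)):
        missing = [j for j, v in enumerate(row) if v is None]
        if len(missing) == 1:
            known = sum(v for v in row if v is not None)
            recovered[(i, missing[0])] = total - known
    return recovered


def lines_with_single_suppression(table):
    """List the rows and columns that contain exactly one suppressed cell."""
    problems = []
    for i, row in enumerate(table):
        if sum(v is None for v in row) == 1:
            problems.append(("row", i))
    for j, col in enumerate(zip(*table)):
        if sum(v is None for v in col) == 1:
            problems.append(("col", j))
    return problems


# Hypothetical table: the true value of the suppressed cell is 3
one_suppressed = [[None, 20, 15], [12, 30, 18], [25, 11, 40]]
row_totals = [38, 60, 76]
print(recover_single_suppressions(one_suppressed, row_totals))  # {(0, 0): 3}
print(lines_with_single_suppression(one_suppressed))  # [('row', 0), ('col', 0)]

# Suppressing four cells in a rectangle leaves no single-cell row or column
four_suppressed = [[None, None, 15], [None, None, 18], [25, 11, 40]]
print(lines_with_single_suppression(four_suppressed))  # []
```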
How do you output regression models? As long as the degrees of freedom (number of people in the model minus the number of variable levels in the model) are above the disclosure threshold, you are OK
Residuals and scatter plots will not be released, because they show individual data points; outliers in boxplots are likewise a statistical disclosure risk
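The degrees-of-freedom check for regression output can be sketched as below. The function names are hypothetical, and the sketch assumes that "above the disclosure threshold" means at least the same minimum of 10 used for counts; the actual rule applied by an output checker may differ.

```python
def residual_degrees_of_freedom(n_observations, n_parameters):
    """People in the model minus the number of variable levels (parameters)."""
    return n_observations - n_parameters


def regression_output_ok(n_observations, n_parameters, threshold=10):
    """Assumed rule: degrees of freedom must be at least `threshold`."""
    return residual_degrees_of_freedom(n_observations, n_parameters) >= threshold


# Hypothetical models, for illustration only
print(regression_output_ok(250, 12))  # True: 238 degrees of freedom
print(regression_output_ok(15, 8))    # False: 7 degrees of freedom
```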
When you are doing your output request, give a nice clear description in the table at the start: description, numbers, coverage: CONTEXT
Researcher Integrity Course - Uni of Glasgow
The Concordat to Support Research Integrity is the main document guiding this: https://www.universitiesuk.ac.uk/what-we-do/policy-and-research/publications/concordat-support-research-integrity
- Honesty
- Care
- Rigour
- Transparency
- Accountability
Research needs to comply with:
- Ethics
- Legislation
- Safeguarding