Protecting personally identifiable information
Personally identifiable information must be handled with great care. It is indispensable in many research fields, but its misuse can lead to heavy fines and ethical problems.
Updated: 2 May, 2020
The General Data Protection Regulation (GDPR) [1] doesn’t specify how personally identifiable information should be protected in practice. Solutions include the use of Safe Share [2] to transfer data securely between partners and DataSHIELD [3] to enable the remote and non-disclosive analysis of personally identifiable information.
Encryption is an approach favoured by some. However, it is not without its own problems, particularly in relation to real-time access and the storage and recovery of encryption keys. Key escrow, the storage of a key by a trusted third party, is a potential solution to key loss, but it is not yet widely adopted in the research community and has drawbacks [4] of its own.
In any event, you should know that the GDPR is particularly strict when you deal with data revealing an individual’s:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Trade union membership.
The GDPR applies the same strict treatment to the processing of:
- Genetic data
- Biometric data
- Data concerning health
- Data concerning a natural person’s sex life or sexual orientation.
Types of personally identifiable information
There are different types of personally identifiable information [5]:
- Direct identifiers: information that, on its own, allows you to identify an individual. This includes names, email addresses that contain the person’s name, fingerprints, facial photos, etc. This information presents a high risk.
- Strong indirect identifiers: information that allows you to identify an individual with minimal effort. This includes postal addresses, telephone numbers, email addresses that do not contain the person’s name, URLs of personal pages, etc. This information presents a moderate risk.
- Indirect identifiers: information that allows you to identify an individual when linked with other available information. This includes background information on people, such as age, location, gender and job title. This information presents a low risk.
You should address indirect identifiers as thoroughly as you would the direct ones. When it comes to data protection regulation, ignorance of the law excuses no one [6] and your best efforts are expected.
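As a rough illustration of the three categories above, the following Python sketch audits the columns of a dataset before it is shared. The column names and the mapping are hypothetical and would need to be adapted to your own data; any column not in the mapping is flagged for manual review rather than assumed to be safe.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "direct identifier"
    MODERATE = "strong indirect identifier"
    LOW = "indirect identifier"


# Hypothetical column names, mapped to the categories described above.
KNOWN_IDENTIFIERS = {
    "full_name": Risk.HIGH,
    "fingerprint": Risk.HIGH,
    "postal_address": Risk.MODERATE,
    "phone_number": Risk.MODERATE,
    "age": Risk.LOW,
    "job_title": Risk.LOW,
}


def audit_columns(columns):
    """Split column names into classified identifiers and unreviewed columns."""
    classified = {c: KNOWN_IDENTIFIERS[c] for c in columns if c in KNOWN_IDENTIFIERS}
    unreviewed = [c for c in columns if c not in KNOWN_IDENTIFIERS]
    return classified, unreviewed


classified, unreviewed = audit_columns(["full_name", "age", "blood_pressure"])
for column, risk in classified.items():
    print(f"{column}: {risk.name} risk ({risk.value})")
print("Needs manual review:", unreviewed)
```

A lookup table like this is only a starting point: it cannot tell you whether a combination of low-risk columns becomes identifying, so it does not replace a proper review of the dataset.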
Anonymising research data
A key approach to protecting personally identifiable information is anonymisation, i.e. the irreversible removal and deletion of personal identifiers. As a starting point, you should establish whether your research is quantitative [7] or qualitative [8]. For quantitative research, anonymisation is somewhat simpler, because deleting the direct identifiers is sometimes all that is needed. If your dataset is more complex (e.g. it contains free text), you will have to be more thorough and resort to ad hoc anonymisation techniques. For qualitative research, anonymisation is far more complicated: it requires a much greater degree of personal judgement, and you should follow the best practices [8] highlighted by the UK Data Service.
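For a simple quantitative dataset, the basic steps might look like the Python sketch below: direct identifiers are dropped and an exact age is replaced by a coarser band. The field names (`name`, `email`, `age`) and the ten-year band width are assumptions chosen for illustration, not a recommendation from the guidance cited above.

```python
# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise_record(record):
    """Return a copy of the record without direct identifiers or exact age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


participant = {"name": "Jane Doe", "email": "jane@example.org", "age": 37, "score": 4}
print(anonymise_record(participant))
```

The function builds a new dictionary rather than editing the original, so the raw data stays untouched until you decide to delete it. Note that anonymisation is only complete once the original identifiers have been irreversibly deleted from every copy you hold.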
When anonymising data, it is good practice to consider how and why your data could be linked to other datasets and to take steps to prevent such linking from being possible.
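One way to estimate linkage risk, which the guidance above does not prescribe, is a k-anonymity style check: count how many records share each combination of indirect identifiers and flag combinations that occur fewer than k times. The Python sketch below assumes hypothetical field names and a threshold of k = 2; a suitable threshold depends on your data and context.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=2):
    """Return identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sorted(combo for combo, n in counts.items() if n < k)


# Hypothetical records that have already had direct identifiers removed.
records = [
    {"age_band": "30-39", "gender": "F", "job_title": "nurse"},
    {"age_band": "30-39", "gender": "F", "job_title": "nurse"},
    {"age_band": "40-49", "gender": "M", "job_title": "dean"},
]

for combo in risky_combinations(records, ["age_band", "gender", "job_title"]):
    print("Unique enough to be linked:", combo)
```

Any combination flagged here could be matched against another dataset that holds the same attributes alongside a name, so it may need further generalisation or suppression before release.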
Footnotes
- [1] General Data Protection Regulation (GDPR) http://ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf
- [2] Safe Share https://www.jisc.ac.uk/safe-share
- [3] DataSHIELD https://www.datashield.ac.uk/
- [4] The Risks of Key Recovery, Key Escrow, and Trusted Third-Party Encryption https://www.schneier.com/academic/archives/1997/04/the_risks_of_key_rec.html
- [5] Anonymisation and Personal Data http://www.fsd.uta.fi/aineistonhallinta/en/anonymisation-and-identifiers.html
- [6] Ignorantia juris non excusat https://en.wikipedia.org/wiki/Ignorantia_juris_non_excusat
- [7] Anonymisation - Quantitative data https://www.ukdataservice.ac.uk/manage-data/legal-ethical/anonymisation
- [8] Anonymisation - Qualitative data https://www.ukdataservice.ac.uk/manage-data/legal-ethical/anonymisation/qualitative