The General Data Protection Regulation (GDPR) doesn’t specify how personally identifiable information should be protected in practice. Solutions include the use of Safe Share to transfer data securely between partners and DataSHIELD to enable the remote and non-disclosive analysis of personally identifiable information.
Encryption is an approach favoured by some. However, it brings problems of its own, particularly around real-time access and the storage and recovery of encryption keys. Key escrow, i.e. the storage of a key by a trusted third party, is a potential solution to key loss, but it is not yet widely adopted in the research community and has its own drawbacks.
In any event, you should know that the GDPR is particularly strict when you deal with data revealing an individual’s:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Trade union membership.
The same strict rules apply to the processing of:
- Genetic data
- Biometric data
- Data concerning health
- Data concerning a natural person’s sex life or sexual orientation.
Types of personally identifiable information
There are different types of personally identifiable information:
- Direct identifiers: information that, on its own, allows you to identify an individual. This includes names, email addresses that contain the person’s name, fingerprints, facial photos, etc. This information presents a high risk.
- Strong indirect identifiers: information that allows you to identify an individual with minimal effort. This includes postal addresses, telephone numbers, email addresses that do not contain the person’s name, URLs of personal pages, etc. This information presents a moderate risk.
- Indirect identifiers: information that allows you to identify an individual when linked with other available information. This includes background information on people, such as age, location, gender and job title. This information presents a low risk.
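Although each indirect identifier presents a low risk on its own, several of them combined can single out an individual, so you should address indirect identifiers as thoroughly as you would the direct ones. When it comes to data protection regulation, ignorance of the law excuses no one, and your best efforts are expected.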
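As an illustration, the three categories above can be recorded in a small inventory of a dataset's columns. The following is a minimal sketch, not a prescribed method: the column names, the `COLUMN_TYPES` mapping and the `identifier_report` helper are all hypothetical, and which columns fall into which category is a judgement you have to make for your own data.

```python
from enum import Enum


class IdentifierType(Enum):
    DIRECT = "direct"
    STRONG_INDIRECT = "strong indirect"
    INDIRECT = "indirect"


# Risk level attached to each category in the list above.
RISK = {
    IdentifierType.DIRECT: "high",
    IdentifierType.STRONG_INDIRECT: "moderate",
    IdentifierType.INDIRECT: "low",
}

# Hypothetical column names for an illustrative survey dataset.
COLUMN_TYPES = {
    "full_name": IdentifierType.DIRECT,
    "postal_address": IdentifierType.STRONG_INDIRECT,
    "phone_number": IdentifierType.STRONG_INDIRECT,
    "age": IdentifierType.INDIRECT,
    "job_title": IdentifierType.INDIRECT,
}


def identifier_report(columns):
    """Return (column, category, risk) for every column listed in COLUMN_TYPES."""
    report = []
    for column in columns:
        kind = COLUMN_TYPES.get(column)
        if kind is not None:  # columns not in the inventory are skipped
            report.append((column, kind.value, RISK[kind]))
    return report
```

Calling `identifier_report(["full_name", "age", "score"])` would list `full_name` as a direct identifier and `age` as an indirect one, and skip `score` because it is not in the inventory.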
Anonymising research data
A key approach to protecting personally identifiable information is anonymisation, i.e. the irreversible removal of personal identifiers. As a starting point, establish whether your research is quantitative or qualitative. In quantitative research, anonymisation is somewhat simpler, because deleting the direct identifiers is sometimes enough. If your dataset is more complex (e.g. it contains free text), you will have to be more thorough and resort to ad hoc anonymisation techniques. In qualitative research, anonymisation is far more complicated: it requires a much greater degree of personal judgement, and you should follow the best practices highlighted by the UK Data Service.
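For the simplest quantitative case, deleting direct identifiers can be sketched as follows. This is an illustration under stated assumptions: the field names in `DIRECT_IDENTIFIERS`, the sample records and the `drop_direct_identifiers` helper are hypothetical. The removal is only irreversible if the original records are also securely deleted.

```python
# Hypothetical names of the fields that hold direct identifiers.
DIRECT_IDENTIFIERS = frozenset({"name", "email"})


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up sample data for illustration only.
records = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "score": 7},
    {"name": "B. Example", "email": "b@example.org", "age": 51, "score": 4},
]
cleaned = drop_direct_identifiers(records)
```

The function returns new dictionaries and leaves the input untouched, so the cleaned copy can be checked before the originals are destroyed. Fields such as `age` remain in the output and still need to be assessed as indirect identifiers.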
When anonymising data, it is good practice to consider how and why your data could be linked to other datasets, and to take steps to prevent such linking.
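One way to gauge linkage risk, offered here as an assumption rather than something the guidance above prescribes, is a k-anonymity-style check: count how many records share each combination of indirect identifiers and flag the combinations that occur fewer than `k` times. The field names, the sample records and the `rare_combinations` helper below are hypothetical.

```python
from collections import Counter


def rare_combinations(records, quasi_identifiers, k=2):
    """Return value combinations of the given fields shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up sample data for illustration only.
records = [
    {"age_band": "30-39", "location": "Leeds", "job_title": "Nurse"},
    {"age_band": "30-39", "location": "Leeds", "job_title": "Nurse"},
    {"age_band": "50-59", "location": "York", "job_title": "Mayor"},
]
risky = rare_combinations(records, ["age_band", "location", "job_title"])
```

Here the third record is the only one with its combination of values, so it is flagged. Such records are the easiest to match against another dataset; coarsening the values (for example wider age bands or a broader job category) or suppressing the record are common ways to reduce that risk.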