Active data storage and backup
Before starting practical work, you should select storage and backup solutions. Good choices will keep data safe throughout a research project, even when unforeseen problems arise.
Updated: 24 July, 2019
Active data is data that is added as a research project develops. Knowing where to store it, including physical copies, digitised documents, datasets and multimedia files, is important. Getting this right can prevent the failure of your project, because good data storage practices protect you from data loss and enable effective collaboration.
Before making any plans about active data storage, you should check whether your organisation has a preferred approach or has solutions already in place such as networked drives or institution-provided cloud storage. This could reduce costs and help you comply with local research data policies.
Tips for safe data storage
Storage media fail! You WILL experience a failure in your lifetime. Whether or not it is catastrophic depends on the steps you take. To store digital information, you can choose [1] CDs/DVDs, hard drives, SSDs, the cloud [2] and other media. Each storage solution presents risks, often related to the long-term reliability of the device (e.g. a desktop computer, a laptop, a portable device), the medium used (e.g. a hard drive as opposed to a DVD) and, for projects that span a number of years, the longevity of the software required to access and use the data. Best practice [3] suggests that:
- You use at least two types of storage media
- You replace them every two to five years
- You carry out integrity checks [4] (based on the principle of fixity [5]).
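Integrity (fixity) checks usually work by recording a checksum for every file and recomputing it later: if a checksum no longer matches, the file has been altered, corrupted or lost. The following is a minimal Python sketch of that idea using SHA-256 from the standard library. The function names `sha256_of`, `build_manifest` and `verify_manifest` are hypothetical, and the dedicated fixity tools linked in the footnotes will be more robust in practice.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are now missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

You might save the manifest as JSON next to each copy of the data (for example with `json.dump`) and rerun `verify_manifest` against every storage medium on a schedule. Note that this sketch only checks files listed in the manifest; files added later are not reported.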
In some cases, you might wish to store laboratory notes throughout a project. The state-of-the-art approach to this is to use electronic lab notebooks [6], which allow you to make notes but also to share, search, protect, and back them up.
There is a growing trend towards using cloud-based, Dropbox-style storage products for active data. Whilst undoubtedly convenient, they have potential drawbacks that need to be considered. For instance, although some products allow a degree of “roll-back” to previous versions, many don’t. If something is deleted from such a storage space, recovery may be difficult, if not impossible. Similarly, overwriting a file may prevent you from recovering a previous version. In most cases, cloud-based storage is not a substitute for a correctly configured and implemented backup procedure.
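Where a storage product offers no roll-back, one simple local safeguard is to copy the existing file aside before overwriting it. The following is a minimal Python sketch of that idea, not a feature of any particular cloud product: `save_with_history` is a hypothetical helper that keeps numbered copies (`name.v1`, `name.v2`, …) in a separate folder and retains only the most recent few. It does not replace a proper backup, because the copies sit on the same storage as the original.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def _versions(history_dir: Path, name: str) -> list[tuple[int, Path]]:
    """Return (version number, path) pairs for earlier copies of name, oldest first.

    Assumes name contains no glob special characters such as '[' or '*'.
    """
    found = []
    for p in history_dir.glob(f"{name}.v*"):
        number = p.name[len(name) + 2:]  # the text after "<name>.v"
        if number.isdigit():
            found.append((int(number), p))
    return sorted(found)


def save_with_history(target: Path, data: bytes, history_dir: Path, keep: int = 5) -> None:
    """Write data to target, first copying any existing version into history_dir.

    Copies are named <name>.v1, <name>.v2, ... and only the `keep` most recent
    are retained. Hypothetical helper for illustration only.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    history_dir.mkdir(parents=True, exist_ok=True)
    if target.exists():
        versions = _versions(history_dir, target.name)
        next_number = versions[-1][0] + 1 if versions else 1
        shutil.copy2(target, history_dir / f"{target.name}.v{next_number}")
        # Drop the oldest copies beyond the retention limit.
        for _, old in _versions(history_dir, target.name)[:-keep]:
            old.unlink()
    target.write_bytes(data)
```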
This might seem like a lot of work just to decide where to store your everyday work, but data loss horror stories [7] are a sobering reminder of the consequences of taking data storage and backup lightly!
Why back up research data?
The best way to keep your research data safe is to back it up consistently. Backups can protect your data from hardware failures, theft, software faults and more. You will need to decide what to back up, in what format, how many copies to keep, how long to keep them, how frequently to back up, and how often to take a snapshot of the data to preserve an “instance in time”. It is also important to assign responsibilities for data storage and backup: always make sure that someone in the team knows it is their job to check that the data exists and is safeguarded. Perhaps most important of all, test your backup procedures to make sure the data is actually being backed up and can be recovered should a disaster happen. Your organisation might have a strategy in place to manage backups, so check with the relevant IT specialists. Alternatively, there are tools [8] that can help you back up research data more efficiently through automated workflows. Note that backup should not be confused with data preservation!
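Taking a snapshot, limiting how many snapshots are kept, and testing that a snapshot can actually be restored can be sketched as follows. This is a minimal Python illustration under stated assumptions: the backup folder is a separate (ideally different) storage location that contains nothing but snapshots, snapshots are named by UTC timestamp so that they sort chronologically, and `take_snapshot` and `check_restore` are hypothetical names rather than part of any existing backup tool. An institutional backup service or automated tool will normally also handle scheduling, off-site copies and encryption, which this sketch does not.

```python
from __future__ import annotations

import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def take_snapshot(source: Path, backup_root: Path, keep: int = 7) -> Path:
    """Copy source into a new timestamped folder, then prune the oldest snapshots.

    Assumes backup_root holds nothing but snapshot folders created by this function.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    snapshot = backup_root / stamp
    suffix = 1
    while snapshot.exists():  # two snapshots within the same clock tick
        snapshot = backup_root / f"{stamp}-{suffix}"
        suffix += 1
    shutil.copytree(source, snapshot)  # also creates backup_root if needed
    # Fixed-width UTC timestamps sort chronologically as strings: oldest first.
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return snapshot


def check_restore(snapshot: Path, reference: Path, scratch: Path) -> bool:
    """Restore a snapshot into scratch and compare it byte-for-byte with reference.

    reference should be the data as it was when the snapshot was taken
    (for example, the unchanged source folder).
    """
    restored = scratch / "restored"
    shutil.copytree(snapshot, restored)
    return tree_digests(restored) == tree_digests(reference)
```

The `keep` parameter is where the decisions about how many copies to keep, and therefore for how long, become concrete. A restore check that returns `False` should be treated as a failed backup and investigated.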
Backing up personally identifiable information
When backing up your research, you need to think about how to protect personally identifiable information, particularly sensitive details such as racial or ethnic origin, political opinions or religious beliefs. Create as few copies of personal data as possible: this reduces security risks and makes it easier to destroy the backed-up data when the project ends.
The storage location of personally identifiable information is equally important. For instance, data protection regulation may require that you keep it within the European Economic Area (EEA) (note that many cloud services are not based in Europe). It is also paramount that you consider the security of such information: who might gain access, how they might gain access, and the consequences should this happen.
For some tips on how to keep data safe, see the Security section of this toolkit.
Footnotes
- [1] CESSDA Expert tour guide on RDM - Storage https://www.cessda.eu/Research-Infrastructure/Training/Expert-tour-guide-on-Data-Management/4.-Store/Storage
- [2] Cloud security collection https://www.ncsc.gov.uk/guidance/cloud-security-collection
- [3] UKDS - Data storage https://www.ukdataservice.ac.uk/manage-data/store/storage
- [4] Checksum Checker http://checksumchecker.sourceforge.net
- [5] Checksum http://dictionary.casrai.org/Checksum
- [6] Electronic Lab Notebooks - for prospective users https://www.gurdon.cam.ac.uk/institute-life/computing/elnguidance
- [7] How Toy Story 2 Almost Got Deleted https://www.youtube.com/watch?v=8dhp_20j0Ys
- [8] Fixity (tools) http://coptr.digipres.org/Category:Fixity