Sharing data is now encouraged by major funding agencies, and many journals require it as a prerequisite for publication. The NSF specifically states:
"Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing."
In addition to meeting funder requirements, data sharing can broaden the impact of your research and facilitate advances in science. Depositing your data in a subject repository makes it easier for others to find and reuse.
Data citation is an important component of data sharing and data reuse. Citing data gives data creators credit for creating and sharing their work, and creates a trail of research progress similar to the citation of articles and books.
These guidelines will help you make sure that the data you generate and share can also be cited by others.
Check with the journal you're publishing in to see if they have a data citation format recommendation. Many journals and citation styles don't specifically require you to cite research data, or they don't give you specific citation guidelines for a research data set. In this case, you should still cite data you use in your analysis and publications with these key elements:
DataCite is an international organization that helps researchers to find, access, and use data. Their recommended data citation format is:
Creator (PublicationYear): Title. Publisher. Identifier
It may also be desirable to include information from two optional properties, Version and ResourceType (as appropriate). If so, the recommended form is as follows:
Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
For citation purposes, DataCite recommends that DOI names are displayed as linkable, permanent URLs, that is, as https://doi.org/ followed by the DOI name.
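As an illustration only, the following Python sketch assembles a citation string in the DataCite element order, placing the optional Version after the title and the optional ResourceType after the publisher, and renders the DOI as a URL through the doi.org resolver. The Dataset class, the function names, and the example record (including its DOI) are hypothetical and exist only for this sketch; check your target journal's style before relying on the exact punctuation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Minimal citation metadata; field names mirror the DataCite properties."""
    creator: str
    publication_year: int
    title: str
    publisher: str
    doi: str                             # bare DOI name, not a URL
    version: Optional[str] = None        # optional property
    resource_type: Optional[str] = None  # optional property


def doi_url(doi: str) -> str:
    """Return the DOI name as a linkable URL via the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def format_citation(ds: Dataset) -> str:
    """Build 'Creator (PublicationYear): Title. [Version.] Publisher. [ResourceType.] Identifier'."""
    parts = [ds.title]
    if ds.version:
        parts.append(ds.version)
    parts.append(ds.publisher)
    if ds.resource_type:
        parts.append(ds.resource_type)
    body = ". ".join(parts)
    return f"{ds.creator} ({ds.publication_year}): {body}. {doi_url(ds.doi)}"


# Hypothetical record, made up for this example.
example = Dataset(
    creator="Doe, J.",
    publication_year=2020,
    title="Hypothetical Survey Data",
    publisher="Example Repository",
    doi="10.1234/example",
    version="Version 2",
    resource_type="Dataset",
)
print(format_citation(example))
```

Leaving version and resource_type unset yields the shorter form with only the required elements.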
Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle.
The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider UK research community.
As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research.
The digital curation lifecycle
Digital curation and data preservation are ongoing processes, requiring considerable thought and the investment of adequate time and resources. You must be aware of, and undertake, actions to promote curation and preservation throughout the data lifecycle.
The digital curation lifecycle comprises the following steps:
Conceptualise: conceive and plan the creation of digital objects, including data capture methods and storage options.
Create: produce digital objects and assign administrative, descriptive, structural and technical archival metadata.
Access and use: ensure that designated users can easily access digital objects on a day-to-day basis. Some digital objects may be publicly available, whilst others may be password protected.
Appraise and select: evaluate digital objects and select those requiring long-term curation and preservation. Adhere to documented guidance, policies and legal requirements.
Dispose: rid systems of digital objects not selected for long-term curation and preservation. Documented guidance, policies and legal requirements may require the secure destruction of these objects.
Ingest: transfer digital objects to an archive, trusted digital repository, data centre or similar, again adhering to documented guidance, policies and legal requirements.
Preservation action: undertake actions to ensure the long-term preservation and retention of the authoritative nature of digital objects.
Reappraise: return digital objects that fail validation procedures for further appraisal and reselection.
Store: keep the data in a secure manner as outlined by relevant standards.
Access and reuse: ensure that data are accessible to designated users for first time use and reuse. Some material may be publicly available, whilst other data may be password protected.
Transform: create new digital objects from the original, for example, by migration into a different form.
Preservation of data is different from simple storage of data. For preservation purposes, data will be migrated from format to format as new storage models come into use, and the data's integrity will be maintained through the process. A good example of data preservation is the Inter-university Consortium for Political and Social Research (ICPSR), a social science data archive containing thousands of data sets from all over the world, dating back to the 1800s. Data in ICPSR has been and will continue to be properly managed to ensure access and usability over time.
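One common building block for maintaining integrity is a fixity check: record a checksum for every file, then recompute and compare after each copy or transfer. The Python sketch below is a minimal illustration using SHA-256 from the standard library; the function names and the sample files are hypothetical, and this is not a description of how ICPSR or any particular archive works. Note that a checksum confirms that stored bits are unchanged; it does not show that content survived a format migration, which needs separate validation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict) -> list:
    """List files that are missing or whose checksum differs from the manifest."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "survey.csv").write_text("id,value\n1,10\n")
    (data_dir / "readme.txt").write_text("Hypothetical dataset\n")

    manifest = build_manifest(data_dir)            # record checksums once
    unchanged = changed_files(data_dir, manifest)  # nothing modified yet

    (data_dir / "survey.csv").write_text("id,value\n1,11\n")  # simulate damage
    damaged = changed_files(data_dir, manifest)

print(unchanged)
print(damaged)
```

In practice the manifest would be saved alongside the data (for example as a text file) so that the comparison can be repeated after every move to new storage.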
Not many individual labs are equipped to preserve data for long-term use, so domain archives like ICPSR can be a good alternative. Several journals and funding agencies require data deposit into a repository (such as GenBank) for long-term reliable preservation.
There are hundreds of domain repositories. Some accept only data funded by certain agencies, and others accept any data that fits their collection policy. re3data.org is a registry of research data repositories that can be browsed by discipline.
Browse the registry to see if a repository there matches the long-term home you envision for your data. Keep in mind that not every subject repository will accept your data and not every repository is suited for long-term preservation. If you need help identifying a suitable repository for your data, contact us.