Having defined what we mean by data, it helps to consider which types of data you create or work with and what formats those data take. Both the type and the format of your data will dictate your data stewardship practices.
Data types generally fall into five categories:
Observational
- Captured in situ
- Can’t be recaptured, recreated or replaced
- Examples: sensor readings, sensory (human) observations, survey results
Experimental
- Data collected under controlled conditions, in situ or laboratory-based
- Should be reproducible, but can be expensive
- Examples: gene sequences, chromatograms, spectroscopy, microscopy
Derived or compiled
- Reproducible, but can be very expensive
- Examples: text and data mining, derived variables, compiled database, 3D models
Simulation
- Results from using a model to study the behavior and performance of an actual or theoretical system
- Models and metadata, where the input can be more important than output data
- Examples: climate models, economic models, biogeochemical models
Reference or canonical
- Static or organic collections of (peer-reviewed) datasets, most likely published and/or curated
- Examples: gene sequence databanks, chemical structures, census data, spatial data portals
Research data come in many formats: text, numeric, multimedia, models, software languages, discipline-specific (e.g. crystallographic information file (CIF) in chemistry), and instrument-specific.
Formats more likely to be accessible in the future are:
- Non-proprietary
- Open, documented standards
- In common usage by the research community
- Using standard character encodings (ASCII, UTF-8)
- Uncompressed (desirable, space permitting)
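As a rough illustration of the character-encoding criterion above, the following Python sketch checks whether a file's bytes decode cleanly as ASCII or UTF-8. The function names `classify_encoding` and `detect_text_encoding` are hypothetical and chosen for this example only. A successful decode does not prove the file was written in that encoding, so treat the result as a screening aid rather than a definitive test.

```python
from pathlib import Path
from typing import Optional


def classify_encoding(raw: bytes) -> Optional[str]:
    """Return 'ascii' or 'utf-8' if the bytes decode cleanly, else None.

    ASCII is tried first because every ASCII file is also valid UTF-8.
    """
    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def detect_text_encoding(path: str) -> Optional[str]:
    """Read a file as raw bytes and classify its encoding (hypothetical helper)."""
    return classify_encoding(Path(path).read_bytes())
```

A file for which `detect_text_encoding` returns `None` may be binary or may use a legacy encoding such as Latin-1. Re-saving it as UTF-8 before deposit is one option worth considering.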
Use the table below to find an appropriate and recommended format for preserving and sharing your data over the long term.
Table: recommended formats and other acceptable formats, by type of data (cell contents not recoverable except for the entries below)
- Vector and raster data
- Textual
- Spectroscopy data and other plots which require the capability of representing contours as well as peak position and intensity (JCAMP file viewers: JSpecView, ChemDoodle)
Sources: University of Edinburgh Information Services
University of Oregon Libraries
California Digital Library
Information duplicated from Oregon State University Libraries