Data Management Services: Data Documentation/Metadata

This guide will assist Clemson researchers in managing their data, and includes information on creating Data Management Plans for funding agencies.

Documenting Your Data

For your data to be used properly by you, your colleagues, and other researchers in the future, they must be documented. Data documentation (also known as metadata) enables you to understand your data in detail and enables other researchers to find, use, and properly cite your data.

Begin documenting your data at the very start of your research project, even before data collection begins. Doing so makes documentation easier and reduces the likelihood that you will forget aspects of your data later in the project.

Researchers can choose among various metadata standards, often tailored to a particular file format or discipline. The following are general guidelines for the aspects of your project and data that you should document, regardless of your discipline. At minimum, store this documentation in a readme.txt file or the equivalent, together with the data. You can also reference a published article that contains some of this information.

Title
Name of the dataset or research project that produced it

Creator
Names and addresses of the organization or people who created the data

Identifier
Number used to identify the data, even if it is just an internal project reference number

Subject
Keywords or phrases describing the subject or content of the data

Funders
Organizations or agencies who funded the research

Rights
Any known intellectual property rights held for the data

Access information
Where and how your data can be accessed by other researchers

Language
Language(s) of the intellectual content of the resource, when applicable

Dates
Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule

Location
If the data relates to a physical location, information about its spatial coverage

Methodology
How the data was generated, including equipment or software used, experimental protocol, and other details one might include in a lab notebook

Data processing
Any information on how the data has been altered or processed, recorded as the work proceeds

Sources
Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed

List of file names
List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')

File formats
Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data

File structure
Organization of the data file(s) and the layout of the variables, when applicable

Variable list
List of variables in the data files, when applicable

Code lists
Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')

Versions
Date/time stamp for each file, and a separate ID for each version

Checksums
Values computed from each file's contents, used to test whether the file has changed over time
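
Several of the items above (the list of file names, the date/time stamps, and the checksums) can be recorded automatically rather than by hand. The following Python sketch is one possible way to do this and is not part of the original guidance. The function names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are assumptions made for illustration; any standard checksum algorithm would serve the same purpose.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """List every file under data_dir with its size, timestamp, and checksum."""
    root = Path(data_dir)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "bytes": stat.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "sha256": sha256_of(path),
        })
    return rows


def changed_files(old_manifest, new_manifest):
    """Return names of files whose checksum differs between two manifests."""
    old = {row["file"]: row["sha256"] for row in old_manifest}
    return [
        row["file"]
        for row in new_manifest
        if row["file"] in old and old[row["file"]] != row["sha256"]
    ]
```

A manifest built when the data is first deposited can be saved alongside the readme.txt file. Rebuilding it later and passing both versions to changed_files shows which files, if any, have been altered since the first manifest was made.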

Thanks to MIT Libraries for sharing their content.

Metadata Standards

Some metadata standards are very general and can be applied to a variety of situations, while others are more discipline-specific.