What type of data will you create, collect, link to or record?
Types of Data
Have you defined each element of your dataset (e.g., definitions of abbreviations used, units of measurement)?
Consider creating a data dictionary. A data dictionary is a file that describes each element of your dataset. If your dataset includes tabular (spreadsheet) data, the data dictionary would include a list of the fields in the table and what they mean, including units and precision.
If your data includes R or Python code or scripts, the dictionary would provide a brief overview of the purpose of the code (if not already contained in comments) and information about how the code relates to the dataset. [From Smithsonian Data Management Best Practices. Describing Your Data: Data Dictionaries (pdf)].
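As an illustration of the tabular case described above, here is a minimal sketch of a data dictionary written out as a CSV file, one row per field. The dataset (stream temperature measurements), the column names, and the units are invented for this example; they are assumptions, not part of any of the guides cited here.

```python
import csv
import io

# Hypothetical data dictionary for an invented table of stream measurements.
# Field names, units, and precision values are illustrative assumptions only.
DATA_DICTIONARY = [
    {"field": "site_id", "description": "Identifier of the sampling site",
     "type": "character", "units": "", "precision": ""},
    {"field": "sample_date", "description": "Date the sample was taken (YYYY-MM-DD)",
     "type": "date", "units": "", "precision": "1 day"},
    {"field": "water_temp", "description": "Water temperature at 10 cm depth",
     "type": "decimal", "units": "degrees Celsius", "precision": "0.1"},
]


def dictionary_to_csv(entries):
    """Return the data dictionary as CSV text, one row per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(entries[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


csv_text = dictionary_to_csv(DATA_DICTIONARY)
print(csv_text)
```

Saving this text next to the data file (for example as a separate CSV) keeps the definitions, units, and precision in one place that both people and software can read.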
Data dictionaries have several benefits:
- Keeping things consistent across a project. The dictionary can define data names, labels, units, constraints such as acceptable ranges of values, and other characteristics.
- Enabling software to process a data file, by providing details to the software about the file. This information might include the type of data in each column (integer, character, date, etc.); the name of the column; the physical units, if relevant; whether nulls are included; etc.
- Increasing interoperability and reuse of the data that you want to share and publish.
- Providing “human-readable” details to support discovery, interpretation and analysis.
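The consistency and software-processing benefits listed above can be sketched in code. The example below assumes a small machine-readable dictionary (the hypothetical `FIELD_RULES`) that records each column's type, whether nulls are allowed, and an acceptable range, and uses it to check one row of data. The field names, ranges, and the `check_row` helper are all invented for illustration.

```python
from datetime import date

# Hypothetical machine-readable dictionary: parser for the type,
# whether empty values are allowed, and an optional acceptable range.
FIELD_RULES = {
    "site_id": {"type": str, "nullable": False},
    "sample_date": {"type": date.fromisoformat, "nullable": False},
    "water_temp": {"type": float, "nullable": True, "min": -5.0, "max": 40.0},
}


def check_row(row, rules=FIELD_RULES):
    """Return a list of problems found in one row (a dict of strings)."""
    problems = []
    for name, rule in rules.items():
        raw = row.get(name, "")
        if raw == "":
            # Empty string is treated as a null in this sketch.
            if not rule["nullable"]:
                problems.append(f"{name}: missing value")
            continue
        try:
            value = rule["type"](raw)
        except ValueError:
            problems.append(f"{name}: cannot parse {raw!r}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} above maximum {rule['max']}")
    return problems


good = {"site_id": "A1", "sample_date": "2023-05-01", "water_temp": "12.5"}
bad = {"site_id": "", "sample_date": "2023-13-01", "water_temp": "99"}
print(check_row(good))
print(check_row(bad))
```

Running a check like this on every row before sharing a dataset is one way to confirm that the data actually match what the dictionary says.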
For more details on what might be in a data dictionary, how to make one, and examples, see:
- USGS Data Management: Data Dictionaries
- Smithsonian Libraries’ Describing Your Data: Data Dictionaries
- Open Science Framework: How to Make a Data Dictionary
Documenting Data
Metadata
ReadMe
Data Dictionary information created by The University of Iowa Libraries and used with permission, Creative Commons Attribution license (CC BY).
Ethics and Legal Compliance
◉ Does your data include sensitive information or research involving humans?
◉ Are there person-related ethical considerations with your data?
◉ What legal or intellectual property issues might you encounter?
◉ Does your University have a data ownership policy?