SGS-GPS Workshop: Introduction to Research Data Management

Know Your Data

The Data

Any management is easier if you have a good handle on what, exactly, you're managing and who is involved. You know your data, but reflect on how aspects of your data might affect how you organize and share them. For example, if there are relationships between or among datasets, you may want to indicate this in your file system or naming convention. Consider the

  • Data types you will create or use
  • Methods of capture or creation, processing, and analysis
  • Existing data
  • Possible relationships among datasets

Another thing to keep in mind, because it will likely affect access to and storage of your data, is whether the data contain confidential or sensitive information. Confidential information is personally identifying (e.g., names, dates of birth). Sensitive information is any data that, if released to the public, would have an adverse effect (e.g., the location of the nest of an endangered bird species).

Lastly, make sure you consult any funding agency policies: you may need to share and/or preserve your data.

It's People!

Consider the members of the research team. Who will be responsible for keeping track of where data are and who has access to them? Who will decide whether to share, preserve, or archive the data? Sorting out the roles of your research team in regard to data management is key. That might be easy if you're working alone, but is potentially quite difficult in larger research groups. 

File Structures and File Names

File Structures

Consider what organizing principle will work best for keeping your data under control and easy to find. You can use any number of things (dates, methods, objects, data type, etc.) to group files and create a meaningful directory tree. What works for one study or experiment may not work for others. It's a good idea to think about this before you start collecting data. More importantly, be consistent and use your plan!
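
As one possible illustration, the sketch below builds a small directory tree grouped first by processing stage and then by collection date. The folder names (`raw`, `processed`, `docs`), the dates, and the helper `make_tree` are all assumptions made for this example, not a recommended standard.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: top-level folders by stage, subfolders by collection date.
LAYOUT = {
    "raw": ["20240115", "20240122"],
    "processed": ["20240115", "20240122"],
    "docs": [],
}

def make_tree(root, layout):
    """Create the folders described by `layout` under `root` and return their paths."""
    created = []
    for top, subfolders in layout.items():
        top_dir = Path(root) / top
        top_dir.mkdir(parents=True, exist_ok=True)
        created.append(top_dir)
        for name in subfolders:
            sub_dir = top_dir / name
            sub_dir.mkdir(exist_ok=True)
            created.append(sub_dir)
    return created

if __name__ == "__main__":
    # Build the tree in a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_tree(tmp, LAYOUT):
            print(folder.relative_to(tmp))
```

Writing the layout down once, in one place, is a simple way to reuse the same structure for every new study.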

File Names

The following guidelines will help you create useful filenames:

  • Avoid special characters
  • Use capitals or underscores instead of periods or spaces; try camelCase
  • Use 32 or fewer characters
  • Use the ISO 8601 date format: YYYYMMDD
  • Include version information
  • Use meaningful names
  • Above all else: BE CONSISTENT
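
The guidelines above could be applied with a small helper like the one sketched here. The function names, the order of the parts (description, date, version), and the decision to count the 32 characters without the extension are assumptions for illustration only.

```python
import re
from datetime import date

MAX_LENGTH = 32  # limit suggested in the guidelines; applied here without the extension

def camel_case(text):
    """Drop special characters and spaces, then join the words as camelCase."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return ""
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[:1].upper() + w[1:].lower() for w in rest)

def build_filename(description, collected, version, extension):
    """Assemble description_YYYYMMDD_vNN.ext (the part order is an assumption)."""
    stamp = collected.strftime("%Y%m%d")  # ISO 8601 basic date format
    name = f"{camel_case(description)}_{stamp}_v{version:02d}"
    if len(name) > MAX_LENGTH:
        raise ValueError(f"{name!r} is longer than {MAX_LENGTH} characters")
    return f"{name}.{extension.lstrip('.')}"

print(build_filename("soil moisture (plot #3)", date(2024, 1, 15), 2, "csv"))
# prints soilMoisturePlot3_20240115_v02.csv
```

Generating names from one function, rather than typing them by hand, is one way to stay consistent across a whole project.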

 

Documentation and Description

document all the things

Documentation

Keeping track of what data you have, what has been done to the data, and where the data are is vital to managing data. Documentation is not glamorous. In 2 years (or, if you're like me, 2 weeks), future you might be baffled by the arrangement of materials, forget where things are, or even forget what you have. Make sure to

  • Inventory all forms of data, digital and analog
  • Include your code book or data dictionary
  • Document your data: what has been done, where it is, and who has it
  • Track versions/use version control
  • Decide on a file system
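
For the digital part of an inventory, a short script can do much of the listing. The sketch below walks a folder and records each file's path, size, last-modified date, and checksum; the column names and the helpers `inventory` and `to_csv` are assumptions for this example. Analog materials, and the notes on what has been done and who holds each item, still need to be recorded by hand.

```python
import csv
import hashlib
import io
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["path", "bytes", "modified", "sha256"]  # assumed column names

def inventory(root):
    """Return one row per file under `root`, sorted by path."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": stat.st_size,
                "modified": modified.strftime("%Y%m%d"),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return rows

def to_csv(rows):
    """Render the inventory rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    # Demonstrate on a throwaway folder containing a single small file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "notes.txt").write_text("first visit")
        print(to_csv(inventory(tmp)))
```

The checksum column makes it possible to tell later whether a file has changed since it was inventoried.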

 

Codebooks and Data Dictionaries 

By describing data files, codebooks make data understandable and usable in the future. Codebooks can vary considerably from study to study and discipline to discipline. ICPSR's "What is a Codebook?" and Guide to Codebooks, and Princeton University Data and Statistical Services' "How to Use a Codebook," outline uses for and typical features of codebooks. Usually associated with databases, data dictionaries define the fields and relationships among fields and tables. But data dictionaries (and thesauri) can also be used to create a glossary of terms or define values in a study.

The overarching principles of consistency and application (i.e., just doing it the same way every time) are the most important here, and they apply to metadata as well.
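
To make the idea of a data dictionary concrete, here is a minimal sketch that defines a few fields and checks a record against them. Every field name, description, and allowed value is invented for illustration, and `check_record` is a hypothetical helper rather than part of any standard.

```python
# Hypothetical data dictionary for a nest survey; all entries are invented.
DATA_DICTIONARY = {
    "site_id": {"description": "Identifier of the survey site", "type": str},
    "visit_date": {"description": "Date of the visit, YYYYMMDD", "type": str},
    "nest_status": {
        "description": "Condition of the nest at the visit",
        "type": str,
        "allowed": {"active", "abandoned", "unknown"},
    },
    "egg_count": {"description": "Number of eggs observed", "type": int},
}

def check_record(record, dictionary):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for field, rule in dictionary.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value!r} is not a defined value")
    # Fields that appear in the data but were never documented are also flagged.
    for field in record:
        if field not in dictionary:
            problems.append(f"undocumented field: {field}")
    return problems

record = {"site_id": "A1", "visit_date": "20240115", "nest_status": "empty", "egg_count": 3}
print(check_record(record, DATA_DICTIONARY))
# prints ["nest_status: 'empty' is not a defined value"]
```

Keeping the definitions in one structure means the same dictionary can both document the data and catch values that were entered inconsistently.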

Metadata

metadata is a love note to the future

People annoyingly define metadata as "data about data" or, slightly more specifically, "structured data about data." By describing things, metadata helps us identify, discover, assess, and manage those objects. Put another way, metadata can serve as a surrogate for objects and helps us find and understand them. As documentation does, metadata makes objects meaningful for the future by providing important context.

An example of a metadata record is a catalog entry for a book, which has values for a number of fields. For example, The Catcher in the Rye is a value for the 'title' field. Metadata standards are collections of fields that have been created to describe certain types of objects. (There are a lot of different types of things out there!) It's always a good idea to use a standard whenever one is available.
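
A catalog-style record like the one described above might look as follows when written out as fields and values. The field names loosely follow Dublin Core element names, which is an assumption made for this sketch; the set of required fields and the helper `missing_fields` are likewise hypothetical.

```python
import json

# Assumed minimum set of fields; a real standard defines its own requirements.
REQUIRED_FIELDS = ("title", "creator", "date")

record = {
    "title": "The Catcher in the Rye",
    "creator": "Salinger, J. D.",
    "date": "1951",
    "type": "Text",
    "language": "en",
}

def missing_fields(rec, required=REQUIRED_FIELDS):
    """List required fields that are absent or empty in a metadata record."""
    return [name for name in required if not str(rec.get(name, "")).strip()]

print(json.dumps(record, indent=2))
print(missing_fields(record))  # prints []
```

The same check works for a dataset record: swap in the fields your chosen standard requires.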

A number of standards have been developed for different types of data, usually for data associated with domains of knowledge or disciplines. Often, these standards will have fields that provide necessary context for understanding a dataset. Ask around in your department or research group to see if there's a standard used in your field.

Metadata works best when it's consistent. Don't use synonyms to describe the same thing: choose a term! (Librarians and catalogers often use controlled vocabularies to keep things consistent, but you won't necessarily need to do that. We like this explanation of controlled vocabularies and some related concepts.)
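
If you do want a lightweight way to hold yourself to one term per concept, a lookup table is enough. The sketch below maps variants to a preferred term; the vocabulary itself and the helpers `build_lookup` and `normalize` are invented for illustration and are far simpler than a real controlled vocabulary.

```python
# Hypothetical vocabulary: each preferred term maps to the variants it replaces.
VOCABULARY = {
    "interview": {"interviews", "oral interview"},
    "survey": {"questionnaire", "poll"},
}

def build_lookup(vocabulary):
    """Map every variant (and each preferred term itself) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        for term in {preferred, *variants}:
            lookup[term.lower()] = preferred
    return lookup

def normalize(term, lookup):
    """Return the preferred term, or None when the term is not in the vocabulary."""
    return lookup.get(term.strip().lower())

LOOKUP = build_lookup(VOCABULARY)
print(normalize("Questionnaire", LOOKUP))  # prints survey
print(normalize("focus group", LOOKUP))    # prints None
```

Returning `None` for unknown terms, rather than guessing, leaves the decision about a new term to a person.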

If you're having difficulty with metadata, consult with the Library! 

 

Photo Credit: Metadata is a Love Note to the Future, photo by @kissane

Preservation and Storage

Storage

Always back up your data. Always. But aside from that, consider how best to store your data to ensure it is accessible and protected. If your data contains confidential or sensitive information, use encryption.
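
A backup is only useful if the copy matches the original. As a minimal sketch, the helper below copies a file into a backup folder and compares checksums afterwards; the function names and the folder layout are assumptions for this example, and the sketch does not cover encryption or off-site copies.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, backup_dir):
    """Copy `source` into `backup_dir` and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target

if __name__ == "__main__":
    # Demonstrate on a throwaway file so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "survey_20240115_v01.csv"
        original.write_text("site_id,egg_count\nA1,3\n")
        print(back_up(original, Path(tmp) / "backup").name)
```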

Preservation

Good data management practices facilitate data preservation, which may be required by your funding body. The goal is to make sure that the objects are understandable (using documentation and description) and useful. This last part requires that you keep in mind the file formats in which you might save your data. It's good practice to

  • Use open, non-proprietary file formats
  • Document/include any software package required to read and work with the data
  • Document whether multiple files comprise the data file structure
  • Include metadata
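
One quick way to act on the first point in this list is to flag files whose format may need a second look before preservation. In the sketch below, the set of extensions treated as open is an assumption chosen only for the example and is not an authoritative list; check what your discipline and your repository recommend. The helper `needs_review` is hypothetical.

```python
from pathlib import Path

# Assumed examples only; not an authoritative list of open formats.
ASSUMED_OPEN = {".csv", ".txt", ".json", ".xml"}

def needs_review(filenames, open_extensions=ASSUMED_OPEN):
    """Return the names whose extension is not in the assumed open set."""
    return [name for name in filenames
            if Path(name).suffix.lower() not in open_extensions]

print(needs_review(["survey_20240115_v01.csv", "analysis.sav", "notes.TXT"]))
# prints ['analysis.sav']
```

A flagged file is not necessarily a problem; it is a prompt to document the software needed to read it or to save an additional copy in another format.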

It's also a good idea to know who is in charge of the data. If you're privately preserving that data (i.e. not depositing it in an archive), make sure you know where it is!