
Data Management: File Naming and Formats

Folder and File Naming

Developing and following a consistent file naming strategy can make data easier to find, particularly when large numbers of files exist. Although it's common to think that file names will be remembered in the future, unorganized files and folders can lead to frustration down the road.

Good file naming allows you to identify the contents of the file from the name. It may be useful to consider including some or all of the following information, as long as it is done consistently.

Common Elements for Folder or File Naming:

  • File creator's name/initials
  • Date of creation
  • Version number
  • Project or experiment name or acronym
  • Location/spatial coordinates
  • Type of data
  • Conditions

Rules of Thumb for File Naming:

  • Keep file names as short as possible while including necessary information.
  • Do not use special characters such as #, <, >, $, +, %, or *. Different operating systems may interpret these as instructions to perform special tasks.
  • Use hyphens (-) instead of underscores (_), full stops (.), or spaces in file names.
  • If you add dates, use one format consistently, for example YYYY-MM-DD, which also sorts chronologically.
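
The rules of thumb above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names (clean_part, build_file_name), the order of the elements, the YYYY-MM-DD date format, and the v01-style version number are all hypothetical choices, not part of this guide's recommendations.

```python
import re
from datetime import date


def clean_part(text):
    """Make one name element safe: hyphens only, no special characters."""
    # Replace spaces, underscores and full stops with hyphens.
    text = re.sub(r"[\s_.]+", "-", text.strip())
    # Drop everything that is not an ASCII letter, digit or hyphen.
    text = re.sub(r"[^A-Za-z0-9-]", "", text)
    # Collapse repeated hyphens and trim them from both ends.
    return re.sub(r"-{2,}", "-", text).strip("-")


def build_file_name(project, data_type, created, version, initials, extension):
    """Join the cleaned elements with hyphens (element order is an assumption)."""
    parts = [
        clean_part(project),
        clean_part(data_type),
        created.isoformat(),   # assumed date format: YYYY-MM-DD
        f"v{version:02d}",     # assumed version format: v01, v02, ...
        clean_part(initials),
    ]
    stem = "-".join(part for part in parts if part)
    # The only full stop left is the one that separates the extension.
    return f"{stem}.{extension.lstrip('.').lower()}"


print(build_file_name("Soil Survey #2", "ph_readings", date(2024, 3, 5), 3, "JD", ".CSV"))
# prints: Soil-Survey-2-ph-readings-2024-03-05-v03-JD.csv
```

Generating names through one function, rather than typing them by hand, is one way to keep the chosen elements and their order consistent across a project.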

Why Hyphens Instead of Underscores?

  • Underscores become invisible when a link is underlined, while hyphens can still be seen.
  • For files that might be searched online, some search engines don't recognize underscores and others may downgrade page rankings if underscores are used.

File Format Choices for Long-Term Access to Data

As anyone with data saved on a floppy disk probably realizes, technology changes constantly and, if not planned for, hardware and software obsolescence can render data inaccessible and unreadable. To ensure long-term accessibility of research data, researchers should consider using file formats that share these characteristics:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Data saved in proprietary formats are vulnerable to obsolescence. Using open file formats such as the following, which a number of applications can open, increases the likelihood that research data will be accessible in the future:

  • Rich Text Format (.rtf), plain text (.txt), or PDF (.pdf) instead of Microsoft Word formats (.doc, .docx)
  • Comma-separated values (.csv) instead of Microsoft Excel formats (.xls, .xlsx)
  • TIFF or JPEG 2000 instead of GIF (.gif) or JPEG (.jpg)
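
As a rough illustration, the substitutions above could be encoded in a lookup table and used to flag files that may be worth converting before long-term storage. The names OPEN_ALTERNATIVES and suggest_open_formats are hypothetical, and the .tif and .jp2 extensions are assumed spellings for TIFF and JPEG 2000; the sketch only suggests formats and does not convert anything.

```python
from pathlib import Path

# Extensions named in the list above, mapped to the suggested alternatives.
# ".tif" and ".jp2" are assumed extensions for TIFF and JPEG 2000.
OPEN_ALTERNATIVES = {
    ".doc": [".rtf", ".txt", ".pdf"],
    ".docx": [".rtf", ".txt", ".pdf"],
    ".xls": [".csv"],
    ".xlsx": [".csv"],
    ".gif": [".tif", ".jp2"],
    ".jpg": [".tif", ".jp2"],
}


def suggest_open_formats(file_name):
    """Return suggested alternative extensions, or an empty list if none apply."""
    suffix = Path(file_name).suffix.lower()  # case-insensitive match on the extension
    return OPEN_ALTERNATIVES.get(suffix, [])


for name in ["field-notes-2024-03-05.docx", "ph-readings-v03.XLSX", "readme.txt"]:
    print(name, "->", suggest_open_formats(name) or "no change suggested")
```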