File formats

A file format is the way the information in a digital file is encoded. To keep research data usable, it is important to save them in a durable format.

The file format determines which software you need to open and work on a file. Software may become obsolete, and a file that depends on obsolete software can then no longer be opened.

Durable file formats are non-proprietary: they do not depend on specific paid software, developers or providers. They either have open specifications (open formats) or are so widely used worldwide that it is unlikely that, in the long run, no software will be left to open them.

Data archives, such as DANS and 4TU.ResearchData, assess the durability of file formats. They indicate which file formats best guarantee long-term usability and accessibility.

Choosing a file format

When choosing a file format, consider:

  • which file format works with the software you are going to use to analyse the data?
  • which file format works for similar data? Is there a standard?
  • which file format does the data archive (that you intend to use) advise?

If you use data analysis software, preferably choose software that allows you to export your data and save them in a format that can be used independently of that software.
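
As an illustration, the sketch below assumes your analysis runs in Python and writes a result table to CSV, a plain-text open format that almost any software can read. The function name export_to_csv, the file name results.csv and the example data are hypothetical.

```python
import csv
from pathlib import Path


def export_to_csv(header, rows, path):
    """Write a table to a UTF-8 encoded CSV file and return its path."""
    path = Path(path)
    # newline="" lets the csv module control the line endings itself
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Hypothetical analysis result
header = ["participant", "age", "score"]
rows = [["p01", 34, 7.5], ["p02", 29, 8.0]]
export_to_csv(header, rows, "results.csv")
```

Stating the encoding explicitly (here UTF-8) makes the exported file easier to read correctly with other software later.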

Changing the file format

It is often possible to convert a file to a different file format, for example a format which requires less storage space. Such conversion carries certain risks:

  • loss of content (data)
  • loss of characteristics of the file stored within the file (metadata)
  • loss of layout (e.g. in text files)
  • loss of quality (e.g. in graphic files)

When converting a file to another file format, it is advisable to keep a copy in the original format. If you discover that something has gone wrong during the conversion, you can then repair the damage using the original file.
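
A minimal sketch of this workflow in Python, assuming the original is a JSON file containing a non-empty list of flat records that you want to convert to CSV. It first keeps a byte-identical copy of the original, then converts, then reads the result back to check for loss of content. The function names, the file names and the "originals" folder are hypothetical.

```python
import csv
import hashlib
import json
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def convert_json_to_csv(src, dst, backup_dir):
    """Convert a JSON list of flat records to CSV, keeping the original."""
    src, dst, backup_dir = Path(src), Path(dst), Path(backup_dir)

    # 1. Keep a copy in the original file format before converting
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / src.name
    shutil.copy2(src, backup)
    if sha256(src) != sha256(backup):
        raise OSError("backup copy differs from the original")

    # 2. Convert: each JSON object becomes one CSV row
    records = json.loads(src.read_text(encoding="utf-8"))
    fields = list(records[0].keys())
    with dst.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)

    # 3. Check for loss of content by reading the CSV back.
    #    CSV stores only text, so compare against the values as strings.
    with dst.open(newline="", encoding="utf-8") as f:
        converted = list(csv.DictReader(f))
    expected = [{k: str(v) for k, v in r.items()} for r in records]
    if converted != expected:
        raise ValueError("conversion lost or altered content")
    return backup


# Hypothetical example input
Path("survey.json").write_text(
    json.dumps([{"id": 1, "answer": "yes"}, {"id": 2, "answer": "no"}]),
    encoding="utf-8",
)
convert_json_to_csv("survey.json", "survey.csv", "originals")
```

Even when this check passes, some information is lost: CSV does not record whether a value such as 1 was a number or text, which is an example of the loss of characteristics listed above.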

File extensions

The name of a file usually ends with a dot followed by a short code of 3 or 4 characters (for example .csv, .docx or .mp4). This file extension is an indication of the file format, and therefore of the software needed to open the file.
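
Because the extension is only part of the file name, it can be wrong. A minimal Python sketch that compares the extension with the first bytes of the file, which for many formats start with a fixed signature. Only three signatures are included as examples; the function name describe and the file names are hypothetical, and a real check would need a much longer list.

```python
from pathlib import Path

# A few well-known file signatures ("magic numbers") at the start of a file
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP container (also used by e.g. .docx, .xlsx, .odt)",
}


def describe(path):
    """Return (extension, format detected from the file's first bytes)."""
    path = Path(path)
    extension = path.suffix.lower()  # e.g. ".pdf"; "" if there is none
    with path.open("rb") as f:
        head = f.read(8)
    detected = next(
        (name for sig, name in SIGNATURES.items() if head.startswith(sig)),
        "unknown",
    )
    return extension, detected


# A PNG image that was given a misleading .pdf extension
Path("figure.pdf").write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
print(describe("figure.pdf"))  # ('.pdf', 'PNG image')
```

Renaming a file only changes its extension, not the way its contents are encoded, so software chosen on the basis of the extension alone may fail to open it.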

Published by RDM support

3 June 2016