Data management in Computer Science

In the exact sciences, large quantities of data of all kinds are often produced: data from various kinds of measuring apparatus, image files, databases, simulations, statistical data, geographical data, spreadsheets, etc. Data contained in publications, such as (Open) Office documents and CSV, HTML or PDF files, are also included.

To keep these data retrievable, accessible and understandable in the long term, the storage, sharing and archiving of the data must be carefully organised and documented. Responsible storage and handling of research data is what is meant by research data management (RDM).

Data Management Plan

A first step towards responsible data management is the drawing up of a Data Management Plan (DMP). In this plan the researcher(s) describe(s) what type of data will be collected, how and where the data are stored, and who will have access to the data. A DMP is often mandatory when submitting a grant or research proposal.

On the Data Management Plan page you can find the kind of questions which must be covered by a DMP. There are also links to models, checklists and online DMP tools which may help when drawing up a DMP.

The website of Wageningen University offers a template and examples of Data Management Plans for PhD research in eco-hydrology and eco-toxicology.

Note: drawing up a DMP must be done by the researchers themselves. An information specialist from the Library can offer advice and support during the writing. 

Metadata

Metadata are the data recorded about raw data while these are being produced: information which describes who collected the data, where, when, what type of data they are, within which discipline, etc. When depositing a dataset in a data repository you will also be asked to enter information which describes the dataset. A widely used standard is Dublin Core. This standard can be applied across a wide range of disciplines and is suitable for many types of data.
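As an illustration of what such a description can look like, the sketch below writes a few Dublin Core elements for a dataset as XML. It is a minimal example and not the input format of any particular repository: the record values, the function name `dublin_core_xml` and the `metadata` wrapper element are assumptions made for this example. Only the element names and the namespace come from the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

# Namespace and the 15 element names of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(record):
    """Serialise a {element: value} mapping as simple Dublin Core XML."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in record.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset: who collected it, when, what type, which discipline.
example_record = {
    "title": "Soil moisture measurements, example field site",
    "creator": "Example, A.",
    "date": "2016-06-03",
    "type": "Dataset",
    "subject": "eco-hydrology",
    "format": "text/csv",
}

print(dublin_core_xml(example_record))
```

A repository's own upload form or schema determines which of these elements are actually required.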

A number of disciplines have their own standards:

  • Biology: Darwin Core
  • Ecology: Ecological Metadata Language (EML)
  • Genomics: Genome Metadata, MIxS
  • Informatics: among others, Resource Description Framework (RDF) and eXtensible Markup Language (XML)
  • General: Dublin Core, DataCite Metadata Schema

An extensive overview for all disciplines can be found on the website of the Digital Curation Centre.

The person who submits the data is responsible for adding the metadata. An information specialist from the Library can help you select the most useful metadata. 

Storage of research data

Some research groups use a server of their own to store data, or space on the common network disk or external hard disks. 

UvA

An ICT Services/University Library project is investigating possible ways to create an environment in which researchers can store, manage and share data. Within the Faculty of Science two research groups take part in the pilot project, one within SILS and one within IBED.

There is also the RDM Repository project, which investigates repository systems for publishing and/or archiving research data in citable form. A research group from HIMS takes part in the B2share pilot as part of the EU-financed EUDAT initiative.

4TU.ResearchData

4TU.ResearchData is a collaboration of the three technical universities in the Netherlands. UvA researchers may use 4TU.ResearchData for long-term archiving of their data. 4TU.ResearchData provides the following services:

  • Self-uploads of datasets to the data repository. These are datasets consisting of 1 description and 1 dataset to which 1 Digital Object Identifier (DOI) is assigned. The datasets must be deposited together with the description via a web upload form (log in with your UvAnetID).
  • Depositing special datasets in the data repository. These may be large datasets which are deposited in another way than via the web upload form, but a special collection may also consist of more than one dataset and more descriptions (e.g. of measuring apparatus, location and/or period).

For depositing special collections and large quantities of data (> 20 GB) a separate agreement must be concluded. The party depositing the data is charged each year for the total quantity of data deposited in the repository in that particular year at that year's rate. The new rates have not yet been decided on but will be comparable to the rates of DANS and Vancis.
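The rule above can be summarised in a short sketch. It only restates the conditions given in the text; the names `Deposit` and `needs_separate_agreement` are made up for this example, and treating "20 GB" as a decimal quantity is an assumption.

```python
from dataclasses import dataclass

# Threshold taken from the text; decimal gigabytes are assumed here.
SELF_UPLOAD_LIMIT_GB = 20


@dataclass
class Deposit:
    """Hypothetical description of what a researcher wants to deposit."""
    size_gb: float
    n_datasets: int = 1
    n_descriptions: int = 1


def needs_separate_agreement(deposit):
    """True for special collections and for deposits larger than 20 GB."""
    is_special_collection = deposit.n_datasets > 1 or deposit.n_descriptions > 1
    return is_special_collection or deposit.size_gb > SELF_UPLOAD_LIMIT_GB


# A single 2 GB dataset with one description fits the self-upload route.
print(needs_separate_agreement(Deposit(size_gb=2)))
# A 50 GB deposit, or a collection of several datasets, does not.
print(needs_separate_agreement(Deposit(size_gb=50)))
print(needs_separate_agreement(Deposit(size_gb=5, n_datasets=3)))
```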

4TU.ResearchData has the Data Seal of Approval. The Data Seal of Approval guarantees sustainable storage of datasets according to international norms.

Dryad

Dryad is a digital repository for storing data which accompany scientific publications. Dryad arose out of an initiative of a group of journals and scientific organisations in the field of evolutionary biology and ecology. The charges depend on the journal in which the article is published.

Figshare

Figshare is a repository where researchers can make their research output available in a citable, shareable and retrievable way. It is useful for storing and making available small quantities of data temporarily. It is free up to 1 GB of data and a limited number of users. For larger quantities of data and users there is a pricing plan.

TAIR

The Arabidopsis Information Resource (TAIR) was created for the storage of genetic and molecular-biological information on the model higher plant Arabidopsis thaliana (thale cress). The UvA pays an annual contribution in support.

Other data repositories

Data repositories such as Ecological Archives (ESA, ecology), GitHub (software), the University of Florida Sparse Matrix Collection (mathematics and physics) and others can be found via registers of data repositories.

How do I get a DOI for my data?

Persistent identifiers (such as DOIs) are unique identifiers for digital objects. They remain linked to the objects during the whole life cycle of the data, and their function is to identify the objects independently of where or for how long they are stored. DOIs are not only assigned to publications, but also to data. Research data to which DOIs are assigned must be stored in recognised data centres or repositories.

DOIs for datasets are assigned by DataCite, an international consortium for data citation. In the Netherlands the TU Delft represents DataCite. DataCite Netherlands does not deal with individual researchers. Organisations that wish to take part in the DOI system must request an account from DataCite Netherlands. A unique prefix is then assigned to the organisation (e.g. 10.5117 for Amsterdam University Press). The organisation itself can then assign suffixes to its digital objects, provided they are unique. DataCite requires a minimal set of metadata for each DOI that has been assigned. The metadata are stored centrally and made public via suitable portals.
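The prefix/suffix structure described above can be sketched as follows. This is an illustration only and not DataCite's registration interface: the class `DoiRegistry`, the function `split_doi` and the suffix `dataset-001` are hypothetical, and the prefix pattern ("10." followed by a numeric registrant code) is a commonly used check rather than an official validator. Suffixes are compared case-insensitively because DOI names are case-insensitive.

```python
import re

# Assumed pattern for a DOI prefix: "10." plus a numeric registrant code.
PREFIX_RE = re.compile(r"10\.\d{4,9}(\.\d+)*")


def split_doi(doi):
    """Split a DOI (optionally written as a URL or 'doi:' string) into prefix and suffix."""
    text = doi.strip()
    for lead in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if text.lower().startswith(lead):
            text = text[len(lead):]
            break
    prefix, sep, suffix = text.partition("/")
    if not sep or not suffix or not PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"Not a valid DOI: {doi!r}")
    return prefix, suffix


class DoiRegistry:
    """Hypothetical bookkeeping of the suffixes an organisation has handed out."""

    def __init__(self, prefix):
        if not PREFIX_RE.fullmatch(prefix):
            raise ValueError(f"Not a valid DOI prefix: {prefix!r}")
        self.prefix = prefix
        self._assigned = set()

    def assign(self, suffix):
        """Return the full DOI, refusing suffixes that were already used."""
        key = suffix.lower()  # DOI names are case-insensitive
        if key in self._assigned:
            raise ValueError(f"Suffix already in use: {suffix!r}")
        self._assigned.add(key)
        return f"{self.prefix}/{suffix}"


registry = DoiRegistry("10.5117")       # prefix mentioned in the text
doi = registry.assign("dataset-001")    # hypothetical suffix
print(doi)
print(split_doi("https://doi.org/" + doi))
```

In practice the uniqueness check and the required metadata are handled by the organisation's DOI account, not by local code like this.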

Apart from Amsterdam University Press, there are no other DOI assigning bodies within the UvA (as yet). For more information, please contact the TU Delft library.

Support

If you have any questions on research data management in Biology, Informatics or Logic, please contact drs. G.H. (George) Meerburg. He can also give you advice on drawing up a DMP or on selecting a good metadata standard or repository for your research data. This service is still being developed, so please allow for a period in which knowledge and expertise are being built up.

  • drs. G.H. (George) Meerburg

    Information specialist Biology, Informatics and Logic

    G.H.Meerburg@uva.nl | T: 0205256643


Published by  RDM support

3 June 2016