Open data

A digital lolly jar

Open source, open science and open access are all trends in science. Meticulous data management is therefore becoming increasingly important. A recently launched international hallmark encourages the sharing of data.

Text: Marion de Boo

Data archives are genuine treasure troves. Researchers spend a lot of time and energy collecting data. Yet once an article has been published or a thesis completed, the dataset often disappears into a drawer, gathers dust in an archive or sits on a hard disk that fails after a few years. The researchers are no longer interested: they have a new job or other interests and no longer maintain the website. And that is before we even mention the laptop full of unique research data that is stolen from a car or accidentally left behind on the train.

If datasets were stored in a sustainable and open manner, other researchers would be able to obtain valuable information from them. After all, researchers ‘stand on the shoulders of giants’, as Newton once wrote. In 2005, NWO and the Royal Netherlands Academy of Arts and Sciences therefore decided to establish DANS, the Netherlands institute for sustainable access to digital research resources. Currently, more than 40,000 datasets are stored at DANS and about the same number are requested by researchers each year. ‘The proportion of data that is open has risen in just a few years from 40% to about 70%’, says the director of DANS, Peter Doorn. ‘Partly as a consequence of a change in mentality among researchers and partly forced by the policy of research funding bodies.’

Combating fraud

Properly stored data should also be transparent, a value that is becoming increasingly important in science. Doorn: ‘Over the years, conventions have evolved about the correct citation of research literature. Such clear standards should also exist for checking the underlying research data.’ Transparent datasets make it more difficult to commit fraud in science and ensure that future researchers can verify the results of their predecessors. How exactly was this project tackled? Are the conclusions correct? Sharing data is not only efficient but also has interesting side effects. Sometimes contacts arise between researchers who are working on the same theme in different parts of the world. And according to Doorn, researchers who make their data available enjoy greater prominence and are cited more often. Are you scared that other people will steal your material? You do not need to be. At DANS, the owner of the data can determine who has access to the material and under which conditions.

Core Trust Seal

Data storage organisations need to satisfy strict requirements these days. An international hallmark was recently launched for this: the Core Trust Seal. For example, a meticulous data manager uses a persistent identifier, in the same way as books have an ISBN. Should a dataset nevertheless go missing, it can still be traced. Furthermore, the data must be supplied with proper metadata and documentation, and any changes made later to the original dataset must be clearly recognisable. The data storage organisation also needs to monitor the access licences. Doorn: ‘The security of the storage and the privacy of study subjects must be guaranteed. And of course the digital safe is properly secured against hackers. Researchers who entrust their datasets to a storage location with the Core Trust Seal can count on their data being in safe hands. There are about two thousand storage locations in the international research community and several hundred of these have been certified with the Core Trust Seal. We are pleased about that.’
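
For readers who like to see the idea in practice, the requirements named above (a persistent identifier, basic metadata and recognisable changes) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not how DANS or any certified archive actually works: the class DatasetRecord, its fields, the function fingerprint and the example identifier are all made up for this purpose. A checksum of the file contents is used as one possible way to notice that a dataset has changed.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 checksum of the dataset content as hex text."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry for one deposited dataset."""
    identifier: str   # persistent identifier, comparable to a book's ISBN
    title: str        # minimal descriptive metadata
    creator: str
    licence: str      # access conditions chosen by the data owner
    versions: list = field(default_factory=list)  # checksums, oldest first

    def deposit(self, content: bytes) -> bool:
        """Record the content; return True only if it differs from the latest version."""
        checksum = fingerprint(content)
        if self.versions and self.versions[-1] == checksum:
            return False  # identical to the stored version, nothing changed
        self.versions.append(checksum)
        return True


record = DatasetRecord(
    identifier="doi:10.1234/example-dataset",  # made-up identifier
    title="Example survey",
    creator="A. Researcher",
    licence="CC0-1.0",
)
record.deposit(b"respondent,answer\n1,yes\n")   # first deposit
record.deposit(b"respondent,answer\n1,no\n")    # a later change is recorded as a new version
```

Because every version leaves its own checksum behind, a later change to the original file does not go unnoticed, while the identifier stays the same for anyone citing the dataset.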

Popular datasets

Voters study

A frequently consulted dataset is from the Nationaal Kiezersonderzoek (Dutch Voters Study). This study has been held around the Dutch general elections since 1971. Researchers, the media and political parties can find a rich source of information here concerning almost fifty years of voting behaviour. How did the Dutch vote? What role did religious belief play in society? How satisfied were people with the sitting government?

Ageing

Medical datasets are also frequently consulted. The DANS databank contains part of the data of about 5000 elderly people who have been participating in the Longitudinal Aging Study Amsterdam (LASA) of the VU Medical Center since 1991. The study maps the functioning and well-being of elderly people. For example, how does lifestyle affect memory? Does playing music slow down cognitive ageing? What limitations do elderly people with osteoarthritis experience in their daily life? Does connectedness protect elderly migrants from loneliness?

Epidemiology

Much epidemiological research relies on reusing data about illness and health. Data from individual patients are combined and anonymised so that patterns can then be discovered, for example in the spread of viruses.
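
As a rough illustration of what combining and anonymising can mean, the Python sketch below replaces patient numbers with codes and then counts cases per region and week. It is a simplified example under stated assumptions, not a description of any real epidemiological pipeline: the function names, the record layout (patient number, region, week) and the salt are hypothetical. Strictly speaking, replacing a number with a salted hash is pseudonymisation; the data only become anonymous in the aggregated counts, where no individual codes remain.

```python
import hashlib
from collections import Counter


def pseudonymise(patient_id: str, salt: str) -> str:
    """Replace a patient number with a short code derived from a salted hash."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return digest[:12]


def weekly_counts(records, salt):
    """Count distinct patients per (region, week) without keeping their numbers."""
    seen = set()
    counts = Counter()
    for patient_id, region, week in records:
        key = (pseudonymise(patient_id, salt), region, week)
        if key not in seen:          # the same patient reported twice counts once
            seen.add(key)
            counts[(region, week)] += 1
    return dict(counts)


# Made-up example records: (patient number, region, week of diagnosis)
records = [
    ("p1", "North", 1),
    ("p2", "North", 1),
    ("p1", "North", 1),   # duplicate report of the same patient
    ("p3", "South", 2),
]
print(weekly_counts(records, salt="secret-salt"))
```

Only the totals per region and week leave the function, which is the kind of pattern a researcher reusing the data would look at.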

Ship voyages

During historical ship voyages, weather observations were made every few hours and recorded in ship logbooks. The wind force was described, for example, as ‘that in which well-conditioned top sails could just carry’ or ‘that which no canvas sails could withstand’. The Beaufort Scale for wind force did not exist before 1805. Meteorologists have combined thousands of observations from hundreds of ship voyages in climate models, which made it possible to reconstruct data about the climate at sea.

Archaeology

The collection contains a lot of archaeological material: data from thousands of individual excavations and exploratory drillings, including drawings, photos and maps. Combining such data provides far more insight into how a society developed and spread. It is also possible to produce archaeological expectation maps that indicate which locations are promising for an archaeological dig.