Data management: how and why in RDI projects?
Data management planning aims to ensure that good data management practices and the responsible conduct of research are followed, and that data protection and security are properly taken care of (Data protection in research).
Data management planning covers creating, storing, describing, and organizing data during the project. RDI data can be measurement results, laboratory diaries, source code, programs, statistics, pictures, sound recordings, interview transcriptions, survey data or observations based on field work. Data can also be physical or biological. Data management planning ensures that the data can be reused, and that it is open according to the principles of open science: as open as possible, as closed as necessary.
When data management is well taken care of in RDI projects, everyone benefits. Open data increases networking and the impact of RDI activities and makes them better known to the public. It also speeds up research, enables repeatability and reduces overlapping work. Therefore, it is important to know what kind of data is collected in RDI projects and how it can be used.
For a researcher, producing and opening data are merits that can be added to the CV. When the data is found via a data archive and cited, the researcher also gets credit. Read more about the benefits of open access and further use of data from the Finnish Social Science Data Archive.
In the planning stage of the project
At HAMK, a data management plan must be made at the latest when the funding decision takes effect. The persons responsible for tasks related to data management are appointed in the data management plan. Circumstances and needs may change in RDI projects, and the data management plan should be adjusted and updated during the project. Funders may also have their own requirements for data management plans.
HAMK has committed to following the ethical principles for human sciences set by the Finnish National Board on Research Integrity (TENK). At HAMK, the support persons for ethical questions in research are Janne Salminen and Sajal Kabiraj. Necessary guidelines and recommendations for ethics in research will be discussed and agreed with the research partners.
If the RDI activities focus on human participants, the necessary research permits must be obtained before starting the project. If an RDI project focuses on an organization’s activities (for example in a teaching institution or an enterprise), a research permit for RDI activities must be sought from the target organization from which research subjects are recruited. An organization’s procedure for obtaining the permit is usually explained on its website. If a research permit or a prior ethical review is needed, it should be obtained well in advance. The ethical review must be sought from the correct ethical committee before a research permit is applied for.
In RDI projects, the materials may or may not be copyright protected. It is best to agree on the use of both kinds of material in the planning stage of the project. Read more about copyright agreements and authorship in the Finnish Social Science Data Archive’s Data Management Guidelines.
The ground rules for the project will be laid out in a research, project or collaboration contract which covers the project and its implementation. The contract (Preparation of RDI-projects, HAMK intranet) also defines the rights of use and publishing as well as ownership of background and result materials. Open access is required in principle for projects that have obtained public funding.
A general guide for starting projects, which helps with data management, research ethics, data protection, publication and communication, can be found on the HAMK intranet (requires logging in with HAMK staff ID).
Data management during a project
Drawing up a data management plan as part of starting a project
The data management plan is drawn up with the DMPTuuli tool by answering questions that cover all five areas of data management:
- general description of materials: collection and use of data,
- following the law and ethical principles,
- documentation and metadata,
- storage and backup during the project,
- opening, publishing, and archiving materials after the project’s completion.
HAMK has drawn up two data management plan templates of its own for DMPTuuli, one for research projects and the other for education and development projects. Choose the template that best suits your project.
The data management plan describes the materials and how they are collected, their ownership, copyright and right of possession, how the research participants are informed, and who gives access to the materials during and after the project. The plan also answers the following questions: does the material include confidential data, data or documents to be kept secret, or personal data; how is the data protected; and has a privacy notice been drawn up (Data protection – informing the data subject)? See also General Finnish DMP Guidance 2021.
Planning data collection is essential for the quality and success of the whole RDI project. Data collection must be planned so that the laws and basic societal values concerning people, animals and the environment are respected. Producing good documentation during the implementation stage is important, because describing data later is often difficult and sometimes even impossible.
Sometimes good data already exists: it may have been collected in previous research or published by authorities or enterprises in data archives. Citing data follows the same ethical principles and rules as citing literature: https://datacite.org/cite-your-data.html.
If the RDI activities focus on human beings, personal data and anonymization (Finnish Social Science Data Archive’s Data Management Guidelines) must be considered when planning the use of the data. Particular attention should be paid to the processing of sensitive personal data. Further information: Additional instructions for planning the management of sensitive and confidential data 2019, PDF (Tuuli project).
Informing research participants is less about the topic and objectives of the research and more about how personal data is managed in the project. The main purpose of data protection is to protect the research subjects. Following the General Data Protection Regulation (GDPR) and Finland’s Data Protection Act protects the research subjects’ rights and guarantees the legal protection of the project. The basis for processing personal data must always be justified. A data protection impact assessment may have to be carried out in research where processing personal data is likely to cause a high risk to the rights and freedoms of the data subject.
After being informed, the research participant should understand how their personal data is collected, used, stored, transferred, or otherwise processed. The Finnish Social Science Data Archive’s Data Management Guidelines include a detailed and up-to-date guide on processing personal data.
Before data collection, decide where and in what format the data is stored and how the files and their different versions are named. When saving, storing and processing data, software and storage media should be chosen carefully so that the technical quality of the material, as well as data security and protection, can be preserved throughout its whole life cycle.
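A consistent naming scheme for files and versions can be sketched in a few lines of Python. The convention below (project, description, ISO date, two-digit version number) and the function name `build_filename` are assumptions made for illustration, not a HAMK requirement. Adapt the parts to whatever the project agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name such as 'demo_interview-transcripts_2024-03-05_v02.csv'.

    The naming pattern is a hypothetical example, not an official convention.
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace every run of characters
        # other than a-z and 0-9 with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{day.isoformat()}_v{version:02d}.{extension}"


print(build_filename("Demo", "Interview transcripts", date(2024, 3, 5), 2, ".CSV"))
# prints: demo_interview-transcripts_2024-03-05_v02.csv
```

Putting the date in ISO format (year-month-day) and zero-padding the version number keeps files in chronological and version order when they are sorted by name.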
Storage formats should be chosen with long term storage in mind, so that the data can be transferred and used later. To ensure usability, save a copy of the file in a format that is common and supported by multiple programs. Up-to-date information on suitable file formats: https://digitalpreservation.fi/en.
HAMK’s Classification of data guide shows where you can store confidential data and data to be kept secret, and when you have to back up your data manually. To ensure that all the data collected and produced in the project is safe, it should be saved in at least two different places in different storage systems.
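When a manual backup is needed, it helps to verify that the copy is identical to the original. The following is a minimal sketch using only the Python standard library. The function names and the folder names `primary` and `secondary` are hypothetical, and the demonstration writes into a temporary directory only so that it can be run anywhere. In a real project the two locations would be on different storage systems, chosen according to the data classification.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target


# Demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "primary" / "survey.csv"
    original.parent.mkdir()
    original.write_text("id,answer\n1,yes\n", encoding="utf-8")
    copy = back_up(original, Path(tmp) / "secondary")
    print("Verified copy:", copy.name)
```

Storing the checksum next to the data also makes it possible to detect later alteration or corruption of either copy.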
Data security means making sure that the data is safe from destruction, alteration, and unauthorized use. The rights of different user groups to view and process the files must be defined. At HAMK, an individual can store data in their personal home directory (P-drive). Research groups order protected storage, which is also suitable for sensitive data, from the staff’s protected team folders (S-drive) through ServiceDesk. Sensitive data is transferred via secured e-mail or in encrypted files (Information Security, Privacy and Rules, HAMK intranet). If students are included in the research group, the data is stored in the shared folders (U-drive).
RDI projects use national storage services if part of the research group comes from outside HAMK. Research projects can use the IDA service. International research groups can also use EUDAT services for sharing data. Ask the RDI support team for more information about these services.
Documenting the data analysis, methods and code supports the repeatability of the research. Data collection and analysis practices are field-specific, and they also depend on the data type.
Metadata production should be started early in the project. Without proper metadata, the data cannot be found, understood or reused. Metadata enables the data’s retrievability, accessibility, long term storage and reuse. Each dataset should have its own directory, where the data and metadata are saved. The data collection instrument, such as a questionnaire or Webropol form, is saved as a text file. Descriptive data may also include code keys and README files. Read more in Making a research project understandable – Guide for data documentation (Fuchs and Kuusniemi) and in the Finnish Social Science Data Archive’s Data Management Guidelines, which give detailed instructions for processing qualitative and quantitative data.
At HAMK, metadata for collected data is produced in the national Qvain service or by updating the data management plan. The RDI support team helps projects that are concluding to produce metadata.
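The idea of one directory per dataset, holding both the data and its description, can be sketched as follows. The folder layout (`data/`, `metadata/`, `README.txt`), the README fields and the function name are assumptions chosen for illustration. They do not reproduce the Qvain metadata model or any official template.

```python
import tempfile
from pathlib import Path

# Hypothetical README fields; extend them to match the project's needs.
README_TEMPLATE = """\
Title: {title}
Creator: {creator}
Collected: {collected}
Description: {description}
Contents: data files are in data/; code keys and the data collection
instrument (for example the questionnaire) are in metadata/
"""


def create_dataset_directory(root: Path, name: str, **fields: str) -> Path:
    """Create <root>/<name> with data/ and metadata/ subfolders and a README.txt."""
    dataset = root / name
    (dataset / "data").mkdir(parents=True, exist_ok=True)
    (dataset / "metadata").mkdir(exist_ok=True)
    readme = README_TEMPLATE.format(**fields)
    (dataset / "README.txt").write_text(readme, encoding="utf-8")
    return dataset


# Demonstration with made-up values in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = create_dataset_directory(
        Path(tmp),
        "student-survey",
        title="Student survey",
        creator="Example project team",
        collected="2024-03",
        description="Responses to an example questionnaire.",
    )
    print((folder / "README.txt").read_text(encoding="utf-8"))
```

Creating the README at the same time as the directory makes it easier to keep the description up to date while the data is being collected.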
Before the project ends
Ownership, copyright and right of possession of the data, and the documentation related to them, as well as the funders’ conditions and recommendations for publishing and archiving, should be reviewed well before the project’s completion. Creative Commons licenses are suitable for research data and metadata. Creative Commons 4.0 or CC0 open licenses are recommended. Before data can be licensed, its ownership has to be cleared. During the final stages, it is also decided which data will be stored and for how long. Unnecessary and sensitive data will be destroyed while taking care of data security and protection.
Data and everything needed for understanding and further using it will be compiled in a folder and saved in file formats that are suitable for long term storage. RDI projects’ data will be stored and published first and foremost in data archives meant for research data, if possible.
Data storage systems recommended by HAMK and how to store other project data are described in HAMK’s General guide for concluding projects (requires logging in with HAMK staff ID).
Producing metadata is also a part of concluding the project. Information about the data and its storage is updated in the data management plan or produced in the Qvain service. Metadata produced in Qvain can be found on the national Research website. The RDI support team helps with data storage and producing metadata.