Research data for the social sciences can be found through several sources. The Registry of Research Data Repositories is a general service for finding suitable data sets. Its results can be filtered by type, subject, access conditions and other useful categories, for example to show only results for the social sciences. Other useful data services are:
- gesisDataSearch – Search for social and economic research data across a diverse portfolio of data repositories and metadata services
- Social Science Open Access Repository (GESIS)
- ICPSR – Inter-university Consortium for Political and Social Research social science data archive
Data collection and structuring
Although most social science data pertain to humans, they tend to be very heterogeneous in type and format. Large-scale quantitative survey data, geographical data, government records and qualitative data (e.g. interview data) are only a few of the commonly available data types. The time spent organizing these very different types of research outputs is time well spent.
These are our recommendations for the handling of publications and data sets:
- Make use of the technical features of modern reference management software to store details of your articles, books and other data sources. Some software packages offer additional features, such as the ability to store and share data sets and to work collaboratively. The ULB offers training and tutorials for a number of reference management programs.
- Use a logical structure and a file naming convention that is easy to understand and describes your data, so that others can follow how your data is organized.
- Use versioning to clearly identify older versions of your data. This will not only save time but will also help make your research data reproducible and verifiable.
- Make your data sets readable across operating systems: avoid special characters in file names and use the _ character instead of spaces.
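The naming and versioning recommendations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `safe_file_name`, the pattern (date, description, version) and the example file are hypothetical and not a convention prescribed by the ULB.

```python
import re
import unicodedata
from datetime import date


def safe_file_name(description: str, version: int, extension: str, day: date) -> str:
    """Build a portable file name: date, description, version, extension.

    The pattern itself is an assumption made for this example.
    """
    # Decompose accented characters and drop everything outside plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of spaces or special characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_").lower()
    # A zero-padded version number keeps older versions sorted in order.
    return f"{day.strftime('%Y%m%d')}_{slug}_v{version:02d}.{extension}"


# Hypothetical example file
print(safe_file_name("Interview Halle (Gruppe A)", 2, "csv", date(2024, 3, 1)))
# prints: 20240301_interview_halle_gruppe_a_v02.csv
```

Putting the date first and zero-padding the version number means that a plain alphabetical listing of a folder already shows the files in chronological and version order.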
Legal, ethical and data protection aspects
In most cases, social science data sets refer directly or indirectly to human subjects, so legal, data protection and ethical matters must be thoroughly addressed.
If new data is being collected from subjects, special attention must be paid to issues such as consent forms and the clearing of any existing or resulting copyright or intellectual property rights (IPRs).
When drafting consent forms, specify exactly how and for what purposes the data will be acquired, and what will happen to the data throughout the project, including any plans for future sharing. A carefully worded consent form can also secure the use of the data for future analyses and research questions. Further information about consent forms and templates can be found here.
If work is conducted with existing data sets (census data, cohort data sets) or specialized data collection instruments (questionnaires, tests), all copyright and IPR issues associated with these resources must be cleared first. Otherwise, it may not be possible to re-use newly created data sets that were produced with copyright-protected resources.
Compliance with legal and ethical conditions can be time-consuming and may, in some cases, require additional project resources. It may thus be worth seeking advice from services at the University which can help with specific topics:
Data protection advice, drafting of legal and cooperation agreements and other documents –> the legal department of the MLU
Guidance on writing consent forms, advice on the methodological implementation of data embargo restrictions, archiving of closed access collections, online publication of data resources –> the Open Science Team of the ULB
Data formats and metadata
The ULB provides a list of accepted and preferred data formats which can be used to maximize the readability of data in the future. The list is updated periodically to ensure that the advice remains accurate. Other useful links and advice for choosing the right data formats for long-term preservation can be found here:
Good metadata is crucial for making your data understandable and usable. The Data Documentation Initiative (DDI) provides a suitable standard for describing social science data. With the latest DDI standard you can give others information about the entire research data cycle of a project or study, from design and planning through data collection, preparation and analysis to archiving. For aggregate statistical data, the SDMX standard is also widely used and accepted. To further enrich your documentation and make it interoperable, you may want to use some of the freely available documentation tools or converters.
The ULB offers staff members at the MLU support with creating documentation and expert advice on the use of a number of metadata standards such as DC, METS/MODS, DDI and TEI. Get in touch with us if you have questions.
Generally, when creating metadata and writing documentation, include information on the following topics:
- The context of your data collection including any important administrative remarks
- The structure of the data set, including relationships between data files and other documents
- The kind of validation, cleaning and quality assurance checks conducted
- Accessibility conditions
- Specific agreements for reusability and secondary usage of the resources
- Citation and acknowledgement statements
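As a rough illustration, the topics listed above could be captured in a small machine-readable record that travels with the data set. The sketch below is an assumption for demonstration only: the class `DatasetDocumentation`, its field names and the example values are hypothetical and do not follow the DDI or any other formal metadata schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetDocumentation:
    """One field per documentation topic (field names are hypothetical)."""
    context: str                 # context of the data collection, administrative remarks
    structure: dict              # file name -> role and relationship to other files
    quality_checks: list         # validation, cleaning and quality assurance steps
    access_conditions: str       # who may access the data and how
    reuse_agreements: str        # agreements on reusability and secondary usage
    citation: str                # citation and acknowledgement statement


def missing_topics(doc: DatasetDocumentation) -> list:
    """Return the names of topics that are still empty."""
    return [name for name, value in asdict(doc).items() if not value]


# Hypothetical example record
doc = DatasetDocumentation(
    context="Example household survey, wave 1",
    structure={
        "survey_v01.csv": "one row per respondent",
        "codebook_v01.pdf": "describes the variables in survey_v01.csv",
    },
    quality_checks=["range check on age", "duplicate respondent IDs removed"],
    access_conditions="",
    reuse_agreements="Secondary use permitted for non-commercial research",
    citation="Please cite the data set and acknowledge the project team",
)

print(json.dumps(asdict(doc), indent=2))
print(missing_topics(doc))  # prints: ['access_conditions']
```

A simple completeness check like `missing_topics` makes gaps visible before the data set is deposited. For formal publication, the record would still need to be mapped to a recognized standard.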
A number of aspects should be considered when preparing social science data for re-use by others:
- Seek early approval to re-use and share your data from surveyed participants
- Enough time should be planned to draft study/project-specific consent forms
- The anonymization of the data sets should be conducted taking into account the agreements in the consent forms and should be overseen by individuals with good data management skills and experience with quality control processes
- Data should be made available via secure data transfer methods
- Supporting documentation should be made available with the data sets
- A data user agreement and a license should be used to ensure that no attempts are made to re-identify or contact study participants, that data creators are acknowledged and cited correctly, and that the approved conditions under which the data set can be used are clearly outlined
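One small technical step within the anonymization work mentioned above is replacing direct identifiers with stable pseudonyms before a data set is shared. The following is a minimal sketch under stated assumptions: the function `pseudonymize`, the field names and the example records are hypothetical. A keyed hash of this kind is pseudonymization only. It does not deal with indirect identifiers (for example rare combinations of age, occupation and place) and is not by itself sufficient to anonymize a data set.

```python
import hashlib
import hmac


def pseudonymize(records, secret_key: bytes, id_field: str, drop_fields):
    """Replace the identifier with a keyed hash and drop direct identifiers."""
    cleaned = []
    for record in records:
        # HMAC with a secret key: the same ID always gives the same pseudonym,
        # but the pseudonym cannot be recomputed without the key.
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        row = {
            key: value
            for key, value in record.items()
            if key != id_field and key not in drop_fields
        }
        row["pseudonym"] = digest[:12]  # shortened for readability
        cleaned.append(row)
    return cleaned


# Hypothetical example records
records = [
    {"respondent_id": 17, "name": "A. Example", "age_group": "30-39", "answer": 4},
    {"respondent_id": 17, "name": "A. Example", "age_group": "30-39", "answer": 2},
    {"respondent_id": 23, "name": "B. Example", "age_group": "50-59", "answer": 5},
]

shared = pseudonymize(records, b"keep-this-key-secret", "respondent_id", {"name"})
for row in shared:
    print(row)
```

The secret key has to be stored separately from the shared data and under the same protection as the original identifiers, since anyone holding it can link pseudonyms back to respondent IDs.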