The "FAIR data toolkit"
Learning resources for FAIR data
Meet HEAP researchers, FAIR frog and Data Gator in the “Swamped?” video series, and discover FAIR data through practical examples, tips, tricks, and a few common mishaps.
An overview of the BIBBOX – a platform that supports researchers in publishing their datasets in a FAIR manner, published in the journal New Biotechnology in November 2023 by the Work Package 7 team from the Medical University of Graz.
Tools for FAIR data
FAIR data self-assessment service
The Self Assessment Service (SAS) is an on-line questionnaire to guide researchers on how Findable, Accessible, Interoperable and Reusable (FAIR) their data is.
FAIR toolbox
A modular component-based toolkit for life sciences. Open-source software is combined with ID- and user-management to record data provenance graphs to enable implementation of research workflows in a FAIR manner.
More tools from HEAP
Hopsworks exposome Platform as a Service
PaaS for distributed management and analysis of exposome data. GDPR compliant, sharable data, bioinformatics tools and analysis pipelines. Provides Machine Learning out of the box.
How will the HEAP informatics platform help me as a researcher? HEAP software developer Alex explains the key features of HEAP in this 4-minute video.
Exposome Toolbox
The HEAP tools are part of EHEN's virtual exposome toolbox, which is a signposting hub for the tools developed by the European Human Exposome Network projects. They include data models, guidelines and protocols.
Scientific publications from HEAP
Project profile paper
Published in the journal Environmental Epidemiology in December 2021, an overview of the HEAP platform as a research resource for the integrated and efficient management and analysis of human exposome data.
Epigenetics and biomarkers
Published in Clinical Epigenetics in September 2024.
This study presents a new computational method to further refine existing preprocessing methods for Illumina methylation array data by excluding unreliable probes from downstream analyses. The methods to calculate MI and unreliability scores are implemented in an R package, epicMI, which is publicly available on GitHub.
Published in Nature Aging in September 2024.
Biomarkers of aging (BOA) are quantitative parameters that predict biological age. In recent years, many promising molecular and omic BOA have emerged with potential for translational geroscience and improving healthspan. This study surveyed experts in these areas to better understand current challenges for the translation of aging biomarkers, identified six key barriers to clinical translation and developed guidance to overcome them.
Published in Nature Medicine in June 2024 by the Work Package 8 team at EUTOPS, University of Innsbruck.
Study establishing that screening of cervical cancer using a DNA methylation-based triage showed better performance than cytology in the detection of prevalent disease and prediction of incident disease.
Published in Cancer Research in June 2024 by the Work Package 8 team at EUTOPS, University of Innsbruck.
Research on the effects of tobacco and e-cigarette smoking in distinct cell types and anatomical locations. The findings indicate the same epigenomic ‘cancer-associated’ changes in both e-cigarette and tobacco smokers.
Published in the International Journal of Cancer in February 2023 by the Work Package 8 team at the University of Innsbruck.
Using a DNA methylation signature to investigate changes in the epigenome caused by the HPV virus.
Published in the journal Genome Biology in February 2022 by the Work Package 8 team at the University of Innsbruck.
The findings imply that there are multiple epigenetic clocks, many of which are tissue-specific, and that the differential tick rate between these clocks may be an informative surrogate measure of disease risk.
Ethics and regulations
Policy comment
Published in the journal Health Policy in June 2023, policy recommendations to address the perceived risks of the proposed European Health Data Space (EHDS). The authors included a legal and ethical expert from the HEAP Ethics and Regulations Work Package team.
GDPR and joint controllership
Published in Open Research Europe in June 2022 by the Work Package 2 team at MLCF Foundation (now Lygature).
Proposal for a ‘funnel-and-sieves’ model for deciding who are joint controllers under GDPR legislation for the data processing in research consortia and who are not.
Medical ethics
Published in the journal Computer in July 2021 by the Work Package 7 team at the Medical University of Graz and the Work Package 2 team at MLCF Foundation (now Lygature).
Practical guidelines for those applying artificial intelligence, presented as a concise checklist for a wide group of stakeholders.
Artificial Intelligence (AI) and Machine Learning
Published in SIGMOD/PODS ’24: Companion of the 2024 International Conference on Management of Data and the ACM (Association for Computing Machinery) Digital Library in June 2024.
Presentation of the Hopsworks feature store for machine learning as a platform for managing feature data with API support for columnar, row-oriented, and similarity search query workloads. This publication addresses challenges related to feature reuse, organizing data transformations, and ensuring correct and consistent data between feature engineering, model training, and model inference.
Published in the journal New Biotechnology in May 2023 as part of a special issue entitled Artificial Intelligence for Life Sciences.
An overview of open research issues and challenges relating to biotechnology and Artificial Intelligence.
Published in Explainable AI: Foundations, Methodologies and Applications in October 2022. Authors included HEAP researchers at the Medical University of Graz (Data interoperability and sharing Work Package).
Book chapter demonstrating the crucial role that human-AI interfaces play in conveying the trustworthiness of AI solutions to their users.
Published in the journal New Biotechnology in September 2022 by the Work Package 7 team at the Medical University of Graz.
Description of concepts and examples of how explainability and causability are essential to demonstrate scientific validity, as well as analytical and clinical performance for future AI-based IVDs.
Published in the journal Computer in February 2022 by the Work Package 7 team at the Medical University of Graz.
An argument for using causability in medical artificial intelligence (AI) to develop and evaluate future human–AI interfaces.
Published in the journal IEEE in February 2022 by the Work Package 7 team at the Medical University of Graz.
A 5-step approach for developing personas to support human-centered design of AI applications, with practical examples from personas development for AI solutions for digital pathology.
Cohort data
The Swedish Cervical Screening Cohort
Cohort profile paper published in the journal Scientific Data in June 2024.
The Cervical Screening Cohort enrols women screened for human papillomavirus (HPV) and cervical abnormalities within the Stockholm region. It started in 2011 and has enrolled more than 670,000 women, contributing more than 1.2 million biobanked samples. It is systematically updated with individual-level data (including birthdate, sampling date, cytological, histopathological and HPV analysis results) from the Swedish National Cervical Screening Registry (NKCx). The cohort is ideal for longitudinal, long-term follow-up studies due to its validated documentation and registry-derived information.
Community-randomised HPV vaccination trial
Published in the journal Cell Host & Microbe in November 2023 by the Large sample Cohorts Work Package team at Karolinska Institutet and the University of Oulu.
The study shows that in the years following vaccination, cancer-causing HPVs are replaced by vaccine-untargeted HPV types with low or no risk for cancer. This effect was most pronounced in communities where gender-neutral vaccination campaigns had been carried out.
Consumer Purchase Data (CPD)
The Health, Food, Purchases and Lifestyle (SMIL) cohort is a prospective open Danish cohort that collects electronic consumer purchase data, which can be linked to Danish nationwide administrative health and social registries. This paper provides an overview of the cohort’s baseline characteristics and marginal differences in the monetary percentage spent on food groups by sex, age and hour of the day, published in the journal BMJ Open in March 2024.
Consumer Purchase Data (CPD)
Danish register-based cohort study including consumer purchase data from households where at least one member had their body mass index (BMI) measured in childhood, published in the journal PLOS ONE in August 2024.
Households with a member with BMI classified as overweight in childhood spent more on unhealthy foods and less on vegetables compared to the reference households, highlighting the need for nutrition education and intervention.
Consumer Purchase Data (CPD)
Cohort profile paper published in the journal Nature Scientific Reports in December 2023 by the Consumer Exposure Monitoring System Work Package team at Statens Serum Institut.
Description of the My Purchases cohort, a web-app enabled, prospective collection of CPD, covering several large retail chains in Denmark, that enables linkage to health outcomes. Combined with extensive product databases and health outcomes, CPD could provide the basis for extensive investigations of how what we buy affects our health.
Consumer Purchase Data (CPD)
Published in the journal BMJ Open in June 2022 by the Consumer Exposure Monitoring System Work Package team at Statens Serum Institut.
Protocol for a population-based inception cohort study aiming to investigate the underlying mechanisms for the heterogeneous course of IBD, including need for, and response to, treatment. Environmental factors and quality of life will be assessed using questionnaires and, when available, automatic registration of purchase data.
Metagenomics
Published in the journal Cancer Medicine in 2023.
Analysis of bacterial and viral communities in colorectal cancer using non-targeted deep-sequencing strategies enabling full microbiome characterisation up to species level.
Published in the journal Nature Scientific Reports in July 2022 by the Large sample Cohorts Work Package team at Karolinska Institutet.
Description of “HPV-meta”, the first open-source pipeline aiming to specifically detect HPV transcripts in RNA sequencing data.
Published in the journal Multidisciplinary Digital Publishing Institute (MDPI) in 2022.
Identification of a microbiome signature for predicting the risk of colorectal cancer that was validated in a new study, Colorectal Cancer Screening (COLSCREEN).
Biobanking
Published by Springer Nature Switzerland in March 2022 by the Data Interoperability and Sharing Work Package team at the Medical University of Graz.
Standards and tools for biospecimen quality management must be democratised for biorepositories in a variety of settings to have a truly global impact on research.
https://doi.org/10.1007/978-3-030-87637-1_20
Education and outreach
Published by Open Research Europe in February 2023 by the Education and Dissemination Work Package team at the International Agency for Research on Cancer (IARC).
The exposome is a broad and recent concept, and is challenging to define in a structured way. Personas have been used in computer science to improve our understanding of human-computer interaction. Using personas specific to exposome research is a useful way of supporting education activities for this complex scientific field.
Project reports and public deliverables
This document defines the life cycle and governance framework for all data to be collected, processed and generated during the HEAP project.
This document provides guidance for the development and use of the Reference Architecture within the Human Exposome Assessment Platform (HEAP). It describes the technical architecture, data and metadata flow and means for accessing data in the platform.
The HEAP project reports are all available on Zenodo...
- Bala, Piotr, Górski, Łukasz, Mroczek, Magdalena, Nowicki, Marek, Arroyo Muhr, Sara, Garcia-Serrano, Ainhoa, Pimenoff, Ville, Merino Martinez, Roxana, Dillner, Joakim. (June, 2023). Massively parallel Bioinformatics pipeline. Public deliverable 9.4. Zenodo. https://doi.org/10.5281/zenodo.10693082
- Roxana Merino Martinez. (September, 2023). The HEAP project poster. Zenodo. https://doi.org/10.5281/zenodo.8380149
- Heimo Müller, Roxana Merino, Stefan Negru, Martin Boeckhout, Evert-Ben van Veen. (May, 2020). Human Exposome Assessment Platform - Data Management Plan. Public deliverable 7.1. Update Jan 2023 (Version Final deliverable). Zenodo. https://doi.org/10.5281/zenodo.7572272
- Arroyo, Sara, Merino, Roxana, Pimenoff, Ville, Bala, Piotr. (December, 2022). Ontology and semantic metadata for metagenomic profiles and tools for NGS data management. Public deliverable 9.3. Zenodo. https://doi.org/10.5281/zenodo.7540992
- Górski, Łukasz, Mroczek, Magdalena, Nowicki, Marek, Bala, Piotr. (February, 2022). AI based alignment free (agnostic) taxonomic classification. Public deliverable 9.2. Zenodo. https://doi.org/10.5281/zenodo.7520385
- Szymkiewicz, Szymon, Bala, Piotr. (December, 2020). Bacterial and viral metagenomics. Public deliverable 9.1. Zenodo. https://doi.org/10.5281/zenodo.7520376
- Zhang, Allison, Pimenoff, Ville. (December, 2022). Exposome and metabolomics analysis pipelines. Public deliverable 5.3. Zenodo. https://doi.org/10.5281/zenodo.7499184
- Zhang, Allison, Snyder, Michael, Pimenoff, Ville. (December, 2020). Improved wearable Personal Exposome Monitors (PEMs). Public deliverable 5.1. Zenodo. https://doi.org/10.5281/zenodo.7499101
- Trier Møller, Frederik, Ewes, Caroline, Wilkowski, Bartlomiej, Chong, Steven, Grønborg Junker, Thor. (December, 2022). Consumer cohort - secure platform and recruitment. Public deliverable 4.1. Zenodo. https://doi.org/10.5281/zenodo.7499081
- Coombs, Heather, Kozlakidis, Zisis, Berger, Anouk. (December, 2022). Knowledge and Information - Phase 1 report. Public deliverable 11.3. Zenodo. https://doi.org/10.5281/zenodo.7499065