News from BYOD#3
Bring Your Own Data (BYOD) workshops are the engine of the HEAP project, and are essential to its mission to develop a secure and scalable informatics platform to analyse large scientific datasets.
BYOD#3, held in Stockholm in November 2022, provided a collaborative working space for researchers to test a range of datasets, and to harmonise and integrate bioinformatic pipelines within the HEAP platform, which is based on Hopsworks 3.0 integrated into the CSC secure research infrastructure.
The HEAP research teams brought data from the following sources:
- Swedish Cervical Cancer Screening Cohort supported by Machine Learning for metagenomics (Karolinska Institute and ICM, University of Warsaw)
- The Finnish Maternity Cohort and the associated Wearable Data Collection study (University of Oulu and Karolinska Institute/Stanford University)
- HPV Vaccination Cohort (Tampere University Hospital, Finland)
- The Danish Consumer cohort (Statens Serum Institut, Denmark)
- Lifestyle study – TirolGesund (University of Innsbruck, Austria)
The main challenges for the HEAP researchers and technicians at BYOD#3 were to progress the design and implementation of common metadata for querying their datasets, to define the source code, resources, dependencies, repositories, and documentation for the analysis tools, and finally, to install the selected tools and implement the bioinformatic pipelines.
The research team from Karolinska, led by Sara Arroyo Muhr, provided two open-source bioinformatics pipelines, “HPV meta” and “HPV biopipeline”, which aim to detect HPV transcripts in RNA sequencing data and to perform taxonomic classification (bacteria, viruses, fungi) in human specimens.
The team began testing HPV-meta on Karolinska’s TCGA dataset and on a subset of 1400 biological samples from the Swedish cervical screening cohort; this testing is ongoing. The University of Oulu and Stanford University partners tested the HPV biopipeline with data from the Finnish maternity cohort and wearable studies. Work also began on adding a tool that provides post-analysis validation in R to this pipeline.
The Lifestyle study team from the University of Innsbruck started to implement preprocessing pipelines for epigenetic DNA methylation data in Hopsworks, which they will use to investigate exposure signatures, cellular ageing, and cancer risk.
Researchers from the Finnish Maternity Cohort and Wearable Data Collection worked on taxonomy classification for their tools and pipelines in Hopsworks.
Following BYOD#3, Sara Arroyo Muhr said, “Our main aim in testing the pipelines is to develop open-source and user-friendly bioinformatic tools that researchers can use and adapt for their data analysis. Testing pipelines with different datasets helps developers to improve the pipeline, and researchers to analyse their data. In 2023, we aim to develop and validate more bioinformatic tools within Hopsworks 3.0, targeting novel discovery and machine learning models for disease and cancer prediction.”
HEAP Project Manager Roxana Merino said, “BYOD #3 has demonstrated the progress of the HEAP technical platform and the challenges for reusing and sharing research data from different institutions. We are looking forward to continuing our work to create an integrated research framework for data analysis and knowledge discovery, and to make it available to the exposome research community.”
An overview of “HPV meta”, which was developed through a collaboration between medical researchers and bioinformaticians at Karolinska Institute, was published in the Nature Portfolio journal Scientific Reports in July 2022.
Are you a researcher who would like to find out more about the technical features of the HEAP platform?
In a short video on the HEAP YouTube channel, Alex Ormenisan from Hopsworks presents the key features of the Hopsworks platform for researchers.