The HEAP project newsletter - Issue 2

Interviews, events and updates from the Human Exposome Assessment Platform

January 2021 - Issue 2

The HEAP interview - metagenomics

We talk to Piotr Bala, leader of HEAP’s Metagenomic Analysis Work Package. He is Professor of parallel and grid computing, algorithms and programming at the University of Warsaw’s Interdisciplinary Centre for Mathematical and Computational Modelling (ICM).

First of all, what does metagenomics involve, and how is it being used in the Human Exposome Assessment Platform (HEAP) project?

Metagenomics is the study of genetic material in biological samples. In the case of HEAP, these are biospecimens from the Swedish Cervical Screening cohort, based at Karolinska Institutet (KI) in Stockholm. The aim of the Metagenomics Work Package is to help the biologists and bioinformaticians at KI to quickly and efficiently generate metagenomic sequencing datasets from their biospecimens, and to analyse them to identify bacterial and viral sequences.

My team at ICM uses its computing expertise to test bioinformatics pipelines that perform the analysis several hundred times faster than previously possible. The result of this work will be a better understanding of how bacteria and viruses impact our health. In the longer term, these datasets and insights will be available for researchers to access via the HEAP platform.

Who are the ICM team working on HEAP?

We are a multi-disciplinary team. My background is in physics and computer science, and I work on modelling enzymatic reactions and on the parallelization of molecular dynamics software. We also have two computer science faculty members, both with PhDs, providing technical expertise: Marek Nowiński, who has already worked on parallelization of the BLAST code, and Łukasz Górski, who has experience in parallelization of Artificial Intelligence (AI) tools. He also has extensive knowledge of software used in the HEAP project, such as Hadoop and Spark.

We also have a bioinformatician - Szymon Szyszkowski, who installed and tested the bioinformatic pipelines.

The newest member of our team is Magdalena Mroczek, who specialises in genetics and completed her medical studies at Warsaw Medical University. Since graduating she has been working in research laboratories, and she is currently extending her computer science skills through an ICM Master's degree programme in computational engineering.

Finally, the ICM team works closely with Roxana Merino, Ville Pimenoff and Sara Arroyo from Karolinska Institutet, who have provided us with data from biosamples. As the end users of the bioinformatics pipeline, they provide us with ideas and challenges. This cooperation is really valuable to us, and helps us fully understand their requirements.

How did your team at ICM come to be working on HEAP?

The team has been working with Karolinska Institutet for the past 6 years on BLAST, a widely used bioinformatics tool. BLAST compares a sequence of interest (the query sequence) to sequences in a large database and reports the number of matches. KI used BLAST to search for virus sequences in human DNA. This required a lot of computational time, so we developed a parallel version of BLAST that runs on supercomputers and shortened the analysis time from months to days or hours.
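Editor's illustration: the general idea of splitting a database search across workers can be sketched in a few lines of Python. This is a toy example only, not BLAST and not the ICM team's code. It counts exact matches instead of scoring alignments, uses threads on one machine as a stand-in for nodes of a cluster, and all names and sequences in it are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(query, sequence):
    """Count exact (possibly overlapping) occurrences of query in sequence."""
    count = 0
    start = sequence.find(query)
    while start != -1:
        count += 1
        start = sequence.find(query, start + 1)
    return count


def search_chunk(query, chunk):
    """Search one slice of the database; return {name: matches} for hits only."""
    hits = {}
    for name, sequence in chunk:
        n = count_matches(query, sequence)
        if n:
            hits[name] = n
    return hits


def parallel_search(query, database, workers=2):
    """Split the database into `workers` slices and search them concurrently."""
    items = list(database.items())
    chunks = [items[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(lambda chunk: search_chunk(query, chunk), chunks):
            merged.update(hits)
    return merged


# Hypothetical toy database of named sequences (assumption, not project data).
database = {
    "seq_a": "ACGTACGTGG",
    "seq_b": "TTTTGGGG",
    "seq_c": "GGACGTCC",
}
result = parallel_search("ACGT", database)
print(result)
```

Because each slice of the database is searched independently, the work divides cleanly among workers and the partial results only need to be merged at the end. That independence is what makes this kind of search a natural fit for parallel computing.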

For us, the HEAP project is an opportunity to continue our work with KI, looking at ways to shorten the time taken to analyse DNA using parallel computing. We plan to parallelize additional software packages to enable more extensive analysis of DNA.

What stage is your work at right now?

In December 2020, the ICM team completed tests of bioinformatics software pipelines on the ICM cluster computer. We began our latest round of work by testing 6 potential bioinformatics pipelines, based on a variety of different algorithms. During the tests we identified bottlenecks and corrected them to improve simultaneous analysis, so that the software can work in parallel on cluster computers.

Having run the tests using sample data from Karolinska Institutet, we narrowed the field down to 2 or 3 pipelines. The final decision will be taken soon.

What will your team be working on in 2021?

Our aim in 2021 is to build on the pipelines we have tested and selected, to make them more advanced. The next stage of work will focus on workflows and machine learning algorithms to identify viral DNA sequences in biological samples.

As a result of this work, by spring 2022, we aim to have developed agnostic taxonomic classification methods for virus genomes.

In the longer term, what do you hope will be the legacy of the HEAP project?

The project will advance knowledge on how the environment influences human health by developing the HEAP integrated platform to analyse health-related data, including genetic data. Such an ambitious aim requires diverse, multi-disciplinary experience and expertise, and this is offered by the HEAP consortium partners.

For ICM, it is important that we work closely with medical professionals, in order to provide them with an efficient solution to speed up genetic analysis, and therefore to improve the health of patients and of the whole population. This process is challenging, as it requires interaction between software developers and users, and can only be achieved through interdisciplinary and cross-institutional collaboration.

Upcoming exposome events

Virtual conference - 2021 USA - European Exposome Symposium - Exposomics, "COVID 19 and health disparities" 27th-28th January, 2021

This two-day virtual conference is hosted by the Mount Sinai Institute for Exposomic Research, in collaboration with the University of Brescia and the University of Utrecht.

Talks include “Exposome and Racism in Health Disparities”, “Identifying chemicals of emerging concern”, “Possible sources of COVID-19 first wave spread and new sampling approaches for the identification of air transmission danger”, “Environmental Biodynamics: redefining the role of time in the exposome”, and a panel discussion on “Leveraging Exposomics Research in a Global Pandemic.”

Virtual conference - "Biobanking for precision care: Lessons learned from global crises" 8th-10th March, 2021

This 3-day virtual conference will feature representatives from national biobanks, including the Danish national biobank, which is hosted by HEAP consortium member Statens Serum Institut.

Subjects to be covered include the response to the COVID 19 pandemic from a local and international perspective, approaches adopted by biobanks to ensure long-term survival and sustainability in support of precision medicine research, and an overview of the newest technologies and trends in the global biobanking sector.

Who's bringing the data? Meet the HEAP "Personas"

Report from the Bring Your Own Data workshop: One of the aims of the Bring Your Own Data (BYOD) workshop was to understand more about the people who […]

Prepare to test! HEAP's first Bring Your Own Data workshop

The partners in the HEAP consortium will gather for a three-day, virtual Hackathon event from December 16-18 2020. The event marks the end of the first year of […]

The HEAP interview - Allison Zhang on Personal Exposome Monitors (PEMs)

We talk to Allison Zhang, a postdoctoral scholar who is working on a HEAP project to pilot continuous personal exposome profiling of 100 volunteers during their pregnancies. She is […]

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 874662