We talk to Sara Arroyo, research coordinator at Karolinska Institutet in Sweden, about her specialist work in the proof-of-concept phase of the HEAP platform.
Thanks for taking the time to talk to us, Sara. Tell us more about your role in the HEAP project.
My main role is to coordinate large-scale metagenomic and metatranscriptomic work, including the implementation of data processing workflows and automated analytic pipelines within the HEAP informatics platform. The aim is to detect all the microorganisms that we are exposed to, including all known viruses, bacteria and fungi, and also to identify those that are currently unknown.
I am also responsible for making the Swedish cervical screening cohort data available on the HEAP platform for testing purposes and pilot studies.
Tell us more about this cohort, and how it will contribute to the HEAP project.
The Swedish cervical screening cohort is one of the largest cohorts in the world. It comprises all data and metadata from women in Sweden who are screened for human papillomavirus and cervical abnormalities by the cervical screening programme. The cohort currently includes around half a million women, with around 150,000 new women enrolled every year. In total, the cohort has around one million biological samples.
The programme began in 1964 and was expanded nationally between 1967 and 1973. Every female resident in Sweden is invited for a cervical screening test at least once every 3–5 years between the ages of 23 and 64. This is a well-established cohort that has been expanding for almost 60 years and has validated documentation, making it ideal for longitudinal, long-term follow-up studies.
We collect all the data and metadata associated with the participants and their corresponding samples, including information on their age, diagnosis, HPV status, invitations sent, and dates. Biospecimens have been collected and stored in the biobank since 2011, and we perform systematic epigenomic and metagenomic analyses on the samples.
The Swedish cervical screening programme uses a national cervical screening registry, the NKCx, as part of evidence-based surveillance and quality assurance. The registry has 100% national coverage and monitors and evaluates the programme using reports of screening invitations, cervical cytologies, histopathologies, and human papillomavirus (HPV) tests.
The programme uses Key Performance Indicators (KPIs), such as population test coverage, diagnostic profiles, invitation coverage, the proportion of abnormal tests that are followed up, and other statistics. These are reported back to the cervical screening programmes in each region to help improve their effectiveness.
For our HEAP-related research, we are performing next-generation sequencing of the cohort’s biological samples to investigate microbial exposures and other scientific questions such as: Does HPV mutate over time? Do the infecting HPV isolates change during persistent infections? Do bacterial communities differ between healthy tissue and lesions or cancer tissue?
What are the potential benefits of making the cohort data more widely available?
Developing accurate Artificial Intelligence (AI) algorithms requires huge amounts of data. By combining our data with data from other cohorts, we can help provide the volume that such models need.
Also, combining Swedish cohort data with data from other countries allows us to compare microbial exposure between different geographical settings. And, of course, new research questions could be explored using this validated data, which avoids the need to collect new data.
What kinds of results would you hope to see from a wider use of the cohort data?
Our ultimate goal is to eradicate cervical cancer. There are some important questions that we hope our work with HEAP can help answer, such as: Why isn’t HPV present in some cervical cancers? Do microbiota play a role in the development of lesions, and if so, can they be used as biomarkers? What will genotypes look like in women who were vaccinated at a young age? Can AI be developed to screen for and detect cases where HPV is persistent?
The cohort will provide detailed and reliable data that will inform the design of strategies to eradicate HPV and cervical cancer.
We also hope to achieve a full description of the environmental exposures that affect women (their exposome) and how these may interact to cause cancer.
What have been the main challenges with this project so far?
I would not call it a challenge as such, but collaborating with people from different fields of expertise has led me to revise my ideas about the FAIR principles, user-friendliness and open science. What I previously thought was user-friendly or FAIR has changed since I began working on this project.
What is the most exciting thing about this project, from your perspective?
It has been rewarding to collaborate with experts from many different fields, to learn from the obstacles that we encounter, and to solve them together.
It is also exciting to share our pipelines and data with other exposome researchers who lack the resources that we are fortunate to have with this cohort.
Thanks for the interview, Sara, and we wish you the best with your research!