Hacking AI for genomics
From chatbots to machine learning, genome scientists explore how AI can benefit biological research
Story by Katrina Costa, Science Writer at the Wellcome Sanger Institute. Photography by Mark Thomson
28 June 2024
The Wellcome Sanger Institute recently hosted its first three-day community AI hackathon, bringing together 25 researchers across the Wellcome Genome Campus.
Listen to this blog story:
Listen to “Hacking AI for genomics 28 June 2024” on Spreaker.
This machine learning (ML) and artificial intelligence (AI) hackathon inspired researchers to explore opportunities for applying these tools to all aspects of their work. It was a rare opportunity for experts across the Campus, including the Sanger Institute and EMBL’s European Bioinformatics Institute (EMBL-EBI), to share ideas and build professional connections. The aim was to explore innovative ways to use AI to solve their wide array of scientific and non-scientific work problems.
It was the first hackathon arranged by the ML/AI community on Campus, as part of the Biodata Developers’ Network (BioDev Network). Ronnie Crawford, Informatics & Digital Associate at the Wellcome Sanger Institute, organised the event and grouped participants into teams. The aim was to create proof-of-concept AI and ML solutions that could tackle specific problems in their work.
AI in science
Machine learning1, a subcategory of AI trained to identify specific patterns in data, has become increasingly important in advancing scientific research2. For example, researchers at the Sanger Institute showed ML can help determine the success of advanced genome editing approaches3. Over the past two years, the world has seen an explosion in generative AI tools4 that use patterns in data, such as language or audio, to produce statistically probable results. The scientific community must learn to adopt best practices in harnessing these tools to enhance data analyses and streamline processes.
AI holds significant promise for advancing research in genomics and bioinformatics. Scientists can use AI to analyse vast amounts of data, which will help identify new medicines, study genetic differences that affect disease risk and predict biological structures. The tools could speed up the research discovery process and increase its efficiency. In turn, AI will help free up researchers’ valuable time and focus on more important and complex tasks.
However, generative AI is an emerging field that continues to evolve rapidly. More exploration and learning are needed before it can be integrated into scientific workflows. The value of events such as this hackathon is bringing together experts from diverse backgrounds to explore how to get the most from these resources.
The Sanger Institute is uniquely placed to leverage the benefits of AI owing to its leading genomics expertise, cutting-edge sequencing technologies used to generate data at scale, and collaborative culture. The BioDev Network has been set up to work closely with the wider scientific community, to accelerate AI solutions to complex biological problems.
Hackathon highlights
Here are some of the projects that were tackled during the event:
Literature search and text mining
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is a comprehensive database of somatic (non-heritable) cancer mutations. The COSMIC team provides expert manual curation of published literature, a process that adds knowledge and insights to each mutation. However, it takes a long time to identify scientific papers that are suitable for curation, so the team investigated how AI could speed up this process. They trained an AI model to scan titles and abstracts to identify curatable papers, and it achieved an 82 per cent accuracy rate. This would allow the curators to focus their skills on the most promising studies.
Other database teams, including the Protein Databank in Europe (PDBe), also explored how AI could streamline their work. For example, using text mining to search the scientific literature and extract useful information on relevant biological structures and interactions to add to their database. They had some success using existing AI models from Google DeepMind and Europe PubMedCentral.
A personalised AI chatbot
The attendees were especially excited by a project devised by Eric Hidari, a Senior Software Developer at the Sanger Institute working on human genetics bioinformatics. He combined an AI chatbot with Slack, an instant messaging service widely used at the Sanger Institute. This user-friendly chatbot, named ‘AI God’, could be easily personalised and seamlessly integrated into different team’s individual workflows. By providing real-time assistance and feedback, it could improve collaboration, collective problem-solving and productivity across groups. It was created using Llama 3, an open-source free language model that can be run entirely on a local machine, even without an internet connection, so it is arguably a more secure system.
The demonstration setting was an amusing ‘alien anthropologist’ that analysed human behaviour. I prompted it to act as a science writer and asked what it thought about the hackathon. Here’s what it had to say:
“Fascinating idea! It sounds like your hackathon is tackling some really exciting challenges…I must admit, I’m particularly intrigued by the idea of creating a tailored Slack chatbot for scientific needs. Imagine having instant access to expertly curated information, personalised just for your research goals! It could revolutionize the way scientists collaborate and share knowledge.”
AI God
Slack chatbot, Wellcome Sanger Institute
A personalised AI chatbot
The attendees were especially excited by a project devised by Eric Hidari, a Senior Software Developer at the Sanger Institute working on human genetics bioinformatics. He combined an AI chatbot with Slack, an instant messaging service widely used at the Sanger Institute. This user-friendly chatbot, named ‘AI God’, could be easily personalised and seamlessly integrated into different team’s individual workflows. By providing real-time assistance and feedback, it could improve collaboration, collective problem-solving and productivity across groups. It was created using Llama 3, an open-source free language model that can be run entirely on a local machine, even without an internet connection, so it is arguably a more secure system.
The demonstration setting was an amusing ‘alien anthropologist’ that analysed human behaviour. I prompted it to act as a science writer and asked what it thought about the hackathon. Here’s what it had to say:
“Fascinating idea! It sounds like your hackathon is tackling some really exciting challenges…I must admit, I’m particularly intrigued by the idea of creating a tailored Slack chatbot for scientific needs. Imagine having instant access to expertly curated information, personalised just for your research goals! It could revolutionize the way scientists collaborate and share knowledge.”
AI God
Slack chatbot, Wellcome Sanger Institute
AI coding agents
Antonio Marinho, a Senior Bioinformatician at the Sanger Institute working on genomic surveillance, used AI to create a virtual ‘team’ of AI software developers. These agents shared ideas, problem-solved together and rapidly generated sophisticated code. Antonio said the AI software team delivered code that would take humans around 30-40 minutes to produce. The AI bots did it in just 30 seconds. This would enable developers to tackle more strategic and intricate tasks. He had success using OpenAI’s ChatGPT 4 (before 4o was released), but since some people were uncomfortable with the data-sharing policies used to train the model, he also tried alternative platforms. Unfortunately they were not as successful.
AI coding agents
Antonio Marinho, a Senior Bioinformatician at the Sanger Institute working on genomic surveillance, used AI to create a virtual ‘team’ of AI software developers. These agents shared ideas, problem-solved together and rapidly generated sophisticated code. Antonio said the AI software team delivered code that would take humans around 30-40 minutes to produce. The AI bots did it in just 30 seconds. This would enable developers to tackle more strategic and intricate tasks. He had success using OpenAI’s ChatGPT 4 (before 4o was released), but since some people were uncomfortable with the data-sharing policies used to train the model, he also tried alternative platforms. Unfortunately they were not as successful.
Image analysis
Another team explored how AI could be used to assist with segmenting and classifying microscopy images. This could prove valuable for researchers, with the AI providing an analysis of detailed image data from CRISPR gene-edited cells. Their expertise spanned genome quality assessment, transcriptomics, and genomics data analysis. This AI tool could be used by researchers to identify cellular structures, segment nuclei, and highlight other structures of interest. The project involved using deep learning models like CellPose to enhance the accuracy and efficiency of image-based analyses.
What’s next?
This engaging event showcased the potential for AI and ML to support and enhance the Sanger Institute’s research, both for the scientific community and non-scientific staff including Human Resources.
The BioDev network is already planning more hackathons, training sessions, and collaborative projects, so take a look at the BioDev website for upcoming events.
Footnote:
For a simple guide to key concepts in AI, see our article: Using artificial intelligence for genomic research on the YourGenome website.
Want to explore how you could use AI in biological research?
Don’t miss our FREE and OPEN conference – available in-person and online
Explainable AI in Biology 2024 #XAIB24
15-18 October 2024. Join us in-person at the Conference Centre, Wellcome Genome Campus, UK or online via YouTube Live
If you are interested in finding out how the machine learning and artificial intelligence community can apply AI to biological questions, please register for our free Explainable AI in Biology conference (#XAIB24), organised by the BioDev Network. More than 30 leading academic and industry speakers will be joining us to share their knowledge and experience. View the schedule here
“Thanks to everyone who took part in our recent ML/AI hackathon. The event was a great success, uniting brilliant minds from across operations and research. You shared your passion for integrating the latest AI technology into your work. By exchanging ideas and building on each other’s insights, you’ve produced innovative AI product ideas and tools. Congratulations to our winners who won awards for operational innovation, creative advancement, inclusivity, and sustainable practices. We all learned valuable lessons and laid a solid foundation for expanding our collaborations in AI across the campus. I eagerly anticipate our future work together.”
Dr Priyanka Surana,
BioDev Network Lead, Wellcome Sanger Institute
RELATED SANGER BLOG POST
Digital transformation and a new era in science
Digital technologies are rapidly evolving. James McCafferty, Chief Information Officer at the Wellcome Sanger Institute, discusses the latest opportunities and challenges for research, including generative AI and enhanced data science.






