Research

Overview

I try to solve scientific problems with machine learning. Much of my work combines machine learning and crowdsourcing to do better science than either could alone. You can find my published research on ORCID and Google Scholar.

Machine learning is only useful if it’s applied to real data. I spend more than half my time building end-to-end data pipelines to enable data-centric ML. I also spend a lot of time designing ML systems that are effective at solving a scientific task yet simple enough to run reliably in production.

Foundation Models with Galaxy Zoo

I am the lead data scientist for Galaxy Zoo, a citizen science project where hundreds of thousands of volunteers classify images of millions of galaxies.

I build machine learning systems that collaborate with our volunteers to label galaxies hundreds of times faster. My deep learning models learn to accurately predict what volunteers would say for 1.5 million galaxies. They can then be adapted to solve new tasks with few (if any) additional labels.

These pretrained foundation models are available to all astronomers via my open-source package Zoobot. Zoobot is part of the software pipeline for the Euclid space telescope, launching in July 2023.

You can find more details on this blog and in this ICML workshop paper.

Mysterious Bursts from Space with CHIME

I am the Principal Investigator of Bursts from Space. We’re searching for fast radio bursts using citizen scientists and machine learning. We’ve investigated more than 55,000 candidate signals in our search and have made some intriguing finds, which I hope to share soon.

Funding

I started and am co-lead of the €150,000 European Space Agency grant “Exploring the evolution of galaxy morphology in different environments with Zoobot” (2022), funding a two-year postdoc to apply my software to ESA’s archives.

I started and am co-lead of the $29,000 Meta (Facebook) grant “AI-assisted Soft Segmentation of Distant Galaxies by Citizen Scientists”, funding a software developer to help me create efficient methods for semantic segmentation.

I am grateful to have also been awarded funding by the Software Sustainability Institute (Fellow ’22) and the Alan Turing Institute (Postdoctoral Enrichment Scheme ’22).