In 2007, while a researcher at Oxford, astrophysicist Kevin Schawinski co-founded what would become the largest online citizen science project to date. Galaxy Zoo involved several hundred thousand volunteers poring over images from the Sloan Digital Sky Survey to classify galaxies. Significant discoveries were made, the results were published in dozens of journal articles, and another site, Zooniverse, launched to apply the same process in other fields.
Since then, and in part as a result, citizen science has become all the rage. Funding agencies are keen to support it, the concept has proven useful, and it’s a popular pastime for those whose passion for science must be expressed outside of working hours. But just as the movement has earned its chops, it may be about to be upstaged.
“I think the whole old approach of ‘Let’s crowdsource it to the whole internet’, that’s I think somewhat if not largely superseded,” says Schawinski, now at ETH Zurich. “Just because thanks to machine learning, we don’t need half a million people to go click away at galaxies anymore. I think that is probably over.”
Galaxy Zoo sorted about a million objects, but the data streams astronomers will soon be analyzing will contain more like a billion or a trillion.
“That’s not crowd-sourceable even if everyone on the internet stopped looking at cat videos,” says Schawinski.
Part of the issue is that astronomy simply has more data to process than other fields. Schawinski recalls a forum on Big Data in which a pharma CEO attempted to convince the room of the importance of new analytical tools.
“He was saying how ‘a single human genome is as much data as the Hubble Space Telescope has taken over its lifetime’, to which I was giggling and thinking, ‘yeah, the Hubble Space Telescope is a tool that was developed in the 80s,’” says Schawinski.
Now, astronomical systems generate that amount on almost a daily basis.
The ETH team is currently developing neural networks to tackle the influx, as well as to extract more value from old datasets, rather than just building newer and more expensive telescopes. The idea is that, while current machine learning trials involve training the network on a few thousand pre-labeled images, eventually they would like to leave the machine unsupervised with large databases and have it come back with its own ideas of how the universe is organized. Perhaps it will produce an entirely new way to classify the heavenly objects we take for granted.
After four hundred years of tinkering with taxonomies using the scientific method, astronomers now have the chance to look outside of our species for a second opinion. The trouble is, neural network technologies have been developed to recognize images on a human scale–pictures of cats or bedrooms or faces. Astronomical images, in contrast, have a much larger dynamic range. Most of the pixels register at zero, and then a few are very bright.
After several frustrating trials, the team found that stretching the images with an inverse hyperbolic sine function (usually used to make starscape images easier for humans to perceive) also makes it easier for the neural network to find patterns. The stretch compresses the few very bright pixels while lifting faint ones out of the near-zero background. Their approach has implications for machine learning in other areas of science.
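For readers who want to see what such a stretch does, here is a minimal sketch in Python. It is not the ETH team's code. It uses the inverse hyperbolic sine (asinh), the form commonly used for displaying astronomical images, and the function name `asinh_stretch`, the `softening` parameter and its default value, and the toy pixel values are all assumptions made for illustration.

```python
import math


def asinh_stretch(pixels, softening=0.01):
    """Rescale pixel values to [0, 1] with an inverse hyperbolic sine stretch.

    `softening` (an assumed, illustrative parameter) sets where the curve
    changes from roughly linear (faint pixels) to roughly logarithmic
    (bright pixels). Smaller values lift faint pixels more strongly.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A perfectly flat image has nothing to stretch.
        return [0.0 for _ in pixels]
    norm = math.asinh(1.0 / softening)
    return [math.asinh((p - lo) / (hi - lo) / softening) / norm for p in pixels]


# Toy row of pixels: mostly empty sky, one faint source, one very bright one.
row = [0.0, 0.0, 0.0, 5.0, 0.0, 1000.0]
stretched = asinh_stretch(row)

# With plain linear scaling the faint source would sit at 5 / 1000 = 0.005.
linear_faint = 5.0 / 1000.0
print(f"faint source, linear scaling: {linear_faint:.3f}")
print(f"faint source, asinh stretch:  {stretched[3]:.3f}")
```

In this toy row, the empty sky stays at zero and the brightest pixel stays at one, but the faint source ends up many times brighter than it would under linear scaling, which is the property that makes the dim structure visible to a person and, per the article, easier for a network to pick up.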
“If you look at proteinomics, they have exactly the same problem. The dynamic range of their spectrum is very large,” says Ce Zhang, who leads the computer science portion of the team. “Hopefully in the next couple of years, we can have a neural network just designed for astronomy. That is my dream.”
It’s an admirable dream, but it somehow lacks the soaring soundtrack of the citizen science dream. Perhaps we millions of humans chiming in from all over the world–only to be made redundant–can somehow hold on to our unity of purpose. Perhaps we can redirect our efforts to an even greater cause.
Image: NASA – the Crab Nebula as seen by Hubble
Waiting with bated breath for suggestions of such causes in the comments
My understanding of one of the goals of the Galaxy Zoo project in the first place was to make labeled data for ML, and at least as far back as 2009 they were looking to automate things: https://arxiv.org/abs/0908.2033
Yup, we sort of covered that sort of at the time: https://www.lastwordonnothing.com/2010/06/01/the-new-cincinnati/. But Jessa says they’ve found a new way to do it. Or something. My understanding of these things is inexact. Anyway, thank you for writing.