When marine biologist Julia Lowndes started graduate school in California in 2006, she expected to spend the next several years learning about the behavior of the Humboldt squid, which had recently—and dramatically—expanded its range north along the California coast.
But before she learned anything about the squid, she discovered, she had to learn to code.
The satellite tags that Lowndes attached to individual squid took second-by-second measurements of depth and temperature, so every day she collected thousands of data points. When she first tried to look at her raw numbers, the file was so huge that she couldn’t open it in Microsoft Excel. Her fellow biologists didn’t know how to help her, so she enrolled in a computer-science course intended for game designers and pored over the book Practical Computing for Biologists, gradually piecing together the programming skills she needed to manage and analyze her massive dataset. “I learned to code in a panic, and mostly on my own,” she says now, “and that’s how a lot of biologists still do it.”
Eventually, her lines of code revealed the story in the data: she learned that Humboldt squid off the California coast can swim 30 miles a day and dive to depths of almost a mile, and that they like to hang out below the surface, feeding on fish day and night.
Lowndes is now a marine data scientist at the National Center for Ecological Analysis and Synthesis in Santa Barbara, California, and she’s part of the Ocean Health Index, an international effort to track the overall state of the world’s oceans. The project’s early challenges were, in some ways, jumbo versions of those Lowndes faced in graduate school: The researchers wanted to turn mountains of ecological data into coherent stories that could be understood worldwide, but their usual tools weren’t up to the task. In a paper published today in Nature Ecology & Evolution, Lowndes and her co-authors describe how unwieldy email threads, vague and inconsistent Excel filenames, and other seemingly small annoyances steadily undercut the project and its goals.
So during their second global assessment, in 2013, the team members gradually transformed themselves from scientists into scientist-programmers. They learned to code in R and RStudio, track the different versions of their files in Git, and share their work on GitHub. They learned from groups like Software Carpentry and rOpenSci that are helping environmental scientists overcome their fear of data science. Now, Lowndes says, marine biologists working in the Baltic can much more easily compare their data to those gathered in the Pacific, and successive assessments can be confidently compared over time. Such comparisons could help inform and enforce wide-ranging protections, such as the newly proposed code of conduct for marine conservation. “There’s this myth that you’re either a coder or you’re not, and that environmental scientists are definitely not,” says Lowndes. “But when people see how powerful these tools are for collaboration and communication, they get on board.”
The Ocean Health Index isn’t the only international conservation effort to suffer from too much information and not enough communication: thanks to cheap technology and the rise of citizen science, we have more data about more species and habitats in more places than ever before, but a lot of those data aren’t being used to protect what they describe. Changing that requires the political will to solve global conservation problems, of course, but it also requires a language that crosses borders. Maybe the common language of conservation is R.
Top photo by Marcus Spiske.
Astronomers went through the same awakening, only earlier. And like your squid researcher, they generally picked up coding as they went along, some of them because they wanted to improve their video games. Even now, I’m not sure they have classes in it. Also, they use C. Nor does the field know what to do with astronomers who primarily write code; it tends to treat them as not-real astronomers, which has real career consequences. Given the difficulty of coding, this situation has always seemed miraculous to me.
Fascinating. Yes, it seems like the attitude in so many fields (including journalism) evolves from “we don’t need this” to “we need this, but we can just hire a code monkey to do it” to “crap, I guess we all have to learn this.” And early adopters get shoved into the code-monkey category until everyone finally arrives at step 3.
I was a professional developer for 20 years before I started my PhD in Earth Sciences. I noticed that *everyone* needed to be able to code but *no one* had any training in it. I spent a considerable amount of time rewriting terrible bits of code (some with bugs that invalidated research) and training my fellow students and, yes, advisors how to code.
Today’s scientist spends 90+% of their time in front of a computer. It seems ridiculous that departments don’t recognize this and require elementary programming as part of the core coursework.
Wonderful blog! My sister is a middle school librarian and she teaches kids to code. There’s also this big lag in school curricula in which coding is not generally taught. She tells me there are something like 700K jobs in NY state in coding that are unfilled. Her piece of advice to those teaching and learning it — skip over the “exercises” and go straight to writing code for something you are interested in achieving. She says it’s not really harder, just a bit more tedious, but more motivating, and then of course, you have usable code at the end of it.
Yup. I wrote about these pi-shaped researchers some years ago for the careers section of Science.
When I interviewed the founder of Software Carpentry, Greg Wilson, for the piece he said he’d wanted to be a science journalist himself!