What’s in a (gene) name


Last autumn, Microsoft made a subtle change to its Excel spreadsheets, one which flew under the radar for normal mortals. Unlike many other software tweaks, this one made it easier to stop the software automating certain tasks: specifically, its tendency to automatically convert input into dates. One major reason for the change was the havoc this automation had wreaked on geneticists needing to populate Excel with gene names like MARCH1 and SEPT1. “Stop helping me,” one commenter pleaded in the Verge writeup: “Never automate anything without providing an easily discoverable means to de-automate it.” From your lips to god’s ear, mate.

This change from Microsoft was a welcome olive branch, as the human genome has had more than enough problems getting its naming conventions into the 21st century. Here’s my writeup about that from a couple of years ago.

* * *

Look, no one is trying to get a dick joke into the human genome. If it happens, it won’t be by design. No one even really thought it was a possibility until the late 1990s, when the physical chemistry professor Paul W. May was having a beer with some other science friends and they got around to talking about funny molecules. Everyone knew about the ring-shaped molecule called Arsole. It didn’t take long to conjure up several more funny science terms. May began to collect these, and soon had so many that he turned the collection into a blog. By 2008 the blog had become a book. (NB: both blog and book are written in Comic Sans. And he commits to the bit. Main text, table of contents, acknowledgments, and references – all Comic Sans. References!) The book has a whole separate section on gene names, and here you will find some of the spiciest names in science. By the time the book was published, however, some of them were already out of date – the HUGO Gene Nomenclature Committee (HGNC) had begun to take a keen interest in what geneticists were calling their new genes, and by 2006 had put the kibosh on 10 names deemed the most offensive. But if they thought their work was done, they didn’t know how much stranger it could get.

Back when researchers and their grad students first started the project of identifying genes and their variants, there was no governing body to supervise the naming process. So, as Elah Feder and Helen Zaltzman recount in a recent episode of the Science Diction podcast, any names they conjured up became instant canon. No one cared too much about standards because a lot of the first studies were in fruit flies. And so we were graced with slightly too-on-the-nose gene names like eyeless and antikevorkian.

But there was a method in their madness. A gene is often named after its function, or the loss thereof. Eyeless, well, you can imagine. Some of them got a little fancy, for example the tin man gene, so called because it caused a fruit fly to develop without a heart (geddit?).

Others got extremely fancy, probably the humanities double majors. Amontillado causes fruit fly larvae not to be able to hatch, a reference to Edgar Allan Poe’s gruesome story of a man buried alive in a small space. Things go full arcana with thisbe and pyramus. I don’t have the training for this explanation, either in genetics or in Latin poetry canon.

What’s interesting about paging through May’s book is how these names reflect the culture in which they were conceived. Can you guess the decade they found Sonic hedgehog? Similarly of its time is Evander – a zebrafish with this mutant gene is missing an ear – after Evander Holyfield, whose ear was bitten off in a famous fight with Mike Tyson. Antikevorkian prevents programmed cell death in plant cells. Where Sonic is a little flip, antikevorkian starts to edge right into the realm of bad taste. Similarly, in 2022 you’d probably think twice before christening a gene that makes fruit flies less tolerant of alcohol “cheap date”.

Which brings us to the dick jokes.

Celibate: Male flies are attracted to females but never mate. Dissatisfaction. “Involved in many aspects of sexual behaviour.” Farinelli: after the castrato. A plant gene that produces sterile male flowers. Icebox. Makes female flies uninterested in males. Superman: a gene whose mutation gives you extra dicks (where “you” = “a flower”). Kryptonite – you know what this does.

Finding one dick joke in a gene name is an unexpected delight. But as I leafed through pages of them in May’s book, I started to feel like I was trapped in a room full of seventh grade boys in 1997. Less delightful.

Which brings us back to the HUGO Gene Nomenclature Committee. The HGNC was established to get rid of these poorly conceived names, but only when a gene named in a fruit fly or other model organism turned out to exist also in humans. (The disease caused by a mutation in lunatic fringe isn’t funny, guys.) But if they thought they could get rid of bad taste and call it a day, they had another thing coming: and that thing was Microsoft Excel and its draconian autocorrect feature.

Now the problem wasn’t the rude names – it was the boring ones. Namely, the ones that Excel mistook for dates. American-formatted dates. Dutifully, the software turned the gene MARCH1 into the date March 1, and SEPT1 into the first day of September. So in 2019, the HGNC gave 27 more genes the chop, renamed into something that wouldn’t confuse Clippy. And that should have been the end of it. It wasn’t. They only got rid of the genes that fell afoul of American autocorrect. Clippy speaks Finnish too. In a paper called “Gene Names: Lessons Not Learned”, Mandhri Abeysooriya and her colleagues last year pointed out “a variety of additional novel error modes,” some of which were “likely related to locale language settings.”

A few papers had the human gene AGO2 converted to Aug-02 (where Excel was in Italian, Spanish or Portuguese). The gene MEI1 was converted to May-01 when Excel was speaking Dutch (mei). And “TAMM41 was apparently converted to ‘Jan-41’ due to similarity with the month of January in Finnish (tammikuu).”
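For the spreadsheet-wary, here is a rough Python sketch of how you might screen a gene list for symbols that look like "month plus number" in some language before they go anywhere near Excel. It is a heuristic, not a reproduction of Excel's actual parsing rules. The function name `date_like_locales` and the prefix table are made up for illustration, the non-English entries cover only the examples above, and "TAMM" is an assumed prefix chosen to catch TAMM41.

```python
import re

# Hypothetical and deliberately incomplete: month-like prefixes per language.
# Non-English entries cover only the cases mentioned in the text.
MONTH_PREFIXES = {
    "English": ["JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
                "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC"],
    "Italian/Spanish/Portuguese": ["AGO"],
    "Dutch": ["MEI"],
    "Finnish": ["TAMM"],  # assumed prefix of "tammikuu"
}


def date_like_locales(symbol):
    """Return the languages in which `symbol` resembles month + number."""
    hits = []
    for language, prefixes in MONTH_PREFIXES.items():
        # Longest prefixes first, so MARCH is tried before MAR.
        alternatives = "|".join(sorted(prefixes, key=len, reverse=True))
        pattern = "(?:" + alternatives + r")[ -]?\d{1,2}"
        if re.fullmatch(pattern, symbol.upper()):
            hits.append(language)
    return hits


if __name__ == "__main__":
    for gene in ["MARCH1", "SEPT1", "AGO2", "MEI1", "TAMM41", "TP53"]:
        print(gene, date_like_locales(gene))
```

Anything the function flags is a candidate for importing as text rather than letting the spreadsheet guess. An empty list only means this small table found nothing, not that the symbol is safe in every locale.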

Nor were dick jokes and Microsoft-incompatible nomenclature the end of the treats that lurked among the tens of thousands of genes in the literature. There were also the practical jokes.

A few decades ago German fruit-fly geneticists started naming their genes in ways that seemed almost calculated to twist English-speaking tongues into Spätzle (Exhibit A: the gene spätzle). The plant biologist Edward Farmer told me that, rather than being an unavoidable consequence of a global genetics community, this was in fact a very intentional deployment of weaponised linguistics: knirps, krüppel and spätzle don’t exactly roll off the American or British tongue. They were trolling. The plant community soon followed suit, with genes like knolle, wüschel, and zwille.

Farmer himself is no innocent here – when it was time to name his own plant gene, he went for gene name trolling Olympic gold, executing a linguistic double entendre: Fou2 is involved in plant defense mechanisms. It’s a completely unremarkable thing you can say anywhere in the world with a straight face and a clean conscience. Except in France, where the pronunciation (“foo-tu”) is a homonym of “fucked up”. “We did it deliberately for when we spoke in front of French speakers,” he says cheerfully. “With a bit of a theatrical presentation it worked quite well.” He shrugs: “Geneticists like to have fun.”

Someday someone is going to do a PhD on the semiotics of the gene names, and I will be here for it.

Photo credit: “Suprised Tin Man” by Thomas Hawk, licensed under CC BY 2.0
