I don’t know where you’re sitting right now, but do me a favor: zoom out to a space shuttle’s eye-view of your spot on the blue marble. Now spin the globe until you’re looking at exactly the opposite side.
If you were here with me in London, our antipodal opposite would be (approximately) New Zealand. So now zoom back in with me, past the pretty blue beaches and the cabbage trees, maybe into one of those sweet little bungalows in Auckland, into its neatly decorated living room, where tucked into the drawer of an end table you’ll find a wedding album. Flip past the smiling bride and groom, and eventually the pictures will start to look weirdly familiar. Because behind their tired, happy honeymoon smiles, you can’t help noticing the architectural idiosyncrasies of your home town. And there, in the far left corner of the picture–that’s you.
I’m going to be honest here. You don’t look great. That pissy grimace is because you’ve just spilled coffee on your clean white shirt, a fitting end to what looks to have been a very long day. Unfortunately, I’m also noticing that this picture was taken during that phase when you were trying to bring back Hammer pants. If this image appeared in your Facebook feed, you’d detag it in a New York minute. You might even ask the “friend” who uploaded it to reconsider.
But here, safely tucked into a random stranger’s album in a drawer on the other side of the planet, that radioactively unflattering picture is fine with you. And the happy couple certainly doesn’t care: to them, you’re just human scenery.
Random strangers in our pictures are a prosaic fact of life, on par with breathing and gravity. The question of how many tourist pictures we’re in is an exercise in stoner philosophy, like wondering whether we’re all really perceiving the same color when we say “red”. Interesting enough, but fundamentally unknowable.
But those days are numbered. Face recognition software is getting better fast, and within a few years you should have an answer to the Where’s Waldo conundrum of where you appear in the background of the world’s tourist pictures. What kind of breadcrumb trail would those images reveal? If you could aggregate all the photos of you in the world, it might be possible to build up a surprisingly telling narrative of your life. That would be a dream for archaeologists and historians, and a nightmare for privacy advocates.
Even before Facebook rolled out its face recognition feature, there was a rising chorus of grumbling about the panopticon being formed by citizen photojournalists. Everyone’s got a camera and a Facebook account, after all, and that means you have much less control over the dissemination of your image than you once did. Then, this month, three Carnegie Mellon researchers revealed just how good the technology has become at telling who you are from a single snapshot.
In one experiment, Alessandro Acquisti’s team identified individuals on a popular online dating site where members protect their privacy through pseudonyms. In a second experiment, they identified students walking on campus — based on their profile photos on Facebook.
This is the kind of thing you can expect more of when Google rolls out its super secret face recognition algorithm. Right around this time last year, Google engineer David Petrou was giving a conference talk about Google Goggles, the mobile image recognition app that can tell you all about any object you snap a picture of. Point your smartphone at the Eiffel Tower, for example, and within moments you’ll have identifying information and maybe some history, without needing to type a single line of text into a search bar. It’s the inverse of a Google image search.
Then Petrou dropped the bomb: Google Goggles also works on faces.
“The more labeled samples you have—say pictures on social networks—the better we can do,” Petrou said. “There’s a sweet spot, around 17 images, when this technology, given a new picture of you, will rank you in the top ten results 50 percent of the time. When you feed it 50 pictures, you will appear in the top five results half the time.”
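Petrou didn’t describe Google’s internals, but the gist of what he’s describing is a ranking problem: every labelled photo of a person gets boiled down to a numerical fingerprint (an “embedding”), and a new photo is scored against every known identity by how close it lands. Here’s a minimal sketch of that idea; the names, the 128-dimensional vectors and the random “tagged photos” are all made-up stand-ins, not anything Google has published.

```python
# A minimal sketch of ranking identities from labelled photos; nothing here is
# Google's actual code. The embeddings are random stand-ins for what a real
# system would compute from pixels.
import numpy as np

def rank_identities(query_vec, gallery, top_k=10):
    """gallery maps each name to the embedding vectors of that person's labelled photos."""
    scored = []
    for name, samples in gallery.items():
        # More labelled samples per person means a better chance that one of them
        # resembles the new shot: roughly why 17, then 50, tagged photos keep
        # pushing the hit rate up.
        best = min(np.linalg.norm(query_vec - s) for s in samples)
        scored.append((best, name))
    return [name for _, name in sorted(scored)[:top_k]]

# Toy usage: 100 people, 17 "tagged photos" each, then a noisy new photo of person_42.
rng = np.random.default_rng(0)
gallery = {f"person_{i}": [rng.normal(size=128) for _ in range(17)] for i in range(100)}
query = gallery["person_42"][0] + rng.normal(scale=0.3, size=128)
print(rank_identities(query, gallery, top_k=5))   # person_42 should come out on top
```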
And yet, after tantalising us with this vision, Petrou announced that Google is opting not to release that algorithm into the wild just yet. The problem is that in terms of legal consequences, this is undiscovered country. Imagine someone snapping your picture in a bar, getting your details and then robbing your house, because they know you’re not home.
Legally, Google can’t afford to roll out its internet-wide image recognition program just yet. Smaller efforts have been made–such as when UK law enforcement recently tried to use face recognition to identify some of the most disruptive rioters–but they found that their algorithms didn’t work that well. No existing algorithm can match what we can do. The way we humans effortlessly identify faces is so subtle and exact that we’ve had notorious trouble replicating the mechanism in machines. One major reason is naturally occurring image variability, what we’ll call the Yoda problem.
You know Yoda. Little green dude from Star Wars; big ears. You’d know him if he was hanging out in a harshly lit fluorescent kitchen, lounging in a bathing suit on a sunny beach, or hiding behind a martini in a shadowy nightclub. You’d know him if you saw him from behind, super close up, or at a considerable distance in a crowd.
But machines aren’t quite as versed in pop culture iconography as you are. They’ll see a well-lit portrait shot of Yoda, and a picture of Yoda lurking in a shadowy alley, and won’t necessarily be able to find enough common features to make a connection–it’s the same guy!–that to you and me seems eye-rollingly obvious. Put your own much less distinctive face in place of Yoda’s, and the machine vision algorithm doesn’t stand a chance.
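To make the Yoda problem concrete, here’s a toy sketch. The “faces” below are just random grayscale grids, so treat the numbers as illustrative only, but naive pixel-by-pixel comparison already gets things exactly backwards: it decides that the same face in dim light is far more “different” than a genuinely different face under the same lighting. A crude brightness normalisation rescues this toy case; the real-world variation that trips up actual systems is nowhere near that easy to cancel out.

```python
# Toy illustration of the Yoda problem. The "faces" are random 64x64 grayscale
# grids, not real photos; the point is how raw pixel distance behaves.
import numpy as np

rng = np.random.default_rng(1)
yoda      = rng.uniform(0.4, 0.9, size=(64, 64))   # well-lit portrait
yoda_dark = yoda * 0.3                             # the same face in a shadowy alley
stranger  = rng.uniform(0.4, 0.9, size=(64, 64))   # a different face, same lighting

def pixel_dist(a, b):
    return np.linalg.norm(a - b)                   # plain pixel-by-pixel distance

def normed(img):
    return (img - img.mean()) / img.std()          # crude brightness normalisation

print(pixel_dist(yoda, yoda_dark), pixel_dist(yoda, stranger))
# roughly 29 vs 7: raw pixels say shadowy Yoda is the stranger
print(pixel_dist(normed(yoda), normed(yoda_dark)), pixel_dist(normed(yoda), normed(stranger)))
# roughly 0 vs 90: normalisation fixes this toy case, but pose, expression,
# hair and age are nowhere near this easy to cancel out
```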
So why are we humans so good at recognizing other people when they don’t even look like themselves? Marek Barwinski, a senior research scientist at London-based image recognition startup Cortexica, says there is a crucial emotional component to identifying people that might not map well to today’s machine vision algorithms. “Emotions play a crucial role in facilitating strong, even subconscious, connections between the brain areas that process a person’s facial features and looks, and a multitude of other areas related to factual information, attraction, childhood memories, social strategy and so on,” Barwinski tells me. To illustrate his point, I have to tell you about a picture I found.
There was this guy. You know this story: I was 17 and he was A Bad Idea, and so of course he was catnip. After a few torrential and intermittent years, we happily cut all ties. I had no idea if he was dead or alive (and he was the type you might wonder that about). He just disappeared.
Around 2005 I was idly thumbing through the internets, and I stumbled across Found Photos, a brilliant little site assembled from images pilfered off people’s openly shared folders. (Click the link if you must, but the rest of your day will be a write-off.)
I was making my way through a series of ’70s-era toddler pics–featuring the requisite mom in massive ’70s glasses and brown prints–and there he suddenly was. The light was bad, his hair was splayed across his face, and he was lying sideways on a sleeping bag in some kind of Burning Man-looking tent, passed out in a pool of (what I’m hoping was his own) vomit. There were a couple of people near him bent over in paroxysms of laughter.
I stared at that picture for a long time. Not to convince myself that it was really him– the shock of recognition had hit me before I could even fully process the image. I could barely see his face. But my hands had been in that hair. I knew what he was wearing: this was how he dressed. I knew his hands. I knew how probable it was that he would have accumulated a group of friends who would find it riotously funny to take a picture of him passed out in a pile of his own puke at Burning Man.
No machine vision algorithm has access to that kind of back story. But those details are crucial to recognizing a person without relying on exact ratios and good lighting. So if Google’s algorithm is going to identify more than just well-lit, front-facing portraits, Google needs to develop a way to see you the way a person sees you: a person who knows you, a person who will feel that shock of recognition when they see you in an unexpected place, with different facial hair, after a few rough years of aging. And that’s about a lot more than your face.
Google’s stated goal is to make all the world’s information searchable. When they solve the Yoda problem, it will be possible to trawl for your image in the farthest corners of the internet–Picasa, Flickr, Found Photos and anything public on Facebook–and dredge up photos that by all rights should have been lost forever.
And when that happens, we should be prepared for the questions that will arise. Do you have any right to your representation on the internet? What if most of the “found photos” of you online are awful? What story will they tell about you? Do you have a right to keep that story to yourself?
Photos: Evil eyes, courtesy Focal Point on Wikimedia Commons; Yoda, courtesy Star Wars Wikia
The issue with machine vision is that researchers are only willing, paid, or able to solve a well-defined, limited challenge: say, designing an algorithm to recognise faces.
We humans have a broad and versatile visual platform on which we base our perceptual experiences, coupled with cognition to quite literally supervise our own learning.
Additionally, we constantly physically interact with the world and therefore learn all the essential mechanics, ray tracing and sometimes… heuristics of perception.
We know the sky is above us, that things grow upwards, that light falls from the top. We experience perspective and non-trivial temporal aspects of movement that teach us implicitly about three dimensional geometry. We construct mental 2.5D or 3D models of objects we interact with.
Most of those things are missing from typical ad-hoc face recognition and object recognition algorithms. Clearly, once a simulated brain has learned the things that make human perception what it is, solving an object recognition or face recognition problem will not be a challenge.
Finally, as you wrote, emotional responses trigger faster learning. Given the complex social interactions and a spectrum of emotions associated with human behaviour, no wonder we excel at recognising identities and facial expressions.
So Marek, does this mean that if you want to build a truly robust face recognition system, you first have to build a whole brain? I’m suddenly reminded of Carl Sagan’s rejoinder that “if you wish to make an apple pie from scratch, you must first invent the universe.”
Well, surely that would help, but of course one is restricted in time and resources so one needs to take short cuts. As I mentioned, very few lucky scientists get the grant money to actually build a whole brain.
I suppose the better face recognition algorithms should have: a decent 3D model of a face, an understanding of how skin tone changes under different illumination, and other non-trivial features – hair recognition? Apple, not often mentioned in face rec, acquired Polar Rose – a company with a unique approach to facial feature parametrisation.
Perhaps it is also a question of the size of the database. Dunbar’s number is very low, and I suppose even Dunbar’s number squared is not a very challenging problem for image retrieval. So, while constructing an algorithm to recognise someone from a list of a couple of thousand is doable, a similar system that would work on the population of the globe without additional metadata might never be created.
I suppose in this discussion, Sally, you could not only invoke Sagan but also paraphrase Harry Truman and beg for a one-handed scientist.
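Marek’s point about database size is easy to put in rough numbers. Suppose, purely for illustration, that every one-to-one face comparison has a 0.1 percent chance of wrongly declaring a match (the real figure depends entirely on the system); the expected number of impostors a single query drags up then grows in lockstep with the size of the gallery being searched.

```python
# Back-of-the-envelope only: the 0.1% per-comparison false-match rate below is an
# assumed, illustrative figure, not a measurement of any real system.
p = 0.001  # assumed probability that a single comparison wrongly says "match"

galleries = [("Dunbar's number", 150),
             ("Dunbar's number squared", 150 ** 2),
             ("a mid-sized city", 500_000),
             ("the whole globe", 7_000_000_000)]

for label, n in galleries:
    print(f"{label:>24} (N = {n:>13,}): ~{n * p:,.1f} expected false matches per query")
```

At Dunbar scale the false hits are negligible; at the scale of the globe, without extra metadata to narrow the field, every query comes back buried under millions of plausible-looking strangers.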
Won’t be much use for photos not digitized or uploaded onto the internet!
Good point, but I suspect that, like checks, non-digital photos won’t be around much longer.