The Quest for the Anti-Me - Truth in Tables

Hi, and welcome back to Deeply Curious! We’re still goofing around in Latent Space. Last time, I promised we’d “go way too deep into age, gender, and smileyness… and then keep going!” That’s still coming, but first, we must fill a table with truth!

What’s a Latent Direction?

As you may recall, last time I found myself in latent space. That is to say, I found the big list (the “vector”) of numbers (or “latents”) that, when fed into the program, generates my face.

[Image: my face, generated]

The really cool thing about these “latent vectors” is that there are… directions you can go. And often, those directions are understandable by people.

So, for example, you can take a whole bunch of latent vectors of people smiling, and another group of people frowning, and do something like:

(smiling people) - (not smiling people) = [some new big list of numbers]

That new list of numbers describes (approximately) the “smiling/not-smiling” direction. So then you take that, and add it to a latent vector that describes someone’s face, and get something like this:

(In this picture, from the original Stylegan-Encoder repo by Puzer, Puzer has managed to find a latent direction that describes frowning/smiling. They then added and subtracted that latent direction to pictures of famous politicians. The original photos are down the center.)
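Here is a minimal sketch of that subtraction, in case it helps to see it as code. Everything in it is made up for illustration: the three-number "latent vectors" are toy values (real ones are much longer), `my_face` is a hypothetical latent, and averaging each group before subtracting is just one simple reading of "do something like". It is not necessarily how Puzer's repo finds its directions.

```python
# Toy latent vectors. Real latents are far longer, and these numbers are made up.
smiling = [
    [0.9, 0.1, 0.4],
    [1.1, 0.3, 0.6],
]
not_smiling = [
    [0.1, 0.1, 0.5],
    [0.3, 0.3, 0.5],
]


def mean_vector(vectors):
    """Average a list of equal-length vectors, element by element."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def move_along(latent, direction, amount):
    """Shift a latent vector by `amount` steps along `direction`."""
    return [x + amount * d for x, d in zip(latent, direction)]


# (smiling people) - (not smiling people), using the average of each group
# (an assumption made for this sketch).
smile_direction = subtract(mean_vector(smiling), mean_vector(not_smiling))

my_face = [0.5, 0.2, 0.5]  # hypothetical latent vector for one face
more_smiley = move_along(my_face, smile_direction, 1.0)
less_smiley = move_along(my_face, smile_direction, -1.0)
```

In this toy data only the first number differs between the two groups, so the direction only nudges that first number. With real latents, you would feed `more_smiley` and `less_smiley` back into the generator to see the faces.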

I was wondering whether I’d be able to do anything like this myself, given the right pairs of pictures. I went out into a local park with my friend, and we took a set of 8 pictures…

The Truth Table

So, I wondered what the difference would be,

  • between hatted me,
  • and smiling me,
  • and with my glasses on so I can see.

So I made the combinations of all three!

(This sort of layout, where you write down every combination in yes/no format, is called a “Truth Table”)

Hat  Glasses  Smile  Original File              Generated after a While
no   no       no     Aligned original: MG_4138  4500 iterations, FID=0.12
no   no       ya     Aligned original: MG_4142  3600 iterations, FID=0.14
no   ya       no     Aligned original: MG_4139  9868 iterations, FID=0.09
no   ya       ya     Aligned original: MG_4143  11400 iterations, FID=0.13
ya   no       no     Aligned original: MG_4137  1500 iterations, FID=0.19
ya   no       ya     Aligned original: MG_4141  13000 iterations, FID=0.08
ya   ya       no     Aligned original: MG_4136  57000 iterations, FID<0.05
ya   ya       ya     Aligned original: MG_4140  12000 iterations, FID<0.16
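
If you would rather not write out all eight rows by hand, the yes/no columns of a truth table like the one above can be generated. This is a small sketch using Python's standard `itertools.product`; the `features` list and the "no"/"ya" labels are simply copied from the table, and the variable names are my own.

```python
from itertools import product

features = ["Hat", "Glasses", "Smile"]

# Every yes/no combination of the three features.
# The last feature changes fastest, which matches the row order in the table.
rows = list(product(["no", "ya"], repeat=len(features)))

print(" ".join(features))
for row in rows:
    print(" ".join(row))
```

Three yes/no features give 2 × 2 × 2 = 8 rows, which is why the photo shoot needed a set of 8 pictures.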

Bonus Image

Me, with Cowboy Hat on. 19688 iterations, FID=0.12

Note: this took forever to do. I had to roll balls down hills for hours.

Next time: Can we roll a ball down a hill more efficiently?

(Image from “An overview of gradient descent optimization algorithms” by Sebastian Ruder, who credits Alec Radford)

Written on April 6, 2019