Which Of These Words Means "Zebra"? (Or, Using Visual and Context Clues to Learn Words With Just A Few Examples)

Zebra (image source cc-by). Previously, we’ve discussed the fact that visual information is underutilised and could help us in language modeling. Today I want to look at an example! Now, I don’t read Russian, and I don’t know what any of these words mean. But, looking at this illustration from a children’s storybook on the Bloom Library, I can start to make some guesses. Most likely, the words underneath have something to do with what’s in the image. So which of them, if any, means “Zebra”? This is something humans can puzzle out, but it is hard for NLP systems. If we can develop ways for machines to “unlock” this knowledge, that would help them learn new languages with limited examples.

Read More

Using the Whole Tatanka (Or, I Don't Have A Massive Pile of Text, How Can We Use ALL the Data We've Got?)

Whole Tatanka. In the 1990 film Dances with Wolves, Kevin Costner’s character encounters a group of people whose language he does not speak. In order to start establishing a common vocabulary, he uses his body to mime the shape and behavior of a buffalo/bison. Recognizing this, they teach him their word for the animal (“Tatanka”). Our computational approaches to language learning miss out on this kind of thing, relying on massive quantities of mostly text, and leaving much of the data we do have unused. When I’ve spoken with actual linguists working on smaller languages, I’ve found that they often have data; it’s just not in a form that computers can use easily, being distributed across many formats and files. How can we “use the whole Tatanka”, not wasting the data that is available?

Read More

TPUS Go BRRR... But I Don't Have Data! (Or, Can We Train Language Models Without Billions of Tokens?)

(Insert “bitter lesson” meme here) So, you want to build language technology to help people. Say a machine translator, so you can help folks to communicate, to read, to share knowledge… ideally you would like your computer to learn new languages! In fact, maybe you want it to learn all the languages! It turns out this is very hard and expensive. Let’s look at why that is, and how to attack it.

Read More

Hani Neural Machine Translation: Translating A Low Resource Language

Hani Storybook Sample. Summary: Describe the process of creating what is likely the first ever machine translation model for the Hani language, starting with no previous datasets or trained models. Describe data, tools, techniques, and commands used, hopefully enabling easier progress in low-resource translation efforts like this one. Present a baseline and ideas for future improvement.

Read More

Notes from ACL 2020

Brief thoughts and takeaways from the Association for Computational Linguistics 2020 conference, especially as they relate to Bible Translation. Not likely to be complete or even accurate. If I’ve gotten anything wrong, feel free to let me know and I’ll try to correct it.

Read More

The PhD Journey

So, I’ve now begun a PhD in machine translation at the University of Dayton. I have it on good authority that blogging about the process can be of great help. This is the start of that! As usual, the idea is for it to be a stable, long-term essay that gets better over time.

Read More

Sidequest - Fun with FUNIT

Very quick post: NVIDIA makes the best deep learning toys. The latest one is GANimal (formerly Petswap), which lets you upload a picture of your pet and transform it to look like other animals. So I uploaded a picture of my face instead: Me, as various animals. Me, transformed into 16 different animals.

Read More

The Quest for the Anti-Me - Truth in Tables

Hi, and welcome back to Deeply Curious! We’re still goofing around in Latent Space. Last time, I promised we’d “go way too deep into age, gender, and smileyness… and then keep going!” That’s still coming, but first, we must fill a table, with truth!

Read More

A Curious Beginning

Hi, I’m Colin, and welcome to Deeply Curious, where I have way too much fun playing with deep learning and AI.

Read More