Notes from ACL 2020
Brief thoughts and takeaways from the Association for Computational Linguistics 2020 conference, especially as they relate to Bible Translation. Not likely to be complete or even accurate; if I’ve gotten anything wrong, feel free to let me know and I’ll try to correct it.
I got a chance to “attend” ACL 2020 this year, which had, like most conferences during the present pandemic, shifted to a virtual platform. I did this as a sort of part-time thing while working, mainly watching the bite-sized pre-recorded videos and taking some notes. Here is a small collection of notes and takeaways based on the experience.
Thoughts on the conference itself.
The virtual format was interesting. I especially appreciated the “Visualization” feature, which clustered papers visually by similarity; it made discovering interesting research really easy. The pre-recorded videos were also well done, mostly just the right length and communicating the important points concisely.
Overall, it seemed like a really quick way to get up to speed on the state of the art.
Quick notes on particular papers.
Emerging Cross-lingual Structure in Pretrained Language Models
- It seems that models trained on different languages and domains are learning similar things. As in, there are inherent similarities between a purely German-trained BERT and a purely English-trained BERT.
- No, you don’t need a shared vocabulary.
- It is important that you have shared parameters in the top layers of the multi-lingual encoder though.
- Perhaps there are some universal similarities being independently discovered?
- It seems that multilingual models are automatically finding and aligning based on those symmetries. I wonder what the Universal Dependencies folks would think of this?
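To make the “learning similar things” point concrete, here is a minimal sketch (my own illustration, not the paper’s methodology) of how you might measure similarity between two monolingual encoders: run parallel sentences through an English-only and a German-only BERT, mean-pool the hidden states, and compute linear CKA between the two representation matrices. The model names and sentences are just stand-ins.

```python
# Minimal sketch (not from the paper): compare two monolingual encoders by
# computing linear CKA over mean-pooled representations of parallel sentences.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def encode(model_name, sentences):
    """Mean-pool the last hidden states of a pretrained encoder."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():
        batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**batch).last_hidden_state           # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)        # (B, H)
    return pooled.numpy()

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices."""
    X = X - X.mean(0, keepdims=True)
    Y = Y - Y.mean(0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy parallel sentences; in practice you'd want a few thousand pairs.
en = ["The dog sleeps.", "I read the book yesterday."]
de = ["Der Hund schläft.", "Ich habe das Buch gestern gelesen."]

en_repr = encode("bert-base-uncased", en)
de_repr = encode("bert-base-german-cased", de)
print("CKA similarity:", linear_cka(en_repr, de_repr))
```

A high CKA score on parallel data would be one quantitative way of saying the two independently trained models are carving up sentences in similar ways.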
On the Cross-lingual Transferability of Monolingual Representations
- They get comparable results to Multilingual BERT, starting with monolingual models.
- This paper, and the one above, seem to both be hitting similar concepts, namely results which “suggest that deep monolingual models learn some abstractions that generalize across languages”.
- Oh, and they release a new dataset, XQuAD.
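The core trick, as I understand it from the talk: keep the pretrained transformer body frozen and learn only a new embedding matrix for the target language, so the body’s “abstractions” carry over. Here is a rough sketch of that freezing step, not the authors’ code, using bert-base-uncased purely as a stand-in (a real run would also need a target-language vocabulary and a masked-LM training loop over raw target-language text).

```python
# Rough sketch of the "frozen body, new embeddings" idea; not the paper's code.
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Freeze every parameter except the (sub)word embedding matrix.
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the word embedding matrix should remain trainable

# Standard masked-LM training on target-language text would then update
# just those embeddings:
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=5e-4
)
```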
On the Linguistic Representational Power of Neural Machine Translation Models
- This one is interesting for the problem of “backing out” a grammar from a trained model; a minimal probing sketch follows below. Perhaps AllTheWord would be interested?
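The general recipe behind this kind of analysis (my sketch of the probing-classifier idea, not the paper’s exact setup): freeze an NMT encoder, collect per-token hidden states, and see how well a simple classifier can predict a linguistic label (POS, morphology, and so on) from them. The `token_vectors` and `pos_tags` arrays here are random placeholders standing in for extracted encoder states and gold annotations.

```python
# Probing-classifier sketch: can a linear model read POS tags off encoder states?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(1000, 512))   # placeholder for NMT encoder states
pos_tags = rng.integers(0, 12, size=1000)      # placeholder for gold POS tag ids

X_train, X_test, y_train, y_test = train_test_split(
    token_vectors, pos_tags, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# High accuracy would suggest the encoder states encode that property;
# chance-level accuracy (as here, with random placeholders) suggests not.
```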
Finding Universal Grammatical Relations in Multilingual BERT
- “Does Multilingual BERT learn a cross-lingual representation of syntactic structure? Yes!”
- Turns out the structures it recovers (almost) match Universal Dependencies, which is sort of the gold standard.
- Again, perhaps AllTheWord would be interested? The structural-probe idea behind this is sketched below.
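As I understand it, this line of work builds on structural probes: learn a single linear map so that squared distances between projected token vectors approximate distances in the dependency tree. A toy sketch of that objective, with random placeholders in place of real mBERT states and UD trees:

```python
# Toy structural-probe sketch: fit a linear map B so that squared distances
# between projected token vectors approximate gold tree distances.
import torch

hidden_dim, probe_rank, n_tokens = 768, 128, 12
h = torch.randn(n_tokens, hidden_dim)                           # placeholder mBERT vectors
tree_dist = torch.randint(1, 6, (n_tokens, n_tokens)).float()   # placeholder tree distances
tree_dist = (tree_dist + tree_dist.T) / 2                       # symmetrize for the toy example

B = torch.randn(hidden_dim, probe_rank, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

for step in range(100):
    proj = h @ B                                 # (n_tokens, probe_rank)
    diff = proj.unsqueeze(0) - proj.unsqueeze(1)
    pred_dist = (diff ** 2).sum(-1)              # pairwise squared L2 distances
    loss = (pred_dist - tree_dist).abs().mean()  # L1 loss against tree distances
    opt.zero_grad()
    loss.backward()
    opt.step()
```

If a low-rank linear probe can do this well across languages, that is evidence the syntax is already sitting in the representations rather than being learned by the probe.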
Tabula nearly rasa: Probing the Linguistic Knowledge of Character-Level Neural Language Models Trained on Unsegmented Text
- I thought this one was a really interesting concept. A truly Universal Translator would presumably have to get away from custom methods for particular languages.
- However, most models these days already have a lot of information baked in simply because there are spaces between words. That’s kind of cheating, right? And what’s more, it doesn’t transfer well to languages written without spaces, say, Chinese.
- The character-level models come out better on some measures and worse on others. Still kind of promising.
- They use an LSTM here. Why not a transformer?
- Note to self: if some other Machine Learning technique comes along that can learn multiple levels of structure (morphology, grammar, etc), come back and rub it on this problem of character-level data…
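To make the “no word boundaries” idea concrete, here is a tiny character-level LSTM language model run on unsegmented text. This is my own minimal sketch, not the paper’s architecture or training setup.

```python
# Tiny character-level LSTM language model on unsegmented text.
import torch
import torch.nn as nn

class CharLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids):
        x = self.emb(char_ids)
        h, _ = self.lstm(x)
        return self.out(h)   # logits for the next character at every position

# Toy usage: no tokenizer, no spaces, just a stream of characters.
text = "translatethebiblewithoutspaces"
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([[stoi[c] for c in text]])

model = CharLM(len(vocab))
logits = model(ids[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1)
)
print("next-character loss:", loss.item())
```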
A Call for More Rigor in Unsupervised Cross-lingual Learning
- Essentially a reality-check. Turns out that there’s more supervision in most real-world scenarios than you might think.
- For example, using the dictionaries from PanLex you can usually get at least a bit of word-level supervision (a small alignment sketch follows after this list).
- They describe speech data as “promising but challenging”. Perhaps this is the next great frontier!
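Here is the kind of thing “a bit of word-level supervision” buys you: align two monolingual embedding spaces with orthogonal Procrustes, using translation pairs (e.g. pulled from a PanLex dictionary) as anchors. The vectors below are random placeholders; a real run would load pretrained word embeddings and look up the dictionary entries.

```python
# Dictionary-supervised embedding alignment via orthogonal Procrustes.
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 300, 500
src = rng.normal(size=(n_pairs, dim))   # source-language vectors for dictionary entries
tgt = rng.normal(size=(n_pairs, dim))   # target-language vectors for the same entries

# Orthogonal Procrustes: find the rotation W minimizing ||src @ W - tgt||_F.
u, _, vt = np.linalg.svd(src.T @ tgt)
W = u @ vt

# Any source-language vector can now be mapped into the target space.
mapped = src @ W
print("alignment residual:", np.linalg.norm(mapped - tgt))
```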
Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
- I really like the concept behind this one, because it makes the creation of training data much easier. All you need is for people to caption images in their own languages.
- Then, you train image captioners for each language.
- Then you have these “pivot” points that help to align the languages together.
- I wonder if you could do it with audio?
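A heavily simplified illustration of the pivot intuition (not the paper’s actual method): if captions in two languages can be embedded near their images, then captions from different languages whose nearest image is the same image can be paired up as pseudo-parallel data. All embeddings below are random placeholders.

```python
# Simplified "pivot through images" illustration with placeholder embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim, n_images = 128, 10
images = rng.normal(size=(n_images, dim))
caps_l1 = rng.normal(size=(15, dim))   # caption embeddings, language 1
caps_l2 = rng.normal(size=(12, dim))   # caption embeddings, language 2

def nearest_image(captions, images):
    # cosine similarity, then argmax over images
    c = captions / np.linalg.norm(captions, axis=1, keepdims=True)
    im = images / np.linalg.norm(images, axis=1, keepdims=True)
    return (c @ im.T).argmax(axis=1)

img_for_l1 = nearest_image(caps_l1, images)
img_for_l2 = nearest_image(caps_l2, images)

pseudo_pairs = [
    (i, j)
    for i, img_i in enumerate(img_for_l1)
    for j, img_j in enumerate(img_for_l2)
    if img_i == img_j
]
print(f"{len(pseudo_pairs)} pseudo-parallel caption pairs via image pivots")
```

The same retrieval trick seems like it ought to work with audio clips in place of images, which is what the last bullet is wondering about.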
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
- This is the sort of thing we need in order to make MT viable for the “laptop in the jungle” scenarios.
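For reference, the basic distillation move (the general technique, not XtremeDistil specifically): train a small student to match a big teacher’s softened output distribution, so the model you actually ship can be far smaller. A minimal sketch with placeholder logits:

```python
# Minimal knowledge-distillation loss: student matches teacher's soft targets.
import torch
import torch.nn.functional as F

temperature = 2.0
batch, n_classes = 8, 5
teacher_logits = torch.randn(batch, n_classes)                    # from the frozen big model
student_logits = torch.randn(batch, n_classes, requires_grad=True)

soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
log_student = F.log_softmax(student_logits / temperature, dim=-1)

# KL divergence between softened distributions, scaled by T^2 as is standard.
distill_loss = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
distill_loss.backward()
print("distillation loss:", distill_loss.item())
```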