My work focuses on data and speed for language models and machine translation.
Data:
- Technical Lead Manager of text pre-training data at Meta for two years.
- Created HPLT, which released clean text and models from 7.2 petabytes of web crawl.
- Ran ParaCrawl, which built the largest parallel corpora for many languages; also worked on patents.
- Worked on No Language Left Behind, published in Nature.
Speed:
- Founded Efficient Translation Limited, which sold low-latency machine translation, and took it to an industry exit.
- Created and ran Bergamot, which launched client-side machine translation that is now installed by default in Firefox.
- Wrote the KenLM toolkit for efficient n-gram language models.
According to the New York Times, I am a native speaker of C++ "on semipermanent loan from the Internet" and my t-shirt collection is "threadbare."
People have trouble spelling my last name, even under oath.
Brief CV
| Where | What |
|---|---|
| Meta | Technical Lead Manager, Llama text data |
| Efficient Translation | Founder with industry exit |
| Edinburgh | Reader ≅ Associate Professor |
| Edinburgh | Lecturer ≅ Assistant Professor |
| Bloomberg | Senior Research Scientist |
| Stanford | Postdoc |
| Edinburgh | Research Associate |
| Carnegie Mellon | PhD advised by Alon Lavie |
| Google | Software Engineer |
| Caltech | BSc, Mathematics and Computer Science |