Todor Mihaylov

Staff Research Scientist at Meta Superintelligence Labs, Llama Core Team
Email Google Scholar LinkedIn

About

I am a Staff Research Scientist at Meta Superintelligence Labs, Llama Core Team, where I work on pre-training data curation for large language models. I have contributed to major language model releases including Llama 4, Llama 3, Llama 2, and OPT.

I completed my PhD in Computational Linguistics at the Natural Language Processing Group at Heidelberg University, under the supervision of Prof. Dr. Anette Frank.

Selected Papers

→ All papers on Google Scholar

Experience

Staff Research Scientist
Meta Superintelligence Labs, Llama Core Team
Nov 2020 - Present
Working on pre-training data curation for Llama 4+, Llama 3, and Llama 2.
Research Intern
AI2 Aristo Team, Seattle
Apr 2018 - Jun 2018
Published the OpenBookQA dataset for question answering and reasoning in AI systems.
Research Scientist Intern
Amazon AI, Palo Alto
Jun 2017 - Sep 2017

Education

PhD in Computational Linguistics
Heidelberg University
Apr 2016 - Sep 2019
MSc in Computer Science
Sofia University
Oct 2013 - Jul 2015
BEng in Computer Systems
Technical University of Sofia
Sep 2008 - May 2012