Mission
LEYA is dedicated to interdisciplinary research in Machine Learning and Natural Language Processing. We address fundamental properties of language, computation, and learning that might contribute to a better qualitative understanding of language as a whole. We believe that deeper conceptual insights into the nature of language and its relation to cognition are vital for further progress in NLP.
Study & Location
LEYA is a research laboratory located in St. Petersburg, Russia. We are committed to innovative fundamental research on machine learning systems for text representation, comprehension, and generation.
This mission relies on an interdisciplinary approach to scientific questions. We seek inspiration across various fields such as cognitive research, linguistics, information theory, theory of dynamical systems, game theory, complexity, and data science. We are committed to an open, international research environment. Our research positions do not require any knowledge of Russian.
Career
We are looking for a PhD Student
If you have a master’s degree and a strong commitment to data-driven empirical research and/or machine learning, want to obtain a PhD from the leading Russian research university, want to work in close collaboration with the biggest European IT company, and want to spend the next three years in the vibrant city of St. Petersburg, you should apply.
Our offer
- Full-time PhD research position for 3 years with a possibility of extension
- No teaching duties, but the possibility to teach a course for undergraduates
- We provide a scientific environment where you are encouraged to explore new scientific directions
Application
Deadline November 1 (starting date flexible).
Please submit the following information:
- CV
- list of publications
- transcript of master's degree certificate
- research statement addressing possible research interests
- names and contact details of two or more referees willing to write a letter of recommendation
Applications should be sent by email to iyamshchikov@hse.ru.
We are looking for a Postdoctoral Researcher
If you have a strong research interest in discrete complex sequences combined with a background in machine learning theory and applications, want to work in close collaboration with the biggest European IT company, and want to spend several years in the vibrant city of St. Petersburg, you should apply.
Our offer
- Full-time PostDoc research position for 1-3 years
- No teaching duties, but the possibility to teach an advanced course for our master's students
- We provide a scientific environment where you are encouraged to explore new scientific directions
Application
Deadline November 1 (starting date flexible).
Please submit the following information:
- CV
- list of publications
- transcript of PhD certificate (if you have not yet received your PhD certificate, please include a short statement on the current status of your PhD and a brief summary of your thesis)
- names and contact details of two or more referees willing to write a letter of recommendation
Applications should be sent by email to iyamshchikov@hse.ru.
Events
We host a weekly research seminar via Zoom, but we would be happy to host it in person once the COVID-19 situation is under control globally and international travel becomes easier. To register for the seminar, please join our mailing list.
Research
The rapid development of natural language processing tasks such as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment in these tasks. A number of recently proposed metrics provide comparable results, yet Word Mover's Distance is shown to be the most reasonable solution for measuring semantic similarity in reformulated texts at the moment.
May 18, 2021
This paper presents StoryDB, a broad multi-language dataset of narratives. StoryDB is a corpus of texts that includes stories in 42 different languages. Each language is represented by more than 500 stories, and some languages by more than 20,000. Every story is indexed across languages and labeled with tags such as genre or topic. The corpus shows rich topical and language variation and can serve as a resource for studying the role of narrative in natural language processing across various languages, including low-resource ones. We also demonstrate how the dataset could be used to benchmark three modern multilanguage models, namely, mDistillBERT, mBERT, and XLM-RoBERTa.
Team
- Ivan P. Yamshchikov, Dr. rer. nat., Researcher at Yandex, Associate Professor at the Higher School of Economics in St. Petersburg.
Ivan’s research interests include natural language generation, computational creativity, and various applications of exploratory data analysis. He received his Ph.D. in applied mathematics from the Technical University Cottbus-Senftenberg (Germany). Ivan is a frequent speaker at major professional conferences and forums. He is also one of the co-founders of Creaited Labs, a project that studies the intersections of art and artificial intelligence.
- Vladislav Mosin, Master's student at the Higher School of Economics in St. Petersburg, junior researcher at LEYA.
Vladislav’s research interests include different areas of natural language processing and applied mathematics connected with machine learning. Vladislav received a bachelor’s degree from the Higher School of Economics and has been combining his studies with work at JetBrains Research, Huawei, and Yandex.
- Kristina Zaides, Ph.D., manager at LEYA, researcher at Saint Petersburg State University, Russian teacher at ITMO University.
Kristina’s research interests include spontaneous spoken speech analysis, corpus linguistics, and modern language investigation. She received her Ph.D. in linguistics from Saint Petersburg State University. At LEYA, Kristina manages business processes and handles administrative and financial work.
- Maxim Surkov, Master's student at the Higher School of Economics in St. Petersburg, junior researcher at LEYA.
Maxim’s research interests include natural language processing and generation. He received his BSc in applied mathematics and informatics at the Higher School of Economics (Saint Petersburg). Maxim is the Northern Eurasia Finals 2020-2021 Online champion and ICPC 2020-2021 world finalist.
- Alexey Tikhonov, External Senior Researcher at LEYA, Senior Data Analyst at Yandex, Berlin.
Alexey is interested in various aspects of modern deep learning. He currently focuses on generative natural language processing, commonsense reasoning, and multimodal data.
- Sharwin Rezagholi, PhD, External Senior Researcher at LEYA.
Sharwin is focused on theoretical aspects of machine learning. He combines expertise in the areas of dynamical systems, category theory and theoretical computer science.