Laboratory for Natural Language Processing

Mission

.01

LEYA is dedicated to interdisciplinary research in Machine Learning and Natural Language Processing. We study fundamental properties of language, computation, and learning that could contribute to a better qualitative understanding of language as a whole. We believe that deeper conceptual insights into the nature of language and its relation to cognition are vital for further progress in NLP.

Study & Location

.02

LEYA is a research laboratory located in St. Petersburg, Russia. We are committed to innovative fundamental research on machine learning systems for text representation, comprehension, and generation.

This mission relies on an interdisciplinary approach to scientific questions. We seek inspiration across fields such as cognitive research, linguistics, information theory, the theory of dynamical systems, game theory, complexity, and data science. We are committed to an open, international research environment. Our research positions do not require any knowledge of Russian.

Career

.03

  • We are looking for a PhD Student

    If you have a master’s degree and a strong commitment to data-driven empirical research and/or machine learning, want to obtain a PhD from a leading Russian research university, want to work in close collaboration with the biggest European IT company, and would like to spend the next three years in the vibrant city of St. Petersburg, you should apply.

    Our offer

    • Full-time PhD research position for 3 years with a possibility of extension
    • No teaching duties, but the possibility to teach a course for undergraduates
    • We provide a scientific environment where you are encouraged to explore new scientific directions

    Application

    Deadline: November 1 (starting date flexible).
    Please submit the following information:

    • CV
    • list of publications
    • transcript of your Master’s degree
    • research statement addressing possible research interests
    • names and contact data of two or more referees willing to write a letter of recommendation

    Applications should be sent by email to iyamshchikov@hse.ru.

  • We are looking for a Postdoctoral Researcher

    If you have a strong research interest in discrete complex sequences combined with a background in machine learning theory and applications, want to work in close collaboration with the biggest European IT company, and would like to spend several years in the vibrant city of St. Petersburg, you should apply.

    Our offer

    • Full-time postdoctoral research position for 1 to 3 years
    • No teaching duties, but the possibility to teach an advanced course for our Master’s students
    • We provide a scientific environment where you are encouraged to explore new scientific directions

    Application

    Deadline: November 1 (starting date flexible).
    Please submit the following information:

    • CV
    • list of publications
    • PhD certificate (if you have not yet received it, please include a short statement on the current status of your PhD and a brief summary of your thesis)
    • names and contact data of two or more referees willing to write a letter of recommendation

    Applications should be sent by email to iyamshchikov@hse.ru.

Events

.04

We host a weekly research seminar via Zoom and would be happy to hold it in person as soon as the COVID-19 situation is under control globally and international travel becomes easier. To register for the seminar, please join our mailing list.

Research

.05

  • The rapid development of natural language processing tasks such as style transfer, paraphrasing, and machine translation often calls for semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs, human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment in these tasks. A number of recently proposed metrics provide comparable results, yet Word Mover’s Distance is shown to be the most reasonable solution for measuring semantic similarity in reformulated texts at the moment.

    By Ivan P. Yamshchikov, Viacheslav Shibaev, Nikolay Khlebnikov, Alexey Tikhonov
    May 18, 2021
  • StoryDB: Broad Multi-language Narrative Dataset
    Eval4NLP (co-located with EMNLP 2021)

    This paper presents StoryDB, a broad multi-language dataset of narratives. StoryDB is a corpus of texts that includes stories in 42 different languages. Every language includes more than 500 stories, and some languages include more than 20,000. Every story is indexed across languages and labeled with tags such as genre or topic. The corpus shows rich topical and linguistic variation and can serve as a resource for studying the role of narrative in natural language processing across many languages, including low-resource ones. We also demonstrate how the dataset can be used to benchmark three modern multilingual models: mDistilBERT, mBERT, and XLM-RoBERTa.

Team

.06
  • Ivan P. Yamshchikov
    Dr. rer. nat., Researcher at Yandex, Associate Professor at Higher School of Economics in St. Petersburg.

    Ivan’s research interests include natural language generation, computational creativity, and various applications of exploratory data analysis. He received his Ph.D. in applied mathematics from the Technical University Cottbus-Senftenberg (Germany). Ivan is a frequent speaker at major professional conferences and forums. He is also a co-founder of Creaited Labs, a project that studies the intersection of art and artificial intelligence.

  • Vladislav Mosin
    Master’s student at Higher School of Economics in St. Petersburg, junior researcher at LEYA.

    Vladislav’s research interests include various areas of natural language processing and applied mathematics related to machine learning. Vladislav received a bachelor’s degree from the Higher School of Economics and has combined his studies with work at JetBrains Research, Huawei, and Yandex.

  • Kristina Zaides
    Ph.D., manager at LEYA, researcher at Saint Petersburg State University, teacher of Russian at ITMO University.

    Kristina’s research interests include spontaneous spoken speech analysis, corpus linguistics, and the study of modern language. She received her Ph.D. in linguistics from Saint Petersburg State University. At LEYA, Kristina manages business processes and handles administrative and financial work.

  • Maxim Surkov
    Master’s student at Higher School of Economics in St. Petersburg, junior researcher at LEYA.

    Maxim’s research interests include natural language processing and generation. He received his BSc in applied mathematics and informatics from the Higher School of Economics (Saint Petersburg). Maxim is the 2020-2021 Northern Eurasia Finals Online champion and an ICPC 2020-2021 World Finalist.

  • Alexey Tikhonov
    External Senior Researcher at LEYA, Senior Data Analyst, Yandex, Berlin.

    Alexey is interested in various aspects of modern deep learning. He currently focuses on generative natural language processing, commonsense reasoning, and multimodal data.

  • Sharwin Rezagholi
    PhD, External Senior Researcher at LEYA

    Sharwin focuses on theoretical aspects of machine learning, combining expertise in dynamical systems, category theory, and theoretical computer science.