Laboratory for Natural Language Processing

Mission

.01

LEYA is dedicated to interdisciplinary research in Machine Learning and Natural Language Processing. We address fundamental properties of language, computation, and learning that might contribute to a better qualitative understanding of language as a whole. We believe that deeper conceptual insights into the nature of language and its relation to cognition are vital for further progress in NLP.

Study & Location

.02

LEYA is a research laboratory located in St. Petersburg, Russia. We are committed to innovative fundamental research on machine learning systems for text representation, comprehension, and generation.

Our mission relies on an interdisciplinary approach to scientific questions. We seek inspiration across fields such as cognitive research, linguistics, information theory, the theory of dynamical systems, game theory, complexity, and data science. We are committed to an open international research environment. Our research positions do not require any knowledge of Russian.

Career

.03

  • We are looking for a PhD Student

    If you hold a master’s degree, have a strong commitment to data-driven empirical research and/or machine learning, want to obtain a PhD from a leading Russian research university, want to work in close collaboration with the biggest European IT company, and would like to spend the next three years in the vibrant city of St. Petersburg, you should apply.

    Our offer

    • Full-time PhD research position for 3 years with a possibility of extension
    • No teaching duties, but the possibility to teach a course for undergraduates
    • We provide a scientific environment where you are encouraged to explore new scientific directions

    Application

    Deadline November 1 (starting date flexible).
    Please submit the following information:

    • CV
    • list of publications
    • transcript of your master’s degree
    • research statement addressing possible research interests
    • names and contact data of two or more referees willing to write a letter of recommendation

    Applications should be sent by email to iyamshchikov@hse.ru.

  • We are looking for a Postdoctoral Researcher

    If you have a strong research interest in discrete complex sequences combined with a background in machine learning theory and applications, want to work in close collaboration with the biggest European IT company, and would like to spend several years in the vibrant city of St. Petersburg, you should apply.

    Our offer

    • Full-time PostDoc research position for 1-3 years
    • No teaching duties, but the possibility to teach an advanced course for our master’s students
    • We provide a scientific environment where you are encouraged to explore new scientific directions

    Application

    Deadline November 1 (starting date flexible).
    Please submit the following information:

    • CV
    • list of publications
    • transcript of PhD certificate (if you have not yet received your PhD certificate, please include a short statement on the current status of your PhD and a brief summary of your thesis)
    • names and contact data of two or more referees willing to write a letter of recommendation

    Applications should be sent by email to iyamshchikov@hse.ru.

Events

.04

We host a weekly research seminar via Zoom, but would be happy to host it offline as long as the COVID-19 situation is under control globally and international travel is simplified. To register for the seminar, please join our mailing list.

  • Statistical inference for Bures–Wasserstein barycenters
    December 2, 17:00 St. Petersburg time

    Dr. Alexandra Suvorikova, Weierstrass Institute for Applied Analysis and Stochastics, Berlin.

    In this work we introduce the concept of the Bures–Wasserstein barycenter Q∗, which is essentially a Fréchet mean of some distribution P supported on a subspace of positive semi-definite d-dimensional Hermitian operators H+(d). We allow a barycenter to be constrained to some affine subspace of H+(d), and we provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of Q∗ in both the Frobenius norm and the Bures–Wasserstein distance, and explain how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.
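
    For readers unfamiliar with the objects named in the abstract, the usual definitions can be sketched as follows. This is common textbook notation, not necessarily the speaker's; the symbols d_BW (the Bures–Wasserstein distance) and 𝔸 (the affine subspace constraining the barycenter) are introduced here for illustration only.

```latex
\begin{align}
  d_{BW}^2(Q, S) &= \operatorname{tr} Q + \operatorname{tr} S
      - 2 \operatorname{tr}\bigl(Q^{1/2} S\, Q^{1/2}\bigr)^{1/2},
      \qquad Q, S \in H_+(d), \\
  Q_* &\in \operatorname*{arg\,min}_{Q \,\in\, H_+(d) \cap \mathbb{A}}
      \; \mathbb{E}_{S \sim P}\, d_{BW}^2(Q, S).
\end{align}
```

    When Q and S commute, the first expression reduces to the squared Frobenius norm of Q^{1/2} − S^{1/2}. For real covariance matrices it coincides with the squared 2-Wasserstein distance between the centered Gaussians N(0, Q) and N(0, S), which is one way to see the link to optimal transportation.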

Research

.05
  • The rapid development of natural language processing tasks such as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years, many methods for measuring the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis of more than a dozen such methods. Using a new dataset of fourteen thousand sentence pairs, human-labeled according to their semantic similarity, we demonstrate that none of the metrics widely used in the literature is close enough to human judgment in these tasks. A number of recently proposed metrics provide comparable results, yet Word Mover Distance is shown to be the most reasonable solution for measuring semantic similarity in reformulated texts at the moment.

    By Ivan P. Yamshchikov, Viacheslav Shibaev, Nikolay Khlebnikov, Alexey Tikhonov
    May 18, 2021
  • StoryDB: Broad Multi-language Narrative Dataset

    Paper will be presented at Eval4NLP (co-located with EMNLP 2021).
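
As a rough illustration of the Word Mover Distance named in the first abstract above, the sketch below computes it for a restricted case: two sentences of equal length with uniform word weights. In that case an optimal transport plan can be taken to be a one-to-one matching of words, so brute force over permutations is enough for short sentences. The 2-d word vectors and the function name are hypothetical, invented for this example; they are not taken from the paper, which evaluates the metric on real embeddings and human-labeled data.

```python
from itertools import permutations
from math import dist

# Hypothetical 2-d word vectors, invented purely for illustration.
TOY_VECTORS = {
    "cat": (1.0, 0.0),
    "kitten": (0.9, 0.1),
    "sits": (0.0, 1.0),
    "rests": (0.1, 0.9),
    "car": (-1.0, 0.0),
    "drives": (0.0, -1.0),
}


def word_mover_distance(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Word Mover Distance for equal-length token lists with uniform weights.

    Each word carries mass 1/n, so the cheapest transport plan can be chosen
    as a one-to-one matching; we try every matching and keep the cheapest.
    The general case (unequal lengths or weights) needs a transport solver.
    """
    if len(tokens_a) != len(tokens_b):
        raise ValueError("this sketch only handles sentences of equal length")
    n = len(tokens_a)
    if n == 0:
        return 0.0
    best_total = min(
        sum(dist(vectors[a], vectors[tokens_b[j]]) for a, j in zip(tokens_a, perm))
        for perm in permutations(range(n))
    )
    return best_total / n


paraphrase = word_mover_distance(["cat", "sits"], ["kitten", "rests"])
unrelated = word_mover_distance(["cat", "sits"], ["car", "drives"])
print(paraphrase < unrelated)
```

With these toy vectors the paraphrase pair scores a smaller distance than the unrelated pair. Brute force is factorial in sentence length, so this form is only suitable for building intuition.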

Team

.06
  • Ivan P. Yamshchikov
    Dr. rer. nat., Researcher at Yandex, Associate Professor at Higher School of Economics in St. Petersburg.

    Ivan’s research interests include natural language generation, computational creativity and various applications of exploratory data analysis. He received his Ph.D. in applied mathematics at the Technical University Cottbus-Senftenberg (Germany). Ivan is a frequent speaker at major professional conferences and forums. He is also one of the co-founders of Creaited Labs, a project that studies intersections of art and artificial intelligence.

  • Vladislav Mosin
    Master’s student at Higher School of Economics in St. Petersburg, junior researcher at LEYA.

    Vladislav’s research interests include various areas of natural language processing and applied mathematics connected with machine learning. Vladislav received a bachelor’s degree from the Higher School of Economics and has been combining his studies with work at JetBrains Research, Huawei, and Yandex.

  • Kristina Zaides
    Ph.D., manager at LEYA, researcher at Saint Petersburg State University, Russian teacher at ITMO University.

    Kristina’s research interests include spontaneous spoken speech analysis, corpus linguistics, and modern language investigation. She received her Ph.D. in linguistics at Saint Petersburg State University. At LEYA, Kristina manages business processes and handles administrative and financial work.

  • Maxim Surkov
    Master’s student at Higher School of Economics in St. Petersburg, junior researcher at LEYA.

    Maxim’s research interests include natural language processing and generation. He received his BSc in applied mathematics and informatics at the Higher School of Economics (Saint Petersburg). Maxim is the Northern Eurasia Finals 2020-2021 Online champion and ICPC 2020-2021 world finalist.