Natural Language Processing

Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used throughout Google, shaping the user experience in Search, mobile, apps, Ads, Translate, and more.

Our work spans the range of traditional NLP tasks, with general-purpose syntactic and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and run efficiently in a highly distributed environment.

Our syntactic systems predict a part-of-speech tag for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, and modification. We focus on efficient algorithms that leverage large amounts of unlabeled data, and have recently incorporated neural network technology.
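To make the output of such a syntactic system concrete, the sketch below models one annotated sentence as plain Python objects. It is illustrative only and is not the representation used by Google's systems: the `Token` class, the `dependents` helper, and the tag and relation names (borrowed from the Universal Dependencies style) are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word with hypothetical syntactic annotations."""
    index: int                                  # 1-based position in the sentence
    text: str
    pos: str                                    # part-of-speech tag (assumed tag set)
    morph: dict = field(default_factory=dict)   # morphological features, e.g. gender, number
    head: int = 0                               # index of the governing word; 0 marks the root
    relation: str = "root"                      # label of the relationship to the head


def dependents(sentence, head_index, relation=None):
    """Return tokens attached to head_index, optionally filtered by relation label."""
    return [
        t for t in sentence
        if t.head == head_index and (relation is None or t.relation == relation)
    ]


# Hand-written annotations for "She reads long books" (not model output).
sentence = [
    Token(1, "She", "PRON", {"Gender": "Fem", "Number": "Sing"}, head=2, relation="nsubj"),
    Token(2, "reads", "VERB", {"Number": "Sing"}, head=0, relation="root"),
    Token(3, "long", "ADJ", {}, head=4, relation="amod"),
    Token(4, "books", "NOUN", {"Number": "Plur"}, head=2, relation="obj"),
]

subjects = [t.text for t in dependents(sentence, 2, "nsubj")]   # subject of "reads"
objects = [t.text for t in dependents(sentence, 2, "obj")]      # object of "reads"
modifiers = [t.text for t in dependents(sentence, 4, "amod")]   # modifiers of "books"
```

Storing the head index and relation label on each token keeps the whole analysis in one flat list, so subject, object, and modifier lookups are simple filters.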

On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
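The steps above (typing mentions, clustering them within and across documents, and resolving them to a knowledge base) can be pictured with a small data-structure sketch. Everything in it is an assumption made for illustration: the `Mention` class, the helper functions, the cluster numbers, and the `kg:` identifiers are invented placeholders, not real Knowledge Graph IDs or Google APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """A typed entity mention with hypothetical coreference and linking labels."""
    doc_id: str
    start: int                   # character offset, inclusive
    end: int                     # character offset, exclusive
    text: str
    entity_type: str             # e.g. "person", "location", "organization"
    cluster_id: int              # mentions of the same entity share a cluster id
    kg_id: Optional[str] = None  # placeholder knowledge-base identifier


def make_mention(doc_id, doc_text, surface, entity_type, cluster_id, kg_id=None, start_from=0):
    """Locate surface in doc_text (searching from start_from) and build a Mention."""
    start = doc_text.index(surface, start_from)
    return Mention(doc_id, start, start + len(surface), surface, entity_type, cluster_id, kg_id)


def group_by_cluster(mentions):
    """Group mentions by cluster id, within and across documents."""
    clusters = defaultdict(list)
    for m in mentions:
        clusters[m.cluster_id].append(m)
    return dict(clusters)


docs = {
    "doc1": "Ada Lovelace wrote notes. Lovelace lived in London.",
    "doc2": "London is a city.",
}

# Hand-assigned labels standing in for system output.
mentions = [
    make_mention("doc1", docs["doc1"], "Ada Lovelace", "person", 1, "kg:example-person"),
    make_mention("doc1", docs["doc1"], "Lovelace", "person", 1, "kg:example-person", start_from=12),
    make_mention("doc1", docs["doc1"], "London", "location", 2, "kg:example-city"),
    make_mention("doc2", docs["doc2"], "London", "location", 2, "kg:example-city"),
]

clusters = group_by_cluster(mentions)
```

Here cluster 1 gathers two mentions from the same document, while cluster 2 spans both documents, which mirrors the within-document and cross-document cases.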

Recent work has focused on incorporating multiple sources of knowledge and information to aid the analysis of text, as well as applying frame semantics at the noun phrase, sentence, and document levels.

Recent Publications

Connecting Language Technologies with Rich, Diverse Data Sources Covering Thousands of Languages

Daan van Esch, Sandy Ritchie, Sebastian Ruder, Julia Kreutzer, Clara Rivera, Ishank Saxena, Isaac Caswell

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequential Labeling?

Kazuma Hashimoto, Iftekhar Naim, Karthik Raman

EACL 2024 workshop on UncertaiNLP

Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski

International Conference on Machine Learning (2024) (to appear)

Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models

Daan van Esch

Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024

LinguaMeta: Unified Metadata for Thousands of Languages

Sandy Ritchie, Daan van Esch, Uche Okonkwo, Shikhar Vashishth, Emily Drummond

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Demystifying Embedding Spaces using Large Language Models

Guy Tennenholtz, Yinlam Chow, Chih-wei Hsu, Jihwan Jeong, Lior Shani, Aza Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

The Twelfth International Conference on Learning Representations (2024)
