Wht do u mean? Learning representations of social meaning for natural language processing

Summary

Language is a social practice, and how things are said, rather than what is said, can convey social information. I will integrate the social aspects of language into one of the core components of many modern natural language processing (NLP) systems: neural vector representations.

The way language is represented in NLP systems has changed radically with the emergence of neural network approaches that learn to represent words, sentences, and other linguistic units as dense real-valued vectors. The large body of work on vector representations has focused almost exclusively on their semantic and syntactic properties. However, the vectors are learned from contextual information and therefore have the potential to encode various factors—not only syntactic and semantic ones but also social factors. Although vector representations are widely used in NLP, language’s social dimension has been neglected in developing and evaluating representations.

Building on my expertise in computational sociolinguistics, I will bring a much-needed perspective to the development of word representations by focusing on social meaning. Words can carry social meaning, i.e. they can be associated with being young, male, relaxed, etc. For example, ‘probably’ and ‘prolly’ have the same referential meaning, but not the same social meaning. Representations must capture such differences to support a nuanced and effective understanding of language use, for example in realistic conversational systems or fine-grained analyses of social phenomena.

First, I will develop methodologies to interrogate the vector representations to understand whether and how they capture social meaning. Second, I will improve the representations by integrating socio-situational information in the learning process and by decomposing the representations. Third, I will apply them to style-sensitive text generation and to a study of linguistic styles. The project will lead to (1) improved NLP systems for language in social contexts and (2) new methods of sociolinguistic inquiry.

Details

Project number

VI.Veni.192.130

Main applicant

Dr. D. Nguyen

Affiliated with

The Alan Turing Institute

Duration

01/01/2020 to 31/12/2022