Language can tell us a lot about society. We use language to communicate who we are and where we are from. With the recent availability of large amounts of data and of machine learning methods, we can now gain new insights and test received wisdom.
In this talk, I will give a brief introduction to a method called embeddings and show several of its applications. Embeddings are a new way of representing words (a direct implementation of Firth's distributional hypothesis) as points in a multi-dimensional vector space. This is not unlike arranging word magnets on a fridge: each word’s position is determined by its contextual similarity to all other words, thereby producing semantic and syntactic groupings.
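The distributional idea can be sketched in a few lines of code: represent each word by its co-occurrence counts with nearby words, then compare words by the cosine of their count vectors. This is a minimal illustration, not the talk's actual method; real embeddings (e.g. word2vec) learn dense, low-dimensional vectors, and the toy corpus and window size below are assumptions for demonstration only.

```python
# Toy distributional vectors: words are represented by the counts of
# words that co-occur with them in a small context window.
from collections import defaultdict
import math

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "stocks rose on the news",
    "stocks fell on the news",
]

window = 2  # number of context words considered on each side
vectors = defaultdict(lambda: defaultdict(int))

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = (math.sqrt(sum(c * c for c in u.values()))
           * math.sqrt(sum(c * c for c in v.values())))
    return num / den if den else 0.0

# Words used in similar contexts end up close together, words used in
# different contexts far apart:
print(cosine(vectors["cat"], vectors["dog"]))     # high
print(cosine(vectors["cat"], vectors["stocks"]))  # low
```

On this toy corpus, "cat" and "dog" share contexts such as "sat on" and come out far more similar to each other than either is to "stocks", which is exactly the grouping effect the fridge-magnet analogy describes.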
The resulting vector representations of words have turned out to capture a variety of latent factors, from lexical semantics to syntax to socio-demographic aspects to societal attitudes.
The ease of use and the range of applications make embeddings a valuable tool in language-related research. I will show how they capture regional variation at an intra- and interlingual level, how they distinguish varieties and linguistic resources, and how they allow for the assessment of changing societal norms and associations.