Louis Teitelbaum
@louisteitelbaum.bsky.social
Computational Social Psychology @ Ben-Gurion University, Distributional Semantics × Spread of Ideas. Co-author of https://ds4psych.com/
Just as dplyr is “a grammar of data manipulation”, embedplyr is a grammar of embeddings manipulation, designed to facilitate the use of word and text embeddings in common analysis workflows without introducing new syntax or unfamiliar data structures. This makes it perfect for teaching students.
November 11, 2025 at 7:40 AM
e.g. Analyzing texts with an anchored Distributed Dictionary Representation (DDR) used to take ~100 lines of code plus an hour of figuring out how to load the pretrained model you like. With embedplyr you can do that in ~6 lines of code, and your favorite pretrained model is loaded automatically.
November 11, 2025 at 7:40 AM
I'm only a year into my PhD, and I already have a steady stream of grad students coming to me for help analyzing text with semantic embeddings. Sometimes they have legitimate methodological questions, but often they just don't know where to start with the analysis code. Enter embedplyr...
November 11, 2025 at 7:40 AM
9/
Finally—cosine is not the only similarity metric out there. We go through the pros and cons of each, with advice about when e.g. dot product is more effective.
June 24, 2025 at 2:10 PM
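A minimal numpy sketch of the difference the post is pointing at (toy vectors of my own, not from the paper): cosine similarity normalizes away vector magnitude, while the dot product keeps it, so the two metrics can disagree about "how similar" two vectors are.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine ignores vector magnitude: only direction matters
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 2 * a  # same direction, twice the magnitude

# Cosine treats a and b as identical (similarity 1.0)...
cos_ab = cosine_sim(a, b)
# ...while the dot product is sensitive to magnitude
dot_aa = np.dot(a, a)  # 14.0
dot_ab = np.dot(a, b)  # 28.0
```

Magnitude often carries real information (e.g. word frequency or model confidence), which is one reason the dot product can sometimes be the better choice.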
8/
You may think good and evil are opposites, but your embedding model might think: “Those are both moral judgements! Very similar!” If your construct has an opposite, consider using an anchored vector.
June 24, 2025 at 2:10 PM
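A toy numpy illustration of the anchored-vector idea (all numbers here are made up for demonstration): "good" and "evil" share a strong moral-topic component, so their raw cosine similarity is high; subtracting one pole from the other isolates the axis that actually separates them.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Toy 3-d "embeddings": dim 0 ~ moral topic, dim 1 ~ valence (assumed)
good = np.array([0.9,  0.2, 0.1])
evil = np.array([0.9, -0.2, 0.1])

# Plain cosine: good and evil look very similar (both "moral")
sim = unit(good) @ unit(evil)  # ~0.91

# Anchored vector: the good-minus-evil difference isolates valence
anchor = unit(good - evil)

text = np.array([0.8, 0.3, 0.2])  # a hypothetical "praising" text
score = unit(text) @ anchor       # positive -> closer to the "good" pole
```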
7/
CAV = learn a vector representation from labeled examples. Humans rate a few posts; you apply the pattern to analyze new texts! This new method gives precise, interpretable scores if you have relevant training data on hand.
June 24, 2025 at 2:10 PM
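The paper's exact CAV procedure may differ, but the core move (learn a direction in embedding space from human ratings, then project new texts onto it) can be sketched with simulated data and ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 human-rated texts with 8-d embeddings
X = rng.normal(size=(50, 8))   # text embeddings
true_v = rng.normal(size=8)    # latent construct direction (simulated)
y = X @ true_v + rng.normal(scale=0.1, size=50)  # noisy human ratings

# Learn the vector from the labeled examples via least squares
v_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Score new, unrated texts by projecting onto the learned vector
new_texts = rng.normal(size=(3, 8))
scores = new_texts @ v_hat
```

With relevant training data, the learned vector recovers the rated construct closely, which is what makes the resulting scores precise and interpretable.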
6/
CCR = embed a questionnaire. Very powerful when your texts are similar to questionnaire scale items (e.g. open-ended responses). We point out a risk—if you aren’t careful, you might measure how much your texts sound like psychological questionnaires—but there are solutions!
June 24, 2025 at 2:10 PM
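In outline (using toy pre-computed embeddings; in practice the item and text embeddings would come from a sentence-embedding model), CCR averages the embeddings of the scale items and scores each text by its cosine similarity to that average:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Toy embeddings of three questionnaire scale items (made-up numbers)
item_embs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.0],
    [0.8, 0.1, 0.2],
])
construct = unit(item_embs.mean(axis=0))  # CCR construct representation

text_emb = unit(np.array([0.5, 0.4, 0.1]))  # an open-ended response
ccr_score = text_emb @ construct            # cosine similarity
```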
5/
DDR = average embedding of a word list. Great for summarizing abstract dimensions (emotion, morality) across genres. Not good for more complex constructs. NEW important tip: weight words by frequency to reduce noise from rare words.
June 24, 2025 at 2:10 PM
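A small numpy sketch of DDR and the frequency-weighting tip (toy vectors and counts of my own): a rare word with a noisy embedding can drag the plain average off course, while frequency weighting keeps the representation close to the well-estimated common words.

```python
import numpy as np

# Toy word embeddings and corpus frequencies (hypothetical numbers)
emb = {
    "happy":    np.array([0.8, 0.1]),
    "joyful":   np.array([0.7, 0.2]),
    "mirthful": np.array([0.9, -0.5]),  # rare word, noisier vector
}
freq = {"happy": 5000, "joyful": 800, "mirthful": 3}

words = list(emb)
V = np.stack([emb[w] for w in words])
w = np.array([freq[x] for x in words], dtype=float)

ddr_plain = V.mean(axis=0)                        # unweighted DDR
ddr_weighted = (w[:, None] * V).sum(axis=0) / w.sum()  # frequency-weighted
```

Here the weighted DDR stays near the reliable "happy"/"joyful" vectors instead of being pulled toward the rare word's noisy estimate.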
4/
We review 3 ways to improve on traditional methods: Distributed Dictionary Representation (DDR), Contextualized Construct Representation (CCR) & our new Correlational Anchored Vectors (CAV).
Each has advantages and disadvantages.
June 24, 2025 at 2:10 PM
3/
Your trusty Likert scale questionnaire could be free response instead.
Your validated word list could be leveraged to analyze words that aren’t included.
Your painstaking MTurk-rated dataset could be extended to analyze 10,000 social media posts.
June 24, 2025 at 2:10 PM
2/
What’s an embedding?
Why choose one model over another?
Why do you need embeddings when you can ask ChatGPT to rate your texts?
Take a look: doi.org/10.31234/osf...
June 24, 2025 at 2:10 PM