David Dukić
ddaviddukic.bsky.social
PhD student in NLP | TakeLab 🇭🇷 | Information extraction, representation learning & analysis | Making LLMs better one step at a time
So, news becomes more positive as the years go by. Or does it? We trained sentiment classifiers on STONE & 24sata, then analyzed sentiment over 5 periods of the TL Retriever. We find that positivity rises at the expense of neutrality. But negativity in news headlines also increases.
July 15, 2025 at 12:14 PM
We detect sentiment shift by swapping embeddings across periods. Using later-period embeddings in earlier periods results in increased positive sentiment. Using earlier-period embeddings in later periods results in decreased positive sentiment.
July 15, 2025 at 12:14 PM
We identify words that change the most by their cumulative cosine distance scores within the last 25 years. For these words, we unveil the change in meaning by picking five nearest neighbors per period. We group the words into three major topics: EU, technology, and COVID.
July 15, 2025 at 12:14 PM
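A minimal sketch of how such a ranking could be computed, assuming "cumulative" means summing the cosine distance between a word's aligned vectors in consecutive periods (the thread does not spell out the exact definition). The function names, the toy vocabulary, and the 2-d vectors are hypothetical and only for illustration.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two vectors: 1 - cosine similarity."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cumulative_cosine_distance(vectors_by_period):
    """Sum of cosine distances between consecutive periods (assumed definition)."""
    pairs = zip(vectors_by_period, vectors_by_period[1:])
    return sum(cosine_distance(prev, nxt) for prev, nxt in pairs)

def nearest_neighbors(word, vocab, matrix, k=5):
    """Return the k words closest to `word` by cosine similarity in one period."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)  # most similar first
    return [vocab[i] for i in order if vocab[i] != word][:k]

# Toy data: one word that stays put and one that drifts across three periods.
stable = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
drifting = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
stable_score = cumulative_cosine_distance(stable)
drifting_score = cumulative_cosine_distance(drifting)

# Toy single-period embedding matrix (rows follow the order of `vocab`).
vocab = ["eu", "union", "virus", "phone"]
matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = nearest_neighbors("eu", vocab, matrix, k=2)
```

Words would then be sorted by their score, and the neighbor lookup repeated once per period to see how a word's neighborhood changes.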
We train embeddings using the skip-gram with negative sampling (SGNS) method from Word2Vec. We align embeddings between different periods using Procrustes alignment. We validate the quality of embeddings on two word similarity datasets.
July 15, 2025 at 12:14 PM
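For readers unfamiliar with the alignment step, here is a rough sketch of orthogonal Procrustes alignment with NumPy. It assumes both period matrices are restricted to a shared vocabulary with rows in the same order. The function name and the synthetic data are hypothetical, and any preprocessing such as mean-centering or length normalization is left out, since the thread does not say whether it was applied.

```python
import numpy as np

def procrustes_align(source, target):
    """Rotate `source` onto `target`.

    Finds the orthogonal matrix R minimizing ||source @ R - target||_F,
    which is U @ Vt from the SVD of source.T @ target.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation

# Synthetic check: build a "later period" as a rotated copy of an earlier one,
# then verify that the alignment undoes the rotation.
rng = np.random.default_rng(0)
target = rng.normal(size=(50, 8))                 # 50 shared words, 8 dimensions
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))      # random orthogonal matrix
source = target @ q.T                             # same geometry, rotated axes
aligned, rotation = procrustes_align(source, target)
```

Because the map is orthogonal, it preserves cosine similarities within a period, so only the coordinate system changes and vectors from different periods become directly comparable.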
📣📣 New preprint alert!!

Despite events in the world becoming bleaker, the news is… more positive?

We conduct a diachronic study of word embeddings trained on 10M Croatian news articles spanning 25 years and find some surprising results!

arxiv.org/abs/2506.13569
July 15, 2025 at 12:14 PM