They’re evaluating Fuentes
November 11, 2025 at 5:13 AM
The agreement to hold a vote on ACA subsidies is not a concession. It's worth literally nothing. It should be ignored when evaluating the merits of the deal
Sanders: That is a totally meaningless gesture. You can get 100 votes here and it won't mean anything because the House is not going to take it up.
November 10, 2025 at 1:31 AM
An empirical approach to evaluating the prevalence of long-lived balancing selection in humans--and important limitations. Work by @hannahmm.bsky.social
November 11, 2025 at 7:14 PM
Those of us who've connected within the fandom will often confer over individuals we've come across, evaluating whether they are safe and weighing their microaggressions. These are typically people that most of you would probably describe as "but they're so nice!"
So what if they are?
November 10, 2025 at 8:06 PM
I've been digging through a pile of AI evaluation papers. Not evaluating models, but evaluating the benchmarks for evaluating models, or proposed evaluations/benchmarks that are better than what is being used.
TL;DR most AI benchmarks for testing capability are still shit
November 9, 2025 at 9:15 PM
can you imagine a world where the currently-present influence of tolkienesque fantasy was in its fascination with linguistics or even like, the latter-years interest tolkien took in evaluating his writing through the lens of a subjective mythology and pseudohistorical account
November 10, 2025 at 10:42 PM
People may be relying on you in many ways today, so be careful that you don't let anyone down. You may feel like judges are evaluating your performance. Try not to get too carried away with this concept 1/2
November 10, 2025 at 8:26 AM
*Evaluating the current state of all the teams I support*
Only about 3 months until pitchers and catchers report
November 9, 2025 at 8:14 PM
Dumber, encompassing more things, and with a lot more opportunity to be disproven by personal experience.
Most people have no way of personally evaluating claims of voter fraud in a distant state. They can personally evaluate things like their own energy bill or getting stuck at an airport.
November 9, 2025 at 4:46 PM
📰 Nintendo's patent on "games where you summon characters to fight for you" is being reexamined by the US patent office, evaluating its validity. #gamedev #gaming #nintendo
gamesfray.com/huge-blow-fo...
HUGE blow to Nintendo: head of U.S. patent office takes RARE step to order reexamination of “summon subcharacter and let it fight in 1 of 2 modes” patent – games fray
gamesfray.com
November 7, 2025 at 9:05 AM
length of time spent transitioning *alone* as a metric for evaluating advisory wisdom. idk why to bring in the weird "baby trans" thing here either. using that terminology itself is really kinda weird because i for one simply wouldn't call fellow adult trans people babies or infantilize them
November 9, 2025 at 4:21 AM
This paper does some work evaluating the macro forecasts from the time and they didn't perform that badly!
Very interesting paper. Confirms what we knew from cross-country studies - large, rapid effects. The really interesting part is the firm-level analysis, which shows similar, if somewhat lower, magnitudes.
www.nber.org/papers/w3445...
November 10, 2025 at 2:36 PM
I got into Talking Heads in high school and played Fear of Music for a friend, he said "This guy's afraid of everything. Why would you listen to this? AC/DC would kick this guy's ass." We all have our own criteria for evaluating art, and they're all valid.
November 7, 2025 at 6:25 PM
Evaluating my social options for the evening based on how convenient the parking will be.
November 8, 2025 at 1:50 AM
Led by @stolenpyjak.bsky.social, we built a user-friendly Python package for generating and evaluating privacy-preserving synthetic data! See details in our EMNLP Demo paper:
🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration!
GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...
#EMNLP2025 #EMNLP #SyntheticData
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
github.com
November 10, 2025 at 6:14 AM
🤖⚖️ LLM-as-a-Judge with #SpringAI
Evaluating LLM output is challenging. Traditional metrics fall short, and human evaluation doesn't scale.
LLM-as-a-Judge uses LLMs to evaluate AI-generated content, matching human judgment
📖 spring.io/blog/2025/11...
🛠️ github.com/spring-proje...
November 10, 2025 at 10:17 AM
The mere fact that the Seattle mayoral race between centrist Dem Bruce Harrell and progressive activist Katie Wilson is virtually tied (even though Harrell was supposedly a shoo-in just a few months ago) SHOULD have other centrist Dems re-evaluating.
Probably won't, but it should.
November 11, 2025 at 9:32 PM
Well. Who is evaluating and assessing procurement risk vs operational risk? Do you trust those people? Currently?
He’s right about that.
Hegseth: "Let me say that again. We need to increase acquisition risk in order to decrease operational risk ... An 85% solution in the hands of our armed forces today is infinitely better than an unachievable 100% solution endlessly undergoing testing."
November 7, 2025 at 9:24 PM
Journalists, pundits, & the public: don't assume that doctors are experts on health insurance policy, public health, or conducting/evaluating medical research.
Some are, yes, but b/c they've had specific training in these fields. The average MD? No.
And, as in any field, some MDs are just dumb.
November 11, 2025 at 11:25 AM
Evidence for symbolic use of ochre by Micoquian Neanderthals in Crimea 🏺🧪
www.science.org/doi/10.1126/...
Results highlight Neanderthal cognitive complexity and underscore the importance of regional, multiproxy approaches in evaluating the emergence of symbolic material culture.
November 7, 2025 at 7:00 PM
Kinda related, but evaluating individual politicians rather than parties is a major flaw of American democracy. Any individual lawmaker is useless on their own. What you're voting for come election day is party control.
November 11, 2025 at 5:12 PM
Working alongside people with lived experience of multiple disadvantage can transform services – and lives.
Our new study with Changing Futures Bristol shows how trauma-informed, co-produced work makes a difference.
📄 Read more: arc-w.nihr.ac.uk/evaluating-h...
#HealthResearch #CoProduction
Evaluating how people with lived experience of multiple disadvantage can improve services - ARC West
Working alongside people with lived experience of multiple disadvantage to improve services can have multiple benefits, according to a new study published in Health Expectations. Multiple disadvantage...
arc-w.nihr.ac.uk
November 10, 2025 at 10:42 AM
We spent all of last year evaluating and discussing safety thanks to 2 MnDOT safe routes to school grants we got, which helped set the stage (I hope!) & allowed us to pilot a school street where we got to experience safety on the least safe block.
But honestly it was an angry email a neighbor…
November 11, 2025 at 10:59 PM
Live in five with Ambrosia Sky and these cats! twitch.tv/bbwolfe
November 10, 2025 at 11:55 PM