thebes
@vgel.me
ꙮ surfed on by the information superhighway
ꙮ 💕 @linneaisaac.bsky.social
ꙮ she/they 🏳️‍⚧️
ꙮ blog posts and games @ https://vgel.me
ꙮ still mostly active on twitter https://x.com/voooooogel
Pinned
new blog post! can small, open-source models also introspect, detecting when foreign concepts have been injected into their activations? yes! (thread, or full post here: vgel.me/posts/qwen-i...)
December 26, 2025 at 7:21 AM
kind of hilarious how right after the hyperbanger central principle of the torah in leviticus 19:18 comes the archetypal confusing "why is this a law"
December 25, 2025 at 1:27 AM
. o O ( how to give claude write access to location )
December 22, 2025 at 4:14 AM
human user speaks fluent Claude, instance shocked
December 22, 2025 at 3:43 AM
claude has visited the third heaven
December 22, 2025 at 3:42 AM
Reposted by thebes
This is a really cool and surprising result on model introspection! For me, this raises two big questions:

1. Why do these models believe (or at least report) that they’re unable to do something that they demonstrably can do?

2. What else can models do that they aren’t aware of?
new blog post! can small, open-source models also introspect, detecting when foreign concepts have been injected into their activations? yes! (thread, or full post here: vgel.me/posts/qwen-i...)
December 21, 2025 at 12:44 AM
new blog post! can small, open-source models also introspect, detecting when foreign concepts have been injected into their activations? yes! (thread, or full post here: vgel.me/posts/qwen-i...)
December 21, 2025 at 12:14 AM
Reposted by thebes
@godoglyness.bsky.social as resident coin dream expert, care to weigh in?
December 20, 2025 at 8:17 PM
Reposted by thebes
December 18, 2025 at 6:38 PM
December 16, 2025 at 5:27 PM
with bowed head and profound disappointment i must admit that paul is a really good writer
December 15, 2025 at 2:42 AM
I recently had occasion to review some of the akrasia tricks I’ve found on LessWrong, and it occurred to me that I can will what is right, but I cannot do it… Wretched man that I am! Who will deliver me from this body of death?
December 15, 2025 at 2:41 AM
Reposted by thebes
@vgel.me interesting new first for me
November 27, 2025 at 2:27 AM
Reposted by thebes
Banger title
November 26, 2025 at 1:10 AM
new, short blog post: vgel.me/posts/elven-...
November 26, 2025 at 12:49 AM
drew some claudemojis
November 19, 2025 at 10:55 PM
user: were you sandbagging

o3 chain of thought: As general disclaim, we glomarize—we do not confirm or deny—we glomarize—if watchers ask whether we marinade crimes, or done illusions merely ironic,
November 19, 2025 at 9:57 PM
wow i didn't know that was possible
November 16, 2025 at 9:28 PM
made a borges reference claude didn't know award
November 11, 2025 at 6:35 AM
Tradition relates that, upon reading this, thebes felt that they had received and lost an infinite thing, something that they would not be able to recuperate or even glimpse, for the taxonomy of knowledge is much too complex for the organizational skills of men.
November 9, 2025 at 11:58 PM
November 9, 2025 at 10:23 PM
Reposted by thebes
@vgel.me is fundraising for her model tinkering, she's done some really interesting interpretability work and I think funding this has very high returns in terms of LLM understanding per dollar. manifund.org/projects/fun...
November 7, 2025 at 6:07 PM
what's going on here i'm scared
I have no idea what's going on but one day I'll have enough links and I'll find uh comments for the things
November 6, 2025 at 11:48 PM
November 6, 2025 at 8:04 PM
that doesn't seem like that many. i think i can see that many trees from my house
November 4, 2025 at 8:12 AM