[bridged from https://sigmoid.social/@pbloem on the fediverse by https://fed.brid.gy/ ]
Met some seals on the way.
As proofs go it's pretty simple, mostly building on set theory and some juggling of inequalities.
The key structure is given above the heading: start with the statement of the […]
[Original post on sigmoid.social]
First, we pick some confidence level t (the probability the model assigns to it being correct). Then, we say: answer the question or abstain from answering […]
[Original post on sigmoid.social]
Like in an exam, you're either right or wrong, and if you're wrong you get zero points. And in exam […]
[Original post on sigmoid.social]
"Calibration" refers to the ability of a network to correctly represent its own uncertainty. A well-calibrated […]
[Original post on sigmoid.social]
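
To make the calibration idea concrete, here is a minimal sketch in Python. It is not taken from the thread or the paper: it assumes the common binned reading of calibration (among answers given with confidence around 0.8, roughly 80% should be correct), and the function names, the bin count and the toy data are all hypothetical.

```python
def calibration_table(confidences, correct, n_bins=5):
    """Bin predictions by stated confidence; return (mean confidence, accuracy, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        rows.append((mean_conf, accuracy, len(members)))
    return rows


def expected_calibration_error(confidences, correct, n_bins=5):
    """Count-weighted average gap between mean confidence and accuracy (assumed binned definition)."""
    rows = calibration_table(confidences, correct, n_bins)
    total = sum(n for _, _, n in rows)
    if total == 0:
        return 0.0
    return sum(n / total * abs(mean_conf - acc) for mean_conf, acc, n in rows)


# Hypothetical toy data: ten answers, each given with confidence 0.9.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

In the first toy case nine out of ten answers are right, which matches the stated 0.9, so the gap is essentially zero. In the second only half are right, so the gap is about 0.4.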
The classifier looks at the probability that our language model (p hat) […]
[Original post on sigmoid.social]
Call the probability that it generates something from E "err". This is roughly our […]
[Original post on sigmoid.social]
This is not something the model is expected to […]
[Original post on sigmoid.social]
If you try this on a relatively raw, open model like DeepSeek-V3 […]
[Original post on sigmoid.social]
When this came out, many people's summary was "even OpenAI admits that hallucinations are a fundamental problem of transformers/autoregressive models/LLMs."
I've seen many people […]
[Original post on sigmoid.social]
Whoever is responsible for this should have had a career in staying out of the way.
Call me pessimistic, but looking at that sequence, I'd say it _used to_ be rare.
The only thing I changed for the orange is to put a residual connection around each transformer block and to multiply the […]
[Original post on sigmoid.social]
Can we please not make our cyclist-ridden country full of strange and untypical streets the testing ground for a manchild's misguided attempts at creating a technology he doesn't understand with a vast societal risk he doesn't respect?
🍇 GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks 🍇
There's lots of work on sampling subgraphs for GNNs, but relatively little on making this sampling process _adaptive_. That is, learning to select the data from the […]
[Original post on sigmoid.social]
I'll have to dig into the details at some point. It seems that the ideas are a bit more complex than AdamW, which is a shame. Still, the performance […]
[Original post on sigmoid.social]