Matt Brown
@mattb.nz
I look forward to the post I assume this is leading to for further enlightenment - I've long been puzzled by how seemingly simple/small the code around the giant pile of weights seems to be in practice. These sorts of insights are awesome.
November 11, 2025 at 10:50 AM
huh, this (cache value, being a pure functional mapping from the input prompt) is a TIL moment for me...
I naively assumed that the point of prompt caching was about restoring internal state of the model...!
In hindsight that assumption seems obviously dumb, given the sizes involved!
Thanks :)
November 11, 2025 at 2:12 AM
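
A toy sketch of the "pure functional mapping" idea from the post above: if the cached value depends only on the prompt prefix, the cache can be keyed on a hash of that prefix and recomputed whenever it is missing. Everything here is an assumption for illustration (`PrefixCache`, `toy_prefix_state`, and the word-level "tokens" are hypothetical), and it is not how any particular provider implements prompt caching.

```python
import hashlib


class PrefixCache:
    """Toy cache whose stored value depends only on the prompt prefix."""

    def __init__(self, compute):
        self._compute = compute  # must be a deterministic function of the prefix
        self._store = {}
        self.misses = 0

    def get(self, prefix_tokens):
        # Key on the content of the prefix, not on any session or model state.
        key = hashlib.sha256(repr(tuple(prefix_tokens)).encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(prefix_tokens)
        return self._store[key]


def toy_prefix_state(prefix_tokens):
    # Hypothetical stand-in for the expensive per-prefix computation.
    return tuple(len(token) for token in prefix_tokens)


cache = PrefixCache(toy_prefix_state)
system_prompt = ["You", "are", "a", "helpful", "assistant"]
first = cache.get(system_prompt)         # miss: computed and stored
second = cache.get(list(system_prompt))  # hit: same prefix content, same key
```

Because the value is a pure function of the prefix, a second request with the same prefix gets the stored result, and dropping an entry only costs a recomputation.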
But IMO just as easy to put the server in a secured room with badge access logs, etc, and/or a locked rack in the corner of the office vs under a desk.
I don't think a compromised dev machine is comparable - they should *never* have secrets directly exposed to them (vs CI/CD which requires them)...
October 22, 2025 at 8:58 PM
Yes, SOC2 in my experience is mostly validating that you have a set of policies and controls in place, that you assert are suitable for your business (vs a very low-bar baseline) and that you actually follow them.
So if you want to declare this not a risk, your auditor will probably accept it.
October 22, 2025 at 8:58 PM
I'd be more worried about the security/supply chain risks:
Assumption: You sell a product to/maintain OSS used by someone important that attacker X wants to compromise.
Threat model: X breaks into your office, compromises your under-desk CI server with subtle malware that backdoors your builds.
October 22, 2025 at 3:46 AM
Under-desk (vs on-prem server room) also raises physical security questions (e.g. evil maid/cleaner attack) that I would find harder to justify SOC2/ISO controls against.
A CI server is riskier than a dev desktop - it deploys directly to prod, while desktop actions are gated through a review step.
October 21, 2025 at 11:24 PM
I'd look at it less from a reliability perspective and more from maintenance and security.
Under-desk might be fine if it's well-managed (updated, monitored, etc) but "spare box" has connotations that point away from that...
Is the under-desk runner in your MDM/inventory and regularly updated?
October 21, 2025 at 11:24 PM
watching with interest, and intrigued by the idea, but timezones are challenging...
If/when you have an iteration of this that works for UTC+12/UTC+13 (NZ) I would be interested.
October 16, 2025 at 3:44 AM
added to my queue, but do you know why the Transistor share page doesn't link to Spotify?
I had to spend an extra minute manually searching for it in Spotify...
October 16, 2025 at 1:51 AM
I'm guessing #835 having just done all 3...
I got it 3rd, but purely by guessing/segmenting the 8 remaining words into which 4 seemed most likely to match some weird American grouping - a tactic I have to use frequently!
September 23, 2025 at 11:41 PM
is "today" for you #834, #835 or #836?
Timezones make this hard :)
September 23, 2025 at 11:33 PM
how/where do you review Claude Code's output without an editor?
August 11, 2025 at 8:28 PM
1 minute of confusion, but otherwise easy:
1) Google "nz current covid case data"
2) Click first result (https://www.tewhatuora.govt.nz/for-health-professionals/data-and-statistics/covid-19-data/covid-19-case-demographics)
3) Navigate breadcrumbs to "Current Cases" (parent page + first link there)
July 28, 2025 at 10:19 PM
This was super encouraging to consider on day ~700 of my current journey, thanks!
Do you have a longer article on this topic too? I'm particularly interested in whether 1000 is just a nice round number that feels right from your experience, or whether there's something deeper...
July 16, 2025 at 2:28 AM
terminal testing also covers both me and claude code, probably more claude code than me these days :)
June 12, 2025 at 3:05 AM
1) manually from the terminal during dev (specific tests, relevant to code under dev)
2) pre-push (via a git hook and the pre-commit framework, full repo suite)
3) as a branch compliance/merge requirement (via GitHub Actions) - the most important one, in a known good environment, etc.
June 12, 2025 at 2:51 AM
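
The three tiers in the post above could be sketched roughly like this, assuming pytest is the test runner (the post doesn't say which runner is used). The stage names, `build_command`, `run_stage`, and the example test path are all hypothetical: during dev only the relevant tests run, while the pre-push hook and the CI job both run the full suite.

```python
import subprocess

# Assumption: pytest is the runner; swap in whatever the repo actually uses.
STAGE_COMMANDS = {
    "dev": ["pytest", "-x"],  # stop at the first failure while iterating
    "pre-push": ["pytest"],   # full suite, invoked from a git hook
    "ci": ["pytest"],         # full suite, in a known good environment
}


def build_command(stage, paths=()):
    """Return the test command for a stage; only 'dev' narrows to given paths."""
    if stage not in STAGE_COMMANDS:
        raise ValueError(f"unknown stage: {stage!r}")
    command = list(STAGE_COMMANDS[stage])
    if stage == "dev":
        command.extend(paths)
    return command


def run_stage(stage, paths=()):
    """Run the stage's tests and return the exit code (0 means success)."""
    return subprocess.run(build_command(stage, paths)).returncode
```

For example, `build_command("dev", ["tests/test_api.py"])` gives `["pytest", "-x", "tests/test_api.py"]`, while the other two stages ignore the path list and run everything.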