Lightnews — Scholar-powered news

Michael R. Bock

@michaelrbock.com

imagine falling for the most obvious spy of all time on bumble ???

(a friend sent me this screenshot, I'm married 😅)

November 3, 2025 at 4:40 AM

Michael R. Bock

@michaelrbock.com

How will we know that AI has really “made it”?

The task that most exemplifies our ability to automate knowledge work is “doing your taxes”.

At Column Tax we’re now within line of sight to fully automating taxes. We started the company at the perfect moment, with LLMs just on the horizon.

October 29, 2025 at 1:48 PM

Michael R. Bock

@michaelrbock.com

Positive review of my most popular blog post: "Hypothesis Sheets - how to navigate and exit the idea maze with a (good) startup idea".

Glad to hear the founder whisper networks are still sharing this knowledge around.

October 23, 2025 at 3:35 PM

Michael R. Bock

@michaelrbock.com

3/ GPT-5 is impressive in many ways

especially because its knowledge cutoff is still September 2024

but it's not the leader in tax calculation today

(even with maximal test time compute)

September 18, 2025 at 5:39 PM

Michael R. Bock

@michaelrbock.com

1/ GPT-5 is worse than Gemini 2.5 Pro at filing your taxes (but it's really close and neither can do it yet)

we proved it via our tax calculation benchmark:

September 18, 2025 at 5:38 PM

Michael R. Bock

@michaelrbock.com

I got married last month.🤵‍♂️👰‍♀️

Here's what it taught me about B2B2C tax software:

Just kidding :) but I do really recommend getting married to the love of your life with all your friends & family around!

September 17, 2025 at 2:00 PM

Michael R. Bock

@michaelrbock.com

no one had even heard of git worktrees before claude code

August 13, 2025 at 8:49 PM

Michael R. Bock

@michaelrbock.com

amazing ChatGPT Agent Mode use case: find & validate coupon codes without having to test them yourself

August 3, 2025 at 7:08 PM

Michael R. Bock

@michaelrbock.com

8/ Models are also inconsistent:

using pass^k (a measure of a model's reliability across multiple runs on the same task), performance degrades with additional runs, meaning models mess up in new & surprising ways when calculating tax returns.

July 23, 2025 at 3:18 PM
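
A rough illustration of the pass^k idea from post 8/: the sketch below assumes the common definition of pass^k as the probability that k runs on the same task all succeed, estimated per task as C(c, k) / C(n, k) for c successes out of n runs and then averaged over tasks. Whether TaxCalcBench computes it exactly this way is an assumption; the function names and the sample results are hypothetical.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k runs drawn from `trials` runs all succeed."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    # math.comb returns 0 when successes < k, so the estimate is 0 in that case.
    return comb(successes, k) / comb(trials, k)


def benchmark_pass_hat_k(results: dict[str, list[bool]], k: int) -> float:
    """Average the per-task pass^k estimate over every task in the benchmark."""
    scores = [pass_hat_k(sum(runs), len(runs), k) for runs in results.values()]
    return sum(scores) / len(scores)


# Hypothetical results: three tax returns, four runs each (True = correct return).
results = {
    "return_a": [True, True, True, True],
    "return_b": [True, False, True, True],
    "return_c": [False, False, False, False],
}

for k in (1, 2, 4):
    print(k, round(benchmark_pass_hat_k(results, k), 4))
```

Under this definition the score can only stay flat or fall as k grows, so a model that is right on most runs but not all of them (like `return_b` here) drags the number down quickly.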

Michael R. Bock

@michaelrbock.com

7/ For some models, performance improves with increased inference-time compute (thinking budget tokens)

but not for the best model (Gemini 2.5 Pro), suggesting that alternative techniques/scaffolding/orchestration are required to get AI to do this tax calculation task.

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

6/ Models consistently:

1. Misuse tax tables
2. Make calculation errors

For example, models will hallucinate line numbers on Forms or use incorrect eligibility limits.

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

5/ Takeaway: models can’t calculate tax returns reliably today.

Even on this simplified dataset, and with the models allowed to output a simplified format, the best model only calculates 32.35% of returns correctly.

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

4/ TaxCalcBench is a dataset of 51 pairs of user inputs and the expected tax return output + a testing harness.

We made the task easy for the models. We provide:
- all of the data (e.g. W-2s) needed to file a return
- the expected output in IRS XML format

July 23, 2025 at 3:17 PM
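
To make the "pairs of inputs and expected output + a testing harness" idea in post 4/ concrete, here is a minimal sketch of how such a harness might score a model. It is not the actual TaxCalcBench code: the `ReturnCase` type, the line labels, the dollar amounts, and the rule that a return counts as correct only when every expected line matches are all assumptions, and plain dictionaries stand in for the IRS XML format.

```python
from dataclasses import dataclass


@dataclass
class ReturnCase:
    """One benchmark case (hypothetical shape): a name plus expected line values."""
    name: str
    expected: dict[str, int]  # line label -> expected whole-dollar amount


def mismatched_lines(expected: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Return the labels of expected lines the model got wrong or left out."""
    return sorted(line for line, value in expected.items() if actual.get(line) != value)


def correct_return_rate(cases: list[ReturnCase], outputs: dict[str, dict[str, int]]) -> float:
    """Fraction of cases where every expected line matches the model output."""
    correct = sum(
        1 for case in cases
        if not mismatched_lines(case.expected, outputs.get(case.name, {}))
    )
    return correct / len(cases)


# Hypothetical data: two cases, one fully correct and one with a wrong tax line.
cases = [
    ReturnCase("single_w2", {"total_income": 50000, "total_tax": 4000}),
    ReturnCase("married_two_w2", {"total_income": 120000, "total_tax": 11000}),
]
outputs = {
    "single_w2": {"total_income": 50000, "total_tax": 4000},
    "married_two_w2": {"total_income": 120000, "total_tax": 11500},
}

print(correct_return_rate(cases, outputs))
print(mismatched_lines(cases[1].expected, outputs["married_two_w2"]))
```

An all-or-nothing rule like this is strict by design: a return with one wrong line is still a wrong return, which fits the thread's point that tax requires full correctness.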

Michael R. Bock

@michaelrbock.com

3/ Tax calculation means taking a user’s "inputs" (W-2s, 1099s) and outputting the Form 1040 in the IRS XML format.

75k pages of English text define the transformations required to do this.

Companies like @ColumnTax use deterministic tax engines to do these calculations.

July 23, 2025 at 3:17 PM

Michael R. Bock

@michaelrbock.com

1/ Can AI file your taxes? Not yet.

We tested the latest frontier models and the results were full of catastrophic errors.

Letting AI do your taxes would mean IRS rejections, audits, and penalties:

July 23, 2025 at 3:17 PM

Michael R. Bock

@michaelrbock.com

this is the wildest cold twitter dm opener i've ever received

July 21, 2025 at 11:17 PM

Michael R. Bock

@michaelrbock.com

this is what founder <> founder private text messages look like (and what makes the job so fun)

July 21, 2025 at 2:05 PM

Michael R. Bock

@michaelrbock.com

July 19, 2025 at 6:10 PM

Michael R. Bock

@michaelrbock.com

why is everyone complaining about a GPU shortage if it turns out you can just buy them on amazon ;)

June 21, 2025 at 3:48 PM

Michael R. Bock

@michaelrbock.com

1/ I’m working on tax filing software, and this caught me by surprise: an engineer just open sourced the IRS's Direct File project (now that it seems likely the current administration will shut down the program for next year).

June 5, 2025 at 4:02 PM

Michael R. Bock

@michaelrbock.com

1/ Tax is a special domain: it requires 100% correctness (you don’t want to compute someone’s taxes incorrectly!)

So we’ve had to build lots of custom infrastructure to deploy AI agents.

It’s working. We’re now able to build our tax engine much faster than with traditional methods: