Andrew Gross
gross.systems
@gross.systems
Engineer at YipitData.
NYC Area

https://github.com/andrewgross/
https://gross.systems

I was told I had to add AI Engineer to my profile for the bots to find me.

Views my own, not my employer etc etc.
Biggest work distractions.

1. Arguing with healthcare providers about billing
2. Meetings

Distant 3rd: IDK IT Issues or something
November 19, 2025 at 3:25 PM
I wonder how long until we see all these tools that are meant to stop overly-aggressive AI data crawlers start poisoning their data www.anthropic.com/research/sma....
A small number of samples can poison LLMs of any size
Anthropic research on data-poisoning attacks in large language models
October 26, 2025 at 6:05 PM
Every new benchmark or tool I see screams that the real limiting factor for making effective systems with LLMs/ML is context + evals. Model "intelligence" is rarely the deciding factor now.
September 26, 2025 at 1:54 PM
Astounding to me that OpenAI has had their new billing dashboard for this long without a good way to tie an API key to usage. API keys get human names, but billing refers to them by `key_XXXXXXXXX`, with no mapping between them. Have to use the legacy dashboard platform.openai.com/account/usag...
September 22, 2025 at 3:38 PM
"May you create a successful open source project" - Ancient Developer Curse
September 14, 2025 at 7:06 PM
It's becoming pretty apparent from using agentic systems and tools that a few big blockers are keeping them from being more effective:

1. Context is way too small
2. Retrieval from that context sucks
3. Density of context is terrible
September 12, 2025 at 11:41 AM
It was a little annoying that Claude Code didn't have a way to limit the context so it was easy to use other models without manually running compact. I ended up hacking on the JS blob after reviewing the unminified code to find what I needed. I did feel only having 128K context vs the 200k/1mm
September 11, 2025 at 12:35 AM
Surprised I haven't seen more discussion of the MTP features in GLM-4.5. Once it's configured it really lets the model fly. Went from 70 tok/s to over 200 tok/s. Pretty incredible speedup, but no one seems to be running with it.
September 10, 2025 at 11:55 PM
Finally got SGLang working with FP8 on Blackwell. Enabling MTP took GLM 4.5 Air from 70 tok/s to around 200. Pretty great performance! Looks like vLLM does support MTP but hard codes only looking one token ahead, which doesn't do much.
September 9, 2025 at 3:15 PM
Today in SGLang configs documented nowhere: `USE_TRITON_W8A8_FP8_KERNEL`. If you have a non-enterprise Blackwell GPU, you should set this when running FP8 models (like GLM-Air-FP8). It will allow the model to run and should let you use the tuned Triton Blackwell RTX 6000 config.
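
A minimal sketch of wiring that variable into a launch script. Assumptions, none of them from the post: that "1" is an accepted truthy value for `USE_TRITON_W8A8_FP8_KERNEL`, that the server is started through `python -m sglang.launch_server --model-path ...`, and that the model path is a placeholder. The helper name `build_sglang_launch` is made up for illustration. Nothing is executed; it only builds the command and environment.

```python
import os


def build_sglang_launch(model_path: str, use_triton_fp8: bool = True):
    """Return (command, env) for an SGLang server launch without running it."""
    env = dict(os.environ)
    if use_triton_fp8:
        # Assumption: "1" is treated as "enabled" for this variable.
        env["USE_TRITON_W8A8_FP8_KERNEL"] = "1"
    cmd = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    return cmd, env


# Placeholder path; substitute wherever your FP8 checkpoint actually lives.
cmd, env = build_sglang_launch("path/to/GLM-Air-FP8")
```

To actually start the server, pass both to `subprocess.run(cmd, env=env)`.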
September 9, 2025 at 1:40 AM
Man, Blackwell has been out for almost a year and it is still like pulling teeth to get things working on it. Today's adventure is getting SGLang to play nice with MoE FP8 kernels (hint: use Triton), and then getting SGLang to play nice with itself.

github.com/sgl-project/...
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
September 8, 2025 at 8:42 PM
These days it feels like buying a couple tickets to the lottery is less about escapism and more playing to your outs.
September 7, 2025 at 2:53 PM
The state of the web is bad enough that I am pondering using a small LLM just to do a better job of filling out address / CC form fields.
September 7, 2025 at 11:42 AM
Ran claude code (clis.js) through Humanify to get a version that's a little more readable. github.com/andrewgross/...

Working on some tooling to make this easier, faster and a bit cleaner on the output.
GitHub - andrewgross/claude-code-unminified
September 4, 2025 at 2:49 PM
Today I learned the hazard of having a dated version of libnccl-dev installed in a container where the CUDA Toolkit and drivers are a newer version. However, you can go too far: installing the cuda13.0 nccl version with cuda 12.9 installed will not work.
September 2, 2025 at 1:34 AM
Turns out all those tips about setting Thinking in Claude Code using terms like ULTRATHINK or MEGATHINK aren't encoded into the model, but just set the thinking token budget: gist.github.com/andrewgross/...
claude_code_thinking.js
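
A rough sketch of what "the keyword just sets a budget" looks like: a plain client-side lookup, with nothing special happening inside the model. The budget numbers below are placeholders for illustration, not the values from the gist, and the names `THINKING_BUDGETS` and `thinking_budget` are made up.

```python
import re

# Placeholder budgets for illustration only; NOT the real Claude Code values.
THINKING_BUDGETS = {
    "ultrathink": 32_000,
    "megathink": 10_000,
    "think": 4_000,
}


def thinking_budget(prompt: str) -> int:
    """Return the largest budget whose keyword appears as a whole word."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return max(
        (budget for keyword, budget in THINKING_BUDGETS.items() if keyword in words),
        default=0,  # no keyword present: no extra thinking budget
    )
```

Whole-word matching keeps something like "rethinking" from triggering the "think" tier in this sketch.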
August 18, 2025 at 3:18 PM
Toying around with tracking some global claude configs in git. Some commands, an agent or two, and a global claude md (python focused). github.com/andrewgross/...
GitHub - andrewgross/claude_configs
August 14, 2025 at 4:49 PM
Fun fact, if you run `pip install pyspark` in Databricks and restart the session, it will crash. Although you are running Pyspark, it does not present as an installed Python package, and when you install it, it will overwrite key libraries and break the session.
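
A small guard for this kind of situation, sketched with the standard library only: check whether a module is importable even though pip has no distribution recorded for it. The function name is made up, and whether Databricks hides PySpark in exactly this way is an assumption based on the behavior described above.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def importable_without_distribution(module: str, dist: Optional[str] = None) -> bool:
    """True if `module` imports fine but no installed distribution claims it."""
    if importlib.util.find_spec(module) is None:
        return False  # not importable at all
    try:
        importlib.metadata.distribution(dist or module)
    except importlib.metadata.PackageNotFoundError:
        return True  # importable, yet invisible to pip
    return False
```

If `importable_without_distribution("pyspark")` comes back true, the runtime is providing the package itself and a `pip install` on top of it is probably a bad idea.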
August 9, 2025 at 5:02 PM
Can't decide if it's genius or foolish to group records in a Pyspark table by converting the UUID characters to integers for modulo arithmetic. github.com/andrewgross/...

I often need to run processing over lots of records (ML Models, LLM calls) and it's too much to collect it all to one driver.
GitHub - andrewgross/pyspark_toolkit: Some missing pyspark functions.
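
The idea in plain Python, as a sketch rather than the pyspark_toolkit implementation: treat the UUID's hex digits as one big integer and take it modulo the number of groups. The function names are made up, and a PySpark version would need to express the same arithmetic with column functions.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List


def uuid_bucket(value: str, num_buckets: int) -> int:
    """Map a UUID string to a bucket in [0, num_buckets)."""
    return uuid.UUID(value).int % num_buckets


def group_by_bucket(ids: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """Group UUID strings by bucket so each group can be processed separately."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for value in ids:
        groups[uuid_bucket(value, num_buckets)].append(value)
    return dict(groups)


# The integer value of this UUID is 10, and 10 % 4 == 2.
print(uuid_bucket("00000000-0000-0000-0000-00000000000a", 4))  # 2
```

Random (version 4) UUIDs should spread fairly evenly across buckets; that is an expectation, not something measured here.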
August 7, 2025 at 2:33 AM
Back on my bullshit signing S3 URLs in pure pyspark

issues.apache.org/jira/browse/...

We need a "native" HMAC implementation because my pyspark implementation blows up when it nests 5 layers deep in the signing algorithm.
[SPARK-53154] Add HMAC to pyspark.sql.functions - ASF JIRA
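
For reference, the nesting looks like this in plain Python, assuming the signing algorithm in question is AWS Signature Version 4: four chained HMAC-SHA256 calls derive the signing key, and a fifth signs the string itself. This is a sketch of the key derivation only, not a full presigned-URL implementation, and the helper names are made up.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key: str, date_stamp: str, region: str,
                    service: str, string_to_sign: str) -> str:
    """Compute a SigV4 signature; each step's output keys the next HMAC."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)  # layer 1
    k_region = _hmac_sha256(k_date, region)                                   # layer 2
    k_service = _hmac_sha256(k_region, service)                               # layer 3
    k_signing = _hmac_sha256(k_service, "aws4_request")                       # layer 4
    # Layer 5: sign the canonical string with the derived key.
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Each layer depends on the previous one, which is why an expression-level implementation ends up five HMACs deep.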
August 6, 2025 at 8:59 PM
MLFlow type inference on `.predict` and its subsequent requirement for use in Unity Catalog is causing me no end of trouble. Insane to me that passing a Pyspark dataframe to pyfunc.predict gives me [Any()] schema and then blows up completely when I run it.
July 24, 2025 at 6:07 PM
As expected, vibe coding with Claude Code is both great and terrible. It did some really cool stuff, implementing an AST parser to build out strong requirements for input and output. However, it also wrote tons of terrible/useless tests and had lots of cluttered and unneeded functions.
July 23, 2025 at 3:52 AM
Claude Code seems to really love creating tests where it mocks out both the function and the return value, and then tests that the results from the mocks are as expected...

Great work, no notes.
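
A toy illustration of the pattern, with made-up function names: the first "test" only checks the value configured on the mock, so it passes even when the real code is broken. The second one calls the code under test.

```python
from unittest import mock


def add_tax(price: float, rate: float = 0.1) -> float:
    return round(price * (1 + rate), 2)


def broken_add_tax(price: float, rate: float = 0.1) -> float:
    return 0.0  # deliberately wrong


def tautological_test(func) -> bool:
    # Anti-pattern: func is replaced by a mock, so we only assert on
    # the return value we just configured ourselves.
    fake = mock.Mock(return_value=11.0)
    return fake(10.0) == 11.0


def real_test(func) -> bool:
    # Exercises the actual implementation.
    return func(10.0) == 11.0


print(tautological_test(broken_add_tax))  # True, despite the bug
print(real_test(broken_add_tax))          # False
```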
July 22, 2025 at 7:06 PM
Claude Code should have a `/yolo 15` which just lets it run any command for the next 15 minutes (maybe with some egregious exceptions)
July 21, 2025 at 1:27 AM
Seems like MLFlow/Cloudpickle can't serialize Logger objects if they are in the global scope. It took way too long to figure this out since the tracebacks from pickling are supremely unhelpful.
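
One common workaround, sketched under the assumption that the failure comes from a Logger object being captured in what gets pickled: look the logger up at call time instead of holding it in a module-level global or on the instance. The class and logger name are made up, and this has not been checked against MLflow itself.

```python
import logging


class ToyModel:
    """Toy model that never stores a Logger in its own state."""

    def __init__(self, scale: float):
        self.scale = scale

    @property
    def logger(self) -> logging.Logger:
        # Resolved on demand; logging.getLogger returns the same
        # named logger each time, so nothing is lost by not caching it.
        return logging.getLogger("toy_model")

    def predict(self, x: float) -> float:
        self.logger.debug("predicting for %s", x)
        return x * self.scale


model = ToyModel(scale=2.0)
print(vars(model))  # {'scale': 2.0}
```

Only `scale` ends up in the instance state, so a serializer that walks the object never encounters the logger.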
July 15, 2025 at 8:51 PM