lhl
@lhl.bsky.social
Easily distracted, currently building open source AI. Living online since FidoNet
I started watching this epic 3.5h investigative journalism piece by Gamers Nexus on Chinese GPU smuggling; the work this independent YouTube gaming channel is doing is really amazing: www.youtube.com/watch?v=1H3x...
THE NVIDIA AI GPU BLACK MARKET | Investigating Smuggling, Corruption, & Governments
YouTube video by Gamers Nexus
www.youtube.com
August 18, 2025 at 6:18 AM
Over the past couple of weeks I've been working on some Strix Halo testing in my spare time. This includes bringing up a harness for doing full sweeps of pp/tg (prompt processing / token generation) performance across a variety of different model architectures, backends, and flags. Writeup just posted to r/LocalLLaMA: www.reddit.com/r/LocalLLaMA...
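(For the curious, the harness is basically just a loop over llama-bench runs. A minimal sketch, assuming llama.cpp's llama-bench is on PATH; the model paths, flag combos, and JSON field names below are illustrative, not the actual sweep config.)

```python
# Minimal sketch of a pp/tg sweep harness (illustrative, not the actual harness).
# Assumes llama.cpp's llama-bench is on PATH and the GGUF paths below exist.
import itertools, json, subprocess

models = ["llama-3.1-8b-q4_k_m.gguf", "qwen3-30b-a3b-q4_k_m.gguf"]  # hypothetical paths
flag_sets = [
    ["-ngl", "99"],              # full GPU offload
    ["-ngl", "99", "-fa", "1"],  # + flash attention
]

results = []
for model, flags in itertools.product(models, flag_sets):
    cmd = ["llama-bench", "-m", model, "-p", "512", "-n", "128", "-o", "json", *flags]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    for row in json.loads(out):
        # each row is one test (pp512 or tg128); avg_ts is average tokens/sec
        results.append({
            "model": model,
            "flags": " ".join(flags),
            "n_prompt": row.get("n_prompt"),
            "n_gen": row.get("n_gen"),
            "t/s": row.get("avg_ts"),
        })

print(json.dumps(results, indent=2))
```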
July 22, 2025 at 11:05 AM
One neat thing: experimenting with using Shisa V2 405B to regenerate our datasets, I'm seeing gains with the new DPO chosen responses (a slight boost on Qwen 3 vs the original DPO), and for SFT+DPO, close to a 0.5 point gain on Shaberi averages for Llama 3.1 8B.
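(For context, "regen" here means keeping the same prompts and rejected responses but re-sampling the chosen side from the bigger model. A minimal sketch of that step, assuming an OpenAI-compatible endpoint is serving the 405B; the endpoint, served model name, and file paths are placeholders.)

```python
# Sketch: regenerate the "chosen" side of a DPO dataset with a stronger model.
# Endpoint/model/file names are placeholders; assumes an OpenAI-compatible server (e.g. vLLM).
from datasets import load_dataset
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def regen_chosen(example):
    resp = client.chat.completions.create(
        model="shisa-v2-llama3.1-405b",  # hypothetical served model name
        messages=[{"role": "user", "content": example["prompt"]}],
        temperature=0.7,
    )
    # keep prompt and rejected as-is; only swap in the stronger model's answer as chosen
    example["chosen"] = resp.choices[0].message.content
    return example

ds = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")  # prompt/chosen/rejected
ds = ds.map(regen_chosen)
ds.to_json("dpo_pairs_regen.jsonl")
```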
June 20, 2025 at 6:24 PM
Recently I started doing some Qwen3 testing (Shaberi, GPT-4.1 judge), and interestingly, for almost all models reasoning yielded worse performance. Note: I need to stand multieval back up - even though the Qwen3 8B tunes appear to match the Shisa V2 12B/14B tunes, they are much worse at translation.
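(The reasoning on/off comparison is easy to reproduce since Qwen3 exposes a thinking switch in its chat template. A minimal sketch; the model name and prompt are just examples, and the enable_thinking kwarg follows Qwen3's published usage.)

```python
# Sketch: generate with Qwen3 reasoning enabled vs disabled via the chat template switch.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# example JA prompt ("please explain the difference between ramen and udon")
messages = [{"role": "user", "content": "ラーメンとうどんの違いを説明してください。"}]

for thinking in (True, False):
    text = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, enable_thinking=thinking
    )
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    print(f"--- enable_thinking={thinking} ---")
    print(tok.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```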
June 15, 2025 at 5:03 AM
I had a chat w/ o3 chatgpt.com/share/6846ff... about Apple's new "Illusion of Thinking" paper machinelearning.apple.com/research/ill... - based on the researchers' definition, neither reasoning LLMs nor humans are true reasoners, but the Python script I had o3 write to solve the logic puzzles is.
ChatGPT - Illusion of Thinking Summary
Shared via ChatGPT
chatgpt.com
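(For context, the paper's puzzles are things like Tower of Hanoi, and the kind of script o3 produced is essentially the textbook recursive solver. A minimal sketch of that sort of solver; the code below is illustrative, not o3's actual output.)

```python
# Sketch: the classic recursive Tower of Hanoi solver, the sort of "true reasoner"
# script referenced above (illustrative only).
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the optimal move list for an n-disk Tower of Hanoi instance."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # park n-1 disks on the spare peg
    moves.append((src, dst))             # move the largest disk
    hanoi(n - 1, aux, src, dst, moves)   # bring the n-1 disks back on top
    return moves

if __name__ == "__main__":
    n = 10
    moves = hanoi(n)
    assert len(moves) == 2**n - 1        # optimal solution length
    print(f"{n} disks solved in {len(moves)} moves")
```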
June 9, 2025 at 3:43 PM
Today we launched one more addition to the Shisa V2 models: Shisa V2 405B. This is a new Llama 3.1 405B post-tune that is the strongest model ever trained in Japan! It matches GPT-4o and DeepSeek-V3 on JA MT-Bench. Read more here: shisa.ai/posts/shisa-...
June 3, 2025 at 4:59 AM
OK, first JA slide deck in the books. 😅 (Thanks, ChatGPT 4.5.)
May 27, 2025 at 4:19 AM
BTW, in case anyone wants to kick the tires or test their 日本語, I have our Shisa V2 405B model up and running temporarily (just a day or two until I finish evals/start training again): chat.shisa.ai
Shisa V2 405B
chat.shisa.ai
May 24, 2025 at 9:19 PM
When your model is sufficiently better than the judge model, the judge may just start throwing a lot of 10s in its scoring 😂 (based on our overall eval battery, shisa-v2 70b is a fair amount better than gpt-4 and gpt-4-turbo, but that's the standard judge used for 1:1 comparisons...)
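(For anyone unfamiliar with the setup: the judge scores each answer on a 1-10 scale from a grading prompt, so a sufficiently strong model just saturates the scale. A stripped-down sketch of that absolute-scoring loop; the prompt wording and score parsing are simplified placeholders, not the actual eval prompts.)

```python
# Sketch: minimal LLM-as-judge absolute scoring loop (simplified; not the real eval prompts).
import re
from openai import OpenAI

client = OpenAI()  # judge model via the OpenAI API

JUDGE_PROMPT = (
    "You are a strict grader. Rate the assistant's answer to the question on a 1-10 scale.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with 'Rating: [[N]]' where N is an integer from 1 to 10."
)

def judge(question: str, answer: str, judge_model: str = "gpt-4-turbo"):
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    # parse the [[N]] rating out of the judge's reply; None if it didn't follow the format
    m = re.search(r"\[\[(\d+)\]\]", resp.choices[0].message.content)
    return int(m.group(1)) if m else None
```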
May 23, 2025 at 5:34 AM
I've recently been poking at Strix Halo. For those interested in using it for inference, it's about what you'd expect (except for surprisingly bad llama.cpp HIP perf): www.reddit.com/r/LocalLLaMA... - but for those looking to do work (PyTorch, etc)... the current state is not good.
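(On the PyTorch side, a quick sanity check of the ROCm stack is the first thing to run on a box like this; a minimal sketch, assuming a ROCm build of PyTorch is installed at all.)

```python
# Sketch: quick sanity check of the ROCm/PyTorch stack (e.g. on Strix Halo).
import torch

print("torch:", torch.__version__)
print("HIP runtime:", torch.version.hip)            # None on a CUDA/CPU-only build
print("GPU visible:", torch.cuda.is_available())    # ROCm devices show up through the cuda API

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    y = x @ x                                        # trivial GEMM to confirm kernels actually run
    torch.cuda.synchronize()
    print("matmul OK:", y.shape)
```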
May 14, 2025 at 5:46 PM
For those curious, like with Llama 4, I've run Qwen 3 through some Japanese language evals. Writeup here: shisa.ai/posts/qwen3-...
Qwen 3 Japanese Performance – Shisa.AI
shisa.ai
May 1, 2025 at 5:36 AM
Over the weekend, I finished up our Llama 405B run (4th group I know of to do an FFT?). It was a real beast to train, but it beats our Shisa V2 70B (as well as GPT-4 and GPT-4 Turbo) using basically our Shisa V2 recipe. It is, I believe, the best-performing LLM (JA and EN) ever trained in Japan.
April 28, 2025 at 12:25 PM
Our small team (of 2!) has just released some of the strongest open Japanese LLMs, Shisa V2 (7-70B). We tried quite a few new techniques (most failed to replicate), so in the end it was largely about grinding out better datasets over the past few months: shisa.ai/posts/shisa-...
Shisa V2 – Shisa.AI
shisa.ai
April 15, 2025 at 5:51 PM
For those interested in how Llama 4's Japanese capabilities stack up, I've just published a set of evals I've run here (better than Llama 3, pretty good for their active parameter counts): shisa.ai/posts/llama4...
Llama 4 Japanese Performance – Shisa.AI
shisa.ai
April 10, 2025 at 4:05 PM
I asked OpenAI Deep Research to do an analysis of the Trump tariffs, their walkback due to the bond market exploding, etc. (including a document someone pointed to as the Trump tariff playbook, and China's 4/9 published response). It's a long but pretty digestible read: chatgpt.com/share/67f778...
ChatGPT - Trump Trade Strategy Analysis
Shared via ChatGPT
chatgpt.com
April 10, 2025 at 8:29 AM
The new Llama 4 release has been a bit of a mess. I've been busy, so I waited for a vLLM stable release blog.vllm.ai/2025/04/05/l... (w/ inference accuracy validation) to see if it's really that bad... Run on an H100 node, the models do OK on EN/JA benchmarks (including some unreleased/just-created ones).
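(For anyone reproducing: the inference side was vLLM on the H100 node. A minimal sketch of loading one of the Llama 4 models with the offline Python API; the HF repo id, parallelism, and context length here are assumptions, so check the vLLM post for the recommended settings.)

```python
# Sketch: offline Llama 4 inference with vLLM on an 8x H100 node.
# Model ID / context length are illustrative; see the vLLM blog post for blessed settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF repo id
    tensor_parallel_size=8,    # spread across the 8 GPUs on the node
    max_model_len=32768,       # keep context modest to fit comfortably
)

params = SamplingParams(temperature=0.2, max_tokens=256)
prompts = ["富士山について簡単に説明してください。"]  # example JA benchmark-style prompt
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```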
April 7, 2025 at 10:02 AM
quasar-alpha looks... quite good
April 5, 2025 at 6:25 PM
Daniel Kokotajlo et al.'s latest essay, published at ai-2027.com, is definitely worth a full read, but I also asked OpenAI Deep Research to do an in-depth critique of it and some earlier essays. It spent almost 30 minutes on the task: chatgpt.com/share/67efd3...
ChatGPT - AI Projection Analysis 2025
Shared via ChatGPT
chatgpt.com
April 4, 2025 at 1:32 PM
A lot of fun, great stuff with the new ChatGPT image generation (r/ChatGPT is a nice party), but this is probably the most interesting thing I've seen so far: threadreaderapp.com/thread/19057...
Thread by @Josikinz on Thread Reader App
@Josikinz: This just in: Claude expresses significantly less existential distress than chatGPT 4o when presented with the same prompt asking it to script comics about its life (more detail in thread)....
threadreaderapp.com
March 30, 2025 at 12:39 PM
Finally at a point where I can just kick back and wait for results...
March 29, 2025 at 4:04 AM
I never noticed this before. OpenAI Deep Research has some new tricks up its sleeve?
March 28, 2025 at 3:50 PM
Holy crap, I burnt like 8 hours this week banging my head trying to fix things because my big runs were blowing up - turns out DeepSpeed 0.16.4 is super borked (the fix is to disable gradient checkpointing or downgrade to 0.15.0): github.com/deepspeedai/...
[BUG] OOM when train 70B models using deepspeed 0.16.4 · Issue #7116 · deepspeedai/DeepSpeed
We found that using OpenRLHF + DeepSpeed 0.15.0, SFT + Adam Offload can train a 70B model with 8 A100 70G + ZeRO3, whereas DeepSpeed 0.16.4 results in OOM. You can try the script https://github.com...
github.com
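(For anyone hitting the same thing, the two workarounds boil down to pinning DeepSpeed back or flipping off gradient checkpointing. A minimal sketch of the latter with a plain HF Trainer setup; the DeepSpeed config path is a placeholder.)

```python
# Sketch of the two workarounds for the DeepSpeed 0.16.4 OOM (see issue #7116):
#   1) pin DeepSpeed back:  pip install "deepspeed==0.15.0"
#   2) or keep 0.16.4 but turn off gradient checkpointing in the trainer config.
# Example of (2) with a vanilla HF Trainer setup:
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_checkpointing=False,   # the toggle that avoids the 0.16.4 blowup
    deepspeed="ds_zero3.json",      # hypothetical ZeRO-3 config path
    bf16=True,
)
```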
March 27, 2025 at 3:21 PM
I've been going through some of the RL releases from last year that I've been meaning to try out, like SPIN github.com/uclaml/SPIN - I implemented a DPO version w/ tuned hyperparameters, and despite decent trajectories, it fails hard (each iteration eval'd worse than the last).
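(Roughly what the "DPO version" of one SPIN iteration looks like: the model's own generations from the previous iteration go in as rejected, the gold SFT responses as chosen, then train with TRL's DPOTrainer. A simplified sketch; the data, hyperparameters, and kwarg names (per recent TRL) are placeholders, not the tuned setup.)

```python
# Sketch: one SPIN-style iteration implemented as DPO (simplified; not the tuned setup).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "spin_iter_0_checkpoint"  # hypothetical path to the previous iteration's model
model = AutoModelForCausalLM.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)

# Tiny illustrative data; in practice prompts/gold come from the SFT set and
# self_generations are sampled from the previous iteration's checkpoint.
prompts          = ["What is the capital of Japan?"]
gold_responses   = ["The capital of Japan is Tokyo."]
self_generations = ["Tokyo, I think, or maybe Kyoto."]

pairs = Dataset.from_dict({
    "prompt":   prompts,
    "chosen":   gold_responses,     # ground-truth targets
    "rejected": self_generations,   # the model's own outputs from the last iteration
})

cfg = DPOConfig(output_dir="spin_iter_1", beta=0.1, num_train_epochs=1)  # beta is a placeholder
trainer = DPOTrainer(model=model, args=cfg, train_dataset=pairs, processing_class=tok)
trainer.train()
```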
March 17, 2025 at 6:39 PM
Recently tested SimPO vs DPO and got results similar to others': DPO was better even when using (grey line) the "V2" optimized hyperparams w/ the same ArmoRM dataset on a similar model (a Llama 3.1 8B SFT) - used trl 0.13.0 since there's a multi-GPU bug w/ CPOTrainer: github.com/huggingface/...
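(The SimPO runs go through TRL's CPOTrainer, which is how TRL exposes the SimPO loss. A minimal sketch; the model/dataset paths and hyperparameters are placeholders rather than the "V2" values, and kwarg names follow recent TRL.)

```python
# Sketch: SimPO via TRL's CPOTrainer (loss_type="simpo", cpo_alpha=0 gives the pure SimPO loss).
# Model/dataset ids and hyperparameters are placeholders, not the "V2" tuned values.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_id = "my-llama3.1-8b-sft"                       # hypothetical SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)

# Preference pairs built from the ArmoRM-scored dataset (prompt/chosen/rejected columns).
ds = load_dataset("json", data_files="armorm_pairs.jsonl", split="train")

cfg = CPOConfig(
    output_dir="simpo_out",
    loss_type="simpo",          # SimPO objective
    cpo_alpha=0.0,              # drop the BC term so it's pure SimPO
    beta=2.0,                   # placeholder; SimPO tends to use larger betas than DPO
    simpo_gamma=0.5,            # target reward margin (placeholder)
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

trainer = CPOTrainer(model=model, args=cfg, train_dataset=ds, processing_class=tok)
trainer.train()
```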
March 14, 2025 at 5:28 AM
This has been pretty good (and on theme, lol) while doing training/data cleaning: www.youtube.com/watch?v=JRnD... (the work is mysterious and important)
Severance — Music To Refine To feat. ODESZA | Apple TV+
YouTube video by Apple TV
www.youtube.com
March 14, 2025 at 5:15 AM