Ryan Angilly
@angilly.bsky.social
Applied Research @ NVIDIA
How’d it do?
December 10, 2024 at 1:06 AM
Qwen is probably the best out there right now: ollama.com/library/qwen...
qwen2.5-coder
The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
ollama.com
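And if you want to hit it from code, something like this works (a minimal sketch using the `ollama` Python package; assumes you've already run `ollama pull qwen2.5-coder`):

```python
import ollama  # pip install ollama; talks to the local Ollama server

# Ask the local qwen2.5-coder model a coding question.
response = ollama.chat(
    model="qwen2.5-coder",
    messages=[{"role": "user", "content": "Reverse a linked list in Python."}],
)
print(response["message"]["content"])
```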
December 9, 2024 at 2:02 AM
If you want a nicer UI, check out OpenWebUI. It gives you a nice ChatGPT-esque web UI with chat history and more.
December 9, 2024 at 2:00 AM
My hunch is that they can already write machine code well enough. I've never seen any evals on it, though.

One thing to consider is portability. Machine code is denser than source code, but I'd bet cross compiling source code to 50 distros is far cheaper from a compute perspective.
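Back-of-the-envelope, the cross-compile path is just one loop per target (a hypothetical sketch; assumes a hello.c on disk and zig installed, since `zig cc` ships cross toolchains for many triples):

```python
import subprocess

# Hypothetical targets; zig cc accepts arch-os-abi triples like these.
TARGETS = ["x86_64-linux-musl", "aarch64-linux-musl", "riscv64-linux-musl"]

for target in TARGETS:
    # Compile the same source once per target instead of generating
    # machine code separately for each one.
    subprocess.run(
        ["zig", "cc", "-target", target, "hello.c", "-o", f"hello-{target}"],
        check=True,
    )
```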
December 2, 2024 at 4:38 PM
But yeah, I guess the bottom line is that RAG can get you far. Unfortunately you won't know where it breaks until it does. I look forward to a world where RAG systems can monitor themselves and signal to the user: "hey, it might be time to do some fine-tuning!"
December 2, 2024 at 3:17 PM
Depends on the use case. If the query is "what is my most controversial opinion across all my notes?" then RAG can easily fall over unless you anticipated it ahead of time in the indexing pipeline. That's admittedly an extreme example, but the spectrum between that and simple fact retrieval is blurry.
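Roughly what I mean (a toy sketch; `embed` and the notes are hypothetical stand-ins for a real pipeline):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

NOTES = [f"note {i}: ..." for i in range(1_000)]
index = np.stack([embed(n) for n in NOTES])

query = "What is my most controversial opinion across all my notes?"
scores = index @ embed(query)
top_k = [NOTES[i] for i in np.argsort(scores)[-5:]]
# The answer requires comparing all 1,000 notes against each other,
# but the LLM only ever sees these 5 chunks. No single chunk
# "contains" the answer, so top-k retrieval quietly fails.
```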
December 2, 2024 at 3:12 PM
Yeah I get what you’re saying. But I’d caution against dismissing people because they don’t speak for _everyone_.

I am an expert 😂 and while I trust LLMs for many things, most of my friends and I very much would not trust machine code output from an LLM.
December 2, 2024 at 3:00 PM
What is the total dataset size in bytes? If your use case requires complex reasoning across the whole set of notes (and it might!), RAG will fall over on you.
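The back-of-the-envelope check (a sketch; the notes path and context window are assumptions, and ~4 bytes per token is just the usual rule of thumb for English text):

```python
import os

NOTES_DIR = "notes/"       # hypothetical path to the note corpus
CONTEXT_WINDOW = 128_000   # tokens; assumed long-context model

# Total corpus size on disk.
total_bytes = sum(
    os.path.getsize(os.path.join(root, f))
    for root, _, files in os.walk(NOTES_DIR)
    for f in files
)
approx_tokens = total_bytes // 4  # ~4 bytes/token for English text
print(f"{total_bytes:,} bytes ≈ {approx_tokens:,} tokens")
print("fits in context" if approx_tokens < CONTEXT_WINDOW else "needs RAG or chunking")
```

If the whole corpus fits in context, you may be able to skip retrieval entirely and just stuff the notes into the prompt.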
December 2, 2024 at 1:29 PM
Have you done any experiments with your benchmarks going from 1 to 100 examples to see if accuracy regresses?
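i.e. something like this sweep (a sketch; `run_model`, `BENCHMARK`, and `FEW_SHOT_POOL` are hypothetical stand-ins for your actual setup):

```python
def run_model(prompt: str) -> str:
    # Stand-in for a real model call (e.g. ollama.chat).
    return "expected 1"

BENCHMARK = [("input 1", "expected 1"), ("input 2", "expected 2")]
FEW_SHOT_POOL = [(f"example q{i}", f"example a{i}") for i in range(100)]

# Sweep the number of in-prompt examples and watch for accuracy regressions.
for n in (1, 5, 10, 25, 50, 100):
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_POOL[:n])
    correct = sum(
        run_model(f"{shots}\nQ: {q}\nA:").strip() == expected
        for q, expected in BENCHMARK
    )
    print(f"{n:>3} examples -> {correct}/{len(BENCHMARK)} correct")
```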
December 2, 2024 at 1:22 PM
I think it can[1] but we don’t do it because:

1) we don’t trust the LLM enough. We want to review the code.
2) high-level languages give you a higher density of expression per token, i.e. it takes fewer tokens, so you get faster answers (see the token-count sketch below)

[1] chatgpt.com/share/674db3...
ChatGPT - x86 Hello World Code
Shared via ChatGPT
chatgpt.com
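To put a number on 2), here's a quick token count of the same program at two levels (a sketch using tiktoken's cl100k_base encoding; exact counts vary by tokenizer):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

python_src = 'print("Hello, world!")'
# The usual Linux x86-64 write/exit hello world, as assembly source.
asm_src = """section .data
msg db "Hello, world!", 10
section .text
global _start
_start:
    mov rax, 1      ; sys_write
    mov rdi, 1      ; stdout
    mov rsi, msg
    mov rdx, 14
    syscall
    mov rax, 60     ; sys_exit
    xor rdi, rdi
    syscall
"""

print("python tokens:", len(enc.encode(python_src)))
print("asm tokens:   ", len(enc.encode(asm_src)))
```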
December 2, 2024 at 1:20 PM
I work in it so I’m in a bit of a bubble. What are some of the most egregious lies you see?
November 30, 2024 at 7:46 PM
Ok very cool.

Do you run any benchmarks against your default prompt templates, and have you published them so others can compare different models or prompt/template tweaks?
November 30, 2024 at 6:42 PM
Do you fine-tune any of your models much, or do you just work with prompt templating?
November 30, 2024 at 6:15 PM
Long story short, I think the change is on a 10-year horizon, not 2.
November 30, 2024 at 6:07 PM
Only just recently have models had long enough context, and good enough recall across that context, to make retrieval work.
November 30, 2024 at 6:07 PM
It’s completely transformed how I work: writing code, tests, and design docs; less time scouring Stack Overflow or fighting with PlantUML/Mermaid to make diagrams. I’m far more productive.

But I’m a special case.

I think the real unlock is going to be agents. This promise still hasn’t been realized.
November 30, 2024 at 6:03 PM
How’s this? Can’t tell if it’s underdone.
November 28, 2024 at 7:09 PM
Flowers enjoyed. Very nice flowers.
November 27, 2024 at 3:00 PM
🙋🏻‍♂️
November 27, 2024 at 2:53 PM