Derek Abdine
@dabdine.bsky.social
CEO furl.ai. Previously CTO @censys, Head of Labs @rapid7
Fair enough. Way more conspiracy theories on X these days than there used to be, though. Also had about 11 bots follow me yesterday after a single tweet. One created a meme coin for my startup. Maybe dead internet theory is actually real…
January 3, 2025 at 1:55 AM
Perfectly describes the current state of Xhitter
January 3, 2025 at 1:50 AM
They’re waiting for you, Gordon. In the tessssssst chamberrrrrr.
December 14, 2024 at 2:43 AM
I keep forgetting they’re still doing this
December 14, 2024 at 1:57 AM
Manual. YMMV with prepared stuff like AutoGPT, but base LLMs at a fundamental level are just token emitters, so you have to string them together with other stuff to make them useful. Like a brain without a body.
December 13, 2024 at 8:57 PM
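A minimal sketch of that stringing-together, assuming the OpenAI Python SDK and a toy get_time tool (the tool and wiring here are illustrative, not furl's code): the model only ever emits tokens; the loop around it is the body.

```
# Minimal sketch: the model only emits tokens (including "please call this
# tool" tokens); the surrounding loop is what actually does anything.
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current UTC time as an ISO-8601 string.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def get_time() -> str:
    return datetime.now(timezone.utc).isoformat()

messages = [{"role": "user", "content": "What time is it right now (UTC)?"}]
resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
msg = resp.choices[0].message

if msg.tool_calls:  # the model *asked* for the tool; our code runs it
    messages.append(msg)
    for call in msg.tool_calls:
        messages.append({"role": "tool", "tool_call_id": call.id, "content": get_time()})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)

print(resp.choices[0].message.content)
```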
Another fun thought: I could give furl an agent that knows how furl itself is designed, its code framework, etc., and make it self-generate new agents and tools in case it can’t accomplish a task itself. Even an agent/tool to (re)train its own model.
December 13, 2024 at 7:55 PM
- Anthropic released a computer-use model, which seems to rely on tool calls combined with image processing (which already existed).

To name a few. In other words, innovation now seems to be on price per token and specific applications rather than on overall accuracy of base models.
December 13, 2024 at 7:38 PM
AI layer to research details about software (vendor website, docs, etc.) the way a human could, but without it taking forever. Useful for remediation to have all the details about a particular software/package/whatever available when deciding what to do.
December 13, 2024 at 7:27 PM
This setup is used as the backing AI to furl.ai’s autonomous patching. We expose it all as a REST API internally to our other services, which rely on our AI layer to gen the scripts/instructions/research details on software for us (software inventory info databases suck, so we also use our 1/2
December 13, 2024 at 7:27 PM
For executing scripts we basically just boot a clean macOS / Windows / Linux (RHEL, Ubuntu) host, ship the script, execute it, and return stdout/stderr. Lots of ways to do that (some cheaper than others). 2/2
December 13, 2024 at 7:23 PM
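One of the “lots of ways”: ship the generated script to a throwaway host over SSH and capture stdout/stderr. A sketch using paramiko; the hostname, user, key path, and script here are placeholders.

```
import paramiko

def run_on_runner(host: str, user: str, key_path: str, script: str):
    """Copy a generated script to a clean runner host, execute it, and
    return (stdout, stderr, exit_code). One approach among many."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_path)
    try:
        sftp = client.open_sftp()
        with sftp.open("/tmp/task.sh", "w") as f:
            f.write(script.encode())
        sftp.chmod("/tmp/task.sh", 0o755)
        _, stdout, stderr = client.exec_command("bash /tmp/task.sh", timeout=300)
        out, err = stdout.read().decode(), stderr.read().decode()
        return out, err, stdout.channel.recv_exit_status()
    finally:
        client.close()

out, err, code = run_on_runner("runner-ubuntu.internal", "runner",
                               "/home/ops/.ssh/id_ed25519",
                               "#!/bin/bash\napt-get --version")
```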
Nope, those tools were built by us in-house. You can use ScraperAPI or other headless-browser scraping services for content extraction (note: this is a slightly dumb way to do it; there are more intelligent ways to extract text from websites). 1/2
December 13, 2024 at 7:23 PM
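For reference, the “slightly dumb” version of content extraction looks something like this (requests + BeautifulSoup; the user-agent string is made up). Smarter approaches render JavaScript and strip boilerplate properly.

```
import requests
from bs4 import BeautifulSoup

def web_scrape(url: str) -> str:
    # Fetch raw HTML and flatten it to whitespace-normalized text.
    html = requests.get(url, timeout=30,
                        headers={"User-Agent": "research-bot/0.1"}).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()  # drop non-content elements
    return " ".join(soup.get_text(separator=" ").split())
```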
to use with the web_scrape tool. If we find that it isn't doing that well enough, we can make a google_search agent (agents have a system prompt, samples, their own model, etc. that tools don't have; tools are just functions) that is specialized for this task. 5/5
December 13, 2024 at 5:53 PM
The research_from_internet tool actually calls our "internet_researcher" agent, which itself has web_scrape and search_google tools. The former uses services to extract text from rendered websites; the latter uses Google's Custom Search API. internet_researcher must also gen search terms 4/5
December 13, 2024 at 5:53 PM
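A sketch of what a search_google tool could look like against Google's Custom Search JSON API (the endpoint and params are the real API; the key and engine-id values would be yours):

```
import requests

def search_google(query: str, api_key: str, cx: str, n: int = 5) -> list[dict]:
    # Google Custom Search JSON API: returns title/link/snippet per result.
    r = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": api_key, "cx": cx, "q": query, "num": n},
        timeout=30,
    )
    r.raise_for_status()
    return [{"title": i["title"], "link": i["link"], "snippet": i.get("snippet", "")}
            for i in r.json().get("items", [])]
```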
For example, the "upgrade_script_developer" agent uses OpenAI's base gpt-4o model, but itself knows about two tools: execute_script_on_runner and research_from_internet. The execute_script_on_runner tool runs an LLM-generated script on a host and simply returns the response. 3/5
December 13, 2024 at 5:53 PM
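The two tool names above come straight from the thread; how they're declared to gpt-4o isn't shown, but with OpenAI function calling it would look roughly like this (the schemas are guesses):

```
# Hypothetical tool schemas for the upgrade_script_developer agent.
UPGRADE_DEV_TOOLS = [
    {"type": "function", "function": {
        "name": "execute_script_on_runner",
        "description": "Run a script on a clean host; returns stdout/stderr.",
        "parameters": {"type": "object", "properties": {
            "os": {"type": "string", "enum": ["macos", "windows", "rhel", "ubuntu"]},
            "script": {"type": "string"}},
            "required": ["os", "script"]},
    }},
    {"type": "function", "function": {
        "name": "research_from_internet",
        "description": "Delegate a question to the internet_researcher agent.",
        "parameters": {"type": "object", "properties": {
            "question": {"type": "string"}},
            "required": ["question"]},
    }},
]
```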
with its own system prompt and tool knowledge. Each agent can be configured to use its own model if we want (though we don't right now). When we build out a new agent, we can make the agent use other agents to achieve its goal. 2/5
December 13, 2024 at 5:53 PM
We use OpenAI's base models with RAG (later, fine-tuned) essentially. So, in this case gpt-4o. Our "cognition" framework (which follows the NVIDIA blog post) contains agents and tools. Agents know about tools. Agents can be tools themselves. So basically each agent is the specialist 1/5
December 13, 2024 at 5:53 PM
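The agents-know-tools / agents-can-be-tools idea, reduced to a skeleton (the names come from the thread; the class design itself is an assumption, not furl's code):

```
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    system_prompt: str
    model: str = "gpt-4o"  # each agent can pin its own model
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # Real version: chat loop over self.model with tool-call dispatch.
        raise NotImplementedError

    def as_tool(self) -> Callable[[str], str]:
        # An agent exposed as a plain function is just another tool.
        return self.run

researcher = Agent("internet_researcher", "You research software on the web.",
                   tools={"web_scrape": lambda url: "",
                          "search_google": lambda q: ""})
developer = Agent("upgrade_script_developer", "You write upgrade scripts.",
                  tools={"research_from_internet": researcher.as_tool()})
```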
Right now we just use OpenAI, though our design allows us to plug any LLM in (we have support for Gemini, Azure OpenAI, Grok, and Anthropic). Very few support tool calls, and for those that do, I still haven’t seen accuracy or reliability as high as OpenAI’s. Tool calls can be added to any LLM tho.
December 13, 2024 at 5:41 PM
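The “can be added to any LLM” part works by pushing the tool protocol into the prompt and parsing the reply yourself. A rough sketch, where complete() stands in for any text-in/text-out model call:

```
import json

PROTOCOL = """You have one tool:
  web_scrape(url) -> page text
Reply with ONLY JSON: {"tool": "web_scrape", "args": {"url": "..."}}
or {"final": "<your answer>"}."""

def agent_step(complete, history: str) -> dict:
    raw = complete(PROTOCOL + "\n\n" + history)
    try:
        return json.loads(raw)  # either a tool request or a final answer
    except json.JSONDecodeError:
        return {"final": raw}   # model broke protocol; treat text as the answer
```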
More or less implement the components here, though the agent graph is not detailed:

developer.nvidia.com/blog/introdu...
Introduction to LLM Agents | NVIDIA Technical Blog
December 13, 2024 at 5:38 PM
Haven’t written a guide, but open to doing that. LangGraph may be the closest framework to what we’ve built.

Most of what we have now is the culmination of trial & error + arXiv papers + blog posts + security/scanning backgrounds + some major conceptual contributions from our former chief of AI
December 13, 2024 at 5:34 PM
Definitely is. I’ve found accuracy improves greatly as you add more “specialists” that work in concert with each other (i.e., a true multi-agent architecture), not just tools and not just prompt engineering. Accuracy scales fairly well and much faster than with prompt tweaks alone.
December 13, 2024 at 5:25 AM
Dunno. I’ve built one that uses agents to reason through creating upgrade scripts; it works by giving it access to search Google, scrape content from websites, and execute stuff in a sandbox. If it fails it’ll correct itself and try again. Knowing when to stop is key, tho not hard for narrow use cases
December 13, 2024 at 4:12 AM
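The shape of that loop, with the stop condition made explicit (generate_script and run_sandboxed stand in for the LLM call and the sandbox runner; the attempt bound is arbitrary):

```
MAX_ATTEMPTS = 5

def develop_upgrade_script(task, generate_script, run_sandboxed):
    feedback = ""
    for attempt in range(MAX_ATTEMPTS):
        script = generate_script(task, feedback)
        out, err, code = run_sandboxed(script)
        if code == 0:
            return script  # it works: stop
        # Feed the failure back so the next attempt can self-correct.
        feedback = f"Attempt {attempt + 1} failed (exit {code}):\n{err[-2000:]}"
    return None  # knowing when to stop: give up after a fixed budget
```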
Yep. Basically run the original request and response through a “critic” which attempts to refute hallucinated bullshit. LLMs are pretty damn good at text extraction, so you are sort of leaning on that to provide some level of error correction.
December 13, 2024 at 3:46 AM
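A sketch of that critic pass; llm() stands in for whatever model call you use, and the prompt wording is illustrative:

```
CRITIC_PROMPT = """You are a strict fact-checker. Given a REQUEST, SOURCE TEXT,
and DRAFT ANSWER, remove or flag any claim in the answer that is not supported
by the source text. Return only the corrected answer."""

def criticize(llm, request: str, source: str, draft: str) -> str:
    # Second pass over the original request + response to refute hallucinations.
    return llm(f"{CRITIC_PROMPT}\n\nREQUEST:\n{request}\n\n"
               f"SOURCE TEXT:\n{source}\n\nDRAFT ANSWER:\n{draft}")
```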