A @jessmart.in and @danversfleury.bsky.social project.
We're building an AI intern in this new environment and planning a bakeoff against Pat Sharpe's approach.
Stay tuned for results and more technical deep-dives...
Read the full update here: sociotechnica.org/q1-2025-update
An invention of necessity:
👁️ Environmental Awareness: A shared data environment where the AI sees context & remembers past actions
⚒️ Tool Autonomy: Agents discover, request & create their own tools
🤝 True Partnership: Genuine human-AI collaboration
❎ Needs microscopic task breakdowns
❎ Forgets everything after each task
❎ Never learns from mistakes
❎ Can't do basic math
❎ Works in a windowless room with no awareness
❎ Needs everything translated to simplified formats
You're fired. 🙅
We (almost) hit our $5/day target! But what we built was still fairly dumb—essentially just fancy scripts that don't truly leverage the reasoning capabilities that make agents interesting.
Our LLM kept us updated through Discord with helpful and fun pirate-themed messages. No need to specify exact text—the AI knew how to be engaging and clear when reporting back.
We saw this coming.
Calculating profit margins is crucial, and AI consistently makes elementary errors. Even giving it a calculator didn't help—calculators require knowing how to use them properly!
We ended up writing custom calculation tools instead.
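A custom calculation tool of this kind could be as small as the sketch below. It is an illustration only, not the tool we shipped: the function name, the 5% `fee_rate`, and the definition of margin as net profit divided by sale price are all assumptions. The point is that the LLM passes in prices and gets back finished numbers, so it never does the arithmetic itself.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def profit_margin(buy_price: str, sell_price: str, fee_rate: str = "0.05") -> dict:
    """Return fee, net profit, and margin for one flip using exact decimal math.

    fee_rate is a hypothetical marketplace fee charged on the sale price.
    Margin here is net profit as a percentage of the sale price (an assumption).
    """
    buy = Decimal(buy_price)
    sell = Decimal(sell_price)
    if buy <= 0 or sell <= 0:
        raise ValueError("prices must be positive")

    # Round the fee to whole cents, the way a payment system would.
    fee = (sell * Decimal(fee_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    net = sell - fee - buy
    margin_pct = (net / sell * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return {"fee": fee, "net_profit": net, "margin_pct": margin_pct}


result = profit_margin("4.00", "5.00")
```

Prices go in as strings so they become exact `Decimal` values rather than binary floats, which avoids rounding surprises on cents.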
LLMs often drew incorrect conclusions or overlooked details when parsing the entire HTML for the page.
We had to transform everything into simplified, structured formats like CSV with filtered columns.
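As a rough sketch of that simplification step, assume the page has already been scraped into a list of dictionaries. The column names (`player`, `price_usd`, `serial`) and the helper name are hypothetical, chosen only for illustration. The helper keeps a short allow-list of columns and drops everything else before the data reaches the LLM.

```python
import csv
import io

# Hypothetical allow-list: only these columns are shown to the LLM.
KEEP_COLUMNS = ["player", "price_usd", "serial"]


def to_filtered_csv(records, columns=KEEP_COLUMNS):
    """Serialize records to CSV text, keeping only the allow-listed columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop keys outside the allow-list
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Missing values become empty cells instead of raising an error.
        writer.writerow({col: record.get(col, "") for col in columns})
    return buf.getvalue()


scraped = [
    {"player": "A", "price_usd": "4.00", "serial": "123", "raw_html": "<div>...</div>"},
    {"player": "B", "price_usd": "7.50"},
]
csv_text = to_filtered_csv(scraped)
```

The resulting text is small and regular, which gives the model far less room to misread a value than a full HTML page does.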
Our AI confused browser UI with website content, misread digits, and struggled with navigation.
An LLM with vision issues + your credit card = danger! We had to shift to browser automation scripts instead.
We built a simple trading system with clear rules:
- Buy low, sell high-ish
- Target uninjured stars on playoff teams
- Make offers below market
- Post purchases for resale
- Talk like a pirate, arrr!
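The rules above can be sketched as plain code. This is a minimal illustration, not our actual system: the `Listing` fields, the 10% offer discount, and the 15% resale markup are all made-up assumptions. Prices are kept in integer cents so the arithmetic stays exact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """Hypothetical shape of one marketplace listing."""
    player: str
    is_star: bool
    injured: bool
    team_in_playoffs: bool
    market_cents: int  # current market price in cents


def is_target(listing: Listing) -> bool:
    """Rule: target uninjured stars on playoff teams."""
    return listing.is_star and not listing.injured and listing.team_in_playoffs


def plan_trade(listing: Listing, discount_pct: int = 10, markup_pct: int = 15) -> Optional[dict]:
    """Return an offer below market and a resale price, or None if not a target.

    discount_pct and markup_pct are illustrative assumptions.
    """
    if not is_target(listing):
        return None
    # Rule: make offers below market (integer math, rounded down to a cent).
    offer_cents = listing.market_cents * (100 - discount_pct) // 100
    # Rule: post purchases for resale at a markup ("sell high-ish").
    resale_cents = offer_cents * (100 + markup_pct) // 100
    return {"player": listing.player, "offer_cents": offer_cents, "resale_cents": resale_cents}


healthy_star = Listing("X", is_star=True, injured=False, team_in_playoffs=True, market_cents=1000)
injured_star = Listing("Y", is_star=True, injured=True, team_in_playoffs=True, market_cents=1000)
plan = plan_trade(healthy_star)
skipped = plan_trade(injured_star)
```

The pirate voice is left to the LLM, which handles tone better than it handles arithmetic.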
We chose NBA Top Shot as our testing ground. Why a dying market? Because we know it well, and handing an AI your credit card is weird enough without extra variables.