Janis
janisoteps.bsky.social
Founder and developer automating the e-commerce industry using agentic programs.

📍Berlin
This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI -- without involving specialist knowledge. We will have AGI when creating such evals becomes outright impossible.
December 20, 2024 at 6:20 PM
So, is this AGI?

While the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely hard for o3.
December 20, 2024 at 6:20 PM
It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute force -- these capabilities are new territory and they demand serious scientific attention.
December 20, 2024 at 6:20 PM
Good point!
December 1, 2024 at 12:32 PM
4. Complex Evaluation: Human feedback is noisy, making it hard to assess models.
5. Weak Frameworks: Manual checks are easier than designing solid, expert-based evaluation systems.
December 1, 2024 at 9:24 AM
1. Slower Cycles: Relying on external experts delays progress.
2. Feedback Resistance: Devs often follow their own ideas over expert advice.
3. Expert Distrust: Specialists may dismiss models they didn’t help build.
December 1, 2024 at 9:24 AM
Applying AI in areas like medicine, law, or science brings big challenges. Devs lack domain expertise, making it tough to judge system performance. Partnering with experts is vital, but this reliance creates hurdles that slow progress and complicate development.
December 1, 2024 at 9:24 AM
However bad Trump is, one thing he does achieve is action from US trading partners.
It shows that democratic leaders should be a bit more aggressive too, otherwise they give off a vibe of inaction and meekness.
November 30, 2024 at 10:17 AM
Not that I want to defend Elonia, but my understanding is that the idea is this: stricter border enforcement reduces the supply of fentanyl, and under the usual market dynamics one would expect prices to go up.
November 26, 2024 at 11:33 AM
A bit of a slow start to the year, but a strong summer.
November 25, 2024 at 4:52 PM
Cursor IDE - great productivity booster for coding.
November 24, 2024 at 8:31 PM
Tech Stack

TiDB - Database to store chat history, vectors, JSON, and analytics
LlamaIndex - RAG framework
DSPy - The framework for programming foundation models
Next.js - Framework
shadcn/ui - Design
November 24, 2024 at 6:49 PM
Features:

Perplexity-style conversational search page: includes a built-in website crawler. The crawler navigates official and documentation sites using sitemap URL scraping.

Embeddable JS snippet: integrate the search window into a website by embedding a JS snippet, which facilitates instant responses to product-related queries.
November 24, 2024 at 6:49 PM