demonstrous.bsky.social
@demonstrous.bsky.social
Perhaps a better example would have been Uber, which lost $32 billion before turning a profit.
www.fool.com/investing/20...
Uber Is Profitable. For Real This Time. | The Motley Fool
It's been a long road to real profits.
December 9, 2024 at 1:41 AM
A lot of people seem to be misinterpreting the point of this chart. The point is that Amazon famously lost about $6 billion (inflation-adjusted) before ever turning a profit.
December 9, 2024 at 1:34 AM
I believe the LMSYS Arena hasn't been a useful signal for a while, and the best indicators are some of the more difficult benchmarks (SWE-bench, GPQA, etc.). This might be a good summary: www.vellum.ai/llm-leaderbo...
LLM Leaderboard 2024
This AI leaderboard shows comparison of capabilities, price and context window for leading commercial and open-source LLMs, based on the benchmark data provided in technical reports in 2024.
December 8, 2024 at 5:04 PM
Yes, this is an excellent list, with the caveat that the criticism is mostly confined to LLMs. I think Gary Marcus does go this far, saying that they are basically useless, while the others say they are overhyped distractions from real work.
December 7, 2024 at 8:18 PM
I'm not seeing that. o1 is a miracle: see the Codeforces benchmarks, as well as the massive improvements on my internal benchmarks.
December 6, 2024 at 10:10 PM
That is fascinating. That and some of the other mitigations show that they are actively trying to prevent rogue or replicating agents.
December 6, 2024 at 10:00 PM
Let's try to keep Bluesky more respectful and friendly than Twitter.
December 6, 2024 at 7:27 PM
I'm sorry, but that is not what is happening. LLMs have not disappointed, nor are they slowing down. At the Fortune 50 company I work at, LLMs have already replaced about 20% of the workforce. I'm sorry, but you need to take this seriously.
December 6, 2024 at 5:24 PM
AIs could never pose an autonomous threat without agency. Thus we can't make any real evaluation until AI agents arrive, which is only now happening. In another year, we can give a real status update.
December 5, 2024 at 1:33 PM
No, for most applications o1 is OpenAI's frontier model. For example, it gets 78% on GPQA (versus 48% here).
December 4, 2024 at 6:32 PM