MJ
mjrun.bsky.social
Seeking AI and data-driven strategies to create personalized and impactful educational experiences, with a focus on breaking down data silos, operationalizing data, and empowering teams to make smarter, faster decisions.
#Sonnet 3.7 has arrived. #Anthropic has caught up on several of the reasoning-heavy benchmarks, and I expect its coding ability to lead the pack.
February 25, 2025 at 2:38 AM
Do robots dream of electric sheep, and why don't LLMs request calculators? www.mindprison.cc/p/why-llms-d...
February 21, 2025 at 2:34 AM
Anatomy of a good #o1 prompt
February 19, 2025 at 10:18 PM
What happens when you tell #ClaudeAI about recent events...
February 18, 2025 at 7:37 PM
#Grok-3 lands at #1 in #LMArena
February 18, 2025 at 4:31 AM
#Deepseek R1 just erased about half a trillion dollars of market cap from #nvidia. It remains to be seen how China did this. If they can extend this low-cost model to #o3 levels, it means a lasting change in the #AI landscape.

I've experimented with #R1 a lot, and I will say it's suspiciously similar to #o1pro.
January 27, 2025 at 10:58 PM
@dario_amodei says that in 2-3 years we will have "a country of geniuses in a datacenter". This is in reference to what he sees as the most likely path for #AI development.
January 21, 2025 at 5:36 PM
#deepseek has dropped a bomb on the AI world. #R1 is an extremely impressive open-source model that can be used at a much lower cost than #o1, with comparable performance. It can rival Claude 3.5 in coding. The distilled models can easily beat #4o on math benchmarks, even at 1.5B parameters (small enough to run on a phone).
January 20, 2025 at 11:01 PM
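A rough sanity check on the "could run on a phone" point: a 1.5B-parameter model's weights take about 3 GB at 16-bit precision and about 0.75 GB at 4-bit quantization. The sketch below only does that arithmetic. It assumes the weights dominate memory use and ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher.

```python
# Approximate memory needed just to hold a model's weights.
# Assumption: weights dominate; KV cache, activations and runtime overhead are ignored.
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 1.5e9  # 1.5B-parameter distilled model
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(params, bits):.2f} GB")
```

Halving the bits per parameter halves the weight footprint, which is why quantized small models are the usual candidates for on-device use.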
Plotting #GPQA scores against model release date produces a curve that certainly looks exponential. #e/acc
January 18, 2025 at 5:20 PM
#o3mini is on its way, along with a tease that the GPT and o series will be merged.
January 17, 2025 at 11:08 PM
I feel like this happens when you assume Ex Machina was a documentary.
January 15, 2025 at 5:59 PM
Used #o1pro to create an entire synthetic database schema in #SQLite. I then worked with it to build an #agentic framework that runs SQL SELECT queries and writes Python code for analysis.

#AiEDU I'd like to scale this into an IPEDS and state reporting tool, with documentation that provides real answers.
January 8, 2025 at 6:36 PM
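The post does not describe the framework's internals, so the following is only a minimal sketch of one piece such an agent might need: a "tool" that accepts a SQL string from the model, refuses anything that is not a single SELECT, and returns column names plus rows. The `run_select` function and the `students` table are hypothetical, invented for illustration; the read-only guard uses SQLite's `PRAGMA query_only` as a second line of defense.

```python
import sqlite3


# Hypothetical agent tool: run one SELECT and return (column names, rows).
def run_select(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    # Reject anything that does not start with SELECT (e.g. INSERT, DROP, PRAGMA).
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(sql)  # execute() raises if given more than one statement
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Tiny synthetic schema standing in for the real one (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, cohort INTEGER, enrolled INTEGER);
    INSERT INTO students (cohort, enrolled) VALUES (2023, 1), (2023, 0), (2024, 1);
""")
# Engine-level guard: with query_only on, any write fails even if the prefix check is bypassed.
conn.execute("PRAGMA query_only = ON")

cols, rows = run_select(
    conn,
    "SELECT cohort, SUM(enrolled) AS enrolled_count FROM students GROUP BY cohort ORDER BY cohort",
)
print(cols, rows)
```

The returned rows could then be handed to a second step in which the model writes Python analysis code; keeping the SQL tool read-only limits the damage a bad generated query can do.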
I got #o1pro, and because it's $200 a month, I almost feel obligated to use it.

The paradox here, for @samasama.bsky.social to solve, is that when you set the price fairly high, people feel they *must* use it to get their money's worth. Had it been set at $50, I would not feel so motivated.
January 6, 2025 at 5:54 PM
2025 will likely be the year of the #AIAgent. Pairing #o3 with a robust agentic architecture will make it a perfectly functional employee. Snip below from @samasama.bsky.social
January 6, 2025 at 3:49 AM
#OpenAI staff throwing around the #ASI hype pretty freely these days...
January 5, 2025 at 3:40 AM
This seems plausible. I'd say #o1pro can already do supervised ML research (assuming a human is in the loop to provide access to data and run the code).
January 3, 2025 at 12:13 AM
@officiallogank.bsky.social thinks we are on the path to #ASI, apparently even without any major new breakthroughs. I assume this means #TTC (test-time compute) is going to have some legs.
January 1, 2025 at 3:03 PM
Researchers at Stanford found that #LLM performance on the #Putnam math benchmark worsened substantially when the problems were rewritten with slightly different numbers. This suggests the models had already been trained on these public datasets.

#o1 preview suffered almost a 30% decline in performance.
January 1, 2025 at 2:59 PM
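The reasoning behind that result can be illustrated with a toy example: take a templated problem, change its numbers, and recompute the ground truth. A model that memorized the published answer then scores perfectly on the original and fails the variants. This is a hypothetical sketch of the idea, not the researchers' benchmark or code; the problem template and the `memorizing_model` stand-in are invented for illustration.

```python
import random


# Toy template; the ground truth is recomputed for each n, so new numbers mean a new answer.
def make_problem(n: int) -> tuple[str, int]:
    question = f"What is the sum of the first {n} odd positive integers?"
    answer = sum(2 * k - 1 for k in range(1, n + 1))  # equals n**2
    return question, answer


def make_variants(original_n: int, count: int, seed: int = 0) -> list[tuple[str, int]]:
    # Pick `count` distinct values of n that differ from the original one.
    rng = random.Random(seed)
    ns = rng.sample([n for n in range(2, 100) if n != original_n], count)
    return [make_problem(n) for n in ns]


# Hypothetical stand-in for a model that memorized the published answer for n = 10.
def memorizing_model(question: str) -> int:
    return 100


def accuracy(model, problems) -> float:
    return sum(model(q) == a for q, a in problems) / len(problems)


original = [make_problem(10)]
variants = make_variants(10, count=20)
print(accuracy(memorizing_model, original), accuracy(memorizing_model, variants))
```

A large gap between the two accuracies is the signal of memorization; a model that actually solves the problem should score about the same on both sets.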
Here are the features @samasama.bsky.social heard requested most in a recent call for feature requests. Apparently there is not much overlap with what they're planning for 2025. Personally, I'm quite interested in what a "grown-up mode" would mean.
December 30, 2024 at 11:27 PM
Why hallucinations in #AI models are sometimes great.

archive.ph/0e3bV
December 30, 2024 at 4:39 PM
At the end of 2024 what are some opinions you hold on #AI that diverge from consensus?
December 29, 2024 at 3:37 PM
#google is ramping up for a big 2025 in AI. @demishassabis.bsky.social has virtually promised full agentic AI capability (just kidding).
December 28, 2024 at 3:27 PM
Hi @microsoft.com did you forget to hit publish on #Phi-4? 😑
December 27, 2024 at 4:38 PM
New scores on #aidenbench. Gemini Flash is doing some heavy lifting. Looking forward to the full thinking Gemini release.
December 27, 2024 at 3:00 AM
I don't think you can give all the credit to #ChatGPT, but it certainly helped add $8 trillion in market cap to the #Mag7 (or Mag 6 in this case) in the two years since #OpenAI released it.
December 26, 2024 at 10:12 PM