- AI 101 series
- ML techniques
- AI Unicorns profiles
- Global dynamics
- ML History
- AI/ML Flashcards
Haven't decided yet which handle to maintain: this or @kseniase
• Costs to run models are dropping as they become more efficient.
• The release of ChatGPT accelerated the efficiency growth of new models by up to 50%!
• Techniques like pruning and distillation don’t necessarily make models more efficient.
Focusing on a balance between model size and performance is more important than aiming for ever-larger models
Tsinghua University and ModelBest Inc propose the idea of “capacity density” to measure how efficiently a model uses its size (toy illustration below)
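For intuition, here's a minimal sketch of how capacity density could be computed under the paper's framing (density = effective parameter size / actual parameter size); the scaling-law fit below is an invented stand-in, not the paper's fitted function:

```python
# Hedged sketch of "capacity density": effective params / actual params.
# effective_params() inverts a TOY scaling law (an assumption, not the
# paper's fit) to ask how many parameters a reference model family would
# need to reach the same benchmark score.
def effective_params(score):
    return 1e9 * (score / 0.5) ** 2  # toy fit: score 0.5 <-> 1B params

def capacity_density(score, actual_params):
    return effective_params(score) / actual_params

# a hypothetical 2.4B model whose score maps to ~2.9B reference params
print(round(capacity_density(0.85, 2.4e9), 2))  # 1.2 -> denser than the reference
```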
Generates 3D environments with object interactions, animations, and physical effects from one image or text prompt. You can interact with them in real-time using a keyboard and mouse.
Paper: deepmind.google/discover/blo...
Our example: www.youtube.com/watch?v=YjO6...
Here are the 2 latest revolutionary World Models, which create interactive 3D environments:
1. Google DeepMind's Genie 2
2. The AI system from World Labs, co-founded by Fei-Fei Li
Explore more below 👇
Flow Matching (FM) is used in top generative models, like Flux, F5-TTS, E2-TTS, and MovieGen, with state-of-the-art results. Some experts even say that FM might surpass diffusion models👇
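For intuition, a minimal, hedged sketch of the FM training objective with a linear probability path (a common textbook setup, not the exact recipe behind Flux or MovieGen): regress a small network onto the constant velocity x1 − x0 along interpolants between noise and data.

```python
# Minimal conditional flow matching sketch (illustrative toy, PyTorch).
import torch
import torch.nn as nn

dim = 2
velocity_net = nn.Sequential(          # v_theta(x_t, t): predicted velocity
    nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)
)
opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, dim) * 0.5 + 2.0  # stand-in for "data" samples
    x0 = torch.randn(256, dim)              # noise samples
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1              # linear probability path
    target = x1 - x0                        # constant velocity along the path
    pred = velocity_net(torch.cat([xt, t], dim=-1))
    loss = ((pred - target) ** 2).mean()    # FM regression objective
    opt.zero_grad(); loss.backward(); opt.step()
# sampling would integrate dx/dt = v_theta(x, t) from t=0 noise to t=1
```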
INTELLECT-1 is a 10B open-source LLM trained over 42 days on 1T tokens across 14 global nodes. It leverages the PRIME framework for exceptional efficiency (400× bandwidth reduction).
github.com/PrimeIntelle...
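The bandwidth saving comes from communicating rarely and compactly. Here is a hedged toy sketch of that general idea (many local steps, then one int8-quantized parameter delta, in the spirit of the DiLoCo-style training PRIME builds on); the 100-step/int8 breakdown is my illustrative arithmetic, not PRIME's exact design:

```python
# Toy simulation: local training steps + quantized "pseudo-gradient" sync.
import numpy as np

def quantize_int8(delta):
    scale = np.abs(delta).max() / 127 + 1e-12
    q = np.clip(np.round(delta / scale), -127, 127).astype(np.int8)
    return q, scale

params = np.zeros(1000, dtype=np.float32)
local = params.copy()
for _ in range(100):                                 # 100 local steps, no comms
    grad = np.random.randn(1000).astype(np.float32)  # stand-in gradient
    local -= 0.01 * grad
q, scale = quantize_int8(local - params)             # send 1 int8 delta...
params += q.astype(np.float32) * scale               # ...instead of 100 fp32 grads
# illustrative ratio: 100 steps * 4 bytes / 1 byte = 400x less communication
```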
MultiFoley is an AI model generating high-quality sound effects from text, audio, and video inputs. Cool demos highlight its creative potential.
arxiv.org/abs/2411.17698
ShowUI is a 2B vision-language-action model tailored for GUI tasks:
- features UI-guided token selection (33% fewer tokens; toy sketch below)
- interleaved streaming for multi-turn tasks
- 256K dataset
- achieves 75.1% zero-shot grounding accuracy
arxiv.org/abs/2411.17465
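A hedged toy version of the token-selection idea: screenshots contain large redundant regions, so near-duplicate neighboring patches can share one representative token (the paper reportedly groups visually similar patches; the greedy run-length pass below is a simplified stand-in, not ShowUI's actual method):

```python
# Toy UI-guided token selection: drop patches nearly identical to the last kept one.
import numpy as np

def select_tokens(patches, threshold=1e-3):
    """patches: (N, D) patch embeddings in raster order."""
    keep = [0]
    for i in range(1, len(patches)):
        # a run of blank background yields near-zero distance -> skip
        if np.mean((patches[i] - patches[keep[-1]]) ** 2) > threshold:
            keep.append(i)
    return patches[keep]

patches = np.repeat(np.random.rand(10, 16), 5, axis=0)  # 50 patches, 10 unique
print(select_tokens(patches).shape)  # (10, 16): 80% fewer tokens on this toy input
```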
OLMo 2, a family of fully open LMs with 7B and 13B parameters, is trained on up to 5 trillion tokens.
allenai.org/blog/olmo2
• Alibaba’s QwQ-32B
• OLMo 2 by Allen AI
• ShowUI by Show Lab, NUS, Microsoft
• Adobe's MultiFoley
• INTELLECT-1 by Prime Intellect
🧵
This framework leverages recursive language-based "games" for self-improvement, focusing on feedback, coverage, and scalability. It suggests a roadmap for scalable AI via autonomous data generation and feedback loops.
arxiv.org/abs/2411.16905
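To make the loop concrete, here is a hedged toy rendering of one "language game" cycle (propose a checkable problem, attempt it, score it with programmatic feedback, keep the transcript as training data); every detail is an illustrative stand-in for the position paper's much broader proposal:

```python
# Toy self-improvement loop: generate -> attempt -> feedback -> collect data.
import random

def propose_game():
    a, b = random.randint(1, 99), random.randint(1, 99)
    return f"What is {a}+{b}?", a + b            # question + checkable answer

def attempt(question, skill):
    true = sum(int(x) for x in question[8:-1].split("+"))
    return true if random.random() < skill else true + 1  # imperfect solver

training_data, skill = [], 0.5
for _ in range(1000):
    q, answer = propose_game()
    reward = 1.0 if attempt(q, skill) == answer else 0.0  # feedback signal
    training_data.append((q, reward))
    skill = min(0.99, skill + 0.0005 * reward)  # stand-in for periodic re-training
print(f"final skill {skill:.2f}, {len(training_data)} transcripts collected")
```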
@msftresearch.bsky.social’s MH-MoE improves sparse MoE by adding a multi-head mechanism (as in multi-head attention), reducing perplexity without increasing FLOPs and demonstrating robust performance under quantization.
arxiv.org/abs/2411.16205
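Roughly: split each token into sub-tokens ("heads"), route each sub-token to an expert independently, then merge the heads back. A hedged, toy-shaped sketch with top-1 routing and linear experts (not Microsoft's implementation):

```python
import torch
import torch.nn as nn

class MHMoE(nn.Module):
    def __init__(self, d_model=64, heads=4, n_experts=8):
        super().__init__()
        self.d_head = d_model // heads
        self.experts = nn.ModuleList(
            nn.Linear(self.d_head, self.d_head) for _ in range(n_experts)
        )
        self.router = nn.Linear(self.d_head, n_experts)
        self.merge = nn.Linear(d_model, d_model)

    def forward(self, x):                     # x: (tokens, d_model)
        sub = x.view(-1, self.d_head)         # split tokens into sub-tokens
        top1 = self.router(sub).argmax(-1)    # top-1 expert per sub-token
        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                out[mask] = expert(sub[mask])
        return self.merge(out.view(x.shape))  # merge heads back

print(MHMoE()(torch.randn(5, 64)).shape)      # torch.Size([5, 64])
```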
Presents a taxonomy of methodologies and applications of LLMs for judgment tasks, highlighting bias, vulnerabilities, and self-judgment, with future directions in human-LLM collaboration and bias mitigation
arxiv.org/abs/2411.16594
NVIDIA introduced a block-sparse attention mechanism for Transformer-based LLMs. It uses local/global attention phases to achieve up to 11x inference speedup on sequences up to 1M tokens, retaining 95-100% accuracy.
arxiv.org/abs/2411.17116
Code: github.com/NVIDIA/Star-...
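The mechanics, roughly: phase 1 encodes the long context in blocks, each attending to itself plus a shared "anchor" block; phase 2 lets query tokens attend globally over all cached KV. A hedged toy mask for phase 1 (block size and layout are illustrative, not NVIDIA's exact scheme):

```python
import torch

def phase1_mask(n_tokens, block=4):
    mask = torch.zeros(n_tokens, n_tokens, dtype=torch.bool)
    for start in range(0, n_tokens, block):
        end = min(start + block, n_tokens)
        mask[start:end, start:end] = True  # local attention within the block
        mask[start:end, :block] = True     # every block also sees the anchor block
    return mask

print(phase1_mask(12, block=4).int())  # block-diagonal plus a shared anchor strip
```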
• Natural Language Reinforcement Learning
• Star Attention, NVIDIA
• Opportunities and Challenges of LLM-as-a-judge
• MH-MoE: Multi-Head Mixture-of-Experts, @msftresearch.bsky.social
• Boundless Socratic Learning with Language Games, Google DeepMind
🧵
After "looking" at the screen, the agent prepares a prompt for the AI model, which includes the user’s instructions, the visual data (like screenshots or buttons layout), and other context the agent needs to understand the task.
After "looking" at the screen, the agent prepares a prompt for the AI model, which includes the user’s instructions, the visual data (like screenshots or buttons layout), and other context the agent needs to understand the task.
Firstly, the agent needs to "see" the software it’s working with. This is done through methods that capture the layout of the app or website, such as screenshots, lists of buttons and menus called widget trees.
Firstly, the agent needs to "see" the software it’s working with. This is done through methods that capture the layout of the app or website, such as screenshots, lists of buttons and menus called widget trees.
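Putting those two steps together, here is a hedged sketch of how such a prompt might be assembled; all field names and the widget-tree shape below are illustrative assumptions, not a specific framework's API:

```python
import json

def build_prompt(instruction, widget_tree, screenshot_path, history):
    return [
        {"role": "system",
         "content": "You are a GUI agent. Choose the next UI action."},
        {"role": "user", "content": json.dumps({
            "instruction": instruction,      # the user's task
            "screenshot": screenshot_path,   # visual context (image reference)
            "widgets": widget_tree,          # buttons/menus with ids and states
            "previous_actions": history,     # multi-turn context
        })},
    ]

prompt = build_prompt(
    "Mute the notification sound",
    [{"id": 3, "type": "toggle", "label": "Sound", "state": "on"}],
    "screen_0421.png",
    ["opened Settings"],
)
print(prompt[1]["content"])
```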
They blend LLMs' capabilities with software interaction to work with websites, mobile apps, and desktop software, simplifying complex tasks.
A new survey on LLM-brained GUI agents was published👇
www.turingpost.com/t/Twitter-Li...
• 100 Days of ML Code
• Data Science For Beginners
• Awesome Data Science
• Data Science Masters
• Homemade Machine Learning
• 500+ AI Projects List with Code
• Awesome Artificial Intelligence
...
Check out for more👇
- Thanks to thought cards, HiAR-ICL requires less human input to guide it and adapts better to different types of problems
- Faster: it focuses only on the actions from a thought card
- Accuracy: it achieved 79.6% on a math benchmark, compared to GPT-4o’s 76.6%
Thought cards are like reusable problem-solving templates.
After MCTS generates multiple possible solution paths for each problem, HiAR-ICL picks the best one by weighing the paths using Value of Computation (VOC).
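A hedged toy version of that selection step: score each MCTS path by reward minus a computation penalty and keep the best. The formula and numbers are illustrative assumptions, not HiAR-ICL's exact VOC definition:

```python
def voc(path, cost_weight=0.05):
    # Value of Computation: estimated reward minus a cost penalty
    return path["reward"] - cost_weight * path["cost"]

paths = [  # candidate solution paths from MCTS (toy numbers)
    {"actions": ["decompose", "solve"], "reward": 0.90, "cost": 5},
    {"actions": ["solve"], "reward": 0.70, "cost": 2},
    {"actions": ["decompose", "verify", "solve"], "reward": 0.92, "cost": 9},
]
print(max(paths, key=voc)["actions"])  # ['decompose', 'solve']
```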