𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8.bsky.social)
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

ckpt 8B: huggingface.co/jiuhai/flore...
demo: huggingface.co/spaces/jiuha...
code: github.com/JiuhaiChen/F...
paper: arxiv.org/abs/2412.04424
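As a rough illustration of the "depth-breadth fusion" named in the title: features from several encoder depths ("depth") and several task prompts ("breadth") can be fused by channel concatenation and projected into the LLM embedding space. A minimal sketch; the class name, dimensions, and single linear projector are assumptions, not Florence-VL's exact design:

```python
import torch
import torch.nn as nn

class DepthBreadthFusion(nn.Module):
    """Hedged sketch of depth-breadth fusion via channel concatenation.
    Dimensions are illustrative, not Florence-VL's configuration."""

    def __init__(self, feat_dim: int, num_branches: int, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(feat_dim * num_branches, llm_dim)

    def forward(self, branch_feats):  # list of (B, N, feat_dim) tensors
        fused = torch.cat(branch_feats, dim=-1)  # (B, N, feat_dim * num_branches)
        return self.proj(fused)                  # (B, N, llm_dim)
```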
December 6, 2024 at 10:26 AM
NVILA, a VLM, enhances VILA by scaling spatial and temporal resolutions before compressing visual tokens, enabling efficient processing of high-resolution images and long videos. It cuts training costs by 4.5×, reduces memory footprint and latency, and outperforms top VLMs on benchmarks. Code & models will be released 🔜
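A minimal sketch of the scale-then-compress idea: encode at a higher resolution (more visual tokens), then pool the token grid back down so the LLM sees fewer tokens. Average pooling and a square token grid are assumptions here; the released implementation may compress differently:

```python
import torch
import torch.nn.functional as F

def scale_then_compress(vision_tokens: torch.Tensor, pool: int = 2) -> torch.Tensor:
    """Illustrative scale-then-compress: vision_tokens is (B, H*W, D) from a
    high-resolution encode; spatial pooling merges pool x pool neighbors so
    the LLM processes far fewer tokens than a naive high-res pipeline."""
    b, n, d = vision_tokens.shape
    side = int(n ** 0.5)                         # assume a square token grid
    grid = vision_tokens.transpose(1, 2).reshape(b, d, side, side)
    grid = F.avg_pool2d(grid, kernel_size=pool)  # compress the token grid
    return grid.flatten(2).transpose(1, 2)       # (B, (side//pool)**2, D)
```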
December 6, 2024 at 6:47 AM
ClearerVoice-Studio by Tongyi Lab is a versatile voice processing framework offering noise removal, speech separation, audio-video speaker extraction, and tools for model fine-tuning and optimization.

git: github.com/modelscope/C...
demo: huggingface.co/spaces/aliba...
December 6, 2024 at 6:32 AM
PaliGemma 2: A Family of Versatile VLMs for Transfer

paper: arxiv.org/abs/2412.03555
December 5, 2024 at 3:24 AM
Home robotics just got a boost.

Stretch AI - a new open-source suite of tools, tutorials, and reference code to explore and build AI-enabled home robot applications.
December 3, 2024 at 7:34 PM
Liquid AI introduces Synthesis of Tailored Architectures (STAR), a new approach that automates neural network design tailored to different tasks and hardware setups.

🔗: www.liquid.ai/research/aut...
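For flavor, a textbook evolutionary-search loop of the kind such automated design builds on. This generic sketch, including all function names, is an assumption, not Liquid AI's code:

```python
import random

def evolve_architectures(evaluate, random_genome, mutate, crossover,
                         pop_size=32, generations=20):
    """Generic evolutionary search over architecture genomes: score against
    task/hardware objectives, keep the best, recombine and mutate."""
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]            # keep the top half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)
```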
December 2, 2024 at 11:45 PM
Marconi: Prefix Caching for the Era of Hybrid LLMs

paper: arxiv.org/abs/2411.19379

Marconi improves prefix caching for hybrid LLMs with admission and eviction policies that weigh reuse likelihood against compute savings, achieving up to 34.4× higher token hit rates and significantly lower latency.
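A hedged sketch of the eviction side, assuming a compute-savings-per-byte score; the paper's exact policy differs in detail:

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    key: str          # prefix identifier
    size_bytes: int   # memory footprint of the cached KV/SSM state
    flops_saved: int  # compute avoided on a cache hit

def evict_until_fits(entries: list[CacheEntry], needed_bytes: int) -> list[CacheEntry]:
    """Marconi-style eviction sketch: rank entries by FLOPs saved per byte,
    so states that buy little compute for a lot of memory go first."""
    entries = sorted(entries, key=lambda e: e.flops_saved / e.size_bytes)
    freed = 0
    while entries and freed < needed_bytes:
        victim = entries.pop(0)       # lowest compute-savings density first
        freed += victim.size_bytes
    return entries                    # surviving cache entries
```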
December 2, 2024 at 9:35 AM
DeMo: Decoupled Momentum Optimization

code: github.com/bloc97/DeMo
paper: arxiv.org/abs/2411.19870
December 2, 2024 at 9:29 AM
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

code: github.com/PKU-HMI-Lab/...
paper: arxiv.org/abs/2411.18623
project: lift3d-web.github.io
December 2, 2024 at 9:17 AM
OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation

Roblox paper: arxiv.org/abs/2411.19352

OMuleT, a conversational recommender system (CRS) that augments LLMs with 10+ tools, improves recommendation quality; the paper shares insights from its design, evaluation, and deployment.
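A minimal sketch of the tool-orchestration loop such a CRS runs. Every interface below (llm.decide, the action fields, the tools dict) is hypothetical, not the paper's controller:

```python
def converse(llm, tools: dict, user_message: str, max_steps: int = 8) -> str:
    """Toy orchestration loop: the LLM either emits a tool call, which the
    controller executes and feeds back, or a final recommendation."""
    transcript = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = llm.decide(transcript, tool_specs=list(tools))
        if action.kind == "final_answer":
            return action.text
        result = tools[action.tool_name](**action.arguments)  # run the tool
        transcript.append({"role": "tool", "name": action.tool_name,
                           "content": result})
    return "Sorry, I couldn't finish the recommendation in time."
```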
December 2, 2024 at 7:37 AM
Training Agents with Weakly Supervised Feedback from Large Language Models

paper: arxiv.org/abs/2411.19547
December 2, 2024 at 7:36 AM
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

paper: arxiv.org/abs/2411.19943
December 2, 2024 at 6:11 AM
Reverse Thinking Makes LLMs Stronger Reasoners

paper: arxiv.org/abs/2411.19865

RevThink improves LLM reasoning by an average of 13.53% through structured forward-backward reasoning, with strong generalization and data efficiency.
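A small sketch of the forward-backward data augmentation; the field names and exact objective formatting are assumptions:

```python
def build_revthink_examples(question, forward_reasoning,
                            backward_question, backward_reasoning):
    """RevThink-style augmentation sketch: from one problem, a teacher
    supplies forward reasoning plus a backward question and its reasoning,
    and the student trains on all three mappings so it internalizes
    solution-to-problem reasoning as well as problem-to-solution."""
    return [
        {"input": question,          "target": forward_reasoning},
        {"input": question,          "target": backward_question},
        {"input": backward_question, "target": backward_reasoning},
    ]
```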
December 2, 2024 at 6:10 AM
JetFormer: An Autoregressive Generative Model of Raw Images and Text

paper: arxiv.org/abs/2411.19722

JetFormer unifies text and image modeling with a normalizing flow, enabling strong text-to-image generation and image understanding.
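A conceptual sketch of the training step, with assumed interfaces for the flow and the transformer (not the paper's code). The key point is that the flow is invertible, so its log-determinant gives an exact pixel likelihood via the change-of-variables formula:

```python
import torch

def jetformer_step(flow, transformer, image, text_tokens):
    """JetFormer-style step (assumed APIs): a normalizing flow maps raw
    pixels to continuous "soft" tokens, and one autoregressive transformer
    models text tokens and soft image tokens in a single sequence."""
    soft_tokens, logdet = flow(image)          # invertible encode, (B, N, D)
    loss_text, loss_image = transformer.autoregressive_nll(
        text_tokens, soft_tokens)              # next-token NLL on both parts
    return loss_text + loss_image - logdet.mean()  # change-of-variables term
```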
December 2, 2024 at 6:09 AM
Q-learning-based Model-free Safety Filter

paper: arxiv.org/abs/2411.19809

A plug-and-play model-free safety filter uses Q-learning to ensure safe actions in robotics, integrating easily with RL algorithms. Simulations and real-world tests confirm its effectiveness.
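A minimal sketch of the filtering rule, assuming a discrete action set and a zero safety threshold (both are assumptions for illustration):

```python
import numpy as np

def safety_filter(q_safe, state, proposed_action, actions, threshold=0.0):
    """Model-free safety filter sketch: a learned safety Q-function scores
    the RL policy's proposed action; if it looks unsafe, the filter
    overrides it with the safest available action instead."""
    if q_safe(state, proposed_action) >= threshold:
        return proposed_action                    # proposed action is safe
    scores = [q_safe(state, a) for a in actions]  # otherwise, pick the
    return actions[int(np.argmax(scores))]        # safest fallback action
```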
December 2, 2024 at 6:09 AM
KV Shifting Attention Enhances Language Modeling

paper: arxiv.org/abs/2411.19574

KV shifting attention strengthens induction heads in LLMs, improving efficiency and in-context learning and speeding convergence, even in models with over 10 billion parameters.
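A small PyTorch sketch of the shifting mechanism, using learnable scalar mixing weights; placement and initialization here are illustrative:

```python
import torch
import torch.nn as nn

class KVShift(nn.Module):
    """KV shifting sketch: keys and values become learnable mixes of the
    current and previous position, so one layer can pair position i's key
    with position i-1's value -- the pattern induction heads normally
    need two layers to express."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor([1.0, 0.0]))  # key mix
        self.beta = nn.Parameter(torch.tensor([1.0, 0.0]))   # value mix

    @staticmethod
    def _shift(x):  # (B, T, D) -> previous-token version, zeros at t=0
        return torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)

    def forward(self, k, v):
        k = self.alpha[0] * k + self.alpha[1] * self._shift(k)
        v = self.beta[0] * v + self.beta[1] * self._shift(v)
        return k, v  # feed into standard attention
```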
December 2, 2024 at 6:08 AM
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

CATP-LLM is a cost-efficient tool-planning framework for LLMs, using a plan language that supports concurrent tool execution and offline reinforcement learning to balance performance and cost. It outperforms GPT-4 on the OpenCATP benchmark, with up to 30.2% better performance and 45.8% lower costs.
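A toy sketch of the cost-aware trade-off in the objective; the simple linear weighting is an assumption, and the paper's reward design is more detailed:

```python
def cost_aware_reward(task_score: float, exec_cost: float,
                      cost_weight: float = 0.5) -> float:
    """Cost-aware objective sketch: the offline RL signal rewards plan
    quality but penalizes the compute cost of the tools the plan invokes,
    so the planner learns to trade the two off rather than maximize
    quality alone."""
    return task_score - cost_weight * exec_cost
```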
November 30, 2024 at 9:09 PM
Directed Token Sliding

The (directed) Token Sliding problem asks whether one token configuration on a graph can be transformed into another by sliding tokens along edges while the tokens always form an independent set. It is PSPACE-complete for many graph classes but polynomial-time solvable on oriented cycles and cographs.
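For small instances, reachability can be brute-forced. A BFS sketch over configurations, exponential in general, consistent with the PSPACE-hardness above:

```python
from collections import deque

def token_sliding_reachable(adj, start, goal):
    """Brute-force Token Sliding reachability (toy instances only): slide
    one token per step along an edge in adj, requiring every intermediate
    configuration to remain an independent set."""
    def independent(conf):
        return all(v not in adj[u] for u in conf for v in conf if u != v)

    start, goal = frozenset(start), frozenset(goal)
    queue, seen = deque([start]), {start}
    while queue:
        conf = queue.popleft()
        if conf == goal:
            return True
        for u in conf:                       # move the token at u ...
            for v in adj[u]:                 # ... along an edge to v
                new = (conf - {u}) | {v}
                if v not in conf and independent(new) and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return False
```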
November 30, 2024 at 8:59 PM
We’re here with good intentions. A lot of the researchers here are genuinely helpful—take the time to follow them and explore their work. Our aim is to contribute, grow, and make things better for everyone.
Did you know that 99% of email today is spam? Your inbox isn't 99% spam only because AI is used to filter it.

The same 99% will happen here too, but if AI researchers keep getting perma-banned for releasing the datasets needed to filter it, this platform will become unusable.
November 29, 2024 at 6:58 AM
Star Attention: Efficient LLM Inference over Long Sequences

🔗: github.com/NVIDIA/Star-...
paper: arxiv.org/abs/2411.17116
November 28, 2024 at 12:44 AM
Skywork-o1-Open

Skywork o1 open model collection on Hugging Face (huggingface.co)
November 27, 2024 at 11:14 PM