𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8.bsky.social)
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

ckpt 8B: huggingface.co/jiuhai/flore...
demo: huggingface.co/spaces/jiuha...
code: github.com/JiuhaiChen/F...
paper: arxiv.org/abs/2412.04424
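As a rough illustration of the "depth-breadth fusion" named in the title: features from several encoder depths ("depth") and several task prompts ("breadth") can be fused by channel concatenation and projected into the LLM embedding space. A minimal sketch; the class name, dimensions, and single linear projector are assumptions, not Florence-VL's exact design:

```python
import torch
import torch.nn as nn

class DepthBreadthFusion(nn.Module):
    """Hedged sketch of depth-breadth fusion via channel concatenation.
    Dimensions are illustrative, not Florence-VL's configuration."""

    def __init__(self, feat_dim: int, num_branches: int, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(feat_dim * num_branches, llm_dim)

    def forward(self, branch_feats):  # list of (B, N, feat_dim) tensors
        fused = torch.cat(branch_feats, dim=-1)  # (B, N, feat_dim * num_branches)
        return self.proj(fused)                  # (B, N, llm_dim)
```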
December 6, 2024 at 10:26 AM
NVILA, a VLM, enhances VILA by scaling spatial and temporal resolutions before compressing visual tokens, enabling efficient processing of high-resolution images and long videos. It cuts training costs by 4.5×, reduces memory footprint and latency, and outperforms top VLMs on benchmarks. Code & models will be released 🔜
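A minimal sketch of the scale-then-compress idea: encode at a higher resolution (more visual tokens), then pool the token grid back down so the LLM sees fewer tokens. Average pooling and a square token grid are assumptions here; the released implementation may compress differently:

```python
import torch
import torch.nn.functional as F

def scale_then_compress(vision_tokens: torch.Tensor, pool: int = 2) -> torch.Tensor:
    """Illustrative scale-then-compress: vision_tokens is (B, H*W, D) from a
    high-resolution encode; spatial pooling merges pool x pool neighbors so
    the LLM processes far fewer tokens than a naive high-res pipeline."""
    b, n, d = vision_tokens.shape
    side = int(n ** 0.5)                         # assume a square token grid
    grid = vision_tokens.transpose(1, 2).reshape(b, d, side, side)
    grid = F.avg_pool2d(grid, kernel_size=pool)  # compress the token grid
    return grid.flatten(2).transpose(1, 2)       # (B, (side//pool)**2, D)
```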
December 6, 2024 at 6:47 AM
ClearerVoice-Studio by Tongyi Lab is a versatile voice processing framework offering noise removal, speech separation, audio-video speaker extraction, and tools for model fine-tuning and optimization.

git: github.com/modelscope/C...
demo: huggingface.co/spaces/aliba...
December 6, 2024 at 6:32 AM
PaliGemma 2: A Family of Versatile VLMs for Transfer

paper: arxiv.org/abs/2412.03555
December 5, 2024 at 3:24 AM
Home robotics just got a boost.

Stretch AI - a new open-source suite of tools, tutorials, and reference code to explore and build AI-enabled home robot applications.
December 3, 2024 at 7:34 PM
Liquid AI introduces Synthesis of Tailored Architectures (STAR), a new approach that automates neural network design tailored to different tasks and hardware setups.

🔗: www.liquid.ai/research/aut...
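For flavor, a textbook evolutionary-search loop of the kind such automated design builds on. This generic sketch, including all function names, is an assumption, not Liquid AI's code:

```python
import random

def evolve_architectures(evaluate, random_genome, mutate, crossover,
                         pop_size=32, generations=20):
    """Generic evolutionary search over architecture genomes: score against
    task/hardware objectives, keep the best, recombine and mutate."""
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]            # keep the top half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)
```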
December 2, 2024 at 11:45 PM
Marconi: Prefix Caching for the Era of Hybrid LLMs

paper: arxiv.org/abs/2411.19379

Marconi improves prefix caching for hybrid LLMs with admission and eviction policies that weigh reuse likelihood against compute savings, achieving up to 34.4× higher token hit rates and significantly lower latency.
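A hedged sketch of the eviction side, assuming a compute-savings-per-byte score; the paper's exact policy differs in detail:

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    key: str          # prefix identifier
    size_bytes: int   # memory footprint of the cached KV/SSM state
    flops_saved: int  # compute avoided on a cache hit

def evict_until_fits(entries: list[CacheEntry], needed_bytes: int) -> list[CacheEntry]:
    """Marconi-style eviction sketch: rank entries by FLOPs saved per byte,
    so states that buy little compute for a lot of memory go first."""
    entries = sorted(entries, key=lambda e: e.flops_saved / e.size_bytes)
    freed = 0
    while entries and freed < needed_bytes:
        victim = entries.pop(0)       # lowest compute-savings density first
        freed += victim.size_bytes
    return entries                    # surviving cache entries
```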
December 2, 2024 at 9:35 AM
DeMo: Decoupled Momentum Optimization

code: github.com/bloc97/DeMo
paper: arxiv.org/abs/2411.19870
December 2, 2024 at 9:29 AM
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

code: github.com/PKU-HMI-Lab/...
paper: arxiv.org/abs/2411.18623
project: lift3d-web.github.io
December 2, 2024 at 9:17 AM
OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation

Roblox paper: arxiv.org/abs/2411.19352

OMuleT, a conversational recommender system (CRS) that augments LLMs with 10+ tools, improves recommendation quality; the paper shares insights from its design, evaluation, and deployment.
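A minimal sketch of the tool-orchestration loop such a CRS runs. Every interface below (llm.decide, the action fields, the tools dict) is hypothetical, not the paper's controller:

```python
def converse(llm, tools: dict, user_message: str, max_steps: int = 8) -> str:
    """Toy orchestration loop: the LLM either emits a tool call, which the
    controller executes and feeds back, or a final recommendation."""
    transcript = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        action = llm.decide(transcript, tool_specs=list(tools))
        if action.kind == "final_answer":
            return action.text
        result = tools[action.tool_name](**action.arguments)  # run the tool
        transcript.append({"role": "tool", "name": action.tool_name,
                           "content": result})
    return "Sorry, I couldn't finish the recommendation in time."
```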
December 2, 2024 at 7:37 AM
Training Agents with Weakly Supervised Feedback from Large Language Models

paper: arxiv.org/abs/2411.19547
December 2, 2024 at 7:36 AM
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

paper: arxiv.org/abs/2411.19943
December 2, 2024 at 6:11 AM
Reverse Thinking Makes LLMs Stronger Reasoners

paper: arxiv.org/abs/2411.19865

RevThink improves LLM reasoning by an average of 13.53% through structured forward-backward reasoning, with strong generalization and data efficiency.
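A small sketch of the forward-backward data augmentation; the field names and exact objective formatting are assumptions:

```python
def build_revthink_examples(question, forward_reasoning,
                            backward_question, backward_reasoning):
    """RevThink-style augmentation sketch: from one problem, a teacher
    supplies forward reasoning plus a backward question and its reasoning,
    and the student trains on all three mappings so it internalizes
    solution-to-problem reasoning as well as problem-to-solution."""
    return [
        {"input": question,          "target": forward_reasoning},
        {"input": question,          "target": backward_question},
        {"input": backward_question, "target": backward_reasoning},
    ]
```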
December 2, 2024 at 6:10 AM
JetFormer: An Autoregressive Generative Model of Raw Images and Text

paper: arxiv.org/abs/2411.19722

JetFormer unifies text and image modeling with a normalizing flow, enabling strong text-to-image generation and image understanding.
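A conceptual sketch of the training step, with assumed interfaces for the flow and the transformer (not the paper's code). The key point is that the flow is invertible, so its log-determinant gives an exact pixel likelihood via the change-of-variables formula:

```python
import torch

def jetformer_step(flow, transformer, image, text_tokens):
    """JetFormer-style step (assumed APIs): a normalizing flow maps raw
    pixels to continuous "soft" tokens, and one autoregressive transformer
    models text tokens and soft image tokens in a single sequence."""
    soft_tokens, logdet = flow(image)          # invertible encode, (B, N, D)
    loss_text, loss_image = transformer.autoregressive_nll(
        text_tokens, soft_tokens)              # next-token NLL on both parts
    return loss_text + loss_image - logdet.mean()  # change-of-variables term
```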
December 2, 2024 at 6:09 AM
Q-learning-based Model-free Safety Filter

paper: arxiv.org/abs/2411.19809

A plug-and-play model-free safety filter uses Q-learning to ensure safe actions in robotics, integrating easily with RL algorithms. Simulations and real-world tests confirm its effectiveness.
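A minimal sketch of the filtering rule, assuming a discrete action set and a zero safety threshold (both are assumptions for illustration):

```python
import numpy as np

def safety_filter(q_safe, state, proposed_action, actions, threshold=0.0):
    """Model-free safety filter sketch: a learned safety Q-function scores
    the RL policy's proposed action; if it looks unsafe, the filter
    overrides it with the safest available action instead."""
    if q_safe(state, proposed_action) >= threshold:
        return proposed_action                    # proposed action is safe
    scores = [q_safe(state, a) for a in actions]  # otherwise, pick the
    return actions[int(np.argmax(scores))]        # safest fallback action
```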
December 2, 2024 at 6:09 AM
KV Shifting Attention Enhances Language Modeling

paper: arxiv.org/abs/2411.19574

KV shifting attention strengthens induction heads in LLMs, improving efficiency and in-context learning and speeding convergence, even in models with over 10 billion parameters.
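A small PyTorch sketch of the shifting mechanism, using learnable scalar mixing weights; placement and initialization here are illustrative:

```python
import torch
import torch.nn as nn

class KVShift(nn.Module):
    """KV shifting sketch: keys and values become learnable mixes of the
    current and previous position, so one layer can pair position i's key
    with position i-1's value -- the pattern induction heads normally
    need two layers to express."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor([1.0, 0.0]))  # key mix
        self.beta = nn.Parameter(torch.tensor([1.0, 0.0]))   # value mix

    @staticmethod
    def _shift(x):  # (B, T, D) -> previous-token version, zeros at t=0
        return torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)

    def forward(self, k, v):
        k = self.alpha[0] * k + self.alpha[1] * self._shift(k)
        v = self.beta[0] * v + self.beta[1] * self._shift(v)
        return k, v  # feed into standard attention
```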
December 2, 2024 at 6:08 AM
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

CATP-LLM is a cost-efficient tool-planning framework for LLMs, using a plan language that supports concurrent tool execution and offline reinforcement learning to balance performance and cost. It outperforms GPT-4 on the OpenCATP benchmark, with up to 30.2% better performance and 45.8% lower costs.
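A toy sketch of the cost-aware trade-off in the objective; the simple linear weighting is an assumption, and the paper's reward design is more detailed:

```python
def cost_aware_reward(task_score: float, exec_cost: float,
                      cost_weight: float = 0.5) -> float:
    """Cost-aware objective sketch: the offline RL signal rewards plan
    quality but penalizes the compute cost of the tools the plan invokes,
    so the planner learns to trade the two off rather than maximize
    quality alone."""
    return task_score - cost_weight * exec_cost
```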
November 30, 2024 at 9:09 PM
Directed Token Sliding

The (directed) Token Sliding problem asks whether one token configuration on a graph can be transformed into another by sliding tokens along edges while the tokens always form an independent set. It is PSPACE-complete for many graph classes but polynomial-time solvable on oriented cycles and cographs.
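For small instances, reachability can be brute-forced. A BFS sketch over configurations, exponential in general, consistent with the PSPACE-hardness above:

```python
from collections import deque

def token_sliding_reachable(adj, start, goal):
    """Brute-force Token Sliding reachability (toy instances only): slide
    one token per step along an edge in adj, requiring every intermediate
    configuration to remain an independent set."""
    def independent(conf):
        return all(v not in adj[u] for u in conf for v in conf if u != v)

    start, goal = frozenset(start), frozenset(goal)
    queue, seen = deque([start]), {start}
    while queue:
        conf = queue.popleft()
        if conf == goal:
            return True
        for u in conf:                       # move the token at u ...
            for v in adj[u]:                 # ... along an edge to v
                new = (conf - {u}) | {v}
                if v not in conf and independent(new) and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return False
```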
November 30, 2024 at 8:59 PM
We’re here with good intentions. A lot of the researchers here are genuinely helpful—take the time to follow them and explore their work. Our aim is to contribute, grow, and make things better for everyone.
Did you know that 99% of email today is spam? Your inbox isn't 99% spam only because AI is used to filter it.

The same 99% will happen here too, but if AI researchers keep getting perma-banned for releasing the datasets needed to filter it, this platform will become unusable.
November 29, 2024 at 6:58 AM
Star Attention: Efficient LLM Inference over Long Sequences

🔗: github.com/NVIDIA/Star-...
paper: arxiv.org/abs/2411.17116
November 28, 2024 at 12:44 AM
Skywork-o1-Open

Skywork o1 open model collection on Hugging Face (huggingface.co)
November 27, 2024 at 11:14 PM