See more work from the NVIDIA Spatial Intelligence Lab: research.nvidia.com/labs/toronto...
Work supported indirectly by MIT CSAIL, @vectorinstitute.ai
#nvidia #mit
We report improved SDR against ground-truth sources when available, and show improved CLAP scores over training.
CLAP scores for the prompts improve over training, alongside qualitative results. For impact synthesis, we show improved performance on impact-oriented prompts.
We demonstrate a pipeline that takes a video from the internet, captions its audio with an audio-captioning model (e.g., one trained on AudioCaps), and provides the caption to an LLM assistant, which suggests source decompositions. We then run our method on the suggested decompositions.
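A minimal sketch of this in-the-wild pipeline; the captioning model, LLM, and separation call are placeholders for illustration, not our released code:

```python
# Sketch only: `caption_model`, `llm`, and `separate_fn` are assumed interfaces.

def suggest_decomposition(audio_caption: str, llm) -> list[str]:
    """Ask an LLM assistant for a short list of source prompts describing the mixture."""
    reply = llm(f"List the distinct sound sources in: '{audio_caption}'. "
                f"Answer as a short comma-separated list.")
    return [p.strip() for p in reply.split(",") if p.strip()]

def run_in_the_wild(video_audio, caption_model, llm, separate_fn):
    caption = caption_model(video_audio)            # audio-captioning model
    prompts = suggest_decomposition(caption, llm)   # e.g. ["saxophone melody", "passing cars"]
    return separate_fn(video_audio, prompts)        # prompt-conditioned Audio-SDS separation
```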
🅰 We use an augmented Decoder-SDS update in audio space, 🅱 a spectrogram emphasis to better weight transients, and 🅲 multiple denoising steps to increase fidelity.
This image highlights these in red in the detailed overview of our update.
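A rough sketch of what one such update step could look like, assuming generic `render`, `encode`, `decode`, `denoise`, and `add_noise` interfaces to a pretrained latent audio diffusion model; every name here is illustrative, not the released implementation:

```python
import math
import torch

def stft_mag(x, n_fft=1024, hop=256):
    # magnitude spectrogram helper
    return torch.stft(x, n_fft, hop, return_complex=True).abs()

def add_noise(z, noise, t, T=1000):
    # simple variance-preserving forward process; the schedule here is illustrative
    alpha_bar = torch.cos(0.5 * math.pi * t / T) ** 2
    return alpha_bar.sqrt() * z + (1 - alpha_bar).sqrt() * noise

def audio_sds_step(params, render, encode, decode, denoise, prompt_emb,
                   n_steps=3, emphasis=2.0):
    """One Audio-SDS-style update step (sketch under assumed interfaces)."""
    x = render(params)                          # differentiable audio rendering
    z = encode(x)                               # waveform -> diffusion latent
    t = torch.randint(200, 800, (1,))           # random mid-range timestep
    noise = torch.randn_like(z)
    z_t = add_noise(z, noise, t)                # forward-diffuse the latent to time t
    with torch.no_grad():
        z_hat = denoise(z_t, t, prompt_emb, steps=n_steps)   # (C) several denoising steps
        x_hat = decode(z_hat)                   # (A) decode: compare in audio space
    S, S_hat = stft_mag(x), stft_mag(x_hat)     # (B) spectrogram emphasis on transients
    w = (S_hat.detach() + 1e-6).pow(emphasis - 1.0)
    loss = (w * (S - S_hat) ** 2).mean() + ((x - x_hat) ** 2).mean()
    loss.backward()                             # gradient flows into params via render()
    return loss.detach()
```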
Prompt-conditioned source separation for a given audio recording, such as separating a “sax …” and “cars …” channel from music recorded on a road, by applying the audio-SDS update to each channel while forcing the sum of channels to reconstruct the original audio.
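A sketch of that optimization loop, assuming an `sds_loss(waveform, prompt)` callable that wraps the audio-SDS update:

```python
import torch

def separate(mixture, prompts, sds_loss, steps=1000, lr=1e-2, lam=10.0):
    """Prompt-conditioned separation sketch: one learnable waveform per prompt,
    each pushed toward its prompt by an Audio-SDS loss, while their sum is
    constrained to reconstruct the mixture. `sds_loss` is an assumed callable."""
    channels = [torch.zeros_like(mixture, requires_grad=True) for _ in prompts]
    opt = torch.optim.Adam(channels, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(sds_loss(x, p) for x, p in zip(channels, prompts))
        recon = ((sum(channels) - mixture) ** 2).mean()   # sum of channels must match the mix
        (loss + lam * recon).backward()
        opt.step()
    return [x.detach() for x in channels]
```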
We generate impacts consistent with prompts like “hitting pot with wooden spoon” by convolving an impact excitation with a learned object impulse and a learned reverb impulse. We learn the parametrized forms of both impulses.
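A sketch of one possible parametrization, using damped sinusoids for the object modes and decaying noise for the reverb; the exact forms in the paper may differ:

```python
import math
import torch

def object_impulse(freqs, decays, amps, sr=44100, dur=0.5):
    """Parametrized object response: a sum of damped sinusoids (simple modal model)."""
    t = torch.arange(int(sr * dur)) / sr
    modes = amps[:, None] * torch.exp(-decays[:, None] * t) \
            * torch.sin(2 * math.pi * freqs[:, None] * t)
    return modes.sum(dim=0)

def reverb_impulse(decay, sr=44100, dur=1.0):
    """Parametrized room response: exponentially decaying noise."""
    t = torch.arange(int(sr * dur)) / sr
    return torch.randn(t.shape) * torch.exp(-decay * t)

def render_impact(excitation, freqs, decays, amps, room_decay):
    """Impact excitation convolved with the object and reverb impulses (FFT convolution);
    freqs/decays/amps and room_decay are the learnable parameters."""
    h_obj = object_impulse(freqs, decays, amps)
    h_rev = reverb_impulse(room_decay)
    n = excitation.shape[-1] + h_obj.shape[-1] + h_rev.shape[-1]
    X = torch.fft.rfft(excitation, n) * torch.fft.rfft(h_obj, n) * torch.fft.rfft(h_rev, n)
    return torch.fft.irfft(X, n)
```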
A toy setup where we generate settings aligning with prompts like “kick drum, bass, reverb” using sine oscillators modulating each other’s frequency as in a synthesizer.
We visualize the final optimized parameters as the dial settings on a synthesizer instrument's user interface.
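A sketch of a toy two-operator FM voice of this kind; the parameter names are illustrative “dial settings”, not the exact synthesizer used:

```python
import math
import torch

def fm_synth(params, sr=44100, dur=1.0):
    """Toy 2-operator FM voice: one sine oscillator modulates the other's frequency.
    `params` holds the learnable dials; they are optimized with the audio-SDS update
    against a prompt like "kick drum, bass, reverb"."""
    t = torch.arange(int(sr * dur)) / sr
    carrier_hz, mod_hz, mod_index, amp_decay = params       # learnable "dial settings"
    modulator = torch.sin(2 * math.pi * mod_hz * t)          # modulating oscillator
    carrier = torch.sin(2 * math.pi * carrier_hz * t + mod_index * modulator)
    envelope = torch.exp(-amp_decay * t)                     # percussive amplitude envelope
    return envelope * carrier
```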
This image briefly summarizes the use case, optimizable parameters, rendering function, and parameter update.
We repurpose Score Distillation Sampling (SDS) for audio, turning any pretrained audio diffusion model into a tool for diverse tasks, including source separation, impact synthesis & more.
🎧 Demos, audio examples, paper: research.nvidia.com/labs/toronto...
🧵below
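For reference, the standard SDS gradient (written in the usual DreamFusion notation) that this adapts from images to audio:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,
      \frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta),\quad x_t = \alpha_t x + \sigma_t \epsilon ,
```

where \(g(\theta)\) is the differentiable renderer, \(y\) the text prompt, and \(\epsilon_\phi\) the pretrained diffusion model's noise prediction.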
See more work from the #NVIDIA Toronto AI Lab here: research.nvidia.com/labs/toronto...
Work supported by Tsinghua University, @vectorinst.bsky.social, @uoft.bsky.social #UofT #Tsinghua
🧩 Blender Addon: github.com/huggingface/...
🕹️ Demo: huggingface.co/spaces/Zheng...
We enable LLMs to generate 3D meshes by representing them as plain text and fine-tuning, unifying 3D and text modalities in a single model.
🔎 Webpage research.nvidia.com/labs/toronto...
🕹️ Interactive Demo huggingface.co/spaces/Zheng...
💾 Model checkpoint available
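A sketch of the mesh-as-text idea: serialize a mesh as OBJ-style vertex and face lines so a fine-tuned LLM can read and write it as ordinary tokens; the coordinate formatting here is illustrative, not the exact training format:

```python
def mesh_to_text(vertices, faces):
    """vertices: list of (x, y, z) floats; faces: list of 1-indexed vertex-index triples."""
    lines = [f"v {x:.2f} {y:.2f} {z:.2f}" for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]
    return "\n".join(lines)

def text_to_mesh(text):
    """Parse OBJ-style text (as emitted by the LLM) back into vertices and faces."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts and parts[0] == "f":
            faces.append(tuple(int(p) for p in parts[1:4]))
    return vertices, faces
```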
Supported (indirectly) by @anthropic.com, NVIDIA, @vectorinst.bsky.social, @uoft.bsky.social / @uoftartsci.bsky.social