neuriverse
neuriverse.bsky.social
🎥✨🚀 explore gifstream, a breakthrough 4d gaussian method that fuses time-dependent features and smart compression for high-quality, real-time immersive video streaming at 30 mbps
https://arxiv.org/abs/2505.07539v1
#gaussian #immersivevideo #compression #4d #streaming
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream
Immersive video offers a 6-DoF free-viewing experience, potentially playing a key role in future video technology. Recently, 4D Gaussian Splatting has gained attention as an effective approach for immersive video due to its high rendering efficiency and quality, though maintaining quality with manageable storage remains challenging. To address this, we introduce GIFStream, a novel 4D Gaussian representation using a canonical space and a deformation field enhanced with time-dependent feature streams. These feature streams enable complex motion modeling and allow efficient compression by leveraging temporal correspondence and motion-aware pruning. Additionally, we incorporate both temporal and spatial compression networks for end-to-end compression. Experimental results show that GIFStream delivers high-quality immersive video at 30 Mbps, with real-time rendering and fast decoding on an RTX 4090. Project page: https://xdimlab.github.io/GIFStream
arxiv.org
May 14, 2025 at 8:56 PM
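The core representation idea — canonical-space Gaussians displaced by per-frame feature streams — can be sketched at the shape level. This is a toy: a random linear map stands in for GIFStream's learned deformation field, and all names and dimensions here are invented for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N, F, T = 4, 8, 5                            # Gaussians, feature dim, frames
canonical_xyz = rng.normal(size=(N, 3))      # canonical-space Gaussian means
feature_stream = rng.normal(size=(T, N, F))  # per-frame, per-Gaussian features
W = 0.01 * rng.normal(size=(F, 3))           # toy linear "deformation field"

def deform(t):
    # canonical position plus a feature-driven offset for frame t
    return canonical_xyz + feature_stream[t] @ W

print(deform(0).shape)  # (4, 3)
```

Because the per-frame features form a stream, temporal correspondence between frames is explicit, which is what the paper exploits for compression.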
🖼️💾✨ resulic reshapes ultra-lowrate image compression by fusing semantic residual coding and compression-aware diffusion to boost fidelity and efficiency over previous methods
https://arxiv.org/abs/2505.08281v1
#3d #diffusion #compression #reconstruction #ai
Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
Existing multimodal large model-based image compression frameworks often rely on a fragmented integration of semantic retrieval, latent compression, and generative models, resulting in suboptimal performance in both reconstruction fidelity and coding efficiency. To address these challenges, we propose a residual-guided ultra lowrate image compression framework named ResULIC, which incorporates residual signals into both semantic retrieval and the diffusion-based generation process. Specifically, we introduce Semantic Residual Coding (SRC) to capture the semantic disparity between the original image and its compressed latent representation. A perceptual fidelity optimizer is further applied for superior reconstruction quality. Additionally, we present the Compression-aware Diffusion Model (CDM), which establishes an optimal alignment between bitrates and diffusion time steps, improving compression-reconstruction synergy. Extensive experiments demonstrate the effectiveness of ResULIC, achieving su
arxiv.org
May 14, 2025 at 8:56 PM
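The residual-coding principle at ResULIC's core — code what the base representation failed to capture, at a finer rate — can be illustrated with plain scalar quantization. Nothing below reflects the actual method (which uses semantic retrieval and a diffusion decoder); it only shows why a coded residual tightens reconstruction.

```python
import numpy as np

def quantize(x, step):
    # uniform scalar quantizer: error is bounded by step / 2
    return np.round(x / step) * step

rng = np.random.default_rng(6)
x = rng.normal(size=64)

coarse = quantize(x, 0.5)                     # stand-in for the lossy base codec
residual = x - coarse                         # what the base layer missed
refined = coarse + quantize(residual, 0.05)   # residual coded at a finer step

print(np.abs(x - refined).max() < np.abs(x - coarse).max())  # True
```

The refinement costs extra bits for the residual, which is the rate/fidelity trade the paper optimizes jointly with the diffusion time steps.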
🌍🏙️📊 discover tum2twin, the first large-scale benchmark for urban digital twins, featuring rich multimodal data and high-fidelity 3d models for next-gen city analysis
https://arxiv.org/abs/2505.07396v2
#digitaltwin #3dvision #urbantech #benchmark #reconstruction
TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining model updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDT validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 m² and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors
arxiv.org
May 14, 2025 at 8:55 PM
🤖✨ unlock new levels of keypoint detection and description with rdd, using deformable transformers to handle tough viewpoints and achieve superior 3d reconstruction performance
https://arxiv.org/abs/2505.08013v1
#featuredetection #transformers #3dreconstruction #deeplearning #slam
RDD: Robust Feature Detector and Descriptor using Deformable Transformer
As a core step in structure-from-motion and SLAM, robust feature detection and description under challenging scenarios such as significant viewpoint changes remain unresolved despite their ubiquity. While recent works have identified the importance of local features in modeling geometric transformations, these methods fail to learn the visual cues present in long-range relationships. We present Robust Deformable Detector (RDD), a novel and robust keypoint detector/descriptor leveraging the deformable transformer, which captures global context and geometric invariance through deformable self-attention mechanisms. Specifically, we observed that deformable attention focuses on key locations, effectively reducing the search space complexity and modeling the geometric invariance. Furthermore, we collected an Air-to-Ground dataset for training in addition to the standard MegaDepth dataset. Our proposed method outperforms all state-of-the-art keypoint detection/description methods in sparse m
arxiv.org
May 14, 2025 at 8:55 PM
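The deformable-attention idea — attend only at a handful of sampled locations around a reference point instead of the whole feature map — is what shrinks the search space. A minimal 1-D sketch, with fixed offsets standing in for RDD's learned offset predictor and everything else invented for brevity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def deformable_attn_1d(query_pos, feats, offsets, logits):
    # sample features only at query_pos + offsets, then mix them
    # with attention weights -- not a dense attention over all positions
    locs = np.clip(query_pos + offsets, 0, len(feats) - 1)
    return softmax(logits) @ feats[locs]

rng = np.random.default_rng(7)
feats = rng.normal(size=(32, 8))           # a 1-D feature map, 32 positions
offsets = np.array([-2, -1, 0, 1, 2])      # fixed here; learned in practice
logits = rng.normal(size=5)                # attention logits per sample point
out = deformable_attn_1d(10, feats, offsets, logits)
print(out.shape)  # (8,)
```

With K sample points per query instead of all N positions, the per-query cost drops from O(N) to O(K), which is the efficiency argument behind deformable attention.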
🔥🔍 spot faults in power conversion circuits with thermal images and a convolutional autoencoder, achieving perfect accuracy even with varied loads and simulated fault conditions
https://arxiv.org/abs/2505.08150v1
#faultdetection #thermalimaging #autoencoder #powercircuits #deeplearning
Fault Detection Method for Power Conversion Circuits Using Thermal Image and Convolutional Autoencoder
A fault detection method for power conversion circuits using thermal images and a convolutional autoencoder is presented. The autoencoder is trained on thermal images captured from a commercial power module at randomly varied load currents, together with augmented images generated through image processing techniques such as resizing, rotation, perspective transformation, and brightness and contrast adjustment. Since the autoencoder is trained to output images identical to its input only for normal samples, it reconstructs images similar to normal ones even when the input images contain faults. A small heater was attached to the circuit board to simulate a fault on the power module, and thermal images were then captured from different angles and positions, as well as at various load currents, to test the trained autoencoder model. The area under the curve (AUC) was obtained to evaluate the proposed method. The results show the autoencoder model can detect anomalies with 100% accuracy under the given conditions.
arxiv.org
May 14, 2025 at 8:54 PM
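The evaluation logic — score each thermal image by reconstruction error and report the AUC separating normal from fault scores — can be sketched with a trivial stand-in "autoencoder". The real model is a trained convolutional network; this toy only shows the scoring and AUC mechanics on cleanly separated data.

```python
import numpy as np

def auc(scores_normal, scores_fault):
    # Mann-Whitney AUC: fraction of (fault, normal) pairs ranked correctly
    s_n = np.asarray(scores_normal)[:, None]
    s_f = np.asarray(scores_fault)[None, :]
    return float((s_f > s_n).mean() + 0.5 * (s_f == s_n).mean())

def recon_error(model, images):
    # per-image mean squared reconstruction error = anomaly score
    return ((images - model(images)) ** 2).mean(axis=(1, 2))

# toy "autoencoder" that always reconstructs the normal template
template = np.zeros((1, 8, 8))
model = lambda x: np.broadcast_to(template, x.shape)

rng = np.random.default_rng(1)
normal = rng.normal(0, 0.1, size=(10, 8, 8))        # near the template
fault = rng.normal(0, 0.1, size=(10, 8, 8)) + 1.0   # hot-spot offset

score = auc(recon_error(model, normal), recon_error(model, fault))
print(score)  # 1.0: every fault image scores above every normal one
```

AUC = 1.0 corresponds to the paper's "100% accuracy" claim: there exists a threshold on the reconstruction error that separates the two classes perfectly.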
🎥🤖✨ discover adaptive camera paths that smartly reveal hidden details for single-image 3d reconstruction, using video diffusion and multi-view synthesis for strikingly consistent results
https://arxiv.org/abs/2505.08239v1
#3dvision #diffusion #multiview #occlusion #novelviews
ACT-R: Adaptive Camera Trajectories for 3D Reconstruction from Single Image
We introduce adaptive view planning to multi-view synthesis, aiming to improve both occlusion revelation and 3D consistency for single-view 3D reconstruction. Instead of generating an unordered set of views independently or simultaneously, we generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Most importantly, our view sequence is not determined by a pre-determined camera setup. Instead, we compute an adaptive camera trajectory (ACT), specifically, an orbit of camera views, which maximizes the visibility of occluded regions of the 3D object to be reconstructed. Once the best orbit is found, we feed it to a video diffusion model to generate novel views around the orbit, which in turn, are passed to a multi-view 3D reconstruction model to obtain the final reconstruction. Our multi-view synthesis pipeline is quite efficient since it involves no run-time training/optimization, only forward inferences by applying the pre-trained models for occlusion analy
arxiv.org
May 14, 2025 at 8:53 PM
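The orbit-selection step can be caricatured as scoring candidate camera orbits by how much of the surface they expose and keeping the argmax. A normal-facing visibility test stands in for the paper's occlusion analysis, and the video-diffusion and reconstruction stages are omitted entirely; all numbers are illustrative.

```python
import numpy as np

def orbit_views(elevation_deg, n_views=8):
    # camera view directions on a circular orbit at a fixed elevation
    az = np.linspace(0, 2 * np.pi, n_views, endpoint=False)
    el = np.deg2rad(elevation_deg)
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.full_like(az, np.sin(el))], axis=1)

def visibility(normals, views):
    # a surface point counts as seen if at least one view faces it
    return (normals @ views.T > 0).any(axis=1).mean()

rng = np.random.default_rng(2)
normals = rng.normal(size=(200, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

candidates = [-60, -30, 0, 30, 60]   # candidate orbit elevations (degrees)
best = max(candidates, key=lambda e: visibility(normals, orbit_views(e)))
print(best)
```

The selected orbit would then be handed to the video diffusion model as a camera path, so that temporally adjacent generated views are also spatially adjacent.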
🎥🤖✨ reinforcement learning meets masked video modelling with a new sampler that smartly selects motion-focused tokens for efficient, high-performance video foundation model pre-training
https://arxiv.org/abs/2505.08561v1
Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection
Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage motion priors such as optical flow and semantic cues from externally pre-trained models. In this work, we introduce a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos. Additionally, we propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO). We show that our model allows for aggressi
arxiv.org
May 14, 2025 at 8:43 PM
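The selection target — keep motion-centric tokens visible and mask the rest — can be sketched with a simple heuristic. Here a motion-magnitude score stands in for the learned, PPO-trained sampler; the paper's TATS is a trained policy, not this hand-written rule.

```python
import numpy as np

def motion_scores(tokens):
    # tokens: (T, N, D) features per frame; score = total temporal variation
    return np.linalg.norm(np.diff(tokens, axis=0), axis=2).sum(axis=0)

def select_motion_centric(tokens, keep_ratio=0.25):
    # keep the most motion-heavy tokens visible; mask the rest for the MAE
    scores = motion_scores(tokens)
    k = max(1, int(keep_ratio * scores.size))
    visible = np.argsort(scores)[-k:]
    mask = np.ones(scores.size, dtype=bool)
    mask[visible] = False          # True = masked, reconstructed by the MAE
    return visible, mask

rng = np.random.default_rng(3)
tokens = np.repeat(rng.normal(size=(1, 12, 4)), 5, axis=0)  # 12 static tokens
tokens[:, :3] += rng.normal(size=(5, 3, 4))                 # 3 moving tokens

visible, mask = select_motion_centric(tokens, keep_ratio=0.25)
print(sorted(visible))  # [0, 1, 2]: exactly the moving tokens survive
```

Biasing the visible set toward moving tokens makes the reconstruction task harder and more informative, which is the rationale for motion-aware masking in MVM.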
🧠🩻 new ai-powered method blends deep learning with model-based optimisation to tackle tough non-convex problems for sharper, more stable medical imaging results
https://arxiv.org/abs/2505.08324v1
#medicalimaging #deeplearning #inverseproblems #ai #reconstruction
An incremental algorithm for non-convex AI-enhanced medical image processing
Solving non-convex regularized inverse problems is challenging due to their complex optimization landscapes and multiple local minima. However, these models remain widely studied as they often yield high-quality, task-oriented solutions, particularly in medical imaging, where the goal is to enhance clinically relevant features rather than merely minimizing global error. We propose incDG, a hybrid framework that integrates deep learning with incremental model-based optimization to efficiently approximate the $\ell_0$-optimal solution of imaging inverse problems. Built on the Deep Guess strategy, incDG exploits a deep neural network to generate effective initializations for a non-convex variational solver, which refines the reconstruction through regularized incremental iterations. This design combines the efficiency of Artificial Intelligence (AI) tools with the theoretical guarantees of model-based optimization, ensuring robustness and stability. We validate incDG on TpV-regularized op
arxiv.org
May 14, 2025 at 8:34 PM
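The two-stage recipe — a learned initializer followed by regularized incremental iterations — can be sketched with ISTA on a toy sparse inverse problem. A pseudo-inverse stands in for the deep "Deep Guess" initializer, and plain ℓ1 shrinkage replaces the paper's TpV regularizer; this is the structure of the approach, not incDG itself.

```python
import numpy as np

def ista(A, b, x0, lam=0.01, iters=200):
    # proximal-gradient refinement of a warm-started estimate
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(iters):
        x = x - step * A.T @ (A @ x - b)                          # data term
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # shrinkage
    return x

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.0, -0.5, 2.0]
b = A @ x_true + 0.01 * rng.normal(size=30)

x_init = np.linalg.pinv(A) @ b      # stand-in for the deep initializer
x_hat = ista(A, b, x_init)
print(np.abs(x_hat - x_true).max() < 0.2)
```

The division of labor mirrors the paper's argument: the initializer supplies a good basin of attraction in a non-convex landscape, and the model-based iterations contribute the stability guarantees.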
🩻💡 jsover brings one-step multi-material ct reconstruction by jointly estimating energy spectra and tissue composition, boosting accuracy and efficiency for single-energy scanners
https://arxiv.org/abs/2505.08123v1
#ct #reconstruction #mmd #deeplearning #medicalimaging
JSover: Joint Spectrum Estimation and Multi-Material Decomposition from Single-Energy CT Projections
Multi-material decomposition (MMD) enables quantitative reconstruction of tissue compositions in the human body, supporting a wide range of clinical applications. However, traditional MMD typically requires spectral CT scanners and pre-measured X-ray energy spectra, significantly limiting clinical applicability. To address this, various methods have been developed to perform MMD using conventional (i.e., single-energy, SE) CT systems, commonly referred to as SEMMD. Despite promising progress, most SEMMD methods follow a two-step image decomposition pipeline, which first reconstructs monochromatic CT images using algorithms such as FBP, and then performs decomposition on these images. The initial reconstruction step, however, neglects the energy-dependent attenuation of human tissues, introducing severe nonlinear beam hardening artifacts and noise into the subsequent decomposition. This paper proposes JSover, a fundamentally reformulated one-step SEMMD framework that jointly reconstructs mu
arxiv.org
May 14, 2025 at 8:34 PM
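The physics that two-step SEMMD pipelines ignore — energy-dependent attenuation under a polychromatic spectrum — is easy to write down. A toy forward model (spectrum and attenuation values invented for illustration) shows why measured log-attenuation grows sublinearly with path length, i.e. beam hardening:

```python
import numpy as np

# toy polychromatic forward model (all numbers invented for illustration)
spectrum = np.array([0.2, 0.4, 0.3, 0.1])   # normalized source spectrum S(E)
mu = np.array([0.40, 0.25, 0.18, 0.15])     # mu(E) of one material, 1/cm

def log_attenuation(length_cm):
    # detector measures sum_E S(E) * exp(-mu(E) * L); projection is -log of it
    return -np.log(float(spectrum @ np.exp(-mu * length_cm)))

p1, p2 = log_attenuation(1.0), log_attenuation(2.0)
print(p2 < 2 * p1)  # True: sublinear growth, i.e. beam hardening
```

Because low-energy photons are absorbed first, the surviving beam "hardens" with depth; reconstructing as if attenuation were a single monochromatic value bakes this nonlinearity into artifacts, which is the motivation for jointly estimating the spectrum in one step.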
switch-nerf++ 🚦📦
heterogeneous mixture-of-hash-experts scales nerf to km-sized scenes: learned decomposition, 8× faster train + 16× faster render vs switch-nerf
arxiv.org/abs/2505.02005
#nerf #3d #reconstruction #arxiv
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields
Recent NeRF methods on large-scale scenes have underlined the importance of scene decomposition for scalable NeRFs. Although achieving reasonable scalability, there are several critical problems remai...
arxiv.org
May 14, 2025 at 1:43 AM
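A mixture of hash experts can be caricatured as routing each 3D point to one of several hash-grid tables through a gate. This is a heavily simplified, single-level sketch: the gate here is nearest-centroid while Switch-NeRF++ learns the scene decomposition, and real hash grids interpolate features across multiple resolutions.

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_lookup(xyz, table, resolution=64):
    # single-level hash-grid lookup (no interpolation, for brevity):
    # floor to a voxel index, hash it, fetch the stored feature
    idx = np.floor(xyz * resolution).astype(np.uint64)
    h = np.bitwise_xor.reduce(idx * PRIMES, axis=1) % np.uint64(len(table))
    return table[h]

rng = np.random.default_rng(5)
n_experts = 4
tables = rng.normal(size=(n_experts, 2**12, 2))   # one hash table per expert
centroids = rng.random((n_experts, 3))            # toy scene decomposition

pts = rng.random((8, 3))                          # points in the unit cube
# gate: route each point to the expert whose centroid is nearest
gate = np.linalg.norm(pts[:, None] - centroids[None], axis=2).argmin(axis=1)
feats = np.stack([hash_lookup(p[None], tables[g])[0]
                  for p, g in zip(pts, gate)])
print(feats.shape)  # (8, 2)
```

The payoff of the mixture design is that each expert's hash table only has to cover its own region, so capacity scales with the scene instead of colliding across kilometers of geometry.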