

Long-LRM
Overview
Long-LRM is a 3D Gaussian reconstruction model capable of recreating large scenes from a sequence of input images. It processes 32 images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU. The model combines recent Mamba2 blocks with traditional transformer blocks, and uses token merging and Gaussian trimming to improve efficiency without compromising quality. Unlike previous feedforward models, which can only reconstruct small portions of a scene, Long-LRM reconstructs the entire scene in a single feedforward pass. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, Long-LRM achieves quality comparable to optimization-based methods while being two orders of magnitude more efficient.
Target Users
The target audience includes 3D modelers, game developers, virtual reality content creators, and any professionals needing rapid and efficient 3D scene reconstruction. Long-LRM's high efficiency and quality reconstruction capabilities allow these users to create realistic 3D scenes in a short amount of time, accelerating product development processes and enhancing work efficiency.
Use Cases
Use Long-LRM to quickly reconstruct a 3D city model from a series of street scene images.
In game development, leverage Long-LRM to recreate game scenes from real-life photographs to enhance realism.
Virtual reality content creators utilize Long-LRM to reconstruct high-precision virtual environments from images taken from multiple angles.
Features
Processes up to 32 high-resolution input images for rapid 3D scene reconstruction
Utilizes a hybrid architecture of Mamba2 blocks and transformer blocks to enhance token processing capabilities
Balances reconstruction quality and efficiency through token merging and Gaussian trimming steps
Reconstructs the entire scene in a single feedforward step without multiple iterations
Exhibits performance comparable to optimization methods on large-scale scene datasets
Achieves two orders of magnitude greater efficiency, significantly reducing computational resource consumption
Supports extensive view coverage and high-quality photorealistic reconstructions
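The token merging and Gaussian trimming steps named above can be illustrated with a minimal NumPy sketch. This is not the actual Long-LRM implementation; the function names, merge factor, and opacity threshold are illustrative assumptions, showing only the general idea of shrinking the token sequence and discarding low-opacity Gaussians.

```python
# Illustrative sketch of token merging and Gaussian trimming.
# The real Long-LRM implementation and parameter names may differ.
import numpy as np

def merge_tokens(tokens, factor=2):
    """Merge groups of `factor` consecutive tokens by averaging,
    shrinking the sequence the Mamba2/transformer blocks must process."""
    n, d = tokens.shape
    n_trim = (n // factor) * factor          # drop any ragged tail
    grouped = tokens[:n_trim].reshape(n_trim // factor, factor, d)
    return grouped.mean(axis=1)

def trim_gaussians(opacities, threshold=0.01):
    """Keep only the Gaussians whose opacity exceeds a threshold,
    reducing the Gaussian count with little visible quality loss."""
    return np.nonzero(opacities > threshold)[0]

tokens = np.random.rand(100, 64).astype(np.float32)
merged = merge_tokens(tokens, factor=2)
print(merged.shape)   # (50, 64)

opacities = np.array([0.5, 0.001, 0.2, 0.005])
kept = trim_gaussians(opacities, threshold=0.01)
print(kept)           # [0 2]
```

Halving the token count roughly quarters the cost of the attention layers, which is why merging is applied before the heaviest blocks.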
How to Use
1. Prepare a series of input images of the scene to be reconstructed, e.g., 32 views at 960x540 resolution (the resolution the model is designed for).
2. Ensure you have compatible GPU hardware, such as an A100 80G GPU.
3. Load the input images and the Long-LRM model into the computing environment.
4. Configure the model parameters, including token merging strategy and Gaussian trimming threshold.
5. Run the Long-LRM model and wait for it to process the input images and generate 3D reconstruction results.
6. Review and evaluate the reconstructed 3D scenes, and perform post-processing and optimization as necessary.
7. Apply the reconstructed 3D scenes to the desired domains, such as 3D printing, virtual reality, or game development.
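The steps above can be sketched as a short pipeline. The `LongLRM` class, its constructor arguments, and the `reconstruct` method here are hypothetical stand-ins for the real API, and the images are random placeholders; the sketch only makes the shape of the workflow concrete.

```python
# Hypothetical end-to-end sketch of the workflow above.
# `LongLRM`, its parameters, and `reconstruct` are illustrative
# stand-ins, NOT the real Long-LRM API.
import numpy as np

class LongLRM:
    """Stand-in model: records the configuration from step 4 and
    returns dummy Gaussians so the pipeline shape is visible."""
    def __init__(self, merge_factor=2, opacity_threshold=0.01):
        self.merge_factor = merge_factor
        self.opacity_threshold = opacity_threshold

    def reconstruct(self, images):
        # Expect a stack of views: (num_views, H, W, 3).
        assert images.ndim == 4 and images.shape[-1] == 3
        # A real run would predict per-Gaussian means, covariances,
        # colors, and opacities; random placeholders stand in here.
        num_gaussians = 1000
        return {
            "means": np.random.rand(num_gaussians, 3),
            "opacities": np.random.rand(num_gaussians),
        }

# Step 1: 32 input views at 960x540 (random placeholders here).
images = np.random.rand(32, 540, 960, 3).astype(np.float32)
# Steps 4-5: configure the model and run one feedforward pass.
model = LongLRM(merge_factor=2, opacity_threshold=0.01)
scene = model.reconstruct(images)
print(scene["means"].shape)   # (1000, 3)
```

Because reconstruction is a single feedforward pass, there is no per-scene optimization loop to configure; the only knobs are the preprocessing and trimming parameters set before the run.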