PRISMA: Performs a variety of inferences from images or videos

AI Image Generation, AI Video Editing, #Deep Learning, #Computational Photography, #3D Reconstruction, Standard Picks, Open Source

Overview:

PRISMA is a computational photography pipeline that performs a variety of inferences on any image or video. Just as a prism refracts light into its component wavelengths, the pipeline expands an image into bands of data that can be used for 3D reconstruction or real-time post-processing. It combines several algorithms and open-source pretrained models, including monocular depth estimation (MiDaS v3.1, ZoeDepth, Marigold, PatchFusion), optical flow (RAFT), segmentation masks (mmdet), and camera pose estimation (COLMAP).

The results are stored in a folder with the same name as the input file, with each band saved as a separate .png or .mp4 file. For videos, the final step attempts a sparse reconstruction that can be used to train NeRFs (for example with NVIDIA's Instant-NGP) or Gaussian Splatting models. By default, the inferred depth is exported as a heatmap that can be decoded in real time with LYGIA's heatmap GLSL/HLSL sampler, and the optical flow is encoded as hue (angle) and saturation, which can likewise be decoded in real time with LYGIA's optical flow GLSL/HLSL sampler.
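The hue and saturation encoding of optical flow can be illustrated with a minimal per-pixel sketch. It is not PRISMA's or LYGIA's actual code: the function names `encode_flow` and `decode_flow`, the `max_magnitude` normalization constant, the use of saturation for the flow magnitude, and the zero-angle convention are all assumptions made for illustration, and the real pipeline may use a different mapping.

```python
import colorsys
import math


def encode_flow(dx, dy, max_magnitude=10.0):
    """Map one flow vector to an (r, g, b) triple with components in [0, 1].

    Assumed convention: hue carries the direction, saturation carries the
    magnitude clamped to max_magnitude, and value is fixed at 1.
    """
    angle = math.atan2(dy, dx)                      # direction in (-pi, pi]
    hue = (angle / (2.0 * math.pi)) % 1.0           # wrap the angle into [0, 1]
    saturation = min(math.hypot(dx, dy) / max_magnitude, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


def decode_flow(r, g, b, max_magnitude=10.0):
    """Invert encode_flow: recover (dx, dy) from an RGB triple."""
    hue, saturation, _ = colorsys.rgb_to_hsv(r, g, b)
    angle = hue * 2.0 * math.pi
    magnitude = saturation * max_magnitude
    return magnitude * math.cos(angle), magnitude * math.sin(angle)
```

For example, `decode_flow(*encode_flow(3.0, -4.0))` recovers the original vector up to floating-point error. Vectors longer than `max_magnitude` are clamped, so the decoder needs the same constant as the encoder, and quantizing the colors to 8 bits per channel would add further rounding error.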

Target Users:

3D reconstruction

Image/video post-processing

Generating NeRF training data

Total Visits: 474.6M

Top Region: US (19.34%)

Website Views: 59.1K

Use Cases

Extracting multiple bands of information from images for analysis

Capturing depth/optical flow information from videos to create 3D effects

Serving as a data source for training NeRF networks
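For any of these use cases, the first step is usually to locate the bands the pipeline wrote. The sketch below assumes the output folder sits next to the input file and is named after it without the extension (one reading of "same name as the input file"). The helper name `list_bands` is hypothetical and is not part of PRISMA.

```python
from pathlib import Path


def list_bands(input_path):
    """Return {band_name: path} for the .png/.mp4 files in the output folder.

    Assumption: for an input such as 'clip.mp4', the results live in a
    sibling folder called 'clip'. Other file types are ignored.
    """
    out_dir = Path(input_path).with_suffix("")      # 'clip.mp4' -> 'clip'
    if not out_dir.is_dir():
        return {}
    return {
        p.stem: p
        for p in sorted(out_dir.iterdir())
        if p.suffix.lower() in {".png", ".mp4"}
    }
```

Calling `list_bands("clip.mp4")` returns an empty dictionary if the folder does not exist, so it can double as a check that the pipeline has been run.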

Features

Monocular depth inference

Optical flow estimation