¹Zhejiang University  ²GigaAI  ³University of Electronic Science and Technology of China
⁴The Chinese University of Hong Kong  ⁵Tsinghua University  ⁶Monash University
* Equal contribution † Corresponding authors
VolSplat improves multi-view consistency and geometric accuracy for feed-forward 3DGS with voxel-aligned prediction.
Pixel-aligned feed-forward 3DGS methods suffer from two primary limitations: (1) 2D feature matching struggles to resolve the multi-view alignment problem, and (2) the Gaussian density is tied to the input pixels and cannot be adaptively controlled according to scene complexity. We propose VolSplat, a framework that directly regresses Gaussians from 3D features via a voxel-aligned prediction strategy. This approach enables adaptive control of Gaussian density according to scene complexity and alleviates the multi-view alignment problem.
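To make the density argument concrete, here is a minimal sketch (not the authors' code; the shapes and voxel size are illustrative assumptions) contrasting the Gaussian count of a pixel-aligned predictor, which is fixed at one Gaussian per pixel per view, with a voxel-aligned one, which places one Gaussian per occupied voxel:

```python
import torch

V, H, W = 4, 256, 256          # input views and image resolution (assumed)
voxel_size = 0.05              # hypothetical grid resolution in world units

# Pixel-aligned: one Gaussian per pixel per view, fixed by image resolution.
num_pixel_aligned = V * H * W

# Voxel-aligned: points unprojected from all views land in one shared grid;
# one Gaussian per *occupied* voxel, so the count follows scene geometry and
# redundant predictions of the same surface from different views merge.
points = torch.rand(V * H * W, 3)              # stand-in for unprojected pixels
voxel_ids = torch.floor(points / voxel_size).long()
num_voxel_aligned = torch.unique(voxel_ids, dim=0).shape[0]

print(num_pixel_aligned, num_voxel_aligned)
```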
Overview of VolSplat. Given multi-view images as input, we first extract 2D features for each image with a Transformer-based network and construct per-view cost volumes via plane sweeping. A depth prediction module then estimates a depth map for each view, which is used to unproject the 2D features into 3D space and form a voxel feature grid. A sparse 3D decoder subsequently refines these features in 3D space and predicts the parameters of one 3D Gaussian for each occupied voxel. Finally, novel views are rendered from the predicted 3D Gaussians.
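The unproject-then-voxelize steps of this pipeline can be sketched in code. The following is a self-contained illustration under stated assumptions, not the released implementation: `unproject` and `voxel_pool` are hypothetical helpers, the Transformer backbone, cost-volume/depth network, and sparse 3D decoder are abstracted away, and pinhole intrinsics `K` with camera-to-world extrinsics `c2w` are assumed.

```python
import torch

def unproject(depth, feat, K, c2w):
    """Lift per-pixel 2D features into world space using a predicted depth map."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()   # (H, W, 3)
    cam = (pix @ torch.linalg.inv(K).T) * depth[..., None]          # camera frame
    world = cam @ c2w[:3, :3].T + c2w[:3, 3]                        # world frame
    return world.reshape(-1, 3), feat.reshape(-1, feat.shape[-1])

def voxel_pool(points, feats, voxel_size=0.05):
    """Average the features of all points that fall into the same voxel."""
    ids = torch.floor(points / voxel_size).long()
    uniq, inv = torch.unique(ids, dim=0, return_inverse=True)
    pooled = torch.zeros(uniq.shape[0], feats.shape[-1]).index_add_(0, inv, feats)
    counts = torch.zeros(uniq.shape[0]).index_add_(
        0, inv, torch.ones_like(inv, dtype=torch.float))
    return uniq, pooled / counts[:, None]

# Toy usage: two views' features are unprojected with their depth maps and
# pooled into one shared voxel grid; a sparse 3D decoder would then map each
# occupied voxel's feature to the parameters of one 3D Gaussian.
H, W, C = 64, 64, 32
K = torch.tensor([[80., 0., 32.], [0., 80., 32.], [0., 0., 1.]])
all_pts, all_feats = [], []
for _ in range(2):                                  # two input views
    depth = torch.rand(H, W) + 1.0                  # stand-in predicted depth
    feat = torch.rand(H, W, C)                      # stand-in 2D features
    c2w = torch.eye(4)                              # identity pose for brevity
    p, f = unproject(depth, feat, K, c2w)
    all_pts.append(p); all_feats.append(f)
occupied, voxel_feats = voxel_pool(torch.cat(all_pts), torch.cat(all_feats))
print(occupied.shape, voxel_feats.shape)            # (N_voxels, 3), (N_voxels, C)
```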
*[Qualitative comparison gallery: GNN, MVSplat, TransSplat, DepthSplat, VolSplat (Ours), Ground Truth]*

*[Interactive comparison: DepthSplat vs. VolSplat (Ours)]*

*[Qualitative comparison gallery: GNN, MVSplat, TransSplat, DepthSplat, VolSplat (Ours), Ground Truth]*
@article{wang2025volsplat,
  title={VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction},
  author={Wang, Weijie and Chen, Yeqing and Zhang, Zeyu and Liu, Hengyu and Wang, Haoxiao and Feng, Zhiyuan and Qin, Wenkang and Zhu, Zheng and Chen, Donny Y. and Zhuang, Bohan},
  journal={arXiv preprint arXiv:2509.19297},
  year={2025}
}