1 Department of Automation, Tsinghua University; 2 NNKosmos Technology
Fig 1. Given four sparse, static RGB camera views of a dynamic scene (a), our proposed Tensor4D decomposition enables multi-view reconstruction with fine-grained geometry, even on human fingers (b), and temporally consistent novel view synthesis on a 3D holographic display (c, d, e). The proposed method enables a low-cost, portable, and highly immersive telepresence experience.
We present Tensor4D, an efficient yet effective approach to dynamic scene modeling. The key to our solution is an efficient 4D tensor decomposition that allows the dynamic scene to be represented directly as a 4D spatio-temporal tensor. To tackle the accompanying memory cost, we decompose the 4D tensor hierarchically, projecting it first into three time-aware volumes and then into nine compact feature planes. In this way, spatial information over time is captured in a compact and memory-efficient manner. When applying Tensor4D to dynamic scene reconstruction and rendering, we further factorize the 4D fields at different scales so that structural motions and fine dynamic details can be learned from coarse to fine. We validate the method on both synthetic and real-world scenes. Extensive experiments show that it achieves high-quality dynamic reconstruction and rendering from sparse-view camera rigs or even from a monocular camera.
Fig 2. Illustration of our hierarchical tri-projection decomposition method. For a neural 4D field f(x, y, z, t), we first decompose the 3D spatial part of the 4D spatio-temporal tensor into three time-aware volumes, which are then further projected onto nine 2D planes.
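The hierarchical tri-projection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: it assumes the three time-aware volumes span (x, y, t), (x, z, t), and (y, z, t), that each volume is projected onto its three axis-pair planes, that plane features are fetched by bilinear interpolation, and that the nine features are concatenated. The helper names `bilinear`, `make_planes`, and `query_feature` are hypothetical.

```python
import itertools
import numpy as np

# Assumed axis triples for the three time-aware volumes.
VOLUMES = (("x", "y", "t"), ("x", "z", "t"), ("y", "z", "t"))

def bilinear(plane, u, v):
    """Bilinearly sample a (R, R, C) feature plane at u, v in [0, 1]."""
    res = plane.shape[0]
    # Scale to grid coordinates; clamp so the upper neighbor stays in range.
    gu = float(np.clip(u, 0.0, 1.0)) * (res - 1)
    gv = float(np.clip(v, 0.0, 1.0)) * (res - 1)
    u0 = min(int(np.floor(gu)), res - 2)
    v0 = min(int(np.floor(gv)), res - 2)
    du, dv = gu - u0, gv - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0]
            + du * (1 - dv) * plane[u0 + 1, v0]
            + (1 - du) * dv * plane[u0, v0 + 1]
            + du * dv * plane[u0 + 1, v0 + 1])

def make_planes(res, channels, seed=0):
    """Create nine (res, res, channels) planes: three axis pairs per volume."""
    rng = np.random.default_rng(seed)
    planes = {}
    for vol_idx, axes in enumerate(VOLUMES):
        for pair in itertools.combinations(axes, 2):
            planes[(vol_idx, pair)] = rng.normal(size=(res, res, channels))
    return planes

def query_feature(planes, x, y, z, t):
    """Concatenate the nine plane features for one normalized 4D point."""
    coords = {"x": x, "y": y, "z": z, "t": t}
    feats = [bilinear(plane, coords[a], coords[b])
             for (_, (a, b)), plane in planes.items()]
    return np.concatenate(feats)

res, channels = 16, 4
planes = make_planes(res, channels)
feature = query_feature(planes, 0.3, 0.5, 0.7, 0.1)

# Storage comparison: a dense 4D grid versus nine 2D planes.
dense_entries = res ** 4 * channels
plane_entries = len(planes) * res ** 2 * channels
print(feature.shape, dense_entries, plane_entries)
```

In a full model the concatenated feature would be passed to a small MLP that predicts density and color (or SDF and flow); that decoder is omitted here. The storage comparison shows why the decomposition helps: the dense grid grows with the fourth power of the resolution, while the nine planes grow only quadratically.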
Fig 3. The framework of Tensor4D for multi-view and monocular reconstruction. (a) Tensor4D for multi-view reconstruction. The 4D NeRF-T fields are factorized separately by the low-resolution (LR) and high-resolution (HR) feature planes. (b) Tensor4D for monocular reconstruction. The 4D flow fields are factorized by the LR feature planes for better disentanglement of shape and motion. The 3D canonical representation is factorized by three LR and HR feature planes.
Ruizhi Shao, Zerong Zheng, Hanzhang Tu, Boning Liu, Hongwen Zhang, Yebin Liu. "Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering". CVPR 2023
@inproceedings{shao2023tensor4d,
title = {Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering},
author = {Shao, Ruizhi and Zheng, Zerong and Tu, Hanzhang and Liu, Boning and Zhang, Hongwen and Liu, Yebin},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year = {2023}
}