Weijie Wang | 王伟杰

I am a first-year Ph.D. student in Computer Science at the College of Computer Science and Technology, Zhejiang University, advised by Prof. Bohan Zhuang. My research interests lie in efficient large 3D models and world models. I am currently a research intern at ByteDance Seed.

Previously, I was a research intern at Microsoft Research Asia. I obtained my B.E. degree from the College of Computer Science and Technology and Chu Kochen Honors College (ACEE) at Zhejiang University in 2025. I was fortunate to work closely with Dr. Donny Y. Chen, Prof. Jianfei Cai, Prof. Chunhua Shen, Prof. Andreas Geiger, Prof. Marc Pollefeys, Prof. Jia-Wang Bian, Prof. Chuanxia Zheng and Dr. Zheng Zhu.

If you are interested in working with us as an intern at Zhejiang University, please feel free to email me and CC Prof. Bohan Zhuang.

Email / Google Scholar / GitHub / X / LinkedIn / CV

News

[2026/06] 🎉 Two papers (VolSplat, DiGSeg) have been accepted to ECCV 2026!
[2026/05] 🎉 Our paper DriveGen3D was selected for an Oral Presentation (Top 3%) at ICME 2026!
[2026/05] 🎉 Two papers (World-R1, Flash-GRPO) have been accepted to ICML 2026!
[2026/04] 🔥 World-R1 was selected as Hugging Face #1 Paper of the Day!
[2026/04] 🎉 Our paper CoV has been accepted to ACL 2026 Findings!
[2026/03] 🎉 Our paper DriveGen3D has been accepted to ICME 2026!
[2026/01] 🎉 Two papers (MV-RoboBench, ReVisual-R1) have been accepted to ICLR 2026!
[2025/11] 🎉 Our paper PM-Loss has been accepted to 3DV 2026!
[2025/10] 🏅 I received the NeurIPS Scholar Award.
[2025/09] 🎉 Our paper ZPressor has been accepted to NeurIPS 2025!
[2025/06] 🎉 Our paper WonderTurbo has been accepted to ICCV 2025!
[2025/06] 🏅 I received the Outstanding Graduates of Zhejiang University and Outstanding Thesis of Zhejiang University awards.
[2025/01] 🎉 Our paper TransDiff has been accepted to ICRA 2025!
[2024/07] 🎉 Our paper Street Gaussians has been accepted to ECCV 2024!
[2023/11] 🏅 I received the Zhejiang Government Scholarship.
[2023/01] 🏅 I was named a Meritorious Winner of The Interdisciplinary Contest in Modeling.

Selected Publications

* Equal contribution / † Corresponding authors

	Latent Spatial Memory for Video World Models Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, Yefei He, Zicheng Duan, Donny Y. Chen, Yuqing Yang, Bohan Zhuang† Paper / Project / Code Latent Spatial Memory stores persistent 3D scene content directly as latent tokens for efficient, spatially consistent video world models.
	TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Weijie Wang, Zimu Li, Jinchuan Shi, Zeyu Zhang, Botao Ye, Marc Pollefeys, Donny Y. Chen, Bohan Zhuang† Paper / Project / Code / Models TriSplat predicts simulation-ready triangle primitives for feed-forward sparse-view 3D scene reconstruction.
	World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang†, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang†, Bohan Zhuang† Paper / Project / Code / Hugging Face #1 Paper of the Day ICML 2026 World-R1 aligns text-to-video generation with 3D constraints through reinforcement learning, improving geometric consistency while preserving visual quality and motion diversity.
	Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective Weijie Wang, Qihang Cao, Sensen Gao*, Donny Y. Chen, Haofei Xu, Wenjing Bian, Songyou Peng, Tat-Jen Cham, Chuanxia Zheng, Andreas Geiger, Jianfei Cai, Jia-Wang Bian†, Bohan Zhuang† Paper / Project / GitHub We present a problem-driven survey of feed-forward 3D scene modeling, covering representative methods, datasets, and downstream applications.
	CoV: Chain-of-View Prompting for Spatial Reasoning Haoyu Zhao, Akide Liu, Zeyu Zhang, Weijie Wang, Feng Chen, Ruihan Zhu, Gholamreza Haffari, Bohan Zhuang† Paper / Project / Code ACL 2026 Findings We propose Chain-of-View (CoV) prompting, a training-free test-time reasoning framework that transforms VLMs into active viewpoint reasoners for spatial reasoning in 3D environments.
	DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu†, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Yicheng Xiao, Donny Y. Chen, Jiwen Lu Paper / Project ICME 2026 Oral (Top 3%) We present DriveGen3D, a novel framework for generating high-quality and highly controllable dynamic 3D driving scenes that addresses critical limitations in existing methodologies.
	VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Jia-Wang Bian, Zheng Zhu†, Donny Y. Chen, Bohan Zhuang† Paper / Project / Code / Models ECCV 2026 VolSplat improves multi-view consistency and geometric accuracy for feed-forward 3DGS with voxel-aligned prediction.
	Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jia-Wang Bian, Bohan Zhuang† Paper / Project / Code / Models 3DV 2026 We introduce PM-Loss, a novel regularization loss based on a learned point map for feed-forward 3DGS, leading to smoother 3D geometry and better rendering.
	ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS Weijie Wang†, Donny Y. Chen†, Zeyu Zhang, Duochao Shi, Akide Liu, Bohan Zhuang Paper / Project / Code / Models NeurIPS 2025 ZPressor is an architecture-agnostic module that compresses multi-view inputs for scalable feed-forward 3DGS.
	WonderTurbo: Generating Interactive 3D World in 0.72 Seconds Chaojun Ni, Xiaofeng Wang, Zheng Zhu†, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei† Paper / Project / Code ICCV 2025 We introduce WonderTurbo, the first real-time interactive 3D scene generation framework capable of generating novel perspectives of 3D scenes within 0.72 seconds.

Other Publications

	Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Xiaoxuan He, Siming Fu, Zeyue Xue, Weijie Wang, Ruizhe He, Yuming Li, Dacheng Yin, Shuai Dong, Haoyang Huang, Hongfa Wang, Nan Duan, Bohan Zhuang† Paper / Project / Code ICML 2026 We introduce Flash-GRPO, a one-step policy optimization framework that improves video diffusion alignment efficiency with iso-temporal grouping and temporal gradient rectification.
	Diffusion Model as a Generalist Segmentation Learner Haoxiao Wang, Antao Xiang, Haiyang Sun, Peilin Sun, Changhao Pan, Yifu Chen, Minjie Hong, Weijie Wang, Shuang Chen, Yue Chen, Zhou Zhao Paper / Code ECCV 2026 We introduce DiGSeg, a diffusion-based generalist segmentation framework that repurposes pretrained diffusion models for text-conditioned semantic and open-vocabulary segmentation.
	Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Zhiyuan Feng, Zhaolu Kang, Qijie Wang, Zhiying Du, Jiongrui Yan, Shubin Shi, Chengbo Yuan, Huizhi Liang, Yu Deng, Qixiu Li, Rushuai Yang, Arctanx An, Leqi Zheng, Weijie Wang, Shawn Chen, Sicheng Xu, Yaobo Liang, Jiaolong Yang†, Baining Guo Paper / Code ICLR 2026 We introduce MV-RoboBench, a benchmark for evaluating multi-view spatial reasoning in robotic manipulation, revealing large gaps between state-of-the-art VLMs and human performance.
	Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen, Yue Guo, Zhaochen Su, Yafu Li, Yulun Wu, Jiacheng Chen, Jiayu Chen, Weijie Wang, Xiaoye Qu†, Yu Cheng† Paper / Code ICLR 2026 We introduce ReVisual-R1, achieving a new state-of-the-art among open-source 7B MLLMs on challenging benchmarks.
	TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image Haoxiao Wang, Kaichen Zhou, Binrui Gu, Zhiyuan Feng, Weijie Wang, Peilin Sun, Yicheng Xiao, Jianhua Zhang, Hao Dong† Paper / Project / Code ICRA 2025 We propose TransDiff, a single-view RGB-D-based depth completion framework that leverages Denoising Diffusion Probabilistic Models (DDPM) to achieve material-agnostic object grasping in desktop settings.
	Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng† Paper / Project / Code / Data ECCV 2024 This paper aims to tackle the problem of modeling dynamic urban street scenes from monocular videos. We introduce Street Gaussians, a new explicit scene representation that tackles some major limitations.

Experience

	ByteDance Seed 2026.01 - Present Research Intern at ByteDance Seed - Seed3D Advisors: Jianfeng Zhang and Qianyi Wu
	Zhejiang University 2025.09 - Present Ph.D. Student in Computer Science and Technology ZIP Lab, State Key Lab of CAD&CG, College of Computer Science and Technology Advisor: Prof. Bohan Zhuang
	Microsoft Research Asia 2025.07 - 2026.01 Research Intern at Microsoft Research Asia - Shanghai Lab Advisors: Yuqing Yang, Yifan Yang and Zhiyuan He
	Zhejiang University 2021.09 - 2025.06 Undergraduate B.E. with Honors in Software Engineering from College of Computer Science and Technology Advanced Class of Engineering Education, Chu Kochen Honors College Research Intern at ZJU3DV, advised by Prof. Xiaowei Zhou and Prof. Sida Peng Research Intern at HICAI-ZJU, advised by Prof. Keyan Ding

Selected Honors and Awards

2025.10 NeurIPS Scholar Award

2025.06 Outstanding Graduates of Zhejiang University

2025.06 Outstanding Thesis of Zhejiang University

2023.11 Zhejiang Government Scholarship

2023.01 Meritorious Winner of The Interdisciplinary Contest in Modeling

Talks

[2026/06/17] Invited talk on "Towards Intelligent Interactive 3D-aware Video World Model" at NTU Physical Vision Group (PVG). [Slides]

[2025/11/07] Invited talk on "ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS", hosted by EasyConferee at the AI Alumni Center of Caohejing Hi-Tech Park. [Slides]

[2025/06/16] Invited talk on "ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS", hosted by 3DCVer. [Slides / Video]

Academic Service

Journal Reviewer: IEEE Transactions on Visualization and Computer Graphics (TVCG), The Visual Computer (TVC), IEEE Robotics and Automation Letters (RA-L)

Conference Reviewer: NeurIPS 2025, 3DV 2026, ICRA 2026, ICME 2026, ICML 2026, ECCV 2026, SIGGRAPH Asia 2026

Teaching Assistant

[Spring 2025] Database System, with Prof. Xiaoye Miao

[Spring 2024] Database System, with Prof. Bo Zhou

This template is a modification of Jon Barron's website.