Tianshuo Yang | 杨天硕

Ph.D. Student

Department of Computer Science
The University of Hong Kong

Email: zjuyangts@gmail.com

[Google Scholar] [GitHub] [Twitter]

Short Bio

I am a second-year Ph.D. student at MMLab, The University of Hong Kong, where I am fortunate to be advised by Prof. Ping Luo. Before that, I received my B.Eng. degree and an honors degree (Chu Kochen Honors College) from Zhejiang University. I also spent a wonderful time as a research intern at Shanghai AI Laboratory, mentored by Prof. Yao Mu and Dr. Wenqi Shao.

I have experience in spatial reasoning for VLMs; 2D/3D generation, editing, and reconstruction; and manipulation VLA. Currently, I'm interested in embodied world models, especially action-conditioned world generation (both video and 3D). I am looking for internship opportunities, so feel free to reach out.

Design the World, Play with the World, then Learn the World

Selected Publications

(* denotes equal contribution, and # denotes corresponding author)

Design the World

2D/3D Generation & Editing, 3D/4D Reconstruction.

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Lirui Zhao*, Tianshuo Yang*, Wenqi Shao*, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang#, Rongrong Ji#
arXiv, 2024

Diffree enables text-only object addition by predicting where to place a new object and inpainting it with context-consistent appearance.

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Lumina Team: Tianshuo Yang (Core Contributor for 3D Generation)
ICLR Spotlight, 2025

Lumina-T2X introduces a unified flow-based diffusion transformer framework for text-conditioned generation across images, videos, 3D views, and audio at flexible resolutions and durations.

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
arXiv, 2026

AnyRecon enables high-quality, large-scale 3D reconstruction from sparse inputs.

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
Yutian Chen, Shi Guo, Tianshuo Yang, Lihe Ding, Xiuyuan Yu, Jinwei Gu, Tianfan Xue
ACM SIGGRAPH Asia, 2025

Our method reconstructs complex, high-speed 4D motion with high quality.

Play with the World

Embodied reasoning and manipulation VLA.

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
Tianshuo Yang*, Guanyu Chen*, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu#, Ping Luo#
arXiv, 2026

A visual-grounded-centric hierarchical framework that explicitly decouples high-level semantic planning from low-level motor control.

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Tian Nian, Liuao Pei, Shunbo Zhou, Xiaokang Yang, Jiangmiao Pang, Yao Mu#, Ping Luo#
arXiv, 2026

Learn the World

Spatial Intelligence.

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng*, Jin Wang*, Chuanhao Li*, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang#, Wenqi Shao#
ICLR, 2025

Honors

Academic Service