Tianshuo Yang | 杨天硕

Ph.D. Student

Department of Computer Science
The University of Hong Kong

Email: zjuyangts@gmail.com

[Google Scholar] [GitHub] [Twitter]

Short Bio

I am a second-year Ph.D. student at MMLab, The University of Hong Kong, where I am fortunate to be advised by Prof. Ping Luo. Before that, I received my B.Eng. degree and Honors degree (Chu Kochen Honors College) from Zhejiang University. I also spent a wonderful time as a research intern at Shanghai AI Laboratory, mentored by Prof. Yao Mu and Dr. Wenqi Shao.

I have experience in spatial reasoning for VLMs; 2D/3D generation, editing, and reconstruction; and vision-language-action (VLA) models for manipulation. Currently, I am interested in embodied world models, especially action-conditioned world generation (both video and 3D). I am looking for internship opportunities, so feel free to reach out.

Design the World, Play with the World, then Learn the World

Selected Publications

(* denotes equal contribution; # denotes corresponding author)

Design the World

2D/3D Generation & Editing, 3D/4D Reconstruction.

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Lirui Zhao^*, Tianshuo Yang^*, Wenqi Shao^*, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang^#, Rongrong Ji^#

arXiv, 2024

[Project Page] [Paper] [Code]

Diffree enables text-guided object addition: given only a text prompt, it predicts where to place the new object and inpaints it with an appearance consistent with the surrounding context.

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Lumina Team: Tianshuo Yang (Core Contributor for 3D Generation)

ICLR (Spotlight), 2025

[Project Page] [Paper] [Code]

Lumina-T2X introduces a unified flow-based diffusion transformer framework for text-conditioned generation across images, videos, 3D views, and audio at flexible resolutions and durations.

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Yutian Chen, Shi Guo^#, Renbiao Jin, Tianshuo Yang, Xin Cai, Yawen Luo, Mingxin Yang, Mulin Yu, Linning Xu, Tianfan Xue

arXiv, 2026

[Project Page] [Paper] [Code]

AnyRecon enables high-quality and large-scale 3D reconstruction from sparse inputs.

4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

Yutian Chen, Shi Guo, Tianshuo Yang, Lihe Ding, Xiuyuan Yu, Jinwei Gu, Tianfan Xue

ACM SIGGRAPH Asia, 2025

[Project Page] [Paper] [Code]

4DSloMo reconstructs complex, high-speed 4D motion with high quality from asynchronously captured views.

Play with the World

Embodied Reasoning & Manipulation VLA.

HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

Tianshuo Yang^*, Guanyu Chen^*, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu^#, Ping Luo^#

arXiv, 2026

[Project Page] [Paper] [Code]

HiVLA is a visual-grounded-centric hierarchical framework that explicitly decouples high-level semantic planning from low-level motor control.

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Tian Nian, Liuao Pei, Shunbo Zhou, Xiaokang Yang, Jiangmiao Pang, Yao Mu^#, Ping Luo^#

arXiv, 2026

[Project Page] [Paper] [Code]

Learn the World

Spatial Intelligence.

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Fanqing Meng^*, Jin Wang^*, Chuanhao Li^*, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang^#, Wenqi Shao^#

ICLR, 2025

[Project Page] [Paper] [Code]

Honors

HKU Presidential PhD Scholarship (HKU-PS), 2024

Outstanding Graduate of Zhejiang University, 2024

Xiaomi Scholarship, 2023

Academic Service

Conference Reviewer: CVPR, ECCV, ICLR