About me
I am a Ph.D. candidate in Computer Science at the University of Texas at Dallas, advised by Prof. Yapeng Tian. My research focuses on multimodal AI, with an emphasis on audio-visual learning, video understanding, and LLM-augmented multimodal systems. I am particularly interested in building robust and agentic AI systems that operate reliably in real-world settings.
Prior to UTD, I received a Bachelor of Science in Mathematics from Sichuan University.
News
Publications
My research agenda: building robust and agentic multimodal AI systems for real-world understanding and assistance.
Theme A centers on multimodal perception that remains stable under ambiguity, spurious cues, and distribution shift. This line spans robust audio-visual segmentation, open-set adaptation, and benchmark-driven reliability analysis.
Theme B focuses on agentic multimodal reasoning: systems that do not merely encode video but iteratively reason over it using tools, skills, and verification. The goal is structured multimodal decision-making that goes beyond one-shot inference.
Theme C grounds the agenda in egocentric and real-world AI applications, especially online gaze understanding, assistive computing, and multimodal systems that must operate causally (without access to future frames) and efficiently in deployment settings.
Projects