About me

I am a Ph.D. candidate in Computer Science at the University of Texas at Dallas, advised by Prof. Yapeng Tian. My research focuses on multimodal AI, with an emphasis on audio-visual learning, video understanding, and LLM-augmented multimodal systems. I am particularly interested in building robust and agentic AI systems that operate reliably in real-world settings.

Prior to UTD, I received a Bachelor of Science in Mathematics from Sichuan University.

News

Publications

Building robust and agentic multimodal AI systems for real-world understanding and assistance.

Theme A centers on multimodal perception that remains stable under ambiguity, spurious cues, and distribution shift. This line spans robust audio-visual segmentation, open-set adaptation, and benchmark-driven reliability analysis.

Theme B focuses on agentic multimodal reasoning: systems that do not just encode video, but iteratively reason over it using tools, skills, and verification. The goal is structured multimodal decision-making beyond one-shot inference.

Theme C grounds the agenda in egocentric and real-world AI applications, especially online gaze understanding, assistive computing, and multimodal systems that must operate causally and efficiently in deployment settings.

Projects

Resume

Education

  1. University of Texas at Dallas

    2023 — Present

Artificial Intelligence, Machine Learning, Data Structures and Algorithms ...

  2. Sichuan University

    2018 — 2023

    Probability Theory, General Topology, Functional Analysis ...

Experience

  1. Research Assistant

    2024 — Present

Working on LLM and HAYSTAC projects.

  2. Teaching Assistant

    2023 — 2024

Operating Systems & Discrete Math

My skills

  • Python - data preprocessing
    80%
  • PyTorch - deep learning architectures
    70%
  • Linux
    90%
