👋 About Me

I am a PhD student at the State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA), jointly affiliated with Zhongguancun Academy and supervised by Prof. Cheng-Lin Liu.

🎓 Education Background

  • 2024.09 - Present: Ph.D. Candidate in Pattern Recognition and Intelligent Systems, CASIA-MAIS
  • 2021.09 - 2024.06: M.S. in Electronic Information, NLPR, CASIA
  • 2017.09 - 2021.06: B.E. in Detection Guidance and Control Technology, School of Space Science and Technology, Xidian University

🔬 Research Focus

My research centers on data modeling: developing structured representations and learning frameworks that bridge perception, language, scientific knowledge, and physical interaction. It covers the following directions:

  • 🧬 AI for Science: AI-driven vaccine adjuvant discovery and development
  • 🗃️ Multimodal Large Language Models: Reliable reasoning, inference acceleration, vision token optimization, video understanding and streaming models
  • 🤖 State and Action: Vision-and-Language Navigation and Vision-Language-Action models
  • ✍️ Handwritten Text Recognition & Generation: Online Chinese text recognition and synthesis

📊 Academic Impact

You can find my publications on Google Scholar and connect with me through various academic platforms listed in the sidebar.

🔥 News

  • 2026.04: 🎉🎉 Our paper “MR-ALIGN: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models” was accepted to ACL 2026 Findings.
  • 2026.03: 🎉🎉 Our paper “An Efficient Strategy for Data-constrained Machine Learning in Materials Science” was accepted to Materials Genome Engineering Advances.
  • 2026.02: 🎉🎉 Three papers accepted to CVPR 2026: “MeteorPred” (meteorological multimodal model), “ChartAgent” (chart understanding framework), and “Fine-Grained Post-Training Quantization” (VLM optimization).
  • 2026.01:  🎉🎉 Three papers accepted to top-tier conferences! Two papers to ICLR 2026: “An Open-Ended Benchmark for Adjuvant Research with MLLM” and “One Patch Doesn’t Fit All” (adaptive patching for MLLMs). One paper to ICRA 2026: “RANGER” (monocular zero-shot semantic navigation).
  • 2025.11:  🎉🎉 One paper accepted to AAAI 2026! “VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding” - a comprehensive framework for video anomaly detection and understanding.

📝 Publications

arXiv 2026

SAVANT: A Neuro-Symbolic Verification Framework for Adjuvant Design

Yi Chen, Yu Zhang, Jian Xu, Xu-Yao Zhang, Hua Yue, Xinming Wang, Zequan Lyu, Boran Wang, Hongyi Liu, Yan Wang, Peiyuan Cao, Wei Wei, Cheng-Lin Liu

  • Proposed the first neuro-symbolic verification framework for LLM-generated adjuvant designs.
  • Formalized, for the first time, adjuvant design verification as literature-grounded mechanistic proof checking.
  • Introduced the first three-stage verification pipeline covering precedent, immune outcome, and mechanism chain.
  • Enabled interpretable identification of supported mechanisms, weak evidence, and knowledge gaps.

arXiv 2026

Topology-Aware Visual Prompts are Weakly Supervised Spatial Grounding Learners

Yi Chen, MingMing Yu, Boran Wang, Jie Gu, Chu Tang, Jingmin Chen, Rui-Qi Wang

  • Formulated, for the first time, MLLM spatial failures as a referent-relation language grounding problem.
  • Proposed VisTopo, the first topology-aware prompting method for explicit referent-relation modeling.
  • Introduced region-anchor tokens and relation-prefix tokens to expose spatial structure before answer generation.
  • Enabled weakly supervised spatial grounding using only standard VQA supervision, without external spatial annotations or perception backbones.

ICLR 2026

An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM

Yi Chen*, Yu Zhang*, Jian Xu, Xu-Yao Zhang, Hua Yue, Xinming Wang, Zequan Lyu, Wei Wei, Cheng-Lin Liu

  • First benchmark dedicated to adjuvant research using multimodal large language models
  • Formal framework for representing adjuvant design principles and immune mechanisms

AAAI 2025

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu

  • Text-guided dynamic visual token recovery mechanism for multimodal models
  • Achieves comparable performance while compressing visual tokens to 10% of the original quantity

🧬 AI for Science & Scientific Computing

🗃️ Multimodal Large Language Models

🧠 Machine Learning

🤖 Embodied Intelligence & Robotics

🛠️ Intelligent Agents

🔍 Video Analysis & Anomaly Detection

📊 NLP & Information Processing

✍️ Handwritten Text Recognition & Generation

🎖 Honors and Awards

  • 2026 Gold Reviewer, ICML 2026
  • 2025 Academic Research Star, National AI Academy Beijing Zhongguancun Academy
  • 2025 Best Paper Award, AIHCIR 2025 (for “ManiNet: Manifold Network for Few-Shot Learning”)
  • 2024 3rd Place, ICDAR2024 Competition on Multi Font Group Recognition and OCR

📖 Education

  • 2024.09 - Present, Ph.D. Candidate in Pattern Recognition and Intelligent Systems
    State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences & Zhongguancun Academy
    Supervisor: Prof. Cheng-Lin Liu

  • 2021.09 - 2024.06, M.S. in Electronic Information
    National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
    Supervisor: Prof. Cheng-Lin Liu

  • 2017.09 - 2021.06, B.E. in Detection Guidance and Control Technology
    School of Space Science and Technology, Xidian University

🔬 Research Interests

  • AI for Science: Applying artificial intelligence to scientific discovery, particularly in adjuvant research and materials science
  • Multimodal Large Language Models (MLLMs): Developing robust and efficient multimodal AI systems
  • Online Handwritten Text Recognition: Recognition and generation of handwritten Chinese text
  • Computer Vision: Image understanding, visual reasoning, and multimodal perception

🤝 Academic Service

  • Journal Reviewer: IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), Transactions on Machine Learning Research (TMLR)
  • Program Committee Member: AAAI 2026
  • Conference Reviewer: ICLR 2026, CVPR 2026, ICML 2026, ECCV 2026

🌟 Open Source Contributions

PaddleScience Contributor: Integrated the Crystal Graph CNN (CGCNN) model for materials chemistry applications (PR #977)
  • Implemented the full pipeline for crystal structure data processing and graph neural network training
  • Code merged into the official repository and featured as an official case study

📧 Contact

  • Email: yi.chen@nlpr.ia.ac.cn
  • Office: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
  • Address: Beijing 100190, China

Open to collaboration and academic exchange. Please feel free to contact me via email.