Yipeng Yu

Yipeng Yu (俞一鹏)

Feel free to call me Ian. I'm currently an AI researcher in the Taobao & Tmall Group at Alibaba. My current research focuses on GenAI and Agentic AI.

I obtained my Ph.D. degree in Computer Science from Zhejiang University under the supervision of Prof. Gang Pan and Prof. Zhaohui Wu, where I investigated Cyborg Intelligence (混合智能) using bidirectional Brain-Computer Interfaces. During this period, I also visited the National University of Singapore as a visiting scholar under Prof. Kay Chen TAN, where I conducted research on Evolutionary Computation with Deep Neural Networks.

yypzju ατ 163 Doτ com

Career

    Current: Manager & Senior Staff Algorithm Engineer, E-commerce, Taobao & Tmall Group, Alibaba (Hangzhou, China)
    Previous: Tech Leader & Senior Researcher, Games, Interactive Entertainment Group, Tencent (Shanghai, China)
    Earlier: Research Scientist, Dialog & IoT, China Research Lab, IBM (Beijing, China)

Paper

# denotes equal contribution, and * denotes corresponding author.
Deep Research
Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science
Yipeng Yu
arXiv, 2026

This paper provides a deep research of deep research. We position LLMs and Stable Diffusion as the twin pillars of generative AI, and lay out a roadmap evolving from the Transformer to agents. We examine the progress of AI4S across various disciplines. We identify the predominant paradigms of human-AI interaction and prevailing system architectures, and discuss the major challenges and fundamental research issues that remain. AI supports scientific innovation, and science can also contribute to AI growth (Science for AI, S4AI).

TaoType
TaoType: Predicting Fine-Grained Typing Intent for Faster Search
ACL 2026, The 64th Annual Meeting of the Association for Computational Linguistics

"Is the user’s current query input exactly what they intend to search for?" Our work aims to answer this question by determining, at each typing step, whether the current query is complete. If so, a search is implicitly triggered in advance without waiting for user confirmation. This approach reduces response time and enhances the user search experience. Specifically, we propose TaoType, a client-side framework that introduces innovations in data sampling, feature selection, model design and training, and online strategy. Experiments in Taobao, a leading mobile shopping application, validate its effectiveness, achieving offline precision/recall/accuracy of 0.7936/0.8196/0.7742, respectively, and decreasing online response time by 640.51±93.65 milliseconds, which is of great benefit to the search system.

RecGPT
RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation
SIGIR 2026, The 49th International ACM SIGIR Conference on Research and Development in Information Retrieval

In this paper, we propose RecGPT-Mobile, a framework that designs a lightweight LLM-based intent understanding agent to improve recommendation quality in mobile e-commerce scenarios. By deploying the LLM directly on mobile devices, our approach can capture the evolving interests of users more quickly and adjust the recommendation results in real time. Extensive offline analyses and online experiments demonstrate that our method significantly improves the accuracy of recommendation results, laying a practical path for LLM deployment in production-scale recommendation systems on mobile devices, as well as a scalable solution for integrating LLMs into real-world next-query prediction systems.

D2C
1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Qiao Xu#, Yipeng Yu#*, Chengxiao Feng, Xu Liu
arXiv:2602.18548

We introduce 1D-Bench, a benchmark grounded in real e-commerce workflows, where each instance provides a reference rendering and an exported intermediate representation that may contain extraction errors. 1D is short for one day, representing the efficient completion of design-to-code tasks in less than one day. Models take both as input, using the intermediate representation as structural cues while being evaluated against the reference rendering, which tests robustness to intermediate representation defects rather than literal adherence. 1D-Bench requires generating an executable React codebase under a fixed toolchain with an explicit component hierarchy, and defines a multi-round setting in which models iteratively apply component-level edits using execution feedback. Experiments on commercial and open-weight multimodal models show that iterative editing generally improves final performance by increasing rendering success and often improving visual similarity. We further conduct a pilot study on post-training with synthetic repair trajectories and reinforcement learning based editing, and observe limited and unstable gains that may stem from sparse terminal rewards and high-variance file-level updates.

Topology
SIT-KGED: Simply Inject Topology into LLM for Knowledge Graph Error Detection (Code)
WWW, Short, Proceedings of the ACM Web Conference 2026

In this work, we extend KGED to a four-class classification task to identify which element is incorrect or whether the triple is correct. Directly adapting existing LLM-based methods to this setting yields limited performance, as they fail to effectively exploit the topological information of KGs. To address this issue, we propose a novel topology-injected LLM framework, which precomputes high-order common neighbors between head and tail entities as topological evidence, thereby reducing complexity. Furthermore, we introduce a mixture-of-experts adapter that maps structural and topological evidence into the text embedding space.

Document
SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval (Code)
EMNLP, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release StructDocRetrieval, a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both the released and industrial datasets across various modern PLMs, and online A/B testing demonstrate consistent improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3.

Behavioral Cues
Beyond algorithms: Utilizing multi-modal emotional and behavioral cues as novel predictors of short-video consumption
Computers in Human Behavior Reports [J], 2025

This study aims to address this gap by incorporating users' emotional and behavioral indicators into the prediction of short-video viewing behavior. Study 1 was conducted in a controlled laboratory setting, where participants viewed videos on a computer screen while their physiological activity was recorded as an objective measure of emotional responses. Study 2 aimed to enhance ecological validity by having participants view videos on mobile phones, enabling them to swipe between videos as they would in a typical short-video app. Our findings provide valuable insights into the psychological drivers of short-video viewing behavior and present a novel, non-intrusive approach to incorporating users' real-time experiences into recommendation systems.

Condition
Resolving multi-condition confusion for finetuning-free personalized image generation (Code)
AAAI 2025, The 39th Annual AAAI Conference on Artificial Intelligence

In this work, we investigate the relevance of different positions of the latent image features to the target object in diffusion models, and accordingly propose a weighted-merge method to merge multiple reference image features into the corresponding objects. Next, we integrate this weighted-merge method into existing pre-trained models and continue to train the model on a multi-object dataset constructed from the open-sourced SA-1B dataset.

VideoMaster
VideoMaster: A Multimodal Micro Game Video Recreator (Video)
Yipeng Yu*, Xiao Chen, Hui Zhan
IJCAI 2023, Demo, Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence

To free humans from laborious video production, this paper proposes VideoMaster, a multimodal system equipped with four capabilities: highlight extraction, video description, video dubbing, and video editing. It extracts interesting episodes from long game videos, generates subtitles for each episode, reads the subtitles through synthesized speech, and finally re-creates a better short video through video editing. To the best of our knowledge, VideoMaster is the first multimedia system that can automatically produce product-level micro-videos without heavy human annotation.

Text2Video
Text2Video: Automatic Video Generation Based on Text Scripts (Video)
ACM MM 2021, Proceedings of the 29th ACM International Conference on Multimedia

We present Text2Video, a novel system to automatically produce videos using only text-editing for novice users. Given an input text script, the director-like system can generate game-related engaging videos which illustrate the given narrative, provide diverse multi-modal content, and follow video editing guidelines. The system involves five modules.

ratbot
Intelligence-Augmented Rat Cyborgs in Maze Solving (Video)
PLOS One [J], 2016

In this paper, we build rat cyborgs to demonstrate how they can expedite the maze escape task with integration of machine intelligence. We compare the performance of maze solving by computer, by individual rats, and by computer-aided rats (i.e. rat cyborgs). They were asked to find their way from a constant entrance to a constant exit in fourteen diverse mazes. Performance of maze solving was measured by steps, coverage rates, and time spent. The experimental results with six rats and their intelligence-augmented rat cyborgs show that rat cyborgs have the best performance in escaping from mazes. These results provide a proof-of-principle demonstration for cyborg intelligence. In addition, our novel cyborg intelligent system (rat cyborg) has great potential in various applications, such as search and rescue in complex terrains.

Intellectual Property

Granted patents (mostly U.S. patents), sorted alphabetically by name.

🇨🇳 Chinese Patents

🇨🇳 Copyright of Computer Software
