Research – Tsinghua NetMan Lab

Research Plan for Time Series Intelligence

My future research on time-series intelligence will focus on developing the ability to perceive, reason, predict, and make decisions from continuous, real-time, sensor-based numerical observations in domains such as IT, healthcare, energy, manufacturing, transportation, and environmental systems, where on-premise sensor data are abundant. A fundamental challenge in these domains is that existing time-series algorithms and models often perform poorly in on-premise deployment settings, where data distributions differ from those seen during training and data quality is often lower than that available during development and evaluation.

To address this challenge, I aim to build an OpenClaw-inspired agentic framework, which I call time-series intelligence in a box. Integrated into a powerful yet compact device such as NVIDIA DGX Spark and operating entirely on-premise, time-series intelligence in a box will efficiently adapt to each new deployment environment and deliver robust time-series intelligence performance. Its first component is a multimodal time-series foundation model that can understand and reason about time-series data through natural language. Its second component is a Python toolkit for time-series algorithms and models, which serves as the skill layer of the agentic framework. Its third component is a collaborative agent layer for automated adaptation to on-premise deployment environments.

First, I will develop a multimodal time-series foundation model that can understand and reason about time-series data using natural language and perform a wide range of standard time-series tasks. The model must also have low inference overhead so that it can run efficiently in on-premise environments on Spark-like devices. By contrast, existing large language models are strong in textual understanding and generation, but remain limited in the core capabilities required for time-series intelligence. I have already made initial contributions in this direction through ChatTS [1] and FoundRoot [2], and I will continue advancing this line of research to address the gap in large-model capabilities for time-series intelligence.

Second, I will develop an on-device skill layer to support the full spectrum of time-series intelligence, including preprocessing, feature engineering, similarity and distance measures, clustering, change-point detection, classification, forecasting, anomaly detection, causal discovery and inference, imputation, generation, and augmentation. My previous work on AutoDA-Timeseries [3] is one example along this direction. This skill layer will provide a comprehensive methodological foundation together with modular and efficient components that can be readily orchestrated by intelligent agents and that are designed to work closely with the multimodal time-series foundation model.
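As a minimal illustration of what "modular components that agents can orchestrate" might look like, the sketch below registers two simple skills under names and chains them in a caller-chosen order. The registry `SKILLS`, the `skill` decorator, and `run_pipeline` are hypothetical names introduced only for this example; they are not the design of the planned toolkit.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry (an assumption for illustration): skill name -> function.
SKILLS: Dict[str, Callable[[list], list]] = {}


def skill(name: str):
    """Register a function under a name that an agent could look up."""
    def decorator(fn):
        SKILLS[name] = fn
        return fn
    return decorator


@skill("impute_forward_fill")
def impute_forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace missing points (None) with the last observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


@skill("difference")
def difference(values: List[float]) -> List[float]:
    """First-order differencing, a common preprocessing step."""
    return [b - a for a, b in zip(values, values[1:])]


def run_pipeline(names: List[str], values: list) -> list:
    """Apply registered skills in the order the caller selected them."""
    for name in names:
        values = SKILLS[name](values)
    return values


if __name__ == "__main__":
    raw = [1.0, None, 3.0, None, 6.0]
    print(run_pipeline(["impute_forward_fill", "difference"], raw))
```

Keeping each skill a plain function with a uniform list-in, list-out signature is one simple way to let an orchestrating agent compose steps by name; a real skill layer would likely need richer typing and metadata.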

Third, I will develop a multi-agent framework for the on-premise adaptation of time-series intelligence. Powered by the aforementioned multimodal time-series foundation model and a diverse set of AutoML capabilities, this framework will dynamically construct the full analytical pipeline. It will support data profiling, data augmentation, data governance, model selection, post-training adaptation, hyperparameter optimization, algorithm fine-tuning, skill orchestration, evaluation, validation, and reporting. Building on my previous work on AutoKAD [4], I will develop a more general system in which multiple specialized agents collaborate across the entire adaptation pipeline until the desired performance target is achieved.
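The "adapt until the performance target is achieved" loop can be pictured with a deliberately small sketch. It assumes a short labeled validation window and a simple k-sigma threshold detector, both chosen only for illustration; AutoKAD itself targets label-free deployment, and none of the function names below come from that work.

```python
from statistics import mean, pstdev


def detect(values, k):
    """Flag points more than k standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > k * sigma for v in values]


def f1_score(pred, truth):
    """F1 of boolean predictions against boolean ground truth."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def adapt(values, truth, candidates, target):
    """Try candidate settings in order and stop once the target is reached."""
    best_k, best_score = None, -1.0
    for k in candidates:
        score = f1_score(detect(values, k), truth)
        if score > best_score:
            best_k, best_score = k, score
        if best_score >= target:
            break  # target met: no need to evaluate further candidates
    return best_k, best_score


if __name__ == "__main__":
    values = [10, 10, 11, 10, 9, 10, 30, 10, 10, 11]
    truth = [v == 30 for v in values]  # single labeled anomaly (assumed)
    print(adapt(values, truth, candidates=[0.3, 1.0, 3.0], target=0.9))
```

In the framework described above, the candidate list would presumably be proposed by agents rather than fixed in advance, and the search space would cover whole pipelines rather than a single threshold.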

Ultimately, time-series intelligence in a box will be a unified system that integrates a state-of-the-art pre-trained multimodal time-series foundation model, a comprehensive skill layer, and an agentic framework for on-premise adaptation, so that the entire deployment workflow of time-series intelligence in on-premise environments becomes automated and agent-driven. Through this work, I hope to contribute a systematic technical framework and to translate it into a deployable system that enhances the intelligent operations of modern social-cyber-physical infrastructure.

[1] ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning. VLDB 2025.

[2] FoundRoot: Towards Foundation Model for Root Cause Analysis via Structured Deep Thinking. ICSE 2026.

[3] AutoDA-Timeseries: Automated Data Augmentation for Time Series. ICLR 2026.

[4] AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment. ISSRE 2023. (Best Paper Award)

Summary of Previous Research on AIOps and Time Series Intelligence

My past research focuses on Time Series and AIOps (Artificial Intelligence for IT Operations), motivated by a fundamental challenge in modern online services: large-scale Internet systems still lack the ability to perceive, reason, and make decisions effectively in high-dimensional, fast-changing, and failure-prone operational environments.

Research Approach

To address this challenge, I have established a systematic, iterative, and integrated approach to AIOps research through publications, annual algorithm competitions, a graduate-level AIOps course taught in English, the OpenAIOps online community, and large-scale industrial deployment.

I work closely with industry collaborators, including ByteDance, Alibaba, Tencent, eBay, and Baidu, to identify key research problems arising from operational practice, develop feasible solutions, deploy them in production environments, and then use the limitations revealed in practice to motivate the next research question. This practice-driven research loop has shaped my overall research agenda.

I initiated and organized the CCF AIOps Algorithm Challenge, which has been held successfully for eight consecutive years and has attracted participants from more than 1,000 organizations. I also initiated and organized the CCF OpenAIOps online community, which integrates data, models, courses, benchmarks, testbeds, and competitions into a public service platform. It has attracted visits from more than one million independent IP addresses in more than 100 countries and has become a widely influential open platform in the AIOps field.

Research Impact

My AIOps research has generated a coherent body of work with both scientific depth and tangible social and economic impact. In AIOps and related areas, I have published more than 200 papers, with a Google Scholar h-index of 60 and more than 16,000 citations. These research results have been deployed in key sectors including Internet services, finance, telecommunications, energy, and e-government.

In recognition of my contributions to the advancement of AIOps in China in both academia and industry, I received the First Prize of the 2023 Science and Technology Progress Award of the Chinese Institute of Electronics. My AIOps research has also received four best paper awards and three best paper nominations.

Key Scientific Contributions

1. Failure Detection through Anomaly Detection Based on Generative Models and Language Models. I formulated failure detection problems in AIOps as anomaly detection problems across multiple data modalities and published a series of papers in this direction, including 19 A-class (CCF A or TH-CPL A) papers. Donut [1] was among the first methods to apply variational autoencoders (VAEs) to metric anomaly detection and received the Okawa Foundation Research Grant in 2017. TraceVAE [2] was among the first to apply graph VAEs to anomaly detection in microservice call traces. StepWise [3] was among the early concept-drift adaptation algorithms for metric anomaly detection and received the Best Research Paper Award in ISSRE 2018. LogAnomaly [4] was among the first to apply deep learning-based language models to log anomaly detection.

2. Fault Localization Based on Causal Inference, Multimodal Fusion, and Multi-Agent Collaboration. When a failure occurs in a large-scale distributed system with thousands or even millions of components, symptoms often manifest across many correlated components and multiple data modalities, such as metrics, logs, traces, and alerts. Root cause analysis, the task of identifying the underlying cause of a failure, is therefore like finding a needle in a haystack.

I have studied this problem extensively and continuously incorporated the latest advances in AI into this domain, including expert-rule-based methods, causal-inference-based methods, multimodal-fusion-based deep learning, and LLM-based multi-agent collaboration. In particular, I proposed a metric causal discovery algorithm that explicitly considers propagation delay, as well as CIRCA [5], one of the first causal-inference-based fault localization frameworks.

I also proposed unifying metrics, logs, traces, alerts, and configuration data into multivariate time series. I further reformulated the AIOps root cause analysis problem as a causal discovery and inference problem over a network of time series, addressed through multi-agent collaboration. This enables AIOps research to benefit from broader advances in time series, causal analysis, and multi-agent LLMs.
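To make the idea of turning non-metric data into time series concrete, the following sketch counts log events per template in fixed time buckets, so that each template becomes one series on a shared time grid. It assumes logs have already been parsed into (timestamp, template id) pairs; the function name and parameters are hypothetical, and this is one simple encoding rather than the specific method used in the cited work.

```python
from collections import Counter


def logs_to_series(events, start, end, bucket_seconds, templates):
    """Count log events per template in fixed-width time buckets.

    events: iterable of (unix_timestamp, template_id) pairs (assumed pre-parsed)
    Returns {template_id: [count per bucket]}, all aligned on the same grid.
    """
    n_buckets = (end - start + bucket_seconds - 1) // bucket_seconds
    counts = Counter()
    for ts, template in events:
        if start <= ts < end:  # ignore events outside the window
            counts[(template, (ts - start) // bucket_seconds)] += 1
    return {t: [counts[(t, b)] for b in range(n_buckets)] for t in templates}


if __name__ == "__main__":
    events = [(0, "timeout"), (5, "timeout"), (61, "timeout"), (130, "disk_full")]
    series = logs_to_series(events, start=0, end=180, bucket_seconds=60,
                            templates=["timeout", "disk_full"])
    print(series)
```

Once every modality is expressed on the same time grid, the resulting columns can be stacked with ordinary metrics and passed to multivariate time-series methods.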

This line of work has produced 49 A-class papers and 10 authorized invention patents. It received the Best Paper Award in the Industry Track of ISSRE 2024 and a Best Research Paper nomination in ISSRE 2025. These methods have been applied in enterprises such as China Mobile, Alibaba, China Everbright Bank, and eBay.

3. Time Series Intelligence. I have conducted extensive research on time series for two reasons: time series data are the most prevalent modality in AIOps, and time series analysis has broad applications beyond AIOps, including healthcare, finance, energy, transportation, manufacturing, and environmental systems.

My research in this area includes both the development of a systematic research framework and extensive work within that framework. It covers preprocessing, feature engineering, similarity and distance functions, clustering, change-point detection, classification, forecasting, anomaly detection, causal discovery, causal inference, and multimodal and multitask foundation models. I have also organized this framework into Time Series Intelligence, a course that I teach in English to international and interdisciplinary graduate students at Tsinghua.

This line of work has led to 13 A-class papers. It also received the Best Paper Award in ISSRE 2023 and a Best Paper nomination in the Industry Track of ISSRE 2025. In particular, OmniAnomaly [6] proposed the first lightweight multivariate time series anomaly detection algorithm capable of modeling explicit temporal dependencies among stochastic variables and has been cited more than 2,100 times on Google Scholar; AutoKAD [7] proposed the first AutoML framework for time series anomaly detection; and ChatTS [8] filled an important gap in the use of LLMs for time series reasoning.

Summary

Overall, my AIOps research has established a systematic framework spanning anomaly detection, automated fault localization and handling, and time series intelligence. It advances both the scientific foundations and the large-scale industrial deployment of intelligent operations technologies and has made substantial contributions to the reliability and efficiency of critical online services in various industry sectors.

[1] Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications. WWW 2018. #Citations: 1300+

[2] Unsupervised Anomaly Detection on Microservice Traces through Graph VAE. WWW 2023.

[3] Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection. ISSRE 2018. (Best Paper Award)

[4] LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs. IJCAI 2019. #Citations: 800+

[5] Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition. KDD 2022.

[6] Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. KDD 2019. #Citations: 2100+

[7] AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment. ISSRE 2023. (Best Paper Award)

[8] ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning. VLDB 2025. #HuggingFace Downloads: 20000+