AIOps Spring2023 – Course Home



2022/12/23: The course site is up. Prospective students, please check back for updates.

Course Instructor
Course Assistants
Class Time and Location

Zhe Xie
Class Time: Tuesday 7:20pm-9:45pm (please see the detailed schedule in the course syllabus)

Associate Professor
Email: xiez22(at)mails(dot)tsinghua(dot)edu(dot)cn

Department of Computer Science and Technology
Office: East Main Building 9-323

Office: East Main Building 9-319
Office hours:


Office hour: right after the class








Course Description

AIOps stands for Autonomous IT Operations or Artificial Intelligence for IT Operations. It is an interdisciplinary research field between Machine Learning and Systems/Networking, which is why this course historically carried the title “Advanced Network Management”. If you are interested in learning how a large distributed system can be run better with the help of machine learning, this course is for you. If you want to learn how machine learning can help solve challenging problems in a very complex system, this course is for you.

Imagine that you are running a large Internet-based service with hundreds of thousands of servers and many software modules. You want to achieve 99.999% service reliability, but hundreds of operators (IT operations engineers) working through terabytes of monitoring data by hand won’t get you there: the software/hardware system is too complex and too large, and the volume of machine-generated data is too vast. What do we do? Machine Learning to the rescue!

This course will cover the latest progress in the major topics of AIOps through case studies drawn from recent research papers at top conferences across the major computer science fields, including Machine Learning, Data Mining, Systems/Networking, Software Engineering, Databases, and Multimedia. See the figure below 🙂

Through these case studies, we will show how the latest Machine Learning algorithms are applied to solve the unique challenges in AIOps. The basics of these algorithms will be briefly reviewed in an easy-to-understand way, without going through the detailed theory behind them. By the end of the course, you will understand roughly how these algorithms work and how they can be applied to solve real-world problems. The Machine Learning topics reviewed include:

  1. Deep Learning
  2. Deep Neural Networks for Time Series or Sequence
  3. Deep Generative Model (VAE, GAN)
  4. Deep Reinforcement Learning
  5. Natural Language Processing
  6. Causal Inference

The major topics of AIOps often coincide with their more general counterparts in Machine Learning. The main difference is that the data in AIOps are machine-generated, while the data in Machine Learning can be more general. The major AIOps topics are:

  1. Anomaly Detection in Time Series, Logs (semi-structured text), Traces (program execution trace), and Graphs
  2. Anomaly Localization
  3. Failure/Event Prediction
  4. Causal Inference and Its Application in Root Cause Analysis

This course is a graduate course and is primarily project-oriented.

Grading Policies

Attendance: 10%

Personal Assignment 1: 10%

Personal Assignment 2: 15%

Team Project: 65%

Course Information

Course Number


Required text

Reference texts
《Data Science for Business – What You Need to Know about Data Mining and Data-Analytic Thinking》, by Foster Provost & Tom Fawcett

《MIT 6.S191 Introduction to Deep Learning》, with video and slides.

《Site Reliability Engineering –How Google Runs Production Systems》, by Betsy Beyer, Chris Jones, Jennifer Petoff & Niall Richard Murphy

You are expected to be familiar with at least one programming language.


