Government Sponsors: NSFC, China Ministry of Technology
Industrial Sponsors: Alibaba, Baidu, Sogou, DiDi, Tencent, Petro China, and others.
In the Network Management (NetMan) lab, we apply machine learning techniques to build the “brain” for AIOps (Artificial Intelligence for IT Operations). The high-level objectives, for the targeted networks and applications on the Internet, are that:
1) What happened in the past can be reconstructed automatically and accurately;
2) What’s going on now can be detected/inferred accurately to trigger automatic mitigation or suggest immediate actions to the operators;
3) What will happen in the future can be predicted with high confidence.
IOS is at the intersection of machine learning, engineering, and systems, as illustrated in the figure below.
Our research typically builds on a deep understanding of the systems and protocols across the layers, and then employs techniques such as correlation analysis, association rule mining, time series analysis, anomaly detection, machine learning, and deep learning. Our research has been published in top networking conferences such as INFOCOM, UbiComp, USENIX ATC, MobiSys, CoNEXT, IMC, and IWQoS. Some of the systems we built are currently used in operational networks such as Baidu's.
We have exciting collaborations with Microsoft Research Asia, Baidu, Alibaba Group, Sogou, DiDi, Petro China, and Tsinghua WiFi Operations.
Keywords: Internet, Big Data, Machine Learning
NetMan’s current research focuses are 1) AppMind, the brain for Intelligent Operations of the Internet systems, and 2) AppMind’s applications to endpoint, access, and cloud.
1. AppMind: Machine Learning-Based Brain for Intelligent Operations of the Internet Systems
Thanks to ubiquitous IT technologies, modern businesses are equipped with large amounts of data and are eager to make data-driven decisions and take data-driven actions for their business, product, IT, and R&D operations. However, there are no available systems or tools that can efficiently achieve these goals. Existing systems and tools focus on data collection, pipelines, storage, search, and visualization, but users still need a lot of manual effort and time to arrive at decisions and actions.
The AppMind system aims to use machine learning to automatically output decisions and actions to users with KPI-oriented goals. Target application areas include application performance management (APM), IT operations management (ITOM), data-driven product innovation, and business intelligence (BI). Instead of building customized tools for various application scenarios, AppMind plans to build the following generic, cloud-based tools with standard APIs.
- Realtime Analysis for Rapid Mitigation:
  - Anomaly detection (Alerting)
  - Anomaly localization (Hotspot Analysis)
  - Event correlation (Anomaly Association)
  - Event aggregation (Log template learning and event extraction)
  - Event mitigation suggestion
  - Event root cause analysis (RCA)
- Realtime Prediction:
  - Trend prediction
  - Event prediction
  - Hotspot prediction
- Offline Analysis:
  - Bottleneck identification
  - Offline A/B test
  - Decision suggestion
  - Strategy suggestion
- System optimizations driven by machine learning on historical data:
  - TCP parameter optimization
- Cake tool: aims to run machine learning experiments in a cloud environment with source-code-level compatibility across Python, Java, Scala, C, and R, incorporating modules from popular machine learning toolkits such as scikit-learn, TensorFlow, Weka, and Orange. Our goal is for Cake to speed up machine learning experiments by about 10 times with the same amount of physical computing resources.
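To make the "Anomaly detection (Alerting)" item in the list above more concrete, here is a minimal sketch of one generic approach: flagging KPI points that deviate strongly from the median of a preceding window, scaled by the median absolute deviation (MAD). This page does not describe AppMind's actual algorithms, so the function name `detect_anomalies`, its `window` and `threshold` parameters, and the sample KPI values are all assumptions made for illustration only.

```python
from statistics import median


def detect_anomalies(values, window=12, threshold=3.0):
    """Return indices of points that deviate strongly from the preceding window.

    Illustrative only (not AppMind's method): each point is compared with the
    median of the previous `window` points, scaled by their MAD.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        # 1.4826 makes the MAD comparable to a standard deviation for Gaussian
        # data; the small floor avoids division by zero on a flat history.
        scale = max(1.4826 * mad, 1e-9)
        if abs(values[i] - center) / scale > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical KPI (e.g. requests per minute) with one sudden drop at index 20.
kpi = [100 + (i % 3) for i in range(30)]
kpi[20] = 40
print(detect_anomalies(kpi))  # prints [20]
```

A median/MAD baseline is used here instead of a mean and standard deviation so that a single outlier inside the history window does not inflate the scale and hide later anomalies. A production alerting system would also need to handle seasonality and missing data, which this sketch ignores.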
2. AppMind’s Applications to Endpoint, Access, and Cloud
AppMind has been, or is being, applied to large Internet companies (OpRobot project), university campuses (WiFi Union project), mobile apps (SmoothApp project), telecom carriers, and global BGP routing systems, as shown in the figure below.
Our research is currently funded by the following projects:
- The National Natural Science Foundation of China (NSFC) under grants 61472214 and 61472210.
- The Global Talent Recruitment (Youth) Program.
- Gifts from Alibaba, Tencent, Sogou, and WeBank.
Past Projects:
- The State Key Program of National Science of China under grant 61233007.
- The National High Technology Development Program of China (863 program) under grant 2013AA013302.
- The National Key Basic Research Program of China (973 program) under grant 2013CB329105.
- The Tsinghua National Laboratory for Information Science and Technology key projects.
- The Cross-disciplinary Collaborative Teams Program for Science, Technology and Innovation of the Chinese Academy of Sciences: Network and System Technologies for Security Monitoring and Information Interaction in Smart Grid.