The presentation will cover two topics. In the first part, I will talk about the use of machine learning in network fault localization. We consider the problem of isolating the source of a failure in a network after alarms have been received or symptoms observed. In a large communication system, a single fault can often trigger a large number of alarms, and multiple faults can occur concurrently. These factors make it hard to locate the root cause accurately and efficiently. I will present a new fault localization method based on machine learning. We propose to use logistic regression to learn the correlation among network events from end-to-end measurements. Based on the regression model, we then derive the fault hypothesis that best explains the observed symptoms. Unlike previous work, the machine learning algorithm requires as input neither knowledge of the dependencies among network events, nor the prior probabilities of faults, nor the conditional probabilities of fault propagation. This "low requirement" property makes it suitable for large, complex networks where accurate dependencies and prior probabilities are difficult to obtain. In addition, we present a new bounding method for the learning algorithm that provides a sharper error bound than the well-known McDiarmid bound and its successor. We also analyze the performance of the learning algorithm with respect to the accuracy of the resulting fault hypothesis. Experimental results and theoretical analysis both show satisfactory performance.
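To make the idea of the first part concrete, here is a minimal sketch of how logistic regression could map observed symptoms to a fault hypothesis. It is not the algorithm of the talk, whose details are not given here. It assumes a hypothetical history of labeled incidents (which probes failed, which faults were present), invented fault names "A" and "B", one independent logistic model per candidate fault, and a fixed 0.5 decision threshold. No dependency graph or prior fault probabilities are supplied; everything is learned from the observations.

```python
import math

# Hypothetical history: (end-to-end probe failures, faults that were present).
# Probe layout and fault names are invented for illustration only.
HISTORY = [
    ([0, 0, 0], set()),
    ([1, 1, 0], {"A"}),
    ([0, 1, 1], {"B"}),
    ([1, 1, 1], {"A", "B"}),
    ([1, 0, 0], {"A"}),  # one alarm was lost
    ([0, 0, 1], {"B"}),  # one alarm was lost
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train(history, fault_names):
    """Fit one logistic model per candidate fault (an assumption of this sketch)."""
    X = [symptoms for symptoms, _ in history]
    return {
        name: fit_logistic(X, [1 if name in faults else 0 for _, faults in history])
        for name in fault_names
    }


def localize(models, symptoms, threshold=0.5):
    """Return the faults whose predicted probability exceeds the threshold."""
    hypothesis = []
    for name, (w, b) in sorted(models.items()):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, symptoms)) + b)
        if p > threshold:
            hypothesis.append(name)
    return hypothesis


MODELS = train(HISTORY, ["A", "B"])
print(localize(MODELS, [1, 1, 0]))  # fault hypothesis for a new observation
```

Treating each fault as an independent binary label keeps the sketch short and allows concurrent faults to appear in one hypothesis. A joint model over fault combinations would be a different design choice.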
In the second part, I will present a new sequential change point detection method for large data sets. Change point detection has wide applications in communication networks, from channel state estimation to attack detection. The method is insensitive to network traffic patterns and outperforms the widely used cumulative sum (CUSUM) method, with shorter detection latency and better detection accuracy.
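For readers unfamiliar with the baseline, the following is a minimal sketch of a standard one-sided CUSUM detector for an upward shift in the mean. It illustrates the comparison method only, not the new method of the talk. The function name and the parameters `mu0` (pre-change mean), `k` (allowance) and `h` (alarm threshold) are illustrative choices, and the sample stream is synthetic.

```python
def cusum_first_alarm(samples, mu0, k, h):
    """Return the index of the first CUSUM alarm, or None if no alarm is raised.

    The statistic accumulates deviations above mu0 + k and is clipped at zero:
        S_t = max(0, S_{t-1} + x_t - mu0 - k)
    An alarm is raised as soon as S_t exceeds h.
    """
    s = 0.0
    for t, x in enumerate(samples):
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


# Synthetic stream: the mean jumps from 0 to 1 at index 20.
samples = [0.0] * 20 + [1.0] * 20
print(cusum_first_alarm(samples, mu0=0.0, k=0.5, h=2.0))  # prints 24
```

In this example the change occurs at index 20 and the alarm fires at index 24, a detection latency of four samples. Raising `h` lowers the false alarm rate at the cost of a longer latency, which is the trade-off on which detection methods are usually compared.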