Network Anomaly Detection

Network Anomaly Detection Using HDBSCAN, an Autoencoder, and Feature Engineering

Starting my career

After finishing my military service as an Auxiliary Police officer in the Korea National Police Agency, I started a research internship at Hongik University, drawing on the deep learning experience I had gained from NanoDegree courses. I began working on a network anomaly detection project immediately. This was my first data science and machine learning experience outside of class. Those early days were quite challenging: I was only in my second year, and I had just spent two years in the military barely touching a computer. I still remember staying alone in the lab every day, often working through the night. Network packet analysis, network security, and technology research were all new to me, so I had to work extremely hard just to understand the material and get results. Ironically, when I look back on those days, I believe I was happy. For the first time, I became passionate about research through creative work such as exploring new technology.

Review on NAD

The project started as an entry in a competition on network packet anomaly detection using artificial intelligence. However, when I joined the lab, only one week remained before the competition ended. Instead of aiming for the competition, we decided to research a new model and apply it to one of our government projects, which needed network anomaly detection as part of the overall security system for South Korea's education system.

Our project focused on four attacks of interest: DDoS, DoS, brute force, and port scan. Since all four attacks are related to how often port numbers and IP addresses change, features were first extracted manually from this information. However, because security requires near-perfect accuracy, I had to find a new approach. At that time, I had never studied latent spaces, and there was no one to help me study network packet anomaly detection, so it took a long time to arrive at an autoencoder-based method. In the end, I found a way through research and was able to reach over 98% accuracy using HDBSCAN and an autoencoder. The resulting model lacked novelty, so it was not published as a research paper, but it was registered as part of the national project.
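To give a rough idea of the manual feature extraction step, here is a minimal sketch of counting how many distinct destination ports and IP addresses each source touches per time window. This is not the project's actual code: the `Packet` record, the field names, the `window_features` helper, and the one-second window are all assumptions made for illustration, and it uses only the Python standard library.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record; real captures carry many more fields.
    timestamp: float
    src_ip: str
    dst_ip: str
    dst_port: int


def window_features(
    packets: Iterable[Packet], window: float = 1.0
) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Per (source IP, time window), count packets, distinct destination IPs,
    and distinct destination ports. The window length is an assumed value."""
    ips = defaultdict(set)
    ports = defaultdict(set)
    counts = defaultdict(int)
    for p in packets:
        key = (p.src_ip, int(p.timestamp // window))
        ips[key].add(p.dst_ip)
        ports[key].add(p.dst_port)
        counts[key] += 1
    return {
        key: {
            "packets": counts[key],
            "unique_dst_ips": len(ips[key]),
            "unique_dst_ports": len(ports[key]),
        }
        for key in counts
    }


# Toy traffic: one source probing many ports, and one ordinary client.
packets = [Packet(0.1 * i, "10.0.0.5", "10.0.0.9", 1000 + i) for i in range(8)]
packets += [
    Packet(0.2, "10.0.0.7", "10.0.0.9", 443),
    Packet(0.6, "10.0.0.7", "10.0.0.9", 443),
]

features = window_features(packets, window=1.0)
for key, row in sorted(features.items()):
    print(key, row)
```

In this toy data, the source that hits many different ports in one window stands out from the ordinary client, which is the kind of signal a port scan would produce. Grouping by destination instead of source would be the natural variation for DDoS-style traffic. Rows like these could then serve as input to an autoencoder, with HDBSCAN applied to the learned representation, though the exact pipeline used in the project is not described here.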