In this tutorial, we will create a “deep-learning powered” cybersecurity dashboard that simulates monitoring network traffic for malicious events in real time.
Network attacks are a broad category of cybersecurity threats in which a malicious actor attempts to disrupt, steal, or corrupt an organization’s data by gaining unauthorized access to its systems. Detecting them is the proverbial “needle in a haystack” problem: it requires finding rare events in extremely large datasets.
A dataset with hundreds or thousands of dimensions poses further challenges, such as the curse of dimensionality. Similarity search is an approach to understanding high-dimensional data that works by finding the objects in a collection that are most alike under some chosen definition of similarity. You can think of it as a k-Nearest Neighbor (k-NN) problem in which the similarity of two objects is measured by the distance between them (source).
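To make the k-NN framing concrete, here is a minimal sketch in plain Python. It assumes Euclidean distance as the definition of similarity, and the function names and the tiny 3-D collection are made up for illustration; real traffic features would have far more dimensions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, collection, k=3):
    """Return the k (distance, index) pairs closest to the query, nearest first."""
    scored = [(euclidean(query, item), i) for i, item in enumerate(collection)]
    return sorted(scored)[:k]

# Hypothetical toy collection (assumption: not taken from the tutorial's dataset).
collection = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.1, 0.0, 0.2],
    [5.0, 5.0, 5.0],
]
query = [0.0, 0.1, 0.1]

neighbors = k_nearest(query, collection, k=2)
print(neighbors)  # the two smallest distances, each paired with its item index
```

A smaller distance means a more similar object, so the first entry in `neighbors` is the best match for the query.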

In this series of blog posts, we will build a Jina application that leverages similarity search to classify network traffic flows as either benign or malicious. Our goal is to develop a reliable, scalable, and speedy intrusion detection system that detects attacks in real time.
To pull this off, we will perform “network surgery” on a pre-trained neural network, removing the classification layer, and instead repurposing the network as a feature extractor. In other words, our network will output features, as opposed to labels.
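The idea can be illustrated with a framework-free toy. This is only a sketch under loud assumptions: the two-layer network, its random untrained weights, and the 20 input features are hypothetical stand-ins, not the tutorial's actual pre-trained model. Only the 128-D feature size matches the embeddings used in this tutorial.

```python
import math
import random

random.seed(0)  # reproducible toy weights

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(n_in, n_out, activation):
    """Build a fully connected layer with random (untrained) weights."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out

    def forward(x):
        return [activation(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return forward

def run(layers, x):
    """Pass x through each layer in order."""
    for layer in layers:
        x = layer(x)
    return x

# Stand-in for the pre-trained classifier (assumption: 20 input features).
classifier = [
    dense(20, 128, relu),     # hidden layer producing 128-D features
    dense(128, 1, sigmoid),   # classification layer
]

# "Network surgery": drop the final classification layer, keep everything before it.
feature_extractor = classifier[:-1]

flow = [random.random() for _ in range(20)]   # one fake traffic-flow record
label_score = run(classifier, flow)           # a single class score
embedding = run(feature_extractor, flow)      # a 128-D feature vector
print(len(label_score), len(embedding))       # 1 128
```

The weights in the remaining layers are untouched, so the extractor produces exactly the features the classification layer used to receive.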

Then, we will take the 128-D embeddings generated by our feature extractor and make them searchable by indexing them using a Jina Flow. By indexing thousands of these 128-D vectors along with their labels (benign/malicious), we can capitalize on the relationship between distance and similarity in vector space: embeddings that lie close together represent similar traffic.
This will allow us to take unseen network traffic data from a different day, extract its features, and determine whether it is benign or malicious by finding its nearest neighbor in the index and assigning it that neighbor's class.
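The classification step can be sketched in a few lines. This is a brute-force scan in plain Python, not the Jina Flow API; the `classify` helper and the 3-D vectors standing in for 128-D embeddings are hypothetical, chosen only to keep the example readable.

```python
import math

def classify(query, index):
    """Return the label of the indexed embedding closest to the query (1-NN)."""
    _, nearest_label = min(index, key=lambda entry: math.dist(query, entry[0]))
    return nearest_label

# Hypothetical labeled index: (embedding, label) pairs.
index = [
    ([0.9, 0.1, 0.0], "benign"),
    ([0.8, 0.2, 0.1], "benign"),
    ([0.1, 0.9, 0.7], "malicious"),
]

unseen = [0.2, 0.8, 0.6]   # embedding of a flow from a different day
prediction = classify(unseen, index)
print(prediction)          # malicious
```

Scanning every entry is fine for a toy index, but it scales linearly with the number of indexed vectors, which is one reason to hand the search to a dedicated indexing layer.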
To recap, we are going to make a slight tweak to a pre-trained neural network and turn a classification problem into a similarity search problem so that we can simulate detecting malicious network traffic in real time.

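Here are the steps involved:
1. Remove the classification layer from a pre-trained neural network so that it outputs 128-D features instead of labels.
2. Index the resulting embeddings, along with their benign/malicious labels, using a Jina Flow.
3. Extract features from unseen traffic and classify each flow by the label of its nearest neighbor in the index.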
This project won’t build itself! Let's get started already and check out our dataset.
