can assume that the vast majority of our data will be normal traffic, without any anomalies
present. This kind of data is well suited to autoencoder models, which learn the
uniform structure of normal data and then recognise anomalies by how far they deviate
from it.
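The idea of flagging anomalies by their deviation from learned normal structure can be illustrated with a minimal sketch. It is not a model built in this thesis: it swaps in a linear autoencoder, computed in closed form with PCA via SVD, for a neural autoencoder, and it runs on synthetic data. The function names, the five synthetic features, the two-dimensional bottleneck and the 99th-percentile threshold are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X_normal, n_components):
    """Fit a linear encoder/decoder (PCA) on data assumed to be normal traffic."""
    mean = X_normal.mean(axis=0)
    # Rows of vt are the principal directions of the centred normal data.
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Squared error between each row and its encode-then-decode reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical "normal traffic": 5 features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

# Hypothetical "attacks": points that do not follow the normal structure.
X_anomalous = rng.normal(scale=3.0, size=(20, 5))

mean, components = fit_linear_autoencoder(X_normal, n_components=2)

# Assumed rule: flag anything above the 99th percentile of the normal errors.
threshold = np.percentile(reconstruction_error(X_normal, mean, components), 99)
flags_normal = reconstruction_error(X_normal, mean, components) > threshold
flags_anomalous = reconstruction_error(X_anomalous, mean, components) > threshold

print("flagged normal fraction:", flags_normal.mean())
print("flagged anomalous fraction:", flags_anomalous.mean())
```

Note that the threshold here is chosen only from data assumed to be normal, so the sketch needs no attack labels for training; judging how well it separates the two groups still requires knowing which points are attacks.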
A project like that would suit more advanced research, which could cover the feature
extraction, the data preparation and the development of advanced unsupervised models,
since without labels no supervised methods can be applied. Still, that kind of development
would have problems and limitations too. Namely, there is no way to know whether the model
works well, as we cannot compute the rates of TP, FP, TN and FN classifications without
knowing for certain which data points are normal and which are abnormal traffic. Testing the
model would require us either to create synthetic attack data, or to find attack records in
available datasets and pre-process them so that they are consistent with our own unlabelled
data before running them through the model. This kind of work is too complex and meticulous
for the level of a master's thesis.
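To make the dependence on ground truth concrete, the following minimal sketch counts TP, FP, TN and FN for a detector. The label and prediction lists are hypothetical (1 = attack, 0 = normal) and the function name is an assumption; the point is only that none of these counts can be computed without the `y_true` labels.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN and FN for binary labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical ground truth and detector output for ten records.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)            # share of raised alarms that were real attacks
recall = tp / (tp + fn)               # share of real attacks that were detected
false_positive_rate = fp / (fp + tn)  # share of normal records that raised an alarm

print(tp, fp, tn, fn)  # 3 1 5 1
print(precision, recall, false_positive_rate)
```

With unlabelled traffic, `y_true` does not exist, which is why a labelled test set, whether synthetic or borrowed from another dataset, would be needed before any of these rates could be reported.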
◦◦◦◦◦
To sum up, despite the challenges and limitations of anomaly detection in the domain of
network security, research in the field advances along with machine learning and AI, applying
the same innovative methods. Unsupervised learning can be more useful in the real world, since
the data in use do not need to be labelled, and it can detect unknown and novel attacks. In
spite of the new wave of research focused on unsupervised learning in the past couple of years,
there are still problems and limitations, as mentioned above, and supervised learning, or at
least semi-supervised or self-supervised learning, still remains the most effective approach to
studying network security.
Even though the five algorithms are somewhat basic and outdated, this thesis provides results
at a satisfactory level, not far behind state-of-the-art experiments. Its strengths are that it
uses one of the most popular datasets available, analyses it thoroughly, and then compares the
performance of five supervised learning algorithms that are still very commonly used for
classification problems and anomaly detection. Even with this kind of approach, the results of
our classifiers are very close to state-of-the-art research, which indicates how useful these
methods still are for anomaly detection.
There are many ways in which the project can be improved in the future, either by developing
more advanced models and moving to unsupervised learning solutions, or by creating a new
dataset with similar features and applying unsupervised methods to it.