In absolute numbers (Table 1), there are a few differences between the training and test sets.
Table 1: Distribution of traffic in the training and test datasets

Type of traffic | # in training set | % in training set | # in test set | % in test set
normal          | 67343             | 53.46%            | 9711          | 43.08%
DoS             | 45927             | 36.46%            | 7460          | 33.09%
Probe           | 11656             | 9.25%             | 2885          | 12.79%
R2L             | 995               | 0.79%             | 2421          | 10.74%
U2R             | 52                | 0.04%             | 67            | 0.30%
Overall, both datasets are skewed towards normal and DoS traffic. In the test set, however, normal traffic makes up less than half of the total (43.08%), and the R2L class is correspondingly boosted compared with the training set, rising from 0.79% to 10.74% of the records. This uneven distribution is a realistic representation of typical internet traffic, where DoS attacks are the most common, followed by Probe attacks, while R2L and U2R attacks are rarely encountered in real life.
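As a quick check on Table 1, the percentages can be recomputed from the raw counts, which also makes the shift between the two sets explicit. The following is a minimal Python sketch; the names TRAIN_COUNTS, TEST_COUNTS and class_shares are hypothetical, and the counts are copied by hand from Table 1 rather than loaded from the NSL-KDD files.

```python
# Minimal sketch: recompute the class shares of Table 1 from the raw counts.
# TRAIN_COUNTS, TEST_COUNTS and class_shares are hypothetical names; the numbers
# are copied from Table 1, not loaded from the NSL-KDD files.

TRAIN_COUNTS = {"normal": 67343, "DoS": 45927, "Probe": 11656, "R2L": 995, "U2R": 52}
TEST_COUNTS = {"normal": 9711, "DoS": 7460, "Probe": 2885, "R2L": 2421, "U2R": 67}


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each class's share of all records, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    train_pct = class_shares(TRAIN_COUNTS)
    test_pct = class_shares(TEST_COUNTS)
    print(f"totals: train={sum(TRAIN_COUNTS.values())}, test={sum(TEST_COUNTS.values())}")
    for label in TRAIN_COUNTS:
        # A ratio above 1 means the class is over-represented in the test set.
        ratio = test_pct[label] / train_pct[label]
        print(f"{label:<7} train {train_pct[label]:6.2f}%  "
              f"test {test_pct[label]:6.2f}%  ratio {ratio:5.2f}")
```

Run as a script, it reports 125973 training and 22544 test records, and shows that the share of R2L grows by a factor of roughly 13.6 between the two sets.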
A more detailed analysis requires addressing each attack individually. DoS attacks are the most frequent in both datasets, but in terms of the number of distinct attacks each class includes, R2L comes first. Table 2 lists all the labels included in the NSL-KDD, grouped by class:
Table 2: All attack labels of the NSL-KDD, by class

Class | Attacks                                                                                                                                                   | Total
R2L   | ftp_write, guess_passwd, httptunnel, imap, multihop, named, phf, sendmail, snmpgetattack, spy, snmpguess, warezmaster, warezclient, xlock, xsnoop | 15
DoS   | apache2, back, land, neptune, mailbomb, pod, processtable, smurf, teardrop, udpstorm, worm                                                                 | 11
U2R   | buffer_overflow, loadmodule, perl, ps, rootkit, sqlattack, xterm                                                                                           | 7
Probe | ipsweep, mscan, nmap, portsweep, saint, satan                                                                                                              | 6
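Since the per-class analysis depends on mapping each raw attack label to one of the four classes, Table 2 can be expressed directly as a lookup table. The Python sketch below assumes the labels are available as plain strings spelled exactly as in Table 2; ATTACK_CLASSES, LABEL_TO_CLASS, to_class and class_counts are hypothetical names introduced for illustration.

```python
from collections import Counter
from collections.abc import Iterable

# Attack labels grouped by class, as listed in Table 2 (hypothetical name).
ATTACK_CLASSES = {
    "R2L": ["ftp_write", "guess_passwd", "httptunnel", "imap", "multihop", "named",
            "phf", "sendmail", "snmpgetattack", "spy", "snmpguess", "warezmaster",
            "warezclient", "xlock", "xsnoop"],
    "DoS": ["apache2", "back", "land", "neptune", "mailbomb", "pod", "processtable",
            "smurf", "teardrop", "udpstorm", "worm"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "ps", "rootkit", "sqlattack",
            "xterm"],
    "Probe": ["ipsweep", "mscan", "nmap", "portsweep", "saint", "satan"],
}

# Invert the table into label -> class; "normal" traffic maps to its own class.
LABEL_TO_CLASS = {label: cls for cls, labels in ATTACK_CLASSES.items() for label in labels}
LABEL_TO_CLASS["normal"] = "normal"


def to_class(label: str) -> str:
    """Map a raw label to its traffic class; raises KeyError for unknown labels."""
    return LABEL_TO_CLASS[label]


def class_counts(labels: Iterable[str]) -> Counter:
    """Count how many records fall into each class for a sequence of raw labels."""
    return Counter(to_class(label) for label in labels)
```

For example, class_counts(["normal", "neptune", "smurf", "satan", "guess_passwd"]) counts two DoS records and one each of normal, Probe and R2L. Raising KeyError on an unseen label, rather than silently assigning it a class, makes any label missing from Table 2 visible immediately.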
The dataset was also studied with each attack classified separately, so it was worthwhile to investigate how many records of each attack it contains. The per-attack counts for the training and test sets are shown in Figure 9. When the attacks are studied separately, normal traffic outnumbers every other label by far, followed by Neptune, the most frequent DoS attack. It is also notable that in the training set, only 23 labels