Abstract
Internet traffic classification poses many problems for
applications such as workload characterization and modeling,
capacity planning, route provisioning and network
management. For network management, the network manager
monitors the traffic and observes the negative and positive
effects on the network infrastructure as traffic changes from
non-P2P to P2P.
The network manager uses flow prioritization, traffic
policing and diagnostic monitoring to make decisions. Accurate
network management is possible only through accurate traffic
classification. Internet traffic classification is used in
many areas, such as network management and operation,
network design, Quality of Service, traffic control and network
security, where it helps the network administrator handle
the network efficiently. Some features of a data set are irrelevant and
redundant, and they often degrade the accuracy of
most ML algorithms. Feature selection reduces the number of
features in a data set, which shortens the time required for
model generation and removes noise, leading to better
performance. Traditional Internet traffic classification, such as
port-based, payload-based and heuristic methods, fails to identify
new versions of P2P applications. Early P2P systems usually
use TCP with fixed ports, whereas newer P2P
applications can use both TCP and UDP connections with
arbitrary ports. Researchers have therefore applied another technique
which is based on statistical features and is independent of
the above methods. The statistical features include packet
inter-arrival time, packet length, total number of packets and
mean packet size. Machine Learning classification algorithms
that are based on statistical features and are independent of
port- and payload-based methods fall into two categories:
(i) supervised and (ii) unsupervised.
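
As an illustration of the statistical features named above, the following minimal Python sketch derives them for a single flow. It is not the feature extraction used in this work: the function name flow_features, the input layout (a list of timestamp and packet-length pairs in arrival order) and the sample values are all assumptions made for the example.

```python
from statistics import mean


def flow_features(packets):
    """Compute simple statistical features for one flow.

    `packets` is a hypothetical input format: a list of
    (timestamp_seconds, length_bytes) tuples in arrival order.
    """
    if not packets:
        raise ValueError("flow has no packets")

    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival times are the gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "total_packets": len(packets),
        "mean_packet_size": mean(sizes),
        "min_packet_size": min(sizes),
        "max_packet_size": max(sizes),
        # A single-packet flow has no gaps, so fall back to 0.0.
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


# Made-up sample flow: four packets over two seconds.
example_flow = [(0.0, 60), (0.5, 1500), (1.5, 1500), (2.0, 40)]
features = flow_features(example_flow)
```

Each flow then becomes one feature vector, which is the kind of input that either a supervised or an unsupervised classifier would consume.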