What is an outlier?
An outlier is a data point that doesn't fit the general trend of the data but is still supposed to be there, whereas an anomaly is an outlier that is not supposed to be in the data at all. In practice, however, these two terms are often used interchangeably.
Classes of outliers and how to spot them
There are classes that outliers can fit into. One of these is the context-based outlier, where a data point fits the normal trend of the data, but given the context of the event it is an anomaly. An example would be the same amount of traffic on Christmas Day as there would be on a normal day: given the context of it being a holiday, the traffic should be lower, so this may suggest some unauthorised traffic activity going on.
Techniques like machine learning can be used to spot anomalies, because machines can spot more complex patterns than a human eye ever could. However, the success of a machine learning approach depends on the model being used and the data provided to it.
Categories of machine learning
- Supervised learning = when all the data fed into the model is labelled with its outcome. This approach requires the most work, as the data usually needs to be labelled by a human.
- Semi-supervised learning = when the data is only partially labelled, for example only the normal data is labelled and none of the anomalous data is.
- Unsupervised learning = when none of the data is labelled and it is the model's job, after training, to sort the data into classes.
Isolation forest
An isolation forest is an unsupervised machine learning method that builds trees by repeatedly splitting the dataset on a random feature at a random value until every data point is isolated from the others. Outliers can then be detected by finding the points that end up closest to the root of a tree: an anomaly is easier to isolate from the other data points, so it takes fewer splits and therefore sits nearer the root.
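As a minimal sketch of the idea, assuming scikit-learn is installed (the dataset and parameter values below are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative dataset: a dense normal cluster plus a few far-away points.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=-6.0, high=6.0, size=(5, 2))
X = np.vstack([normal, outliers])

# n_estimators is the number of random trees built; values here are illustrative.
model = IsolationForest(n_estimators=100, random_state=42)
model.fit(X)

labels = model.predict(X)         # 1 = normal, -1 = outlier (isolated in few splits)
print(np.where(labels == -1)[0])  # indices of the points flagged as outliers
```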
Pros
- Isn't strongly affected by the number of dimensions
- Scales well to large datasets
- Unsupervised, so it doesn't require the data to be labelled
Cons
- Doesn't perform well on dense data, where an anomaly has similar features to a normal data point
- Hyperparameters need to be tuned manually to get the best results from the model
One-class support vector machine
A one-class support vector machine is a machine learning algorithm trained on (mostly) normal data rather than on labelled examples of both classes. It finds a hyperplane that separates the training data from the origin with the maximum margin, effectively drawing a boundary around the normal class. When new data comes in, it is classified as normal or anomalous depending on which side of that boundary it falls on.
Kernel map
A one-class support vector machine can use a kernel map to implicitly lift the data into a higher-dimensional space. A hyperplane that is linear in that space corresponds to a non-linear boundary in the original space, so more complex class shapes can be formed and more complex relationships supported, as in the sketch below.
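As a minimal sketch, assuming scikit-learn is installed, a one-class SVM with an RBF kernel can be trained on normal data like this (the nu and gamma values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train on (mostly) normal data only; illustrative synthetic cluster.
rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# The RBF kernel map lets the learned boundary be non-linear in the original space.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(normal_train)

new_points = np.array([[0.1, -0.2],   # close to the normal cluster
                       [5.0, 5.0]])   # far outside the learned boundary
print(model.predict(new_points))      # 1 = normal side of the boundary, -1 = anomaly
```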
Pros
- Can handle high-dimensional data, although performance still degrades as the number of dimensions grows
- Doesn't need all the data to be labelled; training only requires (mostly) normal data
- Can produce non-linear class boundaries thanks to kernel maps
Cons
- Doesn't scale well to large datasets
- Noise affects the class boundaries, because noisy training points shift where the boundary is drawn
- Hyperparameters need to be manually selected to get good results
Deep learning
Deep learning is a subset of machine learning that uses neural networks to perform modelling tasks. Each neuron in the network takes inputs from the previous layer, each input having a weight, and a bias is added to the weighted sum of those inputs. The neuron then applies an activation function. The activation function is what lets the network spot non-linear relationships: without it, only linear relationships could be found, because the combination of many weights and biases is still linear.
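As a minimal sketch of a single neuron (the inputs, weights, bias, and sigmoid activation below are illustrative, not from any particular network):

```python
import numpy as np

def neuron(x, w, b, activation):
    z = np.dot(w, x) + b   # weighted sum of the inputs from the previous layer, plus bias
    return activation(z)   # the non-linearity that lets networks model non-linear patterns

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # one common activation function

out = neuron(x=np.array([0.5, -1.2, 3.0]),
             w=np.array([0.8, 0.1, -0.4]),
             b=0.2,
             activation=sigmoid)
print(out)
```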
Deep learning can be used to consider far more features than a human could ever comprehend when looking for anomalies in the data. However, more features increase the risk of overfitting the model, which results in bad performance on data that has not been used to train it and makes the model pointless. There therefore needs to be an appropriate mechanism in place to stop training at the right point, such as early stopping, which halts training once the model stops improving for a set number of passes.
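As a minimal sketch of early stopping, with a dummy validation loss standing in for a real model's loss on held-out data:

```python
import random

# Dummy stand-in: in practice this would be your model's loss on held-out validation data.
def validation_loss(epoch):
    return max(0.1, 1.0 / (epoch + 1)) + random.uniform(0.0, 0.05)

best_loss = float("inf")
patience = 5                     # how many epochs without improvement we tolerate
epochs_without_improvement = 0

for epoch in range(200):
    # ... one training pass over the data would happen here ...
    loss = validation_loss(epoch)
    if loss < best_loss:
        best_loss = loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= patience:
        print(f"Stopping early at epoch {epoch} (best loss {best_loss:.3f})")
        break                    # stop before the model overfits the training data
```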
ReLU
ReLU is short for rectified linear unit. It is a non-linear activation function that sets all negative values to 0 and leaves positive values unchanged. It is a very simple function that uses little computation, and when used across many layers it can represent complex relationships in the data. However, because the positive side is not capped, it runs the risk of exploding activations: the output of each layer can keep getting bigger (explode), which results in incorrect outputs because the model was not designed to handle such large values.
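As a minimal sketch, ReLU is a one-liner with NumPy:

```python
import numpy as np

# ReLU: negative values become 0, positive values pass through unchanged.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # -> [0.  0.  0.  1.5 3. ]
```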