开发者社区> 芷沁> 正文
阿里云
为了无法计算的价值
打开APP
阿里云APP内打开

Machine Learning-based Web Exception Detection

简介: The next-generation web intrusion detection technology based on machine learning technology is expected to make up for the deficiency in the tradition
+关注继续查看

Machine_learning_based_web_exception_detection

Web firewalls are the first line of defense when it comes to information security. With rapid updates in network technologies, new hacking tricks are also emerging, bringing forth challenges for firewalls following traditional rules. Traditional web intrusion detection techniques intercept intrusion access by maintaining rule sets. On the one hand, hard rules are easy to bypass by sly hackers, and rule sets based on common knowledge have difficulty coping with 0-day attacks. On the other hand, there is a high threshold and there are high costs for defenders to set up and maintain rules.

The next-generation web intrusion detection technology based on machine learning technology is expected to make up for the deficiency in the traditional rule set approach, and bring new developments and breakthroughs for defenders in the web confrontation.

Machine learning methods can carry out automated learning and training based on massive data and have seen wide application to image, voice, and natural language processing as well as other aspects.

However, there are challenges to apply machine learning to web intrusion detection, the biggest one being the lack of label data. Despite the large amount of normal access traffic data, web intrusion samples are scarce and varied, making it difficult to learn and train models. Therefore, most of the current web intrusion detection approaches rely on unsupervised methods, naming establishing profiles for a large number of normal logs, and accesses that are at variance with normal traffic - identified as exceptions. The idea behind this is just the opposite to the construction of interception rules. Interception rules aim to identify intrusions, and thus need to "trim the sails" in confrontation. However, the profile-based approach is, by design, intended to profile normal traffic, to "meet changes with constancy" in confrontation, and thus is more difficult to bypass.

01

Exception detection-based web intrusion recognition requires to abstract the URL-specific statistics or machine learning profiles that can describe the sample sets based on a large number of normal samples in the training phase. In the detection phase, the approach identifies the exception by determining whether the web access is in line with the profile.

02

There are several key ideas for establishing a profile, mentioned below:

1. Statistics-based learning model
Statistics-based learning web exception detection usually requires numerical extraction and an analysis of the features of normal data traffic. Features include the number of URL parameters, the mean and variance of the length of the parameter value, the parameter character distribution, the access frequency of the URL, and so on. Furthermore, it requires the establishment of mathematical models through statistical analysis on the distribution of features in a large number of samples, and exception detection to proceed via statistical methods.

2. Text analysis-based machine learning model
Web exception detection is a part of the final analysis based on the analysis of log text, thus we can learn some of the ideas in NLP for text analysis and modeling. Specifically, one of the successful practices is the Hidden Markov Model (HMM)-based parameter value exception detection.

3. The one-class model-based
As black samples of web intrusions are few, it is hard to conduct training in traditional supervised learning methods. The white sample-based exception detection, however, enables sample learning through unsupervised or one-class models to construct the minimal model that can completely express the white samples as the profile for exception detection.

4. The clustering model-based
In general, normal traffic has quite a repetition of visitors, with extremely rare intrusion actions. Therefore, through clustering analysis on web access, we can catch hold of a small number of exceptions amongst a large number of normal behaviors to identify intrusions.

03

Statistics-based learning model

The statistics-based learning model approach mainly involves establishing a feature set for the data, and then carrying out statistical modeling for each feature. For the testing sample, first calculate the abnormality degree of every feature, then use the model to fuse and grade the outliers, and finally treat the results as the final basis for exception detection.

Here we take an excerpt from the Stanford University CS259D: Data Mining for Cyber Security course [1] as an example to introduce some effective features and exception detection methods.

Feature 1: Length of parameter value

Model: The length distribution
The mean value is μ, and the variance is σ2. Use Chebyshev's inequality to calculate the outlier p
04

Feature 2: Character distribution

Model: Create a model for character distribution
Use a chi-squared test to calculate the outlier p.
05

Feature 3: Missing parameter

Model: Create a parameter table
Check erroneous or missing parameters through the look-up table method.

Feature 4: Parameter sequence

Model: Directed graph of the order of parameters to determine whether there is a violation of the order.

06

Feature 5: Access frequency (the access frequency of an individual IP address, and the total access frequency)

Model: The access frequency distribution within a given period.
The mean is μ, and the variance is σ2. Use Chebyshev's inequality to calculate the outlier p.

Feature 6: Access interval

Model: The interval time distribution. Use a chi-squared test to calculate the outlier p.
Finally, use multiple feature outliers to fuse the exception grading model to get the final exception score:

07

Text analysis-based machine learning model

Behind the URL parameter input is the background code parsing. In general, the value of each parameter has a range, and the allowable input follows a certain pattern. See the example below:

08

In the example, the green logs represent the normal traffic flows, and the red represents the abnormal traffic flow. As the abnormal traffic flow has very similar parameters, value length, and character distributions to those of the normal traffic flows, using the above feature statistical method, it is difficult to tell them apart. In addition, the normal traffic flows differ from one another, but they share a common pattern, while the abnormal traffic does not match the pattern. In this example, the sample pattern that matches the value is: numbers_letters_numbers, and we can use a state machine to express the valid value range:

09

The text sequence pattern modeling is more accurate and reliable than numerical features. Among these, a suitable application is the sequence modeling based on Hidden Markov Model (HMM), which we will briefly introduce here.

In the HMM-based state sequence modeling, first, you need to convert the original data to a state. For example, 'N' represents the number to indicate the state, and 'a' represents the letter to indicate the state, while other characters remain unchanged. This step can also be the normalization of the original data. The result is that we have effectively managed to compress the state space of the original data and have further reduced the gap between normal samples.

10

As the next step, we calculate the probability distribution of the state after each state. For example, a possible result is as depicted in the figure below. The "^" represents the begin symbol. Since white samples all start with numbers, the probability of the start symbol (state ^) to shift to the number (state N) is 1. Now, the next state of the number (state N) has a probability of 0.8 to be a number (state N) again, a probability of 0.1 to shift to an underscore, and a probability of 0.1 to shift to the terminator (state $), and so on.

11

Using this state transition model, we can determine whether an input sequence conforms to the white sample pattern:

12

The occurrence probability of the state sequence of normal samples is higher than that of the abnormal samples. You can perform the abnormity identification by setting an appropriate threshold.

The one-class model-based

In binary classification issues, since we only have a large number of white samples, we can consider the one-class models to learn the minimum frontier of the single-class samples. Accesses outside of the frontier will fall in the exceptions category.

13

A successful application among such methods is the one-class support vector machine (one-class SVM). Here is a brief introduction of the idea of McPAD, a successful case utilizing this method.

McPAD system first vectorizes the text data through the N-Gram, as in the example below:

14

First, divide the text into N-Gram sequences by a sliding window of length N. In this example, N is 2 and the sliding window step length is 1. You will obtain the following N-Gram sequence.

15

In the next step, we will convert the N-Gram sequence into a vector. Suppose there are 256 different characters, and we will get 256 * 256 kinds of 2-GRAM combinations (such as aa, ab, ac...). We can use a 256 * 256 long vector, and each bit is one-hot encoded (1 for a high bit and 0 for a low bit) to represent whether the 2-GRAM appears in the text. Then we will get a 256 * 256 long 0/1 vector. In the next step, we use the occurrence frequency of this 2-Gram in the text to replace the monotonous "1" for each 2-Gram to deliver more information:

16

At this point, we can represent each text by a 256 * 256 long vector.

17

Now we get the 256 * 256-vector set of the training sample, and need to identify the minimal frontier through the one-class SVM. The problem, however, is that the dimension size of the sample is too high, posing a difficulty in training. Another problem that needs a solution is reduction in the feature dimensionality. There are many mature methods in feature dimension reduction, and the McPAD system adopts the clustering approach for features to reduce the dimensionality.

The clustering model-based

18

In the upper left matrix, black blocks represent 0 and red blocks represent nonzero values. Each row in the matrix represents the 2-Grams in each input text (sample). If we look at this matrix from another angle, each column represents the samples existing in a 2-Gram. Therefore, we can express each 2-Gram by the sample vector. From this perspective, we can get 2-Gram relevance. In the 2-Gram vector clustering, the specified number of classes 'K' is the number of feature dimensions present after dimension reduction. We can then apply the reduced feature vectors into the one-class SVM for further model training.

Furthermore, you can replace the process of McPAD adopting the linear feature reduction + one-class SVM to address the white model training by the in-depth auto-encoding model in deep learning to conduct nonlinear feature reduction. At the same time, the auto-encoding model training process entails learning the compression expression of the training sample and determining whether the input sample is consistent with the model through the reconstruction error of the given input.

19

Now, we will continue with the text vectorization method of McPAD through 2-Gram to directly input the vectors into the in-depth auto-encoding model for training. In the test phase, we will calculate the reconstruction error to serve as the criterion for exception detection.

20

Based on this framework, the basic flow of exception detection is as follows.

21

Summary

This article is a glimpse of several ideas for leveraging machine learning in web exception detection. Web traffic exception detection is only a part of the web intrusion detection, and serves to catch a small amount of "suspicious" behaviors from the massive logs. However, numerous false alerts also arise in this "small amount", and the approach is far from being capable of directly delivering in WAF direct interception. A sound web intrusion detection system should also incorporate intrusion identification on this basis, as well as false alerts reduction.

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

相关文章
《The 8 Neural Network Architectures Machine Learning Resarchers Need to Learn》电子版地址
The 8 Neural Network Architectures Machine Learning Resarchers Need to Learn
0 0
Optimization of Machine Learning
机器学习就是需要找到模型的鞍点,也就是最优点。因为模型很多时候并不是完全的凸函数,所以如果没有好的优化方法可能会跑不到极值点,或者是局部极值,甚至是偏离。
905 0
Learning Machine Learning, Part 3: Application
This post features a basic introduction to machine learning (ML). You don’t need any prior knowledge about ML to get the best out of this article.
1945 0
Learning Machine Learning, Part 1: An Introduction
This post features a basic introduction to Machine Learning. This post on Machine Learning will not only help you to understand the latest trends in .
2636 0
(转) Graph-powered Machine Learning at Google
    Graph-powered Machine Learning at Google     Thursday, October 06, 2016 Posted by Sujith Ravi, Staff Research Scientist, Googl...
1087 0
Stack based vs Register based Virtual Machine Architecture
进程虚拟机简介 一个虚拟机是对原生操作系统的一个高层次的抽象,目的是为了模拟物理机器,本文所谈论的是基于进程的虚拟机,而不是基于系统的虚拟机,基于系统的虚拟机可以用来在同一个平台下去运行多个不同的硬件架构的操作系统,常见的有kvm,xen,vmware等,而基于进程的虚拟机常见的有JVM,PVM(python虚拟机)等,java和python的解释器将java和python的代码编译成JVM和P
2497 0
Distributed Machine Learning Toolkit (DMTK)
http://www.dmtk.io/download.html
441 0
Machine Learning学习笔记(1)Introduction
1、机器学习可以做什么? 搜索引擎、垃圾邮件过滤、人脸识别等等,不仅用于人工智能领域,生物、医疗、机械等很多领域都有应用。 2、机器学习的定义 A computer program is said to learn from experience E with respect to some task T and some performance measure P
1344 0
+关注
芷沁
https://www.alibabacloud.com/blog/
文章
问答
文章排行榜
最热
最新
相关电子书
更多
Applying Machine Learning to Construction
立即下载
Applying Machine Learning to
立即下载
Snorkel--Easier-to-use Machine Learning Systems
立即下载