HMM, MEMM, and CRF: A Comparative Analysis of Statistical Modeling Methods

Summary: HMM, MEMM, and CRF are three popular statistical modeling methods, often applied to pattern recognition and machine learning problems.

This article presents a comparative analysis of the Hidden Markov Model (HMM), Maximum Entropy Markov Models (MEMM), and Conditional Random Fields (CRF). All three are popular statistical modeling methods, often applied to pattern recognition and machine learning problems. Let us explore each method in further detail.

Hidden Markov Model (HMM)

The word “Hidden” reflects the fact that only the symbols emitted by the system are observable, while the underlying random walk between states cannot be seen. Many in this field regard an HMM as a finite state machine.

Figure 1: Hidden Markov Model (HMM)
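
To make the structure concrete, below is a minimal sketch of a two-state HMM and the forward algorithm, which sums the joint probability over all hidden state paths. The transition matrix A, emission matrix B, and initial distribution pi are toy values chosen for illustration, not taken from the article.

```python
import numpy as np

# A minimal two-state, two-symbol HMM (toy numbers for illustration).
A = np.array([[0.7, 0.3],   # A[i, j] = P(next state j | current state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # B[i, k] = P(emitting symbol k | state i)
              [0.2, 0.8]])
pi = np.array([0.6, 0.4])   # initial state distribution

def forward(obs):
    """P(observation sequence): the joint P(X, Y) summed over all hidden paths Y."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 0]))  # likelihood of observing symbols 0, 1, 0
```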

Advantages of HMM

HMM has a strong statistical foundation and efficient learning algorithms; learning can take place directly from raw sequence data. It allows consistent treatment of insertion and deletion penalties in the form of locally learnable methods and can handle inputs of variable length. HMMs are the most flexible generalization of sequence profiles. They can also perform a wide variety of operations, including multiple alignment, data mining and classification, structural analysis, and pattern discovery, and they are easy to combine into libraries.

Disadvantages of HMM

  • HMM depends only on each state and its corresponding observed object:

    Sequence labeling, in addition to having a relationship with individual words, also relates to aspects such as the observed sequence length, word context, and others.

  • The target function and the predicted target function do not match:

    HMM learns the joint distribution P(Y, X) of the states and the observed sequence, while the prediction problem requires the conditional probability P(Y|X).
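
The two quantities are related by Bayes' rule: a model of the joint distribution must also account for P(X), which the prediction task never uses. In standard notation:

```latex
P(Y \mid X) \;=\; \frac{P(Y, X)}{P(X)} \;=\; \frac{P(Y, X)}{\sum_{Y'} P(Y', X)}
```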

Maximum Entropy Markov Models (MEMM)

Figure 2: Maximum Entropy Markov Models

MEMM takes into account the dependencies between neighboring states and the entire observed sequence, and hence has better expressive power. MEMM does not model P(X), which reduces the modeling workload and ensures consistency between the target function and the estimated function.
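
In the usual linear-chain notation (feature functions f and weights w, which the article does not spell out), each MEMM step is a locally normalized maximum-entropy classifier:

```latex
P(y_t \mid y_{t-1}, x) \;=\; \frac{\exp\!\big(w^{\top} f(y_t, y_{t-1}, x)\big)}{Z(y_{t-1}, x)},
\qquad
Z(y_{t-1}, x) \;=\; \sum_{y'} \exp\!\big(w^{\top} f(y', y_{t-1}, x)\big)
```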

MEMM Labeling Bias

Figure 3: Viterbi algorithm decoding of MEMM

In Figure 3, State 1 tends to transition to State 2, while State 2 tends to stay at State 2.

P(1 -> 1 -> 1 -> 1) = 0.4 × 0.45 × 0.5 = 0.09
P(2 -> 2 -> 2 -> 2) = 0.2 × 0.3 × 0.3 = 0.018
P(1 -> 2 -> 1 -> 2) = 0.6 × 0.2 × 0.5 = 0.06
P(1 -> 1 -> 2 -> 2) = 0.4 × 0.55 × 0.3 = 0.066

However, the optimal state transition path is 1 -> 1 -> 1 -> 1. Why?

It is because State 2 has more outgoing transitions than State 1 does, so each individual transition probability out of State 2 is diluted. Under local normalization, MEMM tends to favor states with fewer outgoing transitions. This is termed the labeling bias problem, which CRF addresses well, as shown in the check below.
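
As a quick check, the following sketch reproduces the four path probabilities above by brute force. The per-step transition probabilities are transcribed from Figure 3; transitions the figure does not show are omitted.

```python
# trans[t][(i, j)] = P(state j at step t+1 | state i at step t), per Figure 3.
trans = [
    {(1, 1): 0.40, (1, 2): 0.60, (2, 2): 0.20},
    {(1, 1): 0.45, (1, 2): 0.55, (2, 1): 0.20, (2, 2): 0.30},
    {(1, 1): 0.50, (1, 2): 0.50, (2, 2): 0.30},
]

def path_probability(path):
    """Multiply the per-step transition probabilities along a state path."""
    p = 1.0
    for t in range(len(path) - 1):
        p *= trans[t][(path[t], path[t + 1])]
    return p

for path in [(1, 1, 1, 1), (2, 2, 2, 2), (1, 2, 1, 2), (1, 1, 2, 2)]:
    print(path, round(path_probability(path), 4))
# (1, 1, 1, 1) wins with 0.09: under local normalization, state 1's fewer
# outgoing transitions keep each step's probability high.
```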

Conditional Random Fields (CRF model)

Figure 4: Conditional Random Fields (CRF)

The CRF model addresses the labeling bias issue and eliminates two unreasonable assumptions in HMM. Of course, the model has also become more complicated.

MEMM adopts local normalization (each transition distribution is normalized per state), while CRF adopts global normalization (a single partition function over the whole label sequence).
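
In the same notation as the MEMM formula above, the CRF replaces the per-step normalizers with a single partition function over all label sequences:

```latex
P(y \mid x) \;=\; \frac{\exp\!\Big(\sum_{t} w^{\top} f(y_t, y_{t-1}, x)\Big)}{Z(x)},
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big(\sum_{t} w^{\top} f(y'_t, y'_{t-1}, x)\Big)
```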

On the other hand, MEMMs cannot find parameters that satisfy the following target distribution:

a b c --> a/A b/B c/C    p(A B C | a b c) = 1
a b e --> a/A b/D e/E    p(A D E | a b e) = 1

Under the MEMM factorization these constraints become

p(A|a) p(B|b,A) p(C|c,B) = 1
p(A|a) p(D|b,A) p(E|e,D) = 1

which would require both p(B|b,A) = 1 and p(D|b,A) = 1. Since B and D are different labels drawn from the same locally normalized distribution p(· | b, A), the two probabilities cannot both equal 1. CRFs, which normalize globally rather than per state, can represent this distribution.

Generative model or discriminative model

Suppose o is the observed value, and m is the model.

a) Generative model: infinite samples -> probability density model = generative model -> prediction

If you model P(o|m), it is a generative model. The basic idea is first to establish a probability density model of the samples, and then use that model for inference and prediction. The samples are required to be infinite, or at least as large as possible. The method draws from statistical mechanics and Bayes' theory.

HMM directly models the transition probability and the emission probability, and calculates the probability of co-occurrence. Thus, it is a generative model.
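
In the usual notation, with observations x_t and hidden states y_t, the HMM joint distribution factorizes into exactly these transition and emission terms:

```latex
P(X, Y) \;=\; P(y_1)\, P(x_1 \mid y_1) \prod_{t=2}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)
```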

b) Discriminative model: finite samples -> discriminative function = discriminative model -> prediction

If you model the conditional probability P(m|o), it is a discriminative model. The basic idea is to establish a discriminant function from finite samples and to study the prediction model directly, without considering the generative mechanism of the samples. Its representative theory is statistical learning theory.

CRF is a discriminative model. MEMM is not a generative model either; it is a finite-state model based on state classification.

Topological structure

HMM and MEMM are directed graphs, while CRF is an undirected graph.

Global optimum or local optimum

HMM directly models the transition probability and the emission probability, and calculates the probability of co-occurrence.

MEMM establishes the probability of co-occurrence based on the transition probability and the emission probability. It calculates the conditional probability, but adopts only local normalization, making it easy to fall into a local optimum.

CRF calculates the normalization probability in the global scope, rather than in the local scope as MEMM does. It yields a globally optimal solution and resolves the labeling bias issue in MEMM.
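
The sketch below illustrates global normalization by brute force: every label sequence is scored, and one partition function Z(x) normalizes them all. The feature weights are toy values assumed for illustration, not from the article.

```python
import itertools
import math

LABELS = (0, 1)

def log_score(y, x):
    """Unnormalized log-score of label sequence y for observation sequence x."""
    s = 0.0
    for t in range(len(x)):
        if t > 0:
            s += 0.5 if y[t] == y[t - 1] else -0.5  # transition feature
        s += 1.0 if y[t] == x[t] else -1.0          # observation feature
    return s

def crf_probability(y, x):
    """P(y | x) with one partition function Z(x) over ALL label sequences."""
    Z = sum(math.exp(log_score(yp, x))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(log_score(y, x)) / Z

x = [0, 1, 1]
for y in itertools.product(LABELS, repeat=3):
    print(y, round(crf_probability(y, x), 4))
# Because Z(x) is global, probability mass is traded off across entire
# sequences rather than renormalized per state as in MEMM.
```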

Advantages and Disadvantages of CRF

Advantages

  • Compared with HMM: since CRF does not make the strict independence assumptions that HMM does, it can accommodate arbitrary context information, and its feature design is flexible (the same as ME).
  • Compared with MEMM: since CRF computes the conditional probability of the globally optimal output sequence, it overcomes the labeling bias drawback of MEMM.
  • Compared with ME: given an observation sequence to be labeled, CRF computes the joint probability distribution of the entire label sequence, rather than defining the distribution of the next state conditioned only on the current state.

Disadvantages

CRF is highly computationally complex at the training stage, which makes it very difficult to re-train the model when new data becomes available.

Conclusion:

This post presented a comparative analysis of the Hidden Markov Model (HMM), Maximum Entropy Markov Models (MEMM), and Conditional Random Fields (CRF). We saw that CRFs and MEMMs are discriminative sequence models, whereas HMMs are generative sequence models. Bayes' rule forms the basis of HMM; by contrast, CRF and MEMM are based on MaxEnt models over transition and observation features.
