HMM, MEMM, and CRF: A Comparative Analysis of Statistical Modeling Methods

Introduction: HMM, MEMM, and CRF are three popular statistical modeling methods, often applied to pattern recognition and machine learning problems.

This article presents a comparative analysis of the Hidden Markov Model (HMM), Maximum Entropy Markov Models (MEMM), and Conditional Random Fields (CRF), three popular statistical modeling methods often applied to pattern recognition and machine learning problems such as sequence labeling. Let us explore each method in further detail.

Hidden Markov Model (HMM)

The word “Hidden” reflects the fact that only the symbols emitted by the system are observable, while the underlying random walk between states cannot be seen. An HMM is commonly viewed as a finite state machine with probabilistic transitions and emissions.

Figure 1: Hidden Markov Model (HMM)
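
In standard notation, an HMM factors the joint probability of a hidden state sequence Y = (y_1, …, y_T) and an observed sequence X = (x_1, …, x_T) using two independence assumptions: each state depends only on the previous state, and each observation depends only on the current state:

    P(Y, X) = ∏_{t=1..T} P(y_t | y_{t-1}) · P(x_t | y_t)

Here P(y_t | y_{t-1}) is the transition probability and P(x_t | y_t) is the emission probability; these two assumptions are exactly what the disadvantages below revisit.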

Advantages of HMM

HMM has a strong statistical foundation with efficient learning algorithms, and learning can take place directly from raw sequence data. It allows consistent treatment of insertion and deletion penalties in the form of locally learnable methods, and it can handle inputs of variable length. HMMs are the most flexible generalization of sequence profiles. They can also perform a wide variety of operations, including multiple alignment, data mining and classification, structural analysis, and pattern discovery. HMMs are also easy to combine into libraries.

Disadvantages of HMM

  • HMM depends only on each state and its corresponding observed object:

    Sequence labeling, however, depends not only on individual words, but also on aspects such as the length of the observed sequence and the word context, which a single state-observation pair cannot capture.


  • The objective function and the prediction objective do not match:
    HMM learns the joint distribution P(Y, X) of the state and the observed sequence, while prediction requires the conditional probability P(Y|X); the identity below makes the mismatch explicit.
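
The two quantities are related by Bayes' rule:

    P(Y|X) = P(Y, X) / P(X)

so decoding for a fixed observation sequence X can still maximize P(Y, X) over Y, but training an HMM maximizes the joint likelihood rather than the conditional probability that the labeling task actually needs.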

Maximum Entropy Markov Models (MEMM)

Figure 2: Maximum Entropy Markov Models (MEMM)

MEMM takes into account the dependencies between neighboring states and the entire observed sequence, and hence has better expressive power. MEMM does not model P(X), which reduces the modeling workload and aligns the learned objective with the prediction objective.
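
Concretely, an MEMM replaces the HMM's joint model with a chain of locally normalized conditional distributions, each conditioned on the previous state and the entire observed sequence. In the usual maximum entropy form, with feature functions f and weights w (generic notation, not taken from this article):

    P(Y|X) = ∏_{t=1..T} P(y_t | y_{t-1}, X)
    P(y_t | y_{t-1}, X) = exp(w · f(y_t, y_{t-1}, X, t)) / Z(y_{t-1}, X, t)

The per-step normalizer Z(y_{t-1}, X, t) sums only over the possible next states, and this local normalization is precisely what produces the labeling bias discussed next.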

MEMM Labeling Bias

Figure 3: Viterbi algorithm decoding of MEMM

In Figure 3, State 1 tends to transition to State 2, while State 2 tends to stay at State 2.

P(1→1→1→1) = 0.4 × 0.45 × 0.5 = 0.09
P(2→2→2→2) = 0.2 × 0.3 × 0.3 = 0.018
P(1→2→1→2) = 0.6 × 0.2 × 0.5 = 0.06
P(1→1→2→2) = 0.4 × 0.55 × 0.3 = 0.066

However, the optimal state transition path is 1→1→1→1. Why?

It is because State 2 has more outgoing transitions than State 1, so each individual transition out of State 2 carries a smaller probability. MEMM therefore tends to select states with fewer outgoing transitions. This selection effect is termed the labeling bias issue. CRF addresses the labeling bias issue well; the sketch below reproduces the arithmetic.
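
A minimal Python sketch of the Figure 3 arithmetic, assuming per-step transition probabilities read off the figure (entries not quoted in the example, such as State 2's remaining outgoing mass, are simply omitted):

    # Per-step transition probabilities P(y_t | y_{t-1}, x_t) from the
    # Figure 3 example; probabilities not quoted above are omitted.
    trans = [
        {(1, 1): 0.4,  (1, 2): 0.6,  (2, 2): 0.2},
        {(1, 1): 0.45, (1, 2): 0.55, (2, 1): 0.2, (2, 2): 0.3},
        {(1, 1): 0.5,  (1, 2): 0.5,  (2, 2): 0.3},
    ]

    def path_prob(path):
        # Multiply the locally normalized step probabilities along the path.
        p = 1.0
        for t, step in enumerate(zip(path, path[1:])):
            p *= trans[t][step]
        return p

    for path in [(1, 1, 1, 1), (2, 2, 2, 2), (1, 2, 1, 2), (1, 1, 2, 2)]:
        print(path, round(path_prob(path), 4))
    # (1, 1, 1, 1) wins with 0.09, even though State 1 prefers to move to
    # State 2 at every step -- the labeling bias in action.

Note that the quoted rows out of State 1 sum to 1 (it has only two successors), while the quoted mass out of State 2 sums to less than 1 because its remaining probability goes to transitions not shown.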

Conditional Random Fields (CRF model)

Figure 4: Conditional Random Fields (CRF)

The CRF model addresses the labeling bias issue and eliminates the two unreasonable independence assumptions of HMM noted above. Of course, the model has also become more complicated.

MEMM adopts local normalization of probabilities, while CRF adopts global normalization over the entire sequence, as shown below.
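
In the standard CRF formulation (again with generic feature functions f and weights w), the normalizer Z(X) sums over all possible label sequences instead of over the successors of a single state:

    P(Y|X) = exp( Σ_{t=1..T} w · f(y_t, y_{t-1}, X, t) ) / Z(X)
    Z(X) = Σ_{Y'} exp( Σ_{t=1..T} w · f(y'_t, y'_{t-1}, X, t) )

Because probability mass is traded off across entire sequences, no state is penalized merely for having many outgoing transitions.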

On the other hand, MEMMs cannot find parameters to satisfy the following distribution, in which each of the two training sequences must be labeled with certainty:

a b c → a/A b/B c/C    p(A B C | a b c) = 1
a b e → a/A b/D e/E    p(A D E | a b e) = 1

Under the MEMM factorization these require

p(A|a) p(B|b,A) p(C|c,B) = 1
p(A|a) p(D|b,A) p(E|e,D) = 1

which in turn forces p(B|b,A) = 1 and p(D|b,A) = 1. Since B and D are different labels conditioned on the same local context (b, A), a locally normalized model cannot satisfy both at once. But CRFs, which normalize globally, can.

Generative model or discriminative model

Suppose o is the observed value, and m is the model.

a) Generative model: Infinite samples → probability density model = generative model → prediction

If you model P(o|m), it is a generative model. The basic idea is first to establish the probability density model of the sample, and then to use the model for inference and prediction. This approach requires the samples to be infinite, or at least as large as possible. The method draws from statistical mechanics and Bayes theory.

HMM directly models the transition probability and the emission probability, and calculates the probability of co-occurrence. Thus, it is a generative model.

b) Discriminative model: Finite samples → discriminative function = discriminative model → prediction

If you model the conditional probability P(m|o), it is a discriminative model. The basic idea is to establish a discriminant function from finite samples, and to study the prediction model directly without modeling how the samples are generated. Its representative theory is statistical learning theory.

CRF is a discriminative model. MEMM is not a generative model either; it is a finite-state model based on state classification.

Topological structure

HMM and MEMM are directed graphs, while CRF is an undirected graph.

Global optimum or local optimum

HMM directly models the transition probability and the emission probability, and calculates the probability of co-occurrence.

MEMM builds the probability of co-occurrence from the transition probability and the emission probability. It calculates the conditional probability, but only adopts local normalization, making it easy to fall into a local optimum.

CRF calculates the normalization probability over the global scope, rather than the local scope as in MEMM. It yields a globally optimal solution and resolves the labeling bias issue of MEMM.

Advantages and Disadvantages of CRF

Advantages

  • Compared with HMM: Since CRF does not have as strict independence assumptions as HMM does, it can accommodate any context information. Its feature design is flexible (the same as ME).
  • Compared with MEMM: Since CRF computes the conditional probability of global optimal output nodes, it overcomes the drawbacks of label bias in MEMM.
  • Compared with ME: given an observation sequence to be labeled, CRF computes the joint distribution of the entire label sequence conditioned on that observation sequence, rather than defining the distribution of the next state given only the current state.

Disadvantages

CRF is highly computationally complex at the training stage, which makes it very difficult to re-train the model when new data becomes available.

Conclusion

This post presented a comparative analysis of the Hidden Markov Model (HMM), Maximum Entropy Markov Models (MEMM), and Conditional Random Fields (CRF). We saw that CRFs and MEMMs are discriminative sequence models, whereas HMMs are generative sequence models. Bayes' rule forms the basis of HMM; by contrast, CRF and MEMM are based on maximum entropy (MaxEnt) models over transition and observation features.
