2022 小美赛 Problem B: The Genetic Process of Sequences (approach sharing)
Click the link [2022 小美赛 mathematical modeling approach sharing]: https://jq.qq.com/?_wv=1027&k=pYYvA9gJ
Sequence homology is the biological homology between DNA, RNA, or protein
sequences, defined in terms of shared ancestry in the evolutionary history of
life[1]. Homology among DNA, RNA, or proteins is typically inferred from their
nucleotide or amino acid sequence similarity. Significant similarity is strong
evidence that two sequences are related by evolutionary changes from a common
ancestral sequence[2].
Consider the genetic process of an RNA sequence, in which mutations in
nucleotide bases occur by chance. For simplicity, we assume that sequence
mutations arise from the change (transition or transversion), insertion, or
deletion of a single base. We can therefore measure the distance between two
sequences by the number of mutation points. Multiple base sequences that are
close together can form a family, and they are considered homologous.
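One natural reading of "the number of mutation points" is the minimum number of single-base substitutions, insertions, and deletions needed to turn one sequence into the other, which is commonly called the Levenshtein (edit) distance. The sketch below is an illustration under that assumption, not the contest's reference solution; the function name `edit_distance` is hypothetical. It uses the standard dynamic program and keeps only two rows in memory.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-base substitutions, insertions,
    and deletions that turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter one as the row
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, base_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            curr.append(min(
                prev[j] + 1,         # delete base_a
                curr[j - 1] + 1,     # insert base_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# One substitution (G -> C) plus one insertion (trailing A).
print(edit_distance("AUGGC", "AUCGCA"))  # 2
```

For sequences of lengths m and n this takes O(mn) time and O(min(m, n)) memory, so two sequences of about 10^3 bases need on the order of 10^6 cell updates.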
Your team is asked to develop a reasonable mathematical model to solve the
following problems.
1. Please design an algorithm that quickly measures the distance between
two sufficiently long (> 10^3 bases) base sequences.
2. Please evaluate the complexity and accuracy of the algorithm reliably, and
design suitable examples to illustrate it.
3. If multiple base sequences in a family have evolved from a common
ancestral sequence, design an efficient algorithm to determine the ancestral
sequence, and map the genealogical tree.
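For Problem 3, one possible starting point is to compute all pairwise distances and build a tree by average-linkage clustering (UPGMA). The sketch below is an illustration under stated assumptions rather than a definitive method. UPGMA implicitly assumes a roughly constant mutation rate across lineages. The medoid (the family member with the smallest total distance to the others) is used only as a crude stand-in for the ancestral sequence, since a true ancestor need not be among the observed sequences. The names `upgma`, `medoid`, and the toy `family` data are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (base_a != base_b)))
        prev = curr
    return prev[-1]


def upgma(names, seqs):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    dist = {frozenset((i, j)): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            d_ik = dist[frozenset((i, k))]
            d_jk = dist[frozenset((j, k))]
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


def medoid(seqs):
    """Observed sequence with the smallest total distance to all others."""
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))


# Toy family (made-up data, for illustration only).
names = ["a", "b", "c", "d"]
family = ["AAAA", "AAAU", "AAUU", "GGGG"]
print(upgma(names, family))
print(medoid(family))
```

Computing all pairwise distances dominates the cost: for k sequences of length n it needs k(k-1)/2 distance computations of O(n^2) each, so a faster distance routine from Problem 1 pays off directly here.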
References
[1] Koonin EV. “Orthologs, paralogs, and evolutionary genomics”. Annual
Review of Genetics. 39: 309-38, 2005.
[2] Reeck GR, de Haën C, Teller DC, Doolittle RF, Fitch WM, Dickerson RE,
et al. “Homology” in proteins and nucleic acids: a terminology muddle and
a way out of it. Cell. 50 (5): 667, 1987.