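RNA of the novel coronavirus SARS-CoV-2 can be detected in the upper or lower respiratory tract for several weeks after the onset of illness. Metagenomic next-generation sequencing (mNGS), which is used for pan-pathogen detection, played a major role in the early discovery and accurate sequencing of SARS-CoV-2 RNA.
Nucleic acid testing for the novel coronavirus currently refers to multiplex fluorescent RT-PCR kits, which perform fluorescence detection against three targets: the ORF1ab, N, and E genes of the virus. Antibody testing relies on the fact that once the virus enters the body, the immune system produces specific antibodies against it, IgM and IgG; detecting these antibodies indicates whether a person has been infected or has recovered. mNGS, by contrast, deep-sequences a patient's lesion tissue on a sequencer to obtain metagenomic data, which is then used to locate and screen for multiple pathogens.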
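Although many RT-PCR kits for the novel coronavirus are available, low viral concentration and uneven kit quality lead to a high false-negative rate. As a result, doctors and patients often have to repeat the test several times and wait a long time for results.
The technical advantage of mNGS is that a single test can screen for all known pathogens. It avoids the operational burden that repeated sampling places on doctors and patients, and it avoids the problem that the large sample volume required for repeated PCR screening is hard to obtain in clinical practice. Because mNGS analysis is based on nucleic acid sequence alignment, once the genome of a pathogen is known, updating the reference database is enough to detect that pathogen efficiently and accurately.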
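The Alibaba Cloud Genomics Service (AGS) provides fast alignment for mNGS metagenomic sequencing data. For 3.2 Gbase (22M reads) of metagenomic data sequenced from an alveolar sample, it can finish alignment within 60 seconds against a library of known pathogen genomes, including the novel coronavirus SARS-CoV-2 or the reference sequences of 39 BetaCov RNAs. It also supports uploading and aligning against a custom library of tens of thousands of viruses. A disease control center, hospital, or laboratory needs only an object storage bucket and the AGS command-line tool to run the whole alignment and obtain high-quality matched reads together with a preliminary quality report. This gives fast and accurate data support for multi-pathogen detection and for further protein and variant research on the novel coronavirus.
To help fight the epidemic, AGS is opening its mNGS RNA alignment capability to sequencing vendors, disease control centers, hospitals, universities, pharmaceutical companies, and similar organizations. Applications are welcome.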
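Preparation
- Download and install the AGS command-line tool; see the AGS command-line help.
- Download and install the ossutil command-line tool; see the ossutil command-line help.
- Prepare an Alibaba Cloud account and an object storage bucket for the mNGS sequencing data, e.g. oss://my-test-shenzhen
- Grant the AGS service access to the bucket, e.g. ags config oss my-test-shenzhen
- Upload the mNGS sequencing data to the bucket.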
e.g.
ossutil cp ICU6G_S2_L001_R1_001.fastq.gz oss://my-test-shenzhen/cov2-samples/
ossutil cp ICU6G_S2_L001_R2_001.fastq.gz oss://my-test-shenzhen/cov2-samples/
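- Run the alignment job to compare the mNGS data against known RNA sequences or sequence databases. Repeat the upload and alignment steps for each additional sample.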
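Usage:
ags remote run rna-mapping \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--output-bam bam/ICU6G_S2.bam \
--reference <sars-cov-2 | betacov-ncbi-39 | path of an RNA reference library in the specified bucket>
Parameters:
- rna-mapping: the RNA sequence alignment task.
- --region: region ID (cn-shenzhen, cn-beijing, ...). Shenzhen and Beijing are currently supported.
- --bucket: name of the object storage bucket.
- --fastq1: relative path of the first paired-end FASTQ file (fq1).
- --fastq2: relative path of the second paired-end FASTQ file (fq2).
- --output-bam: output path of the resulting BAM. The report is written to the same location and ends in .txt.
- --reference: the reference sequence. The presets are sars-cov-2 (the novel coronavirus) and betacov-ncbi-39 (the 39 currently known beta coronaviruses); a custom virus sequence library can be given instead.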
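Because the same submission is repeated for every sample, the command lines can be generated rather than typed by hand. The following is a minimal Python sketch, not part of AGS: the helper name build_rna_mapping_cmd and the second sample name are hypothetical, and it assumes every sample follows the <sample>_L001_R1_001.fastq.gz / <sample>_L001_R2_001.fastq.gz naming used in this walkthrough. It uses only the flags shown in the usage above.

```python
import shlex


def build_rna_mapping_cmd(sample, bucket, region="cn-shenzhen",
                          reference="sars-cov-2", prefix="cov2-samples"):
    """Return the argv list for one `ags remote run rna-mapping` submission.

    Assumes paired files are named <sample>_L001_R1_001.fastq.gz and
    <sample>_L001_R2_001.fastq.gz under `prefix` in the bucket.
    """
    return [
        "ags", "remote", "run", "rna-mapping",
        "--region", region,
        "--bucket", bucket,
        "--fastq1", f"{prefix}/{sample}_L001_R1_001.fastq.gz",
        "--fastq2", f"{prefix}/{sample}_L001_R2_001.fastq.gz",
        "--output-bam", f"bam/{sample}.bam",
        "--reference", reference,
    ]


if __name__ == "__main__":
    # "SAMPLE_B_S3" is a made-up second sample, used only for illustration.
    for sample in ["ICU6G_S2", "SAMPLE_B_S3"]:
        print(shlex.join(build_rna_mapping_cmd(sample, "my-test-shenzhen")))
```

The printed lines can be reviewed first and then pasted into a shell, or the argv list can be handed to subprocess.run once the AGS command-line tool is installed and configured.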
e.g.
Alignment against the novel coronavirus
1. Submit an alignment job that compares the ICU6G_S2_L001 sequencing sample with the novel coronavirus
ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2.bam \
--reference sars-cov-2
INFO[0002] {"JobName":"rna-mapping-gpu-2ms6w"}
INFO[0002] Job submit succeed
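2. Check the alignment job and its result
e.g.
In this job, 10M reads (1.4 Gbase) were aligned against the novel coronavirus sequence MN908947.3 in 43 seconds. The alignment produced 3629 high-quality mapped reads, and 404 reads in the coronavirus feature range (orf1ab) have an alignment score above 120. This indicates that SARS-CoV-2 RNA can be detected precisely in the sequencing data of this sample.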
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
feature sequence of ICU6G_S2_L001 is similar to SARS-CoV-2 with very high mappQ and AS reads: True
ags remote get rna-mapping-gpu-2ms6w --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-2ms6w | XXXXXXXXXXXX | Succeeded | 2020-03-04 16:40:30 +0800 CST | 43s | 2020-03-04 16:41:13 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_matached_reads | 480 |
| rna_is_sars_cov2 | True |
| rna_mapping_oss_region | cn-shenzhen |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_no_unmapped | |
| rna_mapping_service | s |
| rna_matached_reads_alignment | 404 |
| rna_high_quality_mapped | 3629 |
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_mark_dup | |
| rna_mapping_reference_file_name | sars-cov-2 |
| rna_cov_detail_file | bam/ICU6G_S2.bam.cov.txt |
| rna_mapping_bam_file_name | bam/ICU6G_S2.bam |
| rna_mapping_bucket_name | my-test-shenzhen |
+---------------------------------+--------------------------------------------+
3. Download the alignment BAM and the reports
ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2.bam
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-03-04 16:41:11 +0800 CST 356320 Standard 9596D012A30438A0073A2A0B38F5D578 oss://my-test-shenzhen/bam/ICU6G_S2.bam
2020-03-04 16:41:11 +0800 CST 2889 Standard 63175E7180D110BA9D3BAB34F4313C59 oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt
2020-03-04 16:41:11 +0800 CST 396 Standard 940D51FF7ECFF60B5E5A41D1F635180D oss://my-test-shenzhen/bam/ICU6G_S2.bam.summary.json
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2.bam.summary.json .
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2.bam.cov.txt .
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2.bam .
Example for sars-cov-2 RNA detected
cat ICU6G_S2.bam.cov.txt
Summary:
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
/data/cov2-samples_ICU6G_S2_L001_R1_001.fastq.gz-output/ICU6G_S2.bam is similar to SARS-CoV-2 with very high mappQ and AS reads: True
21571 21581 21591 21601 21611 21621 21631
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTT GTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGT CAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGA CCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCA agtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgcca AGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
atgtttgtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
TGTTTGTTTTTCTTGTTTT CACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
tgtttgtttttcttgtttt
TTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCAATTACCCCCTGC
gtttttcttgttttattgccactagtctctagtcagtgtgttaatcttacaaccagaactcaattaccccctgc
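The summary lines at the top of the .cov.txt report can be pulled into a script for batch triage. This is a minimal Python sketch under the assumption that the report keeps the exact "label: value" wording shown above; parse_cov_summary is a hypothetical helper name, not something AGS provides, and the embedded text is copied from the report in this walkthrough.

```python
import re


def parse_cov_summary(text):
    """Extract the three read counts and the verdict from a .cov.txt header.

    Returns None for any field whose line is not found.
    """
    patterns = {
        "high_quality_mapped": r"High Quality Mapped Reads is:\s*(\d+)",
        "orf1ab_matched": r"Matched reads in orf1ab range is:\s*(\d+)",
        "orf1ab_as_gt_120": r"alignment score \(AS\) is greater than 120:\s*(\d+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        result[key] = int(match.group(1)) if match else None
    verdict = re.search(r"is similar to SARS-CoV-2.*:\s*(True|False)", text)
    result["similar_to_sars_cov2"] = (verdict.group(1) == "True") if verdict else None
    return result


# Header copied from the report shown in this walkthrough.
REPORT = """Summary:
High Quality Mapped Reads is: 3629
Matched reads in orf1ab range is: 480
Matched reads in orf1ab range with alignment score (AS) is greater than 120: 404
/data/cov2-samples_ICU6G_S2_L001_R1_001.fastq.gz-output/ICU6G_S2.bam is similar to SARS-CoV-2 with very high mappQ and AS reads: True
"""

if __name__ == "__main__":
    print(parse_cov_summary(REPORT))
```

For a downloaded report, the same function can be applied to the file contents read with open(...).read().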
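Further analysis of the alignment data
Tools such as samtools stats and plot-bamstats can be run on the output BAM to analyze coverage, depth, and similarity in more detail. The BAM data can also be used for further protein composition and variant analysis.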
e.g. stats
Coverage Analysis
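Coverage breadth and mean depth can also be computed from the text output of samtools depth, which is tab-separated with three columns: reference name, 1-based position, and depth. The sketch below is illustrative only: summarize_depth is a hypothetical helper, the four input rows are toy values rather than data from the sample, and it assumes the BAM is coordinate-sorted and that the -a option was used so that zero-depth positions are listed.

```python
from collections import defaultdict


def summarize_depth(lines):
    """Per reference: fraction of listed positions with depth > 0, and mean depth.

    `lines` are rows of `samtools depth -a` output: ref <TAB> pos <TAB> depth.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # positions, covered positions, depth sum
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ref, _pos, depth = line.split("\t")
        entry = totals[ref]
        entry[0] += 1
        entry[1] += 1 if int(depth) > 0 else 0
        entry[2] += int(depth)
    return {
        ref: {"breadth": covered / n, "mean_depth": depth_sum / n}
        for ref, (n, covered, depth_sum) in totals.items()
    }


# Toy rows for illustration; not taken from the ICU6G_S2 sample.
TOY = [
    "MN908947.3\t1\t0",
    "MN908947.3\t2\t4",
    "MN908947.3\t3\t6",
    "MN908947.3\t4\t10",
]

if __name__ == "__main__":
    print(summarize_depth(TOY))
```

In practice the rows would come from a file written by samtools depth -a on the downloaded BAM, read line by line.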
Alignment against the 39 known Beta Coronaviruses
ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam \
--reference betacov-ncbi-39
INFO[0011] {"JobName":"rna-mapping-gpu-6mpcc"}
INFO[0011] Job submit succeed
ags remote get rna-mapping-gpu-6mpcc --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-6mpcc | XXXXXXXXX | Succeeded | 2020-03-04 17:36:21 +0800 CST | 40s | 2020-03-04 17:37:01 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
# 2014 mapped reads detected, but no mapped reads found in range
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_mapping_reference_file_name | betacov-ncbi-39 |
| rna_matached_reads_alignment | 0 |
| rna_mapping_bam_file_name | bam/ICU6G_S2_virus.bam |
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_oss_region | cn-shenzhen |
| rna_cov_detail_file | bam/ICU6G_S2_virus.bam.cov.txt |
| rna_mapping_no_unmapped | |
| rna_matached_reads | 0 |
| rna_mapping_mark_dup | |
| rna_mapping_service | s |
| rna_high_quality_mapped | 2014 |
| rna_mapping_bucket_name | my-test-shenzhen |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_is_sars_cov2 | False |
+---------------------------------+--------------------------------------------+
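Alignment against a custom virus library
1. Download reference sequences from NCBI GenBank and merge them into a single multi-contig reference
https://www.ncbi.nlm.nih.gov/nuccore
e.g. search for all nucleotide reference sequences containing 'betacov' and download them
2. Rename the downloaded sequence file sequence.fa to betacov-ncbi-test.fa
3. Upload the reference to the object storage bucket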
ossutil cp betacov-ncbi-test.fa oss://my-test-shenzhen/ref/
4. Submit the alignment job and specify the path of the reference
ags remote run rna-mapping \
--region cn-shenzhen \
--fastq1 cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz \
--fastq2 cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz \
--bucket my-test-shenzhen \
--output-bam bam/ICU6G_S2_virus.bam \
--reference ref/betacov-ncbi-test.fa
INFO[0002] {"JobName":"rna-mapping-gpu-69mwb"}
INFO[0002] Job submit succeed
5. View the alignment report and get the matched alignment data
ags remote get rna-mapping-gpu-69mwb --show
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| JOB NAME | JOB NAMESPACE | STATUS | CREATE TIME | DURATION | FINISH TIME | TOTAL READS | TOTAL BASES |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| rna-mapping-gpu-69mwb | 1365606736606053 | Succeeded | 2020-03-04 17:47:00 +0800 CST | 40s | 2020-03-04 17:47:40 +0800 CST | 10369818 | 1456539874 |
+-----------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
+---------------------------------+--------------------------------------------+
| JOB DETAIL | |
+---------------------------------+--------------------------------------------+
| rna_mapping_fastq_first_name | cov2-samples/ICU6G_S2_L001_R1_001.fastq.gz |
| rna_mapping_fastq_second_name | cov2-samples/ICU6G_S2_L001_R2_001.fastq.gz |
| rna_mapping_mark_dup | |
| rna_mapping_oss_region | cn-shenzhen |
| rna_cov_detail_file | bam/ICU6G_S2_virus.bam.cov.txt |
| rna_is_sars_cov2 | False |
| rna_mapping_bam_file_name | bam/ICU6G_S2_virus.bam |
| rna_mapping_service | s |
| rna_matached_reads_alignment | 0 |
| rna_high_quality_mapped | 2014 |
| rna_mapping_bucket_name | my-test-shenzhen |
| rna_mapping_no_unmapped | |
| rna_mapping_reference_file_name | ref/betacov-ncbi-test.fa |
| rna_matached_reads | 0 |
+---------------------------------+--------------------------------------------+
6. Download the alignment data for further analysis
ossutil ls oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-03-04 17:47:38 +0800 CST 753458 Standard DF7B1A6CA5AF5DE6BF4FFDBB6DEF71C3 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam
2020-03-04 17:47:38 +0800 CST 1474 Standard 9D7968A779A0DE7C1993CC2A8D0E5A56 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.cov.txt
2020-03-04 17:47:38 +0800 CST 397 Standard 81170E30BAAFEB947A2238E015171A51 oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json
Object Number is: 3
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam.summary.json .
cat ICU6G_S2_virus.bam.summary.json
{
"total_reads":10369818,
"total_bases":1456539874,
"pass_vendor_filter_reads":10369818,
"mapped_reads":6736,
"pair_reads":6680,
"properly_paired_reads":6520,
"mapq_40_to_inf_reads":2030,
"mapq_30_to_40_reads":0,
"mapq_20_to_30_reads":1,
"mapq_10_to_20_reads":3,
"mapq_0_to_10_reads":23,
"mapq_0_reads":10367761,
"GC":"46.499%",
"total_alignment":2057,
"supplementary_alignment":0
}
ossutil cp oss://my-test-shenzhen/bam/ICU6G_S2_virus.bam .
samtools view ICU6G_S2_virus.bam
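The summary.json file is convenient for a quick sanity check before opening the BAM. The Python sketch below is a minimal example whose reading of the fields is inferred from their names only, not from AGS documentation; summarize is a hypothetical helper. It computes the fraction of mapped reads, checks that the MAPQ bins add up to the total read count, and counts the reads with MAPQ above zero. The JSON is copied from the output above.

```python
import json

# Copied from the summary.json shown in this walkthrough.
SUMMARY_JSON = """{
"total_reads":10369818,
"total_bases":1456539874,
"pass_vendor_filter_reads":10369818,
"mapped_reads":6736,
"pair_reads":6680,
"properly_paired_reads":6520,
"mapq_40_to_inf_reads":2030,
"mapq_30_to_40_reads":0,
"mapq_20_to_30_reads":1,
"mapq_10_to_20_reads":3,
"mapq_0_to_10_reads":23,
"mapq_0_reads":10367761,
"GC":"46.499%",
"total_alignment":2057,
"supplementary_alignment":0
}"""


def summarize(summary):
    """Derive a few ratios from a parsed summary.json dictionary."""
    bins = [key for key in summary if key.startswith("mapq_")]
    return {
        "mapped_fraction": summary["mapped_reads"] / summary["total_reads"],
        "mapq_bins_sum": sum(summary[key] for key in bins),
        "mapq_gt0_reads": sum(summary[key] for key in bins if key != "mapq_0_reads"),
    }


if __name__ == "__main__":
    stats = summarize(json.loads(SUMMARY_JSON))
    print(f"mapped fraction: {stats['mapped_fraction']:.6%}")
    print(f"MAPQ bins sum:   {stats['mapq_bins_sum']}")
    print(f"MAPQ > 0 reads:  {stats['mapq_gt0_reads']}")
```

For this sample the MAPQ bins sum to total_reads and the MAPQ > 0 count equals total_alignment, which suggests the bins partition all reads. For a downloaded file, replace json.loads(SUMMARY_JSON) with json.load on the opened file.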