gget: A Powerful, Efficient Tool for Querying Genomic Reference Databases

2023-01-06


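Summary: gget is an open-source Python and command-line program that gives efficient, simple programmatic access to information stored in a variety of large public genomic reference databases. gget works alongside existing tools that fetch user-generated sequencing data, and it replaces the inefficient, error-prone manual web queries that come up during genomic data analysis. Although the gget modules were inspired by tedious single-cell RNA-seq data analysis tasks, we expect them to be useful for a wide range of bioinformatics tasks.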


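gget can be installed from the command line by running "pip install gget". The figure below shows one use case and the corresponding output for each gget tool. Every gget tool comes with an extensive manual, available as function documentation in a Python environment or as standard output via the help flag [-h] on the command line.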
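As a small illustration of reading a manual from inside Python, the sketch below looks up a function's docstring by name and returns `None` when the package or function is not available. The helper name `manual` is hypothetical and not part of gget; it only relies on the standard `importlib` and `inspect` modules, and it assumes that gget exposes its tools as functions such as `gget.ref`, as the quick-start examples suggest.

```python
import importlib
import inspect


def manual(module_name, function_name):
    """Return the docstring of module_name.function_name, or None if unavailable.

    `manual` is a hypothetical helper for illustration, not a gget function.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this environment.
        return None
    func = getattr(module, function_name, None)
    if func is None:
        return None
    return inspect.getdoc(func)


# With gget installed, this prints the manual of gget.ref;
# otherwise it prints a short notice instead of failing.
text = manual("gget", "ref")
print(text if text is not None else "gget is not installed in this environment")
```

In an interactive session, the built-in `help(gget.ref)` shows the same text; on the command line, the equivalent is the `-h` flag mentioned above.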

gget links

gget website: https://pachterlab.github.io/gget/

gget examples repository: https://github.com/pachterlab/gget_examples

Installing gget

pip install --upgrade gget

or

conda install -c bioconda gget

Importing in Jupyter Lab / Google Colab

import gget
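After installing, it can be handy to confirm from Python which version (if any) is visible to the current environment before importing. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `gget`, matching the `pip install gget` command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("gget")
if version is None:
    print("gget is not installed; try: pip install --upgrade gget")
else:
    print(f"gget {version} is installed")
```

This is particularly useful in notebooks, where the kernel's environment is not always the one in which `pip` or `conda` was run.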

gget modules

* [`gget ref`](https://pachterlab.github.io/gget/ref.html)

Fetch File Transfer Protocols (FTPs) and metadata for reference genomes and annotations from [Ensembl](https://www.ensembl.org/) by species.

* [`gget search`](https://pachterlab.github.io/gget/search.html)

Fetch genes and transcripts from [Ensembl](https://www.ensembl.org/) using free-form search terms.

* [`gget info`](https://pachterlab.github.io/gget/info.html)

Fetch extensive gene and transcript metadata from [Ensembl](https://www.ensembl.org/), [UniProt](https://www.uniprot.org/), and [NCBI](https://www.ncbi.nlm.nih.gov/) using Ensembl IDs.

* [`gget seq`](https://pachterlab.github.io/gget/seq.html)

Fetch nucleotide or amino acid sequences of genes or transcripts from [Ensembl](https://www.ensembl.org/) or [UniProt](https://www.uniprot.org/), respectively.

* [`gget blast`](https://pachterlab.github.io/gget/blast.html)

BLAST a nucleotide or amino acid sequence to any [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) database.

* [`gget blat`](https://pachterlab.github.io/gget/blat.html)

Find the genomic location of a nucleotide or amino acid sequence using [BLAT](https://genome.ucsc.edu/cgi-bin/hgBlat).

* [`gget muscle`](https://pachterlab.github.io/gget/muscle.html)

Align multiple nucleotide or amino acid sequences to each other using [Muscle5](https://www.drive5.com/muscle/).

* [`gget enrichr`](https://pachterlab.github.io/gget/enrichr.html)

Perform an enrichment analysis on a list of genes using [Enrichr](https://maayanlab.cloud/Enrichr/).

* [`gget archs4`](https://pachterlab.github.io/gget/archs4.html)

Find the most correlated genes to a gene of interest or find the gene's tissue expression atlas using [ARCHS4](https://maayanlab.cloud/archs4/).

* [`gget pdb`](https://pachterlab.github.io/gget/pdb.html)

Get the structure and metadata of a protein from the [RCSB Protein Data Bank](https://www.rcsb.org/).

* [`gget alphafold`](https://pachterlab.github.io/gget/alphafold.html)

Predict the 3D structure of a protein from its amino acid sequence using a simplified version of [DeepMind](https://www.deepmind.com/)’s [AlphaFold2](https://github.com/deepmind/alphafold).
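The module list above can be summarized as a small lookup table, which makes it easy to answer questions such as "which modules query UniProt?". The sketch below is only a restatement of that list: the dictionary, the helper names `modules_using` and `doc_url`, and the shortened source labels are illustrative and not part of gget. The URL pattern is the one used by the links in the list.

```python
# Data sources per module, transcribed from the module list above.
GGET_SOURCES = {
    "ref": ["Ensembl"],
    "search": ["Ensembl"],
    "info": ["Ensembl", "UniProt", "NCBI"],
    "seq": ["Ensembl", "UniProt"],
    "blast": ["BLAST"],
    "blat": ["BLAT"],
    "muscle": ["Muscle5"],
    "enrichr": ["Enrichr"],
    "archs4": ["ARCHS4"],
    "pdb": ["RCSB Protein Data Bank"],
    "alphafold": ["AlphaFold2"],
}

DOCS_BASE = "https://pachterlab.github.io/gget"


def modules_using(source):
    """Return the modules (in list order) that draw on the given source."""
    return [name for name, sources in GGET_SOURCES.items() if source in sources]


def doc_url(module):
    """Return the documentation URL for a module named in the list above."""
    if module not in GGET_SOURCES:
        raise ValueError(f"unknown gget module: {module!r}")
    return f"{DOCS_BASE}/{module}.html"


print(modules_using("UniProt"))
print(doc_url("seq"))
```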

gget quick start

Command line

# Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
$ gget ref homo_sapiens
# Get Ensembl IDs of human genes with "ace2" or "angiotensin converting enzyme 2" in their name/description
$ gget search -s homo_sapiens 'ace2' 'angiotensin converting enzyme 2'
# Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519
$ gget info ENSG00000130234 ENST00000252519
# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
$ gget seq --translate ENSG00000130234
# Quickly find the genomic location of (the start of) that amino acid sequence
$ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS
# BLAST (the start of) that amino acid sequence
$ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS
# Align nucleotide or amino acid sequences stored in a FASTA file
$ gget muscle path/to/file.fa
# Use Enrichr for an ontology analysis of a list of genes
$ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P
# Get the human tissue expression of gene ACE2
$ gget archs4 -w tissue ACE2
# Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank (PDB ID returned by gget info)
$ gget pdb 1R42 -o 1R42.pdb
# Predict the protein structure of GFP from its amino acid sequence
$ gget setup alphafold # setup only needs to be run once
$ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
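When these commands are driven from a script rather than typed by hand, passing the arguments as a list avoids shell quoting pitfalls. The sketch below assumes the `gget` executable is on the `PATH`; `build_gget_command` and `run_gget` are hypothetical helper names, not part of gget. It also shows why search terms need a space between them: a POSIX shell joins adjacent quoted strings into a single argument.

```python
import shlex
import shutil
import subprocess


def build_gget_command(module, *args):
    """Return the argv list for a gget CLI call; each argument stays a separate item."""
    return ["gget", module, *map(str, args)]


def run_gget(module, *args):
    """Run the command and return its standard output (requires gget on PATH)."""
    if shutil.which("gget") is None:
        raise FileNotFoundError("gget executable not found on PATH")
    result = subprocess.run(
        build_gget_command(module, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


cmd = build_gget_command(
    "search", "-s", "homo_sapiens", "ace2", "angiotensin converting enzyme 2"
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))

# Adjacent quoted strings with no space between them collapse into ONE argument.
fused = shlex.split("'ace2''angiotensin converting enzyme 2'")
separate = shlex.split("'ace2' 'angiotensin converting enzyme 2'")
print(len(fused), len(separate))
```

Calling `run_gget("ref", "homo_sapiens")` would then execute the first command of the quick start; it needs network access, so it is left out of the sketch.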

Python (Jupyter Lab / Google Colab):

import gget
gget.ref("homo_sapiens")
gget.search(["ace2", "angiotensin converting enzyme 2"], "homo_sapiens")
gget.info(["ENSG00000130234", "ENST00000252519"])
gget.seq("ENSG00000130234", translate=True)
gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.muscle("path/to/file.fa")
gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
gget.archs4("ACE2", which="tissue")
gget.pdb("1R42", save=True)
gget.setup("alphafold") # setup only needs to be run once
gget.alphafold("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK")
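The `gget.muscle` call above takes the path of a FASTA file, so sequences held in Python first need to be written to disk. The following is a minimal sketch of that step using only the standard library; `write_fasta` and `read_fasta` are hypothetical helpers, and the two sequences are simply the ones quoted in this article (the ACE2 start and GFP), used here as placeholders to exercise the file format rather than as a meaningful alignment.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write {name: sequence} pairs to `path` in FASTA format, wrapping at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def read_fasta(path):
    """Read a FASTA file back into a {name: sequence} dict (for checking the output)."""
    records = {}
    name = None
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.strip()
    return records


peptides = {
    "ACE2_start": "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS",
    "GFP": (
        "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQ"
        "HDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNG"
        "IKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"
    ),
}

# Write to a temporary directory so the sketch leaves no files behind in the project.
fasta_path = write_fasta(peptides, Path(tempfile.mkdtemp()) / "file.fa")
print(fasta_path.read_text())

# With gget installed, the file could then be aligned with:
# gget.muscle(str(fasta_path))
```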

Call `gget` from R using [reticulate](https://rstudio.github.io/reticulate/):

system("pip install gget")
install.packages("reticulate")
library(reticulate)
gget<-import("gget")
gget$ref("homo_sapiens")
gget$search(list("ace2", "angiotensin converting enzyme 2"), "homo_sapiens")
gget$info(list("ENSG00000130234", "ENST00000252519"))
gget$seq("ENSG00000130234", translate=TRUE)
gget$blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$muscle("path/to/file.fa", out="path/to/out.afa")
gget$enrichr(list("ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"), database="ontology")
gget$archs4("ACE2", which="tissue")
gget$pdb("1R42", save=TRUE)
