GT-Scan2: Bringing Bioinformatics to Alibaba Cloud

Summary: Learn how Alibaba Cloud powers GT-Scan2, a cutting-edge genome sequence search tool, with its suite of big data products and its serverless computing platform.

CRISPR-Cas9 is a genome editing tool that is creating a buzz in the science world. It is faster, cheaper, and more accurate than previous techniques for editing the genome of living cells, and therefore has the potential to revolutionize a wide range of applications.

CRISPR-Cas9 is especially promising in healthcare, because it could allow treatment of medical conditions that have a genetic component, including cancer, hepatitis B, and even high cholesterol. Clinical trials have already started for patients with specific blood and solid cancer types.

CRISPR-Cas9 suits these applications because it can be programmed to recognize and edit specific locations in the genome by pattern-matching unique DNA sequences. However, for robust clinical use, the efficiency of CRISPR-Cas9 needs to be increased, as does the speed with which target sites can be designed.

Researchers in the eHealth program of Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO) developed GT-Scan2, a novel software tool that addresses both issues.

GT-Scan2 helps researchers find the most effective CRISPR-Cas9 targets in a genomic region by ranking candidate targets by their predicted cutting efficiency. You can think of it as a "search engine for the genome". For each target, GT-Scan2 also reports the number of potential off-targets: other regions in the genome that differ from the target by 0 to 3 mismatches.

  • Identifies optimal CRISPR-Cas9 targets in the human genome.
  • Combines information about the chromatin environment and sequence of the target site.
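The off-target definition above (other genomic regions within 0 to 3 mismatches of a target) can be illustrated with a minimal brute-force sketch. This is not how GT-Scan2 does it; GT-Scan2 uses the indexed aligner Bowtie. The function names and the toy sequence are hypothetical, and only the forward strand is scanned for simplicity.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def count_off_targets(target: str, genome: str, max_mismatches: int = 3) -> dict[int, int]:
    """Count genome windows within max_mismatches of target, grouped by mismatch count.

    Hypothetical helper: brute force over the forward strand only.
    The exact match (0 mismatches) includes the on-target site itself.
    """
    k = len(target)
    counts = {m: 0 for m in range(max_mismatches + 1)}
    for i in range(len(genome) - k + 1):
        m = hamming(target, genome[i:i + k])
        if m <= max_mismatches:
            counts[m] += 1
    return counts


if __name__ == "__main__":
    print(count_off_targets("ACGT", "ACGTTCGTACGA"))  # {0: 1, 1: 2, 2: 0, 3: 1}
```

A scan like this costs one comparison per genome position per target, which is exactly why an index-based tool is needed for a 3-billion-letter genome.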

Architecture

A web application front end is used to access GT-Scan2 and to submit jobs.


When a user submits a job, GT-Scan2 inserts the job parameters as an item into a TableStore table via an API call. This lets the solution scale freely without creating a bottleneck at submission time. The new database entry triggers the first Function Compute function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon submission). Because potential CRISPR target sites follow fixed rules, a regular expression can find them within seconds; the resulting targets are inserted into a second TableStore table.
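The regular-expression step described above can be sketched as follows. The article does not state which rules GT-Scan2 uses, so this assumes the common SpCas9 rule: a 20-nucleotide protospacer immediately followed by an NGG PAM, searched on both strands. The pattern and function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: SpCas9 sites, i.e. a 20-nt protospacer followed by an NGG PAM.
# The lookahead makes finditer report overlapping sites, which a plain pattern would skip.
SPCAS9_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq: str) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam) for every candidate site.

    Start positions are 0-based and refer to the strand the site was found on.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in SPCAS9_SITE.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    for hit in find_targets("ttgacgtacgtacgtacgtacgatcgg"):
        print(hit)
```

A single linear pass per strand is enough here, which is consistent with the article's point that this step finishes in seconds; the expensive part is the off-target search that follows.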

GT-Scan2 is served directly from OSS, making it a static web app with no server-side processing. A JavaScript framework in the browser retrieves the dynamic content, such as job parameters and results, from the NoSQL database (TableStore) through API calls made via API Gateway.

Applying Serverless Computing

All potential targets need to be evaluated for their off-target risk using Bowtie, an efficient string-matching tool. Although Bowtie only requires a reduced representation (an index) of the 3-billion-letter genomic sequence, the index files for the human genome still reach 915 MB. Alibaba Cloud Function Compute supports temporary storage of this size, but the implementation instead divides the genome into smaller blocks so that they can be processed in parallel. As a result, an average GT-Scan2 run triggers 200 to 500 individual Function Compute invocations, which simultaneously update the scores of the putative targets in TableStore. Meanwhile, the front end polls this table via API Gateway and updates the web page as results arrive, so no server-side compute is needed.
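The block-splitting and fan-out pattern described above can be sketched locally. This is an illustration under stated assumptions, not GT-Scan2's code: a thread pool stands in for concurrent Function Compute invocations, a brute-force mismatch count stands in for Bowtie, and a `Counter` stands in for the per-target scores in TableStore. Block size, worker count, and all function names are hypothetical, and all targets are assumed to have the same length.

```python
from __future__ import annotations

from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(genome: str, block_size: int, k: int) -> list[str]:
    """Split genome so each k-mer start position belongs to exactly one block.

    Each block carries k - 1 extra bases, so k-mers that span a block
    boundary are still seen once, by the block in which they start.
    """
    return [
        genome[start:start + block_size + k - 1]
        for start in range(0, len(genome) - k + 1, block_size)
    ]


def score_block(block: str, targets: list[str], max_mismatches: int = 3) -> Counter:
    """Stand-in for one function invocation: count near-matches per target in one block."""
    hits: Counter = Counter()
    for target in targets:
        k = len(target)
        for i in range(len(block) - k + 1):
            mismatches = sum(1 for a, b in zip(target, block[i:i + k]) if a != b)
            if mismatches <= max_mismatches:
                hits[target] += 1
    return hits


def parallel_off_target_counts(genome: str, targets: list[str],
                               block_size: int = 1000, workers: int = 8) -> Counter:
    """Fan out one task per block, then merge the partial counts (same-length targets assumed)."""
    k = len(targets[0])
    blocks = split_into_blocks(genome, block_size, k)
    totals: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda block: score_block(block, targets), blocks):
            totals.update(partial)  # stand-in for per-target score updates in TableStore
    return totals


if __name__ == "__main__":
    genome = "ACGT" * 50 + "TTGCA" * 30
    targets = ["ACGTACGT", "TTGCATTG"]
    # Splitting into blocks must not change the result of a single serial scan.
    print(parallel_off_target_counts(genome, targets, block_size=37) == score_block(genome, targets))  # True
```

The k - 1 base overlap is the detail that matters: without it, a match straddling two blocks would be missed by both. Python threads do not speed up this CPU-bound loop; the sketch only shows the structure of the split and merge.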

Alibaba Cloud Function Compute provides a foundation for a future-ready software package that can support medical genome engineering applications. It scales instantly at run time by spawning as many functions as each job needs, which copes with the varying complexity of different genes. Other benefits include paying only for storage when no compute is triggered; jobs that do not compete with web server resources, because the website is a static page whose dynamic content is updated through Angular 2 and API Gateway; and no compute instances to maintain (for example, OS security patches).

Improvements

The GT-Scan2 deployment benefited from several architectural patterns and services specific to Alibaba Cloud, including the following.

  • Uses the asynchronous invocation method instead of queue-based triggers. This shortens invocation times and removes the dependency on a message queue.
  • Applies batch reads and writes when accessing data in the NoSQL database, making I/O more efficient.
  • Streams all logs to Alibaba Cloud Log Service, which makes it easier to troubleshoot workflow operations. With all logs in a single location, users can pinpoint issues without spending time logging in to servers or individual service consoles.
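The batch read/write point can be illustrated with a small, SDK-agnostic sketch: rows are grouped so each request carries many rows instead of one. The `write_batch` callback is a hypothetical placeholder for whatever batch call the TableStore SDK provides, and the default batch size of 100 is an assumption, not a documented TableStore limit.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_scores(rows: Iterable[dict], write_batch: Callable[[list[dict]], None],
                 size: int = 100) -> int:
    """Send rows through the hypothetical write_batch callback; return the request count."""
    requests = 0
    for batch in batched(rows, size):
        write_batch(batch)  # one round trip per batch instead of one per row
        requests += 1
    return requests


if __name__ == "__main__":
    sent: list[list[dict]] = []
    rows = [{"target_id": i, "off_targets": 0} for i in range(250)]
    print(write_scores(rows, sent.append, size=100))  # 3 requests instead of 250
    print([len(b) for b in sent])  # [100, 100, 50]
```

With hundreds of functions updating scores at once, cutting the number of round trips per function is where the I/O savings come from.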


Automated Deployments

The open-source Fun tool (Fun with Serverless) enables automated deployment of API Gateway and Function Compute resources defined in a simple YAML file, making deployments of new GT-Scan2 versions a breeze.

What's Next?

Analytics

Using Alibaba Cloud's award-winning big data platform to create a machine learning pipeline will enable sophisticated analyses to be integrated into the application. This is especially relevant for personalized health applications, which identify editing strategies for individual patients.


Log Analysis

Alibaba Cloud Log Service allows log files to be exported for later analysis, using either Alibaba Cloud's big data platform or the existing open-source analysis platforms at CSIRO's disposal. The exported logs can then be fed into an existing machine learning pipeline to learn from the usage patterns of the GT-Scan2 application.


Ref

https://community.alibabacloud.com/blog/gt-scan2%253A-bringing-bioinformatics-to-alibaba-cloud_593841?spm=a2c65.11461537.0.0.62ef5355hBhpcO
