GT-Scan2: Bringing Bioinformatics to Alibaba Cloud

Summary: Learn how Alibaba Cloud powers GT-Scan2, a cutting-edge genome sequence search tool, with its suite of big data products and serverless computing platform.

CRISPR-Cas9 is a genome editing tool that is creating a buzz in the science world. It is faster, cheaper, and more accurate than previous techniques for editing the genome of living cells, and therefore has the potential to revolutionize a wide range of applications.

CRISPR-Cas9 holds particular promise in the health space, because it allows the treatment of medical conditions that have a genetic component, including cancer, hepatitis B, and even high cholesterol. Clinical trials have already started for patients with specific blood and solid cancer types.

CRISPR-Cas9 is suitable for these applications because it can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. However, for robust application in the clinic, the efficiency of CRISPR-Cas9 needs to be increased as does the speed with which target sites can be designed.

Researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia developed GT-Scan2, a novel software tool that addresses both issues.

GT-Scan2 helps researchers find the most effective CRISPR-Cas9 targets in a genomic region by ranking targets according to their predicted cutting efficiency. You can think of it as a "search engine for the genome". GT-Scan2 also reports the number of potential off-targets for each target, where a potential off-target is any other region in the genome that differs from the target by 0-3 mismatches.

  • Identifies optimal CRISPR-Cas9 targets in the human genome.
  • Combines information about the chromatin environment and sequence of the target site.
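
To make the "0-3 mismatches" rule concrete, the following minimal sketch scans a toy sequence for windows that differ from a target in at most three positions. It is a brute-force illustration only, not how GT-Scan2 performs the search (the tool relies on Bowtie for this step). The function name `find_near_matches` and the toy sequences are assumptions made for this example.

```python
from typing import List, Tuple


def find_near_matches(target: str, genome: str, max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (position, mismatch_count) for every window of the genome
    that differs from the target in at most max_mismatches positions."""
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        # Hamming distance between the target and this window.
        mismatches = sum(a != b for a, b in zip(target, window))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Toy data (hypothetical): the target occurs exactly at position 0.
genome = "ACGTACGTTTTTACGAACGTGGGG"
target = "ACGTACGT"
hits = find_near_matches(target, genome)
# Every hit other than the on-target site itself counts as a potential off-target.
off_targets = [hit for hit in hits if hit[0] != 0]
```

On real genomes this linear scan would be far too slow, which is why an indexed aligner is used instead.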

Architecture

A web application front end is used to access GT-Scan2 and to submit jobs.

Image_1

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a TableStore table via an API call. This allows the solution to scale freely without creating a bottleneck. The database entry triggers the first Function Compute function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites follow fixed rules, so they can be found with a regular expression search that completes in seconds. The sites found are then inserted into a second TableStore table.
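
As an illustration of the regular-expression step, here is a minimal sketch. The article does not state the exact rule GT-Scan2 applies, so the pattern below assumes the commonly used SpCas9 convention: a 20-nucleotide protospacer followed by an NGG PAM, with the reverse strand appearing as CCN followed by 20 nucleotides. The function name `find_candidate_targets` is hypothetical.

```python
import re
from typing import List, Tuple

# Lookaheads let overlapping candidate sites be reported.
# Assumed rule: 20-nt protospacer + NGG PAM (forward strand),
# CCN + 20 nt (the same motif seen from the reverse strand).
FORWARD_SITE = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")
REVERSE_SITE = re.compile(r"(?=(CC[ACGT][ACGT]{20}))")


def find_candidate_targets(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (start, strand, 23-nt site) for every candidate site."""
    seq = sequence.upper()
    hits = []
    for match in FORWARD_SITE.finditer(seq):
        hits.append((match.start(), "+", match.group(1)))
    for match in REVERSE_SITE.finditer(seq):
        hits.append((match.start(), "-", match.group(1)))
    return sorted(hits)


# Toy input (hypothetical): one forward-strand site starting at position 0.
sites = find_candidate_targets("a" * 20 + "tgg")
```

Each tuple could then be stored as one row in the second table, keyed by job and position.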

Image_2
GT-Scan2 is served directly from Object Storage Service (OSS), making it a static web app with no server-side processing. A JavaScript framework retrieves the dynamic content (such as job results and parameters) from a NoSQL database (TableStore) through API calls made via API Gateway.

Applying Serverless Computing

All potential targets need to be evaluated for their off-target risk using the efficient string matching tool Bowtie. Although Bowtie requires only a reduced representation of the 3-billion-letter genomic sequence, the index files for the human genome still total 915 MB. Even though Alibaba Cloud Function Compute supports temporary storage of this size, the implementation divides the genome into smaller blocks to enable parallel processing. An average run of GT-Scan2 therefore triggers 200-500 individual Function Compute functions, which simultaneously update the scores for the different putative targets in TableStore. During this process, the front end polls this table via API Gateway and updates the webpage as results come in, eliminating the need for server-side compute.
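
The divide-and-scan pattern can be sketched as follows. This is an assumption-laden toy: a thread pool stands in for the parallel Function Compute invocations, a brute-force mismatch scan stands in for Bowtie, and the block layout (consecutive blocks overlapping by the target length minus one, so that no window straddling a boundary is lost or counted twice) is one possible scheme rather than the one GT-Scan2 uses. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_into_blocks(genome: str, block_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split the genome into (offset, block) pairs sharing `overlap` characters."""
    step = block_size - overlap
    if step <= 0:
        raise ValueError("block_size must be larger than overlap")
    blocks = []
    for start in range(0, len(genome), step):
        blocks.append((start, genome[start:start + block_size]))
        if start + block_size >= len(genome):
            break  # this block already reaches the end of the genome
    return blocks


def scan_block(target: str, offset: int, block: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Stand-in for one function invocation: scan a single block."""
    k = len(target)
    hits = []
    for i in range(len(block) - k + 1):
        mismatches = sum(a != b for a, b in zip(target, block[i:i + k]))
        if mismatches <= max_mismatches:
            hits.append((offset + i, mismatches))  # report genome coordinates
    return hits


def scan_genome_in_blocks(target: str, genome: str, block_size: int,
                          max_mismatches: int = 3, workers: int = 4) -> List[Tuple[int, int]]:
    """Fan the blocks out to a pool of workers and merge the per-block hits."""
    blocks = split_into_blocks(genome, block_size, overlap=len(target) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: scan_block(target, b[0], b[1], max_mismatches), blocks)
        return sorted(hit for block_hits in results for hit in block_hits)


merged = scan_genome_in_blocks("ACGTACGT", "ACGTACGTTTTTACGAACGTGGGG", block_size=12)
```

Because each block is processed independently, the number of workers can grow with the size of the search, which is the property the serverless design exploits.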

Alibaba Cloud Function Compute provides a framework for developing a future-ready software package that can support medical genome engineering applications. It scales instantaneously at run time, spawning the appropriate number of functions to cope with the varying complexity of different genes. Other benefits include paying only for storage when no compute is triggered; jobs not competing with web server resources, because the website is a static page whose dynamic content is updated through Angular 2 and API Gateway; and not needing to maintain compute instances (for example, applying OS security patches).

Improvements

The GT-Scan2 deployment benefited from architectural patterns and services specific to Alibaba Cloud. Some of them are listed below.

  • Uses the asynchronous invoke method instead of queue-based triggers. This shortens invoke times and removes the dependency on a message queue.
  • Applies batch reads and writes when accessing data in the NoSQL database, making I/O more efficient.
  • Streams all logs to Alibaba Cloud Log Service, which makes it easier to troubleshoot issues with the workflow operations. With logs in a single location, users can pinpoint issues without spending time logging in to servers or individual service consoles.
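
The batching idea from the list above can be sketched generically. The helper below groups rows into fixed-size batches and hands each batch to a writer callable, so that many rows cost only a few round trips. The names `chunked` and `write_in_batches`, the `write_batch` callable, and the default batch size of 200 are all assumptions for illustration; in a real deployment the callable would wrap the database client's batch-write request and the size would follow the service's documented limit.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def write_in_batches(rows: Iterable[T],
                     write_batch: Callable[[List[T]], None],
                     batch_size: int = 200) -> int:
    """Send rows through write_batch in groups; return the number of calls made."""
    calls = 0
    for batch in chunked(rows, batch_size):
        write_batch(batch)  # hypothetical stand-in for one batch-write request
        calls += 1
    return calls


# Usage with a list standing in for the database: 450 rows need 3 requests.
sent_batches: List[List[int]] = []
request_count = write_in_batches(range(450), sent_batches.append)
```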

Image_3

Automated Deployments

The open-source Fun tool (Fun with Serverless) will enable automated deployments of API Gateway and Function Compute resources, making deployments of new GT-Scan2 versions a breeze. The tool deploys components defined in a simple YAML file.

What's Next?

Analytics

Leveraging Alibaba Cloud's award-winning big data platform to create a machine learning pipeline will enable sophisticated analyses to be integrated into the application. This is of specific relevance for personalized health applications, which identify editing strategies for individual patients.

Image_4

Log Analysis

Alibaba Cloud Log Service allows log files to be exported for future analysis, using either Alibaba Cloud's big data platform or the existing open-source analysis platforms at CSIRO's disposal. The exported logs can then be plugged into an existing machine learning pipeline to learn from the usage patterns of the GT-Scan2 application.

Image_5

Ref

https://community.alibabacloud.com/blog/gt-scan2%253A-bringing-bioinformatics-to-alibaba-cloud_593841?spm=a2c65.11461537.0.0.62ef5355hBhpcO
