How Does Alibaba Cloud Power the Biggest Online Shopping Festival?

Summary: Have you ever wondered what the underlying technology behind Alibaba Single's Day Shopping Festival (also known as 11-11) is like?

Author: Alibaba Group Senior Staff Engineer Ding Yu

Have you ever wondered what the underlying technology behind Alibaba Single's Day Shopping Festival (also known as 11-11) is like? With sales reaching over US$17.8 billion in 2016, Single's Day has become the largest online shopping day in the world!

Alibaba Cloud's infrastructure has evolved rapidly to cope with increasing demands from the entire Alibaba ecosystem, especially for Single's Day. From 2009 to 2016, peak transaction volume increased more than 400-fold!


Figure 1: Peak transaction volume on Single's Day from 2009 to 2016

Such a feat can only be achieved with a robust computing architecture, one capable not only of handling bursty traffic but also of recovering quickly from system faults. While sales revenue typically grows linearly with transaction volume, system complexity grows exponentially at such a large scale. What's more, deploying and maintaining such a complex system is labor-intensive and costly.

Designing a High Availability Infrastructure

As the Architect for Single's Day since 2009, I will share with you some of our key strategies in designing our infrastructure.

Although cloud computing has freed us from the geographical constraints of data centers, supporting an event such as Single's Day isn't as straightforward as simply adding more servers. We need to know precisely how much computing power we need to ensure high availability and reliability while keeping costs at a minimum.

Alibaba Cloud tackles this problem from multiple angles:
1. Comprehensive load testing on system architecture
2. System architecture fault simulation
3. Cross-region server deployment
4. Automated intelligent control

We will cover these four topics in further detail in the following sections.


Figure 2: Enterprise high availability design

Comprehensive Load Testing on System Architecture

Load testing is a standard part of performance testing for most systems. We simulate the traffic load of Single's Day and run it against our existing infrastructure, using traffic data collected in previous years together with predictions that account for this year's growth. An important purpose of load testing is not only to discover the maximum capacity but also to determine which applications and services customers use most during this period.
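As a rough illustration of the idea, the following minimal sketch projects a target load from earlier peaks and then ramps the load up until a latency objective is exceeded. The function names, the 20% safety margin, the traffic figures, and the stand-in latency function are all assumptions made for this example; they are not our production tooling.

```python
def projected_peak(previous_peaks, safety_margin=0.2):
    """Extrapolate the next peak from the latest year-over-year growth."""
    if len(previous_peaks) < 2:
        raise ValueError("need at least two years of peak data")
    growth = previous_peaks[-1] / previous_peaks[-2]
    return round(previous_peaks[-1] * growth * (1 + safety_margin))


def ramp_steps(target, steps=4):
    """Split the target load into equally spaced, increasing test levels."""
    return [target * i // steps for i in range(1, steps + 1)]


def max_sustainable_load(levels, measure_latency_ms, slo_ms):
    """Return the highest tested level whose latency stays within the SLO."""
    best = 0
    for load in levels:
        if measure_latency_ms(load) > slo_ms:
            break
        best = load
    return best


def fake_latency_ms(load):
    # Stand-in for a real measurement (assumption): latency degrades
    # sharply once the load passes 150,000 transactions per second.
    return 50 if load <= 150_000 else 900


target = projected_peak([40_000, 80_000])  # hypothetical peaks, in tps
levels = ramp_steps(target)
print(target, levels, max_sustainable_load(levels, fake_latency_ms, slo_ms=200))
```

In a real test, `fake_latency_ms` would be replaced by measurements taken while replaying recorded and predicted traffic against the infrastructure.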

System Architecture Fault Simulation

Essentially, fault simulation is a form of stress testing on our system architecture. We intentionally disable certain services and overload the system with heavy loads. In particular, we look for any single points of failure (SPOFs) in our architecture and eliminate them.
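One simple way to picture the search for SPOFs is to model the architecture as a graph, disable one component at a time, and check whether a request can still reach its destination. The sketch below does exactly that with a brute-force search; the topology and component names are hypothetical, and a real fault simulation runs against live services, not a static graph.

```python
from collections import deque


def reachable(graph, start, goal, disabled):
    """Breadth-first search from start to goal, skipping the disabled node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != disabled and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def find_spofs(graph, start, goal):
    """Return every component whose loss alone cuts start off from goal."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    candidates = nodes - {start, goal}
    return sorted(n for n in candidates if not reachable(graph, start, goal, n))


# Hypothetical topology: two load balancers, two app servers, one cache.
topology = {
    "client": ["lb1", "lb2"],
    "lb1": ["app1", "app2"],
    "lb2": ["app1", "app2"],
    "app1": ["cache"],
    "app2": ["cache"],
    "cache": ["db"],
}
print(find_spofs(topology, "client", "db"))
```

Here the load balancers and app servers are redundant, so only the single cache node is reported; adding a second cache path would remove it from the result.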

Cross-Region Server Deployment

In most scenarios, servers run within a single region only. However, this approach may not be sufficient when faced with the extreme loads of Single's Day. Therefore, we use cross-region deployment to expand capacity and improve service availability. We distribute users across servers based on user ID and employ an active-active configuration in our clusters to maintain high availability and achieve seamless service handover. In addition, data is backed up across multiple sites to enhance disaster recovery capabilities.


Figure 3: High availability multi-region cluster
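The user ID based split and the active-active handover described above can be sketched as follows. This is a minimal illustration under stated assumptions: the region names are made up, CRC32 is chosen here only because it is a stable hash available in the standard library, and the "try the next region" fallback is one possible handover policy, not necessarily the one we use.

```python
import zlib

REGIONS = ["region-a", "region-b", "region-c"]  # hypothetical region names


def home_index(user_id, regions=REGIONS):
    """Map a user ID to a stable region index using a CRC32 hash."""
    return zlib.crc32(user_id.encode("utf-8")) % len(regions)


def route(user_id, healthy, regions=REGIONS):
    """Send the user to the home region, or to the next healthy one."""
    start = home_index(user_id, regions)
    for offset in range(len(regions)):
        candidate = regions[(start + offset) % len(regions)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy region available")


all_up = set(REGIONS)
home = route("user-1001", all_up)
print(home, route("user-1001", all_up - {home}))
```

Because every region actively serves traffic, a user whose home region fails is simply routed to another region that is already running.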

Automated Intelligent Control

Even with all of the technologies discussed previously, it is almost impossible to control traffic flow and scale resources manually in a large system. That is why we use automated intelligent control, which focuses on traffic control and fault recovery.

Because we don't have access to unlimited resources, there is always a possibility of overload. To handle this problem, we prioritize users based on the type of request. For example, customers completing purchases should be prioritized over users who are only browsing the website. Once requests are prioritized, we place them in a queue and serve them in queue order. We can also adjust the quality of service that users receive based on this queueing system.


Figure 4: User traffic control
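The prioritized queue can be illustrated with a small sketch. The request types, their priority values, and the `RequestQueue` class are assumptions for this example; it shows only the ordering logic, not a production traffic controller.

```python
import heapq
import itertools

# Lower number means higher priority (illustrative values).
PRIORITY = {"checkout": 0, "add_to_cart": 1, "browse": 2}


class RequestQueue:
    """Serve requests by priority, first come first served within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, user, request_type):
        entry = (PRIORITY[request_type], next(self._counter), user)
        heapq.heappush(self._heap, entry)

    def drain(self, capacity):
        """Serve at most `capacity` requests; the rest stay queued."""
        served = []
        while self._heap and len(served) < capacity:
            _, _, user = heapq.heappop(self._heap)
            served.append(user)
        return served


queue = RequestQueue()
queue.submit("alice", "browse")
queue.submit("bob", "checkout")
queue.submit("carol", "browse")
queue.submit("dave", "checkout")
print(queue.drain(capacity=3))
```

With capacity for three requests, both checkout requests are served before any browsing request, and the remaining browsing request waits for the next round.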

As the number of devices increases, the probability of a device fault increases as well. When a server fails, our system detects the anomaly and reassigns the user to the nearest available server. This automatic approach significantly reduces delay, which in turn improves user experience and minimizes O&M costs. In addition, the system triggers alarms to notify our engineers of these faults, helping our team quickly locate and troubleshoot them.


Figure 5: Server fault recovery
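The reassignment step can be sketched as "pick the nearest healthy server and raise an alarm for each failed one that was skipped". The `Server` class, the one-dimensional positions, and the alarm messages below are simplifications assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    position: float  # simplified one-dimensional location (assumption)
    healthy: bool = True


def pick_server(user_position, servers, alarms):
    """Return the nearest healthy server, recording an alarm for each
    failed server that is closer to the user than the one chosen."""
    ranked = sorted(servers, key=lambda s: abs(s.position - user_position))
    for server in ranked:
        if server.healthy:
            return server.name
        alarms.append(f"server {server.name} is down")
    raise RuntimeError("no healthy server available")


servers = [Server("a", 0.0), Server("b", 10.0), Server("c", 50.0)]
alarms = []
print(pick_server(2.0, servers, alarms))  # nearest server while all are healthy
servers[0].healthy = False                # simulate a fault on server "a"
print(pick_server(2.0, servers, alarms), alarms)
```

The user is moved without manual intervention, while the alarm list gives engineers a starting point for troubleshooting.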

Conclusion

As we can see, powering an event as large as Single's Day is no easy task. With proper planning and design, we can cope with even the most unexpected challenges of this event. We are confident that our evolved architecture can achieve a lot more for this year's Single's Day festival!

However, one question springs to mind: what do we do with all this computing power when the festival ends? For most of our systems, we adopt a hybrid cloud environment. With hybrid cloud, we can scale resources up as required and maintain a "lighter" system when the load is low (such as when the Single's Day festival ends). This way, we can minimize operating costs while maximizing our capacity.
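The cost argument can be made concrete with a small calculation. In this sketch, the split between "owned" and "rented" servers, the per-server throughput, and the load figures are all hypothetical; it only shows how burst capacity drops back to zero once the load falls.

```python
import math


def plan_capacity(load_tps, per_server_tps, owned_servers):
    """Cover the load with owned servers first and rent only the shortfall."""
    needed = max(1, math.ceil(load_tps / per_server_tps))
    return {
        "needed": needed,
        "owned_in_use": min(needed, owned_servers),
        "rented": max(0, needed - owned_servers),
    }


# Hypothetical figures: festival peak versus an ordinary day.
print(plan_capacity(load_tps=100_000, per_server_tps=500, owned_servers=50))
print(plan_capacity(load_tps=10_000, per_server_tps=500, owned_servers=50))
```

During the peak, most capacity is rented; on an ordinary day the owned servers are more than enough and nothing extra is paid for.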

In addition, we use Alibaba Cloud's core products as well as our family of distributed middleware. Currently, our distributed middleware offerings are limited to Mainland China customers, but we hope to make them available to customers across the globe soon.

If you want to learn more about the underlying technology for Alibaba Single's Day, please check out my presentation video at The Computing Conference 2017.

If you are interested in building your own infrastructure with Alibaba Cloud products, you should definitely check out our attractive offers on 11-11 Cloud Deals!

Core Products (available globally):
• Elastic Compute Service (ECS)
• Server Load Balancer (SLB)
• Auto Scaling
• ApsaraDB for RDS
• CDN

Distributed Middleware (currently only available in Mainland China):
• Distributed Relational Database Service (DRDS)
• Cloud Service Bus (CSB)
• Global Transaction Service (GTS)
• Application Real-Time Monitoring Service (ARMS)
• Message Queue (MQ)
• Enterprise Distributed Application Service (EDAS)
