Working with Big Data on Alibaba Cloud

You know Alibaba Cloud can be used to deploy applications. You may be less familiar with its Big Data storage and management options.

In fact, Alibaba Cloud offers a range of Big Data solutions. This article outlines them and explains which Big Data services on Alibaba Cloud suit which workloads.

Data Storage

Let's start with storage, since that is the most fundamental requirement of Big Data. OSS (Object Storage Service) is Alibaba's high-volume, cloud-based data storage service. It is available for storing extremely large quantities of data of any type, and from any source.

OSS can be used for data that must be accessed frequently (such as multimedia files), as well as for archival and other low-use purposes. It includes tools for migrating large quantities of data to and from OSS, along with an SDK and a REST API.


The SDK supports the major front-end and back-end web languages, as well as Android and iOS. Its commands cover object upload, download, and management; image processing and manipulation; and web-oriented features such as static website hosting and access management.
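
The upload and download functions mentioned above can be sketched as a simple round trip. This is a minimal illustration rather than official sample code: it assumes a bucket object exposing put_object(key, data) and get_object(key).read(), which is the shape of the Bucket class in the oss2 Python SDK. The InMemoryBucket class, the round_trip helper, and the object key are hypothetical names introduced here so that the sketch runs without credentials or network access.

```python
# Sketch of an upload/download round trip against an OSS-style bucket.
# Assumption: `bucket` exposes put_object(key, data) and get_object(key).read(),
# as the oss2 Python SDK's Bucket class does.
import io


class InMemoryBucket:
    """Hypothetical stand-in that mimics the two bucket calls used below."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, data):
        # Store a copy of the payload under the given object key.
        self._objects[key] = bytes(data)

    def get_object(self, key):
        # Return a file-like object, so callers can use .read().
        return io.BytesIO(self._objects[key])


def round_trip(bucket, key, payload):
    """Upload payload under key, download it again, and return the bytes."""
    bucket.put_object(key, payload)
    return bucket.get_object(key).read()


# With the real SDK, the bucket would be built roughly like this
# (endpoint, bucket name, and credentials are placeholders):
#   import oss2
#   auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
#   bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<bucket-name>")

demo_bucket = InMemoryBucket()
result = round_trip(demo_bucket, "media/example.txt", b"hello, OSS")
print(result)
```

Because round_trip depends only on those two methods, the same function should work unchanged when handed a real bucket object in place of the in-memory stand-in.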

Multimedia and Image Files

OSS is particularly well suited to handling high volumes of multimedia and image files. It can be used with both websites and apps for storage, streaming and other forms of serving, transcoding, and image format conversion. OSS can also serve large volumes of data for rapid download.

OSS, however, is only one part of Alibaba Cloud's Big Data infrastructure. Storage may be fundamental, but it is what you can do with the stored data that makes the difference.

Data IDE and MaxCompute

Data IDE is Alibaba Cloud's overall framework for managing Big Data. It takes care of basic functions such as scheduling, monitoring, and access-permission control. Because it handles much of the underlying architecture and many routine management tasks, you can concentrate on developing and operating large, data-oriented projects.

Data Processing Tools

Data IDE works closely with MaxCompute, Alibaba's platform for processing Big Data. MaxCompute includes a variety of tools for analyzing and processing very large volumes of data, including its own SQL dialect, graph-computing and MapReduce functions, and concurrent upload and download functions. It also includes an extensive SDK and a full set of security features.
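
For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is plain Python, not MaxCompute's MapReduce API; the function names and sample records are made up for illustration.

```python
# Single-process illustration of the MapReduce programming model.
# This is NOT MaxCompute's API; it only shows the map -> shuffle -> reduce flow.
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


records = ["big data on the cloud", "big data tools", "cloud storage"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts["big"], counts["cloud"], counts["storage"])  # prints: 2 2 1
```

In a distributed engine the map and reduce functions run in parallel across many machines and the shuffle moves data between them; the logic per record and per key stays this simple.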

Working together, Data IDE and MaxCompute allow you to manage, process, and query large amounts of data. Because they simplify many of the processes involved in handling Big Data, they can significantly reduce the time required to launch a large, complex, data-intensive website. They can also help reduce the volume and cost of storage and data processing, and they provide a solid basis for in-depth analytics.


Alibaba Cloud also offers E-MapReduce, a framework for managing and processing Big Data that is based on Hadoop and Apache Spark. Hadoop and Spark cluster services form its core. The advantage of E-MapReduce is that it takes care of many of the low-level tasks required for cluster creation and provisioning, while also providing an integrated framework for managing and using clusters.

Because E-MapReduce is based on Hadoop and Spark cluster services, you can use the storage and computation capacity it provides as if it were a self-contained system running on its own host, rather than standard cloud storage.

E-MapReduce Architecture

Architecturally, E-MapReduce consists of an agent layer at the base, with the HDFS and Tachyon file systems sitting directly above it. Above those sits the full Hadoop ecosystem, along with Spark and a wide variety of Apache tools. The top layer is the web-based user-administration interface, which makes it easy to use and manage the underlying tools and systems.

Full Hadoop/Spark Capabilities—The Easy Way

What this means is that if you can do it using Hadoop, Apache Spark, or their associated tools, you can do it in E-MapReduce, and far more easily than if you had to set up and provision Hadoop or Spark from scratch.

E-MapReduce integrates with the other Big Data-oriented elements of Alibaba Cloud. It can work with apps running on Alibaba Cloud Elastic Compute Service (ECS), and it can process data stored in OSS. It can also send data to MaxCompute and take MaxCompute output for further processing.

E-MapReduce can be used to process and serve massive amounts of data. Its Spark-based features make it particularly suitable for streaming large volumes of data.
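
To give a feel for how Spark-style stream processing works, the sketch below illustrates the micro-batch idea used by Spark Streaming: incoming events are cut into small batches, and an aggregate is updated after each one. It is plain Python rather than Spark's API, and the function names, batch size, and event values are assumptions chosen for illustration.

```python
# Illustration of micro-batch stream processing in plain Python.
# This is NOT Spark's API; it only mimics the "small batches + running state" pattern.
from itertools import islice


def micro_batches(events, batch_size):
    """Yield consecutive lists of up to batch_size events from any iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def running_totals(events, batch_size):
    """Return the cumulative sum of event values after each micro-batch."""
    total = 0
    totals = []
    for batch in micro_batches(events, batch_size):
        total += sum(batch)
        totals.append(total)
    return totals


# Hypothetical event sizes (for example, bytes received per message).
event_sizes = [120, 80, 200, 50, 75, 300, 10]
totals = running_totals(event_sizes, batch_size=3)
print(totals)  # prints: [400, 825, 835]
```

A real streaming engine slices batches by time rather than by count and distributes each batch across a cluster, but the running-state pattern is the same.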

The Big Data Picture

What can you do with Alibaba's Big Data tools and services? E-MapReduce and MaxCompute both provide a wide range of tools for fundamental Big Data tasks such as rapidly sorting, searching, and analyzing extremely large volumes of data.

You can use Alibaba Cloud's Big Data features to set up and manage backend services for high-volume, data-intensive websites that provide streaming services, generate large amounts of user upload and download traffic, or rapidly return search results from massive quantities of data.

You can also use the same features to process and manage large media files, to efficiently handle extremely large databases in situations where rapid retrieval is important, or to deal with the processing and storage requirements of unique or industry-specific streams of high-volume data.

What does Alibaba Cloud do for you when it comes to Big Data? It can give you the tools, the storage, and the services you need to get your Big Data operation up and running the way you want it to run: quickly, easily, and with a minimum of overhead in time, effort, or expense.

Michael Churchman


Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular contributor.
