Working with Big Data on Alibaba Cloud

You know Alibaba Cloud can be used to deploy applications. You may be less familiar with its Big Data storage and management options.

In fact, Alibaba offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.

Data Storage

Let's start with storage, since that is the most fundamental requirement of Big Data. OSS (Object Storage Service) is Alibaba's high-volume, cloud-based data storage service. It is available for storing extremely large quantities of data of any type, and from any source.

OSS can be used for data that must be accessed frequently (such as multimedia files), as well as for archival and other low-use purposes. It includes tools for migrating large quantities of data to and from the OSS storage system, along with an SDK, and a REST API.


The SDK provides interfaces for the major front-end and back-end web and web-service languages, as well as for Android and iOS. SDK commands for these languages and platforms cover a wide range of functions, including object upload, download, and management; image processing and manipulation; and web-oriented features such as static website hosting and access management.
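As a rough illustration of the upload and download functions described above, here is a minimal sketch in Python. It assumes the oss2 package (the OSS SDK for Python) and uses its Auth, Bucket, put_object, and get_object calls. The helper names, the InMemoryBucket stand-in, and the object key are hypothetical and exist only so the sketch can be tried without an Alibaba Cloud account.

```python
import io


def make_bucket(access_key_id, access_key_secret, endpoint, bucket_name):
    """Build a bucket handle with the oss2 SDK (assumes `pip install oss2`)."""
    import oss2  # imported lazily so the helpers below work without the SDK

    auth = oss2.Auth(access_key_id, access_key_secret)
    return oss2.Bucket(auth, endpoint, bucket_name)


def upload_text(bucket, key, text):
    """Store a UTF-8 string as an object under the given key."""
    bucket.put_object(key, text.encode("utf-8"))


def download_text(bucket, key):
    """Read an object back and decode it as UTF-8."""
    return bucket.get_object(key).read().decode("utf-8")


class InMemoryBucket:
    """Hypothetical stand-in that mimics only the two methods used above."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, data):
        self._objects[key] = bytes(data)

    def get_object(self, key):
        return io.BytesIO(self._objects[key])


if __name__ == "__main__":
    # Swap in make_bucket(...) with real credentials to talk to OSS itself.
    bucket = InMemoryBucket()
    upload_text(bucket, "reports/summary.txt", "hello from OSS")
    print(download_text(bucket, "reports/summary.txt"))
```

Because the helpers only rely on put_object and get_object, the same code should work against either the stand-in or a real bucket handle returned by make_bucket.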

Multimedia and Image Files

OSS is particularly well suited to handling high volumes of multimedia and image files. It can be used in conjunction with both websites and apps for storage, streaming and other forms of serving, transcoding, and image format conversion. OSS can also be used to provide large volumes of data for rapid download.

OSS, however, is only one part of Alibaba Cloud's Big Data infrastructure. Storage may be fundamental, but it is what you can do with the stored data that makes all the difference.

Data IDE and MaxCompute

Data IDE is Alibaba Cloud's overall framework for managing Big Data, and for taking care of such basic functions as scheduling, monitoring, and control of access permissions. It handles much of the underlying architecture, as well as many basic management tasks, allowing you to concentrate on the development and operation of large, data-oriented projects.

Data Processing Tools

Data IDE works closely with MaxCompute, Alibaba's platform for processing Big Data. MaxCompute includes a variety of tools for analyzing and processing very large volumes of data, including its own SQL dialect, graph-computing and MapReduce functions, and concurrent upload and download functions. It also includes an extensive SDK and a full set of security features.

Working together, Data IDE and MaxCompute allow you to manage, process, and query large amounts of data. Because they simplify many of the processes involved in handling Big Data, they can significantly reduce the time required to launch a large, complex, data-intensive website. They can also help to reduce the volume and cost of storage and data processing, and provide a solid basis for in-depth analytics.
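For readers new to the MapReduce model mentioned above, the following is a conceptual sketch in plain Python. It does not use the MaxCompute SDK or any Alibaba Cloud API; the function names and sample records are made up for illustration. It only shows the three phases (map, shuffle, reduce) that a MapReduce job distributes across a cluster.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["big data on the cloud", "big clusters process big data"]
    print(word_count(sample))
```

On a real platform the map and reduce steps run in parallel on many machines and the shuffle moves data between them; here everything runs in one process so the data flow is easy to follow.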


Alibaba Cloud also offers E-MapReduce, a very rich framework for managing and processing Big Data, based on Hadoop and Apache Spark. Hadoop and Spark cluster services form the core of E-MapReduce. The advantage of E-MapReduce is that it takes care of many of the low-level tasks required for cluster creation and provisioning, while at the same time providing an integrated framework for managing and using clusters.

Because E-MapReduce is built on Hadoop and Spark cluster services, you can treat the storage and compute capacity it provides as a self-contained cluster running on its own hosts, rather than as generic cloud storage and compute.

E-MapReduce Architecture

Architecturally, E-MapReduce consists of an agent layer at the base, with the HDFS and Tachyon file systems sitting directly above it. Above those sits the full Hadoop ecosystem, along with Spark and a wide variety of Apache tools. The top layer is the web-based user-administration interface, which makes it easy to use and manage the underlying tools and systems.

Full Hadoop/Spark Capabilities—The Easy Way

This means that anything you can do with Hadoop, Apache Spark, or their associated tools, you can do in E-MapReduce, and far more easily than if you had to set up and provision Hadoop or Spark from scratch.

E-MapReduce also integrates with the other Big Data-oriented elements of Alibaba Cloud. It can work with applications running on Alibaba Cloud Elastic Compute Service (ECS), and it can process data stored in OSS. It can also send data to MaxCompute and take MaxCompute output for further processing.

E-MapReduce can be used to process and serve massive amounts of data. Its Spark-based features make it particularly suitable for streaming large volumes of data.

The Big Data Picture

What can you do with Alibaba's Big Data tools and services? E-MapReduce and MaxCompute both provide a very wide range of tools for performing such fundamental Big Data-oriented tasks as rapidly sorting, searching, and analyzing extremely large volumes of data.

You can use Alibaba Cloud's Big Data features to set up and manage backend services for high-volume, data-intensive websites that provide streaming services, generate large amounts of user upload and download traffic, or rapidly return search results from massive quantities of data.

You can also use the same features to process and manage large media files, to efficiently handle extremely large databases in situations where rapid retrieval is important, or to deal with the processing and storage requirements of unique or industry-specific streams of high-volume data.

What does Alibaba Cloud do for you when it comes to Big Data? It can give you the tools, the storage, and the services you need to get your Big Data operation up and running the way you want it to run: quickly, easily, and with a minimum of time, effort, and expense.

Michael Churchman


Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular contributor.

