Working with Big Data on Alibaba Cloud

You know Alibaba Cloud can be used to deploy applications. You may be less familiar with its Big Data storage and management options.

In fact, Alibaba offers a range of Big Data solutions. This article outlines them and explains which types of Big Data services on the Alibaba Cloud align with various workloads.

Data Storage

Let's start with storage, since that is the most fundamental requirement of Big Data. OSS (Object Storage Service) is Alibaba's high-volume, cloud-based data storage service. It is available for storing extremely large quantities of data of any type, and from any source.

OSS can be used for data that must be accessed frequently (such as multimedia files), as well as for archival and other low-use purposes. It includes tools for migrating large quantities of data to and from OSS, along with an SDK and a REST API.
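To give a feel for what calling the REST API directly involves, here is a minimal sketch of building the Authorization header for a request. It assumes the older header-based (V1) signing scheme, in which an HMAC-SHA1 digest of a canonical request string is Base64-encoded; check the current OSS documentation before relying on it, since newer signature versions exist. The credentials, bucket, and object names are placeholders, and `sign_oss_request` is a hypothetical helper, not part of any Alibaba library.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_oss_request(access_key_id, access_key_secret, verb, resource,
                     date, content_md5="", content_type="", oss_headers=None):
    """Return an Authorization header value for a V1-style signed request.

    Assumption: the string to sign is VERB, Content-MD5, Content-Type and
    Date (each followed by a newline), then the canonicalized x-oss-*
    headers, then the canonicalized resource such as "/bucket/object".
    """
    # Canonicalize x-oss-* headers: lowercase names, sorted, one per line.
    canonical_headers = "".join(
        f"{name}:{value}\n"
        for name, value in sorted(
            (k.lower(), v.strip()) for k, v in (oss_headers or {}).items()
        )
        if name.startswith("x-oss-")
    )
    string_to_sign = (
        "\n".join([verb, content_md5, content_type, date])
        + "\n" + canonical_headers + resource
    )
    digest = hmac.new(
        access_key_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"OSS {access_key_id}:{signature}"


# Placeholder values, for illustration only.
date = formatdate(usegmt=True)  # RFC 1123 date in GMT, sent as the Date header
header = sign_oss_request("example-id", "example-secret", "GET",
                          "/example-bucket/photos/cat.jpg", date)
print(header)
```

In practice the SDK performs this signing for you, which is one reason to prefer it over hand-built REST calls.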

OSS SDK

The SDK provides full interfaces for the major front-end and back-end web and web-service languages, as well as for Android and iOS. Its commands cover a wide range of functions: object upload, download, and management; image processing and manipulation; and web-oriented features such as static website hosting and access management.
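As a rough illustration of the upload and download calls, the sketch below round-trips an object through a bucket. It assumes the Python SDK package `oss2` and its `Auth`, `Bucket`, `put_object`, and `get_object` calls; the helper names `make_bucket` and `round_trip`, and the `InMemoryBucket` stand-in that lets the example run without credentials, are hypothetical.

```python
import io


def make_bucket(access_key_id, access_key_secret, endpoint, bucket_name):
    """Build a real bucket client. Assumes `pip install oss2` has been run."""
    import oss2  # imported lazily so the rest of the sketch runs without it

    auth = oss2.Auth(access_key_id, access_key_secret)
    return oss2.Bucket(auth, endpoint, bucket_name)


def round_trip(bucket, key, payload):
    """Upload `payload` under `key`, read it back, and return the bytes."""
    bucket.put_object(key, payload)
    return bucket.get_object(key).read()


class InMemoryBucket:
    """Hypothetical stand-in exposing only the two methods used above."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, data):
        self._objects[key] = bytes(data)

    def get_object(self, key):
        # The real client returns a readable stream; BytesIO mimics that.
        return io.BytesIO(self._objects[key])


# Runs locally against the stand-in; swap in make_bucket(...) for a real bucket.
print(round_trip(InMemoryBucket(), "reports/hello.txt", b"hello, OSS"))
```

Passing the bucket in as a parameter keeps the transfer logic separate from credentials and endpoints, so the same function works against a real bucket or a test double.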

Multimedia and Image Files

OSS is particularly well suited to handling high volumes of multimedia and image files. It can be used with both websites and apps for storage, streaming and other forms of serving, transcoding, and image format conversion. OSS can also serve large volumes of data for rapid download.

OSS, however, is only one part of Alibaba Cloud's Big Data infrastructure. Storage may be fundamental, but it is what you can do with the stored data that makes the difference.

Data IDE and MaxCompute

Data IDE is Alibaba Cloud's overall framework for managing Big Data, and for taking care of such basic functions as scheduling, monitoring, and control of access permissions. It handles much of the underlying architecture, as well as many basic management tasks, allowing you to concentrate on the development and operation of large, data-oriented projects.

Data Processing Tools

Data IDE works closely with MaxCompute, Alibaba's platform for processing Big Data. MaxCompute includes a variety of tools for analyzing and processing very large volumes of data, including its own SQL dialect, graph-processing and MapReduce functions, and concurrent upload and download functions. It also includes an extensive SDK and a full set of security features.

Working together, Data IDE and MaxCompute allow you to manage, process, and query large amounts of data. Because they simplify many of the processes involved in handling Big Data, they can significantly reduce the time required to launch a large, complex, data-intensive website. They can also help to reduce the volume and cost of storage and data processing, and they provide a solid basis for in-depth analytics.

E-MapReduce

Alibaba Cloud also offers E-MapReduce, a framework for managing and processing Big Data that is based on Hadoop and Apache Spark. Hadoop and Spark cluster services form its core. The advantage of E-MapReduce is that it takes care of many of the low-level tasks required for cluster creation and provisioning, while also providing an integrated framework for managing and using clusters.
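For readers new to the programming model that Hadoop popularized and that gives E-MapReduce its name, here is a single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python for illustration only, not E-MapReduce or Hadoop code; on a real cluster the same three steps are distributed across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


lines = ["big data on the cloud", "big clusters process big data"]
print(word_count(lines))
```

Because each map call touches only one record and each reduce call only one key, the work can be split across machines without coordination beyond the shuffle, which is what makes the model suit cluster services.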

Because E-MapReduce is based on Hadoop clustering and Spark cluster-oriented services, you can effectively use the storage and computation space it provides as if it were a self-contained system running on its own host, rather than being standard cloud-computing storage.

E-MapReduce Architecture

Architecturally, E-MapReduce consists of an agent layer at the base, with the HDFS and Tachyon file systems sitting directly above it. Above those sit the full Hadoop ecosystem, along with Spark and a wide variety of Apache tools. The top layer is the web-based user-administration interface, which makes it easy to use and manage the underlying tools and systems.

Full Hadoop/Spark Capabilities—The Easy Way

What this means is that if you can do it using Hadoop, Apache Spark, or their associated tools, you can do it in E-MapReduce, and with far less effort than setting up and provisioning Hadoop or Spark from scratch.

E-MapReduce also integrates with the other Big Data-oriented elements of Alibaba Cloud. It can work with applications running on Alibaba Cloud Elastic Compute Service (ECS), and it can process data stored in OSS. It can also send data to MaxCompute and take MaxCompute output for further processing.

E-MapReduce can be used to process and serve massive amounts of data. Its Spark-based features make it particularly suitable for streaming large volumes of data.

The Big Data Picture

What can you do with Alibaba's Big Data tools and services? E-MapReduce and MaxCompute both provide a very wide range of tools for performing such fundamental Big Data-oriented tasks as rapidly sorting, searching, and analyzing extremely large volumes of data.

You can use Alibaba Cloud's Big Data features to set up and manage backend services for high-volume, data-intensive websites that provide streaming services, generate large amounts of user upload and download traffic, or rapidly return search results from massive quantities of data.

You can also use the same features to process and manage large media files, to efficiently handle extremely large databases in situations where rapid retrieval is important, or to deal with the processing and storage requirements of unique or industry-specific streams of high-volume data.

What does Alibaba Cloud do for you when it comes to Big Data? It can give you the tools, the storage, and the services you need to get your Big Data operation up and running exactly the way you want it to run: quickly, easily, and with a minimum of overhead in time, effort, and expense.

Michael Churchman

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.
