Alibaba Cloud MaxCompute vs. AWS Redshift vs. Azure SQL Data Warehouse

Data is the currency of the digital world. How your organization stores, organizes, analyzes, and uses its data will largely determine how successful it is. Enterprises deal with large quantities of data, often at petabyte scale, and they want to extract maximum value from it.

Cloud computing has been a game changer in this respect. What would be cost-prohibitive with traditional servers is now much more accessible with the economical and powerful solutions offered by cloud computing vendors.

Case in point: data warehouse solutions hosted completely in the cloud. Thanks to cloud-based data lakes, plummeting storage costs, and more powerful compute instances, what would have been impossible a few years ago is now within reach. This post explains the basics of data warehouses in the cloud and compares popular options on the major public cloud platforms.

Data Warehouse Basics

A data warehouse is a centralized data store that’s used by multiple applications within your organization. If you’re looking to analyze small quantities of data that are a couple of GB in size, a data warehouse is more complexity than you need. A data warehouse makes sense only once you’ve scaled to a few hundred GB of data. At that point, ordinary databases can no longer keep up with the speed and agility you are used to, and a data warehouse becomes necessary.

The first thing to know about a data warehouse is that it is architected differently from small-scale database infrastructure. Rather than having each database confined to a single hardware server, a data warehouse is made up of multiple servers that work together as a single unit.
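To make the idea of "multiple servers working as a single unit" concrete, here is a minimal sketch in plain Python. It is not the implementation used by any of the products discussed in this post. The sample rows, the function names, and the choice of three nodes are all assumptions made for illustration. Rows are spread across simulated nodes, each node computes a partial sum on its own slice, and a coordinator merges the partial results into one answer.

```python
from collections import Counter
from zlib import crc32

# Hypothetical sample data: (region, sale_amount) rows.
ROWS = [("east", 120), ("west", 80), ("east", 50), ("north", 200), ("west", 20)]


def partition(rows, num_nodes):
    """Spread rows across simulated nodes by hashing the region key.

    crc32 is used instead of the built-in hash() so the placement of rows
    is the same on every run.
    """
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        nodes[crc32(region.encode()) % num_nodes].append((region, amount))
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: a partial SUM per region."""
    partial = Counter()
    for region, amount in node_rows:
        partial[region] += amount
    return partial


def query_total_by_region(rows, num_nodes=3):
    """Coordinator: scatter the rows, gather partial sums, merge them."""
    total = Counter()
    for node_rows in partition(rows, num_nodes):
        total.update(local_sum(node_rows))  # Counter.update adds the values
    return dict(total)


totals = query_total_by_region(ROWS)
print(totals)
```

The caller sees a single query and a single result, whatever the number of nodes behind it. A real warehouse adds a great deal on top of this pattern, such as columnar storage, replication, and query planning.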

Alibaba Cloud MaxCompute

Alibaba Cloud’s MaxCompute is a large-scale data analysis platform that is purpose-built for running big data workloads. It was built out of necessity, when Alibaba needed a way to manage its ever-growing data that Oracle servers could no longer handle. It is battle-tested internally at Alibaba, where it has run a cluster of 10,000 server nodes. On a daily basis, 14,000 developers at Alibaba run three million jobs on it, and it stores 99% of all of Alibaba’s data. It is the first database service to scale to 100 TB of data at 7,000 BigBench queries per minute (BBQpm).

MaxCompute makes data migration simple with a variety of options. You can use Alibaba Cloud’s own tools, such as the MaxCompute client or DataWorks, or popular external tools like Flume, Logstash, or Fluentd. The uploaded data is stored in tables that you query with SQL, and it can easily scale up to petabytes in size.

The most recent version of MaxCompute supports SQL 2.0 and, interestingly, allows querying of unstructured data such as images and video content. Even with large quantities of data, some of it unstructured, MaxCompute is well suited to real-time analysis, and it is easy to use and maintain. MaxCompute handles the difficulty of managing a distributed data store with its own processes for clustering, indexing, and join optimization, all of which help with data storage and retrieval at large scale.

With its recent US launch, MaxCompute is ready to change the way Big Data is processed across the world. With aggressive pricing, it is ready to take on similar services from the two other big cloud vendors—AWS and Azure.

AWS Redshift

AWS Redshift is one of the early services from the AWS stable. Similar to MaxCompute, it stores and analyzes data at petabyte scale. You can load data into Redshift from many sources, such as S3, DynamoDB, or an SSH-enabled host on EC2. It leverages AWS IAM for security and access permissions. Further, you can encrypt your data at rest using keys managed in AWS KMS.

A unique feature of the service is Redshift Spectrum, which lets you query data that’s already in AWS S3. This means you don’t have to load your data into Redshift or transform it first. Instead, you can query the data where it sits. However, if you’d rather have your data in Redshift and you have a lot of it, AWS Glue is an ETL service that makes data loading easy.

AWS recently announced new DC2 nodes, which replace DC1 nodes at the same cost. They’re based on Intel’s Broadwell chips and offer up to twice the performance of the previous DC1 nodes and 30% better storage utilization.

With a variety of options for usage, AWS Redshift is an attractive option for data warehousing in the cloud.

Azure SQL Data Warehouse

Azure SQL Data Warehouse is the Big Data analysis solution from Microsoft. With Microsoft’s big footprint among Fortune 500 enterprises, many of its customers are likely to be interested in this service. Azure provides two flavors of the service: one optimized for elasticity, and the other optimized for compute. You can separate workloads across these two tiers, which makes for an interesting choice. Usage is allocated and measured in Data Warehouse Units (DWUs). There are two types: a regular DWU, and a cDWU, which is optimized for compute. A separate unit, the Database Transaction Unit (DTU), is what Azure uses to express the service level of an Azure SQL Database.

Azure has a tool called PolyBase, which is used to query external data without requiring the user to know Hadoop. PolyBase lets you import and export data to and from Hadoop, Azure Blob Storage, or Azure Data Lake Store, or query the data without moving it in and out of SQL Data Warehouse. SQL Data Warehouse is also well integrated with PowerShell, which lets you use scripting to automate common tasks.

Conclusion

In conclusion, all three data warehouse services mentioned here are powerful tools that take different approaches to the same challenge: analyzing big data in real time. If you have broader commitments that require you to choose Redshift or SQL Data Warehouse, it’s not a bad spot to be in. However, if you’re curious to try a powerful new option that is also cost-effective, MaxCompute is the way to go. Alibaba Cloud is offering a $300 credit for new users, making it easy to get a feel for what the platform has to offer before going all in. Try MaxCompute and start unlocking value from all your data in real time.

Bio

Twain Taylor

Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. His work involved reviewing stack traces, resolving issues affecting both customers and the Support team, and handling escalations. Later, he built branded social media applications and automation scripts to help startups better manage their marketing operations. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications.
