Alibaba Cloud DataWorks Highly Recognized by Forrester

Summary: DataWorks is listed in Forrester's Cloud Data Warehouse Q1 2018 report as one of the core products from a global first-tier CDW service provider.

Analyses in this article are based on Now Tech: Cloud Data Warehouse, Q1 2018 (Published by Noel Yuhanna, March 13, 2018). The views and opinions expressed herein are those of the author.

On March 13, 2018, Forrester issued the Now Tech: Cloud Data Warehouse Q1 2018 report. In this report, Forrester comprehensively assessed Cloud Data Warehouses (CDWs) in aspects such as main features, regional performance, market segmentation, and customers.

Alibaba Cloud, AWS, Google, and Microsoft were named as the four global first-tier CDW service providers. Alibaba Cloud DataWorks and MaxCompute are the only products from a Chinese company recognized in the report.

In this report, Forrester highlighted four core CDW features:

  • Flexible deployment
    CDWs are expected to offer several flexible deployment modes. For small enterprises, CDWs should provide an online multi-tenant mode that lets these customers quickly mobilize computing resources and deploy a data warehouse in just a few minutes. For medium and large enterprises, CDWs should support an exclusive or local deployment mode that provides robust computing performance and strong security while hiding highly complex technical details.
  • Efficient data migration to cloud
    For customers that have not yet migrated their data warehouses to the cloud, or that use hybrid online and offline architectures, CDWs should provide a fast, low-cost way to collect data.
  • Diverse analysis methods
    CDWs should support multiple technical means to help users get desired data processing capabilities in various business scenarios.
  • Excellent security
    CDWs should provide security in various aspects, including data encryption, auditing, data desensitization and access control.

Why did Forrester recognize DataWorks, the core of Alibaba Cloud's CDW services? The sections below look at DataWorks in detail.

Product Architecture

Before analyzing DataWorks, we will first take a quick look at its role in the Alibaba Cloud CDW service system and its product architecture.


Among Alibaba Cloud's many products, DataWorks and MaxCompute make up the core of its CDW service capabilities. As the storage and computing engine, MaxCompute supports the IaaS layer and provides users with massive, reliable big data table storage and SQL execution capability. However, MaxCompute alone cannot meet all data processing requirements. Data development, data integration, and other CDW services are also needed for customers to put big data to work. DataWorks provides a relatively complete solution for this purpose.

Specifically, DataWorks includes eight major modules:

  • Data integration: Integration of heterogeneous data, collecting large volumes of data from various source systems onto the big data cloud platform
  • Data development: Data warehouse design and ETL development
  • O&M monitoring: Operations and maintenance monitoring of jobs in the ETL process
  • Real-time analytics: Real-time data exploration and analysis
  • Data asset management: Metadata management, data map, data lineage, data asset graph, etc.
  • Data quality: Data quality control, monitoring, verification, and assessment
  • Data security: Data permission management, data classification marking, data desensitization, and data audits
  • Data service: Data sharing, data exchange, and data API services

Flexible Deployment

The Forrester report explains at length why multiple deployment modes are necessary and compares the CDWs of several service providers. DataWorks is one of the first-tier products that provide multiple deployment modes.

Serving as the core of the Alibaba Group's data middleware system, DataWorks has been used to support business operations in enterprises like Alibaba Group, Ant Financial, and Cainiao since 2009. If you've used data services provided by Taobao, Tmall, Ant Financial, and other companies, you may have indirectly used the computing service provided by DataWorks.

DataWorks is already available for public cloud users. As of now, DataWorks has provided services for over 4,000 public cloud customers, including Weibo, Renrenche, and Tianhong Asset Management.

DataWorks also supports private cloud. As a key means of putting big data to work, DataWorks is used in Alibaba Cloud's private cloud solutions, including Apsara Enterprise. Since 2015, DataWorks has supported important enterprise and government projects, including the Alibaba Cloud ET City Brain and "Easy municipal service access".

With flexible deployment modes, DataWorks can meet a wide variety of customers' needs. For small enterprises, public cloud solutions can be used flexibly to provide services and support; for medium and large enterprises, private cloud or hybrid cloud solutions can fully meet customers' needs.

Efficient Data Migration to the Cloud

Efficient data integration significantly eases the migration of enterprise data to the cloud. During the initial migration stage, enterprises need to move their data assets to the cloud quickly and securely; during continuous business operations, they need to load various kinds of data into CDWs and then deliver processed data from CDWs to individual business units.

The Data Integration feature of DataWorks can read from and write to multiple kinds of data sources, including relational databases, NoSQL databases, big data stores, and text storage (FTP). It can also uniformly inspect the data resources in those sources, and synchronize and integrate heterogeneous data sources in complex network environments. For scheduling a specific import task, DataWorks supports batch, full, and incremental synchronization of offline data. Users can specify a custom synchronization interval by minute, hour, day, week, or month.
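
The difference between full and incremental synchronization can be illustrated with a minimal Python sketch. It does not use any DataWorks API; the in-memory rows, the `updated_at` column, and the function names are assumptions made purely for illustration.

```python
from datetime import datetime


def full_sync(source_rows):
    """Full synchronization: copy every source row into a fresh target."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_sync(source_rows, target, last_sync):
    """Incremental synchronization: upsert only rows modified after last_sync."""
    changed = [row for row in source_rows if row["updated_at"] > last_sync]
    for row in changed:
        target[row["id"]] = dict(row)
    return len(changed)


# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"id": 1, "name": "alice", "updated_at": datetime(2018, 3, 1)},
    {"id": 2, "name": "bob", "updated_at": datetime(2018, 3, 1)},
]
target = full_sync(source)

# Later, one row changes and one row is added in the source.
source[1] = {"id": 2, "name": "robert", "updated_at": datetime(2018, 3, 10)}
source.append({"id": 3, "name": "carol", "updated_at": datetime(2018, 3, 11)})

# Only rows touched since the previous run are copied.
copied = incremental_sync(source, target, last_sync=datetime(2018, 3, 5))
```

A full synchronization is simple but re-copies unchanged rows on every run, whereas the incremental variant relies on the source exposing a trustworthy modification timestamp (or an equivalent change marker).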


In addition, the Data Integration feature of DataWorks provides data flow control: users can manage how a synchronization task handles dirty data, how fast data is transferred, and how many concurrent threads it uses, which helps reduce costs and supports lean management.
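
As a rough illustration of two of these controls, the following Python sketch applies a dirty-data limit and a concurrency setting to a toy synchronization job. The record format, the `max_dirty` and `concurrency` parameter names, and the exception class are hypothetical and are not taken from DataWorks; transfer-rate throttling is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


class DirtyDataLimitExceeded(Exception):
    """Raised when a job sees more dirty records than it is allowed to skip."""


def parse_record(raw):
    """Return a cleaned (id, amount) tuple, or None if the record is dirty."""
    try:
        record_id, amount = raw.split(",")
        return int(record_id), float(amount)
    except ValueError:
        return None


def run_sync(raw_records, max_dirty, concurrency):
    """Parse records with a bounded thread pool and enforce the dirty-data limit."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        parsed = list(pool.map(parse_record, raw_records))  # order is preserved
    clean = [record for record in parsed if record is not None]
    dirty = len(parsed) - len(clean)
    if dirty > max_dirty:
        raise DirtyDataLimitExceeded(
            f"{dirty} dirty records exceed the limit of {max_dirty}"
        )
    return clean, dirty


clean, dirty = run_sync(["1,9.5", "bad-row", "2,3"], max_dirty=1, concurrency=2)
```

Failing the whole job once the limit is exceeded, rather than silently skipping bad rows, is one common design choice: it keeps a broken upstream source from quietly producing a partial load.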

Diverse Analysis Methods

DataWorks provides powerful data development IDEs and supports visual editing of SQL code, integration tasks and business flow DAG graphs. Multi-user online cooperation and task script version management can meet practical needs of enterprise-level data development. In addition to the offline task processing feature, DataWorks provides the lightweight "Analytics Workbench" tool to fully utilize the computing capacity of MaxCompute and meet users' instant data analysis needs.


It is reported that updates have recently been made to the drag-and-drop business flow editing feature in DataWorks to further improve user experience and provide a better data development IDE.

Robust Security

Sensitive data protection requires even better compliance with the industry standards and data privacy laws and regulations. Security is the top priority of DataWorks. DataWorks provides data security modules and implements all-round data security using the following security protection means:

  • Multi-tenant isolation
    DataWorks has its own multi-tenant permission model. Tenants can apply for resource quotas on demand and manage their own resources; tenants can also manage their own data, permissions, users and roles independently from each other to ensure data security.
  • Data security level setting
    Data security levels allow users to discover and locate sensitive data and to see how it is distributed across data resource platforms. Sensitive data is discovered automatically based on specified sensitive data types and then classified. Appropriate security rules are applied based on security levels such as Top Secret, Confidential, and General.
  • Data access audit
    DataWorks strictly examines privileged users' access, including access time, executed operations, and execution order. Recording and auditing this access helps ensure that privileged users perform appropriate operations at the proper time and makes abnormal operations detectable, further improving the security of data systems.
  • Data desensitization
    When it cannot be determined whether particular users, access addresses, or even fields can be trusted, DataWorks focuses on the data content itself: it identifies sensitive information and dynamically blocks access to it to ensure data security.
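
To make the ideas of security levels and desensitization concrete, here is a minimal Python sketch that classifies a value by pattern and masks it when the reader's clearance is too low. The detection rules, the masking format, and the function names are assumptions for illustration only and do not describe how DataWorks implements these features; only the level names come from the list above.

```python
import re

# Security levels named in the text, ordered from least to most sensitive.
LEVELS = {"General": 0, "Confidential": 1, "Top Secret": 2}

# Hypothetical detection rules: (pattern, security level).
RULES = [
    (re.compile(r"^\d{16}$"), "Top Secret"),  # e.g. a 16-digit card number
    (re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), "Confidential"),  # e-mail address
]


def classify(value):
    """Return the security level of a value based on the first matching rule."""
    for pattern, level in RULES:
        if pattern.match(value):
            return level
    return "General"


def mask(value):
    """Keep the last four characters and replace the rest with asterisks."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def read_field(value, clearance):
    """Return the raw value only if the reader's clearance covers its level."""
    if LEVELS[clearance] >= LEVELS[classify(value)]:
        return value
    return mask(value)


masked_card = read_field("6222020200112345", clearance="Confidential")
visible_card = read_field("6222020200112345", clearance="Top Secret")
```

Because the decision is made from the content of the value rather than from the column name or the caller's address, the same rule applies wherever the sensitive value appears.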

DataWorks has received a third-level information security certificate issued by the Ministry of Public Security.

Conclusion

With "Internet Plus" further applied in different industries, there is an increasing need for enterprises to manage, process and employ their data assets. Internet companies can quickly use their big data processing capability to meet other enterprises' needs. That also explains why these four cloud service providers, instead of long-established data warehouse companies like Oracle and IBM, are listed in the Forrester report as first-tier CDW providers.

Thanks to years of experience putting data to use at Alibaba Cloud, DataWorks can fully meet enterprise-level requirements for deployment modes, data integration, analysis methods, and data security.

DataWorks is expected to continue introducing more advanced data management capabilities, including real-time data integration and data asset analysis. DataWorks combines cloud computing with data warehouse management methodology to keep innovating and to create "platforms most suitable for big data warehouse development". That is another reason why DataWorks is listed in Forrester's CDW report.

To learn more about the Big Data capabilities of Alibaba Cloud, read the Forrester report on MaxCompute.
