MaxCompute2.0 Empowers the Rapid Expansion of ZhongAn Insurance


Summary: At the Alibaba Cloud MaxCompute session during the 2017 Computing Conference held in Hangzhou, Wang Chaoqun, Data Director of ZhongAn Insurance, delivered a speech on how MaxCompute empowers the business expansion of ZhongAn Insurance. This article first introduces the advantages of MaxCompute, presents the benefits big data brings to the company management, and finally focuses on the analysis of the data platform construction of ZhongAn Insurance, including task scheduling, metadata, and data quality monitoring.

The highlights of the speech are as follows:

As the first Internet insurance company in China, ZhongAn Insurance has used MaxCompute as its computing platform since its founding.


Why MaxCompute?

In the early stage, we chose between a self-built platform and MaxCompute by comparing them on robustness, interaction with application systems, scalability, data security, and cost effectiveness.

Robustness: 24/7 service and exception recovery time

Interaction with application systems: efficiency and cost of data source acquisition and data delivery

Scalability: elastic computing when data grows exponentially

Data security: protection against exceptional data attacks, multiple sandbox protection, and permission systems

Cost: comparison between the self-built platform and MaxCompute

First, in 2013, few computing platforms offered complete computing capabilities. MaxCompute had been built for, and proven in, the production systems of Alibaba Finance, and it supported computing across more than 5,000 machines, so it met our requirements for both auto scaling and scalability. Second, ZhongAn trusts the professional expertise of Alibaba Cloud, given its leading position in China's computing market. Finally, in addition to being a computing platform, MaxCompute provides analysis and mining tools as well as ready-to-use IDE development tools (DataWorks, Studio), which reduced our development cost during the initial processing and development phase.

What are the benefits of using big data in the company management?


The preceding figure shows how the overall cloud computing and big data ecosystem has developed. China's cloud computing market is growing by more than 60% a year, and AWS keeps releasing a considerable number of new features, so cloud computing is increasingly present in our daily lives. In the ten years since the birth of Hadoop, cloud computing products have become much richer, and the ecosystem has kept growing.

Big data excels not only in its tools, platforms, and ecosystems, but also in how people and scenarios drive the ecosystem forward. More than 10,000 Alibaba Group employees use MaxCompute in their daily work. As more people are empowered to use data, the rich feedback data they generate diversifies big data scenarios. Over ten years of development, the investment in manpower and resources and the positive returns it produced have reinforced each other, forming a closed loop in the big data sector.

ZhongAn is an insurance-centered company that connects ecosystems and cooperates with partners in different sub-sectors, including e-commerce, 3C, and automobiles. Our products link all of these ecosystem partners and increase our points of contact with users. By cooperating with more than 300 ecosystem partners, we have accumulated a wealth of user data. We hope ZhongAn can support these ecosystems and expand its own open platform by accumulating data, customers, and brand recognition.

By the end of 2016, we had served 492 million users with 7.2 billion insurance policies, providing the initial guarantee for the new generation of Internet companies in China. People under 30 account for approximately 50% of our users, which indicates that ZhongAn Insurance represents a new idea and way of life. These users also have strong earning power and are more aware of, and more receptive to, insurance. They are the main consumers of the future.

Data Platform Building of ZhongAn Insurance

Behind each of these numbers lie the efforts of all our staff. So what does the data platform built on MaxCompute do, and how does it support the rapid growth of the business?

The data platform consists of platform tools, data monitoring, and data services. The data itself is multi-source and heterogeneous, and its value lies in its mobility and openness: data generates value only after it has been processed, inspected, and provided to users. The platform tools include MaxCompute, data synchronization, task scheduling, and computing and storage management. Data monitoring covers early-warning systems, metadata, data lineage, and data quality. Data services include data portals, self-service data acquisition, and service APIs.

Task scheduling system


Task scheduling essentially drives the data processing workflow to completion. Data processing is a multi-step process. To ensure that data is processed in the correct order, we support scheduling by different periods, such as daily, weekly, and monthly, as well as priority groups, hourly tasks, and user-defined schedules. More than 10,000 tasks run every day.

The task schedule is a directed graph, and each node exposes a lot of information about its source data. Red indicates an error, blue indicates success, green indicates a running task, and yellow indicates existence. Because different tasks draw on multiple data sources, errors can be confusing: when an error is found, does it come from the task itself, or is it caused by the output of an upstream data source? How can developers locate the problem faster, reduce development cost, and give a consistent explanation? We solve this problem with metadata.
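The question of whether an error belongs to a task or to its upstream can be sketched as a walk over the task graph. The following is a minimal illustration, not ZhongAn's scheduler: the task names, the `UPSTREAM` and `STATE` dictionaries, and the `root_causes` function are all hypothetical, and the states reuse the "error" and "success" labels from the text.

```python
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the upstream tasks it depends on.
UPSTREAM: Dict[str, List[str]] = {
    "ods_policy": [],
    "ods_user": [],
    "dwd_policy_user": ["ods_policy", "ods_user"],
    "rpt_daily_sales": ["dwd_policy_user"],
}

# Hypothetical run states for one scheduling cycle.
STATE: Dict[str, str] = {
    "ods_policy": "success",
    "ods_user": "error",
    "dwd_policy_user": "error",
    "rpt_daily_sales": "error",
}


def root_causes(task: str, upstream: Dict[str, List[str]],
                state: Dict[str, str]) -> Set[str]:
    """Return the failed tasks, reachable upstream from `task`, that have no failed parent."""
    causes: Set[str] = set()
    seen: Set[str] = set()
    stack = [task]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        failed_parents = [p for p in upstream[current] if state[p] == "error"]
        # A failed task with no failed parent is where the error originated.
        if state[current] == "error" and not failed_parents:
            causes.add(current)
        # Only failed parents can explain a failure, so only they are followed.
        stack.extend(failed_parents)
    return causes


print(root_causes("rpt_daily_sales", UPSTREAM, STATE))
```

In this sample graph the failed report traces back to `ods_user`, so the developer can start from that task instead of inspecting every node on the path.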

Metadata

Metadata breaks down two kinds of barriers: barriers between data sets, which helps with model optimization and exception location, and barriers between data and people, which helps with cost optimization. It covers data dictionary information, lineage information, storage and output information, table owner information, and business metadata, and it drives the optimization of storage and computing to reduce the cost of using MaxCompute.

The left diagram shows basic table information, together with data output information and lineage. The right diagram shows where each table comes from and which downstream tables will be affected by its output. With this information, we can break down the barriers between data sets as well as the barriers between people and data.
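Finding which downstream tables a table's output affects is a graph traversal over lineage metadata. The sketch below assumes lineage is available as a simple mapping from each table to the tables built directly from it. The table names, the `DOWNSTREAM` dictionary, and `affected_tables` are hypothetical and are not part of MaxCompute.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage: each table maps to the tables produced directly from it.
DOWNSTREAM: Dict[str, List[str]] = {
    "ods_policy": ["dwd_policy"],
    "dwd_policy": ["dws_policy_daily", "dws_user_policy"],
    "dws_policy_daily": ["rpt_sales"],
    "dws_user_policy": [],
    "rpt_sales": [],
}


def affected_tables(table: str, downstream: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every table derived, directly or indirectly, from `table`."""
    order: List[str] = []
    seen = {table}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(affected_tables("dwd_policy", DOWNSTREAM))
```

Breadth-first order lists the nearest dependants first, which is usually the order in which a developer would want to check them.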


After storage was optimized, cost fell by 30%. Optimizing storage and computing removes invalid storage and improves computing efficiency.

Data Quality Monitoring

Data quality monitoring is embedded in the execution of the task itself, in slice mode, so that each task checks itself and determines its own state. Data accuracy is verified against rules and templates, and only data that passes verification is used downstream. This avoids data contamination, and errors surface on their own without depending on downstream data. Its main characteristics are as follows:

Counters: the counter-collecting function of MaxCompute is used

Rules: the rules are counter rules at the table level and the field level

Templates: a template is a combination of rules, periods, and statistical functions

Timing: after-process monitoring is changed to in-process monitoring

Customization: user definition is supported

Coverage: key tasks are covered, and the coverage rate reaches 30%
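The rule-based check described above can be sketched as a gate that runs inside the task before its output is released. This is only an illustration under stated assumptions: the `Rule` class, the counter names, and `failed_rules` are hypothetical, and the counters are plain dictionaries rather than values read from MaxCompute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical counter rule: a named check over counters collected during a task run."""
    name: str
    check: Callable[[Dict[str, int]], bool]


def failed_rules(counters: Dict[str, int], rules: List[Rule]) -> List[str]:
    """Return the names of failed rules; an empty list means the output may go downstream."""
    return [rule.name for rule in rules if not rule.check(counters)]


# Table-level rule: the output must not be empty.
# Field-level rule: the policy_id column must contain no nulls.
RULES = [
    Rule("row_count_positive", lambda c: c["row_count"] > 0),
    Rule("policy_id_not_null", lambda c: c["policy_id_null_count"] == 0),
]

good_run = {"row_count": 1200, "policy_id_null_count": 0}
bad_run = {"row_count": 1200, "policy_id_null_count": 7}

print(failed_rules(good_run, RULES))
print(failed_rules(bad_run, RULES))
```

Because the check runs as part of the task, a failing rule can mark the task itself as failed, which keeps bad data from reaching downstream consumers.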

Data Services and Security

What do we need to consider when data is consumed? Data needs to be opened up and circulated, so what must we watch for in the process? Data leakage and insecure data can both bring disaster to a company.

Technically, we assign different permission levels based on ACLs and role management. We enforce graded permission control, and we have established an encrypted approval process that covers the masking of sensitive information and access to secret-related information. Openness and security rest on technical control, process control, and the data that each role requires. Security control is the basis of openness, process management is the key to it, and we keep a balance between the two.
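Role-based access with masking of sensitive fields can be sketched as follows. This is a simplified illustration, not MaxCompute's ACL mechanism: the role names, the `ROLE_CLEAR_COLUMNS` mapping, and the masking rule (keep the last four characters) are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical role grants: the columns each role may read in clear text.
ROLE_CLEAR_COLUMNS: Dict[str, Set[str]] = {
    "analyst": {"policy_id", "premium"},
    "auditor": {"policy_id", "premium", "id_number"},
}


def mask(value: str) -> str:
    """Keep the last four characters and replace the rest with asterisks."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def view_for_role(row: Dict[str, str], role: str) -> Dict[str, str]:
    """Return the row as the role may see it; columns that are not granted are masked."""
    allowed = ROLE_CLEAR_COLUMNS.get(role, set())
    return {col: (val if col in allowed else mask(val)) for col, val in row.items()}


row = {"policy_id": "P1001", "premium": "88", "id_number": "310101199001011234"}

print(view_for_role(row, "analyst"))
print(view_for_role(row, "auditor"))
```

An unknown role receives no clear-text columns at all, so the default is closed rather than open.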

A data platform has to pass through three stages as it is built: usability, ease of use, and adaptability. Upgrading the system takes multiple iterations. Data is a service, and we need to satisfy users' different data requirements. Data is also infrastructure, and every company has to build and use a data platform.

MaxCompute's rich ecosystem, shared resources and tools, and deep research on and support for mining algorithms are strong enough to meet our requirements, which leaves us more time to engage with users and create value for them. The cost of MaxCompute is also gradually decreasing. In the future, we hope MaxCompute can provide more kinds of support, including UDF resource libraries such as IP libraries, Python algorithm packages for mining, and artificial intelligence platform support.
