MaxCompute2.0 Empowers the Rapid Expansion of ZhongAn Insurance



Summary: At the Alibaba Cloud MaxCompute session of the 2017 Computing Conference in Hangzhou, Wang Chaoqun, Data Director of ZhongAn Insurance, gave a talk on how MaxCompute supports the business expansion of ZhongAn Insurance. This article first introduces the advantages of MaxCompute, then presents the benefits big data brings to company management, and finally analyzes how ZhongAn Insurance built its data platform, including task scheduling, metadata, and data quality monitoring.

The highlights of the speech are as follows:

As China's first Internet insurance company, ZhongAn Insurance has used MaxCompute as its computing platform since it was founded.


Why MaxCompute?

In the early stage, we compared building our own platform with adopting MaxCompute in terms of robustness, interaction with application systems, scalability, data security, and cost-effectiveness.

Robustness: 24/7 service and recovery time after exceptions

Interaction with application systems: efficiency and cost of acquiring data from sources and delivering data

Scalability: elastic computing as data grows exponentially

Data security: protection against abnormal data attacks, multi-layer sandbox protection, and permission systems

Cost: a self-built platform compared with MaxCompute

First, in 2013 few computing platforms offered complete computing capabilities. MaxCompute had already been proven in production supporting Alibaba Finance, and it could run computations across more than 5,000 machines, so it met our requirements for both elasticity and scalability. Second, ZhongAn could rely on the professional expertise of Alibaba Cloud, given its leading position in China's cloud computing market. Finally, beyond computation, MaxCompute provides analysis and mining tools as well as IDE development tools (DataWorks, Studio), which reduced our development costs during the initial build-out.

What are the benefits of using big data in company management?


The preceding figure shows how the cloud computing and big data ecosystem has developed. With China's cloud computing market growing more than 60% a year and AWS releasing a large number of new features, cloud computing is increasingly present in our daily lives. In the ten years since Hadoop was born, cloud computing products have become far richer and the ecosystem has kept growing.

The strength of big data lies not only in its tools, platforms, and ecosystems, but also in how people and scenarios drive the ecosystem forward. More than 10,000 Alibaba Group employees use MaxCompute in their daily work. As more people use it, the rich feedback they generate diversifies big data scenarios. Over ten years of development, investment in manpower and resources has produced positive returns, and those returns fuel further investment in big data, forming a closed loop.

ZhongAn is an insurance-centered company that builds cross-ecosystem connections, cooperating with partners in different sectors, including e-commerce, 3C (computers, communications, and consumer electronics), and automobiles. Our products connect all of these ecosystem partners and increase our touchpoints with users. By cooperating with more than 300 ecosystem partners, we have accumulated a wealth of user data. We hope ZhongAn can support these ecosystems and expand its own open platform by accumulating data, customers, and brand recognition.

By the end of 2016, we had served 492 million users with 7.2 billion insurance policies, providing initial insurance protection for the new generation of Internet companies in China. People under 30 make up about 50% of our users, which shows that ZhongAn Insurance represents a new idea and a new way of life. These users also have strong earning power, accept insurance more readily, and are more aware of it. They will be the main force of future consumption.

Building the Data Platform of ZhongAn Insurance

Behind each of these numbers lie the efforts of all our staff. So what does the data platform built on MaxCompute do, and how does it support rapid business growth?

The data platform consists of platform tools, data monitoring, and data services. The data itself is multi-source and heterogeneous. Its value lies in its mobility and openness: value is generated only after data has been processed, inspected, and delivered to users. The platform tools include MaxCompute, data synchronization, task scheduling, and compute and storage management. Data monitoring covers early-warning systems, metadata, data lineage, and data quality. Data services include data portals, self-service data access, and service APIs.

Task Scheduling System


Task scheduling essentially drives each data processing workflow through its states to completion. Data processing involves multiple stages. To ensure that data is processed in the correct order, we support scheduling by different periods, such as daily, weekly, and monthly. We also support priority groups, hourly tasks, and custom schedules. More than 10,000 tasks run every day.
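To make period-based scheduling concrete, here is a minimal Python sketch of how a scheduler might decide which tasks are due in a given minute and order them by priority group. The `ScheduledTask` class, its fields, and the period names are hypothetical and chosen for illustration only. They are not ZhongAn's scheduler or a MaxCompute or DataWorks API. Custom schedules and dependency handling are left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ScheduledTask:
    # Hypothetical task definition; field names are illustrative only.
    name: str
    period: str       # "hourly", "daily", "weekly" or "monthly"
    priority: int     # priority group: a lower number is scheduled first
    minute: int = 0   # minute of the hour at which the task fires
    hour: int = 0     # hour of the day (ignored by hourly tasks)
    weekday: int = 0  # 0 = Monday (used only by weekly tasks)
    day: int = 1      # day of the month (used only by monthly tasks)


def is_due(task: ScheduledTask, now: datetime) -> bool:
    """Return True if `task` should fire in the minute that `now` falls into."""
    if now.minute != task.minute:
        return False
    if task.period == "hourly":
        return True
    if now.hour != task.hour:
        return False
    if task.period == "daily":
        return True
    if task.period == "weekly":
        return now.weekday() == task.weekday
    if task.period == "monthly":
        return now.day == task.day
    raise ValueError(f"unknown period: {task.period}")


def due_tasks(tasks: list[ScheduledTask], now: datetime) -> list[ScheduledTask]:
    """Tasks due at `now`, ordered by priority group and then by name."""
    return sorted((t for t in tasks if is_due(t, now)), key=lambda t: (t.priority, t.name))


if __name__ == "__main__":
    tasks = [
        ScheduledTask("dwd_policy_daily", "daily", priority=2, hour=2),
        ScheduledTask("ads_kpi_hourly", "hourly", priority=1),
        ScheduledTask("rpt_weekly", "weekly", priority=3, hour=2, weekday=0),
        ScheduledTask("rpt_monthly", "monthly", priority=1, hour=2, day=1),
    ]
    now = datetime(2017, 10, 16, 2, 0)  # a Monday, 02:00
    print([t.name for t in due_tasks(tasks, now)])
    # ['ads_kpi_hourly', 'dwd_policy_daily', 'rpt_weekly']
```

Sorting by `(priority, name)` runs higher-priority groups first and keeps the order deterministic within a group.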

Task scheduling forms a directed graph in which each node draws on many upstream data sources. In the graph, red indicates an error, blue indicates success, green indicates running, and yellow indicates existence. Because each processing flow draws on multiple data sources, failures can be confusing: when an error appears, does it come from the task itself, or is it caused by an upstream data source? How can developers locate problems faster, reduce development costs, and work from a single, consistent view? We solve this problem with metadata.
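As a rough illustration of the question above, the following Python sketch walks the dependency graph upstream from a task. It reports the failed tasks that have no failed ancestor of their own, which separates an error in the task itself from one inherited from upstream. The assumptions are the `upstream` mapping, the task names, and the status strings: `"failed"` stands for the red state, `"success"` for blue, and `"running"` for green. This is a sketch, not the platform's actual diagnosis logic.

```python
def ancestors(node: str, upstream: dict[str, list[str]]) -> set[str]:
    """All tasks that `node` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(upstream.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(upstream.get(parent, []))
    return seen


def failed_root_causes(task: str, upstream: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Failed tasks at or above `task` that have no failed ancestor themselves.

    A result of [task] means the error starts in the task itself; any other
    result points at the upstream task(s) where the failure most likely began.
    """
    scope = {task} | ancestors(task, upstream)
    failed = {n for n in scope if status.get(n) == "failed"}
    return sorted(n for n in failed if not (ancestors(n, upstream) & failed))


if __name__ == "__main__":
    # Hypothetical DAG: each task maps to the tasks it reads from.
    upstream = {
        "dws_summary": ["dwd_policy", "dwd_user"],
        "dwd_policy": ["ods_policy"],
        "dwd_user": ["ods_user"],
    }
    status = {
        "ods_policy": "success",
        "ods_user": "failed",     # where the failure actually starts
        "dwd_policy": "success",
        "dwd_user": "failed",     # fails because its input failed
        "dws_summary": "failed",
    }
    print(failed_root_causes("dws_summary", upstream, status))  # ['ods_user']
```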

Metadata

Metadata breaks down two kinds of barriers: barriers between data sets, which helps with model optimization and locating exceptions, and barriers between data and people, which helps with cost optimization. The metadata covers data dictionary information, lineage, storage and output information, table ownership, and business metadata. Together, this information drives the optimization of storage and computation and reduces the cost of using MaxCompute.
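The following Python sketch shows one way that owner, storage, access, and lineage information could be combined to find storage worth reviewing. It looks for tables that have not been read for a while and have no downstream consumers, and it groups them by owner. The `TableMeta` record, its fields, and the 90-day threshold are hypothetical. This is not a MaxCompute metadata API, and any cleanup would still need the owner's confirmation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TableMeta:
    # Hypothetical metadata record; real fields depend on the metadata system.
    name: str
    owner: str        # table owner: who to ask before changing anything
    size_gb: float    # storage information
    last_read: date   # most recent read, taken from access/output records
    downstream: list[str] = field(default_factory=list)  # lineage: consumers


def cleanup_candidates(tables: list[TableMeta], today: date, idle_days: int = 90):
    """Find idle tables with no downstream consumers, grouped by owner.

    Returns (candidates_by_owner, reclaimable_gb). Owners review the lists
    before anything is dropped; nothing is deleted here.
    """
    by_owner: dict[str, list[str]] = defaultdict(list)
    reclaimable = 0.0
    for t in tables:
        idle = (today - t.last_read).days >= idle_days
        if idle and not t.downstream:
            by_owner[t.owner].append(t.name)
            reclaimable += t.size_gb
    return {owner: sorted(names) for owner, names in by_owner.items()}, reclaimable


if __name__ == "__main__":
    tables = [
        TableMeta("tmp_a", "alice", 120.0, date(2017, 6, 1)),
        # Still referenced in lineage, so it is kept even though it looks idle.
        TableMeta("dwd_policy", "bob", 500.0, date(2017, 6, 1), ["dws_summary"]),
        TableMeta("tmp_b", "alice", 30.0, date(2017, 10, 1)),
    ]
    print(cleanup_candidates(tables, today=date(2017, 10, 16)))
    # ({'alice': ['tmp_a']}, 120.0)
```

Grouping by owner reflects the idea of breaking the barrier between data and people: each list goes to the person who can decide whether the table is still needed.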

The left diagram shows basic information about the data, along with its output information and lineage. The right diagram shows where each table comes from and which downstream tables will be affected by its output. With this information, we break down the barriers between data sets, as well as those between people and data.
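The two lineage views can be sketched as graph traversals. In this minimal Python example, lineage is assumed to be a list of hypothetical `(source_table, target_table)` edges. A breadth-first search over the forward map lists the tables affected by a table's output, and the same search over the reverse map lists where a table comes from. The table names and helper functions are illustrative, not part of MaxCompute.

```python
from collections import defaultdict, deque


def build_lineage(edges: list[tuple[str, str]]):
    """Build forward (source -> targets) and reverse (target -> sources) maps."""
    downstream: dict[str, list[str]] = defaultdict(list)
    upstream: dict[str, list[str]] = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)
        upstream[target].append(source)
    return downstream, upstream


def reachable(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Tables reachable from `start`, nearest first (breadth-first search)."""
    order: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    edges = [
        ("ods_policy", "dwd_policy"),
        ("ods_user", "dwd_policy"),
        ("dwd_policy", "dws_summary"),
        ("dwd_policy", "dws_claims"),
        ("dws_summary", "ads_report"),
        ("dws_claims", "ads_report"),
    ]
    downstream, upstream = build_lineage(edges)
    # Affected by dwd_policy's output: ['dws_summary', 'dws_claims', 'ads_report']
    print(reachable("dwd_policy", downstream))
    # Where dwd_policy comes from: ['ods_policy', 'ods_user']
    print(reachable("dwd_policy", upstream))
```

Because of the `seen` set, a table with several paths to it (such as `ads_report`) is listed once. The traversal also terminates even if the lineage data accidentally contains a cycle.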


After storage optimization, cost was reduced by 30%. Optimizing storage and computation reduces invalid storage and improves computing efficiency.

Data Quality Monitoring

Data quality monitoring is embedded into each task's own execution as an aspect, so that every task checks its own output and determines its own state. Data accuracy is verified against rules and templates, and only data that passes verification is used downstream. This prevents bad data from contaminating downstream tables, and errors surface on their own instead of being discovered downstream. Its main characteristics are as follows:

It uses the counter-collection capability of MaxCompute.

Its rules are counter rules defined at the table level and the field level.

A template combines rules, periods, and statistical functions.

Monitoring moves from after-the-fact checks to in-process checks.

User-defined rules are supported.

Key tasks are covered, and coverage has reached 30%.
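Here is a minimal Python sketch of in-process quality checking. It assumes a rule is a table-level or field-level statistic with accepted bounds. The task checks its own output and raises an error before downstream tasks can consume it. `QualityRule`, `row_count`, `null_ratio`, `check_output`, and `run_with_quality_gate` are hypothetical names. The rows are plain dictionaries rather than MaxCompute tables, and the period part of a template is omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class QualityRule:
    # Hypothetical rule template: a statistic over a table or one field,
    # plus the range of values that counts as healthy.
    name: str
    field: Optional[str]                       # None means a table-level rule
    statistic: Callable[[Sequence[Any]], float]
    low: float
    high: float


def row_count(values: Sequence[Any]) -> float:
    return float(len(values))


def null_ratio(values: Sequence[Any]) -> float:
    return sum(v is None for v in values) / len(values) if values else 0.0


def check_output(rows: list[dict], rules: list[QualityRule]) -> list[str]:
    """Evaluate every rule against a task's output; return the failed rule names."""
    failures = []
    for rule in rules:
        values = rows if rule.field is None else [r.get(rule.field) for r in rows]
        if not (rule.low <= rule.statistic(values) <= rule.high):
            failures.append(rule.name)
    return failures


def run_with_quality_gate(produce: Callable[[], list[dict]], rules: list[QualityRule]) -> list[dict]:
    """Run a task, then fail it (blocking downstream) if its own output breaks a rule."""
    rows = produce()
    failures = check_output(rows, rules)
    if failures:
        raise RuntimeError(f"quality check failed: {failures}")
    return rows


if __name__ == "__main__":
    rules = [
        QualityRule("non_empty", None, row_count, 1, float("inf")),
        QualityRule("user_id_not_null", "user_id", null_ratio, 0.0, 0.0),
    ]
    print(check_output([{"user_id": 1}, {"user_id": None}], rules))  # ['user_id_not_null']
```

Raising inside the task, rather than running a separate check afterward, is what turns after-the-fact monitoring into in-process monitoring in this sketch. The scheduler sees a failed task and does not start its dependents.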

Data Services and Security

What do we need to consider when data is consumed? Data needs to be opened up and circulated, so what should we watch for when doing so? Data leaks and security failures can both be disastrous for a company.

Technically, we assign different access levels based on ACLs and role management, enforce graded permission control, mask sensitive information, and apply encryption and an approval process to confidential information. Data is opened to each role according to what that role needs, under both technical controls and process controls. Security control is the foundation of openness, process management is the key to it, and we keep a balance between openness and security.
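Below is a minimal Python sketch of graded access with masking and approval, under assumed levels. Each role has a clearance and each column has a sensitivity level. Sensitive columns above a role's clearance are masked, and confidential columns are dropped unless an approval covers them. The role names, column names, levels, and masking format are all hypothetical and do not represent MaxCompute's ACL or role model.

```python
PUBLIC, SENSITIVE, CONFIDENTIAL = 1, 2, 3

# Hypothetical role clearances and column sensitivity levels.
ROLE_CLEARANCE = {"analyst": PUBLIC, "risk_engineer": SENSITIVE, "security_admin": CONFIDENTIAL}
COLUMN_LEVEL = {"policy_id": PUBLIC, "phone": SENSITIVE, "id_number": CONFIDENTIAL}


def mask(value: str) -> str:
    """Keep the first 3 and last 2 characters; mask everything in between."""
    if len(value) <= 5:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 5) + value[-2:]


def read_row(row: dict[str, str], role: str, approved: frozenset[str] = frozenset()) -> dict[str, str]:
    """Return the view of `row` that `role` may see.

    Columns within the role's clearance, or covered by an approval, are shown
    as-is; sensitive columns above it are masked; confidential ones are dropped.
    """
    clearance = ROLE_CLEARANCE[role]
    visible = {}
    for column, value in row.items():
        level = COLUMN_LEVEL.get(column, CONFIDENTIAL)  # unknown columns: most restrictive
        if level <= clearance or column in approved:
            visible[column] = value
        elif level == SENSITIVE:
            visible[column] = mask(value)
    return visible


if __name__ == "__main__":
    row = {"policy_id": "P001", "phone": "13812345678", "id_number": "ID-TEST-0001"}
    print(read_row(row, "analyst"))  # {'policy_id': 'P001', 'phone': '138******78'}
```

Unknown columns default to the most restrictive level, so a newly added column is never exposed by accident before someone classifies it.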

Building a data platform goes through three stages: making it usable, making it easy to use, and making it fit the business. The system must be upgraded over multiple iterations. Data is a service, and we need to meet users' varied data requirements. Data is also infrastructure: every company has to build and use a data platform.

MaxCompute's rich ecosystem, shared resources and tools, and in-depth research on and support for mining algorithms meet our requirements, leaving us more time to engage with users and create value for them. The cost of MaxCompute is also gradually decreasing. In the future, we hope MaxCompute will offer more kinds of support, including UDF resource libraries such as IP libraries, Python algorithm packages for data mining, and support for artificial intelligence platforms.
