How AHI Fintech and DataVisor are Securing Data through AI and Big Data

Summary: With the growing threat of cyber-attacks, organizations like AHI Fintech and DataVisor are using Big Data and AI to help customers in China protect their data.

Competition in the field of financial risk control has intensified sharply over the past year. Many budding enterprises now find themselves fighting a battle on two fronts: data acquisition capabilities and algorithm technology.

In June 2017, China's Cyber Security Act took effect. Companies that crawl users' mobile phones for data without prior authorization may now face serious legal consequences, including a 7-year jail sentence for the legal representatives of a convicted company. Furthermore, many businesses in the fields of data acquisition and transactional data are now under thorough investigation.

With the loss of the gray data industry, the risk control industry seems to be facing an opportunity to move toward healthy compliance, despite the technical challenges it still faces.

A Close Race: Algorithms or Data?

Despite increasingly strict regulations, cyber-attack rates are still on the rise. This makes user data even more precious, especially for businesses reliant on the integration of external data. Instead of circumventing regulations, enterprises have now shifted their focus to two major aspects of big data analysis: algorithms and modeling. These two areas are crucial in the field of big data, and their emergence has given rise to a group of new risk-control companies in China.

Huang Ling, CEO of AHI FinTech, who holds a Ph.D. in computer science from the University of California, Berkeley, and is a part-time professor at the Interdisciplinary Information Institute at Tsinghua University, described his work as "a global war." "In the risk control industry, our opponents are huge, and feature a worldwide black-market production chain."

Since the very beginning, risk control has faced a global opponent in the form of a community of hackers. These hackers invade other people’s phones and computers through malicious software. On the one hand, they can access confidential data; on the other, they can use this compromised information to open fake accounts. They can also fake all kinds of social interactions, such as leaving reviews or making purchases. Eventually they have a seemingly normal account with a lot of friends and a good credit history. Ultimately, they use these accounts to apply for a variety of financial products.

Meanwhile, the core of risk control is using relevant data to conduct modeling analysis and eliminate fake users, then evaluating real users' repayment ability, repayment willingness, and risk.

The data risk control companies that have begun to pop up over the past few years compete on the two resources this work relies on: algorithms and data acquisition capabilities.

Currently, in the face of massive worldwide "gangs" of hackers and an enormous black-market production chain, the risk control and anti-fraud solutions on the market lag an obvious step behind in algorithmic technology. Solution providers often use device fingerprints, black/white lists, rule systems, or machine learning models trained on tagged data to detect fraudulent activities. Some methods only conduct shallow analyses, so they are easy for malicious opponents to circumvent and deceive. Others use machine learning but rely on tagged historical data to train models. This labeled data is often scarce and represents only the fraudulent activities that have occurred in the past. As a result, the models trained on it are not accurate enough to cope with ever-changing fraudulent practices.

Meanwhile, a vast number of risk control companies rely primarily on strong data acquisition and integration capabilities. But when data is gathered through malicious crawling, the purchase of hacked data, and so on, the consolidated data eventually includes a substantial amount of individuals' confidential information, such as ID card numbers, phone numbers, bank cards, savings accounts, or exact home addresses. The solutions that currently exist in the industry are extremely dependent on this kind of data. However, this use of data infringes on personal privacy, and its legality has been the target of heavy criticism. At the same time, the amount of data available within the industry has a profound impact on modeling accuracy, so omitting this sensitive data will test the algorithms’ ability to perform.

Acquisition of Risk Control Data: Is Magnitude or Scenario More Important?

Can the sophistication of the algorithm make up for the loss of a large amount of sensitive data? Last month, in an exclusive interview with Big Data Digest in New York, HC Financial Service Group CFO Shen Yutong (Tony Shen) said we should be more cautious when using data that is not directly related to lending behavior or credit behavior.

"Some people frequently shop online, but because of their frequent shopping habits, they find themselves short of cash and in need of a loan. It is necessary to realize that regular online payments do not always mean that the buyer is a good person to give credit to. Therefore, they fail to get loans when applying for greater amounts of credit."

Social and online-shopping data, while valuable, are not necessarily more useful than data that is directly related to a person’s financial situation. Mr. Shen’s attitude reflects that of traditional financial experts toward Internet risk control: one of caution. Furthermore, it embodies a question currently faced by the industry: when acquiring risk control data, is the volume of data more important than the data scenarios, or vice versa?

With regard to this issue, Huang Ling clearly leans toward the latter: "I think it depends on what kind of data we're talking about and how we use that data. The data we crawl isn't necessarily going to make much of a difference here. Given the customer's data and application scenarios, what matters is helping them mine that data more accurately and in a way that is closer to their goal."

Core customers for financial risk control are Internet companies, Internet financial enterprises, and other financial institutions. What these institutions have in common is a large number of accounts. With accounts at the center, we can acquire a lot of personal information including bank balances, purchasing history, borrowing history, and more. The work of risk control then is to build models around this data and make assessments of the user’s ability and willingness to make repayments, proper loan amount, etc.

Huang Ling believes that desensitized data used in user behavior models can also help achieve the goals of risk control and fraud prevention. "When executing behavior analysis, we usually look at the person’s social relationships, phone records, and e-commerce behavior. Here, behavior refers to where, when, and on what device the user registers and logs on to the website, and what they did after logging in (the pages they visited, the products they purchased, the friends they added, who they spoke to, etc.). Even though this data includes some sensitive information (such as who your friends are), the data is desensitized. This data is then fed into a graph algorithm, and user association analysis is used to identify the hidden information related to the interaction between users."
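As a rough illustration of the idea described above, the following Python sketch pseudonymizes identifiers with a salted hash and then groups accounts that share a device, using a union-find structure. The salt, the event format, the `min_size` threshold, and the function names are all assumptions made for this example. The article does not describe the actual desensitization scheme or graph algorithm used by AHI FinTech.

```python
import hashlib
from collections import defaultdict

# Hypothetical salt; a real system would keep this secret and rotate it.
SALT = b"demo-salt"


def pseudonymize(value: str) -> str:
    """Replace a raw identifier with a truncated salted SHA-256 digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:12]


def find_linked_groups(events, min_size=3):
    """Group accounts that are connected through shared attributes.

    `events` is an iterable of (account_token, attribute_token) pairs.
    Returns the groups that contain at least `min_size` accounts.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every account node to each attribute node it was seen with.
    for account, attribute in events:
        union(("acct", account), ("attr", attribute))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "acct":
            groups[find(node)].add(node[1])
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Toy login events: (user, device). Only the hashed tokens reach the graph step.
raw_events = [
    ("alice", "device-A"),
    ("bob", "device-B"),
    ("mallory1", "device-X"),
    ("mallory2", "device-X"),
    ("mallory3", "device-X"),
    ("mallory3", "device-Y"),
    ("mallory4", "device-Y"),
]
events = [(pseudonymize(user), pseudonymize(device)) for user, device in raw_events]
suspicious = find_linked_groups(events, min_size=3)
print(suspicious)  # one group of four pseudonymized account tokens
```

The grouping step never sees the raw names, only the tokens, which is the point of the sketch: association analysis can still reveal that four accounts are linked through two shared devices.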

Huang Ling and his team at AHI FinTech Quest

"We have almost no sensitive data belonging to users. What matters more for us is using non-sensitive data, targeting the client’s behavior data. We then combine it with user scenarios and use AI and Big Data methods to help the client get value out of their data, then create the most appropriate risk control model for the client’s user scenario. Subsequently, we help them achieve the best possible detection results on their platform. This way, we can automatically detect abnormal connections among tens of millions of users, produce risk control warnings, and guard against organized and systematic risks. We do all of this without infringing on users’ privacy and without prior knowledge of the type or characteristics of the attack."

Redefining Risk Control: Starting from the Data Source

This kind of data acquisition method also places greater and more demanding requirements on companies in the industry.

"When we do risk control modeling, the first thing we look at is the quality of the data, including whether or not the data is complete and whether or not it includes data that is relevant to risk control."

Huang Ling believes that risk control happens not only during modeling and testing but also on the company's side, beginning with the collection of data. When dealing with customers, AHI FinTech focuses on helping its clients improve these abilities from a service perspective:

First, after a risk control signal is sent to the client’s platform, the platform can block users with a high risk value. For users with lower but not insignificant risk values, the system can merge data from other dimensions into its rules and models, perform further processing and refining, and then re-evaluate the user.
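The two-tier handling described above can be sketched in Python as follows. The thresholds, the `RiskSignal` fields, and the `rescore` rule are hypothetical values chosen for illustration; the article does not specify how AHI FinTech scores users or combines additional data dimensions.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds on a 0.0-1.0 risk score.
BLOCK_THRESHOLD = 0.8
REVIEW_THRESHOLD = 0.4


@dataclass
class RiskSignal:
    user_id: str
    score: float
    # Extra indicators from other data dimensions, e.g. {"shared_ip": True}.
    extra_signals: dict = field(default_factory=dict)


def rescore(signal: RiskSignal) -> float:
    """Second pass: add a fixed bump for every extra indicator that is set."""
    bump = 0.25 * sum(1 for flagged in signal.extra_signals.values() if flagged)
    return min(1.0, signal.score + bump)


def triage(signal: RiskSignal) -> str:
    """Block high-risk users, re-evaluate mid-risk users, allow the rest."""
    if signal.score >= BLOCK_THRESHOLD:
        return "block"
    if signal.score >= REVIEW_THRESHOLD:
        # Mid-risk: merge in data from other dimensions before deciding.
        return "block" if rescore(signal) >= BLOCK_THRESHOLD else "allow"
    return "allow"


decisions = {
    "u1": triage(RiskSignal("u1", 0.9)),
    "u2": triage(RiskSignal("u2", 0.5, {"new_device": True, "shared_ip": True})),
    "u3": triage(RiskSignal("u3", 0.5, {"new_device": False})),
    "u4": triage(RiskSignal("u4", 0.1)),
}
print(decisions)
```

In practice the second pass would more likely be another model or rule set than a fixed bump, but the control flow (block, re-evaluate, allow) is the part this sketch is meant to show.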

Furthermore, the system provides feedback during each step of data collection as well as the discovery of fraudulent trends and modeling.

These two processes run in parallel. If the collected data does not meet quality standards, the client is asked to adjust its collection and is given feedback on the specific problems. Even if the missing data cannot be made up right away, the problem has to be fixed as soon as possible.
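A data-quality check of the kind mentioned here could start with something as simple as a per-field fill rate. The Python sketch below is only an assumption about what such a check might look like: the field names, the `min_fill_rate` threshold, and the record format are invented for the example.

```python
def completeness_report(records, required_fields, min_fill_rate=0.9):
    """Return per-field fill rates and the fields below the threshold.

    A value counts as missing when it is absent, None, or an empty string.
    """
    total = len(records)
    fill_rates = {}
    for field_name in required_fields:
        filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
        fill_rates[field_name] = filled / total if total else 0.0
    failing = [f for f in required_fields if fill_rates[f] < min_fill_rate]
    return fill_rates, failing


# Toy batch of login records; two of the four lack a device identifier.
batch = [
    {"login_time": "2017-06-01T10:00:00", "device_id": "d1"},
    {"login_time": "2017-06-01T10:05:00", "device_id": ""},
    {"login_time": "2017-06-01T10:07:00"},
    {"login_time": "2017-06-01T10:09:00", "device_id": "d2"},
]
fill_rates, failing = completeness_report(
    batch, ["login_time", "device_id"], min_fill_rate=0.75
)
print(fill_rates)
print(failing)
```

The list of failing fields is the kind of concrete feedback that could be passed back to the client so that collection is adjusted at the source.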

"We make recommendations to the client on how to collect data according to our own risk control and fraud prevention experience, so working with us is not just a matter of us helping you meet fraud detection requirements, we also give the client a lot of feedback and keep open lines of communication. We offer them comprehensive consulting and service from system applications to data collection and risk control."

The opponent of financial risk control is an enormous black-market production chain, which makes the matter extremely complicated. Organizations are introducing innovative technologies like AI and machine learning to the field in droves, but using them correctly is certainly not a simple matter.

Most of the solutions currently on the market rely on collecting massive amounts of data, then building models with rule systems or supervised machine learning. These solutions harbor an obvious shortcoming: the models always depend on training with historical tag data. However, we can produce tags only after we have suffered a fraud attack. We create them at the cost of our own sweat, blood, and tears. As our goal is for these kinds of attacks to be increasingly rare, we find ourselves lacking data for training models. Models produced by this kind of tag training are never good enough, and they can only represent fraudulent behavior that we’ve seen in the past. When fraudsters invent new methods, models that rely on tag training always have difficulty stopping them quickly and accurately, often resulting in massive losses.

Huang Ling’s team uses semi-supervised learning on data with only a few tags, or even none, to generate models. This allows them to significantly reduce the cost of acquiring new tags, make fuller use of the data, and produce higher-quality models. An active machine learning platform, the massive data processing capabilities of a Big Data system that combines human and artificial intelligence, and the experience of risk control experts together help the artificial intelligence automatically learn previously unknown fraud tactics. The system can also track new fraud methods and constantly adapt its anti-fraud machine learning models to an ever-changing environment. This makes it significantly more difficult for fraudsters to evade detection.
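To make the idea of learning from only a few tags more concrete, here is a minimal Python sketch of graph-based label propagation, one classic semi-supervised technique. It is not presented as the method Huang Ling's team uses. The graph, the seed labels, and the 0.5 cutoff are assumptions for illustration.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread a few known fraud scores across a user graph.

    `edges` is a list of (node, node) links between accounts.
    `seeds` maps the few tagged nodes to 1.0 (fraud) or 0.0 (legitimate).
    Untagged nodes start at 0.5 and repeatedly take the mean score of
    their neighbors, while tagged nodes stay fixed.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Two clusters of accounts, with a single tagged account in each.
edges = [
    ("f1", "f2"), ("f2", "f3"), ("f1", "f3"), ("f3", "u1"),
    ("g1", "g2"), ("g2", "u2"), ("g1", "u2"),
]
seeds = {"f1": 1.0, "g1": 0.0}
scores = propagate_labels(edges, seeds)
flagged = sorted(n for n, s in scores.items() if s > 0.5 and n not in seeds)
print(flagged)
```

With only two tagged accounts, the untagged accounts inherit a score from the cluster they are connected to. A production system would work on far larger graphs and would feed expert review of the flagged accounts back in as new seeds.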

The Black-Market Production Chain in the Risk Control Industry

In addition to AHI FinTech, there is a Silicon Valley company in the risk control field, DataVisor, that takes a similar approach. In 2014, Huang Ling left his 7-year career as a senior researcher at Intel to become a founding member and Director of Data at DataVisor, where he led the company’s entire machine learning, user behavior analysis, and credit modeling system. There, he became part of Silicon Valley's next generation and the best-known expert in using unsupervised risk control methods.

Huang Ling has always believed that risk control in China is not particularly comparable to that in Silicon Valley. The black-market operations faced by the anti-fraud industry form an entire production chain, run by gangs spread around the globe. This chain stretches from Eastern Europe to America to China to India. It runs from the security attack software at the top, to the people who use this software to control people and phones around the world, to those who use these people and phones to create fake users that carry out all kinds of fraudulent activity and reap the benefits.

Therefore, to a certain extent, you can say that risk control and anti-fraud work are universal. Several Internet companies and financial institutions in China are also facing attacks from abroad, and a lot of attacks perpetrated in America are conducted via China, India, Africa, or one of several South East Asian countries.

As a result, the most significant difference between America and China likely lies in political policy and industry development:

Because the credit system in America is sound, the cost of committing fraud is relatively high. Whether the user is defrauding a bank or an online merchant, these kinds of activities usually affect the user’s credit score through a variety of channels. In China, this system is still underdeveloped, so in many situations a user’s credit rating with the central bank does not reflect fraudulent online financial activities. The cost of committing fraud is therefore comparatively low. As a result, examples of large-scale fraud are more abundant in China than in the States, and they tend to be harder to handle effectively.

Furthermore, the development of the industry has been different in China and America. China’s mobile applications and Internet financial industries have grown to be larger than their counterparts in America, so there is more fraudulent activity surrounding these two sectors than in America.

"After coming back, we noticed that in China, especially in fields related to finance, these fraud gangs are larger and craftier than they are in America. They also use more real people to commit fraud, making them harder to detect and requiring more machine learning and AI modeling methods," Huang Ling said.

On a global battlefield such as this, the addition of artificial intelligence algorithm experts and security scientists is all the more valuable.

On his reasons for becoming an entrepreneur, Huang Ling said, "I have been researching and practicing in the fields of artificial intelligence algorithms and Internet security for several years. I hope that the skills and experience I have acquired over this time will be useful in the fields of financial risk control and fraud prevention. I also wish to provide a complete set of systems and services to accompany financial and Internet products and thereby achieve a more secure, honest, and fair industry environment."

Aside from Huang Ling, the other co-founder of AHI FinTech—chief scientist Xu Wei—also hails from academia. Xu Wei served as a cross-institute adjunct professor at Tsinghua University.

Conclusion

Huang Ling regards the entrepreneurship of AI scientists in the field of risk control as a good thing. This is because they possess the skill and understanding of algorithms. Additionally, he is willing to give them a chance to really participate in the industry, rather than just being a cog in the machine.
