How AHI FinTech and DataVisor Are Securing Data through AI and Big Data

Summary: With the growing threat of cyber-attacks, organizations such as AHI FinTech and DataVisor are using Big Data and AI to help customers in China protect their data.

Competition in the field of financial risk control has intensified sharply over the past year. Many young companies find themselves fighting a battle on two fronts: data acquisition capabilities and algorithm technology.

In June 2017, China's Cybersecurity Law took effect. Companies that crawl data from users' mobile phones without prior authorization may now face serious legal consequences, which can include a 7-year jail sentence for the legal representatives of a convicted company. Furthermore, many businesses engaged in data acquisition and data trading are now facing thorough investigation.

With the gray-market data industry shut down, the risk control industry seems to have an opportunity to move toward healthy compliance, despite the technical challenges it still faces.

A Close Race: Algorithms or Data?

Despite increasingly thorough regulation, cyber-attack rates are still rising. This makes user data even more precious, especially for businesses that rely on integrating external data. Instead of circumventing regulations, enterprises have shifted their focus to two major aspects of big data analysis: algorithms and modeling. These two areas are crucial to big data, and their rise has given birth to a group of new risk control companies in China.

Huang Ling, CEO of AHI FinTech, who holds a Ph.D. in computer science from the University of California, Berkeley, and is a part-time professor at the Interdisciplinary Information Institute at Tsinghua University, described his work as "a global war." "In the risk control industry, our opponents are huge, and they form a worldwide black-market production chain."

Since the very beginning, risk control has faced a global opponent in the form of a community of hackers. These hackers break into other people's phones and computers using malicious software. On the one hand, this gives them access to confidential data; on the other, they can use the compromised information to open fake accounts. They then carry out all kinds of fake social interactions, such as leaving reviews or making purchases, until each account looks normal, with plenty of friends and a good credit history. Ultimately, they use these accounts to apply for a variety of financial products.

Meanwhile, the core of risk control is to use relevant data for modeling and analysis to weed out fake users, and then to evaluate real users' ability and willingness to repay along with their overall risk.

The data-driven risk control companies that have emerged over the past few years compete on the two resources this work relies on: algorithms and data acquisition capabilities.

Currently, faced with massive worldwide "gangs" of hackers and an enormous black-market production chain, the risk control and anti-fraud solutions on the market are clearly a step behind in algorithmic technology. Solution providers often use device fingerprints, blacklists and whitelists, rule systems, or machine learning models trained on labeled (tagged) data to detect fraudulent activity. Some methods only perform shallow analysis, which makes them easy for malicious opponents to circumvent and deceive. Others use machine learning but rely on labeled historical data to train their models. Such labeled data is scarce and represents only fraud that has already occurred, so models trained on it are not accurate enough to keep up with ever-changing fraud tactics.
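To make the point about shallow analysis concrete, here is a minimal sketch of a static blacklist check keyed on a device fingerprint. The fingerprint scheme (SHA-256 over sorted device attributes), the attribute names, and the function names are all hypothetical, chosen only for illustration. Changing a single attribute produces a fingerprint the blacklist has never seen, which is exactly how such rules get circumvented.

```python
import hashlib

def device_fingerprint(attrs: dict) -> str:
    """Hypothetical fingerprint: SHA-256 over the device attributes in sorted key order."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def rule_check(attrs: dict, phone: str, device_blacklist: set, phone_blacklist: set) -> bool:
    """Return True if any static rule blocks the request."""
    if phone in phone_blacklist:
        return True
    return device_fingerprint(attrs) in device_blacklist

# A device and phone number already known to be fraudulent (hypothetical values).
known_bad = {"model": "X1", "os": "android-9", "imei": "000111"}
device_blacklist = {device_fingerprint(known_bad)}
phone_blacklist = {"13800000000"}

print(rule_check(known_bad, "13900000000", device_blacklist, phone_blacklist))  # True

# The same fraudster spoofs one attribute and uses a fresh number: nothing matches.
spoofed = dict(known_bad, imei="000112")
print(rule_check(spoofed, "13900000001", device_blacklist, phone_blacklist))  # False
```

Exact-match rules like this only recognize what has already been seen, which is the gap the behavior- and graph-based approaches discussed below try to close.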

Meanwhile, a vast number of risk control companies rely primarily on strong data acquisition and integration capabilities. But because the data is gathered through malicious crawling, purchases of hacked data, and similar means, the consolidated data ends up containing a substantial amount of individuals' confidential information, such as ID card numbers, phone numbers, bank card and savings account details, or exact home addresses. The solutions that currently exist in the industry are extremely dependent on this kind of data. However, using it infringes on personal privacy, and its legality has drawn heavy criticism. At the same time, the amount of data available has a profound impact on modeling accuracy, so once this sensitive data is removed, the algorithms' ability to perform will be put to the test.

Acquisition of Risk Control Data: Is Magnitude or Scenario More Important?

Will the sophistication of the algorithm make up for the loss of a large amount of sensitive data? Last month, in an exclusive interview with Big Data Digest in New York, Shen Yutong (Tony Shen), CFO of HC Financial Service Group, said we should be more cautious when using data that is not directly related to lending or credit behavior.

"Some people shop online frequently, and precisely because of those habits they end up short of cash and in need of a loan. It is important to realize that frequent online payments do not necessarily mean the buyer is a good person to give credit to. That is why such people may fail to get loans when they apply for larger amounts of credit."

Social and online-shopping data, while valuable, are not necessarily more useful than data directly related to a person's financial situation. Mr. Shen's attitude reflects the caution that traditional financial experts show toward Internet risk control. It also points to a question the industry currently faces: when acquiring risk control data, is the volume of data more important than the scenarios it comes from, or vice versa?

On this issue, Huang Ling clearly leans toward the latter: "I think it depends on what kind of data we're talking about and how we use it. Crawling more data isn't necessarily going to make much of a difference here. What matters is to start from the customer's own data and application scenarios and help them mine that data more accurately and closer to its actual goal."


Core customers for financial risk control are Internet companies, Internet finance companies, and other financial institutions. What these institutions have in common is a large number of accounts. With accounts at the center, we can gather a great deal of personal information, including bank balances, purchase history, borrowing history, and more. The work of risk control is then to build models around this data and assess each user's ability and willingness to repay, the appropriate loan amount, and so on.

Huang Ling believes that building user behavior models on desensitized data can also achieve the goals of risk control and fraud prevention. "When performing behavior analysis, we usually look at a person's social relationships, phone records, and e-commerce behavior. Here, behavior refers to where, when, and on what device the user registers and logs in to the website, and what they do after logging in (the pages they visit, the products they buy, the friends they add, who they talk to, etc.). Even though this data includes some sensitive information (such as who your friends are), it is desensitized. The data is then fed into a graph algorithm, and user association analysis is used to uncover hidden information in the interactions between users."
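The desensitization step could look something like the sketch below, assuming identifiers are replaced by keyed-hash (HMAC-SHA256) tokens before any graph analysis. The event fields, the key handling, and the function names are hypothetical; the article does not say how AHI FinTech desensitizes its data. The point is that tokens stay stable, so the same user still links to the same friends and devices, while raw identifiers never enter the analysis.

```python
import hashlib
import hmac

# Assumption: in practice the key would live in a secrets manager, never next to the data.
SECRET_KEY = b"example-key-not-for-production"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so relationships can still be
    joined and graphed. A plain unkeyed hash would be weak here: identifiers such
    as phone numbers have few enough possible values that an attacker could hash
    them all and reverse the mapping.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def desensitize_events(events):
    """Replace identifying fields with tokens; keep behavioral fields (here, the hour) as-is."""
    return [
        {
            "user": pseudonymize(e["user"]),
            "friend": pseudonymize(e["friend"]),
            "device": pseudonymize(e["device"]),
            "hour": e["hour"],
        }
        for e in events
    ]

def build_edges(events):
    """Undirected, deduplicated user-to-user edges from 'added a friend' events."""
    return {tuple(sorted((e["user"], e["friend"]))) for e in events}

# Hypothetical raw events.
events = [
    {"user": "alice", "friend": "bob", "device": "d1", "hour": 3},
    {"user": "bob", "friend": "alice", "device": "d2", "hour": 4},
    {"user": "carol", "friend": "bob", "device": "d1", "hour": 3},
]
safe = desensitize_events(events)
edges = build_edges(safe)
print(len(edges))  # 2: alice-bob (seen twice) and carol-bob
```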

Huang Ling and his team at AHI FinTech Quest

"We hold almost no sensitive user data. What matters more to us is using non-sensitive data, specifically the client's behavioral data. We combine it with the user's scenarios, use AI and Big Data methods to help the client get value out of their data, and then build the risk control model best suited to the client's scenario. We then help them achieve the best possible detection results on their platform. This way, we can automatically detect abnormal connections among tens of millions of users, issue risk control warnings, and guard against organized, systematic risks. We do all of this without infringing on users' privacy and without needing prior knowledge of the type or characteristics of an attack."
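One simple way to surface "abnormal connections" is to link accounts that share a device or IP address and flag unusually large linked groups. The sketch below does this with a union-find (disjoint-set) structure; it is an assumed, deliberately simplified stand-in, not AHI FinTech's algorithm, and the size threshold and attribute tokens are hypothetical. The linking is a single pass over the accounts and their attributes, which is what makes this style of grouping practical on large account sets.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every node on the path at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def suspicious_groups(accounts, min_size=3):
    """Group accounts that share any attribute token (device, IP, ...) and return
    groups with at least min_size members. min_size is a hypothetical threshold."""
    accounts = list(accounts)  # iterated twice below
    uf = UnionFind()
    first_owner = {}  # attribute token -> first account seen with it
    for acct, attrs in accounts:
        uf.find(acct)  # register accounts that share nothing
        for token in attrs:
            if token in first_owner:
                uf.union(first_owner[token], acct)
            else:
                first_owner[token] = acct
    groups = defaultdict(set)
    for acct, _ in accounts:
        groups[uf.find(acct)].add(acct)
    return [g for g in groups.values() if len(g) >= min_size]

# Hypothetical data: a1-a3 are chained together through a shared device and IP.
accounts = [
    ("a1", {"dev:1"}),
    ("a2", {"dev:1", "ip:9"}),
    ("a3", {"ip:9"}),
    ("a4", {"dev:7"}),
]
print(suspicious_groups(accounts))  # one group containing a1, a2, a3
```

Note that a1 and a3 share nothing directly; they are linked only through a2, which is the kind of indirect association that row-by-row rules miss.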

Redefining Risk Control: Starting from the Data Source

This approach to data acquisition also places greater and more serious demands on companies in the industry.

"When we do risk control modeling, the first thing we look at is the quality of the data, including whether or not the data is complete and whether or not it includes data that is relevant to risk control."
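The completeness check described in the quote can be sketched as a per-field report. The required field names and the 95% threshold below are hypothetical; a real risk control schema would differ. Fields that fall below the threshold form the list that would go back to the client.

```python
# Hypothetical schema: fields a behavior-based risk model is assumed to need.
REQUIRED_FIELDS = ("user_id", "register_time", "login_ip", "device_id")

def quality_report(records, required=REQUIRED_FIELDS, min_completeness=0.95):
    """Per-field completeness (share of records with a non-empty value) plus a pass/fail flag."""
    n = len(records)
    report = {}
    for field in required:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        ratio = filled / n if n else 0.0
        report[field] = {"completeness": ratio, "ok": ratio >= min_completeness}
    return report

def fields_to_fix(report):
    """Fields that fail the threshold, i.e. what to raise with the client."""
    return [field for field, stats in report.items() if not stats["ok"]]

records = [
    {"user_id": "u1", "register_time": "t1", "login_ip": "1.1.1.1", "device_id": "d1"},
    {"user_id": "u2", "register_time": "t2", "login_ip": "1.1.1.2", "device_id": ""},
    {"user_id": "u3", "register_time": "t3", "login_ip": "1.1.1.3"},
    {"user_id": "u4", "register_time": "t4", "login_ip": "1.1.1.4", "device_id": "d4"},
]
report = quality_report(records)
print(report["device_id"])    # {'completeness': 0.5, 'ok': False}
print(fields_to_fix(report))  # ['device_id']
```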

Huang Ling believes that risk control happens not only during modeling and testing but also on the client company's side, starting with data collection. When working with customers, AHI FinTech focuses on helping them strengthen the relevant capabilities from a service perspective:

First, after a risk control signal is sent to the client's platform, the platform can block users with a high risk score. For users with lower but non-negligible risk scores, the system can merge data from other dimensions into its rules and models, perform further processing and refinement, and then re-evaluate the user.
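The tiered handling of risk signals might be expressed as a small routing policy like the one below. The thresholds, the 0-to-1 score scale, and the simple weighted blend standing in for "merging data from other dimensions" are all assumptions for illustration, not AHI FinTech's actual logic.

```python
def route(risk_score, extra_score=None, block_at=0.9, review_at=0.5, weight=0.5):
    """Hypothetical three-tier policy for a risk signal scaled to [0, 1].

    - score >= block_at: block immediately.
    - review_at <= score < block_at: if a score from another data dimension is
      available, blend it in and decide again; otherwise send to review.
    - score < review_at: pass.
    """
    if risk_score >= block_at:
        return "block"
    if risk_score < review_at:
        return "pass"
    if extra_score is None:
        return "review"
    combined = (1 - weight) * risk_score + weight * extra_score
    if combined >= block_at:
        return "block"
    return "review" if combined >= review_at else "pass"

print(route(0.95))                    # block
print(route(0.2))                     # pass
print(route(0.6))                     # review
print(route(0.6, extra_score=0.1))    # pass   (blended score 0.35)
print(route(0.85, extra_score=1.0))   # block  (blended score 0.925)
```

Keeping the middle tier separate means extra data is only fetched and combined for the users where it can actually change the decision.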


Furthermore, the system provides feedback at each step, from data collection to the discovery of fraud trends and modeling.

These two processes run in parallel. If the collected data does not meet quality standards, the client is asked to make adjustments, and the system reports back on the specific issues. Even if the missing data cannot be filled in right away, it has to be fixed as soon as possible.

"We make recommendations to clients on how to collect data based on our own risk control and fraud prevention experience. So working with us is not just a matter of us helping you meet fraud detection requirements; we also give clients a lot of feedback and keep lines of communication open. We offer comprehensive consulting and services covering everything from system applications to data collection and risk control."

The opponent of financial risk control is the enormous black-market production chain, which makes the matter extremely complicated. Organizations are bringing innovative technologies such as AI and machine learning into the field in droves, but using them correctly is far from simple.

Most solutions currently on the market rely on collecting massive amounts of data and then using rule systems or models generated by supervised machine learning. These solutions have an obvious shortcoming: the models always depend on training with historical labeled (tagged) data. However, labels can only be produced after a fraud attack has already happened; they are paid for in sweat, blood, and tears. Since our goal is to make such attacks increasingly rare, we never have enough labeled data to train models. Models trained on these labels are never good enough, because they can only represent fraudulent behavior we have seen in the past. When fraudsters invent new methods, label-dependent models struggle to stop them quickly and accurately, which often leads to massive losses.

Huang Ling's team uses semi-supervised learning to build models from data with few or even no labels, which significantly reduces the cost of acquiring new labels, makes better use of the data, and produces higher-quality models. The team combines an active machine learning platform, the massive data processing capabilities of a Big Data system that pairs human and artificial intelligence, and the experience of risk control experts to help the AI automatically learn previously unknown fraud tactics. The system can also track new fraud methods and continually adapt its anti-fraud machine learning models to an ever-changing environment, making it significantly harder for fraudsters to evade detection.
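The article does not describe the team's algorithms in detail. As one textbook example of semi-supervised learning (a swapped-in illustration, not the team's method), graph-based label propagation spreads a handful of known labels across a user graph: labeled nodes keep their labels, and each unlabeled node repeatedly takes the average score of its neighbors. The graph, seed labels, and iteration count below are hypothetical.

```python
def label_propagation(adjacency, seeds, iterations=50):
    """Semi-supervised scoring by iterative label propagation.

    adjacency: dict mapping each node to a list of its neighbors (undirected graph).
    seeds: the few known labels, 1.0 for confirmed fraud and 0.0 for confirmed good.
    Returns a fraud score in [0, 1] for every node.
    """
    # Unlabeled nodes start at 0.5, i.e. no opinion either way.
    scores = {node: seeds.get(node, 0.5) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp known labels
            elif neighbors:
                updated[node] = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                updated[node] = scores[node]  # isolated node keeps its prior
        scores = updated
    return scores

# Hypothetical graph: f is confirmed fraud, g is confirmed good, m touches both.
adjacency = {
    "f": ["x1", "m"],
    "x1": ["f", "x2"],
    "x2": ["x1"],
    "g": ["y1", "m"],
    "y1": ["g"],
    "m": ["f", "g"],
}
seeds = {"f": 1.0, "g": 0.0}
scores = label_propagation(adjacency, seeds)
print({node: round(s, 3) for node, s in scores.items()})
```

Nodes tied to the fraud seed (x1, x2) converge toward 1.0, the node tied only to the good seed (y1) goes to 0.0, and m, linked equally to both, stays at 0.5. Two labels are enough to produce a score for every connected account.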

The Black-Market Production Chain in the Risk Control Industry

In addition to AHI FinTech, DataVisor, a Silicon Valley company in the risk control field, takes a similar approach. In 2014, Huang Ling left a 7-year career as a senior researcher at Intel to become a founding member and Director of Data at DataVisor, where he led the company's entire machine learning, user behavior analysis, and credit modeling systems. There, he became part of a new generation in Silicon Valley and one of the best-known experts in unsupervised risk control methods.


Huang Ling has always believed that risk control in China is not fundamentally different from risk control in Silicon Valley. The black market facing the anti-fraud industry is an entire production chain, run by loosely organized gangs spread around the globe, from Eastern Europe to America to China to India. The chain runs from the attack software at the top, to the people who use that software to control people's computers and phones around the world, to the people and phones used to create fake users who carry out all kinds of fraud and reap the benefits.

Therefore, to a certain extent, risk control and anti-fraud work are universal. Many Internet companies and financial institutions in China also face attacks from abroad, and many attacks carried out in America are launched via China, India, Africa, or Southeast Asian countries.

As a result, the more significant differences between America and China likely lie in political policy and industry development:

Because the credit system in America is well established, the cost of committing fraud is relatively high. Whether a user defrauds a bank or an online merchant, the fraud usually affects the user's credit score through a variety of channels. In China, this system is still underdeveloped, so in many cases a user's central bank credit record does not reflect online financial fraud, and the cost of committing fraud is comparatively low. As a result, large-scale fraud is more common in China than in the States and tends to be harder to handle effectively.

Furthermore, the development of the industry has been different in China and America. China’s mobile applications and Internet financial industries have grown to be larger than their counterparts in America, so there is more fraudulent activity surrounding these two sectors than in America.

"After coming back to China, we noticed that, especially in finance-related fields, these fraud gangs were larger and craftier than in America. They also use more real people to commit fraud, which makes them harder to detect and calls for more machine learning and AI modeling methods," Huang Ling said.

On a global battlefield such as this, experts in artificial intelligence algorithms and security scientists are all the more valuable.

Explaining his decision to start a company, Huang Ling said, "I have been researching and practicing in the fields of artificial intelligence algorithms and Internet security for several years. I hope the skills and experience I have gained will be useful in financial risk control and fraud prevention. I also want to provide a complete set of systems and services to accompany financial and Internet products, and thereby help create a more secure, honest, and fair industry environment."

Aside from Huang Ling, the other co-founder of AHI FinTech—chief scientist Xu Wei—also hails from academia. Xu Wei served as a cross-institute adjunct professor at Tsinghua University.

Conclusion

Huang Ling sees AI scientists starting companies in the risk control field as a good thing, because they have deep skill in and understanding of algorithms. He is also keen to give them a chance to truly participate in the industry, rather than just being a cog in the machine.
