Interview with iDST Deputy Managing Director Hua Xiansheng: City Brain – Comprehensive Urban Cognition

Summary: From October 11 to 14, 2017, The Computing Conference will be held once again in Hangzhou's Yunqi township.


Editor's Note: From October 11 to 14, 2017, The Computing Conference will be held once again in Hangzhou's Yunqi township (get your tickets now!). As one of the world's most influential technology expos, this conference will include brilliant lectures by many Alibaba Group experts and industry leaders. Starting today, the Yunqi Community will interview a series of conference guests.

The first guest we interviewed was Alibaba iDST Deputy Managing Director Hua Xiansheng. During the October Computing Conference, he will discuss the latest trends in the computer vision field and the latest progress of the City Brain.

Hua Xiansheng is a leading international expert in the field of visual recognition and search, and has previously served as program committee chair of the ACM Multimedia Conference, among other roles. Dr. Hua is also a Thousand Talents Program expert, IEEE Fellow, ACM Distinguished Scientist, and MIT TR35 Young Innovator Award recipient.

In 2015, Dr. Hua left Microsoft Research to join Alibaba. In Alibaba's search business unit, he was responsible for optimizing image-based product search, and his team developed Pailitao, the image search function of the Taobao app. In April 2016, Dr. Hua also joined iDST, Alibaba's artificial intelligence research institute, where he directed the research of the visual computing team. City Brain is currently one of the projects under his charge.

At the Conference on Computer Vision and Pattern Recognition (CVPR 2017) held at the end of July, Dr. Hua, as the director of iDST's visual computing team, delivered a keynote speech titled "Practices of Large-Scale Target Re-Identification", in which he discussed the City Brain project.

Finding Value in Heterogeneous City Data

The City Brain project was publicly announced at the 2016 Hangzhou Computing Conference. Wang Jian, then-chairman of the Alibaba Group's technical committee, introduced City Brain using the following words: "City Brain has Alibaba Cloud's ET artificial intelligence technology at its core to perform a comprehensive real-time analysis across the city, automate public resource allocation, and fix problems as they arise during city operation. City Brain will evolve into a super artificial intelligence for city governance."

One year later, City Brain remains a mysterious project to outsiders. In plain, if somewhat dated, terms, you could call it a smart city. However, City Brain is actually far more advanced than what we usually refer to as a smart city.

In the words of Dr. Hua, at its core, City Brain uses big data and big computing to mine valuable information from large volumes of heterogeneous city data.

What is heterogeneous city data? It has two main features:

First, city data combines visual data, public transport data, GPS data, and other heterogeneous sources, and visual data naturally makes up the largest and most important part. Second, city data volumes are huge. For example, a city may have hundreds of thousands of cameras producing data around the clock. The inherent advantage of city data is therefore its sheer volume, and the mission of City Brain is to find a way to extract valuable information from it.

According to Dr. Hua, "In the past, the value of this data was not fully explored, and the deployment and O&M costs for so many devices were very high. However, the value of such data goes far beyond traditional applications such as license plate recognition and traffic fines."

City Brain is creating cities with data intelligence. By providing comprehensive, real-time awareness of the city, it can recognize vehicle shapes, models, trajectories, and speeds, and detect pedestrians and cyclists. On this basis, it can support decision-making, make forecasts, and intervene. The value of city data is gradually becoming more apparent.

Dr. Hua used traffic as an example. When an emergency arises, City Brain can immediately find the relevant data, such as suspect vehicles, cars involved in accidents, and even criminal suspects. After analyzing this data, it can also optimize traffic across the entire city. Going one step further, City Brain can predict such situations before they happen. For instance, it can tell you where traffic jams will occur in the next 10 minutes. It can also make predictions much earlier and deploy police and medical resources in advance, and it can even prevent traffic accidents through preemptive traffic control and policing.
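To make "where traffic jams will occur in the next 10 minutes" concrete, here is a deliberately minimal sketch, not City Brain's actual model: it fits a least-squares linear trend to recent average speeds on one road segment and flags a jam if the extrapolated speed falls below a threshold. The `SpeedReading` type, the 10-minute horizon, and the 15 km/h jam threshold are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeedReading:
    minute: float      # minutes since the start of the observation window
    avg_speed: float   # average vehicle speed on the road segment, km/h


def fit_linear_trend(readings):
    """Ordinary least-squares fit of avg_speed = intercept + slope * minute."""
    n = len(readings)
    mean_t = sum(r.minute for r in readings) / n
    mean_v = sum(r.avg_speed for r in readings) / n
    cov = sum((r.minute - mean_t) * (r.avg_speed - mean_v) for r in readings)
    var = sum((r.minute - mean_t) ** 2 for r in readings)
    if var == 0:
        raise ValueError("need readings at two or more distinct times")
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept, slope


def predict_jam(readings, horizon_min=10.0, jam_speed_kmh=15.0):
    """Extrapolate the trend horizon_min minutes past the last reading.

    Returns (predicted_speed, is_jam). Speeds are clamped at zero.
    """
    intercept, slope = fit_linear_trend(readings)
    t_future = readings[-1].minute + horizon_min
    predicted = max(0.0, intercept + slope * t_future)
    return predicted, predicted < jam_speed_kmh
```

For example, speeds of 40, 38, 36, 34, and 32 km/h over five consecutive minutes extrapolate to 12 km/h ten minutes later, which this sketch would flag as a jam. A real system would combine many road segments and data sources rather than extrapolating a single line.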

Dr. Hua added that comprehensive perception of city data is made possible by two main technologies. First, greater computing power from cloud computing, GPUs, and FPGAs allows massive volumes of data to be processed. For example, video feeds from thousands, tens of thousands, or even more roads can be processed simultaneously in real time. Second, deep learning algorithms have been critical to progress in computer vision.
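A back-of-the-envelope calculation shows why real-time processing at this scale depends on large amounts of compute. The figures below (10,000 cameras, 25 frames per second, per-frame costs of 80 ms and 4 ms) are hypothetical assumptions, not numbers from City Brain; the point is only how the arithmetic scales.

```python
import math


def total_frames_per_second(num_cameras: int, fps_per_camera: float) -> float:
    """Frames arriving per second across all cameras."""
    return num_cameras * fps_per_camera


def workers_needed(num_cameras: int, fps_per_camera: float, ms_per_frame: float) -> int:
    """Workers required to keep up in real time if each handles one frame at a time."""
    busy_seconds_per_second = total_frames_per_second(num_cameras, fps_per_camera) * ms_per_frame / 1000.0
    return math.ceil(busy_seconds_per_second)


# Hypothetical city: 10,000 cameras at 25 fps is 250,000 frames every second.
print(total_frames_per_second(10_000, 25))
# At 80 ms per frame this needs 20,000 workers; at 4 ms per frame, 1,000.
print(workers_needed(10_000, 25, 80.0), workers_needed(10_000, 25, 4.0))
```

Because the number of workers scales linearly with per-frame cost, a speedup in per-frame processing translates directly into proportionally less hardware for the same number of camera feeds.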

Dr. Hua's team has already made many algorithmic breakthroughs. On the server side, they are using better-optimized algorithms for more precise vehicle detection and license plate recognition. They can also monitor accidents in real time and predict traffic conditions. City Brain has been deployed and used in Hangzhou and Xiaoshan for quite some time.

"We can perform large-scale video processing, but both efficiency and stability pose major challenges. Over the better part of this year, ongoing iteration and optimization have increased the project's overall processing speed twentyfold."

From Perception to Search

Without a doubt, computer vision is both the most important and the most challenging aspect of the City Brain project. Dr. Hua stated that visual data is the core of heterogeneous city data because it is more comprehensive than other data, which is why the City Brain project invests the most time and energy in visual technology.

"In terms of coverage, GPS data has the edge over visual data, but GPS data is essentially cross-sectional data. Visual data is more comprehensive and can give us complete details of what is happening at any given intersection."

Beyond the fundamentals of visual perception and recognition, City Brain must also deal with structuring visual data, for example to make it searchable.

Just like Taobao's image search feature, City Brain must index images in real time. One of the major breakthroughs of this project is indexing and searching visual data feeds from cameras across a city.

According to Dr. Hua, from the technical perspective, the overall approach to city image search is similar to Taobao's image search feature. First you need to know where your target is and detect it. Then, you need to identify the vehicle, person, or other moving target and the target's properties. Finally, you need to extract a feature, a high-dimensional vector representing the essential characteristics of this target.
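The three steps above (detect the target, identify its category and properties, extract a feature vector) can be sketched as an index that filters candidates by category and attributes and then ranks them by cosine similarity of their feature vectors. This is a hypothetical, brute-force illustration: `Detection`, `VisualIndex`, and the attribute names are assumptions, the detector, attribute classifier, and feature extractor are assumed to run upstream, and a city-scale system would need an approximate nearest-neighbor index rather than a linear scan.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    camera_id: str
    timestamp: float
    category: str      # coarse class from an upstream detector, e.g. "vehicle" or "person"
    attributes: dict   # properties from an attribute classifier, e.g. {"color": "white"}
    feature: list      # high-dimensional embedding from a feature extractor


def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class VisualIndex:
    """Brute-force in-memory index over detections from many cameras."""

    def __init__(self):
        self._entries = []  # list of (unit-length feature, Detection) pairs

    def add(self, det):
        self._entries.append((l2_normalize(det.feature), det))

    def search(self, query_feature, category, attr_filters=None, top_k=5):
        """Return up to top_k (similarity, Detection) pairs, best match first."""
        attr_filters = attr_filters or {}
        q = l2_normalize(query_feature)
        results = []
        for feat, det in self._entries:
            # Category and attributes narrow the candidate set first.
            if det.category != category:
                continue
            if any(det.attributes.get(k) != v for k, v in attr_filters.items()):
                continue
            # The remaining candidates are ranked by feature similarity.
            score = sum(a * b for a, b in zip(q, feat))
            results.append((score, det))
        results.sort(key=lambda pair: pair[0], reverse=True)
        return results[:top_k]
```

Filtering on cheap attributes before comparing feature vectors keeps the expensive similarity computation to a smaller candidate set, which is the same reason product image search narrows by category first.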

However, city image searches are much more complex than product searches. From a shopper's point of view, different items of the same product are essentially identical, but cars of the same model owned by different people cannot be considered identical. Describing and searching for people is another major challenge, and it becomes even trickier when a person's face is not clearly visible. These are the real challenges that must be overcome in actual applications.
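The difference between product search and vehicle search can be shown in a few lines: a product-style match only requires the attributes to agree, while an instance-level (re-identification) match also requires the two feature vectors to be close. The model name `"sedan-X"`, the feature values, and the 0.9 similarity threshold are hypothetical; in practice the threshold would be tuned on labeled data.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_product(attrs_a, attrs_b):
    """Product-style match: same model and color is enough."""
    return attrs_a["model"] == attrs_b["model"] and attrs_a["color"] == attrs_b["color"]


def same_vehicle(attrs_a, feat_a, attrs_b, feat_b, threshold=0.9):
    """Instance-level match: attributes must agree AND the embeddings must be close."""
    return same_product(attrs_a, attrs_b) and cosine_similarity(feat_a, feat_b) >= threshold


sedan = {"model": "sedan-X", "color": "white"}  # hypothetical model name
car_1_cam_a = [0.90, 0.10, 0.40]
car_1_cam_b = [0.88, 0.12, 0.42]  # the same car seen by another camera
car_2_cam_a = [0.20, 0.90, 0.30]  # a different car of the same model and color

print(same_product(sedan, sedan))                              # True
print(same_vehicle(sedan, car_1_cam_a, sedan, car_2_cam_a))    # False
print(same_vehicle(sedan, car_1_cam_a, sedan, car_1_cam_b))    # True
```

Both cars pass the product-style check, but only the two sightings of the same car pass the instance-level check, which is why city search needs discriminative features rather than category labels alone.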

The iDST visual team is already at the forefront of the industry: its results on public benchmark datasets have greatly exceeded the best previously published results.

Commercialization of AI

With artificial intelligence development in full swing, the past few years have seen the emergence of many AI startups, both in China and abroad. Successful commercialization is the best measure of these companies' strength.

Dr. Hua believes that successful AI commercialization must meet five criteria:

First, competent algorithms serve as a foundation.

Second, related data must be available.

Third, there must be a large enough user base.

Fourth, there must be a platform with powerful computing capabilities and a sound system architecture (cloud computing has, of course, already lowered the barrier to entry for many startups).

Fifth, there must be a good business model.

At present, most artificial intelligence companies focus on visual applications, and it would be no exaggeration to call computer vision a "red ocean". Among the many artificial intelligence technologies, computer vision is undeniably the fastest to commercialize. Dr. Hua predicts five main trends in visual applications:

The first is transportation security, which is also a main focus of City Brain.

The second is rich media: using visual methods to find valuable information in large volumes of video or image data.

The third trend will be medical imaging. Although adoption of such technologies in the medical community may take longer, they will certainly be an important area in the future.

The fourth is industrial vision. In most scenarios, cameras will be able to replace manual visual inspection and judgment. This field remains to be further explored.

The fifth is terminal-based visual intelligence, including chips and some vision-based applications, which is quite promising.

It is not hard to see that these fields are exactly the R&D focuses of Alibaba Cloud's City Brain, Medical Brain, and Industrial Brain. However, the differences between them are also quite obvious. During the interview, Dr. Hua repeatedly stressed the importance of in-depth study of each industry. Artificial intelligence is gradually penetrating different industries and sectors, but to realize its full potential, in-depth research into specific application scenarios is as critical as the foundation laid by data and algorithms.

Below we have attached the transcript of our interview with Dr. Hua:

Yunqi: What are the limitations of deep learning when applied to computer vision? In the future, will it be superseded by new technologies?

Dr. Hua: In fact, there are many limitations. Deep learning looks wonderful, but many issues still need to be addressed. For example, facial recognition works very well on a small scale, and its results are passable for thousands of individuals, but scaling beyond that is very difficult. Video quality, resolution, and occlusion also limit recognition accuracy, and in these respects machines still cannot compete with humans. Deep learning is also highly reliant on data, so deep learning with small datasets needs further exploration.

In recent years, deep learning has been gaining momentum. However, in the future, there will surely be new technologies to challenge its position.
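One simple way to see why face recognition becomes harder as the number of enrolled people grows, offered here as an illustrative model rather than Dr. Hua's own analysis: if each comparison against a gallery of N people has an independent false-match probability p at a fixed threshold, the chance of at least one false match is 1 - (1 - p)^N. The independence assumption and the value p = 10^-6 are hypothetical.

```python
import math


def prob_any_false_match(gallery_size: int, false_match_rate: float) -> float:
    """P(at least one false match) = 1 - (1 - p)^N, assuming independent comparisons.

    log1p/expm1 keep the result numerically accurate when p is tiny.
    """
    return -math.expm1(gallery_size * math.log1p(-false_match_rate))


for n in (1_000, 100_000, 1_000_000):
    print(n, round(prob_any_false_match(n, 1e-6), 4))
```

Under these assumptions the probability is roughly 0.1% for a thousand people, about 9.5% for a hundred thousand, and about 63% for a million, so a matcher that looks nearly perfect on a small gallery produces frequent false matches at city scale.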

Yunqi: One of our papers, "Video to Shop: Matching Clothes in Videos to Online Shopping Images", was accepted at last month's CVPR. Can you talk about the innovative ideas behind this application?

Dr. Hua: This application uses cutting-edge clothing detection and tracking technologies. To address the challenges of multiple viewing angles, varied scenes, and occlusion when detecting the clothing worn by celebrities, we came up with a Reconfigurable Deep Tree structure. It relies on similarity matching across multiple frames to cope with occlusion, blur, and other problems in individual frames. This structure can be considered an extension of the existing attention model and can be used to solve the problem of multi-model fusion.
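The general idea of matching across multiple frames so that a few bad frames do not dominate can be illustrated with attention-style weighting. This is a generic, simplified sketch and not the Reconfigurable Deep Tree from the paper: it assumes each frame has already been scored against a shop image, and it uses the scores themselves as attention logits so that clear frames receive most of the weight and occluded or blurred frames contribute little. The temperature value is a hypothetical choice.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax; lower temperature sharpens the weights."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frame_similarities(frame_scores, temperature=0.1):
    """Attention-weighted average of per-frame similarity scores.

    As temperature grows this approaches the plain mean; as it shrinks it
    approaches the maximum score.
    """
    weights = softmax(frame_scores, temperature)
    return sum(w * s for w, s in zip(weights, frame_scores))


# Two clear frames and one heavily occluded frame of the same garment.
scores = [0.9, 0.85, 0.2]
print(sum(scores) / len(scores))        # plain mean, dragged down by the occluded frame
print(fuse_frame_similarities(scores))  # stays close to the clear frames' scores
```

A plain average would let one occluded frame pull a true match below the acceptance threshold; weighting frames by their evidence avoids that, which is the intuition behind attention-based fusion.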

Yunqi: In your opinion, what future changes can be predicted in the computer vision field?

Dr. Hua: It depends on which level you want to talk about. If we are talking about technology, I think the evolution of deep learning itself will be an important change. For example, GANs may be used in more scenarios. Large-scale video mining will be another important direction. At a higher level, if we look at the field from the perspective of intelligent applications, I think more in-depth research into specific industries will truly jump-start the commercialization of artificial intelligence, or more specifically visual intelligence. Only then will this technology realize its true impact and potential. Practice and exploration in this area will in turn promote the further development of visual technologies. Only by putting this technology into practice can we discover what challenges remain to be addressed. After all, real-world competition can be brutal.

Yunqi: What do you plan to share with attendees during this Computing Conference? Can you give us a preview of the topics you will discuss and tell us why you chose them?

Dr. Hua: I will introduce some of the applications of visual technology in various fields and the challenges they face, with special focus on the technologies and applications in the City Brain project. Our previous discussions only touched on the City Brain project; this time, I want to take a deeper dive, for example into the technical details of City Brain and how it delivers value.
