Alibaba Cloud Database group's Academic programs and Recruitment information

Author: belle.zhoux



We are hiring technical experts

Database Kernel Expert

Job Description:

The Alibaba Cloud ApsaraDB team is well known for its database products, including application databases, hybrid transaction/analytical processing (HTAP) databases, and big data databases. ApsaraDB not only eases the pain of database management but also focuses on improving open source database engines. The team is looking for self-motivated engineers with an international background and a deep understanding of database kernels and productivity. If you are a DB geek, why not join us and explore the beauty of databases?

Job requirement:

  • Excellent multiprocess and network programming skills in C++/Java/Python/Golang
  • Familiarity with the Linux kernel and performance tuning
  • Solid knowledge of distributed databases

Minimum qualifications:

  • Easygoing nature
  • English (full professional proficiency)
  • Ability to work with a team remotely

Preferred qualifications:

  • Experience in the design or core development of products or systems that handle high data volumes and high load
  • Open source project development experience
  • Familiarity with C/C++/Erlang proxy development, for example MaxScale, Spider, or nginx

Contact information:

  • LinkedIn: b7ce494d2405bf5cd162199a2e49a6f677a41fd2

Call for Research Project Proposals on Apsara Cloud Database

The Alibaba Cloud Database team would like to establish scientific and technical cooperation with universities and research institutes. Research proposals are invited in the following database areas: In-Memory DB, NVM Optimization, Self-Driving DB, Query Optimization Based on Machine Learning, Distributed Transaction, OLAP Computing Engine Optimization, and Time Series/Spatio-Temporal DB. Details of the seven categories are as follows:

Topic 1: In-memory Database


In-memory databases such as SAP HANA provide high performance and high throughput and have been widely deployed in performance-critical environments. However, their deployment cost depends heavily on local memory size, which limits how this kind of database can be used. With growing host memory sizes and the spread of emerging hardware such as 3D XPoint and AEP, servers with very large memory capacity (more than 1 TB) are becoming increasingly common. In-memory databases are therefore poised for rapid development.

Currently, most in-memory databases are still designed around traditional hardware such as SSDs and HDDs. Sub-modules such as logging, indexing, and caching have not been optimized for the access characteristics of memory. How to match access latencies on the order of ten nanoseconds and how to take advantage of byte addressability remain open problems.

In-memory databases such as Redis, MemSQL, and VoltDB also face many challenges, such as how to efficiently support data compression, data indexing and loading, and data persistence with large memory capacities or hybrid memory systems. Tackling these issues requires innovative data structures and storage mechanisms.

Related Research Topics

  1. New compression algorithms that improve memory space utilization while still supporting efficient database indexing.
  2. Optimization mechanisms for DRAM/NVM/SSD hybrid memory in key modules of in-memory databases, such as logging, hybrid storage allocation, and data layout management.
  3. Low-latency, scalable architectures for non-volatile storage media, and storage engines that support high throughput (ten million QPS or more).
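
As one illustration of compression that still supports indexing (research topic 1 above), the following minimal sketch uses order-preserving dictionary encoding: each distinct value is replaced by its position in a sorted dictionary, so range predicates can be evaluated on the small integer codes without decompressing. The function names and the sample column are hypothetical and chosen only for this example; this is a sketch of one known technique, not a description of any ApsaraDB implementation.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Return the sorted distinct values; a value's position is its code."""
    return sorted(set(values))


def encode(values, dictionary):
    """Replace each value with its integer code in the sorted dictionary."""
    return [bisect_left(dictionary, v) for v in values]


def range_scan(codes, dictionary, low, high):
    """Return row ids whose original value v satisfies low <= v <= high.

    Because codes preserve the order of the values, only integers
    are compared while scanning the compressed column.
    """
    lo_code = bisect_left(dictionary, low)
    hi_code = bisect_right(dictionary, high) - 1
    return [row for row, code in enumerate(codes) if lo_code <= code <= hi_code]


# Hypothetical sample column.
cities = ["hangzhou", "beijing", "shanghai", "beijing", "shenzhen", "hangzhou"]
dictionary = build_dictionary(cities)   # ['beijing', 'hangzhou', 'shanghai', 'shenzhen']
codes = encode(cities, dictionary)      # [1, 0, 2, 0, 3, 1]
matches = range_scan(codes, dictionary, "c", "shanghai")  # rows 0, 2 and 5
```

The trade-off in this sketch is that inserting a new distinct value can shift existing codes, which is one reason the topic calls for new algorithms rather than a fixed scheme.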

Topic 2: NVM-based Database Optimization


NVM (non-volatile memory) is persistent and byte-addressable, offers high storage density, and has write performance comparable to DRAM. As a new generation of storage media, NVM is driving a significant shift in computing architecture. Core database components such as the storage engine, the logging system, and index structures can be optimized to take advantage of these features.

Related Research Topics

  1. A DRAM/NVM/SSD hybrid storage engine for traditional relational databases that improves data access performance.
  2. Efficient NVM-based parallel logging mechanisms that provide higher logging throughput and shorter recovery time after a database crash.
  3. Index structures designed for various NVM media, such as 3D XPoint or MRAM, and suited to efficient CPU access patterns.
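
The idea behind parallel logging (research topic 2 above) can be sketched in a few lines: instead of one serialized log, records are spread over several partitions, and recovery merges them back into a single order by log sequence number (LSN). In this toy model, Python lists stand in for NVM log regions and appends are single-threaded; the class and function names are hypothetical, and a real design would also need persistence ordering and concurrency control that are out of scope here.

```python
import heapq
import zlib


class PartitionedLog:
    """Toy redo log split across several partitions.

    Plain lists stand in for NVM log regions (an assumption made
    for illustration only).
    """

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.next_lsn = 1

    def append(self, key, value):
        """Assign a global LSN and append to the partition chosen by key."""
        lsn = self.next_lsn
        self.next_lsn += 1
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((lsn, key, value))
        return lsn


def recover(log):
    """Rebuild key/value state by replaying all partitions in LSN order."""
    state = {}
    # Each partition is already sorted by LSN, so a k-way merge suffices.
    for _lsn, key, value in heapq.merge(*log.partitions):
        state[key] = value
    return state


log = PartitionedLog(num_partitions=4)
log.append("account:1", 100)
log.append("account:2", 250)
log.append("account:1", 80)   # later write to the same key wins on replay
recovered = recover(log)
```

Because all records for a key land in the same partition, each partition could in principle be replayed independently, which is where a shorter recovery time would come from.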

Topic 3: Self-driving Database


Efficient database operation and maintenance has long been regarded as a competitive system feature, especially today, when data volumes are growing exponentially. Because databases are huge, expose many system parameters, and serve complicated, ever-changing workloads, manual operation and maintenance becomes more difficult over time. Attention is therefore turning to the self-driving database, whose core idea is to use machine learning and artificial intelligence to enable automatic parameter tuning, self-diagnosis, and self-optimization. The goal of the self-driving database is not only to reduce the burden on DBAs but also to deliver better performance at lower cost.

However, databases have a very large number of parameters. For example, MySQL and PostgreSQL each have hundreds of adjustable options, and Oracle has even more. Worse, many of these parameters depend on and influence one another. Finding an appropriate configuration across such complex combinations is a significant challenge.

Related Research Topics

  1. Automatic parameter tuning. Choosing appropriate database parameters has always been a key task for a DBA. Adjusting parameters automatically with machine learning and artificial intelligence is a vital function of the self-driving database.
  2. Workload prediction, performance diagnosis, and optimization. One of the core tasks of database operation and maintenance is dynamic performance diagnosis and optimization. Effective solutions are needed to predict the workload, diagnose performance, and optimize it intelligently.
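
To make the tuning problem concrete, the sketch below searches a tiny configuration space in which two knobs interact through a shared memory budget, so neither can be tuned in isolation. Everything here is hypothetical: the knob names `buffer_pool_mb` and `worker_threads`, the budget, and the `simulated_throughput` function, which stands in for running a benchmark against a real database. The search is a plain exhaustive grid search used as a baseline; the machine learning approaches the topic asks for would replace it with a learned model that needs far fewer benchmark runs.

```python
from itertools import product


def simulated_throughput(buffer_pool_mb, worker_threads):
    """Hypothetical stand-in for benchmarking one configuration."""
    memory_budget_mb = 4096
    per_thread_mb = 64
    used_mb = buffer_pool_mb + worker_threads * per_thread_mb
    if used_mb > memory_budget_mb:
        return 0.0  # configuration would exhaust memory
    cache_gain = buffer_pool_mb ** 0.5      # diminishing returns from caching
    parallel_gain = min(worker_threads, 16)  # assume 16 usable cores
    return cache_gain * parallel_gain


def grid_search(space, objective):
    """Evaluate every combination and return the best (config, score)."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for combo in product(*(space[name] for name in names)):
        config = dict(zip(names, combo))
        score = objective(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


space = {
    "buffer_pool_mb": [512, 1024, 2048, 3072, 4096],
    "worker_threads": [4, 8, 16, 32],
}
best_config, best_score = grid_search(space, simulated_throughput)
# best_config is {'buffer_pool_mb': 3072, 'worker_threads': 16}
```

Note that the largest buffer pool is not the winner: it leaves no memory for worker threads. With hundreds of real parameters the grid grows exponentially, which is why exhaustive search does not scale.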

Topic 4: Query Optimization Based on Machine Learning


Query optimization is known to be computationally hard; join ordering, for example, is NP-hard. Traditional query optimizers are limited by their statistical methods, query cost models, and optimization algorithms. When faced with data skew and highly correlated data, they often fail to choose the best query plan, which degrades query performance. A key requirement is to use machine learning as a basis and input for query scheduling and resource management, yielding more accurate cost models and better query plans. The aim is to break through the bottlenecks and restrictions of traditional SQL optimization technology and improve database query performance and system resource utilization.

Related Research Topics

  1. Self-learning cost models for query optimizers (covering both homogeneous and heterogeneous computing) that improve the accuracy of query cost estimates and resource predictions at the level of query processing operators and workloads.
  2. More accurate and intelligent query planning that improves the performance of queries previously hurt by suboptimal plans, in both practical scenarios and standard benchmarks.
  3. Machine-learning-based performance and resource prediction that improves prediction accuracy in both practical scenarios and standard benchmarks.
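
A self-learning cost model can be illustrated in its simplest form: fit a per-operator model from observed executions, then pick the plan with the lowest predicted cost. The sketch below uses ordinary least squares on a single feature (rows returned); the operator names and timing observations are invented for illustration, and a practical model would use many more features and a richer learner.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations: (rows returned, elapsed milliseconds).
observations = {
    "index_scan": ([100, 1000, 10000], [2.0, 11.0, 101.0]),
    "full_scan": ([100, 1000, 10000], [50.1, 51.0, 60.0]),
}

# One learned cost model per physical operator.
models = {op: fit_linear(xs, ys) for op, (xs, ys) in observations.items()}


def predict_ms(operator, rows):
    """Predicted latency of an operator for an estimated row count."""
    slope, intercept = models[operator]
    return slope * rows + intercept


def choose_plan(estimated_rows):
    """Pick the operator with the lowest predicted latency."""
    return min(models, key=lambda op: predict_ms(op, estimated_rows))


selective_plan = choose_plan(500)      # few rows: the index scan is cheaper
broad_plan = choose_plan(20000)        # many rows: the full scan is cheaper
```

The choice still depends on the row estimate fed into `choose_plan`, which is exactly where data skew and correlation hurt traditional optimizers.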

Topic 5: Distributed Transaction and Query


In an era when big data drives industry growth, enterprises often adopt a variety of database products to support online transactions, report generation, log storage, and offline analysis for their fast-growing businesses. HTAP databases were born in this environment. They support both OLTP and OLAP in a hybrid form, meeting the requirements of most enterprise-level applications and solving customers' business problems with a one-stop solution.

For OLTP, most current systems use an independent central node to coordinate distributed transactions, which constrains their performance and scalability. Spanner provides distributed transaction consistency using specialized hardware (GPS plus atomic clocks), which few companies can afford. A more practical, high-performance, and scalable distributed transaction solution is needed.

For OLAP, distributed SQL queries are an important way for databases to handle huge amounts of data. The quality of the query plan, which depends on several factors such as host hardware, network throughput, and data layout, can have a great impact on the response latency of a distributed SQL query. Producing an optimized distributed query plan under these dynamically changing factors is a significant challenge.
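
A small example shows how data sizes and network throughput feed into a distributed plan choice. The sketch below compares two common ways to run a distributed join, broadcasting the smaller table or hash-shuffling both, using a deliberately simplified cost model that counts only bytes sent over the network. The formulas assume uniform hashing and evenly spread data, and all function names and figures are hypothetical.

```python
def broadcast_bytes(small_bytes, nodes):
    """Each node must end up with a full copy of the smaller table."""
    return small_bytes * (nodes - 1)


def shuffle_bytes(left_bytes, right_bytes, nodes):
    """Both tables are hash-partitioned on the join key.

    With uniform hashing, a fraction (nodes - 1) / nodes of each
    table moves to a different node.
    """
    return (left_bytes + right_bytes) * (nodes - 1) / nodes


def choose_join_strategy(left_bytes, right_bytes, nodes, network_bytes_per_sec):
    """Return (strategy, estimated transfer seconds) with the least traffic."""
    options = {
        "broadcast": broadcast_bytes(min(left_bytes, right_bytes), nodes),
        "shuffle": shuffle_bytes(left_bytes, right_bytes, nodes),
    }
    strategy = min(options, key=options.get)
    return strategy, options[strategy] / network_bytes_per_sec


GB = 10 ** 9
MB = 10 ** 6

# Large fact table joined with a small dimension table on 8 nodes.
small_side = choose_join_strategy(100 * GB, 10 * MB, nodes=8, network_bytes_per_sec=GB)
# Two large tables: copying either one to every node is too expensive.
large_sides = choose_join_strategy(100 * GB, 80 * GB, nodes=8, network_bytes_per_sec=GB)
```

The first call selects the broadcast strategy and the second selects the shuffle. A real optimizer would also weigh CPU, memory, and existing data placement, and would revisit the choice as those factors change.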

Related Research Topics

  1. A decentralized distributed transaction system prototype that delivers high performance, with up to millions of QPS, hundred-microsecond latency, and near-linear horizontal scalability. It should adapt automatically to transactions of various sizes and recover quickly from crashes in large-scale transaction processing.
  2. Novel distributed query optimizers that precisely and effectively account for the resources of the whole distributed database and generate high-quality query plans dynamically and quickly.

Topic 6: OLAP Computing Engine


To cope with current and future rapidly growing data, the new generation of OLAP computing engines faces pressing requirements and greater challenges: how to process PB-scale and even EB-scale data efficiently, how to provide real-time analysis of huge amounts of unstructured data, and how to support computing models more complex than traditional SQL.

In big data OLAP analysis, column storage provides better I/O efficiency and data compression and has become the dominant storage mode. However, it remains a challenge for databases to provide both a hybrid row-column storage model and hot/cold data separation policies in the same system to meet the requirements of OLAP scenarios.
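
The compression advantage of column storage can be seen with a toy example. When a table is stored column by column, similar values sit next to each other, so even simple run-length encoding (RLE) collapses them into a few (value, count) pairs. The table contents below are hypothetical, and RLE is only one of several encodings a columnar engine might use.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical row-oriented table: (date, region, amount).
rows = [
    ("2024-01-01", "cn", 3),
    ("2024-01-01", "cn", 5),
    ("2024-01-01", "us", 2),
    ("2024-01-02", "us", 7),
]

# Pivot to column-oriented layout: one sequence per attribute.
dates, regions, amounts = zip(*rows)

date_runs = rle_encode(dates)      # [('2024-01-01', 3), ('2024-01-02', 1)]
region_runs = rle_encode(regions)  # [('cn', 2), ('us', 2)]
```

An analytical query that touches only `dates` also reads just that column, which is the I/O benefit; the cost is that reassembling a full row requires reading every column, one motivation for hybrid row-column layouts.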

Related Research Topics

  1. Storage mechanisms for OLAP. Exploring techniques and algorithms for column storage compression and hybrid row-column storage.
  2. Vectorization analysis. To handle unstructured data such as videos, images, and voice stored in databases, the OLAP engine needs to support real-time online indexing of unstructured data, using vectorization analysis and indexing enabled by built-in machine learning algorithms.
  3. Code generation capability. The OLAP computing engine needs to improve its analysis performance by introducing novel compilation and execution technology.

Topic 7: Time Series/Spatio-temporal Database


With the rapid development of the Internet, IoT and edge computing, and surveying and social sensing, time series and spatio-temporal data will grow enormously, and the cross-media linkage of information will become increasingly complicated. New types of data such as 3D scene data, spatio-temporal trajectory data, IoT sensing data (time + location + value), spatio-temporal media data, and complex relational network data will be used in various industries.

New businesses such as IoT, e-commerce and new retail, shared mobility, autonomous driving, intelligent logistics, and intelligent transportation will make time series, spatio-temporal, and graph computing ubiquitous. The time series, spatio-temporal, and graph computing capabilities of databases will therefore become a core requirement for supporting these emerging industries and a key driver of the cloud computing business.

Related Research Topics

  1. Real-time unstructured data processing. Combined with a stream computing system, we need to establish a framework for real-time ingestion, efficient compressed storage, and analysis of large-scale time series, location, and graph data, with write rates of up to tens of millions of sequential data points per second.
  2. Graph modeling and applications based on spatio-temporal constraints. In conjunction with applications in related fields, we need to design and build a graph model with dynamic temporal and spatial semantic constraints that supports data compression and fast path/relationship search and analysis at large scale.
  3. Efficient hierarchical multidimensional indexes. Design and optimize temporal indexes, spatial indexes, graph indexes, and their combinations; taking into account new time series and graph data models and query features, study and implement pre-aggregated indexes, correlation indexes, and approximate query indexes.
  4. Hardware and software acceleration for graphics and images. Study and implement graphics and image query processing operators based on hardware acceleration and algorithm optimization, with the goal of improving performance by more than one order of magnitude.
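
For the efficient compressed storage of time series mentioned in research topic 1 above, one widely used idea is delta-of-delta encoding of timestamps. Sensors usually report at near-regular intervals, so the change in the interval between consecutive points is almost always zero or very small and can be stored in a few bits. The sketch below shows only the integer transform, not the bit packing, and the sample timestamps are hypothetical.

```python
def delta_of_delta_encode(timestamps):
    """Encode timestamps as: first value, then changes in the interval."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    previous, previous_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - previous
        encoded.append(delta - previous_delta)
        previous, previous_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for change in encoded[1:]:
        delta += change
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Hypothetical sensor readings every 10 seconds, with one late arrival.
timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = delta_of_delta_encode(timestamps)   # [1000, 10, 0, 0, 1, -1]
restored = delta_of_delta_decode(encoded)
```

After the first two entries, the encoded stream consists of values near zero, which a variable-length bit encoding can store far more compactly than full timestamps.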

