Alibaba Cloud Database Group's Academic Programs and Recruitment Information

We are hiring technical experts

Database Kernel Expert

Job Description:

The Alibaba Cloud ApsaraDB team is well known for its database products, including application databases, hybrid transactional/analytical processing (HTAP) databases, and big data databases. ApsaraDB not only eases the pain of database management but also focuses on improving open source database engines. The team is looking for self-motivated engineers with an international background and a deep understanding of database kernels and productivity. If you are a DB geek, why not join us and explore the beauty of databases?

Job requirement:

  • Excellent multi-process and network programming skills in C++/Java/Python/Golang
  • Familiarity with the Linux kernel and performance tuning
  • Solid knowledge of distributed databases

Minimum qualifications:

  • Easygoing nature
  • English (full professional proficiency)
  • Ability to work remotely with a team

Preferred qualifications:

  • Experience in the design or core development of high-data-volume, high-load products/systems
  • Open source project development experience
  • Familiarity with C/C++/Erlang proxy development, such as MaxScale, Spider, and Nginx


Contact information:
  • Email: sibo.zsb@alibaba-inc.com
  • LinkedIn: b7ce494d2405bf5cd162199a2e49a6f677a41fd2





Alibaba Cloud Database Team Invites Research Project Proposals on Apsara Cloud Database


The Alibaba Cloud Database team seeks to establish scientific and technical cooperation with universities and research institutes. Research proposals are invited in the following database areas: In-Memory DB, NVM Optimization, Self-Driving DB, Query Optimization Based on Machine Learning, Distributed Transaction, OLAP Computing Engine Optimization, and Time Series/Spatio-temporal DB. Details of the seven categories follow.

Topic 1: In-memory Database

Background

In-memory databases such as SAP HANA provide high performance and high throughput and have been widely deployed in performance-critical environments. However, their deployment cost depends heavily on local memory size, which limits how this kind of database can be used. With growing host memory sizes and the wide adoption of emerging hardware such as 3D XPoint and AEP, servers with very large memory capacity (more than 1 TB) are becoming increasingly common. In-memory databases are therefore poised for rapid development.

Currently, most in-memory databases are still designed around the assumptions of traditional storage hardware such as SSDs and HDDs. For instance, sub-modules such as logging, indexing, and caching have not been optimized for the access characteristics of memory. How to match access latencies on the order of tens of nanoseconds and how to take advantage of byte addressability are problems still to be solved.

In-memory databases such as Redis, MemSQL, and VoltDB also face many challenges, such as how to efficiently support data compression, data indexing and loading, and data persistence with large memory capacity or in hybrid memory systems. Tackling these issues requires innovative data structures and storage mechanisms.

Related Research Topics

  1. New compression algorithms that improve memory space utilization while also supporting efficient database indexing.
  2. DRAM/NVM/SSD hybrid-memory optimization mechanisms for key modules of in-memory databases, such as logging, hybrid storage allocation, and data layout management.
  3. Low-latency, scalable architectures for non-volatile storage media, and storage engines that support high throughput (ten million QPS or more).
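One long-established way to compress a column while keeping it searchable is order-preserving dictionary encoding: values are replaced by small integer codes whose order matches the value order, so a range predicate can be evaluated directly on the codes without decompressing. The Python sketch below is illustrative only; the class and function names are hypothetical, and it shows an existing technique rather than the new algorithms this topic calls for.

```python
from bisect import bisect_left, bisect_right


class OrderPreservingDictionary:
    """Encode a column as integer codes whose order matches the value order."""

    def __init__(self, values):
        # Sorted distinct values form the dictionary; a value's code is its index.
        self.dictionary = sorted(set(values))
        self.code_of = {v: i for i, v in enumerate(self.dictionary)}

    def encode(self, values):
        return [self.code_of[v] for v in values]

    def decode(self, codes):
        return [self.dictionary[c] for c in codes]

    def code_range(self, low, high):
        # Translate the value predicate low <= v <= high into the half-open
        # code range [lo_code, hi_code) by binary search over the dictionary.
        return bisect_left(self.dictionary, low), bisect_right(self.dictionary, high)


def range_scan(codes, code_lo, code_hi):
    """Return row ids whose code lies in [code_lo, code_hi), decoding nothing."""
    return [row for row, c in enumerate(codes) if code_lo <= c < code_hi]


column = ["beijing", "hangzhou", "shanghai", "hangzhou", "beijing", "shenzhen"]
d = OrderPreservingDictionary(column)
codes = d.encode(column)
lo, hi = d.code_range("hangzhou", "shanghai")
print(codes)                      # [0, 1, 2, 1, 0, 3]
print(range_scan(codes, lo, hi))  # [1, 2, 3]
```

Because codes are dense integers, they can also be bit-packed, and an index built over codes is smaller than one built over the original strings.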


Topic 2: NVM-based Database Optimization

Background

NVM (non-volatile memory) is persistent and byte-addressable, offers high storage density, and has write performance comparable to DRAM. As a new generation of storage media, NVM is driving a significant change in computing architecture. Core database components such as the storage engine, logging system, and index structures can be optimized to take advantage of these features.

Related Research Topics

  1. DRAM/NVM/SSD hybrid storage engines for traditional relational databases that improve data access performance.
  2. Efficient NVM-based parallel logging mechanisms that provide higher logging throughput and shorter recovery time after a database crash.
  3. Index structures designed for various NVM media, such as 3D XPoint or MRAM, and suited to efficient CPU access patterns.



Topic 3: Self-driving Database

Background

Efficient database operation and maintenance has long been regarded as a competitive system feature, especially today, when the amount of data is growing exponentially. Because databases are enormous in scale, expose a great many system parameters, and face complex, ever-changing workloads, manual operation and maintenance becomes more and more difficult over time. Attention is therefore turning to the self-driving database, whose core idea is to use machine learning and artificial intelligence to enable automatic parameter tuning, self-diagnosis, and self-optimization. The goal of the self-driving database is not only to offload the burden on DBAs but also to deliver better performance at lower cost.

However, databases have a very large number of parameters. For example, MySQL and PostgreSQL each have hundreds of tunable options, and Oracle has even more. Worse, many of these parameters depend on and influence one another. Choosing an appropriate configuration across such complex combinations of factors is a significant challenge.

Related Research Topics

  1. Automatic parameter tuning. Choosing appropriate database parameters has always been a key task for a DBA. Tuning parameters automatically with machine learning and artificial intelligence is a vital function of the self-driving database.
  2. Workload prediction, performance diagnosis, and optimization. A core task of database operation and maintenance is dynamic performance diagnosis and optimization. Effective solutions are needed to predict workloads, diagnose performance problems, and optimize intelligently.
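As a baseline illustration of automatic parameter tuning, the Python sketch below runs random search over two knobs against a synthetic cost function. The knob names `buffer_pool_mb` and `io_threads` and the function `synthetic_latency` are hypothetical stand-ins for real database parameters and a real benchmark run; the coupling term mirrors the point that parameters influence each other. Machine-learning-based tuners replace uniform sampling with a model that chooses which configuration to try next.

```python
import random


def synthetic_latency(buffer_pool_mb, io_threads):
    """Hypothetical stand-in for a benchmark run (lower is better).

    The second term couples the knobs: the best io_threads value depends on
    buffer_pool_mb, so the two cannot be tuned independently.
    """
    return (buffer_pool_mb - 4096) ** 2 / 1e6 + (io_threads - buffer_pool_mb / 512) ** 2


def random_search(objective, space, trials, seed=0):
    """Sample configurations uniformly from `space` and keep the best one seen."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        val = objective(**cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val


space = {
    "buffer_pool_mb": [1024, 2048, 4096, 8192],
    "io_threads": [2, 4, 8, 16],
}
best_cfg, best_val = random_search(synthetic_latency, space, trials=500)
print(best_cfg, best_val)
```

Because the best `io_threads` value shifts with `buffer_pool_mb`, tuning one knob at a time while the other sits at a poor value can settle on a suboptimal configuration; searching the joint space avoids that at the cost of more trials.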


Topic 4: Query Optimization Based on Machine Learning

Background

Query optimization has been proven to be NP-complete. Traditional query optimizers are constrained by their statistics, cost models, and search algorithms, so when faced with data skew and highly correlated data they may fail to choose the best query plan, and query performance deteriorates. A key requirement is to use machine learning as an important basis and input for query scheduling and resource management, yielding more accurate cost models and better query plans. The aim is to break through the bottlenecks and limitations of traditional SQL optimization and improve both database query performance and system resource utilization.
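To give a sense of why exhaustive plan search quickly becomes infeasible, the Python sketch below counts candidate join orders for n relations using the standard formulas: n! left-deep trees, and (2n-2)!/(n-1)! bushy trees when left and right inputs are distinguished and cross products are allowed. It only measures the size of the search space; it is not a proof of NP-completeness and not part of any particular optimizer.

```python
from math import factorial


def left_deep_plans(n):
    """Left-deep join trees over n relations: one per permutation of the inputs."""
    return factorial(n)


def bushy_plans(n):
    """Bushy join trees over n relations (ordered children, cross products allowed).

    Catalan(n - 1) tree shapes times n! leaf orders simplifies to (2n - 2)! / (n - 1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (3, 5, 10):
    print(n, left_deep_plans(n), bushy_plans(n))
# 3 6 12
# 5 120 1680
# 10 3628800 17643225600
```

Even at ten tables the bushy space exceeds ten billion plans, which is why practical optimizers rely on pruning, dynamic programming over subsets, or heuristics, and why estimation errors in the cost model can steer them to a poor plan.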

Related Research Topics

  1. Self-learning cost models for query optimizers (covering both homogeneous and heterogeneous computing) that improve the accuracy of cost and resource estimates for query processing operators and workloads.
  2. More accurate and intelligent query plans that improve the performance of queries previously slowed by suboptimal plans, in both practical scenarios and standard benchmarks.
  3. Machine-learning-based performance and resource prediction that improves prediction accuracy in practical scenarios and standard benchmarks.


Topic 5: Distributed Transaction and Query

Background

In an era when big data drives the development of industries, enterprises often adopt various database products for online transactions, report generation, log storage, and offline analysis to support the rapid growth of their business. HTAP databases were born in this environment. They support both OLTP and OLAP in a hybrid form, meeting the requirements of most enterprise-level applications and solving customers' business problems with a one-stop solution.

For OLTP, most current systems use an independent central node to coordinate distributed transactions, which limits their performance and scalability. Spanner provides distributed transaction consistency using special hardware (GPS receivers plus atomic clocks), which few companies can afford. A more practical, high-performance, and scalable distributed transaction solution is needed.
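Ordering transactions without a central timestamp node or atomic clocks is one piece of this problem. A published technique for it is the hybrid logical clock (HLC), which pairs each node's physical clock with a logical counter so that timestamps respect causality even when clocks drift. The Python sketch below is a minimal HLC, assuming integer physical clocks that can be injected for testing; it is not Alibaba's design or Spanner's approach, and it leaves out commit protocols, clock-uncertainty handling, and persistence.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    wall: int     # largest physical time observed so far
    logical: int  # counter that orders events sharing the same wall value


class HybridLogicalClock:
    """Minimal hybrid logical clock (sketch; no persistence or drift bounds)."""

    def __init__(self, physical_clock=time.time_ns):
        self.physical_clock = physical_clock  # injectable for testing
        self.last = Timestamp(0, 0)

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        pt = self.physical_clock()
        if pt > self.last.wall:
            self.last = Timestamp(pt, 0)
        else:
            # Physical time has not passed what we have seen: bump the counter.
            self.last = Timestamp(self.last.wall, self.last.logical + 1)
        return self.last

    def update(self, remote):
        """Merge a timestamp received from another node; the result exceeds both."""
        pt = self.physical_clock()
        wall = max(self.last.wall, remote.wall, pt)
        if wall == self.last.wall == remote.wall:
            logical = max(self.last.logical, remote.logical) + 1
        elif wall == self.last.wall:
            logical = self.last.logical + 1
        elif wall == remote.wall:
            logical = remote.logical + 1
        else:
            logical = 0  # local physical clock is ahead of both
        self.last = Timestamp(wall, logical)
        return self.last


# Usage: node B's physical clock lags node A's (fixed fake clocks for clarity).
a = HybridLogicalClock(physical_clock=lambda: 100)
b = HybridLogicalClock(physical_clock=lambda: 95)
send = a.tick()        # Timestamp(wall=100, logical=0)
recv = b.update(send)  # Timestamp(wall=100, logical=1), ordered after the send
assert recv > send
```

Each node only needs its own clock and the timestamps carried on messages, so no single node sits on the critical path of every transaction; the trade-off is that commit and read protocols must still tolerate bounded clock skew.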

For OLAP, distributed SQL query processing is an important means for databases to handle huge amounts of data. The quality of the query plan, which depends on factors such as host hardware, network throughput, and data layout, can greatly affect the response latency of a distributed SQL query. Producing an optimized distributed query plan as these factors change dynamically is a significant challenge.

Related Research Topics

  1. A decentralized distributed transaction system prototype that delivers high performance: up to millions of QPS, latency of hundreds of microseconds, and near-linear horizontal scalability. It should adapt to transactions of various sizes and recover quickly from crashes during large-scale transaction processing.
  2. Novel distributed query optimizers that precisely and effectively combine the resources of the whole distributed database to produce high-quality query plans dynamically and quickly.


Topic 6: OLAP Computing Engine

Background

To cope with rapidly growing data now and in the future, the new generation of OLAP computing engines faces pressing requirements and greater challenges: processing PB- or even EB-scale data efficiently, providing real-time analysis of huge amounts of unstructured data, and supporting computing models more complex than traditional SQL.

In big data OLAP analysis, column storage provides better I/O efficiency and data compression and has become the primary storage mode. However, it remains a challenge for a database to provide both a hybrid row-column storage model and hot/cold data separation policies within the same system to meet the requirements of OLAP scenarios.
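To illustrate why column storage compresses well, the Python sketch below applies run-length encoding (one of several schemes used by columnar engines, alongside dictionary encoding and bit-packing) to a small hypothetical table laid out row by row versus column by column.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


# A tiny hypothetical table: (order_date, region, amount).
rows = [
    ("2019-06-01", "east", 12),
    ("2019-06-01", "east", 7),
    ("2019-06-01", "west", 30),
    ("2019-06-02", "west", 5),
    ("2019-06-02", "west", 9),
]

# Row layout interleaves the fields, so neighboring values rarely repeat.
row_layout = [field for row in rows for field in row]
# Column layout stores each field contiguously, so repeated values sit side by side.
date_col = [r[0] for r in rows]
region_col = [r[1] for r in rows]

print(len(run_length_encode(row_layout)))  # 15 runs: nothing collapses
print(run_length_encode(date_col))         # [('2019-06-01', 3), ('2019-06-02', 2)]
print(run_length_encode(region_col))       # [('east', 2), ('west', 3)]
```

The column layout also helps I/O: a query that aggregates only `region` reads that column alone instead of every field of every row.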

Related Research Topics

  1. Storage mechanisms for OLAP. Exploring techniques and algorithms for column storage compression and hybrid row-column storage.
  2. Vectorization analysis. To handle unstructured data such as video, images, and audio stored in databases, the OLAP engine needs to support real-time online indexing of unstructured data through vectorization analysis and indexing enabled by built-in machine learning algorithms.
  3. Code generation. The OLAP computing engine needs to improve its analysis performance by introducing novel compilation and execution techniques.


Topic 7: Time Series/Spatio-temporal Database

Background

With the rapid development of the Internet, IoT and edge computing, surveying, and social sensing, time series and spatio-temporal data will become enormously rich, and the cross-media links between pieces of information will grow increasingly complex. New types of data such as 3D scene data, spatio-temporal trajectory data, IoT sensing data (time + location + value), spatio-temporal media data, and complex relational network data will be used across many industries.

New businesses such as IoT, e-commerce/new retail, ride sharing, autonomous driving, intelligent logistics, and intelligent transportation will make time series computing, spatio-temporal computing, and graph computing ubiquitous. Therefore, time series, spatio-temporal, and graph computing capabilities will become a core database requirement for supporting these emerging industries and a key driver of the cloud computing business.

Related Research Topics

  1. Real-time unstructured data processing. Combined with a stream computing system, we need to build a framework for real-time ingestion, efficient compressed storage, and analysis of large-scale time series, location, and graph data that can write tens of millions of time series data points per second.
  2. Graph modeling and applications based on spatio-temporal constraints. In conjunction with applications in related fields, we need to design and build a graph model with dynamic spatio-temporal semantic constraints that supports data compression and fast path/relationship search and analysis at large scale.
  3. Hierarchical, multidimensional, efficient indexes. Design and optimize temporal indexes, spatial indexes, graph indexes, and their combinations; building on new time series and graph data models and their query characteristics, study and implement pre-aggregated indexes, correlation indexes, and approximate query indexes.
  4. Hardware and software acceleration for graphics and images. Study and implement graphics and image query processing operators based on hardware acceleration and algorithm optimization, improving performance by more than an order of magnitude.
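For the efficient compressed storage of time series data in the first research topic above, one well-known technique for timestamps is delta-of-delta encoding, used for example in Facebook's Gorilla time series database: regularly sampled points produce delta-of-deltas that are mostly zero and can then be bit-packed very compactly. The Python sketch below shows only the encoding and decoding arithmetic, not the variable-length bit packing; the function names are hypothetical.

```python
def delta_of_delta_encode(timestamps):
    """Return (first timestamp, first delta, list of delta-of-deltas)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    first = timestamps[0]
    first_delta = timestamps[1] - timestamps[0]
    dods = []
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        dods.append(delta - prev_delta)  # near zero for regular sampling intervals
        prev_delta = delta
    return first, first_delta, dods


def delta_of_delta_decode(first, first_delta, dods):
    """Rebuild the timestamps by re-accumulating deltas, then values."""
    out = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out


ts = [1000, 1010, 1020, 1030, 1041, 1050]  # one sample jitters by one tick
first, d0, dods = delta_of_delta_encode(ts)
print(first, d0, dods)  # 1000 10 [0, 0, 1, -2]
assert delta_of_delta_decode(first, d0, dods) == ts
```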



