Groonga开源搜索引擎——列存储做聚合,没有内建分布式,分片和副本是随mysql或者postgreSQL作为存储引擎由MySQL自身来做分片和副本的

本文涉及的产品
RDS AI 助手,专业版
RDS MySQL DuckDB 分析主实例,集群系列 4核8GB
简介:

1. Characteristics of Groonga

ppt:http://mroonga.org/publication/presentation/groonga-mysqluc2011.pdf

1.1. Groonga overview

Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of Groonga is that a newly registered document instantly appears in search results. Also, Groonga allows updates without read locks. These characteristics result in superior performance on real-time applications.

Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, Groonga can cover weakness of row-oriented systems.

The basic functions of Groonga are provided in a C library. Also, libraries for using Groonga in other languages, such as Ruby, are provided by related projects. In addition, groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and storage engines allow any application to use Groonga. See usage examples.

1.2. Full text search and Instant update

In widely used DBMSs, updates are immediately processed, for example, a newly registered record appears in the result of the next query. In contrast, some full text search engines do not support instant updates, because it is difficult to dynamically update inverted indexes, the underlying data structure.

Groonga also uses inverted indexes but supports instant updates. In addition, Groonga allows you to search documents even when updating the document collection. Due to these superior characteristics, Groonga is very flexible as a full text search engine. Also, Groonga always shows good performance because it divides a large task, inverted index merging, into smaller tasks.

1.3. Column store and aggregate query

People can collect more than enough data in the Internet era. However, it is difficult to extract informative knowledge from a large database, and such a task requires a many-sided analysis through trial and error. For example, search refinement by date, time and location may reveal hidden patterns. Aggregate queries are useful to perform this kind of tasks.

An aggregate query groups search results by specified column values and then counts the number of records in each group. For example, an aggregate query in which a location column is specified counts the number of records per location. Making a graph from the result of an aggregate query against a date column is an easy way to visualize changes over time. Also, a combination of refinement by location and an aggregate query against a date column allows visualization of changes over time in specific location. Thus refinement and aggregation are important to perform data mining.

A column-oriented architecture allows Groonga to efficiently process aggregate queries because a column-oriented database, which stores records by column, allows an aggregate query to access only a specified column. On the other hand, an aggregate query on a row-oriented database, which stores records by row, has to access neighbor columns, even though those columns are not required.

1.4. Inverted index and tokenizer

An inverted index is a traditional data structure used for large-scale full text search. A search engine based on inverted index extracts index terms from a document when it is added. Then in retrieval, a query is divided into index terms to find documents containing those index terms. In this way, index terms play an important role in full text search and thus the way of extracting index terms is a key to a better search engine.

A tokenizer is a module to extract index terms. A Japanese full text search engine commonly uses a word-based tokenizer (hereafter referred to as a word tokenizer) and/or a character-based n-gram tokenizer (hereafter referred to as an n-gram tokenizer). A word tokenizer-based search engine is superior in time, space and precision, which is the fraction of relevant documents in a search result. On the other hand, an n-gram tokenizer-based search engine is superior in recall, which is the fraction of retrieved documents in the perfect search result. The best choice depends on the application in practice.

Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses spaces as word delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by default. In addition, a yet another built-in word tokenizer is available if MeCab, a part-of-speech and morphological analyzer, is embedded. Note that a tokenizer is pluggable and you can develop your own tokenizer, such as a tokenizer based on another part-of-speech tagger or a named-entity recognizer.

1.5. Sharable storage and read lock-free

Multi-core processors are mainstream today and the number of cores per processor is increasing. In order to exploit multiple cores, executing multiple queries in parallel or dividing a query into sub-queries for parallel processing is becoming more important.

A database of Groonga can be shared with multiple threads/processes. Also, multiple threads/processes can execute read queries in parallel even when another thread/process is executing an update query because Groonga uses read lock-free data structures. This feature is suited to a real-time application that needs to update a database while executing read queries. In addition, Groonga allows you to build flexible systems. For example, a database can receive read queries through the built-in HTTP server of Groonga while accepting update queries through MySQL.

1.6. Geo-location (latitude and longitude) search

Location services are getting more convenient because of mobile devices with GPS. For example, if you are going to have lunch or dinner at a nearby restaurant, a local search service for restaurants may be very useful, and for such services, fast geo-location search is becoming more important.

Groonga provides inverted index-based fast geo-location search, which supports a query to find points in a rectangle or circle. Groonga gives high priority to points near the center of an area. Also, Groonga supports distance measurement and you can sort points by distance from any point.

1.7. Groonga library

The basic functions of Groonga are provided in a C library and any application can use Groonga as a full text search engine or a column-oriented database. Also, libraries for languages other than C/C++, such as Ruby, are provided in related projects. See related projects for details.

1.8. Groonga server

Groonga provides a built-in server command which supports HTTP, the memcached binary protocol and the Groonga Query Transfer Protocol (GQTP). Also, a Groonga server supports query caching, which significantly reduces response time for repeated read queries. Using this command, Groonga is available even on a server that does not allow you to install new libraries.

1.9. Mroonga storage engine

Groonga works not only as an independent column-oriented DBMS but also as storage engines of well-known DBMSs. For example, Mroonga is a MySQL pluggable storage engine using Groonga. By using Mroonga, you can use Groonga for column-oriented storage and full text search. A combination of a built-in storage engine, MyISAM or InnoDB, and a Groonga-based full text search engine is also available. All the combinations have good and bad points and the best one depends on the application. See related projects for details.

 

转自:http://groonga.org/docs/characteristic.html















本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/6800785.html,如需转载请自行联系原作者

相关实践学习
每个IT人都想学的“Web应用上云经典架构”实战
本实验从Web应用上云这个最基本的、最普遍的需求出发,帮助IT从业者们通过“阿里云Web应用上云解决方案”,了解一个企业级Web应用上云的常见架构,了解如何构建一个高可用、可扩展的企业级应用架构。
MySQL数据库入门学习
本课程通过最流行的开源数据库MySQL带你了解数据库的世界。   相关的阿里云产品:云数据库RDS MySQL 版 阿里云关系型数据库RDS(Relational Database Service)是一种稳定可靠、可弹性伸缩的在线数据库服务,提供容灾、备份、恢复、迁移等方面的全套解决方案,彻底解决数据库运维的烦恼。 了解产品详情: https://www.aliyun.com/product/rds/mysql 
相关文章
|
4月前
|
关系型数据库 MySQL 数据库
阿里云数据库RDS费用价格:MySQL、SQL Server、PostgreSQL和MariaDB引擎收费标准
阿里云RDS数据库支持MySQL、SQL Server、PostgreSQL、MariaDB,多种引擎优惠上线!MySQL倚天版88元/年,SQL Server 2核4G仅299元/年,PostgreSQL 227元/年起。高可用、可弹性伸缩,安全稳定。详情见官网活动页。
913 152
|
4月前
|
关系型数据库 分布式数据库 数据库
阿里云数据库收费价格:MySQL、PostgreSQL、SQL Server和MariaDB引擎费用整理
阿里云数据库提供多种类型,包括关系型与NoSQL,主流如PolarDB、RDS MySQL/PostgreSQL、Redis等。价格低至21元/月起,支持按需付费与优惠套餐,适用于各类应用场景。
|
7月前
|
SQL 关系型数据库 MySQL
Go语言数据库编程:使用 `database/sql` 与 MySQL/PostgreSQL
Go语言通过`database/sql`标准库提供统一数据库操作接口,支持MySQL、PostgreSQL等多种数据库。本文介绍了驱动安装、连接数据库、基本增删改查操作、预处理语句、事务处理及错误管理等内容,涵盖实际开发中常用的技巧与注意事项,适合快速掌握Go语言数据库编程基础。
570 62
|
4月前
|
关系型数据库 MySQL 数据库
阿里云数据库RDS支持MySQL、SQL Server、PostgreSQL和MariaDB引擎
阿里云数据库RDS支持MySQL、SQL Server、PostgreSQL和MariaDB引擎,提供高性价比、稳定安全的云数据库服务,适用于多种行业与业务场景。
|
存储 分布式计算 Hadoop
基于Java的Hadoop文件处理系统:高效分布式数据解析与存储
本文介绍了如何借鉴Hadoop的设计思想,使用Java实现其核心功能MapReduce,解决海量数据处理问题。通过类比图书馆管理系统,详细解释了Hadoop的两大组件:HDFS(分布式文件系统)和MapReduce(分布式计算模型)。具体实现了单词统计任务,并扩展支持CSV和JSON格式的数据解析。为了提升性能,引入了Combiner减少中间数据传输,以及自定义Partitioner解决数据倾斜问题。最后总结了Hadoop在大数据处理中的重要性,鼓励Java开发者学习Hadoop以拓展技术边界。
407 7
|
关系型数据库 MySQL 数据库
市场领先者MySQL的挑战者:PostgreSQL的崛起
PostgreSQL(简称PG)是世界上最先进的开源对象关系型数据库,起源于1986年的加州大学伯克利分校POSTGRES项目。它以其丰富的功能、强大的扩展性和数据完整性著称,支持复杂数据类型、MVCC、全文检索和地理空间数据处理等特性。尽管市场份额略低于MySQL,但PG在全球范围内广泛应用,受到Google、AWS、Microsoft等知名公司支持。常用的客户端工具包括PgAdmin、Navicat和DBeaver。
954 4
|
关系型数据库 分布式数据库 数据库
PostgreSQL+Citus分布式数据库
PostgreSQL+Citus分布式数据库
472 15
|
存储 关系型数据库 MySQL
MySQL vs. PostgreSQL:选择适合你的开源数据库
在众多开源数据库中,MySQL和PostgreSQL无疑是最受欢迎的两个。它们都有着强大的功能、广泛的社区支持和丰富的生态系统。然而,它们在设计理念、性能特点、功能特性等方面存在着显著的差异。本文将从这三个方面对MySQL和PostgreSQL进行比较,以帮助您选择更适合您需求的开源数据库。
667 4
|
关系型数据库 MySQL PostgreSQL
postgresql和mysql中的limit使用方法
postgresql和mysql中的limit使用方法
540 1

热门文章

最新文章

推荐镜像

更多