A full-text search plugin for Cassandra

2017-11-08 1991

Introduction:

https://github.com/Stratio/cassandra-lucene-index

Stratio’s Cassandra Lucene Index

Stratio’s Cassandra Lucene Index, derived from Stratio Cassandra, is a plugin for Apache Cassandra that extends its index functionality to provide near real-time search similar to Elasticsearch or Solr, including full-text search capabilities and free multivariable, geospatial and bitemporal search. It achieves this through an Apache Lucene based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data. Stratio’s Cassandra indexes are one of the core modules on which Stratio’s BigData platform is based.

Index relevance searches allow you to retrieve the n most relevant results satisfying a search. The coordinator node sends the search to each node in the cluster, each node returns its n best results, and the coordinator then combines these partial results and returns the n best of them, avoiding a full scan. You can also base the sorting on a combination of fields.
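The scatter-gather step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's actual implementation: the `(score, key)` hit format and the function names `node_top_n` and `coordinator_top_n` are assumptions made for this example.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical hit representation: (relevance score, row key).
Hit = Tuple[float, str]


def node_top_n(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Each node keeps only its n best local hits, highest score first."""
    return heapq.nlargest(n, hits)


def coordinator_top_n(partials: Iterable[List[Hit]], n: int) -> List[Hit]:
    """The coordinator merges the per-node partial results and keeps the n best."""
    merged = (hit for partial in partials for hit in partial)
    return heapq.nlargest(n, merged)


# Three made-up nodes, each holding its own locally indexed rows.
node_a = [(0.9, "a1"), (0.2, "a2"), (0.7, "a3")]
node_b = [(0.8, "b1"), (0.95, "b2")]
node_c = [(0.1, "c1")]

partials = [node_top_n(hits, 2) for hits in (node_a, node_b, node_c)]
best = coordinator_top_n(partials, 2)
print(best)
```

Because every node returns at most n hits, the coordinator only ever merges n results per node, regardless of how many rows each node stores.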

Any cell in the tables can be indexed, including those in the primary key as well as collections. Wide rows are also supported. You can scan token/key ranges, apply additional CQL3 clauses and page on the filtered results.

Index filtered searches are a powerful help when analyzing the data stored in Cassandra with MapReduce frameworks such as Apache Hadoop or, even better, Apache Spark. Adding Lucene filters to the job input can dramatically reduce the amount of data to be processed, avoiding a full scan.

Benchmark results give an idea of the expected performance when combining Lucene indexes with Spark. The benchmark runs successive queries requesting from 1% to 100% of the stored data. The index performs well for queries requesting strongly filtered data, but performance decays for less restrictive queries. As the number of records returned by the query increases, we reach a point where the index becomes slower than the full scan. So the decision to use indexes in your Spark jobs depends on the query selectivity, and the trade-off between both approaches depends on the particular use case. Generally, combining Lucene indexes with Spark is recommended for jobs retrieving no more than 25% of the stored data.
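The rule of thumb above can be written as a small selectivity check. This is a minimal sketch, not part of the plugin: the function name `prefer_lucene_index` is hypothetical, and the 25% default is taken from the general recommendation in the text, so it should be tuned against your own workload.

```python
def prefer_lucene_index(matching_rows: int, total_rows: int,
                        max_fraction: float = 0.25) -> bool:
    """Return True when the estimated selectivity favors the index over a full scan.

    max_fraction is the rule-of-thumb cutoff from the text (25% of stored data);
    it is an assumption to be validated per use case, not a fixed constant.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return matching_rows / total_rows <= max_fraction


# A job expected to read 10% of the table would use the index;
# one expected to read 60% would fall back to a full scan.
print(prefer_lucene_index(10_000, 100_000))
print(prefer_lucene_index(60_000, 100_000))
```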

This project is not intended to replace Apache Cassandra denormalized tables, inverted indexes, and/or secondary indexes. It is just a tool for performing certain kinds of queries that are really hard to address with Apache Cassandra’s out-of-the-box features, filling the gap between real-time and analytics.

More detailed information is available at Stratio’s Cassandra Lucene Index documentation.

Features

Stratio’s Cassandra Lucene Index and its integration with Lucene search technology provide:

Full text search (language-aware analysis, wildcard, fuzzy, regexp)
Boolean search (and, or, not)
Sorting by relevance, column value, and distance
Geospatial indexing (points, lines, polygons and their multiparts)
Geospatial transformations (bounding box, buffer, centroid, convex hull, union, difference, intersection)
Geospatial operations (intersects, contains, is within)
Bitemporal search (valid and transaction time durations)
CQL complex types (list, set, map, tuple and UDT)
CQL user defined functions (UDF)
CQL paging, even with sorted searches
Columns with TTL
Third-party CQL-based drivers compatibility
Spark and Hadoop compatibility
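To make the features above more concrete, the following Python sketch assembles the kind of CQL statements used to create a Lucene index and run a filtered, relevance-sorted search. The statement shapes (the `com.stratio.cassandra.lucene.Index` class, the `schema` option and the `expr(...)` clause) follow the plugin documentation for the Cassandra 3.x line as best recalled here; earlier plugin versions use a different syntax, so check the documentation for your version. The table `tweets`, the index `tweets_index`, the field names and the helper functions are all hypothetical.

```python
import json

# Index implementation class, as named in the plugin documentation (assumed here).
INDEX_CLASS = "com.stratio.cassandra.lucene.Index"


def cql_string(value: str) -> str:
    """Quote a value as a CQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_index_cql(table: str, index: str, fields: dict,
                     refresh_seconds: int = 1) -> str:
    """Build a CREATE CUSTOM INDEX statement with a JSON schema option."""
    schema = json.dumps({"fields": fields})
    return (
        f"CREATE CUSTOM INDEX {index} ON {table} () "
        f"USING {cql_string(INDEX_CLASS)} "
        f"WITH OPTIONS = {{'refresh_seconds': {cql_string(str(refresh_seconds))}, "
        f"'schema': {cql_string(schema)}}};"
    )


def search_cql(table: str, index: str, search: dict, limit: int = 10) -> str:
    """Build a SELECT that passes a JSON search expression to the index."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE expr({index}, {cql_string(json.dumps(search))}) LIMIT {limit};"
    )


# Hypothetical schema: an exact-match column and an analyzed text column.
fields = {
    "user": {"type": "string"},
    "body": {"type": "text", "analyzer": "english"},
}
# Filter on an exact user, then rank the matches by a phrase in the body.
search = {
    "filter": {"type": "match", "field": "user", "value": "alice"},
    "query": {"type": "phrase", "field": "body", "value": "big data"},
}

create_stmt = create_index_cql("tweets", "tweets_index", fields)
select_stmt = search_cql("tweets", "tweets_index", search, limit=5)
print(create_stmt)
print(select_stmt)
```

The resulting strings can be run through any CQL-based driver or `cqlsh`. Generating the JSON with `json.dumps` and escaping it in one place avoids hand-quoting mistakes inside the CQL string literal.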

Not yet supported:

Thrift API
Legacy compact storage option
Indexing counter columns
Indexing static columns
Partitioners other than Murmur3

Requirements

Cassandra (identified by the first three numbers of the plugin version)
Java >= 1.8 (OpenJDK and Sun have been tested)
Maven >= 3.0

This article is reposted from 张昺华-sky’s blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/6757830.html. To republish it, please contact the original author.
