Apache Solr vs Elasticsearch-feature

简介: API Feature Solr 6.2.1 ElasticSearch 5.0 Format XML, CSV, JSON JSON HTTP REST API Bin...

API

Feature Solr 6.2.1 ElasticSearch 5.0
Format XML, CSV, JSON JSON
HTTP REST API
Binary API   SolrJ  TransportClient, Thrift (through a plugin)
JMX support  ES specific stats are exposed through the REST API
Official client libraries  Java Java, Groovy, PHP, Ruby, Perl, Python, .NET, Javascript Official list of clients
Community client libraries  PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x Complete list
3rd-party product integration (open-source) Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) Drupal, Django, Symfony2, Wordpress, CouchBase
3rd-party product integration (commercial) DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR SearchBlox, Hortonworks Data Platform, MapR etc Complete list
Output JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java JSON, XML/HTML (via plugin)

Infrastructure

Feature Solr 6.2.1 ElasticSearch 5.0
Master-slave replication  Only in non-SolrCloud. In SolrCloud, behaves identically to ES.  Not an issue because shards are replicated across nodes.
Integrated snapshot and restore Filesystem Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories

Indexing

Feature Solr 6.2.1 ElasticSearch 5.0
Data Import DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File [DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues 
Partial Doc Updates   with stored fields  with _source field
Custom Analyzers and Tokenizers 
Per-field analyzer chain 
Per-doc/query analyzer chain 
Index-time synonyms   Supports Solr and Wordnet synonym format
Query-time synonyms   especially via hon-lucene-synonyms  Technically, yes, but practically no because multi-word/phrase query-time synonyms are not supported. See ES docs and hon-lucene-synonyms blog for nuances.
Multiple indexes 
Near-Realtime Search/Indexing 
Complex documents 
Schemaless   4.4+
Multiple document types per schema   One set of fields per schema, one schema per core
Online schema changes   Schemaless mode or via dynamic fields.  Only backward-compatible changes.
Apache Tika integration 
Dynamic fields 
Field copying   via multi-fields
Hash-based deduplication   Murmur plugin or ER plugin

Searching

Feature Solr 6.2.1 ElasticSearch 5.0
Lucene Query parsing 
Structured Query DSL   Need to programmatically create queries if going beyond Lucene query syntax.
Span queries   via SOLR-2703
Spatial/geo search 
Multi-point spatial search 
Faceting   Top N term accuracy can be controlled with shard_size
Advanced Faceting   New JSON faceting API as of Solr 5.x  blog post
Geo-distance Faceting
Pivot Facets 
More Like This
Boosting by functions 
Boosting using scripting languages 
Push Queries  JIRA issue  Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping 
Query Re-Ranking   via Rescoring or a plugin
Index-based Spellcheck   Phrase Suggester
Wordlist-based Spellcheck 
Autocomplete
Query elevation  workaround
Intra-index joins   via parent-child query  via has_children and top_children queries
Inter-index joins   Joined index has to be single-shard and replicated across all nodes.
Resultset Scrolling   New to 4.7.0  via scan search type
Filter queries   also supports filtering by native scripts
Filter execution order   local params and cache property
Alternative QueryParsers   DisMax, eDisMax  query_string, dis_max, match, multi_match etc
Negative boosting   but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes  it can search across multiple compatible collections
Result highlighting
Custom Similarity 
Searcher warming on index reload   Warmers API
Term Vectors API

Customizability

Feature Solr 6.2.1 ElasticSearch 5.0
Pluggable API endpoints 
Pluggable search workflow   via SearchComponents
Pluggable update workflow   via UpdateRequestProcessor
Pluggable Analyzers/Tokenizers
Pluggable QueryParsers 
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing 
Pluggable webapps   [site plugins DEPRECATED in 5.x] blog post
Automated plugin installation   Installable from GitHub, maven, sonatype or elasticsearch.org

Distributed

Feature Solr 6.2.1 ElasticSearch 5.0
Self-contained cluster   Depends on separate ZooKeeper server  Only Elasticsearch nodes
Automatic node discovery  ZooKeeper  internal Zen Discovery or ZooKeeper
Partition tolerance  The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.  Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See this
Automatic failover  If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are retuned from the available shards.
Automatic leader election
Shard replication
Sharding 
Automatic shard rebalancing   it can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes and it can be configured to not assign the same shard and its replicates on a node with the same tags.
Change # of shards  Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.  each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.
Shard splitting
Relocate shards and replicas   can be done by creating a shard replicate on the desired node and then removing the shard from the source node  can move shards and replicas to any node in the cluster on demand
Control shard routing   shards or _route_ parameter  routing parameter
Pluggable shard/replica assignment  Rule-based replica assignment  Probabilistic shard balancing with Tempest plugin
Consistency Indexing requests are synchronous with replication. A indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index. Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail is there are not sufficient active shard replicas. The default is quorum, but all or one are also available.

Misc

Feature Solr 6.2.1 ElasticSearch 5.0
Web Admin interface  bundled with Solr  Marvel or Kibana apps
Visualisation Banana (Port of Kibana) Kibana
Hosting providers WebSolrSearchifyHosted-SolrIndexDepotOpenSolrgotosolr FoundObjectRocketbonsai.ioIndexistoqbox.ioIndexDepotCompose.io


Thoughts...

I'm embedding my answer to this "Solr-vs-Elasticsearch" Quora question verbatim here:

1. Elasticsearch was born in the age of REST APIs. If you love REST APIs, you'll probably feel more at home with ES from the get-go. I don't actually think it's 'cleaner' or 'easier to use', but just that it is more aligned with web 2.0 developers' mindsets.

2. Elasticsearch's Query DSL syntax is really flexible and it's pretty easy to write complex queries with it, though it does border on being verbose. Solr doesn't have an equivalent, last I checked. Having said that, I've never found Solr's query syntax wanting, and I've always been able to easily write a custom SearchComponent if needed (more on this later).

3. I find Elasticsearch's documentation to be pretty awful. It doesn't help that some examples in the documentation are written in YAML and others in JSON. I wrote a ES code parser once to auto-generate documentation from Elasticsearch's source and found a number of discrepancies between code and what's documented on the website, not to mention a number of undocumented/alternative ways to specify the same config key. 

By contrast, I've found Solr to be consistent and really well-documented. I've found pretty much everything I've wanted to know about querying and updating indices without having to dig into code much. Solr's schema.xml and solrconfig.xml are *extensively* documented with most if not all commonly used configurations. 

4. Whilst what Rick says about ES being mostly ready to go out-of-box is true, I think that is also a possible problem with ES. Many users don't take the time to do the most simple config (e.g. type mapping) of ES because it 'just works' in dev, and end up running into issues in production. 

And once you do have to do config, then I personally prefer Solr's config system over ES'. Long JSON config files can get overwhelming because of the JSON's lack of support for comments. Yes you can use YAML, but it's annoying and confusing to go back and forth between YAML and JSON. 

5. If your own app works/thinks in JSON, then without a doubt go for ES because ES thinks in JSON too. Solr merely supports it as an afterthought. ES has a number of nice JSON-related features such as parent-child and nested docs that makes it a very natural fit. Parent-child joins are awkward in Solr, and I don't think there's a Solr equivalent for ES Inner hits.

6. ES doesn't require ZooKeeper for it's 'elastic' features which is nice coz I personally find ZK unpleasant, but as a result, ES does have issues with split-brain scenarios though (google 'elasticsearch split-brain' or see this: Elasticsearch Resiliency Status).

7. Overall from working with clients as a Solr/Elasticsearch consultant, I've found that developer preferences tend to end up along language party lines: if you're a Java/c# developer, you'll be pretty happy with Solr. If you live in Javascript or Ruby, you'll probably love Elasticsearch. If you're on Python or PHP, you'll probably be fine with either. 

Something to add about this: ES doesn't have a very elegant Java API IMHO (you'll basically end up using REST because it's less painful), whereas Solrj is very satisfactory and more efficient than Solr's REST API. If you're primarily a Java dev team, do take this into consideration for your sanity. There's no scenario in which constructing JSON in Java is fun/simple, whereas in Python its absolutely pain-free, and believe me, if you have a non-trivial app, your ES json query strings will be works of art. 

8. ES doesn't have in-built support for pluggable 'SearchComponents', to use Solr's terminology. SearchComponents are (for me) a pretty indispensable part of Solr for anyone who needs to do anything customized and in-depth with search queries. 

Yes of course, in ES you can just implement your own RestHandler, but that's just not the same as being able to plug-into and rewire the way search queries are handled and parsed. 

9. Whichever way you go, I highly suggest you choose a client library which is as 'close to the metal' as you can get. Both ES and Solr have *really* simple search and updating search APIs. If a client library introduces an additional DSL layer in attempt to 'simplify', I suggest you think long and hard about using it, as it's likely to complicate matters in the long-run, and make debugging and asking for help on SO more problematic. 

In particular, if you're using Rails + Solr, consider using rsolr/rsolr
instead of sunspot/sunspot if you can help it. ActiveRecord is complex code and sufficiently magical. The last thing you want is more magic on top of that. 

---

To conclude, ES and Solr have more or less feature-parity and from a feature standpoint, there's rarely one reason to go one way or the other (unless your app lives/breathes JSON). Performance-wise, they are also likely to be quite similar (I'm sure there are exceptions to the rule. ES' relatively new autocomplete implementation, for example, is a pretty dramatic departure from previous Lucene/Solr implementations, and I suspect it produces faster responses at scale).

ES does offer less friction from the get-go and you feel like you have something working much quicker, but I find this to be illusory. Any time gained in this stage is lost when figuring out how to properly configure ES because of poor documentation - an inevitablity when you have a non-trivial application. 

Solr encourages you to understand a little more about what you're doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you're forced to read and modify the 2 well-documented XML config files in order to have a working search app.

---

EDIT on Nov 2015: 

ES has been gradually distinguishing itself from Solr when it comes to data analytics. I think it's fair to attribute this to the immense traction of the ELK stack in the logging, monitoring and analytic space. My guess is that this is where Elastic (the company) gets the majority of its revenue, so it makes perfect sense that ES (the product) reflects this.

We see this manifesting primarily in the form of aggregations, which is a more flexible and nuanced replacement for facets. Read more about aggregations here: Migrating to aggregations

Aggregations have been out for a while now (since 1.4), but with the recently released ES 2.0 comes pipeline aggregations, which let you compute aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. Very cool stuff, and Solr simply doesn't have an equivalent. More on pipeline aggregations here: Out of this world aggregations

If you're currently using or contemplating using Solr in an analytics app, it is worth your while to look into ES aggregation features to see if you need any of it.


相关实践学习
以电商场景为例搭建AI语义搜索应用
本实验旨在通过阿里云Elasticsearch结合阿里云搜索开发工作台AI模型服务,构建一个高效、精准的语义搜索系统,模拟电商场景,深入理解AI搜索技术原理并掌握其实现过程。
ElasticSearch 最新快速入门教程
本课程由千锋教育提供。全文搜索的需求非常大。而开源的解决办法Elasricsearch(Elastic)就是一个非常好的工具。目前是全文搜索引擎的首选。本系列教程由浅入深讲解了在CentOS7系统下如何搭建ElasticSearch,如何使用Kibana实现各种方式的搜索并详细分析了搜索的原理,最后讲解了在Java应用中如何集成ElasticSearch并实现搜索。  
目录
相关文章
|
8月前
|
存储 监控 数据可视化
可观测性方案怎么选?SelectDB vs Elasticsearch vs ClickHouse
基于 SelectDB 的高性能倒排索引、高吞吐量写入和高压缩存储,用户可以构建出性能高于Elasticsearch 10 倍的可观测性平台,并支持国内外多个云上便捷使用 SelectDB Cloud 的开箱即用服务。
458 8
可观测性方案怎么选?SelectDB vs Elasticsearch vs ClickHouse
|
存储 自然语言处理 BI
从 Elasticsearch 到 Apache Doris 腾讯音乐内容库升级,统一搜索分析引擎,成本直降 80%
实现写入性能提升 4 倍、使用成本节省达 80% 的显著成效
540 1
从 Elasticsearch 到 Apache Doris 腾讯音乐内容库升级,统一搜索分析引擎,成本直降 80%
|
10月前
|
存储 SQL Apache
为什么 Apache Doris 是比 Elasticsearch 更好的实时分析替代方案?
本文将从技术选型的视角,从开放性、系统架构、实时写入、实时存储、实时查询等多方面,深入分析 Apache Doris 与 Elasticsearch 的能力差异及性能表现
1126 17
为什么 Apache Doris 是比 Elasticsearch 更好的实时分析替代方案?
|
12月前
|
存储 运维 监控
金融场景 PB 级大规模日志平台:中信银行信用卡中心从 Elasticsearch 到 Apache Doris 的先进实践
中信银行信用卡中心每日新增日志数据 140 亿条(80TB),全量归档日志量超 40PB,早期基于 Elasticsearch 构建的日志云平台,面临存储成本高、实时写入性能差、文本检索慢以及日志分析能力不足等问题。因此使用 Apache Doris 替换 Elasticsearch,实现资源投入降低 50%、查询速度提升 2~4 倍,同时显著提高了运维效率。
730 3
金融场景 PB 级大规模日志平台:中信银行信用卡中心从 Elasticsearch 到 Apache Doris 的先进实践
|
机器学习/深度学习 分布式计算 自然语言处理
【搜索引擎选型】Solr vs. Elasticsearch:怎么选?
【搜索引擎选型】Solr vs. Elasticsearch:怎么选?
|
机器学习/深度学习 分布式计算 自然语言处理
【搜索引擎选型】Solr vs. Elasticsearch:选择开源搜索引擎
【搜索引擎选型】Solr vs. Elasticsearch:选择开源搜索引擎
|
存储 SQL 监控
独家深度 | 一文看懂 ClickHouse vs Elasticsearch:谁更胜一筹?
本文的主旨在于通过彻底剖析ClickHouse和Elasticsearch的内核架构,从原理上讲明白两者的优劣之处,同时会附上一份覆盖多场景的测试报告给读者作为参考。
17010 1
独家深度 | 一文看懂 ClickHouse vs Elasticsearch:谁更胜一筹?
|
安全 Java Apache
elasticsearch 升级Apache Log4j2组件包
elasticsearch 升级Apache Log4j2组件包
1499 0
|
弹性计算 监控 Apache
【云上ELK系列】阿里云Elasticsearch的Apache日志分析实践
阿里云Elasticsearch采集上游数据的方式有很多种,其中有一个与开源完全兼容的方案:通过logstash及logstash周围的强大的plugin实现数据采集。 首先我们需要在ECS中来安装部署logstash,购买阿里云ECS服务,准备1.8以上版本的JDK。
4622 0
|
测试技术 Apache
Elasticsearch压力测试工具-Apache Jmeter
一、下载Jmeter 下载地址:http://jmeter.apache.org/download_jmeter.cgi 解压之后运行: cd /apache-jmeter-3.2/bin ./jmeter 二、添加线程组 依次店测试计划->添加->threads->线程组: 在线程组中添加线程数和用户数,模拟用户访问: 10个用户,每个用户200个线程,循环10次。
2954 0

推荐镜像

更多