instagram use solr instead postgresql gis

本文涉及的产品
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
云原生数据库 PolarDB MySQL 版,通用型 2核8GB 50GB
简介:
instagram的技术点可参考 : 

从文章来看, 在地理位置搜索方面, 他们使用了solr.
也算专车专用吧. 
For our  geo-search API , we used PostgreSQL for many months, but once our Media entries were sharded, moved over to using  Apache Solr . It has a simple JSON interface, so as far as our application is concerned, it’s just another API to consume.

Apache Solr

SolrTM is the popular, blazing fast open source enterprise search platform from the Apache LuceneTMproject. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

See the complete feature list for more details.

For more information about Solr, please see the Solr wiki.

SolrTM Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Linearly scalable, auto index replication, auto failover and recovery
  • Near Real-time indexing
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture

Solr Uses the LuceneTM Search Library and Extends it!

  • A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
  • Powerful Extensions to the Lucene Query Language
  • Faceted Search and Filtering
  • Geospatial Search with support for multiple points per document and geo polygons
  • Advanced, Configurable Text Analysis
  • Highly Configurable and User Extensible Caching
  • Performance Optimizations
  • External Configuration via XML
  • An AJAX based administration interface
  • Monitorable Logging
  • Fast near real-time incremental indexing and index replication
  • Highly Scalable Distributed search with sharded index across multiple hosts
  • JSON, XML, CSV/delimited-text, and binary update formats
  • Easy ways to pull in data from databases and XML files from local disk and HTTP sources
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
  • Apache UIMA integration for configurable metadata extraction
  • Multiple search indices

Detailed Features

Schema

  • Defines the field types and fields of documents
  • Can drive more intelligent processing
  • Declarative Lucene Analyzer specification
  • Dynamic Fields enables on-the-fly addition of new fields
  • CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field
  • Explicit types eliminates the need for guessing types of fields
  • External file-based configuration of stopword lists, synonym lists, and protected word lists
  • Many additional text analysis components including word splitting, regex and sounds-like filters
  • Pluggable similarity model per field

Query

  • HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, binary)
  • Sort by any number of fields, and by complex functions of numeric fields
  • Advanced DisMax query parser for high relevancy results from user-entered queries
  • Highlighted context snippets
  • Faceted Searching based on unique field values, explicit queries, date ranges, numeric ranges or pivot
  • Multi-Select Faceting by tagging and selectively excluding filters
  • Spelling suggestions for user queries
  • More Like This suggestions for given document
  • Function Query - influence the score by user specified complex functions of numeric fields or query relevancy scores.
  • Range filter over Function Query results
  • Date Math - specify dates relative to "NOW" in queries and updates
  • Dynamic search results clustering using Carrot2
  • Numeric field statistics such as min, max, average, standard deviation
  • Combine queries derived from different syntaxes
  • Auto-suggest functionality for completing user queries
  • Allow configuration of top results for a query, overriding normal scoring and sorting
  • Simple join capability between two document types
  • Performance Optimizations

Core

  • Dynamically create and delete document collections without restarting
  • Pluggable query handlers and extensible XML data format
  • Pluggable user functions for Function Query
  • Customizable component based request handler with distributed search support
  • Document uniqueness enforcement based on unique key field
  • Duplicate document detection, including fuzzy near duplicates
  • Custom index processing chains, allowing document manipulation before indexing
  • User configurable commands triggered on index changes
  • Ability to control where docs with the sort field missing will be placed
  • "Luke" request handler for corpus information

Caching

  • Configurable Query Result, Filter, and Document cache instances
  • Pluggable Cache implementations, including a lock free, high concurrency implementation
  • Cache warming in background
  • When a new searcher is opened, configurable searches are run against it in order to warm it up to avoid slow first hits. During warming, the current searcher handles live requests.
  • Autowarming in background
  • The most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes.
  • Fast/small filter implementation
  • User level caching with autowarming support

SolrCloud

  • Centralized Apache ZooKeeper based configuration
  • Automated distributed indexing/sharding - send documents to any node and it will be forwarded to correct shard
  • Near Real-Time indexing with immediate push-based replication (also support for slower pull-based replication)
  • Transaction log ensures no updates are lost even if the documents are not yet indexed to disk
  • Automated query failover, index leader election and recovery in case of failure
  • No single point of failure

Admin Interface

  • Comprehensive statistics on cache utilization, updates, and queries
  • Interactive schema browser that includes index statistics
  • Replication monitoring
  • SolrCloud dashboard with graphical cluster node status
  • Full logging control
  • Text analysis debugger, showing result of every stage in an analyzer
  • Web Query Interface w/ debugging output
  • Parsed query output
  • Lucene explain() document score detailing
  • Explain score for documents outside of the requested range to debug why a given document wasn't ranked higher.

The Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field. Apache Lucene, Apache Solr, Apache PyLucene, Apache Open Relevance Project and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.


[参考]
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍如何基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
定位技术 Windows 关系型数据库
PostgreSQL GUI pgadmin4 v3.3 支持 gis geometry 数据编辑、显示
标签 PostgreSQL , pgadmin , gis , 编辑 背景 pgadmin 4 v3.3 开始支持geometry 类型的展示。 https://www.postgresql.org/ftp/pgadmin/pgadmin4/v3.3/windows/ 如果geometry使用的是SRID 4326 (WGS 84 lon/lat)坐标系,则pgadmin会自动从OpenStreetMap 加载图层,作为背景。
2304 0
|
关系型数据库 Serverless 定位技术
PostgreSQL如何使用GIS函数计算两个点连线的中间点?
PostgreSQL如何使用GIS函数计算两个点连线的中间点?
450 62
|
关系型数据库 Serverless 定位技术
PostgreSQL GIS函数判断两条线有交点的函数是什么?
PostgreSQL GIS函数判断两条线有交点的函数是什么?
810 60
|
关系型数据库 定位技术 分布式数据库
沉浸式学习PostgreSQL|PolarDB 18: 通过GIS轨迹相似伴随|时态分析|轨迹驻点识别等技术对拐卖、诱骗场景进行侦查
本文主要教大家怎么用好数据库, 而不是怎么运维管理数据库、怎么开发数据库内核.
1521 1
|
SQL 关系型数据库 定位技术
如何使用shp2pgsql 将shp格式的GIS数据导入到PostgreSQL
如何使用shp2pgsql 将shp格式的GIS数据导入到PostgreSQL
9370 0
|
大数据 关系型数据库 数据库
桌面GIS连接Postgresql总结
版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/ESA_DSQ/article/details/52203481 对于非开发人员的GISer而言,数据库这东西更多停留在mdb,gdb的层面,相对而言这些数据的使用无论是在处理还是管理上,门槛相对较低。
1372 0
|
关系型数据库 MySQL 定位技术
PostgreSQL MySQL 兼容性之 - Gis类型
PostGIS的GIS功能相比MySQL强大太多,本文仅仅列举了MySQL支持的部分。欲了解PostGIS请参考:http://postgis.net/docs/manual-2.2/reference.htmlPostGIS有几百个操作函数, 对GIS支持强大。 POINT MySQL
5717 0
|
4月前
|
存储 关系型数据库 测试技术
拯救海量数据:PostgreSQL分区表性能优化实战手册(附压测对比)
本文深入解析PostgreSQL分区表的核心原理与优化策略,涵盖性能痛点、实战案例及压测对比。首先阐述分区表作为继承表+路由规则的逻辑封装,分析分区裁剪失效、全局索引膨胀和VACUUM堆积三大性能杀手,并通过电商订单表崩溃事件说明旧分区维护的重要性。接着提出四维设计法优化分区策略,包括时间范围分区黄金法则与自动化维护体系。同时对比局部索引与全局索引性能,展示后者在特定场景下的优势。进一步探讨并行查询优化、冷热数据分层存储及故障复盘,解决分区锁竞争问题。
439 2
|
关系型数据库 分布式数据库 PolarDB
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
514 0

推荐镜像

更多