Scalable System Design Patterns
Design patterns for scalable systems
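Ricky Ho shared this article on his blog. It is a brief overview; the other posts on his blog cover each topic in more detail. The notes below are mainly a free translation of his text.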
1. Load Balancer – A dispatcher decides which worker handles the next request; the decision can follow different policies.
“In this model, there is a dispatcher that determines which worker instance will handle the request based on different policies. The application should best be "stateless" so any worker instance can handle the request.
This pattern is deployed in almost every medium to large web site setup.”
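In this pattern a dispatcher decides which worker handles each request. The application should ideally be stateless, so that any worker can handle any request equally well. Almost every medium to large web site uses a load balancer.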
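As a rough illustration of the dispatcher described above, here is a minimal sketch that uses round-robin as the policy. The names `RoundRobinDispatcher` and `make_worker` are made up for this example, and round-robin is only one of many possible policies; the original article does not prescribe one.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Hands each request to the next worker in a fixed rotation."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("at least one worker is required")
        self._rotation = cycle(workers)

    def dispatch(self, request):
        # The policy lives here; swapping it out does not affect the workers.
        worker = next(self._rotation)
        return worker(request)


def make_worker(name):
    # Stateless worker: the reply depends only on the request itself,
    # so any worker instance can serve any request.
    def handle(request):
        return f"{name} handled {request}"
    return handle


dispatcher = RoundRobinDispatcher([make_worker("w1"), make_worker("w2")])
replies = [dispatcher.dispatch(f"req{i}") for i in range(3)]
```

Because the workers keep no state between calls, it does not matter that `req0` and `req2` land on the same worker while `req1` goes elsewhere.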
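2. Scatter and Gather – The dispatcher broadcasts the request to every worker in the pool. Each worker computes its own part and returns the result to the dispatcher, which merges all the results and returns them.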
“In this model, the dispatcher multicast the request to all workers of the pool. Each worker will compute a local result and send it back to the dispatcher, who will consolidate them into a single response and then send back to the client.
This pattern is used in Search engines like Yahoo, Google to handle user's keyword search request ... etc.”
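In this pattern the dispatcher forwards the request to every worker in the pool. Each worker handles part of the request and returns its result to the dispatcher, which combines the workers' results into a single response for the client. Search engines such as Yahoo and Google use this pattern to handle users' keyword searches.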
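The scatter and gather steps can be sketched with a thread pool, assuming a toy in-memory index split into three shards. `SHARDS`, `search_shard` and `scatter_gather` are hypothetical names for this sketch; a real search engine would send the request over the network to separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the shard owned by one worker.
SHARDS = [
    ["scalable system design", "load balancer basics"],
    ["map reduce tutorial", "scalable storage"],
    ["pipe and filter"],
]


def search_shard(shard, keyword):
    # Each worker computes a local result over its own shard only.
    return [doc for doc in shard if keyword in doc]


def scatter_gather(keyword):
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        # Scatter: the same request goes to every worker in the pool.
        partials = pool.map(lambda shard: search_shard(shard, keyword), SHARDS)
        # Gather: consolidate the local results into a single response.
        return sorted(doc for partial in partials for doc in partial)


hits = scatter_gather("scalable")
```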
3. Result Cache – The dispatcher first checks whether the request has been handled before and tries to find and return the earlier result, saving processing time.
“In this model, the dispatcher will first lookup if the request has been made before and try to find the previous result to return, in order to save the actual execution.
This pattern is commonly used in large enterprise application. Memcached is a very commonly deployed cache server.”
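This pattern simply adds a result-cache lookup to the dispatcher (translator's note: similar to a browser cache). If the request has been handled before and the cached result is still usable, the dispatcher returns it and saves the processing time. The pattern is commonly used in large enterprise applications, and Memcached is a widely used cache server.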
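The lookup step described above can be sketched as follows. This is a minimal illustration in which a plain in-process dict stands in for a cache server such as Memcached; `CachingDispatcher` and `slow_square` are invented names, and the sketch ignores expiry and invalidation.

```python
class CachingDispatcher:
    """Checks a result cache before handing the request to a worker."""

    def __init__(self, worker):
        self._worker = worker
        self._cache = {}      # stand-in for an external cache server
        self.misses = 0       # counts how often the worker actually ran

    def dispatch(self, request):
        if request in self._cache:
            # Seen before: return the previous result, skip the execution.
            return self._cache[request]
        self.misses += 1
        result = self._worker(request)
        self._cache[request] = result
        return result


def slow_square(n):
    # Placeholder for an expensive computation.
    return n * n


dispatcher = CachingDispatcher(slow_square)
results = [dispatcher.dispatch(n) for n in (4, 4, 5, 4)]
```

Four requests arrive, but the worker only runs for the two distinct inputs.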
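4. Shared Space – All workers watch the information in a shared area and contribute their own partial knowledge to it. The information keeps being refined until the problem can be solved.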
“This model also known as "Blackboard"; all workers monitors information from the shared space and contributes partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached.
This pattern is used in JavaSpace and also commercial product GigaSpace.”
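This pattern is also called the “Blackboard” pattern. During processing there is a globally shared object that may hold request parameters, intermediate state, response results and other information, and every component in the flow operates on it. The pattern is used in JavaSpace (translator's note: JavaSpaces is a simple mechanism for distributed computing) and GigaSpace (translator's note: a virtualized middleware layer).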
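To make the blackboard idea concrete, here is a small single-process sketch. It assumes the shared space is just a dict and that each worker is a function that adds a fact when it can; `add_total`, `add_average` and `run_blackboard` are hypothetical names, and real tuple spaces such as JavaSpaces expose a different, networked API.

```python
def add_total(board):
    # Contributes the sum once the raw readings are on the board.
    if "readings" in board and "total" not in board:
        board["total"] = sum(board["readings"])
        return True
    return False


def add_average(board):
    # Depends on the total that another worker contributes.
    if "total" in board and "average" not in board:
        board["average"] = board["total"] / len(board["readings"])
        return True
    return False


def run_blackboard(board, workers, is_solved):
    # Let workers keep enriching the board until a solution is reached
    # or nobody can contribute anything new.
    progress = True
    while progress and not is_solved(board):
        progress = False
        for worker in workers:
            if worker(board):
                progress = True
    return board


board = run_blackboard(
    {"readings": [2, 4, 6]},
    [add_average, add_total],   # order does not matter
    is_solved=lambda b: "average" in b,
)
```

No worker calls another directly; `add_average` simply waits until the knowledge it needs shows up on the board.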
5. Pipe and Filter – All workers are connected in series, following the flow of the data being processed.
“This model is also known as "Data Flow Programming"; all workers connected by pipes where data is flow across.
This pattern is a very common EAI pattern.”
This pattern is also called “Data Flow Programming” and is a very common enterprise application integration (EAI) pattern.
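One way to picture the pattern is with Python generators, where each generator is a filter and passing one generator to the next plays the role of the pipe. The filter names and the `pipeline` helper below are invented for this sketch.

```python
def strip_blank(lines):
    # Filter 1: drop empty lines and trim surrounding whitespace.
    for line in lines:
        if line.strip():
            yield line.strip()


def to_upper(lines):
    # Filter 2: normalize case.
    for line in lines:
        yield line.upper()


def pipeline(source, *filters):
    # Connect the filters in series; data flows lazily, one item at a time.
    stream = source
    for stage in filters:
        stream = stage(stream)
    return stream


output = list(pipeline(["  alpha ", "", "beta"], strip_blank, to_upper))
```

Each filter knows only about its input stream, so stages can be reordered or replaced without touching the others.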
6. Map Reduce – Targets batch jobs where disk I/O is the bottleneck. A distributed file system lets the files be processed in parallel.
“The model is targeting batch jobs where disk I/O is the major bottleneck. It use a distributed file system so that disk I/O can be done in parallel.
This pattern is used in many of Google's internal application, as well as implemented in open source Hadoop parallel processing framework. I also find this pattern can be used in many many application design scenarios.”
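This pattern uses a distributed file system so that disk I/O can happen in parallel. Many of Google's internal applications use it, and Hadoop is an open-source implementation of MapReduce.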
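The programming model behind the pattern can be sketched in a single process with the usual word-count example. This only shows the map, shuffle and reduce phases; it does not reproduce the distributed file system or the parallel disk I/O that the pattern is really about, and the function names are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    # Map: emit a (key, value) pair for every word in one input chunk.
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: fold the values of one key into a single result.
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be a file block handled by its own worker.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["a b a", "b c"])
```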
7. Bulk Synchronous Parallel – All workers execute in lock-step, coordinated by a master.
“This model is based on lock-step execution across all workers, coordinated by a master. Each worker repeat the following steps until the exit condition is reached, when there is no more active workers.
1) Each worker read data from input queue
2) Each worker perform local processing based on the read data
3) Each worker push local result along its direct connection
This pattern has been used in Google's Pregel graph processing model as well as the Apache Hama project.”
In this model a master coordinates the workers, and all workers execute in lock-step.
The pattern is used in Google's Pregel graph processing model and in Apache Hama.
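The three steps in the quote above can be sketched with a small example that spreads the maximum value through a graph, one superstep at a time. This is a single-process illustration under simplifying assumptions: the loop plays the role of the master, each graph node is a "worker", and `bsp_max` is a hypothetical name rather than anything from Pregel or Hama.

```python
def bsp_max(values, neighbors):
    """Propagate the largest value to every node in lock-step supersteps."""
    values = dict(values)
    # Initial superstep: every worker sends its value to its neighbours.
    inbox = {node: [] for node in values}
    for node, value in values.items():
        for peer in neighbors[node]:
            inbox[peer].append(value)

    # Exit condition: no worker has pending messages, so none is active.
    while any(inbox.values()):
        outbox = {node: [] for node in values}
        for node, messages in inbox.items():
            if not messages:
                continue                       # inactive in this superstep
            best = max(messages)               # 1) read from the input queue
            if best > values[node]:            # 2) local processing
                values[node] = best
                for peer in neighbors[node]:   # 3) push along direct connections
                    outbox[peer].append(best)
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
    return values


final = bsp_max(
    {"a": 3, "b": 6, "c": 1},
    {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
)
```

Swapping `inbox` for `outbox` only at the end of the loop body is what keeps the workers in lock-step.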
8. Execution Orchestrator – An intelligent scheduler dispatches ready-to-run tasks (based on a dependency graph) across a group of simple workers.
“This model is based on an intelligent scheduler / orchestrator to schedule ready-to-run tasks (based on a dependency graph) across a clusters of dumb workers.
This pattern is used in Microsoft's Dryad project”
This pattern relies on an intelligent scheduler/orchestrator that assigns runnable tasks to a group of workers. It is used in Microsoft's Dryad project.
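A minimal sketch of such an orchestrator can be built on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later). The task names and the `orchestrate` helper are invented for this example, the tasks run sequentially in one process, and nothing here reflects how Dryad itself is implemented.

```python
from graphlib import TopologicalSorter


def orchestrate(dependencies, tasks):
    """Run each task only after everything it depends on has finished."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # get_ready() returns the tasks whose dependencies are all done;
        # a real orchestrator would hand these to idle workers in parallel.
        for name in sorter.get_ready():
            tasks[name]()          # the "dumb" worker just runs the task
            order.append(name)
            sorter.done(name)
    return order


log = []
# Hypothetical dependency graph: task -> set of tasks it depends on.
dependencies = {
    "load": set(),
    "clean": {"load"},
    "stats": {"load"},
    "report": {"clean", "stats"},
}
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
order = orchestrate(dependencies, tasks)
```

All the scheduling knowledge sits in the orchestrator; the workers never need to know about the dependency graph.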