Reading notes: Large-scale cluster management at Google with Borg

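Introduction:
  A classic paper about Borg, the scheduling platform Google uses internally. I also think of it as Google's "bait" paper: back when people learned that Borg existed, plenty of them called on Google, in all sorts of places, to open-source it or at least explain the details further. Google seized the moment and pushed Kubernetes: "Borg itself isn't open source, but we have open-sourced Kubernetes, a newer and more general system built on that foundation, so come and use it, everyone!" Kubernetes then took off.
   The most impressive thing about Borg is that it runs long-running services and batch jobs in the same cells. According to the paper, this avoids needing roughly 20–30% more machines, which is a big deal. In the paper's own words: "Since many other organizations run user-facing and batch jobs in separate clusters, we examined what would happen if we did the same. Figure 5 shows that segregating prod and non-prod work would need 20–30% more machines in the median cell to run our workload." In other words: everyone else runs user-facing work and batch jobs in separate clusters, so Google checked what would happen if it did the same, and it turned out that it would take 20–30% more machines.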
Purpose
    Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
 
Benefits:
   1, Hides the details of resource management and failure handling
   2, Operates with very high reliability and availability, and supports applications that do the same
   3, Lets users run workloads across tens of thousands of machines

Concepts:
1, Borg cell: a set of machines that are managed as a unit.
2, Workload: Borg cells run a heterogeneous workload with two main parts.
The first is long-running services that should "never" go down, and handle short-lived latency-sensitive requests (a few ms to a few hundred ms). Such services are used for end-user-facing products such as Gmail, Google Docs, and web search, and for internal infrastructure services (e.g., BigTable).
The second is batch jobs that take from a few seconds to a few days to complete; these are much less sensitive to short-term performance fluctuations.
3, Cluster: The machines in a cell belong to a single cluster, defined by the high-performance datacenter-scale network fabric that connects them. A cluster lives inside a single datacenter building, and a collection of buildings makes up a site.
4, Jobs: A Borg job's properties include its name, owner, and the number of tasks it has. Jobs can have constraints to force its tasks to run on machines with particular attributes such as processor architecture, OS version, or an external IP address.
5, Task: Each task maps to a set of Linux processes running in a container on a machine.
6, Alloc: A Borg alloc (short for allocation) is a reserved set of resources on a machine in which one or more tasks can be run; the resources remain assigned whether or not they are used.
7, Quota: Quota is used to decide which jobs to admit for scheduling. Quota is expressed as a vector of resource quantities (CPU, RAM, disk, etc.) at a given priority, for a period of time (typically months).
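
To make the quota idea concrete, here is a minimal sketch of admission as a vector comparison. It is not Borg's implementation: the `Resources` class, the `admit` function, the units, and the numbers are all made up for illustration, and priority bands and the time period are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A vector of resource quantities (hypothetical units)."""
    cpu: float
    ram_gb: float
    disk_gb: float

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu + other.cpu,
                         self.ram_gb + other.ram_gb,
                         self.disk_gb + other.disk_gb)

    def fits_within(self, limit: "Resources") -> bool:
        # Every dimension must fit; exceeding the limit in one is enough to fail.
        return (self.cpu <= limit.cpu
                and self.ram_gb <= limit.ram_gb
                and self.disk_gb <= limit.disk_gb)


def admit(request: Resources, in_use: Resources, quota: Resources) -> bool:
    """Admit a job only if current usage plus the request stays within quota."""
    return (in_use + request).fits_within(quota)


# Illustrative numbers only.
quota = Resources(cpu=100, ram_gb=400, disk_gb=1000)
in_use = Resources(cpu=90, ram_gb=100, disk_gb=200)
small_job = Resources(cpu=5, ram_gb=20, disk_gb=10)
big_job = Resources(cpu=20, ram_gb=20, disk_gb=10)

print(admit(small_job, in_use, quota))
print(admit(big_job, in_use, quota))
```

The point of the vector form is that a job can be refused on a single dimension (here CPU) even when RAM and disk have plenty of headroom.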
 
Architecture
Borgmaster: Each cell's Borgmaster consists of two processes: the main Borgmaster process and a separate scheduler (§3.2). The main Borgmaster process handles client RPCs that either mutate state (e.g., create job) or provide read-only access to data (e.g., lookup job). It also manages state machines for all of the objects in the system (machines, tasks, allocs, etc.), communicates with the Borglets, and offers a web UI as a backup to Sigma.
 
Scheduling: When a job is submitted, the Borgmaster records it persistently in the Paxos store and adds the job's tasks to the pending queue. This is scanned asynchronously by the scheduler, which assigns tasks to machines if there are sufficient available resources that meet the job's constraints. (The scheduler primarily operates on tasks, not jobs.)
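
The pending-queue scan described above can be sketched roughly as follows. This is a toy model, not Borg's scheduler: the `Machine` and `Task` classes, the attribute names, and the first-fit placement are all assumptions made for illustration, and persistence in the Paxos store is not modelled at all.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cpu: float
    free_ram_gb: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Task:
    name: str
    cpu: float
    ram_gb: float
    # Attribute name -> required value, inherited from the job's constraints.
    constraints: dict = field(default_factory=dict)


def feasible(task: Task, machine: Machine) -> bool:
    """A machine is feasible if it has room and matches every constraint."""
    has_room = (task.cpu <= machine.free_cpu
                and task.ram_gb <= machine.free_ram_gb)
    meets_constraints = all(machine.attributes.get(key) == value
                            for key, value in task.constraints.items())
    return has_room and meets_constraints


def schedule(pending: deque, machines: list) -> dict:
    """One scan of the pending queue; returns task name -> machine name."""
    placements = {}
    for _ in range(len(pending)):
        task = pending.popleft()
        # First fit is a simplification chosen for this sketch.
        target = next((m for m in machines if feasible(task, m)), None)
        if target is None:
            pending.append(task)  # no feasible machine: the task stays pending
            continue
        target.free_cpu -= task.cpu
        target.free_ram_gb -= task.ram_gb
        placements[task.name] = target.name
    return placements


machines = [
    Machine("m1", free_cpu=4, free_ram_gb=16, attributes={"arch": "x86"}),
    Machine("m2", free_cpu=8, free_ram_gb=32, attributes={"arch": "arm"}),
]
pending = deque([
    Task("web/0", cpu=2, ram_gb=8, constraints={"arch": "arm"}),
    Task("batch/0", cpu=3, ram_gb=8),
    Task("batch/1", cpu=16, ram_gb=8),
])
placements = schedule(pending, machines)
print(placements)
print([t.name for t in pending])
```

Note that the unit of placement is the task, matching the remark that the scheduler operates primarily on tasks rather than jobs; a task with no feasible machine simply waits for a later scan.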
 
Borglet: The Borglet is a local Borg agent that is present on every machine in a cell. It starts and stops tasks; restarts them if they fail; manages local resources by manipulating OS kernel settings; rolls over debug logs; and reports the state of the machine to the Borgmaster and other monitoring systems.
 
Some small details
The vast majority of the Borg workload does not run inside virtual machines.
Borg writes the task's hostname and port into a consistent, highly-available file in Chubby.
All components of Borg are written in C++.
A key design feature in Borg is that already-running tasks continue to run even if the Borgmaster or a task's Borglet goes down.

Performance
Impressive across the board.

And finally, the plug for K8s:
The Kubernetes architecture goes further: it has an API server at its core that is responsible only for processing requests and manipulating the underlying state objects. The cluster management logic is built as small, composable micro-services that are clients of this API server, such as the replication controller, which maintains the desired number of replicas of a pod in the face of failures, and the node controller, which manages the machine lifecycle.  
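
The "maintains the desired number of replicas" part of that quote boils down to a reconcile step: compare desired state with observed state and emit the actions that close the gap. A minimal sketch of that idea, assuming a made-up `reconcile` function and action tuples (this is not the Kubernetes API or its controller code):

```python
def reconcile(desired_replicas: int, running: list) -> list:
    """Return the actions needed to move from the running pods to the desired count."""
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few replicas: ask for that many new pods.
        return [("create", None)] * diff
    if diff < 0:
        # Too many replicas: delete the surplus (which ones is arbitrary here).
        return [("delete", name) for name in running[:-diff]]
    return []


print(reconcile(3, ["pod-a"]))
print(reconcile(1, ["pod-a", "pod-b", "pod-c"]))
print(reconcile(2, ["pod-a", "pod-b"]))
```

A real controller would run this kind of step repeatedly against the state held by the API server, which is why a failed pod is eventually replaced without anyone asking for it.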
