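Reading notes: Large-scale cluster management at Google with Borg

Introduction:
  A classic paper about Borg, the scheduling platform Google uses in-house. I suspect it was also a bit of bait on Google's part. Back when people first learned that Borg existed, lots of them were calling, all over the place, for Google to open-source it or at least describe it in more detail. Google seized the moment and launched Kubernetes: "Borg isn't open source, but hey, we have open-sourced Kubernetes, a newer and more general system built on that foundation. Come use it, everyone, pleeeease!" And so Kubernetes took off.
   The most impressive thing about Borg is that it runs long-running services and batch jobs together; according to the paper, this avoids needing roughly 20–30% more machines, which is a big deal. In the paper's own words: "Since many other organizations run user-facing and batch jobs in separate clusters, we examined what would happen if we did the same. Figure 5 shows that segregating prod and non-prod work would need 20–30% more machines in the median cell to run our workload." In plain terms: everyone else runs user-facing and batch jobs in separate clusters, so we tried doing the same to see what would happen. We tried it, and oh my goodness, it took 20–30% more machines.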
Purpose
    Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
 
Benefits:
   1, Hides the details of resource management and failure handling.
   2, Operates with very high reliability and availability, and supports applications that do the same.
   3, Lets users run workloads across tens of thousands of machines.

Concepts:
1, Borg cell: a set of machines that are managed as a unit.
2, Workload: Borg cells run a heterogeneous workload with two main parts.
The first is long-running services that should “never” go down, and handle short-lived latency-sensitive requests (a few ms to a few hundred ms). Such services are used for end-user-facing products such as Gmail, Google Docs, and web search, and for internal infrastructure services (e.g., BigTable).
The second is batch jobs that take from a few seconds to a few days to complete; these are much less sensitive to short-term performance fluctuations.
3, Cluster: The machines in a cell belong to a single cluster, defined by the high-performance datacenter-scale network fabric that connects them. A cluster lives inside a single datacenter building, and a collection of buildings makes up a site.
4, Jobs: A Borg job's properties include its name, owner, and the number of tasks it has. Jobs can have constraints to force its tasks to run on machines with particular attributes such as processor architecture, OS version, or an external IP address.
5, Task: Each task maps to a set of Linux processes running in a container on a machine.
6, Alloc: A Borg alloc (short for allocation) is a reserved set of resources on a machine in which one or more tasks can be run; the resources remain assigned whether or not they are used.
7, Quota: Quota is used to decide which jobs to admit for scheduling. Quota is expressed as a vector of resource quantities (CPU, RAM, disk, etc.) at a given priority, for a period of time (typically months).
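To make the job/quota concepts above concrete, here is a minimal Python sketch of a job description and a quota admission check. Everything in it is an illustrative assumption of these notes (the field names, the two-resource vector, the units, and the rule that a job must fit inside the quota granted to its owner at exactly its priority); it is not Borg's real schema or admission logic.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the Borg concepts listed above.
# Field names, units and the two-dimensional resource vector are assumptions.

@dataclass(frozen=True)
class Resources:
    cpu: float      # cores (assumed unit)
    ram_gib: float  # GiB (assumed unit)

    def __add__(self, other):
        return Resources(self.cpu + other.cpu, self.ram_gib + other.ram_gib)

    def scale(self, n):
        return Resources(self.cpu * n, self.ram_gib * n)

    def fits_in(self, limit):
        # Every dimension of the vector must be within the limit.
        return self.cpu <= limit.cpu and self.ram_gib <= limit.ram_gib


@dataclass
class Job:
    name: str
    owner: str
    task_count: int
    per_task: Resources
    priority: int
    # Placement constraints, e.g. {"arch": "x86_64"}; used by the scheduler,
    # not by quota admission.
    constraints: dict = field(default_factory=dict)

    def total_request(self):
        return self.per_task.scale(self.task_count)


@dataclass
class QuotaLedger:
    # Quota granted per (owner, priority), and what admitted jobs already use.
    granted: dict
    used: dict = field(default_factory=dict)

    def admit(self, job):
        """Admit the job only if its total request fits in the remaining quota."""
        key = (job.owner, job.priority)
        limit = self.granted.get(key)
        if limit is None:
            return False  # no quota at this priority: rejected before scheduling
        proposed = self.used.get(key, Resources(0, 0)) + job.total_request()
        if not proposed.fits_in(limit):
            return False
        self.used[key] = proposed
        return True
```

For example, with 100 CPUs and 400 GiB granted to `team-a` at priority 200, a 10-task job asking for 4 CPUs and 16 GiB per task is admitted, after which a 20-task job of the same shape is rejected because 120 CPUs would exceed the quota.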
 
Architecture
Borgmaster: Each cell's Borgmaster consists of two processes: the main Borgmaster process and a separate scheduler (§3.2). The main Borgmaster process handles client RPCs that either mutate state (e.g., create job) or provide read-only access to data (e.g., lookup job). It also manages state machines for all of the objects in the system (machines, tasks, allocs, etc.), communicates with the Borglets, and offers a web UI as a backup to Sigma.
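The Borgmaster keeps a state machine for every object it manages. As an illustration only, here is a minimal per-task state machine in Python. The three states (PENDING, RUNNING, DEAD) and the event names are simplifying assumptions of these notes, not Borg's actual interface; the point is that every mutation is an explicit, validated transition.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DEAD = "dead"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (TaskState.PENDING, "schedule"): TaskState.RUNNING,
    (TaskState.PENDING, "kill"): TaskState.DEAD,
    (TaskState.RUNNING, "evict"): TaskState.PENDING,         # goes back to the queue
    (TaskState.RUNNING, "machine_lost"): TaskState.PENDING,  # rescheduled elsewhere
    (TaskState.RUNNING, "finish"): TaskState.DEAD,
    (TaskState.RUNNING, "kill"): TaskState.DEAD,
}


class TaskRecord:
    def __init__(self, name):
        self.name = name
        self.state = TaskState.PENDING  # a newly submitted task starts out pending
        self.history = [self.state]

    def apply(self, event):
        """Apply an event; reject transitions the table does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```

A task that is scheduled, evicted, rescheduled and then finishes walks PENDING, RUNNING, PENDING, RUNNING, DEAD; any further event on the dead task raises `ValueError`.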
 
Scheduling: When a job is submitted, the Borgmaster records it persistently in the Paxos store and adds the job's tasks to the pending queue. This is scanned asynchronously by the scheduler, which assigns tasks to machines if there are sufficient available resources that meet the job's constraints. (The scheduler primarily operates on tasks, not jobs.)
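The scheduling step above can be sketched as one pass over a pending queue: for each task, keep only the machines with enough free resources that also satisfy the constraints, then pick one. The feasibility check follows the description in the notes; the "best fit" scoring (prefer the machine left with the least free CPU) and all names are assumptions of this sketch, a stand-in for Borg's real scoring. Note that it places tasks one at a time, not whole jobs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cpu: float
    free_ram: float
    attrs: dict = field(default_factory=dict)  # e.g. {"arch": "arm64"}


@dataclass
class PendingTask:
    job: str
    index: int
    cpu: float
    ram: float
    constraints: dict = field(default_factory=dict)


def feasible(task, m):
    # Enough free resources, and every constraint matches a machine attribute.
    if task.cpu > m.free_cpu or task.ram > m.free_ram:
        return False
    return all(m.attrs.get(k) == v for k, v in task.constraints.items())


def score(task, m):
    # Assumed scoring: best fit, i.e. the smallest CPU leftover scores highest.
    return -(m.free_cpu - task.cpu)


def schedule_pass(pending, machines):
    """Scan the pending queue once; return {task_id: machine_name}.

    Tasks that cannot be placed are put back on the queue for a later pass.
    """
    placements = {}
    still_pending = deque()
    while pending:
        task = pending.popleft()
        candidates = [m for m in machines if feasible(task, m)]
        if not candidates:
            still_pending.append(task)
            continue
        best = max(candidates, key=lambda m: score(task, m))
        best.free_cpu -= task.cpu  # reserve the resources on the chosen machine
        best.free_ram -= task.ram
        placements[f"{task.job}.{task.index}"] = best.name
    pending.extend(still_pending)
    return placements
```

With an x86 machine (8 CPUs) and an arm64 machine (4 CPUs), a 2-CPU task constrained to arm64 lands on the arm64 machine, a 6-CPU unconstrained task lands on the x86 one, and a 16-CPU task stays in the queue.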
 
Borglet: The Borglet is a local Borg agent that is present on every machine in a cell. It starts and stops tasks; restarts them if they fail; manages local resources by manipulating OS kernel settings; rolls over debug logs; and reports the state of the machine to the Borgmaster and other monitoring systems.
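One of the Borglet duties above, "restarts them if they fail", can be illustrated with a tiny local supervisor. This is a hypothetical sketch, not Borglet code: it treats a non-zero exit code as a failure and uses a fixed restart budget (`max_restarts`, an assumed parameter); resource management, log rotation and state reporting are not modeled.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd and restart it whenever it exits non-zero.

    Gives up after max_restarts restarts. Returns (final_exit_code, starts).
    """
    starts = 0
    while True:
        starts += 1
        code = subprocess.run(cmd).returncode
        if code == 0 or starts > max_restarts:
            return code, starts


if __name__ == "__main__":
    # A "task" that always fails: started once, then restarted twice.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2))
```

A real agent would also back off between restarts and report each crash upstream; the loop here only shows the restart-on-failure decision.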
 
Some small details
The vast majority of the Borg workload does not run inside virtual machines.
Borg writes the task's hostname and port into a consistent, highly-available file in Chubby.
All components of Borg are written in C++.
A key design feature in Borg is that already-running tasks continue to run even if the Borgmaster or a task's Borglet goes down.
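Writing each task's hostname and port into Chubby is essentially a name service: clients look a task up by name instead of by machine. Below is a minimal in-memory stand-in for that idea. The dictionary replaces Chubby, and the `index.job.user.cell` key format is a hypothetical naming scheme chosen for illustration; neither is Chubby's API nor a confirmed Borg format.

```python
class NameRegistry:
    """In-memory stand-in (assumption) for the Chubby file holding host:port."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(cell, user, job, index):
        # Hypothetical naming scheme used only for this illustration.
        return f"{index}.{job}.{user}.{cell}"

    def register(self, cell, user, job, index, host, port):
        # Re-registering overwrites the old address, e.g. after the task moves.
        key = self._key(cell, user, job, index)
        self._entries[key] = f"{host}:{port}"
        return key

    def resolve(self, cell, user, job, index):
        return self._entries.get(self._key(cell, user, job, index))

    def deregister(self, cell, user, job, index):
        self._entries.pop(self._key(cell, user, job, index), None)
```

When a task is rescheduled onto another machine, its new host and port simply overwrite the old entry, so clients that resolve by name follow it automatically.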

Performance
Impressive across the board.

And finally, the plug for K8s:
The Kubernetes architecture goes further: it has an API server at its core that is responsible only for processing requests and manipulating the underlying state objects. The cluster management logic is built as small, composable micro-services that are clients of this API server, such as the replication controller, which maintains the desired number of replicas of a pod in the face of failures, and the node controller, which manages the machine lifecycle.  
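The replication controller mentioned above is a reconcile loop: compare the desired number of replicas with what is running, then start or stop pods to close the gap. The sketch below is a hypothetical, self-contained illustration of that loop. A real controller observes and changes state through the API server; here the state lives in a local set, and the choice of which pod to stop (`max` by name) is arbitrary.

```python
from itertools import count


class ReplicaSetSketch:
    """Toy reconcile loop in the style of a replication controller."""

    def __init__(self, desired):
        self.desired = desired
        self.running = set()
        self._ids = count()  # source of fresh pod names

    def reconcile(self):
        """One pass: start or stop pods until running matches desired."""
        started, stopped = [], []
        while len(self.running) < self.desired:
            pod = f"pod-{next(self._ids)}"
            self.running.add(pod)
            started.append(pod)
        while len(self.running) > self.desired:
            pod = max(self.running)  # arbitrary but deterministic victim
            self.running.remove(pod)
            stopped.append(pod)
        return started, stopped


if __name__ == "__main__":
    rc = ReplicaSetSketch(desired=3)
    print(rc.reconcile())           # starts pod-0, pod-1, pod-2
    rc.running.discard("pod-1")     # simulate a pod failure
    print(rc.reconcile())           # replaces it with pod-3
```

Because the loop only looks at desired versus observed state, the same code handles failures, scale-up and scale-down, which is why this pattern composes well as a small client of the API server.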
