Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

 

Why Mesos?

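There are more and more compute frameworks, each with its own use cases, strengths, and weaknesses: Hadoop, MPI, Pregel, Spark, and so on. 
An organization therefore often has to run several frameworks to meet different needs, and running each framework on its own cluster is inconvenient. 
That many clusters waste a lot of resources, and the big data sets being processed have to be copied back and forth between clusters.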

Mesos addresses this by letting different frameworks share a single cluster. 

Existing solutions for sharing a cluster:

1. Statically partition the cluster and run one framework per partition, so that the partitions do not interfere with each other.  
2. Allocate a set of VMs to each framework, that is, use virtualization.

Unfortunately, these solutions achieve neither high utilization nor efficient data sharing. 
The main problem is the mismatch between the allocation granularities of these solutions and of existing frameworks. 
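In other words, these approaches do poorly on both utilization and data sharing because they share resources at too coarse a granularity, which does not match how existing compute frameworks allocate resources. 
Hadoop, for example, allocates resources down to the level of a slot, and one instance (node) can contain multiple slots.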

In this paper, we propose Mesos, a thin resource sharing layer that enables fine-grained sharing across diverse cluster computing frameworks, by giving frameworks a common interface for accessing cluster resources. 
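The defining feature of Mesos is fine-grained resource sharing across different compute frameworks, which it achieves by providing a common interface for accessing cluster resources.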

 

Mesos Architecture

Design Philosophy

Because cluster frameworks are both highly diverse and rapidly evolving, 
our overriding design philosophy has been to define a minimal interface that enables efficient resource sharing across frameworks, 
and otherwise push control of task scheduling and execution to the frameworks.

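The design philosophy in one sentence: keep it simple. 
First, to cope with the large differences between frameworks, Mesos provides only a minimal, simple interface for sharing resources. 
Second, each framework is itself responsible for scheduling and executing its tasks.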

 

Architecture Overview

Mesos consists of a master process that manages slave daemons running on each cluster node, and frameworks that run tasks on these slaves.

The master implements fine-grained sharing across frameworks using resource offers.
Each resource offer is a list of free resources on multiple slaves.

The master decides how many resources to offer to each framework according to an organizational policy, such as fair sharing or priority.

Each framework running on Mesos consists of two components: 
a scheduler that registers with the master to be offered resources, and 
an executor process that is launched on slave nodes to run the framework’s tasks.

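Mesos is built around a master that manages the slaves. The policy for dividing resources among frameworks, such as fair sharing or priority, is configured on the master. 
A master-based design has to deal with the single point of failure, so Mesos uses ZooKeeper to manage the master and ensure failover.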
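
To make the two policies mentioned above concrete, here is a minimal sketch of how a master might choose which framework receives the next offer. It is an illustration only, not Mesos code: the `Framework` class, its fields, and the CPU counts are all hypothetical, and fair sharing is reduced to "the framework holding the fewest CPUs goes first".

```python
from dataclasses import dataclass


@dataclass
class Framework:
    # All fields are hypothetical, chosen only for this illustration.
    name: str
    allocated_cpus: int  # CPUs the framework currently holds
    priority: int        # larger value means more important


def pick_fair(frameworks):
    """Fair sharing (simplified): offer to the framework holding the fewest CPUs."""
    return min(frameworks, key=lambda f: f.allocated_cpus)


def pick_priority(frameworks):
    """Strict priority: offer to the framework with the highest priority."""
    return max(frameworks, key=lambda f: f.priority)


frameworks = [
    Framework("hadoop", allocated_cpus=6, priority=5),
    Framework("mpi", allocated_cpus=2, priority=1),
    Framework("spark", allocated_cpus=4, priority=3),
]

print(pick_fair(frameworks).name)      # mpi
print(pick_priority(frameworks).name)  # hadoop
```

Under these assumed numbers the two policies disagree: fair sharing favors the framework with the smallest current allocation, while priority ignores current allocations entirely.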

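A "resource offer" is simply a list of available resources. 
Each Mesos slave reports its free resources to the Mesos master, and the master passes them on to the framework schedulers as resource offers. 
Each scheduler decides, based on its own needs, whether to assign tasks to the offered slaves. The tasks are run by the framework's executor, and Mesos does not need to know how they are executed.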


 

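A concrete example: 
1. A slave (s1) reports its free resources to the master: 4 CPUs and 4 GB of RAM. 
2. The master offers these resources to the registered schedulers. 
3. A scheduler checks its list of pending tasks, finds that task1 and task2 can run on s1, and sends a request to the master. 
4. The master tells framework1's executor on s1 to run those tasks.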
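
The four steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not the Mesos API: the class and method names (`Master.report`, `Scheduler.resource_offer`, `Executor.launch`) are hypothetical, and the per-task resource demands are made up, since the example only gives the size of the offer (4 CPUs, 4 GB).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: int
    mem_gb: int


@dataclass
class Offer:
    slave: str
    cpus: int
    mem_gb: int


class Scheduler:
    """Framework-side scheduler: picks pending tasks that fit an offer."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offer(self, offer):
        accepted, cpus, mem = [], offer.cpus, offer.mem_gb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_gb <= mem:
                accepted.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_gb
        return accepted  # an empty list means the offer is declined


class Executor:
    """Framework-side executor: here it only records what it was asked to run."""

    def __init__(self):
        self.ran = []

    def launch(self, task):
        self.ran.append(task.name)


class Master:
    def __init__(self):
        self.free = {}  # slave name -> (free cpus, free mem_gb)

    def report(self, slave, cpus, mem_gb):
        # Step 1: a slave reports its free resources.
        self.free[slave] = (cpus, mem_gb)

    def offer_to(self, scheduler, executors):
        for slave, (cpus, mem) in list(self.free.items()):
            # Steps 2 and 3: make the offer, get back the tasks to launch.
            tasks = scheduler.resource_offer(Offer(slave, cpus, mem))
            for task in tasks:
                # Step 4: hand each task to the framework's executor on that slave.
                executors[slave].launch(task)
                cpus -= task.cpus
                mem -= task.mem_gb
            self.free[slave] = (cpus, mem)


master = Master()
master.report("s1", 4, 4)
scheduler = Scheduler([
    Task("task1", cpus=2, mem_gb=1),  # hypothetical demands
    Task("task2", cpus=1, mem_gb=2),
    Task("task3", cpus=4, mem_gb=4),  # too large for what is left on s1
])
executors = {"s1": Executor()}
master.offer_to(scheduler, executors)

print(executors["s1"].ran)  # ['task1', 'task2']
print(master.free["s1"])    # (1, 1)
```

Note that the placement decision lives entirely in `Scheduler.resource_offer`; the master only tracks free resources and forwards tasks, which mirrors the split of responsibilities described in the design philosophy.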


 

Isolation

Mesos provides performance isolation between framework executors running on the same slave by leveraging existing OS isolation mechanisms.

We currently isolate resources using OS container technologies, specifically Linux Containers and Solaris Projects.

These technologies can limit the CPU, memory, network bandwidth, and (in new Linux kernels) I/O usage of a process tree.


This article is excerpted from cnblogs (博客园); original publication date: 2013-03-27 
