Notes: Ceph: A Scalable, High-Performance Distributed File System


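This is the classic paper on Ceph. Ceph is a very popular storage system today. Unlike HDFS, which mainly targets big-data applications, Ceph aims to be a general-purpose storage solution that supports object storage, block storage, and a file system equally well. Many OpenStack private clouds now build their storage on Ceph, and Ceph itself grew out of this paper.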

Abstract
The abstract states Ceph's mission clearly:
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability.
as well as its key methods and techniques:
Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs).
We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system.
A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads.
and then the performance:
Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
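
To make the idea of "replacing allocation tables with a pseudo-random data distribution function" more concrete, here is a minimal Python sketch. It is not CRUSH: it uses plain rendezvous (highest-random-weight) hashing as a much simpler stand-in, ignoring the device weights and cluster hierarchy a real placement function would need. The names `place` and `_score`, the `osdN` labels, and the object name are all made up for illustration. The point is only that any party that knows the object name and the list of OSDs can compute the replica locations, with no lookup table.

```python
from __future__ import annotations

import hashlib


def _score(object_name: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct OSDs for an object by ranking the scores.

    Rendezvous hashing, used here as a stand-in for CRUSH (an assumption
    of this sketch, not the paper's algorithm).
    """
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd{i}" for i in range(6)]  # hypothetical OSD names
    print(place("2a.00000000", cluster))
    # Adding a device only moves the objects that now rank the new OSD
    # among their top choices; every other placement stays the same.
    print(place("2a.00000000", cluster + ["osd6"]))
```

Because the result depends only on the inputs, every client computes the same answer independently, and growing the cluster relocates only a fraction of the objects. That is the property that lets a system of this kind drop per-file allocation tables.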

Introduction:
It starts by describing the problems with NFS and with traditional OSD-based designs.
It then introduces Ceph:
We present Ceph, a distributed file system that provides excellent performance and reliability while promising unparalleled scalability.
This sentence is key: Our architecture is based on the assumption that systems at the petabyte scale are inherently dynamic: large systems are inevitably built incrementally, node failures are the norm rather than the exception, and the quality and character of workloads are constantly shifting over time.
Ceph's architecture is as follows:

System overview:
Ceph has three parts:
the client, each instance of which exposes a near-POSIX file system interface to a host or process;
a cluster of OSDs, which collectively stores all data and metadata;
A metadata server cluster, which manages the namespace (file names and directories) while coordinating security, consistency and coherence (see Figure 1).
As shown in the figure below:
[screenshot]

Main approaches:
Decoupled Data and Metadata
Dynamic Distributed Metadata Management
Reliable Autonomic Distributed Object Storage
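
As a rough illustration of the first item, decoupled data and metadata, the sketch below shows a client that asks a metadata server only for an inode number and file size, then works out on its own which objects hold the data and where they live. Everything here is hypothetical: the `MetadataServer` class, the `ino.stripe` naming format, the 4 MiB object size, and the modulo-based `locate` function (a deliberately naive stand-in for a real placement function such as CRUSH).

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical MDS: knows paths, inode numbers and sizes, not block lists."""

    namespace: dict[str, tuple[int, int]] = field(default_factory=dict)

    def open(self, path: str) -> tuple[int, int]:
        """Return (inode number, file size in bytes) for a path."""
        return self.namespace[path]


def object_names(ino: int, size: int, object_size: int) -> list[str]:
    """Derive object names from the inode number and the stripe index."""
    count = -(-size // object_size)  # ceiling division
    return [f"{ino:x}.{n:08x}" for n in range(count)]


def locate(name: str, osds: list[str]) -> str:
    """Naive stand-in for a placement function: hash the name onto one OSD."""
    digest = hashlib.sha256(name.encode()).digest()
    return osds[int.from_bytes(digest[:8], "big") % len(osds)]


if __name__ == "__main__":
    mds = MetadataServer({"/data/run1.bin": (0x2A, 10 * 2**20)})
    osds = [f"osd{i}" for i in range(4)]  # hypothetical OSD names

    ino, size = mds.open("/data/run1.bin")  # the only metadata round trip
    for name in object_names(ino, size, object_size=4 * 2**20):
        print(name, "->", locate(name, osds))  # computed locally by the client
```

In this sketch the metadata server never has to store or hand out a per-file list of blocks; after a single `open` call the client can address the storage devices directly.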

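The later sections describe the implementation of each part in detail. They contain no particularly deep formulas or theory, so most readers should be able to follow them, and they are quite interesting.
Link to the original paper: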
http://www.ece.eng.wayne.edu/~sjiang/ECE7650-winter-15/topic5B-S.pdf
If the download fails, try searching for the paper on Baidu Scholar.
