Deploying the Object Storage Gateway (rgw) Component on Ceph Reef (18.2.X)

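Summary: This article covers configuring and using the object storage system in Ceph Reef (18.2.X): an overview of the object storage gateway, its core resources, the interfaces Ceph RGW supports, deploying a highly available radosgw, using the s3cmd tool, and accessing object storage over HTTP.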

                                              Author: 尹正杰
Copyright notice: original work. Reproduction is not permitted; violations will be pursued under the law.

I. Object storage system overview

1. Object storage gateway overview

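The Ceph Object Gateway can store its data in the same Ceph storage cluster that holds data from CephFS clients or Ceph RBD clients.

An object is the basic unit of data storage in an object storage system. Each object combines data with a set of data attributes, and those attributes can be set according to application needs, including data distribution and quality of service.

Each object maintains its own attributes, which simplifies management of the storage system. Objects can vary in size and can even hold entire data structures such as files or database table entries. When files are uploaded with s3cmd, they are split into parts of 15 MB by default.

Ceph object storage uses the Ceph Object Gateway daemon (RADOS Gateway, abbreviated rgw), an HTTP server for interacting with the Ceph storage cluster.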

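Ceph RGW is built on librados and gives applications a RESTful object storage interface. Older releases embedded Civetweb as the default web server; newer releases, including Reef, use the Beast frontend instead.

In the N (Nautilus) release the gateway served on port 7480 by default, while this R (Reef, 18.2.4) deployment serves on port 80. A custom port requires changing the Ceph configuration.
    - Since version 0.80 (Firefly, 2014-05-01 to 2016-04-01), Ceph no longer relies on Apache and FastCGI to provide the radosgw service;
    - Instead, Civetweb was embedded in the ceph-radosgw process by default. This implementation is lighter and simpler, but Civetweb only began to support SSL as of Ceph 11.0.1 Kraken (2017-01-01 to 2017-08-01).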

Recommended reading:
    https://docs.ceph.com/en/nautilus/radosgw/
    https://docs.ceph.com/en/nautilus/radosgw/bucketpolicy/
    https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/bucketnamingrules.html
    https://www.s3express.com/help/help.html

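2. Core resources of an object storage system

Storage solutions differ in design and implementation, but most object storage systems expose much the same core resource types.

Generally, the core resources of an object storage system are users (User), buckets (Bucket), and objects (Object). They relate as follows:
    - 1. A User stores Objects in a Bucket on the storage system;
    - 2. A bucket belongs to a user and holds objects; one bucket can store many objects;
    - 3. One user can own multiple buckets, and, where the system scopes bucket names per user, different users may use the same bucket name;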

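3. Interfaces supported by Ceph RGW

RGW needs its own daemon in order to work. RGW is not a mandatory interface: an RGW instance only has to be deployed when an S3- or Swift-compatible RESTful interface is required. When RGW is created, it initializes its own storage pools automatically.

Because RGW provides interfaces compatible with OpenStack Swift and Amazon S3, the Ceph Object Gateway has its own user management.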
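    - Amazon S3:
        Compatible with the Amazon S3 RESTful API; the emphasis is on command-line operation.
        It provides user, bucket, and object, representing the user, the bucket, and the object; a bucket belongs to a user.
        The user name can therefore serve as the namespace for buckets, and different users may use the same bucket name.

    - OpenStack Swift:
        Compatible with the OpenStack Swift API; the emphasis is on use from application code.
        It provides user, container, and object, corresponding to the user, the bucket, and the object. It additionally gives user a parent component, account, which represents a project or tenant.
        An account can therefore contain one or more users, which share the same set of containers; the account provides the namespace for containers.

    - RadosGW:
       It provides user, subuser, bucket, and object. user corresponds to the S3 user and subuser to the Swift user. Neither user nor subuser provides a namespace for buckets, so buckets of different users cannot share a name.
       However, since the Jewel release (10.2.11, 2016-04-01 to 2018-07-01), RadosGW has offered tenant to provide a namespace for users and buckets; it is an optional component.
       Before Jewel, all radosgw users lived in a single namespace: every user ID had to be unique, and even buckets owned by different users could not share a bucket ID.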
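
The naming behavior described for RadosGW can be sketched as a lookup keyed by (tenant, bucket name). This is only a toy model to illustrate the rule; the class and method names are made up for illustration and say nothing about how RGW actually stores its metadata.

```python
from typing import Dict, Tuple


class BucketRegistry:
    """Toy model of bucket-name uniqueness (hypothetical, not RGW internals)."""

    def __init__(self) -> None:
        # Maps (tenant, bucket_name) -> owning user ID.
        self._owners: Dict[Tuple[str, str], str] = {}

    def create(self, user: str, bucket: str, tenant: str = "") -> bool:
        """Register a bucket; return False if the name is taken in that namespace."""
        key = (tenant, bucket)
        if key in self._owners:
            return False
        self._owners[key] = user
        return True


registry = BucketRegistry()
print(registry.create("alice", "logs"))                  # first bucket in the default namespace
print(registry.create("bob", "logs"))                    # same name, same namespace: rejected
print(registry.create("bob", "logs", tenant="team-b"))   # separate tenant namespace: accepted
```

Without a tenant, every bucket shares the empty namespace, which is why two users collide on the same name; a tenant simply adds a second component to the key.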

II. Hands-on: a highly available radosgw

1. Check the cluster status before deployment

[root@ceph141 ~]# ceph -s
  cluster:
    id:     3cb12fba-5f6e-11ef-b412-9d303a22b70f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 11m)
    mgr: ceph141.cwgrgj(active, since 10m), standbys: ceph142.ymuzfe
    mds: 1/1 daemons up, 1 standby
    osd: 7 osds: 7 up (since 11m), 7 in (since 16h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 48 objects, 492 KiB
    usage:   329 MiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     65 active+clean

[root@ceph141 ~]#

2. Create the service

[root@ceph141 ~]# ceph orch apply rgw yinzhengjie
Scheduled rgw.yinzhengjie update...
[root@ceph141 ~]#

3. Deploy the rgw daemon

[root@ceph141 ~]# ceph orch daemon add  rgw  yinzhengjie ceph142
Deployed rgw.yinzhengjie.ceph141.csxaif on host 'ceph142'
[root@ceph141 ~]#

4. Verify that the rgw daemon was deployed

[root@ceph141 ~]# ceph -s
  cluster:
    id:     3cb12fba-5f6e-11ef-b412-9d303a22b70f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 23m)
    mgr: ceph141.cwgrgj(active, since 23m), standbys: ceph142.ymuzfe
    mds: 1/1 daemons up, 1 standby
    osd: 7 osds: 7 up (since 23m), 7 in (since 16h)
    rgw: 1 daemon active (1 hosts, 1 zones)  # note the new rgw service

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 193 pgs
    objects: 274 objects, 499 KiB
    usage:   430 MiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     193 active+clean

[root@ceph141 ~]#

5. Inspect the storage pools that rgw creates by default

[root@ceph141 ~]# ceph osd pool ls
...
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
[root@ceph141 ~]# 
[root@ceph141 ~]# radosgw-admin zone get --rgw-zone=default --rgw-zonegroup=default
{
    "id": "10c61974-a41b-438d-ac2e-942b00e11d53",
    "name": "default",
    "domain_root": "default.rgw.meta:root",
    "control_pool": "default.rgw.control",
    "gc_pool": "default.rgw.log:gc",
    "lc_pool": "default.rgw.log:lc",
    "log_pool": "default.rgw.log",
    "intent_log_pool": "default.rgw.log:intent",
    "usage_log_pool": "default.rgw.log:usage",
    "roles_pool": "default.rgw.meta:roles",
    "reshard_pool": "default.rgw.log:reshard",
    "user_keys_pool": "default.rgw.meta:users.keys",
    "user_email_pool": "default.rgw.meta:users.email",
    "user_swift_pool": "default.rgw.meta:users.swift",
    "user_uid_pool": "default.rgw.meta:users.uid",
    "otp_pool": "default.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "default.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    "realm_id": "",
    "notif_pool": "default.rgw.log:notif"
}
[root@ceph141 ~]#
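
The zone document above refers to more pools than `ceph osd pool ls` listed at this point, so it can help to extract the pool names programmatically and compare. The following is a minimal sketch, assuming the JSON has the shape shown above; the helper name `pools_in_zone` is hypothetical, and the sample is an abbreviated copy of the output.

```python
import json
from typing import Any, Set


def pools_in_zone(zone: Any) -> Set[str]:
    """Collect pool names from a zone document, dropping any ':namespace' suffix."""
    pools: Set[str] = set()

    def walk(key: str, value: Any) -> None:
        if isinstance(value, dict):
            for child_key, child_value in value.items():
                walk(child_key, child_value)
        elif isinstance(value, list):
            for item in value:
                walk(key, item)
        elif isinstance(value, str) and value:
            # Fields such as "gc_pool" hold "pool:namespace"; "domain_root" does too.
            if key.endswith("_pool") or key == "domain_root":
                pools.add(value.split(":", 1)[0])

    walk("", zone)
    return pools


# Abbreviated sample taken from the `radosgw-admin zone get` output above.
sample = json.loads("""
{
  "domain_root": "default.rgw.meta:root",
  "control_pool": "default.rgw.control",
  "gc_pool": "default.rgw.log:gc",
  "otp_pool": "default.rgw.otp",
  "placement_pools": [
    {"key": "default-placement",
     "val": {"index_pool": "default.rgw.buckets.index",
             "storage_classes": {"STANDARD": {"data_pool": "default.rgw.buckets.data"}},
             "data_extra_pool": "default.rgw.buckets.non-ec"}}
  ]
}
""")

for name in sorted(pools_in_zone(sample)):
    print(name)
```

Pools that appear here but not in `ceph osd pool ls` (for example the `default.rgw.buckets.*` pools) presumably do not exist yet at this stage of the walkthrough.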

6. List the services deployed in the Ceph cluster

[root@ceph141 ~]# ceph  orch ls
NAME                  PORTS        RUNNING  REFRESHED  AGE  PLACEMENT    
alertmanager          ?:9093,9094      1/1  5m ago     46h  count:1      
ceph-exporter                          3/3  7m ago     46h  *            
crash                                  3/3  7m ago     46h  *            
grafana               ?:3000           1/1  5m ago     46h  count:1      
mds.oldboyedu-cephfs                   2/2  5m ago     18h  count:2      
mgr                                    2/2  5m ago     46h  count:2      
mon                                    3/5  7m ago     46h  count:5      
node-exporter         ?:9100           3/3  7m ago     46h  *            
osd                                      7  7m ago     -    <unmanaged>  
prometheus            ?:9095           1/1  5m ago     46h  count:1      
rgw.yinzhengjie       ?:80             1/1  5m ago     5m   ceph142      
[root@ceph141 ~]#

7. Access the object storage web endpoint

http://10.0.0.142/

III. Uploading a video with s3cmd and verifying access

1. Install the s3cmd tool

[root@ceph141 ~]# echo 10.0.0.142 www.yinzhengjie.com >> /etc/hosts
[root@ceph141 ~]# 
[root@ceph141 ~]# apt -y install s3cmd

2. Create an rgw user

[root@ceph141 ~]# radosgw-admin user create --uid "jasonyin" --display-name "尹正杰"
{
    "user_id": "jasonyin",
    "display_name": "尹正杰",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "jasonyin",
            "access_key": "ZHOE7MVPLJFE5EIU738W",  # keep this safe; it is needed below
            "secret_key": "VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE"   # keep this safe; it is needed below
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

[root@ceph141 ~]#
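
If the credentials are needed in a script, they can be read from the JSON that `radosgw-admin user create` prints (the real output carries no `#` annotations). This is a small sketch under that assumption; the function name and the placeholder keys are hypothetical.

```python
import json
from typing import Dict


def s3_credentials(user_json: str) -> Dict[str, str]:
    """Return the first S3 key pair from a radosgw-admin user document."""
    document = json.loads(user_json)
    first_key = document["keys"][0]
    return {
        "access_key": first_key["access_key"],
        "secret_key": first_key["secret_key"],
    }


# Placeholder document with the same structure as the output above.
example = """
{
  "user_id": "jasonyin",
  "keys": [
    {"user": "jasonyin",
     "access_key": "EXAMPLE_ACCESS_KEY",
     "secret_key": "EXAMPLE_SECRET_KEY"}
  ]
}
"""
print(s3_credentials(example)["access_key"])
```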

3. Initialize the s3cmd environment and generate the "/root/.s3cfg" configuration file

[root@ceph141 ~]# ll /root/.s3cfg
ls: cannot access '/root/.s3cfg': No such file or directory
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd --configure 

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: ZHOE7MVPLJFE5EIU738W  # access_key of the rgw user
Secret Key: VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE   # secret_key of the rgw user
Default Region [US]:  # just press Enter

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: www.yinzhengjie.com  # address used to reach rgw

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: www.yinzhengjie.com/%(bucket)  # sets the bucket addressing template

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:  # files are not encrypted; just press Enter
Path to GPG program [/usr/bin/gpg]:  # path to a custom gpg program; just press Enter

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No  # answer No unless your rgw serves HTTPS

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:   # proxy server address; no proxy is configured here, so just press Enter

New settings:  # a summary of the values entered above
  Access Key: ZHOE7MVPLJFE5EIU738W
  Secret Key: VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE
  Default Region: US
  S3 Endpoint: www.yinzhengjie.com
  DNS-style bucket+hostname:port template for accessing a bucket: www.yinzhengjie.com/%(bucket)
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y  # enter Y if the values above are correct
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y  # enter y to save; the default is not to save
Configuration saved to '/root/.s3cfg'
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# ll /root/.s3cfg
-rw------- 1 root root 2269 Aug 23 09:59 /root/.s3cfg
[root@ceph141 ~]#
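
For unattended setups, the same settings can be written without the interactive dialog. The sketch below renders a minimal `[default]` section with the values entered above; it covers only a handful of keys (`access_key`, `secret_key`, `host_base`, `host_bucket`, `use_https`), whereas the file s3cmd saves contains many more. The function name and placeholder credentials are hypothetical.

```python
import configparser
import io


def render_s3cfg(access_key: str, secret_key: str, endpoint: str,
                 use_https: bool = False) -> str:
    """Render a minimal s3cmd configuration as text."""
    # RawConfigParser keeps "%(bucket)" literally instead of interpolating it.
    config = configparser.RawConfigParser()
    config.add_section("default")
    config.set("default", "access_key", access_key)
    config.set("default", "secret_key", secret_key)
    config.set("default", "host_base", endpoint)
    # Same template as typed in the dialog above.
    config.set("default", "host_bucket", endpoint + "/%(bucket)")
    config.set("default", "use_https", str(use_https))
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_s3cfg("EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY", "www.yinzhengjie.com"))
```

The rendered text could be saved as `/root/.s3cfg` with mode 0600, matching the permissions s3cmd itself uses in the listing above.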

4. Create a bucket

[root@ceph141 ~]# s3cmd mb s3://yinzhengjie-bucket
Bucket 's3://yinzhengjie-bucket/' created
[root@ceph141 ~]#


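Tip:
    General-purpose bucket naming rules. The following rules apply to general-purpose buckets.
        - 1. Bucket names must be between 3 (min) and 63 (max) characters long.
        - 2. Bucket names can consist only of lowercase letters, numbers, periods (.), and hyphens (-).
        - 3. Bucket names must begin and end with a letter or number.
        - 4. Bucket names must not contain two adjacent periods.
        - 5. Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
        - 6. Bucket names must not start with the prefix xn--.
        - 7. Bucket names must not start with the prefix sthree-.
        - 8. Bucket names must not start with the prefix sthree-configurator.
        - 9. Bucket names must not start with the prefix amzn-s3-demo-.
        - 10. Bucket names must not end with the suffix -s3alias. This suffix is reserved for access point alias names. For details, see "Using a bucket-style alias for your S3 bucket access point" in the AWS documentation.
        - 11. Bucket names must not end with the suffix --ol-s3. This suffix is reserved for Object Lambda Access Point alias names. For details, see "How to use a bucket-style alias for your S3 bucket Object Lambda Access Point" in the AWS documentation.
        - 12. Bucket names must not end with the suffix .mrap. This suffix is reserved for Multi-Region Access Point names. For details, see "Rules for naming Amazon S3 Multi-Region Access Points" in the AWS documentation.
        - 13. Bucket names must not end with the suffix --x-s3. This suffix is reserved for directory buckets. For details, see "Directory bucket naming rules" in the AWS documentation.
        - 14. Bucket names must be unique across all AWS accounts in all AWS Regions within a partition. A partition is a grouping of Regions. AWS currently has three partitions: aws (standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)).
        - 15. A bucket name cannot be used by another AWS account in the same partition until the bucket is deleted.
        - 16. Buckets used with Amazon S3 Transfer Acceleration cannot have periods (.) in their names.

    For best compatibility, avoid periods (.) in bucket names, except for buckets used only for static website hosting. If a bucket name contains periods, virtual-host-style addressing over HTTPS does not work unless you perform your own certificate validation, because the certificates used for virtual hosting of buckets do not cover names that contain periods.

    This limitation does not affect buckets used for static website hosting, because static website hosting is available only over HTTP. For more on virtual-host-style addressing, see "Virtual hosting of buckets"; for more on static website hosting, see "Hosting a static website using Amazon S3" in the AWS documentation.

    Reference:
        https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/bucketnamingrules.html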
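
A subset of the rules above can be checked locally before calling `s3cmd mb`. The following sketch covers rules 1 through 13 only (length, character set, adjacent periods, IP format, reserved prefixes and suffixes). It follows the AWS wording quoted above, and RGW may not enforce every one of these rules itself. The function name is hypothetical.

```python
import re
from typing import List

_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")
# "sthree-" also covers the "sthree-configurator" prefix.
_RESERVED_PREFIXES = ("xn--", "sthree-", "amzn-s3-demo-")
_RESERVED_SUFFIXES = ("-s3alias", "--ol-s3", ".mrap", "--x-s3")


def bucket_name_problems(name: str) -> List[str]:
    """Return the list of violated rules; an empty list means the name passes."""
    problems: List[str] = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only a-z, 0-9, '.', '-' allowed; must start and end with a letter or number")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith(_RESERVED_PREFIXES):
        problems.append("starts with a reserved prefix")
    if name.endswith(_RESERVED_SUFFIXES):
        problems.append("ends with a reserved suffix")
    return problems


for candidate in ("yinzhengjie-bucket", "192.168.5.4", "My_Bucket", "xn--demo"):
    print(candidate, bucket_name_problems(candidate))
```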

5. List buckets

[root@ceph141 ~]# s3cmd ls
2024-08-23 02:03  s3://yinzhengjie-bucket
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# radosgw-admin buckets list 
[
    "yinzhengjie-bucket"
]
[root@ceph141 ~]#

6. Upload data to the bucket with s3cmd

[root@ceph141 ~]# ll  01-昨日内容回顾及今日内容预告.mp4 
-rw-r--r-- 1 root root 36084548 Aug 23 10:06 01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd put 01-昨日内容回顾及今日内容预告.mp4 s3://yinzhengjie-bucket
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4'  [part 1 of 3, 15MB] [1 of 1]
 15728640 of 15728640   100% in    3s     4.18 MB/s  done
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4'  [part 2 of 3, 15MB] [1 of 1]
 15728640 of 15728640   100% in    0s    21.34 MB/s  done
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4'  [part 3 of 3, 4MB] [1 of 1]
 4627268 of 4627268   100% in    0s    23.26 MB/s  done
[root@ceph141 ~]# 
[root@ceph141 ~]# echo 15728640+15728640+4627268 | bc  # the upload split the video into 3 parts; their sizes add up to the original file size
36084548
[root@ceph141 ~]#

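Tip:
  As shown above, a large RGW object is split into several parts that are uploaded independently; this is called "multipart". The advantage of multipart is that an interrupted upload can be resumed. The default part size in s3cmd is 15 MB (see --multipart-chunk-size-mb).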
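
The part sizes in the transcript follow directly from the 15 MB chunk size. A short sketch of the arithmetic, using the file size shown above; the function name is hypothetical, and a file no larger than one chunk is modeled here as a single-element list even though s3cmd would send it as an ordinary (non-multipart) upload.

```python
from typing import List


def part_sizes(total_bytes: int, chunk_mb: int = 15) -> List[int]:
    """Split a byte count into fixed-size parts plus a final remainder."""
    if total_bytes <= 0:
        return []
    chunk = chunk_mb * 1024 * 1024
    full_parts, remainder = divmod(total_bytes, chunk)
    return [chunk] * full_parts + ([remainder] if remainder else [])


sizes = part_sizes(36084548)
print(sizes)        # [15728640, 15728640, 4627268]
print(sum(sizes))   # 36084548
```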

7. Download data with s3cmd

[root@ceph141 ~]# ll 01-昨日内容回顾及今日内容预告.mp4 
-rw-r--r-- 1 root root 36084548 Aug 23 10:06 01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]# 
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd get s3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4 /tmp/
download: 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4' -> '/tmp/01-昨日内容回顾及今日内容预告.mp4'  [1 of 1]
 36084548 of 36084548   100% in    0s   106.15 MB/s  done
[root@ceph141 ~]# 
[root@ceph141 ~]# ll /tmp/01-昨日内容回顾及今日内容预告.mp4 
-rw-r--r-- 1 root root 36084548 Aug 23 02:07 /tmp/01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]# 
[root@ceph141 ~]# md5sum 01-昨日内容回顾及今日内容预告.mp4 /tmp/01-昨日内容回顾及今日内容预告.mp4 
fc7be02a17330902eff0214616bd6312  01-昨日内容回顾及今日内容预告.mp4
fc7be02a17330902eff0214616bd6312  /tmp/01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]# 
[root@ceph141 ~]# diff 01-昨日内容回顾及今日内容预告.mp4 /tmp/01-昨日内容回顾及今日内容预告.mp4 
[root@ceph141 ~]#
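
The `md5sum` comparison above can also be done from a script, for instance after an automated download. A minimal sketch that reads files in blocks so that large videos do not have to fit in memory; the function names are hypothetical, and the demo uses throwaway temporary files rather than the video from the transcript.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Compare two files by MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)


# Demo with two temporary files holding identical bytes.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.bin")
    downloaded = os.path.join(workdir, "downloaded.bin")
    for path in (original, downloaded):
        with open(path, "wb") as handle:
            handle.write(b"example payload" * 1000)
    print(same_content(original, downloaded))
```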

8. Apply an access policy

[root@ceph141 ~]# cat yinzhengjie-anonymous-access-policy.json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::yinzhengjie-bucket/*"
    ]
  }]
}
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd setpolicy yinzhengjie-anonymous-access-policy.json s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/: Policy updated
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    {
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::yinzhengjie-bucket/*"
    ]
  }]
}

   CORS:      none
   ACL:       尹正杰: FULL_CONTROL
[root@ceph141 ~]#
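
When several buckets need the same anonymous-read policy, the JSON file can be generated rather than edited by hand. The sketch below reproduces the policy shown above for an arbitrary bucket name; the function name is hypothetical.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Build a policy document that lets anyone GET objects in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": ["*"]},
            "Action": "s3:GetObject",
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("yinzhengjie-bucket"))
```

The printed text can be saved to a file and passed to `s3cmd setpolicy FILE s3://BUCKET` as in the transcript.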

9. Access object storage over HTTP

http://10.0.0.142/yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4

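Tip:
    - 1. For the object storage gateway, "www.yinzhengjie.com" must resolve to a node that runs an rgw daemon (ceph142 in this walkthrough; any of ceph141, ceph142, or ceph143 once rgw runs on all of them);
    - 2. In production, put a load balancer in front of the rgw instances so that a failed backend rgw does not become a single point of failure;
    - 3. When accessing object storage over HTTP, keep the following in mind:
        - 3.1 How objects are accessed
            Whether http or https is used depends on the basic rgw configuration.
        - 3.2 Access control for objects
           Implemented through custom policies.
        - 3.3 Cross-origin access to objects
           Implemented by defining CORS rules.
        - 3.4 Browser-side caching of objects
           Customized through the basic rgw configuration.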
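
Object keys with non-ASCII characters, like the video file name used here, are percent-encoded when a client sends the request; browsers do this automatically. A small sketch of building such a path-style URL by hand, assuming the same host/bucket/object layout as the URL above; the function name is hypothetical.

```python
from urllib.parse import quote, unquote


def object_url(host: str, bucket: str, key: str, https: bool = False) -> str:
    """Build a path-style URL, percent-encoding the bucket and object key."""
    scheme = "https" if https else "http"
    # quote() leaves "/" alone by default, so keys with prefixes keep their slashes.
    return f"{scheme}://{host}/{quote(bucket)}/{quote(key)}"


url = object_url("10.0.0.142", "yinzhengjie-bucket", "01-昨日内容回顾及今日内容预告.mp4")
print(url)
print(unquote(url))
```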

10. Delete the policy

[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    {
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["*"]},
    "Action": "s3:GetObject",
    "Resource": [
      "arn:aws:s3:::yinzhengjie-bucket/*"
    ]
  }]
}

   CORS:      none
   ACL:       尹正杰: FULL_CONTROL
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd delpolicy  s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/: Policy deleted
[root@ceph141 ~]# 
[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
   Location:  default
   Payer:     BucketOwner
   Expiration Rule: none
   Policy:    none
   CORS:      none
   ACL:       尹正杰: FULL_CONTROL
[root@ceph141 ~]# 


Food for thought:
    可道云 (Kodcloud) previously used Alibaba Cloud object storage. Could Ceph object storage replace it now? Try turning that into a project.

11. Access again: the request is now denied

Request URL:
http://10.0.0.142/yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4


Response:
<Error>
<Code>AccessDenied</Code>
<Message/>
<BucketName>yinzhengjie-bucket</BucketName>
<RequestId>tx00000d1b139914db38023-0066c7f24c-fc49-default</RequestId>
<HostId>fc49-default-default</HostId>
</Error>
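
A client that wants to react to such a denial can parse the XML error body. The sketch below assumes the response has the flat, namespace-free structure shown above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_s3_error(body: str) -> Dict[str, str]:
    """Turn an <Error> document into a dict of tag -> text (empty string if absent)."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


response_body = """<Error>
<Code>AccessDenied</Code>
<Message/>
<BucketName>yinzhengjie-bucket</BucketName>
<RequestId>tx00000d1b139914db38023-0066c7f24c-fc49-default</RequestId>
<HostId>fc49-default-default</HostId>
</Error>"""

error = parse_s3_error(response_body)
print(error["Code"], error["BucketName"])
```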

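12. Other usage tips

s3cmd supports further ways of managing buckets and files as well; consult the help output when needed.

The hands-on details can be demonstrated in class.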

[root@ceph141 ~]# s3cmd -h
Usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

Options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool. Optionally
                        use as '--configure s3://some-bucket' to test access
                        to a specific bucket instead of attempting to list
                        them all.
  -c FILE, --config=FILE
                        Config file name. Defaults to $HOME/.s3cfg
  --dump-config         Dump current configuration after parsing config files
                        and command line options and exit.
  --access_key=ACCESS_KEY
                        AWS Access Key
  --secret_key=SECRET_KEY
                        AWS Secret Key
  --access_token=ACCESS_TOKEN
                        AWS Access Token
  -n, --dry-run         Only show what should be uploaded or downloaded but
                        don't actually do it. May still perform S3 requests to
                        get bucket listings and other information though (only
                        for file transfer commands)
  -s, --ssl             Use HTTPS connection when communicating with S3.
                        (default)
  --no-ssl              Don't use HTTPS.
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for
                        [get] command).
  --continue-put        Continue uploading partially uploaded files or
                        multipart upload parts.  Restarts parts/files that
                        don't have matching size and md5.  Skips files/parts
                        that do.  Note: md5sum checks are not always
                        sufficient to check (part) file equality.  Enable this
                        at your own risk.
  --upload-id=UPLOAD_ID
                        UploadId for Multipart Upload, in case you want
                        continue an existing upload (equivalent to --continue-
                        put) and there are multiple partial uploads.  Use
                        s3cmd multipart [URI] to see what UploadIds are
                        associated with the given URI.
  --skip-existing       Skip over files that exist at the destination (only
                        for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync].
                        (default)
  --no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you
                        only.
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
                        Grant stated permission to a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID
                        Revoke stated permission for a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  -D NUM, --restore-days=NUM
                        Number of days to keep restored file available (only
                        for 'restore' command). Default is 1 day.
  --restore-priority=RESTORE_PRIORITY
                        Priority for restoring files from S3 Glacier (only for
                        'restore' command). Choices available: bulk, standard,
                        expedited
  --delete-removed      Delete destination objects with no corresponding
                        source file [sync]
  --no-delete-removed   Don't delete destination objects [sync]
  --delete-after        Perform deletes AFTER new uploads when delete-removed
                        is enabled [sync]
  --delay-updates       *OBSOLETE* Put all updated files into place at end
                        [sync]
  --max-delete=NUM      Do not delete more than NUM files. [del] and [sync]
  --limit=NUM           Limit number of objects returned in the response body
                        (only for [ls] and [la] commands)
  --add-destination=ADDITIONAL_DESTINATIONS
                        Additional destination for parallel uploads, in
                        addition to last arg.  May be repeated.
  --delete-after-fetch  Delete remote objects after fetching to local file
                        (only for [get] and [sync] commands).
  -p, --preserve        Preserve filesystem attributes (mode, ownership,
                        timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded
                        from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular
                        expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included
                        even if previously excluded by one of
                        --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression)
                        instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --files-from=FILE     Read list of source-file names from FILE. Use - to
                        read from stdin.
  --region=REGION, --bucket-location=REGION
                        Region to create bucket in. As of now the regions are:
                        us-east-1, us-west-1, us-west-2, eu-west-1, eu-
                        central-1, ap-northeast-1, ap-southeast-1, ap-
                        southeast-2, sa-east-1
  --host=HOSTNAME       HOSTNAME:PORT for S3 endpoint (default:
                        s3.amazonaws.com, alternatives such as s3-eu-
                        west-1.amazonaws.com). You should also set --host-
                        bucket.
  --host-bucket=HOST_BUCKET
                        DNS-style bucket+hostname:port template for accessing
                        a bucket (default: %(bucket)s.s3.amazonaws.com)
  --reduced-redundancy, --rr
                        Store object with 'Reduced redundancy'. Lower per-GB
                        price. [put, cp, mv]
  --no-reduced-redundancy, --no-rr
                        Store object without 'Reduced redundancy'. Higher per-
                        GB price. [put, cp, mv]
  --storage-class=CLASS
                        Store object with specified CLASS (STANDARD,
                        STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER
                        or DEEP_ARCHIVE). [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX
                        Target prefix for access logs (S3 URI) (for [cfmodify]
                        and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog]
                        commands)
  --default-mime-type=DEFAULT_MIME_TYPE
                        Default MIME-type for stored objects. Application
                        default is binary/octet-stream.
  -M, --guess-mime-type
                        Guess MIME-type of files by their extension or mime
                        magic. Fall back to default MIME-Type as specified by
                        --default-mime-type option
  --no-guess-mime-type  Don't guess MIME-type and use the default type
                        instead.
  --no-mime-magic       Don't use mime magic when guessing MIME-type.
  -m MIME/TYPE, --mime-type=MIME/TYPE
                        Force MIME-type. Override both --default-mime-type and
                        --guess-mime-type.
  --add-header=NAME:VALUE
                        Add a given HTTP header to the upload request. Can be
                        used multiple times. For instance set 'Expires' or
                        'Cache-Control' headers (or both) using this option.
  --remove-header=NAME  Remove a given HTTP header.  Can be used multiple
                        times.  For instance, remove 'Expires' or 'Cache-
                        Control' headers (or both) using this option. [modify]
  --server-side-encryption
                        Specifies that server-side encryption will be used
                        when putting objects. [put, sync, cp, modify]
  --server-side-encryption-kms-id=KMS_KEY
                        Specifies the key id used for server-side encryption
                        with AWS KMS-Managed Keys (SSE-KMS) when putting
                        objects. [put, sync, cp, modify]
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding
                        (character set). Autodetected: UTF-8
  --add-encoding-exts=EXTENSIONs
                        Add encoding to these comma delimited extensions i.e.
                        (css,js,html) when uploading to S3 )
  --verbatim            Use the S3 name as given on the command line. No pre-
                        processing, encoding, etc. Use with caution!
  --disable-multipart   Disable multipart upload on files bigger than
                        --multipart-chunk-size-mb
  --multipart-chunk-size-mb=SIZE
                        Size of each chunk of a multipart upload. Files bigger
                        than SIZE are automatically uploaded as multithreaded-
                        multipart, smaller files are uploaded using the
                        traditional method. SIZE is in Mega-Bytes, default
                        chunk size is 15MB, minimum allowed chunk size is 5MB,
                        maximum is 5GB.
  --list-md5            Include MD5 sums in bucket listings (only for 'ls'
                        command).
  -H, --human-readable-sizes
                        Print sizes in human readable form (eg 1kB instead of
                        1234).
  --ws-index=WEBSITE_INDEX
                        Name of index-document (only for [ws-create] command)
  --ws-error=WEBSITE_ERROR
                        Name of error-document (only for [ws-create] command)
  --expiry-date=EXPIRY_DATE
                        Indicates when the expiration rule takes effect. (only
                        for [expire] command)
  --expiry-days=EXPIRY_DAYS
                        Indicates the number of days after object creation the
                        expiration rule takes effect. (only for [expire]
                        command)
  --expiry-prefix=EXPIRY_PREFIX
                        Identifying one or more objects with the prefix to
                        which the expiration rule applies. (only for [expire]
                        command)
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --stats               Give some file-transfer stats.
  --enable              Enable given CloudFront distribution (only for
                        [cfmodify] command)
  --disable             Disable given CloudFront distribution (only for
                        [cfmodify] command)
  --cf-invalidate       Invalidate the uploaded filed in CloudFront. Also see
                        [cfinval] command.
  --cf-invalidate-default-index
                        When using Custom Origin and S3 static website,
                        invalidate the default index file.
  --cf-no-invalidate-default-index-root
                        When using Custom Origin and S3 static website, don't
                        invalidate the path to the default index file.
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for
                        [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME
                        Remove given CNAME from a CloudFront distribution
                        (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only
                        for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT
                        Set the default root object to return when no object
                        is specified in the URL. Use a relative path, i.e.
                        default/index.html instead of /default/index.html or
                        s3://bucket/default/index.html (only for [cfcreate]
                        and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (2.2.0) and exit.
  -F, --follow-symlinks
                        Follow symbolic links as if they are regular files
  --cache-file=FILE     Cache FILE containing local source MD5 values
  -q, --quiet           Silence output on stdout
  --ca-certs=CA_CERTS_FILE
                        Path to SSL CA certificate FILE (instead of system
                        default)
  --ssl-cert=SSL_CLIENT_CERT_FILE
                        Path to client own SSL certificate CRT_FILE
  --ssl-key=SSL_CLIENT_KEY_FILE
                        Path to client own SSL certificate private key
                        KEY_FILE
  --check-certificate   Check SSL certificate validity
  --no-check-certificate
                        Do not check SSL certificate validity
  --check-hostname      Check SSL certificate hostname validity
  --no-check-hostname   Do not check SSL certificate hostname validity
  --signature-v2        Use AWS Signature version 2 instead of newer signature
                        methods. Helpful for S3-like systems that don't have
                        AWS Signature v4 yet.
  --limit-rate=LIMITRATE
                        Limit the upload or download speed to amount bytes per
                        second.  Amount may be expressed in bytes, kilobytes
                        with the k suffix, or megabytes with the m suffix
  --no-connection-pooling
                        Disable connection re-use
  --requester-pays      Set the REQUESTER PAYS flag for operations
  -l, --long-listing    Produce long listing [ls]
  --stop-on-error       stop if error in transfer
  --content-disposition=CONTENT_DISPOSITION
                        Provide a Content-Disposition for signed URLs, e.g.,
                        "inline; filename=myvideo.mp4"
  --content-type=CONTENT_TYPE
                        Provide a Content-Type for signed URLs, e.g.,
                        "video/mp4"

Commands:
  Make bucket
      s3cmd mb s3://BUCKET
  Remove bucket
      s3cmd rb s3://BUCKET
  List objects or buckets
      s3cmd ls [s3://BUCKET[/PREFIX]]
  List all object in all buckets
      s3cmd la 
  Put file into bucket
      s3cmd put FILE [FILE...] s3://BUCKET[/PREFIX]
  Get file from bucket
      s3cmd get s3://BUCKET/OBJECT LOCAL_FILE
  Delete file from bucket
      s3cmd del s3://BUCKET/OBJECT
  Delete file from bucket (alias for del)
      s3cmd rm s3://BUCKET/OBJECT
  Restore file from Glacier storage
      s3cmd restore s3://BUCKET/OBJECT
  Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)
      s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]
  Disk usage by buckets
      s3cmd du [s3://BUCKET[/PREFIX]]
  Get various information about Buckets or Files
      s3cmd info s3://BUCKET[/OBJECT]
  Copy object
      s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify object metadata
      s3cmd modify s3://BUCKET1/OBJECT
  Move object
      s3cmd mv s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify Access control list for Bucket or Files
      s3cmd setacl s3://BUCKET[/OBJECT]
  Modify Bucket Policy
      s3cmd setpolicy FILE s3://BUCKET
  Delete Bucket Policy
      s3cmd delpolicy s3://BUCKET
  Modify Bucket CORS
      s3cmd setcors FILE s3://BUCKET
  Delete Bucket CORS
      s3cmd delcors s3://BUCKET
  Modify Bucket Requester Pays policy
      s3cmd payer s3://BUCKET
  Show multipart uploads
      s3cmd multipart s3://BUCKET [Id]
  Abort a multipart upload
      s3cmd abortmp s3://BUCKET/OBJECT Id
  List parts of a multipart upload
      s3cmd listmp s3://BUCKET/OBJECT Id
  Enable/disable bucket access logging
      s3cmd accesslog s3://BUCKET
  Sign arbitrary string using the secret key
      s3cmd sign STRING-TO-SIGN
  Sign an S3 URL to provide limited public access with expiry
      s3cmd signurl s3://BUCKET/OBJECT <expiry_epoch|+expiry_offset>
  Fix invalid file names in a bucket
      s3cmd fixbucket s3://BUCKET[/PREFIX]
  Create Website from bucket
      s3cmd ws-create s3://BUCKET
  Delete Website
      s3cmd ws-delete s3://BUCKET
  Info about Website
      s3cmd ws-info s3://BUCKET
  Set or delete expiration rule for the bucket
      s3cmd expire s3://BUCKET
  Upload a lifecycle policy for the bucket
      s3cmd setlifecycle FILE s3://BUCKET
  Get a lifecycle policy for the bucket
      s3cmd getlifecycle s3://BUCKET
  Remove a lifecycle policy for the bucket
      s3cmd dellifecycle s3://BUCKET
  List CloudFront distribution points
      s3cmd cflist 
  Display CloudFront distribution point parameters
      s3cmd cfinfo [cf://DIST_ID]
  Create CloudFront distribution point
      s3cmd cfcreate s3://BUCKET
  Delete CloudFront distribution point
      s3cmd cfdelete cf://DIST_ID
  Change CloudFront distribution point parameters
      s3cmd cfmodify cf://DIST_ID
  Display CloudFront invalidation request(s) status
      s3cmd cfinvalinfo cf://DIST_ID[/INVAL_ID]

For more information, updates and news, visit the s3cmd website:
http://s3tools.org

[root@ceph141 ~]#
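
The subcommands listed in the help output above can be chained into a typical bucket lifecycle. The following is a minimal sketch, not the author's own procedure: the bucket name `demo-bucket`, the local file `/etc/hosts`, the `500k` rate limit, and the `+3600` expiry offset are illustrative assumptions, and the `run` wrapper and `DRY_RUN` switch are hypothetical helpers added so the script only prints the commands by default. Set `DRY_RUN=0` to execute against an already configured RGW endpoint (that is, one with a working `~/.s3cfg`).

```shell
#!/usr/bin/env bash
# Sketch of an s3cmd workflow using only subcommands and options shown in the help text.
# Assumed (hypothetical) values: bucket "demo-bucket", file "/etc/hosts".
set -eu

DRY_RUN="${DRY_RUN:-1}"            # 1 = print the commands only, 0 = really run s3cmd
BUCKET="${BUCKET:-demo-bucket}"    # hypothetical bucket name
FILE="${FILE:-/etc/hosts}"         # any local file to upload
OBJECT="$(basename "$FILE")"       # object key derived from the file name

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "s3cmd $*"
    else
        s3cmd "$@"
    fi
}

workflow() {
    run mb "s3://${BUCKET}"                                      # make bucket
    run put --limit-rate=500k "$FILE" "s3://${BUCKET}/${OBJECT}"  # upload, throttled
    run ls -l "s3://${BUCKET}"                                   # long listing of the bucket
    run info "s3://${BUCKET}/${OBJECT}"                          # object metadata
    run signurl "s3://${BUCKET}/${OBJECT}" +3600                 # URL valid for one hour
    run get "s3://${BUCKET}/${OBJECT}" "/tmp/${OBJECT}.copy"     # download a copy
    run del "s3://${BUCKET}/${OBJECT}"                           # delete the object
    run rb "s3://${BUCKET}"                                      # remove the now-empty bucket
}

workflow
```

In dry-run mode the script prints one `s3cmd ...` line per step, which makes it easy to review the sequence before pointing it at a real cluster. The bucket is emptied with `del` before `rb`, since the sketch assumes the gateway refuses to remove a non-empty bucket.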