Author: 尹正杰
Copyright notice: this is an original work. Reproduction is not permitted, and violations will be pursued legally.
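I. Object storage system overview
1. Object gateway overview
The Ceph Object Gateway can store its data in the same Ceph storage cluster that holds data from CephFS clients or Ceph RBD clients.
An object is the basic unit of data storage in an object storage system. Each object combines data with a set of data attributes, and those attributes can be set according to application needs, including data distribution and quality of service.
Each object maintains its own attributes, which simplifies management of the storage system. Objects can differ in size and can even hold an entire data structure, such as a file or a database table entry. When files are uploaded with s3cmd, large files are split into chunks of 15MB by default.
Ceph object storage uses the Ceph Object Gateway daemon (RADOS Gateway, RGW for short), an HTTP server that interacts with the Ceph storage cluster.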
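Ceph RGW is built on librados and gives applications a RESTful object storage interface. Older releases embed Civetweb as the web server, while recent releases use the Beast frontend.
In the N release (Nautilus) the gateway listens on port 7480 by default, whereas the R release (Reef, 18.2.4) deployment in this walkthrough listens on port 80. To use a custom port, change the Ceph configuration.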
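- Since version 0.80 (Firefly, 2014-05-01 to 2016-04-01), Ceph no longer relies on Apache and FastCGI to provide the radosgw service;
- Instead, Civetweb is embedded in the ceph-radosgw process by default. This implementation is lighter and simpler, but Civetweb only gained SSL support in Ceph 11.0.1 Kraken (2017-01-01 to 2017-08-01).
Recommended reading: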
https://docs.ceph.com/en/nautilus/radosgw/
https://docs.ceph.com/en/nautilus/radosgw/bucketpolicy/
https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/bucketnamingrules.html
https://www.s3express.com/help/help.html
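2. Core resources of an object storage system
Storage solutions differ in design and implementation, but most object storage systems expose much the same core resource types.
In general, the core resources of an object storage system are the user (User), the bucket (Bucket), and the object (Object). They relate to each other as follows:
- 1. A User stores Objects in a Bucket on the storage system;
- 2. A bucket belongs to a user and holds objects; one bucket can store many objects;
- 3. One user can own multiple buckets, and different users may use buckets with the same name;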
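3. Interfaces supported by Ceph RGW
RGW needs its own daemon in order to work. It is not a mandatory interface: deploy RGW instances only when you need the S3- and Swift-compatible RESTful interfaces. When RGW is created, it initializes its own storage pools automatically.
As shown in the figure above, because RGW provides interfaces compatible with OpenStack Swift and Amazon S3, the Ceph Object Gateway has its own user management.
- Amazon S3:
Compatible with the Amazon S3 RESTful API; oriented toward command-line use.
Provides user, bucket, and object, representing the user, the bucket, and the object; a bucket belongs to a user.
The user name therefore serves as the namespace for buckets, so different users may use the same bucket name.
- OpenStack Swift:
Compatible with the OpenStack Swift API; oriented toward use from application code.
Provides user, container, and object, corresponding to the user, the bucket, and the object. It also gives user a parent component, account, which represents a project or tenant.
An account can therefore contain one or more users, which share the same set of containers, and the account provides the namespace for those containers.
- RadosGW:
Provides user, subuser, bucket, and object. Its user corresponds to the S3 user and its subuser to the Swift user. Neither user nor subuser provides a namespace for buckets, so buckets of different users cannot share a name.
Since the Jewel release (10.2.11, 2016-04-01 to 2018-07-01), however, RadosGW has offered the tenant, an optional component that provides a namespace for users and buckets.
Before Jewel, all radosgw users lived in a single namespace: every user ID had to be unique, and even buckets of different users could not use the same bucket ID.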
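The namespace rules above can be pictured with a toy model. The Python sketch below is only an illustration and not how RGW is implemented: the `BucketRegistry` class and its method are hypothetical names, and an empty tenant string is assumed to stand for the legacy shared namespace.

```python
class BucketRegistry:
    """Toy model: a bucket name must be unique within its tenant namespace."""

    def __init__(self):
        # Maps (tenant, bucket_name) -> uid of the owning user.
        self._owners = {}

    def create_bucket(self, uid, bucket, tenant=""):
        """Register a bucket; tenant "" models the single shared namespace."""
        key = (tenant, bucket)
        if key in self._owners:
            raise ValueError(f"bucket {bucket!r} already exists in tenant {tenant!r}")
        self._owners[key] = uid
        return key


registry = BucketRegistry()
registry.create_bucket("alice", "photos")               # shared namespace
registry.create_bucket("bob", "photos", tenant="acme")  # same name, other tenant
try:
    registry.create_bucket("bob", "photos")             # clashes with alice's bucket
    clash = False
except ValueError:
    clash = True
print(clash)
```

Without a tenant, the second user cannot reuse the name `photos`; with a tenant, the same name lives in a separate namespace.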
II. Hands-on: highly available radosgw
1 Check the cluster status before deployment
[root@ceph141 ~]# ceph -s
cluster:
id: 3cb12fba-5f6e-11ef-b412-9d303a22b70f
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 11m)
mgr: ceph141.cwgrgj(active, since 10m), standbys: ceph142.ymuzfe
mds: 1/1 daemons up, 1 standby
osd: 7 osds: 7 up (since 11m), 7 in (since 16h)
data:
volumes: 1/1 healthy
pools: 3 pools, 65 pgs
objects: 48 objects, 492 KiB
usage: 329 MiB used, 3.3 TiB / 3.3 TiB avail
pgs: 65 active+clean
[root@ceph141 ~]#
2 Create a service
[root@ceph141 ~]# ceph orch apply rgw yinzhengjie
Scheduled rgw.yinzhengjie update...
[root@ceph141 ~]#
3 Deploy the rgw daemon
[root@ceph141 ~]# ceph orch daemon add rgw yinzhengjie ceph142
Deployed rgw.yinzhengjie.ceph141.csxaif on host 'ceph142'
[root@ceph141 ~]#
4 Check that the rgw daemon was deployed
[root@ceph141 ~]# ceph -s
cluster:
id: 3cb12fba-5f6e-11ef-b412-9d303a22b70f
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 23m)
mgr: ceph141.cwgrgj(active, since 23m), standbys: ceph142.ymuzfe
mds: 1/1 daemons up, 1 standby
osd: 7 osds: 7 up (since 23m), 7 in (since 16h)
rgw: 1 daemon active (1 hosts, 1 zones) # Note that an rgw daemon has appeared!
data:
volumes: 1/1 healthy
pools: 7 pools, 193 pgs
objects: 274 objects, 499 KiB
usage: 430 MiB used, 3.3 TiB / 3.3 TiB avail
pgs: 193 active+clean
[root@ceph141 ~]#
5 List the storage pools that rgw creates by default
[root@ceph141 ~]# ceph osd pool ls
...
.rgw.root
default.rgw.log
default.rgw.control
default.rgw.meta
[root@ceph141 ~]#
[root@ceph141 ~]# radosgw-admin zone get --rgw-zone=default --rgw-zonegroup=default
{
"id": "10c61974-a41b-438d-ac2e-942b00e11d53",
"name": "default",
"domain_root": "default.rgw.meta:root",
"control_pool": "default.rgw.control",
"gc_pool": "default.rgw.log:gc",
"lc_pool": "default.rgw.log:lc",
"log_pool": "default.rgw.log",
"intent_log_pool": "default.rgw.log:intent",
"usage_log_pool": "default.rgw.log:usage",
"roles_pool": "default.rgw.meta:roles",
"reshard_pool": "default.rgw.log:reshard",
"user_keys_pool": "default.rgw.meta:users.keys",
"user_email_pool": "default.rgw.meta:users.email",
"user_swift_pool": "default.rgw.meta:users.swift",
"user_uid_pool": "default.rgw.meta:users.uid",
"otp_pool": "default.rgw.otp",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "default.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "default.rgw.buckets.data"
}
},
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_type": 0,
"inline_data": true
}
}
],
"realm_id": "",
"notif_pool": "default.rgw.log:notif"
}
[root@ceph141 ~]#
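Many values in the zone output above have the form `pool:namespace`, so several settings share one pool. As a rough sketch, the Python snippet below collects the distinct pool names from an abridged copy of that JSON. The helper name `distinct_pools` is hypothetical, and treating the text after the colon as a namespace within the pool is an assumption about the format.

```python
import json

# Abridged copy of the "radosgw-admin zone get" output shown above.
zone_json = """
{
  "domain_root": "default.rgw.meta:root",
  "control_pool": "default.rgw.control",
  "gc_pool": "default.rgw.log:gc",
  "log_pool": "default.rgw.log",
  "user_uid_pool": "default.rgw.meta:users.uid",
  "otp_pool": "default.rgw.otp"
}
"""


def distinct_pools(zone):
    """Return the sorted pool names, dropping any ':namespace' suffix."""
    pools = set()
    for key, value in zone.items():
        if key == "domain_root" or key.endswith("_pool"):
            pools.add(value.split(":", 1)[0])
    return sorted(pools)


pools = distinct_pools(json.loads(zone_json))
print(pools)
```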
6 View the deployment of each Ceph cluster component
[root@ceph141 ~]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 5m ago 46h count:1
ceph-exporter 3/3 7m ago 46h *
crash 3/3 7m ago 46h *
grafana ?:3000 1/1 5m ago 46h count:1
mds.oldboyedu-cephfs 2/2 5m ago 18h count:2
mgr 2/2 5m ago 46h count:2
mon 3/5 7m ago 46h count:5
node-exporter ?:9100 3/3 7m ago 46h *
osd 7 7m ago - <unmanaged>
prometheus ?:9095 1/1 5m ago 46h count:1
rgw.yinzhengjie ?:80 1/1 5m ago 5m ceph142
[root@ceph141 ~]#
7 Access the object storage WebUI
http://10.0.0.142/
III. Uploading a video with s3cmd and verifying access
1 Install s3cmd
[root@ceph141 ~]# echo 10.0.0.142 www.yinzhengjie.com >> /etc/hosts
[root@ceph141 ~]#
[root@ceph141 ~]# apt -y install s3cmd
2 Create an rgw account
[root@ceph141 ~]# radosgw-admin user create --uid "jasonyin" --display-name "尹正杰"
{
"user_id": "jasonyin",
"display_name": "尹正杰",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"subusers": [],
"keys": [
{
"user": "jasonyin",
"access_key": "ZHOE7MVPLJFE5EIU738W", # Keep this safe; it is needed below!
"secret_key": "VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE" # Keep this safe; it is needed below!
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"default_storage_class": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "rgw",
"mfa_ids": []
}
[root@ceph141 ~]#
3 Set up the s3cmd environment and generate the "/root/.s3cfg" configuration file
[root@ceph141 ~]# ll /root/.s3cfg
ls: cannot access '/root/.s3cfg': No such file or directory
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: ZHOE7MVPLJFE5EIU738W # access_key of the rgw account
Secret Key: VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE # secret_key of the rgw account
Default Region [US]: # Just press Enter
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: www.yinzhengjie.com # Address used to reach rgw
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: www.yinzhengjie.com/%(bucket) # Set the DNS-style bucket template
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: # Files are not encrypted; just press Enter
Path to GPG program [/usr/bin/gpg]: # Custom path to the gpg program; just press Enter
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No # Whether your rgw uses HTTPS; if it does not, answer No
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: # Proxy server address; no proxy is configured here, so just press Enter
New settings: # The lines below preview everything entered above
Access Key: ZHOE7MVPLJFE5EIU738W
Secret Key: VUNbdDwAGIq9AZv5f55e2gzptK1PUOnWg9nc44pE
Default Region: US
S3 Endpoint: www.yinzhengjie.com
DNS-style bucket+hostname:port template for accessing a bucket: www.yinzhengjie.com/%(bucket)
Encryption password:
Path to GPG program: /usr/bin/gpg
Use HTTPS protocol: False
HTTP Proxy server name:
HTTP Proxy server port: 0
Test access with supplied credentials? [Y/n] Y # If the settings above look correct, enter Y.
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)
Now verifying that encryption works...
Not configured. Never mind.
Save settings? [y/N] y # Enter y to save the configuration; the default is not to save.
Configuration saved to '/root/.s3cfg'
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ll /root/.s3cfg
-rw------- 1 root root 2269 Aug 23 09:59 /root/.s3cfg
[root@ceph141 ~]#
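The answers given to the wizard end up in an INI-style file. As a hedged sketch, the Python snippet below reads such a file with the standard `configparser` module and derives the endpoint URL. It assumes the file uses a `[default]` section with the keys `host_base`, `host_bucket`, and `use_https`; the sample content is inlined with placeholder credentials rather than read from "/root/.s3cfg".

```python
import configparser

# Inlined sample; the credentials are placeholders, not real keys.
sample_cfg = """
[default]
access_key = EXAMPLEACCESSKEY
secret_key = EXAMPLESECRETKEY
host_base = www.yinzhengjie.com
host_bucket = www.yinzhengjie.com/%(bucket)
use_https = False
"""

# interpolation=None keeps "%(bucket)" literal instead of treating it as a
# configparser placeholder.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(sample_cfg)

section = parser["default"]
scheme = "https" if section.getboolean("use_https") else "http"
endpoint = f"{scheme}://{section['host_base']}"
print(endpoint)
```

To inspect a real file, `parser.read("/root/.s3cfg")` could replace the `read_string` call.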
4 Create a bucket
[root@ceph141 ~]# s3cmd mb s3://yinzhengjie-bucket
Bucket 's3://yinzhengjie-bucket/' created
[root@ceph141 ~]#
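Tip:
General purpose bucket naming rules. The following naming rules apply to general purpose buckets.
- 1. Bucket names must be between 3 (minimum) and 63 (maximum) characters long.
- 2. Bucket names can consist only of lowercase letters, numbers, periods (.), and hyphens (-).
- 3. Bucket names must begin and end with a letter or number.
- 4. Bucket names must not contain two adjacent periods.
- 5. Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
- 6. Bucket names must not start with the prefix xn--.
- 7. Bucket names must not start with the prefix sthree-.
- 8. Bucket names must not start with the prefix sthree-configurator.
- 9. Bucket names must not start with the prefix amzn-s3-demo-.
- 10. Bucket names must not end with the suffix -s3alias. This suffix is reserved for access point alias names. For more information, see "Using a bucket-style alias for your S3 bucket access point".
- 11. Bucket names must not end with the suffix --ol-s3. This suffix is reserved for Object Lambda Access Point alias names. For more information, see "How to use a bucket-style alias for your S3 bucket Object Lambda Access Point".
- 12. Bucket names must not end with the suffix .mrap. This suffix is reserved for Multi-Region Access Point names. For more information, see "Rules for naming Amazon S3 Multi-Region Access Points".
- 13. Bucket names must not end with the suffix --x-s3. This suffix is reserved for directory buckets. For more information, see "Directory bucket naming rules".
- 14. Bucket names must be unique across all AWS accounts in all AWS Regions within a partition. A partition is a grouping of Regions. AWS currently has three partitions: aws (Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)).
- 15. A bucket name cannot be used by another AWS account in the same partition until the bucket is deleted.
- 16. Buckets used with Amazon S3 Transfer Acceleration cannot have periods (.) in their names.
For best compatibility, we recommend that you avoid periods (.) in bucket names, except for buckets used only for static website hosting. If a bucket name contains periods, you cannot use virtual-host-style addressing over HTTPS unless you perform your own certificate validation, because the security certificates used for virtual hosting of buckets do not work for buckets with periods in their names.
This limitation does not affect buckets used for static website hosting, because static website hosting is available only over HTTP. For more information about virtual-host-style addressing, see "Virtual hosting of buckets". For more information about static website hosting, see "Hosting a static website using Amazon S3".
Reference: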
https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/userguide/bucketnamingrules.html
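The purely syntactic rules in the list above (length, character set, periods, IP format, reserved prefixes and suffixes) can be checked locally before calling `s3cmd mb`. The Python sketch below is a minimal illustration of those AWS rules only: the function name `is_valid_bucket_name` is hypothetical, uniqueness (rules 14 and 15) is not checked, and RGW itself may accept or reject a different set of names.

```python
import ipaddress
import re

# Lowercase letters, digits, periods and hyphens; starts and ends with a
# letter or digit (rules 2 and 3).
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
# "sthree-" also covers the longer prefix "sthree-configurator".
_FORBIDDEN_PREFIXES = ("xn--", "sthree-", "amzn-s3-demo-")
_FORBIDDEN_SUFFIXES = ("-s3alias", "--ol-s3", ".mrap", "--x-s3")


def _looks_like_ipv4(name):
    """Return True if the name parses as a dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(name)
        return True
    except ValueError:
        return False


def is_valid_bucket_name(name):
    """Check the syntactic naming rules listed above."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    if ".." in name or _looks_like_ipv4(name):
        return False
    if name.startswith(_FORBIDDEN_PREFIXES) or name.endswith(_FORBIDDEN_SUFFIXES):
        return False
    return True


print(is_valid_bucket_name("yinzhengjie-bucket"))
print(is_valid_bucket_name("192.168.5.4"))
```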
5 List buckets
[root@ceph141 ~]# s3cmd ls
2024-08-23 02:03 s3://yinzhengjie-bucket
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# radosgw-admin buckets list
[
"yinzhengjie-bucket"
]
[root@ceph141 ~]#
6 Upload data to the bucket with s3cmd
[root@ceph141 ~]# ll 01-昨日内容回顾及今日内容预告.mp4
-rw-r--r-- 1 root root 36084548 Aug 23 10:06 01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd put 01-昨日内容回顾及今日内容预告.mp4 s3://yinzhengjie-bucket
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4' [part 1 of 3, 15MB] [1 of 1]
15728640 of 15728640 100% in 3s 4.18 MB/s done
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4' [part 2 of 3, 15MB] [1 of 1]
15728640 of 15728640 100% in 0s 21.34 MB/s done
upload: '01-昨日内容回顾及今日内容预告.mp4' -> 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4' [part 3 of 3, 4MB] [1 of 1]
4627268 of 4627268 100% in 0s 23.26 MB/s done
[root@ceph141 ~]#
[root@ceph141 ~]# echo 15728640+15728640+4627268 | bc # The upload was clearly split into 3 parts, and their sizes add up to the size of the file.
36084548
[root@ceph141 ~]#
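Tip:
As shown above, a large RGW object is split into multiple parts for upload. This is called "multipart", and its advantage is that an interrupted transfer can be resumed. The default part size in s3cmd is 15MB.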
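The part sizes in the upload log follow from simple arithmetic. A minimal Python sketch, assuming fixed-size chunks of 15MB (15 * 1024 * 1024 bytes) with the remainder as the last part; the function name `split_into_parts` is hypothetical and this is not s3cmd's own code:

```python
def split_into_parts(total_size, chunk_mb=15):
    """Return the byte size of each part for a fixed chunk size in MB."""
    chunk = chunk_mb * 1024 * 1024
    if total_size <= 0:
        return []
    full_parts, remainder = divmod(total_size, chunk)
    parts = [chunk] * full_parts
    if remainder:
        parts.append(remainder)
    return parts


parts = split_into_parts(36084548)
print(parts)       # [15728640, 15728640, 4627268]
print(sum(parts))  # 36084548
```

The result matches the three parts reported by `s3cmd put` above.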
7 Download data with s3cmd
[root@ceph141 ~]# ll 01-昨日内容回顾及今日内容预告.mp4
-rw-r--r-- 1 root root 36084548 Aug 23 10:06 01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd get s3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4 /tmp/
download: 's3://yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4' -> '/tmp/01-昨日内容回顾及今日内容预告.mp4' [1 of 1]
36084548 of 36084548 100% in 0s 106.15 MB/s done
[root@ceph141 ~]#
[root@ceph141 ~]# ll /tmp/01-昨日内容回顾及今日内容预告.mp4
-rw-r--r-- 1 root root 36084548 Aug 23 02:07 /tmp/01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]#
[root@ceph141 ~]# md5sum 01-昨日内容回顾及今日内容预告.mp4 /tmp/01-昨日内容回顾及今日内容预告.mp4
fc7be02a17330902eff0214616bd6312 01-昨日内容回顾及今日内容预告.mp4
fc7be02a17330902eff0214616bd6312 /tmp/01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]#
[root@ceph141 ~]# diff 01-昨日内容回顾及今日内容预告.mp4 /tmp/01-昨日内容回顾及今日内容预告.mp4
[root@ceph141 ~]#
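The same integrity check can be scripted. The Python sketch below mirrors what `md5sum` does above by hashing a file in blocks. The helper name `file_md5` is hypothetical, and the demo uses two small temporary files as stand-ins for the original and the downloaded copy.

```python
import hashlib
import os
import tempfile


def file_md5(path, block_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Demo: two temporary files with identical content.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    downloaded = os.path.join(tmp, "downloaded.bin")
    for path in (original, downloaded):
        with open(path, "wb") as handle:
            handle.write(b"ceph rgw demo payload")
    same = file_md5(original) == file_md5(downloaded)

print(same)
```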
8 Authorization policy
[root@ceph141 ~]# cat yinzhengjie-anonymous-access-policy.json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["*"]},
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::yinzhengjie-bucket/*"
]
}]
}
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd setpolicy yinzhengjie-anonymous-access-policy.json s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/: Policy updated
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
Location: default
Payer: BucketOwner
Expiration Rule: none
Policy: {
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["*"]},
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::yinzhengjie-bucket/*"
]
}]
}
CORS: none
ACL: 尹正杰: FULL_CONTROL
[root@ceph141 ~]#
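When several buckets need the same anonymous-read policy, the JSON file can be generated rather than edited by hand. A small Python sketch that reproduces the policy above for a given bucket name; the function name `anonymous_read_policy` is hypothetical:

```python
import json


def anonymous_read_policy(bucket):
    """Build a policy that lets anyone GET objects from the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": ["*"]},
            "Action": "s3:GetObject",
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }


policy = anonymous_read_policy("yinzhengjie-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and passed to `s3cmd setpolicy` as shown above.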
9 Access the object store over HTTP
http://10.0.0.142/yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4
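Tips:
- 1. For the object gateway, "www.yinzhengjie.com" must resolve to one of the nodes ceph141, ceph142, or ceph143 that runs an rgw daemon (ceph142 in the deployment above);
- 2. In production, put a load balancer in front of the rgw nodes so that a failed backend rgw does not become a single point of failure;
- 3. When accessing the object store over HTTP, keep the following in mind:
- 3.1 How objects are accessed
Whether HTTP or HTTPS is used depends on the basic rgw configuration.
- 3.2 Access control for objects
Implemented through custom policies.
- 3.3 Cross-origin access to objects
Implemented by defining CORS rules.
- 3.4 Browser-side caching of objects
Customized through the basic rgw configuration.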
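Object names that contain non-ASCII characters, like the video file here, are percent-encoded when placed in a URL. The Python sketch below builds the path-style URL used above; the function name `object_url` is hypothetical, and the browser normally performs this encoding on its own.

```python
from urllib.parse import quote, unquote, urlsplit


def object_url(host, bucket, key, scheme="http"):
    """Build a path-style URL: scheme://host/bucket/key, percent-encoded."""
    return f"{scheme}://{host}/{quote(bucket)}/{quote(key)}"


url = object_url("10.0.0.142", "yinzhengjie-bucket", "01-昨日内容回顾及今日内容预告.mp4")
print(url)
# Decoding the path gives back the readable bucket and object name.
print(unquote(urlsplit(url).path))
```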
10 Delete the policy
[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
Location: default
Payer: BucketOwner
Expiration Rule: none
Policy: {
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["*"]},
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::yinzhengjie-bucket/*"
]
}]
}
CORS: none
ACL: 尹正杰: FULL_CONTROL
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd delpolicy s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/: Policy deleted
[root@ceph141 ~]#
[root@ceph141 ~]# s3cmd info s3://yinzhengjie-bucket
s3://yinzhengjie-bucket/ (bucket):
Location: default
Payer: BucketOwner
Expiration Rule: none
Policy: none
CORS: none
ACL: 尹正杰: FULL_CONTROL
[root@ceph141 ~]#
Think about it:
可道云 previously used Alibaba Cloud object storage. Could Ceph object storage replace it now? Try packaging that as a project!
11 Access the object again; the request is now denied
Request URL:
http://10.0.0.142/yinzhengjie-bucket/01-昨日内容回顾及今日内容预告.mp4
Response:
<Error>
<Code>AccessDenied</Code>
<Message/>
<BucketName>yinzhengjie-bucket</BucketName>
<RequestId>tx00000d1b139914db38023-0066c7f24c-fc49-default</RequestId>
<HostId>fc49-default-default</HostId>
</Error>
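A client script can detect this condition by reading the `Code` element of the error body. A minimal Python sketch using the standard `xml.etree.ElementTree` module on the response shown above; the function name `parse_s3_error` is hypothetical, and it assumes the body carries no XML namespace, as in this example.

```python
import xml.etree.ElementTree as ET

# Error body copied from the response above.
error_body = """<Error>
  <Code>AccessDenied</Code>
  <Message/>
  <BucketName>yinzhengjie-bucket</BucketName>
  <RequestId>tx00000d1b139914db38023-0066c7f24c-fc49-default</RequestId>
  <HostId>fc49-default-default</HostId>
</Error>"""


def parse_s3_error(body):
    """Map each child element of <Error> to its text ('' when empty)."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


error = parse_s3_error(error_body)
print(error["Code"], error["BucketName"])
```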
12 Other tips
s3cmd supports further ways of managing buckets and files; consult its help output as needed.
The hands-on details can be demonstrated in class.
[root@ceph141 ~]# s3cmd -h
Usage: s3cmd [options] COMMAND [parameters]
S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.
Options:
-h, --help show this help message and exit
--configure Invoke interactive (re)configuration tool. Optionally
use as '--configure s3://some-bucket' to test access
to a specific bucket instead of attempting to list
them all.
-c FILE, --config=FILE
Config file name. Defaults to $HOME/.s3cfg
--dump-config Dump current configuration after parsing config files
and command line options and exit.
--access_key=ACCESS_KEY
AWS Access Key
--secret_key=SECRET_KEY
AWS Secret Key
--access_token=ACCESS_TOKEN
AWS Access Token
-n, --dry-run Only show what should be uploaded or downloaded but
don't actually do it. May still perform S3 requests to
get bucket listings and other information though (only
for file transfer commands)
-s, --ssl Use HTTPS connection when communicating with S3.
(default)
--no-ssl Don't use HTTPS.
-e, --encrypt Encrypt files before uploading to S3.
--no-encrypt Don't encrypt files.
-f, --force Force overwrite and other dangerous operations.
--continue Continue getting a partially downloaded file (only for
[get] command).
--continue-put Continue uploading partially uploaded files or
multipart upload parts. Restarts parts/files that
don't have matching size and md5. Skips files/parts
that do. Note: md5sum checks are not always
sufficient to check (part) file equality. Enable this
at your own risk.
--upload-id=UPLOAD_ID
UploadId for Multipart Upload, in case you want
continue an existing upload (equivalent to --continue-
put) and there are multiple partial uploads. Use
s3cmd multipart [URI] to see what UploadIds are
associated with the given URI.
--skip-existing Skip over files that exist at the destination (only
for [get] and [sync] commands).
-r, --recursive Recursive upload, download or removal.
--check-md5 Check MD5 sums when comparing files for [sync].
(default)
--no-check-md5 Do not check MD5 sums when comparing files for [sync].
Only size will be compared. May significantly speed up
transfer but may also miss some changed files.
-P, --acl-public Store objects with ACL allowing read for anyone.
--acl-private Store objects with default ACL allowing access for you
only.
--acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
Grant stated permission to a given amazon user.
Permission is one of: read, write, read_acp,
write_acp, full_control, all
--acl-revoke=PERMISSION:USER_CANONICAL_ID
Revoke stated permission for a given amazon user.
Permission is one of: read, write, read_acp,
write_acp, full_control, all
-D NUM, --restore-days=NUM
Number of days to keep restored file available (only
for 'restore' command). Default is 1 day.
--restore-priority=RESTORE_PRIORITY
Priority for restoring files from S3 Glacier (only for
'restore' command). Choices available: bulk, standard,
expedited
--delete-removed Delete destination objects with no corresponding
source file [sync]
--no-delete-removed Don't delete destination objects [sync]
--delete-after Perform deletes AFTER new uploads when delete-removed
is enabled [sync]
--delay-updates *OBSOLETE* Put all updated files into place at end
[sync]
--max-delete=NUM Do not delete more than NUM files. [del] and [sync]
--limit=NUM Limit number of objects returned in the response body
(only for [ls] and [la] commands)
--add-destination=ADDITIONAL_DESTINATIONS
Additional destination for parallel uploads, in
addition to last arg. May be repeated.
--delete-after-fetch Delete remote objects after fetching to local file
(only for [get] and [sync] commands).
-p, --preserve Preserve filesystem attributes (mode, ownership,
timestamps). Default for [sync] command.
--no-preserve Don't store FS attributes
--exclude=GLOB Filenames and paths matching GLOB will be excluded
from sync
--exclude-from=FILE Read --exclude GLOBs from FILE
--rexclude=REGEXP Filenames and paths matching REGEXP (regular
expression) will be excluded from sync
--rexclude-from=FILE Read --rexclude REGEXPs from FILE
--include=GLOB Filenames and paths matching GLOB will be included
even if previously excluded by one of
--(r)exclude(-from) patterns
--include-from=FILE Read --include GLOBs from FILE
--rinclude=REGEXP Same as --include but uses REGEXP (regular expression)
instead of GLOB
--rinclude-from=FILE Read --rinclude REGEXPs from FILE
--files-from=FILE Read list of source-file names from FILE. Use - to
read from stdin.
--region=REGION, --bucket-location=REGION
Region to create bucket in. As of now the regions are:
us-east-1, us-west-1, us-west-2, eu-west-1, eu-
central-1, ap-northeast-1, ap-southeast-1, ap-
southeast-2, sa-east-1
--host=HOSTNAME HOSTNAME:PORT for S3 endpoint (default:
s3.amazonaws.com, alternatives such as s3-eu-
west-1.amazonaws.com). You should also set --host-
bucket.
--host-bucket=HOST_BUCKET
DNS-style bucket+hostname:port template for accessing
a bucket (default: %(bucket)s.s3.amazonaws.com)
--reduced-redundancy, --rr
Store object with 'Reduced redundancy'. Lower per-GB
price. [put, cp, mv]
--no-reduced-redundancy, --no-rr
Store object without 'Reduced redundancy'. Higher per-
GB price. [put, cp, mv]
--storage-class=CLASS
Store object with specified CLASS (STANDARD,
STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER
or DEEP_ARCHIVE). [put, cp, mv]
--access-logging-target-prefix=LOG_TARGET_PREFIX
Target prefix for access logs (S3 URI) (for [cfmodify]
and [accesslog] commands)
--no-access-logging Disable access logging (for [cfmodify] and [accesslog]
commands)
--default-mime-type=DEFAULT_MIME_TYPE
Default MIME-type for stored objects. Application
default is binary/octet-stream.
-M, --guess-mime-type
Guess MIME-type of files by their extension or mime
magic. Fall back to default MIME-Type as specified by
--default-mime-type option
--no-guess-mime-type Don't guess MIME-type and use the default type
instead.
--no-mime-magic Don't use mime magic when guessing MIME-type.
-m MIME/TYPE, --mime-type=MIME/TYPE
Force MIME-type. Override both --default-mime-type and
--guess-mime-type.
--add-header=NAME:VALUE
Add a given HTTP header to the upload request. Can be
used multiple times. For instance set 'Expires' or
'Cache-Control' headers (or both) using this option.
--remove-header=NAME Remove a given HTTP header. Can be used multiple
times. For instance, remove 'Expires' or 'Cache-
Control' headers (or both) using this option. [modify]
--server-side-encryption
Specifies that server-side encryption will be used
when putting objects. [put, sync, cp, modify]
--server-side-encryption-kms-id=KMS_KEY
Specifies the key id used for server-side encryption
with AWS KMS-Managed Keys (SSE-KMS) when putting
objects. [put, sync, cp, modify]
--encoding=ENCODING Override autodetected terminal and filesystem encoding
(character set). Autodetected: UTF-8
--add-encoding-exts=EXTENSIONs
Add encoding to these comma delimited extensions i.e.
(css,js,html) when uploading to S3 )
--verbatim Use the S3 name as given on the command line. No pre-
processing, encoding, etc. Use with caution!
--disable-multipart Disable multipart upload on files bigger than
--multipart-chunk-size-mb
--multipart-chunk-size-mb=SIZE
Size of each chunk of a multipart upload. Files bigger
than SIZE are automatically uploaded as multithreaded-
multipart, smaller files are uploaded using the
traditional method. SIZE is in Mega-Bytes, default
chunk size is 15MB, minimum allowed chunk size is 5MB,
maximum is 5GB.
--list-md5 Include MD5 sums in bucket listings (only for 'ls'
command).
-H, --human-readable-sizes
Print sizes in human readable form (eg 1kB instead of
1234).
--ws-index=WEBSITE_INDEX
Name of index-document (only for [ws-create] command)
--ws-error=WEBSITE_ERROR
Name of error-document (only for [ws-create] command)
--expiry-date=EXPIRY_DATE
Indicates when the expiration rule takes effect. (only
for [expire] command)
--expiry-days=EXPIRY_DAYS
Indicates the number of days after object creation the
expiration rule takes effect. (only for [expire]
command)
--expiry-prefix=EXPIRY_PREFIX
Identifying one or more objects with the prefix to
which the expiration rule applies. (only for [expire]
command)
--progress Display progress meter (default on TTY).
--no-progress Don't display progress meter (default on non-TTY).
--stats Give some file-transfer stats.
--enable Enable given CloudFront distribution (only for
[cfmodify] command)
--disable Disable given CloudFront distribution (only for
[cfmodify] command)
--cf-invalidate Invalidate the uploaded filed in CloudFront. Also see
[cfinval] command.
--cf-invalidate-default-index
When using Custom Origin and S3 static website,
invalidate the default index file.
--cf-no-invalidate-default-index-root
When using Custom Origin and S3 static website, don't
invalidate the path to the default index file.
--cf-add-cname=CNAME Add given CNAME to a CloudFront distribution (only for
[cfcreate] and [cfmodify] commands)
--cf-remove-cname=CNAME
Remove given CNAME from a CloudFront distribution
(only for [cfmodify] command)
--cf-comment=COMMENT Set COMMENT for a given CloudFront distribution (only
for [cfcreate] and [cfmodify] commands)
--cf-default-root-object=DEFAULT_ROOT_OBJECT
Set the default root object to return when no object
is specified in the URL. Use a relative path, i.e.
default/index.html instead of /default/index.html or
s3://bucket/default/index.html (only for [cfcreate]
and [cfmodify] commands)
-v, --verbose Enable verbose output.
-d, --debug Enable debug output.
--version Show s3cmd version (2.2.0) and exit.
-F, --follow-symlinks
Follow symbolic links as if they are regular files
--cache-file=FILE Cache FILE containing local source MD5 values
-q, --quiet Silence output on stdout
--ca-certs=CA_CERTS_FILE
Path to SSL CA certificate FILE (instead of system
default)
--ssl-cert=SSL_CLIENT_CERT_FILE
Path to client own SSL certificate CRT_FILE
--ssl-key=SSL_CLIENT_KEY_FILE
Path to client own SSL certificate private key
KEY_FILE
--check-certificate Check SSL certificate validity
--no-check-certificate
Do not check SSL certificate validity
--check-hostname Check SSL certificate hostname validity
--no-check-hostname Do not check SSL certificate hostname validity
--signature-v2 Use AWS Signature version 2 instead of newer signature
methods. Helpful for S3-like systems that don't have
AWS Signature v4 yet.
--limit-rate=LIMITRATE
Limit the upload or download speed to amount bytes per
second. Amount may be expressed in bytes, kilobytes
with the k suffix, or megabytes with the m suffix
--no-connection-pooling
Disable connection re-use
--requester-pays Set the REQUESTER PAYS flag for operations
-l, --long-listing Produce long listing [ls]
--stop-on-error stop if error in transfer
--content-disposition=CONTENT_DISPOSITION
Provide a Content-Disposition for signed URLs, e.g.,
"inline; filename=myvideo.mp4"
--content-type=CONTENT_TYPE
Provide a Content-Type for signed URLs, e.g.,
"video/mp4"
Commands:
Make bucket
s3cmd mb s3://BUCKET
Remove bucket
s3cmd rb s3://BUCKET
List objects or buckets
s3cmd ls [s3://BUCKET[/PREFIX]]
List all object in all buckets
s3cmd la
Put file into bucket
s3cmd put FILE [FILE...] s3://BUCKET[/PREFIX]
Get file from bucket
s3cmd get s3://BUCKET/OBJECT LOCAL_FILE
Delete file from bucket
s3cmd del s3://BUCKET/OBJECT
Delete file from bucket (alias for del)
s3cmd rm s3://BUCKET/OBJECT
Restore file from Glacier storage
s3cmd restore s3://BUCKET/OBJECT
Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)
s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]
Disk usage by buckets
s3cmd du [s3://BUCKET[/PREFIX]]
Get various information about Buckets or Files
s3cmd info s3://BUCKET[/OBJECT]
Copy object
s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
Modify object metadata
s3cmd modify s3://BUCKET1/OBJECT
Move object
s3cmd mv s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
Modify Access control list for Bucket or Files
s3cmd setacl s3://BUCKET[/OBJECT]
Modify Bucket Policy
s3cmd setpolicy FILE s3://BUCKET
Delete Bucket Policy
s3cmd delpolicy s3://BUCKET
Modify Bucket CORS
s3cmd setcors FILE s3://BUCKET
Delete Bucket CORS
s3cmd delcors s3://BUCKET
Modify Bucket Requester Pays policy
s3cmd payer s3://BUCKET
Show multipart uploads
s3cmd multipart s3://BUCKET [Id]
Abort a multipart upload
s3cmd abortmp s3://BUCKET/OBJECT Id
List parts of a multipart upload
s3cmd listmp s3://BUCKET/OBJECT Id
Enable/disable bucket access logging
s3cmd accesslog s3://BUCKET
Sign arbitrary string using the secret key
s3cmd sign STRING-TO-SIGN
Sign an S3 URL to provide limited public access with expiry
s3cmd signurl s3://BUCKET/OBJECT <expiry_epoch|+expiry_offset>
Fix invalid file names in a bucket
s3cmd fixbucket s3://BUCKET[/PREFIX]
Create Website from bucket
s3cmd ws-create s3://BUCKET
Delete Website
s3cmd ws-delete s3://BUCKET
Info about Website
s3cmd ws-info s3://BUCKET
Set or delete expiration rule for the bucket
s3cmd expire s3://BUCKET
Upload a lifecycle policy for the bucket
s3cmd setlifecycle FILE s3://BUCKET
Get a lifecycle policy for the bucket
s3cmd getlifecycle s3://BUCKET
Remove a lifecycle policy for the bucket
s3cmd dellifecycle s3://BUCKET
List CloudFront distribution points
s3cmd cflist
Display CloudFront distribution point parameters
s3cmd cfinfo [cf://DIST_ID]
Create CloudFront distribution point
s3cmd cfcreate s3://BUCKET
Delete CloudFront distribution point
s3cmd cfdelete cf://DIST_ID
Change CloudFront distribution point parameters
s3cmd cfmodify cf://DIST_ID
Display CloudFront invalidation request(s) status
s3cmd cfinvalinfo cf://DIST_ID[/INVAL_ID]
For more information, updates and news, visit the s3cmd website:
http://s3tools.org
[root@ceph141 ~]#