How does DataWorks write ORC or Parquet files to OSS?
Currently, writing ORC or Parquet files to OSS is done by reusing HDFS Writer. On top of the existing OSS Writer parameters, extended configuration parameters such as Path and FileFormat have been added. For the meaning of these parameters, see HDFS Writer. Examples of writing ORC and Parquet files to OSS follow.

Writing to OSS in ORC format:

{
  "stepType": "oss",
  "parameter": {
    "datasource": "",
    "fileFormat": "orc",
    "path": "/tests/case61",
    "fileName": "orc",
    "writeMode": "append",
    "column": [
      {"name": "col1", "type": "BIGINT"},
      {"name": "col2", "type": "DOUBLE"},
      {"name": "col3", "type": "STRING"}
    ],
    "fieldDelimiter": "\t",
    "compress": "NONE",
    "encoding": "UTF-8"
  }
}

Writing to OSS in Parquet format:

{
  "stepType": "oss",
  "parameter": {
    "datasource": "",
    "fileFormat": "parquet",
    "path": "/tests/case61",
    "fileName": "test",
    "writeMode": "append",
    "fieldDelimiter": "\t",
    "compress": "SNAPPY",
    "encoding": "UTF-8",
    "parquetSchema": "message test { required int64 int64_col;required binary str_col (UTF8);required group params (MAP) {repeated group key_value {required binary key (UTF8);required binary value (UTF8);}}required group params_arr (LIST) {repeated group list {required binary element (UTF8);}}req

Source: https://help.aliyun.com/document_detail/137765.html. This answer was compiled from the DingTalk group “DataWorks交流群(答疑@机器人)”.
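If you generate these step configurations from a script rather than writing them by hand, building them as a dictionary keeps each key unique and makes the JSON easy to validate before pasting it into the sync task. The following is only a rough sketch: `build_oss_writer_step` is a hypothetical helper, not part of any DataWorks SDK, and the parameter names and values are copied from the examples above rather than from a complete parameter reference.

```python
import json


def build_oss_writer_step(file_format, path, file_name, columns=None,
                          compress="NONE", parquet_schema=None):
    """Hypothetical helper: build an OSS Writer step shaped like the examples above."""
    if file_format not in ("orc", "parquet"):
        raise ValueError("file_format must be 'orc' or 'parquet'")

    parameter = {
        "datasource": "",          # left empty, as in the examples above
        "fileFormat": file_format,
        "path": path,
        "fileName": file_name,
        "writeMode": "append",
        "fieldDelimiter": "\t",
        "compress": compress,
        "encoding": "UTF-8",
    }
    # The ORC example lists columns explicitly.
    if columns is not None:
        parameter["column"] = [{"name": n, "type": t} for n, t in columns]
    # The Parquet example passes a schema string instead.
    if parquet_schema is not None:
        parameter["parquetSchema"] = parquet_schema

    return {"stepType": "oss", "parameter": parameter}


# Reproduce the ORC example from the answer.
orc_step = build_oss_writer_step(
    "orc",
    "/tests/case61",
    "orc",
    columns=[("col1", "BIGINT"), ("col2", "DOUBLE"), ("col3", "STRING")],
)
print(json.dumps(orc_step, indent=2))
```

For Parquet, the same helper would be called with `file_format="parquet"`, `compress="SNAPPY"`, and your own schema string in `parquet_schema`.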