Using MaxCompute Through the Client: Quick Start - Alibaba Cloud Developer Community

2022-08-03

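Summary: MaxCompute (ODPS) is an enterprise-grade SaaS (Software as a Service) cloud data warehouse for data analytics. Its serverless architecture provides a fast, fully managed online data warehouse service that removes the resource scalability and elasticity limits of traditional data platforms and minimizes operations and maintenance effort, so you can analyze and process massive amounts of data economically and efficiently, reduce enterprise costs, and keep data secure. With DataWorks you can handle data synchronization, workflow design, data development, management, and O&M in one place. You can also use the algorithm components of the machine learning platform to train models on MaxCompute data. This article demonstrates, for reference, how to import and export MaxCompute data using the client together with DataWorks.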

Prerequisites and environment setup: activating and purchasing the services

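Activate Alibaba Cloud MaxCompute: Activate MaxCompute
Activate Alibaba Cloud DataWorks: Activate DataWorks
For detailed steps on activating MaxCompute, see: Preparations
For detailed steps on activating DataWorks, see: Preparations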

Step By Step

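Procedure
1. Install and configure the MaxCompute client: download the MaxCompute client installation package (GitHub)
2. Log on to the MaxCompute client
3. Log on to the DataWorks console and click the data development entry next to the target workspace
4. Create tables on the DataWorks data development page
5. Import data with Tunnel Upload from the MaxCompute client
6. Verify the import result in DataWorks
7. Run Tunnel Download commands in the MaxCompute client to export data
8. Verify the export result

1. Install and configure the MaxCompute client: Decompress the downloaded installation package to obtain the bin, conf, lib, and plugins folders. Open the conf folder and edit the odps_config.ini file. The file contents are as follows: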

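project_name=<name of the MaxCompute project you created>
access_id=<AccessKey ID of your Alibaba Cloud account or RAM user>
access_key=<AccessKey Secret that corresponds to the AccessKey ID>
end_point=<endpoint of the MaxCompute service>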
log_view_host=http://logview.odps.aliyun.com
https_check=
# confirm threshold for query input size(unit: GB)
data_size_confirm=
# this url is for odpscmd update
update_url=
# download sql results by instance tunnel
use_instance_tunnel=
# the max records when download sql results by instance tunnel
instance_tunnel_max_record=
# IMPORTANT:
#   If leaving tunnel_endpoint untouched, console will try to automatically get one from odps service, which might charge networking fees in some cases.
#   Please refer to Endpoint
# tunnel_endpoint=

# use set.<key>=
# e.g. set.odps.sql.select.output.format=

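2. Log on to the MaxCompute client: On Windows, go to the bin folder under the MaxCompute client installation path and double-click the odpscmd.bat file to start the MaxCompute client.

3. Log on to the DataWorks console and click the data development entry next to the target workspace

4. Create tables on the DataWorks data development page

Create the tables bank_data, bank_data_pt, result_table1, and result_table2.
- Create the non-partitioned table bank_data. Sample statement: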

create table if not exists bank_data
(
 age             BIGINT comment 'age',
 job             STRING comment 'job type',
 marital         STRING comment 'marital status',
 education       STRING comment 'education level',
 credit          STRING comment 'has a credit card',
 housing         STRING comment 'has a housing loan',
 loan            STRING comment 'has a loan',
 contact         STRING comment 'contact method',
 month           STRING comment 'month',
 day_of_week     STRING comment 'day of the week',
 duration        STRING comment 'duration',
 campaign        BIGINT comment 'number of contacts during this campaign',
 pdays           DOUBLE comment 'time since the previous contact',
 previous        DOUBLE comment 'number of earlier contacts with the customer',
 poutcome        STRING comment 'outcome of the previous marketing campaign',
 emp_var_rate    DOUBLE comment 'employment variation rate',
 cons_price_idx  DOUBLE comment 'consumer price index',
 cons_conf_idx   DOUBLE comment 'consumer confidence index',
 euribor3m       DOUBLE comment 'euro deposit interest rate',
 nr_employed     DOUBLE comment 'number of employees',
 fixed_deposit   BIGINT comment 'has a fixed deposit'
);

- Create the partitioned table bank_data_pt and add partitions. Sample statements:

create table if not exists bank_data_pt
(
 age             BIGINT comment 'age',
 job             STRING comment 'job type',
 marital         STRING comment 'marital status',
 education       STRING comment 'education level',
 housing         STRING comment 'has a housing loan',
 loan            STRING comment 'has a loan',
 contact         STRING comment 'contact method',
 month           STRING comment 'month',
 day_of_week     STRING comment 'day of the week',
 duration        STRING comment 'duration',
 campaign        BIGINT comment 'number of contacts during this campaign',
 pdays           DOUBLE comment 'time since the previous contact',
 previous        DOUBLE comment 'number of earlier contacts with the customer',
 poutcome        STRING comment 'outcome of the previous marketing campaign',
 emp_var_rate    DOUBLE comment 'employment variation rate',
 cons_price_idx  DOUBLE comment 'consumer price index',
 cons_conf_idx   DOUBLE comment 'consumer confidence index',
 euribor3m       DOUBLE comment 'euro deposit interest rate',
 nr_employed     DOUBLE comment 'number of employees',
 fixed_deposit   BIGINT comment 'has a fixed deposit'
)partitioned by (credit STRING comment 'has a credit card');

alter table bank_data_pt add if not exists partition (credit='yes') partition (credit='no') partition (credit='unknown');

- Create the non-partitioned table result_table1. Sample statement:

create table if not exists result_table1
(
 education   STRING comment 'education level',
 num         BIGINT comment 'number of people'
);

- Create the non-partitioned table result_table2. Sample statement:

create table if not exists result_table2
(
 education   STRING comment 'education level',
 num         BIGINT comment 'number of people',
 credit      STRING comment 'has a credit card'
);

Run the following command to confirm that the tables exist in the MaxCompute project:

show tables;

Run the following commands to confirm that the table schemas are correct:

-- View the schema of bank_data.
desc bank_data;
-- View the schema of bank_data_pt.
desc bank_data_pt;
-- View the partitions of bank_data_pt.
show partitions bank_data_pt;
-- View the schema of result_table1.
desc result_table1;
-- View the schema of result_table2.
desc result_table2;

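5. Import data with Tunnel Upload from the MaxCompute client
- Data file for the non-partitioned table: banking.txt
- Data files for the partitioned table: banking_nocreditcard.csv, banking_uncreditcard.csv, banking_yescreditcard.csv
- 1. Confirm where the data files are saved. There are two options: place the files directly in the bin directory of the MaxCompute client, in which case the upload path is file_name.extension; or place them in another location, for example the test folder on drive D, in which case the upload path is D:\test\file_name.extension. In this article, the sample file banking.txt is saved in the bin directory of the MaxCompute client, and banking_yescreditcard.csv, banking_uncreditcard.csv, and banking_nocreditcard.csv are saved in the test folder on drive D.
- 2. Run the Tunnel Upload commands to import the data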

tunnel upload banking.txt bank_data;
tunnel upload D:\test\banking_yescreditcard.csv bank_data_pt/credit="yes";
tunnel upload D:\test\banking_uncreditcard.csv bank_data_pt/credit="unknown";
tunnel upload D:\test\banking_nocreditcard.csv bank_data_pt/credit="no";
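
The four commands above share one shape: a local path, a target table, and for the partitioned table a `credit` partition value. As a hedged illustration of that pattern, the Python sketch below generates the same command strings from a mapping. The helper `tunnel_upload_command` and the `UPLOADS` list are hypothetical names introduced here; the sketch only prints text to paste into the client and does not call MaxCompute.

```python
def tunnel_upload_command(local_path, table, credit=None):
    """Build one `tunnel upload` command string.

    `credit` is the value of the `credit` partition column used by
    bank_data_pt in this article; leave it as None for a non-partitioned table.
    """
    target = table if credit is None else f'{table}/credit="{credit}"'
    return f"tunnel upload {local_path} {target};"


# File-to-target mapping taken from the article's example.
UPLOADS = [
    ("banking.txt", "bank_data", None),
    (r"D:\test\banking_yescreditcard.csv", "bank_data_pt", "yes"),
    (r"D:\test\banking_uncreditcard.csv", "bank_data_pt", "unknown"),
    (r"D:\test\banking_nocreditcard.csv", "bank_data_pt", "no"),
]

if __name__ == "__main__":
    for local_path, table, credit in UPLOADS:
        print(tunnel_upload_command(local_path, table, credit))
```

Keeping the mapping in one list makes it easier to spot a file paired with the wrong partition, for example the "uncreditcard" file, which belongs to `credit="unknown"`.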

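6. Verify the import result in DataWorks
- After the import, check that the number of rows in each target table matches the number of rows in the corresponding data file to confirm that all data was imported. In this article, banking.txt contains 41188 rows, and banking_yescreditcard.csv, banking_uncreditcard.csv, and banking_nocreditcard.csv contain 3, 8597, and 32588 rows, respectively.
- Sample statements: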

select count(*) as num1 from bank_data;
select count(*) as num2 from bank_data_pt where credit="yes";
select count(*) as num3 from bank_data_pt where credit="unknown";
select count(*) as num4 from bank_data_pt where credit="no";
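
To compare these query results with the local files, the local side of the check can be scripted. The Python sketch below is one possible approach under stated assumptions: each record occupies one line, the files have no header row, and they are UTF-8 encoded. The function names and the `EXPECTED` dictionary (keyed by the `num1` to `num4` aliases used in the queries) are introduced here for illustration only.

```python
def count_records(lines):
    """Count non-blank lines; assumes one record per line and no header row."""
    return sum(1 for line in lines if line.strip())


def count_file_records(path, encoding="utf-8"):
    """Count records in a local data file (the encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return count_records(handle)


def find_mismatches(expected, actual):
    """Return {name: (expected, actual)} for every count that differs."""
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


# Row counts stated in the article for the four sample files.
EXPECTED = {"num1": 41188, "num2": 3, "num3": 8597, "num4": 32588}

if __name__ == "__main__":
    # Replace these values with the numbers returned by the four queries.
    table_counts = dict(EXPECTED)
    print(find_mismatches(EXPECTED, table_counts) or "all counts match")
```

A quick sanity check on the stated numbers: the three partition files add up to 3 + 8597 + 32588 = 41188 rows, the same as banking.txt.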

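7. Run Tunnel Download commands in the MaxCompute client to export data
- 1. In DataWorks, query the non-partitioned table bank_data and the partitioned table bank_data_pt for the number of single people with a housing loan at each education level, and save the results to result_table1 and result_table2, respectively.
  Sample statements: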

-- Count single people with a housing loan at each education level in the non-partitioned table bank_data and write the result to result_table1.
insert overwrite table result_table1
select education, count(marital) as num
from bank_data
where housing = 'yes' and marital = 'single'
group by education;

-- Count single people with a housing loan at each education level in the partitioned table bank_data_pt and write the result to result_table2.
set odps.sql.allow.fullscan=true;
insert overwrite table result_table2 
select education, count(marital) as num, credit 
from bank_data_pt 
where housing = 'yes' and marital = 'single'
group by education, credit;
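
To make the filter-then-group logic of the two statements concrete, here is a small Python sketch that applies the same rule (housing = 'yes', marital = 'single', then count per group) to a handful of rows. The rows in `SAMPLE_ROWS` are made up for illustration and are not taken from the banking data set, and `count_by` is a hypothetical helper name.

```python
from collections import Counter

# Made-up rows for illustration only; they are not from the real data set.
SAMPLE_ROWS = [
    {"education": "university.degree", "marital": "single", "housing": "yes", "credit": "no"},
    {"education": "university.degree", "marital": "single", "housing": "yes", "credit": "unknown"},
    {"education": "high.school", "marital": "married", "housing": "yes", "credit": "no"},
    {"education": "high.school", "marital": "single", "housing": "no", "credit": "no"},
    {"education": "high.school", "marital": "single", "housing": "yes", "credit": "no"},
]


def count_by(rows, group_columns):
    """Count rows with housing='yes' and marital='single', grouped by columns."""
    counts = Counter()
    for row in rows:
        if row["housing"] == "yes" and row["marital"] == "single":
            counts[tuple(row[column] for column in group_columns)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Mirrors the grouping of the result_table1 query (by education).
    print(count_by(SAMPLE_ROWS, ("education",)))
    # Mirrors the grouping of the result_table2 query (by education and credit).
    print(count_by(SAMPLE_ROWS, ("education", "credit")))
```

Because the filter already requires marital = 'single', counting rows here gives the same number as count(marital) in the SQL, which counts non-NULL values.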

- 2. Query the data written to result_table1 and result_table2

select * from result_table1;
select * from result_table2;

- 3. Use Tunnel Download to export the data in the MaxCompute tables to local files: run the Tunnel Download commands in the MaxCompute client.
  Sample commands:

tunnel download result_table1 result_table1.txt;
tunnel download result_table2 D:\test\result_table2.csv;
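
Once the files are on disk, they can be read back for a quick check. The Python sketch below parses an exported result_table1 file into a dictionary. It assumes the export uses comma-separated fields, has no header row, and follows the table's column order (education, num); adjust the parsing if your download used other options. `parse_result_table1` is a hypothetical helper name.

```python
import csv
import io
from pathlib import Path


def parse_result_table1(text):
    """Return {education: num} from exported text with 'education,num' rows."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: int(row[1]) for row in reader if row}


if __name__ == "__main__":
    # File name from the article's download command, relative to the bin folder.
    export_path = Path("result_table1.txt")
    if export_path.exists():
        result = parse_result_table1(export_path.read_text(encoding="utf-8"))
        for education, num in sorted(result.items()):
            print(education, num)
    else:
        print("export file not found:", export_path)
```

The printed pairs can then be compared by eye with the output of `select * from result_table1;`.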

8. Verify the export result
- Check the exported file for result_table1
- Check the exported file for result_table2
