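1 Hands-on Lab: Offline Data Analysis with EMR
1.1 Create resources and connect to the EMR cluster
Resources allocated for the lab scenario
Log in with the RAM sub-account and find the public IP address of the master node
Connect to the EMR cluster. The terminal embedded in the lab scenario is awkward to use; a local PuTTY terminal can also connect to the master node and be used for the remaining steps.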
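1.2 Import data into the EMR cluster
Create a directory on HDFS and upload the edited file to the HDFS file system
[root@emr-header-1 ~]# hdfs dfs -mkdir -p /data/student
[root@emr-header-1 ~]# vim u.txt
[root@emr-header-1 ~]# hdfs dfs -put u.txt /data/student
List the uploaded file and print its contents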
[root@emr-header-1 ~]# hdfs dfs -ls /data/student
Found 1 items
-rw-r----- 2 root hadoop 2391 2022-02-28 09:30 /data/student/u.txt
[root@emr-header-1 ~]# hdfs dfs -cat /data/student/u.txt
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
Log in to Hive, create the table, and load the data
[root@emr-header-1 ~]# hive
Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.2-1.0.1/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE TABLE emrusers ( userid INT, movieid INT, rating INT, unixtime STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 1.053 seconds
hive> LOAD DATA INPATH '/data/student/u.txt' INTO TABLE emrusers;
Loading data to table default.emrusers
OK
Time taken: 0.459 seconds
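The table definition only tells Hive how to interpret each line when it is read: split on tabs, then coerce each field to the column type. The sketch below imitates that behaviour locally. It is an illustration, not Hive's actual parser; `EmrUser`, `to_int`, and `parse_line` are hypothetical names, and the assumption is that a field which is missing or does not parse as an integer reads back as NULL (modelled here as `None`).

```python
from typing import List, NamedTuple, Optional


class EmrUser(NamedTuple):
    """One row of the emrusers table: three INT columns and one STRING column."""
    userid: Optional[int]
    movieid: Optional[int]
    rating: Optional[int]
    unixtime: Optional[str]


def to_int(value: Optional[str]) -> Optional[int]:
    """Return the field as an int, or None (standing in for NULL) if it is missing or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_line(line: str) -> EmrUser:
    """Split one line on tabs and coerce it to the emrusers column types."""
    fields: List[Optional[str]] = list(line.rstrip("\n").split("\t"))
    # Pad missing trailing fields with None and ignore any extra fields.
    fields = (fields + [None] * 4)[:4]
    return EmrUser(to_int(fields[0]), to_int(fields[1]), to_int(fields[2]), fields[3])
```

A line typed with spaces instead of tabs yields a single field that is not a valid integer, so every column comes back empty. That is worth checking when the file is edited by hand in vim.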
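1.3 Query the table and run statistical SQL statements on it
View the first five rows of the table. This simple fetch is not converted into a MapReduce job, so it returns almost immediately (0.069 seconds).
hive> select * from emrusers limit 5;
OK
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
Time taken: 0.069 seconds, Fetched: 5 row(s)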
Count the total number of rows in the table. The SQL statement is converted into a MapReduce job, so it takes much longer (about 21 seconds).
hive> select count(*) from emrusers;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220228110103_9aec542e-2d15-49de-b0fe-388ee617b755
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1646010854736_0005, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0005/
Kill Command = /usr/lib/hadoop-current/bin/hadoop job -kill job_1646010854736_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-02-28 11:01:11,438 Stage-1 map = 0%, reduce = 0%
2022-02-28 11:01:16,722 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.99 sec
2022-02-28 11:01:22,891 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.28 sec
MapReduce Total cumulative CPU time: 2 seconds 280 msec
Ended Job = job_1646010854736_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.28 sec HDFS Read: 10079 HDFS Write: 103 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 280 msec
OK
106
Time taken: 20.893 seconds, Fetched: 1 row(s)
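The log reports about 21 seconds of wall-clock time but only 2.28 seconds of CPU, so most of the wait is probably job scheduling and startup rather than counting. The counting itself is simple: each mapper counts the rows in its input split, and a single reducer adds up the partial counts. The sketch below is a local imitation of that shape, not what Hive generates; `map_count`, `reduce_count`, `count_rows`, and the split size are hypothetical.

```python
from typing import List


def map_count(split: List[str]) -> int:
    """Mapper side: count the rows in one input split."""
    return sum(1 for _ in split)


def reduce_count(partials: List[int]) -> int:
    """Reducer side: add up the partial counts emitted by all mappers."""
    return sum(partials)


def count_rows(rows: List[str], split_size: int) -> int:
    """Cut the input into splits, count each split, then combine the partial counts."""
    splits = [rows[i:i + split_size] for i in range(0, len(rows), split_size)]
    return reduce_count([map_count(s) for s in splits])
```

In the run above the file is tiny, so there is only one mapper and one reducer; the split step matters only for inputs large enough to be processed in parallel.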
Find the three movies with the highest total rating (the sum of all ratings per movie). The SQL statement is converted into two MapReduce jobs and takes even longer (about 36 seconds).
hive> select movieid, sum(rating) as rat from emrusers group by movieid order by rat desc limit 3;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220228110213_6733e92a-00ed-4d71-b289-5be55aaa26af
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1646010854736_0006, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0006/
Kill Command = /usr/lib/hadoop-current/bin/hadoop job -kill job_1646010854736_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-02-28 11:02:21,418 Stage-1 map = 0%, reduce = 0%
2022-02-28 11:02:25,532 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.0 sec
2022-02-28 11:02:30,664 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.0 sec
MapReduce Total cumulative CPU time: 2 seconds 0 msec
Ended Job = job_1646010854736_0006
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1646010854736_0007, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0007/
Kill Command = /usr/lib/hadoop-current/bin/hadoop job -kill job_1646010854736_0007
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2022-02-28 11:02:38,922 Stage-2 map = 0%, reduce = 0%
2022-02-28 11:02:43,038 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.12 sec
2022-02-28 11:02:48,162 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.14 sec
MapReduce Total cumulative CPU time: 2 seconds 140 msec
Ended Job = job_1646010854736_0007
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.0 sec HDFS Read: 9642 HDFS Write: 2131 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.14 sec HDFS Read: 7869 HDFS Write: 143 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 140 msec
OK
144	13
274	10
304	9
Time taken: 36.114 seconds, Fetched: 3 row(s)
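The log shows two jobs for this query. A plausible reading is that the first job does the grouping and summing, and the second does the global sort and applies the limit. The sketch below mirrors that two-step shape locally. It is an assumption-laden illustration rather than Hive's execution plan: the function names are hypothetical, the sample rows are the nine rows shown in the `hdfs dfs -cat` output, and ties are broken by ascending movieid only to make the result deterministic (Hive gives no such guarantee).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, int, str]  # userid, movieid, rating, unixtime


def stage1_sum_by_movie(rows: Iterable[Row]) -> Dict[int, int]:
    """First step: group by movieid and sum the ratings."""
    totals: Dict[int, int] = defaultdict(int)
    for _userid, movieid, rating, _unixtime in rows:
        totals[movieid] += rating
    return dict(totals)


def stage2_top_n(totals: Dict[int, int], n: int) -> List[Tuple[int, int]]:
    """Second step: order by total rating descending and keep the first n.

    Ties are broken by ascending movieid, which is an assumption of this sketch.
    """
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


SAMPLE_ROWS: List[Row] = [
    (196, 242, 3, "881250949"),
    (186, 302, 3, "891717742"),
    (22, 377, 1, "878887116"),
    (244, 51, 2, "880606923"),
    (166, 346, 1, "886397596"),
    (298, 474, 4, "884182806"),
    (115, 265, 2, "881171488"),
    (253, 465, 5, "891628467"),
    (305, 451, 3, "886324817"),
]

if __name__ == "__main__":
    for movieid, rat in stage2_top_n(stage1_sum_by_movie(SAMPLE_ROWS), 3):
        print(movieid, rat)
```

Because the query sums ratings instead of averaging them, a movie rated often with middling scores can outrank one rated once with a 5. Whether that is the intended meaning of "highest rated" depends on the use case.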
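2 Hands-on Lab: Quickly Building an Intelligent O&M System with Alibaba Cloud Elasticsearch
2.1 Request resources and log in to the Elasticsearch cluster
The resources allocated for the lab scenario are as follows
After logging in with the sub-account, three Elasticsearch clusters are visible
On checking, the resource allocated for this session should be the cluster whose ID starts with es-cn-jpy7
Modify the Kibana configuration, enable private network access, and then access Kibana from the public network.
2.2 Enable automatic index creation
The annoying part of this step is that Dev Tools sits at the very bottom of the left navigation bar, and it is unclear what order the navigation bar follows.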
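These notes do not record the exact command entered in Dev Tools. On a standard Elasticsearch cluster, automatic index creation is controlled by the `action.auto_create_index` cluster setting, so the step presumably amounts to a `PUT _cluster/settings` request. The sketch below builds such a request without sending it. The base URL is a placeholder, authentication is omitted, and `build_auto_create_index_request` is a hypothetical helper, not part of any Alibaba Cloud or Elasticsearch client.

```python
import json
import urllib.request


def build_auto_create_index_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that enables automatic index creation.

    Assumes the standard cluster settings API; base_url is a placeholder.
    """
    body = {"persistent": {"action.auto_create_index": "true"}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_auto_create_index_request("http://localhost:9200")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In Kibana Dev Tools the same thing would be typed as the request line followed by the JSON body; the Python form is only meant to make the method, path, and payload explicit.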
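2.3 Create the Metricbeat collector
After selecting the ECS instance, start the collector
Check the collector status
The collector status is Effective (已生效)
Three collectors were created in total, but only one is running successfully; a collector whose status reads Effective 0/1 has in fact failed to deploy.
View the dashboard
The dashboard shows the ECS process count, CPU usage, system load, and so on.
2.4 Summary
This scenario is fairly difficult. It is unclear why several Elasticsearch clusters appear in the scenario. Collectors can only be created: attempts to delete or restart one fail with an insufficient-permissions error. Two of the collectors that were created failed to deploy, and the lab manual offers no analysis or fix.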
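3 Getting Started with Recommender Systems: Product Recommendation with Collaborative Filtering
Apart from having to switch to the old version because of a version change, this scenario matches the lab manual exactly; even the data and results are identical to those in the manual.
Open the experiment
Check the data
Run the experiment
Run completed
Check the result of the join-1 node, which shows similar items
View Full Table Statistics-1 (全表统计-1), which shows the recommendation results
View Full Table Statistics-2 (全表统计-2), which shows the correlation.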
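The experiment itself is a black box in these notes, so here is a minimal sketch of what item-based collaborative filtering does in general: find items bought by overlapping sets of users, then recommend to each user the items most similar to what they already own. Jaccard similarity is used purely as an assumption, since the notes do not say which measure or parameters the lab component uses, and all names and data below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Purchase = Tuple[str, str]  # (user, item)


def item_similarity(purchases: Iterable[Purchase]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity between items, based on the sets of users who bought them."""
    users_by_item: Dict[str, Set[str]] = defaultdict(set)
    for user, item in purchases:
        users_by_item[item].add(user)
    sims: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(users_by_item), 2):
        common = users_by_item[a] & users_by_item[b]
        if common:
            score = len(common) / len(users_by_item[a] | users_by_item[b])
            sims[(a, b)] = score
            sims[(b, a)] = score
    return sims


def recommend(purchases: Iterable[Purchase], user: str, top_n: int = 3) -> List[str]:
    """Rank items the user does not own by summed similarity to items they do own."""
    purchases = list(purchases)
    owned = {item for u, item in purchases if u == user}
    scores: Dict[str, float] = defaultdict(float)
    for (a, b), score in item_similarity(purchases).items():
        if a in owned and b not in owned:
            scores[b] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    demo = [("u1", "A"), ("u1", "B"), ("u2", "A"),
            ("u2", "B"), ("u2", "C"), ("u3", "A")]
    print(recommend(demo, "u3"))
```

In this toy data, user u3 owns only item A, so the items that co-occur with A are recommended, with the more strongly co-occurring one first. The similar-items and recommendation tables in the experiment play the same two roles as `item_similarity` and `recommend` here.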