Request&Response
Chapter 1 Request

1. Learning objectives
Understand the concept of Request
Understand the parts that make up a Request
Master using Request to read the request-line information
Master using Request to read the request-header information
Master using Request to get request parameters
Master fixing garbled request parameters
Master the Request scope object
Master request forwarding

2. Content

2.1 Request overview
2.1.1 The concept of Request
The Servlet API defines an HttpServletRequest interface, which extends the ServletRequest interface and is used to encapsulate the HTTP request message. Since an HTTP request message consists of a request line, request headers and a request body, the HttpServletRequest interface defines methods for obtaining the request line, the request headers and the request body.
In our own words: Request is an object on the server that encapsulates the request line, request headers and request body of the HTTP request.
2.1.2 The parts of a Request
Request line: contains the request method, the requested URL and the HTTP protocol version
Request headers: a set of key-value pairs, each with its own meaning, used by the client to pass information to the server
Request body: a POST request has a request body carrying the POST parameters; a GET request has no request body
2.1.3 What Request is used for
Obtaining the three parts of the HTTP request (line, headers, body)
Performing request forwarding
Acting as the request scope object for storing and retrieving data

2.2 Using Request to read the HTTP request
2.2.1 APIs for the request line
getMethod(): get the request method
getContextPath(): get the context path of the current application
getRequestURI(): get the requested address, without the host name
getRequestURL(): get the requested address, including the host name
public class ServletDemo01 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
//Use the request object to read the request-line information:
//1. Get the request method (occasionally useful)
String method = request.getMethod();
//System.out.println("Request method: " + method);
//2. Get the request URL (Uniform Resource Locator), e.g. http://localhost:8080/requestDemo/demo01
String url = request.getRequestURL().toString();
//System.out.println("The URL of this request is: " + url);
//3. Get the request URI (Uniform Resource Identifier): the URL with the server prefix "http://localhost:8080" omitted
String uri = request.getRequestURI();
System.out.println(uri);
}
}

2.2.2 APIs for the request headers
getHeader(String name): get the value of the request header with the given name
public class ServletDemo02 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
//Get the value of a request header by its name
//Goal: read the request header named "user-agent"
//The user-agent header carries information about the client browser
String header = request.getHeader("user-agent");
System.out.println("The user-agent request header is: " + header);
}
}

2.2.3 Getting request parameters
2.2.3.1 What request parameters are
Request parameters are key-value data carried from the client to the server, for example "username=aobama&password=123456".
2.2.3.2 How the client carries request parameters
Parameters appended to the URL, for example http://localhost:8080/app/hellServlet?username=汤姆
Parameters carried by a form
Parameters carried by an Ajax request (covered later)
2.2.3.3 APIs for getting request parameters
request.getParameterMap() -> Map<String, String[]>: returns all parameters of the current request as key-value pairs in a Map
request.getParameter("parameter name") -> String: returns one value for the given parameter name
request.getParameterValues("parameter name") -> String[]: returns all values for the given parameter name
request.getParameterNames() -> Enumeration: returns the names of all parameters of the current request
2.2.3.4 Example code
HTML code
<!-- Form for testing request parameters -->
<form action="/orange/ParamServlet" method="post">
<!-- Single-line text box -->
<!-- An input tag with type="text" renders a single-line text box -->
<!-- The name attribute defines the name of the request parameter -->
<!-- If the value attribute is set, it becomes the default value of the text box -->
Signature: <input type="text" name="signal" value="default value of the text box" /><br/>
<!-- Password box -->
<!-- An input tag with type="password" renders a password box -->
<!-- What the user types into a password box is not displayed in plain text -->
Password: <input type="password" name="secret" /><br/>
<!-- Radio buttons -->
<!-- An input tag with type="radio" renders a radio button -->
<!-- Radio buttons with the same name attribute are treated by the browser as one group, and only one option in a group can be selected -->
<!-- When the form is submitted, what is actually sent to the server is the name attribute and the value attribute -->
<!-- Use checked="checked" to make an option selected by default -->
Pick your favorite season:
<input type="radio" name="season" value="spring" />Spring
<input type="radio" name="season" value="summer" checked="checked" />Summer
<input type="radio" name="season" value="autumn" />Autumn
<input type="radio" name="season" value="winter" />Winter
<br/><br/>
Your favorite animal is:
<input type="radio" name="animal" value="tiger" />Land Rover
<input type="radio" name="animal" value="horse" checked="checked" />BMW
<input type="radio" name="animal" value="cheetah" />Jaguar
<br/>
<!-- Checkboxes -->
<!-- An input tag with type="checkbox" renders a checkbox -->
<!-- When several checkboxes are selected and the form is submitted, one parameter name carries multiple values -->
Your favorite team is:
<input type="checkbox" name="team" value="Brazil"/>Brazil
<input type="checkbox" name="team" value="German" checked="checked"/>Germany
<input type="checkbox" name="team" value="France"/>France
<input type="checkbox" name="team" value="China" checked="checked"/>China
<input type="checkbox" name="team" value="Italian"/>Italy
<br/>
<!-- Drop-down list -->
<!-- The select tag defines the drop-down list as a whole; set the name attribute on the select tag -->
Your favorite sport is:
<select name="sport">
<!-- The option tag defines each item of the drop-down list -->
<!-- The value attribute of option is what gets submitted to the server; the body of the option tag is what the user sees -->
<option value="swimming">Swimming</option>
<option value="running">Running</option>
<!-- Use selected="selected" on an option to make it selected by default -->
<option value="shooting" selected="selected">Shooting</option>
<option value="skating">Skating</option>
</select>
<br/>
<br/><br/>
<!-- Hidden field -->
<!-- An input tag with type="hidden" renders a hidden field -->
<!-- A hidden field is not displayed on the page; it holds data that should be submitted to the server but not shown to the user -->
<input type="hidden" name="userId" value="234654745" />
<!-- Multi-line text box -->
Self-introduction: <textarea name="desc">default value of the multi-line text box</textarea>
<br/>
<!-- Plain button -->
<button type="button">Plain button</button>
<!-- Reset button -->
<button type="reset">Reset button</button>
<!-- Submit button -->
<button type="submit">Submit button</button>
</form>
Java code
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// Get the Map containing all request parameters
Map<String, String[]> parameterMap = request.getParameterMap();
// Iterate over the Map of all request parameters
Set<String> keySet = parameterMap.keySet();
for (String key : keySet) {
String[] values = parameterMap.get(key);
System.out.println(key + "=" + Arrays.asList(values));
}
System.out.println("---------------------------");
// Get a specific parameter value by parameter name
// getParameter(): read the radio-button parameter
String season = request.getParameter("season");
System.out.println("season = " + season);
// getParameter(): read the checkbox parameter
// only the first of the multiple values is returned
String team = request.getParameter("team");
System.out.println("team = " + team);
// getParameterValues(): read the radio-button parameter
String[] seasons = request.getParameterValues("season");
System.out.println("Arrays.asList(seasons) = " + Arrays.asList(seasons));
// getParameterValues(): read the checkbox parameter
String[] teams = request.getParameterValues("team");
System.out.println("Arrays.asList(teams) = " + Arrays.asList(teams));
}

2.4 Fixing garbled request parameters
2.4.1 Why request parameters get garbled
When the client sends request parameters to the server it has to encode them: strings are encoded into bytes so they can travel over the network, and the server must decode those bytes to recover the actual parameters. As long as the character set the client uses to encode and the one the server uses to decode are the same (and that character set can represent the characters involved), no garbling occurs. Garbling happens when the wrong character set is used, or when the client and the server use different character sets.
2.4.2 How to fix it
We are using Tomcat 8 or later, so we do not need to worry about garbled GET parameters: Tomcat 8+ already handles GET request encoding in its default configuration. We only need to deal with POST requests.
To fix garbled POST parameters, call request.setCharacterEncoding("UTF-8") before reading any request parameter.

2.5 Request forwarding
2.5.1 What request forwarding is
Request forwarding jumps from one resource to another; during this process the client does not issue a new request.
2.5.2 A first request-forwarding example
2.5.2.1 Goal
Jump from ServletDemo01 to ServletDemo02 using request forwarding.
2.5.2.2 The request-forwarding API
request.getRequestDispatcher("path").forward(request,response);
2.5.2.3 Example code
Code of ServletDemo01
public class ServletDemo01 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
System.out.println("ServletDemo01执行了...")
//请求转发跳转到ServletDemo02
request.getRequestDispatcher("/demo02").forward(request, response);
}
}
Code of ServletDemo02
public class ServletDemo02 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
System.out.println("ServletDemo02执行了...")
}
}

2.5.3 Characteristics of request forwarding
The jump is initiated by the server; the browser issues only one request during the whole process
Request forwarding can only jump to resources inside the current project, but it can jump to resources under WEB-INF
Request forwarding does not change the address shown in the browser's address bar

2.6 The request scope object
2.6.1 The request scope
We have already seen the application (global) scope, which is shared by all dynamic resources of the whole project; the request scope is shared only by the dynamic resources involved in a single request.
2.6.2 The request scope API
Store data in the request scope: request.setAttribute(key, value)
Read data from the request scope: request.getAttribute(key)
2.6.3 Request scope example
2.6.3.1 Goal
In ServletDemo01, store the username value "aobama" into the request scope; then, in ServletDemo02, read that value back from the request scope.
2.6.3.2 Prerequisite for using the request scope
The request scope must be used together with request forwarding: because its range is a single request, sharing data between two dynamic resources through the request scope requires a request-forwarding jump between them.
2.6.3.3 Example code
Code of ServletDemo01
public class ServletDemo01 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
System.out.println("ServletDemo01执行了...")
String username = "aobama";
//将username存储到request域对象中
request.setAttribute("name",username);
//请求转发跳转到ServletDemo02
request.getRequestDispatcher("/demo02").forward(request, response);
}
}
Code of ServletDemo02
public class ServletDemo02 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String username = (String)request.getAttribute("name");
System.out.println("在ServletDemo02中获取username:"+username)
}
}

2.7 JavaBean
2.7.1 What a JavaBean is
A JavaBean is a reusable component written in Java. In our projects JavaBeans are mainly used to hold data in memory and to provide methods so callers can access that data.
2.7.2 Rules for writing a JavaBean
The class must be public
It must have a no-argument constructor
Fields are private (modified with private)
Every private field has a corresponding getter and setter
It is recommended to override toString() to make objects easy to print
Use wrapper types instead of primitive types
2.7.3 JavaBean example
public class User {
private Integer id;
private String username;
public User() {
}
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
public String getUsername() {
return username;
}
public void setUsername(String username) {
this.username = username;
}
@Override
public String toString() {
return "User{" +
"id=" + id +
", username='" + username + ''' +
'}';
}
}

2.8 Using BeanUtils to fill a JavaBean from a Map
2.8.1 Storing data in a JavaBean vs. storing it in a Map
2.8.1.1 Pros and cons of storing data in a Map
Pros: more flexible than a JavaBean, easy to extend, low coupling; simple to write, little code.
Cons: with a JavaBean, data types are checked at compile time and errors are reported immediately, whereas with a Map type problems only surface at the SQL layer; a misspelled Map key is likewise only discovered at the SQL layer, which makes debugging harder, while a JavaBean catches such mistakes at compile time; extra or wrongly passed Map values are also only caught at the SQL layer; and from a method signature alone you cannot tell how many parameters the Map holds, their types, or what each one means, so a later maintainer who needs to add a parameter in a multi-layer project has to read through every layer to find out what is being passed.
2.8.1.2 Pros and cons of storing data in a JavaBean
Pros: a good expression of object-oriented design with a clear data structure, which helps team development and later maintenance; the code is robust enough to rule out compile-time errors; every drawback of storing data in a Map is an advantage of a JavaBean.
Cons: more code; it takes time to write the JavaBean class.
2.8.2 Which one we choose
Normally we store in-memory data in JavaBeans; only for data so simple that writing an extra JavaBean class is not worth it do we fall back to a Map.
2.8.3 Using BeanUtils
2.8.3.1 What it does
Fills the data from a Map into a JavaBean object.
2.8.3.2 The API
BeanUtils.populate(JavaBean object, Map object);
2.8.3.3 Steps
Import the corresponding jar
Call the populate method of the BeanUtils class with the corresponding arguments

Chapter 2 Response

1. Learning objectives
Understand the concept of Response
Understand the parts that make up a Response
Master using Response to write a string to the client
Master fixing garbled output when writing strings
Master redirects
Understand the difference between a redirect and request forwarding

2. Content

2.1 Response overview
2.1.1 The concept of Response
The Servlet API defines an HttpServletResponse interface (the type of the second parameter of doGet/doPost), which extends the ServletResponse interface and encapsulates the HTTP response message. Since an HTTP response consists of a response line, response headers and a response body, the HttpServletResponse interface defines methods for sending the status code, the response headers and the response body to the client.
In our own words: Response is an object on the server that encapsulates the response line, headers and body to be sent back to the client.
2.1.2 The parts of a Response
Response line: contains the status code, the status message and the HTTP protocol version
Response headers: a set of key-value pairs, each carrying its own piece of information for the client
Response body: the text or images shown on the client, or content for the client to download or play
2.1.3 What Response is used for
Setting the response line, mainly the status code
Setting the response headers
Setting the response body

2.2 Using Response to write a string to the client
2.2.1 The API for writing a string
//1. Get the character output stream
PrintWriter writer = response.getWriter();
//2. Write the content
writer.write("hello world");2.2.2 响应数据乱码问题由于服务器端在输出内容的时候进行编码使用的字符集和客户端进行解码的时候使用的字符集不一致,所以会发生响应数据乱码问题。 我们解决响应数据乱码问题只需要在获取字符输出流之前,执行如下代码就可以了:response.setContentType("text/html;charset=UTF-8");2.3 使用response向客户端响应一个文件2.3.1 使用字节输出流的APIServletOutputStream os = response.getOutputStream();2.3.2 案例目标在ServletDemo01中使用response向浏览器输出一张图片2.3.3 案例代码package com.atguigu.servlet;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
/**
* @author Leevi
* Date 2021-05-11 11:19
*/
public class ServletDemo01 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
//Fix garbled response content
//response.setContentType("text/html;charset=UTF-8");
//System.out.println("hello world");
//response is the content the server will send back to the client; it has three parts: response line, headers, body
//1. Set the response line:
//set the status code; normally we do not need to, because the server sets it automatically
//response.setStatus(404);
//2. Set response headers: setHeader("name","value");
//3. Set the response body: the response body is the data displayed in the browser
//3.1 Write text to the browser through the character stream
//response.getWriter().write("<h1>你好世界...</h1>");
//3.2 Send an image to the browser through the byte stream
//first set the mime-type of the response data
//Step 1: read the image with a byte input stream
//use ServletContext to get the real path of the resource
String realPath = getServletContext().getRealPath("img/mm.jpg");
InputStream is = new FileInputStream(realPath);
//Step 2: write the image to the browser with the byte output stream
ServletOutputStream os = response.getOutputStream();
//read and write in a loop
int len = 0;
byte[] buffer = new byte[1024];
while ((len = is.read(buffer)) != -1){
os.write(buffer,0,len);
}
os.close();
is.close();
}
}

2.4 Redirects
2.4.1 What a redirect is
A redirect jumps from one resource of the project to another; during this process the client issues a new request.
2.4.2 A first redirect example
2.4.2.1 Goal
Jump from ServletDemo01 to ServletDemo02 with a redirect.
2.4.2.2 The redirect API
response.sendRedirect("path");
2.4.2.3 Example code
Code of ServletDemo01
public class ServletDemo01 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
System.out.println("ServletDemo01执行了...")
//请求转发跳转到ServletDemo02
response.sendRedirect("/app/demo02");
}
}
Code of ServletDemo02
public class ServletDemo02 extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
System.out.println("在ServletDemo02执行了.....")
}
}

2.4.3 Characteristics of a redirect
The jump is initiated by the browser; the browser issues two requests during the whole process
A redirect can jump to a resource on any server, but it cannot access resources under WEB-INF
A redirect changes the address in the browser's address bar to the target path

2.5 Redirect vs. request forwarding
A redirect makes the browser issue a new request; request forwarding does not
A redirect can reach any resource on the internet; request forwarding can only reach resources of the current project
A redirect cannot access resources under the project's WEB-INF; request forwarding can
The resource that issues a redirect and the target resource are not in the same request, so the request scope cannot be used across a redirect; the resource that forwards and the target resource are in the same request, so the request scope can be used across request forwarding

2.6 A complete example
2.6.1 Goal
2.6.2 Implementation steps
1. Create a new module
2. Copy in the required pieces: three jars (the MySQL driver, the Druid connection pool, the DbUtils utility) and the configuration file
3. Write the HTML page
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Login Page</title>
</head>
<body>
<form action="/web0603/login" method="post">
Username<input type="text" name="username"/><br/>
Password<input type="text" name="password"/><br/>
<input type="submit" value="Login">
</form>
</body>
</html>
4. Prepare the data
CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) DEFAULT NULL,
`password` varchar(50) DEFAULT NULL,
`nickname` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
insert into `t_user`(`id`,`username`,`password`,`nickname`) values (1,'jay','123456','周杰棍');
5. Write the LoginServlet code
package com.atguigu.servlet;
import com.atguigu.bean.User;
import com.atguigu.utils.JDBCTools;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.BeanHandler;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
/**
* @author Leevi
* Date 2021-05-11 14:30
*/
public class LoginServlet extends HttpServlet {
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
doGet(request, response);
}
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
//Fix garbled request parameters and response data
request.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
//Get the request parameters
String username = request.getParameter("username");
String password = request.getParameter("password");
//Use DbUtils to run the query against the database
String sql = "select * from t_user where username=? and password=?";
QueryRunner queryRunner = new QueryRunner(JDBCTools.getDataSource());
try {
User user = queryRunner.query(sql, new BeanHandler<>(User.class), username, password);
if (user != null) {
//A matching record was found
//Tell the browser the login succeeded
response.getWriter().write("Login successful!!!");
return;
}
//Otherwise the login failed
throw new RuntimeException();
} catch (Exception e) {
e.printStackTrace();
response.getWriter().write("Login failed!!!");
}
}
}
6. Configure the mapping path of LoginServlet (a sketch follows below)
7. Deploy the module
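As a reference for step 6, here is a minimal web.xml sketch of what the mapping could look like; it assumes the servlet class com.atguigu.servlet.LoginServlet shown above and the /login URL pattern used by the login form (on Servlet 3.0+ an @WebServlet("/login") annotation on the class would work just as well):
<!-- inside WEB-INF/web.xml -->
<servlet>
    <servlet-name>LoginServlet</servlet-name>
    <servlet-class>com.atguigu.servlet.LoginServlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>LoginServlet</servlet-name>
    <url-pattern>/login</url-pattern>
</servlet-mapping>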
[Big Data Development & Operations Solutions] Hadoop 2.7.6 + Spark 2.4.4 + Scala 2.11.12 + Hudi 0.5.2 single-node pseudo-distributed installation
Notes
1. The base Hadoop environment used here comes from another article of mine; this document adds the Spark and Hudi installation and deployment on top of that base environment (see the base environment deployment document).
2. The configuration in this article is fairly simple. I hit a few pitfalls that are not written up here, so that beginners like me who follow this document do not go down the wrong path; the article is written in detail, and if you follow it step by step you should not run into problems. If you need to build the base big-data environment first, see the Hadoop environment deployment document mentioned above.
3. Introductions to Spark and Hudi are not repeated here; there is plenty of material online and in the official documentation. Download links for every installation package used in this article are provided so you can download exactly the same versions for free.
4. The versions of the Hadoop-family components in my test environment after this installation:
Software            Version
Hadoop              2.7.6
Mysql               5.7
Hive                2.3.2
Hbase               1.4.9
Spark               2.4.4
Hudi                0.5.2
JDK                 1.8.0_151
Scala               2.11.12
OGG for bigdata     12.3
Kylin               2.4
Kafka               2.11-1.1.1
Zookeeper           3.4.6
Oracle Linux        6.8 x64

1. Install the Scala that Spark depends on
Spark releases are built against Scala 2.11 (only Spark 2.4.2 is built with Scala 2.12). Hudi officially uses Spark 2.4.4, and that Spark reports "Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)", so we download Scala 2.11.12.
1.1 Download and unpack Scala
Download the Linux tarball from the Scala download page. Create a directory named scala under /usr on the server and upload the tarball into it:
[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.11.12.tgz
[root@hadoop scala]# tar -zxvf scala-2.11.12.tgz
[root@hadoop scala]# ls
scala-2.11.12 scala-2.11.12.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# cd scala-2.11.12/
[root@hadoop scala-2.11.12]# pwd
/usr/scala/scala-2.11.12
1.2 Configure environment variables
Edit /etc/profile and add:
export SCALA_HOME=/usr/scala/scala-2.11.12
Then append the following to the PATH variable in the same file:
${SCALA_HOME}/bin
After these additions my /etc/profile looks like this:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.11.12
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib
Save and exit, then source the file so the environment variables take effect:
[root@hadoop ~]# source /etc/profile
1.3 Verify Scala
[root@hadoop scala-2.11.12]# scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

2. Download and unpack Spark
2.1 Download Spark
Download the Spark 2.4.4 package from the Spark download page.
2.2 Unpack Spark
Create a spark directory under /hadoop to hold Spark.
[root@hadoop scala-2.11.12]# cd /hadoop/
[root@hadoop hadoop]# mkdir spark
[root@hadoop hadoop]# cd spark/
Upload the installation package to the spark directory with Xftp.
[root@hadoop spark]# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.4-bin-hadoop2.7 spark-2.4.4-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# mv spark-2.4.4-bin-hadoop2.7/* .
[root@hadoop spark]# ls
bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 yarn

3. Spark configuration
3.1 Configure environment variables
Edit /etc/profile and add:
export SPARK_HOME=/hadoop/spark
Then edit the PATH variable in the same file and add:
${SPARK_HOME}/bin
After these changes my /etc/profile is:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.11.12
export SPARK_HOME=/hadoop/spark
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${SPARK_HOME}/sbin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib
After editing, run source /etc/profile so the environment variables take effect.
3.2 Configure the parameter files
Go to the conf directory:
[root@hadoop conf]# pwd
/hadoop/spark/conf
Copy the template configuration file and rename it:
[root@hadoop conf]# cp spark-env.sh.template spark-env.sh
[root@hadoop conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh spark-env.sh.template
Edit spark-env.sh and add the following settings (adjust the paths to your own environment):
export SCALA_HOME=/usr/scala/scala-2.11.12
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G
Run source /etc/profile again so everything takes effect.
3.3 Create the slaves file
Create a slaves file from the template that ships with Spark:
[root@hadoop conf]# pwd
/hadoop/spark/conf
[root@hadoop conf]# cp slaves.template slaves

4. Start Spark
Spark relies on the distributed file system provided by Hadoop, so make sure Hadoop is running normally before starting Spark:
[root@hadoop hadoop]# jps
23408 RunJar
23249 JobHistoryServer
23297 RunJar
24049 Jps
22404 DataNode
22774 ResourceManager
23670 Kafka
22264 NameNode
22889 NodeManager
23642 QuorumPeerMain
22589 SecondaryNameNode
With Hadoop running normally, execute the following on hserver1 (the Hadoop namenode, which is also the Spark master node):
[root@hadoop hadoop]# cd /hadoop/spark/sbin
[root@hadoop sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.master.Master --host hadoop --port 7077 --webui-port 8080
========================================
20/03/30 22:42:27 INFO master.Master: Started daemon with process name: 24079@hadoop
20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 22:42:27 WARN master.MasterArguments: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST
20/03/30 22:42:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 22:42:27 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 22:42:27 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 22:42:27 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 22:42:27 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 22:42:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 22:42:27 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
20/03/30 22:42:27 INFO master.Master: Starting Spark master at spark://hadoop:7077
20/03/30 22:42:27 INFO master.Master: Running Spark version 2.4.4
20/03/30 22:42:28 INFO util.log: Logging initialized @1497ms
20/03/30 22:42:28 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 22:42:28 INFO server.Server: Started @1560ms
20/03/30 22:42:28 INFO server.AbstractConnector: Started ServerConnector@6182300a{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
20/03/30 22:42:28 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f1f0276{/app,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f1af444{/app/json,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@259b10d3{/,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6fc2f56f{/json,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37a28407{/static,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e99fa57{/app/kill,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66be5bb8{/driver/kill,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO ui.MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://hadoop:8080
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b2c0980{/metrics/master/json,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4ac1749f{/metrics/applications/json,null,AVAILABLE,@Spark}
20/03/30 22:42:28 INFO master.Master: I have been elected leader! New state: ALIVE
20/03/30 22:42:31 INFO master.Master: Registering worker 192.168.1.66:39384 with 8 cores, 4.6 GB RAM
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077
========================================
20/03/30 22:42:29 INFO worker.Worker: Started daemon with process name: 24173@hadoop
20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 22:42:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 22:42:30 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 22:42:30 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 22:42:30 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 22:42:30 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 22:42:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 22:42:30 INFO util.Utils: Successfully started service 'sparkWorker' on port 39384.
20/03/30 22:42:30 INFO worker.Worker: Starting Spark worker 192.168.1.66:39384 with 8 cores, 4.6 GB RAM
20/03/30 22:42:30 INFO worker.Worker: Running Spark version 2.4.4
20/03/30 22:42:30 INFO worker.Worker: Spark home: /hadoop/spark
20/03/30 22:42:31 INFO util.log: Logging initialized @1682ms
20/03/30 22:42:31 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 22:42:31 INFO server.Server: Started @1758ms
20/03/30 22:42:31 INFO server.AbstractConnector: Started ServerConnector@3d598dff{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
20/03/30 22:42:31 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5099c1b0{/logPage,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64348087{/logPage/json,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46dcda1b{/,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1617f7cc{/json,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56e77d31{/static,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@643123b6{/log,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081
20/03/30 22:42:31 INFO worker.Worker: Connecting to master hadoop:7077...
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cf30aaa{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 22:42:31 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 36 ms (0 ms spent in bootstraps)
20/03/30 22:42:31 INFO worker.Worker: Successfully registered with master spark://hadoop:7077
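As an optional sanity check, the standalone Master and Worker daemons should now show up in jps alongside the Hadoop processes listed earlier; a quick way to look for them is something like:
[root@hadoop sbin]# jps | grep -E 'Master|Worker'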
The startup looks fine. Open the web UI at http://192.168.1.66:8080/.

5. Run the Pi example shipped with Spark
Here we simply run the Pi-calculation demo in local mode. Follow the steps below.
[root@hadoop sbin]# cd /hadoop/spark/
[root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.4.jar
20/03/30 22:45:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 22:45:59 INFO spark.SparkContext: Running Spark version 2.4.4
20/03/30 22:45:59 INFO spark.SparkContext: Submitted application: Spark Pi
20/03/30 22:45:59 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 22:45:59 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 22:45:59 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 22:45:59 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 22:45:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 22:45:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 39352.
20/03/30 22:45:59 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/30 22:45:59 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/30 22:45:59 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/30 22:45:59 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/30 22:45:59 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-63bf7c92-8908-4784-8e16-4c6ef0c93dc0
20/03/30 22:45:59 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/30 22:45:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/30 22:46:00 INFO util.log: Logging initialized @2066ms
20/03/30 22:46:00 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 22:46:00 INFO server.Server: Started @2179ms
20/03/30 22:46:00 INFO server.AbstractConnector: Started ServerConnector@3abd581e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 22:46:00 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@36dce7ed{/jobs,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a1ebcff{/jobs/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19868320{/jobs/job,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c20be82{/jobs/job/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@13c612bd{/stages,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ef41c66{/stages/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b739528{/stages/stage,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/stages/stage/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/stages/pool,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d526ad9{/stages/pool/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/storage,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/storage/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/storage/rdd,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/storage/rdd/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11ee02f8{/environment,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4102b1b1{/environment/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/executors,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/executors/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/executors/threadDump,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/static,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@467f77a5{/,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bb9aa43{/api,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66b72664{/jobs/job/kill,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a34b7b8{/stages/stage/kill,null,AVAILABLE,@Spark}
20/03/30 22:46:00 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040
20/03/30 22:46:00 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.4.jar at spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar with timestamp 1585579560287
20/03/30 22:46:00 INFO executor.Executor: Starting executor ID driver on host localhost
20/03/30 22:46:00 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38875.
20/03/30 22:46:00 INFO netty.NettyBlockTransferService: Server created on hadoop:38875
20/03/30 22:46:00 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/30 22:46:00 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 38875, None)
20/03/30 22:46:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:38875 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 38875, None)
20/03/30 22:46:00 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 38875, None)
20/03/30 22:46:00 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 38875, None)
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f8e0cee{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 22:46:01 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Parents of final stage: List()
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Missing parents: List()
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/03/30 22:46:01 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
20/03/30 22:46:01 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 366.3 MB)
20/03/30 22:46:01 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:38875 (size: 1256.0 B, free: 366.3 MB)
20/03/30 22:46:01 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/03/30 22:46:01 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/30 22:46:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/03/30 22:46:01 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/30 22:46:01 INFO executor.Executor: Fetching spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar with timestamp 1585579560287
20/03/30 22:46:01 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:39352 after 45 ms (0 ms spent in bootstraps)
20/03/30 22:46:01 INFO util.Utils: Fetching spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar to /tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839/userFiles-86767584-1e78-45f2-a9ed-8ac4360ab170/fetchFileTemp2974211155688432975.tmp
20/03/30 22:46:01 INFO executor.Executor: Adding file:/tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839/userFiles-86767584-1e78-45f2-a9ed-8ac4360ab170/spark-examples_2.11-2.4.4.jar to class loader
20/03/30 22:46:01 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/03/30 22:46:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/03/30 22:46:01 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/30 22:46:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 308 ms on localhost (executor driver) (1/2)
20/03/30 22:46:01 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/03/30 22:46:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 31 ms on localhost (executor driver) (2/2)
20/03/30 22:46:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/03/30 22:46:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.606 s
20/03/30 22:46:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.703911 s
Pi is roughly 3.1386756933784667
20/03/30 22:46:01 INFO server.AbstractConnector: Stopped Spark@3abd581e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 22:46:01 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040
20/03/30 22:46:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/30 22:46:01 INFO memory.MemoryStore: MemoryStore cleared
20/03/30 22:46:01 INFO storage.BlockManager: BlockManager stopped
20/03/30 22:46:01 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/03/30 22:46:01 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/30 22:46:01 INFO spark.SparkContext: Successfully stopped SparkContext
20/03/30 22:46:01 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 22:46:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e019897d-3160-4bb1-ab59-f391e32ec47a
20/03/30 22:46:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839
You can see the line "Pi is roughly 3.1386756933784667" in the output: the value of Pi has been printed. The demo above only ran in single-machine local mode; to run it in cluster mode, keep reading.

6. Run the Pi program in yarn-cluster mode
Go to the Spark installation directory and run the Pi demo in yarn-cluster mode:
[root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster examples/jars/spark-examples_2.11-2.4.4.jar
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
20/03/30 22:47:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 22:47:48 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
20/03/30 22:47:48 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
20/03/30 22:47:48 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/30 22:47:48 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/03/30 22:47:48 INFO yarn.Client: Setting up container launch context for our AM
20/03/30 22:47:48 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/30 22:47:48 INFO yarn.Client: Preparing resources for our AM container
20/03/30 22:47:48 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/30 22:47:51 INFO yarn.Client: Uploading resource file:/tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6/__spark_libs__3389017089811757919.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585579247054_0001/__spark_libs__3389017089811757919.zip
20/03/30 22:47:59 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.4.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585579247054_0001/spark-examples_2.11-2.4.4.jar
20/03/30 22:47:59 INFO yarn.Client: Uploading resource file:/tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6/__spark_conf__559264393694354636.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585579247054_0001/__spark_conf__.zip
20/03/30 22:47:59 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 22:47:59 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 22:47:59 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 22:47:59 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 22:47:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 22:48:01 INFO yarn.Client: Submitting application application_1585579247054_0001 to ResourceManager
20/03/30 22:48:01 INFO impl.YarnClientImpl: Submitted application application_1585579247054_0001
20/03/30 22:48:02 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:02 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1585579681188
final status: UNDEFINED
tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/
user: root
20/03/30 22:48:03 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:04 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:05 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:06 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:07 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:08 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:09 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:11 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:12 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:13 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:14 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:15 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:16 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:17 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:19 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:20 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:21 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:22 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:23 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:24 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:25 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:26 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:27 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:28 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED)
20/03/30 22:48:29 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:29 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: hadoop
ApplicationMaster RPC port: 37844
queue: default
start time: 1585579681188
final status: UNDEFINED
tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/
user: root
20/03/30 22:48:30 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:31 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:32 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:33 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:34 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:35 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:36 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:37 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:38 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:39 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:40 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:41 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:42 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:43 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:44 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:45 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING)
20/03/30 22:48:46 INFO yarn.Client: Application report for application_1585579247054_0001 (state: FINISHED)
20/03/30 22:48:46 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: hadoop
ApplicationMaster RPC port: 37844
queue: default
start time: 1585579681188
final status: SUCCEEDED
tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/
user: root
20/03/30 22:48:46 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 22:48:46 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-4c243c24-9489-4c8a-a1bc-a6a9780615d6
20/03/30 22:48:46 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6
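Since yarn-cluster mode does not print the result to the driver console (as the note below explains), one way to grab it from the command line, assuming YARN log aggregation is enabled, is to pull the aggregated container logs and grep for the result line, for example:
[root@hadoop spark]# yarn logs -applicationId application_1585579247054_0001 | grep "Pi is roughly"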
Note that in yarn-cluster mode the result is not printed to the console; it is written to the logs on the Hadoop cluster. How do we view it? The output above contains the address "tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/". Open it, click into "logs", and check the stdout content: the Pi result is printed there.
A few frequently used commands:
Start Spark:
./sbin/start-all.sh
Start Hadoop and Spark:
./starths.sh
To stop, replace "start" with "stop" in the commands above.

7. Configure Spark to read Hive tables
Operating on tables inside Hive goes through MapReduce, which is relatively slow, so this section describes how to read Hive tables into memory with Spark for computation. First, copy $HIVE_HOME/conf/hive-site.xml into $SPARK_HOME/conf so Spark can pick up the Hive configuration:
[root@hadoop spark]# pwd
/hadoop/spark
[root@hadoop spark]# cp $HIVE_HOME/conf/hive-site.xml conf/
[root@hadoop spark]# chmod 777 conf/hive-site.xml
[root@hadoop spark]# cp /hadoop/hive/lib/mysql-connector-java-5.1.47.jar jars/
Enter the interactive shell with spark-shell:
[root@hadoop spark]# /hadoop/spark/bin/spark-shell
20/03/31 10:31:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/03/31 10:32:41 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/03/31 10:32:41 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Spark context Web UI available at http://hadoop:4042
Spark context available as 'sc' (master = local[*], app id = local-1585621962060).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala> val hiveContext = new HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@62966c9f
scala> hiveContext.sql("show databases").show()
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.metastore.client.capability.check does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.metastore.hbase.aggregate.stats.false.positive.probability does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.druid.broker.address.default does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.llap.io.orc.time.counters does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.tez.task.scale.memory.reserve-fraction.min does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.orc.splits.ms.footer.cache.ppd.enabled does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.metastore.event.message.factory does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.server2.metrics.enabled does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.tez.hs2.user.access does not exist
......(more than two hundred similar "WARN conf.HiveConf: HiveConf of name ... does not exist" lines omitted, including hive.llap.skip.compile.udf.check)......
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.llap.io.allocator.alloc.max does not exist
+------------+
|databaseName|
+------------+
| default|
| hadoop|
+------------+
scala> hiveContext.sql("show tables").show()
+--------+--------------------+-----------+
|database| tableName|isTemporary|
+--------+--------------------+-----------+
| default| aa| false|
| default| bb| false|
| default| dd| false|
| default| kylin_account| false|
| default| kylin_cal_dt| false|
| default|kylin_category_gr...| false|
| default| kylin_country| false|
| default|kylin_intermediat...| false|
| default|kylin_intermediat...| false|
| default| kylin_sales| false|
| default| test| false|
| default| test_null| false|
+--------+--------------------+-----------+
You can see that the query already returns results, but why did it print a pile of WARNs above? For example: WARN conf.HiveConf: HiveConf of name hive.llap.skip.compile.udf.check does not exist. Delete the corresponding properties from the hive-site.xml configuration file:
<property>
<name>hive.llap.skip.compile.udf.check</name>
<value>false</value>
<description>
Whether to skip the compile-time check for non-built-in UDFs when deciding whether to
execute tasks in LLAP. Skipping the check allows executing UDFs from pre-localized
jars in LLAP; if the jars are not pre-localized, the UDFs will simply fail to load.
</description>
</property>
Log in and run it again, and the warning is gone.
八、Configuring Hudi
8.1、Reviewing the key points of the official docs
First look at the Get Started page of the official documentation. My existing Hadoop environment is 2.7, and the reason I installed Spark 2.4.4 earlier is that the current official example uses hadoop2.7 + spark2.4.4. Although Hudi and Spark now support Scala 2.11.x/2.12.x, the official site also uses 2.11, so to stay consistent with the Hudi docs and with Spark 2.4.4 (Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)) I installed Scala 2.11.12. Hudi has already released 0.5.2, but the official examples still use 0.5.1, so first switch to the Hudi 0.5.1 release notes. What they say, roughly, is:
Version upgrades: Spark was upgraded from 2.1.0 to 2.4.4, Avro from 1.7.7 to 1.8.2, Parquet from 1.8.1 to 1.10.1, and Kafka from 0.8.2.1 to 2.0.0 as an indirect result of upgrading the spark-streaming-kafka artifact from 0.8_2.11 to 0.10_2.11/2.12. Important: Hudi 0.5.1 requires Spark 2.4+.
Hudi now supports Scala 2.11 and 2.12; see the Scala 2.12 build instructions to build Hudi with Scala 2.12. The hudi-spark, hudi-utilities, hudi-spark-bundle and hudi-utilities-bundle artifacts were renamed to hudi-spark_{scala_version}, hudi-utilities_{scala_version}, hudi-spark-bundle_{scala_version} and hudi-utilities-bundle_{scala_version}, where scala_version is 2.11 or 2.12.
In 0.5.1, timeline metadata operations no longer rely on renames. The feature is enabled by default for newly created Hudi tables and disabled by default for existing tables; before enabling it on an existing table, see https://hudi.apache.org/docs/deployment.html#upgrading. To switch to the new timeline layout (which avoids renames), set the write config hoodie.timeline.layout.version=1, or use the CLI command repair overwrite-hoodie-props to add hoodie.timeline.layout.version=1 to the hoodie.properties file. Either way, upgrade the Hudi readers (query engines) to 0.5.1 before upgrading the writers.
The CLI supports repair overwrite-hoodie-props to rewrite a table's hoodie.properties file from a specified file; it can be used to rename a table or to adopt the new timeline layout. Note that while hoodie.properties is being rewritten (a matter of milliseconds), some queries may fail temporarily; simply rerun them.
The DeltaStreamer parameter for the table type changed from --storage-type to --table-type; see the wiki for the latest terminology changes. The values for the Kafka reset-offset strategy also changed: LARGEST became LATEST and SMALLEST became EARLIEST, for the DeltaStreamer config auto.offset.reset.
When exploring Hudi with spark-shell you now need to add --packages org.apache.spark:spark-avro_2.11:2.4.4; see the quickstart for details.
Key generators were moved to the org.apache.hudi.keygen package; if you override the key generator class (config hoodie.datasource.write.keygenerator.class), update the fully qualified class name accordingly.
The Hive sync tool registers the RO table of a MOR dataset with an _ro suffix, so queries should also use the _ro suffix; the --skip-ro-suffix option keeps the old table name, i.e. no _ro suffix is added during sync.
In 0.5.1 the hudi-hadoop-mr-bundle used by Presto/Hive shades the Avro package in order to support realtime queries. Hudi supports pluggable record-merge logic through a user-defined HoodieRecordPayload; if you use this feature, relocate the Avro dependency in your own code (org.apache.avro. -> org.apache.hudi.org.apache.avro.) so that your behavior stays consistent with Hudi.
DeltaStreamer has better delete support and now supports AWS Database Migration Service (DMS); see the blog posts for details. A DynamicBloomFilter was added; it is off by default and can be enabled with the index config hoodie.bloom.index.filter.type=DYNAMIC_V0. HDFSParquetImporter supports bulk insert via --command bulkinsert. AWS WASB and WASBS cloud storage are now supported.
8.2、A wrong installation attempt
Having read the release notes and settled on the version combination, switch to the official docs for the latest Hudi 0.5.2. Since I had never used Spark or Hudi before, my first reaction on seeing the Hudi site was to download a Hudi 0.5.1 package, deploy it, and then run the commands from the official example; the failed attempt below is what that looked like. Since the official examples all use 0.5.1, I downloaded that version:
https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz
Upload the downloaded package to the /hadoop/spark directory and extract it:
[root@hadoop spark]# ls
bin conf data examples hudi-0.5.1-incubating.src.tgz jars kubernetes LICENSE licenses logs NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 work yarn
[root@hadoop spark]# tar -zxvf hudi-0.5.1-incubating.src.tgz
[root@hadoop spark]# ls
bin conf data examples hudi-0.5.1-incubating hudi-0.5.1-incubating.src.tgz jars kubernetes LICENSE licenses logs NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 work yarn
[root@hadoop spark]# rm -rf *tgz
[root@hadoop ~]# /hadoop/spark/bin/spark-shell \
> --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
> --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/hadoop/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-5717aa3e-7bfb-42c4-aadd-2a884f3521d5;1.0
confs: [default]
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
:: resolution report :: resolve 454ms :: artifacts dl 1ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.5.1-incubating/hudi-spark-bundle_2.11-0.5.1-incubating.pom
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.5.1-incubating/hudi-spark-bundle_2.11-0.5.1-incubating.jar
。。。。。。。。。。
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-incubating: not found
:: org.apache.spark#spark-avro_2.11;2.4.4: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-incubating: not found, unresolved dependency: org.apache.spark#spark-avro_2.11;2.4.4:
not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1302)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
8.3、The correct "installation and deployment"
What I downloaded is really a source package, not something that can be run directly. Moreover, spark-shell --packages takes Maven coordinates for jar artifacts; if the jars are not already available locally, Spark tries to download them from the Maven repositories configured on the machine, which means the two jars specified here are meant to be fetched automatically. My VM has no external network access and no Maven configuration, so of course it fails with "jar not found". The official snippet --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 is, plainly speaking, just declaring dependencies the way a Maven pom does. Going through the official docs I found the Maven Central coordinates Hudi publishes, located the two packages used in the official example code, and pulled them out; they are the following two:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>org.apache.hudi</groupId>
<artifactId>hudi-spark-bundle_2.11</artifactId>
<version>0.5.2-incubating</version>
</dependency>
Fine, I downloaded these two packages directly, and then went back to the official docs. They mention that you can also get started by building Hudi yourself and passing --jars /packaging/hudi-spark-bundle/target/hudi-spark-bundle-..*-SNAPSHOT.jar to the spark-shell command instead of --packages org.apache.hudi:hudi-spark-bundle:0.5.2-incubating. Seeing that hint, I checked the spark-shell help on the Linux box:
[root@hadoop external_jars]# /hadoop/spark/bin/spark-shell --help
Usage: ./bin/spark-shell [options]
Scala REPL options:
-I <file> preload <file>, enforcing line-by-line interpretation
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
--exclude-packages Comma-separated list of groupId:artifactId, to exclude while
resolving the dependencies provided in --packages to avoid
dependency conflicts.
--repositories Comma-separated list of additional remote repositories to
search for the maven coordinates given with --packages.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place
on the PYTHONPATH for Python apps.
--files FILES Comma-separated list of files to be placed in the working
directory of each executor. File paths of these files
in executors can be accessed via SparkFiles.get(fileName).
--conf PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
--driver-java-options Extra Java options to pass to the driver.
--driver-library-path Extra library path entries to pass to the driver.
--driver-class-path Extra class path entries to pass to the driver. Note that
jars added with --jars are automatically included in the
classpath.
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--proxy-user NAME User to impersonate when submitting the application.
This argument does not work with --principal / --keytab.
--help, -h Show this help message and exit.
--verbose, -v Print additional debug output.
--version, Print the version of current Spark.
Cluster deploy mode only:
--driver-cores NUM Number of cores used by the driver, only in cluster mode
(Default: 1).
Spark standalone or Mesos with cluster deploy mode only:
--supervise If given, restarts the driver on failure.
--kill SUBMISSION_ID If given, kills the driver specified.
--status SUBMISSION_ID If given, requests the status of the driver specified.
Spark standalone and Mesos only:
--total-executor-cores NUM Total cores for all executors.
Spark standalone and YARN only:
--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode,
or all available cores on the worker in standalone mode)
YARN-only:
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").
--num-executors NUM Number of executors to launch (Default: 2).
If dynamic allocation is enabled, the initial number of
executors will be at least NUM.
--archives ARCHIVES Comma separated list of archives to be extracted into the
working directory of each executor.
--principal PRINCIPAL Principal to be used to login to KDC, while running on
secure HDFS.
--keytab KEYTAB The full path to the file that contains the keytab for the
principal specified above. This keytab will be copied to
the node running the Application Master via the Secure
Distributed Cache, for renewing the login tickets and the
delegation tokens periodically.
So --jars points at jar files that already exist on the machine. Next, upload the two packages downloaded earlier to the server:
[root@hadoop spark]# mkdir external_jars
[root@hadoop spark]# cd external_jars/
[root@hadoop external_jars]# pwd
/hadoop/spark/external_jars
Upload the jars to this directory via Xftp.
[root@hadoop external_jars]# ls
hudi-spark-bundle_2.11-0.5.2-incubating.jar scala-library-2.11.12.jar spark-avro_2.11-2.4.4.jar spark-tags_2.11-2.4.4.jar unused-1.0.0.jar
Then take the command from the official example:
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
and change it to:
[root@hadoop external_jars]# /hadoop/spark/bin/spark-shell --jars /hadoop/spark/external_jars/spark-avro_2.11-2.4.4.jar,/hadoop/spark/external_jars/hudi-spark-bundle_2.11-0.5.2-incubating.jar --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
20/03/31 15:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop:4040
Spark context available as 'sc' (master = local[*], app id = local-1585639157881).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> OK!!!没有报错了,接下来开始尝试进行增删改查操作。8.4、Hudi增删改查基于上面步骤8.4.1、设置表名、基本路径和数据生成器来生成记录scala> import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.QuickstartUtils._
scala> import scala.collection.JavaConversions._
import scala.collection.JavaConversions._
scala> import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions._
scala> import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.config.HoodieWriteConfig._
scala> val tableName = "hudi_cow_table"
tableName: String = hudi_cow_table
scala> val basePath = "file:///tmp/hudi_cow_table"
basePath: String = file:///tmp/hudi_cow_table
scala> val dataGen = new DataGenerator
dataGen: org.apache.hudi.QuickstartUtils.DataGenerator = org.apache.hudi.QuickstartUtils$DataGenerator@4bf6bc2d
The data generator can produce sample inserts and updates based on the sample trip schema.
8.4.2、Insert data
Generate some new trip samples, load them into a DataFrame, and write the DataFrame into the Hudi dataset, as shown below.
scala> val inserts = convertToStringList(dataGen.generateInserts(10))
inserts: java.util.List[String] = [{"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.4726905879569653, "begin_lon": 0.46157858450465483, "e
nd_lat": 0.754803407008858, "end_lon": 0.9671159942018241, "fare": 34.158284716382845, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "0d612dd2-5f10-4296-a434-b34e6558e8f1", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.6100070562136587, "begin_lon": 0.8779402295427752, "end_lat": 0.3407870505929602, "end_lon": 0.5030798142293655, "fare": 43.4923811219014, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "0e170de4-7eda-4ab5-8c06-e351e8b23e3d", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.5731835407930634, "begin_...scala> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
warning: there was one deprecation warning; re-run with -deprecation for details
df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
scala> df.write.format("org.apache.hudi").
| options(getQuickstartWriteConfigs).
| option(PRECOMBINE_FIELD_OPT_KEY, "ts").
| option(RECORDKEY_FIELD_OPT_KEY, "uuid").
| option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
| option(TABLE_NAME, tableName).
| mode(Overwrite).
| save(basePath);
20/03/31 15:28:11 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
mode(Overwrite) overwrites and recreates the dataset if it already exists. You can inspect the data generated under /tmp/hudi_cow_table/<region>/<country>/<city>/. We supplied a record key (uuid in the schema), a partition field (region/country/city) and a precombine field (ts in the schema) to ensure that trip records are unique within each partition. For more information see "Modeling data in Hudi"; for the ways of ingesting data into Hudi, see "Writing Hudi datasets". Here we use the default write operation, upsert; if your workload has no updates, the faster insert or bulk insert operations can be used instead (see "Write operations", and the sketch just below).
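As a rough illustration only: the transcript above never sets the write operation explicitly, so the following sketch assumes the 0.5.2 constants OPERATION_OPT_KEY and BULK_INSERT_OPERATION_OPT_VAL in org.apache.hudi.DataSourceWriteOptions (not exercised in this session) and reuses df, tableName and basePath from the steps above:

// Sketch: the same write as in 8.4.2, but with the write operation chosen explicitly.
// Assumption: OPERATION_OPT_KEY / BULK_INSERT_OPERATION_OPT_VAL exist in
// org.apache.hudi.DataSourceWriteOptions for 0.5.2; they are not verified in this session.
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.SaveMode._

df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option(OPERATION_OPT_KEY, BULK_INSERT_OPERATION_OPT_VAL). // bulk insert instead of the default upsert
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  mode(Overwrite).
  save(basePath)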
8.4.3、Query data
Load the data files into a DataFrame.
scala> val roViewDF = spark.
| read.
| format("org.apache.hudi").
| load(basePath + "/*/*/*/*")
20/03/31 15:30:03 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
roViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
scala> roViewDF.registerTempTable("hudi_ro_table")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> spark.sql("select fare, begin_lon, begin_lat, ts from hudi_ro_table where fare > 20.0").show()
+------------------+-------------------+-------------------+---+
| fare| begin_lon| begin_lat| ts|
+------------------+-------------------+-------------------+---+
| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
| 43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
| 41.06290929046368| 0.8192868687714224| 0.651058505660742|0.0|
+------------------+-------------------+-------------------+---+
scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from hudi_ro_table").show()
+-------------------+--------------------+----------------------+---------+----------+------------------+
|_hoodie_commit_time| _hoodie_record_key|_hoodie_partition_path| rider| driver| fare|
+-------------------+--------------------+----------------------+---------+----------+------------------+
| 20200331152807|264170aa-dd3f-4a7...| americas/united_s...|rider-213|driver-213| 93.56018115236618|
| 20200331152807|0e170de4-7eda-4ab...| americas/united_s...|rider-213|driver-213| 64.27696295884016|
| 20200331152807|fb06d140-cd00-413...| americas/united_s...|rider-213|driver-213| 27.79478688582596|
| 20200331152807|eb1d495c-57b0-4b3...| americas/united_s...|rider-213|driver-213| 33.92216483948643|
| 20200331152807|2b3380b7-2216-4ca...| americas/united_s...|rider-213|driver-213|19.179139106643607|
| 20200331152807|81a9b76c-655b-452...| americas/brazil/s...|rider-213|driver-213|34.158284716382845|
| 20200331152807|d24e8cb8-69fd-4cc...| americas/brazil/s...|rider-213|driver-213| 66.62084366450246|
| 20200331152807|0d612dd2-5f10-429...| americas/brazil/s...|rider-213|driver-213| 43.4923811219014|
| 20200331152807|a6a7e7ed-3559-4ee...| asia/india/chennai|rider-213|driver-213|17.851135255091155|
| 20200331152807|824ee8d5-6f1f-4d5...| asia/india/chennai|rider-213|driver-213| 41.06290929046368|
+-------------------+--------------------+----------------------+---------+----------+------------------+
This query provides a read-optimized view of the ingested data. Because our partition path (region/country/city) is nested three levels below the base path, we used load(basePath + "/*/*/*/*"). For all the supported storage types and views, see "Storage types and views".
8.4.4、Update data
This is similar to inserting new data: use the data generator to generate updates to existing trips, load them into a DataFrame, and write the DataFrame to the Hudi dataset.
scala> val updates = convertToStringList(dataGen.generateUpdates(10))
updates: java.util.List[String] = [{"ts": 0.0, "uuid": "0e170de4-7eda-4ab5-8c06-e351e8b23e3d", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.7340133901254792, "begin_lon": 0.5142184937933181, "en
d_lat": 0.7814655558162802, "end_lon": 0.6592596683641996, "fare": 49.527694252432056, "partitionpath": "americas/united_states/san_francisco"}, {"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.1593867607188556, "begin_lon": 0.010872312870502165, "end_lat": 0.9808530350038475, "end_lon": 0.7963756520507014, "fare": 29.47661370147079, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.71801964677...scala> val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
warning: there was one deprecation warning; re-run with -deprecation for details
df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
scala> df.write.format("org.apache.hudi").
| options(getQuickstartWriteConfigs).
| option(PRECOMBINE_FIELD_OPT_KEY, "ts").
| option(RECORDKEY_FIELD_OPT_KEY, "uuid").
| option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
| option(TABLE_NAME, tableName).
| mode(Append).
| save(basePath);
20/03/31 15:32:27 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
Note that the save mode is now Append. In general, always use Append mode unless you are creating the dataset for the first time. Querying the data again will now show the updated trips: each write operation generates a new commit identified by a timestamp, so look for changes in the _hoodie_commit_time, rider and driver fields for the same _hoodie_record_key seen in the previous commit.
8.4.5、Incremental query
Hudi also provides a way to obtain a stream of records that changed since a given commit timestamp. This is done by using Hudi's incremental view and supplying a begin time from which changes are needed. If we want all changes after a given commit (the common case), there is no need to specify an end time.
scala> // reload data
scala> spark.
| read.
| format("org.apache.hudi").
| load(basePath + "/*/*/*/*").
| createOrReplaceTempView("hudi_ro_table")
20/03/31 15:33:55 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
scala>
scala> val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50)
commits: Array[String] = Array(20200331152807, 20200331153224)
scala> val beginTime = commits(commits.length - 2) // commit time we are interested in
beginTime: String = 20200331152807
scala> // 增量查询数据
scala> val incViewDF = spark.
| read.
| format("org.apache.hudi").
| option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
| option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
| load(basePath);
20/03/31 15:34:40 WARN hudi.DefaultSource: hoodie.datasource.view.type is deprecated and will be removed in a later release. Please use hoodie.datasource.query.type
incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
scala> incViewDF.registerTempTable("hudi_incr_table")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_incr_table where fare > 20.0").show()
+-------------------+------------------+--------------------+-------------------+---+
|_hoodie_commit_time| fare| begin_lon| begin_lat| ts|
+-------------------+------------------+--------------------+-------------------+---+
| 20200331153224|49.527694252432056| 0.5142184937933181| 0.7340133901254792|0.0|
| 20200331153224| 98.3428192817987| 0.3349917833248327| 0.4777395067707303|0.0|
| 20200331153224| 90.9053809533154| 0.19949323322922063|0.18294079059016366|0.0|
| 20200331153224| 90.25710109008239| 0.4006983139989222|0.08528650347654165|0.0|
| 20200331153224| 29.47661370147079|0.010872312870502165| 0.1593867607188556|0.0|
| 20200331153224| 63.72504913279929| 0.888493603696927| 0.6570857443423376|0.0|
+-------------------+------------------+--------------------+-------------------+---+
This returns all changes that happened after the begin commit time, with a filter of fare > 20.0. What is unique about this feature is that it lets you author streaming pipelines on batch data.
8.4.6、Point-in-time query
Let's look at how to query data as of a specific time. A point in time is expressed by setting the end time to a specific commit time and the begin time to "000" (the earliest possible commit time).
scala> val beginTime = "000" // Represents all commits > this time.
beginTime: String = 000
scala> val endTime = commits(commits.length - 2) // commit time we are interested in
endTime: String = 20200331152807
scala>
scala> // 增量查询数据
scala> val incViewDF = spark.read.format("org.apache.hudi").
| option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
| option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
| option(END_INSTANTTIME_OPT_KEY, endTime).
| load(basePath);
20/03/31 15:36:00 WARN hudi.DefaultSource: hoodie.datasource.view.type is deprecated and will be removed in a later release. Please use hoodie.datasource.query.type
incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
scala> incViewDF.registerTempTable("hudi_incr_table")
warning: there was one deprecation warning; re-run with -deprecation for details
scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_incr_table where fare > 20.0").show()
+-------------------+------------------+-------------------+-------------------+---+
|_hoodie_commit_time| fare| begin_lon| begin_lat| ts|
+-------------------+------------------+-------------------+-------------------+---+
| 20200331152807| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
| 20200331152807| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
| 20200331152807| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
| 20200331152807| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
| 20200331152807|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
| 20200331152807| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
| 20200331152807| 43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
| 20200331152807| 41.06290929046368| 0.8192868687714224| 0.651058505660742|0.0|
+-------------------+------------------+-------------------+-------------------+---+
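To wrap up the Hudi quick start, here is a consolidated view of the incremental-pull pattern from 8.4.5 and 8.4.6. It is only a sketch that wraps the exact read options used in the transcripts above (VIEW_TYPE_OPT_KEY and friends, which 0.5.2 already reports as deprecated in favor of hoodie.datasource.query.type); the helper name pullSince is made up for illustration:

// Sketch: incremental pull between two commit times, mirroring 8.4.5 and 8.4.6.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.hudi.DataSourceReadOptions._

// beginTime is exclusive ("all commits > this time"); endTime is optional.
def pullSince(spark: SparkSession, basePath: String,
              beginTime: String, endTime: Option[String] = None): DataFrame = {
  val reader = spark.read.format("org.apache.hudi").
    option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
    option(BEGIN_INSTANTTIME_OPT_KEY, beginTime)
  // add the end time only for point-in-time queries
  endTime.fold(reader)(t => reader.option(END_INSTANTTIME_OPT_KEY, t)).load(basePath)
}

// Usage in the same spark-shell session:
// val incDF  = pullSince(spark, basePath, beginTime)                      // 8.4.5: changes after beginTime
// val asOfDF = pullSince(spark, basePath, "000", Some("20200331152807"))  // 8.4.6: as of a specific commit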
Getting started with Sqoop (one article is all you need)
01 Introduction
In the earlier "DataX tutorial" we learned that DataX is a data migration tool developed by Alibaba. In the same spirit, this article covers a migration tool from the Hadoop ecosystem: Sqoop.
02 Sqoop overview
2.1 What Sqoop is
Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers".
2.2 What Sqoop does
Sqoop's main functions are: importing data from MySQL, Oracle and other relational databases into Hadoop storage systems such as HDFS, Hive and HBase; and exporting data from the Hadoop file system back into relational databases.
2.3 How Sqoop works
Sqoop translates the import or export command into a MapReduce program; the generated MapReduce job mainly customizes the InputFormat and OutputFormat.
03 Installing Sqoop
The prerequisite for installing Sqoop is a working Java and Hadoop environment!
3.1 Download Sqoop
step1: download and extract. Download address: http://ftp.wayne.edu/apache/sqoop/1.4.6/
3.2 Configure Sqoop
step2: edit the configuration file
$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the following lines:
export HADOOP_COMMON_HOME=/home/hadoop/apps/hadoop-2.6.1/
export HADOOP_MAPRED_HOME=/home/hadoop/apps/hadoop-2.6.1/
export HIVE_HOME=/home/hadoop/apps/hive-1.2.1
step3: add the MySQL JDBC driver jar
cp ~/app/hive/lib/mysql-connector-java-5.1.28.jar $SQOOP_HOME/lib/
3.3 Verify the Sqoop installation
step4: verify that Sqoop starts
$ cd $SQOOP_HOME/bin
$ sqoop-version
Expected output:
15/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6 git commit id 5b34accaca7de251fc91161733f906af2eddbe83
Compiled by abe on Fri Aug 1 11:19:26 PDT 2015
At this point the Sqoop installation is complete.
04 Sqoop import and export
4.1 Sqoop import
4.1.1 Import syntax
A Sqoop import copies a single table from an RDBMS into HDFS. Each row of the table becomes a record in HDFS, and all records are stored as text data in text files (or as binary data in Avro or sequence files). The following syntax is used to import data into HDFS:
$ sqoop import (generic-args) (import-args)
4.1.2 Import examples
In MySQL there is a database userdb containing three tables.
4.1.2.1 Import table data into HDFS
The following command imports the emp table from the MySQL database server into HDFS:
$ bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp --m 1
If it executes successfully, you get output like the following:
14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.
To verify the data imported into HDFS, use the following command to view it:
$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/emp/part-m-00000
The emp table's data and fields are separated by commas (,):
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
4.1.2.2 Import a relational table into Hive
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp \
--hive-import --m 1
4.1.2.3 Import into a specific HDFS directory
The syntax for specifying a target directory in the Sqoop import command is:
--target-dir <new or exist directory in HDFS>
The following command imports the emp_add table data into the '/queryresult' directory:
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--target-dir /queryresult \
--table emp --m 1
The following command verifies the emp_add data imported into the /queryresult directory:
$HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*
It shows the emp_add table's data and fields separated by commas (,):
1201, 288A, vgiri, jublee
1202, 108I, aoc, sec-bad
1203, 144Z, pgutta, hyd
1204, 78B, oldcity, sec-bad
1205, 720C, hitech, sec-bad
4.1.2.4 Import a subset of table data
With the Sqoop import tool we can import a subset of a table using a "where" clause. Sqoop runs the corresponding SQL query on the database server and stores the result in the target directory in HDFS. The syntax of the where clause is:
--where <condition>
The following command imports a subset of the emp_add table data: the subset query retrieves the employee ID and address for employees living in the city Secunderabad (sec-bad):
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--where "city ='sec-bad'" \
--target-dir /wherequery \
--table emp_add --m 1
The following command verifies the data imported from the emp_add table into the /wherequery directory:
$HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*
It shows the emp_add table's data with fields separated by commas (,):
1202, 108I, aoc, sec-bad
1204, 78B, oldcity, sec-bad
1205, 720C, hitech, sec-bad
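Beyond --where, Sqoop can also import the result of an arbitrary SELECT with the --query option. The following is only a hedged sketch, not part of the original text: the $CONDITIONS placeholder is mandatory in a free-form query, and the /wherequery2 target directory is illustrative.
# Free-form query import; with --m 1 no --split-by column is required.
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--query "SELECT * FROM emp_add WHERE city = 'sec-bad' AND \$CONDITIONS" \
--target-dir /wherequery2 \
--m 1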
4.1.2.5 Incremental Import
An incremental import brings in only the rows that have been newly added to a table. It requires the 'incremental', 'check-column' and 'last-value' options. The syntax of the incremental options on the import command is:
--incremental <mode>
--check-column <column name>
--last-value <last check column value>
Assume the following row has been newly added to the emp table:
1206, satish p, grp des, 20000, GR
The following command performs an incremental import of the emp table:
bin/sqoop import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 1205
The following command verifies the data imported from the emp table into the HDFS emp/ directory:
$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/emp/part-m-*
It shows the emp table's data with fields separated by commas (,):
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
1206, satish p, grp des, 20000, GR
The following command looks only at the modified or newly added rows of the emp table:
$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/emp/part-m-*1
It shows the newly added row, again with the emp table's fields separated by commas (,):
1206, satish p, grp des, 20000, GR
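Re-running an incremental import by hand means you have to track the --last-value yourself. As a hedged sketch that goes beyond the original text, Sqoop's saved-job facility can store the incremental settings and update last-value automatically after each run (the job name emp_incr is made up):
# Create a saved job that remembers the incremental import settings.
bin/sqoop job --create emp_incr -- import \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table emp --m 1 \
--incremental append \
--check-column id \
--last-value 1205
# Execute it; Sqoop records the new last-value for the next run.
# Note: by default Sqoop does not store the password and may prompt for it on --exec.
bin/sqoop job --exec emp_incr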
4.2 Sqoop Export
4.2.1 Export Syntax
A Sqoop export moves data from HDFS out to an RDBMS. Note: by default the rows in the files are inserted into the table with INSERT statements; in update mode Sqoop generates UPDATE statements instead (a sketch of update mode follows the export example below). The export command syntax is:
$ sqoop export (generic-args) (export-args)
4.2.2 Export Example
The data to export sits in the emp_data file under the emp/ directory in HDFS; emp_data contains:
1201, gopal, manager, 50000, TP
1202, manisha, preader, 50000, TP
1203, kalil, php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi, admin, 20000, TP
1206, satish p, grp des, 20000, GR
step1: First, manually create the target table in MySQL
$ mysql
mysql> USE test;
mysql> CREATE TABLE employee (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(10));
step2: Then run the export command
bin/sqoop export \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table employee \
--export-dir /user/hadoop/emp/
step3: Verify the table from the mysql command line
mysql> select * from employee;
If the export succeeded, you will find the data in the employee table as follows:
+------+----------+-------------+--------+------+
| Id   | Name     | Designation | Salary | Dept |
+------+----------+-------------+--------+------+
| 1201 | gopal    | manager     | 50000  | TP   |
| 1202 | manisha  | preader     | 50000  | TP   |
| 1203 | kalil    | php dev     | 30000  | AC   |
| 1204 | prasanth | php dev     | 30000  | AC   |
| 1205 | kranthi  | admin       | 20000  | TP   |
| 1206 | satish p | grp des     | 20000  | GR   |
+------+----------+-------------+--------+------+
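As mentioned in 4.2.1, an export can also run in update mode. The following is only a hedged sketch, not part of the original case: --update-key names the column used to match existing rows, and --update-mode updateonly (the default) updates matching rows only. The other documented value, allowinsert, also inserts non-matching rows, but its support varies by connector.
# Re-export, updating existing rows in employee by matching on id.
bin/sqoop export \
--connect jdbc:mysql://hdp-node-01:3306/test \
--username root \
--password root \
--table employee \
--export-dir /user/hadoop/emp/ \
--update-key id \
--update-mode updateonly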
05 How Sqoop Works Internally
Sqoop's principle is simply to turn each import or export command into a MapReduce program: whenever Sqoop receives a command, it generates a MapReduce job to carry it out.

5.1 Sqoop Code Generation
Sqoop's code generation tool makes it easy to inspect the Java code Sqoop generates, and that code can serve as the starting point for deeper customization.

5.1.1 Codegen Syntax
The syntax of the Sqoop code generation command is:
$ sqoop-codegen (generic-args) (codegen-args)
5.1.2 Codegen Example
Taking the emp table in the userdb database as an example, the following command generates the Java code for the import:
$ sqoop-codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp
If the command executes successfully, it produces output like the following:
14/12/23 02:34:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/23 02:34:41 INFO tool.CodeGenTool: Beginning code generation
……………….
14/12/23 02:34:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/23 02:34:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.jar
Verification: look at the files in the output directory
$ cd /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/
$ ls
emp.class
emp.jar
emp.java
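By default the generated sources and jar land in a hash-named directory under /tmp, as shown above. As a hedged sketch beyond the original text, the codegen tool's --outdir and --bindir options let you choose those locations yourself (the ./codegen-src and ./codegen-bin paths are illustrative):
# Generate emp.java into ./codegen-src and the compiled .class/.jar into ./codegen-bin.
$ sqoop-codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--outdir ./codegen-src \
--bindir ./codegen-bin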
If you want to customize the import or export more deeply, you can modify the generated code files above.

06 Sqoop vs. DataX
A DataX tutorial has been written before; see the "DataX Tutorial".

6.1 Sqoop Characteristics
Sqoop's main characteristics:
- It can import data from relational databases into Hadoop components such as HDFS, Hive or HBase, and it can also move data from those Hadoop components back into a relational database.
- When importing or exporting, Sqoop relies fully on the MapReduce framework: it generates a MapReduce job from the input options and runs it on the Hadoop cluster. Because MapReduce runs the import or export on multiple nodes at once, it is faster than running several parallel transfers on a single node, and it also provides good concurrency and fault tolerance.
- It supports insert and update modes; with the right options, existing rows are updated and missing rows are inserted.
- It has better support for the mainstream international relational databases.

6.2 DataX Characteristics
DataX's main characteristics:
- Data exchange between heterogeneous databases and file systems.
- A Framework + plugin architecture: the framework handles most of the hard problems of high-speed data exchange, such as buffering, flow control, concurrency and context loading, and exposes a simple interface to plugins, which only need to implement access to a specific data system.
- The transfer runs inside a single process, entirely in memory, without reading or writing disk and without IPC.
- It is an open framework, so developers can write a new plugin in a very short time to support a new database or file system.

6.3 Differences Between Sqoop and DataX
The differences between Sqoop and DataX are as follows:
- Sqoop uses the MapReduce framework for imports and exports, whereas DataX extracts and loads data only on the single machine running DataX, which is considerably slower than Sqoop.
- Sqoop can only migrate data between relational databases and Hadoop components. It cannot move data between Hadoop components themselves (for example between Hive and HBase), nor between relational databases (for example between MySQL and Oracle). DataX, by contrast, can migrate data between relational databases and Hadoop components, between relational databases, and between Hadoop components.
- Sqoop was built specifically for Hadoop and supports it well, while DataX may not support newer Hadoop versions.
- Sqoop only supports data exchange between the officially supported relational databases and Hadoop components, whereas with DataX a user can modify the relevant files to suit their own needs, build the corresponding rpm package, install it, and then use their own custom plugin.

07 Closing
This article covered what Sqoop is, how to install it, and how to use it. Thanks for reading!