"解决directory错误" 检索
共 4196 条结果
Detailed Solutions for an Operating Systems Course Project (Part 2)
一、题目二实验二 Linux 进程管理二、实验目的通过进程的创建、撤销和运行加深对进程概念和进程并发执行的理解,明确进程和程序之间的区别。三、总体设计1.背景知识在 Linux 中创建子进程要使用 fork()函数,执行新的命令要使用 exec()系列函数,等待子进 程结束使用 wait()函数,结束终止进程使用 exit()函数。fork()原型如下:pid_t fork(void);fork 建立一个子进程,父进程继续运行,子进程在同样的位置执行同样的程序。对于父进程,fork()返回子进程的 pid, 对于子进程,fork()返回 0。出错时返回-1。2.模块介绍2-1:一个父进程,两个子进程2-2:一个父进程,一个子进程2-3:一个父进程,多个子进程3.设计步骤(1)进程的创建任务要求:编写一段程序,使用系统调用 fork()创建两个子进程。当此程序运行时,在系统 中有一个父进程和两个子进程活动。让每一个进程在屏幕上显示一个字符:父进程显示字符“a”; 两子进程分别显示字符“b”和字符“c”。步骤 1:使用 vi 或 gedit 新建一个 fork_demo.c 程序,然后拷贝清单 2-1 中的程序,使用 cc 或者gcc 编译成可执行文件 fork_demo。例如,可以使用 gcc –o fork_demo fork_demo.c 完成编译。步骤 2:在命令行输入./fork_demo 运行该程序。图2-1 进程的创建输出结果(2)子进程执行新任务任务要求:编写一段程序,使用系统调用 fork()创建一个子进程。子进程通过系统调用 exec 更换自己原有的执行代码,转去执行 Linux 命令/bin/ls (显示当前目录的列表),然后调用 exit()函 数结束。父进程则调用 waitpid()等待子进程结束,并在子进程结束后显示子进程的标识符,然后正 常结束。程序执行过程如图 2-1 所示。步骤 1:使用 vi 或 gedit 新建一个 exec_demo.c 程序,然后拷贝清单 2-2 中的程序(该程序的执 行如图 2-1 所示),使用 cc 或者 gcc 编译成可执行文件 exec_demo。例如,可以使用 gcc –o exec_demo exec_demo.c 完成编译。步骤 2:在命令行输入./exec_demo 运行该程序。步骤 3:观察该程序在屏幕上的显示结果,并分析。图2-2 子进程执行新任务输出结果(3)实现一个简单的 shell(命令行解释器) (此任务有一些难度,可选做)。任务要求:要设计的 shell 类似于 sh,bash,csh 等,必须支持以下内部命令:cd <目录>更改当前的工作目录到另一个<目录>。如果<目录>未指定,输出当前工作目录。如果<目录>不存在,应当有适当的错误信息提示。这个命令应该也能改变 PWD 的环境变量。environ 列出所有环境变量字符串的设置(类似于 Unix 系统下的 env 命令)。echo <内容 > 显示 echo 后的内容且换行help 简短概要的输出你的 shell 的使用方法和基本功能。jobs 输出 shell 当前的一系列子进程,必须提供子进程的命名和 PID 号。quit,exit,bye 退出 shell。图2-3 实现一个简单的 shell输出结果四、详细设计数据结构一个进程创建多个子进程时,则子进程之间具有兄弟关系,数据结构为链表结构,也运用了一些C++库函数。程序流程图图2-4 进程的创建流程图图2-5 子进程执行新任务流程图图2-6 实现一个简单的 shell(命令行解释器)流程图3. 关键代码2-1 创建进程#include <sys/types.h> #include <stdio.h> #include <unistd.h> int main () { int x; while((x=fork())==-1); if (x==0){ x=fork(); if(x>0) printf("b"); else printf("c"); } else printf("a"); }2-2 子进程执行新任务#include <sys/types.h> #include <stdio.h> #include <unistd.h> int main() { pid_t pid; /* fork a child process */ pid = fork(); if (pid < 0) { /* error occurred */ fprintf(stderr, "Fork Failed"); return 1; } else if (pid == 0) { /* 子进程 */ execlp("/bin/ls","ls",NULL); } else { /* 父进程 */ /* 父进程将一直等待,直到子进程运行完毕*/ wait(NULL); printf("Child Complete"); } return 0; } } return 0; }2-3 实现一个简单的 shell(命令行解释器) (选做)#include<stdio.h> #include<string.h> #include<sys/types.h> #include<unistd.h> int main() { char cmd[666]; char cata[100]; while(1) { int len,i,flag,cnt; printf("Enter commands:"); // print String scanf("%s",cmd); // Calculation String len = strlen(cmd); // for cd if(cmd[0]=='c') { flag=0; cnt=0; // Start after command for(i=3; i<len-1; i++) { // String is not null if(cmd[i]!=' ') flag=1; if(flag) { cata[cnt++] = cmd[i]; } } // String is null if(cnt==0) { printf("path error!\n"); cata[0]='.'; cata[1]='\0'; } } //for echo if(cmd[0]=='e'&&cmd[1]=='c') { flag = 0; for(i=5; i<len-1; i++) { if(cmd[i]!=' ') flag=1; if(flag) { putchar(cmd[i]); } } if(flag) putchar('\n'); } // for help if(cmd[0]=='h') { printf("/**********Method***********/\n"); printf("print cd<catalog> :find directory\n"); printf("print environ :List set\n"); printf("print echo<content> : print content\n"); printf("print help :List Method\n"); printf("print jobs :provide PID\n"); printf("print quit,exit,bye :break \n"); printf("/******Method***************/\n"); } // for quit,exit,bye if(cmd[0]=='q'||cmd[1]=='x'||cmd[0]=='b') { printf("break\n"); return 0; } else { cnt=0; // child process pid_t pid = fork(); if(pid<0) { // error occurred fprintf(stderr,"Fork Failed" ); return 1; } else if(pid==0) { //cd if(cmd[0]=='c') { execlp("/bin/ls",cata,NULL); } //jobs else if(cmd[0]=='j') { execlp("pstree","-p",NULL); } //environ else 
if(cmd[1]=='n') { execlp("env","",NULL); } } else { //wait child process exit wait(); } } printf("\n"); } return 0; }五、实验结果与分析实验2-1结果分析:修改后代码清单2-1后,从main()函数开始,运行父进程,通过while((x=fork())== -1)判断创建进程是否成功,如果x>0,则继续创建子进程,若成功,则此时有两个子进程和一个父进程,先创建的子进程会输出c,接下来是父进程执行完毕,输出a,后面是后创建的子进程执行完毕输出b;所以最终的输出结果是abc。实验2-2结果分析:从main()函数开始,父进程创建子进程,首先判断子进程是否创建成功,如果pid<0则创建进程失败,当pid=0时,运行子进程,输出系统当前目录。父进程将会一直等待子进程信号,只有当子进程释放信号,父进程输出“Child Complete”。实验2-3结果分析:从main()函数开始,根据下面这些关键字的标志位进行设置判断,然后再在判断之后对下面的功能进行实现:cd <目录>更改当前的工作目录到另一个<目录>。如果<目录>未指定,输出当前工作目录。如果<目录>不存在,应当有适当的错误信息提示。这个命令应该也能改变 PWD 的环境变量。environ 列出所有环境变量字符串的设置(类似于Unix 系统下的 env 命令)。echo <内容 > 显示 echo 后的内容且换行help 简短概要的输出你的 shell 的使用方法和基本功能。jobs 输出 shell 当前的一系列子进程,必须提供子进程的命名和 PID 号。quit,exit,bye 退出 shell,也就是依次终止运行的父子进程。六、小结与心得体会通过这个实验加深了我对Linux操作系统的进程概念的了解,也学会了在Linux基本运行,也使我明白了在Linux系统中子进程的创建,以及父子进程的运行过程,加深了对进程运行的理解。在Linux中利用fork建立一个子进程,父进程继续运行,子进程在同样的位置执行同样的程序。对于父进程,fork()返回子进程的 pid, 对于子进程,fork()返回 0,出错时返回-1,while((x=fork())==-1)这句话是用来判断子进程是否能创建成功,而且当x=0时运行子进程,当x>0时父进程执行,而x<0时,则进程创建不成功,通过代码确定父子进程的先后执行顺序。同时也完成实现一个简单的 shell(命令行解释器),这个是选做但我也挑战自己做这道题目,从中也收获非常多,采用了关键字这种思路去慢慢分块实现不同命令的功能,对于逻辑处理也提升很多。一、题目三实验三 互斥与同步二、实验目的(1) 回顾操作系统进程、线程的有关概念,加深对 Windows 线程的理解。(2) 了解互斥体对象,利用互斥与同步操作编写生产者-消费者问题的并发程序,加深对 P (即semWait)、V(即 semSignal)原语以及利用 P、V 原语进行进程间同步与互斥操作的理解。三、总体设计1.基本原理与算法1.1、利用的是互斥与同步中的信号量1.2、使用信号量解决有限缓冲区生产者和消费者问题2.模块介绍主要有两大模块:生产者和消费者;生产者又包括Produce(),Append(); 消费者包括Take(),Consume(); 线程的创建。3.设计步骤(1) 生产者消费者问题步骤 1:创建一个“Win32 Consol Application”工程,然后拷贝清单 3-1 中的程序,编译成可执行文件。步骤 2:在“命令提示符”窗口运行步骤 1 中生成的可执行文件,列出运行结果。步骤 3:仔细阅读源程序,找出创建线程的 WINDOWS API 函数,回答下列问题:线程的第一个执行函数是什么(从哪里开始执行)?它位于创建线程的 API 函数的第几个参数中?答:Produce()函数,位于第三个参数。步骤 4:修改清单 3-1 中的程序,调整生产者线程和消费者线程的个数,使得消费者数目大与生产者,看看结果有何不同。察看运行结果,从中你可以得出什么结论?答:当生产者个数多于消费者个数时生产速度快,生产者经常等待消费者对产品进行消费;反之,消费者经常等待生产者生产。步骤 5:修改清单 3-1 中的程序,按程序注释中的说明修改信号量 EmptySemaphore 的初始化方法,看看结果有何不同。答:结果为空,因为参数设置成可用资源为0,所以进程无法使用。步骤 6:根据步骤 4 的结果,并查看 MSDN,回答下列问题:1)CreateMutex 中有几个参数,各代表什么含义。2)CreateSemaphore 中有几个参数,各代表什么含义,信号量的初值在第几个参数中。3)程序中 P、V 原语所对应的实际 Windows API 函数是什么,写出这几条语句。4)CreateMutex 能用 CreateSemaphore 替代吗?尝试修改程序 3-1,将信号量 Mutex 完全用CreateSemaphore 及相关函数实现。写出要修改的语句。答:(1)3个;LPSECURITY_ATTRIBUTESlpMutexAttributes, // 指向安全属性的指针BOOLbInitialOwner, // 初始化互斥对象的所有者;LPCTSTRlpName // 指向互斥对象名的指针;第二个参数是FALSE,表示刚刚创建的这个Mutex不属于任何线程。(2)4个;//第一个参数:安全属性,如果为NULL则是默认安全属性 //第二个参数:信号量的初始值,要>=0且<=第三个参数 //第三个参数:信号量的最大值 //第四个参数:信号量的名称。(3)WaitForSingleObject(FullSemaphore,INFINITE); P(full);WaitForSingleObject(Mutex,INFINITE); //P(mutex);ReleaseMutex(Mutex); //V(mutex);ReleaseSemaphore(EmptySemaphore,1,NULL); //V(empty);(4)可以,Mutex=CreateSemaphore(NULL,false,false,NULL);生产者,消费者内: ReleaseMutex(Mutex);改为 ReleaseSemaphore(Mutex,1,NULL)。图3-1 生产者消费者问题输出结果(2) 读者写者问题根据实验(1)中所熟悉的 P、V 原语对应的实际 Windows API 函数,并参考教材中读者、写者问题的算法原理,尝试利用 Windows API 函数实现第一类读者写者问题(读者优先)。图3-2 读者写者问题输出结果四、详细设计数据结构应用了循环队列、数组,信号量。程序流程图图3-3 生产者消费者问题流程图图3-4 读者写者问题流程图3. 
关键代码3-1 创建进程int main() { //创建各个互斥信号 //注意,互斥信号量和同步信号量的定义方法不同,互斥信号量调用的是 CreateMutex 函数,同步信号量调用的是 CreateSemaphore 函数,函数的返回值都是句柄。 Mutex = CreateMutex(NULL,FALSE,NULL); EmptySemaphore = CreateSemaphore(NULL,SIZE_OF_BUFFER,SIZE_OF_BUFFER,NULL); //将上句做如下修改,看看结果会怎样 // EmptySemaphore = CreateSemaphore(NULL,0,SIZE_OF_BUFFER-1,NULL); FullSemaphore = CreateSemaphore(NULL,0,SIZE_OF_BUFFER,NULL); //调整下面的数值,可以发现,当生产者个数多于消费者个数时, //生产速度快,生产者经常等待消费者;反之,消费者经常等待 const unsigned short PRODUCERS_COUNT = 10; //生产者的个数 const unsigned short CONSUMERS_COUNT = 1; //消费者的个数 //总的线程数 const unsigned short THREADS_COUNT = PRODUCERS_COUNT+CONSUMERS_COUNT; HANDLE hThreads[THREADS_COUNT]; //各线程的 handle DWORD producerID[PRODUCERS_COUNT]; //生产者线程的标识符 DWORD consumerID[CONSUMERS_COUNT]; //消费者线程的标识符 //创建生产者线程 for (int i=0; i<PRODUCERS_COUNT; ++i) { hThreads[i]=CreateThread(NULL,0,Producer,NULL,0,&producerID[i]); if (hThreads[i]==NULL) return -1; } //创建消费者线程 for (int i=0; i<CONSUMERS_COUNT; ++i) { hThreads[PRODUCERS_COUNT+i]=CreateThread(NULL,0,Consumer,NULL,0,&consumerID[i]); if (hThreads[i]==NULL) return -1; } while(p_ccontinue) { if(getchar()) //按回车后终止程序运行 { p_ccontinue = false; } } return 0; } //消费者 DWORD WINAPI Consumer(LPVOID lpPara) { while(p_ccontinue) { WaitForSingleObject(FullSemaphore,INFINITE); //P(full); WaitForSingleObject(Mutex,INFINITE); //P(mutex); Take(); Consume(); Sleep(1500); ReleaseMutex(Mutex); //V(mutex); ReleaseSemaphore(EmptySemaphore,1,NULL); //V(empty); } return 0; }3-2 子进程执行新任务int main() { Mutex = CreateMutex(NULL,FALSE,NULL); X = CreateMutex(NULL,FALSE,NULL); const unsigned short READERS_COUNT = 2;//创建两个读进程 const unsigned short WRITERS_COUNT = 1;//创建一个写进程 const unsigned short THREADS_COUNT = READERS_COUNT+WRITERS_COUNT; HANDLE hThreads[THREADS_COUNT]; //创建写线程 for (int i=0; i<WRITERS_COUNT; ++i) { hThreads[i]=CreateThread(NULL,0,writer,NULL,0,NULL); if (hThreads[i]==NULL) return -1; } //创建读线程 for (int i=0; i<READERS_COUNT; ++i) { hThreads[WRITERS_COUNT+i]=CreateThread(NULL,0,reader,NULL,0,NULL);//生产者线程函数Producer 线程ID&producerID[i] if (hThreads[i]==NULL) return -1; } //程序人为终止操作设计 while(p_ccontinue) { if(getchar()) //按回车后终止程序运行 { p_ccontinue = false; } } return 0; } //写者 DWORD WINAPI writer(LPVOID lpPara) { while(p_ccontinue) { WaitForSingleObject(Mutex,INFINITE); Write(); Sleep(1500); ReleaseMutex(Mutex); //V(mutex); } return 0; }五、实验结果与分析实验3-1结果分析:修改后代码清单3-1后,从main()函数开始,首先创建了生产者-消费者问题中应用到的互斥信号和同步信号以及其他基础定义,创建消费者生产者线程;最初生产者满足条件生产产品,所以先执行生产者,然后当资源有产品时,会执行消费者,生产者和消费者在代码运行过程中出现是随机的,当生产者多于消费者时,生产速度快,生产者经常等待消费者;反之,消费者经常等待;若缓冲区为空,则必定是生产者运行,缓冲区为满,则消费者运行,生产者等待,而对于结果的表示,则是调用了Append()和Consume()中的循环输出。实验3-2结果分析:这个是读写者中读者优先的问题,从main()函数开始,首先创建了生产者-消费者问题中应用到的两个互斥信号以及其他基础定义,创建消读者写者线程;最初写者先创建先运行,然后会执行读者线程,由于设置了两个互斥信号量可以将其中一个作为读者优先设置信号量,当第一个读者拿到这个互斥信号量时,写者就得等待读者释放这个信号量,而其他读者就不用就直接拿到不用判断可以运行输出。对于结果的表示,也是调用了read ()和Write()函数进行输出。六、小结与心得体会通过这个实验,我更好的了解互斥体对象,利用互斥与同步操作编写生产者-消费者问题的并发程序,加深对 P (即 semWait)、V(即 semSignal)原语以及利用 P、V 原语进行进程间同步与互斥操作的理解,生产者消费者问题是一个典型的例题,主要涉及同步与互斥,这也保证了在程序运行过程中只能有一个线程进行。然后对于3-2问题,我借鉴了《操作系统》课程书籍中的读者优先的思路,并将其实现,在这个过程中收获非常多也非常大,对于信号量以及进程的了解也更加深刻。以上只是操作系统课设部分设计内容,如果想要完整操作系统课设源代码资源有以下两种获取方式,请点击下面资源链接进行下载,希望能帮助到你!操作系统课设完整资源:点击打开下载资源操作系统课设完整资源:点击打开下载资源(注意:购买文章后,百度云盘链接大家不要直接复制链接,请手打链接否则可能打不开资源)
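The analysis of experiment 3-2 above describes the reader-priority scheme — a reader count guarded by the second mutex X, with the first reader locking writers out and the last reader letting them back in — but only the writer thread appears in listing 3-2; the reader function that main() passes to CreateThread is never shown. The following is a minimal, self-contained sketch of what that reader side could look like, mirroring the listing's naming. It assumes Read() prints the shared data (as the analysis mentions) and deliberately introduces WriteLock as a binary semaphore instead of reusing the listing's Mutex for write exclusion, because ReleaseMutex() may only be called by the thread that owns the mutex, while here the reader that releases the write lock is generally not the one that acquired it (the article's own step 6 already notes that CreateSemaphore can stand in for CreateMutex). This is an illustrative reconstruction, not the author's original code.

#include <windows.h>

/* X and p_ccontinue play the same roles as in listing 3-2;
   WriteLock and read_count are introduced by this sketch. */
HANDLE X;                     /* CreateMutex(NULL, FALSE, NULL): guards read_count   */
HANDLE WriteLock;             /* CreateSemaphore(NULL, 1, 1, NULL): write exclusion  */
static int read_count = 0;    /* number of readers currently reading                 */
volatile int p_ccontinue = 1; /* stop flag, cleared from main() as in listing 3-2    */
void Read(void);              /* prints the shared data, as referenced in the analysis */

DWORD WINAPI reader(LPVOID lpPara)
{
    while (p_ccontinue) {
        WaitForSingleObject(X, INFINITE);             /* P(x): protect read_count      */
        if (++read_count == 1)                        /* first reader blocks writers   */
            WaitForSingleObject(WriteLock, INFINITE); /* P(wrt)                        */
        ReleaseMutex(X);                              /* V(x)                          */

        Read();                                       /* runs concurrently with other readers */
        Sleep(1000);

        WaitForSingleObject(X, INFINITE);             /* P(x)                          */
        if (--read_count == 0)                        /* last reader re-admits writers */
            ReleaseSemaphore(WriteLock, 1, NULL);     /* V(wrt)                        */
        ReleaseMutex(X);                              /* V(x)                          */
    }
    return 0;
}

With this variant, the writer shown above would wait on and release WriteLock (WaitForSingleObject(WriteLock, INFINITE) / ReleaseSemaphore(WriteLock, 1, NULL)) in place of the Mutex calls.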
Article
Security  ·  Algorithms  ·  Unix  ·  Shell  ·  Linux  ·  API  ·  C  ·  C++  ·  Windows
2023-01-13
MSE Java Agent Pitfalls: the appendToSystemClassLoaderSearch Problem
本文是《容器中的Java》系列文章之 2/n ,欢迎关注后续连载 :) 。JVM如何获取当前容器的资源限制?——容器中的Java 1从Java Agent报错开始,到JVM原理,到glibc线程安全,再到pthread tls,逐步探究Java Agent诡异报错。背景由于阿里云多个产品都提供了Java Agent给用户使用,在多个Java Agent一起使用的场景下,造成了总体Java Agent耗时增加,各个Agent各自存储,导致内存占用、资源消耗增加。MSE发起了one-java-agent项目,能够协同各个Java Agent;同时也支持更加高效、方便的字节码注入。其中,各个Java Agent作为one-java-agent的plugin,在premain阶段是通过多线程启动的方式来加载,从而将启动速度由O(n)降低到O(1),降低了整体Java Agent整体的加载时间。问题但最近在新版Agent验证过程中,one-java-agent的premain阶段,发现有如下报错:2022-06-15 06:22:47 [oneagent plugin arms-agent start] ERROR c.a.o.plugin.PluginManagerImpl -start plugin error, name: arms-agent com.alibaba.oneagent.plugin.PluginException: start error, agent jar::/home/admin/.opt/ArmsAgent/plugins/ArmsAgent/arms-bootstrap-1.7.0-SNAPSHOT.jar  at com.alibaba.oneagent.plugin.TraditionalPlugin.start(TraditionalPlugin.java:113)  at com.alibaba.oneagent.plugin.PluginManagerImpl.startOnePlugin(PluginManagerImpl.java:294)  at com.alibaba.oneagent.plugin.PluginManagerImpl.access$200(PluginManagerImpl.java:22)  at com.alibaba.oneagent.plugin.PluginManagerImpl$2.run(PluginManagerImpl.java:325)  at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.InternalError: null  at sun.instrument.InstrumentationImpl.appendToClassLoaderSearch0(Native Method)  at sun.instrument.InstrumentationImpl.appendToSystemClassLoaderSearch(InstrumentationImpl.java:200)  at com.alibaba.oneagent.plugin.TraditionalPlugin.start(TraditionalPlugin.java:100)  ... 4 common frames omitted 2022-06-16 09:51:09 [oneagent plugin ahas-java-agent start] ERROR c.a.o.plugin.PluginManagerImpl -start plugin error, name: ahas-java-agent com.alibaba.oneagent.plugin.PluginException: start error, agent jar::/home/admin/.opt/ArmsAgent/plugins/ahas-java-agent/ahas-java-agent.jar  at com.alibaba.oneagent.plugin.TraditionalPlugin.start(TraditionalPlugin.java:113)  at com.alibaba.oneagent.plugin.PluginManagerImpl.startOnePlugin(PluginManagerImpl.java:294)  at com.alibaba.oneagent.plugin.PluginManagerImpl.access$200(PluginManagerImpl.java:22)  at com.alibaba.oneagent.plugin.PluginManagerImpl$2.run(PluginManagerImpl.java:325)  at java.lang.Thread.run(Thread.java:855) Caused by: java.lang.IllegalArgumentException: null  at sun.instrument.InstrumentationImpl.appendToClassLoaderSearch0(Native Method)  at sun.instrument.InstrumentationImpl.appendToSystemClassLoaderSearch(InstrumentationImpl.java:200)  at com.alibaba.oneagent.plugin.TraditionalPlugin.start(TraditionalPlugin.java:100)  ... 
4 common frames omitted熟悉Java Agent的同学可能能注意到,这是调用Instrumentation.appendToSystemClassLoaderSearch报错了。但首先appendToSystemClassLoaderSearch的路径是存在的;其次,这个报错的真实原因是在C++部分,比较难排查。但不管怎样,还是要深究下为什么出现这个错误。首先我们梳理下具体的调用流程,下面的分析都是基于此来分析的:- Instrumentation.appendToSystemClassLoaderSearch (java)  - appendToClassLoaderSearch0 (JNI)  `- appendToClassLoaderSearch  |- AddToSystemClassLoaderSearch  | `-create_class_path_zip_entry  | `-stat  `-convertUft8ToPlatformString  `- iconv打日志、确定现场因为这个问题在容器环境下,有10%的概率出现,比较容易复现,于是就用dragonwell8的最新代码,加日志,确认下现场。首先在JNI的实际入口处,也就是appendToClassLoaderSearch的方法入口添加日志:加了上面的日志后,发现问题更加令人头秃了:没有报错的时候,appendToClassLoaderSearch entry会输出。有报错的时候,appendToClassLoaderSearch entry反而没有输出,没执行到这儿?这个和报错的日志对不上啊,难道是stacktrace信息骗了我们?过了难熬的一晚上后,第二天请教了dragonwell的同学,大佬打日志的姿势是这样的:tty->print_cr("internal error");如果上面用不了,再用printf("xxx\n");fflush(stdout);这样加日志后,果然我们的日志都能打出来了。这是踩的第一个坑,printf要加上fflush才能保证输出成功。分析代码后面又是不断加日志,最终发现create_class_path_zip_entry返回NULL。找不到对应的jar文件?继续排查,发现是stat报错,返回No such file or directory。但是前面也提到了,jarFile的路径是存在的,难道stat不是线程安全的?查了下文档( https://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html),发现stat是线程安全的。于是又回过头来再看,这时候注意到stat的路径是不正常的:有的时候路径是空,有的时候路径是/home/admin/.opt/ArmsAgent/plugins/ahas-java-agent/ahas-java-agent.jarSHOT.jar,从字符末尾可以看到,基本上是因为两个字符写到了同一片内存导致的;而且对应字符串长度也变成了一个不规律的数字了。那么问题就很明确了,开始查找这个字符串的生成。这个字符是convertUft8ToPlatformString生成的。字符编码转换有问题?于是开始调试utf8ToPlatform的逻辑,这时候为了避免频繁加日志、重启容器,所以直接在ECS上运行gdb调试jvm。结果发现,在Linux下,utf8ToPlatform就是直接memcpy,而且memcpy的目标地址是在栈上。这怎么看都不太可能有线程安全问题啊?后来仔细查了下,发现和环境变量有关,ECS上编码相关的环境变量是LANG=en_US.UTF-8,在容器上centos:7默认没有这个环境变量,此种情况下,jvm读到的是ANSI_X3.4-1968。https://man7.org/linux/man-pages/man3/nl_langinfo.3.html这儿是第二个坑,环境变量会影响本地编码转换。结合如上现象和代码,发现在容器环境下,还是要经过iconv,从UTF-8转到ANSI_X3.4-1968编码的。其实,这儿也可以推测出来,如果手动在容器中设置了LANG=en_US.UTF-8,这个问题就不会再出现。额外的验证也证实了这点。然后又加日志,最终确认是iconv的时候,目标字符串写挂了。难道是iconv线程不安全?iconv不是线程安全的!查一下iconv的文档,发现它不是完全线程安全的:通俗的说,iconv之前,需要先用iconv_open打开一个iconv_t,而且这个iconv_t,不支持多线程同时使用。至此,问题已经差不多定位清楚了,因为jvm把iconv_t写成了全局变量,这样在多个线程append的时候,就有可能同时调用iconv,导致竞态问题。这儿是第三个坑,iconv不是线程安全的。如何修复先修复one-java-agent对于Java代码,非常容易修改,只需要加一个锁就可以了:但是这儿有一个设计问题,instrument对象已经在代码中到处散落了,现在突然要加一个锁,几乎所有用到的地方都要改,代码改造成本比较大。于是最终还是通过proxy类来解决:这样其他地方就只需要使用InstrumentationWrapper就可以了,也不会触发这个问题。jvm要不要修复然后我们分析下jvm侧的代码,发现就是因为iconv_t不是线程安全的,导致appendToClassLoaderSearch0方法不是线程安全的,那能不能优雅的解决掉呢?如果是Java程序,直接用ThreadLoal来存储iconv_t就能解决了。但是cpp这边,虽然C++ 11支持thread_local,但首先jdk8还没用C++ 11(这个可以参考 JEP );其次,C++ 11的也仅仅支持thread_local的set和get,thread_local的初始化、销毁等生命周期管理还不支持,比如没办法在线程结束时自动回收iconv_t资源。那咱们就fallback到pthread?因为pthread提供了thread-specific data,可以做类似的事情。pthread_key_create创建thread-local storage区域pthread_setspecific用于将值放入thread-local storagepthread_getspecific用于从thread-local storage取出值最重要的,pthread_once满足了pthread_key_t只能初始化一次的需求。另外也需要提到的,pthread_once的第二个参数,就是线程结束时的回调,我们就可以用它来关闭iconv_t,避免资源泄漏。总之pthread提供了thread_local的全生命周期管理。于是,最终代码如下,用make_key初始化thread-local storage:于是编译JDK之后,打镜像、批量重启数次pod,就没有再出现文章开头提到的问题了。总结在整个过程中,从Java到JNI/JVMTi,再到glibc,再到pthread,踩了很多坑:printf要加上fflush才能保证输出成功环境变量会影响本地字符编码转换iconv不是线程安全的使用pthread thread-local storage来实现线程局部变量的全生命周期管理从这个案例中,沿着调用栈、代码,逐步还原问题、并提出解决方案,希望大家能对Java/JVM多了解一点。相关信息one-java-agent修复的链接 https://github.com/alibaba/one-java-agent/issues/31dragonwell修复的链接 https://github.com/alibaba/dragonwell8/pull/346one-java-agent给大家带来了更加方便、无侵入的微服务治理方式 https://www.aliyun.com/product/aliware/mse
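The JVM-side fix described above — keep one iconv_t per thread using pthread thread-specific data, create the key exactly once with pthread_once, and release the converter when the thread exits — can be sketched roughly as follows. This is an illustrative reconstruction, not the actual dragonwell patch; note also that the per-thread cleanup runs in the destructor passed to pthread_key_create, while pthread_once's second argument is only the one-time initialization routine (here, make_key). The charset names in the comment are the ones observed in the article.

#include <iconv.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t  iconv_key;
static pthread_once_t iconv_key_once = PTHREAD_ONCE_INIT;

/* Destructor registered with the key: called at thread exit with that
 * thread's value, so each thread's converter is closed and not leaked. */
static void close_thread_iconv(void *value)
{
    if (value != NULL && value != (void *)(iconv_t)-1)
        iconv_close((iconv_t)value);
}

/* make_key(): create the thread-local-storage key exactly once. */
static void make_key(void)
{
    pthread_key_create(&iconv_key, close_thread_iconv);
}

/* Return the calling thread's converter, creating it lazily on first use.
 * Callers should still check for (iconv_t)-1 if iconv_open() failed. */
static iconv_t get_thread_iconv(const char *to, const char *from)
{
    pthread_once(&iconv_key_once, make_key);
    iconv_t cd = (iconv_t)pthread_getspecific(iconv_key);
    if (cd == NULL) {
        cd = iconv_open(to, from);   /* e.g. ("ANSI_X3.4-1968", "UTF-8") */
        pthread_setspecific(iconv_key, cd);
    }
    return cd;
}

get_thread_iconv() would then replace the shared global converter inside the UTF-8-to-platform conversion path, so concurrent appendToSystemClassLoaderSearch calls no longer race on a single iconv_t.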
Article
Storage  ·  Elastic Compute  ·  Security  ·  NoSQL  ·  Java  ·  Linux  ·  C  ·  C++  ·  Containers  ·  Perl
2023-01-06
Setting Up a Go Language Environment on Windows
Official site: https://golang.google.cn/dl/

"Build simple, secure, scalable systems with Go."

"At the time, no one on the team knew Go, but within a month everyone was writing in Go, and we were building the endpoints. It was the flexibility, the ease of use, and the really cool concepts behind Go (how Go handles native concurrency, garbage collection, and of course safety plus speed) that kept us engaged during the build. Also, who can beat that cute mascot!" — Jaime Enrique Garcia Lopez, Senior Software Development Manager, Capital One

Setting up a Go environment on Windows
1. Download
2. Install
3. IDE installation

Go (Golang) was born in 2007 and is one of the newer yet well-known programming languages of recent years. It was developed at Google by Robert Griesemer, Rob Pike, and Ken Thompson, open-sourced in November 2009, and the stable Go 1 release shipped in 2012. Go feels a lot like C and the higher-level language JavaScript: it is concise, fast, and safe, and it makes concurrent programming simple and convenient, which makes it a good fit for high-performance distributed systems and game servers.

1. Download
https://golang.google.cn/ On the official site, click Download, then click Microsoft Windows to download the installer.

2. Install
Installation is just a matter of clicking Next through the wizard.

3. IDE installation
Recommended IDEs for Go development:

Goland: a product of the well-known JetBrains; highly recommended, but it is paid and fairly large at nearly 460 MB.

LiteIDE X: LiteIDE is a simple, open-source, cross-platform Go IDE written by the Chinese developer 七叶; at only 43 MB it is small and convenient, and also well worth recommending.

VSCode: an editor popular with programmers that extends easily to other languages; at roughly 86 MB it stays under 100 MB, which is nice.

package main

import "fmt"

func main() {
    fmt.Println("Hello, World!")
}

Here we use LiteIDE, if only to support a home-grown tool. On first use you may run into this error:

go: go.mod file not found in current directory or any parent directory; see 'go help modules'
Error: process exited with code 1.

Solution: go env -w GO111MODULE=auto
Article
Security  ·  JavaScript  ·  IDE  ·  Programmers  ·  Compilers  ·  Go  ·  Dev Tools  ·  C  ·  Windows
2023-03-26
[Big Data Dev/Ops Solutions] Sqoop: Scripting Automated Incremental Jobs Without Password Prompts
上一篇文章介绍了sqoop增量同步数据到hive,同时上一篇文章也给出了本人写的hadoop+hive+hbase+sqoop+kylin的伪分布式安装方法及使用和增量同步实现的连接,本篇文章将介绍如何将上一篇文章介绍的增量方式同sqoop自带的job机制和shell脚本以及crontab结合起来实现自动增量同步的需求。一、知识储备sqoop job --help usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]] Job management arguments: --create <job-id> Create a new saved job --delete <job-id> Delete a saved job --exec <job-id> Run a saved job --help Print usage instructions --list List saved jobs --meta-connect <jdbc-uri> Specify JDBC connect string for the metastore --show <job-id> Show the parameters for a saved job --verbose Print more information while working Generic Hadoop command-line arguments: (must preceed any tool-specific arguments) Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|resourcemanager:port> specify a ResourceManager -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.二、详细实验这里先来看一个根据时间戳增量append的job创建和执行的过程,然后再看merge-id方式。1、先来创建一个增量追加的job:[root@hadoop bin]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --li nes-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'19/03/13 18:12:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/13 18:12:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43) at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785) at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399) at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379) at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181) at org.apache.sqoop.tool.JobTool.run(JobTool.java:294) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) Caused by: java.lang.ClassNotFoundException: org.json.JSONObject at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) at java.lang.ClassLoader.loadClass(ClassLoader.java:357)报错了,sqoop缺少java-json.jar包。好吧,下载缺少的jar包然后上传到$SQOOP_HOME/lib,连接:点此下载jar包将下载好的jar包放到$SQOOP_HOME/lib下,然后重新创建:先把之前创建失败的job删除了[root@hadoop ~]# sqoop job --delete inc_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 18:40:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]创建job[root@hadoop ~]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password "tiger" --table INR_LAS --fields-terminated-by '\t' --l ines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 18:40:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/13 18:40:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.列出来刚刚创建的job[root@hadoop ~]# sqoop job --list Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 18:41:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Available jobs: inc_job 查看刚刚创建的job保存的last_value[root@hadoop ~]# sqoop job --show inc_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 18:45:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Enter password: -------这里输入的是执行这个job的系统用户密码。而执行的时候这里输入的是连接数据库的用户对应的密码 Job: inc_job Tool: import Options: ---------------------------- verbose = false hcatalog.drop.and.create.table = false incremental.last.value = 2019-03-20 14:49:19 db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl codegen.output.delimiters.escape = 0 codegen.output.delimiters.enclose.required = false codegen.input.delimiters.field = 0 mainframe.input.dataset.type = p hbase.create.table = false split.limit = null null.string = \\N db.require.password = true skip.dist.cache = false hdfs.append.dir = true db.table = INR_LAS codegen.input.delimiters.escape = 0 accumulo.create.table = false import.fetch.size = null codegen.input.delimiters.enclose.required = false db.username = scott reset.onemapper = false codegen.output.delimiters.record = 10 import.max.inline.lob.size = 16777216 sqoop.throwOnError = false hbase.bulk.load.enabled = false hcatalog.create.table = false db.clear.staging.table = false incremental.col = ETLTIME codegen.input.delimiters.record = 0 enable.compression = false hive.overwrite.table = false hive.import = true codegen.input.delimiters.enclose = 0 hive.table.name = INR_LAS accumulo.batch.size = 10240000 hive.database.name = oracle hive.drop.delims = false customtool.options.jsonmap = {} null.non-string = \\N codegen.output.delimiters.enclose = 0 hdfs.delete-target.dir = false codegen.output.dir = . codegen.auto.compile.dir = true relaxed.isolation = false mapreduce.num.mappers = 1 accumulo.max.latency = 5000 import.direct.split.size = 0 sqlconnection.metadata.transaction.isolation.level = 2 codegen.output.delimiters.field = 9 export.new.update = UpdateOnly incremental.mode = AppendRows hdfs.file.format = TextFile sqoop.oracle.escaping.disabled = true codegen.compile.dir = /tmp/sqoop-root/compile/1173d716481c4bd8f6cb589b87a382ea direct.import = false temporary.dirRoot = _sqoop hive.fail.table.exists = false db.batch = false接下来手动执行[root@hadoop ~]# sqoop job --exec inc_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 18:47:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Enter password: ---------这里输入的是连接数据库的用户对应的密码,如何做到免密登录?向下继续看 19/03/13 18:47:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/13 18:47:50 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/13 18:47:50 INFO tool.CodeGenTool: Beginning code generation 19/03/13 18:47:51 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 18:47:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 18:47:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.java uses or overrides a deprecated API. 
Note: Recompile with -Xlint:deprecation for details. 19/03/13 18:47:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.jar 19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 18:47:55 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS 19/03/13 18:47:55 INFO tool.ImportTool: Incremental import based on column ETLTIME 19/03/13 18:47:55 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 18:47:55 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 18:47:55 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS 19/03/13 18:47:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 18:47:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 19/03/13 18:47:57 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 18:48:00 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/13 18:48:00 INFO mapreduce.JobSubmitter: number of splits:1 19/03/13 18:48:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0016 19/03/13 18:48:01 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0016 19/03/13 18:48:01 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0016/ 19/03/13 18:48:01 INFO mapreduce.Job: Running job: job_1552469242276_0016 19/03/13 18:48:11 INFO mapreduce.Job: Job job_1552469242276_0016 running in uber mode : false 19/03/13 18:48:11 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 18:48:18 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 18:48:18 INFO mapreduce.Job: Job job_1552469242276_0016 completed successfully 19/03/13 18:48:19 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=144628 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=39 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4454 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4454 Total vcore-milliseconds taken by all map tasks=4454 Total megabyte-milliseconds taken by all map tasks=4560896 Map-Reduce Framework Map input records=1 Map output records=1 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=229 CPU time spent (ms)=2430 Physical memory (bytes) snapshot=191975424 Virtual memory (bytes) snapshot=2143756288 Total committed heap usage (bytes)=116916224 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=39 19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3135 seconds (1.7478 bytes/sec) 19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS 19/03/13 18:48:19 INFO util.AppendUtils: Creating missing output directory - INR_LAS 19/03/13 18:48:19 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 18:48:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 18:48:19 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/13 18:48:19 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/13 18:48:19 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive 19/03/13 18:48:19 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/13 18:48:19 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 18:48:22 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:22 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0/_tmp_space.db 19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:22 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main 19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:22 INFO ql.Driver: Compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 18:48:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 18:48:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 18:48:25 INFO hive.metastore: Connected to metastore. 
19/03/13 18:48:25 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/13 18:48:25 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27 19/03/13 18:48:25 INFO ql.Driver: Semantic Analysis Completed 19/03/13 18:48:25 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 18:48:25 INFO ql.Driver: Completed compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 3.251 seconds 19/03/13 18:48:25 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 18:48:25 INFO ql.Driver: Executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 18:48:25 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=e09a2f96-2edd-4747-a65f-4899c2863aa 0, clientType=HIVECLI]19/03/13 18:48:25 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/13 18:48:25 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 18:48:26 INFO hive.metastore: Connected to metastore. 
19/03/13 18:48:26 INFO ql.Driver: Completed executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 0.113 seconds OK 19/03/13 18:48:26 INFO ql.Driver: OK Time taken: 3.379 seconds 19/03/13 18:48:26 INFO CliDriver: Time taken: 3.379 seconds 19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:26 INFO session.SessionState: Resetting thread name to main 19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:26 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main 19/03/13 18:48:26 INFO ql.Driver: Compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 18:48:26 INFO ql.Driver: Semantic Analysis Completed 19/03/13 18:48:26 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 18:48:26 INFO ql.Driver: Completed compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 0.426 seconds 19/03/13 18:48:26 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 18:48:26 INFO ql.Driver: Executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 18:48:26 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_las 19/03/13 18:48:26 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS 19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 18:48:26 INFO hive.metastore: Connected to metastore. 19/03/13 18:48:26 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/13 18:48:27 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/13 18:48:27 INFO exec.StatsTask: Executing stats task 19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 18:48:27 INFO hive.metastore: Connected to metastore. 19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 18:48:27 INFO hive.metastore: Connected to metastore. 
19/03/13 18:48:27 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=6, numRows=0, totalSize=518, rawDataSize=0] 19/03/13 18:48:27 INFO ql.Driver: Completed executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 1.225 seconds OK 19/03/13 18:48:27 INFO ql.Driver: OK Time taken: 1.653 seconds 19/03/13 18:48:27 INFO CliDriver: Time taken: 1.653 seconds 19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:27 INFO session.SessionState: Resetting thread name to main 19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0 19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme hdfs 19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme file 19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 18:48:27 INFO hive.HiveImport: Hive import complete. 19/03/13 18:48:27 INFO hive.HiveImport: Export directory is empty, removing it. 19/03/13 18:48:27 INFO tool.ImportTool: Saving incremental import state to the metastore 19/03/13 18:48:27 INFO tool.ImportTool: Updated data for job: inc_job通过上面实验我们发现每次执行job时,都要输入数据库用户密码,怎么实现免密登录,可以参照这种方式:在创建Job时,使用--password-file参数,而且非--passoword。主要原因是在执行Job时使用--password参数将有警告,并且需要输入密码才能执行Job。当我们采用--password-file参数时,执行Job无需输入数据库密码,所以我们修改一下上面创建的job语句:先drop原来的job[root@hadoop conf]# sqoop job --delete inc_job创建password-file文件注:sqoop规定密码文件必须放在HDFS之上,并且权限必须为400[root@hadoop sqoop]# mkdir pwd [root@hadoop sqoop]# cd pwd [root@hadoop pwd]# pwd /hadoop/sqoop/pwd [root@hadoop pwd]# echo -n "tiger" > scott.pwd [root@hadoop pwd]# hdfs dfs -put scott.pwd /user/hive/warehouse [root@hadoop pwd]# hdfs dfs -chmod 400 /user/hive/warehouse/scott.pwd重新创建,这里不在指定password而是passwordfile[root@hadoop conf]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password-file /user/hive/warehouse/scott.pwd --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'验证,看下当前oracle数据库表:select * from inr_las; EMPNO ENAME JOB SAL ETLTIME 1 er CLERK 800.00 2019/3/20 10:42:27 2 ALLEN SALESMAN 1600.00 2019/3/20 10:42:27 3 WARD SALESMAN 1250.00 2019/3/20 10:42:27 4 JONES MANAGER 2975.00 2019/3/20 10:42:27 5 MARTIN SALESMAN 1250.00 2019/3/20 10:42:27 6 zhao DBA 1000.00 2019/3/20 10:52:34 7 yan BI 100.00 2019/3/20 10:42:27 8 dong JAVA 5232.00 2019/3/20 15:36:07再看下当前hive表数据:hive> select * from inr_las; OK 1 er CLERK 800.0 2019-03-20 10:42:27.0 2 ALLEN SALESMAN 1600.0 2019-03-20 10:42:27.0 3 WARD SALESMAN 1250.0 2019-03-20 10:42:27.0 4 JONES MANAGER 2975.0 2019-03-20 10:42:27.0 5 MARTIN SALESMAN 1250.0 2019-03-20 10:42:27.0 6 zhao DBA 1000.0 2019-03-20 10:52:34.0 7 yan BI 100.0 2019-03-20 10:42:27.0 8 dong JAVA 332.0 2019-03-20 14:49:19.0 8 dong JAVA 3232.0 2019-03-20 15:13:35.0 8 dong JAVA 4232.0 2019-03-20 15:29:03.0 8 dong JAVA 5232.0 2019-03-20 15:36:07.0 8 dong JAVA 5232.0 2019-03-20 15:36:07.0 8 dong JAVA 3232.0 2019-03-20 15:13:35.0 Time taken: 0.176 seconds, Fetched: 13 row(s)我们job的增量时间设置的--last-value '2019-03-20 
14:49:19',源端有一条数据empno=8符合增量条件,现在再执行一下新创建的job:[root@hadoop pwd]# sqoop job --exec inc_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 19:14:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/13 19:14:32 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/13 19:14:32 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/13 19:14:32 INFO tool.CodeGenTool: Beginning code generation 19/03/13 19:14:33 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 19:14:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 19:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/13 19:14:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.jar 19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 19:14:38 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS 19/03/13 19:14:38 INFO tool.ImportTool: Incremental import based on column ETLTIME 19/03/13 19:14:38 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 19:14:38 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 19:14:38 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS 19/03/13 19:14:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 19:14:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/13 19:14:38 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 19:14:42 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/13 19:14:42 INFO mapreduce.JobSubmitter: number of splits:1 19/03/13 19:14:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0017 19/03/13 19:14:42 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0017 19/03/13 19:14:43 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0017/ 19/03/13 19:14:43 INFO mapreduce.Job: Running job: job_1552469242276_0017 19/03/13 19:14:53 INFO mapreduce.Job: Job job_1552469242276_0017 running in uber mode : false 19/03/13 19:14:53 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 19:15:00 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 19:15:00 INFO mapreduce.Job: Job job_1552469242276_0017 completed successfully 19/03/13 19:15:00 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=144775 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=39 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=5332 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=5332 Total vcore-milliseconds taken by all map tasks=5332 Total megabyte-milliseconds taken by all map tasks=5459968 Map-Reduce Framework Map input records=1 Map output records=1 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=651 CPU time spent (ms)=2670 Physical memory (bytes) snapshot=188571648 Virtual memory (bytes) snapshot=2148745216 Total committed heap usage (bytes)=119537664 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=39 19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3081 seconds (1.7482 bytes/sec) 19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS 19/03/13 19:15:00 INFO util.AppendUtils: Creating missing output directory - INR_LAS 19/03/13 19:15:01 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 19:15:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 19:15:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/13 19:15:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/13 19:15:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive 19/03/13 19:15:01 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/13 19:15:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 19:15:04 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:04 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618/_tmp_space.db 19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:04 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main 19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:04 INFO ql.Driver: Compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 19:15:09 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 19:15:09 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 19:15:09 INFO hive.metastore: Connected to metastore. 
19/03/13 19:15:09 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/13 19:15:09 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27 19/03/13 19:15:09 INFO ql.Driver: Semantic Analysis Completed 19/03/13 19:15:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 19:15:09 INFO ql.Driver: Completed compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 5.309 seconds 19/03/13 19:15:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 19:15:09 INFO ql.Driver: Executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 19:15:09 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=7feac288-289d-4d74-8641-553c5ab6561 8, clientType=HIVECLI]19/03/13 19:15:09 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/13 19:15:09 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 19:15:10 INFO hive.metastore: Connected to metastore. 
19/03/13 19:15:10 INFO ql.Driver: Completed executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 0.106 seconds OK 19/03/13 19:15:10 INFO ql.Driver: OK Time taken: 5.429 seconds 19/03/13 19:15:10 INFO CliDriver: Time taken: 5.429 seconds 19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:10 INFO session.SessionState: Resetting thread name to main 19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:10 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main 19/03/13 19:15:10 INFO ql.Driver: Compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 19:15:10 INFO ql.Driver: Semantic Analysis Completed 19/03/13 19:15:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 19:15:10 INFO ql.Driver: Completed compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 0.415 seconds 19/03/13 19:15:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 19:15:10 INFO ql.Driver: Executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 19:15:10 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_las 19/03/13 19:15:10 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS 19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 19:15:10 INFO hive.metastore: Connected to metastore. 19/03/13 19:15:10 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/13 19:15:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/13 19:15:11 INFO exec.StatsTask: Executing stats task 19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 19:15:11 INFO hive.metastore: Connected to metastore. 19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 19:15:11 INFO hive.metastore: Connected to metastore. 
19/03/13 19:15:11 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=7, numRows=0, totalSize=557, rawDataSize=0] 19/03/13 19:15:11 INFO ql.Driver: Completed executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 1.296 seconds OK 19/03/13 19:15:11 INFO ql.Driver: OK Time taken: 1.713 seconds 19/03/13 19:15:11 INFO CliDriver: Time taken: 1.713 seconds 19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:11 INFO session.SessionState: Resetting thread name to main 19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618 19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme hdfs 19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme file 19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 19:15:11 INFO hive.HiveImport: Hive import complete. 19/03/13 19:15:11 INFO hive.HiveImport: Export directory is empty, removing it. 19/03/13 19:15:11 INFO tool.ImportTool: Saving incremental import state to the metastore 19/03/13 19:15:11 INFO tool.ImportTool: Updated data for job: inc_job发现已经不需要输入密码了,再来看下hive表数据:hive> select * from inr_las; OK 1 er CLERK 800.0 2019-03-20 10:42:27.0 2 ALLEN SALESMAN 1600.0 2019-03-20 10:42:27.0 3 WARD SALESMAN 1250.0 2019-03-20 10:42:27.0 4 JONES MANAGER 2975.0 2019-03-20 10:42:27.0 5 MARTIN SALESMAN 1250.0 2019-03-20 10:42:27.0 6 zhao DBA 1000.0 2019-03-20 10:52:34.0 7 yan BI 100.0 2019-03-20 10:42:27.0 8 dong JAVA 332.0 2019-03-20 14:49:19.0 8 dong JAVA 3232.0 2019-03-20 15:13:35.0 8 dong JAVA 4232.0 2019-03-20 15:29:03.0 8 dong JAVA 5232.0 2019-03-20 15:36:07.0 8 dong JAVA 5232.0 2019-03-20 15:36:07.0 8 dong JAVA 5232.0 2019-03-20 15:36:07.0 8 dong JAVA 3232.0 2019-03-20 15:13:35.0 Time taken: 0.161 seconds, Fetched: 14 row(s)成14条数据了,多了条empno=8的数据,成功了。不过笔者这里的需求是源端oracle数据库做了update之后,由于时间戳也会跟着变化,所以我们要求根据时间戳找出变更的数据然后在hive增量更新,这里就使用到了根据时间和主键进行合并增量的nerge-id模式,job的创建类似上面的例子.这里的例子为:我们通过shell脚本进行封装让crontab 定时执行增量。注意:先声明一下,因为笔者是测试增量导入给kylin做增量cube用,测试数据量很少,所以hive表只创建外部表,不在分区。下面全流程演示如何一步步把一个表通过sqoop job结合crontab+shell脚本自动增量导入到hive:第一步,先在oracle端创建一个要同步的表,这里用上面的inr_las表再初始化一个表:create table inr_job as select a.empno, a.ename, a.job, a.sal, sysdate etltime from inr_las a ;第二步,在hive创建目标表:hive> use oracle; OK Time taken: 1.425 seconds create table INR_JOB ( empno int, ename string, job string, sal float, etltime string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/user/hive/warehouse/exter_inr_job'; Time taken: 2.836 seconds第三步,全量把数据导入hive:先删除一下上面创建外部表时指定的目录,因为创建外部表时会自动创建目录,而下面的全量导入也会自动创建,因此会导致冲突提示目录存在的错误:[root@hadoop hadoop]# hadoop fs -rmr /user/hive/warehouse/exter_inr_job rmr: DEPRECATED: Please use 'rm -r' instead. 
19/03/14 06:01:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minut es, Emptier interval = 0 minutes.Deleted /user/hive/warehouse/exter_inr_job 接下来全量导入:sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_JOB -m 1 --target-dir /user/hive/warehouse/exter_inr_job --fields-terminated-by '\t'导入完成查询下hive数据:hive> select * from inr_job; OK 1 er CLERK 800.0 2019-03-22 17:24:42.0 2 ALLEN SALESMAN 1600.0 2019-03-22 17:24:42.0 3 WARD SALESMAN 1250.0 2019-03-22 17:24:42.0 4 JONES MANAGER 2975.0 2019-03-22 17:24:42.0 5 MARTIN SALESMAN 1250.0 2019-03-22 17:24:42.0 6 zhao DBA 1000.0 2019-03-22 17:24:42.0 7 yan BI 100.0 2019-03-22 17:24:42.0 8 dong JAVA 400.0 2019-03-22 17:24:42.0 Time taken: 3.153 seconds, Fetched: 8 row(s) 第四步,创建增量sqoop job下面的--password-file /user/hive/warehouse/scott.pwd 是之前上一篇文章创建的密码文件,读者可以看下上篇文章如何创建的 sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password-file /user/hive/warehouse/scott.pwd --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n' --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-22 17:24:42"看下创建的job信息:[root@hadoop hadoop]# sqoop job --show auto_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/14 06:10:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/i mpl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLogg erBinder.class]SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLog gerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Job: auto_job Tool: import Options: ---------------------------- verbose = false hcatalog.drop.and.create.table = false incremental.last.value = 2019-03-22 17:24:42 db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl codegen.output.delimiters.escape = 0 codegen.output.delimiters.enclose.required = false codegen.input.delimiters.field = 0 mainframe.input.dataset.type = p split.limit = null hbase.create.table = false skip.dist.cache = false hdfs.append.dir = false db.table = INR_JOB codegen.input.delimiters.escape = 0 accumulo.create.table = false import.fetch.size = null codegen.input.delimiters.enclose.required = false db.username = scott reset.onemapper = false codegen.output.delimiters.record = 10 import.max.inline.lob.size = 16777216 sqoop.throwOnError = false hbase.bulk.load.enabled = false hcatalog.create.table = false db.clear.staging.table = false incremental.col = ETLTIME codegen.input.delimiters.record = 0 db.password.file = /user/hive/warehouse/scott.pwd enable.compression = false hive.overwrite.table = false hive.import = false codegen.input.delimiters.enclose = 0 accumulo.batch.size = 10240000 hive.drop.delims = false customtool.options.jsonmap = {} codegen.output.delimiters.enclose = 0 hdfs.delete-target.dir = false codegen.output.dir = . 
codegen.auto.compile.dir = true relaxed.isolation = false mapreduce.num.mappers = 1 accumulo.max.latency = 5000 import.direct.split.size = 0 sqlconnection.metadata.transaction.isolation.level = 2 codegen.output.delimiters.field = 9 export.new.update = UpdateOnly incremental.mode = DateLastModified hdfs.file.format = TextFile sqoop.oracle.escaping.disabled = true codegen.compile.dir = /tmp/sqoop-root/compile/be3b358816e17c786d114afb7a4f2a6d direct.import = false temporary.dirRoot = _sqoop hdfs.target.dir = /user/hive/warehouse/exter_inr_job hive.fail.table.exists = false merge.key.col = EMPNO db.batch = false 第五步,封装到shell脚本,加入定时任务[root@hadoop ~]# cd /hadoop/ [root@hadoop hadoop]# vim auto_inr.sh 加入下面内容: #!/bin/bash log="/hadoop/auto_job_log.log" echo "======================`date "+%Y-%m-%d %H:%M:%S"`增量======================" >> $log nohup sqoop job --exec auto_job >> $log 2>&1 & 保存退出,赋予权限 [root@hadoop hadoop]# chmod +x auto_inr.sh 先来手动执行一下,不过执行前先再看看job的last_value时间:[root@hadoop hadoop]# sqoop job --show auto_job Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/25 17:50:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Job: auto_job Tool: import Options: ---------------------------- verbose = false hcatalog.drop.and.create.table = false incremental.last.value = 2019-03-22 17:24:42 db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl 看到是2019-03-22 17:24:42,再看下当前时间:[root@hadoop hadoop]# date Mon Mar 25 17:54:54 CST 2019接下来手动执行下这个脚本:[root@hadoop hadoop]# ./auto_inr.sh 然后去看重定向的日志:[root@hadoop hadoop]# cat auto_job_log.log ======================2019-03-25 17:55:46增量====================== Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/25 17:55:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/25 17:55:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 
19/03/25 17:55:50 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/25 17:55:50 INFO tool.CodeGenTool: Beginning code generation 19/03/25 17:55:51 INFO manager.OracleManager: Time zone has been set to GMT 19/03/25 17:55:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0 19/03/25 17:55:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/25 17:55:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.jar 19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT 19/03/25 17:55:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0 19/03/25 17:55:54 INFO tool.ImportTool: Incremental import based on column ETLTIME 19/03/25 17:55:54 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-22 17:24:42', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT 19/03/25 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of INR_JOB 19/03/25 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT 19/03/25 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 19/03/25 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/25 17:55:57 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/25 17:55:57 INFO mapreduce.JobSubmitter: number of splits:1 19/03/25 17:55:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0009 19/03/25 17:55:58 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0009 19/03/25 17:55:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0009/ 19/03/25 17:55:58 INFO mapreduce.Job: Running job: job_1553503985304_0009 19/03/25 17:56:07 INFO mapreduce.Job: Job job_1553503985304_0009 running in uber mode : false 19/03/25 17:56:07 INFO mapreduce.Job: map 0% reduce 0% 19/03/25 17:56:15 INFO mapreduce.Job: map 100% reduce 0% 19/03/25 17:56:15 INFO mapreduce.Job: Job job_1553503985304_0009 completed successfully 19/03/25 17:56:15 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=144775 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=323 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=5270 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=5270 Total vcore-milliseconds taken by all map tasks=5270 Total megabyte-milliseconds taken by all map tasks=5396480 Map-Reduce Framework Map input records=8 Map output records=8 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=73 CPU time spent (ms)=3000 Physical memory (bytes) 
snapshot=205058048 Virtual memory (bytes) snapshot=2135244800 Total committed heap usage (bytes)=109576192 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=323 19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 20.9155 seconds (15.4431 bytes/sec) 19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 8 records. 19/03/25 17:56:15 INFO tool.ImportTool: Final destination exists, will run merge job. 19/03/25 17:56:15 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class 19/03/25 17:56:15 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/25 17:56:18 INFO input.FileInputFormat: Total input paths to process : 2 19/03/25 17:56:18 INFO mapreduce.JobSubmitter: number of splits:2 19/03/25 17:56:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0010 19/03/25 17:56:19 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0010 19/03/25 17:56:19 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0010/ 19/03/25 17:56:19 INFO mapreduce.Job: Running job: job_1553503985304_0010 19/03/25 17:56:29 INFO mapreduce.Job: Job job_1553503985304_0010 running in uber mode : false 19/03/25 17:56:29 INFO mapreduce.Job: map 0% reduce 0% 19/03/25 17:56:39 INFO mapreduce.Job: map 100% reduce 0% 19/03/25 17:56:50 INFO mapreduce.Job: map 100% reduce 100% 19/03/25 17:56:50 INFO mapreduce.Job: Job job_1553503985304_0010 completed successfully 19/03/25 17:56:50 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=1090 FILE: Number of bytes written=436771 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=942 HDFS: Number of bytes written=323 HDFS: Number of read operations=9 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=2 Launched reduce tasks=1 Data-local map tasks=2 Total time spent by all maps in occupied slots (ms)=14667 Total time spent by all reduces in occupied slots (ms)=7258 Total time spent by all map tasks (ms)=14667 Total time spent by all reduce tasks (ms)=7258 Total vcore-milliseconds taken by all map tasks=14667 Total vcore-milliseconds taken by all reduce tasks=7258 Total megabyte-milliseconds taken by all map tasks=15019008 Total megabyte-milliseconds taken by all reduce tasks=7432192 Map-Reduce Framework Map input records=16 Map output records=16 Map output bytes=1052 Map output materialized bytes=1096 Input split bytes=296 Combine input records=0 Combine output records=0 Reduce input groups=8 Reduce shuffle bytes=1096 Reduce input records=16 Reduce output records=8 Spilled Records=32 Shuffled Maps =2 Failed Shuffles=0 Merged Map outputs=2 GC time elapsed (ms)=230 CPU time spent (ms)=5420 Physical memory (bytes) snapshot=684474368 Virtual memory (bytes) snapshot=6394597376 Total committed heap usage (bytes)=511705088 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=646 File Output Format Counters Bytes Written=323 19/03/25 17:56:50 INFO tool.ImportTool: Saving incremental import state to the metastore 19/03/25 17:56:51 INFO tool.ImportTool: Updated data for job: auto_job 可以看到日志中这么一段话:19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD 
HH24:MI:SS.FF'), which shows the upper bound is the current time. Now look at the job's last_value again:
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-25 17:55:54.0
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
It matches the timestamp in the log above. If the job definition ever needs to change, delete the job and recreate it with the corrected options, setting --last-value manually to the Upper bound reported in the log, or record incremental.last.value before deleting so the rebuilt job can resume from the same point (see the sketch below). The manual run works, so the only step left is scheduling. Run crontab -e and add the following line (an incremental run every two minutes, matching the */2 schedule shown):
*/2 * * * * /hadoop/auto_inr.sh
One open question: if a table is very large and only the most recent slice was loaded into the hive table initially, will the merge-key incremental mode fail when historical rows that were never loaded get updated at the source? That is covered in the follow-up test article; the link will be posted here once it is written: https://blog.csdn.net/qq_28356739/article/details/88803284
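A small helper script makes the rebuild scenario above less error-prone: read the current last-value back from sqoop job --show, drop the job, and recreate it from the same starting point. This is only a sketch assembled from the commands already used in this article (the job name auto_job, the scott.pwd password file and the exter_inr_job target directory are taken from the examples above); the awk parsing of the --show output is an assumption about its text format, not a guaranteed interface.

#!/bin/bash
# Sketch: rebuild auto_job without losing its incremental position.
JOB=auto_job
# Pull "incremental.last.value = <timestamp>" out of the job definition.
LAST_VALUE=$(sqoop job --show "$JOB" 2>/dev/null | awk -F' = ' '/incremental.last.value/ {print $2}')
echo "recorded last-value: $LAST_VALUE"
sqoop job --delete "$JOB"
sqoop job --create "$JOB" -- import \
  --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl \
  --username scott --password-file /user/hive/warehouse/scott.pwd \
  --table INR_JOB -m 1 \
  --target-dir /user/hive/warehouse/exter_inr_job \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  --check-column ETLTIME --incremental lastmodified --merge-key EMPNO \
  --last-value "$LAST_VALUE"

After the rebuild, sqoop job --show auto_job should report the same incremental.last.value as before, so the next scheduled run only picks up rows changed after that timestamp.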
【大数据开发运维解决方案】Sqoop全量同步mysql/Oracle数据到hive
前面文章写了如何部署一套伪分布式的handoop+hive+hbase+kylin环境,也介绍了如何在这个搭建好的伪分布式环境安装配置sqoop工具以及安装完成功后简单的使用过程中出现的错误及解决办法,接下来本篇文章详细介绍一下使用sqoop全量同步oracle/mysql数据到hive,这里实验采用oracle数据库为例,后面一篇文章将详细介绍:1、sqoop --incremental append 附加模式增量同步数据到hive2、sqoop --incremental --merge-key合并模式增量同步到hive文章现已经写完了。一、知识储备sqoop import和export工具有些通用的选项,如下表所示:数据导入工具import:import工具,是将HDFS平台外部的结构化存储系统中的数据导入到Hadoop平台,便于后续分析。我们先看一下import工具的基本选项及其含义,如下表所示:下面将通过一系列案例来测试这些功能。因为笔者现在只用到import,因此本文章只测试import相关功能,export参数没有列出,请读者自行测试。二、导入实验1、Oracle库创建测试用表初始化及hive创建表--连接的用户为scott用户 create table inr_emp as select a.empno, a.ename, a.job, a.mgr, a.hiredate, a.sal, a.deptno,sysdate as etltime from emp a where job is not null; select * from inr_emp; EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO ETLTIME 7369 er CLERK 7902 1980/12/17 800.00 20 2019/3/19 14:02:13 7499 ALLEN SALESMAN 7698 1981/2/20 1600.00 30 2019/3/19 14:02:13 7521 WARD SALESMAN 7698 1981/2/22 1250.00 30 2019/3/19 14:02:13 7566 JONES MANAGER 7839 1981/4/2 2975.00 20 2019/3/19 14:02:13 7654 MARTIN SALESMAN 7698 1981/9/28 1250.00 30 2019/3/19 14:02:13 7698 BLAKE MANAGER 7839 1981/5/1 2850.00 30 2019/3/19 14:02:13 7782 CLARK MANAGER 7839 1981/6/9 2450.00 10 2019/3/19 14:02:13 7839 KING PRESIDENT 1981/11/17 5000.00 10 2019/3/19 14:02:13 7844 TURNER SALESMAN 7698 1981/9/8 1500.00 30 2019/3/19 14:02:13 7876 ADAMS CLERK 7788 1987/5/23 1100.00 20 2019/3/19 14:02:13 7900 JAMES CLERK 7698 1981/12/3 950.00 30 2019/3/19 14:02:13 7902 FORD ANALYST 7566 1981/12/3 3000.00 20 2019/3/19 14:02:13 7934 sdf sdf 7782 1982/1/23 1300.00 10 2019/3/19 14:02:13 --hive创建表 [root@hadoop bin]# ./hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> use oracle; OK Time taken: 1.234 seconds hive> create table INR_EMP > ( > empno int, > ename string, > job string, > mgr int, > hiredate DATE, > sal float, > deptno int, > etltime DATE > ); OK Time taken: 0.63 seconds2、全量全列导入数据[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_EMP -m 1 --hive-import --hive-database oracle Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/12 18:28:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/12 18:28:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/12 18:28:29 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 19/03/12 18:28:29 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 19/03/12 18:28:29 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 
19/03/12 18:28:29 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/12 18:28:29 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/12 18:28:30 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:28:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_EMP t WHERE 1=0 19/03/12 18:28:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/cbdca745b64b4ab94902764a5ea26928/INR_EMP.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/12 18:28:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/cbdca745b64b4ab94902764a5ea26928/INR_EMP.jar 19/03/12 18:28:34 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:28:34 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:28:34 INFO mapreduce.ImportJobBase: Beginning import of INR_EMP 19/03/12 18:28:35 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/12 18:28:35 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:28:36 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 19/03/12 18:28:36 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/12 18:28:39 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/12 18:28:39 INFO mapreduce.JobSubmitter: number of splits:1 19/03/12 18:28:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0004 19/03/12 18:28:40 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0004 19/03/12 18:28:40 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0004/ 19/03/12 18:28:40 INFO mapreduce.Job: Running job: job_1552371714699_0004 19/03/12 18:28:51 INFO mapreduce.Job: Job job_1552371714699_0004 running in uber mode : false 19/03/12 18:28:51 INFO mapreduce.Job: map 0% reduce 0% 19/03/12 18:29:00 INFO mapreduce.Job: map 100% reduce 0% 19/03/12 18:29:01 INFO mapreduce.Job: Job job_1552371714699_0004 completed successfully 19/03/12 18:29:01 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=143523 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=976 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=5538 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=5538 Total vcore-milliseconds taken by all map tasks=5538 Total megabyte-milliseconds taken by all map tasks=5670912 Map-Reduce Framework Map input records=13 Map output records=13 Input split bytes=87 
Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=156 CPU time spent (ms)=2560 Physical memory (bytes) snapshot=207745024 Virtual memory (bytes) snapshot=2150998016 Total committed heap usage (bytes)=99090432 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=976 19/03/12 18:29:01 INFO mapreduce.ImportJobBase: Transferred 976 bytes in 25.1105 seconds (38.8683 bytes/sec) 19/03/12 18:29:01 INFO mapreduce.ImportJobBase: Retrieved 13 records. 19/03/12 18:29:01 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_EMP 19/03/12 18:29:01 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:29:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_EMP t WHERE 1=0 19/03/12 18:29:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/12 18:29:01 WARN hive.TableDefWriter: Column MGR had to be cast to a less precise type in Hive 19/03/12 18:29:01 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive 19/03/12 18:29:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/12 18:29:01 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive 19/03/12 18:29:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive 19/03/12 18:29:01 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/12 18:29:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 18:29:05 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 18:29:05 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:07 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:07 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/ac8d208d-2339-4bae-aee8-c9fc1c3b93a4/_tmp_space.db 19/03/12 18:29:07 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:07 INFO session.SessionState: Updating thread name to ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 main 19/03/12 18:29:07 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:07 INFO ql.Driver: Compiling command(queryId=root_20190312102907_3fbb2f16-c52a-4c3c-843d-45c9ca918228): CREATE TABLE IF NOT EXISTS `oracle`.`INR_EMP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `MGR` DOUBLE, `HIREDATE` STRING, `SAL` DOUBLE, `DEPTNO` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/12 10:29:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 18:29:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:29:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:29:10 INFO hive.metastore: Connected to metastore. 
19/03/12 18:29:10 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/12 18:29:10 INFO parse.CalcitePlanner: Creating table oracle.INR_EMP position=27 19/03/12 18:29:10 INFO ql.Driver: Semantic Analysis Completed 19/03/12 18:29:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 18:29:10 INFO ql.Driver: Completed compiling command(queryId=root_20190312102907_3fbb2f16-c52a-4c3c-843d-45c9ca918228); Time taken: 3.007 seconds 19/03/12 18:29:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 18:29:10 INFO ql.Driver: Executing command(queryId=root_20190312102907_3fbb2f16-c52a-4c3c-843d-45c9ca918228): CREATE TABLE IF NOT EXISTS `oracle`.`INR_EMP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `MGR` DOUBLE, `HIREDATE` STRING, `SAL` DOUBLE, `DEPTNO` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/12 10:29:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 18:29:10 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=ac8d208d-2339-4bae-aee8-c9fc1c3b93a 4, clientType=HIVECLI]19/03/12 18:29:10 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/12 18:29:10 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 18:29:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:29:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:29:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:29:10 INFO hive.metastore: Connected to metastore. 
19/03/12 18:29:10 INFO ql.Driver: Completed executing command(queryId=root_20190312102907_3fbb2f16-c52a-4c3c-843d-45c9ca918228); Time taken: 0.083 seconds OK 19/03/12 18:29:10 INFO ql.Driver: OK Time taken: 3.101 seconds 19/03/12 18:29:10 INFO CliDriver: Time taken: 3.101 seconds 19/03/12 18:29:10 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:10 INFO session.SessionState: Resetting thread name to main 19/03/12 18:29:10 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:10 INFO session.SessionState: Updating thread name to ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 main 19/03/12 18:29:10 INFO ql.Driver: Compiling command(queryId=root_20190312102910_d3ab56d4-1bcb-4063-aaab-badd4f8f13e2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_EMP' INTO TABLE `oracle`.`INR_EMP` 19/03/12 18:29:11 INFO ql.Driver: Semantic Analysis Completed 19/03/12 18:29:11 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 18:29:11 INFO ql.Driver: Completed compiling command(queryId=root_20190312102910_d3ab56d4-1bcb-4063-aaab-badd4f8f13e2); Time taken: 0.446 seconds 19/03/12 18:29:11 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 18:29:11 INFO ql.Driver: Executing command(queryId=root_20190312102910_d3ab56d4-1bcb-4063-aaab-badd4f8f13e2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_EMP' INTO TABLE `oracle`.`INR_EMP` 19/03/12 18:29:11 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/12 18:29:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_emp 19/03/12 18:29:11 INFO exec.Task: Loading data to table oracle.inr_emp from hdfs://192.168.1.66:9000/user/root/INR_EMP 19/03/12 18:29:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:29:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:29:11 INFO hive.metastore: Connected to metastore. 19/03/12 18:29:11 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/12 18:29:12 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/12 18:29:12 INFO exec.StatsTask: Executing stats task 19/03/12 18:29:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:29:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:29:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:29:12 INFO hive.metastore: Connected to metastore. 19/03/12 18:29:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:29:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:29:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:29:12 INFO hive.metastore: Connected to metastore. 
19/03/12 18:29:12 INFO exec.StatsTask: Table oracle.inr_emp stats: [numFiles=1, numRows=0, totalSize=976, rawDataSize=0] 19/03/12 18:29:12 INFO ql.Driver: Completed executing command(queryId=root_20190312102910_d3ab56d4-1bcb-4063-aaab-badd4f8f13e2); Time taken: 1.114 seconds OK 19/03/12 18:29:12 INFO ql.Driver: OK Time taken: 1.56 seconds 19/03/12 18:29:12 INFO CliDriver: Time taken: 1.56 seconds 19/03/12 18:29:12 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:12 INFO session.SessionState: Resetting thread name to main 19/03/12 18:29:12 INFO conf.HiveConf: Using the default value passed in for log id: ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 19/03/12 18:29:12 INFO session.SessionState: Deleted directory: /tmp/hive/root/ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 on fs with scheme hdfs 19/03/12 18:29:12 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/ac8d208d-2339-4bae-aee8-c9fc1c3b93a4 on fs with scheme file 19/03/12 18:29:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:29:12 INFO hive.HiveImport: Hive import complete. 19/03/12 18:29:12 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.查询hive表:hive> select * from inr_emp; OK 7369 er CLERK 7902 NULL 800.0 20 NULL 7499 ALLEN SALESMAN 7698 NULL 1600.0 30 NULL 7521 WARD SALESMAN 7698 NULL 1250.0 30 NULL 7566 JONES MANAGER 7839 NULL 2975.0 20 NULL 7654 MARTIN SALESMAN 7698 NULL 1250.0 30 NULL 7698 BLAKE MANAGER 7839 NULL 2850.0 30 NULL 7782 CLARK MANAGER 7839 NULL 2450.0 10 NULL 7839 KING PRESIDENT NULL NULL 5000.0 10 NULL 7844 TURNER SALESMAN 7698 NULL 1500.0 30 NULL 7876 ADAMS CLERK 7788 NULL 1100.0 20 NULL 7900 JAMES CLERK 7698 NULL 950.0 30 NULL 7902 FORD ANALYST 7566 NULL 3000.0 20 NULL 7934 sdf sdf 7782 NULL 1300.0 10 NULL Time taken: 3.103 seconds, Fetched: 13 row(s)发现导入hive表时间相关的数据都成空值了,这里我们把oracle时间列对应的hive表的时间列改为string类型重新导入:hive> drop table inr_emp; OK Time taken: 2.483 seconds hive> create table INR_EMP > ( > empno int, > ename string, > job string, > mgr int, > hiredate string, > sal float, > deptno int, > etltime string > ); OK Time taken: 0.109 seconds再次执行一次上面的导入,看下结果:hive> select * from inr_emp; OK 7369 er CLERK 7902 1980-12-17 00:00:00.0 800.0 20 2019-03-19 14:02:13.0 7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00.0 1600.0 30 2019-03-19 14:02:13.0 7521 WARD SALESMAN 7698 1981-02-22 00:00:00.0 1250.0 30 2019-03-19 14:02:13.0 7566 JONES MANAGER 7839 1981-04-02 00:00:00.0 2975.0 20 2019-03-19 14:02:13.0 7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00.0 1250.0 30 2019-03-19 14:02:13.0 7698 BLAKE MANAGER 7839 1981-05-01 00:00:00.0 2850.0 30 2019-03-19 14:02:13.0 7782 CLARK MANAGER 7839 1981-06-09 00:00:00.0 2450.0 10 2019-03-19 14:02:13.0 7839 KING PRESIDENT NULL 1981-11-17 00:00:00.0 5000.0 10 2019-03-19 14:02:13.0 7844 TURNER SALESMAN 7698 1981-09-08 00:00:00.0 1500.0 30 2019-03-19 14:02:13.0 7876 ADAMS CLERK 7788 1987-05-23 00:00:00.0 1100.0 20 2019-03-19 14:02:13.0 7900 JAMES CLERK 7698 1981-12-03 00:00:00.0 950.0 30 2019-03-19 14:02:13.0 7902 FORD ANALYST 7566 1981-12-03 00:00:00.0 3000.0 20 2019-03-19 14:02:13.0 7934 sdf sdf 7782 1982-01-23 00:00:00.0 1300.0 10 2019-03-19 14:02:13.0 Time taken: 0.369 seconds, Fetched: 13 row(s)这次正常了。3、全量选择列导入先drop了hive表inr_emp表,重建:hive> drop table inr_emp; OK Time taken: 0.205 seconds hive> create table INR_EMP > ( > empno int, > ename string, > job string, > mgr int, > hiredate string, > sal float, > deptno int, > etltime 
string > ); OK Time taken: 0.102 seconds然后另开一个会话挑几列导入[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_EMP -m 1 --columns 'EMPNO,ENAME,SAL,ETLTIME' --hive-import --hi ve-database oracleWarning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/12 18:44:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/12 18:44:23 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/12 18:44:23 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 19/03/12 18:44:23 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 19/03/12 18:44:23 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/12 18:44:23 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/12 18:44:23 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/12 18:44:24 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:44:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_EMP t WHERE 1=0 19/03/12 18:44:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/2e1abddfc21ac4e688984b572589f687/INR_EMP.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/12 18:44:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/2e1abddfc21ac4e688984b572589f687/INR_EMP.jar 19/03/12 18:44:26 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:44:26 INFO mapreduce.ImportJobBase: Beginning import of INR_EMP 19/03/12 18:44:27 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/12 18:44:27 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/12 18:44:28 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/12 18:44:30 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/12 18:44:30 INFO mapreduce.JobSubmitter: number of splits:1 19/03/12 18:44:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0007 19/03/12 18:44:31 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0007 19/03/12 18:44:31 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0007/ 19/03/12 18:44:31 INFO mapreduce.Job: Running job: job_1552371714699_0007 19/03/12 18:44:40 INFO mapreduce.Job: Job job_1552371714699_0007 running in uber mode : false 19/03/12 18:44:40 INFO mapreduce.Job: map 0% reduce 0% 19/03/12 18:44:46 INFO mapreduce.Job: map 100% reduce 0% 19/03/12 18:44:47 INFO mapreduce.Job: Job job_1552371714699_0007 completed successfully 19/03/12 18:44:47 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=143499 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=486 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4271 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4271 Total vcore-milliseconds taken by all map tasks=4271 Total megabyte-milliseconds taken by all map tasks=4373504 Map-Reduce Framework Map input records=13 Map output records=13 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=69 CPU time spent (ms)=1990 Physical memory (bytes) snapshot=188010496 Virtual memory (bytes) snapshot=2143096832 Total committed heap usage (bytes)=111149056 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=486 19/03/12 18:44:47 INFO mapreduce.ImportJobBase: Transferred 486 bytes in 20.0884 seconds (24.193 bytes/sec) 19/03/12 18:44:47 INFO mapreduce.ImportJobBase: Retrieved 13 records. 
19/03/12 18:44:47 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_EMP 19/03/12 18:44:47 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 18:44:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_EMP t WHERE 1=0 19/03/12 18:44:48 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/12 18:44:48 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/12 18:44:48 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive 19/03/12 18:44:48 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/12 18:44:48 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 18:44:50 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 18:44:50 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:51 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:51 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/08d98a96-18e1-4474-98df-1991d7b421f5/_tmp_space.db 19/03/12 18:44:51 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:51 INFO session.SessionState: Updating thread name to 08d98a96-18e1-4474-98df-1991d7b421f5 main 19/03/12 18:44:51 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:51 INFO ql.Driver: Compiling command(queryId=root_20190312104451_88b6d963-af76-490c-8832-ccc07e0667a7): CREATE TABLE IF NOT EXISTS `oracle`.`INR_EMP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/12 10:44:48' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 18:44:53 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:44:53 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:44:53 INFO hive.metastore: Connected to metastore. 
19/03/12 18:44:53 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/12 18:44:53 INFO parse.CalcitePlanner: Creating table oracle.INR_EMP position=27 19/03/12 18:44:53 INFO ql.Driver: Semantic Analysis Completed 19/03/12 18:44:53 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 18:44:53 INFO ql.Driver: Completed compiling command(queryId=root_20190312104451_88b6d963-af76-490c-8832-ccc07e0667a7); Time taken: 2.808 seconds 19/03/12 18:44:53 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 18:44:53 INFO ql.Driver: Executing command(queryId=root_20190312104451_88b6d963-af76-490c-8832-ccc07e0667a7): CREATE TABLE IF NOT EXISTS `oracle`.`INR_EMP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/12 10:44:48' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 18:44:54 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=08d98a96-18e1-4474-98df-1991d7b421f 5, clientType=HIVECLI]19/03/12 18:44:54 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/12 18:44:54 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 18:44:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:44:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:44:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:44:54 INFO hive.metastore: Connected to metastore. 
19/03/12 18:44:54 INFO ql.Driver: Completed executing command(queryId=root_20190312104451_88b6d963-af76-490c-8832-ccc07e0667a7); Time taken: 0.092 seconds OK 19/03/12 18:44:54 INFO ql.Driver: OK Time taken: 2.911 seconds 19/03/12 18:44:54 INFO CliDriver: Time taken: 2.911 seconds 19/03/12 18:44:54 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:54 INFO session.SessionState: Resetting thread name to main 19/03/12 18:44:54 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:54 INFO session.SessionState: Updating thread name to 08d98a96-18e1-4474-98df-1991d7b421f5 main 19/03/12 18:44:54 INFO ql.Driver: Compiling command(queryId=root_20190312104454_13a6c093-1f23-4362-a95e-db15aef02c97): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_EMP' INTO TABLE `oracle`.`INR_EMP` 19/03/12 18:44:54 INFO ql.Driver: Semantic Analysis Completed 19/03/12 18:44:54 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 18:44:54 INFO ql.Driver: Completed compiling command(queryId=root_20190312104454_13a6c093-1f23-4362-a95e-db15aef02c97); Time taken: 0.411 seconds 19/03/12 18:44:54 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 18:44:54 INFO ql.Driver: Executing command(queryId=root_20190312104454_13a6c093-1f23-4362-a95e-db15aef02c97): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_EMP' INTO TABLE `oracle`.`INR_EMP` 19/03/12 18:44:54 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/12 18:44:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_emp 19/03/12 18:44:54 INFO exec.Task: Loading data to table oracle.inr_emp from hdfs://192.168.1.66:9000/user/root/INR_EMP 19/03/12 18:44:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:44:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:44:54 INFO hive.metastore: Connected to metastore. 19/03/12 18:44:54 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/12 18:44:55 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/12 18:44:55 INFO exec.StatsTask: Executing stats task 19/03/12 18:44:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:44:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:44:55 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:44:55 INFO hive.metastore: Connected to metastore. 19/03/12 18:44:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:44:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 18:44:55 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 18:44:55 INFO hive.metastore: Connected to metastore. 
19/03/12 18:44:55 INFO exec.StatsTask: Table oracle.inr_emp stats: [numFiles=1, numRows=0, totalSize=486, rawDataSize=0] 19/03/12 18:44:55 INFO ql.Driver: Completed executing command(queryId=root_20190312104454_13a6c093-1f23-4362-a95e-db15aef02c97); Time taken: 1.02 seconds OK 19/03/12 18:44:55 INFO ql.Driver: OK Time taken: 1.431 seconds 19/03/12 18:44:55 INFO CliDriver: Time taken: 1.431 seconds 19/03/12 18:44:55 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:55 INFO session.SessionState: Resetting thread name to main 19/03/12 18:44:55 INFO conf.HiveConf: Using the default value passed in for log id: 08d98a96-18e1-4474-98df-1991d7b421f5 19/03/12 18:44:55 INFO session.SessionState: Deleted directory: /tmp/hive/root/08d98a96-18e1-4474-98df-1991d7b421f5 on fs with scheme hdfs 19/03/12 18:44:55 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/08d98a96-18e1-4474-98df-1991d7b421f5 on fs with scheme file 19/03/12 18:44:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 18:44:55 INFO hive.HiveImport: Hive import complete. 19/03/12 18:44:55 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.查询hive表hive> select * from inr_emp; OK 7369 er 800 NULL NULL NULL NULL NULL 7499 ALLEN 1600 NULL NULL NULL NULL NULL 7521 WARD 1250 NULL NULL NULL NULL NULL 7566 JONES 2975 NULL NULL NULL NULL NULL 7654 MARTIN 1250 NULL NULL NULL NULL NULL 7698 BLAKE 2850 NULL NULL NULL NULL NULL 7782 CLARK 2450 NULL NULL NULL NULL NULL 7839 KING 5000 NULL NULL NULL NULL NULL 7844 TURNER 1500 NULL NULL NULL NULL NULL 7876 ADAMS 1100 NULL NULL NULL NULL NULL 7900 JAMES 950 NULL NULL NULL NULL NULL 7902 FORD 3000 NULL NULL NULL NULL NULL 7934 sdf 1300 NULL NULL NULL NULL NULL Time taken: 0.188 seconds, Fetched: 13 row(s)发现的确只导入了这几列,其他列为空,如果hive表只创建我们需要的源端几个列来创建一个表,然后指定需要的这几列导入呢?删除重建hive表:hive> drop table inr_emp; OK Time taken: 0.152 seconds hive> create table INR_EMP > ( > empno int, > ename string, > sal float > ); OK Time taken: 0.086 seconds重新导入:[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_EMP -m 1 --columns 'EMPNO,ENAME,SAL,ETLTIME' --hive-import --hi ve-database oracle 。。。查询hive表hive> select * from inr_emp; OK 7369 er 800.0 7499 ALLEN 1600.0 7521 WARD 1250.0 7566 JONES 2975.0 7654 MARTIN 1250.0 7698 BLAKE 2850.0 7782 CLARK 2450.0 7839 KING 5000.0 7844 TURNER 1500.0 7876 ADAMS 1100.0 7900 JAMES 950.0 7902 FORD 3000.0 7934 sdf 1300.0 Time taken: 0.18 seconds, Fetched: 13 row(s)导入的数据没问题,这样在做kylin增量时没我可以只选择需要计算的列来创建hive表,然后通过sqoop来增量数据到hive,降低空间使用,加下下一篇文章介绍增量导入,连接已经在文章开始给出。
【大数据开发运维解决方案】Oracle通过sqoop同步数据到hive
一、介绍将关系型数据库ORACLE的数据导入到HDFS中,可以通过Sqoop、OGG来实现,相比较ORACLE GOLDENGATE,Sqoop不仅不需要复杂的安装配置,而且传输效率很高,同时也能实现增量数据同步。本文档将在以上两个文章的基础上操作,是对第二篇文章环境的一个简单使用测试,使用过程中出现的错误亦可以验证暴漏第二篇文章安装的问题出现的错误,至于sqoop增量同步到hive请看本人在这篇文章之后写的测试文档。二、环境配置三、实验过程1、Oracle源端创建测试用表并初始化--scott用户下创建此表 create table ora_hive( empno number primary key, ename varchar2(30), hiredate date ); --简单初始化出近1000天数据 insert into ora_hive select level, dbms_random.string('u', 20), sysdate - level from dual connect by level <= 1000; commit; --现在表中存在2019,2018,2017,2016四年的数据2、hive创建目标表切换到hive目录: [root@hadoop ~]# cd /hadoop/hive/ [root@hadoop hive]# cd bin [root@hadoop bin]# ./hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/imp l/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/ org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/ hive-log4j2.properties Async: trueHive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.hive> show databases; OK default sbux Time taken: 1.508 seconds, Fetched: 2 row(s) hive> create database oracle; OK Time taken: 1.729 seconds hive> show databases; OK default oracle sbux Time taken: 0.026 seconds, Fetched: 3 row(s) hive> use oracle; OK Time taken: 0.094 seconds hive> create table ora_hive( > empno int, > ename string, > hiredate date > ); OK Time taken: 0.744 seconds hive> show tables; 3、导入数据:[root@hadoop bin]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table ORA_HIVE -m 1 --hive-import --hive-database oracle Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/12 14:38:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/12 14:38:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/12 14:38:11 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 19/03/12 14:38:11 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 19/03/12 14:38:12 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/12 14:38:12 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/12 14:38:12 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/12 14:38:12 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:38:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ORA_HIVE t WHERE 1=0 19/03/12 14:38:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/e198fef468abdfcbc531a36035dd6646/ORA_HIVE.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/12 14:38:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e198fef468abdfcbc531a36035dd6646/ORA_HIVE.jar 19/03/12 14:38:16 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:38:16 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:38:16 INFO mapreduce.ImportJobBase: Beginning import of ORA_HIVE 19/03/12 14:38:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/12 14:38:17 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:38:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 19/03/12 14:38:18 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/12 14:38:21 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/12 14:38:21 INFO mapreduce.JobSubmitter: number of splits:1 19/03/12 14:38:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0001 19/03/12 14:38:23 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0001 19/03/12 14:38:23 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0001/ 19/03/12 14:38:23 INFO mapreduce.Job: Running job: job_1552371714699_0001 19/03/12 14:38:35 INFO mapreduce.Job: Job job_1552371714699_0001 running in uber mode : false 19/03/12 14:38:35 INFO mapreduce.Job: map 0% reduce 0% 19/03/12 14:38:42 INFO mapreduce.Job: map 100% reduce 0% 19/03/12 14:38:43 INFO mapreduce.Job: Job job_1552371714699_0001 completed successfully 19/03/12 14:38:44 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=140340 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=46893 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4625 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4625 Total vcore-milliseconds taken by all map tasks=4625 Total megabyte-milliseconds taken by all map tasks=4736000 Map-Reduce Framework Map input records=1000 Map output records=1000 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=101 CPU time spent (ms)=2730 Physical memory (bytes) snapshot=203558912 Virtual memory (bytes) snapshot=2137956352 Total committed heap usage (bytes)=100139008 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=46893 19/03/12 14:38:44 INFO mapreduce.ImportJobBase: Transferred 45.7939 KB in 25.921 seconds (1.7667 KB/sec) 19/03/12 14:38:44 INFO mapreduce.ImportJobBase: Retrieved 1000 records. 
19/03/12 14:38:44 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table ORA_HIVE 19/03/12 14:38:44 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:38:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ORA_HIVE t WHERE 1=0 19/03/12 14:38:44 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/12 14:38:44 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive 19/03/12 14:38:44 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/12 14:38:44 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml 2019-03-12 06:38:47,077 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register") at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) at java.lang.SecurityManager.checkPermission(SecurityManager.java:585) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) at org.apache.logging.log4j.core.jmx.Server.register(Server.java:380) at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165) at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:138) at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:507) at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:249) at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:239) at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:157) at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:130) at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:100) at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:187) at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:154) at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:90) at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:82) at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:65) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:702) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331) at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at 
org.apache.sqoop.Sqoop.main(Sqoop.java:252) Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 14:38:47 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 14:38:47 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:47 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:47 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf/_tmp_space.db 19/03/12 14:38:47 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:47 INFO session.SessionState: Updating thread name to 0a864287-81f1-47f6-80fb-8ea3cf0b2faf main 19/03/12 14:38:47 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:47 INFO ql.Driver: Compiling command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:38:44' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:38:50 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:38:50 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:38:50 INFO hive.metastore: Connected to metastore. 19/03/12 14:38:50 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/12 14:38:50 INFO parse.CalcitePlanner: Creating table oracle.ORA_HIVE position=27 19/03/12 14:38:51 INFO ql.Driver: Semantic Analysis Completed 19/03/12 14:38:51 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 14:38:51 INFO ql.Driver: Completed compiling command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09); Time taken: 3.385 seconds 19/03/12 14:38:51 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 14:38:51 INFO ql.Driver: Executing command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:38:44' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:38:51 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=0a864287-81f1-47f6-80fb-8ea3cf0b2fa f, clientType=HIVECLI]19/03/12 14:38:51 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/12 14:38:51 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. 
hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 14:38:51 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:38:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:38:51 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:38:51 INFO hive.metastore: Connected to metastore. 19/03/12 14:38:51 INFO ql.Driver: Completed executing command(queryId=root_20190312063847_496038e3-f0ee-4f95-b35b-bfd1746cfd09); Time taken: 0.134 seconds OK 19/03/12 14:38:51 INFO ql.Driver: OK Time taken: 3.534 seconds 19/03/12 14:38:51 INFO CliDriver: Time taken: 3.534 seconds 19/03/12 14:38:51 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:51 INFO session.SessionState: Resetting thread name to main 19/03/12 14:38:51 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:51 INFO session.SessionState: Updating thread name to 0a864287-81f1-47f6-80fb-8ea3cf0b2faf main 19/03/12 14:38:51 INFO ql.Driver: Compiling command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE` 19/03/12 14:38:51 INFO ql.Driver: Semantic Analysis Completed 19/03/12 14:38:51 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 14:38:51 INFO ql.Driver: Completed compiling command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927); Time taken: 0.451 seconds 19/03/12 14:38:51 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 14:38:51 INFO ql.Driver: Executing command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE` 19/03/12 14:38:51 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/12 14:38:51 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.ora_hive 19/03/12 14:38:51 INFO exec.Task: Loading data to table oracle.ora_hive from hdfs://192.168.1.66:9000/user/root/ORA_HIVE 19/03/12 14:38:51 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:38:51 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:38:51 INFO hive.metastore: Connected to metastore. 19/03/12 14:38:51 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 
19/03/12 14:38:52 ERROR exec.TaskRunner: Error in executeTask java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader; at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.<clinit>(StatsSetupConst.java:165) at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:300) at org.apache.hadoop.hive.common.StatsSetupConst.clearColumnStatsState(StatsSetupConst.java:261) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:2032) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:360) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331) at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252) FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databi nd/ObjectReader;19/03/12 14:38:52 ERROR ql.Driver: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.MoveTask. 
com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader; 19/03/12 14:38:52 INFO ql.Driver: Completed executing command(queryId=root_20190312063851_88f08fef-b111-439a-ac51-62b91c507927); Time taken: 0.497 seconds 19/03/12 14:38:52 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:52 INFO session.SessionState: Resetting thread name to main 19/03/12 14:38:52 INFO conf.HiveConf: Using the default value passed in for log id: 0a864287-81f1-47f6-80fb-8ea3cf0b2faf 19/03/12 14:38:52 INFO session.SessionState: Deleted directory: /tmp/hive/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf on fs with scheme hdfs 19/03/12 14:38:52 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/0a864287-81f1-47f6-80fb-8ea3cf0b2faf on fs with scheme file 19/03/12 14:38:52 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:38:52 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=-101 at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355) at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628) at org.apache.sqoop.Sqoop.run(Sqoop.java:147) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243) at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
The import failed, so let's look at the first error:
2019-03-12 06:38:47,077 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
Judging from the stack trace, this error is thrown while Log4j2 tries to register its JMX MBeans under the JVM security manager when Hive's logging is initialized during the import. The fix is to edit the $JAVA_HOME/jre/lib/security/java.policy file and add the line: permission javax.management.MBeanTrustPermission "register";
Now look at the second error:
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
This one is caused by mismatched Jackson versions between Hive and Sqoop. The fix is to move the $SQOOP_HOME/lib/jackson*.jar files aside as a backup and copy the $HIVE_HOME/lib/jackson*.jar files into the $SQOOP_HOME/lib directory to replace them:
[root@hadoop bin]# cd /hadoop/sqoop/lib/
[root@hadoop lib]# mkdir /hadoop/bak
[root@hadoop lib]# mv jackson*.jar /hadoop/bak/
[root@hadoop lib]# cp $HIVE_HOME/lib/jackson*.jar .
Now delete the ORA_HIVE HDFS directory left behind by the failed run and execute the import again:
[root@hadoop ~]# hadoop fs -rmr ORA_HIVE
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/12 14:54:21 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted ORA_HIVE
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table ORA_HIVE -m 1 --hive-import --hive-database oracle
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/12 14:55:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/12 14:55:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/12 14:55:21 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output.
You can override 19/03/12 14:55:21 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 19/03/12 14:55:21 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/12 14:55:21 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/12 14:55:21 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/12 14:55:22 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:55:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ORA_HIVE t WHERE 1=0 19/03/12 14:55:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/a53ff4a6266b4f7f9f659e0a43fa9e7e/ORA_HIVE.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/12 14:55:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/a53ff4a6266b4f7f9f659e0a43fa9e7e/ORA_HIVE.jar 19/03/12 14:55:24 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:55:24 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:55:24 INFO mapreduce.ImportJobBase: Beginning import of ORA_HIVE 19/03/12 14:55:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/12 14:55:25 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:55:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/12 14:55:26 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/12 14:55:29 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/12 14:55:29 INFO mapreduce.JobSubmitter: number of splits:1 19/03/12 14:55:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0002 19/03/12 14:55:30 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0002 19/03/12 14:55:30 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0002/ 19/03/12 14:55:30 INFO mapreduce.Job: Running job: job_1552371714699_0002 19/03/12 14:55:39 INFO mapreduce.Job: Job job_1552371714699_0002 running in uber mode : false 19/03/12 14:55:39 INFO mapreduce.Job: map 0% reduce 0% 19/03/12 14:55:46 INFO mapreduce.Job: map 100% reduce 0% 19/03/12 14:55:47 INFO mapreduce.Job: Job job_1552371714699_0002 completed successfully 19/03/12 14:55:47 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=143639 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=46893 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4765 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4765 Total vcore-milliseconds taken by all map tasks=4765 Total megabyte-milliseconds taken by all map tasks=4879360 Map-Reduce Framework Map input records=1000 Map output records=1000 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=152 CPU time spent (ms)=2670 Physical memory (bytes) snapshot=201306112 Virtual memory (bytes) snapshot=2138767360 Total committed heap usage (bytes)=101187584 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=46893 19/03/12 14:55:47 INFO mapreduce.ImportJobBase: Transferred 45.7939 KB in 21.7019 seconds (2.1101 KB/sec) 19/03/12 14:55:47 INFO mapreduce.ImportJobBase: Retrieved 1000 records. 
19/03/12 14:55:47 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table ORA_HIVE 19/03/12 14:55:47 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 14:55:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ORA_HIVE t WHERE 1=0 19/03/12 14:55:47 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/12 14:55:47 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive 19/03/12 14:55:47 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/12 14:55:47 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 14:55:50 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 14:55:50 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:50 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:50 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/4a5f4f66-31eb-4a12-95d2-bf22d45ecde4/_tmp_space.db 19/03/12 14:55:50 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:50 INFO session.SessionState: Updating thread name to 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 main 19/03/12 14:55:50 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:51 INFO ql.Driver: Compiling command(queryId=root_20190312065551_050b64c3-f5a1-4073-addd-a838c3585502): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:55:47' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:55:53 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:55:53 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:55:53 INFO hive.metastore: Connected to metastore. 
19/03/12 14:55:53 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/12 14:55:53 INFO parse.CalcitePlanner: Creating table oracle.ORA_HIVE position=27 19/03/12 14:55:53 INFO ql.Driver: Semantic Analysis Completed 19/03/12 14:55:53 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 14:55:53 INFO ql.Driver: Completed compiling command(queryId=root_20190312065551_050b64c3-f5a1-4073-addd-a838c3585502); Time taken: 2.75 seconds 19/03/12 14:55:53 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 14:55:53 INFO ql.Driver: Executing command(queryId=root_20190312065551_050b64c3-f5a1-4073-addd-a838c3585502): CREATE TABLE IF NOT EXISTS `oracle`.`ORA_HIVE` ( `EMPNO` DOUBLE, `ENAM E` STRING, `HIREDATE` STRING) COMMENT 'Imported by sqoop on 2019/03/12 06:55:47' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 14:55:53 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=4a5f4f66-31eb-4a12-95d2-bf22d45ecde 4, clientType=HIVECLI]19/03/12 14:55:53 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/12 14:55:53 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 14:55:53 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:55:53 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:55:53 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:55:53 INFO hive.metastore: Connected to metastore. 
19/03/12 14:55:53 INFO ql.Driver: Completed executing command(queryId=root_20190312065551_050b64c3-f5a1-4073-addd-a838c3585502); Time taken: 0.088 seconds OK 19/03/12 14:55:53 INFO ql.Driver: OK Time taken: 2.851 seconds 19/03/12 14:55:53 INFO CliDriver: Time taken: 2.851 seconds 19/03/12 14:55:53 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:53 INFO session.SessionState: Resetting thread name to main 19/03/12 14:55:53 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:53 INFO session.SessionState: Updating thread name to 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 main 19/03/12 14:55:53 INFO ql.Driver: Compiling command(queryId=root_20190312065553_81be5e55-9c13-4ad8-86e0-3ec286bea2e0): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE` 19/03/12 14:55:54 INFO ql.Driver: Semantic Analysis Completed 19/03/12 14:55:54 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 14:55:54 INFO ql.Driver: Completed compiling command(queryId=root_20190312065553_81be5e55-9c13-4ad8-86e0-3ec286bea2e0); Time taken: 0.414 seconds 19/03/12 14:55:54 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 14:55:54 INFO ql.Driver: Executing command(queryId=root_20190312065553_81be5e55-9c13-4ad8-86e0-3ec286bea2e0): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/ORA_HIVE' INTO TABLE `oracle`.`ORA_HIVE` 19/03/12 14:55:54 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/12 14:55:54 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.ora_hive 19/03/12 14:55:54 INFO exec.Task: Loading data to table oracle.ora_hive from hdfs://192.168.1.66:9000/user/root/ORA_HIVE 19/03/12 14:55:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:55:54 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:55:54 INFO hive.metastore: Connected to metastore. 19/03/12 14:55:54 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/12 14:55:55 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/12 14:55:55 INFO exec.StatsTask: Executing stats task 19/03/12 14:55:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:55:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:55:55 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:55:55 INFO hive.metastore: Connected to metastore. 19/03/12 14:55:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:55:55 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 14:55:55 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 14:55:55 INFO hive.metastore: Connected to metastore. 
19/03/12 14:55:55 INFO exec.StatsTask: Table oracle.ora_hive stats: [numFiles=2, numRows=0, totalSize=93786, rawDataSize=0] 19/03/12 14:55:55 INFO ql.Driver: Completed executing command(queryId=root_20190312065553_81be5e55-9c13-4ad8-86e0-3ec286bea2e0); Time taken: 1.186 seconds OK 19/03/12 14:55:55 INFO ql.Driver: OK Time taken: 1.6 seconds 19/03/12 14:55:55 INFO CliDriver: Time taken: 1.6 seconds 19/03/12 14:55:55 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:55 INFO session.SessionState: Resetting thread name to main 19/03/12 14:55:55 INFO conf.HiveConf: Using the default value passed in for log id: 4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 19/03/12 14:55:55 INFO session.SessionState: Deleted directory: /tmp/hive/root/4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 on fs with scheme hdfs 19/03/12 14:55:55 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/4a5f4f66-31eb-4a12-95d2-bf22d45ecde4 on fs with scheme file 19/03/12 14:55:55 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 14:55:55 INFO hive.HiveImport: Hive import complete. 19/03/12 14:55:55 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
This time the import succeeded. Let's verify in Hive:
[root@hadoop bin]# ./hive
SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use oracle;
OK
Time taken: 1.111 seconds
hive> select * from ORA_HIVE;
。。。。。。。
997 XAOOEHXITLWEFBZFCNAB NULL
998 IZVWHVGTEJHCJWJZTDXK NULL
999 YMBFLJTTWPENEBXEWVIJ NULL
1000 QFKDIGEYFWQBZBGTJPPD NULL
Time taken: 2.092 seconds, Fetched: 2000 row(s)
The data has been imported. (Note that Hive reports Fetched: 2000 row(s) for 1000 source rows; the stats above show numFiles=2, which suggests the data file left over from the first, failed attempt was loaded as well.)
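To recap, the two fixes above can be applied with a few commands. This is only a sketch of what was done in this environment (it assumes $JAVA_HOME, $SQOOP_HOME and $HIVE_HOME are set and uses /hadoop/bak as the backup directory, as in the session above); appending a separate grant block to java.policy has the same effect as adding the permission line inside the existing grant block:
# 1) Allow Log4j2 to register its JMX MBeans when a Java security manager is active
cat >> $JAVA_HOME/jre/lib/security/java.policy <<'EOF'
grant {
    permission javax.management.MBeanTrustPermission "register";
};
EOF
# 2) Replace Sqoop's Jackson jars with the ones shipped with Hive
mkdir -p /hadoop/bak
mv $SQOOP_HOME/lib/jackson*.jar /hadoop/bak/
cp $HIVE_HOME/lib/jackson*.jar $SQOOP_HOME/lib/
After these two changes, removing the leftover ORA_HIVE directory on HDFS and rerunning the same sqoop import should reach the Hive load step without the NoSuchMethodError, as the second run above shows.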
[Big Data Development and O&M Solutions] Sqoop incremental sync of MySQL/Oracle data to Hive (merge-key/append) test document
The previous article covered full synchronization of data into Hive with Sqoop. This article uses hands-on experiments to show in detail how to synchronize data into Hive incrementally, and how to combine a sqoop job with crontab and password-free execution to schedule incremental syncs.
一、Background knowledge
In production, a system may periodically import data from business-related relational databases into Hadoop, and run offline analysis on it after it lands in the warehouse. Re-importing everything each time is not an option, which is where incremental import comes in. There are two kinds of incremental import: one based on a monotonically increasing column (Append mode) and one based on a time column (LastModified mode). The core parameters are:
--check-column: specifies the column(s) used to decide whether a row should be imported as incremental data, similar to an auto-increment field or a timestamp in a relational database. Note: the specified columns must not be character types such as char or varchar; --check-column may also specify multiple columns.
--incremental: specifies the incremental import mode, either append or lastmodified.
--last-value: specifies the maximum value of the check column seen in the previous import.
The experiments below explain this in detail.
1、Append-mode incremental import
Key parameters:
--incremental append: incremental import based on an increasing column (all rows whose increasing-column value is greater than the threshold are imported into Hadoop)
--check-column: the increasing column (int)
--last-value: the threshold (int)
A simple example: under the scott user in the Oracle database there is an employee table (inr_app) with an auto-increment primary key employee number (empno), employee name (ename), job title (job), and salary (sal):
--Create such a table under scott in the Oracle database
create table inr_app as select rownum as empno, ename, job, sal from emp a where job is not null and rownum<=5;
--Query:
select * from inr_app;
EMPNO ENAME JOB SAL
1 er CLERK 800.00
2 ALLEN SALESMAN 1600.00
3 WARD SALESMAN 1250.00
4 JONES MANAGER 2975.00
5 MARTIN SALESMAN 1250.00
We need new hires in Hadoop as well so the HR department can analyze them, so we first import this table into Hive, i.e. a one-time full import before any incremental import:
--Create the table in Hive:
create table INR_APP ( empno int, ename string, job string, sal float );
hive> show tables;
OK
inr_app
inr_emp
ora_hive
Time taken: 0.166 seconds, Fetched: 3 row(s)
--Then run the full import:
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_APP -m 1 --hive-import --hive-database oracle
--Query the Hive table
hive> select * from inr_app;
OK
1 er CLERK 800.0
2 ALLEN SALESMAN 1600.0
3 WARD SALESMAN 1250.0
4 JONES MANAGER 2975.0
5 MARTIN SALESMAN 1250.0
Time taken: 0.179 seconds, Fetched: 5 row(s)
Some time later the company hires another batch of employees, and we need to bring them into Hadoop too. All we have to do is set --incremental to append and --last-value to 5, meaning only rows with an id greater than 5 are imported:
--First insert a few rows into scott.inr_app in Oracle:
insert into inr_app values(6,'zhao','DBA',100);
insert into inr_app values(7,'yan','BI',100);
insert into inr_app values(8,'dong','JAVA',100);
commit;
--Run the incremental import:
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_APP -m 1 --hive-import --hive-database oracle --incremental append --check-column EMPNO --last-value 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/12 19:45:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/12 19:45:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/12 19:45:56 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 19/03/12 19:45:56 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 19/03/12 19:45:56 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/12 19:45:56 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/12 19:45:56 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/12 19:45:57 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 19:45:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_APP t WHERE 1=0 19/03/12 19:45:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/9b898359374ea580a390b32da1a37949/INR_APP.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/12 19:45:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9b898359374ea580a390b32da1a37949/INR_APP.jar 19/03/12 19:45:59 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 19:45:59 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(EMPNO) FROM INR_APP 19/03/12 19:45:59 INFO tool.ImportTool: Incremental import based on column EMPNO 19/03/12 19:45:59 INFO tool.ImportTool: Lower bound value: 5 19/03/12 19:45:59 INFO tool.ImportTool: Upper bound value: 8 19/03/12 19:45:59 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 19:45:59 INFO mapreduce.ImportJobBase: Beginning import of INR_APP 19/03/12 19:46:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/12 19:46:00 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 19:46:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/12 19:46:01 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/12 19:46:04 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/12 19:46:04 INFO mapreduce.JobSubmitter: number of splits:1 19/03/12 19:46:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0010 19/03/12 19:46:05 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0010 19/03/12 19:46:05 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0010/ 19/03/12 19:46:05 INFO mapreduce.Job: Running job: job_1552371714699_0010 19/03/12 19:46:13 INFO mapreduce.Job: Job job_1552371714699_0010 running in uber mode : false 19/03/12 19:46:13 INFO mapreduce.Job: map 0% reduce 0% 19/03/12 19:46:21 INFO mapreduce.Job: map 100% reduce 0% 19/03/12 19:46:21 INFO mapreduce.Job: Job job_1552371714699_0010 completed successfully 19/03/12 19:46:21 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=143702 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=44 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4336 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4336 Total vcore-milliseconds taken by all map tasks=4336 Total megabyte-milliseconds taken by all map tasks=4440064 Map-Reduce Framework Map input records=3 Map output records=3 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=92 CPU time spent (ms)=2760 Physical memory (bytes) snapshot=211570688 Virtual memory (bytes) snapshot=2133770240 Total committed heap usage (bytes)=106954752 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=44 19/03/12 19:46:21 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 20.3436 seconds (2.1628 bytes/sec) 19/03/12 19:46:21 INFO mapreduce.ImportJobBase: Retrieved 3 records. 
19/03/12 19:46:21 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_APP 19/03/12 19:46:21 INFO util.AppendUtils: Creating missing output directory - INR_APP 19/03/12 19:46:21 INFO manager.OracleManager: Time zone has been set to GMT 19/03/12 19:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_APP t WHERE 1=0 19/03/12 19:46:21 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/12 19:46:21 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/12 19:46:21 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/12 19:46:21 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 19:46:24 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/12 19:46:24 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:24 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:24 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/2968942b-30b6-49f5-b86c-d71a77963381/_tmp_space.db 19/03/12 19:46:24 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:24 INFO session.SessionState: Updating thread name to 2968942b-30b6-49f5-b86c-d71a77963381 main 19/03/12 19:46:24 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:24 INFO ql.Driver: Compiling command(queryId=root_20190312114624_6679c12a-4224-4bcd-a8be-f7d4ae56a139): CREATE TABLE IF NOT EXISTS `oracle`.`INR_APP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE) COMMENT 'Imported by sqoop on 2019/03/12 11:46:21' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 19:46:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 19:46:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 19:46:27 INFO hive.metastore: Connected to metastore. 
19/03/12 19:46:27 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/12 19:46:27 INFO parse.CalcitePlanner: Creating table oracle.INR_APP position=27 19/03/12 19:46:27 INFO ql.Driver: Semantic Analysis Completed 19/03/12 19:46:27 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 19:46:27 INFO ql.Driver: Completed compiling command(queryId=root_20190312114624_6679c12a-4224-4bcd-a8be-f7d4ae56a139); Time taken: 2.876 seconds 19/03/12 19:46:27 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 19:46:27 INFO ql.Driver: Executing command(queryId=root_20190312114624_6679c12a-4224-4bcd-a8be-f7d4ae56a139): CREATE TABLE IF NOT EXISTS `oracle`.`INR_APP` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE) COMMENT 'Imported by sqoop on 2019/03/12 11:46:21' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/12 19:46:27 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=2968942b-30b6-49f5-b86c-d71a7796338 1, clientType=HIVECLI]19/03/12 19:46:27 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/12 19:46:27 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/12 19:46:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 19:46:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 19:46:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 19:46:27 INFO hive.metastore: Connected to metastore. 
19/03/12 19:46:27 INFO ql.Driver: Completed executing command(queryId=root_20190312114624_6679c12a-4224-4bcd-a8be-f7d4ae56a139); Time taken: 0.096 seconds OK 19/03/12 19:46:27 INFO ql.Driver: OK Time taken: 2.982 seconds 19/03/12 19:46:27 INFO CliDriver: Time taken: 2.982 seconds 19/03/12 19:46:27 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:27 INFO session.SessionState: Resetting thread name to main 19/03/12 19:46:27 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:27 INFO session.SessionState: Updating thread name to 2968942b-30b6-49f5-b86c-d71a77963381 main 19/03/12 19:46:27 INFO ql.Driver: Compiling command(queryId=root_20190312114627_748c136c-1446-43df-a819-728becae7df2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_APP' INTO TABLE `oracle`.`INR_APP` 19/03/12 19:46:28 INFO ql.Driver: Semantic Analysis Completed 19/03/12 19:46:28 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/12 19:46:28 INFO ql.Driver: Completed compiling command(queryId=root_20190312114627_748c136c-1446-43df-a819-728becae7df2); Time taken: 0.421 seconds 19/03/12 19:46:28 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/12 19:46:28 INFO ql.Driver: Executing command(queryId=root_20190312114627_748c136c-1446-43df-a819-728becae7df2): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_APP' INTO TABLE `oracle`.`INR_APP` 19/03/12 19:46:28 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/12 19:46:28 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_app 19/03/12 19:46:28 INFO exec.Task: Loading data to table oracle.inr_app from hdfs://192.168.1.66:9000/user/root/INR_APP 19/03/12 19:46:28 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 19:46:28 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 19:46:28 INFO hive.metastore: Connected to metastore. 19/03/12 19:46:28 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/12 19:46:28 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/12 19:46:28 INFO exec.StatsTask: Executing stats task 19/03/12 19:46:28 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 19:46:28 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 19:46:28 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 19:46:28 INFO hive.metastore: Connected to metastore. 19/03/12 19:46:29 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 19:46:29 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/12 19:46:29 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/12 19:46:29 INFO hive.metastore: Connected to metastore. 
19/03/12 19:46:29 INFO exec.StatsTask: Table oracle.inr_app stats: [numFiles=2, numRows=0, totalSize=146, rawDataSize=0] 19/03/12 19:46:29 INFO ql.Driver: Completed executing command(queryId=root_20190312114627_748c136c-1446-43df-a819-728becae7df2); Time taken: 0.992 seconds OK 19/03/12 19:46:29 INFO ql.Driver: OK Time taken: 1.415 seconds 19/03/12 19:46:29 INFO CliDriver: Time taken: 1.415 seconds 19/03/12 19:46:29 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:29 INFO session.SessionState: Resetting thread name to main 19/03/12 19:46:29 INFO conf.HiveConf: Using the default value passed in for log id: 2968942b-30b6-49f5-b86c-d71a77963381 19/03/12 19:46:29 INFO session.SessionState: Deleted directory: /tmp/hive/root/2968942b-30b6-49f5-b86c-d71a77963381 on fs with scheme hdfs 19/03/12 19:46:29 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/2968942b-30b6-49f5-b86c-d71a77963381 on fs with scheme file 19/03/12 19:46:29 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/12 19:46:29 INFO hive.HiveImport: Hive import complete. 19/03/12 19:46:29 INFO hive.HiveImport: Export directory is empty, removing it. 19/03/12 19:46:29 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments: 19/03/12 19:46:29 INFO tool.ImportTool: --incremental append 19/03/12 19:46:29 INFO tool.ImportTool: --check-column EMPNO 19/03/12 19:46:29 INFO tool.ImportTool: --last-value 8 19/03/12 19:46:29 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
Query the Hive table:
hive> select * from inr_app;
OK
1 er CLERK 800.0
2 ALLEN SALESMAN 1600.0
3 WARD SALESMAN 1250.0
4 JONES MANAGER 2975.0
5 MARTIN SALESMAN 1250.0
6 zhao DBA 100.0
7 yan BI 100.0
8 dong JAVA 100.0
Time taken: 0.165 seconds, Fetched: 8 row(s)
The incremental rows have come across. We can also use hdfs dfs -cat to inspect the generated data files; their location was configured earlier when setting up the Hadoop environment, and readers can also browse their own environment at IP:50070/explorer.html#/ .
[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/oracle.db/inr_app/part-m-00000_copy_1
6zhaoDBA100
7yanBI100
8dongJAVA100
The earlier full-load data is there as well:
[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/oracle.db/inr_app/part-m-00000
1erCLERK800
2ALLENSALESMAN1600
3WARDSALESMAN1250
4JONESMANAGER2975
5MARTINSALESMAN1250
2、LastModified incremental import
LastModified incremental import itself has two modes: a、--incremental append (append mode); b、--incremental with --merge-key (merge mode). Let's continue with the experiments.
Experiment 1: append mode
This approach requires the source table to have a time column. You give Sqoop a timestamp and it imports all rows newer than that timestamp into Hadoop (here, HDFS). Because an employee's salary may later change, and the time column is updated when it does, Sqoop will import the changed employee row into HDFS again, which leads to duplicate data.
First, in Oracle, create a new table inr_las based on scott.inr_app with a time column etltime, initializing the existing rows' time to sysdate:
create table inr_las as select a.empno, a.ename, a.job, a.sal, sysdate as etltime from inr_app a;
select * from inr_las;
EMPNO ENAME JOB SAL ETLTIME
1 er CLERK 800.00 2019/3/20 10:42:27
2 ALLEN SALESMAN 1600.00 2019/3/20 10:42:27
3 WARD SALESMAN 1250.00 2019/3/20 10:42:27
4 JONES MANAGER 2975.00 2019/3/20 10:42:27
5 MARTIN SALESMAN 1250.00 2019/3/20 10:42:27
6 zhao DBA 100.00 2019/3/20 10:42:27
7 yan BI 100.00 2019/3/20 10:42:27
8 dong JAVA 100.00 2019/3/20 10:42:27
Create the table in Hive, specifying '\t' as the field delimiter throughout; the imports below use the same delimiter:
create table INR_LAS ( empno int, ename string, job string, sal float, etltime string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Initial full import:
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS -m 1 --hive-import --hive-database oracle --fields-terminated-by '\t' --lines-terminated-by '\n'
Query the Hive table:
hive> select * from inr_las;
OK
1 er CLERK 800.0 2019-03-20 10:42:27.0
2 ALLEN SALESMAN 1600.0 2019-03-20 10:42:27.0
3 WARD SALESMAN 1250.0 2019-03-20 10:42:27.0
4 JONES MANAGER 2975.0 2019-03-20 10:42:27.0
5 MARTIN SALESMAN 1250.0 2019-03-20 10:42:27.0
6 zhao DBA 100.0 2019-03-20 10:42:27.0
7 yan BI 100.0 2019-03-20 10:42:27.0
8 dong JAVA 100.0 2019-03-20 10:42:27.0
Time taken: 0.181 seconds, Fetched: 8 row(s)
For this incremental import we first look at the effect of the append style with a timestamp --last-value. Start by changing some data in the source inr_las table:
update inr_las set sal=1000,etltime=sysdate where empno=6;
commit;
select * from inr_las;
EMPNO ENAME JOB SAL ETLTIME
1 er CLERK 800.00 2019/3/20 10:42:27
2 ALLEN SALESMAN 1600.00 2019/3/20 10:42:27
3 WARD SALESMAN 1250.00 2019/3/20 10:42:27
4 JONES MANAGER 2975.00 2019/3/20 10:42:27
5 MARTIN SALESMAN 1250.00 2019/3/20 10:42:27
6 zhao DBA 1000.00 2019/3/20 10:52:34
7 yan BI 100.00 2019/3/20 10:42:27
8 dong JAVA 100.00 2019/3/20 10:42:27
Now run the incremental import:
[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 10:42:27' -m 1 --null-string '\\N' --null-non-string '\\N'
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/13 14:46:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/13 14:46:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/13 14:46:27 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/13 14:46:27 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/13 14:46:27 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/13 14:46:27 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 14:46:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 14:46:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/37cf0f81337f33bc731bf3d6fd0a3f73/INR_LAS.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details.
19/03/13 14:46:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/37cf0f81337f33bc731bf3d6fd0a3f73/INR_LAS.jar 19/03/13 14:46:30 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 14:46:30 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS 19/03/13 14:46:30 INFO tool.ImportTool: Incremental import based on column ETLTIME 19/03/13 14:46:30 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 10:42:27', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 14:46:30 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 10:52:34.0', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 14:46:31 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 14:46:31 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS 19/03/13 14:46:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/13 14:46:31 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 14:46:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 19/03/13 14:46:32 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 14:46:35 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/13 14:46:35 INFO mapreduce.JobSubmitter: number of splits:1 19/03/13 14:46:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552371714699_0031 19/03/13 14:46:36 INFO impl.YarnClientImpl: Submitted application application_1552371714699_0031 19/03/13 14:46:36 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552371714699_0031/ 19/03/13 14:46:36 INFO mapreduce.Job: Running job: job_1552371714699_0031 19/03/13 14:46:45 INFO mapreduce.Job: Job job_1552371714699_0031 running in uber mode : false 19/03/13 14:46:45 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 14:46:52 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 14:46:53 INFO mapreduce.Job: Job job_1552371714699_0031 completed successfully 19/03/13 14:46:54 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=143840 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=38 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4950 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4950 Total vcore-milliseconds taken by all map tasks=4950 Total megabyte-milliseconds taken by all map tasks=5068800 Map-Reduce Framework Map input records=1 Map output records=1 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=560 CPU time spent (ms)=2890 Physical memory (bytes) snapshot=189190144 Virtual memory (bytes) snapshot=2141667328 Total committed heap usage (bytes)=116391936 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=38 19/03/13 14:46:54 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 21.7168 seconds (1.7498 bytes/sec) 19/03/13 14:46:54 INFO mapreduce.ImportJobBase: Retrieved 1 records. 
19/03/13 14:46:54 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS 19/03/13 14:46:54 INFO util.AppendUtils: Creating missing output directory - INR_LAS 19/03/13 14:46:54 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 14:46:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 14:46:54 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive 19/03/13 14:46:54 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive 19/03/13 14:46:54 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive 19/03/13 14:46:54 INFO hive.HiveImport: Loading uploaded data into Hive 19/03/13 14:46:54 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 14:46:57 INFO SessionState: Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true 19/03/13 14:46:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:46:57 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:46:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/dbf3aaff-4a20-426b-bc59-9117e821a2f5/_tmp_space.db 19/03/13 14:46:57 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:46:57 INFO session.SessionState: Updating thread name to dbf3aaff-4a20-426b-bc59-9117e821a2f5 main 19/03/13 14:46:57 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:46:57 INFO ql.Driver: Compiling command(queryId=root_20190313064657_78359340-8092-4093-a9ed-b5a8e82ea901): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 06:46:54' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 14:47:00 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 14:47:00 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 14:47:00 INFO hive.metastore: Connected to metastore. 
19/03/13 14:47:00 INFO parse.CalcitePlanner: Starting Semantic Analysis 19/03/13 14:47:00 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27 19/03/13 14:47:00 INFO ql.Driver: Semantic Analysis Completed 19/03/13 14:47:00 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 14:47:00 INFO ql.Driver: Completed compiling command(queryId=root_20190313064657_78359340-8092-4093-a9ed-b5a8e82ea901); Time taken: 3.122 seconds 19/03/13 14:47:00 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 14:47:00 INFO ql.Driver: Executing command(queryId=root_20190313064657_78359340-8092-4093-a9ed-b5a8e82ea901): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME ` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 06:46:54' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE19/03/13 14:47:00 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=dbf3aaff-4a20-426b-bc59-9117e821a2f 5, clientType=HIVECLI]19/03/13 14:47:00 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 19/03/13 14:47:00 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop. hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook19/03/13 14:47:00 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 14:47:00 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 14:47:00 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 14:47:00 INFO hive.metastore: Connected to metastore. 
19/03/13 14:47:00 INFO ql.Driver: Completed executing command(queryId=root_20190313064657_78359340-8092-4093-a9ed-b5a8e82ea901); Time taken: 0.099 seconds OK 19/03/13 14:47:00 INFO ql.Driver: OK Time taken: 3.234 seconds 19/03/13 14:47:00 INFO CliDriver: Time taken: 3.234 seconds 19/03/13 14:47:00 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:47:00 INFO session.SessionState: Resetting thread name to main 19/03/13 14:47:00 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5 19/03/13 14:47:00 INFO session.SessionState: Updating thread name to dbf3aaff-4a20-426b-bc59-9117e821a2f5 main 19/03/13 14:47:00 INFO ql.Driver: Compiling command(queryId=root_20190313064700_5af88364-6217-429d-90a0-1816e54f44d9): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 14:47:01 INFO ql.Driver: Semantic Analysis Completed 19/03/13 14:47:01 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 19/03/13 14:47:01 INFO ql.Driver: Completed compiling command(queryId=root_20190313064700_5af88364-6217-429d-90a0-1816e54f44d9); Time taken: 0.443 seconds 19/03/13 14:47:01 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 19/03/13 14:47:01 INFO ql.Driver: Executing command(queryId=root_20190313064700_5af88364-6217-429d-90a0-1816e54f44d9): LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS` 19/03/13 14:47:01 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode 19/03/13 14:47:01 INFO hive.metastore: Closed a connection to metastore, current connections: 0 Loading data to table oracle.inr_las 19/03/13 14:47:01 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS 19/03/13 14:47:01 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 14:47:01 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 14:47:01 INFO hive.metastore: Connected to metastore. 19/03/13 14:47:01 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! 19/03/13 14:47:02 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode 19/03/13 14:47:02 INFO exec.StatsTask: Executing stats task 19/03/13 14:47:02 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 14:47:02 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 14:47:02 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 14:47:02 INFO hive.metastore: Connected to metastore. 19/03/13 14:47:02 INFO hive.metastore: Closed a connection to metastore, current connections: 0 19/03/13 14:47:02 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083 19/03/13 14:47:02 INFO hive.metastore: Opened a connection to metastore, current connections: 1 19/03/13 14:47:02 INFO hive.metastore: Connected to metastore. 
19/03/13 14:47:02 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=2, numRows=0, totalSize=360, rawDataSize=0]
19/03/13 14:47:02 INFO ql.Driver: Completed executing command(queryId=root_20190313064700_5af88364-6217-429d-90a0-1816e54f44d9); Time taken: 1.211 seconds
OK
19/03/13 14:47:02 INFO ql.Driver: OK
Time taken: 1.654 seconds
19/03/13 14:47:02 INFO CliDriver: Time taken: 1.654 seconds
19/03/13 14:47:02 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5
19/03/13 14:47:02 INFO session.SessionState: Resetting thread name to main
19/03/13 14:47:02 INFO conf.HiveConf: Using the default value passed in for log id: dbf3aaff-4a20-426b-bc59-9117e821a2f5
19/03/13 14:47:02 INFO session.SessionState: Deleted directory: /tmp/hive/root/dbf3aaff-4a20-426b-bc59-9117e821a2f5 on fs with scheme hdfs
19/03/13 14:47:02 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/dbf3aaff-4a20-426b-bc59-9117e821a2f5 on fs with scheme file
19/03/13 14:47:02 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 14:47:02 INFO hive.HiveImport: Hive import complete.
19/03/13 14:47:02 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 14:47:02 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
19/03/13 14:47:02 INFO tool.ImportTool:  --incremental append
19/03/13 14:47:02 INFO tool.ImportTool:   --check-column ETLTIME
19/03/13 14:47:02 INFO tool.ImportTool:   --last-value 2019-03-20 10:52:34.0
19/03/13 14:47:02 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

Query the Hive table:

hive> select * from inr_las;
OK
1   er      CLERK     800.0    2019-03-20 10:42:27.0
2   ALLEN   SALESMAN  1600.0   2019-03-20 10:42:27.0
3   WARD    SALESMAN  1250.0   2019-03-20 10:42:27.0
4   JONES   MANAGER   2975.0   2019-03-20 10:42:27.0
5   MARTIN  SALESMAN  1250.0   2019-03-20 10:42:27.0
6   zhao    DBA       100.0    2019-03-20 10:42:27.0
7   yan     BI        100.0    2019-03-20 10:42:27.0
8   dong    JAVA      100.0    2019-03-20 10:42:27.0
6   zhao    DBA       1000.0   2019-03-20 10:52:34.0
Time taken: 0.171 seconds, Fetched: 9 row(s)

The query result shows what happened: after the salary and ETLTIME of the employee with empno=6 changed in Oracle, the incremental import, which starts from the maximum ETLTIME recorded by the previous full load, found the changed row at the source and pulled its latest state into Hive. Because append mode only adds rows, Hive now holds two records for empno=6, so the data is duplicated. To read the current state you have to keep only the newest row per key, ordered by ETLTIME (see the deduplication query sketch at the end of this article).

Experiment 2: merge mode

Continuing in the same environment, this time use merge mode and compare the result. First look at the current data on the source Oracle side:

EMPNO  ENAME   JOB       SAL      ETLTIME
1      er      CLERK     800.00   2019/3/20 10:42:27
2      ALLEN   SALESMAN  1600.00  2019/3/20 10:42:27
3      WARD    SALESMAN  1250.00  2019/3/20 10:42:27
4      JONES   MANAGER   2975.00  2019/3/20 10:42:27
5      MARTIN  SALESMAN  1250.00  2019/3/20 10:42:27
6      zhao    DBA       1000.00  2019/3/20 10:52:34
7      yan     BI        100.00   2019/3/20 10:42:27
8      dong    JAVA      200.00   2019/3/21 17:12:46

Drop the Hive table created earlier:

hive> drop table inr_las;
OK
Time taken: 0.195 seconds

Recreate the table with an explicit HDFS location, so the table data lives in a directory that Sqoop can write to directly:

hive> create table INR_LAS
      (
        empno   int,
        ename   string,
        job     string,
        sal     float,
        etltime string
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      location '/user/hive/warehouse/exter_inr_las';
OK
Time taken: 0.226 seconds

Note: the directory /user/hive/warehouse/exter_inr_las must not exist before the first full load; Sqoop creates it itself. If it already exists, the import fails with a "directory already exists" error:

ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.1.66:9000/user/hive/warehouse/exter_inr_las already exists

In that case, delete the directory first:

[root@hadoop ~]# hadoop fs -rmr /user/hive/warehouse/exter_inr_las
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/13 22:05:33 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/exter_inr_las接下来全量导入一次:[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS -m 1 --target-dir /user/hive/warehouse/exter_inr_las --fiel ds-terminated-by '\t'Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 22:05:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/13 22:05:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/13 22:05:48 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/13 22:05:48 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/13 22:05:48 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/13 22:05:49 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:05:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 22:05:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/c8b2ed3172295709d819d17ca24aaf50/INR_LAS.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/13 22:05:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/c8b2ed3172295709d819d17ca24aaf50/INR_LAS.jar 19/03/13 22:05:52 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:05:52 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:05:52 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS 19/03/13 22:05:52 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/13 22:05:52 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:05:53 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/13 22:05:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 22:05:57 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/13 22:05:57 INFO mapreduce.JobSubmitter: number of splits:1 19/03/13 22:05:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552482402053_0006 19/03/13 22:05:58 INFO impl.YarnClientImpl: Submitted application application_1552482402053_0006 19/03/13 22:05:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552482402053_0006/ 19/03/13 22:05:58 INFO mapreduce.Job: Running job: job_1552482402053_0006 19/03/13 22:06:07 INFO mapreduce.Job: Job job_1552482402053_0006 running in uber mode : false 19/03/13 22:06:07 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 22:06:13 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 22:06:15 INFO mapreduce.Job: Job job_1552482402053_0006 completed successfully 19/03/13 22:06:15 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=144058 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=323 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4115 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4115 Total vcore-milliseconds taken by all map tasks=4115 Total megabyte-milliseconds taken by all map tasks=4213760 Map-Reduce Framework Map input records=8 Map output records=8 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=109 CPU time spent (ms)=2220 Physical memory (bytes) snapshot=187392000 Virtual memory (bytes) snapshot=2140803072 Total committed heap usage (bytes)=106430464 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=323 19/03/13 22:06:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 21.3756 seconds (15.1107 bytes/sec) 19/03/13 22:06:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.查看一下hdfs此文件夹下文件:[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/exter_inr_las/part-m-00000 1 er CLERK 800 2019-03-20 10:42:27.0 2 ALLEN SALESMAN 1600 2019-03-20 10:42:27.0 3 WARD SALESMAN 1250 2019-03-20 10:42:27.0 4 JONES MANAGER 2975 2019-03-20 10:42:27.0 5 MARTIN SALESMAN 1250 2019-03-20 10:42:27.0 6 zhao DBA 1000 2019-03-20 10:52:34.0 7 yan BI 100 2019-03-20 10:42:27.0 8 dong JAVA 200 2019-03-21 17:12:46.0查一下hive表:hive> select * from inr_las; OK 1 er CLERK 800.0 2019-03-20 10:42:27.0 2 ALLEN SALESMAN 1600.0 2019-03-20 10:42:27.0 3 WARD SALESMAN 1250.0 2019-03-20 10:42:27.0 4 JONES MANAGER 2975.0 2019-03-20 10:42:27.0 5 MARTIN SALESMAN 1250.0 2019-03-20 10:42:27.0 6 zhao DBA 1000.0 2019-03-20 10:52:34.0 7 yan BI 100.0 2019-03-20 10:42:27.0 8 dong JAVA 200.0 2019-03-21 17:12:46.0 Time taken: 0.191 seconds, Fetched: 8 row(s)接下来修改一下oracle的数据:update inr_las set sal=400 ,etltime=sysdate where empno=8; commit; select * from inr_las; EMPNO ENAME JOB SAL ETLTIME 1 er CLERK 800.00 2019/3/20 10:42:27 2 ALLEN SALESMAN 1600.00 2019/3/20 10:42:27 3 WARD SALESMAN 1250.00 2019/3/20 10:42:27 4 JONES MANAGER 2975.00 2019/3/20 10:42:27 5 MARTIN SALESMAN 1250.00 2019/3/20 10:42:27 6 zhao DBA 1000.00 2019/3/20 10:52:34 7 yan BI 100.00 2019/3/20 10:42:27 8 dong 
JAVA 400.00 2019/3/21 17:47:03--已经更改了接下来做合并模式增量:[root@hadoop ~]# sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --t arget-dir /user/hive/warehouse/exter_inr_las -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-21 17:12:46"Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 19/03/13 22:18:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/13 22:18:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 19/03/13 22:18:42 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/13 22:18:42 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/13 22:18:42 INFO tool.CodeGenTool: Beginning code generation SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/03/13 22:18:43 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:18:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 22:18:43 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop Note: /tmp/sqoop-root/compile/d4af8fb9c2b8dd33c20926713e8d23e2/INR_LAS.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 19/03/13 22:18:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d4af8fb9c2b8dd33c20926713e8d23e2/INR_LAS.jar 19/03/13 22:18:47 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:18:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0 19/03/13 22:18:47 INFO tool.ImportTool: Incremental import based on column ETLTIME 19/03/13 22:18:47 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-21 17:12:46', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 22:18:47 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-21 17:54:19.0', 'YYYY-MM-DD HH24:MI:SS.FF') 19/03/13 22:18:47 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:18:47 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS 19/03/13 22:18:47 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 19/03/13 22:18:47 INFO manager.OracleManager: Time zone has been set to GMT 19/03/13 22:18:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps 19/03/13 22:18:48 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 22:18:52 INFO db.DBInputFormat: Using read commited transaction isolation 19/03/13 22:18:52 INFO mapreduce.JobSubmitter: number of splits:1 19/03/13 22:18:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552482402053_0009 19/03/13 22:18:53 INFO impl.YarnClientImpl: Submitted application application_1552482402053_0009 19/03/13 22:18:53 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552482402053_0009/ 19/03/13 22:18:53 INFO mapreduce.Job: Running job: job_1552482402053_0009 19/03/13 22:19:02 INFO mapreduce.Job: Job job_1552482402053_0009 running in uber mode : false 19/03/13 22:19:02 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 22:19:09 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 22:19:10 INFO mapreduce.Job: Job job_1552482402053_0009 completed successfully 19/03/13 22:19:10 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=144379 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=87 HDFS: Number of bytes written=38 HDFS: Number of read operations=4 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=1 Other local map tasks=1 Total time spent by all maps in occupied slots (ms)=4767 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=4767 Total vcore-milliseconds taken by all map tasks=4767 Total megabyte-milliseconds taken by all map tasks=4881408 Map-Reduce Framework Map input records=1 Map output records=1 Input split bytes=87 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=414 CPU time spent (ms)=2360 Physical memory (bytes) snapshot=189968384 Virtual memory (bytes) snapshot=2140639232 Total committed heap usage (bytes)=117440512 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=38 19/03/13 22:19:10 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 22.4022 seconds (1.6963 bytes/sec) 19/03/13 22:19:11 INFO mapreduce.ImportJobBase: Retrieved 1 records. 19/03/13 22:19:11 INFO tool.ImportTool: Final destination exists, will run merge job. 19/03/13 22:19:11 INFO Configuration.deprecation: mapred.output.key.class is deprecated. 
Instead, use mapreduce.job.output.key.class 19/03/13 22:19:11 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 19/03/13 22:19:14 INFO input.FileInputFormat: Total input paths to process : 2 19/03/13 22:19:14 INFO mapreduce.JobSubmitter: number of splits:2 19/03/13 22:19:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552482402053_0010 19/03/13 22:19:14 INFO impl.YarnClientImpl: Submitted application application_1552482402053_0010 19/03/13 22:19:14 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552482402053_0010/ 19/03/13 22:19:14 INFO mapreduce.Job: Running job: job_1552482402053_0010 19/03/13 22:19:25 INFO mapreduce.Job: Job job_1552482402053_0010 running in uber mode : false 19/03/13 22:19:25 INFO mapreduce.Job: map 0% reduce 0% 19/03/13 22:19:33 INFO mapreduce.Job: map 100% reduce 0% 19/03/13 22:19:40 INFO mapreduce.Job: map 100% reduce 100% 19/03/13 22:19:40 INFO mapreduce.Job: Job job_1552482402053_0010 completed successfully 19/03/13 22:19:40 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=614 FILE: Number of bytes written=434631 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=657 HDFS: Number of bytes written=323 HDFS: Number of read operations=9 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=2 Launched reduce tasks=1 Data-local map tasks=2 Total time spent by all maps in occupied slots (ms)=9137 Total time spent by all reduces in occupied slots (ms)=4019 Total time spent by all map tasks (ms)=9137 Total time spent by all reduce tasks (ms)=4019 Total vcore-milliseconds taken by all map tasks=9137 Total vcore-milliseconds taken by all reduce tasks=4019 Total megabyte-milliseconds taken by all map tasks=9356288 Total megabyte-milliseconds taken by all reduce tasks=4115456 Map-Reduce Framework Map input records=9 Map output records=9 Map output bytes=590 Map output materialized bytes=620 Input split bytes=296 Combine input records=0 Combine output records=0 Reduce input groups=8 Reduce shuffle bytes=620 Reduce input records=9 Reduce output records=8 Spilled Records=18 Shuffled Maps =2 Failed Shuffles=0 Merged Map outputs=2 GC time elapsed (ms)=503 CPU time spent (ms)=3680 Physical memory (bytes) snapshot=704909312 Virtual memory (bytes) snapshot=6395523072 Total committed heap usage (bytes)=517996544 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=361 File Output Format Counters Bytes Written=323 19/03/13 22:19:40 INFO tool.ImportTool: Incremental import complete! 
To run another incremental import of all data following this import, supply the following arguments:
19/03/13 22:19:40 INFO tool.ImportTool:  --incremental lastmodified
19/03/13 22:19:40 INFO tool.ImportTool:   --check-column ETLTIME
19/03/13 22:19:40 INFO tool.ImportTool:   --last-value 2019-03-21 17:54:19.0
19/03/13 22:19:40 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

Now look at the contents of /user/hive/warehouse/exter_inr_las/: the file part-m-00000 has been replaced by part-r-00000, which means a reduce (merge) step was run:

[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/exter_inr_las/part-r-00000
1   er      CLERK     800     2019-03-20 10:42:27.0
2   ALLEN   SALESMAN  1600    2019-03-20 10:42:27.0
3   WARD    SALESMAN  1250    2019-03-20 10:42:27.0
4   JONES   MANAGER   2975    2019-03-20 10:42:27.0
5   MARTIN  SALESMAN  1250    2019-03-20 10:42:27.0
6   zhao    DBA       1000    2019-03-20 10:52:34.0
7   yan     BI        100     2019-03-20 10:42:27.0
8   dong    JAVA      400     2019-03-21 17:47:03.0

The record with empno=8 has indeed been updated in place, so the incremental sync succeeded. Check the Hive table as well:

hive> select * from inr_las;
OK
1   er      CLERK     800.0    2019-03-20 10:42:27.0
2   ALLEN   SALESMAN  1600.0   2019-03-20 10:42:27.0
3   WARD    SALESMAN  1250.0   2019-03-20 10:42:27.0
4   JONES   MANAGER   2975.0   2019-03-20 10:42:27.0
5   MARTIN  SALESMAN  1250.0   2019-03-20 10:42:27.0
6   zhao    DBA       1000.0   2019-03-20 10:52:34.0
7   yan     BI        100.0    2019-03-20 10:42:27.0
8   dong    JAVA      400.0    2019-03-21 17:47:03.0
Time taken: 0.196 seconds, Fetched: 8 row(s)

No duplicates this time. For reasons of length, the use of sqoop job and a scheduled incremental-sync script are covered in the next article.
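The import log itself suggests saving these arguments with 'sqoop job --create', which is what the next article walks through. As a rough sketch of what such a saved job looks like: the job name inr_las_inc is invented here, everything else mirrors the merge-mode command used above; by default sqoop job prompts for the database password on each run, so unattended runs usually add --password-file instead.

# Save the merge-mode incremental import as a named Sqoop job
sqoop job --create inr_las_inc -- import \
  --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl \
  --username scott \
  --table INR_LAS -m 1 \
  --target-dir /user/hive/warehouse/exter_inr_las \
  --fields-terminated-by '\t' --lines-terminated-by '\n' \
  --check-column ETLTIME --incremental lastmodified --merge-key EMPNO \
  --last-value "2019-03-21 17:54:19.0"

# List, inspect and run the saved job; Sqoop records the new --last-value
# in its job metastore after each successful run
sqoop job --list
sqoop job --show inr_las_inc
sqoop job --exec inr_las_inc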
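One more note on the first, append-mode experiment before moving on. Append mode left two rows for empno=6, and, as said above, the latest state then has to be picked by ETLTIME. A minimal sketch of that, run from the OS shell against the append-mode table oracle.inr_las; the view name inr_las_latest is invented here, and ROW_NUMBER() is available in the Hive 2.x used in this environment:

hive -e "
CREATE VIEW IF NOT EXISTS oracle.inr_las_latest AS
SELECT empno, ename, job, sal, etltime
FROM (
  SELECT t.*, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY etltime DESC) AS rn
  FROM oracle.inr_las t
) ranked
WHERE rn = 1;
SELECT * FROM oracle.inr_las_latest;
"

With merge mode, as shown above, this extra step is not needed, because the merge job already keeps exactly one row per EMPNO.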
【大数据开发运维解决方案】Hadoop2.7.6+Spark2.4.4+Scala2.11.12+Hudi0.5.2单机伪分布式安装
Hadoop 2.7.6 + Spark 2.4.4 + Scala 2.11.12 + Hudi 0.5.2 single-node pseudo-distributed installation

Notes:
1. The base Hadoop environment used here comes from another article of mine; this document only adds the Spark and Hudi installation on top of that base environment (see the base environment deployment document).
2. The configuration is fairly simple overall. I hit a few pitfalls along the way that are not written up here; so that beginners like me do not end up going down the wrong path, the article is written out in detail, and following the steps as given should work without errors. If you still need to build the base big-data environment, see my Hadoop deployment document mentioned above, which is also fairly detailed.
3. Introductions to Spark and Hudi are not repeated here; there is plenty of material online and in the official documentation. For every piece of installation media used in this article, the download location or official page is given, so you can fetch exactly the same versions free of charge.
4. Versions of the Hadoop-family components in my environment after this installation:

   Software          Version
   Hadoop            2.7.6
   MySQL             5.7
   Hive              2.3.2
   HBase             1.4.9
   Spark             2.4.4
   Hudi              0.5.2
   JDK               1.8.0_151
   Scala             2.11.12
   OGG for Big Data  12.3
   Kylin             2.4
   Kafka             2.11-1.1.1
   Zookeeper         3.4.6
   OS                Oracle Linux 6.8 x64

一、Install Scala (required by Spark)

Apart from Spark 2.4.2, which was built against Scala 2.12, the Spark 2.4.x releases are built against Scala 2.11. The Hudi documentation uses Spark 2.4.4, and that Spark build reports "Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)", so we download Scala 2.11.12 here.

1.1 Download and unpack Scala

Download the Linux tarball (scala-2.11.12.tgz) from the official Scala download page. Create a directory named scala on the Linux server (here /usr/scala) and upload the downloaded tarball into it:

[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.11.12.tgz
[root@hadoop scala]# tar -zxvf scala-2.11.12.tgz
[root@hadoop scala]# ls
scala-2.11.12  scala-2.11.12.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# cd scala-2.11.12/
[root@hadoop scala-2.11.12]# pwd
/usr/scala/scala-2.11.12

1.2 Configure environment variables

Edit /etc/profile and add:

export SCALA_HOME=/usr/scala/scala-2.11.12

and append ${SCALA_HOME}/bin to the PATH variable in the same file. After the change, my /etc/profile looks like this:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.11.12
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib

Save and exit, then source the file so the variables take effect:

[root@hadoop ~]# source /etc/profile

1.3 Verify Scala

[root@hadoop scala-2.11.12]# scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

二、Download and unpack Spark

2.1 Download Spark

Download spark-2.4.4-bin-hadoop2.7.tgz from the official Spark download page.

2.2 Unpack Spark

Create a spark directory under /hadoop to hold Spark:

[root@hadoop scala-2.11.12]# cd /hadoop/
[root@hadoop hadoop]# mkdir spark
[root@hadoop hadoop]# cd spark/
Upload the installation package into the spark directory via xftp, then:
[root@hadoop spark]# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.4-bin-hadoop2.7  spark-2.4.4-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# mv spark-2.4.4-bin-hadoop2.7/* .
[root@hadoop spark]# ls bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 yarn三、Spark相关的配置3.1、配置环境变量编辑/etc/profile文件,增加export SPARK_HOME=/hadoop/spark上面的变量添加完成后编辑该文件中的PATH变量,添加${SPARK_HOME}/bin修改完成后,我的/etc/profile文件内容是:export JAVA_HOME=/usr/java/jdk1.8.0_151 export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar export PATH=$PATH:$JAVA_HOME/bin export HADOOP_HOME=/hadoop/ export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR" export HIVE_HOME=/hadoop/hive export HIVE_CONF_DIR=${HIVE_HOME}/conf export HCAT_HOME=$HIVE_HOME/hcatalog export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hiv e/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jarexport HBASE_HOME=/hadoop/hbase/ export ZOOKEEPER_HOME=/hadoop/zookeeper export KAFKA_HOME=/hadoop/kafka export KYLIN_HOME=/hadoop/kylin/ export GGHOME=/hadoop/ogg12 export SCALA_HOME=/usr/scala/scala-2.11.12 export SPARK_HOME=/hadoop/spark export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:${SPARK_HOME}/sbin export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib编辑完成后,执行命令 source /etc/profile使环境变量生效。3.2、配置参数文件进入conf目录[root@hadoop conf]# pwd /hadoop/spark/conf复制一份配置文件并重命名root@hadoop conf]# cp spark-env.sh.template spark-env.sh [root@hadoop conf]# ls docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh spark-env.sh.template编辑spark-env.h文件,在里面加入配置(具体路径以自己的为准):export SCALA_HOME=/usr/scala/scala-2.11.12 export JAVA_HOME=/usr/java/jdk1.8.0_151 export HADOOP_HOME=/hadoop export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop export SPARK_HOME=/hadoop/spark export SPARK_MASTER_IP=192.168.1.66 export SPARK_EXECUTOR_MEMORY=1Gsource /etc/profile生效。3.3、新建slaves文件以spark为我们创建好的模板创建一个slaves文件,命令是:[root@hadoop conf]# pwd /hadoop/spark/conf [root@hadoop conf]# cp slaves.template slaves四、启动spark因为spark是依赖于hadoop提供的分布式文件系统的,所以在启动spark之前,先确保hadoop在正常运行。[root@hadoop hadoop]# jps 23408 RunJar 23249 JobHistoryServer 23297 RunJar 24049 Jps 22404 DataNode 22774 ResourceManager 23670 Kafka 22264 NameNode 22889 NodeManager 23642 QuorumPeerMain 22589 SecondaryNameNode在hadoop正常运行的情况下,在hserver1(也就是hadoop的namenode,spark的marster节点)上执行命令:[root@hadoop hadoop]# cd /hadoop/spark/sbin [root@hadoop sbin]# ./start-all.sh starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out [root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out Spark 
Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.master.Master --host hadoop --port 7077 --webui-port 8080 ======================================== 20/03/30 22:42:27 INFO master.Master: Started daemon with process name: 24079@hadoop 20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for TERM 20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for HUP 20/03/30 22:42:27 INFO util.SignalUtils: Registered signal handler for INT 20/03/30 22:42:27 WARN master.MasterArguments: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST 20/03/30 22:42:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 20/03/30 22:42:27 INFO spark.SecurityManager: Changing view acls to: root 20/03/30 22:42:27 INFO spark.SecurityManager: Changing modify acls to: root 20/03/30 22:42:27 INFO spark.SecurityManager: Changing view acls groups to: 20/03/30 22:42:27 INFO spark.SecurityManager: Changing modify acls groups to: 20/03/30 22:42:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permiss ions: Set(root); groups with modify permissions: Set()20/03/30 22:42:27 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077. 20/03/30 22:42:27 INFO master.Master: Starting Spark master at spark://hadoop:7077 20/03/30 22:42:27 INFO master.Master: Running Spark version 2.4.4 20/03/30 22:42:28 INFO util.log: Logging initialized @1497ms 20/03/30 22:42:28 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 20/03/30 22:42:28 INFO server.Server: Started @1560ms 20/03/30 22:42:28 INFO server.AbstractConnector: Started ServerConnector@6182300a{HTTP/1.1,[http/1.1]}{0.0.0.0:8080} 20/03/30 22:42:28 INFO util.Utils: Successfully started service 'MasterUI' on port 8080. 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f1f0276{/app,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f1af444{/app/json,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@259b10d3{/,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6fc2f56f{/json,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37a28407{/static,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e99fa57{/app/kill,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66be5bb8{/driver/kill,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO ui.MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://hadoop:8080 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b2c0980{/metrics/master/json,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4ac1749f{/metrics/applications/json,null,AVAILABLE,@Spark} 20/03/30 22:42:28 INFO master.Master: I have been elected leader! 
New state: ALIVE 20/03/30 22:42:31 INFO master.Master: Registering worker 192.168.1.66:39384 with 8 cores, 4.6 GB RAM [root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077 ======================================== 20/03/30 22:42:29 INFO worker.Worker: Started daemon with process name: 24173@hadoop 20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for TERM 20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for HUP 20/03/30 22:42:29 INFO util.SignalUtils: Registered signal handler for INT 20/03/30 22:42:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 20/03/30 22:42:30 INFO spark.SecurityManager: Changing view acls to: root 20/03/30 22:42:30 INFO spark.SecurityManager: Changing modify acls to: root 20/03/30 22:42:30 INFO spark.SecurityManager: Changing view acls groups to: 20/03/30 22:42:30 INFO spark.SecurityManager: Changing modify acls groups to: 20/03/30 22:42:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permiss ions: Set(root); groups with modify permissions: Set()20/03/30 22:42:30 INFO util.Utils: Successfully started service 'sparkWorker' on port 39384. 20/03/30 22:42:30 INFO worker.Worker: Starting Spark worker 192.168.1.66:39384 with 8 cores, 4.6 GB RAM 20/03/30 22:42:30 INFO worker.Worker: Running Spark version 2.4.4 20/03/30 22:42:30 INFO worker.Worker: Spark home: /hadoop/spark 20/03/30 22:42:31 INFO util.log: Logging initialized @1682ms 20/03/30 22:42:31 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 20/03/30 22:42:31 INFO server.Server: Started @1758ms 20/03/30 22:42:31 INFO server.AbstractConnector: Started ServerConnector@3d598dff{HTTP/1.1,[http/1.1]}{0.0.0.0:8081} 20/03/30 22:42:31 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081. 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5099c1b0{/logPage,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64348087{/logPage/json,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46dcda1b{/,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1617f7cc{/json,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56e77d31{/static,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@643123b6{/log,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081 20/03/30 22:42:31 INFO worker.Worker: Connecting to master hadoop:7077... 
20/03/30 22:42:31 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cf30aaa{/metrics/json,null,AVAILABLE,@Spark} 20/03/30 22:42:31 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 36 ms (0 ms spent in bootstraps) 20/03/30 22:42:31 INFO worker.Worker: Successfully registered with master spark://hadoop:7077 启动没问题,访问Webui:http://192.168.1.66:8080/五、运行Spark提供的计算圆周率的示例程序这里只是简单的用local模式运行一个计算圆周率的Demo。按照下面的步骤来操作。[root@hadoop sbin]# cd /hadoop/spark/ [root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.4.jar 20/03/30 22:45:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 20/03/30 22:45:59 INFO spark.SparkContext: Running Spark version 2.4.4 20/03/30 22:45:59 INFO spark.SparkContext: Submitted application: Spark Pi 20/03/30 22:45:59 INFO spark.SecurityManager: Changing view acls to: root 20/03/30 22:45:59 INFO spark.SecurityManager: Changing modify acls to: root 20/03/30 22:45:59 INFO spark.SecurityManager: Changing view acls groups to: 20/03/30 22:45:59 INFO spark.SecurityManager: Changing modify acls groups to: 20/03/30 22:45:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permiss ions: Set(root); groups with modify permissions: Set()20/03/30 22:45:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 39352. 20/03/30 22:45:59 INFO spark.SparkEnv: Registering MapOutputTracker 20/03/30 22:45:59 INFO spark.SparkEnv: Registering BlockManagerMaster 20/03/30 22:45:59 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 20/03/30 22:45:59 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 20/03/30 22:45:59 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-63bf7c92-8908-4784-8e16-4c6ef0c93dc0 20/03/30 22:45:59 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 20/03/30 22:45:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator 20/03/30 22:46:00 INFO util.log: Logging initialized @2066ms 20/03/30 22:46:00 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 20/03/30 22:46:00 INFO server.Server: Started @2179ms 20/03/30 22:46:00 INFO server.AbstractConnector: Started ServerConnector@3abd581e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 20/03/30 22:46:00 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 
20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@36dce7ed{/jobs,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a1ebcff{/jobs/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@19868320{/jobs/job,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c20be82{/jobs/job/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@13c612bd{/stages,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ef41c66{/stages/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b739528{/stages/stage,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/stages/stage/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/stages/pool,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d526ad9{/stages/pool/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/storage,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/storage/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/storage/rdd,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/storage/rdd/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11ee02f8{/environment,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4102b1b1{/environment/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/executors,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/executors/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/executors/threadDump,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/executors/threadDump/json,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/static,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@467f77a5{/,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bb9aa43{/api,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66b72664{/jobs/job/kill,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7a34b7b8{/stages/stage/kill,null,AVAILABLE,@Spark} 20/03/30 22:46:00 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040 20/03/30 22:46:00 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.4.jar at spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar with timestamp 
1585579560287 20/03/30 22:46:00 INFO executor.Executor: Starting executor ID driver on host localhost 20/03/30 22:46:00 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38875. 20/03/30 22:46:00 INFO netty.NettyBlockTransferService: Server created on hadoop:38875 20/03/30 22:46:00 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 20/03/30 22:46:00 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 38875, None) 20/03/30 22:46:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:38875 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 38875, None) 20/03/30 22:46:00 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 38875, None) 20/03/30 22:46:00 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 38875, None) 20/03/30 22:46:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f8e0cee{/metrics/json,null,AVAILABLE,@Spark} 20/03/30 22:46:01 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38) 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Parents of final stage: List() 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Missing parents: List() 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents 20/03/30 22:46:01 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB) 20/03/30 22:46:01 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 366.3 MB) 20/03/30 22:46:01 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:38875 (size: 1256.0 B, free: 366.3 MB) 20/03/30 22:46:01 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1)) 20/03/30 22:46:01 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 20/03/30 22:46:01 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes) 20/03/30 22:46:01 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0) 20/03/30 22:46:01 INFO executor.Executor: Fetching spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar with timestamp 1585579560287 20/03/30 22:46:01 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:39352 after 45 ms (0 ms spent in bootstraps) 20/03/30 22:46:01 INFO util.Utils: Fetching spark://hadoop:39352/jars/spark-examples_2.11-2.4.4.jar to /tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839/userFiles-86767584-1e78-45f2-a9ed-8ac4360ab170/fetchFileTem p2974211155688432975.tmp20/03/30 22:46:01 INFO executor.Executor: Adding file:/tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839/userFiles-86767584-1e78-45f2-a9ed-8ac4360ab170/spark-examples_2.11-2.4.4.jar to class loader 20/03/30 22:46:01 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 
824 bytes result sent to driver 20/03/30 22:46:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes) 20/03/30 22:46:01 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1) 20/03/30 22:46:01 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 308 ms on localhost (executor driver) (1/2) 20/03/30 22:46:01 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver 20/03/30 22:46:01 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 31 ms on localhost (executor driver) (2/2) 20/03/30 22:46:01 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 20/03/30 22:46:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.606 s 20/03/30 22:46:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.703911 s Pi is roughly 3.1386756933784667 20/03/30 22:46:01 INFO server.AbstractConnector: Stopped Spark@3abd581e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 20/03/30 22:46:01 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040 20/03/30 22:46:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 20/03/30 22:46:01 INFO memory.MemoryStore: MemoryStore cleared 20/03/30 22:46:01 INFO storage.BlockManager: BlockManager stopped 20/03/30 22:46:01 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 20/03/30 22:46:01 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 20/03/30 22:46:01 INFO spark.SparkContext: Successfully stopped SparkContext 20/03/30 22:46:01 INFO util.ShutdownHookManager: Shutdown hook called 20/03/30 22:46:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e019897d-3160-4bb1-ab59-f391e32ec47a 20/03/30 22:46:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9e0481a2-756b-436f-bc74-dd42fb5ea839可以看到输出:Pi is roughly 3.137355686778434已经打印出了圆周率。上面只是使用了单机本地模式调用Demo,使用集群模式运行Demo,请继续看。六、用yarn-cluster模式执行计算程序进入到Spark的安装目录,执行命令,用yarn-cluster模式运行计算圆周率的Demo:[root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster examples/jars/spark-examples_2.11-2.4.4.jar Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead. 20/03/30 22:47:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 20/03/30 22:47:48 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032 20/03/30 22:47:48 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers 20/03/30 22:47:48 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 20/03/30 22:47:48 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 20/03/30 22:47:48 INFO yarn.Client: Setting up container launch context for our AM 20/03/30 22:47:48 INFO yarn.Client: Setting up the launch environment for our AM container 20/03/30 22:47:48 INFO yarn.Client: Preparing resources for our AM container 20/03/30 22:47:48 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 
20/03/30 22:47:51 INFO yarn.Client: Uploading resource file:/tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6/__spark_libs__3389017089811757919.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_ 1585579247054_0001/__spark_libs__3389017089811757919.zip20/03/30 22:47:59 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.4.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585579247054_0001/spark-exa mples_2.11-2.4.4.jar20/03/30 22:47:59 INFO yarn.Client: Uploading resource file:/tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6/__spark_conf__559264393694354636.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1 585579247054_0001/__spark_conf__.zip20/03/30 22:47:59 INFO spark.SecurityManager: Changing view acls to: root 20/03/30 22:47:59 INFO spark.SecurityManager: Changing modify acls to: root 20/03/30 22:47:59 INFO spark.SecurityManager: Changing view acls groups to: 20/03/30 22:47:59 INFO spark.SecurityManager: Changing modify acls groups to: 20/03/30 22:47:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permiss ions: Set(root); groups with modify permissions: Set()20/03/30 22:48:01 INFO yarn.Client: Submitting application application_1585579247054_0001 to ResourceManager 20/03/30 22:48:01 INFO impl.YarnClientImpl: Submitted application application_1585579247054_0001 20/03/30 22:48:02 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:02 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1585579681188 final status: UNDEFINED tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/ user: root 20/03/30 22:48:03 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:04 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:05 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:06 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:07 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:08 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:09 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:11 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:12 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:13 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:14 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:15 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:16 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:17 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:19 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:20 INFO 
yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:21 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:22 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:23 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:24 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:25 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:26 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:27 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:28 INFO yarn.Client: Application report for application_1585579247054_0001 (state: ACCEPTED) 20/03/30 22:48:29 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:29 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: hadoop ApplicationMaster RPC port: 37844 queue: default start time: 1585579681188 final status: UNDEFINED tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/ user: root 20/03/30 22:48:30 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:31 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:32 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:33 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:34 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:35 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:36 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:37 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:38 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:39 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:40 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:41 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:42 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:43 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:44 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:45 INFO yarn.Client: Application report for application_1585579247054_0001 (state: RUNNING) 20/03/30 22:48:46 INFO yarn.Client: Application report for application_1585579247054_0001 (state: FINISHED) 20/03/30 22:48:46 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: hadoop ApplicationMaster RPC port: 37844 queue: default start time: 1585579681188 final status: SUCCEEDED tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/ user: root 20/03/30 22:48:46 INFO util.ShutdownHookManager: Shutdown hook called 20/03/30 22:48:46 INFO 
util.ShutdownHookManager: Deleting directory /tmp/spark-4c243c24-9489-4c8a-a1bc-a6a9780615d6 20/03/30 22:48:46 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d554f7cd-c7d4-4dfa-bc86-11a340925db6 注意,使用yarn-cluster模式计算,结果没有输出在控制台,结果写在了Hadoop集群的日志中,如何查看计算结果?注意到刚才的输出中有地址: tracking URL: http://hadoop:8088/proxy/application_1585579247054_0001/ 进去看看: 再点进logs: 查看stdout内容: 圆周率结果已经打印出来了。这里再给出几个常用命令:启动spark ./sbin/start-all.sh 启动Hadoop以**及Spark: ./starths.sh 停止命令改成stop七、配置spark读取hive表由于在hive里面操作表是通过mapreduce的方式,效率较低,本文主要描述如何通过spark读取hive表到内存进行计算。第一步,先把$HIVE_HOME/conf/hive-site.xml放入$SPARK_HOME/conf内,使得spark能够获取hive配置[root@hadoop spark]# pwd /hadoop/spark [root@hadoop spark]# cp $HIVE_HOME/conf/hive-site.xml conf/ [root@hadoop spark]# chmod 777 conf/hive-site.xml [root@hadoop spark]# cp /hadoop/hive/lib/mysql-connector-java-5.1.47.jar jars/通过spark-shell进入交互界面[root@hadoop spark]# /hadoop/spark/bin/spark-shell 20/03/31 10:31:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/03/31 10:32:41 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 20/03/31 10:32:41 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042. Spark context Web UI available at http://hadoop:4042 Spark context available as 'sc' (master = local[*], app id = local-1585621962060). Spark session available as 'spark'. Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.4.4 /_/ Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151) Type in expressions to have them evaluated. Type :help for more information. 
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val hiveContext = new HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@62966c9f

scala> hiveContext.sql("show databases").show()
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.metastore.client.capability.check does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.metastore.hbase.aggregate.stats.false.positive.probability does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.druid.broker.address.default does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.llap.io.orc.time.counters does not exist
20/03/31 10:33:53 WARN conf.HiveConf: HiveConf of name hive.llap.skip.compile.udf.check does not exist
……（其余大量同类的 "HiveConf of name xxx does not exist" WARN 日志略）
+------------+
|databaseName|
+------------+
|     default|
|      hadoop|
+------------+

scala> hiveContext.sql("show tables").show()
+--------+--------------------+-----------+
|database|           tableName|isTemporary|
+--------+--------------------+-----------+
| default|                  aa|      false|
| default|                  bb|      false|
| default|                  dd|      false|
| default|       kylin_account|      false|
| default|        kylin_cal_dt|      false|
| default|kylin_category_gr...|      false|
| default|       kylin_country|      false|
| default|kylin_intermediat...|      false|
| default|kylin_intermediat...|      false|
| default|         kylin_sales|      false|
| default|                test|      false|
| default|           test_null|      false|
+--------+--------------------+-----------+
可以看到已经查询到结果了，但是为啥上面报了一堆WARN？比如：
WARN conf.HiveConf: HiveConf of name hive.llap.skip.compile.udf.check does not exist
这一般是因为hive-site.xml里带了当前Spark所用Hive版本不认识的配置项，把对应的配置项从hive-site.xml配置文件里删除掉即可，例如：
<property>
  <name>hive.llap.skip.compile.udf.check</name>
  <value>false</value>
  <description>
    Whether to skip the compile-time check for non-built-in UDFs when deciding whether to execute tasks in LLAP.
    Skipping the check allows executing UDFs from pre-localized jars in LLAP; if the jars are not pre-localized, the UDFs will simply fail to load.
  </description>
</property>
再次登录执行，这些警告就消失了。
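补充一点：上面用的 HiveContext 在 Spark 2.x 里已经标记为过时（所以会出现 deprecation warning），其实也可以直接用 spark-shell 预置的 SparkSession（变量 spark）来查 Hive 表。下面是一个示意写法，表名取自上文 show tables 的结果，仅供参考：
// 以下代码直接粘贴到 spark-shell 中执行即可（前提：hive-site.xml 已按上文放入 $SPARK_HOME/conf）
spark.sql("show databases").show()                        // 等价于上面的 hiveContext.sql("show databases")
spark.sql("select * from default.test limit 10").show()   // default.test 是上文 show tables 里列出的表
val testDF = spark.table("default.test")                  // 也可以把 Hive 表直接读成 DataFrame 再做计算
println(testDF.count())
用 SparkSession 的好处是后续的 DataFrame/SQL 操作都走同一个入口，不再需要手动 new HiveContext。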
八、配置Hudi
8.1、检阅官方文档重点地方
先来看下官方文档Get Started首页：我之前装的hadoop环境是2.7版本的，前面之所以装spark2.4.4，就是因为目前官方案例用的就是hadoop2.7+spark2.4.4；而且虽然现在hudi、spark支持scala2.11.x/2.12.x，但官网这里用的也是2.11，我这里为了保持和hudi官方以及spark2.4.4（Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)）一致，也就装的2.11.12版本的scala。到目前为止，Hudi已经出了0.5.2版本，但是Hudi官方仍然用0.5.1做示例。接下来，先切换到hudi0.5.1的发布文档：点击查看
上面发布文档讲的意思是：
版本升级：将Spark版本从2.1.0升级到2.4.4；将Avro版本从1.7.7升级到1.8.2；将Parquet版本从1.8.1升级到1.10.1；将Kafka版本从0.8.2.1升级到2.0.0，这是由于将spark-streaming-kafka artifact从0.8_2.11升级到0.10_2.11/2.12间接引起的。重要：Hudi 0.5.1版本需要将spark的版本升级到2.4+。
Hudi现在支持Scala 2.11和2.12，可以参考Scala 2.12构建来使用Scala 2.12构建Hudi。另外，hudi-spark、hudi-utilities、hudi-spark-bundle和hudi-utilities-bundle包名现已对应变更为hudi-spark_{scala_version}、hudi-utilities_{scala_version}、hudi-spark-bundle_{scala_version}和hudi-utilities-bundle_{scala_version}，注意这里的scala_version为2.11或2.12。
在0.5.1版本中，对于timeline元数据的操作不再使用重命名方式，这个特性在创建Hudi表时默认是打开的。对于已存在的表，这个特性默认是关闭的，在已存在表开启这个特性之前，请参考这部分（https://hudi.apache.org/docs/deployment.html#upgrading）。若要开启新的Hudi timeline布局方式（layout），即避免重命名，可设置写配置项hoodie.timeline.layout.version=1。当然，你也可以在CLI中使用repair overwrite-hoodie-props命令来添加hoodie.timeline.layout.version=1至hoodie.properties文件。注意，无论使用哪种方式，在升级Writer之前请先升级Hudi Reader（查询引擎）版本至0.5.1版本。
CLI支持repair overwrite-hoodie-props来指定文件重写表的hoodie.properties文件，可以使用此命令来更新表名或者使用新的timeline布局方式。注意当重写hoodie.properties文件时（毫秒级），一些查询将会暂时失败，失败后重新运行即可。
DeltaStreamer用来指定表类型的参数从--storage-type变更为了--table-type，可以参考wiki来了解更多的最新变化的术语。
配置Kafka Reset Offset策略的值变化了：枚举值从LARGEST变更为LATEST，SMALLEST变更为EARLIEST，对应DeltaStreamer中的配置项为auto.offset.reset。
当使用spark-shell来了解Hudi时，需要提供额外的--packages org.apache.spark:spark-avro_2.11:2.4.4，可以参考quickstart了解更多细节。
Key generator（键生成器）移动到了单独的包org.apache.hudi.keygen下，如果你使用重载键生成器类（对应配置项：hoodie.datasource.write.keygenerator.class），请确保类的全路径名也对应进行变更。
Hive同步工具将会为MOR表注册带有_ro后缀的RO表，所以查询时也请带_ro后缀，你可以使用--skip-ro-suffix配置项来保持旧的表名，即同步时不添加_ro后缀。
0.5.1版本中，供presto/hive查询引擎使用的hudi-hadoop-mr-bundle包shaded了avro包，以便支持realtime queries（实时查询）。Hudi支持可插拔的记录合并逻辑，用户只需自定义实现HoodieRecordPayload。如果你使用这个特性，你需要在你的代码中relocate avro依赖，这样可以确保你代码的行为和Hudi保持一致，可以使用如下方式来relocation：将前缀“org.apache.avro.”重定位为“org.apache.hudi.org.apache.avro.”。
DeltaStreamer更好的支持Delete,可参考blog了解更多细节。DeltaStreamer支持AWS Database Migration Service(DMS) ,可参考blog了解更多细节。支持DynamicBloomFilter(动态布隆过滤器),默认是关闭的,可以使用索引配置项hoodie.bloom.index.filter.type=DYNAMIC_V0来开启。HDFSParquetImporter支持bulkinsert,可配置--command为bulkinsert。 支持AWS WASB和WASBS云存储。8.2、错误的安装尝试好了,看完了发布文档,而且已经定下了我们的使用版本关系,那么直接切换到Hudi0.5.2最新版本的官方文档:点此跳转因为之前没用过spark和hudi,在看到hudi官网的第一眼时候,首先想到的是先下载一个hudi0.5.1对应的应用程序,然后再进行部署,部署好了之后再执行上面官网给的命令代码,比如下面我之前做的错误示范:由于官方目前案例都是用的0.5.1,所以我也下载这个版本: https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz 将下载好的安装包,上传到/hadoop/spark目录下并解压: [root@hadoop spark]# ls bin conf data examples hudi-0.5.1-incubating.src.tgz jars kubernetes LICENSE licenses logs NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 work yarn [root@hadoop spark]# tar -zxvf hudi-0.5.1-incubating.src.tgz [root@hadoop spark]# ls bin conf data examples hudi-0.5.1-incubating hudi-0.5.1-incubating.src.tgz jars kubernetes LICENSE licenses logs NOTICE python R README.md RELEASE sbin spark-2.4.4-bin-hadoop2.7 work yarn [root@hadoop spark]# rm -rf *tgz [root@hadoop ~]# /hadoop/spark/bin/spark-shell \ > --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \ > --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' Ivy Default Cache set to: /root/.ivy2/cache The jars for the packages stored in: /root/.ivy2/jars :: loading settings :: url = jar:file:/hadoop/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency org.apache.spark#spark-avro_2.11 added as a dependency :: resolving dependencies :: org.apache.spark#spark-submit-parent-5717aa3e-7bfb-42c4-aadd-2a884f3521d5;1.0 confs: [default] You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. You probably access the destination server through a proxy server that is not well configured. :: resolution report :: resolve 454ms :: artifacts dl 1ms :: modules in use: --------------------------------------------------------------------- | | modules || artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| --------------------------------------------------------------------- | default | 2 | 0 | 0 | 0 || 0 | 0 | --------------------------------------------------------------------- :: problems summary :: :::: WARNINGS Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.5.1-incubating/hudi-spark-bundle_2.11-0.5.1-incubating.pom Host repo1.maven.org not found. 
url=https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.5.1-incubating/hudi-spark-bundle_2.11-0.5.1-incubating.jar 。。。。。。。。。。 :::::::::::::::::::::::::::::::::::::::::::::: :: UNRESOLVED DEPENDENCIES :: :::::::::::::::::::::::::::::::::::::::::::::: :: org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-incubating: not found :: org.apache.spark#spark-avro_2.11;2.4.4: not found :::::::::::::::::::::::::::::::::::::::::::::: :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-incubating: not found, unresolved dependency: org.apache.spark#spark-avro_2.11;2.4.4: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1302) at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:304) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)8.3、正确的“安装部署”其实下载的这个应该算是个源码包,不是可直接运行的。而且spark-shell --packages是指定java包的maven地址,若不给定,则会使用该机器安装的maven默认源中下载此jar包,也就是说指定的这两个jar是需要自动下载的,我的虚拟环境一没设置外部网络,二没配置maven,这肯定会报错找不到jar包。官方这里的代码:--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4说白了其实就是指定maven项目pom文件的依赖,翻了一下官方文档,找到了Hudi给的中央仓库地址,然后从中找到了官方案例代码中指定的两个包:直接拿出来,就是下面这两个:<dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-avro_2.11</artifactId> <version>2.4.4</version> </dependency> <dependency> <groupId>org.apache.hudi</groupId> <artifactId>hudi-spark-bundle_2.11</artifactId> <version>0.5.2-incubating</version> </dependency>好吧,那我就在这直接下载了这俩包,然后再继续看官方文档:这里说了我也可以通过自己构建hudi来快速开始, 并在spark-shell命令中使用--jars /packaging/hudi-spark-bundle/target/hudi-spark-bundle-..*-SNAPSHOT.jar, 而不是--packages org.apache.hudi:hudi-spark-bundle:0.5.2-incubating,看到这个提示,我在linux看了下 spark-shell的帮助:[root@hadoop external_jars]# /hadoop/spark/bin/spark-shell --help Usage: ./bin/spark-shell [options] Scala REPL options: -I <file> preload <file>, enforcing line-by-line interpretation Options: --master MASTER_URL spark://host:port, mesos://host:port, yarn, k8s://https://host:port, or local (Default: local[*]). --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client). --class CLASS_NAME Your application's main class (for Java / Scala apps). --name NAME A name of your application. --jars JARS Comma-separated list of jars to include on the driver and executor classpaths. --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version. --exclude-packages Comma-separated list of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts. 
--repositories Comma-separated list of additional remote repositories to search for the maven coordinates given with --packages. --py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. --files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName). --conf PROP=VALUE Arbitrary Spark configuration property. --properties-file FILE Path to a file from which to load extra properties. If not specified, this will look for conf/spark-defaults.conf. --driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M). --driver-java-options Extra Java options to pass to the driver. --driver-library-path Extra library path entries to pass to the driver. --driver-class-path Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath. --executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G). --proxy-user NAME User to impersonate when submitting the application. This argument does not work with --principal / --keytab. --help, -h Show this help message and exit. --verbose, -v Print additional debug output. --version, Print the version of current Spark. Cluster deploy mode only: --driver-cores NUM Number of cores used by the driver, only in cluster mode (Default: 1). Spark standalone or Mesos with cluster deploy mode only: --supervise If given, restarts the driver on failure. --kill SUBMISSION_ID If given, kills the driver specified. --status SUBMISSION_ID If given, requests the status of the driver specified. Spark standalone and Mesos only: --total-executor-cores NUM Total cores for all executors. Spark standalone and YARN only: --executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode) YARN-only: --queue QUEUE_NAME The YARN queue to submit to (Default: "default"). --num-executors NUM Number of executors to launch (Default: 2). If dynamic allocation is enabled, the initial number of executors will be at least NUM. --archives ARCHIVES Comma separated list of archives to be extracted into the working directory of each executor. --principal PRINCIPAL Principal to be used to login to KDC, while running on secure HDFS. --keytab KEYTAB The full path to the file that contains the keytab for the principal specified above. 
This keytab will be copied to the node running the Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically.
原来--jars是指定机器上已经存在的jar文件。接下来将前面下载的两个包上传到服务器：
[root@hadoop spark]# mkdir external_jars
[root@hadoop spark]# cd external_jars/
[root@hadoop external_jars]# pwd
/hadoop/spark/external_jars
通过xftp上传jar到此目录
[root@hadoop external_jars]# ls
hudi-spark-bundle_2.11-0.5.2-incubating.jar  scala-library-2.11.12.jar  spark-avro_2.11-2.4.4.jar  spark-tags_2.11-2.4.4.jar  unused-1.0.0.jar
然后将官方案例代码：
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
修改为：
[root@hadoop external_jars]# /hadoop/spark/bin/spark-shell --jars /hadoop/spark/external_jars/spark-avro_2.11-2.4.4.jar,/hadoop/spark/external_jars/hudi-spark-bundle_2.11-0.5.2-incubating.jar --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
20/03/31 15:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop:4040
Spark context available as 'sc' (master = local[*], app id = local-1585639157881).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
OK!!!没有报错了，接下来开始尝试进行增删改查操作。
8.4、Hudi增删改查（基于上面步骤）
8.4.1、设置表名、基本路径和数据生成器来生成记录
scala> import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.QuickstartUtils._

scala> import scala.collection.JavaConversions._
import scala.collection.JavaConversions._

scala> import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.SaveMode._

scala> import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceReadOptions._

scala> import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.DataSourceWriteOptions._

scala> import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.config.HoodieWriteConfig._

scala> val tableName = "hudi_cow_table"
tableName: String = hudi_cow_table

scala> val basePath = "file:///tmp/hudi_cow_table"
basePath: String = file:///tmp/hudi_cow_table

scala> val dataGen = new DataGenerator
dataGen: org.apache.hudi.QuickstartUtils.DataGenerator = org.apache.hudi.QuickstartUtils$DataGenerator@4bf6bc2d
数据生成器可以基于行程样本模式生成插入和更新的样本。
8.4.2、插入数据
生成一些新的行程样本，将其加载到DataFrame中，然后将DataFrame写入Hudi数据集中，如下所示。
scala> val inserts = convertToStringList(dataGen.generateInserts(10))
inserts: java.util.List[String] = [{"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.4726905879569653, "begin_lon": 0.46157858450465483, "end_lat": 0.754803407008858, "end_lon": 0.9671159942018241, "fare": 34.158284716382845, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "0d612dd2-5f10-4296-a434-b34e6558e8f1", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.6100070562136587, "begin_lon": 0.8779402295427752, "end_lat": 0.3407870505929602, "end_lon": 0.5030798142293655, "fare": 43.4923811219014, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "0e170de4-7eda-4ab5-8c06-e351e8b23e3d", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.5731835407930634, "begin_...

scala> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
warning: there was one deprecation warning; re-run with -deprecation for details
df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]

scala> df.write.format("org.apache.hudi").
     |   options(getQuickstartWriteConfigs).
     |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
     |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
     |   option(TABLE_NAME, tableName).
     |   mode(Overwrite).
     |   save(basePath);
20/03/31 15:28:11 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
mode(Overwrite)覆盖并重新创建数据集（如果已经存在）。您可以检查在/tmp/hudi_cow_table/<region>/<country>/<city>/下生成的数据。我们提供了一个记录键（schema中的uuid）、分区字段（region/country/city）和组合逻辑（schema中的ts），以确保行程记录在每个分区中都是唯一的。更多信息请参阅对Hudi中的数据进行建模，有关将数据提取到Hudi中的方法的信息，请参阅写入Hudi数据集。这里我们使用默认的写操作：插入更新（upsert）。如果您的工作负载没有更新，也可以使用更快的插入或批量插入操作。想了解更多信息，请参阅写操作。
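这里顺带给一个示意写法（非官方原文示例，仅作参考）：如果只是一次性导入、没有更新需求，可以把写操作从默认的upsert换成insert或bulk_insert，0.5.x里对应的配置项是hoodie.datasource.write.operation（具体名称以所用Hudi版本的官方文档为准），下面沿用上文spark-shell中已有的df、tableName、basePath等变量：
// 示意：与上面的写入相同，只是显式指定写操作为 bulk_insert（默认为 upsert）
df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option("hoodie.datasource.write.operation", "bulk_insert").
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  mode(Overwrite).
  save(basePath)
只做插入时把"bulk_insert"换成"insert"即可；需要按主键去重合并时仍用默认的upsert。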
0.0, "uuid": "0e170de4-7eda-4ab5-8c06-e351e8b23e3d", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.5731835407930634, "begin_...scala> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2)) warning: there was one deprecation warning; re-run with -deprecation for details df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields] scala> df.write.format("org.apache.hudi"). | options(getQuickstartWriteConfigs). | option(PRECOMBINE_FIELD_OPT_KEY, "ts"). | option(RECORDKEY_FIELD_OPT_KEY, "uuid"). | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath"). | option(TABLE_NAME, tableName). | mode(Overwrite). | save(basePath); 20/03/31 15:28:11 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.mode(Overwrite)覆盖并重新创建数据集(如果已经存在)。 您可以检查在/tmp/hudi_cow_table/\<region>/\<country>/\<city>/下生成的数据。我们提供了一个记录键 (schema中的uuid),分区字段(region/county/city)和组合逻辑(schema中的ts) 以确保行程记录在每个分区中都是唯一的。更多信息请参阅 对Hudi中的数据进行建模, 有关将数据提取到Hudi中的方法的信息,请参阅写入Hudi数据集。 这里我们使用默认的写操作:插入更新。 如果您的工作负载没有更新,也可以使用更快的插入或批量插入操作。 想了解更多信息,请参阅写操作。8.4.3、查询数据将数据文件加载到DataFrame中。scala> df.write.format("org.apache.hudi"). | options(getQuickstartWriteConfigs). | option(PRECOMBINE_FIELD_OPT_KEY, "ts"). | option(RECORDKEY_FIELD_OPT_KEY, "uuid"). | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath"). | option(TABLE_NAME, tableName). | mode(Overwrite). | save(basePath); 20/03/31 15:28:11 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL. scala> val roViewDF = spark. | read. | format("org.apache.hudi"). | load(basePath + "/*/*/*/*") 20/03/31 15:30:03 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL. roViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 
13 more fields] scala> roViewDF.registerTempTable("hudi_ro_table") warning: there was one deprecation warning; re-run with -deprecation for details scala> spark.sql("select fare, begin_lon, begin_lat, ts from hudi_ro_table where fare > 20.0").show() +------------------+-------------------+-------------------+---+ | fare| begin_lon| begin_lat| ts| +------------------+-------------------+-------------------+---+ | 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0| | 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0| | 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0| | 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0| |34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0| | 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0| | 43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0| | 41.06290929046368| 0.8192868687714224| 0.651058505660742|0.0| +------------------+-------------------+-------------------+---+ scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from hudi_ro_table").show() +-------------------+--------------------+----------------------+---------+----------+------------------+ |_hoodie_commit_time| _hoodie_record_key|_hoodie_partition_path| rider| driver| fare| +-------------------+--------------------+----------------------+---------+----------+------------------+ | 20200331152807|264170aa-dd3f-4a7...| americas/united_s...|rider-213|driver-213| 93.56018115236618| | 20200331152807|0e170de4-7eda-4ab...| americas/united_s...|rider-213|driver-213| 64.27696295884016| | 20200331152807|fb06d140-cd00-413...| americas/united_s...|rider-213|driver-213| 27.79478688582596| | 20200331152807|eb1d495c-57b0-4b3...| americas/united_s...|rider-213|driver-213| 33.92216483948643| | 20200331152807|2b3380b7-2216-4ca...| americas/united_s...|rider-213|driver-213|19.179139106643607| | 20200331152807|81a9b76c-655b-452...| americas/brazil/s...|rider-213|driver-213|34.158284716382845| | 20200331152807|d24e8cb8-69fd-4cc...| americas/brazil/s...|rider-213|driver-213| 66.62084366450246| | 20200331152807|0d612dd2-5f10-429...| americas/brazil/s...|rider-213|driver-213| 43.4923811219014| | 20200331152807|a6a7e7ed-3559-4ee...| asia/india/chennai|rider-213|driver-213|17.851135255091155| | 20200331152807|824ee8d5-6f1f-4d5...| asia/india/chennai|rider-213|driver-213| 41.06290929046368| +-------------------+--------------------+----------------------+---------+----------+------------------+该查询提供已提取数据的读取优化视图。由于我们的分区路径(region/country/city)是嵌套的3个级别 从基本路径开始,我们使用了load(basePath + "/*/*/*/*")。 有关支持的所有存储类型和视图的更多信息,请参考存储类型和视图。8.4.4、更新数据这类似于插入新数据。使用数据生成器生成对现有行程的更新,加载到DataFrame中并将DataFrame写入hudi数据集。scala> val updates = convertToStringList(dataGen.generateUpdates(10)) updates: java.util.List[String] = [{"ts": 0.0, "uuid": "0e170de4-7eda-4ab5-8c06-e351e8b23e3d", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.7340133901254792, "begin_lon": 0.5142184937933181, "en d_lat": 0.7814655558162802, "end_lon": 0.6592596683641996, "fare": 49.527694252432056, "partitionpath": "americas/united_states/san_francisco"}, {"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.1593867607188556, "begin_lon": 0.010872312870502165, "end_lat": 0.9808530350038475, "end_lon": 0.7963756520507014, "fare": 29.47661370147079, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "81a9b76c-655b-4527-85fc-7696bdeab4fd", "rider": 
"rider-284", "driver": "driver-284", "begin_lat": 0.71801964677...scala> val df = spark.read.json(spark.sparkContext.parallelize(updates, 2)); warning: there was one deprecation warning; re-run with -deprecation for details df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields] scala> df.write.format("org.apache.hudi"). | options(getQuickstartWriteConfigs). | option(PRECOMBINE_FIELD_OPT_KEY, "ts"). | option(RECORDKEY_FIELD_OPT_KEY, "uuid"). | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath"). | option(TABLE_NAME, tableName). | mode(Append). | save(basePath); 20/03/31 15:32:27 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.注意,保存模式现在为追加。通常,除非您是第一次尝试创建数据集,否则请始终使用追加模式。 查询现在再次查询数据将显示更新的行程。每个写操作都会生成一个新的由时间戳表示的commit 。在之前提交的相同的_hoodie_record_key中寻找_hoodie_commit_time, rider, driver字段变更。8.4.5、增量查询Hudi还提供了获取给定提交时间戳以来已更改的记录流的功能。 这可以通过使用Hudi的增量视图并提供所需更改的开始时间来实现。 如果我们需要给定提交之后的所有更改(这是常见的情况),则无需指定结束时间。scala> // reload data scala> spark. | read. | format("org.apache.hudi"). | load(basePath + "/*/*/*/*"). | createOrReplaceTempView("hudi_ro_table") 20/03/31 15:33:55 WARN hudi.DefaultSource: Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL. scala> scala> val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50) commits: Array[String] = Array(20200331152807, 20200331153224) scala> val beginTime = commits(commits.length - 2) // commit time we are interested in beginTime: String = 20200331152807 scala> // 增量查询数据 scala> val incViewDF = spark. | read. | format("org.apache.hudi"). | option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL). | option(BEGIN_INSTANTTIME_OPT_KEY, beginTime). | load(basePath); 20/03/31 15:34:40 WARN hudi.DefaultSource: hoodie.datasource.view.type is deprecated and will be removed in a later release. Please use hoodie.datasource.query.type incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields] scala> incViewDF.registerTempTable("hudi_incr_table") warning: there was one deprecation warning; re-run with -deprecation for details scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_incr_table where fare > 20.0").show() +-------------------+------------------+--------------------+-------------------+---+ |_hoodie_commit_time| fare| begin_lon| begin_lat| ts| +-------------------+------------------+--------------------+-------------------+---+ | 20200331153224|49.527694252432056| 0.5142184937933181| 0.7340133901254792|0.0| | 20200331153224| 98.3428192817987| 0.3349917833248327| 0.4777395067707303|0.0| | 20200331153224| 90.9053809533154| 0.19949323322922063|0.18294079059016366|0.0| | 20200331153224| 90.25710109008239| 0.4006983139989222|0.08528650347654165|0.0| | 20200331153224| 29.47661370147079|0.010872312870502165| 0.1593867607188556|0.0| | 20200331153224| 63.72504913279929| 0.888493603696927| 0.6570857443423376|0.0| +-------------------+------------------+--------------------+-------------------+---+ 这将提供在开始时间提交之后发生的所有更改,其中包含票价大于20.0的过滤器。关于此功能的独特之处在于,它现在使您可以在批量数据上创作流式管道。8.4.6、特定时间点查询让我们看一下如何查询特定时间的数据。可以通过将结束时间指向特定的提交时间,将开始时间指向”000”(表示最早的提交时间)来表示特定时间。scala> val beginTime = "000" // Represents all commits > this time. 
beginTime: String = 000 scala> val endTime = commits(commits.length - 2) // commit time we are interested in endTime: String = 20200331152807 scala> scala> // 增量查询数据 scala> val incViewDF = spark.read.format("org.apache.hudi"). | option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL). | option(BEGIN_INSTANTTIME_OPT_KEY, beginTime). | option(END_INSTANTTIME_OPT_KEY, endTime). | load(basePath); 20/03/31 15:36:00 WARN hudi.DefaultSource: hoodie.datasource.view.type is deprecated and will be removed in a later release. Please use hoodie.datasource.query.type incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields] scala> incViewDF.registerTempTable("hudi_incr_table") warning: there was one deprecation warning; re-run with -deprecation for details scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_incr_table where fare > 20.0").show() +-------------------+------------------+-------------------+-------------------+---+ |_hoodie_commit_time| fare| begin_lon| begin_lat| ts| +-------------------+------------------+-------------------+-------------------+---+ | 20200331152807| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0| | 20200331152807| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0| | 20200331152807| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0| | 20200331152807| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0| | 20200331152807|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0| | 20200331152807| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0| | 20200331152807| 43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0| | 20200331152807| 41.06290929046368| 0.8192868687714224| 0.651058505660742|0.0| +-------------------+------------------+-------------------+-------------------+---+
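最后做个小结：8.4.5的增量查询和8.4.6的特定时间点查询用的是同一组读取选项，只差一个结束时间。下面给出一个示意性的封装（函数名incrementalRead为本文自拟，沿用前文spark-shell中已导入的DataSourceReadOptions常量和basePath变量，仅供参考）：
import org.apache.spark.sql.DataFrame

// 示意：把增量查询 / 特定时间点查询封装成一个函数
def incrementalRead(beginTime: String, endTime: Option[String] = None): DataFrame = {
  var reader = spark.read.format("org.apache.hudi").
    option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
    option(BEGIN_INSTANTTIME_OPT_KEY, beginTime)
  // 只有做特定时间点查询时才需要指定结束时间
  endTime.foreach(t => reader = reader.option(END_INSTANTTIME_OPT_KEY, t))
  reader.load(basePath)
}

// 增量查询：给定提交时间之后的所有变更
incrementalRead("20200331152807").createOrReplaceTempView("hudi_incr_table")
// 特定时间点查询："000" 表示最早的提交
incrementalRead("000", Some("20200331152807")).createOrReplaceTempView("hudi_incr_table")
这样无论是增量拉取还是回看某个历史区间，都只需要换一下参数。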
文章
SQL  ·  消息中间件  ·  分布式计算  ·  Hadoop  ·  Java  ·  Linux  ·  Scala  ·  Maven  ·  HIVE  ·  Spark
2023-03-24
记一次 rr 和硬件断点解决内存踩踏问题
在日常的调试过程中,我们总会遇到一些有趣的 bug,在本文我就遇到了一个有意思的查询结果不一致问题。故事的开始我们在测试 NebulaGraph 的 MATCH 语句的时候发现一个很神奇的事情:(root@nebula) [gdlancer]> match (v1)-[e*1..1]->(v2) where id(v1) in [1, 2, 3, 4] and (v2)-[e*1..1]->(v1) return e; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | e | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | [[:Rel_5 2->2 @0 {Rel_5_0_Bool: true, Rel_5_1_Bool: true, Rel_5_2_Double: 0.533698, Rel_5_3_String: "Stephen Curry", Rel_5_4_Double: 0.162998}]] | | [[:Rel_1 2->2 @0 {Rel_1_0_Int: 3, Rel_1_1_Int: 5, Rel_1_2_Int: 81, Rel_1_3_Double: 0.975062, Rel_1_4_Bool: true, Rel_1_5_Int: 59}]] | | [[:Rel_0 2->2 @0 {Rel_0_0_Bool: true, Rel_0_1_String: "Kevin Durant", Rel_0_2_String: "Joel Embiid", Rel_0_3_Int: 96, Rel_0_4_Double: 0.468568, Rel_0_5_Int: 98, Rel_0_6_Int: 77}]] | | [[:Rel_2 2->2 @0 {Rel_2_0_Int: 38, Rel_2_1_Double: 0.120953, Rel_2_2_String: "Null1", Rel_2_3_Bool: false, Rel_2_4_Bool: true, Rel_2_5_Int: 6, Rel_2_6_String: "Tracy McGrady"}]] | | [[:Rel_3 2->2 @0 {Rel_3_0_String: "Aron Baynes", Rel_3_1_String: "LeBron James", Rel_3_2_Double: 0.831096, Rel_3_3_Int: 11}]] | | [[:Rel_4 2->2 @0 {Rel_4_0_Bool: true, Rel_4_1_String: "Kevin Durant", Rel_4_2_Double: 0.71757, Rel_4_3_String: "Marc Gasol", Rel_4_4_Double: 0.285247, Rel_4_5_String: "Cory Joseph"}]] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ Got 6 rows (time spent 146.7ms/168.31625ms) Tue, 03 Jan 2023 14:10:03 CST (root@nebula) [gdlancer]> match (v1)-[e*1..1]->(v2) where id(v1) in [1, 2, 3, 4] and (v2)-[e*1..1]->(v1) return e; +---+ | e | +---+ +---+ Empty set (time spent 30.67ms/58.220042ms) Tue, 03 Jan 2023 14:10:05 CST同样的语句,两次查询的结果集居然不一样!开始 Debugprofile 出问题的语句:(root@nebula) [gdlancer]> profile match (v1)-[e*1..1]->(v2) where id(v1) in [1, 2, 3, 4] and (v2)-[e*1..1]->(v1) return e; +---+ | e | +---+ +---+ Empty set (time spent 18.755ms/79.84375ms) Execution Plan (optimize time 1656 us) -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | id | name | dependencies | profiling data | operator info | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 12 | Project | 11 | ver: 0, rows: 0, execTime: 17us, totalTime: 19us | outputVar: { | | | | | | "colNames": [ | | | | | | "e" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__Project_12" | | | | | | } | | | | | | inputVar: __Filter_11 | | | | | | columns: [ | | | | | | "$e" | | | | | | ] | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 11 | Filter | 10 | ver: 0, rows: 0, execTime: 26us, totalTime: 29us | outputVar: { | | | | | | "colNames": [ | | | | | | "v1", | | | | | | "e", | | | | | | "v2" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__Filter_11" | | | | | | } | | | | | | inputVar: __PatternApply_10 | | | | | | condition: ((id($v1)==1) OR (id($v1)==2) OR 
(id($v1)==3) OR (id($v1)==4)) | | | | | | isStable: false | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 10 | PatternApply | 6,9 | ver: 0, rows: 0, execTime: 84us, totalTime: 87us | outputVar: { | | | | | | "colNames": [ | | | | | | "v1", | | | | | | "e", | | | | | | "v2" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__PatternApply_10" | | | | | | } | | | | | | inputVar: { | | | | | | "rightVar": "__AppendVertices_9", | | | | | | "leftVar": "__Project_6" | | | | | | } | | | | | | keyCols: [ | | | | | | "id($-.v2)", | | | | | | "id($-.v1)" | | | | | | ] | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 6 | Project | 5 | ver: 0, rows: 18, execTime: 103us, totalTime: 106us | outputVar: { | | | | | | "colNames": [ | | | | | | "v1", | | | | | | "e", | | | | | | "v2" | | | | | | ], | | | | | | "name": "__Project_6", | | | | | | "type": "DATASET" | | | | | | } | | | | | | inputVar: __AppendVertices_5 | | | | | | columns: [ | | | | | | "$-.v1 AS v1", | | | | | | "[__VAR_2 IN $-.e WHERE is_edge($__VAR_2)] AS e", | | | | | | "$-.v2 AS v2" | | | | | | ] | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 5 | AppendVertices | 4 | { | outputVar: { | | | | | ver: 0, rows: 18, execTime: 438us, totalTime: 1613us | "colNames": [ | | | | | resp[2]: { | "v1", | | | | | "exec": "308(us)", | "e", | | | | | "host": "store1:9779", | "v2" | | | | | "total": "1114(us)" | ], | | | | | } | "name": "__AppendVertices_5", | | | | | total_rpc: 1350(us) | "type": "DATASET" | | | | | resp[0]: { | } | | | | | "exec": "356(us)", | inputVar: __Traverse_4 | | | | | "host": "store3:9779", | space: 8 | | | | | "total": "1248(us)" | dedup: true | | | | | } | limit: -1 | | | | | resp[1]: { | filter: | | | | | "exec": "323(us)", | orderBy: [] | | | | | "host": "store2:9779", | src: none_direct_dst($-.e) | | | | | "total": "966(us)" | props: [ | | | | | } | { | | | | | } | "tagId": 13, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 12, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "props": [ | | | | | | "_tag" | | | | | | ], | | | | | | "tagId": 11 | | | | | | }, | | | | | | { | | | | | | "tagId": 9, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 10, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "props": [ | | | | | | "_tag" | | | | | | ], | | | | | | "tagId": 14 | | | | | | }, | | | | | | { | | | | | | "props": [ | | | | | | "_tag" | | | | | | ], | | | | | | "tagId": 15 | | | | | | } | | | | | | ] | | | | | | exprs: | | | | | | vertex_filter: | | | | | | if_track_previous_path: true | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 4 | Traverse | 2 | { | outputVar: { | | | | | ver: 0, rows: 18, execTime: 597us, totalTime: 2943us | "colNames": [ | | | | | step[1]: [ | "v1", | | | | | { | "e" | | | | | "exec": "811(us)", | ], | | | | | "host": "store2:9779", | "name": "__Traverse_4", | | | | | 
"storage_detail": { | "type": "DATASET" | | | | | "GetNeighborsNode": "551(us)", | } | | | | | "HashJoinNode": "415(us)", | inputVar: __Dedup_2 | | | | | "RelNode": "551(us)", | space: 8 | | | | | "SingleEdgeNode": "391(us)" | dedup: true | | | | | }, | limit: -1 | | | | | "total": "2139(us)", | filter: | | | | | "total_rpc_time": "2328(us)", | orderBy: [] | | | | | "vertices": 2 | src: $-._vid | | | | | }, | edgeTypes: [] | | | | | { | edgeDirection: OUT_EDGE | | | | | "exec": "769(us)", | vertexProps: | | | | | "host": "store1:9779", | edgeProps: [ | | | | | "storage_detail": { | { | | | | | "GetNeighborsNode": "259(us)", | "type": 21, | | | | | "HashJoinNode": "177(us)", | "props": [ | | | | | "RelNode": "259(us)", | "_src", | | | | | "SingleEdgeNode": "161(us)" | "_type", | | | | | }, | "_rank", | | | | | "total": "1938(us)", | "_dst", | | | | | "total_rpc_time": "2328(us)", | "Rel_5_0_Bool", | | | | | "vertices": 1 | "Rel_5_1_Bool", | | | | | }, | "Rel_5_4_Double", | | | | | { | "Rel_5_3_String", | | | | | "exec": "699(us)", | "Rel_5_2_Double" | | | | | "host": "store6:9779", | ] | | | | | "storage_detail": { | }, | | | | | "GetNeighborsNode": "161(us)", | { | | | | | "HashJoinNode": "152(us)", | "props": [ | | | | | "RelNode": "162(us)", | "_src", | | | | | "SingleEdgeNode": "142(us)" | "_type", | | | | | }, | "_rank", | | | | | "total": "1735(us)", | "_dst", | | | | | "total_rpc_time": "2328(us)", | "Rel_1_0_Int", | | | | | "vertices": 1 | "Rel_1_3_Double", | | | | | } | "Rel_1_2_Int", | | | | | ] | "Rel_1_4_Bool", | | | | | } | "Rel_1_5_Int", | | | | | | "Rel_1_1_Int" | | | | | | ], | | | | | | "type": 17 | | | | | | }, | | | | | | { | | | | | | "type": 16, | | | | | | "props": [ | | | | | | "_src", | | | | | | "_type", | | | | | | "_rank", | | | | | | "_dst", | | | | | | "Rel_0_6_Int", | | | | | | "Rel_0_0_Bool", | | | | | | "Rel_0_3_Int", | | | | | | "Rel_0_2_String", | | | | | | "Rel_0_4_Double", | | | | | | "Rel_0_1_String", | | | | | | "Rel_0_5_Int" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "type": 18, | | | | | | "props": [ | | | | | | "_src", | | | | | | "_type", | | | | | | "_rank", | | | | | | "_dst", | | | | | | "Rel_2_3_Bool", | | | | | | "Rel_2_1_Double", | | | | | | "Rel_2_4_Bool", | | | | | | "Rel_2_5_Int", | | | | | | "Rel_2_2_String", | | | | | | "Rel_2_6_String", | | | | | | "Rel_2_0_Int" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "type": 19, | | | | | | "props": [ | | | | | | "_src", | | | | | | "_type", | | | | | | "_rank", | | | | | | "_dst", | | | | | | "Rel_3_0_String", | | | | | | "Rel_3_3_Int", | | | | | | "Rel_3_1_String", | | | | | | "Rel_3_2_Double" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "props": [ | | | | | | "_src", | | | | | | "_type", | | | | | | "_rank", | | | | | | "_dst", | | | | | | "Rel_4_0_Bool", | | | | | | "Rel_4_2_Double", | | | | | | "Rel_4_5_String", | | | | | | "Rel_4_1_String", | | | | | | "Rel_4_4_Double", | | | | | | "Rel_4_3_String" | | | | | | ], | | | | | | "type": 20 | | | | | | } | | | | | | ] | | | | | | statProps: | | | | | | exprs: | | | | | | random: false | | | | | | steps: 1..1 | | | | | | vertex filter: | | | | | | edge filter: | | | | | | if_track_previous_path: false | | | | | | first step filter: | | | | | | tag filter: | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 2 | Dedup | 1 | ver: 0, rows: 4, execTime: 16us, totalTime: 18us | outputVar: { | | | | | | 
"colNames": [ | | | | | | "_vid" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__Dedup_2" | | | | | | } | | | | | | inputVar: __VAR_1 | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 1 | PassThrough | 3 | ver: 0, rows: 0, execTime: 14us, totalTime: 19us | outputVar: { | | | | | | "colNames": [ | | | | | | "_vid" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__VAR_1" | | | | | | } | | | | | | inputVar: | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 3 | Start | | ver: 0, rows: 0, execTime: 2us, totalTime: 29us | outputVar: { | | | | | | "colNames": [], | | | | | | "type": "DATASET", | | | | | | "name": "__Start_3" | | | | | | } | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 9 | AppendVertices | 8 | ver: 0, rows: 0, execTime: 46us, totalTime: 50us | outputVar: { | | | | | | "colNames": [ | | | | | | "v2", | | | | | | "e", | | | | | | "v1" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__AppendVertices_9" | | | | | | } | | | | | | inputVar: __Traverse_8 | | | | | | space: 8 | | | | | | dedup: true | | | | | | limit: -1 | | | | | | filter: | | | | | | orderBy: [] | | | | | | src: none_direct_dst($-.e) | | | | | | props: [ | | | | | | { | | | | | | "props": [ | | | | | | "_tag" | | | | | | ], | | | | | | "tagId": 13 | | | | | | }, | | | | | | { | | | | | | "props": [ | | | | | | "_tag" | | | | | | ], | | | | | | "tagId": 12 | | | | | | }, | | | | | | { | | | | | | "tagId": 11, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 9, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 10, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 14, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | }, | | | | | | { | | | | | | "tagId": 15, | | | | | | "props": [ | | | | | | "_tag" | | | | | | ] | | | | | | } | | | | | | ] | | | | | | exprs: | | | | | | vertex_filter: | | | | | | if_track_previous_path: true | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 8 | Traverse | 7 | { | outputVar: { | | | | | ver: 0, rows: 0, execTime: 4867us, totalTime: 9173us | "colNames": [ | | | | | step[2]: [ | "v2", | | | | | { | "e" | | | | | "exec": "488(us)", | ], | | | | | "host": "store2:9779", | "type": "DATASET", | | | | | "storage_detail": { | "name": "__Traverse_8" | | | | | "GetNeighborsNode": "371(us)", | } | | | | | "HashJoinNode": "261(us)", | inputVar: __Argument_7 | | | | | "RelNode": "371(us)", | space: 8 | | | | | "SingleEdgeNode": "243(us)" | dedup: true | | | | | }, | limit: -1 | | | | | "total": "1509(us)", | filter: | | | | | "total_rpc_time": "1948(us)", | orderBy: [] | | | | | "vertices": 2 | src: id($-.v2) | | | | | }, | edgeTypes: [] | | | | | { | edgeDirection: OUT_EDGE | | | | | "exec": "331(us)", | vertexProps: | | | | | "host": "store3:9779", | edgeProps: [ | | | | | "storage_detail": { | { | | | | | "GetNeighborsNode": "86(us)", | "type": 21, | | | | | 
"HashJoinNode": "63(us)", | "props": [ | | | | | "RelNode": "86(us)", | "_src", | | | | | "SingleEdgeNode": "54(us)" | "_type", | | | | | }, | "_rank", | | | | | "total": "1208(us)", | "_dst", | | | | | "total_rpc_time": "1948(us)", | "Rel_5_0_Bool", | | | | | "vertices": 1 | "Rel_5_1_Bool", | | | | | }, | "Rel_5_4_Double", | | | | | { | "Rel_5_3_String", | | | | | "exec": "686(us)", | "Rel_5_2_Double" | | | | | "host": "store5:9779", | ] | | | | | "storage_detail": { | }, | | | | | "GetNeighborsNode": "311(us)", | { | | | | | "HashJoinNode": "254(us)", | "props": [ | | | | | "RelNode": "311(us)", | "_src", | | | | | "SingleEdgeNode": "237(us)" | "_type", | | | | | }, | "_rank", | | | | | "total": "1532(us)", | "_dst", | | | | | "total_rpc_time": "1948(us)", | "Rel_1_0_Int", | | | | | "vertices": 2 | "Rel_1_3_Double", | | | | | }, | "Rel_1_2_Int", | | | | | { | "Rel_1_4_Bool", | | | | | "exec": "467(us)", | "Rel_1_5_Int", | | | | | "host": "store6:9779", | "Rel_1_1_Int" | | | | | "storage_detail": { | ], | | | | | "GetNeighborsNode": "173(us)", | "type": 17 | | | | | "HashJoinNode": "124(us)", | }, | | | | | "RelNode": "173(us)", | { | | | | | "SingleEdgeNode": "115(us)" | "type": 16, | | | | | }, | "props": [ | | | | | "total": "1368(us)", | "_src", | | | | | "total_rpc_time": "1948(us)", | "_type", | | | | | "vertices": 1 | "_rank", | | | | | }, | "_dst", | | | | | { | "Rel_0_6_Int", | | | | | "exec": "494(us)", | "Rel_0_0_Bool", | | | | | "host": "store1:9779", | "Rel_0_3_Int", | | | | | "storage_detail": { | "Rel_0_2_String", | | | | | "GetNeighborsNode": "238(us)", | "Rel_0_4_Double", | | | | | "HashJoinNode": "147(us)", | "Rel_0_1_String", | | | | | "RelNode": "239(us)", | "Rel_0_5_Int" | | | | | "SingleEdgeNode": "137(us)" | ] | | | | | }, | }, | | | | | "total": "1246(us)", | { | | | | | "total_rpc_time": "1948(us)", | "type": 18, | | | | | "vertices": 1 | "props": [ | | | | | } | "_src", | | | | | ] | "_type", | | | | | step[3]: [ | "_rank", | | | | | { | "_dst", | | | | | "exec": "643(us)", | "Rel_2_3_Bool", | | | | | "host": "store5:9779", | "Rel_2_1_Double", | | | | | "storage_detail": { | "Rel_2_4_Bool", | | | | | "GetNeighborsNode": "432(us)", | "Rel_2_5_Int", | | | | | "HashJoinNode": "296(us)", | "Rel_2_2_String", | | | | | "RelNode": "433(us)", | "Rel_2_6_String", | | | | | "SingleEdgeNode": "272(us)" | "Rel_2_0_Int" | | | | | }, | ] | | | | | "total": "1556(us)", | }, | | | | | "total_rpc_time": "1913(us)", | { | | | | | "vertices": 3 | "type": 19, | | | | | }, | "props": [ | | | | | { | "_src", | | | | | "exec": "581(us)", | "_type", | | | | | "host": "store6:9779", | "_rank", | | | | | "storage_detail": { | "_dst", | | | | | "GetNeighborsNode": "255(us)", | "Rel_3_0_String", | | | | | "HashJoinNode": "162(us)", | "Rel_3_3_Int", | | | | | "RelNode": "256(us)", | "Rel_3_1_String", | | | | | "SingleEdgeNode": "151(us)" | "Rel_3_2_Double" | | | | | }, | ] | | | | | "total": "1612(us)", | }, | | | | | "total_rpc_time": "1913(us)", | { | | | | | "vertices": 1 | "type": 20, | | | | | }, | "props": [ | | | | | { | "_src", | | | | | "exec": "373(us)", | "_type", | | | | | "host": "store2:9779", | "_rank", | | | | | "storage_detail": { | "_dst", | | | | | "GetNeighborsNode": "124(us)", | "Rel_4_0_Bool", | | | | | "HashJoinNode": "93(us)", | "Rel_4_2_Double", | | | | | "RelNode": "124(us)", | "Rel_4_5_String", | | | | | "SingleEdgeNode": "84(us)" | "Rel_4_1_String", | | | | | }, | "Rel_4_4_Double", | | | | | "total": "1285(us)", | "Rel_4_3_String" | | | | | "total_rpc_time": 
"1913(us)", | ] | | | | | "vertices": 1 | } | | | | | }, | ] | | | | | { | statProps: | | | | | "exec": "502(us)", | exprs: | | | | | "host": "store7:9779", | random: false | | | | | "storage_detail": { | steps: 4..3 | | | | | "GetNeighborsNode": "157(us)", | vertex filter: | | | | | "HashJoinNode": "132(us)", | edge filter: | | | | | "RelNode": "157(us)", | if_track_previous_path: false | | | | | "SingleEdgeNode": "123(us)" | first step filter: | | | | | }, | tag filter: | | | | | "total": "1295(us)", | | | | | | "total_rpc_time": "1913(us)", | | | | | | "vertices": 1 | | | | | | } | | | | | | ] | | | | | | step[1]: [ | | | | | | { | | | | | | "exec": "522(us)", | | | | | | "host": "store3:9779", | | | | | | "storage_detail": { | | | | | | "GetNeighborsNode": "361(us)", | | | | | | "HashJoinNode": "272(us)", | | | | | | "RelNode": "361(us)", | | | | | | "SingleEdgeNode": "253(us)" | | | | | | }, | | | | | | "total": "1534(us)", | | | | | | "total_rpc_time": "1702(us)", | | | | | | "vertices": 2 | | | | | | }, | | | | | | { | | | | | | "exec": "445(us)", | | | | | | "host": "store2:9779", | | | | | | "storage_detail": { | | | | | | "GetNeighborsNode": "185(us)", | | | | | | "HashJoinNode": "77(us)", | | | | | | "RelNode": "185(us)", | | | | | | "SingleEdgeNode": "69(us)" | | | | | | }, | | | | | | "total": "1296(us)", | | | | | | "total_rpc_time": "1702(us)", | | | | | | "vertices": 1 | | | | | | }, | | | | | | { | | | | | | "exec": "529(us)", | | | | | | "host": "store1:9779", | | | | | | "storage_detail": { | | | | | | "GetNeighborsNode": "245(us)", | | | | | | "HashJoinNode": "155(us)", | | | | | | "RelNode": "245(us)", | | | | | | "SingleEdgeNode": "146(us)" | | | | | | }, | | | | | | "total": "1276(us)", | | | | | | "total_rpc_time": "1702(us)", | | | | | | "vertices": 1 | | | | | | } | | | | | | ] | | | | | | } | | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- | 7 | Argument | | ver: 0, rows: 4, execTime: 0us, totalTime: 55us | outputVar: { | | | | | | "colNames": [ | | | | | | "v2" | | | | | | ], | | | | | | "type": "DATASET", | | | | | | "name": "__Argument_7" | | | | | | } | | | | | | inputVar: __Project_6 | -----+----------------+--------------+------------------------------------------------------+---------------------------------------------------------------------------- Tue, 03 Jan 2023 15:54:06 CST我们发现执行计划中编号 8 的节点对应的 step range 明显是错的:steps: 4..3(结合语句中的 e*1..1,正确值应该是 1..1)。下面,我们通过 rr 录下一次错误的执行过程,然后在设置 MatchStepRange 的地方(src/graph/planner/match/MatchPathPlanner.cpp:282)下断点:[New Thread 34.61] [Switching to Thread 34.47] Thread 3 hit Breakpoint 3, nebula::graph::MatchPathPlanner::rightExpandFromNode (this=0x18e60dc5c58, startIndex=0, subplan=...) at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:282 282 traverse->setStepRange(edge.range); (rr) bt #0 nebula::graph::MatchPathPlanner::rightExpandFromNode (this=0x18e60dc5c58, startIndex=0, subplan=...) at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:282 #1 0x00000000033198dd in nebula::graph::MatchPathPlanner::expandFromNode (this=0x18e60dc5c58, startIndex=0, subplan=...) at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:169 #2 0x000000000331937b in nebula::graph::MatchPathPlanner::expand (this=0x18e60dc5c58, startFromEdge=false, startIndex=0, subplan=...) 
at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:158 #3 0x00000000033185ca in nebula::graph::MatchPathPlanner::transform (this=0x18e60dc5c58, bindWhere=0x69433b909880, nodeAliasesSeen=...) at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:78 #4 0x0000000003305b16 in nebula::graph::MatchClausePlanner::transform (this=0x69433bbd64b0, clauseCtx=0x7f25a94881c0) at /data/src/nebula-comm/src/graph/planner/match/MatchClausePlanner.cpp:33 #5 0x0000000003302a11 in nebula::graph::MatchPlanner::genPlan (this=0x69433b9998b0, clauseCtx=0x7f25a94881c0) at /data/src/nebula-comm/src/graph/planner/match/MatchPlanner.cpp:42 #6 0x0000000003303030 in nebula::graph::MatchPlanner::connectMatchPlan (this=0x69433b9998b0, queryPlan=..., matchCtx=0x7f25a94881c0) at /data/src/nebula-comm/src/graph/planner/match/MatchPlanner.cpp:63 #7 0x0000000003302101 in nebula::graph::MatchPlanner::genQueryPartPlan (this=0x69433b9998b0, qctx=0x7f25a94672a0, queryPlan=..., queryPart=...) at /data/src/nebula-comm/src/graph/planner/match/MatchPlanner.cpp:137 #8 0x0000000003301e84 in nebula::graph::MatchPlanner::transform (this=0x69433b9998b0, astCtx=0x7f25a95eb700) at /data/src/nebula-comm/src/graph/planner/match/MatchPlanner.cpp:33 #9 0x00000000032e40b1 in nebula::graph::Planner::toPlan (astCtx=0x7f25a95eb700) at /data/src/nebula-comm/src/graph/planner/Planner.cpp:38 #10 0x00000000030d6b7e in nebula::graph::Validator::toPlan (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:401 #11 0x00000000030d5b93 in nebula::graph::Validator::validate (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:364 #12 0x0000000003114880 in nebula::graph::SequentialValidator::validateImpl (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/SequentialValidator.cpp:40 #13 0x00000000030d58e7 in nebula::graph::Validator::validate (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:354 #14 0x00000000030d4c8f in nebula::graph::Validator::validate (sentence=0x69433bc58870, qctx=0x7f25a94672a0) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:285 #15 0x0000000002ff4841 in nebula::graph::QueryInstance::validateAndOptimize (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:102 #16 0x0000000002ff3920 in nebula::graph::QueryInstance::execute (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:42 #17 0x0000000002fe9219 in nebula::graph::QueryEngine::execute (this=0x69433bc603c0, rctx=...) at /data/src/nebula-comm/src/graph/service/QueryEngine.cpp:57 #18 0x0000000002f6145f in nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1::operator()(nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >) (this=0x7f25a9452460, ret=...) 
at /data/src/nebula-comm/src/graph/service/GraphService.cpp:183 #19 0x0000000002f60626 in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}::operator()() const (this=0x18e60dc8120) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:98 #20 0x0000000002f605bf in folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}) (fn=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:90 #21 0x0000000002f6057c in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) ( t=..., f=...) 
at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:108 #22 0x0000000002f604cf in folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&) (this=0x7f25a9452460, t=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:991 #23 0x0000000002f6046e in folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}>::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > > >(folly::Executor::KeepAlive<folly::Executor>&&, 
folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&) (this=0x7f25a9452460, args=..., args=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:144
#24 0x0000000002f603cb in folly::futures::detail::detail_msvc_15_7_workaround::invoke<folly::futures::detail::tryExecutorCallableResult<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::alloc--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(rr) p edge.range
$1 = (nebula::MatchStepRange *) 0x69433b948320
(rr) p *edge.range
$2 = {min_ = 1, max_ = 1}
(rr) c
Continuing.

Thread 3 hit Breakpoint 3, nebula::graph::MatchPathPlanner::rightExpandFromNode (this=0x18e60dc61f8, startIndex=0, subplan=...)
    at /data/src/nebula-comm/src/graph/planner/match/MatchPathPlanner.cpp:282
282         traverse->setStepRange(edge.range);
(rr) p edge.range
$3 = (nebula::MatchStepRange *) 0x69433b9998a0
(rr) p *edge.range
$4 = {min_ = 4, max_ = 3}
(rr) p &edge.range->min_
$5 = (size_t *) 0x69433b9998a0
(rr) watch *((size_t *) 0x69433b9998a0)
Hardware watchpoint 4: *((size_t *) 0x69433b9998a0)
(rr)

The second MatchStepRange that gets set (4..3) is clearly already corrupted. So we put a hardware watchpoint on the address of &edge.range->min_ (0x69433b9998a0) and re-run the program. We can then watch this address get freed and reused over and over until it is finally handed out to a MatchStepRange object:

0x0000000003319f1d  282         traverse->setStepRange(edge.range);
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/src/nebula-comm/build-debug/bin/nebula-graphd
Program stopped.
0x00007f25a9200100 in ?? () from /lib64/ld-linux-x86-64.so.2
(rr) c
Continuing.
[New Thread 34.39]
[New Thread 34.35]
[New Thread 34.36]
[New Thread 34.37]
[New Thread 34.38]
[Switching to Thread 34.39]

Thread 2 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0)

Old value = <unreadable>
New value = 0
0x0000000070000002 in syscall_traced ()
(rr) c
Continuing.
[New Thread 34.56]
[New Thread 34.57]
[New Thread 34.40]
[New Thread 34.41]
[New Thread 34.42]
[New Thread 34.43]
[New Thread 34.44]
[New Thread 34.45]
[New Thread 34.46]
[New Thread 34.47]
[New Thread 34.48]
[New Thread 34.49]
[New Thread 34.50]
[New Thread 34.51]
[New Thread 34.52]
[New Thread 34.53]
[New Thread 34.54]
[New Thread 34.55]
[New Thread 34.58]
[New Thread 34.59]
[New Thread 34.60]
[New Thread 34.61]
[Switching to Thread 34.57]

Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0)

Old value = 0
New value = 128849018911
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:315
315     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(rr) c Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 128849018911 New value = 0 __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:260 260 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 0 New value = 139799731034544 std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, long>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, long> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_bucket_begin (this=0x7f25a94ed9a0, __bkt=0, __node=0x7f25a94e1a80) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:1619 1619 { (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 139799731034544 New value = 0 __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:260 260 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 0 New value = 89457155694696 std::_Hashtable<int, std::pair<int const, std::vector<std::shared_ptr<nebula::meta::NebulaSchemaProvider const>, std::allocator<std::shared_ptr<nebula::meta::NebulaSchemaProvider const> > > >, std::allocator<std::pair<int const, std::vector<std::shared_ptr<nebula::meta::NebulaSchemaProvider const>, std::allocator<std::shared_ptr<nebula::meta::NebulaSchemaProvider const> > > > >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_bucket_begin (this=0x515c5dfce058, __bkt=0, __node=0x69433bc58e70) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:1619 1619 { (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 89457155694696 New value = 115737483335024 0x0000000004066efa in std::_Hashtable<int, std::pair<int const, std::vector<std::shared_ptr<nebula::meta::NebulaSchemaProvider const>, std::allocator<std::shared_ptr<nebula::meta::NebulaSchemaProvider const> > > >, std::allocator<std::pair<int const, std::vector<std::shared_ptr<nebula::meta::NebulaSchemaProvider const>, std::allocator<std::shared_ptr<nebula::meta::NebulaSchemaProvider const> > > > >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_insert_bucket_begin (this=0x515c5dfce058, __bkt=1, __node=0x69433b94d570) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:1616 1616 _H1, _H2, _Hash, _RehashPolicy, _Traits>:: (rr) Continuing. 
Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 115737483335024 New value = 0 0x0000000003153d10 in std::__shared_ptr<nebula::meta::NebulaSchemaProvider const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=0x69433b9998a0) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1119 1119 : _M_ptr(0), _M_refcount() (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 0 New value = 139799732183408 std::swap<nebula::meta::NebulaSchemaProvider const*> (__a=@0x515c5dfcd878: 0x0, __b=@0x69433b9998a0: 0x7f25a9606170) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/move.h:196 196 } (rr) Continuing. Thread 8 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 139799732183408 New value = 0 __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:260 260 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. (rr) Continuing. [Switching to Thread 34.47] Thread 16 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 0 New value = 98202129293344 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:315 315 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory. (rr) Continuing. Thread 16 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 98202129293344 New value = 1 0x00000000031e675d in std::make_unique<nebula::MatchStepRange, nebula::MatchStepRange&> (__args=...) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:857 857 { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); } (rr) bt #0 0x00000000031e675d in std::make_unique<nebula::MatchStepRange, nebula::MatchStepRange&> (__args=...) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:857 #1 0x00000000031e586a in nebula::MatchEdge::clone (this=0x69433b919530) at /data/src/nebula-comm/src/parser/MatchPath.h:128 #2 0x00000000031e082d in nebula::MatchPath::clone (this=0x7f25a94d3b00) at /data/src/nebula-comm/src/parser/MatchPath.h:363 #3 0x00000000031e0161 in nebula::graph::extractSinglePathPredicate (expr=0x595077086880, pathPreds=...) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:1122 #4 0x00000000031e0b43 in nebula::graph::extractMultiPathPredicate (expr=0x595077085e60, pathPreds=...) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:1158 #5 0x00000000031d87aa in nebula::graph::MatchValidator::validatePathInWhere (this=0x69433bbd0f80, wctx=..., availableAliases=..., paths=...) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:1182 #6 0x00000000031cfddc in nebula::graph::MatchValidator::validateFilter (this=0x69433bbd0f80, filter=0x595077006eb0, whereClauseCtx=...) 
at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:361 #7 0x00000000031cdd60 in nebula::graph::MatchValidator::validateImpl (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:66 #8 0x00000000030d58e7 in nebula::graph::Validator::validate (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:354 #9 0x0000000003114880 in nebula::graph::SequentialValidator::validateImpl (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/SequentialValidator.cpp:40 #10 0x00000000030d58e7 in nebula::graph::Validator::validate (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:354 #11 0x00000000030d4c8f in nebula::graph::Validator::validate (sentence=0x69433bc58870, qctx=0x7f25a94672a0) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:285 #12 0x0000000002ff4841 in nebula::graph::QueryInstance::validateAndOptimize (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:102 #13 0x0000000002ff3920 in nebula::graph::QueryInstance::execute (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:42 #14 0x0000000002fe9219 in nebula::graph::QueryEngine::execute (this=0x69433bc603c0, rctx=...) at /data/src/nebula-comm/src/graph/service/QueryEngine.cpp:57 #15 0x0000000002f6145f in nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1::operator()(nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >) (this=0x7f25a9452460, ret=...) 
at /data/src/nebula-comm/src/graph/service/GraphService.cpp:183 #16 0x0000000002f60626 in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}::operator()() const (this=0x18e60dc8120) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:98 #17 0x0000000002f605bf in folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}) (fn=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:90 #18 0x0000000002f6057c in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) ( t=..., f=...) 
at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:108 #19 0x0000000002f604cf in folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&) (this=0x7f25a9452460, t=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:991 #20 0x0000000002f6046e in folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}>::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > > >(folly::Executor::KeepAlive<folly::Executor>&&, 
folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&) (this=0x7f25a9452460, args=..., args=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:144
#21 0x0000000002f603cb in folly::futures::detail::detail_msvc_15_7_workaround::invoke<folly::futures::detail::tryExecutorCallableResult<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}, void>, folly::futures::detail::CoreCallbackState<folly::Unit, {lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}>, nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, 0>(folly::futures::detail::tryExecutorCallableResult<nebula::StatusOr< --Type <RET> for more, q to quit, c to continue without paging--q
Quit
(rr)

We can see that the memory holding the problematic MatchStepRange was allocated with make_unique and initialized at src/parser/MatchPath.h:128:

119
120   MatchEdge clone() const {
121     auto me = MatchEdge();
122     me.direction_ = direction_;
123     me.alias_ = alias_;
124     for (const auto& type : types_) {
125       me.types_.emplace_back(std::make_unique<std::string>(*DCHECK_NOTNULL(type)));
126     }
127     if (range_ != nullptr) {
128       me.range_ = std::make_unique<MatchStepRange>(*range_);
129     }
130     if (props_ != nullptr) {
131       me.props_ = static_cast<MapExpression*>(props_->clone());
132     }
133     return me;
134   }
135
136  private:
137   Direction direction_;
138   std::string alias_;

We keep hitting continue to see when the data at this memory location gets clobbered with the wrong value:

Continuing.
Thread 16 hit Hardware watchpoint 4: *((size_t *) 0x69433b9998a0) Old value = 1 New value = 0 0x0000000002ec2901 in std::__detail::_Hash_node_base::_Hash_node_base (this=0x69433b9998a0) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:218 218 _Hash_node_base() noexcept : _M_nxt() { } (rr) bt #0 0x0000000002ec2901 in std::__detail::_Hash_node_base::_Hash_node_base (this=0x69433b9998a0) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:218 #1 0x0000000002ed1191 in std::__detail::_Hash_node_value_base<nebula::Expression::Kind>::_Hash_node_value_base (this=0x69433b9998a0) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:229 #2 0x0000000002ed10c1 in std::__detail::_Hash_node<nebula::Expression::Kind, false>::_Hash_node (this=0x69433b9998a0) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:279 #3 0x0000000002ed0f9b in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<nebula::Expression::Kind, false> > >::_M_allocate_node<nebula::Expression::Kind const&> (this=0x18e60dc63b8, __args=@0x18e60dc63a6: nebula::Expression::Kind::kVertex) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:2085 #4 0x0000000002ed0cf3 in std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<nebula::Expression::Kind, false> > >::operator()<nebula::Expression::Kind const&> (this=0x18e60dc5ef8, __arg=@0x18e60dc63a6: nebula::Expression::Kind::kVertex) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:167 #5 0x0000000002ed0bc2 in std::_Hashtable<nebula::Expression::Kind, nebula::Expression::Kind, std::allocator<nebula::Expression::Kind>, std::__detail::_Identity, std::equal_to<nebula::Expression::Kind>, std::hash<nebula::Expression::Kind>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_insert<nebula::Expression::Kind const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<nebula::Expression::Kind, false> > > > ( this=0x18e60dc63b8, __v=@0x18e60dc63a6: nebula::Expression::Kind::kVertex, __node_gen=..., __n_elt=1) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:1852 #6 0x0000000002ed06cd in std::__detail::_Insert_base<nebula::Expression::Kind, nebula::Expression::Kind, std::allocator<nebula::Expression::Kind>, std::__detail::_Identity, std::equal_to<nebula::Expression::Kind>, std::hash<nebula::Expression::Kind>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::insert (this=0x18e60dc63b8, __v=@0x18e60dc63a6: nebula::Expression::Kind::kVertex) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:824 #7 0x0000000002ed042d in std::_Hashtable<nebula::Expression::Kind, nebula::Expression::Kind, std::allocator<nebula::Expression::Kind>, std::__detail::_Identity, std::equal_to<nebula::Expression::Kind>, std::hash<nebula::Expression::Kind>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_Hashtable<nebula::Expression::Kind const*> (this=0x18e60dc63b8, __f=0x18e60dc63a6, __l=0x18e60dc63a8, __bucket_hint=0, __h1=..., __h2=..., __h=..., __eq=..., __exk=..., __a=...) 
at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:1026 #8 0x0000000002ed01bc in std::_Hashtable<nebula::Expression::Kind, nebula::Expression::Kind, std::allocator<nebula::Expression::Kind>, std::__detail::_Identity, std::equal_to<nebula::Expression::Kind>, std::hash<nebula::Expression::Kind>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_Hashtable (this=0x18e60dc63b8, __l=..., __n=0, __hf=..., __eql=..., __a=...) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:497 #9 0x0000000002eae859 in std::unordered_set<nebula::Expression::Kind, std::hash<nebula::Expression::Kind>, std::equal_to<nebula::Expression::Kind>, std::allocator<nebula::Expression::Kind> >::unordered_set (this=0x18e60dc63b8, __l=..., __n=0, __hf=..., __eql=..., __a=...) at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unordered_set.h:226 #10 0x00000000031d2df0 in nebula::graph::MatchValidator::validateReturn (this=0x69433bbd0f80, ret=0x69433bc587b0, queryParts=..., retClauseCtx=...) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:471 #11 0x00000000031ce962 in nebula::graph::MatchValidator::validateImpl (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/MatchValidator.cpp:117 #12 0x00000000030d58e7 in nebula::graph::Validator::validate (this=0x69433bbd0f80) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:354 #13 0x0000000003114880 in nebula::graph::SequentialValidator::validateImpl (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/SequentialValidator.cpp:40 #14 0x00000000030d58e7 in nebula::graph::Validator::validate (this=0x69433bbd0c00) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:354 #15 0x00000000030d4c8f in nebula::graph::Validator::validate (sentence=0x69433bc58870, qctx=0x7f25a94672a0) at /data/src/nebula-comm/src/graph/validator/Validator.cpp:285 #16 0x0000000002ff4841 in nebula::graph::QueryInstance::validateAndOptimize (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:102 #17 0x0000000002ff3920 in nebula::graph::QueryInstance::execute (this=0x69433bc07640) at /data/src/nebula-comm/src/graph/service/QueryInstance.cpp:42 #18 0x0000000002fe9219 in nebula::graph::QueryEngine::execute (this=0x69433bc603c0, rctx=...) at /data/src/nebula-comm/src/graph/service/QueryEngine.cpp:57 #19 0x0000000002f6145f in nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1::operator()(nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >) (this=0x7f25a9452460, ret=...) 
at /data/src/nebula-comm/src/graph/service/GraphService.cpp:183 #20 0x0000000002f60626 in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}::operator()() const (this=0x18e60dc8120) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:98 #21 0x0000000002f605bf in folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&)::{lambda()#1}) (fn=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:90 #22 0x0000000002f6057c in folly::futures::detail::wrapInvoke<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> >, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&, nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) ( t=..., f=...) 
at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:108 #23 0x0000000002f604cf in folly::Future<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >::thenValue<nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1>(nebula::graph::GraphService::future_executeWithParameter(long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, nebula::Value, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nebula::Value> > > const&)::$_1&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::StatusOr<std::shared_ptr<nebula::graph::ClientSession> > >&&) (this=0x7f25a9452460, t=...) at /data/src/nebula-comm/build-debug/third-party/install/include/folly/futures/Future-inl.h:991 --Type <RET> for more, q to quit, c to continue without paging--q Quit (rr)非常神奇的是,这块内存在 src/graph/validator/MatchValidator.cpp:471 中又分配给某个 hash 表了: 449 // Check validity of return clause. 450 // Disable return * without symbols, disable invalid expressions, check aggregate expression, 451 // rewrite expression to fit semantic, check available aliases, check columns, check limit and 452 // order by options. 
453   Status MatchValidator::validateReturn(MatchReturn *ret,
454                                          const std::vector<QueryPart> &queryParts,
455                                          ReturnClauseContext &retClauseCtx) {
456     YieldColumns *columns = retClauseCtx.qctx->objPool()->makeAndAdd<YieldColumns>();
457     if (ret->returnItems()->allNamedAliases() && !queryParts.empty()) {
458       auto status = buildColumnsForAllNamedAliases(queryParts, columns);
459       if (!status.ok()) {
460         return status;
461       }
462       if (columns->empty() && !ret->returnItems()->columns()) {
463         return Status::SemanticError("RETURN * is not allowed when there are no variables in scope");
464       }
465     }
466     std::vector<const Expression *> exprs;
467     if (ret->returnItems()->columns()) {
468       exprs.reserve(ret->returnItems()->columns()->size());
469       for (auto *column : ret->returnItems()->columns()->columns()) {
470         if (ExpressionUtils::hasAny(column->expr(),
471                                     {Expression::Kind::kVertex, Expression::Kind::kEdge})) {
472           return Status::SemanticError(
473               "keywords: vertex and edge are not supported in return clause `%s'",
474               column->toString().c_str());
475         }
476         if (!retClauseCtx.yield->hasAgg_ &&
477             ExpressionUtils::hasAny(column->expr(), {Expression::Kind::kAggregate})) {
478           retClauseCtx.yield->hasAgg_ = true;
479         }
480         column->setExpr(ExpressionUtils::rewriteAttr2LabelTagProp(
481             column->expr(), retClauseCtx.yield->aliasesAvailable));
482         exprs.push_back(column->expr());
483         columns->addColumn(column->clone().release());
484       }
485     }
The only explanation is that the memory holding the MatchStepRange was freed and then handed out to someone else. So the thing to focus on is how that MatchStepRange memory gets allocated and released. Following the stack of the make_unique() call that allocated it:
1171
1172  Status MatchValidator::validatePathInWhere(
1173      WhereClauseContext &wctx,
1174      const std::unordered_map<std::string, AliasType> &availableAliases,
1175      std::vector<Path> &paths) {
1176    auto expr = ExpressionUtils::flattenInnerLogicalExpr(wctx.filter);
1177    auto *pool = qctx_->objPool();
1178    ValidatePatternExpressionVisitor visitor(pool, vctx_);
1179    expr->accept(&visitor);
1180    std::vector<MatchPath> pathPreds;
1181    // FIXME(czp): Delete this function and add new expression visitor to cover all general cases
1182    if (extractMultiPathPredicate(expr, pathPreds)) {
1183      wctx.filter = nullptr;
1184    } else {
1185      // Flatten and fold the inner logical expressions that already have operands that can be
1186      // compacted
1187      wctx.filter =
1188          ExpressionUtils::foldInnerLogicalExpr(ExpressionUtils::flattenInnerLogicalExpr(expr));
1189    }
1190    for (auto &pred : pathPreds) {
1191      NG_RETURN_IF_ERROR(checkMatchPathExpr(pred, availableAliases));
1192      // Build path alias
1193      auto pathAlias = pred.toString();
1194      pred.setAlias(new std::string(pathAlias));
1195      paths.emplace_back();
1196      NG_RETURN_IF_ERROR(validatePath(&pred, paths.back()));
1197      NG_RETURN_IF_ERROR(buildRollUpPathInfo(&pred, paths.back()));
1198    }
The MatchStepRange is allocated by line 1182 of graph/validator/MatchValidator.cpp and stored inside the local vector pathPreds. In NebulaGraph, a variable-length pattern such as :<edge_type>*[minHop..maxHop] goes through extractMultiPathPredicate() to allocate the match path range, and that is exactly what this query triggers. Looking at the code above, the problem is obvious: pathPreds is a local variable, so as soon as validatePathInWhere() returns its storage is released, including the unique_ptr that owns the MatchStepRange. The next question is whether that MatchStepRange memory is still used somewhere after the release. Reading the code, line 1196 of graph/validator/MatchValidator.cpp leads to the call chain validatePath() -> buildEdgeInfo(), which copies the raw MatchStepRange pointer out of pred:
 138  // Validate pattern from expression
 139  Status MatchValidator::validatePath(const MatchPath *path, Path &pathInfo) {
 140    // Pattern from expression won't generate new variable
 141    std::unordered_map<std::string, AliasType> dummy;
 142    NG_RETURN_IF_ERROR(buildNodeInfo(path, pathInfo.nodeInfos, dummy));
 143    NG_RETURN_IF_ERROR(buildEdgeInfo(path, pathInfo.edgeInfos, dummy));
 144    NG_RETURN_IF_ERROR(buildPathExpr(path, pathInfo, dummy));
 145    pathInfo.isPred = path->isPredicate();
 146    pathInfo.isAntiPred = path->isAntiPredicate();
 147
 148    return Status::OK();
 149  }
 ...
 263  // Build edges information by match pattern.
 264  Status MatchValidator::buildEdgeInfo(const MatchPath *path,
 265                                       std::vector<EdgeInfo> &edgeInfos,
 266                                       std::unordered_map<std::string, AliasType> &aliases) {
 267    auto *sm = qctx_->schemaMng();
 268    auto steps = path->steps();
 269    edgeInfos.resize(steps);
 270
 271    for (auto i = 0u; i < steps; i++) {
 272      auto *edge = path->edge(i);
 273      auto &types = edge->types();
 ...
 298      AliasType aliasType = AliasType::kEdge;
 299      auto *stepRange = edge->range();
 300      if (stepRange != nullptr) {
 301        NG_RETURN_IF_ERROR(validateStepRange(stepRange));
 302        edgeInfos[i].range = stepRange;
Look at src/graph/validator/MatchValidator.cpp:299: it stores the raw pointer to the soon-to-be-freed MatchStepRange straight into edgeInfos[i].range, and that is the root of the whole tragedy. Now that the problem has been located, fixing it is the easy part. Thanks for reading (///▽///)
NebulaGraph Desktop is the fast lane for Windows and macOS users to install the graph database: it spins up a graph service for massive data in about 10 seconds. Portal: http://c.nxw.so/95xjV If you want to read the source code, head over to GitHub to read it, use it and (^з^)-☆ star it -> GitHub; and come trade graph database techniques and application tips with other NebulaGraph users, leave your "business card" and join in~
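To make the ownership bug concrete, here is a minimal, self-contained sketch of the same pattern outside NebulaGraph (every name here — StepRange, Pred, EdgeInfo, validate — is invented for the illustration): a unique_ptr lives inside a local vector, a raw pointer to its pointee escapes, and that pointer dangles the moment the function returns.

    // use_after_free_sketch.cpp -- minimal illustration of the bug pattern described above.
    // All type and function names are hypothetical; this is not NebulaGraph code.
    #include <iostream>
    #include <memory>
    #include <utility>
    #include <vector>

    struct StepRange { int min; int max; };

    struct Pred {
      std::unique_ptr<StepRange> range;   // the predicate owns its range (like MatchPath)
    };

    struct EdgeInfo {
      const StepRange* range = nullptr;   // non-owning view (like edgeInfos[i].range)
    };

    EdgeInfo validate() {
      std::vector<Pred> preds;            // local vector, like pathPreds
      Pred p;
      p.range = std::make_unique<StepRange>();
      p.range->min = 1;
      p.range->max = 3;
      preds.push_back(std::move(p));

      EdgeInfo info;
      info.range = preds.back().range.get();  // raw pointer escapes, like line 299
      return info;                            // preds (and the StepRange) are destroyed here
    }

    int main() {
      EdgeInfo info = validate();
      // Dangling read: undefined behaviour. This is exactly the kind of access that the
      // rr/watchpoint session above (or AddressSanitizer) catches.
      std::cout << info.range->max << '\n';
    }

A typical way out is to move or copy ownership along with the pointer (for example, cloning the range into an object pool that outlives the validator) instead of stashing a raw pointer in a longer-lived structure; the write-up above stops at locating the cause, so the actual NebulaGraph patch is not shown here.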
Linux基础操作
1.os概念,定位 操作系统是一款管理软件,管理硬件和软件。对上提供良好、稳定和安全、高效的运行环境;对下管理好软硬件资源。2.查看Linux主机ip和使用XSHell登陆主机、XSHell下的复制黏贴查看Linux主机ip:在终端下敲 ifconfig 指令, 查看到 ip 地址。。使用XSHell登录主机:在XSHell终端写:ssh [ip]。ip 为刚才看到的 ifconfig 结果.如果网络畅通, 将会提示输入用户名密码. 输入即可正确登陆。XSHell下的复制黏贴:复制: ctrl + insert (有些同学的 insert 需要配合 fn 来按)                                                                  粘贴: shift + insert      ctrl + c / ctrl + v 是不行的3.ls指令语法: ls [选项][目录或文件]功能:对于目录,该命令列出该目录下的所有子目录与文件。对于文件,将列出文件名以及其他信息常用选项:-a 列出目录下的所有文件,包括以 . 开头的隐含文件。-d 将目录象文件一样显示,而不是显示其下的文件。 如: ls –d 指定目录-i 输出文件的 i 节点的索引信息。 如 ls –ai 指定文件-k 以 k 字节的形式表示文件的大小。 ls –alk 指定文件-l 列出文件的详细信息。-n 用数字的 UID,GID 代替名称。 (介绍 UID, GID)-F 在每个文件名后附上一个字符以说明该文件的类型, “*”表示可执行的普通文件; “/”表示目录; “@”表示符号链接; “|”表示FIFOs; “=”表示套接字(sockets)。(目录类型识别)-r 对目录反向排序。-t 以时间排序。-s 在l文件名后输出该文件的大小。(大小排序,如何找到目录下最大的文件)-R 列出所有子目录下的文件。 (递归)-1 一行只输出一个文件。这里列出今天刚刚学习到的:①ls指令:显示当前路径下的文件或者目录名称。编辑 ②ls -l指令:显示当前路径下的文件或者目录的更详细的属性信息。PS:文件 = 文件内容数据+文件属性数据。因此文件本身是需要占用空间的,即使是空文件,显示0KB,但其属性是占用空间的。文件之间(普通文件 VS  目录):普通文件就是普通的文件,目录现在可以人为是文件夹。 ③ls -a  与ls -al指令:ls -a显示目录下的所有文件,包括以.开头的隐藏文件。ls -al是ls -a和ls -l的结合,显示更详细的属性信息4.pwd指令语法: pwd功能:显示用户当前所在的目录5.cd指令 语法:cd 目录名功能:改变工作目录。将当前工作目录改变到指定的目录下。cd .. : 返回上级目录。cd /home/litao/linux/ : 绝对路径。cd ../day02/ : 相对路径。cd ~:进入用户家目。cd -:返回最近访问目录。cd指令:切换路径,进入目标路径进行操作。cd . 和cd ..  指令 :cd .是当前目录  cd ..是返回上级目录 。cd ~指令:进入家目录。root的家目录是单独的,而所有用户的家目录,都是在/home/XXX。这里显示root家目录。 cd -指令:返回最近访问目录。可以理解为在两个目录下反复横跳。 在分析绝对路径和相对路径前,先要知道Linux系统中,磁盘上的文件和目录被组成一棵目录树,每个节点都是目录或文件。其实几乎任何操作系统文件的目录组织结构是一颗多叉树。  多叉树,有叶子节点和路上节点(其实就是父节点,或非叶子节点,在Linux下这样称呼比较好理解),路上节点一定只能是目录,而叶子节点,可以是普通文件,也可以是空目录。而我们为什么喜欢用路径来表示一个文件?因为,从根目录到一个文件的路径,是唯一的!尽管某个文件有很多个,放在不同的目录里面,但是,我们可以通过路径,找到那个唯一!因此,这里分为绝对路径和相对路径。cd /home/litao/linux/ : 绝对路径。就是直接从根目录开始往下走。cd ../day02/ : 相对路径。就是,如果我们在处于某个目录中,但是想找到另外的目录中的文件,不需要返回根目录,而是可以通过这个目录,更换到目标目录中,接着找到目标文件。6.touch指令语法:touch [选项]... 文件...功能: touch命令参数可更改文档或目录的日期时间,包括存取时间和更改时间,或者新建一个不存在的文件。常用选项:-a 或--time=atime或--time=access或--time=use只更改存取时间。-c 或--no-create 不建立任何文档。-d 使用指定的日期时间,而非现在的时间。-f 此参数将忽略不予处理,仅负责解决BSD版本touch指令的兼容性问题。-m 或--time=mtime或--time=modify 只更改变动时间。-r 把指定文档或目录的日期时间,统统设成和参考文档或目录的日期时间相同。-t 使用指定的日期时间,而非现在的时间 touch指令就是在Linux下用来创建普通文件的指令7.mkdir指令语法: mkdir [选项] dirname...功能:在当前目录下创建一个名为 “dirname”的目录-p, --parents 可以是一个路径名称。此时若路径中的某些目录尚不存在,加上此选项后,系统将自动建立好那些尚不存在的目录,即一次可以建立多个目录。即在使用mkdir命令创建新的目录时,在其父目录不存在时先创建父目录的指令就是 mkdir -p XXX/XXX8.rmdir指令&&rm指令rmdir是一个与mkdir相对应的命令。 mkdir是建立目录,而rmdir是删除命令。语法: rmdir [-p][dirName]适用对象:具有当前目录操作权限的所有使用者功能:删除空目录常用选项:-p 当子目录被删除后如果父目录也变成空目录的话,就连带父目录一起删除rm命令可以同时删除文件或目录
语法: rm [-f-i-r-v][dirName/dir]适用对象:所有使用者功能:删除文件或目录(rm默认删除普通文件,加上-r,即可删除目录)常用选项:-f 即使文件属性为只读(即写保护),亦直接删除-i 删除前逐一询问确认-r 删除目录及其下所有文件 9.man指令inux的命令有很多参数,我们不可能全记住,我们可以通过查看联机手册获取帮助。也就是说:man是一个查看命令、系统调用、C接口的一个手册。man默认从1号手册开始查找,找到即停。man可以根据手册查找:man 1/2/3 命令/接口/C。访问Linux手册页的命令是:man 语法: man [选项] 命令常用选项-k 根据关键字搜索联机帮助num 只在第num章节找-a 将所有章节的都显示出来,比如 man printf 它缺省从第一章开始搜索,知道就停止,用a选项,当按下q退出,他会继续往后面搜索,直到所有章节都搜索完毕。解释一下,面手册分为8章1 是普通的命令2 是系统调用,如open,write之类的(通过这个,至少可以很方便的查到调用这个函数,需要加什么头文件)3 是库函数,如printf,fread4是特殊文件,也就是/dev下的各种设备文件5 是指文件的格式,比如passwd, 就会说明这个文件中各个字段的含义6 是给游戏留的,由各个游戏自己定义7 是附件还有一些变量,比如向environ这种全局变量在这里就有说明8 是系统管理用的命令,这些命令只能由root使用,如ifconfig10. cp指令语法: cp [选项] 源文件或目录 目标文件或目录功能: 复制文件或目录说明: cp指令用于复制文件或目录,如同时指定两个以上的文件或目录,且最后的目的地是一个已经存在的目录,则它会把前面指定的所有文件或目录复制到此目录中。若同时指定多个文件或目录,而最后的目的地并非一个已存在的目录,则会出现错误信息。cp拷贝目录或者文件,-r -f -i 同rm# ls a firstfile new_firstdir test.txt # cp test.txt a # cd a # ll drwxr-xr-x 3 root root 4096 Sep 18 10:33 b -rw-r--r-- 1 root root 168908 Sep 27 15:47 test.txt11. mv指令mv命令是move的缩写,可以用来移动文件或者将文件改名(move (rename) files),是Linux系统下常用的命令,经常用来备份文件或者目录。1.mv的功能类似剪切功能,移动目录或文件。                                                                                  2.对文件或目录重命名语法: mv [选项] 源文件或目录 目标文件或目录功能:1. 视mv命令中第二个参数类型的不同(是目标文件还是目标目录), mv命令将文件重命名或将其移至一个新的目录中。2. 当第二个参数类型是文件时, mv命令完成文件重命名,此时,源文件只能有一个(也可以是源目录名),它将所给的源文件或目录重命名为给定的目标文件名。3. 当第二个参数是已存在的目录名称时,源文件或目录参数可以有多个, mv命令将各参数指定的源文件均移至目标目录中。常用选项:-f : force 强制的意思,如果目标文件已经存在,不会询问而直接覆盖-i :若目标文件 (destination) 已经存在时,就会询问是否覆盖!# pwd /root/new_firstdir [root@VM-12-9-centos new_firstdir]# ll total 0 # touch aaa # mv aaa new_firstdir # cd new_firstdir [root@VM-12-9-centos new_firstdir]# ls aaa12. cat指令语法: cat [选项][文件]                                                                                                                    功能: 查看目标文件的内容常用选项:-b 对非空输出行编号-n 对输出的所有行编号-s 不输出多行空行# ls a firstfile new_firstdir test.txt ]# cat test.txt hello 106 [0] hello 106 [1] hello 106 [2] hello 106 [3] hello 106 [4] hello 106 [5] hello 106 [6] hello 106 [7] hello 106 [8] hello 106 [9] hello 106 [10]另外,还有一个类似的指令:tac。它的功能,也是cat一样的,打印、显示。不过,cat和tac的打印的顺序是相反的。# tac test.txt hello 106 [10000] hello 106 [9999] hello 106 [9998] hello 106 [9997] hello 106 [9996] hello 106 [9995] hello 106 [9994] hello 106 [9993] hello 106 [9992] hello 106 [9991] hello 106 [9990]13. more指令语法: more [选项][文件]功能: more命令,功能类似 cat常用选项:-n 对输出的所有行编号q 退出more对于上面的cat或tac指令,上面其实是有一万个数据,如果我们需要查找第5千个数据的时候,需要不断的往上或往下翻阅,这其实是很麻烦的,因此,more指令可以解决这样的问题。总的来说:cat和tac适合比较小的文本或者代码段,而more和接下来要说的less指令,适合大的(日志之类的).# more -5000 test.txt hello 106 [4985] hello 106 [4986] hello 106 [4987] hello 106 [4988] hello 106 [4989] hello 106 [4990] hello 106 [4991] hello 106 [4992] hello 106 [4993] hello 106 [4994] hello 106 [4995] hello 106 [4996] hello 106 [4997] hello 106 [4998] hello 106 [4999]14. less指令less 工具也是对文件或其它输出进行分页显示的工具,应该说是linux正统查看文件内容的工具,功能极其强大。less 的用法比起 more 更加的有弹性。在 more 的时候,我们并没有办法向前面翻, 只能往后面看但若使用了 less 时,就可以使用 [pageup][pagedown] 等按键的功能来往前往后翻看文件,更容易用来查看一个文件的内容!除此之外,在 less 里头可以拥有更多的搜索功能,不止可以向下搜,也可以向上搜。语法: less [参数] 文件功能:less与more类似,但使用less可以随意浏览文件,而more仅能向前移动,却不能向后移动,而且less在查看之前不会加载整个文件选项:-i 忽略搜索时的大小写-N 显示每行的行号/字符串:向下搜索“字符串”的功能?字符串:向上搜索“字符串”的功能n:重复前一个搜索(与 / 或 ? 有关)N:反向重复前一个搜索(与 / 或 ? 有关)q:quit15. head指令head 与 tail 就像它的名字一样的浅显易懂,它是用来显示开头或结尾某个数量的文字区块, head 用来显示档案的开头至标准输出中,而 tail 想当然尔就是看档案的结尾语法: head [参数]... 
[文件]...功能:head 用来显示档案的开头至标准输出中,默认head命令打印其相应文件的开头10行。选项:-n<行数> 显示的行数# head -10 test.txt hello 106 [0] hello 106 [1] hello 106 [2] hello 106 [3] hello 106 [4] hello 106 [5] hello 106 [6] hello 106 [7] hello 106 [8] hello 106 [9]16. tail指令ail 命令从指定点开始将文件写到标准输出.使用tail命令的-f选项可以方便的查阅正在改变的日志文件,tail -f filename会把filename里最尾部的内容显示在屏幕上,并且不但刷新,使你看到最新的文件内容语法: tail[必要参数][选择参数][文件]功能: 用于显示指定文件末尾内容,不指定文件时,作为输入信息进行处理。常用查看日志文件。选项:                                                                                                                                                -f 循环读取-n<行数> 显示行数# tail -3 test.txt hello 106 [9998] hello 106 [9999] hello 106 [10000]这里插入一个点:管道。就当提前预习:如果,我想取中间十行的数据【1000,1010】,怎么办。有两种方法:第一种,是创建一个文件来接收前1010,然后再读取这个文件的后10行,但是这样很麻烦,而已要创建文件。所以,第二种方法是利用管道:这里先浅浅地理解什么是管道:管道是传输资源的东西,一般都要有一个入口一个出口。那么:下面代码中:'  | '就是管道,可以将head看成管道的入口,tail看成管道的出口,而管道里面,先放进了前面的"head -1010 test.txt"的数据,然后tail再从管道里面取"tail -3"的数据。# head -1010 test.txt | tail -3 hello 106 [1000] hello 106 [1001] hello 106 [1002] 那么,我们将这里是数据,进行逆序,那么,再加跟管道进去就好了!# head -1010 test.txt | tail -3 | tac hello 106 [1002] hello 106 [1001] hello 106 [1000]17.时间相关的指令date显示date 指定格式显示时间: date +%Y:%m:%ddate 用法: date [OPTION]... [+FORMAT]1.在显示方面,使用者可以设定欲显示的格式,格式设定为一个加号后接数个标记,其中常用的标记列表如下
%H : 小时(00..23)%M : 分钟(00..59)%S : 秒(00..61)%X : 相当于 %H:%M:%S%d : 日 (01..31)%m : 月份 (01..12)%Y : 完整年份 (0000..9999)%F : 相当于 %Y-%m-%d2.在设定时间方面
date -s //设置当前时间,只有root权限才能设置,其他只能查看。date -s 20080523 //设置成20080523,这样会把具体时间设置成空00:00:00date -s 01:01:01 //设置具体时间,不会对日期做更改date -s “01:01:01 2008-05-23″ //这样可以设置全部时间date -s “01:01:01 20080523″ //这样可以设置全部时间date -s “2008-05-23 01:01:01″ //这样可以设置全部时间date -s “20080523 01:01:01″ //这样可以设置全部时间3.时间戳时间->时间戳: date +%s时间戳->时间: date -d@1508749502Unix时间戳(英文为Unix epoch, Unix time, POSIX time 或 Unix timestamp)是从1970年1月1日(UTC/GMT的午夜)开始所经过的秒数,不考虑闰秒18.cal指令cal命令可以用来显示公历(阳历)日历。公历是现在国际通用的历法,又称格列历,通称阳历。 “阳历”又名“太阳历”,系以地球绕行太阳一周为一年,为西方各国所通用,故又名“西历”命令格式: cal [参数][月份][年份]功能: 用于查看日历等时间信息,如只有一个参数,则表示年份(1-9999),如有两个参数,则表示月份和年份常用选项:-3 显示系统前一个月,当前月,下一个月的月历-j 显示在当年中的第几天(一年日期按天算,从1月1号算起,默认显示当前月在一年中的天数)-y 显示当前年份的日历19 find质指令:-nameLinux下find命令在目录结构中搜索文件,并执行指定的操作。Linux下find命令提供了相当多的查找条件,功能很强大。由于find具有强大的功能,所以它的选项也很多,其中大部分选项都值得我们花时间来了解一下。即使系统中含有网络文件系统( NFS), find命令在该文件系统中同样有效,只你具有相应的权限。在运行一个非常消耗资源的find命令时,很多人都倾向于把它放在后台执行,因为遍历一个大的文件系统可能会花费很长的时间(这里是指30G字节以上的文件系统)语法: find pathname -options功能: 用于在文件树种查找文件,并作出相应的处理(可能访问磁盘,进而导致效率低下)常用选项:-name 按照文件名查找文件[root@VM-12-9-centos ~]# ll total 12 a firstfile lesson4 _firstdir # find ~ -name test.txt a/test.txt a/lesson4/test.txt lesson4/test.txt # cd lesson4 lesson4]# ll total 168 test.txt lesson4]# find ~ -name firstfile /root/firstfile拓展关于搜索查找的指令:①which 用来查找命令的路径的指令# which pwd /usr/bin/pwd # which rm alias rm='rm -i' /usr/bin/rm这里有个指令:alias,它用于对一个指令进行重命名。于此同时,当我们执行ll或ls指令的时候,会发现,文件和目录的颜色不一样,那是因为alias带上了colors的指令。# which ls alias ls='ls --color=auto'②whereis:在特定的路径下,查找指定的文件名对应的指令或者文档# whereis ls ls: /usr/bin/ls /usr/share/man/man1/ls.1.gz # whereis test.txt test: /usr/bin/test /usr/share/man/man1/test.1.gz20. grep指令文本内容的过滤工具,对文本内容进行匹配,匹配成功的进行行显示语法: grep [选项] 搜寻字符串 文件功能: 在文件中搜索字符串,将找到的行打印出来常用选项:-i :忽略大小写的不同,所以大小写视为相同-n :顺便输出行号-v :反向选择,亦即显示出没有 '搜寻字符串' 内容的那一行# grep '9999' test.txt hello 106 [9999] # grep '999' test.txt hello 106 [999] hello 106 [1999] hello 106 [2999] hello 106 [3999] hello 106 [4999] hello 106 [5999] hello 106 [6999] hello 106 [7999] hello 106 [8999] hello 106 [9990] hello 106 [9991] hello 106 [9992] hello 106 [9993] hello 106 [9994] hello 106 [9995] hello 106 [9996] hello 106 [9997] hello 106 [9998] hello 106 [9999]-n带上行号:# grep -n '999' test.txt 1000:hello 106 [999] 2000:hello 106 [1999] 3000:hello 106 [2999] ......-i:# vim test.txt # grep 'abc' test.txt abc abc abc # grep -i 'abc' test.txt abc ABC aBc Abc abc abc ABC # grep 'x' test.txt x x-v:就是反向。补充1:wc -l  统计行数# grep '999' test.txt | wc -l 19补充2:sort  按照ASCII码进行排序(升序)# touch file.txt # ll total 180 a file.txt firstfile lesson4 new_firstdir test.txt # vim file.txt # cat file.txt 111111 2222 33333 44444444 66666 555555 7777 # sort file.txt 111111 2222 33333 44444444 555555 66666 7777补充3:uniq 对文本内容中,相邻,相等的,去重。# vim file.txt # cat file.txt 111111 2222 33333 44444444 66666 555555 7777 7777 7777 7777 22222 555555 555555 555555 # uniq file.txt 111111 2222 33333 44444444 66666 555555 7777 22222 555555我们发现,有一些没有相邻的,没去去重,我们可以利用管道,先进行排序,然后再去重# sort file.txt 111111 2222 22222 33333 44444444 555555 555555 555555 555555 66666 7777 7777 7777 7777 # sort file.txt | uniq 111111 2222 22222 33333 44444444 555555 66666 777721. 
zip/unzip指令语法: zip 压缩文件.zip 目录或文件功能: 将目录或文件压缩成zip格式常用选项:-r 递 归处理,将指定目录下的所有文件和子目录一并处理例子:将test2目录压缩: zip test2.zip test2/*解压到tmp目录: unzip test2.zip -d /tmpzip默认对一个目录进行打包压缩的时候,只会对目录文件打包压缩,也就是目录文件的内容不达标压缩。于此,需要加上-r递归。zip -r 你的压缩包(自定义) dir(要打包压缩的目录) unzip  你的压缩包(自定义)--在当前目录下进行解包解压的功能# mkdir temp # ll total 244 a file.txt firstfile lesson4 my.zip new_firstdir temp test.txt # mv my.zip temp # ll total 188 a 5 file.txt firstfile lesson4 53 new_firstdir temp test.txt # cd temp # ll total 56 my.zip # unzip my.zip Archive: my.zip creating: a/ creating: a/b/ creating: a/b/c/ creating: a/b/c/d/ inflating: a/test.txt creating: a/lesson4/ inflating: a/lesson4/test.txt inflating: a/my.tgz # ll total 60 a my.zip # tree a a |-- b | `-- c | `-- d |-- lesson4 | `-- test.txt |-- my.tgz `-- test.txt 4 directories, 3 files # pwd /root/temp # ll total 60 a my.zip # cd a # ll total 204 b lesson4 my.tgz test.txt # tree b b `-- c `-- d 2 directories, 0 files # less test.txt [3]+ Stopped less test.txt上面的是解压到当前目录,那么,接下来的指令,便是解压到指定目录中:unzip my.zip -d /home/XXX或者/root/XXX# unzip my.zip -d /root/a/b Archive: my.zip creating: /root/a/b/a/ creating: /root/a/b/a/b/ creating: /root/a/b/a/b/c/ creating: /root/a/b/a/b/c/d/ inflating: /root/a/b/a/test.txt creating: /root/a/b/a/lesson4/ inflating: /root/a/b/a/lesson4/test.txt inflating: /root/a/b/a/my.tgz22 tar指令:打包/解包,不打开,直接看内容tar [-cxtzjvf] 文件与目录 ....
参数: 
-c :建立一个压缩文件的参数指令(create 的意思);-x :解开一个压缩文件的参数指令!-t :查看 tarfile 里面的文件!-z :是否同时具有 gzip 的属性?亦即是否需要用 gzip 压缩?-j :是否同时具有 bzip2 的属性?亦即是否需要用 bzip2 压缩?-v :压缩的过程中显示文件!这个常用,但不建议用在背景执行过程!-f :使用档名,请留意,在 f 之后要立即接档名喔!不要再加参数!-C : 解压到指定目录下面代码中,分别实现了打包压缩和解包的操作:打包压缩:tar -czf oh.tgz(压缩包名字)lesson4(压缩的目录文件名)(czf:c代表创建一个压缩包,z代表使用z代表的算法,f代表文件名)。 解包:tar -xzf oh.tgz  x代表解开压缩包。如果带个v,-xzvf  -cvzf   会把过程显示出来。~]# ll total 188 a file.txt firstfile lesson4 firstdir temp test.txt ~]# tree lesson4 lesson4 `-- test.txt 0 directories, 1 file ~]# tar -czf oh.tgz lesson4 ~]# ll total 216 a file.txt firstfile lesson4 new_firstdir oh.tgz temp test.txt ~]# mv oh.tgz temp ~]# cd temp temp]# ll total 28 oh.tgz # tar -xzf oh.tgz # ll total 32 lesson4 oh.tgz # tree lesson4 lesson4 `-- test.txt 0 directories, 1 file不解压,看里面的内容:相当于windows下,点开压缩包,查看里面的东西一样。-ttemp]# tar -tf oh.tgz lesson4/ lesson4/test.txt上面的也是默认到当前目录。那么,指定路径解压,就需要  -C  指令a file.txt firstfile new_firstdir temp test.txt ~]# cd temp temp]# ll total 32 lesson4 oh.tgz temp]# tar -xzvf oh.tgz -C ~ lesson4/ lesson4/test.txt temp]# ls ~ a file.txt firstfile lesson4 new_firstdir temp test.txt23. bc指令:bc命令可以很方便的进行浮点运算。temp]# bc bc 1.06.95 30-90 -60 3.25-36.3 -33.05 ^Z [5]+ Stopped bc temp]# echo "1+2+3+4+5+6+7+8+9" 1+2+3+4+5+6+7+8+9 temp]# echo "1+2+3+4+5+6+7+8+9" | bc 4524. uname -r 指令语法: uname [选项]功能: uname用来获取电脑和操作系统的相关信息。补充说明: uname可显示linux主机所用的操作系统的版本、硬件的名称等基本信息                        常用选项: 
-a或–all 详细输出所有信息,依次为内核名称,主机名,内核版本号,内核版本,硬件名,处理器类型,硬件平台类型,操作系统名称25. 重要的几个热键[TAB],[ctrl]-c,[ctrl]-d[Tab]按键---具有『命令补全』和『档案补齐』的功能[Ctrl]-c按键---让当前的程序『停掉』[Ctrl]-d按键---通常代表着:『键盘输入结束(End Of File, EOF 戒 End OfInput)』的意思;另外他也可以用来取代exit26. 关机语法: shutdown [选项] ** 常见选项: ** -h :将系统的服务停掉后,立即关机。                                                                                               -r : 在将系统的服务器停掉之后就重新启动。                                                                                   -t sec : -t 后面加秒数,亦即『过几秒后关机』的意思27. 拓展◆安装和登录命令: login、 shutdown、 halt、 reboot、 install、 mount、 umount、 chsh、 exit、 last;◆ 文件处理命令: file、 mkdir、 grep、 dd、 find、 mv、 ls、 diff、 cat、 ln;◆ 系统管理相关命令: df、 top、 free、 quota、 at、 lp、 adduser、 groupadd、 kill、 crontab;◆ 网络操作命令: ifconfig、 ip、 ping、 netstat、 telnet、 ftp、 route、 rlogin、 rcp、 finger、 mail、 nslookup;◆ 系统安全相关命令: passwd、 su、 umask、 chgrp、 chmod、 chown、 chattr、 sudo ps、 who;◆ 其它命令: tar、 unzip、 gunzip、 unarj、 mtools、 man、 unendcode、 uudecode。查看xpu:lscpu查看内存:lsmem查看磁盘:df -h查看登录了服务器的账号,也就是用户:who 27.1 shell命令以及运行原理Linux严格意义上说的是一个操作系统,我们称之为“核心(kernel) “ ,但我们一般用户,不能直接使用kernel。而是通过kernel的“外壳”程序,也就是所谓的shell,来与kernel沟通。如何理解?为什么不能直接使用kernel?从技术角度, Shell的最简单定义:命令行解释器(command Interpreter)主要包含:将使用者的命令翻译给核心(kernel)处理。同时,将核心的处理结果翻译给使用者。对比windows GUI,我们操作windows 不是直接操作windows内核,而是通过图形接口,点击,从而完成我们的操作(比如进入D盘的操作,我们通常是双击D盘盘符.或者运行起来一个应用序)。shell 对于Linux,有相同的作用,主要是对我们的指令进行解析,解析指令给Linux内核。反馈结果在通过内核运行出结果,通过shell解析给用户。windows的图形界面,本质也是一种外壳程序。所以,Linux shell命令行外壳 和 Windows图形界面,本质是一样的。通过用户——shell——内核这样的结构,可以有效的执行很多指令。当用户传入的是非法指令,那么,shell会直接拒绝,不需要进入到内核当中,也起到了保护作用。帮助理解:如果说你是一个闷骚且害羞的程序员,那shell就像媒婆,操作系统内核就是你们村头漂亮的且有让你心动的MM小花。你看上了小花,但是有不好意思直接表白,那就让你你家人找媒婆帮你提亲,所有的事情你都直接跟媒婆沟通,由媒婆转达你的意思给小花,而我们找到媒婆姓王,所以我们叫它王婆,它对应我们常使用的bash27.2 Linux权限的概念Linux下有两种用户:超级用户(root)、普通用户。超级用户:可以再linux系统下做任何事情,不受限制普通用户:在linux下做有限的事情。超级用户的命令提示符是“#”,普通用户的命令提示符是“$命令: su [用户名]功能:切换用户。例如,要从root用户切换到普通用户user,则使用 su user。 要从普通用户user切换到root用户则使用 su root(root可以省略),此时系统会提示输入root用户的口令~]$ su - Password: ~]# pwd /root ~]# su wjmhlh ]$ pwd /root ]$ whoami wjmhlh ]$ su Password: ~]# whoami root ~]# su wjmhlh ]$ # 代表root $代表普通用户~]$ pwd /home/wjmhlh ~]$ clear ~]$ mkdir lesson5 ~]$ ll total 4 lesson5 ~]$ cd lesson5 lesson5]$ su - Password: Last login: Thu Oct 6 16:44:28 CST 2022 on pts/0 ~]# whoami root ~]# exit logout lesson5]$ whoami wjmhlh lesson5]$ su Password: lesson5]# whoami root lesson5]# su wjmhlh lesson5]$ whoami wjmhlh lesson5]$ root与普通用户之间的切换操作当我不想转换为root用户,但是需要root权限的时候,可以使用命令sudo XXX。但会有个问题,那就是,root对普通用户的信任。当root对普通用户信任的时候,使用sudo XXX后,再使用whoami的时候,发现,我们的权限更换成了root,但不信任,就会出现以下信息: ~]$ sudo whoami[sudo] password for wjmhlh: wjmhlh is not in the sudoers file.  
This incident will be reported.所以,说了那么多,什么是权限?我们为什么需要权限?①什么是权限?权限是约束人的(一个人或者某群体)。而对于人,并不是真的去约束这个人或群体,而是对他的身份或者扮演的角色进行约束。就好比如,我张三,开通了某奇艺的会员,我张三能去看VIP电影,真的是因为我是张三,所以我能看吗?如果另外的一个人也是叫做张三,他能也看吗?不!仅仅是因为我在某奇艺的身份!因此:文件权限 = 人+文件属性。27.3 Linux权限管理续上:人—>角色—>权限。角色(身份)->(拥有者:owner     其他人:other    所属组:grouper)lesson5]$ touch file.txtlesson5]$ lltotal 0-rw-rw-r-- 1 wjmhlh wjmhlh 0 Oct  7 13:43 file.txt第一个红: 文件的拥有者  第二个红:文件的所属组,因为这里只有我一个人在用。对于其他人,则是,用户在访问这个文件的时候,用户名与拥有者和所属组的名字不相同的话,那么,这个用户就是其他人other。那么,为什么要存在所属组呢?文件创建者是拥有者,拥有者以外的是其他人,这个能很好的理解。而所属组,用来干嘛?所谓组,那就是一群人在一起奋斗的组别。如果有一天,一个项目中,几个团队共用一个Linux机器。我想要给我的团队小组成员看我的代码,即对他们开源,但是又不能给小组以外的人看,如果没有所属组,那么,我需要将拥有者和其他人的权限放开,这时候,只要是个人都能查看我的代码了。这样显然是绝对不行的!因此,所属组就诞生了,我只需要在所属组中,对我的小组成员开源,其他人也不能看见,这就皆大欢喜了!文件属性是啥?文件属性:r(读权限)  w(写权限)   x(执行权限)来介绍一些小知识:lesson5]$ mkdir dirlesson5]$ lltotal 4drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir第一个红: 文件的拥有者  第二个红:文件的所属组第一个绿是文件大小。第一个蓝指的是最近修改或创建时间之前我们讲过,文件 = 内容 + 属性。所以,上面那行信息,都是属于文件的属性而橙色部分:看第一个字符:剩下的部分:所以,我们阅读权限的正确方法是:drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir对于dir,它是目录文件,拥有者允许读权限、写权限和执行权限;所属组允许读权限、写权限和执行权限;其他人允许读权限,不允许写权限,允许执行权限。同理:-rw-rw-r-- 1 wjmhlh wjmhlh 0 Oct  7 13:43 file.txt对于file.txt,它是普通文件,拥有者允许读权限、写权限和不允许执行权限;所属组允许读权限、写权限和不允许执行权限;其他人允许读权限,不允许写权限,不允许执行权限。还有噢,其实,rwx可以使用二进制来表示,然后三组转换成八进制数字。结合以下的指令:# chmod 664 /home/abc.txt# chmod 640 /home/abc.txt接下来,我们来看看,如何操作权限?chmod
功能: 设置文件的访问权限格式: chmod [参数] 权限 文件名常用选项:R -> 递归修改目录文件的权限说明:只有文件的拥有者和root才可以改变文件的权限chmod① 用户表示符+/-=权限字符+:向权限范围增加权限代号所表示的权限-:向权限范围取消权限代号所表示的权限=:向权限范围赋予权限代号所表示的权限用户符号:u:拥有者g:拥有者同组用o:其它用户a:所有用户例子:lesson5]$ whoamiwjmhlhlesson5]$ chmod u-r file.txt      //去掉读权限  //拥有者lesson5]$ lltotal 4drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir--w-rw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txtlesson5]$ chmod u+x file.txt   // 增加执行权限   //拥有者lesson5]$ lltotal 4drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir--wxrw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txt lesson5]$ chmod u-rwx file.txt // 去掉读写执行权限   //拥有者 lesson5]$ lltotal 4drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir----rw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txt lesson5]$ chmod u+rw file.txt //增加读和执行权限   //拥有者lesson5]$ lltotal 4drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir-rw-rw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txt同时,如果我们需要同时操作拥有者、所属组和其他人的权限,可以用逗号分开。chmod u-rwx,u-rwx,g-rwx,o-rwx file.txt也可以chmod a-rwx file.txt值得注意的是,当拥有者的某个权限失效时,但是所属组拥有,我们使用拥有者来操作这个失效的权限时,依然无法执行,既然所属组拥有这个权限。因为,对于拥有者——所属组——other,是if——else if ——else的关系,拥有者没有,那么,匹配到所属组也是拥有者的名称,那么,这个权限也不能使用!还有一点的就是:root不受权限的约束!root能够删掉普通用户的任意权限,但是却可以在用户没有这个文件的权限的时候,去操作这个权限!比如,-rw-rw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txt。普通用户只有r和w。root可以将两个也删掉,变成----rw-r--。删掉后,root依然可以对其进行读写和执行。你一点脾气都没有!所以,root一定不能丢!chown功能:修改文件的拥有者格式: chown [参数] 用户名 文件名实例:# chown user1 f1# chown -R user1 filegroup1当然,当你要将一个东西给别人的时候,需要跟别人说一声,我们可以使用sudo 来强行给。————sudo chown XXX file同时将拥有者和所属组都修改给别人,那么是这样的:sudo chown XXX:XXX filechgrp
Function: change the group that owns a file or directory.
Usage: chgrp [options] group file
Common option: -R recursively change the group of a directory and everything under it.
The file command
Description: identify the type of a file.
Usage: file [options] file_or_directory
Common options:
-c show the checking process in detail, which helps with troubleshooting or analysing how the command runs.
-z try to look inside compressed files.
drwxrwxr-x 2 wjmhlh wjmhlh 4096 Oct  7 14:00 dir
-rw-rw-r-- 1 wjmhlh wjmhlh    0 Oct  7 13:43 file.txt
lesson5]$ file dir
dir: directory
lesson5]$ file file.txt
file.txt: empty
Granting permissions with sudo will be covered later.
Why do we need permissions at all? They let the system be managed safely. Then why, when we create a directory or a file, do we get the default permissions we see — that is, why does it not start out as rwxrwxrwx, but as something like rw-rw-r--? Because Linux defines starting permissions: 777 for directories and 666 for regular files.
$ umask
0002
0002 -> 000 000 010. The system configures a default umask, the permission mask: every permission bit that appears in the umask must be removed from the starting permissions. "Removed" here is not subtraction but a bitwise AND:
final permissions = starting permissions & (~umask)
that is, the umask is bitwise-negated first and then ANDed with the starting permissions to produce the final permissions. For example, a worked example follows at the end of this section.
One thing worth noting: the umask can be changed. The default is 0002, but we can set it to 0444, 0000 and so on, e.g. umask 0000.
Finally, a few more notes on rwx:
r — read permission. It does not decide whether we can enter a directory or file, but whether we can list its contents with ls or ll.
w — write permission. It decides whether we can create files or directories inside a directory.
x — execute permission. It decides whether we can enter the directory (or run the file).
So why does the system start directories at 777? Because essentially every directory should be enterable (x) right after it is created.
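To make the umask arithmetic concrete, here is a small C++ sketch that computes 0666 & ~umask the same way the kernel does when a regular file is created (the file name demo.txt is made up for the example; it uses POSIX calls, so it only runs on Linux/Unix).

    // umask_demo.cpp -- show how the process umask trims the requested file mode.
    // Build with: g++ -o umask_demo umask_demo.cpp   (run it in an empty directory;
    // it refuses to overwrite an existing demo.txt because of O_EXCL).
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
      // umask() can only be read by setting it, so set it to 0 and put it back.
      mode_t mask = umask(0);
      umask(mask);

      mode_t requested = 0666;               // what touch/open asks for on a regular file
      mode_t effective = requested & ~mask;  // what the kernel actually applies
      std::printf("umask = %03o, requested = %03o, effective = %03o\n",
                  (unsigned)mask, (unsigned)requested, (unsigned)effective);

      // Create the file and confirm with stat(); `ls -l demo.txt` shows the same bits.
      int fd = open("demo.txt", O_CREAT | O_WRONLY | O_EXCL, requested);
      if (fd < 0) { std::perror("open"); return 1; }
      close(fd);

      struct stat st{};
      if (stat("demo.txt", &st) == 0) {
        std::printf("demo.txt mode = %03o\n", (unsigned)(st.st_mode & 0777));
      }
      return 0;
    }

With the 0002 umask shown earlier, this prints effective = 664, i.e. rw-rw-r--, which matches the ls -l output for file.txt above.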