Sqoop1 From PostgreSQL to Hdfs-阿里云开发者社区




Environment: Sqoop1 Install And Test MySQL/PostgreSQL
Parameters:
--connect:  JDBC connection URL
--username: database user name
--password: database password
--table:    table to import
-m:         number of parallel map tasks
Meaning: import the user_info table into the HDFS cluster, using only one map task.
Note: this Sqoop command specifies no HDFS directory, so by default the data is written under /user/{user.name}/{table name from --table}.
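That default layout can be sketched as follows (a toy illustration of the path convention, not Sqoop code; `default_import_dir` is a hypothetical helper):

```python
# Sketch: where an import lands on HDFS when --target-dir is omitted.
# Sqoop writes under /user/<submitting user>/<table name>.
def default_import_dir(user, table):
    # hypothetical helper illustrating the convention, not a Sqoop API
    return "/user/{}/{}".format(user, table)

print(default_import_dir("root", "user_info"))  # /user/root/user_info
```

Pass --target-dir (as in the --query example later in this post) to choose the output directory explicitly.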


Problem 1: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist (the relation cannot be resolved; in other words, the table is not visible in this user's default schema)
[root@sht-sgmhadoopnn-01 bin]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 10:51:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 10:51:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 10:51:57 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 10:51:57 INFO tool.CodeGenTool: Beginning code generation
16/07/30 10:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM place_openhoursv2_temp AS t LIMIT 1
16/07/30 10:51:58 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist
  Position: 17
org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2 does not exist
  Position: 17
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 10:51:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

[root@sht-sgmhadoopnn-01 bin]#


Solution: change the user's default schema (search_path). Note that ALTER ROLE ... SET search_path only applies to new sessions, which is why SHOW search_path still reports the old value right after the ALTER and the user has to reconnect:
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.

denali=# show search_path;
   search_path  
-----------------
 "$user", public
(1 row)

denali=# alter role denaliadmin set search_path to factual_search_na_16q2_20160722_epl,public;
ALTER ROLE
denali=# show search_path;                                                                   
   search_path  
-----------------
 "$user", public
(1 row)

denali=# \q

[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.

denali=# show search_path;
                 search_path                
---------------------------------------------
 factual_search_na_16q2_20160722_epl, public
(1 row)

denali=#

 

Problem 2: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
###  --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp is wrong: --table accepts only a table name, not a schema-qualified one. (The schema prefix is unnecessary anyway, because search_path already makes factual_search_na_16q2_20160722_epl this user's default schema.)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:50 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:50 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:50 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
  Position: 17
org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
  Position: 17
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 12:39:50 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

[root@sht-sgmhadoopnn-01 hadoop]#

Solution: pass --table without the schema prefix.
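The probe query in the log above shows why: Sqoop wraps the whole --table value in one pair of double quotes, so PostgreSQL looks for a single identifier literally named schema.table. A minimal sketch of that behavior (reconstructed from the logged SQL, not Sqoop's actual code):

```python
# Sqoop's column-probe statement quotes the entire --table value as ONE
# identifier (compare the "Executing SQL statement" lines in the logs).
def probe_query(table):
    return 'SELECT t.* FROM "{}" AS t LIMIT 1'.format(table)

# A schema-qualified value becomes a single quoted name, so PostgreSQL
# searches for a table literally called 'schema.table' and fails:
print(probe_query("factual_search_na_16q2_20160722_epl.place_openhoursv2_temp"))
```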

 


Problem 3: orm.ClassWriter: No Java type for SQL type 1111 for column new_value (the type of column new_value in the table or query cannot be converted to a Java type)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  --table place_openhoursv2_temp  -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:04 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:04 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1377)
        at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1402)
        at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1528)
        at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1491)
        at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1920)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1736)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@sht-sgmhadoopnn-01 hadoop]#
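Why the import fails at this point: SQL type code 1111 is java.sql.Types.OTHER, which the PostgreSQL JDBC driver reports for json and other non-standard types, and Sqoop has no Java mapping for it. Casting the column to text inside a --query import sidesteps the unmappable type; a sketch of building such a select list (`select_list` is a hypothetical helper, not part of Sqoop):

```python
# java.sql.Types.OTHER has the constant value 1111; PostgreSQL's JDBC driver
# reports json columns as OTHER, which Sqoop cannot map to a Java type.
JAVA_SQL_TYPES_OTHER = 1111

def select_list(columns, cast_to_text=("new_value",)):
    # Cast the problem columns to text so JDBC reports a mappable type.
    return ", ".join(c + "::text" if c in cast_to_text else c for c in columns)

print(select_list(["place_id", "new_value"]))  # place_id, new_value::text
```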


Solution: in the SQL statement, cast the new_value column from json to text (new_value::text).
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import  --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali  \
> --query 'select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where  $CONDITIONS ' \
> --split-by place_id \
> --target-dir /sqoop1/test7  \
> -m 3 \
> --null-string '' --null-non-string ''

Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:36:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:36:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:36:40 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:36:40 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 0) 
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 0) 
16/07/30 12:36:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop/hadoop/share/hadoop/mapreduce
Note: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/07/30 12:36:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.jar
16/07/30 12:36:44 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/30 12:36:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/07/30 12:36:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/07/30 12:36:56 INFO db.DBInputFormat: Using read commited transaction isolation
16/07/30 12:36:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(place_id), MAX(place_id) FROM (select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where   (1 = 1)  ) AS t1
16/07/30 12:36:56 INFO mapreduce.JobSubmitter: number of splits:3
16/07/30 12:36:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469795998430_0014
16/07/30 12:36:57 INFO impl.YarnClientImpl: Submitted application application_1469795998430_0014
16/07/30 12:36:58 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1469795998430_0014/
16/07/30 12:36:58 INFO mapreduce.Job: Running job: job_1469795998430_0014
16/07/30 12:37:09 INFO mapreduce.Job: Job job_1469795998430_0014 running in uber mode : false
16/07/30 12:37:09 INFO mapreduce.Job:  map 0% reduce 0%
16/07/30 12:37:17 INFO mapreduce.Job:  map 33% reduce 0%
16/07/30 12:37:18 INFO mapreduce.Job:  map 67% reduce 0%
16/07/30 12:37:19 INFO mapreduce.Job:  map 100% reduce 0%
16/07/30 12:37:20 INFO mapreduce.Job: Job job_1469795998430_0014 completed successfully
16/07/30 12:37:20 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=426603
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=373
                HDFS: Number of bytes written=184494
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=6
        Job Counters
                Launched map tasks=3
                Other local map tasks=3
                Total time spent by all maps in occupied slots (ms)=20412
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=20412
                Total vcore-seconds taken by all map tasks=20412
                Total megabyte-seconds taken by all map tasks=20901888
        Map-Reduce Framework
                Map input records=1000
                Map output records=1000
                Input split bytes=373
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=153
                CPU time spent (ms)=3990
                Physical memory (bytes) snapshot=525692928
                Virtual memory (bytes) snapshot=2674200576
                Total committed heap usage (bytes)=316145664
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=184494
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Transferred 180.1699 KB in 34.3762 seconds (5.2411 KB/sec)
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Retrieved 1000 records.


[root@sht-sgmhadoopnn-01 hadoop]# hadoop fs -ls  /sqoop1/test7
Found 4 items
-rw-r--r--   3 root root          0 2016-07-30 12:37 /sqoop1/test7/_SUCCESS
-rw-r--r--   3 root root     184208 2016-07-30 12:37 /sqoop1/test7/part-m-00000
-rw-r--r--   3 root root        143 2016-07-30 12:37 /sqoop1/test7/part-m-00001
-rw-r--r--   3 root root        143 2016-07-30 12:37 /sqoop1/test7/part-m-00002
[root@sht-sgmhadoopnn-01 hadoop]#
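The BoundingValsQuery in the log above shows how --split-by place_id with -m 3 becomes three map tasks: Sqoop fetches MIN and MAX of the split column and divides that interval among the mappers, substituting each range for $CONDITIONS. A simplified sketch for an integer split column (Sqoop's actual DataDrivenDBInputFormat handles more types and edge cases):

```python
def integer_splits(lo, hi, num_splits):
    # Divide [lo, hi] into contiguous ranges, roughly as
    # DataDrivenDBInputFormat does for integer split columns.
    step = max((hi - lo) // num_splits, 1)
    bounds = list(range(lo, hi, step)) + [hi]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

# e.g. with SELECT MIN(place_id), MAX(place_id) returning (1, 1000) and -m 3:
print(integer_splits(1, 1000, 3))  # [(1, 334), (334, 667), (667, 1000)]
```

Each range becomes one mapper's WHERE clause in place of $CONDITIONS, which is why the --query form requires that literal token.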
