How to read the contents of a Hive textfile in Java: error

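As shown in the screenshot, when I read the file by byte count through FileSystem, some rows come back incomplete. The code is as follows:
               // stored in text format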
               Path filePath = new Path(location);
               FileStatus[] status = null;
               FileSystem fs = null;


               Configuration conf = new Configuration();
               conf.set("fs.defaultFS", "hdfs://192.168.20.181:8020");
               conf.set("fs.permissions.umask-mode", "002");
               System.getProperties().setProperty("HADOOP_USER_NAME", "hdfs");
               try {
                   fs = FileSystem.get(conf);
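               } catch (Exception e) {
                   e.printStackTrace();
                   logger.error("Failed to obtain the HDFS FileSystem!");
                   // Rethrow so that fs is never used below while it is still null
                   throw new Exception("HdfsConnect:" + e.getMessage());
               }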

               try {
                   status = fs.listStatus(filePath);
               } catch (Exception e) {
                   e.printStackTrace();
                   throw new Exception("HdfsListDirectory:" + e.getMessage());
               }

               long start = (new Date()).getTime();

               // Submit one task per file, then sum the Integer results of all tasks
               List<Future> taskList = new ArrayList<Future>();
               for (FileStatus st : status) {
                   MyCallable myCallable = new MyCallable(st, fs, lineDelim, fieldDelim, cloumnList);
                   FutureTask task = new FutureTask(myCallable);
                   pool.submit(task);
                   taskList.add(task);
               }

               int count = 0;
               for (Future f : taskList) {
                   count = count + (Integer) f.get();
               }

A question for the experts: what does Java offer for reading textfile-format files in Hive? Reading ORC files is already done.
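
Incomplete rows are what one would expect whenever a file is consumed in fixed-size byte chunks and each chunk is parsed on its own, because a chunk boundary rarely coincides with a line break. The sketch below reproduces that effect with plain java.io and no Hadoop classes. The class name `ChunkVsLineDemo`, both method names, and the sample data are hypothetical and only serve the illustration; this is not the code behind `MyCallable`, which is not shown in the post.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkVsLineDemo {

    // Reads fixed-size chunks and splits every chunk into rows on its own.
    // A row that straddles a chunk boundary comes back in two pieces.
    static List<String> readByChunks(InputStream in, int chunkSize) throws IOException {
        List<String> rows = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) > 0) {
            String chunk = new String(buf, 0, n, StandardCharsets.UTF_8);
            for (String row : chunk.split("\n")) {
                if (!row.isEmpty()) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    // Reads line by line; the reader buffers internally and only returns whole lines.
    static List<String> readByLines(InputStream in) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "1,alice\n2,bob\n3,carol\n".getBytes(StandardCharsets.UTF_8);
        // Chunked read with a 10-byte buffer: rows 2 and 3 are cut apart
        System.out.println(readByChunks(new ByteArrayInputStream(data), 10));
        // Line-based read: three complete rows
        System.out.println(readByLines(new ByteArrayInputStream(data)));
    }
}
```

With the 10-byte buffer, the chunked read yields five fragments (`1,alice`, `2,`, `bob`, `3,caro`, `l`), while the line-based read yields the three original rows. Chunked reading can still be made correct by carrying the unfinished tail of each chunk over into the next one, but a line-oriented reader does that bookkeeping for you.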

Answering my own question. You can use LazySimpleSerDe to read the file; the idea comes from how the ORC files are read. The code is as follows:

// stored in text format
// Uses the old mapred API: org.apache.hadoop.mapred.{JobConf, TextInputFormat, InputSplit, RecordReader, Reporter}
               Path filePath = new Path(location);
               Properties p = new Properties();
               LazySimpleSerDe serde = new LazySimpleSerDe();
               JobConf conf = new JobConf();
               // fs.defaultFS replaces the deprecated fs.default.name key
               conf.set("fs.defaultFS", "hdfs://192.168.20.181:8020");
               p.setProperty("columns", cloNameSB.toString());
               p.setProperty("columns.types", cloTypeSB.toString());
               serde.initialize(conf, p);
               StructObjectInspector inspector = (StructObjectInspector) serde.getObjectInspector();
               // The column list does not change per row, so fetch it once
               List<? extends StructField> fields = inspector.getAllStructFieldRefs();
               TextInputFormat in = new TextInputFormat();
               in.configure(conf);
               TextInputFormat.setInputPaths(conf, filePath);
               InputSplit[] splits = in.getSplits(conf, 1);
               long start = (new Date()).getTime();
               int count = 0;
               for (InputSplit split : splits) {
                   RecordReader<LongWritable, Text> reader = in.getRecordReader(split, conf, Reporter.NULL);
                   LongWritable key = reader.createKey();
                   Text value = reader.createValue();
                   try {
                       while (reader.next(key, value)) {
                           // Split each line once. split() takes a regex, so a delimiter such as |
                           // must be escaped; the -1 limit keeps trailing empty fields.
                           String[] vs = value.toString().split(fieldDelim, -1);
                           Map<String, Object> map = new HashMap<String, Object>();
                           // Rows with fewer fields than columns leave the missing columns unset
                           for (int i = 0; i < fields.size() && i < vs.length; i++) {
                               map.put(fields.get(i).getFieldName(), vs[i]);
                           }
                           count++;
                       }
                   } finally {
                       reader.close();
                   }
               }

               System.out.println((new Date()).getTime() - start);
               System.out.println(count);

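Reading through TextInputFormat this way is a little slower than reading the file directly by bytes, but it is more accurate, because the record reader always returns complete lines.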
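
One detail of splitting rows with `String.split` is worth checking when adapting the code: the method treats its argument as a regular expression and, without a limit argument, drops trailing empty strings. The sketch below shows both effects and one way around them. The class `SplitDemo`, the helper `splitRow`, and the sample rows are hypothetical; the helper assumes the delimiter is meant literally, as with Hive's default field delimiter `\001`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {

    // Splits a row on a literal delimiter and keeps trailing empty fields.
    // Pattern.quote turns the delimiter into a literal regex; limit -1 disables
    // the default removal of trailing empty strings.
    static String[] splitRow(String row, String literalDelim) {
        return row.split(Pattern.quote(literalDelim), -1);
    }

    public static void main(String[] args) {
        String row = "1|alice||";

        // "|" alone is an empty regex alternation, so the row is cut between characters
        System.out.println(Arrays.toString(row.split("|")));

        // Escaped delimiter but no limit: the two trailing empty columns are dropped
        System.out.println(Arrays.toString(row.split("\\|")));

        // Literal delimiter with limit -1: all four columns survive
        System.out.println(Arrays.toString(splitRow(row, "|")));

        // Same helper with Hive's default field delimiter (Ctrl-A, \001)
        String hiveRow = "1\u0001alice\u0001\u0001";
        System.out.println(splitRow(hiveRow, "\u0001").length);
    }
}
```

Keeping trailing empty fields matters when the last columns of a row are empty: without the `-1` limit the array is shorter than the column list, and indexing it by column position throws `ArrayIndexOutOfBoundsException`.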
