File Operations on HDFS
Formatting HDFS
Command: user@namenode:hadoop$ bin/hadoop namenode -format
Starting HDFS
Command: user@namenode:hadoop$ bin/start-dfs.sh
Listing files on HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -ls
Using the Hadoop API
The helper below goes a step further than a plain listing: for each block of a given file, it returns the datanodes that host a replica of that block.
// Return, for each block of the given file, the hosts storing a replica of it.
public List<String[]> getFileBlockHosts(Configuration conf, String fileName) {
    try {
        List<String[]> list = new ArrayList<String[]>();
        FileSystem hdfs = FileSystem.get(conf);
        Path path = new Path(fileName);
        FileStatus fileStatus = hdfs.getFileStatus(path);
        // Query the block locations covering the whole file (offset 0 to length).
        BlockLocation[] blkLocations = hdfs.getFileBlockLocations(
                fileStatus, 0, fileStatus.getLen());
        for (BlockLocation blk : blkLocations) {
            list.add(blk.getHosts());
        }
        return list;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
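A hypothetical usage sketch for the helper above; the fs.default.name value and the file path are placeholders for your own cluster:

Configuration conf = new Configuration();
// Hypothetical namenode address; replace with your cluster's IP and port.
conf.set("fs.default.name", "hdfs://192.168.1.11:9000");
List<String[]> blocks = getFileBlockHosts(conf, "/user/yourUserName/test.txt");
if (blocks != null) {
    for (int i = 0; i < blocks.size(); i++) {
        System.out.print("block " + i + " is stored on:");
        for (String host : blocks.get(i)) {
            System.out.print(" " + host);
        }
        System.out.println();
    }
}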
Creating a directory on HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -mkdir /directoryName
Using the Hadoop API
Note that the snippet below actually creates a new file and returns a stream for writing to it; a directory-creation sketch follows after it.
// Create a new file on HDFS and return a stream for writing to it.
public FSDataOutputStream createFile(Configuration conf, String fileName) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        Path path = new Path(fileName);
        // create() overwrites an existing file by default; the caller
        // must close the returned stream when finished writing.
        FSDataOutputStream outputStream = hdfs.create(path);
        return outputStream;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
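Since the -mkdir command above creates a directory rather than a file, here is a minimal sketch of the matching API call, FileSystem.mkdirs (the helper name is ours; mkdirs behaves like mkdir -p):

// Create a directory (including any missing parents) on HDFS.
public boolean makeDirectory(Configuration conf, String dirName) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        // mkdirs() creates parent directories as needed and returns true on success.
        return hdfs.mkdirs(new Path(dirName));
    } catch (IOException e) {
        e.printStackTrace();
    }
    return false;
}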
Uploading a file to HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -put fileName /user/yourUserName/
Using the Hadoop API
// Upload a local file to HDFS.
public void putFile(Configuration conf, String srcFile, String dstFile) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        Path srcPath = new Path(srcFile);
        Path dstPath = new Path(dstFile);
        // Copy from the local filesystem to HDFS; the local source is kept.
        hdfs.copyFromLocalFile(srcPath, dstPath);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
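A quick, hypothetical call to the upload helper above (the namenode address and both paths are placeholders):

Configuration conf = new Configuration();
// Hypothetical namenode address; replace with your cluster's IP and port.
conf.set("fs.default.name", "hdfs://192.168.1.11:9000");
// Upload a local file into the user's home directory on HDFS.
putFile(conf, "/home/user/qq.txt", "/user/yourUserName/qq.txt");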
Exporting data from HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -cat foo
Using the Hadoop API
// Read a file from HDFS and print its contents to standard output.
public void readFile(Configuration conf, String fileName) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        FSDataInputStream dis = hdfs.open(new Path(fileName));
        // Stream the file to stdout in 4 KB chunks; false = do not auto-close.
        IOUtils.copyBytes(dis, System.out, 4096, false);
        dis.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
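The -cat command streams a file to the console; to export it to the local filesystem instead, FileSystem also offers copyToLocalFile. A minimal sketch along the lines of putFile above (helper name and paths are ours):

// Download a file from HDFS to the local filesystem.
public void getFile(Configuration conf, String srcFile, String dstFile) {
    try {
        FileSystem hdfs = FileSystem.get(conf);
        // Copy from HDFS to the local filesystem; the HDFS copy is kept.
        hdfs.copyToLocalFile(new Path(srcFile), new Path(dstFile));
    } catch (IOException e) {
        e.printStackTrace();
    }
}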
Stopping HDFS
Command: user@namenode:hadoop$ bin/stop-dfs.sh
Global HDFS status information
Command: bin/hadoop dfsadmin -report
This produces a global status report containing basic information about the HDFS cluster, along with some details for each individual machine.
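The same numbers are reachable programmatically once the FileSystem is a DistributedFileSystem. A minimal sketch, assuming the Hadoop 1.x HDFS API (getDataNodeStats and the DatanodeInfo getters):

// Print a per-datanode capacity summary, similar to `hadoop dfsadmin -report`.
public void printClusterReport(Configuration conf) {
    try {
        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // One DatanodeInfo per datanode in the cluster.
            for (DatanodeInfo node : dfs.getDataNodeStats()) {
                System.out.println(node.getHostName()
                        + "  capacity=" + node.getCapacity()
                        + "  used=" + node.getDfsUsed()
                        + "  remaining=" + node.getRemaining());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}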
Everything so far operates on HDFS locally, from Ubuntu with a working Hadoop environment. A client can also operate on HDFS remotely, for example from a Windows machine; the principle is essentially the same. As long as you have the IP address and port that the cluster's namenode exposes, you can access HDFS, as the following class demonstrates.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

/**
 * HDFS operations.
 * @author yujing
 */
public class Write {
    public static void main(String[] args) {
        try {
            uploadTohdfs();
            readHdfs();
            getDirectoryFromHdfs();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Upload a local file to HDFS. */
    public static void uploadTohdfs() throws FileNotFoundException, IOException {
        String localSrc = "D://qq.txt";
        String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        // Print a dot as a progress indicator while data is written.
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.println(".");
            }
        });
        // true: close both streams when the copy finishes.
        IOUtils.copyBytes(in, out, 4096, true);
        System.out.println("File uploaded successfully");
    }

    /** Read a file from HDFS and save it locally. */
    private static void readHdfs() throws FileNotFoundException, IOException {
        String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        FSDataInputStream hdfsInStream = fs.open(new Path(dst));
        OutputStream out = new FileOutputStream("d:/qq-hdfs.txt");
        byte[] ioBuffer = new byte[1024];
        int readLen = hdfsInStream.read(ioBuffer);
        while (-1 != readLen) {
            out.write(ioBuffer, 0, readLen);
            readLen = hdfsInStream.read(ioBuffer);
        }
        System.out.println("File read successfully");
        out.close();
        hdfsInStream.close();
        fs.close();
    }

    /**
     * Append content to the end of a file on HDFS. Note: appends must be
     * enabled in hdfs-site.xml with
     * <property><name>dfs.support.append</name><value>true</value></property>
     */
    private static void appendToHdfs() throws FileNotFoundException,
            IOException {
        String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        FSDataOutputStream out = fs.append(new Path(dst));
        // Write the payload once to the end of the file.
        byte[] bytes = "zhangzk add by hdfs java api".getBytes();
        out.write(bytes, 0, bytes.length);
        out.close();
        fs.close();
    }

    /** Delete a file or directory from HDFS. */
    private static void deleteFromHdfs() throws FileNotFoundException,
            IOException {
        String dst = "hdfs://192.168.1.11:9000/usr/yujing";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        // true: delete directories recursively.
        fs.delete(new Path(dst), true);
        fs.close();
    }

    /** List the files and directories under an HDFS path. */
    private static void getDirectoryFromHdfs() throws FileNotFoundException,
            IOException {
        String dst = "hdfs://192.168.1.11:9000/usr/yujing";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        FileStatus[] fileList = fs.listStatus(new Path(dst));
        for (FileStatus file : fileList) {
            System.out.println("name: " + file.getPath().getName()
                    + "\t\tsize: " + file.getLen());
        }
        fs.close();
    }
}
Finally, the cluster can be inspected through its web interface: the NameNode UI at http://namenodeIP:50070 shows overall HDFS information and lets us browse the files we uploaded to HDFS, while http://namenodeIP:50030 serves the JobTracker's MapReduce status page.