Filtering paths in HDFS with the PathFilter class

退役的龙弟弟

Blog category:

hadoop

1. Define a class that implements the PathFilter interface

package com.ru.hadoop.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

/**
 * File path filter.
 * @author nange
 *
 */
public class MyFilePathFileter implements PathFilter{
	// A path is accepted only if its full path string contains this substring
	private String fileName;
	
	public MyFilePathFileter(String fileName){
		this.fileName = fileName;
	}

	/**
	 * @param path the file path, e.g. hdfs://localhost:9000/hdfs/test/wordcount/in/word.txt
	 * @return true if the path string contains fileName
	 */
	@Override
	public boolean accept(Path path) {
		// Match against the full path string, not only the file name.
		boolean res = path.toString().contains(fileName);
		System.out.println("path = " + path + ", filter result: " + res);
		return res;
	}

}
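
Because accept() tests the full path string rather than only the last path component, the choice of substring matters. The sketch below mirrors that check on plain strings, without any Hadoop dependency. The class name SubstringMatchSketch and the sample file other.txt are made up for illustration. It suggests why a value such as "in/word" is a safer choice than "word" for the example directory: the directory name wordcount already contains "word".

```java
// Stand-alone sketch (no Hadoop classes): mirrors the substring test used in accept().
// SubstringMatchSketch and the sample paths are hypothetical, for illustration only.
class SubstringMatchSketch {

    // Same check as accept(): true when the full path string contains the needle.
    static boolean matches(String pathString, String needle) {
        return pathString.contains(needle);
    }

    public static void main(String[] args) {
        String base = "hdfs://localhost:9000/hdfs/test/wordcount/in/";
        // "in/word" must appear literally, so word.txt passes and other.txt does not.
        System.out.println(matches(base + "word.txt", "in/word"));   // true
        System.out.println(matches(base + "other.txt", "in/word"));  // false
        // "word" also occurs in the directory name "wordcount", so other.txt passes too.
        System.out.println(matches(base + "other.txt", "word"));     // true
    }
}
```

If only the file name should be inspected, Path.getName() returns the last path component and could be tested instead of path.toString().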

2. Use the globStatus() method provided by FileSystem to filter file paths

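	/**
	 * Filters file paths.
	 * FileSystem provides the globStatus() method for filtering file paths. The path must belong to
	 * the file system that fs refers to (HDFS here). fs is presumably a FileSystem field initialized
	 * elsewhere in the class; the method also needs java.io.IOException and
	 * org.apache.hadoop.fs.FileStatus, FileUtil and Path to be imported.
	 *
	 * @param in a path pattern with wildcards, e.g. hdfs://localhost:9000/hdfs/test/wordcount/in/*
	 * @return the matching paths joined by ",", or null if none match
	 * @throws IOException if the file system cannot be accessed
	 */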
	public String filePaths(String in) throws IOException{
		StringBuilder sb = new StringBuilder();
		// globStatus() returns an array of FileStatus objects for all files that match the path pattern and pass the filter, sorted by path.
		FileStatus[] fss = fs.globStatus(new Path(in), new MyFilePathFileter("in/word"));
		Path[] paths = FileUtil.stat2Paths(fss);
		if(paths != null){
			for(Path path : paths){
				sb.append(path.toString() + ",");
			}
		}
		// Strip the trailing comma; no comma means nothing matched.
		int index = sb.lastIndexOf(",");
		if(index != -1){
			String result = sb.substring(0, index);
			System.out.println("Filtered file paths: " + result);
			return result;
		}
		
		return null;
	}
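
The append-then-trim step above can also be written with String.join, which is available from Java 8 onward. The following is a minimal sketch on plain strings, not the author's code. joinPaths and PathJoinSketch are hypothetical names, and the helper keeps the original behavior of returning null when nothing matched.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins path strings with commas, as filePaths() does.
class PathJoinSketch {

    // Returns "a,b,c" for a non-empty list and null for a null or empty one,
    // mirroring the null return of filePaths() when no path passes the filter.
    static String joinPaths(List<String> paths) {
        if (paths == null || paths.isEmpty()) {
            return null;
        }
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "hdfs://localhost:9000/hdfs/test/wordcount/in/word.txt",
                "hdfs://localhost:9000/hdfs/test/wordcount/in/word2.txt");
        // Prints the two paths separated by a single comma, with no trailing comma.
        System.out.println(joinPaths(paths));
    }
}
```

With Hadoop Path objects, the same idea would apply after converting each Path with toString().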

3. Multiple input paths for a job

fileInPaths: a string of paths separated by ",", for example: hdfs://localhost:9000/hdfs/test/wordcount/in/word.txt,hdfs://localhost:9000/hdfs/test/wordcount/in/word2.txt

FileInputFormat.addInputPaths(job, fileInPaths); // multiple input paths
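
Since all paths are packed into one comma-separated string, it can help to check what that string contains before handing it to the job. The sketch below splits such a string back into individual entries using only the JDK. splitPaths and InputPathListSketch are hypothetical names, not part of Hadoop, and the helper assumes that no path itself contains a comma.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a comma-separated input path string.
class InputPathListSketch {

    // Splits on "," and drops blank entries; assumes paths contain no commas.
    static List<String> splitPaths(String fileInPaths) {
        List<String> result = new ArrayList<String>();
        if (fileInPaths == null) {
            return result;
        }
        for (String part : fileInPaths.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String fileInPaths = "hdfs://localhost:9000/hdfs/test/wordcount/in/word.txt,"
                + "hdfs://localhost:9000/hdfs/test/wordcount/in/word2.txt";
        // Prints one input path per line.
        for (String p : splitPaths(fileInPaths)) {
            System.out.println(p);
        }
    }
}
```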

2014-04-24 14:58