Sunday, July 27, 2014

Add and read a cache file in a MapReduce program (2.2.0 API)

In MapReduce programming,
if your mappers or reducers need to read the contents of a file,
you need to
1. put that file on HDFS
2. register that file for localization through the MapReduce Job API so that the mappers or reducers can access it

The following Java code does this:

In the main function:

Job job = Job.getInstance(conf, "jobname");
job.addCacheFile(new Path("hdfs://ip:port/folder/filename").toUri());

In the setup method of the mapper (or reducer):

// URIs of the files registered with job.addCacheFile in the main function
URI[] uris = context.getCacheFiles();
Path path = new Path(uris[0].toString());

Now the contents of the file can be read through this path.
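
As a rough sketch of that reading step, the helper below loads a text file line by line. The class name CacheFileReader and the method readLines are made up for illustration, and the file is assumed to be UTF-8 text. The parsing part is plain JDK; the Hadoop calls that would supply the stream inside setup (FileSystem.get and fs.open) appear only in the comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hadoop API.
class CacheFileReader {
    // Reads every line of the stream into memory, assuming UTF-8 text.
    // Inside setup(), the stream would typically be obtained like this:
    //   FileSystem fs = FileSystem.get(uris[0], context.getConfiguration());
    //   InputStream in = fs.open(path);
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the stream) when done
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Loading the whole file into a list is only reasonable for small lookup files, which is the usual case for cache files; a large file would be better processed line by line inside the loop.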
In addition, if the main function calls job.addCacheFile to add multiple files, say

job.addCacheFile(new Path("hdfs://ip:port/folder/file1").toUri());
job.addCacheFile(new Path("hdfs://ip:port/folder/file2").toUri());
job.addCacheFile(new Path("hdfs://ip:port/folder/file3").toUri());
then file1's URI will be uris[0], file2's URI will be uris[1], and file3's URI will be uris[2].
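
If relying on array positions feels fragile, one alternative is to look the files up by name. The sketch below builds a map from each file's base name to its URI using only the JDK. The class name CacheFileIndex and the method byName are hypothetical, and the code assumes hierarchical URIs such as the hdfs:// ones above.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Hadoop API.
class CacheFileIndex {
    // Maps each cached file's base name (e.g. "file2") to its URI,
    // keeping the order in which the URIs appear in the array.
    // Assumes hierarchical URIs, for which getPath() is non-null.
    static Map<String, URI> byName(URI[] uris) {
        Map<String, URI> index = new LinkedHashMap<>();
        for (URI uri : uris) {
            String path = uri.getPath();
            // Base name is everything after the last '/'
            String name = path.substring(path.lastIndexOf('/') + 1);
            index.put(name, uri);
        }
        return index;
    }
}
```

In setup, something like CacheFileIndex.byName(context.getCacheFiles()).get("file2") would then return file2's URI regardless of the order in which the files were added. Two files with the same base name in different folders would collide in this map, so this only works when the names are unique.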
