If your mappers or reducers need to read the contents of a file, you need to:
1. Put the file on HDFS.
2. Register the file for localization through the MapReduce Job API, so that the mappers and reducers can access it.
The following Java code does this.
In the main function:
Job job = Job.getInstance(conf, "jobname");
job.addCacheFile(new Path("hdfs://ip:port/folder/filename").toUri());
In the setup method of the mapper (or reducer):
URI[] uris = context.getCacheFiles();
Path path = new Path(uris[0].toString());
Now the content of the file can be read.
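The post stops at obtaining the path, so here is a minimal sketch of the reading step. It assumes the cached file is plain text with one entry per line. The class CacheFileLoader and the method loadLines are hypothetical names, not part of Hadoop. In a real job the Reader would typically wrap the stream returned by path.getFileSystem(context.getConfiguration()).open(path); a StringReader stands in for it here so the sketch runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not a Hadoop class): loads a text file into a set of lines.
final class CacheFileLoader {

    // Reads every non-blank line from the reader, trimmed, keeping first-seen order.
    static Set<String> loadLines(Reader source) throws IOException {
        Set<String> lines = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in setup(), this Reader would be an InputStreamReader
        // around the stream opened from the cached Path.
        Set<String> lines = loadLines(new StringReader("apple\n\n banana \napple\n"));
        System.out.println(lines); // prints [apple, banana]
    }
}
```

Loading the file once in setup and keeping the result in a field means each call to map or reduce can consult it without reopening the file.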
In addition, if the main function calls job.addCacheFile several times to add multiple files, for example:
job.addCacheFile(new Path("hdfs://ip:port/folder/file1").toUri());
job.addCacheFile(new Path("hdfs://ip:port/folder/file2").toUri());
job.addCacheFile(new Path("hdfs://ip:port/folder/file3").toUri());
then file1's URI is uris[0], file2's URI is uris[1], and file3's URI is uris[2]; that is, the URIs come back in the order in which the files were added.
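Indexing by position ties the setup code to the order of the addCacheFile calls. As an alternative, the following sketch picks a cached file by its name instead. CacheFileFinder and findByName are hypothetical names, and the host and port in the example URIs are placeholders. The null check is there on the assumption that getCacheFiles() may return null when no file was added.

```java
import java.net.URI;

// Hypothetical helper (not a Hadoop class): selects a cache file URI by file name.
final class CacheFileFinder {

    // Returns the first URI whose last path segment equals fileName, or null if none matches.
    static URI findByName(URI[] uris, String fileName) {
        if (uris == null) {
            return null; // assumption: no cache files were registered
        }
        for (URI uri : uris) {
            String path = uri.getPath();
            if (path == null) {
                continue;
            }
            // The file name is whatever follows the last '/' in the path.
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.equals(fileName)) {
                return uri;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder URIs standing in for the array returned by context.getCacheFiles().
        URI[] uris = {
            URI.create("hdfs://namenode:8020/folder/file1"),
            URI.create("hdfs://namenode:8020/folder/file2"),
            URI.create("hdfs://namenode:8020/folder/file3"),
        };
        System.out.println(findByName(uris, "file2")); // prints hdfs://namenode:8020/folder/file2
    }
}
```

In setup, the result can be turned into a Path the same way as before, with new Path(uri.toString()).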
 