Running a Hadoop job without using JobConf

Question

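I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which has not been deprecated, still only supports methods that take a JobConf parameter.

Can someone point me to an example of Java code that submits a Hadoop map/reduce job using only the Configuration class (not JobConf), and that uses the mapreduce.lib.input package instead of mapred.input?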

zjffdu · Accepted Answer

Hope this helps:

    import java.io.File;
    import java.io.IOException;

    import org.apache.commons.io.FileUtils;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MapReduceExample extends Configured implements Tool {

        // Pass-through mapper that also increments a custom counter for every record.
        public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.getCounter("mygroup", "jeff").increment(1);
                context.write(key, value);
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            // Use the Configuration that ToolRunner populated. On older releases
            // that lack Job.getInstance, use new Job(getConf()) instead.
            Job job = Job.getInstance(getConf());
            job.setJarByClass(MapReduceExample.class);
            job.setMapperClass(MyMapper.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Report failure to the caller instead of always returning 0.
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // Hard-coded local paths for a quick test; the output directory must not exist yet.
            FileUtils.deleteDirectory(new File("data/output"));
            args = new String[] { "data/input", "data/output" };
            System.exit(ToolRunner.run(new MapReduceExample(), args));
        }
    }

Binary Nerd · Answer

I believe this tutorial shows how to get rid of the deprecated JobConf class using Hadoop 0.20.1.

dk. · Answer

Here is a good example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It is also more than two years old, and there is no official documentation describing the new API. Sad.

coderz · Answer

Try using Configuration and Job. Here is an example:

(Replace the Mapper, Combiner, and Reducer classes and the other settings with your own.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            if (args.length != 2) {
                System.err.println("Usage: <in> <out>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, "Word Count");

            // Set the jar by locating the one that contains this class.
            job.setJarByClass(WordCount.class);

            // Set Mapper, Combiner, Reducer (TokenizerMapper and IntSumReducer
            // are your own classes; they are not defined in this snippet).
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);

            /* Optional: set a custom Partitioner:
             * job.setPartitionerClass(MyPartitioner.class);
             */

            // Set the map output and final output key/value types.
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Set input and output paths.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Hadoop uses TextInputFormat and TextOutputFormat by default.
            // Any custom input/output class must extend InputFormat/OutputFormat.
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
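The driver above refers to TokenizerMapper and IntSumReducer without defining them. The following is only a sketch of what they might look like, modeled on the usual word-count example and written against the new org.apache.hadoop.mapreduce API. The holder class WordCountClasses and the sum helper are names made up for this illustration; in practice you would probably declare the two classes as static nested classes of WordCount so the unqualified names in the driver resolve.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical holder class, used only to keep this sketch in one file.
public class WordCountClasses {

    /** Emits (word, 1) for every whitespace-separated token in an input line. */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    /** Sums the counts for one word; suitable as both combiner and reducer. */
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            result.set(sum(values));
            context.write(key, result);
        }
    }

    /** Illustrative helper: adds up all the IntWritable values. */
    static int sum(Iterable<IntWritable> values) {
        int total = 0;
        for (IntWritable v : values) {
            total += v.get();
        }
        return total;
    }
}
```

Because addition is associative and commutative, the same class can be passed to both setCombinerClass and setReducerClass, which is what the driver does.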

Yatin · Answer

The old API had three ways to submit a job. One of them submits the job and returns a reference to the RunningJob, from which you can get the job's ID:

submitJob(JobConf) : only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions.

With the new API, how do I get a reference to the RunningJob and its ID, given that none of the methods return a RunningJob reference?

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html

Thanks
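As far as I can tell from the Job Javadoc linked above, in the new API the Job object itself takes over the role of RunningJob: submit() returns right after submission, getJobID() then gives the ID, and isComplete() / isSuccessful() can be polled. A rough sketch under those assumptions follows; the class name SubmitAndPoll, the formatProgress helper, and the five-second polling interval are made up for the illustration, and the job runs with the default identity mapper and reducer.

```java
import java.util.Locale;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: submits a job without blocking, then polls it.
public class SubmitAndPoll {

    /** Illustrative helper: renders progress fractions (0.0 to 1.0) as percentages. */
    static String formatProgress(float map, float reduce) {
        return String.format(Locale.ROOT, "map %.0f%%, reduce %.0f%%", map * 100, reduce * 100);
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "submit-and-poll");
        job.setJarByClass(SubmitAndPoll.class);
        // No mapper/reducer set: Hadoop falls back to the identity implementations.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Non-blocking counterpart of the old submitJob(JobConf).
        job.submit();

        // The ID is available once the job has been submitted.
        JobID id = job.getJobID();
        System.out.println("Submitted job " + id);

        // Poll the Job handle, much as one would poll a RunningJob in the old API.
        while (!job.isComplete()) {
            System.out.println(formatProgress(job.mapProgress(), job.reduceProgress()));
            Thread.sleep(5000);
        }
        System.exit(job.isSuccessful() ? 0 : 1);
    }
}
```

If you only need to block until the job finishes, waitForCompletion(true) does the submit and the polling for you; the explicit loop is useful mainly when you want the ID early or want to make your own scheduling decisions while the job runs.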