Calling a mapreduce job from a simple Java program

Question

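I have been trying to call a mapreduce job from a simple Java program in the same package. I refer to the mapreduce jar file in my Java program and call it using the runJar(String args[]) method, also passing the input and output paths for the mapreduce job. But the program didn't work.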

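How do I run such a program, where I just pass the input path, output path, and jar path to its main method? Is it possible to run a mapreduce job (jar) through it? I want to do this because I want to run several mapreduce jobs one after another, where my Java program will call each such job by referring to its jar file. If this is possible, I might as well use a simple servlet to make such calls and refer to the output files for graphing purposes.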

    import org.apache.hadoop.util.RunJar;
    import java.util.ArrayList;
    import java.util.List;

    /**
     * @author root
     */
    public class callOther {

        public static void main(String[] args) throws Throwable {
            // Arguments for RunJar: the job jar, then the input and output paths
            List<String> arg = new ArrayList<String>();
            String output = "/root/Desktp/output";
            arg.add("/root/NetBeansProjects/wordTool/dist/wordTool.jar");
            arg.add("/root/Desktop/input");
            arg.add(output);
            RunJar.main(arg.toArray(new String[0]));
        }
    }

Thomas Jungblut · Accepted Answer

Oh, please don't do it with runJar; the Java API is very good.

See how you can start a job from normal code:

    // create a configuration
    Configuration conf = new Configuration();
    // create a new job based on the configuration
    Job job = new Job(conf);
    // here you have to put your mapper class
    job.setMapperClass(Mapper.class);
    // here you have to put your reducer class
    job.setReducerClass(Reducer.class);
    // here you have to set the jar which is containing your
    // map/reduce class, so you can use the mapper class
    job.setJarByClass(Mapper.class);
    // key/value of your reducer output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // this is setting the format of your input, can be TextInputFormat
    job.setInputFormatClass(SequenceFileInputFormat.class);
    // same with output
    job.setOutputFormatClass(TextOutputFormat.class);
    // here you can set the path of your input
    SequenceFileInputFormat.addInputPath(job, new Path("files/toMap/"));
    // this deletes possible output paths to prevent job failures
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("files/out/processed/");
    fs.delete(out, true);
    // finally set the empty out path
    TextOutputFormat.setOutputPath(job, out);
    // this waits until the job completes and prints debug out to STDOUT or whatever
    // has been configured in your log4j properties.
    job.waitForCompletion(true);
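Since job.waitForCompletion(true) blocks until the job ends and returns true only on success, several jobs can be run one after another by calling it for each job in turn and stopping at the first failure. Here is a minimal sketch of that control flow, using only the JDK so it runs without a cluster. JobChain and Step are hypothetical names, not Hadoop classes, and each step is assumed to wrap one configured Job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: runs named steps in insertion order and stops at the first failure.
class JobChain {

    // Hypothetical interface; a real step would wrap job.waitForCompletion(true).
    interface Step {
        boolean run() throws Exception;
    }

    private final Map<String, Step> steps = new LinkedHashMap<>();

    JobChain add(String name, Step step) {
        steps.put(name, step);
        return this;
    }

    // Returns the name of the first step that failed or threw, or null if all succeeded.
    String runAll() {
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            try {
                if (!entry.getValue().run()) {
                    return entry.getKey();
                }
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                }
                return entry.getKey();
            }
        }
        return null;
    }
}
```

With real jobs, a step might look like chain.add("wordcount", () -> wordCountJob.waitForCompletion(true)), where wordCountJob is a Job configured as in the snippet above.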

If you are using an external cluster, you have to put the following information into your configuration:

    // this should be like defined in your mapred-site.xml
    conf.set("mapred.job.tracker", "jobtracker.com:50001");
    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

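This should be no problem as long as hadoop-core.jar is on your application container's classpath. But I think you should put some kind of progress indicator on your web page, because a Hadoop job may take minutes to hours to complete ;)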
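One way to get a progress indicator is to run the long job on a background thread and let the web page poll a shared progress value. The sketch below shows only that threading pattern with the JDK. BackgroundJob and Work are hypothetical names, and the work itself is a stand-in. With Hadoop's Job class, I believe the non-blocking counterpart is job.submit() followed by polling job.isComplete(), job.mapProgress() and job.reduceProgress(), but treat that mapping as an assumption to verify against your Hadoop version.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs work on a background thread and exposes its progress.
class BackgroundJob {

    // Hypothetical stand-in for the real job; it updates percentDone as it goes.
    interface Work {
        boolean run(AtomicInteger percentDone) throws Exception;
    }

    private final AtomicInteger percentDone = new AtomicInteger(0);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile Future<Boolean> result;

    void start(Work work) {
        Callable<Boolean> task = () -> work.run(percentDone);
        result = executor.submit(task);
        executor.shutdown(); // accept no further tasks; the submitted one still runs
    }

    // Safe to call from another thread, e.g. a status request from the web page.
    int percentDone() {
        return percentDone.get();
    }

    boolean isDone() {
        return result != null && result.isDone();
    }

    // Blocks until the work ends; returns false if it failed, threw, or was interrupted.
    boolean await() {
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

A servlet would typically keep the BackgroundJob in the session or application scope, return immediately after start(...), and answer later status requests from percentDone() and isDone().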

For YARN (> Hadoop 2)

For YARN, the following configuration needs to be set:

    // this should be like defined in your yarn-site.xml
    conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");
    // framework is now "yarn", should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    // like defined in hdfs-site.xml
    conf.set("fs.default.name", "hdfs://namenode.com:9000");

RS Software -Competency Team · Answer

Calling a MapReduce job from a Java web application (servlet)

You can call a MapReduce job from a web application using the Java API. Here is a small example of calling a MapReduce job from a servlet. The steps are given below.

Step 1: First create a MapReduce driver servlet class. Also develop the map and reduce classes. Here is a sample code snippet:

CallJobFromServlet.java

    public class CallJobFromServlet extends HttpServlet {

        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            Configuration conf = new Configuration();
            // Replace CallJobFromServlet.class with your servlet class
            Job job = new Job(conf, "CallJobFromServlet.class");
            job.setJarByClass(CallJobFromServlet.class);
            job.setJobName("Job Name");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(Map.class); // Replace Map.class with your Mapper class
            job.setNumReduceTasks(30);
            job.setReducerClass(Reducer.class); // Replace Reducer.class with your Reducer class
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // Job input path
            FileInputFormat.addInputPath(job, new Path("hdfs://localhost:54310/user/hduser/input/"));
            // Job output path
            FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:54310/user/hduser/output"));
            try {
                // Blocks until the job finishes
                job.waitForCompletion(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new ServletException(e);
            } catch (ClassNotFoundException e) {
                throw new ServletException(e);
            }
        }
    }

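Step 2: Place all the related jar files (Hadoop and application-specific jars) inside the lib folder of the web server (e.g. Tomcat). This is mandatory for accessing the Hadoop configuration (the Hadoop "conf" folder holds the configuration XML files, i.e. core-site.xml, hdfs-site.xml, etc.). Just copy the jars from the Hadoop lib folder to the web server (Tomcat) lib directory. The list of jar names is as follows: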

1. commons-beanutils-1.7.0.jar
2. commons-beanutils-core-1.8.0.jar
3. commons-cli-1.2.jar
4. commons-collections-3.2.1.jar
5. commons-configuration-1.6.jar
6. commons-httpclient-3.0.1.jar
7. commons-io-2.1.jar
8. commons-lang-2.4.jar
9. commons-logging-1.1.1.jar
10. hadoop-client-1.0.4.jar
11. hadoop-core-1.0.4.jar
12. jackson-core-asl-1.8.8.jar
13. jackson-mapper-asl-1.8.8.jar
14. jersey-core-1.8.jar
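If the web application later fails with ClassNotFoundException or NoClassDefFoundError, a quick check can tell whether the Hadoop classes are visible to the class loader the servlet runs under. This is a minimal sketch. ClasspathCheck is a hypothetical helper, and probing org.apache.hadoop.conf.Configuration is just one reasonable choice of class to test for.

```java
// Hypothetical helper: reports whether a class can be found by a given class loader.
class ClasspathCheck {

    // Returns true if the named class is loadable; the class is not initialized.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Inside a servlet one might call ClasspathCheck.isVisible("org.apache.hadoop.conf.Configuration", Thread.currentThread().getContextClassLoader()) and log the result at startup.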

Step 3: Deploy your web application to the web server (in the "webapps" folder for Tomcat).

Step 4: Create a JSP file and link the servlet class (CallJobFromServlet.java) in the form's action attribute. Here is a sample code snippet:

Index.jsp

    <form id="trigger_hadoop" name="trigger_hadoop" action="./CallJobFromServlet" method="post">
        <span class="back">Trigger Hadoop Job from Web Page</span>
        <input type="submit" name="submit" value="Trigger Job" />
    </form>

faridasabry · Answer

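Another way, for jobs already implemented in the Hadoop examples. This also requires importing the Hadoop jars. Then just call the static main function of the desired job class with the appropriate String[] of arguments.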
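When the job class is known at compile time, this is just a direct call such as SomeJob.main(args). If the class name is only known at run time, the same call can be made through reflection. The sketch below is an illustration under that assumption. MainInvoker is a hypothetical helper, and EchoJob is a dummy stand-in for a real job driver so the example runs without Hadoop.

```java
import java.lang.reflect.Method;

// Hypothetical helper: calls the public static main(String[]) of a class by name.
class MainInvoker {

    static void invokeMain(String className, String... args) {
        try {
            Method main = Class.forName(className).getMethod("main", String[].class);
            // The cast passes the whole array as the single String[] parameter.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run " + className, e);
        }
    }
}

// Dummy stand-in for a job driver; it only records the arguments it received.
class EchoJob {
    static String lastArgs = "";

    public static void main(String[] args) {
        lastArgs = String.join(",", args);
    }
}
```

For example, MainInvoker.invokeMain(EchoJob.class.getName(), "input", "output") runs EchoJob.main with those two paths. One caveat to keep in mind: many example drivers end by calling System.exit, which would also terminate the calling program.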

Jiang Libo · Answer

Because map and reduce run on different machines, all your referenced classes and jars must be moved from machine to machine.

If you have a packaged jar and run it from your desktop, @ThomasJungblut's answer is OK. But if you run it in Eclipse by right-clicking your class and choosing Run, it doesn't work.

Instead of:

job.setJarByClass(Mapper.class);

Use:

job.setJar("build/libs/hdfs-javac-1.0.jar");

At the same time, your jar's manifest must include the Main-Class property, which names your main class.

For Gradle users, you can put these lines in build.gradle:

    jar {
        manifest {
            attributes("Main-Class": mainClassName)
        }
    }
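To confirm that the built jar really carries the Main-Class property, the manifest can be read back with the JDK's java.util.jar classes. This is a minimal sketch. ManifestCheck is a hypothetical helper, and the jar path you pass in is whatever your build produces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reads the Main-Class attribute from a jar's manifest.
class ManifestCheck {

    // Returns the Main-Class value, or null if the jar has no manifest or no such attribute.
    static String mainClassOf(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the jar used above, the call would be ManifestCheck.mainClassOf("build/libs/hdfs-javac-1.0.jar"); a null result means the manifest block was not applied.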

techlearner · Answer

You can do it this way:

    public class Test {

        public static void main(String[] args) throws Exception {
            // YourJob must implement org.apache.hadoop.util.Tool
            int res = ToolRunner.run(new Configuration(), new YourJob(), args);
            System.exit(res);
        }
    }

Chris White · Answer

I can't think of many ways to do this without involving the hadoop-core library (or indeed, as @ThomasJungblut said, why you would want to).

But if you absolutely must, you could set up an Oozie server with a workflow for your job, and then use the Oozie web service interface to submit the workflow to Hadoop.

http://yahoo.github.com/oozie/
http://yahoo.github.com/oozie/releases/2.3.0/WorkflowFunctionalSpec.html#a11.3.1_Job_Submission

Again, this seems like a lot of work for something that could be resolved using Thomas's answer (include the hadoop-core jar and use his code snippet).