
ClassNotFoundException: org.apache.spark.SparkConf with Spark on Hive

I am trying to use Spark as the Hive execution engine, but I get the error below. Spark 1.5.0 is installed, and I am using Hive 1.1.0 with Hadoop version 2.7.0.

The hive_emp table was created in Hive as an ORC-format table.

hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20150921072727_feba8363-258d-4d0b-8976-662e404bca88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:140)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org/apache/spark/SparkConf

I have also set the Spark path and the execution engine in the Hive shell.

hduser@ubuntu:~$ spark-shell
    Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> exit;
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.593 seconds
hive (Koushik)> set spark.home=/usr/local/src/spark;

I have also created a .hiverc as shown below.

hduser@ubuntu:/usr/lib/hive/conf$ cat .hiverc
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
SET hive.execution.engine=spark;

Details of the error with DEBUG mode enabled are shown below:

hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.625 seconds
hive (Koushik)> set hive --hiveconf hive.root.logger=DEBUG
              > ;
hive (Koushik)> set hive.execution.engine=spark;
hive (Koushik)> desc hive_emp;
OK
col_name    data_type   comment
empid                   int                                         
empnm                   varchar(50)                                 
deptid                  int                                         
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive (Koushik)> select * from hive_emp;
OK
hive_emp.empid  hive_emp.empnm  hive_emp.deptid
Time taken: 1.689 seconds
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20151015112525_c96a458b-34f8-42ac-ab11-52c32479a29a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.<init>(LocalHiveSparkClient.java:85)
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.getInstance(LocalHiveSparkClient.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
hive (Koushik)> 

I ran the insert above twice, and it failed both times. Please find the hive.log generated today, attached here: hive.log

13
Koushik Chandra

I faced the same issue on my Ubuntu 14.04 VirtualBox. The steps I followed to fix it are listed below (a sketch for making the settings persistent follows the list):

  1. hive> set spark.home=/usr/local/spark;

  2. hive> set spark.master=local;

  3. hive> SET hive.execution.engine=spark;

  4. Add the spark-assembly jar file as shown below:

    hive> ADD JAR /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;
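
If this works for you, one way to avoid retyping the settings in every session is to put them in the .hiverc shown earlier in the question. This is only a minimal sketch using the paths from the steps above; the spark-assembly file name must match the jar you actually have:

    -- in $HIVE_CONF_DIR/.hiverc, executed at every Hive CLI start
    SET hive.execution.engine=spark;
    set spark.home=/usr/local/spark;
    set spark.master=local;
    ADD JAR /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;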

1
Vinkal

The reason for this error is that Hive cannot find the Spark assembly jar.

Either export SPARK_HOME=/usr/local/src/spark, or add the spark-assembly jar to the Hive lib folder; that will resolve the issue.
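
For example, either of the following should do it. This is a rough sketch assuming the paths that appear in the question (Spark under /usr/local/src/spark, Hive under /usr/lib/hive, Spark 1.5.0 assembly jar), so adjust them to your installation:

    # Option 1: point Hive at the Spark installation (e.g. in ~/.bashrc or hive-env.sh)
    export SPARK_HOME=/usr/local/src/spark

    # Option 2: put the spark-assembly jar on Hive's classpath directly
    cp /usr/local/src/spark/lib/spark-assembly-1.5.0-hadoop2.6.0.jar /usr/lib/hive/lib/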

1
Arvindkumar