How to use Hive Warehouse Connector in HDP 3.1 and later

In older versions of HDP (2.6.x), Spark used the same catalog as Hive, so all databases and tables resided in a single catalog that both Spark and Hive could access.

Starting with HDP 3.0, the Spark catalog and the Hive catalog are separate:
– A table created by Spark resides in the Spark catalog
– A table created by Hive resides in the Hive catalog
– The HWC API can be used to access any Hive catalog table from Spark
– The Spark API can be used to access any Spark catalog table
– LLAP must be used to read ACID tables
– LLAP is not needed to write to an ACID table (a write example is shown below)

To connect spark-shell with HWC, run the spark-shell command as follows:

Spark Shell using HWC

spark-shell --master yarn \
  --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c2198-node2.squadron.support.hortonworks.com:2181,c2198-node3.squadron.support.hortonworks.com:2181,c2198-node4.squadron.support.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
  --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c2198-node3.squadron.support.hortonworks.com:9083" \
  --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" \
  --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" \
  --conf "spark.hadoop.hive.zookeeper.quorum=c2198-node2.squadron.support.hortonworks.com:2181,c2198-node3.squadron.support.hortonworks.com:2181,c2198-node4.squadron.support.hortonworks.com:2181" \
  --conf spark.security.credentials.hiveserver2.enabled=false \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar
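
Setting the HWC properties in spark-defaults.conf (optional)

Rather than passing every --conf on the command line, the same keys can be set once in spark-defaults.conf so every session picks them up. This is a sketch using the example cluster above; the hostnames, staging directory, and jar version are specific to that cluster and must be adjusted for yours:

spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c2198-node2.squadron.support.hortonworks.com:2181,c2198-node3.squadron.support.hortonworks.com:2181,c2198-node4.squadron.support.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri thrift://c2198-node3.squadron.support.hortonworks.com:9083
spark.datasource.hive.warehouse.load.staging.dir /tmp/
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.hadoop.hive.zookeeper.quorum c2198-node2.squadron.support.hortonworks.com:2181,c2198-node3.squadron.support.hortonworks.com:2181,c2198-node4.squadron.support.hortonworks.com:2181
spark.security.credentials.hiveserver2.enabled false
spark.jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar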

Spark Shell for normal Spark SQL

spark-shell

Once you are connected to spark-shell, you can use the commands below to access the Hive catalog through the HWC API.

Accessing hive catalog using HWC API

import com.hortonworks.hwc.HiveWarehouseSession

// Build a HiveWarehouseSession on top of the active SparkSession
val hive = HiveWarehouseSession.session(spark).build()

// execute() is for catalog/DDL-style statements; executeQuery() runs queries through LLAP
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
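
Writing to a Hive ACID table using HWC

As noted above, LLAP is not required for writes. The following is a minimal sketch: it assumes a DataFrame df read from the employee table above and a destination Hive table named employee_copy (the destination name is hypothetical):

// Read from the Hive catalog, then write the result back through HWC
val df = hive.executeQuery("select * from employee")
df.write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "employee_copy")
  .save()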

Accessing spark catalog using Spark API

// In Spark 2.x on HDP 3.x, the built-in SparkSession (spark) replaces the deprecated HiveContext
spark.sql("create database cldrdb")
spark.sql("create table cldrdb.test_spark(a int) stored as orc")
spark.sql("use cldrdb")
val result = spark.sql("show tables")
result.show()
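
Verifying the Spark catalog table

As a quick sanity check, following the example above, you can insert a row into the new Spark catalog table and read it back:

// Insert a sample row, then read the table back from the Spark catalog
spark.sql("insert into cldrdb.test_spark values (1)")
spark.table("cldrdb.test_spark").show()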
