Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.
This project shows how to use Spark as a cloud-based SQL engine and expose your big data as a JDBC/ODBC data source via the Spark Thrift Server.
Traditional relational database engines hit scalability problems with big data, and so a number of SQL-on-Hadoop frameworks evolved, such as Hive, Cloudera Impala, Presto, etc. These frameworks are essentially cloud-based solutions, and they all come with their own advantages and limitations. This project demos how SparkSQL comes across as one more SQL-on-Hadoop framework.
The following picture illustrates how Apache Spark can be used as a SQL-on-Hadoop framework to serve your big data as a JDBC/ODBC data source via the Spark Thrift Server:
Beeline CLI, JDBC, ODBC, or BI tools like Tableau connect to the Spark Thrift Server, which uses the SparkSQL engine to access Hive or Spark temp tables and run SQL queries on the Apache Spark framework.
This project demos two things:

1. How to run Spark as a cloud-based SQL engine: build and run the project with `mvn clean install` and `spark-submit --class MainApp cloud-based-sql-engine-using-spark.jar`. That's it! (A sketch of what this `MainApp` entry point might look like appears at the end of this section.)
2. How to query your data: the application registers a sample `records` table with SparkSQL. To access it, first connect to the Spark Thrift Server. Once the connection is established, just like with HiveServer2, you can access Hive or Spark temp tables and run SQL queries on the Apache Spark framework. I'll show two ways to do this:
1. Beeline: connect to the Spark Thrift Server from the Beeline CLI and run queries, just as you would against HiveServer2:

```
$> beeline
Beeline version 2.1.1-amzn-0 by Apache Hive

// Connect to spark thrift server..
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:

// run your sql queries and access data..
jdbc:hive2://localhost:10000> show tables;
```
2. Java JDBC client: please refer to the `TestThriftClient.java` class, which demos the same. (A minimal illustrative client is also sketched at the end of this section.)
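For completeness, here is a minimal sketch of what the `MainApp` entry point could look like. This is an illustration rather than the project's actual code: the class name, data source, and file path are placeholder assumptions. The key idea is `HiveThriftServer2.startWithContext`, which embeds the Thrift Server in the same Spark application so that JDBC/ODBC clients can see the temp view registered here:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

public class MainAppSketch {
    public static void main(String[] args) {
        // Spark session with Hive support, shared with the Thrift Server below.
        SparkSession spark = SparkSession.builder()
                .appName("cloud-based-sql-engine-using-spark")
                .enableHiveSupport()
                .getOrCreate();

        // Load some data and register it as the "records" temp table.
        // The JSON path is a placeholder; the real MainApp builds this
        // table from its own data source.
        Dataset<Row> records = spark.read().json("path/to/records.json");
        records.createOrReplaceTempView("records");

        // Start the Thrift Server inside this application so that
        // JDBC/ODBC clients see the temp view registered above.
        HiveThriftServer2.startWithContext(spark.sqlContext());
    }
}
```

Note that a Thrift Server launched separately via `sbin/start-thriftserver.sh` runs in its own Spark application and would not see temp views registered here; embedding it with `startWithContext` is what makes the application's tables visible to clients.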
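Similarly, a minimal JDBC client along the lines of `TestThriftClient.java` might look like the sketch below. The class name and query are illustrative assumptions; it assumes the Hive JDBC driver (`org.apache.hive:hive-jdbc`) is on the classpath and the Thrift Server is listening on the default port 10000:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SimpleThriftClient {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift Server speaks the HiveServer2 protocol,
        // so the standard Hive JDBC driver is used.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Empty username/password for an unsecured local setup.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000", "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM records")) {
            while (rs.next()) {
                // Print the first column of each row.
                System.out.println(rs.getString(1));
            }
        }
    }
}
```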