Write ETL using your favorite SQL dialects
Sharp ETL is an ETL framework that simplifies writing and executing ETL jobs: you simply write SQL workflow files. The SQL workflow file format combines your favorite SQL dialect with just a little bit of configuration.
Start a local MySQL instance:

```shell
docker run --name sharp_etl_db -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=sharp_etl mysql:5.7
```

Build the jars:

```shell
./gradlew buildJars -PscalaVersion=2.12 -PsparkVersion=3.3.0 -PscalaCompt=2.12.15
```
Take a look at the bundled `hello_world.sql` workflow:

```shell
cat spark/src/main/resources/tasks/hello_world.sql
```

You will see the following contents:
```sql
-- workflow=hello_world
-- loadType=incremental
-- logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables
SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console
SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
```
Submit the job:

```shell
spark-submit --master local --class com.github.sharpdata.sharpetl.spark.Entrypoint spark/build/libs/sharp-etl-spark-standalone-3.3.0_2.12-0.1.0.jar single-job --name=hello_world --period=1440 --default-start-time="2022-07-01 00:00:00" --once --local
```
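The `--period=1440` (minutes) and `--default-start-time` flags drive the `timewindow` log-driven incremental load: conceptually, each run processes the next not-yet-elapsed time window after the last successful one. A hedged sketch of that windowing arithmetic (names and behavior are illustrative assumptions, not the framework's internals):

```python
from datetime import datetime, timedelta

def next_window(last_end, period_minutes, now=None):
    """Return the next [start, end) window to process, or None if the
    window has not fully elapsed yet. Illustrative sketch only: on the
    very first run, last_end would be the --default-start-time value.
    """
    now = now or datetime.now()
    start = last_end
    end = start + timedelta(minutes=period_minutes)
    # Only process windows that lie entirely in the past.
    return (start, end) if end <= now else None
```

With `--period=1440` each window is one day, so the first run starting from `2022-07-01 00:00:00` would cover `[2022-07-01, 2022-07-02)`; `--once` stops after a single window instead of catching up to the present.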
And you will see output like this:

```text
== Physical Plan ==
*(1) Project [SUCCESS AS RESULT#17167]
+- Scan OneRowRelation[]

root
 |-- RESULT: string (nullable = false)

+-------+
|RESULT |
+-------+
|SUCCESS|
+-------+
```
The compatible versions of Spark and Scala are as follows:

| Spark | Scala |
|---|---|
| 2.3.x | 2.11 |
| 2.4.x | 2.11 / 2.12 |
| 3.0.x | 2.12 |
| 3.1.x | 2.12 |
| 3.2.x | 2.12 / 2.13 |
| 3.3.x | 2.12 / 2.13 |
| 3.4.x | 2.12 / 2.13 |
| 3.5.x | 2.13 |
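When scripting builds for several Spark targets (e.g. choosing the `-PscalaVersion` flag for `buildJars`), the matrix above can be encoded as a simple lookup. A small helper along these lines (an illustrative sketch; the function name is made up, not part of Sharp ETL):

```python
# Spark-to-Scala compatibility matrix, transcribed from the table above.
COMPAT = {
    "2.3": ["2.11"],
    "2.4": ["2.11", "2.12"],
    "3.0": ["2.12"],
    "3.1": ["2.12"],
    "3.2": ["2.12", "2.13"],
    "3.3": ["2.12", "2.13"],
    "3.4": ["2.12", "2.13"],
    "3.5": ["2.13"],
}

def scala_versions_for(spark_version):
    """Return the Scala versions compatible with a Spark release,
    keyed by the major.minor prefix, e.g. '3.3.0' -> ['2.12', '2.13']."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return COMPAT.get(major_minor, [])
```

For example, `scala_versions_for("3.3.0")` returns both `2.12` and `2.13`, matching the `-PscalaVersion=2.12` choice in the build command above.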