A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
Groovy ETL (Getl) - open source project on Groovy, developed since 2012 to automate loading and processing data from different sources.
IBM DB2, FireBird, H2 Database, Hadoop Hive, Cloudera Impala, MS SQLServer, MySql, IBM Netezza, NetSuite, Oracle, PostgreSql, SAP Hana, SQLite, Micro Focus Vertica.
CSV, MS Excel, Json, XML, Yaml, DBF.
Kafka, SalesForce, WebServices.
Local file systems Windows and Linux, FTP, SFTP, Hadoop HDFS.
Registration of connections to Oracle and Vertica:
package packet1
import groovy.transform.BaseScript
import groovy.transform.Field
import getl.lang.Getl
@BaseScript Getl main
oracleConnection('ora', true) {
connectHost = 'oracle-host'
connectDatabase = 'oradb'
login = 'user'
password = 'password'
}
verticaConnection('ver', true) {
connectHost = 'vertica-host1'
connectDatabase = 'verdb'
extended.backupservernode = 'vertica-host2,vertica-host3'
login = 'user'
password = 'password'
}
Creating a table in Vertica based on Oracle table:
oracleTable('ora:table1', true) {
useConnection oracleConnection('ora')
schemaName = 'user'
tableName = 'table1'
}
verticaTable('ver.stage:table1', true) {
useConnection verticaConnection('ver')
schemaName = 'stage'
tableName = 'table1'
if (!exists) {
setOracleFields oracleTable('ora:table1')
create()
}
}
verticaTable('ver.work:table1', true) {
useConnection verticaConnection('ver')
schemaName = 'public'
tableName = 'table1'
if (!exists)
createLike verticaTable('ver.stage:table1')
}
Copying all rows from the Oracle table to the Vertica table by uploading to an temporary csv file and loading it through the COPY statement:
etl.copyRows(oracleTable('ora:table1'), verticaTable('ver.stage:table1')) { bulkLoad = true }
Copying table data from a staging area into a working one:
sql {
useConnection verticaConnection('ver')
exec '''
/*:count_insert*/
INSERT INTO public.table1 SELECT * FROM stage.table1;
IF ({count_insert} > 0);
ECHO Copied {count_insert} rows successfull.
END IF;
COMMIT;
'''
}
Truncate staging table and purge working table:
verticaTable('ver.stage:table1') { truncate() }
verticaTable('ver.prod:table1') { purgeTable() }
Unloading rows from the Vertica table to a csv file according to a specified condition:
csv('file1') {
useConnection csvConnection { path = '/tmp/unload' }
fileName = 'data.table1'
extenstion = 'csv'
fieldDelimiter = '|'
codePage = 'utf-8'
header = true
}
verticaTable('ver.work:table1') {
readOpts {
where = 'field1 = CURRENT_DATE'
order = ['field2', 'field3']
}
}
etl.copyRows(verticaTable('ver.work:table1'), csv('file1'))
Copying a file via ssh to another server:
files('unload_files', true) {
rootPath = '/tmp/unload'
}
sftp('ssh1', true) {
host = 'ssh-host'
login = 'user'
password = 'password'
rootPath = '/csv'
}
fileman.copier(files('unload_files'), sftp('ssh1')) {
useSourcePath { mask = 'data.{table}.csv' }
}