Data processing & ETL framework for Ruby
This is a maintenance release with code clean-up. Regular use should see no impact.
- If your job declares `config :kiba, runner: Kiba::Runner`, be aware that this legacy runner has been removed in #96. The upgrade path is to remove this config line and let Kiba use the more modern `Kiba::StreamingRunner`, which has been the default since Kiba v3.0.0 (see #83 for context) and is normally fully backward-compatible.
- The `kiba` shell command had been deprecated and replaced by a simple stub printing a warning to STDERR. It is now removed for good.
- `Kiba.run(job)` can now take a block to define the job, instead of a job parameter. See #94 for more details.
This release adds support for Ruby 2.7 and Ruby 3.
See #93 for detailed information and analysis.
`kiba` CLI is deprecated

The `kiba` CLI is deprecated in favor of the more modern `Kiba.parse` programmatic API [#74, #81].
The programmatic API supports everything the "command" mode did, plus much more, and encourages better coding practices.
A temporary `kiba-legacy-cli` gem is available (https://github.com/thbar/kiba-legacy-cli) to ease migration, but the recommendation is to migrate and use `Kiba.parse` directly, as described in the current documentation.
`StreamingRunner`

Introduced in v2.0.0 [#44] so that a transform can yield N rows for one input row, and improved in v2.5.0 [#57] to help implement "buffering transforms", the `StreamingRunner` is now the default runner for processing jobs [#83].
This change is expected to be backward compatible and will help with reusability & features of ETL components.
A transform's `close` can now yield rows (this requires the new `StreamingRunner`, see the v2.0.0 release notes).

This lets component implementers support new types of scenarios (e.g. `ParallelTransform`, or batch SQL lookups).

See #57 for more background & explanations.
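As a rough illustration of a "buffering transform", the sketch below groups incoming rows into batches and flushes the remainder from `close`. `BatchingTransform` and the `drive` helper are hypothetical names. `drive` is only a stand-in that mimics how a streaming runner might call `process` and `close`; it is not Kiba's actual implementation.

```ruby
# Hypothetical buffering transform: emits rows in batches of `size`.
class BatchingTransform
  def initialize(size:)
    @size = size
    @buffer = []
  end

  # Buffer each row and yield a batch once `size` rows have accumulated.
  # Returns nil so that no row is emitted directly for this input.
  def process(row)
    @buffer << row
    if @buffer.size >= @size
      yield({rows: @buffer})
      @buffer = [] # start a fresh array so the yielded batch is not mutated
    end
    nil
  end

  # Flush whatever is left once all input rows have been seen.
  def close
    yield({rows: @buffer}) unless @buffer.empty?
  end
end

# Stand-in for a runner (an assumption for this sketch, not Kiba's code):
# collects rows yielded or returned by `process`, then rows yielded by `close`.
def drive(transform, rows)
  out = []
  rows.each do |row|
    result = transform.process(row) { |yielded| out << yielded }
    out << result if result
  end
  transform.close { |yielded| out << yielded } if transform.respond_to?(:close)
  out
end

batches = drive(BatchingTransform.new(size: 2), [{id: 1}, {id: 2}, {id: 3}])
```

With three input rows and a batch size of two, the first batch is yielded from `process` and the last, incomplete batch only appears because `close` is allowed to yield.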
Kiba now requires MRI Ruby 2.3+, JRuby 9.1+ or TruffleRuby.
This is done to reduce the testing burden, to encourage users to avoid EOL'ed rubies, and to let me use more recent Ruby features when relevant.
`transform nil` (#73 - thanks @envygeeks for the report).

Kiba 2 introduces a new, opt-in engine called the `StreamingRunner`, which allows class transforms to generate an arbitrary number of rows. This drastically improves the reusability & composability of Kiba components (see #44 for some background).

To use the `StreamingRunner`, use the following code:
    # activate the new Kiba internal config system
    extend Kiba::DSLExtensions::Config

    # opt-in for the new engine
    config :kiba, runner: Kiba::StreamingRunner

    # write a transform class able to yield an arbitrary number of rows
    class MyYieldingTransform
      def process(row)
        # parenthesized so the hash literal is not parsed as a block
        yield({key: 1})
        yield({key: 2})
        # the return value is emitted as a row too
        {key: 3}
      end
    end
The improved runner is compatible with Ruby 2.0+.
:warning: It is strongly recommended not to share data between the rows yielded this way, otherwise anything that changes one row will also affect the others. Make sure to build completely independent rows (or use an immutable Hash structure).
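To make the warning concrete, here is a small sketch contrasting rows that share a nested Hash with rows built independently. `SharingTransform`, `IndependentTransform` and `collect_rows` are hypothetical names. `collect_rows` is a simplified stand-in for a runner handling one input row, not Kiba's implementation.

```ruby
# Risky: every emitted row points at the very same `meta` Hash.
class SharingTransform
  def process(row)
    meta = {source: row[:source]}
    yield({key: 1, meta: meta})
    yield({key: 2, meta: meta})
    {key: 3, meta: meta}
  end
end

# Safer: each emitted row gets its own `meta` Hash.
class IndependentTransform
  def process(row)
    yield({key: 1, meta: {source: row[:source]}})
    yield({key: 2, meta: {source: row[:source]}})
    {key: 3, meta: {source: row[:source]}}
  end
end

# Stand-in for a runner processing a single input row (assumption).
def collect_rows(transform, row)
  out = []
  result = transform.process(row) { |yielded| out << yielded }
  out << result if result
  out
end

shared = collect_rows(SharingTransform.new, {source: "csv"})
shared.first[:meta][:source] = "changed" # leaks into all three rows

independent = collect_rows(IndependentTransform.new, {source: "csv"})
independent.first[:meta][:source] = "changed" # only affects the first row
```

In the first case a later transform that edits one row silently edits its siblings. In the second, each row can be modified in isolation.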
Kiba 2 is expected to be compatible with existing Kiba 1 scripts, as long as you did not use internal API.
Internal changes include:

- `config` system, currently only used to select the runner you want at job declaration time
- `Parser`, to reduce the chances that ETL scripts could conflict with Kiba internal classes

`close` becomes optional in destinations.
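As an illustration of what an optional `close` means for destination authors, the sketch below shows one destination with nothing to release and one that needs clean-up. The class names and the `write_all` helper are hypothetical. The `respond_to?(:close)` check is an assumption about how a runner could handle both cases, not a copy of Kiba's code.

```ruby
require "stringio"

# Hypothetical destination that keeps rows in memory: no `close` needed.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end
end

# Hypothetical destination that owns an IO and must release it.
class IODestination
  def initialize(io)
    @io = io
  end

  def write(row)
    @io.puts(row.inspect)
  end

  def close
    @io.close
  end
end

# Stand-in for the end of a run (assumption): write every row, then call
# `close` only on destinations that define it.
def write_all(destination, rows)
  rows.each { |row| destination.write(row) }
  destination.close if destination.respond_to?(:close)
end

rows = [{id: 1}, {id: 2}]

memory = ArrayDestination.new
write_all(memory, rows)

io = StringIO.new
write_all(IODestination.new(io), rows)
```

Destinations that hold files, sockets or database handles would still define `close`; simple in-memory or fire-and-forget destinations can leave it out.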