SQL input/output plugin for Fluentd
This SQL plugin has two parts:
fluent-plugin-sql | fluentd | ruby |
---|---|---|
>= 1.0.0 | >= v0.14.4 | >= 2.1 |
< 1.0.0 | < v0.14.0 | >= 1.9 |
NOTE: fluent-plugin-sql v2's buffer format is different from v1. If you update the plugin to v2, don't reuse v1's buffer.
$ fluent-gem install fluent-plugin-sql --no-document
$ fluent-gem install pg --no-document # for postgresql
You should install actual RDBMS driver gem together. pg
gem for postgresql adapter or mysql2
gem for mysql2
adapter. Other adapters supported by ActiveRecord should work.
We recommend that mysql2 gem is higher than 0.3.12
and pg gem is higher than 0.16.0
.
If you use ruby 2.1, use pg gem 0.21.0 (< 1.0.0) because ActiveRecord 5.1.4 or earlier doesn't support Ruby 2.1.
This plugin runs following SQL periodically:
SELECT * FROM table WHERE update_column > last_update_column_value ORDER BY update_column ASC LIMIT 500
What you need to configure is update_column. The column should be an incremental column (such as AUTO_ INCREMENT primary key) so that this plugin reads newly INSERTed rows. Alternatively, you can use a column incremented every time when you update the row (such as last_updated_at
column) so that this plugin reads the UPDATEd rows as well.
If you omit to set update_column parameter, it uses primary key.
It stores last selected rows to a file (named state_file) to not forget the last row when Fluentd restarts.
<source>
@type sql
host rdb_host
port rdb_port
database rdb_database
adapter mysql2_or_postgresql_or_etc
username myusername
password mypassword
tag_prefix my.rdb # optional, but recommended
select_interval 60s # optional
select_limit 500 # optional
state_file /var/run/fluentd/sql_state
<table>
table table1
tag table1 # optional
update_column update_col1
time_column time_col2 # optional
</table>
<table>
table table2
tag table2 # optional
update_column updated_at
time_column updated_at # optional
time_format %Y-%m-%d %H:%M:%S.%6N # optional
</table>
# detects all tables instead of <table> sections
#all_tables
</source>
<table> sections:
You should make sure target tables have index (and/or partitions) on the update_column. Otherwise SELECT causes full table scan and serious performance problem.
You can't replicate DELETEd rows.
This plugin takes advantage of ActiveRecord underneath. For host
, port
, database
, adapter
, username
, password
, socket
parameters, you can think of ActiveRecord's equivalent parameters.
<match my.rdb.*>
@type sql
host rdb_host
port 3306
database rdb_database
adapter mysql2_or_postgresql_or_etc
username myusername
password mypassword
socket path_to_socket
remove_tag_prefix my.rdb # optional, dual of tag_prefix in in_sql
<table>
table table1
column_mapping 'timestamp:created_at,fluentdata1:dbcol1,fluentdata2:dbcol2,fluentdata3:dbcol3'
# This is the default table because it has no "pattern" argument in <table>
# The logic is such that if all non-default <table> blocks
# do not match, the default one is chosen.
# The default table is required.
</table>
<table hello.*> # You can pass the same pattern you use in match statements.
table table2
# This is the non-default table. It is chosen if the tag matches the pattern
# AFTER remove_tag_prefix is applied to the incoming event. For example, if
# the message comes in with the tag my.rdb.hello.world, "remove_tag_prefix my.rdb"
# makes it "hello.world", which gets matched here because of "pattern hello.*".
</table>
<table hello.world>
table table3
# This is the second non-default table. You can have as many non-default tables
# as you wish. One caveat: non-default tables are matched top-to-bottom and
# the events go into the first table it matches to. Hence, this particular table
# never gets any data, since the above "hello.*" subsumes "hello.world".
</table>
</match>
<table> sections:
from:to
or key
values are separated by ,
. For example, if set 'item_id:id,item_text:data,updated_at' to column_mapping, item_id
field of record is stored into id
column and updated_at
field of record is stored into updated_at
column.remove_tag_prefix
, if given). The patterns should follow the same syntax as that of <match>. Exactly one <table> element must NOT have this parameter so that it becomes the default table to store data.