rtdl Versions

rtdl makes it easy to build and maintain a real-time data lake

v0.2.0

1 year ago

V0.2.0 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (Username: rtdl, Password: rtdl1234). rtdl supports writing in the Delta Lake table format, as well as integration with the AWS Glue and Snowflake External Tables metadata catalogs. A sketch of this flow follows below.
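
For illustration, here is a minimal sketch of that flow using Python's standard library. The ports (80 for configuration, 8080 for ingestion) and the createStream call are taken from these notes; the URL paths, request fields, and response fields marked below are assumptions, so check rtdl's documentation for the real payload shapes.

```python
import json
import urllib.request

def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# 1. Configure a stream via the API on port 80. createStream is named in
#    these notes; the path and the field names here are assumptions.
stream = post_json("http://localhost:80/createStream", {
    "name": "clickstream",         # assumed field
    "file_store_type_id": 1,       # assumed: local filesystem destination
    "folder_name": "clickstream",  # assumed field
})

# 2. Send a JSON event to the ingest endpoint on port 8080 (path assumed).
#    rtdl processes it into Parquet and writes it to the stream's destination.
post_json("http://localhost:8080/ingest", {
    "stream_id": stream.get("stream_id"),           # assumed response field
    "payload": {"user_id": 42, "path": "/pricing"}, # arbitrary event body
})

# 3. Browse the resulting data in Dremio's web UI at http://localhost:9047
#    (Username: rtdl, Password: rtdl1234).
```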

What's new? 💥

  • Upgrading to v0.2.0 requires following the steps in our upgrade guide.
  • Added Delta Lake support.
  • Switched to file-based configuration storage (removed dependency on PostgreSQL).

What doesn't work/what's next on the roadmap? 🚴🏼

  • Community contribution: Stateful Function for PII detection and masking.
  • Making AWS Glue, Snowflake External Tables, and Delta Lake support configurable on a per-stream basis.
  • git integration for stream configurations.
  • Research and implementation for Apache Hudi, Apache Iceberg, and Project Nessie.
  • Graphical user interface.
  • Dremio Cloud support.

v0.1.2

2 years ago

V0.1.2 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (Username: rtdl, Password: rtdl1234).

What's new? 💥

  • Added HDFS support.
  • Added AWS Glue support.
  • Added Snowflake External Tables support.

What doesn't work/what's next on the roadmap? 🚴🏼

  • Community contribution: Stateful Function for PII detection and masking.
  • Move stream configurations to JSON files instead of SQL.
  • git integration for stream configurations.
  • Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie.
  • Graphical user interface.
  • Dremio Cloud support.

v0.1.1

2 years ago

V0.1.1 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (Username: rtdl, Password: rtdl1234).

What's new? 💥

  • Replaced Kafka & Zookeeper with Redpanda.
  • Added support for HDFS.
  • Fixed an issue with handling booleans when writing Parquet.
  • Added several logo variants and a banner to the public directory.

What doesn't work/what's next on the roadmap? 🚴🏼

  • Dremio Cloud support.
  • Apache Hudi support.
  • Start using GitHub Projects for work tracking.
  • Research and implementation for Apache Iceberg, Delta Lake, and Project Nessie.
  • Community contribution: Stateful Function for PII detection and masking.
  • Graphical user interface.

v0.1.0

2 years ago

V0.1.0 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data with Dremio on port 9047 (Username: rtdl, Password: rtdl1234).

What's new? 💥

  • Added support for Azure Blob Storage V2 (note: it can take up to one minute for events written to Azure Blob Storage V2 to appear in Dremio).
  • Added support for GZIP and LZO compression in addition to the default SNAPPY. Specify compression_type_id as 2 for GZIP or 3 for LZO.
  • Added support for Segment webhooks. You can set up the rtdl ingest endpoint as a webhook in Segment; create a stream with stream_alt_id set to either the Source ID or the Write Key from the API Keys tab of Settings for the Source connected to the Webhook Destination (see the sketch after this list).
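
A hedged sketch of the two configuration points above, assuming a createStream path on the port-80 API: the compression_type_id values (2 = GZIP, 3 = LZO) and the stream_alt_id field come straight from these notes, while the remaining field names are illustrative.

```python
import json
import urllib.request

# compression_type_id (2 = GZIP, 3 = LZO; SNAPPY is the default) and
# stream_alt_id are real fields per the notes above. The path and the
# other fields are assumptions.
body = {
    "name": "segment-events",  # assumed field
    "compression_type_id": 2,  # write Parquet with GZIP instead of SNAPPY
    "stream_alt_id": "<Segment Source ID or Write Key>",
}
req = urllib.request.Request(
    "http://localhost:80/createStream",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)
```

Segment's Webhook Destination would then be pointed at the rtdl ingest endpoint on port 8080, so that rtdl can match incoming Segment events to the stream via stream_alt_id.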

What doesn't work/what's next on the roadmap? 🚴🏼

  • Start using GitHub Projects for work tracking.
  • Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie.
  • Writing to HDFS.
  • Graphical user interface.

v0.0.2

2 years ago

V0.0.2 - Current status -- what works and what doesn't

What works? 🚀

rtdl is not yet full-featured, but it is functional. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to AWS S3, and to GCP Cloud Storage, and you can query your data with Dremio on port 9047 (Username: rtdl, Password: rtdl1234).

What's new? 💥

  • Switched from Apache Hive Metastore + Presto to Dremio. Dremio works with all storage types (the original release notes incorrectly said it did not).
  • Added support for using a flattened JSON object as the value of the gcp_json_credentials field in the createStream API call; previously, you had to flatten the object and double-quote everything yourself (see the sketch after this list).
  • Added CONTRIBUTING.md and decided to use a DCO instead of a CLA. tl;dr: sign off your commits with -s, e.g. git commit -s -m "..."
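
For example, a hedged sketch of that credentials field: gcp_json_credentials is the field named above, the service-account object shown is a placeholder, and the surrounding fields are assumptions.

```python
import json

# Hedged sketch of a createStream payload for a GCP Cloud Storage stream.
# gcp_json_credentials is a real field per the note above; whether rtdl
# expects a nested object or a single-line string may differ, so check
# the docs. Everything else here is illustrative.
payload = {
    "name": "gcs-stream",      # assumed field
    "file_store_type_id": 3,   # assumed: GCP Cloud Storage destination
    "gcp_json_credentials": {  # service-account JSON, passed as-is,
        "type": "service_account",  # no hand-escaping of quotes needed
        "project_id": "my-project",
        "private_key_id": "...",
        "client_email": "rtdl@my-project.iam.gserviceaccount.com",
    },
}
body = json.dumps(payload)  # ready to POST to the createStream endpoint
```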

What doesn't work/what's next on the roadmap? 🚴🏼

  • Add support for Azure Blob Storage.
  • Add support for Segment webhooks as a source.
  • Add support for more compression types; currently only the default Snappy compression is supported.

v0.0.1

2 years ago