Glue scripts for converting AWS Service Logs for use in Athena
This release adds support for Python 3 (thanks @ryandeivert).
Please note that this is likely the last release of this library in its current format. As we research the new Glue features, we will likely use the new Blueprints and Workflows functionality to replace a good chunk of this code.
Follow along in https://github.com/awslabs/athena-glue-service-logs/issues/23 for more updates! 😄
This release fixes an issue (#17) where we weren't fetching the entire set of partitions from the Glue Data Catalog, which could result in data from the unfetched partitions not showing up in the raw or optimized tables. This would have surfaced for data sources grouped by region with a large (>1000) number of partitions.
You can update your Glue jobs by pointing the "Python lib path" to this new build. On the next run, you may notice that data from the past 30 days in those missing regions is now added.
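The partition issue described above comes down to pagination: a single Glue `GetPartitions` call returns one page of results, and the response carries a `NextToken` whenever more pages remain. Below is a minimal sketch of draining every page, assuming a boto3 Glue client is passed in. The function name `fetch_all_partitions` is hypothetical, and this is not necessarily how the fix in this library is implemented.

```python
def fetch_all_partitions(glue_client, database_name, table_name):
    """Return every partition of a table by following NextToken until it runs out.

    `glue_client` is assumed to be a boto3 Glue client (or anything exposing
    a compatible `get_partitions` method).
    """
    partitions = []
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    while True:
        response = glue_client.get_partitions(**kwargs)
        partitions.extend(response.get("Partitions", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No token means this was the last page.
            break
        kwargs["NextToken"] = next_token
    return partitions
```

With boto3 installed, this could be called as `fetch_all_partitions(boto3.client("glue"), "my_database", "my_table")`, where both names are placeholders. boto3 also provides a built-in paginator via `get_paginator("get_partitions")`, which handles the token loop for you.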
This update adds a few new fields for both ALB (#9) and S3 Access Logs (#11) and updates the Makefile to provide a location for Glue temp files (#10).
Now that VPC Flow Logs can be published directly to S3, adding support for them to this framework is straightforward. The logs are automatically partitioned by region and date.
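To illustrate the region and date partitioning, here is a rough sketch that pulls both values out of a flow log object key. It assumes the key layout `AWSLogs/<account_id>/vpcflowlogs/<region>/<yyyy>/<mm>/<dd>/<file>` used for flow logs delivered to S3. The helper name `partition_from_key` is hypothetical and is not part of this library.

```python
import re
from datetime import date

# Assumed key layout: .../AWSLogs/<account_id>/vpcflowlogs/<region>/<yyyy>/<mm>/<dd>/<file>
_KEY_PATTERN = re.compile(
    r"AWSLogs/(?P<account>\d+)/vpcflowlogs/(?P<region>[a-z0-9-]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
)


def partition_from_key(key):
    """Return (region, date) parsed from an S3 key, or None if the key does not match."""
    match = _KEY_PATTERN.search(key)
    if match is None:
        return None
    log_date = date(
        int(match.group("year")),
        int(match.group("month")),
        int(match.group("day")),
    )
    return match.group("region"), log_date
```

Keys outside the assumed layout return `None` rather than raising, so a caller can skip unrelated objects in the same bucket.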