PgMemento Versions Save

Audit trail with schema versioning for PostgreSQL using transaction-based logging

v0.7.4

1 year ago

Changelog

This is a bugfix release to v0.7. To upgrade, use the UPGRADE_v07_to_v074.sql script or run ALTER EXTENSION pgmemento UPDATE TO '0.7.4' within the database after installing the extension. Have fun with this release. Please, provide feedback. Stay healthy and confident. Thank you!

Enhancements

  • logging a table state (e.g. for baseline) is not limited to tables with 50+ columns (#99)
  • Postgres 15 included in test suite (#100)

v0.7.3

2 years ago

Changelog

This is a bugfix release to v0.7. To upgrade, use the UPGRADE_v07_to_v073.sql script or run ALTER EXTENSION pgmemento UPDATE TO '0.7.3' within the database after installing the extension. Have fun with this release. Please, provide feedback. Happy new year and stay healthy. Thank you!

Postgres 14 support

pgMemento uses custom session variables to help logging DDL events. In Postgres 14 these must start with a letter. Thanks to @ol-teuto for reflecting this change in the pgMemento code base (#91). Older releases will not work with Postgres 14.

Bugfixes

  • fix losing update logs of separate statements with same timestamp (e.g. executed from a stored procedure) (#92)
  • pgmemento.start will call reinit if log params differ from audit_schema_log (#93)
  • shipping update scripts for all minor versions (#89)

v0.7.2

3 years ago

Changelog

This is a bugfix release to v0.7. To upgrade, use the UPGRADE_v07_to_v072.sql script or run ALTER EXTENSION pgmemento UPDATE TO '0.7.2' within the database after installing the extension. Have fun with this release. Please, provide feedback. Stay healthy. Thank you!

Bugfixes

  • Fix bug of restoring NULL values introduced by v0.7.1 (#85)
  • Reverting and restoring array values (#83)
  • Exclude empty ranges in audit_column_log for RESTORE and REVERT (#86)
  • CTL functions now create session_info object in valid JSON and not stringified (#76, #78)
  • Added upgrade script to tests (#87)
  • Migrate from travis CI to GitHub actions (#77)

v0.7.1

3 years ago

Changelog

This is a bugfix release to v0.7. To upgrade, use the UPGRADE_v07_to_v071.sql script or run ALTER EXTENSION pgmemento UPDATE TO '0.7.1' within the database after installing the extension. Have fun with this release. Please, provide feedback. Stay healthy. Thank you!

Bugfixes

Log session_user instead of current_user to log the caller (by Nikolai Lebedev) (#63)

v0.7 introduced SECURITY DEFINER on many logging functions to protect pgMemento's tables better. However, when calling log_transaction it is executed with the owner role hence logging the owner, not the calling user which we want to audit in the user_name column of transaction_log.

Fix recursive loop when calling create_table_audit (#65)

Creating tables with REFERENCES calls DDL trigger too often (#69)

Don't restore JSONB values as NULL (#71)

Support restoring array values (#71)

v0.7

4 years ago

Changelog

Enhancements

Provide a pgMemento Postgres Extension (by Serge Latyntsev) (#19)

Finally, pgMemento can be installed as a PostgreSQL extension. The required files were implemented by Serge Latyntsev already back in beginning of 2019. The extension will also be published at PGXN.

More fine-grained event logging with statement timestamps (#39)

Until now, a table event type (e.g. UPDATE) could only be logged once per transaction per table because of the way row-level changes were referenced to the event. v0.7 allows at least one event type per statement timestamp, i.e. multiple UPDATES at different times during one txid will become two entries in the table_event_log.

Enable temporal partitioning for log tables (#39)

Both table_event_log and row_log now store an event_key consisting of the transaction and statement timestamps (in epoch format) as well as txid, table operation, name and schema. This allows for easy temporal partitioning of those tables. Note, the foreign key column event_id in row_log is gone as well as the foreign key constraint. Combinations of event_key and audit_id are unique there.

Optionally logging new data in row_log, too (#44)

Understanding which values got introduced on INSERTs or UPDATEs required using the restore API, so far. But, getting old and new data deltas straight from one entry in the row_log would be more convenient and cheaper. Therefore, logging new values can now be enabled optionally. The configured logging behavior can be queried from the audit_table_log and audit_tables view. This feature was a request by Lenz Weber from Mayflower who use pgMemento for auditing a GraphQL API.

Stop relying on OID columns in log tables for recovery (#33)

So far, pgMemento was using the table's OID in table_event_log and audit_table_log to track events and changes. But, when data is restored from backup, table OIDs are not the same anymore. With this release only the table and schema names are used. The audit_table_log uses a new log_id column to trace tables in case they get renamed or recreated.

New audit_schema_log table to configure default audit_id column name and logging behavior (#49)

This is a new log table to show for which database schemas pgMemento is currently enabled. This is essential for newly created tables to check if auditing shall be started, which audit_id name to use and if new deltas shall be logged. It can be seen as a global configuration table for pgMemento.

Name for audit_id column can be configured (#49)

Admittedly, audit_id is quite a generic name for a tracer column. This release will switch to the more specific default pgmemento_audit_id. But, you can now choose how this column should be called. The configured name is stored in the audit_schema_log and audit_table_log and it's also queryable from the audit_tables view. It can be different for every table, but using existing columns, like primary keys, is not yet supported, though. Besides, the logging triggers were renamed, too. The log_* prefix has been replaced with pgmemento_*.

Start auditing also after CREATE TABLE AS and SELECT INTO events (#49)

With the audit_schema_log in place, it is also time to allow more table creation event tags.

New scriptable endpoints to init, pause and drop pgMemento in schemas (#47, #50, #54, #55)

Interactive scripts with prompts are fine, but the world of CI/CD wants something easily scriptable. New simple endpoints are provided to init, pause (stop, start) and drop pgMemento in schemas, bundled under the CTL.sql file. It's also possible to reconfigure pgMemento's behavior for an already audited schema calling the renint endpoint.

Separate log functions for transaction_log and table_event_log (#49, #52)

INSERTs into these two log tables were previously done by the log_table_event function. Now this is split up to allow for only logging transactions without events. This is for example done by the endpoints mentioned above to track when a schema has been initialized for auditing or changed. The chosen options are then stored in the session_info field of the transaction_log.

New DROP AUDIT_ID behavior and revert treatment (#56)

DROP AUDIT_ID events were treated like TRUNCATE events before, because the drop_table_audit function let users only choose between logging the existing state or delete everything. But, in some scenarios like reinit you neither want to produce logs nor want to drop the audit trail. Reverting such cases required to differentiate between recreating the audit_id column and inserting the data.

Reverting a DROP TABLE event logs a RECREATE TABLE event (#33)

On the one hand, this is to show in the logs that this table has existed before and was brought back to life. On the other hand, it also helps to reassign the same log_id value in the audit_table_log to connect this table with its former history.

Set SECURITY DEFINER for functions changing log tables (#46)

All functions writing to the log tables will be created with SECURITY DEFINER privileges so that normal users can work with the pgMemento API but cannot change the content of the log tables if this wasn't granted to them.

Refactored support for quoted database objects (#41)

Many code changes to support quoted tables and columns could be reverted back by introducing a simple get_table_oid function in the right places. Now the code is a lot cleaner again.

PostgreSQL 12 compatibility (#40)

As the adsrc column was removed from the pg_attrdef system catalog in PostgreSQL 12, it had to be replaced with pg_get_expr(adbin, adrelid, TRUE). Plus, the -o option has been removed from pg_dump which was used in extension tests.

Bugfixes

Fixed returning results from restore_record (#51)

The restore_record function now simply returns the result from the restore_query and doesn't check if it's NOT NULL. This lead to returning rows only if all columns are NOT NULL which is likely to be not true.

Fixed closing txid_ranges on rename events (#49)

It could happen that the txid_range in the audit_table_log was not closed on RENAME TABLE events. This also lead problems of setup scripts using the audit_table_log, e.g. to drop auditing from tables which do not exist anymore. Note, that the UPGRADE_v061_to_v07.sql script fixes those open ranges before doing the upgrade.

Fixed revert ALTER COLUMN events (#44)

Reverting ALTER COLUMN events did not log data because the DDL event parsing stripped out the word that triggers the data logging.

Breaking changes and upgrade path

As you can guess from reading this changelog with all it's schema changes on the log tables many breaking changes come with this release. Please, use the provided UPGRADE_v061_to_v07.sql script to move your pgMemento installation to the newest version. I'd recommend to test it first on a backup before applying it against a production database. Please, refer to the wiki page about upgrading to learn about the breaking changes in more detail. Have a look at the script as well, to see which functions are dropped explicitly because their signatures have changed.

Have fun with this release. Have fun with the pgMemento extension. Please, provide feedback. Stay healthy. Thank you!

v0.6.1

5 years ago

Changelog

This is for the most parts a bugfix release to v0.6

  • Added support for quoted schemas, tables and columns #6 #13
  • More robust parsing of DDL queries which can now also contain comments #14
  • Covering more ALTER COLUMN events in logs incl. changing data type without USING, changing column DEFAULTs or NOT NULL contraints. #22
  • Improved handling for ADD/DROP AUDIT_ID events. #18
  • Dropping audit_id column with a query is now prohibited #18
  • Restore previous versions also as MATERIALIZED VIEWs #25
  • Improved audit views that are now able to list tables with different OIDs than in audit_table_log (appears after dump and restore) #23

Breaking changes:

  • restore_record function now returns SETOF INTEGER instead of INTEGER (too return 0 rows if given audit_id is not in the logs) #16
  • ADD AUDIT_ID events will get op_id 21 instead of 2 in table_event_log #18

Dedicated to my grandmother, who turns 79 today :)

v0.6

5 years ago

Changelog

Highlights are marked:

  • New file SCHEMA to store the pgMemento database schema (instead of SETUP)
  • Use of transaction variables to log current transaction ID
  • Foreign key on transaction_id column in table_event_log now references transaction_log.id instead of txid column. Thus, cycling txids don't matter anymore at all!
  • Changed many arguments in files REVERT, LOG_UTIL and RESTORE from BIGINT to INTEGER because of changes in transaction_log table
  • New UNIQUE index on transaction_log columns txid and stmt_date. Removed constraint transaction_log_unique_txid.
  • stmt_date column in transaction_log if now filled by transaction_timestamp() instead of statement_timestamp() because of new UNIQUE index
  • Changed bounds behaviour for txid_range columns in audit_table_log and audit_column_log from '[)' to '(]'
  • Changed all flag arguments in functions from integer to boolean (affects INIT, SETUP, DDL_LOG, RESTORE and SCHEMA_MANAGEMENT)
  • New script to upgrade from v0.5 to v0.6
  • Extended test routines
  • Moved documentation to wiki section and tidied up README

SETUP:

  • Added more system information columns to transaction_log table
  • Provide a JSONB field in transaction_log allowing clients to pass more session parameters to the logs
  • New function log_table_event to log transactions and all types of table events
  • Renaming a table also renames the unique constraint on the audit_id column
  • Updates for JSON fields are now logged correctly (thx @francoricci)
  • New function column_array_to_column_list to produce a string for jsonb_build_object
  • New function log_table_state to log values of given columns or the whole table
  • Renamed log_table_state and log_schema_state to log_table_baseline and log_schema_baseline
  • Rows with existing logs are now excluded from a baseline created with log_table_baseline
  • Changed arguments for get_txid_bounds_to_table from table_name, schema_name to table_oid
  • Function create_table_audit creates an ADD AUDIT_ID event which behaves like ADD COLUMN (op_id = 2)
  • drop_table_audit now provides a flag to decide if the table's audit trail should be kept or dropped. Additionally it creates a DROP AUDIT_ID event which behaves like a TRUNCATE event (op_id = 8)

DDL_LOG:

  • Added log support for RENAME TABLE (op_id = 12) and RENAME COLUMN (op_id = 22) events
  • ADD COLUMN events are now parsed and logged by table_alter_pre_trigger, so BEFORE the change instead of AFTER
  • modify_ddl_log_tables is only called if a DDL event has been registered in advance
  • Altering or dropping multiple columns at once produces only one JSONB log per audit_id
  • Removed log_ddl_event because of new log_table_event function

RESTORE:

  • Previously named VERSIONING
  • Restoring tables now works without temporal table templates
  • Renamed generate_log_entry/ies function to restore_record/set. They now return records instead of JSONB, but output can also be a JSONB record.
  • Provide API to query multiple versions of tuples
  • New functions restore_record_definition to provide the column definition list as TEXT needed for the restore functions
  • New functions restore_value and restore_change to retrieve single values from JSONB logs with correct data type

REVERT:

  • A revert of ALTER COLUMN now uses the restore_change function to retrieve the previous version
  • Allow for reverting updates against arbitrarily nested composite data types
  • Exclude JSONB logs to ALTER COLUMN events from revert statements as only one ALTER TABLE attempt has to be made
  • Added revert support for RENAME TABLE and RENAME COLUMN events
  • Some bugfixes in SQL statements to prepare revert order

LOG_UTIL:

  • Extended data log correction functions update_key and delete_key
  • New functions get_column_list_by_txid and get_column_list_by_txid_range to get historic table layouts
  • New helper function jsonb_unroll_for_update to create syntactically correct reverse UPDATE statements
  • Refactored audit_table_check function and moved it from RESTORE to LOG_UTIL
  • Changed arguments for delete_table_event_log from table_name, schema_name to table_oid

v0.5

6 years ago

Changelog

Highlights are marked:

  • Start adding test routines
  • The audit_id column of audited tables is no longer logged in audit_column_log
  • Query system catalogues directly instead of using the information_schema
  • Formatted most SQL code to hopefully improve readability
  • Updated documentation (README)

SETUP:

  • audit_id column is created with unique constraint. Additional audit_idx BTREE index is no longer needed.
  • New op_id values in table_event_log to enable reverts of DDL statements
  • Reduced number of columns in audit_column_log
  • New functions to un/register a table in audit_table_log
  • Data types from extension (e.g. PostGIS) are now logged correctly in audit_column_log
  • Improved performance for VIEWs
  • Logging the initial state will be ordered by the table's audit_id if no primary key is available

DDL_LOG:

  • DDL events executed from inside functions are correctly parsed with GET DIAGNOSTICS command
  • Bugfixes for extracting objects from parsed DDL commands
  • modify_ddl_log_tables does not need a ddl_action argument anymore
  • DROP TABLE / SCHEMA events will log data as truncated (op_id = 8) without executing a TRUNCATE command

REVERT:

  • Support for reverting data changes caused by DDL commands
  • Better grouping of logs and consideration for mutually exclusive events in revert_distinct_transaction(s) functions
  • Revert order is no longer sorted by table's audit_ids but by the row_log ID - better for self-referencing tables

LOG_UTIL:

  • New function delete_key to remove keys from JSONB logs - useful for data corrections

VERSIONING:

  • Fixed memory leak for retrieval of historic values

SCHEMA_MANAGEMENT:

  • Removed default_values_* functions as this is satisfied with audit_column_log
  • Improved queries to retrieve current foreign keys and indexes

v0.4

7 years ago

Changelog

This is a BIG release to me which includes many very important improvements (highlights are marked):

  • Now requires PostgreSQL 9.5 because for using to_jsonb, jsonb_agg, pg_event_trigger_ddl_commands() and UPSERT
  • Let the user decide to log existing data during initialization of pgMemento

SETUP:

  • A table operation type is only logged once per transaction, which greatly improves the performance of deleting data while auditing
  • Have only one unique index on table_event_log table (transaction_id, table_relid, op_id)
  • New ID column in audit_table_log in order to log renamed versions
  • Two new event types are logged in table_event_log table: ALTER COLUMN and DROP COLUMN (see DDL_LOG changes)
  • If UPDATEs did not change anything, nothing will be logged
  • Extra option for function create_table_audit to log existing data as inserted (log_table_state)
  • Moved log_table_state function from VERSIONING.sql to SETUP.sql
  • Function log_schema_state is using the dependency depth from audit_tables_dependency to mark INSERTs in correct order

LOG_UTIL:

  • New VIEW audit_tables_dependency to make reverts and other schema-based procedures more robust
  • Bugfix in get_txid_bounds_to_table function to create a correct audit_tables view
  • Moved VIEWs and get_txid_bounds_to_table to SETUP.sql
  • Custom aggregate function jsonb_merge to merge JSONB objects (very helpful for revert_distinct_transaction)

VERSIONING:

  • Improved performance for restoring historic versions of tuples, tables and databases
  • New function generate_log_entries to look at a past table state without creating a TABLE or VIEW
  • Code split up into several functions to reduce redundancy
  • Templates for jsonb_populate_record can now be created with create_restore_template to be used for on-the-fly restores
  • Restore process queries recent state only if exists and considers renaming of tables and columns
  • All restored tables are created with an CREATE TABLE AS command to avoid firing the event trigger

REVERT:

  • Reverts are no longer limited to non-cascading operations
  • Reverts are using the dependency depth from audit_tables_dependency view what makes them more robust against foreign key violations
  • Removed buggy merging option from recover_audit_version.
  • Creating distinct changes for JSONB fields ordered by age with jsonb_merge trick

DDL_LOG:

  • New event triggers will prevent data loss when DROP [SCHEMA, TABLE, COLUMN] or ALTER COLUMN ... TYPE ... USING events occur
  • New names for event triggers and rewritten functions
  • Trigger for CREATE TABLE events will not fire for temprary tables
  • Logging also CREATE TABLE, DROP TABLE and ADD COLUMN events in transaction_log and table_event_log

v0.3

8 years ago

Changelog

  • new folder structure
  • new install scripts to create and drop pgMemento
  • new control scripts to init, stop and restart auditing
  • new pgModeler file, to get an better understanding of the pgMemento schema

SETUP:

  • two new tables to audit DDL changes of tables (audit_table_log) and columns (audit_column_log)
  • removed table_templates table as pgMemento does DDL logging now
  • removed unique constraint for table_event_log (transaction_id, schema_name, table_name) to log multiple table events per transaction
  • minor changes to trigger procedures in order to deal with multiple table events per transaction
  • removed columns table_name and schema_name from table_event_log, relid is enough
  • added extra ID column to transaction_log table as txid can cycle
  • moved log_schema_state and log_table_state function to VERSIONING.sql

VERSIONING:

  • a new templating mechanism for jsonb_populate_record which uses automatically generated templates from the DDL log tables
  • simplified query to exclude audit_ids that did not exist for a requested transaction ranges
  • it now contains the log_schema_state and log_table_state function from the SETUP.sql script

SCHEMA_MANAGEMENT:

  • removed some dynamic SQL code

REVERT:

  • new query to prepare the revert process which reflects the new logging behavior of table events (this one should not violate any foreign keys unless except for cascading DML)
  • the actual revert is done by the new recover_audit_version function
  • additional revert function to process ranges of transactions
  • alternative revert function that only reverts the oldest event per audit_id. Reverts are faster, but foreign keys would be violated

LOG_UTIL:

  • new script to provide a set of utility functions
  • get all transaction ids that had changed a given audit_id
  • e.g. delete all logs corresponding to a txid or table event
  • audit_tables view that lists all tables having an audit_id column (to know which tables where enabled for versioning)

DDL_LOG:

  • new script to create event triggers and event trigger procedures to update the DDL log tables when creating, altering and dropping tables