A curated list to help you manage temporal data across many modalities 🚀.
Data versioning is the practice of storing multiple versions of the same data and providing a mechanism for accessing and managing these versions. This is useful in a variety of situations, such as when data is accidentally deleted or corrupted, or when you need to see how the data has changed over time. The vast majority of "data versioning" tools you see today focus on managing datasets for machine learning. Their common implementation paradigm is to tie versions of your data and models to Git commits. The following part of this awesome list is therefore centered on machine learning. There are, however, other ways to manage temporal data, which are covered in later sections.
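To make the idea of "storing multiple versions and providing a mechanism for accessing them" concrete, here is a minimal sketch of a content-addressed version store. It is not how any particular tool in this list works; `VersionStore`, `commit`, `checkout`, and `log` are hypothetical names chosen for illustration, and everything lives in memory.

```python
import hashlib
import json


class VersionStore:
    """Toy in-memory store that keeps every committed version of a dataset."""

    def __init__(self):
        self._objects = {}  # content hash -> serialized data
        self._history = []  # content hashes in commit order (oldest first)

    def commit(self, data):
        """Serialize the data, store it under its SHA-256 hash, return the hash."""
        blob = json.dumps(data, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(blob).hexdigest()
        self._objects.setdefault(digest, blob)  # identical content is stored once
        self._history.append(digest)
        return digest

    def checkout(self, digest):
        """Return the data exactly as it was when this version was committed."""
        return json.loads(self._objects[digest])

    def log(self):
        """Return the list of committed version ids, oldest first."""
        return list(self._history)


store = VersionStore()
v1 = store.commit({"rows": [1, 2, 3]})
v2 = store.commit({"rows": [1, 2, 3, 4]})
restored = store.checkout(v1)  # the first version is still retrievable
```

Addressing versions by a hash of their content is a design choice borrowed from Git: identical data is stored only once, and a version id can never silently point at different content.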
Data time travel refers to the ability to go back in time and access previous versions of data. To enable data time travel, you need a system for versioning data, which involves storing multiple versions of the same data and providing a mechanism for accessing and managing these versions. Temporal tables, also known as system-versioned temporal tables, are database tables that automatically track the history of data changes and allow you to query the data as it existed at any point in time. The terms "time travel" and "temporal tables" are often used interchangeably, although temporal tables are more of an implementation-specific feature of some databases. These tables are useful for auditing, tracking changes to data over time, and performing point-in-time analysis. In databases that support them, you can usually query a temporal table using the FOR SYSTEM_TIME clause in a SELECT statement.
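The bookkeeping behind a system-versioned table can be sketched in a few lines. The example below is an illustration only, not a database feature: `SystemVersionedTable`, `upsert`, `delete`, and `as_of` are hypothetical names, and a simple counter stands in for the wall-clock timestamps a real database would record. The `as_of` method plays the role that a point-in-time query plays in SQL.

```python
from dataclasses import dataclass
from typing import Any

FOREVER = float("inf")  # marks a row version that is still current


@dataclass
class RowVersion:
    key: str
    value: Any
    sys_start: float          # when this version became current
    sys_end: float = FOREVER  # when it was superseded (exclusive)


class SystemVersionedTable:
    """Toy table that never overwrites: every change closes one version and opens another."""

    def __init__(self):
        self._versions = []
        self._clock = 0  # assumption: a logical clock instead of real timestamps

    def _tick(self):
        self._clock += 1
        return self._clock

    def _close_current(self, key, now):
        for version in self._versions:
            if version.key == key and version.sys_end == FOREVER:
                version.sys_end = now

    def upsert(self, key, value):
        now = self._tick()
        self._close_current(key, now)
        self._versions.append(RowVersion(key, value, now))
        return now

    def delete(self, key):
        now = self._tick()
        self._close_current(key, now)  # history is kept, only the period ends
        return now

    def as_of(self, key, ts):
        """Return the value that was current at time ts, or None if no row existed."""
        for version in self._versions:
            if version.key == key and version.sys_start <= ts < version.sys_end:
                return version.value
        return None


table = SystemVersionedTable()
t1 = table.upsert("acct-1", 100)
t2 = table.upsert("acct-1", 250)
t3 = table.delete("acct-1")

balance_at_t1 = table.as_of("acct-1", t1)
balance_at_t2 = table.as_of("acct-1", t2)
balance_at_t3 = table.as_of("acct-1", t3)
```

Each version carries a half-open period `[sys_start, sys_end)`, so any instant matches at most one version of a given key.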
Slowly changing dimensions are those in which the attributes of the dimension change over time, and the changes need to be tracked in the data warehouse. For example, a customer's address or name might change over time, and the data warehouse needs to track these changes so that historical data can be analyzed correctly.
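One widely used way to track such changes is the "Type 2" approach, where a change closes the current dimension row and appends a new one with its own validity period. The sketch below assumes that approach; the column names (`surrogate_key`, `valid_from`, `valid_to`, `is_current`) and the helper functions are illustrative choices, not a standard schema.

```python
from datetime import date


def apply_scd2(dim, customer_id, address, effective):
    """Record a customer's address as of `effective`, keeping earlier rows as history."""
    current = next(
        (r for r in dim if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if current["address"] == address:
            return dim  # nothing changed, so no new version is needed
        current["valid_to"] = effective  # close the old version (exclusive end)
        current["is_current"] = False
    dim.append({
        "surrogate_key": len(dim) + 1,  # one key per version, not per customer
        "customer_id": customer_id,
        "address": address,
        "valid_from": effective,
        "valid_to": None,  # open-ended: this is the current row
        "is_current": True,
    })
    return dim


def address_on(dim, customer_id, day):
    """Return the address that was valid for the customer on the given day."""
    for r in dim:
        if (
            r["customer_id"] == customer_id
            and r["valid_from"] <= day
            and (r["valid_to"] is None or day < r["valid_to"])
        ):
            return r["address"]
    return None


customer_dim = []
apply_scd2(customer_dim, 42, "1 Old Road", date(2020, 1, 1))
apply_scd2(customer_dim, 42, "9 New Street", date(2023, 6, 1))

address_in_2021 = address_on(customer_dim, 42, date(2021, 3, 15))
address_in_2024 = address_on(customer_dim, 42, date(2024, 1, 1))
```

A fact row dated 2021 can then be joined to the address the customer had in 2021 rather than to the latest one, which is what keeps historical analysis correct.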
Bitemporality is a concept in database management that refers to tracking data along two independent time axes: valid time, the period during which a fact is true in the real world, and transaction time, the period during which the database recorded that fact. In a bitemporal database, data is stored in multiple versions, with each version tagged on both axes. This allows users to view and query the data as it existed at different points in time, and also as it was known at different points in time, which is useful for understanding how data has changed, for tracking the history of a particular piece of data, and for handling late-arriving entries and retroactive corrections without losing what was previously recorded.
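A bitemporal lookup is commonly modelled with two time axes: when a fact holds in the real world (valid time) and when the database learned about it (transaction time). The sketch below assumes that model with an append-only list of facts, where the most recently recorded matching fact wins. `Fact`, `lookup`, and the salary figures are hypothetical, and plain integers stand in for real timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    key: str
    value: int
    valid_from: int   # first year the fact holds in the real world
    valid_to: int     # year it stops holding (exclusive)
    recorded_at: int  # logical time at which the database learned the fact


def lookup(facts, key, valid_at, known_at):
    """Return the value valid at `valid_at`, using only what was recorded by `known_at`."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.recorded_at <= known_at
        and f.valid_from <= valid_at < f.valid_to
    ]
    if not candidates:
        return None
    # Later recordings supersede earlier ones for the same valid period.
    return max(candidates, key=lambda f: f.recorded_at).value


facts = [
    # Recorded first: the salary is 50000 from 2020 onward.
    Fact("alice", 50000, valid_from=2020, valid_to=9999, recorded_at=1),
    # Recorded later: a raise to 55000 that took effect back in 2021.
    Fact("alice", 55000, valid_from=2021, valid_to=9999, recorded_at=2),
]

believed_before_correction = lookup(facts, "alice", valid_at=2021, known_at=1)
believed_after_correction = lookup(facts, "alice", valid_at=2021, known_at=2)
salary_2020_as_known_now = lookup(facts, "alice", valid_at=2020, known_at=2)
```

Because nothing is overwritten, the store can answer both "what was true in 2021?" and "what did we believe about 2021 before the correction arrived?".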
Change data capture (CDC) is a process that captures and stores data about changes made to a database or other data source. It is often used in data warehousing and data integration scenarios to ensure that data in different systems is kept up to date and in sync. CDC involves tracking changes made to a database or data source and storing information about those changes in a separate location, such as a separate database or log file. This allows the data in the original source to be updated, while still maintaining a record of the changes that were made.
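As a small, runnable illustration of storing change information in a separate location, the sketch below uses SQLite triggers (through Python's standard `sqlite3` module) to write every insert, update, and delete into a change table. The `users` and `users_changes` tables and their columns are made up for this example. Trigger-based capture is only one approach; many production CDC tools read the database's transaction log instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);

-- Separate table that accumulates one row per captured change.
CREATE TABLE users_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,
    user_id   INTEGER NOT NULL,
    old_email TEXT,
    new_email TEXT
);

CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('INSERT', NEW.id, NULL, NEW.email);
END;

CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('UPDATE', NEW.id, OLD.email, NEW.email);
END;

CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('DELETE', OLD.id, OLD.email, NULL);
END;
""")

# Ordinary writes against the source table; the triggers record each one.
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("UPDATE users SET email = 'b@example.com' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

changes = conn.execute(
    "SELECT op, user_id, old_email, new_email "
    "FROM users_changes ORDER BY change_id"
).fetchall()
remaining_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

A downstream system can replay `users_changes` in `change_id` order to stay in sync, even though the source table itself no longer contains the row.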
Soft delete is a method of deleting data from a database in a way that allows the data to be recovered later. When data is soft-deleted, it is not physically removed from the database. Instead, it is marked as deleted and is typically no longer visible to users. Soft delete is often used to guard against accidental or unintended data loss. It is also useful where data must be retained for compliance or regulatory purposes, because the data stays in the database while being unavailable to users.
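A common way to implement this is a nullable "deleted at" column plus a view that hides marked rows. The sketch below shows that pattern with SQLite through Python's standard `sqlite3` module; the `notes` table, the `active_notes` view, and the helper functions are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    deleted_at TEXT            -- NULL means the row is live
);

-- Application queries read from this view, so marked rows stay hidden.
CREATE VIEW active_notes AS
    SELECT id, body FROM notes WHERE deleted_at IS NULL;
""")


def soft_delete(conn, note_id):
    """Mark the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE notes SET deleted_at = datetime('now') "
        "WHERE id = ? AND deleted_at IS NULL",
        (note_id,),
    )


def restore(conn, note_id):
    """Undo a soft delete by clearing the marker."""
    conn.execute("UPDATE notes SET deleted_at = NULL WHERE id = ?", (note_id,))


conn.execute("INSERT INTO notes (id, body) VALUES (1, 'keep me'), (2, 'delete me')")

soft_delete(conn, 2)
visible_after_delete = conn.execute(
    "SELECT id FROM active_notes ORDER BY id"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

restore(conn, 2)
visible_after_restore = conn.execute(
    "SELECT id FROM active_notes ORDER BY id"
).fetchall()
```

Storing a timestamp rather than a boolean flag also records when the deletion happened, which helps with retention policies that purge rows after a fixed period.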
This list started as a personal collection of interesting things about data versioning. Your contributions and suggestions are warmly welcomed. Read the contribution guidelines.