Reverting Datasets

Overview

Dataset commits are the backbone of SolveBio's datastore and together form a change log of modifications to a dataset. Each dataset commit represents all changes made to the target dataset by a single import, migration, or delete process.

Any of these changes can be reverted by creating a rollback commit, which restores the dataset to the state it was in before the reverted commit was made. Every commit can be reverted, provided no later commit blocks it (see Checking Ability to Rollback).

The parent commit of a rollback commit is the commit to be reverted.

Rollbacks

A rollback commit reverts its parent commit. What it does depends on the mode of the parent commit: it may delete records, index a rollback file, or both.

A rollback file is generated for overwrite, upsert and delete modes. This file is generated right before records are committed, by querying the current state of the dataset and storing those records in a file. This file is stored with the commit object and used when a rollback commit is created.
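As a rough illustration of the idea, the sketch below snapshots the current version of records just before a commit changes them. It is a toy in-memory model, not SolveBio's implementation: the `build_rollback_file` helper, the `id` key, and the choice to snapshot only the records the commit is about to touch are all assumptions made for the example.

```python
import copy


def build_rollback_file(dataset, incoming_ids):
    """Snapshot the current version of every record a commit is about to touch.

    dataset: dict mapping a record ID to a record dict (toy stand-in for a dataset).
    incoming_ids: IDs the commit will overwrite, upsert, or delete (assumed input).
    """
    # Records that do not exist yet have no prior state, so they are skipped.
    return [copy.deepcopy(dataset[rid]) for rid in incoming_ids if rid in dataset]


dataset = {
    "emp-1": {"id": "emp-1", "address": "1 Main St", "_commit": "A"},
    "emp-2": {"id": "emp-2", "address": "9 Oak Ave", "_commit": "A"},
}

# A hypothetical commit B is about to change emp-2 and add emp-3.
rollback_file = build_rollback_file(dataset, ["emp-2", "emp-3"])
```

Only `emp-2` ends up in the snapshot here, because `emp-3` has no earlier version to restore.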

| Commit mode | Description |
| --- | --- |
| append | Reverts by deleting all records containing parent _commit ID |
| delete | Reverts by indexing the records deleted (stored in rollback file) |
| overwrite | Reverts by deleting all records containing parent _commit ID, then indexing the rollback file |
| upsert | Same as overwrite commit mode |
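The per-mode behaviour can be sketched with a small in-memory model. This is an illustration under stated assumptions rather than SolveBio's code: the `revert` function, the shape of the `parent` dict, and the `id` key are hypothetical, and the dataset is a plain dict of records that each carry a `_commit` field.

```python
def revert(dataset, parent):
    """Return a new dataset with the parent commit reverted.

    dataset: dict of record ID -> record dict with a "_commit" field.
    parent: dict with "id", "mode" and "rollback_file" (a list of records).
    """
    result = dict(dataset)
    if parent["mode"] in ("append", "overwrite", "upsert"):
        # Delete every record written by the commit being reverted.
        result = {
            rid: rec for rid, rec in result.items()
            if rec["_commit"] != parent["id"]
        }
    if parent["mode"] in ("delete", "overwrite", "upsert"):
        # Index the records that were saved in the rollback file.
        for rec in parent["rollback_file"]:
            result[rec["id"]] = dict(rec)
    return result


current = {
    "emp-1": {"id": "emp-1", "address": "1 Main St", "_commit": "A"},
    "emp-2": {"id": "emp-2", "address": "7 Pine Ct", "_commit": "B"},
    "emp-3": {"id": "emp-3", "address": "2 Lake Dr", "_commit": "B"},
}
commit_b = {
    "id": "B",
    "mode": "overwrite",
    "rollback_file": [{"id": "emp-2", "address": "9 Oak Ave", "_commit": "A"}],
}

restored = revert(current, commit_b)
```

In this toy run, reverting the overwrite commit removes `emp-3` (added by B) and puts `emp-2` back to the version saved in the rollback file.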

Checking Ability to Rollback

For a commit to be reverted, the commit stack above it on the dataset must be clear. Any later commit with mode overwrite or upsert blocks the revert and must itself be reverted first. If there are blocking commits when a rollback is created, the request fails and the endpoint returns the blocking commits.
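A minimal sketch of such a check follows. It assumes that only commits made after the target can block it, and it uses a hypothetical `blocking_commits` helper over a plain list of commits ordered oldest first; the real endpoint's request and response shapes are not shown here.

```python
BLOCKING_MODES = {"overwrite", "upsert"}


def blocking_commits(stack, target_id):
    """Return IDs of commits that would block a revert of `target_id`.

    stack: list of {"id": ..., "mode": ...} dicts, oldest first (assumed shape).
    """
    ids = [commit["id"] for commit in stack]
    position = ids.index(target_id)  # raises ValueError if the target is unknown
    # Only commits made after the target are considered (an assumption).
    return [
        commit["id"] for commit in stack[position + 1:]
        if commit["mode"] in BLOCKING_MODES
    ]


stack = [
    {"id": "A", "mode": "overwrite"},
    {"id": "B", "mode": "overwrite"},
    {"id": "C", "mode": "append"},
    {"id": "D", "mode": "upsert"},
]

blockers_for_b = blocking_commits(stack, "B")
blockers_for_d = blocking_commits(stack, "D")
```

Here the append commit C does not block a revert of B, but the upsert commit D does, while D itself sits at the top of the stack and has no blockers.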

Example

Imagine a simple dataset containing employee names and addresses. It is maintained by an annual import of employees with address changes (including new employees). Over the course of a few years, several employees change addresses, several join the company, and some leave.

  • Commit A (Import 2015 address file in overwrite mode)
  • Commit B (Import 2016 address file in overwrite mode)
  • Commit C (Import 2017 address file in overwrite mode)
  • Commit D (Import 2018 address file in overwrite mode)

Let's do a simple case first, where nobody actually moves addresses and therefore only new employees are added.

If we revert Commit C, then we only remove new 2017 employees from the dataset. The 2015, 2016 and 2018 employees all remain.

Now let's assume people do move and so each year we have all sorts of address changes.

Reverting Commit C is meant to restore the dataset to the state it was in after Commit B. With Commit D in place, however, the revert would only reset 2017 addresses to 2016 addresses for people whose address did not also change in 2018, and it would leave any new employees added in 2018. The result is an inconsistent state, not a valid snapshot of the dataset as it was before Commit C was indexed. Such a revert is therefore not allowed, and attempts to roll back will fail. Commit D must be reverted first.
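The ordering requirement can be replayed with a toy in-memory model. Everything in it is an assumption made for illustration: the `apply_commit` and `revert` helpers, the employee IDs and addresses, and the simplification that an overwrite commit replaces records with matching IDs. It does not claim to reproduce the exact record-level outcome of a real out-of-order revert, only that skipping Commit D fails to recover the state after Commit B.

```python
def apply_commit(dataset, commit_id, records):
    """Apply an overwrite-style commit; return (new_dataset, rollback_file)."""
    # Save the prior version of every record this commit replaces.
    rollback_file = [dict(dataset[r["id"]]) for r in records if r["id"] in dataset]
    new_dataset = dict(dataset)
    for r in records:
        new_dataset[r["id"]] = {**r, "_commit": commit_id}
    return new_dataset, rollback_file


def revert(dataset, commit_id, rollback_file):
    """Delete the commit's records, then index its rollback file."""
    kept = {rid: rec for rid, rec in dataset.items() if rec["_commit"] != commit_id}
    for rec in rollback_file:
        kept[rec["id"]] = dict(rec)
    return kept


state = {}
state, rb_a = apply_commit(state, "A", [{"id": "emp-1", "address": "1 Main St"}])
state, rb_b = apply_commit(state, "B", [{"id": "emp-2", "address": "9 Oak Ave"}])
after_b = state
state, rb_c = apply_commit(state, "C", [
    {"id": "emp-1", "address": "5 Elm Rd"},   # moved in 2017
    {"id": "emp-2", "address": "7 Pine Ct"},  # moved in 2017
    {"id": "emp-3", "address": "2 Lake Dr"},  # joined in 2017
])
after_c = state
state, rb_d = apply_commit(state, "D", [
    {"id": "emp-2", "address": "3 Birch Ln"},  # moved again in 2018
    {"id": "emp-4", "address": "8 Hill Rd"},   # joined in 2018
])

# Reverting D first, then C, walks the dataset back one snapshot at a time.
in_order = revert(revert(state, "D", rb_d), "C", rb_c)

# Reverting C while D is still in place leaves the 2018 hire behind.
out_of_order = revert(state, "C", rb_c)
```

In this model, `in_order` equals the dataset as it stood after Commit B, while `out_of_order` still contains `emp-4` and so matches no snapshot the dataset ever had.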