A Insert-only table is a database table strategy that keeps all the past versions of an object in a simple and lightweight way . It allows the system to efficiently represent the state as it used to be at a past point in time and show changes over time. No self-referential are needed, avoiding costly query strategies. Single rows keep the full-state at a point in time, avoiding the need to query many rows to recover a single object.
An insert-only table requires the following columns:
And all the other columns you want to add of course.
There needs to be a trigger on insert that selects a row with the same "oid" and ts_end of null and updates it with the value of the "ts" of the row being inserted.
CREATE TRIGGER IF NOT EXISTS rules_rule_insert_trigger BEFORE INSERT on rules_rule
FOR EACH ROW
BEGIN
UPDATE rules_rule SET ts_end = new.ts WHERE oid = new.oid AND ts_end IS NULL;
END;
INSERT the object with:
vid
oid
. ts
ts_end
should be null.op
should be 'C'Instead of updating, Insert an object with:
vid
oid
of the object you're wanting to update ts
op
should be 'U'. ts_end
should be null. This will start the trigger which grabs the ts value of the row your inserting and stores it in the ts_end of the row with the same oid and ts_end of null
Deleting an object is also a insert, exactly like updating except that:
op
should be 'D'.Since we don't remove anything, undelete is possible by following the update process on a "deleted" oid. Magic.
Generally you care about what is currently active. This is done by selecting rows that have ts_end of null and op <> 'D'
.
Getting all the history of an object is of course as simple as selecting on the oid
and optionally ordering on ts.
Depending on the usage, it's sometimes better to refer to an exact version of the object (vid) or the object independent of it's exact version (oid). This system allows both. If this is hard to choose, it means you need to design the temporal dimension of your data model better.
To speed up updates and deletes, add the following index:
create unique index rules_rule_oid_ts_end_uniq on rules_rule (oid, ts_end);
This also guarantees that you can never have two versions of the object that are current at the same time.
If you have a usage pattern with a lot of objects that is heavy on querying for the latest state of the objects, you can add an index of the oid with a where clause on a ts_end of null. Add the additional condition on op <> 'D' if you also have a lot of deleted objects slowing it down. Like so:
create unique index rules_rule_active_idx
on rules_rule (oid)
where ts_end IS NULL and op <> 'D';
You can imitate a traditional table (read-only) by creating a view where only rows with ts_end is null and op <> 'D' . In the fields only show oid and ts as the last update time and nobody will ever be any wiser.
CREATE VIEW
rules_rule_active (oid, ts, …)
AS select oid, ts, …
FROM rules_rule
where ts_end IS NULL
and op <> 'D';
If the table becomes unwieldy, you can remove all oids of objects that have op = 'D' and ts is older then a certain age.Another (complementary) approach is to remove entries with ts_end of a certain age. The only artefact will be that the oldest entry will be a op of "U" rather then "A".
As an example, here's the Model in Django. If there are multiple models that use this, this could be abstracted to a sub-classable Model-subclass . models.py
class Rule(models.Model):
oid = models.UUIDField(null=False, editable=False, db_index=True, help_text="Object ID of the rule. Stable with
different updates of the rule.")
vid = models.UUIDField(null=False, editable=False, primary_key=True, help_text="Unique ID of the version of the rule.
Changes with each update.")
ts = models.DateTimeField(blank=False, auto_now=True, editable=False, db_index=True, help_text="The moment this version
was created")
ts_end = models.DateTimeField(blank=True, null=True, editable=False, help_text="The moment the rule no longer is valid")
op = models.CharField(max_length=1, null=False, default="A", editable=False, help_text="Operation of this version",
choices=[("C", "Create"), ("U", "Update"), ("D", "Delete")])
## The fields of your domain.
objects = models.Manager() # All of the rule versions.
active = ActiveRuleManager() # Only the currently active rules
class Meta:
unique_together = (("ts_end", "oid"),) # This ensures no rule can have two current versions.
get_latest_by = "ts"
def save(self, *args, **kwargs):
self.vid = uuid.uuid4() ## Make sure every save is a new entry
if self.oid is None: # no oid, means this is a new entry (not an update of an existing one)
self.oid = uuid.uuid4()
self.op = "A"
else: ## it's an update of an existing entry
self.op = "U"
super().save(force_insert=True, *args, **kwargs) # Call the "real" save() method.
def delete(self, using=None, keep_parents=False):
self.vid = uuid.uuid4() ## Force a new row.
self.op = "D" # Mark it as the final state of the Rule.
super().save(force_insert=True) # Call the "real" save() method.
self.oid = None
class ActiveRuleManager(models.Manager):
def get_queryset(self):
return super().get_queryset().filter(ts_end=None).exclude(op="D")
To add the trigger add this to your migrations script as well
migrations.RunSQL(
"""CREATE TRIGGER IF NOT EXISTS rules_rule_insert_trigger BEFORE INSERT on rules_rule FOR EACH ROW BEGIN UPDATE
rules_rule SET ts_end = new.ts WHERE oid = new.oid AND ts_end IS NULL; END;"""
)
Usage has shown that most of the operations only care about the currently active rules. To make this easy the Rule model has two Managers. "active" only returns objects that are currently active, while the common "objects" keeps returning all rows, allowing access to all versions of the rules. This removes internal logic workings of the insert-only table from common usages.
> > > from rules.models import Rule
> > > Rule.active.count() # all active rules 3 # There are 3.
> > > Rule.objects.count() # Get all versions of all rules 47 # There are 47 of those.
> > > Rule.active.latest().oid # the id of the active rule that was changed the latest UUID('24eb3231-0dc2-451d-b477-2086798a9d12')
> > > Rule.objects.filter(oid=Rule.active.latest().oid).count() # How many versions of that rule are there? 2
CREATE OR REPLACE FUNCTION insert_only_update()
RETURNS trigger AS
$$
BEGIN
UPDATE rule SET ts_end = NEW.ts WHERE pid = NEW.pid AND ts_end IS NULL;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_only_rule_trigger
BEFORE INSERT
ON rule
FOR EACH ROW
EXECUTE PROCEDURE insert_only_update();