Tag Archives: bitemporal

Chicago PUG October 15 recording

As many of my followers may recall, we had the pleasure of Paul Jungwirth presenting at the Chicago PUG meetup on October 15. I have high hopes for having true temporal data support in Postgres 18, and I wanted to know all the details of what to expect. I believe I can speak on behalf of everyone who attended on October 15, in person or virtually – it was an exceptional talk! I do not recall any other meetup after which so many people asked me about the recording – and it is finally here!

Enjoy!

2 Comments

Filed under events, talks

(Bi)Temporal Tables, PostgreSQL and SQL Standard

At PG Day Chicago, I presented an extended version of my talk given last year at Citus Con – Temporal Tables and the SQL Standard. Between the time my talk was accepted and the time I delivered it, I learned that PG 17 would include the first-ever support for an important temporal feature: uni-temporal primary keys and unique constraints.

It has been a while since the last time I presented anything temporal-related, which meant that many people in the audience hadn’t heard anything about the bitemporal model before. There was no way I could cover everything in 40 minutes, and many questions, which were asked both during the Q&A and later in the hallways, remained unanswered.

In this blog, I will address some of these questions and expand on what I would like to see in the upcoming Postgres implementation of temporal tables.

  1. Bitemporal framework and GiST. The key feature of the bitemporal model is its reliance on existing PostgreSQL capabilities, specifically GiST indexes and GiST-backed EXCLUSION constraints. In fact, GiST does all the work needed to support (bi)temporal primary/unique keys: it ensures that no two rows with the same primary key value overlap in time. In the bitemporal model, we check the same thing for the two time dimensions. For those who have never needed GiST indexes, here is the relevant documentation. I learned about GiST when I first started implementing bitemporality, and I could not believe all my needs were already met!
  2. Disk space requirements. For some reason, people believe that keeping all versions of each tuple requires “too much disk space.” I won’t deny that storing row versions takes more space than not storing them; however, how much more is often overestimated. In my talk at PGConf.EU 2022, I presented an example of storing a changelog vs. storing data in a bitemporal model and demonstrated that the latter actually takes less space while allowing queries to execute much faster.
  3. Excessive IO. One of the questions I was asked was whether the use of the bitemporal model increases the system IO. The increase is surprisingly insignificant. Let’s look at the database operations. An INSERT is the same insert, whether it is temporal or not. A non-temporal UPDATE amounts to one insert and one delete (a new tuple version plus a dead tuple). A uni-temporal UPDATE results in one INSERT and one UPDATE; in other words, two inserts and one delete. A bitemporal UPDATE is equal to two inserts and one update; in other words, three inserts and one delete. That means the mix of costly operations remains the same as with regular updates. Also, note one remarkable fact: the only field that changes in the updated record is the time interval. That means that (1) the record size is not going to change, and (2) since a GiST index is an R-Tree, the order of intervals is defined by inclusion. When we update the time range, the only thing we do with it is make it smaller (end-dating), so its node in the index will never move, which means that GiST indexes in this case will (almost) never experience bloat. As for regular B-tree indexes, all of the updates in the temporal models are HOT updates.
  4. Why is the first temporal feature in PG 17 so significant? Having temporal primary/unique keys in PG 17 might seem insignificant – after all, that’s what GiST with exclusion does anyway. However, one of my big asks over many previous years was the ability to see temporal keys in a table description. I invented a lot of tricks (mainly having an “empty” check constraint) so that I would be able to identify temporal tables using the \d psql command. Now, we can do it.
  5. System time or application time? Now, I am going to switch to my questions and concerns about “what’s next” in the area of temporal tables support. When I first heard about the temporal key in PG 17 at SCaLE, I immediately asked the presenter when the second dimension would be added, to which he replied: “Very soon. We are actively working on it, and we are going to implement everything in the standard.” That means, among other things, that Postgres should distinguish between SYSTEM_TIME (as per the standard) and application time, and I do not see this distinction in the ongoing discussions.
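The GiST machinery from point 1 can be sketched in a few lines of SQL. This is a minimal, hypothetical example (the table and column names are invented for illustration): the btree_gist extension lets a single exclusion constraint combine scalar equality with range overlap, which is exactly a uni-temporal key.

```sql
-- btree_gist allows mixing scalar equality (=) with range overlap (&&)
-- inside one GiST exclusion constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE product_price (
    product_id int,
    price      numeric,
    effective  daterange,
    -- a uni-temporal primary key: no two rows may carry the same
    -- product_id with overlapping effective ranges
    EXCLUDE USING gist (product_id WITH =, effective WITH &&)
);

INSERT INTO product_price
    VALUES (1, 9.99, daterange('2024-01-01', '2024-06-01'));

-- This row overlaps the one above for product_id = 1,
-- so the exclusion constraint would reject it:
-- INSERT INTO product_price
--     VALUES (1, 10.99, daterange('2024-05-01', NULL));
```

A bitemporal key extends the same constraint with the second time dimension as one more range column checked `WITH &&`.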

Why is this important? 

The SQL standard adds semantics to DML: a “FOR PORTION OF” clause on all commands (with CURRENT assumed if it is omitted, to keep the old code functioning). However, for SYSTEM_TIME, “FOR PORTION OF” is irrelevant, because system time is defined as “transaction time”: it can only start at the “present” moment, and for any tuple in a temporal table, it can be only in the past or now. Application time, in contrast, is not bound to a transaction, and “FOR PORTION OF” can specify any past, present, or future time period. In both cases, an “update” is not a regular update but a sequence of inserts and updates, as I described in 3) above. And when we define temporal referential integrity, we need to take these semantics into account, which I have yet to see. From my perspective, this test is not correct:

INSERT INTO temporal_rng (id, valid_at) VALUES
    ('[1,2)', daterange('2018-01-02', '2018-02-03')),
    ('[1,2)', daterange('2018-03-03', '2018-04-04')),
    ('[2,3)', daterange('2018-01-01', '2018-01-05')),
    ('[3,4)', daterange('2018-01-01', NULL));
ALTER TABLE temporal_fk_rng2rng
    DROP CONSTRAINT temporal_fk_rng2rng_fk;
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
    VALUES ('[1,2)', daterange('2018-01-02', '2018-02-01'), '[1,2)');
ALTER TABLE temporal_fk_rng2rng
    ADD CONSTRAINT temporal_fk_rng2rng_fk
    FOREIGN KEY (parent_id, PERIOD valid_at)
    REFERENCES temporal_rng;
ALTER TABLE temporal_fk_rng2rng
    DROP CONSTRAINT temporal_fk_rng2rng_fk;
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
    VALUES ('[2,3)', daterange('2018-01-02', '2018-04-01'), '[1,2)');
-- should fail:
ALTER TABLE temporal_fk_rng2rng
    ADD CONSTRAINT temporal_fk_rng2rng_fk
    FOREIGN KEY (parent_id, PERIOD valid_at)
    REFERENCES temporal_rng;
ERROR:  insert or update on table "temporal_fk_rng2rng" violates foreign key constraint "temporal_fk_rng2rng_fk"
DETAIL:  Key (parent_id, valid_at)=([1,2), [2018-01-02,2018-04-01)) is not present in table "temporal_rng".

(It is very probable that the link that was sent to me does not reflect the current status, so I am withholding judgment until I double-check; still, it is a good illustration of the importance of operation semantics.)

7 Comments

Filed under SQL

Adding partitions to bitemporal model

Until recently, I never tried to partition bitemporal tables, both because I could not find a good way of partitioning them and because my projects never grew big enough for partitioning to become a necessity. However, the need became apparent two months after I started at DRW!

You can see the results in the last commit to pg_bitemporal. The changes were committed about a month ago, but I didn’t announce them, and that’s what I am finally doing :).

There is no change to ll_create_bitemporal_table, so no changes are required to the existing code. I added two more functions: ll_create_bitemporal_table_partitioned and ll_create_bitemporal_partition.

This first version of bitemporal table partitioning is very restrictive. The only partitioning currently supported is by range on the business key. The primary reason for this is the need to keep bitemporal operations performant, which means that we can’t store adjacent records with the same business key in different partitions. You can think of it this way: all the records related to one object are, in some sense, “one object” and thereby can’t be separated. I believe it should be possible to expand the partitioning by adding more columns to the partitioning key (just as for regular tables, where the partitioning key must be a part of the primary key).

I think I will implement it in the next version, but for now, ll_create_bitemporal_table_partitioned will create a table partitioned by range on the business key, and ll_create_bitemporal_partition will create a partition for a specified range. 
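The resulting layout can be illustrated with plain DDL. This is only a hypothetical sketch of the partitioning scheme, not the pg_bitemporal API (the two helper functions above generate the real structure, including the temporal constraints on each partition):

```sql
-- Range partitioning on the business key: every version of a given
-- object lands in the same partition, so bitemporal operations never
-- need to touch more than one partition.
CREATE TABLE accounts (
    acc_id    int,
    balance   numeric,
    effective daterange,   -- application (valid) time
    asserted  daterange    -- assertion time
) PARTITION BY RANGE (acc_id);

CREATE TABLE accounts_p1 PARTITION OF accounts
    FOR VALUES FROM (1) TO (100000);
CREATE TABLE accounts_p2 PARTITION OF accounts
    FOR VALUES FROM (100000) TO (200000);
```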

I think it is a good start because it dramatically increases the scope of potential usage of bitemporal tables – check it out!

Leave a comment

Filed under Development and testing

How to talk about bitemporality?

The speakers’ feedback from PG Conf EU became available a couple of weeks ago, and I am still thinking about it, specifically about the feedback on my bitemporal presentation. There was a lot of critique, which is good because it was very specific and consistent. That means that the issues people see are real issues I need to address.
But here comes the more difficult part. As much as I feel that I have talked about bitemporality at each and every conference for the past nine years, that’s not true. And at each and every conference, most of the audience had never heard about the bitemporal model before they attended my talk. That means that to have everybody on board, I need to explain everything ab ovo in each talk. A big portion of the comments was about that – jamming in too much content and “assuming that everybody knew what I was talking about.” But if I go ahead and explain the topic ab ovo, I will use all 45 allocated minutes to bring the audience up to speed. And how can I make time to talk about news and improvements?!
The topic of bitemporality is unknown because very few people use it, and it will continue to be the case until I convince more people to try it 🙂

Any suggestions, anybody? I would really appreciate a piece of advice! I want to break this cycle!

Leave a comment

Filed under Data management, talks

Bitemporal Q&A

After my bitemporal presentation, people asked lots of questions, and more questions followed later during the conference and also on Twitter and LinkedIn.

In this blog, I will try to summarize all my answers and give more details on the project development.

  1. What is my take on DDL changes in the bitemporal framework? I do not have a good answer for that. When I used bitemporality in a production environment and had to change a table structure, I would either add a column with NULL values for all the previous records or do the opposite – set the value to NULL by default for the new records. I believe that there is no fundamental difference between “not having” a column and “setting the value to NULL.” If you disagree, I am happy to discuss it!
  2. In order for bitemporal operations to look “less ugly,” can we create updatable views and hide the bitemporal operations in INSTEAD OF triggers? Not really, because in this case you won’t be able to pass the effective and asserted ranges, and you can’t distinguish between an update and a correction.
  3. Can I use pg_bitemporal? Is it “in production?” I maintain my fork (hettie-d/pg_bitemporal), which means that I review all the bugs and do my best to fix them in a couple of days or suggest a workaround (or explain why what you are trying to do is not going to work). I used this framework for five years at Braviant Holdings, and to the best of my knowledge, the company is still using it a year and a half after I left with no major issues. One more startup has been using it for at least six months, and they not only report the bugs to me but also offer improvements.
  4. Did you think about making it an extension? I did, and there are two reasons I never got to it. First, I want people to be able to try it in different environments, and if a company uses any cloud-managed service, they can’t install unapproved extensions. Second, I have neither the skills to package it nicely nor the time to learn them. I will be most grateful for any assistance, but for the reason mentioned before, I would be even happier to see it become a part of the Postgres core, hopefully rewritten better and more efficiently.
  5. How do you support foreign keys in the bitemporal model? I still have to formalize the creation of bitemporal referential integrity constraints, but the idea is very simple. You can create triggers on insert that are similar to the internal Postgres triggers used to support regular referential integrity. Here is why you need to check the parent values on insert only: in the bitemporal model, the tuples are never updated except for changes to the asserted range (check the diagrams from my presentations). With this, the only thing we need to check when we insert a physical record is whether the parent key is effective and asserted from the child’s effective start and asserted start to infinity. And one more trigger should prevent the parent key from being deleted if there are dependent child records.
  6. Can I use one-dimensional time? Yes, but in that case, you lose the ability to correct (or rather, to distinguish between a correction and an update) and to make future assertions.
  7. What if you need to actually delete something due to legal requirements? You will need to work with your customer on a case-by-case basis, explaining the consequences, but at the end of the day, you need to do what is required by law.
  8. Users have to add time constraints to all queries – is there any way to help them? From my experience, as soon as users understand what they are getting from the bitemporal framework, they learn to add a “current” time filter pretty fast and do not get frustrated. Also, we provided them with many “canned” parameterized reports, thus “hiding” the time dimension filters.
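The trigger-based referential check from answer 5 can be sketched roughly as follows. This is a hypothetical simplification (the parent/child tables and column names are made up, and pg_bitemporal’s actual implementation is more involved):

```sql
-- On every child insert, verify that the parent key is effective at the
-- child's effective start and still asserted (open to infinity).
CREATE FUNCTION check_bitemporal_parent() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (
        SELECT 1
          FROM parent p
         WHERE p.parent_id = NEW.parent_id
           AND p.effective @> lower(NEW.effective)  -- covers child's start
           AND upper_inf(p.asserted)                -- assertion still open
    ) THEN
        RAISE EXCEPTION 'parent key % is not effective/asserted',
            NEW.parent_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER child_bitemporal_fk
    BEFORE INSERT ON child
    FOR EACH ROW EXECUTE FUNCTION check_bitemporal_parent();
```

A symmetric trigger on the parent table would prevent end-dating or deleting a key that still has dependent child records.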

Please leave a comment if I didn’t answer a question you have; I will be happy to answer!

Leave a comment

Filed under Development and testing, research

Bitemporality and postgres_air

Last Tuesday at Chicago PUG, I had a rehearsal of my new bitemporal presentation. Before it is presented to a broader audience next Wednesday, I will give one more “preview” at Helsinki PUG on Monday, October 24, incorporating many helpful suggestions from my first listeners at DRW. One thing that differentiates this presentation from all of my prior bitemporal talks is the data set.

I wanted to create a new real-life-size example, not the toy-size three-row tutorial I had before. As any presenter knows, building a realistically sized example is extremely difficult when you do not have one in production.

And then I thought of our postgres_air database. We created it in the hope that it would be helpful not only for our book but also for others who need test data sets for their projects. And then I thought: why can’t I use it for my own presentation?

I knew precisely what I needed, but building a new bitemporal data set took ten times longer than I thought it would take! During this process, we realized how many inconsistencies our data set had, and now we will most likely make changes to it as well. But overall, I am glad I could use the postgres_air database exactly as intended. 

Now I plan to rewrite our pre-pandemic “Ultimate optimization” using postgres_air, and we have already started working on several NORM_GEN examples. 

2 Comments

Filed under events, talks

My Presentation at Postgres London 2022

It was great to be in London again. It was great to present at Postgres London for the first time, and I am especially grateful to the Talk Selection Committee for choosing my bitemporal talk. 

I could not be happier with the feedback I received – people actually want to try using pg_bitemporal! 

Since the traffic to the pg_bitemporal repo has increased, I decided to add some comments and answer some questions that people asked me after my presentation.

About the presentation. I honored the organizers’ request to run it from their computer. I was mainly concerned about the videos – whether they would play with a clicker – which was the only thing I checked. Only during the presentation did I realize that two-thirds of the colors were not visible and that the animations didn’t work.

That being said, the slides are available here (at least for some time), and the SQL for the tutorial is here. I hope that with all colors in place, the tutorial will be more understandable. 

About pg_bitemporal. I know that when you google pg_bitemporal, the first repo that comes up is the scalegenius one. Since the beginning of 2022, I have maintained my own fork, and all the changes and bug fixes made after January 2022 are present in my fork only. Please go to https://github.com/hettie-d/pg_bitemporal to get the latest version.

How to use. On the database where you want to use it, run _load_all.sql. It will create all schemas and functions you will need. 

Thank you again for attending my presentation, and I hope you will give pg_bitemporal a try! Please let me know how it goes!

3 Comments

Filed under events, talks

Bitemporal bug fixes

Attention to those who are currently using pg_bitemporal or just exploring it. I’ve received some bug reports from new users and fixed them. Please note that the fork I support is

https://github.com/hettie-d/pg_bitemporal

All fixes are there.

Leave a comment

Filed under Development and testing, SQL

Bitemporality in Practice

Yesterday marked a significant milestone in the life of the pg_bitemporal library. Five years ago, when this project started, we thought that the first practical application of the concept would be in accounting. Indeed, when I explain to people who work in finance what benefits this approach can yield, my explanations usually receive an enthusiastic response. However, when it comes to reality, i.e., doing real financial reporting, people tend to rely on more familiar approaches. After all, financial reports are too important to mess up. But through all these years, I hoped I would be able to demonstrate the usefulness of the bitemporal concept and the productiveness of this approach.

What happened yesterday was that my very first report, which used the bitemporal framework for accounting, went live. And the way it is using bitemporality is just how I envisioned it five years ago.

I am a scientist by nature and by education, but I have to say that no theory is good without practical application 🙂

1 Comment

Filed under Data management, research

More Updates on the State of Bitemporality

A month ago, I proudly announced that I had been able to modify the bitemporal correction to allow correcting any effective interval in the past or present, not just the current one. I was planning to embed this capability into the regular bitemporal correction, but after a couple of weeks of hesitation, I decided against it.

The reason is very egotistical :). At Braviant Holdings, we use bitemporal correction in our production application, and the response time is critical. I already know that the generalized case of bitemporal correction is slightly slower due to the additional checks, and I am not going to risk the performance of what’s already in place. This being said, it shouldn’t make a significant difference for those who are using pg_bitemporal at their own risk, since the difference in execution time is just “times two,” so if your original correction is optimized to 0.8 sec, the new one may be 1.7 sec. It makes a difference for us, though.

However, the function is out there and available, so if you want to try it out, check out ll_bitemporal_correction_hist.

Another update is related to my attempts to create a barrier against direct insert/update/delete on bitemporal tables (without compromising performance).

My original idea – that it would be possible to revoke simple INSERT/UPDATE/DELETE from bitemporal tables and make the bitemporal operations SECURITY DEFINER – won’t work.


The reason is that if we create the bitemporal functions as SECURITY DEFINER, with the database owner as the definer, then granting any user permission to execute such a function lets them modify ANY bitemporal table. That means we won’t be able to restrict access to specific tables, because we can’t grant execution based on the function’s parameters (again, theoretically possible, but it requires creating additional objects).

Now I am back to the trigger idea. What could theoretically be done is to have triggers that simply reject direct insert/update/delete and then disable these triggers inside the bitemporal functions. But this again requires a higher level of permissions.

Obviously, anybody who wants to instill this level of data safeguarding can create their own aliases of the bitemporal functions, tied to specific tables or schemas, but that would be a custom build, not anything I can provide generically.
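For anyone curious what such a custom build could look like, here is a hypothetical sketch (the accounts table, the app_user role, and the wrapper function are all invented for illustration; a real wrapper would delegate to the pg_bitemporal functions):

```sql
-- Ordinary users lose direct DML on the table...
REVOKE INSERT, UPDATE, DELETE ON accounts FROM app_user;

-- ...and instead get EXECUTE on a table-specific SECURITY DEFINER
-- wrapper owned by the table owner.
CREATE FUNCTION accounts_end_date(p_acc_id int, p_as_of date)
RETURNS void
SECURITY DEFINER
AS $$
    UPDATE accounts
       SET effective = daterange(lower(effective), p_as_of)
     WHERE acc_id = p_acc_id
       AND upper_inf(effective);  -- end-date only the open version
$$ LANGUAGE sql;

GRANT EXECUTE ON FUNCTION accounts_end_date(int, date) TO app_user;
```

Because the wrapper is bound to one table, EXECUTE can be granted per table – which is exactly what the generic SECURITY DEFINER approach above could not achieve.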

At the moment I am out of new ideas, just wanted to state it officially. And if anybody is interested in how to create these custom-built functions – please feel free to reach out!

pg_bitemporal public repo can be found here.

Leave a comment

Filed under Data management, SQL