Category Archives: SQL

Indexing Partitioned Table Disaster

And one more tale from the frontline!

When I encounter cases like this, I start wondering whether I am the first person who ever tried to do “this” for real, whatever “this” is.

Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.

So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. The customer happily agrees, and life is great until one day, an archiving job reports a “name already exists” error for an index.

current.a_table is a partitioned table

current.a_table_2024_12_17
current.a_table_2024_12_18
...
current.a_table_2024_12_31

are partitions.

a_table_useful_index_idx is a global index on (col1, col2, col3, col8)

archive.a_table is an archived partition table

archive.a_table_2024_12_01
archive.a_table_2024_12_02
...
archive.a_table_2024_12_15
archive.a_table_2024_12_16

are partitions
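The archiving job described above might look roughly like this (a sketch using the partition names from the listing, not the actual job):

```sql
-- Detach the oldest "current" partition; it becomes a standalone table
ALTER TABLE current.a_table
    DETACH PARTITION current.a_table_2024_12_17;

-- Move the standalone table into the archive schema
ALTER TABLE current.a_table_2024_12_17 SET SCHEMA archive;

-- Attach it to the archived partitioned table
ALTER TABLE archive.a_table
    ATTACH PARTITION archive.a_table_2024_12_17
    FOR VALUES FROM ('2024-12-17') TO ('2024-12-18');
```

It was the attach step that started failing with the naming error.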

When I looked at the conflicting name, I saw the name of the index:

a_table_col1_col2_col3_col8_idx

Somehow, this name was coming from the December 17, 2024 partition that we were trying to archive, and I wondered why this name suddenly reappeared when I had clearly named the global index differently.

I decided that I had overlooked it in the very beginning and renamed all indexes in the archive.a_table partitions so that there would be no possible collisions.

However, in two weeks, the problem reappeared. Once again, I thought that it was all because I had neglected to create a global index on archive.a_table, so I rebuilt and renamed everything, and surely… You got it!

After that happened yet again, I finally started to read the documentation, and I learned that:

When CREATE INDEX is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes. Each partition is first checked to determine whether an equivalent index already exists, and if so, that index will become attached as a partition of an index to the index being created, which will become its parent index. If no matching index exists, a new index will be created and automatically attached; the name of the new index in each partition will be determined as if no index name had been specified in the command. If the ONLY option is specified, no recursion is done, and the index is marked invalid. (ALTER INDEX … ATTACH PARTITION marks the index valid, once all partitions acquire matching indexes.) Note, however, that any partition that is created in the future using CREATE TABLE … PARTITION OF will automatically have a matching index, regardless of whether ONLY is specified.

Now that I have compared the documentation with everything I observed, the situation has become clear.

Each time a new partition is created, the matching index is created as if no name was given, and there is no way to change this behavior. That’s why, although I called my index a_table_useful_index_idx, the per-partition indexes were all created with the default name, a_table_col1_col2_col3_col8_idx, and because that name was already taken, each new partition’s index got the next number appended to it. The last index created that way was a_table_col1_col2_col3_col8_idx14, but when the partition that was 15 days old was detached, the name a_table_col1_col2_col3_col8_idx became available again, so the index on the next partition got that name! Two weeks later, this “same name” caused a collision when we tried to attach the partition to the archive.a_table table!

If you know me, you already know I wrote the process for renaming the index after creating a new partition. But what if we have multiple indexes that need to be appropriately renamed? Yes, I am writing a generic procedure for that, but it would be much better if I didn’t have to!
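A minimal version of such a rename step might look like this (a sketch, not my actual procedure; a_table_2024_12_31 stands for a hypothetical, just-created partition, and the target naming convention embeds the partition name so it can never collide after detach/attach):

```sql
-- Rename every index on a freshly created partition so that the index
-- name is derived from the partition name instead of the default one.
DO $$
DECLARE
    idx record;
    n   int := 0;
BEGIN
    FOR idx IN
        SELECT indexname
        FROM pg_indexes
        WHERE schemaname = 'current'
          AND tablename = 'a_table_2024_12_31'
    LOOP
        n := n + 1;
        EXECUTE format('ALTER INDEX current.%I RENAME TO %I',
                       idx.indexname,
                       'a_table_2024_12_31_idx' || n);
    END LOOP;
END $$;
```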

Here are my questions to the Postgres community:

  • Did anyone experience a similar problem? If yes, did you suffer in silence, or did you write a blog post about your solution?
  • Has anybody who recommends archiving ever tried it?
  • Did they ever use indexes other than the primary key?
  • Can we set the names of the indexes based on partition names?
  • What do I need to do to make it happen :)?

7 Comments

Filed under Data management, SQL

Can we use this index, please? – Why not?

It’s Christmas time and relatively quiet in my day job, so let’s make it story time again! One more tale from the trenches: how wrong can you go with one table and one index?

Several weeks ago, a user asked me why one of their queries had “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or it just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes but the 3 minutes: when you have a table of several hundred million rows and your select yields just over a thousand rows, it’s a classic “short query,” so you should be able to get results in milliseconds.

The original query was over a view with self-join, and at first, I suspected that something was wrong with the view itself, but then I got it down to one SELECT from one table, which was indeed super-slow: taking minutes while it should have taken seconds. The “inconsistency” was due to the high I/O and dependent on what was in the shared buffers at the execution time. The query looked like this:

SELECT * FROM large_table
  WHERE col1='AAA'
  AND col2='BCD'
  AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
  AND extract (hour FROM created_at)=16
  AND extract (minute FROM created_at)=15

There was an index on all of the attributes which were referenced in the query:

CREATE INDEX large_table_index ON large_table (col1, col2, created_at);

The query plan looked perfect: INDEX SCAN using that index; however, the query was incredibly slow because, for each fetched record, the hour and minute had to be verified (and you’ve already guessed that the table was not only large but also wide).

According to the execution plan, the number of rows selected during the index scan was about 30M, and subsequent filtering reduced it to a little over 1K. I started to think that, as ridiculous as it sounds, it could be a good idea to create an additional partial index or to include the “hour” and “minute” parts in the index. (Un)fortunately, neither of these solutions worked because extract and its alternatives are not immutable and can’t be used in indexes. I didn’t know what to do, but at some point, I ran

SELECT count(*) FROM large_table
WHERE col1='AAA'
AND col2='BCD'
AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
AND extract (hour FROM created_at)=16
AND extract (minute FROM created_at)=15

just because I needed this count, and to my astonishment, it ran in milliseconds! Immediately, I ran EXPLAIN ANALYZE and saw that in this case, Postgres chose INDEX ONLY SCAN! Since the whole record was not needed, the filtering was performed in the index blocks themselves!

That was great, and there was no reason the original query could not be optimized the same way, but how could I explain it to the query planner? I remembered my conversation with the user who mentioned that “in most cases, this index works perfectly, and the results for any interval are returned very fast.” Do not ask why I had decided to rewrite a query as presented below, but it did the trick! I guess, in the moments like this, I do “think like Postgres.”

SELECT * FROM large_table
WHERE (col1, col2, created_at) IN (
   SELECT col1, col2, created_at
   FROM large_table
     WHERE col1='AAA'
     AND col2='BCD'
     AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
     AND extract (hour FROM created_at)=16
     AND extract (minute FROM created_at)=15)

I hope you enjoyed reading this Christmas story as much as I enjoyed sharing it!

16 Comments

Filed under SQL

Optimizing access to partitioned tables

I am unsure whether others have this problem, but I find myself explaining the benefits (and non-benefits) of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

The next thing I tell developers is that for the queries not to slow down, they need to explicitly include the values of the attributes used for range partitioning. Quite often, this requirement is not obvious, and moreover, it requires a significant rewrite of the SQL statements in use.

From an application developer’s perspective, they do just that:

SELECT a, b, c
FROM partitioned_table
WHERE p IN
  (SELECT p FROM other_table WHERE r BETWEEN 1 AND 100)

You can’t imagine how many times I’ve heard: But I am selecting from a specific partition! And I had to explain that there is no way for Postgres to know before the execution starts which partition it is going to be!

When developers ask me what they should do, I advise them to calculate the value first and use it in the query. This approach works all the time, but to be honest, that’s the question I have for the community: why can’t the query planner do it? Let me be clear: I know why it can’t do it now, but what exactly fundamentally prevents the smart people who make Postgres better from making this change in the planner’s behavior? I mean, if I can figure it out, why can’t Postgres?
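The workaround I advise can be sketched as a two-step version of the query above (the returned value of 42 is, of course, hypothetical):

```sql
-- Step 1: resolve the partition key value(s) first
SELECT p FROM other_table WHERE r BETWEEN 1 AND 100;
-- suppose this returns p = 42

-- Step 2: use the literal value, so the planner can prune partitions
SELECT a, b, c
FROM partitioned_table
WHERE p = 42;
```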

10 Comments

Filed under SQL

“Sometimes, everything is slow!”

It has been a while since I published any “tales from the field,” but here is a new one. My last post was about cases when “something” was slow “sometimes.” I was able to identify the causes of that slowness using my new log analyzer based on pgBadger raw output. Since then, my teammates have become ultimate believers in the power of log analysis and in my ability to resolve all the mysteries of the world by examining the log. Needless to say, I had to live up to their expectations!

A couple of months ago, one of our customers complained that “today, starting from 11 AM, everything became slow!” That system has a high transaction volume, and multiple applications access one database, so we tried to localize the slowness first. The monitoring dashboards pointed to one of the tables, but this table didn’t show any bloat or any seq scans, and the table didn’t grow dramatically recently. The most puzzling thing was that, after several hours, things would resolve themselves, and execution times got back to normal.

It took some time to capture full logs for a day when things were fine in the beginning but then “everything got slow.” Finally, I had a good sample, but I still didn’t know how to approach the problem, because the full logs showed exactly what the customer had reported: at some point, “everything got slower.” I would blame waiting for locks to be released, but the log showed zero BEGIN and zero COMMIT/ROLLBACK statements.

I started with a very simple analysis: I ran a report for each hour of that day, and for each hour, I calculated the number of SQL statements, the average execution time, and the max execution time. I found the first hour when things started to slow down (and for the record, it was not the number of SQL statements that went up, which could create an impression of “slowness” from the end-user perspective, but the average execution time), and then I selected the same numbers per minute for that hour.

Now I had the almost exact time when the slowness began, and I simply selected all the SQL statements which were executed during that minute. I saw a SELECT statement which started to appear more often during this minute and subsequent minutes. When I browsed the table, I didn’t see the whole statement, just whatever number of characters fit on the screen, but when I copied the whole SELECT statement to run the execution plan, I saw that it was not just a SELECT, but a SELECT ... FOR UPDATE! Although there was no BEGIN, there was a transaction in progress! Actually, a lot of them!

Next, I started to wonder why the whole system didn’t collapse, since I didn’t see any COMMIT/ROLLBACK, so I selected all the statements for some of these particular sessions and realized that they ran for only a couple of minutes and then disconnected, so the lock was released after that. The next question was why there were waits only during certain periods of time, not all the time.

Although this SELECT ... FOR UPDATE ran as a prepared statement, I could see which values were passed for each execution (in the parameters column). Next, I selected counts for each parameter value for this statement per minute (and later per second), and the number of distinct pids which ran this SELECT ... FOR UPDATE. As you have already guessed, at the times when “everything started to be slow,” there was always more than one process trying to SELECT ... FOR UPDATE the same row of the table! Oh, and also: there was never any UPDATE following this SELECT!
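The per-minute aggregation might look like this (a sketch of the idea, not my actual query; raw_log and its columns log_time, pid, query, and parameters are hypothetical names for the loaded raw pgBadger output):

```sql
-- How many distinct backends ran the suspect statement
-- with the same parameter value within the same minute?
SELECT date_trunc('minute', log_time) AS minute,
       parameters,
       count(*)            AS executions,
       count(DISTINCT pid) AS backends
FROM raw_log
WHERE query LIKE 'SELECT%FOR UPDATE%'
GROUP BY 1, 2
HAVING count(DISTINCT pid) > 1
ORDER BY 1;
```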

The problem was solved, and I happily reported my findings to the developers. They were puzzled because:

  • They didn’t know their ORM generated SELECT ... FOR UPDATE
  • They were sure that they “ran everything in transactions” and that they committed each of them!

But that’s a different story – y’all know how I feel about ORMs!

Leave a comment

Filed under SQL, Systems

“Sometimes, It’s Slow!”

These are my least favorite mysteries to resolve. I hate mysteries when it comes to performance problems. A performance mystery means I do not know how to make it work faster! However, I love solving mysteries—except when presented with a “sometimes it is slow” case.

Cases like “it was fast just yesterday, and now it is slow” are easier because you have a pretty good idea of what you might want to check: stale statistics, increased data volumes, “does not fit into memory anymore” – all the usual suspects. But what do you do with “sometimes”?! What are these “times”? The only thing you can suspect is prepared statements, and if that’s not the case, you are out of ideas.

***

Recently, I was the unfortunate recipient of one of these “sometimes it is slow” service tickets. The only advantage I had this time was the “new pgBadger” in place, so I had a full view of what was happening inside the database during the slowness. Still, I could not explain what I saw. Yes, the process was definitely suboptimal, but I still could not explain what could make some queries wait for an extended time.

Finally, after examining multiple snapshots during the slow time, I figured it out. There were two contributing factors which had nothing to do with Postgres.

  1. All textbooks, not only the ones we used forty years ago but even the ones students use nowadays, suggest dropping all indexes and constraints before you start a bulk load and recreating them afterward.
  2. Most ORMs offer the courtesy of adding BEGIN at the beginning of an interaction with a database and COMMIT/ROLLBACK at the end, even in the case of read-only transactions, and an app developer who uses Python has no idea about it!

In addition, there are some Postgres specifics that contributed, so let’s look at what exactly happened.

  • Why is #1 a horrible idea? It all depends on the table’s total size relative to the number of records we are about to insert. If we have a table that already has 10M rows and we are adding 50K, dropping the indexes and rebuilding them after the load is done will take significantly more time than inserting without dropping the indexes.
  • Why is dropping FK constraints an even worse idea? FK constraints in Postgres are implemented using internal triggers on both the parent and child tables, so dropping an FK constraint is a DDL statement that requires an exclusive lock on both tables.
  • When there is an open transaction reading from the parent table, an exclusive lock on that table can’t be obtained until this transaction is committed/rolled back.
  • In our case, all the tables being loaded had one foreign key referencing the same parent table. This table listed all the data files to be loaded, and before loading each file into the database, the data ingestion process checked whether the file had been processed without any errors. Each SELECT was a part of a transaction, and so were each DROP CONSTRAINT and each ADD CONSTRAINT.

To summarize, it was just a matter of chance whether these transactions would line up while waiting for the previous one to finish, and it was no wonder it was slow “sometimes.”
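The lock conflict described above is easy to reproduce in two psql sessions (a sketch with hypothetical table and constraint names, under the assumption that child_table has a foreign key to parent_table):

```sql
-- Session 1: an ORM-style transaction that stays open while reading
BEGIN;
SELECT * FROM parent_table WHERE file_name = 'batch_42.csv';
-- ... no COMMIT yet: this transaction still holds its lock on parent_table

-- Session 2: meanwhile, the bulk loader tries to drop the FK constraint
ALTER TABLE child_table DROP CONSTRAINT child_table_parent_fk;
-- blocks: the DDL needs an exclusive lock on both tables,
-- and every statement queued behind it waits as well
```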

Curious to learn what helped me to figure out the sequence of events? Come to one of the talks I will present next week: PG Day UK or PG Day Lowland. And if you are in Chicago, please attend our September PUG!

1 Comment

Filed under Data management, SQL

(Bi)Temporal Tables, PostgreSQL and SQL Standard

At PG Day Chicago, I presented an extended version of my talk given last year at Citus Con – Temporal Tables and Standard. Between the time my talk was accepted and the time I delivered the presentation, I learned that PG 17 would include the first-ever support for an important temporal feature: uni-temporal primary keys and unique constraints.

It has been a while since the last time I presented anything temporal-related, which meant that many people in the audience hadn’t heard anything about the bitemporal model before. There was no way I could cover everything in 40 minutes, and many questions, which were asked both during the Q&A and later in the hallways, remained unanswered.

In this blog, I will address some of these questions and expand on what I would like to see in the upcoming Postgres implementation of temporal tables.

  1. Bitemporal framework and GIST. The key feature of the bitemporal model is its dependency on existing PG extensions, specifically on GIST indexes and GIST with EXCLUSION constraints. In fact, the GIST extension does all the work needed to support (bi)temporal primary/unique keys: it ensures that there is no time overlap between any two distinct values of the primary key. In the bitemporal model, we check the same thing for the two time dimensions. For those who have never needed GIST indexes, here is the relevant documentation. I learned about GIST when I first started implementing bitemporality, and I could not believe all my needs were already met!
  2. Disk space requirements. For some reason, people believe that keeping all versions of each tuple requires “too much disk space.” I won’t deny that you need more space to store row versions than when you don’t; however, how much more is often overestimated. In my talk at PG Conf.EU 2022, I presented an example of storing changelog vs. storing data in a bitemporal model and demonstrated that it actually takes less space while allowing queries to be executed much faster.
  3. Excessive IO. One of the questions I was asked was whether the use of the bitemporal model increases the system IO. The increase is surprisingly insignificant. Let’s look at the database operations. INSERT is the same insert, whether it is temporal or not. A non-temporal update is equal to one insert and one delete. A uni-temporal UPDATE results in one INSERT and one UPDATE; in other words, two inserts and one delete. A bitemporal UPDATE is equal to two inserts and one update; in other words, three inserts and one delete. That means the number of costly operations remains roughly the same as with regular updates. Also, note one remarkable fact: the only field that changes in the updated record is the time interval. That means that 1) the record size is not going to change, and 2) since a GIST index is an R-Tree, the order of intervals is defined by inclusion. When we update the time range, the only thing we do is make it smaller (end-dating), so the node in the index will never move, which means that GIST indexes in this case will (almost) never experience bloat. As for regular B-tree indexes, all of the updates in the temporal models are HOT updates.
  4. Why is the first temporal feature in PG 17 so significant? Having temporal primary/unique temporal keys in PG 17 might seem insignificant – after all, that’s what GIST with exclusion does anyway. However, one of my huge asks for many previous years was the ability to see temporal keys in a table description. I’ve invented a lot of tricks (mainly having an “empty” check constraint) so that I would be able to identify temporal tables using the \d psql command. Now, we can do it.
  5. System time or application time? Now, I am going to switch to my questions and concerns about “what’s next” in the area of temporal table support. When I first heard about the temporal key in PG 17 at SCaLE, I immediately asked the presenter when the second dimension would be added, to which he replied: very soon, we are actively working on it, and we are going to implement everything in the standard. That means, among other things, that Postgres should distinguish between SYSTEM_TIME (as per the standard) and application time, and I do not see this distinction in the ongoing discussions.

Why is this important? 

The SQL standard requires adding semantics to DML, adding a “FOR PERIOD” clause to all commands (and assuming CURRENT if it is omitted, to keep old code functioning). However, for SYSTEM_TIME, “FOR” is irrelevant, because system time is defined as “transaction time”: it can only start at the “present” moment, and for any tuple in a temporal table, it is either in the past or now. Application time, in contrast, is not bound to a transaction, and “FOR” can specify any past, present, or future time period. In both cases, an “update” is not a regular update but a sequence of inserts and updates, as I described in 3). And when we define temporal referential integrity, we need to take these semantics into account, which I have yet to see. From my perspective, this test is not correct:

INSERT INTO temporal_rng (id, valid_at) VALUES
   ('[1,2)', daterange('2018-01-02', '2018-02-03')),
   ('[1,2)', daterange('2018-03-03', '2018-04-04')),
   ('[2,3)', daterange('2018-01-01', '2018-01-05')),
   ('[3,4)', daterange('2018-01-01', NULL));
 ALTER TABLE temporal_fk_rng2rng
     DROP CONSTRAINT temporal_fk_rng2rng_fk;
 INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', daterange('2018-01-02', '2018-02-01'), '[1,2)');
 ALTER TABLE temporal_fk_rng2rng
     ADD CONSTRAINT temporal_fk_rng2rng_fk
     FOREIGN KEY (parent_id, PERIOD valid_at)
     REFERENCES temporal_rng;
 ALTER TABLE temporal_fk_rng2rng
     DROP CONSTRAINT temporal_fk_rng2rng_fk;
 INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[2,3)', daterange('2018-01-02', '2018-04-01'), '[1,2)');
 -- should fail:
 ALTER TABLE temporal_fk_rng2rng
     ADD CONSTRAINT temporal_fk_rng2rng_fk
     FOREIGN KEY (parent_id, PERIOD valid_at)
     REFERENCES temporal_rng;
 ERROR:  insert or update on table "temporal_fk_rng2rng" violates foreign key constraint "temporal_fk_rng2rng_fk"
 DETAIL:  Key (parent_id, valid_at)=([1,2), [2018-01-02,2018-04-01)) is not present in table "temporal_rng".

(It is very probable that the link that was sent to me does not reflect the current status, so I am holding off my judgment until I double-check; however, that’s a good illustration of the importance of operations semantics)
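To close, and going back to item 1 above: here is a minimal sketch of how GIST with an exclusion constraint provides a uni-temporal unique key today (a hypothetical table; the btree_gist extension is needed so that scalar equality can participate in a GIST index):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- At most one price version per product for any given date:
-- the constraint rejects rows whose validity periods overlap
-- for the same product_id.
CREATE TABLE product_price (
    product_id int       NOT NULL,
    price      numeric   NOT NULL,
    valid_at   daterange NOT NULL,
    EXCLUDE USING gist (product_id WITH =, valid_at WITH &&)
);
```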

7 Comments

Filed under SQL

PostgreSQL 17 features I am most waiting for

I won’t be able to attend Magnus’ talk at PG Day Chicago because it is scheduled at the same time as my talk, but fortunately, I attended his talk at SCaLE. There are several features I am really excited about, and I can’t wait to start using them!

  1. Event trigger on login. Yes, yes, I know, Magnus told us it’s a footgun, but I promise, I know how to use it safely! I have so many potential usages!!!
  2. Error handling in COPY. Since I started my new development with pgBadger, my biggest fear is that if just one record is somehow wrong, the whole file won’t load, and I can’t do anything with that – except for that now I can!!!
  3. PgChangePassword!!! I have SECURITY DEFINER functions to handle that because what could be more natural than giving users an option to change their own password?! Well… except that, way too often, there is one user for the whole department… but we will address that later!
  4. Redundant NOT NULL removal – that’s great!!! Such a time-saver! And again, what can be more natural?!
  5. Self-join removal!!! I can’t even tell how many times I fought with this problem starting from my Sybase days and all the way through my Oracle days! How many times I had to explain this “magic” to people! And now, finally, I will be able to remove this explanation from our performance book!
  6. You might think that my favorite feature is the temporal primary key – well, it is; however, I have already started to worry about what’s next. The temporal PK is implemented just as it should be, but what I see in the future code for temporal foreign keys makes me worried. And nobody answers my concerns, so I do not know how loud I should scream!

Summary: if you haven’t heard this talk yet, please go to it at PG Day Chicago – I will be happy to lose this competition :))

4 Comments

Filed under news, SQL, talks

Logging: What, Why and When

There are multiple PostgreSQL configuration parameters that determine what exactly to log. In Postgres 16, there are 30 of them if you exclude the ones related to file naming and location. We rarely change them when we create a new instance. In the best-case scenario, your organization has a default set of configuration parameters that are applied when a new instance is created.

Why are we usually ignorant of these parameters? My hypothesis is that it’s because we rarely use Postgres logs for anything. There are many extensions and third-party tools to monitor different Postgres activities, to alert on high CPU or IO, too many connections, disk usage increases, table bloat, and so on. None of them use information from Postgres logs. Even when we want to monitor query performance changes, we use pg_stat_statements, not logs.

If anybody who is reading this blog uses Postgres logs “as is” on a regular basis, please let me know! PgBadger fans – please hold off; we will get there!

During all my Postgres years, every case when I or anybody I know used Postgres logs was for an investigation: trying to find something specific that happened at some particular moment or period of time, never for regular monitoring. That’s why, in the majority of cases, we do not care. But if we truly “do not care,” why do any logging at all? One of the default log-related settings is log_min_duration_statement=10, which means that Postgres will record all statements which take 10 or more milliseconds.

On the surface, it makes sense. We want to record only those SQL statements that exceed the response time threshold so that we can investigate what exactly causes the problem when an application experiences slowness. However, this logging does not help us determine whether this is a sudden spike, or the execution time was always just below 10 ms and finally reached the magic number, or even what portion of similar queries is slow.

Anybody who has ever been on support for PostgreSQL databases has had a moment when they were paged because Postgres was running out of space for system logs. The default action is to increase the space, but I believe the first thing to do should be to check what is being logged. Quite often, we would see that the reason for the log growth is that the number of queries with execution time over 10 ms significantly increased. At this point, I would ask whether anybody cared about logging, or more specifically, whether anybody would ever look at these logs.

When we use Postgres logs for investigation and want to know what happened during a particular session, we need to see all issued statements, not just the ones that “exceed 10 ms.” With many other logging parameters, the situation is similar. You might want to set log_connections on to record which users connected to the database at what time, but unless we record what was done during the session, there is not that much value in this information.

Of course, there are cases when you need to investigate some specific problems like autovacuum not doing its job or replication issues, and then you will turn on these specific parameters, but in the general case of “let’s figure out what’s going on inside my database,” only full logging will be helpful. However, when we turn all logging on, the size of the log might skyrocket, and the problem will not be how much space you need but how you are going to process these massive volumes and extract valuable information.

At this point, we all should collectively thank Gilles Darold for pgBadger. For many years, pgBadger has held the honor of being the most often used tool for investigating performance problems and the “weapon of choice” for many. I am not an exception: I love it, and I use it often. It proved to be instrumental not only for investigations but also for demonstrating to application developers what they are doing wrong. However, to get the results you need, you should enable full logging. The pgBadger README lists all the logging that should be turned on to start collecting the information:

log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h'
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_error_verbosity = default

My current logging philosophy is:

Log almost nothing by default; turn on logging almost everything when you need to collect data for pgBadger.

And if you can set up prompt logs removal (and log size in general is manageable), just have full logging on all the time.

Best thing for last

A couple months ago, some of my peers noticed that my name appeared in pgBadger commits. No, I didn’t contribute anything, and yes, I still can’t program, but I had a dream!

With all of the great features of pgBadger, there were two that I really wanted to have, and they were missing: the ability to analyze logs my own way: group, aggregate, slice, and dice however I want, and the ability to trace individual user sessions. Being a database person, I thought that the only way to accomplish this task was to load logs to a database, and for that, I needed “raw” logs, the ones that were already parsed but not analyzed.

Long story short, the option --dump-raw-csv does exactly that. To take full advantage of this new option, I process each log twice. First, I generate “a regular” pgBadger report, and then I create a raw log in CSV format, which I can load into a database table. In reality, it’s not as easy and straightforward as it might sound, but I am working on automating this process, and as soon as it is done, you will hear more!

Leave a comment

Filed under Data management, SQL

PGSQL Phriday #011. Partitioning: we need it!

I almost missed this month’s PGSQL Phriday, but I hope I still have time to make a small contribution. 

Partitioning in PostgreSQL is a topic that recently became painfully familiar to me. Like other people who already contributed to PGSQL Phriday #011, I will talk only about my industrial experience rather than “how things should be.”  

Like many others, I never did sharding “for real,” that is, in a production environment. From the pure PostgreSQL perspective, “sharding” is partitioning where different partitions may be located on different physical nodes. I am very interested to hear about other people’s experience, because the fact that “you can use FDW for sharding” has been cited for many years, yet I never talked to anybody who had done it. With everything I know about PostgreSQL partitioning, I can’t imagine any justification for treating distributed partitions “as one table,” so naturally, I am looking forward to learning about real-life use cases.

Now, back to “classic” declarative PostgreSQL partitions. Like anybody who switched from Oracle to PostgreSQL, I was very disappointed by the lack of adequate partitioning support in Postgres. Twelve years later, despite all improvements, the situation is still not ideal. Whenever I suggest somebody partition their large tables, I must warn them that partitioning will not make queries run faster and that I will do my best to ensure the queries will not slow down (and that’s a huge commitment!).

On OLTP systems, if proper indexes are in place, PostgreSQL does a fantastic job delivering results fast, even on gigantic tables, so the users wonder what is wrong and why they should partition.

Most times, there are two reasons for partitioning: maintenance and archiving. When a table reaches a certain size, it becomes increasingly difficult for autovacuum to complete on time, no matter how well it is tuned. The same is true for ANALYZE and for many other operations. Just think about what it would take to pg_dump a 1 TB table and to restore it (and if you wonder why you would ever need to do anything like this, recall that most data loss disasters are caused by human error).

Archiving – moving old data to archive storage and removing it from the “active” table – is a common practice, especially in OLTP systems. There are enough articles and blogs that explain why dropping a partition is better than a mass delete (and why it is better to detach an old partition and attach a new one than to update 1M rows).
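The rotation itself is just a few DDL statements. This is a sketch with hypothetical table names and a daily range key; a real archiving job also has to deal with index names and locking:

```sql
-- Detach the oldest partition from the "current" table...
ALTER TABLE current.a_table
    DETACH PARTITION current.a_table_2024_12_17;

-- ...move it to the archive schema and attach it there
ALTER TABLE current.a_table_2024_12_17 SET SCHEMA archive;

ALTER TABLE archive.a_table
    ATTACH PARTITION archive.a_table_2024_12_17
    FOR VALUES FROM ('2024-12-17') TO ('2024-12-18');
```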

For these two reasons, there are times when I tell my customers (external or internal): we must partition this table. However, partitioning does not come without cost, and most times, you need to make sure your queries won’t slow down.

The most important thing is that when you query a partitioned table, you must explicitly specify the value that identifies the proper partition(s) to query. For example, if the table report is partitioned by report_date range, each query should contain an extra condition on report_date, for example:

SELECT rep_id, paid_date, amount
FROM report
WHERE employee_id=123456
AND report_date between '2023-06-01' and '2023-07-01'
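One way to verify that the planner actually prunes partitions is to look at the plan (a sketch; the partition layout is hypothetical):

```sql
EXPLAIN (COSTS OFF)
SELECT rep_id, paid_date, amount
FROM report
WHERE employee_id = 123456
AND report_date BETWEEN '2023-06-01' AND '2023-07-01';
-- With plan-time pruning, only the partitions covering June 2023 appear
-- in the plan; drop the report_date condition and every partition of
-- report shows up as a scan node.
```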

If you don’t pass the date range, PostgreSQL will search all partitions, even if employee 123456 has no reports outside this date range. Moreover, if you write a statement like this:

SELECT rep_id, paid_date, amount
FROM report
WHERE employee_id=123456
AND report_date between (
SELECT start_date FROM conference 
WHERE conf_name='Awesome PG Conference') and current_date - 7

the date boundary comes from a subquery, so the planner cannot prune partitions at planning time; at best, partitions are pruned at execution time, and at worst, all of them are scanned. There might be cases when partitioning would improve half of your queries and slow down the rest, and you might need to evaluate which way of partitioning will cause the least harm.

Lastly, how many partitions are practical? You can find all kinds of recommendations about the optimal number of partitions. From my practical experience, a partitioned table can stay happy with several thousand partitions. The only problem with “too many” partitions is DDL: to make a change to the partitioned table structure (add or remove a column), all partitions must be locked, which, in the case of too many partitions, will raise an “out of shared memory” error due to reaching the limit of max_locks_per_transaction.
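The lock math is easy to check: DDL on the partitioned table takes an ACCESS EXCLUSIVE lock on every partition, and the shared lock table holds roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) entries. As a sketch (the value 256 is an example; the default is 64), the limit can be raised, though it requires a server restart:

```sql
-- Requires a restart to take effect
ALTER SYSTEM SET max_locks_per_transaction = 256;
```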

Summary: most likely, at some point, you will need to partition some of your production tables. PostgreSQL declarative partitioning support is improving with each new version. However, coming up with a good partitioning scheme that won’t decrease your system performance is still more art than craftsmanship.

I look forward to hearing about other people’s experiences and learning some tricks I might not know!

Leave a comment

Filed under Data management, SQL

What do you consider Postgres best practices?

Did anybody ever ask you, “What are Postgres best practices regarding this?” Even if you are not a consultant (because if you are, it’s your job to answer questions like this) but a humble human working with Postgres, chances are you’ve been asked this question more than once.

This question may cover various topics, from high availability and backup tools to the optimal number of indexes on a table or correct schema usage.

When I am asked such questions, I rarely know what to answer. Who defines what “Postgres best practices” are? Is there a Postgres Wiki page that lists “best practices”? (Actually, there are a couple of such pages in the Postgres Wiki, but only for a limited number of topics.)

There are many aspects of PostgreSQL where I have a strong opinion on how things should be done. Most times, these opinions come from both knowledge of internals and practice, but often from practice alone. If I have no practical experience with certain things, I ask people who have done them before. Also, there were cases when people told me what I was doing was wrong because it was “against Postgres best practices.” To that, I usually reply that if I am doing it, it’s a Postgres best practice :). At the same time, when I read somebody’s recommendations with which I categorically disagree, I rarely pick a fight, since I rarely can offer any argument except for “I had a different experience.”

What do “Postgres best practices” mean to you? Who or what would you consider the authority to dispense “best practices”? What are some best practices that you follow? Did you ever have to go against best practices? Do you ever judge other people’s technical solutions based on these criteria?

Please share your thoughts!

10 Comments

Filed under Data management, SQL