And one more tale from the frontline!
When I encounter cases like this, I start wondering whether I am the first person who ever tried to do “this” for real, whatever “this” is.
Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.
So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. A customer happily agrees, and life is great until one day, an archiving job reports an error of a “name already exists” for an index name.
current.a_table is a partitioned table
current.a_table_2024_12_17
current.a_table_2024_12_18
...
current.a_table_2024_12_31
are partitions.
a_table_useful_index_idx is a global index on (col1, col2, cl3, col8)
archive.a_table is an archived partition table
archive.a_table_2024_12_01
archive.a_table_2024_12_02
...
archive.a_table_2024_12_15
archive.a_table_2024_12_16
are partitions
When I looked at the conflicting name, I saw the name of the index:
a_table_col1_col2_col3_col8_idx
Somehow, this name was coming from the December 17 2024 partition that we were trying to archive, and I wondered why this name, when I clearly named the global index differently, all of a sudden, this name reappeared.
I decided that I had overlooked it in the very beginning and renamed all indexes in the archive.a_table partitions so that there would be no possible collisions.
However, in two weeks, the problem reappeared. Once again, I thought that it was all because I neglected to create a global index on the archive.a_table and rebuild-renamed everything, and surely… You got it!
After that happened for the first time, I finally started to read documentation, and I learned that:
When CREATE INDEX is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes. Each partition is first checked to determine whether an equivalent index already exists, and if so, that index will become attached as a partition of an index to the index being created, which will become its parent index. If no matching index exists, a new index will be created and automatically attached; the name of the new index in each partition will be determined as if no index name had been specified in the command. If the ONLY option is specified, no recursion is done, and the index is marked invalid. (ALTER INDEX … ATTACH PARTITION marks the index valid, once all partitions acquire matching indexes.) Note, however, that any partition that is created in the future using CREATE TABLE … PARTITION OF will automatically have a matching index, regardless of whether ONLY is specified.
Now that I have compared the documentation with everything I observed, the situation has become clear.
Each time a new partition is created, the matching index is created as if the name was not given, and there is no way to change this behavior. That’s why, although I called my index a_table_useful_index_idx, all partitions of this index were created with the default name, which was a_table_col1_col2_col3_col8_idx, and with each new partition, since the name was already taken, the next number was added to the name. That way, the last created index was a_table_col1_col2_col3_col8_idx14, but when the partition, which was 15 days old, was detached, the name a_table_col1_col2_col3_col8_idx became available again, so the index on the next partitions got this name! And two weeks later, “the same name” created a collision when we tried to attach this partition to the archive.a_table table!
If you know me, you already know I wrote the process for renaming the index after creating a new partition. But what if we have multiple indexes that need to be appropriately renamed? Yes, I am writing a generic procedure for that, but it would be much better if I didn’t have to!
Here are my questions to the Postgres community:
- Did anyone experience a similar problem? If yes, did you suffer in silence, or did you write a blog post about your solution?
- Has anybody who recommends archiving ever tried it?
- Did they ever use indexes other than the primary key?
- Can we set the names of the indexes based on partition names?
- What do I need to do to make it happen :)?