Skip to content

Conversation

@brandur
Copy link
Owner

@brandur brandur commented Jan 26, 2019

Implements SortSupport for the inet/cidr types in Postgres by devising
an abbreviated key representation for them that will faithfully
reproduce their existing sorting rules. This has the effect of typically
reducing the time taken for inet/cidr sorts by ~50-60%, and should show
a good improvement for the vast majority of real-world data sets.

@brandur brandur force-pushed the brandur-inet-sortsupport branch 7 times, most recently from a0f5a6f to 9b13155 Compare January 27, 2019 04:21
Copy link

@petergeoghegan petergeoghegan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good for a first draft. I'm generally pretty impressed. Have some items to work though, though.

@brandur brandur force-pushed the brandur-inet-sortsupport branch 4 times, most recently from 7fbb8e2 to 4192c91 Compare January 29, 2019 17:23
@brandur brandur changed the base branch from master to REL9_6_STABLE January 29, 2019 19:40
@brandur brandur changed the base branch from REL9_6_STABLE to master January 29, 2019 19:41
@brandur brandur force-pushed the brandur-inet-sortsupport branch from f7780e5 to 3b7022b Compare January 31, 2019 02:35
* if we have more than 25 subnet bits of information, shift it down
* to the available size
*/
if (datum_subnet_size > ABBREV_BITS_INET4_SUBNET)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move this up to where subnet is initially formed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not always shift when you do that, so the general rule is that subnet's most significant bit is always the actual most significant bit? Yeah, you only benefit from the shift when doing so makes space for less significant bits at the end, but I think it's clearer that way.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move this up to where subnet is initially formed?

Can you elaborate on why you thing that would be an advantage? It seems to me that doing it this way (1) saves an operation potentially (as we only shift when we're actually going to be using the subnet in the case of IPv4 on 64-bit), and (2) makes the code quite a bit more clear by moving the operation down to where the result is actually going to be used.

Why not always shift when you do that, so the general rule is that subnet's most significant bit is always the actual most significant bit? Yeah, you only benefit from the shift when doing so makes space for less significant bits at the end, but I think it's clearer that way.

Sorry, can you rephrase this? I don't understand what you mean.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just start off with a subnet integer whose most significant bit is the actual most significant bit in all cases? That's simpler and easier to debug. It immediately becomes the component that you saw from the diagram.

Also, shifting is slightly cheaper than the if statement, because you eliminate a branch. Shifting isn't expensive at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd still need to shift here, to get the most significant bit from subnet into the right offset into netmask_size_and_subnet, but I think that that would still be better overall

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're suggesting something like this?

subnet <<= ip_bits(authoritative);

// ... later

subnet >>= ((SIZE_OF_DATUM - BITS_PER_BYTE) - ABBREV_BITS_INET4_SUBNET);

Hm, I'll grant you that it's definitely an alternative, but personally, I don't think it makes the code easier to understand, and saving one if statement probably isn't worth it. I'm going to leave it for now.

Thanks for the additional review! Good stuff. I feel like this code is starting to look pretty decent.

@brandur brandur force-pushed the brandur-inet-sortsupport branch from 5f1f4cf to 6755550 Compare February 8, 2019 05:40
bmomjian and others added 12 commits June 13, 2019 22:07
It was previously incorrectly placed in the server application section.

Reported-by: Tatsuo Ishii

Discussion: https://postgr.es/m/[email protected]
Commit aa09cd2 modified estimate_path_cost_size() so that it reuses
cached costs of a basic foreign path for a given foreign-base/join
relation when costing pre-sorted foreign paths for that relation, but it
incorrectly re-computed retrieved_rows, an estimated number of rows
fetched from the remote side, which is needed for costing both the basic
and pre-sorted foreign paths.  To fix, handle retrieved_rows the same way
as the cached costs: store in that relation's fpinfo the retrieved_rows
estimate computed for costing the basic foreign path, and reuse it when
costing the pre-sorted foreign paths.  Also, reuse the rows/width
estimates stored in that relation's fpinfo when costing the pre-sorted
foreign paths, to make the code consistent.

In commit ffab494, to extend the costing mentioned above to the
foreign-grouping case, I made a change to add_foreign_grouping_paths() to
store in a given foreign-grouped relation's RelOptInfo the rows estimate
for that relation for reuse, but this patch makes that change unnecessary
since we already store the row estimate in that relation's fpinfo, which
this patch reuses when costing a foreign path for that relation with the
sortClause ordering; remove that change.

In passing, fix thinko in commit 7012b13: in estimate_path_cost_size(),
the width estimate for a given foreign-grouped relation to be stored in
that relation's fpinfo was reset incorrectly when costing a basic foreign
path for that relation with local stats.

Apply the patch to HEAD only to avoid destabilizing existing plan choices.

Author: Etsuro Fujita
Discussion: https://postgr.es/m/CAPmGK17jaJLPDEkgnP2VmkOg=5wT8YQ1CqssU8JRpZ_NSE+dqQ@mail.gmail.com
Fixes some problems introduced by 6e5f8d4:

* When reusing conninfo data from the previous connection in \connect,
  the host address should only be reused if it was specified as
  hostaddr; if it wasn't, then 'host' is resolved afresh.  We were
  reusing the same IP address, which ignores a possible DNS change
  as well as any other addresses that the name resolves to than the
  one that was used in the original connection.

* PQhost, PQhostaddr: Don't present user-specified hostaddr when we have
  an inet_net_ntop-produced equivalent address.  The latter has been
  put in canonical format, which is cleaner (so it produces "127.0.0.1"
  when given "host=2130706433", for example).

* Document the hostaddr-reusing aspect of \connect.

* Fix some code comments

Author: Fabien Coelho
Reported-by: Noah Misch
Discussion: https://postgr.es/m/[email protected]
Commit 7e413a0 added that option to pg_dump, but neglected to teach
pg_dumpall how to pass it along.  Repair.

Author: Fabien Coelho
Reported-by: Peter Eisentraut
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/[email protected]
tzdb 2019a made "UCT" a link to the "UTC" zone rather than a separate
zone with its own abbreviation. Unfortunately, our code for choosing a
timezone in initdb has an arbitrary preference for names earlier in
the alphabet, and so it would choose the spelling "UCT" over "UTC"
when the system is running on a UTC zone.

Commit 23bd3ce was backpatched in order to address this issue, but
that code helps only when /etc/localtime exists as a symlink, and does
nothing to help on systems where /etc/localtime is a copy of a zone
file (as is the standard setup on FreeBSD and probably some other
platforms too) or when /etc/localtime is simply absent (giving UTC as
the default).

Accordingly, add a preference for the spelling "UTC", such that if
multiple zone names have equally good content matches, we prefer that
name before applying the existing arbitrary rules. Also add a slightly
lower preference for "Etc/UTC"; lower because that preserves the
previous behaviour of choosing the shorter name, but letting us still
choose "Etc/UTC" over "Etc/UCT" when both exist but "UTC" does
not (not common, but I've seen it happen).

Backpatch all the way, because the tzdb change that sparked this issue
is in those branches too.
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
Regular per-column statistics are stored in pg_statistics catalog, which
is however rather difficult to read, so we also have pg_stats view with
a human-reablable version of the data.

For extended statistic the catalog was fairly easy to read, so we did
not have such human-readable view so far.  Commit 9b6babfa2d however did
split the catalog into two, which makes querying harder.  Furthermore,
we want to show the multi-column MCV list in a way similar to per-column
stats (and not as a bytea value).

This commit introduces pg_stats_ext view, joining the two catalogs and
massaging the data to produce human-readable output similar to pg_stats.
It also considers RLS and access privileges - the data is shown only when
the user has access to all columns the extended statistic is defined on.

Bumped CATVERSION due to adding new system view.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
The GRANT in system_views allowed SELECT privileges on various columns in
the pg_statistic_ext catalog, but tableoid was not included in the list.
That made pg_dump fail because it's accessing this column when building
the list of extended statistics to dump.

Discussion: https://postgr.es/m/8833.1560647898%40sss.pgh.pa.us
petere and others added 27 commits July 25, 2019 13:58
As of Fedora 30, it seems that the system-provided macros for setting
up user privileges in SELinux policies don't grant the ability to read
/etc/passwd, as they formerly did.  This restriction breaks psql
(which tries to use getpwuid() to obtain the user name it's running
under) and thereby the contrib/sepgsql regression test.  Add explicit
specifications that we need the right to read /etc/passwd.

Mike Palmiotto, per a report from me.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/[email protected]
Add a line to the project file setting the target SDK. Otherwise, in for
example VS2017, if the default but optional 8.1 SDK is not installed the
build will fail.

Patch from Peifeng Qiu, slightly edited by me.

Discussion: https://postgr.es/m/CABmtVJhw1boP_bd4=b3Qv5YnqEdL696NtHFi2ruiyQ6mFHkeQQ@mail.gmail.com

Backpatch to all live branches.
libpq failed to ignore Windows-style newlines in connection service files.
This normally wasn't a problem on Windows itself, because fgets() would
convert \r\n to just \n.  But if libpq were running inside a program that
changes the default fopen mode to binary, it would see the \r's and think
they were data.  In any case, it's project policy to ignore \r in text
files unconditionally, because people sometimes try to use files with
DOS-style newlines on Unix machines, where the C library won't hide that
from us.

Hence, adjust parseServiceFile() to ignore \r as well as \n at the end of
the line.  In HEAD, go a little further and make it ignore all trailing
whitespace, to match what it's always done with leading whitespace.

In HEAD, also run around and fix up everyplace where we have
newline-chomping code to make all those places look consistent and
uniformly drop \r.  It is not clear whether any of those changes are
fixing live bugs.  Most of the non-cosmetic changes are in places that
are reading popen output, and the jury is still out as to whether popen
on Windows can return \r\n.  (The Windows-specific code in pipe_read_line
seems to think so, but our lack of support for this elsewhere suggests
maybe it's not a problem in practice.)  Hence, I desisted from applying
those changes to back branches, except in run_ssl_passphrase_command()
which is new enough and little-tested enough that we'd probably not have
heard about any problems there.

Tom Lane and Michael Paquier, per bug #15827 from Jorge Gustavo Rocha.
Back-patch the parseServiceFile() change to all supported branches,
and the run_ssl_passphrase_command() change to v11 where that was added.

Discussion: https://postgr.es/m/[email protected]
Since 15d8f83 we assert that - and since 7ef04e4, 4da597e
rely on - the slot type for an expression's
ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged
as such. That allows to either skip deforming (for a virtual tuple
slot) or optimize the code for JIT accelerated deforming
appropriately (for other known slot types).

This assumption was sometimes violated for grouping sets, when
nodeAgg.c internally uses tuplesorts, and the child node doesn't
return a TTSOpsMinimalTuple type slot. Detect that case, and flag that
the outer slot might not be "fixed".

It's probably worthwhile to optimize this further in the future, and
more granularly determine whether the slot is fixed. As we already
instantiate per-phase transition and equal expressions, we could
cheaply set the slot type appropriately for each phase.  But that's a
separate change from this bugfix.

This commit does include a very minor optimization by avoiding to
create a slot for handling tuplesorts, if no such sorts are
performed. Previously we created that slot unnecessarily in the common
case of computing all grouping sets via hashing. The code looked too
confusing without that, as the conditions for needing a sort slot and
flagging that the slot type isn't fixed, are the same.

Reported-By: Ashutosh Sharma
Author: Andres Freund
Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
Backpatch: 12-, where the bug was introduced in 15d8f83
Backpatch: 12-, like the previous commit
After starting slapd, wait until it can accept a connection before
beginning the real test work.  This avoids occasional test failures.
Back-patch to 11, where the LDAP tests arrived.

Author: Thomas Munro
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20190719033013.GI1859%40paquier.xyz
The ids for linking to libpq functions were previously all lower-case.
Change to mixed-case, matching the actual function name, for easier
readability in the source.  The output isn't changed in a significant
way, since the ids are converted to lower or upper case for file names
and anchors.
When making an xref to a varlistentry, the stylesheets use the first
<term> as the link text.  In the cases fixed here, the <term> element
contained extra whitespace that ended up being part of the link text,
which looked strange in the output in some cases.  This whitespace is
significant, so remove it since we don't want it.
Turn most mentions of libpq functions into links.  At id attributes to
most libpq functions, where not existing yet, so that they can be
linked to.  (In a handful of cases there were problems with the PDF
processing toolchain, so those instances were not changed.)

Author: Fabien COELHO <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1905121032330.27203@lancre
Money values exceeding about 18 digits (depending on lc_monetary)
could be inaccurately converted to numeric, due to select_div_scale()
deciding it didn't need to compute any fractional digits.  Force
its hand by setting the dscale of one division input to equal the
number of fractional digits we need.

In passing, rearrange the logic to not do useless work in locales
where money values are considered integral.

Per bug #15925 from Slawomir Chodnicki.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/[email protected]
Some platforms create a file named "localtime" in the system
timezone directory, making it a copy or link to the active time
zone file.  If Postgres is built with --with-system-tzdata, initdb
will see that file as an exact match to localtime(3)'s behavior,
and it may decide that "localtime" is the most preferred spelling of
the active zone.  That's a very bad choice though, because it's
neither informative, nor portable, nor stable if someone changes
the system timezone setting.  Extend the preference logic added by
commit e3846a0 so that we will prefer any other zone file that
matches localtime's behavior over "localtime".

On the same logic, also discriminate against "posixrules", which
is another not-really-a-zone file that is often present in the
timezone directory.  (Since we install "posixrules" but not
"localtime", this change can affect the behavior of Postgres
with or without --with-system-tzdata.)

Note that this change doesn't prevent anyone from choosing these
pseudo-zones if they really want to (i.e., by setting TZ for initdb,
or modifying the timezone GUC later on).  It just prevents initdb
from preferring these zone names when there are multiple matches to
localtime's behavior.

Since we generally prefer to keep timezone-related behavior the
same in all branches, and since this is arguably a bug fix,
back-patch to all supported branches.

Discussion: https://postgr.es/m/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
pg_timezone_names() tries to avoid showing the "Factory" zone in
the view, mainly because that has traditionally had a very long
"abbreviation" such as "Local time zone must be set--see zic manual page",
so that showing it messes up psql's formatting of the whole view.
Since tzdb version 2016g, IANA instead uses the abbreviation "-00",
which is sane enough that there's no reason to discriminate against it.

On the other hand, it emerges that FreeBSD and possibly other packagers
are so wedded to backwards compatibility that they hack the IANA data
to keep the old spelling --- and not just that old spelling, but even
older spellings that IANA used back in the stone age.  This caused the
filter logic to fail to suppress "Factory" at all on such platforms,
though the formatting problem is definitely real in that case.

To solve both problems, get rid of the hard-wired assumption about
exactly what Factory's abbreviation is, and instead reject abbreviations
exceeding 31 characters.  This will allow Factory to appear in the view
if and only if it's using the modern abbreviation.

In passing, simplify the code we add to zic.c to support "zic -P"
to remove its now-obsolete hacks to not print the Factory zone's
abbreviation.  Unlike pg_timezone_names(), there's no reason for
that code to support old/nonstandard timezone data.

Since we generally prefer to keep timezone-related behavior the
same in all branches, and since this is arguably a bug fix,
back-patch to all supported branches.

Discussion: https://postgr.es/m/[email protected]
Expand the validate_exec() calls to cover all the used binaries.

Author: Daniel Gustafsson <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
Make the directory where the pg_upgrade binary resides the default for
new bindir, as running the pg_upgrade binary from where the new
cluster is installed is a very common scenario.  Setting this as the
defauly bindir for the new cluster will remove the need to provide it
explicitly via -B in many cases.

To support directories being missing from option parsing, extend the
directory check with a missingOk mode where the path must be filled at
a later point before being used.  Also move the exec_path check to
earlier in setup to make sure we know the new cluster bindir when we
scan for required executables.

This removes the exec_path from the OSInfo struct as it is not used
anywhere.

Author: Daniel Gustafsson <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
Recently released xfsprogs 5.1.0 has reflink support enabled by
default, so the note that it's not enabled by default can be removed.
When doing a schema-level or a database-level operation, a list of
relations to build is created which gets processed in parallel using
multiple connections, based on the recent refactoring for parallel slots
in src/bin/scripts/.  System catalogs are processed first in a
serialized fashion to prevent deadlocks, followed by the rest done in
parallel.

This new option is not compatible with --system as reindexing system
catalogs in parallel can lead to deadlocks, and with --index as there is
no conflict handling for indexes rebuilt in parallel depending in the
same relation.

Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Michael Paquier
Discussion: https://postgr.es/m/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg@mail.gmail.com
For its entire existence, isolationtester.c has forced client_min_messages
to WARNING, but that seems like a very poor choice of test design.  It
should be up to individual test scripts to manage whether they emit notices
and to ensure that the results are stable.  (There were no NOTICE messages
in the original set of isolation tests, so this was certainly dead code
when committed, but perhaps it was needed at some earlier point.)

It's possible that the original motivation was due to platform-dependent
variations in the timing of stdout vs. stderr output.  That should be
moot since commits 73bcb76/6eda3e9c2, but just in case, adjust
isotesterNoticeProcessor to print to stdout not stderr.  (stderr seems
like the wrong thing anyway: it should be for error printouts not expected
test output.)

Testing shows that the notices in insert-conflict-specconflict are indeed
a bit timing-unstable on very slow machines, so hide them; maybe we can
improve that later.  Also, make the notices in plpgsql-toast a bit less
verbose than the original code would've had them.

Discussion: https://postgr.es/m/[email protected]
If a test sends a notice just before blocking, it's possible on
slow machines for isolationtester to detect the blocked state before
it's consumed the notice.  (For this to happen, the notice would have
to arrive after isolationtester has waited for data for 10ms, so on
fast/lightly-loaded machines it's hard to reproduce the failure.)
But, if we have seen the backend as blocked, it's certainly already
sent any notices it's going to send.  Therefore, one more round of
PQconsumeInput and PQisBusy should be enough to collect and process
any such notices.

This appears to explain the instability noted in commit ebd4992, so undo
the hack therein to not print notices from insert-conflict-specconflict.

Patch by me, diagnosis by Andres Freund.

Discussion: https://postgr.es/m/[email protected]
The frontend version of walkdir() is defined in file_utils.c, and not
initdb.c.

Author: Sehrope Sarkuni
Discussion: https://postgr.es/m/CAH7T-artawnBt4=KODNCD8Mt2ZX4CCjJT8c=_=950xjutcRZ4Q@mail.gmail.com
The table has not been updated for some commands introduced in recent
releases, so refresh it.  While on it, reorder entries alphabetically.

Backpatch all the way down for all the commands which have gone
missing.

Reported-by: Jeremy Smith
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.4
We had no actual end-to-end test of NOTIFY message delivery.  In the
core async.sql regression test, testing this is problematic because psql
traditionally prints the PID of the sending backend, making the output
unstable.  We also have an isolation test script, but it likewise
failed to prove that delivery worked, because isolationtester.c had
no provisions for detecting/reporting NOTIFY messages.

Hence, add such provisions to isolationtester.c, and extend
async-notify.spec to include direct tests of basic NOTIFY functionality.

I also added tests showing that NOTIFY de-duplicates messages normally,
but not across subtransaction boundaries.  (That's the historical
behavior since we introduced subtransactions, though perhaps we ought
to change it.)

Patch by me, with suggestions/review by Andres Freund.

Discussion: https://postgr.es/m/[email protected]
Implements SortSupport for the inet/cidr types in Postgres by devising
an abbreviated key representation for them that will faithfully
reproduce their existing sorting rules. This has the effect of typically
reducing the time taken for inet/cidr sorts by ~50-60%, and should show
a good improvement for the vast majority of real-world data sets.
@brandur brandur force-pushed the brandur-inet-sortsupport branch from 6755550 to 5e0d24c Compare July 28, 2019 17:44
brandur pushed a commit that referenced this pull request Jul 28, 2019
The original setup for dependencies of partitioned objects had
serious problems:

1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents.  Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation.  (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension.  I'm not sure if any less-artificial
cases exist.)

2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.

Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies.  This is
cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted.  To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes.  (They behave identically except for having different
priorities for being cited in error messages.)  This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.

Catversion bump because the contents of pg_depend change for
partitioning relationships.

Patch HEAD only.  It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
brandur pushed a commit that referenced this pull request Jul 28, 2019
…tions.

Commit 3d956d9 added support for update row movement in postgres_fdw.
This patch fixes the following issues introduced by that commit:

* When a remote partition chosen to insert routed rows into was also an
  UPDATE subplan target rel that would be updated later, the UPDATE that
  used a direct modification plan modified those routed rows incorrectly
  because those routed rows were visible to the later UPDATE command.
  The right fix for this would be to have some way in postgres_fdw in
  which the later UPDATE command ignores those routed rows, but it seems
  hard to do so with the current infrastructure.  For now throw an error
  in that case.

* When a remote partition chosen to insert routed rows into was also an
  UPDATE subplan target rel, fmstate created for the UPDATE that used a
  non-direct modification plan was mistakenly overridden by another
  fmstate created for inserting those routed rows into the partition.
  This caused 1) server crash when the partition would be updated later,
  and 2) resource leak when the partition had been already updated.  To
  avoid that, adjust the treatment of the fmstate for the inserting.  As
  for #1, since we would also have the incorrectness issue as mentioned
  above, error out in that case as well.

Update the docs to mention that postgres_fdw currently does not handle
the case where a remote partition chosen to insert a routed row into is
also an UPDATE subplan target rel that will be updated later.

Author: Amit Langote and Etsuro Fujita
Reviewed-by: Amit Langote
Backpatch-through: 11 where row movement in postgres_fdw was added
Discussion: https://postgr.es/m/[email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.