Upgrade embedded Postgres and fix compatibility issues.#1199
Upgrade embedded Postgres and fix compatibility issues.#1199bcipriano wants to merge 5 commits intoAcademySoftwareFoundation:masterfrom
Conversation
| "SELECT pk_frame FROM proc WHERE pk_frame=? FOR UPDATE", | ||
| String.class, f.getFrameId()).equals(f.getFrameId())) { | ||
|
|
||
| getJdbcTemplate().update(UPDATE_PROC_MEMORY_USAGE, |
There was a problem hiding this comment.
I'm pretty sure this was a merge mistake. This block and the one below it seem to duplicate each other, and this block is missing one of the placeholder values.
| @Test | ||
| @Transactional | ||
| @Rollback(true) | ||
| @Ignore |
There was a problem hiding this comment.
I don't really understand this unit test. I dug into it a bit and I don't see an obvious problem, but the assertNotEquals(jobs, sortedJobs) check fails for me sometimes in an ARM environment. After looking at the dispatch SQL query, I don't see why jobs and sortedJobs should be different. The jobs are all launched with the same priority, so it's expected to get them back in the same order, right?
@DiegoTavares Any idea here?
There was a problem hiding this comment.
This test doesn't make much sense to me. I guess the first assert with sortedJobs can be removed as it's not guaranteed both lists will be different.
| getJdbcTemplate().update(UPDATE_PROC_MEMORY_USAGE, | ||
| rss, maxRss, vss, maxVss, | ||
| usedGpuMemory, maxUsedGpuMemory, f.getFrameId()); | ||
| } |
There was a problem hiding this comment.
Alright, this is was a weird merge issue. Not sure where it came from. Good catch.
| "SELECT " + | ||
| "limit_record.pk_limit_record, " + | ||
| "SUM(layer_stat.int_running_count) AS int_sum_running " + | ||
| "SUM(CAST(layer_stat.int_running_count AS numeric)) AS int_sum_running " + |
There was a problem hiding this comment.
Reading postgres docs raised some concerns regarding the performance of this conversion:
The type numeric can store numbers with a very large number of digits. It is especially recommended for storing monetary amounts and other quantities where exactness is required. Calculations with numeric values yield exact results where possible, e.g., addition, subtraction, multiplication. However, calculations on numeric values are very slow compared to the integer types, or to the floating-point types described in the next section.
There was a problem hiding this comment.
I think we need to dig a bit deeper to understand where this is coming from. Was the issue impacting every query or a specific subset?
There was a problem hiding this comment.
It seemed to impact every query that used a SUM on a bigint field. I didn't test all of these queries but once I saw it a few times I just went through and changed all of them, it seemed consistent.
According to postgres docs SUM(bigint) always returns a numeric, so I'm assuming something is going wrong in that conversion process somewhere.
It could be a problem specific to the embedded postgres, I don't think I was able to reproduce it anywhere else. And I can't get an older version of the embedded postgres on ARM so I wasn't able to try that.
There was a problem hiding this comment.
@DiegoTavares Any update here? Possible to do a test on your side to see if it produces any performance issues of note?
There was a problem hiding this comment.
We've submitted a ticket to EDB asking about the error.
It looks there's a change this update will actually have performance implications. We can try a different version of postgres and see if this is a bug specific for this version.
There was a problem hiding this comment.
@bcipriano I have few questions for you
- Did you installed postgres 14.5 using following link https://www.enterprisedb.com/downloads/postgres-postgresql-downloads or using some other source ?
- Did you got similar issue when you was on version 12.12 ? Are you able to test it on Postgresql12.12 on your new macbook ?
There was a problem hiding this comment.
@bcipriano Can you confirm the source of postgres download? can you try https://www.enterprisedb.com/downloads/postgres-postgresql-downloads for download postgres ? Let me know how it goes with EDB 14.5?
There was a problem hiding this comment.
One thing to mention -- this only seems to happen in Docker. When I run tests directly on my ARM Macbook, they pass fine, in fact this PR isn't needed at all. It's only when I try to build the Cuebot Docker image (which does the Gradle build/test as part of the build process) that this problem appears.
Did you installed postgres 14.5 using following link https://www.enterprisedb.com/downloads/postgres-postgresql-downloads or using some other source ?
Postgres is installed via Gradle/Maven. You can see the versions defined in build.gradle: https://github.com/AcademySoftwareFoundation/OpenCue/pull/1199/files#diff-7e57ab670111c27c23069909d0cc32c88d266c9a1f3e6ca38052c973418146fe
I don't know if it's possible use Postgres binaries from some other source with this embedded Postgres thing we're using.
Did you got similar issue when you was on version 12.12 ? Are you able to test it on Postgresql12.12 on your new macbook ?
Same issue on 11.13.0 and 12.12.0. But only on Docker.
There was a problem hiding this comment.
Did a bit more research into this today:
- Tried the latest Postgres 15 binaries. Same issue.
- Tried a different base image (
almalinuxinstead ofgradle). Same issue. - Created a Postgres container from the standard
postgresimage on my mac. Was not able to reproduce the issue.
Still kind of stumped. I filed zonkyio/embedded-postgres#99 to see if those folks have any idea.
|
From TSC meeting: we're going to hold on this PR for now while we continue to investigate. The upstream ticket has some suggestions I'll look into next. |
|
Had a breakthrough on this based on the advice I got on the upstream ticket. See zonkyio/embedded-postgres#99 (comment) for details. Hopefully we can get some updated Postgres binaries released soon, after which we can probably throw out this PR and do a simple Postgres version upgrade. |
|
Closing stale PRs |
Link the Issue(s) this Pull Request is related to.
#1196
Summarize your change.
In order to get Cuebot working on the
arm64v8architecture, we need to upgrade the embedded Postgres we're using. This created some compatibility issues which are fixed here as well.The main issue I ran into was a strange one. On ARM only, and only in Docker, it appears that
SUM(bigint)does not work properly. It returns anumerictype response as expected, but the value is completely wrong -- we were getting results like20 + 9 = 368934881474632988248, which would cause overflows in the Cuebot code. The solution I found is toCASTthe input values asnumericbefore theSUMhappens.This only happens when building the Cuebot Docker image. When I run tests directly on my Mac i.e. through IntelliJ, the tests run fine.
There are a few other minor test fixes in here, possibly due to some differences in either the Postgres version or x86/arm differences, but I didn't see anything concerning there.
This PR contains some of the changes from #1185 to make testing easier; these will be cleaned up once that PR is merged.