UT_OUTPUT_CLOB_BUFFER_TMP_PK violation on parallel
We're getting this error on running massiv unit testing:
ORA-00001: unique constraint (UT3.UT_OUTPUT_CLOB_BUFFER_TMP_PK) violated ORA-06512: at "UT3.UT_RUNNER", line 180 ORA-06512: at "UT3.UT_OUTPUT_CLOB_TABLE_BUFFER", line 40 ORA-06512: at "UT3.UT_OUTPUT_REPORTER_BASE", line 53 ORA-06512: at "UT3.UT_DEBUG_REPORTER", line 54 ORA-06512: at "UT3.UT_EVENT_MANAGER", line 70 ORA-06512: at "UT3.UT_EVENT_MANAGER", line 77 ORA-06512: at "UT3.UT_EXECUTABLE", line 167 ORA-06512: at "UT3.UT_EXECUTABLE", line 44 ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 178 ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 38 ORA-06512: at "UT3.UT_TEST", line 79 ORA-06512: at "UT3.UT_SUITE_ITEM", line 49 ORA-06512: at "UT3.UT_SUITE", line 66 ORA-06512: at "UT3.UT_RUN", line 67 ORA-06512: at "UT3.UT_SUITE_ITEM", line 49 ORA-06512: at "UT3.UT_RUNNER", line 172 ORA-06512: at line 1
The scenario is quite complex.
- You need a lot of tests e.g. (actually we run >8000)
- You need to run them parallel in multiple sessions (totally stressed db for at least 60 minutes)
- each session runs a "suite" which represents a "moudle" in the sofware and takes a couple of minutes to finsh if run "stand alone"
- The test-runs do have overlapping code (e.g. partly covering the same "core" packages across different modules)
On running the unit test suits - the "end" of each suite is random because of DB performance, execution time etc.. Checking the code behind the mentioned table / type code: it looks like the system runs into a race condition which triggers the error above.
Please just review the code and assume if this can happen - (type body ut_output_clob_table_buffer)
as far as i can see - the error is because of:
in type: body ut_output_clob_table_buffer the content of the mentioned table is "controled".
The problem i see is, that on staring multiple UT suite runs at the same time, the value of message_id is set for each instance. During execution - even if covered by autonomous_transactions - the message_id is reset quite often by the functions
send_lines, send_clob
here - we see self.last_message_id := self.last_message_id + sql%rowcount;
So - in theory - what happens to a self.last_message_id if the sql%rowcount is added while antother session is doing the same at the same moment on the same $$plsql_unit with a lower initalized message_id. How is that kept in sync? Because of self.last_message_id := self.last_lastmessage_id +1 and self.last_message_id + rownum in other parallel running sessions, it's obviously happening, that the message_id can overlap.
Furthermore - there is an issue on get_lines in the same package. On the scenario above - the "waiting" for output // chunking is not optimal, too. In case the data is read by process 1 there is a small wait time on each chunk. The system is send to sleep for a short time and therefore will lock the table // data? But why - it's a autonomous_transactions and finally the code wants to delete the data.
Hi, currently the framework doesn't supports the parallelism. This is currently an idea.
@al-hexagon , sorry for late reply. Can you tell me:
- what version of utPLSQL are you on?
- what version of DB are you running?
- what operating-system is the database on?
- is it a RAC or single-instance DB?
- are you using utPLSQL-cli to invoke the tests or are the tests invoked using some SQL script?
- are you able to reproduce the problem regularly?
The failure occurs in this statement:
self.last_message_id := self.last_message_id + 1;
insert /*+ no_parallel */ into ut_output_clob_buffer_tmp(output_id, message_id, text, item_type)
values (self.output_id, self.last_message_id, a_text, a_item_type);
The code itself should be stable. We use SYS_GUID as output_id and I'm not sure if this is not the cause of the issue.
It is an interesting problem and it would be good to get to the bottom of it.
The problem is not related to the reading of data by output buffer but actually occurs at data-write
what version of utPLSQL are you on? v3.1.10.3349
what version of DB are you running? 19c
what operating-system is the database on? microsoft
is it a RAC or single-instance DB? single instance
are you using utPLSQL-cli to invoke the tests or are the tests invoked using some SQL script? sql script which will run a bunch of tagged tests (e.g. all 'ADMIN') are you able to reproduce the problem regularly? yes - if we run a couple of tagged tests in parallel SQLPLUS instances. e.g. running "ADMIN", "DATA", "FRAMEWORK", ... Tags in parallel. These tests might call the same packages (e.g. core application packages, helper packages, ...) at the same time. We stopped using utplsql in parallel and run them just sequential.
As far as i remeber (quite long back as i traced this) the issue is the message_id which is set during runtime.
e.g. Instance 1 sets message_id to 100000 for plsql_unit 'core'. at the same time as instance 1 runs the test instance 2 got finished with the plsql_unit and deletes messages from for plsql_unit 'core' (e.g. 15000 messages) Instance 3 starts while Instance 1 is still running and sets message_id = 85000 for plsql_unit 'core'
If then 1 and 3 try to write message data - they can overlap and cause this unique key constraint error.