Lock free drop partition (part) / truncate #33457
Description
The current behavior
create table x (a Int64) Engine=MergeTree order by a;
insert into x select * from numbers(10);
T1:
session1: select sleepEachRow(3) from x;
session2: truncate table x; -- blocked
session3: select * from x; -- blocked
T2:
session1: 10 rows in set. Elapsed: 33.002 sec.
session2: Ok. Elapsed: 30.810 sec. (session2 is blocked by Read Lock from session1.)
session3: 0 rows in set. Elapsed: 28.604 sec. (session3 is blocked by Write Lock from session2.)
But what if truncate, drop partition, and drop part just marked the removed parts as inactive?
In that case they would become almost lock-free, because there would be no need to wait for session1.
These operations would also become almost instant, because there would be no need to wait for a long file removal (ext4 sync unlink).
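The idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how ClickHouse implements parts: the `Part` and `Table` classes and all of their methods are hypothetical names invented for the example. A reader pins the set of active parts when it starts, truncate only flips a flag under a short mutex, and a background cleanup step deletes an inactive part once no query references it. The sketch counts only query references, so an unreferenced part has a count of 0 here.

```python
import threading


class Part:
    """Hypothetical stand-in for a data part (not ClickHouse's real class)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.refcount = 0               # running queries that pinned this part
        self.active = True              # visible to newly started queries
        self.removed_from_disk = False  # stands in for the slow file removal


class Table:
    """Toy table: the mutex guards metadata only, never a whole query."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._parts = []

    def insert(self, name, rows):
        with self._mutex:
            self._parts.append(Part(name, list(rows)))

    def acquire_snapshot(self):
        # A SELECT pins the currently active parts and then reads them
        # without holding any table-level lock.
        with self._mutex:
            snapshot = [p for p in self._parts if p.active]
            for p in snapshot:
                p.refcount += 1
            return snapshot

    def release_snapshot(self, snapshot):
        with self._mutex:
            for p in snapshot:
                p.refcount -= 1

    def truncate(self):
        # Only marks parts inactive: no waiting for readers, no file removal.
        with self._mutex:
            for p in self._parts:
                p.active = False

    def cleanup(self):
        # Background step: remove inactive parts that no query still holds.
        with self._mutex:
            kept = []
            for p in self._parts:
                if not p.active and p.refcount == 0:
                    p.removed_from_disk = True
                else:
                    kept.append(p)
            self._parts = kept
```

In this model a long-running reader keeps its snapshot readable after `truncate()` returns, a query started afterwards sees an empty table, and the files are removed only after the reader calls `release_snapshot()` and `cleanup()` runs again.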
This creates two problems:
- Disk space is not freed immediately. But instead of 8 minutes, these special inactive parts could be kept for only 1 second (there is no problem with racing SELECTs because of refcnt=1). Moreover, all inactive parts, even already existing ones, should be kept for only "1" second.
- Inactive parts are resurrected for non-Replicated tables after a restart. But we could implement some registry for inactive parts, for example move them to a folder /inactive or rename them with the prefix inactive_, or we could create an empty part that covers all removed parts in a partition (by min block and max block). Currently an empty active part is removed immediately; that should be addressed for this special case.
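The covering-part option can be sketched as follows. This is a hedged illustration only: `PartInfo`, `make_covering_part`, and `active_parts_after_restart` are hypothetical names, and giving the empty part a level one higher than any removed part is an assumption of this sketch, not something stated in the proposal. The point is that an empty part spanning the removed block range would let a restart recognize the old parts as covered, so they are not resurrected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartInfo:
    """Hypothetical part descriptor: partition, block range, level, row count."""
    partition: str
    min_block: int
    max_block: int
    level: int
    rows: int = 0

    def covers(self, other):
        # Same partition, block range contains the other part's range,
        # and the level is not lower (level rule is an assumption here).
        return (self.partition == other.partition
                and self.min_block <= other.min_block
                and self.max_block >= other.max_block
                and self.level >= other.level)


def make_covering_part(removed):
    """Build an empty part spanning all removed parts of one partition."""
    if len({p.partition for p in removed}) != 1:
        raise ValueError("expected parts from exactly one partition")
    return PartInfo(
        partition=removed[0].partition,
        min_block=min(p.min_block for p in removed),
        max_block=max(p.max_block for p in removed),
        level=max(p.level for p in removed) + 1,
        rows=0,
    )


def active_parts_after_restart(on_disk):
    """Keep only the parts that no other part found on disk covers."""
    return [p for p in on_disk
            if not any(q != p and q.covers(p) for q in on_disk)]
```

For this to work, the empty covering part itself has to survive on disk until the covered parts are physically deleted, which is why the immediate removal of empty active parts would need an exception for this case.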