
Lock-free DROP PARTITION (PART) / TRUNCATE #33457

@den-crane

The current behavior

create table x (a Int64) Engine=MergeTree order by a;
insert into x select * from numbers(10);

T1:
session1: select sleepEachRow(3) from x;
session2: truncate table x;  -- blocked
session3: select * from x;   -- blocked

T2:
session1: 10 rows in set. Elapsed: 33.002 sec.
session2: Ok. Elapsed: 30.810 sec.                  (session2 waits for the read lock held by session1.)
session3: 0 rows in set. Elapsed: 28.604 sec.       (session3 waits behind the write lock requested by session2.)

But what if TRUNCATE, DROP PARTITION, and DROP PART just marked the removed parts as inactive?

In that case they would become almost lock-free, because there would be no need to wait for session1.
They would also become almost instant, because there would be no need to wait for slow file removal (ext4 sync unlink).
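A minimal toy model of the idea, written in Python rather than ClickHouse's C++. `Part`, `Table`, `acquire_snapshot`, `release_snapshot`, and `cleanup` are hypothetical names, not ClickHouse APIs. `refcount` plays the role of the issue's refcnt, where 1 means only the table itself still holds the part. `lifetime=1.0` mirrors the proposed 1-second retention. Removing a part from the list stands in for deleting its files. The point is that DROP/TRUNCATE only flip a flag and never wait for a running SELECT. Physical removal happens later, and only once no query references the part.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Part:
    name: str
    partition: str
    active: bool = True
    refcount: int = 1                    # 1 = only the table's part set holds it
    removed_at: Optional[float] = None   # when the part was marked inactive

@dataclass
class Table:
    parts: List[Part] = field(default_factory=list)

    def acquire_snapshot(self) -> List[Part]:
        # A SELECT pins the currently active parts for its whole duration.
        snap = [p for p in self.parts if p.active]
        for p in snap:
            p.refcount += 1
        return snap

    def release_snapshot(self, snap: List[Part]) -> None:
        for p in snap:
            p.refcount -= 1

    def drop_partition(self, partition: str, now: float) -> None:
        # Proposed behaviour: flip the state only. There is no waiting for
        # running SELECTs and no file removal on the caller's path.
        for p in self.parts:
            if p.active and p.partition == partition:
                p.active, p.removed_at = False, now

    def truncate(self, now: float) -> None:
        for p in self.parts:
            if p.active:
                p.active, p.removed_at = False, now

    def cleanup(self, now: float, lifetime: float = 1.0) -> List[str]:
        # Background task: remove inactive parts whose lifetime has passed
        # and that no query still references (refcount == 1).
        removable = [p for p in self.parts
                     if not p.active and p.removed_at is not None
                     and now - p.removed_at >= lifetime and p.refcount == 1]
        for p in removable:
            self.parts.remove(p)  # stand-in for deleting the part's files
        return [p.name for p in removable]

t = Table([Part("all_1_1_0", "all"), Part("all_2_2_0", "all")])
snap = t.acquire_snapshot()        # session1: long SELECT starts
t.truncate(now=100.0)              # session2: returns immediately
print(len(t.acquire_snapshot()))   # 0: session3 sees no active parts
print(t.cleanup(now=101.5))        # []: session1 still holds both parts
t.release_snapshot(snap)           # session1 finishes
print(t.cleanup(now=101.5))        # ['all_1_1_0', 'all_2_2_0']
```

Because a racing SELECT holds a reference, its parts survive until it finishes. The retention period can therefore be very short without breaking readers.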

This raises two problems:

  1. Disk space is not freed immediately. But instead of 8 minutes (the default old_parts_lifetime), these special inactive parts could be kept for just 1 second. Racing SELECTs are not a problem, because a part is deleted only when its refcnt=1, i.e. no query still holds it.
    (Moreover, all inactive parts, including ones that already exist today, should be kept for only about 1 second.)

  2. Inactive parts can be resurrected after a restart for non-Replicated tables. To prevent this, we could implement some kind of registry for inactive parts. For example, we could move them to an /inactive folder, rename them with an inactive_ prefix, or create an empty part that covers all removed parts in a partition (by min block and max block).
    (Currently an empty active part is removed immediately; that would need to be handled for this special case.)
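A toy sketch of the "empty covering part" option from problem 2. It assumes a simplified part name `<partition>_<min_block>_<max_block>_<level>`, modeled loosely on MergeTree part names; real names can also carry a mutation version, which is ignored here. `PartInfo`, `covering_empty_part`, and `active_after_restart` are hypothetical helpers, not ClickHouse functions. On restart, any part that is covered by another part on disk is treated as outdated. An empty part spanning the dropped block range therefore keeps the dropped parts from coming back, as long as the empty part itself is not cleaned up right away.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class PartInfo:
    partition: str
    min_block: int
    max_block: int
    level: int

    @classmethod
    def parse(cls, name: str) -> "PartInfo":
        # Simplified name format: <partition>_<min>_<max>_<level>
        partition, lo, hi, level = name.rsplit("_", 3)
        return cls(partition, int(lo), int(hi), int(level))

    def __str__(self) -> str:
        return f"{self.partition}_{self.min_block}_{self.max_block}_{self.level}"

    def covers(self, other: "PartInfo") -> bool:
        # Same partition, wider-or-equal block range, higher-or-equal level.
        return (self.partition == other.partition
                and self.min_block <= other.min_block
                and self.max_block >= other.max_block
                and self.level >= other.level)

def covering_empty_part(removed: List[PartInfo]) -> PartInfo:
    """Hypothetical: one empty part spanning all removed parts of a partition."""
    assert len({p.partition for p in removed}) == 1, "one partition at a time"
    return PartInfo(removed[0].partition,
                    min(p.min_block for p in removed),
                    max(p.max_block for p in removed),
                    max(p.level for p in removed) + 1)

def active_after_restart(names: List[str]) -> List[str]:
    """Parts on disk that no other part covers are loaded as active."""
    infos = [PartInfo.parse(n) for n in names]
    return [str(p) for p in infos
            if not any(q != p and q.covers(p) for q in infos)]

on_disk = ["202401_1_1_0", "202401_2_5_1", "202402_6_6_0"]
dropped = [PartInfo.parse(n) for n in on_disk if n.startswith("202401")]
marker = covering_empty_part(dropped)          # 202401_1_5_2
print(active_after_restart(on_disk + [str(marker)]))
# ['202402_6_6_0', '202401_1_5_2']
```

Compared with a separate /inactive folder or an inactive_ prefix, this variant reuses the existing "a covered part is outdated" rule instead of adding a new on-disk convention.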
