Exporting using s3 table function might create too many threads #51533
Open
Labels
comp-formats: Input/output formats (CSV/JSON/Parquet/ORC/Arrow/Protobuf/etc.).
potential bug: To be reviewed by developers and confirmed/rejected.
Description
Create a table to export to s3 using partitions:
Mordor :) create table n (a Int64, b Int64) ENGINE=MergeTree() order by a;
CREATE TABLE n
(
`a` Int64,
`b` Int64
)
ENGINE = MergeTree
ORDER BY a
Query id: 9307f2ba-a356-481e-9055-8ccd4a30a29c
Ok.
0 rows in set. Elapsed: 0.001 sec.
Mordor :) insert into n Select number, number from numbers(10000)
INSERT INTO n SELECT
number,
number
FROM numbers(10000)
Query id: 7c056482-30fc-4b27-a90f-a21ae6cede7e
Ok.
0 rows in set. Elapsed: 0.002 sec. Processed 10.00 thousand rows, 80.00 KB (6.00 million rows/s., 47.99 MB/s.)
Metrics before:
SELECT *
FROM system.asynchronous_metrics
WHERE (metric LIKE 'jemalloc%') OR (metric ILIKE '%thread%')
ORDER BY metric ASC
Query id: 0d0a854a-9013-4793-ae71-c9efb0a3d609
┌─metric───────────────────────────────────┬─────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ HTTPThreads │ 0 │ Number of threads in the server of the HTTP interface (without TLS). │
│ MySQLThreads │ 0 │ Number of threads in the server of the MySQL compatibility protocol. │
│ OSThreadsRunnable │ 4 │ The total number of 'runnable' threads, as the OS kernel scheduler seeing it. │
│ OSThreadsTotal │ 3233 │ The total number of threads, as the OS kernel scheduler seeing it. │
│ TCPThreads │ 1 │ Number of threads in the server of the TCP protocol (without TLS). │
│ jemalloc.active │ 118517760 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.allocated │ 113522904 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.dirty_purged │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.muzzy_purged │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pactive │ 28935 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pdirty │ 18114 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pmuzzy │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.num_runs │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.num_threads │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.run_intervals │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.epoch │ 17 │ An internal incremental update number of the statistics of jemalloc (Jason Evans' memory allocator), used in all other `jemalloc` metrics. │
│ jemalloc.mapped │ 276598784 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.metadata │ 20653712 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.metadata_thp │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.resident │ 205963264 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.retained │ 69955584 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
└──────────────────────────────────────────┴───────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Try a bad insert into s3 (the partition key gives every row its own partition):
INSERT INTO
TABLE FUNCTION s3(
'http://resolver:8083/root/data/test-upload_{_partition_id}.csv.gz',
'minio', 'minio123',
'CSV', auto, 'gzip'
)
PARTITION BY (a, b) Select * from n;
INSERT INTO FUNCTION s3('http://resolver:8083/root/data/test-upload_{_partition_id}.csv.gz', 'minio', 'minio123', 'CSV', auto, 'gzip') PARTITION BY (a, b) SELECT *
FROM n
Query id: 53867b93-9b7a-4fa7-987c-df4123b8ad37
↗ Progress: 10.00 thousand rows, 160.00 KB (5.49 thousand rows/s., 87.77 KB/s.) 99%
0 rows in set. Elapsed: 1.823 sec. Processed 10.00 thousand rows, 160.00 KB (5.49 thousand rows/s., 87.77 KB/s.)
Received exception from server (version 23.6.1):
Code: 439. DB::Exception: Received from localhost:9000. DB::Exception: Cannot schedule a task: cannot allocate thread (threads=0, jobs=0). (CANNOT_SCHEDULE_TASK)
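For reference, the number of partitions this insert asks for can be checked with a query like the one below. It is only a sketch against the table `n` created above. Since `a` and `b` both equal `number` for the 10000 inserted rows, it should report 10000 distinct `(a, b)` pairs, meaning the `{_partition_id}` placeholder expands to one output object per row.

```sql
-- Count the distinct values of the partition key used in the insert above.
-- Each distinct (a, b) pair becomes its own partition, hence its own target file.
SELECT uniqExact(a, b) AS partitions
FROM n;
```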
Metrics after:
SELECT *
FROM system.asynchronous_metrics
WHERE (metric LIKE 'jemalloc%') OR (metric ILIKE '%thread%')
ORDER BY metric ASC
Query id: 623f1b5f-43b5-49c2-9821-ddf8497f252d
┌─metric───────────────────────────────────┬───────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ HTTPThreads │ 0 │ Number of threads in the server of the HTTP interface (without TLS). │
│ MySQLThreads │ 0 │ Number of threads in the server of the MySQL compatibility protocol. │
│ OSThreadsRunnable │ 2 │ The total number of 'runnable' threads, as the OS kernel scheduler seeing it. │
│ OSThreadsTotal │ 4220 │ The total number of threads, as the OS kernel scheduler seeing it. │
│ TCPThreads │ 1 │ Number of threads in the server of the TCP protocol (without TLS). │
│ jemalloc.active │ 1529765888 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.allocated │ 1405988368 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.dirty_purged │ 216494 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.muzzy_purged │ 42074 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pactive │ 373478 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pdirty │ 4328316 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.arenas.all.pmuzzy │ 171949 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.num_runs │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.num_threads │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.background_thread.run_intervals │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.epoch │ 35 │ An internal incremental update number of the statistics of jemalloc (Jason Evans' memory allocator), used in all other `jemalloc` metrics. │
│ jemalloc.mapped │ 20317270016 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.metadata │ 282603632 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.metadata_thp │ 0 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.resident │ 19505512448 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
│ jemalloc.retained │ 2280067072 │ An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html │
└──────────────────────────────────────────┴─────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
21 rows in set. Elapsed: 0.001 sec.
The server now holds an extra ~1000 threads and 16 GB of reserved memory.
Calling the export function again does not create more threads, but memory usage increases with each call and does not appear to drop back down completely. In my case it seems to stabilize at around 23-24 GB of memory on an empty database (default settings, no custom config).
Note that I don't have minio running, so the URL is invalid.
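To follow only the figures that changed most between the two tables above (`OSThreadsTotal` goes from 3233 to 4220, `jemalloc.resident` from about 0.2 GB to about 19.5 GB), a narrower version of the metrics query may be easier to compare run over run. This is a sketch: the choice of metrics and the `readable` column are illustrative additions, not part of the original report.

```sql
-- Show only the thread count and the main jemalloc memory figures.
-- formatReadableSize() renders byte counts in human-readable units;
-- the thread count is left as a plain number.
SELECT
    metric,
    value,
    if(metric LIKE 'jemalloc.%', formatReadableSize(value), toString(value)) AS readable
FROM system.asynchronous_metrics
WHERE metric IN ('OSThreadsTotal', 'jemalloc.resident', 'jemalloc.mapped', 'jemalloc.retained')
ORDER BY metric ASC;
```

Running it before the insert, right after the failure, and again after repeated calls gives a compact view of the thread and memory growth described above.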
A couple of opinions:
- 1000 threads, or however many it is, to export to s3 seems like too many, no matter how bad the partitioning was. There should be a reasonable upper limit.
- Keeping those thousand threads alive is expensive in terms of memory.
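For comparison, the same export with a coarser partition key might look like the sketch below. It assumes the same table `n` and the same (non-running) minio endpoint as above; the `intDiv(a, 1000)` bucketing is a hypothetical choice for this test data, where `a` ranges over 0..9999 and so falls into 10 partitions instead of 10000. It only sidesteps the problem for this case and has not been verified against this server version. It does not address the missing upper limit on threads.

```sql
-- Sketch: bucket rows into 10 partitions (a = 0..9999, 1000 rows per bucket)
-- instead of one partition per row.
INSERT INTO TABLE FUNCTION s3(
    'http://resolver:8083/root/data/test-upload_{_partition_id}.csv.gz',
    'minio', 'minio123',
    'CSV', auto, 'gzip'
)
PARTITION BY intDiv(a, 1000)
SELECT * FROM n;
```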