Implementation of a table engine to consume application log files in ClickHouse (#25969)
kssenii merged 114 commits into ClickHouse:master
Conversation
216d2ee to 2599987
@ucasfl hi! Will you finish this PR, or should I finish it?
Hi, maybe you can give me more reviews and suggestions on the current code; I will continue to finish it soon.
kssenii left a comment
It will also be very nice to add tests :)
As for your to-do list in the PR description:
Multiple threads to parse records from files - this will be very good.
So, if we want to parse records from files with multiple threads, we should monitor the directory in the Storage and dynamically create ReadBuffers?
Yes, I was thinking it would be better to have a monitoring task in the storage which will be started when we start the reading loop in threadFunc. It will be stopped when we reschedule and resumed (opening the files once again) when the main task is resumed.
Thanks for the careful review! Learned a lot from it. @kssenii
@ucasfl Congratulations! 🎉 ❤️
Internal documentation ticket: DOCSUP-17088
@@ -0,0 +1,15 @@
option (ENABLE_FILELOG "Enable FILELOG" ON)
BTW, ENABLE_FILELOG looks redundant, since a find_foobar option is usually created only for third-party libraries.
It can be enabled for Linux only and that's it. @ucasfl thoughts?
Yes, I agree with it.
fd = inotify_init();
if (fd == -1)
    throw Exception("Cannot initialize inotify", ErrorCodes::IO_SETUP_ERROR);

int wd = inotify_add_watch(fd, path.c_str(), mask);
if (wd == -1)
{
    owner.onError(Exception(ErrorCodes::IO_SETUP_ERROR, "Watch directory {} failed", path));
for i in {1..20}
do
    echo $i, $i >> ${user_files_path}/a.txt
Better to use either a unique name, or at least include the test name, so it can be run in parallel.
for i in {1..20}
do
    echo $i, $i >> ${user_files_path}/logs/a.txt
And not only can this test not be run in parallel, it also cannot run in parallel with other tests from this PR, i.e. 02024_storage_filelog_mv.sh, since they use the same file name. Am I right?
This will allow distinguishing really corrupted data, since right now, if you CREATE/DETACH/ATTACH such an engine, you will get the following
error [1]:
2022.08.05 20:02:20.726398 [ 696405 ] {} <Error> StorageFileLog (file_log): Metadata files of table file_log are lost.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39926/72961328f68b1ec05300d6dc4411a87618a2f46b/stress_test__debug_.html
Likely it was previously not created to avoid creating empty
directories; however, this should not be a problem, I guess.
Refs: ClickHouse#25969 (@ucasfl)
Signed-off-by: Azat Khuzhin <[email protected]>
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implementation of a table engine to consume application log files in ClickHouse. Closes #6953.
Detailed description / Documentation draft:
Use case: