Implementation of a table engine to consume application log files in ClickHouse (#25969)
kssenii merged 114 commits into ClickHouse:master
Conversation
216d2ee to 2599987
@ucasfl hi! Will you finish this PR, or should I finish it?
Hi, maybe you can give me more reviews and suggestions on the current code; I will continue to finish it soon.
kssenii left a comment
It will also be very nice to add tests :)
As for your to-do list in the PR description:
Multiple threads to parse records from files - this will be very good.
So, if we want to parse records from files with multiple threads, we should monitor the directory in the Storage and dynamically create ReadBuffers?
Yes, I was thinking it would be better to have a monitoring task in the storage which will be started when we start the reading loop in threadFunc. It will be stopped when we reschedule and resumed (opening the files once again) when the main task is resumed.
Thanks for the careful review! Learned a lot from it. @kssenii
@ucasfl Congratulations! 🎉 ❤️
Internal documentation ticket: DOCSUP-17088
@@ -0,0 +1,15 @@
option (ENABLE_FILELOG "Enable FILELOG" ON)
BTW, ENABLE_FILELOG looks redundant, since a find_foobar option is usually created only for third-party libraries.
It can be enabled for Linux only and that's it. @ucasfl thoughts?
Yes, I agree with it.
fd = inotify_init();
if (fd == -1)
    throw Exception("Cannot initialize inotify", ErrorCodes::IO_SETUP_ERROR);

int wd = inotify_add_watch(fd, path.c_str(), mask);
if (wd == -1)
{
    owner.onError(Exception(ErrorCodes::IO_SETUP_ERROR, "Watch directory {} failed", path));
for i in {1..20}
do
    echo $i, $i >> ${user_files_path}/a.txt
Better to use either a unique name, or at least include the test name, so it can be run in parallel.
for i in {1..20}
do
    echo $i, $i >> ${user_files_path}/logs/a.txt
And not only can this test not be run in parallel, it also cannot run in parallel with other tests from this PR, i.e. 02024_storage_filelog_mv.sh, since they use the same file name. Am I right?
This will allow distinguishing really corrupted data, since right now, if you CREATE/DETACH/ATTACH such an engine, you will get the following
error [1]:
2022.08.05 20:02:20.726398 [ 696405 ] {} <Error> StorageFileLog (file_log): Metadata files of table file_log are lost.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39926/72961328f68b1ec05300d6dc4411a87618a2f46b/stress_test__debug_.html
Likely it was previously not created to avoid creating empty
directories; however, this should not be a problem, I guess.
Refs: ClickHouse#25969 (@ucasfl)
Signed-off-by: Azat Khuzhin <[email protected]>
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implementation of a table engine to consume application log files in ClickHouse. Closes #6953.
Detailed description / Documentation draft:
Use case: