Add instrumentation using XRay#89173
Conversation
… funcIds to function names
…h custom handler for xray + started working on adding SYSTEM-statements to add/remove instrumentation(wip)
…atements using it + attempted to add system.instrumentation table(not sure if it's coorrect
Let's build with ENABLE_XRAY=1 only for the Debug build on AMD for now. This is just to ensure on the CI that it builds correctly. There are some USE_XRAY missing in other files, but it is left on purpose as an excercise to the reader :)
…(needs check) + includes to cpp from header
…andler_name and function_name as identifiers and not string literals, add possibility to pass different type of parameters for differnt hadlers and make them optional
I tried to fix it in a previous commit but changed the wrong build flavor :).
|
I'm merging this after syncing with master because cross compilation builds affected slightly how XRay flavor needed to be built. |
7f36330
|
FWIW this breaks incremental build (not a big deal, but still) I though that it was due to cmake cached variables, since previously we have At this point I will simply clean the build cache. |
|
Oh, this is actually Though we cannot use bundled |
|
Quick guide:
|
Why can't we? Using the system one seems like a bug to me. We are trying to use as fewer system files as possible as this goes against it. |
|
Moved the discussion to an issue: #91475 |
| break; | ||
| } | ||
|
|
||
| res->instrumentation_point_id = temporary_identifier->as<ASTLiteral>()->value.safeGet<UInt64>(); |
There was a problem hiding this comment.
azat.local$ :) SYSTEM INSTRUMENT REMOVE 'QueryMetricLog::startQuery'
Exception on client:
Code: 170. DB::Exception: Bad get: has String, requested UInt64. (BAD_GET), Stack trace (when copying this message, always include the lines below):
0. /src/ch/clickhouse/contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x00000000218618f9
1. /src/ch/clickhouse/src/Common/Exception.cpp:137: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000012efd6ed
2. /src/ch/clickhouse/src/Common/Exception.h:172: DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000aa89c11
3. /src/ch/clickhouse/src/Common/Exception.h:58: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000aa8968b
4. /src/ch/clickhouse/src/Common/Exception.h:190: DB::Exception::Exception<std::basic_string_view<char, std::char_traits<char>>, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, std::basic_string_view<char, std::char_traits<char>>&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x00000000178c51fa
5. /src/ch/clickhouse/src/Core/Field.cpp:872: DB::NearestFieldTypeImpl<__decay(unsigned long), void>::Type& DB::Field::safeGet<unsigned long>() & @ 0x00000000178c5673
6. /src/ch/clickhouse/src/Parsers/ParserSystemQuery.cpp:800: DB::ParserSystemQuery::parseImpl(DB::IParser::Pos&, std::shared_ptr<DB::IAST>&, DB::Expected&) @ 0x000000001d32e877
7. /src/ch/clickhouse/src/Parsers/IParserBase.cpp:14: DB::IParserBase::parse(DB::IParser::Pos&, std::shared_ptr<DB::IAST>&, DB::Expected&) @ 0x000000001d2e5476
8. /src/ch/clickhouse/src/Parsers/ParserQuery.cpp:87: DB::ParserQuery::parseImpl(DB::IParser::Pos&, std::shared_ptr<DB::IAST>&, DB::Expected&) @ 0x000000001d31081d
9. /src/ch/clickhouse/src/Parsers/IParserBase.cpp:14: DB::IParserBase::parse(DB::IParser::Pos&, std::shared_ptr<DB::IAST>&, DB::Expected&) @ 0x000000001d2e5476
10. /src/ch/clickhouse/src/Parsers/parseQuery.cpp:321: DB::tryParseQuery(DB::IParser&, char const*&, char const*, String&, bool, String const&, bool, unsigned long, unsigned long, unsigned long, bool) @ 0x000000001d34e1e6
11. /src/ch/clickhouse/src/Client/ClientBase.cpp:411: DB::ClientBase::parseQuery(char const*&, char const*, DB::Settings const&, bool) @ 0x000000001c2d24e6
12. /src/ch/clickhouse/src/Client/ClientBase.cpp:2480: DB::ClientBase::analyzeMultiQueryText(char const*&, char const*&, char const*, std::shared_ptr<DB::IAST>&, std::unique_ptr<DB::Exception, std::default_delete<DB::Exception>>&) @ 0x000000001c2e34fd
13. /src/ch/clickhouse/src/Client/ClientBase.cpp:2604: DB::ClientBase::executeMultiQuery(String const&) @ 0x000000001c2e3c75
14. /src/ch/clickhouse/src/Client/ClientBase.cpp:2963: DB::ClientBase::processQueryText(String const&) @ 0x000000001c2e5165
15. /src/ch/clickhouse/src/Client/ClientBase.cpp:3679: DB::ClientBase::runInteractive() @ 0x000000001c2ede8c
16. /src/ch/clickhouse/programs/client/Client.cpp:402: DB::Client::main(std::vector<String, std::allocator<String>> const&) @ 0x000000001314f52d
17. /src/ch/clickhouse/base/poco/Util/src/Application.cpp:315: Poco::Util::Application::run() @ 0x000000002194e8b4
18. /src/ch/clickhouse/programs/client/Client.cpp:1266: mainEntryClickHouseClient(int, char**) @ 0x000000001315c5ce
19. /src/ch/clickhouse/programs/main.cpp:380: main @ 0x000000000aa7fd3f
20. ../sysdeps/nptl/libc_start_call_main.h:58: __libc_start_call_main @ 0x0000000000027635
21. ../csu/libc-start.c:360: __ieee754_ilogbf128 @ 0x00000000000276e9
22. _start @ 0x000000000aa7512e
There was a problem hiding this comment.
Yeah, this is the expected behavior. It was something like this at first, but then I changed it to take IDs to remove specific instrumentation points because you may want to remove only 1 handler for a function but leave the rest.
I can make REMOVE to take both a specific ID and also an identifier/string that removes all handlers that match that function if you think it's worth it.
There was a problem hiding this comment.
It will be nice to support removing by identifier/string, but my point here was that the exception is unexpected, it should be some kind of parsing error
| case Type::INSTRUMENT_ADD: | ||
| { | ||
| ASTPtr temporary_identifier; | ||
| if (ParserIdentifier{}.parse(pos, temporary_identifier, expected)) |
There was a problem hiding this comment.
Also any reasons for using identifier over string literals?
There was a problem hiding this comment.
I think that when we thought about this, Alina and I were happy considering function names as identifiers the same way we consider table or database names. Now, I agree it's more natural to think of function names as literals. I'm going to do a follow-up PR with UX improvements based on your feedback to make this easier to use.
|
Also right now it binds to first symbol only, while debuggers usually bind to all symbols, i.e. we have a few |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add instrumentation at runtime using XRay to debug issues in production and profile deterministically. Resolves #74249
Documentation entry for user-facing changes
Details
Add instrumentation to ClickHouse using LLVM's XRay feature to debug issues in production without needing to rebuild. This is a reworked PR to bring the work that @htuall started at #77153 to be production-ready.
Please take a look at the original issue to understand better what this is about. The TL;DR is that we can leverage LLVM's XRay to add a prolog and an epilog when entering and exiting functions that are longer than 200 instructions (by default) to execute extra handlers.
SYSTEM INSTRUMENTstatement support to add and remove instrumentation pointssystem.instrumentationsystem.trace_logto query instrumentation log entriesxray_instr_maponly when there's a need to patch a functionentryor onexitof the functionCloses #74249
Binary size comparison shows a ~4.5% increase for x86_64 and ~7% increase for aarch64.