Uncaught exception on HDFS write error #24117

@cyberhuman

Description

Please provide the following information whenever possible.

Describe the bug
When a write to HDFS fails, the ClickHouse server aborts instead of handling the error gracefully.

Does it reproduce on recent release?
21.3.9.1

How to reproduce

  • Which ClickHouse server version to use
    21.3.9.1

  • Which interface to use, if matters
    clickhouse-client / HTTP

  • Non-default settings, if any

  • CREATE TABLE statements for all tables involved

  • Sample data for all these tables, use clickhouse-obfuscator if necessary

  • Queries to run that lead to unexpected result

```sql
INSERT INTO FUNCTION hdfs('hdfs://nodename/file.zstd', 'JSONEachRow', '<columns>') SELECT <columns> FROM bigtable ...
```

Expected behavior
The write error should be reported to the client as an ordinary query error; it must not be fatal to the server.

Error message and/or stacktrace

```
2021.05.14 06:59:26.682799 [ 14228 ] {980b20c2-efad-4a27-8d1c-063a9b67ddba} <Error> executeQuery: Code: 210, e.displayText() = DB::Exception: Fail to write HDFS file: hdfs://nodename/file.zstd Cannot add datanode; src=/file.zstd, blk=BP-694406155-127.0.1.1-1602643540146:blk_1273853584_200146320. Name node is in safe mode.
The reported blocks 0 needs additional 25777759 blocks to reach the threshold 0.9990 of total blocks 25803563.
The number of live datanodes 5 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:nodename
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1438)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1425)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2707)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:903)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
: While executing SinkToOutputStream (version 21.3.9.1) (from 1.1.1.1:12345) (in query: INSERT INTO FUNCTION hdfs('hdfs://nodename/file.zstd', 'JSONEachRow', '`column` String....') SELECT column, ...
2021.05.14 06:59:26.683873 [ 14228 ] {980b20c2-efad-4a27-8d1c-063a9b67ddba} <Error> TCPHandler: Code: 210, e.displayText() = DB::Exception: Fail to write HDFS file: hdfs://nodename/file.zstd Cannot add datanode; src=/file.zstd, blk=BP-694406155-127.0.1.1-1602643540146:blk_1273853584_200146320. Name node is in safe mode.
The reported blocks 0 needs additional 25777759 blocks to reach the threshold 0.9990 of total blocks 25803563.
The number of live datanodes 5 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:nodename
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1438)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1425)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2707)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:903)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
: While executing SinkToOutputStream, Stack trace:

0. DB::WriteBufferFromHDFS::WriteBufferFromHDFSImpl::write(char const*, unsigned long) const @ 0xe745152 in /usr/bin/clickhouse
1. DB::WriteBufferFromHDFS::nextImpl() @ 0xe74503f in /usr/bin/clickhouse
2. DB::ZstdDeflatingWriteBuffer::nextImpl() @ 0xe742488 in /usr/bin/clickhouse
3. DB::writeJSONString(char const*, char const*, DB::WriteBuffer&, DB::FormatSettings const&) @ 0xab9a2da in /usr/bin/clickhouse
4. DB::JSONEachRowRowOutputFormat::writeField(DB::IColumn const&, DB::IDataType const&, unsigned long) @ 0xfb30c32 in /usr/bin/clickhouse
5. DB::IRowOutputFormat::write(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn> > > const&, unsigned long) @ 0xfafcab1 in /usr/bin/clickhouse
6. DB::IRowOutputFormat::consume(DB::Chunk) @ 0xfafc512 in /usr/bin/clickhouse
7. DB::IOutputFormat::write(DB::Block const&) @ 0xfac6a43 in /usr/bin/clickhouse
8. DB::MaterializingBlockOutputStream::write(DB::Block const&) @ 0xfa735a2 in /usr/bin/clickhouse
9. DB::PushingToViewsBlockOutputStream::write(DB::Block const&) @ 0xedd0794 in /usr/bin/clickhouse
10. DB::AddingDefaultBlockOutputStream::write(DB::Block const&) @ 0xedda3eb in /usr/bin/clickhouse
11. DB::SquashingBlockOutputStream::write(DB::Block const&) @ 0xedda954 in /usr/bin/clickhouse
12. DB::CountingBlockOutputStream::write(DB::Block const&) @ 0xecf61de in /usr/bin/clickhouse
13. DB::SinkToOutputStream::consume(DB::Chunk) @ 0xfc2f076 in /usr/bin/clickhouse
14. DB::ISink::work() @ 0xfa7e475 in /usr/bin/clickhouse
15. ? @ 0xfaba24d in /usr/bin/clickhouse
16. DB::PipelineExecutor::executeStepImpl(unsigned long, unsigned long, std::__1::atomic<bool>*) @ 0xfab6e71 in /usr/bin/clickhouse
17. DB::PipelineExecutor::executeImpl(unsigned long) @ 0xfab4dc6 in /usr/bin/clickhouse
18. ? @ 0xfac25c2 in /usr/bin/clickhouse
19. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x886ddbf in /usr/bin/clickhouse
20. ? @ 0x8871703 in /usr/bin/clickhouse
21. start_thread @ 0x7fa3 in /usr/lib/x86_64-linux-gnu/libpthread-2.28.so
22. __clone @ 0xf94cf in /usr/lib/x86_64-linux-gnu/libc-2.28.so
2021.05.14 06:59:26.687604 [ 65545 ] {53e04e14-3971-48c4-888c-3d1848959ecb} <Information> executeQuery: Read 97280 rows, 10.94 MiB in 0.261624115 sec., 371831 rows/sec., 41.83 MiB/sec.
2021.05.14 06:59:26.691214 [ 90001 ] {} <Fatal> BaseDaemon: (version 21.3.9.1, build id: 9E362C63BE532FC5B7F00105DB411A70D5DB9FCD) (from thread 14228) Terminate called for uncaught exception:
Code: 210, e.displayText() = DB::Exception: Fail to write HDFS file: hdfs://nodename/file.zstd Cannot add datanode; src=/file.zstd, blk=BP-694406155-127.0.1.1-1602643540146:blk_1273853584_200146320. Name node is in safe mode.
The reported blocks 0 needs additional 25777759 blocks to reach the threshold 0.9990 of total blocks 25803563.
The number of live datanodes 5 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:nodename
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1438)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1425)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2707)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:903)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
, Stack trace (when copying this message, always include the lines below):

0. DB::WriteBufferFromHDFS::WriteBufferFromHDFSImpl::write(char const*, unsigned long) const @ 0xe745152 in /usr/bin/clickhouse
1. DB::WriteBufferFromHDFS::nextImpl() @ 0xe74503f in /usr/bin/clickhouse
2. DB::WriteBuffer::finalize() @ 0x88957ff in /usr/bin/clickhouse
3. DB::ZstdDeflatingWriteBuffer::~ZstdDeflatingWriteBuffer() @ 0xe741c84 in /usr/bin/clickhouse
4. DB::ZstdDeflatingWriteBuffer::~ZstdDeflatingWriteBuffer() @ 0xe742409 in /usr/bin/clickhouse
5. ? @ 0xe736e41 in /usr/bin/clickhouse
6. DB::PushingToViewsBlockOutputStream::~PushingToViewsBlockOutputStream() @ 0xedd502f in /usr/bin/clickhouse
7. DB::AddingDefaultBlockOutputStream::~AddingDefaultBlockOutputStream() @ 0xedda6aa in /usr/bin/clickhouse
8. DB::SquashingBlockOutputStream::~SquashingBlockOutputStream() @ 0xeddacab in /usr/bin/clickhouse
9. DB::CountingBlockOutputStream::~CountingBlockOutputStream() @ 0xecf6569 in /usr/bin/clickhouse
10. DB::SinkToOutputStream::~SinkToOutputStream() @ 0xfc2f176 in /usr/bin/clickhouse
11. DB::QueryPipeline::reset() @ 0xfa9537d in /usr/bin/clickhouse
12. DB::BlockIO::reset() @ 0xea21636 in /usr/bin/clickhouse
13. DB::BlockIO::operator=(DB::BlockIO&&) @ 0xea216bc in /usr/bin/clickhouse
14. DB::QueryState::operator=(DB::QueryState&&) @ 0xfa52c29 in /usr/bin/clickhouse
15. DB::TCPHandler::runImpl() @ 0xfa4087c in /usr/bin/clickhouse
16. DB::TCPHandler::run() @ 0xfa52229 in /usr/bin/clickhouse
17. Poco::Net::TCPServerConnection::start() @ 0x1210755f in /usr/bin/clickhouse
18. Poco::Net::TCPServerDispatcher::run() @ 0x12108f71 in /usr/bin/clickhouse
19. Poco::PooledThread::run() @ 0x1223f699 in /usr/bin/clickhouse
20. Poco::ThreadImpl::runnableEntry(void*) @ 0x1223b4fa in /usr/bin/clickhouse
21. start_thread @ 0x7fa3 in /usr/lib/x86_64-linux-gnu/libpthread-2.28.so
22. __clone @ 0xf94cf in /usr/lib/x86_64-linux-gnu/libc-2.28.so
 (version 21.3.9.1)
2021.05.14 06:59:26.720383 [ 104099 ] {} <Fatal> BaseDaemon: ########################################
2021.05.14 06:59:26.720689 [ 104099 ] {} <Fatal> BaseDaemon: (version 21.3.9.1, build id: 9E362C63BE532FC5B7F00105DB411A70D5DB9FCD) (from thread 14228) (query_id: 980b20c2-efad-4a27-8d1c-063a9b67ddba) Received signal Aborted (6)
2021.05.14 06:59:26.720727 [ 104099 ] {} <Fatal> BaseDaemon: 
2021.05.14 06:59:26.720783 [ 104099 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f3217dfe7bb 0x7f3217de9535 0x89e94d8 0x13c518c3 0x13c5186c 0x882d25b 0xe7421d9 0xe742409 0xe736e41 0xedd502f 0xedda6aa 0xeddacab 0xecf6569 0xfc2f176 0xfa9537d 0xea21636 0xea216bc 0xfa52c29 0xfa4087c 0xfa52229 0x1210755f 0x12108f71 0x1223f699 0x1223b4fa 0x7f3217f8ffa3 0x7f3217ec04cf
2021.05.14 06:59:26.720935 [ 104099 ] {} <Fatal> BaseDaemon: 1. raise @ 0x377bb in /usr/lib/x86_64-linux-gnu/libc-2.28.so
2021.05.14 06:59:26.720952 [ 104099 ] {} <Fatal> BaseDaemon: 2. abort @ 0x22535 in /usr/lib/x86_64-linux-gnu/libc-2.28.so
2021.05.14 06:59:26.720989 [ 104099 ] {} <Fatal> BaseDaemon: 3. ? @ 0x89e94d8 in /usr/bin/clickhouse
2021.05.14 06:59:26.721007 [ 104099 ] {} <Fatal> BaseDaemon: 4. ? @ 0x13c518c3 in ?
2021.05.14 06:59:26.721040 [ 104099 ] {} <Fatal> BaseDaemon: 5. std::terminate() @ 0x13c5186c in ?
2021.05.14 06:59:26.721056 [ 104099 ] {} <Fatal> BaseDaemon: 6. ? @ 0x882d25b in /usr/bin/clickhouse
2021.05.14 06:59:26.721077 [ 104099 ] {} <Fatal> BaseDaemon: 7. DB::ZstdDeflatingWriteBuffer::~ZstdDeflatingWriteBuffer() @ 0xe7421d9 in /usr/bin/clickhouse
2021.05.14 06:59:26.721092 [ 104099 ] {} <Fatal> BaseDaemon: 8. DB::ZstdDeflatingWriteBuffer::~ZstdDeflatingWriteBuffer() @ 0xe742409 in /usr/bin/clickhouse
2021.05.14 06:59:26.721109 [ 104099 ] {} <Fatal> BaseDaemon: 9. ? @ 0xe736e41 in /usr/bin/clickhouse
2021.05.14 06:59:26.721135 [ 104099 ] {} <Fatal> BaseDaemon: 10. DB::PushingToViewsBlockOutputStream::~PushingToViewsBlockOutputStream() @ 0xedd502f in /usr/bin/clickhouse
2021.05.14 06:59:26.721150 [ 104099 ] {} <Fatal> BaseDaemon: 11. DB::AddingDefaultBlockOutputStream::~AddingDefaultBlockOutputStream() @ 0xedda6aa in /usr/bin/clickhouse
2021.05.14 06:59:26.721162 [ 104099 ] {} <Fatal> BaseDaemon: 12. DB::SquashingBlockOutputStream::~SquashingBlockOutputStream() @ 0xeddacab in /usr/bin/clickhouse
2021.05.14 06:59:26.721173 [ 104099 ] {} <Fatal> BaseDaemon: 13. DB::CountingBlockOutputStream::~CountingBlockOutputStream() @ 0xecf6569 in /usr/bin/clickhouse
2021.05.14 06:59:26.721189 [ 104099 ] {} <Fatal> BaseDaemon: 14. DB::SinkToOutputStream::~SinkToOutputStream() @ 0xfc2f176 in /usr/bin/clickhouse
2021.05.14 06:59:26.721210 [ 104099 ] {} <Fatal> BaseDaemon: 15. DB::QueryPipeline::reset() @ 0xfa9537d in /usr/bin/clickhouse
2021.05.14 06:59:26.721222 [ 104099 ] {} <Fatal> BaseDaemon: 16. DB::BlockIO::reset() @ 0xea21636 in /usr/bin/clickhouse
2021.05.14 06:59:26.721242 [ 104099 ] {} <Fatal> BaseDaemon: 17. DB::BlockIO::operator=(DB::BlockIO&&) @ 0xea216bc in /usr/bin/clickhouse
2021.05.14 06:59:26.721255 [ 104099 ] {} <Fatal> BaseDaemon: 18. DB::QueryState::operator=(DB::QueryState&&) @ 0xfa52c29 in /usr/bin/clickhouse
2021.05.14 06:59:26.721264 [ 104099 ] {} <Fatal> BaseDaemon: 19. DB::TCPHandler::runImpl() @ 0xfa4087c in /usr/bin/clickhouse
2021.05.14 06:59:26.721272 [ 104099 ] {} <Fatal> BaseDaemon: 20. DB::TCPHandler::run() @ 0xfa52229 in /usr/bin/clickhouse
2021.05.14 06:59:26.721284 [ 104099 ] {} <Fatal> BaseDaemon: 21. Poco::Net::TCPServerConnection::start() @ 0x1210755f in /usr/bin/clickhouse
2021.05.14 06:59:26.721291 [ 104099 ] {} <Fatal> BaseDaemon: 22. Poco::Net::TCPServerDispatcher::run() @ 0x12108f71 in /usr/bin/clickhouse
2021.05.14 06:59:26.721301 [ 104099 ] {} <Fatal> BaseDaemon: 23. Poco::PooledThread::run() @ 0x1223f699 in /usr/bin/clickhouse
2021.05.14 06:59:26.721314 [ 104099 ] {} <Fatal> BaseDaemon: 24. Poco::ThreadImpl::runnableEntry(void*) @ 0x1223b4fa in /usr/bin/clickhouse
2021.05.14 06:59:26.721341 [ 104099 ] {} <Fatal> BaseDaemon: 25. start_thread @ 0x7fa3 in /usr/lib/x86_64-linux-gnu/libpthread-2.28.so
2021.05.14 06:59:26.721353 [ 104099 ] {} <Fatal> BaseDaemon: 26. __clone @ 0xf94cf in /usr/lib/x86_64-linux-gnu/libc-2.28.so
```

Metadata

Labels

bug (Confirmed user-visible misbehaviour in official release)
