This repository was archived by the owner on Jan 24, 2024. It is now read-only.

[BUG] KoP cannot support a FETCH request with many partitions #544

@BewareMyPower

Description


Describe the bug
When I run performance tests on a topic with 100 partitions, the topic quickly becomes unavailable. What's worse, the whole KoP cluster breaks, because in this situation no bookie is available. So even a new topic created afterwards is still unavailable, failing with "Not enough non-faulty bookies available".

15:07:19.134 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (177, 15): Bookie operation timeout
15:07:19.143 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [Bookie:10.0.0.220:3181, Bookie:10.0.0.198:3181, Bookie:10.0.0.228:3181], allBookies [Bookie:10.0.0.228:3181, Bookie:10.0.0.198:3181, Bookie:10.0.0.220:3181].
15:07:19.143 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [Bookie:10.0.0.220:3181, Bookie:10.0.0.198:3181, Bookie:10.0.0.228:3181], fallback to choose bookie randomly from the cluster.
...
15:07:19.143 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=177,loopId=4ef321e8) Exception updating
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available

The root cause is the "Bookie operation timeout", which appears to lead to all bookies being excluded.

After that, the producer might still be able to write some messages, but there is a possibility that corrupted messages are written to BK, see #442 for reference. A direct memory leak then happens as well. #542 handles these corrupted messages.

When there's no connected consumer, this case won't happen.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a KoP cluster (standalone may be easier to reproduce with)
  2. Create a topic with 100 partitions
  3. Run a producer and a consumer. Even if the produce rate is not so large (like 10 MB/s), the topic quickly becomes unavailable.

Expected behavior
Bookie operation timeout should not happen.

Additional context
From my debugging, it may be caused by MessageIdUtils#getPositionForOffset:

public static PositionImpl getPositionForOffset(ManagedLedger managedLedger, Long offset) {
    try {
        return (PositionImpl) managedLedger.asyncFindPosition(new OffsetSearchPredicate(offset)).get();
    } catch (InterruptedException | ExecutionException e) {
        log.error("[{}] Failed to find position for offset {}", managedLedger.getName(), offset);
        throw new RuntimeException(managedLedger.getName() + " failed to find position for offset " + offset);
    }
}

The call stacks are

KafkaTopicConsumerManager#remove
  KafkaTopicConsumerManager#createCursorIfNotExists
    KafkaTopicConsumerManager#getCursorByOffset
      MessageIdUtils#getPositionForOffset

getPositionForOffset is a synchronous method that blocks until a CompletableFuture completes. However, it is called inside a CompletableFuture callback (thenAccept) when the ManagedCursor is not cached.
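To illustrate the hazard, here is a minimal self-contained sketch (not KoP code, class and method names are made up): a callback running on a single-threaded ordered executor blocks on a future whose completing task is queued on that same executor, so the blocking get() can only time out. This is analogous to getPositionForOffset blocking inside thenAccept.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CallbackBlockingSketch {

    static String simulate() throws Exception {
        // Single thread stands in for one ordered-executor thread.
        ExecutorService ordered = Executors.newSingleThreadExecutor();
        CompletableFuture<String> cursorFuture = new CompletableFuture<>();
        CompletableFuture<Long> positionFuture = new CompletableFuture<>();
        CompletableFuture<String> outcome = new CompletableFuture<>();

        cursorFuture.thenAcceptAsync(cursor -> {
            // The task that completes positionFuture is queued BEHIND this
            // callback on the same single-threaded executor, so it cannot
            // run while we block below.
            ordered.submit(() -> positionFuture.complete(42L));
            try {
                // Blocking the only thread: the queued task never runs in time.
                positionFuture.get(500, TimeUnit.MILLISECONDS);
                outcome.complete("found position");
            } catch (Exception e) {
                outcome.complete("timed out waiting for position");
            }
        }, ordered);

        cursorFuture.complete("cursor");
        String result = outcome.get(5, TimeUnit.SECONDS);
        ordered.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate());
    }
}
```

In KoP the stall shows up as a "Bookie operation timeout" rather than a clean deadlock, but the mechanism sketched here is the same: a thread that should be servicing async completions is parked in a blocking get().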

To verify this, I added a quick workaround to getPositionForOffset:

public static PositionImpl getPositionForOffset(ManagedLedger managedLedger, Long offset) {
    if (offset == 0L) {
        return ((ManagedLedgerImpl) managedLedger).getFirstPosition();
    }
    // ...

Then I succeeded in testing a topic with 100 partitions for a few minutes. However, if the produce rate was too high (like 800 MB/s), the bug could still happen after about one minute. It may be caused by a namespace unload, after which getPositionForOffset could be called again with a non-zero offset.
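For reference, a hedged sketch of the non-blocking shape the call path could take (the Position record and asyncFindPosition below are hypothetical stand-ins for the Pulsar API, not the real types): return the future from asyncFindPosition and let callers compose on it, instead of calling get() on an ordered-executor thread.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncPositionSketch {
    // Hypothetical stand-in for Pulsar's PositionImpl, for illustration only.
    record Position(long ledgerId, long entryId) {}

    // Hypothetical stand-in for managedLedger.asyncFindPosition(...),
    // which is already asynchronous in the real API.
    static CompletableFuture<Position> asyncFindPosition(long offset) {
        return CompletableFuture.supplyAsync(() -> new Position(177L, offset));
    }

    // Instead of a blocking get(), expose the future so callers such as
    // getCursorByOffset can chain on it without parking an executor thread.
    static CompletableFuture<Position> getPositionForOffsetAsync(long offset) {
        return asyncFindPosition(offset);
    }

    public static void main(String[] args) {
        // Caller composes asynchronously; no thread blocks in a callback.
        getPositionForOffsetAsync(15L)
                .thenAccept(pos -> System.out.println("entryId = " + pos.entryId()))
                .join();
    }
}
```

Whether this restructuring is feasible in KafkaTopicConsumerManager depends on the surrounding call sites; it is shown only to illustrate the non-blocking alternative to the synchronous getPositionForOffset above.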

Here are the logs. There are no warn/error logs in bookie.log.

bookie.log
broker.log
