[Bug]: ElasticsearchIO connector does not properly estimate index size #24117

In the process of investigating the issue reported here:
https://stackoverflow.com/questions/74390325/how-to-enable-elasticsearchio-parallel-reads-in-apache-beam

it appears that the method the ElasticsearchIO connector uses to estimate the size of the data in the response does not account for the case where the configured index is an alias, a data stream, or an index pattern, any of which can point to multiple indices.

The original issue was that a query returning over 100 million documents for processing in the pipeline was unable to scale and was being processed at a rate of only 40 per second.

As discussed in the Stack Overflow thread, the code here: https://github.com/apache/beam/blob/c7f2cab6ea30a63e04847dc45047a8193abc9552/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L871

does not properly account for a number of scenarios where the index name returned by Elasticsearch is different from `connectionConfiguration.getIndex()`.

Elasticsearch should be relied upon to return the proper indices for a given stats query, so the `_all` object should be used instead of the top-level `indices` object. If there are cases where the `_all` object isn't appropriate, the code should iterate over all of the indices returned under the `indices` field and sum their store sizes, rather than trying to match only the configured index.
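To illustrate the difference, here is a minimal sketch of the three lookup strategies. It is not the connector's actual code. The class name, method names, and index names are hypothetical, the parsed stats response is modeled as nested `Map`s rather than a JSON tree so that the example needs no external library, and `primaries.store.size_in_bytes` is assumed to be the size field of interest.

```java
import java.util.Map;

// Hypothetical sketch: not part of ElasticsearchIO.
final class StatsSizeSketch {

  // Follows a chain of keys through nested maps; returns null if any step is missing.
  static Object path(Object root, String... keys) {
    Object node = root;
    for (String key : keys) {
      if (!(node instanceof Map)) {
        return null;
      }
      node = ((Map<?, ?>) node).get(key);
    }
    return node;
  }

  // Treats a missing or non-numeric value as 0 bytes.
  static long asLong(Object value) {
    return value instanceof Number ? ((Number) value).longValue() : 0L;
  }

  // Problematic pattern: look up the configured name under "indices".
  // For an alias, data stream, or pattern, that key does not exist.
  static long sizeByConfiguredName(Map<String, Object> stats, String configuredIndex) {
    return asLong(path(stats, "indices", configuredIndex, "primaries", "store", "size_in_bytes"));
  }

  // Preferred option from the issue: read the aggregate under "_all".
  static long sizeFromAll(Map<String, Object> stats) {
    return asLong(path(stats, "_all", "primaries", "store", "size_in_bytes"));
  }

  // Fallback option from the issue: sum every entry under "indices".
  static long sizeFromIndices(Map<String, Object> stats) {
    Object indices = stats.get("indices");
    if (!(indices instanceof Map)) {
      return 0L;
    }
    long total = 0L;
    for (Object perIndex : ((Map<?, ?>) indices).values()) {
      total += asLong(path(perIndex, "primaries", "store", "size_in_bytes"));
    }
    return total;
  }

  // Builds the per-index (or "_all") fragment of a made-up stats response.
  static Map<String, Object> indexStats(long bytes) {
    return Map.of("primaries", Map.of("store", Map.of("size_in_bytes", bytes)));
  }

  public static void main(String[] args) {
    // Made-up response for a hypothetical alias "logs" backed by two indices.
    Map<String, Object> stats = Map.of(
        "_all", indexStats(350L),
        "indices", Map.of(
            "logs-000001", indexStats(100L),
            "logs-000002", indexStats(250L)));

    System.out.println(sizeByConfiguredName(stats, "logs")); // 0: no key named "logs"
    System.out.println(sizeFromAll(stats));                  // 350
    System.out.println(sizeFromIndices(stats));              // 350
  }
}
```

With the name-based lookup, the estimate for the alias comes out as 0 bytes even though the backing indices hold data, which is consistent with the lack of parallelism described above. Both alternatives return the combined size.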

### Issue Priority

Priority: 2

### Issue Component

Component: io-java-elasticsearch
