Skip to content

Conversation

@CosmosNi
Copy link
Contributor

elasticsearch source support PIT

close #9149

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@hailin0 hailin0 requested a review from Copilot April 11, 2025 12:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • seatunnel-e2e/seatunnel-connector-v2-e2e/connector-elasticsearch-e2e/src/test/resources/elasticsearch/elasticsearch_source_with_pit.conf: Language not supported

Comment on lines 984 to 986
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));
Copy link

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'sortField' is assigned but never used; consider removing it to clean up the code.

Suggested change
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hailin0
Copy link
Member

hailin0 commented Apr 11, 2025

Do you have more information on PIT vs scroll?

@CosmosNi
Copy link
Contributor Author

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.

  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.

  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

.withDescription(
"Elasticsearch query language. You can control the range of data read");

public static final Option<Boolean> USE_PIT =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add new enum type named search_api_type? Support scorll and pit and set scroll by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get

@hailin0
Copy link
Member

hailin0 commented Apr 12, 2025

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.
  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.
  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

Which versions of elasticsearch can be used?

@CosmosNi
Copy link
Contributor Author

Do you have more information on PIT vs scroll?
@hailin0
Here are the advantages of PIT compared with Scroll API:

  • Better data consistency: PIT creates a consistent data view, ensuring accurate results even when the index changes during pagination.
  • Lower resource consumption: It uses snapshots to save search results, only storing document changes after the write point, thus reducing memory consumption.
  • More flexible usage: Combined with Search After, PIT allows for more stable pagination and is less affected by the number of retrieval requests.

Which versions of elasticsearch can be used?

7.10

| source | array | no | - |
| query | json | no | {"match_all": {}} |
| search_type | json | no | Search method,sql or dsl,default dsl |
| search_type | json | no | Search method,SQL、PIT、SCROLL,default SCROLL |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_type and search_api_type not same.
search_type : SQL, DSL
search_api_type: SCROLL, PIT

@CosmosNi CosmosNi requested a review from Hisoka-X April 14, 2025 06:01
Comment on lines 984 to 986
Map<String, String> sortField = new HashMap<>();
sortField.put("order", "asc");
sort.add(Collections.singletonMap("_shard_doc", "asc"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +68 to +69
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
PIT_KEEP_ALIVE,
PIT_BATCH_SIZE,
SEARCH_API_TYPE,

Comment on lines 116 to 119
}
// DSL query
else {
// Check if we should use PIT API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// DSL query
else {
// Check if we should use PIT API
} else {
// Check if we should use PIT API

Comment on lines 123 to 125
}
// Default scroll API
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// Default scroll API
else {
} else {

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CosmosNi . LGTM if ci passes.

@hailin0 hailin0 merged commit 948d588 into apache:dev Apr 16, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature][connector-elasticsearch] elasticsearch source support PIT

3 participants