Skip to content

Comments

Support DATE_NANOS in LOOKUP JOIN#127962

Merged
craigtaverner merged 23 commits intoelastic:mainfrom
craigtaverner:lookup_join_date_nanos
Jun 4, 2025
Merged

Support DATE_NANOS in LOOKUP JOIN#127962
craigtaverner merged 23 commits intoelastic:mainfrom
craigtaverner:lookup_join_date_nanos

Conversation

@craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented May 9, 2025

We reported in #127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the DateFieldMapper.DateFieldType.rangeQuery (hidden behind the termQuery function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to DateFieldType:

  • equalityQuery(Long, ...) to replace termQuery(Object, ...)
  • rangeQuery(Long, Long, ...) to replace rangeQuery(Object, Object, ...)

This allows us to pass in already parsed long values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both equalityQuery and rangeQuery is that it mimics the pattern used by the old termQuery with delegated directly down to rangeQuery. In addition to this, we hope to support range matching in LOOKUP JOIN in the near future.

Followup work:

Fixes #127249

@craigtaverner craigtaverner added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels May 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner changed the title Supporti DATE_NANOS in LOOKUP JOIN Support DATE_NANOS in LOOKUP JOIN May 9, 2025
But it seems we should not need to divide by 1M at this point and rather deal with this deeper in the stack, where there is ns specific support.
@craigtaverner craigtaverner force-pushed the lookup_join_date_nanos branch from 7ac8f48 to 657f1c0 Compare May 20, 2025 20:04
@craigtaverner craigtaverner marked this pull request as ready for review June 2, 2025 16:57
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

But this might not be that hard to add support for. We need:
* commonType to work (us wider range type DATETIME)
* refactor QueryList.dateQuery to new approach used for QueryList.dateNanos
}

/**
* When the date value is already fully parsed and available as a long, use this method to skip parsing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like {@link termQuery} but works on dates that are already parsed to a long in the same precision and the field mapper. Or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to say what method you are comparing it to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* When the date value is already fully parsed and available as a long, use this method to skip parsing.
*/
public Query rangeQuery(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth adding a couple of unit tests to DateFieldTypeTests just so we can be sure these do reasonable things. These methods are so much simpler than the others, but it'd be nice to have something really simple to read the exercises it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@craigtaverner craigtaverner added v9.0.0 v8.19.0 v9.0.3 v8.18.3 auto-backport Automatically create backport pull requests when merged labels Jun 4, 2025
@craigtaverner craigtaverner merged commit b01d552 into elastic:main Jun 4, 2025
18 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.0 Commit could not be cherrypicked due to conflicts
8.19 Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 127962

craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jun 5, 2025
We reported in elastic#127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jun 5, 2025
We reported in elastic#127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.
@craigtaverner
Copy link
Contributor Author

elasticsearchmachine pushed a commit that referenced this pull request Jun 5, 2025
* Support DATE_NANOS in LOOKUP JOIN (#127962)

We reported in #127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.

* Use correct parameter name after backport

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jun 5, 2025
* Support DATE_NANOS in LOOKUP JOIN (#127962)

We reported in #127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.

* Fix compile error after backport

* Fix failing tests after backport
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Jun 9, 2025
We reported in elastic#127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.
elasticsearchmachine pushed a commit that referenced this pull request Jun 10, 2025
)

* Integration tests for LOOKUP JOIN over wider range of data types (#126150)

This test suite tests the lookup join functionality in ESQL with various data types.

For each pair of types being tested, it builds a main index called "index" containing a single document with as many fields as types being tested on the left of the pair, and then creates that many other lookup indexes, each with a single document containing exactly two fields: the field to join on, and a field to return.

The assertion is that for valid combinations, the return result should exist, and for invalid combinations an exception should be thrown. If no exception is thrown, and no result is returned, our validation rules are not aligned with the internal behaviour (ie. a bug).

Since the `LOOKUP JOIN` command requires the match field name to be the same between the main index and the lookup index, we will have field names that correctly represent the type of the field in the main index, but not the type of the field in the lookup index. This can be confusing, but it is important to remember that the field names are not the same as the types.

* Just use one lookup-settings file

This change simplifies backports from 9.x branches where these changes were done as part of other work.

* Support DATE_NANOS in LOOKUP JOIN (#127962)

We reported in #127249, there is no support for DATE_NANOS in LOOKUP JOIN, even though DATETIME is supported. This PR attempts to fix that.

The way that date-time was supported in LOOKUP JOIN (and ENRICH) was by using the `DateFieldMapper.DateFieldType.rangeQuery` (hidden behind the `termQuery` function) which internally takes our long values, casts them to Object, renders them to a string, parses that string back into an Instant (with a bunch of fancy and unnecessary checks for date-math, etc.), and then converts that instant back into a long for the actual query. Parts of this complex process are precision aware (ie. differentiate between ms and ns dates), but not the whole process. Simply dividing the original longs by 1_000_000 before passing them in actually works, but obviously looses precision. And the only reason it works anyway is that the date parsing code will accept a string containing a simple number and interpret it as either ms since the epoch, or years if the number is short enough. This does not work for nano-second dates, and in fact is far from ideal for LOOKUP JOIN on dates which does not need to re-parse the values at all.

This complex loop only makes sense in the Query DSL, where we can get all kinds of interesting sources of range values, but seems quite crazy for LOOKUP JOIN where we will always provide the join key from a LongBlock (the backing store of the DATE_TIME DataType, and the DATE_NANOS too).

So what we do here for DateNanos is provide two new methods to `DateFieldType`:
* `equalityQuery(Long, ...)` to replace `termQuery(Object, ...)`
* `rangeQuery(Long, Long, ...)` to replace `rangeQuery(Object, Object, ...)`

This allows us to pass in already parsed `long` values, and entirely skip the conversion to strings and re-parsing logic. The new methods are based on the original methods, but considerably simplified due to the removal of the complex parsing logic. The reason for both `equalityQuery` and `rangeQuery` is that it mimics the pattern used by the old `termQuery` with delegated directly down to `rangeQuery`. In addition to this, we hope to support range matching in `LOOKUP JOIN` in the near future.

* Fix compile error after backport

* Fix compile error after backport

* Update docs/changelog/129138.yaml

* SEMANTIC_TEXT was removed in later PRs, so not really testable in 8.18.

* Delete docs/changelog/129138.yaml

* Removed incorrectly added changelog

This is a backport
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.3 v8.19.0 v9.0.0 v9.0.3 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LOOKUP JOIN on date_nanos fields finds zero matches

3 participants