Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Sep 27, 2025

What changes were proposed in this pull request?

This PR aims to fix Timezone to support legacy US TimeZone identifiers.

Why are the changes needed?

Since Ubuntu 24.04 and Debian 13 doesn't provide old /usr/share/zoneinfo/US/* files, ORC C++ library fails with the following error by default. It's misleading because both recent IANA timezone database and TZDIR cannot resolve this issue. We had better provide a workaround via aliases.

C++ exception with description "Time zone file /usr/share/zoneinfo/US/Pacific does not exist.
Please install IANA time zone database and set TZDIR env." thrown in the test body.

Although there are many legacy timezone identifies, this PR aims to focus on US issues. For the rest of the code, we can handle it later based on the usage.

How was this patch tested?

Pass the CIs and manually run a docker test without these lines.

RUN ln -fs /usr/share/zoneinfo/America/Los_Angeles /usr/share/zoneinfo/US/Pacific

RUN ln -fs /usr/share/zoneinfo/America/Los_Angeles /usr/share/zoneinfo/US/Pacific

I verified locally with the revised Debian 13 image.

$ docker run -it --rm apache/orc-dev:debian13 ls -al /usr/share/zoneinfo/US
ls: cannot access '/usr/share/zoneinfo/US': No such file or directory

$ ./run-one.sh local x debian13
Started local run for ORC-2011 on debian13 at Fri Sep 26 21:54:25 PDT 2025
-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
...

Test project /root/build
    Start 1: orc-test
1/9 Test #1: orc-test .........................   Passed    7.24 sec
    Start 2: java-test
2/9 Test #2: java-test ........................   Passed  110.33 sec
    Start 3: java-examples-test
3/9 Test #3: java-examples-test ...............   Passed    0.37 sec
    Start 4: java-tools-test
4/9 Test #4: java-tools-test ..................   Passed    0.06 sec
    Start 5: java-bench-gen-test
5/9 Test #5: java-bench-gen-test ..............   Passed    0.71 sec
    Start 6: java-bench-scan-test
6/9 Test #6: java-bench-scan-test .............   Passed    0.66 sec
    Start 7: java-bench-hive-test
7/9 Test #7: java-bench-hive-test .............   Passed   11.14 sec
    Start 8: java-bench-spark-test
8/9 Test #8: java-bench-spark-test ............   Passed  214.61 sec
    Start 9: tool-test
9/9 Test #9: tool-test ........................   Passed    5.00 sec

100% tests passed, 0 tests failed out of 9

Total Test time (real) = 350.16 sec
Built target test-out
Finished debian13 at Fri Sep 26 22:06:39 PDT 2025

Please note that the test coverage should be added separately. In other words, the docker images should be updated selectively and gradually after this PR because the images are shared among multiple ORC branches. Since Debian 13 is added newly for main and branch-2.2 only, I'm planning to update the following after merging this PR to have a test coverage for this feature.

RUN ln -fs /usr/share/zoneinfo/America/Los_Angeles /usr/share/zoneinfo/US/Pacific

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CPP label Sep 27, 2025
@dongjoon-hyun dongjoon-hyun changed the title ORC-2011: Fix Timezone to support legacy US TimeZone identifies ORC-2011: Fix Timezone to support legacy US TimeZone identifiers Sep 27, 2025
@dongjoon-hyun dongjoon-hyun added this to the 2.2.1 milestone Sep 27, 2025
@dongjoon-hyun dongjoon-hyun marked this pull request as draft September 27, 2025 04:31
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review September 27, 2025 05:03
@dongjoon-hyun dongjoon-hyun changed the title ORC-2011: Fix Timezone to support legacy US TimeZone identifiers ORC-2011: [C++] Fix Timezone to support legacy US TimeZone identifiers Sep 27, 2025
@dongjoon-hyun
Copy link
Member Author

Could you review this when you have some time, @wgtmac ?

return *(itr->second).get();
}
timezoneCache[filename] = std::make_shared<LazyTimezone>(filename);
auto it = TZ_ALIASES.find(zone);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overhead occurs once per zone because we fill timezoneCache only one time and reuse it forever.

@dongjoon-hyun
Copy link
Member Author

I hope we can deliver this via Apache ORC 2.2.1 first for Debian 13 and Ubuntu 24.04 out-of-box support.

dongjoon-hyun added a commit that referenced this pull request Sep 27, 2025
…fiers

### What changes were proposed in this pull request?

This PR aims to fix `Timezone` to support legacy `US` TimeZone identifiers.

### Why are the changes needed?

Since `Ubuntu 24.04` and `Debian 13` doesn't provide old `/usr/share/zoneinfo/US/*` files, ORC C++ library fails with the following error by default. It's misleading because both recent `IANA timezone database` and `TZDIR` cannot resolve this issue. We had better provide a workaround via aliases.

> C++ exception with description "Time zone file /usr/share/zoneinfo/US/Pacific does not exist.
> Please install IANA time zone database and set TZDIR env." thrown in the test body.

Although there are many legacy timezone identifies, this PR aims to focus on `US` issues. For the rest of the code, we can handle it later based on the usage.

- https://data.iana.org/time-zones/tzdb/backward

### How was this patch tested?

Pass the CIs and manually run a docker test without these lines.

https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/ubuntu24/Dockerfile#L58

https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40

I verified locally with the revised `Debian 13` image.

```
$ docker run -it --rm apache/orc-dev:debian13 ls -al /usr/share/zoneinfo/US
ls: cannot access '/usr/share/zoneinfo/US': No such file or directory

$ ./run-one.sh local x debian13
Started local run for ORC-2011 on debian13 at Fri Sep 26 21:54:25 PDT 2025
-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
...

Test project /root/build
    Start 1: orc-test
1/9 Test #1: orc-test .........................   Passed    7.24 sec
    Start 2: java-test
2/9 Test #2: java-test ........................   Passed  110.33 sec
    Start 3: java-examples-test
3/9 Test #3: java-examples-test ...............   Passed    0.37 sec
    Start 4: java-tools-test
4/9 Test #4: java-tools-test ..................   Passed    0.06 sec
    Start 5: java-bench-gen-test
5/9 Test #5: java-bench-gen-test ..............   Passed    0.71 sec
    Start 6: java-bench-scan-test
6/9 Test #6: java-bench-scan-test .............   Passed    0.66 sec
    Start 7: java-bench-hive-test
7/9 Test #7: java-bench-hive-test .............   Passed   11.14 sec
    Start 8: java-bench-spark-test
8/9 Test #8: java-bench-spark-test ............   Passed  214.61 sec
    Start 9: tool-test
9/9 Test #9: tool-test ........................   Passed    5.00 sec

100% tests passed, 0 tests failed out of 9

Total Test time (real) = 350.16 sec
Built target test-out
Finished debian13 at Fri Sep 26 22:06:39 PDT 2025
```

Please note that the test coverage should be added separately. In other words, the docker images should be updated **selectively and gradually** after this PR because the images are shared among multiple ORC branches. Since `Debian 13` is added newly for `main` and `branch-2.2` only, I'm planning to update the following after merging this PR to have a test coverage for this feature.

https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #2422 from dongjoon-hyun/ORC-2011.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3c89afe)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member Author

Merged to main/2.2.

@dongjoon-hyun dongjoon-hyun deleted the ORC-2011 branch September 27, 2025 15:04
@wgtmac
Copy link
Member

wgtmac commented Sep 29, 2025

Sorry for missing this and thanks for fixing it! @dongjoon-hyun

@dongjoon-hyun
Copy link
Member Author

Thank you so much for reviewing this, @wgtmac !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants