Procedure to compute table stats#10986
Conversation
|
@aokolnychyi @szehon-ho Can you help review this please |
| sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName); | ||
| List<Object[]> result = | ||
| sql("CALL %s.system.compute_table_stats('%s')", catalogName, tableIdent); | ||
| Assertions.assertTrue(result.isEmpty()); |
There was a problem hiding this comment.
| Assertions.assertTrue(result.isEmpty()); | |
| assertThat(result).isEmpty(); |
please use the AssertJ assertions and we generally try to avoid ussing assertTrue/assertFalse as it doesn't provide enough context about the actual/expected when an assertion fails
| sql( | ||
| "CALL %s.system.compute_table_stats(table => '%s', columns => array('id'))", | ||
| catalogName, tableIdent); | ||
| Assertions.assertNotNull(output.get(0)); |
There was a problem hiding this comment.
| Assertions.assertNotNull(output.get(0)); | |
| assertThat(output.get(0)).isNotNull(); |
should this maybe also assert some more details here?
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
| IllegalArgumentException.class, | ||
| () -> | ||
| sql( | ||
| "CALL %s.system.compute_table_stats(table => '%s', snapshot_id => %dL)", |
There was a problem hiding this comment.
maybe also add a test with an invalid/non-existing table name
There was a problem hiding this comment.
@karuppayya can you please add a test with an invalid/non-existing table?
There was a problem hiding this comment.
Added this test.
| sql( | ||
| "CALL %s.system.compute_table_stats('%s', %dL)", | ||
| catalogName, tableIdent, snapshot.snapshotId()); | ||
| Assertions.assertNotNull(output.get(0)); |
|
I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition stats. |
de6f331 to
2b6e107
Compare
| sql( | ||
| "CALL %s.system.compute_table_stats('%s', %dL)", | ||
| catalogName, tableIdent, snapshot.snapshotId()); | ||
| assertThat(output.get(0)).isNotNull(); |
There was a problem hiding this comment.
I think it would be good to check whether the output has some valid results here and in the other test
nastra
left a comment
There was a problem hiding this comment.
LGTM, but I left a few comments around testing that would be good to address
2b6e107 to
c2790a8
Compare
|
@aokolnychyi Can this merged now? |
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Show resolved
Hide resolved
c2790a8 to
1cb1ad0
Compare
(cherry picked from commit 49e668e687e0a5d77652405ba14bdc3ee592f261)
5def103 to
62c826f
Compare
...nsions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
|
This looks good to me, will merge tomorrow if no additional comments |
I think single analyze procedure is fine. We can merge this PR. I will do a follow up to accept list of ENUMs to compute the required type of stats (default compute everything). We need to also modify spark action to compute partition stats. |
| mapBuilder.put("create_changelog_view", CreateChangelogViewProcedure::builder); | ||
| mapBuilder.put("rewrite_position_delete_files", RewritePositionDeleteFilesProcedure::builder); | ||
| mapBuilder.put("fast_forward", FastForwardBranchProcedure::builder); | ||
| mapBuilder.put("compute_table_stats", ComputeTableStatsProcedure::builder); |
There was a problem hiding this comment.
There was a problem hiding this comment.
Thanks @ajantha-bhat ,
I will create a follow up PR for the doc changes, just to keep this PR focussed on the procedure.
|
I'd love to review today as well. |
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Outdated
Show resolved
Hide resolved
...v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/ComputeTableStatsProcedure.java
Show resolved
Hide resolved
aokolnychyi
left a comment
There was a problem hiding this comment.
Minor styling suggestions. LGTM.
|
Looks like all comment addressed, merged, can do a follow up if more. Thanks @karuppayya , and also @aokolnychyi @ajantha-bhat @nastra for addition reviews! |
No description provided.