ARROW-12107: [Rust][DataFusion] Support SELECT * from information_schema.columns#9840
ARROW-12107: [Rust][DataFusion] Support SELECT * from information_schema.columns#9840alamb wants to merge 5 commits intoapache:masterfrom
SELECT * from information_schema.columns#9840Conversation
SELECT * from information_schema.columns
ad1935d to
0a3663b
Compare
andygrove
left a comment
There was a problem hiding this comment.
I'm excited to see this. Thanks @alamb
I had a proof-of-concept JDBC driver working against DataFusion at one point. One day I will find time to work on this again. With the information schema support it will really help with making this work with BI tools.
0a3663b to
ec17d43
Compare
|
FYI @returnString @Dandandan and @jorgecarleitao |
jorgecarleitao
left a comment
There was a problem hiding this comment.
LGTM! Thanks a lot, this metadata part is really starting to shine! ✨
There was a problem hiding this comment.
Not it matters much, just more like an FYI: this can cause large binaries as every variation used is compiled individually.
There was a problem hiding this comment.
I can change it into just taking &str -- though I think in the case since there is just one callsite there is likely to be just one version of the code
There was a problem hiding this comment.
yeah, it was really just if you had not though about it. No need to change anything imo 👍
There was a problem hiding this comment.
In 98f702da90d18d8ae855a5eca6b0d1d6c1809551
There was a problem hiding this comment.
(which I now realize I put into #9866 rather than this PR 🤦 ) -- but it should get in that PR hopefully
returnString
left a comment
There was a problem hiding this comment.
Looks great! And selfishly, really happy to see the namespacing stuff being used to power cool new features 😄
@returnString -- the namespacing stuff is pretty cool. I didn't realize how much easier it would make my life when it came in! |
ec17d43 to
c18f0ae
Compare
# Rationale Accessing the list of columns via `select * from information_schema.columns` (introduced in #9840) is a lot to type See the doc for background: https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit# This is a sister PR to `SHOW TABLES` here: #9847 # Proposal Add support for `SHOW COLUMNS FROM <table>` command. Following the MySQL syntax supported by sqlparser: https://dev.mysql.com/doc/refman/8.0/en/show-columns.html # Example Use Setup: ``` echo "1,Foo,44.9" > /tmp/table.csv echo "2,Bar,22.1" >> /tmp/table.csv cargo run --bin datafusion-cli ``` Then run : ``` > CREATE EXTERNAL TABLE t(a int, b varchar, c float) STORED AS CSV LOCATION '/tmp/table.csv'; 0 rows in set. Query took 0 seconds. > show columns from t; +---------------+--------------+------------+-------------+-----------+-------------+ | table_catalog | table_schema | table_name | column_name | data_type | is_nullable | +---------------+--------------+------------+-------------+-----------+-------------+ | datafusion | public | t | a | Int32 | NO | | datafusion | public | t | b | Utf8 | NO | | datafusion | public | t | c | Float32 | NO | +---------------+--------------+------------+-------------+-----------+-------------+ 3 row in set. Query took 0 seconds. ``` # Commentary Note that the identifiers are case sensitive (which is a more general problem that affects all name resolution, not just `SHOW COLUMNS`). Ideally this should also work: ``` > show columns from T; Plan("Unknown relation for SHOW COLUMNS: T") > select * from T; Plan("Table or CTE with name \'T\' not found") ``` Closes #9866 from alamb/alamb/show_columns Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
# Rationale Accessing the list of columns via `select * from information_schema.columns` (introduced in apache#9840) is a lot to type See the doc for background: https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit# This is a sister PR to `SHOW TABLES` here: apache#9847 # Proposal Add support for `SHOW COLUMNS FROM <table>` command. Following the MySQL syntax supported by sqlparser: https://dev.mysql.com/doc/refman/8.0/en/show-columns.html # Example Use Setup: ``` echo "1,Foo,44.9" > /tmp/table.csv echo "2,Bar,22.1" >> /tmp/table.csv cargo run --bin datafusion-cli ``` Then run : ``` > CREATE EXTERNAL TABLE t(a int, b varchar, c float) STORED AS CSV LOCATION '/tmp/table.csv'; 0 rows in set. Query took 0 seconds. > show columns from t; +---------------+--------------+------------+-------------+-----------+-------------+ | table_catalog | table_schema | table_name | column_name | data_type | is_nullable | +---------------+--------------+------------+-------------+-----------+-------------+ | datafusion | public | t | a | Int32 | NO | | datafusion | public | t | b | Utf8 | NO | | datafusion | public | t | c | Float32 | NO | +---------------+--------------+------------+-------------+-----------+-------------+ 3 row in set. Query took 0 seconds. ``` # Commentary Note that the identifiers are case sensitive (which is a more general problem that affects all name resolution, not just `SHOW COLUMNS`). Ideally this should also work: ``` > show columns from T; Plan("Unknown relation for SHOW COLUMNS: T") > select * from T; Plan("Table or CTE with name \'T\' not found") ``` Closes apache#9866 from alamb/alamb/show_columns Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
Builds on the code in #9818
Rationale
Provide schema metadata access (so a user can see what columns exist and their type).
See the doc for background: https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit#
I plan to add support for
SHOW COLUMNSpossibly as a follow on PR (though I have found out thatSHOW COLUMNSandSHOW TABLESare not supported by either MySQL or by Postgres 🤔 )Changes
I chose to add the first 15 columns from
information_schema.columnsYou can see the full list in Postgres here and SQL Server here.There are a bunch more columns that say "Applies to features not available in PostgreSQL" and that don't apply to DataFusion either-- since my usecase is to get the basic schema information out I chose not to add a bunch of columns that are always null at this time.
I feel the use of column builders here is somewhat awkward (as it requires many calls to
unwrap). I am thinking of a follow on PR to refactor this code to useVec<String>andVec<u64>and then createStringArrayandUInt64Arraydirectly from them but for now I just want the functionalityExample use
Setup:
Then run :