Skip to content

Create a scalar from array of type Map #6485

@dadepo

Description

@dadepo

Is your feature request related to a problem or challenge?

I am not sure if this is a bug, feature request or things are working as it should.

How can I fix an error of this kind

Background, I have a UDF that mimics json_build_object from postgres. This UDF returns a MapArray as the returning value.

With a table like

let df = ctx.sql(r#"
select * from test"#).await?;
df.show().await?;

+-----------+-----------+-----------+
| clientid  | name      | parentid  |
+-----------+-----------+-----------+
| c-string1 | n-string1 | p-string1 |
| c-string2 | n-string2 | p-string2 |
+-----------+-----------+-----------+

I can build a json like this fine

        let df = ctx.sql(r#"
        select json_build_object('name', name, 'type', 'test')
        from test"#).await?;
        df.show().await?;

+---------------------------------------------------------------------+
| json_build_object(Utf8("name"),test.name,Utf8("type"),Utf8("test")) |
+---------------------------------------------------------------------+
| {name: n-string1, type: test}                                       |
| {name: n-string2, type: test}                                       |
+---------------------------------------------------------------------+

But if I remove name or any other column that exist in the table, I get the error

        let df = ctx.sql(r#"
        select json_build_object('type', 'test')
        from test"#).await?;
        df.show().await?;

Error: This feature is not implemented: Can't create a scalar from array of type "Map(Field { name: "entries", data_type: Struct([Field { name: "keys", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "values", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)"

A simple work around will be to always include a column from the table in the query, but unfortunately i cannot guarantee this. Especially when I have use case where json_build_object works on computed values or values created via other udfs.

How can I make my UDF not dependent on the present of a column? For example I can use the make_array function in this way. That is, I can use it without including a column from the table

        let df = ctx.sql(r#"
        select make_array('type', 'test')
        from test"#).await?;
        df.show().await?;

+--------------------------------------+
| makearray(Utf8("type"),Utf8("test")) |
+--------------------------------------+
| [type, test]                         |
| [type, test]                         |
+--------------------------------------+

Describe the solution you'd like

Ability to have a UDF not dependent on the present of a column, generate as much results in a column as needed in the resulting record batch

Describe alternatives you've considered

if I change the return type to UTF8 instead of Map, that is

let json_build_object_return_type: ReturnTypeFunction = Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));

And i return StringArray from the UDF, I get the behaviour I am looking for. But this means the json structure returned from the UDF get formatted into strings, which makes processing more difficult down the line as I end up with values like this

let value = r#"\"{\\\"key_one\\\":\\\"val_one\\\"}\""#;

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions