[Bug]: BigQuery cross-language ReadFromQuery is outputting Beam rows with a different UUID than expected output. #21784

@youngoli

What happened?

This was found in the Go SDK, but the root cause appears to be in the expansion service. When performing a cross-language (xlang) BigQueryIO read from a query, the Beam rows it outputs have a schema that is structurally identical to the registered output type in Go but is not treated as equivalent to it, so the rows can't be converted to the named struct output type.

Workaround in Go SDK

To work around this issue in the short term, turn the named struct type that's being used as the output into a type alias of the unnamed struct type. This can be done by inserting an = sign into the type declaration.

Before: type OutputRow struct {...}
After: type OutputRow = struct {...}

Note that this doesn't play well with Beam Go's type registration; you'll need to avoid registering the type alias.
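To show why the = sign matters, here is a minimal standalone sketch in plain Go that does not use Beam at all. The names OutputRowNamed, OutputRowAlias, decoded, assertsToNamed and assertsToAlias are made up for illustration, and the single-field struct is a cut-down stand-in for the real row type. It only demonstrates the language rule behind the panic: a value whose dynamic type is an unnamed struct can't be type-asserted to a named struct with the same fields, but it can be asserted to an alias of that unnamed struct.

```go
package main

import "fmt"

// OutputRowNamed is a named (defined) struct type, like the "Before" form.
type OutputRowNamed struct {
	Counter *int64 `beam:"counter"`
}

// OutputRowAlias is an alias of the unnamed struct type, like the "After" form.
type OutputRowAlias = struct {
	Counter *int64 `beam:"counter"`
}

// decoded mimics a value that arrives with the unnamed struct type,
// as in the panic message in this issue (hypothetical stand-in).
func decoded() interface{} {
	n := int64(1)
	return struct {
		Counter *int64 `beam:"counter"`
	}{Counter: &n}
}

// assertsToNamed reports whether v can be asserted to the named struct type.
func assertsToNamed(v interface{}) bool {
	_, ok := v.(OutputRowNamed)
	return ok
}

// assertsToAlias reports whether v can be asserted to the alias type.
func assertsToAlias(v interface{}) bool {
	_, ok := v.(OutputRowAlias)
	return ok
}

func main() {
	v := decoded()
	fmt.Println("named struct:", assertsToNamed(v)) // false: distinct type
	fmt.Println("type alias:", assertsToAlias(v))   // true: identical type
}
```

The alias works because it doesn't create a new type; it is just another name for the unnamed struct, so field names, field types and tags all have to match exactly.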

Log Snippets

Here's a snippet of the error on the Go side, showing how it manifests:

panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs
Full error:
while executing Process for Plan[s02-67]:
2: DataSink[S[ptransform-65@localhost:12371]] Coder:W;coder-80<LP;coder-81<R[bigquery.TestRow]>>!GWC
3: PCollection[pcollection-72] Out:[2]
4: ParDo[bigquery.castFn] Out:[2]
1: DataSource[S[ptransform-64@localhost:12371], 0] Coder:W;coder-76<LP;coder-77<R[struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }]>>!GWC Out:4
	caused by:
panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs goroutine 58 [running]:
runtime/debug.Stack()
	/usr/lib/google-golang/src/runtime/debug/stack.go:24 +0x65
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.callNoPanic.func1()
	{...}/repos/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:58 +0xa5
panic({0xe0d380, 0xc0003a04e0})
	/usr/lib/google-golang/src/runtime/panic.go:866 +0x212
github.com/apache/beam/sdks/v2/go/pkg/beam/register.(*caller1x1[...]).Call1x1(...)
	{...}/repos/beam/sdks/go/pkg/beam/register/register.go:3205

And here's a snippet of the raw schema protos received by the Go SDK, compared with the schema it's expecting.

Received schema:

Schema proto: fields: {
  name: "counter"
  type: {
    nullable: true
    atomic_type: INT64
  }
}
fields: {
  name: "rand_data"
  type: {
    nullable: true
    row_type: {
      schema: {
        fields: {
          name: "flip"
          type: {
            nullable: true
            atomic_type: BOOLEAN
          }
        }
        fields: {
          name: "num"
          type: {
            nullable: true
            atomic_type: INT64
          }
          id: 1
          encoding_position: 1
        }
        fields: {
          name: "word"
          type: {
            nullable: true
            atomic_type: STRING
          }
          id: 2
          encoding_position: 2
        }
        id: "141b0073-d725-456c-bcdc-46c9c84e7a6d"
      }
    }
  }
  id: 1
  encoding_position: 1
}
id: "d520c5bd-86f8-4a7b-8cbd-af6816f09f61"

Expected schema:

Schema proto: fields: {
  name: "counter"
  type: {
    nullable: true
    atomic_type: INT64
  }
}
fields: {
  name: "rand_data"
  type: {
    nullable: true
    row_type: {
      schema: {
        fields: {
          name: "flip"
          type: {
            nullable: true
            atomic_type: BOOLEAN
          }
        }
        fields: {
          name: "num"
          type: {
            nullable: true
            atomic_type: INT64
          }
        }
        fields: {
          name: "word"
          type: {
            nullable: true
            atomic_type: STRING
          }
        }
        id: "c39b4c69-1e23-4267-9fb2-776e1a61a34f"
      }
    }
  }
}
id: "952f2fc2-afb0-4646-aaec-88b9a0f307be"
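The comparison between the two listings can be sketched in code. This is a rough illustration only: Schema and Field below are simplified, hypothetical stand-ins for the schema protos, not Beam's generated types, and sameStructure is not how Beam resolves schemas. It compares field names, nullability and types recursively while ignoring the schema UUIDs and the per-field id and encoding_position values, which are the parts that differ between the received and expected protos above.

```go
package main

import "fmt"

// Schema and Field are simplified, hypothetical stand-ins for the protos above.
type Schema struct {
	ID     string
	Fields []Field
}

type Field struct {
	Name             string
	Nullable         bool
	Atomic           string  // e.g. "INT64"; empty when Row is set
	Row              *Schema // nested row schema, or nil
	ID               int32
	EncodingPosition int32
}

// sameStructure compares names, nullability and types recursively. It
// deliberately ignores schema UUIDs, field ids and encoding positions.
func sameStructure(a, b *Schema) bool {
	if a == nil || b == nil {
		return a == b
	}
	if len(a.Fields) != len(b.Fields) {
		return false
	}
	for i := range a.Fields {
		fa, fb := a.Fields[i], b.Fields[i]
		if fa.Name != fb.Name || fa.Nullable != fb.Nullable || fa.Atomic != fb.Atomic {
			return false
		}
		if !sameStructure(fa.Row, fb.Row) {
			return false
		}
	}
	return true
}

// build returns the schema from the listings. When numbered is true, fields
// carry the id and encoding_position values seen in the received proto.
func build(outerID, innerID string, numbered bool) *Schema {
	pos := func(i int32) int32 {
		if numbered {
			return i
		}
		return 0
	}
	inner := &Schema{ID: innerID, Fields: []Field{
		{Name: "flip", Nullable: true, Atomic: "BOOLEAN"},
		{Name: "num", Nullable: true, Atomic: "INT64", ID: pos(1), EncodingPosition: pos(1)},
		{Name: "word", Nullable: true, Atomic: "STRING", ID: pos(2), EncodingPosition: pos(2)},
	}}
	return &Schema{ID: outerID, Fields: []Field{
		{Name: "counter", Nullable: true, Atomic: "INT64"},
		{Name: "rand_data", Nullable: true, Row: inner, ID: pos(1), EncodingPosition: pos(1)},
	}}
}

func receivedSchema() *Schema {
	return build("d520c5bd-86f8-4a7b-8cbd-af6816f09f61", "141b0073-d725-456c-bcdc-46c9c84e7a6d", true)
}

func expectedSchema() *Schema {
	return build("952f2fc2-afb0-4646-aaec-88b9a0f307be", "c39b4c69-1e23-4267-9fb2-776e1a61a34f", false)
}

func main() {
	r, e := receivedSchema(), expectedSchema()
	fmt.Println("same structure:", sameStructure(r, e))
	fmt.Println("same UUID:", r.ID == e.ID)
}
```

With the values copied from the listings, the two schemas have the same structure but different UUIDs, which matches the mismatch described in this issue.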

To see the data above, add the following lines after graphx/coder.go:371:

sp := prototext.Format(&s)
log.Warnf(context.Background(), "Schema proto: %v", sp)
log.Warnf(context.Background(), "Schema type: %v", t)
return coder.NewR(typex.New(t)), nil

Issue Priority

Priority: 2

Issue Component

Component: cross-language
