feat(hackernews)!: Migrate to SDKv3 #10831
Conversation
I see that these breaking changes are due to SDKv3; see [1]: https://github.com/cloudquery/plugin-sdk/blob/e890385102e2668a16e35cff75fe2ffea32f2937/plugins/source/docs.go#L231
@brucehsu the breaking change in sources is expected when performing the upgrade to plugin-sdk/v3.
Nice one, thank you @brucehsu! We will have to wait for all destinations to be migrated to v3 (and released) before merging this, because sources using SDK v3 will require a destination that also supports SDK v3. The other thing is that I think we should/will try to update the "Breaking Change" detection script that commented above to accept Arrow type names as aliases of the old names, so it looks a bit less scary and we can verify that there are no real breaking changes. Not saying you need to do this for this PR, I can take a look at this soon :)
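For illustration, one way such an alias check could work is sketched below in Go. This is a hypothetical sketch only: the map entries and the exact type-name strings the detection script actually compares are assumptions, not the script's real contents.

```go
package main

import "fmt"

// Hypothetical sketch of the alias idea: treat the new Arrow type name as an
// alias of the old CQType name, so a rename like Int -> int64 is not flagged
// as a breaking change. The name strings below are illustrative assumptions.
var arrowAlias = map[string]string{
	"Int":       "int64",
	"Timestamp": "timestamp[us, tz=UTC]",
	"String":    "utf8",
	"JSON":      "json",
}

// isBreakingTypeChange reports whether a column type change is "real",
// i.e. not just the old CQType name being replaced by its Arrow alias.
func isBreakingTypeChange(oldType, newType string) bool {
	if oldType == newType {
		return false
	}
	return arrowAlias[oldType] != newType
}

func main() {
	fmt.Println(isBreakingTypeChange("Int", "int64")) // false: alias, not breaking
	fmt.Println(isBreakingTypeChange("Int", "utf8"))  // true: a real type change
}
```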
@hermanschaaf Gotcha! I was about to ask how we cope with the change in types when, say, writing to an existing PostgreSQL database, since that would probably also change the types of columns.
@brucehsu I can probably shed some light on this! Previously we had our own type system, which we refer to as CQTypes. As you know, we're now migrating to use Arrow types. We've created Arrow extensions for some of the unique CQTypes, so there is a fairly straightforward mapping between CQTypes and Arrow types (for example, the Int CQType maps to the Arrow Int64 type).

We're first migrating all the destination plugins to support plugin-sdk V3 right now. It's mostly easy at this point, as most of them already support Arrow from the v2 migration. Now when a destination plugin receives an Arrow type of Int64, it will translate this to the same column type that it was using before when it received an Int CQType. Destinations have already been doing this since V2. Because it's translating to the same column type as before, it doesn't break the schema for users.

The next step will be to have all the sources use and send Arrow types directly. We'll start out by translating CQTypes to their direct Arrow equivalents in the source (as you've done here by upgrading to v3 😄). Then in the next phase, we will start allowing more Arrow types to be used, which may result in new, or more specific, column types in the destinations. For example, we could convert nested Go structs to their equivalent Arrow struct types, and then send that over the wire instead of a JSON column. The destination can then decide how to handle this: BigQuery might write it as a RECORD column, while PostgreSQL might still write it as a JSON column. We will also introduce a setting that allows users to maintain the previous behavior, so that this again does not need to be breaking.
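To make the destination-side translation concrete, here is a minimal Go sketch of how a destination such as PostgreSQL might map incoming Arrow types to the same column types it previously used for CQTypes, with a JSON fallback for nested or complex types. The mapping below is purely illustrative and is not the actual code of any destination plugin.

```go
package main

import (
	"fmt"

	"github.com/apache/arrow/go/v13/arrow"
)

// postgresColumnType sketches how a destination could translate an incoming
// Arrow type to the same PostgreSQL column type it used for the equivalent
// CQType, so existing schemas are not broken. Real destination plugins have
// far more complete mappings; this is illustrative only.
func postgresColumnType(dt arrow.DataType) string {
	switch dt.ID() {
	case arrow.INT64:
		return "bigint" // same column type the Int CQType produced
	case arrow.TIMESTAMP:
		return "timestamptz" // same column type the Timestamp CQType produced
	case arrow.STRING:
		return "text"
	default:
		return "jsonb" // fall back to JSON for nested/complex types (for now)
	}
}

func main() {
	fmt.Println(postgresColumnType(arrow.PrimitiveTypes.Int64))         // bigint
	fmt.Println(postgresColumnType(arrow.FixedWidthTypes.Timestamp_us)) // timestamptz
}
```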
- Migrate dependencies to plugin-sdk/v3
- Deprecate ColumnCreationOptions in favour of inline fields
- Migrate to Arrow types (TypeInt->Int64, TypeTimestamp->Timestamp_us)
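As a rough illustration of what the last two bullets mean at the column level, here is a hedged Go sketch. The column names ("id", "time") are hypothetical stand-ins; the actual hackernews plugin's column definitions may differ.

```go
package main

import (
	"github.com/apache/arrow/go/v13/arrow"
	"github.com/cloudquery/plugin-sdk/v3/schema"
)

// Before (plugin-sdk v2, CQTypes), a column looked roughly like:
//   {Name: "id", Type: schema.TypeInt, CreationOptions: schema.ColumnCreationOptions{PrimaryKey: true}}
//   {Name: "time", Type: schema.TypeTimestamp}
//
// After (plugin-sdk v3): Arrow data types and inline fields instead of
// ColumnCreationOptions.
var columns = []schema.Column{
	{Name: "id", Type: arrow.PrimitiveTypes.Int64, PrimaryKey: true},
	{Name: "time", Type: arrow.FixedWidthTypes.Timestamp_us},
}

func main() { _ = columns }
```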
🤖 I have created a release *beep* *boop*

---

## [2.0.0](plugins-source-hackernews-v1.3.1...plugins-source-hackernews-v2.0.0) (2023-05-30)

### ⚠ BREAKING CHANGES

* This release introduces an internal change to our type system to use [Apache Arrow](https://arrow.apache.org/). This should not have any visible breaking changes, however due to the size of the change we are introducing it under a major version bump to communicate that it might have some bugs that we weren't able to catch during our internal tests. If you encounter an issue during the upgrade, please submit a [bug report](https://github.com/cloudquery/cloudquery/issues/new/choose). You will also need to update destinations depending on which one you use:
  - Azure Blob Storage >= v3.2.0
  - BigQuery >= v3.0.0
  - ClickHouse >= v3.1.1
  - DuckDB >= v1.1.6
  - Elasticsearch >= v2.0.0
  - File >= v3.2.0
  - Firehose >= v2.0.2
  - GCS >= v3.2.0
  - Gremlin >= v2.1.10
  - Kafka >= v3.0.1
  - Meilisearch >= v2.0.1
  - Microsoft SQL Server >= v4.2.0
  - MongoDB >= v2.0.1
  - MySQL >= v2.0.2
  - Neo4j >= v3.0.0
  - PostgreSQL >= v4.2.0
  - S3 >= v4.4.0
  - Snowflake >= v2.1.1
  - SQLite >= v2.2.0

### Features

* **deps:** Upgrade to Apache Arrow v13 (latest `cqmain`) ([#10605](#10605)) ([a55da3d](a55da3d))
* Update to use [Apache Arrow](https://arrow.apache.org/) type system ([#10831](#10831)) ([ee3465f](ee3465f))

### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.8 ([#10798](#10798)) ([27ff430](27ff430))
* **deps:** Update module github.com/cloudquery/plugin-sdk/v3 to v3.6.7 ([#11043](#11043)) ([3c6d885](3c6d885))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
This PR addresses #10749.
- Migrate dependencies to plugin-sdk/v3
- Deprecate `ColumnCreationOptions` in favour of inline fields
- Migrate to Arrow types (`TypeInt`->`Int64`, `TypeTimestamp`->`Timestamp_us`)

BEGIN_COMMIT_OVERRIDE
feat: Update to use Apache Arrow type system (#10831)
BREAKING-CHANGE: This release introduces an internal change to our type system to use Apache Arrow. This should not have any visible breaking changes, however due to the size of the change we are introducing it under a major version bump to communicate that it might have some bugs that we weren't able to catch during our internal tests. If you encounter an issue during the upgrade, please submit a bug report. You will also need to update destinations depending on which one you use (see the destination version list in the release notes above):
END_COMMIT_OVERRIDE