[BEAM-7] Initial Dataflow code drop #1

Merged
asfgit merged 1575 commits into apache:master from francesperry:master
Feb 26, 2016

@francesperry

Initial contribution of the Google Cloud Dataflow Java SDK to Apache Beam.

Caveat: There is still a lot to do before this becomes usable as Apache Beam. In particular:

  • Reorganize directories.
  • Incorporate additional drops by Google, Cloudera, and dataArtisans.
  • Make major backwards incompatible API changes.
  • Rename from Dataflow to Beam.

Beaming with joy ;-D

peihe and others added 30 commits January 15, 2016 10:13
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112105439
Users should not need to compare DataflowAssert objects using Java equality.
Doing so almost always indicates a broken test that will silently fail.

Throw an UnsupportedOperationException instead, and direct users to
isEqualTo (Singleton) or containsInAnyOrder (Iterable).

This change caught a broken test.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112200184
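
The DataflowAssert change described above can be illustrated with a minimal sketch. ValueAssertSketch is a hypothetical stand-in written for this illustration, not the SDK's DataflowAssert class; it only shows the general idea of making equals() fail loudly and pointing callers at an explicit assertion method. Throwing from hashCode() as well is an assumption of the sketch.

```java
import java.util.Objects;

// Hypothetical stand-in for an assertion builder; not the SDK's DataflowAssert.
final class ValueAssertSketch<T> {
    private final T actual;

    ValueAssertSketch(T actual) {
        this.actual = actual;
    }

    /** The supported comparison: an explicit assertion that throws on mismatch. */
    void isEqualTo(T expected) {
        if (!Objects.equals(actual, expected)) {
            throw new AssertionError("Expected " + expected + " but was " + actual);
        }
    }

    /** Comparing assertion objects is almost certainly a test bug, so refuse to do it. */
    @Override
    public boolean equals(Object other) {
        throw new UnsupportedOperationException(
            "equals() is not supported on assertion objects; use isEqualTo(...) instead");
    }

    /** Assumption of this sketch: hashCode() is rejected too, to stay consistent with equals(). */
    @Override
    public int hashCode() {
        throw new UnsupportedOperationException(
            "hashCode() is not supported on assertion objects");
    }
}
```

With this shape, a test that accidentally writes assertEquals(someAssert, otherAssert) fails immediately instead of passing or failing for the wrong reason.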
Generalize the 'game' example BigQuery write classes to take a map that specifies how
to generate the output fields.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112253306
Some tools don't support .zip in the class path.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112261905
gcloud moved where it stores the credentials configured on the command line.
Since there is still no support in standard libraries to get the default
project, update DefaultProjectFactory to support the new location.

Note that users who have not upgraded gcloud are still supported.

----Release Notes----
The DataflowPipelineRunner will now prefer the default project configuration
produced by newer versions of the gcloud utility. Users with old gcloud clients
are still supported.
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112281533
Fix Javadoc issue in HourlyTeamScore pipeline.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112311676
initializationStateLock should be held for short, bounded amounts of time,
because it is acquired on the dynamic work rebalancing code path
(requestDynamicSplit) which must be effectively non-blocking.
NativeReader.iterator() can do I/O and thus can take an unbounded amount
of time, so it shouldn't be called while holding the lock.
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112375806
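
The locking rule in the commit above (keep slow I/O out of a lock that a non-blocking path also takes) can be sketched roughly as follows. LazyReaderSketch, its opener parameter, and its method names are hypothetical; the worker's actual code is not shown in this pull request.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch; not the Dataflow worker's actual reader wrapper.
final class LazyReaderSketch<T> {
    private final Object stateLock = new Object();
    private final Supplier<Iterator<T>> opener; // may perform slow I/O
    private Iterator<T> iterator;               // guarded by stateLock

    LazyReaderSketch(Supplier<Iterator<T>> opener) {
        this.opener = opener;
    }

    /** Opens the underlying iterator outside the lock, then publishes it under the lock. */
    Iterator<T> start() {
        // Potentially slow, so it runs before the lock is taken.
        Iterator<T> opened = opener.get();
        synchronized (stateLock) {
            // If another caller published first, keep that iterator and drop ours.
            if (iterator == null) {
                iterator = opened;
            }
            return iterator;
        }
    }

    /** Stands in for a latency-sensitive path: the lock is held only for a field read. */
    boolean isStarted() {
        synchronized (stateLock) {
            return iterator != null;
        }
    }
}
```

The trade-off in this sketch is that two concurrent callers of start() may both open an iterator and one result is discarded; a real implementation would also need to close the discarded one.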
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112415033
----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112480742
This resolves the user issue on SO:
http://stackoverflow.com/questions/34780459/runtimeexception-from-cloud-dataflow-related-to-serializing-coder
Since Jackson 2.3, TypeIdResolvers have been expected to implement
this method, because typeFromId(String) was deprecated.
Newer versions of Jackson enforce this.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112487029
Custom unbounded readers are read in bundles of at most
10k elements or 10 seconds. A recent change accidentally removed
the 10k element limit. This change reintroduces it and
adds a test.

The previous test was also passing vacuously because
the iteration limit was incorrect (it would always
run only one iteration).
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112723469
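
The two bundle limits described in the commit above (an element cap and a time cap, whichever is hit first) can be sketched as a simple loop. BundleReaderSketch and readBundle are hypothetical names for this illustration; the limits are passed as parameters rather than hard-coded to 10k elements and 10 seconds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the SDK's unbounded reader implementation.
final class BundleReaderSketch {
    /**
     * Reads from source until maxElements have been collected, maxMillis have
     * elapsed, or the source has nothing more to offer, whichever comes first.
     */
    static <T> List<T> readBundle(Iterator<T> source, int maxElements, long maxMillis) {
        List<T> bundle = new ArrayList<>();
        long deadlineNanos = System.nanoTime() + maxMillis * 1_000_000L;
        while (bundle.size() < maxElements
                && System.nanoTime() - deadlineNanos < 0
                && source.hasNext()) {
            bundle.add(source.next());
        }
        return bundle;
    }
}
```

A test for the element cap needs a source with more elements than the cap and a generous time limit; otherwise the loop ends for another reason and the test passes vacuously, which is the failure mode the commit message describes.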
Adapt join-library module to be able to upload to maven-central
Updating version numbers from 1.4.0-SNAPSHOT to 1.5.0-SNAPSHOT

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=113022038
As in 6a11a72, this makes BigQueryIO.Read work in the
DirectPipelineRunner as it does in the DataflowPipelineRunner.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112496161
Also updates /heapz so that it downloads the heapdump rather than just
telling you where on the worker it is.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112535088
This is a deterministic coder for ByteString. In the
wholeStream context, it simply writes the string. Otherwise,
it writes the string delimited with its length (encoded as a
VarInt).

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112586805
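
The ByteString coder behaviour described above (raw bytes in the whole-stream context, a VarInt length prefix otherwise) can be sketched on plain byte arrays. LengthPrefixedSketch is a hypothetical class, and the base-128 varint shown here is the common little-endian form; that it matches the SDK's exact wire format is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch on byte[]; not the SDK's ByteString coder.
final class LengthPrefixedSketch {
    /** Writes a non-negative int as a base-128 varint, low 7 bits first. */
    static void writeVarInt(int value, ByteArrayOutputStream out) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value);
    }

    /**
     * Encodes payload. When wholeStream is true the value owns the rest of the
     * stream, so no delimiter is needed; otherwise the length is written first.
     */
    static byte[] encode(byte[] payload, boolean wholeStream) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!wholeStream) {
            writeVarInt(payload.length, out);
        }
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }
}
```

Because the output depends only on the input bytes, encoding the same value always produces the same bytes, which is what makes such a coder deterministic.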
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112587034
Users who check out and edit the SDK in Eclipse should
use m2e's Eclipse import wizard, and should not want to
commit their actual project configurations.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112597945
steveniemitz referenced this pull request in twitter-forks/beam Jun 10, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
saavan-google added a commit to saavan-google/beam that referenced this pull request Aug 21, 2020
udim pushed a commit that referenced this pull request Aug 25, 2020
…odule of the Python SDK (#12657)

* Add myself to authors

* Add blog post #1: improved annotation support

* Add draft of blog post #2: performance runtime type checking

* Finish blog post #2

* Remove white space

* Resolve PR comments

Co-authored-by: Saavan Nanavati <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Nov 7, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
nikie referenced this pull request in nikie/beam Nov 17, 2020
pabloem pushed a commit that referenced this pull request Dec 1, 2020
…ovider for Python SDK

* Support for NestedValueProvider for Python SDK

* Fix typo

* Update CHANGES.md

* Update value_provider_test.py

* Fix NestedValueProvider docstrings. (#1)

* Fix isort and doc errors. (#2)

* Update CHANGES.md

Co-authored-by: Eugene Nikolaiev <[email protected]>
kennknowles pushed a commit that referenced this pull request Jan 25, 2021
pabloem pushed a commit that referenced this pull request Feb 17, 2021
Debeziumio PoC (#7)

* New DebeziumIO class.

* Merge connector code

* DebeziumIO and MySqlConnector integrated.

* Added FormatFuntion param to Read builder on DebeziumIO.

* Added arguments checker to DebeziumIO.

* Add simple JSON mapper object (#1)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

Co-authored-by: osvaldo-salinas <[email protected]>
Co-authored-by: Carlos Dominguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>

* Add debeziumio tests

* Debeziumio testing json mapper (#3)

* Some code refactors. Use a default DBHistory if not provided

* Add basic tests for Json mapper

* Debeziumio time restriction (#5)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

* Some code refactors. Use a default DBHistory if not provided

* Adding time-based restriction

Stop polling after specified amount of time

* Add basic tests for Json mapper

* Adding new restriction

Uses a time-based restriction

* Adding optional restriction

Uses an optional time-based restriction

Co-authored-by: juanitodread <[email protected]>
Co-authored-by: osvaldo-salinas <[email protected]>

* Upgrade DebeziumIO connector (#4)

* Address comments (Change dependencies to testCompile, Set JsonMapper/Coder as default, refactors) (#8)

* Revert file

* Change dependencies to testCompile
* Move Counter sample to unit test

* Set JsonMapper as default mapper function
* Set String Coder as default coder when using JsonMapper
* Change logs from info to debug

* Debeziumio javadoc (#9)

* Adding javadoc

* Added some titles and examples

* Added SourceRecordJson doc

* Added Basic Connector doc

* Added KafkaSourceConsumer doc

* Javadoc cleanup

* Removing BasicConnector

No usages of this class were found overall

* Editing documentation

* Debeziumio fetched records restriction (#10)

* Adding javadoc

* Adding restriction by number of fetched records

Also adding a quick-fix for null value within SourceRecords
Minor fix on both MySQL and PostgreSQL Connectors Tests

* Run either by time or by number of records

* Added DebeziumOffsetTrackerTest

Tests both restrictions: By amount of time and by Number of records

* Removing comment

* DebeziumIO test for DB2. (#11)

* DebeziumIO test for DB2.

* DebeziumIO javadoc.

* Clean code: removed commented code lines on DebeziumIOConnectorTest.java

* Clean code: removing unused imports and using readAsJson().

Co-authored-by: Carlos Domínguez <[email protected]>

* Debezium limit records (now configurable) (#12)

* Adding javadoc

* Records Limit is now configurable

(It was fixed before)

* Debeziumio dockerize (#13)

* Add mysql docker container to tests

* Move debezium mysql integration test to its own file

* Add assertion to verify that the results contains a record.

* Debeziumio readme (#15)

* Adding javadoc

* Adding README file

* Add number of records configuration to the DebeziumIO component (#16)

* Code refactors (#17)

* Remove/ignore null warnings

* Remove DB2 code

* Remove docker dependency in DebeziumIO unit test and max number of records to MySql integration test

* Change access modifiers accordingly

* Remove incomplete integration tests (Postgres and SqlServer)

* Add experimental tag

* Debezium testing stoppable consumer (#18)

* Add try-catch-finally, stop SourceTask at finally.

* Fix warnings

* stopConsumer and processedRecords local variables removed. UT for task stop use case added

* Fix minor code style issue

Co-authored-by: juanitodread <[email protected]>

* Fix style issues (check, spotlessApply) (#19)

Co-authored-by: Osvaldo Salinas <[email protected]>
Co-authored-by: alejandro.maguey <[email protected]>
Co-authored-by: osvaldo-salinas <[email protected]>
Co-authored-by: Carlos Dominguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>
Co-authored-by: Alejandro Maguey <[email protected]>
Co-authored-by: Hassan Reyes <[email protected]>

Add missing apache license to README.md

Enabling integration test for DebeziumIO (#20)

Rename connector package cdc=>debezium. Update doc references (#21)

Fix code style on DebeziumIOMySqlConnectorIT
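
The DebeziumIO commits above mention stopping a read either after an amount of time or after a number of fetched records, with both limits optional. A minimal sketch of that stopping rule follows; PollingLimitSketch and its fields are hypothetical and do not reflect the connector's actual offset tracker.

```java
// Hypothetical sketch of an "either limit" stopping rule; not DebeziumIO's tracker.
final class PollingLimitSketch {
    private final Long maxRecords; // null means no record limit
    private final Long maxMillis;  // null means no time limit

    PollingLimitSketch(Long maxRecords, Long maxMillis) {
        this.maxRecords = maxRecords;
        this.maxMillis = maxMillis;
    }

    /** Returns false as soon as any configured limit has been reached. */
    boolean shouldContinue(long recordsFetched, long elapsedMillis) {
        if (maxRecords != null && recordsFetched >= maxRecords) {
            return false;
        }
        if (maxMillis != null && elapsedMillis >= maxMillis) {
            return false;
        }
        return true;
    }
}
```

With both limits left as null, this sketch never asks the caller to stop, which corresponds to an unrestricted read.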
steveniemitz referenced this pull request in twitter-forks/beam Mar 11, 2021
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Apr 26, 2021
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
bullet03 mentioned this pull request Jul 14, 2022
4 tasks