I noticed ~30 API calls to download a 1.44MB file from DBFS. It turns out that [io.ReadAll] starts with a small buffer and grows it conservatively. By implementing the [io.WriterTo] interface, the same reader can use the maximum DBFS read size of 1MB, which reduces the number of API calls for the same file to 2. This PR also contains a related fix for [FileReader.Read] so that it no longer requires a final zero-length read call.
nfx approved these changes Dec 15, 2022
Codecov Report
Base: 42.92% // Head: 42.92% // No change to project coverage 👍

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #249   +/-   ##
=======================================
  Coverage   42.92%   42.92%
=======================================
  Files          42       42
  Lines        2437     2437
=======================================
  Hits         1046     1046
  Misses       1312     1312
  Partials       79       79
pietern added a commit that referenced this pull request Dec 19, 2022
pietern added a commit that referenced this pull request Dec 22, 2022
Follow up to #249. Co-authored-by: Serge Smertin <[email protected]>
Merged
nfx added a commit that referenced this pull request Dec 23, 2022
# Version changelog

## 0.2.0

 * Added `DATABRICKS_AUTH_TYPE` environment variable ([#248](#248)).
 * Added Policy Families API ([#263](#263)).
 * Added experimental `ErrCannotConfigureAuth` and `ErrNotAccountClient` ([#237](#237), [#238](#238)).
 * Added DBFS file handle that supports both reading and writing ([#261](#261)).
 * Added `io.WriterTo` for DBFS file reader ([#249](#249)).
 * Added `pflag.Value` interfaces for enums ([#234](#234)).
 * Added support for adding custom HTTP visitors per request ([#230](#230)).
 * Added support for raw body as byte slice if requested ([#247](#247)).
 * Improved callbacks for polling the status of long-running operations ([#258](#258)).
 * Improved rendering of HTTP links in godoc ([#229](#229)).
 * Updated field types in the Jobs API from spec ([#259](#259)).
 * Multiple OpenAPI consistency passes ([#254](#254), [#241](#241), [#243](#243), [#255](#255), [#236](#236)).

API changes:

 * Renamed `IsAccountsClient` to `IsAccountClient` ([#231](#231)).
 * `w.ClusterPolicies.ListAll` now takes `clusterpolicies.List` as an argument.
 * `github.com/databricks/databricks-sdk-go/service/dbsql` package is renamed to `github.com/databricks/databricks-sdk-go/service/sql`.
 * `w.DataSources.ListDataSources` is renamed to `w.DataSources.List`.
 * `w.Queries.CreateQuery` is renamed to `w.Queries.Create`.
 * `w.Queries.DeleteQueryByQueryId` is renamed to `w.Queries.DeleteByQueryId`.
 * `w.Queries.GetQueryByQueryId` is renamed to `w.Queries.GetByQueryId`.
 * `w.Queries.UpdateQuery` is renamed to `w.Queries.Update`.
 * `w.Alerts.DeleteAlertByAlertId` is renamed to `w.Alerts.DeleteByAlertId`.
 * `w.Alerts.UpdateAlert` is renamed to `w.Alerts.Update`.
 * `w.Alerts.GetAlertByAlertId` is renamed to `w.Alerts.GetByAlertId`.
 * `w.Alerts.ListAlerts` is renamed to `w.Alerts.List`.
 * `w.Dashboards.CreateDashboard` is renamed to `w.Dashboards.Create`.
 * `w.Dashboards.GetDashboardByDashboardId` is renamed to `w.Dashboards.GetByDashboardId`.
 * `w.Dashboards.ListDashboardsAll` is renamed to `w.Dashboards.ListAll`.
 * `w.Dashboards.DeleteDashboardByDashboardId` is renamed to `w.Dashboards.DeleteByDashboardId`.
 * `w.Dashboards.RestoreDashboard` is renamed to `w.Dashboards.Restore`.
 * `deployment.CreateCustomerManagedKeyRequest` now takes `deployment.KeyUseCase` enum.
 * `w.GlobalInitScripts.CreateScript` is renamed to `w.GlobalInitScripts.Create`.
 * `w.GlobalInitScripts.DeleteScriptByScriptId` is renamed to `w.GlobalInitScripts.DeleteByScriptId`.
 * `w.GlobalInitScripts.UpdateScript` is renamed to `w.GlobalInitScripts.Update`.
 * `w.GlobalInitScripts.GetScriptByScriptId` is renamed to `w.GlobalInitScripts.GetByScriptId`.
 * `w.GlobalInitScripts.ListScriptsAll` is renamed to `w.GlobalInitScripts.ListAll`.
 * `jobs.ResetJob.NewSettings` is now a required field.
 * `w.Pipelines.CreatePipeline` is renamed to `w.Pipelines.Create`.
 * `w.Pipelines.DeletePipelineByPipelineId` is renamed to `w.Pipelines.DeleteByPipelineId`.
 * `w.Pipelines.UpdatePipeline` is renamed to `w.Pipelines.Update`.
 * `w.Pipelines.GetPipelineByPipelineId` is renamed to `w.Pipelines.GetByPipelineId`.
 * `w.StorageCredentials.Update` now also returns an entity.
 * `w.ExternalLocations.Update` now also returns an entity.
 * `w.Metastores.Update` now also returns an entity.
 * `unitycatalog.CreateMetastoreAssignment.WorkspaceId` type changed from `int` to `int64`.
 * `unitycatalog.UnassignRequest.WorkspaceId` type changed from `int` to `int64`.
 * `w.Catalogs.Update` now also returns an entity.
 * `w.Schemas.Update` now also returns an entity.
 * `w.Providers.Update` now also returns an entity.
 * `w.Shares.Update` now also returns an entity.
 * `WarehousesAPI` service moved to `github.com/databricks/databricks-sdk-go/service/sql` package.
 * `w.Warehouses.CreateWarehouseAndWait` renamed to `w.Warehouses.CreateAndWait`.
 * `w.Warehouses.DeleteWarehouseByIdAndWait` renamed to `w.Warehouses.DeleteByIdAndWait`.
 * `w.Warehouses.EditWarehouse` renamed to `w.Warehouses.Edit`.
 * `w.Warehouses.GetWarehouseById` renamed to `w.Warehouses.GetById`.
 * `w.Warehouses.ListWarehousesAll` renamed to `w.Warehouses.ListAll`.
 * Removed `w.Dbfs.Overwrite` in favor of `w.Dbfs.Open("....", dbfs.FileModeOverwrite|dbfs.FileModeWrite)`.
 * Added third required argument to `w.Dbfs.Open`.

Code generation:

 * Added concept of `main` service for the package ([#239](#239)).
 * Added entity primitives check ([#242](#242)).
 * Added helpers for CRUD generation ([#246](#246)).
 * Added more entity-generation utils ([#257](#257)).
 * Dynamically generate `.gitattributes` ([#244](#244)).
 * Fixed required order fields ([#245](#245)).
 * Parse summary from descriptions ([#228](#228)).
 * Print error on formatter failure ([#235](#235)).
 * Update usage string in generator ([#260](#260)).
 * Fixed order of host completion ([#233](#233)).

Dependency updates:

 * Bump google.golang.org/api from 0.103.0 to 0.105.0 ([#232](#232), [#252](#252)).