Implement io.WriterTo for DBFS file reader#249

Merged
pietern merged 2 commits into main from dbfs-writerto on Dec 15, 2022

@pietern (Contributor) commented Dec 14, 2022

I noticed ~30 API calls to download a 1.44 MB file from DBFS. It turns out that io.ReadAll starts out with a small buffer and grows it conservatively, so each read requests only a small part of the file.

By implementing the io.WriterTo interface, the same reader can use the maximum DBFS read size of 1 MB, which reduces the number of API calls for the same file to 2.

This PR also contains a related fix for FileReader.Read such that it no longer requires a final zero-length read call.

pietern requested a review from nfx on December 14, 2022 21:41
pietern changed the title from "Implement [io.WriterTo] for DBFS file reader" to "Implement io.WriterTo for DBFS file reader" on Dec 14, 2022
@codecov-commenter

Codecov Report

Base: 42.92% // Head: 42.92% // No change to project coverage 👍

Coverage data is based on head (85648dc) compared to base (bdc131b).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #249   +/-   ##
=======================================
  Coverage   42.92%   42.92%           
=======================================
  Files          42       42           
  Lines        2437     2437           
=======================================
  Hits         1046     1046           
  Misses       1312     1312           
  Partials       79       79           


pietern merged commit dfb127a into main on Dec 15, 2022
pietern deleted the dbfs-writerto branch on December 15, 2022 08:27
pietern added a commit that referenced this pull request on Dec 19, 2022
pietern added a commit that referenced this pull request on Dec 22, 2022
nfx mentioned this pull request on Dec 23, 2022
nfx added a commit that referenced this pull request on Dec 23, 2022
# Version changelog

## 0.2.0

* Added `DATABRICKS_AUTH_TYPE` environment variable
([#248](#248)).
* Added Policy Families API
([#263](#263)).
* Added experimental `ErrCannotConfigureAuth` and `ErrNotAccountClient`
([#237](#237),
[#238](#238)).
* Added DBFS file handle that supports both reading and writing
([#261](#261)).
* Added `io.WriterTo` for DBFS file reader
([#249](#249)).
* Added `pflag.Value` interfaces for enums
([#234](#234)).
* Added support for adding custom HTTP visitors per request
([#230](#230)).
* Added support for raw body as byte slice if requested
([#247](#247)).
* Improved callbacks for polling the status of long-running operations
([#258](#258)).
* Improved rendering of HTTP links in godoc
([#229](#229)).
* Updated field types in the Jobs API from spec
([#259](#259)).
* Multiple OpenAPI consistency passes
([#254](#254),
[#241](#241),
[#243](#243),
[#255](#255),
[#236](#236)).

API changes:
* Renamed `IsAccountsClient` to `IsAccountClient`
([#231](#231)).
* `w.ClusterPolicies.ListAll` now takes `clusterpolicies.List` as an
argument.
* `github.com/databricks/databricks-sdk-go/service/dbsql` package is
renamed to `github.com/databricks/databricks-sdk-go/service/sql`.
 * `w.DataSources.ListDataSources` is renamed to `w.DataSources.List`.
 * `w.Queries.CreateQuery` is renamed to `w.Queries.Create`.
* `w.Queries.DeleteQueryByQueryId` is renamed to
`w.Queries.DeleteByQueryId`.
 * `w.Queries.GetQueryByQueryId` is renamed to `w.Queries.GetByQueryId`.
 * `w.Queries.UpdateQuery` is renamed to `w.Queries.Update`.
* `w.Alerts.DeleteAlertByAlertId` is renamed to
`w.Alerts.DeleteByAlertId`.
 * `w.Alerts.UpdateAlert` is renamed to `w.Alerts.Update`.
 * `w.Alerts.GetAlertByAlertId` is renamed to `w.Alerts.GetByAlertId`.
 * `w.Alerts.ListAlerts` is renamed to `w.Alerts.List`.
 * `w.Dashboards.CreateDashboard` is renamed to `w.Dashboards.Create`.
* `w.Dashboards.GetDashboardByDashboardId` is renamed to
`w.Dashboards.GetByDashboardId`.
* `w.Dashboards.ListDashboardsAll` is renamed to `w.Dashboards.ListAll`.
* `w.Dashboards.DeleteDashboardByDashboardId` is renamed to
`w.Dashboards.DeleteByDashboardId`.
 * `w.Dashboards.RestoreDashboard` is renamed to `w.Dashboards.Restore`.
* `deployment.CreateCustomerManagedKeyRequest` now takes
`deployment.KeyUseCase` enum.
* `w.GlobalInitScripts.CreateScript` is renamed to
`w.GlobalInitScripts.Create`.
* `w.GlobalInitScripts.DeleteScriptByScriptId` is renamed to
`w.GlobalInitScripts.DeleteByScriptId`.
* `w.GlobalInitScripts.UpdateScript` is renamed to
`w.GlobalInitScripts.Update`.
* `w.GlobalInitScripts.GetScriptByScriptId` is renamed to
`w.GlobalInitScripts.GetByScriptId`.
* `w.GlobalInitScripts.ListScriptsAll` is renamed to
`w.GlobalInitScripts.ListAll`.
 * `jobs.ResetJob.NewSettings` is now a required field.
 * `w.Pipelines.CreatePipeline` is renamed to `w.Pipelines.Create`.
* `w.Pipelines.DeletePipelineByPipelineId` is renamed to
`w.Pipelines.DeleteByPipelineId`.
 * `w.Pipelines.UpdatePipeline` is renamed to `w.Pipelines.Update`.
* `w.Pipelines.GetPipelineByPipelineId` is renamed to
`w.Pipelines.GetByPipelineId`.
 * `w.StorageCredentials.Update` now also returns an entity.
 * `w.ExternalLocations.Update` now also returns an entity.
 * `w.Metastores.Update` now also returns an entity.
* `unitycatalog.CreateMetastoreAssignment.WorkspaceId` type changed from
`int` to `int64`.
* `unitycatalog.UnassignRequest.WorkspaceId` type changed from `int` to
`int64`.
 * `w.Catalogs.Update` now also returns an entity.
 * `w.Schemas.Update` now also returns an entity.
 * `w.Providers.Update` now also returns an entity.
 * `w.Shares.Update` now also returns an entity.
* `WarehousesAPI` service moved to
`github.com/databricks/databricks-sdk-go/service/sql` package.
* `w.Warehouses.CreateWarehouseAndWait` renamed to
`w.Warehouses.CreateAndWait`.
* `w.Warehouses.DeleteWarehouseByIdAndWait` renamed to
`w.Warehouses.DeleteByIdAndWait`.
 * `w.Warehouses.EditWarehouse` renamed to `w.Warehouses.Edit`.
 * `w.Warehouses.GetWarehouseById` renamed to `w.Warehouses.GetById`.
 * `w.Warehouses.ListWarehousesAll` renamed to `w.Warehouses.ListAll`.
* Removed `w.Dbfs.Overwrite` in favor of `w.Dbfs.Open("....",
dbfs.FileModeOverwrite|dbfs.FileModeWrite)`.
 * Added third required argument to `w.Dbfs.Open`.

Code generation:

* Added concept of `main` service for the package
([#239](#239)).
* Added entity primitives check
([#242](#242)).
* Added helpers for CRUD generation
([#246](#246)).
* Added more entity-generation utils
([#257](#257)).
* Dynamically generate `.gitattributes`
([#244](#244)).
* Fixed required order fields
([#245](#245)).
* Parse summary from descriptions
([#228](#228)).
* Print error on formatter failure
([#235](#235)).
* Update usage string in generator
([#260](#260)).
* Fixed order of host completion
([#233](#233)).

Dependency updates:

* Bump google.golang.org/api from 0.103.0 to 0.105.0
([#232](#232),
[#252](#252)).