Cosmos Scale Testing #40611

Merged
tvaron3 merged 210 commits into Azure:main from tvaron3:sdk_scale_testing
Apr 22, 2025

@tvaron3 tvaron3 commented Apr 18, 2025

Description

Easy-to-run scripts for Cosmos DB scale testing.

@tvaron3 tvaron3 marked this pull request as ready for review April 21, 2025 20:03
Copilot AI review requested due to automatic review settings April 21, 2025 20:03
@tvaron3 tvaron3 requested a review from a team as a code owner April 21, 2025 20:03

Copilot AI left a comment

Pull Request Overview

This PR introduces a suite of new workloads and configurations for Cosmos DB scale testing, including asynchronous utilities and proxy configuration setups. Key changes include:

  • New asynchronous utility functions (upsert, read, query) in workload_utils.py.
  • Multiple workload scripts for both asynchronous and synchronous testing of Cosmos DB operations.
  • Configuration and setup files, including envoy proxy configurations and workload metadata.
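
As a rough illustration of the concurrent upsert and read pattern the overview describes, here is a minimal sketch. It is not the code from workload_utils.py: `FakeContainer` is a hypothetical in-memory stand-in so the example runs without a Cosmos DB account, and the helper names `upsert_items_concurrently` and `read_items_concurrently` are assumptions. Only the `upsert_item` and `read_item` method names mirror the async container client in azure-cosmos.

```python
import asyncio


class FakeContainer:
    """Hypothetical in-memory stand-in for an async Cosmos container client."""

    def __init__(self):
        self._items = {}

    async def upsert_item(self, body):
        # Yield to the event loop, as a real network call would.
        await asyncio.sleep(0)
        self._items[body["id"]] = body
        return body

    async def read_item(self, item, partition_key):
        # partition_key is accepted to mirror the real call shape; the fake ignores it.
        await asyncio.sleep(0)
        return self._items[item]


async def upsert_items_concurrently(container, count):
    """Schedule `count` upserts at once and wait for all of them."""
    tasks = [
        container.upsert_item({"id": f"item-{i}", "pk": f"pk-{i}"})
        for i in range(count)
    ]
    # gather returns results in the same order the awaitables were passed in.
    return await asyncio.gather(*tasks)


async def read_items_concurrently(container, count):
    """Schedule `count` point reads at once and wait for all of them."""
    tasks = [
        container.read_item(f"item-{i}", partition_key=f"pk-{i}")
        for i in range(count)
    ]
    return await asyncio.gather(*tasks)


async def main():
    container = FakeContainer()
    written = await upsert_items_concurrently(container, 5)
    read_back = await read_items_concurrently(container, 5)
    print(len(written), len(read_back))


if __name__ == "__main__":
    asyncio.run(main())
```

Swapping `FakeContainer` for a real async container client should keep the helpers unchanged, since they only rely on the two awaited methods.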

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/tests/workloads/workload_utils.py | New async utility functions for concurrent database operations. |
| sdk/cosmos/azure-cosmos/tests/workloads/workload_configs.py | Added configuration variables for Cosmos DB and proxy URI. |
| sdk/cosmos/azure-cosmos/tests/workloads/w_workload.py | Async workload using the new utilities and logger initialization. |
| sdk/cosmos/azure-cosmos/tests/workloads/w_proxy_workload.py | Async workload that uses a proxy session for Cosmos DB operations. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_workload.py | Async workload for reading and querying items concurrently. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_workload_sync.py | Synchronous version of workload testing using Cosmos DB client. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_workload.py | Async workload combining upsert, read, and query operations. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_with_incorrect_client_workload.py | Similar async workload using an incorrect client instantiation pattern. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_proxy_workload.py | Async workload that tests Cosmos DB operations via a proxy with async client. |
| sdk/cosmos/azure-cosmos/tests/workloads/r_proxy_workload.py | Async workload for reading and querying items via a proxy. |
| sdk/cosmos/azure-cosmos/tests/workloads/initial-setup.py | Async script to setup the database, container, and perform bulk upserts. |
| sdk/cosmos/azure-cosmos/tests/workloads/get-database-account-call.py | Async script to retrieve and log database account information. |
| sdk/cosmos/azure-cosmos/tests/workloads/envoy/*.yaml | Envoy proxy configuration files for simple and complex setups. |
| sdk/cosmos/azure-cosmos/tests/workloads/README.md | Documentation for running scale testing workloads and setting up the environment. |
Files not reviewed (2)
  • sdk/cosmos/azure-cosmos/tests/workloads/run_workloads.sh: Language not supported
  • sdk/cosmos/azure-cosmos/tests/workloads/shutdown_workloads.sh: Language not supported


@allenkim0129 allenkim0129 left a comment

LGTM

@tvaron3 tvaron3 enabled auto-merge (squash) April 22, 2025 18:38
@tvaron3 tvaron3 merged commit 4b4e38e into Azure:main Apr 22, 2025
18 checks passed
cRui861 pushed a commit that referenced this pull request May 14, 2025
* background call for get database account call

* only call get database account in health check for different endpoints

* Change health check logic in sync

* Revert removing timing

* update changelog

* fix tests

* use asyncio create_task

* fix tests

* fix pylint

* Renamed variables and added effective preferred locations

* update changelog

* Add test for effective preferred regions

* Add test for effective preferred regions

* sync test for preferred regions

* Renaming and add test for health check

* fix tests

* fix tests and add more health check tests

* fix tests

* add tests

* fix tests

* Move to breaking change

* Added scale workload scripts

* Scale testing upserts

* fix cspell and tests

* fix tests

* fix initial upserts

* fix tests

* revert preferred locations

* timeout mark unavailable

* moving health check to background

* only finish health check after 2 successes

* fix tests

* add tests

* fix and add timeout tests for marking endpoints unavailable

* Reacting to comments

* Reacting to comments

* Add cleanup of background task when cosmos client closes

* React to comments

* Remove consecutive failures code

* updated changelog

* Fix test

* Fix tests that weren't awaiting properly

* Add a sync workload and fix initial setup and separate client configs to separate file

* fix imports

* fix tests

* update proxy

* change unavailable time to 7 minutes

* remove marking unavailable, health check four regions, health check primary and alternate

* fix type hints

* logs to debug

* logs to debug

* fix startup scenario

* debug logs

* fix updating cache

* adding logs

* fix operation type check unavailable

* add logic to check by endpoint and not by regional routing context

* bash script to run workloads

* adding more workloads

* fix tests

* fix tests

* lower write workloads to use

* fix read envoy workloads

* increase read concurrency

* fix test

* more read workloads

* more read workloads

* fix tests

* fix tests

* add smaller lag for some writes and reads

* debug multimaster test

* cleanup

* mark global endpoint available

* fix tests and only use endpoint from gateway for multimaster

* balance read and write workloads

* retry multi region writes same as reads

* react to comments

* multi write fix

* debug logs

* multiwrite config

* fix multimaster workloads

* cleanup / pylint

* refactor workloads

* fix read proxy workload

* swallow exceptions and add mocking

* fix import

* fix import

* fix mocking

* fix mocking

* fix mocking

* logging improvements for timeout and failover retry policy

* debugging logs

* fix mock

* fix mock

* improve mock and adding more logs

* service request instead of response

* more logs

* cleanup logs

* remove import

* more logs and fix mock

* fix logs

* more logs for debugging

* more logs for debugging

* fix mocking

* fix mocking

* fix mocking

* add envoy

* update envoy and remove mocking

* envoy changes

* fix workloads

* lower sleep time between concurrent operations

* remove sleeps

* increase concurrency

* increase default concurrency

* remove excessive logs

* change proxy workloads

* change proxy file

* change proxy file

* change proxy file

* change proxy file

* useful logs

* change envoy file

* change envoy file

* change envoy file

* change envoy file

* change envoy file

* change envoy file

* change envoy file

* Added envoy open ai simple file

* update simple

* Updated scale testing workloads for ycsb benchmark

* Added run workload script without proxy

* Added more workloads

* Fixed the run workloads script

* Fixed one more script

* Reduced the clients to below 60

* Updated file handler to rotating file handler

* Logging headers for testing

* Skipping session token for writes

* Skipping session token for writes

* commented out request / response headers

* Updated file handler to rotating file handler with 10 mb max

* Updated file handler to rotating file handler with 10 mb max

* Fixed rotating file handler for every workload script

* Updated test directory to tests

* Updated test directory to tests and added all files

* Updated test directory to tests and added workloads

* Fixed envoy proxy code

* Container and Database is configurable now

* Container and Database is configurable fix

* ru change to create container and incorrect client workload

* add None etag and match_condition edge case

* Use process id for log files and refactor common methods across workloads

* fix initial-setup.py

* fix logger naming

* fix logger naming

* fix workloads and envoy file

* improve workload scripts

* improve workload scripts

* test improvements

* cleanup

* envoy changes

* envoy changes

* revert changes

* Update sdk/cosmos/azure-cosmos/tests/workloads/w_workload.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/initial-setup.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/get-database-account-call.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/r_proxy_workload.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_with_incorrect_client_workload.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/r_workload.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/r_w_q_workload.py

Co-authored-by: Copilot <[email protected]>

* Update sdk/cosmos/azure-cosmos/tests/workloads/w_proxy_workload.py

Co-authored-by: Copilot <[email protected]>

* fix analyze and add licensing

* fix analyze

* react to comments

* fix imports

* make shutdown executable

* fix import

* fix workloads

* fix imports

* fix shutdown

---------

Co-authored-by: Kushagra Thapar <[email protected]>
Co-authored-by: Tomas Varon <[email protected]>
Co-authored-by: Ubuntu <openai-test@openai-testing-vm1.q2mff55cv1bu1hrxqay0qbmh1g.cbnx.internal.cloudapp.net>
Co-authored-by: Simon Moreno <[email protected]>
Co-authored-by: Copilot <[email protected]>
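
The commit list above mentions moving to a rotating file handler with a 10 MB cap and using the process id for log files. The sketch below shows one way that could look with the standard library only. The function name `create_logger`, the `<script>-<pid>.log` naming scheme, and the backup count are assumptions for illustration, not the code merged in this PR.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

TEN_MB = 10 * 1024 * 1024


def create_logger(script_name, log_dir=".", max_bytes=TEN_MB, backup_count=2):
    """Return (logger, log_path) with one rotating log file per process.

    The naming scheme and backup_count are assumptions for this sketch.
    """
    os.makedirs(log_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(script_name))[0]
    # Including the process id keeps concurrent workload processes
    # from writing to the same file.
    log_path = os.path.join(log_dir, f"{stem}-{os.getpid()}.log")

    logger = logging.getLogger(f"{stem}-{os.getpid()}")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, log_path


if __name__ == "__main__":
    logger, path = create_logger("w_workload.py")
    logger.info("workload started")
    print(path)
```

Once a file reaches `max_bytes`, `RotatingFileHandler` renames it with a numeric suffix and starts a new one, so disk usage per process stays bounded at roughly `max_bytes * (backup_count + 1)`.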

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Scale up test workload for Cosmos Python SDK for DR Drill and performance testing

6 participants