{"id":8467,"date":"2024-08-13T07:00:36","date_gmt":"2024-08-13T14:00:36","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=8467"},"modified":"2024-08-13T08:37:42","modified_gmt":"2024-08-13T15:37:42","slug":"mongo-migrations-spark-databricks","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/mongo-migrations-spark-databricks\/","title":{"rendered":"How to migrate MongoDB to Azure Cosmos DB for MongoDB using Spark and Databricks"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/how_to_migrate_spark.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-8500\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/how_to_migrate_spark.jpg\" alt=\"How to migrate MongoDB to Azure Cosmos DB using Spark and Databricks\" width=\"960\" height=\"540\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/how_to_migrate_spark.jpg 960w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/how_to_migrate_spark-300x169.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/how_to_migrate_spark-768x432.jpg 768w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><\/a><\/p>\n<p><span data-contrast=\"none\">MongoDB is a popular document database that offers high performance, scalability, and flexibility. Many organizations use MongoDB to store and process large volumes of data for various applications. 
However, managing and maintaining MongoDB clusters can be challenging and costly, especially as the data grows and the demand increases.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Azure Cosmos DB for MongoDB, particularly the vCore-based model, offers several advantages over traditional MongoDB. It provides a fully managed, globally distributed database service with a 99.99% high availability SLA, ensuring robust performance and reliability. The vCore-based model supports high-capacity vertical and horizontal scaling, making it ideal for workloads with long-running queries, complex aggregation pipelines, and distributed transactions. Additionally, it integrates seamlessly with the Azure ecosystem, allowing developers to leverage Azure\u2019s security features and other services without needing to adapt to new tools. This makes vCore-based Azure Cosmos DB for MongoDB a compelling choice for scalable, secure, and efficient database solutions. By migrating MongoDB to vCore-based Azure Cosmos DB for MongoDB, you can reduce the operational overhead, improve the availability and reliability, and leverage the native Azure integration and features.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">There are several methods to migrate MongoDB to Azure Cosmos DB, tailored to your specific needs and preferences. For smaller datasets (less than 10GB), you can use MongoDB\u2019s native tools like\u00a0mongoimport\u00a0and\u00a0mongorestore. 
These commands are straightforward and effective for basic migrations.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">For larger datasets, the <a href=\"https:\/\/learn.microsoft.com\/azure-data-studio\/extensions\/database-migration-for-mongo-extension\" target=\"_blank\" rel=\"noopener\">Azure Data Studio (ADS) extension<\/a> offers a user-friendly, self-service tool that supports both online and offline migrations. While the ADS extension is suitable for most scenarios, it has some limitations, such as handling complex migration requirements and private endpoint situations.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">If you need a flexible solution which also works in private endpoint situations, consider using the Spark-based Mongo Migration tool on Databricks. 
Additionally, it allows you to control the migration speed and parallelism and customize configuration settings to meet your specific needs.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<h3><span class=\"TextRun SCXW242486441 BCX8\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"none\"><span class=\"NormalTextRun SCXW242486441 BCX8\" data-ccp-parastyle=\"heading 2\">A new tool for complex and secure migrations<\/span><\/span><span class=\"EOP SCXW242486441 BCX8\" data-ccp-props=\"{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:320,&quot;335559740&quot;:240,&quot;335572071&quot;:24,&quot;335572072&quot;:18,&quot;335572073&quot;:4270094,&quot;469789798&quot;:&quot;single&quot;}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"none\">The Spark-based MongoDB Migration tool is a JAR application that uses the Spark MongoDB Connector and the Azure Cosmos DB Spark Connector to read data from MongoDB and write data to vCore-based Azure Cosmos DB for MongoDB. It can be deployed in your Databricks cluster and virtual network, and run as a Databricks job. 
The tool's GitHub repository provides the necessary binary files, sample configuration files, and detailed instructions on how to use the tool.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<h6><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration.png\"><img decoding=\"async\" class=\"alignleft size-large wp-image-8473\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-1024x416.png\" alt=\"Image CosmosSparkMigration\" width=\"640\" height=\"260\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-1024x416.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-300x122.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-768x312.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-1536x624.png 1536w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2024\/08\/CosmosSparkMigration-2048x832.png 2048w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/h6>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><span data-contrast=\"none\">The Spark-based MongoDB Migration tool has the following features and benefits:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<ul>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" 
data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It supports both online and offline migrations, with the option to resume the migration from the last checkpoint.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It can migrate data from MongoDB Atlas, on-premises MongoDB, or Amazon DocumentDB.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It can migrate to both vCore and RU offerings of Azure Cosmos DB for MongoDB.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It can 
migrate data to vCore-based Azure Cosmos DB for MongoDB with private endpoints and virtual networks.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It can handle index conflicts during the migration.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It can perform data transformations and generate unique IDs.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It handles and logs errors, skipping invalid documents and recording the errors and warnings.<\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"5\" 
data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">It lets you control the speed and parallelism of the migration by setting the batch size, the number of partitions, and the number of threads.<\/li>\n<\/ul>\n<p><span data-contrast=\"none\">Follow these steps to use the Spark-based MongoDB Migration tool to migrate MongoDB to vCore-based Azure Cosmos DB for MongoDB:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<ol>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\"><a href=\"https:\/\/forms.office.com\/r\/cLSRNugFSp\" target=\"_blank\" rel=\"noopener\">Sign up for Azure Cosmos DB for MongoDB Spark Migration<\/a>\u202fto gain access to the <a href=\"https:\/\/github.com\/AzureCosmosDB\/MongoMigrationSparkBasedUtility\" target=\"_blank\" rel=\"noopener\">Spark Migration Tool GitHub repository<\/a>.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Download the JAR file and the sample configuration files 
from the Spark Migration Tool GitHub repository.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Edit the configuration files according to your source and target database settings and your migration preferences.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Upload the binary files and the configuration files to your Databricks cluster.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Create a Databricks job and configure it to run the Spark-based MongoDB Migration tool with the configuration files as arguments.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Run the Databricks job and monitor the migration progress and 
status.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"\" data-listid=\"6\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">Verify the migration results and troubleshoot any issues.<\/li>\n<\/ol>\n<p><span data-contrast=\"none\">The Spark-based MongoDB Migration tool is a powerful way to migrate MongoDB to vCore-based Azure Cosmos DB for MongoDB. It handles complex and secure migration scenarios and gives you more control and flexibility over the migration process. <\/span><a href=\"https:\/\/forms.office.com\/r\/cLSRNugFSp\" target=\"_blank\" rel=\"noopener\"><span data-contrast=\"none\">Sign up for Azure Cosmos DB for MongoDB Spark Migration<\/span><\/a><span data-contrast=\"none\">\u202fto gain access to the Spark Migration Tool GitHub repository.\u202f<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:312}\">\u00a0<\/span><\/p>\n<h2>About Azure Cosmos DB<\/h2>\n<p>Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra.\u00a0<a href=\"https:\/\/cosmos.azure.com\/try\/\" target=\"_blank\" rel=\"noopener\">Try Azure Cosmos DB for free here.<\/a>\u00a0To stay in the loop on Azure Cosmos DB 
updates, follow us on\u00a0<a href=\"https:\/\/twitter.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">X<\/a>,\u00a0<a href=\"https:\/\/aka.ms\/AzureCosmosDBYouTube\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/www.linkedin.com\/company\/azure-cosmos-db\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MongoDB is a popular document database that offers high performance, scalability, and flexibility. Many organizations use MongoDB to store and process large volumes of data for various applications. However, managing and maintaining MongoDB clusters can be challenging and costly, especially as the data grows and the demand increases.\u00a0 Azure Cosmos DB for MongoDB, particularly the [&hellip;]<\/p>\n","protected":false},"author":96034,"featured_media":8500,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[15,996,1778],"tags":[],"class_list":["post-8467","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mongodb-api","category-migration","category-spark"],"acf":[],"blog_post_summary":"<p>MongoDB is a popular document database that offers high performance, scalability, and flexibility. Many organizations use MongoDB to store and process large volumes of data for various applications. 
However, managing and maintaining MongoDB clusters can be challenging and costly, especially as the data grows and the demand increases.\u00a0 Azure Cosmos DB for MongoDB, particularly the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/8467","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/96034"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=8467"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/8467\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/8500"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=8467"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=8467"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=8467"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}