Data Processing Engine

An OVHcloud Data Platform service that automates your data integration and transformation tasks for production ETL/ELT workflows.

Automate your data processing and transformations

Process

Run batch processes to extract, transform, and load data from your sources to their destinations.

Automate

Use a low-code interface to create workflows, then automate and schedule tasks to run independently.

Develop

Code and run any custom Python or PySpark script, as in the sketch below; harness a complete SDK with multiple connectors.

Iterate

Organise and version your code via Git integration and native version control.
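
To make the Develop capability concrete, here is a minimal sketch of the kind of custom PySpark script the engine can run. The bucket paths and column names are hypothetical, and the SDK's connectors are not shown.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("custom-transform").getOrCreate()

# Hypothetical input: raw orders exported as CSV to object storage.
orders = spark.read.option("header", True).csv("s3a://my-bucket/raw/orders.csv")

# Clean and aggregate before loading the result to its destination.
daily_revenue = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

daily_revenue.write.mode("overwrite").parquet("s3a://my-bucket/curated/daily_revenue")
```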

Accelerate your Data & Analytics projects

Need to deploy, manage and scale your data projects and applications very quickly and easily? Your teams, whether business analysts, data engineers, or front-end developers, can take advantage of a unified, collaborative and secure platform to work more efficiently. Using open-source technologies such as Apache Spark, Iceberg and Trino, the OVHcloud Data Platform gives you access to your data integration, storage and retrieval services in the same environment.
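
As an illustration of how those technologies fit together, the sketch below reads an Apache Iceberg table with PySpark. It assumes the Iceberg Spark runtime is on the classpath; the catalog name, warehouse path and table are placeholders, not OVHcloud defaults.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-read-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# Query the Iceberg table through the "demo" catalog; an engine such as
# Trino could read the same table from its own catalog configuration.
events = spark.table("demo.analytics.events")
events.groupBy("event_type").count().show()
```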

Control your Data

Create and customise your processing tasks

Connect to any data source. Use an extensive catalogue of pre-defined job templates to perform tasks such as extracting, loading, aggregating and cleaning data, and updating metadata. Code and run any custom script in Python or PySpark, backed by a comprehensive SDK with over 40 connectors. If you already have Python data processing scripts, simply import them to centralise and orchestrate them in the Data Platform.
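
For instance, a self-contained Python script like this one can be imported and scheduled as a task. The file paths and cleaning rules are hypothetical; in practice, connectors from the SDK would replace the hard-coded paths.

```python
import pandas as pd

def clean_customers(src: str, dst: str) -> None:
    """Deduplicate records and normalise emails before loading downstream."""
    df = pd.read_csv(src)
    df = df.drop_duplicates(subset="customer_id")
    df["email"] = df["email"].str.strip().str.lower()
    df.to_parquet(dst, index=False)

if __name__ == "__main__":
    clean_customers("raw/customers.csv", "clean/customers.parquet")
```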

Manage your packages and dependencies using custom actions: create your own libraries and reuse them across projects. The Data Processing Engine offers two version control systems to protect critical workloads in production. With Data Platform version control, you can track version changes and sync with any external Git repository.
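
The reuse pattern might look like the following: a small module, versioned in Git, that several jobs import. The module name and layout are illustrative, not a documented platform convention.

```python
# mylib/transforms.py - shared helpers reused across several jobs.
import pandas as pd

def normalise_dates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Parse mixed-format date strings into a single ISO format."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], errors="coerce").dt.strftime("%Y-%m-%d")
    return out
```

A job would then simply import normalise_dates, and rolling out a new library version is a matter of updating the Git reference the jobs track.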

Define and orchestrate your workflows

Sort, sequence and schedule your tasks, and manage your resources, scaling them up with workers you control when necessary. A simple authoring interface with drag-and-drop features lets you view and run your projects in the cloud, whether or not you have the technical know-how to manage a cloud infrastructure. Set up triggers, such as CRON triggers, to automate your jobs so they run on schedule.
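
CRON triggers use the standard five-field cron syntax. The snippet below shows how such an expression reads; the constant name is just for illustration, not the platform's configuration schema.

```python
# Fields: minute hour day-of-month month day-of-week.
# "0 6 * * 1-5" fires at 06:00 on weekdays, so a nightly load
# is refreshed before business hours each working day.
DAILY_LOAD_TRIGGER = "0 6 * * 1-5"
```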

Run and scale your pipelines in the cloud

Launch actions and complete workflows as jobs in a single API call. The Data Processing Engine includes two engines: a Pandas engine (in Python 3) optimised for smaller data processing tasks, and a Spark engine (in PySpark) for massive workloads.

Scale your jobs horizontally and vertically for faster execution, using OVHcloud computing resources. Take advantage of segmentation tools to split tasks and speed up processing. Use perimeter options to include or exclude data points beyond a given scope.
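
To make the two engines concrete, here is the same aggregation written for each; file paths and column names are hypothetical. The repartition call in the Spark variant illustrates the kind of horizontal split that segmentation relies on.

```python
# Pandas engine (Python 3): suited to datasets that fit in a single worker's memory.
import pandas as pd

sales = pd.read_parquet("curated/sales.parquet")
by_region = sales.groupby("region", as_index=False)["amount"].sum()

# Spark engine (PySpark): the same aggregation, distributed for massive workloads.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-by-region").getOrCreate()
by_region_df = (
    spark.read.parquet("curated/sales.parquet")
    .repartition(16, "region")  # split the work across workers (horizontal scaling)
    .groupBy("region")
    .agg(F.sum("amount").alias("amount"))
)
by_region_df.write.mode("overwrite").parquet("curated/sales_by_region")
```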

Apache Spark™ and its logos are a registered trademark of the Apache Software Foundation. OVH SAS and its subsidiaries are not affiliated with or endorsed by the Apache Software Foundation.

Monitor job execution and performance

View detailed reports on completed jobs, including worker CPU and RAM usage over time, along with the corresponding logs. Test and validate your jobs, and maximise their resource efficiency by using checkpoints in your workflows.
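
Workflow checkpoints are a platform feature, but plain PySpark offers a close analogue worth sketching: DataFrame.checkpoint() materialises intermediate results and truncates the lineage, so a re-run can resume from that point instead of recomputing everything. Paths here are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-sketch").getOrCreate()
# Checkpoints need a reliable directory for the materialised intermediate data.
spark.sparkContext.setCheckpointDir("tmp/checkpoints")

events = spark.read.parquet("curated/events.parquet")
valid = events.filter("status = 'ok'")  # imagine a long chain of transformations here
valid = valid.checkpoint()              # cut the lineage; later stages restart from here
valid.groupBy("event_type").count().show()
```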

Get notified when a task completes or fails, with duration and RAM usage details, by setting up task alerts in the Data Platform Control Centre. Control access precisely with the Data Platform Identity and Access Manager (IAM).