Data-Pipeline-Using-Automation-Tool-Make

In this repo, I demonstrate how an automation platform like Make or Zapier can be used to build a data pipeline.

Data Pipeline for CDC.gov Dataset using Make.com

Make Workflow Screenshot

This project demonstrates an automated data pipeline built using Make.com to fetch, clean, transform, and load data from the CDC.gov Public Health Emergency Response Data Catalog (link).

Project Overview

This pipeline showcases the capabilities of Make.com to orchestrate data processing tasks. It utilizes various modules to:

  • Fetch data: An HTTP module retrieves data from the CDC.gov API with authentication.
  • Clean and Transform data: An AWS Lambda function cleanses and transforms the retrieved data (a minimal sketch follows this list).
  • Parse and Convert JSON: A JSON module parses the processed JSON data returned by the Lambda function and converts it into an array.
  • Iterate and Insert: A flow control module iterates through each data row and inserts it into an Amazon Redshift data warehouse using a Redshift module.
  • Data Quality Check: Another Redshift module performs basic null checks on the loaded data.
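
To make the Lambda step concrete, here is a minimal sketch of a cleaning/transformation handler. The field names used (date, location, doses_administered) are hypothetical placeholders, not the actual schema of the CDC dataset; adapt them to the columns your HTTP module fetches.

```python
import json

def lambda_handler(event, context):
    """Clean and transform raw CDC rows passed in by the Make.com workflow.

    Assumes the event carries a JSON array of records; the field names
    below are hypothetical and must be adapted to the real dataset schema.
    """
    body = event.get("body")
    raw_rows = json.loads(body) if isinstance(body, str) else event.get("rows", [])

    cleaned = []
    for row in raw_rows:
        # Drop rows that are missing required fields.
        if not row.get("date") or not row.get("location"):
            continue
        cleaned.append({
            "report_date": row["date"][:10],              # normalize to YYYY-MM-DD
            "location": row["location"].strip().upper(),  # normalize location codes
            "doses_administered": int(float(row.get("doses_administered") or 0)),
        })

    # Return JSON so the downstream JSON module can parse it into an array.
    return {"statusCode": 200, "body": json.dumps(cleaned)}
```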

Architecture Diagram

CDC Health Vaccine Data Pipeline (architecture diagram)

Make Workflow Diagram

Make.com workflow screenshot

Demo Video

A demo video can be found here:

https://www.loom.com/share/d3b24d6d501948fcaee1d11a1406b0cd?sid=8a89999d-151e-4521-b2ba-85d6108d5ff8

Scenarios

This data pipeline can be used in various scenarios where data needs to be automatically fetched, processed, and stored in a data warehouse. Here are some potential applications:

  • Public health data analysis: The pipeline can be adapted to fetch and analyze other relevant datasets from the CDC.gov API, aiding public health professionals in monitoring and understanding disease trends.
  • Real-time data monitoring: The pipeline can be configured to run on a schedule, ensuring the Redshift data warehouse is kept up-to-date with the latest information.
  • Data integration pipelines: This example can serve as a building block for more complex data integration pipelines involving multiple data sources and transformations.

Technologies Used

  • Make.com: A visual automation platform used to orchestrate the data pipeline.
    • HTTP Module: Makes HTTP requests to fetch and push data.
    • JSON Module: Parses and manipulates JSON data.
    • Array Aggregator Module: Converts JSON data into an array.
    • Flow Control/Iterator Module: Iterates through data elements.
  • Amazon Web Services
    • AWS Lambda: Serverless function for data cleaning and transformation.
    • Amazon Redshift: A data warehouse that serves as the data store for transformation and analytics (the sketch after this list shows a row insert and a basic null check).
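
For readers who want to see what the Redshift load and quality-check steps boil down to, here is a minimal sketch using psycopg2 (Redshift speaks the PostgreSQL wire protocol). The table and column names are hypothetical, and in the actual pipeline these steps are performed by Make.com's Redshift modules rather than hand-written code.

```python
import psycopg2

# Connection details are placeholders; use your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="pipeline_user",
    password="...",
)

row = {"report_date": "2024-01-15", "location": "NY", "doses_administered": 1200}

with conn, conn.cursor() as cur:
    # Insert one cleaned row (the iterator module does this once per row).
    cur.execute(
        "INSERT INTO cdc_vaccine_data (report_date, location, doses_administered) "
        "VALUES (%s, %s, %s)",
        (row["report_date"], row["location"], row["doses_administered"]),
    )

    # Basic data quality check: count rows with NULLs in required columns.
    cur.execute(
        "SELECT COUNT(*) FROM cdc_vaccine_data "
        "WHERE report_date IS NULL OR location IS NULL"
    )
    null_rows = cur.fetchone()[0]
    print(f"Rows failing null check: {null_rows}")
```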

Getting Started

Prerequisites:

  • A Make.com account
  • An AWS account with a configured Lambda function
  • Access credentials for the CDC.gov API

Instructions/What's in this repo:

This repository doesn't provide step-by-step instructions. Instead, it acts as a guide to demonstrate what's possible with an automation platform, which should be more cost-effective than a full-fledged no-code data platform like Talend or Estuary.

  1. This repository contains architecture diagrams to serve as a guide.
  2. Configure the Make.com workflow with your specific credentials and settings.
  3. Customize the Lambda function logic for data cleaning and transformation as needed. The code I used is in the code folder.
  4. Schedule the workflow on Make.com to run at an interval, e.g., every hour. You can also use system variables to calculate values such as the offset and limit for the API calls (see the pagination sketch after this list).
  5. Deploy the Make.com workflow and start automating your data pipeline.
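
To illustrate the offset/limit idea in step 4, here is a minimal sketch of paginated fetching in Python. It assumes the dataset is exposed through the Socrata SODA API (as data.cdc.gov datasets generally are), which accepts $limit and $offset query parameters and an X-App-Token header; the dataset identifier and token shown are placeholders. In the actual pipeline, Make.com's HTTP module and system variables play this role.

```python
import requests

# Placeholder dataset identifier; substitute the real one from the CDC catalog.
BASE_URL = "https://data.cdc.gov/resource/xxxx-xxxx.json"
APP_TOKEN = "YOUR_SOCRATA_APP_TOKEN"  # credential placeholder

PAGE_SIZE = 1000
offset = 0

while True:
    # SODA API pagination: $limit caps the page size, $offset skips prior rows.
    resp = requests.get(
        BASE_URL,
        params={"$limit": PAGE_SIZE, "$offset": offset},
        headers={"X-App-Token": APP_TOKEN},
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()
    if not page:
        break  # no more rows

    # Hand the page off to the cleaning/loading steps here.
    print(f"Fetched {len(page)} rows at offset {offset}")
    offset += PAGE_SIZE
```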

Additional Notes

  • This example provides a basic framework for building a data pipeline using Make.com.
  • You can extend it to incorporate more complex data processing logic and data quality checks.
  • Refer to the Make.com documentation for details on specific modules and their functionalities.

This project serves as a starting point for building automated data pipelines using Make.com. Feel free to modify and adapt it to your specific data processing needs.
