Master the skills to become a highly effective data engineer with the modern data stack in 16 weeks

Topic 01
This section is designed not only to introduce you to the basics of ETL but also to equip you with hands-on experience using Python, one of the most versatile and widely used programming languages in the data engineering field. You’ll learn through practical examples, using tools and libraries that are vital for any aspiring data engineer. This foundation is crucial, as it supports advanced topics and tools such as Airflow for data orchestration and Airbyte for data integration, which you will encounter later in your data engineering career.
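As a taste of what’s ahead, here is a minimal ETL sketch in plain Python; the file names and field names are hypothetical, but the extract-transform-load shape is exactly what you’ll practice in this topic:

```python
import csv

def extract(path: str) -> list[dict]:
    """Extract: read raw rows from a CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: clean names and parse amounts, dropping blank values."""
    return [
        {"name": row["name"].strip().title(), "amount": float(row["amount"])}
        for row in rows
        if row["amount"]  # skip rows with a missing amount
    ]

def load(rows: list[dict], path: str) -> None:
    """Load: write the cleaned rows to a destination file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "amount"])
        writer.writeheader()
        writer.writerows(rows)

# Chain the three stages together into a pipeline run.
load(transform(extract("raw_orders.csv")), "clean_orders.csv")
```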
Topic 1 serves as the bedrock upon which the art and science of data engineering are built. By mastering ETL processes, you gain the ability to handle data efficiently, from extraction through transformation to loading it into a usable format. This foundational knowledge is not only critical for tackling more advanced data engineering challenges but also immensely valuable in a data-driven world.
The skills you acquire here, from managing Python environments to implementing sophisticated data transformations and automation, will prepare you for a successful career in data engineering. You’ll be able to design robust, scalable ETL pipelines that can handle the complexities of modern data ecosystems, making you an asset to any organization and propelling you to the forefront of the industry.
Topic 02
In this section of our data engineering bootcamp, we explore the Extract, Load, Transform (ELT) process, a methodology that has gained popularity with the rise of cloud technologies. This topic will not only broaden your understanding of data engineering concepts but also equip you with the practical skills needed to excel in this dynamic field.
In the realm of data engineering, mastering the ELT process represents a crucial competency, particularly in an era dominated by cloud computing and big data. This curriculum section not only equips you with the theoretical knowledge needed to understand the ELT framework but also provides hands-on experience with the tools and technologies currently shaping the industry. From learning how to efficiently extract and load data to performing complex transformations within databases, this topic ensures a comprehensive understanding of modern data engineering practices.
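To make the ELT pattern concrete before you touch cloud tooling, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for a cloud warehouse (the table and column names are illustrative): raw data is loaded first, untouched, and the transformation happens inside the database with SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: land the raw data as-is, with no upfront transformation.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("alice", "10.50"), ("bob", "7.25"), ("alice", "3.00")],
)

# Transform: reshape the data inside the database using SQL.
conn.execute("""
    CREATE TABLE orders_by_customer AS
    SELECT customer, SUM(CAST(amount AS REAL)) AS total_amount
    FROM raw_orders
    GROUP BY customer
""")

print(conn.execute("SELECT * FROM orders_by_customer").fetchall())
```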
As you progress through this course, you’ll gain invaluable skills that are highly sought after in the job market. The practical knowledge of Python, SQL, and other tools you’ll acquire here is directly applicable to real-world scenarios, preparing you for a successful career in data engineering. By understanding the intricacies of ELT, you’ll be well-positioned to design and implement efficient data pipelines that can handle the volume, velocity, and variety of today’s data ecosystems. This knowledge not only makes you a valuable asset to any organization but also opens up a pathway to innovation and problem-solving within the vast landscape of data.
Topic 03
In Topic 3 of our data engineering bootcamp, we shift our focus towards the critical phase of productionizing pipelines. This section is designed to equip you with the expertise needed to containerize, build, and deploy ETL pipelines into a production environment, particularly within the cloud. Additionally, we delve into the essentials of code versioning and fostering team collaboration through Git.
As data engineering projects grow in complexity and scale, these skills become indispensable for ensuring that pipelines are not only functional but also maintainable, scalable, and seamlessly integrated into production workflows. By mastering these concepts, you’ll be well-prepared to navigate the challenges of deploying data pipelines in real-world scenarios, making you a valuable asset in the field of data engineering.
Mastering the deployment and management of ETL pipelines in a production environment is a significant milestone in a data engineer’s career. This topic not only introduces you to the technicalities of containerization with Docker and cloud services with AWS but also emphasizes the importance of code versioning and collaboration using Git.
These skills are fundamental in today’s data-driven landscape, where the ability to efficiently deploy, manage, and scale data pipelines is as crucial as the insights derived from the data itself.
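To give a flavor of what containerized deployment looks like in code, the sketch below drives Docker from Python via the docker SDK; it assumes a Dockerfile already exists in the project directory, and the image tag is a placeholder:

```python
import docker  # the Docker SDK for Python (pip install docker)

client = docker.from_env()  # connect to the local Docker daemon

# Build an image for the ETL project from its (assumed) Dockerfile.
image, build_logs = client.images.build(path=".", tag="etl-pipeline:latest")

# Run the pipeline as a container, much as a scheduler or cloud
# service would, and capture its log output.
output = client.containers.run("etl-pipeline:latest", remove=True)
print(output.decode())
```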
By the end of this topic, you’ll have a comprehensive understanding of the tools and practices needed to bring data engineering projects from development to production. This knowledge not only prepares you for the technical aspects of data engineering but also equips you with the collaborative and management skills necessary for working within modern data teams. The ability to productionize pipelines effectively ensures that your data projects are robust, scalable, and aligned with the evolving needs of businesses, positioning you as a key player in the field of data engineering.
Topic 04
In the modern data landscape, businesses are inundated with data from a myriad of sources: Customer Relationship Management (CRM) systems, Order Management Systems (OMS), accounting platforms, marketing tools, and much more. The task of crafting custom Extract and Load (E&L) logic for each of these data sources is not only time-consuming but also prone to inefficiency and errors.
Topic 4 of our data engineering bootcamp introduces a powerful solution to this challenge: Airbyte, an open-source data integration platform that automates the E&L processes, making data integration seamless and scalable.
This section is meticulously designed to provide a deep dive into Airbyte’s capabilities, from understanding its sources, destinations, and connections to mastering data extraction and loading patterns. By the end of this topic, you’ll be equipped with the knowledge to deploy Airbyte in real-world scenarios, significantly enhancing your skills in building efficient, reliable data integration pipelines.
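As one small illustration of that automation, a self-hosted Airbyte instance exposes an HTTP API that pipelines can call; the sketch below is a hedged example that assumes a local deployment exposing the configuration API on port 8000 and a hypothetical connection ID:

```python
import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumed local Airbyte deployment
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical connection

# Trigger a manual sync of an existing source-to-destination connection.
response = requests.post(
    f"{AIRBYTE_URL}/connections/sync",
    json={"connectionId": CONNECTION_ID},
)
response.raise_for_status()
print(response.json())  # job metadata for the sync that was just kicked off
```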
The advent of tools like Airbyte represents a significant leap forward in the field of data engineering, democratizing data integration by providing a uniform platform to connect disparate data sources with minimal manual coding. Topic 4 not only equips you with the practical skills to implement Airbyte for automating data pipelines but also deepens your understanding of modern ELT processes, preparing you for the challenges of handling data in a multi-system environment.
Upon completing this topic, you’ll possess a robust set of skills that are highly sought after in the data engineering domain. The ability to seamlessly integrate data from various sources into coherent, analysis-ready datasets opens up new avenues for insights and decision-making. Your expertise in deploying and managing Airbyte pipelines, especially in cloud environments like AWS, will make you a pivotal asset in any data-driven organization, ready to tackle the complexities of today’s data ecosystem and drive meaningful business outcomes.
Topic 05
As businesses grow and their data volumes expand, the challenge of processing vast amounts of information efficiently becomes paramount. Traditional methods of data processing often hit a bottleneck, unable to cope with the scale and agility required in today’s fast-paced environment.
Topic 5 of our data engineering bootcamp addresses this challenge head-on by introducing students to the world of Analytics Engineering, focusing on two groundbreaking technologies: Snowflake for data storage and analytics, and dbt (data build tool) for transforming data in a more modular and version-controlled manner. This topic is designed to equip you with the advanced skills needed to tackle large-scale data projects, streamlining the transformation process and ensuring that data analytics can be conducted with precision at scale.
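To preview the in-warehouse style of transformation this topic teaches, here is a sketch using the snowflake-connector-python package; the account, credentials, and table names are placeholders, and a dbt model ultimately compiles to SQL run inside Snowflake in much the same way:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connection details are placeholders; real projects read them from env/config.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="STAGING",
)

# The transformation runs inside Snowflake itself, close to the data,
# rather than on the machine issuing the query.
conn.cursor().execute("""
    CREATE OR REPLACE TABLE orders_by_customer AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer_id
""")
```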
The convergence of Snowflake and dbt in the analytics engineering landscape represents a significant evolution in how data teams approach large-scale data transformation and analysis. Through this topic, you’ll gain not only the technical acumen to leverage these powerful tools but also a deeper understanding of their role in modern data engineering practices. Analytics engineering with Snowflake and dbt enables data teams to build more efficient, scalable, and manageable data pipelines, fundamentally changing the speed and efficacy with which businesses can derive insights from their data.
By mastering the concepts and practices taught in this topic, you will be well-equipped to navigate the complexities of large-scale data analytics projects. Your ability to efficiently process and transform data with Snowflake and dbt will make you an invaluable asset to any data-driven organization, ready to tackle the challenges of analytics at scale and drive forward the strategic goals of your business. This expertise not only enhances your career prospects but also positions you at the forefront of data engineering innovation.
Topic 06
In Topic 6 of our data engineering bootcamp, we delve into the crucial concepts of data modelling and semantic modelling, which stand at the heart of making data comprehensible and useful for end-user consumption. This topic is designed to bridge the gap between raw data processing and the delivery of insightful, actionable information suitable for applications in machine learning, business intelligence, and analytics. By applying software engineering principles such as modularity and reusability to data modelling, you will learn to create structured, efficient data models that serve as the foundation for robust analytics.
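As a small illustration of that modularity, the pandas sketch below splits a flat orders extract into a reusable customer dimension and a fact table; the column names are illustrative:

```python
import pandas as pd

# A flat, denormalized export such as one pulled from an operational system.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["Alice", "Bob", "Alice"],
    "customer_country": ["AU", "NZ", "AU"],
    "amount": [10.5, 7.25, 3.0],
})

# Dimension table: one row per customer, with a surrogate key.
dim_customer = (
    orders[["customer_name", "customer_country"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_customer["customer_key"] = dim_customer.index

# Fact table: measures plus a foreign key into the dimension.
fact_orders = orders.merge(dim_customer, on=["customer_name", "customer_country"])
fact_orders = fact_orders[["order_id", "customer_key", "amount"]]
```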
Furthermore, the introduction of a semantic layer atop the data warehouse facilitates intuitive data exploration, enabling users to easily interact with the underlying models. This comprehensive overview will not only enhance your technical skills but also deepen your understanding of how data engineering supports and enhances data-driven decision-making processes.
Data modelling and semantic modelling are pivotal in translating complex data into formats that are readily understandable and usable by end-users. This topic equips you with the methodologies and tools needed to construct effective data models and semantic layers, ensuring that the data processed and stored within your systems can be efficiently analyzed and interpreted.
By the end of this topic, you’ll have a solid grasp of both traditional and modern data modelling techniques, as well as the ability to implement a semantic layer that enhances data accessibility and usability. These skills are indispensable in today’s data-centric world, enabling you to support a wide range of analytics applications and empower decision-makers with the insights needed to drive business success. Your expertise in these areas will not only elevate your value as a data engineer but also contribute significantly to the strategic use of data within any organization.
Topic 07
Topic 7 of our data engineering bootcamp brings you to the cutting edge of big data processing by exploring the Data Lakehouse architecture, utilizing Databricks and Apache Spark. This segment is meticulously crafted to offer a deep dive into the world of scalable data processing, streamlining workflows for data engineering, stream processing, and machine learning. The advent of the Data Lakehouse, supported by technologies like Spark and Databricks, represents a significant leap forward, merging the flexibility of data lakes with the management features of data warehouses.
Through this topic, you’ll learn how Spark’s distributed data processing capabilities, combined with Databricks’ comprehensive ecosystem, enable the handling of vast data volumes efficiently and effectively. This knowledge is crucial for modern data engineers tasked with building scalable, robust data pipelines that can accommodate the exploding volume, velocity, and variety of data in today’s digital landscape.
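Here is a minimal PySpark sketch of that distributed style; the input path and column names are hypothetical, and on Databricks a SparkSession is already provided as spark:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession already exists as `spark`; locally we build one.
spark = SparkSession.builder.appName("events-demo").getOrCreate()

# Read a (hypothetical) Parquet dataset; Spark distributes the work
# across the cluster instead of loading everything on one machine.
events = spark.read.parquet("s3://my-bucket/events/")

daily_counts = (
    events
    .groupBy(F.to_date("event_time").alias("event_date"), "event_type")
    .count()
    .orderBy("event_date")
)
daily_counts.show()
```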
Through the exploration of Databricks and Spark within the Data Lakehouse paradigm, this topic equips you with the skills and knowledge to tackle big data challenges head-on. You’ll learn not only about the technical aspects of data processing at scale but also about ensuring data quality and optimizing performance, which are crucial for delivering actionable insights.
Upon completion of this topic, you’ll have a solid understanding of how to leverage Databricks and Spark in a Data Lakehouse architecture to build scalable, efficient, and reliable data pipelines. This expertise is invaluable in a world where data is continuously growing in importance, enabling you to drive innovation and make data-driven decisions that can significantly impact the success of any organization. Your ability to apply these advanced data engineering techniques will set you apart in the field, preparing you for a rewarding career in data engineering and beyond.
Topic 08
Topic 8 of our data engineering bootcamp transitions focus towards data orchestration with Dagster, an innovative tool that reimagines the orchestration and observability of data pipelines. Dagster is designed to address the complexities of modern data applications, offering a more integrated approach to constructing, executing, and monitoring data workflows. Unlike traditional orchestrators, Dagster emphasizes the development experience and operational robustness, making it an attractive choice for data engineers seeking to streamline their data processes.
This topic aims to equip you with comprehensive knowledge of Dagster’s capabilities, from its intuitive programming model to its operational features, enabling you to build sophisticated, maintainable, and scalable data pipelines that are tightly integrated with your data stack, including tools like Airbyte, dbt, Snowflake, and Databricks.
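As a taste of that programming model, here is a minimal pair of Dagster software-defined assets; the asset logic is a toy placeholder, but the dependency wiring follows Dagster’s asset model:

```python
from dagster import Definitions, asset

@asset
def raw_orders() -> list[dict]:
    """Upstream asset: in a real pipeline this might be landed by Airbyte."""
    return [{"customer": "alice", "amount": 10.5}, {"customer": "bob", "amount": 7.25}]

@asset
def order_totals(raw_orders: list[dict]) -> dict:
    """Downstream asset: Dagster infers the dependency from the argument name."""
    totals: dict[str, float] = {}
    for row in raw_orders:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

# Definitions make the assets visible to Dagster's UI and schedulers.
defs = Definitions(assets=[raw_orders, order_totals])
```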
Through this exploration of Dagster, you’ll discover a holistic approach to data pipeline orchestration that not only simplifies the development and management of complex workflows but also provides superior visibility and control over data operations. Dagster’s emphasis on type safety, asset tracking, and comprehensive observability addresses many of the challenges faced in modern data engineering practices, offering a path to more reliable, maintainable, and scalable data ecosystems.
Upon completing this topic, you’ll possess a solid foundation in orchestrating data workflows with Dagster, prepared to tackle the intricacies of data engineering with confidence. Your ability to leverage Dagster’s advanced features for pipeline construction, execution, and monitoring will make you an invaluable asset in any data-driven organization. Armed with these skills, you’re well-positioned to contribute significantly to the efficiency, reliability, and success of data projects, driving forward the strategic objectives of your organization through effective data orchestration.
Topic 09
Topic 9 of our data engineering bootcamp delves into the dynamic world of streaming analytics, focusing on leveraging Kafka, Confluent, and ClickHouse to harness real-time insights from rapidly moving data. As businesses increasingly rely on timely data for decision-making, understanding and implementing streaming data architectures becomes crucial.
This topic is designed to provide you with a solid foundation in the principles of stream processing, enabling you to deploy Kafka topics on Confluent Cloud and integrate real-time events into ClickHouse for analysis. Additionally, you’ll learn how to transform data within ClickHouse and utilize dbt for defining and testing materialized views, equipping you with the skills needed to build scalable, real-time analytics solutions.
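As a first step into that workflow, here is a minimal event producer using the confluent-kafka Python client; the broker address and topic name are placeholders, and a real Confluent Cloud setup would add API-key authentication settings:

```python
import json

from confluent_kafka import Producer  # pip install confluent-kafka

# Placeholder config; Confluent Cloud also requires API-key auth settings.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    """Called once the broker confirms (or rejects) each message."""
    if err is not None:
        print(f"Delivery failed: {err}")

# Produce a small stream of page-view events to a (hypothetical) topic.
for user_id in ["alice", "bob"]:
    event = {"user_id": user_id, "page": "/home"}
    producer.produce("page_views", value=json.dumps(event), callback=on_delivery)

producer.flush()  # block until all outstanding messages are delivered
```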
Through this comprehensive exploration of streaming analytics with Kafka, Confluent, and ClickHouse, you’ll acquire the capability to build and maintain robust, scalable systems that provide real-time insights into data. This topic not only covers the technical aspects of stream processing technologies but also emphasizes practical applications and best practices for deploying these solutions in real-world scenarios.
Upon completing this topic, you’ll be adept at navigating the complexities of streaming data, from ingestion with Kafka and Confluent to analysis and visualization with ClickHouse and Preset. Your newfound skills will enable you to deliver valuable, timely insights that can drive strategic decisions and operational efficiencies in any organization. Embracing streaming analytics will position you at the forefront of data engineering innovation, ready to tackle the challenges and opportunities presented by real-time data processing.
Topic 10
Topic 10 of our data engineering bootcamp focuses on the critical practice of Continuous Integration (CI) and Continuous Deployment (CD) within the realm of data engineering. As data teams expand and projects become more complex, ensuring code quality and seamless deployment becomes increasingly challenging.
This topic is designed to equip you with the knowledge and skills to implement automated CI/CD pipelines, fostering a culture of DataOps that emphasizes rapid, reliable, and automated data pipeline development. By integrating these practices, you’ll learn how to enhance team collaboration, streamline code integration, and ensure consistent deployments to various environments, including staging and production.
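At the heart of such a CI pipeline is a suite of automated tests that runs on every change; below is a minimal, hypothetical pytest example of the kind of check a data engineering CI job would execute:

```python
# test_transform.py -- run automatically by the CI pipeline on every push.

def clean_amounts(rows: list[dict]) -> list[float]:
    """Hypothetical pipeline step: parse amounts, dropping blank values."""
    return [float(row["amount"]) for row in rows if row["amount"]]

def test_clean_amounts_parses_and_filters():
    rows = [{"amount": "10.5"}, {"amount": ""}, {"amount": "3"}]
    assert clean_amounts(rows) == [10.5, 3.0]
```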
By the end of this topic, you’ll have a comprehensive understanding of how to implement CI/CD pipelines in data engineering projects, aligning with the DataOps principles for enhanced efficiency and collaboration. These practices not only facilitate quicker iterations and improvements of data pipelines but also significantly reduce the risk of errors and downtime in production environments.
Upon completing this topic, you’ll be well-prepared to contribute to a culture of continuous improvement within your data team, employing CI/CD pipelines to automate testing, integration, and deployment processes. Your ability to implement these methodologies will ensure that your data pipelines are robust, reliable, and ready for the demands of modern data-driven organizations. This expertise is crucial for any data engineer looking to excel in today’s fast-paced, quality-oriented industry, making you a valuable asset to any team focused on delivering high-quality data solutions efficiently.