Spark
-
Enterprise Java

Delta Lake Introduction
Delta Lake is an open-source storage layer that brings reliability, performance, and ACID (Atomicity, Consistency, Isolation, Durability) transactions to data…
-
Enterprise Java

Apache Spark Join DataFrames Java Example
In modern data engineering pipelines, applications often need to combine multiple datasets that share the same schema. Apache Spark provides…
-
Enterprise Java

Reading and Writing Deeply Partitioned Files in Apache Spark
In large-scale data engineering and analytics, files are often stored in deeply partitioned directories to improve performance and manageability. This…
-
Software Development

Apache Spark: Unleashing Big Data Power
1. Introduction Apache Spark is a powerful open-source, distributed computing system that has become a cornerstone in the world of…
-
Software Development

Apache Spark Cheatsheet
1. Introduction to Apache Spark 1.1 What is Apache Spark? Apache Spark is an open-source, distributed computing system designed for…
-
Software Development

Where is Apache Spark heading?
I watched (COVID19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks,…
-
Enterprise Java

Recommendation System Using Spark ML Akka and Cassandra
Building a recommendation system with Spark is a simple task. Spark’s machine learning library already does all the hard work…
-
Enterprise Java

The Kubernetes Spark operator in OpenShift Origin (Part 1)
This series is about the Kubernetes Spark operator by Radanalytics.io on OpenShift Origin. It is an Open Source operator to manage Apache…
-
Enterprise Java

Sparklens: a tool for Spark applications optimization
Sparklens is a profiling tool for Spark with a built-in Spark Scheduler simulator: it makes it easier to understand the scalability…