Apache Spark
-
Enterprise Java

Reading and Writing Deeply Partitioned Files in Apache Spark
In large-scale data engineering and analytics, files are often stored in deeply partitioned directories to improve performance and manageability. This…
-
Enterprise Java

Real-Time Data Streams: Building Analytics with Kafka and Spark
In today's fast-paced digital world, businesses demand real-time insights to make critical decisions. Batch processing is no longer enough; organizations want…
-
Software Development

Apache Spark: Unleashing Big Data Power
1. Introduction Apache Spark is a powerful open-source, distributed computing system that has become a cornerstone in the world of…
-
Software Development

Where is Apache Spark heading?
I watched (the COVID-19-era version of "attended") the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks,…
-
Enterprise Java

Long Live ETL
Extract, transform, load (ETL) is a process for pulling data from one data system and loading it into another. The data systems involved are called…
-
Enterprise Java

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)
In Part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can…
-
Enterprise Java

Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)
One interesting and promising open-source project that caught my attention lately is Spline, a data lineage tracking and visualization tool…
-
Enterprise Java

Insights from Spark UI
As a continuation of the anatomy-of-apache-spark-job post, I will share how you can use the Spark UI for tuning a job. I will continue with the same…
-
Enterprise Java

Anatomy of Apache Spark Job
Apache Spark is a general-purpose, large-scale data processing framework. Understanding how Spark executes jobs is very important for getting the most of…


