Mandelbrot, Spark and UDFs

```shell
export JAVA_TOOL_OPTIONS="-Djava.security.manager=allow -Dhadoop.security.token.service.use_ip=false" && \
export SPARK_LOCAL_IP=********* && \
export HADOOP_USER_NAME=$(whoami) && \
spark-shell --packages org.apache.arrow:arrow-memory:2.0.0,org.apache.arrow:arrow-vector:2.0.0 \
  --conf "spark.hadoop.fs.defaultFS=file:///"
```

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, lit}
import org.apache.spark.sql.expressions.UserDefinedFunction

object MandelbrotComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("MandelbrotComparison")
      .getOrCreate()
    import spark.implicits._

    // Configuration to optimize memory usage and partitioning
    spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")
```
…
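The excerpt above cuts off at the Spark configuration. As a hedged sketch of where such a program is presumably heading, a Mandelbrot escape-time function can be wrapped in a Spark UDF roughly as follows; the object, function, and column names here are my own illustration, not taken from the original post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object MandelbrotUdfSketch {
  // Escape-time iteration count for the point c = (re, im).
  // Pure Scala: iterates z -> z^2 + c until |z| > 2 or maxIter is reached.
  def mandelbrot(re: Double, im: Double, maxIter: Int = 100): Int = {
    var zr = 0.0; var zi = 0.0; var n = 0
    while (n < maxIter && zr * zr + zi * zi <= 4.0) {
      val t = zr * zr - zi * zi + re
      zi = 2.0 * zr * zi + im
      zr = t
      n += 1
    }
    n
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("MandelbrotUdfSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Register the pure function as a Spark UDF
    val mandelUdf = udf((re: Double, im: Double) => mandelbrot(re, im))

    // A small grid of complex points as a DataFrame (step 0.5)
    val points = (for {
      xi <- -4 to 2
      yi <- -2 to 2
    } yield (xi * 0.5, yi * 0.5)).toDF("re", "im")

    points.withColumn("iters", mandelUdf($"re", $"im")).show()
    spark.stop()
  }
}
```

The `mandelbrot` function is deliberately kept free of Spark dependencies so it can be unit-tested on its own before being wrapped in the UDF.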

How to create a LangChain agent able to talk to a Spark cluster

Straight to the point: the code is intuitive enough that it needs little further explanation. I will just say that I am personally very excited about this possibility, because it combines, on the one hand, the ability to tell the Spark cluster what to do, and on the other, the possibility of having an agent with all the knowledge and…

First steps with Apache Spark 3.5.0 and Delta Lake using Scala

https://docs.delta.io/latest/quick-start.html#create-a-table&language-scala

First, install Apache Spark. I am a macOS user, and I would not recommend using Homebrew for this because it will not install the third-party libraries. Instead, I recommend downloading it from https://spark.apache.org; the latest version is 3.5.0 as of 28 Nov 2023. Then, run spark-shell with Delta Lake support. ATTENTION: be sure about the Delta Lake version, you must…
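Following the quick-start linked above, launching spark-shell with Delta Lake support and creating a first table looks roughly like this. Treat the `io.delta` coordinates below as an example to verify, not a given: the artifact's Scala suffix and version must match your Spark build, which is exactly the version caveat the post warns about.

```scala
// Launch spark-shell with the Delta Lake package and session extensions
// (per the quick-start; verify the artifact/version for your Spark build):
//
//   bin/spark-shell \
//     --packages io.delta:delta-spark_2.12:3.0.0 \
//     --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
//     --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

// Then, inside the shell (where `spark` is predefined),
// create a Delta table and read it back:
val data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")

val df = spark.read.format("delta").load("/tmp/delta-table")
df.show()
```

This snippet is a configuration-dependent fragment: it only runs inside a spark-shell session started with the Delta packages loaded, which is why the launch command is shown alongside it.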