mgillr/crdt-merge-java

🔀 crdt-merge

Conflict-free merge, dedup & diff for any dataset — powered by CRDTs

Maven Central · Java 17+ · License · Tests: 79/79

Merge any two datasets in one function call. No conflicts. No coordination. No data loss.

Quick Start · API Reference · Why CRDTs · All Languages


🌐 Available in Every Language

Language | Package | Install | Repo
Python 🐍 | crdt-merge | pip install crdt-merge | crdt-merge
TypeScript | crdt-merge | npm install crdt-merge | crdt-merge-ts
Rust 🦀 | crdt-merge | cargo add crdt-merge | crdt-merge-rs
Java | io.optitransfer:crdt-merge | Maven / Gradle | You are here
CLI 🖥️ | included in Rust | cargo install crdt-merge | crdt-merge-rs

🤗 Try it in the browser →


🎯 The Problem

You have two versions of a dataset. Maybe two Spark jobs ran in parallel. Maybe two microservices updated the same records. Maybe you're merging data from multiple sources.

Today: Write custom merge scripts, lose data, or block on a coordinator.

With crdt-merge: One method call. Zero conflicts. Mathematically guaranteed.

List<Map<String, Object>> merged = CrdtMerge.merge(datasetA, datasetB, "id"); // done.

⚡ Quick Start

Maven

<dependency>
  <groupId>io.optitransfer</groupId>
  <artifactId>crdt-merge</artifactId>
  <version>0.1.0</version>
</dependency>

Gradle

implementation 'io.optitransfer:crdt-merge:0.1.0'

From Source

git clone https://github.com/mgillr/crdt-merge-java.git
cd crdt-merge-java
mvn package

📖 API Reference

Merge Two Datasets

import java.util.List;
import java.util.Map;

import io.optitransfer.crdtmerge.CrdtMerge;

List<Map<String, Object>> teamA = List.of(
    Map.of("id", 1, "name", "Alice", "role", "engineer"),
    Map.of("id", 2, "name", "Bob", "role", "designer")
);

List<Map<String, Object>> teamB = List.of(
    Map.of("id", 2, "name", "Robert", "role", "designer"),
    Map.of("id", 3, "name", "Charlie", "role", "pm")
);

List<Map<String, Object>> merged = CrdtMerge.merge(teamA, teamB, "id");
// id=1: Alice (only in A — preserved)
// id=2: Robert (B wins — latest)
// id=3: Charlie (only in B — preserved)
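
To make the comments above concrete, here is a minimal sketch of a key-based merge in plain Java. It is not the crdt-merge implementation: `KeyedMergeSketch` and `mergeByKey` are hypothetical names, and the rule that the second dataset's fields win on a shared key is an assumption read off the "B wins" comment. A timestamp-based last-writer-wins rule would need per-row or per-field timestamps, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not the crdt-merge implementation.
final class KeyedMergeSketch {

    // Merges two row lists by primary key. Rows found on only one side are kept;
    // when both sides hold the same key, fields from `right` overwrite `left`.
    static List<Map<String, Object>> mergeByKey(
            List<Map<String, Object>> left,
            List<Map<String, Object>> right,
            String key) {
        // LinkedHashMap keeps first-seen key order, so output order is deterministic.
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> row : left) {
            byKey.put(row.get(key), new LinkedHashMap<>(row));
        }
        for (Map<String, Object> row : right) {
            byKey.merge(row.get(key), new LinkedHashMap<>(row), (oldRow, newRow) -> {
                oldRow.putAll(newRow); // assumption: right-hand fields win on conflict
                return oldRow;
            });
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Calling `KeyedMergeSketch.mergeByKey(teamA, teamB, "id")` on the two lists above yields three rows, with the `id=2` row carrying the name "Robert".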

Deduplicate

import java.util.List;
import java.util.Map;

import io.optitransfer.crdtmerge.CrdtMerge;
import io.optitransfer.crdtmerge.DedupEngine;

List<Map<String, Object>> data = List.of(
    Map.of("name", "Alice"),
    Map.of("name", "Alicia"),
    Map.of("name", "Bob")
);

DedupEngine.DedupResult result = CrdtMerge.dedup(data, "name", 0.7);
System.out.println("Unique: " + result.unique.size());
System.out.println("Duplicates: " + result.duplicates.size());
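
The Features list describes fuzzy dedup as Jaccard similarity on character bigrams. The sketch below shows one way that score could be computed, to give a feel for what a threshold such as 0.7 means. `BigramJaccardSketch` is a hypothetical name, and lowercasing before comparison is an assumption; the library's own normalization and threshold handling may differ.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; the library's DedupEngine may normalize differently.
final class BigramJaccardSketch {

    // Collects the set of lowercase two-character substrings of s.
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < t.length(); i++) {
            out.add(t.substring(i, i + 2));
        }
        return out;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty.
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) intersection.size() / union.size();
    }
}
```

Under these assumptions, "Alice" and "Alicia" share 3 of 6 distinct bigrams, so `similarity("Alice", "Alicia")` is 0.5. That is worth keeping in mind when choosing a threshold for near-duplicate names.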

Structural Diff

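import io.optitransfer.crdtmerge.CrdtMerge;
import io.optitransfer.crdtmerge.DiffEngine;

// oldData and newData: two snapshots of the same dataset, keyed by "id"
DiffEngine.DiffResult diff = CrdtMerge.diff(oldData, newData, "id");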
System.out.println(diff.summary);
// "+5 added, -2 removed, ~3 modified, =990 unchanged"
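
As a rough sketch of how a summary like the one above could be derived, the following plain-Java example classifies rows by key. `KeyedDiffSketch` and `summarize` are hypothetical names, not part of the library, and the sketch assumes each key appears at most once per dataset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not the library's DiffEngine.
final class KeyedDiffSketch {

    // Counts added, removed, modified and unchanged rows between two keyed
    // datasets and formats the counts in the style of the summary string.
    static String summarize(List<Map<String, Object>> oldRows,
                            List<Map<String, Object>> newRows,
                            String key) {
        Map<Object, Map<String, Object>> oldByKey = new HashMap<>();
        for (Map<String, Object> row : oldRows) {
            oldByKey.put(row.get(key), row);
        }
        int added = 0;
        int modified = 0;
        int unchanged = 0;
        for (Map<String, Object> row : newRows) {
            // remove() returns the old row and marks its key as matched
            Map<String, Object> before = oldByKey.remove(row.get(key));
            if (before == null) {
                added++;
            } else if (before.equals(row)) {
                unchanged++;
            } else {
                modified++;
            }
        }
        int removed = oldByKey.size(); // keys never matched by a new row
        return String.format("+%d added, -%d removed, ~%d modified, =%d unchanged",
                added, removed, modified, unchanged);
    }
}
```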

Deep JSON Merge

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import io.optitransfer.crdtmerge.CrdtMerge;

JsonObject configA = JsonParser.parseString(
    "{\"model\": {\"name\": \"bert\", \"layers\": 12}, \"tags\": [\"nlp\"]}"
).getAsJsonObject();

JsonObject configB = JsonParser.parseString(
    "{\"model\": {\"name\": \"bert-large\", \"dropout\": 0.1}, \"tags\": [\"qa\"]}"
).getAsJsonObject();

JsonObject merged = CrdtMerge.mergeJson(configA, configB);
// {"model": {"name": "bert-large", "layers": 12, "dropout": 0.1}, "tags": ["nlp", "qa"]}
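
The result above suggests three rules: nested objects merge recursively, arrays are unioned, and the second value wins for conflicting scalars. A minimal sketch of those rules on plain `Map` and `List` values follows, so it runs without Gson. `DeepMergeSketch` is a hypothetical name, and the rules are inferred from the example output rather than taken from the library's source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not the library's mergeJson.
final class DeepMergeSketch {

    @SuppressWarnings("unchecked")
    static Map<String, Object> deepMerge(Map<String, Object> a, Map<String, Object> b) {
        Map<String, Object> out = new LinkedHashMap<>(a);
        for (Map.Entry<String, Object> e : b.entrySet()) {
            Object left = out.get(e.getKey());
            Object right = e.getValue();
            if (left instanceof Map && right instanceof Map) {
                // Nested objects: recurse so keys present on only one side survive.
                out.put(e.getKey(),
                        deepMerge((Map<String, Object>) left, (Map<String, Object>) right));
            } else if (left instanceof List && right instanceof List) {
                // Arrays: ordered union without duplicates.
                LinkedHashSet<Object> union = new LinkedHashSet<>((List<Object>) left);
                union.addAll((List<Object>) right);
                out.put(e.getKey(), new ArrayList<>(union));
            } else {
                // Scalars, type mismatches, or keys only in b: take b's value (assumption).
                out.put(e.getKey(), right);
            }
        }
        return out;
    }
}
```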

Core CRDT Types

import io.optitransfer.crdtmerge.crdt.*;

// Distributed counter
GCounter counterA = new GCounter();
counterA.increment("server-1", 100);

GCounter counterB = new GCounter();
counterB.increment("server-2", 200);

GCounter merged = counterA.merge(counterB);
System.out.println(merged.value()); // 300

// Last-writer-wins register
LWWRegister<String> regA = new LWWRegister<>("Alice", 1000L);
LWWRegister<String> regB = new LWWRegister<>("Alicia", 2000L);
System.out.println(regA.merge(regB).value()); // "Alicia" (later wins)

// Observed-remove set
ORSet<String> setA = new ORSet<>();
setA.add("item1");
ORSet<String> setB = new ORSet<>();
setB.add("item2");
ORSet<String> mergedSet = setA.merge(setB);
System.out.println(mergedSet.contains("item1")); // true
System.out.println(mergedSet.contains("item2")); // true

🧠 Why CRDTs

CRDT = Conflict-free Replicated Data Type. A data structure with one mathematical superpower:

Any two copies can merge — in any order, at any time — and the result is always identical and always correct.

Three mathematical guarantees (proven, not hoped):

Property | What it means
Commutative | merge(A, B) == merge(B, A): order doesn't matter
Associative | merge(merge(A, B), C) == merge(A, merge(B, C)): grouping doesn't matter
Idempotent | merge(A, A) == A: re-merging is safe

This means: zero coordination, zero locks, zero conflicts.
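
The three properties can be checked directly on a small example. The sketch below models a grow-only counter as a map from node id to count and merges by taking the per-node maximum, which is the standard G-Counter construction. `GCounterLawsSketch` is a hypothetical name, and the map representation is an assumption, not the library's `GCounter` class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of G-Counter merge laws; not the library's GCounter.
final class GCounterLawsSketch {

    // Merge takes the per-node maximum, so repeated or reordered merges agree.
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> out = new HashMap<>(a);
        b.forEach((node, count) -> out.merge(node, count, Long::max));
        return out;
    }

    // The counter's value is the sum of all per-node counts.
    static long value(Map<String, Long> counter) {
        return counter.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> a = Map.of("server-1", 100L);
        Map<String, Long> b = Map.of("server-2", 200L);
        Map<String, Long> c = Map.of("server-1", 150L, "server-3", 5L);

        System.out.println(merge(a, b).equals(merge(b, a)));                     // true: commutative
        System.out.println(merge(merge(a, b), c).equals(merge(a, merge(b, c)))); // true: associative
        System.out.println(merge(a, a).equals(a));                               // true: idempotent
        System.out.println(value(merge(a, b)));                                  // 300
    }
}
```

Taking the maximum rather than the sum per node is what makes re-merging safe: merging the same state twice cannot double-count.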

Built-in CRDT Types

Type | Use Case | Example
GCounter | Grow-only counters | Download counts, page views
PNCounter | Increment + decrement | Stock levels, balances
LWWRegister<T> | Single value (latest wins) | Name, email, status fields
ORSet<T> | Add/remove set | Tags, memberships, dedup sets
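
PNCounter is the one type in this table without a usage example in the API Reference. The usual construction pairs two grow-only tallies, one for increments and one for decrements. The sketch below illustrates that idea only: `PNCounterSketch` and its method names are hypothetical, and the library's actual `PNCounter` API may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical PN-counter sketch; the library's PNCounter API may differ.
final class PNCounterSketch {
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    void increment(String node, long amount) {
        increments.merge(node, amount, Long::sum);
    }

    void decrement(String node, long amount) {
        decrements.merge(node, amount, Long::sum);
    }

    // Current value: total increments minus total decrements.
    long value() {
        return sum(increments) - sum(decrements);
    }

    // Returns a new counter holding the per-node maximum of each side's tallies.
    PNCounterSketch merge(PNCounterSketch other) {
        PNCounterSketch out = new PNCounterSketch();
        for (PNCounterSketch src : new PNCounterSketch[] {this, other}) {
            src.increments.forEach((n, c) -> out.increments.merge(n, c, Long::max));
            src.decrements.forEach((n, c) -> out.decrements.merge(n, c, Long::max));
        }
        return out;
    }

    private static long sum(Map<String, Long> tallies) {
        return tallies.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

For a stock-level example: if one warehouse records +10 and -3 while another records +5, the merged counter reads 12, and merging either side with itself leaves its value unchanged.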

Features

  • Tabular Merge — Merge two lists of maps by primary key using CRDT LWW semantics
  • Deduplication — Exact and fuzzy dedup using Jaccard similarity on character bigrams
  • Structural Diff — See added, removed, and modified rows between two datasets
  • JSON Merge — Deep merge of nested JSON objects with conflict-free resolution
  • Core CRDTs — Production-ready GCounter, PNCounter, LWWRegister, ORSet
  • Zero config — One dependency (Gson), works with any Map/List data

🏗️ Use Cases

  • Spark pipelines: Merge partitioned outputs without a coordinator
  • Microservices: Each service maintains local state, merge on demand
  • Event sourcing: Merge event streams from multiple sources
  • Data lakes: Combine datasets from different teams/regions
  • Cache reconciliation: Merge divergent cache states after network partition

Requirements

  • Java 17+
  • Gson 2.10.1+ (included via Maven)

Building

mvn compile   # Compile
mvn test      # Run tests (79/79 passing)
mvn package   # Create JAR

License

Licensed under the Apache License, Version 2.0.

Contributing? By opening a pull request, you agree to our Contributor License Agreement.

Copyright 2026 Ryan Gillespie / Optitransfer. See NOTICE for attribution requirements.

For commercial licensing inquiries: [email protected], [email protected]


Built with math, not hope. 🧬

⭐ Star on GitHub · 🤗 Try on HuggingFace · 📦 Maven Central
