Conflict-free merge, dedup & diff for any dataset — powered by CRDTs
Merge any two datasets in one function call. No conflicts. No coordination. No data loss.
| Language | Package | Install | Repo |
|---|---|---|---|
| Python 🐍 | crdt-merge | `pip install crdt-merge` | crdt-merge |
| TypeScript | crdt-merge | `npm install crdt-merge` | crdt-merge-ts |
| Rust 🦀 | crdt-merge | `cargo add crdt-merge` | crdt-merge-rs |
| Java ☕ | io.optitransfer:crdt-merge | Maven / Gradle | You are here |
| CLI 🖥️ | included in Rust | `cargo install crdt-merge` | crdt-merge-rs |
You have two versions of a dataset. Maybe two Spark jobs ran in parallel. Maybe two microservices updated the same records. Maybe you're merging data from multiple sources.
Today: Write custom merge scripts, lose data, or block on a coordinator.
With crdt-merge: One method call. Zero conflicts. Mathematically guaranteed.

    List<Map<String, Object>> merged = CrdtMerge.merge(datasetA, datasetB, "id"); // done.

Maven:

    <dependency>
        <groupId>io.optitransfer</groupId>
        <artifactId>crdt-merge</artifactId>
        <version>0.1.0</version>
    </dependency>

Gradle:

    implementation 'io.optitransfer:crdt-merge:0.1.0'

Build from source:

    git clone https://github.com/mgillr/crdt-merge-java.git
    cd crdt-merge-java
    mvn package

Merge two datasets by key:

    import java.util.List;
    import java.util.Map;
    import io.optitransfer.crdtmerge.CrdtMerge;

    List<Map<String, Object>> teamA = List.of(
        Map.of("id", 1, "name", "Alice", "role", "engineer"),
        Map.of("id", 2, "name", "Bob", "role", "designer")
    );
    List<Map<String, Object>> teamB = List.of(
        Map.of("id", 2, "name", "Robert", "role", "designer"),
        Map.of("id", 3, "name", "Charlie", "role", "pm")
    );
    List<Map<String, Object>> merged = CrdtMerge.merge(teamA, teamB, "id");
    // id=1: Alice (only in A, preserved)
    // id=2: Robert (B wins, latest)
    // id=3: Charlie (only in B, preserved)

Deduplicate records with fuzzy matching:

    import io.optitransfer.crdtmerge.DedupEngine;

    List<Map<String, Object>> data = List.of(
        Map.of("name", "Alice"),
        Map.of("name", "Alicia"),
        Map.of("name", "Bob")
    );
    DedupEngine.DedupResult result = CrdtMerge.dedup(data, "name", 0.7);
    System.out.println("Unique: " + result.unique.size());
    System.out.println("Duplicates: " + result.duplicates.size());

Diff two datasets:

    import io.optitransfer.crdtmerge.DiffEngine;

    DiffEngine.DiffResult diff = CrdtMerge.diff(oldData, newData, "id");
    System.out.println(diff.summary);
    // "+5 added, -2 removed, ~3 modified, =990 unchanged"

Deep-merge nested JSON:

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;

    JsonObject configA = JsonParser.parseString(
        "{\"model\": {\"name\": \"bert\", \"layers\": 12}, \"tags\": [\"nlp\"]}"
    ).getAsJsonObject();
    JsonObject configB = JsonParser.parseString(
        "{\"model\": {\"name\": \"bert-large\", \"dropout\": 0.1}, \"tags\": [\"qa\"]}"
    ).getAsJsonObject();
    JsonObject merged = CrdtMerge.mergeJson(configA, configB);
    // {"model": {"name": "bert-large", "layers": 12, "dropout": 0.1}, "tags": ["nlp", "qa"]}

Use the CRDT types directly:

    import io.optitransfer.crdtmerge.crdt.*;

    // Distributed counter
    GCounter counterA = new GCounter();
    counterA.increment("server-1", 100);
    GCounter counterB = new GCounter();
    counterB.increment("server-2", 200);
    GCounter merged = counterA.merge(counterB);
    System.out.println(merged.value()); // 300

    // Last-writer-wins register
    LWWRegister<String> regA = new LWWRegister<>("Alice", 1000L);
    LWWRegister<String> regB = new LWWRegister<>("Alicia", 2000L);
    System.out.println(regA.merge(regB).value()); // "Alicia" (later wins)

    // Observed-remove set
    ORSet<String> setA = new ORSet<>();
    setA.add("item1");
    ORSet<String> setB = new ORSet<>();
    setB.add("item2");
    ORSet<String> mergedSet = setA.merge(setB);
    System.out.println(mergedSet.contains("item1")); // true
    System.out.println(mergedSet.contains("item2")); // true

CRDT = Conflict-free Replicated Data Type. A data structure with one mathematical superpower:
Any two copies can merge — in any order, at any time — and the result is always identical and always correct.
Three mathematical guarantees (proven, not hoped):
| Property | What it means |
|---|---|
| Commutative | merge(A, B) == merge(B, A) — order doesn't matter |
| Associative | merge(merge(A, B), C) == merge(A, merge(B, C)) — grouping doesn't matter |
| Idempotent | merge(A, A) == A — re-merging is safe |
This means: zero coordination, zero locks, zero conflicts.
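
The three properties can be checked by hand on a small example. The sketch below uses a hypothetical stand-in class, `GCounterSketch`, written only for illustration; it is not the library's `GCounter`, and it assumes the usual grow-only-counter design in which merge takes the per-node maximum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a grow-only counter; NOT the library's GCounter class.
final class GCounterSketch {
    // One monotonically increasing count per node id.
    private final Map<String, Long> counts = new HashMap<>();

    void increment(String nodeId, long amount) {
        if (amount < 0) throw new IllegalArgumentException("grow-only: amount must be >= 0");
        counts.merge(nodeId, amount, Long::sum);
    }

    // Merge = per-node maximum. Returns a new counter; neither input is modified.
    GCounterSketch merge(GCounterSketch other) {
        GCounterSketch out = new GCounterSketch();
        out.counts.putAll(this.counts);
        other.counts.forEach((node, n) -> out.counts.merge(node, n, Long::max));
        return out;
    }

    // The counter's value is the sum of all per-node counts.
    long value() {
        long total = 0;
        for (long n : counts.values()) total += n;
        return total;
    }

    // Two counters are equal when they hold the same per-node counts.
    @Override public boolean equals(Object o) {
        return o instanceof GCounterSketch && counts.equals(((GCounterSketch) o).counts);
    }

    @Override public int hashCode() { return counts.hashCode(); }

    public static void main(String[] args) {
        GCounterSketch a = new GCounterSketch(); a.increment("server-1", 100);
        GCounterSketch b = new GCounterSketch(); b.increment("server-2", 200);
        GCounterSketch c = new GCounterSketch(); c.increment("server-1", 40);

        System.out.println(a.merge(b).equals(b.merge(a)));                   // commutative: true
        System.out.println(a.merge(b).merge(c).equals(a.merge(b.merge(c)))); // associative: true
        System.out.println(a.merge(a).equals(a));                            // idempotent: true
        System.out.println(a.merge(b).merge(c).value());                     // 300
    }
}
```

Taking the maximum rather than the sum on merge is what makes re-merging safe: replica `c` holds an older count for `server-1`, so merging it in changes nothing.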
| Type | Use Case | Example |
|---|---|---|
| `GCounter` | Grow-only counters | Download counts, page views |
| `PNCounter` | Increment + decrement | Stock levels, balances |
| `LWWRegister<T>` | Single value (latest wins) | Name, email, status fields |
| `ORSet<T>` | Add/remove set | Tags, memberships, dedup sets |
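
To illustrate how a counter can support decrements and still merge without conflicts, here is a minimal sketch of the textbook PN-counter construction: two grow-only maps, one for increments and one for decrements. `PNCounterSketch` and its method names are hypothetical and written for this example; the library's `PNCounter` may be implemented and named differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a PN-counter; NOT the library's PNCounter class.
final class PNCounterSketch {
    // Both sides only ever grow; the visible value is their difference.
    private final Map<String, Long> increments = new HashMap<>();
    private final Map<String, Long> decrements = new HashMap<>();

    void increment(String nodeId, long amount) { add(increments, nodeId, amount); }
    void decrement(String nodeId, long amount) { add(decrements, nodeId, amount); }

    private static void add(Map<String, Long> side, String nodeId, long amount) {
        if (amount < 0) throw new IllegalArgumentException("amount must be >= 0");
        side.merge(nodeId, amount, Long::sum);
    }

    // Merge each side independently by per-node maximum; inputs are not modified.
    PNCounterSketch merge(PNCounterSketch other) {
        PNCounterSketch out = new PNCounterSketch();
        mergeMax(out.increments, this.increments);
        mergeMax(out.increments, other.increments);
        mergeMax(out.decrements, this.decrements);
        mergeMax(out.decrements, other.decrements);
        return out;
    }

    private static void mergeMax(Map<String, Long> into, Map<String, Long> from) {
        from.forEach((node, n) -> into.merge(node, n, Long::max));
    }

    long value() { return sum(increments) - sum(decrements); }

    private static long sum(Map<String, Long> side) {
        long total = 0;
        for (long n : side.values()) total += n;
        return total;
    }

    public static void main(String[] args) {
        PNCounterSketch stockA = new PNCounterSketch();
        stockA.increment("warehouse-a", 50); // received 50 units
        stockA.decrement("warehouse-a", 8);  // shipped 8

        PNCounterSketch stockB = new PNCounterSketch();
        stockB.increment("warehouse-b", 20);
        stockB.decrement("warehouse-b", 5);

        System.out.println(stockA.merge(stockB).value()); // 57
    }
}
```

Because a decrement is recorded as growth on the second map, neither map ever shrinks, and the per-node maximum stays a safe merge rule.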
Features:
- Tabular Merge — Merge two lists of maps by primary key using CRDT LWW semantics
- Deduplication — Exact and fuzzy dedup using Jaccard similarity on character bigrams
- Structural Diff — See added, removed, and modified rows between two datasets
- JSON Merge — Deep merge of nested JSON objects with conflict-free resolution
- Core CRDTs — Production-ready GCounter, PNCounter, LWWRegister, ORSet
- Zero config — One dependency (Gson), works with any Map/List data
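
The fuzzy-dedup bullet above names Jaccard similarity on character bigrams. The sketch below shows that measure in isolation, under assumptions of my own: input is lowercased and trimmed, and bigrams are collected as a set. `BigramJaccardSketch` is a hypothetical helper, not part of the library, whose normalization and tokenization may differ.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating the measure; NOT the library's DedupEngine.
final class BigramJaccardSketch {
    // Collect the set of adjacent character pairs, e.g. "bob" -> {bo, ob}.
    static Set<String> bigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT).trim();
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Jaccard similarity = |A intersect B| / |A union B|, a value in [0, 1].
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        if (union.isEmpty()) {
            // Assumption: strings too short to have bigrams match only if equal.
            return a.trim().equalsIgnoreCase(b.trim()) ? 1.0 : 0.0;
        }
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("Alice", "Alicia")); // 0.5
        System.out.println(similarity("Alice", "Bob"));    // 0.0
    }
}
```

Under these assumptions "Alice" and "Alicia" share 3 of 6 distinct bigrams, so they score 0.5. Whether the library groups a given pair at a given threshold depends on its own tokenization, so treat these numbers as illustrative only.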
Use cases:
- Spark pipelines: Merge partitioned outputs without a coordinator
- Microservices: Each service maintains local state, merge on demand
- Event sourcing: Merge event streams from multiple sources
- Data lakes: Combine datasets from different teams/regions
- Cache reconciliation: Merge divergent cache states after network partition
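
As a rough picture of the cache-reconciliation case, the sketch below merges two divergent caches key by key with last-writer-wins. It assumes every entry carries a write timestamp. The `Stamped` type, the `merge` helper and the tie-break rule are hypothetical and written for this example; they are not the library's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-key last-writer-wins merge; NOT the library's implementation.
final class LwwCacheMergeSketch {
    // A cached value plus the time it was written (assumed record shape).
    static final class Stamped {
        final String value;
        final long timestamp;
        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keep every key seen on either side; where both sides have a key, keep the later write.
    static Map<String, Stamped> merge(Map<String, Stamped> a, Map<String, Stamped> b) {
        Map<String, Stamped> out = new HashMap<>(a);
        b.forEach((key, theirs) -> out.merge(key, theirs, LwwCacheMergeSketch::later));
        return out;
    }

    // Equal timestamps fall back to comparing values, so the result
    // does not depend on which cache is passed first.
    static Stamped later(Stamped x, Stamped y) {
        if (x.timestamp != y.timestamp) return x.timestamp > y.timestamp ? x : y;
        return x.value.compareTo(y.value) >= 0 ? x : y;
    }

    public static void main(String[] args) {
        Map<String, Stamped> cacheA = new HashMap<>();
        cacheA.put("user:1", new Stamped("Alice", 1000L));
        cacheA.put("user:2", new Stamped("Bob", 1000L));

        Map<String, Stamped> cacheB = new HashMap<>();
        cacheB.put("user:2", new Stamped("Robert", 2000L));
        cacheB.put("user:3", new Stamped("Charlie", 1500L));

        Map<String, Stamped> merged = merge(cacheA, cacheB);
        System.out.println(merged.get("user:1").value); // Alice
        System.out.println(merged.get("user:2").value); // Robert
        System.out.println(merged.get("user:3").value); // Charlie
    }
}
```

The deterministic tie-break matters: without it, two writes with the same timestamp could resolve differently depending on merge order.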
Requirements:
- Java 17+
- Gson 2.10.1+ (included via Maven)
Build and test:

    mvn compile   # Compile
    mvn test      # Run tests (79/79 passing)
    mvn package   # Create JAR

Licensed under the Apache License, Version 2.0.
Contributing? By opening a pull request, you agree to our Contributor License Agreement.
Copyright 2026 Ryan Gillespie / Optitransfer. See NOTICE for attribution requirements.
For commercial licensing inquiries: [email protected], [email protected]
Built with math, not hope. 🧬