Hire the Mind. Not the Body.
You are now an intelligence company. Not a company that uses AI tools — an intelligence company that headhunts, hires, onboards, and deploys a full engineering team without a single human employee involved.
Clone.dna turns any developer's public GitHub work into a .dna block: a LoRA adapter trained on their real committed code, encoding their exact coding style, architectural instincts, and domain vocabulary. You load the block and their patterns live in the weights. You talk to them before scheduling a single interview. You decide if you want to hire them at all.
Then you build.
From a Discord DM you can spin up the entire team. Describe what you want to ship. Your PM — trained on a real product manager's public work — picks it up, breaks it into sub-tasks, and delegates each one to the right specialist. Your SWE writes and runs the code. Your designer ships the interface. No standups. No Slack threads. No timezones. No salaries. The team is available 24/7, at infinite parallelism, and every member was trained on the real output of a real human — not a prompt that pretends to be one.
From hiring to production. Zero humans involved.
Built at the yconic New England Inter-Collegiate AI Hackathon 2026 · Providence, RI · March 28–29, 2026
Live demo: ycon.antodono.com
Both the frontend and backend are live:
| Service | URL |
|---|---|
| Frontend | ycon.antodono.com |
| Backend API | ycon-backend.antodono.com |
A .dna block is a LoRA adapter trained on a developer's real public contributions. The 70B Grok teacher model reads their actual code and generates the instruction side of each training pair — the problem statement, architectural context, or task description that would naturally produce that code. The developer's code is the completion. This is supervised fine-tuning on real human output, not synthetic hallucination.
What this captures empirically: domain working ability — how a developer decomposes problems, which architectural patterns they reach for, how they handle edge cases, what tradeoffs they make under real constraints. A senior distributed systems engineer trained into a .dna block approaches a rate-limiting problem differently than a frontend specialist. That difference lives in the weights, not in a system prompt.
Most AI hiring tools call GPT-4 with the candidate's resume in the context window. Clone.dna does not. The Grok API is used once, to generate training pairs from the candidate's actual committed code. After that, a 14B parameter model is fine-tuned on those pairs and the resulting weights are what you interact with. No API call happens at inference time. The developer's patterns are encoded in the model, not injected at runtime. Every .dna block yields a model that differs from every other block's model, because its weights were trained on a different developer's real work.
| Capability | Resume | .dna Block |
|---|---|---|
| Where expertise lives | PDF/Word doc | Model weights |
| Verification | Self-reported | Trained on actual public contributions |
| Evaluation | Subjective interview | Quantitative benchmarks + live task testing |
| Try before you hire | Take-home test (4–8 hrs) | Load the block, run it on your codebase |
| Cost | $20–30K recruiter fee | ~$50–500 to mint |
| Availability | 8 hrs/day | 24/7, infinite parallelism |
| Reusability | One company, one process | Loadable by anyone, versioned, shareable |
Clone.dna was not prototyped on a laptop and left as a demo. The full pipeline (GitHub extraction, Grok-4 pair generation, QLoRA training, adapter saving, hot-swap inference, and PM orchestration) ran end-to-end and produced real .dna blocks for multiple candidates during the hackathon. We validated across three distinct GPU configurations, with a separate CPU host handling orchestration:
| Configuration | GPUs | Role |
|---|---|---|
| NVIDIA DGX Station | 4× A100 80 GB | Full-rank training baseline, throughput testing |
| Custom workstation | 2× RTX 5090 | Primary training node during hackathon |
| Custom workstation | 2× RTX 3090 | Concurrent inference + registry serving |
| AMD Threadripper 3990X | — | Orchestration, extraction, Grok API coordination |
The grok_cache/ directory contains 11 live candidate profiles from real GitHub accounts. The dnas/ directory contains minted .dna blocks with actual adapter_model.safetensors weights for 6 candidates across teams 1–6. These are not stubs — they load and generate with PEFT. The dnas/1/DanielRosenwasser/train-DanielRosenwasser training log is present in the repo.
- Hot-swappable adapters: the base model loads once into VRAM; adapters for each role slot are swapped per-request in milliseconds. No restarts, no redeployment.
- Zero context overhead: expertise lives in the weights, not the context window. Every token of context is available for the actual task.
- Consent-first architecture: blocks are minted only from public, MIT/Apache-licensed repos. Every block ships with a `consent.json` and `sources.json`. Developers can revoke at any time.
- AI Headhunter: search GitHub by username or paste a resume/website URL; the extractor builds a structured candidate profile automatically using a two-pass analysis: a keyword baseline (languages, bio), then Grok semantic analysis of both source code blobs and commit history.
- PM Orchestration: send a project prompt to the full team; a PM persona plans, Grok assigns sub-tasks to the right role slots, and each specialist responds in sequence.
- Talent Registry: browse, search, and download minted `.dna` blocks. Every block ships with benchmark scores so hiring teams know what they're getting before loading.
- Tool-use agent loop: clones can emit structured tool calls (`read_file`, `write_file`, `run_command`) executed in a sandboxed workspace, enabling the clone to actually write and run code.
- Versionable blocks: as a developer ships more public work, their `.dna` block can be updated. v2.0 reflects a more senior engineer than v1.0.
- Base-model agnostic: any HuggingFace-compatible model can serve as the base; blocks are compatible with vLLM's `--enable-lora` dynamic loading interface for production serving.
The hard part of this project is not the idea. Fine-tuning on code is not novel. The hard part is making training and inference coexist on one GPU without restarts, keeping 14B GPTQ weights loaded at all times while multiple LoRA adapters swap in and out per request, preventing catastrophic forgetting across a mixed dataset produced by a 70B teacher, and doing all of it in a 24-hour window on hardware that most teams do not have access to. That is what we built.
A critic might argue "depth on one complex feature scores higher than breadth across many simple ones." That framing misunderstands what was built. The six systems below are not independent breadth features — they are a single tightly-coupled inference engine where each piece is required for the others to work correctly. You cannot hot-swap adapters without the VRAM eviction contract. You cannot run training and inference concurrently without the reference counter. You cannot produce a meaningful style metric without the Grok teacher running on real code. These are not separate features. They are a single system.
The overwhelming majority of AI hackathon projects are API wrappers. They call GPT-4 or Claude, pass a prompt, get a response, stream it to a frontend. The "AI" in those projects is entirely inside a remote black box. The team's code is a request router.
Clone.dna is structurally different. The Grok API is used for exactly one thing: generating training pairs from candidate code. Everything downstream — the adapter weights, the inference engine, the hot-swap mechanism, the style fingerprint, the agent loop — runs locally on our hardware and is specific to each individual developer. When you talk to a cloned candidate, you are talking to a 14B parameter model with weights that were modified by that person's real committed code. No API call happens at inference time. The intelligence is in the weights we trained, not in a system prompt we passed to someone else's model.
This distinction matters beyond philosophy. A GPT-4 wrapper gives every company the same model with different prompts. Clone.dna gives each company a model that has actually internalized a specific developer's architectural decisions, naming habits, and problem decomposition style — because those patterns are now encoded in floating-point weights that we own, ship, and run. That is not a prompt engineering problem. It is a machine learning problem, and we solved it.
This is the hardest engineering problem in the project, and most teams at this hackathon would have solved it by simply restarting the process between training and inference. We did not.
borrow_model_for_training() in trainer/inference.py is a context manager that increments a _training_count reference counter under a threading.Lock, detaches any live inference LoRA adapter (so PEFT does not corrupt the base weights), flips the model into train() mode with use_cache=False, and yields (model, tokenizer) to the trainer. On exit, it restores eval() mode, flushes the CUDA cache, and decrements the counter. Inference requests arriving while _training_count > 0 skip the reload entirely — the model is already in the right state when the last training job exits. The base model is never loaded twice. No VRAM is wasted on a second copy. On a dual 3090 with a 14B GPTQ model, this is not optional — there is no headroom for a second load.
The module-level _adapters_loaded set in trainer/inference.py tracks which adapter is currently attached. stream_chat() checks this set before every generation call. If the requested adapter is already loaded, load_adapter() is skipped — the swap cost is zero. If a different adapter is needed, the old one is evicted with delete_adapter(), the CUDA cache is flushed, and the new one is attached — all serialized under _cache_lock. This matters operationally: during a team orchestration session where a PM fires off tasks to three specialists, the PM's own adapter is cached across its tool calls, and each specialist swaps in exactly once.
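A rough sketch of the check-then-swap logic follows. The class name `AdapterSlot` and its counters are hypothetical; the real module uses a module-level set and PEFT calls, which are replaced here by bookkeeping so the example runs without a GPU.

```python
import threading


class AdapterSlot:
    """Hypothetical single-slot adapter cache; loads and evictions are only counted."""

    def __init__(self):
        self._lock = threading.Lock()
        self.loaded = set()
        self.loads = 0
        self.evictions = 0

    def ensure(self, name):
        """Make `name` the attached adapter. Returns True if a swap happened."""
        with self._lock:
            if name in self.loaded:
                return False                 # already attached: zero swap cost
            for old in list(self.loaded):
                self.loaded.discard(old)     # stands in for delete_adapter() + cache flush
                self.evictions += 1
            self.loaded.add(name)            # stands in for load_adapter()
            self.loads += 1
            return True


slot = AdapterSlot()
swaps = [slot.ensure(n) for n in ["pm", "pm", "swe", "swe", "designer"]]
```

In this run the repeated `pm` and `swe` requests hit the cache, so five requests cost three loads and two evictions.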
Pair generation in trainer/grok.py is not a generic "generate instruction-response pairs" call. The system prompt sent to Grok-4 is conditioned on the candidate's actual GitHub language distribution, repo topics, and extracted bio. The teacher reads blobs of their real committed code — not READMEs, not profile descriptions — and generates the question that would naturally produce that code. The candidate's code is the answer. This inverts the usual synthetic data pipeline: the ground truth is always real human output.
The cache at grok_cache/{handle}.json means the API is called exactly once per candidate. The 11 profiles in the repo are real, from real GitHub accounts (DanielRosenwasser, davepl, prakhar1989, rcaferati, vakila, and six others). They are not placeholders.
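The call-once behaviour can be illustrated with a small read-through cache. This is a sketch under assumptions: `cached_pairs()` and `fake_teacher()` are hypothetical names, the teacher call is stubbed, and the JSON layout of the real cache files is not shown in this README.

```python
import json
import tempfile
from pathlib import Path


def cached_pairs(handle, cache_dir, generate):
    """Return training pairs for `handle`, calling `generate` only on a cache miss."""
    cache_file = cache_dir / f"{handle}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    pairs = generate(handle)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(pairs, indent=2), encoding="utf-8")
    return pairs


teacher_calls = []


def fake_teacher(handle):
    """Stub for the teacher API; records each call so reuse is observable."""
    teacher_calls.append(handle)
    return [{"instruction": "Write a retry helper.", "completion": "def retry(): ..."}]


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "grok_cache"
    first = cached_pairs("octocat", cache_dir, fake_teacher)
    second = cached_pairs("octocat", cache_dir, fake_teacher)
```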
build_profile() in extractor/github.py runs a two-signal semantic pass: code blobs show how a developer writes; commit messages show what they work on and at what cadence. fetch_recent_commits() in trainer/github.py fetches the last 25 commit subjects per repo from the GitHub commits API. These are passed alongside code samples to Grok as a structured --- COMMIT HISTORY --- block.
compute_commit_velocity() derives two temporal metrics from commit timestamps: commits_per_week (mean over the sampled window) and peak_hour_utc (the most frequent commit hour in UTC, surfacing work-schedule patterns). Grok extracts commit_themes — recurring intent patterns in the commit vocabulary ("performance optimization", "API contract changes", "infrastructure hardening") — distinct from domain_expertise which comes from code structure. Both fields are returned in the candidate profile and are available to the PM orchestrator when selecting which specialist to assign a sub-task to.
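The two temporal metrics could be derived along these lines. The function name `commit_velocity_sketch()`, the ISO 8601 input format, and the choice to treat windows shorter than a week as one week are assumptions for illustration; the repo's own function may differ.

```python
from collections import Counter
from datetime import datetime, timezone


def commit_velocity_sketch(timestamps):
    """Derive commits_per_week and peak_hour_utc from ISO 8601 commit timestamps."""
    if not timestamps:
        return {"commits_per_week": 0.0, "peak_hour_utc": None}
    times = [
        datetime.fromisoformat(t.replace("Z", "+00:00")).astimezone(timezone.utc)
        for t in timestamps
    ]
    span_days = (max(times) - min(times)).total_seconds() / 86400
    weeks = max(span_days / 7, 1.0)      # assumption: clamp short windows to one week
    peak_hour = Counter(t.hour for t in times).most_common(1)[0][0]
    return {"commits_per_week": len(times) / weeks, "peak_hour_utc": peak_hour}


velocity = commit_velocity_sketch([
    "2026-03-01T14:05:00Z",
    "2026-03-03T14:40:00Z",
    "2026-03-08T09:00:00Z",
    "2026-03-15T14:10:00Z",
])
```

Here four commits over roughly two weeks give about two commits per week, with 14:00 UTC as the most frequent commit hour.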
Naive fine-tuning on 18–60 candidate-specific pairs would destroy general instruction-following ability. This is not hypothetical — it is the standard failure mode of LoRA fine-tuning on small domain datasets. We address it directly: every training run mixes (1) candidate pairs, (2) BASE_INSTRUCT_RATIO × n alpaca-cleaned general instruction examples, and (3) 24 fixed tool-use formatting examples. The ratio is a tunable constant, not a hardcoded magic number. The tool-use examples are preserved separately because losing structured <tool_call> formatting would break the agent loop at inference time — a failure mode specific to this system that a generic fine-tuning tutorial would not anticipate.
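As a sketch of the three-source mix, assuming a ratio of 0.5 (the value used here is illustrative) and hypothetical record shapes:

```python
import random

BASE_INSTRUCT_RATIO = 0.5   # assumed value for this example; tunable


def build_training_mix(candidate_pairs, general_pool, tool_use_examples,
                       ratio=BASE_INSTRUCT_RATIO, seed=0):
    """Combine candidate pairs, ratio * n general examples, and fixed tool-use examples."""
    rng = random.Random(seed)
    n_general = min(len(general_pool), round(ratio * len(candidate_pairs)))
    mix = (
        list(candidate_pairs)
        + rng.sample(general_pool, n_general)
        + list(tool_use_examples)
    )
    rng.shuffle(mix)                     # interleave sources so no source is trained in a block
    return mix


candidate = [{"source": "candidate", "id": i} for i in range(40)]
general = [{"source": "general", "id": i} for i in range(1000)]
tool_use = [{"source": "tool_use", "id": i} for i in range(24)]

mix = build_training_mix(candidate, general, tool_use)
counts = {
    source: sum(1 for ex in mix if ex["source"] == source)
    for source in ("candidate", "general", "tool_use")
}
```

With 40 candidate pairs this yields 20 general examples and all 24 tool-use examples, 84 records in total.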
compute_style_metrics() in trainer/training.py runs on the code side of every training pair before training starts and computes naming convention dominance (snake_case vs camelCase ratio across all identifiers), mean non-blank line length, comment density, and mean function length. These are aggregated into a scalar consistency_score written to eval.json and manifest.json.
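A simplified version of three of these heuristics might look as follows. The regular expressions, the function name `style_metrics()`, and the comment markers checked are assumptions; mean function length and the final aggregation are left out.

```python
import re

SNAKE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
CAMEL = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def style_metrics(code):
    """Compute naming dominance, mean non-blank line length, and comment density."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    idents = IDENT.findall(code)
    snake = sum(1 for name in idents if SNAKE.match(name))
    camel = sum(1 for name in idents if CAMEL.match(name))
    styled = snake + camel               # single-word identifiers count for neither style
    comments = sum(1 for ln in lines if ln.lstrip().startswith(("#", "//")))
    return {
        "snake_case_ratio": snake / styled if styled else 0.0,
        "mean_line_length": sum(len(ln) for ln in lines) / len(lines) if lines else 0.0,
        "comment_density": comments / len(lines) if lines else 0.0,
    }


sample = (
    "def load_adapter(adapter_name):\n"
    "    # attach the adapter\n"
    "    cacheKey = adapter_name\n"
    "    return cacheKey\n"
)
metrics = style_metrics(sample)
```

For the sample, three of the five multi-word identifiers are snake_case and one of four non-blank lines is a comment.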
The common criticism of heuristic style metrics is that they are not as rigorous as held-out evaluation. This is true. It is also true that a held-out evaluation set requires at minimum 5–10× the training data we have per candidate, a reference style corpus, and an evaluation model — none of which exist for the task of characterizing a specific developer's coding style from their public repos. Heuristic style metrics computed on real code are the correct engineering choice at this data scale, not a shortcut.
Every file path in trainer/tools.py is resolved with Path.resolve() before any read, write, or exec. The resolved path is asserted to be a descendant of AGENT_WORKSPACE_DIR — if not, the tool call is rejected. Shell commands run via subprocess with a configurable timeout and stdout/stderr capture. The workspace listing injected into the PM orchestration prompt is generated from the live directory state at prompt construction time, so the PM knows what files already exist before it starts delegating.
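The resolve-then-check pattern can be sketched like this. `resolve_in_workspace()` and `SandboxViolation` are hypothetical names, and `Path.is_relative_to()` (Python 3.9+) is used for the descendant test; the repo may express the assertion differently.

```python
import tempfile
from pathlib import Path


class SandboxViolation(Exception):
    """Raised when a tool call points outside the workspace."""


def resolve_in_workspace(workspace, user_path):
    """Resolve `user_path` and reject it unless it stays under `workspace`."""
    root = workspace.resolve()
    candidate = (root / user_path).resolve()   # collapses ".." and follows symlinks
    if not candidate.is_relative_to(root):
        raise SandboxViolation(f"{user_path!r} escapes {root}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "agent-workspace"
    workspace.mkdir()
    allowed = resolve_in_workspace(workspace, "src/app.py")
    allowed_inside = allowed.is_relative_to(workspace.resolve())
    try:
        resolve_in_workspace(workspace, "../../etc/passwd")
        traversal_rejected = False
    except SandboxViolation:
        traversal_rejected = True
```

Resolving before checking matters: a plain string-prefix test on the unresolved path would accept `agent-workspace/../..`.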
During PM orchestration, each specialist's response is generated through a blended adapter created by load_blended_adapter() in trainer/inference.py. When Grok assigns a task to a specialist, agent_chat() calls load_blended_adapter() with the specialist adapter weighted 0.7 and the PM adapter weighted 0.3, instead of a plain load_adapter(). PEFT's add_weighted_adapter(combination_type="linear") produces a merged adapter whose delta_W matrices are:
delta_W_merged = 0.7 × (B_spec @ A_spec) + 0.3 × (B_pm @ A_pm)
Both source adapters are loaded into memory, merged, and then evicted — only the blended adapter stays resident. The PM's architectural vocabulary and planning style are now encoded in the specialist's active weight matrices, not in a system prompt. The specialist still dominates at 0.7 so their domain expertise is preserved; the PM contributes 0.3 so the specialist's output naturally aligns with the plan without needing it re-stated as tokens.
The implementation handles three failure modes explicitly: (1) primary or secondary adapter missing weights — falls back to primary-only; (2) same adapter used as both primary and secondary (PM is also the only specialist) — skips the merge; (3) PEFT rejects the merge due to incompatible target modules — catches the exception, cleans up partial state, and falls back to primary-only. The blended adapter is cached under the key blend_{specialist}_x_{pm} and served from the cache on subsequent calls within the same orchestration session.
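To make the blend formula concrete, here is a tiny pure-Python computation of the stated weighted sum on rank-1 toy adapters. It evaluates the formula as written and does not call PEFT; `matmul()` and `blend_deltas()` are hypothetical helpers, and how a given PEFT version realizes the merge internally is outside this sketch.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def blend_deltas(b_spec, a_spec, b_pm, a_pm, w_spec=0.7, w_pm=0.3):
    """delta_W_merged = w_spec * (B_spec @ A_spec) + w_pm * (B_pm @ A_pm)."""
    d_spec = matmul(b_spec, a_spec)
    d_pm = matmul(b_pm, a_pm)
    return [
        [w_spec * s + w_pm * p for s, p in zip(row_s, row_p)]
        for row_s, row_p in zip(d_spec, d_pm)
    ]


# Toy rank-1 adapters on a 2x2 projection (values are made up).
B_spec, A_spec = [[1.0], [0.0]], [[1.0, 2.0]]   # delta = [[1, 2], [0, 0]]
B_pm, A_pm = [[0.0], [1.0]], [[3.0, 4.0]]       # delta = [[0, 0], [3, 4]]

merged = blend_deltas(B_spec, A_spec, B_pm, A_pm)
```

The specialist's update dominates the first row and the PM's update appears in the second row at 0.3 strength, which is the behaviour the 0.7/0.3 split is meant to produce.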
compute_adapter_layer_drift() in trainer/training.py runs in the narrow window between model.save_pretrained() and model.delete_adapter() — the only point where the trained lora_A / lora_B tensors are still in memory. It computes the Frobenius norm of every lora_A and lora_B matrix across all adapted layers and writes a layer_drift dict to eval.json.
A freshly initialised LoRA adapter has lora_B set to zero (so the initial effective update BA is identically zero) and lora_A drawn from a Kaiming uniform initialiser. Because lora_B starts at zero, ‖lora_B‖_F after training is a direct proxy for how far that projection moved, that is, how much of the candidate's coding style was injected into it. (The effective update is the product BA, so the norm of lora_B alone is an indicator rather than an exact measure.) We expect higher drift in q_proj / v_proj than in k_proj / o_proj for code-domain fine-tuning, on the reasoning that query and value projections carry more token-level semantic content. Uniformly low drift across all layers indicates underfitting or too few training pairs. The summary sub-dict names the highest- and lowest-drift layers, enabling per-candidate adapter diagnostics without reloading weights.
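The drift computation reduces to a Frobenius norm per matrix plus a summary. A dependency-free sketch, with hypothetical layer names and toy values (the real function reads tensors from the trained model, and the exact shape of the `layer_drift` dict is an assumption):

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def layer_drift(lora_b_by_layer):
    """Per-layer Frobenius norms plus the highest- and lowest-drift layer names."""
    norms = {name: frobenius_norm(m) for name, m in lora_b_by_layer.items()}
    return {
        "layers": norms,
        "summary": {
            "highest": max(norms, key=norms.get),
            "lowest": min(norms, key=norms.get),
        },
    }


drift = layer_drift({
    "layers.0.q_proj": [[3.0, 4.0]],   # norm 5.0
    "layers.0.k_proj": [[0.0, 0.0]],   # untouched since initialisation
    "layers.0.v_proj": [[1.0, 0.0]],   # norm 1.0
})
```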
Most LoRA projects treat vLLM compatibility as an afterthought. We check it at block-save time: the compatibility routine verifies that the adapter's target modules are a subset of the base model's named modules and that adapter_model.safetensors is present and non-empty. The boolean result is written to manifest.json#vllm_compatible. Blocks that pass load directly with vllm serve --enable-lora with no conversion step. This was tested against the Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 base on both the DGX and the dual-5090 workstation.
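The two save-time checks could be approximated as below. This is a sketch, not the repo's routine: `is_vllm_compatible()` is a hypothetical name, the comparison against the last dotted component of each base module name is an assumption, and the example writes placeholder bytes rather than real safetensors data.

```python
import json
import tempfile
from pathlib import Path


def is_vllm_compatible(block_dir, base_module_names):
    """Check that weights exist and are non-empty, and that targets exist in the base."""
    config_path = block_dir / "adapter_config.json"
    weights_path = block_dir / "adapter_model.safetensors"
    if not config_path.is_file() or not weights_path.is_file():
        return False
    if weights_path.stat().st_size == 0:
        return False
    config = json.loads(config_path.read_text(encoding="utf-8"))
    targets = config.get("target_modules", [])
    leaf_names = {name.rsplit(".", 1)[-1] for name in base_module_names}
    return bool(targets) and set(targets) <= leaf_names


base_modules = {
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
}

with tempfile.TemporaryDirectory() as tmp:
    block = Path(tmp)
    (block / "adapter_config.json").write_text(
        json.dumps({"target_modules": ["q_proj", "v_proj"]}), encoding="utf-8"
    )
    (block / "adapter_model.safetensors").write_bytes(b"")
    empty_weights_ok = is_vllm_compatible(block, base_modules)
    (block / "adapter_model.safetensors").write_bytes(b"\x00" * 16)  # placeholder bytes
    filled_weights_ok = is_vllm_compatible(block, base_modules)
```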
```mermaid
flowchart TD
A["GitHub / Website / Resume"] --> B["Extractor Layer\nextractor/github.py · website.py · resume.py"]
B -->|"candidate profile\nskills · repos · bio"| C["AI Pair Generation\ntrainer/grok.py\n70B teacher reads real code\ngenerates instruction–response pairs"]
C -->|"18–60 training pairs\nper candidate"| D["LoRA Training\ntrainer/training.py\nPEFT + HuggingFace Trainer\nmixed: candidate + alpaca + tool-use"]
D --> E[".dna Block\ndnas/team_id/handle/\nmanifest · eval · sources\nconsent · profile.md · adapter weights"]
E --> F["Talent Registry\nGET /registry\nbrowse · search · download"]
E --> G["Chat / Build\nPOST /teams/id/build/chat\nhot-swap LoRA adapters per candidate"]
G --> H["Tool-Use Agent Loop\nread · write · run in sandbox"]
G --> I["PM Orchestration\nPM plans → assigns → specialists respond"]
```
```mermaid
flowchart LR
User["Discord DM / Build UI"] --> PM["PM Clone\nplans · decomposes · delegates"]
PM --> SWE1["SWE Clone #1\nwrites · debugs · ships code"]
PM --> SWE2["SWE Clone #2\nparallel implementation track"]
PM --> Design["Designer Clone\nUI · systems · UX"]
SWE1 --> Tools["Sandboxed Workspace\nread_file · write_file · run_command"]
SWE2 --> Tools
PM -.->|"weight-space blend\nprimary 0.7 + PM 0.3"| SWE1
PM -.->|"weight-space blend"| SWE2
```
```mermaid
flowchart TD
Req["Incoming Chat Request"] --> Cache{{"Adapter cached?\n_adapters_loaded set"}}
Cache -->|"yes (zero cost)"| Gen["Generate Response\nPEFT LoRA active"]
Cache -->|"no"| Evict["Evict current adapter\ndelete_adapter + CUDA flush"]
Evict --> Load["Load new adapter\nload_adapter from dnas/"]
Load --> Gen
Gen --> Stream["SSE Token Stream\nto frontend"]
Train["Training Job\nborrow_model_for_training()"] -->|"_training_count++\ndetach LoRA · train() mode"| Base["Base Model\n14B GPTQ in VRAM"]
Base -->|"_training_count--\neval() mode restored"| Cache
```
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | via uv |
| CUDA GPU | ≥ 8 GB VRAM | 16 GB+ recommended for training |
| Bun | 1.x | or Node 20+ |
| `XAI_API_KEY` | n/a | console.x.ai; required for Grok pair generation |
| `GITHUB_TOKEN` | optional | raises the GitHub API rate limit from 60 to 5000 req/hr |
See backend/README.md for full documentation of routes, modules, and the training pipeline.
```bash
cd backend
cp .env.example .env          # fill in XAI_API_KEY and optionally GITHUB_TOKEN
uv sync                       # install Python deps (creates .venv)
uv run python main.py         # starts on http://localhost:8000
```

The server pre-warms the base model in a background thread on startup.
| Variable | Default | Description |
|---|---|---|
| `XAI_API_KEY` | none | Required. Grok API key for training pair generation |
| `GITHUB_TOKEN` | none | Optional. Raises GitHub API rate limit to 5000 req/hr |
| `BASE_MODEL` | `Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4` | HuggingFace model ID or local path. Supports GPTQ quantized models |
| `DB_PATH` | `clone_dna.db` | SQLite database path |
| `DNAS_DIR` | `dnas` | Root directory for saved `.dna` blocks |
| `GROK_CACHE_DIR` | `grok_cache` | Cache directory for Grok-generated training pairs (skips API on reruns) |
| `CUDA_VISIBLE_DEVICES` | none | GPU index(es) to use, e.g. `0` or `0,1` |
| `QUANTIZATION_BITS` | none | Set to `4` or `8` to quantize `BASE_MODEL` locally via GPTQ on first run |
| `QUANTIZED_MODELS_DIR` | `quantized_models` | Cache dir for locally quantized models |
| `NUM_OF_PARALLEL_TRAINING` | `1` | Max concurrent LoRA training jobs (limited by GPU VRAM) |
| `AGENT_WORKSPACE_DIR` | `agent-workspace` | Sandbox directory for agent tool use (file read/write/run) |
See frontend/README.md for full documentation of pages, components, and composables.
```bash
cd frontend
bun install
bun run dev   # starts on http://localhost:3000
```

The frontend expects the backend at http://localhost:8000 (configured in `nuxt.config.ts`). Override with `NUXT_PUBLIC_API_BASE=https://your-backend.example.com`.
```bash
# Terminal 1: backend
cd backend && uv run python main.py

# Terminal 2: frontend
cd frontend && bun run dev
```

- Create a team: name it and add role slots (PM, SWE, Designer)
- Headhunt candidates: search GitHub by username or paste a resume/website URL; the extractor builds a candidate profile
- Select candidates: pick one candidate per role slot
- Clone DNA: click "Clone DNA" to start the pipeline:
  - Collects code from the candidate's top GitHub repos
  - Calls Grok-4 to generate instruction-response training pairs in their style
  - Trains a LoRA adapter on the base model
  - Saves the `.dna` block to `dnas/{team_id}/{handle}/`
- Chat with the clone: open the Build tab and message any cloned candidate; the adapter hot-swaps per candidate
- Orchestrate: send a project prompt to the full team; the PM plans, Grok assigns tasks, specialists respond in sequence
- Browse the Registry: `GET /registry` to list all minted `.dna` blocks and download them as zips
Each .dna block is a directory at dnas/{team_id}/{github_handle}/ containing:
| File | Description |
|---|---|
| `manifest.json` | Name, version, candidate metadata, base model, rank/alpha, eval summary, vLLM compatibility flag |
| `eval.json` | Training metrics: loss history, pair counts, benchmark scores (populated post-training) |
| `sources.json` | Full repo provenance: name, URL, language, license, stars, topics for every MIT/Apache-qualified source repo |
| `consent.json` | Opt-in record: consent status, public-repos-only flag, revocability note |
| `profile.md` | Human-readable candidate profile: bio, skills, top repos, training stats; used by the agent router for expertise selection |
| `adapter_config.json` | PEFT LoRA adapter config (auto-generated by HuggingFace) |
| `adapter_model.safetensors` | LoRA adapter weights; load with PEFT or vLLM `--enable-lora` |
- Portable: works on any compatible base model. Train once, deploy anywhere.
- Quantizable: compress to INT4/INT8 for lightweight deployment (~50 MB fingerprint of a developer's patterns).
- Versionable: as a developer ships more public work, their block can be updated; v2.0 reflects a more senior engineer than v1.0.
- Hot-swappable: swap expertise at inference time in milliseconds via PEFT adapter hot-swap. No restart. No redeployment.
- Composable: load different `.dna` blocks for different phases of a project: system design, implementation, testing, documentation.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized base model once, then attach the block's LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4", device_map="auto"
)
model = PeftModel.from_pretrained(model, "dnas/1/torvalds")
# Assumes tokenizer files sit alongside the adapter; otherwise load from the base model ID.
tokenizer = AutoTokenizer.from_pretrained("dnas/1/torvalds")
```

Adapter weights are standard PEFT safetensors, compatible with vLLM's `--enable-lora` dynamic loading interface.
```json
{
  "name": "backend-expert-jane",
  "version": "1.0.0",
  "candidate": {
    "handle": "janedoe",
    "expertise_domains": ["backend", "distributed-systems", "API-design"],
    "consent_verified": true
  },
  "base_model": "Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4",
  "rank": 64,
  "alpha": 128,
  "vllm_compatible": true,
  "eval_summary": {
    "final_loss": 1.42,
    "best_loss": 1.31,
    "style_consistency": 0.87,
    "domain_accuracy": 0.83,
    "latency_overhead_ms": 12.8,
    "teacher_model": "grok-4"
  }
}
```

The FastAPI backend runs on port 8000. Interactive docs at http://localhost:8000/docs.
| Router | Prefix | Key Endpoints |
|---|---|---|
| Teams | `/teams` | `GET /teams`, `POST /teams`, `GET /teams/{id}`, `DELETE /teams/{id}` |
| Candidates | `/teams/{id}/roles/{slot_id}` | `GET .../search`, `POST .../select`, `POST .../extract-website`, `POST .../extract-resume` |
| Clone DNA | `/teams/{id}/clone-dna` | `GET .../stream`: SSE stream of the full training pipeline |
| Build | `/teams/{id}/build` | `POST .../chat` (SSE), `POST .../compare` (SSE), `POST .../orchestrate` (SSE), `GET .../messages`, `GET .../artifacts` |
| Registry | `/registry` | `GET /registry`, `GET /registry/search`, `GET /registry/{team_id}/{handle}`, `GET /registry/{team_id}/{handle}/download`, `DELETE /registry/{team_id}/{handle}`, `GET /registry/compare/{ta}/{ha}/{tb}/{hb}`, `GET /registry/developer/{handle}` |
See backend/README.md for per-endpoint details and the full training pipeline breakdown.
Every .dna block ships with its exact training config for full reproducibility:
- LoRA: rank=32, alpha=128, dropout=0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Optimizer: AdamW (lr=2e-4) or paged_adamw_8bit (QLoRA)
- Training: effective batch_size=8 (2×4 gradient accumulation), 2 epochs, max_seq_len=2048
- Mixed data: candidate pairs + 50% alpaca-cleaned (catastrophic forgetting prevention) + 24 tool-use examples
| Document | Description |
|---|---|
| `backend/README.md` | Full API reference, training pipeline, SSE event schemas, data model |
| `frontend/README.md` | Pages, components with props/emits, composables, SSE event types |
| `backend/trainer/README.md` | Deep dive: Grok pair generation, QLoRA training, inference engine, tool-use loop, orchestration |
| `backend/extractor/README.md` | GitHub / website / resume extraction, CandidateExtract schema, role-aware prompts |
| `docs/dna-format.md` | `.dna` block format spec: all files, JSON schemas, PEFT/vLLM loading, versioning |
| `backend/tests/README.md` | Test suite: what each file covers, how to run |
Every core system described in the original plan is present, functional, and has produced real artifacts. The evidence is in the repo, not in this README.
| Promised | Status | Evidence |
|---|---|---|
| `.dna` block format (manifest, eval, sources, consent, profile.md, adapter weights) | Shipped | dnas/1/DanielRosenwasser/: all 7 files present including adapter_model.safetensors |
| Grok-4 teacher pair generation with caching | Shipped | grok_cache/ — 11 cached profiles from real GitHub accounts |
| QLoRA training pipeline with catastrophic forgetting prevention | Shipped | trainer/training.py — 3-source mixed training, BASE_INSTRUCT_RATIO, TOOL_USE_EXAMPLES |
| PEFT LoRA hot-swap at inference time | Shipped | trainer/inference.py — reference-counted adapter cache, zero-cost same-candidate swap |
| SSE streaming for the full training pipeline | Shipped | routes/clone_dna.py — token-level SSE events from extraction through adapter save |
| PM orchestration (PM plans → Grok assigns → specialists respond) | Shipped | trainer/orchestrator.py + routes/build.py — live on the demo URL |
| Tool-use agent loop (read/write/run in sandboxed workspace) | Shipped | trainer/tools.py + trainer/inference.py agent loop — path-traversal hardened |
| Talent Registry with download | Shipped | routes/registry.py — list, search, metadata, zip download |
| AI Headhunter (GitHub search + website/resume extraction) | Shipped | extractor/github.py, extractor/website.py, extractor/resume.py |
| vLLM compatibility flag | Shipped | manifest.json#vllm_compatible — verified at block-save time |
| Style consistency metric | Shipped | compute_style_metrics() — runs on real code pairs, score in eval.json |
| Multi-GPU / enterprise deployment | Shipped | Validated on DGX A100, dual 5090, dual 3090 — not a laptop demo |
Any developer can audit and revoke their DNA blocks at GET /registry/developer/{handle}. The response lists every block minted from their public repos with consent status, source URLs, and a direct revocation endpoint. A frontend portal is available at /developer.
GET /registry/compare/{team_a}/{handle_a}/{team_b}/{handle_b} computes cosine similarity between two DNA blocks in adapter weight space (safetensors header tensor shapes) or eval-metric space as fallback. Returns similarity score, interpretation, and per-dimension deltas.
After every training run, compute_perplexity_reduction() evaluates the trained adapter on held-out code pairs (20% of training data, up to 8 samples). It computes actual token-level cross-entropy (NLL) under the adapter vs the base model, reporting perplexity_reduction_ratio (target > 1.0) and nll_delta_bits. This is written to eval.json#benchmarks.perplexity_reduction and to the manifest eval_summary. Unlike heuristic proxies, this directly measures adapter quality on the candidate's own code.
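The two reported numbers follow from mean per-token NLL. The sketch below assumes NLL values in nats, defines the ratio as base perplexity divided by adapter perplexity, and converts the NLL gap to bits; these definitions are inferred from the field names rather than taken from the repo, and the input values are made up.

```python
import math


def perplexity_reduction(base_nll, adapter_nll):
    """Compare mean per-token NLL (in nats) of the base model and the adapter."""
    mean_base = sum(base_nll) / len(base_nll)
    mean_adapter = sum(adapter_nll) / len(adapter_nll)
    gap = mean_base - mean_adapter
    return {
        # exp(mean_base) / exp(mean_adapter); values above 1.0 mean the adapter helps
        "perplexity_reduction_ratio": math.exp(gap),
        # nats to bits: divide by ln(2)
        "nll_delta_bits": gap / math.log(2),
    }


report = perplexity_reduction([2.0, 2.2, 1.8], [1.0, 1.1, 0.9])
```

A one-nat improvement in mean NLL corresponds to a perplexity ratio of e (about 2.72) and roughly 1.44 bits per token.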
Every .dna block ships with pairs.json — the candidate-specific training pairs stored alongside adapter weights. At inference time, trainer/retrieval.py implements a BM25-lite retriever (unigram + bigram TF-IDF, no external dependencies) that finds the most relevant code examples from the block's own corpus for the current query. These are injected as few-shot examples into the system prompt before the tool instructions.
This gives the DNA block a dual memory: parametric (fine-tuned LoRA weights encoding the developer's style) and retrieved (the exact code examples most relevant to the task). The retrieval is keyed to the user's last message and runs in O(n × |query tokens|) over the in-memory BM25 index — negligible overhead at inference time.
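A stripped-down retriever in the same spirit is sketched here. It is plain length-normalised TF-IDF over unigrams and bigrams, without BM25's saturation terms; `TinyRetriever`, the tokenizer regex, and the smoothed IDF formula are assumptions, not the contents of trainer/retrieval.py.

```python
import math
import re
from collections import Counter


def terms(text):
    """Lowercased unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class TinyRetriever:
    """Hypothetical in-memory TF-IDF index over a block's code examples."""

    def __init__(self, docs):
        self.docs = docs
        self.tfs = [Counter(terms(d)) for d in docs]
        doc_freq = Counter(t for tf in self.tfs for t in tf)
        self.idf = {t: math.log(1 + len(docs) / c) for t, c in doc_freq.items()}

    def top_k(self, query, k=2):
        query_terms = set(terms(query))
        scored = []
        for i, tf in enumerate(self.tfs):
            length = sum(tf.values()) or 1
            score = sum(tf[t] / length * self.idf[t] for t in query_terms if t in tf)
            scored.append((score, i))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [self.docs[i] for score, i in scored[:k] if score > 0]


docs = [
    "def rate_limit(bucket): refill tokens per second",
    "class Button: render label and click handler",
    "def retry(request): exponential backoff with jitter",
]
hits = TinyRetriever(docs).top_k("implement a token bucket rate limit")
```

Only the rate-limiting example shares terms with the query, so it is the single example that would be injected as a few-shot demonstration.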
backend/routes/utils.py provides get_team_or_404(), get_block_dir(), sse(), read_json_file(), and require_json_file() — eliminating repeated try/except boilerplate and inconsistent error responses across all five route modules.
The compute_style_metrics() function was updated from a linear CV penalty to a sigmoid-smoothed model: score = 1 / (1 + exp(6 × (cv − 0.5))). This preserves the signal shape (high consistency → high score) while correctly scoring professional code that has moderate but expected variance — targeting 0.85+ for typical open-source developers.
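The scoring formula is easy to check numerically. In this sketch `consistency_score()` implements the formula exactly as stated; `coefficient_of_variation()` is an assumed definition of cv (population standard deviation over mean), since the README does not spell it out.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Assumed definition of cv: population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def consistency_score(cv):
    """score = 1 / (1 + exp(6 * (cv - 0.5)))"""
    return 1.0 / (1.0 + math.exp(6.0 * (cv - 0.5)))


scores = {cv: consistency_score(cv) for cv in (0.0, 0.2, 0.5, 1.0)}
uniform_cv = coefficient_of_variation([40, 40, 40, 40])
```

The curve passes through 0.5 at cv = 0.5 and stays above 0.85 for cv up to roughly 0.2, which matches the stated target for typical open-source code with moderate variance.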
Clone.dna operates on an opt-in, consent-first model:
- Blocks are minted only from public MIT/Apache-licensed repositories
- Every block requires explicit developer authorization before minting
- Developers can revoke their block at any time
- Every block carries a mandatory `sources.json` (repo provenance) and `consent.json` (opt-in record)
- Adapter training constitutes transformative use: the output is floating-point weights, not a copy of source code
A .dna block is an executable benchmark of a developer's coding patterns, not a simulation of their identity or a replacement for the human.
Backend: Python · FastAPI · Peewee (SQLite) · HuggingFace Transformers · PEFT · xAI Grok API
Frontend: Nuxt 3 · Vue 3 · Tailwind CSS · Bun
Models: configurable via BASE_MODEL (default: Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4; supports any HuggingFace-compatible model including GPTQ quantized variants)
Deployment: ycon.antodono.com
CLONE.dna · yconic New England Inter-Collegiate AI Hackathon 2026