Every feed you use — Hacker News, Reddit, X — has the same problem. Either it’s chronological and you’re drowning in noise, or it’s ranked by an algorithm optimizing for engagement, not your actual interests. You scroll past 90% of what you see, hoping something relevant catches your eye before the dopamine loop wins.
The usual solutions don’t help much. RSS gives you firehose-level control but no ranking. Keyword filters are blunt — blocking “crypto” also hides legitimate cryptography research. And platform algorithms are black boxes that learn to maximize your time-on-site, not the quality of what you read.
What if your browser could learn what you care about and quietly highlight it — without sending a single byte to any server?
That’s what I built. Sift is a Chrome extension that runs a 300-million parameter embedding model directly in your browser. It scores every item in your feed against your interests, dims the noise, and highlights what matters. No cloud. No accounts. No tracking. The model, the scoring, and your data all stay on your machine.
What Sift Does
After installing, you pick your interests from 25 built-in categories — things like “AI Research”, “Open Source”, “Climate”, “Startups”, or “Design & UX”. Then you browse normally.
On Hacker News, Reddit, and X, Sift scores every item in real time. High-relevance items stay bright; low-relevance ones fade into the background. Here’s HN before and after:
Before — every story looks the same, no signal about what’s relevant to you:

After — Sift scores each item and dims the noise. The bottom items show category pills (“Programming”, “Deep Tech”) and relevance scores:

The popup gives you a detailed view of any page. Here it’s scoring a TechCrunch article about Meta’s AMD chip deal — 0.93 relevance, matching “AI Research” as the top category:

It works across sites. Here’s Sift scoring a Reddit post on r/LocalLLaMA, with the inspector showing multi-category matches:

And on X, scoring a post from Liquid AI about their new LFM2 model:

Teaching Sift Your Taste
Scoring against predefined categories gets you 80% of the way. The remaining 20% comes from your own taste. Every item has thumbs up/down buttons — tap them to tell Sift “more like this” or “less like this”. These labels are stored locally and can be managed in a full-page editor:

After 10+ positive labels, Sift builds a taste profile — a contrastive representation of what you like vs. what you don’t. It’s visualized as a radar chart with ranked sub-topics discovered by probing your taste vector:

The taste profile reveals nuance beyond broad categories. It’s not just “AI” — it’s “open source machine learning models” and “AI safety and alignment research” ranked above “game development” and “user experience research”. Your feed becomes yours.
Demo video: See Sift in action — scoring, labeling, and taste profile — in this short walkthrough.
How It Works: ML in a Chrome Tab
The core insight behind Sift is that modern embedding models are small enough to run entirely in the browser, and powerful enough to do useful semantic scoring.
The Model
Sift uses EmbeddingGemma-300M, a sentence embedding model from Google. Given any piece of text, it produces a 768-dimensional vector that captures its semantic meaning. Two texts about similar topics will have vectors that point in roughly the same direction; unrelated texts will point in different directions. Cosine similarity between these vectors ranges from -1 to 1, but in practice sentence embeddings cluster in the positive range — Sift maps and clamps the raw similarity into a 0–1 score for display.
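In code, the scoring step is just cosine similarity plus a rescale. A minimal sketch (the linear rescale here is an illustrative assumption — the post doesn't specify Sift's exact mapping, only that the raw similarity is mapped and clamped into 0–1):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def to_display_score(sim: float) -> float:
    """Map a raw similarity into a clamped 0-1 display score.

    A plain linear rescale with clamping; the actual mapping Sift
    uses may differ.
    """
    return min(1.0, max(0.0, (sim + 1.0) / 2.0))
```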
Why not just use keyword matching? Because embeddings understand meaning, not just words. “Rust is eating the world” and “Memory-safe systems programming” have no words in common, but an embedding model knows they’re related. This is what makes Sift dramatically more useful than keyword filters.
Running in the Browser
Transformers.js v4 makes it possible to run ONNX models directly in Chrome’s service worker. All you need is wasm-unsafe-eval in your Content Security Policy — no offscreen documents, no sandbox iframes, no special backends. Chrome’s MV3 service workers can be terminated and restarted by the browser, so Sift handles re-initialization gracefully — the model reloads on demand and scores text in milliseconds.
Sift auto-detects WebGPU at runtime for faster inference, falling back to WebAssembly if unavailable. The green “WEBGPU” badge in the popup tells you which backend is active.
The build system is a dual Vite setup: one ES module build for the background service worker (which bundles Transformers.js and the ONNX runtime), and separate IIFE builds for content scripts and the popup. Chrome requires content scripts to be classic scripts — they can’t use ES module imports — so each one gets bundled into a self-contained file.
Here’s the architecture at a glance:

Scoring Everything, All At Once
Instead of scoring items against a single category, Sift scores every item against all active categories simultaneously. The core primitive is rankPresets() — it takes an embedding and returns a sorted ranking of category matches. The top category always shows as a pill; additional pills appear if their scores meet a minimum threshold (0.15), so items that genuinely span multiple categories get multiple labels.
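The shape of that primitive can be sketched as follows — the real rankPresets() lives in the extension's TypeScript; this is an illustrative Python version assuming one anchor embedding per category and L2-normalized vectors:

```python
import numpy as np

MULTI_PILL_THRESHOLD = 0.15  # minimum score for additional category pills

def rank_presets(item_vec: np.ndarray, category_vecs: dict) -> list:
    """Rank all active categories for one item embedding.

    `category_vecs` maps category name -> anchor embedding. Vectors are
    assumed L2-normalized, so a dot product is cosine similarity.
    Returns the pills to display: the top category, plus any others
    whose score clears the threshold.
    """
    scores = {name: float(item_vec @ vec) for name, vec in category_vecs.items()}
    ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = ranking[0]
    extra = [(n, s) for n, s in ranking[1:] if s >= MULTI_PILL_THRESHOLD]
    return [top] + extra
```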
Explanations are deterministic — no LLM generation needed. Sift maps the score to a band (HIGH / GOOD / FLAT / LOW) and fills a template based on the score and category ranking to produce a concise rationale. No network calls. No hallucinations. Just templates and math.
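A sketch of the band-and-template idea — the band names come from the post, but the cutoffs and template wording here are illustrative assumptions, not Sift's actual values:

```python
def score_band(score: float) -> str:
    """Map a 0-1 relevance score to an explanation band (cutoffs assumed)."""
    if score >= 0.75:
        return "HIGH"
    if score >= 0.5:
        return "GOOD"
    if score >= 0.3:
        return "FLAT"
    return "LOW"

# Hypothetical templates; the real wording is Sift's own.
TEMPLATES = {
    "HIGH": "Strong match for {cat} ({score:.2f}).",
    "GOOD": "Good match for {cat} ({score:.2f}).",
    "FLAT": "Weak match for {cat} ({score:.2f}).",
    "LOW":  "Little relevance to your categories ({score:.2f}).",
}

def explain(score: float, top_category: str) -> str:
    """Fill a deterministic template: no LLM, no network calls."""
    return TEMPLATES[score_band(score)].format(cat=top_category, score=score)
```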
The Taste Vector
When you label items, Sift builds a contrastive taste vector:
tasteVec = normalize(posCentroid - 0.3 × negCentroid)
The positive centroid captures the direction of things you like. The negative centroid captures what you dislike. The 0.3 scaling prevents negative labels from overwhelming the signal — one strong “no” shouldn’t cancel out five “yes” votes.
To discover what this vector actually represents, Sift scores it against ~100 curated probe phrases — sub-topics like “open source machine learning models” or “startup funding and venture capital”. The top matches become your taste profile, revealing interests more nuanced than your broad category selections.
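The taste vector and the probe ranking fit in a few lines. A sketch using the formula from the post (the `probe_embeds` mapping and toy dimensions are illustrative; the real probes are ~100 curated phrases embedded by the model):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def taste_vector(pos_embeds: np.ndarray, neg_embeds: np.ndarray,
                 neg_weight: float = 0.3) -> np.ndarray:
    """tasteVec = normalize(posCentroid - 0.3 * negCentroid)."""
    pos_centroid = np.mean(pos_embeds, axis=0)
    neg_centroid = np.mean(neg_embeds, axis=0)
    return normalize(pos_centroid - neg_weight * neg_centroid)

def taste_profile(taste_vec: np.ndarray, probe_embeds: dict, top_k: int = 10):
    """Score the taste vector against probe-phrase embeddings;
    the top matches become the taste profile."""
    scores = {p: float(taste_vec @ v) for p, v in probe_embeds.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```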
Train Your Own Model
The browser-based scoring works well out of the box. But Sift’s real trick is closing the loop: your labels become training data for a fine-tuned model that scores even better for your specific taste.
The Flywheel
Browse → Label (thumbs up/down) → Export CSV → Fine-tune → Reload
Each cycle makes the model sharper. The base EmbeddingGemma-300M is good at general semantic similarity. After fine-tuning on your labels, it learns your definition of what’s interesting within each category.
The Training Pipeline
Sift includes a complete training pipeline in train.py. Export your labels from the extension as CSV (Anchor, Positive, Negative triplets), then:
```sh
# Fine-tune on your labels (auto-detects GPU)
python train.py sift_training.csv --hf-token hf_xxx

# Custom hyperparameters
python train.py sift_training.csv --epochs 6 --lr 3e-5

# Convert an existing model to ONNX without retraining
python train.py --convert-only path/to/saved_model
```
Under the hood, two design choices shape the training: the loss function and the task prompt.
Why Contrastive Learning (MultipleNegativesRankingLoss)
Sift’s training data is small — maybe 50–200 triplets from a single user’s labeling session. A classification head would overfit instantly. Instead, we use MultipleNegativesRankingLoss (MNRL), a contrastive loss from the sentence-transformers library.
MNRL takes (anchor, positive, negative) triplets and learns to pull the anchor closer to the positive while pushing it away from negatives. The key benefit is in-batch negatives: every other positive in the batch is treated as an additional negative. With larger batches (for example, batch size 8), each example gets up to 7 extra negatives.
In Sift, the default CLI uses batch size 1 (to fit modest hardware), while the Colab notebook defaults to batch size 4 on a T4. As batch size increases, the in-batch signal scales. Even at smaller batch sizes, MNRL typically extracts more supervision from limited data than a single-negative triplet setup.
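The loss itself is compact: softmax cross-entropy over a scaled similarity matrix, with the matched positive on the diagonal. A numpy sketch of the in-batch-negatives formulation (scale=20 is the sentence-transformers default; this illustrates the math rather than reproducing the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray,
              scale: float = 20.0) -> float:
    """MultipleNegativesRankingLoss with in-batch negatives (sketch).

    For each anchor i, positive i is the correct match and every other
    positive j != i serves as an in-batch negative. The loss is softmax
    cross-entropy over the scaled cosine-similarity matrix.
    """
    A = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    P = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = scale * (A @ P.T)  # (batch, batch) similarity logits
    # log-softmax over each row; the diagonal holds the correct pairs
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

With perfectly matched pairs the loss approaches zero; mismatched pairs drive it up, which is exactly the gradient signal that pulls anchors toward their positives and away from everything else in the batch.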
The result is that the model doesn’t learn to classify into fixed categories. It learns a better embedding space — one where articles you like are closer together and articles you don’t like are farther apart. This improved embedding space benefits all downstream scoring, not just the categories you labeled.
Why “Classification” as the Task Prompt
EmbeddingGemma is a multi-task model — it supports different task prompts that steer the embedding behavior. The available tasks include “Retrieval”, “Clustering”, “Semantic Similarity”, and “Classification”, each prepending a different instruction to the input.
We use “Classification” because Sift’s core operation is essentially: does this article belong to the “AI Research” category? That’s a classification question, not a retrieval or clustering one. The “Classification” prompt tells the model to produce embeddings optimized for distinguishing between categories — exactly what Sift needs when scoring items against anchor texts.
This matters during fine-tuning because the task prompt is baked into training via SentenceTransformerTrainingArguments(prompts=...), so the model learns to improve its classification-oriented embeddings specifically. At inference time, the browser extension prepends the same task: classification | query: prefix before tokenizing, so training and inference operate in the same embedding subspace.
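Keeping the two sides in sync is as simple as sharing one constant. A trivial sketch of the prefix step, using the prompt string named in the post:

```python
CLASSIFICATION_PREFIX = "task: classification | query: "

def with_task_prompt(text: str) -> str:
    """Prepend EmbeddingGemma's classification task prompt.

    The same prefix must be applied at training time (via the trainer's
    prompts argument) and at inference time in the extension, so both
    operate in the same embedding subspace.
    """
    return CLASSIFICATION_PREFIX + text
```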
Held-Out Taste Evaluation
Before training begins, Sift splits 15% of triplets per anchor into a held-out set (anchors with too few triplets are kept entirely for training). A TasteTracker callback scores these items at baseline and after each epoch, producing a table like this:
```
=== Taste Check (baseline) ====================================
Anchor: AI Research (6 items)
  + "Sterling-8B: The First Inherently Interpreta..."  0.68
  + "arcee-ai/Trinity-Large-Preview"                   0.62
  - "Texas is about to overtake California in batt..." 0.54
  - "Making Wolfram Tech Available as a Foundation..." 0.51
  avg +: 0.65   avg -: 0.53   gap: 0.12   pos>neg: 75%

=== Taste Check (epoch 4) =====================================
Anchor: AI Research (6 items)
  + "Sterling-8B: The First Inherently Interpreta..."  0.81 (+0.13)
  + "arcee-ai/Trinity-Large-Preview"                   0.78 (+0.16)
  - "Texas is about to overtake California in batt..." 0.41 (-0.13)
  - "Making Wolfram Tech Available as a Foundation..." 0.38 (-0.13)
  avg +: 0.80   avg -: 0.40   gap: 0.40   pos>neg: 100%
```
The gap between positive and negative items widened from 0.12 to 0.40, and pair accuracy went from 75% to 100%. The model learned your taste.
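The summary statistics are straightforward to compute. A sketch, assuming "pos>neg" means pairwise accuracy — the fraction of (positive, negative) pairs where the positive scores higher (the post doesn't spell out the exact definition):

```python
def taste_check_summary(pos_scores: list, neg_scores: list) -> dict:
    """Summarize a held-out taste check: average positive and negative
    scores, their gap, and pairwise accuracy."""
    avg_pos = sum(pos_scores) / len(pos_scores)
    avg_neg = sum(neg_scores) / len(neg_scores)
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    pair_acc = sum(p > n for p, n in pairs) / len(pairs)
    return {"avg_pos": avg_pos, "avg_neg": avg_neg,
            "gap": avg_pos - avg_neg, "pair_acc": pair_acc}
```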
Google Colab: Zero-Setup Training
If you don’t have a local GPU, the Colab notebook is the easiest path. It works in Colab’s GPU runtimes (typically a T4 when available) and handles everything:
- Set GPU runtime (Runtime → Change runtime type → T4 GPU)
- Paste your HuggingFace token (read access, needed for the gated base model)
- Upload your exported CSV
- Run all cells — training, ONNX export, and quantization happen automatically
- Download the resulting ONNX zip
The notebook is self-contained — all training code is inlined, no local setup required.
ONNX Export: Getting the Model Back Into the Browser
This was one of the trickiest parts of the project. You can’t use torch.onnx.export() on EmbeddingGemma — Gemma3’s sliding window attention uses custom autograd operations and vmap that break TorchScript tracing.
The solution: Optimum’s ONNX exporter with library_name='sentence_transformers'. This exports the full sentence-transformer pipeline — Transformer, Pooling, Dense layers, and Normalize — as a single ONNX graph with a sentence_embedding output. This matches the format Transformers.js expects.
The export pipeline produces four model variants:
| File | Format | Size | Use case |
|---|---|---|---|
| model.onnx | FP32 | ~1.2 GB | Reference |
| model_quantized.onnx | INT8 | ~300 MB | Smaller download |
| model_q4.onnx | 4-bit | ~170 MB | WASM inference |
| model_no_gather_q4.onnx | 4-bit | ~170 MB | WebGPU inference |
The WebGPU variant strips GatherElements ops (which WebGPU doesn’t support) and replaces them with Gather. Transformers.js auto-selects the right variant at runtime.
Loading Your Fine-Tuned Model
Once exported, you can either:
Test locally — serve the model files and point the extension to http://localhost:8000:
```sh
python train.py --serve path/to/onnx_output
```
Publish to HuggingFace Hub — the extension can load any public model directly:
```sh
python train.py sift_training.csv --push-to-hub your-username/sift-finetuned
```
Then set the model ID in Sift’s popup settings. ONNX files contain only numerical weights and tokenizer data — no training examples or personal information — so they’re safe to publish publicly.
What’s Next
Sift is open source and available now. You can install it from the latest release, or build from source.
What’s planned:
- User-defined categories — create your own scoring categories beyond the 25 built-in ones
- More site integrations — beyond HN, Reddit, and X
- Improved agent mode — the experimental taste-ranked HN feed is just the beginning
The broader point is this: embedding models are now small enough to run in a browser tab, and powerful enough to do genuinely useful personalization. You don’t need a cloud service to have a smart feed. You don’t need to hand your reading habits to a platform. The model runs on your machine, learns from your labels, and the data never leaves.
Your feed. Your model. Your data.
Links: