Jinjian Liu1*,
Yichuan Wang1*,
Xinxi Lyu2,
Rulin Shao3,
Joseph E. Gonzalez1,
Matei Zaharia1,
Sewon Min1
1University of California, Berkeley 2University of Illinois Urbana–Champaign 3University of Washington
*Equal contribution.
[Blog] [Web Interface] [API Endpoint] [Voting System] [Paper]
- You can turn any large in-house dataset (<1T tokens) into a high-throughput (up to 10000 index-only QPS), memory-efficient (<200 GB RAM) retrieval system with a web UI and API.
- Our prototype, built on 400B words of high-quality LLM pre-training data, is readily available and provides downstream gains comparable to commercial search engine endpoints.
*DS-Serve UI & control panel*
This repository contains the DS-Serve Public API and server code. It exposes a production-ready Flask service for retrieval-augmented generation (RAG) backed by a billion-scale FAISS IVFPQ index or DiskANN. The server provides adjustable settings and search modes at low latency. A small CLI helps download/prepare indices and start the server.
```
<DATASTORE_PATH>/
  <domain_name>/
    config.json   # loader config (encoder, nprobe, index filename, etc.)
    index/        # a single merged FAISS file (*.faiss)
    passages/     # *.jsonl shards for text lookup
```
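Before starting the server, it can help to sanity-check that a downloaded datastore matches the layout above. The helper below is not part of this repo; it is a small sketch of the checks implied by the directory structure:

```python
from pathlib import Path

def validate_datastore(root, domain):
    """Return a list of layout problems for <root>/<domain>/ (empty = OK).

    Checks the layout described above: a config.json, exactly one merged
    *.faiss file under index/, and at least one *.jsonl shard under passages/.
    """
    d = Path(root) / domain
    problems = []
    if not (d / "config.json").is_file():
        problems.append("missing config.json")
    faiss_files = list((d / "index").glob("*.faiss"))
    if len(faiss_files) != 1:
        problems.append(f"expected exactly one .faiss file, found {len(faiss_files)}")
    if not list((d / "passages").glob("*.jsonl")):
        problems.append("no *.jsonl shards in passages/")
    return problems
```

An empty return value means the directory looks ready to serve; otherwise the list names what is missing.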
- CompactDS-102GB
  - Core index and passages. Please refer to the dataset card for details.
- Full embeddings
  - PubMed embeddings are sharded, so combine locally if needed:
```bash
cat massiveds-pubmed--passages7_00.pkl_{aa,ab,ac,ad,ae,af,ag,ah,ai} \
  > massiveds-pubmed--passages7_00.pkl
```

- Clone the repository and initialize submodules:

```bash
git clone https://github.com/Berkeley-Large-RAG/RAG-DS-Serve.git
cd RAG-DS-Serve
git submodule update --init --recursive
```

- Choose a local data root (`DATASTORE_PATH`). Example:

```bash
export DATASTORE_PATH=/home/ubuntu/massive-serve-dev
```

- Download the dataset into `$DATASTORE_PATH/<domain_name>` (the example uses `index_dev`):

```bash
huggingface-cli download <ORG_OR_USER>/<DATASET_REPO> \
  --repo-type dataset \
  --local-dir $DATASTORE_PATH/index_dev
```

Notes:

- The directory should include an `index/` directory with an IVFPQ (FAISS) index and a `passages/` directory with `.jsonl` files.
- If your index is uploaded in split/chunked form, see Step 3 to combine shards.
The server looks up passage text by the IVFPQ index id using position mapping arrays. Generate them once from your `passages/` directory:
- Open `utils/build_arr.py` and set `INPUT_DIR` to your passages folder, e.g.:

```python
INPUT_DIR = "/home/ubuntu/massive-serve-dev/index_dev/passages"
```

- Then run from the repo root:

```bash
python utils/build_arr.py
```

This writes three files next to the script (the server expects them under `index_dev/` as configured by the code):

```
index_dev/position_array.npy
index_dev/filename_index_array.npy
index_dev/filename_list.npy
```
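To illustrate how these three arrays can resolve an index id to passage text, here is a sketch of a lookup (an assumption, not verified against `build_arr.py` or the server code: `position_array` is taken to store each passage's byte offset within its shard, and `filename_index_array` to index into `filename_list`):

```python
import json
import numpy as np

def lookup_passage(idx, arrays_dir, passages_dir):
    """Resolve an index id to its JSON passage record.

    Hypothetical semantics: filename_index_array[idx] selects a shard name
    from filename_list, and position_array[idx] is the byte offset of the
    passage's line inside that .jsonl shard.
    """
    position = np.load(f"{arrays_dir}/position_array.npy")
    file_idx = np.load(f"{arrays_dir}/filename_index_array.npy")
    names = np.load(f"{arrays_dir}/filename_list.npy")
    shard = f"{passages_dir}/{names[file_idx[idx]]}"
    with open(shard, "rb") as f:
        f.seek(int(position[idx]))       # jump straight to the passage's line
        return json.loads(f.readline())  # one JSON object per line
```

Storing offsets rather than loading shards into memory is what keeps text lookup memory-efficient at billion-passage scale.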
If your `index/` folder contains split parts, combine them into a single `.faiss` file before serving.
- Simple shard set (concatenate all `*.faiss_*` parts in order; do NOT include the `.meta` file):
```bash
cd $DATASTORE_PATH/index_dev/index
# Example names: index_IVFPQ.100000000.768.65536.64.faiss_aa, ..._ab, ..._ac, ...
cat $(ls index_IVFPQ.100000000.768.65536.64.faiss_* | sort) > index_full.faiss
```

After combining, ensure there is exactly one `.faiss` file in `index/` (e.g., `index/index_full.faiss`).
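A quick way to confirm the concatenation succeeded is to compare byte sizes. The helper below is illustrative (the function name and defaults are ours, and it assumes shard names sort lexicographically into concatenation order, as in the `cat $(ls ... | sort)` command above):

```python
import glob
import os

def check_merge(index_dir, prefix="index_IVFPQ", merged="index_full.faiss"):
    """Verify the merged .faiss file's size equals the sum of its shard parts.

    Returns (ok, total_part_bytes, merged_bytes). A mismatch usually means a
    part was skipped or the .meta file was accidentally included.
    """
    parts = sorted(glob.glob(os.path.join(index_dir, prefix + "*.faiss_*")))
    total = sum(os.path.getsize(p) for p in parts)
    merged_size = os.path.getsize(os.path.join(index_dir, merged))
    return total == merged_size, total, merged_size
```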
Use `index_dev` as the domain name (provided by `index_dev/config.json`):
```bash
DATASTORE_PATH=/home/ubuntu/massive-serve-dev \
python -m massive_serve.cli serve --domain_name index_dev
```

By default the server starts on port 30888 and exposes `/search` and `/vote` endpoints.
For the full reference and examples, see docs/API_DOCUMENTATION.md. You can use curl commands documented there to run quick tests.
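As a quick smoke test from Python, a minimal client might look like the sketch below. The payload field names (`query`, `n_docs`) are assumptions for illustration only; consult `docs/API_DOCUMENTATION.md` for the authoritative request/response schema:

```python
import json
import urllib.request

def build_search_payload(query, n_docs=5):
    """Build a /search request body. Field names are illustrative."""
    return {"query": query, "n_docs": n_docs}

def search(query, n_docs=5, host="http://localhost:30888"):
    """POST a query to the /search endpoint and return the parsed JSON."""
    body = json.dumps(build_search_payload(query, n_docs)).encode()
    req = urllib.request.Request(
        host + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires a running server on port 30888:
# results = search("What is retrieval-augmented generation?")
```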
NOTE: The pre-built DiskANN index will be released publicly soon. The instructions below are currently intended for technical reference or internal testing.
For convenience when testing from this repo root, you can point to the local copies under ./:
```
./position_array.npy
./filename_index_array.npy
./filename_list.npy
./data/passages/
./DiskANN-build/DiskANN_index/
```
```bash
# Using uv
uv venv .venv
source .venv/bin/activate
uv pip install -U pip setuptools wheel

# Install project dependencies
uv pip install -r requirements.txt
uv pip install -e .

# Install DiskANN
uv pip install --no-deps diskannpy==0.7.0
```

From the repo root:
```bash
DATASTORE_PATH=$(pwd)

# Set your DiskANN index prefix here (e.g., diskann_mips_f32_R60_L80_B200_M500)
DISKANN_PREFIX="<YOUR_INDEX_PREFIX>"

mkdir -p "$DATASTORE_PATH/logging"

DS_SERVE_LOG_DIR="$DATASTORE_PATH/logging" \
MASSIVE_SERVE_PORT=30888 \
MS_BACKEND=diskann \
DATASTORE_PATH="$DATASTORE_PATH" \
DISKANN_INDEX_DIR="$DATASTORE_PATH/DiskANN-build/DiskANN_index" \
DISKANN_INDEX_PREFIX="$DISKANN_PREFIX" \
DISKANN_DISTANCE=mips \
DISKANN_NUM_THREADS=128 \
DISKANN_NODES_TO_CACHE=100000 \
DISKANN_L=500 \
DISKANN_W=4 \
DISKANN_WARMUP=1 \
DISKANN_WARMUP_QUERIES=5000 \
DISKANN_WARMUP_BATCH=256 \
DISKANN_WARMUP_QUERY_FILE="$DISKANN_INDEX_DIR/${DISKANN_PREFIX}_sample_data.bin" \
DISKANN_WARMUP_KEEPALIVE=1 \
python -m massive_serve.cli serve --domain_name data
```

Tips:
- Use a different `MASSIVE_SERVE_PORT` if firewall issues occur or the port is already in use.
- `DISKANN_NUM_THREADS` sets CPU threads for DiskANN search; 0 uses all logical CPUs.
- `DISKANN_NODES_TO_CACHE` pins popular nodes in RAM; warmup further primes the OS page cache.
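For reference, the `DISKANN_*` variables above could be resolved roughly as follows. This is an illustrative sketch, not the server's actual code; the defaults mirror the example command, and the thread rule follows the "0 uses all logical CPUs" tip:

```python
import os

def resolve_diskann_config(env=None):
    """Resolve DiskANN search settings from environment variables.

    Illustrative only: the real server may use different names or defaults.
    """
    env = os.environ if env is None else env
    threads = int(env.get("DISKANN_NUM_THREADS", "0"))
    if threads == 0:
        threads = os.cpu_count() or 1  # 0 is a sentinel for "all logical CPUs"
    return {
        "distance": env.get("DISKANN_DISTANCE", "mips"),
        "num_threads": threads,
        "nodes_to_cache": int(env.get("DISKANN_NODES_TO_CACHE", "100000")),
        "search_list_size": int(env.get("DISKANN_L", "500")),
        "beam_width": int(env.get("DISKANN_W", "4")),
    }
```

Larger `DISKANN_L` (search list size) trades latency for recall, while `DISKANN_W` (beam width) controls how many disk reads are issued per hop.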
"Exact Search" re-scores top candidates using a heavy encoder (GritLM) for higher accuracy but requires a GPU. To enable it:
- Open `massive_serve/api/backup.html`.
- Uncomment the "Exact Search" toggle block (search for "Exact Search").
- Uncomment the help text entry in the JavaScript `HELP_CONTENT` object.
- Ensure your server has GPU access (the backend automatically detects and uses it).

