
Berkeley-Large-RAG/RAG-DS-Serve


DS Serve Logo

🚀 DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Jinjian Liu1*, Yichuan Wang1*, Xinxi Lyu2, Rulin Shao3,
Joseph E. Gonzalez1, Matei Zaharia1, Sewon Min1

1University of California, Berkeley   2University of Illinois Urbana–Champaign   3University of Washington

*Equal contribution.

[Blog] [Web Interface] [API Endpoint] [Voting System] [Paper]


  1. You can turn any large in-house dataset (<1T tokens) into a high-throughput (up to 10000 index-only QPS), memory-efficient (<200 GB RAM) retrieval system with a web UI and API.
  2. Our prototype, built on 400B words of high-quality LLM pre-training data, is readily available and provides downstream gains comparable to commercial search engine endpoints.


DS-Serve UI & control panel

Introduction

This repository contains the DS-Serve Public API and server code. It exposes a production-ready Flask service for retrieval-augmented generation (RAG) backed by a billion-scale FAISS IVFPQ index or DiskANN. The server provides adjustable settings and search modes at low latency. A small CLI helps download/prepare indices and start the server.

Expected data layout (under DATASTORE_PATH)

<DATASTORE_PATH>/
  <domain_name>/
    config.json              # loader config (encoder, nprobe, index filename, etc.)
    index/                   # a single merged FAISS file (*.faiss)
    passages/                # *.jsonl shards for text lookup

Quickstart (IVFPQ / Standard Setup)

Datasets

  • CompactDS-102GB
    • Core index and passages. Please refer to the dataset card for details.
  • Full embeddings
    • PubMed embeddings are sharded, so combine locally if needed:
cat massiveds-pubmed--passages7_00.pkl_{aa,ab,ac,ad,ae,af,ag,ah,ai} \
  > massiveds-pubmed--passages7_00.pkl

0) Prepare the repo

git clone https://github.com/Berkeley-Large-RAG/RAG-DS-Serve.git 
cd RAG-DS-Serve
git submodule update --init --recursive

1) Download the dataset/index from Hugging Face

  • Choose a local data root (DATASTORE_PATH). Example:
export DATASTORE_PATH=/home/ubuntu/massive-serve-dev
  • Download the dataset into $DATASTORE_PATH/<domain_name> (example uses index_dev):
huggingface-cli download <ORG_OR_USER>/<DATASET_REPO> \
  --repo-type dataset \
  --local-dir $DATASTORE_PATH/index_dev

Notes:

  • The directory should include an index/ directory with an IVFPQ (FAISS) index and a passages/ directory with .jsonl files.
  • If your index is uploaded in split/chunked form, see Step 3 to combine shards.

2) Build the position mapping arrays

The server looks up passage text by the IVFPQ index id using position mapping arrays. Generate them once from your passages/ directory:

  • Open utils/build_arr.py and set INPUT_DIR to your passages folder, e.g.:
INPUT_DIR = "/home/ubuntu/massive-serve-dev/index_dev/passages"
  • Then run from the repo root:
python utils/build_arr.py

This writes three files next to the script; the server expects them under index_dev/ (as configured in the code):

  • index_dev/position_array.npy
  • index_dev/filename_index_array.npy
  • index_dev/filename_list.npy
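The lookup these arrays enable can be sketched as follows. The semantics assumed here (a position within a shard, a shard index, and a shard-name list) are inferred from the filenames; the authoritative layout is whatever `utils/build_arr.py` actually writes:

```python
def lookup_passage(idx, position_array, filename_index_array, filename_list):
    """Resolve an index id to (shard filename, position) -- a sketch.

    Assumes position_array[idx] stores the passage's position inside
    its shard and filename_index_array[idx] indexes into filename_list.
    """
    shard = filename_list[filename_index_array[idx]]
    return shard, position_array[idx]

# Toy lists standing in for the three .npy files:
position_array = [0, 1, 0, 1]
filename_index_array = [0, 0, 1, 1]
filename_list = ["shard_00.jsonl", "shard_01.jsonl"]

print(lookup_passage(2, position_array, filename_index_array, filename_list))
# ('shard_01.jsonl', 0)
```

Because all three arrays are indexed by the same id, a hit from the IVFPQ index resolves to its passage text in constant time.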

3) Combine IVFPQ index shards

If your index/ folder contains split parts, combine them into a single .faiss file before serving.

  • Simple shard set (concatenate all ...faiss_** parts in order; do NOT include the .meta file):
cd $DATASTORE_PATH/index_dev/index
# Example names: index_IVFPQ.100000000.768.65536.64.faiss_aa, ..._ab, ..._ac, ...
cat $(ls index_IVFPQ.100000000.768.65536.64.faiss_* | sort) > index_full.faiss

After combining, ensure there is exactly one .faiss file in index/ (e.g., index/index_full.faiss).
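As a quick sanity check that the concatenation preserved every byte, the merged file's size should equal the sum of the shard sizes. A small helper (not part of the repo) could verify this:

```python
import glob
import os

def check_merge(shard_glob, merged_path):
    """Return True if the merged index is exactly the shards laid end to end.

    A size check only; it catches truncated or missing shards, not
    shards concatenated out of order.
    """
    shards = sorted(glob.glob(shard_glob))
    expected = sum(os.path.getsize(p) for p in shards)
    return os.path.getsize(merged_path) == expected
```

Run it with the same glob you used for `cat`, e.g. `check_merge("index_IVFPQ.*.faiss_*", "index_full.faiss")`.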

4) Launch the API server

Use index_dev as the domain name (its loader config is read from index_dev/config.json):

DATASTORE_PATH=/home/ubuntu/massive-serve-dev \
python -m massive_serve.cli serve --domain_name index_dev

By default the server starts at port 30888 and exposes /search and /vote endpoints.

5) Test the API

For the full reference and examples, see docs/API_DOCUMENTATION.md. You can use curl commands documented there to run quick tests.
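A minimal Python client can be sketched as below. The request field names ("query", "n_docs") are illustrative assumptions; the authoritative schema is in docs/API_DOCUMENTATION.md:

```python
import json
from urllib import request

def build_search_request(base_url, query, n_docs=5):
    """Build a POST request for the /search endpoint (sketch).

    Field names are placeholders -- check docs/API_DOCUMENTATION.md
    for the real payload schema.
    """
    payload = json.dumps({"query": query, "n_docs": n_docs}).encode()
    return request.Request(
        base_url + "/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("http://localhost:30888", "what is retrieval-augmented generation?")
# request.urlopen(req)  # uncomment once the server is running on port 30888
```

Separating request construction from dispatch makes the payload easy to inspect before any server is up.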

DiskANN Build

NOTE: The pre-built DiskANN index will be released publicly soon. The instructions below are currently intended for technical reference or internal testing.
For convenience when testing from this repo root, you can point to the local copies under ./:

  • ./position_array.npy
  • ./filename_index_array.npy
  • ./filename_list.npy
  • ./data/passages/
  • ./DiskANN-build/DiskANN_index/

DiskANN Serving (Setup and Launch)

1) Create environment

# Using uv
uv venv .venv
source .venv/bin/activate
uv pip install -U pip setuptools wheel
# Install project dependencies
uv pip install -r requirements.txt
uv pip install -e .
# Install DiskANN
uv pip install --no-deps diskannpy==0.7.0

2) Launch the server (DiskANN)

From the repo root:

DATASTORE_PATH=$(pwd)
# Set your DiskANN index prefix here (e.g., diskann_mips_f32_R60_L80_B200_M500)
DISKANN_PREFIX="<YOUR_INDEX_PREFIX>" 

mkdir -p "$DATASTORE_PATH/logging"

DS_SERVE_LOG_DIR="$DATASTORE_PATH/logging" \
MASSIVE_SERVE_PORT=30888 \
MS_BACKEND=diskann \
DATASTORE_PATH="$DATASTORE_PATH" \
DISKANN_INDEX_DIR="$DATASTORE_PATH/DiskANN-build/DiskANN_index" \
DISKANN_INDEX_PREFIX="$DISKANN_PREFIX" \
DISKANN_DISTANCE=mips \
DISKANN_NUM_THREADS=128 \
DISKANN_NODES_TO_CACHE=100000 \
DISKANN_L=500 \
DISKANN_W=4 \
DISKANN_WARMUP=1 \
DISKANN_WARMUP_QUERIES=5000 \
DISKANN_WARMUP_BATCH=256 \
DISKANN_WARMUP_QUERY_FILE="$DISKANN_INDEX_DIR/${DISKANN_PREFIX}_sample_data.bin" \
DISKANN_WARMUP_KEEPALIVE=1 \
python -m massive_serve.cli serve --domain_name data

Tips:

  • Use a different MASSIVE_SERVE_PORT if the default port is blocked by a firewall or already in use.
  • DISKANN_NUM_THREADS sets CPU threads for DiskANN search; 0 uses all logical CPUs.
  • DISKANN_NODES_TO_CACHE pins popular nodes in RAM; warmup further primes OS page cache.
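The environment-driven configuration above can be mirrored in a small helper for scripting. The defaults below are illustrative, taken from the launch example, not the server's actual built-in defaults:

```python
import os

def diskann_config(env=None):
    """Collect DiskANN tuning knobs from the environment (sketch).

    Variable names mirror the launch script above; defaults here are
    illustrative, not what massive_serve actually falls back to.
    """
    env = os.environ if env is None else env
    return {
        "num_threads": int(env.get("DISKANN_NUM_THREADS", "0")),  # 0 = all logical CPUs
        "nodes_to_cache": int(env.get("DISKANN_NODES_TO_CACHE", "100000")),
        "search_list": int(env.get("DISKANN_L", "500")),
        "beam_width": int(env.get("DISKANN_W", "4")),
    }
```

For example, `diskann_config({"DISKANN_L": "64"})` overrides only the search-list size and leaves the other knobs at their defaults.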

Enabling Exact Search (Optional)

"Exact Search" re-scores top candidates using a heavy encoder (GritLM) for higher accuracy but requires a GPU. To enable it:

  1. Open massive_serve/api/backup.html.
  2. Uncomment the "Exact Search" toggle block (search for "Exact Search").
  3. Uncomment the help text entry in the JavaScript HELP_CONTENT object.
  4. Ensure your server has GPU access (the backend automatically detects and uses it).
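The re-scoring step can be sketched generically: encode the query and each candidate with the heavier encoder, score by inner product (matching a MIPS-style index), and re-sort. The `toy_encode` function below is a stand-in for the actual GritLM encoder:

```python
def rerank(query_vec, candidates, encode):
    """Re-score retrieved candidates with a heavier encoder (sketch).

    `encode` stands in for the GritLM encoder; any text -> vector
    function works here. Scores are inner products.
    """
    scored = []
    for text in candidates:
        vec = encode(text)
        score = sum(q * v for q, v in zip(query_vec, vec))
        scored.append((score, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]

# Toy stand-in encoder: hashes characters into a 4-dim vector.
def toy_encode(text):
    vec = [0.0] * 4
    for i, ch in enumerate(text):
        vec[i % 4] += ord(ch) / 1000.0
    return vec

query = toy_encode("neural retrieval")
print(rerank(query, ["a", "neural retrieval", "zzz"], toy_encode))
```

The expensive part in practice is the `encode` call per candidate, which is why this stage needs a GPU while the first-stage ANN search does not.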

About

DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval
