Skip to content

Conversation

@cutecutecat
Copy link
Member

@cutecutecat cutecutecat commented Aug 25, 2025

job: +psql_windows
job: +psql_macos
job: +check_debian
job: +psql_alpine

GUC

  • vchordrq.query_sampling_enable: Whether to enable the query logger
    • bool, default off
  • vchordrq.query_sampling_max_records: Maximum logged queries per index
    • int, 0 to 10000, default 0
  • vchordrq.query_sampling_rate: Sample rate of query to be logged
    • float, 0 to 1, default 0.01 (for 1% pick rate)

Function

  • vchordrq_sampled_queries(index: Oid): Show logged queries for the given index
  • vchordrq_sampled_vectors(index: Oid): Get logged vector samples (only for internal use).

View

  • vchordrq_sampled_queries: Show logged queries for all indexes which you have access

Example

-- Preset
CREATE TABLE t (id SERIAL PRIMARY KEY, val vector(3));
INSERT INTO t (val)
SELECT ARRAY[i * 0.0001, i * 0.00005, i * 0.0002]::vector(3) FROM generate_series(1, 10000) as s(i);
CREATE INDEX idx ON t USING vchordrq (val vector_l2_ops);
CREATE TABLE t1 (val vector(4), id SERIAL PRIMARY KEY);
INSERT INTO t1 (val)
SELECT ARRAY[i * 0.0001, i * 0.00005, i * 0.0002, i * 0.001]::vector(4) FROM generate_series(1, 10000) as s(i);
CREATE INDEX idx1 ON t1 USING vchordrq (val vector_l2_ops);

SET vchordrq.query_sampling_enable = on;
-- Log up to 2 queries
SET vchordrq.query_sampling_max_records = 2;
-- Record query at 100% rate
SET vchordrq.query_sampling_rate = 1;

-- Record
SELECT * from t ORDER BY val <-> '[0.50, 0.25, 1.00]';
SELECT val from t1 ORDER BY val <-> '[0.50, 0.25, 1.00, 0]';

SELECT * from vchordrq_sampled_queries('idx1');

-- Result
 schema_name | table_name | column_name | operator |            vector_text             
-------------+------------+-------------+----------+------------------------------------
 public      | t1         | val         | <->      | [0.5,0.25,1,0]

SELECT * from vchordrq_sampled_queries;

-- Result
 schema_name | table_name | column_name | operator |            vector_text             
-------------+------------+-------------+----------+------------------------------------
 public      | t          | val         | <->      | [0.5,0.25,1]
 public      | t1         | val         | <->      | [0.5,0.25,1,0]

-- Recall metric
SELECT AVG(recall_value)
FROM (
    SELECT
        vchordrq_evaluate_query_recall(
            query => format(
                'SELECT ctid FROM %I.%I ORDER BY %I %s ''%s'' LIMIT 10',
                lq.schema_name,
                lq.table_name,
                lq.column_name,
                lq.operator,
                lq.vector_text
            )
        ) AS recall_value
    FROM
        vchordrq_sampled_queries('idx') AS lq
) AS eval_results;

-- Result
 avg 
-----
   1

Restriction

  • Only users with read access to the index can obtain the data of the corresponding index
  • Now, the record for vchordg index is not enabled

Internal

Structure

  • Use a bunded sqlite database provided by https://github.com/rusqlite/rusqlite
  • INIT: Connect and create a disk-based sqlite table at PG_init
  • INSERT: Insert to sqlite table when query logged, and maintain max rows to max_logged_queries_per_index for each index
  • DELETE: Delete rows belongs to an index from sqlite table when a DROP TABLE or DROP INDEX triggered hook
  • READ: Read all rows belongs to an PG index when the function vchordrq_logged_queries called

Synchronous

  • There will only be 1 read/delete/insert request sent to sqlite at a time
  • INSERT: If cannot get the lock, then skip
  • READ/DELETE: If cannot get the lock, then wait

Latency

  • For first logged query (with initialize): +30ms
  • For following logged queries (without initialize): +10ms

@cutecutecat cutecutecat marked this pull request as ready for review August 26, 2025 01:54
@cutecutecat cutecutecat requested a review from Copilot August 26, 2025 01:54

This comment was marked as outdated.

@cutecutecat cutecutecat force-pushed the record-sqlite branch 2 times, most recently from 8a96a32 to dbe998e Compare August 26, 2025 02:09
@cutecutecat cutecutecat requested review from usamoi and removed request for usamoi August 26, 2025 02:22
@VoVAllen VoVAllen requested a review from usamoi August 26, 2025 10:37
@cutecutecat cutecutecat force-pushed the record-sqlite branch 12 times, most recently from 3cc03ba to 80ec6e2 Compare August 29, 2025 08:52
@cutecutecat cutecutecat requested a review from Copilot August 29, 2025 08:59

This comment was marked as outdated.

@cutecutecat cutecutecat force-pushed the record-sqlite branch 3 times, most recently from 1013474 to 2c02ee5 Compare August 29, 2025 10:26
@cutecutecat cutecutecat requested a review from Copilot August 29, 2025 10:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements query logging functionality for the vchordrq index type using SQLite storage. It adds GUC parameters to control query logging behavior and provides functions and views to access logged query data.

Key changes:

  • Added SQLite-based query collection system with configurable sampling and limits
  • Introduced three new GUC parameters for controlling query logging behavior
  • Added SQL functions and views to retrieve logged queries with proper access control

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/lib.rs Adds collector module initialization
src/collector/*.rs Implements SQLite-based query logging infrastructure
src/index/gucs.rs Adds new GUC parameters for query logging configuration
src/index/functions.rs Adds function to retrieve logged samples with access control
src/sql/finalize.sql Defines SQL functions and views for query access
src/index//scanners/.rs Integrates query logging into search builders
src/index/*/am/mod.rs Adds sender configuration for query collection
tests/vchordrq/recall.slt Adds comprehensive tests for query logging functionality
Cargo.toml Adds rusqlite dependency
scripts/train.py Reduces training iterations (unrelated optimization)

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@cutecutecat cutecutecat force-pushed the record-sqlite branch 3 times, most recently from dd14985 to 2dce740 Compare September 3, 2025 03:12
@cutecutecat
Copy link
Member Author

cutecutecat commented Sep 3, 2025

Query record latency test [deprecated]

Latency dim minimum maximum majority
insert at <1000 records + maintain 3 3ms 5ms 5ms
insert at ~5000 records + maintain 3 5ms 6ms 5ms
insert at ~10000 records + maintain 3 10ms 13ms 10ms
insert at <1000 records + maintain 768 3ms 6ms 5ms
insert at ~5000 records + maintain 768 4ms 6ms 5ms

@cutecutecat cutecutecat force-pushed the record-sqlite branch 4 times, most recently from 2c51021 to cbd7800 Compare September 3, 2025 05:32
@cutecutecat cutecutecat requested a review from usamoi September 3, 2025 06:01
@usamoi
Copy link
Contributor

usamoi commented Sep 3, 2025

Query record latency test

What does this show? Shouldn't we conduct a controlled experiment with variables?

@cutecutecat
Copy link
Member Author

cutecutecat commented Sep 4, 2025

Benifit from synchronous = NORMAL instead of FULL, now we can achieve a record latency of only 0.3ms - 0.5ms at up to 10000 records.

The latency bottleneck is caused by every fsync.

ref:
https://www.sqlite.org/pragma.html#pragma_synchronous

@VoVAllen
Copy link
Member

VoVAllen commented Sep 4, 2025

lgtm. Shall we add a reset interface, to let user recreate the sqlite db if it doesn't work properly after some crash?

@cutecutecat cutecutecat force-pushed the record-sqlite branch 2 times, most recently from 53c1c15 to 76ecc21 Compare September 4, 2025 09:47
@cutecutecat cutecutecat requested a review from usamoi September 4, 2025 10:05
@cutecutecat cutecutecat force-pushed the record-sqlite branch 2 times, most recently from 12d2003 to 9b9a8f1 Compare September 4, 2025 11:34
@cutecutecat cutecutecat merged commit 9e1bbe1 into tensorchord:main Sep 4, 2025
39 checks passed
@cutecutecat
Copy link
Member Author

cutecutecat commented Sep 4, 2025

lgtm. Shall we add a reset interface, to let user recreate the sqlite db if it doesn't work properly after some crash?

We implement a simple strategy that deletes an SQLite database file if corruption is detected upon opening. Then, the next time when a SQL is captured or records are read, the database will be recreated, and everything will be fine.

if err.sqlite_error_code() == Some(rusqlite::ErrorCode::DatabaseCorrupt) {

usamoi pushed a commit to usamoi/VectorChord that referenced this pull request Sep 15, 2025
Real-time sampling of SQL queries for recall evaluation and monitoring.

Signed-off-by: cutecutecat <[email protected]>
@cutecutecat cutecutecat deleted the record-sqlite branch October 13, 2025 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants