@taishikato taishikato commented Jul 1, 2025

Summary

This PR

  • removes automatic UUID generation from the SupabaseVectorStore implementation (from_texts and add_texts methods)
  • makes ID handling optional, bringing it in line with the TypeScript/JavaScript version

When IDs aren't provided, the database is now responsible for generating them instead of the library forcing UUID strings.

Why this change is needed

The current Python implementation has several limitations compared to the TypeScript version, especially when using SupabaseVectorStore.from_documents:

  1. Inflexible ID types: The from_texts method (which from_documents calls) always generates UUID strings, preventing the use of AUTO_INCREMENT, SERIAL, or custom ID strategies
  2. Database design constraints: Forces tables to use UUID string primary keys instead of integers or other types
  3. Inconsistent behavior: The TypeScript and Python versions handle IDs differently, causing confusion for users switching between languages
  4. Performance overhead: Generates UUIDs even when the database could handle ID generation more efficiently
  5. Limited customization: No way to use custom ID generation logic or database-native features in the from_texts() method

What's the solution

  1. Remove automatic UUID generation from add_texts() and from_texts() methods
  2. Make IDs optional in add_vectors() and _add_vectors() methods
  3. Add validation to ensure ID count matches document count when IDs are provided
  4. Conditional ID inclusion - only add ID field to row data when explicitly provided
  5. Remove unused imports - clean up uuid import
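
As a rough illustration of points 3 and 4 above, the validation and the conditional ID field could look something like the sketch below. The helper name `build_rows` and the column names `content`, `embedding`, and `metadata` are assumptions made for this example, not a quote of the merged code.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_rows(
    texts: Sequence[str],
    vectors: Sequence[List[float]],
    metadatas: Sequence[Dict[str, Any]],
    ids: Optional[Sequence[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build upsert rows, adding "id" only when ids are given."""
    # Validate up front so a mismatch fails before anything is sent to the database.
    if ids is not None and len(ids) != len(texts):
        raise ValueError(
            f"Got {len(ids)} ids for {len(texts)} documents; counts must match."
        )

    rows: List[Dict[str, Any]] = []
    for idx, (text, vector, metadata) in enumerate(zip(texts, vectors, metadatas)):
        # Column names here are assumed for illustration.
        row: Dict[str, Any] = {
            "content": text,
            "embedding": vector,
            "metadata": metadata,
        }
        # Omit the key entirely when no ids were passed, so the database
        # default (serial, identity, UUID default, ...) can fill it in.
        if ids is not None:
            row["id"] = ids[idx]
        rows.append(row)
    return rows
```

Leaving the `id` key out, rather than sending `None`, is what lets the column default apply.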

When IDs are provided:

vector_store = SupabaseVectorStore.from_documents(
    docs,
    embeddings,
    client=supabase,
    table_name="documents",
    query_name="match_documents",
    ids=["ids1", "id2", ...]
)

# NOW: ID: "ids1", "id2", ...
# BEFORE: ID: UUIDs generated by this library, because from_texts ignores the `ids` argument (it always overwrites `ids` with `[str(uuid.uuid4()) for _ in texts]`)
# reference: https://github.com/langchain-ai/langchain-community/blob/main/libs/community/langchain_community/vectorstores/supabase.py#L152

When IDs are NOT provided:

vector_store = SupabaseVectorStore.from_documents(
    docs,
    embeddings,
    client=supabase,
    table_name="documents",
    query_name="match_documents",
)

# NOW: ID: generated by the database (depends on your table's ID column settings)
# BEFORE: ID: UUIDs generated by this library

Make from_texts consistent with add_texts by respecting custom ids.
@taishikato taishikato changed the title from "align SupabaseVectorStore ID handling with TypeScript version" to "align SupabaseVectorStore ID handling with JS lib" Jul 2, 2025
@mdrxy mdrxy requested a review from Copilot November 5, 2025 01:32
Copilot AI left a comment

Pull Request Overview

This PR modifies the Supabase vector store to make the ids parameter optional and delegates ID generation to the Supabase database. Previously, the code automatically generated UUIDs when IDs were not provided; now it allows the database to handle ID generation via auto-increment or default values.

Key changes:

  • Removes automatic UUID generation from add_texts() and from_texts() methods
  • Makes the ids parameter optional in add_vectors() and _add_vectors() signatures
  • Conditionally includes the id field in row data only when IDs are explicitly provided
Comments suppressed due to low confidence (2)

libs/community/langchain_community/vectorstores/supabase.py:372

  • The variable name ids on line 368 shadows the function parameter ids from line 340. This creates confusion and could lead to bugs if the parameter needs to be referenced later. Rename the local variable to something like returned_ids or chunk_ids.
        id_list: List[str] = []
        for i in range(0, len(rows), chunk_size):
            chunk = rows[i : i + chunk_size]

            result = client.from_(table_name).upsert(chunk).execute()

            if len(result.data) == 0:
                raise Exception("Error inserting: No rows added")

            # VectorStore.add_vectors returns ids as strings
            ids = [str(i.get("id")) for i in result.data if i.get("id")]

            id_list.extend(ids)

        return id_list
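
One way the renaming suggested in this comment might look is sketched below. The function name `upsert_in_chunks` and the `FakeClient` class are hypothetical stand-ins added so the example runs without a Supabase project; only the `from_(...).upsert(...).execute()` call chain is taken from the snippet above.

```python
from typing import Any, Dict, List


class FakeResult:
    """Hypothetical result object exposing a .data list, like the snippet above uses."""

    def __init__(self, data: List[Dict[str, Any]]) -> None:
        self.data = data


class FakeClient:
    """Hypothetical in-memory stand-in that assigns integer ids starting at 0."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: List[Dict[str, Any]] = []

    def from_(self, table_name: str) -> "FakeClient":
        return self

    def upsert(self, rows: List[Dict[str, Any]]) -> "FakeClient":
        self._pending = rows
        return self

    def execute(self) -> FakeResult:
        stored_rows = []
        for row in self._pending:
            stored = dict(row)
            if "id" not in stored:
                stored["id"] = self._next_id
                self._next_id += 1
            stored_rows.append(stored)
        return FakeResult(stored_rows)


def upsert_in_chunks(
    client: Any, table_name: str, rows: List[Dict[str, Any]], chunk_size: int = 500
) -> List[str]:
    id_list: List[str] = []
    # "start" and "row" replace the reused "i"; "chunk_ids" no longer
    # shadows an "ids" parameter in the enclosing function.
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start : start + chunk_size]
        result = client.from_(table_name).upsert(chunk).execute()
        if len(result.data) == 0:
            raise RuntimeError("Error inserting: No rows added")
        chunk_ids = [str(row["id"]) for row in result.data if row.get("id") is not None]
        id_list.extend(chunk_ids)
    return id_list
```

Giving the loop index and the comprehension variable different names also removes the second, smaller shadowing of `i` in the original snippet.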

libs/community/langchain_community/vectorstores/supabase.py:368

  • When IDs are not provided and the database generates them, this line filters out any results where i.get('id') is falsy (None, 0, empty string, etc.). If the database uses integer IDs starting from 0, the ID 0 would be filtered out. Use explicit is not None check instead: [str(i.get('id')) for i in result.data if i.get('id') is not None].
            ids = [str(i.get("id")) for i in result.data if i.get("id")]
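
The difference between the two filters can be seen on a small made-up result set. The `result_data` list below is illustrative only and does not come from a real Supabase response.

```python
# Illustrative rows: one with id 0, one with a missing (None) id.
result_data = [{"id": 0}, {"id": 1}, {"id": None}, {"id": 2}]

# Truthiness check: drops None, but also drops the valid id 0.
truthy_ids = [str(row.get("id")) for row in result_data if row.get("id")]

# Explicit None check: drops only the missing id.
explicit_ids = [
    str(row.get("id")) for row in result_data if row.get("id") is not None
]

print(truthy_ids)    # ['1', '2']
print(explicit_ids)  # ['0', '1', '2']
```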

@mdrxy mdrxy merged commit 65b26f4 into langchain-ai:main Nov 5, 2025
15 checks passed
@taishikato taishikato deleted the refactor/supabase-optional-ids branch November 5, 2025 03:04