Python API Reference

This section contains the API reference for the Python API of LanceDB. Both synchronous and asynchronous APIs are available.

The general flow of using the API is:

Use lancedb.connect or lancedb.connect_async to connect to a database.
Use the returned lancedb.DBConnection or lancedb.AsyncConnection to create or open tables.
Use the returned lancedb.table.Table or lancedb.AsyncTable to query or modify tables.

Installation

pip install lancedb

The following methods describe the synchronous API client. There is also an asynchronous API client.

Connections (Synchronous)

lancedb.connect

connect(uri: URI, *, api_key: Optional[str] = None, region: str = 'us-east-1', host_override: Optional[str] = None, read_consistency_interval: Optional[timedelta] = None, request_thread_pool: Optional[Union[int, ThreadPoolExecutor]] = None, client_config: Union[ClientConfig, Dict[str, Any], None] = None, storage_options: Optional[Dict[str, str]] = None, session: Optional[Session] = None, **kwargs: Any) -> DBConnection

Connect to a LanceDB database.

Parameters:

uri (URI) –

The uri of the database.
api_key (Optional[str], default: None ) –

If presented, connect to LanceDB cloud. Otherwise, connect to a database on file system or cloud storage. Can be set via environment variable LANCEDB_API_KEY.
region (str, default: 'us-east-1' ) –

The region to use for LanceDB Cloud.
host_override (Optional[str], default: None ) –

The override url for LanceDB Cloud.
read_consistency_interval (Optional[timedelta], default: None ) –

(For LanceDB OSS only) The interval at which to check for updates to the table from other processes. If None, then consistency is not checked. For performance reasons, this is the default. For strong consistency, set this to zero seconds. Then every read will check for updates from other processes. As a compromise, you can set this to a non-zero timedelta for eventual consistency. If more than that interval has passed since the last check, then the table will be checked for updates. Note: this consistency only applies to read operations. Write operations are always consistent.
client_config (Union[ClientConfig, Dict[str, Any], None], default: None ) –

Configuration options for the LanceDB Cloud HTTP client. If a dict, then the keys are the attributes of the ClientConfig class. If None, then the default configuration is used.
storage_options (Optional[Dict[str, str]], default: None ) –

Additional options for the storage backend. See available options at https://lancedb.com/docs/storage/
session (Optional[Session], default: None ) –

(For LanceDB OSS only) A session to use for this connection. Sessions allow you to configure cache sizes for index and metadata caches, which can significantly impact memory use and performance. They can also be re-used across multiple connections to share the same cache state.

Examples:

For a local directory, provide a path for the database:

>>> import lancedb
>>> db = lancedb.connect("~/.lancedb")

For object storage, use a URI prefix:

>>> db = lancedb.connect("s3://my-bucket/lancedb",
...                      storage_options={"aws_access_key_id": "***"})

Connect to LanceDB cloud:

>>> db = lancedb.connect("db://my_database", api_key="ldb_...",
...                      client_config={"retry_config": {"retries": 5}})

Returns:

conn ( DBConnection ) –

A connection to a LanceDB database.

Source code in lancedb/__init__.py

def connect(
    uri: URI,
    *,
    api_key: Optional[str] = None,
    region: str = "us-east-1",
    host_override: Optional[str] = None,
    read_consistency_interval: Optional[timedelta] = None,
    request_thread_pool: Optional[Union[int, ThreadPoolExecutor]] = None,
    client_config: Union[ClientConfig, Dict[str, Any], None] = None,
    storage_options: Optional[Dict[str, str]] = None,
    session: Optional[Session] = None,
    **kwargs: Any,
) -> DBConnection:
    """Connect to a LanceDB database.

    Parameters
    ----------
    uri: str or Path
        The uri of the database.
    api_key: str, optional
        If presented, connect to LanceDB cloud.
        Otherwise, connect to a database on file system or cloud storage.
        Can be set via environment variable `LANCEDB_API_KEY`.
    region: str, default "us-east-1"
        The region to use for LanceDB Cloud.
    host_override: str, optional
        The override url for LanceDB Cloud.
    read_consistency_interval: timedelta, default None
        (For LanceDB OSS only)
        The interval at which to check for updates to the table from other
        processes. If None, then consistency is not checked. For performance
        reasons, this is the default. For strong consistency, set this to
        zero seconds. Then every read will check for updates from other
        processes. As a compromise, you can set this to a non-zero timedelta
        for eventual consistency. If more than that interval has passed since
        the last check, then the table will be checked for updates. Note: this
        consistency only applies to read operations. Write operations are
        always consistent.
    client_config: ClientConfig or dict, optional
        Configuration options for the LanceDB Cloud HTTP client. If a dict, then
        the keys are the attributes of the ClientConfig class. If None, then the
        default configuration is used.
    storage_options: dict, optional
        Additional options for the storage backend. See available options at
        <https://lancedb.com/docs/storage/>
    session: Session, optional
        (For LanceDB OSS only)
        A session to use for this connection. Sessions allow you to configure
        cache sizes for index and metadata caches, which can significantly
        impact memory use and performance. They can also be re-used across
        multiple connections to share the same cache state.

    Examples
    --------

    For a local directory, provide a path for the database:

    >>> import lancedb
    >>> db = lancedb.connect("~/.lancedb")

    For object storage, use a URI prefix:

    >>> db = lancedb.connect("s3://my-bucket/lancedb",
    ...                      storage_options={"aws_access_key_id": "***"})

    Connect to LanceDB cloud:

    >>> db = lancedb.connect("db://my_database", api_key="ldb_...",
    ...                      client_config={"retry_config": {"retries": 5}})

    Returns
    -------
    conn : DBConnection
        A connection to a LanceDB database.
    """
    if isinstance(uri, str) and uri.startswith("db://"):
        if api_key is None:
            api_key = os.environ.get("LANCEDB_API_KEY")
        if api_key is None:
            raise ValueError(f"api_key is required to connect to LanceDB cloud: {uri}")
        if isinstance(request_thread_pool, int):
            request_thread_pool = ThreadPoolExecutor(request_thread_pool)
        return RemoteDBConnection(
            uri,
            api_key,
            region,
            host_override,
            # TODO: remove this (deprecation warning downstream)
            request_thread_pool=request_thread_pool,
            client_config=client_config,
            storage_options=storage_options,
            **kwargs,
        )

    if kwargs:
        raise ValueError(f"Unknown keyword arguments: {kwargs}")
    return LanceDBConnection(
        uri,
        read_consistency_interval=read_consistency_interval,
        storage_options=storage_options,
        session=session,
    )

lancedb.db.DBConnection

Bases: EnforceOverrides

An active LanceDB connection interface.

Source code in lancedb/db.py

class DBConnection(EnforceOverrides):
    """An active LanceDB connection interface."""

    def list_namespaces(
        self,
        namespace: Optional[List[str]] = None,
        page_token: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> ListNamespacesResponse:
        """List immediate child namespace names in the given namespace.

        Parameters
        ----------
        namespace: List[str], default []
            The parent namespace to list namespaces in.
            Empty list represents root namespace.
        page_token: str, optional
            Token for pagination. Use the token from a previous response
            to get the next page of results.
        limit: int, optional
            The maximum number of results to return.

        Returns
        -------
        ListNamespacesResponse
            Response containing namespace names and optional page_token for pagination.
        """
        if namespace is None:
            namespace = []
        return ListNamespacesResponse(namespaces=[], page_token=None)

    def create_namespace(
        self,
        namespace: List[str],
        mode: Optional[str] = None,
        properties: Optional[Dict[str, str]] = None,
    ) -> CreateNamespaceResponse:
        """Create a new namespace.

        Parameters
        ----------
        namespace: List[str]
            The namespace identifier to create.
        mode: str, optional
            Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
            or "overwrite" (replace if exists). Case insensitive.
        properties: Dict[str, str], optional
            Properties to set on the namespace.

        Returns
        -------
        CreateNamespaceResponse
            Response containing the properties of the created namespace.
        """
        raise NotImplementedError(
            "Namespace operations are not supported for this connection type"
        )

    def drop_namespace(
        self,
        namespace: List[str],
        mode: Optional[str] = None,
        behavior: Optional[str] = None,
    ) -> DropNamespaceResponse:
        """Drop a namespace.

        Parameters
        ----------
        namespace: List[str]
            The namespace identifier to drop.
        mode: str, optional
            Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
        behavior: str, optional
            Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
            Case insensitive.

        Returns
        -------
        DropNamespaceResponse
            Response containing properties and transaction_id if applicable.
        """
        raise NotImplementedError(
            "Namespace operations are not supported for this connection type"
        )

    def describe_namespace(self, namespace: List[str]) -> DescribeNamespaceResponse:
        """Describe a namespace.

        Parameters
        ----------
        namespace: List[str]
            The namespace identifier to describe.

        Returns
        -------
        DescribeNamespaceResponse
            Response containing the namespace properties.
        """
        raise NotImplementedError(
            "Namespace operations are not supported for this connection type"
        )

    def list_tables(
        self,
        namespace: Optional[List[str]] = None,
        page_token: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> ListTablesResponse:
        """List all tables in this database with pagination support.

        Parameters
        ----------
        namespace: List[str], optional
            The namespace to list tables in.
            None or empty list represents root namespace.
        page_token: str, optional
            Token for pagination. Use the token from a previous response
            to get the next page of results.
        limit: int, optional
            The maximum number of results to return.

        Returns
        -------
        ListTablesResponse
            Response containing table names and optional page_token for pagination.
        """
        raise NotImplementedError(
            "list_tables is not supported for this connection type"
        )

    @abstractmethod
    def table_names(
        self,
        page_token: Optional[str] = None,
        limit: int = 10,
        *,
        namespace: Optional[List[str]] = None,
    ) -> Iterable[str]:
        """List all tables in this database, in sorted order

        Parameters
        ----------
        namespace: List[str], default []
            The namespace to list tables in.
            Empty list represents root namespace.
        page_token: str, optional
            The token to use for pagination. If not present, start from the beginning.
            Typically, this token is last table name from the previous page.
            Only supported by LanceDb Cloud.
        limit: int, default 10
            The size of the page to return.
            Only supported by LanceDb Cloud.

        Returns
        -------
        Iterable of str
        """
        pass

    @abstractmethod
    def create_table(
        self,
        name: str,
        data: Optional[DATA] = None,
        schema: Optional[Union[pa.Schema, LanceModel]] = None,
        mode: str = "create",
        exist_ok: bool = False,
        on_bad_vectors: str = "error",
        fill_value: float = 0.0,
        embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
        *,
        namespace: Optional[List[str]] = None,
        storage_options: Optional[Dict[str, str]] = None,
        storage_options_provider: Optional["StorageOptionsProvider"] = None,
        data_storage_version: Optional[str] = None,
        enable_v2_manifest_paths: Optional[bool] = None,
    ) -> Table:
        """Create a [Table][lancedb.table.Table] in the database.

        Parameters
        ----------
        name: str
            The name of the table.
        namespace: List[str], default []
            The namespace to create the table in.
            Empty list represents root namespace.
        data: The data to initialize the table, *optional*
            User must provide at least one of `data` or `schema`.
            Acceptable types are:

            - list-of-dict

            - pandas.DataFrame

            - pyarrow.Table or pyarrow.RecordBatch
        schema: The schema of the table, *optional*
            Acceptable types are:

            - pyarrow.Schema

            - [LanceModel][lancedb.pydantic.LanceModel]
        mode: str; default "create"
            The mode to use when creating the table.
            Can be either "create" or "overwrite".
            By default, if the table already exists, an exception is raised.
            If you want to overwrite the table, use mode="overwrite".
        exist_ok: bool, default False
            If a table by the same name already exists, then raise an exception
            if exist_ok=False. If exist_ok=True, then open the existing table;
            it will not add the provided data but will validate against any
            schema that's specified.
        on_bad_vectors: str, default "error"
            What to do if any of the vectors are not the same size or contains NaNs.
            One of "error", "drop", "fill".
        fill_value: float
            The value to use when filling vectors. Only used if on_bad_vectors="fill".
        storage_options: dict, optional
            Additional options for the storage backend. Options already set on the
            connection will be inherited by the table, but can be overridden here.
            See available options at
            <https://lancedb.com/docs/storage/>

            To enable stable row IDs (row IDs remain stable after compaction,
            update, delete, and merges), set `new_table_enable_stable_row_ids`
            to `"true"` in storage_options when connecting to the database.
        data_storage_version: optional, str, default "stable"
            Deprecated.  Set `storage_options` when connecting to the database and set
            `new_table_data_storage_version` in the options.
        enable_v2_manifest_paths: optional, bool, default False
            Deprecated.  Set `storage_options` when connecting to the database and set
            `new_table_enable_v2_manifest_paths` in the options.
        Returns
        -------
        LanceTable
            A reference to the newly created table.

        !!! note

            The vector index won't be created by default.
            To create the index, call the `create_index` method on the table.

        Examples
        --------

        Can create with list of tuples or dictionaries:

        >>> import lancedb
        >>> db = lancedb.connect("./.lancedb")
        >>> data = [{"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
        ...         {"vector": [0.2, 1.8], "lat": 40.1, "long":  -74.1}]
        >>> db.create_table("my_table", data)
        LanceTable(name='my_table', version=1, ...)
        >>> db["my_table"].head()
        pyarrow.Table
        vector: fixed_size_list<item: float>[2]
          child 0, item: float
        lat: double
        long: double
        ----
        vector: [[[1.1,1.2],[0.2,1.8]]]
        lat: [[45.5,40.1]]
        long: [[-122.7,-74.1]]

        You can also pass a pandas DataFrame:

        >>> import pandas as pd
        >>> data = pd.DataFrame({
        ...    "vector": [[1.1, 1.2], [0.2, 1.8]],
        ...    "lat": [45.5, 40.1],
        ...    "long": [-122.7, -74.1]
        ... })
        >>> db.create_table("table2", data)
        LanceTable(name='table2', version=1, ...)
        >>> db["table2"].head()
        pyarrow.Table
        vector: fixed_size_list<item: float>[2]
          child 0, item: float
        lat: double
        long: double
        ----
        vector: [[[1.1,1.2],[0.2,1.8]]]
        lat: [[45.5,40.1]]
        long: [[-122.7,-74.1]]

        Data is converted to Arrow before being written to disk. For maximum
        control over how data is saved, either provide the PyArrow schema to
        convert to or else provide a [PyArrow Table](pyarrow.Table) directly.

        >>> import pyarrow as pa
        >>> custom_schema = pa.schema([
        ...   pa.field("vector", pa.list_(pa.float32(), 2)),
        ...   pa.field("lat", pa.float32()),
        ...   pa.field("long", pa.float32())
        ... ])
        >>> db.create_table("table3", data, schema = custom_schema)
        LanceTable(name='table3', version=1, ...)
        >>> db["table3"].head()
        pyarrow.Table
        vector: fixed_size_list<item: float>[2]
          child 0, item: float
        lat: float
        long: float
        ----
        vector: [[[1.1,1.2],[0.2,1.8]]]
        lat: [[45.5,40.1]]
        long: [[-122.7,-74.1]]


        It is also possible to create an table from `[Iterable[pa.RecordBatch]]`:


        >>> import pyarrow as pa
        >>> def make_batches():
        ...     for i in range(5):
        ...         yield pa.RecordBatch.from_arrays(
        ...             [
        ...                 pa.array([[3.1, 4.1], [5.9, 26.5]],
        ...                     pa.list_(pa.float32(), 2)),
        ...                 pa.array(["foo", "bar"]),
        ...                 pa.array([10.0, 20.0]),
        ...             ],
        ...             ["vector", "item", "price"],
        ...         )
        >>> schema=pa.schema([
        ...     pa.field("vector", pa.list_(pa.float32(), 2)),
        ...     pa.field("item", pa.utf8()),
        ...     pa.field("price", pa.float32()),
        ... ])
        >>> db.create_table("table4", make_batches(), schema=schema)
        LanceTable(name='table4', version=1, ...)

        """
        raise NotImplementedError

    def __getitem__(self, name: str) -> LanceTable:
        return self.open_table(name)

    def open_table(
        self,
        name: str,
        *,
        namespace: Optional[List[str]] = None,
        storage_options: Optional[Dict[str, str]] = None,
        storage_options_provider: Optional["StorageOptionsProvider"] = None,
        index_cache_size: Optional[int] = None,
    ) -> Table:
        """Open a Lance Table in the database.

        Parameters
        ----------
        name: str
            The name of the table.
        namespace: List[str], optional
            The namespace to open the table from.
            None or empty list represents root namespace.
        index_cache_size: int, default 256
            **Deprecated**: Use session-level cache configuration instead.
            Create a Session with custom cache sizes and pass it to lancedb.connect().

            Set the size of the index cache, specified as a number of entries

            The exact meaning of an "entry" will depend on the type of index:
            * IVF - there is one entry for each IVF partition
            * BTREE - there is one entry for the entire index

            This cache applies to the entire opened table, across all indices.
            Setting this value higher will increase performance on larger datasets
            at the expense of more RAM
        storage_options: dict, optional
            Additional options for the storage backend. Options already set on the
            connection will be inherited by the table, but can be overridden here.
            See available options at
            <https://lancedb.com/docs/storage/>

        Returns
        -------
        A LanceTable object representing the table.
        """
        raise NotImplementedError

    def drop_table(self, name: str, namespace: Optional[List[str]] = None):
        """Drop a table from the database.

        Parameters
        ----------
        name: str
            The name of the table.
        namespace: List[str], default []
            The namespace to drop the table from.
            Empty list represents root namespace.
        """
        if namespace is None:
            namespace = []
        raise NotImplementedError

    def rename_table(
        self,
        cur_name: str,
        new_name: str,
        cur_namespace: Optional[List[str]] = None,
        new_namespace: Optional[List[str]] = None,
    ):
        """Rename a table in the database.

        Parameters
        ----------
        cur_name: str
            The current name of the table.
        new_name: str
            The new name of the table.
        cur_namespace: List[str], optional
            The namespace of the current table.
            None or empty list represents root namespace.
        new_namespace: List[str], optional
            The namespace to move the table to.
            If not specified, defaults to the same as cur_namespace.
        """
        if cur_namespace is None:
            cur_namespace = []
        if new_namespace is None:
            new_namespace = []
        raise NotImplementedError

    def drop_database(self):
        """
        Drop database
        This is the same thing as dropping all the tables
        """
        raise NotImplementedError

    def drop_all_tables(self, namespace: Optional[List[str]] = None):
        """
        Drop all tables from the database

        Parameters
        ----------
        namespace: List[str], optional
            The namespace to drop all tables from.
            None or empty list represents root namespace.
        """
        if namespace is None:
            namespace = []
        raise NotImplementedError

    @property
    def uri(self) -> str:
        return self._uri

list_namespaces

list_namespaces(namespace: Optional[List[str]] = None, page_token: Optional[str] = None, limit: Optional[int] = None) -> ListNamespacesResponse

List immediate child namespace names in the given namespace.

Parameters:

namespace (Optional[List[str]], default: None ) –

The parent namespace to list namespaces in. Empty list represents root namespace.
page_token (Optional[str], default: None ) –

Token for pagination. Use the token from a previous response to get the next page of results.
limit (Optional[int], default: None ) –

The maximum number of results to return.

Returns:

ListNamespacesResponse –

Response containing namespace names and optional page_token for pagination.

Source code in lancedb/db.py

def list_namespaces(
    self,
    namespace: Optional[List[str]] = None,
    page_token: Optional[str] = None,
    limit: Optional[int] = None,
) -> ListNamespacesResponse:
    """List immediate child namespace names in the given namespace.

    Parameters
    ----------
    namespace: List[str], default []
        The parent namespace to list namespaces in.
        Empty list represents root namespace.
    page_token: str, optional
        Token for pagination. Use the token from a previous response
        to get the next page of results.
    limit: int, optional
        The maximum number of results to return.

    Returns
    -------
    ListNamespacesResponse
        Response containing namespace names and optional page_token for pagination.
    """
    if namespace is None:
        namespace = []
    return ListNamespacesResponse(namespaces=[], page_token=None)

create_namespace

create_namespace(namespace: List[str], mode: Optional[str] = None, properties: Optional[Dict[str, str]] = None) -> CreateNamespaceResponse

Create a new namespace.

Parameters:

namespace (List[str]) –

The namespace identifier to create.
mode (Optional[str], default: None ) –

Creation mode - "create" (fail if exists), "exist_ok" (skip if exists), or "overwrite" (replace if exists). Case insensitive.
properties (Optional[Dict[str, str]], default: None ) –

Properties to set on the namespace.

Returns:

CreateNamespaceResponse –

Response containing the properties of the created namespace.

Source code in lancedb/db.py

def create_namespace(
    self,
    namespace: List[str],
    mode: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
    """Create a new namespace.

    Parameters
    ----------
    namespace: List[str]
        The namespace identifier to create.
    mode: str, optional
        Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
        or "overwrite" (replace if exists). Case insensitive.
    properties: Dict[str, str], optional
        Properties to set on the namespace.

    Returns
    -------
    CreateNamespaceResponse
        Response containing the properties of the created namespace.
    """
    raise NotImplementedError(
        "Namespace operations are not supported for this connection type"
    )

drop_namespace

drop_namespace(namespace: List[str], mode: Optional[str] = None, behavior: Optional[str] = None) -> DropNamespaceResponse

Drop a namespace.

Parameters:

namespace (List[str]) –

The namespace identifier to drop.
mode (Optional[str], default: None ) –

Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior (Optional[str], default: None ) –

Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE"). Case insensitive.

Returns:

DropNamespaceResponse –

Response containing properties and transaction_id if applicable.

Source code in lancedb/db.py

def drop_namespace(
    self,
    namespace: List[str],
    mode: Optional[str] = None,
    behavior: Optional[str] = None,
) -> DropNamespaceResponse:
    """Drop a namespace.

    Parameters
    ----------
    namespace: List[str]
        The namespace identifier to drop.
    mode: str, optional
        Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
    behavior: str, optional
        Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
        Case insensitive.

    Returns
    -------
    DropNamespaceResponse
        Response containing properties and transaction_id if applicable.
    """
    raise NotImplementedError(
        "Namespace operations are not supported for this connection type"
    )

describe_namespace

describe_namespace(namespace: List[str]) -> DescribeNamespaceResponse

Describe a namespace.

Parameters:

namespace (List[str]) –

The namespace identifier to describe.

Returns:

DescribeNamespaceResponse –

Response containing the namespace properties.

Source code in lancedb/db.py

def describe_namespace(self, namespace: List[str]) -> DescribeNamespaceResponse:
    """Describe a namespace.

    Parameters
    ----------
    namespace: List[str]
        The namespace identifier to describe.

    Returns
    -------
    DescribeNamespaceResponse
        Response containing the namespace properties.
    """
    raise NotImplementedError(
        "Namespace operations are not supported for this connection type"
    )

list_tables

list_tables(namespace: Optional[List[str]] = None, page_token: Optional[str] = None, limit: Optional[int] = None) -> ListTablesResponse

List all tables in this database with pagination support.

Parameters:

namespace (Optional[List[str]], default: None ) –

The namespace to list tables in. None or empty list represents root namespace.
page_token (Optional[str], default: None ) –

Token for pagination. Use the token from a previous response to get the next page of results.
limit (Optional[int], default: None ) –

The maximum number of results to return.

Returns:

ListTablesResponse –

Response containing table names and optional page_token for pagination.

Source code in lancedb/db.py

def list_tables(
    self,
    namespace: Optional[List[str]] = None,
    page_token: Optional[str] = None,
    limit: Optional[int] = None,
) -> ListTablesResponse:
    """List all tables in this database with pagination support.

    Parameters
    ----------
    namespace: List[str], optional
        The namespace to list tables in.
        None or empty list represents root namespace.
    page_token: str, optional
        Token for pagination. Use the token from a previous response
        to get the next page of results.
    limit: int, optional
        The maximum number of results to return.

    Returns
    -------
    ListTablesResponse
        Response containing table names and optional page_token for pagination.
    """
    raise NotImplementedError(
        "list_tables is not supported for this connection type"
    )

table_names `abstractmethod`

table_names(page_token: Optional[str] = None, limit: int = 10, *, namespace: Optional[List[str]] = None) -> Iterable[str]

List all tables in this database, in sorted order

Parameters:

namespace (Optional[List[str]], default: None ) –

The namespace to list tables in. Empty list represents root namespace.
page_token (Optional[str], default: None ) –

The token to use for pagination. If not present, start from the beginning. Typically, this token is last table name from the previous page. Only supported by LanceDb Cloud.
limit (int, default: 10 ) –

The size of the page to return. Only supported by LanceDb Cloud.

Returns:

Iterable of str –

Source code in lancedb/db.py

@abstractmethod
def table_names(
    self,
    page_token: Optional[str] = None,
    limit: int = 10,
    *,
    namespace: Optional[List[str]] = None,
) -> Iterable[str]:
    """List all tables in this database, in sorted order

    Parameters
    ----------
    namespace: List[str], default []
        The namespace to list tables in.
        Empty list represents root namespace.
    page_token: str, optional
        The token to use for pagination. If not present, start from the beginning.
        Typically, this token is last table name from the previous page.
        Only supported by LanceDb Cloud.
    limit: int, default 10
        The size of the page to return.
        Only supported by LanceDb Cloud.

    Returns
    -------
    Iterable of str
    """
    pass

create_table `abstractmethod`

create_table(name: str, data: Optional[DATA] = None, schema: Optional[Union[Schema, LanceModel]] = None, mode: str = 'create', exist_ok: bool = False, on_bad_vectors: str = 'error', fill_value: float = 0.0, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, *, namespace: Optional[List[str]] = None, storage_options: Optional[Dict[str, str]] = None, storage_options_provider: Optional['StorageOptionsProvider'] = None, data_storage_version: Optional[str] = None, enable_v2_manifest_paths: Optional[bool] = None) -> Table

Create a Table in the database.

Parameters:

name (str) –

The name of the table.
namespace (Optional[List[str]], default: None ) –

The namespace to create the table in. Empty list represents root namespace.
data (Optional[DATA], default: None ) –
User must provide at least one of data or schema. Acceptable types are:
- list-of-dict
- pandas.DataFrame
- pyarrow.Table or pyarrow.RecordBatch
schema (Optional[Union[Schema, LanceModel]], default: None ) –
Acceptable types are:
- pyarrow.Schema
- LanceModel
mode (str, default: 'create' ) –

The mode to use when creating the table. Can be either "create" or "overwrite". By default, if the table already exists, an exception is raised. If you want to overwrite the table, use mode="overwrite".
exist_ok (bool, default: False ) –

If a table by the same name already exists, then raise an exception if exist_ok=False. If exist_ok=True, then open the existing table; it will not add the provided data but will validate against any schema that's specified.
on_bad_vectors (str, default: 'error' ) –

What to do if any of the vectors are not the same size or contains NaNs. One of "error", "drop", "fill".
fill_value (float, default: 0.0 ) –

The value to use when filling vectors. Only used if on_bad_vectors="fill".
storage_options (Optional[Dict[str, str]], default: None ) –

Additional options for the storage backend. Options already set on the connection will be inherited by the table, but can be overridden here. See available options at https://lancedb.com/docs/storage/

To enable stable row IDs (row IDs remain stable after compaction, update, delete, and merges), set new_table_enable_stable_row_ids to "true" in storage_options when connecting to the database.
data_storage_version (Optional[str], default: None ) –

Deprecated. Set storage_options when connecting to the database and set new_table_data_storage_version in the options.
enable_v2_manifest_paths (Optional[bool], default: None ) –

Deprecated. Set storage_options when connecting to the database and set new_table_enable_v2_manifest_paths in the options.

Returns:

LanceTable –

A reference to the newly created table.
!!! note –

The vector index won't be created by default. To create the index, call the create_index method on the table.

Examples:

Can create with list of tuples or dictionaries:

>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> data = [{"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
...         {"vector": [0.2, 1.8], "lat": 40.1, "long":  -74.1}]
>>> db.create_table("my_table", data)
LanceTable(name='my_table', version=1, ...)
>>> db["my_table"].head()
pyarrow.Table
vector: fixed_size_list<item: float>[2]
  child 0, item: float
lat: double
long: double
----
vector: [[[1.1,1.2],[0.2,1.8]]]
lat: [[45.5,40.1]]
long: [[-122.7,-74.1]]

You can also pass a pandas DataFrame:

>>> import pandas as pd
>>> data = pd.DataFrame({
...    "vector": [[1.1, 1.2], [0.2, 1.8]],
...    "lat": [45.5, 40.1],
...    "long": [-122.7, -74.1]
... })
>>> db.create_table("table2", data)
LanceTable(name='table2', version=1, ...)
>>> db["table2"].head()
pyarrow.Table
vector: fixed_size_list<item: float>[2]
  child 0, item: float
lat: double
long: double
----
vector: [[[1.1,1.2],[0.2,1.8]]]
lat: [[45.5,40.1]]
long: [[-122.7,-74.1]]

Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.

>>> import pyarrow as pa
>>> custom_schema = pa.schema([
...   pa.field("vector", pa.list_(pa.float32(), 2)),
...   pa.field("lat", pa.float32()),
...   pa.field("long", pa.float32())
... ])
>>> db.create_table("table3", data, schema = custom_schema)
LanceTable(name='table3', version=1, ...)
>>> db["table3"].head()
pyarrow.Table
vector: fixed_size_list<item: float>[2]
  child 0, item: float
lat: float
long: float
----
vector: [[[1.1,1.2],[0.2,1.8]]]
lat: [[45.5,40.1]]
long: [[-122.7,-74.1]]

It is also possible to create an table from [Iterable[pa.RecordBatch]]:

>>> import pyarrow as pa
>>> def make_batches():
...     for i in range(5):
...         yield pa.RecordBatch.from_arrays(
...             [
...                 pa.array([[3.1, 4.1], [5.9, 26.5]],
...                     pa.list_(pa.float32(), 2)),
...                 pa.array(["foo", "bar"]),
...                 pa.array([10.0, 20.0]),
...             ],
...             ["vector", "item", "price"],
...         )
>>> schema=pa.schema([
...     pa.field("vector", pa.list_(pa.float32(), 2)),
...     pa.field("item", pa.utf8()),
...     pa.field("price", pa.float32()),
... ])
>>> db.create_table("table4", make_batches(), schema=schema)
LanceTable(name='table4', version=1, ...)

Source code in lancedb/db.py

@abstractmethod
def create_table(
    self,
    name: str,
    data: Optional[DATA] = None,
    schema: Optional[Union[pa.Schema, LanceModel]] = None,
    mode: str = "create",
    exist_ok: bool = False,
    on_bad_vectors: str = "error",
    fill_value: float = 0.0,
    embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
    *,
    namespace: Optional[List[str]] = None,
    storage_options: Optional[Dict[str, str]] = None,
    storage_options_provider: Optional["StorageOptionsProvider"] = None,
    data_storage_version: Optional[str] = None,
    enable_v2_manifest_paths: Optional[bool] = None,
) -> Table:
    """Create a [Table][lancedb.table.Table] in the database.

    Parameters
    ----------
    name: str
        The name of the table.
    namespace: List[str], default []
        The namespace to create the table in.
        Empty list represents root namespace.
    data: The data to initialize the table, *optional*
        User must provide at least one of `data` or `schema`.
        Acceptable types are:

        - list-of-dict

        - pandas.DataFrame

        - pyarrow.Table or pyarrow.RecordBatch
    schema: The schema of the table, *optional*
        Acceptable types are:

        - pyarrow.Schema

        - [LanceModel][lancedb.pydantic.LanceModel]
    mode: str; default "create"
        The mode to use when creating the table.
        Can be either "create" or "overwrite".
        By default, if the table already exists, an exception is raised.
        If you want to overwrite the table, use mode="overwrite".
    exist_ok: bool, default False
        If a table by the same name already exists, then raise an exception
        if exist_ok=False. If exist_ok=True, then open the existing table;
        it will not add the provided data but will validate against any
        schema that's specified.
    on_bad_vectors: str, default "error"
        What to do if any of the vectors are not the same size or contains NaNs.
        One of "error", "drop", "fill".
    fill_value: float
        The value to use when filling vectors. Only used if on_bad_vectors="fill".
    storage_options: dict, optional
        Additional options for the storage backend. Options already set on the
        connection will be inherited by the table, but can be overridden here.
        See available options at
        <https://lancedb.com/docs/storage/>

        To enable stable row IDs (row IDs remain stable after compaction,
        update, delete, and merges), set `new_table_enable_stable_row_ids`
        to `"true"` in storage_options when connecting to the database.
    data_storage_version: optional, str, default "stable"
        Deprecated.  Set `storage_options` when connecting to the database and set
        `new_table_data_storage_version` in the options.
    enable_v2_manifest_paths: optional, bool, default False
        Deprecated.  Set `storage_options` when connecting to the database and set
        `new_table_enable_v2_manifest_paths` in the options.
    Returns
    -------
    LanceTable
        A reference to the newly created table.

    !!! note

        The vector index won't be created by default.
        To create the index, call the `create_index` method on the table.

    Examples
    --------

    Can create with list of tuples or dictionaries:

    >>> import lancedb
    >>> db = lancedb.connect("./.lancedb")
    >>> data = [{"vector": [1.1, 1.2], "lat": 45.5, "long": -122.7},
    ...         {"vector": [0.2, 1.8], "lat": 40.1, "long":  -74.1}]
    >>> db.create_table("my_table", data)
    LanceTable(name='my_table', version=1, ...)
    >>> db["my_table"].head()
    pyarrow.Table
    vector: fixed_size_list<item: float>[2]
      child 0, item: float
    lat: double
    long: double
    ----
    vector: [[[1.1,1.2],[0.2,1.8]]]
    lat: [[45.5,40.1]]
    long: [[-122.7,-74.1]]

    You can also pass a pandas DataFrame:

    >>> import pandas as pd
    >>> data = pd.DataFrame({
    ...    "vector": [[1.1, 1.2], [0.2, 1.8]],
    ...    "lat": [45.5, 40.1],
    ...    "long": [-122.7, -74.1]
    ... })
    >>> db.create_table("table2", data)
    LanceTable(name='table2', version=1, ...)
    >>> db["table2"].head()
    pyarrow.Table
    vector: fixed_size_list<item: float>[2]
      child 0, item: float
    lat: double
    long: double
    ----
    vector: [[[1.1,1.2],[0.2,1.8]]]
    lat: [[45.5,40.1]]
    long: [[-122.7,-74.1]]

    Data is converted to Arrow before being written to disk. For maximum
    control over how data is saved, either provide the PyArrow schema to
    convert to or else provide a [PyArrow Table](pyarrow.Table) directly.

    >>> import pyarrow as pa
    >>> custom_schema = pa.schema([
    ...   pa.field("vector", pa.list_(pa.float32(), 2)),
    ...   pa.field("lat", pa.float32()),
    ...   pa.field("long", pa.float32())
    ... ])
    >>> db.create_table("table3", data, schema = custom_schema)
    LanceTable(name='table3', version=1, ...)
    >>> db["table3"].head()
    pyarrow.Table
    vector: fixed_size_list<item: float>[2]
      child 0, item: float
    lat: float
    long: float
    ----
    vector: [[[1.1,1.2],[0.2,1.8]]]
    lat: [[45.5,40.1]]
    long: [[-122.7,-74.1]]


    It is also possible to create an table from `[Iterable[pa.RecordBatch]]`:


    >>> import pyarrow as pa
    >>> def make_batches():
    ...     for i in range(5):
    ...         yield pa.RecordBatch.from_arrays(
    ...             [
    ...                 pa.array([[3.1, 4.1], [5.9, 26.5]],
    ...                     pa.list_(pa.float32(), 2)),
    ...                 pa.array(["foo", "bar"]),
    ...                 pa.array([10.0, 20.0]),
    ...             ],
    ...             ["vector", "item", "price"],
    ...         )
    >>> schema=pa.schema([
    ...     pa.field("vector", pa.list_(pa.float32(), 2)),
    ...     pa.field("item", pa.utf8()),
    ...     pa.field("price", pa.float32()),
    ... ])
    >>> db.create_table("table4", make_batches(), schema=schema)
    LanceTable(name='table4', version=1, ...)

    """
    raise NotImplementedError

open_table

open_table(name: str, *, namespace: Optional[List[str]] = None, storage_options: Optional[Dict[str, str]] = None, storage_options_provider: Optional['StorageOptionsProvider'] = None, index_cache_size: Optional[int] = None) -> Table

Open a Lance Table in the database.

Parameters:

name (str) –

The name of the table.
namespace (Optional[List[str]], default: None ) –

The namespace to open the table from. None or empty list represents root namespace.
index_cache_size (Optional[int], default: None ) –

Deprecated: Use session-level cache configuration instead. Create a Session with custom cache sizes and pass it to lancedb.connect().

Set the size of the index cache, specified as a number of entries

The exact meaning of an "entry" will depend on the type of index: * IVF - there is one entry for each IVF partition * BTREE - there is one entry for the entire index

This cache applies to the entire opened table, across all indices. Setting this value higher will increase performance on larger datasets at the expense of more RAM
storage_options (Optional[Dict[str, str]], default: None ) –

Additional options for the storage backend. Options already set on the connection will be inherited by the table, but can be overridden here. See available options at https://lancedb.com/docs/storage/

Returns:

A LanceTable object representing the table. –

Source code in lancedb/db.py

def open_table(
    self,
    name: str,
    *,
    namespace: Optional[List[str]] = None,
    storage_options: Optional[Dict[str, str]] = None,
    storage_options_provider: Optional["StorageOptionsProvider"] = None,
    index_cache_size: Optional[int] = None,
) -> Table:
    """Open a Lance Table in the database.

    Parameters
    ----------
    name: str
        The name of the table.
    namespace: List[str], optional
        The namespace to open the table from.
        None or empty list represents root namespace.
    index_cache_size: int, default 256
        **Deprecated**: Use session-level cache configuration instead.
        Create a Session with custom cache sizes and pass it to lancedb.connect().

        Set the size of the index cache, specified as a number of entries

        The exact meaning of an "entry" will depend on the type of index:
        * IVF - there is one entry for each IVF partition
        * BTREE - there is one entry for the entire index

        This cache applies to the entire opened table, across all indices.
        Setting this value higher will increase performance on larger datasets
        at the expense of more RAM
    storage_options: dict, optional
        Additional options for the storage backend. Options already set on the
        connection will be inherited by the table, but can be overridden here.
        See available options at
        <https://lancedb.com/docs/storage/>

    Returns
    -------
    A LanceTable object representing the table.
    """
    raise NotImplementedError

drop_table

drop_table(name: str, namespace: Optional[List[str]] = None)

Drop a table from the database.

Parameters:

name (str) –

The name of the table.
namespace (Optional[List[str]], default: None ) –

The namespace to drop the table from. Empty list represents root namespace.

Source code in lancedb/db.py

def drop_table(self, name: str, namespace: Optional[List[str]] = None):
    """Drop a table from the database.

    Parameters
    ----------
    name: str
        The name of the table.
    namespace: List[str], default []
        The namespace to drop the table from.
        Empty list represents root namespace.
    """
    if namespace is None:
        namespace = []
    raise NotImplementedError

rename_table

rename_table(cur_name: str, new_name: str, cur_namespace: Optional[List[str]] = None, new_namespace: Optional[List[str]] = None)

Rename a table in the database.

Parameters:

cur_name (str) –

The current name of the table.
new_name (str) –

The new name of the table.
cur_namespace (Optional[List[str]], default: None ) –

The namespace of the current table. None or empty list represents root namespace.
new_namespace (Optional[List[str]], default: None ) –

The namespace to move the table to. If not specified, defaults to the same as cur_namespace.

Source code in lancedb/db.py

def rename_table(
    self,
    cur_name: str,
    new_name: str,
    cur_namespace: Optional[List[str]] = None,
    new_namespace: Optional[List[str]] = None,
):
    """Rename a table in the database.

    Parameters
    ----------
    cur_name: str
        The current name of the table.
    new_name: str
        The new name of the table.
    cur_namespace: List[str], optional
        The namespace of the current table.
        None or empty list represents root namespace.
    new_namespace: List[str], optional
        The namespace to move the table to.
        If not specified, defaults to the same as cur_namespace.
    """
    if cur_namespace is None:
        cur_namespace = []
    if new_namespace is None:
        new_namespace = []
    raise NotImplementedError

drop_database

drop_database()

Drop database This is the same thing as dropping all the tables

Source code in lancedb/db.py

def drop_database(self):
    """
    Drop database
    This is the same thing as dropping all the tables
    """
    raise NotImplementedError

drop_all_tables

drop_all_tables(namespace: Optional[List[str]] = None)

Drop all tables from the database

Parameters:

namespace (Optional[List[str]], default: None ) –

The namespace to drop all tables from. None or empty list represents root namespace.

Source code in lancedb/db.py

def drop_all_tables(self, namespace: Optional[List[str]] = None):
    """
    Drop all tables from the database

    Parameters
    ----------
    namespace: List[str], optional
        The namespace to drop all tables from.
        None or empty list represents root namespace.
    """
    if namespace is None:
        namespace = []
    raise NotImplementedError

Tables (Synchronous)

lancedb.table.Table

Bases: ABC

A Table is a collection of Records in a LanceDB Database.

Examples:

Create using DBConnection.create_table (more examples in that method's documentation).

>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> table = db.create_table("my_table", data=[{"vector": [1.1, 1.2], "b": 2}])
>>> table.head()
pyarrow.Table
vector: fixed_size_list<item: float>[2]
  child 0, item: float
b: int64
----
vector: [[[1.1,1.2]]]
b: [[2]]

Can append new data with Table.add().

>>> table.add([{"vector": [0.5, 1.3], "b": 4}])
AddResult(version=2)

Can query the table with Table.search.

>>> table.search([0.4, 0.4]).select(["b", "vector"]).to_pandas()
   b      vector  _distance
0  4  [0.5, 1.3]       0.82
1  2  [1.1, 1.2]       1.13

Search queries are much faster when an index is created. See Table.create_index.

Source code in lancedb/table.py

class Table(ABC):
    """
    A Table is a collection of Records in a LanceDB Database.

    Examples
    --------

    Create using [DBConnection.create_table][lancedb.DBConnection.create_table]
    (more examples in that method's documentation).

    >>> import lancedb
    >>> db = lancedb.connect("./.lancedb")
    >>> table = db.create_table("my_table", data=[{"vector": [1.1, 1.2], "b": 2}])
    >>> table.head()
    pyarrow.Table
    vector: fixed_size_list<item: float>[2]
      child 0, item: float
    b: int64
    ----
    vector: [[[1.1,1.2]]]
    b: [[2]]

    Can append new data with [Table.add()][lancedb.table.Table.add].

    >>> table.add([{"vector": [0.5, 1.3], "b": 4}])
    AddResult(version=2)

    Can query the table with [Table.search][lancedb.table.Table.search].

    >>> table.search([0.4, 0.4]).select(["b", "vector"]).to_pandas()
       b      vector  _distance
    0  4  [0.5, 1.3]       0.82
    1  2  [1.1, 1.2]       1.13

    Search queries are much faster when an index is created. See
    [Table.create_index][lancedb.table.Table.create_index].
    """

    @property
    @abstractmethod
    def name(self) -> str:
        """The name of this Table"""
        raise NotImplementedError

    @property
    @abstractmethod
    def version(self) -> int:
        """The version of this Table"""
        raise NotImplementedError

    @property
    @abstractmethod
    def schema(self) -> pa.Schema:
        """The [Arrow Schema](https://arrow.apache.org/docs/python/api/datatypes.html#)
        of this Table

        """
        raise NotImplementedError

    @property
    @abstractmethod
    def tags(self) -> Tags:
        """Tag management for the table.

        Similar to Git, tags are a way to add metadata to a specific version of the
        table.

        .. warning::

            Tagged versions are exempted from the :py:meth:`cleanup_old_versions()`
            process.

            To remove a version that has been tagged, you must first
            :py:meth:`~Tags.delete` the associated tag.

        Examples
        --------

        .. code-block:: python

            table = db.open_table("my_table")
            table.tags.create("v2-prod-20250203", 10)

            tags = table.tags.list()

        """
        raise NotImplementedError

    def __len__(self) -> int:
        """The number of rows in this Table"""
        return self.count_rows(None)

    @property
    @abstractmethod
    def embedding_functions(self) -> Dict[str, EmbeddingFunctionConfig]:
        """
        Get a mapping from vector column name to it's configured embedding function.
        """

    @abstractmethod
    def count_rows(self, filter: Optional[str] = None) -> int:
        """
        Count the number of rows in the table.

        Parameters
        ----------
        filter: str, optional
            A SQL where clause to filter the rows to count.
        """
        raise NotImplementedError

    def to_pandas(self) -> "pandas.DataFrame":
        """Return the table as a pandas DataFrame.

        Returns
        -------
        pd.DataFrame
        """
        return self.to_arrow().to_pandas()

    @abstractmethod
    def to_arrow(self) -> pa.Table:
        """Return the table as a pyarrow Table.

        Returns
        -------
        pa.Table
        """
        raise NotImplementedError

    def create_index(
        self,
        metric="l2",
        num_partitions=256,
        num_sub_vectors=96,
        vector_column_name: str = VECTOR_COLUMN_NAME,
        replace: bool = True,
        accelerator: Optional[str] = None,
        index_cache_size: Optional[int] = None,
        *,
        index_type: VectorIndexType = "IVF_PQ",
        wait_timeout: Optional[timedelta] = None,
        num_bits: int = 8,
        max_iterations: int = 50,
        sample_rate: int = 256,
        m: int = 20,
        ef_construction: int = 300,
        name: Optional[str] = None,
        train: bool = True,
        target_partition_size: Optional[int] = None,
    ):
        """Create an index on the table.

        Parameters
        ----------
        metric: str, default "l2"
            The distance metric to use when creating the index.
            Valid values are "l2", "cosine", "dot", or "hamming".
            l2 is euclidean distance.
            Hamming is available only for binary vectors.
        num_partitions: int, default 256
            The number of IVF partitions to use when creating the index.
            Default is 256.
        num_sub_vectors: int, default 96
            The number of PQ sub-vectors to use when creating the index.
            Default is 96.
        vector_column_name: str, default "vector"
            The vector column name to create the index.
        replace: bool, default True
            - If True, replace the existing index if it exists.

            - If False, raise an error if duplicate index exists.
        accelerator: str, default None
            If set, use the given accelerator to create the index.
            Only support "cuda" for now.
        index_cache_size : int, optional
            The size of the index cache in number of entries. Default value is 256.
        num_bits: int
            The number of bits to encode sub-vectors. Only used with the IVF_PQ index.
            Only 4 and 8 are supported.
        wait_timeout: timedelta, optional
            The timeout to wait if indexing is asynchronous.
        name: str, optional
            The name of the index. If not provided, a default name will be generated.
        train: bool, default True
            Whether to train the index with existing data. Vector indices always train
            with existing data.
        """
        raise NotImplementedError

    def drop_index(self, name: str) -> None:
        """
        Drop an index from the table.

        Parameters
        ----------
        name: str
            The name of the index to drop.

        Notes
        -----
        This does not delete the index from disk, it just removes it from the table.
        To delete the index, run [optimize][lancedb.table.Table.optimize]
        after dropping the index.

        Use [list_indices][lancedb.table.Table.list_indices] to find the names of
        the indices.
        """
        raise NotImplementedError

    def wait_for_index(
        self, index_names: Iterable[str], timeout: timedelta = timedelta(seconds=300)
    ) -> None:
        """
        Wait for indexing to complete for the given index names.
        This will poll the table until all the indices are fully indexed,
        or raise a timeout exception if the timeout is reached.

        Parameters
        ----------
        index_names: str
            The name of the indices to poll
        timeout: timedelta
            Timeout to wait for asynchronous indexing. The default is 5 minutes.
        """
        raise NotImplementedError

    @abstractmethod
    def stats(self) -> TableStatistics:
        """
        Retrieve table and fragment statistics.
        """
        raise NotImplementedError

    @abstractmethod
    def create_scalar_index(
        self,
        column: str,
        *,
        replace: bool = True,
        index_type: ScalarIndexType = "BTREE",
        wait_timeout: Optional[timedelta] = None,
        name: Optional[str] = None,
    ):
        """Create a scalar index on a column.

        Parameters
        ----------
        column : str
            The column to be indexed.  Must be a boolean, integer, float,
            or string column.
        replace : bool, default True
            Replace the existing index if it exists.
        index_type: Literal["BTREE", "BITMAP", "LABEL_LIST"], default "BTREE"
            The type of index to create.
        wait_timeout: timedelta, optional
            The timeout to wait if indexing is asynchronous.
        name: str, optional
            The name of the index. If not provided, a default name will be generated.
        Examples
        --------

        Scalar indices, like vector indices, can be used to speed up scans.  A scalar
        index can speed up scans that contain filter expressions on the indexed column.
        For example, the following scan will be faster if the column ``my_col`` has
        a scalar index:

        >>> import lancedb # doctest: +SKIP
        >>> db = lancedb.connect("/data/lance") # doctest: +SKIP
        >>> img_table = db.open_table("images") # doctest: +SKIP
        >>> my_df = img_table.search().where("my_col = 7", # doctest: +SKIP
        ...                                  prefilter=True).to_pandas()

        Scalar indices can also speed up scans containing a vector search and a
        prefilter:

        >>> import lancedb # doctest: +SKIP
        >>> db = lancedb.connect("/data/lance") # doctest: +SKIP
        >>> img_table = db.open_table("images") # doctest: +SKIP
        >>> img_table.search([1, 2, 3, 4], vector_column_name="vector") # doctest: +SKIP
        ...     .where("my_col != 7", prefilter=True)
        ...     .to_pandas()

        Scalar indices can only speed up scans for basic filters using
        equality, comparison, range (e.g. ``my_col BETWEEN 0 AND 100``), and set
        membership (e.g. `my_col IN (0, 1, 2)`)

        Scalar indices can be used if the filter contains multiple indexed columns and
        the filter criteria are AND'd or OR'd together
        (e.g. ``my_col < 0 AND other_col> 100``)

        Scalar indices may be used if the filter contains non-indexed columns but,
        depending on the structure of the filter, they may not be usable.  For example,
        if the column ``not_indexed`` does not have a scalar index then the filter
        ``my_col = 0 OR not_indexed = 1`` will not be able to use any scalar index on
        ``my_col``.
        """
        raise NotImplementedError

    def create_fts_index(
        self,
        field_names: Union[str, List[str]],
        *,
        ordering_field_names: Optional[Union[str, List[str]]] = None,
        replace: bool = False,
        writer_heap_size: Optional[int] = 1024 * 1024 * 1024,
        use_tantivy: bool = False,
        tokenizer_name: Optional[str] = None,
        with_position: bool = False,
        # tokenizer configs:
        base_tokenizer: BaseTokenizerType = "simple",
        language: str = "English",
        max_token_length: Optional[int] = 40,
        lower_case: bool = True,
        stem: bool = True,
        remove_stop_words: bool = True,
        ascii_folding: bool = True,
        ngram_min_length: int = 3,
        ngram_max_length: int = 3,
        prefix_only: bool = False,
        wait_timeout: Optional[timedelta] = None,
        name: Optional[str] = None,
    ):
        """Create a full-text search index on the table.

        Warning - this API is highly experimental and is highly likely to change
        in the future.

        Parameters
        ----------
        field_names: str or list of str
            The name(s) of the field to index.
            can be only str if use_tantivy=True for now.
        replace: bool, default False
            If True, replace the existing index if it exists. Note that this is
            not yet an atomic operation; the index will be temporarily
            unavailable while the new index is being created.
        writer_heap_size: int, default 1GB
            Only available with use_tantivy=True
        ordering_field_names:
            A list of unsigned type fields to index to optionally order
            results on at search time.
            only available with use_tantivy=True
        tokenizer_name: str, default "default"
            The tokenizer to use for the index. Can be "raw", "default" or the 2 letter
            language code followed by "_stem". So for english it would be "en_stem".
            For available languages see: https://docs.rs/tantivy/latest/tantivy/tokenizer/enum.Language.html
        use_tantivy: bool, default False
            If True, use the legacy full-text search implementation based on tantivy.
            If False, use the new full-text search implementation based on lance-index.
        with_position: bool, default False
            Only available with use_tantivy=False
            If False, do not store the positions of the terms in the text.
            This can reduce the size of the index and improve indexing speed.
            But it will raise an exception for phrase queries.
        base_tokenizer : str, default "simple"
            The base tokenizer to use for tokenization. Options are:
            - "simple": Splits text by whitespace and punctuation.
            - "whitespace": Split text by whitespace, but not punctuation.
            - "raw": No tokenization. The entire text is treated as a single token.
            - "ngram": N-Gram tokenizer.
        language : str, default "English"
            The language to use for tokenization.
        max_token_length : int, default 40
            The maximum token length to index. Tokens longer than this length will be
            ignored.
        lower_case : bool, default True
            Whether to convert the token to lower case. This makes queries
            case-insensitive.
        stem : bool, default True
            Whether to stem the token. Stemming reduces words to their root form.
            For example, in English "running" and "runs" would both be reduced to "run".
        remove_stop_words : bool, default True
            Whether to remove stop words. Stop words are common words that are often
            removed from text before indexing. For example, in English "the" and "and".
        ascii_folding : bool, default True
            Whether to fold ASCII characters. This converts accented characters to
            their ASCII equivalent. For example, "café" would be converted to "cafe".
        ngram_min_length: int, default 3
            The minimum length of an n-gram.
        ngram_max_length: int, default 3
            The maximum length of an n-gram.
        prefix_only: bool, default False
            Whether to only index the prefix of the token for ngram tokenizer.
        wait_timeout: timedelta, optional
            The timeout to wait if indexing is asynchronous.
        name: str, optional
            The name of the index. If not provided, a default name will be generated.
        """
        raise NotImplementedError

    @abstractmethod
    def add(
        self,
        data: DATA,
        mode: AddMode = "append",
        on_bad_vectors: OnBadVectorsType = "error",
        fill_value: float = 0.0,
    ) -> AddResult:
        """Add more data to the [Table](Table).

        Parameters
        ----------
        data: DATA
            The data to insert into the table. Acceptable types are:

            - list-of-dict

            - pandas.DataFrame

            - pyarrow.Table or pyarrow.RecordBatch
        mode: str
            The mode to use when writing the data. Valid values are
            "append" and "overwrite".
        on_bad_vectors: str, default "error"
            What to do if any of the vectors are not the same size or contains NaNs.
            One of "error", "drop", "fill".
        fill_value: float, default 0.
            The value to use when filling vectors. Only used if on_bad_vectors="fill".

        Returns
        -------
        AddResult
            An object containing the new version number of the table after adding data.
        """
        raise NotImplementedError

    def merge_insert(self, on: Union[str, Iterable[str]]) -> LanceMergeInsertBuilder:
        """
        Returns a [`LanceMergeInsertBuilder`][lancedb.merge.LanceMergeInsertBuilder]
        that can be used to create a "merge insert" operation

        This operation can add rows, update rows, and remove rows all in a single
        transaction. It is a very generic tool that can be used to create
        behaviors like "insert if not exists", "update or insert (i.e. upsert)",
        or even replace a portion of existing data with new data (e.g. replace
        all data where month="january")

        The merge insert operation works by combining new data from a
        **source table** with existing data in a **target table** by using a
        join.  There are three categories of records.

        "Matched" records are records that exist in both the source table and
        the target table. "Not matched" records exist only in the source table
        (e.g. these are new data) "Not matched by source" records exist only
        in the target table (this is old data)

        The builder returned by this method can be used to customize what
        should happen for each category of data.

        Please note that the data may appear to be reordered as part of this
        operation.  This is because updated rows will be deleted from the
        dataset and then reinserted at the end with the new values.

        Parameters
        ----------

        on: Union[str, Iterable[str]]
            A column (or columns) to join on.  This is how records from the
            source table and target table are matched.  Typically this is some
            kind of key or id column.

        Examples
        --------
        >>> import lancedb
        >>> data = pa.table({"a": [2, 1, 3], "b": ["a", "b", "c"]})
        >>> db = lancedb.connect("./.lancedb")
        >>> table = db.create_table("my_table", data)
        >>> new_data = pa.table({"a": [2, 3, 4], "b": ["x", "y", "z"]})
        >>> # Perform a "upsert" operation
        >>> res = table.merge_insert("a")     \\
        ...      .when_matched_update_all()     \\
        ...      .when_not_matched_insert_all() \\
        ...      .execute(new_data)
        >>> res
        MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1)
        >>> # The order of new rows is non-deterministic since we use
        >>> # a hash-join as part of this operation and so we sort here
        >>> table.to_arrow().sort_by("a").to_pandas()
           a  b
        0  1  b
        1  2  x
        2  3  y
        3  4  z
        """  # noqa: E501
        on = [on] if isinstance(on, str) else list(iter(on))

        return LanceMergeInsertBuilder(self, on)

    @abstractmethod
    def search(
        self,
        query: Optional[
            Union[VEC, str, "PIL.Image.Image", Tuple, FullTextQuery]
        ] = None,
        vector_column_name: Optional[str] = None,
        query_type: QueryType = "auto",
        ordering_field_name: Optional[str] = None,
        fts_columns: Optional[Union[str, List[str]]] = None,
    ) -> LanceQueryBuilder:
        """Create a search query to find the nearest neighbors
        of the given query vector. We currently support [vector search][search]
        and [full-text search][experimental-full-text-search].

        All query options are defined in
        [LanceQueryBuilder][lancedb.query.LanceQueryBuilder].

        Examples
        --------
        >>> import lancedb
        >>> db = lancedb.connect("./.lancedb")
        >>> data = [
        ...    {"original_width": 100, "caption": "bar", "vector": [0.1, 2.3, 4.5]},
        ...    {"original_width": 2000, "caption": "foo",  "vector": [0.5, 3.4, 1.3]},
        ...    {"original_width": 3000, "caption": "test", "vector": [0.3, 6.2, 2.6]}
        ... ]
        >>> table = db.create_table("my_table", data)
        >>> query = [0.4, 1.4, 2.4]
        >>> (table.search(query)
        ...     .where("original_width > 1000", prefilter=True)
        ...     .select(["caption", "original_width", "vector"])
        ...     .limit(2)
        ...     .to_pandas())
          caption  original_width           vector  _distance
        0     foo            2000  [0.5, 3.4, 1.3]   5.220000
        1    test            3000  [0.3, 6.2, 2.6]  23.089996

        Parameters
        ----------
        query: list/np.ndarray/str/PIL.Image.Image, default None
            The targetted vector to search for.

            - *default None*.
            Acceptable types are: list, np.ndarray, PIL.Image.Image

            - If None then the select/where/limit clauses are applied to filter
            the table
        vector_column_name: str, optional
            The name of the vector column to search.

            The vector column needs to be a pyarrow fixed size list type

            - If not specified then the vector column is inferred from
            the table schema

            - If the table has multiple vector columns then the *vector_column_name*
            needs to be specified. Otherwise, an error is raised.
        query_type: str
            *default "auto"*.
            Acceptable types are: "vector", "fts", "hybrid", or "auto"

            - If "auto" then the query type is inferred from the query;

                - If `query` is a list/np.ndarray then the query type is
                "vector";

                - If `query` is a PIL.Image.Image then either do vector search,
                or raise an error if no corresponding embedding function is found.

            - If `query` is a string, then the query type is "vector" if the
            table has embedding functions else the query type is "fts"

        Returns
        -------
        LanceQueryBuilder
            A query builder object representing the query.
            Once executed, the query returns

            - selected columns

            - the vector

            - and also the "_distance" column which is the distance between the query
            vector and the returned vector.
        """
        raise NotImplementedError

    @abstractmethod
    def take_offsets(
        self, offsets: list[int], *, with_row_id: bool = False
    ) -> LanceTakeQueryBuilder:
        """
        Take a list of offsets from the table.

        Offsets are 0-indexed and relative to the current version of the table.  Offsets
        are not stable.  A row with an offset of N may have a different offset in a
        different version of the table (e.g. if an earlier row is deleted).

        Offsets are mostly useful for sampling as the set of all valid offsets is easily
        known in advance to be [0, len(table)).

        No guarantees are made regarding the order in which results are returned.  If
        you desire an output order that matches the order of the given offsets, you will
        need to add the row offset column to the output and align it yourself.

        Parameters
        ----------
        offsets: list[int]
            The offsets to take.

        Returns
        -------
        pa.RecordBatch
            A record batch containing the rows at the given offsets.
        """

    def __getitems__(self, offsets: list[int]) -> pa.RecordBatch:
        """
        Take a list of offsets from the table and return as a record batch.

        This method uses the `take_offsets` method to take the rows.  However, it
        aligns the offsets to the passed in offsets.  This means the return type
        is a record batch (and so users should take care not to pass in too many
        offsets)

        Note: this method is primarily intended to fulfill the Dataset contract
        for pytorch.

        Parameters
        ----------
        offsets: list[int]
            The offsets to take.

        Returns
        -------
        pa.RecordBatch
            A record batch containing the rows at the given offsets.
        """
        # We don't know the order of the results at all.  So we calculate a permutation
        # for ordering the given offsets.  Then we load the data with the _rowoffset
        # column.  Then we sort by _rowoffset and apply the inverse of the permutation
        # that we calculated.
        #
        # Note: this is potentially a lot of memory copy if we're operating on large
        # batches :(
        num_offsets = len(offsets)
        indices = list(range(num_offsets))
        permutation = sorted(indices, key=lambda idx: offsets[idx])
        permutation_inv = [0] * num_offsets
        for i in range(num_offsets):
            permutation_inv[permutation[i]] = i

        columns = self.schema.names
        columns.append("_rowoffset")
        tbl = (
            self.take_offsets(offsets)
            .select(columns)
            .to_arrow()
            .sort_by("_rowoffset")
            .take(permutation_inv)
            .combine_chunks()
            .drop_columns(["_rowoffset"])
        )

        return tbl

    @abstractmethod
    def take_row_ids(
        self, row_ids: list[int], *, with_row_id: bool = False
    ) -> LanceTakeQueryBuilder:
        """
        Take a list of row ids from the table.

        Row ids are not stable and are relative to the current version of the table.
        They can change due to compaction and updates.

        No guarantees are made regarding the order in which results are returned.  If
        you desire an output order that matches the order of the given ids, you will
        need to add the row id column to the output and align it yourself.

        Unlike offsets, row ids are not 0-indexed and no assumptions should be made
        about the possible range of row ids.  In order to use this method you must
        first obtain the row ids by scanning or searching the table.

        Even so, row ids are more stable than offsets and can be useful in some
        situations.

        There is an ongoing effort to make row ids stable which is tracked at
        https://github.com/lancedb/lancedb/issues/1120

        Parameters
        ----------
        row_ids: list[int]
            The row ids to take.

        Returns
        -------
        AsyncTakeQuery
            A query object that can be executed to get the rows.
        """

    @abstractmethod
    def _execute_query(
        self,
        query: Query,
        *,
        batch_size: Optional[int] = None,
        timeout: Optional[timedelta] = None,
    ) -> pa.RecordBatchReader: ...

    @abstractmethod
    def _explain_plan(self, query: Query, verbose: Optional[bool] = False) -> str: ...

    @abstractmethod
    def _analyze_plan(self, query: Query) -> str: ...

    @abstractmethod
    def _output_schema(self, query: Query) -> pa.Schema: ...

    @abstractmethod
    def _do_merge(
        self,
        merge: LanceMergeInsertBuilder,
        new_data: DATA,
        on_bad_vectors: OnBadVectorsType,
        fill_value: float,
    ) -> MergeResult: ...

    @abstractmethod
    def delete(self, where: str) -> DeleteResult:
        """Delete rows from the table.

        This can be used to delete a single row, many rows, all rows, or
        sometimes no rows (if your predicate matches nothing).

        Parameters
        ----------
        where: str
            The SQL where clause to use when deleting rows.

            - For example, 'x = 2' or 'x IN (1, 2, 3)'.

            The filter must not be empty, or it will error.

        Returns
        -------
        DeleteResult
            An object containing the new version number of the table after deletion.

        Examples
        --------
        >>> import lancedb
        >>> data = [
        ...    {"x": 1, "vector": [1.0, 2]},
        ...    {"x": 2, "vector": [3.0, 4]},
        ...    {"x": 3, "vector": [5.0, 6]}
        ... ]
        >>> db = lancedb.connect("./.lancedb")
        >>> table = db.create_table("my_table", data)
        >>> table.to_pandas()
           x      vector
        0  1  [1.0, 2.0]
        1  2  [3.0, 4.0]
        2  3  [5.0, 6.0]
        >>> table.delete("x = 2")
        DeleteResult(version=2)
        >>> table.to_pandas()
           x      vector
        0  1  [1.0, 2.0]
        1  3  [5.0, 6.0]

        If you have a list of values to delete, you can combine them into a
        stringified list and use the `IN` operator:

        >>> to_remove = [1, 5]
        >>> to_remove = ", ".join([str(v) for v in to_remove])
        >>> to_remove
        '1, 5'
        >>> table.delete(f"x IN ({to_remove})")
        DeleteResult(version=3)
        >>> table.to_pandas()
           x      vector
        0  3  [5.0, 6.0]
        """
        raise NotImplementedError

    @abstractmethod
    def update(
        self,
        where: Optional[str] = None,
        values: Optional[dict] = None,
        *,
        values_sql: Optional[Dict[str, str]] = None,
    ) -> UpdateResult:
        """
        This can be used to update zero to all rows depending on how many
        rows match the where clause. If no where clause is provided, then
        all rows will be updated.

        Either `values` or `values_sql` must be provided. You cannot provide
        both.

        Parameters
        ----------
        where: str, optional
            The SQL where clause to use when updating rows. For example, 'x = 2'
            or 'x IN (1, 2, 3)'. The filter must not be empty, or it will error.
        values: dict, optional
            The values to update. The keys are the column names and the values
            are the values to set.
        values_sql: dict, optional
            The values to update, expressed as SQL expression strings. These can
            reference existing columns. For example, {"x": "x + 1"} will increment
            the x column by 1.

        Returns
        -------
        UpdateResult
            - rows_updated: The number of rows that were updated
            - version: The new version number of the table after the update

        Examples
        --------
        >>> import lancedb
        >>> import pandas as pd
        >>> data = pd.DataFrame({"x": [1, 2, 3], "vector": [[1.0, 2], [3, 4], [5, 6]]})
        >>> db = lancedb.connect("./.lancedb")
        >>> table = db.create_table("my_table", data)
        >>> table.to_pandas()
           x      vector
        0  1  [1.0, 2.0]
        1  2  [3.0, 4.0]
        2  3  [5.0, 6.0]
        >>> table.update(where="x = 2", values={"vector": [10.0, 10]})
        UpdateResult(rows_updated=1, version=2)
        >>> table.to_pandas()
           x        vector
        0  1    [1.0, 2.0]
        1  3    [5.0, 6.0]
        2  2  [10.0, 10.0]
        >>> table.update(values_sql={"x": "x + 1"})
        UpdateResult(rows_updated=3, version=3)
        >>> table.to_pandas()
           x        vector
        0  2    [1.0, 2.0]
        1  4    [5.0, 6.0]
        2  3  [10.0, 10.0]
        """
        raise NotImplementedError

    @abstractmethod
    def cleanup_old_versions(
        self,
        older_than: Optional[timedelta] = None,
        *,
        delete_unverified: bool = False,
    ) -> "CleanupStats":
        """
        Clean up old versions of the table, freeing disk space.

        Parameters
        ----------
        older_than: timedelta, default None
            The minimum age of the version to delete. If None, then this defaults
            to two weeks.
        delete_unverified: bool, default False
            Because they may be part of an in-progress transaction, files newer
            than 7 days old are not deleted by default. If you are sure that
            there are no in-progress transactions, then you can set this to True
            to delete all files older than `older_than`.

        Returns
        -------
        CleanupStats
            The stats of the cleanup operation, including how many bytes were
            freed.

        See Also
        --------
        [Table.optimize][lancedb.table.Table.optimize]: A more comprehensive
            optimization operation that includes cleanup as well as other operations.

        Notes
        -----
        This function is not available in LanceDb Cloud (since LanceDB
        Cloud manages cleanup for you automatically)
        """

    @abstractmethod
    def compact_files(self, *args, **kwargs):
        """
        Run the compaction process on the table.
        This can be run after making several small appends to optimize the table
        for faster reads.

        Arguments are passed onto Lance's
        [compact_files][lance.dataset.DatasetOptimizer.compact_files].
        For most cases, the default should be fine.

        See Also
        --------
        [Table.optimize][lancedb.table.Table.optimize]: A more comprehensive
            optimization operation that includes cleanup as well as other operations.

        Notes
        -----
        This function is not available in LanceDB Cloud (since LanceDB
        Cloud manages compaction for you automatically)
        """

    @abstractmethod
    def optimize(
        self,
        *,
        cleanup_older_than: Optional[timedelta] = None,
        delete_unverified: bool = False,
        retrain: bool = False,
    ):
        """
        Optimize the on-disk data and indices for better performance.

        Modeled after ``VACUUM`` in PostgreSQL.

        Optimization covers three operations:

Python API Reference

Installation

Connections (Synchronous)

lancedb.connect

lancedb.db.DBConnection

list_namespaces

create_namespace

drop_namespace

describe_namespace

list_tables

table_names abstractmethod

create_table abstractmethod

open_table

drop_table

rename_table

drop_database

drop_all_tables

Tables (Synchronous)

lancedb.table.Table

table_names `abstractmethod`

create_table `abstractmethod`