Python API

Last updated: 2024.03.14 11:22:40

First published: 2024.03.14 11:22:40

In principle, any Python SDK that ByteHouse supports can be used. This page uses clickhouse_connect as an example to show how to perform vector search operations from Python.

Establish a connection

from clickhouse_connect import get_client

client = get_client(
    host="server",              # server IP or hostname
    port=9000,                  # server port; clickhouse_connect connects over HTTP(S)
    user="test",                # user name
    password="password",        # password
    compress="zstd",            # compression method; zstd is recommended
    send_receive_timeout=1000,  # send/receive timeout, in seconds
)

Create a table

# database, table, dim, metric and engine are placeholders supplied by the caller
schema = f"""
    CREATE TABLE IF NOT EXISTS {database}.{table} (
        id UInt64,
        embedding Array(Float32),
        CONSTRAINT cons_vec_len CHECK length(embedding) = {dim},
        INDEX vec_idx embedding TYPE HNSW('METRIC={metric.upper()}, DIM={dim}')
    ) ENGINE = {engine} ORDER BY id
"""

# Run the DDL statement
client.command(schema)
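
To make the placeholders in the statement above concrete, here is a minimal sketch that renders the DDL as a plain string so it can be inspected before it is sent to the server. The helper name build_create_table_sql and the values "default", "vectors", dim=4, "cosine" and "MergeTree" are assumptions chosen for illustration only; use the database, metric and table engine that your ByteHouse deployment actually requires.

```python
def build_create_table_sql(database: str, table: str, dim: int,
                           metric: str, engine: str) -> str:
    """Render the CREATE TABLE statement for a vector table (hypothetical helper)."""
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n"
        f"    id UInt64,\n"
        f"    embedding Array(Float32),\n"
        f"    CONSTRAINT cons_vec_len CHECK length(embedding) = {dim},\n"
        f"    INDEX vec_idx embedding TYPE HNSW('METRIC={metric.upper()}, DIM={dim}')\n"
        f") ENGINE = {engine} ORDER BY id"
    )


# Placeholder values, chosen only for illustration.
ddl = build_create_table_sql("default", "vectors", dim=4,
                             metric="cosine", engine="MergeTree")
print(ddl)
```

The resulting string can be passed to client.command(ddl) in the same way as the schema variable above.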

Insert vectors

# embeddings (list[list[float]]): list of embeddings
# ids (list[int]): list of ids, one per embedding
values = [[row_id, embedding] for row_id, embedding in zip(ids, embeddings)]
client.insert(
    f"{database}.{table}",
    values,
    column_names=["id", "embedding"],
    column_type_names=["UInt64", "Array(Float32)"],
)
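
Because the table declares a CHECK constraint on the embedding length, it can help to validate the data on the client before inserting. The following is a minimal sketch under that assumption; build_rows is a hypothetical helper, not part of clickhouse_connect, and the sample ids and vectors are made up.

```python
def build_rows(ids, embeddings, dim):
    """Pair ids with embeddings and check sizes before insert (hypothetical helper)."""
    if len(ids) != len(embeddings):
        raise ValueError("ids and embeddings must have the same length")
    rows = []
    for row_id, embedding in zip(ids, embeddings):
        if len(embedding) != dim:
            raise ValueError(
                f"embedding for id {row_id} has length {len(embedding)}, expected {dim}"
            )
        # Convert to plain Python types so each row is [int, list[float]].
        rows.append([int(row_id), [float(x) for x in embedding]])
    return rows


# Made-up sample data with dim=3.
rows = build_rows([1, 2], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dim=3)
print(rows)
```

The returned list has the same row layout as values in the insert call above, so it can be passed to client.insert in its place.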

Query

# query (list[float]): query vector
# k (int): number of nearest neighbours to return
# search_param (dict): search parameters; the "ef" entry sets hnsw_ef_s
q_str = f"""
    SELECT id
    FROM {database}.{table}
    ORDER BY {metric}Distance(embedding, {query})
    LIMIT {k}
    SETTINGS enable_new_ann=1, hnsw_ef_s={search_param["ef"]}
"""

results = client.query(q_str)
result_ids = [int(row_id) for row_id in results.result_columns[0]]
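
The query above interpolates the vector with its default string form, which works for a Python list but not for every sequence type (a NumPy array, for example, prints without commas). As a minimal sketch, the search statement can be rendered with an explicit array literal instead. build_search_sql is a hypothetical helper, and "default", "vectors", "cosine", k=5 and ef=64 are placeholder values, not recommended settings.

```python
def build_search_sql(database, table, metric, query, k, ef):
    """Render the ANN search statement (hypothetical helper)."""
    # Format every component as a float so the array literal is valid SQL.
    vector = "[" + ", ".join(repr(float(x)) for x in query) + "]"
    return (
        f"SELECT id FROM {database}.{table} "
        f"ORDER BY {metric}Distance(embedding, {vector}) "
        f"LIMIT {int(k)} "
        f"SETTINGS enable_new_ann=1, hnsw_ef_s={int(ef)}"
    )


# Placeholder values, chosen only for illustration.
sql = build_search_sql("default", "vectors", "cosine", [0.1, 0.2, 0.3], k=5, ef=64)
print(sql)
```

The rendered string can be passed to client.query(sql), and the ids read from result_columns[0] as shown above.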