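How to Build Live Image Search With a Vision Model and Query It With Natural Language

In this blog, we will build a live image search and query it with natural language. For example, given a list of images as input, you can search for "an elephant" or "a cute animal".

We'll use a multimodal embedding model to understand and embed the images, and build a vector index for efficient retrieval. We'll use CocoIndex to build the indexing flow; it is an ultra-performant real-time data transformation framework.

If this tutorial is helpful, it would mean a lot to us if you could drop a star for CocoIndex on GitHub.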

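Technologies

CocoIndex

CocoIndex is an ultra-performant real-time data transformation framework for AI.

CLIP ViT-L/14

CLIP ViT-L/14 is a powerful vision-language model that can understand both images and text. It is trained to align visual and textual representations in a shared embedding space, which makes it a good fit for our image search use case.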

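In our project, we use CLIP to:

Generate image embeddings directly
Convert natural language search queries into the same embedding space
Enable semantic search by comparing query embeddings with image embeddings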
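To make the "shared embedding space" idea concrete, here is a minimal pure-Python sketch of how a query embedding can be ranked against image embeddings by cosine similarity. The three-dimensional vectors and filenames are made-up toy values; real CLIP ViT-L/14 embeddings have 768 dimensions, and in this project the comparison is done by Qdrant rather than by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec: list[float], image_vecs: dict[str, list[float]], limit: int = 5) -> list[tuple[str, float]]:
    """Return (filename, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-d "embeddings" (hypothetical values, for illustration only)
images = {
    "elephant.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.1, 0.9, 0.2],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [0.2, 0.8, 0.1]  # pretend this is the embedding of "a cute animal"
print(rank_images(query, images, limit=2))
```

Because text and images live in the same space, the same scoring function works whether the query came from a sentence or from another image.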

Qdrant

Qdrant is a high-performance vector database. We use it to store and query the embeddings.

FastAPI

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints.

Prerequisites

Install Postgres. CocoIndex uses Postgres to keep track of data lineage for incremental processing.
Install Qdrant.

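Define the indexing flow

Flow design

The flow diagram illustrates how we'll process our images:

Read image files from the local filesystem
Use CLIP to understand and embed each image
Store the embeddings in a vector database for retrieval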

1. Ingest the images.

```python
import datetime

import cocoindex


@cocoindex.flow_def(name="ImageObjectEmbedding")
def image_object_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    data_scope["images"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="img", included_patterns=["*.jpg", "*.jpeg", "*.png"], binary=True),
        refresh_interval=datetime.timedelta(minutes=1)  # Poll for changes every 1 minute
    )
    img_embeddings = data_scope.add_collector()
```

flow_builder.add_source creates a table with the subfields filename and content. See the documentation for more details.
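Conceptually, each row of that table pairs a file path with the file's raw bytes. The pure-Python sketch below imitates that shape by walking a directory and keeping files that match the same glob patterns. `list_image_rows` is a hypothetical helper for illustration, not part of CocoIndex, and it ignores details such as how CocoIndex formats paths or tracks changes.

```python
import fnmatch
from pathlib import Path


def list_image_rows(root: str, patterns: list[str]) -> list[dict]:
    """Return one {"filename", "content"} row per matching file (hypothetical helper)."""
    rows = []
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file() and any(fnmatch.fnmatch(path.name, p) for p in patterns):
            rows.append({
                "filename": path.relative_to(base).as_posix(),  # path relative to the source folder
                "content": path.read_bytes(),  # like binary=True: keep the raw bytes
            })
    return rows
```

For example, `list_image_rows("img", ["*.jpg", "*.jpeg", "*.png"])` would list the same kind of rows the flow works on.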

2. Process each image and collect the information.

2.1 Embed the image with CLIP

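```python
import functools

from transformers import CLIPModel, CLIPProcessor

# Assumed value: the Hugging Face ID of CLIP ViT-L/14 (the article does not show this constant)
CLIP_MODEL_NAME = "openai/clip-vit-large-patch14"


@functools.cache
def get_clip_model() -> tuple[CLIPModel, CLIPProcessor]:
    model = CLIPModel.from_pretrained(CLIP_MODEL_NAME)
    processor = CLIPProcessor.from_pretrained(CLIP_MODEL_NAME)
    return model, processor
```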

The @functools.cache decorator caches the result of a function call. In this case, it ensures we load the CLIP model and processor only once.
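A quick way to see what `@functools.cache` buys us: the stand-in loader below counts how often its body actually runs. `load_fake_model` and the counter are illustrative only; they take the place of the expensive `from_pretrained` calls.

```python
import functools

load_count = 0  # how many times the expensive body actually ran


@functools.cache
def load_fake_model() -> tuple[str, str]:
    """Stand-in for get_clip_model(); pretend this takes seconds to load."""
    global load_count
    load_count += 1
    return ("model", "processor")


first = load_fake_model()
second = load_fake_model()  # served from the cache, body does not run again
print(load_count)
```

The cache lives in process memory, so each Python process (for example, each uvicorn worker) still loads its own copy of the model.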

```python
import io
from typing import Literal

import cocoindex
import torch
from PIL import Image


# CLIP ViT-L/14 produces 768-dimensional embeddings, matching the Qdrant collection size
@cocoindex.op.function(cache=True, behavior_version=1, gpu=True)
def embed_image(img_bytes: bytes) -> cocoindex.Vector[cocoindex.Float32, Literal[768]]:
    """
    Convert image to embedding using CLIP model.
    """
    model, processor = get_clip_model()
    image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features[0].tolist()
```

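embed_image is a custom function that uses the CLIP model to convert an image into a vector embedding. It accepts image data as bytes and returns a list of floating-point numbers representing the image embedding.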

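The function supports caching through the cache parameter. When it is enabled, the executor stores the function's results for reuse during reprocessing, which is especially useful for computationally intensive operations. See the documentation for more details.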
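The sketch below only illustrates the general idea of such a result cache: results are keyed by a hash of the input bytes plus a behavior version, on the assumption that bumping behavior_version signals changed logic and should force recomputation. `ResultCache` is a hypothetical in-memory stand-in, not how CocoIndex actually stores its cache.

```python
import hashlib
from typing import Callable


class ResultCache:
    """Hypothetical in-memory cache keyed by (behavior_version, hash of input)."""

    def __init__(self, fn: Callable[[bytes], list[float]], behavior_version: int) -> None:
        self.fn = fn
        self.behavior_version = behavior_version
        self.store: dict[tuple[int, str], list[float]] = {}
        self.misses = 0

    def __call__(self, data: bytes) -> list[float]:
        key = (self.behavior_version, hashlib.sha256(data).hexdigest())
        if key not in self.store:
            self.misses += 1  # compute only on a cache miss
            self.store[key] = self.fn(data)
        return self.store[key]


def fake_embed(data: bytes) -> list[float]:
    """Stand-in for an expensive embedding call."""
    return [float(len(data))]


cached = ResultCache(fake_embed, behavior_version=1)
cached(b"cat.jpg bytes")
cached(b"cat.jpg bytes")  # same content: served from the cache
cached(b"dog.jpg bytes")
print(cached.misses)
```

Keying on content rather than filename means a renamed but otherwise identical image would not need to be re-embedded in this sketch.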

Then we process each image and collect the information.

```python
with data_scope["images"].row() as img:
    img["embedding"] = img["content"].transform(embed_image)
    img_embeddings.collect(
        id=cocoindex.GeneratedField.UUID,
        filename=img["filename"],
        embedding=img["embedding"],
    )
```
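For readers new to CocoIndex's declarative style, the loop below is a rough imperative equivalent of the row() / transform() / collect() block: for every image row, compute an embedding and append a record with a fresh ID. It is a plain-Python illustration only; `fake_embed_image` is a hypothetical stand-in for `embed_image`, and `uuid.uuid4()` stands in for `cocoindex.GeneratedField.UUID`, which may assign IDs differently.

```python
import uuid


def fake_embed_image(img_bytes: bytes) -> list[float]:
    """Hypothetical stand-in for embed_image(); returns a tiny fake vector."""
    return [len(img_bytes) / 100.0, img_bytes.count(b"\x00") / 100.0]


def collect_embeddings(rows: list[dict]) -> list[dict]:
    """Imperative sketch of: for each row, transform the content and collect a record."""
    collected = []
    for row in rows:
        embedding = fake_embed_image(row["content"])
        collected.append({
            "id": str(uuid.uuid4()),  # stand-in for GeneratedField.UUID
            "filename": row["filename"],
            "embedding": embedding,
        })
    return collected


records = collect_embeddings([
    {"filename": "elephant.jpg", "content": b"\xff\xd8\x00\x01"},
    {"filename": "kitten.png", "content": b"\x89PNG"},
])
print([r["filename"] for r in records])
```

The declarative version lets CocoIndex decide when rows need reprocessing, which the plain loop cannot do.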

2.2 Collect the embeddings

Export the collected embeddings to a collection in Qdrant.

```python
img_embeddings.export(
    "img_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="image_search",
        grpc_url=QDRANT_GRPC_URL,
    ),
    primary_key_fields=["id"],
    setup_by_user=True,
)
```
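To see what ends up in Qdrant, here is a sketch of a single point in the shape accepted by Qdrant's REST upsert endpoint (PUT /collections/image_search/points): a named vector `embedding`, which is what the search code queries, and a payload carrying `filename`, which the search code reads back. CocoIndex performs the export itself over gRPC, so you never build this by hand; `to_qdrant_point` is a hypothetical helper for illustration.

```python
import json
import uuid


def to_qdrant_point(record: dict) -> dict:
    """Map a collected record to a Qdrant point (named vector + payload)."""
    return {
        "id": record["id"],  # Qdrant point IDs are unsigned integers or UUID strings
        "vector": {"embedding": record["embedding"]},  # named vector queried by /search
        "payload": {"filename": record["filename"]},  # read back as result.payload
    }


record = {
    "id": str(uuid.UUID(int=1)),
    "filename": "elephant.jpg",
    "embedding": [0.1, 0.2, 0.3],  # real vectors have 768 dimensions
}
body = {"points": [to_qdrant_point(record)]}
print(json.dumps(body, indent=2))
```

Using id as the primary key means re-exporting the same record overwrites the existing point instead of creating a duplicate.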

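3. Query the index

Use CLIP to embed the query. CLIP maps both text and images into the same embedding space, which allows cross-modal similarity search.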

```python
def embed_query(text: str) -> list[float]:
    model, processor = get_clip_model()
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return features[0].tolist()
```

Define a FastAPI endpoint /search that performs semantic image search.

```python
from fastapi import Query


@app.get("/search")
def search(q: str = Query(..., description="Search query"), limit: int = Query(5, description="Number of results")):
    # Get the embedding for the query
    query_embedding = embed_query(q)

    # Search in Qdrant
    search_results = app.state.qdrant_client.search(
        collection_name="image_search",
        query_vector=("embedding", query_embedding),
        limit=limit
    )
```

This searches the Qdrant vector database for similar embeddings and returns the top limit results.

```python
    # Format results (continuation of the search() body above)
    out = []
    for result in search_results:
        out.append({
            "filename": result.payload["filename"],
            "score": result.score
        })
    return {"results": out}
```

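This endpoint enables semantic image search: users can find images by describing them in natural language instead of relying on exact keyword matches.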
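To try the endpoint from a script rather than the frontend, a small standard-library client could look like the sketch below. It assumes the backend runs at http://localhost:8000, the port used in the uvicorn command later in this post. `build_search_url` and `search_images` are hypothetical helpers, and only `build_search_url` works without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, limit: int = 5) -> str:
    """Build the GET URL for the /search endpoint with encoded query parameters."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query, 'limit': limit})}"


def search_images(query: str, limit: int = 5, base_url: str = "http://localhost:8000") -> list[dict]:
    """Call /search and return the list of {filename, score} results (needs a running server)."""
    with urlopen(build_search_url(base_url, query, limit)) as resp:
        return json.loads(resp.read())["results"]


print(build_search_url("http://localhost:8000", "cute animal", 3))
```

With the backend running, `search_images("an elephant")` returns the same JSON results the frontend displays.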

Application

FastAPI

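```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# Serve images from the 'img' directory at /img
app.mount("/img", StaticFiles(directory="img"), name="img")
```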

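This sets up the FastAPI application with CORS middleware and static file serving. The app is configured to:

Allow cross-origin requests from any origin
Serve static image files from the "img" directory
Handle the API endpoints for the image search functionality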
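Because of the /img mount, every filename returned by /search can be turned into a URL the browser loads directly. A minimal sketch, assuming the backend runs on http://localhost:8000 and that filename is a path relative to the img folder; `image_url` is a hypothetical helper. Percent-encoding keeps names with spaces or non-ASCII characters valid.

```python
from urllib.parse import quote


def image_url(filename: str, base_url: str = "http://localhost:8000") -> str:
    """Map a search-result filename to the URL served by the /img static mount."""
    # quote() percent-encodes spaces and other unsafe characters but leaves "/" intact
    return f"{base_url.rstrip('/')}/img/{quote(filename)}"


print(image_url("cute animals/kitten 1.jpg"))
```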

```python
import cocoindex
from dotenv import load_dotenv
from qdrant_client import QdrantClient


@app.on_event("startup")
def startup_event():
    load_dotenv()
    cocoindex.init()
    # Initialize Qdrant client
    app.state.qdrant_client = QdrantClient(
        url=QDRANT_GRPC_URL,
        prefer_grpc=True
    )
    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
    app.state.live_updater.start()
```

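The startup event handler initializes the application when it first starts. Here's what each part does:

load_dotenv(): Loads environment variables from a .env file, which can be used for configuration such as API keys and URLs
cocoindex.init(): Initializes the CocoIndex framework, setting up the necessary components and configuration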
Qdrant Client Setup:
- Creates a new QdrantClient instance
- Configures it to use the gRPC URL specified in environment variables
- Enables gRPC preference for better performance
- Stores the client in the FastAPI app state for access across requests
Live Updater Setup:
- Creates a FlowLiveUpdater instance for the image_object_embedding_flow
- This enables real-time updates to the image search index
- Starts the live updater to begin monitoring for changes

This initialization ensures that all necessary components are properly configured and running when the application starts.

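Frontend

You can check out the frontend code in the frontend folder. We intentionally kept it simple and minimal to stay focused on the image search functionality.

Time to have fun!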

Create a collection in Qdrant

```
curl -X PUT 'http://localhost:6333/collections/image_search' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "embedding": {
        "size": 768,
        "distance": "Cosine"
      }
    }
  }'
```
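If you would rather create the collection from Python, the standard-library sketch below builds the same PUT request as the curl command, assuming Qdrant's REST API on localhost:6333 as above. The request is only constructed here; pass it to `urllib.request.urlopen` to actually send it. Note that size must match the embedding dimension (768 for CLIP ViT-L/14) and the vector name embedding must match the name the search endpoint queries. `create_collection_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

EMBEDDING_DIM = 768  # CLIP ViT-L/14 output size; must match embed_image()


def create_collection_request(base_url: str = "http://localhost:6333", name: str = "image_search") -> Request:
    """Build (but do not send) the PUT request that creates the Qdrant collection."""
    body = {"vectors": {"embedding": {"size": EMBEDDING_DIM, "distance": "Cosine"}}}
    return Request(
        f"{base_url}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_collection_request()
print(req.get_method(), req.full_url)
```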

Set up the indexing flow
```
cocoindex setup main.py
```
It is set up with a live updater, so you can add new files to the folder and they will be indexed within a minute.
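The one-minute delay comes from the refresh_interval on the source, which makes the live updater poll the folder for changes. The sketch below illustrates the general polling idea by diffing two snapshots of {filename: modification time}; `take_snapshot` and `diff_snapshots` are hypothetical helpers, and CocoIndex's actual change detection may work differently.

```python
from pathlib import Path


def take_snapshot(folder: str) -> dict[str, float]:
    """Record {filename: mtime} for the files directly inside a folder."""
    return {p.name: p.stat().st_mtime for p in Path(folder).iterdir() if p.is_file()}


def diff_snapshots(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Compare two snapshots taken one poll interval apart."""
    added = sorted(name for name in new if name not in old)
    removed = sorted(name for name in old if name not in new)
    modified = sorted(name for name in new if name in old and new[name] != old[name])
    return {"added": added, "modified": modified, "removed": removed}


before = {"elephant.jpg": 100.0, "kitten.png": 120.0}
after = {"elephant.jpg": 100.0, "kitten.png": 150.0, "spider.jpg": 160.0}
print(diff_snapshots(before, after))
```

In this sketch, only added and modified files would need new embeddings, while removed files would need their points deleted from the index.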

Run backend

```
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Run frontend
```
cd frontend
npm install
npm run dev
```

Go to http://localhost:5174 to search.

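Now add another image to the img folder, for example a picture of a cute spider, or any image you like. Wait a minute for the new image to be processed and indexed.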

If you want to monitor the indexing progress, you can view it in CocoInsight by running cocoindex server -ci main.py.

Finally - we are constantly improving, and more features and examples are coming soon. If you love this article, please give us a star ⭐ at GitHub to help us grow. Thanks for reading!
