Request for help: building a coffee shop recommendation RAG pipeline and deploying it to Google Cloud Run

阿华AIGC实验室

2026-4-8

Hey there! Let’s work through building your LA coffee shop recommendation RAG pipeline, wrapping it in a Flask app, containerizing it, and deploying it to Google Cloud Run. I’ll break this down into actionable steps tailored to your goal: given a user's request, return a single shop (name, address, and website) whose Combined Review Text best matches it.

I. Building the Core RAG Pipeline

1. Dataset Preprocessing

First, clean up your dataset to ensure critical fields (reviews, shop name, address, website) are free of missing values. Use Pandas for this quick cleanup:

```python
import pandas as pd
import os

# Load your dataset (replace with your file path)
df = pd.read_csv("la_coffee_shops.csv")
# Drop rows with missing key info
df = df.dropna(subset=["Name", "Address", "Website", "Combined Review Text"])
```
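`dropna` only removes true missing values (`NaN`/`None`); a cell holding an empty string or only whitespace still passes. A minimal sketch of a stricter per-row check, assuming rows behave like dicts (plain dicts or pandas rows, where `read_csv` represents missing cells as float `NaN`); `REQUIRED_FIELDS` and `has_required_fields` are hypothetical names:

```python
import math

# Hypothetical list of columns every usable row must have
REQUIRED_FIELDS = ("Name", "Address", "Website", "Combined Review Text")

def has_required_fields(record, fields=REQUIRED_FIELDS):
    """Return True only if every field is present, not None/NaN, and not blank."""
    for field in fields:
        value = record.get(field)
        if value is None:
            return False
        # pandas stores missing cells as float NaN, and NaN != NaN
        if isinstance(value, float) and math.isnan(value):
            return False
        # Empty or whitespace-only strings count as missing too
        if isinstance(value, str) and not value.strip():
            return False
    return True

rows = [
    {"Name": "Cafe A", "Address": "1 Main St", "Website": "https://a.example", "Combined Review Text": "Quiet, good wifi"},
    {"Name": "Cafe B", "Address": "   ", "Website": "https://b.example", "Combined Review Text": "Loud"},
]
clean = [r for r in rows if has_required_fields(r)]
print([r["Name"] for r in clean])  # ['Cafe A']
```

With a DataFrame, the same check can be applied row by row, e.g. `df = df[df.apply(has_required_fields, axis=1)]`.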

2. Text Embedding and Vector Storage

Turn the review text into numerical vectors for semantic search. You can use an open-source model like Sentence-BERT to avoid API costs, and a vector database like Pinecone (or FAISS for local testing):

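```python
import os

from pinecone import Pinecone, ServerlessSpec  # requires the v3+ Pinecone client
from sentence_transformers import SentenceTransformer

# Initialize the embedding model (384-dimensional output)
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
# Set up Pinecone (API key read from an environment variable)
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
# Create the index once, or connect to it if it already exists
INDEX_NAME = "coffee-shops-rag"
if INDEX_NAME not in pc.list_indexes().names():
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,  # Matches all-MiniLM-L6-v2's output
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(INDEX_NAME)

# Embed all reviews in one batched call instead of one encode() per row
records = df.to_dict("records")
embeddings = embedding_model.encode(
    [r["Combined Review Text"] for r in records], batch_size=64, show_progress_bar=True
)

# Upsert in chunks of 100 vectors rather than one request per shop
BATCH_SIZE = 100
for start in range(0, len(records), BATCH_SIZE):
    batch = zip(records[start:start + BATCH_SIZE], embeddings[start:start + BATCH_SIZE])
    index.upsert(vectors=[
        (
            row["Name"],  # Shop name doubles as the vector ID, so duplicate names overwrite each other
            emb.tolist(),
            {
                "name": row["Name"],
                "address": row["Address"],
                "website": row["Website"],
                # Truncated because Pinecone limits metadata size per vector
                "reviews": row["Combined Review Text"][:1000],
            },
        )
        for row, emb in batch
    ])
```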
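One caveat with embedding the whole Combined Review Text column: `all-MiniLM-L6-v2` truncates its input to a maximum sequence length (256 word pieces by default), so anything past that is silently ignored. A common workaround is to split long review text into overlapping chunks and embed each chunk separately. A minimal sketch, using whitespace-separated words as a rough stand-in for tokens (an assumption; word pieces usually outnumber words, hence the conservative default of 150); `chunk_words` is a hypothetical helper:

```python
def chunk_words(text, max_words=150, overlap=30):
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks

sample = "quiet spot with fast wifi and great oat lattes " * 40
print(len(chunk_words(sample)))
```

Each chunk would then become its own vector (for example with an ID like `f"{shop_name}-{i}"`), which means the shop name must come from metadata rather than the vector ID. At query time it also helps to fetch a few extra matches and keep the best-scoring chunk per shop.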

3. Retrieval and Generation Logic

Build a function that takes a user query, finds the most relevant coffee shop via vector search, then uses an LLM to format the recommendation properly:

```python
import os

from openai import OpenAI

# Create the OpenAI client once (API key from env vars) instead of on every request
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_recommendation(user_query):
    # Embed the user's query with the same model used for the reviews
    query_embedding = embedding_model.encode(user_query).tolist()
    # Fetch the single most relevant coffee shop
    search_results = index.query(vector=query_embedding, top_k=1, include_metadata=True)
    if not search_results['matches']:
        return "Sorry, I couldn't find a matching coffee shop."
    top_shop = search_results['matches'][0]
    metadata = top_shop['metadata']

    # Craft a prompt for the LLM to format the output
    prompt = f"""
    Based on the user's request: "{user_query}"
    Recommend this coffee shop in a clear, concise way with only these details:
    - Shop Name: {metadata.get('name', top_shop['id'])}
    - Address: {metadata['address']}
    - Website: {metadata['website']}

    Don't add extra commentary; stick strictly to the requested format.
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # Keep the formatting output as deterministic as possible
    )
    return response.choices[0].message.content
```
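Because the prompt only asks the LLM to restate three stored fields, the same output can be produced deterministically, which saves an API call and removes any chance of the model altering an address or URL. A minimal sketch that also declines weak matches; the `min_score` cutoff of 0.3 is an arbitrary assumption to tune against your own queries, and `format_recommendation` is a hypothetical helper taking plain dicts shaped like the match results above (`id`, `score`, `metadata`). If your Pinecone client returns model objects, converting them with `to_dict()` should give that shape.

```python
def format_recommendation(matches, min_score=0.3):
    """Format the best match, or return None if nothing clears min_score."""
    if not matches:
        return None
    best = max(matches, key=lambda m: m["score"])  # cosine: higher is more similar
    if best["score"] < min_score:
        return None  # hypothetical cutoff: treat weak matches as "no recommendation"
    meta = best.get("metadata") or {}
    name = meta.get("name", best["id"])  # fall back to the vector ID used as the shop name
    return (
        f"Shop Name: {name}\n"
        f"Address: {meta.get('address', 'N/A')}\n"
        f"Website: {meta.get('website', 'N/A')}"
    )

matches = [
    {"id": "Cafe A", "score": 0.62, "metadata": {"address": "1 Main St", "website": "https://a.example"}},
    {"id": "Cafe B", "score": 0.41, "metadata": {"address": "2 Elm St", "website": "https://b.example"}},
]
print(format_recommendation(matches))
```

The trade-off is tone: this returns a fixed template, whereas the LLM version can phrase the answer more naturally.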

II. Wrapping It in a Flask API

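Wrap the RAG logic in a simple Flask app (`app.py`) to expose a web endpoint. Keep the one-time indexing code from step 2 in a separate script so it doesn't re-run every time a container starts; `app.py` only needs the embedding model, the Pinecone `index` handle, and `generate_recommendation`: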

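```python
import os

from flask import Flask, request, jsonify

# generate_recommendation (and the embedding_model/index it uses) must be
# defined above in this file or imported from your own module

app = Flask(__name__)

@app.route("/recommend-coffee", methods=["POST"])
def recommend_coffee():
    # silent=True returns None instead of raising on a missing or non-JSON body
    request_data = request.get_json(silent=True)
    user_query = request_data.get("query") if isinstance(request_data, dict) else None

    if not user_query:
        return jsonify({"error": "Please provide a query (e.g., 'quiet shops for working')"}), 400

    try:
        recommendation = generate_recommendation(user_query)
        return jsonify({"recommendation": recommendation})
    except Exception:
        # Log the traceback server-side instead of returning internals to the client
        app.logger.exception("Recommendation failed")
        return jsonify({"error": "Internal error while generating a recommendation"}), 500

if __name__ == "__main__":
    # Flask's built-in server; Cloud Run supplies the PORT env var
    app.run(host="0.0.0.0", port=int(os.getenv("PORT", 8080)))
```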
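The endpoint passes the `query` field straight to the embedding model and the paid LLM call. A slightly stricter parser keeps malformed or oversized input out; this is a sketch assuming queries should be non-empty strings under an arbitrary 500-character cap, and `parse_query` and `MAX_QUERY_CHARS` are hypothetical names:

```python
MAX_QUERY_CHARS = 500  # arbitrary cap; adjust to your use case

def parse_query(payload):
    """Validate a decoded JSON body; return (query, None) or (None, error_message)."""
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object"
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        return None, "Please provide a query (e.g., 'quiet shops for working')"
    query = query.strip()
    if len(query) > MAX_QUERY_CHARS:
        return None, f"Query must be at most {MAX_QUERY_CHARS} characters"
    return query, None

print(parse_query({"query": "  quiet shops for working  "}))
```

Inside the route, this would replace the manual checks: `user_query, error = parse_query(request.get_json(silent=True))`, returning `jsonify({"error": error}), 400` when `error` is set.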

III. Containerization (Docker)

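Create a Dockerfile to package your app. This version downloads the embedding model at build time so cold starts don't have to fetch it, and serves the app with gunicorn, as Google's Cloud Run Python samples do, rather than Flask's development server:

```dockerfile
# Use a lightweight Python base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Cache the embedding model inside the image
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

# Copy all app code
COPY . .

# Default port for local runs; Cloud Run sets PORT itself
ENV PORT=8080

# Start the Flask app (app.py, variable `app`) with gunicorn
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:app
```

Create a requirements.txt with your dependencies. The Pinecone code above uses the `Pinecone` and `ServerlessSpec` classes, which require version 3 or later of the Pinecone client (2.x exposes a different, `pinecone.init()`-based API). The other lower bounds avoid known breakages: `sentence-transformers` 2.2.2 fails to import with recent `huggingface_hub` releases, `openai` versions before 1.55.3 break with `httpx` 0.28, and pandas needs 2.2.2 or later to work with NumPy 2. Pin exact versions once you have a tested working set:

```
flask>=3.0.0
gunicorn>=22.0.0
pandas>=2.2.2
sentence-transformers>=2.3.0
pinecone-client>=3.0.0,<5
openai>=1.55.3
```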
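Because the API keys are supplied at deploy time rather than baked into the image, a misconfigured service still starts and only fails once the first request reaches Pinecone or OpenAI. A small fail-fast check can surface the problem in the startup logs instead; this is a sketch with hypothetical names (`REQUIRED_ENV_VARS`, `missing_env_vars`):

```python
import os

# Hypothetical list of variables the app cannot run without
REQUIRED_ENV_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY")

def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]

# Example with an explicit mapping instead of the real environment
print(missing_env_vars(environ={"PINECONE_API_KEY": "abc", "OPENAI_API_KEY": ""}))
```

At the top of `app.py`, something like `missing = missing_env_vars()` followed by `if missing: raise RuntimeError(f"Missing env vars: {missing}")` makes the revision fail at startup, which Cloud Run reports as a failed deployment rather than a stream of 500 errors.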

IV. Deploying to Google Cloud Run

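Initialize the Google Cloud SDK and enable the services used below:
```
gcloud init
gcloud services enable run.googleapis.com cloudbuild.googleapis.com
```
Build and push the Docker image with Cloud Build (replace PROJECT_ID with your GCP project ID):
```
gcloud builds submit --tag gcr.io/PROJECT_ID/coffee-shop-recommender
```
Container Registry (`gcr.io`) is deprecated in favor of Artifact Registry. If your project has no `gcr.io` repository, tag the image with an Artifact Registry path such as `us-central1-docker.pkg.dev/PROJECT_ID/REPO/coffee-shop-recommender` instead, and use the same path when deploying.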

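Deploy to Cloud Run, passing the API keys as environment variables (the values below are placeholders). The embedding model and PyTorch typically need more than Cloud Run's default 512 MiB of memory, so this requests 2 GiB; treat that as a starting point and adjust based on observed usage:

```
gcloud run deploy coffee-shop-recommender \
    --image gcr.io/PROJECT_ID/coffee-shop-recommender \
    --platform managed \
    --region us-central1 \
    --memory 2Gi \
    --set-env-vars PINECONE_API_KEY=YOUR_PINECONE_KEY,OPENAI_API_KEY=YOUR_OPENAI_KEY \
    --allow-unauthenticated  # Remove this flag in production to require authentication
```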

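After deployment, `gcloud` prints the service URL. Test the API by sending a POST request with a JSON body such as `{"query": "quiet shops for working"}` to `<service URL>/recommend-coffee`.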
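A small standard-library client for that test, sketched as a function that builds the request so it can be inspected before anything is sent; `build_recommend_request` is a hypothetical name and `https://YOUR-SERVICE-URL` is a placeholder for the URL `gcloud` prints:

```python
import json
import urllib.request

def build_recommend_request(base_url, query):
    """Build (but do not send) a JSON POST request to /recommend-coffee."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/recommend-coffee",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_recommend_request("https://YOUR-SERVICE-URL", "quiet shops for working")
# Uncomment to actually call the deployed service:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.loads(resp.read())["recommendation"])
```

If you later drop `--allow-unauthenticated`, the request also needs an `Authorization: Bearer <token>` header, for example with a token from `gcloud auth print-identity-token`.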

Quick Tips

Vector Storage Alternative: If you don’t want to use Pinecone, use FAISS for local testing or Google Cloud Vertex AI Vector Search for managed deployment.
Open-Source LLM Option: Replace OpenAI with an open model such as Llama 2 or Mistral served on Google Cloud Vertex AI. This removes per-token OpenAI charges, although you pay for the serving infrastructure instead.
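Security: Never hardcode API keys. Set them as Cloud Run environment variables via the GCP Console or `gcloud run deploy --set-env-vars`, or, better, store them in Secret Manager and expose them to the service with `gcloud run deploy --set-secrets`.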
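For the local-testing route, a small dataset doesn't strictly need a vector database at all: a brute-force cosine scan is enough to unit-test the retrieval logic without network access. A minimal pure-Python sketch (not FAISS, and not Pinecone's client; `InMemoryIndex` is hypothetical, though its `upsert`/`query` call shapes loosely mirror the ones used in step 2 and step 3):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class InMemoryIndex:
    """Brute-force stand-in for a vector index: every query scans all stored vectors."""

    def __init__(self):
        self._items = {}  # id -> (values, metadata); re-upserting an id overwrites it

    def upsert(self, vectors):
        for vec_id, values, metadata in vectors:
            self._items[vec_id] = (list(values), dict(metadata))

    def query(self, vector, top_k=1, include_metadata=False):
        matches = []
        for vec_id, (values, metadata) in self._items.items():
            match = {"id": vec_id, "score": cosine_similarity(vector, values)}
            if include_metadata:
                match["metadata"] = metadata
            matches.append(match)
        matches.sort(key=lambda m: m["score"], reverse=True)  # highest similarity first
        return {"matches": matches[:top_k]}

demo = InMemoryIndex()
demo.upsert(vectors=[("Cafe A", [1.0, 0.0], {"address": "1 Main St"}),
                     ("Cafe B", [0.0, 1.0], {"address": "2 Elm St"})])
print(demo.query(vector=[0.9, 0.1], top_k=1, include_metadata=True))
```

Since `query` returns a dict with a `matches` list of plain dicts, the retrieval function from step 3 should run against it unchanged if `index` is assigned an `InMemoryIndex` filled with the same tuples. Beyond a few thousand shops, FAISS or a managed index is the better fit.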

Content sourced from Stack Exchange.