Request for help building a coffee shop recommendation RAG pipeline and deploying it to Google Cloud Run
Hey there! Let’s work through building your LA coffee shop recommendation RAG pipeline, wrapping it in a Flask app, containerizing it, and deploying to Google Cloud Run. I’ll break this down into actionable steps tailored to your goal of outputting a single shop with name, address, and website based on the Combined Review Text column.
I. Building the core RAG pipeline
1. Dataset preprocessing
First, clean up your dataset to ensure critical fields (reviews, shop name, address, website) are free of missing values. Use Pandas for this quick cleanup:
```python
import os

import pandas as pd

# Load your dataset (replace with your file path)
df = pd.read_csv("la_coffee_shops.csv")

# Drop rows with missing key info
df = df.dropna(subset=["Name", "Address", "Website", "Combined Review Text"])
```
2. Text embedding and vector storage
Turn the review text into numerical vectors for semantic search. You can use an open-source model like Sentence-BERT to avoid API costs, and a vector database like Pinecone (or FAISS for local testing):
```python
import os

from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

# Initialize embedding model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Set up Pinecone (use your API key from environment variables)
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create or connect to your index
if "coffee-shops-rag" not in pc.list_indexes().names():
    pc.create_index(
        name="coffee-shops-rag",
        dimension=384,  # Matches all-MiniLM-L6-v2's output
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("coffee-shops-rag")

# Embed each shop's reviews and upsert one vector per shop
# (df is the cleaned DataFrame from the preprocessing step)
for _, row in df.iterrows():
    review_embedding = embedding_model.encode(row["Combined Review Text"]).tolist()
    index.upsert(
        vectors=[(
            row["Name"],  # the shop name doubles as the vector ID
            review_embedding,
            {
                "address": row["Address"],
                "website": row["Website"],
                "reviews": row["Combined Review Text"],
            },
        )]
    )
```
3. Retrieval + generation logic
Build a function that takes a user query, finds the most relevant coffee shop via vector search, then uses an LLM to format the recommendation properly:
```python
import os

from openai import OpenAI


def generate_recommendation(user_query):
    # Create embedding for the user's query
    query_embedding = embedding_model.encode(user_query).tolist()

    # Fetch the top 1 most relevant coffee shop
    search_results = index.query(vector=query_embedding, top_k=1, include_metadata=True)
    if not search_results["matches"]:
        # An empty index would otherwise surface as an IndexError
        raise ValueError("No matching coffee shop found")
    top_shop = search_results["matches"][0]

    # Craft a prompt for the LLM to format the output
    prompt = f"""
Based on the user's request: "{user_query}"
Recommend this coffee shop in a clear, concise way with only these details:
- Shop Name: {top_shop['id']}
- Address: {top_shop['metadata']['address']}
- Website: {top_shop['metadata']['website']}
Don't add extra commentary; stick strictly to the requested format.
"""

    # Call LLM (use your OpenAI API key from env vars)
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
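Since the prompt only asks the LLM to echo three fields, you may want a deterministic fallback for when the LLM call fails or isn't needed. The sketch below is one possible shape for that, not part of the pipeline above: `format_recommendation` and `sample_match` are hypothetical names, and the sample dict just mirrors the dict-style access (`['id']`, `['metadata']['address']`) used in `generate_recommendation`.

```python
from typing import Any, Dict


def format_recommendation(match: Dict[str, Any]) -> str:
    """Render one query match as the three-line output the prompt asks for."""
    metadata = match.get("metadata") or {}
    return "\n".join([
        f"Shop Name: {match.get('id', 'unknown')}",
        f"Address: {metadata.get('address', 'unknown')}",
        f"Website: {metadata.get('website', 'unknown')}",
    ])


# Hypothetical match, shaped like the dict-style access used above.
sample_match = {
    "id": "Example Coffee",
    "score": 0.87,
    "metadata": {
        "address": "123 Example St, Los Angeles, CA",
        "website": "https://example.com",
    },
}
print(format_recommendation(sample_match))
```

This keeps the output format fixed regardless of what the model returns, which can also make the endpoint easier to test.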
II. Wrapping the pipeline in a Flask API
Wrap the RAG logic in a simple Flask app to create a web endpoint:
```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/recommend-coffee", methods=["POST"])
def recommend_coffee():
    # silent=True returns None for a missing or malformed JSON body,
    # so fall back to an empty dict instead of crashing on .get()
    request_data = request.get_json(silent=True) or {}
    user_query = request_data.get("query")
    if not user_query:
        return jsonify({"error": "Please provide a query (e.g., 'quiet shops for working')"}), 400
    try:
        recommendation = generate_recommendation(user_query)
        return jsonify({"recommendation": recommendation})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    # Use PORT env var for Cloud Run compatibility
    app.run(host="0.0.0.0", port=int(os.getenv("PORT", 8080)))
```
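To try the endpoint, a small client is handy. The sketch below uses only the standard library. The helper names are made up for illustration, and `http://localhost:8080` assumes the app is running locally on the default port. It builds the JSON POST separately from sending it, so the request can be inspected without a live server.

```python
import json
import urllib.request


def build_recommend_request(base_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the /recommend-coffee endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/recommend-coffee",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendation(base_url: str, query: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_recommend_request(base_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed local address; swap in the Cloud Run URL after deployment.
req = build_recommend_request("http://localhost:8080", "quiet shops for working")
print(req.get_method(), req.full_url)
```

Calling `fetch_recommendation("http://localhost:8080", "quiet shops for working")` against a running server should return the `{"recommendation": ...}` payload. Note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses the app can return.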
III. Containerization (Docker)
Create a Dockerfile to package your app:
```dockerfile
# Use a lightweight Python base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements first for better caching
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy all app code
COPY . .

# Set default port (matches Cloud Run's expected PORT env var)
ENV PORT=8080

# Start the Flask app
CMD ["python", "app.py"]
```
Create a requirements.txt with all your dependencies:
```text
flask==2.3.3
pandas==2.1.4
sentence-transformers==2.2.2
pinecone-client>=3.0.0  # the Pinecone / ServerlessSpec imports need the v3+ client
openai==1.3.7
```
IV. Deploying to Google Cloud Run
- Initialize Google Cloud SDK:

  ```shell
  gcloud init
  ```

- Build and push the Docker image (replace `PROJECT_ID` with your GCP project ID):

  ```shell
  gcloud builds submit --tag gcr.io/PROJECT_ID/coffee-shop-recommender
  ```

- Deploy to Cloud Run:

  ```shell
  gcloud run deploy coffee-shop-recommender \
    --image gcr.io/PROJECT_ID/coffee-shop-recommender \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated  # Disable this in production for security
  ```

After deployment, you’ll get a public URL to test your API!
Quick Tips
- Vector Storage Alternative: If you don’t want to use Pinecone, use FAISS for local testing or Google Cloud Vertex AI Vector Search for managed deployment.
- Open-Source LLM Option: Replace OpenAI with Llama 2 or Mistral deployed on Google Cloud Vertex AI to avoid API costs.
- Security: Never hardcode API keys. Store them as Cloud Run environment variables via the GCP Console or `gcloud run deploy --set-env-vars`.
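Because the keys are only read lazily via `os.getenv`, a missing variable would otherwise show up as a failed request rather than a failed deploy. One way to catch it earlier is a startup check like the sketch below. The helper names are hypothetical, and the demo runs against a fake environment dict rather than your real one.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    return [name for name in names if not env.get(name, "").strip()]


def require_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise at startup if any required variable is missing."""
    missing = missing_env_vars(names, env)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))


# The two keys the snippets above read with os.getenv.
REQUIRED = ["PINECONE_API_KEY", "OPENAI_API_KEY"]

# Demonstration against a fake environment rather than the real one.
fake_env = {"PINECONE_API_KEY": "pc-example", "OPENAI_API_KEY": ""}
print(missing_env_vars(REQUIRED, fake_env))  # prints: ['OPENAI_API_KEY']
```

Calling `require_env_vars(REQUIRED)` near the top of `app.py` would make the container exit on startup with a clear message if a key was not set on the Cloud Run service.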
Source: Stack Exchange




