In the previous post, we covered LangChain’s core components: Models, Messages, Prompts, Tools, and Memory. Each component plays a specific role — models reason, tools execute, memory retains context.
Now it’s time to put them together.
This post walks through building a complete multimodal AI agent application: AI Private Chef. You’ll take a photo of your fridge, and the agent identifies the ingredients, searches for recipes, and ranks them by nutrition and ease of cooking.
We’ll go from a raw Jupyter prototype to a LangGraph-deployed agent, and finally to a production-ready FastAPI application with Alibaba Cloud OSS for file uploads.
What Are We Building?
AI Private Chef is a recipe recommendation agent powered by multimodal AI. The user uploads a photo of ingredients (or types a list), and the agent:
Identifies ingredients from the image
Searches the web for matching recipes
Scores and ranks recipes by nutrition and difficulty
Returns a structured recommendation report
Feature Overview
Feature
Description
📸 Image Recognition
Upload a food photo, auto-identify ingredients
🔍 Smart Search
Search recipes based on identified ingredients
🍽️ Smart Ranking
Sort recipes by nutrition score and difficulty
💡 Creative Suggestions
Suggest creative pairings when no exact match found
💬 Chat Interface
Conversational UI supporting image + text input
Architecture Overview
The application follows a straightforward pipeline:
1 2 3 4 5 6 7 8 9 10 11
User (image + text) ↓ Multimodal Model (qwen3.5-plus) ↓ Identify Ingredients ↓ Tavily Web Search ↓ Score & Rank Recipes ↓ Structured Report Output
For the production deployment, the architecture expands to include file uploads and persistence:
Before writing any code, let’s pin down the key design decisions.
Model Selection
Since the agent must process images, we need a multimodal model that supports image input. A good choice is qwen3.5-plus from Alibaba Cloud’s DashScope platform — it supports images, text, audio, and video.
We access it through the OpenAI-compatible API provided by DashScope, using init_chat_model with model_provider="openai".
Tool Selection
The agent needs to search recipes on the web. Tavily is a search API optimized for AI agents — it returns structured results that are easy for models to parse.
LangChain provides built-in integration via langchain-tavily.
Memory Strategy
For the prototype, we use SqliteSaver as the checkpointer to persist conversation memory. Each conversation is identified by a thread_id.
When deploying with LangGraph, memory is handled automatically — no need to add a checkpointer yourself.
System Prompt Design
The system prompt is the single most important factor in agent behavior. After testing, here’s what works well:
1 2 3 4 5 6 7 8 9
system_prompt = """ You are a private chef. When receiving a user's ingredient photo or list, follow these steps: 1. Identify and evaluate ingredients: If the user provides a photo, identify all visible ingredients first. Assess freshness and available quantity based on appearance, then compile a "Current Available Ingredients List". 2. Smart recipe search: Prioritize calling the web_search tool using the "Available Ingredients List" as core keywords to find feasible recipes. 3. Multi-dimensional evaluation and ranking: Score candidate recipes from two dimensions — nutritional value and preparation difficulty — then rank by score. Simpler and more nutritious recipes rank higher. 4. Structured output: Organize the ranked recipes into a clear recommendation report including recipe information, scores, reasons for recommendation, and reference images. Strictly follow the process. Prioritize calling the web_search tool to search for recipes. Only improvise when search returns no results. """
Key design points:
Step-by-step instructions — the model follows a clear pipeline rather than improvising
Prioritize tool use — explicitly tell the agent to search first, not guess
Structured output — request a report format so results are readable
Prototyping in Jupyter
Let’s validate the agent logic in a Jupyter notebook first.
# Load environment variables from dotenv import load_dotenv
load_dotenv()
Defining Tools
1 2 3 4 5 6 7
from langchain_tavily import TavilySearch
# Web search tool using Tavily web_search = TavilySearch( max_results=5, topic="general", )
That’s it — one line to define a web search tool. Tavily handles the API calls, result parsing, and formatting.
Initializing the Multimodal Model
1 2 3 4 5 6 7 8 9
from langchain.chat_models import init_chat_model import os
model = init_chat_model( model="qwen3.5-plus", # Multimodal model supporting images model_provider="openai", # DashScope is OpenAI-compatible base_url=os.getenv("DASHSCOPE_BASE_URL"), api_key=os.getenv("DASHSCOPE_API_KEY") )
Since DashScope is not a LangChain-native provider, we set model_provider="openai" to use the OpenAI-compatible API format, and manually provide base_url and api_key.
Memory Management
1 2 3 4 5 6 7 8 9
from langgraph.checkpoint.sqlite import SqliteSaver import sqlite3
system_prompt = """ You are a private chef. When receiving a user's ingredient photo or list, follow these steps: 1. Identify and evaluate ingredients: If the user provides a photo, identify all visible ingredients first. Assess freshness and available quantity based on appearance, then compile a "Current Available Ingredients List". 2. Smart recipe search: Prioritize calling the web_search tool using the "Available Ingredients List" as core keywords to find feasible recipes. 3. Multi-dimensional evaluation and ranking: Score candidate recipes from two dimensions — nutritional value and preparation difficulty — then rank by score. Simpler and more nutritious recipes rank higher. 4. Structured output: Organize the ranked recipes into a clear recommendation report including recipe information, scores, reasons for recommendation, and reference images. Strictly follow the process. Prioritize calling the web_search tool to search for recipes. Only improvise when search returns no results. """
# Prepare multimodal message — online image URL + text prompt multimodal_message = HumanMessage( content=[ {"type": "image", "url": "https://img.freepik.com/free-photo/arrangement-different-foods-organized-fridge_23-2149099882.jpg"}, {"type": "text", "text": "What can I make with these ingredients?"} ])
================================ Human Message ================================= [{'type': 'image', 'url': 'https://img.freepik.com/...'}, {'type': 'text', 'text': 'What can I make with these ingredients?'}] ================================== Ai Message ================================== These ingredients are very rich... I will search for recipes based on these ingredients. Tool Calls: tavily_search (call_5f30eac22b4b4f8c8b7927) Args: query: recipes with mushrooms, tomatoes, salmon, chicken breast, broccoli... ================================= Tool Message ================================= Name: tavily_search {"query": "recipes with mushrooms, tomatoes, salmon...", "results": [...]} ================================== Ai Message ================================== ### 1. Salmon Broccoli Roast - Ingredients: salmon, broccoli, red pepper, mushrooms, carrots - Nutritional value: 9/10, Difficulty: 3/10 - ...
response = agent.invoke( {"messages": [HumanMessage(content="I like the first recipe. Can you give me more details?")]}, config )
response['messages'][-1].pretty_print()
The agent remembers the previous conversation and expands on the first recipe with detailed steps, tips, and nutritional information — proof that the checkpointer is working.
Deploying with LangGraph and LangSmith
The prototype works. Now let’s deploy it professionally.
LangChain agents are built on LangGraph under the hood, which provides built-in server deployment with a full REST API — no extra work needed. And LangSmith adds a GUI for debugging, monitoring, and one-click cloud deployment.
# 3. Multimodal model model = init_chat_model( model="qwen3.5-plus", model_provider="openai", base_url=os.getenv("DASHSCOPE_BASE_URL"), api_key=os.getenv("DASHSCOPE_API_KEY") )
# 4. System prompt system_prompt = """ You are a private chef. When receiving a user's ingredient photo or list, follow these steps: 1. Identify and evaluate ingredients: If the user provides a photo, identify all visible ingredients first. Based on their appearance, assess freshness and available quantity, then compile a "Current Available Ingredients List". 2. Smart recipe search: Prioritize calling the web_search tool using the "Available Ingredients List" as core keywords to find feasible recipes. 3. Multi-dimensional evaluation and ranking: Score candidate recipes on two dimensions — nutritional value and preparation difficulty — then rank by score. Simpler and more nutritious recipes rank higher. 4. Structured output: Organize the ranked recipes into a clear recommendation report including recipe information, scores, reasons for recommendation, and reference images. Strictly follow the process. Prioritize calling the web_search tool. Only improvise when search returns no results. """
LangSmith Studio gives you a GUI chat interface to test your agent interactively. You can:
Chat with the agent — send messages and images directly
Inspect execution traces — see every model call, tool invocation, and intermediate step
Debug issues — trace exactly where the agent goes wrong
LangSmith also offers one-click cloud deployment, but it requires a paid plan. For most developers, using LangSmith for testing and debugging during development is sufficient — deploy elsewhere for production.
Building the Production Application
The LangGraph deployment is great for testing, but real users need a custom frontend and proper file handling. Two issues remain:
Image handling: Base64 encoding images into messages is memory-heavy and slow
User experience: No custom frontend
Why Not Base64? The OSS Approach
When sending images to multimodal models, you have two options:
Approach
How It Works
Pros
Cons
Base64
Encode image bytes into the message
Simple, no external services
High memory usage, bloated messages, slow
OSS URL
Upload image to Object Storage, send URL
Clean messages, CDN-friendly, scalable
Needs OSS setup
The standard production approach:
Frontend requests a presigned upload URL from the server
Frontend uploads the file directly to OSS (no server throughput)
Frontend sends the OSS URL to the agent
1 2 3 4
Frontend → Server (request upload URL) Frontend → OSS (upload file directly) Frontend → Server (send message with OSS URL) Server → Agent (process with image URL)
from langchain.chat_models import init_chat_model from langchain_tavily import TavilySearch from langchain.agents import create_agent import os from langgraph.checkpoint.sqlite import SqliteSaver import sqlite3
# Load environment variables from dotenv import load_dotenv load_dotenv()
# Web search tool web_search = TavilySearch( max_results=5, topic="general" )
# Multimodal model model = init_chat_model( model="qwen3.5-plus", model_provider="openai", base_url=os.getenv("DASHSCOPE_BASE_URL"), api_key=os.getenv("DASHSCOPE_API_KEY") )
# System prompt system_prompt = """ You are a private chef. When receiving a user's ingredient photo or list, follow these steps: 1. Identify and evaluate ingredients: If the user provides a photo, identify all visible ingredients first. Based on appearance, assess freshness and available quantity, then compile a "Current Available Ingredients List". 2. Smart recipe search: Prioritize calling the web_search tool using the "Available Ingredients List" as core keywords to find feasible recipes. 3. Multi-dimensional evaluation and ranking: Score candidate recipes on two dimensions — nutritional value and preparation difficulty — then rank by score. Simpler and more nutritious recipes rank higher. 4. Structured output: Organize the ranked recipes into a clear recommendation report including recipe information, scores, reasons for recommendation, and reference images. Strictly follow the process. Prioritize calling the web_search tool. Only improvise when search returns no results. """
from langchain_core.messages import HumanMessage, AIMessageChunk, AIMessage from langchain_tavily import TavilySearch from langchain.agents import create_agent from app.common.logger import logger import os from langgraph.checkpoint.sqlite import SqliteSaver import sqlite3
# Load environment variables from dotenv import load_dotenv load_dotenv()
# Web search tool tavily = TavilySearch( max_results=5, topic="general" )
# Multimodal model model = init_chat_model( model="qwen3-omni-flash", model_provider="openai", base_url=os.getenv("DASHSCOPE_BASE_URL"), api_key=os.getenv("DASHSCOPE_API_KEY") )
# System prompt system_prompt = """ You are a private chef. When receiving a user's ingredient photo or list, follow these steps: 1. Identify and evaluate ingredients: If the user provides a photo, identify all visible ingredients first. Based on appearance, assess freshness and available quantity, then compile a "Current Available Ingredients List". 2. Smart recipe search: Prioritize calling the web_search tool using the "Available Ingredients List" as core keywords to find feasible recipes. 3. Multi-dimensional evaluation and ranking: Score candidate recipes on two dimensions — nutritional value and preparation difficulty — then rank by score. Simpler and more nutritious recipes rank higher. 4. Structured output: Organize the ranked recipes into a clear recommendation report including recipe information, scores, reasons for recommendation, and reference images. Strictly follow the process. Prioritize calling the web_search tool. Only improvise when search returns no results. """
from fastapi import APIRouter from app.models.schemas import ChatRequest from fastapi.responses import StreamingResponse from app.agents.personal_chief import search_recipes, get_messages, clear_messages
Visit http://localhost:8001. You should be able to:
Upload an ingredient photo (it goes to OSS first)
See the image preview
Send the message and receive a streamed recipe recommendation
Continue the conversation with follow-up questions
Please see below how it works like:
Best Practices
Area
Recommendation
Image handling
Upload to OSS first, send URLs to the model — never embed base64 in messages
Memory
Use SqliteSaver for development, PostgresSaver for production
System prompt
Be explicit about tool priority — tell the agent to search first, guess second
Tool design
Keep tool descriptions concise; wrap complex tools like TavilySearch in simpler @tool functions
OSS security
Use presigned URLs for uploads; keep buckets private in production; add CORS rules during development
LangSmith
Use it during development for tracing and debugging; disable LANGSMITH_TRACING in production to reduce overhead
Error handling
Catch exceptions in streaming generators and yield user-friendly fallback messages
FAQ
Q: Why qwen3.5-plus instead of GPT-4o or Claude? A: qwen3.5-plus is a strong multimodal model available through Alibaba Cloud’s DashScope API. It’s cost-effective for Chinese-language use cases and supports images out of the box. You can swap models easily — just change the model parameter in init_chat_model.
Q: Can I use a different search tool instead of Tavily? A: Yes. LangChain supports various search integrations (SerpAPI, Google Search, etc.). Tavily is recommended because it’s optimized for AI agents — results are clean, structured, and include relevance scores.
Q: What happens when the conversation gets too long? A: The context window has a limit. For long conversations, use LangChain’s SummarizationMiddleware to automatically summarize old messages while keeping recent ones intact. See the LangChain documentation on memory management strategies.
Q: Why not just deploy with LangGraph cloud? A: LangGraph cloud deployment is convenient but expensive. For production, deploying the FastAPI server on your own infrastructure (with Docker) gives you more control and lower costs.
Q: Is the OSS bucket really safe with public read? A: No! Public read means anyone can access uploaded files. Only use it during development. In production, keep buckets private and serve files through a CDN or generate time-limited signed URLs.
Summary
We built a complete multimodal AI agent from scratch:
Prototype — Validated the pipeline in Jupyter using create_agent, a multimodal model, Tavily search, and SqliteSaver
LangGraph deployment — Deployed with langgraph dev and tested via LangSmith Studio
Production app — Built a FastAPI gateway with OSS file uploads, streaming responses, and persistent memory
Key takeaways:
A working agent needs model + tools + prompt + memory — nothing more
System prompts are the most impactful lever for controlling agent behavior
OSS uploads are the right way to handle images in production — not base64
LangSmith is invaluable for debugging during development, but you’ll want your own deployment for production