LangChain4J in Action: Building a Medical AI Assistant Agent

1. What is LangChain4J?
- 1.1 The Problem LangChain4J Solves
- 1.2 Core Concepts
2. Project Setup
- 2.1 Dependencies
- 2.2 Configuration
3. Your First LLM Integration
- 3.1 Direct API Call
- 3.2 Spring Boot Auto-Configuration
4. Building the AI Service
- 4.1 The Assistant Interface
- 4.2 System Messages
5. Adding Chat Memory
6. Function Calling
7. RAG - Retrieval-Augmented Generation
8. Putting It All Together
- 8.1 Complete Agent Configuration
- 8.2 Controller Layer
9. Best Practices
10. FAQ
11. Summary

1. What is LangChain4J?

1.1 The Problem LangChain4J Solves

Building AI applications involves more than just calling an LLM API. You need to:

Manage conversation context (chat memory)
Connect to external data sources (RAG)
Execute business logic (function calling)
Handle different LLM providers

LangChain4J is a Java library that simplifies integrating Large Language Models (LLMs) into Java applications. It provides:

Feature	Purpose
Unified API	Switch between OpenAI, DeepSeek, Qwen without changing code
Chat Memory	Maintain conversation context across multiple turns
Function Calling	Let LLM call your Java methods
RAG	Connect LLM to your knowledge base
Spring Boot Integration	Auto-configuration and dependency injection

1.2 Core Concepts

User Input → AIService → LLM → Response
                ↓
         [Chat Memory]
         [Tools/Functions]
         [Knowledge Base]

Key Components:

ChatModel: Interface to LLM providers (OpenAI, DeepSeek, etc.)
AIService: High-level abstraction that orchestrates components
ChatMemory: Stores conversation history
Tools: Java methods the LLM can invoke
ContentRetriever: Fetches relevant documents for RAG

2. Project Setup

2.1 Dependencies

Create a Spring Boot project with these dependencies:

<properties>
    <java.version>17</java.version>
    <spring-boot.version>3.2.6</spring-boot.version>
    <langchain4j.version>1.0.0-beta3</langchain4j.version>
</properties>

<dependencies>
    <!-- Spring Boot Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- LangChain4J OpenAI (supports OpenAI, DeepSeek, etc.) -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-open-ai-spring-boot-starter</artifactId>
    </dependency>

    <!-- LangChain4J Core -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-spring-boot-starter</artifactId>
    </dependency>

    <!-- For RAG -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-easy-rag</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-bom</artifactId>
            <version>${langchain4j.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

2.2 Configuration

application.properties:

# Server
server.port=8080

# OpenAI / DeepSeek Configuration
langchain4j.open-ai.chat-model.base-url=https://api.deepseek.com
langchain4j.open-ai.chat-model.api-key=${DEEP_SEEK_API_KEY}
langchain4j.open-ai.chat-model.model-name=deepseek-chat

# Logging
langchain4j.open-ai.chat-model.log-requests=true
langchain4j.open-ai.chat-model.log-responses=true

Note: Store API keys in environment variables, never in code.

3. Your First LLM Integration

3.1 Direct API Call

Test the connection directly:

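import dev.langchain4j.model.openai.OpenAiChatModel;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class LLMTest {

    @Test
    public void testDirectCall() {
        // Build the model manually. The "demo" key is only accepted by the
        // LangChain4J demo proxy, so the base URL must point at it.
        OpenAiChatModel model = OpenAiChatModel.builder()
                .baseUrl("http://langchain4j.dev/demo/openai/v1")
                .apiKey("demo") // Use demo key for testing
                .modelName("gpt-4o-mini")
                .build();

        String answer = model.chat("Hello, what is LangChain4J?");
        System.out.println(answer);
    }
}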

3.2 Spring Boot Auto-Configuration

With the Spring Boot starter, the model is auto-configured from application.properties and can be injected:

@SpringBootTest
public class LLMTest {

    @Autowired
    private OpenAiChatModel chatModel;

    @Test
    public void testAutoConfig() {
        String answer = chatModel.chat("Explain RAG in simple terms");
        System.out.println(answer);
    }
}

4. Building the AI Service

4.1 The Assistant Interface

LangChain4J uses interfaces and dynamic proxies. Define what your AI can do:

package com.example.assistant;

import dev.langchain4j.service.spring.AiService;
import static dev.langchain4j.service.spring.AiServiceWiringMode.EXPLICIT;

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel"
)
public interface MedicalAssistant {

    String chat(String userMessage);
}

How it works:

Define an interface with methods
Annotate with @AiService
LangChain4J creates a proxy implementation
The proxy handles input/output conversion
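
The proxy step can be illustrated with the JDK's own java.lang.reflect.Proxy. This is a minimal sketch of the general mechanism, not LangChain4J's actual implementation; ProxyDemo, Assistant, and the echo function standing in for the model are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Function;

/** Illustrative only: shows the JDK proxy mechanism, not LangChain4J internals. */
final class ProxyDemo {

    private ProxyDemo() {}

    /** Same shape as the MedicalAssistant interface. */
    interface Assistant {
        String chat(String userMessage);
    }

    /** Builds an implementation of the interface at runtime; each call is routed to the model function. */
    static <T> T create(Class<T> type, Function<String, String> model) {
        InvocationHandler handler = (proxy, method, args) -> {
            // toString, hashCode and equals are also dispatched to the handler.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "toString": return type.getSimpleName() + " proxy";
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return proxy == args[0]; // equals
                }
            }
            // Input conversion: turn the first argument into a prompt, then call the "model".
            String prompt = String.valueOf(args[0]);
            return model.apply(prompt);
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    public static void main(String[] args) {
        // A stand-in for the LLM that simply echoes the prompt.
        Assistant assistant = create(Assistant.class, prompt -> "echo: " + prompt);
        System.out.println(assistant.chat("hello"));
    }
}
```

The caller only ever sees the Assistant interface, which is why the same interface can be tested against a stub function and run against a real model.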

4.2 System Messages

Define the AI’s role using @SystemMessage:

public interface MedicalAssistant {

    @SystemMessage("""
        You are "MediBot", an AI medical assistant for Peking Union Medical College Hospital.

        Your responsibilities:
        1. Provide general medical information and guidance
        2. Help patients understand symptoms and treatment options
        3. Assist with appointment scheduling when asked
        4. Answer questions about hospital departments and doctors

        Rules:
        - Always be polite and professional
        - Include appropriate medical disclaimers
        - Never provide definitive diagnoses
        - Recommend seeing a doctor for serious concerns
        """)
    String chat(String userMessage);
}

The system message sets the AI’s behavior context. LangChain4J includes it in each request to the model, so the instructions apply to every turn.

5. Adding Chat Memory

5.1 Why Memory Matters

Without memory, each message is independent:

User: My name is John
AI: Nice to meet you, John!

User: What's my name?
AI: I don't know your name.  ← Problem!

5.2 Implementing Memory

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel",
    chatMemory = "chatMemory"  // Reference to ChatMemory bean
)
public interface MedicalAssistant {

    @SystemMessage("You are a helpful medical assistant.")
    String chat(String userMessage);
}

Configuration:

@Configuration
public class AssistantConfig {

    @Bean
    ChatMemory chatMemory() {
        // Keep last 10 messages
        return MessageWindowChatMemory.withMaxMessages(10);
    }
}
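
To make the window behavior concrete, here is a simplified sketch of a message window that evicts the oldest entries. It is not the library's MessageWindowChatMemory, which applies additional rules; WindowMemory is a hypothetical class and messages are plain strings here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified sketch of a sliding message window (hypothetical, not a LangChain4J class). */
final class WindowMemory {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Appends a message and evicts the oldest ones once the window is full. */
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** Returns a copy of the retained messages, oldest first. */
    List<String> messages() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        WindowMemory memory = new WindowMemory(2);
        memory.add("User: My name is John");
        memory.add("AI: Nice to meet you, John!");
        memory.add("User: What's my name?");
        // Only the two most recent messages remain; the first one was evicted.
        System.out.println(memory.messages());
    }
}
```

The example also shows the trade-off: a window that is too small forgets early facts such as the user's name, while a large one costs more tokens per request.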

5.3 Memory Isolation

For multi-user scenarios, isolate conversations by memoryId:

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel",
    chatMemoryProvider = "chatMemoryProvider"
)
public interface MedicalAssistant {

    @SystemMessage("You are a helpful medical assistant.")
    String chat(
        @MemoryId Long conversationId,  // Unique per conversation
        @UserMessage String message
    );
}

Configuration:

@Bean
ChatMemoryProvider chatMemoryProvider() {
    return memoryId -> MessageWindowChatMemory.builder()
            .id(memoryId)
            .maxMessages(10)
            .build();
}
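
Conceptually, a provider is a factory keyed by memoryId. The sketch below shows that idea with plain collections, assuming string messages; MemoryRegistry is a hypothetical name and not part of LangChain4J.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one isolated message list per memoryId. */
final class MemoryRegistry {

    private final Map<Object, List<String>> memories = new ConcurrentHashMap<>();

    /** Returns the memory for this id, creating an empty one on first use. */
    List<String> memoryFor(Object memoryId) {
        return memories.computeIfAbsent(memoryId,
                id -> Collections.synchronizedList(new ArrayList<String>()));
    }

    public static void main(String[] args) {
        MemoryRegistry registry = new MemoryRegistry();
        registry.memoryFor(1L).add("User: My name is John");
        // Conversation 2 never sees messages from conversation 1.
        System.out.println(registry.memoryFor(1L).size());
        System.out.println(registry.memoryFor(2L).size());
    }
}
```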

5.4 Persistent Storage with MongoDB

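Store conversations in MongoDB. ChatMessages is an application-defined document class (not part of LangChain4J) with memoryId and content fields:

@Component
public class MongoChatMemoryStore implements ChatMemoryStore {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        // Look up the stored conversation for this memoryId
        ChatMessages stored = mongoTemplate.findOne(
            Query.query(Criteria.where("memoryId").is(memoryId)),
            ChatMessages.class
        );
        return stored == null
            ? new ArrayList<>()
            : ChatMessageDeserializer.messagesFromJson(stored.getContent());
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        // Upsert: create the document on first write, overwrite it afterwards
        Query query = Query.query(Criteria.where("memoryId").is(memoryId));
        Update update = new Update().set("content",
            ChatMessageSerializer.messagesToJson(messages));
        mongoTemplate.upsert(query, update, ChatMessages.class);
    }

    @Override
    public void deleteMessages(Object memoryId) {
        // Required by ChatMemoryStore: remove the stored conversation
        mongoTemplate.remove(
            Query.query(Criteria.where("memoryId").is(memoryId)),
            ChatMessages.class
        );
    }
}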

Update configuration:

@Bean
ChatMemoryProvider chatMemoryProvider(MongoChatMemoryStore store) {
    return memoryId -> MessageWindowChatMemory.builder()
            .id(memoryId)
            .maxMessages(20)
            .chatMemoryStore(store)  // Persistent storage
            .build();
}

6. Function Calling

6.1 What is Function Calling?

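Function calling lets the LLM invoke your Java methods. The LLM does not run code itself: it returns a structured tool request, and LangChain4J executes the matching method and sends the result back. The LLM:

Analyzes the user’s request
Decides if a tool is needed
Extracts parameters
Requests a call to your method (LangChain4J executes it)
Uses the result in its response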

User: Book an appointment with Dr. Wang tomorrow at 2pm

LLM → Detects "book appointment" intent
    → Calls bookAppointment(doctor="Dr. Wang", date="2025-06-12", time="14:00")
    ← Returns "Appointment confirmed"
    → Responds: "Your appointment with Dr. Wang is confirmed for tomorrow at 2pm."
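
The round trip above can be sketched without any LLM: the model's tool request is reduced to a tool name plus an argument map, and the application looks up and runs the matching function. ToolDispatcher and its methods are hypothetical names for illustration, not LangChain4J APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the application side of function calling. */
final class ToolDispatcher {

    private final Map<String, Function<Map<String, String>, String>> tools = new HashMap<>();

    /** Registers a tool under the name the model will use to request it. */
    void register(String name, Function<Map<String, String>, String> tool) {
        tools.put(name, tool);
    }

    /** Runs the requested tool and returns the text that is sent back to the model. */
    String execute(String toolName, Map<String, String> arguments) {
        Function<Map<String, String>, String> tool = tools.get(toolName);
        if (tool == null) {
            return "Unknown tool: " + toolName;
        }
        return tool.apply(arguments);
    }

    public static void main(String[] args) {
        ToolDispatcher dispatcher = new ToolDispatcher();
        dispatcher.register("bookAppointment", a ->
                "Appointment confirmed for " + a.get("doctor")
                        + " on " + a.get("date") + " at " + a.get("time"));

        // What the model's request boils down to: a name and extracted parameters.
        String result = dispatcher.execute("bookAppointment",
                Map.of("doctor", "Dr. Wang", "date", "2025-06-12", "time", "14:00"));
        System.out.println(result);
    }
}
```

Returning a message for an unknown tool, rather than throwing, gives the model a chance to recover in its next turn; that is a design choice of this sketch.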

6.2 Creating Tools

Annotate methods with @Tool:

@Component
public class CalculatorTools {

    @Tool(name = "sum", value = "Add two numbers")
    public double sum(
            @P(value = "First number") double a,
            @P(value = "Second number") double b) {
        return a + b;
    }

    @Tool(name = "squareRoot", value = "Calculate square root")
    public double squareRoot(@P(value = "Number") double x) {
        return Math.sqrt(x);
    }
}

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel",
    chatMemoryProvider = "chatMemoryProvider",
    tools = "calculatorTools"  // Tool bean name
)
public interface MedicalAssistant {
    String chat(@MemoryId Long id, @UserMessage String message);
}
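
How annotated methods can be discovered at runtime is sketched below with plain reflection. DemoTool is a hypothetical stand-in for @Tool, and ToolScanner is not how LangChain4J is implemented; the sketch only shows why an annotation on a method is enough to expose it by name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: find annotated methods and invoke them by name. */
final class ToolScanner {

    private ToolScanner() {}

    /** Stand-in for LangChain4J's @Tool annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DemoTool {
        String value();
    }

    /** Example tool class mirroring CalculatorTools. */
    static final class Calculator {
        @DemoTool("Add two numbers")
        public double sum(double a, double b) {
            return a + b;
        }

        @DemoTool("Calculate square root")
        public double squareRoot(double x) {
            return Math.sqrt(x);
        }

        // Not annotated, so it is never exposed as a tool.
        public double notATool(double x) {
            return x;
        }
    }

    /** Collects the annotated methods of a class, keyed by method name. */
    static Map<String, Method> scan(Class<?> type) {
        Map<String, Method> tools = new HashMap<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isAnnotationPresent(DemoTool.class)) {
                tools.put(method.getName(), method);
            }
        }
        return tools;
    }

    /** Invokes the named tool on the target object. */
    static Object call(Object target, String toolName, Object... args) {
        Method method = scan(target.getClass()).get(toolName);
        if (method == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        try {
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Tool failed: " + toolName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Calculator(), "sum", 2.0, 3.0));
    }
}
```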

6.3 Appointment Booking Tool

Real-world example for medical appointments:

@Component
public class AppointmentTools {

    @Autowired
    private AppointmentService appointmentService;

    @Tool(name = "checkAvailability",
          value = "Check if a doctor has available slots")
    public boolean checkAvailability(
            @P(value = "Department name") String department,
            @P(value = "Date in YYYY-MM-DD format") String date,
            @P(value = "Time slot: morning or afternoon") String time,
            @P(value = "Doctor name (optional)", required = false) String doctorName) {

        return appointmentService.hasAvailability(department, date, time, doctorName);
    }

    @Tool(name = "bookAppointment",
          value = "Book a medical appointment. Confirm details with user first.")
    public String bookAppointment(
            @P(value = "Patient name") String patientName,
            @P(value = "Patient ID card number") String idCard,
            @P(value = "Department") String department,
            @P(value = "Date in YYYY-MM-DD") String date,
            @P(value = "Time: morning or afternoon") String time,
            @P(value = "Doctor name (optional)", required = false) String doctorName) {

        Appointment appointment = new Appointment();
        appointment.setPatientName(patientName);
        appointment.setIdCard(idCard);
        appointment.setDepartment(department);
        appointment.setDate(date);
        appointment.setTime(time);
        appointment.setDoctorName(doctorName);

        boolean success = appointmentService.save(appointment);
        return success ? "Appointment booked successfully" : "Failed to book appointment";
    }

    @Tool(name = "cancelAppointment",
          value = "Cancel an existing appointment")
    public String cancelAppointment(
            @P(value = "Patient name") String patientName,
            @P(value = "ID card number") String idCard,
            @P(value = "Department") String department,
            @P(value = "Date in YYYY-MM-DD") String date) {

        boolean success = appointmentService.cancel(patientName, idCard, department, date);
        return success ? "Appointment cancelled successfully" : "No matching appointment found";
    }
}

7. RAG - Retrieval-Augmented Generation

7.1 Why RAG?

LLMs have a knowledge cutoff date and were never trained on your private data. They don’t know:

Your hospital’s specific departments
Doctor schedules and specialties
Latest hospital policies
Internal procedures

RAG retrieves relevant documents and provides them as context:

Approach	Pros	Cons
Fine-tuning	Fast inference, high accuracy	Expensive, slow to update, requires expertise
RAG	Easy to update, cost-effective, no training	Requires two queries (retrieval + generation)
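
"Providing documents as context" amounts to placing the retrieved text in the prompt ahead of the question. The sketch below assumes a simple template of my own; the template LangChain4J actually uses may differ, and RagPrompt is a hypothetical name.

```java
import java.util.List;

/** Hypothetical sketch of prompt augmentation for RAG. */
final class RagPrompt {

    private RagPrompt() {}

    /** Builds a prompt that places the retrieved snippets before the user's question. */
    static String augment(String question, List<String> retrieved) {
        StringBuilder prompt = new StringBuilder("Answer using the following information:\n");
        for (String snippet : retrieved) {
            prompt.append("- ").append(snippet).append('\n');
        }
        prompt.append("\nQuestion: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.println(augment(
                "Where is the cardiology department?",
                List.of("Cardiology is on the 3rd floor of Building A.")));
    }
}
```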

7.2 Vector Search Explained

Vectors are numerical representations of text. Similar texts have similar vectors.

"Cardiology department"  → [0.1, 0.3, -0.2, ...] (1024 dimensions)
"Heart specialist"       → [0.12, 0.28, -0.19, ...] (similar vector)
"Dentistry"              → [-0.5, 0.1, 0.8, ...]   (different vector)

Similarity Calculation:

Cosine Similarity: Measures angle between vectors
Euclidean Distance: Measures straight-line distance

Higher similarity = more relevant content.
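
Both measures are a few lines of arithmetic. The sketch below uses the three-dimensional prefixes of the example vectors above as stand-ins for full embeddings; VectorMath is a hypothetical helper, not a LangChain4J class.

```java
/** Hypothetical helper showing the two similarity measures. */
final class VectorMath {

    private VectorMath() {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean distance: straight-line distance between the two points. */
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] cardiology = {0.1, 0.3, -0.2};
        double[] heart = {0.12, 0.28, -0.19};
        double[] dentistry = {-0.5, 0.1, 0.8};
        System.out.println(cosine(cardiology, heart));     // close to 1
        System.out.println(cosine(cardiology, dentistry)); // negative
    }
}
```

Note the direction of each measure: for cosine similarity a higher value means closer, while for Euclidean distance a smaller value means closer.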

7.3 Document Processing Pipeline

Document → Parse → Split    → Embed     → Store
(PDF/MD)           (Chunks)   (Vectors)   (Vector DB)

Why split documents?

LLM context windows are limited
Smaller chunks = more precise retrieval
Reduces noise from irrelevant content
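
A minimal sketch of splitting, assuming fixed-size character chunks with a small overlap so that a sentence cut at a boundary still appears whole in one chunk. Real splitters usually break on paragraphs or sentences instead; Chunker is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: fixed-size character chunks with overlap. */
final class Chunker {

    private Chunker() {}

    /** Splits text into chunks of at most chunkSize characters; consecutive chunks share overlap characters. */
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefghij", 4, 1));
    }
}
```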

7.4 Implementing RAG

Step 1: Load and Process Documents

// Load hospital information documents
Document hospitalDoc = FileSystemDocumentLoader.loadDocument(
    "knowledge/hospital-info.md");
Document deptDoc = FileSystemDocumentLoader.loadDocument(
    "knowledge/departments.md");

Step 2: Create Vector Store

@Bean
ContentRetriever contentRetriever() {
    // Load documents (the easy-rag module supplies a default splitter and embedding model)
    List<Document> documents = Arrays.asList(
        FileSystemDocumentLoader.loadDocument("knowledge/hospital-info.md"),
        FileSystemDocumentLoader.loadDocument("knowledge/departments.md"),
        FileSystemDocumentLoader.loadDocument("knowledge/doctors.md")
    );

    // In-memory store (use Pinecone/Milvus for production)
    InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

    // Process and store
    EmbeddingStoreIngestor.ingest(documents, store);

    // Create retriever
    return EmbeddingStoreContentRetriever.from(store);
}

Step 3: Configure AIService with RAG

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel",
    chatMemoryProvider = "chatMemoryProvider",
    tools = "appointmentTools",
    contentRetriever = "contentRetriever"  // RAG component
)
public interface MedicalAssistant {

    @SystemMessage(fromResource = "medical-assistant-prompt.txt")
    String chat(@MemoryId Long id, @UserMessage String message);
}

Production Vector Database - Pinecone:

@Bean
EmbeddingStore<TextSegment> embeddingStore(EmbeddingModel model) {
    return PineconeEmbeddingStore.builder()
            .apiKey(System.getenv("PINECONE_API_KEY"))
            .index("medical-kb")
            .nameSpace("hospital-info")
            .createIndex(PineconeServerlessIndexConfig.builder()
                    .cloud("AWS")
                    .region("us-east-1")
                    .dimension(model.dimension())  // 1024 for text-embedding-v3
                    .build())
            .build();
}

8. Putting It All Together

8.1 Complete Agent Configuration

MedicalAssistant.java:

package com.example.assistant;

import dev.langchain4j.service.*;
import dev.langchain4j.service.spring.AiService;
import static dev.langchain4j.service.spring.AiServiceWiringMode.EXPLICIT;

@AiService(
    wiringMode = EXPLICIT,
    chatModel = "openAiChatModel",
    chatMemoryProvider = "chatMemoryProvider",
    tools = "appointmentTools",
    contentRetriever = "contentRetriever"
)
public interface MedicalAssistant {

    @SystemMessage(fromResource = "medical-assistant-prompt.txt")
    String chat(
        @MemoryId Long conversationId,
        @UserMessage String userMessage
    );
}

medical-assistant-prompt.txt:

You are "MediBot", an AI assistant for Peking Union Medical College Hospital.

Your capabilities:
1. Medical consultation - provide general health information
2. Department guidance - help patients find the right department
3. Doctor information - answer questions about our doctors
4. Appointment management - book and cancel appointments

Rules:
- Always verify patient identity (name + ID card) before appointments
- Confirm appointment details before booking
- Use the knowledge base for hospital-specific information
- Be professional yet friendly
- Add appropriate emoji to make responses warm

Today is {{current_date}}.

8.2 Controller Layer

@RestController
@RequestMapping("/api/medical")
@Tag(name = "Medical AI Assistant")
public class MedicalController {

    @Autowired
    private MedicalAssistant assistant;

    @PostMapping("/chat")
    @Operation(summary = "Chat with medical assistant")
    public String chat(@RequestBody ChatRequest request) {
        return assistant.chat(request.getConversationId(), request.getMessage());
    }
}

@Data
public class ChatRequest {
    private Long conversationId;  // Unique per user session
    private String message;
}

9. Best Practices

System Design

Practice	Why It Matters
Use `@MemoryId` for multi-user	Prevents conversation bleeding between users
Persistent chat memory	Don’t lose context on server restart
Tool descriptions	Clear descriptions help LLM choose correctly
Document chunking	Smaller chunks improve retrieval precision
Min score threshold	Filters low-relevance results

Security

# Never commit API keys
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}

Performance

// Limit chat memory to control token usage
MessageWindowChatMemory.withMaxMessages(10)

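// Set minScore to filter irrelevant documents
EmbeddingStoreContentRetriever.builder()
    .embeddingStore(embeddingStore)   // required: the store to search
    .embeddingModel(embeddingModel)   // embeds the user's query
    .minScore(0.8)
    .maxResults(3)
    .build()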
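
What minScore and maxResults do to a result list can be sketched in plain Java. The scores below are made up for illustration, and ScoreFilter is a hypothetical class, not the retriever's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of score-based filtering of retrieval results. */
final class ScoreFilter {

    private ScoreFilter() {}

    /** Keeps the best-scoring entries that reach minScore, at most maxResults of them. */
    static List<String> select(Map<String, Double> scored, double minScore, int maxResults) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scored.entrySet());
        // Highest score first.
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            if (result.size() == maxResults) {
                break;
            }
            if (entry.getValue() >= minScore) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up similarity scores for four chunks.
        Map<String, Double> scored = Map.of(
                "cardiology", 0.92,
                "heart clinic", 0.85,
                "dentistry", 0.81,
                "parking", 0.41);
        System.out.println(select(scored, 0.8, 2));
    }
}
```

A threshold that is too strict can return nothing at all, in which case the model answers without any retrieved context, so it is worth tuning minScore against real queries.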

Common Pitfalls

Pitfall	Solution
LLM doesn’t use tools	Improve tool descriptions
Wrong tool parameters	Use `@P` annotations with clear descriptions
Out-of-scope responses	Refine system message
Slow responses	Use streaming output for real-time feedback

10. FAQ

Q: Can I switch from OpenAI to DeepSeek without code changes?

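A: Yes, as long as the provider exposes an OpenAI-compatible API, as DeepSeek does. Change the base URL, API key, and model name in the configuration:

# From OpenAI
langchain4j.open-ai.chat-model.base-url=https://api.openai.com/v1
langchain4j.open-ai.chat-model.api-key=${OPENAI_KEY}
langchain4j.open-ai.chat-model.model-name=gpt-4o-mini

# To DeepSeek
langchain4j.open-ai.chat-model.base-url=https://api.deepseek.com
langchain4j.open-ai.chat-model.api-key=${DEEPSEEK_KEY}
langchain4j.open-ai.chat-model.model-name=deepseek-chat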

Q: How do I handle sensitive medical data?

Use local LLMs via Ollama for sensitive data
Implement data anonymization before sending to external APIs
Store chat history encrypted
Follow HIPAA/GDPR compliance requirements
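
As one small example of anonymization, the sketch below masks ID card numbers before a message leaves the application. The pattern assumes an 18-character ID (17 digits plus a digit or X), which is an assumption about the ID format; a real deployment would need a much broader set of rules. Redactor is a hypothetical class.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: mask ID card numbers before text is sent to an external API. */
final class Redactor {

    // Assumed format: 17 digits followed by a digit or X, as a standalone token.
    private static final Pattern ID_CARD = Pattern.compile("\\b\\d{17}[\\dXx]\\b");

    private Redactor() {}

    /** Replaces every ID-like token with a fixed placeholder. */
    static String redact(String text) {
        return ID_CARD.matcher(text).replaceAll("[REDACTED_ID]");
    }

    public static void main(String[] args) {
        System.out.println(redact("My ID is 110101199003071234, please book cardiology."));
    }
}
```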

Q: What’s the difference between ChatMemory and ChatMemoryProvider?

ChatMemory: Single shared memory instance
ChatMemoryProvider: Factory that creates isolated memory per memoryId

Q: How do I update the knowledge base?

A: Simply add new documents and re-ingest:

// New document
Document newDoc = FileSystemDocumentLoader.loadDocument("new-policy.md");

// Add to existing store
EmbeddingStoreIngestor.ingest(newDoc, embeddingStore);

Q: Can the LLM call multiple tools in one conversation?

A: Yes. The LLM can chain tool calls:

User: "Book Dr. Wang and check if Dr. Li is available next week"

LLM → Calls bookAppointment(...)
    → Calls checkAvailability(...)
    → Responds with both results

11. Summary

We built a medical AI assistant using LangChain4J with:

Spring Boot Integration - Auto-configuration and dependency injection
LLM Integration - Unified API for OpenAI, DeepSeek, and others
Chat Memory - Persistent conversation context
Function Calling - Java methods callable by the LLM
RAG - Knowledge retrieval from documents

Key Takeaways:

LangChain4J abstracts LLM complexity behind simple interfaces
AIService pattern keeps code clean and testable
Chat memory enables natural multi-turn conversations
Function calling bridges LLM reasoning with business logic
RAG grounds the LLM in your specific domain knowledge

Next Steps:

Add streaming responses for better UX
Implement multi-modal support (images)
Add evaluation framework for response quality
Deploy with Docker and Kubernetes

Project Link:

SmartMed-LangChain4j-RAG

Demo:

AI with Alex's Blog