Creating Knowledge Bases with RAG

Intermediate · 35 minutes · Knowledge Management

Build intelligent chatbots that can answer questions from your documents using Retrieval-Augmented Generation (RAG).

1 Understanding RAG

5 min

Retrieval-Augmented Generation (RAG) is a powerful technique that allows AI to access external knowledge sources to provide accurate, up-to-date information.

How RAG Works

  1. Document Processing: Your documents are split into chunks and converted to embeddings
  2. Query Processing: User questions are also converted to embeddings
  3. Similarity Search: The system finds the most relevant document chunks
  4. Response Generation: The AI uses retrieved information to generate accurate answers
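The four steps above can be sketched in miniature. The following is a toy illustration, not a real implementation: word-overlap sets stand in for neural embeddings, and the chunks and query are invented examples.

```python
# Minimal RAG pipeline sketch. Toy "embeddings" (bag-of-words sets)
# stand in for a real embedding model, and the prompt would normally
# be sent to an LLM rather than printed.

def embed(text):
    # Toy embedding: the set of lowercase words in the text.
    return set(text.lower().split())

def similarity(a, b):
    # Jaccard overlap stands in for cosine similarity here.
    return len(a & b) / len(a | b) if a | b else 0.0

# 1. Document processing: split into chunks and "embed" each one.
chunks = [
    "Product X supports Windows and macOS.",
    "Refunds are processed within 14 days.",
]
index = [(c, embed(c)) for c in chunks]

# 2-3. Query processing and similarity search.
query = "Which operating systems does Product X support?"
q_vec = embed(query)
best_chunk = max(index, key=lambda item: similarity(q_vec, item[1]))[0]

# 4. Response generation: the retrieved chunk grounds the answer.
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```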

Benefits of RAG

  • Accuracy: Reduces AI hallucinations by grounding responses in real data
  • Current Information: Access to your latest documents and data
  • Source Attribution: Can cite specific sources for transparency
  • Cost-Effective: Cheaper and easier to keep current than fine-tuning a model

Use Cases

  • Customer support with product documentation
  • Internal knowledge sharing
  • Educational content delivery
  • Legal document analysis
  • Technical documentation assistance
Tips:
  • RAG is perfect when you need AI to answer questions about specific content
  • Quality of your source documents directly impacts answer quality
  • RAG works best with well-structured, factual content

2 Creating Your First Knowledge Base

6 min

Let's create a knowledge base from your documents that your AI can search and reference.

Step-by-Step Knowledge Base Creation

  1. Navigate to Knowledge in the main menu
  2. Click "Create Knowledge"
  3. Give your knowledge base a descriptive name
  4. Add a brief description of its contents
  5. Click "Create"

Supported Data Sources

Dify supports multiple data sources:

  • Local Files: Upload PDFs, Word docs, text files, CSV, etc.
  • Notion Pages: Sync directly from your Notion workspace
  • Web Pages: Scrape content from websites via the Jina Reader or Firecrawl APIs
  • Plain Text: Copy and paste content directly

File Requirements and Limits

  • Supported formats: PDF, DOCX, TXT, MD, CSV, XLSX
  • File size limit: Usually 15MB per file (varies by plan)
  • Total size: Depends on your subscription tier
  • Language support: Multi-language documents supported
Tips:
  • Start with a small set of high-quality documents
  • Use descriptive names for easy management
  • Organize related content in the same knowledge base

3 Uploading and Processing Documents

8 min

Now let's add documents to your knowledge base and configure how they're processed.

Document Upload Process

  1. Click "Add Document" in your knowledge base
  2. Choose your upload method (File, Notion, Web scraping, or Text)
  3. Select or upload your documents
  4. Review the document preview
  5. Configure processing settings

Chunking Configuration

Documents are split into smaller chunks for better retrieval:

  • Automatic Chunking: Dify automatically splits by paragraphs
  • Custom Rules: Set your own chunk size and overlap
  • Chunk Size: 500-1000 characters is usually optimal
  • Overlap: 50-100 characters to maintain context
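A custom chunking rule like the one described can be sketched as a simple character-window splitter. This is an illustrative assumption about how such a splitter might work, not Dify's internal implementation; the sizes follow the 500-1000 character and 50-100 character guidance above.

```python
# Character-based chunking with overlap: each chunk starts `overlap`
# characters before the previous one ended, so context spanning a
# chunk boundary is not lost.

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks of at most chunk_size chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, minus the overlap
    return chunks

doc = "A" * 1200
pieces = chunk_text(doc, chunk_size=500, overlap=50)
print(len(pieces), [len(p) for p in pieces])
```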

Text Preprocessing Options

  • Remove extra spaces: Clean up formatting
  • Remove URLs: Filter out web links
  • Remove email addresses: Protect privacy
  • Custom preprocessing: Advanced filtering rules
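The first three preprocessing options can be approximated with regular expressions. The patterns below are illustrative sketches, not the exact rules Dify applies:

```python
import re

# Sketch of the preprocessing options above. Production rules are
# usually stricter; these patterns are illustrative only.

def preprocess(text, remove_urls=True, remove_emails=True):
    if remove_urls:
        text = re.sub(r"https?://\S+", "", text)
    if remove_emails:
        text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

raw = "Contact  support@example.com   or see https://example.com/docs for help."
print(preprocess(raw))
```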

Embedding Model Selection

Choose the right embedding model for your content:

  • OpenAI text-embedding-3-small: Fast and cost-effective
  • OpenAI text-embedding-3-large: Higher accuracy
  • Cohere embed-english: Good for English content
  • Cohere embed-multilingual: For multiple languages
Example:

# Example document structure for optimal RAG performance:

## Product FAQ Document

### What is Product X?
Product X is a comprehensive solution for...

### How do I install Product X?
1. Download the installer from our website
2. Run the installer as administrator
3. Follow the setup wizard

### Troubleshooting Common Issues
**Issue:** Application won't start
**Solution:** Check system requirements and try running as administrator
Tips:
  • Well-structured documents with clear headings work best
  • Keep chunk sizes moderate - too small loses context, too large reduces precision
  • Choose embedding models based on your primary language

4 Configuring Retrieval Settings

5 min

Fine-tune how your knowledge base searches for and retrieves relevant information.

Retrieval Methods

  • Vector Retrieval: Finds semantically similar content using embeddings
  • Full-Text Search: Traditional keyword-based search
  • Hybrid Retrieval (Recommended): Combines both methods for best results

Hybrid Retrieval Configuration

Adjust the balance between semantic and keyword search:

  • Semantic Weight (70%): Finds conceptually related content
  • Keyword Weight (30%): Finds exact term matches
  • Custom Weights: Adjust based on your content type
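The weighting can be pictured as a simple linear blend of the two scores. The chunks and scores below are invented, and both scores are assumed to already be normalized to the same [0, 1] range:

```python
# Hybrid retrieval as a weighted blend of a semantic score and a
# keyword score, using the 70/30 split suggested above.

def hybrid_score(semantic, keyword, semantic_weight=0.7):
    return semantic_weight * semantic + (1 - semantic_weight) * keyword

# (chunk, semantic score, keyword score) -- illustrative values only.
candidates = [
    ("Password reset instructions", 0.82, 0.40),
    ("Password policy overview",    0.55, 0.90),
    ("Billing FAQ",                 0.20, 0.10),
]
ranked = sorted(candidates,
                key=lambda c: hybrid_score(c[1], c[2]),
                reverse=True)
print(ranked[0][0])
```

Raising the semantic weight favors conceptually related chunks; raising the keyword weight favors exact term matches such as product names or error codes.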

Reranking Models

Improve retrieval accuracy with reranking:

  • Cohere Rerank: Reorders results for better relevance
  • BGE Reranker: Open-source alternative
  • No Reranking: Faster but potentially less accurate

Retrieval Parameters

  • Top K: Number of chunks to retrieve (3-5 recommended)
  • Score Threshold: Minimum similarity score for inclusion
  • Max Tokens: Total token limit for retrieved content
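Top K and the score threshold interact as a filter-then-truncate step, which can be sketched like this (the chunks and scores are made up):

```python
# Apply the retrieval parameters above: drop chunks below the score
# threshold, then keep at most the top_k highest-scoring survivors.

def select_chunks(scored_chunks, top_k=3, score_threshold=0.5):
    """scored_chunks: list of (chunk_text, score) pairs."""
    passing = [c for c in scored_chunks if c[1] >= score_threshold]
    passing.sort(key=lambda c: c[1], reverse=True)
    return passing[:top_k]

results = [("A", 0.91), ("B", 0.48), ("C", 0.73), ("D", 0.66), ("E", 0.52)]
print(select_chunks(results))
```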
Tips:
  • Hybrid retrieval works best for most use cases
  • Start with 70% semantic, 30% keyword weighting
  • Use reranking for better accuracy when response quality matters most

5 Testing Your Knowledge Base

4 min

Before integrating your knowledge base into an application, test its retrieval accuracy.

Using the Recall Test

  1. Go to your knowledge base settings
  2. Click on the "Recall Test" tab
  3. Enter test queries related to your content
  4. Review the retrieved chunks and their relevance scores
  5. Adjust settings if needed

Effective Test Queries

  • Direct questions: "How do I reset my password?"
  • Conceptual queries: "Security best practices"
  • Specific terms: "API rate limits"
  • Variations: Test different ways of asking the same thing

Evaluating Results

Look for:

  • Relevance: Do retrieved chunks actually answer the question?
  • Completeness: Is all necessary information retrieved?
  • Ranking: Are the most relevant chunks ranked highest?
  • Coverage: Can the system find information across all your documents?
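A recall test can also be scripted outside the UI. This sketch assumes a hypothetical `retrieve` function returning (chunk id, score) pairs; the queries and results here are invented:

```python
# Tiny recall check in the spirit of the criteria above: for each test
# query, verify that the expected chunk appears among the top-k results.

def recall_at_k(test_cases, retrieve, k=3):
    """test_cases: list of (query, expected_chunk_id) pairs."""
    hits = 0
    for query, expected in test_cases:
        retrieved_ids = [chunk_id for chunk_id, _ in retrieve(query)[:k]]
        if expected in retrieved_ids:
            hits += 1
    return hits / len(test_cases)

# Hypothetical retrieval results keyed by query: (chunk id, score) pairs.
fake_results = {
    "How do I reset my password?": [("faq-12", 0.88), ("faq-03", 0.41)],
    "API rate limits":             [("doc-07", 0.35)],
}
cases = [("How do I reset my password?", "faq-12"),
         ("API rate limits", "doc-99")]
score = recall_at_k(cases, lambda q: fake_results[q])
print(score)  # 1 of 2 queries found its expected chunk
```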

Common Issues and Solutions

  • Poor retrieval: Adjust chunk size or embedding model
  • Irrelevant results: Increase score threshold
  • Missing information: Check document quality and chunking
  • Inconsistent results: Consider using reranking
Tips:
  • Test with questions your actual users would ask
  • Document any query patterns that don't work well
  • Iterate on your settings based on test results

6 Building a RAG-Powered Chatbot

5 min

Now let's create a chatbot that uses your knowledge base to answer questions accurately.

Creating the Chatflow

  1. Create a new Chatflow application
  2. Keep the default Start → LLM → Answer flow
  3. Click on the LLM node to configure it

Adding Your Knowledge Base

  1. In the LLM node settings, find the "Context" section
  2. Click "Add Knowledge"
  3. Select your knowledge base
  4. Configure retrieval settings if needed

Crafting a RAG-Optimized Prompt

You are a helpful assistant that answers questions based on the provided context.

Instructions:
1. Use the context information below to answer the user's question
2. If the context doesn't contain relevant information, say "I don't have information about that in my knowledge base"
3. Always cite specific parts of the context when possible
4. Be accurate and don't make up information not in the context

Context: {{#knowledge}}

User Question: {{sys.query}}

Please provide a helpful and accurate response based on the context above.
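Outside of Dify (which substitutes the context and question variables for you), filling such a template looks like the sketch below. The chunks and question are invented:

```python
# How a RAG prompt template gets filled at runtime: retrieved chunks
# are joined into the context slot, and the user question into its slot.

TEMPLATE = """You are a helpful assistant that answers questions based on the provided context.

Context: {context}

User Question: {question}

Please provide a helpful and accurate response based on the context above."""

retrieved_chunks = [
    "To reset your password, open Settings > Security and click Reset.",
    "Password resets require email verification.",
]
prompt = TEMPLATE.format(context="\n\n".join(retrieved_chunks),
                         question="How do I reset my password?")
print(prompt)
```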

Advanced RAG Techniques

  • Question Classification: Route different types of questions appropriately
  • Multiple Knowledge Bases: Use different sources for different topics
  • Fallback Strategies: Handle cases when no relevant information is found
Tips:
  • Always instruct the AI to stay within the provided context
  • Enable citation features to show sources
  • Test with questions both inside and outside your knowledge base

7 Advanced RAG Workflows

6 min

Create more sophisticated RAG applications with conditional logic and multiple knowledge sources.

Question Classification Workflow

Route different types of questions to appropriate knowledge bases:

  1. Add a Question Classifier node after Start
  2. Define categories (e.g., "Product Info", "Technical Support", "Billing")
  3. Connect different paths to different knowledge bases
  4. Use conditional logic to route appropriately
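To make the routing concrete, here is a keyword-based stand-in for the Question Classifier node (which in Dify uses an LLM to categorize); the categories and keywords are invented examples:

```python
# Route a question to a category, which in turn selects a knowledge
# base. Keyword matching stands in for the LLM-based classifier.

ROUTES = {
    "Product Info":      ["feature", "product", "pricing"],
    "Technical Support": ["error", "install", "crash"],
    "Billing":           ["invoice", "refund", "charge"],
}

def classify(question, default="Product Info"):
    q = question.lower()
    for category, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return category
    return default  # fall back when no category matches

print(classify("I get an error when I install the app"))
```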

Multi-Step RAG Process

  1. Initial Retrieval: Find relevant chunks
  2. Relevance Check: Evaluate if information is sufficient
  3. Follow-up Retrieval: Search additional sources if needed
  4. Response Generation: Synthesize all found information

Handling Edge Cases

  • No Matches Found: Provide helpful guidance on how to rephrase
  • Low Confidence: Ask clarifying questions
  • Multiple Valid Answers: Present options clearly
  • Outdated Information: Include disclaimers about data freshness
Example:

# Example multi-step RAG prompt

You are analyzing a user question in two steps:

Step 1: Evaluate if the retrieved context contains sufficient information
Context: {{#knowledge}}
Question: {{sys.query}}

If context is sufficient, respond with: SUFFICIENT
If context is insufficient, respond with: INSUFFICIENT - [reason]

Step 2 (only if sufficient): Provide a complete answer based on the context.
Tips:
  • Question classification improves accuracy for diverse knowledge bases
  • Always have fallback options when retrieval fails
  • Consider the user experience when no good answers are found

8 Monitoring and Optimization

6 min

Continuously improve your RAG system by monitoring performance and optimizing based on usage patterns.

Key Metrics to Track

  • Retrieval Accuracy: Percentage of queries with relevant results
  • Response Quality: User satisfaction with answers
  • Coverage: Percentage of questions that can be answered
  • Response Time: Average time to generate answers
  • Cost: Token usage for embeddings and generation
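These metrics can be computed from a query log. The log records and field names below are hypothetical:

```python
# Compute the metrics above from a hypothetical query log. Each record
# notes whether retrieval found relevant chunks, whether the question
# was answered, the latency, and the token cost.

log = [
    {"relevant": True,  "answered": True,  "latency_s": 1.2, "tokens": 900},
    {"relevant": True,  "answered": True,  "latency_s": 0.8, "tokens": 650},
    {"relevant": False, "answered": False, "latency_s": 1.0, "tokens": 300},
    {"relevant": True,  "answered": False, "latency_s": 2.0, "tokens": 1100},
]

n = len(log)
metrics = {
    "retrieval_accuracy": sum(r["relevant"] for r in log) / n,
    "coverage":           sum(r["answered"] for r in log) / n,
    "avg_latency_s":      sum(r["latency_s"] for r in log) / n,
    "total_tokens":       sum(r["tokens"] for r in log),
}
print(metrics)
```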

Optimization Strategies

  • Document Quality: Improve source content structure and clarity
  • Chunk Optimization: Adjust size and overlap based on performance
  • Embedding Tuning: Experiment with different embedding models
  • Prompt Refinement: Continuously improve instructions

Common Performance Issues

  • Poor Retrieval:
    • Check document quality and structure
    • Adjust chunking strategy
    • Consider different embedding models
  • Slow Responses:
    • Optimize retrieval parameters
    • Use smaller, more focused knowledge bases
    • Consider caching frequent queries
  • High Costs:
    • Optimize chunk sizes to reduce token usage
    • Use more efficient embedding models
    • Implement query caching
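Query caching can be as simple as a normalized lookup table in front of the expensive retrieval-and-generation call. `answer_query` below is a hypothetical stand-in for that call:

```python
# Cache answers so identical (or trivially different) queries skip the
# expensive RAG round-trip entirely.

calls = 0

def answer_query(query):
    global calls
    calls += 1  # pretend this is an expensive retrieval + generation call
    return f"Answer for: {query}"

cache = {}

def cached_answer(query):
    key = query.strip().lower()  # normalize so trivial variants hit
    if key not in cache:
        cache[key] = answer_query(query)
    return cache[key]

cached_answer("How do I reset my password?")
cached_answer("how do i reset my password?  ")  # cache hit, no new call
print(calls)  # the expensive call ran only once
```

Note that caching trades freshness for cost: cached answers should be expired when the underlying knowledge base is updated.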

Best Practices for Production

  • Regular Updates: Keep knowledge bases current
  • Quality Control: Review and curate content regularly
  • User Feedback: Collect and act on user ratings
  • A/B Testing: Test different configurations
  • Backup Strategies: Maintain multiple knowledge sources
Tips:
  • Monitor real user queries to identify content gaps
  • Regularly review and update your knowledge base content
  • Use analytics to identify the most common query patterns