Creating Knowledge Bases with RAG
Build intelligent chatbots that can answer questions from your documents using Retrieval-Augmented Generation (RAG).
1 Understanding RAG
5 min
Retrieval-Augmented Generation (RAG) is a powerful technique that allows AI to access external knowledge sources to provide accurate, up-to-date information.
How RAG Works
- Document Processing: Your documents are split into chunks and converted to embeddings
- Query Processing: User questions are also converted to embeddings
- Similarity Search: The system finds the most relevant document chunks
- Response Generation: The AI uses retrieved information to generate accurate answers
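The four steps above can be sketched end to end. This is a minimal illustration only, not Dify's internal implementation: the `embed` function here is a toy bag-of-characters stand-in for a real embedding model, and step 4 simply prints the chunk that would be passed to the LLM as context.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-characters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# 1. Document processing: split documents into chunks, embed each chunk
chunks = ["Reset your password in account settings.",
          "Invoices are emailed on the first of each month."]
index = [(c, embed(c)) for c in chunks]

# 2. Query processing: embed the user's question the same way
query_vec = embed("How do I reset my password?")

# 3. Similarity search: pick the most similar chunk
best_chunk, _ = max(index, key=lambda item: cosine(query_vec, item[1]))

# 4. Response generation: the retrieved chunk becomes LLM context
print(best_chunk)
```

With a real embedding model the same structure holds; only `embed` changes.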
Benefits of RAG
- Accuracy: Reduces AI hallucinations by grounding responses in real data
- Current Information: Access to your latest documents and data
- Source Attribution: Can cite specific sources for transparency
- Cost Effective: Typically cheaper and faster to update than fine-tuning a model
Use Cases
- Customer support with product documentation
- Internal knowledge sharing
- Educational content delivery
- Legal document analysis
- Technical documentation assistance
Tips:
- RAG is perfect when you need AI to answer questions about specific content
- Quality of your source documents directly impacts answer quality
- RAG works best with well-structured, factual content
2 Creating Your First Knowledge Base
6 min
Let's create a knowledge base from your documents that your AI can search and reference.
Step-by-Step Knowledge Base Creation
- Navigate to Knowledge in the main menu
- Click "Create Knowledge"
- Give your knowledge base a descriptive name
- Add a brief description of its contents
- Click "Create"
Supported Data Sources
Dify supports multiple data sources:
- Local Files: Upload PDFs, Word docs, text files, CSV, etc.
- Notion Pages: Sync directly from your Notion workspace
- Web Pages: Scrape content from websites using Jina or Firecrawl API
- Plain Text: Copy and paste content directly
File Requirements and Limits
- Supported formats: PDF, DOCX, TXT, MD, CSV, XLSX
- File size limit: Usually 15MB per file (varies by plan)
- Total size: Depends on your subscription tier
- Language support: Multi-language documents supported
Tips:
- Start with a small set of high-quality documents
- Use descriptive names for easy management
- Organize related content in the same knowledge base
3 Uploading and Processing Documents
8 min
Now let's add documents to your knowledge base and configure how they're processed.
Document Upload Process
- Click "Add Document" in your knowledge base
- Choose your upload method (File, Notion, Web scraping, or Text)
- Select or upload your documents
- Review the document preview
- Configure processing settings
Chunking Configuration
Documents are split into smaller chunks for better retrieval:
- Automatic Chunking: Dify automatically splits by paragraphs
- Custom Rules: Set your own chunk size and overlap
- Chunk Size: 500-1000 characters is usually optimal
- Overlap: 50-100 characters to maintain context
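The chunk size and overlap settings above can be pictured with a small splitter. This is a sketch of the idea only, not Dify's chunker; real splitters prefer paragraph and sentence boundaries over fixed character windows.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks, each overlapping
    the previous one by `overlap` characters to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "A" * 1200
pieces = chunk_text(doc, chunk_size=500, overlap=50)
print(len(pieces))              # 3 chunks: starts at 0, 450, 900
print([len(p) for p in pieces])  # [500, 500, 300]
```

Note how the last 50 characters of each chunk reappear at the start of the next, so a sentence cut at a boundary still survives intact in one of the two chunks.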
Text Preprocessing Options
- Remove extra spaces: Clean up formatting
- Remove URLs: Filter out web links
- Remove email addresses: Protect privacy
- Custom preprocessing: Advanced filtering rules
Embedding Model Selection
Choose the right embedding model for your content:
- OpenAI text-embedding-3-small: Fast and cost-effective
- OpenAI text-embedding-3-large: Higher accuracy
- Cohere embed-english: Good for English content
- Cohere embed-multilingual: For multiple languages
Example:
# Example document structure for optimal RAG performance:
## Product FAQ Document
### What is Product X?
Product X is a comprehensive solution for...
### How do I install Product X?
1. Download the installer from our website
2. Run the installer as administrator
3. Follow the setup wizard
### Troubleshooting Common Issues
**Issue:** Application won't start
**Solution:** Check system requirements and try running as administrator
Tips:
- Well-structured documents with clear headings work best
- Keep chunk sizes moderate - too small loses context, too large reduces precision
- Choose embedding models based on your primary language
4 Configuring Retrieval Settings
5 min
Fine-tune how your knowledge base searches for and retrieves relevant information.
Retrieval Methods
- Vector Retrieval: Finds semantically similar content using embeddings
- Full-Text Search: Traditional keyword-based search
- Hybrid Retrieval (Recommended): Combines both methods for best results
Hybrid Retrieval Configuration
Adjust the balance between semantic and keyword search:
- Semantic Weight (70%): Finds conceptually related content
- Keyword Weight (30%): Finds exact term matches
- Custom Weights: Adjust based on your content type
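The 70/30 weighting amounts to a linear blend of the two scores. A sketch of the idea only; Dify's hybrid retrieval normalizes and fuses scores internally, but the intuition is the same.

```python
def hybrid_score(semantic, keyword, semantic_weight=0.7):
    """Blend a semantic similarity score with a keyword-match score.
    Both inputs are assumed to be normalized to the range [0, 1]."""
    return semantic_weight * semantic + (1 - semantic_weight) * keyword

# A chunk with a strong semantic match but no exact keyword hit...
print(round(hybrid_score(semantic=0.9, keyword=0.0), 2))  # 0.63
# ...still outranks a chunk that only matched on keywords.
print(round(hybrid_score(semantic=0.2, keyword=1.0), 2))  # 0.44
```

Raising the keyword weight helps content full of exact identifiers (error codes, part numbers); raising the semantic weight helps conversational, paraphrase-heavy content.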
Reranking Models
Improve retrieval accuracy with reranking:
- Cohere Rerank: Reorders results for better relevance
- BGE Reranker: Open-source alternative
- No Reranking: Faster but potentially less accurate
Retrieval Parameters
- Top K: Number of chunks to retrieve (3-5 recommended)
- Score Threshold: Minimum similarity score for inclusion
- Max Tokens: Total token limit for retrieved content
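Top K and Score Threshold interact as a filter-then-truncate step, which can be sketched as:

```python
def select_chunks(scored_chunks, top_k=3, score_threshold=0.5):
    """scored_chunks: list of (chunk_text, similarity_score) pairs.
    Drop chunks below the threshold, then keep the top_k best."""
    kept = [(text, score) for text, score in scored_chunks
            if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

results = select_chunks(
    [("A", 0.91), ("B", 0.42), ("C", 0.77), ("D", 0.65), ("E", 0.58)],
    top_k=3, score_threshold=0.5,
)
print([text for text, _ in results])  # ['A', 'C', 'D']
```

A Max Tokens budget would add one more pass over the kept chunks, accumulating until the token limit is reached.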
Tips:
- Hybrid retrieval works best for most use cases
- Start with 70% semantic, 30% keyword weighting
- Use reranking for better accuracy when response quality matters most
5 Testing Your Knowledge Base
4 min
Before integrating your knowledge base into an application, test its retrieval accuracy.
Using the Recall Test
- Go to your knowledge base settings
- Click on the "Recall Test" tab
- Enter test queries related to your content
- Review the retrieved chunks and their relevance scores
- Adjust settings if needed
Effective Test Queries
- Direct questions: "How do I reset my password?"
- Conceptual queries: "Security best practices"
- Specific terms: "API rate limits"
- Variations: Test different ways of asking the same thing
Evaluating Results
Look for:
- Relevance: Do retrieved chunks actually answer the question?
- Completeness: Is all necessary information retrieved?
- Ranking: Are the most relevant chunks ranked highest?
- Coverage: Can the system find information across all your documents?
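These evaluation criteria can also be checked systematically outside the UI. A hand-rolled sketch: for each test query you note which chunk should come back, then measure how often it appears in the top results. The `retrieve` function below is a placeholder standing in for whatever search you actually run.

```python
def recall_at_k(test_cases, retrieve, k=3):
    """test_cases: list of (query, expected_chunk_id) pairs.
    retrieve(query) returns a ranked list of chunk ids.
    Returns the fraction of queries whose expected chunk is in the top k."""
    hits = sum(1 for query, expected in test_cases
               if expected in retrieve(query)[:k])
    return hits / len(test_cases)

# Placeholder retriever standing in for the real knowledge base search.
fake_results = {
    "How do I reset my password?": ["faq-12", "faq-3"],
    "API rate limits": ["faq-7", "faq-12"],
    "Security best practices": ["faq-2"],
}
retrieve = lambda q: fake_results.get(q, [])

cases = [("How do I reset my password?", "faq-12"),
         ("API rate limits", "faq-7"),
         ("Security best practices", "faq-9")]
score = recall_at_k(cases, retrieve, k=3)
print(score)  # 2 of 3 queries retrieved the expected chunk
```

Tracking this number while you adjust chunking or retrieval settings makes the effect of each change concrete.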
Common Issues and Solutions
- Poor retrieval: Adjust chunk size or embedding model
- Irrelevant results: Increase score threshold
- Missing information: Check document quality and chunking
- Inconsistent results: Consider using reranking
Tips:
- Test with questions your actual users would ask
- Document any query patterns that don't work well
- Iterate on your settings based on test results
6 Building a RAG-Powered Chatbot
5 min
Now let's create a chatbot that uses your knowledge base to answer questions accurately.
Creating the Chatflow
- Create a new Chatflow application
- Keep the default Start → LLM → Answer flow
- Click on the LLM node to configure it
Adding Your Knowledge Base
- In the LLM node settings, find the "Context" section
- Click "Add Knowledge"
- Select your knowledge base
- Configure retrieval settings if needed
Crafting a RAG-Optimized Prompt
You are a helpful assistant that answers questions based on the provided context.
Instructions:
1. Use the context information below to answer the user's question
2. If the context doesn't contain relevant information, say "I don't have information about that in my knowledge base"
3. Always cite specific parts of the context when possible
4. Be accurate and don't make up information not in the context
Context: {{#knowledge}}
User Question: {{sys.query}}
Please provide a helpful and accurate response based on the context above.
Advanced RAG Techniques
- Question Classification: Route different types of questions appropriately
- Multiple Knowledge Bases: Use different sources for different topics
- Fallback Strategies: Handle cases when no relevant information is found
Tips:
- Always instruct the AI to stay within the provided context
- Enable citation features to show sources
- Test with questions both inside and outside your knowledge base
7 Advanced RAG Workflows
6 min
Create more sophisticated RAG applications with conditional logic and multiple knowledge sources.
Question Classification Workflow
Route different types of questions to appropriate knowledge bases:
- Add a Question Classifier node after Start
- Define categories (e.g., "Product Info", "Technical Support", "Billing")
- Connect different paths to different knowledge bases
- Use conditional logic to route appropriately
Multi-Step RAG Process
- Initial Retrieval: Find relevant chunks
- Relevance Check: Evaluate if information is sufficient
- Follow-up Retrieval: Search additional sources if needed
- Response Generation: Synthesize all found information
Handling Edge Cases
- No Matches Found: Provide helpful guidance on how to rephrase
- Low Confidence: Ask clarifying questions
- Multiple Valid Answers: Present options clearly
- Outdated Information: Include disclaimers about data freshness
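The edge cases above can be handled with a small routing step before generation. A sketch under assumed thresholds (the 0.7/0.4 cut-offs are illustrative, not Dify defaults; tune them against your own recall tests):

```python
def choose_strategy(top_scores, high=0.7, low=0.4):
    """Decide how to respond based on retrieval confidence.
    top_scores: similarity scores of retrieved chunks, best first."""
    if not top_scores or top_scores[0] < low:
        return "no_match"   # suggest how to rephrase the question
    if top_scores[0] < high:
        return "clarify"    # ask a clarifying question
    if len(top_scores) > 1 and top_scores[1] >= high:
        return "multiple"   # several valid answers: present options
    return "answer"         # confident single answer

print(choose_strategy([0.92, 0.31]))  # answer
print(choose_strategy([0.55]))        # clarify
print(choose_strategy([]))            # no_match
print(choose_strategy([0.88, 0.85]))  # multiple
```

Each strategy then maps to a different branch in the workflow, e.g. a dedicated fallback prompt for the `no_match` case.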
# Example multi-step RAG prompt
You are analyzing a user question in two steps:
Step 1: Evaluate if the retrieved context contains sufficient information
Context: {{#knowledge}}
Question: {{sys.query}}
If context is sufficient, respond with: SUFFICIENT
If context is insufficient, respond with: INSUFFICIENT - [reason]
Step 2 (only if sufficient): Provide a complete answer based on the context.
Tips:
- Question classification improves accuracy for diverse knowledge bases
- Always have fallback options when retrieval fails
- Consider the user experience when no good answers are found
8 Monitoring and Optimization
6 min
Continuously improve your RAG system by monitoring performance and optimizing based on usage patterns.
Key Metrics to Track
- Retrieval Accuracy: Percentage of queries with relevant results
- Response Quality: User satisfaction with answers
- Coverage: Percentage of questions that can be answered
- Response Time: Average time to generate answers
- Cost: Token usage for embeddings and generation
Optimization Strategies
- Document Quality: Improve source content structure and clarity
- Chunk Optimization: Adjust size and overlap based on performance
- Embedding Tuning: Experiment with different embedding models
- Prompt Refinement: Continuously improve instructions
Common Performance Issues
- Poor Retrieval:
- Check document quality and structure
- Adjust chunking strategy
- Consider different embedding models
- Slow Responses:
- Optimize retrieval parameters
- Use smaller, more focused knowledge bases
- Consider caching frequent queries
- High Costs:
- Optimize chunk sizes to reduce token usage
- Use more efficient embedding models
- Implement query caching
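Query caching, mentioned under High Costs, can be as simple as memoizing answers for repeated questions after light normalization. A minimal in-memory sketch; a production system would add expiry times and invalidate the cache whenever the knowledge base content changes.

```python
class QueryCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Light normalization so trivial variants share a cache entry.
        return " ".join(query.lower().split())

    def get_or_compute(self, query, compute):
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._store[key] = compute(query)
        return answer

cache = QueryCache()
expensive = lambda q: f"answer to: {q}"  # stand-in for the full RAG pipeline
cache.get_or_compute("What are the API rate limits?", expensive)
cache.get_or_compute("what are the  API rate limits?", expensive)
print(cache.hits, cache.misses)  # 1 1 -- the second variant was a cache hit
```

The hit/miss counters double as one of the metrics listed above: a high hit rate means the cache is saving real embedding and generation cost.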
Best Practices for Production
- Regular Updates: Keep knowledge bases current
- Quality Control: Review and curate content regularly
- User Feedback: Collect and act on user ratings
- A/B Testing: Test different configurations
- Backup Strategies: Maintain multiple knowledge sources
Tips:
- Monitor real user queries to identify content gaps
- Regularly review and update your knowledge base content
- Use analytics to identify the most common query patterns