LightRAG: Retrieval-augmented generation with knowledge graphs
Graph-based retrieval framework for structured RAG reasoning.
LightRAG is a Python framework for retrieval-augmented generation that integrates knowledge graph structures into the retrieval process for enhanced contextual reasoning. The system constructs and maintains a graph-based index from ingested text documents, where entities and relationships are extracted and stored as nodes and edges within a working directory structure. During retrieval operations, the framework traverses this knowledge graph to identify semantically relevant information paths rather than relying solely on vector similarity search. The architecture combines traditional embedding-based retrieval with graph traversal algorithms to provide more structured and contextually coherent results for downstream language model generation tasks.
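The hybrid retrieval flow described above can be sketched in miniature. The following is an illustrative toy, not LightRAG's actual API: the passage index, entity graph, and the `expand`/`retrieve` helpers are all hypothetical stand-ins, and a real deployment would use learned embeddings and LLM-extracted entities rather than hand-written ones.

```python
import math

# Toy index: each passage has an embedding and the entities extracted from it
# at indexing time (in LightRAG this extraction is done by an LLM).
PASSAGES = {
    "p1": {"vec": [1.0, 0.0], "entities": {"LightRAG", "knowledge graph"}},
    "p2": {"vec": [0.8, 0.6], "entities": {"embedding", "retrieval"}},
    "p3": {"vec": [0.0, 1.0], "entities": {"PostgreSQL"}},
}

# Toy entity graph: edges between extracted entities.
EDGES = {
    "knowledge graph": {"retrieval"},
    "retrieval": {"knowledge graph", "embedding"},
    "embedding": {"retrieval"},
}

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def expand(entities):
    """One-hop graph traversal: each entity plus its direct neighbours."""
    out = set(entities)
    for e in entities:
        out |= EDGES.get(e, set())
    return out

def retrieve(query_vec, query_entities, alpha=0.5):
    """Score = alpha * vector similarity + (1 - alpha) * graph overlap."""
    hops = expand(query_entities)
    scored = []
    for pid, p in PASSAGES.items():
        sim = cosine(query_vec, p["vec"])
        overlap = len(hops & p["entities"]) / max(len(hops), 1)
        scored.append((pid, alpha * sim + (1 - alpha) * overlap))
    return sorted(scored, key=lambda t: t[1], reverse=True)

# A query about "knowledge graph" pulls in graph-adjacent passages even when
# their embeddings alone would rank them differently.
print(retrieve([1.0, 0.1], {"knowledge graph"}))
```

The point of the sketch is the scoring blend: graph expansion lets a passage score for entities it is connected to, not only for entities it literally contains.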
Graph-based retrieval
Uses extracted entities and relationships to construct a knowledge graph for retrieval, enabling structured queries that capture semantic relationships between concepts rather than relying on vector similarity alone.
Flexible storage backends
Supports multiple storage options including PostgreSQL and local storage, allowing deployment in different infrastructure contexts and enabling document deletion with graph regeneration.
Multimodal document processing
Integrates with RAG-Anything for handling diverse document formats including PDFs, images, tables, and equations, extending beyond text-only processing.
Fixes OpenAI environment-variable loading, adds Vertex AI support for the Gemini LLM binding, and introduces Korean localization.
- Fix OpenAI LLM binding options not loaded from environment variables
- feat(gemini): Add Vertex AI support for Gemini LLM binding
- refactor(gemini): Migrate Gemini LLM to native async Google GenAI client
- refactor: Change DOCX extraction to use HTML tags for whitespace
- feat: Add Korean localization
Hotfix for an AttributeError in Neo4JStorage and MemgraphStorage when using a storage-specific workspace env var.
- Fix AttributeError in Neo4JStorage and MemgraphStorage when using a storage-specific workspace env var
This release introduces workspace isolation for multi-instance deployments, vector data isolation by model/dimension, and enhanced document extraction.
- Support dimension selection for OpenAI and Gemini embedding models via the new env var EMBEDDINGSENDDIM
- Add tools for LLM cache migration and LLM query-cache cleanup between different KV storage backends
- Enhanced DOCX extraction with table content support
- Enhanced XLSX extraction with proper handling of tab and newline characters within cells
- Fix critical security vulnerability in React Server Components: #2494
Related Repositories
Discover similar tools and frameworks used by developers
Open WebUI
Extensible multi-LLM chat platform with RAG pipeline.
DeepSpeed
PyTorch library for training billion-parameter models efficiently.
crewAI
Python framework for autonomous multi-agent AI collaboration.
Model Context Protocol Servers
Reference implementations for LLM tool and data integration.
PaddleOCR
Multilingual OCR toolkit with document structure extraction.