rag
Agentic RAG Transformer for intelligent knowledge retrieval across ML, Sci-Fi, and Cosmos domains.
Introduction
rag is a retrieval-augmented generation system for extracting meaningful insights from your knowledge base. By combining semantic search with large language models, rag enables intelligent question answering across specialized domains.
The system leverages state-of-the-art embeddings (all-MiniLM-L6-v2) and FAISS vector storage for efficient similarity search.
Installation
The fastest way to get started:
bash SETUP.sh
Manual installation:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
Quick Start
rag-tui
Or via the CLI:
rag --query "What is machine learning?"
Configuration
cp .env.example .env
Core
RAG_LLM_BACKEND=local|openai|cerebras|ollama
RAG_MEMORY_MODE=off|session|persist
RAG_ENABLE_WEB=0|1
Models
EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATOR_MODEL=google/flan-t5-small
API Keys
OPENAI_API_KEY=sk-...
CEREBRAS_API_KEY=cs-...
TMDB_API_KEY=...
NASA_API_KEY=...
System
TOP_K_RETRIEVAL=3
MAX_ITERATIONS=3
MAX_LENGTH=150
LLM Backends
OpenAI (GPT-5.3)
Uses the OpenAI Python SDK with client.responses.create() (primary) and client.chat.completions.create() (fallback).
RAG_LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5.3-chat-latest
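The primary/fallback pattern above can be sketched generically, with the two SDK calls passed in as callables. This is a hypothetical wrapper for illustration, not the repo's actual code:

```python
# Hypothetical wrapper: try the primary API call, fall back on any error.
def generate(prompt, primary, fallback):
    try:
        return primary(prompt)   # e.g. client.responses.create(...)
    except Exception:
        return fallback(prompt)  # e.g. client.chat.completions.create(...)
```

Keeping the fallback behind a broad except means a transient failure of the newer endpoint degrades gracefully to the older one.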
Cerebras
OpenAI-compatible API at api.cerebras.ai. Cost-effective inference.
RAG_LLM_BACKEND=cerebras
CEREBRAS_API_KEY=cs-...
CEREBRAS_MODEL=llama3.1-8b
Ollama (local)
Runs LLMs locally via the Ollama REST API on localhost:11434. Fully offline.
ollama pull llama3
RAG_LLM_BACKEND=ollama
OLLAMA_MODEL=llama3
Tool Commands
rag integrates external tools via prefixed commands with AST-safe evaluation.
CALC: - Calculator
CALC: 2^10
CALC: sqrt(144) * pi
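AST-safe evaluation here means the expression is parsed and walked rather than passed to eval(). A minimal sketch of the idea (the actual tools.py may differ; note `^` is treated as exponentiation, matching `CALC: 2^10`):

```python
import ast
import math
import operator

# Whitelists: only these operators and names are ever evaluated.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}
_NAMES = {"pi": math.pi, "e": math.e, "sqrt": math.sqrt}

def safe_calc(expr: str) -> float:
    """Evaluate an arithmetic expression without eval(); ^ means power."""
    tree = ast.parse(expr.replace("^", "**"), mode="eval")

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in _NAMES:
            return _NAMES[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and callable(_NAMES.get(node.func.id))):
            return _NAMES[node.func.id](*[ev(a) for a in node.args])
        raise ValueError("disallowed expression")

    return ev(tree)
```

Anything outside the whitelists (attribute access, imports, arbitrary calls) raises instead of executing.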
TIME: - Current Time
TIME:
WIKI: - Wikipedia
WIKI: machine learning
SHELL: - Shell Commands
SHELL: git status
SHELL: ls -la
SHELL: git add .
SHELL: git commit -m "feat: new feature"
SEARCH: / WEB: - Web Search
SEARCH: latest AI news
WEB: https://example.com
Memory System
RAG_MEMORY_MODE=off # No persistence
RAG_MEMORY_MODE=session # In-memory only
RAG_MEMORY_MODE=persist # SQLite in .cache/
rag remembers user facts (e.g., "My name is X") and uses them in conversations.
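The three modes map naturally onto SQLite connections. A minimal sketch under assumed names (the real memory.py schema and API may differ):

```python
import sqlite3

# Minimal sketch of the three memory modes.
class Memory:
    def __init__(self, mode: str = "off", path: str = ".cache/memory.db"):
        self.mode = mode
        if mode == "persist":
            self.db = sqlite3.connect(path)        # SQLite file in .cache/
        elif mode == "session":
            self.db = sqlite3.connect(":memory:")  # lives for this process only
        else:
            self.db = None                         # RAG_MEMORY_MODE=off
        if self.db is not None:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS facts (k TEXT PRIMARY KEY, v TEXT)")

    def remember(self, key: str, value: str) -> None:
        if self.db is not None:
            self.db.execute(
                "INSERT OR REPLACE INTO facts VALUES (?, ?)", (key, value))
            self.db.commit()

    def recall(self, key: str):
        if self.db is None:
            return None
        row = self.db.execute(
            "SELECT v FROM facts WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None
```

In off mode every call is a no-op, so the rest of the pipeline needs no special-casing.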
Knowledge & Retrieval
Uses FAISS (IndexFlatL2) for vector similarity search with all-MiniLM-L6-v2 embeddings.
KNOWLEDGE_BASE_FILE=knowledge_base.json
USE_FAISS=1
TOP_K_RETRIEVAL=3
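What IndexFlatL2 computes is an exact top-k search by (squared) L2 distance over the stored embeddings. A pure-Python equivalent for intuition (the real system first embeds texts with all-MiniLM-L6-v2 into 384-dimensional vectors):

```python
# Pure-Python sketch of FAISS IndexFlatL2: exact top-k by squared L2 distance.
def top_k_l2(query, vectors, k=3):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return ranked[:k]  # indices of the k nearest stored vectors
```

With TOP_K_RETRIEVAL=3, the three returned indices select the passages handed to the generator as context.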
UI & UX
Built with the rich library for colored output and an interactive TUI.
[bold]Agentic RAG Transformer[/]
ML, Sci-Fi, and Cosmos Assistant
> What is ML?
Machine learning is a subset of AI...
> CALC: 2^10
1024
> SHELL: git status
On branch main
Changes to be committed:
modified: README.asc
Features: colored output, interactive pickers, status indicators, keyboard shortcuts.
Data Collection
rag-collect
Collects data from TMDB (movies) and NASA (astronomy pictures).
TMDB_API_KEY=...
NASA_API_KEY=...
Deployment
Local Preview
cd release-webpage && python3 -m http.server 8000
GitHub Pages
Auto-deployed from the main branch on pushes touching release-webpage/**.
URL: https://bniladridas.github.io/rag/
Workflow: .github/workflows/pages.yml
Docker
docker build -t rag .
docker run -it rag rag-tui
Architecture
src/rag/
config.py # Configuration (.env)
rag_engine.py # Core RAG engine
tools.py # Tool executor
memory.py # Conversation memory
ui/tui.py # Terminal UI
Query -> Tools -> FAISS Retrieval -> Memory -> LLM Generation
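The pipeline above can be sketched as a single dispatch function. Component names and signatures here are illustrative assumptions, not the repo's actual interfaces:

```python
# Hypothetical sketch of the Query -> Tools -> FAISS Retrieval -> Memory -> LLM flow.
def answer(query, tools, retrieve, memory, llm):
    prefix, _, rest = query.partition(":")
    if prefix in tools:                  # tool-prefixed command, e.g. CALC:, WIKI:
        return tools[prefix](rest.strip())
    context = retrieve(query)            # top-k passages from FAISS
    facts = memory.get("facts", [])      # remembered user facts
    return llm(query, context, facts)    # generation with retrieved context
```

Tool commands short-circuit the pipeline, which is why `CALC: 2^10` never touches the retriever or the LLM.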
Release
Automated via semantic-release on push to main.
- semantic-release determines version and creates tags/releases
- scripts/enhance_changelog.py updates CHANGELOG.md
- scripts/update_webpage_version.py syncs version to HTML
Troubleshooting
ModuleNotFoundError
pip install -e .
Terminal transcript
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ pip install -e .
$ rag-tui
INFO: Loading faiss.
INFO: Load pretrained SentenceTransformer: all-MiniLM-L6-v2