Introduction

rag is a retrieval-augmented generation system for question answering over your own knowledge base. By combining semantic search with large language models, it answers domain-specific questions grounded in your documents.

The system uses all-MiniLM-L6-v2 sentence embeddings with FAISS vector storage for efficient similarity search.

Installation

The fastest way to get started:

bash SETUP.sh

Manual installation:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .

Quick Start

rag-tui

Or use the CLI:

rag --query "What is machine learning?"

Configuration

cp .env.example .env

Core

RAG_LLM_BACKEND=local|openai|cerebras|ollama
RAG_MEMORY_MODE=off|session|persist
RAG_ENABLE_WEB=0|1

Models

EMBEDDING_MODEL=all-MiniLM-L6-v2
GENERATOR_MODEL=google/flan-t5-small

API Keys

OPENAI_API_KEY=sk-...
CEREBRAS_API_KEY=cs-...
TMDB_API_KEY=...
NASA_API_KEY=...

System

TOP_K_RETRIEVAL=3
MAX_ITERATIONS=3
MAX_LENGTH=150

LLM Backends

OpenAI (GPT-5.3)

Uses the OpenAI Python SDK: client.responses.create() is tried first, with client.chat.completions.create() as the fallback.
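That primary/fallback flow can be sketched as follows. This is illustrative only: it assumes an openai.OpenAI-style client object and is not the engine's actual code.

```python
def generate(client, model: str, prompt: str) -> str:
    """Try the Responses API first; fall back to Chat Completions.

    Sketch only -- `client` is assumed to be an openai.OpenAI-style object.
    """
    try:
        resp = client.responses.create(model=model, input=prompt)
        return resp.output_text
    except Exception:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
```

Catching a broad Exception keeps the fallback robust to SDK versions that lack the Responses API entirely.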

RAG_LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5.3-chat-latest

Cerebras

Cerebras exposes an OpenAI-compatible API at api.cerebras.ai for cost-effective inference.

RAG_LLM_BACKEND=cerebras
CEREBRAS_API_KEY=cs-...
CEREBRAS_MODEL=llama3.1-8b

Ollama (local)

Runs LLMs locally via a REST API on localhost:11434; fully offline.
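A minimal client sketch against Ollama's /api/generate endpoint with a non-streaming JSON body. The helper names here are ours, not the engine's:

```python
import json
import urllib.request


def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def ollama_generate(prompt: str, model: str = "llama3",
                    host: str = "http://localhost:11434") -> str:
    """Send a prompt to a local Ollama server and return its response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With stream set to False, Ollama returns a single JSON object whose "response" field holds the full completion.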

ollama pull llama3
RAG_LLM_BACKEND=ollama
OLLAMA_MODEL=llama3

Tool Commands

rag integrates external tools via prefixed commands; calculator expressions are evaluated AST-safely rather than with eval().

CALC: - Calculator

CALC: 2^10
CALC: sqrt(144) * pi
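AST-safe evaluation for expressions like the ones above might look like this sketch: the expression is parsed with the ast module and only whitelisted operators, names, and functions are interpreted. Treating ^ as exponentiation (so 2^10 gives 1024, as in the TUI transcript below) is our assumption about the tool's behaviour:

```python
import ast
import math
import operator

# Whitelisted operators; BitXor (^) is mapped to exponentiation here,
# an assumption based on CALC: 2^10 producing 1024.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.BitXor: operator.pow,
    ast.USub: operator.neg,
}
_NAMES = {"pi": math.pi, "e": math.e}
_FUNCS = {"sqrt": math.sqrt, "abs": abs, "round": round}


def safe_calc(expr: str) -> float:
    """Evaluate an arithmetic expression via its AST -- never eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in _NAMES:
            return _NAMES[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS):
            return _FUNCS[node.func.id](*[ev(a) for a in node.args])
        raise ValueError("disallowed expression")
    return ev(ast.parse(expr, mode="eval"))
```

Anything outside the whitelist, such as attribute access or __import__, is rejected rather than executed.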

TIME: - Current Time

TIME:

WIKI: - Wikipedia

WIKI: machine learning

SHELL: - Shell Commands

SHELL: git status
SHELL: ls -la
SHELL: git add .
SHELL: git commit -m "feat: new feature"
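A minimal sketch of a SHELL: executor using the standard library; the real tool likely adds allow-listing and other safety checks beyond what is shown here:

```python
import shlex
import subprocess


def run_shell(command: str, timeout: int = 30) -> str:
    """Run a SHELL: command and return its output (stderr on failure)."""
    result = subprocess.run(
        shlex.split(command),          # tokenize without invoking a shell
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

Using shlex.split with shell=False (the subprocess.run default) avoids shell injection through the command string.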

SEARCH: / WEB: - Web Search

SEARCH: latest AI news
WEB: https://example.com

Memory System

RAG_MEMORY_MODE=off     # No persistence
RAG_MEMORY_MODE=session # In-memory only
RAG_MEMORY_MODE=persist # SQLite in .cache/

rag remembers user facts (e.g., "My name is X") and uses them in conversations.
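Fact capture plus the persist mode could be sketched like this; the regex patterns and table schema are illustrative assumptions, not the engine's actual implementation:

```python
import re
import sqlite3

# "My name is X" matches the example above; the location pattern is hypothetical.
FACT_PATTERNS = [
    (re.compile(r"\bmy name is (\w+)", re.I), "name"),
    (re.compile(r"\bi live in ([\w ]+)", re.I), "location"),
]


def extract_facts(utterance: str) -> list:
    """Return (key, value) facts found in a user utterance."""
    facts = []
    for pattern, key in FACT_PATTERNS:
        m = pattern.search(utterance)
        if m:
            facts.append((key, m.group(1).strip()))
    return facts


def persist(conn: sqlite3.Connection, facts: list) -> None:
    """Store facts in SQLite (the 'persist' memory mode)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO facts VALUES (?, ?)", facts)
    conn.commit()
```

INSERT OR REPLACE keeps one value per fact key, so a later "My name is Y" overwrites the earlier name.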

Knowledge & Retrieval

Uses FAISS (IndexFlatL2) for vector similarity search with all-MiniLM-L6-v2 embeddings.

KNOWLEDGE_BASE_FILE=knowledge_base.json
USE_FAISS=1
TOP_K_RETRIEVAL=3
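IndexFlatL2 performs exact L2 nearest-neighbour search. A brute-force pure-Python equivalent of top-k retrieval, shown only to make the semantics concrete (FAISS does the same computation, vectorized and fast):

```python
def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k docs nearest to the query by squared L2 distance."""
    def sq_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: sq_l2(query_vec, doc_vecs[i]))
    return order[:k]
```

With TOP_K_RETRIEVAL=3, the three nearest chunks are passed to the generator as context.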

UI & UX

Built with the rich library for colored output and an interactive TUI.

rag-tui

[bold]Agentic RAG Transformer[/]

ML, Sci-Fi, and Cosmos Assistant

> What is ML?

Machine learning is a subset of AI...

> CALC: 2^10

1024

> SHELL: git status

On branch main
Changes to be committed:
  modified: README.asc

Features: colored output, interactive pickers, status indicators, keyboard shortcuts.

Data Collection

rag-collect

Collects data from TMDB (movies) and NASA (astronomy pictures).

TMDB_API_KEY=...
NASA_API_KEY=...

Deployment

Local Preview

cd release-webpage && python3 -m http.server 8000

GitHub Pages

Auto-deployed from the main branch on pushes that touch release-webpage/**.

URL: https://bniladridas.github.io/rag/
Workflow: .github/workflows/pages.yml

Docker

docker build -t rag .
docker run rag rag-tui

Architecture

src/rag/
  config.py        # Configuration (.env)
  rag_engine.py    # Core RAG engine
  tools.py         # Tool executor
  memory.py        # Conversation memory
  ui/tui.py        # Terminal UI

Query -> Tools -> FAISS Retrieval -> Memory -> LLM Generation
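The pipeline above might be orchestrated roughly like this; the component interfaces are stand-ins, not the engine's actual signatures:

```python
def answer(query, tools, retrieve, memory, generate):
    """Query -> Tools -> FAISS Retrieval -> Memory -> LLM Generation."""
    # Prefixed tool commands (CALC:, TIME:, ...) short-circuit the pipeline.
    for prefix, run in tools.items():
        if query.startswith(prefix):
            return run(query[len(prefix):].strip())
    context = retrieve(query)        # top-k chunks from the vector store
    facts = memory.get("facts", [])  # remembered user facts
    return generate(query, context, facts)
```

Tool queries never reach the retriever or the LLM, which keeps calculator and shell commands fast and deterministic.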

Release

Automated via semantic-release on push to main.

  • semantic-release determines version and creates tags/releases
  • scripts/enhance_changelog.py updates CHANGELOG.md
  • scripts/update_webpage_version.py syncs version to HTML

Troubleshooting

ModuleNotFoundError

pip install -e .

Terminal transcript

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ pip install -e .
$ rag-tui
INFO: Loading faiss.
INFO: Load pretrained SentenceTransformer: all-MiniLM-L6-v2