AI Chatbot.

A cybersecurity assistant powered by Ollama, combining intent-aware chat, RAG-augmented knowledge retrieval, and automated Lua script dispatch for real-time security operations.


Overview

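The WormSec AI Chatbot is a FastAPI-based server that provides a cybersecurity assistant. It connects to an Ollama instance running the qwen2.5:3b model and handles two distinct types of requests: informational questions, which are answered with a streamed chat response, and action requests, which are turned into Lua scripts and dispatched to the backend.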

The system also supports mixed intents, where a single prompt contains both a question and an action (e.g. "What is port 22 used for and close it"). In this case, the chatbot streams a full informational answer while simultaneously dispatching the Lua automation in the background.

The AI server acts as the bridge between the user-facing frontend and the backend's Lua execution engine. It does not execute security commands directly; instead, it generates Lua code and sends it to the Lua server for execution.

Architecture

The chatbot is composed of two core modules:

Core Modules

Module       File            Purpose
AI Server    main.py         FastAPI application: intent classification, chat streaming, Lua generation, model and history management
RAG Manager  rag_manager.py  PDF indexing, text chunking, and context retrieval via ChromaDB

Intent Classification

Every incoming user message goes through a rule-based intent classifier before being routed. The classifier splits the prompt into clauses, analyses each one, and labels the whole prompt as info, action, or mixed.

Classification relies on three word sets: question starters (what, why, how…), action verbs (close, block, scan…), and security objects (port, firewall, ip…). High-confidence action phrases like "close port" or "block ip" take priority.
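A rough sketch of how such a rule-based classifier could look is shown here. The word sets are abbreviated to the examples given above, and the clause-splitting regex, the helper classify_clause(), and the rule that a clause starting with an action verb counts as an action are assumptions made for illustration, not the actual logic in main.py.

```python
import re

# Abbreviated word sets; the real lists are assumed to be longer.
QUESTION_STARTERS = {"what", "why", "how"}
ACTION_VERBS = {"close", "block", "scan"}
SECURITY_OBJECTS = {"port", "firewall", "ip"}
HIGH_CONFIDENCE_PHRASES = ("close port", "block ip")


def classify_clause(clause: str) -> str:
    """Label a single clause as "action" or "info" (hypothetical helper)."""
    text = clause.lower().strip()
    words = re.findall(r"[a-z0-9']+", text)
    if not words:
        return "info"
    # High-confidence action phrases take priority over every other rule.
    if any(phrase in text for phrase in HIGH_CONFIDENCE_PHRASES):
        return "action"
    if words[0] in QUESTION_STARTERS:
        return "info"
    # Assumption: an imperative such as "close it" counts as an action.
    if words[0] in ACTION_VERBS:
        return "action"
    has_verb = any(w in ACTION_VERBS for w in words)
    has_object = any(w in SECURITY_OBJECTS for w in words)
    return "action" if has_verb and has_object else "info"


def classify_intent(prompt: str) -> str:
    """Return "info", "action", or "mixed" for a whole prompt."""
    # Split on "and" and on sentence punctuation that cannot occur inside an IP.
    clauses = [c for c in re.split(r"\band\b|[;?!]", prompt, flags=re.IGNORECASE) if c.strip()]
    labels = {classify_clause(c) for c in clauses}
    if labels == {"action"}:
        return "action"
    if "action" in labels:
        return "mixed"
    return "info"
```

With these rules, the mixed example from the overview ("What is port 22 used for and close it") yields one info clause and one action clause, so the prompt as a whole is labelled mixed.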

Request Flow

User prompt
  │
  ├─ classify_intent()
  │    ├─ "info"   → _build_chat_messages() → stream response (with RAG)
  │    ├─ "action" → generate_lua_code() → dispatch to backend
  │    │              └─ stream terse confirmation
  │    └─ "mixed"  → generate_lua_code() → dispatch to backend (background)
  │                   └─ stream full answer (with RAG + action banner)
  │
  └─ History saved to disk (JSON per user)
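The routing in the diagram above can be approximated with asyncio. In this simplified sketch, dispatch_lua() and stream_answer() are hypothetical stand-ins for generate_lua_code() plus the call to the Lua server and for the Ollama token stream; the confirmation text is also an assumption. The point is only to show how a mixed request can start the dispatch as a background task while the answer is being produced.

```python
import asyncio
from typing import AsyncIterator, List


async def dispatch_lua(prompt: str, log: List[str]) -> None:
    """Stand-in for Lua generation plus the POST to the Lua server."""
    await asyncio.sleep(0)  # placeholder for model and network latency
    log.append(f"dispatched: {prompt}")


async def stream_answer(prompt: str) -> AsyncIterator[str]:
    """Stand-in for the token stream coming back from the model."""
    for token in ("full ", "answer"):
        yield token


async def handle(prompt: str, intent: str, log: List[str]) -> str:
    """Route one prompt according to its intent label."""
    if intent == "info":
        return "".join([t async for t in stream_answer(prompt)])
    if intent == "action":
        await dispatch_lua(prompt, log)
        return "Action dispatched."  # terse confirmation (wording assumed)
    # mixed: start the dispatch in the background, build the answer meanwhile
    task = asyncio.create_task(dispatch_lua(prompt, log))
    answer = "".join([t async for t in stream_answer(prompt)])
    await task  # make sure the background dispatch has finished
    return "[Action dispatched] " + answer
```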

Installation

The recommended deployment method is Docker Compose. The stack includes three services: ollama (LLM runtime), ollama-init (pulls the model on first run), and chatbot (the AI server).

Prerequisites

Quick Start

# Clone the repository
git clone <repo-url> ai-chatbot
cd ai-chatbot

# Start the full stack
docker compose up -d

# Verify the services are healthy
curl http://localhost:8000/health

On first launch, the ollama-init container will pull the qwen2.5:3b model. This may take several minutes depending on your connection speed.

Local Development

# Install dependencies
pip install -r requirements.txt

# Start Ollama locally
ollama serve

# Pull the model
ollama pull qwen2.5:3b

# Start the AI server
python main.py

The AI server starts on port 8000 and the Lua server on port 8001. Both are required for action requests to work.

Configuration

The system is configured through a combination of environment variables and text configuration files.

Environment Variables

Variable                Default                 Description
DEBUG                   true                    Enables debug logging and hot-reload
OLLAMA_HOST             http://localhost:11434  Ollama API endpoint
LUA_SERVER_URL          http://localhost:8001   Lua server endpoint for script dispatch
HISTORY_DIR             history                 Directory for per-user chat history files
CHROMA_PATH             chroma_db               ChromaDB persistent storage path
RAG_DATA_DIR            rag_data_base           Directory containing PDF documents to index
RAG_DISTANCE_THRESHOLD  0.8                     Maximum cosine distance for RAG results
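As an illustration, the variables above could be read into a typed settings object as follows. The Settings class, the load_settings() helper, and the way DEBUG is parsed are assumptions for this sketch; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    debug: bool
    ollama_host: str
    lua_server_url: str
    history_dir: str
    chroma_path: str
    rag_data_dir: str
    rag_distance_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        # Assumption: "1", "true" and "yes" (any case) all enable debug mode.
        debug=env.get("DEBUG", "true").strip().lower() in {"1", "true", "yes"},
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        lua_server_url=env.get("LUA_SERVER_URL", "http://localhost:8001"),
        history_dir=env.get("HISTORY_DIR", "history"),
        chroma_path=env.get("CHROMA_PATH", "chroma_db"),
        rag_data_dir=env.get("RAG_DATA_DIR", "rag_data_base"),
        rag_distance_threshold=float(env.get("RAG_DISTANCE_THRESHOLD", "0.8")),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.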

Configuration Files

File                   Purpose
config_system.txt      System prompt for the chat model: defines the AI persona, response guidelines, tone, and safety rules
config_system_lua.txt  System prompt for Lua code generation: defines the reasoning process and output rules
models_config.txt      List of available Ollama models, auto-synced on startup
history_config.txt     Per-user and default history limits (format: username:limit or default:50)
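To make the history_config.txt format concrete, here is a minimal parser sketch. The function names are hypothetical, and the handling of blank lines, "#" comments, and malformed entries is an assumption; only the username:limit and default:50 shapes come from the description above.

```python
from typing import Dict, Iterable


def parse_history_limits(lines: Iterable[str], fallback: int = 50) -> Dict[str, int]:
    """Parse "name:limit" lines into a dict that always has a "default" key."""
    limits = {"default": fallback}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        name, sep, value = line.partition(":")
        if not sep or not value.strip().isdigit():
            continue  # assumption: malformed entries are skipped, not fatal
        limits[name.strip()] = int(value)
    return limits


def limit_for(username: str, limits: Dict[str, int]) -> int:
    """Return the user's own limit, or the default when none is configured."""
    return limits.get(username, limits["default"])
```

For example, a file containing default:50 and operator1:200 would give operator1 a limit of 200 and every other user a limit of 50.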

API Reference

The AI server exposes a REST API on port 8000. All endpoints accept and return JSON unless noted otherwise.

POST /chat

Main chat endpoint. Streams an AI response as text/plain. If the prompt contains an action, Lua code is generated and dispatched in the background.

{
  "username": "operator1",
  "model": "qwen2.5:3b",
  "prompt": "What is XSS and block port 80"
}

The response is streamed token-by-token. For mixed intents, the stream begins with an [Action dispatched] or [Action not supported] banner, followed by [Sources] if RAG documents were used.
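A client could consume the stream with nothing more than the Python standard library. This is a sketch: build_chat_request() and stream_chat() are hypothetical helper names, and the base URL assumes the default local deployment on port 8000.

```python
import codecs
import json
import urllib.request
from typing import Iterator


def build_chat_request(username: str, model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /chat request with the JSON body shown above."""
    body = json.dumps({"username": username, "model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_chat(request: urllib.request.Request, chunk_size: int = 256) -> Iterator[str]:
    """Yield decoded text as it arrives instead of waiting for the full body."""
    # An incremental decoder copes with multi-byte characters split across reads.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if text:
                yield text
```

Printing each piece yielded by stream_chat(build_chat_request("operator1", "qwen2.5:3b", "What is XSS and block port 80")) with end="" and flush=True would show the banner first and then the answer as it is generated, provided the server is running.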

POST /add_model

Pulls a new model via Ollama and registers it in models_config.txt.

{ "model_name": "llama3:8b" }

POST /del_model

Removes a model from Ollama and from the configuration.

{ "model_name": "llama3:8b" }

POST /sync_models

Re-syncs models_config.txt with all models currently installed in Ollama. No body required.

POST /clear_history

Clears the conversation history for a specific user.

{ "username": "operator1" }

POST /reindex

Forces a full re-index of all PDF documents in rag_data_base/. Clears the existing ChromaDB collection and rebuilds it from scratch.

GET /health

Returns the server health status, available models, and RAG chunk count.

{
  "status": "healthy",
  "available_models": ["qwen2.5:3b"],
  "model_count": 1,
  "rag_chunks": 347
}

RAG Pipeline

The Retrieval-Augmented Generation pipeline indexes PDF documents and injects relevant context into the AI's prompt to ground answers in authoritative sources.

Indexed Documents

Place PDF files in the rag_data_base/ directory. The system ships with the following reference documents:

How It Works

On startup, the RAGManager scans rag_data_base/ for PDFs, extracts text with PyPDF2, splits it into overlapping chunks (1200 chars, 200 overlap), and indexes them in a ChromaDB collection using cosine similarity.
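The chunking step can be illustrated with a small, self-contained function. The sizes match the figures above (1200 characters with a 200-character overlap); the function name and the exact boundary handling are assumptions, and the actual rag_manager.py may differ.

```python
from typing import List


def chunk_text(text: str, size: int = 1200, overlap: int = 200) -> List[str]:
    """Split text into chunks of up to `size` chars that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each chunk starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk.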

At query time, the top 5 most relevant chunks are retrieved and filtered by a distance threshold (default 0.8). Chunks that pass are injected into the system prompt alongside source attribution, so the model can cite references like "According to NIST SP 800-53…".
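The filtering step can be sketched independently of ChromaDB. Here each hit is assumed to be a (text, source, distance) tuple already returned by the vector store; the helper names, the inclusive comparison against the threshold, and the "[Source: ...]" formatting are illustrative assumptions.

```python
from typing import List, Tuple

Hit = Tuple[str, str, float]  # (chunk text, source file name, cosine distance)


def select_context(hits: List[Hit], top_k: int = 5, max_distance: float = 0.8) -> List[Hit]:
    """Keep the top_k nearest hits, then drop any beyond the distance threshold."""
    nearest = sorted(hits, key=lambda h: h[2])[:top_k]
    return [h for h in nearest if h[2] <= max_distance]


def format_context(hits: List[Hit]) -> str:
    """Render the surviving chunks with source attribution for the system prompt."""
    return "\n\n".join(f"[Source: {source}]\n{text}" for text, source, _ in hits)
```

A lower cosine distance means a closer match, so raising RAG_DISTANCE_THRESHOLD admits more loosely related chunks and lowering it makes retrieval stricter.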

The ChromaDB index is persisted across restarts via Docker volumes. PDFs are only re-extracted when the index is empty or when a /reindex request is made.

Lua Generation

When the intent classifier detects an action request, the AI generates Lua code and dispatches it to the WormSec backend for execution. The process is fully automated and transparent to the user.

How It Works

Before generating code, the AI fetches the list of available Lua functions from the backend's context endpoint. It then builds a system prompt that includes all function names and descriptions, allowing the model to reason about which functions to call.
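One way to picture the prompt-building step: the function list is appended to the base prompt as a bulleted catalogue. The shape of the backend response (a list of objects with name and description keys) and the helper name are assumptions for this sketch, since the context endpoint's schema is defined by the backend.

```python
from typing import Dict, List


def build_lua_system_prompt(base_prompt: str, functions: List[Dict[str, str]]) -> str:
    """Append a name/description catalogue of Lua functions to the base prompt."""
    lines = [base_prompt.rstrip(), "", "Available functions:"]
    # Sorting keeps the prompt stable regardless of the order the backend returns.
    for fn in sorted(functions, key=lambda f: f["name"]):
        lines.append(f"- {fn['name']}: {fn['description']}")
    return "\n".join(lines)
```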

The Lua generation uses a dedicated system prompt (config_system_lua.txt) with few-shot examples. The model outputs raw function calls — no markdown, no prose. The output is then sanitised: markdown fences are stripped, non-Lua markers (bash, Python, etc.) are rejected, and prose-like lines are discarded.
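The sanitisation pass might look roughly like this. The marker list, the regular expression used to recognise a function call, and the choice to return None on rejection are all assumptions for illustration; the real rules in main.py are likely more thorough.

```python
import re
from typing import Optional

FENCE = "`" * 3  # markdown code fence marker
# Assumed markers that indicate bash or Python output rather than Lua.
NON_LUA_MARKERS = ("#!/bin/bash", "import ", "def ", "sudo ", "echo ")
# A bare function call such as close_port(22, "tcp"), with no space before "(".
CALL_LINE = re.compile(r"^[A-Za-z_]\w*\(.*\)\s*;?$")


def sanitise_lua(raw: str) -> Optional[str]:
    """Return the cleaned Lua calls, or None if nothing usable remains."""
    calls = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(FENCE):
            continue  # drop blank lines and markdown fences
        if stripped.startswith(NON_LUA_MARKERS):
            return None  # bash or Python style output is rejected outright
        if CALL_LINE.match(stripped):
            calls.append(stripped)
        # anything else is treated as prose and discarded
    return "\n".join(calls) or None
```

Rejecting the whole output when a non-Lua marker appears is the conservative choice: a partially salvaged script is riskier to dispatch than no script at all.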

Script Assembly

The final script includes the full function definitions (fetched from the backend) for every called function, making each script self-contained. If no known function matches the request, the action is marked as unsupported and no script is dispatched.

-- WormSec Security Automation Script

function close_port(port, protocol)
  firewall:add_rule("deny", port, protocol or "tcp")
  return true
end

-- Execution
close_port(22, "tcp")

The list of available Lua functions is managed by the backend. Refer to the Backend documentation for the full function reference.
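The assembly step described above could be sketched as follows. The assemble_script() helper and the way called functions are detected (a simple regex over the generated calls) are assumptions; the header and "-- Execution" marker mirror the example script.

```python
import re
from typing import Dict, Optional

HEADER = "-- WormSec Security Automation Script"


def assemble_script(calls: str, definitions: Dict[str, str]) -> Optional[str]:
    """Prepend the definition of every called function; None means unsupported."""
    called = []
    for name in re.findall(r"\b([A-Za-z_]\w*)\s*\(", calls):
        if name in definitions and name not in called:
            called.append(name)  # keep first-use order, without duplicates
    if not called:
        return None  # no known function matches, so nothing is dispatched
    parts = [HEADER, ""]
    for name in called:
        parts += [definitions[name].strip(), ""]
    parts += ["-- Execution", calls.strip()]
    return "\n".join(parts)
```

Given the close_port definition from the example and the call close_port(22, "tcp"), this produces a script with the same layout as the one shown above.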

What's next

The AI chatbot integrates tightly with the backend's Lua execution layer. To understand how dispatched scripts are executed and how the firewall, IDS, and service management systems work, refer to the Backend documentation.

For details on the frontend chat interface that communicates with this API, refer to the Frontend documentation.
