Financial professionals live inside documents. Balance sheets, quarterly reports, invoices, and payroll exports are the raw materials of decision-making, but they’re also a source of friction. Answers to seemingly simple questions — “What were our top three expenses last quarter?” or “How much did we spend with Vendor X over the last year?” — often require manual searches across multiple PDFs and spreadsheets.
AI has the potential to change this workflow, but two barriers stand in the way:
- Data sensitivity. Most financial records cannot be shared with third-party APIs without raising compliance, confidentiality, or data governance issues. This rules out simply uploading them to OpenAI or Anthropic.
- Operational complexity. Even if an LLM is available, turning it into a usable tool that integrates with financial workflows requires more than a chat interface. It needs structured ingestion, retrieval, calculation tools, and a way to deploy securely.
In this tutorial, we build a Financial Document Copilot designed for privacy and scalability. The solution starts locally with Ollama (to run open-source LLMs like Mistral on your own machine) and Qdrant (to index documents in a vector database). We then layer in LangChain to handle ingestion and retrieval, and n8n to orchestrate queries and outputs. Finally, we show how the prototype can be deployed to Knolli, where it becomes a production-ready copilot with authentication, monitoring, and monetization options.
By the end, you will have a working system that:
- Accepts financial documents in formats such as PDF, CSV, and Excel.
- Preprocesses and embeds these documents for semantic search.
- Answers natural language questions with precise citations back to the source files.
- Performs calculations with sandboxed tools.
- Runs fully on local infrastructure before scaling to a multi-user cloud deployment.
This is not a toy demo. It is a reference implementation that a finance team, consultant, or CTO could deploy today. Although our focus here is financial data, the architecture applies equally to HR handbooks, legal contracts, or customer support knowledge bases.
System Architecture
Before writing a line of code, it helps to see the copilot as a system. We are not building a chatbot — we are building a pipeline that ingests financial data, transforms it into a machine-readable format, reasons over it with a local LLM, and returns answers that a CFO can trust.
At a high level, the Financial Document Copilot is made up of five subsystems:
- Inference Engine (Ollama). Runs an open-source LLM like Mistral on local hardware. This ensures that no sensitive data leaves your environment.
- Vector Database (Qdrant). Stores embeddings of your financial documents so the copilot can retrieve only the relevant passages for each question.
- Retrieval and Prompt Layer (LangChain). Handles document chunking, metadata tagging, semantic search, and prompt templating. This layer is what makes the LLM “aware” of your data (a minimal ingestion sketch follows this list).
- Orchestration (n8n). Manages the input/output flow: a CFO asks a question via Slack or a web form, the query passes through the retrieval layer, and the result is returned in a structured format.
- Deployment and Control (Knolli). Turns the prototype into a production system with authentication, monitoring, and monetization features. Knolli lets you manage who can access the copilot, track usage, and even resell it as a product.
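Before tracing the data flow, it helps to see what ingestion looks like in code. The sketch below is a minimal version under a few assumptions: the langchain-community, langchain-ollama, and langchain-qdrant integration packages are installed, Ollama and Qdrant are running locally on their default ports, and reports/q2_expenses.pdf stands in for one of your own files.

```python
# Minimal ingestion sketch: load a PDF, chunk it, embed the chunks locally,
# and upsert them into a Qdrant collection. Package names and the embedding
# model ("nomic-embed-text") are assumptions -- swap in whatever you use.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore

# Load and chunk. The loader attaches source/page metadata to each chunk,
# which is what later lets the copilot cite its answers.
docs = PyPDFLoader("reports/q2_expenses.pdf").load()  # hypothetical file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
).split_documents(docs)

# Embed with a local Ollama model and write to the "financial_docs" collection.
QdrantVectorStore.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    url="http://localhost:6333",
    collection_name="financial_docs",
)
```

CSV and Excel files follow the same pattern with a different loader (for example, langchain_community's CSVLoader); only the loading step changes.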
Data Flow
Here’s how the pieces fit together:
```text
User Query --> n8n Webhook --> LangChain Retriever
           --> Qdrant (financial_docs collection)
           --> Top-k Chunks --> Ollama (Mistral 7B via API)
           --> Tool Calls (math, export, filtering) --> Response
           --> n8n Output --> Slack / Knolli UI
```
- A CFO asks: “What were our top three expenses in Q2?”
- The query hits an n8n webhook, which forwards it to LangChain.
- LangChain embeds the query and retrieves the top k most relevant chunks from Qdrant (k = 5 here).
- The retrieved context is combined with the query inside a carefully designed system prompt, then passed to Ollama for inference; this retrieval-and-prompt step is sketched after the list.
- If the query requires calculations, the model requests a math tool call, which the orchestration layer executes in a sandbox; if it needs structured output, it can call an export tool. A minimal sandboxed math tool is also sketched below.
- The final answer is returned via n8n as a Slack message, a JSON API response, or directly in the Knolli UI.
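Here is a minimal query-time sketch of the retrieval, prompting, and inference steps above, under the same assumptions as the ingestion sketch (langchain-ollama and langchain-qdrant installed, the same embedding model used at ingestion time, and Mistral pulled into Ollama):

```python
# Query-time sketch: embed the question, pull the top-k chunks from Qdrant,
# and ask the local model to answer from that context with citations.
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

store = QdrantVectorStore(
    client=QdrantClient(url="http://localhost:6333"),
    collection_name="financial_docs",
    embedding=OllamaEmbeddings(model="nomic-embed-text"),  # must match ingestion
)
llm = ChatOllama(model="mistral", temperature=0)  # deterministic answers

question = "What were our top three expenses in Q2?"
docs = store.similarity_search(question, k=5)

# Prefix each chunk with its source so the model can cite it.
context = "\n\n".join(
    f"[{d.metadata.get('source', 'unknown')}] {d.page_content}" for d in docs
)
prompt = (
    "You are a financial analyst. Answer ONLY from the context below, "
    "and cite the source file for every figure.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(llm.invoke(prompt).content)
```

And the math tool. One safe pattern, assumed here rather than prescribed, is to avoid eval and exec entirely and instead parse the model's requested expression with Python's ast module, allowing only arithmetic nodes:

```python
# Sandboxed calculator: only numeric literals and basic arithmetic survive
# the whitelist, so a model-requested calculation cannot execute code.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("(18250.75 + 9420.10) / 2"))  # 13835.425
```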
Why These Components?
- Ollama – Lightweight, fast, and supports quantized models that can run on commodity hardware. It also provides an OpenAI-compatible API, which makes integration with LangChain trivial (see the snippet after this list).
- Qdrant – A production-grade vector database that supports metadata filtering, sharding, and horizontal scale. It’s well suited for financial document stores that will grow over time.
- LangChain – Provides ready-made abstractions for loaders, chunkers, retrievers, and chains. Without it, you’d be writing your own preprocessing and query orchestration logic.
- n8n – No-code/low-code automation that allows us to quickly connect the copilot to real interfaces like Slack, email, or REST APIs. This turns the copilot into a usable tool, not just a Jupyter notebook.
- Knolli – The bridge from prototype to production. Knolli adds user management, dashboards, monitoring, and monetization — the things most engineers skip but every business user demands.
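The OpenAI-compatible API is worth seeing once, because it means any OpenAI-style client or framework can point at your local machine. A quick sanity check, assuming Ollama is running and mistral has been pulled (the API key is required by the client but ignored by Ollama):

```python
# The stock `openai` client against a local Ollama server: only the
# base_url (and a dummy api_key) differ from a hosted deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "In one sentence: what is a balance sheet?"}],
)
print(resp.choices[0].message.content)
```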
Deployment Modes
You can think of this copilot as living in three phases:
- Local prototype. Everything runs on a developer laptop. Great for experimenting with ingestion and query quality.
- Team deployment. Ollama + Qdrant run on a private server (on-prem or VPC). n8n exposes an internal API (see the snippet after this list), and the copilot is used by a small finance team.
- Production rollout. The workflow is uploaded to Knolli, where it gains authentication, monitoring, and monetization features. At this stage, it’s a real product, not just an internal experiment.
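In the team-deployment phase, the "internal API" is simply an n8n Webhook node reachable inside your network. The URL and response shape below are illustrative assumptions; both depend on how you configure the Webhook and Respond to Webhook nodes.

```python
# Calling the copilot from any internal script or service.
# The hostname, path, and JSON shape are hypothetical.
import requests

resp = requests.post(
    "http://n8n.internal:5678/webhook/copilot",
    json={"question": "How much did we spend with Vendor X over the last year?"},
    timeout=120,  # local inference can be slow on the first token
)
resp.raise_for_status()
print(resp.json()["answer"])
```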
By the end of this section, you should have a clear mental model: Ollama is the brain, Qdrant is the memory, LangChain is the reasoning layer, n8n is the nervous system, and Knolli is the body that makes it all usable.

