Ollama Integration

Last updated: January 24, 2026

Hone uses a pluggable AI backend system for local LLM inference. Ollama is the primary supported backend - this document covers its setup and usage.

Two AI Modes

Hone supports two complementary AI modes:

  1. Classification Mode (via OLLAMA_HOST) - Standard Ollama /api/generate endpoint for:

    • Merchant classification (tagging)
    • Merchant name normalization
    • Subscription detection
    • Receipt-to-transaction matching
  2. Agentic Mode (via ANTHROPIC_COMPATIBLE_HOST) - Ollama's Anthropic Messages API for:

    • Tool-calling workflows where the AI can query your financial data
    • Deeper spending anomaly analysis
    • Richer duplicate subscription explanations
    • Explore Mode - conversational interface for querying financial data
    • Any task requiring iterative reasoning with data access

Both modes are optional and can be used independently or together.

Other Backends

Other backends are supported through the same AIBackend trait:

  • OpenAI-compatible - Works with Docker Model Runner, vLLM, LocalAI, llama-server, and any OpenAI-compatible endpoint
  • Mock - For testing

All backends share the same prompts and response types. See "Alternative: OpenAI-Compatible Backend" below for non-Ollama setup.
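
Concretely, a backend plugs in by implementing that trait. The sketch below is illustrative only - the trait name comes from above, but the method names and types are assumptions, not Hone's actual API:

// Illustrative sketch: one shared trait for the Ollama, OpenAI-compatible,
// and mock backends. Method names and types are assumptions.
pub struct Classification {
    pub merchant: String,  // normalized merchant name, e.g. "Trader Joe's"
    pub category: String,  // category label, e.g. "groceries"
    pub confidence: f64,
}

pub trait AIBackend {
    /// Classify a raw bank description into a normalized name and category.
    fn classify_merchant(&self, raw: &str) -> Result<Classification, Box<dyn std::error::Error>>;

    /// Normalize a raw bank description into a clean display name.
    fn normalize_merchant(&self, raw: &str) -> Result<String, Box<dyn std::error::Error>>;
}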

AI features are optional - Hone works without them, falling back to pattern matching and the "Other" category.

How It Works

During import, transactions are processed in two ways:

Tagging

Transactions are tagged in this order:

  1. User rules - Exact/contains/regex patterns you define
  2. Auto-patterns - Built-in patterns on root tags (e.g., "WALMART" → Groceries)
  3. Bank category - Uses bank-provided category if available (e.g., Amex "Transportation-Fuel" → Transport)
  4. Ollama - LLM classifies the merchant name
  5. Fallback - Unclassified transactions tagged as "Other"

Bank categories are mapped to Hone tags with 0.75 confidence. Supported mappings include:

  • Transportation-Fuel → Transport > Gas
  • Transportation-Parking → Transport > Parking
  • Transportation-Tolls → Transport > Tolls
  • Transportation-Auto Services → Transport > Auto
  • Transportation-* → Transport (generic)
  • Restaurant-* → Dining
  • *-Groceries → Groceries
  • Entertainment-Associations → Personal > Fitness (sports clubs, gyms)
  • Entertainment-* → Entertainment (other)
  • Airlines-*, Lodging-* → Travel
  • Healthcare-*, Pharmacy-* → Healthcare
  • Fees & Adjustments-*, Financial-* → Financial

Generic categories like "Merchandise & Supplies-Internet Purchase" fall through to Ollama for merchant-based classification.
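
For illustration, that mapping boils down to a fixed prefix lookup. A minimal sketch in Rust (the function name and structure are assumptions; only the mappings and the 0.75 confidence come from the list above):

// Illustrative sketch of the bank-category mapping described above.
fn map_bank_category(bank_category: &str) -> Option<(&'static str, f64)> {
    const CONFIDENCE: f64 = 0.75; // bank-category matches use a fixed 0.75 confidence
    let tag = match bank_category {
        "Transportation-Fuel" => "Transport > Gas",
        "Transportation-Parking" => "Transport > Parking",
        "Transportation-Tolls" => "Transport > Tolls",
        "Transportation-Auto Services" => "Transport > Auto",
        c if c.starts_with("Transportation-") => "Transport",
        c if c.starts_with("Restaurant-") => "Dining",
        c if c.ends_with("-Groceries") => "Groceries",
        "Entertainment-Associations" => "Personal > Fitness",
        c if c.starts_with("Entertainment-") => "Entertainment",
        c if c.starts_with("Airlines-") || c.starts_with("Lodging-") => "Travel",
        c if c.starts_with("Healthcare-") || c.starts_with("Pharmacy-") => "Healthcare",
        c if c.starts_with("Fees & Adjustments-") || c.starts_with("Financial-") => "Financial",
        _ => return None, // generic categories fall through to Ollama
    };
    Some((tag, CONFIDENCE))
}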

Merchant Normalization

After tagging, Ollama normalizes raw bank descriptions into clean merchant names:

  • AMZN MKTP US*2X7Y9Z → Amazon Marketplace
  • TRADER JOE'S #456 → Trader Joe's
  • SPOTIFY USA*1234567 → Spotify

Normalized names are stored in the merchant_normalized column and used for display, while the original description is preserved for matching.
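
Under the hood this is a plain request to Ollama's /api/generate endpoint (the same endpoint shown under "Manual Testing" below). A rough sketch of such a call, assuming the reqwest and serde_json crates; the prompt wording is illustrative, not Hone's actual prompt:

use serde_json::{json, Value};

// Sketch of a normalization request against Ollama's /api/generate.
// The prompt text is illustrative; Hone's real prompt differs.
fn normalize(host: &str, model: &str, raw: &str) -> Result<String, Box<dyn std::error::Error>> {
    let body = json!({
        "model": model,
        "prompt": format!(
            "Normalize this bank description into a clean merchant name. \
             Reply with the name only.\nDescription: \"{raw}\""
        ),
        "stream": false
    });
    let resp: Value = reqwest::blocking::Client::new()
        .post(format!("{host}/api/generate"))
        .json(&body)
        .send()?
        .json()?;
    // Non-streamed responses return the generated text in the "response" field.
    Ok(resp["response"].as_str().unwrap_or_default().trim().to_string())
}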

Subscription Classification

During subscription detection, Ollama classifies merchants as subscription services or retail:

  • SUBSCRIPTION: Netflix, Spotify, gym memberships, meal kits (HelloFresh, Blue Apron)
  • RETAIL: Grocery stores (Trader Joe's, Costco), gas stations, restaurants

This reduces false positives where regular shopping gets flagged as a "zombie subscription". Classifications are cached in the merchant_subscription_cache table. User exclusions (via "Not a subscription" button) override Ollama classifications.
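
The precedence is: user exclusion first, then the cache, then a fresh Ollama classification. A small illustrative sketch of that ordering (all names here are assumptions, not Hone's actual code):

// Illustrative precedence: user exclusions beat the cache, and the cache
// beats asking Ollama again. Names are assumptions.
#[derive(Clone, Copy)]
enum MerchantKind { Subscription, Retail }

fn resolve_kind(
    user_excluded: bool,              // "Not a subscription" was clicked for this merchant
    cached: Option<MerchantKind>,     // row in merchant_subscription_cache, if any
    classify_with_ollama: impl Fn() -> MerchantKind,
) -> MerchantKind {
    if user_excluded {
        return MerchantKind::Retail;
    }
    // A fresh classification would then be written back to the cache.
    cached.unwrap_or_else(classify_with_ollama)
}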

Receipt-to-Transaction Matching

When matching receipts to bank transactions, some matches are "ambiguous" (score 0.5-0.85). For these, Ollama evaluates whether the receipt and transaction are likely the same purchase:

  • Tips: Receipt shows $50, bank shows $60 → Ollama recognizes this as a 20% tip
  • Processing delays: Receipt dated Jan 10, transaction dated Jan 12 → normal for credit cards
  • Merchant name variations: Receipt says "Amazon", bank shows "AMZN MKTP US" → same merchant

The final match score combines algorithmic matching (60%) with Ollama's confidence (40%). Ollama's reasoning is stored in the match factors for transparency in the UI.
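
The weighting itself is simple arithmetic. A tiny sketch (the function name is illustrative):

// Blend the algorithmic match score with Ollama's confidence, 60/40.
fn final_match_score(algorithmic: f64, ollama_confidence: f64) -> f64 {
    0.6 * algorithmic + 0.4 * ollama_confidence
}
// e.g. an ambiguous 0.70 algorithmic score with 0.90 Ollama confidence becomes 0.78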

Duplicate Detection Reasoning

When duplicate subscriptions are detected (e.g., 3 streaming services), Ollama analyzes what they have in common and what makes each unique:

Example output:

What they have in common: All offer on-demand streaming of movies and TV series

What makes each unique:

  • Netflix: International content, mature themes, extensive originals
  • Disney+: Family content, Marvel, Star Wars, Pixar exclusives
  • HBO Max: HBO originals, Warner Bros theatrical releases

This helps you understand whether you truly have redundant services or if each serves a distinct purpose. The analysis is stored in the alert and displayed in the alert detail modal.

Only service names and categories are sent to Ollama (no account info), preserving privacy.

Setup

Option 1: Remote Ollama Server (Recommended)

Run Ollama on a more powerful machine (e.g., Windows PC with NVIDIA GPU) and point Hone to it:

# On your Windows machine with GPU:
# 1. Install Ollama from https://ollama.com/download
# 2. Pull a model: ollama pull llama3.2
# 3. Ollama listens on localhost:11434 by default

# To expose to network, set environment variable before starting Ollama:
# OLLAMA_HOST=0.0.0.0:11434

# In Hone's .env on the Pi:
OLLAMA_HOST=http://192.168.1.100:11434  # Your Windows machine's IP

This gives you fast GPU-accelerated inference while keeping Hone on the Pi.

Option 2: Ollama on the Same Machine

If running Hone directly on a machine (not in Docker), you can run Ollama locally:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2

# Ollama runs on port 11434 by default

Set environment variables:

export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2  # optional, defaults to llama3.2

Model Recommendations

Text Models (Classification, Normalization, Entity Suggestions)

Model        VRAM  Speed      Quality    Notes
gemma3       ~4GB  Fast       Excellent  Recommended - best instruction following
llama3.2:3b  ~3GB  Fast       Good       Alternative option
llama3.2:1b  ~2GB  Very fast  Decent     Fastest option

Why gemma3? It excels at following formatting instructions precisely, which is critical for merchant normalization (preserving apostrophes in brand names like "Trader Joe's", "McDonald's").

Vision Models (Receipt Parsing)

Model                VRAM   Speed   Quality  Notes
llama3.2-vision:11b  ~8GB   Medium  Good     Recommended for receipts
llava:13b            ~10GB  Medium  Good     Alternative vision model
llava:7b             ~5GB   Fast    Decent   Smaller/faster option

Quick Start

# On your Windows machine with GPU:

# Required: text model for classification and normalization
ollama pull gemma3

# Required: vision model for receipt parsing
ollama pull llama3.2-vision:11b

Environment Variables

# In your .env or shell
export OLLAMA_HOST=http://<windows-ip>:11434
export OLLAMA_MODEL=gemma3                     # Default text model
export OLLAMA_VISION_MODEL=llama3.2-vision:11b  # Vision model for receipts

With NVIDIA GPU: Use the recommended models above. GPU inference is ~10-50x faster than CPU.

On Raspberry Pi (CPU only): Stick with llama3.2:1b for text, skip vision (too slow).

Agentic Mode Setup

Ollama 0.14+ supports the Anthropic Messages API, enabling tool-calling workflows. This lets the AI dynamically query your transaction data for richer analysis.

Why Use Agentic Mode?

Feature             Classification Only                    With Agentic Mode
Spending anomalies  Pre-assembled context, single prompt   AI investigates transactions, finds patterns
Duplicate analysis  Lists services with overlap            AI checks usage frequency, suggests which to cancel
Explanations        Based on provided data only            Can query for additional context

Configuration

# In your .env or shell (in addition to OLLAMA_HOST)
export ANTHROPIC_COMPATIBLE_HOST=http://<ollama-ip>:11434
export ANTHROPIC_COMPATIBLE_MODEL=llama3.1  # Must support tool calling

Recommended models for agentic tasks:

  • llama3.3 - Recommended - best instruction following, handles empty data gracefully
  • llama3.1 - Good alternative, faster than 3.3 but less precise with complex prompts
  • llama3.1:70b or llama3.3:70b - Better reasoning, requires more VRAM
  • qwen3-coder - Works but may require XML fallback parsing
  • Any model with 32K+ context that supports function calling

Performance note: Agentic queries typically take 15-30+ seconds due to the multi-turn workflow (initial LLM call → tool execution → response processing). Larger models like llama3.3 produce better responses but are slower. This is a trade-off between quality and speed.

How It Works

When agentic mode is enabled, certain analysis tasks use a multi-turn workflow:

  1. Hone sends a prompt with available tools (search_transactions, get_merchants, etc.)
  2. The AI decides what data it needs and calls tools
  3. Hone executes the tools and returns results
  4. The AI reasons about the data and may call more tools
  5. When done, the AI returns a final explanation
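
A hedged sketch of that loop in Rust, assuming the reqwest and serde_json crates and that the host mirrors Anthropic's /v1/messages path (run_tool stands in for Hone's real tool handlers; none of these names are Hone's actual code):

use serde_json::{json, Value};

// Illustrative multi-turn tool loop against an Anthropic-style Messages API.
fn agentic_loop(
    host: &str,
    model: &str,
    prompt: &str,
    tools: &Value,
    run_tool: impl Fn(&str, &Value) -> Value,
) -> Result<String, Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let mut messages = vec![json!({ "role": "user", "content": prompt })];
    loop {
        let resp: Value = client
            .post(format!("{host}/v1/messages"))
            .json(&json!({
                "model": model,
                "max_tokens": 1024,
                "messages": &messages,
                "tools": tools
            }))
            .send()?
            .json()?;
        let content = resp["content"].as_array().cloned().unwrap_or_default();
        if resp["stop_reason"] == "tool_use" {
            // Execute every requested tool and hand the results back to the model.
            let results: Vec<Value> = content
                .iter()
                .filter(|block| block["type"] == "tool_use")
                .map(|block| json!({
                    "type": "tool_result",
                    "tool_use_id": block["id"].clone(),
                    "content": run_tool(
                        block["name"].as_str().unwrap_or(""),
                        &block["input"],
                    ).to_string()
                }))
                .collect();
            messages.push(json!({ "role": "assistant", "content": content }));
            messages.push(json!({ "role": "user", "content": results }));
        } else {
            // Final answer: concatenate the text blocks.
            return Ok(content
                .iter()
                .filter_map(|block| block["text"].as_str())
                .collect::<Vec<_>>()
                .join("\n"));
        }
    }
}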

Example: Spending Anomaly Analysis

AI: "I'll check the transactions for this category..."
    [calls get_merchants tool]
AI: "I see 3 new merchants this month. Let me look at the largest transactions..."
    [calls search_transactions tool]
AI: "The increase is due to:
    1. New gym membership at Equinox ($150/month)
    2. 4 DoorDash orders totaling $180 (vs $50 last month)
    3. One-time purchase at REI for $320"

Tools Available to the AI

Tool                  Description
search_transactions   Search by query, date range, tag, or amount
get_spending_summary  Spending breakdown by category
get_subscriptions     List subscriptions with status and amounts
get_alerts            Active waste detection alerts
compare_spending      Compare spending between periods
get_merchants         Top merchants by spending amount
get_account_summary   Overview of all accounts

All tool calls stay local—data never leaves your network.
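
For reference, tools in the Anthropic Messages format are described with JSON schemas. A hedged sketch of what the search_transactions definition could look like (the parameter names are assumptions; only the tool name and description come from the table above):

use serde_json::json;

fn main() {
    // Illustrative tool definition; parameter names are assumptions.
    let search_transactions = json!({
        "name": "search_transactions",
        "description": "Search by query, date range, tag, or amount",
        "input_schema": {
            "type": "object",
            "properties": {
                "query":      { "type": "string", "description": "Free-text merchant search" },
                "start_date": { "type": "string", "description": "ISO date, e.g. 2026-01-01" },
                "end_date":   { "type": "string" },
                "tag":        { "type": "string" },
                "min_amount": { "type": "number" }
            }
        }
    });
    println!("{search_transactions}");
}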

CLI Output

When running detection, you'll see which AI capabilities are active:

$ hone detect
🔍 Running waste detection...
   🤖 AI backend enabled (full: classification + agentic analysis)
   Mode: All detection types

Possible modes:

  • full: classification + agentic analysis - Both backends configured
  • agentic analysis only - Only orchestrator configured
  • classification only - Only Ollama generate API configured
  • Tips shown if neither is configured

Explore Mode

Explore Mode provides a conversational interface for querying your financial data. Navigate to the Explore page and ask questions like:

  • "What did I spend last month?"
  • "Show my subscriptions"
  • "Any zombie subscriptions?"
  • "Compare this month to last month"

Features:

  • Multi-turn conversations with session persistence (30-minute timeout)
  • Model selector to switch between available Ollama models at runtime
  • All queries tracked in AI Metrics as explore_query operations
  • Tool call tracking: view which tools were called, their inputs, and outputs in AI Metrics detail view

The AI uses the same tools listed above to answer your questions, dynamically querying your data as needed.

Testing the Connection

Using the CLI

# Test Ollama connection and run sample classifications
hone ollama test

# Test with a specific merchant
hone ollama test --merchant "TARGET #1234 AUSTIN TX"

# Test receipt parsing with an image
hone ollama test --receipt /path/to/receipt.jpg

Manual Testing

# Check Ollama is running
curl http://localhost:11434/api/tags

# Test a classification (what Hone does internally)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Classify this merchant name and return JSON only:\nMerchant: \"NETFLIX.COM*1234\"\n\nReturn format:\n{\"merchant\": \"normalized name\", \"category\": \"category\"}\n\nCategories: streaming, music, cloud_storage, fitness, news, food_delivery, shopping, utilities, other",
  "stream": false
}'

Category Mapping

Ollama returns categories that Hone maps to your tag hierarchy:

Ollama Category                        Hone Tag
streaming, music, cloud_storage, news  Subscriptions
food_delivery, restaurant, dining      Dining
groceries                              Groceries
shopping                               Shopping
utilities                              Utilities
transport, gas, rideshare              Transport
entertainment                          Entertainment
travel, hotel, airline                 Travel
healthcare, pharmacy                   Healthcare
income, salary, deposit                Income
housing, rent, mortgage                Housing
gifts                                  Gifts
financial, bank, investment            Financial
fitness                                Personal
other                                  Other
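
As a plain lookup, that table reads roughly like the following match (illustrative; Hone's actual implementation may differ):

// Illustrative mapping of Ollama category strings onto Hone tags,
// mirroring the table above.
fn ollama_category_to_tag(category: &str) -> &'static str {
    match category {
        "streaming" | "music" | "cloud_storage" | "news" => "Subscriptions",
        "food_delivery" | "restaurant" | "dining" => "Dining",
        "groceries" => "Groceries",
        "shopping" => "Shopping",
        "utilities" => "Utilities",
        "transport" | "gas" | "rideshare" => "Transport",
        "entertainment" => "Entertainment",
        "travel" | "hotel" | "airline" => "Travel",
        "healthcare" | "pharmacy" => "Healthcare",
        "income" | "salary" | "deposit" => "Income",
        "housing" | "rent" | "mortgage" => "Housing",
        "gifts" => "Gifts",
        "financial" | "bank" | "investment" => "Financial",
        "fitness" => "Personal",
        _ => "Other",
    }
}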

Disabling Ollama

Simply don't set OLLAMA_HOST. Hone will skip LLM classification and rely on rules and patterns.

Performance Notes

  • First request to Ollama loads the model into memory (~10-30 seconds)
  • Subsequent requests are fast (~1-3 seconds per merchant on Pi)
  • Ollama caches models in memory, so keep it running for best performance
  • On resource-constrained systems, imports may be slower with Ollama enabled

Troubleshooting

Ollama not responding:

# Check if running
curl http://localhost:11434/api/tags

# Check logs
docker logs ollama  # if using Docker
journalctl -u ollama  # if using systemd

Out of memory on Pi:

  • Use a smaller model (llama3.2:1b)
  • Increase swap space
  • Run Ollama on a separate machine

Slow classification:

  • Normal on first request (model loading)
  • Consider llama3.2:1b for faster responses
  • Check Pi temperature (may throttle if hot)

Wrong classifications:

  • Add user rules for frequently misclassified merchants
  • Rules take priority over Ollama

Alternative: OpenAI-Compatible Backend

If you prefer to use Docker Model Runner, vLLM, LocalAI, llama-server, or another OpenAI-compatible server instead of Ollama, Hone supports this via the openai_compatible backend.

When to Use

  • Docker Model Runner: Runs in Docker Desktop, easy setup on Windows/Mac
  • vLLM: High-performance inference server, great for GPUs
  • LocalAI: Self-hosted, supports GGUF models
  • llama-server (llama.cpp): Lightweight server for llama.cpp models

All of these expose an OpenAI-compatible /v1/chat/completions endpoint.

Configuration

# Select the OpenAI-compatible backend
export AI_BACKEND=openai_compatible  # or: vllm, localai, llamacpp (all aliases)

# Required: server URL
export OPENAI_COMPATIBLE_HOST="http://192.168.1.100:8080"

# Optional: model name (default: gpt-3.5-turbo)
export OPENAI_COMPATIBLE_MODEL="llama3.2"

# Optional: API key if your server requires authentication
export OPENAI_COMPATIBLE_API_KEY="your-api-key"

Example: Docker Model Runner

# On your Windows/Mac machine with Docker Desktop:
# 1. Enable Docker Model Runner in Docker Desktop settings
# 2. Pull a model: docker model pull llama3.2
# 3. It runs on port 8080 by default

# In Hone's .env on the Pi:
AI_BACKEND=openai_compatible
OPENAI_COMPATIBLE_HOST=http://192.168.1.100:8080
OPENAI_COMPATIBLE_MODEL=llama3.2

Example: vLLM

# On your GPU server:
# pip install vllm
# vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8000

# In Hone's .env:
AI_BACKEND=vllm
OPENAI_COMPATIBLE_HOST=http://gpu-server:8000
OPENAI_COMPATIBLE_MODEL=meta-llama/Llama-3.2-3B-Instruct

Example: LocalAI

# On your server:
# docker run -p 8080:8080 localai/localai:latest

# In Hone's .env:
AI_BACKEND=localai
OPENAI_COMPATIBLE_HOST=http://server:8080
OPENAI_COMPATIBLE_MODEL=gpt-3.5-turbo  # LocalAI's default model name

Vision Models

For receipt parsing, you need a vision-capable model. Set it separately:

export OPENAI_COMPATIBLE_VISION_MODEL="llava:13b"  # or any vision model your server supports

Note: Not all OpenAI-compatible servers support vision. Check your server's documentation.

Testing the Connection

# Check if server is running
curl http://your-server:8080/v1/models

# Test a completion
curl http://your-server:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}]
  }'