Data Engineering at the University of Florida
This guide walks you through moving a NavigatorAI workflow (the one exercised by cegme/navigator-cli) onto a local Ollama server running on a HiPerGator compute node.
The end state is an OpenAI-compatible endpoint at http://localhost:11434/v1 that your existing navigator-cli and MCP code can hit with one URL change.
NavigatorAI is the right default for assignments because it handles hosting, routing, and auth. Local Ollama becomes attractive once you need more control over the models and hardware than the hosted endpoint offers.
Ollama speaks the same chat/completions schema as NavigatorAI, so every --system, --model, and --mcp-server flag in navigator-cli keeps working.
Only the base URL and API key change.
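As a preview, assuming the environment-variable patch described later in this guide, the entire swap is two exports:

# Hosted NavigatorAI (the default)
export NAVIGATOR_BASE_URL=https://api.ai.it.ufl.edu/v1
export NAVIGATOR_API_KEY=<your NavigatorAI key>

# Local Ollama through the SSH tunnel set up later in this guide
export NAVIGATOR_BASE_URL=http://localhost:11434/v1
export NAVIGATOR_API_KEY=ollama   # any non-empty string; Ollama ignores it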
You will need:
- Access to the cis6930 account and QoS on HiPerGator.
- SSH access to hpg.rc.ufl.edu from your laptop.
- A working cegme/navigator-cli and familiarity with its options (see the NavigatorAI Setup guide).

Home directories on HiPerGator are capped near 40 GB and throttled for I/O.
Ollama models are large: Llama 3.1 8B is about 4.7 GB, Qwen2.5 14B is about 8 GB, and Llama 3.1 70B is about 40 GB on disk.
Store both the binary and the model cache under the class blue allocation at /blue/cis6930, which has the quota and the throughput for this.
# Run once from a login node
mkdir -p /blue/cis6930/$USER/ollama/bin
mkdir -p /blue/cis6930/$USER/ollama/models
Add the following to ~/.bashrc on HiPerGator so every shell picks up the right paths:
export OLLAMA_HOME=/blue/cis6930/$USER/ollama
export OLLAMA_MODELS=$OLLAMA_HOME/models
export PATH=$OLLAMA_HOME/bin:$PATH
Reload the shell with source ~/.bashrc and confirm echo $OLLAMA_MODELS points into /blue/cis6930.
If this variable is unset when Ollama runs, it will drop models into ~/.ollama and blow past your home quota.
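A small guard in the same shell catches the unset case before it costs you quota. This is only a sanity check, not part of the official setup:

# Warn if the model cache would fall back to ~/.ollama
if [[ "${OLLAMA_MODELS:-}" != /blue/cis6930/* ]]; then
  echo "OLLAMA_MODELS is not pointing at /blue/cis6930 -- fix ~/.bashrc first" >&2
else
  echo "Model cache: $OLLAMA_MODELS"
fi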
The official curl | sh installer writes into /usr/local, which regular HiPerGator users cannot touch.
Grab the static Linux tarball directly instead:
cd $OLLAMA_HOME/bin
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama.tgz
tar -xzf ollama.tgz
rm ollama.tgz
./ollama --version
The extracted binary lands in $OLLAMA_HOME/bin, which the earlier export already placed on your PATH.
Run which ollama from a fresh shell to confirm the login shell resolves it from /blue/cis6930.
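Given the layout above, the expected resolution looks like this (your GatorLink replaces <gatorlink>):

which ollama
# /blue/cis6930/<gatorlink>/ollama/bin/ollama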
Ollama listens on TCP port 11434. Running it on a login node is against HiPerGator policy, so grab an interactive GPU session first. This example asks for one A100 for two hours, which is plenty for 7B-14B models:
srun --account=cis6930 \
--qos=cis6930 \
--partition=gpu \
--gres=gpu:a100:1 \
--ntasks=1 \
--cpus-per-task=4 \
--mem=32gb \
--time=02:00:00 \
--pty bash -i
Once the prompt lands on a compute node, capture the hostname. You will need it to build the SSH tunnel:
hostname # e.g. c0907a-s23.ufhpc
echo $HOSTNAME # same value
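While you are on the node, it is worth confirming the GPU allocation before starting the server; assuming the usual NVIDIA tooling on GPU nodes, a quick check is:

nvidia-smi --query-gpu=name,memory.total --format=csv
# Expect a single A100 line; an error here means the session has no GPU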
Start the server bound to all interfaces on the node so that the login node and your laptop can reach it:
OLLAMA_HOST=0.0.0.0:11434 ollama serve > ~/ollama.log 2>&1 &
The & backgrounds the server and redirects logs to ~/ollama.log.
Leave the shell alive as long as you want the server up.
When you are done, stop it with kill %1 or by exiting the srun session.
The default bind is 127.0.0.1:11434, which blocks the SSH tunnel. Always set OLLAMA_HOST=0.0.0.0:11434 on the compute node.
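Before building the tunnel, confirm from the compute node itself that the server started and is answering; a rough check might be:

# Watch the startup log for bind errors
tail -n 20 ~/ollama.log

# The API should answer locally on the node
curl -s http://localhost:11434/api/version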
From your laptop, open an SSH tunnel that forwards localhost:11434 to the compute node you just captured:
ssh -N -L 11434:c0907a-s23.ufhpc:11434 <gatorlink>@hpg.rc.ufl.edu
Leave that terminal running.
Any request to http://localhost:11434 on your laptop now hits the Ollama server on the HiPerGator GPU node.
Verify the connection with:
curl http://localhost:11434/api/tags
An empty JSON list ({"models":[]}) means the tunnel is good and no models are pulled yet.
Run ollama pull from inside the srun session so the download lands on the compute node and into $OLLAMA_MODELS:
ollama pull llama3.1:8b
ollama pull qwen2.5:7b
ollama pull mistral-nemo:12b
Confirm the cache is writing to the blue allocation and not your home directory:
ollama list
du -sh $OLLAMA_MODELS
realpath $OLLAMA_MODELS # should start with /blue/cis6930
If du reports anything under ~/.ollama, stop the server, fix OLLAMA_MODELS, and rerun the pulls.
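If the cache did land under ~/.ollama, one recovery sketch, assuming the server is the background job %1 started earlier in this shell, is:

# Stop the server, repoint the cache, and clear the misplaced copy
kill %1
export OLLAMA_MODELS=/blue/cis6930/$USER/ollama/models
rm -rf ~/.ollama

# Restart so the new OLLAMA_MODELS takes effect, then pull again
OLLAMA_HOST=0.0.0.0:11434 ollama serve > ~/ollama.log 2>&1 &
ollama pull llama3.1:8b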
MCP tool calling through navigator-cli requires the model to emit OpenAI-style tool_calls.
On the Ollama side, this shows up as the tools capability.
Pulling a model that lacks this capability will run fine for text but will silently ignore --mcp-server, leaving your MCP tools unused.
The table below lists models that currently expose tools and fit on a single A100:
| Model tag | Disk | Context | Good for |
|---|---|---|---|
| llama3.1:8b | ~4.7 GB | 128k | General default, solid tool use |
| llama3.1:70b | ~40 GB | 128k | Large reasoning, needs 80 GB GPU |
| qwen2.5:7b | ~4.4 GB | 128k | Strong tool use, multilingual |
| qwen2.5:14b | ~8.2 GB | 128k | Better reasoning, one A100 |
| mistral-nemo:12b | ~7.1 GB | 128k | Long documents, tool use |
| command-r:35b | ~20 GB | 128k | Long context, RAG-friendly |
Models that do not support tools and will not drive an MCP loop: gemma:2b, gemma2:2b, phi3:mini, llama2, codellama, deepseek-coder:6.7b.
You can still chat with them through navigator-cli, just skip --mcp-server.
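For example, once navigator-cli is pointed at Ollama as described in the next section, and assuming gemma2:2b has been pulled, a plain chat still works; just leave off the MCP flag:

# Non-tool model: plain chat is fine, --mcp-server would be ignored
uv run python -m navigator_cli --model gemma2:2b "Explain a star schema in one paragraph."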
To check whether a model exposes the tools capability, ask Ollama directly through the CLI:
ollama show llama3.1:8b | grep -i capabilities
A tool-capable model prints something like capabilities: completion, tools.
The same information is available over HTTP, which is useful from a script on your laptop:
curl -s http://localhost:11434/api/show \
-d '{"name":"llama3.1:8b"}' | jq '.capabilities'
If tools is missing from the list, the model will not round-trip MCP calls.
Switch to one of the tool-capable tags above before wiring it into navigator-cli.
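If you would rather see the capability exercised end to end, a hedged smoke test is to send a request with a tools array to the OpenAI-compatible endpoint and check that the reply carries tool_calls. The get_weather function below is a made-up example, not part of navigator-cli:

curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What is the weather in Gainesville?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }' | jq '.choices[0].message.tool_calls'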
cegme/navigator-cli currently hardcodes the NavigatorAI base URL near the top of navigator_cli.py:
NAVIGATOR_BASE_URL = "https://api.ai.it.ufl.edu/v1"
You have two ways to redirect it at your local Ollama, depending on whether you want to change the CLI itself.
Make the base URL come from an environment variable so the change is reversible:
import os  # make sure this import exists near the top of navigator_cli.py

NAVIGATOR_BASE_URL = os.environ.get(
    "NAVIGATOR_BASE_URL",
    "https://api.ai.it.ufl.edu/v1",
)
Then drive it from the shell where you want Ollama instead of Navigator:
export NAVIGATOR_BASE_URL=http://localhost:11434/v1
export NAVIGATOR_API_KEY=ollama # any non-empty string; Ollama ignores it
uv run python -m navigator_cli --model llama3.1:8b "Summarize RAG in two sentences."
# MCP still works, because both endpoints speak the same chat.completions schema
uv run python -m navigator_cli \
--model qwen2.5:7b \
--mcp-server mcp_servers/csv_tools.py \
"What is the average score in mcp_servers/sample_data.csv?"
This is the preferred option for the class.
Opening a PR against cegme/navigator-cli with exactly this change helps future students too.
If you do not want to fork navigator-cli, skip the CLI and use the OpenAI SDK the same way the navigatorai-setup guide shows:
from openai import OpenAI
client = OpenAI(
api_key="ollama",
base_url="http://localhost:11434/v1",
)
resp = client.chat.completions.create(
model="llama3.1:8b",
messages=[{"role": "user", "content": "Name three steps in an ETL pipeline."}],
)
print(resp.choices[0].message.content)
Your MCP tool-call loop from navigator-cli/mcp_client.py will run unchanged against this client.
Run these from your laptop with the SSH tunnel open:
# 1. The tunnel sees the server
curl -s http://localhost:11434/api/tags | jq '.models[].name'
# 2. A plain completion works
curl -s http://localhost:11434/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "llama3.1:8b",
"messages": [{"role":"user","content":"Reply with the single word OK."}]
}' | jq -r '.choices[0].message.content'
# 3. navigator-cli reaches it
NAVIGATOR_BASE_URL=http://localhost:11434/v1 \
NAVIGATOR_API_KEY=ollama \
uv run python -m navigator_cli --model llama3.1:8b "Reply OK."
All three steps succeeding means the swap is clean and MCP-ready.
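If you prefer running the three checks as one script, a rough wrapper, assuming the tunnel is open and jq is installed on your laptop, could look like this:

#!/usr/bin/env bash
# Fail fast on the first broken step
set -euo pipefail

echo "1. Tunnel and server:"
curl -sf http://localhost:11434/api/tags | jq -r '.models[].name'

echo "2. Plain completion:"
curl -sf http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"Reply with the single word OK."}]}' \
  | jq -r '.choices[0].message.content'

echo "3. navigator-cli:"
NAVIGATOR_BASE_URL=http://localhost:11434/v1 \
NAVIGATOR_API_KEY=ollama \
uv run python -m navigator_cli --model llama3.1:8b "Reply OK."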
Troubleshooting

- ollama: command not found inside srun. The compute node did not re-read ~/.bashrc. Run source ~/.bashrc or invoke $OLLAMA_HOME/bin/ollama by its full path.
- connect: connection refused from the tunnel. Ollama bound to 127.0.0.1. Restart it with OLLAMA_HOST=0.0.0.0:11434 ollama serve.
- Disk quota exceeded while pulling a model. OLLAMA_MODELS is not set in the current shell, so Ollama wrote into ~/.ollama. Export the variable, delete ~/.ollama, and pull again.
- The model does not fit on the GPU. Pull a smaller tag (llama3.1:8b instead of :70b) or a quantized tag like llama3.1:8b-instruct-q4_K_M.
- MCP tools are silently ignored. Check ollama show <model> | grep tools. If tools is absent, switch to a tool-capable tag from the table above.
- The srun session expired and the node changed. Request a new GPU session, restart the server with OLLAMA_HOST=0.0.0.0:11434 ollama serve &, and reopen the tunnel with the new hostname.

See also

- cegme/navigator-cli reference CLI and MCP client
- HiPerGator documentation on srun and partition options