LLM Gateway

OpenAI-compatible API backed by local models across 2 nodes

Available Models

codestral:22b            Code generation                  [FAST]
deepseek-coder-v2:16b    Code generation                  [FAST]
deepseek-r1:70b          Deep reasoning, niche knowledge  [VERY SLOW]
devstral-2:123b          Code generation, 123B            [VERY SLOW]
gemma3:27b               General purpose                  [MEDIUM]
gemma4:31b               Thinking model, strong           [SLOW]
glm4:9b                  Fast chat, lightweight           [FAST]
gpt-oss:120b             GPT-OSS 120B                     [SLOW]
gpt-oss:latest           GPT-OSS 13B                      [FAST]
mistral-small3.2:latest  Fast general chat                [FAST]
nemotron-3-super:latest  NVIDIA Nemotron 86B              [VERY SLOW]
qwen3.5:latest           Thinking model, high quality     [SLOW]
qwen3.6:latest
qwen3:32b                Strong general purpose           [MEDIUM]

Recommended

General chat: mistral-small3.2:latest or glm4:9b (fastest)

Code: codestral:22b or deepseek-coder-v2:16b
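
The recommendations above can be encoded as a small lookup so client code does not hard-code model names in several places. This is a minimal sketch: the task labels ("chat", "code") and the pick_model helper are hypothetical names chosen for illustration, not part of the gateway.

```python
from typing import Optional, Set

# Defaults copied from the Recommended section; the first entry is preferred.
# The task labels and this helper are illustrative, not a gateway feature.
RECOMMENDED = {
    "chat": ["mistral-small3.2:latest", "glm4:9b"],
    "code": ["codestral:22b", "deepseek-coder-v2:16b"],
}


def pick_model(task: str, available: Optional[Set[str]] = None) -> str:
    """Return the first recommended model for a task.

    If `available` is given (for example, the names returned by
    GET /v1/models), only models in that set are considered.
    """
    candidates = RECOMMENDED.get(task)
    if candidates is None:
        raise ValueError(f"no recommendation for task {task!r}")
    for name in candidates:
        if available is None or name in available:
            return name
    raise LookupError(f"no recommended model for {task!r} is available")
```

Passing the set of currently listed models as `available` lets the helper fall back to the second recommendation when the first one is not being served.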

Quick Start

Base URL: https://api.jonbowden.com.ngrok.dev/v1

Authenticate by sending your API key in the Authorization header, in the form "Authorization: Bearer YOUR_API_KEY".

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.jonbowden.com.ngrok.dev/v1",
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="mistral-small3.2:latest",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
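
Because the chat endpoint supports streaming, the same client can print a reply as it is generated instead of waiting for the full response. The sketch below assumes the gateway emits standard OpenAI-style chunks (text in choices[0].delta.content); iter_text and stream_reply are hypothetical helper names.

```python
from typing import Any, Iterable, Iterator


def iter_text(chunks: Iterable[Any]) -> Iterator[str]:
    """Yield the text fragments from a stream of chat completion chunks."""
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def stream_reply(client: Any, prompt: str,
                 model: str = "mistral-small3.2:latest") -> str:
    """Print a reply as it arrives and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


# Usage, with `client` constructed as in the example above:
# text = stream_reply(client, "Hello")
```

Streaming is most useful with the models marked SLOW or VERY SLOW, where the first tokens can be shown long before the reply is complete.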

curl

curl https://api.jonbowden.com.ngrok.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small3.2:latest",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Cursor / VS Code

Settings → Models → Add OpenAI-compatible provider:

Base URL: https://api.jonbowden.com.ngrok.dev/v1

API Key: the key you were issued

Endpoints

GET  /v1/models              List available models
POST /v1/chat/completions    Chat (supports streaming)
GET  /chats                  Your chat history
GET  /health                 Health check
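
The endpoints above can also be called without the OpenAI client. The following is a rough sketch using only the standard library. It assumes that /v1/models returns the usual OpenAI list shape ({"data": [{"id": ...}]}) and that paths without the /v1 prefix (/chats, /health) are served from the host root exactly as written in the table. build_request, model_ids, and fetch_json are hypothetical names.

```python
import json
import urllib.request

# Host root; the Base URL in Quick Start is this plus "/v1".
GATEWAY = "https://api.jonbowden.com.ngrok.dev"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for a path such as /v1/models."""
    return urllib.request.Request(
        GATEWAY + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list:
    """Extract model names from an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_json(path: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode a JSON response body."""
    request = build_request(path, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage:
# print(model_ids(fetch_json("/v1/models", "YOUR_API_KEY")))
```

The response formats of /chats and /health are not documented here, so inspect the raw body before relying on fetch_json for those paths.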

Features

Streaming & non-streaming responses

Per-key rate limiting and usage tracking

Encrypted chat history with export

14 models across 2 nodes (spark1 + spark2) connected by a 200 Gbps link
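
Since keys are rate limited, clients may want to retry politely when a request is rejected. The sketch below shows exponential backoff around an arbitrary call. It assumes the gateway signals a limit in a way the caller can detect (commonly HTTP 429, which is not confirmed here). RateLimited and with_backoff are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker for a rate-limit response (assumed to be HTTP 429)."""


def with_backoff(call: Callable[[], T], retries: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying while it raises RateLimited.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    After `retries` retries the last RateLimited error is re-raised.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1


# Usage with the OpenAI client (which raises openai.RateLimitError on 429):
# def ask():
#     try:
#         return client.chat.completions.create(model="glm4:9b", messages=msgs)
#     except openai.RateLimitError as exc:
#         raise RateLimited() from exc
# resp = with_backoff(ask)
```

The sleep function is injectable so the retry schedule can be checked without actually waiting.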