
Proxy Mode

Run your own LLM backends and connect them to the Cellule.ai network. You keep full control over your models, hardware, and configuration.

Who is this for?

Proxy mode is for contributors who already run their own inference backends and want full control over model choice, quantization, and hardware configuration.

The pool never reassigns models to proxy workers. Your config is yours.

Quick start

1. Install

pip install iamine-ai -i https://cellule.ai/pypi --extra-index-url https://pypi.org/simple

2. Start your LLM backends

Start one or more llama-server instances on different ports:

# Backend 1: Reasoning model on port 8080
llama-server -m models/Qwen3-30B-A3B-Instruct-Q4_K_M.gguf \
  --host 127.0.0.1 --port 8080 -ngl 99 -c 4096

# Backend 2: Coding model on port 8081
llama-server -m models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
  --host 127.0.0.1 --port 8081 -ngl 99 -c 4096

# Backend 3: Fast chat model on port 8082
llama-server -m models/Qwen3.5-9B-Q4_K_M.gguf \
  --host 127.0.0.1 --port 8082 -ngl 99 -c 4096
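Before wiring the backends into the proxy, it is worth confirming each one answers. llama-server exposes a /health endpoint; the sketch below polls it on the three ports from step 2 (the ports and host are assumptions from the commands above — adjust to your setup).

```python
import json
import urllib.request

# Ports from step 2; adjust to match your own backends.
BACKEND_PORTS = [8080, 8081, 8082]

def health_url(port: int, host: str = "127.0.0.1") -> str:
    """Build the llama-server /health endpoint URL for one backend."""
    return f"http://{host}:{port}/health"

def check_backends(ports):
    """Return {port: True/False} depending on whether each backend answers."""
    status = {}
    for port in ports:
        try:
            with urllib.request.urlopen(health_url(port), timeout=5) as resp:
                status[port] = resp.status == 200
        except OSError:
            status[port] = False
    return status

if __name__ == "__main__":
    print(json.dumps(check_backends(BACKEND_PORTS), indent=2))
```

A backend that reports False here will also fail once the proxy tries to route traffic to it, so fix this first.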

3. Create proxy.json

{
    "pool_url": "wss://cellule.ai/ws",
    "backends": [
        {
            "name": "Reasoning",
            "url": "http://127.0.0.1:8080",
            "model": "Qwen3-30B-A3B",
            "model_path": "models/Qwen3-30B-A3B-Instruct-Q4_K_M.gguf",
            "worker_id": "MyWorker-reasoning",
            "bench_tps": 60.0
        },
        {
            "name": "Coder",
            "url": "http://127.0.0.1:8081",
            "model": "Qwen3-Coder-30B-A3B",
            "model_path": "models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf",
            "worker_id": "MyWorker-coder",
            "bench_tps": 55.0
        },
        {
            "name": "Chat",
            "url": "http://127.0.0.1:8082",
            "model": "Qwen3.5-9B",
            "model_path": "models/Qwen3.5-9B-Q4_K_M.gguf",
            "worker_id": "MyWorker-chat",
            "bench_tps": 80.0
        }
    ]
}
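A malformed proxy.json is the most common launch failure. This minimal sketch checks the file against the required fields from the configuration reference below; it is a local sanity check, not part of the iamine package.

```python
import json

# Required fields per the configuration reference.
REQUIRED_TOP = {"pool_url"}
REQUIRED_BACKEND = {"name", "url", "model", "model_path"}

def missing_fields(config: dict) -> list[str]:
    """Return a list of missing required fields; empty if the config is complete."""
    problems = [f for f in REQUIRED_TOP if f not in config]
    for i, backend in enumerate(config.get("backends", [])):
        problems += [f"backends[{i}].{f}" for f in REQUIRED_BACKEND if f not in backend]
    if not config.get("backends"):
        problems.append("backends (at least one entry)")
    return problems

if __name__ == "__main__":
    with open("proxy.json") as fh:
        cfg = json.load(fh)
    print(missing_fields(cfg) or "config looks complete")
```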

4. Launch the proxy

python -m iamine proxy -c proxy.json

Each backend registers as a separate worker on the pool. The pool routes traffic to each based on the model requested.
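The routing itself happens on the pool side and its exact policy is not documented here; as a rough mental model, though, a request is matched to a backend by model name, with declared throughput as a plausible tiebreaker. The function below is a hypothetical sketch of that idea, not the pool's actual code.

```python
def pick_backend(requested_model: str, backends: list[dict]):
    """Hypothetical sketch: pick the backend whose 'model' matches the request,
    preferring the highest declared bench_tps when several match."""
    matches = [b for b in backends if b["model"] == requested_model]
    if not matches:
        return None
    return max(matches, key=lambda b: b.get("bench_tps", 0.0))
```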

Configuration reference

Field                  Required  Description
pool_url               Yes       WebSocket URL of the pool to join. Default: wss://cellule.ai/ws
backends[].name        Yes       Display name for this backend (shown in logs)
backends[].url         Yes       HTTP URL of your llama-server instance
backends[].model       Yes       Model name (used for routing)
backends[].model_path  Yes       Path to the GGUF file (for pool registry matching)
backends[].worker_id   No        Custom worker ID. Auto-generated if omitted
backends[].bench_tps   No        Declared throughput in tokens/sec. Pool uses this for routing priority

How it works

Proxy vs Auto mode

                   Auto (--auto)                Proxy (-c proxy.json)
Model selection    Pool decides                 You decide
Model download     Automatic                    You manage
Inference engine   Built-in (llama-cpp-python)  Your backend (llama-server, vLLM, etc.)
GPU config         Auto-detected                You configure
Multi-model        1 model per worker           N models per machine
Auto-update        Yes (pool pushes)            Package only (config preserved)
$IAMINE earnings   Same formula                 Same formula

Supported backends

llama-server (recommended) — from llama.cpp. Best performance for GGUF models.

llama-server -m model.gguf --host 127.0.0.1 --port 8080 -ngl 99

Ollama — set the backend URL to Ollama's API port.

# Start Ollama
ollama serve

# In proxy.json, use:
"url": "http://127.0.0.1:11434"

vLLM — for high-throughput serving with continuous batching.

python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-30B-A3B --port 8080

Tips

bench_tps matters. The pool uses your declared throughput for routing. Set it honestly — if you claim 100 t/s but deliver 20, the pool will detect the discrepancy and lower your routing priority.
worker_id is persistent. Your wallet and earnings are tied to the worker_id. If you change it, you start with a fresh wallet.
$IAMINE earnings are the same formula regardless of mode. 60% of credits go to the worker. More jobs served = more earned. The M12 recruitment system places traffic where it's needed — fill a gap, earn more.
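Rather than guessing bench_tps, measure it against the backend you will register. The sketch below times one generation through an OpenAI-compatible completions endpoint; the /v1/completions path and response shape are assumptions about your backend (llama-server and vLLM both expose OpenAI-compatible APIs, but verify the exact route for your version).

```python
import json
import time
import urllib.request

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """The figure to declare as bench_tps."""
    return round(n_tokens / elapsed_s, 1)

def measure(base_url: str, prompt: str = "Count to one hundred.", max_tokens: int = 256) -> float:
    """Time one generation against an OpenAI-compatible endpoint.
    Endpoint path and response fields are assumptions; check your backend's API."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    req = urllib.request.Request(base_url + "/v1/completions", data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=120) as resp:
        data = json.load(resp)
    n = data["usage"]["completion_tokens"]
    return tokens_per_second(n, time.monotonic() - start)

if __name__ == "__main__":
    print(measure("http://127.0.0.1:8080"))
```

Run it a few times under realistic load and declare the lower end of what you observe.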

Example: Z2 multi-model setup

The Cellule.ai test network runs a real proxy with 4 backends on an AMD Ryzen AI MAX+ PRO 395 (94 GB RAM, 88 GB ROCm VRAM):

{
    "pool_url": "wss://cellule.ai/ws",
    "backends": [
        {"name": "RED",    "url": "http://127.0.0.1:8080", "model": "Qwen3-30B-A3B",         "bench_tps": 105.0, "worker_id": "RED-z2"},
        {"name": "Coder",  "url": "http://127.0.0.1:8081", "model": "Qwen3-Coder-30B-A3B",   "bench_tps": 60.0,  "worker_id": "Coder-z2"},
        {"name": "Tank",   "url": "http://127.0.0.1:8082", "model": "Qwen3.5-35B-A3B",       "bench_tps": 55.0,  "worker_id": "Tank-z2"},
        {"name": "Scout",  "url": "http://127.0.0.1:8083", "model": "Qwen3.5-9B",            "bench_tps": 50.0,  "worker_id": "Scout-z2"}
    ]
}

Combined throughput: 270 tokens/sec across 4 models. This single machine serves reasoning, coding, large context, and fast chat — all at once.
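The 270 tokens/sec figure is just the sum of the declared bench_tps values in the config above:

```python
# Declared bench_tps values from the Z2 proxy.json above.
declared = {"RED": 105.0, "Coder": 60.0, "Tank": 55.0, "Scout": 50.0}

combined = sum(declared.values())
print(combined)  # 270.0
```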

Cellule.ai — Decentralized AI, powered by the community
