Local LLM for VS Code Copilot

In Config VS Code (Insiders) we see how a custom LLM can be added via the "OpenAI compatible API".

Adding a custom model to Copilot in VS Code Insiders is easy. Letting it do agent work is really hard.

Here is an example: Qwen3 Coder.

Change the directories below if you need to.

VS Code JSON

  "github.copilot.chat.customOAIModels": {
    "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8": {
      "name": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
      "url": "http://gate0.neuro.uni-bremen.de:8000/v1",
      "toolCalling": true,
      "vision": false,
      "thinking": true,
      "maxInputTokens": 256000,
      "maxOutputTokens": 8192,      
      "requiresAPIKey": false
    }
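
Once the vLLM service from the next section is running, the endpoint can be checked from the command line. This is only a sanity check against the OpenAI-compatible API; host and port are the ones from the JSON above.

# The returned model id must match the "name" used in the VS Code setting.
curl -s http://gate0.neuro.uni-bremen.de:8000/v1/models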

Ollama is not a good choice for an agent model. This has to do with the format in which the model delivers its agent (tool-call) commands.

There are at least three modes: XML, JSON, and something strange.

Qwen3 Coder falls into the "strange" category.
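
One way to see whether tool calling survives the round trip is to send a request with a tool definition and inspect the raw response. This is a hedged sketch: the get_weather tool is made up for illustration, and with --enable-auto-tool-choice plus the matching --tool-call-parser (see the service file below) the reply should contain a structured tool_calls entry rather than free text.

# Expect choices[0].message.tool_calls in the response when parsing works.
curl -s http://gate0.neuro.uni-bremen.de:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
        "messages": [{"role": "user", "content": "What is the weather in Bremen?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'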

vllm-qwen-coder.service

[Unit]
Description=vLLM Qwen3 Coder Service
After=network.target

[Service]
Type=simple
User=ollama
Group=ollama
WorkingDirectory=/ollama_coder/vllm-project
Environment="VLLM_LOGGING_LEVEL=INFO"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="HF_HOME=/ollama_coder/huggingface_cache"
ExecStart=/ollama_coder/vllm-project/.venv/bin/vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype auto \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.85 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

To be installed and activated via

cp -f vllm-qwen-coder.service /etc/systemd/system/vllm-qwen-coder.service
systemctl daemon-reload
systemctl enable vllm-qwen-coder
systemctl start vllm-qwen-coder

systemctl status vllm-qwen-coder
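
Because the unit writes StandardOutput and StandardError to the journal, the vLLM startup log can be followed with:

journalctl -u vllm-qwen-coder -f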

This looks harmless, but vLLM itself still has to be installed first.

How to install vLLM

We need uv: https://github.com/astral-sh/uv
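
One way to get uv is the standalone installer script from the uv project (see the linked repository for alternatives):

curl -LsSf https://astral.sh/uv/install.sh | sh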

cd /ollama_coder
uv init vllm-project

Copy pyproject.toml into /ollama_coder/vllm-project

uv sync
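
After uv sync finishes, a quick check that vLLM actually landed in the project environment (the .venv path matches the ExecStart line above):

cd /ollama_coder/vllm-project
.venv/bin/python -c "import vllm; print(vllm.__version__)"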

pyproject.toml

[project]
name = "vllm-project"
version = "0.1.0"
description = "A project using vLLM for LLM inference"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "vllm>=0.12.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "black>=24.0.0",
    "ruff>=0.1.0",
]

[tool.uv]
dev-dependencies = [
    "pytest>=8.0.0",
    "black>=24.0.0",
    "ruff>=0.1.0",
]