Local LLM for VS Code Copilot
From Master of Neuroscience Wiki
In Config VS Code (Insiders) we saw how to add a custom LLM via the "OpenAI-compatible API". Adding a custom model to Copilot in VS Code Insiders is easy; getting it to do agent work is really hard.
Here is an example: Qwen3 Coder.
VS Code settings.json:
"github.copilot.chat.customOAIModels": {
  "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8": {
    "name": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
    "url": "http://gate0.neuro.uni-bremen.de:8000/v1",
    "toolCalling": true,
    "vision": false,
    "thinking": true,
    "maxInputTokens": 256000,
    "maxOutputTokens": 8192,
    "requiresAPIKey": false
  }
}
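A quick way to check that the endpoint behind this entry answers is to send a minimal OpenAI-compatible chat request yourself. A sketch using only the standard library; the model name and URL are taken from the settings above, the prompt is made up:

```python
import json
import urllib.request

# Values taken from the Copilot settings entry above
BASE_URL = "http://gate0.neuro.uni-bremen.de:8000/v1"
MODEL = "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a minimal OpenAI-compatible /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # requiresAPIKey is false
        method="POST",
    )

req = build_chat_request("Say hello in one word.")
# On a host that can reach the server:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```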
Ollama is not a good choice for an agent model. This has to do with the format in which the model delivers its agent (tool-call) commands: there are at least three modes, XML, JSON, and something stranger still. Qwen3 Coder falls into the strange category, so it needs a server with a matching tool-call parser, which vLLM provides.

vllm-qwen-coder.service:
[Unit]
Description=vLLM Qwen3 Coder Service
After=network.target
[Service]
Type=simple
User=ollama
Group=ollama
WorkingDirectory=/ollama_coder/vllm-project
Environment="VLLM_LOGGING_LEVEL=INFO"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="HF_HOME=/ollama_coder/huggingface_cache"
ExecStart=/ollama_coder/vllm-project/.venv/bin/vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
--port 8000 \
--host 0.0.0.0 \
--dtype auto \
--max-model-len 262144 \
--gpu-memory-utilization 0.85 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
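The --tool-call-parser qwen3_coder flag is the crucial part: Qwen3 Coder does not emit OpenAI-style JSON tool calls but an XML-like format of its own, which the parser translates for clients such as Copilot. Schematically (both snippets below are illustrative, not verbatim model output):

```python
# OpenAI-style JSON tool call, as clients such as Copilot expect it:
json_style = {
    "tool_calls": [{
        "type": "function",
        "function": {"name": "read_file", "arguments": '{"path": "main.py"}'},
    }]
}

# XML-like tool call, roughly how Qwen3 Coder emits it (schematic example):
xml_style = """<tool_call>
<function=read_file>
<parameter=path>main.py</parameter>
</function>
</tool_call>"""

# vLLM's qwen3_coder parser converts the second form into the first,
# so the client never sees the raw XML-like text.
```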
The unit is installed and activated via:
cp -f vllm-qwen-coder.service /etc/systemd/system/vllm-qwen-coder.service
systemctl daemon-reload
systemctl enable vllm-qwen-coder
systemctl start vllm-qwen-coder
systemctl status vllm-qwen-coder
This looks harmless, but vLLM still has to be installed first.
How to install vLLM
We need uv: https://github.com/astral-sh/uv
cd /ollama_coder
uv init vllm-project
Put pyproject.toml into /ollama_coder/vllm-project
uv sync
The pyproject.toml:
[project]
name = "vllm-project"
version = "0.1.0"
description = "A project using vLLM for LLM inference"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"vllm>=0.12.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"black>=24.0.0",
"ruff>=0.1.0",
]
[tool.uv]
dev-dependencies = [
"pytest>=8.0.0",
"black>=24.0.0",
"ruff>=0.1.0",
]
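Once everything is installed and the service is running, the server can be probed without sending a prompt: recent vLLM versions expose a plain GET /health endpoint on the OpenAI-compatible server. A small sketch, reusing the host and port from the unit file above:

```python
import urllib.request
from urllib.error import URLError

# Host and port from the unit file above; /health is vLLM's liveness endpoint
HEALTH_URL = "http://gate0.neuro.uni-bremen.de:8000/health"

def server_is_up(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the vLLM server answers its health check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

# On a host that can reach the server: print(server_is_up())
```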