Local LLM for VS Code Copilot
From Master of Neuroscience Wiki
Revision as of 16:50, 9 December 2025
In VS Code (Insiders) we can add a custom LLM via the "OpenAI-compatible API" configuration.
Adding a custom model to Copilot in VS Code Insiders is easy. Getting it to do agent work is really hard.
Here is an example: Qwen3 Coder.
VS Code JSON
VS Code JSON
"github.copilot.chat.customOAIModels": {
"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8": {
"name": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
"url": "http://gate0.neuro.uni-bremen.de:8000/v1",
"toolCalling": true,
"vision": false,
"thinking": true,
"maxInputTokens": 256000,
"maxOutputTokens": 8192,
"requiresAPIKey": false
  }
}
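The entry above registers the endpoint with Copilot; any OpenAI-compatible client then sends a payload of roughly this shape to `<url>/chat/completions`. A minimal sketch (the prompt text is made up; the model name and token limit come from the config above):

```python
import json

# Minimal chat-completions payload. The model ID matches the
# customOAIModels entry above; max_tokens must stay at or below
# the configured maxOutputTokens (8192).
payload = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
    "messages": [
        {"role": "user", "content": "Write a hello-world in C."},
    ],
    "max_tokens": 256,
}

# This is the JSON body an OpenAI-compatible client POSTs to the endpoint.
print(json.dumps(payload, indent=2))
```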
Ollama is not a good choice for serving an agent model. The problem is the format in which the model emits its agent (tool) calls: there are at least three styles, namely XML, JSON, and something stranger. Qwen3 Coder falls into the "stranger" category.
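For illustration, here is a sketch of how such a non-JSON tool call might be parsed into a structured form. The tag layout in the sample is an assumption modeled on Qwen3 Coder's XML-ish output, not an exact specification; in production, vLLM's `--tool-call-parser qwen3_coder` (configured below) does this job.

```python
import re

# Hypothetical sample of a Qwen3-Coder-style tool call (assumed layout,
# for illustration only): an XML-ish wrapper that is neither plain XML
# nor JSON, which is why generic parsers choke on it.
sample = """<tool_call>
<function=get_weather>
<parameter=city>
Bremen
</parameter>
</function>
</tool_call>"""

def parse_tool_call(text: str) -> dict:
    """Extract the function name and parameters from a tool-call block."""
    name = re.search(r"<function=([^>\n]+)>", text).group(1)
    params = {
        m.group(1): m.group(2).strip()
        for m in re.finditer(
            r"<parameter=([^>\n]+)>\n(.*?)\n</parameter>", text, re.S
        )
    }
    return {"name": name, "arguments": params}

print(parse_tool_call(sample))
# → {'name': 'get_weather', 'arguments': {'city': 'Bremen'}}
```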
vllm-qwen-coder.service
[Unit]
Description=vLLM Qwen3 Coder Service
After=network.target
[Service]
Type=simple
User=ollama
Group=ollama
WorkingDirectory=/ollama_coder/vllm-project
Environment="VLLM_LOGGING_LEVEL=INFO"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="HF_HOME=/ollama_coder/huggingface_cache"
ExecStart=/ollama_coder/vllm-project/.venv/bin/vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
--port 8000 \
--host 0.0.0.0 \
--dtype auto \
--max-model-len 262144 \
--gpu-memory-utilization 0.85 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Install and activate it via:
cp -f vllm-qwen-coder.service /etc/systemd/system/vllm-qwen-coder.service
systemctl daemon-reload
systemctl enable vllm-qwen-coder
systemctl start vllm-qwen-coder
systemctl status vllm-qwen-coder
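Once the service is up, a quick way to confirm the endpoint answers is to list its models. A small sketch using only the standard library; the URL is the one from the VS Code config above and is only reachable inside that network:

```python
import json
import urllib.request

def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Query an OpenAI-compatible /v1/models endpoint and return model IDs."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

# Example (only works inside the network where the server runs):
# print(list_models("http://gate0.neuro.uni-bremen.de:8000/v1"))
```

A healthy vLLM instance should return the served model ID, here `Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8`.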
This looks harmless, but note the two tool-call flags (--enable-auto-tool-choice and --tool-call-parser qwen3_coder): without them, the model's tool calls are not parsed and agent mode will not work.
How to install vLLM
We need uv: https://github.com/astral-sh/uv
cd /ollama_coder
uv init vllm-project
Put the pyproject.toml shown below into /ollama_coder/vllm-project, then run:
uv sync
pyproject.toml
[project]
name = "vllm-project"
version = "0.1.0"
description = "A project using vLLM for LLM inference"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"vllm>=0.12.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"black>=24.0.0",
"ruff>=0.1.0",
]
[tool.uv]
dev-dependencies = [
"pytest>=8.0.0",
"black>=24.0.0",
"ruff>=0.1.0",
]