Local LLM for VS code copilot
In Config VS Code (Insiders) we see how to add a custom LLM via the "OpenAI compatible API". Adding a custom model to Copilot in VS Code Insiders is easy; getting it to do agent work is really hard.

Here is an example: Qwen3 Coder.

The VS Code settings JSON:
"github.copilot.chat.customOAIModels": {
"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8": {
"name": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
"url": "http://gate0.neuro.uni-bremen.de:8000/v1",
"toolCalling": true,
"vision": false,
"thinking": true,
"maxInputTokens": 256000,
"maxOutputTokens": 8192,
"requiresAPIKey": false
}
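Before pointing Copilot at that URL, it is worth checking that the endpoint actually answers. A minimal check, assuming the vLLM service described below is already running on gate0:
<syntaxhighlight lang="bash">
# List the models the OpenAI-compatible server exposes; the Qwen3 Coder
# model name from the settings above should appear in the response.
curl http://gate0.neuro.uni-bremen.de:8000/v1/models
</syntaxhighlight>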
Ollama is not a good choice for an agent model. This has to do with the format in which the model delivers its agent/tool-call commands: there are at least three modes (XML, JSON, and something stranger), and Qwen3 Coder falls into the "strange" category. vLLM ships a matching parser for it (the --tool-call-parser qwen3_coder flag below), so the model is served with vLLM instead.

vllm-qwen-coder.service:
<syntaxhighlight lang="bash">
[Unit]
Description=vLLM Qwen3 Coder Service
After=network.target

[Service]
Type=simple
User=ollama
Group=ollama
WorkingDirectory=/ollama_coder/vllm-project
Environment="VLLM_LOGGING_LEVEL=INFO"
Environment="CUDA_VISIBLE_DEVICES=0"
Environment="HF_HOME=/ollama_coder/huggingface_cache"
ExecStart=/ollama_coder/vllm-project/.venv/bin/vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 \
    --port 8000 \
    --host 0.0.0.0 \
    --dtype auto \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.85 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
</syntaxhighlight>
To be installed and activated via:
<syntaxhighlight lang="bash">
cp -f vllm-qwen-coder.service /etc/systemd/system/vllm-qwen-coder.service
systemctl daemon-reload
systemctl enable vllm-qwen-coder
systemctl start vllm-qwen-coder
systemctl status vllm-qwen-coder
</syntaxhighlight>
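Since the unit sends stdout and stderr to the journal, the vLLM logs can be followed with journalctl; on first start this is useful for watching the model weights being fetched into HF_HOME:
<syntaxhighlight lang="bash">
# Follow the service logs live
journalctl -u vllm-qwen-coder -f
</syntaxhighlight>
Once the server reports that it is serving, tool calling can be smoke-tested against the OpenAI-compatible API. This is only a sketch: the get_weather function is a made-up example; the model name and URL are the ones configured above.
<syntaxhighlight lang="bash">
# With --enable-auto-tool-choice and --tool-call-parser qwen3_coder the response
# should contain a structured "tool_calls" entry rather than raw model text.
curl http://gate0.neuro.uni-bremen.de:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
        "messages": [{"role": "user", "content": "What is the weather in Bremen?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"]
                }
            }
        }]
    }'
</syntaxhighlight>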
This looks harmless, but it assumes that a working vLLM installation already exists under /ollama_coder/vllm-project, which is what the next section covers.
== How to install vLLM ==
We need uv: https://github.com/astral-sh/uv
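If uv is not available on the machine yet, the standalone installer from the uv project is one option (a distribution package or pipx works just as well):
<syntaxhighlight lang="bash">
# Standalone installer as documented in the uv README
curl -LsSf https://astral.sh/uv/install.sh | sh
</syntaxhighlight>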
Then create the project:
<syntaxhighlight lang="bash">
cd /ollama_coder
uv init vllm-project
</syntaxhighlight>
Put the pyproject.toml below into /ollama_coder/vllm-project and run:
<syntaxhighlight lang="bash">
uv sync
</syntaxhighlight>
The pyproject.toml:
<syntaxhighlight lang="toml">
[project]
name = "vllm-project"
version = "0.1.0"
description = "A project using vLLM for LLM inference"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "vllm>=0.12.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "black>=24.0.0",
    "ruff>=0.1.0",
]

[tool.uv]
dev-dependencies = [
    "pytest>=8.0.0",
    "black>=24.0.0",
    "ruff>=0.1.0",
]
</syntaxhighlight>
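A quick sanity check that the sync produced a working vLLM build in the project venv (the exact version printed depends on what uv resolved):
<syntaxhighlight lang="bash">
cd /ollama_coder/vllm-project
# Import vLLM inside the project environment and print its version
uv run python -c "import vllm; print(vllm.__version__)"
</syntaxhighlight>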