Model Management
A model is the AI service that handles your requests. Each model belongs to a configured provider and can carry its own configuration settings.
Commands
List Models
List all available models from configured providers:
viben model list
Output:
Available Models:

Provider: anthropic-main
  claude-opus-4-20250514      200K context   $15/$75
  claude-sonnet-4-20250514*   200K context   $3/$15
  claude-3-5-haiku-latest     200K context   $0.25/$1.25

Provider: openai-main
  gpt-4-turbo                 128K context   $10/$30
  gpt-4o                      128K context   $2.5/$10
  gpt-4o-mini                 128K context   $0.15/$0.6

* = default model
Filter by provider:
viben model list --provider anthropic-main
For JSON output:
viben model list --json
Check Model Status
View the availability status of configured models:
# Check all models
viben model status
# Check specific model
viben model status -n claude-sonnet-4-20250514
Output:
Model Status:

Default: claude-sonnet-4-20250514

  claude-sonnet-4-20250514   anthropic-main   ✓ available
  gpt-4-turbo                openai-main      ✓ available
  claude-3-5-haiku-latest    anthropic-main   ✓ available
  local-llama                local-ollama     ✗ provider offline
Set Default Model
Set a model as the default for all operations:
viben model set-default -n <model>
Example:
viben model set-default -n claude-sonnet-4-20250514
Configuration File
Models are configured in ~/.viben/models.yaml:
# ~/.viben/models.yaml
version: 1

# Default model
default: claude-sonnet-4-20250514

# ============================================================
# Model Aliases
# Use short names to reference commonly used models
# ============================================================
aliases:
  # Speed-focused
  fast: claude-3-5-haiku-latest
  quick: gpt-4o-mini

  # Quality-focused
  smart: claude-sonnet-4-20250514
  balanced: gpt-4o

  # Maximum capability
  best: claude-opus-4-20250514
  powerful: gpt-4-turbo

  # Purpose-specific
  code: claude-sonnet-4-20250514
  chat: claude-3-5-haiku-latest
  reasoning: o1-preview

  # Provider-specific
  gpt: gpt-4-turbo
  claude: claude-sonnet-4-20250514
  gemini: gemini-1.5-pro

# ============================================================
# Fallback Chain
# Models to try in order when primary is unavailable
# ============================================================
fallbacks:
  - claude-sonnet-4-20250514   # Primary
  - gpt-4-turbo                # First fallback
  - claude-3-5-haiku-latest    # Second fallback
  - gpt-4o-mini                # Last resort

# ============================================================
# Model-specific Configuration
# Override default parameters for each model
# ============================================================
model_config:
  claude-sonnet-4-20250514:
    provider: anthropic-main
    max_tokens: 8192
    temperature: 0.7

  gpt-4-turbo:
    provider: openai-main
    max_tokens: 4096
    temperature: 0.7
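The fallback chain describes an ordered search: try each model in turn and use the first one that is available. The sketch below illustrates that idea only. The function `pick_model` and the availability set are hypothetical and are not part of viben; how viben actually detects availability is not described here.

```python
from typing import Iterable, Optional, Set


def pick_model(chain: Iterable[str], available: Set[str]) -> Optional[str]:
    """Return the first model in the chain that is currently available."""
    for name in chain:
        if name in available:
            return name
    return None  # every model in the chain is unavailable


# Fallback chain copied from the models.yaml example.
FALLBACKS = [
    "claude-sonnet-4-20250514",
    "gpt-4-turbo",
    "claude-3-5-haiku-latest",
    "gpt-4o-mini",
]

# Hypothetical availability snapshot: the Anthropic provider is down.
available_now = {"gpt-4-turbo", "gpt-4o-mini"}

print(pick_model(FALLBACKS, available_now))
```

With the Anthropic models unavailable, the sketch skips the primary and settles on `gpt-4-turbo`, the first fallback.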
Model Configuration
Each model can have custom settings that override defaults:
model_config:
  claude-sonnet-4-20250514:
    provider: anthropic-main   # Which provider to use
    max_tokens: 8192           # Maximum output tokens
    temperature: 0.7           # Sampling temperature (0.0-1.0 for Anthropic models)
    # Optional parameters
    # top_p: 0.9
    # top_k: 40
    # stop_sequences: ["\n\nHuman:"]

  claude-opus-4-20250514:
    provider: anthropic-main
    max_tokens: 4096
    temperature: 0.5           # Lower temperature for more deterministic output

  claude-3-5-haiku-latest:
    provider: anthropic-main
    max_tokens: 4096
    temperature: 0.8

  gpt-4-turbo:
    provider: openai-main
    max_tokens: 4096
    temperature: 0.7

  gpt-4o:
    provider: openai-main
    max_tokens: 4096
    temperature: 0.7

  gpt-4o-mini:
    provider: openai-main
    max_tokens: 4096
    temperature: 0.8

  # Azure-hosted model
  azure-gpt-4:
    provider: azure-gpt4       # Uses Azure provider
    max_tokens: 4096
    temperature: 0.7

  # Google Gemini model
  gemini-1.5-pro:
    provider: google-gemini
    max_tokens: 8192
    temperature: 0.7

  # Local Ollama model
  llama3:
    provider: local-ollama
    max_tokens: 4096
    temperature: 0.8

  # DeepSeek
  deepseek-chat:
    provider: deepseek
    max_tokens: 4096
    temperature: 0.7

  # Groq (LLaMA)
  llama-3.1-70b-versatile:
    provider: groq
    max_tokens: 4096
    temperature: 0.7
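One way to picture "settings that override defaults" is a dictionary overlay: start from the defaults and let the per-model entry win wherever it sets a key. The sketch below assumes that behaviour. The `DEFAULTS` values and the helper `effective_settings` are illustrative only, since the defaults viben really applies are not listed in this document.

```python
from typing import Any, Dict

# Hypothetical global defaults; the real defaults are not documented here.
DEFAULTS: Dict[str, Any] = {"max_tokens": 4096, "temperature": 0.7}

# Two entries copied from the model_config example.
MODEL_CONFIG: Dict[str, Dict[str, Any]] = {
    "claude-sonnet-4-20250514": {
        "provider": "anthropic-main",
        "max_tokens": 8192,
        "temperature": 0.7,
    },
    "claude-opus-4-20250514": {
        "provider": "anthropic-main",
        "max_tokens": 4096,
        "temperature": 0.5,
    },
}


def effective_settings(model: str) -> Dict[str, Any]:
    """Overlay a model's own settings on top of the defaults."""
    return {**DEFAULTS, **MODEL_CONFIG.get(model, {})}


print(effective_settings("claude-opus-4-20250514"))
```

A model with no entry in `model_config` simply receives the defaults unchanged in this sketch.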
Model Capabilities
You can define model capabilities for intelligent model selection:
model_capabilities:
  claude-sonnet-4-20250514:
    context_window: 200000
    supports_vision: true
    supports_tools: true
    supports_streaming: true
    cost_per_1k_input: 0.003
    cost_per_1k_output: 0.015

  claude-opus-4-20250514:
    context_window: 200000
    supports_vision: true
    supports_tools: true
    supports_streaming: true
    cost_per_1k_input: 0.015
    cost_per_1k_output: 0.075

  gpt-4-turbo:
    context_window: 128000
    supports_vision: true
    supports_tools: true
    supports_streaming: true
    cost_per_1k_input: 0.01
    cost_per_1k_output: 0.03

  gpt-4o-mini:
    context_window: 128000
    supports_vision: true
    supports_tools: true
    supports_streaming: true
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
Agents can use these capabilities to choose a model that matches a task's requirements, such as context window size, vision or tool support, and cost.
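As an illustration of capability-based selection, the sketch below filters models by requirements and then picks the cheapest match by input price. The selection rule, the `Capabilities` class, and `cheapest_model` are assumptions made for this example; the strategy that viben agents actually use is not specified in this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Capabilities:
    context_window: int
    supports_vision: bool
    supports_tools: bool
    cost_per_1k_input: float
    cost_per_1k_output: float


# Values copied from the model_capabilities example.
CAPS: Dict[str, Capabilities] = {
    "claude-sonnet-4-20250514": Capabilities(200_000, True, True, 0.003, 0.015),
    "claude-opus-4-20250514": Capabilities(200_000, True, True, 0.015, 0.075),
    "gpt-4-turbo": Capabilities(128_000, True, True, 0.01, 0.03),
    "gpt-4o-mini": Capabilities(128_000, True, True, 0.00015, 0.0006),
}


def cheapest_model(
    min_context: int, need_vision: bool = False, need_tools: bool = False
) -> Optional[str]:
    """Pick the cheapest model (by input price) that meets the requirements."""
    candidates = [
        (caps.cost_per_1k_input, name)
        for name, caps in CAPS.items()
        if caps.context_window >= min_context
        and (caps.supports_vision or not need_vision)
        and (caps.supports_tools or not need_tools)
    ]
    return min(candidates)[1] if candidates else None


print(cheapest_model(100_000))                    # fits in a 128K window
print(cheapest_model(150_000, need_vision=True))  # needs a 200K window
```

Ranking by input price alone is a simplification; a real selector might also weigh output price, latency, or provider status.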
Popular Models Reference
Anthropic Models
| Model | Context | Input Cost | Output Cost | Notes |
|---|---|---|---|---|
| claude-opus-4-20250514 | 200K | $15/1M | $75/1M | Most capable |
| claude-sonnet-4-20250514 | 200K | $3/1M | $15/1M | Balanced |
| claude-3-5-haiku-latest | 200K | $0.25/1M | $1.25/1M | Fastest |
OpenAI Models
| Model | Context | Input Cost | Output Cost | Notes |
|---|---|---|---|---|
| gpt-4-turbo | 128K | $10/1M | $30/1M | Most capable |
| gpt-4o | 128K | $2.5/1M | $10/1M | Balanced |
| gpt-4o-mini | 128K | $0.15/1M | $0.6/1M | Fastest |
| o1-preview | 128K | $15/1M | $60/1M | Reasoning |
Google Models
| Model | Context | Input Cost | Output Cost | Notes |
|---|---|---|---|---|
| gemini-1.5-pro | 2M | $1.25/1M | $5/1M | Large context |
| gemini-1.5-flash | 1M | $0.075/1M | $0.3/1M | Fast |
Local Models (Ollama)
| Model | Context | Cost | Notes |
|---|---|---|---|
| llama3 | 8K | Free | Open-source |
| llama3.1 | 128K | Free | Extended context |
| mistral | 32K | Free | Fast |
| codellama | 16K | Free | Code-focused |
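The reference tables quote prices per 1M tokens, while the `cost_per_1k_input` and `cost_per_1k_output` fields in `model_capabilities` are per 1K tokens, so the two differ by a factor of 1000. The following sketch shows the conversion and a rough per-request estimate. Both helper functions are hypothetical and exist only to make the arithmetic concrete.

```python
def per_1k(price_per_million: float) -> float:
    """Convert a per-1M-token price to the per-1K price used in model_capabilities."""
    return price_per_million / 1000


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Estimate the cost in USD of one request from per-1M-token prices."""
    total = input_tokens * input_per_million + output_tokens * output_per_million
    return total / 1_000_000


# claude-sonnet-4-20250514 is listed at $3 input / $15 output per 1M tokens.
cost = request_cost(12_000, 1_500, 3.0, 15.0)
print(f"${cost:.4f}")
```

For 12,000 input tokens and 1,500 output tokens the estimate is 12,000 × $3 / 1M + 1,500 × $15 / 1M = $0.036 + $0.0225 = $0.0585.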
Using Models with Aliases
Instead of remembering full model names, use aliases:
# Instead of this:
viben model set-default -n claude-sonnet-4-20250514
# Use an alias:
viben model set-default -n smart
Common alias conventions:
| Alias | Typical Model | Use Case |
|---|---|---|
| fast | claude-3-5-haiku-latest | Quick responses |
| smart | claude-sonnet-4-20250514 | Balanced quality |
| best | claude-opus-4-20250514 | Maximum quality |
| code | claude-sonnet-4-20250514 | Coding tasks |
| chat | claude-3-5-haiku-latest | Casual conversation |
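Conceptually, an alias is a lookup from a short name to a full model name, with full names passing through unchanged. The sketch below assumes a single, non-recursive lookup; whether viben allows an alias to point at another alias is not stated here, and the `resolve` helper is hypothetical.

```python
from typing import Dict

# A subset of the aliases from the models.yaml example.
ALIASES: Dict[str, str] = {
    "fast": "claude-3-5-haiku-latest",
    "smart": "claude-sonnet-4-20250514",
    "best": "claude-opus-4-20250514",
}


def resolve(name: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the full model name for an alias, or the name unchanged."""
    return aliases.get(name, name)


print(resolve("smart"))
print(resolve("gpt-4o"))  # not an alias, so it is returned as-is
```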
Agent Integration
Models can be configured per-agent:
# Set model for specific agent
viben agent config -n my-agent set model claude-sonnet-4-20250514
# Or use an alias
viben agent config -n my-agent set model smart
JSON Output
All commands support the --json flag:
viben model list --json
{
  "success": true,
  "data": {
    "default": "claude-sonnet-4-20250514",
    "models": [
      {
        "name": "claude-sonnet-4-20250514",
        "provider": "anthropic-main",
        "context_window": 200000,
        "status": "available"
      },
      {
        "name": "gpt-4-turbo",
        "provider": "openai-main",
        "context_window": 128000,
        "status": "available"
      }
    ]
  }
}
viben model status --json
{
  "success": true,
  "data": {
    "default": "claude-sonnet-4-20250514",
    "models": [
      {
        "name": "claude-sonnet-4-20250514",
        "provider": "anthropic-main",
        "status": "available"
      },
      {
        "name": "local-llama",
        "provider": "local-ollama",
        "status": "offline",
        "error": "Provider not running"
      }
    ]
  }
}
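JSON output is intended for scripts. As a minimal sketch, the Python below parses a payload shaped like the `viben model status --json` example and reports the models that are not available. It works on an embedded sample string rather than invoking the CLI, and it assumes the field names shown in the example (`success`, `data.models`, `status`, `error`) are stable; the helper `unavailable_models` is hypothetical.

```python
import json
from typing import Dict

# Sample payload in the shape of the `viben model status --json` example.
payload = """
{
  "success": true,
  "data": {
    "default": "claude-sonnet-4-20250514",
    "models": [
      {"name": "claude-sonnet-4-20250514", "provider": "anthropic-main",
       "status": "available"},
      {"name": "local-llama", "provider": "local-ollama",
       "status": "offline", "error": "Provider not running"}
    ]
  }
}
"""


def unavailable_models(raw: str) -> Dict[str, str]:
    """Map each unavailable model name to its error message, or its status."""
    doc = json.loads(raw)
    if not doc.get("success"):
        raise RuntimeError("command reported failure")
    return {
        m["name"]: m.get("error", m["status"])
        for m in doc["data"]["models"]
        if m["status"] != "available"
    }


print(unavailable_models(payload))
```

In a real script the string would come from the command's standard output instead of a literal.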
Next Steps
- Model Aliases - Create convenient shortcuts for model names
- Model Fallbacks - Set up automatic fallback chains
- Provider Management - Configure providers for your models