```shell
ollama run nemotron-3-super
ollama launch claude --model nemotron-3-super
ollama launch codex --model nemotron-3-super
ollama launch opencode --model nemotron-3-super
ollama launch openclaw --model nemotron-3-super
```
NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.
Nemotron-3-Super is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model’s reasoning capabilities can be configured through a flag in the chat template.
The model has 12B active parameters and 120B parameters in total.
Supported languages: English, French, German, Italian, Japanese, Spanish, and Chinese.
This model is ready for commercial use.
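Since the reasoning trace is toggled through the chat template, the switch is typically exposed via Ollama's chat API. Below is a minimal sketch that only builds the request payload for the `/api/chat` endpoint; the `think` field is the flag Ollama uses for reasoning-capable models, but whether Nemotron-3-Super honors it depends on the chat template shipped with the model (an assumption — check `ollama show nemotron-3-super` and your Ollama version).

```python
import json


def build_chat_request(prompt: str, think: bool = True) -> str:
    """Build a JSON payload for Ollama's /api/chat endpoint.

    `think` toggles the model's reasoning trace. This is a sketch:
    the field name and its effect on Nemotron-3-Super are assumptions
    based on Ollama's handling of other reasoning models.
    """
    payload = {
        "model": "nemotron-3-super",
        "messages": [{"role": "user", "content": prompt}],
        "think": think,   # False suppresses the reasoning trace
        "stream": False,
    }
    return json.dumps(payload)


# Example: a high-volume workload like IT ticket automation may want
# the final answer only, with reasoning disabled.
request_body = build_chat_request("Summarize this IT ticket.", think=False)
print(request_body)
```

The resulting body would be POSTed to a running Ollama server (by default `http://localhost:11434/api/chat`).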
| Benchmark | Nemotron-3-Super | Nemotron-3-Super FP8 | Nemotron-3-Super NVFP4 |
|---|---|---|---|
| **General Knowledge** | | | |
| MMLU-Pro | 83.73 | 83.63 | 83.33 |
| **Reasoning** | | | |
| HMMT Feb25 (with tools) | 94.73 | 94.38 | 95.36 |
| GPQA (no tools) | 79.23 | 79.36 | 79.42 |
| LiveCodeBench (v6, 2024-08 to 2025-05) | 78.69 | 78.44 | 78.44 |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 80.99 | 80.56 |
| SciCode (subtask) | 42.05 | 41.38 | 40.83 |
| HLE (no tools) | 18.26 | 17.42 | 17.42 |
| **Agentic** | | | |
| Terminal Bench (hard subset) | 25.78 | 26.04 | 24.48 |
| **TauBench V2** | | | |
| TauBench V2: Airline | 56.25 | 56.25 | 54.75 |
| TauBench V2: Retail | 62.83 | 63.05 | 63.38 |
| TauBench V2: Telecom | 64.36 | 63.93 | 63.27 |
| TauBench V2: Average | 61.15 | 61.07 | 60.46 |
| **Chat & Instruction Following** | | | |
| IFBench (prompt) | 72.58 | 72.32 | 73.30 |
| Scale AI Multi-Challenge | 55.23 | 54.35 | 52.80 |
| Arena-Hard-V2 (Hard Prompt) | 73.88 | 76.06 | 76.00 |
| **Long Context** | | | |
| AA-LCR | 58.31 | 57.69 | 58.06 |
| RULER-500 @ 128k (500 samples per task) | 96.79 | 96.85 | 95.99 |
| RULER-500 @ 256k (500 samples per task) | 96.60 | 96.33 | 96.52 |
| RULER-500 @ 512k (500 samples per task) | 96.09 | 95.66 | 96.23 |
| **Multilingual** | | | |
| MMLU-ProX (avg over languages) | 79.35 | 79.21 | 79.37 |