openbmb/minicpm5:latest


Highly efficient large language models (LLMs) designed for end-side (on-device) deployment.

ollama run openbmb/minicpm5
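
Besides the interactive CLI, a locally running Ollama server can be called over HTTP. The sketch below assumes the server listens on Ollama's default address (`http://localhost:11434`) and that the model has already been pulled under the tag `openbmb/minicpm5`. The helper names `build_chat_request` and `chat` are illustrative, not part of any library, and the 0.7 temperature simply mirrors the default shown in the model's parameters.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "openbmb/minicpm5"


def build_chat_request(prompt, system=None, temperature=0.7):
    """Build the JSON body for a non-streaming /api/chat request."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {
        "model": MODEL,
        "messages": messages,
        "stream": False,  # ask for one complete JSON response
        "options": {"temperature": temperature},
    }


def chat(prompt, system=None, url=OLLAMA_URL, timeout=120.0):
    """Send the prompt to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, system)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # A non-streaming chat response carries the reply under message.content.
    return body["message"]["content"]
```

With the server running, `print(chat("Summarize GQA in one sentence."))` would print the model's reply. `build_chat_request` is kept separate so the payload can be inspected without a network call.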

Details

08239e8f70e0 · 688MB · llama · 1.08B · Q4_K_M
<s>{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ range .Messages }}<|im_sta
{ "stop": [ "<|im_end|>", "<|endoftext|>" ], "temperature": 0.7, "to

Readme

MiniCPM Tech Report | GitHub Repo | UltraData | MiniCPM Desk Pet | Online Demo


Highlights

We are releasing MiniCPM5-1B, the first model in the MiniCPM5 series. It is a dense 1B-parameter Transformer built for on-device and local deployment in resource-constrained scenarios, and it reaches state-of-the-art (SOTA) results among open-source models in the 1B class.

🏆 1B-class open-source SOTA: MiniCPM5-1B achieves the best results among the strong open-source models of the same size class that it was compared against. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.

[Figure: MiniCPM5-1B capability comparison by domain]

🧠 Hybrid Reasoning: the chat template has built-in <think> support, toggled via enable_thinking, so the same checkpoint serves as both a fast assistant and a deliberate reasoner.
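
When thinking is enabled, client code usually needs to separate the reasoning from the final answer. The sketch below assumes the model wraps its reasoning in a single `<think>...</think>` block ahead of the answer, which the `<think>` template suggests but the text above does not spell out. `split_reasoning` and `ParsedReply` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class ParsedReply(NamedTuple):
    reasoning: str  # text found inside <think>...</think>, or "" if absent
    answer: str     # everything outside the think block


# Assumption: at most one think block per reply; DOTALL lets it span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> ParsedReply:
    """Split a raw model reply into (reasoning, answer)."""
    match = _THINK_RE.search(text)
    if match is None:
        # No think block, e.g. when thinking is disabled: all of it is answer.
        return ParsedReply(reasoning="", answer=text.strip())
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return ParsedReply(reasoning=reasoning, answer=answer)
```

Because the function falls back to treating the whole reply as the answer when no think block is present, the same call site can handle both modes.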

🛠️ Deployment / Fine-tuning Resources: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.

🐱 Desktop Pet: a local-LLM desktop pet driven by MiniCPM5-1B.

Model List

Use this directory to choose the model format that matches your runtime:

Model Information

MiniCPM5-1B has the following features:

  • Type: Causal Language Model
  • Architecture: Standard LlamaForCausalLM
  • Number of Parameters: 1,080,632,832
  • Number of Non-Embedding Parameters: 679,552,512
  • Number of Layers: 24
  • Number of Attention Heads (GQA): 16 for Q and 2 for KV
  • Context Length: 131,072
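
The figures above can be combined into a few back-of-the-envelope numbers. The sketch below derives the embedding parameter count and the query-to-KV head ratio directly from the list, then estimates KV-cache size at full context. The per-head dimension is not stated in this document, so `HEAD_DIM = 128` is a placeholder assumption, as is the 2-byte (fp16/bf16) cache precision. The absolute sizes are illustrative only, but the 8x ratio between the two layouts does not depend on either assumption.

```python
# Figures taken from the feature list above.
TOTAL_PARAMS = 1_080_632_832
NON_EMBEDDING_PARAMS = 679_552_512
NUM_LAYERS = 24
NUM_Q_HEADS = 16
NUM_KV_HEADS = 2
CONTEXT_LENGTH = 131_072

# Embedding parameters are whatever remains after the non-embedding ones.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS
embedding_share = embedding_params / TOTAL_PARAMS

# With GQA, each KV head is shared by this many query heads.
queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS


def kv_cache_bytes(seq_len, kv_heads, head_dim,
                   num_layers=NUM_LAYERS, bytes_per_value=2):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # The factor of 2 covers one key and one value tensor per layer.
    return 2 * num_layers * kv_heads * head_dim * bytes_per_value * seq_len


HEAD_DIM = 128  # ASSUMPTION: not given in the model card; illustrative only.

gqa_cache = kv_cache_bytes(CONTEXT_LENGTH, NUM_KV_HEADS, HEAD_DIM)
# Hypothetical baseline: one KV head per query head (no sharing).
mha_cache = kv_cache_bytes(CONTEXT_LENGTH, NUM_Q_HEADS, HEAD_DIM)

print(f"embedding parameters: {embedding_params:,} ({embedding_share:.1%})")
print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache at full context, 2 KV heads: {gqa_cache / 1024**3:.1f} GiB")
print(f"KV cache at full context, 16 KV heads: {mha_cache / 1024**3:.1f} GiB")
```

Under these assumptions, sharing 2 KV heads across 16 query heads cuts the cache to one eighth of what 16 separate KV heads would need, which is what makes the 131,072-token context more practical on small devices.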

Introduction

MiniCPM5-1B is the first checkpoint in the MiniCPM5 series. It is designed for local assistants, coding agents, tool-use workflows, and reasoning scenarios where a compact model is preferred. The model keeps a small deployment footprint while providing native long-context support and both Think and No Think chat modes from the same checkpoint.

Note: For local deployment, refer to this document.