qwen3-next:80b-a3b-thinking-q4_K_M

2,733 yesterday

The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.

tools thinking cloud 80b

5 days ago

b2ebb986e4e9 · 50GB ·

qwen3next
·
79.7B
·
Q4_K_M
{{- $lastUserIdx := -1 -}} {{- range $idx, $msg := .Messages -}} {{- if eq $msg.Role "user" }}{{ $la
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR US
{ "repeat_penalty": 1, "stop": [ "<|im_start|>", "<|im_end|>" ], "te

Readme

image.png

Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series and features the following key enhancements:

  • Hybrid Attention: Replaces standard attention with the combination of Gated DeltaNet and Gated Attention, enabling efficient context modeling for ultra-long context length.
  • High-Sparsity Mixture-of-Experts (MoE): Achieves an extreme low activation ratio in MoE layers, drastically reducing FLOPs per token while preserving model capacity.
  • Stability Optimizations: Includes techniques such as zero-centered and weight-decayed layernorm, and other stabilizing enhancements for robust pre-training and post-training.
  • Multi-Token Prediction (MTP): Boosts pretraining model performance and accelerates inference.