qwen3-next:80b-cloud

566.3K Downloads Updated 6 months ago

The first installment in the Qwen3-Next series, with strong parameter efficiency and inference speed.

tools thinking cloud 80b

Usage: medium
Context: 256K tokens
Size: 80B parameters

ollama run qwen3-next:80b-cloud

curl http://localhost:11434/api/chat \
  -d '{
    "model": "qwen3-next:80b-cloud",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

from ollama import chat

response = chat(
    model='qwen3-next:80b-cloud',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'qwen3-next:80b-cloud',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
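The examples above all send the same chat request to the local server. As a minimal sketch, the request can also be built with only the Python standard library, which makes the JSON body explicit. The helper names `build_chat_request` and `send_chat` are hypothetical, and `"stream": false` is an added assumption so that the reply arrives as a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # same endpoint as the curl example


def build_chat_request(prompt, model="qwen3-next:80b-cloud"):
    """Build (but do not send) a non-streaming /api/chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # assumption: ask for one JSON object, not a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the Ollama server running, `print(send_chat("Hello!"))` should print the model's reply, mirroring the client-library examples.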

Readme

Qwen3-Next-80B-A3B is the first installment in the Qwen3-Next series and features the following key enhancements:

Hybrid Attention: Replaces standard attention with a combination of Gated DeltaNet and Gated Attention, enabling efficient modeling of ultra-long contexts.
High-Sparsity Mixture-of-Experts (MoE): Achieves an extremely low activation ratio in MoE layers, drastically reducing FLOPs per token while preserving model capacity.
Stability Optimizations: Includes techniques such as zero-centered and weight-decayed layernorm, and other stabilizing enhancements for robust pre-training and post-training.
Multi-Token Prediction (MTP): Boosts pre-training performance and accelerates inference.
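
To make the "low activation ratio" idea concrete, here is a toy sketch of top-k expert routing in plain Python. It is not the Qwen3-Next implementation: the expert count, the gate logits, and the function names `top_k_route` and `moe_forward` are hypothetical, chosen only to show that a sparse MoE layer runs just a few experts per token. The final line reads the ratio off the model name, on the understanding that "80B-A3B" means roughly 3B of the 80B parameters are active per token.

```python
import math


def top_k_route(gate_logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of only the k selected experts."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy setup: 8 hypothetical experts, each a simple scaling function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

print(sorted(top_k_route(gate_logits, k=2)))  # experts that actually run
print(moe_forward(1.0, experts, gate_logits, k=2))
print(f"toy activation ratio: {2 / len(experts):.2%}")
print(f"80B-A3B activation ratio: {3 / 80:.2%}")
```

Only the selected experts are evaluated, so compute per token scales with k rather than with the total number of experts, while the unselected experts still contribute to overall model capacity.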