Pruned to 98 experts gemma-4 a4b 26b v3

Capabilities: tools, thinking
ollama run mannix/gemma4-98e:Q6_K

Details (updated 4 days ago)

5aba7e420995 · 15GB · gemma4 · 19.9B · Q6_K
Template (excerpt):

{{- if or .System .Tools }}<bos><|turn>system {{ if .System }}{{ .System }} {{ end }}{{- if .Tools }

Parameters (excerpt):

{ "num_ctx": 256000, "repeat_last_n": 256, "repeat_penalty": 1.15, "stop": [
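The parameters above can be reproduced locally with a custom Ollama Modelfile; a minimal sketch (values copied from the listing; the stop sequences are truncated in the page excerpt, so they are omitted here):

```
FROM mannix/gemma4-98e:Q6_K
PARAMETER num_ctx 256000
PARAMETER repeat_last_n 256
PARAMETER repeat_penalty 1.15
```

Build and run it with `ollama create my-gemma4-98e -f Modelfile` followed by `ollama run my-gemma4-98e` (model name `my-gemma4-98e` is a placeholder).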

Readme

The gemma-4-A4B-98e-v3 model is pruned specifically to keep reasoning intact; as a result, its token usage is higher than the original 128e version's. It is superseded by v4 (https://ollama.com/mannix/gemma4-98e-v4), which scores better and stays within the original 128e token usage:

HumanEval-chat token usage (164 problems × max=3072)

  ┌──────────────┬─────┬─────┬─────┬─────┬──────┬─────┐
  │   variant    │ min │ p10 │ p50 │ p90 │ max  │ avg │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 128e @3072   │  35 │ 125 │ 314 │ 589 │  917 │ 334 │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 98e-v4 @3072 │  35 │ 114 │ 304 │ 648 │  895 │ 340 │
  ├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
  │ 98e-v3 @3072 │  35 │ 206 │ 490 │ 897 │ 1013 │ 512 │
  └──────────────┴─────┴─────┴─────┴─────┴──────┴─────┘
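Summary statistics like the table's can be derived from per-problem completion token counts; a minimal sketch (the exact percentile method used for the table is not stated, so nearest-rank interpolation is an assumption here):

```python
from statistics import mean

def pct(sorted_vals, p):
    # Nearest-rank percentile over an already-sorted list.
    # Assumption: the table may use a different interpolation method.
    k = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[max(0, min(len(sorted_vals) - 1, k))]

def summarize(token_counts):
    """Return min/p10/p50/p90/max/avg for a list of token counts."""
    s = sorted(token_counts)
    return {
        "min": s[0],
        "p10": pct(s, 10),
        "p50": pct(s, 50),
        "p90": pct(s, 90),
        "max": s[-1],
        "avg": round(mean(s)),
    }
```

Feeding it the 164 per-problem token counts from a HumanEval-chat run would reproduce one row of the table.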

Template fixed for tool usage.

Model on HF:

https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it

Full GGUF:

https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF