88 Downloads Updated 3 days ago
ollama run mannix/gemma4-98e-v4:Q5_K_S
Updated 3 days ago
3 days ago
b9d140f78463 · 14GB ·
The gemma-4-A4B-98e-v4 is pruned specifically to keep general knowledge as wide as possible, unlike the v3 aimed specifically at keeping intact reasoning. The token usage is similar as the original 128e version, lower than v3 which needs 1.7x.
HumanEval-chat token usage (164 problems × max=3072)
┌──────────────┬─────┬─────┬─────┬─────┬──────┬─────┐
│ variant │ min │ p10 │ p50 │ p90 │ max │ avg │
├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
│ 128e @3072 │ 35 │ 125 │ 314 │ 589 │ 917 │ 334 │
├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
│ 98e-v4 │ 35 │ 114 │ 304 │ 648 │ 895 │ 340 │
├──────────────┼─────┼─────┼─────┼─────┼──────┼─────┤
│ 98e-v3 @3072 │ 35 │ 206 │ 490 │ 897 │ 1013 │ 512 │
└──────────────┴─────┴─────┴─────┴─────┴──────┴─────┘
Template fixed for tools usage
Model on HF:
https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v4-it
Full GGUF:
https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v4-it-GGUF