Fietje: An open and efficient LLM for Dutch (chat)
225 Pulls Updated 6 months ago
8394fc31afbe · 3.0GB
model
arch phi2 · parameters 2.78B · quantization Q8_0 · 3.0GB
params
{"num_ctx":2048,"stop":["<|im_start|>","<|im_end|>"]}
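The params above set a 2048-token context window and the two ChatML stop tokens. In Ollama Modelfile syntax, an equivalent configuration would look like the sketch below (the GGUF path is hypothetical):

```
FROM ./fietje-2b-chat-Q8_0.gguf
PARAMETER num_ctx 2048
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
```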
74B
template
{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
156B
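The template above is a Go-template ChatML prompt format (truncated in this listing at the user turn). Assuming the full template follows the usual ChatML convention of closing the user turn and opening an assistant turn, a minimal Python sketch of the rendering — the helper name and example strings are illustrative — would be:

```python
def render_chatml(system: str, prompt: str) -> str:
    """Render a system/user pair into a ChatML prompt string,
    mirroring the conditional structure of the template above."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if prompt:
        parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    # Assumed continuation of the truncated template: open the
    # assistant turn so the model generates the reply after it.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml("Je bent een behulpzame assistent.", "Hallo!"))
```

The `stop` tokens in the params section (`<|im_start|>`, `<|im_end|>`) ensure generation halts at the end of the assistant turn rather than continuing into a new one.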
Readme
This repository contains quantized versions of BramVanroy/fietje-2b-chat.
Available quantization types and expected performance differences compared to the base f16 model; higher perplexity = worse (from llama.cpp):
Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
F16 : 13.00G @ 7B
Quants were made with release b2777 of llama.cpp.
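For reference, quants like these are produced with llama.cpp's `quantize` tool (its name at release b2777) from an f16 GGUF file; the file names below are hypothetical:

```sh
# Convert an f16 GGUF to Q8_0, the quant type used by this model
./quantize fietje-2b-chat-f16.gguf fietje-2b-chat-Q8_0.gguf Q8_0
```

Note that the sizes and perplexity deltas in the table above are llama.cpp's reference measurements on LLaMA-v1-7B, not on this 2.78B model, so they indicate relative quality loss per quant type rather than the exact file sizes listed at the top of this page.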