evilfreelancer/rugpt3:xl-q4_k_m

766 pulls · updated 1 week ago

ruGPT-3 family of Russian Language Models

Sizes: 125m · 356m · 760m · 1.3b

ollama run evilfreelancer/rugpt3:xl-q4_k_m

Details

992b3648c7c7 · 935 MB · gpt2 · 1.42B parameters · Q4_K_M

Parameters: { "num_ctx": 2048, "repeat_penalty": 1.1, "temperature": 0.8, "top_p": 0.95 }

Template: {{ if .System }}{{ .System }}{{ end }}{{ if .Prompt }}{{ .Prompt }}{{ end }}

Readme

ruGPT-3 - Russian Language Models

A curated collection of all ruGPT-3 family models converted to GGUF and packaged for Ollama. These are foundational (base) language models for Russian, originally developed by SberDevices / ai-forever and published in 2021. They are not instruction-tuned: they perform text completion only.

Think of this repository as a museum of classic Russian NLP models, preserved and made accessible with modern tooling. All four sizes from the ruGPT-3 family are available here in quantized GGUF format, ready to run locally via Ollama.

Details in the paper: A Family of Pretrained Transformer Language Models for Russian.

Quick Start

ollama run evilfreelancer/rugpt3
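Besides the CLI, a running Ollama server also exposes an HTTP API. A minimal sketch of a completion request (the endpoint and fields follow Ollama's `/api/generate` API; the `build_generate_request` helper is just illustrative, and since these are base models the prompt is a prefix to continue, not an instruction):

```python
def build_generate_request(model, prompt, options=None):
    """Build a JSON payload for Ollama's /api/generate endpoint.

    ruGPT-3 models are base models, so `prompt` is a raw prefix the
    model will continue, not an instruction or chat message.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    return payload

payload = build_generate_request(
    "evilfreelancer/rugpt3:xl-q4_k_m",
    "Москва - столица",  # "Moscow is the capital" -- a prefix to continue
    options={"temperature": 0.8, "num_ctx": 2048},
)

# To actually send it against a running Ollama server (default port 11434):
# import json, urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```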

Available Models

All models share a 2048-token context window and use a BPE tokenizer with a 50,264-token vocabulary.

ruGPT-3 Small (125M parameters)

Based on ai-forever/rugpt3small_based_on_gpt2. Standard GPT-2 architecture pretrained on 80B tokens of Russian text for ~3 epochs, then finetuned with a 2048-token context window.

| Tag | Quantization | Size |
|-----|--------------|------|
| small, 125m, 125m-q8_0, small-q8_0 | Q8_0 | 182 MB |
| 125m-fp16, small-fp16 | FP16 | 334 MB |

ruGPT-3 Medium (356M parameters)

Based on ai-forever/rugpt3medium_based_on_gpt2. Pretrained on 80B tokens for 3 epochs, then finetuned with a 2048-token context window. Test perplexity: 17.4.

| Tag | Quantization | Size |
|-----|--------------|------|
| medium, 356m, 356m-q8_0, medium-q8_0 | Q8_0 | 443 MB |
| 356m-fp16, medium-fp16 | FP16 | 823 MB |

ruGPT-3 Large (760M parameters)

Based on ai-forever/rugpt3large_based_on_gpt2. Pretrained on 80B tokens for 3 epochs, then finetuned for 1 epoch with a 2048-token context window. Test perplexity: 13.6.

| Tag | Quantization | Size |
|-----|--------------|------|
| large, 760m, 760m-q8_0, large-q8_0 | Q8_0 | 904 MB |
| 760m-fp16, large-fp16 | FP16 | 1.7 GB |

ruGPT-3 XL (1.3B parameters)

Based on ai-forever/rugpt3xl. A deeply modified GPT-2 architecture (Pre-LayerNorm, fused QKV projections, Megatron-LM style). Trained from scratch on 80B tokens for 4 epochs using DeepSpeed + Megatron-LM, then finetuned with a 2048-token context window. Test perplexity: 12.05.

The XL variant was converted from the original Megatron-LM checkpoint to HuggingFace format and then to GGUF. More quantization options are available due to its larger size.

| Tag | Quantization | Size |
|-----|--------------|------|
| xl, 1.3b, 1.3b-q8_0, xl-q8_0, latest | Q8_0 | 1.5 GB |
| 1.3b-q4_k_m, xl-q4_k_m | Q4_K_M | 935 MB |
| 1.3b-q4_0, xl-q4_0 | Q4_0 | 845 MB |
| 1.3b-fp16, xl-fp16 | FP16 | 2.9 GB |
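The file sizes above roughly track bits per weight: FP16 stores 16 bits per parameter, while block-quantized GGUF formats average out to fewer (the figures below are approximate averages; real files also carry metadata and may quantize some tensors, such as embeddings, differently, so they run slightly larger than the estimate). A back-of-the-envelope check for the 1.42B-parameter XL:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8.
# Bits-per-weight values are approximate averages for common
# llama.cpp quantization formats, not exact specifications.
PARAMS = 1.42e9  # reported parameter count of ruGPT-3 XL

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # 8-bit weights plus a per-block scale
    "Q4_K_M": 4.85,  # mixed 4/6-bit "k-quant" blocks (approximate)
    "Q4_0": 4.5,     # 4-bit weights plus a per-block scale
}

estimates = {name: PARAMS * bits / 8 / 1e9 for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.2f} GB")
```

This lands close to the published sizes: ~2.84 GB vs 2.9 GB for FP16 and ~1.51 GB vs 1.5 GB for Q8_0, with the 4-bit estimates a little under the actual 935 MB and 845 MB files.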

Usage Examples

Run with a specific size:

ollama run evilfreelancer/rugpt3:small
ollama run evilfreelancer/rugpt3:medium
ollama run evilfreelancer/rugpt3:large
ollama run evilfreelancer/rugpt3:xl

Run a specific quantization:

ollama run evilfreelancer/rugpt3:1.3b-q4_k_m
ollama run evilfreelancer/rugpt3:760m-fp16

Limitations

  • These are base models: they perform text completion, not instruction following or chat
  • Trained primarily on Russian text, with limited capability in other languages
  • Maximum context length is 2048 tokens
  • May generate biased, factually incorrect, or offensive content
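Because the context window is capped at 2048 tokens, long prompts need to be checked before sending. A crude sketch using a characters-per-token heuristic (the 3.5 chars/token figure is an assumption for Russian BPE text, not a property of the actual tokenizer; for exact counts you would tokenize with the model's own vocabulary):

```python
def rough_token_count(text, chars_per_token=3.5):
    """Crude token estimate; the model's real BPE tokenizer is the
    only authoritative count -- this heuristic just approximates it."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(prompt, num_ctx=2048, reserve=256):
    """Check a prompt against the 2048-token window, leaving
    `reserve` tokens of room for the generated completion."""
    return rough_token_count(prompt) <= num_ctx - reserve
```

For example, a 100-character prefix easily fits, while a 10,000-character document estimated at ~2,857 tokens would need to be truncated first.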

Links