evilfreelancer/rugpt3:xl-q4_k_m

766 pulls · updated 1 week ago

ruGPT-3 family of Russian Language Models

Sizes: 125m · 356m · 760m · 1.3b

ollama run evilfreelancer/rugpt3:xl-q4_k_m

Details

992b3648c7c7 · 935 MB · gpt2 · 1.42B parameters · Q4_K_M

Parameters: { "num_ctx": 2048, "repeat_penalty": 1.1, "temperature": 0.8, "top_p": 0.95 }

Template: {{ if .System }}{{ .System }}{{ end }}{{ if .Prompt }}{{ .Prompt }}{{ end }}

Readme

ruGPT-3 - Russian Language Models

A curated collection of all ruGPT-3 family models converted to GGUF and packaged for Ollama. These are foundational (base) language models for Russian, originally developed by SberDevices / ai-forever and published in 2021. They are not instruction-tuned: they perform text completion only.

Think of this repository as a museum of classic Russian NLP models, preserved and made accessible with modern tooling. All four sizes from the ruGPT-3 family are available here in quantized GGUF format, ready to run locally via Ollama.

Details in the paper: A Family of Pretrained Transformer Language Models for Russian.

Quick Start

ollama run evilfreelancer/rugpt3
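Besides the CLI, a running Ollama server also exposes an HTTP API. A minimal sketch of a completion request (the endpoint and fields follow Ollama's `/api/generate` API; the `build_generate_request` helper is just illustrative, and since these are base models the prompt is a prefix to continue, not an instruction):

```python
def build_generate_request(model, prompt, options=None):
    """Build a JSON payload for Ollama's /api/generate endpoint.

    ruGPT-3 models are base models, so `prompt` is a raw prefix the
    model will continue, not an instruction or chat message.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    return payload

payload = build_generate_request(
    "evilfreelancer/rugpt3:xl-q4_k_m",
    "Москва - столица",  # "Moscow is the capital" -- a prefix to continue
    options={"temperature": 0.8, "num_ctx": 2048},
)

# To actually send it against a running Ollama server (default port 11434):
# import json, urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```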

Available Models

All models share a 2048-token context window and use a BPE tokenizer with a 50,264-token vocabulary.

ruGPT-3 Small (125M parameters)

Based on ai-forever/rugpt3small_based_on_gpt2. Standard GPT-2 architecture pretrained on 80B tokens of Russian text for ~3 epochs, then finetuned with a 2048-token context window.

| Tag | Quantization | Size |
|-----|--------------|------|
| small, 125m, 125m-q8_0, small-q8_0 | Q8_0 | 182 MB |
| 125m-fp16, small-fp16 | FP16 | 334 MB |

ruGPT-3 Medium (356M parameters)

Based on ai-forever/rugpt3medium_based_on_gpt2. Pretrained on 80B tokens for 3 epochs, then finetuned with a 2048-token context window. Test perplexity: 17.4.

| Tag | Quantization | Size |
|-----|--------------|------|
| medium, 356m, 356m-q8_0, medium-q8_0 | Q8_0 | 443 MB |
| 356m-fp16, medium-fp16 | FP16 | 823 MB |

ruGPT-3 Large (760M parameters)

Based on ai-forever/rugpt3large_based_on_gpt2. Pretrained on 80B tokens for 3 epochs, then finetuned for 1 epoch with a 2048-token context window. Test perplexity: 13.6.

| Tag | Quantization | Size |
|-----|--------------|------|
| large, 760m, 760m-q8_0, large-q8_0 | Q8_0 | 904 MB |
| 760m-fp16, large-fp16 | FP16 | 1.7 GB |

ruGPT-3 XL (1.3B parameters)

Based on ai-forever/rugpt3xl. A deeply modified GPT-2 architecture (Pre-LayerNorm, fused QKV projections, Megatron-LM style). Trained from scratch on 80B tokens for 4 epochs using DeepSpeed + Megatron-LM, then finetuned with a 2048-token context window. Test perplexity: 12.05.

The XL variant was converted from the original Megatron-LM checkpoint to HuggingFace format and then to GGUF. More quantization options are available due to its larger size.

| Tag | Quantization | Size |
|-----|--------------|------|
| xl, 1.3b, 1.3b-q8_0, xl-q8_0, latest | Q8_0 | 1.5 GB |
| 1.3b-q4_k_m, xl-q4_k_m | Q4_K_M | 935 MB |
| 1.3b-q4_0, xl-q4_0 | Q4_0 | 845 MB |
| 1.3b-fp16, xl-fp16 | FP16 | 2.9 GB |
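The file sizes above roughly track bits per weight: FP16 stores 16 bits per parameter, while block-quantized GGUF formats average out to fewer (the figures below are approximate averages; real files also carry metadata and may quantize some tensors, such as embeddings, differently, so they run slightly larger than the estimate). A back-of-the-envelope check for the 1.42B-parameter XL:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8.
# Bits-per-weight values are approximate averages for common
# llama.cpp quantization formats, not exact specifications.
PARAMS = 1.42e9  # reported parameter count of ruGPT-3 XL

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # 8-bit weights plus a per-block scale
    "Q4_K_M": 4.85,  # mixed 4/6-bit "k-quant" blocks (approximate)
    "Q4_0": 4.5,     # 4-bit weights plus a per-block scale
}

estimates = {name: PARAMS * bits / 8 / 1e9 for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.2f} GB")
```

This lands close to the published sizes: ~2.84 GB vs 2.9 GB for FP16 and ~1.51 GB vs 1.5 GB for Q8_0, with the 4-bit estimates a little under the actual 935 MB and 845 MB files.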

Usage Examples

Run with a specific size:

ollama run evilfreelancer/rugpt3:small
ollama run evilfreelancer/rugpt3:medium
ollama run evilfreelancer/rugpt3:large
ollama run evilfreelancer/rugpt3:xl

Run a specific quantization:

ollama run evilfreelancer/rugpt3:1.3b-q4_k_m
ollama run evilfreelancer/rugpt3:760m-fp16

Limitations

  • These are base models: they perform text completion, not instruction following or chat
  • Trained primarily on Russian text, with limited capability in other languages
  • Maximum context length is 2048 tokens
  • May generate biased, factually incorrect, or offensive content
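Because the context window is capped at 2048 tokens, long prompts need to be checked before sending. A crude sketch using a characters-per-token heuristic (the 3.5 chars/token figure is an assumption for Russian BPE text, not a property of the actual tokenizer; for exact counts you would tokenize with the model's own vocabulary):

```python
def rough_token_count(text, chars_per_token=3.5):
    """Crude token estimate; the model's real BPE tokenizer is the
    only authoritative count -- this heuristic just approximates it."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(prompt, num_ctx=2048, reserve=256):
    """Check a prompt against the 2048-token window, leaving
    `reserve` tokens of room for the generated completion."""
    return rough_token_count(prompt) <= num_ctx - reserve
```

For example, a 100-character prefix easily fits, while a 10,000-character document estimated at ~2,857 tokens would need to be truncated first.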

Links