
Introduction

Nanbeige2-16B-Chat is the latest 16B-parameter model developed by the Nanbeige Lab, trained on 4.5T tokens of high-quality data.

During the alignment phase, we first trained the model on 1 million samples with Supervised Fine-Tuning (SFT), then applied curriculum learning on 400,000 high-quality samples of greater difficulty. Finally, we incorporated human feedback through Direct Preference Optimization (DPO), producing Nanbeige2-16B-Chat. The model achieves strong performance across a range of authoritative benchmark datasets.
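The DPO step above optimizes the policy directly on preference pairs rather than training a separate reward model. As a minimal sketch (not the Nanbeige Lab's actual implementation), the per-example DPO loss compares the policy's log-probability margin between the chosen and rejected response against the same margin under the frozen reference (SFT) model; the `beta` temperature and the function signature here are illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each *_logp argument is the summed log-probability of a full response
    under the policy or the frozen reference model. `beta` (illustrative
    value) controls how strongly the policy may deviate from the reference.
    """
    margin = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    # -log sigmoid(x) = log(1 + exp(-x)); small when the policy prefers
    # the chosen response more strongly than the reference does.
    return math.log1p(math.exp(-beta * margin))

# When the policy's preference margin equals the reference's, the loss is log 2.
print(round(dpo_loss(-1.0, -2.0, -1.0, -2.0), 4))  # 0.6931
```

Widening the policy's margin on the chosen response (e.g. `dpo_loss(-0.5, -3.0, -1.0, -2.0)`) drives the loss below `log 2`, which is what pushes the model toward the human-preferred outputs.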

Evaluation

We evaluated Nanbeige2-16B-Chat's general question-answering capability and human-preference alignment on several popular benchmark datasets. The model achieves notable results in single-turn English QA (AlpacaEval 2.0), single-turn Chinese QA (AlignBench), and multi-turn English QA (MT-Bench).

| AlpacaEval 2.0 | AlignBench | MT-Bench |
| --- | --- | --- |
| 43.0% / 40.4% | 7.62 | 8.60 |