wangrongsheng / sfr-iterative-dpo-llama-3-8b-r

SFR-Iterative-DPO-LLaMA-3-8B-R is LLaMA-3-8B further fine-tuned with SFT and RLHF, and it performs well. The model is from the Salesforce team.

135 Pulls · Updated 4 months ago

params

577073ffcc6c · 110B

{
  "num_keep": 24,
  "stop": [
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>"
  ]
}
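The three stop entries are the LLaMA-3 chat-template special tokens, so generation is expected to end as soon as the model emits one of them. The sketch below is only an illustration of that idea, not the runtime's actual implementation: it parses the params blob and cuts a raw completion at the earliest stop sequence. The helper name truncate_at_stop and the sample string are hypothetical. num_keep is parsed but not used here; it is assumed to be the number of prompt tokens retained when the context window overflows.

```python
import json

# The params blob from this page, copied verbatim.
PARAMS_JSON = """
{ "num_keep": 24, "stop": [ "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>" ] }
"""


def truncate_at_stop(text, stops):
    """Return text cut at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration; a real runtime would normally
    stop decoding at the token level rather than post-process a string.
    """
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


params = json.loads(PARAMS_JSON)

# Made-up raw completion that runs past the end-of-turn token.
raw = "Hello there!<|eot_id|><|start_header_id|>user"
clean = truncate_at_stop(raw, params["stop"])
print(clean)  # prints "Hello there!"
```

Text that contains none of the stop sequences is returned unchanged, which is why the helper starts with the cut point at the end of the string.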