llama3.1 DPO Chinese-aligned version
483 pulls, 1 tag, updated 7 weeks ago
llama3-8b-instruct-dpo-zh-loftq: trained with DPO (beta 0.5) using LoRA adapters of rank 128 with LoftQ initialization
Updated 4 months ago
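
For readers unfamiliar with the "DPO beta" setting above, here is a minimal sketch of the standard DPO loss for a single preference pair, showing where beta enters. It is not the author's training code: the function name `dpo_loss` and all argument names are hypothetical, and only the default `beta=0.5` comes from the description. Inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the trained policy and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """DPO loss for one (chosen, rejected) pair; all names are illustrative."""
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the preference margin: a larger beta penalizes a given
    # margin shortfall more sharply.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a form that avoids overflow for large |margin|.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.5)
print(loss)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen response relative to the rejected one.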