llava

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

Capabilities: vision · Sizes: 7b, 13b, 34b

1.7M pulls · Updated 9 months ago

llava:34b-v1.6-q4_0 ... /

params

f02dd72bb242 · 59B

{
  "stop": [
    "<|im_start|>",
    "<|im_end|>"
  ]
}
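This params blob sets two stop sequences, the ChatML-style markers <|im_start|> and <|im_end|>. Generation halts as soon as the model emits either one. The following is a minimal sketch of that behavior. It assumes stop strings are matched literally in the decoded text and that output is cut just before the earliest match. The helper `apply_stop` and the sample output string are hypothetical illustrations, not Ollama's internal implementation or API.

```python
import json

# The params blob above (digest f02dd72bb242), reproduced as a JSON string.
PARAMS_JSON = '{"stop": ["<|im_start|>", "<|im_end|>"]}'


def apply_stop(text: str, stops: list[str]) -> str:
    """Cut `text` just before the earliest occurrence of any stop string.

    Hypothetical helper: it mimics what a runtime does when a stop
    sequence appears in generated output. It is not Ollama's own code.
    """
    cut = len(text)
    for s in stops:
        i = text.find(s)
        # Keep the smallest index so the earliest marker wins,
        # regardless of the order of the stop list.
        if i != -1 and i < cut:
            cut = i
    return text[:cut]


params = json.loads(PARAMS_JSON)

# Hypothetical raw model output that runs past the end of its turn.
raw = "The image shows a red bicycle.<|im_end|>\n<|im_start|>user\n"
print(apply_stop(raw, params["stop"]))  # prints "The image shows a red bicycle."
```

Because every stop string is checked and the smallest match index is kept, the reply ends at whichever marker appears first. The marker itself is not included in the returned text.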