llava

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

vision 7b 13b 34b

1.7M Pulls Updated 9 months ago

llava:34b-v1.6-q5_1 ... /

template

a47b02e00552 · 106B

<|im_start|>system

{{ .System }}<|im_end|>

<|im_start|>user

{{ .Prompt }}<|im_end|>

<|im_start|>assistant