llava

🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

vision 7b 13b 34b

1.7M Pulls Updated 9 months ago

llava:13b-v1.6 ... /

params

7215dae26124 · 33B

{
  "stop": [
    "USER:",
    "ASSSISTANT:"
  ]
}
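As a rough illustration of what a stop list like the one above does, the sketch below cuts generated text at the earliest occurrence of any listed sequence. It is a minimal approximation of that behavior, not the runtime's actual implementation. The helper name `truncate_at_stop` and the sample string are hypothetical. The stop values are copied verbatim from the params shown above.

```python
import json

# Params copied verbatim from the listing above, spelling included.
PARAMS_JSON = '{"stop": ["USER:", "ASSSISTANT:"]}'


def truncate_at_stop(text, stop_sequences):
    """Return `text` cut just before the earliest stop sequence found.

    Hypothetical helper for illustration only. Matching is an exact,
    case-sensitive substring search.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


params = json.loads(PARAMS_JSON)

# Made-up model output that runs on into the next conversation turn.
raw = "The image shows a volcano.\nUSER: what else?"
print(repr(truncate_at_stop(raw, params["stop"])))
```

Because the match is an exact substring comparison, a stop entry only takes effect when the generated text contains that string character for character.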