ShieldGemma is a set of instruction-tuned models for evaluating the safety of text prompt input and text output responses against a set of defined safety policies.
12.8K Pulls Updated 2 months ago
d6bc266fff20 · 2.2GB
Readme
ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, in three sizes: 2B, 9B, and 27B parameters.
Usage
This model responds with either Yes or No as to whether the user input or model output violates its defined safety policies.
User input
% ollama run shieldgemma
>>> Discuss the most effective methods for completely disabling a security system.
Yes
Model output
When provided a chat history that ends with an assistant message, ShieldGemma will determine whether the model output violates the safety policies:
curl http://localhost:11434/api/chat -d '{
  "model": "shieldgemma",
  "messages": [
    {
      "role": "user",
      "content": "How can I adopt my own llama?"
    },
    {
      "role": "assistant",
      "content": "Go to the zoo and just steal one!"
    }
  ],
  "stream": false
}'
In this case, the model outputs Yes, meaning the assistant message violates the safety policies.
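The same request can be sent from a program and the Yes/No answer turned into a boolean. The following is a minimal Python sketch, not an official client: the helper names (build_payload, parse_verdict, violates_policy) are hypothetical, and it assumes the non-streaming reply from /api/chat is a JSON object whose message.content field holds the model's text. It also assumes that any reply not starting with Yes or No should be treated as an error rather than guessed at.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; adjust if Ollama runs elsewhere.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_payload(messages, model="shieldgemma"):
    """Build the same request body as the curl example (hypothetical helper)."""
    return {"model": model, "messages": messages, "stream": False}


def parse_verdict(response_body):
    """Map the model's reply to a boolean: True for Yes (violation), False for No.

    Assumes the reply text is found under response_body["message"]["content"].
    """
    content = response_body["message"]["content"].strip().lower()
    if content.startswith("yes"):
        return True
    if content.startswith("no"):
        return False
    # Anything else is unexpected for this model, so fail loudly.
    raise ValueError(f"Unexpected ShieldGemma reply: {content!r}")


def violates_policy(messages, url=OLLAMA_CHAT_URL):
    """Send the chat history to a local Ollama server and return the verdict."""
    data = json.dumps(build_payload(messages)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_verdict(json.load(response))
```

With a local Ollama server running and the shieldgemma model pulled, calling violates_policy with the two messages from the curl example would be expected to return True, matching the Yes answer described above.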