Policies Overview
Policies & GuardRails enable organizations to control the use of AI Models and Applications, protect AI Models from misuse and attacks, and protect Users and their Organizations from harmful responses sent from AI Models.
Policies and GuardRails use our proprietary AI to evaluate User Prompts and Intentions, then take action to protect your Models. In the same way, Policies process the responses from AI Models, protecting Users from harmful responses.
Policies can perform these Actions:
- Allow user prompts to travel unimpeded.
- Warn users about their prompts.
- Route prompts to a different AI Model than the one the User connected to.
- Block prompts.
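The four actions above can be sketched as a simple dispatch. This is an illustrative sketch only: the names (`PolicyAction`, `apply_action`, the routed model name) are invented for this example and are not part of the WitnessAI API.

```python
from enum import Enum

class PolicyAction(Enum):
    """The four actions a Policy can take on a user prompt (hypothetical names)."""
    ALLOW = "allow"   # let the prompt travel unimpeded
    WARN = "warn"     # deliver the prompt, but show the user a warning
    ROUTE = "route"   # redirect the prompt to a different AI Model
    BLOCK = "block"   # stop the prompt entirely

def apply_action(action: PolicyAction, prompt: str, target_model: str) -> dict:
    """Illustrative dispatch: decide what happens to a matched prompt."""
    if action is PolicyAction.ALLOW:
        return {"deliver_to": target_model, "prompt": prompt}
    if action is PolicyAction.WARN:
        return {"deliver_to": target_model, "prompt": prompt,
                "user_message": "Caution: this prompt matched a Policy."}
    if action is PolicyAction.ROUTE:
        # e.g. send Technical Support behaviors to an internal model instead
        return {"deliver_to": "internal-support-model", "prompt": prompt}
    return {"deliver_to": None, "user_message": "Prompt blocked by Policy."}
```

The key design point is that Warn and Allow both deliver the prompt, while Route substitutes the destination Model and Block drops the prompt entirely.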
What is a Policy?
So what’s the difference between a Policy and a GuardRail, and how do they work together?
You can think of a Policy as the container, or wrapper, for its set of GuardRails, or rules.
The Policy has fields such as the Name, Description, Source of Prompts (i.e. Users), Destinations for Prompts (i.e. Models), and the list of active GuardRails (i.e. Rules).
These fields are how Policies match the AI Conversations between Users and Models. When the attributes of a conversation match the fields of a Policy, the Policy “Triggers”, passing the conversation to its GuardRails, which then perform their functions.
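To make the matching step concrete, here is a minimal sketch of how a Policy's fields might be compared against a conversation. The field names and matching logic are assumptions for illustration; the document does not describe WitnessAI's actual matching implementation.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    # Illustrative fields mirroring those described above (names are assumptions)
    name: str
    description: str
    sources: list        # which Users this Policy applies to
    destinations: list   # which Models this Policy applies to
    guardrails: list     # the active GuardRails (rules) for this Policy

def policy_triggers(policy: Policy, user: str, model: str) -> bool:
    """A Policy 'Triggers' when the conversation's User and Model
    both match the Policy's Source and Destination fields."""
    return user in policy.sources and model in policy.destinations

policy = Policy(
    name="Engineering AI use",
    description="Controls engineering prompts to external models",
    sources=["alice", "bob"],
    destinations=["external-llm"],
    guardrails=["Data Protection", "Risk Analysis"],
)

if policy_triggers(policy, user="alice", model="external-llm"):
    # The conversation is passed to each active GuardRail in turn
    for rail in policy.guardrails:
        print(f"Applying GuardRail: {rail}")
```

Once a Policy triggers, each GuardRail in its list evaluates the conversation independently, as described in the next section.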
What are GuardRails?
GuardRails are the rules engines of WitnessAI. When applied to individual Policies, each GuardRail evaluates User Prompts and/or Model Responses according to its specialized purpose, and takes action as needed to alert and protect your Users, Customers, and your AI Models and Applications. Beyond the individuals and models, this protects your Organization from legal and regulatory risks, as well as risks to your business operations and reputation.
Below are descriptions of the GuardRails. For detailed information and step-by-step instructions on how to use them, see the Policies & GuardRails documentation section.
- Behavioral Activity is WitnessAI's modeled user-intention GuardRail. The purpose of this GuardRail is to model user prompt activity and enable a level of control over these activities. When this GuardRail detects these activities, it provides the option to Allow, Warn, Block, or Route the prompt to another model. For example, Technical Support behaviors could be routed automatically to an internal model supporting employees.
- Data Protection is WitnessAI's data leakage, anonymization, and control GuardRail. The purpose of this GuardRail is to protect company confidential information from being sent to AI models by way of prompts. For example, US Social Security Numbers (SSNs) are automatically tokenized when sent to the model, and then reconstituted by the GuardRail in the AI model's response to the user. When this GuardRail detects these activities, it provides the option to Warn, Block, or Route the user with a customizable message.
- Harmful Response Prevention (Beta) is WitnessAI's response analysis and control GuardRail. The purpose of this GuardRail is to analyze the responses from AI models to user prompts, detect harmful responses, and optionally prevent them from being sent to the user. Harmful responses are evaluated in three broad categories: harm to self, harm to others, and illegal activity. When this GuardRail detects these activities, it provides the option to Allow, Warn, or Block the response from the AI model with a customizable message.
- Model Identity Protection is WitnessAI's model identity assurance GuardRail. The purpose of this GuardRail is to provide two controls: the first continually instructs the model about its own identity, reducing the chance that the model's identity is affected by user prompts; the second analyzes the model's responses to ensure they are consistent with the model's identity. When this GuardRail detects inconsistencies in the model's responses, it provides the option to Allow or Block the response with a customizable message.
- Model Protection is WitnessAI's Jailbreak and Prompt Injection GuardRail. The purpose of this GuardRail is to protect Internal Models, or Models that the business exposes to the outside. When this GuardRail detects these activities, it provides the option to Allow, Warn, or Block the user with a customizable message.
- Organizational Behavior (Beta) is WitnessAI's modeled multi-prompt-aware GuardRail. The purpose of this GuardRail is to examine an employee's prompts and score various “meta” behaviors, for example, an employee who may wish to quit their position. When this GuardRail detects these activities, it provides the option to create an Alert in the Witness Console, or push an event to a supported SIEM tool.
- Risk Analysis is WitnessAI's prompt risk analysis and control GuardRail. The purpose of this GuardRail is to analyze user prompts to AI models and detect varying degrees of risk or harm to the user or the business. Risk is evaluated across multiple topics, including Data Theft, Harmful Code Generation, Violence, and others. When this GuardRail detects these activities, it provides the option to Allow, Warn, or Block the user with a customizable message.
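As an illustration of the tokenization round-trip described under Data Protection, the sketch below replaces SSN-shaped strings with opaque tokens before a prompt leaves the organization, then reconstitutes the originals when a response comes back. The function names, token format, and regex are invented for this example; WitnessAI's actual tokenization mechanism is not documented here.

```python
import re
import uuid

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSNs like 123-45-6789

def tokenize(prompt: str, vault: dict) -> str:
    """Replace each SSN with an opaque token, remembering the mapping."""
    def _swap(match):
        token = f"[TOKEN-{uuid.uuid4().hex[:8]}]"
        vault[token] = match.group(0)
        return token
    return SSN_PATTERN.sub(_swap, prompt)

def reconstitute(response: str, vault: dict) -> str:
    """Restore the original values wherever the model echoed a token."""
    for token, original in vault.items():
        response = response.replace(token, original)
    return response

vault = {}
outbound = tokenize("Employee SSN is 123-45-6789.", vault)
# The model never sees the real SSN; if its response echoes the token,
# reconstitute() restores the original value for the user.
inbound = reconstitute(outbound, vault)
```

The point of the design is that the sensitive value never leaves the organization: only the opaque token travels to the model, and the mapping needed to restore it stays local in the vault.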