Generative AI Hallucination
- Hallucination: Occurs when a generative AI model creates content that sounds factual but is actually false or made up.
- Cause: Often occurs when the model tries to fill gaps where its training data is missing, producing plausible-sounding but unsupported text.
- Risk: Hallucinations can have serious real-world consequences. For example, in 2023, lawyers submitted a court filing containing fictitious case citations generated by AI, which led to sanctions and fines.
Copyright and Legal Risks
- Copyright Eligibility: Works generated entirely by AI generally cannot be copyrighted because copyright protection requires human authorship.
- Infringement Risk: A generative AI model might produce outputs that include or closely reproduce protected material, such as:
  - Training data containing copyrighted, patented, or trademarked content.
  - User-submitted copyrighted works, leading to unlicensed derivative works.
- Example: Getty Images sued Stability AI, the company behind the Stable Diffusion image-generation model, for allegedly using over 12 million of its photos and associated metadata without authorization.
Bias and Discrimination Risks
- Biased Outputs: Generative AI models can generate biased content, leading to discriminatory or unfair treatment.
- Example: The Equal Employment Opportunity Commission (EEOC) sued companies whose AI hiring tool automatically rejected older applicants (women aged 55 and older and men aged 60 and older).
Toxic Content Risks
- Offensive, Disturbing, or Obscene Content: If such content was present in the training data, the model can reproduce it in harmful outputs, affecting users’ mental or emotional health.
- Violence Risk: Toxic content can lead to an increased propensity for violence against individuals or marginalized groups.
Data Privacy Risks
- Sensitive Data Leaks: Large language models may inadvertently leak sensitive information such as:
  - Personally identifiable information (PII)
  - Intellectual property
  - Healthcare records
- Issue: Once data has been used to train a foundation model, it cannot simply be “forgotten” or deleted from the model.
- Customer Trust: These risks can result in loss of trust and reputational damage due to irresponsible AI practices.
Guardrails in Amazon Bedrock
- Guardrails: You can set up filters in Amazon Bedrock to block inappropriate content, including:
  - Hate speech, insults, sexual content, and violence.
- Blocking Content: Users’ prompts must pass through the guardrail’s filters. If a prompt violates them, the user receives the configured blocked-message response and the prompt is never sent to the model.
- Model Response Filtering: Even if a prompt passes, the generated response can still be blocked if it violates the guardrails.
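As a rough illustration of how these filters might be configured and enforced programmatically, the sketch below uses boto3 to create a guardrail with content filters for the categories above and then attaches it to a model invocation via the Converse API. The model ID, region, filter strengths, and blocked messages are illustrative assumptions; check parameter names and values against the current AWS SDK documentation before use.

```python
import boto3

# Control-plane client: used to create and manage guardrails.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create a guardrail that filters the content categories listed above.
guardrail = bedrock.create_guardrail(
    name="content-safety-guardrail",
    description="Blocks hate speech, insults, sexual content, and violence.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, this request violates our usage policy.",
    blockedOutputsMessaging="Sorry, the generated response was blocked by our usage policy.",
)

# Runtime client: used to invoke a model with the guardrail attached.
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Tell me about responsible AI."}]}],
    guardrailConfig={
        "guardrailIdentifier": guardrail["guardrailId"],
        "guardrailVersion": "DRAFT",  # working draft version created above
    },
)

# If the prompt or the model's output violated the guardrail,
# the stop reason is reported as "guardrail_intervened".
print(response["stopReason"])
print(response["output"]["message"]["content"][0]["text"])
```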
SageMaker Clarify Evaluation Jobs
- Purpose: Helps evaluate and compare the performance of large language models (LLMs) across different tasks.
- Tasks:
  - Text Generation
  - Text Classification
  - Question Answering
  - Text Summarization
- Evaluation Dimensions:
  - Prompt Stereotyping: Measures bias in model responses (e.g., related to race, gender, or age).
  - Toxicity: Checks for offensive, rude, or harmful language.
  - Factual Knowledge: Verifies the accuracy of model responses.
  - Semantic Robustness: Assesses the model’s resilience to input changes such as typos or formatting differences (see the sketch at the end of this section).
  - Accuracy: Compares model output against expected responses.
- Evaluation Tools:
  - Use a built-in prompt dataset or provide your own.
  - Option to include human feedback from employees or subject matter experts.
- Amazon Bedrock: Similar evaluation capabilities are available for pre-trained LLMs in the Amazon Bedrock console.
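To make the Semantic Robustness dimension concrete, here is a minimal conceptual sketch (not the actual SageMaker Clarify implementation): it perturbs a prompt with random typos, queries the model with both versions, and compares the two responses. The invoke_model function is a hypothetical placeholder you would replace with a real LLM call (for example, through the Amazon Bedrock runtime), and the similarity ratio is only a crude stand-in for Clarify’s robustness metrics.

```python
import difflib
import random

def perturb_with_typos(prompt: str, typo_rate: float = 0.05, seed: int = 0) -> str:
    """Simulate typos by randomly swapping adjacent letters in the prompt."""
    rng = random.Random(seed)
    chars = list(prompt)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < typo_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def invoke_model(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real LLM call (e.g., Amazon Bedrock runtime)."""
    return f"(model response to: {prompt})"

def semantic_robustness_score(original: str, perturbed: str) -> float:
    """Crude robustness proxy: textual similarity of the two responses (1.0 = identical)."""
    return difflib.SequenceMatcher(None, original, perturbed).ratio()

prompt = "Summarize the main risks of generative AI in two sentences."
noisy_prompt = perturb_with_typos(prompt)

score = semantic_robustness_score(invoke_model(prompt), invoke_model(noisy_prompt))
print(f"Semantic robustness score (0-1, higher is more robust): {score:.2f}")
```

A robust model should return essentially the same answer for both prompts; a score well below 1.0 suggests the model is sensitive to small input perturbations.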