- AI System Vulnerabilities:
- Training Data Poisoning: Attackers can manipulate training data to influence the model’s predictions. For example, fraud detection models can be tricked by fraudulent data labeled as non-fraud.
- Adversarial Inputs: An attacker subtly manipulates input data to mislead the model. For instance, a facial recognition model could be tricked by altering an image to misidentify someone.
- Model Inversion: Attackers repeatedly input data into the model and study the outputs to infer training data, potentially reconstructing sensitive information (e.g., employee images).
- Prompt Injection: In large language models (LLMs), attackers inject malicious instructions in the prompt to alter the model’s output and gain sensitive information.
- Best Practices for Securing AI Systems:
- Limit Access: Apply the principle of least privilege by securing data and model access. Use appropriate IAM policies and block public access.
- Data Encryption: Encrypt data and artifacts for extra protection.
- Input Validation: Validate user input to detect unusual patterns, especially in LLMs (e.g., prompt injection detection).
- Model Protection: Limit access to the model to prevent reverse engineering. Teach models to detect adversarial inputs.
- Model Re-training: Train models frequently on new data to mitigate the impact of corrupted training data. Use separate validation data and validate after re-training.
- Monitor for Drift: Investigate changes in model predictions that deviate from historical patterns. This can indicate issues like poor data quality or a security attack.
- AWS Tools for Model Monitoring:
- Amazon SageMaker Model Monitor: Monitors model quality and data quality in production. It helps detect data drift and anomalies in real-time.
- Alerts: Set up automated alerts for model quality deviations (e.g., data drift or anomalies).
- Baseline Creation: Create baselines using training data or labeled data to monitor changes in model or data quality.
- Integration with CloudWatch: Use Amazon CloudWatch for logging and setting thresholds to alert when model quality deviates beyond acceptable limits.