- Importance of Tracking Artifacts:
- To meet regulatory and control requirements, it is essential to track all the artifacts used in model production.
- This includes code, datasets, container images, model versions, and endpoints.
- Tracking Artifacts:
- Code Repositories: Use platforms like GitHub or AWS CodeCommit to version source code. This includes training code, inference code, experiments, and notebooks.
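- Datasets: Store datasets in Amazon S3 under partitioned prefixes (for example, by dataset version or date) so that each training dataset can be uniquely identified.
- Container Images: Store images in Amazon Elastic Container Registry (ECR), where each image has a unique digest and can carry tags.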
- Training Jobs: SageMaker automatically tracks metadata of each training job, including hyperparameters and model output identifiers.
- Model Versions: Use SageMaker Model Registry to store and manage different model versions.
- Endpoints: SageMaker endpoints have unique identifiers and associated metadata.
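The tracking scheme above can be sketched as a small manifest that ties one model build to its code commit, dataset prefix, and container image. This is a minimal illustration, not an AWS API: the `version=`/`dt=` prefix layout, the `ArtifactManifest` class, and every identifier in it (bucket, account ID, commit, job name) are assumptions chosen for the example.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


def dataset_prefix(dataset: str, version: int, snapshot: date) -> str:
    """Build a partitioned S3 key prefix that pins one snapshot of a dataset."""
    return f"datasets/{dataset}/version={version}/dt={snapshot.isoformat()}/"


@dataclass(frozen=True)
class ArtifactManifest:
    """Hypothetical record linking the artifacts behind one trained model."""
    git_commit: str
    dataset_uri: str
    image_uri: str
    training_job_name: str


# All values below are placeholders for illustration only.
manifest = ArtifactManifest(
    git_commit="0123abc",
    dataset_uri="s3://example-bucket/" + dataset_prefix("churn", 3, date(2024, 1, 15)),
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/churn-train:0123abc",
    training_job_name="churn-train-2024-01-15",
)

# Serialize the manifest so it can be stored next to the model output.
print(json.dumps(asdict(manifest), indent=2))
```

Tagging the container image with the Git commit, as assumed here, is one way to keep the code and the image traceable to each other.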
- SageMaker Model Registry:
- Catalogs models in groups, tracking versions and metadata such as training metrics.
- Models can be deployed directly from the registry, and each model version's approval status (e.g., approved, rejected) is tracked.
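As a rough sketch of registering a model version and changing its approval status, the helpers below only build the request dictionaries and make no AWS calls. The parameter names follow the SageMaker `CreateModelPackage` and `UpdateModelPackage` request shapes as I understand them; the group name, image URI, S3 path, ARN, and content types are placeholders.

```python
APPROVAL_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def build_model_package_request(group_name, image_uri, model_data_url,
                                approval_status="PendingManualApproval"):
    """Build keyword arguments for registering a new model version in a group."""
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(f"unknown approval status: {approval_status}")
    return {
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": approval_status,
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            # Content types are assumptions for a CSV-based model.
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }


def build_approval_update(model_package_arn, approval_status):
    """Build keyword arguments for changing a version's approval status."""
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(f"unknown approval status: {approval_status}")
    return {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": approval_status}


# Placeholder identifiers for illustration only.
request = build_model_package_request(
    "churn-models",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn-infer:1.0",
    "s3://example-bucket/models/churn/model.tar.gz",
)
update = build_approval_update(
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/churn-models/1", "Approved"
)
print(request["ModelApprovalStatus"], "->", update["ModelApprovalStatus"])
```

With credentials configured, these dictionaries could be passed to a boto3 SageMaker client as `client.create_model_package(**request)` and `client.update_model_package(**update)`.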
- Model Cards:
- Used to document and share essential model details, such as intended uses, risk ratings, training, and evaluation results.
- Model cards can be exported to PDF for sharing with stakeholders.
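A model card's content is supplied as a JSON document. The sketch below assembles a minimal one covering the details listed above. The field names (`model_overview`, `intended_uses`, `risk_rating`, `training_details`, `evaluation_details`) reflect my understanding of the model card JSON schema and should be checked against the current schema before use; the sample values are invented for illustration.

```python
import json

# Risk ratings assumed to be accepted by the model card schema.
ALLOWED_RISK_RATINGS = {"High", "Medium", "Low", "Unknown"}


def build_model_card_content(description, intended_uses, risk_rating,
                             training_notes, evaluations):
    """Return model card content as a JSON string.

    `evaluations` is a list of (name, observation) pairs.
    """
    if risk_rating not in ALLOWED_RISK_RATINGS:
        raise ValueError(f"unknown risk rating: {risk_rating}")
    content = {
        "model_overview": {"model_description": description},
        "intended_uses": {"intended_uses": intended_uses, "risk_rating": risk_rating},
        "training_details": {"training_observations": training_notes},
        "evaluation_details": [
            {"name": name, "evaluation_observation": note}
            for name, note in evaluations
        ],
    }
    return json.dumps(content)


card_json = build_model_card_content(
    description="Binary classifier that predicts customer churn.",
    intended_uses="Ranking accounts for retention outreach.",
    risk_rating="Low",
    training_notes="Trained on dataset version 3.",
    evaluations=[("holdout", "Evaluated on a held-out month of data.")],
)
print(card_json)
```

The resulting string could be passed as `Content` to the boto3 SageMaker client's `create_model_card` call, and `create_model_card_export_job` could then produce the PDF for stakeholders.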
- ML Lineage Tracking:
- Amazon SageMaker ML Lineage Tracking automatically tracks the end-to-end machine learning workflow.
- It represents the workflow as a graph of linked entities and stores trial components, experiments, and job-related data.
- You can run queries to discover relationships between entities, such as which models use specific datasets.
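The question "which models use this dataset?" can be sketched as a lineage query that starts at the dataset's artifact ARN and walks to its descendants. The helpers below only build the request and filter a response; the parameter names follow the SageMaker `QueryLineage` API as I understand it, the `"Model"` vertex type is an assumption, and the sample response and ARNs are hand-written for illustration.

```python
def lineage_query_params(dataset_artifact_arn):
    """Build keyword arguments for a lineage query over a dataset's descendants."""
    return {
        "StartArns": [dataset_artifact_arn],
        "Direction": "Descendants",
        "IncludeEdges": False,
    }


def model_arns(response):
    """Extract ARNs of vertices whose type is assumed to be 'Model'."""
    return sorted(
        v["Arn"] for v in response.get("Vertices", []) if v.get("Type") == "Model"
    )


# Hand-written response in the assumed shape of a QueryLineage result.
sample_response = {
    "Vertices": [
        {"Arn": "arn:example:artifact/dataset-a", "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:example:artifact/model-2", "Type": "Model", "LineageType": "Artifact"},
        {"Arn": "arn:example:artifact/model-1", "Type": "Model", "LineageType": "Artifact"},
    ]
}

params = lineage_query_params("arn:example:artifact/dataset-a")
print(model_arns(sample_response))
```

Against a real account, `params` could be passed to the boto3 SageMaker client as `client.query_lineage(**params)` and the result fed to `model_arns`.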
- Feature Store:
- Amazon SageMaker Feature Store centralizes features and metadata, making it easier to reuse them across models.
- It simplifies feature creation, sharing, and management, reducing repetitive tasks.
- Supports point-in-time queries, which retrieve feature values as they were at a specific historical time.
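To illustrate what a point-in-time query means, the sketch below looks up the latest feature record written at or before a given timestamp. It is a plain in-memory illustration of the semantics, not the Feature Store API; the record layout, entity IDs, and feature name are invented for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def build_history(records):
    """Group (entity_id, event_time, features) records by entity, sorted by time."""
    history = defaultdict(list)
    for entity_id, event_time, features in sorted(records, key=lambda r: (r[0], r[1])):
        history[entity_id].append((event_time, features))
    return history


def as_of(history, entity_id, timestamp):
    """Return the latest features with event_time <= timestamp, or None."""
    events = history.get(entity_id, [])
    times = [t for t, _ in events]
    idx = bisect_right(times, timestamp)
    return events[idx - 1][1] if idx else None


# Timestamps are ISO 8601 strings in one fixed UTC format, so string
# comparison matches chronological order.
records = [
    ("cust-1", "2024-02-01T00:00:00Z", {"orders_30d": 5}),
    ("cust-1", "2024-01-01T00:00:00Z", {"orders_30d": 2}),
    ("cust-2", "2024-01-15T00:00:00Z", {"orders_30d": 1}),
]
history = build_history(records)
print(as_of(history, "cust-1", "2024-01-20T00:00:00Z"))
```

Looking up values this way when building training data avoids leaking feature values that were only written after the label's timestamp.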
- Model Dashboard:
- A centralized portal in the SageMaker console to view, search, and manage all models in the account.
- Integrates information from Model Monitor and Model Cards, and visualizes workflow lineage.
- Tracks model performance on endpoints and batch transform jobs.
- Monitors data quality, model quality, bias, and explainability using configurable thresholds.
- Quickly identifies models that need attention based on monitoring results.
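The threshold-based flagging described above can be sketched in a few lines. This is an illustration of the logic only, not the Model Dashboard or Model Monitor API; the `MonitorResult` class, metric values, and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitorResult:
    """Hypothetical monitoring result for one model and one monitor type."""
    model_name: str
    monitor_type: str  # e.g. "data_quality", "model_quality", "bias"
    value: float
    threshold: float
    higher_is_better: bool = True

    def violated(self) -> bool:
        """True when the metric falls on the wrong side of its threshold."""
        if self.higher_is_better:
            return self.value < self.threshold
        return self.value > self.threshold


def models_needing_attention(results):
    """Return the sorted names of models with at least one violated threshold."""
    return sorted({r.model_name for r in results if r.violated()})


results = [
    MonitorResult("churn-v1", "model_quality", value=0.71, threshold=0.80),
    MonitorResult("churn-v2", "model_quality", value=0.86, threshold=0.80),
    MonitorResult("fraud-v3", "bias", value=0.15, threshold=0.10, higher_is_better=False),
]
print(models_needing_attention(results))
```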