Machine Learning (ML) Concepts

Machine Learning (ML) is the science of developing algorithms and models that allow computers to perform complex tasks without being explicitly programmed with rules for each task.
ML algorithms use historical data to identify patterns and make predictions.

Training the Model:
An algorithm is given known data (features, together with the expected outputs) and adjusts its internal parameters until its predictions match the expected outputs as closely as possible.
Inference: Once trained, the model can predict outputs for new, unseen data.
Features: The input data that the model uses to learn (e.g., columns in a table, pixels in an image).
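
The split between training and inference can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from these notes: it swaps in a nearest-centroid classifier because it fits in a short block, and the function names (train, predict) and the toy data are hypothetical.

```python
from collections import defaultdict

def train(features, labels):
    """Training: learn one centroid (mean feature vector) per label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # The learned parameters of this model are the per-label centroids.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(model, x):
    """Inference: return the label whose centroid is closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda y: sq_dist(model[y], x))

# Hypothetical toy data: each feature vector is [height_cm, weight_kg].
train_features = [[150, 50], [155, 55], [180, 85], [185, 90]]
train_labels = ["small", "small", "large", "large"]

model = train(train_features, train_labels)   # training on known data
prediction = predict(model, [178, 80])        # inference on unseen data
print(prediction)
```

Here the features are the numeric columns, the labels are the expected outputs, and the centroids play the role of the model's internal parameters.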

Structured Data:
Data organized in rows and columns (like in a table).
Examples: CSV files, relational databases (e.g., Amazon RDS, Amazon Redshift).
Stored in Amazon S3 for model training.
Semi-Structured Data:
Data that doesn’t fit a fixed table schema but still carries some organization, such as keys or tags.
Example: JSON (key-value pairs).
Stored in Amazon DynamoDB or Amazon DocumentDB.
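
As a rough illustration of how semi-structured data can be turned into model features, the sketch below flattens a nested JSON record into a single row of key-value pairs. The record contents and the flatten helper are hypothetical; real pipelines often also need to handle lists and missing keys.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Hypothetical JSON document with one nested object.
raw = '{"user": "u1", "device": {"os": "linux", "cores": 4}}'
row = flatten(json.loads(raw))
print(row)
```

Each dotted key (for example device.os) can then be treated as a column, much like structured data.
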
Unstructured Data:
Data with no predefined data model or schema (e.g., images, text files).
Stored as objects in Amazon S3.
Features are extracted using techniques like tokenization (for text).
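
Tokenization can be sketched as follows. This is a simplified example that assumes lowercase word-level tokens and a small hand-picked vocabulary; production systems typically use more elaborate tokenizers. The function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

tokens = tokenize("The model predicts; the model learns.")
vocabulary = ["the", "model", "learns", "data"]  # assumed vocabulary
vector = bag_of_words(tokens, vocabulary)
print(tokens)
print(vector)
```

The resulting count vector is a numeric feature representation of the original text.
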
Time Series Data:
Data records labeled with timestamps, stored sequentially.
Example: Microservice metrics (CPU usage, memory, transactions).
Helps predict future trends (e.g., scaling infrastructure).
Stored in Amazon S3 for model training.
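
To make the time series idea concrete, the sketch below forecasts the next CPU reading as the average of the most recent readings. A moving average is only one simple baseline, chosen here for brevity; the metric values, the 5-minute interval, and the function name are made up for illustration.

```python
from datetime import datetime, timedelta

def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    recent = values[-window:]
    return sum(recent) / window

# Hypothetical CPU usage (%) sampled every 5 minutes.
start = datetime(2024, 1, 1, 12, 0)
cpu_percent = [40.0, 42.0, 47.0, 55.0, 60.0]
series = [(start + timedelta(minutes=5 * i), v) for i, v in enumerate(cpu_percent)]

# Time series records must be in timestamp order before forecasting.
series.sort(key=lambda point: point[0])
forecast = moving_average_forecast([v for _, v in series], window=3)
print(forecast)
```

A rising forecast like this one could, for example, feed a decision to scale infrastructure ahead of demand.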

Algorithm: A mathematical method that defines the relationship between inputs and outputs.
Linear Regression: A simple example where the goal is to find the best-fitting line for data points (e.g., predicting height from weight).
Model Parameters: Internal values (e.g., the slope and intercept of the line) that are adjusted iteratively during training to minimize the error between predicted and actual outputs.
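
The iterative adjustment of parameters can be sketched with gradient descent on mean squared error, which is one common way to fit a line (a closed-form least-squares solution also exists). The data, learning rate, and epoch count below are assumptions chosen so that this small example converges.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of the current line on every training point.
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / n) * sum(errors)
        # Move each parameter a small step against its gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))

# Inference: predict the output for an unseen input.
predicted = slope * 5.0 + intercept
```

With real measurements such as weight and height, the inputs usually need to be scaled first, or the learning rate reduced, for the updates to stay stable.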

Machine Learning (ML): Algorithms that allow computers to learn from data without explicit instructions.
Training: The process of feeding known data to a model to adjust its parameters.
Inference: The process of using a trained model to predict outputs from new data.
Features: The input data the model uses for training and inference.
Structured Data: Data in tables (e.g., CSV, relational databases).
Semi-Structured Data: Data with partial organization (e.g., JSON).
Unstructured Data: Raw data with no specific structure (e.g., images, text).
Time Series Data: Data with timestamps to track changes over time.
Linear Regression: A model that finds the best-fit line for data points.

4.2 Monitoring in AWS for Model Bias, Trustworthiness, and Truthfulness