1. What is Machine Learning?
- Machine Learning (ML) is the science of developing algorithms and models that let computers perform complex tasks by learning from data, rather than by following explicitly programmed step-by-step instructions.
- ML algorithms use historical data to identify patterns and make predictions.
2. How Machine Learning Works
- Training the Model: An algorithm is given known data (input features together with the expected outputs) and adjusts its internal parameters until its predictions match the expected outputs as closely as possible.
- Inference: Once trained, the model can predict outputs for new, unseen data.
- Features: The input data that the model uses to learn (e.g., columns in a table, pixels in an image).
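To make the idea of features concrete, here is a minimal sketch that turns table rows and image pixels into numeric feature vectors. The column names, the sample values, and the helper names `table_features` and `image_features` are illustrative assumptions, not part of any particular library.

```python
import csv
import io

# Hypothetical table: each row is one example, each column is one feature.
CSV_TEXT = """weight_kg,age_years
70.5,34
82.0,41
"""


def table_features(csv_text):
    """Turn each CSV row into a list of floats (one feature vector per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[name]) for name in reader.fieldnames] for row in reader]


def image_features(pixels):
    """Flatten a 2-D grid of grayscale pixels (0-255) into one vector scaled to 0..1."""
    return [value / 255 for row in pixels for value in row]


table_vectors = table_features(CSV_TEXT)
image_vector = image_features([[0, 255], [51, 102]])  # made-up 2x2 image
print(table_vectors)
print(image_vector)
```

Either kind of vector can then be passed to a training algorithm; the model only ever sees the numbers, not the original table or image.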
3. Types of Data for ML
- Structured Data:
Data organized in rows and columns (like in a table).
Examples: CSV files, relational databases (e.g., Amazon RDS, Amazon Redshift).
Stored in Amazon S3 for model training.
- Semi-Structured Data:
Data that doesn’t fully follow the structure of tables but contains some organization.
Example: JSON (key-value pairs).
Stored in Amazon DynamoDB or Amazon DocumentDB.
- Unstructured Data:
Data with no predefined data model or schema (e.g., images, text files).
Stored as objects in Amazon S3.
Features are extracted using techniques like tokenization (for text).
- Time Series Data:
Data records labeled with timestamps, stored sequentially.
Example: Microservice metrics (CPU usage, memory, transactions).
Helps predict future trends (e.g., scaling infrastructure).
Stored in Amazon S3 for model training.
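As a rough illustration of how the non-tabular data types above might be prepared before training, the sketch below flattens a JSON record, tokenizes a piece of text, and orders timestamped metrics. The sample records, field names, and helper functions are all hypothetical, and the tokenizer is deliberately naive compared with what real ML pipelines use.

```python
import json
import re
from datetime import datetime


def flatten_json(record, prefix=""):
    """Flatten nested key-value pairs into a single-level dict of features."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_json(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def tokenize(text):
    """Naive tokenizer: lowercase the text and pull out alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sort_by_time(records):
    """Order timestamped records chronologically (ISO 8601 timestamps assumed)."""
    return sorted(records, key=lambda r: datetime.fromisoformat(r["timestamp"]))


# Semi-structured: a made-up JSON document with nested key-value pairs.
flat = flatten_json(json.loads('{"user": {"id": 7, "plan": "pro"}, "active": true}'))

# Unstructured: free text broken into tokens.
tokens = tokenize("Scaling the service, again!")

# Time series: made-up microservice metrics that arrived out of order.
metrics = sort_by_time([
    {"timestamp": "2024-01-01T00:05:00", "cpu_percent": 61.0},
    {"timestamp": "2024-01-01T00:00:00", "cpu_percent": 48.5},
])

print(flat)
print(tokens)
print([m["cpu_percent"] for m in metrics])
```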
4. Machine Learning Models and Algorithms
- Algorithm: A mathematical method that defines the relationship between inputs and outputs.
- Linear Regression: A simple example where the goal is to find the best-fitting line for data points (e.g., predicting height from weight).
- Model Parameters: Values inside the model (e.g., the slope and intercept of a line) that are adjusted iteratively during training to minimize the error between predicted and actual outputs.
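The iterative adjustment of parameters can be sketched with gradient descent on a one-feature linear regression. This is one common way to fit a line, shown here under stated assumptions: the toy data is generated from y = 2x + 1 (not real measurements), and the function name, learning rate, and epoch count are illustrative choices.

```python
def train_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of each prediction under the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step each parameter a little in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = train_linear_regression(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

On real data such as weights in kilograms, the features would typically be rescaled first, since large input values can make a fixed learning rate unstable.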
5. Inference
- Once the model is trained, it can make predictions on new data.
- Example: Predict a person’s height based on their weight.
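Inference itself is usually a cheap calculation: the learned parameters are applied to new input without any further adjustment. The sketch below assumes a linear model whose slope and intercept are hypothetical values standing in for the result of an earlier training run; they are not real anthropometric estimates.

```python
# Hypothetical parameters, as if loaded after a previous training run.
MODEL = {"slope_cm_per_kg": 0.5, "intercept_cm": 135.0}


def predict_height_cm(weight_kg, model=MODEL):
    """Apply the trained line to a new, unseen weight; no parameters change here."""
    return model["slope_cm_per_kg"] * weight_kg + model["intercept_cm"]


predicted = predict_height_cm(70.0)
print(predicted)
```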
Key Terms to Remember:
- Machine Learning (ML): Algorithms that allow computers to learn from data without explicit instructions.
- Training: The process of feeding known data to a model to adjust its parameters.
- Inference: The process of using a trained model to predict outputs from new data.
- Features: The input variables a model uses during training and inference.
- Structured Data: Data in tables (e.g., CSV, relational databases).
- Semi-Structured Data: Data with partial organization (e.g., JSON).
- Unstructured Data: Raw data with no specific structure (e.g., images, text).
- Time Series Data: Data with timestamps to track changes over time.
- Linear Regression: A model that finds the best-fit line for data points.