Scania & Silo AI:

Citizen Data Scientist Course

Over the past few weeks, I completed a comprehensive Data Science course designed to provide both theoretical knowledge and practical applications of modern data science techniques. Below is a brief summary of the key modules I covered. For more details, I encourage you to take a look at my blog, where I share my notes on all the modules.


Course Overview:

The course opened with fundamental concepts, the data science lifecycle, and the importance of problem definition. Key takeaways included understanding the roles of statistics, machine learning, and data wrangling in solving business problems.

I studied supervised learning models like linear regression, logistic regression, and decision trees. The focus was on training models to make predictions from labeled data, as well as understanding overfitting and generalization.
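To make the overfitting idea concrete, here is a minimal scikit-learn sketch (the dataset and the depth settings are my own illustrative choices, not course material) comparing a fully grown decision tree with a shallower one:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled data: features X and targets y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# An unconstrained tree can memorize the training set (overfitting)
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Limiting depth trades training accuracy for better generalization
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

for name, model in [("deep", deep), ("shallow", shallow)]:
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```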

Techniques to evaluate models using metrics such as accuracy, precision, recall, and F1-score were emphasized. Cross-validation, and its role in producing performance estimates with lower bias and variance than a single train/test split, was a key takeaway.
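A small sketch of how this looks in practice (again with scikit-learn and a dataset of my own choosing, purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# 5-fold cross-validation: F1 measured on five different held-out folds
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print("F1 per fold:", scores.round(3), "mean:", scores.mean().round(3))

# classification_report bundles accuracy, precision, recall, and F1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```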

I applied machine learning workflows: data preprocessing, feature engineering, model selection, and hyperparameter tuning using real-world examples.
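A typical way to wire these steps together is a pipeline whose hyperparameters are tuned by grid search; this sketch (my own example, with an arbitrary scaler/SVM combination) shows the pattern:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Preprocessing and model chained, so tuning cannot leak held-out data
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Grid search over hyperparameters, each candidate scored by cross-validation
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```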

This module introduced clustering methods like K-means and hierarchical clustering for unlabeled data. Dimensionality reduction techniques, such as PCA, were also explored to identify hidden patterns.
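As a rough illustration of both techniques together (the iris dataset and cluster count are my own choices, not from the course):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Unlabeled setting: ignore the iris labels and look for structure
X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# PCA compresses four features into two components for visualization
X_2d = PCA(n_components=2).fit_transform(X)

# K-means partitions the projected points into three clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])
```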

This module took a deeper dive into ensemble methods such as Random Forest and boosting algorithms (e.g., Gradient Boosting), which reduce overfitting and improve predictive accuracy.
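The effect is easy to see by scoring a single tree against the two ensembles on the same data; a minimal sketch under my own dataset assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Averaging many decorrelated trees (bagging) or adding trees that correct
# the previous ones' errors (boosting) usually beats a single tree
for model in [DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=200, random_state=0),
              GradientBoostingClassifier(random_state=0)]:
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())
```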

I learned to design experiments, frame null and alternative hypotheses, and select statistical tests to validate results. This module helped bridge the gap between traditional statistical analysis and data-driven machine learning.
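For instance, a two-sample t-test on a hypothetical A/B experiment (the groups and effect size below are simulated, purely to show the mechanics):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical A/B experiment: H0 says both groups share the same mean
control = rng.normal(loc=10.0, scale=2.0, size=200)
treatment = rng.normal(loc=10.5, scale=2.0, size=200)

# Two-sample t-test: a small p-value means we reject H0 at the chosen alpha
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```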


Final Project:

The course culminated in a hands-on final project, where I applied the skills acquired to solve a practical, data-driven problem.

Data Cleaning and Preprocessing:

Addressed inconsistencies such as outliers and irregular time intervals through custom interpolation, smoothing techniques, and logical filtering.
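The general shape of this step, sketched with pandas on hypothetical sensor readings (the data, thresholds, and window sizes are illustrative, not the project's actual values):

```python
import pandas as pd

# Hypothetical readings at irregular timestamps, with one obvious outlier
ts = pd.Series(
    [10.0, 10.4, 99.0, 10.8, 11.1],
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:07",
                          "2024-01-01 00:19", "2024-01-01 00:31",
                          "2024-01-01 00:44"]),
)

# Logical filtering: mask values far from the rolling median
median = ts.rolling(3, center=True, min_periods=1).median()
clean = ts.where((ts - median).abs() < 5)

# Regularize to a 10-minute grid and interpolate the gaps in time
regular = clean.resample("10min").mean().interpolate(method="time")

# Light smoothing with a short rolling mean
smooth = regular.rolling(2, min_periods=1).mean()
print(smooth)
```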

Feature Engineering:

Designed new features, including rolling window trends, average growth rates, and custom metrics to better capture hidden patterns.
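A minimal sketch of these feature types in pandas (the series and window length are made up for illustration):

```python
import pandas as pd

# Hypothetical daily usage series
usage = pd.Series([100, 104, 103, 110, 118, 121, 130],
                  index=pd.date_range("2024-01-01", periods=7, freq="D"))

features = pd.DataFrame({
    "usage": usage,
    # Rolling-window trend: 3-day moving average
    "trend_3d": usage.rolling(3).mean(),
    # Growth rate: day-over-day percentage change
    "growth": usage.pct_change(),
    # Custom metric: deviation from the rolling trend
    "dev_from_trend": usage - usage.rolling(3).mean(),
})
print(features)
```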

Trend and Anomaly Detection:

Analyzed usage patterns through curve fitting, derivative-based trend analysis, and rolling statistics to identify anomalies and changes in data behavior.
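One way to combine these three ideas, sketched on simulated data (the signal, the injected anomaly, and the MAD threshold are assumptions of mine, not the project's real data):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100.0)
signal = 0.5 * t + rng.normal(scale=2.0, size=t.size)
signal[60] += 25.0  # injected anomaly

# Curve fitting: a first-degree polynomial captures the linear trend
slope, intercept = np.polyfit(t, signal, deg=1)
residual = signal - (slope * t + intercept)

# Derivative-based view: large jumps show up in the first difference
diffs = np.diff(signal)

# Rolling/robust statistics: flag points several MADs off the trend
mad = np.median(np.abs(residual - np.median(residual)))
anomalies = np.where(np.abs(residual) > 5 * mad)[0]
print("anomalous indices:", anomalies, "max diff:", diffs.max())
```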

Model Development and Evaluation:

Implemented Random Forest and Gradient Boosting models to classify patterns, prioritizing high precision. Performed feature selection and hyperparameter tuning for optimization.
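The skeleton of this step looks roughly like the following (dataset, parameter grid, and selector choice are my own placeholders); note that scoring="precision" makes the search optimize directly for high precision:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

# Feature selection driven by tree importances, then the final classifier
pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Hyperparameter tuning, with precision as the selection criterion
grid = GridSearchCV(
    pipe,
    {"clf__n_estimators": [100, 300], "clf__max_depth": [None, 5]},
    scoring="precision",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```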

Insights and Results:

Delivered a robust framework to detect anomalies, leveraging both statistical analysis and machine learning to uncover meaningful insights.

