Because this work is not a clearly delimited project, no conventional documentation was written. Instead, a manual was created which, together with the four Jupyter notebooks, should serve as a reference book for applied Machine Learning with Python.
From a Python point of view, this practice gives an introduction to the libraries NumPy, pandas and scikit-learn.
The contents of the notebooks and of the reference book are structured in three parts:
In the Representation part, the focus is on the dataset and its features. Data preprocessing and feature selection are therefore at the centre of this part. Furthermore, the different use cases of Machine Learning and the associated models are presented. Models covered in this practice include, for example, Linear Regression, Decision Trees, Logistic Regression and Support Vector Machines.
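As an illustration of how preprocessing, feature selection and a model fit together, here is a minimal sketch using scikit-learn. The Iris dataset, the choice of `StandardScaler`, `SelectKBest` with `k=2`, and Logistic Regression are assumptions made for this example only and are not necessarily what the notebooks use.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing, feature selection and the model chained into one estimator.
model = make_pipeline(
    StandardScaler(),                        # scale each feature to zero mean, unit variance
    SelectKBest(score_func=f_classif, k=2),  # keep the 2 features with the highest F-score
    LogisticRegression(max_iter=1000),       # classifier trained on the selected features
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Chaining the steps in a pipeline means the scaler and the feature selector are fitted on the training data only, which avoids leaking information from the test set.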
In the Evaluation part, the focus is on the metrics used to evaluate a model's performance. Topics such as the confusion matrix and the ROC curve are introduced.
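The two topics named above can be sketched with scikit-learn's metrics module. The labels and scores below are made-up toy values chosen for illustration, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Toy ground-truth labels and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]

# Turn scores into hard predictions with a 0.5 threshold (an assumption).
y_pred = [int(score >= 0.5) for score in y_score]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

# The ROC curve sweeps over all thresholds instead of fixing one.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"ROC AUC: {auc:.4f}")
```

The confusion matrix depends on the chosen threshold, whereas the ROC curve and its area summarise the behaviour across all thresholds.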
Finally, in the Optimization part, the reasons behind the models' performance are examined further. Additional tools are introduced which help to improve and fine-tune the models. As an example, Cross Validation and its impact are explained.
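A minimal sketch of Cross Validation with scikit-learn follows. The Iris dataset, the Decision Tree model and the choice of 5 folds are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_iris(return_X_y=True)

# Fixed random_state so that repeated runs give the same tree.
tree = DecisionTreeClassifier(random_state=0)

# 5-fold cross validation: the model is trained and scored five times,
# each time holding out a different fifth of the data.
scores = cross_val_score(tree, X, y, cv=5)

print("Accuracy per fold:", scores)
print(f"Mean accuracy: {scores.mean():.2f} (std {scores.std():.2f})")
```

Compared with a single train/test split, the spread of the per-fold scores gives a sense of how much the estimate depends on which samples happen to land in the test set.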
Jupyter Notebooks for Machine Learning Introduction.
- There are three notebooks which give an introduction to feature extraction, classifiers, evaluation methods and performance optimisation.
- Additionally, there is one notebook which focuses on evaluation metrics, such as Precision or Recall.
- This handbook provides the theoretical background for the implementations in the notebooks.
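To make the Precision and Recall metrics mentioned in the list above concrete, here is a small sketch that computes both from their definitions in plain Python. The helper `precision_recall` and the toy labels are hypothetical and only serve this example; scikit-learn offers `precision_score` and `recall_score` in `sklearn.metrics` for the same purpose.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Precision: share of predicted positives that are really positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Here the model finds three of the four positives (recall 0.75), but two of its five positive predictions are wrong (precision 0.60).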