Fundamental Machine Learning concepts

Machine learning is currently one of the most in-demand skill sets in the IT industry. Machine learning problems can generally be broken up into two categories:

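  1. Supervised ML problems - use features and labels associated with the training data. Features are the measurable properties of each training example, while labels are the outputs the model should learn to predict from those features.
  2. Unsupervised ML problems - use training data that has no labels. The algorithm looks for structure in the features on its own, for example by clustering similar examples together.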
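
To make the difference concrete, here is a minimal sketch using scikit-learn. The toy measurements, the "small"/"large" labels and the choice of a k-nearest-neighbours classifier and k-means clustering are all assumptions made for illustration; any other supervised or unsupervised algorithm would show the same contrast.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for this example): each row is one sample,
# each column one feature (height in cm, weight in kg).
features = [[150, 50], [152, 53], [155, 55], [180, 80], [183, 85], [185, 88]]

# Supervised: every sample comes with a label the model learns to predict.
labels = ["small", "small", "small", "large", "large", "large"]
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(features, labels)
predicted = classifier.predict([[182, 84]])

# Unsupervised: no labels are given; the algorithm groups similar samples.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(features)

print(predicted)    # predicted label for the new sample
print(cluster_ids)  # one cluster number per sample
```

Note that the cluster numbers are arbitrary identifiers. The clustering only says which samples belong together, not what each group means.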

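scikit-learn (imported as sklearn) is a very useful open-source Python package for ML. It began as a Google Summer of Code project and is now maintained by a community of contributors. Its documentation groups its tools into six areas: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.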

You can find many useful datasets to experiment with on the internet. A popular source is the UCI Machine Learning Repository. It contains the 'Iris' dataset, which is often the first dataset used to test ML algorithms. It is also one of the oldest datasets, and its simplicity makes it a good starting point.
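
scikit-learn also ships with its own copy of the Iris dataset, so a quick way to take a first look at it is the sketch below. It assumes scikit-learn is installed and uses the bundled copy rather than downloading the file from the UCI repository.

```python
from sklearn.datasets import load_iris

# Load the copy of the Iris dataset that is bundled with scikit-learn.
iris = load_iris()

# iris.data holds the features: one row per flower, one column per measurement.
print(iris.data.shape)
print(iris.feature_names)

# iris.target holds the labels as integers; target_names maps them to species.
print(iris.target_names)
print(iris.target[:5])
```

The dataset has 150 samples with 4 numeric features each, and the labels cover three species: setosa, versicolor and virginica.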

Using ML algorithms involves a large amount of work getting the data ready to be processed. For example, scikit-learn is designed and optimised to work with numbers rather than strings, which helps it run quickly. Some time is therefore often needed to clean and manipulate the data, for example by converting text categories into numbers, so that it is in a format ready to be processed.
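
As a small illustration of this kind of preparation, the sketch below cleans a made-up list of species names and converts them to integers with scikit-learn's LabelEncoder. The messy input values are invented for the example. LabelEncoder is intended mainly for labels; for string-valued feature columns, other encoders such as OneHotEncoder are often a better fit.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up raw values with inconsistent spacing and capitalisation.
raw_species = [" Setosa", "virginica ", "setosa", "Versicolor", "virginica"]

# Cleaning step: normalise whitespace and case so equal values compare equal.
cleaned = [value.strip().lower() for value in raw_species]

# Encoding step: map each distinct string to an integer.
encoder = LabelEncoder()
encoded = encoder.fit_transform(cleaned)

print(list(encoder.classes_))  # the distinct strings, in sorted order
print(list(encoded))           # one integer per input value

# The mapping can be reversed to recover the original strings.
print(encoder.inverse_transform([1]))
```

Without the cleaning step, " Setosa" and "setosa" would be treated as two different categories, which is a typical example of why data preparation takes time.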

Some good books recommended to me for ML include the following:

Tags: ml