The Hello World of Machine Learning
Machine learning is a research field at the intersection of computer science, artificial intelligence, and statistics. Its focus is training algorithms to learn patterns and make predictions from data. It is particularly important because it lets us use computers to automate decision-making processes.
You’ll find machine learning applications all around. Netflix and Amazon use machine learning to make new product suggestions. Banks use it to detect fraudulent activity in credit card transactions, and healthcare companies are starting to use it to monitor, assess, and diagnose patients.
In this tutorial, you’ll implement a simple machine learning algorithm in Python using scikit-learn, a machine learning library for Python.
Scikit-learn is a free and open-source machine learning library for Python. It offers off-the-shelf functions to implement many algorithms such as linear regression, classifiers, SVMs, k-means, and neural networks. It also ships with a few sample datasets that can be used directly for training and testing.
Because of its speed, robustness, and ease of use, it’s one of the most widely used libraries for machine learning applications.
You can install or upgrade it with pip:
>> pip install -U scikit-learn
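A quick way to check that the install worked is to import the package and print its version. This is only a sanity check; the version you see depends on your environment.

```python
# Sanity check: the import fails with ImportError if scikit-learn is missing.
import sklearn

print(sklearn.__version__)
```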
The best small project to start with on a new tool is the classification of iris flowers: identifying which species of iris a flower belongs to from a few measurements.
Without machine learning, we might need to write a bunch of different functions such as detect_petal_length, detect_shape, detect_color, detect_curvature, and so on.
The problem is that there would be a bunch of edge cases to take care of, and there is no way to account for all of them ahead of time. You cannot code all of them in advance. Machine learning solves this problem, and the best thing is that it is simple: you don’t need to be a maths master to do it!
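To make the hand-coded approach concrete, here is a minimal sketch of what such a rule might look like. The function name `guess_species` and both thresholds are hypothetical, chosen only for illustration; they are not taken from this article or from scikit-learn.

```python
def guess_species(petal_length, petal_width):
    """Hand-written rule with made-up thresholds (illustrative assumption)."""
    if petal_length < 2.5:      # assumed cut-off for small petals
        return "setosa"
    if petal_width < 1.7:       # assumed cut-off between the other two species
        return "versicolor"
    return "virginica"


print(guess_species(1.4, 0.2))
print(guess_species(4.5, 1.5))
print(guess_species(6.0, 2.5))
```

Every new edge case means another threshold to tune by hand. A learned classifier instead derives such boundaries from the data.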
This is a good dataset for your first project because it is well understood and for the reasons below:
- All of the numeric attributes are in the same units and on the same scale.
- It is a multiclass classification problem, allowing you to practice with an easier type of supervised learning algorithm.
- It contains 4 feature columns and 150 rows, which means it is small and easily fits into memory.
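You can verify the size and scale yourself with a short check like the one below. It uses scikit-learn's bundled copy of the dataset and prints the shape and the per-column minimum and maximum.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 rows (flowers) by 4 columns (measurements)
print(iris.data.shape)  # (150, 4)

# Smallest and largest value in each column, to compare the scales
print(iris.data.min(axis=0))
print(iris.data.max(axis=0))
```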
In this post we are going to work through a small machine learning project end-to-end. Here is an overview of what we are going to cover:
Let’s get started!
Using A Structured Step-By-Step Process
Any predictive modeling machine learning project can be broken down into 4 stages:
1.) Collect Data
2.) Pick the Model
3.) Train the Model
4.) Test the Model
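As a rough sketch, the four stages map onto code as shown below. To show that the stages do not depend on a particular model, this example swaps in a k-nearest-neighbors classifier and scikit-learn's `train_test_split` helper. Both are assumptions made for illustration, not the choices used in the rest of this post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1.) Collect data: load the features and labels, hold out 10 rows for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

# 2.) Pick the model (k-nearest neighbors, chosen only for illustration)
model = KNeighborsClassifier(n_neighbors=5)

# 3.) Train the model
model.fit(X_train, y_train)

# 4.) Test the model: fraction of held-out rows predicted correctly
score = model.score(X_test, y_test)
print(score)
```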
For this example, we train a simple classifier on the Iris dataset, which comes bundled with scikit-learn.
The dataset records four features of each flower:
- sepal length,
- sepal width,
- petal length and
- petal width.
Each flower is classified into one of three species (labels): setosa, versicolor, or virginica.
The labels have been represented as numbers in the dataset: 0 (setosa), 1 (versicolor) and 2 (virginica).
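The short sketch below shows how the numeric labels relate to the species names in scikit-learn's copy of the dataset. The row indices 0, 50, and 100 are picked only as one example from each species.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.feature_names)       # names of the four measurements
print(list(iris.target_names))  # ['setosa', 'versicolor', 'virginica']

# Number of flowers per label 0, 1, 2
print(np.bincount(iris.target))  # [50 50 50]

# Translate numeric labels back to species names by indexing target_names
sample_labels = iris.target[[0, 50, 100]]
print(iris.target_names[sample_labels])
```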
We shuffle the Iris dataset and partition it into separate training and testing sets, keeping the last 10 data points for testing and the rest for training. We then train the classifier on the training set and predict on the testing set.
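Here is the shuffle-and-slice split in isolation, as a minimal sketch. The fixed seed (42) and the variable names are assumptions added so the split is reproducible; the final line checks that no row ends up in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Seeded generator so the same shuffle is produced on every run (assumption)
rng = np.random.default_rng(seed=42)
shuffled_ids = rng.permutation(len(X))

# Last 10 shuffled indices for testing, the rest for training
train_ids = shuffled_ids[:-10]
test_ids = shuffled_ids[-10:]

X_train, y_train = X[train_ids], y[train_ids]
X_test, y_test = X[test_ids], y[test_ids]

print(X_train.shape, X_test.shape)     # (140, 4) (10, 4)
print(set(train_ids) & set(test_ids))  # set()
```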
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.metrics import accuracy_score
import numpy as np

# loading the iris dataset
iris = load_iris()
x = iris.data    # array of the data
y = iris.target  # array of labels (i.e. answers) of each data entry

# getting label names, i.e. the three flower species
y_names = iris.target_names

# taking random indices to split the dataset into train and test
test_ids = np.random.permutation(len(x))

# splitting data and labels into train and test
# keeping last 10 entries for testing, rest for training
x_train = x[test_ids[:-10]]
x_test = x[test_ids[-10:]]
y_train = y[test_ids[:-10]]
y_test = y[test_ids[-10:]]

# classifying using decision tree
clf = tree.DecisionTreeClassifier()

# training (fitting) the classifier with the training set
clf.fit(x_train, y_train)

# predictions on the test dataset
pred = clf.predict(x_test)

print(pred)    # predicted labels, i.e. flower species
print(y_test)  # actual labels
print(accuracy_score(y_test, pred) * 100)  # prediction accuracy in percent
Since we’re splitting randomly and the classifier is retrained on every run, the accuracy may vary.
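To get a feel for how much the accuracy moves, you can repeat the experiment with different shuffles and look at the spread. The sketch below assumes 20 repetitions and uses the loop counter as the seed; both choices are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(20):  # 20 repetitions is an arbitrary choice
    # New shuffle per repetition, last 10 rows held out for testing
    ids = np.random.default_rng(seed).permutation(len(X))
    train_ids, test_ids = ids[:-10], ids[-10:]

    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(X[train_ids], y[train_ids])

    pred = clf.predict(X[test_ids])
    scores.append(accuracy_score(y[test_ids], pred))

# Lowest, average, and highest accuracy across the repetitions
print(min(scores), np.mean(scores), max(scores))
```

With only 10 test points, each run can only score in steps of 10%, so a single run is a noisy estimate.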
Running the above code gives:
[0 1 1 1 0 2 0 2 2 2]
[0 1 1 1 0 2 0 2 2 2]
100.0
The first line contains the labels (i.e. flower species) of the testing data as predicted by our classifier, and the second line contains the actual flower species as given in the dataset. We therefore get an accuracy of 100% this time.
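The accuracy figure is simply the share of predictions that match the actual labels. Here is a small sketch that recomputes it by hand from the two arrays printed above.

```python
import numpy as np

# The two arrays from the sample output above
pred = np.array([0, 1, 1, 1, 0, 2, 0, 2, 2, 2])
y_test = np.array([0, 1, 1, 1, 0, 2, 0, 2, 2, 2])

# Element-wise comparison gives True/False; the mean of that is the hit rate
matches = pred == y_test
accuracy = np.mean(matches) * 100

print(int(matches.sum()), "of", len(y_test), "correct")  # 10 of 10 correct
print(accuracy)  # 100.0
```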
In this post you discovered, step by step, how to complete your first machine learning project in Python. Completing a small end-to-end project, from loading the data to making predictions, is the best way to get comfortable with a new platform.
In conclusion, I’d like to wish success and patience to those who are just starting to take part in machine learning competitions!
Your Next Step
Have you worked through the tutorial?
Work through the above tutorial.
List any questions you have in the comments section.