How to Choose ML Algorithm: Machine Learning Questions & Answers Part – III

This is the third post in the Machine Learning Questions and Answers series. Look here for previous posts.

The fundamental requirement for starting any Machine Learning project is to identify which algorithm to apply to the business problem at hand.

For that, you need to be able to pick the algorithm by looking at the problem. You should be able to decide whether to choose Classification, Regression, Clustering, Recommendation, etc.

Today’s question is somewhat around that.

  1. How do you choose a Machine Learning algorithm from the problem given by the business?

Answer – 

The very first step is to create a problem statement from the problem given by the business.

Importance of the Problem Statement

  • Whenever you start working on any Machine Learning project, you first need to understand what the problem is
  • The type of problem you are solving depends mainly on how you set the problem statement
  • Setting a problem statement means naming the input and the output of the business problem you want to solve
  • The problem statement must be concrete and should fit in a single sentence
  • For example, if your company receives a lot of spam or fake emails that may harm your organization, you need an algorithm that can identify which of the incoming emails are spam/fake. In this case, an email is the input and the decision whether the email is spam/fake is the output. So the problem statement would be “Is the mail spam/fake or not?”

Once the problem statement is defined, you should be able to identify which of the following types of algorithm to choose:

  • Any Classification Algorithm
  • Any Clustering Algorithm
  • Any Regression Algorithm
  • Any Recommendation Algorithm

When to use Classification Algorithms:

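Use Classification when your problem statement is a question that asks you to classify something (e.g. “Is this good or bad?”), or when your goal is to predict discrete values, e.g. {1, 0}, {True, False}, {Spam, No Spam}.

Classification is a form of Supervised Learning: the algorithm learns from examples that are already labeled with the correct category.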

Real-World Examples:

  • Is this mail spam or not?
  • Is the customer happy or not?
  • After walking out of a restaurant, did the person look satisfied or not? Here the restaurant can be replaced with anything, such as a bank, a shop, or a mall
  • Is the feedback provided by the customer positive or negative?
  • Will the trading day be an up-day or a down-day?
  • Banks use classification algorithms to classify loan applicants by their probability of defaulting on payments

All of the above problem statements follow a certain pattern. They involve taking an object or entity and classifying it. For example, in spam detection we are classifying an email.

On the other side, you have the categories the object is assigned to. In binary classification there are two categories, for example whether a tweet is positive or not. In multi-class classification there are more than two categories, and each object is assigned to one of them.

Examples of Classification Algorithms:

  • K-Nearest Neighbors
  • Decision Trees
  • Bayesian Classifier
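
To make the idea concrete, here is a minimal sketch of K-Nearest Neighbors in plain Python, applied to the spam question from earlier. The two features per email (number of links, number of “urgent” words) and the data points are made up for illustration; a real project would use proper features and a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest neighbours and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per email: (number of links, number of "urgent" words).
emails = [(0, 0), (1, 0), (0, 1), (7, 4), (8, 5), (6, 6)]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

print(knn_predict(emails, labels, query=(7, 5)))  # close to the spam examples
print(knn_predict(emails, labels, query=(1, 1)))  # close to the non-spam examples
```

Note that the labels are known up front, which is what makes this a supervised problem.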

When to use Clustering Algorithms:

For example, there is a large group of users and you want to divide them into groups based on some common attributes. The key point is that the groups themselves are unknown beforehand. So you should go with a Clustering Algorithm when the groups are not known in advance.

If your problem statement contains questions like “How is this organized?”, or it is about grouping something or focusing on particular groups, then you should go with Clustering.

Clustering is a form of Unsupervised Learning: the algorithm works on data that has no labels.

Real-World Examples:

  • When you have a large social network site and you want to divide the users on the basis of the Likes they gave to posts or on the basis of demographics, clustering helps to identify meaningful groups
  • A company learns that some products are not selling as expected. Those products can be clustered, so that only the cluster of such products has to be checked rather than comparing the sales value of all the products
  • Search engines can use Clustering Algorithms to cluster web pages by similarity and to identify the ‘relevance rate’ of search results. This helps reduce the time it takes to return results to users
  • Domino’s is known for promising pizza delivery within 30 minutes. Clustering can be used to choose pizza shop locations in such a way that the travel time of the delivery person is minimized, which makes that kind of promise feasible

You should go with Clustering when the major focus is to create groups that have similar attributes. For example, suppose you are given a set of historical transactions that record who bought what. By using clustering techniques, you can work out the segmentation of your customers.

Examples of Clustering Algorithms:

  • K-Means Algorithm
  • Expectation Maximization
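
As an illustration, the following is a simplified sketch of K-Means in plain Python for the customer segmentation case. The customer data is invented, and starting from the first k points (instead of a random or smarter initialization) is a simplifying assumption that keeps the example deterministic.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    # Simplifying assumption: start from the first k points, not random picks.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments

# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 82)]
centroids, groups = k_means(customers, k=2)
print(groups)  # one cluster index per customer
```

No labels are passed in: the algorithm discovers the two groups (occasional small buyers and frequent big buyers) on its own.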

When to use Regression Algorithms:

Use Regression when you want to compute a continuous value, as opposed to Classification, where the output is categorical. So whenever you are asked to predict a future value of some ongoing process, you can go with a Regression Algorithm.

Here you deal with numbers. If your problem statement contains questions like “How much?”, “How many?”, “How long?”, or asks about the impact of one thing on another, you should go with Regression.

Regression is a form of Supervised Learning.

Real-World Examples:

  • What will the value of one Bitcoin be in dollars on a particular future date?
  • How long will it take me to get home from my office?
  • What will the sales of a particular product be next month?
  • What is the impact of blood alcohol content on coordination?
  • A credit card company applied regression analysis to predict monthly gift card sales and improve yearly revenue projections

As you can see, all of the above examples involve a continuous quantity such as the Bitcoin price, commute time, or sales, and the output depends on certain inputs. For example, commute time depends on the time of day when I travel, the distance, the weather, etc.

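Examples of Regression Algorithms:

  • Linear Regression Algorithm
  • Polynomial Regression Algorithm
  • Logistic Regression Algorithm (despite its name, it predicts the probability of a class and is normally used for Classification problems)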
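
As a small worked example, here is a sketch of simple Linear Regression (ordinary least squares with a single input) in plain Python for the commute question. The commute log is made-up data, and using distance as the only input is an assumption that keeps the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical commute log: distance in km -> travel time in minutes.
distance_km = [2, 4, 6, 8, 10]
minutes = [9, 13, 17, 21, 25]  # made-up data following minutes = 2 * km + 5

slope, intercept = fit_line(distance_km, minutes)
predicted = slope * 7 + intercept  # estimated minutes for a 7 km trip
print(slope, intercept, predicted)
```

The output is a number on a continuous scale rather than a category, which is what separates Regression from Classification.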

When to use Recommendation Algorithms:

Use Recommendation when you want to determine what kind of item a user would like in the future based on the user’s past behavior, as in Collaborative Filtering (for example, “Users like you also liked these things”).

So when your problem statement contains questions like what the user should do next, what to suggest to the user, or what the top choices of a particular user are, you should go with Recommendation.

Recommendation is often treated as Unsupervised Learning, although recommenders can also be trained with supervised methods when explicit ratings or feedback are available.

Real-World Examples:

  • If a user buys a washing machine, what else are they likely to buy in the future?
  • What are the top 10 book choices for a particular user?
  • What kind of artist would the user like when they come back to a music application?
  • Showing user A the list of items bought by another user B whose behavior is very similar to that of user A

As you can see from the above examples, Recommendation is mainly based on a particular user and his/her historical behavior.

Examples of Recommendation Algorithms:

  • Collaborative Filtering Algorithm
  • Content-Based Filtering Algorithm
  • Matrix Factorization Algorithm etc
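
The “user A and similar user B” idea can be sketched as a very small user-based collaborative filter in plain Python. The purchase histories are invented, and using Jaccard overlap with a single most-similar user is a simplifying assumption; practical recommenders combine many neighbours and weight them.

```python
def jaccard(a, b):
    """Overlap between two sets of purchased items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(purchases, user, top_n=3):
    """Suggest items bought by the most similar other user but not by `user`."""
    mine = purchases[user]
    # Find the other user whose purchase history overlaps most with ours.
    others = [name for name in purchases if name != user]
    most_similar = max(others, key=lambda name: jaccard(mine, purchases[name]))
    # Recommend what that user bought and we have not.
    return sorted(purchases[most_similar] - mine)[:top_n]

# Hypothetical purchase histories.
purchases = {
    "A": {"washing machine", "detergent"},
    "B": {"washing machine", "detergent", "fabric softener"},
    "C": {"guitar", "amplifier"},
}
print(recommend(purchases, "A"))  # items bought by the user most similar to A
```

Here user B is the closest match for user A, so A is offered what B bought and A has not.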

Conclusion:

Whenever you want to identify which algorithm to use:

  • Set one concrete problem statement out of the problem given to you
  • The choice you make here will largely determine the next steps
  • Choose the algorithm by looking at the problem statement and at what exactly is needed
  • Choose the type (e.g. Clustering), then choose the algorithm (e.g. the K-Means Algorithm), and start your project
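
As a rough summary of these steps, the mapping from “what the problem statement asks for” to an algorithm type can be sketched as a lookup table. The output kinds ("category", "number", "groups", "suggestions") are hypothetical labels chosen for this illustration, not a standard vocabulary.

```python
# Hypothetical mapping from the kind of output a problem statement asks for
# to the four algorithm types discussed in this post.
FAMILY_BY_OUTPUT = {
    "category": "Classification",     # Is this A or B?
    "number": "Regression",           # How much / how many / how long?
    "groups": "Clustering",           # How is this organized?
    "suggestions": "Recommendation",  # What should the user do next?
}

def suggest_family(output_kind):
    """Return the algorithm type for an output kind, or ask for a clearer statement."""
    return FAMILY_BY_OUTPUT.get(output_kind, "Unclear: refine the problem statement")

print(suggest_family("category"))  # e.g. "Is the mail spam/fake or not?"
print(suggest_family("number"))    # e.g. "How long will my commute take?"
```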

Hope it helps.

8 thoughts on “How to Choose ML Algorithm: Machine Learning Questions & Answers Part – III”

  1. just design the datapoints required and which model to be used? I have problem sttement and not any data. May I share that problem with u?

  2. Hi Neel, great article, I really appreciate your contributions. What category do you think ‘classify objects within an image’ would fall into? I had originally thought of this as a classification problem. Suppose I have a mechanical product and wanted to detect if all the screws/connectors have been added before shipping, is this a good machine learning problem?
