Microsoft Cognitive Services for AI : Vision API

Recently I took part in a hackathon in which we were required to submit innovative ideas for a well-known bank.

I registered, and after a few days I received an email from the Hackathon event team saying they had arranged some webinars to help participants come up with innovative ideas.

I was impressed by the webinar agenda, which covered the following points:

  • Microsoft Vision API
  • Microsoft Speech API
  • Microsoft Language API
  • Microsoft Knowledge API
  • Microsoft Search API

This was the first time I had heard of Microsoft Cognitive Services, and the more I learned about them, the more I realized how powerful they are.

Let us first see what Microsoft Cognitive Services is. Here is how Microsoft describes it:

Microsoft Cognitive Services (formerly Project Oxford) are a set of APIs, SDKs and services available to developers to make their applications more intelligent, engaging and discoverable. Microsoft Cognitive Services expands on Microsoft’s evolving portfolio of machine learning APIs and enables developers to easily add intelligent features – such as emotion and video detection; facial, speech and vision recognition; and speech and language understanding – into their applications. Our vision is for more personal computing experiences and enhanced productivity aided by systems that increasingly can see, hear, speak, understand and even begin to reason.

It has five main feature areas:

  • Vision
  • Knowledge
  • Language
  • Search
  • Speech

[Image: overview of the five Microsoft Cognitive Services API groups]

Let us see how the Vision API works.

A few setup steps are required first.

We will build the bot as a Bot Application in Visual Studio 2015. If the Bot Application template is not available in Visual Studio, as a workaround download this project and put the extracted folder into the location below:

C:\Users\YourName\Documents\Visual Studio 2015\Templates\ProjectTemplates\Visual C#

Once this is done, you can see the Bot Application template as shown below:

[Image: New Project dialog with the Bot Application template]

Click on Bot Application, and Visual Studio will create a sample project with the following structure:

[Image: structure of the generated Bot Application project]
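For reference, the generated Bot Application (v3) project typically contains the items below; the exact contents may vary slightly between template versions:

NeelTestApplication
  App_Start/WebApiConfig.cs           -- Web API route registration
  Controllers/MessagesController.cs   -- the bot's main entry point
  default.htm                         -- default landing page
  Global.asax
  packages.config
  Web.config                          -- app settings (MicrosoftAppId, MicrosoftAppPassword)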

A MessagesController is created by default; it is the main entry point of the application.

MessagesController will call the service that handles the interaction with the Microsoft APIs. Replace the code in MessagesController with the code below:

using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;
using System.Web.Http.Description;
using Microsoft.Bot.Connector;
using Newtonsoft.Json;
using NeelTestApplication.Vision;

namespace NeelTestApplication
{
    [BotAuthentication]
    public class MessagesController : ApiController
    {
        public IImageRecognition imageRecognition;

        public MessagesController()  {
            // Instantiate the concrete implementation (an interface cannot be instantiated)
            imageRecognition = new ImageRecognition();
        }

        /// <summary>
        /// POST: api/Messages
        /// Receive a message from a user and reply to it
        /// </summary>
        public async Task<HttpResponseMessage> Post([FromBody]Activity activity)
        {

            ConnectorClient connector = new ConnectorClient(new Uri(activity.ServiceUrl));

            if (activity.Type == ActivityTypes.Message)
            {

                var analysisResult = await imageRecognition.AnalizeImage(activity);
                Activity reply = activity.CreateReply("Did you upload an image? I'm more of a visual person. " +
                                      "Try sending me an image or an image url"); //default reply

                if (analysisResult != null)
                {
                    string imageCaption = analysisResult.Description.Captions[0].Text;
                    reply = activity.CreateReply("I think it's " + imageCaption);
                }
                await connector.Conversations.ReplyToActivityAsync(reply);
                return new HttpResponseMessage(HttpStatusCode.Accepted);
            }
            else
            {
                HandleSystemMessage(activity);
            }
            var response = Request.CreateResponse(HttpStatusCode.OK);
            return response;
        }

        private Activity HandleSystemMessage(Activity message)
        {

            if (message.Type == ActivityTypes.DeleteUserData)
            {
                // Implement user deletion here
                // If we handle user deletion, return a real message
            }
            else if (message.Type == ActivityTypes.ConversationUpdate)
            {
                // Handle conversation state changes, like members being added and removed
                // Use Activity.MembersAdded and Activity.MembersRemoved and Activity.Action for info
                // Not available in all channels
            }
            else if (message.Type == ActivityTypes.ContactRelationUpdate)
            {
                // Handle add/remove from contact lists
                // Activity.From + Activity.Action represent what happened
            }
            else if (message.Type == ActivityTypes.Typing)
            {
                // Handle knowing that the user is typing
            }
            else if (message.Type == ActivityTypes.Ping)
            {
            }

            return null;
        }
    }
}

In the above code you can find an interface called IImageRecognition, which declares the method that interacts with the Microsoft APIs.

So now we will add the IImageRecognition interface with the code below:

using Microsoft.Bot.Connector;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using System.Threading.Tasks;

namespace NeelTestApplication.Vision
{
    public interface IImageRecognition
    {
        Task<AnalysisResult> AnalizeImage(Activity activity);    
    }
}

Once this is done, let us add the ImageRecognition class, which implements IImageRecognition:

using Microsoft.Bot.Connector;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Web;

namespace NeelTestApplication.Vision
{
    public class ImageRecognition : IImageRecognition
    {
        private   VisualFeature[] visualFeatures = new VisualFeature[] {
                                        VisualFeature.Adult, //recognize adult content
                                        VisualFeature.Categories, //recognize image features
                                        VisualFeature.Description //generate image caption
                                        };

        // Pass your Vision API subscription key here (see the note below on obtaining a key)
        private VisionServiceClient visionClient = new VisionServiceClient("YOUR_VISION_API_KEY");

        public async Task<AnalysisResult> AnalizeImage(Activity activity)  {
            //If the user uploaded an image, read it, and send it to the Vision API
            if (activity.Attachments != null && activity.Attachments.Any() && activity.Attachments.First().ContentType.Contains("image"))
            {
                //stores image url (parsed from attachment or message)
                string uploadedImageUrl = activity.Attachments.First().ContentUrl;
                // The emulator serves local files via a "file=" query string parameter; extract the local path
                uploadedImageUrl = HttpUtility.UrlDecode(uploadedImageUrl.Substring(uploadedImageUrl.IndexOf("file=") + 5));

                using (Stream imageFileStream = File.OpenRead(uploadedImageUrl))
                {
                    try
                    {
                        return  await this.visionClient.AnalyzeImageAsync(imageFileStream, visualFeatures);
                    }
                    catch (Exception)
                    {
                        return null; //on error, return a null analysis result
                    }
                }
            }
            //Else, if the user did not upload an image, determine if the message contains a url, and send it to the Vision API
            else
            {
                try
                {
                   return await visionClient.AnalyzeImageAsync(activity.Text, visualFeatures);
                }
                catch (Exception)
                {
                    return null; //on error, return a null analysis result
                }
            }
        }
    }
}

Note that you will be required to add an API key, which you can get from the Cognitive Services page of the Azure portal here.
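Rather than hard-coding the key, you can keep it in Web.config and read it when the client is created. A minimal sketch, assuming a hypothetical VisionApiKey app setting that you add yourself (it is not part of the template):

using System.Configuration;
using Microsoft.ProjectOxford.Vision;

// In the ImageRecognition class.
// Assumes Web.config contains: <add key="VisionApiKey" value="your-key-here" />
private VisionServiceClient visionClient =
    new VisionServiceClient(ConfigurationManager.AppSettings["VisionApiKey"]);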

The ImageRecognition class has an important method named AnalizeImage, which reads the image from its location into a stream. It then calls the API method below and passes in the image stream:

this.visionClient.AnalyzeImageAsync(imageFileStream, visualFeatures);

The above method returns an AnalysisResult, from which the caption can be extracted as below:

var imageCaption = analysisResult.Description.Captions[0].Text;

So the image caption is the text the Vision API returns after analyzing the image.
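The AnalysisResult carries more than just the caption text. Below is a minimal sketch of reading it defensively; the property names come from the Microsoft.ProjectOxford.Vision.Contract types used above, while the helper name and the 0.5 confidence threshold are arbitrary choices for illustration:

using Microsoft.ProjectOxford.Vision.Contract;

public static class AnalysisResultHelper
{
    // Turns an AnalysisResult into a reply string, guarding against missing captions
    public static string DescribeResult(AnalysisResult result)
    {
        if (result?.Description?.Captions == null || result.Description.Captions.Length == 0)
        {
            return "I could not work out what this image shows.";
        }

        Caption best = result.Description.Captions[0];

        // Each caption carries a confidence score between 0 and 1;
        // 0.5 is an arbitrary threshold for this sketch
        return best.Confidence > 0.5
            ? "I think it's " + best.Text
            : "It might be " + best.Text + ", but I'm not sure.";
    }
}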

Let us try this out.

If you want to test your bot locally, the Bot Framework Emulator is the best option.

The Bot Framework Emulator is a desktop application that allows bot developers to test and debug their bots on localhost or remotely through a tunnel.

You can download the Bot Framework Emulator from here.

The only important thing it requires is the URL of your bot's API endpoint. In our case that would be:

http://localhost:PortNumber/api/messages
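One configuration note: when the emulator talks to a locally running bot, the app credentials in Web.config can typically be left empty, which is the template's default, so no Microsoft App ID or password needs to be entered in the emulator either:

<appSettings>
  <!-- Leave blank for local testing with the Bot Framework Emulator -->
  <add key="MicrosoftAppId" value="" />
  <add key="MicrosoftAppPassword" value="" />
</appSettings>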

Now when we upload an image in the Bot Emulator, it gives a result like the one below:

[Image: Bot Framework Emulator showing the bot's caption reply for an uploaded image]

It is awesome. Hope it helps.
