Overview of Azure Cognitive Services
At Allcloud, our favorite projects are smart enough to anticipate your needs and answer questions you didn’t even know existed.
We’re incredibly excited to watch artificial intelligence and data science evolve and become more accessible to everyone. To that end, some of our favorite cloud offerings are within Microsoft’s Cognitive Services.
About Microsoft Cognitive Services
Microsoft has been at the forefront of machine learning for years, with obvious use cases like speech recognition in Cortana and optical character recognition in OneNote, as well as more subtle use cases like spam filtering in Outlook and explicit image filtering in Bing.
Traditionally, artificial intelligence required significant investments in talent, hardware, and time to yield results. Companies like Microsoft are democratizing AI by offering sophisticated algorithms as low-cost “a la carte” web services.
Cognitive Services (introduced in 2015 as Project Oxford) offers a suite of more than two dozen algorithms honed by data scientists and millions of test cases. Cognitive Services are sold via the Azure portal and are billed based on consumption at a fraction of a penny per service call.
We have integrated various services into Digital Asset Management (DAM) systems, enterprise search platforms, websites, and ERP. Cognitive Services consistently deliver “wow factor”.
Summary of Cognitive Services and Use Cases
Academic Knowledge API
Academic Knowledge API is one of the most advanced APIs offered. It taps into the Microsoft Academic Graph (MAG), which mines semantic data from Bing. It provides tools to interpret, evaluate, and search scholarly articles.
These tools are among the most specialized that Microsoft offers, but are incredibly useful in knowledge management and search applications.
Bing Speech API
Bing Speech API enables text-to-speech for 20 common languages, with multiple voice options. The voices are significantly more natural and relatable than we’re used to with other computerized voices.
The text-to-speech API is useful for hands-free mobile applications. It is often combined with other services.
Bing Spell Check API
Bing Spell Check API is one of the most intuitive APIs. It allows for spell-checking and automatic grammar correction of uploaded text (e.g., adding punctuation and proper casing).
Spell checking and text cleanup are important for user-submitted content.
Computer Vision
Computer Vision analyzes an image to produce an automated caption (e.g., “a man swimming in a pool of water”) and automated tags (e.g., “water”, “sport”). It also detects faces with estimated age and gender (e.g., “36-year-old male”) and reports technical details (i.e., resolution and format) and dominant colors.
This may be the most impressive cognitive service, with many clear applications. Computer Vision can be used to generate metadata for searching, filter offensive content, and provide descriptions for users with limited vision.
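As a rough sketch of the search-metadata use case, the snippet below turns an image-analysis result into a description and a keyword list. The dictionary layout (`caption`, `tags`, `confidence`), the confidence values, and the `build_search_metadata` helper are hypothetical stand-ins chosen for illustration; they are not the service's actual response schema.

```python
from typing import Any


def build_search_metadata(analysis: dict[str, Any], min_confidence: float = 0.5) -> dict[str, Any]:
    """Keep the caption plus any tags that meet the confidence threshold."""
    tags = [
        tag["name"]
        for tag in analysis.get("tags", [])
        if tag.get("confidence", 0.0) >= min_confidence
    ]
    caption = analysis.get("caption", {}).get("text", "")
    return {"description": caption, "keywords": sorted(tags)}


# Hypothetical analysis result with made-up confidence values,
# shaped only for this example.
sample = {
    "caption": {"text": "a man swimming in a pool of water", "confidence": 0.9},
    "tags": [
        {"name": "water", "confidence": 0.99},
        {"name": "sport", "confidence": 0.95},
        {"name": "boat", "confidence": 0.12},
    ],
}

metadata = build_search_metadata(sample)
print(metadata)
```

Filtering on a confidence threshold keeps low-certainty tags (here, “boat”) out of the search index; the right cutoff depends on how much noise the search experience can tolerate.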
Content Moderator
Content Moderator is a collection of automated and human-driven review tools for text, images, and video. This API allows us to detect profanity, lewd imagery, and sensitive content such as Personally Identifiable Information (PII) in more than 100 languages. The human review tool allows for moderation workflows with higher confidence.
Content Moderator is a good idea for user-driven websites and enterprise content management systems.
Custom Decision Service
Custom Decision Service drives personalized content for applications based on user behavior. It uses reinforcement learning to recommend which content to place where, based on user intent. It can be integrated with a few lines of JavaScript and an RSS feed.
This service is useful for adding contextual content to web pages and placing relevant advertisements.
Custom Speech Service
Custom Speech Service enables advanced use cases, like recognizing the speech of a specific set of users, optionally using a custom vocabulary. There are advanced options for companies that need to train their own speech models.
This can be useful for improving speech-to-text recognition in specialized fields like medicine.
Custom Vision Service
Custom Vision Service supports advanced use cases in which we build our own Computer Vision model. Microsoft offers methods to upload images, train the API by labeling those images, and provide iterative feedback in order to improve the API.
Although the REST APIs are easy to code against, this API is likely excessive for most applications.
Emotion API
Emotion API is related to Face API. It identifies faces in an image and analyzes them against known indicators for anger, contempt, disgust, fear, happiness, neutrality, sadness, and surprise.
Emotion API can be used to augment other facial matching and Computer Vision applications.
Entity Linking Intelligent Service
Entity Linking Intelligent Service is an advanced API for understanding text entities. Microsoft’s example includes associating “times” with either “New York Times” or “Times Square” depending on context clues.
This API allows for deep understanding of large datasets, such as parsing wikis.
Face API
Face API detects faces in an image and can be used to build a database of known faces. The demo allows you to upload two photos and determine the likelihood that they are from the same person.
Facial recognition has value in search and DAM applications.
Knowledge Exploration Service
Knowledge Exploration Service parses natural language search queries into expressions and allows for query autocompletion.
This API is currently in preview, but has clear use cases in search engines.
Language Understanding Intelligent Service (LUIS)
Language Understanding Intelligent Service (LUIS) provides tools to understand user voice and intent.
LUIS is useful for mobile applications with natural voice controls.
Linguistic Analysis API
Linguistic Analysis API allows for advanced parsing of text to examine sentence structure. It tokenizes text, so you can be confident about what your users intend when issuing voice commands.
We have not had the opportunity to work with Linguistic Analysis, but a clear use case is eliminating ambiguity in voice commands.
QnA Maker API
QnA Maker API provides machine learning to convert a static Frequently Asked Questions (FAQ) webpage into an interactive natural language knowledge base.
This can be useful for modernizing large FAQs, allowing users to type their questions instead of browsing large pages.
Recommendations API
Recommendations API allows for the familiar functionality of “Users who like X also like Y”. It allows for user/transaction datasets to be tracked, providing personalized recommendations.
Recommendations can be integrated into ecommerce (product recommendations), social networks (people you may know), and search platforms (keyword recommendations).
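To make the “Users who like X also like Y” idea concrete, here is a minimal sketch that counts how often items appear together in user histories. It is a simple co-occurrence count, not the algorithm the Recommendations API uses; the `build_co_occurrence` and `also_liked` helpers and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_occurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(Counter)
    for history in histories:
        # set() ensures a repeated item in one history is counted once.
        for a, b in permutations(set(history), 2):
            counts[a][b] += 1
    return counts


def also_liked(counts, item, top_n=3):
    """Return the items most often seen alongside `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]


# Toy data invented for this example: each list is one user's liked items.
histories = [
    ["camera", "tripod", "sd card"],
    ["camera", "sd card"],
    ["camera", "tripod"],
    ["laptop", "sd card"],
    ["camera", "sd card", "laptop"],
]

counts = build_co_occurrence(histories)
print(also_liked(counts, "camera", top_n=2))
```

In this toy dataset, “sd card” appears with “camera” in three histories and “tripod” in two, so those two items rank first. A production recommender would also need to handle popularity bias and new items with no history.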
Speaker Recognition API
Speaker Recognition API allows us to train voice models for known speakers, and then determine if new audio originated from a known speaker. The online demo uses clips from recent presidents.
This is a specialized service that we haven’t had the chance to work with.
Text Analytics API
Text Analytics API extracts information and sentiment from uploaded text. It returns the interpreted language, extracts key phrases, and determines a sentiment score ranging from 0% (negative) to 100% (positive).
Text analytics are often used to assess social network sentiment and analyze user-submitted reviews.
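As an illustration of how a sentiment score might be consumed when analyzing reviews, the sketch below maps a 0 to 100 percent score onto a coarse label and tallies the labels. The 40 and 60 percent cutoffs, the `sentiment_label` helper, and the sample scores are assumptions made for this example, not values defined by the service.

```python
from collections import Counter


def sentiment_label(score_percent: float,
                    negative_below: float = 40.0,
                    positive_above: float = 60.0) -> str:
    """Map a sentiment score (0 = negative, 100 = positive) to a coarse label."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    if score_percent < negative_below:
        return "negative"
    if score_percent > positive_above:
        return "positive"
    return "neutral"


# Made-up scores standing in for the sentiment returned for each review.
review_scores = {
    "Fast shipping, great product": 93.0,
    "It works, I guess": 52.0,
    "Broke after two days": 8.0,
    "Love it": 97.0,
}

summary = Counter(sentiment_label(score) for score in review_scores.values())
print(dict(summary))
```

Keeping the cutoffs as parameters makes it easy to widen or narrow the “neutral” band once real review data shows how the scores are distributed.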
Translator Speech API
Translator Speech API is an automatic translation service that operates on speech with support for 60 languages. It supports real-time transcription and translation between known languages.
This has some amazing real-world use cases, particularly for telecommunications. It is especially powerful when combined with Bing Speech API and LUIS.
Translator Text API
Translator Text API is similar to the Translator Speech API, except it operates on text. It supports 60 languages, covering countries that account for more than 95% of the world’s GDP.
There are countless use cases, including automatic translation of training and sales materials.
Video API
Video API has a collection of functions that analyze and can even transform content. API calls exist to identify and track faces throughout a video. It is possible to track motion and detect changes (e.g., in order to run other APIs on specific frames). There is also an option to stabilize motion.
The use cases are clear for both existing videos at rest and live streaming.
Video Indexer
Video Indexer provides tooling for advanced metadata extraction and processing of videos. There are options to transcribe spoken words, identify faces, and extract text from multiple video formats.
Video Indexer will be integral for Digital Asset Management and search platforms.
Web Language Model API
Web Language Model API provides probabilistic algorithms for natural language processing. Examples include probabilities for words appearing together and completing sentences.
There are multiple use cases when processing text, including parsing OCR-ed content to determine word breaks.
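The word-break use case can be sketched with a small dynamic program that picks the most probable split of unspaced text under a unigram model. The probabilities in `toy_probs` are invented, and `break_words` is a hypothetical helper; a real application would obtain word probabilities from a language model rather than a hand-written table.

```python
import math


def break_words(text, word_probs, max_word_len=20):
    """Split `text` into the most probable word sequence under a unigram model.

    Returns an empty list if no split into known words exists.
    """
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_probs[word])
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[n][1]


# Invented probabilities for illustration only.
toy_probs = {
    "the": 0.05,
    "he": 0.02,
    "quick": 0.001,
    "brown": 0.001,
    "row": 0.0005,
    "fox": 0.0008,
}

print(break_words("thequickbrownfox", toy_probs))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long inputs, which matters when processing full pages of OCR-ed text.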
Bing Search Services
Microsoft also offers several API calls that tap directly into the Bing search index: