Sunday 10 March 2024

Create a Microsoft 365 Copilot plugin: Extend Microsoft 365 Copilot's knowledge

Microsoft 365 Copilot is an enterprise AI tool that is already trained on your Microsoft 365 data. If you want to "talk" to data such as your emails, Teams chats or SharePoint documents, then all of it is already available as part of its "knowledge".

However, not all the data you want to work with will live in Microsoft 365. There will be instances when you want to use Copilot's AI on data residing in external systems. So how do we extend the knowledge of Microsoft 365 Copilot with real-time data coming from external systems? The answer is by using plugins! Plugins not only help us do Retrieval Augmented Generation (RAG) with Copilot, but they also provide a framework for writing data to external systems.

To know more about the different Microsoft 365 Copilot extensibility options, please have a look here: https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/decision-guide

So in this post, let's have a look at how to build a plugin which talks to an external API and then infuses the real-time knowledge into Copilot's AI. At the time of this writing, there is nothing more volatile than cryptocurrency prices! So, I will be using a cryptocurrency price API to enhance Microsoft 365 Copilot's knowledge with real-time Bitcoin and Ethereum rates!


So let's see the different moving parts of the plugin. We will be using a Microsoft Teams message extension built on the Bot Framework as a base for our plugin:  

1) App manifest

This is by far the most important part of the plugin. The name and description (both short and long) are what tell Copilot about the nature of the plugin and when to invoke it to get external data. We have to be very descriptive and clear about the features of the plugin here, as this is what Copilot uses to determine whether to invoke the plugin. The parameter descriptions tell Copilot how to construct the parameters required by the plugin based on the conversation.
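The actual manifest is in the GitHub repo linked at the end of this post, but a trimmed sketch of the relevant composeExtensions section could look something like this (the bot ID, command ID and descriptions below are illustrative placeholders):

```json
"composeExtensions": [
  {
    "botId": "00000000-0000-0000-0000-000000000000",
    "commands": [
      {
        "id": "getCryptoRates",
        "type": "query",
        "title": "Crypto rates",
        "description": "Gets the current exchange rates for cryptocurrencies such as Bitcoin and Ethereum",
        "parameters": [
          {
            "name": "cryptoName",
            "title": "Cryptocurrency name",
            "description": "The name of the cryptocurrency mentioned in the conversation, e.g. bitcoin or ethereum"
          }
        ]
      }
    ]
  }
]
```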

2) Teams messaging extension code

This function does the heavy lifting in our code. It is called by Copilot with the parameters specified in the app manifest. Based on the parameters, we can fetch external data and return it as adaptive cards.
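The full handler is in the repo; here is a minimal sketch of what it could look like, assuming a hypothetical CryptoService helper (described in the next section):

```csharp
using Microsoft.Bot.Builder;
using Microsoft.Bot.Builder.Teams;
using Microsoft.Bot.Schema;
using Microsoft.Bot.Schema.Teams;

public class CryptoPluginBot : TeamsActivityHandler
{
    protected override async Task<MessagingExtensionResponse> OnTeamsMessagingExtensionQueryAsync(
        ITurnContext<IInvokeActivity> turnContext,
        MessagingExtensionQuery query,
        CancellationToken cancellationToken)
    {
        // Copilot fills this parameter from the conversation, guided by the manifest description.
        string cryptoName = query.Parameters?
            .FirstOrDefault(p => p.Name == "cryptoName")?.Value as string ?? "bitcoin";

        // Fetch real-time data from the external system.
        decimal rate = await CryptoService.GetCryptoRateAsync(cryptoName);

        // Return the result as a card; Copilot uses the card content to ground its answer.
        var card = new HeroCard(title: cryptoName, text: $"Current rate: {rate} USD");

        return new MessagingExtensionResponse
        {
            ComposeExtension = new MessagingExtensionResult
            {
                Type = "result",
                AttachmentLayout = "list",
                Attachments = new List<MessagingExtensionAttachment>
                {
                    new MessagingExtensionAttachment(HeroCard.ContentType, content: card)
                }
            }
        };
    }
}
```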

3) Talk to the external system (Cryptocurrency API)

This is a helper function which actually talks to the crypto API and returns the rates.
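As an illustration, here is a sketch of that helper using CoinGecko's public price endpoint (the choice of API and the class name are assumptions for this sketch, not necessarily what the sample uses):

```csharp
using System.Text.Json;

public static class CryptoService
{
    private static readonly HttpClient http = new();

    // Gets the current USD rate for a coin (e.g. "bitcoin" or "ethereum") from CoinGecko.
    public static async Task<decimal> GetCryptoRateAsync(string cryptoName)
    {
        string url = $"https://api.coingecko.com/api/v3/simple/price" +
                     $"?ids={Uri.EscapeDataString(cryptoName)}&vs_currencies=usd";

        using JsonDocument doc = JsonDocument.Parse(await http.GetStringAsync(url));
        return doc.RootElement.GetProperty(cryptoName).GetProperty("usd").GetDecimal();
    }
}
```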

Hope you found this post useful! 

The code for this solution is available on GitHub: https://github.com/vman/M365CopilotPlugin

Thursday 15 February 2024

Generate images using Azure OpenAI DALL·E 3 in SPFx

DALL·E 3 is the latest AI image generation model from OpenAI. It is leaps and bounds ahead of the previous model, DALL·E 2. Having explored both, I found that both the image quality and the adherence to text prompts are much better with DALL·E 3. It is now available as a preview in the Azure OpenAI Service as well.

Given all this, it is safe to say that if you are working on the Microsoft stack and want to generate images with AI, the Azure OpenAI DALL·E 3 model is the recommended option.

In this post, let's explore the image generation API for DALL·E 3 and also how to use it from a SharePoint Framework (SPFx) solution. The full code of the solution is available on GitHub: https://github.com/vman/Augmentech.OpenAI

First, let's build the web API which will wrap the Azure OpenAI API to create images. This will be a simple ASP.NET Core Web API which accepts a text prompt and returns the generated image to the client.

To run this code, we will need the following NuGet package: https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.13/
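Here is a minimal sketch of such an endpoint using that SDK version (the resource endpoint, key and deployment name are placeholders):

```csharp
using Azure;
using Azure.AI.OpenAI;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// POST /api/images { "prompt": "..." } -> { "url": "..." }
app.MapPost("/api/images", async (ImageRequest request) =>
{
    var client = new OpenAIClient(
        new Uri("https://<your-resource>.openai.azure.com/"),
        new AzureKeyCredential("<your-key>"));

    Response<ImageGenerations> response = await client.GetImageGenerationsAsync(new ImageGenerationOptions
    {
        DeploymentName = "dall-e-3", // the name of your DALL·E 3 deployment in Azure OpenAI
        Prompt = request.Prompt,
        Size = ImageSize.Size1024x1024,
        ImageCount = 1
    });

    // The service returns a URL to the generated image.
    return Results.Ok(new { url = response.Value.Data[0].Url });
});

app.Run();

public record ImageRequest(string Prompt);
```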

Now, for calling the API, we will use a standard React-based SPFx webpart. The webpart will use Fluent UI controls to grab the text prompt from the user and send it to our API.
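The call from the webpart to our API is a straightforward POST. A trimmed sketch (the API URL is a placeholder, and the helper assumes the HttpClient instance is passed in from the webpart context):

```typescript
import { HttpClient, HttpClientResponse, IHttpClientOptions } from '@microsoft/sp-http';

// Sends the user's prompt to our API and returns the generated image URL.
// The httpClient instance comes from the webpart context (this.context.httpClient).
export async function generateImage(httpClient: HttpClient, prompt: string): Promise<string> {
  const options: IHttpClientOptions = {
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  };

  const response: HttpClientResponse = await httpClient.post(
    'https://<your-api>/api/images', // placeholder URL of the web API above
    HttpClient.configurations.v1,
    options
  );

  const data = await response.json();
  return data.url;
}
```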

Hope this helps!

Thursday 25 January 2024

Get structured JSON data back from GPT-4-Turbo

With the latest gpt-4-turbo model out recently, there is one very helpful feature which came with it: the JSON mode option. Using JSON mode, we are able to predictably get responses back from OpenAI in a structured JSON format.

This can help immensely when building APIs using Large Language Models (LLMs). Even though the model could previously be instructed to return JSON in its system prompt, there was no guarantee that it would return valid JSON. With the JSON mode option, we can now specify the required format and the model will return data according to it.

To know more about JSON mode, have a look at the official OpenAI docs: https://platform.openai.com/docs/guides/text-generation/json-mode

Now let's look at some code to see how this works in action:

I am using the Azure OpenAI service to host the gpt-4-turbo model, and I am also using the v1.0.0-beta.12 version of the Azure OpenAI .NET SDK found on NuGet here:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.12
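Since the original gist isn't embedded here, below is a minimal sketch of the approach (the endpoint, key and deployment name are placeholders):

```csharp
using System.Text.Json;
using Azure;
using Azure.AI.OpenAI;

var client = new OpenAIClient(
    new Uri("https://<your-resource>.openai.azure.com/"),
    new AzureKeyCredential("<your-key>"));

var options = new ChatCompletionsOptions
{
    DeploymentName = "gpt-4-turbo", // the name of your model deployment
    ResponseFormat = ChatCompletionsResponseFormat.JsonObject, // enable JSON mode
    Messages =
    {
        new ChatRequestSystemMessage(
            "Analyse the text provided by the user and extract the cities mentioned in it. " +
            "Return them as JSON in this format: { \"cities\": [\"city1\", \"city2\"] }"),
        new ChatRequestUserMessage(
            "I flew from London to New York with a layover in Reykjavik.")
    }
};

Response<ChatCompletions> response = await client.GetChatCompletionsAsync(options);
string json = response.Value.Choices[0].Message.Content;

// Because of JSON mode, this string is guaranteed to be valid JSON,
// so we can deserialize it straight into our .NET objects.
var result = JsonSerializer.Deserialize<CitiesResult>(
    json, new JsonSerializerOptions(JsonSerializerDefaults.Web));

Console.WriteLine(string.Join(", ", result!.Cities));

record CitiesResult(string[] Cities);
```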

What is happening in the code is that in the system message, we are instructing the LLM to analyse the text provided by the user, extract the cities mentioned in it, and return them in the specified JSON format.

Also important is the line where we explicitly set the ResponseFormat property to ChatCompletionsResponseFormat.JsonObject, which tells the model it must return valid JSON.

Next, we provide the actual text to parse in the user message.

Once we get the data back in the expected JSON schema, we are able to convert it to objects which can be used in code.

And as expected, we get the cities from the text back as valid, structured JSON.


Hope this helps!

Monday 18 December 2023

Using Microsoft Tokenizer to count Azure OpenAI model tokens

If you have been working with OpenAI APIs, you will have come across the term "tokens". Tokens are the units in which these APIs process and output text. Different OpenAI models have different context lengths, which means there is a limit to the text they can process in a single request. More about tokens here: https://learn.microsoft.com/en-us/azure/ai-services/openai/overview#tokens

When building an app based on these APIs, we need to keep track of the tokens being sent and make sure not to send more than the maximum context length of the OpenAI model being used (e.g. gpt-3.5-turbo). If more tokens are sent than the maximum context length of the model, the request will fail with an error along the lines of: "This model's maximum context length is X tokens. However, your messages resulted in Y tokens. Please reduce the length of the messages."

To help with counting tokens before sending text to the APIs, there are various libraries available. One of them is the Microsoft Tokenizer: https://github.com/microsoft/Tokenizer, an open source .NET and TypeScript implementation of OpenAI's tiktoken library.

So in this post, let's see how we can use the Microsoft Tokenizer .NET SDK to manage the tokens sent to OpenAI APIs.

First, we will need the Microsoft Tokenizer NuGet package:

https://www.nuget.org/packages/Microsoft.DeepDev.TokenizerLib/

Since we will actually be counting the tokens of a chat between the user and an AI assistant, we will also use the Azure OpenAI .NET SDK:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.8

Next, in our code, we will first have to initialize the tokenizer and let it know which OpenAI model we will be working with. Most of the recent models like gpt-3.5-turbo and gpt-4 share the same token encoding, i.e. cl100k_base, so we can use the same tokenizer across these models.

Now let's look at the actual code:
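Here is a minimal sketch of the idea (the chat messages and the 4096-token limit are illustrative):

```csharp
using Azure.AI.OpenAI;
using Microsoft.DeepDev;

// gpt-3.5-turbo, gpt-4 etc. share the cl100k_base encoding,
// so the tokenizer can be created once from the model name.
ITokenizer tokenizer = await TokenizerBuilder.CreateByModelNameAsync("gpt-3.5-turbo");

const int maxTokens = 4096; // illustrative context length for the model being used

var chatHistory = new List<ChatMessage>
{
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "What is Azure OpenAI?"),
    new(ChatRole.Assistant, "Azure OpenAI provides access to OpenAI models hosted on Azure."),
    new(ChatRole.User, "Which models does it support?")
};

int CountTokens() =>
    chatHistory.Sum(m => tokenizer.Encode(m.Content, new HashSet<string>()).Count);

// If the chat has grown beyond the model's context length, drop the oldest
// non-system messages so the most recent conversation is what gets sent.
while (CountTokens() > maxTokens && chatHistory.Count > 2)
{
    chatHistory.RemoveAt(1); // index 0 is the system message
}
```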

What we have here is a sample chat history between a user and an assistant. Before sending the chat history to the OpenAI API to get the next message from the assistant, we use the Tokenizer library to count the tokens. If it turns out that there are more tokens in the chat than the model supports, we remove the earliest messages from it. This way, the most recent conversation is sent to the API and the generated response stays relevant to the current conversation context.

Hope this helps!

Sunday 26 November 2023

Manage Azure OpenAI Service using the Azure CLI

I was working on a project recently where we were using the Azure OpenAI service quite heavily. As part of creating the DevOps pipelines for the project, we had to look into automating the management of the Azure OpenAI service. It turns out this is possible with the Azure CLI; however, the commands live under the Cognitive Services module, which can make them a bit tricky to find. So here is a quick blog post detailing some of the more frequently used operations for the Azure OpenAI service through the Azure CLI:
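For example (the resource names, location and model version below are placeholders; adjust them to your environment):

```bash
# Create an Azure OpenAI resource (kind "OpenAI" under Cognitive Services)
az cognitiveservices account create --name my-openai --resource-group my-rg \
  --kind OpenAI --sku S0 --location eastus

# Deploy a model to the resource
az cognitiveservices account deployment create --name my-openai --resource-group my-rg \
  --deployment-name gpt-35-turbo --model-name gpt-35-turbo --model-version "0613" \
  --model-format OpenAI --sku-capacity 1 --sku-name Standard

# List the keys of the resource
az cognitiveservices account keys list --name my-openai --resource-group my-rg

# Delete a model deployment
az cognitiveservices account deployment delete --name my-openai --resource-group my-rg \
  --deployment-name gpt-35-turbo
```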


For a full set of operations, please see the Microsoft docs: https://learn.microsoft.com/en-us/cli/azure/cognitiveservices?view=azure-cli-latest

Thursday 16 November 2023

Teams tab fails to load in the new Microsoft Teams Desktop client

The new Microsoft Teams desktop client was made generally available for Windows and Mac recently. The good news is that the new client provides feature parity for third-party apps like Focusworks AI, giving customers a choice of using their preferred Teams client to access the apps.

However, if you have a custom-built Microsoft Teams tab or a task module as part of your solution and find that it fails to load in the new Microsoft Teams client, there might be a specific reason for it.

And since there is no way to invoke the Developer tools in the new Teams desktop client yet (November 2023), the experience can get a bit frustrating. 

In my case, I have a custom React/TypeScript based tab which is using the @microsoft/teams-js library to interact with Teams. 

Since Teams tabs are just HTML pages, we need to make sure that the page is being loaded inside Teams before continuing to execute the code. To do that, we can use the context.app.host.name property and check that the value is "teams" before moving ahead.

However, with the new desktop client, my tab was failing to load. After a bit of digging around, I realised that the new Teams desktop client reports an entirely different host name value, "teamsModern", as mentioned here: https://learn.microsoft.com/en-us/javascript/api/%40microsoft/teams-js/hostname?view=msteams-client-js-latest

So changing my code to include the new value as well worked!
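For reference, a minimal sketch of the check using the teams-js v2 API (the surrounding initialization is simplified):

```typescript
import { app, HostName } from "@microsoft/teams-js";

await app.initialize();
const context: app.Context = await app.getContext();

// The classic Teams clients report "teams", but the new desktop client
// reports "teamsModern", so check for both before running Teams-specific code.
if (context.app.host.name === HostName.teams ||
    context.app.host.name === HostName.teamsModern) {
  // We are running inside Teams; safe to continue loading the tab.
}
```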

Hope this saves you some debugging time!

Tuesday 24 October 2023

Connect an OpenAI chat bot to the internet using Bing Search API

In the previous post, we saw what OpenAI function calling is and how to use it to chat with your organization's user directory using Microsoft Graph. Please have a look at the article here: Chat with your user directory using OpenAI functions and Microsoft Graph

In this post, we will implement function calling for a very common scenario of augmenting the large language model's responses with data fetched from internet search.

Since the Large Language Model (LLM) was trained with data only up to a certain date, we cannot talk to it about events which happened after that date. To solve this, we will use OpenAI function calling to call out to the Bing Search API and then augment the LLM's responses with the data returned from the internet search.

This pattern is called Retrieval Augmented Generation or RAG. 


Now let's look at the code to see how to achieve this. In this code sample, I have used the following NuGet packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.6/

https://www.nuget.org/packages/Azure.Identity/1.10.2/

The very first thing we will look at is our function definition, which informs the model that it can call out to an external search API to find information:
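Here is a sketch of what that definition looks like with the beta.6 SDK (the function name and descriptions are illustrative):

```csharp
using System.Text.Json;
using Azure.AI.OpenAI;

var searchFunction = new FunctionDefinition
{
    Name = "search_bing",
    Description = "Searches the internet using the Bing Web Search API. " +
                  "Use this to find information about recent events.",
    Parameters = BinaryData.FromObjectAsJson(new
    {
        type = "object",
        properties = new
        {
            query = new
            {
                type = "string",
                description = "The search query to send to Bing"
            }
        },
        required = new[] { "query" }
    },
    new JsonSerializerOptions(JsonSerializerDefaults.Web))
};
```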

In this function definition, we are informing the LLM that if it needs to search the internet as part of providing a response, it can call this function. The model's reply will then contain the function name along with the relevant parameters.

Next, let's see how our orchestrator looks. I have added comments to each line where relevant:
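Below is a simplified sketch of that loop, assuming the SearchBingAsync and CallChatGPT helpers shown later in the post (the user question is just an example):

```csharp
using System.Text.Json;
using Azure.AI.OpenAI;

var chatHistory = new List<ChatMessage>
{
    new(ChatRole.User, "Who won the most recent Cricket World Cup?")
};

// First call: the model decides whether it needs the search function.
ChatChoice choice = await CallChatGPT(chatHistory);

if (choice.Message.FunctionCall != null)
{
    // Parse the arguments the model generated for our function.
    var args = JsonSerializer.Deserialize<Dictionary<string, string>>(
        choice.Message.FunctionCall.Arguments);

    // Call the Bing Web Search API with the model-generated query.
    string searchResults = await SearchBingAsync(args!["query"]);

    // Send the search results back to the model as a function message...
    chatHistory.Add(new ChatMessage(ChatRole.Function, searchResults)
    {
        Name = choice.Message.FunctionCall.Name
    });

    // ...and let it generate the final answer in natural language.
    choice = await CallChatGPT(chatHistory);
}

Console.WriteLine(choice.Message.Content);
```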

This code is responsible for handling the chat with OpenAI, calling the Bing API, and responding to the user based on the results of the internet search.

Next, let's have a look at the code which calls the Bing API based on the parameters provided by the LLM. Before executing this code, you will need to have created a Bing Web Search API resource in Azure. Here is more information on it: https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/overview

The Bing Web Search API key can be found in the "Keys and Endpoint" section on the Azure resource:


Here is the code for calling the Bing Search API:
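A sketch of it (here the key is read from an environment variable; the sample may configure it differently):

```csharp
using System.Text.Json;

static async Task<string> SearchBingAsync(string query)
{
    using var http = new HttpClient();

    // The key comes from the "Keys and Endpoint" section of the Azure resource.
    http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key",
        Environment.GetEnvironmentVariable("BING_SEARCH_KEY"));

    // Ask the Bing Web Search v7 REST API for the top 3 results.
    string url = $"https://api.bing.microsoft.com/v7.0/search?q={Uri.EscapeDataString(query)}&count=3";
    using JsonDocument doc = JsonDocument.Parse(await http.GetStringAsync(url));

    // Combine the text snippets of the results into a single string for the LLM.
    var snippets = doc.RootElement
        .GetProperty("webPages")
        .GetProperty("value")
        .EnumerateArray()
        .Select(result => result.GetProperty("snippet").GetString());

    return string.Join(Environment.NewLine, snippets);
}
```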

In this code, we are calling the Bing Web Search REST API to get results based on the search query created by the LLM. Once the top 3 results are fetched, we take the text snippets of those results, combine them and send them back to the LLM.

We are only using the search result snippets to keep this demo simple. In production, you would ideally take the URL of each search result and fetch the full content of the page.

Finally, let's have a look at our CallChatGPT function, which is responsible for talking to the OpenAI chat API:
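A sketch of that function against the beta.6 SDK surface (the endpoint, key and deployment name are placeholders, and searchFunction is the definition from earlier):

```csharp
using Azure;
using Azure.AI.OpenAI;

static async Task<ChatChoice> CallChatGPT(List<ChatMessage> chatHistory)
{
    var client = new OpenAIClient(
        new Uri("https://<your-resource>.openai.azure.com/"),
        new AzureKeyCredential("<your-key>"));

    var options = new ChatCompletionsOptions();
    options.Functions.Add(searchFunction); // the FunctionDefinition defined earlier

    // The system message steers the model towards using the search function for fresh data.
    options.Messages.Add(new ChatMessage(ChatRole.System,
        "You are a helpful assistant. Call the search_bing function when you need current information from the internet."));

    foreach (ChatMessage message in chatHistory)
    {
        options.Messages.Add(message);
    }

    Response<ChatCompletions> response = await client.GetChatCompletionsAsync("gpt-35-turbo", options);
    return response.Value.Choices[0];
}
```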
This code defines the OpenAI function which will be included in our Chat API calls. The user's question is sent to the Chat API to determine if the function needs to be called. The function is also called again after the response from the Bing Web Search API is fetched; at that point, the chat history contains the search results and the model uses them to generate an output in natural language.

This way, we can use OpenAI function calling together with the Bing Web Search API to connect our chat bot to the internet!