Content Moderator Cognitive Service - Moderating Profanity
Microsoft provides a ton of artificial intelligence services through Azure that you can easily implement anywhere, and in this post I will be talking about just one of them: the Content Moderator service.
In fact, the Content Moderator service is quite large, so it may be more accurate to say that I will only be explaining one of the many endpoints of one of the many Cognitive Services Microsoft offers.
What the Content Moderator service does is, as its name suggests, provide you with information that helps you automate moderation decisions. For example, you may want to hide images that are sexually explicit or that contain some other sort of adult content. In the example we will be working with, we will be hiding profanity (curse words) from a "tweet" (you know, for our imaginary Twitter for Kids app).
Creating the service
First things first: you will, of course, need an Azure subscription. If you don't have one, go to azure.com and get one first, because I will jump straight into creating the service.
To create the service, simply search for "Content Moderator" when creating a new resource from your Azure Portal. You should find the service; select it and hit Create.
While creating it you will need to provide the usual information required for any Azure service: a name, a resource group, a location, which of your subscriptions you want to use, and, most importantly, the pricing tier.
As with most Cognitive Services, there is a free tier and a standard tier. They mainly differ in that the standard tier allows for more calls per minute and has no overall limit (as long as you pay).
The free tier will be more than enough to run a couple of tests, but keep in mind the restrictions that come with it: you won't be able to make that many requests per minute, and once you reach the limit the service will start returning errors.
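We will make the actual request further down, but here is a minimal sketch of how you might cope with those throttling errors once we get there. I am assuming the service signals throttling with the standard 429 status code; the exact behavior may vary by tier, so treat this as a sketch rather than the definitive way to do it:

import http.client, json, time

def screen_text(service_url, endpoint, body, headers, retries=3):
    # Send the request and retry with a short pause if the service throttles us.
    for attempt in range(retries):
        conn = http.client.HTTPSConnection(service_url)
        conn.request('POST', endpoint, body, headers)
        response = conn.getresponse()
        data = response.read()
        conn.close()
        if response.status == 429:  # assumed throttling status code on the free tier
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        return json.loads(data)
    raise RuntimeError('Still throttled after %d attempts' % retries)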
Once your service is up and running, you can select it from your dashboard or services list to find the keys that were created along with it. Under the Keys tab you will find two keys that identify your service and allow you to make requests. Without one of these keys, your requests will simply come back with an authentication error.
Make sure to keep these keys private, and if you ever think they have been exposed (as I often do in my Udemy courses), you can always regenerate them from this same tab. Just keep in mind that once you regenerate them, the old keys won't work anymore, so you will have to update the keys you are using in your apps to the new ones.
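One simple way to keep them out of your code (and out of your Git history) is to read them from an environment variable instead of hard-coding them. A minimal sketch, where CONTENT_MODERATOR_KEY is just a variable name I made up:

import os

# Read the key from the environment instead of hard-coding it in the script.
# CONTENT_MODERATOR_KEY is an arbitrary name; set it however your OS expects,
# e.g. `export CONTENT_MODERATOR_KEY=...` on Linux/macOS.
subscription_key = os.environ['CONTENT_MODERATOR_KEY']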
But enough chit-chat, let's get coding.
Making the request
I am going to be using Python to make the request, but since this is simply a REST request, you can perform it in whatever language you like. I will first need the headers for the request. For this example we are going to be sending plain text, so the Content-Type will be set to that, but you could also send HTML, XML, and other formats:
import urllib.parse, http.client, json

headers = {
    # One of the two keys from the Keys tab of your Content Moderator service
    'Ocp-Apim-Subscription-Key': 'YOUR_KEY',
    # We will be sending plain text; this could also be text/html, text/xml, etc.
    'Content-Type': 'text/plain'
}
Notice that I also imported a few packages to be able to make the request and create the parameters. The parameters, which I am URL-encoding right here, allow me to ask for some additional information beyond the default (which already identifies profanity). As an example, I will ask for Personally Identifiable Information by setting the PII value to true in these parameters:
params = urllib.parse.urlencode({ 'PII':True })
Finally, I will create the body of the request (plain text, as stated in the headers), along with a couple of variables that contain the host of the request and the full endpoint:
body = 'Crap! I just bumped my toe in the corner of this stupid chair, that chair is stupid. That is a really stupid chair. I hate the stupid chair. That chair is a fucking pussy.'

service_url = '[YOUR_LOCATION].api.cognitive.microsoft.com'
endpoint = '/contentmoderator/moderate/v1.0/ProcessText/Screen?%s' % params
Notice that for the service URL you must know the location where you created your service. In my case, since I created it in South Central US, this would be 'southcentralus'. You can double-check this from your Azure portal by selecting the service and navigating to the Overview tab. There you will find not only the Location itself, but also the Endpoint, which will look familiar (but incomplete) compared to the one we are using, because it only points at the service itself and does not include the Screen endpoint we are calling in this example.
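In other words, if your service lives in South Central US like mine, the two variables above would look roughly like this; the location string is the only part you need to change (this is just a restatement of the previous snippet, reusing the params from earlier):

location = 'southcentralus'  # the region you picked when creating the service
service_url = '%s.api.cognitive.microsoft.com' % location
# The Endpoint shown in the Overview tab only covers the service root;
# we append the ProcessText/Screen part (and the parameters) ourselves.
endpoint = '/contentmoderator/moderate/v1.0/ProcessText/Screen?%s' % params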
After this, I will simply execute a POST request with the information that I have defined:
try:
    # Open the connection and send the POST request with our headers and body
    conn = http.client.HTTPSConnection(service_url)
    conn.request('POST', endpoint, body, headers)
    response = conn.getresponse()
    jsonData = response.read()
    # Parse the JSON response and pretty-print it
    text_data = json.loads(jsonData)
    print(json.dumps(text_data, indent=2))
    conn.close()
except Exception as ex:
    print(ex)
With this request I am simply printing the JSON response; let's take a look at it:
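Your exact output will depend on your text and the service version, but a trimmed, illustrative response looks roughly like this (I have shortened it and the indexes and values are only an example, so check your own printout):

{
  "OriginalText": "Crap! I just bumped my toe in the corner of this stupid chair, ...",
  "NormalizedText": "...",
  "Language": "eng",
  "Terms": [
    { "Index": 0, "OriginalIndex": 0, "ListId": 0, "Term": "crap" },
    { "Index": 148, "OriginalIndex": 148, "ListId": 0, "Term": "fucking" }
  ],
  "PII": {
    "Email": [],
    "IPA": [],
    "Phone": [],
    "Address": []
  },
  "Status": { "Code": 3000, "Description": "OK" },
  "TrackingId": "..."
}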
A couple of things to notice in here:
- Because I set the PII value in the parameters to True, I am receiving a PII object with several arrays: one for the emails identified in the body, one for the IP addresses (IPA), one for the phone numbers, etc. Now, it is not identifying any of that here, but it could be important that our Twitter for Kids app does not publish our users' private information, so you could use that object to hide that information too (see the sketch after this list).
- The Terms object (an array) contains a list of all the curse words that were found in the body we sent. Because we don't want our users' sensibilities (they are kids) to be offended, we will moderate these terms out when publishing the "tweet".
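As a quick aside, here is a minimal sketch of how you might mask the PII hits as well. I am assuming each entry in those arrays carries a Text and an Index field, as in the API's sample responses, so double-check the field names against your own output:

def mask_pii(text, pii):
    # Walk every PII category (Email, IPA, Phone, Address, ...) and
    # replace each detected value with asterisks of the same length.
    for category in pii.values():
        for hit in category:
            detected = hit['Text']  # assumed field name for the detected value
            text = text.replace(detected, '*' * len(detected))
    return text

# Example usage with the parsed response from before:
# safe_text = mask_pii(text_data['OriginalText'], text_data.get('PII', {}))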
To moderate the curse words I can execute something like this, which will replace the identified curse words in the original text with asterisks:
moderatedText = text_data['OriginalText']
for term in text_data['Terms']:
    # Grab the flagged word from the text using its index and length...
    old_text = moderatedText[term['Index']:term['Index'] + len(term['Term'])]
    # ...and build an equally long run of asterisks to replace it with
    new_text = '*' * len(old_text)
    moderatedText = moderatedText.replace(old_text, new_text)
print(moderatedText)
Easy, right? Simply get the original text value (from the JSON) through the text_data variable, iterate through all the terms in the Terms value (also from the JSON), get the old text by taking a substring of the moderated text (starting at the index of the term and ending at that index plus the length of the term itself), and create a new text that is all asterisks and exactly as long as the old text.
So now my moderated text looks like this, and it can safely be sent to my Twitter for Kids app!
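For reference, assuming the service flagged 'crap', 'fucking', and 'pussy' in my sample body (the exact term list may differ when you run it), the printout would look like this:

****! I just bumped my toe in the corner of this stupid chair, that chair is stupid. That is a really stupid chair. I hate the stupid chair. That chair is a ******* *****.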