Use the TextRazor API to grab and clean the text from a URL

Retrieving the text from a URL and stripping all of its HTML tags is a recurring request on Intercom.

This easy, accessible tutorial shows you how to do exactly that in Voiceflow.

FIRST STEP, GETTING AN API KEY

First, we will create a free account on TextRazor to get an API key that we will then use to authenticate our API call.

TextRazor offers several plans and you can start with the free one.

After creating your account you should be redirected to this page.

Copy your API key somewhere because we will need it a little later.

If you are curious, you can study the TextRazor API documentation, which covers much more than retrieving text from a web page.

IMPORTING THE DEMO PROJECT IN VOICEFLOW

You are now ready to import our HTML Cleaner Demo project into your Voiceflow account. We will study it together in the rest of this tutorial.

Click here to import the project.

THE PROJECT

As you can see, nothing very complex here.

Start by clicking on the “Settings” set block and paste in the API key you retrieved earlier.

In this demo, we will retrieve the text from the New York Times article: Apollo 11 As They Shot It

Of course, you can replace the value of the URL variable with any address you want.

The following block will call the TextRazor API.


As you can see in the settings (and after studying the API documentation), we make a POST request to the endpoint api.textrazor.com.

You do not have to modify anything in the headers, but note that the API key is passed in the x-textrazor-key header through the {razorApiKey} variable you set in the previous set block. The Content-Type is application/x-www-form-urlencoded.

In the Body section, we have selected the Form Url-Encoded tab.
The url parameter (as its name suggests) contains the URL you passed in the {url} variable.
The two parameters listed after it tell TextRazor to return a cleaned version of the HTML page.
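
For readers who want to see the same request outside Voiceflow, here is a minimal Python sketch of the headers and form-encoded body described above. It only builds the request and does not send it. Several details are assumptions and not taken from the demo project: the https scheme and root path of the endpoint, the function name build_textrazor_request, and above all the names and values of the two cleanup parameters (cleanup.mode and cleanup.returnCleaned), which should be checked against the Integration block and the TextRazor documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: HTTPS and the root path; the tutorial only names the host.
TEXTRAZOR_ENDPOINT = "https://api.textrazor.com/"


def build_textrazor_request(api_key: str, page_url: str) -> Request:
    """Build (but do not send) the POST request described in the tutorial."""
    headers = {
        # The API key travels in this header ({razorApiKey} in Voiceflow).
        "x-textrazor-key": api_key,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    form_fields = {
        # The page to clean ({url} in Voiceflow).
        "url": page_url,
        # Assumed names/values for the two cleanup parameters;
        # verify them in the demo project before relying on them.
        "cleanup.mode": "cleanHTML",
        "cleanup.returnCleaned": "true",
    }
    # urlencode percent-escapes the values, as the Form Url-Encoded tab does.
    body = urlencode(form_fields).encode("ascii")
    return Request(TEXTRAZOR_ENDPOINT, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes it easy to inspect the headers and body first.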

For the Mapping output part of the Integration block, we map response.response.cleanedText to our variable {cleanedText} and response.ok to {status}.

The {cleanedText} variable will contain the cleaned text, and the {status} variable will be true or false depending on the result of the operation.
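
The mapping can be sketched in Python as follows. This assumes that the first "response" in the Voiceflow path refers to the response body itself, so that the JSON returned by TextRazor holds the text under response.cleanedText and the success flag under ok. The function name map_output and the sample body are hypothetical and are not real API output.

```python
import json
from typing import Optional, Tuple


def map_output(raw_body: str) -> Tuple[Optional[str], bool]:
    """Extract the two values the Integration block maps to variables."""
    payload = json.loads(raw_body)
    # Equivalent of response.response.cleanedText -> {cleanedText}
    cleaned_text = payload.get("response", {}).get("cleanedText")
    # Equivalent of response.ok -> {status}; a missing flag counts as failure.
    status = bool(payload.get("ok", False))
    return cleaned_text, status


# Hypothetical body with the assumed shape, for illustration only.
sample_body = '{"response": {"cleanedText": "Hello world"}, "ok": true}'
cleaned_text, status = map_output(sample_body)
```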

The last part checks that our request was successful and redirects to the corresponding speak block.
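
The same branching, written as a small Python sketch. The function name choose_speak and the wording of the error message are hypothetical, and the extra guard against empty text is an addition that the demo project may not include.

```python
from typing import Optional


def choose_speak(status: bool, cleaned_text: Optional[str]) -> str:
    """Return what the success or failure speak block would say."""
    # Success path: the request worked and some text came back.
    if status and cleaned_text:
        return cleaned_text
    # Failure path: {status} is false (or, as an extra guard, the text is empty).
    return "Sorry, I could not retrieve the text from that page."
```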

IT’S ALL GOOD

Finally, all you have to do is test it in Voiceflow, in the ADC (Alexa Developer Console), or directly on a device linked to your Amazon developer account.

Again, you can import the HTML Cleaner Demo project in Voiceflow from this link.