Process Automation with Azure Functions and TagUI

Written by raoulbia | Published 2023/02/02
Tech Story Tags: azure-functions | rpa-tools

In this article, I share what I learned from exploring how to integrate Azure Functions and TagUI in order to create a process automation proof-of-concept (POC) application. The application runs on a schedule, in the cloud, and saves its output to blob storage. The POC scrapes a table from a Wikipedia page, saves the table as a CSV file, and uploads the file to Azure Blob storage. There is no need to keep a machine running at home or at work, to kick the script off manually, or to set up a separate scheduler to execute it. Nor is there a need to use a proprietary web scraping tool or to code a solution from scratch.

Azure Functions is a cloud service available on-demand that provides the infrastructure and resources needed to run applications. TagUI is an open-source Robotic Process Automation (RPA) tool developed by AI Singapore and the community to help you rapidly automate your repetitive or time-critical tasks — use cases include process automation, data acquisition and testing of web apps.

The article covers three main development areas: Azure Functions-specific choices, the application dependencies, and the tweaks necessary to make the integration work. The function uses NodeJS, as this seemed to be the most appropriate choice given that TagUI is a native JavaScript application. The local development environment consists of Visual Studio Code (VSC), the VSC Azure Tools extension, npm and NodeJS.

Please note that the application described in this article is a proof-of-concept (POC) rather than a fully fledged NodeJS application. For this reason the code is kept to a minimum, with no error handling or similar software development best practices.

Azure Functions

Serverless vs. App Service Plan

Even though my preference would have been to implement the POC using the free, serverless Azure Functions Consumption Plan, it turned out that its ability to manipulate files and directories is limited. The Consumption Plan does not come with a data directory. The App Service Plan, on the other hand, allows for far greater freedom. It comes with a data directory (/home/data/) and it provides access to Advanced Tools (in the portal UI), which include a Bash shell to navigate the directories of the function instance. The downside, however, is that the App Service Plan is not supported in the Free and Shared tiers.

For this POC the following cloud architecture was used:

  • an Azure Resource Group with
  • a Storage Account (Standard, Locally-redundant, with hierarchical namespace)
  • a Function App (Code, Node.js 16 LTS, Linux OS) with App Service Plan
  • an Azure function of type Timer trigger to allow for automation based on a schedule
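
The schedule of a Timer trigger is declared in the function's function.json file as a six-field NCRONTAB expression, which Azure evaluates in UTC by default. As a rough sketch, the snippet below builds such a binding as a JavaScript object and prints it as JSON. The schedule value (every day at 06:00) is an assumption chosen for illustration, and countScheduleFields is a hypothetical helper, not part of any Azure library.

```javascript
// Timer trigger binding as it would appear in function.json.
// The schedule below (every day at 06:00:00) is an illustrative assumption.
const functionJson = {
    bindings: [
        {
            name: 'myTimer',          // the name used for the timer argument in this article
            type: 'timerTrigger',
            direction: 'in',
            schedule: '0 0 6 * * *'   // {second} {minute} {hour} {day} {month} {day-of-week}
        }
    ]
};

// Hypothetical sanity check: counts the fields of an NCRONTAB expression.
// Azure Functions expects six fields, one more than classic cron.
function countScheduleFields(schedule) {
    return schedule.trim().split(/\s+/).length;
}

console.log(JSON.stringify(functionJson, null, 2));
```

A five-field expression copied from a classic crontab is a common slip here, which is why the field count is worth checking before deploying.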

Anatomy of an Azure Function

The bare-bones code of such a function is shown below. Code can be placed inside or outside the async function. When placed outside the async function, the code instructions are executed once on startup. When placed inside the async function, instructions are executed on each request/trigger handled by the function. This distinction is noteworthy as it will guide where the various application code snippets will be placed.

// This code only runs at first startup of the function instance
console.log('Hello World');

module.exports = async function (context, myTimer) {

     //This code runs at each invocation/request
    var timeStamp = new Date().toISOString();
    if (myTimer.isPastDue)
    {
        context.log('JavaScript is running late!');
    }
    context.log('JavaScript timer trigger function ran!', timeStamp);   

};
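
To make the once-versus-every-invocation distinction concrete, here is a small local simulation. It is only a sketch: fakeContext is a stand-in for the context object that Azure passes in, and handler and simulate are hypothetical names used for illustration.

```javascript
// Module scope: evaluated once, when the function instance loads this file.
let coldStarts = 0;
coldStarts += 1;

// Handler scope: evaluated on every trigger.
let invocations = 0;
async function handler(context, myTimer) {
    invocations += 1;
    context.log(`cold starts: ${coldStarts}, invocations: ${invocations}`);
    return { coldStarts, invocations };
}

// Stand-in for the Azure context object (illustration only).
const fakeContext = { log: (...args) => console.log(...args) };

// Simulates two timer triggers hitting the same loaded instance.
async function simulate() {
    await handler(fakeContext, { isPastDue: false });
    return handler(fakeContext, { isPastDue: false });
}
```

Calling simulate() increments the invocation counter twice while the module-scope counter stays at one. This is the behaviour that makes module scope the natural place for one-off setup such as installing dependencies.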

Dependencies

The POC Azure function has three core dependencies: TagUI, Azure-Storage and ShellJS, which is used to execute shell commands. These dependencies are defined in the file package.json as shown below. To install the dependencies, simply run npm install from the command line. NPM will create a new directory named node_modules to hold all the installation files, including the TagUI source files. Note that npm install will raise security warnings. For the purposes of this POC they will be ignored. The warnings relate to casper.js, a JavaScript utility that is installed as part of the TagUI npm package.

{
  "name": "tagui-azfunc-nodejs",
  "version": "1.0.0",
  "description": "",
  "scripts": {
    "start": "func start",
    "test": "echo \"No tests yet...\""
  },
  "dependencies": {
    "tagui": "^5.0.0",
    "shelljs": "^0.8.5",
    "azure-storage": "^2.10.7"
  }
}

TagUI

Adapting the Source Code

TagUI was not designed to run on a third-party platform. In order to integrate TagUI with Azure Functions, some modifications to the TagUI source code need to be made. The path to the source code file is node_modules/tagui/src/tagui.

  • Change the shebang line at the top of the script from #!/usr/bin/env bash to #!/usr/bin/env /bin/bash
  • Set the OpenSSL configuration environment variable: add export OPENSSL_CONF=/etc/ssl/ to the script (e.g. to line 6)
  • In line 83, change tagui_web_browser="phantomjs" to tagui_web_browser="google-chrome"

My knowledge of the inner workings of Azure Functions is limited at this moment, and the changes listed above reflect the solution that worked for me. Other approaches might be possible. I identified these changes through trial and error, and with the help of Google searches and Stack Overflow. Changing the shebang line is needed to accommodate the default file structure of the VM hosting the Azure Function. The default browser used by TagUI is PhantomJS, a headless web browser. I changed it to Chrome to use a browser I am familiar with. It may be possible to use the default TagUI browser. For the remaining modification, I can only make an educated guess: the OpenSSL environment variable must be set in order for TagUI to interact with the target website.
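
Since npm install recreates node_modules, it may be convenient to script the three edits instead of applying them by hand. The sketch below works on the text of the launcher script. patchTaguiLauncher is a hypothetical helper name, and placing the export line directly after the shebang (rather than at line 6) is a simplifying assumption.

```javascript
// Hypothetical helper: applies the three edits to the text of the TagUI launcher.
function patchTaguiLauncher(source) {
    const lines = source.split('\n');

    // 1. Point the shebang at /bin/bash.
    if (lines[0].startsWith('#!')) {
        lines[0] = '#!/usr/bin/env /bin/bash';
    }

    // 2. Add the OpenSSL setting right after the shebang, unless it is already there.
    const opensslLine = 'export OPENSSL_CONF=/etc/ssl/';
    if (!lines.includes(opensslLine)) {
        lines.splice(1, 0, opensslLine);
    }

    // 3. Swap the default browser (only the first occurrence is replaced).
    return lines
        .join('\n')
        .replace('tagui_web_browser="phantomjs"', 'tagui_web_browser="google-chrome"');
}
```

The function could be fed the contents of node_modules/tagui/src/tagui via fs.readFileSync and its result written back with fs.writeFileSync. Running it a second time leaves an already patched file unchanged.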

Dependencies

A key obstacle to integrating TagUI with Azure Functions is its dependencies. TagUI requires PHP, Python and a web browser, none of which are native to Azure Functions. It turns out that the npm package ShellJS can be leveraged to execute shell commands and thereby install those dependencies. All that needs to be done is to wrap each command inside a shell exec statement, e.g. shell.exec('apt-get …'). The dependencies of the POC are installed by adding the code below to the index.js file. Note how these instructions are above (i.e. outside) the async function to ensure that they are executed only once, on startup.

const shell = require('shelljs');

// PHP
shell.exec('apt-get install php -y');

// Python (-y answers the confirmation prompt, as for PHP above)
shell.exec('apt-get install python3 -y');
shell.exec('ln -s /usr/bin/python3.9 /bin/python');

// Chrome
shell.exec('wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb');
shell.exec('sudo dpkg -i google-chrome-stable_current_amd64.deb');

module.exports = async function (context, myTimer) { 
… 
}
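
The POC ignores the outcome of each install command. As a sketch of how failures could be surfaced, the helper below runs commands in order and stops at the first non-zero exit code. It uses Node's built-in child_process module instead of ShellJS so that it runs without extra packages; runAll is a hypothetical name. With ShellJS, the object returned by shell.exec exposes a similar code property.

```javascript
const { spawnSync } = require('child_process');

// Runs each shell command in order and stops at the first non-zero exit code.
// Returns one record per command that was actually started.
function runAll(commands) {
    const results = [];
    for (const command of commands) {
        const proc = spawnSync(command, { shell: true, encoding: 'utf8' });
        results.push({ command, code: proc.status, stderr: proc.stderr });
        if (proc.status !== 0) {
            break; // later commands usually depend on the earlier ones
        }
    }
    return results;
}
```

The last entry of the returned array tells which command failed, if any, and its stderr text can be passed to context.log for inspection in the portal.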


The TagUI script

The TagUI script used for this POC is shown below. In two lines of code, an HTML table is scraped from a Wikipedia page and saved as a CSV file. The script should be given a .tagui extension, e.g. wiki.tagui, and stored in a custom directory, e.g. scripts.

// visit website
https://en.wikipedia.org/wiki/Instrumental_temperature_record

// save table
table (//table)[1] to /home/data/filename.csv

The TagUI script is executed by the async function as shown below. The pattern to execute a TagUI script is path-to-tagui-executable, followed by path-to-tagui-script. Note the -h flag to indicate that TagUI should execute in headless mode, i.e. without physically launching a browser.

module.exports = async function (context, myTimer) {  
       
    // execute TagUI script
    var tg = shell.exec('/home/data/node_modules/tagui/src/tagui /home/data/scripts/wiki.tagui -h');
    // see the TagUI logs
    context.log(tg);
    ...
};
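
The command string above can also be assembled from its parts, and the result of a run can be checked before anything is uploaded. The following is a sketch under the directory layout used in this article; buildTaguiCommand and outputLooksValid are hypothetical helper names.

```javascript
const fs = require('fs');
const path = require('path');

// Builds "<path-to-tagui-executable> <path-to-tagui-script> -h" for a data directory.
function buildTaguiCommand(dataDir, scriptName) {
    const executable = path.posix.join(dataDir, 'node_modules', 'tagui', 'src', 'tagui');
    const script = path.posix.join(dataDir, 'scripts', scriptName);
    return `${executable} ${script} -h`;
}

// Returns true only if the expected output file exists and is not empty.
function outputLooksValid(csvPath) {
    return fs.existsSync(csvPath) && fs.statSync(csvPath).size > 0;
}
```

For example, buildTaguiCommand('/home/data', 'wiki.tagui') yields the same string that is passed to shell.exec above, and outputLooksValid('/home/data/filename.csv') can guard the upload step.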

The Data Directory

The primary reason for developing this application using the App Service Plan option is that it provides access to a data directory from where TagUI can be executed. The two directories that are required for executing the application are scripts, where the TagUI script resides, and node_modules, where the TagUI executable resides. The code below shows how to copy the directories recursively to /home/data. The copy instructions are above (i.e. outside) the async function to ensure that they are executed only once, on startup.

shell.exec('cp -ar ./scripts/ /home/data/');
shell.exec('cp -ar ./node_modules/ /home/data/');
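
As an alternative to shelling out to cp, Node's own fs.cpSync (available from Node 16.7, where it is still marked experimental) can do the recursive copy. The sketch below also skips the copy when the target already exists, which is an optimisation the POC does not use; copyDirOnce is a hypothetical name.

```javascript
const fs = require('fs');
const path = require('path');

// Copies sourceDir into targetRoot unless a directory of that name is already there.
// Returns true if a copy was made, false if it was skipped.
function copyDirOnce(sourceDir, targetRoot) {
    const target = path.join(targetRoot, path.basename(sourceDir));
    if (fs.existsSync(target)) {
        return false;
    }
    fs.cpSync(sourceDir, target, { recursive: true });
    return true;
}
```

Skipping an existing target saves time on restarts, since node_modules is large, but it also means a redeployed script would not replace the old copy. Unlike this sketch, cp -ar overwrites existing files.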


Upload to Blob storage

The upload of the CSV file to blob storage can be achieved using the code below. The function createBlobService() will look for credentials. These can be provided in the form of the connection string for the Storage Account. The connection string should be saved in the Function App's Configuration section under Settings. Create a new application setting named AZURE_STORAGE_CONNECTION_STRING to store the connection string.

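    // azure refers to the azure-storage module, e.g. const azure = require('azure-storage');
    // createBlobService() is synchronous and reads AZURE_STORAGE_CONNECTION_STRING from the environment
    var blobSvc = azure.createBlobService();
    // destination_filename holds the name the blob should get in the container
    blobSvc.createBlockBlobFromLocalFile('container_name', destination_filename, '/home/data/filename.csv', function(error, result, response){
        if(!error){
            context.log('File successfully uploaded to Blob Storage');
            // remove the file after upload
            shell.exec('rm -r /home/data/filename.csv');
        }
    });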
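
Because the upload reports its result through a callback, the async function may return before the upload has finished. One way to handle this is to wrap the call in a Promise and await it. The sketch below takes the blob service as a parameter so that it can be tried without Azure; uploadFile and blobNameFor are hypothetical helpers, and the date-based naming scheme is an assumption, not something the POC does.

```javascript
// Wraps the callback-style upload in a Promise so it can be awaited.
// blobSvc is expected to be the object returned by azure.createBlobService().
function uploadFile(blobSvc, containerName, blobName, localPath) {
    return new Promise((resolve, reject) => {
        blobSvc.createBlockBlobFromLocalFile(containerName, blobName, localPath, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

// Hypothetical naming scheme: "<prefix>-YYYY-MM-DD.csv", using the UTC date.
function blobNameFor(prefix, date) {
    return `${prefix}-${date.toISOString().slice(0, 10)}.csv`;
}
```

Inside the async function this could be used as await uploadFile(blobSvc, 'container_name', blobNameFor('wiki-table', new Date()), '/home/data/filename.csv'), with the local file removed only after the await has completed.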

Publishing and Testing

The process for deploying the application to Azure Functions consists of two steps. First, the application is deployed and built in its non-customised state. Second, the custom TagUI code is pushed to the Azure cloud. The first deployment is done by right-clicking on the code directory and selecting Deploy to Function App. This will push the files to the Function App and perform the npm install of the dependencies. Once completed, the custom TagUI source code can be pushed using the following command:

func azure functionapp publish <Function App name>

To test the application, click on Functions in the Azure Function App, open the function and click on Code + Test. The screenshot below shows the output produced by the TagUI script during execution. To view the TagUI output, store the result of the TagUI execution call in a variable (e.g. tg) and print the variable to the Test/Run console with context.log(tg).

Final Remarks

There are many tried and tested ways of automating RPA tasks. However, to the best of my knowledge, the integration of TagUI and Azure Functions has not been done before. I spent a good few hours trying to get this integration to work and was quite satisfied to have been able to develop a working POC. The proposed solution has, I believe, very little overhead and is easy to implement. TagUI is a powerful open-source RPA tool that can, with a little programming experience, be programmed to automate a great deal of mundane web-based tasks. Azure Functions provides a cost-effective cloud environment for hosting a TagUI RPA solution.

I actually set out to get this integration working in order to learn more about Azure Functions. My verdict is that the learning curve for Azure Functions is steep. There are a lot of options to choose from, testing is somewhat cumbersome, and the serverless offering is rather limited for non-standard projects. That said, once you get the hang of them, Azure Functions are indeed quite useful and versatile. I will certainly explore them further.

The repo with the code for this proof-of-concept integration of TagUI and Azure Functions can be found here: https://github.com/raoulbia/azure-function-node-tagui-poc.


Written by raoulbia | Microsoft Certified Azure Data Engineer. MSc. Computer Science
Published by HackerNoon on 2023/02/02