
Using SAP Data Quality Management, microservices for location data in SAP Data Intelligence Pipeline

I am happy to hear that many of you have signed up for SAP Data Quality Management, microservices for location data, to explore its address cleansing and geocoding capabilities on SAP Business Technology Platform (BTP).

Today I want to show how you can use these microservices for location data in SAP Data Intelligence. First, you may have noticed that the service goes by different names. In the SAP BTP Cockpit, the microservices for location data are listed as Data Quality Services. In SAP Data Intelligence, they appear as DQMm operators in the Modeler application: the DQMm Address Cleanse operator and the DQMm Reverse Geo operator. Both are used together with the DQMm Client operator, which is a specialization of the OpenAPI Client operator. These operators are available in SAP Data Intelligence Cloud as well as in SAP Data Intelligence on-premise.

DQMm Operators Overview

After you log in to SAP Data Intelligence, open the Modeler application. In the Graphs tab you can browse sample graphs that use the DQMm operators, and in the Operators tab you can find the DQMm operators themselves. They are currently available as Generation 1 operators.

Let’s open the configuration of the DQMm Address Cleanse operator. You can adjust the settings according to your preference.

Now open the DQMm Client configuration. Set the connection properties using your OAuth client credentials:

  • Host: [url]:443
  • oauth2TokenUrl: [url]/oauth/token?grant_type=client_credentials
  • oauth2ClientId: [clientid]
  • oauth2ClientSecret: [clientsecret]

To find these values, go to the SAP BTP Cockpit where you subscribed to Data Quality Services and open the service key you created. Its credentials contain the values for these properties.
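As a rough sketch of how the service key fields map onto the DQMm Client properties listed above, the following plain JavaScript function builds the four values from a parsed service key. The service key layout used here (a top-level `url` for the service and a `uaa` object with `url`, `clientid` and `clientsecret`) is an assumption, as is the helper name `toDqmmClientProperties`; compare the field names against the service key you actually created before relying on it.

```javascript
// Hypothetical helper: maps a parsed BTP service key to the DQMm Client
// connection properties. ASSUMPTION: the key has a top-level "url" for the
// service and a "uaa" object holding the OAuth "url", "clientid" and
// "clientsecret".
function toDqmmClientProperties(serviceKey) {
  var uaa = serviceKey.uaa;
  return {
    host: serviceKey.url + ":443",
    oauth2TokenUrl: uaa.url + "/oauth/token?grant_type=client_credentials",
    oauth2ClientId: uaa.clientid,
    oauth2ClientSecret: uaa.clientsecret
  };
}

// Example with placeholder values (not real endpoints or credentials).
var props = toDqmmClientProperties({
  url: "https://dqmm.example.com",
  uaa: {
    url: "https://auth.example.com",
    clientid: "my-client-id",
    clientsecret: "my-client-secret"
  }
});
console.log(props.oauth2TokenUrl);
// prints https://auth.example.com/oauth/token?grant_type=client_credentials
```

Keeping the mapping in one place like this makes it easy to see which value from the service key goes into which property when you fill in the DQMm Client configuration by hand.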

Now that you have configured the DQMm operators, run the graph. You can check its status to see whether it completed successfully.

At the time of writing, the graph template still uses the deprecated Write File operator (com.sap.storage.write). You may want to replace it with the current Write File operator (com.sap.file.write): delete the old operator and add the new one. When you connect the DQMm Client operator to the new Write File operator, you are offered converter options; select the first one, the ToFile converter, which converts the message into a form the Write File operator can write.

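Set Path to the location of the output file, and change Mode to Append. Because the DQMm Client operator emits one message per record, Append ensures that all records end up in the file instead of each write replacing the previous one.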

Examine the DQMm output

Let’s open the System Management application to look at the generated output.

If you look at the data, you can see that the attributes and the body of each message are concatenated, which does not form valid JSON.
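To see why such a file cannot be read as JSON, here is a small plain JavaScript illustration. For simplicity it assumes the file contains two JSON documents written back to back; the exact layout written by the ToFile converter may differ, and the record values are made up. A single `JSON.parse` call rejects the concatenated text, while the same records wrapped in one array parse fine.

```javascript
// Returns true if the text parses as a single JSON document.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Two records written back to back (illustrative values only).
var concatenated = '{"id":"1","city":"Walldorf"}{"id":"2","city":"Berlin"}';
// The same two records wrapped in one JSON array.
var wrapped = '[{"id":"1","city":"Walldorf"},{"id":"2","city":"Berlin"}]';

console.log(isValidJson(concatenated)); // false
console.log(isValidJson(wrapped));      // true
```

Wrapping each record in an array is exactly what gives the Format Converter something it can parse, which is the idea behind the formatting step that follows.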

Format the DQMm output

To convert the output of the DQMm Client operator to valid JSON, let’s use the Format Converter operator. First, we need a few operators to prepare the data. Add a JavaScript operator with two ports of type message, input and output, and add JavaScript code that creates a message body containing the data you want to output. You can use the following code snippet and adjust it as needed.

$.setPortCallback("input", onInput);

// Convert a byte array into a string.
function bin2String(array) {
  var result = "";
  for (var i = 0; i < array.length; i++) {
    result += String.fromCharCode(array[i]);
  }
  return result;
}


// Input data handler.
function onInput(ctx, s) {
    
    // Retrieve the HTTP status code to see if the
    // request was successful.
    var status = s.Attributes["openapi.status_code"];
    var id = s.Attributes["message.request.id"];
    var body = bin2String(s.Body);

    // If the request was successful then convert
    // the JSON into a format that the Format Converter
    // operator will understand and output to the
    // output port.
    if (status === "200") {
        
        // Convert the body to a JSON object.
        var json = JSON.parse(body);
        
        // Add the id if one is present. A missing attribute is
        // undefined rather than null, so check for both.
        if (id !== undefined && id !== null) {
            json.id = id;
        }
        
        // Wrap the JSON in an array for the Format Converter
        // operator.
        s.Body = [json];
        
        // Output the new message.
        $.output(s);
    }
    else {
        $.log("Error processing record with id " + id, $.logSeverity.ERROR, status, body);
    }
}
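The `bin2String` helper maps each byte to one character. That works for ASCII, but addresses often contain characters such as ü or ß, which UTF-8 encodes as two or more bytes and which would come out garbled. Assuming the response body is UTF-8 encoded, a replacement could decode the byte sequences explicitly. The name `utf8BytesToString` is hypothetical, the code sticks to ES5 syntax in case the operator's JavaScript engine lacks newer APIs such as `TextDecoder`, and it is a sketch that does not validate continuation bytes.

```javascript
// Hypothetical UTF-8-aware replacement for bin2String.
// ASSUMPTION: the byte array holds UTF-8 encoded text.
function utf8BytesToString(bytes) {
  var result = "";
  var i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    var codePoint;
    var extra; // number of continuation bytes that follow the lead byte
    if (b < 0x80) {                      // 0xxxxxxx: 1-byte sequence
      codePoint = b; extra = 0;
    } else if (b >= 0xC0 && b < 0xE0) {  // 110xxxxx: 2-byte sequence
      codePoint = b & 0x1F; extra = 1;
    } else if (b >= 0xE0 && b < 0xF0) {  // 1110xxxx: 3-byte sequence
      codePoint = b & 0x0F; extra = 2;
    } else if (b >= 0xF0 && b < 0xF8) {  // 11110xxx: 4-byte sequence
      codePoint = b & 0x07; extra = 3;
    } else {
      // Unexpected byte: emit the Unicode replacement character.
      result += "\uFFFD";
      i++;
      continue;
    }
    // Append the low 6 bits of each continuation byte (10xxxxxx).
    for (var k = 1; k <= extra; k++) {
      codePoint = (codePoint << 6) | (bytes[i + k] & 0x3F);
    }
    i += extra + 1;
    if (codePoint > 0xFFFF) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      codePoint -= 0x10000;
      result += String.fromCharCode(0xD800 + (codePoint >> 10),
                                    0xDC00 + (codePoint & 0x3FF));
    } else {
      result += String.fromCharCode(codePoint);
    }
  }
  return result;
}

// "München" encoded as UTF-8 bytes.
console.log(utf8BytesToString([0x4D, 0xC3, 0xBC, 0x6E, 0x63, 0x68, 0x65, 0x6E]));
// prints München
```

Inside the operator, you would call `utf8BytesToString(s.Body)` wherever the snippet calls `bin2String(s.Body)`, under the same assumption the original makes that `s.Body` can be indexed as a byte array.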

After the JavaScript operator, add a ToBlob converter, because the Format Converter expects input of type blob. Then connect the operators in this order: DQMm Client, JavaScript operator, ToBlob converter, Format Converter, and Write File.

Run the graph and check the output file in System Management again. The data should now be valid JSON.

Use of Configuration

In the DQMm Address Cleanse operator, you can choose the configuration source. If you choose service, you can specify the name of a configuration.

To view existing configurations or create your own, go to the Configuration UI.

The Configuration UI lists some predefined configurations, including simple address configuration samples and additional configurations for Business Suite applications. Select the one that best suits your application, copy it to create a new configuration, and customize the copy.

Example of reading from and writing to HANA Table

With the Format Converter, you can also convert data from JSON to CSV format, or vice versa. Here is an example that reads address data from a HANA table, processes it with the DQMm operators, and writes the result to another HANA table.

1. Read HANA Table – Read address data from a HANA table
2. ToString Converter
3. JavaScript Operator (input: string, output: string) – Convert the data to CSV format.

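$.setPortCallback("input", onInput);

// Create CSV data from the JSON array of rows produced by the
// Read HANA Table operator. Each row is expected to be an array
// of column values.
function onInput(ctx, s) {
    var rows = JSON.parse(s);
    var lines = [];

    for (var i = 0; i < rows.length; i++) {
        // For a flat array, toString() joins the values with commas.
        lines.push(rows[i].toString());
    }

    // Separate the rows with newlines (no trailing newline).
    $.output(lines.join("\n"));
}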

4. ToBlob Converter
5. Format Converter – Convert to JSON format
6. ToMessage Converter
7. DQMm Address Cleanse
8. DQMm Client
9. JavaScript Operator (input: message, output: message) – Same as in the previous example, to format the DQMm output
10. ToBlob Converter
11. Format Converter – Convert back to CSV format
12. SAP HANA Client – Write validated address data to a HANA table
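The CSV conversion in step 3 joins the column values with plain commas. Address data frequently contains commas itself (for example "Suite 100, Building B"), which would shift the columns. If the Format Converter in step 5 accepts quoted CSV fields (an assumption worth checking in your landscape), a quoting helper in the style of RFC 4180 could be used instead. `csvField` and `rowsToCsv` are hypothetical names, and the rows are assumed to be arrays of column values, as in step 3.

```javascript
// Quote a value if it contains a comma, double quote or line break
// (RFC 4180 style); embedded double quotes are doubled.
function csvField(value) {
  var text = (value === null || value === undefined) ? "" : String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Hypothetical helper: convert an array of rows (each an array of column
// values) into CSV text with one line per row and no trailing newline.
function rowsToCsv(rows) {
  var lines = [];
  for (var i = 0; i < rows.length; i++) {
    var fields = [];
    for (var j = 0; j < rows[i].length; j++) {
      fields.push(csvField(rows[i][j]));
    }
    lines.push(fields.join(","));
  }
  return lines.join("\n");
}

var csv = rowsToCsv([
  [1, "Dietmar-Hopp-Allee 16", "Walldorf"],
  [2, "Suite 100, Building B", null]
]);
console.log(csv);
// prints:
// 1,Dietmar-Hopp-Allee 16,Walldorf
// 2,"Suite 100, Building B",
```

In the JavaScript operator of step 3, the body of `onInput` would then reduce to something like `$.output(rowsToCsv(JSON.parse(s)));`.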

Example of consolidating output records

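When you run a graph with the DQMm operators, you may notice that the DQMm Client operator sends one HTTP request per record and passes each response to the next operator individually rather than as a collection of records. If you want to consolidate all records before sending them downstream, you can add code that waits until all requests have been processed and then combines the output records. Here is an example in which the 1:2 Multiplexer splits the input: one branch goes through the ToString converter to JavaScript Operator 1, which counts the input records, and the other goes through the DQMm operators. JavaScript Operator 2 receives the count on its inputtarget port and the DQMm responses on its inputcurrent port.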

1. Message Generator – Same as the one in the sample graph
2. 1:2 Multiplexer
3. ToString Converter
4. JavaScript Operator 1 (input: string, output: int64)

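$.setPortCallback("input", onInput);

// Send the number of records in the input JSON array to
// JavaScript Operator 2 so that it knows how many DQMm
// responses to wait for.
function onInput(ctx, s) {
    var json = JSON.parse(s);
    $.output(json.length);
}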

5. DQMm Address Cleanse
6. DQMm Client
7. JavaScript Operator 2 (inputtarget: int64, inputcurrent: message, output: message)

$.setPortCallback("inputtarget", onInputTarget);
$.setPortCallback("inputcurrent", onInputCurrent);

// Number of records expected (sent by JavaScript Operator 1)
// and number of DQMm responses received so far.
var target = -1;
var current = 0;

// Collected results of all DQMm requests.
var dqmmout = [];

// Convert a byte array into a string.
function bin2String(array) {
  var result = "";
  for (var i = 0; i < array.length; i++) {
    result += String.fromCharCode(array[i]);
  }
  return result;
}

// Turn one DQMm Client response message into a result object.
function processResults(s) {
    // Retrieve the HTTP status code to see if the
    // request was successful.
    var status = s.Attributes["openapi.status_code"];
    var id = s.Attributes["message.request.id"];
    var body = bin2String(s.Body);
    var result = {};

    if (status === "200") {
        // Convert the body to a JSON object.
        result = JSON.parse(body);
    }
    else {
        // Log the error, and store the response body in the result
        // explicitly instead of relying on the return value of $.log.
        $.log("Error processing record with id " + id, $.logSeverity.ERROR, status, body);
        result.error = body;
    }
    result.id = id;
    result.status = status;

    return result;
}

// Output the consolidated results once all expected responses have
// arrived, then reset the state for the next batch.
function flushIfComplete() {
    if (target >= 0 && current == target) {
        $.output(dqmmout);
        target = -1;
        current = 0;
        dqmmout = [];
    }
}

function onInputTarget(ctx, s) {
    target = s;
    flushIfComplete();
}

function onInputCurrent(ctx, s) {
    current++;
    dqmmout.push(processResults(s));
    flushIfComplete();
}

8. ToFile Converter
9. Write File
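The counting logic in JavaScript Operator 2 works like a barrier: it does not matter whether the expected count or the DQMm responses arrive first, because the consolidated list is only emitted once the number of responses matches the count. The plain JavaScript sketch below isolates that idea so it can be tested outside SAP Data Intelligence. `createCollector` is a hypothetical factory, and the `emit` callback merely stands in for `$.output`; nothing in it is an SAP API.

```javascript
// Hypothetical sketch of the consolidation logic in JavaScript Operator 2.
// `emit` stands in for $.output.
function createCollector(emit) {
  var target = -1;   // expected number of records, -1 until known
  var results = [];  // responses received so far

  // Emit and reset once all expected responses have arrived.
  function flushIfComplete() {
    if (target >= 0 && results.length === target) {
      emit(results);
      target = -1;
      results = [];
    }
  }

  return {
    // Called with the record count (the inputtarget port).
    setTarget: function (count) {
      target = count;
      flushIfComplete();
    },
    // Called once per DQMm Client response (the inputcurrent port).
    addResult: function (result) {
      results.push(result);
      flushIfComplete();
    }
  };
}

// Usage: here the responses arrive before the count is known.
var batches = [];
var collector = createCollector(function (records) { batches.push(records); });
collector.addResult({ id: "1" });
collector.addResult({ id: "2" });
collector.setTarget(2);
console.log(batches.length); // 1
```

Resetting the state after each flush means the same operator instance can consolidate several batches in one graph run, rather than only the first one.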