SAP Data Intelligence

SAP Data Intelligence – What’s New in DI:2022/05

SAP Data Intelligence, cloud edition DI:2022/05 will soon be available.

Within this blog post, you will find updates on the latest enhancements in DI:2022/05. We want to share and describe the new functions and features of SAP Data Intelligence for the Q2 2022 release.

Overview

This section gives you a quick preview of the main developments in each topic area. The sections that follow describe each topic area in detail.

SAP Data Intelligence 2022/05

Metadata & Governance

In this topic area you will find all features dealing with discovering metadata and working with it, as well as data preparation functionality. Some information about newly supported systems appears in more than one topic area. This is intentional: readers who look at only one area do not miss it, and each area may add details specific to its topic.

Validation Rules operator integrates with Metadata Explorer

USE CASE DESCRIPTION:

  • Ability for a Modeler to build a pipeline graph that reuses trusted Metadata Explorer’s validation and quality rules
  • Execution of rule validation from pipeline and reuse rules within rule operator

BUSINESS VALUE – BENEFITS:

  • Validation and quality rules created and defined by a subject matter expert in Metadata Explorer’s rulebooks can be reused by a Modeler in a pipeline
  • Ability to run rulebooks in a pipeline and send passed and failed records to their respective targets
    • Allows a subject matter expert to ‘fix’ failed records to improve the quality of the data
  • Collaboration between data stewards / subject matter experts and modelers / developers
  • Quick use of existing rules in a pipeline without having to create them from scratch
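The pass/fail routing described above can be pictured with a minimal Python sketch. It is not the operator's API: the rule names, the record fields, and the `split_records` helper are all hypothetical and only illustrate the idea of sending passed and failed records to separate targets.

```python
from typing import Callable, Dict, List, Tuple

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical rules, standing in for rules defined in a rulebook.
RULES: Dict[str, Rule] = {
    "customer_id_not_empty": lambda r: bool(r.get("customer_id")),
    "country_is_two_letters": lambda r: isinstance(r.get("country"), str)
    and len(r["country"]) == 2,
}


def split_records(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Record]]:
    """Return (passed, failed); failed records carry the names of the rules they broke."""
    passed, failed = [], []
    for record in records:
        broken = [name for name, rule in rules.items() if not rule(record)]
        if broken:
            failed.append({**record, "failed_rules": broken})
        else:
            passed.append(record)
    return passed, failed


records = [
    {"customer_id": "C1", "country": "DE"},
    {"customer_id": "", "country": "DE"},
    {"customer_id": "C3", "country": "Germany"},
]
passed, failed = split_records(records, RULES)
```

Keeping the names of the broken rules on each failed record is one way to give a subject matter expert enough context to fix the data later.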

Public APIs for metadata exchange

USE CASE DESCRIPTION:

  • Ability to export Metadata Explorer information, including:
    • Lineage information of datasets, including relations with other datasets
    • Used transformations and computations
    • Schema information
    • Profiling data
    • User descriptions

BUSINESS VALUE – BENEFITS:

  • Ability to consume and use exported information in reporting tools for:
    • Analysis
    • Creating plot graphs to visualize lineage information based on organizational needs and requirements
    • Reusing descriptions and annotations
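As an illustration of consuming exported lineage information, the sketch below finds every dataset that feeds, directly or indirectly, into a given dataset. The layout of `lineage_edges` (one `source`/`target`/`transformation` entry per relation) and the dataset names are assumptions made for this example; the actual API response format is not described in this post and may differ.

```python
from collections import defaultdict

# Hypothetical export: one entry per lineage relation.
# The real API response layout may differ.
lineage_edges = [
    {"source": "raw_orders", "target": "clean_orders", "transformation": "filter"},
    {"source": "raw_customers", "target": "clean_customers", "transformation": "deduplicate"},
    {"source": "clean_orders", "target": "sales_report", "transformation": "join"},
    {"source": "clean_customers", "target": "sales_report", "transformation": "join"},
]


def upstream_datasets(edges, dataset):
    """Return the set of all datasets that feed, directly or indirectly, into `dataset`."""
    parents = defaultdict(set)
    for edge in edges:
        parents[edge["target"]].add(edge["source"])

    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


sources_of_report = upstream_datasets(lineage_edges, "sales_report")
```

A result like this could then be handed to a reporting or plotting tool to visualize the lineage.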

Add Rules – Add Publishing – Add Connectivity within Metadata Explorer

BUSINESS VALUE – BENEFITS:

  • Expanded functionality support for sources

* New with DI:2022/05

Connectivity & Integration

This topic area focuses mainly on all kinds of connection and integration capabilities which are used across the product – for example: in the Metadata Explorer or on operator level in the Pipeline Modeler.

Connectivity to Teradata

You can now create a connection of type “TERADATA” in Connection Management. It can be used in Metadata Explorer and as a data source for extraction use cases in pipelines.

  • Supported version: 17.x
  • Support via SAP Cloud Connector

Supported qualities:

  • Metadata Explorer
    • browsing
    • show metadata
    • data preview (tables)
  • Data Extraction via Generation 2 Pipelines
    • Table Consumer
    • SQL Consumer
    • SQL Executor

Support of Google Cloud Storage (GCS) as target in Replication Flows

When creating a Replication Flow, you can now write data as files to GCS as a target, using the following properties:

  • Container (Target file root path)
  • Group Delta By (none, date, hour)
  • File Type (csv, parquet, json, json lines)
  • File compression (only for parquet)

For each replication flow, you can add one or more tasks to load the data into GCS and:

  • Perform filtering (optional)
  • Change column mapping (optional)
  • Set or change target name
  • Select load type on data set level

Support of HANA Data Lake (HDL) Files as target in Replication Flows

When creating a Replication Flow, you can now write data as files to HDL Files as a target, using the following properties:

  • Container (Target file root path)
  • Group Delta By (none, date, hour)
  • File Type (csv, parquet, json, json lines)
  • File compression (only for parquet)

For each replication flow, you can add one or more tasks to load the data into HDL Files and:

  • Perform filtering (optional)
  • Change column mapping (optional)
  • Set or change target name
  • Select load type on data set level

Support of JSON & JSON Lines as target file type in Replication Flows

When creating a Replication Flow selecting a cloud object store as target (AWS S3, ADL V2, HDL Files or GCS), you can now also select:

  • JSON and
  • JSON Lines

as file formats, in addition to the previously available CSV and Parquet file formats.

When choosing JSON as the file format, you can select between two different JSON formats:

  • Records
  • Values
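To make the difference between these formats concrete, here is a small Python sketch. JSON Lines is a standard layout with one JSON object per line. The interpretation of the two JSON formats is an assumption, since the post does not spell them out: the sketch assumes Records means an array of objects keyed by column name, and Values means an array of arrays holding only the column values. The sample columns, rows, and function names are invented for illustration.

```python
import json

# Invented sample data for illustration.
columns = ["id", "name"]
rows = [[1, "Alice"], [2, "Bob"]]


def to_json_records(columns, rows):
    """Assumed 'Records' layout: one JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


def to_json_values(rows):
    """Assumed 'Values' layout: one JSON array of arrays, column names omitted."""
    return json.dumps(rows)


def to_json_lines(columns, rows):
    """JSON Lines: one self-contained JSON object per line."""
    return "\n".join(json.dumps(dict(zip(columns, row))) for row in rows)


print(to_json_records(columns, rows))
print(to_json_values(rows))
print(to_json_lines(columns, rows))
```

Under these assumptions, the Values layout is more compact but needs the schema from elsewhere, while Records and JSON Lines are self-describing.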

Mass Data Replication via Replication Flows

Pipeline Modelling

This topic area covers new operators and enhancements of existing operators, as well as improvements and new functionality in the Pipeline Modeler and in the development of pipelines.

Migration graph for merging part files

USE CASE DESCRIPTION:

  • Merge small part files generated by replication flows, including both initial and delta loads
  • Supported merge scenarios/file formats
    • CSV to CSV
    • Parquet to Parquet

BUSINESS VALUE – BENEFITS:

  • Achieve replication with configurable file size
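The idea of merging small part files up to a configurable size can be sketched in plain Python for the CSV to CSV case. This is not the migration graph itself: `merge_csv_parts` and its `max_bytes` parameter are hypothetical, the parts are held in memory as strings for simplicity, and each part is assumed to start with the same header row.

```python
import csv
import io


def merge_csv_parts(parts, max_bytes):
    """Merge CSV parts (each with the same header) into larger CSV texts.

    An output is closed once it has reached max_bytes, so an output can
    exceed the target by at most one row.
    """
    merged = []
    buffer = None
    writer = None
    for part in parts:
        reader = csv.reader(io.StringIO(part))
        header = next(reader, None)
        if header is None:
            continue  # skip empty part files
        for row in reader:
            if buffer is None:
                # Start a new output and write the header exactly once.
                buffer = io.StringIO()
                writer = csv.writer(buffer, lineterminator="\n")
                writer.writerow(header)
            writer.writerow(row)
            # Re-measuring the whole buffer per row is simple but not efficient.
            if len(buffer.getvalue().encode("utf-8")) >= max_bytes:
                merged.append(buffer.getvalue())
                buffer = None
    if buffer is not None:
        merged.append(buffer.getvalue())
    return merged


parts = ["id,name\n1,a\n2,b\n", "id,name\n3,c\n", "id,name\n4,d\n"]
single_file = merge_csv_parts(parts, max_bytes=1_000_000)
two_files = merge_csv_parts(parts, max_bytes=16)
```

With a large `max_bytes`, all parts end up in one output; with a small value, rows are spread over several outputs, each carrying its own header.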

Administration

This topic area includes all services that are provided by the system – like administration, user management or system management.

Encrypt data using Customer Managed Keys

USE CASE DESCRIPTION:

  • Integration of SAP Data Custodian Key Management Service and SAP Data Intelligence
    • Supported for new DI Cloud instances created in AWS where an SAP Data Custodian Key Management Service instance is available
    • Feature can be enabled during the creation of a new DI instance
    • Option to provide an existing Data Custodian Key reference to be used in the new DI instance

BUSINESS VALUE – BENEFITS:

  • Increased flexibility to use own encryption keys

Intelligent Processing

This topic area includes all improvements, updates and the way forward for Machine Learning in SAP Data Intelligence.

Standalone Jupyter Lab Notebook

USE CASE DESCRIPTION:

  • Use Jupyter Lab without a hard dependency on ML Scenario Manager for:
    • Exploratory data analysis (EDA)
    • Data preprocessing
    • Data manipulation

BUSINESS VALUE – BENEFITS:

  • Jupyter Lab app has its own tile on the Launchpad
  • Jupyter Lab can be used independently of ML Scenario Manager, without necessarily affecting any existing scenarios in MLSM
  • Ability to associate Jupyter Lab notebooks with an existing ML Scenario

These are the new functions, features and enhancements in the SAP Data Intelligence, cloud edition DI:2022/05 release.