

source link: https://blogs.sap.com/2022/06/24/sap-data-intelligence-in-collaboration-with-document-information-extraction-service/

SAP Data Intelligence in collaboration with Document Information Extraction Service

Introduction
Document Information Extraction is an SAP BTP service that helps you process large numbers of business documents.
The purpose of this blog post is to demonstrate how you can combine it with SAP Data Intelligence (DI). The use case is simple: we want to upload documents into Document Information Extraction using Data Intelligence.
Preparation
To follow the next steps you need a running instance of both services. The tutorial “Set up Account for Document Information Extraction and Go to Application” describes how to activate a trial BTP account with the service, so I’m not going to repeat it here.
The sample invoices that I’m going to use in this post can be downloaded from the following tutorial page.
Workflow
Upload a document in Document Information Extraction directly.
I uploaded one invoice document by clicking on the “+” button and extracted some fields:

Upload a document

Select a document

Select header fields for extraction

Select line item columns for extraction

Overview

Result

The document is ready
Upload a document in Document Information Extraction Service with Data Intelligence
Create a connection
To connect these two services I’m going to use APIs provided by SAP API Business Hub.
Firstly, I created an OPENAPI connection in DI:

OPENAPI connection in Data Intelligence
The client credentials can be found in a key file on BTP:

Credentials for connection, BTP
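To make the mapping between the key file and the connection fields concrete, here is a minimal sketch. It assumes the key file has a top-level "url" entry and a "uaa" object with "url", "clientid" and "clientsecret"; that layout is an assumption, so check it against your own key. The function name and the sample values are hypothetical, while the result keys are the ones the operator code in this post reads from the connection.

```python
import json


def connection_fields_from_service_key(service_key: dict) -> dict:
    """Map a service key (assumed layout) to the connection fields used in this post."""
    uaa = service_key["uaa"]
    host = service_key["url"]
    # The operator code prepends "https://" itself, so store the bare host name.
    if host.startswith("https://"):
        host = host[len("https://"):]
    return {
        "host": host.rstrip("/"),
        "oauth2TokenEndpoint": uaa["url"].rstrip("/"),
        "oauth2ClientId": uaa["clientid"],
        "oauth2ClientSecret": uaa["clientsecret"],
    }


# Placeholder values only; a real key comes from your own BTP service instance.
sample_key = json.loads("""
{
  "url": "https://example-dox.invalid",
  "uaa": {
    "url": "https://example-uaa.invalid",
    "clientid": "my-client-id",
    "clientsecret": "my-client-secret"
  }
}
""")

fields = connection_fields_from_service_key(sample_key)
print(fields["host"])
print(fields["oauth2TokenEndpoint"])
```

The token endpoint is kept without the "/oauth/token" suffix because the operator code appends that path itself.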
Create a pipeline
Upload a document in DI.
For this tutorial I uploaded a PDF file to “Files” in DI.

Upload a document into DI
Create a Custom Python Operator
In this post I’m using Gen1 operators.
Python code for the operator:
import json
import requests

# "api" is provided by the Data Intelligence Python operator at runtime.
# Connection properties come from the OPENAPI connection set on the operator.
restConn = api.config.connection['connectionProperties']
base_url = "https://" + restConn['host']
token_url = restConn['oauth2TokenEndpoint'] + '/oauth/token?grant_type=client_credentials'
url = base_url + '/document-information-extraction/v1' + '/document/jobs'

# get token (the client credentials are sent via HTTP basic authentication)
api.send("debug", "--- get token ---")
r = requests.get(token_url, auth=(restConn['oauth2ClientId'], restConn['oauth2ClientSecret']))
api.send("debug", str(r.status_code))
token = r.json()['access_token']
# report success only; the token itself should not end up in the debug output
api.send("debug", 'Token received')

# the bearer token is all the service needs, so no credentials go into the request body
headers = {
    'Authorization': 'Bearer ' + token,
    'accept': 'application/json'
}

# get the existing document jobs
r = requests.get(url, headers=headers)
api.send("debug", "--- GET ---")
api.send("debug", str(r.status_code))
api.send("debug", str(r.text))

# post document
options = {
    "extraction": {"headerFields": ["documentNumber", "currencyCode"]},
    "clientId": "default",
    "documentType": "invoice"
}
payload = {"options": json.dumps(options)}
# open the file in a with block so the handle is closed after the upload
with open('/vrep/sample-invoice-2.pdf', 'rb') as f:
    file = {'file': ('sample-invoice-2.pdf', f, "application/pdf")}
    r = requests.post(url, headers=headers, data=payload, files=file)
api.send("debug", '--- POST --- ')
api.send("debug", str(r.status_code))
api.send("debug", str(r.text))
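The operator above requests header fields only, while in the UI walkthrough I also selected line item columns. A minimal sketch of how the "options" form field could be built for both is shown here. The helper name build_upload_form is hypothetical, the "lineItemFields" key is my assumption about the service's API, and the line item names are illustrative only, so please verify all of them against the API documentation before using them.

```python
import json


def build_upload_form(header_fields, line_item_fields=None,
                      client_id="default", document_type="invoice"):
    """Build the form data for the document upload request (sketch)."""
    extraction = {"headerFields": list(header_fields)}
    if line_item_fields:
        # Assumption: line item columns are passed under "lineItemFields".
        extraction["lineItemFields"] = list(line_item_fields)
    options = {
        "extraction": extraction,
        "clientId": client_id,
        "documentType": document_type,
    }
    # The options are sent as a JSON string in a form field named "options".
    return {"options": json.dumps(options)}


form = build_upload_form(
    ["documentNumber", "currencyCode"],
    ["description", "quantity"],  # illustrative line item names
)
print(form["options"])
```

The returned dictionary can be passed as the data argument of requests.post, in the same way as the payload variable in the operator code.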

Graph

Output in Terminal
Let’s check the Document Information Extraction. We should see one more document there:

Document Information Extraction
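Instead of checking the UI, the status of the uploaded document could also be polled from the operator. The sketch below is not part of my pipeline. It assumes that the POST response contains the job "id", that the job can be read with a GET on the jobs URL followed by that id, and that the status ends up as "DONE" or "FAILED". The names wait_for_job and fake_fetch are hypothetical, and the HTTP call is replaced by a stub so the example runs on its own.

```python
import time


def wait_for_job(fetch_status, job_id, attempts=10, delay_seconds=2.0):
    """Call fetch_status(job_id) until the job reaches a final status (sketch)."""
    for _ in range(attempts):
        job = fetch_status(job_id)
        # Assumption: "DONE" and "FAILED" are the final status values.
        if job.get("status") in ("DONE", "FAILED"):
            return job
        time.sleep(delay_seconds)
    raise TimeoutError("job " + job_id + " did not finish in time")


# Stub standing in for the real HTTP call. In the operator this could be:
#   lambda job_id: requests.get(url + '/' + job_id, headers=headers).json()
responses = iter([
    {"status": "PENDING"},
    {"status": "RUNNING"},
    {"status": "DONE", "id": "job-1"},
])


def fake_fetch(job_id):
    return next(responses)


result = wait_for_job(fake_fetch, "job-1", delay_seconds=0)
print(result["status"])
```

Passing the fetch function as a parameter keeps the polling logic independent of the HTTP client, which makes it easy to try out without a running service.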
Conclusion
As you can see, it is simple to automate document uploads into Document Information Extraction using another SAP BTP service, Data Intelligence. Another use case could be extracting document details with DI and writing them into a database.
Please be aware that this post reflects only my personal idea of how the collaboration of these services can be implemented.
Helpful links
SAP Discovery Center BTP Services