Python and Oracle Cloud: loading data

Using OCI Python SDK and Autonomous DWH API you can easily load data for your Data Science and ML work

Introduction

In a previous article , I have explored how we can use Python and popular Open Source frameworks, like Pandas and scikit-learn, to analyze data stored with Oracle Autonomous Data Warehouse (ADWC) . In this shorter story, I want to show you how you can easily load even big data files using OCI Python SDK .

Automation “on steroids”

One of the big features of the Cloud is Automation. In the Cloud World, it is easier to dynamically provision resources (like VM, or even Databases) and manage them using an API.

Oracle Cloud Infrastructure (OCI)has many tools, within which you can choose your preferred, to automate a large set of tasks (I would say: everything I need):

REST API
OCI CLI
SDK

OCI Command Line Interface (CLI) is really good. But, if you want to develop your highly customized tools, perfectly tailored on your shoulders, I would recommend using SDK .

As you can see in the documentation , we have:

Python SDK
Java SDK
Ruby SDK
Go SDK

Since we’re talking about Data Science and ML Engineering, I would without any doubt choose Python .

With OCI Python SDK you have a complete and easy-to-use set of classes, well-documented, enabling you to build Python scripts to automate a large part of the pipeline for your daily work with data. And, moreover, you can even use it inside a Jupyter Notebook (or Jupyter Lab).

So, some days ago, I started with the evening task to build, for one of my customers, some examples to show how to load data inside Oracle ADWC, starting from flat files. Then, as a nice side-effect, I decided to explore the subject a little more and created a script to start/stop VM and even start/stop my ADWC instance.

You can find the set of examples inside my G ithub repo

In this story, I want to explain, with more details, how to load data, starting from flat files.

Installation? Easy!

Well, obviously there is the documentation , but, if you’re using PyPi, installation is only one line of code:

pip install oci

It works also with Anaconda.

If you want to be one of the first to try the new features added, simply check out the OCI Python Github repository .

Configuration

Then, you need to do some configuration tasks, needed to ensure secure communication between your working environment and your OCI Cloud. I’ll spend some time to clarify here since I don’t want you to lose precious time on this part.

First, you need to create an API signing key-pair , in PEM format. You must keep the private safe on your environment (the computer running the Python program) and store the public key in your Cloud profile. When you store it, you can read from Cloud UI the fingerprint.

Some more details can be found here:

Required Keys and OCIDs

Learn about keys and OCIDs required when using an Oracle client or a client you built yourself. docs.cloud.oracle.com

Then, with every type of clients (ObjectStorageClient, ComputeClient, ….) you need to supply a set of (key, value) pair, packed as a Dictionary, in order to specify:

your OCI userid
a key (API signing key) used to secure communication
the fingerprint of this key
your Cloud tenant identifier
OCI Region

This is an example of what you need to prepare (anonymized):

config = {

“user”: “ocid1.user.oc1..XXXXX”,

“key_file”: “/Users/lsaetta/Progetti/xxxx/oci_api_key.pem”,

“fingerprint”: “YYYYY”,

“tenancy”: “ocid1.tenancy.oc1..ZZZZZZ”,

“region”: “eu-frankfurt-1”

For user and your Cloud tenant, you need to use not the name, but the identifier OCID, that you can copy from the OCI Cloud Console.

Loading files in the Cloud

At this point, you’re ready to load a set of files as Objects in the ObjectStore.

This is the code to load a single file, I’ll comment sooner:

As you can see, you need:

to create a client, in this case an ObjectStorageClient, and pass to the constructor the configuration Dictionary
call the method put_object with bucket_name, file_name, file_path as parameters

That’s all.

In the Github repository, you’ll find an enhanced version, handling a set of files.

Store credentials inside the DB

Now, it is time to copy data from ObjectStore to ADWC tables.

First of all: we need to create a set of credentials and store them inside the DB, in order to enable PL/SQL code to access the ObjectStorage Bucket.

Go to the OCI UI and create, inside your profile (top right icon and UserSetting), an AuthToken:

Authentication Token

Copy from the UI the Authentication Token. Remember to do that right after created, otherwise, you will not able to re-read it.

Second, store the Cloud Credentials inside the DB using this code; This step needs to be done only once:

import cx_Oracle

connection = cx_Oracle.connect(USER, PWD, 'adwc5_medium')

# need to create a cursor
cur = connection.cursor()

statement = """
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'DEF_CRED_NAME',
    username => '[email protected]',
    password => '<AUTH Token>'
  );
END;
"""
cur.execute(statement)

Take care: if your user is a federated one, the name is the fully qualified name. It has a prefix like:

oracleidentitycloudservice/

and you must specify, in the CREATE_CREDENTIAL call, the fully qualified name.

Loading data in Oracle ADWC

Since we have now the flat files in the ObjectStore, we can employ the PL/SQL package DBMS_CLOUD to load the data inside ADWC tables.

Ok, it is PL/SQL code, but not very complicated, and you can easily cut&paste. We’ll wrap this code inside Python code.

We’re imaging the simpler scenario: one Oracle table for flat file and the format of the file is the same as the table (one-to-one correspondence between file and table fields)

We have to create the table inside the DB. For example:

CREATE TABLE CHANNELS
   (channel_id char(1),
    channel_desc varchar2(20),
    channel_class varchar2(20)
   );
/

Then we can load the data contained in one file in this table, with this code:

import cx_Oracle

connection = cx_Oracle.connect(USER, PWD, 'adwc5_medium')

# need to create a cursor
cur = connection.cursor()

statement = """
BEGIN
 DBMS_CLOUD.COPY_DATA(
    table_name =>'CHANNELS1',
    credential_name =>'DEF_CRED_NAME',
    file_uri_list =>'https://swiftobjectstorage.eu-frankfurt-1.oraclecloud.com/v1/tenant-name/BUCKET_NAME/channels1.txt',
    format => json_object('delimiter' value ',')
 );
END;
"""

cur.execute(statement)

One detail: to reference the Object you need to use the SWIFT-type URL. What you need to change in the above example is:

the Region Name (here it is eu-frankurt-1)
the name of your tenant
the name of the bucket in Object Store
the name of the Object (here: channels1.txt)

As we can see, the DBMS_CLOUD package has all the required API to:

store credentials inside the DB
COPY from ObjectStore to DB

It provides also some other nice features. For example, you can use Apache Parquet format for the files, instead of CSV.

Want to check?

If you want to check that data has really been loaded, you can use this snippet of code:

import pandas as pd
import cx_Oracle

connection = cx_Oracle.connect(USER, PWD, 'adwc5_medium')

channel1 = pd.read_sql("select * from CHANNELS1", 
                   con=connection)

# if it is a Notebook
channel1.head()

External table

Another option for Data Loading is to create in the DB an External Table , referring to the file. A soon as you have the external table working, you can use DML instructions to load data inside the DB. For example, you can use CTAS: CREATE TABLE AS SELECT.

The procedure to create the external table is:

DBMS_CLOUD.CREATE_EXTERNAL_TABLE

more details here .

Conclusion

As a Data Scientist, you need to work with data. Sometimes you’ll need to load by yourself. Sometimes, you’ll have a DBA to load for you.

In this article, I have shown how to use OCI Python SDK to:

load files in the ObjectStore
copy data from ObjectStore to ADWC tables

In addition, I have given a close look at the power of OCI Python SDK. With it, you can create your highly customized set of tools to automate your daily work. You can even create your ADWC instance or scale it up&down (adding cores and storage space).

In a future article, I’ll return to Machine Learning. It is almost time to explore OML4Py .

One more thing

If you want to find more examples using OCI Python SDK, you can have a look here

Introduction

Automation “on steroids”

Installation? Easy!

Configuration

Loading files in the Cloud

Store credentials inside the DB

Loading data in Oracle ADWC

Want to check?

External table

Conclusion

One more thing

Recommend

Writing Resilient Components

When Excel isn’t enough: Using Python to clean your Data, automate Excel and muc...

Why the default, WSL terminal is so fast and other commentary about low-level AP...

GitHub - libgit2/libgit2: A cross-platform, linkable library implementation of G...

中亚Prime会员:SONY 索尼 Xperia XZ1 智能手机 4GB+64GB 黑色 1649.83元含税直邮_亚马...

中亚Prime会员、再降价:TIGER 虎牌 MCX-A035 梦重力保温杯 350ml *3件￥204.09+￥22....

25日0点、历史低价:Joyoung 九阳 JR5001 反渗透纯水机 999元包邮（前2分钟付款）_天猫...

Glambody无钢圈性感半杯文胸套装 58元包邮（需用券）_京东优惠

他，不是一个靠谱的男人

总是在重复「从入门到放弃 Python」？这里有为你「量身定做」的学习方法

About Joyk