48

Amazon RDS — Step-By-Step Guide

 4 years ago
source link: https://www.tuicool.com/articles/2eUbY3Y
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

T his article here will guide you how to create a database instance using Amazon Relational Database Service and how to interact with a remote database server using only Python.

Step 1: Launch Your AWS Database Instance

First, we need an account on AWS. We are going to an AWS portal to create a new account, or you may use an existing AWS account.

What is Amazon RDS?

Amazon Relational Database Service (RDS) forms a central part to set up a relational database in the cloud. Each cloud database we provision is called a database instance. So, first, click on Services and select Database to navigate to the RDS Dashboard:

B3QVjem.png!web

AWS Services Console. Select RDS for launching a new database instance.

Next, to create a new database click on “ Create databas e”. Database instances are flexible, and you configure them to meet your needs. You can configure your database with a range of engine types including Amazon Aurora, MySQL, PostgreSQL, and others. Beyond the engine types, you can also configure the database instance type and size, allocate storage, and configure different settings like monitoring, backups and many others.

Now, we are selecting a PostgreSQL engine on which to run the db. In this post, we use a Free tier template . With a free trier, we get a small instance db.t2.micro with 1 CPU and 1GiB memory.

Next, in the Settings section specify your database instance information: instance name and the master user credentials to log in to your instance. You may also specify a port you want to use to access the database. For PostgreSQL installations, the default port is 5432.

If you want to give your database instance a public IP address (so devices outside the VPC will be able to connect to your database) choose Yes for Public Accessible inside the Connectivity section. I choose Yes to make my database be available from my local machine.

You should proceed to the Additional configuration section and specify a database name that Amazon RDS creates when it creates the database instance. For this tutorial, I am going to create a database with a name titanic .

Remember: a database name and a database instance name are two different names. If you do NOT specify a database name, Amazon RDS does NOT create a database when it creates the database instance.

Finally, click on Create Database . It may take a few minutes to launch database. The new database instance appears on the Amazon RDS console. You may connect to the instance once its state changes to available.

jmaM322.png!web
The DB instance on the Amazon RDS console

Step 2: Connect to the PostgreSQL database

Once the database is created we can interact with it. Here, we connect to our database from Jupyter notebook using Python package psycopg2 :

Note, you don’t even need to install Postgres on our local machine. You only need to install a psycopg2 package, by typing inside your terminal: pip install psycopg2

In the second line, we opened a connection to Postgres by using the connect() method. We passed five parameters into the psycopg2. First is the host which specifies where your database is running. In our case it is a remote RDB, and the host url is the Endpoint value from the Amazon RDS console (see the screenshot below). Next, we passed the username and password (lines 5,6). You may start with using master user credentials which you created earlier. Later, we will add a new user as a way to improve security and division of data. Finally, we passed the database name ( which is different with the database instance name!!! ). Database name I asked you to enter in the Additional configuration , and in my case it was titanic.

BfUFRjI.png!web

Amazon RDS console for a test database instance

Last but not least, we need to create a cursor object (line 9) to interact and execute the commands on the Postgres database .

Step3: Interacting with the database from Python

3.1. Dataset

Once we connected to our database, it’s time to add some data. For this tutorial, I took the titanic dataset from Kaggle (train.csv). After some pre-processing, I split the file into three CSV files: passengers.csv , tripInfo.csv , and survival.csv. The link to my repo with processed dataset and notebook is in the end of article.

3.2. Creating tables

Well, now I am going to copy data from the CSV files into the database, but first, I need to create three tables that fits my dataset. For this purpose, we can use CREATE TABLE command, but pay attention, the columns should be in the same order as in the CSV files with their respective types.

Remember, every time we need to commit our changes by running connection.commit() at the end of every transaction (line 22). Finally, you may always query the table INFORMATION_SCHEMA.COLUMNS to display schemas of all the public tables. In particular we are interested in the column names and data types. In Python we can also access remote data using pandas method read_sql() :

AjaaayY.png!web

A sample output after executing the SQL query.

3.3. Loading data into a Postgres table from CSV

Once we have our tables, we want to copy data from CSV files. Fortunately, it is super simple with psycopg2 using the copy_from() method:

Loading data into a Postgres table from CSV

We can also create a new user and grant privileges to manage our database:

3.4. Execute queries

Now, our remote database is ready and we can start fetching some data from our database tables using Python pandas.read_sql() . Let’s say we are interested to get the survival rate for each passenger’s class. Below is a simple SQL query with INNER JOIN to do that:

It is very convenient to use pandas to execute SQL, because it reads SQL query into a DataFrame:

FJ3m6rU.png!web

The survival rate for each passenger’s class

Delete the database Instance

Last notes. It is a good practice to delete your database instances from AWS which you are no longer using. Otherwise, you will be getting charged for them. So you need to go back to the AWS RDS Console, select your instance and click Delete.

Yz2AbmY.png!web

Delete the database Instance

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK