Lili's Sharings : Automating system administrative tasks of Vertica cluster on OpenStack with Python

In this post, I would like to share some experiences on how to automate system administrative tasks with Python. This is the first time for me to do such tasks. One of big lessons I have learned is how to break a complex task down into small sequence tasks.

Based on the project need, recently I have been working on some system administrative tasks. And one task is to automate the following process with a Python program.

Create a Vertica cluster on OpenStack Trove.
Migrate some certain users and their tables to this newly created cluster from the cluster they currently sit on. This task can be broken down into following sequence tasks.

Connect to the current Vertica cluster and export objects of those users into a sql file on the machine where the python program is executed.
Transfer the sql file to the target Vertica cluster.
Connect to the target Vertica cluster with the dbadmin credentials to create users and grant database privilege to them.
Connect to the target Vertica cluster with each user's credentials to create their objects (schemas and tables) from sql file.
Copy table data to the target Vertica cluster from the current Vertica cluster.

The reason why we are interested in launching a Vertica cluster is that Vertica is a massively parallel processing database (MPPDB). The MPPDB-as-a-service on the cloud would be very attractive for companies who perform analytical tasks on huge amounts of data.

The following Python packages have been used to implement the above tasks.

troveclient: Call Trove Python API.
python-vertica: Interact with Vertica server on the cluster.
subprocess: Execute a command line on the local machine.
paramiko: Execute a command line on the remote machine (i.e. virtual machines that host Vertica server on OpenStack)

Let's see how things work out with examples.

1. As it mentions in this article, OpenStack is a popular open source cloud operating system for deploying infrastructure as a service and cloud-based tasks can be automated through working with OpenStack Python APIs. The author provides the examples of Keystone API, Nova API, and so on, but doesn't cover Trove API.

An Example of Trove API can be found in the screenshot below.

2. Every database management system adopts the server-client paradigm, which means we can interact with the server with a client. The Python client package for Vertica is python_vertica.

Below screenshot provides an example how to user python_vertica to export objects.

3. As mentioned in the above section, one small task is to transfer sql file from the local machine to a remote machine (the target Vertica cluster). subprocess.check_call() is used.

4. Another small task is to execute a command line to create a user's objects from sql file on the target Vertica cluster. So how can we execute a command line on a remote machine?

Below is the code screenshot to use paramiko to do that. Note in this case, SSH login of the remote machine is set via key pair file. So the parameter "KEY_FILE_PATH_LOCAL" is need. If SSH login is set via password, refer to paramiko document and make corresponding changes.

This task requires the knowledge in the operating system, database system (i.e. Vertica), and Python. I am glad that I make it, since the growth always starts with small steps. There is still some space to optimize the code, which is what I will work on.

Lili's Sharings

Saturday, February 27, 2016

Automating system administrative tasks of Vertica cluster on OpenStack with Python

No comments:

Post a Comment