Joshua Blewitt

Building the Postgres Toolbox, part one 🐘

So recently, I've been diving more into Data Analysis (which I've been really enjoying!) and I recently wanted to see if I could combine both my interest of hobbyist development and data analysis into a project! 🧐 And today I'm sharing the recent progress I've made on it. I'm calling this application the 'Postgres Toolbox' 🔮

This is going to be a fairly big project. I honestly think this will take several months to complete, as there is a lot to do! But also, this is my first 'real' project that uses the PyCharm IDE - something that's used by professionals!

If you want to take a look at the repo, you can find it here on GitHub (If you like it, give it a star!)

Please remember that this is a work in progress!

What is Postgres Toolbox?

Postgres Toolbox is a CLI application that allows you to:

  • Connect to a Postgres database
  • Save your login info to a JSON for faster login
  • Run queries against your database
  • Export your results to a CSV file
  • Perform Data Analysis by using Seaborn to visualise your data

Currently, you can perform data visualisation on statistical data. Support for distributions of data and plotting categorical data is planned! 👨‍💻

The project will grow over time so who knows what new features will be added!

This post will look at connecting to a DB, running a query and then taking the results of the query and exporting them to a CSV file.

So let's jump in with connecting to a database.

Connecting to a Postgres database

Before we can do anything, we need to connect to our Postgres database. So, we need a library that can do just that. For this project, I'm using a library called psycopg.

So in order to connect to the database in Postgres, we need the following:

  • Database name
  • The user name
  • The password
  • The host
  • The port

We can get this information from the user like so:

# get db info from user
print('Please enter the following information below.')
db_name = input('Enter the DB Name - ')
db_user = input('Enter the user name - ')
db_password = input('Enter the DB Password - ')
db_host = input('Enter the host name - ')
db_port = input('Enter the port - ')

But after a while of running the program and entering the details over and over again, I wanted to see if there was an easier way. So I decided to take the variables and write them to a JSON file:

print("Now saving DB info to JSON file....")
# create dictionary
db_dict = {'db_name': db_name, 'db_user': db_user, 'db_password': db_password, 'db_host': db_host, 'db_port': db_port}
# create json
json_file = json.dumps(db_dict, indent=4)
with open("db_info.json", "w") as outfile:
    outfile.write(json_file)
print('Complete! File saved as db_info.json')

And then on startup, the program checks to see if a JSON file exists:

if os.path.exists('db_info.json'):
    try:
        print('Found saved login - now reading...')
        # open JSON file and close it
        with open('db_info.json', 'r') as file:
            db_data = json.load(file)
            file.close()
        # unpack dictionary
        db_name = db_data['db_name']
        db_user = db_data['db_user']
        db_password = db_data['db_password']
        db_host = db_data['db_host']
        db_port = db_data['db_port']

This way, it can be quicker for the user to access the same database instead of typing in the same information over and over again.

So we can establish a connection to the database - now lets take a look at sending a query.

Querying a connected database

Before we can begin querying the database we're connected to, we need to create a cursor.

# Open a cursor to perform database operations
cursor = connection.cursor()
print('Enter your query below')
query = input('')
print('Now executing...')
try:
    # execute query
    cursor.execute(query)
    result = cursor.fetchall()

After the cursor is created, we can ask the user for their query.

Executing a query is captured in a try/except statement - just in case the query is incorrect.

There is a problem that I discovered early on, the 'fetchall' method doesn't include the names of the columns.

The 'results' variable becomes a list, but doesn't have any of the column names. They must be stored somewhere in the cursor object right? Turns out, they are!

# extract the column names
col_names = []
for name in cursor.description:
    col_names.append(name[0])

The 'description' property is a tuple which holds the names of the columns. We take the first element of the tuple and append it to a dictionary. Here's the complete code for that:

try:
    # execute query
    cursor.execute(query)
    result = cursor.fetchall()
    # extract the column names
    col_names = []
    for name in cursor.description:
        col_names.append(name[0])
    df = pd.DataFrame(data=result, columns=col_names)
    print('Success! Would you like to export the results to a CSV file or view the results?')
    choice = input("""Enter 1 to export to CSV or 2 to view the results.
    \nEnter anything else to skip.""")
    if choice == '1':
        print('Now exporting...')
        df.to_csv('query_results.csv')
        print('Complete! File name is called query_results.csv')
    elif choice == '2':
        print(f'{df}')
    else:
        print('Skipping.')
    while True:
        print('Would you like to perform some data analysis on your results?')
        print('Type Y for yes or N for no')
        analysis_choice = input().upper()
        if analysis_choice == 'Y':
            analyse_data(df)
            # close connection
            cursor.close()
            break
        elif analysis_choice == 'N':
            print('Returning to the main menu...')
            # close connection
            cursor.close()
            break
        else:
            print('Invalid command - try again!')
except OperationalError as e:
    print(f"The error {e} occurred!")

So how do we use the 'results' variable and the 'col_names' dictionary to create a CSV file?

Writing a CSV file

As seen above, we use Pandas to create a data frame and then use the built in functionality.

print('Now exporting...')
df.to_csv('query_results.csv')
print('Complete! File name is called query_results.csv')

Now we have a CSV file exported for the user which has the column headers!

And that about covers what I wanted to cover today on part one of the Postgres Toolbox 🐘 It's been fun putting the program together and make something that can pull data from a database.

Next time, I'll look at how we use the dataframe and create graphs to visualise the data in Seaborn.

Thanks for reading! 👏

Tags:


A photo of me!

I'm Joshua Blewitt, I'm passionate about product, a technology advocate, customer champion, curious mind and writer. I've worked for companies such as Rightmove, Domino's Pizza and IQVIA.

Let me know your thoughts!
More about me