crop unrecognizable man typing on computer keyboard

How to Use SQLAlchemy and Python to Read and Write to Your Database

In today’s post, I will explain how to perform queries on an SQL database using Python. Particularly, I will cover how to query a database with SQLAlchemy, Flask-SQLAlchemy, and Pandas.

If you want to start with a toy project, I recommend that you first check last week’s post about creating a Flask application with a database in the backend. I included the whole project in this Github repo. However, the post from last week will walk you through the code faster.

Flask-SQLAlchemy

Before we can interact with the database, we need to import some libraries. Additionally, we need to generate a URI for the database. I’ll be using the function I created in tools.py for that.

from models import Cryptocurrency, Transaction

from tools import (generate_uri_from_file,
                   get_symbol_to_id_dict,
                   get_id_to_symbol_dict)

from flask_sqlalchemy import SQLAlchemy
from flask import Flask
#================================

# generate URI
uri = generate_uri_from_file("db_config.yml")

Remember to check last week’s post to better understand the models and tools I’m importing here. The next step is to set up our Flask app and db object with the correct URI. This is basically the same as in app.py in the repository.

db = SQLAlchemy()
app = Flask(__name__)

app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = uri

db.init_app(app)

Using App_context to Run Queries

In order to perform CRUD operations on our database with Flask-SQLAlchemy, we need to run the commands inside an application context. This is needed when we are running standalone scripts.

The first query example is to retrieve all transactions stored in the database.

with app.app_context():
    results = db.session.query(Transaction).all()

and the output will be:

results

[<Transaction: ticker_id=1, coins=100.000000000, cost basis=3999.00000>,
 <Transaction: ticker_id=1, coins=45.000000000, cost basis=10000.00000>,
 <Transaction: ticker_id=1, coins=1.000000000, cost basis=34000.00000>,
 <Transaction: ticker_id=2, coins=0.123456789, cost basis=350.00000>,
 <Transaction: ticker_id=3, coins=1500.000000000, cost basis=100.00000>,
 <Transaction: ticker_id=4, coins=1500.000000000, cost basis=1215.00000>]

To query, we call the query method on the object type we are interested in. In this case, we wanted all transactions, so we access them with the Transaction object from models.py.

Notice the .all() at the end of the query. It tells Flask-SQLAlchemy to return a list of the Transaction objects we have. Without it, the command will return a BaseQuery object. There are other modifiers to use instead of .all(), for instance: .first() will return the first element only.

Additionally, you can limit the total number of rows returned from the database with the .limit() method. It takes an integer as input and it will return that many rows. Ex:

with app.app_context():
    results = db.session.query(Transaction).limit(2).all()

In that example, I’m telling the database to give me back only 2 rows. I still use the .all() method at the end to get a list and not a query object.

And before we move on to the next examples, I wanted to mention that technically, you can query directly from the model object. I don’t use that method often, but it can be less verbose. To illustrate, our original example can also be written as:

with app.app_context():
    #results = db.session.query(Transaction).all()   #original example
    results = Transaction.query.all()

Retrieving Data Conditionally

It can be useful to retrieve only a specific portion of our data that meets certain requirements. In this example, I will be querying the database to get only the transactions associated with Ticker ID 1. Ticker id 1 happens to be Bitcoin, but it depends on how we stored our values initially in the cryptocurrencies table.

#Example 1: Returning only Bitcoin transactions (Bitcoin's ticker_id is 1)
with app.app_context():
    results = db.session.query(Transaction).filter(Transaction.ticker_id==1).all()

As you can see, the filter method allows us to add conditions to the data we request back. Furthermore, we can add several conditionals. Instead of getting all Bitcoin transactions, I want to get only those Bitcoin transactions that were of more than 2 coins. Here is the way to do it:

#Example 2: Returning only Bitcoin transactions and only those that have more than 2 coins per transaction
with app.app_context():
    results = db.session.query(Transaction).filter(Transaction.ticker_id==1, 
                                                   Transaction.num_coins > 2).all()

Writing to the Database

Here is how you can write a new object to the database. I used the same method in the home route of app.py when getting user input. I will create a new sample transaction and write it to the database. In this example, I will add a new XRP transaction. Therefore, we first need to query the table cryptocurrencies to get the appropriate object. If XRP does not exist in the table yet, it will be added as well.

_coin = 'XRP'

with app.app_context():
    
    #create our new transaction
    new_tx = Transaction(num_coins=1500, cost_basis=1215)  #$1,215 assumes an XRP price of around $0.81 -> 1500 XRP * $0.81 ~= $1215
    
    #---Retrieve coin from DB
    coin_obj = db.session.query(Cryptocurrency).filter(Cryptocurrency.ticker==_coin).first()
    if coin_obj is None:
        coin_obj = Cryptocurrency(_coin) #if it does not exist, we create a new object
        db.session.add(coin_obj)
    
    coin_obj.transactions.append(new_tx)
    db.session.add(new_tx)
    db.session.commit()

On line 14 above, we are appending new_tx to coin_obj.transactions. This creates a back-reference from the transaction to the cryptocurrencies table. Make sure to take a look at models.py to see how that back-reference relationship is set up.

You can run the toy application with flask (run python app.py in the terminal) to check that the data is indeed added to the database. You can also query the database for XRP transactions following the steps I showed in the previous section.

SQLAlchemy

I mainly use Flask-SQLAlchemy because it takes care of a few things when working with a Flask application. However, we can accomplish the same things with SQLAlchemy directly. By the way, I found this great resource on using SQLAlchemy.

First, we need to import the library and create a session object to interact with the database.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
#=================================

db_engine = create_engine(uri, echo=False)

SessionObject = sessionmaker(bind=db_engine)
session = SessionObject()   #this is the handle on the current session

Querying the Database

With our session object ready, the query method is pretty similar to the one using Flask-SQLAlchemy. I will retrieve all records with ticker_id 1, as in the previous example.

session.query(Transaction).filter(Transaction.ticker_id==1).all()

Writing to DB

Once again, the method for writing does not change a lot. Here is a similar example as the one in the previous section. I’m creating a new XRP transaction and writing to the database:

_coin = 'XRP'
new_tx = Transaction(num_coins=100, cost_basis=80)

coin_obj = session.query(Cryptocurrency).filter(Cryptocurrency.ticker==_coin).first()
if coin_obj is None:
    coin_obj = Cryptocurrency(ticker=_coin)
    session.add(coin_obj)

coin_obj.transactions.append(new_tx)
session.add(new_tx)
session.commit()

Maybe the biggest difference between using Flask-SQLAlchemy and SQLAlchemy is the use of app_context in the former. But for the most part, the code requires very few modifications.

Pandas

You may not know this, but pandas can read from an SQL database and load the data as a Dataframe. It is a very convenient tool for analysis and data manipulation.

Pandas accomplishes this with a built-in method named read_sql. As the name implies, it reads an SQL statement and returns a dataframe object. Its counterpart, to_sql, writes a Pandas dataframe to a database. I’ll show you how to use both.

  • Imports
import pandas as pd

Before we perform any CRUD (Create, Read, Update, Delete) operations with Pandas, we need to decide on a database engine to use. Luckily, in this post we have covered two, so we can use Flask-SQLAlchemy or SQLAlchemy for this.

Querying With read_sql()

To read from the database with pandas, we use the method read_sql. We need to get the query as a statement. Like the following:

#getting sql statement using SQLAlchemy
sql_statement = session.query(Transaction).statement

Now I will show how to query using Flask-SQLAlchemy and SQLAlchemy engines.

  • Using Flask-SQLAlchemy Engine
with app.app_context():
    df = pd.read_sql(sql=sql_statement, con=db.session.bind)
  • Using SQLAlchemy Engine
df = pd.read_sql(sql=sql_statement, con=db_engine)

Once again, the difference between the two is subtle. For one, we need to use an application context (app_context) when working with Flask-SQLAlchemy. The con parameter in read_sql is also different between the two.

The two examples above will return the same dataframe of all transactions.

Writing With to_sql()

For this example, I included a sample file transactions.csv in the repository. It contains a few transactions so that we can load them as a dataframe and then write them to the database. You could also create a dataframe without reading from a file. Just make sure that the dataframe column names match the names of the columns in the database table you are writing to.

  • Loading CSV as Dataframe
transactions_df = pd.read_csv("transactions.csv",)

The dataframe will look something like this:

transactions dataframe from reading a csv file. These transactions can be added to a database with SQLAlchemy and Flask
  • Write to the database with to_sql
tablename = 'transactions'

transactions_df.to_sql(
    name      = tablename,
    con       = db_engine,
    if_exists = 'append',
    index     = False, )

As you can see, we call to_sql on the dataframe directly. to_sql needs to know the table we want to write to. In the example above, we wrote to the transactions table. The connection engine used is from SQLAlchemy.

The next parameter to keep in mind is if_exists. It tells the database what to do with the table if it already exists. We selected “append“, which means that our dataframe will be appended to the end of the table. We can also pass “replace” instead. In that case, the table and its data will be dumped and replaced with the new one. Take a look at the documentation for to_sql for more detail.

Finally, the index parameter lets us exclude the index column in the database from being uploaded. Since this column was created automatically by Pandas when we read the CSV file, I did not want it included.

Conclusion

We have seen how we can query an SQL database using Python and SQLAlchemy, Flask-SQLAlchemy, and Pandas. Each method has its advantages. I tend to prefer to read data with Pandas, especially if it is data that I want to manipulate further. Here is the link to the Jupyter notebook I used for this post, and a link to the whole repository.

Let me know what you think in the comments. I appreciate suggestions for new topics, corrections if I made a mistake, or simply relevant discussion. And you can also subscribe to my newsletter. When I work on an interesting project, or find cool and useful information, I will send messages to subscribers. It’s also a good way to stay in touch.

Have anything in mind?