Pursuing Data

Article

Python for SQL Developers: Basic Data Types

Python class code

If you've spent a lot of time working in SQL the transition to a different language can seem large. It is definitely the case that at the beginning things can feel slow and unintuitive, but the skills that are developed in thinking about how to work with and manipulate data in SQL will translate when working with Python, it's all just about practice!

Also, when learning something new, doing is much better than reading about it. So while you can keep up by reading through this article, if you'd like to follow along on your own, I've set up some Jupyter notebooks that will allow you to work alongside the article. You'll need a GitHub account to download the files but that's free and easy to set up. You can check out the notebooks here.

Important Differences

Before we get started exploring some of the specifics of Python, it's worth pointing out some key differences between the two languages. Remembering this can help avoid some debugging challenges down the line.

Python is case sensitive

When writing SQL, case usage is often very flexible. How case is used is often more about personal preference or organizational conventions than it is about differences that will affect the outcome of the query. This is NOT the case with Python.

For example, if we created a variable called python that contained the string "A python is a type of snake. It is also a programming language." and then tried to call the variable Python we would get a name error letting us know that we have not created a variable called "Python".

Python is case sensitive

Python starts with 0

When working in SQL, if we wanted to start at the first value in a string, we would reference this using 1. When working with Python, we index starting at 0. To reference the item in the first position in our python variable we should use python[0]. If we use python[1] instead, we will get the value at the second positin, which is a space.

Python starts as zero

Equals is for variables

Depending on the SQL dialect you use, you may be used to using = to evaluate whether two values are equal. This is NOT how this sign works in Python. In Python, if we want to compare whether two values are equal, we use two equals signs. If we use a single equals sign will result in the value being stored in our variable.

Equals is for variables

The "does not equal" sign is also different in Python. Python uses != for does not equal, instead of <> in SQL.

NULL is None

In Python, the default null data type is None. It is important to use the correct capitalization for Python to recognize the null value; without it, Python assumes that none is the name of a variable.

None is the default null value

Code comments are different

In Python, in-line code comments use the # symbol. This is the equivalent of --. For multi-line comments, Python uses three double or single quotes at the beginning and end of the code. This is instead of the /* */ used in SQL.

Basic Data Types

With that brief review of differences, we can now get into some basics of working with Python. The general concepts of data types exists in Python, and there is a general ability to map the SQL data types to Python equivalents.

Here's how they map:

  • string: text, char, varchar
  • integer: numeric (whole numbers only)
  • float: numeric (decimal numbers)
  • boolean: similar to binary but not exact comparison

When working with data types in Python, we can separate them out on two different dimensions, mutability and orderedness. While these concepts are not essential for starting out, they can be helpful to file away for later use.

An object or data type is mutable when the ID assigned to it can be changed. Immutable objects are not able to be changed and are faster to retrieve from memory.

An object or data type is ordered when the items it contains are stored in a specific order and can be retrieved by referencing the order of the object.

Now that we've been introduced to the types and basic characteristics, let's review each in a bit more detail.

Strings

Strings (str) are Python's text data type. Unlike SQL, which has a variety of different text types that assist in storage and retrieval times, Python uses strings to cover every text type. Strings can be as short as one letter (or an empty string) or as long as desired.

Strings are immutable and ordered. They are designated through the use of single or double quotes.

String data types

Strings have some unique functioning in Python in that they can be multiplied and summed to create longer strings. This is equivalent to concatenating with || in SQL.

String functions

They can also be indexed and sliced to return portions of the string. In Python we use square brackets ([]) to indicate that we are indexing or slicing and use a colon (:) to indicate "between" to numbers. This is the equivalent of SUBSTRING and similar functions in SQL.

String slicing and indexing

Integers

Whereas SQL has one basic numeric type, Python has two. The first is the integer (int) that captures whole numbers. (The lack of a decimal point makes it smaller in memory) Both negative and positive numbers are integers. Integers are immutable and unordered (they can't be indexed or sliced).

Integer data types

Python has a similarity to SQL with it's integers in that you can't confirm a data type simply by looking at the value. It's possible for a number to actually be a string.

Data type may not be apparent on inspection

The arithmetic operators in Python are the same as in SQL. The only difference is in how "power" is designated. In Python we use double asterix (**) as the equivalent of the POWER function.

Python math operators

It's also possible to change data types in Python in a similar way to that which is used in SQL. We wrap the object in a function of the name of the new data type.

Data types can be converted

Floats

Floats are Python's other basic numeric data type. It capture decimal numbers, or numbers with a floating point (the exact position of the point in relation to the other numbers is not defined). Floats are also immutable and unordered, as is the case with integers. Any mathematical operation that you can do with integers you can do with floats.

Floats are designated by the presence of a decimal point at the beginning, end, or middle of a number.

Float data types

Floats can be converted to ints in the same way that strings can be converted to ints. The result is similar to using the FLOOR function.

Float converted to int

Boolean

Booleans (bool) are logical values that can be one of two values, True or False. They have a similar function to binary values in SQL but are not a direct equivalent. Booleans are also immutable and unordered. Because Python is case sensitive, it is once again important to use the correct casing for Python to recognize the object.

Boolean data types

The typical function of bools are to act as the result of comparisons. Using the mathematical comparison operators, we can convey whether an expression is true or false.

Boolean comparisons

(If you've been following along in the notebook you may wish to complete this section before reading further) Boolean values can also act in mathematical operations. True takes the value of 1, and False takes the value of 0.

Boolean math

Summary of Equivalencies

To help keep track of the equivalent functions in Python from SQL, at the end of each section I'll provide a summary of the comparions.

  • CONCAT: Joining strings using +
  • SUBSTRING: Indexing and slicing using [] and :
  • Arithmetic functions: The same for Python and SQL
  • POWER: Using **
  • Type conversion: Wrapping an object in the function of the desired data type

Python has an additional set of data types called collections. We'll look at those in the next article!