Basic Data Structures

What are Data Structures?

Imagine you want to plot monthly potential temperature from a CESM run on a bar graph. Your first step will be to put the potential temperature data inside a Python data structure. Data structures are how computer programs store information. This information can subsequently be processed, analyzed and visualized.

There are many different types of data structures, depending on the kind of data you wish to store. You will have to choose the data structures that best meet the requirements of the problem you are trying to solve. Fortunately, Python, as a “batteries-included” language, gives you a practical set of data structures to choose from. We will do a minimal exploration of Python data structures, just enough to get you going. For a more complete treatment, see the Data Structures section in the Python documentation.

Python Data Structures

Scientific data can be large and complex and may require data structures designed for scientific programming. We cover those further along, in the Python Scientific Computing Ecosystem guide.

This notebook covers basic Python data structures meant for general-purpose programming, necessary to write programs in any capacity. Choosing the right data structure for the problem you are targeting will help your programs run correctly and efficiently, and make them easier for others to understand.

We will specifically examine three Python data structures: lists and tuples (both of which are Python sequences), and dictionaries.

Sequences

Python offers several data structures to store sequences of information such as hourly mean sea level pressure readings from a weather station, or three-dimensional coordinates describing a location in a climate model. To accommodate storage of such data, Python has a few different choices. We will discuss two of them: lists and tuples.

List

  • A Python list is a sequence of values that are usually the same kind of item.
  • They are ordered, which means items of a list stay in the order in which they are inserted.
  • They can contain strings, numbers, or more complex items.
  • Lists are mutable, which is a fancy way of saying they can be changed after they are created.

Here is a Python list of synthetic yearly average potential temperature data (in degrees Celsius) for 2000 through 2009, assigned to the variable T.

T = [15.344799, 16.299318, 17.322786, 18.373438, 19.443617,
     20.552095, 21.70852, 22.810867, 23.81051, 24.661425]

The list is demarcated with square brackets, the values are comma delimited and assigned to the T variable with the = assignment operator.

What Can You Do with a List?

Python has a set of built-in methods that you can use on lists:

Method      Description
append()    Adds an element at the end of the list
clear()     Removes all the elements from the list
copy()      Returns a (shallow) copy of the list
count()     Returns the number of elements with the specified value
extend()    Adds the elements of a list (or any iterable) to the end of the current list
index()     Returns the index of the first element with the specified value
insert()    Adds an element at the specified position
pop()       Removes and returns the element at the specified position (the last element by default)
remove()    Removes the first item with the specified value
reverse()   Reverses the order of the list in place
sort()      Sorts the list in place

We will examine just a few examples.

  • Add an Item to the End of the List

Continuing with our list of potential temperatures, we want to add a predicted value for the year 2010 to the T list. We can use the append method to add an item to the end of the list. (A method is like a function, but it is called with dot notation on the object it acts on: instead of append(T, 21.589), you write T.append(21.589).)

T.append(21.589)
print(T)
[15.344799, 16.299318, 17.322786, 18.373438, 19.443617, 20.552095, 21.70852, 22.810867, 23.81051, 24.661425, 21.589]

  • Add an Item to the Front of the List

Let’s say we want to add a value for the year 1999 to the list. We can use the insert method to add an item to the list at the location of our choosing, in this case location (or index) 0. (Python sequences start at index 0, not 1 as in MATLAB or Fortran.)

T.insert(0, 14.0)
print(T)
[14.0, 15.344799, 16.299318, 17.322786, 18.373438, 19.443617, 20.552095, 21.70852, 22.810867, 23.81051, 24.661425, 21.589]

  • Change a Value in the List

Let’s say that we want to update the potential temperature for the year 2001 to a value of 16.0. Because the list now starts at 1999, the 2001 value is the 3rd item in the list, which we access with square bracket notation.

T[2] = 16.0 # Remember, 3rd item at index 2 because we start at 0, not 1
print(T)
[14.0, 15.344799, 16.0, 17.322786, 18.373438, 19.443617, 20.552095, 21.70852, 22.810867, 23.81051, 24.661425, 21.589]
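
Several of the other list methods from the table above can be sketched in the same way. The example below is only an illustration: the temps list and its values are hypothetical and separate from the T list, so running it does not affect the results shown elsewhere in this notebook.

```python
# Hypothetical sample values, separate from the T list used above
temps = [15.3, 16.3, 17.3]

temps.extend([18.4, 19.4])      # add several items at once
print(temps)                    # [15.3, 16.3, 17.3, 18.4, 19.4]

last = temps.pop()              # remove and return the last item
print(last, temps)              # 19.4 [15.3, 16.3, 17.3, 18.4]

temps.remove(16.3)              # remove the first item equal to 16.3
print(temps.index(17.3))        # 1
print(temps.count(15.3))        # 1

backup = temps.copy()           # independent shallow copy
temps.sort(reverse=True)        # sort in place, largest first
print(temps)                    # [18.4, 17.3, 15.3]
print(backup)                   # [15.3, 17.3, 18.4]
```

Note that sort changes the list in place, which is why a copy was taken first to keep the original order.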

Tuples

Tuples are also ordered sequences of information, but they are immutable, which means that once they are created, they cannot change. Immutability may seem like a strange concept given that computer programs are constantly manipulating and changing data, but your program becomes easier to understand when you can guarantee that something is unchanging. Tuples tend to contain related items, such as the x and y locations of a point in a Cartesian plane, or the author, title and journal in a scholarly citation.

Here we define a tuple representing a geographic coordinate, expressed as latitude, longitude, and elevation in meters:

location = (40.0, -105.3, 1655.1)

The tuple definition is demarcated with parentheses, the values are comma delimited and assigned to the location variable with the = assignment operator. Because tuples are immutable, unlike lists, there are no operations to change them in-place.
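
As a minimal sketch of what immutability means in practice, the example below tries to assign to an item of the location tuple and catches the resulting TypeError. The new_location name and the 1700.0 value are hypothetical, introduced only for illustration.

```python
location = (40.0, -105.3, 1655.1)

try:
    location[2] = 1700.0        # attempt to change the elevation in place
except TypeError as err:
    print(f'Cannot modify a tuple: {err}')

# To "change" a tuple, build a new one from the old values
new_location = (location[0], location[1], 1700.0)
print(new_location)             # (40.0, -105.3, 1700.0)
print(location)                 # (40.0, -105.3, 1655.1), unchanged
```

The original tuple is untouched; any code still holding location sees the same values as before.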

Built-in Functions for Lists and Tuples

There are several built-in Python functions to examine both lists and tuples. Let’s look at a few. We can find out the length of the tuple or list with the built-in Python len function:

print(len(T))
12

We can also discover the min and max of a sequence:

print(min(T), max(T))
14.0 24.661425
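
The same built-in functions work on tuples, and a few others, such as sum and sorted, are often useful as well. The sketch below assumes a short hypothetical list named sample so that the arithmetic is easy to check by hand.

```python
location = (40.0, -105.3, 1655.1)    # same tuple as defined earlier
print(len(location))                 # 3
print(min(location), max(location))  # -105.3 1655.1

# Hypothetical short list of temperatures, used only for this illustration
sample = [14.0, 16.0, 18.0]
mean = sum(sample) / len(sample)
print(mean)                          # 16.0
print(sorted(sample, reverse=True))  # [18.0, 16.0, 14.0]
```

Unlike the sort method of a list, sorted returns a new list and leaves its argument unchanged, so it can also be applied to a tuple.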

Accessing Data from Lists and Tuples

Python offers a rich variety of options to access values inside lists and tuples, and you will eventually want to understand indexing, slicing and striding expressions. For brevity, we will only examine a couple of examples of getting values out of sequences. Again, note that valid indices for lists and tuples start at 0 and end at the length of the sequence minus 1.

Indexing

Individual items inside the list can be obtained with the square bracket notation. Here we will assign a couple of values from inside the list to two variables: T_1999 and T_2010. We will then print the values with Python 3 formatted string literals (f-strings).

T_1999 = T[0] # index 0 at 1999
T_2010 = T[11] # index 11 at 2010

print(f'Yearly Average Potential Temperature in 1999 was {T_1999} and in 2010 was {T_2010}')
Yearly Average Potential Temperature in 1999 was 14.0 and in 2010 was 21.589

Multiple Assignments for Unpacking Tuples

Python sequences also allow for multiple assignments for unpacking. This trick is quite handy for tuples:

lat, lon, elev = location  # unpacking the tuple
print(f'lat {lat}, lon {lon}, elevation {elev}')
lat 40.0, lon -105.3, elevation 1655.1

Dictionaries

Dictionary data structures are easy to understand because you are already familiar with them. When you look up a word definition in a language dictionary or use an index in the back of a book, you are using a dictionary data structure. Dictionaries are composed of key and value pairs. For example,

Potential temperature - the temperature that a parcel of fluid at pressure $P$ would attain if adiabatically brought to a standard reference pressure ${P_0}$, usually 1000 millibars.

Here, the key is “Potential temperature” and the value is “the temperature that a parcel of fluid at pressure $P$ would attain if adiabatically brought to a standard reference pressure ${P_0}$, usually 1000 millibars.”

Let’s build upon the earlier tuple example by defining a dictionary of ensemble runs. The keys are strings representing the ensemble identifier, and the values are tuples of potential temperatures in degrees Celsius.

ensembles = {
    'member_1': (10.0, 12.89, 13),
    'member_2': (11.2, 12.1, 12.5),
    'member_3': (10.5, 11.9, 14)}

Unlike lists and tuples, dictionary entries are accessed by key rather than by position. Dictionaries preserve insertion order as of Python 3.7, but you should not think of them as sequences: there is no integer index, and you will be using Python dictionary operations to look up the information contained within the dictionary.

What Can You Do with a Dictionary?

Python has a set of built-in methods that you can use on dictionaries:

Method        Description
clear()       Removes all the elements from the dictionary
copy()        Returns a (shallow) copy of the dictionary
fromkeys()    Returns a new dictionary with the specified keys, all set to the same value
get()         Returns the value of the specified key, or a default if the key is missing
items()       Returns a view of the dictionary’s (key, value) pairs
keys()        Returns a view of the dictionary’s keys
pop()         Removes the element with the specified key and returns its value
popitem()     Removes and returns the last inserted key-value pair
setdefault()  Returns the value of the specified key. If the key does not exist: inserts the key, with the specified value
update()      Updates the dictionary with the specified key-value pairs
values()      Returns a view of the dictionary’s values

  • Look up a Value in a Dictionary

Let’s look up the first ensemble member, member_1.

print(ensembles['member_1'])
(10.0, 12.89, 13)

  • Add a Value to a Dictionary

Let’s add a fourth ensemble member member_4 to our ensembles dictionary:

ensembles['member_4'] = (13.0, 10.7, 12.8)
print(ensembles)
{'member_1': (10.0, 12.89, 13), 'member_2': (11.2, 12.1, 12.5), 'member_3': (10.5, 11.9, 14), 'member_4': (13.0, 10.7, 12.8)}
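
A few more of the dictionary methods listed earlier can be sketched as follows. The dictionary is redefined here so that the example is self-contained; the member_5 entry and the member_9 key are hypothetical and exist only for this illustration.

```python
ensembles = {
    'member_1': (10.0, 12.89, 13),
    'member_2': (11.2, 12.1, 12.5),
    'member_3': (10.5, 11.9, 14),
    'member_4': (13.0, 10.7, 12.8),
}

# Membership test and safe lookup with a default for a missing key
print('member_2' in ensembles)          # True
print(ensembles.get('member_9', ()))    # ()

# Iterate over key-value pairs and report the maximum of each member
for name, values in ensembles.items():
    print(name, max(values))

# Merge in another dictionary, then remove an entry and keep its value
ensembles.update({'member_5': (9.8, 11.4, 12.2)})   # hypothetical member
removed = ensembles.pop('member_1')
print(removed)                          # (10.0, 12.89, 13)
print(list(ensembles.keys()))
```

Using get with a default avoids the KeyError that ensembles['member_9'] would raise for a key that is not present.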

Going Further

There are many topics concerning Python data structures that we did not cover in the interest of brevity. We encourage you to research more elaborate indexing, slicing and striding expressions.
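
As a starting point, here is a minimal sketch of negative indexing, slicing, and striding. It uses the ten synthetic values for 2000 through 2009 from the beginning of this notebook, stored under the hypothetical name temps_2000s so that the T list is left alone.

```python
# Synthetic yearly values for 2000 through 2009 (index 0 is the year 2000)
temps_2000s = [15.344799, 16.299318, 17.322786, 18.373438, 19.443617,
               20.552095, 21.70852, 22.810867, 23.81051, 24.661425]

print(temps_2000s[-1])    # last item: 24.661425
print(temps_2000s[:3])    # first three: [15.344799, 16.299318, 17.322786]
print(temps_2000s[7:])    # index 7 to the end: [22.810867, 23.81051, 24.661425]
print(temps_2000s[::5])   # every fifth item: [15.344799, 20.552095]

reversed_copy = temps_2000s[::-1]   # a new list in reverse order
print(reversed_copy[0])             # 24.661425
```

A slice has the form start:stop:step, where the item at stop is excluded and any of the three parts may be omitted. Slices return new sequences, so the original list is not modified.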

Also, we did not cover sets, a data structure composed of unique, unordered values, similar to the keys of a dictionary.
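
For a first impression of sets, the short sketch below removes duplicates from a list and combines two sets. The labels and other names and their contents are hypothetical, chosen only to echo the ensemble example.

```python
# Hypothetical list of ensemble labels containing duplicates
labels = ['member_1', 'member_2', 'member_1', 'member_3', 'member_2']

unique = set(labels)               # duplicates are dropped
print(len(unique))                 # 3
print('member_3' in unique)        # True

other = {'member_3', 'member_4'}
print(unique & other)              # intersection: {'member_3'}
print(sorted(unique | other))      # union, sorted for a stable display
```

Because sets are unordered, the union is passed through sorted before printing so that the display does not depend on internal ordering.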

There are several valuable built-in Python functions that merit study: filter(), map(), and sorted(), to name a few. Lastly, in the “Flow Control” notebook, we will examine Python list comprehensions, which process information inside sequences and dictionaries.
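
To give a flavor of these functions, here is a minimal sketch that applies map, filter, and sorted to a hypothetical list of temperatures. The temps_c values and the 20.0 degree threshold are assumptions made for illustration only.

```python
# Hypothetical temperatures in degrees Celsius
temps_c = [14.0, 16.0, 24.5, 21.5]

# map applies a function to every item (here: Celsius to Kelvin)
temps_k = list(map(lambda t: t + 273.15, temps_c))

# filter keeps only the items for which the function returns True
warm = list(filter(lambda t: t > 20.0, temps_c))

# sorted returns a new sorted list and leaves the input unchanged
ordered = sorted(temps_c, reverse=True)

print(temps_k)
print(warm)       # [24.5, 21.5]
print(ordered)    # [24.5, 21.5, 16.0, 14.0]
```

Both map and filter return lazy iterators in Python 3, which is why their results are wrapped in list before printing.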