Basic Input and Output
What is Input and Output?
Python programs run in a larger computing environment. They may read information from the local computer or from across the Internet, and they may output information to the screen or to a file. These are examples of input and output, or “I/O” in computer science parlance. In this notebook, we will cover the basics of Python I/O by examining string formatting and the reading and writing of files. Finally, we will work through a real-world example involving snow depth data.
Strings and String Formatting
Strings
In computer programming, a string is a sequence of letters or other characters. The “CESM” string is a sequence of the characters C, E, S, M. You can print strings to the screen with the print() function. For example:
print("CESM")
Python strings can be enclosed in either single or double quotes. They can hold not just a single word, but any amount of text, such as a paragraph of prose or the content of a web page. They can be assigned to variables just like any other Python data type:
precip = "CESM"
Strings are immutable, so, like tuples, they cannot be changed in place. (See the data structures notebook for a discussion of immutability.) Strings also support many of the same operations as tuples, such as indexing, slicing, and iteration.
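As a minimal sketch of what immutability means in practice, the snippet below indexes and slices a string the way one would a tuple, then shows that item assignment raises a TypeError. The variable names are illustrative only.

```python
model = "CESM"

# Indexing, slicing, and len() work the same way as for tuples.
first = model[0]
last_two = model[-2:]
length = len(model)

# In-place assignment is not allowed because strings are immutable.
try:
    model[0] = "X"
except TypeError as err:
    message = str(err)
    print(f"Cannot modify a string in place: {message}")

# To "change" a string, build a new one instead.
renamed = "W" + model[1:]
print(first, last_two, length, renamed)
```

The original string bound to model is untouched; renamed is a separate, new string.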
Python strings are str objects and support many built-in methods, such as split(), join(), and replace():
Method | Description |
---|---|
capitalize() | Converts the first character to upper case and the rest to lower case |
casefold() | Returns a casefolded copy of the string, suitable for caseless comparison |
center() | Returns a centered string |
count() | Returns the number of times a specified value occurs in a string |
encode() | Returns an encoded version of the string |
endswith() | Returns True if the string ends with the specified value |
expandtabs() | Returns a copy of the string with tabs replaced by spaces, using the specified tab size |
find() | Searches the string for a specified value and returns the position where it was found, or -1 if it was not found |
format() | Formats specified values in a string |
format_map() | Formats specified values in a string, taking them from a mapping |
index() | Searches the string for a specified value and returns the position where it was found; raises ValueError if it was not found |
isalnum() | Returns True if all characters in the string are alphanumeric |
isalpha() | Returns True if all characters in the string are in the alphabet |
isdecimal() | Returns True if all characters in the string are decimals |
isdigit() | Returns True if all characters in the string are digits |
isidentifier() | Returns True if the string is an identifier |
islower() | Returns True if all characters in the string are lower case |
isnumeric() | Returns True if all characters in the string are numeric |
isprintable() | Returns True if all characters in the string are printable |
isspace() | Returns True if all characters in the string are whitespaces |
istitle() | Returns True if the string follows the rules of a title |
isupper() | Returns True if all characters in the string are upper case |
join() | Joins the elements of an iterable into one string, using the string as the separator |
ljust() | Returns a left justified version of the string |
lower() | Converts a string into lower case |
lstrip() | Returns a left trim version of the string |
maketrans() | Returns a translation table to be used in translations |
partition() | Returns a tuple where the string is parted into three parts at the first occurrence of the separator |
replace() | Returns a string where a specified value is replaced with another specified value |
rfind() | Searches the string for a specified value and returns the last position where it was found, or -1 if it was not found |
rindex() | Searches the string for a specified value and returns the last position where it was found; raises ValueError if it was not found |
rpartition() | Returns a tuple where the string is parted into three parts at the last occurrence of the separator |
rsplit() | Splits the string at the specified separator, starting from the right, and returns a list |
rstrip() | Returns a right trim version of the string |
split() | Splits the string at the specified separator, and returns a list |
splitlines() | Splits the string at line breaks and returns a list |
startswith() | Returns True if the string starts with the specified value |
strip() | Returns a trimmed version of the string |
swapcase() | Swaps cases, lower case becomes upper case and vice versa |
title() | Converts the first character of each word to upper case |
translate() | Returns a translated string |
upper() | Converts a string into upper case |
zfill() | Pads the string on the left with zeros until it reaches the specified width |
Remember, strings are immutable so any method that “changes” a string really returns a new string. The original string remains the same.
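A short sketch of a few of these methods in action may help. The sample text is an illustrative assumption; the point is that every call returns a new object and the original string is left as it was.

```python
title = "community earth system model"

# split() returns a list of words (splitting on whitespace by default).
words = title.split()

# join() glues an iterable of strings together, using "" as the separator here.
acronym = "".join(word[0].upper() for word in words)

# upper() and replace() each return a new string.
shouted = title.upper()
swapped = title.replace("earth", "Earth")

print(words)
print(acronym)
print(shouted)
print(swapped)
print(title)  # the original string is unchanged
```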
String Formatting
In any realistic program, you will typically incorporate variable information into a string. Imagine you have a Python program that analyzes radar reflectivity data from thunderstorms and will convey that information to the user. For example, consider this string:
The peak reflectivity of the thunderstorm cell is 50 dBZ.
In this string, the number “50” varies with the data analyzed by the program, while the rest of the string is constant. Python offers several ways to build such strings, but the most convenient and readable is the formatted string literal (often called an f-string), available since Python 3.6.
Python 3 Formatted String Literal
A formatted string literal is prefixed with f. It has a “literal” or unchanging part and zero or more replacement fields denoted by {} curly braces. For example,
print(f"The peak reflectivity of the thunderstorm cell is {50} dBZ.")
The curly braces are swapped out with the evaluated value at runtime. You can put any and all valid Python expressions in them. This allows you to do some nifty things. For brevity, we will not cover this topic in any depth, but we will look at a few examples that examine formatting numbers, a common concern in scientific programming.
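To illustrate expressions inside replacement fields, here is a small sketch. The reflectivity values and the cell name are made up for the example and do not come from any real radar data.

```python
# Hypothetical reflectivity scans (dBZ) for one thunderstorm cell.
reflectivity_dbz = [35.5, 50.0, 42.5]
cell_name = "alpha"

# Function calls and method calls are evaluated inside the braces at runtime.
print(f"The peak reflectivity of the thunderstorm cell is {max(reflectivity_dbz)} dBZ.")

summary = (
    f"Cell {cell_name.upper()} has {len(reflectivity_dbz)} scans "
    f"with a peak of {max(reflectivity_dbz)} dBZ."
)
print(summary)
```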
Decimal Numbers
We will look more closely at formatting involving decimal numbers.
print(f"unity is {1}, e is {2.71828:.2f} and pi is {3.14159:.3f}")
Let’s study the {2.71828:.2f} field. The : signifies the start of the format specification, the .2 sets the precision (in this case, two digits after the decimal point), and the f denotes that we want a decimal number with a fixed number of digits after the decimal point. (Note that the formatted numbers have been properly rounded.)
Scientific Notation
Another common concern in scientific programming is the display of numbers in scientific notation:
print(f'The universal gas constant is {8314.5:.2e} J K-1 kmol-1')
The {:.2e} field is largely the same as the field we described earlier, except that the e denotes that we wish to display the number in scientific notation.
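The following sketch compares a few format specifications side by side, reusing the number from the example above. The variable names are illustrative; the optional width before the precision (the 8 in 8.3f) is an extra feature of the format specification not discussed above.

```python
value = 8314.5

fixed = f"{value:.1f}"       # fixed-point, one digit after the decimal point
sci2 = f"{value:.2e}"        # scientific notation, two digits of precision
sci4 = f"{value:.4e}"        # scientific notation, four digits of precision
small = f"{0.000123:.1e}"    # small numbers get a negative exponent
padded = f"{3.14159:8.3f}"   # total width of 8 characters, padded on the left

for label, text in [("fixed", fixed), ("sci2", sci2), ("sci4", sci4),
                    ("small", small), ("padded", padded)]:
    print(f"{label}: '{text}'")
```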
Reading and Writing Files
Imagine you wish to share the results of your data analysis with the broader scientific community. Your program may have to write data to a file so that it can be uploaded to a data archive, for example. Or perhaps there are data files vital to your research that you want to read into your program so that they can be visualized. In these scenarios, it is essential to know how to write to and read from files.
open() built-in Function
The first order of business is to understand the open() built-in Python function in conjunction with the with...as Python idiom. Imagine you have some data you wish to share with colleagues. You can write the data contained within a hypothetical data variable to the data.txt file in this manner:
with open("data.txt", 'w') as f:
    f.write(data)
Or maybe you have some data you want to analyze. You can read the contents of the data.txt file into the data variable.
with open("data.txt", 'r') as f:
    data = f.read()
The first parameter in the open() function, in this case data.txt, is the path of the file you wish to open. The second, known as the mode, describes how you want to open the file: in particular, whether you want to read its contents or modify them. Common options are r (read only), w (write only, creating the file or truncating it if it already exists), a (append), and r+ (read and write). The mode parameter can be omitted, in which case it defaults to r, read-only mode. Be careful with w mode, as it erases the contents of any existing file with the same name.
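The difference between the modes is easiest to see by trying them. The sketch below works in a temporary directory (an assumption made so that no real files are touched) and shows that a adds to an existing file while w replaces its contents.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")

    # 'w' creates the file (or truncates it if it already exists).
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' keeps the existing contents and writes at the end.
    with open(path, "a") as f:
        f.write("second line\n")

    # Omitting the mode is the same as passing 'r'.
    with open(path) as f:
        after_append = f.read()

    # Opening with 'w' again erases what was there before.
    with open(path, "w") as f:
        f.write("replaced\n")

    with open(path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```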
Here we are using the open() function with the with ... as Python idiom. The purpose of this idiom is to ensure the file is properly closed when you are finished with it. Otherwise, the responsibility of closing the file with the close() method is left to the programmer. The file object, which you will need to get your work done, appears after the as keyword. In this case, the file object is f, and it should be used only within the indented code block following the with ... as statement. In keeping with its batteries-included philosophy, Python closes the file for you when that code block finishes executing.
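For comparison, here is a sketch of the manual approach next to the with ... as idiom. It uses a temporary directory (an assumption, to avoid touching real files) and inspects the closed attribute of the file object to confirm when the file is closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")

    # Manual approach: the programmer must remember to call close(),
    # ideally in a finally block so it runs even if write() fails.
    f = open(path, "w")
    try:
        f.write("written manually\n")
    finally:
        f.close()
    closed_manually = f.closed

    # with ... as: the file is closed automatically when the block ends.
    with open(path, "r") as g:
        open_inside_block = not g.closed
        text = g.read()
    closed_after_block = g.closed

print(closed_manually, open_inside_block, closed_after_block)
```

Note that the name g still exists after the block, but the file it refers to is closed, so further reads would fail.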
Snow Depth Data Exercise: Reading and Writing in Practice
You are assigned to analyze National Water and Climate Center snow depth data from SNOTEL Site 936, Echo Lake, Colorado, USA. Here is a snippet from a file describing the snow data:
Site Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,TOBS.I-1 (degC) ,TMAX.D-1 (degC) ,TMIN.D-1 (degC) ,TAVG.D-1 (degC) ,SNWD.I-1 (in) ,
936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,
936,2016-04-28,, 10.8, 16.8, -4.2, 4.3, -5.3, -1.7, 36,
936,2016-04-29,, 10.9, 17.0, -4.8, -2.7, -5.1, -4.3, 37,
936,2016-04-30,, 11.4, 17.5, -6.0, -2.4, -6.0, -4.6, 43,
936,2016-05-01,, 11.8, 18.0, -7.4, -3.1, -7.5, -5.6, 48,
The SNOTEL data are expressed in comma-separated values (CSV) format with the column headers describing the data. For example, WTEQ.I-1 is “Snow Water Equivalent” in inches, and SNWD.I-1 is “Snow Depth” in inches. (Note, the data are in a mixture of English and metric units.)
Read the Data File
Let’s examine the snow.csv data file by reading it with Python. (Python has a library for handling CSV files, the csv module, but for the purposes of this exercise we will not use it, as it is not really needed.)
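For readers curious about the standard-library alternative, here is a hedged sketch using csv.reader. It reads from an in-memory io.StringIO object holding the header and two rows from the snippet above rather than from the real file, and the skipinitialspace option is our choice for trimming the spaces that follow each comma. The rest of this exercise does not depend on it.

```python
import csv
import io

# In-memory stand-in for the data file, using lines from the snippet above.
sample = io.StringIO(
    "Site Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,TOBS.I-1 (degC) ,"
    "TMAX.D-1 (degC) ,TMIN.D-1 (degC) ,TAVG.D-1 (degC) ,SNWD.I-1 (in) ,\n"
    "936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,\n"
    "936,2016-04-28,, 10.8, 16.8, -4.2, 4.3, -5.3, -1.7, 36,\n"
)

# skipinitialspace=True drops the whitespace that follows each comma.
reader = csv.reader(sample, skipinitialspace=True)

# Keep only rows whose first field is numeric, which skips the header.
rows = [row for row in reader if row and row[0].isdigit()]

for row in rows:
    # The trailing comma on each line yields an empty final field,
    # so the snow depth is the second to last entry.
    print(row[1], row[-2])
```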
We are going to open our CSV snow data file with the with...as Python idiom followed by a nested list comprehension to extract the data:
with open("data/snow.csv", 'r') as file:
    snowdata = [entries for line in file for entries in [line.split(",")]
                if (len(entries) > 0 and entries[0].isdigit())]
List Comprehension Explanation
To read the data into the snowdata variable, we are using a nested list comprehension.
In Python, a list comprehension is a concise way to build a list by processing any iterable, such as a list, tuple, dictionary, or file. List comprehensions take some getting used to, but it is worth your time to understand them, as they make code clear and concise, especially to other Pythonistas.
We will deconstruct this nested list comprehension to better understand it.
The entire list comprehension statement is enclosed in brackets: [entries ... entries[0].isdigit())]. The first part of the list comprehension, for line in file, loops through every line of the file. Each line is processed sequentially into the line string.
The second part of the list comprehension, for entries in [line.split(",")], takes the line we obtained from the first part and splits it (at the commas) into the entries list. For example, this line:
936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,
will be split into the entries list (note the preserved leading spaces and the trailing newline entry):
["936", "2016-04-27", "", " 10.7", " 16.7", " -3.9", " 1.7", " -5.8", " -2.7", " 34", "\n"]
The if (len(entries) > 0 and entries[0].isdigit()) clause denotes that we only want lines with more than zero entries whose first entry is a number (the site ID). This condition keeps only the lines that contain data, skipping both the header and any blank lines.
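If the nested comprehension is hard to follow, it may help to see it unrolled into an ordinary for loop. The sketch below runs both versions on an in-memory sample built from the snippet above (the blank line is added as an assumption, to show that it is skipped) and checks that they agree.

```python
import io

# In-memory stand-in for the data file: header, a blank line, two data rows.
sample_file = io.StringIO(
    "Site Id,Date,Time,WTEQ.I-1 (in) ,PREC.I-1 (in) ,TOBS.I-1 (degC) ,"
    "TMAX.D-1 (degC) ,TMIN.D-1 (degC) ,TAVG.D-1 (degC) ,SNWD.I-1 (in) ,\n"
    "\n"
    "936,2016-04-27,, 10.7, 16.7, -3.9, 1.7, -5.8, -2.7, 34,\n"
    "936,2016-04-28,, 10.8, 16.8, -4.2, 4.3, -5.3, -1.7, 36,\n"
)

# Explicit loop version.
snowdata_loop = []
for line in sample_file:
    entries = line.split(",")
    if len(entries) > 0 and entries[0].isdigit():
        snowdata_loop.append(entries)

# Rewind and run the comprehension version for comparison.
sample_file.seek(0)
snowdata_comp = [entries for line in sample_file for entries in [line.split(",")]
                 if (len(entries) > 0 and entries[0].isdigit())]

print(snowdata_loop == snowdata_comp)
print(len(snowdata_loop))
```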
The end result is a snowdata variable that looks like this:
[['936' '2016-04-26' '' ..., ' 3.5' ' 34' '\n']
['936' '2016-04-27' '' ..., ' -2.7' ' 34' '\n']
['936' '2016-04-28' '' ..., ' -1.7' ' 36' '\n']
...,
['936' '2016-05-24' '' ..., ' 3.3' ' 24' '\n']
['936' '2016-05-25' '' ..., ' 4.9' ' 24' '\n']
['936' '2016-05-26' '' ..., ' 6.1' ' 23' '\n']]
(The list has been abbreviated with ... for clarity.) snowdata is a two-dimensional data structure (a list of lists) with the first dimension representing a row of data from a certain date, and the second dimension representing the individual entries as strings for a row of data. The \n is a newline character that is invisible in the CSV file but visible in the snowdata list. It tells the computer to display subsequent characters on the next line when printing to the screen or writing to a file.
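To make the two-dimensional structure concrete, the sketch below builds a one-row stand-in for snowdata from the 2016-04-27 line shown earlier, then picks out individual entries and converts them from strings to numbers. The variable names other than snowdata are illustrative.

```python
# One-row stand-in for snowdata, based on the 2016-04-27 line of the snippet.
snowdata = [
    ["936", "2016-04-27", "", " 10.7", " 16.7", " -3.9",
     " 1.7", " -5.8", " -2.7", " 34", "\n"],
]

# First index selects the row, second index selects the entry within it.
date = snowdata[0][1]

# Entries are strings; int() and float() ignore surrounding whitespace.
depth_in = int(snowdata[0][-2])
tavg_c = float(snowdata[0][-3])

# The final entry is the newline that ended the line in the file.
ends_with_newline = snowdata[0][-1] == "\n"

print(date, depth_in, tavg_c, ends_with_newline)
```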
Now that we have our SNOTEL data in the snowdata variable, let’s plot it, and write that plot to a file.
Writing Results to File
In this part of the exercise, we will fetch the data in the snowdata variable and create a histogram of snow depth over the month-long interval. From the header information we examined earlier, we know the snow depth field is in the second to last column (remember, the \n newline character is in the last column). We have not yet learned about matplotlib, so we will rely upon our newly acquired knowledge of strings to make a text-based histogram. Each bin of the histogram, built by repeating the - character according to the snow depth, will be on a separate horizontal line of text.
To create our histogram, we will gradually build up a lengthy string as we loop through the snow depth data, adding one histogram bin at a time. Remember, strings are immutable, so every “append” inside the loop would actually create a brand new string, which is inefficient and frowned upon. Instead, there is a preferred (a.k.a. idiomatic) approach to building such strings in Python: create an empty list (which is mutable), append strings to that list, and finally call the string join() method to link all the strings in the list together into one big string. For example,
# create an empty list
storms = []
# append strings to that list
storms.append('hurricane')
storms.append('cyclone')
storms.append('typhoon')
# join the strings together separated by commas
s = ', '.join(storms)
print(s)
With this knowledge of programmatically building long strings, we can create our histogram. As we just described, we will create an empty list called lines and first append the header information. Then we will write a for loop that iterates through the snow data, placing each row in the d variable (a list of the individual row entries), to grab the date (at index 1) and the depth in the second to last column, which we obtain with the negative index trick (d[-2]). Finally, we will create our histogram bins by repeating the “-” character according to the snow depth. Python allows strings to be “multiplied” to repeat them. For example, "Z" * 4 results in "ZZZZ". We will use this tactic to build each bin. Note that we have to convert the snow depth string to an integer with the int() function.
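As a quick sketch of the repetition trick on its own, the snippet below turns one snow depth entry into a bar. The entry value is taken from the sample data above; the variable names are illustrative.

```python
# A snow depth entry as it appears in a row of snowdata (a string with padding).
depth_text = " 34"

# A string cannot be multiplied by another string, so convert to int first.
depth = int(depth_text)
bar = "-" * depth

# Multiplying by zero gives an empty string, so a depth of 0 draws no bar.
empty_bar = "-" * 0

print(f"{depth_text.strip()} {bar}")
print(len(bar), repr(empty_bar), "Z" * 4)
```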
# lines empty list
lines = []
# append the header
lines.append("SNOTEL Site 936, Echo Lake, Colorado, USA")
lines.append("")
lines.append(f"{'date':<12}{'snow depth (inches)':<4}")
# append the snow depth bins
for d in snowdata:
    lines.append(f"{d[1]:<12}{d[-2].strip():<4}{'-' * int(d[-2])}")
# join on newline so that each string in the lines list appears on a new line
histogram = "\n".join(lines)
We also take advantage of more formatting features, namely alignment and width (e.g., {:<12} and {:<4}), to consistently pad the strings so that the header and data align.
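The alignment options can be sketched in isolation as follows. The < character left-aligns within the given width; the > and ^ variants (right-align and center) are shown only for comparison and are not used in the histogram. A trailing | is added so the padding is visible.

```python
left = f"{'date':<12}|"       # text first, then spaces up to width 12
right = f"{'date':>12}|"      # spaces first, then text
centered = f"{'date':^12}|"   # spaces split on both sides

# A value longer than the width is never truncated, only left unpadded.
too_long = f"{'snow depth (inches)':<4}|"

# One histogram-style row: date column, depth column, then a short bar.
row = f"{'2016-04-27':<12}{'34':<4}{'-' * 5}"

for text in (left, right, centered, too_long, row):
    print(text)
```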
We can now print our histogram!
print(histogram)
What does this visual representation of the snow depth data tell you? What conclusions can you draw, or what additional questions do these data raise? What happened on the 17th of May? Finally, we will write the histogram to a file, completing the exercise.
with open("data/histogram.txt", 'w') as f:
    f.write(histogram)