4. Material: Conscious Modules¶
Last Things Before Doomsday¶
The latest material left us at a point where our ability to produce program logic allowed us to make all kinds of programs, at least in theory. So far we've conquered variables, functions, conditional structures, data structures, loops, and a handful of basic types. We've also gone through design and implementation principles, and learned practices that make our code nice and clean. The last of the basics can be found in this material's text stream. We are rising above program logic and will mostly talk about ways of doing things instead. However, the biggest topic of this material is taking advantage of work others have done before us.
A programmer is never alone. Way back in the beginning we noticed that Python comes with quite a few bucketfuls of code. The Internet is also full of more modules like these, and Python even comes with a built-in tool to install them, which we already used to install IPython. In this material we'll take a look at some more modules that come with Python, and even branch out to a few external ones.
This is where the road to "real programming" truly opens up: using existing code allows us to escape from simple command line programs to windowed user interfaces, or to the web, with minimal effort. We're only scratching the surface of graphical user interfaces in this course, and the web is left to the learner's own devices. However, all of that is built from code much like what we've been working with so far. The results may look different, but program logic follows the same rules.
This material also teaches you how to process other files using Python. Direct manipulation of files is not particularly common these days, because it is usually done through some module. Even so, the best way to understand how these modules work is to manipulate files with code written by your very own hands. In some cases a do-it-yourself solution can be smoother than any existing one, or at least more suitable for the specific purpose you have planned.
Making your own tools is in general a good reason to learn programming. Although there are existing tools for almost everything, a self-made precision tool is often more effective in one specific task. When you make your own tools you are aware of all task-specific details, and you can make assumptions that a multipurpose tool cannot afford.
List Cleaning¶
We'll pick up exactly where we left off with the last material. There were two holes in our collection management program: removing and modifying albums. There are also a couple more things we haven't implemented yet, and we'll get to them as well. Let's take care of our unfinished business first, though, and implement removal and editing.
To refresh your memory, here's what we have so far:
Learning goals: How to remove items from lists, and what complications are involved. How to modify items in a list.
Lists on a Diet¶
Currently our program's remove function is a stub function that doesn't do anything. Since the feature is advertised by the user interface, it would probably be nice if it also did something. To achieve this, we first need to look at how to remove items from lists. Lists have a method called remove for this purpose. Here's what its documentation says:

remove(...) method of builtins.list instance
    L.remove(value) -> None -- remove first occurrence of value.
    Raises ValueError if the value is not present.
Some notes:

- Removal is based on value.
- Only one occurrence of the value is removed, even if it appears in the list multiple times.
- The value must match exactly (compare equal) to be removed.

This is straightforward for lists that contain simple values. In our case, though, it would be somewhat unreasonable to require the user to type the entire representation of an album accurately in order to remove it, since each album is a dictionary. How would they even manage that with string inputs?

We can be a bit more merciful and let users select an album to remove by entering the artist's name and the album title. Separate artists can have albums with the same title, but one artist usually doesn't. The exception is if we want to keep track of different editions as well, but then we'd need a field for that too. So basically what we want the action to look like is:

Enter artist name for album to remove: Mono
Enter album title for album to remove: You Are There
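The two behaviours the documentation describes can be checked with a short standalone snippet. These are removal of only the first matching value and the ValueError for a missing value. The list contents here are made up purely for illustration.

```python
# Only the first occurrence of an equal value is removed
days = ["Monday", "Friday", "Monday"]
days.remove("Monday")
print(days)  # ['Friday', 'Monday']

# Removing a value that is not in the list raises ValueError
try:
    days.remove("Caturday")
except ValueError:
    print("Caturday is not in the list")

# The match must be exact: for strings, letter case matters
try:
    days.remove("friday")
except ValueError:
    print("'friday' does not equal 'Friday'")
```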
Getting there is going to need a bit more than just the remove method, but starting with how to use it is still a good idea. Let's look at the following code snippet, which could be found in a terminal version of an internet personality test (one that, authentically, always gives the same result):
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
print("There's seven days in a week")
print(", ".join(week))
removal = input("Which day do you want to remove: ").capitalize()
week.remove(removal)
print("Your week has these days remaining:")
print(", ".join(week))
print("This choice is typical for people with paranormal disorders.")
Because the documentation for remove says it raises an exception if the value isn't in the list, we also need to add exception handling with try:

try:
    week.remove(removal)
except ValueError:
    print("This day does not exist")
    print("Choosing nonexistent days is typical for paranoid individuals")
else:
    print("Your week has these days remaining:")
    print(", ".join(week))
    print("This choice is typical for people with paranormal disorders.")
Using remove by itself is not exactly rocket science. However, removing by an exactly matching value is only one scenario. More often we actually want to remove items that fulfill some condition. In these cases we need to iterate through the list with a for loop and remove matching items as we encounter them. This also conveniently allows removing more than one item. A student who got tired of donkeys everywhere in the material developed the following code snippet and, just to be safe, decided to remove all animals starting with d:

animals = ["walrus", "doggo", "donkey", "llama", "koala", "duck", "moose"]
for animal in animals:
    if animal.startswith("d"):
        animals.remove(animal)
print(", ".join(animals))
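One way to observe what goes wrong in a loop like this is to record every value the loop variable actually takes. The sketch below uses a seven-animal list. It also adds a `visited` list that exists only for bookkeeping and is not part of the original snippet.

```python
animals = ["walrus", "doggo", "donkey", "llama", "koala", "duck", "moose"]
visited = []  # bookkeeping only: every value the loop variable takes

for animal in animals:
    visited.append(animal)
    if animal.startswith("d"):
        animals.remove(animal)

print(visited)  # ['walrus', 'doggo', 'llama', 'koala', 'duck']
print(animals)  # ['walrus', 'donkey', 'llama', 'koala', 'moose']
```

The loop never looks at "donkey" or "moose". Each of them slid into the index of a removed item just after the loop had already passed that index.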
Is there something special about donkeys after all? Unfortunately no, we just like them. However, there is something special about how this code works. The crux of the problem is that the loop removes items from the same list it is iterating through.

Iteration goes through indices from 0 to N-1, where N is the list length. When an item is removed, the indices of all items after it drop by one, while the iteration counter advances on every round. These two things combined cause some items to be skipped. When the item at index 2 is removed, the item at index 3 immediately drops to index 2. Iteration then continues from index 3, i.e. from the item that used to be at index 4! The animation below illustrates what happens:

There's a small inaccuracy in the animation: the loop variable should retain its latest value until it gets assigned a new value on the next iteration. At the end the animal variable would actually still hold the value "duck". We left this out intentionally because in its current form the animation demonstrates the real issue at hand better.

In order to fix this issue, our loop needs to iterate through a copy of the list instead of the original. This way removals from the original list do not affect the sequence we're iterating through. This can be achieved with a very small change to the for loop declaration:

for animal in animals[:]:
The new thing in this line, [:] at the end of the list name, is a variation of something we've already learned. It's actually a slicing operation, the very same we did at the end of the previous material. Previously we always had a number on at least one side of the colon. Because both sides of the colon are now empty, we take a "slice" that contains all items from the list but is still a separate copy. This seemingly small change alters how the entire loop works rather significantly:

This animation contains the same minor inaccuracy as the previous one (the loop variable would be "moose" at the end), but it demonstrates what we wanted it to.

In our catalog program we want to remove albums that match the artist name and album title pair given by the user. The principle is the same as in the animal example. The conditional statement inside the loop is just a bit different. We're also going to iterate over a copy, in case there's more than one album to remove (unlikely, but we haven't specifically prevented duplicates).

def remove(collection):
    print("Fill in the album title and artist name to select which album to remove")
    print("Leave album title empty to quit")
    while True:
        title = input("Album title: ").lower()
        if not title:
            break
        artist = input("Artist name: ").lower()
        for album in collection[:]:
            if album["artist"].lower() == artist and album["album"].lower() == title:
                collection.remove(album)
                print("Album removed")
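The matching logic of the remove function can also be tried without any input calls. In this minimal sketch, `remove_matching` is a hypothetical helper that is not part of the catalog program. It takes the artist and title as parameters and returns how many albums it removed. The albums are sample dictionaries using the same keys as the collection.

```python
def remove_matching(collection, artist, title):
    """
    Removes every album whose artist and title match, ignoring letter case.
    Returns the number of removed albums. (Hypothetical helper for illustration.)
    """
    artist = artist.lower()
    title = title.lower()
    removed = 0
    # Iterate over a slice copy so removals don't shift items we still need to visit
    for album in collection[:]:
        if album["artist"].lower() == artist and album["album"].lower() == title:
            collection.remove(album)
            removed += 1
    return removed


collection = [
    {"artist": "Mono", "album": "You Are There", "no_tracks": 6, "length": "1:00:01", "year": 2006},
    {"artist": "Mono", "album": "You Are There", "no_tracks": 6, "length": "1:00:01", "year": 2006},
    {"artist": "IU", "album": "Modern Times", "no_tracks": 13, "length": "0:47:14", "year": 2013},
]
print(remove_matching(collection, "mono", "YOU ARE THERE"))  # 2
print(len(collection))  # 1
```

Both duplicates disappear because the loop walks over the copy, which still contains every original item.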
We sprinkled some extra lower method calls onto the inputs and into the conditional statement. This makes the comparisons case-insensitive while keeping the names in the collection in their original written form. This function only exits when the user has removed all the albums they wanted to. Let's test it:

This program manages an album collection. You can use the following features:
(A)dd new albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [47:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Slipknot - Iowa (2001) [14] [1:06:24]
-- press enter to continue --
11. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
Make your choice: r
Fill in the album title and artist name to select which album to remove
Leave album title empty to quit
Album title: color crush
Artist name: elris
Album removed
Album title:
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [47:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
Make your choice: q
With this we have implemented all necessary basic features of the program. It can now add albums to the collection, remove them, and show the collection with sorting options.
Renovation¶
Our second objective is to add the ability to edit information for albums in the collection. This allows careless users to fix their errors.
The contents of a list can be modified by changing individual items. This typically involves choosing an item with index subscription. The item is then either replaced by assigning a new value or, if the value is mutable (e.g. a list), changed in place, e.g. with a method. Once again it's important to understand the significant difference between mutable and immutable types. Let's start our investigation with a list that contains strings:

In [1]: animals = ["walrus", "doggo", "donkey", "llama", "koala", "duck", "moose"]
The easiest thing to do is replacing an item with a new one:
In [2]: animals[1] = "elephant"
In [3]: animals
Out[3]: ['walrus', 'elephant', 'donkey', 'llama', 'koala', 'duck', 'moose']
Just like when modifying the values in a dictionary using keys, the left side of the assignment denotes where in the list the new value should go, and the new value itself goes on the right. The index subscription is on the left side, which makes it the target of the assignment. Replacing an existing value with a new one is the only way to modify immutable values (like strings) in a list. Other attempts fail:

In [4]: animals[2].upper()
Out[4]: 'DONKEY'
In [5]: animals
Out[5]: ['walrus', 'elephant', 'donkey', 'llama', 'koala', 'duck', 'moose']
Once again, a string method only returns a new, modified string, leaving the original as it was. If we want to apply the upper method to the "donkey" value in the list, the result has to be assigned back to the same index:
In [6]: animals[2] = animals[2].upper()
In [7]: animals
Out[7]: ['walrus', 'elephant', 'DONKEY', 'llama', 'koala', 'duck', 'moose']
Another notable fact about how variables work in this context:

In [8]: animal = animals[3]
In [9]: animal = "bear"
In [10]: animals
Out[10]: ['walrus', 'elephant', 'DONKEY', 'llama', 'koala', 'duck', 'moose']
This is where it becomes extremely important to understand what it means when we say that a variable is a reference to a value. When created, the new animal variable does indeed refer to the item "llama" inside the list. However, as soon as we assign a new value to it, the variable refers to the new value "bear". The reference to "llama" within the list does not change. The only connection the two ever had was referring to the same value.

However, if we have a list that contains lists, things work differently because lists are mutable. This example shows some Blackjack starting hands in a list.

In [1]: hands = [["A", "8"], ["5", "7"], ["3", "10"]]
Swapping a hand for another one is done exactly like we did with strings:

In [2]: hands[2] = ["4", "8"]
In [3]: hands
Out[3]: [['A', '8'], ['5', '7'], ['4', '8']]
Things get different if we want to change one item with a method, e.g. draw one more card. In this case assignment is not needed:

In [4]: hands[0].append("5")
In [5]: hands
Out[5]: [['A', '8', '5'], ['5', '7'], ['4', '8']]
This is because lists are mutable, and therefore the append call directly modifies the list that exists in memory. Meanwhile, the inner lists contain strings, so the rules that apply to strings apply here as well. If we want to change a card to another, it has to be done through assignment:

In [6]: hands[2][0] = "9"
In [7]: hands
Out[7]: [['A', '8', '5'], ['5', '7'], ['9', '8']]
This also shows new syntax: how to do subscription to a list within a list. There's now another pair of square brackets after the first subscription. The first subscription returns a reference to the inner list. The second subscription selects an index in the inner list to replace with a new value.

If we create a new variable to refer to one of the list items, and then assign a new value to the same variable, we get the same result as we did when working with strings:

In [8]: hand = hands[0]
In [9]: hand = ["10", "5"]
In [10]: hands
Out[10]: [['A', '8', '5'], ['5', '7'], ['9', '8']]
However if we instead do the same assignment from the list to a variable but then append to the variable:
In [11]: hand = hands[0]
In [12]: hand.append("2")
In [13]: hands
Out[13]: [['A', '8', '5', '2'], ['5', '7'], ['9', '8']]
There is a perfectly logical explanation for this result. The assignment hand = hands[0] does not copy anything: hand and hands[0] now refer to the very same list, so appending through either name modifies that one list. The same thing happens with copies made by slicing. The statement

copy = hands[:]

does create a new list. However, this new list is just a different set of references to the same mutable lists as the original. Actions that modify the new list directly (e.g. appending another hand to it) only affect the copy. Actions that modify the nested lists, on the other hand, also sneakily affect the original list. The inner lists are not copied at all, so only one of each of them exists, and they are simply referred to from two different lists. This is important to keep in mind, especially when stomping mines.

Armed with this information we can attack the edit function of our catalog program. Truth be told, it's more challenging to select the album and the field to modify than it is to actually perform the edit. We already solved the first problem in the remove function: the user inputs the album title and artist name. Let's copy this part from the remove function:

def edit(collection):
    print("Fill in the album title and artist name to select which album to remove")
    print("Leave album title empty to quit")
    while True:
        title = input("Album title: ").lower()
        if not title:
            break
        artist = input("Artist name: ").lower()
        for album in collection[:]:
            if album["artist"].lower() == artist and album["album"].lower() == title:
                collection.remove(album)
                print("Album removed")
When editing, we also need to ask which field should be modified and what the new value should be. These changes apply to the for loop. While we're at it, we can also edit the prints to replace mentions of removing with mentions of editing. Because this code is pretty good for selecting an album, let's place the field editing into its own function. We can call it edit_fields. We replace the list remove method call with a call to this function, which we'll implement immediately after.

def edit(collection):
    print("Fill in the album title and artist name to select which album to edit")
    print("Leave album title empty to quit")
    while True:
        title = input("Album title: ").lower()
        if not title:
            break
        artist = input("Artist name: ").lower()
        for album in collection[:]:
            if album["artist"].lower() == artist and album["album"].lower() == title:
                edit_fields(album)
                print("Album edited")
This looks very similar to the remove function. In this case, though, it would be somewhat hard to turn them both into one function where parameters dictate what actually happens to the selected album (and what verb to use in prints). We could fix this by restructuring the program a bit, e.g. by choosing the album first and then choosing the action to take.

Instead of merging these two functions right now, though, we're just going to move on to implementing the edit_fields function. This function is used for editing the fields of one album in the collection. We can once again use a while loop where the user chooses fields until they've done all they wanted.

def edit_fields(album):
    print("Current information:")
    print("{artist}, {album}, {no_tracks}, {length}, {year}".format(**album))
    print("Choose a field to edit by entering its number. Leave empty to stop.")
    print("1 - artist")
    print("2 - album title")
    print("3 - number of tracks")
    print("4 - album length")
    print("5 - release year")
    while True:
        field = input("Select field (1-5): ")
        if not field:
            break
        elif field == "1":
            album["artist"] = input("Artist name: ")
        elif field == "2":
            album["album"] = input("Album title: ")
        elif field == "3":
            album["no_tracks"] = prompt_number("Number of tracks: ")
        elif field == "4":
            album["length"] = prompt_time("Album length: ")
        elif field == "5":
            album["year"] = prompt_number("Release year: ")
        else:
            print("Field does not exist")
Most of this function is old news. Because the collection list contains mutable values (dictionaries), modifying an album is just a matter of assigning new values to dictionary keys, and the results are reflected in the collection. We need separate branches in the conditional structure for each field because some of the values are prompted using different functions. A test run to show it works:

This program manages an album collection. You can use the following features:
(A)dd new albums
(E)dit albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: e
Fill in the album title and artist name to select which album to edit
Leave album title empty to quit
Album title: modern times
Artist name: iu
Current information:
IU, Modern Times, 13, 0:47:14, 2013
Choose a field to edit by entering its number. Leave empty to stop.
1 - artist
2 - album title
3 - number of tracks
4 - album length
5 - release year
Select field (1-5): 4
Album length: 32:14
Select field (1-5):
Album edited
Album title:
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [32:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Slipknot - Iowa (2001) [14] [1:06:24]
-- press enter to continue --
11. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
Make your choice: q
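Editing works even though the edit function loops over the copy collection[:], and the reason is worth checking directly. The slice copy contains references to the same dictionaries, so assigning to a key of the loop variable changes the dictionary that the collection also holds. This is a small sketch with one sample album. The new length value is written directly as a string here instead of going through prompt_time.

```python
collection = [
    {"artist": "IU", "album": "Modern Times", "no_tracks": 13, "length": "0:47:14", "year": 2013},
]

for album in collection[:]:
    # album refers to the same dictionary object that the original list holds,
    # even though we are looping over a copy of the list
    if album["artist"] == "IU":
        album["length"] = "0:32:14"

print(collection[0]["length"])            # 0:32:14
print(collection[:][0] is collection[0])  # True: the slice copies references, not dictionaries
```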
Now that we've implemented the features we intended to do last time, we can actually move on to new challenges. It just never ends. Until it ends.
Files Open Up to the Coder¶
The very first thing we need to fix is the most glaring flaw in our collection manager: the collection cannot be saved on exit, or loaded on startup for that matter. So far the collection has just been written directly into the program code, which is obviously not very good programming. It'd be much better to save it into a separate file.
In order to achieve this, we'll take a look at how Python handles text files. There are two stages to this project: writing data into a file, and reading it from there. Because writing is actually simpler, we'll start with that. Just to recap, the code below is what we have right now.
Learning goals: This section should teach you how to open files, write into them and read their contents. You should also pick up some philosophy about the kinds of complications involved in processing text-based data.
The Short Philosophy of Files¶
Our save function is a bit on the short side at the moment:

def save_collection(collection):
    """
    Saves the collection, one day in the future
    """
    pass
In the near future this function should vomit the contents of the collection list into a text file in a way that allows it to be loaded again later. In general, deciding on a suitable data format for files is one of the biggest problems involved. The process of saving itself is quite straightforward.

When saving a list that contains dictionaries, one typical choice is to save one row per dictionary. In our example this would mean putting the data of one album on one row in the text file. Note that the keys are not saved at all. Returning values to the correct keys is done in the loading function. Another common practice is to use a separator to separate each field's value from the others. We've actually done something quite similar before:

Input value and unit to convert: 12 yd
12 yd is 10.97 m
In this example, a space was used as a separator. Because the computer cannot know which characters are data and which are separators, it's best to choose a separator that cannot be present in the data (if possible). The comma is a very common separator, to the extent that there is a commonly used file format called CSV (comma-separated values). If we look at our data we can see that it doesn't contain any commas, at least at the moment.
The risk definitely exists, though, because there are no rules in the world of music against having commas in artist or album names. Then again, a separator that could not, in theory, appear in a name doesn't even exist. We predict right now that we'll need a more sophisticated data format further down the line. But for what we have right now, commas will do the trick. So let's just agree that for now the collection will be saved into a text file that looks like this:

Alcest, Kodama, 6, 0:42:15, 2016
Canaan, A Calling to Weakness, 17, 1:11:17, 2002
Deftones, Gore, 11, 0:48:13, 2016
...
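A sketch like this one makes both the format and the risk concrete. It turns one album dictionary into a row with a format template and splits the row back into fields. It also shows what happens when a name contains the separator. The second album and its artist are made up for illustration.

```python
TEMPLATE = "{artist}, {album}, {no_tracks}, {length}, {year}"

album = {"artist": "Alcest", "album": "Kodama", "no_tracks": 6, "length": "0:42:15", "year": 2016}
row = TEMPLATE.format(**album)
print(row)              # Alcest, Kodama, 6, 0:42:15, 2016
print(row.split(", "))  # ['Alcest', 'Kodama', '6', '0:42:15', '2016']

# Hypothetical album whose title contains the separator
tricky = {"artist": "Example Artist", "album": "Hello, Goodbye",
          "no_tracks": 1, "length": "0:03:00", "year": 2000}
fields = TEMPLATE.format(**tricky).split(", ")
print(fields)  # ['Example Artist', 'Hello', 'Goodbye', '1', '0:03:00', '2000']
```

The tricky row splits into six fields instead of five. Even in the normal case, the number of tracks and the year come back as the strings '6' and '2016', not as integers.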
This format is easy to read with just the string split method. As it turns out, everything that is read from a text file is read as strings. In real life we have no guarantee that any given string will not appear in an artist or album name. This means we eventually need a reading solution that doesn't rely just on split. Still, we have to start somewhere.

Open Book¶
Files are opened with the open function for both reading and writing. An opened file can be assigned to a variable. This variable acts as a file handle, which is used (figuratively) to grab the file.

handle = open("myfile")
When opening files, the open function must be told how the file will be opened (i.e. for reading or writing). If a file is opened for reading, it must exist. If it is opened for writing and didn't exist before, it is created when it is opened. Depending on the exact opening mode, an existing file with the same name is either emptied so that the new contents replace the old, or new contents are written to the end of it. Shown below are the three basic ways to use open:

read = open("donkey.txt")
write = open("donkey.txt", "w")
append = open("donkey.txt", "a")
This also shows that reading is the default mode (it would be "r", but it's not needed). The other two are "w" (write) and "a" (append). The latter is familiar from the list append method, and similarly it means adding something to the end. However, files are not normally opened using the above syntax. Instead, the with statement is used. It looks like this:

with open("donkey.txt") as read:
This statement must be followed by an indented block which contains all statements that operate on the opened file. When the with statement is used, the target variable that the file handle is assigned to is found on the right side of the as keyword. On the left side of the keyword is the statement that opens the file, usually a call to the open function.

The advantage of the with statement is that the file is closed automatically once the block is finished. This happens whether all of the block's statements are executed or the block is interrupted by an exception. Otherwise it would be the programmer's responsibility to make sure the file handle's close method gets called, and to recognize all situations where the file needs to be closed.

In addition to opening the file using the with statement, exceptions related to actually opening the file must also be taken into consideration. This is the topic of the next task.
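The opening modes and the automatic closing can both be checked in a short script. To avoid touching any real files, this sketch writes to a throwaway directory created with the standard tempfile module, which the material itself doesn't use. The file's closed attribute tells whether a handle has been closed.

```python
import os
import tempfile

# A throwaway file in a fresh temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), "donkey.txt")

with open(path, "w") as target:   # "w" creates the file, or empties an existing one
    target.write("first\n")

with open(path, "w") as target:   # opening with "w" again discards "first"
    target.write("second\n")

with open(path, "a") as target:   # "a" keeps the old contents and writes to the end
    target.write("third\n")

with open(path) as source:        # default mode "r"
    contents = source.read()

print(repr(contents))  # 'second\nthird\n'
print(source.closed)   # True: the with statement closed the file

# The file is closed even when the block is interrupted by an exception
try:
    with open(path) as source:
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass
print(source.closed)   # True
```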
Because the open function call is in the with statement, the whole thing needs to be wrapped in a try:

try:
    with open("donkey.txt") as read:
        content = read_file(read)
except ??:
    print("Unable to open the file.")
Of course you need to replace ?? with the exception you found in the exercise. In normal cases, opening a file will always look like this.

Data Storage¶
Now that we know all we need to know about opening text files, we can finally write something into them. Let's start by adding the above basic structure into our save_collection function:

def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            pass
    except IOError:
        print("Unable to open the target file. Saving failed.")
We also sneakily added a second parameter to this function. We did this with the foresight that in the future we might want to manage multiple collections, each stored in its own file, and even allow the user to choose which files to use. For this reason it's a good idea to handle the filename as a variable from the start, so this part of the code does not need to be changed later. In general, the fewer assumptions a tool function makes, the better.

At this point all we need to do is replace the pass statement with actual code that writes into the file. To recap, our plan was to write one row for each item in the collection list. We encountered a similar problem when we were printing the collection in the last material. The solution could be this familiar loop:

for album in collection:
The act of writing itself is carried out by the file handle's write method. It works kind of like print does, but there are two key differences. First of all, write doesn't accept anything except strings:

In [1]: with open("donkey.txt", "w") as target:
   ...:     target.write(5)
   ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-174567876ea2> in <module>()
      1 with open("donkey.txt", "w") as target:
----> 2     target.write(5)
      3
TypeError: write() argument must be str, not int
This means that everything written to a file needs to be converted to a string first, one way or another. A simple way is to use the str function, but often it's more convenient to shape the entire row into a single string with the format method. This also lets us tread on familiar ground, since the edit_fields function already printed an album's information this way:

for album in collection:
    print("{artist}, {album}, {no_tracks}, {length}, {year}".format(**album))
Now we just need to replace the print with the write method:
def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            for album in collection:
                target.write("{artist}, {album}, {no_tracks}, {length}, {year}".format(**album))
    except IOError:
        print("Unable to open the target file. Saving failed.")
Done? Almost.
We notice that write doesn't do the kind of magic print does with newlines. Print produces them automatically, but it turns out write does not. This means we have to add the newline to the string manually, but how? We go all the way back to the second material, where we saw a special use for the \ character:

print("Inch (in or \")")

The backslash was called an escape character, and it meant that the next character would be interpreted in a special way. So far we've used it to include a string delimiter character inside the string itself without breaking anything. However, there's a second use as well: when the escape character precedes certain normal letters, the combination gets a special meaning. The three most common are n, r and t, but especially n, for newline. \n produces a newline character when it's inside a string. \t produces a tab, and \r (carriage return) just kind of makes you go "why?".

Anyway, this means there's an easy solution to our problem. Just include a newline character in the format template string.
def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            for album in collection:
                target.write("{artist}, {album}, {no_tracks}, {length}, {year}\n".format(**album))
    except IOError:
        print("Unable to open the target file. Saving failed.")
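The difference between write and print when it comes to newlines can be checked without touching the disk. This sketch uses io.StringIO from the standard library, an in-memory object that behaves like a text file opened for writing; the material itself doesn't use it. It also relies on print's file parameter, which sends printed text to a file object instead of the screen.

```python
import io

# write adds nothing by itself: rows run together
without_newlines = io.StringIO()
without_newlines.write("Alcest, Kodama")
without_newlines.write("Deftones, Gore")

# Adding \n to each string produces one row per line
with_newlines = io.StringIO()
with_newlines.write("Alcest, Kodama\n")
with_newlines.write("Deftones, Gore\n")

# print can also write to a file object, and it adds the newline on its own
printed = io.StringIO()
print("Alcest, Kodama", file=printed)
print("Deftones, Gore", file=printed)

print(repr(without_newlines.getvalue()))  # 'Alcest, KodamaDeftones, Gore'
print(repr(with_newlines.getvalue()))     # 'Alcest, Kodama\nDeftones, Gore\n'
print(with_newlines.getvalue() == printed.getvalue())  # True
print(len("\n"))  # 1: the two-character escape sequence is a single character
```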
Now we can just add a second argument to the function call at the end of the main program:

save_collection(collection, "collection.txt")

The file extension here doesn't really matter to us, but Windows likes to use it to sniff out which program should open the file by default. We used .txt so Windows opens this file with a text editor. Now we can save the collection by running and quitting the program once.

Reading Club¶
Now that the collection is safely saved, we can think about how to load it. As was hinted earlier, the split method will be used. For the time being we're just going to trust that there aren't any names in the collection that contain commas. The file is opened in reading mode this time, but otherwise the code that opens and reads the file looks very similar to the previous function:

def load_collection(filename):
    # The order of values in each row corresponds to dictionary keys:
    # 1. "artist" - artist name
    # 2. "album" - album title
    # 3. "no_tracks" - number of tracks
    # 4. "length" - album length
    # 5. "year" - release year
    collection = []
    try:
        with open(filename) as source:
            pass
    except IOError:
        print("Unable to open the source file. Starting with an empty collection.")
    return collection
We start by initializing the collection as an empty list. If the given file cannot be opened, this allows us to start with an empty collection by returning an empty list. We also added the filename parameter to this function for the same reason we did for the save function.

File contents are mostly read with two different file handle methods: read and readlines. The first one reads the entire contents of the file into a single string, whereas the latter gives a list where each row in the file is an item. In most cases the latter is more useful, especially whenever one row corresponds to one unit of data. Picture a file like this:

Eeyore, depression
Pooh, eating disorder
Piglet, anxiety
Rabbit, OCD
Christopher Robin, schizophrenia
Tigger, ADHD

Let's call this file "pooh.txt".
In [1]: with open("pooh.txt") as pooh:
   ...:     contents = pooh.read()
   ...:
In [2]: contents
Out[2]: 'Eeyore, depression\nPooh, eating disorder\nPiglet, anxiety\nRabbit, OCD\nChristopher Robin, schizophrenia\nTigger, ADHD'
In [3]: with open("pooh.txt") as pooh:
   ...:     contents = pooh.readlines()
   ...:
In [4]: contents
Out[4]:
['Eeyore, depression\n',
'Pooh, eating disorder\n',
'Piglet, anxiety\n',
'Rabbit, OCD\n',
'Christopher Robin, schizophrenia\n',
'Tigger, ADHD']
Line breaks were added to the output manually for readability. As advertised, the first method produces one string, and the latter produces a list of strings. We can also see that the result of readlines is not exactly the same as using split on the string produced by read:
In [5]: with open("pooh.txt") as pooh:
   ...:     contents = pooh.read().split("\n")
   ...:
In [6]: contents
Out[6]:
['Eeyore, depression',
'Pooh, eating disorder',
'Piglet, anxiety',
'Rabbit, OCD',
'Christopher Robin, schizophrenia',
'Tigger, ADHD']
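One detail the session above doesn't show: judging by Out[2], this pooh.txt had no newline after its last row. Files written row by row, like our collection file, usually do end with one, and then split("\n") leaves an empty string at the end of the list. The sketch below compares the options on such a file; the file contents and the temporary location are made up for the example, and the str method splitlines is shown as one way to avoid the empty item.

```python
import os
import tempfile

# Hypothetical sample file that ends with a newline, like files written row by row
text = "Eeyore, depression\nPooh, eating disorder\nPiglet, anxiety\n"
path = os.path.join(tempfile.mkdtemp(), "pooh.txt")
with open(path, "w") as target:
    target.write(text)

with open(path) as source:
    whole = source.read()           # one string with the newlines inside it
with open(path) as source:
    rows = source.readlines()       # one item per row, newlines kept

by_split = whole.split("\n")        # newlines removed, but the last item is ""
by_splitlines = whole.splitlines()  # newlines removed, no empty last item

print(rows)
print(by_split)
print(by_splitlines)
```

Stripping each row from readlines gives the same result as splitlines, which is why strip is the usual companion of readlines.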
As you can see, readlines leaves the newline characters where they were. They are easy to remove later with the strip method, which can also be used to remove unnecessary spaces from the data.
As we said earlier, further processing often consists of splitting the rows with a predefined separator, which gives us access to individual items. These can then be transferred to a list or a dictionary. Usually the result of readlines is iterated over in a for loop:
with open("pooh.txt") as pooh:
    for row in pooh.readlines():
        read_row(row)
Likewise, the results are often appended to some list:
patients = []
with open("pooh.txt") as pooh:
    for row in pooh.readlines():
        patients.append(read_row(row))
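The read_row function in these snippets is just a placeholder. Here is one possible version for the pooh.txt data, written purely for illustration: the function body and the dictionary keys "name" and "condition" are assumptions, not part of the course program. To keep the example self-contained, io.StringIO stands in for the opened file, since it offers the same readlines method.

```python
import io


def read_row(row):
    """Hypothetical helper: turn 'name, condition\\n' into a dictionary."""
    name, condition = row.split(",")
    # strip removes both the space after the comma and the trailing newline
    return {"name": name.strip(), "condition": condition.strip()}


# io.StringIO behaves like a file opened for reading, so no real file is needed
pooh = io.StringIO("Eeyore, depression\nPooh, eating disorder\nTigger, ADHD")

patients = []
for row in pooh.readlines():
    patients.append(read_row(row))

print(patients)
```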
This is the most typical way to read simple data back into a list inside the program. The entire point of saving and loading is that a program saves data in a way that allows it to restore its state, or the state of its data.
Of course this process can become quite complex: think about how video games save their data. For instance, how much information needs to be saved to reproduce the game state of a massive open world game like Skyrim? Luckily we don't need to worry about that, because we have our own relatively simple case at hand. Let's apply the above code to our program:
def load_collection(filename):
    # The order of values in each row corresponds to dictionary keys:
    # 1. "artist" - artist name
    # 2. "album" - album title
    # 3. "no_tracks" - number of tracks
    # 4. "length" - album length
    # 5. "year" - release year
    collection = []
    try:
        with open(filename) as source:
            for row in source.readlines():
                collection.append(read_row(row))
    except IOError:
        print("Unable to open the target file. Starting with an empty collection.")
    return collection
This function doesn't show how individual rows are handled. That problem has been postponed to the implementation of the read_row function.
Before actually implementing that function, it's a good idea to examine what kinds of potential issues lie in wait when a line of text is read into a list with split. Most of the problems are familiar from splitting inputs in the last material.
In a way we can trust files a bit more than user inputs, because at least in theory they've been produced by the same program that reads them. However, we need to acknowledge the fact that some hackerman can edit them manually.
So our goal is to turn this:
"Agalloch, The Mantle, 9, 1:08:36, 2002\n"
into this:
{
    "artist": "Agalloch",
    "album": "The Mantle",
    "no_tracks": 9,
    "length": "1:08:36",
    "year": 2002
}
Let's demonstrate this whole process in the Python console in order to make the intermediate steps visible. Everything begins with a split, because we have no other way to access parts of the string individually.
In [1]: row = "Agalloch, The Mantle, 9, 1:08:36, 2002\n"
In [2]: parts = row.split(",")
In [3]: parts
Out[3]: ['Agalloch', ' The Mantle', ' 9', ' 1:08:36', ' 2002\n']
Here we can see that the spaces after commas, which are considered good practice when writing code, cause extra spaces to appear in the data, and those should be removed. The extra newline character should also be dealt with.
In [4]: for i, part in enumerate(parts):
   ...:     parts[i] = part.strip()
   ...:
In [5]: parts
Out[5]: ['Agalloch', 'The Mantle', '9', '1:08:36', '2002']
This also shows a trick we haven't used before: how to change the contents of a list that contains immutable values inside a loop. The basic principle is to make a modified copy of the item stored in the loop variable, and replace the original item with it. This must be done with subscription: if we tried part = part.strip(), we would only rebind the loop variable part to the new string, and the item stored in parts[i] would stay unchanged.
So far we've achieved:
['Agalloch', 'The Mantle', '9', '1:08:36', '2002']
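The difference between rebinding the loop variable and assigning through subscription is easy to check. A minimal sketch using the same split result as above:

```python
parts = ["Agalloch", " The Mantle", " 9", " 1:08:36", " 2002\n"]

# Assigning to the loop variable only rebinds the name "part";
# the list itself is left untouched.
for part in parts:
    part = part.strip()
print(parts)  # still contains the spaces and the newline

# Assigning through subscription replaces the item stored in the list.
for i, part in enumerate(parts):
    parts[i] = part.strip()
print(parts)
```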
All that's left is to convert two items into integers. To keep this example brief, we'll do the conversion directly in the list, but in practice it should be done when the list is turned into a dictionary.
In [8]: parts[2] = int(parts[2])
In [9]: parts[4] = int(parts[4])
In [10]: parts
Out[10]: ['Agalloch', 'The Mantle', 9, '1:08:36', 2002]
Converting this list into a dictionary is done in the actual program example. As a whole this process wasn't quite simple enough to be done with just a couple of lines. At this point it should be clear why it's best to do this in a separate function. Next we have a fun little game: what can go wrong?
The way in which we tried to figure out whether there are too many items in the list isn't very good. It's much better to notice immediately when splitting that there aren't enough values (or that there are too many). For this reason we could actually split directly into five variables. This way we also don't need to remember which index is which field.
def read_row(row):
    try:
        artist, album, no_tracks, length, year = row.split(",")
    except ValueError:
        print(f"Unable to read row: {row}")
The loop we used earlier to do the stripping cannot be used in this solution, because we no longer have a list. However, two fields out of five need special treatment anyway, so we don't lose much by processing each variable individually. The values have to be inserted into a dictionary as well, and that is easiest to do at this stage. We can do this by creating a new dictionary whose values are derived from the above variables.
def read_row(row):
    try:
        artist, album, no_tracks, length, year = row.split(",")
        album = {
            "artist": artist.strip(),
            "album": album.strip(),
            "no_tracks": int(no_tracks),
            "length": length.strip(),
            "year": int(year)
        }
    except ValueError:
        print(f"Unable to read row: {row}")
Coincidentally, ValueError is what we get if the result of split can't be unpacked into exactly five variables, and it's also the exception int raises if it cannot convert a value. Lucky us, we get away with one except. We also notice that int doesn't give two hoots about whitespace (e.g. spaces, tabs, newlines) at the beginning or end of a string.
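To see both sources of ValueError for yourself, here is a small experiment. try_unpack is a throwaway helper made up for this demonstration, not part of the collection program; one test row borrows an album title that contains commas to show what happens then. The exact wording of the error messages may vary between Python versions.

```python
def try_unpack(row):
    """Hypothetical helper: return the five fields, or None if unpacking fails."""
    try:
        artist, album, no_tracks, length, year = row.split(",")
    except ValueError as error:
        print(f"ValueError: {error}")
        return None
    return artist, album, no_tracks, length, year


try_unpack("Agalloch, The Mantle, 9, 1:08:36")          # not enough values
try_unpack("Grails, Volumes 4, 5 & 6, 12, 50:36, 2013")  # too many values

print(int(" 2002\n"))  # whitespace around the number is ignored
try:
    int("20 02")       # whitespace in the middle is not
except ValueError as error:
    print(f"ValueError: {error}")
```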
The only things left to do are returning the dictionary and deciding what to do when an unreadable row is encountered. At the moment we only print an error message along with the row that caused the problem. However, we also need to somehow tell the load_collection function not to append anything (it would append None otherwise).
We have at least three ways to deal with this problem:
- read_row returns a value that indicates there was a problem, and load_collection uses a conditional statement to check the returned value.
- read_row leaves the ValueError exception uncaught, and we handle it in load_collection instead.
- the collection is handed over to the read_row function as a second argument, making read_row responsible for deciding when to append.
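For comparison, the first option could look roughly like the sketch below. It isn't the version the program ends up using: load_rows is a hypothetical stand-in for load_collection that takes a list of rows instead of a filename, so the example runs without a file.

```python
def read_row(row):
    """Option 1: return the album dictionary, or None if the row is broken."""
    try:
        artist, album, no_tracks, length, year = row.split(",")
        return {
            "artist": artist.strip(),
            "album": album.strip(),
            "no_tracks": int(no_tracks),
            "length": length.strip(),
            "year": int(year),
        }
    except ValueError:
        print(f"Unable to read row: {row}")
        return None


def load_rows(rows):
    """Hypothetical caller: checks the returned value before appending."""
    collection = []
    for row in rows:
        album = read_row(row)
        if album is not None:
            collection.append(album)
    return collection


collection = load_rows([
    "Agalloch, The Mantle, 9, 1:08:36, 2002\n",
    "this row is broken\n",
])
print(collection)
```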
All of these are perfectly valid. There are many contributing factors to choosing between them. In this case we'll show how to implement the last option. This calls for some changes to the read_row function.
def read_row(row, collection):
    try:
        artist, album, no_tracks, length, year = row.split(",")
        album = {
            "artist": artist.strip(),
            "album": album.strip(),
            "no_tracks": int(no_tracks),
            "length": length.strip(),
            "year": int(year)
        }
        collection.append(album)
    except ValueError:
        print(f"Unable to read row: {row}")
The function no longer needs to return anything, because appending to the list is done inside it. Now we just need to change the load_collection function to provide the second argument.
def load_collection(filename):
    # The order of values in each row corresponds to dictionary keys:
    # 1. "artist" - artist name
    # 2. "album" - album title
    # 3. "no_tracks" - number of tracks
    # 4. "length" - album length
    # 5. "year" - release year
    collection = []
    try:
        with open(filename) as source:
            for row in source.readlines():
                read_row(row, collection)
    except IOError:
        print("Unable to open the target file. Starting with an empty collection.")
    return collection
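Since saving and loading now use the same format, a quick round trip is a handy sanity check. The sketch below bundles a save function that writes each album as a comma-separated row in the order load_collection expects, together with read_row and load_collection, and writes to a temporary file instead of collection.txt. The albums are sample data; the last part shows why we had to trust that no name contains a comma.

```python
import os
import tempfile


def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            for album in collection:
                target.write(
                    f"{album['artist']}, {album['album']}, {album['no_tracks']}, "
                    f"{album['length']}, {album['year']}\n"
                )
    except IOError:
        print("Unable to open the target file. Saving failed.")


def read_row(row, collection):
    try:
        artist, album, no_tracks, length, year = row.split(",")
        album = {
            "artist": artist.strip(),
            "album": album.strip(),
            "no_tracks": int(no_tracks),
            "length": length.strip(),
            "year": int(year)
        }
        collection.append(album)
    except ValueError:
        print(f"Unable to read row: {row}")


def load_collection(filename):
    collection = []
    try:
        with open(filename) as source:
            for row in source.readlines():
                read_row(row, collection)
    except IOError:
        print("Unable to open the target file. Starting with an empty collection.")
    return collection


# Sample data in the same shape as the collection program uses
original = [
    {"artist": "Agalloch", "album": "The Mantle", "no_tracks": 9, "length": "1:08:36", "year": 2002},
    {"artist": "Alcest", "album": "Kodama", "no_tracks": 6, "length": "42:15", "year": 2016},
]
path = os.path.join(tempfile.mkdtemp(), "collection.txt")
save_collection(original, path)
print(load_collection(path) == original)  # the round trip preserves the data

# A title containing commas splits into too many parts and the row is skipped
with_commas = [
    {"artist": "Grails", "album": "Black Tar Prophecies Volumes 4, 5 & 6",
     "no_tracks": 12, "length": "50:36", "year": 2013},
]
save_collection(with_commas, path)
print(load_collection(path))  # prints an error message, then an empty list
```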
Finally we need to make the main program provide the collection's filename to the load_collection function with this line:
collection = load_collection("collection.txt")
Let's try it out:
This program manages an album collection. You can use the following features:
(A)dd new albums
(E)dit albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [47:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Slipknot - Iowa (2001) [14] [1:06:24]
-- press enter to continue --
11. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
Make your choice: a
Fill the information for a new album. Leave album title empty to stop.
Album title: All Around Us
Artist name: Miaou
Number of tracks: 10
Total length: 59:39
Release year: 2008
Album added
Album title:
Make your choice: q

This program manages an album collection. You can use the following features:
(A)dd new albums
(E)dit albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [47:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Slipknot - Iowa (2001) [14] [1:06:24]
-- press enter to continue --
11. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
12. Miaou - All Around Us (2008) [10] [59:39]
Make your choice: q
The new addition appears at the end of the collection because we didn't sort the thing at any point. At least it's easy to spot that the change we made has persisted through separate runs of the program.
This closes the chapter on reading files. There is a lot more to reading and writing files, but the more complex a solution starts to look, the more likely it's time to move from do-it-yourself solutions to something that already exists, where Someone Else (tm) has already taken care of the exceptions.
Someone Else's Problem¶
In real-life programming, data is not usually saved in such an elementary fashion as we just did. Depending on the nature of the program (and its data), an existing tool or database solution is typically used. Even without looking beyond Python's own modules, one can find several solutions for saving data between runs of a program. One of the more common ones is csv, which saves data as comma-separated rows: in practice, a complete version of what we just tried to accomplish. However, JSON (JavaScript Object Notation) suits our program better. It is quite commonly used in communication between web services, and in configuration files. Python also has pickle, which saves Python objects between runs. However, pickle is less generic as it only applies to Python, while JSON can be processed quite easily with almost any language.
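As a taste of what csv handles for us, this sketch writes two sample albums as rows and reads them back. The title containing commas survives because the module wraps it in quotes. Everything comes back as strings, so the integer conversions would still be our job. io.StringIO replaces a real file here; with a real file, the csv documentation recommends opening it with newline="".

```python
import csv
import io

albums = [
    ["Agalloch", "The Mantle", 9, "1:08:36", 2002],
    ["Grails", "Black Tar Prophecies Volumes 4, 5 & 6", 12, "50:36", 2013],
]

# io.StringIO stands in for a file opened with open(filename, "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(albums)
print(buffer.getvalue())  # the title with commas is quoted in the output

# Rewind and read the same text back
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows[1][1])  # the title comes back in one piece
```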
Learning goals: To see how easy it is to save collection type data with Python's json module. You'll also see how command line arguments can be used to configure a program at startup.
JSON in a Nutshell¶
The basic use of the json module is extremely simple. A JSON file can be produced with the dump function that eats Python data structures, and loaded with the load function.
In [1]: import json
In [2]: measurement_1 = {
   ...:     "date": "2014-08-03",
   ...:     "location": "animal crossing",
   ...:     "results": [12.54, 6.35, 20.38, 13.76, 45.51],
   ...:     "comment": "donkeys are heavy"
   ...: }
   ...:
In [3]: with open("measurement.json", "w") as target:
   ...:     json.dump(measurement_1, target)
   ...:
This produces a file with the following contents:
{"date": "2014-08-03", "location": "animal crossing", "results": [12.54, 6.35, 20.38, 13.76, 45.51], "comment": "donkeys are heavy"}
It's just as easy to load it back:
In [4]: with open("measurement.json") as source:
   ...:     measurement_1 = json.load(source)
   ...:
In [5]: measurement_1
Out[5]:
{'date': '2014-08-03',
'location': 'animal crossing',
'results': [12.54, 6.35, 20.38, 13.76, 45.51],
'comment': 'donkeys are heavy'}
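The json module also has dumps and loads, which produce and parse strings instead of files. That makes it easy to peek at the output without creating a file, and the indent argument makes the output more readable for humans. The last lines show two conversions worth knowing about when round-tripping Python data through JSON: tuples come back as lists, and integer dictionary keys come back as strings.

```python
import json

measurement_1 = {
    "date": "2014-08-03",
    "location": "animal crossing",
    "results": [12.54, 6.35, 20.38, 13.76, 45.51],
    "comment": "donkeys are heavy",
}

# dumps/loads are the string counterparts of dump/load
text = json.dumps(measurement_1, indent=4)
print(text)

restored = json.loads(text)
print(restored == measurement_1)

# Not everything survives unchanged: tuples become lists, int keys become strings
print(json.loads(json.dumps({1: (2, 3)})))
```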
If we apply this new tech to our program, we'll notice that both loading and saving are simplified "a bit". The basic limitation of JSON is that each document can only contain one top-level value. However, this doesn't slow us down at all, because that one value can be a list that contains dictionaries, other lists etc. In all its simplicity, the save function becomes:
def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            json.dump(collection, target)
    except IOError:
        print("Unable to open the target file. Saving failed.")
We also want to change the target filename to a new one in the main program:
save_collection(collection, "collection.json")
At this point we should run the program in order to load the collection with the old mechanism, and then save it with the new one, effectively converting our collection file to JSON. After that we can reimplement the loading function:
def load_collection(filename):
    try:
        with open(filename) as source:
            collection = json.load(source)
    except (IOError, json.JSONDecodeError):
        print("Unable to open the target file. Starting with an empty collection.")
        collection = []
    return collection
Note that the read_row function is no longer used. This is a good time to feel a bit silly: we went through a whole lot of trouble to create our own (deficient) loading solution, only to notice that, given the proper module, we could have done it with like two lines of code. The only noteworthy new thing in this function is the second exception in the except clause: json.JSONDecodeError. This exception is defined in the json module, and it occurs if the given file doesn't comply with JSON syntax. After we change the main program to load the collection from "collection.json" as well, we can show that commas no longer cause problems:
This program manages an album collection. You can use the following features:
(A)dd new albums
(E)dit albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: a
Fill the information for a new album. Leave album title empty to stop.
Album title: Black Tar Prophecies Volumes 4, 5 & 6
Artist name: Grails
Number of tracks: 12
Total length: 50:36
Release year: 2013
Album added
Album title:
Make your choice: q
This program manages an album collection. You can use the following features:
(A)dd new albums
(E)dit albums
(R)emove albums
(S)how the collection
(O)rganize the collection
(Q)uit
Make your choice: s
1. Alcest - Kodama (2016) [6] [42:15]
2. Canaan - A Calling to Weakness (2002) [17] [1:11:17]
3. Deftones - Gore (2016) [11] [48:13]
4. Funeralium - Deceived Idealism (2013) [6] [1:28:22]
5. IU - Modern Times (2013) [13] [47:14]
-- press enter to continue --
6. Mono - You Are There (2006) [6] [1:00:01]
7. Panopticon - Roads to the North (2014) [8] [1:11:07]
8. PassCode - Clarity (2019) [13] [49:27]
9. Scandal - Hello World (2014) [13] [53:22]
10. Slipknot - Iowa (2001) [14] [1:06:24]
-- press enter to continue --
11. Wolves in the Throne Room - Thrice Woven (2017) [5] [42:19]
12. Miaou - All Around Us (2008) [10] [59:39]
13. Grails - Black Tar Prophecies Volumes 4, 5 & 6 (2013) [12] [50:36]
Make your choice: q
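The JSON versions of the two functions can also be checked without running the whole program. This sketch copies them as shown in this section and writes to temporary files (an assumption made for the example); it saves an album whose title contains commas, and also feeds load_collection a file that isn't valid JSON to trigger json.JSONDecodeError.

```python
import json
import os
import tempfile


def save_collection(collection, filename):
    try:
        with open(filename, "w") as target:
            json.dump(collection, target)
    except IOError:
        print("Unable to open the target file. Saving failed.")


def load_collection(filename):
    try:
        with open(filename) as source:
            collection = json.load(source)
    except (IOError, json.JSONDecodeError):
        print("Unable to open the target file. Starting with an empty collection.")
        collection = []
    return collection


folder = tempfile.mkdtemp()
good_file = os.path.join(folder, "collection.json")
bad_file = os.path.join(folder, "broken.json")

albums = [{"artist": "Grails", "album": "Black Tar Prophecies Volumes 4, 5 & 6",
           "no_tracks": 12, "length": "50:36", "year": 2013}]
save_collection(albums, good_file)
print(load_collection(good_file) == albums)  # commas in the title are no problem

with open(bad_file, "w") as target:
    target.write("this is not JSON")
print(load_collection(bad_file))  # falls back to an empty list
```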
An Argument for Better Usability¶
We've mentioned at least once how nice it would be if the user could relatively effortlessly choose which collection file is loaded when the program is started, and even where it is saved. Our saving and loading functions already support this, but the program has no way to ask for the locations yet. We could do this with the input function, but there's another way to give this kind of information to a program. Whereas input is used while the program is already running, the method we'll look into next gives instructions when the program is started. Our program loads the collection at startup and saves it at the very end, so it makes sense to define these locations at startup.
The answer is once again found in a module. Our solution is offered by the sys module, which is used for all kinds of deeper interaction with the Python interpreter and its surroundings. We will only scratch its very surface, as we only need one feature from it, and it isn't even a function. The feature, or attribute really, is called argv, short for argument vector. It contains the arguments that were used when running the program. Our normal way to run a Python program is:
ipython collection.py
On this line, ipython is the actual command, and collection.py is its first and only argument. That's not all though. Programs can be given more than one argument; an arbitrary number of them, in fact. Handling the arguments is the program's responsibility. Technically we could run our program with a whole bunch of extra arguments:
ipython collection.py i can haz cheezburger
Because the program doesn't react to the extra arguments in any way, nothing special happens; the program just runs normally. What we actually want, though, is for the program to be run like this:
ipython collection.py collection.json
So the user can now give the collection's filename as an argument. We could also add the option to give another filename in case the user wants to save to a different file instead of the file it was loaded from:
ipython collection.py collection.json copy.json
It's a bit hard to demonstrate the argument vector in the console, so we have to create a small code file instead:
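The contents of arg.py aren't shown in the text; judging by the output of the run, a file along these lines (our reconstruction) is all it takes:

```python
# arg.py: print the argument vector the program was started with
import sys

print(sys.argv)
```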
C:\path\to\somewhere>ipython arg.py i can "haz cheezburger"
['arg.py', 'i', 'can', 'haz cheezburger']
This shows that an argument vector is actually just a normal list that contains strings. We can also see that spaces separate individual arguments from each other, but spaces inside quotes are interpreted as space characters instead of separators. Because the filename is handled inside the program as a string, taking it out of this list should not be much of a challenge for us. We do, however, need to check the argument vector for problems and instruct the user if needed. The best place to do this is a new function:
def read_arguments(arguments):
    pass
This function should return two filenames: the file where the collection is, and the file where it should be saved. If the save file hasn't been given separately, the program uses the same file for both. This is a rather simple scenario, remembering that the program's name is always the first item: if there are three arguments, there are two filenames; if there are two, only the source file has been given; and if there is only one (the program's name), the user hasn't given enough arguments and the program cannot be run. This function always returns two values. Both are None if the function fails entirely. This way the main program can check from the returned values whether filenames were found in the arguments.
def read_arguments(arguments):
    if len(arguments) >= 3:
        source = arguments[1]
        target = arguments[2]
        return source, target
    elif len(arguments) == 2:
        source = arguments[1]
        return source, source
    else:
        return None, None
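Because sys.argv is an ordinary list, read_arguments can be tried out without starting the program from the terminal at all: just hand it lists that mimic the argument vectors from the examples above.

```python
def read_arguments(arguments):
    if len(arguments) >= 3:
        source = arguments[1]
        target = arguments[2]
        return source, target
    elif len(arguments) == 2:
        source = arguments[1]
        return source, source
    else:
        return None, None


# Lists that look like sys.argv in the three scenarios
print(read_arguments(["collection.py", "collection.json", "copy.json"]))
print(read_arguments(["collection.py", "collection.json"]))
print(read_arguments(["collection.py"]))
```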
The main program also needs some changes:
source, target = read_arguments(sys.argv)
if source:
    menu(source, target)
else:
    print("Usage:")
    print("python collection.py source_file (target_file)")
A new import will also be needed at the beginning:
import json
import math
import sys
We took this opportunity to move the old main program into the menu function so that we don't need to put it inside one if statement with all its bells and whistles.
def menu(source_file, target_file):
    collection = load_collection(source_file)
    print("This program manages an album collection. You can use the following features:")
    print("(A)dd new albums")
    print("(E)dit albums")
    print("(R)emove albums")
    print("(S)how the collection")
    print("(O)rganize the collection")
    print("(Q)uit")
    while True:
        choice = input("Make your choice: ").strip().lower()
        if choice == "a":
            add(collection)
        elif choice == "e":
            edit(collection)
        elif choice == "r":
            remove(collection)
        elif choice == "s":
            show(collection)
        elif choice == "o":
            organize(collection)
        elif choice == "q":
            break
        else:
            print("The chosen feature is not available.")
    save_collection(collection, target_file)
Doing this also allows us to handle Ctrl + C nicely:
source, target = read_arguments(sys.argv)
if source:
    try:
        menu(source, target)
    except KeyboardInterrupt:
        print("Program was interrupted, collection was not saved.")
else:
    print("Usage:")
    print("python collection.py source_file (target_file)")
When the entire menu function call is placed inside a try like this, the user can press Ctrl + C at any time during program execution, and it always results in a clean exit to the terminal. Again, we could wrap the entire contents of the menu function in the try instead, but let's just agree that what we just did looks much more elegant. At this point we once again have a pretty nice program.
In case you happen to need more complex handling of command line arguments, fiddling with sys.argv manually gets tiresome pretty quickly. The argparse module can be a big help in such situations.
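For a taste of argparse, here is how the same two filenames might be declared with it. This is only a sketch: the function keeps the name read_arguments for comparison, but it takes the arguments without the program name (sys.argv[1:]), and unlike our version, argparse prints its own usage message and exits the program (by raising SystemExit) when the required source file is missing.

```python
import argparse


def read_arguments(arguments):
    """Hypothetical argparse version; arguments is e.g. sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Manage an album collection.")
    parser.add_argument("source", help="file the collection is loaded from")
    parser.add_argument("target", nargs="?", default=None,
                        help="file the collection is saved to (defaults to source)")
    parsed = parser.parse_args(arguments)
    # Save to the source file if no separate target was given
    target = parsed.target if parsed.target else parsed.source
    return parsed.source, target


print(read_arguments(["collection.json"]))
print(read_arguments(["collection.json", "copy.json"]))
```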
Third Party Solutions¶
Eventually you'll encounter a situation where Python's built-in modules are no longer sufficient. While it's true that in theory you can use them to make everything under the blue sky, a lot of things will take unreasonable effort, especially considering that someone has most likely already solved your problem. In this section we'll familiarize ourselves with so-called third party modules, which offer solutions to most problems that are not extremely specific. We'll employ third party modules in the last feature of our little collection manager.
Ain't no one got the time to input album information into a collection manager manually. This is even more true in modern times, when the music library is usually on a computer (or not, because Spotify is a thing, but then we would not need this program in the first place...). Let's make a reasonable proposal: the program should build the collection by reading the computer's music library. If the library is well organized, it shouldn't be too hard to read things like album titles and artist names from folder names on the disk. The number of tracks can also be figured out by counting the music files in an album folder. Even the release year can be found in folder names in some cases. But what about album lengths? We can find track lengths in the metadata of music files, but how can we access it?
One option is to dig up the music format's specification and find out how metadata can be accessed from music files. Or we could take the easier way and install a module that does this.
This section won't have as much explanation of individual solutions as the preceding material. If you have questions about the solutions we implement, ask about them via email or chat, or from a TA in the exercises. The main focus of this section is on how to employ modules made by others, and also how to make modules that could be used by others.
Learning goals: After this section you know where to look for Python packages and how to install them. As an additional bonus we'll show you some code about how to rummage through the contents of your hard drive, and how to make a separate code module.
Packages from the Internet Wonderland¶
As usual, you should be careful about what you install on your computer. The default place to look for Python packages is the Python Package Index, aka PyPI. PyPI is a repository maintained by the Python Software Foundation, so the repository itself can be considered a rather reliable source; anyone can publish packages there, though, so individual packages still deserve some scrutiny. Packages in PyPI can be installed with the pip installation script that we used to install IPython at the very beginning of this course. There's no need to download packages with a web browser and then install them separately. Let's head to the PyPI web page and search for "mp3 tag read", which should produce a bunch of results. The results' descriptions indicate that quite a few packages could fit our purposes. So how do we know which one to choose? Usually by reading each package's documentation or code examples.
This time we're choosing tinytag because its front page has clear usage examples, and judging by them it looks suitable for what we want to do. It also looks like an active project, given that the latest update is from this month (as of writing this, in spring 2020). It only supports reading tags, but that is perfectly adequate for our program. Its license is MIT, which is also perfect for us as it allows free use. All of these are factors that should be taken into consideration when choosing a Python package. For personal use, anything that works and is not harmful is usually adequate.
We can use pip to install packages from PyPI. The documentation page for tinytag shows a simple example.
pip install tinytag
This is written in the terminal. The installation should look like this:
C:\some\folder>pip install tinytag
Collecting tinytag
  Using cached https://files.pythonhosted.org/packages/74/cb/844151777ec728692b71bced33db355d6f889cf612f949325b2d2b62657c/tinytag-1.3.1.tar.gz
Installing collected packages: tinytag
  Running setup.py install for tinytag ... done
Successfully installed tinytag-1.3.1
Running the installation script usually requires write access to Python's library folder, which usually means you need admin privileges. Refer to the instructions in Pre-Exercises and Installations if you've forgotten how. After the installation, the module can be imported in Python just like any of Python's built-in modules. All in all, not a particularly complicated process. Now we just have the small detail of writing the code that uses this package.
If you are using Linux, you should check what the recommended practice for managing Python packages is in your distribution. If you cannot obtain installation privileges on your computer, you can look into using virtual environments. Using virtual environments is generally good practice in Python programming, but it's a bit far removed from the basics, albeit not a very long lesson.
Sniffing Module¶
Now that we have a shiny new tool, we are itching to use it. Let's create an entirely new code file for this purpose, one that we can then import into our collection program, mostly to get a bit of familiarity with the process. Let's call this new module sniffer.py. We'll start with a couple of empty functions and one import:
import tinytag

def read_folder(folder):
    pass

def read_metadata(filename, collection):
    pass
Of the two functions, read_metadata will be the one that reads data from a single music file. The program as a whole will start from a given folder and then crawl into its subfolders (and their subfolders), inspecting every music file it finds on the way. Whenever a new album is found (a previously unknown combination of artist name and album title), a new dictionary is created to represent it, and it is appended to the collection list. This matches the structure we previously had for the collection. The album's release year can be read from the metadata. Finally, a new key is added to each dictionary: "lengths". This will be a list where we collect the lengths of individual tracks on the album. After the search is over, these lists are used to calculate the total length of each album, and also the number of tracks from the number of items.
We can start by reading a single file in order to get familiar with our new toy. According to the documentation, we can get a tag like this:
tag = TinyTag.get('/some/music.mp3')
However, since we used a normal import instead of from-import, we need to do it like this:
tag = tinytag.TinyTag.get('/some/music.mp3')
In our case, the file's location on disk and its name come to the function as a parameter. The data we need can be read as follows:
def read_metadata(filename, collection):
    tag = tinytag.TinyTag.get(filename)
    title = tag.album
    artist = tag.artist
    year = tag.year
    length = tag.duration
Because we haven't tested whether tinytag actually does what we want, the best course of action right now is to throw some test prints into the program:
    print(artist)
    print(title)
    print(year)
    print(length)
Then we need a short main program that calls the function in order to see what happens. If you want to test this on your own computer, you naturally need to have at least one music file somewhere. Note that the argument must be a valid path to the file, either absolute or relative to the folder the program is run from. This example uses an absolute path to an external drive in Windows. The collection can be an empty list because we're not using it for anything yet.
read_metadata("E:/Music/Encore Show/Scandal - 10 - Cute!.mp3", [])
With the above prints, this outputs:
Scandal
Encore Show
2013
272.6661224489796
This tells us two things: 1) tinytag works like we expected, allowing us to get what we need; and 2) song length is given in seconds, which we can also find out in the documentation. Now that we've tested it, we can remove the prints from the function (the test code in the main program can stay for now). Next we need a familiar snippet of code that searches whether an album is already in the collection:
for album in collection:
    if album["artist"].lower() == artist.lower() and album["album"].lower() == title.lower():
Doing this for every file is gonna get pretty taxing, and it can lead to very slow performance when the collection grows. Let's not worry about that right now though: code should not be optimized until it's proven to be slow. Reading the tag also took a while, so performance doesn't necessarily hinge on this loop. Either way, if the album is already found, all we do is append the song's length to the existing list of lengths:
for album in collection:
    if album["artist"].lower() == artist.lower() and album["album"].lower() == title.lower():
        album["lengths"].append(length)
        break
If the album has not been added to the collection yet, it needs to be added. We can't tell this inside the loop's if statement, because at that point the loop hasn't gone through every album yet, so attaching an else to the if statement won't work. However, there's a new trick we can use: the else branch of a for loop. A loop's else branch is entered if the loop goes through all of its items without being interrupted (e.g. by break or return). If we add an else branch to our loop, the code inside it is only executed if no album in the collection fulfilled the condition above. The new album's list of lengths starts with the length of the song we just read:
for album in collection:
    if album["artist"].lower() == artist.lower() and album["album"].lower() == title.lower():
        album["lengths"].append(length)
        break
else:
    collection.append({
        "artist": artist.strip(),
        "album": title.strip(),
        "lengths": [length],
        "year": int(year)
    })
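The else branch of a loop is easy to try out on its own. Below is a minimal sketch; the function name find_or_add and the plain list of names are made up for illustration and are not part of the collection program. The else branch runs only when the loop finishes without hitting break:

```python
def find_or_add(names, new_name):
    """Add new_name to names unless an equal name (ignoring case) is already there."""
    for name in names:
        if name.lower() == new_name.lower():
            print("Already in the list:", name)
            break
    else:
        # only reached when the loop ran to the end without hitting break
        names.append(new_name)
        print("Added:", new_name)


names = ["Saor", "Mgla"]
find_or_add(names, "saor")       # prints: Already in the list: Saor
find_or_add(names, "The Ocean")  # prints: Added: The Ocean
print(names)                     # ['Saor', 'Mgla', 'The Ocean']
```

Without the else branch we'd need an extra "found" variable that is set inside the if statement and checked after the loop.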
At this point the albums don't have a total length or a number of tracks yet, because we can't know them until every file has been read. We can once again test this by adding a print that shows the collection after calling the function once.
collection = []
read_metadata("E:/Music/Aura/Saor - 01 - Children of the Mist.mp3", collection)
print(collection)
If we run the program we can see that something was indeed added:
[{'year': '2014', 'artist': 'Saor', 'album': 'Aura', 'lengths': [733.4138775510204]}]
We can also test whether appending lengths works correctly by reading another song from the same album:
collection = []
read_metadata("E:/Music/Aura/Saor - 01 - Children of the Mist.mp3", collection)
read_metadata("E:/Music/Aura/Saor - 02 - Aura.mp3", collection)
print(collection)
This seems to work too:
[{'year': '2014', 'artist': 'Saor', 'album': 'Aura', 'lengths': [733.4138775510204, 817.1885714285714]}]
At this point it seems safe to say that this function works like we wanted it to. Hopefully we won't need to touch it again, unless we want to change its behavior later or we discover surprising bugs when testing with real data instead of just a couple of examples.
Squirrel in the Directory Tree¶
The trickier step, or at least the one where we encounter some new stuff again, is going through the folders. The basic concept is quite straightforward: take the directory listing of a folder's contents and go through it one item at a time. Folders are opened for further processing, while files are read with the previous function. While the concept is not that complicated, its implementation raises some questions. Namely, how do we navigate an unknown directory tree when we can't know in advance how many levels of subfolders there will be? We have two ways to do this: one is called recursion, where a function calls itself with new arguments; the other is collecting folders into a to-do list as we encounter them. In our scenario we can expect that there won't be too many levels of subfolders, so we can choose recursion. Minestompers, on the other hand, should avoid recursion, because Python limits how many function calls can pile up on top of each other (the default recursion limit is 1000).
Both of these solutions follow the same principle: the algorithm marks its own future tasks (i.e. folders to open), either by queueing function calls or by appending folder names to a list. When a folder has been opened and processed, it is removed from the task list. In a way, the algorithm defines its own number of iterations. As the directory tree is hierarchical and we only move downward in it, there's no risk of an infinite loop, and each folder is visited only once.
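For comparison, here's a sketch of the to-do list approach. It isn't what we'll use below; the function name list_files is hypothetical, and it only collects file paths instead of reading tags. Each folder path put on the list is a future task, and the loop runs until no tasks are left:

```python
import os


def list_files(root):
    """Return paths of all files under root, using a to-do list instead of recursion."""
    to_do = [root]            # folders that still need to be opened
    files = []
    while to_do:
        folder = to_do.pop()  # take one task; it's now off the list
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            if os.path.isdir(path):
                to_do.append(path)   # a future task: open this folder later
            else:
                files.append(path)
    return files
```

Because the loop just asks the list for its next task, a deep folder tree makes the list longer instead of piling up function calls, so the recursion limit never comes into play.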
The function we are implementing is read_folder. Its first parameter is the path of the folder where reading should start, and the second is the collection, just like with read_metadata. The first phase is to find out what's in the folder. This can be done with the os module, more specifically with its functions for files and directories, where we can find the listdir function. We can quickly test it in the console:
In [1]: import os
In [2]: os.listdir("E:/Music/Aura")
Out[2]:
['Saor - 04 - Farewell.mp3',
'Saor - 05 - Pillars of the Earth.mp3',
'Saor - 01 - Children of the Mist.mp3',
'Saor - 02 - Aura.mp3',
'Saor - 03 - The Awakening.mp3']
As we can see, the function returns the contents of a directory as file (and folder) names. If the names are just strings, how can we tell which ones are files and which are folders? The answer can be found in the os.path submodule, which is used for all kinds of operating system path processing. One of its functions, isdir, can be used to ask the OS whether a certain path belongs to a folder; if it's not a folder, we'll treat it as a file. Another problem is skipping files that are not music files. Again we have two options: check file extensions with a conditional statement, or find out what exception tinytag raises when it can't read a file. The latter option is better in this case, because we can expect music folders to mostly contain music files, so the exception will be rare. Let's start by finding out what the exception actually is.
In [1]: import tinytag
In [2]: tinytag.TinyTag.get("sniffer.py")
---------------------------------------------------------------------------
TinyTagException Traceback (most recent call last)
<ipython-input-2-1e6241557a86> in <module>
----> 1 tinytag.TinyTag.get("sniffer.py")
c:\program files\python37\lib\site-packages\tinytag\tinytag.py in get(cls, filename, tags, duration, image, ignore_errors)
126 return TinyTag(None, 0)
127 if cls == TinyTag: # if `get` is invoked on TinyTag, find parser by ext
--> 128 parser_class = cls._get_parser_for_filename(filename, exception=True)
129 else: # otherwise use the class on which `get` was invoked
130 parser_class = cls
c:\program files\python37\lib\site-packages\tinytag\tinytag.py in _get_parser_for_filename(cls, filename, exception)
117 return tagclass
118 if exception:
--> 119 raise TinyTagException('No tag reader found to support filetype! ')
120
121 @classmethod
TinyTagException: No tag reader found to support filetype!
This exception belongs to tinytag, so we need to prefix it with tinytag when using it in our code. With this knowledge we can finally write a test function (also showing the added imports):
import os
import tinytag
def read_folder(folder, collection):
    contents = os.listdir(folder)
    for name in contents:
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            print("Found folder:", name)
        else:
            try:
                read_metadata(path, collection)
            except tinytag.TinyTagException:
                print("Skipping", name)
The first line inside the loop joins each name from the directory listing to the folder's path. Without this, Python would look for the file in the wrong place: the current working directory, which is usually the folder the program was started from. Prints that describe what the program is doing are good for testing. To run the test, we'll create an empty folder called "test" inside a folder that contains music files, along with one text file. This way we cover all cases. Let's change our main program to test this new function:
collection = []
read_folder("E:/Music/Aura", collection)
print(collection)
Found folder: test
Skipping donkey.txt
[{'lengths': [733.4138775510204, 817.1885714285714, 606.3804081632653, 499.9836734693878, 729.5738775510204], 'artist': 'Saor', 'album': 'Aura', 'year': '2014'}]
With this we can once again safely assume that our control flow structures work like we wanted. Now we can change the code to its final form, where the print for folders is replaced by a call to the same function: read_folder calls itself with the subfolder's path. For the except branch a simple pass will suffice. The program could also print which folder it's currently inspecting to preserve the user's sanity. This way they can follow the progress instead of staring at a black screen, wondering whether the program is stuck or just taking a really long time.
def read_folder(folder, collection):
    print("Opening folder:", folder)
    contents = os.listdir(folder)
    for name in contents:
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            read_folder(path, collection)
        else:
            try:
                read_metadata(path, collection)
            except tinytag.TinyTagException:
                pass
You could now run this for your entire music collection by changing the main program's test code:
collection = []
read_folder("E:/Music", collection)
for album in collection:
    print(album)
This may take a while, depending on how big the collection is. Below is a small snippet of a run, which shows one thing about os.listdir: it definitely doesn't give the contents in alphabetical order (its documentation says the order is arbitrary). Most of the output has been cut out, as indicated by the ellipses.
Opening folder: E:/Music
Opening folder: E:/Music\Exercises in Futility
Opening folder: E:/Music\Pelagial
Opening folder: E:/Music\Moonlover
Opening folder: E:/Music\Guardians
...
{'artist': 'Mgla', 'lengths': [478.58938775510205, 468.5583673469388, 278.02122448979594, 285.7534693877551, 495.6734693877551, 529.5804081632654], 'album': 'Exercises in Futility', 'year': '2015'}
{'artist': 'The Ocean', 'lengths': [72.48979591836735, 356.31020408163266, 264.5420408163265, 198.11265306122448, 267.36326530612246, 207.934693877551, 305.34530612244896, 67.1869387755102, 557.9232653061224, 545.410612244898, 355.6048979591837], 'album': 'Pelagial', 'year': '2013'}
{'artist': 'Ghost Bath', 'lengths': [87.04, 548.6497959183673, 524.6432653061224, 287.63428571428574, 243.905306122449, 453.48571428571427, 385.5412244897959], 'album': 'Moonlover', 'year': '2015'}
{'artist': 'Saor', 'lengths': [692.610612244898, 632.0065306122449, 669.2832653061224, 687.8824489795918, 679.3926530612245], 'album': 'Guardians', 'year': '2016'}
...
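If a predictable order matters, for example to make the progress output easier to follow, one option is to sort the names before looping over them. A minimal sketch; the helper name sorted_contents is hypothetical:

```python
import os


def sorted_contents(folder):
    """Return the names in folder sorted case-insensitively (os.listdir promises no order)."""
    return sorted(os.listdir(folder), key=str.lower)
```

In read_folder this would mean looping over sorted_contents(folder) instead of os.listdir(folder). The key=str.lower part hands a function to sorted, the same callback trick used with the sort method in the collection manager.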
Packing Up¶
One more feature is missing: the data doesn't quite match the format used in the collection manager yet. We still need to calculate the values for no_tracks and length. The length is also in seconds, while we want it as a more human-readable string. If you guessed we're gonna use another module to solve the issue, you were 100% correct. Our friend this time around is the time module. It contains all sorts of useful functions related to time. The tool that does what we need is the strftime function, which formats a time. It works a bit like the format method, but has its own way of marking placeholders (e.g. %Y for the year and %M for minutes). Details can be found in the function's documentation.
The date of writing this could be printed like this:
In [1]: import time
In [2]: time.strftime("%d.%m.%Y", time.localtime())
Out[2]: '31.03.2020'
More often we want to display dates in a year-first format, which also sorts correctly as a plain string. We'll also add the time to get a proper timestamp:
In [3]: time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
Out[3]: '2020-03-31 11:49:07'
At the end of the string we can see the time format we want for lengths. Unfortunately the length is currently in seconds, and if we try to pass seconds to strftime, it's not happy:
In [4]: time.strftime("%H:%M:%S", 453)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-c2ee80529b06> in <module>()
----> 1 time.strftime("%H:%M:%S", 453)
TypeError: Tuple or struct_time argument required
The exception is telling us that we need a tuple or a mysterious struct_time object. The function we used in the previous examples, localtime, returns the current time in this format by default, but it can also be given a number of seconds as an argument. For example, for a three-minute song:
In [5]: time.localtime(180)
Out[5]: time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=2, tm_min=3, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)
That shows all kinds of information. The date is the Unix epoch, 1.1.1970 00:00:00 UTC, which computers commonly count time from, so basically this moment is 3 minutes after the clocks of computing were started. We can pick individual components from a struct_time:
In [6]: length = time.localtime(180)
In [7]: length.tm_min
Out[7]: 3
In [8]: length.tm_hour
Out[8]: 2
Wait what, where did those two hours come from?! They came from the time zone: localtime converts the time to the computer's local time zone, which for the machine that produced this output was two hours ahead of UTC. This is useful if we actually want the current time, but not so useful when we want to turn an album's or song's length in seconds into a time object. Luckily we also have the gmtime function, which does the same but in UTC, without any time zone adjustment. So the correct statement to convert seconds into a time object is:
In [9]: time.strftime("%H:%M:%S", time.gmtime(180))
Out[9]: '00:03:00'
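Wrapped in a function, the conversion could look like the sketch below. The names format_length and format_length_divmod are made up for illustration. Two details are worth knowing. First, gmtime ignores fractions of a second, so 272.67 seconds becomes 00:04:32. Second, %H is the hour of a day, so a duration of 24 hours or more wraps around. The divmod version avoids the wrap, should someone own a day-long album:

```python
import time


def format_length(seconds):
    """Format a duration in seconds as HH:MM:SS using gmtime (wraps at 24 hours)."""
    return time.strftime("%H:%M:%S", time.gmtime(seconds))


def format_length_divmod(seconds):
    """Same format without the 24-hour wrap, using plain integer arithmetic."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return "{:02}:{:02}:{:02}".format(hours, minutes, secs)


print(format_length(180))                # 00:03:00
print(format_length(272.6661224489796))  # 00:04:32 (the fraction is dropped)
print(format_length(90000))              # 01:00:00 (25 hours wrapped around)
print(format_length_divmod(90000))       # 25:00:00
```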
If we apply this, we can write a function that goes through the entire collection and converts the length information from the current list format to the strings we used earlier. It also counts the tracks and removes the list of lengths, which is no longer needed:
def parse_lenghts(collection):
    for album in collection:
        album["length"] = time.strftime("%H:%M:%S", time.gmtime(sum(album["lengths"])))
        album["no_tracks"] = len(album["lengths"])
        album.pop("lengths")
A trial run gives the following results:
{'artist': 'Mgla', 'album': 'Exercises in Futility', 'year': '2015', 'length': '00:42:16', 'no_tracks': 6}
{'artist': 'The Ocean', 'album': 'Pelagial', 'year': '2013', 'length': '00:53:18', 'no_tracks': 11}
{'artist': 'Ghost Bath', 'album': 'Moonlover', 'year': '2015', 'length': '00:42:10', 'no_tracks': 7}
{'artist': 'Saor', 'album': 'Guardians', 'year': '2016', 'length': '00:56:01', 'no_tracks': 5}
...
Finally we'll make a function that the collection program can call when it needs to parse a collection. This is very similar to the main program that we used for testing.
def read_collection(folder):
    collection = []
    read_folder(folder, collection)
    parse_lenghts(collection)
    return collection
Now we can move back to the collection manager to implement this new feature. But before we import our shiny new module into the collection manager, we need to talk about one important thing regarding modules. We'll approach it through a task.
This is not how our imports behaved earlier! Is there something different about writing modules after all? Not really. What import actually does is execute the imported module. Keep in mind that def is a statement that, when executed, only creates a function. The problem with our module is that it also contains statements that are not defs or imports, so its main program gets executed during import, and this is not ok. Of course we could just delete the main program code. However, there are definitely cases where we want to use parts of programs that are also meant to be used independently.
Luckily there's a best-of-both-worlds solution in Python. When modules are executed, they are given an internal variable called __name__. Normally __name__ contains the module's name, but if the module is the program that was executed, __name__ gets the special value "__main__". We can use this to tell whether a module is being executed or imported (in which case we don't want to execute its main program). This test can be done with a simple if statement:
if __name__ == "__main__":
The entire main program is placed inside this statement. Let's do this for the ball module:
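The same pattern in isolation, as a minimal sketch of a hypothetical module (the file name describer.py and its functions are made up for illustration): definitions at the top level, and the main program behind the if statement.

```python
# describer.py: a hypothetical module, used only to illustrate the pattern
def describe(name):
    """A function that other programs can import and use."""
    return name + " is a module"


def main():
    # the module's own main program
    print(describe("describer"))
    print("__name__ is", __name__)


if __name__ == "__main__":
    # true only when this file is the program being run, e.g. ipython describer.py
    main()
```

Running the file prints "__name__ is __main__". Importing it with import describer only defines describe and main, because inside the imported module __name__ is "describer".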
If we try to import it now:
In [1]: import ballmodule
Nothing visible happens, as it should be. Only the function definitions from the module have been executed, and the main program has been ignored. However, we can still run the module from the terminal with ipython ballmodule.py and execute its main program. Let's do the same to the sniffer module's main program, and to the collection manager's as well while we're at it. After this our new module can be imported into the collection manager by adding an import:
import json
import sys
import time
import sniffer
Add the new feature to the main menu:
def menu(source_file, target_file):
    collection = load_collection(source_file)
    print("This program manages an album collection. You can use the following features:")
    print("(C)onstruct collection")
    print("(A)dd new albums")
    print("(E)dit albums")
    print("(R)emove albums")
    print("(S)how the collection")
    print("(O)rganize the collection")
    print("(Q)uit")
    while True:
        choice = input("Make your choice: ").strip().lower()
        if choice == "c":
            collection = construct_collection()
        elif choice == "a":
            add(collection)
        elif choice == "e":
            edit(collection)
        elif choice == "r":
            remove(collection)
        elif choice == "s":
            show(collection)
        elif choice == "o":
            organize(collection)
        elif choice == "q":
            break
        else:
            print("The chosen feature is not available.")
    save_collection(collection, target_file)
And all that's left is writing the new function to implement the feature:
def construct_collection():
    while True:
        folder = input("Enter music library folder: ")
        try:
            return sniffer.read_collection(folder)
        except FileNotFoundError:
            # ask again instead of returning a collection that was never created
            print("Folder not found")
In this implementation the constructed collection replaces the one loaded at startup. Now that there's finally a bit more stuff in the collection, we can change the number of albums printed per page back to its initial value of 20. Final code files below:
You get to practice making your own modules in the exercises. What about the final project, should you split it into modules? Normally no. The amount of code in the project is relatively small, and there isn't much to be gained from splitting it. Of course you are allowed to split it if you feel like it and it makes things clearer for you. Splitting code into modules follows the same rule as many other things: whatever you do, do it consistently. Code can be split into modules thematically, e.g. minestomper could have a separate module for the main menu and another for each main feature.
Graphical Brilliance¶
Let's be real here: text-based terminal programs are a bit too 80s. In these modern times we should at least try to make something that opens in its own window and can be poked at with a mouse or touchscreen. As our finishing touch we'll move the collection manager from the terminal to a windowed interface. The things in this section aren't really elementary in any way, shape, or form, but a lot of modern programming is based on some of the concepts we're about to learn. Furthermore, modern tools for making graphics are so cool that even by just scratching the surface one can do pretty impressive stuff.
As usual, we're not diving in head first without a plan. Reaching for the moon might be a bit beyond us, so we'll settle for the most straightforward way to transform the current text-based interface into a windowed one. All main features should be accessible from buttons in the main window, and submenus should open in separate windows as needed. The collection can be shown in a table or text box in the main window. We need a library that offers these basic UI features.
One thing to keep in mind when reading through this example is that it's more complex than the minimum requirements for the course project; you'll get away with less.
Learning goals: In this section you'll learn the very bare minimum about how modern(-ish) user interface libraries work. In particular, this includes the inner workings of handler functions and how to share information between them. In addition, you'll learn how to use one of the graphical user interface libraries that we made specifically for this course. The other one is introduced in the last exercise example.
Library Tour¶
At this stage we'd normally do some research about which library is the best for what we planned. However, we don't really have enough programming experience yet to make a reasonable evaluation, so we'll skip that. Python comes with the TKinter library, which is older than stone weapons and produces interfaces that are uglier than a salty Dota player's behavior at 3 am, but it does offer all the basics for graphical user interfaces in a relatively simple manner. Not simple enough that we'd go through it in this context, though. Instead, we've made a module that simplifies some of TKinter's features into a handful of functions that are a bit easier to comprehend. The code has also been documented rather extensively with docstrings.
This is the same library as the one used in Spectral Matters and Run, Circuit! Run! course projects, but we removed the matplotlib connections so that minestompers don't need to install it just to test these examples. We'll only cover the parts of the library that are actually needed here, figuring out the rest is left as homework while doing the course project.
Callback to Wonderland¶
We used callbacks with minimal explanation when working on sorting lists. To recap, we were able to give the sort method a function as an argument, and that function was used during sorting to derive comparison values from the list's items. We used this power to choose which property of the list's items was used for sorting, like this:
collection.sort(key=choose_length, reverse=reverse)
The special thing here is that we never call the choose_length function in our own code. It's called while the program's control has been temporarily handed over to the sort method. Giving the function itself (not its return value) as an argument tells the sort method that this is the function it should call whenever it needs a comparison value. Because the function is called from inside the sort method, the arguments given to it are also determined there. This means they're not in our control, and neither is what's done with the returned value. When implementing a callback function, it's very important to research what arguments the function will be given and what is done with its return value. This information tells us the number of parameters and their types, as well as what the function should return. The function in our example had exactly one parameter (one item from the list) and returned exactly one value:
def choose_length(album):
    return album["length"]
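To see that sort really is the one calling the key function, we can make the callback record each call. A small sketch with made-up album data; the calls list exists only for this demonstration:

```python
albums = [
    {"album": "A", "length": "01:02:01"},
    {"album": "B", "length": "00:53:18"},
    {"album": "C", "length": "00:42:10"},
]
calls = []  # records every album the callback was given


def choose_length(album):
    calls.append(album["album"])  # sort decides the argument: one item of the list
    return album["length"]        # sort uses this return value for comparisons


albums.sort(key=choose_length)    # the function itself, not choose_length()
print([album["album"] for album in albums])  # ['C', 'B', 'A']
print(len(calls))                            # 3
```

The key function is called once per item. Writing key=choose_length() instead would call the function immediately without an argument, which raises a TypeError before sort even starts.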
Reviewing this is important because the same mechanism is found in user interface and game libraries, only at a larger scale. Typically the main loop that runs the entire program is somewhere deep inside the library. In our program the main loop is currently the while True: loop in the menu function, and nothing like it will be seen in our code once it's changed to work with an interface library. In a way, the program's flow is out of our hands. The reason is quite plain: the main loop needs to react to every interaction the user has with the interface, and that takes quite a lot of code. Writing all of it ourselves isn't particularly pleasant, and the library probably does it better anyway.
Of course this leaves us wondering how to implement anything at all if control of the program has been taken out of our hands. This is where callbacks come in, and in this context they're also called handler functions. Before starting the main loop, we can tell the library which kinds of events are interesting to us. An event is something happening, like the user interacting with the interface. We can attach handler functions to these events, and whenever the designated event occurs, the attached handler is called.
With user interface libraries it's common that each active user interface component has its own handler. This is set up when the components are defined. For instance, if we want a button, we can attach a handler to it when it's created, and the library will call this handler when the user clicks the button. Our simple library has this set up so that there's only one function for creating a button, and it takes three arguments:
- the frame component where the button will be placed (more on frames later)
- the text on the button (string)
- function that will be the handler
Below is a high level description of what happens when a program using an interface library is started up.
- our program defines its user interface components along with their handlers
- our program calls the function that starts the library's main loop
- the library follows the user's actions
- the user does something that is interesting to our program (i.e. an event that we attached a handler to occurs)
- the library calls the handler function, effectively returning control to our program
- after the handler function returns, control moves back to the library
- a handler function in our program calls a function that exits the library's main loop, and control is returned to our program
- our program can perform cleanup before exiting (e.g. saving data)
- our program exits
Interface Simulator¶
Before moving on, it's best to look at the following approximation of a user interface library to get a better understanding of how they work in general. The code presented here is many times less complex than a real user interface library, but it works with similar logic. The library's functions can be used to define buttons and to start the library. Once running, it detects "clicks" (produced as random integer pairs instead of actually reading the mouse) and performs actions using the buttons' handler functions if a click hits one of them.
The single most important thing about this approximation is that you could replace it with the guilib library at any time: the function interfaces are exactly the same. To understand how user interface libraries work in general, we need to look at two specific functions in this approximation and at how those functions would be used in a program. For that purpose we've also created the following small program that creates an interface with a few buttons.
Executing the code does not create a real interactive window, because it only simulates what an actual library would do. Instead, you'll see dots printed into the terminal, and occasionally either "donkey" or "hemulen". Each dot is a single mouse click, and the appearance of a word means that a click hit a button. The program ends when the quit button is hit. A run of the program could look like this:
..................................donkey
............hemulen
..donkey
......................................................hemulen
..................donkey
..so long, and thanks for all the fish
Because the simulator's code is much simpler, cause and effect are easier to follow. Let's look at two functions in particular. Our goal is to understand why the program itself (i.e. librarytest.py) works like it does. The first half of the puzzle is the interface layout. From the program's viewpoint, an interface consists of frames (columns) and buttons (rows inside columns). The program can create frames and push buttons into them. On the library side this is handled by the create_button function.
def create_button(frame, label, action):
    left = window.index(frame) * BUTTON_WIDTH
    right = left + BUTTON_WIDTH
    top = len(frame) * BUTTON_HEIGHT
    bottom = top + BUTTON_HEIGHT
    frame.append({
        "left": left,
        "right": right,
        "top": top,
        "bottom": bottom,
        "label": label,
        "action": action
    })
This function calculates the position of each button inside the window and saves the x and y values of its edges in a dictionary. This marks the region of the window that belongs to the button. The width of each button is 200 units and the height is 60 units. The placement is based on the frame's index in the window list and the number of buttons already inside the frame. The other very important thing saved into the dictionary is the value of the action parameter. This value is a function that performs the action designated for the button. It's very important to note that the function is not called yet!
On the side of the program itself, a button is created by calling the create_button function after we've defined the function that will be the button's handler.
def print_donkey():
    print("donkey")
window = library.create_window("test")
frame = library.create_frame(window)
library.create_button(frame, "button 1", print_donkey)
Please pay attention to how the function is handled like any old variable: it is not called here, only handed over as an argument. The full example creates three buttons, which results in the following "interface":
The buttons are regions inside the window, and clicking the mouse within such a region causes the button to be pressed. This is the point where control is given to the library, to its start function to be precise. Below, we've removed some stuff from the full functions to make their logic easier to see:
def detect_button(x, y, window):
    for frame in window:
        for button in frame:
            if button["left"] <= x <= button["right"]:
                if button["top"] <= y <= button["bottom"]:
                    function = button["action"]
                    function()
                    return
def start():
    state["running"] = True
    while state["running"]:
        print(".", end="", flush=True)
        mouse_x, mouse_y = read_click()
        detect_button(mouse_x, mouse_y, window)
        # added to prevent the program from running too fast
        time.sleep(0.1)
    if state["draw"]:
        t.done()
The corresponding function in a real user interface library would obviously be much more complex, but ultimately it does the same things:
- read the position of a mouse click
- find out whether the click was inside a user interface element
- if an element was hit, execute its action and end the search
This gets repeated in a loop until the program's execution ends. In our simulator, the first step is handled by calling the read_click function, which provides the x, y coordinates of (imaginary) mouse clicks. The second step is implemented by going through all frames and their buttons in loops and comparing the button boundaries to the click coordinates. If the point is inside a button's boundaries, the button dictionary's "action" key is used to retrieve a reference to a function, and then the function is called (without arguments).
The key point here is to concretely show the context where handler functions are eventually called. As seen here, the arguments of the call are decided by the library's code at the moment of the call (and this time there aren't any). The syntax also shows that if a variable contains a function and parentheses are placed after the variable, the result is a call to the referenced function. When this happens, control temporarily returns to the actual program, inside the handler function. In the case of the first button, this function would be:
def print_donkey():
    print("donkey")
Therefore donkey is printed to the terminal. The dots printed into the terminal while the program runs indicate mouse clicks, regardless of whether a button was hit or not. When the library is "closed" with the quit function, control returns to the actual program and resumes from the line following the call to the start function. This is why "so long, and thanks for all the fish" is printed when the program ends.
print("so long, and thanks for all the fish")
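The whole mechanism fits into a few lines if the random clicks are replaced by a fixed list. This is a hypothetical, heavily stripped-down cousin of the simulator, not its actual code. There is only one column of buttons, start takes the clicks as an argument, and quit_loop is a made-up name. Buttons are dictionaries holding a region and a handler, and the "main loop" calls a handler whenever a click lands inside its region:

```python
BUTTON_WIDTH = 200   # same button size as in the simulator
BUTTON_HEIGHT = 60

buttons = []                 # a single column of buttons (one frame)
state = {"running": False}


def create_button(label, action):
    """Store the button's region and its handler; the handler is NOT called here."""
    top = len(buttons) * BUTTON_HEIGHT
    buttons.append({
        "top": top,
        "bottom": top + BUTTON_HEIGHT,
        "label": label,
        "action": action,
    })


def quit_loop():
    state["running"] = False


def start(clicks):
    """Stand-in main loop: go through fixed (x, y) clicks instead of reading a mouse."""
    state["running"] = True
    for x, y in clicks:
        if not state["running"]:
            break
        for button in buttons:
            if 0 <= x <= BUTTON_WIDTH and button["top"] <= y <= button["bottom"]:
                button["action"]()   # control moves to the program's handler
                break                # one click presses at most one button


# --- the "program" side ---
log = []


def print_donkey():
    log.append("donkey")
    print("donkey")


create_button("donkey", print_donkey)
create_button("quit", quit_loop)
start([(50, 10), (300, 10), (50, 70), (50, 20)])  # hit, miss, quit, never handled
print("so long, and thanks for all the fish")
```

The second click misses every button, the third one hits quit, and the fourth is never handled because the loop has already stopped. After that, control returns to the line after start.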
The library file shown earlier, which contains the full code, also has an option to visualize what happens in the window using turtle. You can activate this visualization by adding the -d or --draw command line argument when starting the program:
python librarytest.py --draw
Note that button labels will not show. The topmost button prints donkey, middle prints hemulen, and the last one quits the program. You can keep the terminal visible alongside the turtle window to see what's printed with each click.
Another cool detail: if you change the import on the first line of the test code to import guilib, you can run the test code and it will create a real interface. So the first line would be:
import guilib as library
Now running the program creates an actual window. The window geometry will be different, because the layout of elements is handled by the library, not by the program that uses it. This is described in more detail under the next heading.
Boxes and Packing¶
Before moving on to implementing features with handler functions, we should look into how interface components are defined with code. TKinter uses a method where the interface can be divided into frames and components. A frame is sort of like a list in Python, in that it can contain other components, including frames. Placement is based on packing against a border (although this is not the only option). When packed, a target direction is determined for a component. For instance, if the direction is up, the component tries to get as far up as possible inside the frame. Components are packed in the order they are added, which means that the first added component will be closest to the border it was packed against.
In general, all components inside a frame should be packed in the same direction to avoid silly holes in the interface. For simplicity, this is exactly what our custom library does: all components inside each frame are packed against the top border. Only the packing direction of frames themselves can be changed when using our custom library. The library also hides a bunch of other placement-related settings that TKinter offers, which limits the options quite a bit. However, it's not much of a loss. If you really want interfaces that look good, you should look beyond TKinter. One example is PySide 2, which brings Qt, a way more powerful (but also way more complex) interface library, to Python.
Shown below is a function that creates the shiny new graphical interface of our collection manager program, followed by a screenshot of what it looks like (on Linux). We've also added a quit function that serves as the handler for the quit button.
import guilib as ui
def quit():
    ui.quit()
def create_window():
    window = ui.create_window("Collection Manager 0.1 alpha")
    button_frame = ui.create_frame(window, ui.LEFT)
    collection_frame = ui.create_frame(window, ui.LEFT)
    load_button = ui.create_button(button_frame, "Load", load_collection)
    construct_button = ui.create_button(button_frame, "Construct", construct_collection)
    save_button = ui.create_button(button_frame, "Save", save_collection)
    ui.create_horiz_separator(button_frame, 5)
    add_button = ui.create_button(button_frame, "Add", add)
    remove_button = ui.create_button(button_frame, "Remove", remove)
    edit_button = ui.create_button(button_frame, "Edit", edit)
    ui.create_horiz_separator(button_frame, 5)
    quit_button = ui.create_button(button_frame, "Quit", quit)
    listbox = ui.create_listbox(collection_frame)
    ui.start()
if __name__ == "__main__":
    # source, target = read_arguments(sys.argv)
    try:
        create_window()
    except KeyboardInterrupt:
        print("Program was interrupted, collection was not saved.")
The functions for creating buttons and other components generally say they return an object. For now we save all of them to variables in order to refer to them later, although we don't actually know yet whether we'll need to. Frames are clearly referred to inside this same function but buttons aren't. Separators aren't active components in the interface, so the library doesn't even bother returning them. Another thing worth noting is that while we can run the code at the moment, most buttons (all except Quit) do not work. The main program has been changed to call the create_window function instead of the menu function, and we've commented out the part about reading command line arguments.
Information Smuggling¶
We don't need to look far to find out why the buttons are not working. The create_button function in the library has the following to say in its docstring:
Creates a button that the user can click. Buttons work through handler functions. There must be a function in your code that is called whenever the user presses the button. This function doesn't receive any arguments. The function needs to be given to this function as its handler argument. E.g.:
def donkey_button_handler():
    # something happens
create_button(frame, "donkey", donkey_button_handler)
Buttons are always packed against the top border of their frame which means they will be stacked on top of each other. If you want to pack them in a different way, you can always use this function as an example and write your own.
:param widget frame: frame that will host the buttons
:param str label: text on the button
:param function handler: function that is called when the button is pressed
:return: returns the created button object
The handler doesn't receive any arguments, whereas our existing functions do expect to get some. In other words they are not fit to be used as handlers as they are. There's no reason to throw them away entirely though. For instance, load_collection still does its job perfectly well. We just need to give it the path to the collection file in some other way. With a little further investigation we can discover a promising function in the library: open_file_dialog. Let's create a new function that calls the existing load_collection function once it has received a path from the open_file_dialog function. The same can be done for the construction feature, although it needs a different selection dialog (open_folder_dialog) because it asks for a folder instead of a file. We'll also remove the input from construct_collection and turn the folder into a parameter:
def construct_collection(folder):
    try:
        collection = sniffer.read_collection(folder)
    except FileNotFoundError:
        print("Folder not found")
        collection = []  # without this, the return below would raise UnboundLocalError
    return collection
def open_load_window():
    path = ui.open_file_dialog("Select collection file (JSON)")
    collection = load_collection(path)
def open_construct_window():
    path = ui.open_folder_dialog("Select music collection root folder")
    collection = construct_collection(path)
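The mismatch between a button's calling convention and our existing functions can be reproduced without any window at all. The sketch below is a hypothetical simulation: FakeButton and fake_file_dialog are stand-ins invented for illustration (they are not part of guilib), and load_collection is reduced to recording the path it receives. It shows that a function expecting an argument fails when called the way a button calls its handler, while a zero-argument wrapper that fetches the path itself works.

```python
# Hypothetical simulation of how a GUI button calls its handler.
# FakeButton and fake_file_dialog are illustrative stand-ins, not guilib functions.


class FakeButton:
    """Stores a handler and calls it without arguments when clicked."""

    def __init__(self, handler):
        self.handler = handler

    def click(self):
        # A real button also calls its handler with no arguments
        return self.handler()


loaded_paths = []


def load_collection(filename):
    # Simplified stand-in: only records which file would be loaded
    loaded_paths.append(filename)


def fake_file_dialog(title):
    # Stand-in for a file dialog; pretends the user always picks this file
    return "collection.json"


def open_load_window():
    # Zero-argument wrapper: gets the path itself, then delegates
    path = fake_file_dialog("Select collection file (JSON)")
    load_collection(path)


try:
    FakeButton(load_collection).click()
except TypeError as error:
    print("Direct use fails:", error)

FakeButton(open_load_window).click()
print(loaded_paths)  # ['collection.json']
```

The wrapper is the same idea as open_load_window in the material: the handler itself takes nothing, and gathers what the real work needs before calling it.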
This introduces another problem: the handler also cannot return anything, so how do we get the loaded or constructed collection to show up in other parts of the program? This is where the fact that lists and dictionaries are mutable becomes handy. If a mutable object is defined in the global scope, it can be accessed from all functions, and because it is mutable, those functions can change its contents without returning anything. This time we use some foresight and create a dictionary. This will allow us to store other objects that we might want to share under its keys:
components = {
    "collection": []
}
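Why a dictionary rather than a plain global variable? A small standalone sketch (the function names are made up for this illustration) shows the difference between changing the contents of a global dictionary inside a function and assigning to a global name inside a function. The latter only creates a new local variable. Python's global statement would be another way around that, but this material does not use it.

```python
# Standalone illustration; the function names are made up for this example.
components = {"collection": []}
collection = []


def load_into_dict(albums):
    # Changes the contents of the existing global dictionary: visible everywhere
    components["collection"] = albums


def load_into_name(albums):
    # Assignment inside a function creates a new local variable;
    # the global name collection is left untouched
    collection = albums
    return collection


load_into_dict(["Donkey Songs"])
load_into_name(["Donkey Songs"])
print(components["collection"])  # ['Donkey Songs']
print(collection)                # []
```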
As a side note, Pylint will complain about this (although we've disabled that particular warning in the checkers) because it thinks this dictionary is a constant since it's in the global scope. However, the data contained within this object will most definitely change during program execution, so giving it an uppercase name would be misleading. We can now change the load and construct functions to assign the collection into this dictionary:
def load_collection(filename):
    try:
        with open(filename) as source:
            components["collection"] = json.load(source)
    except (IOError, json.JSONDecodeError):
        print("Unable to open the target file. Starting with an empty collection.")
        components["collection"] = []
def construct_collection(folder):
    try:
        components["collection"] = sniffer.read_collection(folder)
    except FileNotFoundError:
        print("Folder not found")
def open_load_window():
    path = ui.open_file_dialog("Select collection file (JSON)")
    load_collection(path)
    show(components["collection"])
def open_construct_window():
    path = ui.open_folder_dialog("Select music collection root folder")
    construct_collection(path)
    show(components["collection"])
Since the returns were removed, the corresponding assignments of return values also had to go. We can now load or construct the collection. Next we need to make it visible in the interface. The library has a function for this called add_list_row, but we need to give it a listbox as an argument. Currently our listbox only exists inside the create_window function. The best way to make it available elsewhere is to put it into the dictionary we just cooked up. Let's rewrite the printing functions to write into the listbox instead of the terminal.
def format_row(album, i):
    return (
        f"{i:2}. "
        f"{album['artist']} - {album['album']} ({album['year']}) "
        f"[{album['no_tracks']}] [{album['length'].lstrip('0:')}]"
    )
def show(collection):
    for i, album in enumerate(collection):
        ui.add_list_row(components["listbox"], format_row(album, i + 1))
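The format_row function shortens the album length with lstrip('0:'). Note that str.lstrip does not remove a prefix: it strips any leading characters that belong to the given set, here zeros and colons. The hypothetical helper short_length below isolates that one expression so its behavior is easy to check, including the edge case where an all-zero length becomes an empty string.

```python
def short_length(length):
    """Hypothetical helper: the same expression format_row applies to album['length']."""
    # lstrip removes leading characters that are in the set {"0", ":"}
    return length.lstrip("0:")


for length in ["00:45:12", "00:05:30", "01:02:03", "10:00:00", "00:00:00"]:
    print(repr(length), "->", repr(short_length(length)))
```

A length such as "10:00:00" is left alone because its first character is not in the set, while "00:00:00" loses every character.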
In order for the listbox to be available like this, it needs to be saved into the dictionary when it's created, which can be done in create_window like this:
components["listbox"] = ui.create_listbox(collection_frame)
A single row is formatted in its own function because we predict it might be needed for updating a row after an album has been edited. Now the collection is printed nicely inside the window.
Popping Windows¶
This section contains a lot of code but not that many new concepts. The goal is to make it possible to add albums again. Since this was previously done with text inputs, a small legion of changes is needed. The basic concept is that pressing the Add button in the interface opens a new subwindow containing fields for album information. The album is added to the collection when this window is closed, provided the fields have valid values. Otherwise we inform the user about their mistake with an error message and let them fix it.
The library contains a few functions related to subwindows. A subwindow is a way to open another window on top of an existing window. We can place frames and components into it just like into the main window. A subwindow can be hidden and shown again with functions. A good approach is to create the window at the beginning of the program and then hide it whenever it's not needed; we prefer this over creating the window anew every time. The window will contain text fields for input and labels related to them. The whole thing is created in the original create_window function:
def create_window():
    # Main window creation
    window = ui.create_window("Collection Manager 0.1 alpha")
    button_frame = ui.create_frame(window, ui.LEFT)
    collection_frame = ui.create_frame(window, ui.LEFT)
    load_button = ui.create_button(button_frame, "Load", open_load_window)
    construct_button = ui.create_button(button_frame, "Construct", open_construct_window)
    save_button = ui.create_button(button_frame, "Save", open_save_window)
    ui.create_horiz_separator(button_frame, 5)
    add_button = ui.create_button(button_frame, "Add", open_add_window)
    remove_button = ui.create_button(button_frame, "Remove", remove)
    edit_button = ui.create_button(button_frame, "Edit", edit)
    ui.create_horiz_separator(button_frame, 5)
    quit_button = ui.create_button(button_frame, "Quit", quit)
    components["listbox"] = ui.create_listbox(collection_frame)
    # Subwindow creation
    album_form = ui.create_subwindow("Album information")
    field_frame = ui.create_frame(album_form)
    button_frame = ui.create_frame(album_form)
    label_frame = ui.create_frame(field_frame)
    input_frame = ui.create_frame(field_frame)
    ui.create_label(label_frame, "Artist")
    components["form_artist"] = ui.create_textfield(input_frame)
    ui.create_label(label_frame, "Album")
    components["form_album"] = ui.create_textfield(input_frame)
    ui.create_label(label_frame, "No. tracks")
    components["form_no_tracks"] = ui.create_textfield(input_frame)
    ui.create_label(label_frame, "Length")
    components["form_length"] = ui.create_textfield(input_frame)
    ui.create_label(label_frame, "Release year")
    components["form_year"] = ui.create_textfield(input_frame)
    ui.create_button(button_frame, "Save", save_form)
    ui.hide_subwindow(album_form)
    components["album_form"] = album_form
    ui.start()
References to each field in the form and to the form itself are needed in the components dictionary so that the fields can be read in other parts of the program, and so that we can show and hide the window later. We also changed the Add button's handler to a new function that opens the add dialog. Likewise a handler is created for the subwindow's Save button:
def open_add_window():
    ui.show_subwindow(components["album_form"])
def save_form():
    ui.hide_subwindow(components["album_form"])
With these we can open and close the form and see what it looks like. The labels don't quite align with the fields, but we're not going to tune them right now.
Next we need this form to actually do something. This calls for some decision-making and planning. We've decided to use the same form for both adding and editing. We've also decided to save the album when the window is closed (when else?). This means we need to know what purpose the form was opened for, and smuggle this information to the save_form function. We can use the same mechanism we use for accessing the collection list from everywhere in the program: save this information into a global dictionary. While we're at it, we're also going to move this and the collection list into a second dictionary, and leave the components dictionary only for interface component references:
NOT_SELECTED = 0
ADD = 1
EDIT = 2
components = {
    "listbox": None,
    "album_form": None,
    "form_artist": None,
    "form_album": None,
    "form_no_tracks": None,
    "form_length": None,
    "form_year": None
}
state = {
    "collection": [],
    "action": NOT_SELECTED
}
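One practical reason to prefer named constants over string literals is what happens with a typo. In the sketch below (illustrative names, not from the collection manager), a misspelled string simply never matches, so the bug passes silently, while a misspelled constant name raises a NameError as soon as the line runs. Pylint would also flag the undefined name before the program is even started.

```python
ADD = 1

string_state = {"action": "add"}
constant_state = {"action": ADD}


def is_adding_with_string():
    # Typo in the literal ("ad" instead of "add"): just False, no error
    return string_state["action"] == "ad"


def is_adding_with_constant():
    # Typo in the constant name (ADDD instead of ADD): NameError when run
    return constant_state["action"] == ADDD  # deliberate typo


print(is_adding_with_string())  # False, and nothing tells us why
try:
    is_adding_with_constant()
except NameError as error:
    print("Caught early:", error)
```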
We've implemented the actions with constants. The numeric values of these constants don't matter at all; named constants are simply more practical than strings, let alone bare numbers. We've also put None as the value for each key in the components dictionary. This is not mandatory, but it shows at the very beginning of the code which keys will be available in this dictionary. Using the action information in the state dictionary we can now proceed with the album form:
def save_form():
    if state["action"] == ADD:
        success = add(state["collection"])
        place = len(state["collection"]) - 1
    elif state["action"] == EDIT:
        # First guess: place is not set for editing yet, this branch is completed later
        success = edit(state["collection"])
    else:
        return
    if success:
        ui.add_list_row(
            components["listbox"],
            format_row(state["collection"][place], place + 1),
            place
        )
        ui.clear_field(components["form_artist"])
        ui.clear_field(components["form_album"])
        ui.clear_field(components["form_no_tracks"])
        ui.clear_field(components["form_length"])
        ui.clear_field(components["form_year"])
        ui.hide_subwindow(components["album_form"])
        state["action"] = NOT_SELECTED
The action is set when the form is opened (for adding, open_add_window needs to set state["action"] to ADD before showing the form), and its value is checked when the form is closed with the Save button. We've also done some additional processing when the form is closed. We only want to close the form when the user has given valid data. We also need to clear all the fields so that their contents don't haunt the user the next time they open the form. In case of a successful save, the album must also be inserted into the listbox view. Another option would be to clear the entire listbox and then call the show function to display the entire collection afresh, but that wastes a whole lot of clock cycles. The add function itself becomes quite a bit larger:
def add(collection):
    artist = ui.read_field(components["form_artist"])
    album = ui.read_field(components["form_album"])
    try:
        no_tracks = int(ui.read_field(components["form_no_tracks"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Number of tracks must be an integer", error=True)
        return False
    try:
        length = check_length(ui.read_field(components["form_length"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Length must be written as HH:MM:SS", error=True)
        return False
    try:
        year = int(ui.read_field(components["form_year"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Release year must be an integer", error=True)
        return False
    collection.append({
        "artist": artist,
        "album": album,
        "no_tracks": no_tracks,
        "length": length,
        "year": year
    })
    return True
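The add function relies on check_length raising ValueError when the length isn't written as HH:MM:SS, but the material leaves check_length without an implementation. The following is only a hypothetical sketch of one way it could honor that contract: split the text on colons, convert each part with int (which raises ValueError on its own for non-numeric parts), check the ranges, and return a zero-padded HH:MM:SS string.

```python
def check_length(text):
    """Hypothetical sketch: validate an HH:MM:SS string.

    Returns the length as a zero-padded HH:MM:SS string, or raises
    ValueError, which is what the add function's try-except expects.
    """
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {text!r}")
    # int() raises ValueError by itself if a part is not a number
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"time out of range in {text!r}")
    return f"{hours:02}:{minutes:02}:{seconds:02}"


print(check_length("1:02:3"))  # 01:02:03
```

Returning a normalized string keeps the stored lengths in one consistent format, which is what format_row assumes when it strips the leading zeros.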
The main reason this function became so long is user feedback: each error opens a message popup with a different error message, and that makes each of them require its own try-except. We're now using the library's message popup feature, which can be used to show notifications in popup windows. The last argument, which we've given here as a keyword argument for increased clarity, tells the library to show the error icon in the popup. The form's contents are read with the read_field function, and this is where we need the references to fields from the components dictionary. This function returns the field's content as a string. Note that check_length still doesn't actually do anything, but at least its errors are already handled for when it eventually does.
Streamlined Renovation¶
Removing albums used to be very clunky in the program: in order to select an album, the user had to type both album title and artist name. To keep up with the times, our program should allow choosing an album from the listbox in the interface with a simple mouse click. This is the primary reason we used a listbox instead of a plain textbox: in a listbox each row is a clickable entity. Our library has a function for handling this: read_selected. This function returns the index and content of the selected row. The library also has a function for removing a row. With these the remove function becomes a whole lot simpler:
def remove():
    index, contents = ui.read_selected(components["listbox"])
    if index is not None:
        state["collection"].pop(index)
        ui.remove_list_row(components["listbox"], index)
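The reason pop suits the listbox is that an index identifies one exact row, whereas list.remove deletes the first item that is equal to the given value. With duplicate entries the two can disagree. In this sketch the album names are plain strings made up for illustration; with album dictionaries the same thing would happen if two entries had identical contents.

```python
albums = ["Donkey Songs", "Hemulen Hits", "Donkey Songs"]
selected = 2  # the user clicked the last row

by_value = albums.copy()
by_value.remove(albums[selected])  # removes the FIRST equal item (index 0)

by_index = albums.copy()
removed = by_index.pop(selected)   # removes exactly the selected row and returns it

print(by_value)  # ['Hemulen Hits', 'Donkey Songs']
print(by_index)  # ['Donkey Songs', 'Hemulen Hits']
```

Only the pop version still matches the listbox after row 2 has been removed from it.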
This is the first time we use the pop method instead of remove to remove an item. We do this because pop removes based on index instead of value. It would also return the item it removed, but we're not doing anything with it so it goes to the bin. The last line is needed to remove the album from the listbox. This leaves us with the minor problem of having a hole in the numbering after a removal. We're going to be lazy about this and "fix" the problem by removing the numbering altogether. Otherwise we'd have to reprint all rows starting from the removed index. Since we removed the collection parameter, this function can be used directly as the Remove button's handler.
The same method of album selection can be used for editing. This feature will be a combination of the add feature from before and the remove feature we just did. We borrow the editing form from the former and the selection code from the latter. We're going to open the same subwindow as we did with add, but this time each field is prefilled with its current value. In addition, the album should be shown in its old place in the listbox after editing. So once again we need to make some decisions about what happens where. The easiest place to start is opening the form.
def open_edit_window():
    place = prefill_form()
    if place is None:
        # Nothing is selected in the listbox, so there is nothing to edit
        return
    ui.show_subwindow(components["album_form"])
    state["action"] = EDIT
    state["selected"] = place
The form must be prefilled before it is shown. This sounds like a job for a separate function. We also let that function take care of reading the selected album's place in the list (i.e. its index in the collection) and returning it, or None if no row is selected. Another decision we made here is saving the selected index to the state dictionary. This is a safeguard: if the user picks another album in the listbox while the form window is open, the edited information still goes to the album that was originally selected instead of overwriting the wrong one. The new function is:
def prefill_form():
    index, contents = ui.read_selected(components["listbox"])
    if index is None:
        return None
    album = state["collection"][index]
    ui.write_field(components["form_artist"], album["artist"])
    ui.write_field(components["form_album"], album["album"])
    ui.write_field(components["form_no_tracks"], album["no_tracks"])
    ui.write_field(components["form_length"], album["length"])
    ui.write_field(components["form_year"], album["year"])
    return index
Now we can open the form and see the existing values in all fields.
The save button handler already exists but the guess we made about how to handle saving an edit wasn't entirely accurate. Let's add some things to it.
def save_form():
    if state["action"] == ADD:
        success = add(state["collection"])
        place = len(state["collection"]) - 1
    elif state["action"] == EDIT:
        place = state["selected"]
        success = edit(state["collection"], place)
        if success:
            ui.remove_list_row(components["listbox"], place)
            state["selected"] = None
    else:
        return
    if success:
        ui.add_list_row(
            components["listbox"],
            format_row(state["collection"][place], place + 1),
            place
        )
        ui.clear_field(components["form_artist"])
        ui.clear_field(components["form_album"])
        ui.clear_field(components["form_no_tracks"])
        ui.clear_field(components["form_length"])
        ui.clear_field(components["form_year"])
        ui.hide_subwindow(components["album_form"])
        state["action"] = NOT_SELECTED
As seen here, we chose to read the place from the state dictionary's "selected" key that was set when the form was opened. The edit itself is done by the edit function. If it reports a successful edit, the old row is removed from the listbox so that we can write the updated row in its place. Adding the row into the listbox and the cleanup didn't change, so we did pretty well on that part. All that's left is changing the edit function:
def read_form(album):
    album["artist"] = ui.read_field(components["form_artist"])
    album["album"] = ui.read_field(components["form_album"])
    try:
        album["no_tracks"] = int(ui.read_field(components["form_no_tracks"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Number of tracks must be an integer", error=True)
        return None
    try:
        album["length"] = check_length(ui.read_field(components["form_length"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Length must be written as HH:MM:SS", error=True)
        return None
    try:
        album["year"] = int(ui.read_field(components["form_year"]))
    except ValueError:
        ui.open_msg_window("Error in data", "Release year must be an integer", error=True)
        return None
    return album
def edit(collection, index):
    album = read_form(collection[index].copy())
    if album:
        collection[index] = album
        return True
    return False
def add(collection):
    album = read_form({})
    if album:
        collection.append(album)
        return True
    return False
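The .copy() call in edit is what keeps a failed validation from corrupting the stored album: read_form writes into the dictionary it is given field by field, so it may already have overwritten the artist and album title before it notices a bad number. The sketch below reproduces this without a GUI. read_fields and edit_fields are hypothetical stand-ins that read from a plain dictionary of strings instead of form fields, and only the number of tracks is validated to keep it short.

```python
def read_fields(fields, album):
    """Hypothetical stand-in for read_form: reads a dict of strings, not GUI fields."""
    album["artist"] = fields["artist"]
    album["album"] = fields["album"]
    try:
        album["no_tracks"] = int(fields["no_tracks"])
    except ValueError:
        return None  # validation failed, but artist and album were already written
    return album


def edit_fields(collection, index, fields, use_copy=True):
    # With use_copy=True this mirrors edit(): work on a copy, store it only on success
    target = collection[index].copy() if use_copy else collection[index]
    album = read_fields(fields, target)
    if album:
        collection[index] = album
        return True
    return False


collection = [{"artist": "Hemulen", "album": "Stamps", "no_tracks": 10}]
bad_input = {"artist": "Donkey", "album": "Hee-Haw", "no_tracks": "ten"}

edit_fields(collection, 0, bad_input, use_copy=True)
print(collection[0]["artist"])  # Hemulen: the stored album is untouched

edit_fields(collection, 0, bad_input, use_copy=False)
print(collection[0]["artist"])  # Donkey: a half-finished edit leaked in
```

A shallow copy is enough here because every value in an album dictionary is a string or a number.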
Because both adding and editing need similar form reading, it was refactored into its own function. That's why we're also showing how the add function was changed from what it was. And that closes the chapter on our collection manager. Sorting features were left out from this version because we just wanted to show how to tie functions to interface elements, and how to pass data and state information between different parts of the program. The old sorting function was left in the code as an example, and it can be fairly easily converted to work with the new interface. One way is to make a button for each column that sorts the collection based on that column, and reverses the order if pressed again.
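The column sorting idea mentioned above could be sketched like this. make_sort_handler is a hypothetical helper, not part of the material or guilib: it builds a zero-argument function, as button handlers require, that sorts the shared collection by one key and flips the order when the same button is pressed twice in a row. In the real program each such handler would be given to ui.create_button and would also have to refresh the listbox afterwards, which is left out here.

```python
def make_sort_handler(state, key):
    """Hypothetical helper: build a zero-argument sort handler for one column.

    Pressing the same column twice in a row reverses the order.
    """
    def handler():
        reverse = state.get("sorted_by") == key and not state.get("reversed", False)
        state["collection"].sort(key=lambda album: album[key], reverse=reverse)
        state["sorted_by"] = key
        state["reversed"] = reverse
        # The real handler would also clear and refill the listbox here
    return handler


state = {"collection": [
    {"artist": "Hemulen", "year": 1999},
    {"artist": "Donkey", "year": 2005},
]}
sort_by_year = make_sort_handler(state, "year")
sort_by_year()
print([album["year"] for album in state["collection"]])  # [1999, 2005]
sort_by_year()
print([album["year"] for album in state["collection"]])  # [2005, 1999]
```

Remembering the last sort key and direction in the state dictionary follows the same information-smuggling approach the material uses for the form action.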
The final file has been prettified a bit with Pylint (e.g. we removed unused variables from window creation because the buttons ended up not being referenced).
The Very Final Words¶
Four long materials later we've come from the basic functionality of a calculator to programs that approach magic. With just a couple of clicks the program can find all albums in the computer's music library on the hard drive, a feature that exists in modern music player applications as well. While doing all this we discovered that thousands of lines of code were not needed, and we didn't even need anything particularly difficult (relatively speaking, admittedly). It's all a matter of putting together small pieces one after the other, creating a much larger whole. The result may feel like magic, but ultimately it was just a clean, systematic process. The most important thing is to not tackle too big of a problem at once.
Of course when reading the examples it may seem that answers are found a bit too easily. However, it's not really about how quickly the answers are found; it's more about asking small enough questions. That way the answers will also be small, and the code will stay organized in a manageable way. If you look at the code examples, the longest function is only 40 lines, and all individual features of the program are made with quite simple structures. Of course it's possible to go deeper and create more complex code, but why bother? Complex code can be entertaining to write in itself, but in efficient programming simplicity is a virtue.
This material mostly filled some holes that were not covered by the first three. It is quite hard to do anything productive without using modules, although Python's built-in modules can take you quite far. But when you run out of means, it's good to keep in mind that someone else has probably already solved your problem, unless it is very specific. Even those cases are usually just special cases of a more generic problem that someone has solved. This was seen in the examples: we found existing solutions for both data storage and music file metadata reading. One was inside Python, and we didn't have to go that far to find the other either.
At the very end we saw a glimpse of programming that's a bit more modern than terminal programs. Admittedly we went beyond the basics in this section, but it's hard to get far in modern programming without knowing anything about handler functions and their friends. Then again, if you know just a little bit about them, a lot of doors are opened: modern libraries can do really impressive things with the most basic knowledge in relatively short time. Overall your creativity is not hindered nearly as much by implementation details as it would be if you tried to do everything yourself.
After the last material fades away all that's left are the final exercises, and of course the course project. At this point you possess everything you need to complete the project, and some initial planning has already been done. With a systematic approach the course project should not be that herculean of a task. It can definitely take some effort, especially if some answers don't come to you immediately. But as long as you don't tackle too big problems at once, you shouldn't trip yourself up, and even the slowest progress is still progress. Just follow the plan, one small piece at a time, and remember to test your program at every turn to make sure it does what you thought it should do. If you always make progress, conquering the world is just around the corner.
Image Sources¶
- original license: public domain (caption added)
- original license: CC-BY-NC 2.0 (caption added)
- original license: CC-BY-NC 2.0 (caption added)
- original license: CC-BY 2.0 (caption added)
- original license: public domain (caption added)