The Statistics and Calculus with Python Workshop
上QQ阅读APP看书,第一时间看更新

Data Structures

Data structures are types of variables that represent different forms of information that you might want to create, store, and manipulate in your program. Together with control flow methods, data structures are the other fundamental building block of any programming language. In this section, we will go through some of the most common data structures in Python, starting with strings.

Strings

Strings are sequences of characters that are typically used to represent textual information (for example, a message). A Python string is denoted by any given textual data inside either single- or double-quotation marks. For example, in the following code snippet, the a and b variables hold the same information:

a = 'Hello, world!'

b = "Hello, world!"

Since strings are roughly treated as sequences in Python, common sequence-related operations can be applied to this data structure. In particular, we can concatenate two or more strings together to create a long-running string, we can iterate through a string using a for loop, and inpidual characters and substrings can be accessed using indexing and slicing. The effects of these operations are demonstrated in the following code:

>>> a = 'Hello, '

>>> b = 'world!'

>>> print(a + b)

Hello, world!

>>> for char in a:

...     print(char)

H

e

l

l

o

,

 # a blank character printed here, the last character in string a

>>> print(a[2])

l

>>> print(a[1: 4])

ell

One of the most important features that was added in Python 3.6 was f-strings, a syntax to format strings in Python. Since we are using Python 3.7, we can avail this feature. String formatting is used when we would like to insert the value of a given variable into a predefined string. Before f-strings, there were two other formatting options, which you may be familiar with: %-formatting and str.format(). Without going into too much detail, these two methods have a few undesirable characteristics, and f-strings was therefore developed to address those problems.

The syntax for f-strings is defined with curly brackets, { and }. For example, we can combine the printed value of a variable using an f-string as follows:

>>> a = 42

>>> print(f'The value of a is {a}.')

The value of a is 42.

When a variable is put inside the f-string curly brackets, its __str__() representation will be used in the final printed output. This means you can obtain further flexibility with f-strings by overwriting and customizing the dunder method, __str__(), while working with Python objects.

Common numeric formatting options for strings such as specifying the number of digits after the decimal or datetime formatting can be done in f-strings using the colon, as demonstrated here:

>>> from math import pi

>>> print(f'Pi, rounded to three decimal places, is {pi:.3f}.')

Pi, rounded to three decimal places, is 3.142.

>>> from datetime import datetime

>>> print(f'Current time is {datetime.now():%H:%M}.')

Current time is 21:39.

Another great thing about f-strings is that they are faster to render and process than the other two string formatting methods. Next, let's discuss Python lists.

Lists

Lists are arguably the most used data structure in Python. It is Python's own version of an array in Java or C/C++. A list is a sequence of elements that can be accessed or iterated over in order. Unlike, say, Java arrays, elements in a Python list do not have to be of the same data structure, as demonstrated here:

>>> a = [1, 'a', (2, 3)] # a list containing a number, a string, and a tuple

Note

We'll talk more about tuples in the next section.

As we have discussed previously, elements in a list can be iterated over in a for loop in a similar way as characters in a string. Lists can also be indexed and sliced in the same way as strings:

>>> a = [1, 'a', (2, 3), 2]

>>> a[2]

(2, 3)

>>> a[1: 3]

['a', (2, 3)]

There are two ways to add new elements to a Python list: append() inserts a new single element to the end of a list, while list concatenation simply concatenates two or more strings together, as shown here:

>>> a = [1, 'a', (2, 3)]

>>> a.append(3)

>>> a

[1, 'a', (2, 3), 3]

>>> b = [2, 5, 'b']

>>> a + b

[1, 'a', (2, 3), 3, 2, 5, 'b']

To remove an element from a list, the pop() method, which takes in the index of the element to be removed, can be used.

One of the operations that make Python lists unique is list comprehension: a Pythonic syntax to efficiently initialize lists using a for loop placed inside square brackets. List comprehension is typically used when we want to apply an operation to an existing list to create a new list. For example, say we have a list variable, a, containing some integers:

>>> a = [1, 4, 2, 9, 10, 3]

Now, we want to create a new list, b, whose elements are two times the elements in a, in order. We could potentially initialize b as an empty list and iteratively loop through a and append the appropriate values to b. However, with list comprehension, we can achieve the same result with a more elegant syntax:

>>> b = [2 * element for element in a]

>>> b

[2, 8, 4, 18, 20, 6]

Furthermore, we can even combine conditionals inside a list comprehension to implement complex logic in this process of creating Python lists. For example, to create a list of twice the elements in a that are odd numbers, we can do the following:

>>> c = [2 * element for element in a if element % 2 == 1]

>>> c

[2, 18, 6]

Another Python data structure that is very often contrasted with list is tuple, which we will discuss in the next section. However, before moving forward, let's go through an exercise on a new concept: multi-dimensional lists/arrays.

Multi-dimensional arrays, also known as tables or matrices (and sometimes tensors), are common objects in the field of mathematics and machine learning. Given the fact that elements in a Python list can be any Python objects, we can model arrays that span more than one dimension using lists in a list. Specifically, imagine that, within an overarching Python list, we have three sublists, each having three elements in it. This object can be thought of as a 2D, 3 x 3 table. In general, we can model n-dimensional arrays using Python lists that are nested inside other lists n times.

Exercise 1.03: Multi-Dimensional Lists

In this exercise, we will familiarize ourselves with the concept of multi-dimensional lists and the process of iterating through them. Our goal here is to write logic commands that dynamically display the content of a 2D list.

Perform the following steps to complete this exercise:

  1. Create a new Jupyter notebook and declare a variable named a in a code cell, as follows:

    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    This variable represents a 3 x 3 2D table, with the inpidual sublists in the list representing the rows.

  2. In a new code cell, iterate through the rows by looping through the elements in list a (do not run the cell just yet):

    for row in a:

  3. At each iteration in this for loop, a sublist in a is assigned to a variable called row. We can then access the inpidual cells in the 2D table by indexing the inpidual rows. The following for loop will print out the first element in each sublist, or in other words, the number in the first cell of each row in the table (1, 4, and 7):

    for row in a:

        print(row[0])

  4. In a new code cell, print out the values of all the cells in table a by having a nested for loop, whose inner loop will iterate through the sublists in a:

    for row in a:

        for element in row:

            print(element)

    This should print out the numbers from 1 to 9, each in a separate row.

  5. Finally, in a new cell, we need to print out the diagonal elements of this table in a nicely formatted message. To do this, we can have an indexing variable — i, in our case — loop from 0 to 2 to access the diagonal elements of the table:

    for i in range(3):

        print(a[i][i])

    Your output should be 1, 5, and 9, each in a separate row.

    Note

    This is because the row index and the column index of a diagonal element in a table/matrix are equal.

  6. In a new cell, change the preceding print statements using f-strings to format our printed output:

    for i in range(3):

        print(f'The {i + 1}-th diagonal element is: {a[i][i]}')

    This should produce the following output:

    The 1-th diagonal element is: 1

    The 2-th diagonal element is: 5

    The 3-th diagonal element is: 9

In this exercise, we have combined what we have learned about loops, indexing, and f-string formatting to create a program that dynamically iterates through a 2D list.

Note

To access the source code for this specific section, please refer to https://packt.live/3dRP8OA.

You can also run this example online at https://packt.live/3gpg4al.

Next, we'll continue our discussion about other Python data structures.

Tuples

Declared with parentheses instead of square brackets, Python tuples are still sequences of different elements, similar to lists (although the parentheses can be omitted in assignment statements). The main difference between these two data structures is that tuples are immutable objects in Python—this means they cannot be mutated, or changed, in any way after their initialization, as shown here:

>>> a = (1, 2)

>>> a[0] = 3 # trying to change the first element

Traceback (most recent call last):

    File "<stdin>", line 1, in <module>

TypeError: 'tuple' object does not support item assignment

>>> a.append(2) # trying to add another element

Traceback (most recent call last):

    File "<stdin>", line 1, in <module>

AttributeError: 'tuple' object has no attribute 'append'

Given this key difference between tuples and lists, we can utilize these data structures accordingly: when we want a sequence of elements to be immutable for any number of reasons (for example, to ensure the logical integrity functions), a tuple can be used; if we allow the sequence to be able to be changed after its initialization, it can be declared as a list.

Next, we will be discussing a common data structure in mathematical computing: sets.

Sets

If you are already familiar with the mathematical concept, the definition of a Python set is essentially the same: a Python set is a collection of unordered elements. A set can be initialized with curly brackets, and a new element can be added to a set using the add() method, like so:

>>> a = {1, 2, 3}

>>> a.add(4)

>>> a

{1, 2, 3, 4}

Since a set is a collection of Python elements, or in other words, an iterator, its elements can still be iterated over using a for loop. However, given its definition, there is no guarantee that those elements will be iterated in the same order as they are initialized in or added to the set.

Furthermore, when an element that already exists in a set is added to that set, the statement will have no effect:

>>> a

{1, 2, 3, 4}

>>> a.add(3)

>>> a

{1, 2, 3, 4}

Taking the union or the intersection of two given sets are the most common set operations and can be achieved via the union() and intersection() methods in Python, respectively:

>>> a = {1, 2, 3, 4}

>>> b = {2, 5, 6}

>>> a.union(b)

{1, 2, 3, 4, 5, 6}

>>> a.intersection(b)

{2}

Finally, to remove a given element from a set, we can use either the discard() method or the remove() method. Both remove the item passed to them from a set. However, if the item does not exist in the set, the former will not mutate the set, while the latter will raise an error. Just like tuples and lists, you can choose to use one of these two methods in your program to implement specific logic, depending on your goal.

Moving on, the last Python data structure that we will be discussing in this section is dictionaries.

Dictionaries

Python dictionaries are the equivalent of hash maps in Java, where we can specify key-value pair relationships and perform lookups on a key to obtain its corresponding value. We can declare a dictionary in Python by listing out key-value pairs in the form of key: value, separated by commas inside curly brackets.

For example, a sample dictionary containing students' names mapped to their final scores in a class may look as follows:

>>> score_dict = {'Alice': 90, 'Bob': 85, 'Carol': 86}

>>> score_dict

{'Alice': 90, 'Bob': 85, 'Carol': 86}

In this case, the names of the students ('Alice', 'Bob', and 'Carol') are the keys of the dictionary, while their respective scores are the values that the keys are mapped to. A key cannot be used to map to multiple different values. The value of a given key can be accessed by passing the key to the dictionary inside square brackets:

>>> score_dict['Alice']

90

>>> score_dict['Carol']

86

>>> score_dict['Chris']

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

KeyError: 'Chris'

Note that in the last statement in the preceding snippet, 'Chris' is not a key in the dictionary, so when we attempt to access its value, KeyError is returned by the Python interpreter.

Changing the value of an existing key or adding a new key-value pair to an existing dictionary can be done using the same syntax:

>>> score_dict['Alice'] = 89

>>> score_dict

{'Alice': 89, 'Bob': 85, 'Carol': 86}

>>> score_dict['Chris'] = 85

>>> score_dict

{'Alice': 89, 'Bob': 85, 'Carol': 86, 'Chris': 85}

Similar to list comprehension, a Python dictionary can be declared using dictionary comprehension. For instance, the following statement initializes a dictionary mapping integers from -1 to 1 (inclusively) to their respective squares:

>>> square_dict = {i: i ** 2 for i in range(-1, 2)}

>>> square_dict

{-1: 1, 0: 0, 1: 1}

As we can see, this dictionary contains the key-value pairs xx ** 2 for every x between -1 and 1, which was done by placing the for loop inside the initialization of the dictionary.

To delete a key-value pair from a dictionary, we would need to use the del keyword. Say we would like to delete the 'Alice' key and its corresponding value. We would do this like so:

>>> del score_dict['Alice']

Attempting to access a deleted key will cause the Python interpreter to raise an error:

>>> score_dict['Alice']

KeyError: 'Alice'

One of the most important aspects of Python dictionaries to keep in mind is the fact that only immutable objects can be dictionary keys. In the examples so far, we have seen strings and numbers as dictionary keys. Lists, which can be mutated and changed after initialization, cannot be used as dictionary keys; tuples, on the other hand, can.

Exercise 1.04: Shopping Cart Calculations

In this exercise, we will use the dictionary data structure to build a skeletal version of a shopping application. This will allow us to review and further understand the data structure and the operations that can be applied to it.

Perform the following steps to complete this exercise:

  1. Create a new Jupyter notebook and declare a dictionary representing any given items available for purchase and their respective prices in the first code cell. Here, we'll add three different types of laptops with their prices in dollars:

    prices = {'MacBook 13': 1300, 'MacBook 15': 2100, \

              'ASUS ROG': 1600}

    Note

    The code snippet shown here uses a backslash ( \ ) to split the logic across multiple lines. When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.

  2. In the next cell, initialize a dictionary representing our shopping cart. The dictionary should be empty at the beginning, but it should map an item in the cart to how many copies are to be purchased:

    cart = {}

  3. In a new cell, write a while True loop that represents each step of the shopping process and asks the user whether they would like to continue shopping or not. Use conditionals to handle different cases of the input (you can leave the case where the user wants to continue shopping until the next step):

    while True:

        _continue = input('Would you like to continue '\

                          'shopping? [y/n]: ')

        if _continue == 'y':

            ...

        elif _continue == 'n':

            break

        else:

            print('Please only enter "y" or "n".')

  4. Inside the first conditional case, take in another user input to ask which item should be added to the cart. Use conditionals to increment the count of the item in the cart dictionary or handle invalid cases:

        if _continue == 'y':

            print(f'Available products and prices: {prices}')

            new_item = input('Which product would you like to '\

                             'add to your cart? ')

            if new_item in prices:

                if new_item in cart:

                    cart[new_item] += 1

                else:

                    cart[new_item] = 1

            else:

                print('Please only choose from the available products.')

  5. In the next cell, loop through the cart dictionary and calculate the total amount of money the user has to pay (by looking up the quantity and price of each item in the cart):

    # Calculation of total bill.

    running_sum = 0

    for item in cart:

        running_sum += cart[item] * prices[item] # quantity times price

  6. Finally, in a new cell, print out the items in the cart and their respective amount in different lines via a for loop and at the end the total bill. Use an f-string to format the printed output:

    print(f'Your final cart is:')

    for item in cart:

        print(f'- {cart[item]} {item}(s)')

    print(f'Your final bill is: {running_sum}')

  7. Run the program and experiment with different carts to ensure our program is correct. For example, if you were to add two MacBook 13s and one ASUS ROG to my shopping cart and stop, the corresponding output would be as follows:

Figure 1.2: Output of the shopping cart application

And that concludes our shopping cart exercise, through which we have familiarized ourselves with the use of dictionaries to look up information. We have also reviewed the use of conditionals and loops to implement control flow methods in a Python program.

Note

To access the source code for this specific section, please refer to https://packt.live/2C1Ra1C.

You can also run this example online at https://packt.live/31F7QXg.

In the next section, we will discuss two integral components of any complex program: functions and algorithms.