Ordered Collections - Sequence
Python provides three data structures for ordered collections, also called sequences: lists, tuples and ranges.
In this lesson we will learn about all three.
Lists represent ordered, mutable collections of objects, both compound and primitive. Here is an example of a list of numbers called scores.
scores = [50, 80, 90, 100]
Notice the square brackets used to represent the collection. A list can also be created with the list function. In that case the above statement would be:
scores = list([50, 80, 90, 100])
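The list function accepts any iterable, not only another list. A minimal sketch, where the names letters and empty are illustrative and not part of the lesson:

```python
# list() builds a new list from any iterable.
scores = list((50, 80, 90, 100))   # from a tuple
letters = list("abc")              # from a string: one element per character
empty = list()                     # no argument gives an empty list

print(scores)   # [50, 80, 90, 100]
print(letters)  # ['a', 'b', 'c']
print(empty)    # []
```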
Here are some of the commonly used operations on lists:
|Positive positional index||scores[0]||50||the first position in the list has the value 50|
|Negative positional index||scores[-1]||100||first position from the end.|
|Slicing||scores[0:2]||[50, 80]||new list with numbers from position 0 (included) to 2 (excluded)|
|Slicing with start position default||scores[:2]||[50, 80]||new list with numbers from index 0 up to position 2 (excluded)|
|Slicing with end position default||scores[2:]||[90, 100]||new list with numbers from position 2 to end of the list|
|Built-in function len||len(scores)||4||len can be applied to lists to obtain the size of the list|
|Modify an element||scores[0] = 60||[60, 80, 90, 100]||lists can be modified, unlike strings|
|Add new element||scores.append(90)||[60, 80, 90, 100, 90]||adds a new element at the end of the list|
|Use slice to replace more than one element||scores[2:4] = [89, 99]||[60, 80, 89, 99, 90]||numbers in positions 2 and 3 are replaced|
|Use slice to remove all elements||scores[:] = []||[]||removes all elements of the list!|
|Use slice with a step value||scores[::-1]||[90, 99, 89, 80, 60]||using -1 as step value reverses the list.|
|List of lists||scores = [50, 60]; c = [scores, ['joe', 'john']]||[[50, 60], ['joe', 'john']]||c is a list of lists|
|Sort a list in place||scores = [60, 30, 80]; scores.sort()||[30, 60, 80]||scores is sorted in place using the default order, smallest to largest. The default can be changed by passing the optional 'key' and 'reverse' arguments to the sort method.|
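The operations in the table above can be tried in one short script. The append call and the slice assignments shown here are one common way to produce the results listed in the table, so treat this as an illustrative sketch. The helper names (first, head, nested and so on) are made up for the example.

```python
scores = [50, 80, 90, 100]

# Reading: indexing, slicing and len never change the list.
first = scores[0]     # 50
last = scores[-1]     # 100
head = scores[0:2]    # [50, 80]
tail = scores[2:]     # [90, 100]
size = len(scores)    # 4

# Writing: lists are mutable, so these change scores in place.
scores[0] = 60            # [60, 80, 90, 100]
scores.append(90)         # [60, 80, 90, 100, 90]
scores[2:4] = [89, 99]    # [60, 80, 89, 99, 90]

# A step of -1 builds a reversed copy; scores itself is unchanged.
reversed_scores = scores[::-1]    # [90, 99, 89, 80, 60]

# A list can hold other lists.
nested = [[50, 60], ['joe', 'john']]
inner = nested[1][0]    # 'joe'

# sort() reorders the list in place, smallest to largest by default.
unsorted_scores = [60, 30, 80]
unsorted_scores.sort()    # [30, 60, 80]

# Assigning an empty list to the full slice empties the list in place.
snapshot = list(scores)
scores[:] = []
print(snapshot, scores)
```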
A few other useful methods on lists are listed below. Replace 'list' with the name of your list.
- list.count(x) - returns the number of times the value x appears in the list
- list.index(x[, start[, end]]) - returns the index of the first item whose value is x. Raises ValueError if there is no such item. The optional start and end limit the search to part of the list, similar to a slice.
- list.insert(index,element) - to insert the specified element at the specified index
- list.remove(x) - removes the first element whose value is x. Raises ValueError if there is no such element.
- list.pop([index]) - removes and returns the last element if no index is specified; otherwise it removes and returns the element at the specified index.
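The methods above can be exercised as in the following sketch. The sample values are arbitrary. Note that remove takes a value, while pop takes a position.

```python
scores = [60, 80, 90, 80]

times_80 = scores.count(80)        # 2: the value 80 appears twice
first_80 = scores.index(80)        # 1: index of the first 80
later_80 = scores.index(80, 2)     # 3: search starts at index 2

scores.insert(1, 70)               # [60, 70, 80, 90, 80]
scores.remove(80)                  # removes the first 80 -> [60, 70, 90, 80]

popped_last = scores.pop()         # returns 80, list is now [60, 70, 90]
popped_first = scores.pop(0)       # returns 60, list is now [70, 90]

# remove() raises ValueError when the value is not in the list.
try:
    scores.remove(999)
    missing_raised = False
except ValueError:
    missing_raised = True

print(scores, popped_last, popped_first, missing_raised)
```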
Python provides a rich set of built-in functions that can be applied to lists. In the first lesson you calculated the mean of 3 individual numbers. If you put the same three numbers in a list, you can use the sum function to add up all the numbers in the list and the len function to count its elements. The mean of the numbers in the list can then be obtained as shown below:
my_numbers = [15, 35, 55]
mean = sum(my_numbers) / len(my_numbers)
print(mean)
You can also apply the built-in sorted function to any list. It returns a new sorted list and leaves the original list untouched. This is different from the list's sort method, which sorts the list in place. The default sort order is smallest to largest. You can change this by passing the 'key' and 'reverse' arguments to sorted. More on this: https://docs.python.org/3/howto/sorting.html
The complete list of built-in functions can be found at: https://docs.python.org/3/library/functions.html
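To make the difference between sorted and the sort method concrete, here is a small sketch. The names list and the choice of len as the key are illustrative assumptions, not part of the lesson.

```python
scores = [60, 30, 80]

# sorted() returns a new list; the original is left untouched.
ascending = sorted(scores)                   # [30, 60, 80]
descending = sorted(scores, reverse=True)    # [80, 60, 30]
print(scores)                                # still [60, 30, 80]

# key is a function called on each element to decide its sort position.
names = ['john', 'Al', 'joe']
by_length = sorted(names, key=len)           # ['Al', 'joe', 'john']

# sort() changes the list itself and returns None.
result = scores.sort()
print(scores, result)                        # [30, 60, 80] None
```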
Tuples are immutable collections of objects. As a result, tuple elements cannot be modified, deleted or inserted. Tuples are often used for a collection of heterogeneous values that make sense as a whole, when all the elements are together.
For example, in a class, if you have to represent the complete details of a student who scored the highest marks in the class, you could represent the student record as a tuple as shown below:
top_student = ('jane', 'doe', 99, 21, 'f')
This tuple contains values for the student's first name, last name, total score, age and gender. The items are a mix of strings and numbers, hence the term heterogeneous, but together they make up the record for one student, the one with the highest score.
Operations on tuples are similar to those on lists, except that modification operations are not allowed. Here is the run-down:
|Positive positional index||top_student[0]||'jane'||first position in the tuple|
|Negative positional index||top_student[-1]||'f'||first position from the end.|
|Slicing||top_student[1:3]||('doe', 99)||new tuple with elements from position 1 (included) to 3 (excluded)|
|Slicing with start position default||top_student[:2]||('jane', 'doe')||new tuple with elements from index 0 up to position 2 (excluded)|
|Slicing with end position default||top_student[2:]||(99, 21, 'f')||new tuple with elements from position 2 to the end of the tuple|
|Built-in function len||len(top_student)||5||len can also be applied to tuples to obtain the size of the tuple|
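The tuple operations in the table can be tried as in the sketch below. It also shows what happens on an attempt to modify a tuple: Python raises a TypeError. The variable names other than top_student are illustrative.

```python
top_student = ('jane', 'doe', 99, 21, 'f')

first_name = top_student[0]     # 'jane'
gender = top_student[-1]        # 'f'
middle = top_student[1:3]       # ('doe', 99), slicing returns a new tuple
size = len(top_student)         # 5

# Tuples are immutable: item assignment is rejected with a TypeError.
try:
    top_student[2] = 100
    modified = True
except TypeError as error:
    modified = False
    print("Cannot modify a tuple:", error)

print(top_student)              # unchanged
```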
Unpack a tuple
Elements of a tuple can be unpacked into individual variables. This is very useful when you want to unpack all the elements at once. The code below unpacks the top student's individual values into separate variables:
top_student = ('jane', 'doe', 99, 21, 'f')
first_name, last_name, score, age, gender = top_student
print(first_name)
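Unpacking expects one variable per element. The sketch below goes slightly beyond the example above: it shows the error raised when the counts do not match, and a starred name that collects the leftover elements into a list. The names rest, a and b are illustrative.

```python
top_student = ('jane', 'doe', 99, 21, 'f')

# Too few variables for five elements raises a ValueError.
try:
    a, b = top_student
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print("Unpacking failed:", error)

# A starred name gathers the remaining elements into a list.
first_name, last_name, *rest = top_student
print(first_name, last_name, rest)    # jane doe [99, 21, 'f']
```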
The range type represents an immutable sequence of numbers, mostly used in for loops to loop a specific number of times.
range(start, stop, step)
start and step are optional.
- start - the loop starts from this value. 0 is the default when not supplied.
- step - the loop moves forward in increments of this value. 1 is the default when not supplied.
- stop - the loop stops when the incremented value reaches or passes stop. The stop value itself is never produced.
Here are a few examples:
|list(range(5))||[0,1,2,3,4]||There are 5 numbers in total, but the last number is 4|
|list(range(1, 5))||[1,2,3,4]||Instead of starting at default 0, it starts at 1 since start number is given|
|list(range(0, 20, 5))||[0,5,10,15]||Since step is given, start must be given too. The numbers increase in increments of 5, the step value|
|list(range(0, -5, -1))||[0,-1,-2,-3,-4]||The same rule applies for negative numbers as well|
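Since a range is a sequence, it supports indexing, len and membership tests in addition to being used in a for loop. A minimal sketch, with arbitrary values chosen for illustration:

```python
# Typical use: repeat a loop a fixed number of times.
total = 0
for number in range(1, 5):    # 1, 2, 3, 4
    total += number
print(total)                  # 10

# A range is a sequence, so it can be indexed, measured and searched.
countdown = range(10, 0, -2)
as_list = list(countdown)     # [10, 8, 6, 4, 2]
size = len(countdown)         # 5
second = countdown[1]         # 8
has_four = 4 in countdown     # True
has_zero = 0 in countdown     # False, stop is never included
print(as_list, size, second, has_four, has_zero)
```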
Points to note
- Tuples are very popular in Data Analytics and should be understood thoroughly.
- Tuples are generally somewhat more efficient than lists, mainly in memory use. Whenever the elements do not change, you should prefer tuples.
- Lists are not as popular for Data Analytics as NumPy arrays or Pandas Series. NumPy and Pandas are Python libraries heavily used in Data Analytics. We use NumPy arrays or Pandas Series in place of Python lists because these data structures are more efficient for handling large datasets and also provide many convenient functions for analytics.
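The point about tuple efficiency can be checked roughly with sys.getsizeof, which reports the memory used by the container itself (not by the elements it refers to). This is only a sketch: the exact numbers depend on the Python implementation and version, so none are shown here. In CPython the tuple is expected to be no larger than the list.

```python
import sys

values = range(1000)
as_list = list(values)
as_tuple = tuple(values)

# Size of the container object only, in bytes.
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print("list :", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
```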