Let's discuss a few features of sorting in python. This is something we will treat as a "black box" for now; that is, we'll just use it as a tool given to us without understanding the implementation details. But later we will go over the details later of how these algorithms work.
First, let's create a list of 50 random numbers between 0 and 99 using numpy
import numpy as np
np.random.seed(0)
arr = np.random.randint(0, 100, 50)
print(arr)
[44 47 64 67 67 9 83 21 36 87 70 88 88 12 58 65 39 87 46 88 81 37 25 77 72 9 20 80 69 79 47 64 82 99 88 49 29 19 19 14 39 32 65 9 57 32 31 74 23 35]
Sorting them in python is as simple as saying
arr_sorted = sorted(arr)
print(arr_sorted)
[9, 9, 9, 12, 14, 19, 19, 20, 21, 23, 25, 29, 31, 32, 32, 35, 36, 37, 39, 39, 44, 46, 47, 47, 49, 57, 58, 64, 64, 65, 65, 67, 67, 69, 70, 72, 74, 77, 79, 80, 81, 82, 83, 87, 87, 88, 88, 88, 88, 99]
So what's the big deal? Well, things can get a little fancier if we are sorting a list of objects. For example, suppose we had an object which encapsulated both a string and a number. To keep things simple, we'll make each name a 3-character string with only the characters "a" "b" and "c". I'll go ahead and make a random list of such objects now
class NumStr:
def __init__(self):
"""
Randomly initialize a number and a string
"""
self.num = np.random.randint(1000)
self.name = "".join([["a", "b", "c"][i] for i in np.random.randint(0, 3, 3)])
def __str__(self):
"""
Analagous to Java's toString() function
"""
return "{}:{}".format(self.num, self.name)
np.random.seed(0)
arr = [NumStr() for i in range(100)]
For example, here are the first two elements
print(arr[0], arr[1])
684:bab 723:bca
But what happens when we try to sort these?
sorted(arr)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) /tmp/ipykernel_26238/2712916210.py in <module> ----> 1 sorted(arr) TypeError: '<' not supported between instances of 'NumStr' and 'NumStr'
Uh oh! The issue is, we haven't defined how to sort objects of type "NumStr". What we have to do is define a special function to say how to sort. This function should return something about the object which is comparable (e.g. ints by value or strings in alphabetical order). Let's define two functions for our object: one that returns the name and one that returns the number
def get_name(obj):
return obj.name
def get_num(obj):
return obj.num
Now, we can pass along these functions as parameters to the sorted
function. For example, let's sort them by number and print out the first 10.
arr_sorted = sorted(arr, key=get_num)
for x in arr_sorted[0:10]:
print(x)
25:aab 32:cac 43:aca 43:bbb 53:aca 73:bcc 80:cab 88:bba 98:ccc 106:bab
Let's take this result and now sort it by name
arr_sorted2 = sorted(arr_sorted, key=get_name)
for x in arr_sorted2[0:10]:
print(x)
347:aaa 593:aaa 606:aaa 964:aaa 25:aab 128:aab 248:aab 715:aab 777:aab 782:aab
Sharp eyes will notice that within the ties, each chunk is sorted by number (e.g. for everything with a name aaa, they come out 347, 593, 606, and 964). This is because python's sorted uses what's known as a stable sort. We won't immediately need this, but it's a good thing to know and keep in our back pocket
One more thing to mention is that the methods we passed to sorted
were incredibly simple. There's actually a short hand for these types of one line method definitions in python known as an anonymous function. Below is how you could use this to sort by number again
arr_sorted = sorted(arr, key=lambda x: x.num)