Sorting a Python Dictionary: Values, Keys, and More

source link: https://realpython.com/sort-python-dictionary/

Understanding What Sorting A Dictionary Really Means

Because dictionaries have little reordering functionality, sorting a dictionary is rarely done in place. In fact, dictionaries have no methods for explicitly moving an item to a different position.

If you wanted to sort a dictionary in place, then you’d have to use the del keyword to delete an item from the dictionary and then add it again. Deleting an item and then adding it again effectively moves the key-value pair to the end, so doing this for every item in sorted order leaves the whole dictionary sorted.
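A minimal sketch of that delete-and-reinsert approach, assuming you want to sort by key. The helper name sort_dict_in_place() is hypothetical, not part of any library:

```python
def sort_dict_in_place(d):
    """Reorder d by key without creating a new dict object (hypothetical helper)."""
    # sorted() builds a separate list first, so mutating d inside the loop is safe
    for key, value in sorted(d.items()):
        # Deleting and re-adding a key moves its pair to the end of the dict
        del d[key]
        d[key] = value


people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
alias = people
sort_dict_in_place(people)
print(people)  # {1: 'Jill', 2: 'Jack', 3: 'Jim', 4: 'Jane'}
print(alias is people)  # True: the same object was reordered
```

Every item gets deleted and re-added, which is part of why this pattern is rare in practice.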

The OrderedDict class has a specific method, .move_to_end(), that moves an item to the end or, with last=False, to the start. This may make OrderedDict preferable for keeping a sorted dictionary. However, this pattern still isn’t very common, and it isn’t very performant, to say the least.
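As a sketch, here’s how .move_to_end() could reorder an OrderedDict by key. The loop still has to move every single item:

```python
from collections import OrderedDict

people = OrderedDict({3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"})

# Move every key to the end in ascending order; the key moved last ends up last
for key in sorted(people):  # sorted() returns a new list, so mutating people is safe
    people.move_to_end(key)
print(list(people))  # [1, 2, 3, 4]

# With last=False, move_to_end() moves an item to the start instead
people.move_to_end(4, last=False)
print(list(people))  # [4, 1, 2, 3]
```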

The typical method for sorting dictionaries is to get a dictionary view, sort it, and then convert the resulting list back into a dictionary. So you effectively go from a dictionary to a list and back into a dictionary. Depending on your use case, you may not need to convert the list back into a dictionary at all.

Note: Sorted dictionaries aren’t a very common pattern. You’ll explore more about that topic later in the tutorial.

With those preliminaries out of the way, you’ll get to sorting dictionaries in the next section.

Sorting Dictionaries in Python

In this section, you’ll be putting together the components of sorting a dictionary so that, in the end, you can master the most common way of sorting a dictionary:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

>>> # Sort by key
>>> dict(sorted(people.items()))
{1: 'Jill', 2: 'Jack', 3: 'Jim', 4: 'Jane'}

>>> # Sort by value
>>> dict(sorted(people.items(), key=lambda item: item[1]))
{2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}

Don’t worry if you don’t understand the snippets above. You’ll review it all step by step in the following sections. Along the way, you’ll learn how to use the sorted() function with sort keys, lambda functions, and dictionary constructors.

Using the sorted() Function

The critical function that you’ll use to sort dictionaries is the built-in sorted() function. This function takes an iterable as the main argument, with two optional keyword-only arguments—a key function and a reverse Boolean value.

To illustrate the sorted() function’s behavior in isolation, examine its use on a list of numbers:

>>> numbers = [5, 3, 4, 3, 6, 7, 3, 2, 3, 4, 1]
>>> sorted(numbers)
[1, 2, 3, 3, 3, 3, 4, 4, 5, 6, 7]

As you can see, the sorted() function takes an iterable, sorts comparable elements like numbers in ascending order, and returns a new list. With strings, it sorts them alphabetically, or more precisely, by the Unicode code points of their characters:

>>> words = ["aa", "ab", "ac", "ba", "cb", "ca"]
>>> sorted(words)
['aa', 'ab', 'ac', 'ba', 'ca', 'cb']

Sorting by numerical or alphabetical precedence is the most common way to sort elements, but maybe you need more control.

Say you want to sort on the second character of each word in the last example. To customize what the sorted() function uses to sort the elements, you can pass in a callback function to the key parameter.

A callback function is a function that’s passed as an argument to another function. For sorted(), you pass it a function that acts as a sort key. The sorted() function then calls the sort key once for every element.

In the following example, the function passed as the key accepts a string and will return the second character of that string:

>>> def select_second_character(word):
...     return word[1]
...
>>> sorted(words, key=select_second_character)
['aa', 'ba', 'ca', 'ab', 'cb', 'ac']

The sorted() function passes every element of the words iterable to the key function and uses the return value for comparison. Using the key means that the sorted() function will compare the second letter instead of comparing the whole string directly.
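To see the callback in action, this sketch logs every word that the key function receives. The calls list is an illustration device added here, not part of the original example. It shows that sorted() computes one key per element rather than one per comparison:

```python
words = ["aa", "ab", "ac", "ba", "cb", "ca"]
calls = []  # records every argument the key function receives


def select_second_character(word):
    calls.append(word)
    return word[1]


result = sorted(words, key=select_second_character)
print(result)  # ['aa', 'ba', 'ca', 'ab', 'cb', 'ac']
print(len(calls))  # 6: one call per element
```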

More examples and explanations of the key parameter will come later in the tutorial when you use it to sort dictionaries by values or nested elements.

If you take another look at the results of this last sort, you may notice that the sorted() function is stable. The three elements aa, ba, and ca are equivalent when compared by their second character, so the sorted() function preserves their original order. Python guarantees this stability.

Note: Every list also has a .sort() method, which accepts the same key and reverse keyword arguments as the sorted() function. The main difference is that the .sort() method sorts the list in place and returns None. In contrast, the sorted() function returns a new list, leaving the original list unmodified.
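A short sketch of the difference between the two:

```python
numbers = [5, 3, 4, 1]

# sorted() builds a new list and leaves the original alone
new_list = sorted(numbers)
print(new_list)  # [1, 3, 4, 5]
print(numbers)  # [5, 3, 4, 1]

# .sort() reorders the list itself and returns None
result = numbers.sort()
print(numbers)  # [1, 3, 4, 5]
print(result)  # None
```

A common mistake is writing numbers = numbers.sort(), which replaces the list with None.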

You can also pass reverse=True to the sorting function or method to return the reverse order. Alternatively, you can use the reversed() function to invert the iterable after sorting:

>>> list(reversed([3, 2, 1]))
[1, 2, 3]
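The two approaches aren’t quite interchangeable when elements tie. Reusing the words list and the second-character key from earlier, this sketch shows that reverse=True keeps tied elements in their original order, while reversed() flips them along with everything else:

```python
words = ["aa", "ab", "ac", "ba", "cb", "ca"]


def select_second_character(word):
    return word[1]


# reverse=True: descending order, but ties keep their original relative order
descending = sorted(words, key=select_second_character, reverse=True)
print(descending)  # ['ac', 'ab', 'cb', 'aa', 'ba', 'ca']

# reversed() flips the whole ascending result, so ties come out flipped too
flipped = list(reversed(sorted(words, key=select_second_character)))
print(flipped)  # ['ac', 'cb', 'ab', 'ca', 'ba', 'aa']
```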

If you want to dive deeper into the mechanics of sorting in Python and learn how to sort data types other than dictionaries, then check out the tutorial on how to use sorted() and .sort().

So, how about dictionaries? You can actually take the dictionary and feed it straight into the sorted() function:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted(people)
[1, 2, 3, 4]

But the default behavior of passing in a dictionary directly to the sorted() function is to take the keys of the dictionary, sort them, and return a list of the keys only. That’s probably not the behavior you had in mind! To preserve all the information in a dictionary, you’ll need to be acquainted with dictionary views.
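If you only need the keys, but ordered by their values, one option is to pass the dictionary’s own .get() method as the key function. This is a side sketch, not a technique that the rest of this tutorial relies on:

```python
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

# people.get(key) returns that key's value, so the keys are ordered by value
keys_by_value = sorted(people, key=people.get)
print(keys_by_value)  # [2, 4, 1, 3]
```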

Getting Keys, Values, or Both From a Dictionary

If you want to conserve all the information from a dictionary when sorting it, the typical first step is to call the .items() method on the dictionary. Calling .items() on the dictionary will provide an iterable of tuples representing the key-value pairs:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> people.items()
dict_items([(3, 'Jim'), (2, 'Jack'), (4, 'Jane'), (1, 'Jill')])

The .items() method returns a read-only dictionary view object, which serves as a window into the dictionary. This view is not a copy or a list—it’s a read-only iterable that’s actually linked to the dictionary it was generated from:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> view = people.items()
>>> people[2] = "Elvis"
>>> view
dict_items([(3, 'Jim'), (2, 'Elvis'), (4, 'Jane'), (1, 'Jill')])

You’ll notice that any updates to the dictionary also get reflected in the view because they’re linked. A view represents a lightweight way to iterate over a dictionary without generating a list first.

Note: You can use .values() to get a view of the values only and .keys() to get one with only the keys.
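A quick sketch showing that the .keys() and .values() views are live in the same way:

```python
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
keys = people.keys()
values = people.values()

# Both views reflect changes made to the dictionary after they were created
people[5] = "Joe"
print(keys)  # dict_keys([3, 2, 4, 1, 5])
print(values)  # dict_values(['Jim', 'Jack', 'Jane', 'Jill', 'Joe'])

# Like any iterable, a view can be passed straight to sorted()
print(sorted(values))  # ['Jack', 'Jane', 'Jill', 'Jim', 'Joe']
```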

Crucially, you can use the sorted() function with dictionary views. You call the .items() method and use the result as an argument to the sorted() function. Using .items() keeps all the information from the dictionary:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted(people.items())
[(1, 'Jill'), (2, 'Jack'), (3, 'Jim'), (4, 'Jane')]

This example results in a sorted list of tuples, with each tuple representing a key-value pair of the dictionary.

If you want to end up with a dictionary sorted by values, then you’ve still got two issues. First, the default behavior sorts by key, not by value. Second, you end up with a list of tuples, not a dictionary. You’ll start by figuring out how to sort by value.

Understanding How Python Sorts Tuples

When using the .items() method on a dictionary and feeding it into the sorted() function, you’re passing in an iterable of tuples, and the sorted() function compares the entire tuple directly.

When comparing tuples, Python behaves a lot like it does when sorting strings alphabetically. That is, it compares them lexicographically.

Lexicographical sorting means that if you have two tuples, (1, 2, 4) and (1, 2, 3), then you start by comparing the first item of each tuple. The first item is 1 in both cases, which is equal. The second item, 2, is also identical in both cases. The third items are 4 and 3, respectively. Since 3 is less than 4, (1, 2, 3) is the smaller tuple, and the comparison stops there.

So, to order the tuples (1, 2, 4) and (1, 2, 3) lexicographically, you would switch their order to (1, 2, 3) and (1, 2, 4).
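You can check this comparison rule directly in Python:

```python
# Tuples compare item by item; the first unequal pair decides the result
print((1, 2, 4) < (1, 2, 3))  # False
print((1, 2, 3) < (1, 2, 4))  # True
print(sorted([(1, 2, 4), (1, 2, 3)]))  # [(1, 2, 3), (1, 2, 4)]

# The same rule applies to key-value pairs: the first items decide here
print((2, "Zed") < (3, "Amy"))  # True
```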

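Because of Python’s lexicographic sorting behavior for tuples, using the .items() method with the sorted() function will always sort by key unless you use something extra. Dictionary keys are unique, so the first items of two key-value tuples are never equal, and the values never get compared at all.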

Using the key Parameter and Lambda Functions

If you want to sort by value, then you have to specify a sort key. A sort key is a way to extract a comparable value from each element. For instance, if you have a pile of books, then you might use the author’s surname as the sort key. With the sorted() function, you specify a sort key by passing a callback function as the key argument.

Note: The key argument has nothing to do with a dictionary key!

To see a sort key in action, take a look at this example, which is similar to the one you saw in the section introducing the sorted() function:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

>>> # Sort key
>>> def value_getter(item):
...     return item[1]
...

>>> sorted(people.items(), key=value_getter)
[(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]

>>> # Or with a lambda function
>>> sorted(people.items(), key=lambda item: item[1])
[(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]

In this example, you try out two ways of passing a key parameter. The key parameter accepts a callback function. The function can be a normal function identifier or a lambda function. The lambda function in the example is the exact equivalent of the value_getter() function.

Note: Lambda functions are also known as anonymous functions because they don’t have a name. They’re commonly used for functions that you only need once in your code.

Lambda functions confer no benefit apart from making things more compact, eliminating the need to define a function separately. They keep things nicely contained on the same line:

# With a normal function
def value_getter(item):
    return item[1]

sorted(people.items(), key=value_getter)

# With a lambda function
sorted(people.items(), key=lambda item: item[1])

For basic getter functions like the one in the example, lambdas can come in handy. But lambdas can make your code less readable for anything more complex, so use them with care.

Lambdas can also contain only a single expression, so statements such as if statements or for loops are off limits. You can work around this by using comprehensions and conditional expressions, for example, but those can make for long and cryptic one-liners.

The key callback function will receive each element of the iterable that it’s sorting. The callback function’s job is to return something that can be compared, such as a number or a string. In this example, you named the function value_getter() because all it does is get the value from a key-value tuple.
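For simple getters like value_getter(), the standard library’s operator.itemgetter() is another option that saves you from writing the function yourself. This tutorial doesn’t use it; the sketch below only shows that itemgetter(1) behaves like lambda item: item[1]:

```python
from operator import itemgetter

people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}

# itemgetter(1) builds a callable that returns item[1] for whatever it receives
value_getter = itemgetter(1)
print(value_getter((3, "Jim")))  # Jim

by_value = sorted(people.items(), key=value_getter)
print(by_value)  # [(2, 'Jack'), (4, 'Jane'), (1, 'Jill'), (3, 'Jim')]
```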

Because sorted() compares whole tuples lexicographically by default, the key parameter is what lets you choose which part of each element gets compared.
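Since the key function can return any comparable value, returning a tuple lets you sort by more than one criterion. In this sketch, which uses a hypothetical scores dictionary, the key sorts by value first and falls back to the dictionary key when two values tie:

```python
scores = {"dan": 3, "amy": 5, "bob": 3, "cat": 1}

# Value only: dan and bob tie, so they keep their original order
by_value = sorted(scores.items(), key=lambda item: item[1])
print(by_value)  # [('cat', 1), ('dan', 3), ('bob', 3), ('amy', 5)]

# Value first, then key: the tie is broken alphabetically
by_value_then_key = sorted(scores.items(), key=lambda item: (item[1], item[0]))
print(by_value_then_key)  # [('cat', 1), ('bob', 3), ('dan', 3), ('amy', 5)]
```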

In the next section, you’ll take sort keys a bit further and use them to sort by a nested value.

Selecting a Nested Value With a Sort Key

You can also go further and use a sort key to select nested values that may or may not be present and return a default value if they’re not present:

data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    746: {"name": "Jane", "age": 58, "skills": {"js": 2, "python": 5}},
    109: {"name": "Jill", "age": 83, "skills": {"java": 10}},
    984: {"name": "Jack", "age": 28, "skills": {"c": 8, "assembly": 7}},
    765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
    598: {"name": "Sylvia", "age": 62, "skills": {"bash": 8, "java": 7}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
    277: {"name": "Beatriz", "age": 26, "skills": {"python": 2, "js": 4}},
}

def get_relevant_skills(item):
    """Get the sum of Python and JavaScript skills."""
    skills = item[1]["skills"]

    # Return default value that is equivalent to no skill
    return skills.get("python", 0) + skills.get("js", 0)

print(sorted(data.items(), key=get_relevant_skills, reverse=True))

In this example, you have a dictionary with numeric keys and a nested dictionary as a value. You want to sort by the combined Python and JavaScript skills, attributes found in the skills subdictionary.

Part of what makes sorting by the combined skill tricky is that the python and js keys aren’t present in the skills dictionary for all people. The skills dictionary is also nested. You use .get() to read the keys and provide 0 as a default value that’s used for missing skills.

You’ve also used the reverse argument because you want the people with the highest combined Python and JavaScript skill to appear first.
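The print() call above outputs a long list of nested tuples. To make the order easier to check, this sketch reuses the same data and key function and keeps only the names. Bill and Beatriz tie at 6, and they stay in their original order even with reverse=True because the sort is stable:

```python
data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    746: {"name": "Jane", "age": 58, "skills": {"js": 2, "python": 5}},
    109: {"name": "Jill", "age": 83, "skills": {"java": 10}},
    984: {"name": "Jack", "age": 28, "skills": {"c": 8, "assembly": 7}},
    765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
    598: {"name": "Sylvia", "age": 62, "skills": {"bash": 8, "java": 7}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
    277: {"name": "Beatriz", "age": 26, "skills": {"python": 2, "js": 4}},
}


def get_relevant_skills(item):
    """Get the sum of Python and JavaScript skills."""
    skills = item[1]["skills"]
    # Missing skills count as 0
    return skills.get("python", 0) + skills.get("js", 0)


ranking = sorted(data.items(), key=get_relevant_skills, reverse=True)

# Keep only the names so the order is easy to read
names = [record["name"] for _, record in ranking]
print(names)
# ['John', 'Anna', 'Penelope', 'Jane', 'Bill', 'Beatriz', 'Jill', 'Jack', 'Sylvia']
```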

Note: You didn’t use a lambda function in this example. While it’s possible, it would make for a long line of potentially cryptic code:

sorted(
    data.items(),
    key=lambda item: (
        item[1]["skills"].get("python", 0)
        + item[1]["skills"].get("js", 0)
    ),
    reverse=True,
)

A lambda function can only contain one expression, so you have to repeat the full lookup in the nested skills subdictionary. This inflates the line length considerably.

The lambda function also requires multiple chained square bracket ([]) indices, making it harder to read than necessary. Using a lambda in this example only saves a few lines of code, and the performance difference is negligible. So, in these cases, it usually makes more sense to use a normal function.

You’ve successfully passed a function as a sort key to sorted(), which is a higher-order function, to sort a dictionary view by value. That was the hard part. Now there’s only one issue left to solve: converting the list that sorted() returns back into a dictionary.

Converting Back to a Dictionary

The only issue left to address with the default behavior of sorted() is that it returns a list, not a dictionary. There are a few ways to convert a list of tuples back into a dictionary.

You can iterate over the result with a for loop and populate a dictionary on each iteration:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted_people = sorted(people.items(), key=lambda item: item[1])

>>> sorted_people_dict = {}
>>> for key, value in sorted_people:
...     sorted_people_dict[key] = value
...

>>> sorted_people_dict
{2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}

This method gives you absolute control and flexibility in deciding how you want to construct your dictionary. This method can be quite lengthy to type out, though. If you don’t have any special requirements for constructing your dictionary, then you may want to go for a dictionary constructor instead:

>>> people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
>>> sorted_people = sorted(people.items(), key=lambda item: item[1])
>>> dict(sorted_people)
{2: 'Jack', 4: 'Jane', 1: 'Jill', 3: 'Jim'}

That’s nice and compact! You could also use a dictionary comprehension, but that only makes sense if you want to change the shape of the dictionary or swap the keys and values, for example. In the following comprehension, you swap the keys and values:

>>> {
...     value: key
...     for key, value in sorted(people.items(), key=lambda item: item[1])
... }
{'Jack': 2, 'Jane': 4, 'Jill': 1, 'Jim': 3}

Depending on how familiar you or your team are with comprehensions, this may be less readable than just using a normal for loop.

Congratulations, you’ve got your sorted dictionary! You can now sort it by any criteria that you’d like.
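One thing to keep in mind when you check a sorted result: comparing two dictionaries with == ignores insertion order, so the sorted copy still compares equal to the original. This short sketch shows the difference and one way to compare the order explicitly:

```python
people = {3: "Jim", 2: "Jack", 4: "Jane", 1: "Jill"}
sorted_people = dict(sorted(people.items(), key=lambda item: item[1]))

# == compares keys and values only, so insertion order is ignored
print(sorted_people == people)  # True

# To check the order itself, compare the items as lists
print(list(sorted_people.items()) == list(people.items()))  # False
print(list(sorted_people))  # [2, 4, 1, 3]
```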

Now that you can sort your dictionary, you might be interested in knowing if there are any performance implications to using a sorted dictionary, or whether there are alternative data structures for key-value data.

