1

Python's ChainMap: Manage Multiple Contexts Effectively

 2 years ago
source link: https://realpython.com/python-chainmap/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Getting Started With Python’s ChainMap

Python’s ChainMap was added to collections in Python 3.3 as a handy tool for managing multiple scopes and contexts. This class allows you to group several dictionaries and other mappings together to make them logically appear and behave as one. It creates a single updatable view that works similar to a regular dictionary but with some internal differences.

ChainMap doesn’t merge its mappings together. Instead, it keeps them in an internal list of mappings. Then ChainMap reimplements common dictionary operations on top of that list. Since the internal list holds references to the original input mapping, any changes in those mappings affect the ChainMap object as a whole.

Storing the input mappings in a list allows you to have duplicate keys in a given chain map. If you perform a key lookup, then ChainMap searches the list of mappings until it finds the first occurrence of the target key. If the key is missing, then you get a KeyError as usual.

Storing the mappings in a list truly shines when you need to manage nested scopes, where each mapping represents a specific scope or context.

To better understand what scopes and contexts are about, think about how Python resolves names. When Python looks for a name, it searches in locals(), globals(), and finally builtins until it finds the first occurrence of the target name. If the name doesn’t exist, then you get a NameError. Dealing with scopes and contexts is the most common kind of problem you can solve with ChainMap.

When you’re working with ChainMap, you can chain several dictionaries with keys that are either disjoint or intersecting.

In the first case, ChainMap allows you to treat all your dictionaries as one. So, you can access the key-value pairs as if you were working with a single dictionary. In the second case, besides managing your dictionaries as one, you can also take advantage of the internal list of mappings to define some sort of access priority for repeated keys across your dictionaries. That’s why ChainMap objects are great for handling multiple contexts.

A curious behavior of ChainMap is that mutations, such as updating, adding, deleting, clearing, and popping keys, act only on the first mapping in the internal list of mappings. Here’s a summary of the main features of ChainMap:

  • Builds an updatable view from several input mappings
  • Provides almost the same interface as a dictionary, but with some extra features
  • Doesn’t merge the input mappings but instead keeps them in an internal public list
  • Sees external changes in the input mappings
  • Can contain repeated keys with different values
  • Searches keys sequentially through the internal list of mappings
  • Throws a KeyError when a key is missing after searching the entire list of mappings
  • Performs mutations only on the first mapping in the internal list

In this tutorial, you’ll learn a lot more about all these cool features of ChainMap. The following section will guide you through how to create new instances of ChainMap in your code.

Instantiating ChainMap

To create ChainMap in your Python code, you first need to import the class from collections and then call it as usual. The class initializer can take zero or more mappings as arguments. With no arguments, it initializes a chain map with an empty dictionary inside:

>>>
>>> from collections import ChainMap
>>> from collections import OrderedDict, defaultdict

>>> # Use no arguments
>>> ChainMap()
ChainMap({})

>>> # Use regular dictionaries
>>> numbers = {"one": 1, "two": 2}
>>> letters = {"a": "A", "b": "B"}

>>> ChainMap(numbers, letters)
ChainMap({'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})

>>> ChainMap(numbers, {"a": "A", "b": "B"})
ChainMap({'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})

>>> # Use other mappings
>>> numbers = OrderedDict(one=1, two=2)
>>> letters = defaultdict(str, {"a": "A", "b": "B"})
>>> ChainMap(numbers, letters)
ChainMap(
    OrderedDict([('one', 1), ('two', 2)]),
    defaultdict(<class 'str'>, {'a': 'A', 'b': 'B'})
)

Here, you create several ChainMap objects using different combinations of mappings. In each case, ChainMap returns a single dictionary-like view of all the input mappings. Note that you can use any type of mapping, such as OrderedDict and defaultdict.

You can also create ChainMap objects using the class method .fromkeys(). This method can take an iterable of keys and an optional default value for all the keys:

>>>
>>> from collections import ChainMap

>>> ChainMap.fromkeys(["one", "two","three"])
ChainMap({'one': None, 'two': None, 'three': None})

>>> ChainMap.fromkeys(["one", "two","three"], 0)
ChainMap({'one': 0, 'two': 0, 'three': 0})

If you call .fromkeys() on ChainMap with an iterable of keys as an argument, then you get a chain map with a single dictionary. The keys come from the input iterable, and the values default to None. Optionally, you can pass a second argument to .fromkeys() to provide a sensible default value for every key.

Running Dictionary-Like Operations

ChainMap supports the same API as regular dictionaries for accessing existing keys. Once you have a ChainMap object, you can retrieve existing keys with dictionary-style key lookup, or you can use .get():

>>>
>>> from collections import ChainMap

>>> numbers = {"one": 1, "two": 2}
>>> letters = {"a": "A", "b": "B"}

>>> alpha_num = ChainMap(numbers, letters)
>>> alpha_num["two"]
2

>>> alpha_num.get("a")
'A'

>>> alpha_num["three"]
Traceback (most recent call last):
    ...
KeyError: 'three'

A key lookup searches all the mappings in the target chain map until it finds the desired key. If the key doesn’t exist, then you get the usual KeyError. Now, how does a lookup operation behave when you have duplicate keys? In that case, you get the first occurrence of the target key:

>>>
>>> from collections import ChainMap

>>> for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
>>> vet_treatment = {"dogs": 4, "cats": 3, "turtles": 1}
>>> pets = ChainMap(for_adoption, vet_treatment)

>>> pets["dogs"]
10
>>> pets.get("cats")
7
>>> pets["turtles"]
1

When you access a duplicate key, such as "dogs" and "cats", the chain map only returns the first occurrence of that key. Internally, lookup operations search the input mappings in the same order they appear in the internal list of mappings, which is also the exact order you pass them into the class’s initializer.

This general behavior also applies to iteration:

>>>
>>> from collections import ChainMap

>>> for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
>>> vet_treatment = {"dogs": 4, "cats": 3, "turtles": 1}
>>> pets = ChainMap(for_adoption, vet_treatment)

>>> for key, value in pets.items():
...     print(key, "->", value)
...
dogs -> 10
cats -> 7
turtles -> 1
pythons -> 3

The for loop iterates over the dictionaries in pets and prints the first occurrence of each key-value pair. You can also iterate through the dictionary directly or with .keys() and .values() as you can do with any dictionary:

>>>
>>> from collections import ChainMap

>>> for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
>>> vet_treatment = {"dogs": 4, "cats": 3, "turtles": 1}
>>> pets = ChainMap(for_adoption, vet_treatment)

>>> for key in pets:
...     print(key, "->", pets[key])
...
dogs -> 10
cats -> 7
turtles -> 1
pythons -> 3

>>> for key in pets.keys():
...     print(key, "->", pets[key])
...
dogs -> 10
cats -> 7
turtles -> 1
pythons -> 3

>>> for value in pets.values():
...     print(value)
...
10
7
1
3

Again, the behavior is the same. Every iteration goes through the first occurrence of each key, item, and value in the underlying chain map.

ChainMap also supports mutations. In other words, it allows you to update, add, delete, and pop key-value pairs. The difference in this case is that these operations act on the first mapping only:

>>>
>>> from collections import ChainMap

>>> numbers = {"one": 1, "two": 2}
>>> letters = {"a": "A", "b": "B"}

>>> alpha_num = ChainMap(numbers, letters)
>>> alpha_num
ChainMap({'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})

>>> # Add a new key-value pair
>>> alpha_num["c"] = "C"
>>> alpha_num
ChainMap({'one': 1, 'two': 2, 'c': 'C'}, {'a': 'A', 'b': 'B'})

>>> # Update an existing key
>>> alpha_num["b"] = "b"
>>> alpha_num
ChainMap({'one': 1, 'two': 2, 'c': 'C', 'b': 'b'}, {'a': 'A', 'b': 'B'})

>>> # Pop keys
>>> alpha_num.pop("two")
2
>>> alpha_num.pop("a")
Traceback (most recent call last):
    ...
KeyError: "Key not found in the first mapping: 'a'"

>>> # Delete keys
>>> del alpha_num["c"]
>>> alpha_num
ChainMap({'one': 1, 'b': 'b'}, {'a': 'A', 'b': 'B'})
>>> del alpha_num["a"]
Traceback (most recent call last):
    ...
KeyError: "Key not found in the first mapping: 'a'"

>>> # Clear the dictionary
>>> alpha_num.clear()
>>> alpha_num
ChainMap({}, {'a': 'A', 'b': 'B'})

Operations that mutate the content of a given chain map only affect the first mapping, even if the key you’re trying to mutate exists in other mappings in the list. For example, when you try to update "b" in the second mapping, what really happens is that you add a new key to the first dictionary.

You can take advantage of this behavior to create updatable chain maps that don’t modify your original input dictionaries. In this case, you can use an empty dictionary as the first argument to ChainMap:

>>>
>>> from collections import ChainMap

>>> numbers = {"one": 1, "two": 2}
>>> letters = {"a": "A", "b": "B"}

>>> alpha_num = ChainMap({}, numbers, letters)
>>> alpha_num
ChainMap({}, {'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'})

>>> alpha_num["comma"] = ","
>>> alpha_num["period"] = "."

>>> alpha_num
ChainMap(
    {'comma': ',', 'period': '.'},
    {'one': 1, 'two': 2},
    {'a': 'A', 'b': 'B'}
)

Here, you use an empty dictionary ({}) to create alpha_num. This ensures that the changes you perform on alpha_num will never affect your two original input dictionaries, numbers and letters, and will only affect the empty dictionary at the beginning of the list.

Merging vs Chaining Dictionaries

As an alternative to chaining multiple dictionaries with ChainMap, you can consider merging them together using dict.update():

>>>
>>> from collections import ChainMap

>>> # Chain dictionaries with ChainMap
>>> for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
>>> vet_treatment = {"hamsters": 2, "turtles": 1}

>>> ChainMap(for_adoption, vet_treatment)
ChainMap(
    {'dogs': 10, 'cats': 7, 'pythons': 3},
    {'hamsters': 2, 'turtles': 1}
)

>>> # Merge dictionaries with .update()
>>> pets = {}
>>> pets.update(for_adoption)
>>> pets.update(vet_treatment)
>>> pets
{'dogs': 10, 'cats': 7, 'pythons': 3, 'hamsters': 2, 'turtles': 1}

In this specific example, you get similar results when you build a chain map and an equivalent dictionary from two existing dictionaries with unique keys.

Merging dictionaries with .update() has pros and cons compared with chaining them with ChainMap. The first and most important drawback is that you’re throwing out the ability to manage and prioritize the access to repeated keys using multiple scopes or contexts. With .update(), the last value you provide for a given key will always prevail:

>>>
>>> for_adoption = {"dogs": 10, "cats": 7, "pythons": 3}
>>> vet_treatment = {"cats": 2, "dogs": 1}

>>> # Merge dictionaries with .update()
>>> pets = {}
>>> pets.update(for_adoption)
>>> pets.update(vet_treatment)
>>> pets
{'dogs': 1, 'cats': 2, 'pythons': 3}

Regular dictionaries can’t store repeated keys. Every time you call .update() with a value for an existing key, that key is updated with the new value. In this case, you lose the ability to prioritize the access to duplicate keys using different scopes.

Note: Since Python 3.5, you can also merge dictionaries together using the dictionary unpacking operator (**). Additionally, if you’re using Python 3.9, then you can use the dictionary union operator (|) to merge two dictionaries into a new one.

Now suppose you have n different mappings with at most m keys each. Creating a ChainMap object from them would take O(n) execution time, while retrieving a key would take O(n) in the worst-case scenario, in which the target key is in the last dictionary of the internal list of mappings.

Alternatively, creating a regular dictionary using .update() in a loop would take O(nm), while retrieving a key from the final dictionary would take O(1).

The conclusion is that, if you often create chains of dictionaries and only perform a few key lookups each time, then you should use ChainMap. If it’s the other way around, then use regular dictionaries unless you require duplicate keys or multiple scopes.

Another difference between merging and chaining dictionaries is that when you use ChainMap, external changes in the input dictionaries affect the underlying chain, which isn’t the case with merged dictionaries.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK