Python collections module

Apart from built-in general purpose container data structures like list, dict, set and tuple . Python provides collections module which implements some specialized container data types.

Following container data types are present in collections module for python 3.6.

namedtuple() : factory function for creating tuple subclasses with named fields.
deque : list-like container with fast appends and pops on either end.
ChainMap : dict-like class for creating a single view of multiple mappings
Counter: dict subclass for counting hashable objects
OrderedDict : dict subclass that remembers the order entries were added
defaultdict: dict subclass that calls a factory function to supply missing values.
UserDict : wrapper around dictionary objects for easier dict subclassing
UserList : wrapper around list objects for easier list subclassing
UserString : wrapper around string objects for easier string subclassing

We will have look at some of these container data types in this blog.

namedtuple() Link to heading

Named tuple assigned meaning to each position in a tuple and allows for more readable , self-documenting code. It allows you to access fields by name instead of position index. It can be used anywhere you can use regular tuple.

collections.namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)

It creates a named tuple with typename as name and fields name as item in the iterable field_names. For example if you want to create a named tuple for a 2D points.

Point = collections.namedtuple('Point', ['x', 'y'])# Create a namedtuple

p = Point(x=10, y=11) # Create a new point instance of named tuple

print(p.x, p.y) # Accessing the value using named field

x, y = p # Unpacking a named tuple value

This can be very useful when reading data from CSV file or a database. For example, we have a CSV file which has employee record such as name, age, title, department, paygrade.

from collections import namedtuple
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv
for emp in map(EmployeeRecord._make,
               csv.reader(open("employees.csv", "rb"))):
    print(emp.name, emp.title)
# Reading from a database
import sqlite3
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
    print(emp.name, emp.title)

In addition to the methods inherited from tuples, named tuples support three additional methods and two attributes.

somenamedtuple._make(iterable): makes a new instance from an existing sequence or iterable.

somenamedtuple._asdict(): Return a new OrderedDict which maps field names to their corresponding values.

somenamedtuple._replace(**kwargs): Return a new instance of the named tuple replacing specified fields with new values.

somenamedtuple._source

somenamedtuple._fields

OrderedDict Link to heading

Ordered dictionaries are just like regular dictionaries but they remember the order in which items were inserted. If a new entry overwrites an existing entry, the original position left unchanged. Deleting an entry and re-inserting it will move it to the end.

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> print(d.items())
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Most of the OrderedDict methods works in same way as regular dictionary. There are two methods which you should know.

popitem(last=True): returns and removes a key-value pair. The pairs are returned in LIFO order if last is True or FIFO order if false.

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> d
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.popitem(last=True)
('c', 3)
>>> d
OrderedDict([('a', 1), ('b', 2)])
>>> d.popitem(last=False)
('a', 1)
>>> d
OrderedDict([('b', 2)])

move_to_end(key, last=True): Move an existing key to either end of an ordered dictionary. The items is moved to the right if last is True or to the beginning if last is False.

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> d
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.move_to_end('b')
>>> d
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
>>> d.move_to_end('c', last=False)
>>> d
OrderedDict([('c', 3), ('a', 1), ('b', 2)])

Counter Link to heading

A counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts.

Elements are counted from an iterable or initialized from another mapping (or counter):

>>> c = Counter()           # a new, empty counter
>>> c = Counter('gallahad') # a new counter from aiterable  
>>> c = Counter({'red': 4, 'blue': 2}) # a new counter from mapping
>>> c = Counter(cats=4, dogs=8)   # a new counter from keyword args

counter objects returns zero count for missing item.

>>> c = Counter(['cat', 'dogs'])
>>> c['fan']
0

setting a count to zero does not remove element from the counter. use del to remove entirely.

>>> c['fan'] = 0
>>> del c['fan']

Apart from all methods available to dictionaries there are three extra methods available.

elements(): returns iterator of elements, each repeated as many times as count arranged in arbitrary manner.

most_common([n]): returns list of most common elements and their counts from the most common to least. if n is omitted or None it returns all elements.

subtract([iterable or mapping]): Elements are subtracted from an iterable or another mapping. It subtracts counts.

Note: update([iterable or mapping]) method of dictionary does not replace them instead it adds and update counts from an iterable or mapping.

ChainMap Link to heading

ChainMap is a data structure which provides a way for quickly linking a number of mapping objects, so that they can be treated as a single unit. Lookups search all mapping successively until a key is found. In contrast, writes, updates and deletions only operate on the first mapping.

This data structure can be very useful for scenarios like

Config or argument parsing : E.g for a command line tool flag, you have to give priority to user provided value, if user does not provide any input read it from environment otherwise use default value.
Simulating a nested scoping, like local and global variable

`collections.ChainMap(*maps)` Link to heading

It constructor takes multiple dicts or other mappings together and provides a single update able view. If no maps are provided, a single empty dictionary.

ChainMap stores all mapping in a list and can be accessed or updated using maps attributes as it is public. It also stores the reference of mapping, so if a mapping gets updated it will be reflected in ChainMap as well.

It supports all dictionary methods and provides an attribute maps , a method new_child([m]) and a property parent

maps: returns a list of mapping ordered from first-searched to last-searched.

new_child([mapping]): Returns a new ChainMap followed by all maps in the current instance. If mapping is specified that becomes first map otherwise an empty dictionary is used. This can be used to create a new sub context which can be updated without altering values in any of the parents mappings.

parents: returns a new ChainMap containing all the maps in current ChainMap except the first one.

Example usage of ChainMap where a user specified value take precedence over environment variable which is turn take precedence over default value.

import os
import argparse
from collections import ChainMap
DEFAULTS = {"message": "Hello", "user": "Guest"}
parser = argparse.ArgumentParser()
parser.add_argument("-u", "--user")
parser.add_argument("-m", "--message")parsed_arguments = parser.parse_args()
command_line_args = {k: v for k, v in vars(parsed_arguments).items() if v}
chain_map = ChainMap(command_line_args, os.environ, DEFAULTS)
print("{}, {}".format(chain_map['message'], chain_map['user']))

deque Link to heading

Deque are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deque support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Deque can be bounded or unbounded and it can be controlled using maxlen argument while initializing a deque. If maxlen is None deque can grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.

>>> from collections import deque
>>> d = deque([1, 2, 3])         # Creating a deque from iterable
>>> empty_d = deque()            # Empty deque
>>> bounded_d = deque(maxlen=10) # A bounded deque of length 10

It has methods like appendleft , extendleft, popleft, rotate apart from regular list methods.

appendleft(item): It can be used to append an element from the left side of deque.

extendleft(iterable): Extend the left side of the deque by appending elements from iterable.

popleft(): removes and return element from the left side of the queue.

rotate(n): rotates the deque n steps to the right. if n is negative, rotate to the left.

defaultdict Link to heading

defaultdict is a subclass of dict class which calls a factory function to supplying missing values. All functionalities are similar to dictionary except when a new key is encountered, it call default_factory to get the default value.

For example, if we use list as default_factory .

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the default_factory function which returns an empty list. The list.append() operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the list.append() operation adds another value to the list.

Resources: Link to heading

https://docs.python.org/3.6/library/collections.html