Apart from built-in general purpose container data structures like list
, dict
, set
and tuple
. Python provides collections module which implements some specialized container data types.
Following container data types are present in collections
module for python 3.6.
namedtuple()
: factory function for creating tuple subclasses with named fields.deque
: list-like container with fast appends and pops on either end.ChainMap
: dict-like class for creating a single view of multiple mappingsCounter
: dict subclass for counting hashable objectsOrderedDict
: dict subclass that remembers the order entries were addeddefaultdict
: dict subclass that calls a factory function to supply missing values.UserDict
: wrapper around dictionary objects for easier dict subclassingUserList
: wrapper around list objects for easier list subclassingUserString
: wrapper around string objects for easier string subclassing
We will have look at some of these container data types in this blog.
namedtuple() Link to heading
Named tuple assigned meaning to each position in a tuple and allows for more readable , self-documenting code. It allows you to access fields by name instead of position index. It can be used anywhere you can use regular tuple.
collections.namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
It creates a named tuple with typename
as name and fields name as item in the iterable field_names
. For example if you want to create a named tuple for a 2D points.
Point = collections.namedtuple('Point', ['x', 'y'])# Create a namedtuple
p = Point(x=10, y=11) # Create a new point instance of named tuple
print(p.x, p.y) # Accessing the value using named field
x, y = p # Unpacking a named tuple value
This can be very useful when reading data from CSV file or a database. For example, we have a CSV file which has employee record such as name, age, title, department, paygrade.
from collections import namedtuple
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
import csv
for emp in map(EmployeeRecord._make,
csv.reader(open("employees.csv", "rb"))):
print(emp.name, emp.title)
# Reading from a database
import sqlite3
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
print(emp.name, emp.title)
In addition to the methods inherited from tuples, named tuples support three additional methods and two attributes.
somenamedtuple._make(iterable):
makes a new instance from an existing sequence or iterable.
somenamedtuple._asdict():
Return a new OrderedDict which maps field names to their corresponding values.
somenamedtuple._replace(**kwargs):
Return a new instance of the named tuple replacing specified fields with new values.
somenamedtuple._source
somenamedtuple._fields
OrderedDict Link to heading
Ordered dictionaries are just like regular dictionaries but they remember the order in which items were inserted. If a new entry overwrites an existing entry, the original position left unchanged. Deleting an entry and re-inserting it will move it to the end.
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> print(d.items())
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Most of the
OrderedDict
methods works in same way as regular dictionary. There are two methods which you should know.
popitem(last=True):
returns and removes a key-value pair. The pairs are returned in LIFO order if last is True or FIFO order if false.
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> d
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.popitem(last=True)
('c', 3)
>>> d
OrderedDict([('a', 1), ('b', 2)])
>>> d.popitem(last=False)
('a', 1)
>>> d
OrderedDict([('b', 2)])
move_to_end(key, last=True):
Move an existing key to either end of an ordered dictionary. The items is moved to the right if last is True or to the beginning if last is False.
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d["a"] = 1
>>> d["b"] = 2
>>> d["c"] = 3
>>> d
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> d.move_to_end('b')
>>> d
OrderedDict([('a', 1), ('c', 3), ('b', 2)])
>>> d.move_to_end('c', last=False)
>>> d
OrderedDict([('c', 3), ('a', 1), ('b', 2)])
Counter Link to heading
A counter
is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts.
Elements are counted from an iterable
or initialized from another mapping
(or counter):
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from aiterable
>>> c = Counter({'red': 4, 'blue': 2}) # a new counter from mapping
>>> c = Counter(cats=4, dogs=8) # a new counter from keyword args
counter objects returns zero count for missing item.
>>> c = Counter(['cat', 'dogs'])
>>> c['fan']
0
setting a count to zero does not remove element from the counter. use del
to remove entirely.
>>> c['fan'] = 0
>>> del c['fan']
Apart from all methods available to dictionaries there are three extra methods available.
elements():
returns iterator of elements, each repeated as many times as count arranged in arbitrary manner.
most_common([n]):
returns list of most common elements and their counts from the most common to least. if n is omitted or None it returns all elements.
subtract([iterable or mapping]):
Elements are subtracted from an iterable or another mapping. It subtracts counts.
Note:
update([iterable or mapping])
method of dictionary does not replace them instead it adds and update counts from an iterable or mapping.
ChainMap Link to heading
ChainMap is a data structure which provides a way for quickly linking a number of mapping objects, so that they can be treated as a single unit. Lookups search all mapping successively until a key is found. In contrast, writes, updates and deletions only operate on the first mapping.
This data structure can be very useful for scenarios like
-
Config or argument parsing : E.g for a command line tool flag, you have to give priority to user provided value, if user does not provide any input read it from environment otherwise use default value.
-
Simulating a nested scoping, like local and global variable
collections.ChainMap(*maps)
Link to heading
It constructor takes multiple dicts or other mappings together and provides a single update able view. If no maps are provided, a single empty dictionary.
ChainMap stores all mapping in a list and can be accessed or updated using maps
attributes as it is public. It also stores the reference of mapping, so if a mapping gets updated it will be reflected in ChainMap as well.
It supports all dictionary methods and provides an attribute
maps
, a methodnew_child([m])
and a propertyparent
maps: returns a list of mapping ordered from first-searched to last-searched.
new_child([mapping]): Returns a new ChainMap followed by all maps in the current instance. If mapping is specified that becomes first map otherwise an empty dictionary is used. This can be used to create a new sub context which can be updated without altering values in any of the parents mappings.
parents: returns a new ChainMap containing all the maps in current ChainMap except the first one.
Example usage of ChainMap where a user specified value take precedence over environment variable which is turn take precedence over default value.
import os
import argparse
from collections import ChainMap
DEFAULTS = {"message": "Hello", "user": "Guest"}
parser = argparse.ArgumentParser()
parser.add_argument("-u", "--user")
parser.add_argument("-m", "--message")parsed_arguments = parser.parse_args()
command_line_args = {k: v for k, v in vars(parsed_arguments).items() if v}
chain_map = ChainMap(command_line_args, os.environ, DEFAULTS)
print("{}, {}".format(chain_map['message'], chain_map['user']))
deque Link to heading
Deque
are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deque
support thread-safe, memory efficient appends and pops from either side of the deque
with approximately the same O(1) performance in either direction.
Deque
can be bounded or unbounded and it can be controlled using maxlen
argument while initializing a deque. If maxlen
is None
deque can grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.
>>> from collections import deque
>>> d = deque([1, 2, 3]) # Creating a deque from iterable
>>> empty_d = deque() # Empty deque
>>> bounded_d = deque(maxlen=10) # A bounded deque of length 10
It has methods like
appendleft
,extendleft
,popleft
,rotate
apart from regular list methods.
appendleft(item): It can be used to append an element from the left side of deque
.
extendleft(iterable): Extend the left side of the deque
by appending elements from iterable
.
popleft(): removes and return element from the left side of the queue.
rotate(n): rotates the deque n steps to the right. if n is negative, rotate to the left.
defaultdict Link to heading
defaultdict
is a subclass of dict
class which calls a factory function to supplying missing values. All functionalities are similar to dictionary except when a new key is encountered, it call default_factory
to get the default value.
For example, if we use list
as default_factory
.
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
... d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the default_factory
function which returns an empty list. The list.append()
operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the list.append()
operation adds another value to the list.