Python collections module
Python’s collections module is often overlooked by Python programmers, but it contains a few useful container data types which come in handy in some special scenarios and use cases.
Let’s dig in!
As of writing (Wed 6 Nov 2019, Python v3.8), the collections module provides a total of 9 container data types:
namedtuple()
deque
ChainMap
Counter
defaultdict
OrderedDict
UserDict
UserList
UserString
We’ll now see what each one has to offer and which use case it satisfies.
namedtuple()
namedtuple() is a factory function that creates tuple subclasses with named fields.
Python’s take on OOP has no notion of private scope, properties, or variables (we can enforce it to some extent using metaclasses, __setattr__, etc., but true immutability is still missing). This is where namedtuple comes in: it provides a succinct way to create immutable attributes.
We can access elements either by tuple-like numeric index or by field name. It can be useful when we need to make class attributes immutable.
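As a minimal sketch of the points above (Point and its fields are hypothetical names chosen for illustration), the following shows both access styles and what happens on an attempted reassignment:

```python
from collections import namedtuple

# Point is a hypothetical example type with two named fields.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

# Access by position, like a regular tuple, or by field name.
first_by_index = p[0]
first_by_name = p.x

# Fields cannot be reassigned; the attempt raises AttributeError.
try:
    p.x = 10
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance and leaves the original untouched.
moved = p._replace(x=10)
```

Note that the immutability is shallow: a field that holds a mutable object, such as a list, can still be modified in place.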
deque
It is Python’s implementation of a “double-ended” queue and is pronounced “deck”. It supports append and pop from either end in approximately O(1) time. If maxlen is not specified or is None, a deque may grow to an arbitrary length. Otherwise it is bounded: once it is full, inserting a new element discards an element from the opposite end.
As for use cases, a deque is a natural fit for queues, stacks, and sliding windows such as a “last N items” history.
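A short sketch of both behaviours, using made-up sample values: appending and popping at either end, and a bounded deque that keeps only the most recent items.

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)         # add on the right: deque([1, 2, 3, 4])
d.appendleft(0)     # add on the left:  deque([0, 1, 2, 3, 4])
right = d.pop()     # removes and returns 4
left = d.popleft()  # removes and returns 0

# A bounded deque: with maxlen=3, appending on the right
# silently drops the oldest item from the left.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
# recent now holds only the last three values: 2, 3, 4
```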
ChainMap
Sometimes we need to override global values or config from a local config, such as a different mysql host for the production and dev environments. In such situations we can use ChainMap.
ChainMap groups multiple mappings into a single view. If a key is present in more than one mapping, the value from the first mapping passed that contains the key is used.
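The config override described above might look roughly like this. The keys and host names are hypothetical placeholders, not values from any real setup:

```python
from collections import ChainMap

# Hypothetical config dicts for illustration only.
defaults = {"mysql_host": "localhost", "mysql_port": 3306}
production = {"mysql_host": "db.example.internal"}

# Mappings are searched left to right, so production overrides defaults.
config = ChainMap(production, defaults)

host = config["mysql_host"]  # found in production
port = config["mysql_port"]  # not in production, falls back to defaults

# Writes and deletions only ever affect the first mapping.
config["mysql_port"] = 3307
```

Because ChainMap is a view and not a copy, later changes to the underlying dicts are visible through config as well.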
Counter
This container data type is a subclass of dict and is used to store counts of hashable objects. It comes in handy when you have to keep track of the number of occurrences of each element in an iterable (str, list, etc.). For missing keys it returns 0 instead of raising KeyError.
Example use case: find the numbers which appear more than once in a list.
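One possible way to solve that use case, sketched with a made-up list of numbers:

```python
from collections import Counter

numbers = [1, 2, 2, 3, 3, 3, 4]  # sample data for illustration

counts = Counter(numbers)

# Keep every element that was seen more than once.
duplicates = [n for n, c in counts.items() if c > 1]

# A key that was never counted yields 0 rather than raising KeyError.
missing = counts[99]

# most_common(k) returns the k most frequent (element, count) pairs.
top = counts.most_common(1)
```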
OrderedDict
Fundamentally, a dict doesn’t need a definite order, since we access elements by key rather than by position or index. Still, there are use cases where one needs to preserve the order of insertion, and the OrderedDict data type exists to satisfy them. Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict additionally provides order-sensitive equality and reordering methods such as move_to_end().
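A small sketch of what OrderedDict does with ordering, using arbitrary example keys:

```python
from collections import OrderedDict

od = OrderedDict()
od["first"] = 1
od["second"] = 2
od["third"] = 3

# Iteration follows insertion order.
keys_in_order = list(od)

# move_to_end() reorders an existing key (to the right end by default).
od.move_to_end("first")
keys_after_move = list(od)

# Two OrderedDicts with the same items in a different order are not equal,
# while the same comparison on plain dicts ignores order.
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
equal_as_ordered = a == b
equal_as_plain = dict(a) == dict(b)
```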
defaultdict
Almost all of us have faced a KeyError while accessing a non-existent key in a dict. Fortunately, defaultdict saves us from this notorious error and returns a default value of our choosing instead. defaultdict accepts a factory function as its first argument; the factory is called with no arguments to produce the default value for a missing key.
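Here is a minimal sketch with two common factories, list and int. The word list is sample data made up for illustration:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# list() is called for each new key, so we can append without checking first.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# int() returns 0, which makes counting straightforward.
letter_counts = defaultdict(int)
for word in words:
    letter_counts[word[0]] += 1

# Caveat: merely looking up a missing key inserts it with the default value.
value_for_z = letter_counts["z"]
z_was_inserted = "z" in letter_counts
```

If the side effect of inserting keys on lookup is unwanted, dict.get(key, default) on a regular dict avoids it.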
UserList, UserDict, and UserString
These data types act as wrappers around standard list, dict, and str objects, respectively. They are not subclasses of the built-in types; each keeps the wrapped object in its data attribute. You can use them as base classes for your own objects to extend the default behaviours and add new ones.
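As a sketch of extending default behaviour, here is a hypothetical LowerKeyDict (a name invented for this example) that lower-cases string keys whenever an item is set:

```python
from collections import UserDict


class LowerKeyDict(UserDict):
    """Hypothetical example: stores string keys in lower case."""

    def __setitem__(self, key, value):
        if isinstance(key, str):
            key = key.lower()
        super().__setitem__(key, value)


# The constructor and update() both go through __setitem__,
# so the override applies to every way of adding items.
settings = LowerKeyDict({"Host": "localhost"})
settings["PORT"] = 3306
settings.update({"User": "admin"})

# The wrapped, regular dict is available as .data
underlying = settings.data
```

One reason to prefer UserDict over subclassing dict directly is that dict's own constructor and update() do not call an overridden __setitem__, so the same override on a dict subclass would be bypassed in those paths.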
Conclusion
The collections module in Python is not very extensive or fancy, and it doesn’t intend to satisfy every use case, but it comes in very handy for particular cases where a developer would otherwise have to write a custom and tricky workaround.
Problems like immutable object attributes, keeping track of occurrences of elements in a list, maintaining order in dicts, and O(1) append and pop from both ends of a sequence are very common, and therefore Python has standard modules to handle them.
You can learn more about the collections module in the official docs, as each of these types has some cool methods to manipulate data which aren’t mentioned in this article.
#TillThenHappyCoding