'''
A few conceptual issues with mixins
----------------------------------------------------------------

In the first article of this series I discussed a very serious
problem with the mixin approach, namely the namespace overpopulation issue.

.. figure:: http://www.phyast.pitt.edu/~micheles/python/Overpopulation.jpg

   The namespace overpopulation issue

The overpopulation issue comes from the idea of growing functionality
by adding more and more mixin classes, which is just plain wrong.
It is true that you can use the idea in small frameworks
with little damage, but that does not make it a good design solution.
Small frameworks have a tendency to grow, and you should not
start with a weak design.

Some readers argued that this
is not a problem of mixins *per se*, but a problem of bad design.
That is true, but I maintain that a technique which is so easy to
misuse, even by expert programmers, should be regarded with suspicion,
especially when there are better solutions available.
Moreover, I have a few conceptual issues with mixins - as implemented in most
languages - which are independent of the overpopulation problem.

First of all, I think everybody agrees that the best way to solve a
complex problem is to split it into smaller subproblems, by
following the *divide et impera* principle. The disturbing thing about
mixins is that the principle is applied at the beginning (the problem
is decomposed into smaller independent units) but at the end all
the functionality is added back to the client class as
an undifferentiated soup of methods.

Therefore a design based on mixins looks clean to the
framework writer - everything is well separated in his mind - but
it looks messy to the framework user - she sees
methods coming from all directions without a clear separation. It is
really the same situation as using the
``from module import *`` idiom, which is rightly frowned upon.

I find it most unpythonic that mixins make the life of the framework
writer easier, but the life of the framework reader more difficult,
since the goal of Python is to make code *easy to read*, not easy to
write.  The scenario I have in mind is the usual one: a poor
programmer who needs to debug an object coming from a gigantic
framework which is *terra incognita* to her, without any documentation
and with a strict deadline (do you see yourself in there?). In such
conditions a framework heavily based on mixins makes things harder,
since the programmer drowns in hundreds of methods which
are properly ordered in mixin classes on paper, but not on the
battlefield.

There is also another conceptual issue. The idea
behind mixins is that they should be used for generic functionality
which can be applied to different classes (think of mixins like Persistent,
Comparable, Printable, etc.). But this is exactly the situation
in which you would want to use generic functions.

In this `post of mine`_ I actually argue that generic functions
(a.k.a. multimethods) are a better solution than mixins. I also provide
a very concrete example, which I think generalizes. The advantage of generic
functions is that they are clearly defined outside classes,
whereas the mixin approach is kind of schizophrenic:
the functionality is actually defined externally, but that fact
is made invisible to the final user.

I am a big fan of generic functions, which are already used
in the Python world - ``print`` is a generic function,
the comparison operators are generic functions, numpy_ universal
functions (ufuncs) are generic functions, etc. - but should be
used even more. With generic functions, mixins become useless.
A side effect is that the class namespace becomes much slimmer: for instance,
in CLOS_ classes are used just to contain state, whereas the methods
live in a separate namespace. In most languages instead,
classes are used as a namespace control mechanism, performing
double duty - namespace control should be the job of modules.
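
As a minimal sketch of the idea, assuming Python 3.4 or later, where the
standard library offers ``functools.singledispatch`` (it dispatches on the
first argument only, so it is a limited form of generic function), a
``describe`` function can be defined entirely outside the classes it serves.
The names ``describe``, ``Point`` and ``Circle`` are hypothetical and used
only for illustration:

```python
from functools import singledispatch


class Point(object):
    "A class holding only state"
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Circle(object):
    "Another state-only class"
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius


@singledispatch
def describe(obj):
    "Fallback implementation, used for types without a registered one"
    return "<object of type %s>" % type(obj).__name__


@describe.register(Point)
def _describe_point(obj):
    return "Point(%s, %s)" % (obj.x, obj.y)


@describe.register(Circle)
def _describe_circle(obj):
    # an implementation can call the generic function recursively
    return "Circle(%s, r=%s)" % (describe(obj.center), obj.radius)
```

Here the class namespaces stay empty of behavior: ``describe`` lives at
module level and new types can be supported without touching any class.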

.. _numpy: http://numpy.scipy.org/

A particularly bad usage of mixins
-----------------------------------------------------------

A typical beginner's tutorial (for instance, I recommend `Using
Mix-ins with Python`_ by Chuck Esterbrook, which is very well written
and very informative, even if the point of view is exactly the
opposite of mine) will tell you that mixins are used to add
functionality to the classes they are mixed into. For instance, a mixin class
``WithLog``
could be used to enhance a pre-existing class ``C`` with a logging
capability:

$$C

$$WithLog
    
$$C_WithLog

An example of usage is the following:

 >>> c = C_WithLog()
 >>> c.log.warn("hello")

That prints

 ``WARNING:C_WithLog:hello``.

The usage of mixins you see here is wrong: why would you use
inheritance when you need just one method? You can just import the one
method you need!  Generally speaking, a mixin class makes sense only
when you have a set of methods which belong together: if you
have a single method, or a set of disconnected methods, you are much
better off defining the methods externally, in a utility module,
and then importing them into the class namespace. Of course,
here I am assuming that you really want the external method to end up
in the class namespace, possibly because of interface requirements,
but I am not saying that this is always a good idea. You can import the
method into your class as simply as this:

$$CWithLog
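
Since the ``utility`` module is not shown here, the following is only a
sketch of what it could contain and of how the trick works. To keep the
example in a single file, the module-level ``log`` property is bound in the
class body by plain assignment, which has the same effect as
``from utility import log``; the layout of ``utility`` is an assumption:

```python
import logging


# --- what a hypothetical utility.py could contain ---
@property
def log(self):
    "A logger named after the class of the instance"
    return logging.getLogger(self.__class__.__name__)


# --- client code ---
class C(object):
    "A base class"


class CWithLog(C):
    # same effect as "from utility import log" inside the class body:
    # the property ends up in the class namespace, with no extra base class
    log = log
```

The class hierarchy stays linear (``CWithLog``, ``C``, ``object``), and the
only new name in the class namespace is ``log``.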

This approach is rarely used in Python, probably because most people
coming from other languages do not know it is possible, but it
is in my opinion a much clearer solution than inheritance.
The problem with inheritance is that it imposes a *substantial
cognitive load*: when I see the line of code ``class C_WithLog(C, WithLog)``
I immediately ask myself many
questions: *which methods are exported by C_WithLog?
is there any method of C which accidentally overrides one of the methods
of WithLog? if yes, is there any method cooperation mechanism
(super) or not? what are the ancestors of C_WithLog? which methods
are coming from them? are such methods overridden by some C method?
is there a cooperation mechanism on C_WithLog ancestors? What's the*
`method resolution order`_ *of the hierarchy?*
On the other hand, if I see ``from utility import log`` I have very little
to understand and very little to worry about. The only caution in this
specific example is that I will have a single logger shared by all
instances of the class, since
``logging.getLogger(self.__class__.__name__)`` will always return the
same object. If I need different loggers with different configurations
for different instances I will have to override the ``.log`` attribute on
a case by case basis, or I will have to use a different strategy, such as
the `dependency injection pattern`_, i.e. I will have to pass the logger
to the constructor.
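
A minimal sketch of the dependency injection alternative follows. The class
name ``CWithInjectedLog`` and the logger name ``custom.audit`` are
hypothetical; the only assumption is that the caller may pass a
preconfigured logger, and that the per-class logger is used otherwise:

```python
import logging


class CWithInjectedLog(object):
    "Hypothetical class receiving its logger from the caller"
    def __init__(self, log=None):
        if log is None:
            # default: the logger shared by all instances of the class
            log = logging.getLogger(self.__class__.__name__)
        self.log = log


shared = CWithInjectedLog()
custom = CWithInjectedLog(logging.getLogger("custom.audit"))
```

With this layout each instance can carry its own logger, at the price of
one more constructor argument.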

.. _Using Mix-ins with Python: http://www.linuxjournal.com/article/4540
.. _dependency injection pattern: http://en.wikipedia.org/wiki/Dependency_injection
.. _CLOS: http://en.wikipedia.org/wiki/CLOS
.. _post of mine: http://www.artima.com/weblogs/viewpost.jsp?thread=237764
.. _method resolution order: http://www.python.org/download/releases/2.3/mro/

Acceptable usages of mixins
---------------------------------------------------------------

There are usages of mixins which are restricted in scope and not
dangerous: for instance, you can use mixins for implementing the comparison
interface, or the mapping interface.  This is actually the approach suggested
by the standard library, and by the new ABCs in Python 2.6. This is
an acceptable usage: in this case there is no uncontrollable
growth of methods, since you are actually implementing well-known
interfaces - typically a few specific special methods.
In order to give a practical example, let me discuss a toy application.

Suppose you want to define a ``PictureContainer`` class in an application
to manage pictures and photos. A ``PictureContainer`` object may contain
both plain pictures (instances of a ``Picture`` class) and
``PictureContainer`` objects, recursively.
From the point of view of the Python programmer it could make sense
to implement such a class by using a dictionary.
A ``Picture`` object will contain information such as the picture
title, the picture date, and a few methods to read and write the
picture on the storage (the file system, a relational database,
an object database like the ZODB_ or the AppEngine datastore_,
or anything else).

.. _datastore: http://code.google.com/appengine/docs/datastore/
.. _ZODB: http://wiki.zope.org/ZODB/guide/index.html

The first version of the ``PictureContainer`` class could be something
like this:

$$SimplePictureContainer

At this point, one realizes that it is annoying to call the inner
dictionary directly and that it would be nicer to expose its methods.
A simple solution is to leverage the standard
library class ``UserDict.DictMixin``, which is there just for
that use case. While we are at it, we can also add the logging
functionality: that means that the low-level interface (calling
the inner dictionary methods directly) will not log, whereas
the high-level interface will log:

$$BetterPictureContainer

Using ``DictMixin`` is acceptable, since

1.
 ``DictMixin`` provides its subclasses with the standard interface of a
 dictionary, a conceptually tied set of methods;
2.
 the dictionary interface is well known to anybody knowing how
 to use dictionaries in Python, so that the cognitive load is zero;
3.
 ``DictMixin`` allows substantial code reuse: we explicitly redefined just
 4 methods, but we are actually indirectly affecting 17 other methods:
 ``__cmp__, __contains__, __iter__,
 __len__, __repr__, clear, get, has_key, items, iteritems,
 iterkeys, itervalues, pop, popitem, setdefault, update, values``: without
 ``DictMixin`` we would need to reimplement all of them.

However, notice that in this example the usage of ``DictMixin`` as a mixin
class is acceptable, but not optimal: the best solution is to use
``DictMixin`` as a base class, not as a mixin class.

The core problem is that we started from a wrong design: we wrote
``SimplePictureContainer`` when we did not know of the existence of
``DictMixin``. Now, *a posteriori*, we are trying to fix the mistake by
using multiple inheritance, but that is not the Right Thing (TM)
to do. The right thing would be to change the source code of
``SimplePictureContainer`` and to derive directly from ``DictMixin``.
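
``UserDict.DictMixin`` exists only in Python 2. As a sketch of the same
single-inheritance design on Python 3, assuming
``collections.abc.MutableMapping`` as the closest standard replacement, the
container can derive directly from the mapping base class. The
``PictureContainer`` and ``Item`` names below are hypothetical, and logging
is left out for brevity:

```python
from collections.abc import MutableMapping


class PictureContainer(MutableMapping):
    "A container deriving directly from the mapping base class"
    def __init__(self, id, pictures_or_containers=()):
        self.id = id
        self.data = {}  # the inner dictionary
        for poc in pictures_or_containers:
            self.data[poc.id] = poc

    # the five abstract methods required by MutableMapping
    def __getitem__(self, id):
        return self.data[id]

    def __setitem__(self, id, value):
        self.data[id] = value

    def __delitem__(self, id):
        del self.data[id]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


class Item(object):
    "Stand-in for Picture: anything with an .id attribute"
    def __init__(self, id):
        self.id = id


album = PictureContainer("holidays", [Item("p1"), Item("p2")])
```

Methods such as ``keys``, ``get``, ``pop`` and ``update`` then come from the
base class, with no multiple inheritance involved.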

In the real world you usually do not have complete control of the code:
you may rely on a third party library with a design error, or
simply on an old library, written when ``DictMixin`` did not exist.
In such a situation you may have no way to modify the source code.
Then using ``DictMixin`` and multiple inheritance is a perfectly
acceptable workaround, but it is still a workaround, and it
should not be mistaken for a clever design.

Moreover, even the best examples of mixins could be replaced
by generic functions: this is why I would not provide mixins, should I
write a new language from scratch. Of course, in an existing language like
Python, one has to follow the common idioms, so I use
mixins in a few controlled cases, and I have no problems with that.
For instance, one could define an ``EqualityMixin`` which defines
the special methods ``__eq__`` and ``__ne__``, with ``__ne__`` being the
logical negation of ``__eq__`` (Python 2 does not do
that by default). That would be a fine usage, but I don't do that: I
prefer to duplicate two lines of code and to write the ``__ne__``
method explicitly, to avoid complicating my inheritance hierarchy.
One should decide whether to use a mixin or not on a case by case
basis, with a bias for the *not*.
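
A minimal sketch of such a mixin follows. The attribute-based comparison
rule and the ``Point`` client class are assumptions made for illustration;
the text above does not prescribe how ``__eq__`` should compare instances:

```python
class EqualityMixin(object):
    "Hypothetical mixin comparing instances by type and attributes"
    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return vars(self) == vars(other)

    def __ne__(self, other):
        # logical negation of __eq__, written out explicitly
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


class Point(EqualityMixin):
    "A client class getting == and != from the mixin"
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

The mixin contributes exactly two special methods, which is the kind of
small, well-known interface discussed in this section.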
'''

import logging
import sys

from UserDict import DictMixin

class SimplePictureContainer(object):
  "A wrapper around the .data dictionary, labelled by an id"
  def __init__(self, id, pictures_or_containers):
    self.id = id
    self.data = {} # the inner dictionary
    for poc in pictures_or_containers:
      # both pictures and containers must have an .id
      self.data[poc.id] = poc


class BetterPictureContainer(SimplePictureContainer, DictMixin):
  from utility import log

  def __getitem__(self, id):
    return self.data[id]

  def __setitem__(self, id, value):
    self.log.info('Adding or replacing %s into %s', id, self.id)
    self.data[id] = value

  def __delitem__(self, id):
    self.log.warn('Deleting %s', id)
    del self.data[id]

  def keys(self):
    return self.data.keys()
      
logging.basicConfig(level=logging.INFO, stream=sys.stdout)

class Picture(object):
  def __init__(self, id, location, title, date):
    self.id = id
    self.location = location
    self.title = title
    self.date = date
  def __str__(self):
    return '<%s %s>' % (self.__class__.__name__, self.id)


class WithLog(object):
  "A mixin class"
  @property
  def log(self):
    return logging.getLogger(self.__class__.__name__)

class C(object):
  "A base class"

class C_WithLog(C, WithLog):
  "A mixin-enhanced class"

class CWithLog(C):
  from utility import log # log is the property defined above

if __name__ == '__main__':
  import doctest; doctest.testmod()