<h1>A Tough Design Decision</h1> <p>Marty Alchin (<a href="http://martyalchin.com/about/">about</a>, marty@martyalchin.com), 2011-01-21T23:09:32-05:00, <a href="http://martyalchin.com/2011/jan/21/design-decision/">http://martyalchin.com/2011/jan/21/design-decision/</a></p> <p>When I set out to work on <a href="https://github.com/gulopine/biwako">Biwako</a>, I expected to have to make some hard choices, but I didn&#8217;t expect to hit one so quickly. What started as a simple feature turned out to take several days of research, trial and error before I finally settled on a solution that I didn&#8217;t see coming. And in the spirit of building this framework out in the open, I&#8217;d like to spend some time sharing my experiences, in hopes of helping someone else who might face a similar problem&nbsp;someday.</p> <p>Yesterday, <a href="http://martyalchin.com/2011/jan/20/class-level-keyword-arguments/">I explained</a> how Python allows a class declaration to take keyword arguments alongside the usual base class. For Biwako, I planned to use that feature as a way to keep things <abbr title="Don't Repeat Yourself"><span class="caps">DRY</span></abbr>. Some field types have options that are the same for all such fields within the same file format. It doesn&#8217;t make sense to have to specify the same option over and over again, when it can instead be supplied at the class&nbsp;level.</p> <p>So the goal is to provide arguments to the class that will then be passed into each of the fields that needs it. That puts it in a different category than, for instance, Django&#8217;s <a href="http://docs.djangoproject.com/en/1.2/ref/models/options/"><code>Meta</code> options</a>. Sure, some of those are accessed by fields, but they describe the model class itself. 
For Biwako, the options are much more tightly integrated with the fields themselves, which poses something of a&nbsp;problem.</p> <h2>The na&iuml;ve&nbsp;approach</h2> <p>To illustrate why it&#8217;s a problem, let&#8217;s start with what I&#8217;ll call the na&iuml;ve approach: the most straightforward solution I could think of, just to try to get the thing working. Basically, let Python process the <code>Structure</code> class declaration the way it normally would, then use the <code>attach_to_class()</code> method to retrieve the extra arguments from the&nbsp;class.</p> <p>First, a quick moment to explain <code>attach_to_class()</code> if you&#8217;re not familiar with it. Once Python has executed the body of the <code>Structure</code> class declaration, the metaclass for <code>Structure</code> will loop through all the class attributes, looking for any that have an <code>attach_to_class()</code> method. When one is found, the metaclass calls the method and passes it the class object and the name that the field was assigned&nbsp;to.</p> <p>Without this step, fields have no way to know what name they were given, so it&#8217;s a necessity in any declarative framework like this. The <a href="http://martyalchin.com/2007/nov/12/using-declarative-syntax-part-3/">last time I covered declarative syntax</a>, I only used this step to set the name, but this is also the first opportunity the field has to see the class it was assigned to, so it&#8217;s advantageous to pass that in as well. 
So to start, the <code>attach_to_class()</code> method looks something like&nbsp;this:</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Field</span><span class="p">:</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">attach_to_class</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="n">label</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">label</span> <span class="ow">or</span> <span class="n">name</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&#39;_&#39;</span><span class="p">,</span> <span class="s">&#39; &#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">label</span> <span class="o">=</span> <span class="n">label</span><span class="o">.</span><span class="n">title</span><span class="p">()</span> <span class="n">cls</span><span class="o">.</span><span class="n">_fields</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> </pre></div> </p> <p>So here it&#8217;s dealing with the name and assigning itself to a list of known fields. But this gets called from the <code>__init__()</code> method of the metaclass, which, as I explained yesterday, has all the class keyword arguments available to it. So if I pass those into <code>attach_to_class()</code> as well, the field can pick out which options it knows about and use them however it needs to. 
So for integers that take an <code>endianness</code> argument:</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">attach_to_class</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span 
class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">attach_to_class</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">)</span> </pre></div> </p> <p>But wait, why do <code>__init__()</code> and <code>attach_to_class()</code> <em>both</em> take an <code>endianness</code> argument? Well, <code>__init__()</code> always has to accept it in case you want to specify it explicitly, for those rare formats where endianness might not be uniform throughout the file. So by adding options to <code>attach_to_class()</code> without updating <code>__init__()</code>, what happens now is inconsistent. If you supply <code>endianness</code> to just the class, it will work as you expect, but if you put it in the field instead, you get&nbsp;this:</p> <ol> <li>The field instantiates with the explicit endianness value and stores it&nbsp;away.</li> <li>The class comes along and overrides it with its default value, because <code>attach_to_class()</code> comes&nbsp;later.</li> </ol> <p>And if you happen to supply it in both places, it goes something like&nbsp;this:</p> <ol> <li>The field stores away its endianness value&nbsp;correctly.</li> <li>The class comes along and overrides it with the value it was&nbsp;given.</li> </ol> <p>So really, the field value is getting the shaft no matter what. The only time it works the right way is when you don&#8217;t pass anything into the field at all. So in order to make it work properly, <code>attach_to_class()</code> needs to figure out whether the argument was passed into the field or not. 
If it was, the class-level argument should be ignored, regardless of whether something was passed in&nbsp;explicitly.</p> <p>Of course, the simplest way to do that is to take the default value out of <code>__init__()</code> and use <code>None</code> instead. That way, <code>attach_to_class()</code> can check if the attribute is set to <code>None</code> or something more specific. Only in the former case should it bother supplying its own&nbsp;value.</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">attach_to_class</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">,</span> <span class="o">**</span><span 
class="n">options</span><span class="p">):</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span> <span class="c"># Still need to initialize it</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">attach_to_class</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">)</span> </pre></div> </p> <p>Now, believe it or not, this works. It correctly handles all four combinations of where arguments could be passed in. But it has two fairly severe&nbsp;problems:</p> <ul> <li> <p>It&#8217;s ugly. File formats are often heavily customized, so custom fields are likely to be a big part of the Biwako ecosystem. If I&#8217;m asking my users to create their own custom fields very often, I&#8217;d like the <span class="caps">API</span> for it to be nicer, more straightforward and less prone to simple mistakes (like supplying a default value in <code>__init__()</code> even though that&#8217;s exactly what you&#8217;d normally&nbsp;do).</p> </li> <li> <p>It&#8217;s difficult to test. 
When writing unit tests for fields, it&#8217;s a lot better to instantiate the fields on their own, without being part of a class declaration. That way you have fewer parts involved and you can focus on testing just what you&#8217;re worried about. But if endianness isn&#8217;t set and instantiated until <code>attach_to_class()</code> is executed, those independent fields are pretty much&nbsp;useless.</p> </li> </ul> <p>These problems were enough to make me quickly realize that I needed a better solution. I figured I&#8217;d have a few options, so I wanted to spend the time to find out what I could do and how it would work. I started where you probably expected me to start. More&nbsp;metaclasses.</p> <h2>The declarative&nbsp;approach</h2> <p>You&#8217;ve probably noticed by now that I&#8217;m a big fan of declarative classes, so it shouldn&#8217;t be too surprising that my first instinct was to make another one. I figured that the arguments themselves could be pulled out of the methods and assigned as attributes to the <code>Field</code> class, just like fields are assigned to <code>Structure</code> classes. So <code>Integer</code> would look something like&nbsp;this:</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="n">endianness</span> <span class="o">=</span> <span class="n">Argument</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">)</span> <span class="c"># ...</span> </pre></div> </p> <p>Behind the scenes, though, things would work a lot like the na&iuml;ve approach. <code>__init__()</code> would be able to tell if an argument was passed in and <code>attach_to_class()</code> would figure out if it needed to override the attribute or not. 
The only real difference is that the arguments would come from the class declaration, which saves users the trouble of seeing the whole <code>__init__()</code>/<code>attach_to_class()</code> mess.</p> <p>That&#8217;s certainly prettier, and it uses a syntax I&#8217;m already expecting my users to be familiar with, so it seemed promising. But there&#8217;s another half of the problem: initializing fields on their own, outside of a <code>Structure</code>. As it stands, I&#8217;d still be left with undefined attributes if I wanted to use the&nbsp;defaults.</p> <p>So I then took advantage of the fact that class attributes, such as the <code>Argument</code> object in the above example, can be used as <a href="http://martyalchin.com/2007/nov/23/python-descriptors-part-1-of-2/">descriptors</a>. That way, the field can figure out when the testing code is trying to access it. If it doesn&#8217;t have an explicit value yet, the descriptor can provide a default, which was naturally already passed into it as an argument (to &#8230; <code>Argument</code> &#8230; yeah, it&#8217;s arguments <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down">all the way down</a>, stay with&nbsp;me).</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Argument</span><span class="p">:</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span> <span class="o">=</span> <span class="n">default</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="n">owner</span><span class="p">):</span> <span class="k">if</span> 
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">instance</span><span class="o">.</span><span class="n">__dict__</span><span class="p">:</span> <span class="c"># Default value to the rescue!</span> <span class="n">instance</span><span class="o">.</span><span class="n">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span> <span class="k">return</span> <span class="n">instance</span><span class="o">.</span><span class="n">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> </pre></div> </p> <p>So far, so good. If the field gets instantiated outside a class, it doesn&#8217;t know that at first, but as soon as one of its arguments is accessed, it quickly sets a default value if it needs to and allows the field to act as if it knew all along how it was supposed to&nbsp;work.</p> <p>You&#8217;ll notice one key feature went missing, though: the argument can&#8217;t be initialized. The way it stands, if I were to pass in <code>endianness=BigEndian</code>, the endianness value would end up being the <code>BigEndian</code> class, rather than an instance of that class that&#8217;s been tailored to the field&#8217;s size. So we need a way to specify an initialization function for each argument as&nbsp;well.</p> <p>For that, I turned to a decorator. In the class declaration, the <code>endianness</code> argument gets instantiated as an object right away. 
I can then add a method on that object to act as a decorator, which will allow users to mark a field method as being the initialization function for the&nbsp;argument.</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Argument</span><span class="p">:</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span> <span class="o">=</span> <span class="n">default</span> <span class="c"># A default initializer that does nothing</span> <span class="bp">self</span><span class="o">.</span><span class="n">initialize</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">obj</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="n">value</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="n">owner</span><span class="p">):</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">instance</span><span class="o">.</span><span class="n">__dict__</span><span class="p">:</span> <span class="c"># Default value to the rescue!</span> <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">initialize</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">default</span><span class="p">)</span> <span class="n">instance</span><span class="o">.</span><span 
class="n">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span> <span class="k">return</span> <span class="n">instance</span><span class="o">.</span><span class="n">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">initialize</span> <span class="o">=</span> <span class="n">func</span> <span class="k">return</span> <span class="n">func</span> <span class="c"># ...</span> <span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="n">endianness</span> <span class="o">=</span> <span class="n">Argument</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">)</span> <span class="nd">@endianness</span><span class="o">.</span><span class="n">init</span> <span class="k">def</span> <span class="nf">initialize_endianness</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">endianness</span><span class="p">):</span> <span class="k">return</span> <span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="c"># ...</span> </pre></div> </p> <p>Of course, there are other places where this new <code>initialize()</code> function would get called, but you get the idea. 
Now we have a way to populate arguments from two different places, give them default values, and initialize whatever values are being used. It&#8217;s certainly a prettier, friendlier approach and it works great with tests. But it has its own&nbsp;caveats:</p> <ul> <li> <p>It&#8217;s surprisingly fragile. If you were to provide your own <code>__init__()</code> method and try to use the argument too early, the default value would get stored before the class has a chance to provide its value, and there&#8217;d be no way for the <code>Argument</code> class to know whether that was intentional or&nbsp;accidental.</p> </li> <li> <p>It&#8217;s counter-intuitive. It might be pretty, but the fact remains that we&#8217;ve introduced a declarative syntax in order to handle what would seem like a common situation: passing an argument into a function. It requires users to &#8220;unlearn&#8221; what they&#8217;ve learned, and I don&#8217;t think that&#8217;s a good thing to be&nbsp;doing.</p> </li> <li> <p>It&#8217;s <a href="http://en.wikipedia.org/wiki/Law_of_the_instrument">Maslow&#8217;s hammer</a>. If all you have is a hammer, everything looks like a nail. I have more than just declarative syntax at my disposal, so it feels wrong to just jump straight to it whenever I&#8217;m faced with a problem. There have got to be other options available, and one of them is probably more appropriate, especially given the previous&nbsp;point.</p> </li> </ul> <p>It&#8217;s easy to ignore that third point, and just figure that if something works, it must be the right tool for the job. In this case, it&#8217;s an extremely heavy-handed approach to an otherwise simple problem, and with my particular reliance on the declarative style, I really wanted to push myself to use something&nbsp;else.</p> <h2>The&nbsp;placeholder</h2> <p>So while I was working on the declarative approach, I thought of another possibility. 
The real problem with the na&iuml;ve approach is that it relied on a value of <code>None</code> to determine if an argument had been passed into the field or not, once <code>attach_to_class()</code> comes along. So what if I could use a different value instead? One that not only served the same purpose, but could also be used to manage a default&nbsp;value?</p> <p>Basically, I&#8217;d use the field&#8217;s <code>__init__()</code> method like normal, but instead of passing in the argument&#8217;s default value directly, it could be wrapped up in a <code>Default</code> object (I actually called it <code>Arg</code> but I like <code>Default</code> better). Then <code>attach_to_class()</code> can check for an instance of that object instead, and if found, grab the default value from the object itself and use&nbsp;that.</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Default</span><span class="p">:</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">default_value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">class</span> <span class="nc">Field</span><span class="p">:</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">attach_to_class</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="n">label</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">label</span> <span class="ow">or</span> <span class="n">name</span><span 
class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&#39;_&#39;</span><span class="p">,</span> <span class="s">&#39; &#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">label</span> <span class="o">=</span> <span class="n">label</span><span class="o">.</span><span class="n">title</span><span class="p">()</span> <span class="n">cls</span><span class="o">.</span><span class="n">_fields</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="c"># Class-level options replace any argument still wrapped in Default</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">options</span><span class="o">.</span><span class="n">items</span><span class="p">():</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="ow">and</span> \ <span class="nb">isinstance</span><span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">),</span> <span class="n">Default</span><span class="p">):</span> <span class="nb">setattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span 
class="n">Default</span><span class="p">(</span><span class="n">BigEndian</span><span class="p">),</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> </pre></div> </p> <p>The <code>hasattr()</code> test is necessary because this will get <em>all</em> the options, regardless of whether any of them actually mean something to this particular&nbsp;field.</p> <p>This approach works for the most part, but again the initialization bit has gone missing. Since the <code>Default</code> object gets created inside the function signatures, there&#8217;s no good place to put the initialization code. There are three options that I can&nbsp;see:</p> <ul> <li>Create the <code>Default</code> object somewhere else, so it can be used as a&nbsp;decorator.</li> <li>Pass in the initialization function as another argument to <code>Default</code>.</li> <li>Add the initialization function to the <code>Default</code> object somewhere in the code of <code>__init__()</code>, before it gets&nbsp;accessed.</li> </ul> <p>None of those are very convenient for users implementing their own fields. Technically a fourth could be initializing the value in <code>attach_to_class()</code>, but then we&#8217;re back to having problems with&nbsp;testing.</p> <p>All in all, I didn&#8217;t get very far into this one before abandoning it due to lack of flexibility for things like initialization. 
Providing a default value was easy enough, but it wasn&#8217;t long before things got considerably hairier. So when I went back to the drawing board, I thought much&nbsp;simpler.</p> <h2>Double&nbsp;initialization</h2> <p>A thought had occurred to me: what I&#8217;m really trying to do is initialize the arguments in two different places. The first happens when <code>__init__()</code> is called during the creation of the field object. The second happens inside of <code>attach_to_class()</code>. But ultimately it&#8217;s the same process either way, so why not just call <code>__init__()</code> twice?</p> <p>This one works by storing away the arguments that were passed in when the field was created, then filling in class-level arguments wherever a field-level argument wasn&#8217;t passed in. So in order to avoid messing with <code>__init__()</code> on the field any more than I had to, I built a very small metaclass for fields to work with, much smaller than the declarative approach shown&nbsp;earlier.</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">FieldMeta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="c"># This gets called before __new__() or __init__()</span> <span class="n">field</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">FieldMeta</span><span class="p">,</span> <span class="n">cls</span><span class="p">)</span><span class="o">.</span><span class="n">__call__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span 
class="p">)</span> <span class="n">field</span><span class="o">.</span><span class="n">_arguments</span> <span class="o">=</span> <span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">)</span> <span class="k">return</span> <span class="n">field</span> <span class="k">class</span> <span class="nc">Field</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">FieldMeta</span><span class="p">):</span> <span class="k">def</span> <span class="nf">attach_to_class</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">):</span> <span class="n">args</span><span class="p">,</span> <span class="n">kwargs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_arguments</span> <span class="n">options</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">)</span> <span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span 
class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> </pre></div> </p> <p>So the way this works, when you create a field, it&#8217;s complete, right out of the box. It has the necessary default values, initialization happens in <code>__init__()</code> where it belongs, there are no pesky placeholders or strange syntax and it works just as well for testing as it does in real use. It&#8217;s a fairly clean, concise solution &#8230; except for one potential&nbsp;problem.</p> <p>Ordinarily, <code>__init__()</code> is called exactly once for a given object. When it&#8217;s first created, <code>__init__()</code> gets a chance to set some of its values to their starting positions, making the object ready for general use. Doing this step twice won&#8217;t cause problems in most cases, especially since the field won&#8217;t really get used between the two calls to <code>__init__()</code>.</p> <p>But what actually goes on in <code>__init__()</code> is entirely up to you. You might just set a few attributes and be done with it or you might add the field to some internal registry for later use. 
That latter option would be considered a side-effect: the code modifies something outside of its own scope, and that change persists even after the function is done&nbsp;executing.</p> <p>If the code gets run twice, any side-effects would also occur twice, which could be a problem. In the case of a field registry, you would end up with each field occurring twice in the registry, which would definitely cause problems. And since there&#8217;s no obvious cue that <code>__init__()</code> would get called twice, such problems could be difficult to track down to their&nbsp;source.</p> <p>What I really need is a way to just get access to the class-level arguments <em>before</em> the fields are created, so that I can just pass in the full, correct set of arguments the first time and save myself all this&nbsp;mess.</p> <h2>Thread&nbsp;locals</h2> <p>Now, if you&#8217;re familiar with programming, and if you&#8217;ve been following Django design discussions in particular, you might see the phrase &#8220;thread locals&#8221; and immediately jump to scenes of mass destruction, flesh being torn off from innocent bystanders and babies having their candy taken away. But I felt it was my responsibility to consider every&nbsp;option.</p> <p>The foundation of this approach is actually a newer feature of metaclasses that I neglected to mention yesterday: the <code>__prepare__()</code> method. In Python 3, metaclasses can have a method called <code>__prepare__()</code>, which will get called <em>before</em> Python processes any of the contents of the class declaration. That is, before any of the fields have been created or initialized. Thankfully for us, <code>__prepare__()</code> also gets the same keyword argument dictionary as <code>__new__()</code> and <code>__init__()</code>, containing all the options that are declared at the top of the&nbsp;class.</p> <p>With that information available so early on, it&#8217;s possible to store those options right away. 
But we need a good place to put them. Unfortunately, because <code>__prepare__()</code> gets called so early, it doesn&#8217;t have access to the class object (that hasn&#8217;t even been created yet). Instead, it gets the name of the class, a tuple of its base classes and the dictionary of keyword arguments. So we turn to <a href="http://docs.python.org/library/threading.html#threading.local">thread locals</a>.</p> <p>In a nutshell, thread locals are a way to store data so that only the current thread can see it. That way, if more than one thread happens to be running the same code at about the same time, there won&#8217;t be any conflict between the two. Since a single thread can only run code sequentially, we can be sure that if we place those class-level options in thread-local storage, they&#8217;ll be available exactly when we need them, and only to the thread that should see&nbsp;them.</p> <p>Then, when each field is called, it can look in thread-local storage to find any class-level arguments and combine them with its own arguments before calling <code>__init__()</code> in the first&nbsp;place.</p> <div class="typygmentdown"><pre><span class="kn">import</span> <span class="nn">threading</span> <span class="k">class</span> <span class="nc">FieldMeta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span> <span class="n">_registry</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">local</span><span class="p">()</span> <span class="n">_registry</span><span class="o">.</span><span class="n">options</span> <span class="o">=</span> <span class="p">{}</span> <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="k">if</span> <span 
class="n">FieldMeta</span><span class="o">.</span><span class="n">_registry</span><span class="o">.</span><span class="n">options</span><span class="p">:</span> <span class="n">options</span> <span class="o">=</span> <span class="n">FieldMeta</span><span class="o">.</span><span class="n">_registry</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">options</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">options</span> <span class="o">=</span> <span class="n">kwargs</span> <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">FieldMeta</span><span class="p">,</span> <span class="n">cls</span><span class="p">)</span><span class="o">.</span><span class="n">__call__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">)</span> <span class="k">class</span> <span class="nc">Field</span><span class="p">(</span><span class="n">metaclass</span><span class="o">=</span><span class="n">FieldMeta</span><span class="p">):</span> <span class="c"># ...</span> <span class="k">class</span> <span class="nc">Integer</span><span class="p">(</span><span class="n">Field</span><span class="p">):</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">BigEndian</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span 
class="n">Integer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">endianness</span> <span class="o">=</span> <span class="n">endianness</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">size</span><span class="p">)</span> <span class="k">class</span> <span class="nc">StructureMeta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span> <span class="nd">@classmethod</span> <span class="k">def</span> <span class="nf">__prepare__</span><span class="p">(</span><span class="n">metacls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">):</span> <span class="n">FieldMeta</span><span class="o">.</span><span class="n">_registry</span><span class="o">.</span><span class="n">options</span> <span class="o">=</span> <span class="n">options</span> <span class="c"># ...</span> <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">bases</span><span class="p">,</span> <span class="n">attrs</span><span class="p">,</span> <span class="o">**</span><span class="n">options</span><span class="p">):</span> <span class="c"># ...</span> <span class="c"># Clean up the thread-local dictionary</span> <span class="n">FieldMeta</span><span class="o">.</span><span class="n">_registry</span><span class="o">.</span><span class="n">options</span> <span class="o">=</span> <span 
class="p">{}</span> </pre></div> </p> <p>With this in place, each field has the opportunity to instantiate itself using <em>all</em> of the arguments that it needs, regardless of where they were specified, without requiring any special handling for the subclass. Initialization of arguments happens in <code>__init__()</code>, and that process happens exactly once per field, just like it should. Fields get their default values when creating them for testing, and they get their full options in real&nbsp;use.</p> <h2>The&nbsp;decision</h2> <p>So after going through this entire process (over about five days), I finally had to come to a decision. As much as I had expected to find something more traditional, it turned out that thread locals actually solved the problem more cleanly than any other. It made the <span class="caps">API</span> dead obvious by not changing anything, and it was actually pretty simple to&nbsp;implement.</p> <p>So there you have it. Biwako uses thread locals, and now you know&nbsp;why.</p> <h2>The moral of the&nbsp;story</h2> <p>Now, this hasn&#8217;t been a story of thread locals, metaclasses, descriptors or anything else. Really, I just wanted to show the design process. Writing a framework requires countless decisions like this, and they can sound really&nbsp;daunting.</p> <p>Sometimes it can seem like framework authors just naturally have things they like and don&#8217;t like or that they somehow always have all the right answers. But the fact is, most of these decisions are the result of a lot of trial and error, soul searching and trying very hard to think of the user experience above all else. Design and usability don&#8217;t only have to be visual; <span class="caps">API</span> design uses many of the same&nbsp;principles.</p> <p>So think of your users. Research your options. Keep an open mind. 
Ride the wave and you might be surprised where the process takes&nbsp;you.</p>Class-level Keyword Arguments2011-01-20T22:40:50-05:00Marty Alchinmarty@martyalchin.comhttp://martyalchin.com/about/http://martyalchin.com/2011/jan/20/class-level-keyword-arguments/ <p>Python sports some <a href="http://martyalchin.com/2007/nov/22/dynamic-functions/">nifty features</a> when it comes to handling arguments, but those are only for functions. A class declaration is limited to just a list of base classes &#8230; or is&nbsp;it?</p> <p>Supplying one or more base classes in a class declaration looks pretty much like passing positional arguments to a&nbsp;function.</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Example</span><span class="p">(</span><span class="n">Base</span><span class="p">,</span> <span class="n">Mixin</span><span class="p">):</span> <span class="k">pass</span> </pre></div> </p> <p>You have object references, commas to separate them and parentheses to hold it all in place. I had <a href="http://martyalchin.com/2007/nov/11/using-declarative-syntax-part-2/">touched briefly</a> on metaclasses before, but pretty much glossed over the fact that metaclasses can actually receive this list of base classes as a tuple, much like variable positional arguments in a&nbsp;function.</p> <p>So there&#8217;s some precedent here for treating classes and functions somewhat similarly. But the elephant in the room from my <a href="http://martyalchin.com/2011/jan/20/biwako/">earlier post</a> about Biwako is that it&#8217;s possible to supply keyword arguments to a class declaration as&nbsp;well!</p> <h2>Metaclasses in Python&nbsp;3</h2> <p>This is where we get a disclaimer: the technique I&#8217;m about to describe is only available in Python 3.0 and higher. Python 3 came with a change to the way metaclasses are specified, using a <code>metaclass</code> keyword argument instead of a <code>__metaclass__</code> attribute like Python 2 used. 
So class declarations start to look more like&nbsp;this:</p> <div class="typygmentdown"><pre><span class="k">class</span> <span class="nc">Example</span><span class="p">(</span><span class="n">Base</span><span class="p">,</span> <span class="n">metaclass</span><span class="o">=</span><span class="n">BaseMeta</span><span class="p">):</span> <span class="k">pass</span> </pre></div> </p> <h2>Supporting other&nbsp;arguments</h2> <p>With this change on the table, Python 3 also opens up the possibility for arbitrary keyword arguments. These arguments are provided to the metaclass as part of the class declaration. Both the <code>__new__()</code> and <code>__init__()</code> methods receive them as standard keyword arguments, so you can grab them using the double-asterisk&nbsp;syntax.</p> <div class="typygmentdown"><pre>class BaseMeta(type): def __new__(cls, name, bases, attrs, **options): # This is only necessary because type.__new__() # doesn&#39;t know how to handle the extra arguments return type.__new__(cls, name, bases, attrs) def __init__(cls, name, bases, attrs, **options): print(options) class Base(metaclass=BaseMeta, option=True): pass </pre></div> </p> <p>Running this code will simply output <code>{'option': True}</code> because Python handles the <code>metaclass</code> as a special case. It&#8217;s already figured out which metaclass to use, so it strips that out when sending the rest of the arguments through. What&#8217;s left is a dictionary of whatever else you could want your class declarations to&nbsp;accept.</p> <p>With these new arguments, you can write base classes that can process class-wide options right in the first line of the class declaration itself. Otherwise, you&#8217;re stuck using an approach like Django&#8217;s, where you have to supply an inner class. 
It&#8217;s functional, and it worked well when it was the only option, but it&#8217;s far less elegant by&nbsp;comparison.</p> <h2>Use it&nbsp;wisely</h2> <p>Not all classes can make good use of keyword arguments in this way. For some features, you might be better off using a separate mixin class to define custom behavior. Other situations might make more sense if you simply add attributes to the class directly or special methods that control extra&nbsp;behavior.</p> <p>There&#8217;s no one answer to when you should or shouldn&#8217;t use this or any other feature. You always want to research all your options and use the one that makes the most sense, both for now and for maintenance in the future. Hopefully you at least understand this new feature well enough to add it to your toolbox for&nbsp;later.</p> <h2>Stay&nbsp;tuned</h2> <p>My next blog post (probably tomorrow or this weekend) will explain some of the different ways I tried to use class-level arguments in Biwako before finally settling on a solution that just might surprise&nbsp;you.</p>Biwako: File Formats Made Easy2011-01-20T19:30:00-05:00Marty Alchinmarty@martyalchin.comhttp://martyalchin.com/about/http://martyalchin.com/2011/jan/20/biwako/ <p>For years now, I&#8217;ve been researching various kinds of file formats, from music and images to video games and even <span class="caps">NASCAR</span> data streams. Each format is usually considered to be unique&#8212;at least as far as parsing/saving implementations go, but the truth is that they have a lot in common. And anytime you have a bunch of independent tasks that share similar aspects, you have an ideal environment for the creation of a framework to make those common aspects easier to&nbsp;manage.</p> <p>To that end, I&#8217;ve created <a href="https://github.com/gulopine/biwako">Biwako</a>. 
It&#8217;s still very early on in the process, but it covers some interesting features of Python that I really want to write about, so it&#8217;s useful to have some context. This is just a brief introduction to explain the motivations behind my use of some of the other topics I&#8217;ll be writing about&nbsp;soon.</p> <p>Biwako is a declarative class framework, similar to Django&#8217;s models and forms. It allows you to define a file format using a class definition and a series of individual field definitions, which you can then use to create, parse, modify or save files in the binary format you&#8217;ve defined. It can be used to create your own custom file formats, but where it really shines is by helping you access data in formats specified by other standards or&nbsp;applications.</p> <h2>Usage</h2> <p>For example, here&#8217;s a very simple Biwako class that can parse part of the <span class="caps">GIF</span> file format, allowing you to easily get to the width and height of any <span class="caps">GIF</span>&nbsp;image.</p> <div class="typygmentdown"><pre><span class="kn">from</span> <span class="nn">biwako</span> <span class="kn">import</span> <span class="n">bin</span> <span class="k">class</span> <span class="nc">GIF</span><span class="p">(</span><span class="n">bin</span><span class="o">.</span><span class="n">Structure</span><span class="p">,</span> <span class="n">endianness</span><span class="o">=</span><span class="n">bin</span><span class="o">.</span><span class="n">LittleEndian</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">&#39;ascii&#39;</span><span class="p">):</span> <span class="n">tag</span> <span class="o">=</span> <span class="n">bin</span><span class="o">.</span><span class="n">FixedString</span><span class="p">(</span><span class="s">&#39;GIF&#39;</span><span class="p">)</span> <span class="n">version</span> <span class="o">=</span> <span class="n">bin</span><span 
class="o">.</span><span class="n">String</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mf">3</span><span class="p">)</span> <span class="n">width</span> <span class="o">=</span> <span class="n">bin</span><span class="o">.</span><span class="n">Integer</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mf">2</span><span class="p">)</span> <span class="n">height</span> <span class="o">=</span> <span class="n">bin</span><span class="o">.</span><span class="n">Integer</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mf">2</span><span class="p">)</span> </pre></div> </p> <p>Now you have a class that can accept any <span class="caps">GIF</span> image as a file (or any file-like object that&#8217;s readable) and parse it into the attributes shown on this&nbsp;class.</p> <div class="typygmentdown"><pre><span class="o">&gt;&gt;&gt;</span> <span class="n">image</span> <span class="o">=</span> <span class="n">GIF</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;example.gif&#39;</span><span class="p">,</span> <span class="s">&#39;rb&#39;</span><span class="p">))</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">image</span><span class="o">.</span><span class="n">width</span><span class="p">,</span> <span class="n">image</span><span class="o">.</span><span class="n">height</span> <span class="p">(</span><span class="mf">400</span><span class="p">,</span> <span class="mf">300</span><span class="p">)</span> </pre></div> </p> <p>Of course, a full format definition would have many more fields available, but you get the idea. The repository currently has a few examples like this, but ultimately the goal is that you&#8217;ll be able to easily create your own classes using whatever documentation is available for the formats you&#8217;re interested in. 
Since most formats would be useful across multiple projects, I&#8217;m considering setting up some sort of formal site where you&#8217;ll be able to upload your own class or find and download existing classes that were created by others. It&#8217;s still too early to get into that level of detail,&nbsp;though.</p> <h2>Python&nbsp;3</h2> <p>Since this is a new framework with no existing users to support, I&#8217;ve decided to support Python 3 right out of the box and not even bother trying to maintain compatibility with previous versions. There are a number of advantages to this, not the least of which is an easy way to distinguish between bytes and strings. I&#8217;ll be explaining some of the advantages of Python 3 in future blog posts, and there are some pretty great features to take advantage&nbsp;of.</p> <p>Of course, supporting only Python 3 means that there are currently some limits on which projects can use Biwako, because of other projects that might not yet have a Python 3 version available. These cases should be getting less and less common as time goes on, and it&#8217;s not an issue for most command-line cases where you just need to process some information in a bunch of files at&nbsp;once.</p> <h2>Under&nbsp;construction</h2> <p>Biwako is under heavy development right now. Most open source projects say this to some extent or another, but because it&#8217;s so young, I really mean it. I&#8217;m developing this &#8220;live&#8221; so every day I&#8217;m pushing new code to GitHub (that was my New Year&#8217;s resolution). That means that every day there&#8217;s either new code to play with, new documentation to read or new tests to run. But it also means that anything you wrote using yesterday&#8217;s code might break in surprising&nbsp;ways.</p> <p>I&#8217;m doing my best to establish a stable <span class="caps">API</span> early on, but it&#8217;s still too early for me to consider any aspect of Biwako as stable yet. 
Some days you might find things work a little differently than you expected, and other days you might find that I&#8217;ve pulled several rugs completely out from underneath you. You&#8217;re always free to play around with it, but if you pull new code one day and everything&#8217;s suddenly broken, please don&#8217;t come complain to me&#8212;at least, not until I&#8217;ve marked a stable&nbsp;<span class="caps">API</span>.</p> <p>I&#8217;ll do my best to document changes as they happen, but I fully expect the documentation to lag a bit behind the code. It&#8217;s not that I don&#8217;t like to write docs, it&#8217;s just that right now, getting the code working, stabilized and rich with features are much higher priorities. I&#8217;m having fun writing it, and sometimes I just want to have more fun with it before getting down to the real work of docs. If you&#8217;re interested, you can <a href="http://biwako.rtfd.org/">read the docs</a> now and keep an eye on them in the&nbsp;future.</p> <h2>Future&nbsp;plans</h2> <p>I&#8217;ve got a lot in mind for this little framework. So far, I&#8217;ve implemented enough to parse a few simple formats, but I&#8217;ve got a list of nearly a hundred different formats to test it out on, which run a pretty wide gamut. For now, my focus is on binary file formats, but I&#8217;ve laid out the namespaces in a way that allows for future expansion into text-based formats as well. 
Perhaps someday <a href="http://pypi.python.org/pypi/Sheets/">Sheets</a> will find a new home as part of Biwako, for&nbsp;example.</p> <p>So keep checking back for new code, new documentation and new articles about the awesome features Python 3 has to&nbsp;offer.</p>Competing on Technology2010-10-14T22:34:00-05:00Marty Alchinmarty@martyalchin.comhttp://martyalchin.com/about/http://martyalchin.com/2010/oct/14/competing-technology/ <p>So today, Jesse Noller wrote up <a href="http://jessenoller.com/2010/10/14/how-can-you-compete-with-google/">an interesting article</a> about how to compete in a market once Google makes an appearance. He makes a lot of valid points, but I wanted to take a step further and try to explain why I think we shouldn&#8217;t be looking to Google for points on how to run a business these&nbsp;days.</p> <p>I&#8217;m an open source nut. It&#8217;s not about free software, whether as in speech or beer. It&#8217;s not about the personal satisfaction from helping others learn how to <a href="http://prodjango.com/">make</a> <a href="http://propython.com/">something</a> awesome. It&#8217;s not even about the thrill of tinkering with something to figure out how it works, though that&#8217;s probably the most fun of the whole thing. No, I love open source because it helps force companies to compete on more than just&nbsp;technology.</p> <p>Competing on technology is almost as bad as competing on&nbsp;price.</p> <p>We&#8217;ve all heard the stories. A couple of guys in their garage/basement/loft spend a few days/weeks/months building some awesome bit of technology that threatens the market leaders. It&#8217;s true that anybody with a computer, some knowledge and some free time can replicate just about whatever somebody else is doing. 
Sure, there are some limitations to just how closely you can mimic another service, particularly when speed and reliability are at stake (those things cost money, you&nbsp;know).</p> <p>So for those of us really wanting to differentiate ourselves, Jesse rightly suggests that we try to compete on things other than technology. Bring more to the table than others can bring. Know more about your users, their needs and how you can address those needs. Care more about your users, their motivations, their passions and the reason they&#8217;re using your software in the first place. Not everybody can put themselves in their users&#8217; shoes, and even fewer bother to try. If you design for humans first, you can set yourself apart. I won&#8217;t pretend that it&#8217;s easy; it&#8217;s not. But it&#8217;s worth&nbsp;it.</p> <p>Now, reading this after Jesse&#8217;s article, you might think I&#8217;m suggesting that Google&#8217;s business model is unsound, and that they can&#8217;t possibly survive in the long term. I don&#8217;t know about the odds of Google&#8217;s long-term success, but it&#8217;s important to realize that Google doesn&#8217;t compete on technology. Google is a member of a fairly small club in the grand scheme of&nbsp;companies.</p> <p>Their technology got them far enough ahead in a short enough time that they built themselves a&nbsp;reputation.</p> <p>That&#8217;s an important distinction. Google&#8217;s core services are good enough to have earned them a reputation for quality. They worked hard for that reputation, especially since they did it with very little traditional marketing over the years. But at the end of the day, that&#8217;s really what they&#8217;ve got. As Jesse mentioned, when Google releases something, people flock to it, but not because it&#8217;s an awesome piece of technology, but because it&#8217;s Google. 
It&#8217;s that reputation that gets people to show up, and that&#8217;s what they use to crush (or buy) their&nbsp;competitors.</p> <p>In a way, Google is even worse than competing on technology, because they don&#8217;t even really have to innovate to succeed. But because reputation can&#8217;t be created in a garage/basement/loft, they still have something powerful to bring to the table, and that&#8217;s why they&#8217;re such fearsome competitors. They&#8217;ve built a wave (no, not <em>that</em> wave), and they&#8217;re riding it all the way to the&nbsp;bank.</p> <p>That sword cuts both ways, though. It can bring a ton of people to a new service overnight, but sometimes they seem to rely on it too much. Some of their offerings seem to just expect that everyone will show up and use it, regardless of whether it&#8217;s actually useful for anything, much less whether it&#8217;s enjoyable to use. It&#8217;s that side of their offerings that makes me wonder about their longevity, but they&#8217;re far too big to start pretending they&#8217;re on their way&nbsp;out.</p> <p>The natural comparison, though, is Apple. Jobs and company don&#8217;t get into just anything, and they don&#8217;t do much of anything half-heartedly. They focus on the people who use software and the experience they have while using it. They hire people who genuinely care about people and how they interact with technology, and that passion shows through in their products. That&#8217;s what they bring to the table beyond technology. Sure, their tech is pretty awesome and it keeps getting better, but they choose their hardware offerings in service of a better experience for their customers. There are nearly always products that are technically superior, but Apple keeps doing better because they focus on more than the technology&nbsp;itself.</p> <p>So in a nutshell, Jesse&#8217;s right that you should put people first. But it&#8217;s a bit more subtle than that. 
The key point is to offer more than just technology. User experience is a great place to go if you can, but there are other ways as well. Customer service, continuous improvement and community involvement can all help tremendously to differentiate you from your&nbsp;competition.</p> <p>Don&#8217;t ask &#8220;What if Google does it?&#8221; Just try to bring whatever you can to the table that&#8217;s uniquely yours. Someone else might still make something technically better, but if there are other factors at play, you&#8217;re likely to have people who will latch on to those other factors, and if you&#8217;ve done a good enough job at those, your customers won&#8217;t be willing to give them up. Even for&nbsp;Google.</p>Usage Driven Design2010-09-14T12:15:00-05:00Marty Alchinmarty@martyalchin.comhttp://martyalchin.com/about/http://martyalchin.com/2010/sep/14/usage-driven-design/ <p>I&#8217;ve had a project rattling around in my head for a few years now. Take Django&#8217;s declarative approach to <a href="http://docs.djangoproject.com/en/1.2/topics/db/models/">models</a> and <a href="http://docs.djangoproject.com/en/1.2/topics/forms/">forms</a>, and apply it to the definition of binary file formats. I know I&#8217;m not the only one to have thought of it, but I think I&#8217;m the first to take it seriously as a project. So far, it&#8217;s had many names and taken many forms, but I think I&#8217;ve finally found an approach that&#8217;ll help me actually get the thing done: usage driven&nbsp;design.</p> <p>I doubt I&#8217;m the first to come up with this, and there&#8217;s probably an oddly-named Wikipedia page somewhere describing in excruciating detail how to use it in a business setting or something. 
For me, it just boils down to using the project I want to make, before I&#8217;ve even made&nbsp;it.</p> <p>There are several file formats in a particular domain that I really want to support with this framework, so what I had done in the past was write the code to support those formats, test them rigorously and congratulate myself on a job well done, before moving on to implement more formats and add to the framework as necessary to support them. I&#8217;ve done this at least 4 times now, and though I got better at understanding and anticipating the problems, I&#8217;d always find myself fighting with assumptions I made too early in the process. It tends to go a little something like&nbsp;this:</p> <blockquote> <p>Ooh, a chunked format! I can do those! Wait, they put the size of the chunk <em>before</em> the type indicator? That means I can&#8217;t reuse the code I put together for <a href="http://en.wikipedia.org/wiki/Interchange_File_Format"><span class="caps">IFF</span></a> files, so I guess I&#8217;ll now have to define what it means to be a chunk, so I can rearrange it if&nbsp;necessary&#8230;</p> <p>And there&#8217;s a <a href="http://en.wikipedia.org/wiki/Cyclic_redundancy_check"><span class="caps">CRC</span></a> value after the payload? I can easily reference one field within another, but this <span class="caps">CRC</span> includes the chunk type indicator as well, so I&#8217;ll need a way to specify the start and end values, as well as a way to get the raw data back out of the file again, so the <span class="caps">CRC</span> value can be&nbsp;verified&#8230;</p> <p>And ugh, the size includes the indicator value and the <span class="caps">CRC</span> value, instead of just the payload? 
Now I&#8217;ll need to be able to specify the payload size as an expression, so I can subtract 8 from whatever value was read from the file, before reading that data&nbsp;in&#8230;</p> </blockquote> <p>And all that came up by just trying to implement a few common image formats after designing the framework to work with some less common music formats. I kept having to do so much <em>re</em>designing that I&#8217;d end up throwing the whole thing away and starting over! Needless to say, I was getting nowhere fast, and I needed a new&nbsp;approach.</p> <p>What I&#8217;m doing instead is gathering a laundry list of formats I want to implement (I&#8217;ve got about a hundred in mind so far), and rather than trying to wedge in support for each of their edge cases <em>after</em> building the framework, I&#8217;m going to implement them first, before writing a single line of framework-level&nbsp;code.</p> <p>Basically, I&#8217;m <em>just</em> designing the framework, rather than trying to implement it at the same time. This way, if I need to make a change to the design, I&#8217;m not held back by implementation details based on assumptions I should never have made. I can&#8217;t hope to plan for every single use case, but with so many formats to work through, I&#8217;m likely to catch most of the oddities and design a framework that&#8217;s flexible enough to accommodate most of the rest after the&nbsp;fact.</p> <p>As much as I love programming, I&#8217;m finding this new process to be <em>very</em> fun. My only real restriction is to be consistent across different formats. If I used something called <code>PositiveInteger</code> there, I should use it here as well. Make sure the arguments and other semantics all match up, and I&#8217;m on my way. 
I still have to go back through existing formats and make adjustments if I need to modify the semantics of something existing, but I&#8217;d have to do that anyway, and this way I&#8217;m <em>only</em> adjusting the formats, rather than the framework as&nbsp;well.</p> <p>I should point out that when I spoke to <a href="http://twitter.com/asenchi">a friend</a> about this recently over sushi, he referred to it as <a href="http://tom.preston-werner.com/2010/08/23/readme-driven-development.html">Readme Driven Development</a>, but I&#8217;m not so sure it applies so cleanly. It does seem to provide many of the same benefits he lists, but I&#8217;m not really documenting anything. Someone reading over these format implementations probably wouldn&#8217;t know how to actually use the framework yet, because I&#8217;m only writing code to use it, not words. Plus, I&#8217;m not limiting myself to a single file, because there are just too many cases for that. I&#8217;m probably using the same approach as it applies to my problem domain, so I don&#8217;t mind the comparison&nbsp;anyway.</p> <p>Mostly, though, I don&#8217;t like to think of this step as development, or even driving development. Instead, this is all about design. Development does include design, but it also includes all the more mundane, real-world concerns like performance and security. I&#8217;m not worrying about those at the moment, because the design is so important to this project. When I do worry about that stuff later, it might require some more changes, but those wouldn&#8217;t be specific to any format, so they should be easier to make as a whole. Besides, I&#8217;ll have hundreds of fully automated tests based on a few example files for each of the formats I&#8217;ve implemented by then, to make sure I don&#8217;t accidentally break&nbsp;anything.</p> <p>So, I guess it&#8217;s usage driven design, which later becomes test driven development. 
I&#8217;m not much for trying to define stuff like this, though. I&#8217;m just trying to do what I think will help me get this thing working. It&#8217;s still a little too early to say how well it&#8217;ll work overall, but my experience so far has been quite positive. The flexibility has been incredibly useful, and I&#8217;m finally able to incorporate features that would&#8217;ve previously taken me&nbsp;ages.</p> <p>If you&#8217;re interested in my progress so far, feel free to look over the examples <a href="http://github.com/gulopine/biwako">at Github</a>. I plan to introduce a few new formats every week, depending on how much time I have to hack on them, and how complicated they are. I&#8217;m willing to take suggestions on future formats, but I&#8217;d like to at least organize and post my list first, so I don&#8217;t get a hundred requests for things I&#8217;m already&nbsp;planning.</p>
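<p>To make the chunk quirks from earlier a bit more concrete, here&#8217;s a rough sketch in plain Python of what reading such a chunk might look like by hand. None of this is Biwako code: <code>read_chunk()</code> and <code>write_chunk()</code> are hypothetical helpers, and the 4-byte big-endian integers and the use of <span class="caps">CRC</span>-32 are assumptions made just for illustration, not details of any particular&nbsp;format.</p>

<pre><code class="python">import io
import zlib


def read_chunk(stream):
    """Read one chunk laid out as: size, type, payload, CRC."""
    # Assumption: sizes and CRCs are 4-byte big-endian unsigned integers.
    size = int.from_bytes(stream.read(4), "big")
    chunk_type = stream.read(4)
    # The stored size counts the 4-byte type and the 4-byte CRC too,
    # so subtract 8 to get the length of the payload alone.
    payload = stream.read(size - 8)
    stored_crc = int.from_bytes(stream.read(4), "big")
    # Assumption: the checksum is CRC-32 over the type plus the payload.
    if zlib.crc32(chunk_type + payload) != stored_crc:
        raise ValueError("CRC mismatch in chunk %r" % chunk_type)
    return chunk_type, payload


def write_chunk(chunk_type, payload):
    """Build the bytes for one chunk in the same hypothetical layout."""
    body = chunk_type + payload
    size = len(body) + 4  # type + payload + CRC
    crc = zlib.crc32(body)
    return size.to_bytes(4, "big") + body + crc.to_bytes(4, "big")


data = write_chunk(b"DEMO", b"hello")
chunk_type, payload = read_chunk(io.BytesIO(data))
</code></pre>

<p>Every one of those hard-coded choices (the order of the size and type, what the size counts, which bytes the <span class="caps">CRC</span> covers) is exactly the kind of assumption that varies from one format to the next, which is why I want to see them all in use before committing to a&nbsp;design.</p>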