Biwako: File Formats Made Easy

by Marty Alchin on January 20, 2011 about Biwako and Python

For years now, I’ve been researching various kinds of file formats, from music and images to video games and even NASCAR data streams. Each format is usually treated as unique, at least as far as parsing and saving implementations go, but the truth is that they have a lot in common. And any time you have a bunch of independent tasks that share similar aspects, you have an ideal environment for a framework that makes those common aspects easier to manage.

To that end, I’ve created Biwako. It’s still very early on in the process, but it covers some interesting features of Python that I really want to write about, so it’s useful to have some context. This is just a brief introduction to explain the motivations behind my use of some of the other topics I’ll be writing about soon.

Biwako is a declarative class framework, similar to Django’s models and forms. It allows you to define a file format using a class definition and a series of individual field definitions, which you can then use to create, parse, modify or save files in the binary format you’ve defined. You can use it to create your own custom file formats, but where it really shines is in helping you access data in formats specified by other standards or applications.
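To make the idea concrete, here is a minimal sketch of how a declarative class can drive binary parsing in plain Python 3. It is not Biwako’s implementation: Structure, Integer, the byteorder keyword and the Dimensions example are all hypothetical, and the sketch relies on __init_subclass__ and __set_name__ (Python 3.6+) plus the guaranteed definition order of class bodies to read fields in the order they are declared.

```python
import io


class Integer:
    """A hypothetical fixed-size unsigned integer field (not Biwako's API)."""

    def __init__(self, size):
        self.size = size

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created,
        # so each field learns the attribute name it was assigned to.
        self.name = name

    def read(self, stream, byteorder):
        data = stream.read(self.size)
        if len(data) < self.size:
            raise EOFError(f'not enough data for field {self.name!r}')
        return int.from_bytes(data, byteorder)


class Structure:
    """A hypothetical declarative base class for binary records."""

    def __init_subclass__(cls, byteorder='big', **kwargs):
        # Class keyword arguments (e.g. byteorder='little') arrive here.
        super().__init_subclass__(**kwargs)
        cls._byteorder = byteorder
        # Class bodies keep definition order (Python 3.6+), so fields are
        # read from the stream in the same order they were declared.
        cls._fields = [value for value in vars(cls).values()
                       if isinstance(value, Integer)]

    def __init__(self, stream):
        # Parse each declared field in turn and store it on the instance.
        for field in self._fields:
            setattr(self, field.name, field.read(stream, self._byteorder))


class Dimensions(Structure, byteorder='little'):
    width = Integer(size=2)
    height = Integer(size=2)


record = Dimensions(io.BytesIO(b'\x90\x01\x2c\x01'))
print(record.width, record.height)  # 400 300
```

Each field only knows how to read itself; the base class supplies the ordering and shared options such as byte order, which is what lets a format definition stay as short as a list of fields.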

Usage

For example, here’s a very simple Biwako class that can parse part of the GIF file format, giving you easy access to the width and height of any GIF image.

from biwako import bin

class GIF(bin.Structure, endianness=bin.LittleEndian, encoding='ascii'):
    tag = bin.FixedString('GIF')
    version = bin.String(size=3)
    width = bin.Integer(size=2)
    height = bin.Integer(size=2)

Now you have a class that can accept any GIF image as a file (or any file-like object that’s readable) and parse it into the attributes shown on this class.

>>> image = GIF(open('example.gif', 'rb'))
>>> image.width, image.height
(400, 300)
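For comparison, here is roughly the same work done by hand with the standard library’s struct module. It assumes the standard GIF header layout (a 3-byte "GIF" signature, a 3-byte version such as "89a", then the logical screen width and height as little-endian unsigned 16-bit integers); read_gif_size is a hypothetical helper name, and the sample header is fabricated.

```python
import io
import struct

# '<' = little-endian, no padding; '3s' = 3 raw bytes; 'H' = unsigned 16-bit int.
GIF_HEADER = struct.Struct('<3s3sHH')


def read_gif_size(stream):
    """Return (width, height) from the start of a GIF stream (hypothetical helper)."""
    data = stream.read(GIF_HEADER.size)  # 10 bytes
    if len(data) < GIF_HEADER.size:
        raise ValueError('file is too short to be a GIF')
    tag, _version, width, height = GIF_HEADER.unpack(data)
    if tag != b'GIF':
        raise ValueError(f'not a GIF file: {tag!r}')
    return width, height


# A fabricated 10-byte header for a 400x300 image.
header = b'GIF89a' + struct.pack('<HH', 400, 300)
print(read_gif_size(io.BytesIO(header)))  # (400, 300)
```

Every new format means another hand-maintained format string, length check and validation step like these, which is the kind of repetition a declarative class is meant to absorb.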

Of course, a full format definition would have many more fields available, but you get the idea. The repository currently has a few examples like this, but ultimately the goal is that you’ll be able to easily create your own classes using whatever documentation is available for the formats you’re interested in. Since most formats would be useful across multiple projects, I’m considering setting up some sort of formal site where you’ll be able to upload your own classes or find and download existing classes that were created by others. It’s still too early to get into that level of detail, though.

Python 3

Since this is a new framework with no existing users to support, I’ve decided to support Python 3 right out of the box and not even bother trying to maintain compatibility with previous versions. There are a number of advantages to this, not the least of which is an easy way to distinguish between bytes and strings. I’ll be covering more of Python 3’s benefits in future blog posts; there are some pretty great features to take advantage of.
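A small illustration of that distinction, using made-up data in an io.BytesIO object as a stand-in for a file opened in binary mode: Python 3 hands you bytes, which never compare equal to str, so decoding text fields has to be explicit. (In Python 2, str was itself a byte string, so the two were easy to mix up.)

```python
import io

stream = io.BytesIO(b'GIF89a')  # stands in for open('example.gif', 'rb')
tag = stream.read(3)

print(type(tag).__name__)   # bytes
print(tag == 'GIF')         # False: bytes and str never compare equal
print(tag == b'GIF')        # True
print(tag.decode('ascii'))  # GIF
```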

Of course, supporting only Python 3 means that there are currently some limits on which projects can use Biwako, because other libraries they depend on might not yet have a Python 3 version available. These situations should become less common as time goes on, and it’s not an issue for most command-line scripts where you just need to process some information in a bunch of files at once.

Under construction

Biwako is under heavy development right now. Most open source projects say this to some extent or another, but because it’s so young, I really mean it. I’m developing this “live” so every day I’m pushing new code to GitHub (that was my New Year’s resolution). That means that every day there’s either new code to play with, new documentation to read or new tests to run. But it also means that anything you wrote using yesterday’s code might break in surprising ways.

I’m doing my best to establish a stable API early on, but it’s still too early for me to consider any aspect of Biwako stable yet. Some days you might find things work a little differently than you expected, and other days you might find that I’ve pulled several rugs completely out from underneath you. You’re always free to play around with it, but if you pull new code one day and everything’s suddenly broken, please don’t come complain to me, at least not until I’ve marked a stable API.

I’ll do my best to document changes as they happen, but I fully expect the documentation to lag a bit behind the code. It’s not that I don’t like to write docs; it’s just that right now, getting the code working, stable and rich with features is a much higher priority. I’m having fun writing it, and sometimes I just want to have more fun with it before getting down to the real work of docs. If you’re interested, you can read the docs now and keep an eye on them in the future.

Future plans

I’ve got a lot in mind for this little framework. So far, I’ve implemented enough to parse a few simple formats, but I’ve got a list of nearly a hundred different formats to test it on, and they run a pretty wide gamut. For now, my focus is on binary file formats, but I’ve laid out the namespaces in a way that allows for future expansion into text-based formats as well. Perhaps someday Sheets will find a new home as part of Biwako, for example.

So keep checking back for new code, new documentation and new articles about the awesome features Python 3 has to offer.