Archive for September, 2006

NNTP, Core Data, and Wombats

For my new programming side project I’ve started writing an NNTP server. Like many young children, I once looked up into the night sky and wondered, “what would it be like to write my own news server?” I’ve read the RFC before, but never got around to actually implementing one, mainly because writing my own database never really appealed to me. I’m just crazy like that.

That’s where Core Data comes in. It’s an Apple data modeling technology that wraps SQLite. I’ve been looking for an excuse to learn this very cool technology, and this seemed as good an excuse as any. I have to say, Core Data is very easy to use. The only problem I had was that I kept wanting to design database tables instead of designing an object model. (Where are my primary and foreign keys??) Way too much MySQL beforehand.

I’ve decided to code name this ill-advised project “Wombat,” for several reasons. First, Wombat is mentioned a few times in the examples in RFC 977. Second, it’s a really dumb name, and what other kind of a name would you give to an NNTP server that only runs on Mac OS X 10.4 and higher? Plus, wombats look ornery, and if this server is anything, it’s ornery.

I’ve actually got RFC 977, which is supposed to be the NNTP standard, implemented now, along with a couple of extensions. After I got most of the commands implemented, I decided to try my server with some Mac OS X native newsreaders, just to see them interact.

Hahahaha!

That’s the sound of all the Mac OS X newsreaders not knowing what the heck to do with a news server that implements the NNTP standard. Or perhaps, more accurately, news servers that only implement the standard.

I tried several NNTP readers: Panic’s Unison, Mozilla’s Thunderbird, the venerable MT-NewsWatcher, and some lesser known ones like Newsflash, OSXnews, MaxNews, and Xnntp. Most of them could get the list of groups and how many articles there were, but that was it.

You see, there’s another RFC, called “Common NNTP Extensions” (RFC 2980), that describes many ad-hoc extensions that servers started implementing that are not in the standard (RFC 977). Well, it turns out every reader I could find requires at least some of these extensions to be implemented. Namely, most readers want XOVER, LIST OVERVIEW.FMT, and XHDR implemented. Basically, those commands help the reader retrieve header information (subject, from, etc.) en masse, and far more efficiently than the standard allows.
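To make the appeal of XOVER concrete, here is a minimal Python sketch of the single line a server might send back per article. The field order (article number, subject, from, date, message-id, references, byte count, line count, separated by tabs) is the common overview layout from the extensions RFC. The function name `overview_line` and the sample values are made up for illustration and are not taken from The Wombat.

```python
def overview_line(number, headers, byte_count, line_count):
    """Build one tab-separated overview record for a single article."""
    fields = [
        str(number),
        headers.get("Subject", ""),
        headers.get("From", ""),
        headers.get("Date", ""),
        headers.get("Message-ID", ""),
        headers.get("References", ""),
        str(byte_count),
        str(line_count),
    ]
    # A tab or line break inside a header value would corrupt the record,
    # so replace each one with a space.
    cleaned = [
        field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
        for field in fields
    ]
    return "\t".join(cleaned)


# Hypothetical article used only to show the shape of the output.
sample_headers = {
    "Subject": "Wombats\tare ornery",
    "From": "someone@example.com",
    "Date": "Sat, 02 Sep 2006 10:00:00 GMT",
    "Message-ID": "<1@example.com>",
}
sample_line = overview_line(1, sample_headers, 512, 12)
```

One line like this per article replaces a separate HEAD round trip per article, which is why readers lean on it so heavily.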

I guess what was more shocking to me was that none of the clients fell back to the standard commands when The Wombat started kicking back “500” codes (command not recognized). The readers either treated my response like a bug in the server (that’s what they told the user) or just assumed The Wombat returned an empty data set. I’m guessing that the common extensions have been around for long enough (documented in 2000) and widely implemented enough, that all readers simply assume they’re going to be there.
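The fallback I was hoping for is not complicated. Here is a rough Python sketch of a reader that tries XOVER first and drops back to one standard HEAD command per article when the server answers with anything other than 224. Everything here is hypothetical: `send` stands in for whatever function the reader uses to issue a command and get back a status code plus response lines, and `standard_only_server` is a toy stand-in for a server that only knows RFC 977.

```python
def fetch_subjects(send, first, last):
    """Return {article number: subject}, preferring XOVER when available."""
    code, lines = send(f"XOVER {first}-{last}")
    subjects = {}
    if code == 224:  # overview information follows
        for line in lines:
            fields = line.split("\t")
            subjects[int(fields[0])] = fields[1]
        return subjects
    # XOVER was refused, so fall back to the standard HEAD command.
    for number in range(first, last + 1):
        code, lines = send(f"HEAD {number}")
        if code != 221:  # article missing or unavailable, skip it
            continue
        for line in lines:
            # Folded (multi-line) headers are ignored in this sketch.
            name, _, value = line.partition(":")
            if name.lower() == "subject":
                subjects[number] = value.strip()
                break
    return subjects


def standard_only_server(command):
    """Toy server: answers HEAD, rejects everything else with 500."""
    articles = {
        1: ["From: a@example.com", "Subject: Hello"],
        2: ["From: b@example.com", "Subject: Re: Hello"],
    }
    verb, _, argument = command.partition(" ")
    if verb == "HEAD":
        if argument.isdigit() and int(argument) in articles:
            return 221, articles[int(argument)]
        return 423, []  # no such article number in this group
    return 500, []  # command not recognized


found = fetch_subjects(standard_only_server, 1, 3)
```

It is slower than XOVER, since it costs one round trip per article, but it works against a server that implements only the standard.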

There were also problems with the couple of extension commands I did implement. Namely, I wanted a way to require authorization, so I implemented the AUTHINFO command, as described in the “Common NNTP Extensions” RFC. Apparently I’m the only one reading from that document.

The way AUTHINFO works is the client can issue commands like it normally would, but at any time the server can kick back a response that essentially says “Sorry, you have to authenticate before you do that.” At that point the client gives a username and password, the server authenticates it, and the client goes on its merry way. That’s the way it’s supposed to work anyway.

Unison, on the other hand, wants to start shoving AUTHINFO commands at the server at random intervals. This is contrary to the RFC that says the client should never initiate authentication, but only provide it when requested. The RFC also says that if the client offers AUTHINFO commands when not requested, the server is supposed to reject them. Foolishly believing the spec, that’s how I implemented The Wombat. Well, if you ever turn Unison down, you hurt its little feelings and it starts talking about you behind your back to the user. Stuff like: “the server doesn’t like you anymore. It rejected your username and password.” Unfortunately, the RFC doesn’t have a response for “you dumb client, you’re already logged in as that user!” Of course, I ended up modifying The Wombat to accommodate Unison’s pushy style of authentication.
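The two behaviors can be sketched as a small state machine in Python. This is not The Wombat’s actual code: the class name `AuthSession`, the `gate` method, the plain-text `accounts` table, and the `strict` flag are all invented for illustration. The response codes (480, 381, 281, 482, 501) come from the RFCs, but answering an unsolicited AUTHINFO with 482 is my assumption about what “reject” looks like on the wire.

```python
class AuthSession:
    """Decide, per connection, whether a command may proceed."""

    def __init__(self, accounts, strict=True):
        self.accounts = accounts  # hypothetical {username: password} table
        self.strict = strict      # True: refuse AUTHINFO nobody asked for
        self.authenticated = False
        self.challenged = False   # True once the server has sent a 480
        self.pending_user = None

    def gate(self, command):
        """Return a response line, or None to let the command run."""
        parts = command.split(" ", 2)
        if parts[0].upper() == "AUTHINFO":
            if len(parts) != 3:
                return "501 Command syntax error"
            return self._authinfo(parts[1].upper(), parts[2])
        if not self.authenticated:
            self.challenged = True
            return "480 Authentication required"
        return None

    def _authinfo(self, kind, value):
        if self.strict and not self.challenged:
            # Assumption: unsolicited credentials get a plain rejection.
            return "482 Authentication rejected"
        if kind == "USER":
            self.pending_user = value
            return "381 More authentication information required"
        if kind == "PASS" and self.pending_user is not None:
            user, self.pending_user = self.pending_user, None
            if self.accounts.get(user) == value:
                self.authenticated = True
                self.challenged = False  # the challenge has been answered
                return "281 Authentication accepted"
        return "482 Authentication rejected"


by_the_book = AuthSession({"wombat": "secret"}, strict=True)
pushy_friendly = AuthSession({"wombat": "secret"}, strict=False)
```

With `strict=True`, a client that volunteers AUTHINFO before being asked, or again after it has already logged in, gets turned down. With `strict=False`, the server simply goes along with it, which is roughly the accommodation a pushy client needs.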

So at the end of the day I have a fully implemented NNTP server, with respect to RFC 977, and no reader will work with it. I guess I have some work to do.

Respect for the testers

Sure, it’s a lot of fun to tease them and try to make their lives miserable, but really, if we didn’t have testers, who would we, as engineers, have to torment? The marketing people? Please, they’re not even self-aware enough to know that we’re doing it, and that’s no fun. Testers, on the other hand, are not only fun to use as scapegoats, but they also provide an important service for the product.

Namely one I never want to do.

Despite that, I found myself doing exactly that recently. My WordPress plugin is now code complete, but is in need of testing. I looked around the apartment for suitable candidates, but the lizards around here are so small they cannot even depress the keys on the keyboard despite jumping on them. That’s how this unmentionable task fell to me: I had to write and execute test plans.

GAAAHHHH!!!!

Now writing test plans and such is something that I learned about in college. At the time I thought “Bah! This is nice for reference and all, but I’m never going to use this. I’m an engineer! I create the bugs, not find them!” Oh, how wrong I was. While in college, I also worked as an intern. Although I was supposed to be working on developing internal tools, I often got pulled into doing QA work. (Note for the inexperienced: the QA department is always understaffed. Hide behind the nearest potted plant if the QA manager ever comes within ten feet of your cubicle.) It was a never-ending battle: me trying to escape QA work, the QA manager pulling me back in, and the other engineers laughing at me the entire time.

Testing and quality assurance work is never fun. When writing a test plan, you have to think of all the possible ways that a feature can break, and make sure all the different angles are covered. But that’s balanced by the fact that you can’t test everything so you have to be smart about what you test. That way you get the maximum possible coverage for the least amount of work. After you write the mind-numbingly boring test plan, some unlucky bloke has to run it. The experience is much like putting a portable drill to your temple and pressing really hard.

I’ve actually managed to get the test plans for my plugin written now. I found that writing them myself was a good exercise. I had to change my attitude from “how do I make this work?” to “how do I crush this pathetic excuse for software, and send the developer running home to his mommy?” I found several bugs just by thinking through how to test the different features. I also found that there were features that weren’t as usable as they should have been, since I hadn’t been looking at them from the point of view of the user, but from that of an engineer. All of this, and I hadn’t even run the test plan. Good stuff.

I’m not looking forward to running my test plans. I have to run them at least three times: once on Safari, once on Firefox, and once on my arch-nemesis, Internet Explorer. May God have mercy on me.

I say all of this to show that I respect the testers and quality assurance people out there. Sure, I go through this each time I have to do some sort of testing myself, or a tester finds a bug that I wouldn’t have caught myself, but it bears repeating. Testers are there to make the engineers look good. Unless the tester wants your parking spot. Then they’re probably trying to get you fired so they can have a shorter walk to the building.

It's not about code reuse, it's about maintenance

Uli Kusterer has an article on his blog about what he calls “Cocoa ground rules.”

“Since I’ve seen many people violate two ground rules of object-oriented programming (OOP), I thought I’d list them here, in the hopes it’ll help some beginners not fight the frameworks but rather go with the flow in Cocoa:”

Uli then lists encapsulation and designing for code reuse as the ground rules. Now Uli is giving some good advice, but I’d like to nitpick him a bit here for my own personal agenda… I mean, for the good of the people. Yeah, that’s it, the good of the people.

I propose that designing for code reuse, in most cases, is not the best thing to do. Allow me to explain.

Think about the times that you actually reuse code, in real, live applications. It’s usually data structures, generic algorithms (like sorts and searches), and standard controls. Now think about where you find such code. That’s right, the basic data structures, algorithms, and UI widgets that you reuse in different applications are provided by the operating system or language for you. You don’t need to implement them, and, in general, shouldn’t try to. Apple is smart and realizes that applications get developed faster when developers don’t have to implement basic data structures, etc., that the operating system could provide for them.

Since the operating system and programming language provide most, if not all, reusable classes, what’s left in your application is the application-specific classes. By definition, application-specific code doesn’t typically get reused in other applications, unless you’re writing competition for yourself. For example, if you have a class that reads in an SVG document, then you are probably drooling on yourself from having to read that insane specification. Plus, it’s unlikely you’ll need that SVG reader class in another application unless it’s also an SVG editor.

Now some of you think you have just spotted a hole in my logic. What if you have an application that is a lightweight SVG viewer and then a full-featured vector editing application? Both are plausible products that would need a class to read SVG documents. But they will have very different uses of the SVG reader class. The lightweight viewer will want to quickly build up the graphics state and draw, while the editor will want to keep the SVG DOM around so it can be edited later. Not a problem, you can abstract the SVG reader class to simply use callbacks on the delegate and let the delegate build up a DOM or whatever it needs to do to be efficient. That fixes the problem.
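Sketched in Python, the callback idea looks something like the following. This is an illustration only: `SVGReader`, `ShapeCollector`, `TreeBuilder`, and the two callback names are hypothetical, the reader handles only element names and attributes, and the standard library’s XML parser is standing in for a real SVG parser.

```python
import io
import xml.etree.ElementTree as ET


class SVGReader:
    """Walks the document and reports each element to a delegate."""

    def __init__(self, delegate):
        self.delegate = delegate

    def read(self, svg_text):
        stream = io.BytesIO(svg_text.encode("utf-8"))
        for event, element in ET.iterparse(stream, events=("start", "end")):
            # Drop the "{namespace}" prefix ElementTree puts on tag names.
            name = element.tag.rsplit("}", 1)[-1]
            if event == "start":
                self.delegate.element_started(name, dict(element.attrib))
            else:
                self.delegate.element_ended(name)


class ShapeCollector:
    """Stands in for the lightweight viewer: note each shape, keep nothing else."""

    def __init__(self):
        self.shapes = []

    def element_started(self, name, attributes):
        if name in ("rect", "circle"):
            self.shapes.append(name)

    def element_ended(self, name):
        pass


class TreeBuilder:
    """Stands in for the editor: keep a simple tree around for later edits."""

    def __init__(self):
        self.root = None
        self.stack = []

    def element_started(self, name, attributes):
        node = {"name": name, "attributes": attributes, "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def element_ended(self, name):
        self.stack.pop()


SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><rect width="10" height="5"/></g><circle r="3"/></svg>'
)
viewer = ShapeCollector()
SVGReader(viewer).read(SAMPLE)
editor = TreeBuilder()
SVGReader(editor).read(SAMPLE)
```

Two clients, one reader, and each delegate builds only what it needs.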

Right?

Except that the whole “abstracting” part is real hard. Very hard, especially if you haven’t implemented all the possible users of the class yet. As any API designer will tell you, it’s difficult to anticipate how a class is going to be used. That can lead to one of two situations: a class that isn’t abstracted enough to be used by multiple client classes, or a class that’s over-engineered and very hard to use. Neither is ideal, but I will argue that being over-engineered is worse.

Over-engineered code is often bug riddled because of its complexity. It also runs the risk of having untested code in it, where the implementor said, “I don’t need this method now, but someone might need it later.” This introduces code that may or may not work, but creates the illusion, by being in the codebase, of being tested. That’s very bad. Over-engineered code, as a result of being complex, also has a higher rate of being abandoned by potential users. If another programmer can’t figure out a class, they’re much more likely to create a new class to do what they want. On the other hand, if they see a simpler class that they understand, but that doesn’t quite do what they need, they are more likely to enhance it to meet the original need and theirs as well.

I’m not saying code reuse is bad; I’m saying code reuse as a goal can be bad. In place of code reuse, I propose a new goal (or old, depending on how long you’ve been around).

As I so convincingly showed above, achieving code reuse is quite often not an attainable or realistic goal, because it so rarely happens. But what does happen a lot, and is in fact inevitable, is code maintenance. No matter what you write, you will have to maintain it at some point, unless you write really crappy code that no one in their right mind wants to use. There are several reasons why maintenance is more important than reuse:

  • Money. More money is generated from product upgrades (maintenance) than the initial release.
  • Time. Maintenance is where the product spends most of its life cycle.
  • Code reuse itself. As I said above, most code reuse is achieved by modifying simple classes to meet the needs of additional clients.

Now that everyone knows that code maintenance is where developers spend their time, and make their money, we should optimize for it. How do we do that? Well, we need some new ground rules for that.

The ground rules of any programming, whether object-oriented or otherwise, lie in two terms: cohesion and coupling. Hopefully you remember these terms from your computer science classes, but in case you were getting a bikini wax that day, I’ll define them.

Cohesion describes the internal relationship of a module, or in OOP terms, a class. High cohesion means that everything in the class is tightly related (e.g. the x and y member variables in a point class), whereas low cohesion means that everything inside is only loosely related, if at all (e.g. the Windows Registry).

Coupling describes the external relationships of a module or class. High coupling means that a class is very dependent on other classes, such as deriving from them or declaring them as friends. Coupling can also be the use of other classes or aggregating instances of other classes. Low coupling means that a class or module pretty much stands on its own, i.e., it doesn’t need other classes to compile or run.

Ideally, you want high cohesion and low coupling.

High cohesion and low coupling improve code maintenance in some simple ways. High cohesion means you don’t have to go searching everywhere in the codebase to find everything about, say, vectors. It’s all in one place, which means the code is easier to understand, and also to modify. Low coupling also means that code is easier to understand, because you don’t have to understand how several classes work (such as a class and its parents and friends), but just a minimal set. More importantly, low coupling means the code is easier to modify. Since code isn’t tightly coupled, you can modify code in one class and not have to worry about it affecting a class it’s not coupled to. This is extremely important in large software systems, like when you’re trying to figure out why moving a button in a dialog changes how your preferences are written out.
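Here is a small Python sketch of both ideas, using the examples from above. All of the names (`Point`, `Dialog`, `serialize_preferences`) are invented for illustration. `Point` is cohesive because everything in it is about a pair of coordinates. The preferences function is loosely coupled because it takes a plain dictionary and knows nothing about dialogs, so moving a button cannot change what gets written out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """High cohesion: x, y, and the operations on them live together."""
    x: float
    y: float

    def translated(self, dx, dy):
        return Point(self.x + dx, self.y + dy)


class Dialog:
    """Owns its layout and nothing else."""

    def __init__(self):
        self.button_position = Point(10, 10)

    def move_button(self, dx, dy):
        self.button_position = self.button_position.translated(dx, dy)


def serialize_preferences(preferences):
    """Low coupling: depends only on the dictionary passed in."""
    return "\n".join(
        f"{key}={preferences[key]}" for key in sorted(preferences)
    )


preferences = {"theme": "dark", "font_size": 12}
before = serialize_preferences(preferences)

dialog = Dialog()
dialog.move_button(5, -3)

after = serialize_preferences(preferences)
```

Because nothing connects `Dialog` to `serialize_preferences`, `before` and `after` are identical by construction, and nobody has to go hunting for a hidden dependency.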

“Fine”, you say, “I want high cohesion and low coupling, but it sounds like proper encapsulation will get me that, which in turn will lead to code reuse, easier maintenance, and incredible abs. How is that any different from what Uli said?”

The goal is different. Your goal is not code reuse, although it might be a welcome side effect. Your goal is easy code maintenance. Focusing on how to make things easier to maintain, as opposed to reusable, changes how you code your classes. Instead of spending time trying to guess all the possible uses of a class and making it as generic as possible, you code it to satisfy the needs of its current clients efficiently. As different clients require its use, you modify the class to accommodate them.

In all fairness, reusability for Uli is really important. But it’s not because of programming ground rules; it’s because of product requirements. Uli has lots of sample code, and the primary feature of sample code is reuse. That said, I would still argue that it is more important that the sample code be maintainable than outright reusable by everyone. It’s impossible to guess all the possible ways a Cocoa view could be used, so it’s more pressing that it can be understood and modified by the user to do what’s needed.

In computer science you should always optimize for the common case. Which is code maintenance, not code reuse.

(P.S. My apologies to Uli if I have misrepresented anything you said. Your comment on code reuse got me thinking about the people who rant and rave about code reuse as if it were the Holy Grail.)