Archive for September, 2006

Core Data fetch queries

I’m currently trying to implement the wildmat matching algorithm as described in RFC 2980 for my NNTP server, Wombat. Since I’m implementing the article database using Core Data, I’d like to use as much of the built-in functionality as possible.

Wildmat isn’t regular expressions, but supports five allegedly different operations. It’s really just three (they count character sets as three):

  1. * – matches any character zero or more times.
  2. ? – matches any one, and only one, character.
  3. [] – character sets. This includes using ^ to specify negative character sets and using \ to escape special characters in the set.
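For reference, here is a minimal sketch of a standalone wildmat matcher in plain C (which also compiles inside an Objective-C project), in case the character sets have to be handled outside Core Data. The function names are made up for illustration. It handles the three operations listed above plus a-z style ranges inside sets, treats an unterminated set as a non-match, and does not handle ] as the first member of a set. It backtracks recursively, so it is only meant for short patterns such as group names.

```c
#include <stdbool.h>

/* Tests c against the set whose body starts at p (just past '[').
 * On success stores the position after the closing ']' in *end.
 * Returns 1 for a match, 0 for no match, -1 if the set is unterminated. */
static int match_set(const char *p, char c, const char **end)
{
    bool negate = false, found = false;
    unsigned char ch = (unsigned char)c;

    if (*p == '^') { negate = true; p++; }
    while (*p != '\0' && *p != ']') {
        unsigned char lo, hi;
        if (*p == '\\' && p[1] != '\0') p++;       /* escaped char is literal */
        lo = hi = (unsigned char)*p++;
        if (*p == '-' && p[1] != '\0' && p[1] != ']') {  /* range lo-hi */
            p++;
            if (*p == '\\' && p[1] != '\0') p++;
            hi = (unsigned char)*p++;
        }
        if (lo <= ch && ch <= hi) found = true;
    }
    if (*p != ']') return -1;
    *end = p + 1;
    return (found != negate) ? 1 : 0;
}

/* Returns true when the whole of text matches the wildmat pattern pat. */
bool wildmat_match(const char *pat, const char *text)
{
    while (*pat != '\0') {
        if (*pat == '*') {
            /* Try every possible length for the run that '*' absorbs. */
            do {
                if (wildmat_match(pat + 1, text)) return true;
            } while (*text++ != '\0');
            return false;
        }
        if (*text == '\0') return false;
        if (*pat == '?') {
            pat++;                                  /* any single character */
        } else if (*pat == '[') {
            const char *end;
            if (match_set(pat + 1, *text, &end) != 1) return false;
            pat = end;
        } else {
            if (*pat == '\\' && pat[1] != '\0') pat++;  /* escaped literal */
            if (*pat != *text) return false;
            pat++;
        }
        text++;
    }
    return *text == '\0';
}
```

With this sketch, wildmat_match("foo*bar??", "foobar12") is true and wildmat_match("a[b-d]e", "aee") is false.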

I’ve actually figured out that Core Data supports the first two right out of the box. That’s right, if you use the LIKE operator to match strings, you can use * and ? in the string and Core Data will do the magic for you:


NSString* wildmat = @"foo*bar??";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"name LIKE %@", wildmat];

In the above example Core Data would match any of the following names:

foobar12
fooSTUFFbarNO
foojunk12barG1

I’m trying to figure out if Core Data supports character sets in any way. It’d be a lot easier, and more efficient, if I can get Core Data to do all the heavy lifting for me. I’m also looking for any mailing list or discussion group that covers Core Data. I checked Apple’s mailing lists, but I couldn’t find anything.

Wombat NNTP progress

I’ve been making some progress on Wombat, my NNTP server. I got XHDR, LIST OVERVIEW.FMT, and XOVER all implemented, so all the NNTP clients that I have work with it. The interesting thing is XHDR and XOVER do basically the same thing, except that XOVER is much more efficient. Instead of sending one header at a time for a range of articles, XOVER sends all the standard headers for a range of articles all at once. I can see why NNTP clients want it implemented, but it also makes me wonder why newer NNTP clients would bother with XHDR (MaxNews and some others do).
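To make the XOVER side concrete, here is a rough sketch in plain C of formatting one overview line. As I read RFC 2980, each line is the article number followed by subject, author, date, message-id, references, byte count, and line count, separated by tabs, with an empty field where there is no data. The struct and function names are hypothetical, not Wombat's actual code.

```c
#include <stdio.h>

/* Hypothetical overview record; the field names are illustrative only. */
struct overview {
    long number;              /* article number within the group */
    const char *subject;
    const char *from;
    const char *date;
    const char *message_id;
    const char *references;   /* empty string when the header is absent */
    long long bytes;
    long lines;
};

/* Writes one tab-separated XOVER line (without the trailing CRLF) into buf.
 * Returns snprintf's result: the length the full line needs, or a negative
 * value on error. A result >= size means the line was truncated. */
int format_xover_line(char *buf, size_t size, const struct overview *ov)
{
    return snprintf(buf, size, "%ld\t%s\t%s\t%s\t%s\t%s\t%lld\t%ld",
                    ov->number, ov->subject, ov->from, ov->date,
                    ov->message_id, ov->references, ov->bytes, ov->lines);
}
```

A client that uses XHDR would need one request per header to collect the same information for a range of articles, which is where the efficiency difference comes from.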

I also implemented some extended headers like Xref, Lines, and Bytes, since every client I tested with wanted them. Both Lines and Bytes are also required to be sent with the XOVER headers (XOVER has required and optional headers). Now, Xref and Lines were fairly easy to implement. Xref just lists each group on the server that the article is in, along with its article number in that group. Lines is simply the number of lines in the body of the article. But Bytes, oy.

RFC 2980 says XOVER requires Bytes, but nowhere does it say what the value is. Only by interrogating an existing news server was I able to guess that it’s most likely the total number of bytes of the article. But I’m not sure; I could very easily be wrong. I googled for it several times and came up with nothing. The other thing about Bytes is that it isn’t really stored in the headers, since that would alter the size of the article. So it’s kept outside the article. Me, I just added another long long field to the database and jammed it in there.
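A small sketch of computing both values in plain C, under the guess above that Bytes is the total size of the article as received (headers, blank line, and body) and that Lines counts only the body. It assumes lines end in CRLF and that the headers end at the first blank line. The names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct article_size {
    long long bytes;   /* assumed: total size of the raw article */
    long lines;        /* number of lines in the body only */
};

/* Measures a raw article of len bytes. The length is passed explicitly
 * because the body is treated as arbitrary bytes, not a C string. */
struct article_size measure_article(const char *data, size_t len)
{
    struct article_size r;
    size_t i, body = len;   /* body offset; stays at len if there is no body */

    r.bytes = (long long)len;
    r.lines = 0;

    /* The headers end at the first blank line, i.e. CRLF CRLF. */
    for (i = 0; i + 3 < len; i++) {
        if (memcmp(data + i, "\r\n\r\n", 4) == 0) {
            body = i + 4;
            break;
        }
    }
    for (i = body; i < len; i++) {
        if (data[i] == '\n') r.lines++;
    }
    /* Count a final body line that lacks a line terminator. */
    if (len > body && data[len - 1] != '\n') r.lines++;
    return r;
}
```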

In related news, I’ve found that only Unison, NewsWatcher, and Thunderbird are really worth testing against. The others that I mentioned last time are so flaky I have to track down whether it’s a bug in my server or a bug in their client. Of the good ones, NewsWatcher is actually my favorite despite its age. It seems to have the most robust and flexible NNTP client implementation. It’s about the only news client that notices when groups are added or removed on the server, and it lets me select certain useful options (e.g. whether it uses XHDR or XOVER to get header information). Very useful for testing.

Unison makes it easy to upload files (it automatically segments them, encodes them, etc.), so I played around with that for a while. I found some bugs in Wombat that way. Namely, I was treating all text as UTF-8 encoded when it wasn’t. That meant yEnc-encoded files got corrupted and couldn’t be properly rebuilt. I found out that using ISO Latin 1 for the encoding resulted in non-corrupted files, but that’s still a big assumption to make. In the end, I just treated the headers as text and the body as a big bag of bits.

I’m still using Core Data as the back end, although a little less so now. The other thing I found out from uploading files from Unison was that large articles make the database huge, quick. Also, seeing that I was treating most of the article as a binary blob, keeping it in the database wasn’t so useful. So I modified Wombat to keep the headers in the database, but write the article to an external file. I kept what I think is a traditional directory layout for articles. An article posted as the 4th article to com.orderndev.general gets the path com/orderndev/general/4.txt. Unlike traditional systems though, I don’t hard link to it from the other group directories it was cross-posted to. I just put the relative path to the article file in the database.
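The path mapping described above can be sketched in a few lines of plain C. The function name is hypothetical, and the sketch assumes the group name has already been validated; a real server would want to reject names containing a slash or empty components before using them as a path.

```c
#include <stdio.h>

/* Builds the relative path for article `number` in `group`, replacing each
 * '.' in the group name with '/', e.g. com.orderndev.general and 4 give
 * com/orderndev/general/4.txt. Returns 0 on success, or -1 if buf is too
 * small to hold the whole path. */
int article_path(char *buf, size_t size, const char *group, long number)
{
    size_t i;
    int n;

    for (i = 0; group[i] != '\0'; i++) {
        if (i + 1 >= size) return -1;          /* keep room for the NUL */
        buf[i] = (group[i] == '.') ? '/' : group[i];
    }
    n = snprintf(buf + i, size - i, "/%ld.txt", number);
    if (n < 0 || (size_t)n >= size - i) return -1;
    return 0;
}
```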

I’m struggling with using Core Data with threads. It has some lock and unlock methods on NSManagedObjectContext with documentation vaguely stating that you should use them, maybe, if you feel like it. Unfortunately, I occasionally get random crashes when I have multiple threads touching the object context, the in-memory part of the database. I have already put locks around the code any time I create an entity, modify an entity, or retrieve an entity. I haven’t put locks around accessing attributes of entities, although it looks like I will have to. I just wish there was some good documentation for this.

Meanwhile, I’m still progressing through RFC 2980 and RFC 1036 and getting the standard stuff implemented. Yukon, ho!

NNTP, Core Data, and Wombats

For my new programming side project I’ve started writing an NNTP server. Like many young children, I once looked up into the night sky and wondered, “what would it be like to write my own news server?” I’ve read the RFC before, but never got around to actually implementing one, mainly because writing my own database never really appealed to me. I’m just crazy like that.

That’s where Core Data comes in. It’s an Apple data modeling technology that wraps SQLite. I’ve been looking for an excuse to learn this very cool technology, and this seemed as good an excuse as any. I have to say, Core Data is very easy to use. The only problem I had was I kept wanting to design database tables instead of designing an object model. (Where are my primary and foreign keys??) Way too much MySQL beforehand.

I’ve decided to code name this ill-advised project “Wombat,” for several reasons. First, Wombat is mentioned a few times in the examples in RFC 977. Second, it’s a really dumb name, and what other kind of a name would you give to an NNTP server that only runs on Mac OS X 10.4 and higher? Plus Wombats look ornery, and if this server is anything, it’s ornery.

I’ve actually got RFC 977, which is supposed to be the NNTP standard, implemented now, along with a couple of extensions. After I got most of the commands implemented, I decided to try my server with some Mac OS X native newsreaders, just to see them interact.

Hahahaha!

That’s the sound of all the Mac OS X newsreaders not knowing what the heck to do with a news server that implements the NNTP standard. Or perhaps, more accurately, news servers that only implement the standard.

I tried several NNTP readers: Panic’s Unison, Mozilla’s Thunderbird, the venerable MT-NewsWatcher, and some lesser known ones like Newsflash, OSXnews, MaxNews, and Xnntp. Most of them could get the list of groups and how many articles there were, but that was it.

You see, there’s another RFC, called “Common NNTP Extensions” that describes many ad-hoc extensions that servers started implementing that are not in the standard (RFC 977). Well, it turns out every reader I could find requires at least some of these extensions to be implemented. Namely, most readers want XOVER, LIST OVERVIEW.FMT, and XHDR implemented. Basically, those commands help the reader retrieve header information (subject, from, etc) en masse, and far more efficiently than the standard allows.

I guess what was more shocking to me was that none of the clients fell back to the standard commands when The Wombat started kicking back “500” codes (command not recognized). The readers either treated my response like a bug in the server (that’s what they told the user) or just assumed The Wombat returned an empty data set. I’m guessing that the common extensions have been around for long enough (documented in 2000) and are widely implemented enough that all readers simply assume they’re going to be there.
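For illustration, this is roughly what a standard-only server does with an extension command, sketched in plain C. The command list is taken from RFC 977, whose text for the 500 response is "command not recognized". The function names are hypothetical, and real dispatch code would go on to call a handler rather than just return NULL.

```c
#include <ctype.h>
#include <stddef.h>

/* The commands defined by RFC 977 itself. */
static const char *const rfc977_commands[] = {
    "ARTICLE", "BODY", "GROUP", "HEAD", "HELP", "IHAVE", "LAST", "LIST",
    "NEWGROUPS", "NEWNEWS", "NEXT", "POST", "QUIT", "SLAVE", "STAT"
};

/* Case-insensitive test that the first word of line equals name
 * (name must be upper case). */
static int first_word_is(const char *line, const char *name)
{
    while (*name != '\0') {
        if (toupper((unsigned char)*line) != *name) return 0;
        line++;
        name++;
    }
    return *line == '\0' || *line == ' ' || *line == '\r' || *line == '\n';
}

/* Returns NULL when line starts with an RFC 977 command, otherwise the
 * reply a standard-only server would send. */
const char *reply_for_unknown(const char *line)
{
    size_t i;
    size_t count = sizeof rfc977_commands / sizeof rfc977_commands[0];

    for (i = 0; i < count; i++) {
        if (first_word_is(line, rfc977_commands[i])) return NULL;
    }
    return "500 command not recognized\r\n";
}
```

A reader that sends XOVER 1-10 to such a server gets the 500 line back and, ideally, would fall back to HEAD or ARTICLE for each article.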

There were also problems with the couple of extension commands I did implement. Namely I wanted a way to require authorization, so I implemented the AUTHINFO command, as described in the “Common NNTP Extensions” RFC. Apparently I’m the only one reading from that document.

The way AUTHINFO works is the client can issue commands like it normally would, but at any time the server can kick back a response that essentially says “Sorry, you have to authenticate before you do that.” At that point the client gives a username and password, the server authenticates it, and the client goes on its merry way. That’s the way it’s supposed to work anyway.

Unison, on the other hand, wants to start shoving AUTHINFO commands at the server at random intervals. This is contrary to the RFC, which says the client should never initiate authentication, but only provide it when requested. The RFC also says that if the client offers AUTHINFO commands when not requested, the server is supposed to reject them. Foolishly believing the spec, that’s how I implemented The Wombat. Well, if you ever turn Unison down, you hurt its little feelings and it starts talking about you behind your back to the user. Stuff like: “the server doesn’t like you anymore. It rejected your username and password.” Unfortunately, the RFC doesn’t have a response for “you dumb client, you’re already logged in as that user!” Of course, I ended up modifying The Wombat to accommodate Unison’s pushy style of authentication.
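Both behaviors can be sketched as a small state machine in plain C. The response codes (480, 381, 281, 482) are the ones RFC 2980 lists for AUTHINFO USER/PASS as far as I can tell; using 482 for every rejection is an assumption of this sketch, as are all the names and the hard-coded credential check. The lenient flag stands in for the accommodation made for Unison: when it is set, AUTHINFO is processed whenever the client sends it.

```c
#include <stdbool.h>
#include <string.h>

enum auth_state { AUTH_NONE, AUTH_CHALLENGED, AUTH_WANT_PASS, AUTH_DONE };

struct session {
    enum auth_state state;
    bool lenient;        /* accept AUTHINFO even when it was not requested */
    char user[64];
};

/* Hypothetical credential check; a real server would consult its database. */
static bool check_password(const char *user, const char *pass)
{
    return strcmp(user, "demo") == 0 && strcmp(pass, "secret") == 0;
}

/* Call before running a protected command. Returns 0 to proceed, or 480
 * to tell the client it has to authenticate first. */
int require_auth(struct session *s)
{
    if (s->state == AUTH_DONE) return 0;
    if (s->state == AUTH_NONE) s->state = AUTH_CHALLENGED;
    return 480;
}

/* Handles AUTHINFO USER. Strict mode rejects it unless the server asked. */
int authinfo_user(struct session *s, const char *user)
{
    if (!s->lenient && (s->state == AUTH_NONE || s->state == AUTH_DONE))
        return 482;
    strncpy(s->user, user, sizeof s->user - 1);
    s->user[sizeof s->user - 1] = '\0';
    s->state = AUTH_WANT_PASS;
    return 381;                     /* password required */
}

/* Handles AUTHINFO PASS, which is only valid right after AUTHINFO USER. */
int authinfo_pass(struct session *s, const char *pass)
{
    if (s->state != AUTH_WANT_PASS) return 482;
    if (check_password(s->user, pass)) {
        s->state = AUTH_DONE;
        return 281;                 /* authentication accepted */
    }
    s->state = AUTH_CHALLENGED;     /* still unauthenticated; may retry */
    return 482;
}
```

In strict mode a volunteered AUTHINFO USER gets 482, which is the response that upset Unison. In lenient mode the same command gets 381 and the exchange carries on as if the server had asked.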

So at the end of the day I have a fully implemented NNTP server, with respect to RFC 977, and no reader will work with it. I guess I have some work to do.