Archive for September, 2006

NSPredicate and regular expressions

I’ve managed to figure out how to implement wildmat patterns inside of Wombat. It turns out NSPredicate does in fact support regular expressions, as documented in Using Predicates. You simply use the MATCHES operator to specify a regular expression, as shown in a couple of examples in Apple’s documentation. It’s implemented using ICU’s Regular Expressions package, whose documentation is much better than Apple’s.
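For illustration, here is a rough sketch of how a wildmat pattern might be translated into a regular expression for MATCHES. The helper names (AppendLiteral, RegexFromWildmat) are made up for this example and are not Wombat's actual code. It assumes the pattern is well formed, and it ignores edge cases such as an unterminated set or a ] placed first in a set.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: append one character as a regex literal,
// backslash-escaping it if it is regex punctuation.
static void AppendLiteral(NSMutableString *regex, unichar c)
{
    NSString *special = @"\\*?+[](){}^$|.-&";
    NSString *s = [NSString stringWithCharacters:&c length:1];
    if ([special rangeOfString:s].location != NSNotFound) {
        [regex appendString:@"\\"];
    }
    [regex appendString:s];
}

// Hypothetical helper: translate a wildmat pattern (*, ?, [set], \escape)
// into a regular expression. Assumes a well-formed pattern.
static NSString *RegexFromWildmat(NSString *wildmat)
{
    NSMutableString *regex = [NSMutableString string];
    NSUInteger length = [wildmat length];
    BOOL inSet = NO;

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [wildmat characterAtIndex:i];

        if (c == '\\' && i + 1 < length) {
            // Wildmat escape: the next character is always a literal.
            AppendLiteral(regex, [wildmat characterAtIndex:++i]);
        } else if (inSet) {
            if (c == ']') {
                [regex appendString:@"]"];
                inSet = NO;
            } else if (c == '-') {
                [regex appendString:@"-"];   // keep ranges such as a-z
            } else {
                AppendLiteral(regex, c);
            }
        } else if (c == '*') {
            [regex appendString:@".*"];      // any run of characters
        } else if (c == '?') {
            [regex appendString:@"."];       // exactly one character
        } else if (c == '[') {
            [regex appendString:@"["];
            inSet = YES;
            if (i + 1 < length && [wildmat characterAtIndex:i + 1] == '^') {
                [regex appendString:@"^"];   // negated set
                i++;
            }
        } else {
            AppendLiteral(regex, c);
        }
    }
    return regex;
}
```

With this sketch, the wildmat foo*bar?? becomes the regular expression foo.*bar.. and a literal dot in the pattern comes out escaped.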

The gotcha to note is that the MATCHES operator does not compile down to SQL, so you can’t give it directly to Core Data. Instead you have to pull out all the entities then post-process the array using NSArray’s filteredArrayUsingPredicate: method. It works, but it’s not as efficient as it would be if it were compiled down to SQL.
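A minimal sketch of that post-processing step, assuming the objects have already been fetched into an array. The function name FilterByRegex is hypothetical. It works on any key-value-coding-compliant objects, so managed objects and plain dictionaries both qualify.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keep only the objects whose value for `key`
// matches `regex`. MATCHES must match the entire string, not a substring.
static NSArray *FilterByRegex(NSArray *objects, NSString *key, NSString *regex)
{
    // %K substitutes a key path, %@ substitutes the pattern as a value.
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"%K MATCHES %@", key, regex];
    return [objects filteredArrayUsingPredicate:predicate];
}
```

Because the filtering happens in memory, every candidate object has to be fetched first, which is where the inefficiency comes from.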

In related news, Mike Zornek was kind enough to point out a small Core Data mailing list. Hopefully it will be a source of useful information in the future.

Core Data fetch queries

I’m currently trying to implement the wildmat matching algorithm as described in RFC 2980 for my NNTP server, Wombat. Since I’m implementing the article database using Core Data, I’d like to use as much of the built-in functionality as possible.

Wildmat isn’t regular expressions, but supports five allegedly different operations. It’s really just three (they count character sets as three):

  1. * – matches any character zero or more times.
  2. ? – matches any one, and only one, character.
  3. [] – character sets. This includes using ^ to specify negative character sets and using \ to escape special characters in the set.

I’ve actually figured out that Core Data supports the first two right out of the box. That’s right, if you use the LIKE operator to match strings, you can use * and ? in the string and Core Data will do the magic for you:


NSString* wildmat = @"foo*bar??";
NSPredicate* predicate = [NSPredicate predicateWithFormat:@"name LIKE %@", wildmat];

In the above example Core Data would match any of the following names:

foobar12
fooSTUFFbarNO
foojunk12barG1

I’m trying to figure out if Core Data supports character sets in any way. It’d be a lot easier, and more efficient, if I could get Core Data to do all the heavy lifting for me. I’m also looking for any mailing list or discussion group that covers Core Data. I checked Apple’s mailing lists, but I couldn’t find anything.

Wombat NNTP progress

I’ve been making some progress on Wombat, my NNTP server. I got XHDR, LIST OVERVIEW.FMT, and XOVER all implemented, so all the NNTP clients that I have work with it. The interesting thing is XHDR and XOVER do basically the same thing, except that XOVER is much more efficient. Instead of sending one header at a time for a range of articles, XOVER sends all the standard headers for a range of articles all at once. I can see why NNTP clients want it implemented, but it also makes me wonder why newer NNTP clients would bother with XHDR (MaxNews and some others do).
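To make the difference concrete, here is a sketch of how one XOVER response line could be assembled. It assumes the field order given in RFC 2980 (article number, then Subject, From, Date, Message-ID, References, Bytes, Lines, separated by tabs). The function name and the dictionary-of-headers input are hypothetical, and optional trailing fields such as Xref are left out.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: build one tab-separated overview line for an article.
// `headers` maps header names to string values; missing fields become empty.
static NSString *OverviewLine(unsigned long number, NSDictionary *headers)
{
    NSArray *fields = [NSArray arrayWithObjects:@"Subject", @"From", @"Date",
                       @"Message-ID", @"References", @"Bytes", @"Lines", nil];
    NSMutableArray *columns = [NSMutableArray arrayWithObject:
                               [NSString stringWithFormat:@"%lu", number]];

    for (NSString *field in fields) {
        NSString *value = [headers objectForKey:field];
        if (value == nil) {
            value = @"";
        }
        // A tab inside a value would shift every later column.
        value = [value stringByReplacingOccurrencesOfString:@"\t"
                                                 withString:@" "];
        [columns addObject:value];
    }
    return [columns componentsJoinedByString:@"\t"];
}
```

A client gets one such line per article in the requested range, whereas with XHDR it has to issue a separate command for each header it wants.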

I also implemented some extended headers like Xref, Lines, and Bytes, since every client I tested with wanted them. Also both Lines and Bytes are required to be sent with the XOVER headers (XOVER has required and optional headers). Now, Xref and Lines were fairly easy to implement. Xref just lists each group on the server that the article is in, along with its article number in that group. Lines is simply the number of lines in the body of the article. But Bytes, oy.
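As a sketch, an Xref value could be built like this. The function name and the parallel-array input are hypothetical. Leading the value with the server's host name follows my reading of RFC 1036, so treat that part as an assumption.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: build the value of an Xref header.
// `groups` holds group names and `numbers` holds the matching article
// numbers (NSNumber), in the same order.
static NSString *XrefValue(NSString *host, NSArray *groups, NSArray *numbers)
{
    NSMutableArray *parts = [NSMutableArray arrayWithObject:host];
    NSUInteger count = MIN([groups count], [numbers count]);

    for (NSUInteger i = 0; i < count; i++) {
        // Each entry is group:number, for example misc.test:1234
        [parts addObject:[NSString stringWithFormat:@"%@:%lu",
                          [groups objectAtIndex:i],
                          [[numbers objectAtIndex:i] unsignedLongValue]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```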

RFC 2980 says XOVER requires Bytes, but nowhere does it say what the value is. Only by interrogating an existing news server was I able to guess that it’s most likely the total number of bytes of the article. But I’m not sure; I could very easily be wrong. I googled for it several times and came up with nothing. The other thing about Bytes is that it isn’t really stored in the headers. That would alter the size of the article. So it’s kept outside of the article itself. Me, I just added another long long field to the database and jammed it in there.

In related news, I’ve found that only Unison, NewsWatcher, and Thunderbird are really worth testing against. The others that I mentioned last time are so flaky I have to track down whether it’s a bug in my server or a bug in their client. Of the good ones, NewsWatcher is actually my favorite despite its age. It seems to have the most robust and flexible NNTP client implementation. It’s about the only news client that notices when groups are added/removed on the server, and allows me to select certain useful options (e.g. whether it uses XHDR or XOVER to get header information). Very useful for testing.

Unison makes it easy to upload files (it automatically segments them, encodes them, etc.) so I played around with that for a while. I found some bugs in Wombat that way. Namely, that I was treating all text as UTF-8 encoded, when it wasn’t. That meant yEnc encoded files got corrupted and couldn’t be properly rebuilt. I found out that using ISO Latin 1 for the encoding resulted in non-corrupted files, but that’s still a big assumption to make. In the end, however, I ended up just treating the headers as text and the body as a big bag of bits.
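One way to sketch that split, assuming the article arrives as raw bytes with the headers and body separated by the first blank line (CRLF CRLF). The function name is hypothetical. The header bytes can then be decoded as text, while the body is never run through a string encoding at all.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split raw article bytes at the first blank line.
// If no blank line is found, everything is treated as headers.
static void SplitArticle(NSData *article, NSData **headers, NSData **body)
{
    NSData *separator = [@"\r\n\r\n" dataUsingEncoding:NSASCIIStringEncoding];
    NSRange found = [article rangeOfData:separator
                                 options:0
                                   range:NSMakeRange(0, [article length])];
    if (found.location == NSNotFound) {
        *headers = article;
        *body = [NSData data];
        return;
    }

    *headers = [article subdataWithRange:NSMakeRange(0, found.location)];

    // The body starts right after the separator and stays as raw bytes.
    NSUInteger start = NSMaxRange(found);
    *body = [article subdataWithRange:
             NSMakeRange(start, [article length] - start)];
}
```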

I’m still using Core Data as the back end, although a little less so now. The other thing I found out from uploading files from Unison was that large articles make the database huge, quick. Also seeing that I was treating most of the article as a binary blob, keeping it in the database wasn’t so useful. So I modified Wombat to keep the headers in the database, but write the article to an external file. I kept what I think is a traditional directory layout for articles. An article posted as the 4th article to com.orderndev.general gets the path com/orderndev/general/4.txt. Unlike traditional systems though, I don’t hard link to it from the other group directories it was cross-posted to. I just put the relative path to the article file in the database.
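The path mapping is simple enough to sketch in a few lines. The function name is hypothetical, and the sketch assumes group names contain nothing but dot-separated components.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: map a group name and article number to a
// relative file path, turning each dot-separated component into a directory.
static NSString *ArticlePath(NSString *group, unsigned long number)
{
    NSString *directory = [[group componentsSeparatedByString:@"."]
                           componentsJoinedByString:@"/"];
    NSString *file = [NSString stringWithFormat:@"%lu.txt", number];
    return [directory stringByAppendingPathComponent:file];
}
```

Calling ArticlePath(@"com.orderndev.general", 4) gives com/orderndev/general/4.txt, which is the string that would go in the database.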

I’m struggling with using Core Data with threads. It has some lock and unlock methods on NSManagedObjectContext with documentation vaguely stating that you should use them, maybe, if you feel like it. Unfortunately, I occasionally get random crashes when I have multiple threads touching the object context, the in-memory part of the database. I have already put locks around code anytime I create an entity, modify an entity, or retrieve an entity. I haven’t put locks around accessing attributes of entities, although it looks like I will have to. I just wish there was some good documentation for this.
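A sketch of what locking around attribute access might look like, using the context's own lock and unlock methods. The helper name is hypothetical, and this is only one possible approach, not a vetted threading strategy. Note that later SDKs deprecate these two methods.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper: read one attribute while holding the context lock,
// so other threads using the same lock cannot touch the context meanwhile.
static id LockedValueForKey(NSManagedObject *object, NSString *key)
{
    NSManagedObjectContext *context = [object managedObjectContext];

    [context lock];
    @try {
        return [object valueForKey:key];
    }
    @finally {
        // Runs even if valueForKey: raises, so the lock is always released.
        [context unlock];
    }
}
```

This only helps if every thread goes through the same lock for every access, which is exactly the tedious part.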

Meanwhile, I’m still progressing through RFC 2980 and RFC 1036 and getting the standard stuff implemented. Yukon, ho!