Core Data and Multi-threading
I’ve been wanting to write this article since Tuesday, but I’ve been distracted by my day job. One of our clients is getting close to shipping so I have to put in more hours than usual. It’s no where near as fun as working on Wombat, but it pays the bills.
Anyway, back when I wrote a couple of weeks ago about Wombat, I mentioned the trouble I was having with Core Data and multiple threads. Basically, I was finding that the entire context (NSManagedObjectContext) had to be locked anytime a thread touched the context or any one of its managed objects (NSManagedObject). That included even accessing attributes on a NSManagedObject as well as mutating them.
Apparently I wasn’t the only one who figured this out. Florian Zschocke, creator of Xnntp, told me that he was running into the same problem of having to lock the entire context each time he touched anything. He was also wondering if there was a better way.
The obvious problem with locking every time is that it defeats the concurrency of threads. The threads end up being serialized anytime they touch the data store. This is pretty troublesome for Wombat, because it’s an NNTP server. Most of its time is spent doing I/O –either reading/writing to the data store or reading/writing to sockets. Accessing the data store is already a potential performance hotspot, and the serialization or threads makes it even worse.
Fortunately there’s a better way. Blake Seely left a comment on my previous post, letting me know that the appropriate way to handle multiple threads is to have a separate context for each thread. About this time I also found some Apple documentation pertaining to Core Data and multiple threads, which echoed Blake’s comments. This is as simple as allocating a NSManagedObjectContext each time a thread is spawned, and handing it the solitary NSPersistentStoreCoordinator.
The one gotcha is that NSManagedObject’s from one context cannot be used in another context. If you want to send an object from one thread to the other, you have to pass the NSManagedObjectID around. This can be obtained by [object objectID] from one thread, then used on the other thread by [context objectWithID:objectID] to get the corresponding object in that context. However, this only works for objects that have been saved. In general, Wombat isn’t going to have to worry about passing objects between threads. That’s because each client is pretty isolated and has no reason to talk directly to another client.
That said, having multiple contexts has some implications for Wombat. Currently each client gets its own thread, and thus its own object context. In the future this will probably change, and clients will be pooled together on a few threads that handle multiple clients using something like kqueue to multiplex sockets. The catch is that no client saves its changes until the remote client closes the connection. Currently that means if Wombat has two clients, and Client A posts an article, Client B will not see that article until Client A quits. For performance reasons, NNTP clients often leave the connection open for a specified amount of time after they’ve done their work.
The behavior is acceptable, but it gets a bit more weird when clients start getting multiplexed by a single thread. In that scenario, Client B might see the article immediately if its in the same pool, or it might have to wait until Client A quits. In other words, some clients will see articles sooner than other clients. Once again, it’s acceptable behavior, but it’s a little odd.
Meanwhile, I’ve been reading Another Day in the Code Mines, which has a lot to say about threading. One of the thoughts that I came away with is that forking processes in Wombat would probably be better than spawning threads. That is, for each client that connects, instead of spawning a thread for it, spawn a process to handle it. Processes are heavier weight, but they provide a couple advantages. First, they provide separate memory spaces for each client, so one client can’t mess with another. Second, if one client crashes, it doesn’t take down the entire server, thus making Wombat more robust. Forking for each client also happens to be a classic NNTP server design, and for good reason.
Unfortunately, as far as I can tell, Core Data doesn’t support this. Multiple contexts can exist because they all share one NSPersistentStoreCoordinator, which serializes all I/O to the data store file. Since SQLite often updates just parts of the file at a time, I can’t imagine that it would allow multiple processes to have the data store file open at once, especially for write. The only way I see around this is to make the data store file its own server. Unfortunately, this reintroduces the single point of failure (if it goes down, all clients go down) and since NNTP is fairly thin protocol over news, it would just end up being something pretty close to an NNTP server itself. Not a win.
In the end, it looks as though I’m just going to give each thread its own context, and then multiplex several sockets on each thread using kqueue. It may not be as robust as forking processes, but it should be possible to get some good performance out of it.