Core Data and Multi-threading
andy on Oct 14th 2006
I’ve been wanting to write this article since Tuesday, but I’ve been distracted by my day job. One of our clients is getting close to shipping, so I have to put in more hours than usual. It’s nowhere near as much fun as working on Wombat, but it pays the bills.
Anyway, when I wrote about Wombat a couple of weeks ago, I mentioned the trouble I was having with Core Data and multiple threads. Basically, I was finding that the entire context (NSManagedObjectContext) had to be locked any time a thread touched the context or any one of its managed objects (NSManagedObject). That included merely reading attributes of an NSManagedObject, not just mutating them.
Apparently I wasn’t the only one who figured this out. Florian Zschocke, creator of Xnntp, told me that he was running into the same problem of having to lock the entire context each time he touched anything. He was also wondering if there was a better way.
The obvious problem with locking every time is that it defeats the concurrency of threads: the threads end up serialized whenever they touch the data store. This is pretty troublesome for Wombat, because it’s an NNTP server. Most of its time is spent doing I/O, either reading/writing to the data store or reading/writing to sockets. Accessing the data store is already a potential performance hotspot, and serializing the threads makes it even worse.
Fortunately, there’s a better way. Blake Seely left a comment on my previous post, letting me know that the appropriate way to handle multiple threads is to give each thread its own context. Around the same time, I also found some Apple documentation on Core Data and multiple threads, which echoed Blake’s comments. This is as simple as allocating an NSManagedObjectContext each time a thread is spawned and handing it the single, shared NSPersistentStoreCoordinator.
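Core Data itself is an Objective-C framework, so what follows is only an analogy in plain C, not Core Data API. It is a minimal sketch of the thread-confinement idea under simplified assumptions: the hypothetical `SharedStore` plays the role of the one shared coordinator and, in this analogy, is the only thing that takes a lock, while each thread’s hypothetical `ThreadContext` is created and used by that thread alone, so it needs no locking. Unsaved changes stay private to their thread until `context_save` pushes them into the shared store, which mirrors why one client doesn’t see another’s articles before a save.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16
#define CHANGES_PER_THREAD 1000

/* Analogy for the shared NSPersistentStoreCoordinator: every thread uses
   the same one, so access to its contents is serialized by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    int saved_articles;            /* stands in for the on-disk data store */
} SharedStore;

/* Analogy for a per-thread NSManagedObjectContext: created by, and used
   only on, a single thread, so it needs no lock of its own. */
typedef struct {
    SharedStore *store;
    int pending_articles;          /* unsaved changes, invisible to other threads */
} ThreadContext;

/* Saving is the only operation that touches shared state, so it is the
   only one that takes the lock. */
static void context_save(ThreadContext *ctx) {
    pthread_mutex_lock(&ctx->store->lock);
    ctx->store->saved_articles += ctx->pending_articles;
    pthread_mutex_unlock(&ctx->store->lock);
    ctx->pending_articles = 0;
}

static void *client_thread(void *arg) {
    /* Each thread builds its own context when it starts, pointing at the shared store. */
    ThreadContext ctx = { .store = (SharedStore *)arg, .pending_articles = 0 };
    for (int i = 0; i < CHANGES_PER_THREAD; i++) {
        ctx.pending_articles++;    /* no lock: only this thread can see ctx */
    }
    context_save(&ctx);            /* lock held only for the save */
    return NULL;
}

/* Runs n client threads against one shared store and returns the total saved. */
static int run_clients(int n) {
    SharedStore store = { .saved_articles = 0 };
    pthread_t threads[MAX_THREADS];
    assert(n > 0 && n <= MAX_THREADS);
    pthread_mutex_init(&store.lock, NULL);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&threads[i], NULL, client_thread, &store);
        assert(rc == 0);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&store.lock);
    return store.saved_articles;
}
```

With four threads each making 1,000 private changes, `run_clients(4)` returns 4000, and each thread contends for the lock once, at save time, instead of on every access.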
The one gotcha is that NSManagedObjects from one context cannot be used in another context. If you want to send an object from one thread to another, you have to pass its NSManagedObjectID instead. You get the ID with [object objectID] on one thread, then call [context objectWithID:objectID] on the other thread to get the corresponding object in that thread’s context. However, this only works for objects that have been saved, since a newly inserted object has only a temporary ID until its context is saved. In general, Wombat isn’t going to have to worry about passing objects between threads, because each client is pretty isolated and has no reason to talk directly to another client.
That said, having multiple contexts has some implications for Wombat. Currently each client gets its own thread, and thus its own object context. In the future this will probably change, and clients will be pooled together on a few threads, each handling multiple clients by using something like kqueue to multiplex sockets. The catch is that no client saves its changes until the remote client closes the connection. Currently, that means that if Wombat has two clients and Client A posts an article, Client B will not see that article until Client A quits. For performance reasons, NNTP clients often leave the connection open for a specified amount of time after they’ve done their work, which stretches that delay out further.
This behavior is acceptable, but it gets a bit weirder when clients start being multiplexed on a single thread. In that scenario, Client B might see the article immediately if it’s in the same pool (and therefore shares Client A’s context), or it might have to wait until Client A quits. In other words, some clients will see articles sooner than others. Once again, it’s acceptable behavior, but it’s a little odd.
Meanwhile, I’ve been reading Another Day in the Code Mines, which has a lot to say about threading. One of the thoughts I came away with is that forking processes in Wombat would probably be better than spawning threads. That is, for each client that connects, instead of spawning a thread to handle it, spawn a process. Processes are heavier-weight, but they provide a couple of advantages. First, each client gets a separate memory space, so one client can’t mess with another. Second, if one client crashes, it doesn’t take down the entire server, which makes Wombat more robust. Forking for each client also happens to be a classic NNTP server design, and for good reason.
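A minimal fork-per-client sketch in C, assuming POSIX `fork()` and `waitpid()`. The functions `handle_client`, `spawn_client`, and `client_exited_cleanly`, and the `crash` flag, are hypothetical, added only to show that a crashing child takes down nothing but itself. In a real server, `fd` would come from `accept()` inside a loop, and the parent would reap finished children asynchronously (for example from a `SIGCHLD` handler) rather than waiting on each one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical client handler: sends a greeting, or crashes on purpose
   when crash is nonzero. */
static void handle_client(int fd, int crash) {
    const char *greeting = "200 server ready\r\n";
    if (crash) {
        abort();                      /* simulate a bug in one client's handler */
    }
    if (write(fd, greeting, strlen(greeting)) < 0) {
        _exit(1);
    }
}

/* Forks one process to serve one client and returns the child's pid
   (or -1 if fork fails). */
static pid_t spawn_client(int fd, int crash) {
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: works on its own copy of the parent's memory. */
        handle_client(fd, crash);
        close(fd);
        _exit(0);
    }
    /* Parent: the child (if any) owns the connection now, so close our copy. */
    close(fd);
    return pid;
}

/* Waits for one client process. Returns 1 if it exited normally with
   status 0, or 0 if it failed or was killed by a signal (a crash). */
static int client_exited_cleanly(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return 0;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Passing the write end of a `pipe()` as `fd` is enough to try it: with `crash` set to 0, the parent reads the greeting and `client_exited_cleanly` returns 1; with `crash` set to 1, the child dies from `SIGABRT`, the parent sees end-of-file on the pipe, and `client_exited_cleanly` returns 0 while the parent keeps running.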
Unfortunately, as far as I can tell, Core Data doesn’t support this. Multiple contexts can coexist because they all share one NSPersistentStoreCoordinator, which serializes all I/O to the data store file. Since SQLite often updates just parts of the file at a time, I can’t imagine that it would allow multiple processes to have the data store file open at once, especially for writing. The only way I see around this is to put the data store behind its own server process. Unfortunately, that reintroduces a single point of failure (if it goes down, all clients go down), and since NNTP is a fairly thin protocol over news, it would end up being something pretty close to an NNTP server itself. Not a win.
In the end, it looks as though I’m just going to give each thread its own context, and then multiplex several sockets on each thread using kqueue. It may not be as robust as forking processes, but it should be possible to get some good performance out of it.
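kqueue is specific to BSD and macOS, so as a portable stand-in the sketch below uses POSIX `poll()` to show the same single-threaded multiplexing pattern: one thread waits on many client descriptors at once and services only the ones that are ready. With kqueue, each descriptor would instead be registered once (via `EV_SET` with `EVFILT_READ` and a `kevent()` call), and each wait would return only the ready events, which is largely where its scalability advantage over `select()`/`poll()` comes from. The `Client` struct and the byte counting are hypothetical placeholders for real NNTP command parsing.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CLIENTS 64

/* Hypothetical per-client state. With one thread serving many clients,
   everything a client needs lives here rather than on a thread's stack. */
typedef struct {
    int fd;
    size_t bytes_received;
    int closed;
} Client;

/* One pass of a single-threaded event loop: wait up to timeout_ms for any
   client descriptor to become readable, then service only the ready ones.
   Returns the number of clients serviced, or -1 if poll() fails. */
static int service_ready_clients(Client *clients, int n, int timeout_ms) {
    struct pollfd pfds[MAX_CLIENTS];
    assert(n >= 0 && n <= MAX_CLIENTS);
    for (int i = 0; i < n; i++) {
        /* A negative fd tells poll() to skip this entry. */
        pfds[i].fd = clients[i].closed ? -1 : clients[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0) {
        return ready;                 /* 0 on timeout, -1 on error */
    }
    int serviced = 0;
    for (int i = 0; i < n; i++) {
        if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR))) {
            continue;
        }
        char buf[512];
        ssize_t got = read(clients[i].fd, buf, sizeof buf);
        if (got <= 0) {
            clients[i].closed = 1;    /* EOF or error: the client went away */
        } else {
            /* A real server would parse NNTP commands from buf here. */
            clients[i].bytes_received += (size_t)got;
        }
        serviced++;
    }
    return serviced;
}
```

Calling `service_ready_clients` in a loop is the whole event loop; in the design described above, each such loop would run on its own thread with its own managed object context shared by that thread’s clients.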
Filed in Core Data, Macintosh, Programming, Wombat | 5 responses so far
5 Responses to “Core Data and Multi-threading”

Jakob Stoklund Olesen Oct 15th 2006 at 06:08 am 1
Andy,
SQLite does in fact allow multiple processes to edit a database at the same time, and it works too. It uses file-level locking to do this, so performance is less than stellar if you have many processes.
One thing you need to consider when doing multithreaded Core Data is that changes are not processed automatically outside the GUI thread. You need to regularly save, fetch, or call -processPendingChanges manually.
Another thing: when using multiple contexts, you need to use -refreshObject:mergeChanges: to keep contexts up to date. This is the only way of propagating changes between existing contexts.
As for multiplexing sockets, do you know about CFSocket and NSStream? kqueue is pretty low level, not that there is anything wrong with that.
Finally, Core Data is not really a good choice for an NNTP server, but you will find out
Chris Hanson Oct 16th 2006 at 09:30 am 2
You can use a SQLite persistent store from multiple Core Data persistence stacks (coordinators) at once, whether they’re in the same or multiple processes. As Jakob pointed out, depending on your filesystem and its locking support, this can place limits on your performance; however, on a local filesystem, SQLite should be using byte-range locking rather than simple file locking. You should be able to see this yourself from the public SQLite source code.
You’re also absolutely correct about having to lock a context any time you access or manipulate its managed objects. As a convenience, you don’t have to do this if (a) you create the context in the same thread as you use it and (b) you use it (and its object graph) only in that thread. The practical upshot of the above is, as you discovered, that you can use multiple contexts to achieve your goals. Here’s a good article by Ben Trumbull explaining why all of the above is the case; it’s from the WebObjects list and covers EOEditingContext, which is the spiritual forebear of NSManagedObjectContext: http://lists.apple.com/archives/Webobjects-dev/2004/Dec/msg00255.html
Andy Oct 16th 2006 at 08:42 pm 3
Hi Jakob,
I do know about CFSocket/NSStream and NSRunLoop, etc. The main reason for wanting to use kqueue is the same reason for wanting to use Core Data in Wombat. That is, I’ve never used kqueue before and wanted to learn.
I’ve also heard that kqueue is much more efficient/scalable than select. I’m not sure what NSRunLoop/CFRunLoop uses; it might use kqueue behind the scenes, which would make my direct use of kqueue rather pointless.
Scott Stevenson Oct 20th 2006 at 04:14 pm 4
Finally, Core Data is not really a good choice for an NNTP server, but you will find out
I don’t agree with that. No matter how you slice it, Core Data can make something part of the equation. If you don’t want to use its persistence engine, you can opt out of that and just use it for change tracking and UI population.
Core Data’s speed can be extremely competitive with raw SQLite if it’s used properly, as was explained in one of the sessions at WWDC.
Scott Stevenson Oct 20th 2006 at 04:15 pm 5
Core Data can make something part of the equation
Wow. Something got mixed up there. It should have read “can make some part of the equation easier”.