Reading and writing files in Swift async/await
Because of habits ingrained in me, by default I tend to reach for synchronous, blocking APIs when reading and writing data to and from disk. This causes problems with Swift’s cooperatively scheduled Tasks. In this post, I examine the various async-safe approaches I’ve discovered to hitting disk, and end with a general approach that I ended up using.
The synchronous way
If I’m loading up data from disk to decode from JSON or decode into a UIImage, then my go-to has been:
let data = try Data(contentsOf: fileURL)
Similarly, if I’m writing said JSON or image data out to disk, I’ll use write:
try data.write(to: fileURL)
These are simple and easy to use. They also create problems with Swift async/await.
When I look at the Swift Foundation implementation of these methods, I can see that by default they do blocking I/O. This is a problem because Tasks are cooperatively scheduled over kernel threads. (I know this is probably an over-simplification, but stick with me.) If a Task traps to the kernel on a blocking call (like the blocking I/O calls), the kernel doesn’t know anything about the Task; it only sees the system thread blocking, and therefore suspends it. This robs Swift’s async runtime of the opportunity to suspend just the Task making the I/O call and pick up a different Task that is available to run. In other words, the blocking call takes away one of the system threads Swift async/await can use for the duration of the I/O.
So while simple, it would probably be best if I didn’t use them in the async/await world.
Option 1: side step the issue via memory mapping
If all I care about is reading in data, I can consider asking Data to memory map the file. Depending on the access patterns of the end user, this might actually be the fastest way anyway. The read accesses become implicit whenever someone touches the contents of Data and the kernel faults in the appropriate pages.
let data = try Data(contentsOf: fileURL, options: .mappedIfSafe)
A caveat is that not all files can be memory mapped. There’s also no automatic way to use memory mapping for writes using Data APIs. While I could roll my own, I’m not sure what the benefit of that would be.
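To make the memory-mapped read concrete, here is a minimal round-trip sketch. The function name roundTripMapped and the temporary file name are made up for illustration; the only API taken from the text above is Data(contentsOf:options:) with .mappedIfSafe. Note that .mappedIfSafe is a request, so Foundation may still fall back to a regular read.

```swift
import Foundation

/// Hypothetical helper: writes a small file, reads it back with a
/// memory-mapping request, and reports whether the bytes match.
func roundTripMapped() throws -> Bool {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("mapped-\(UUID().uuidString).bin")
    let original = Data((0..<4096).map { UInt8($0 % 256) })

    // The write is still the plain synchronous API; only the read is mapped.
    try original.write(to: url)
    defer { try? FileManager.default.removeItem(at: url) }

    // Ask Foundation to map the file if it considers that safe.
    let mapped = try Data(contentsOf: url, options: .mappedIfSafe)
    return mapped == original
}
```

The comparison at the end touches every byte, which is the point at which the kernel would fault in the pages if the file really was mapped.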
Option 2: use URLSession
The next option I discovered was leveraging URLSession to do asynchronous reads from file URLs, like this:
let (data, _) = try await URLSession.shared.data(from: fileURL)
For reading the entire file in one go, this works well. Unfortunately, I couldn’t find a way to “upload” to a file URL. Probably for good reason. So this is a read-only solution.
Quick aside about async URLSession: you probably don’t want to use the bytes method.
Option 3: wrap up FileHandle
This is the first solution I found that handles both asynchronous reads and writes. The basic principle is simple: create a FileHandle, then use the readabilityHandler or writeabilityHandler properties to either read data out or write data into it. For reading, I returned an AsyncStream<Data> that the readabilityHandler yields to.
Side note: I do not recommend using the bytes property to iterate one byte at a time.
I got really far with this approach and had it working. However, for files, I found it clunky. Namely:
- When reading, I had to get the size of the file first so I knew when the readabilityHandler got called the last time. A lot of sample code assumes that the availableData will be isEmpty on the last call. However, in my testing, I found that was true for Pipes but not for files.
- When writing, the call to write was still synchronous. Since I was doing it inside the writeabilityHandler, presumably it wouldn’t actually block, but that’s not really clear.
After getting FileHandle to work, I decided it was too clunky and went for a different approach. But it is an option.
Option 4: DispatchIO
This is the approach I ended up using. DispatchIO is low-level, but its API was consistent and straightforward to wrap up. The basic principle is to use DispatchIO to get non-blocking reads/writes and then wrap those in checked continuations. That way Swift async/await knows when to suspend or resume the calling Task. Below I go into detail on my implementation. As an added bonus on top of the async/await, this is my first go at a non-copyable type.
I’ll start with the type declarations:
/// A phantom type used by AsyncFileStream to restrict methods to read mode
public enum ReadMode {}
/// A phantom type used by AsyncFileStream to restrict methods to write mode
public enum WriteMode {}
public struct AsyncFileStream<Mode>: ~Copyable {
    // ...
}
The actual type of AsyncFileStream is complicated by the fact that I’m trying to make invalid operations impossible for the caller. I achieve this in two ways:
- I use a phantom type called Mode that can be either ReadMode or WriteMode. Read methods are only available when Mode == ReadMode and write methods only when Mode == WriteMode.
- I mark AsyncFileStream as non-copyable with ~Copyable. This means there can be only one copy, preventing multiple Tasks/threads/whatever from calling it at the same time. It also means I get a nice deinit method to clean up in, and I can mark my close() method as consuming. This means the instance can’t be used after close() is called, or the compiler will emit an error at compile time.
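The two mechanisms in that list can be seen in isolation with a toy type that does no I/O at all. Everything in this sketch (Reading, Writing, Channel, phantomDemo) is a hypothetical stand-in, not the AsyncFileStream described in this post; it only shows how a phantom type gates methods and how a consuming method ends the life of a non-copyable value.

```swift
// Hypothetical marker types. They have no cases, so they exist only at the type level.
enum Reading {}
enum Writing {}

/// Toy stand-in for a file stream; it does no I/O.
struct Channel<Mode>: ~Copyable {
    let name: String

    /// Consumes the value, so the compiler rejects any later use of it.
    consuming func close() -> String {
        "closed \(name)"
    }
}

// Only available when the phantom type says "reading".
extension Channel where Mode == Reading {
    func read() -> String { "read from \(name)" }
}

// Only available when the phantom type says "writing".
extension Channel where Mode == Writing {
    func write(_ text: String) -> String { "wrote \(text) to \(name)" }
}

func phantomDemo() -> [String] {
    let reader = Channel<Reading>(name: "in.txt")
    let writer = Channel<Writing>(name: "out.txt")
    var log = [reader.read(), writer.write("hello")]
    // reader.write("x")   // would not compile: write(_:) requires Mode == Writing
    log.append(reader.close())
    log.append(writer.close())
    // reader.read()       // would not compile: reader was consumed by close()
    return log
}
```

The demo lives inside a function because a non-copyable value can only be consumed from a local binding, not from a global one.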
AsyncFileStream has the properties you’d probably expect if you’ve used DispatchIO before:
/// The queue to run the operations on
private let queue: DispatchQueue
/// The unix file descriptor for the open file
private let fileDescriptor: Int32
/// The DispatchIO instance used to issue operations
private let io: DispatchIO
/// If the file is open or not; used to prevent double closes()
private var isClosed = false
If you haven’t used DispatchIO before, don’t worry, these properties will make sense when I use them.
Creating an instance is actually the most involved part of this whole type, but it’s a good place to start:
fileprivate init(url: URL, mode: Int32) throws {
    guard url.isFileURL else {
        throw AsyncFileStreamError.notFileURL
    }
    // Since we're reading/writing as a stream, keep it a serial queue
    let queue = DispatchQueue(label: "AsyncFileStream")
    let fileDescriptor = open(url.absoluteURL.path, mode, 0o666)
    // Once we start setting properties, we can't throw. So check to see if
    // we need to throw now, then set properties
    if fileDescriptor == -1 {
        throw AsyncFileStreamError.openError(errno)
    }
    self.queue = queue
    self.fileDescriptor = fileDescriptor
    io = DispatchIO(
        type: .stream,
        fileDescriptor: fileDescriptor,
        queue: queue,
        cleanupHandler: { [fileDescriptor] error in
            // Unfortunately, we can't seem to do anything with `error`.
            // There are no guarantees when this closure is invoked, so
            // the safe thing would be to save the error in an actor
            // that the AsyncFileStream holds. That would allow the caller
            // to check for it, or the read()/write() methods to check
            // for it as well. However, having an actor as a property
            // on a non-copyable type appears to uncover a compiler bug.
            // Since we opened the file, we need to close it
            Darwin.close(fileDescriptor)
        }
    )
}
First, notice the init is fileprivate. I’m going to provide much more ergonomic APIs for creating an instance than knowing how to fill out mode correctly. But there’s a lot of shared code, so this init takes care of that. Error handling in the init of a non-copyable type is tricky. I can only throw errors up to the point that it starts setting up stored properties. Once that starts, the init must guarantee success (i.e. no throws after) or it’s a compile-time error. So early on I verify that I have a file URL and that the open happened successfully.
Otherwise, I set up DispatchIO. I create my DispatchQueue to execute the IO operations on. Since I’m going to be reading and writing as a stream, I might as well keep a serial queue. I open the file using the POSIX open() API, using the passed-in mode, and set the permissions to 0o666, which requests read/write for everyone (the process umask can still clear some of those bits when the file is created). As mentioned before, I create DispatchIO as a .stream since I’m going to sequentially read through the file or write it out.
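As a small aside on the 0o666 argument: the mode passed to open() is only a request, and the kernel clears whatever bits the process umask has set. The sketch below checks that with plain POSIX calls. The helper name effectivePermissions is made up for illustration, and it assumes the path does not exist yet, since the mode only applies when open() actually creates the file.

```swift
import Foundation

/// Hypothetical helper: creates a file with a requested mode of 0o666 and
/// reports the permission bits it actually received next to the bits the
/// current umask should allow.
func effectivePermissions(atPath path: String) -> (granted: mode_t, expected: mode_t)? {
    // umask() can only be read by setting it, so set it and restore it right away.
    let mask = umask(0)
    umask(mask)

    let fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0o666)
    guard fd != -1 else { return nil }
    defer { close(fd) }

    var info = stat()
    guard fstat(fd, &info) == 0 else { return nil }

    // Keep only the permission bits of st_mode.
    return (info.st_mode & 0o777, 0o666 & ~mask)
}
```

With a typical umask of 0o022 the file ends up as 0o644, so "read/write for all" really means "read/write for all, minus whatever the umask removes".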
The cleanupHandler is another tricky bit. It’s called once DispatchIO is done with our fileDescriptor, e.g. after someone’s called close(). Since we opened the file, we take care of closing it. I’m also given an error as a parameter to my closure; if it’s non-zero, something went wrong. Unfortunately, I ran into a compiler bug (according to the compiler itself) in trying to handle it.
A slight diversion about said compiler bug
What I’d like to do to handle the cleanupHandler error is to have a private actor like this:
final actor AsyncError {
    private(set) var error: AsyncFileStreamError?
    func setError(_ error: AsyncFileStreamError) {
        self.error = error
    }
}
I would always create an instance in the init and have it as a property on AsyncFileStream. Inside of my cleanupHandler I could set the actual error on it:
cleanupHandler: { [fileDescriptor] error in
    if error != 0 {
        Task {
            await asyncError.setError(error)
        }
    }
    // Since we opened the file, we need to close it
    Darwin.close(fileDescriptor)
}
Since it’s a property, the read and write methods could check it in a guard statement or the caller could explicitly read it off to check for an error. Unfortunately, when I actually implement this, attempting to touch the actor instance leads to:
Usage of a noncopyable type that compiler can't verify. This is a compiler bug. Please file a bug with a small example of the bug
Welp.
Moving on…
Creating an instance
I wanted to be able to create an instance of an AsyncFileStream without remembering all the various mode flags for open(). Also, I added the Mode phantom type, but I’d rather that be invisible to the caller, i.e. they shouldn’t have to type out the Mode explicitly somewhere. If I use a direct init though, they would, as there’s no way to infer it. For those reasons, I added an extension to URL to create AsyncFileStream:
public extension URL {
    /// Create an instance from the URL for reading only
    func openForReading() throws -> AsyncFileStream<ReadMode> {
        try AsyncFileStream<ReadMode>(url: self, mode: O_RDONLY)
    }
    /// Create an instance from the URL for writing. It will overwrite if the file
    /// already exists or create it if it does not exist.
    func openForWriting() throws -> AsyncFileStream<WriteMode> {
        try AsyncFileStream<WriteMode>(url: self, mode: O_WRONLY | O_TRUNC | O_CREAT)
    }
}
If I have a URL, I just have to call openForReading() or openForWriting(). Does what it says on the tin.
Cleaning up safely
Next, I’ll cover the parts of the type that are available regardless of Mode, namely closing the file. Because this is a non-copyable type, close() gets interesting (in a good way):
/// Close the file. Consuming method
public consuming func close() {
    isClosed = true
    io.close()
}
The body is a bit boring: it marks itself as closed so deinit doesn’t close again, and then closes the DispatchIO object. The fact that this is marked consuming is interesting though. Specifically, in that position, it means self is consumed, which means the caller can’t call any methods after it without getting a compiler error. For example:
let stream: AsyncFileStream<ReadMode> // ... assume initialized
stream.close() // stream is now consumed
stream.readToEnd() // this is a compiler error!
That’s pretty neat! I’ve prevented an illegal operation at compile time.
To circle back to setting isClosed to true so deinit doesn’t try to close it: in a consuming method, I could, if the type is right, call discard self. That destroys self and means deinit doesn’t get called. Unfortunately, it only works if the type only contains “trivial” types, and AsyncFileStream contains non-trivial types, apparently. From what I can discern, “trivial” means you can do a bit-by-bit copy on the type, with no reference counting. In any case, since I can’t discard self, I need to mark something so deinit knows not to try to call io.close() a second time.
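For comparison, here is a minimal sketch of what discard self looks like on a type where it is allowed. Token, finish() and discardDemo() are hypothetical names; the stored properties are an integer and a raw pointer, which are both trivial in the sense described above. The pointer only exists so the demo can count how often deinit runs.

```swift
/// Hypothetical non-copyable type with only trivial stored properties,
/// so `discard self` is permitted inside its consuming method.
struct Token: ~Copyable {
    let id: Int32
    let deinitCount: UnsafeMutablePointer<Int>

    /// Explicit cleanup path: destroys self without running deinit.
    consuming func finish() {
        discard self
    }

    /// Implicit cleanup path: runs when a Token goes away without finish().
    deinit {
        deinitCount.pointee += 1
    }
}

func discardDemo() -> Int {
    let counter = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    counter.initialize(to: 0)
    defer { counter.deallocate() }

    do {
        let dropped = Token(id: 1, deinitCount: counter)
        _ = dropped.id       // never finished, so deinit runs when it is destroyed
    }

    let finished = Token(id: 2, deinitCount: counter)
    finished.finish()        // discard self: deinit is skipped

    return counter.pointee   // only the dropped token was counted
}
```

With this shape no isClosed flag is needed, because the explicit path and the deinit path can never both run for the same value.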
Speaking of deinit:
deinit {
    // Ensure we've closed the file if we're going out of scope
    if !isClosed {
        io.close()
    }
}
Since I’ve got a non-copyable type, I get a deinit, which is just the bee’s knees. If I haven’t closed the file already, I do so now so I don’t leak anything.
Reading
If I create the type in ReadMode, there are a couple of read methods available:
public extension AsyncFileStream where Mode == ReadMode {
    /// Read the entire contents of the file in one go
    func readToEnd() async throws -> DispatchData { ... }
    /// Read the next `length` bytes.
    func read(upToCount length: Int) async throws -> DispatchData { ... }
}
Like marking close() as consuming, the goal of putting the methods in a conditional extension is to prevent misuse. Unless I open AsyncFileStream in ReadMode, I simply can’t call any read methods at compile time because they aren’t available.
As mentioned at the start of this section, the basic idea of read is to wrap DispatchIO’s read in a continuation:
/// Read the next `length` bytes.
func read(upToCount length: Int) async throws -> DispatchData {
    try await withCheckedThrowingContinuation { continuation in
        var readData = DispatchData.empty
        io.read(offset: 0, length: length, queue: queue) { done, data, error in
            if let data {
                readData.append(data)
            }
            guard done else {
                return // not done yet
            }
            if error != 0 {
                continuation.resume(throwing: AsyncFileStreamError.readError(error))
            } else {
                continuation.resume(returning: readData)
            }
        }
    }
}
This is straightforward: wrap up io.read() in a checked throwing continuation. For my purposes, I built up a single DispatchData and returned it when finished. That’s because I want to use the entire contents as a single Data instance. However, I could have returned an AsyncStream<DispatchData> and streamed out the data as I got it in the callback. I’m also going to note that DispatchIO explicitly marks the read as done with a done flag. Glaring at you, FileHandle.
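The streaming alternative mentioned above could look roughly like the following. This is a sketch under assumptions, not part of AsyncFileStream: chunkStream(atPath:) is a hypothetical free function, it uses AsyncThrowingStream rather than AsyncStream so read errors can be surfaced, it reports failures as POSIXError instead of the AsyncFileStreamError used in this post, and it does not handle cancellation.

```swift
import Foundation
import Dispatch

/// Hypothetical free function: yields each chunk DispatchIO delivers
/// instead of accumulating them into one DispatchData.
func chunkStream(atPath path: String) -> AsyncThrowingStream<DispatchData, Error> {
    AsyncThrowingStream { continuation in
        let fd = open(path, O_RDONLY)
        guard fd != -1 else {
            continuation.finish(throwing: POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO))
            return
        }

        let queue = DispatchQueue(label: "chunkStream")
        let io = DispatchIO(type: .stream, fileDescriptor: fd, queue: queue) { _ in
            close(fd) // we opened the descriptor, so we close it
        }

        io.read(offset: 0, length: .max, queue: queue) { done, data, error in
            // Hand every non-empty chunk to the consumer as soon as it arrives.
            if let data, !data.isEmpty {
                continuation.yield(data)
            }
            guard done else {
                return // more chunks to come
            }
            if error != 0 {
                continuation.finish(throwing: POSIXError(POSIXErrorCode(rawValue: error) ?? .EIO))
            } else {
                continuation.finish()
            }
            io.close()
        }
    }
}
```

A caller would consume it with for try await chunk in chunkStream(atPath: somePath) and process each chunk as it arrives. The chunk boundaries are whatever DispatchIO decides to deliver, so the consumer should not assume any particular chunk size.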
Also note that since I’m using the stream mode of DispatchIO, offset is ignored, which is why it’s always 0.
Since my goal was to read the entire file at once, I added a helper method to do that:
/// Read the entire contents of the file in one go
func readToEnd() async throws -> DispatchData {
    try await read(upToCount: .max)
}
It just calls through to the normal read().
Writing
Like reading, writing is gated based on the Mode. Writing also works the same as reading: I wrap up a non-blocking call with a callback in a checked throwing continuation. Ugly, but straightforward and effective (like me!):
public extension AsyncFileStream where Mode == WriteMode {
    /// Write the data out to file async
    func write(_ data: DispatchData) async throws {
        try await withCheckedThrowingContinuation { continuation in
            io.write(
                offset: 0,
                data: data,
                queue: queue
            ) { done, _, error in
                guard done else {
                    return // not done yet
                }
                if error != 0 {
                    continuation.resume(throwing: AsyncFileStreamError.writeError(error))
                } else {
                    continuation.resume(returning: ())
                }
            }
        } as Void
    }
}
The main difference from the structure of the read is that I ignore the second parameter in the callback, which is the data remaining to write out.
Some Data convenience
Finally, I get back to where I started. Namely, I want convenience methods on Data that read and write entire files, but asynchronously. All I do is wrap up the methods I just defined:
public extension Data {
    /// Asynchronously read from the contents of the fileURL. This method
    /// will throw an error if it's not a file URL.
    init(asyncContentsOf url: URL) async throws {
        let stream = try url.openForReading()
        self = try await Data(stream.readToEnd())
    }
    /// Asynchronously write the contents of self into the fileURL.
    func asyncWrite(to url: URL) async throws {
        // This line makes me sad because we're copying the data. I'm not
        // currently aware of a way to not copy these bytes.
        let dispatchData = withUnsafeBytes { DispatchData(bytes: $0) }
        let stream = try url.openForWriting()
        try await stream.write(dispatchData)
    }
}
The one thing of note is that in the asyncWrite() method I make a copy of the bytes when I instantiate DispatchData. I would really like to not have to do that, but I couldn’t find a way. If you know a way, please tap on the "Contact" button at the top of this page and send me an email or message me on Mastodon.
And that’s it!
Conclusion
In this post, I started with a couple of very convenient methods on Data that make it easy to read or write entire files. However, I pointed out they’re synchronous, which can cause problems in Swift’s async/await world. I then walked through various async-safe ways of reading and writing files. Finally, I built out a general solution for async file I/O by wrapping up DispatchIO in checked continuations.