Reading and writing files in Swift async/await

Because of habits ingrained in me, I tend to reach for synchronous, blocking APIs by default when reading and writing data to and from disk. This causes problems with Swift’s cooperatively scheduled Tasks. In this post, I examine the various async-safe approaches to hitting disk that I’ve discovered, and finish with the general approach I ended up using.

The synchronous way

If I’m loading data from disk to decode from JSON or into a UIImage, my go-to has been:


let data = try Data(contentsOf: fileURL)

Similarly, if I’m writing said JSON or image data out to disk, I’ll use write:


try data.write(to: fileURL)

These are simple and easy to use. They also create problems with Swift async/await.

When I look at the Swift Foundation implementation of these methods, I can see that by default they do blocking I/O. This is a problem because Tasks are cooperatively scheduled over a limited pool of kernel threads. (I know this is probably an oversimplification, but stick with me.)

Suppose a Task traps into the kernel on a blocking call, like these I/O calls. The kernel doesn’t know anything about the Task. It only sees a system thread blocking, so it suspends that thread. This robs Swift’s async runtime of the chance to suspend just the Task making the I/O call and pick up a different Task that’s ready to run. In other words, the blocking call takes away one of the system threads Swift async/await can use, for the entire duration of the I/O.

So while simple, it would probably be best if I didn’t use them in the async/await world.

Option 1: sidestep the issue via memory mapping

If all I care about is reading in data, I can consider asking Data to memory map the file. Depending on the access patterns of the end user, this might actually be the fastest way anyway. The read accesses become implicit whenever someone touches the contents of Data and the kernel faults in the appropriate pages.


let data = try Data(contentsOf: fileURL, options: .mappedIfSafe)

One caveat is that not all files can be memory mapped. There’s also no automatic way to use memory mapping for writes with the Data APIs. While I could roll my own, I’m not sure what the benefit would be.
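Here is a minimal sketch of the read side in practice, assuming the file holds JSON. `Settings` and `loadSettings(from:)` are hypothetical names used only for illustration. The one API detail that matters is the `.mappedIfSafe` option on `Data(contentsOf:options:)`.

```swift
import Foundation

// Hypothetical payload type for the example.
struct Settings: Codable, Equatable {
    var name: String
    var retries: Int
}

/// Decodes JSON from disk, letting Foundation memory map the file if it can.
func loadSettings(from url: URL) throws -> Settings {
    // `.mappedIfSafe` is a hint: the file is mapped when that is possible and
    // considered safe, otherwise Foundation reads it normally.
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    // When the file is mapped, pages are faulted in as the decoder touches the bytes.
    return try JSONDecoder().decode(Settings.self, from: data)
}
```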

Option 2: use URLSession

The next option I discovered was leveraging URLSession to do asynchronous reads from file URLs.

Like this:


let (data, _) = try await URLSession.shared.data(from: fileURL)

For reading the entire file in one go, this works well. Unfortunately, I couldn’t find a way to “upload” to a file URL. Probably for good reason. So this is a read-only solution.

Quick aside about async URLSession: you probably don’t want to use the bytes(from:) method, which hands back the contents as an async sequence of individual bytes.

Option 3: wrap up FileHandle

This is the first solution I found that handles both asynchronous reads and writes. The basic principle is simple: create a FileHandle then use the readabilityHandler or writeabilityHandler properties to either read data out or write data into it. For reading, I returned an AsyncStream<Data> that the readabilityHandler yields to.

Side note: I do not recommend using the bytes property to iterate one byte at a time.

I got really far with this approach and had it working. However, for files, I found it clunky. Namely:

When reading, I had to get the size of the file first so I’d know when the readabilityHandler was being called for the last time. A lot of sample code assumes that availableData will be empty on the last call. However, in my testing, I found that was true for Pipes but not for files.
When writing, the call to write was still synchronous. Since I was doing it inside the writeabilityHandler, presumably it wouldn’t actually block, but that isn’t clear.

After getting FileHandle to work, I decided it was too clunky and went for a different approach. But it is an option.

Option 4: DispatchIO

This is the approach I ended up using. DispatchIO is low-level, but its API is consistent and straightforward to wrap up. The basic principle is to use DispatchIO for non-blocking reads/writes and wrap those in checked continuations. That way Swift async/await knows when to suspend or resume the calling Task. Below I go into detail on my implementation. As a bonus on top of async/await, this is my first go at a non-copyable type.

I’ll start with the type declarations:


/// A phantom type used by AsyncFileStream to restrict methods to read mode
public enum ReadMode {}
/// A phantom type used by AsyncFileStream to restrict methods to write mode
public enum WriteMode {}

public struct AsyncFileStream<Mode>: ~Copyable {
    // ...
}

The actual type of AsyncFileStream is complicated by the fact that I’m trying to make invalid operations impossible for the caller. I achieve this two ways:

I use a phantom type called Mode that can be either ReadMode or WriteMode. Read methods are only available when Mode == ReadMode, and write methods only when Mode == WriteMode.
I mark AsyncFileStream as non-copyable with ~Copyable. This means there can only be one copy of an instance, which prevents multiple Tasks/threads/whatever from calling it at the same time. It also means I get a nice deinit method to clean up in, and I can mark my close() method as consuming. That means the instance can’t be used after close() is called; if it is, the compiler emits an error.
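To see the two mechanisms in isolation, here is a toy in-memory model rather than real file I/O. `ToyStream`, `openToyForReading(_:)` and `openToyForWriting()` are hypothetical names. Only the overall shape mirrors AsyncFileStream: a phantom `Mode`, conditional extensions, `~Copyable`, and a `consuming` close. It assumes Swift 5.9 or later for noncopyable types.

```swift
// Phantom types: never instantiated, only used as generic arguments.
enum ReadMode {}
enum WriteMode {}

/// Hypothetical in-memory stand-in for AsyncFileStream.
struct ToyStream<Mode>: ~Copyable {
    fileprivate var bytes: [UInt8]

    fileprivate init(bytes: [UInt8]) {
        self.bytes = bytes
    }

    /// Consumes the stream; the caller cannot use it afterwards.
    consuming func close() -> [UInt8] {
        bytes
    }
}

// Only available when Mode == ReadMode.
extension ToyStream where Mode == ReadMode {
    func readAll() -> [UInt8] { bytes }
}

// Only available when Mode == WriteMode.
extension ToyStream where Mode == WriteMode {
    mutating func write(_ more: [UInt8]) { bytes += more }
}

// Factory functions fix Mode, so callers never have to spell it out.
func openToyForReading(_ bytes: [UInt8]) -> ToyStream<ReadMode> { ToyStream(bytes: bytes) }
func openToyForWriting() -> ToyStream<WriteMode> { ToyStream(bytes: []) }
```

With this in place, the compiler rejects two kinds of misuse instead of leaving them to fail at runtime. Calling `readAll()` on a `ToyStream<WriteMode>` is an error, and so is touching a stream after `close()`.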

AsyncFileStream has the properties you’d probably expect if you’ve used DispatchIO before:


/// The queue to run the operations on
private let queue: DispatchQueue
/// The unix file descriptor for the open file
private let fileDescriptor: Int32
/// The DispatchIO instance used to issue operations
private let io: DispatchIO
/// If the file is open or not; used to prevent double closes()
private var isClosed = false

If you haven’t used DispatchIO before, don’t worry, these properties will make sense when I use them.

Creating an instance is actually the most involved part of this whole type, but it’s a good place to start:


fileprivate init(url: URL, mode: Int32) throws {
    guard url.isFileURL else {
        throw AsyncFileStreamError.notFileURL
    }
    // Since we're reading/writing as a stream, keep it a serial queue
    let queue = DispatchQueue(label: "AsyncFileStream")
    let fileDescriptor = open(url.absoluteURL.path, mode, 0o666)
    // Once we start setting properties, we can't throw. So check to see if
    //  we need to throw now, then set properties
    if fileDescriptor == -1 {
        throw AsyncFileStreamError.openError(errno)
    }
    self.queue = queue
    self.fileDescriptor = fileDescriptor
    io = DispatchIO(
        type: .stream,
        fileDescriptor: fileDescriptor,
        queue: queue,
        cleanupHandler: { [fileDescriptor] error in
            // Unfortunately, we can't seem to do anything with `error`.
            // There are no guarantees when this closure is invoked, so
            //  the safe thing would be to save the error in an actor
            //  that the AsyncFileStream holds. That would allow the caller
            //  to check for it, or the read()/write() methods to check
        //  for it as well. However, having an actor as a property
            //  on a non-copyable type appears to uncover a compiler bug.

            // Since we opened the file, we need to close it
            Darwin.close(fileDescriptor)
        }
    )
}

First, notice the init is fileprivate. I’m going to provide much more ergonomic ways to create an instance than making callers know how to fill out mode correctly. Those ways share a lot of code, so this init takes care of it. Error handling in the init of a non-copyable type is tricky. I can only throw errors up to the point where the init starts setting stored properties. Once that starts, the init must guarantee success (i.e. no throws after), or it’s a compile-time error. So early on I verify that I have a file URL and that open() succeeded.

With those checks done, I set up DispatchIO. I create the DispatchQueue to execute the I/O operations on. Since I’m going to be reading and writing as a stream, a serial queue makes sense. I open the file with the POSIX open() API, using the passed-in mode. I set the permissions to 0o666, which requests read/write for everyone. (The process umask still applies, and the permissions only matter when O_CREAT actually creates the file.) As mentioned before, I create DispatchIO as a .stream, since I’m going to read through the file or write it out sequentially.

The cleanupHandler is another tricky bit. It’s called once DispatchIO is done with our fileDescriptor, e.g. after someone has called close(). Since we opened the file, we take care of closing it. I’m also given an error as a parameter to my closure; if it’s non-zero, something went wrong. Unfortunately, I ran into a compiler bug (according to the compiler itself) when trying to handle it.
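The post never shows the declaration of AsyncFileStreamError. It only uses the cases notFileURL, openError, readError and writeError, and the last three each carry an errno value. Below is a plausible sketch under that assumption. The `description` text and the `openOrThrow(_:)` helper are hypothetical additions that show how an open() failure maps onto the enum.

```swift
import Foundation

/// Assumed shape of the error type: the cases match the ones the post throws.
public enum AsyncFileStreamError: Error, CustomStringConvertible {
    case notFileURL
    case openError(Int32)   // errno from open()
    case readError(Int32)   // error code from DispatchIO's read handler
    case writeError(Int32)  // error code from DispatchIO's write handler

    // Hypothetical human-readable text built from strerror().
    public var description: String {
        switch self {
        case .notFileURL:
            return "URL is not a file URL"
        case .openError(let code):
            return "open failed: " + String(cString: strerror(code))
        case .readError(let code):
            return "read failed: " + String(cString: strerror(code))
        case .writeError(let code):
            return "write failed: " + String(cString: strerror(code))
        }
    }
}

/// Hypothetical helper: open a path read-only, turning failure into openError(errno).
func openOrThrow(_ path: String) throws -> Int32 {
    let fd = open(path, O_RDONLY)
    if fd == -1 {
        throw AsyncFileStreamError.openError(errno)
    }
    return fd
}
```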

A slight diversion about said compiler bug

What I’d like to do to handle the cleanupHandler error is to have a private actor like this:


final actor AsyncError {
    private(set) var error: AsyncFileStreamError?

    func setError(_ error: AsyncFileStreamError) {
        self.error = error
    }
}

The idea is to always create an instance in the init and keep it as a property on AsyncFileStream. Inside my cleanupHandler, I could then set the actual error on it:


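cleanupHandler: { [fileDescriptor, asyncError] error in
    if error != 0 {
        Task {
            // `error` is a raw errno (Int32); `.closeError` is a hypothetical
            //  AsyncFileStreamError case used here to carry it
            await asyncError.setError(.closeError(error))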
        }
    }
    // Since we opened the file, we need to close it
    Darwin.close(fileDescriptor)
}

Since it’s a property, the read and write methods could check it in a guard statement or the caller could explicitly read it off to check for an error. Unfortunately, when I actually implement this, attempting to touch the actor instance leads to:

Usage of a noncopyable type that compiler can't verify. This is a compiler bug. Please file a bug with a small example of the bug

Welp.

Moving on…

Creating an instance

I wanted to be able to create an AsyncFileStream without remembering all the various mode flags for open(). Also, while I added the Mode phantom type, I’d rather it be invisible to the caller, i.e. they shouldn’t have to type out the Mode explicitly anywhere. With a direct init they would, as there’s no way to infer it. For those reasons, I added an extension to URL that creates an AsyncFileStream:


public extension URL {
    /// Create an instance from the URL for reading only
    func openForReading() throws -> AsyncFileStream<ReadMode> {
        try AsyncFileStream<ReadMode>(url: self, mode: O_RDONLY)
    }

    /// Create an instance from the URL for writing. It will overwrite if the file
    /// already exists or create it if it does not exist.
    func openForWriting() throws -> AsyncFileStream<WriteMode> {
        try AsyncFileStream<WriteMode>(url: self, mode: O_WRONLY | O_TRUNC | O_CREAT)
    }
}

If I have a URL, I just have to call openForReading() or openForWriting(). Does what it says on the tin.

Cleaning up safely

Next, I’ll cover the parts of the type that are available regardless of Mode, namely closing the file. Because this is a non-copyable type, close() gets interesting (in a good way):


/// Close the file. Consuming method
public consuming func close() {
    isClosed = true
    io.close()
}

The body is a bit boring. It marks the instance as closed so deinit doesn’t close it again, then closes the DispatchIO object. What’s interesting is the consuming modifier. On a method, it means self is consumed by the call, so the caller can’t call any methods on the instance afterwards without getting a compiler error. For example:


let stream = try fileURL.openForReading() // AsyncFileStream<ReadMode>
stream.close() // stream is now consumed

try await stream.readToEnd() // compiler error: stream was consumed by close()

That’s pretty neat! I’ve prevented an illegal operation at compile time.

To circle back to why close() sets isClosed to true: in a consuming method, I could (if the type allowed it) call discard self. That ends self’s lifetime without running deinit. Unfortunately, discard self only works if the type’s stored properties are all “trivial”, and AsyncFileStream apparently contains non-trivial ones. From what I can discern, “trivial” means the value can be copied bit-for-bit, with no reference counting. DispatchQueue and DispatchIO are class references, so they don’t qualify. Since I can’t discard self, I need to mark something so deinit knows not to call io.close() a second time.
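For contrast, here is a sketch of a type where `discard self` is permitted. `OwnedDescriptor` and `closeRawDescriptor(_:)` are hypothetical names. The point is that the type’s only stored property is a trivial `Int32`, so its consuming `close()` can skip deinit without an isClosed flag. This assumes a toolchain with noncopyable types and `discard` (Swift 5.9 or later).

```swift
import Foundation

// File-scope helper, so `close` resolves to close(2) rather than a method named close().
func closeRawDescriptor(_ fd: Int32) {
    _ = close(fd)
}

/// Hypothetical owner of a raw file descriptor. Its only stored property is a
/// trivial Int32, so a consuming method is allowed to `discard self`.
struct OwnedDescriptor: ~Copyable {
    let raw: Int32

    init(raw: Int32) {
        self.raw = raw
    }

    /// Explicit close: `discard self` ends the value's lifetime without running
    /// deinit, so there is no double close and no flag is needed.
    consuming func close() {
        closeRawDescriptor(raw)
        discard self
    }

    deinit {
        // Only reached when close() was never called.
        closeRawDescriptor(raw)
    }
}
```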

Speaking of deinit:


deinit {
    // Ensure we've closed the file if we're going out of scope
    if !isClosed {
        io.close()
    }
}

Since I’ve got a non-copyable type, I get a deinit, which is just the bee’s knees. If I haven’t closed the file already, I do so now so I don’t leak anything.

Reading

If I create the type in ReadMode, there are a couple of read methods available.


public extension AsyncFileStream where Mode == ReadMode {
    /// Read the entire contents of the file in one go
    func readToEnd() async throws -> DispatchData { ... }

    /// Read the next `length` bytes.
    func read(upToCount length: Int) async throws -> DispatchData { ... }
}

Like marking close() as consuming, putting these methods in a conditional extension is meant to prevent misuse. Unless I open AsyncFileStream in ReadMode, the read methods simply aren’t available, so calling one is a compile-time error.

As mentioned earlier, the basic idea of read is to wrap DispatchIO’s read in a continuation:


/// Read the next `length` bytes.
func read(upToCount length: Int) async throws -> DispatchData {
    try await withCheckedThrowingContinuation { continuation in
        var readData = DispatchData.empty
        io.read(offset: 0, length: length, queue: queue) { done, data, error in
            if let data {
                readData.append(data)
            }
            guard done else {
                return // not done yet
            }
            if error != 0 {
                continuation.resume(throwing: AsyncFileStreamError.readError(error))
            } else {
                continuation.resume(returning: readData)
            }
        }
    }
}

This is straightforward: wrap io.read() in a checked throwing continuation. For my purposes, I built up a single DispatchData and returned it when finished, because I want to use the entire contents as a single Data instance. However, I could have returned an AsyncStream<DispatchData> and streamed out the data as it arrived in the callback. Note also that DispatchIO explicitly marks the final callback with a done flag. Glaring at you, FileHandle.

Also note that because I’m using DispatchIO’s stream mode, offset is ignored, which is why it’s always 0.
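The streaming alternative returns an async sequence instead of one accumulated DispatchData. It might look roughly like the sketch below, which is not the post’s implementation. `chunks(ofFileAt:)` and `ChunkReadError` are hypothetical names. The sketch uses a free function rather than an AsyncFileStream extension so that it stays self-contained.

```swift
import Foundation
import Dispatch

/// Hypothetical error carrying the error code DispatchIO reports.
struct ChunkReadError: Error {
    let code: Int32
}

/// Yields each chunk DispatchIO hands back instead of accumulating one DispatchData.
func chunks(ofFileAt path: String) throws -> AsyncThrowingStream<DispatchData, Error> {
    let fd = open(path, O_RDONLY)
    guard fd != -1 else { throw ChunkReadError(code: errno) }
    let queue = DispatchQueue(label: "chunks")
    // The cleanup handler runs once DispatchIO is done with the descriptor.
    let io = DispatchIO(type: .stream, fileDescriptor: fd, queue: queue) { _ in
        _ = close(fd)
    }
    return AsyncThrowingStream { continuation in
        // Stream mode ignores the offset; length .max means "until end of file".
        io.read(offset: 0, length: .max, queue: queue) { done, data, error in
            if let data, !data.isEmpty {
                continuation.yield(data)
            }
            guard done else { return }
            if error != 0 {
                continuation.finish(throwing: ChunkReadError(code: error))
            } else {
                continuation.finish()
            }
            io.close()
        }
    }
}
```

This sketch does not handle a consumer that stops iterating early. In that case the read keeps running until EOF. Setting the continuation’s `onTermination` to call `io.close(flags: .stop)` is one way to address that.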

Since my goal was to read the entire file at once, I added a helper method to do that:


/// Read the entire contents of the file in one go
func readToEnd() async throws -> DispatchData {
    try await read(upToCount: .max)
}

It just calls through to the normal read().

Writing

Like reading, writing is gated on the Mode. Writing also works the same way as reading: I wrap a non-blocking call with a callback in a checked throwing continuation. Ugly, but straightforward and effective (like me!):


public extension AsyncFileStream where Mode == WriteMode {
    /// Write the data out to file async
    func write(_ data: DispatchData) async throws {
        try await withCheckedThrowingContinuation { continuation in
            io.write(
                offset: 0,
                data: data,
                queue: queue
            ) { done, _, error in
                guard done else {
                    return // not done yet
                }
                if error != 0 {
                    continuation.resume(throwing: AsyncFileStreamError.writeError(error))
                } else {
                    continuation.resume(returning: ())
                }
            }
        } as Void
    }
}

The main difference from the structure of the read is I ignore the second parameter in the callback, which is the data remaining to write out.
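Because the channel is in stream mode, consecutive writes land one after another in the file. The sketch below writes several chunks through a single channel and awaits each one with the same continuation pattern. `writeChunks(_:toFileAt:)` and `WriteFailure` are hypothetical names, and the 0o644 permissions are an arbitrary choice for the example.

```swift
import Foundation
import Dispatch

/// Hypothetical error carrying the error code DispatchIO reports.
struct WriteFailure: Error {
    let code: Int32
}

/// Writes each chunk in order through one stream-mode channel, awaiting each write.
func writeChunks(_ chunks: [Data], toFileAt path: String) async throws {
    let fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, 0o644)
    guard fd != -1 else { throw WriteFailure(code: errno) }
    let queue = DispatchQueue(label: "writeChunks")
    let io = DispatchIO(type: .stream, fileDescriptor: fd, queue: queue) { _ in
        _ = close(fd) // we opened the descriptor, so we close it
    }
    defer { io.close() }

    for chunk in chunks {
        // Copy the bytes into DispatchData, as asyncWrite(to:) does.
        let dispatchData = chunk.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
            DispatchData(bytes: raw)
        }
        try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, Error>) in
            // Stream mode ignores the offset, so each write follows the previous one.
            io.write(offset: 0, data: dispatchData, queue: queue) { done, _, error in
                guard done else { return }
                if error != 0 {
                    continuation.resume(throwing: WriteFailure(code: error))
                } else {
                    continuation.resume()
                }
            }
        }
    }
}
```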

Some Data convenience

Finally, I get back to where I started. Namely, I want convenience methods on Data that read and write entire files, but asynchronously. All I do is wrap up the methods I just defined:


public extension Data {
    /// Asynchronously read from the contents of the fileURL. This method
    /// will throw an error if it's not a file URL.
    init(asyncContentsOf url: URL) async throws {
        let stream = try url.openForReading()
        self = try await Data(stream.readToEnd())
    }

    /// Asynchronously write the contents of self into the fileURL.
    func asyncWrite(to url: URL) async throws {
        // This line makes me sad because we're copying the data. I'm not
        //  currently aware of a way to not copy these bytes.
        let dispatchData = withUnsafeBytes { DispatchData(bytes: $0) }
        let stream = try url.openForWriting()
        try await stream.write(dispatchData)
    }
}

The one thing of note is that in the asyncWrite() method I make a copy of the bytes when I instantiate DispatchData. I would really like to avoid that, but I couldn’t find a way. If you know one, please send me an email or message me on Mastodon.

And that’s it!

Conclusion

In this post, I started with a couple of very convenient methods on Data that make it easy to read or write entire files. However, they’re synchronous, which can cause problems in Swift’s async/await world. I then walked through various async-safe ways of reading and writing files. Finally, I built out a general solution for async file I/O by wrapping DispatchIO in checked continuations.

Safe from the Losing Fight

mac and ios development
