UnsafeCursor

class UnsafeCursor

A handle to the underlying data in a buffer. This handle is unsafe because it does not enforce its own invariants. Instead, it assumes a careful user who has studied Okio's implementation details and their consequences.

Buffer Internals

----------------

Most code should use Buffer as a black box: a class that holds 0 or more bytes of data with efficient APIs to append data to the end and to consume data from the front. Usually this is also the most efficient way to use buffers because it allows Okio to employ several optimizations, including:

Fast Allocation: Buffers use a shared pool of memory that is not zero-filled before use.
Fast Resize: A buffer's capacity can change without copying its contents.
Fast Move: Memory ownership can be reassigned from one buffer to another.
Fast Copy: Multiple buffers can share the same underlying memory.
Fast Encoding and Decoding: Common operations like UTF-8 encoding and decimal decoding do not require intermediate objects to be allocated.

These optimizations all leverage the way Okio stores data internally. Okio Buffers are implemented using a doubly-linked list of segments. Each segment is a contiguous range within an 8 KiB ByteArray. Each segment has two indexes: start, the offset of the first byte of the array containing application data, and end, the offset of the first byte beyond start whose data is undefined.

New buffers are empty and have no segments:

val buffer = Buffer()

We append 7 bytes of data to the end of our empty buffer. Internally, the buffer allocates a segment and writes its new data there. The lone segment has an 8 KiB byte array but only 7 bytes of data:

buffer.writeUtf8("sealion")

// [ 's', 'e', 'a', 'l', 'i', 'o', 'n', '?', '?', '?', ...]
//    ^                                  ^
// start = 0                          end = 7

When we read 4 bytes of data from the buffer, it finds its first segment and returns that data to us. As bytes are read, the data is consumed. The segment tracks this by adjusting its internal indexes.

buffer.readUtf8(4) // "seal"

// [ 's', 'e', 'a', 'l', 'i', 'o', 'n', '?', '?', '?', ...]
//                        ^              ^
//                     start = 4      end = 7

As we write data into a buffer we fill up its internal segments. When a write doesn't fit into a buffer's last segment, additional segments are allocated and appended to the linked list of segments. Each segment has its own start and end indexes tracking where the user's data begins and ends.

val xoxo = Buffer()
xoxo.writeUtf8("xo".repeat(5_000))

// [ 'x', 'o', 'x', 'o', 'x', 'o', 'x', 'o', ..., 'x', 'o', 'x', 'o']
//    ^                                                               ^
// start = 0                                                      end = 8192
//
// [ 'x', 'o', 'x', 'o', ..., 'x', 'o', 'x', 'o', '?', '?', '?', ...]
//    ^                                            ^
// start = 0                                   end = 1808

The start index is always inclusive and the end index is always exclusive. The data preceding the start index is undefined, and the data at and following the end index is undefined.
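The layout above can be observed from outside by using the cursor API described later on this page. The following is a minimal sketch rather than anything Okio provides: segmentSizes is a hypothetical helper that reports how many readable bytes (end - start) each segment of a buffer holds.

```kotlin
import okio.Buffer

// Hypothetical helper (not part of Okio): returns the number of readable
// bytes in each segment of the buffer, in order.
fun segmentSizes(buffer: Buffer): List<Int> {
  val sizes = mutableListOf<Int>()
  buffer.readUnsafe().use { cursor ->
    // next() returns -1 once there are no further segments.
    while (cursor.next() != -1) {
      sizes += cursor.end - cursor.start // end is exclusive
    }
  }
  return sizes
}
```

For the xoxo buffer above, the reported sizes should add up to the buffer's size of 10,000 bytes. With 8 KiB segments that corresponds to the two segments in the diagram, but the helper deliberately makes no assumption about the segment size.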

After the last byte of a segment has been read, that segment may be returned to an internal segment pool. In addition to reducing the need to do garbage collection, segment pooling also saves the JVM from needing to zero-fill byte arrays. Okio doesn't need to zero-fill its arrays because it always writes memory before it reads it. But if you look at a segment in a debugger you may see its effects. In this example, one of the "xoxo" segments above is reused in an unrelated buffer:

val abc = Buffer()
abc.writeUtf8("abc")

// [ 'a', 'b', 'c', 'o', 'x', 'o', 'x', 'o', ...]
//    ^              ^
// start = 0     end = 3

There is an optimization in Buffer.clone() and other methods that allows two segments to share the same underlying byte array. Clones can't write to the shared byte array; instead they allocate a new (private) segment early.

val nana = Buffer()
nana.writeUtf8("na".repeat(2_500))
nana.readUtf8(2) // "na"

// [ 'n', 'a', 'n', 'a', ..., 'n', 'a', 'n', 'a', '?', '?', '?', ...]
//              ^                                  ^
//           start = 2                         end = 5000

val nana2 = nana.clone()
nana2.writeUtf8("batman")

// [ 'n', 'a', 'n', 'a', ..., 'n', 'a', 'n', 'a', '?', '?', '?', ...]
//              ^                                  ^
//           start = 2                         end = 5000
//
// [ 'b', 'a', 't', 'm', 'a', 'n', '?', '?', '?', ...]
//    ^                             ^
//  start = 0                    end = 6

Segments are not shared when the shared region is small (i.e., less than 1 KiB). This is intended to prevent fragmentation in sharing-heavy use cases.

Unsafe Cursor API

-----------------

This class exposes privileged access to the internal byte arrays of a buffer. A cursor either references the data of a single segment, is positioned before the first segment (offset == -1), or is positioned after the last segment (offset == buffer.size).

Call UnsafeCursor.seek to move the cursor to the segment that contains a specified offset. After seeking, UnsafeCursor.data references the segment's internal byte array, UnsafeCursor.start is the segment's start and UnsafeCursor.end is its end.
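As an illustration, random access to a single byte might look like the sketch below. byteAt is a hypothetical helper, not an Okio API. It obtains a read-only cursor with Buffer.readUnsafe and assumes that, after a successful seek, the byte at the requested offset is readable at data[start].

```kotlin
import okio.Buffer

// Hypothetical helper (not part of Okio): reads the byte at `offset`
// without consuming anything from the buffer.
fun byteAt(buffer: Buffer, offset: Long): Byte {
  return buffer.readUnsafe().use { cursor ->
    // seek returns -1 when the cursor lands before the first segment
    // or after the last one, where there is no data to read.
    val readable = cursor.seek(offset)
    require(readable != -1) { "offset $offset has no readable data" }
    cursor.data!![cursor.start]
  }
}
```

Because the cursor is read-only and is closed by use, the buffer's contents and size are left unchanged.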

Call UnsafeCursor.next to advance the cursor to the next segment. This returns -1 if there are no further segments in the buffer.

Use Buffer.readUnsafe to create a cursor to read buffer data and Buffer.readAndWriteUnsafe to create a cursor to read and write buffer data. In either case, always call UnsafeCursor.close when done with a cursor. This is convenient with Kotlin's use extension function. In this example we read all of the bytes in a buffer into a byte array:

val bufferBytes = ByteArray(buffer.size.toInt())

buffer.readUnsafe().use { cursor ->
  while (cursor.next() != -1) {
    System.arraycopy(cursor.data, cursor.start,
        bufferBytes, cursor.offset.toInt(), cursor.end - cursor.start)
  }
}

Change the capacity of a buffer with resizeBuffer. This is only permitted for read+write cursors. The buffer's size always changes from the end: shrinking it removes bytes from the end; growing it adds capacity to the end.
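Growing a buffer and then filling the new capacity could be sketched as follows. appendFilled is a hypothetical helper, not part of Okio. It assumes that resizeBuffer returns the buffer's previous size, so that seeking to that offset lands on the first newly added byte.

```kotlin
import okio.Buffer

// Hypothetical helper (not part of Okio): grows the buffer by `byteCount`
// bytes and overwrites every new byte with `fill`.
fun appendFilled(buffer: Buffer, byteCount: Int, fill: Byte) {
  require(byteCount >= 0) { "byteCount < 0: $byteCount" }
  buffer.readAndWriteUnsafe().use { cursor ->
    // Assumption: resizeBuffer returns the size before the resize.
    val oldSize = cursor.resizeBuffer(buffer.size + byteCount)
    // The new capacity may span several segments, so walk all of them.
    var readable = cursor.seek(oldSize)
    while (readable != -1) {
      cursor.data!!.fill(fill, cursor.start, cursor.end)
      readable = cursor.next()
    }
  }
}
```

Every byte of the new capacity is written before the cursor is closed, so no stale pooled data becomes readable through the buffer.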

Warnings

--------

Most application developers should avoid this API. Those who must use it should respect these warnings.

Don't mutate a cursor. This class has public, non-final fields because that is convenient for low-level I/O frameworks. Never assign values to these fields; instead use the cursor API to adjust these.

Never mutate data unless you have read+write access. You are on the honor system to never write the buffer in read-only mode. Read-only mode may be more efficient than read+write mode because it does not need to make private copies of shared segments.

Only access data in [start..end). Other data in the byte array is undefined! It may contain private or sensitive data from other parts of your process.

Always fill the new capacity when you grow a buffer. New capacity is not zero-filled and may contain data from other parts of your process. Avoid leaking this information by always writing something to the newly-allocated capacity. Do not assume that new capacity will be filled with 0; it will not be.

Do not access a buffer while it is being accessed by a cursor. Even simple read-only operations like Buffer.clone are unsafe because they mark segments as shared.

Do not hard-code the segment size in your application. It is possible that segment sizes will change with advances in hardware. Future versions of Okio may even have heterogeneous segment sizes.
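A sketch that tries to respect the warnings above: it mutates data only through a read+write cursor, touches only the bytes in [start..end), and makes no assumption about the segment size. xorInPlace is a hypothetical helper, not an Okio API.

```kotlin
import okio.Buffer

// Hypothetical helper (not part of Okio): XORs every byte of the buffer
// with `key`, modifying the buffer's segments in place.
fun xorInPlace(buffer: Buffer, key: Byte) {
  // Read+write access is required because the loop writes to cursor.data.
  buffer.readAndWriteUnsafe().use { cursor ->
    while (cursor.next() != -1) {
      val data = cursor.data!!
      // Stay inside [start, end); bytes outside that range are undefined.
      for (i in cursor.start until cursor.end) {
        data[i] = (data[i].toInt() xor key.toInt()).toByte()
      }
    }
  }
}
```

Applying the helper twice with the same key restores the original bytes. A clone taken before the call should be unaffected, since a read+write cursor works on private copies of shared segments.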

These warnings are intended to help you to use this API safely. It's here for developers that need absolutely the most throughput. Since that's you, here's one final performance tip. You can reuse instances of this class if you like. Use the overloads of Buffer.readUnsafe and Buffer.readAndWriteUnsafe that take a cursor and close it after use.
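Reusing one cursor across several buffers might look like the sketch below. countSegments is a hypothetical helper, not part of Okio. It assumes that closing a cursor detaches it from its buffer, so the same instance can be passed to Buffer.readUnsafe again.

```kotlin
import okio.Buffer

// Hypothetical helper (not part of Okio): counts segments across several
// buffers while allocating only one cursor.
fun countSegments(buffers: List<Buffer>): Int {
  val cursor = Buffer.UnsafeCursor()
  var count = 0
  for (buffer in buffers) {
    // readUnsafe(cursor) attaches the supplied cursor and returns it.
    // use { } closes it afterwards so it can be attached to the next buffer.
    buffer.readUnsafe(cursor).use {
      while (cursor.next() != -1) count++
    }
  }
  return count
}
```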

class UnsafeCursor : Closeable