Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe Access to Contiguous Storage #2307

Merged
merged 73 commits into from
Sep 18, 2024

Conversation

glessard
Copy link
Contributor

@glessard glessard commented Feb 6, 2024

This is a proposal for safe access to borrowed memory, adding Span (formerly known as "BufferView") and RawSpan.

This proposal is a companion to (and depends on) Non-Escapable Types and on Lifetime Dependency Annotations for Non-escapable Types.

@rjmccall rjmccall added workgroup: blocked This proposal is blocked on some other consideration LSG Contains topics under the domain of the Language Steering Group labels Apr 1, 2024
@atrick atrick self-requested a review April 26, 2024 04:28
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved
proposals/nnnn-safe-shared-contiguous-storage.md Outdated Show resolved Hide resolved

```swift
extension Span where Element: ~Copyable & ~Escapable {
func withUnsafeBufferPointer<E: Error, Result: ~Copyable & ~Escapable>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Result can be ~Escapable here without the lifetimes proposal.

I have a meta-question about the design here. Span is already non-escaping itself, and these operations that produce an unsafe buffer pointer obviously have to break that requirement. Given that, it doesn't feel like using the closure here is providing extra safety over (e.g.) a property that produces the UnsafeBufferPointer, but it's a lot less ergonomic. Could we consider instead to have:

var unsafeBufferPointer: UnsafeBufferPointer<Element> { get }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span.withUnsafeBufferPointer ensures that the Span's lifetime, and whatever that span depends on, extend across all valid uses of the pointer. The rules for pointer validity in closure taking methods are clear. There's nothing new here. As long as programmers follow the usual rules, they can't introduce any new form of unsafety.
The compiler should be able to diagnose pointer escapes from the closure, but that's a separate discussion!

Comment on lines 234 to 243
func withUnsafeBytes<E: Error, Result: ~Copyable & ~Escapable>(
_ body: (_ buffer: UnsafeRawBufferPointer) throws(E) -> Result
) throws(E) -> Result
}

extension RawSpan {
func withUnsafeBytes<E: Error, Result: ~Copyable & ~Escapable>(
_ body: (_ buffer: UnsafeRawBufferPointer) throws(E) -> Result
) throws(E) -> Result
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion about having a property here:

var unsafeTypes: UnsafeRawBufferPointer { get }

}
```

##### Accessing subranges of elements:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted earlier on, all of the extracting functions depend on lifetime dependencies. Perhaps provide a closure-based withSubspan?

/// - Parameters:
/// - type: The type as which to view the bytes of this span.
/// - Returns: A typed span viewing these bytes as instances of `T`.
public func unsafeView<T: BitwiseCopyable>(as type: T.Type) -> Span<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this requires lifetime dependencies. Change it to a closure-taking withUnsafeView?

/// The closure's parameter is valid only for the duration of
/// its execution.
/// - Returns: The return value of the `body` closure parameter.
func withUnsafeBytes<E: Error, Result: ~Copyable & ~Escapable>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted earlier, can this be a property?

```swift
extension RawSpan {

public func extracting(_ bounds: Range<Int>) -> Self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as earlier; this is based on lifetime dependencies.


Some types store their internal representation in a piecewise-contiguous manner, such as trees and ropes. Some operations naturally return information in a piecewise-contiguous manner, such as network operations. These could supply results by iterating through a list of contiguous chunks of memory.

#### <a name="MutableSpan"></a>Safe mutations of memory with `MutableSpan<T>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to this proposal! ;)

public func extracting(droppingFirst k: Int) -> Self
/// - Parameters:
/// - indices: a range of indices to validate
public func boundsPrecondition(_ indices: Range<Int>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.boundsPrecondition(range) is just precondition(span.boundsContain(range)), right? Do we really need/want sugar for this?

Re: naming, I think things get somewhat easier if we drop the precondition func, then we just have boundsContain, whose naming isn't great because it's kind of oblique. The question we're really asking is "are the indices in this range valid for this span?". Something like span.indicesContain(range), though that then raises the question of "why isn't this just span.indices.contains(range)?" (probably performance?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have these, do we also need ClosedRange versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, boundsPrecondition's implementation is precondition(boundsContain(range)). There's maybe an argument to be made that the sugar hurts here, since it doesn't capture the correct line of code in the defaulted arguments of precondition().

I haven't needed the ClosedRange versions because the RangeExpression-taking functions end up forwarding to the usual Range-taking functions. Maybe we would want them. We could take out the sugared preconditions and add ClosedRange bounds-checking functions.

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great to me, thank you

public func extracting(droppingFirst k: Int) -> Self
/// - Parameters:
/// - indices: a range of indices to validate
public func boundsPrecondition(_ indices: Range<Int>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have these, do we also need ClosedRange versions?

Comment on lines +485 to +487
public func withSpan<E: Error, Result: ~Copyable>(
_ body: (_ elements: Span<Element>) throws(E) -> Result
) throws(E) -> Result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm one of those people who would like this API :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complex semantics of this makes it undesirable, as exhibited by the overcomplicated signature. What we actually need is a storage property that directly returns a Span, tied to the lifetime of self.

```swift
@frozen
public struct Span<Element: ~Copyable & ~Escapable>: Copyable, ~Escapable {
internal var _start: UnsafePointer<Element>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span has the wrong pointer type: UnsafePointer<Element>. The underlying pointer should be UnsafeRawPointer. An important goal of Span is that it can be used to reinterpret memory as different BitwiseCopyable elements. It needs to be possible to convert Span<T: BitwiseCopyable> to Span<U: BitwiseCopyable>. That should require an explicit unsafe API but should still be well-defined. If you do this conversion with an underlying typed pointer, you get undefined behavior. We want multiple read-only spans to safely view the same underlying memory as different types. We also want to be able to view slices of a RawSpan as a particular element type. This is the same problem. Note that RawSpan does not provide any additional functionality; it is only a convenience for buffers of heterogenous elements, where declaring an element type might be misleading at the API level.

We do not want to describe strict aliasing rules and undefined behavior in the Span API!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm; unrelated to the above: didn't we intend to use buffer-pointer layout for this? The start pointer ought to be optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to be optional, with an explanation (hadn't noticed that a push failed on Friday.) We won't use buffer-pointer layout after all, because we do want the pointer to be raw.

extension RawSpan {
public init<T: BitwiseCopyable>(_ span: Span<T>)

public func unsafeView<T: BitwiseCopyable>(as type: T.Type) -> Span<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're mentioning RawSpan.unsafeView<T: BitwiseCopyable>(as type: T.Type) -> Span<T> here, we may also want to explain that this is a safe API as long as the memory contains a value bit pattern for T at each position. Span<T> does not impose any strict aliasing demands on the programmer; they can use multiple read-only spans to safely view the same underlying memory as different types.

public var count: Int { get }
public var isEmpty: Bool { get }

public subscript(_ position: Int) -> Element { _read }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public subscript(_ position: Int) -> Element { _read }

This is not, in fact, the API that we want long term; it won't be compatible with lifetime-dependent values that are derived from the span.

We really want some new spelling for ~Copyable projections, like:

public subscript(_ position: Int) -> Element { borrow }

For now, we could say:

public subscript(_ position: Int) -> Element { unsafeAddress }

Either way, we end up with an extra mangling for this API that may become vestigial in the future. But with unsafeAddress, at least we have the full functionality of some future, safe "borrow accessor".

@jckarter what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The future directions attempts to explain that the _read accessor is a placeholder. I expect the implementation of such placeholders to be non-ABI until there is a final spelling. I could add a reference to the future directions item after this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I realize the plan is to initially implement Span.subscript via _read. I'll just point out that it will limit usability. unsafeAddress will at least support the same use cases that a borrow accessor would support.

- add `boundsContain(_ bounds: ClosedRange<Int>)`
```swift
@frozen
public struct Span<Element: ~Copyable & ~Escapable>: Copyable, ~Escapable {
internal var _start: UnsafePointer<Element>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm; unrelated to the above: didn't we intend to use buffer-pointer layout for this? The start pointer ought to be optional.

proposals/nnnn-span-access-shared-contiguous-storage.md Outdated Show resolved Hide resolved
/// - Parameters:
/// - index: an index to validate
/// - Returns: true if `index` is a valid index
public func boundsContain(_ index: Int) -> Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name needs to be reconsidered. isValidIndex(_:)

proposals/nnnn-span-access-shared-contiguous-storage.md Outdated Show resolved Hide resolved
/// Parameters:
/// - span: a span of the same type as `self`
/// Returns: whether `span` is a subrange of `self`
public func contains(_ span: borrowing Self) -> Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we have blown the contains name to mean elementwise containment, not structural containment. This usage is not compatible with that, so we need to use a different name for it.

isSuperspan(of:)? isSubspan(of:)? isWider(than:)? 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notes from in person discussions]

  1. span.isWithin(other) appears to be a workable spelling for this.

  2. We also need to have an isIdentical(to:) method, to check equivalence. (a.isIdentical(to: b) == a.isWithin(b) && b.isWithin(a), but we wouldn't want to spell/implement it that way.)

  3. If the elements are bitwise copyable, then this operation needs to check alignment, which is a surprising complication, explaining why we'd want to ship this functionality in the stdlib. (This also needs to be called out within the docs.)

    This may also turn into a solid argument against using the isWithin name, especially when considering the equivalent APIs on Range in point 5 below. (If so, perhaps something like isSubspan is the way to go.)

  4. We should add equivalent members to the UnsafeBufferPointer types, to keep the stdlib coherent with itself, and to resolve some long standing omissions of basic functionality.

    extension Unsafe[Mutable]BufferPointer {
      func isIdentical(to other: UnsafeBufferPointer<Element>) -> Bool
      func isIdentical(to other: UnsafeMutableBufferPointer<Element>) -> Bool
      func isWithin(_ other: UnsafeBufferPointer<Element>) -> Bool
      func isWithin(_ other: UnsafeMutableBufferPointer<Element>) -> Bool
    }
  5. isWithin ought to also be added to the range types:

    extension Range {
      func isWithin(other: Range) -> Bool
      func isWithin(other: ClosedRange) -> Bool
    }
    extension ClosedRange {
      func isWithin(other: Range) -> Bool
      func isWithin(other: ClosedRange) -> Bool
    }

    (These types already provide the functionality of isIdentical in their == operation.)

    The notion of unaligned bounds makes this somewhat dangerous -- Range<UnsafePointer<T>>.isWithin(_:) would sometimes return different results than an isWithin call on UnsafeBufferPointer or Span values over the same region of memory.

  6. The preexisting [Closed]Range.overlaps hints at a need to provide [Raw]Span.overlaps and Unsafe[Mutable][Raw]BufferPointer.overlaps operations. (As well as intersect and union/merge/combine operations.) This does not seem to be an urgent need though.

Comment on lines +270 to +271
/// This is an unsafe operation. Failure to meet the preconditions
/// above may produce an invalid value of `T`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the address has incorrect alignment, are we really going to return an invalid value? (Rather than, say, sometimes causing a fault?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exist systems where a load instruction simply masks off the low bits and always loads whatever it finds at the resulting aligned address (AltiVec loads behave this way on PPC, for example).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unaligned loads can fault in embedded world as well, especially when targeting memory that isn't DRAM. In a sense, the worst outcome isn't necessarily an invalid value of T.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this is that there is no trap other than bounds checking, and the T instance's memory could be in a state that does not respect T's invariants.

Comment on lines 411 to 413
public func contains(_ span: borrowing Self) -> Bool

public func offsets(of span: borrowing Self) -> Range<Int>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problems as with Span -- contains implements the wrong semantics for that name, and offsets(of:) has an awkward name/shape.

Copy link
Member

@lorentey lorentey Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public func isWithin(_ other: borrowing Self) -> Bool // Perhaps removed
public func byteOffsets(of subspan: borrowing Self) -> Range<Int>?

Comment on lines +485 to +487
public func withSpan<E: Error, Result: ~Copyable>(
_ body: (_ elements: Span<Element>) throws(E) -> Result
) throws(E) -> Result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complex semantics of this makes it undesirable, as exhibited by the overcomplicated signature. What we actually need is a storage property that directly returns a Span, tied to the lifetime of self.


```swift
@frozen
public struct Span<Element: ~Copyable & ~Escapable>: Copyable, ~Escapable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we proposing Span with support for nonescapable elements?

@DougGregor DougGregor merged commit be7789f into swiftlang:main Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LSG Contains topics under the domain of the Language Steering Group workgroup: blocked This proposal is blocked on some other consideration
Projects
None yet
Development

Successfully merging this pull request may close these issues.