Conversation
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Nikita Popov (nikic) ChangesThis introduces The semantics are the same as replacing the store with a call like this: This metadata is intended for annotation by frontends -- it's not something we can feasibly infer at this point, as it would require analyzing uses of the pointer stored in memory. The motivating use case for this is Rust's 10 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8e863939781a2..22b58bf0f5735 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -1489,6 +1489,8 @@ Currently, only the following parameter attributes are defined:
function, returning a pointer to allocated storage disjoint from the
storage for any other object accessible to the caller.
+.. _captures_attr:
+
``captures(...)``
This attribute restricts the ways in which the callee may capture the
pointer. This is not a valid attribute for return values. This attribute
@@ -7543,6 +7545,33 @@ The number of bytes known to be dereferenceable is specified by the integer
value in the metadata node. This is analogous to the ''dereferenceable_or_null''
attribute on parameters and return values.
+'``captures``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``!captures`` metadata can only be applied to ``store`` instructions with
+a pointer-typed value operand. It restricts the capturing behavior of the store
+value operand in the same way the ``captures(...)`` attribute would do on a
+call. See the :ref:`pointer capture section <pointercapture>` for a detailed
+discussion of capture semantics.
+
+The ``!captures`` metadata accepts a non-empty list of strings from the same
+set as the :ref:`captures attribute <captures_attr>`:
+``!"address"``, ``!"address_is_null"``, ``!"provenance"`` and
+``!"read_provenance"``. ``!"none"`` is not supported.
+
+For example ``store ptr %x, ptr %y, !captures !{!"address"}`` indicates that
+the copy of pointer ``%x`` stored to location ``%y`` will only be used to
+inspect its integral address value, and not dereferenced. Dereferencing the
+pointer would result in undefined behavior.
+
+Similarly ``store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}``
+indicates that while reads through the stored pointer are allowed, writes would
+result in undefined behavior.
+
+The ``!captures`` attribute makes no statement about other uses of ``%x``, or
+uses of the stored-to memory location after it has been overwritten with a
+different value.
+
.. _llvm.loop:
'``llvm.loop``'
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index d09cc15d65ff6..0603abcd6a4da 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -55,3 +55,4 @@ LLVM_FIXED_MD_KIND(MD_mmra, "mmra", 40)
LLVM_FIXED_MD_KIND(MD_noalias_addrspace, "noalias.addrspace", 41)
LLVM_FIXED_MD_KIND(MD_callee_type, "callee_type", 42)
LLVM_FIXED_MD_KIND(MD_nofree, "nofree", 43)
+LLVM_FIXED_MD_KIND(MD_captures, "captures", 44)
diff --git a/llvm/include/llvm/IR/Instructions.h b/llvm/include/llvm/IR/Instructions.h
index 95a0a7fd2f97e..43e9b7b338589 100644
--- a/llvm/include/llvm/IR/Instructions.h
+++ b/llvm/include/llvm/IR/Instructions.h
@@ -393,6 +393,9 @@ class StoreInst : public Instruction {
return getPointerOperandType()->getPointerAddressSpace();
}
+ /// Get capturing behavior of the value operand, based on !captures metadata.
+ CaptureComponents getCaptureComponents() const;
+
// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const Instruction *I) {
return I->getOpcode() == Instruction::Store;
diff --git a/llvm/lib/Analysis/CaptureTracking.cpp b/llvm/lib/Analysis/CaptureTracking.cpp
index a0fe7f9037e47..0d4ed2dd4f2a7 100644
--- a/llvm/lib/Analysis/CaptureTracking.cpp
+++ b/llvm/lib/Analysis/CaptureTracking.cpp
@@ -320,8 +320,11 @@ UseCaptureInfo llvm::DetermineUseCaptureKind(const Use &U, const Value *Base) {
return CaptureComponents::None;
case Instruction::Store:
// Stored the pointer - conservatively assume it may be captured.
+ if (U.getOperandNo() == 0)
+ return cast<StoreInst>(I)->getCaptureComponents();
+
// Volatile stores make the address observable.
- if (U.getOperandNo() == 0 || cast<StoreInst>(I)->isVolatile())
+ if (cast<StoreInst>(I)->isVolatile())
return CaptureComponents::All;
return CaptureComponents::None;
case Instruction::AtomicRMW: {
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index daebf447a2107..220a713028392 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -1390,6 +1390,24 @@ StoreInst::StoreInst(Value *val, Value *addr, bool isVolatile, Align Align,
AssertOK();
}
+CaptureComponents StoreInst::getCaptureComponents() const {
+ const MDNode *MD = getMetadata(LLVMContext::MD_captures);
+ if (!MD)
+ return CaptureComponents::All;
+
+ CaptureComponents CC = CaptureComponents::None;
+ for (Metadata *Op : MD->operands()) {
+ CaptureComponents Component =
+ StringSwitch<CaptureComponents>(cast<MDString>(Op)->getString())
+ .Case("address", CaptureComponents::Address)
+ .Case("address_is_null", CaptureComponents::AddressIsNull)
+ .Case("provenance", CaptureComponents::Provenance)
+ .Case("read_provenance", CaptureComponents::ReadProvenance);
+ CC |= Component;
+ }
+ return CC;
+}
+
//===----------------------------------------------------------------------===//
// AtomicCmpXchgInst Implementation
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index b2e76cc7a8a90..8f835609904ed 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -542,6 +542,7 @@ class Verifier : public InstVisitor<Verifier>, VerifierSupport {
void visitAliasScopeMetadata(const MDNode *MD);
void visitAliasScopeListMetadata(const MDNode *MD);
void visitAccessGroupMetadata(const MDNode *MD);
+ void visitCapturesMetadata(Instruction &I, const MDNode *Captures);
template <class Ty> bool isValidMetadataArray(const MDTuple &N);
#define HANDLE_SPECIALIZED_MDNODE_LEAF(CLASS) void visit##CLASS(const CLASS &N);
@@ -5373,6 +5374,27 @@ void Verifier::visitAccessGroupMetadata(const MDNode *MD) {
}
}
+void Verifier::visitCapturesMetadata(Instruction &I, const MDNode *Captures) {
+ static const char *ValidArgs[] = {"address_is_null", "address",
+ "read_provenance", "provenance"};
+
+ auto *SI = dyn_cast<StoreInst>(&I);
+ Check(SI, "!captures metadata can only be applied to store instructions", &I);
+ Check(SI->getValueOperand()->getType()->isPointerTy(),
+ "!captures metadata can only be applied to store with value operand of "
+ "pointer type",
+ &I);
+ Check(Captures->getNumOperands() != 0, "!captures metadata cannot be empty",
+ &I);
+
+ for (Metadata *Op : Captures->operands()) {
+ auto *Str = dyn_cast<MDString>(Op);
+ Check(Str, "!captures metadata must be a list of strings", &I);
+ Check(is_contained(ValidArgs, Str->getString()),
+ "invalid entry in !captures metadata", &I, Str);
+ }
+}
+
/// verifyInstruction - Verify that an instruction is well formed.
///
void Verifier::visitInstruction(Instruction &I) {
@@ -5600,6 +5622,9 @@ void Verifier::visitInstruction(Instruction &I) {
if (MDNode *Annotation = I.getMetadata(LLVMContext::MD_annotation))
visitAnnotationMetadata(Annotation);
+ if (MDNode *Captures = I.getMetadata(LLVMContext::MD_captures))
+ visitCapturesMetadata(I, Captures);
+
if (MDNode *N = I.getDebugLoc().getAsMDNode()) {
CheckDI(isa<DILocation>(N), "invalid !dbg metadata attachment", &I, N);
visitMDNode(*N, AreDebugLocsAllowed::Yes);
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 123881e276584..c1ae0e4a19b41 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3025,6 +3025,9 @@ static void combineMetadata(Instruction *K, const Instruction *J,
// Preserve !nosanitize if both K and J have it.
K->setMetadata(Kind, JMD);
break;
+ case LLVMContext::MD_captures:
+ K->setMetadata(Kind, JMD ? MDNode::concatenate(JMD, KMD) : nullptr);
+ break;
}
}
// Set !invariant.group from J if J has it. If both instructions have it
diff --git a/llvm/test/Transforms/FunctionAttrs/nocapture.ll b/llvm/test/Transforms/FunctionAttrs/nocapture.ll
index 60a4214548a72..8113ba65fe422 100644
--- a/llvm/test/Transforms/FunctionAttrs/nocapture.ll
+++ b/llvm/test/Transforms/FunctionAttrs/nocapture.ll
@@ -1398,5 +1398,73 @@ define void @assume_nonnull(ptr %p) {
ret void
}
+define void @captures_metadata_address_is_null(ptr %x, ptr %y) {
+; FNATTRS: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; FNATTRS-LABEL: define void @captures_metadata_address_is_null
+; FNATTRS-SAME: (ptr captures(address_is_null) [[X:%.*]], ptr writeonly captures(none) initializes((0, 8)) [[Y:%.*]]) #[[ATTR17]] {
+; FNATTRS-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META0:![0-9]+]]
+; FNATTRS-NEXT: ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; ATTRIBUTOR-LABEL: define void @captures_metadata_address_is_null
+; ATTRIBUTOR-SAME: (ptr nofree writeonly [[X:%.*]], ptr nofree nonnull writeonly captures(none) [[Y:%.*]]) #[[ATTR13]] {
+; ATTRIBUTOR-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META0:![0-9]+]]
+; ATTRIBUTOR-NEXT: ret void
+;
+ store ptr %x, ptr %y, !captures !{!"address_is_null"}
+ ret void
+}
+
+define void @captures_metadata_address(ptr %x, ptr %y) {
+; FNATTRS: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; FNATTRS-LABEL: define void @captures_metadata_address
+; FNATTRS-SAME: (ptr captures(address) [[X:%.*]], ptr writeonly captures(none) initializes((0, 8)) [[Y:%.*]]) #[[ATTR17]] {
+; FNATTRS-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META1:![0-9]+]]
+; FNATTRS-NEXT: ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; ATTRIBUTOR-LABEL: define void @captures_metadata_address
+; ATTRIBUTOR-SAME: (ptr nofree writeonly [[X:%.*]], ptr nofree nonnull writeonly captures(none) [[Y:%.*]]) #[[ATTR13]] {
+; ATTRIBUTOR-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META1:![0-9]+]]
+; ATTRIBUTOR-NEXT: ret void
+;
+ store ptr %x, ptr %y, !captures !{!"address"}
+ ret void
+}
+
+define void @captures_metadata_address_read_provenance(ptr %x, ptr %y) {
+; FNATTRS: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; FNATTRS-LABEL: define void @captures_metadata_address_read_provenance
+; FNATTRS-SAME: (ptr captures(address, read_provenance) [[X:%.*]], ptr writeonly captures(none) initializes((0, 8)) [[Y:%.*]]) #[[ATTR17]] {
+; FNATTRS-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META2:![0-9]+]]
+; FNATTRS-NEXT: ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; ATTRIBUTOR-LABEL: define void @captures_metadata_address_read_provenance
+; ATTRIBUTOR-SAME: (ptr nofree writeonly [[X:%.*]], ptr nofree nonnull writeonly captures(none) [[Y:%.*]]) #[[ATTR13]] {
+; ATTRIBUTOR-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META2:![0-9]+]]
+; ATTRIBUTOR-NEXT: ret void
+;
+ store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
+ ret void
+}
+
+define void @captures_metadata_provenance(ptr %x, ptr %y) {
+; FNATTRS: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; FNATTRS-LABEL: define void @captures_metadata_provenance
+; FNATTRS-SAME: (ptr captures(provenance) [[X:%.*]], ptr writeonly captures(none) initializes((0, 8)) [[Y:%.*]]) #[[ATTR17]] {
+; FNATTRS-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META3:![0-9]+]]
+; FNATTRS-NEXT: ret void
+;
+; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
+; ATTRIBUTOR-LABEL: define void @captures_metadata_provenance
+; ATTRIBUTOR-SAME: (ptr nofree writeonly [[X:%.*]], ptr nofree nonnull writeonly captures(none) [[Y:%.*]]) #[[ATTR13]] {
+; ATTRIBUTOR-NEXT: store ptr [[X]], ptr [[Y]], align 8, !captures [[META3:![0-9]+]]
+; ATTRIBUTOR-NEXT: ret void
+;
+ store ptr %x, ptr %y, !captures !{!"provenance"}
+ ret void
+}
+
declare ptr @llvm.launder.invariant.group.p0(ptr)
declare ptr @llvm.strip.invariant.group.p0(ptr)
diff --git a/llvm/test/Transforms/SimplifyCFG/hoist-with-metadata.ll b/llvm/test/Transforms/SimplifyCFG/hoist-with-metadata.ll
index d34ac2bb30040..cba0251d1e18d 100644
--- a/llvm/test/Transforms/SimplifyCFG/hoist-with-metadata.ll
+++ b/llvm/test/Transforms/SimplifyCFG/hoist-with-metadata.ll
@@ -424,6 +424,134 @@ join:
ret ptr %phi
}
+define void @hoist_captures_same(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_same(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8, !captures [[META9:![0-9]+]]
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+else:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+out:
+ ret void
+}
+
+define void @hoist_captures_different(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_different(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8, !captures [[META10:![0-9]+]]
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+else:
+ store ptr %x, ptr %y, !captures !{!"read_provenance"}
+ br label %out
+
+out:
+ ret void
+}
+
+define void @hoist_captures_overlap(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_overlap(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8, !captures [[META11:![0-9]+]]
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+else:
+ store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
+ br label %out
+
+out:
+ ret void
+}
+
+; We could also omit the attribute in this case, as it provides no additional
+; information.
+define void @hoist_captures_full_set(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_full_set(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8, !captures [[META12:![0-9]+]]
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+else:
+ store ptr %x, ptr %y, !captures !{!"provenance"}
+ br label %out
+
+out:
+ ret void
+}
+
+define void @hoist_captures_only_one1(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_only_one1(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+else:
+ store ptr %x, ptr %y
+ br label %out
+
+out:
+ ret void
+}
+
+define void @hoist_captures_only_one2(i1 %c, ptr %x, ptr %y) {
+; CHECK-LABEL: @hoist_captures_only_one2(
+; CHECK-NEXT: if:
+; CHECK-NEXT: store ptr [[X:%.*]], ptr [[Y:%.*]], align 8
+; CHECK-NEXT: ret void
+;
+if:
+ br i1 %c, label %then, label %else
+
+then:
+ store ptr %x, ptr %y
+ br label %out
+
+else:
+ store ptr %x, ptr %y, !captures !{!"address"}
+ br label %out
+
+out:
+ ret void
+}
+
!0 = !{ i8 0, i8 1 }
!1 = !{ i8 3, i8 5 }
!2 = !{}
@@ -445,4 +573,8 @@ join:
; CHECK: [[META6]] = !{float 2.500000e+00}
; CHECK: [[META7]] = !{i32 5, i32 6}
; CHECK: [[META8]] = !{i32 4, i32 5}
+; CHECK: [[META9]] = !{!"address"}
+; CHECK: [[META10]] = !{!"read_provenance", !"address"}
+; CHECK: [[META11]] = !{!"address", !"read_provenance"}
+; CHECK: [[META12]] = !{!"provenance", !"address"}
;.
diff --git a/llvm/test/Verifier/captures-metadata.ll b/llvm/test/Verifier/captures-metadata.ll
new file mode 100644
index 0000000000000..ae08ddd036f16
--- /dev/null
+++ b/llvm/test/Verifier/captures-metadata.ll
@@ -0,0 +1,37 @@
+; RUN: not opt -passes=verify < %s 2>&1 | FileCheck %s
+
+; CHECK: !captures metadata can only be applied to store instructions
+define void @wrong_instr_type(ptr %x) {
+ load ptr, ptr %x, !captures !{!"address"}
+ ret void
+}
+
+; CHECK: captures metadata can only be applied to store with value operand of pointer type
+define void @wrong_op_type(i32 %x, ptr %y) {
+ store i32 %x, ptr %y, !captures !{!"address"}
+ ret void
+}
+
+; CHECK: !captures metadata cannot be empty
+define void @empty(ptr %x, ptr %y) {
+ store ptr %x, ptr %y, !captures !{}
+ ret void
+}
+
+; CHECK: !captures metadata must be a list of strings
+define void @not_string(ptr %x, ptr %y) {
+ store ptr %x, ptr %y, !captures !{!{}}
+ ret void
+}
+
+; CHECK: invalid entry in !captures metadata
+define void @invalid_str(ptr %x, ptr %y) {
+ store ptr %x, ptr %y, !captures !{!"foo"}
+ ret void
+}
+
+; CHECK: invalid entry in !captures metadata
+define void @invalid_none(ptr %x, ptr %y) {
+ store ptr %x, ptr %y, !captures !{!"none"}
+ ret void
+}
|
https://godbolt.org/z/T1479EG43 I am curious about how we annotate this in Rust. Do we provide an intrinsic like Anyway, from the LLVM side, I think it is profitable as we can salvage the capture information after inlining. |
What I had in mind is to just emit the attribute for all cases where a An alternative way to solve the problem would be to have an intrinsic like |
|
|
||
| For example ``store ptr %x, ptr %y, !captures !{!"address"}`` indicates that | ||
| the copy of pointer ``%x`` stored to location ``%y`` will only be used to | ||
| inspect its integral address value, and not dereferenced. Dereferencing the |
There was a problem hiding this comment.
I suppose this may be somewhat target-specific as well, though, longer term, could one envision the frontend emitting, e.g., !"address" when storing a pointer to a non-integral address space (and would that imply reading the visibile bits)? Not specific to the change, but didn't find any mention either in \pointercapture on provenance when it comes to non-integral pointers.
There was a problem hiding this comment.
Depends a bit on what kind of non-integral pointer we'd talking about here, but if it's about GC pointers, then you'd probably want the opposite, that is !captures !{!"provenance"} to indicate that accesses through the captured pointer are possible, but the address will not be inspected? Assuming the language does in fact not expose pointer addresses. I haven't thought too carefully on whether this would be safe. (Though I don't think we currently have any transforms that are able to take advantage of this.)
| ; | ||
| ; ATTRIBUTOR: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) | ||
| ; ATTRIBUTOR-LABEL: define void @captures_metadata_address_read_provenance | ||
| ; ATTRIBUTOR-SAME: (ptr nofree writeonly [[X:%.*]], ptr nofree nonnull writeonly captures(none) [[Y:%.*]]) #[[ATTR13]] { |
There was a problem hiding this comment.
Hopefully Attributor can be extended to infer the whole CaptureInfo too soon.
There was a problem hiding this comment.
This is tracked at #135610 -- but I see you're already assigned on that one ^^
| @@ -3025,6 +3025,9 @@ static void combineMetadata(Instruction *K, const Instruction *J, | |||
| // Preserve !nosanitize if both K and J have it. | |||
| K->setMetadata(Kind, JMD); | |||
| break; | |||
| case LLVMContext::MD_captures: | |||
| K->setMetadata(Kind, JMD ? MDNode::concatenate(JMD, KMD) : nullptr); | |||
There was a problem hiding this comment.
address_is_null + address can be address. It may be better to add a StoreInst::setCaptureComponents helper to update MD nodes. Then we can guarantee the list is always canonical.
There was a problem hiding this comment.
I've updated the implementation to always produce the canonical representation.
| ; CHECK: [[META10]] = !{!"read_provenance", !"address"} | ||
| ; CHECK: [[META11]] = !{!"address", !"read_provenance"} |
This introduces `!captures` metadata on stores, which looks like
this:
```
store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
```
The semantics are equivalent to replacing the store with a call
like this:
```
call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)
```
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this pointer, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery,
which involves storing a reference to the value inside a structure.
This means that printing code (including conditional debugging code),
can inhibit optimizations because the pointer escapes. With the new
metadata we could annotate this as a read-only capture, which would
have less impact on optimizations.
12e9d38 to
36fe181
Compare
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/25019 Here is the relevant piece of the build log for the reference |
This introduces `!captures` metadata on stores, which looks like this:
```
store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
```
The semantics are the same as replacing the store with a call like this:
```
call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)
```
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code), can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](#154341 (comment)) - [miri benchmark results](#154341 (comment))
Make retags an implicit part of typed copies Ever since Stacked Borrows was first implemented in Miri, that was done with `Retag` statements: given a place (usually a local variable), those statements find all references stored inside the place and refresh their tags to ensure the aliasing requirements are upheld. However, this is a somewhat unsatisfying approach for multiple reasons: - It leaves open the [question](rust-lang/unsafe-code-guidelines#371) of where to even put `Retag` statements. Over time, the AddRetag pass settled on one possible answer to this, but it wasn't very canonical. - For assignments of the form `*ptr = expr`, if the assignment involves copying a reference, we probably want to do a retag -- but if we do a `Retag(*ptr)` as the next instruction, it can be non-trivial to argue that this even retags the right value, so we refrained from doing retags in that case. This has [come up](llvm/llvm-project#160913 (comment)) as a potential issue for Rust making better use of LLVM "captures" annotations. (That said, there might be [other ways](rust-lang/unsafe-code-guidelines#593 (comment)) to obtain this desired optimization.) - Normal compilation avoids generating retags, but we still generate LLVM IR with `noalias`. What does that even mean? How do MIR optimization passes interact with retags? These are questions we have to figure out to make better use of aliasing information, but currently we can't even really ask such questions. I think we should resolve all that by making retags part of what happens during a typed copy (a concept and interpreter infrastructure that did not exist yet when retags were initially introduced). Under this proposal, when executing a MIR assignment statement, what conceptually happens is as follows: - We evaluate the LHS to a place. - We evaluate the RHS to a value. This does a typed load from memory if needed, raising UB if memory does not contain a valid representation of the assignment's type. - We walk that value, identify all references inside of it, and retag them. If this happens as part of passing a function argument, this is a protecting retag. - We store (a representation of) the value into the place. However, this semantics doesn't fully work: there's a mandatory MIR pass that turns expressions like `&mut ***ptr` into intermediate deref's. Those must *not* do any retags. So far this happened because the AddRetag pass did not add retags for assignments to deref temporaries, but that information is not recorded in cross-crate MIR. Therefore I instead added a field to `Rvalue::Use` to indicate whether this value should be retagged or not. A non-retagging copy seems like a sufficiently canonical primitive that we should be able to express it. Dealing with the fallout from that is a large chunk of the overall diff. (I also considered adding this field to `StatementKind::Assign` instead, but decided against that as we only actually need it for `Rvalue::Use`. I am not sure if this was the right call...) This neatly answers the question of when retags should occur, and handles cases like `*ptr = expr`. It avoids traversing values twice in Miri. It makes codegen's use of `noalias` sound wrt the actual MIR that it is working on. It also gives us a target semantics to evaluate MIR opts against. However, I did not carefully check all MIR opts -- in particular, GVN needs a thorough look under the new semantics; it currently can turn alias-correct code into alias-incorrect code. (But this PR doesn't make things any worse for normal compilation where the retag indicator is anyway ignored.) Another side-effect of this PR is that `-Zmiri-disable-validation` now also disables alias checking. It'd be nicer to keep them orthogonal but I find this an acceptable price to pay. - [rustc benchmark results](rust-lang/rust#154341 (comment)) - [miri benchmark results](rust-lang/rust#154341 (comment))
This introduces
!capturesmetadata on stores, which looks like this:The semantics are the same as replacing the store with a call like this:
This metadata is intended for annotation by frontends -- it's not something we can feasibly infer at this point, as it would require analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's
println!()machinery, which involves storing a reference to the value inside a structure. This means that printing code (including conditional debugging code), can inhibit optimizations because the pointer escapes. With the new metadata we can annotate this as a read-only capture, which has less impact on optimizations.