Skip to content

Commit 872503d

Browse files
committed
Auto merge of #78781 - eddyb:measureme-rdpmc, r=oli-obk
Integrate measureme's hardware performance counter support. *Note: this is a companion to rust-lang/measureme#143, and duplicates some information with it for convenience* **(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open. ## Credits I'd like to start by thanking `@alyssais,` `@cuviper,` `@edef1c,` `@glandium,` `@jix,` `@Mark-Simulacrum,` `@m-ou-se,` `@mystor,` `@nagisa,` `@puckipedia,` and `@yorickvP,` for all of their help with testing, and valuable insight and suggestions. Getting here wouldn't have been possible without you! (If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs) ## Summary This PR adds support to `-Z self-profile` for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the `rdpmc` instruction on `x86_64` Linux. While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads. Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware. That said, 32-bit `x86` (aka `i686`) would be almost trivial to add and test once we land the initial `x86_64` version (as all the CPU detection code can be reused). A new flag `-Z self-profile-counter` was added, to control which of the named `measureme` counters is used, and which defaults to `wall-time`, in order to keep `-Z self-profile`'s current functionality unchanged (at least for now). The named counters so far are: * `wall-time`: the existing time measurement * name chosen for consistency with `perf.rust-lang.org` * continues to use `std::time::Instant` for a nanosecond-precision "monotonic clock" * `instructions:u`: the hardware performance counter usually referred to as "Instructions retired" * here "retired" (roughly) means "fully executed" * the `:u` suffix is from the Linux `perf` tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this isn't entirely true and why `instructions-minus-irqs:u` should be preferred instead* * `instructions-minus-irqs:u`: same as `instructions:u`, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this should be preferred over `instructions:u`* * `instructions-minus-r0420:u`: experimental counter, same as `instructions-minus-irqs:u` but subtracting an undocumented counter (`r0420:u`) instead of IRQs * the `rXXXX` notation is again from Linux `perf`, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't *really* know what it is * this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far * *see [Challenges/Zen's undocumented 420 counter](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter) for details on how this counter was found and what it does* --- There are also some additional commits: * ~~see [Challenges/Rebasing *shouldn't* affect the results, right?](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) for details on the changes to `rustc_parse` and `rustc_trait_section` (the latter far more dubious, and probably shouldn't be merged, or not as-is)~~ * **EDIT**: the effects of these are no long quantifiable, the PR includes reverts for them * ~~see [Challenges/`jemalloc`: purging will commence in ten seconds](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) for details on the `jemalloc` change~~ * this is also separately found in #77162, and we probably want to avoid doing it by default, ideally we'd use the runtime control API `jemalloc` offers (assuming that can stop the timer that's already running, which I'm not sure about) * **EDIT**: until we can do this based on `-Z` flags, this commit has also been reverted * the `proc_macro` change was to avoid randomized hashing and therefore ASLR-like effects --- **(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open. #### Write-up / report Because of how extensive the full report ended up being, I've kept most of it [on `hackmd.io`](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view), but for convenient access, here are all the sections (with individual links): <sup>(someone suggested I'd make a backup, so [here it is on the wayback machine](http://web.archive.org/web/20201127164748/https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view) - I'll need to remember to update that if I have to edit the write-up)</sup> * [**Motivation**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Motivation) * [**Results**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Results) * [**Overhead**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Overhead) *Preview (see the report itself for more details):* |Counter|Total<br>`instructions-minus-irqs:u`|Overhead from "Baseline"<br>(for all 1903881<br>counter reads)|Overhead from "Baseline"<br>(per each counter read)| |-|-|-|-| |Baseline|63637621286 ±6|| |`instructions:u`|63658815885 ±2|&nbsp;&nbsp;+21194599 ±8|&nbsp;&nbsp;+11| |`instructions-minus-irqs:u`|63680307361 ±13|&nbsp;&nbsp;+42686075 ±19|&nbsp;&nbsp;+22| |`wall-time`|63951958376 ±10275|+314337090 ±10281|+165| * [**"Macro" noise (self time)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Macro”-noise-(self-time)) *Preview (see the report itself for more details):* || `wall-time` (ns) | `instructions:u` | `instructions-minus-irqs:u` -: | -: | -: | -: `typeck` | 5478261360 ±283933373 (±~5.2%) | 17350144522 ±6392 (±~0.00004%) | 17351035832.5 ±4.5 (±~0.00000003%) `expand_crate` | 2342096719 ±110465856 (±~4.7%) | 8263777916 ±2937 (±~0.00004%) | 8263708389 ±0 (±~0%) `mir_borrowck` | 2216149671 ±119458444 (±~5.4%) | 8340920100 ±2794 (±~0.00003%) | 8341613983.5 ±2.5 (±~0.00000003%) `mir_built` | 1269059734 ±91514604 (±~7.2%) | 4454959122 ±1618 (±~0.00004%) | 4455303811 ±1 (±~0.00000002%) `resolve_crate` | 942154987.5 ±53068423.5 (±~5.6%) | 3951197709 ±39 (±~0.000001%) | 3951196865 ±0 (±~0%) * [**"Micro" noise (individual sampling intervals)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Micro”-noise-(individual-sampling-intervals)) * [**Caveats**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Caveats) * [**Disabling ASLR**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Disabling-ASLR) * [**Non-deterministic proc macros**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Non-deterministic-proc-macros) * [**Subtracting IRQs**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) * [**Lack of support for multiple threads**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Lack-of-support-for-multiple-threads) * [**Challenges**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Challenges) * [**How do we even read hardware performance counters?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#How-do-we-even-read-hardware-performance-counters) * [**ASLR: it's free entropy**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#ASLR-it’s-free-entropy) * [**The serializing instruction**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#The-serializing-instruction) * [**Getting constantly interrupted**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Getting-constantly-interrupted) * [**AMD patented time-travel and dubbed it `SpecLockMap`<br><sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;or: "how we accidentally unlocked `rr` on AMD Zen"</sup>**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#AMD-patented-time-travel-and-dubbed-it-SpecLockMapnbspnbspnbspnbspnbspnbspnbspnbspor-“how-we-accidentally-unlocked-rr-on-AMD-Zen”) * [**`jemalloc`: purging will commence in ten seconds**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) * [**Rebasing *shouldn't* affect the results, right?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) * [**Epilogue: Zen's undocumented 420 counter**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter)
2 parents edab34a + a4f1331 commit 872503d

File tree

6 files changed

+42
-8
lines changed

6 files changed

+42
-8
lines changed

compiler/rustc_data_structures/src/profiling.rs

+10-4
Original file line numberDiff line numberDiff line change
@@ -550,14 +550,20 @@ impl SelfProfiler {
550550
pub fn new(
551551
output_directory: &Path,
552552
crate_name: Option<&str>,
553-
event_filters: &Option<Vec<String>>,
553+
event_filters: Option<&[String]>,
554+
counter_name: &str,
554555
) -> Result<SelfProfiler, Box<dyn Error + Send + Sync>> {
555556
fs::create_dir_all(output_directory)?;
556557

557558
let crate_name = crate_name.unwrap_or("unknown-crate");
558-
let filename = format!("{}-{}.rustc_profile", crate_name, process::id());
559+
// HACK(eddyb) we need to pad the PID, strange as it may seem, as its
560+
// length can behave as a source of entropy for heap addresses, when
561+
// ASLR is disabled and the heap is otherwise determinic.
562+
let pid: u32 = process::id();
563+
let filename = format!("{}-{:07}.rustc_profile", crate_name, pid);
559564
let path = output_directory.join(&filename);
560-
let profiler = Profiler::new(&path)?;
565+
let profiler =
566+
Profiler::with_counter(&path, measureme::counters::Counter::by_name(counter_name)?)?;
561567

562568
let query_event_kind = profiler.alloc_string("Query");
563569
let generic_activity_event_kind = profiler.alloc_string("GenericActivity");
@@ -570,7 +576,7 @@ impl SelfProfiler {
570576

571577
let mut event_filter_mask = EventFilter::empty();
572578

573-
if let Some(ref event_filters) = *event_filters {
579+
if let Some(event_filters) = event_filters {
574580
let mut unknown_events = vec![];
575581
for item in event_filters {
576582
if let Some(&(_, mask)) =

compiler/rustc_query_impl/src/profiling_support.rs

+3
Original file line numberDiff line numberDiff line change
@@ -275,6 +275,9 @@ fn alloc_self_profile_query_strings_for_query_cache<'tcx, C>(
275275
let query_name = profiler.get_or_alloc_cached_string(query_name);
276276
let event_id = event_id_builder.from_label(query_name).to_string_id();
277277

278+
// FIXME(eddyb) make this O(1) by using a pre-cached query name `EventId`,
279+
// instead of passing the `DepNodeIndex` to `finish_with_query_invocation_id`,
280+
// when recording the event in the first place.
278281
let mut query_invocation_ids = Vec::new();
279282
query_cache.iter(&mut |_, _, i| {
280283
query_invocation_ids.push(i.into());

compiler/rustc_session/src/options.rs

+6
Original file line numberDiff line numberDiff line change
@@ -1473,6 +1473,12 @@ options! {
14731473
for example: `-Z self-profile-events=default,query-keys`
14741474
all options: none, all, default, generic-activity, query-provider, query-cache-hit
14751475
query-blocked, incr-cache-load, incr-result-hashing, query-keys, function-args, args, llvm, artifact-sizes"),
1476+
self_profile_counter: String = ("wall-time".to_string(), parse_string, [UNTRACKED],
1477+
"counter used by the self profiler (default: `wall-time`), one of:
1478+
`wall-time` (monotonic clock, i.e. `std::time::Instant`)
1479+
`instructions:u` (retired instructions, userspace-only)
1480+
`instructions-minus-irqs:u` (subtracting hardware interrupt counts for extra accuracy)"
1481+
),
14761482
share_generics: Option<bool> = (None, parse_opt_bool, [TRACKED],
14771483
"make the current crate share its generic instantiations"),
14781484
show_span: Option<String> = (None, parse_opt_string, [TRACKED],

compiler/rustc_session/src/session.rs

+2-1
Original file line numberDiff line numberDiff line change
@@ -1253,7 +1253,8 @@ pub fn build_session(
12531253
let profiler = SelfProfiler::new(
12541254
directory,
12551255
sopts.crate_name.as_deref(),
1256-
&sopts.debugging_opts.self_profile_events,
1256+
sopts.debugging_opts.self_profile_events.as_ref().map(|xs| &xs[..]),
1257+
&sopts.debugging_opts.self_profile_counter,
12571258
);
12581259
match profiler {
12591260
Ok(profiler) => Some(Arc::new(profiler)),

library/proc_macro/src/bridge/handle.rs

+19-3
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
//! Server-side handles and storage for per-handle data.
22
33
use std::collections::{BTreeMap, HashMap};
4-
use std::hash::Hash;
4+
use std::hash::{BuildHasher, Hash};
55
use std::num::NonZeroU32;
66
use std::ops::{Index, IndexMut};
77
use std::sync::atomic::{AtomicUsize, Ordering};
@@ -51,15 +51,31 @@ impl<T> IndexMut<Handle> for OwnedStore<T> {
5151
}
5252
}
5353

54+
// HACK(eddyb) deterministic `std::collections::hash_map::RandomState` replacement
55+
// that doesn't require adding any dependencies to `proc_macro` (like `rustc-hash`).
56+
#[derive(Clone)]
57+
struct NonRandomState;
58+
59+
impl BuildHasher for NonRandomState {
60+
type Hasher = std::collections::hash_map::DefaultHasher;
61+
#[inline]
62+
fn build_hasher(&self) -> Self::Hasher {
63+
Self::Hasher::new()
64+
}
65+
}
66+
5467
/// Like `OwnedStore`, but avoids storing any value more than once.
5568
pub(super) struct InternedStore<T: 'static> {
5669
owned: OwnedStore<T>,
57-
interner: HashMap<T, Handle>,
70+
interner: HashMap<T, Handle, NonRandomState>,
5871
}
5972

6073
impl<T: Copy + Eq + Hash> InternedStore<T> {
6174
pub(super) fn new(counter: &'static AtomicUsize) -> Self {
62-
InternedStore { owned: OwnedStore::new(counter), interner: HashMap::new() }
75+
InternedStore {
76+
owned: OwnedStore::new(counter),
77+
interner: HashMap::with_hasher(NonRandomState),
78+
}
6379
}
6480

6581
pub(super) fn alloc(&mut self, x: T) -> Handle {

library/std/src/collections/hash/map.rs

+2
Original file line numberDiff line numberDiff line change
@@ -3140,6 +3140,7 @@ impl DefaultHasher {
31403140
/// `DefaultHasher` instances, but is the same as all other `DefaultHasher`
31413141
/// instances created through `new` or `default`.
31423142
#[stable(feature = "hashmap_default_hasher", since = "1.13.0")]
3143+
#[inline]
31433144
#[allow(deprecated)]
31443145
#[must_use]
31453146
pub fn new() -> DefaultHasher {
@@ -3153,6 +3154,7 @@ impl Default for DefaultHasher {
31533154
/// See its documentation for more.
31543155
///
31553156
/// [`new`]: DefaultHasher::new
3157+
#[inline]
31563158
fn default() -> DefaultHasher {
31573159
DefaultHasher::new()
31583160
}

0 commit comments

Comments
 (0)