GPU621/Intel Parallel Studio VTune Amplifier

366 bytes removed, 20:12, 8 December 2021
Features & Functionalities
*LLC Miss Count metric that shows the total number of last-level cache misses
**'''Local DRAM Access Count''' metric that shows the total number of LLC misses serviced by the local memory
**'''Remote DRAM Access Count''' metric that shows the number of accesses to the remote socket memory
**'''Remote Cache Access Count''' metric that shows the number of accesses to the remote socket cache
*'''Memory Bound''' metric that shows the fraction of cycles spent waiting due to demand load or store instructions
**L1 Bound metric that shows how often the machine was stalled without missing the L1 data cache
**L2 Bound metric that shows how often the machine was stalled on the L2 cache
**L3 Bound metric that shows how often the CPU was stalled on the L3 cache, or contended with a sibling core
**L3 Latency metric that shows the fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)
**'''NUMA''': % of Remote Accesses metric that shows the percentage of memory requests to remote DRAM. The lower the value, the better.
**'''DRAM Bound''' metric that shows how often the CPU was stalled on the main memory (DRAM). This metric enables you to identify DRAM Bandwidth Bound and UPI Utilization Bound issues, as well as Memory Latency issues, with the following metrics:
***'''Remote / Local DRAM Ratio''' metric that is defined by the ratio of remote DRAM loads to local DRAM loads
***'''Local DRAM''' metric that shows how often the CPU was stalled on loads from the local memory
***'''Remote DRAM''' metric that shows how often the CPU was stalled on loads from the remote memory
***'''Remote Cache''' metric that shows how often the CPU was stalled on loads from the remote cache in other sockets
*'''Average Latency ''' metric that shows an average load latency in cycles
[[File:memoryaccess.png]]
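
The two NUMA-related ratios in the list above can be illustrated with a small calculation. This is only a sketch: the function names and sample counts are hypothetical, and the formula used for % of Remote Accesses (remote loads divided by all DRAM loads) is an assumption made for illustration, not VTune's documented computation. The Remote / Local DRAM Ratio follows the definition given in the list.

```python
def remote_local_dram_ratio(remote_loads: int, local_loads: int) -> float:
    """Ratio of remote DRAM loads to local DRAM loads (as defined in the list)."""
    if local_loads == 0:
        # No local loads: the ratio is undefined, so report infinity
        # when any remote loads exist and 0.0 when there are none.
        return float("inf") if remote_loads else 0.0
    return remote_loads / local_loads


def percent_remote_accesses(remote_loads: int, local_loads: int) -> float:
    """Share of DRAM loads that went to remote memory, in percent.

    Assumption: remote / (remote + local). VTune's exact formula may differ.
    """
    total = remote_loads + local_loads
    return 100.0 * remote_loads / total if total else 0.0


# Hypothetical counts, chosen only to show the arithmetic.
remote, local = 250, 1000
print(remote_local_dram_ratio(remote, local))
print(percent_remote_accesses(remote, local))
```

With counts like these, a quarter as many loads go to remote memory as to local memory. Lower values of either number suggest that threads are mostly touching memory attached to their own socket.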