GPU621/VTuners

2,013 bytes added, 19:17, 30 November 2022
== Platform and I/O ==
 
The VTune Profiler can show developers how efficiently Intel Xeon processors handle I/O by analyzing traffic through Intel's Data Direct I/O (DDIO) technology. DDIO is a hardware feature built into the processors. This functionality is always available and always on.
 
Essentially, when a Network Interface Controller (NIC) is fully utilized and a new packet comes in, every component in the receive chain has to keep up. If any component in the chain takes longer than expected, packets are lost. This is the main bottleneck of traditional Direct Memory Access (DMA).
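To make the packet-loss condition concrete, the sketch below estimates how much time a receiver has per packet at line rate and checks whether a given per-packet cost fits inside it. It is a back-of-the-envelope illustration, not something VTune produces: the function names `packet_budget_ns` and `falls_behind` are hypothetical, and the only outside fact used is that each Ethernet frame occupies 20 extra bytes on the wire (preamble, start delimiter, and inter-frame gap).

```cpp
#include <cassert>

// Bytes each Ethernet frame occupies on the wire beyond the frame itself:
// 7-byte preamble + 1-byte start delimiter + 12-byte inter-frame gap.
constexpr double kWireOverheadBytes = 20.0;

// Time in nanoseconds between back-to-back frame arrivals at line rate.
// (hypothetical helper, named for this example only)
double packet_budget_ns(double link_gbps, double frame_bytes) {
    assert(link_gbps > 0.0 && frame_bytes > 0.0);
    const double bits_on_wire = (frame_bytes + kWireOverheadBytes) * 8.0;
    return bits_on_wire / link_gbps;  // bits / (Gbit/s) = nanoseconds
}

// True when the per-packet processing cost exceeds the arrival interval,
// meaning the receive path falls behind and drops packets once its
// buffers fill up.
bool falls_behind(double link_gbps, double frame_bytes, double service_ns) {
    return service_ns > packet_budget_ns(link_gbps, frame_bytes);
}
```

For minimum-size 64-byte frames, `packet_budget_ns(10.0, 64.0)` works out to about 67.2 ns and `packet_budget_ns(100.0, 64.0)` to about 6.72 ns, so at higher link speeds even a single slow step per packet is enough to cause loss.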
 
Intel's solution to this problem is DDIO, a hardware technology in Xeon processors. It allows PCIe devices to read from and write to the L3 cache directly, which gets incoming data packets as close to the cores as possible. When properly utilized, device interactions can be served solely by the L3 cache.
 
Advantages:
 
 
• Can remove the need for trips to Dynamic Random Access Memory (DRAM) when serving I/O data.
• Low inbound read and write latencies that allow for high throughput.
• Reduced DRAM bandwidth and power consumption.
 
Depending on the implementation, code can still perform non-optimally. The areas that can be tuned are the topology configuration and L3 cache management.
== Multi-Node ==
 
The VTune Profiler helps analyze large-scale Message Passing Interface (MPI) and OpenMP workloads. It can identify scalability issues, highlight threading implementation issues, and reveal imbalances and communication issues in MPI applications. It provides in-depth analysis and recommendations to the user. This functionality extends to High Performance Computing (HPC).
 
By default, the Profiler takes a snapshot of the whole application. However, it can also be focused on a particular area within an application. It provides a general program overview while highlighting specific problematic areas. These problematic areas can then be analyzed further to improve performance.
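One way to focus collection on a particular region is the ITT API that ships with VTune, whose `__itt_resume()` and `__itt_pause()` calls turn data collection on and off from inside the program. The following is a minimal sketch under stated assumptions: `USE_ITT` is a hypothetical build flag introduced here so the code also compiles without VTune installed, and `collection_resume`, `collection_pause`, `make_input`, and `sum_of_squares` are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

#ifdef USE_ITT              // hypothetical build flag, not defined by VTune
#include <ittnotify.h>      // ITT API header shipped with VTune
inline void collection_resume() { __itt_resume(); }
inline void collection_pause()  { __itt_pause(); }
#else
// Without the ITT library the wrappers do nothing, so the program
// behaves identically when it is not being profiled.
inline void collection_resume() {}
inline void collection_pause()  {}
#endif

// Setup work that we do not want to appear in the profile.
std::vector<double> make_input(std::size_t n) {
    std::vector<double> v(n);
    std::iota(v.begin(), v.end(), 1.0);  // 1, 2, 3, ...
    return v;
}

// Region of interest: collection is resumed only around this loop.
double sum_of_squares(const std::vector<double>& v) {
    collection_resume();
    double total = 0.0;
    for (double x : v) {
        total += x * x;
    }
    collection_pause();
    return total;
}
```

A sketch like this is typically combined with starting the collection in a paused state, so that only the code between the resume and pause calls is recorded. Building with `USE_ITT` defined also requires linking against the ittnotify library from the VTune installation.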
= VTune Profiler Coding Exercise =