== Accelerators and XPUs ==
=== Why XPUs? ===
Computing has become irreversibly heterogeneous, driven by fast-growing applications such as machine learning, video editing, and gaming. As a result, specialized hardware is increasingly preferred over general-purpose hardware for these workloads. Typical examples are the separation of GPUs from CPUs and the use of FPGAs. Among these devices, the GPU is becoming critical for compute-intensive applications. It is a highly parallel machine with many smaller processing cores that work together. Because single-core serial performance on a GPU is much lower than on a CPU, applications must take advantage of the massive parallelism available in a GPU. The growth of heterogeneous computing has also led developers to discover that different types of workloads perform best on different GPU hardware architectures. Intel VTune Profiler therefore lets us measure and analyze the overhead of offloading work onto an Intel GPU. It offers three analysis types for this purpose:
=== GPU offload ===
Explore code execution on various CPU and GPU cores on your platform, estimate how your code benefits from offloading to the GPU, and identify whether your application is CPU or GPU bound.
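To make the idea of "benefit from offloading" concrete, here is a minimal sketch of a break-even estimate. It is not how VTune Profiler computes its results. The function name <code>offload_speedup</code> and all timings are hypothetical, and the model assumes that the offloaded cost is simply GPU kernel time plus host-device transfer time.

```python
def offload_speedup(cpu_seconds, gpu_kernel_seconds, transfer_seconds):
    """Estimate the speedup from offloading one code region to the GPU.

    Simplifying assumption: the offloaded cost is the GPU kernel time plus
    the time spent moving data between host and device, with no overlap.
    """
    offloaded_seconds = gpu_kernel_seconds + transfer_seconds
    if offloaded_seconds <= 0:
        raise ValueError("offloaded time must be positive")
    return cpu_seconds / offloaded_seconds


# Made-up timings: a large region where the GPU wins clearly ...
print(offload_speedup(2.0, 0.25, 0.25))  # 4.0
# ... and a small region where transfer overhead makes offloading slower.
print(offload_speedup(1.0, 0.5, 1.5))    # 0.5
```

A value above 1 suggests the region is worth offloading under this model. A value below 1 means the transfer overhead outweighs the faster kernel.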
=== GPU Compute/Media Hotspot (preview) ===
Analyze the most time-consuming GPU kernels, characterize GPU utilization based on GPU hardware metrics, identify performance issues caused by memory latency or inefficient kernel algorithms, and analyze how frequently GPU instructions of particular types are executed.
=== CPU/FPGA interaction ===
Analyze CPU/FPGA interaction issues with the following steps:
1. Focus on the kernels running on the FPGA.
2. Identify the most time-consuming kernels.
3. Look at the corresponding metrics on the device side (like Occupancy or Stalls).
4. Correlate with CPU and platform profiling data.
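Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration and does not use any VTune Profiler API. The function <code>rank_kernels</code>, the record layout, and the sample values are all hypothetical stand-ins for per-kernel data exported from a profiling run.

```python
from collections import defaultdict


def rank_kernels(samples):
    """Aggregate per-kernel time and sort with the slowest kernel first.

    samples: iterable of (kernel_name, seconds, stalled_seconds) tuples,
    a hypothetical format for exported device-side measurements.
    Returns a list of (kernel_name, total_seconds, stall_fraction).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for name, seconds, stalled_seconds in samples:
        totals[name][0] += seconds
        totals[name][1] += stalled_seconds
    ranked = [
        (name, total, stalled / total if total else 0.0)
        for name, (total, stalled) in totals.items()
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Made-up measurements for two kernels; "fft" was invoked twice.
samples = [
    ("fft", 2.0, 0.5),
    ("scale", 0.5, 0.125),
    ("fft", 2.0, 1.5),
]
for name, total, stall in rank_kernels(samples):
    print(f"{name}: {total:.2f} s total, {stall:.0%} stalled")
```

The kernel at the top of the list is the first candidate to compare against CPU and platform data for the same time range.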
== Parallelism ==