62
edits
Changes
→Roof-line Analysis
The roofline tool creates a tool line model, to represent an application's performance in relation to hardware limitations, including memory bandwidth and computational peaks. To measure performance we use 2 axes with GFLOPs (Giga Floating point operations per second) on the y-axis, and AI(Arithmetic Intensity(FLOPs/Byte)) on the x-axis both in log scale, with this we can begin to build our roof-line. Now for any given machine, its CPU can only perform so many FLOPs so we can plot the CPU cap on our chart to represent this. Like the CPU a memory system can only supply so many gigabytes, we can represent this by a diagonal line(N GB/s * X FLOPs/Byte = Y GFLOPs/s). (pic) This chart represents the machine's hardware limitation, and it's best performance at a given AI
Every function, or loop, will have specific AI, when ran we can record its GFLOPs , Because we know Its AI won't change and any optimization we do will only change the performance, this is useful when we want to measure the performance of a given change or optimization.