Roofline analysis for GEMM and attention operators. Use when: classifying an operator as compute-bound, L1-bound, LLC-bound, or memory-bound; estimating FLOPs and bytes moved from tensor shapes; interpreting the byte counts and timings reported by VTune/oneAPI profilers on Intel GPUs (SYCL/XPU).
This skill includes a SKILL.md agent instruction file.
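
As an illustration of the kind of estimate described above, here is a minimal Python sketch that derives FLOPs and compulsory bytes from GEMM and attention shapes and then picks the limiting resource. It is not taken from SKILL.md: the function names, the level labels (`L1`, `LLC`, `memory`), and all peak-throughput and bandwidth numbers are hypothetical placeholders, and real per-level byte counts would come from profiler output rather than from shapes alone.

```python
def gemm_flops(m, n, k):
    """FLOPs for C[m,n] = A[m,k] @ B[k,n], counting one multiply-add as 2 FLOPs."""
    return 2 * m * n * k


def gemm_min_bytes(m, n, k, dtype_bytes=2):
    """Compulsory traffic: read A and B once, write C once (no re-reads)."""
    return (m * k + k * n + m * n) * dtype_bytes


def attention_flops(batch, heads, seq_q, seq_kv, head_dim):
    """FLOPs for Q @ K^T plus P @ V (non-causal); softmax cost is ignored."""
    return 4 * batch * heads * seq_q * seq_kv * head_dim


def classify(flops, bytes_by_level, peak_flops, bw_by_level):
    """Compute a lower-bound time per resource; the largest one is the limiter.

    bytes_by_level / bw_by_level map a level name to bytes moved and to
    bandwidth in bytes per second. All hardware numbers are assumptions.
    """
    times = {"compute": flops / peak_flops}
    for level, nbytes in bytes_by_level.items():
        times[level] = nbytes / bw_by_level[level]
    bound = max(times, key=times.get)
    return bound, times


# Hypothetical device: 100 TFLOP/s peak, 1 TB/s memory bandwidth (placeholders).
PEAK_FLOPS = 100e12
BANDWIDTH = {"memory": 1e12}

# Large square fp16 GEMM: high arithmetic intensity.
square = classify(
    gemm_flops(4096, 4096, 4096),
    {"memory": gemm_min_bytes(4096, 4096, 4096)},
    PEAK_FLOPS,
    BANDWIDTH,
)

# Skinny fp16 GEMM (M = 1): roughly one FLOP per byte.
skinny = classify(
    gemm_flops(1, 4096, 4096),
    {"memory": gemm_min_bytes(1, 4096, 4096)},
    PEAK_FLOPS,
    BANDWIDTH,
)

print(square[0], skinny[0])
```

To extend the classification to L1 and LLC, pass measured byte counts for those levels in `bytes_by_level` together with matching bandwidth figures for the target device. The shape-based estimate only covers compulsory memory traffic, so it gives an optimistic bound.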