File name: xapp_hls_Matrix Multiply
- Category:
- VHDL programming
- Resource attributes:
- [C/C++] [Source code]
- Upload date:
- 2023-03-28
- File size:
- 571.33 KB
- Downloads:
- 0
- Provider:
- 1679556379@qq.com
- Related links:
- None
Description (the downloadable content was collected from the internet; please study and use it at your own discretion):
This repository includes a pure Vitis HLS implementation of matrix-matrix multiplication (A*B=C) for Xilinx FPGAs, using Xilinx Vitis to instantiate the memory and PCIe controllers and to interface with the host.
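When checking an FPGA kernel's output on the host, a plain software product is the usual point of comparison. The following is a minimal sketch of such a golden model, assuming row-major storage with A of size n×k, B of size k×m, and C of size n×m; the function name `ReferenceMatMul` is hypothetical and not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side golden model (not the repository's code).
// Computes C = A * B for row-major A (n x k) and B (k x m); C is n x m.
std::vector<float> ReferenceMatMul(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t n, std::size_t k, std::size_t m) {
  assert(a.size() == n * k && b.size() == k * m);
  std::vector<float> c(n * m, 0.0f);
  // i-p-j loop order keeps the innermost accesses to B and C contiguous.
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      const float a_ip = a[i * k + p];
      for (std::size_t j = 0; j < m; ++j) {
        c[i * m + j] += a_ip * b[p * m + j];
      }
    }
  }
  return c;
}
```

For example, multiplying the 2×3 matrix {1, 2, 3, 4, 5, 6} by the 3×2 matrix {7, 8, 9, 10, 11, 12} gives {58, 64, 139, 154}. When comparing against hardware results, a relative tolerance is generally safer than exact equality, since the kernel may accumulate in a different order.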
Experiments run on a VCU1525 achieved 462 GFLOP/s, 301 GFLOP/s, and 132 GFLOP/s for half, single, and double precision, respectively, with routing across the device's three super logic regions (SLRs) being the primary bottleneck preventing further scaling. The code is not device-specific and can be configured for any Xilinx FPGA supported by the Xilinx OpenCL runtime. Kernels have also been verified to execute on TUL KU115, Alveo U250, and Alveo U280 boards with similar results.
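The GFLOP/s figures above are throughput numbers. Under the usual convention of counting one multiply and one add per term, multiplying an n×k matrix by a k×m matrix costs 2·n·k·m floating-point operations, so throughput is that count divided by the runtime. The helpers below sketch this arithmetic; their names are hypothetical, and whether the quoted measurements were derived exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Floating-point operations in C = A * B for A (n x k) and B (k x m):
// one multiply and one add for every (i, p, j) triple.
inline double GemmFlops(std::uint64_t n, std::uint64_t k, std::uint64_t m) {
  return 2.0 * static_cast<double>(n) * static_cast<double>(k) *
         static_cast<double>(m);
}

// Achieved throughput in GFLOP/s, given a measured runtime in seconds.
inline double GemmGflops(std::uint64_t n, std::uint64_t k, std::uint64_t m,
                         double seconds) {
  assert(seconds > 0.0);
  return GemmFlops(n, k, m) / (seconds * 1e9);
}
```

For instance, a hypothetical 1000×1000×1000 product that takes 2 s performs 2·10⁹ operations, which corresponds to 1 GFLOP/s.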
The implementation uses a systolic array approach, where linearly connected processing elements compute distinct contributions to the outer product of tiles of the output matrix.
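An output tile can be written as a sum of rank-1 updates, C_tile = Σ_p A_tile[:, p] ⊗ B_tile[p, :], and a linear chain of processing elements (PEs) can share that work by giving each PE its own rows of the tile. The following is a functional (not cycle-accurate) software model of that idea, assuming that each PE preloads its slice of column p of A and that each element of row p of B is passed from PE to PE along the chain. The configuration constants and the names `ProcessingElement`, `ComputeTile`, and `SystolicMatMul` are hypothetical and not taken from the repository, whose tiling, buffering, and memory layout may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical configuration (not taken from the repository).
constexpr std::size_t kNumPEs = 4;                    // length of the PE chain
constexpr std::size_t kRowsPerPE = 2;                 // tile rows owned per PE
constexpr std::size_t kTileN = kNumPEs * kRowsPerPE;  // tile height
constexpr std::size_t kTileM = 4;                     // tile width

// One PE: its slice of the current A column and its rows of the C tile.
struct ProcessingElement {
  std::array<float, kRowsPerPE> a_regs{};
  std::array<std::array<float, kTileM>, kRowsPerPE> c_tile{};

  // Use one B value (column j) for this PE's rows, then forward it unchanged.
  float Step(float b_value, std::size_t j) {
    for (std::size_t r = 0; r < kRowsPerPE; ++r) c_tile[r][j] += a_regs[r] * b_value;
    return b_value;
  }
};

// One kTileN x kTileM tile of C = A * B (row-major, A is n x k, B is k x m)
// at (row0, col0), accumulated as k outer-product updates.
void ComputeTile(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c, std::size_t k, std::size_t m,
                 std::size_t row0, std::size_t col0) {
  std::array<ProcessingElement, kNumPEs> pes{};
  for (std::size_t p = 0; p < k; ++p) {
    // Load this tile's part of column p of A into the PEs.
    for (std::size_t q = 0; q < kNumPEs; ++q)
      for (std::size_t r = 0; r < kRowsPerPE; ++r)
        pes[q].a_regs[r] = a[(row0 + q * kRowsPerPE + r) * k + p];
    // Stream row p of B through the chain; each value visits every PE in turn.
    for (std::size_t j = 0; j < kTileM; ++j) {
      float b_value = b[p * m + col0 + j];
      for (std::size_t q = 0; q < kNumPEs; ++q) b_value = pes[q].Step(b_value, j);
    }
  }
  // Drain each PE's accumulated rows back into C.
  for (std::size_t q = 0; q < kNumPEs; ++q)
    for (std::size_t r = 0; r < kRowsPerPE; ++r)
      for (std::size_t j = 0; j < kTileM; ++j)
        c[(row0 + q * kRowsPerPE + r) * m + col0 + j] = pes[q].c_tile[r][j];
}

// Full product; assumes n is a multiple of kTileN and m a multiple of kTileM.
std::vector<float> SystolicMatMul(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t n, std::size_t k, std::size_t m) {
  assert(n % kTileN == 0 && m % kTileM == 0);
  assert(a.size() == n * k && b.size() == k * m);
  std::vector<float> c(n * m, 0.0f);
  for (std::size_t row0 = 0; row0 < n; row0 += kTileN)
    for (std::size_t col0 = 0; col0 < m; col0 += kTileM)
      ComputeTile(a, b, c, k, m, row0, col0);
  return c;
}
```

Because each PE only updates its own rows, B values need only nearest-neighbour links between PEs, which is what makes a linear array attractive for FPGA routing. In an HLS version, one would expect the inner loops to be pipelined so that all PEs work concurrently rather than one after another as in this model.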
The approach used to implement this kernel was presented at FPGA'20 [1]. For a general description of the optimization techniques that we apply, we refer to our article on HLS transformations [2]. We have also given tutorials on HLS for HPC at SC'21, ISC'21, SC'20, HiPEAC'20, SC'19, SC'18, and PPoPP'18.
(Generated automatically by the site; you can review the contents before downloading.)
Download file list
Archive: xapp1170_floating_point_matrix_multiplication-main.zip
- xapp1170_floating_point_matrix_multiplication-main/
- xapp1170_floating_point_matrix_multiplication-main/README.md
- xapp1170_floating_point_matrix_multiplication-main/block_design.PNG
- xapp1170_floating_point_matrix_multiplication-main/fp_mmult.ipynb
- xapp1170_floating_point_matrix_multiplication-main/hls/
- xapp1170_floating_point_matrix_multiplication-main/hls/mmult.h
- xapp1170_floating_point_matrix_multiplication-main/hls/mmult_accel.cpp
- xapp1170_floating_point_matrix_multiplication-main/hls/mmult_test.cpp
- xapp1170_floating_point_matrix_multiplication-main/hls/run_hls_script.tcl
- xapp1170_floating_point_matrix_multiplication-main/vivado/
- xapp1170_floating_point_matrix_multiplication-main/vivado/fp_mmult.bit
- xapp1170_floating_point_matrix_multiplication-main/vivado/fp_mmult.hwh