Home
Welcome to the ROCmLibs-for-gfx1103-AMD780M-APU- wiki!
This guide walks you through building rocBLAS for ROCm on Windows. If you already have the libraries, you can skip this section!
Prerequisites: Ensure the following software is installed on your PC. python, git, and the HIP SDK are essential. The script rdeps.py will automatically download any missing dependencies when you run it.
- Visual Studio 2022: (Download from https://visualstudio.microsoft.com/)
- Python: (Download from https://www.python.org/)
- Strawberry Perl: (Download from https://strawberryperl.com/)
- CMake: (Download from https://cmake.org/download/)
- Git: (Download from https://git-scm.com/)
- HIP SDK: (Download from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
- rocBLAS: Download the release that matches your ROCm version (https://github.com/ROCm/rocBLAS).
  - ROCm 5.7.0: Download rocBLAS 3.1.0
  - ROCm 6.1.2: Download rocBLAS 4.1.2
  - ROCm 6.2.4: Download rocBLAS 4.2.4
- Tensile: Download the appropriate version (https://github.com/ROCm/Tensile).
  - ROCm 5.7.0: Download Tensile 4.38.0
  - ROCm 6.1.2: Download Tensile 4.40.0
  - ROCm 6.2.4: Download Tensile 4.41.0
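Before building, it can help to confirm that the command-line prerequisites are reachable. The following is a minimal sketch, not part of the official tooling: the executable names and the use of the HIP_PATH environment variable to locate the HIP SDK are assumptions, and the version table simply restates the pairings listed above. It does not check for Visual Studio.

```python
import os
import shutil

# Source releases paired with each ROCm version, restating the lists above:
# ROCm version -> (rocBLAS version, Tensile version)
SOURCE_VERSIONS = {
    "5.7.0": ("3.1.0", "4.38.0"),
    "6.1.2": ("4.1.2", "4.40.0"),
    "6.2.4": ("4.2.4", "4.41.0"),
}

# Executable names are assumptions; adjust them if your installs differ.
REQUIRED_TOOLS = ("python", "git", "cmake", "perl")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def hip_sdk_path(env=None):
    """Return the HIP SDK location from HIP_PATH, or None if it is unset.

    Assumption: the HIP SDK installer defines HIP_PATH on Windows.
    """
    env = os.environ if env is None else env
    return env.get("HIP_PATH")


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("HIP SDK:", hip_sdk_path() or "HIP_PATH is not set")
    rocblas, tensile = SOURCE_VERSIONS["6.1.2"]
    print(f"ROCm 6.1.2 -> rocBLAS {rocblas}, Tensile {tensile}")
```

If a tool is reported missing even though it is installed, its folder is probably not on PATH for the terminal you are using.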
If you already have optimized logic for your GPU architecture, or you plan to edit the logic files as described later in this guide, you may skip these steps, especially if you are building libraries for xnack- features.
These steps are only needed for ROCm configurations that have no optimized logic and for which you want to build fallback logic. They may not be required in all cases.
- ROCm 5.7.0: Follow the instructions for "For hip 5.7" below.
- ROCm 6.1.2: Follow the instructions for "For hip 6.1.2" below.
- ROCm 6.2.4: Follow the instructions for "For hip 6.2.4" below.
For hip 5.7:
- Download Tensile-fix-fallback-arch-build.patch.
- Place the patch file in your Tensile folder (e.g., C:\ROCM\Tensile-rocm-5.7.0).
- Open a terminal within the Tensile folder.
- Apply the patch:
git apply Tensile-fix-fallback-arch-build.patch
  - If nothing is printed after applying, the patch was applied successfully. Otherwise, you may need to manually add the patch content to TensileCreateLibrary.py. You may also skip these steps if you have optimized logic available.
For hip 6.1.2:
- Place the patch file in your Tensile folder (e.g., C:\ROCM\Tensile-rocm-6.1.2).
- Open a terminal within the Tensile folder.
- Apply the patch:
git apply Tensile-fix-fallback-arch-build-hip-6.1.2.patch
  - If nothing is printed after applying, the patch was applied successfully. Otherwise, you may need to manually add the patch content to TensileCreateLibrary.py.
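Because git apply is silent on success, a dry run first makes the outcome easier to read. The sketch below wraps the patch steps above in Python; the helper names are hypothetical, and it assumes git is on PATH and that the patch file has already been copied into the Tensile folder. It uses git apply --check, which tests whether the patch applies without changing any files.

```python
import subprocess
from pathlib import Path

# Patch file names as given in the steps above, keyed by HIP version.
PATCHES = {
    "5.7": "Tensile-fix-fallback-arch-build.patch",
    "6.1.2": "Tensile-fix-fallback-arch-build-hip-6.1.2.patch",
}


def patch_commands(patch_name):
    """Return the dry-run command followed by the real apply command."""
    return [
        ["git", "apply", "--check", patch_name],  # dry run, changes nothing
        ["git", "apply", patch_name],
    ]


def apply_patch(tensile_dir, hip_version):
    """Apply the fallback patch inside tensile_dir; return True on success."""
    patch_name = PATCHES[hip_version]
    if not (Path(tensile_dir) / patch_name).is_file():
        raise FileNotFoundError(f"{patch_name} not found in {tensile_dir}")
    for command in patch_commands(patch_name):
        result = subprocess.run(
            command, cwd=tensile_dir, capture_output=True, text=True
        )
        if result.returncode != 0:
            # git explains on stderr why the patch does not apply.
            print(result.stderr.strip())
            return False
    return True
```

For example, apply_patch(r"C:\ROCM\Tensile-rocm-5.7.0", "5.7") returns False and prints git's message if the dry run fails, in which case the manual edit of TensileCreateLibrary.py is the fallback.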
Note: Edit line 41 of rdeps.py in rocBLAS. The older repo pins an outdated vcpkg, which causes the build to fail. Update the vcpkg version by replacing that line with the following:
git clone -b 2024.02.14 https://github.com/microsoft/vcpkg
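If you prefer not to edit rdeps.py by hand, the tag can be swapped with a small script. This is only a sketch: it assumes the clone command in rdeps.py has the shape "git clone -b <tag> https://github.com/microsoft/vcpkg", which may not hold for every rocBLAS release, and pin_vcpkg is a hypothetical helper name.

```python
import re

VCPKG_URL = "https://github.com/microsoft/vcpkg"

# Matches "git clone -b <tag> https://github.com/microsoft/vcpkg".
# Assumption: rdeps.py clones vcpkg with a "-b <tag>" argument in this form.
_CLONE = re.compile(
    r"(git clone\s+-b\s+)(\S+)(\s+" + re.escape(VCPKG_URL) + r")"
)


def pin_vcpkg(text, tag="2024.02.14"):
    """Return (new_text, replacements) with the vcpkg clone tag replaced."""
    return _CLONE.subn(lambda m: m.group(1) + tag + m.group(3), text)


if __name__ == "__main__":
    from pathlib import Path

    script = Path("rdeps.py")  # run this from the rocBLAS folder
    updated, count = pin_vcpkg(script.read_text(encoding="utf-8"))
    if count:
        script.write_text(updated, encoding="utf-8")
    print(f"updated {count} clone command(s)")
```

A count of 0 means the expected pattern was not found, and the line needs to be edited manually.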
- vcpkg Version: If your vcpkg version was built after April 2023, replace CMakeLists.txt in Tensile/tree/develop/Tensile/Source/lib/CMakeLists.txt with this version and place it in the same folder (e.g., rocm).
  - For more information, see the official ROCm guide.
- Navigate to the rocm/rocBLAS directory in your terminal.
- Run python rdeps.py. This script will configure your environment and download necessary packages.
python rdeps.py
(On Linux, use install.sh -d instead. If you run into errors, search for the error message and fix the cause, or try running the script again.)
Once it finishes, continue with the next step.
- After rdeps.py completes, run
python rmake.py -a "gfx1101;gfx1103" --lazy-library-loading --no-merge-architectures -t "C:\rocm\Tensile-rocm-5.7.0"
(adjust paths and architectures as needed).
Important:
- Replace "gfx1101;gfx1103" with the correct GPU or APU architecture names for your system. If you build for more than one architecture, separate the names with ";".
- Make sure you read the section on editing Tensile/Common.py, and the sections that follow it, before you build.
- For ROCm 6.1.2, change the path to C:\rocm\Tensile-rocm-6.1.2.
- The specific commands and patch files may vary depending on your setup and ROCm version.
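The two build commands above can also be driven from a script, which avoids quoting mistakes around the ";" separator. This is a sketch with hypothetical helper names; it uses only the rmake.py flags shown in this guide and assumes it is run with the same Python interpreter you would use for rdeps.py.

```python
import subprocess
import sys


def rmake_command(architectures, tensile_dir):
    """Build the rmake.py argument list for the given GPU architectures."""
    if not architectures:
        raise ValueError("at least one architecture is required")
    return [
        sys.executable, "rmake.py",
        "-a", ";".join(architectures),  # e.g. "gfx1101;gfx1103"
        "--lazy-library-loading",
        "--no-merge-architectures",
        "-t", tensile_dir,
    ]


def build(rocblas_dir, architectures, tensile_dir):
    """Run rdeps.py, then rmake.py, stopping at the first failure."""
    subprocess.run([sys.executable, "rdeps.py"], cwd=rocblas_dir, check=True)
    subprocess.run(
        rmake_command(architectures, tensile_dir), cwd=rocblas_dir, check=True
    )
```

For example, build(r"C:\rocm\rocBLAS-rocm-5.7.0", ["gfx1103"], r"C:\rocm\Tensile-rocm-5.7.0") targets a single architecture. Passing the arguments as a list means the shell never sees the semicolon.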
After successfully building rocBLAS from source, you need to replace the default rocblas.dll with your compiled version for your HIP programs to utilize it. Here's how:
- Locate your Compiled Files:
  - rocblas.dll: Located in C:\ROCM\rocBLAS-rocm-5.7.0\build\release\staging\ (or a similar path based on your build location).
  - Tensile data files: Found within C:\ROCM\rocBLAS-rocm-5.7.0\build\release\Tensile\library\ (adjust the path if needed).
- Replace the Default rocBLAS:
  - Copy rocblas.dll to C:\Program Files\AMD\ROCm\5.7\bin. This is where the HIP SDK looks for it by default. (Make sure to back up the original rocblas.dll first.)
- Place Tensile Data Files:
  - Navigate to C:\Program Files\AMD\ROCm\5.7\bin\rocblas\
  - Replace the library folder with the one from your new build (back up the original library folder by renaming it, e.g., to bklibrary). This is where you should place all the Tensile data files from your build directory.
- Test Your HIP Program:
  - Now, when you run your HIP program, it should use your newly compiled rocblas.dll and its associated Tensile data files.
Important Notes:
- Always double-check the paths to ensure they match your installation configuration.
- Make sure the ROCm version in the bin directory matches the version of rocBLAS you built.
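The backup-and-replace steps above can be sketched as a script. The function name and the rocblas.dll.bak backup name are hypothetical; bklibrary follows the example in the steps. Writing under C:\Program Files usually requires an elevated terminal.

```python
import shutil
from pathlib import Path


def install_build(staging_dir, tensile_library_dir, rocm_bin):
    """Back up the stock rocBLAS files, then copy the new build into place."""
    staging_dir = Path(staging_dir)
    tensile_library_dir = Path(tensile_library_dir)
    rocm_bin = Path(rocm_bin)

    # Back up rocblas.dll once, so a second run keeps the stock copy.
    dll = rocm_bin / "rocblas.dll"
    dll_backup = rocm_bin / "rocblas.dll.bak"  # hypothetical backup name
    if dll.exists() and not dll_backup.exists():
        shutil.copy2(dll, dll_backup)
    shutil.copy2(staging_dir / "rocblas.dll", dll)

    # Rename the stock Tensile data folder, then copy in the new one.
    library = rocm_bin / "rocblas" / "library"
    library_backup = rocm_bin / "rocblas" / "bklibrary"
    if library.exists():
        if library_backup.exists():
            raise FileExistsError(f"{library_backup} already exists")
        library.rename(library_backup)
    library.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(tensile_library_dir, library)
```

The function refuses to continue when bklibrary already exists, so an earlier backup of the stock library is never overwritten.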
Tensile/Common.py contains general parameters used by the Tensile library. To ensure compatibility with your GPU, you need to update two specific settings: globalParameters["SupportedISA"] and CACHED_ASM_CAPS. Add your GPU's ISA and capability information to both. For CACHED_ASM_CAPS, choose a similar GPU architecture (e.g., an RDNA2 entry for gfx1031 or gfx1032), copy its entry, and paste the copy below it under your GPU's number, alongside the other available GPU data.
For HIP SDK 6.1.2, the CACHED_ASM_CAPS information has moved to Tensile/AsmCaps.py. Also edit architectureMap (lines 299 to 310) and add your architecture information so that your architecture maps to the correct logic file. However, some optimized logic files do not exist in the official release, so you need to create them. Otherwise, the build produces a fallback, non-optimized rocblas and library.
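The ISA value you add for your GPU follows from its gfx name. The helper below is a hypothetical sketch of that mapping, assuming the usual naming convention in which the last character is the stepping (hexadecimal), the character before it is the minor version, and the remaining digits are the major version. This agrees with the gfx1030 and ISA: [10, 3, 0] pairing used in this guide.

```python
def gfx_to_isa(gfx_name):
    """Convert a name such as "gfx1103" into an ISA tuple such as (11, 0, 3).

    Assumed convention: the last character is the stepping (hexadecimal),
    the one before it is the minor version, and the rest is the major version.
    """
    if not gfx_name.startswith("gfx") or len(gfx_name) < 6:
        raise ValueError(f"unexpected architecture name: {gfx_name!r}")
    digits = gfx_name[3:]
    return (int(digits[:-2]), int(digits[-2]), int(digits[-1], 16))


if __name__ == "__main__":
    for name in ("gfx1030", "gfx1031", "gfx1103", "gfx90c"):
        print(name, "->", gfx_to_isa(name))
```

The resulting tuple is the kind of value you would add to globalParameters["SupportedISA"] and use as the key of the copied CACHED_ASM_CAPS entry. Check the existing entries in your Tensile version to confirm the exact format.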
Here's a step-by-step guide:
- Choose Your Architecture:
  - Select an existing architecture folder within rocBLAS\library\src\blas3\Tensile\Logic\asm_full (e.g., navi21). This will serve as a template for your new architecture.
  - Create a new folder with the name of your target architecture (e.g., navi22).
- Copy Files:
  - Copy all the files from your chosen template folder into your new architecture folder.
- Modify Files:
  - Open the copied files in a code editor (like VS Code or Visual Studio).
  - Search for instances of navi21 and replace them with navi22.
  - Update any gfx1030 references to gfx1031 (or your target GPU's identifier).
  - Find lines containing ISA: [10, 3, 0] and replace them with ISA: [10, 3, 1]. (Remember to adjust the ISA code according to your GPU.)
  - Rename all files within the new folder to reflect your architecture name (e.g., change 'navi21' to 'navi22'). You can use a file renaming tool like 'File Rename APP', a free application available in the Windows Store, for this task.
  - If the build fails, a likely cause is that ROCm architectures have different capabilities. You need to ensure your rocblas is tailored to each architecture you're targeting:
    - gfx90c: Doesn't support 4x8II. Delete any logic or files related to 4x8II within the asm_full folder under rocBLAS\library\src\blas3\Tensile\Logic.
    - gfx1010: Doesn't support 8II. Do the same for files related to 8II in the asm_full folder.
  - Checking Logic Files: The newly named logic file is likely a critical place where these operations are defined. Carefully review it and remove any unsupported calculations.
- Use Your New Architecture:
  - In Tensile/Common.py, update "CACHED_ASM_CAPS" or the relevant entries in architectureMap to reference your new navi22 folder.
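The copy, search-and-replace, and rename steps above can be combined into one pass. The sketch below uses hypothetical function names and assumes the logic files are UTF-8 text that sit directly in the template folder. It only performs the textual substitutions described in this guide, so the result still needs the manual review mentioned above.

```python
import re
from pathlib import Path


def retarget_text(text, old_name, new_name, old_gfx, new_gfx, old_isa, new_isa):
    """Apply the name, gfx identifier, and ISA substitutions to one file's text."""
    text = text.replace(old_name, new_name).replace(old_gfx, new_gfx)
    major, minor, step = old_isa
    # Tolerate spacing differences such as "[10,3,0]" and "[10, 3, 0]".
    pattern = rf"ISA:\s*\[\s*{major}\s*,\s*{minor}\s*,\s*{step}\s*\]"
    return re.sub(pattern, "ISA: [{}, {}, {}]".format(*new_isa), text)


def clone_logic_folder(template_dir, new_dir, **targets):
    """Copy every file from template_dir into new_dir, renamed and retargeted."""
    template_dir, new_dir = Path(template_dir), Path(new_dir)
    new_dir.mkdir(parents=True)  # fails if the folder already exists
    created = []
    for source in sorted(template_dir.iterdir()):
        if not source.is_file():
            continue
        new_file = new_dir / source.name.replace(
            targets["old_name"], targets["new_name"]
        )
        content = source.read_text(encoding="utf-8")
        new_file.write_text(retarget_text(content, **targets), encoding="utf-8")
        created.append(new_file.name)
    return created
```

A call matching the example in the steps would pass old_name="navi21", new_name="navi22", old_gfx="gfx1030", new_gfx="gfx1031", old_isa=(10, 3, 0), and new_isa=(10, 3, 1).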
Important Notes:
- Carefully review the changes you make, as incorrect modifications can lead to errors.
- Some pre-edited custom logic files are available at Custom-Logic-Files.
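For the capability limits noted in the guide above (gfx90c lacks 4x8II, gfx1010 lacks 8II), it can be safer to list the affected logic files before deleting anything. This sketch assumes the data-type tag appears in the file name, which may not hold for every logic file; the function name is hypothetical.

```python
from pathlib import Path

# Data-type tags each architecture cannot build, as noted in this guide.
UNSUPPORTED_TAGS = {
    "gfx90c": ("4x8II",),
    "gfx1010": ("8II",),
}


def unsupported_logic_files(logic_dir, gfx_name):
    """List files under logic_dir whose names contain an unsupported tag.

    Assumption: the tag (e.g. "4x8II") is part of the file name.
    """
    tags = UNSUPPORTED_TAGS.get(gfx_name, ())
    return sorted(
        path
        for path in Path(logic_dir).rglob("*")
        if path.is_file() and any(tag in path.name for tag in tags)
    )
```

Note that "8II" is a substring of "4x8II", so the gfx1010 listing also includes the 4x8II files. Review the list, then delete or move the files by hand.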
(Skip this for HIP 5.7. Necessary for HIP 6.1.2 and later.)
Key Changes:
- Search for gfx1030: Begin by searching within both the Tensile and rocBLAS folders for instances of gfx1030. This identifier represents the gfx1030 GPU architecture.
- Replace with Your Target Architecture: Replace all occurrences of gfx1030 with the corresponding code for your desired GPU architecture (e.g., gfx1031).
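To see where gfx1030 appears before changing anything, a short scan of the source trees helps. This is a sketch: the list of file types is an assumption, and find_mentions is a hypothetical helper that only reports matches and edits nothing.

```python
from pathlib import Path

# File types to scan; this selection is an assumption, extend it as needed.
SCANNED_SUFFIXES = {".py", ".hpp", ".cpp", ".txt", ".cmake", ".yaml"}


def find_mentions(root, needle="gfx1030", suffixes=SCANNED_SUFFIXES):
    """Return (relative path, line number, line) for each line containing needle."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append(
                    (path.relative_to(root).as_posix(), number, line.strip())
                )
    return hits


if __name__ == "__main__":
    for file_name, number, line in find_mentions("."):
        print(f"{file_name}:{number}: {line}")
```

Run it once in the Tensile folder and once in the rocBLAS folder, then work through the reported lines.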
Important Files to Modify:
- Tensile: Within the Tensile folder, make changes to:
  - CMakeLists.txt: This file configures the build process and needs adjustments for new architectures.
  - AMDGPU.hpp: Defines the architecture-specific interface.
  - PlaceholderLibrary.hpp, Predicates.hpp, OclUtils.cpp: These files contain code related to specific functionalities, which might require modifications for your target GPU.
- rocBLAS: In the rocBLAS folder:
  - CMakeLists.txt: Similar to Tensile, update this file for your new architecture.
  - handle.cpp, tensile_host.cpp, handle.hpp: These files are likely involved in communication and interactions between rocBLAS and the GPU.
Caution:
- Modifying these core files can have unintended consequences.
Advanced Usage:
For maximum performance, delve deeper into Tensile's logic files. Examples are provided in rocBLAS\library\src\blas3\Tensile\Logic\asm_full.
To achieve truly optimized libraries, you'll need to fine-tune these logic files for your target hardware. The Tensile Tuning Guide provides practical guidance and techniques for this process. Keep in mind that this requires patience, time, and a solid understanding of Tensile's inner workings.
Note 2: On Linux (6.1.0 rocBLAS), you may need to edit more. Search the entire Tensile folder for Processor and gfx1102 (or another supported card), and add your GPU number wherever gfx1102 appears. Then build again. More information is available for ROCm on Linux. However, this approach is not recommended. An easier approach is to use HSA_OVERRIDE_GFX_VERSION to override your GPU with a supported one, e.g.,
export HSA_OVERRIDE_GFX_VERSION=11.0.0
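The override value is the dotted form of a supported GPU's gfx name. The sketch below derives it and builds an environment for launching a program from Python; the helper names are hypothetical, and it assumes the same name-to-version convention as gfx1100 mapping to 11.0.0.

```python
import os


def override_value(supported_gfx):
    """Turn a name such as "gfx1100" into "11.0.0" for HSA_OVERRIDE_GFX_VERSION."""
    digits = supported_gfx[len("gfx"):]
    major, minor, step = int(digits[:-2]), int(digits[-2]), int(digits[-1], 16)
    return f"{major}.{minor}.{step}"


def env_with_override(supported_gfx, base_env=None):
    """Return a copy of the environment with the override variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = override_value(supported_gfx)
    return env
```

Pass the result as env= to subprocess.run when starting your HIP program, so the override applies only to that process rather than to the whole shell session.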
(Credit goes to wdx04 for the original post, which is in Chinese. You can read it with Google Translate from here.)