Switch from xPPTRF to xPOTRF to improve TurbSim speed on macOS #3123

Merged: andrew-platt merged 1 commit into OpenFAST:rc-4.2.1 from IrisMeasure:dev on Mar 3, 2026

IrisMeasure commented Dec 30, 2025

Feature or improvement description
This PR rewrites the subroutines LAPACK_DPPTRF and LAPACK_SPPTRF in NWTC_LAPACK.f90, replacing the packed-storage Cholesky decomposition (xPPTRF) with the full-storage Cholesky decomposition (xPOTRF). To stay compatible with existing callers, the subroutine signatures are unchanged; an internal wrapper handles the conversion between the packed and full storage formats.

This change results in a substantial speed improvement for TurbSim on macOS (total run time dropped by roughly 3.6x to 10x in the tests below), with minimal additional memory overhead.
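
The wrapper idea described above can be illustrated with a minimal sketch. It is written in plain Python rather than Fortran, handles only the upper-triangle (`UPLO = 'U'`) case, and uses a hand-written factorization as a stand-in for xPOTRF. All function names are hypothetical, and the actual code in NWTC_LAPACK.f90 may be organized differently.

```python
import math


def unpack_upper(ap, n):
    """Expand a column-major packed upper triangle into a full n x n matrix."""
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            # LAPACK packed layout (0-based): A(i, j) lives at i + j*(j+1)/2
            a[i][j] = ap[i + j * (j + 1) // 2]
    return a


def pack_upper(a, n):
    """Copy the upper triangle of a full matrix back into packed storage."""
    ap = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            ap[i + j * (j + 1) // 2] = a[i][j]
    return ap


def cholesky_upper_full(a, n):
    """Stand-in for xPOTRF with UPLO='U': overwrite the upper triangle of `a`
    with U such that A = U^T U. Returns 0 on success, or the 1-based order of
    the first leading minor that is not positive definite."""
    for j in range(n):
        d = a[j][j] - sum(a[k][j] ** 2 for k in range(j))
        if d <= 0.0:
            return j + 1
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(a[k][j] * a[k][i] for k in range(j))
            a[j][i] = (a[j][i] - s) / a[j][j]
    return 0


def pptrf_via_potrf(ap, n):
    """Packed-storage interface, full-storage factorization inside.
    Simplification in this sketch: `ap` is overwritten only on success."""
    a = unpack_upper(ap, n)            # packed -> full (temporary n*n array)
    info = cholesky_upper_full(a, n)   # factor in full storage
    if info == 0:
        ap[:] = pack_upper(a, n)       # full -> packed, in place for the caller
    return info
```

For the matrix [[4, 2], [2, 3]], the packed input is `[4.0, 2.0, 3.0]`, and a successful call overwrites it with the packed factor. The temporary full array holds n*n values instead of n(n+1)/2, which is where the extra memory in this approach comes from.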

Related issue, if one exists
#3120

Impacted areas of the software
TurbSim

Test results, if applicable
(1) macOS
I compiled TurbSim using GCC 15.2.0 with the following build flags:

BUILD_UNIT_TESTING=OFF
DOUBLE_PRECISION=OFF
VARIABLE_TRACKING=OFF

I used both versions of TurbSim to generate two .bts files: (i) a 43 x 43 grid, 120 seconds; (ii) a 23 x 23 grid, 600 seconds. The performance results on macOS 26.2 (M4 Pro) are shown below. Coh2h() is the caller of LAPACK_xPPTRF, and all times are in seconds:

(i)

| Version | Total Time | Coh2h() |
| --- | --- | --- |
| Original (SPPTRF) | 113.4 | 105.6 |
| Modified (SPOTRF) | 11.0 | 3.5 |

(ii)

| Version | Total Time | Coh2h() |
| --- | --- | --- |
| Original (SPPTRF) | 15.5 | 12.4 |
| Modified (SPOTRF) | 4.3 | 1.1 |

Furthermore, the .bts files from the two versions differ only in the metadata section, specifically at offset 0x42 ($n_{character}$) and in the associated $Character_i$ bytes (typically version information and the generation time), while the data sections that follow are identical.
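
One way to run this kind of comparison is sketched below. Because the description string can differ in length between builds, the data sections may sit at different offsets, so the sketch measures the common prefix and common suffix of the two files instead of comparing byte by byte. The function names and sample bytes are made up for illustration, and the sketch assumes the data section runs to the end of the file.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that are identical in a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def common_suffix_len(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that are identical in a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[-1 - i] == b[-1 - i]:
        i += 1
    return i


# Toy stand-ins for two files: same leading header, different description
# length and text, same trailing data. Real files would be read with
# open(path, "rb").read().
old = b"HEAD" + b"\x05" + b"v1 at" + b"DATA1234"
new = b"HEAD" + b"\x07" + b"v2 then" + b"DATA1234"

first_diff = common_prefix_len(old, new)   # offset of the first differing byte
same_tail = common_suffix_len(old, new)    # bytes of identical trailing data
```

If the first differing offset falls inside the header and the identical tail covers the whole data section, the two files differ only in metadata.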

(2) Windows
I compiled TurbSim using IFORT (from Intel oneAPI 2024.2.1) and IFX (from Intel oneAPI 2025.0.1) at the O2 optimization level. The performance results on Windows 11 24H2 (AMD 9950X) are shown below:

(i)

| Version | Total Time | Coh2h() |
| --- | --- | --- |
| IFORT + Original (SPPTRF) | 45.6 | 38.1 |
| IFORT + Modified (SPOTRF) | 40.9 | 34.1 |
| IFX + Original (SPPTRF) | 44.0 | 36.5 |
| IFX + Modified (SPOTRF) | 44.0 | 36.4 |
| Release 4.12 | 50.2 | N/A |

(ii)

| Version | Total Time | Coh2h() |
| --- | --- | --- |
| IFORT + Original (SPPTRF) | 11.6 | 8.2 |
| IFORT + Modified (SPOTRF) | 9.5 | 6.2 |
| IFX + Original (SPPTRF) | 12.5 | 9.3 |
| IFX + Modified (SPOTRF) | 11.7 | 8.9 |
| Release 4.12 | 13.5 | N/A |

After switching to SPOTRF, TurbSim on Windows is no slower in any of these runs.
Note that on Windows the .bts files generated by the two versions of TurbSim (built with the same compiler) differ slightly. For engineering purposes, however, this difference is negligible.
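
A claim like "negligible" can be made concrete with a relative-difference check on the decoded values from both files. The sketch below assumes the values have already been read into two equal-length sequences of floats. The function name and the `floor` guard against division by zero are illustrative choices, not part of TurbSim.

```python
def max_relative_difference(x, y, floor=1e-12):
    """Largest |a - b| / max(|a|, |b|, floor) over paired values in x and y."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return max(
        (abs(a - b) / max(abs(a), abs(b), floor) for a, b in zip(x, y)),
        default=0.0,
    )
```

What counts as acceptable depends on the application, so any threshold compared against this value is a judgment call for the analyst.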

andrew-platt added this to the v5.0.0 milestone Dec 31, 2025
andrew-platt changed the base branch from dev to rc-4.2.1 March 2, 2026 23:04
andrew-platt added a commit to andrew-platt/openfast that referenced this pull request Mar 2, 2026
andrew-platt added a commit to andrew-platt/openfast that referenced this pull request Mar 2, 2026
andrew-platt merged commit 39af771 into OpenFAST:rc-4.2.1 Mar 3, 2026
54 of 61 checks passed
andrew-platt added a commit that referenced this pull request Mar 3, 2026
Re-add the xPPTRF routines (rename xPOTRF routines from #3123)