bfb: Bugfix for FatesSp compiled with intel #3592

Merged
ekluzek merged 9 commits into ESCOMP:b4b-dev from mvdebolskiy:bugfix-fatessp-itype
Mar 17, 2026

@mvdebolskiy
Contributor

@mvdebolskiy mvdebolskiy commented Nov 8, 2025

Description of changes

Added assignments for the mlai* variables for the case where FATES is on but the patches/columns are not FATES ones.

Specific notes

When #2935 was merged, the only Intel tests for FatesSp used 1x1_brazil, so the error in #3507 was never encountered (1x1_brazil has only one natveg column). GNU, for some reason, does not raise a floating-point "illegal operation" exception for assignments to NaN.
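
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the CTSM Fortran: `is_fates_patch`, `fill_fates_only`, and `fill_all_patches` are hypothetical names, and both the NaN initialization and the choice of 0.0 as the fill value are assumptions made for illustration. Python also does not trap on NaN the way an Intel DEBUG build can, so the sketch only shows which entries are left unassigned.

```python
import math

# Hypothetical mask: True where a patch is handled by FATES (not a CTSM name).
is_fates_patch = [True, False, True, False]


def fill_fates_only(values):
    """Pre-fix pattern: only FATES patches are assigned, the rest stay NaN."""
    out = [math.nan] * len(values)  # assumed NaN initialization
    for p, value in enumerate(values):
        if is_fates_patch[p]:
            out[p] = value
    return out


def fill_all_patches(values):
    """Post-fix pattern: non-FATES patches get an explicit 0.0 (assumed value)."""
    out = [math.nan] * len(values)
    for p, value in enumerate(values):
        out[p] = value if is_fates_patch[p] else 0.0
    return out


lai = [1.5, 2.0, 0.7, 3.1]
before = fill_fates_only(lai)
after = fill_all_patches(lai)
print([math.isnan(x) for x in before])  # [False, True, False, True]
print(after)  # [1.5, 0.0, 0.7, 0.0]
```

With a single natveg column every entry would be covered by the first pattern, which is consistent with the 1x1_brazil tests never exposing the problem.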

Contributors other than yourself, if any:

CTSM Issues Fixed (include github issue #):

#3507, maybe others

Are answers expected to change (and if so in what way)?

They should not, but I have not run a baseline comparison yet.
ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode will now pass.

Any User Interface Changes (namelist or namelist defaults changes)?

No.

Does this create a need to change or add documentation? Did you do so?

No.

Testing performed, if any:

  ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode (Overall: PASS) details:
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode CREATE_NEWCASE
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode XML
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode SETUP
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode SHAREDLIB_BUILD time=7
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode MODEL_BUILD time=34
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode SUBMIT
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode RUN time=225
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode COMPARE_base_hybrid
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode COMPARE_base_rest
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode MEMLEAK insufficient data for memleak test
    PASS ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode SHORT_TERM_ARCHIVER
  SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen (Overall: PASS) details:
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen CREATE_NEWCASE
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen XML
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen SETUP
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen SHAREDLIB_BUILD time=89
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen MODEL_BUILD time=32
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen SUBMIT
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen RUN time=72
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen MEMLEAK insufficient data for memleak test
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen SHORT_TERM_ARCHIVER

This might be better merged into b4b-dev. @ekluzek @samsrabin, you can change the target if that's OK.

@mvdebolskiy mvdebolskiy requested review from ekluzek and rgknox November 8, 2025 16:17
@mvdebolskiy mvdebolskiy linked an issue Nov 8, 2025 that may be closed by this pull request
@mvdebolskiy mvdebolskiy self-assigned this Nov 8, 2025
@ekluzek ekluzek moved this to In progress - master in CTSM: Upcoming tags Nov 10, 2025
@mvdebolskiy
Contributor Author

mvdebolskiy commented Nov 10, 2025

@ekluzek
I think it can go into b4b-dev. I've tested against baselines generated by running the same tests with ctsm5.3.084:

mvdebolskiy@derecho6:/glade/derecho/scratch/mvdebolskiy/deftst> ./cs.status.dev_084  | grep BASELINE
    PASS SMS_D_Ld2.f45_f45_mg37.I1850Clm60Sp.derecho_intel BASELINE fdef_084:
    PASS SMS_D_Ld2.f45_f45_mg37.I2000Clm60BgcCrop.derecho_intel.clm-default BASELINE fdef_084:
    FAIL SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen BASELINE fdef_084: ERROR BFAIL some baseline files were missing
    PASS SMS_D_Ld5.f45_f45_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp BASELINE fdef_084:

I cannot run any GNU tests, though.
SMS_D_Ld2.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_intel.clm-FatesColdSatPhen obviously fails with ctsm5.3.084, hence the missing baseline files.

@mvdebolskiy mvdebolskiy changed the base branch from master to b4b-dev November 10, 2025 18:31
@mvdebolskiy
Contributor Author

I've also removed:
ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_gnu.clm-FatesColdSatPhenCamLndTuningMode
associated with #3496, since it was fixed in NGEET/fates#1397 (sci.1.87.4_api.41.0.0) and we are already checking out sci.1.88.0_api.42.0.0.

@wwieder
Contributor

wwieder commented Nov 13, 2025

@ekluzek will do testing to see if this can go to b4b-dev.

@ekluzek ekluzek self-assigned this Nov 13, 2025
@ekluzek ekluzek moved this from In progress - master to In progress - b4b-dev in CTSM: Upcoming tags Nov 13, 2025
@ekluzek
Contributor

ekluzek commented Nov 13, 2025

Discussed this morning. We should also make sure that we have an Intel DEBUG test for a global grid, and not just the Intel DEBUG 1x1_brazil test we have now.

Contributor

@rgknox rgknox left a comment

These look fine to me. As we discussed in the SE meeting, you shouldn't need the fates patch filter. The zeroing outside the column filter should be the meaningful change. That being said, I have no problems with this.

@github-project-automation github-project-automation Bot moved this from In progress - b4b-dev to In progress - master in CTSM: Upcoming tags Nov 13, 2025
@ekluzek ekluzek moved this from In progress - master to In progress - b4b-dev in CTSM: Upcoming tags Nov 13, 2025
Contributor

@ekluzek ekluzek left a comment

This is great. I'm glad you caught this and are bringing it in.

@github-project-automation github-project-automation Bot moved this from In progress - b4b-dev to In progress - master in CTSM: Upcoming tags Nov 13, 2025
@ekluzek ekluzek added the b4b bit-for-bit label Nov 13, 2025
@ekluzek ekluzek moved this from Todo to In Progress in LMWG: Sprint Planning Board Nov 13, 2025
@wwieder wwieder moved this from In progress - master to In progress - b4b-dev in CTSM: Upcoming tags Nov 20, 2025
@ekluzek ekluzek moved this from In Progress to Todo in LMWG: Sprint Planning Board Feb 3, 2026
@ekluzek ekluzek changed the title Bugfix for FatesSp compiled with intel bfb: Bugfix for FatesSp compiled with intel Mar 13, 2026
@ekluzek
Contributor

ekluzek commented Mar 17, 2026

Testing is as expected, with the exception of the following showing differences:

ERI_D_Ld9.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdSatPhenCamLndTuningMode
ERP_D_P64x2_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-flexCN_FUN--clm-matrixcnOn_ignore_warnings
ERP_D_P64x2_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-noFUN_flexCN--clm-matrixcnOn_ignore_warnings

The first one is due to the baselines not being complete, so that's fine.

The next two are actually fine, because those are screwy, unreliable tests of CN matrix with threading. So I think the best answer is to remove them from the testlist. Or I could mark them like this one...

ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings EXPECTED POSSIBILITY

@ekluzek ekluzek merged commit 0c06f53 into ESCOMP:b4b-dev Mar 17, 2026
4 checks passed
@github-project-automation github-project-automation Bot moved this from In progress - b4b-dev to Done (non release/external) in CTSM: Upcoming tags Mar 17, 2026
@ekluzek ekluzek deleted the bugfix-fatessp-itype branch March 17, 2026 21:29
@slevis-lmwg slevis-lmwg restored the bugfix-fatessp-itype branch March 27, 2026 18:20
@slevis-lmwg
Contributor

Trying a test that fails in #3894:
./create_test SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_intel.clm-FatesFireLightningPopDens--clm-NEON-FATES-NIWO -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.025

@slevis-lmwg
Contributor

The above test gives the same error here as in #3894:

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source       
libc.so.6          00001527E1647900  Unknown               Unknown  Unknown
libesmf.so         00001527E3C4B616  _ZN5ESMCI4BBoxC1E     Unknown  Unknown
libesmf.so         00001527E40CF410  _ZN5ESMCI16OctSea     Unknown  Unknown
libesmf.so         00001527E40D1568  _ZN5ESMCI9OctSear     Unknown  Unknown
libesmf.so         00001527E3EC9F01  _ZN5ESMCI6InterpC     Unknown  Unknown
libesmf.so         00001527E4039978  _ZN5ESMCI6regridE     Unknown  Unknown
libesmf.so         00001527E40783B8  _Z19ESMCI_regrid_     Unknown  Unknown
libesmf.so         00001527E3FF8EC3  _ZN5ESMCI7MeshCap     Unknown  Unknown
libesmf.so         00001527E40C16B2  c_esmc_regrid_cre     Unknown  Unknown
libesmf.so         00001527E4774A50  esmf_regridmod_mp     Unknown  Unknown
libesmf.so         00001527E44E7578  esmf_fieldregridm     Unknown  Unknown
cesm.exe           00000000037EC7CC  shr_strdata_init          639  dshr_strdata_mod.F90
cesm.exe           00000000037E0632  shr_strdata_init_         347  dshr_strdata_mod.F90
cesm.exe           000000000372A30B  init                      115  ch4FInundatedStreamType.F90
cesm.exe           00000000030AEF75  init                      241  ch4Mod.F90
cesm.exe           0000000000936C3B  clm_instinit              396  clm_instMod.F90
cesm.exe           000000000092917C  initialize2               419  clm_initializeMod.F90
cesm.exe           000000000085A034  initializerealize         677  lnd_comp_nuopc.F90

My current thinking is that I should back out the changes from this PR that got merged into b4b-dev.

@slevis-lmwg
Contributor

Oh, except that I get the same error in Erik's PR here #3822, so I may need to brainstorm with Erik...

@slevis-lmwg slevis-lmwg deleted the bugfix-fatessp-itype branch March 27, 2026 18:55
@slevis-lmwg
Contributor

Ok, the problem appears in 024 as documented in the ChangeLog.

Labels

b4b bit-for-bit

Projects

Status: Done (non release/external)
Status: Done

Development

Successfully merging this pull request may close these issues.

ERI RUN test failure with FatesColdSatPhenCamLndTuningMode

5 participants