GK sheath BC: adding a mu dependence to parallel cut velocity by Antoinehoff · Pull Request #960 · ammarhakim/gkeyll

Antoinehoff · 2026-02-25T21:00:25Z

In this PR, we implement the ability to apply a conducting sheath boundary condition (BC) with a $\mu$-dependent parallel velocity cut (see DR #957 for more details). This PR also includes and improves the sheath BC unit test presented in PR #956. The kernels are generated with the branch of this Gkylcas PR

Design choice

The main design decision is to encode the $\mu$ dependence in a non-dimensional factor $\alpha(\mu)$ such that,

$$v_{\parallel \text{cut}}^2(\mu) = \alpha(\mu) v_{\parallel \text{cut} 0}^2$$

where $v_{\parallel \text{cut} 0}$ is the currently implemented cutting parallel velocity. This allows to preserve the existing code structure, leaving the cut velocity evaluation at the kernel level.

Implementation details

The main changes are mainly centered around the sheath BC updater gkyl_bc_sheath_gyrokinetic. A new DG array is added to the updater structure,

  struct gkyl_array *alpha_mu; // factor for mu dependent vcut in sheath BC.

to store the $\alpha(\mu)$ factor as a 2D $(v_\parallel,\mu)$ GK hybrid DG array. The use of a 2D representation simplifies the kernel generation by making the nodal evaluation more straightforward.

The updater creator, gkyl_bc_sheath_gyrokinetic_new, sets alpha_mu = 1 by default, which yields the current state of the sheath BC implementation.
One can manipulate alpha_mu with the new function gkyl_bc_sheath_gyrokinetic_set_alpha_mu to copy an existing array into alpha_mu.

Demonstration

We run gyrokinetic/unit/ctest_bc_twistshift.c which sets up Maxwellian distribution functions in 1x2v, 2x2v, and 3x2v domains. This test also sets a $\mu$-dependent parallel cut velocity as

void
eval_func_alpha_mu(double t, const double *xn, double* GKYL_RESTRICT fout, void *ctx)
{
  double vpar = xn[0], mu = xn[1];

  double Lmu = 1.0; // Characteristic scale length in mu direction.
  double alpha_mu_0 = 1.0;

  double alpha_mu = alpha_mu_0 * (1 + exp(-mu/Lmu));
  fout[0] =  pow(alpha_mu, 2);
}

which adds an exponential decaying term to the cutting velocity. This function is projected on a DG GK hybrid 2v basis:

  gkyl_proj_on_basis *projalphamu = gkyl_proj_on_basis_inew( &(struct gkyl_proj_on_basis_inp) {
      .grid = &grid_vel->grid_vel,
      .basis = basis_vel,
      .num_ret_vals = 1,
      .eval = eval_func_alpha_mu,
      .ctx = NULL,
    }
  );
  gkyl_proj_on_basis_advance(projalphamu, 0.0, &grid_vel->local_vel, alpha_mu);
  gkyl_proj_on_basis_release(projalphamu);

where the grid_vel is the same as the one that define the bc_sheath updater and the basis_vel is obtained using a new bc_sheath interface

gkyl_bc_sheath_gyrokinetic_get_alpha_mu_basis(bcsheath, &basis_vel);

We then copy our alpha_mu array to the bc_sheath updater using

gkyl_bc_sheath_gyrokinetic_set_alpha_mu(bcsheath, alpha_mu);

We run and plot the resulting distribution functions using this script: bc_sheath_test.sh, which produces figures like:

Note that we varied the Maxwellian parallel flow depending on the dimensionality, specifically $u_\parallel=0$ for 1x2v (bottom row), $u_\parallel=v_{th}$ for 2x2v (middle row), and $u_\parallel=-v_{th}$ for 3x2v (top row).

manauref · 2026-05-14T23:08:13Z

in the description above you say
"We run gyrokinetic/unit/ctest_bc_twistshift.c"
is this a typo?

manauref

Looks mostly benign. I have two higher level questions:

Have you measured whether there are any performance changes in say TCV IWL 3x2v sims with these new kernels?
Hare you thought about whether there's any difference in vcut_fact being discretized with project_on_basis or with eval_on_nodes? i.e. is there any benefit/disadvantage to vcut_fact being continuous along mu?

…(` and comment out the print statement in the unit test for bc sheath

…ction for clarity

…is one. I cherry picked the code and the ctest runs fine on GPU now.

Antoinehoff · 2026-05-15T15:16:02Z

Looks mostly benign. I have two higher level questions:

Have you measured whether there are any performance changes in say TCV IWL 3x2v sims with these new kernels?

You can find below two performance comparison tests using the TCV 3x2v setup. In short, I see 1% cost increase on a pessimistic 1CPU scenario, and no changes in production like TCV.

on 1 CPU for the TCV 3x2v regression test

We use an adapted resolution (18x8x4x8x8) to create a worst case scenario (note: the shift function is halved to make it terminate). There is a slight cost increase of the BC evaluation in this scenario ~1%.

main run

ahoffman@ahoffman-lt rt_gk_tcv.o % ./rt_gk_tcv_iwl_adapt_source_3x2v_p1
        Step 2 at time 3.1526187e-08.  Time-step  1.576385e-08.  Completed 10%
...
        Step 13 at time 2.0498352e-07.  Time-step  1.576927e-08.  Completed 100%

Number of update calls 13
Number of forward-Euler calls 39
...
Timing:
  - Time loop:                         1.4562e+01 sec.
    * Forward Euler:                   1.1978e+01 sec. / 82.25 %.
...
    * Field solves:                    1.2665e-01 sec. / 0.87 %.
....
    * Boundary conditions::            5.2016e-01 sec. / 3.57 %.
....

Mu-dependent vcut fact

ahoffman@ahoffman-lt rt_gk_tcv.o % ./rt_gk_tcv_iwl_adapt_source_3x2v_p1
        Step 2 at time 3.1526187e-08.  Time-step  1.576385e-08.  Completed 10%
...
        Step 13 at time 2.0498361e-07.  Time-step  1.576928e-08.  Completed 100%

Number of update calls 13
Number of forward-Euler calls 39
...
Timing:
  - Time loop:                         1.4786e+01 sec.
    * Forward Euler:                   1.2048e+01 sec. / 81.48 %.
...
    * Field solves:                    1.2537e-01 sec. / 0.85 %.
...
    * Boundary conditions::            6.7842e-01 sec. / 4.59 %.
...

1 A100 GPU with coarse production TCV 3x2v

Here is a test with 100 steps of the coarse production like tcv setup (resolution 24x16x12x12x8) on one A100 GPU.
We do not see any difference in terms of BC evaluation cost here. So I think the mu-dependent factor has negligible effects in a production scenario.

main

...
Starting main loop ...
Taking time-step at t = 0 mus ... dt = 0.0050033 mus
Taking time-step at t = 0.474264 mus ... dt = 0.00480274 mus
 ... finished

Number of update calls 100
Number of forward-Euler calls 391
Number of RK stage-2 failures 45
Max rel dt diff for RK stage-2 failures 0.192559
Min rel dt diff for RK stage-2 failures 0.0112318
Number of RK stage-3 failures 1
Number of write calls 4
Timing:
  - Time loop:                         6.5325e+01 sec.
    * Forward Euler:                   4.4306e+01 sec. / 67.82 %.
...
    * Field solves:                    1.1957e+01 sec. / 18.30 %.
...
    * Boundary conditions::            3.0336e+00 sec. / 4.64 %.
...

this branch

...
Starting main loop ...
Taking time-step at t = 0 mus ... dt = 0.0050033 mus
Taking time-step at t = 0.474244 mus ... dt = 0.00480602 mus
 ... finished

Number of update calls 100
Number of forward-Euler calls 391
Number of RK stage-2 failures 45
Max rel dt diff for RK stage-2 failures 0.184554
Min rel dt diff for RK stage-2 failures 0.0120736
Number of RK stage-3 failures 1
Number of write calls 4
Timing:
  - Time loop:                         6.5500e+01 sec.
    * Forward Euler:                   4.4268e+01 sec. / 67.59 %.
...
    * Field solves:                    1.2070e+01 sec. / 18.43 %.
...
    * Boundary conditions::            3.0237e+00 sec. / 4.62 %.
...

Antoinehoff · 2026-05-15T15:21:33Z

Hare you thought about whether there's any difference in vcut_fact being discretized with project_on_basis or with eval_on_nodes? i.e. is there any benefit/disadvantage to vcut_fact being continuous along mu?

So we do not require to be continuous along the mu direction so I don't see why we would u se eval_on_nodes. It would just lead to a poorer interpolation of the cutting velocity curve which is already coarse because of our usual low mu resolution.

Antoinehoff · 2026-05-15T18:51:28Z

The last commit was tested and it is valgrind clean using (note that it takes a few minutes to run this)

valgrind --leak-check=full --max-stackframe=10000000 -s ./rt_gk_tcv_adapt_src_3x2v_p1 -s 1

and 2 GPUs compute-sanitizer clean (except Nvidia communicator errors that are not related to this PR) using

srun -N 1 --ntasks=2 --gpus-per-task=1 --gpu-bind=closest compute-sanitizer --tool memcheck --leak-check full ./rt_gk_tcv_adapt_src_3x2v_p1 -s 1

JunoRavin · 2026-05-19T14:28:47Z

I have a structural question for how the vcut indexing working. The description of the algorithm looks like alpha is implicitly a function of (x,y) too (or you define a per-configuration-space alpha coefficient). How does this alpha projection fold into this additional indexing per (x,y) cell?

Antoinehoff · 2026-05-19T14:33:08Z

I have a structural question for how the vcut indexing working. The description of the algorithm looks like alpha is implicitly a function of (x,y) too (or you define a per-configuration-space alpha coefficient). How does this alpha projection fold into this additional indexing per (x,y) cell?

In the code alpha = alpha(x,y,mu), we have a gkyl array with this "weird" cdim - 1 + vdim - 1 dimensionality that is stored in vcut_fact_dim like here https://github.com/ammarhakim/gkeyll/pull/960/changes#diff-2700385dae5201f7645452699f0afc26973516db9777729dcf61da738eb76acfR37

Does this answer your question @JunoRavin ?

JunoRavin · 2026-05-19T15:12:25Z

It looks like this particular PR is a minimal working example of vcut, because my guess is that alpha(x,y,mu) is provided by GYRAZE in production? Like you don't make alpha within the bc_sheath_gyrokinetic. You have helper functions for acquiring the pointer to the vcut array, but you make the ranges for indexing vcut within this function? It's hard to trace the program flow for where alpha is defined because it looks like bc_sheath_gyrokinetic is managing some of the memory, but the computation of vcut must be external to bc_sheath_gyrokinetic? There's a question about why the vcut allocation and range initialization is handled by bc_sheath_gyrokinetic if the manipulation of this memory is external to this object (like you could treat vcut like how we treat auxiliary arrays for the DG gyrokinetic equation object; you could just pass a pointer to the vcut array and the vcut range to bc_sheath_gyrokinetic at t=0, and then let some other updater update vcut in the same way you have a different updater updater Apar/phi/etc.)

manauref · 2026-05-19T15:20:20Z

in retrospect i think this should merge into the aisheath PR, not main

Antoinehoff · 2026-05-19T15:39:40Z

It looks like this particular PR is a minimal working example of vcut, because my guess is that alpha(x,y,mu) is provided by GYRAZE in production? Like you don't make alpha within the bc_sheath_gyrokinetic. You have helper functions for acquiring the pointer to the vcut array, but you make the ranges for indexing vcut within this function? It's hard to trace the program flow for where alpha is defined because it looks like bc_sheath_gyrokinetic is managing some of the memory, but the computation of vcut must be external to bc_sheath_gyrokinetic? There's a question about why the vcut allocation and range initialization is handled by bc_sheath_gyrokinetic if the manipulation of this memory is external to this object (like you could treat vcut like how we treat auxiliary arrays for the DG gyrokinetic equation object; you could just pass a pointer to the vcut array and the vcut range to bc_sheath_gyrokinetic at t=0, and then let some other updater update vcut in the same way you have a different updater updater Apar/phi/etc.)

Ok i get it better now. Indeed it is not needed to allocate a vcut array at all and instead just pass a function to evaluate vcut(x,y,mu) returning a double. The vcut array idea started because we wanted to maybe update the vcut curve every n step to avoid expensive computations. I will remove this array later in the aisheath branch.

Antoinehoff marked this pull request as ready for review February 26, 2026 15:55

Antoinehoff requested review from JunoRavin and manauref February 26, 2026 15:55

Antoinehoff mentioned this pull request Feb 26, 2026

Gk sheath bc with mu dependent vcut ammarhakim/gkylcas#95

Open

Antoinehoff mentioned this pull request Mar 23, 2026

Coupling with an AI model for sheath BC in GK simulations #982

Open

manauref requested changes May 14, 2026

View reviewed changes

Antoinehoff added 4 commits May 15, 2026 09:26

Update the kernels with float(expand( instead of `float(fullratsimp…

82efe9b

…(` and comment out the print statement in the unit test for bc sheath

add temp variable for vcut_fact dimensions and rename acquisition fun…

6025a4e

…ction for clarity

Merge remote-tracking branch 'origin/main' into gk_sheath_vcut_mu_dep

2a0a198

The GPU implementation was done on the sheath_ai branch but not on th…

4d8fb31

…is one. I cherry picked the code and the ctest runs fine on GPU now.

Antoinehoff requested a review from manauref May 15, 2026 15:21

define and use vcut_fact_dim instead of cdim

b65b451

manauref approved these changes May 15, 2026

View reviewed changes

Antoinehoff requested a review from akashukla May 18, 2026 12:55

Antoinehoff changed the base branch from main to gk_ai_sheath May 19, 2026 15:23

Antoinehoff merged commit a161441 into gk_ai_sheath May 19, 2026
1 check passed

Conversation

Antoinehoff commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design choice

Implementation details

Demonstration

Uh oh!

manauref commented May 14, 2026

Uh oh!

manauref left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Antoinehoff commented May 15, 2026

on 1 CPU for the TCV 3x2v regression test

main run

Mu-dependent vcut fact

1 A100 GPU with coarse production TCV 3x2v

main

Uh oh!

Antoinehoff commented May 15, 2026

Uh oh!

Antoinehoff commented May 15, 2026

Uh oh!

JunoRavin commented May 19, 2026

Uh oh!

Antoinehoff commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JunoRavin commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

manauref commented May 19, 2026

Uh oh!

Uh oh!

Antoinehoff commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Antoinehoff commented Feb 25, 2026 •

edited

Loading

Antoinehoff commented May 19, 2026 •

edited

Loading

JunoRavin commented May 19, 2026 •

edited

Loading