Skip to content

GK sheath BC: adding a mu dependence to parallel cut velocity#960

Merged
Antoinehoff merged 5 commits into
gk_ai_sheathfrom
gk_sheath_vcut_mu_dep
May 19, 2026
Merged

GK sheath BC: adding a mu dependence to parallel cut velocity#960
Antoinehoff merged 5 commits into
gk_ai_sheathfrom
gk_sheath_vcut_mu_dep

Conversation

@Antoinehoff
Copy link
Copy Markdown
Collaborator

@Antoinehoff Antoinehoff commented Feb 25, 2026

In this PR, we implement the ability to apply a conducting sheath boundary condition (BC) with a $\mu$-dependent parallel velocity cut (see DR #957 for more details). This PR also includes and improves the sheath BC unit test presented in PR #956. The kernels are generated with the branch of this Gkylcas PR

Design choice

The main design decision is to encode the $\mu$ dependence in a non-dimensional factor $\alpha(\mu)$ such that,

$$v_{\parallel \text{cut}}^2(\mu) = \alpha(\mu) v_{\parallel \text{cut} 0}^2$$

where $v_{\parallel \text{cut} 0}$ is the currently implemented cutting parallel velocity. This allows to preserve the existing code structure, leaving the cut velocity evaluation at the kernel level.

Implementation details

The main changes are mainly centered around the sheath BC updater gkyl_bc_sheath_gyrokinetic. A new DG array is added to the updater structure,

  struct gkyl_array *alpha_mu; // factor for mu dependent vcut in sheath BC.

to store the $\alpha(\mu)$ factor as a 2D $(v_\parallel,\mu)$ GK hybrid DG array. The use of a 2D representation simplifies the kernel generation by making the nodal evaluation more straightforward.

The updater creator, gkyl_bc_sheath_gyrokinetic_new, sets alpha_mu = 1 by default, which yields the current state of the sheath BC implementation.
One can manipulate alpha_mu with the new function gkyl_bc_sheath_gyrokinetic_set_alpha_mu to copy an existing array into alpha_mu.

Demonstration

We run gyrokinetic/unit/ctest_bc_twistshift.c which sets up Maxwellian distribution functions in 1x2v, 2x2v, and 3x2v domains. This test also sets a $\mu$-dependent parallel cut velocity as

void
eval_func_alpha_mu(double t, const double *xn, double* GKYL_RESTRICT fout, void *ctx)
{
  double vpar = xn[0], mu = xn[1];

  double Lmu = 1.0; // Characteristic scale length in mu direction.
  double alpha_mu_0 = 1.0;

  double alpha_mu = alpha_mu_0 * (1 + exp(-mu/Lmu));
  fout[0] =  pow(alpha_mu, 2);
}

which adds an exponential decaying term to the cutting velocity. This function is projected on a DG GK hybrid 2v basis:

  gkyl_proj_on_basis *projalphamu = gkyl_proj_on_basis_inew( &(struct gkyl_proj_on_basis_inp) {
      .grid = &grid_vel->grid_vel,
      .basis = basis_vel,
      .num_ret_vals = 1,
      .eval = eval_func_alpha_mu,
      .ctx = NULL,
    }
  );
  gkyl_proj_on_basis_advance(projalphamu, 0.0, &grid_vel->local_vel, alpha_mu);
  gkyl_proj_on_basis_release(projalphamu);

where the grid_vel is the same as the one that define the bc_sheath updater and the basis_vel is obtained using a new bc_sheath interface

gkyl_bc_sheath_gyrokinetic_get_alpha_mu_basis(bcsheath, &basis_vel);

We then copy our alpha_mu array to the bc_sheath updater using

gkyl_bc_sheath_gyrokinetic_set_alpha_mu(bcsheath, alpha_mu);

We run and plot the resulting distribution functions using this script: bc_sheath_test.sh, which produces figures like:

Screenshot 2026-02-26 at 10 17 57 AM

Note that we varied the Maxwellian parallel flow depending on the dimensionality, specifically $u_\parallel=0$ for 1x2v (bottom row), $u_\parallel=v_{th}$ for 2x2v (middle row), and $u_\parallel=-v_{th}$ for 3x2v (top row).

@manauref
Copy link
Copy Markdown
Collaborator

in the description above you say
"We run gyrokinetic/unit/ctest_bc_twistshift.c"
is this a typo?

Copy link
Copy Markdown
Collaborator

@manauref manauref left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly benign. I have two higher level questions:

  1. Have you measured whether there are any performance changes in say TCV IWL 3x2v sims with these new kernels?
  2. Hare you thought about whether there's any difference in vcut_fact being discretized with project_on_basis or with eval_on_nodes? i.e. is there any benefit/disadvantage to vcut_fact being continuous along mu?

Comment thread gyrokinetic/ker/bc_sheath_gyrokinetic/bc_sheath_gyrokinetic_ser_p1.c Outdated
Comment thread gyrokinetic/unit/ctest_bc_sheath_gyrokinetic.c
Comment thread gyrokinetic/zero/bc_sheath_gyrokinetic.c Outdated
Comment thread gyrokinetic/zero/bc_sheath_gyrokinetic.c Outdated
Comment thread gyrokinetic/zero/bc_sheath_gyrokinetic.c Outdated
Comment thread gyrokinetic/zero/bc_sheath_gyrokinetic.c Outdated
Comment thread gyrokinetic/zero/gkyl_bc_sheath_gyrokinetic.h Outdated
@Antoinehoff
Copy link
Copy Markdown
Collaborator Author

Looks mostly benign. I have two higher level questions:

  1. Have you measured whether there are any performance changes in say TCV IWL 3x2v sims with these new kernels?

You can find below two performance comparison tests using the TCV 3x2v setup. In short, I see 1% cost increase on a pessimistic 1CPU scenario, and no changes in production like TCV.

on 1 CPU for the TCV 3x2v regression test

We use an adapted resolution (18x8x4x8x8) to create a worst case scenario (note: the shift function is halved to make it terminate). There is a slight cost increase of the BC evaluation in this scenario ~1%.

main run

ahoffman@ahoffman-lt rt_gk_tcv.o % ./rt_gk_tcv_iwl_adapt_source_3x2v_p1
        Step 2 at time 3.1526187e-08.  Time-step  1.576385e-08.  Completed 10%
...
        Step 13 at time 2.0498352e-07.  Time-step  1.576927e-08.  Completed 100%

Number of update calls 13
Number of forward-Euler calls 39
...
Timing:
  - Time loop:                         1.4562e+01 sec.
    * Forward Euler:                   1.1978e+01 sec. / 82.25 %.
...
    * Field solves:                    1.2665e-01 sec. / 0.87 %.
....
    * Boundary conditions::            5.2016e-01 sec. / 3.57 %.
....

Mu-dependent vcut fact

ahoffman@ahoffman-lt rt_gk_tcv.o % ./rt_gk_tcv_iwl_adapt_source_3x2v_p1
        Step 2 at time 3.1526187e-08.  Time-step  1.576385e-08.  Completed 10%
...
        Step 13 at time 2.0498361e-07.  Time-step  1.576928e-08.  Completed 100%

Number of update calls 13
Number of forward-Euler calls 39
...
Timing:
  - Time loop:                         1.4786e+01 sec.
    * Forward Euler:                   1.2048e+01 sec. / 81.48 %.
...
    * Field solves:                    1.2537e-01 sec. / 0.85 %.
...
    * Boundary conditions::            6.7842e-01 sec. / 4.59 %.
...

1 A100 GPU with coarse production TCV 3x2v

Here is a test with 100 steps of the coarse production like tcv setup (resolution 24x16x12x12x8) on one A100 GPU.
We do not see any difference in terms of BC evaluation cost here. So I think the mu-dependent factor has negligible effects in a production scenario.

main

...
Starting main loop ...
Taking time-step at t = 0 mus ... dt = 0.0050033 mus
Taking time-step at t = 0.474264 mus ... dt = 0.00480274 mus
 ... finished

Number of update calls 100
Number of forward-Euler calls 391
Number of RK stage-2 failures 45
Max rel dt diff for RK stage-2 failures 0.192559
Min rel dt diff for RK stage-2 failures 0.0112318
Number of RK stage-3 failures 1
Number of write calls 4
Timing:
  - Time loop:                         6.5325e+01 sec.
    * Forward Euler:                   4.4306e+01 sec. / 67.82 %.
...
    * Field solves:                    1.1957e+01 sec. / 18.30 %.
...
    * Boundary conditions::            3.0336e+00 sec. / 4.64 %.
...

this branch

...
Starting main loop ...
Taking time-step at t = 0 mus ... dt = 0.0050033 mus
Taking time-step at t = 0.474244 mus ... dt = 0.00480602 mus
 ... finished

Number of update calls 100
Number of forward-Euler calls 391
Number of RK stage-2 failures 45
Max rel dt diff for RK stage-2 failures 0.184554
Min rel dt diff for RK stage-2 failures 0.0120736
Number of RK stage-3 failures 1
Number of write calls 4
Timing:
  - Time loop:                         6.5500e+01 sec.
    * Forward Euler:                   4.4268e+01 sec. / 67.59 %.
...
    * Field solves:                    1.2070e+01 sec. / 18.43 %.
...
    * Boundary conditions::            3.0237e+00 sec. / 4.62 %.
...

@Antoinehoff
Copy link
Copy Markdown
Collaborator Author

  1. Hare you thought about whether there's any difference in vcut_fact being discretized with project_on_basis or with eval_on_nodes? i.e. is there any benefit/disadvantage to vcut_fact being continuous along mu?

So we do not require to be continuous along the mu direction so I don't see why we would u se eval_on_nodes. It would just lead to a poorer interpolation of the cutting velocity curve which is already coarse because of our usual low mu resolution.

@Antoinehoff Antoinehoff requested a review from manauref May 15, 2026 15:21
@Antoinehoff
Copy link
Copy Markdown
Collaborator Author

The last commit was tested and it is valgrind clean using (note that it takes a few minutes to run this)

valgrind --leak-check=full --max-stackframe=10000000 -s ./rt_gk_tcv_adapt_src_3x2v_p1 -s 1

and 2 GPUs compute-sanitizer clean (except Nvidia communicator errors that are not related to this PR) using

srun -N 1 --ntasks=2 --gpus-per-task=1 --gpu-bind=closest compute-sanitizer --tool memcheck --leak-check full ./rt_gk_tcv_adapt_src_3x2v_p1 -s 1

@Antoinehoff Antoinehoff requested a review from akashukla May 18, 2026 12:55
@JunoRavin
Copy link
Copy Markdown
Collaborator

I have a structural question for how the vcut indexing working. The description of the algorithm looks like alpha is implicitly a function of (x,y) too (or you define a per-configuration-space alpha coefficient). How does this alpha projection fold into this additional indexing per (x,y) cell?

@Antoinehoff
Copy link
Copy Markdown
Collaborator Author

Antoinehoff commented May 19, 2026

I have a structural question for how the vcut indexing working. The description of the algorithm looks like alpha is implicitly a function of (x,y) too (or you define a per-configuration-space alpha coefficient). How does this alpha projection fold into this additional indexing per (x,y) cell?

In the code alpha = alpha(x,y,mu), we have a gkyl array with this "weird" cdim - 1 + vdim - 1 dimensionality that is stored in vcut_fact_dim like here https://github.com/ammarhakim/gkeyll/pull/960/changes#diff-2700385dae5201f7645452699f0afc26973516db9777729dcf61da738eb76acfR37

Does this answer your question @JunoRavin ?

@JunoRavin
Copy link
Copy Markdown
Collaborator

JunoRavin commented May 19, 2026

It looks like this particular PR is a minimal working example of vcut, because my guess is that alpha(x,y,mu) is provided by GYRAZE in production? Like you don't make alpha within the bc_sheath_gyrokinetic. You have helper functions for acquiring the pointer to the vcut array, but you make the ranges for indexing vcut within this function? It's hard to trace the program flow for where alpha is defined because it looks like bc_sheath_gyrokinetic is managing some of the memory, but the computation of vcut must be external to bc_sheath_gyrokinetic? There's a question about why the vcut allocation and range initialization is handled by bc_sheath_gyrokinetic if the manipulation of this memory is external to this object (like you could treat vcut like how we treat auxiliary arrays for the DG gyrokinetic equation object; you could just pass a pointer to the vcut array and the vcut range to bc_sheath_gyrokinetic at t=0, and then let some other updater update vcut in the same way you have a different updater updater Apar/phi/etc.)

@manauref
Copy link
Copy Markdown
Collaborator

in retrospect i think this should merge into the aisheath PR, not main

@Antoinehoff Antoinehoff changed the base branch from main to gk_ai_sheath May 19, 2026 15:23
@Antoinehoff Antoinehoff merged commit a161441 into gk_ai_sheath May 19, 2026
1 check passed
@Antoinehoff
Copy link
Copy Markdown
Collaborator Author

It looks like this particular PR is a minimal working example of vcut, because my guess is that alpha(x,y,mu) is provided by GYRAZE in production? Like you don't make alpha within the bc_sheath_gyrokinetic. You have helper functions for acquiring the pointer to the vcut array, but you make the ranges for indexing vcut within this function? It's hard to trace the program flow for where alpha is defined because it looks like bc_sheath_gyrokinetic is managing some of the memory, but the computation of vcut must be external to bc_sheath_gyrokinetic? There's a question about why the vcut allocation and range initialization is handled by bc_sheath_gyrokinetic if the manipulation of this memory is external to this object (like you could treat vcut like how we treat auxiliary arrays for the DG gyrokinetic equation object; you could just pass a pointer to the vcut array and the vcut range to bc_sheath_gyrokinetic at t=0, and then let some other updater update vcut in the same way you have a different updater updater Apar/phi/etc.)

Ok i get it better now. Indeed it is not needed to allocate a vcut array at all and instead just pass a function to evaluate vcut(x,y,mu) returning a double. The vcut array idea started because we wanted to maybe update the vcut curve every n step to avoid expensive computations. I will remove this array later in the aisheath branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants