Using Titan and Chester

Fun with batch queues

When submitting your batch jobs, make sure to specify the account as TRN001 and not trn001. Our reservation is for the account TRN001, and the name is case-sensitive. Normally qsub converts the account name to uppercase before handing the job off to the scheduler, but that doesn't appear to happen on Chester.
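One way to keep a lowercase account name from slipping through is to normalize it in your own submission script before calling qsub. The following is a minimal sketch, not part of any OLCF tooling: the helper name normalize_account and the script name job.pbs are hypothetical.

```shell
# Hypothetical helper: upper-case an account name before passing it to qsub.
normalize_account() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

ACCOUNT=$(normalize_account "trn001")
echo "Submitting with account: $ACCOUNT"

# Example use (job.pbs is a placeholder script name):
#   qsub -A "$ACCOUNT" job.pbs
```

The qsub call is left as a comment so the snippet can be tried on any machine without a scheduler.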

Run small "interactive" jobs on Chester:

Request an interactive session:

    qsub -I -l nodes=1,walltime=3600 -A TRN001 -V

Once you get the prompt, run your test. You will need to run aprun from $MEMBERWORK/trn001. Don't forget the TRN001 or you won't be able to run.

If you get an "illegal instruction" error, this probably means that you forgot to launch your program with aprun.

If you get an error from aprun like "unable to change directory", this probably means you're running from your home directory rather than from Lustre.
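A quick check at the top of a job script can catch the wrong-directory case before aprun does. This is only a sketch under the assumption that $MEMBERWORK is set in your environment and that a plain path-prefix comparison is good enough; the helper name is_under is hypothetical, and the comparison does not resolve symbolic links.

```shell
# Hypothetical helper: succeed only if directory $1 is inside (or equal to) directory $2.
is_under() {
    # An empty parent (e.g. MEMBERWORK unset) never matches.
    [ -n "$2" ] || return 1
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

if is_under "$PWD" "${MEMBERWORK:-}"; then
    echo "OK: running from $PWD"
else
    echo "Warning: $PWD is not under \$MEMBERWORK; aprun may be unable to change directory" >&2
fi
```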

Using the module command from scripts:

Source /opt/modules/default/init/{bash,csh,sh,zsh} or $MODULESHOME/init/{bash,csh,sh,zsh}, picking the file that matches your script's shell. Either should work; the second has the benefit of portability between systems (since different systems install modules in different locations).
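As an illustration, a script could wrap this in a small function that prefers $MODULESHOME and falls back to the /opt/modules/default path mentioned above. This is a sketch only; the function name init_modules is hypothetical, and the fallback path may not exist on other systems.

```shell
# Hypothetical helper: make the `module` command available inside a script.
# $1 is the shell flavor of the init file to source (bash, sh, zsh, ...).
init_modules() {
    shell_name=${1:-bash}
    init_file="${MODULESHOME:-/opt/modules/default}/init/$shell_name"
    if [ -r "$init_file" ]; then
        . "$init_file"
    else
        echo "module init script not found: $init_file" >&2
        return 1
    fi
}

init_modules bash || echo "continuing without modules" >&2
# After a successful call, `module` commands can be used later in the script.
```

Returning a non-zero status instead of exiting lets the calling script decide whether a missing modules installation is fatal.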

How to speed up data movement

Adding the async clause to data transfers will often force the data to be put into "pinned" memory, which can roughly double the bandwidth of your transfers. With the PGI compiler, you may also want to try adding -ta=tesla:pin. Because the transfer is asynchronous, be sure to issue an acc wait before you access the data on the host again.

Using cuda-memcheck & nvprof on XK7

Before using cuda-memcheck or nvprof on the XK7, you must set PMI_NO_FORK=1 in your runtime environment. You can then do:

    export PMI_NO_FORK=1
    aprun -n 1 -N 1 cuda-memcheck ./a.out

Limit this to a very short run to prevent a lot of output.

See slides on nvprof here

Here's OLCF documentation on profiling

Command Line Profilers

To get command-line profile information from the PGI and NVIDIA tools:
Set the PGI environment variable PGI_ACC_TIME=1 to print OpenACC profile information after your run.
Set the NVIDIA environment variable COMPUTE_PROFILE=1 to enable profile information from the CUDA driver. Profile information will be written to the file "cuda_profile.log". Beyond the basic timing information, you can also create a configuration file to gather hardware profile counters. Full details can be found at: http://docs.nvidia.com/cuda/profiler-users-guide/#compute-command-line-profiler-overview
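In a job script or interactive session, the two settings above might look like the sketch below. Setting both at once is an assumption made for brevity; you may prefer to enable only the one you need. The executable name ./a.out and the aprun options are placeholders copied from the cuda-memcheck example.

```shell
# Print PGI OpenACC timing information when the program exits.
export PGI_ACC_TIME=1

# Enable the CUDA driver's command-line profiler (writes cuda_profile.log).
export COMPUTE_PROFILE=1

echo "PGI_ACC_TIME=$PGI_ACC_TIME COMPUTE_PROFILE=$COMPUTE_PROFILE"

# Then launch as usual from inside the job (./a.out is a placeholder):
#   aprun -n 1 -N 1 ./a.out
```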
