## Using conda on the cluster

### Introduction

Conda is a package manager for Python and other tools that will be important for your work on the cluster. However, it can be cumbersome to run and is prone to errors. Operating your own `conda` installation on Saga is not straightforward, as it installs many files and can quickly take up a lot of room - i.e. filling your `$HOME` directory. For most uses, such as the genotyping and phasing pipelines, you should use the preinstalled `conda` and `mamba` (a faster implementation of conda) modules on Saga. This guide will explain how to do that, and also how to create your own environments that you can repeatedly load from the `/cluster/projects/nn10082k/` directory.

The rationale here is to follow the [Sigma2 guidelines](https://documentation.sigma2.no/software/userinstallsw/conda.html); this saves each of us from having a separate conda installation that takes up a lot of space on our shared project directory. It is also designed to make it easier to share reproducible scripts among members of the group.

### Loading the conda and mamba modules

Loading the preinstalled modules on Saga is very straightforward with the `module` command. For example:

```
module load Miniconda3/23.10.0-1
```

This will load `miniconda`, whereas the following will load `mamba`:

```
module load Mamba/23.11.0-0
```

You can use `module avail miniconda` to search for all the available packages. This is worth doing, as there might be multiple versions of miniconda, or the version installed on Saga may have been updated.

Once you have loaded one of these modules, you will need to activate it so that `conda` actually runs. For example, the following activation line will activate `miniconda`:

```
source ${EBROOTMINICONDA3}/bin/activate
```

And this will activate `mamba`:

```
source ${EBROOTMAMBA}/bin/activate
```

You will know these have worked because your command line prompt will now show `(base)` next to it.
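Putting the load and activation steps together, a typical start-of-session setup looks like this (using the `miniconda` module version shown above - check `module avail` in case it has changed; the `module` command only exists on the cluster):

```
module load Miniconda3/23.10.0-1
source ${EBROOTMINICONDA3}/bin/activate
conda --version   # if activation worked, this prints the conda version
```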
You will always need to run this activation line when starting a new terminal, logging in to Saga, or at the start of scripts that make use of `conda` packages or environments.

### Running previously installed environments

I have installed several environments for the phasing and genotyping pipelines in the project directory. These are maintained at `/cluster/projects/nn10082k/conda_group`, and there is a `README` in this directory explaining what each of them is. It is very simple to load and run these environments so that you can make use of the pipelines - all you need to do is point `conda` or `mamba` at them. Here I will show you how to load the genotyping and phasing environments. To load the genotyping pipeline environment, use the following command (once `conda` is loaded and activated):

```
conda activate /cluster/projects/nn10082k/conda_group/nextflow
```

Your prompt will change and you can now use all the programs contained in this environment. To do the same for the phasing environment, it is simply:

```
conda activate /cluster/projects/nn10082k/conda_group/phase
```

### Installing programs with conda and creating your own environments

It is quite easy to install packages with `conda`. However, as a rule, you should **not** install anything in the `base` environment. Instead, you should create your own environments and install packages into a **specific location** on the cluster. I have created the following location to ensure that everyone has a space for installation, and you should only install things here: `/cluster/projects/nn10082k/conda_users`. Create a directory with your username and we will then point `conda` only to this location. For example, my folder is:

```
/cluster/projects/nn10082k/conda_users/msravine
```

Next, we need to set an environmental variable to ensure `conda` uses this directory as a cache - i.e. a place to store everything it downloads, which we can easily maintain.
Do so like this:

```
export CONDA_PKGS_DIRS=/cluster/projects/nn10082k/conda_users/username/package-cache
```

Be sure to replace the `username` part of this path with your own. It is very important to keep the cache here so that it doesn't fill your home directory. **However**, it can quickly fill with a lot of downloads, so you should regularly clean it using the following command:

```
conda clean -a
```

Next, we need to create a new personal environment to install things into - remember, we are not using `base`. Here I will create an environment called `cpg` in my conda directory:

```
conda create -y --prefix /cluster/projects/nn10082k/conda_users/username/cpg
```

Note that the `-y` flag will prevent `conda` from asking for permission - it will just go ahead and set the environment up for you. Once this is done, you can activate the environment like so:

```
conda activate /cluster/projects/nn10082k/conda_users/username/cpg
```

With the environment activated, you can now install programs into it like so:

```
conda install bcftools
```

### Using conda in a slurm script

If you are using one of the lab pipelines that requires a conda installation - e.g. the genotyping pipeline - you should add the following lines at the start of your script:

```
module load Miniconda3/23.10.0-1
export CONDA_PKGS_DIRS=/cluster/projects/nn10082k/conda_users/username/package-cache
source ${EBROOTMINICONDA3}/bin/activate
conda activate /cluster/projects/nn10082k/conda_group/nextflow
```

This will load `miniconda`, activate it, and then activate the group environment `nextflow`. With this, you can essentially ignore the requirements for installing `conda` listed on the pipeline page [here](https://github.com/markravinet/genotyping_pipeline?tab=readme-ov-file#installing-conda). You **do not need** your own version of conda, and this will make it much easier to run the pipeline without issue.
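As a minimal sketch, a complete slurm script using the group environment might look something like the following. The job name, walltime, and resource requests are placeholders that you should adjust for your own job:

```
#!/bin/bash
#SBATCH --job-name=genotyping      # placeholder job name
#SBATCH --account=nn10082k
#SBATCH --time=12:00:00            # placeholder walltime
#SBATCH --mem-per-cpu=4G           # placeholder memory request
#SBATCH --cpus-per-task=4          # placeholder CPU request

# load conda, activate it, then activate the shared group environment
module load Miniconda3/23.10.0-1
export CONDA_PKGS_DIRS=/cluster/projects/nn10082k/conda_users/username/package-cache
source ${EBROOTMINICONDA3}/bin/activate
conda activate /cluster/projects/nn10082k/conda_group/nextflow

# pipeline commands go here, e.g. launching nextflow
```

Remember to replace `username` with your own directory name, as above.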
### Sharing environments with others

If you create an environment for running a script that you want to share with others, you can easily do so by giving them the path to your environment - i.e.:

```
conda activate /cluster/projects/nn10082k/conda_users/username/my_environment
```

You might need to ensure other group members have permission to access this. You can do so with:

```
chmod -R 775 /cluster/projects/nn10082k/conda_users/username/my_environment
```

Once that is done, it is very easy for others to use your environment!
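If you want to check that the permissions change has taken effect, you can inspect the directory mode with `stat` (the `-c` format flag is the GNU coreutils version, which is what Saga runs). Here is a small sketch using a throwaway directory - `/tmp/demo_env` is just a stand-in for your environment path:

```
ENV_DIR=/tmp/demo_env              # stand-in for your environment directory
mkdir -p "$ENV_DIR"
chmod -R 775 "$ENV_DIR"
stat -c "%a" "$ENV_DIR"            # prints 775: owner/group rwx, others r-x
```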