Hi,
I am trying to run PEPPI on a local machine (linux), using 3 protein sequences as a test run.
I have installed and compiled from source psipred (psipred.4.02.tar.gz) and blast (blast-2.2.9-amd64-linux.tar.gz), as recommended.
I have also installed and compiled from source the latest hh-suite (https://github.com/soedinglab/hh-suite) and its dependencies, along with the latest pdb70 database (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).
I have downloaded and compiled PEPPI following install.sh, with this configuration:
Are you on a slurm HPC system? (WARNING: PEPPI will run slowly without HPC parallelization) [y/n] n
Full path of where you wish to install PEPPI: /home/hae/anaconda3/envs/peppi
Full path to your HHsuite installation: /home/hae/bin/HHSuite3/hh-suite/build
Full path to the database used for hhblits: /home/hae/bin/HHSuite3/pdbDB/pdb70
Full path of your python interpreter: /home/hae/anaconda3/envs/peppi/bin/python
What is your C++ compiler? g++
What is your fortran compiler? gfortran
The working directory (where the main pipeline script is launched) has the following tree:
├── A.fasta
├── B.fasta
├── LICENSE
├── PEPPI1.pl
├── PEPPIcontainer
│ ├── PEPPIconda.yml
│ └── PEPPIcontainer.def
├── README.md
├── bin
│ ├── CTNN
│ ├── CTmod
│ ├── CTpred.py
│ ├── NWalign
│ ├── PEPPI2temp.pl
│ ├── PEPPI3temp.py
│ ├── PRISMmod
│ ├── SEQmod
│ ├── SPRINGNEGmod
│ ├── SPRINGmod
│ ├── STRINGmod
│ ├── TMalign
│ ├── blastp
│ ├── charge_inp.dat
│ ├── compileRes.sh
│ ├── compiled_source
│ ├── dcomplex
│ ├── dimMap
│ ├── fort.21_alla
│ ├── getHashcode.py
│ ├── install.sh
│ ├── makeHHR.pl
│ ├── model_multiD
│ ├── multiwrapper.pl
│ ├── oldcomplex
│ ├── runSetWrapper.pl
│ ├── seqSearch.pl
│ ├── trainCT.py
│ └── trainDists.py
├── cmd.sh
├── install.sh
├── lib
│ ├── CTtrainvecs.txt
│ ├── SEQ
│ ├── SPRINGDB
│ ├── STRING
│ └── trainNB.txt
└── test
  ├── A.fasta
  ├── B.fasta
  ├── LR.csv
  ├── PEPPI2.pl
  ├── PEPPI3.py
  ├── PPI
  ├── allres.txt
  ├── mono
  ├── protcodeA.csv
  └── protcodeB.csv
There is no SPRINGDB/70negpos.db_cs219.ffdata file in the original PEPPI download/installation; lib/SPRINGDB contains only the following:
├── 70CDHITstruct.txt
├── 70negpos.db
├── 70negpos.mono
├── 70negs.txt
├── monomers
└── monomers.aliases
I have checked that the scripts bin/makeHHR.pl and bin/seqSearch.pl contain the correct local paths.
When I launch the main script, the pipeline seems to run fine at first (the hh-suite tools start as expected), and the output directory contains the following:
├── PEPPI
│ ├── A.fasta
│ ├── B.fasta
│ ├── PEPPI2.pl
│ ├── mono
│ ├── protcodeA.csv
│ └── protcodeB.csv
However, the pipeline then fails to find "SPRINGDB/70negpos.db_cs219.ffdata" and does not write the final results file:
prot1
HHR
- 12:32:53.429 INFO: Search results will be written to /tmp/hae/makeHHR_prot1_464127/prot1.hhr
- 12:32:53.456 INFO: Searching 92111 column state sequences.
- 12:32:53.501 INFO: /tmp/hae/makeHHR_prot1_464127/prot1.fasta is in A2M, A3M or FASTA format
- 12:32:53.501 INFO: Iteration 1
- 12:32:53.666 INFO: Prefiltering database
- 12:32:54.078 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 4748
- 12:32:54.126 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2755
- 12:32:54.126 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2755
- 12:32:54.126 INFO: Scoring 2755 HMMs using HMM-HMM Viterbi alignment
- 12:32:54.230 INFO: Alternative alignment: 0
- 12:32:55.312 INFO: 2000 alignments done
- 12:32:55.831 INFO: 2755 alignments done
- 12:32:55.833 INFO: Alternative alignment: 1
- 12:32:57.405 INFO: 2650 alignments done
- 12:32:57.410 INFO: Alternative alignment: 2
- 12:32:57.914 INFO: 467 alignments done
- 12:32:57.914 INFO: Alternative alignment: 3
- 12:32:58.132 INFO: 76 alignments done
- 12:32:58.924 INFO: Premerge done
- 12:32:58.924 INFO: Realigning 500 HMM-HMM alignments using Maximum Accuracy algorithm
- 12:34:56.226 INFO: 1284 sequences belonging to 1284 database HMMs found with an E-value < 0.001
- 12:34:56.226 INFO: Number of effective sequences of resulting query HMM: Neff = 11.2888
- 12:34:56.239 INFO: Iteration 2
- 12:34:56.239 INFO: Set premerge to 0! (premerge: 3 iteration: 2 hits.Size: 1281)
- 12:34:56.407 INFO: Prefiltering database
- 12:34:56.820 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 4871
- 12:34:56.863 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2556
- 12:34:56.863 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 1389
- 12:34:56.863 INFO: Scoring 1389 HMMs using HMM-HMM Viterbi alignment
- 12:34:56.950 INFO: Alternative alignment: 0
- 12:34:57.865 INFO: 1389 alignments done
- 12:34:57.868 INFO: Alternative alignment: 1
- 12:34:58.706 INFO: 1228 alignments done
- 12:34:58.708 INFO: Alternative alignment: 2
- 12:34:58.929 INFO: 79 alignments done
- 12:34:58.930 INFO: Alternative alignment: 3
- 12:34:59.108 INFO: 31 alignments done
- 12:34:59.156 INFO: Rescoring previously found HMMs with Viterbi algorithm
- 12:34:59.233 INFO: Alternative alignment: 0
- 12:34:59.744 INFO: 1167 alignments done
- 12:34:59.747 INFO: Alternative alignment: 1
- 12:35:00.282 INFO: 1167 alignments done
- 12:35:00.285 INFO: Alternative alignment: 2
- 12:35:00.456 INFO: 196 alignments done
- 12:35:00.456 INFO: Alternative alignment: 3
- 12:35:00.500 INFO: 32 alignments done
- 12:35:00.571 INFO: Realigning 500 HMM-HMM alignments using Maximum Accuracy algorithm
- 12:35:02.405 INFO: 1284 sequences belonging to 1284 database HMMs found with an E-value < 0.001
- 12:35:02.405 INFO: Number of effective sequences of resulting query HMM: Neff = 11.2888
$ cp /tmp/hae/makeHHR_prot1_464127/prot1.a3m /tmp/2YVboks0ez/9UFyIA7YHI.1.in.a3m
Filtering alignment to diversity 7 ...
$ hhfilter -v 1 -neff 7 -i /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m -o /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m
$ /home/hae/bin/HHSuite3/hh-suite/build/scripts/reformat.pl -v 1 -r -noss a3m psi /tmp/2YVboks0ez/9UFyIA7YHI.in.a3m /tmp/2YVboks0ez/9UFyIA7YHI.in.psi
Predicting secondary structure with PSIPRED ... $ /home/hae/bin/HHSuite3/BLAST//blastpgp -b 1 -j 1 -h 0.001 -d /home/hae/bin/HHSuite3/hh-suite/build/data/do_not_delete -i /tmp/2YVboks0ez/9UFyIA7YHI.sq -B /tmp/2YVboks0ez/9UFyIA7YHI.in.psi -C /tmp/2YVboks0ez/9UFyIA7YHI.chk 1> /tmp/2YVboks0ez/9UFyIA7YHI.blalog 2> /tmp/2YVboks0ez/9UFyIA7YHI.blalog
$ echo 9UFyIA7YHI.chk > /tmp/2YVboks0ez/9UFyIA7YHI.pn
$ echo 9UFyIA7YHI.sq > /tmp/2YVboks0ez/9UFyIA7YHI.sn
$ /home/hae/bin/HHSuite3/BLAST//makemat -P /tmp/2YVboks0ez/9UFyIA7YHI
$ /home/hae/bin/HHSuite3/PSIPRED/psipred/bin/psipred /tmp/2YVboks0ez/9UFyIA7YHI.mtx /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat2 /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights.dat3 > /tmp/2YVboks0ez/9UFyIA7YHI.ss
$ /home/hae/bin/HHSuite3/PSIPRED/psipred/bin/psipass2 /home/hae/bin/HHSuite3/PSIPRED/psipred/data/weights_p2.dat 1 0.98 1.09 /tmp/2YVboks0ez/9UFyIA7YHI.ss2 /tmp/2YVboks0ez/9UFyIA7YHI.ss > /tmp/2YVboks0ez/9UFyIA7YHI.horiz
done
- 12:35:03.826 INFO: /tmp/hae/makeHHR_prot1_464127/prot1.a3m is in A2M, A3M or FASTA format
- 12:35:03.847 WARNING: MSA prot1 looks too diverse (Neff=12.227>11). Better check it with an alignment viewer for non-homologous segments. Also consider building the MSA with hhblits using the - option to limit MSA diversity.
- 12:35:03.853 INFO: Search results will be written to /tmp/hae/makeHHR_prot1_464127/prot1.hhr
- 12:35:03.853 ERROR: In /home/hae/bin/HHSuite3/hh-suite/src/ffindexdatabase.cpp:11: FFindexDatabase:
- 12:35:03.853 ERROR: could not open file '/home/hae/anaconda3/envs/peppi/PEPPI/lib/SPRINGDB/70negpos.db_cs219.ffdata'
benchmark: 0
Target: prot1
Query prot1
Match_columns 234
No_of_seqs 552 out of 4235
Neff 11.2888
Searched_HMMs 2908
Date Thu Jun 29 12:35:02 2023
Command /home/hae/bin/HHSuite3/hh-suite/build/bin/hhblits -i /tmp/hae/makeHHR_prot1_464127/prot1.fasta -oa3m /tmp/hae/makeHHR_prot1_464127/prot1.a3m -d /home/hae/bin/HHSuite3/pdbDB/pdb70 -n 2 -e 0.001
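The ERROR lines above come from hhblits failing to open one of the ffindex files that make up an HH-suite database. An HH-suite 3 database given with `-d <prefix>` is usually a set of paired `.ffdata`/`.ffindex` files such as `<prefix>_cs219`, `<prefix>_a3m` and `<prefix>_hhm`, and the lib/SPRINGDB listing shows none of these for `70negpos.db`. Below is a minimal sketch, assuming that naming convention, for checking which components exist for a given prefix. The script and its helper `missing_db_files` are hypothetical and not part of PEPPI or hh-suite.

```python
import os
import sys

# Components an HH-suite 3 database is usually built from (assumed naming convention).
SUFFIXES = ["_cs219", "_a3m", "_hhm"]
EXTENSIONS = [".ffdata", ".ffindex"]


def missing_db_files(prefix):
    """Return every expected ffindex file for `prefix` that does not exist on disk."""
    missing = []
    for suffix in SUFFIXES:
        for ext in EXTENSIONS:
            path = prefix + suffix + ext
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Default prefix taken from the error message; pass another prefix as the first argument.
    default = "/home/hae/anaconda3/envs/peppi/PEPPI/lib/SPRINGDB/70negpos.db"
    prefix = sys.argv[1] if len(sys.argv) > 1 else default
    missing = missing_db_files(prefix)
    for path in missing:
        print("missing:", path)
    if not missing:
        print("all expected database files found for", prefix)
```

Running the same check against the pdb70 prefix passed to hhblits (`/home/hae/bin/HHSuite3/pdbDB/pdb70`) is a useful comparison: that search ran, so at least its `_cs219` pair should be present there.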
How do I obtain the SPRINGDB/70negpos.db_cs219.ffdata file?
Thanks for your help in advance!