diff --git a/README.lastz.html b/README.lastz.html
index 586a204..897c895 100644
--- a/README.lastz.html
+++ b/README.lastz.html
@@ -245,14 +245,14 @@
If you have received the distribution as a packed archive, unpack it
by whatever means are appropriate for your computer. The result should be
-a directory <somepath>/lastz‑distrib‑X.XX.XX that contains
+a directory <somepath>/lastz-distrib-X.XX.XX that contains
a src subdirectory (and some others). You may find it convenient
-to remove the revision number (‑X.XX.XX) from the directory name.
+to remove the revision number (-X.XX.XX) from the directory name.
Before building or installing any of the programs, you will need to tell the
installer where to put the executable, either by setting the shell variable
-$LASTZ_INSTALL, or by editing the make‑include.mak
+$LASTZ_INSTALL, or by editing the make-include.mak
file to set the definition of installDir. Also, be sure to add
the directory you choose to your $PATH.
@@ -328,7 +328,7 @@
The usual flow is as follows (though most of these steps are optional,
-and some settings like ‑‑anyornone
+and some settings like --anyornone
may affect the processing order).
We first read the target sequence(s) into memory, and use that to build a seed
word position table that will allow us to quickly map any word in the target to
@@ -393,19 +393,19 @@
This runs in about two and a half minutes on a 2-GHz workstation, requiring
only 400 MB of RAM. Figure 1(a) shows the results, plotted using the
-‑‑format=rdotplot output option and
+--format=rdotplot output option and
the R statistical package.
(When in MAF format, LASTZ output can be browsed with
the GMAJ interactive viewer for multiple alignments, available from the
Miller Lab at Penn State.)
-Using ‑‑notransition lowers
+Using --notransition lowers
seeding sensitivity and reduces runtime (by a factor of about 10 in this case).
-‑‑step=20 also lowers seeding
+--step=20 also lowers seeding
sensitivity, reducing runtime and also reducing memory consumption (by a factor
of about 3.3 in this case).
-‑‑nogapped eliminates the
+--nogapped eliminates the
computation of gapped alignments. The complete alignment process using default
settings (shown in Figure 1(b)) uses 1.3 GB of RAM and takes 4.5 hours on a
machine running at 2.83 GHz.
@@ -470,32 +470,32 @@
-‑‑step=10, we will only be looking for
+--step=10, we will only be looking for
seeds at every 10th base. Instead of the default seed pattern, we use
-‑‑seed=match12 and
-‑‑notransition so our
+--seed=match12 and
+--notransition so our
seeds will be exact matches of 12 bases. Instead of the default
x-drop extension method we use
-‑‑exact=20 so that a 20-base
+--exact=20 so that a 20-base
exact match is required to qualify as an HSP. Because we are aligning short
reads, we specify
-‑‑noytrim so the alignment ends will
+--noytrim so the alignment ends will
not be trimmed back to the highest scoring locations during gapped extension.
We replace the default score set, which is for more distant species, with the
-stricter ‑‑match=1,5. This scores
+stricter --match=1,5. This scores
matching bases as +1 and mismatches as −5. We also use
-‑‑ambiguous=n so that Ns
+--ambiguous=n so that Ns
will be scored appropriately.
We are only interested in alignments that involve nearly an entire read, and
since the species are close we don't want alignments with low identity;
-therefore we use ‑‑coverage=90 and
-‑‑identity=95.
+therefore we use --coverage=90 and
+--identity=95.
For output, we are only interested in where the reads align, so we use the
-‑‑format=general option and specify
+--format=general option and specify
that we want the position on the chromosome (name1,
start1, length1) and the read name and orientation
(name2, strand2). This creates a tab-delimited
@@ -644,20 +644,20 @@
-‑‑notrivial
+--notrivial
option. This performs the full computation on both copies, but doesn't report
the trivial self-alignment block along the main diagonal (Figure 3(b)).
-‑‑self option in place
+--self option in place
of the query sequence. LASTZ will save work by computing with only one block
of each mirror-image pair, though it still reports both copies in the output by
reconstructing the second copy from the first. It also invokes
-‑‑notrivial automatically to omit the trivial self-alignment block
+--notrivial automatically to omit the trivial self-alignment block
along the main diagonal. This gives the same output as the previous method,
but runs faster (Figure 3(c)).
-‑‑self in place of the
-query, and also add the ‑‑nomirror
+--self in place of the
+query, and also add the --nomirror
option. In this case LASTZ reports only one copy of each mirror-image pair,
as well as omitting the trivial block (Figure 3(d)).
@@ -756,18 +756,18 @@
-‑‑self the <query>
+--self the <query>
is not needed; otherwise if it is left unspecified the query sequences are read
from stdin
(though this does not work with random-access formats
like 2Bit).
As a special case, the <target> is
-omitted when the ‑‑targetcapsule
+omitted when the --targetcapsule
option is used, since the target sequence is embedded within the capsule file.
-For options, the general format is ‑‑<keyword> or
-‑‑<keyword>=<value>, but for BLASTZ compatibility
+For options, the general format is --<keyword> or
+--<keyword>=<value>, but for BLASTZ compatibility
some options also have an alternative syntax
<letter>=<number>.
(Be careful when copying options from the tables below, as some of the hyphens
@@ -842,7 +842,7 @@
-is only applicable when the ‑‑self
+is only applicable when the --self
option is used.
--queryhsplimit=nowarn:<n>
-Same as ‑‑queryhsplimit=<n> but warnings for queries that
+Same as --queryhsplimit=<n> but warnings for queries that
exceed the limit are withheld.
-If ‑‑self is used, the default is to
+If --self is used, the default is to
re-create the redundant mirror-image alignment blocks in the output.
@@ -911,7 +911,7 @@
-Note that specifying ‑‑match changes the defaults for some of
+Note that specifying --match changes the defaults for some of
the other options (e.g. the scoring penalties for gaps, and various extension
thresholds), as described in their respective sections. The regular defaults
are chosen for compatibility with BLASTZ, but since BLASTZ doesn't support
-‑‑match, LASTZ infers that you are not expecting BLASTZ
+--match, LASTZ infers that you are not expecting BLASTZ
compatibility for this run, so it is free to use improved defaults.
This option cannot be used in conjunction with
-‑‑scores or
+--scores or
inference.
@@ -957,7 +957,7 @@
inference. These values specified on
the command line override any corresponding values from a file provided with
-‑‑scores.
+--scores.
@@ -1021,9 +1021,9 @@
-‑‑scores,
-‑‑match, or
-‑‑gap.
+--scores,
+--match, or
+--gap.
@@ -1032,7 +1032,7 @@
-alignment (requires ‑‑infscores).
+alignment (requires --infscores).
stdout), in the same format expected
-by ‑‑scores.
+by --scores.
Default gap penalties are determined as follows. If
-‑‑match is
+--match is
specified, the open penalty is 3.25 times the mismatch penalty, and the extend
penalty is 0.24375 times the mismatch penalty. (These are the same ratios as
BLASTZ’s defaults.) Both penalties are rounded up to the nearest integer.
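The rounding rule for the --match gap defaults is easy to misapply, so here is a minimal Python sketch of the arithmetic. default_gap_penalties is a hypothetical helper, not a LASTZ function; it assumes an integer mismatch penalty and uses fractions.Fraction so that products such as 0.24375 × 160 round up exactly instead of picking up floating-point error.

```python
import math
from fractions import Fraction

# Ratios stated in the text (the same ratios as BLASTZ's defaults).
OPEN_RATIO = Fraction("3.25")
EXTEND_RATIO = Fraction("0.24375")


def default_gap_penalties(mismatch_penalty: int) -> tuple[int, int]:
    """Return (open, extend) gap penalties implied by --match=<reward>,<penalty>.

    Hypothetical helper: each penalty is the mismatch penalty times a fixed
    ratio, rounded up to the nearest integer.
    """
    open_penalty = math.ceil(OPEN_RATIO * mismatch_penalty)
    extend_penalty = math.ceil(EXTEND_RATIO * mismatch_penalty)
    return open_penalty, extend_penalty


# --match=1,5: 3.25*5 = 16.25 rounds up to 17; 0.24375*5 = 1.21875 rounds up to 2
print(default_gap_penalties(5))  # (17, 2)
```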
@@ -1123,8 +1123,8 @@
-‑‑seed=match13, ‑‑step=15, and
-‑‑maxwordcount=90%. The gray bars show the percentage of
+--seed=match13, --step=15, and
+--maxwordcount=90%. The gray bars show the percentage of
seed word positions kept (the red line shows the ideal 90%). The blue numbers
show the equivalent count, which varies greatly.
@@ -1154,7 +1154,7 @@
The resulting masked intervals can be written to a file with the
-‑‑outputmasking=<file>
+--outputmasking=<file>
option.
@@ -1167,11 +1167,11 @@
-‑‑step,
-‑‑maxwordcount,
-‑‑masking,
-‑‑seed,
-‑‑word.
+--step,
+--maxwordcount,
+--masking,
+--seed,
+--word.
@@ -1325,7 +1325,7 @@
-‑‑recoverseeds.
+--recoverseeds.
@@ -1345,7 +1345,7 @@
-‑‑twins.
+--twins.
@@ -1500,12 +1500,12 @@
-If ‑‑match scoring is used, the
+If --match scoring is used, the
default x-drop termination threshold is 10 times the square root of the
mismatch penalty, rounded up to the nearest integer. Otherwise the default
is 10 times the A-vs.-A substitution score.
-If ‑‑match scoring is used, the
+If --match scoring is used, the
default HSP score threshold is 30 times the match reward (equivalent to the
score of a 30-bp exact match). Otherwise the default is 3000.
@@ -1639,14 +1639,14 @@
-If ‑‑match scoring is used, the
+If --match scoring is used, the
default y-drop threshold is twice the x-drop threshold (or if x-drop extension
was not performed, twice what the default x-drop threshold would have been);
otherwise it is the score of a 300-bp gap.
The default for the gapped score threshold is to use the same value as the
HSP threshold (which is settable via
-‑‑hspthresh). If the HSP
+--hspthresh). If the HSP
threshold was adaptive, then the lowest-scoring
HSP that was kept is used for this default. If x-drop extension was not
performed, the value used is whatever the default HSP threshold would have been.
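The --match-derived defaults for the x-drop, HSP, y-drop and gapped thresholds can be collected in one place. The sketch below is illustrative only: match_defaults and MatchDefaults are hypothetical names, and it assumes x-drop extension is performed with a fixed (non-adaptive) HSP threshold, which is the case where the gapped threshold simply copies the HSP threshold. The square root is rounded up with integer arithmetic, using the identity ceil(10*sqrt(p)) = ceil(sqrt(100*p)).

```python
import math
from typing import NamedTuple


class MatchDefaults(NamedTuple):
    xdrop: int
    hspthresh: int
    ydrop: int
    gappedthresh: int


def ceil_10_sqrt(n: int) -> int:
    """Exact ceil(10 * sqrt(n)) for n >= 0, computed as ceil(sqrt(100 * n))."""
    target = 100 * n
    root = math.isqrt(target)  # floor of the square root
    return root if root * root == target else root + 1


def match_defaults(reward: int, penalty: int) -> MatchDefaults:
    """Defaults implied by --match=<reward>,<penalty> (hypothetical helper).

    Assumes x-drop extension is performed and the HSP threshold is not
    adaptive, so the gapped threshold equals the HSP threshold.
    """
    xdrop = ceil_10_sqrt(penalty)  # 10 * sqrt(mismatch penalty), rounded up
    hspthresh = 30 * reward        # score of a 30-bp exact match
    ydrop = 2 * xdrop              # twice the x-drop threshold
    gappedthresh = hspthresh       # same as the HSP threshold
    return MatchDefaults(xdrop, hspthresh, ydrop, gappedthresh)


print(match_defaults(1, 5))
# MatchDefaults(xdrop=23, hspthresh=30, ydrop=46, gappedthresh=30)
```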
@@ -1711,7 +1711,7 @@
-For backwards compatibility, ‑‑matchcount=<min> has the
+For backwards compatibility, --matchcount=<min> has the
same meaning.
@@ -1758,7 +1758,7 @@
-are identical. Note that using ‑‑self
+are identical. Note that using --self
automatically enables this option.
general-[:<fields>].
-‑‑format=none can be used when no alignment output is desired.
+--format=none can be used when no alignment output is desired.
@@ -1855,7 +1855,7 @@
-‑‑format=rdotplot, but this option
+--format=rdotplot, but this option
allows you to create the dotplot file without having to run the alignment twice.
@@ -1865,7 +1865,7 @@
-the specification of tags for SAM's ‑RG header line.
+the specification of tags for SAM's @RG header line.
<tags> is a tab-delimited list of
<tag>:<value> items. See the SAM specification for
details about which tags are required. LASTZ does not validate whether the
@@ -1939,7 +1939,7 @@
-‑‑masking=<count> option.
+--masking=<count> option.
The masked target intervals, resulting from alignment with all queries, are
written to a file in
sequence masking file format. The file is suitable
@@ -1948,9 +1948,9 @@ xmask, and
nmask sequence specifier actions.
In contrast with
-‑‑outputmasking:soft=<file>,
+--outputmasking:soft=<file>,
only those intervals created by the
-‑‑masking=<count> option
+--masking=<count> option
are reported.
-‑‑outputmasking=<file>,
+--outputmasking=<file>,
except that masked intervals are written to a file in
three field sequence masking file format, which
includes sequence names. The file is not suitable for later use as
@@ -1981,10 +1981,10 @@ xmask, and
nmask sequence specifier actions.
In contrast with
-‑‑outputmasking=<file>,
+--outputmasking=<file>,
all masked intervals in the target sequence are reported, regardless of whether
they were created by the
-‑‑masking=<count> option
+--masking=<count> option
or were in the sequence as it was originally input.
-‑‑outputmasking:soft=<file>,
+--outputmasking:soft=<file>,
except that masked intervals are written to a file in
three field sequence masking file format, which
includes sequence names. The file is not suitable for later use as
@@ -2039,7 +2039,7 @@
-used for input by the ‑‑segments
+used for input by the --segments
option. These anchor segments can then be used to anchor alignments
in a subsequent run of LASTZ. This can be useful if you want to filter HSPs in
some way before performing gapped extension, for example filtering them by
@@ -2095,7 +2095,7 @@
-multiple lines in the file. ‑‑include can be used in conjunction
+multiple lines in the file. --include can be used in conjunction
with other command line arguments.
Note that any shell-performed substitutions that would be performed on the
@@ -2111,10 +2111,10 @@
<bytes> may contain a
K or M unit suffix if desired (indicating a
multiplier of 1,024 or 1,048,576, respectively). For example,
-‑‑allocate:traceback=80.0M is the same as
-‑‑allocate:traceback=83886080.
+--allocate:traceback=80.0M is the same as
+--allocate:traceback=83886080.
-For backwards compatibility, ‑‑traceback=<bytes> is also
+For backwards compatibility, --traceback=<bytes> is also
accepted.
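The suffix rule can be checked against the example in the text, which implies K = 1,024 and M = 1,048,576 (80.0 × 1,048,576 = 83,886,080). A minimal parser sketch follows; parse_byte_count is hypothetical, and because the text does not say whether lower-case suffixes are accepted or how fractional byte counts are rounded, it accepts only upper-case K and M and truncates toward zero.

```python
from decimal import Decimal

# Multipliers for the unit suffixes described in the text.
UNIT_MULTIPLIERS = {"K": 1024, "M": 1024 * 1024}


def parse_byte_count(text: str) -> int:
    """Parse a <bytes> value such as '83886080' or '80.0M' (hypothetical helper)."""
    suffix = text[-1:]
    if suffix in UNIT_MULTIPLIERS:
        # Decimal keeps values like 80.0 exact before scaling.
        return int(Decimal(text[:-1]) * UNIT_MULTIPLIERS[suffix])
    return int(Decimal(text))


print(parse_byte_count("80.0M"))  # 83886080
```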
-‑‑allocate:target for further
+--allocate:target for further
details.
The memory needed for a sequence is L+1, where
@@ -2182,20 +2182,20 @@
-For example, ‑‑yasra90 should be used when we expect 90% identity.
-The ‑‑yasraXXshort options are appropriate when the reads are very
+For example, --yasra90 should be used when we expect 90% identity.
+The --yasraXXshort options are appropriate when the reads are very
short (less than 50 bp).
| Option | Equivalent |
--yasra98 | T=2 Z=20 ‑‑match=1,6 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=98 ‑‑ambiguousn ‑‑noytrim |
--yasra95 | T=2 Z=20 ‑‑match=1,5 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=95 ‑‑ambiguousn ‑‑noytrim |
--yasra90 | T=2 Z=20 ‑‑match=1,5 O=6 E=1 Y=20 K=22 L=30 ‑‑identity=90 ‑‑ambiguousn ‑‑noytrim |
--yasra85 | T=2 ‑‑match=1,2 O=4 E=1 Y=20 K=22 L=30 ‑‑identity=85 ‑‑ambiguousn ‑‑noytrim |
--yasra75 | T=2 ‑‑match=1,1 O=3 E=1 Y=20 K=22 L=30 ‑‑identity=75 ‑‑ambiguousn ‑‑noytrim |
--yasra95short | T=2 ‑‑match=1,7 O=6 E=1 Y=14 K=10 L=14 ‑‑identity=95 ‑‑ambiguousn ‑‑noytrim |
--yasra85short | T=2 ‑‑match=1,3 O=4 E=1 Y=14 K=11 L=14 ‑‑identity=85 ‑‑ambiguousn ‑‑noytrim |
--yasra98 | T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98 --ambiguousn --noytrim |
--yasra95 | T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95 --ambiguousn --noytrim |
--yasra90 | T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90 --ambiguousn --noytrim |
--yasra85 | T=2 --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85 --ambiguousn --noytrim |
--yasra75 | T=2 --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75 --ambiguousn --noytrim |
--yasra95short | T=2 --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95 --ambiguousn --noytrim |
--yasra85short | T=2 --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85 --ambiguousn --noytrim |
@@ -2203,20 +2203,20 @@
-included. The syntax is ‑‑<shortcut>:<version>, where
+included. The syntax is --<shortcut>:<version>, where
<version> is the LASTZ version number that contained the
shortcut.
| Option | LASTZ version | Equivalent |
--yasra98:<version> | 1.02.45 or earlier | T=2 Z=20 ‑‑match=1,6 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=98 |
--yasra95:<version> | 1.02.45 or earlier | T=2 Z=20 ‑‑match=1,5 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=95 |
--yasra90:<version> | 1.02.45 or earlier | T=2 Z=20 ‑‑match=1,5 O=6 E=1 Y=20 K=22 L=30 ‑‑identity=90 |
--yasra85:<version> | 1.02.45 or earlier | T=2 ‑‑match=1,2 O=4 E=1 Y=20 K=22 L=30 ‑‑identity=85 |
--yasra75:<version> | 1.02.45 or earlier | T=2 ‑‑match=1,1 O=3 E=1 Y=20 K=22 L=30 ‑‑identity=75 |
--yasra95short:<version> | 1.02.45 or earlier | T=2 ‑‑match=1,7 O=6 E=1 Y=14 K=10 L=14 ‑‑identity=95 |
--yasra85short:<version> | 1.02.45 or earlier | T=2 ‑‑match=1,3 O=4 E=1 Y=14 K=11 L=14 ‑‑identity=85 |
--yasra98:<version> | 1.02.45 or earlier | T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98 |
--yasra95:<version> | 1.02.45 or earlier | T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95 |
--yasra90:<version> | 1.02.45 or earlier | T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90 |
--yasra85:<version> | 1.02.45 or earlier | T=2 --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85 |
--yasra75:<version> | 1.02.45 or earlier | T=2 --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75 |
--yasra95short:<version> | 1.02.45 or earlier | T=2 --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95 |
--yasra85short:<version> | 1.02.45 or earlier | T=2 --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85 |
<start> and <end> are required.
-A zoom factor can also be included, using the syntax
+A "zoom factor" can also be included, using the syntax
<start>..<end>+<zoom>%. The specified interval
is expanded on each end by <zoom> percent. This is useful
when you know, for example, the location of a gene, and would like to include
@@ -2343,7 +2343,7 @@
-‑‑strand options instead.
+--strand options instead.
Note that subrange positions are always measured from the start of the
sequence provided in the file (i.e., counting along the
@@ -2617,7 +2617,7 @@
This table is one of the major space requirements of the program. Both the
memory and time required for seeding can be decreased by using sparse spacing.
-The
To locate seeds, the query sequence is parsed into seed words the same
way the target is (except that
-
-Exact match extension (
M-mismatch extension
-(
-In x-drop extension (
Figure 5(a) shows an alignment without chaining, while 5(b) shows the same
@@ -2956,7 +2956,7 @@
@@ -3013,13 +3013,13 @@
Once the above stages have been performed, it is not uncommon to have regions
left over in which no alignment has been found. In the interpolation stage
-(activated by the
The alignment blocks found by the preceding pipeline of stages are written to
A special case, non-conforming to the official standard, is made to allow a
special user-specified separator character.
@@ -3250,7 +3250,7 @@
Each sequence consists of four lines. The first line begins with a
-
Here is an example. If the target sequence is hg18.chr1, this would mask the
@@ -3499,9 +3499,9 @@
This file is created by using the
-
@@ -3514,7 +3514,7 @@
-This file is used with the
The default value is −100. There is no corresponding command-line option.
@@ -3625,7 +3625,7 @@
When LASTZ is asked to infer substitution scores and/or gap penalties from the
-input sequences (e.g. via the
-The option
-The option
-The option
-The option
@@ -4023,14 +4023,14 @@
For SAM files, LASTZ assumes that the target sequence is the reference and
that query sequence(s) are short reads. For alignments that don't reach the
-end of a query,
-The options
Exonerate CIGAR
@@ -4076,9 +4076,9 @@
-For
-For
-For
-For
-Sample output for
Sample output for
-
The syntax for this option is:
-The option
Field names are normally included as column headers in the first row of the
output, preceded by a
@@ -4926,9 +4926,9 @@
-The
@@ -5277,10 +5277,10 @@
Consider the following alignment of a 50-base query to a chromosome target, and
-suppose we are using
@@ -5305,13 +5305,13 @@
To avoid this behavior, use the
-
To use the capsule file, run LASTZ like this:
@@ -5395,12 +5395,12 @@
To have LASTZ infer scoring parameters, use
a suitably enabled build of LASTZ (see below), and specify
-the
-The
Though LASTZ provides several filtering options (e.g.
-
-The
-The
-The
-The final line (
@@ -5726,7 +5726,7 @@
-In situtations where 255 is too limiting,
-Also corrected the behavior of
Also changed the behavior when the
This change may also affect (for the better) the results of gapped extension
-when either
-Also added
Also changed the option name for match count filtering to
-
For consistency,
-Sequence Specifiers
sequence to be used instead of the sequence itself. Again, this should be
used with care, as it can lead to murky interactions with other features.
In BLASTZ it was needed for searching only the minus strand, but LASTZ provides
-a ‑‑strand option for that.
+a --strand option for that.
@@ -2677,10 +2677,10 @@ Indexing Target Seed Words
-The ‑‑step option sets a
+The --step option sets a
step size: instead of examining every position, seed words are
stored only for multiples of the step size. Large step sizes (say,
-‑‑step=100) incur a loss of sensitivity, at least at the seeding
+--step=100) incur a loss of sensitivity, at least at the seeding
stage. However, to discover any gapped alignment block we only need to
discover one seed (of many) in that alignment, so the actual sensitivity loss
is small in most cases. Section 6.2 of [Harris 2007]
@@ -2700,10 +2700,10 @@ Indexing Target Seed Words
bases are left out of the seed word position table and skipped during seeding,
respectively, so they do not participate in the seeding stage.
-‑‑maxwordcount can be used to remove
+--maxwordcount can be used to remove
frequently occurring target seed words from the position table before query
processing begins.
-‑‑masking) can
+--masking) can
be used to mask target positions that have occurred in too many alignments;
however this only affects subsequent query sequences.
@@ -2725,7 +2725,7 @@ Seeding
-‑‑step does not apply to the query;
+--step does not apply to the query;
we look at every seed word).
Each packed seed word is used as an index into the target seed word position
table to find the target positions that have a seed match for this
@@ -2743,7 +2743,7 @@ Quantum Seeding:
is first converted to a quantum seeding ball of those DNA words that
are most similar to it. Similarity is determined by the scoring matrix; all
words with a combined substitution score above the quantum seeding threshold
-(set by the ‑‑ball option) are
+(set by the --ball option) are
considered to be in the ball. Then each word in the ball is looked up in the
target seed word position table as usual, with all such hits considered to be
seed matches for the q-word.
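The indexing and seeding steps described in these hunks can be sketched with a dictionary standing in for the seed word position table. This is a simplified illustration under stated assumptions: seeds are contiguous exact words (as with --seed=match<N> plus --notransition), words are kept as strings rather than packed integers, words containing anything other than upper-case A, C, G or T are skipped, and quantum seeding is not modeled. build_position_table and find_seeds are hypothetical names.

```python
from collections import defaultdict

# Assumption: only upper-case ACGT words are indexed; N and lower-case
# (masked) bases are excluded in this sketch.
VALID = set("ACGT")


def build_position_table(target: str, word_len: int, step: int = 1) -> dict[str, list[int]]:
    """Map each target seed word to its start positions, indexing every `step`-th position."""
    table: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(target) - word_len + 1, step):
        word = target[pos:pos + word_len]
        if set(word) <= VALID:
            table[word].append(pos)
    return dict(table)


def find_seeds(table: dict[str, list[int]], query: str, word_len: int) -> list[tuple[int, int]]:
    """Return (target_pos, query_pos) seed hits; the step does not apply to the query."""
    hits = []
    for qpos in range(len(query) - word_len + 1):
        for tpos in table.get(query[qpos:qpos + word_len], []):
            hits.append((tpos, qpos))
    return hits


table = build_position_table("ACGTACGTAC", word_len=4, step=2)
print(table)                                   # {'ACGT': [0, 4], 'GTAC': [2, 6]}
print(find_seeds(table, "TACGT", word_len=4))  # [(0, 1), (4, 1)]
```

With step=2 the target words starting at odd positions (such as CGTA) are never stored, which is the sensitivity trade-off the text describes.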
@@ -2782,7 +2782,7 @@ Gap-free Extension
exact match, M-mismatch, or x-drop.
-Exact match extension (‑‑exact) simply
+Exact match extension (--exact) simply
extends the seed until a mismatch is found. If the resulting length is enough,
the extended seed is kept as an HSP for further processing. Exact match
extension is most useful when the target and query are expected to be very
@@ -2790,17 +2790,17 @@ Gap-free Extension
-(‑‑<M>mismatch) extends the
+(--<M>mismatch) extends the
seed to find the longest interval that includes the entire seed and contains
no more than M mismatches. If the resulting length is enough,
the extended seed is kept as an HSP for further processing. M-mismatch
extension is most useful when the approximate divergence between the target
and query is known, and HSPs of a known length are desired.
It provides a way to specify both length and identity thresholds together,
-with more flexibility than ‑‑exact.
+with more flexibility than --exact.
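The M-mismatch rule (the longest interval that contains the whole seed and at most M mismatches) has a compact implementation: for each way of splitting the remaining mismatch budget between the left and right sides, the interval stops just short of the next mismatch on each side. The sketch below is hypothetical (m_mismatch_extend is not a LASTZ function) and compares bases case-insensitively with no substitution matrix; with max_mismatches=0 it reduces to exact-match extension.

```python
from typing import Optional


def m_mismatch_extend(target: str, query: str, t_start: int, q_start: int,
                      seed_len: int, max_mismatches: int,
                      min_length: int = 0) -> Optional[tuple[int, int, int]]:
    """Return (target_start, query_start, length) of the longest gap-free
    interval containing the seed with <= max_mismatches, or None."""
    def mismatch(offset: int) -> bool:
        # Offsets are relative to the seed start on the shared diagonal.
        return target[t_start + offset].upper() != query[q_start + offset].upper()

    left_limit = -min(t_start, q_start)                             # smallest legal offset
    right_limit = min(len(target) - t_start, len(query) - q_start)  # one past the largest

    budget = max_mismatches - sum(mismatch(i) for i in range(seed_len))
    if budget < 0:
        return None
    # Mismatch offsets outside the seed, nearest first in each direction.
    left = [i for i in range(-1, left_limit - 1, -1) if mismatch(i)]
    right = [i for i in range(seed_len, right_limit) if mismatch(i)]

    best_lo, best_hi = 0, seed_len
    for k in range(budget + 1):  # k mismatches on the left, budget - k on the right
        lo = left[k] + 1 if k < len(left) else left_limit
        j = budget - k
        hi = right[j] if j < len(right) else right_limit
        if hi - lo > best_hi - best_lo:
            best_lo, best_hi = lo, hi
    if best_hi - best_lo < min_length:
        return None
    return t_start + best_lo, q_start + best_lo, best_hi - best_lo


# Seed "ACGT" at position 3 of both sequences; mismatches at positions 2 and 8.
print(m_mismatch_extend("AAAACGTTTT", "AAGACGTTCT", 3, 3, 4, 1))  # (0, 0, 8)
```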
-In x-drop extension (‑‑xdrop), as we
+In x-drop extension (--xdrop), as we
extend in each direction we track the cumulative score for the extended match
according to the substitution scoring matrix. The extension is stopped when
the score drops off by more than the given x-drop threshold; that is, when the
@@ -2810,10 +2810,10 @@ Gap-free Extension
worse than −<dropoff> is encountered.)
The extension is then trimmed back to the peak point. If the combined score
of the seed plus both extensions meets the threshold set by the
-‑‑hspthresh option, it qualifies
+--hspthresh option, it qualifies
as an HSP and is kept for further processing. Matches that do not meet the
score threshold are discarded.
-The ‑‑entropy options control
+The --entropy options control
whether or not the scores are adjusted for nucleotide entropy when they are
compared to the threshold.
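A minimal sketch of gap-free x-drop extension as described above: extend in each direction, stop once the running score falls more than the x-drop threshold below its peak, trim back to the peak, and keep the result if the seed plus both extensions meets the HSP threshold. The function names and the +1/−2 scorer are hypothetical, and the entropy adjustment is not modeled.

```python
from typing import Callable, Optional

Score = Callable[[str, str], int]


def extend_one_way(target: str, query: str, t: int, q: int, step: int,
                   score: Score, xdrop: int) -> tuple[int, int]:
    """Extend from (t, q) moving by `step` (+1 or -1); return (length, score) at the peak."""
    total = best = 0
    length = best_len = 0
    while 0 <= t < len(target) and 0 <= q < len(query):
        total += score(target[t], query[q])
        length += 1
        if total > best:
            best, best_len = total, length
        elif best - total > xdrop:  # dropped more than the x-drop threshold below the peak
            break
        t += step
        q += step
    return best_len, best  # trimmed back to the peak


def xdrop_hsp(target: str, query: str, t_start: int, q_start: int, seed_len: int,
              score: Score, xdrop: int, hspthresh: int) -> Optional[tuple[int, int, int, int]]:
    """Return (target_start, query_start, length, score) if the extended seed qualifies."""
    seed_score = sum(score(target[t_start + i], query[q_start + i]) for i in range(seed_len))
    left_len, left_score = extend_one_way(target, query, t_start - 1, q_start - 1, -1, score, xdrop)
    right_len, right_score = extend_one_way(
        target, query, t_start + seed_len, q_start + seed_len, +1, score, xdrop)
    total = seed_score + left_score + right_score
    if total < hspthresh:
        return None
    return t_start - left_len, q_start - left_len, left_len + seed_len + right_len, total


def simple_score(a: str, b: str) -> int:
    # Hypothetical substitution score: +1 for a match, -2 for a mismatch.
    return 1 if a == b else -2


print(xdrop_hsp("TTACGTACGTAA", "GTACGTACGTCC", 4, 4, 4, simple_score, xdrop=3, hspthresh=9))
# (1, 1, 9, 9)
```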
@@ -2823,9 +2823,9 @@ Adaptive Score Threshold:
HSP score threshold — set it too high and hardly anything will align, but
too low and the program will be swamped and not finish. LASTZ’s adaptive
scoring options
-(‑‑hspthresh=top<basecount>
+(--hspthresh=top<basecount>
and
-‑‑hspthresh=top<percentage>%)
+--hspthresh=top<percentage>%)
allow you to set the threshold indirectly to align the desired amount of the
target (as an approximate number of bases or as a percentage, respectively).
This way you can set it for, say, 10% (which will run quickly regardless of the
@@ -2848,7 +2848,7 @@ Diagonal Hashing:
LASTZ hashes diagonals to 16-bit values and tracks extensions only by the hash
value. While this saves space, it results in a minuscule loss of sensitivity
— LASTZ may miss some seeds due to hash collisions. Using
-‑‑recoverseeds will prevent losing
+--recoverseeds will prevent losing
these seeds, but will slow the program significantly. Moreover, since most
true alignments contain many HSPs, with many seeds in each HSP, the vast
majority of lost seeds have no effect on the final results.
@@ -2878,11 +2878,11 @@ HSP Chaining
processed in separate pipelines, it will not necessarily cause inversions to be
discarded.) If LASTZ’s implementation of chaining is not suitable, it is
possible to substitute another chaining program by first running LASTZ with the
-‑‑nogapped and
-‑‑writesegments
+--nogapped and
+--writesegments
options to get the HSPs, running a separate chaining program to filter them,
and then running the final stages of LASTZ on that output via the
-‑‑segments option.
+--segments option.
Gapped Extension
first). Gapped extension is performed
independently in both directions from the anchor point, and the two resulting
alignments are joined at the anchor. If the total score meets the threshold
-specified by the ‑‑gappedthresh
+specified by the --gappedthresh
option, the joined alignment is kept and passed to the next stage; otherwise it
is discarded. If the extension from one anchor happens to go through one or
more other anchors, the redundant anchors are dropped from the list.
@@ -2978,14 +2978,14 @@ Gapped Extension
the DP matrix examined is reduced by disallowing low-scoring regions (see
[Zhang 1998]): wherever the alignment score drops
below the peak score seen so far by more than the threshold specified in the
-‑‑ydrop option, the DP matrix is
+--ydrop option, the DP matrix is
truncated and no further cells are computed along that row or column.
By default the extension is then trimmed back to the location of the peak
score; thus the extension normally ends when all remaining sub-alignment
possibilities (paths in the DP matrix) begin with sections that score worse
than −<dropoff>. However for alignments
where the extension reaches the end of the sequence, you can suppress this
-trimming by specifying the ‑‑noytrim
+trimming by specifying the --noytrim
option, which is recommended when aligning short reads.
Back-end Filtering
Whatever alignment blocks have made it through the above gauntlet are then
subjected to
identity, continuity, coverage and match count filtering (as specified by the
-‑‑identity,
-‑‑continuity,
-‑‑coverage,
-‑‑filter=nmatch,
-‑‑filter=nmismatch,
-‑‑filter=ngap and
-‑‑filter=cgap options,
+--identity,
+--continuity,
+--coverage,
+--filter=nmatch,
+--filter=nmismatch,
+--filter=ngap and
+--filter=cgap options,
respectively). Blocks that do not meet the specified range for each feature are
discarded.
@@ -3033,7 +3033,7 @@ Back-end Filtering
Characters that differ only in upper vs. lower case are
counted as matches. Columns containing gaps or non-ACGT characters play no
part in this computation, and it is independent of the settings for
-‑‑ambiguous=n and
+--ambiguous=n and
bad_score. Identity cannot
be determined for alignments with quantum DNA, because
of the potential ambiguity of the symbols.
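Identity as described above can be computed in a few lines. This sketch assumes identity is the number of matching columns divided by the number of counted columns (both rows A, C, G or T, case ignored); identity is a hypothetical helper, and quantum DNA is not handled.

```python
from typing import Optional

ACGT = frozenset("ACGT")


def identity(aligned_target: str, aligned_query: str) -> Optional[float]:
    """Fraction of counted columns that match; None if no column qualifies."""
    if len(aligned_target) != len(aligned_query):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = 0
    for a, b in zip(aligned_target.upper(), aligned_query.upper()):
        if a not in ACGT or b not in ACGT:
            continue  # gaps and non-ACGT characters play no part
        if a == b:
            matches += 1
        else:
            mismatches += 1
    counted = matches + mismatches
    return matches / counted if counted else None


print(identity("ACGTAcgt-A", "ACGTTCGTAN"))  # 0.875 (7 of 8 counted columns)
```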
@@ -3097,7 +3097,7 @@ Interpolation
-(activated by the ‑‑inner option) we
+(activated by the --inner option) we
repeat the seeding through gapped extension stages in these leftover regions,
at a presumably higher sensitivity. Using such high sensitivity from the
outset would be computationally prohibitive (due to the excessive number of
@@ -3155,7 +3155,7 @@ Alignment Output
stdout (or to a file specified with the
-‑‑output option) in the requested
+--output option) in the requested
format.
These may be seeds, gap-free HSPs, or gapped local alignments, depending on
which stages were performed. There is no particular order to the alignment
@@ -3182,7 +3182,7 @@ File Formats
sequences contain a series of A, C, G,
T, and N characters in upper or lower case.
Lower case indicates repeat-masked bases, while Ns represent
-unknown bases if the ‑‑ambiguous=n
+unknown bases if the --ambiguous=n
option is specified. (By default, a run of Ns or Xs
is used to separate sequences that have been catenated together for processing,
but this is now deprecated; see
@@ -3222,7 +3222,7 @@ FASTA (sequence input)
as a splicing character. However, LASTZ does not currently support
IUPAC-IUB ambiguity codes other than N (such as R,
W, etc.),
-beyond the treatment afforded by ‑‑ambiguous=iupac.
+beyond the treatment afforded by --ambiguous=iupac.
FASTQ (sequence input)
format, prohibiting line-wrapping within DNA or quality sequences.
-‑ followed by the name of the sequence. The second line contains
+@ followed by the name of the sequence. The second line contains
nucleotide characters. The third line begins with a +, optionally
followed by the name of the sequence (which, if present, must match that of the
first line). The fourth line contains quality characters.
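A strict four-line FASTQ reader, sketched under the usual FASTQ conventions: the first line of each record starts with @, the third with + (optionally repeating the name), and the quality string has the same length as the sequence. read_fastq and FastqRecord are hypothetical names, and skipping blank lines between records is an assumption rather than documented LASTZ behavior.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse four-line FASTQ records with no line wrapping."""
    rows = (line.rstrip("\r\n") for line in lines)
    for header in rows:
        if not header:
            continue  # assumption: tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a line starting with '@', got {header!r}")
        name = header[1:]
        try:
            sequence, plus, quality = next(rows), next(rows), next(rows)
        except StopIteration:
            raise ValueError(f"truncated record {name!r}") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected a line starting with '+' in record {name!r}")
        if plus[1:] and plus[1:] != name:
            raise ValueError(f"'+' line name does not match {name!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"quality and sequence lengths differ in {name!r}")
        yield FastqRecord(name, sequence, quality)


records = list(read_fastq(["@r1\n", "ACGT\n", "+\n", "IIII\n",
                           "@r2\n", "GG\n", "+r2\n", "!!\n"]))
print(records[1])  # FastqRecord(name='r2', sequence='GG', quality='!!')
```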
@@ -3454,9 +3454,9 @@ Sequence Masking File
nmask actions in a
sequence specifier.
It can also be created by using the
-‑‑outputmasking=<file>
+--outputmasking=<file>
or
-‑‑outputmasking:soft=<file>
+--outputmasking:soft=<file>
options.
It consists of one interval per
line, without sequence names. Lines beginning with a # are
@@ -3473,7 +3473,7 @@ Sequence Masking File
Note that the masking intervals are
counted along the forward strand, even if we are only
aligning to the reverse complement of the query specifier (i.e. for
-‑‑strand=minus).
+--strand=minus).
Sequence Masking File, Three Fields
format.
-‑‑outputmasking+=<file>
+--outputmasking+=<file>
or
-‑‑outputmasking+:soft=<file>
+--outputmasking+:soft=<file>
options.
It consists of one interval per line, with sequence names.
Sequence Masking File, Three Fields
Note that the masking intervals are
counted along the forward strand, even if we are only
aligning to the reverse complement of the query specifier (i.e. for
-‑‑strand=minus).
+--strand=minus).
@@ -3522,7 +3522,7 @@ Scoring File
-This file is used with the ‑‑scores
+This file is used with the --scores
option to specify a set of (mostly) scoring-related parameters en masse.
The score set consists of a substitution matrix and other settings. The other
settings come first and are individually explained in the
@@ -3613,7 +3613,7 @@ Scoring File
This is used as a default for all cells of the scoring matrix that are not
otherwise set (either by the user or by LASTZ’s defaults). This is the
score used for Ns (unless
-‑‑ambiguous=n is specified on the
+--ambiguous=n is specified on the
command line).
Scoring File
<penalty>
This is identical to the
@@ -3634,7 +3634,7 @@
<open> field of the
-‑‑gap command line option.
+--gap command line option.
Scoring File
<penalty>
This is identical to the
@@ -3643,7 +3643,7 @@
<extend> field of the
-‑‑gap command line option.
+--gap command line option.
Scoring File
<offset>
This is identical to the
@@ -3651,8 +3651,8 @@
-‑‑step command line option.
+--step command line option.
Scoring File
seed<strategy>
-This corresponds to the ‑‑seed and
-‑‑transition command line options.
+This corresponds to the --seed and
+--transition command line options.
<strategy> must be one of the following, with no spaces:
12of19,transition
12of19,notransition
@@ -3667,7 +3667,7 @@ Scoring File
<percentage>%
This is identical to the
@@ -3676,7 +3676,7 @@
-‑‑ball command line option.
+--ball command line option.
Scoring File
<dropoff>
This is identical to the
@@ -3685,10 +3685,10 @@
-‑‑xdrop command line option.
+--xdrop command line option.
Scoring File
<score>
This is identical to the
@@ -3697,7 +3697,7 @@
-‑‑hspthresh command line option,
+--hspthresh command line option,
except that it does not currently support the
-‑‑hspthresh=top<basecount> or
-‑‑hspthresh=top<percentage>% variants.
+--hspthresh=top<basecount> or
+--hspthresh=top<percentage>% variants.
Scoring File
<dropoff>
This is identical to the
@@ -3706,7 +3706,7 @@
-‑‑ydrop command line option.
+--ydrop command line option.
Scoring File
<score>
This is identical to the
@@ -3718,7 +3718,7 @@
-‑‑gappedthresh command line option.
+--gappedthresh command line option.
Inference Control File
-input sequences (e.g. via the ‑‑infer
+input sequences (e.g. via the --infer
option), this file is used to set parameters that control the inference
process.
@@ -3777,8 +3777,8 @@ Inference Control File
hsp_threshold and gapped_threshold correspond to
-the command line ‑‑hspthresh and
-‑‑gappedthresh options.
+the command line --hspthresh and
+--gappedthresh options.
The defaults are hsp_threshold=3000 and
gapped_threshold=hsp_threshold.
@@ -3793,14 +3793,14 @@ Inference Control File
gap_open_penalty and gap_extend_penalty correspond to
the command line
-‑‑gap=[<open>,]<extend>
+--gap=[<open>,]<extend>
option. These are used for the first iteration of gap-scoring inference.
The defaults are gap_open_penalty=3.25*worst_substitution and
gap_extend_penalty=0.24375*worst_substitution.
step corresponds to the command line
-‑‑step option. A large step, e.g.
+--step option. A large step, e.g.
step=100, could potentially speed up the inference process.
Ideally, this would base the inference on a sample of only one percent of the
whole. However, the sample actually ends up larger than that and is biased
@@ -3812,7 +3812,7 @@ Inference Control File
entropy corresponds to the command line
-‑‑entropy option. Legal values are
+--entropy option. Legal values are
on or off. If on, sequence entropy is incorporated
when filtering HSPs. The default is entropy=on.
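The defaults above can be illustrated with a short Python sketch that reads a control file and fills in any missing settings. It assumes one key=value pair per line, which is the syntax the defaults are written in. It treats worst_substitution as a positive magnitude and leaves out step. The function name read_inference_settings is hypothetical and not part of LASTZ.

```python
from typing import Dict, Union

def read_inference_settings(text: str, worst_substitution: float) -> Dict[str, Union[float, bool]]:
    """Hypothetical parser for an inference control file (assumed key=value lines)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed: skip blank and '#' lines
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()

    hsp = float(settings.get("hsp_threshold", 3000))
    return {
        "hsp_threshold": hsp,
        # gapped_threshold falls back to the (possibly user-set) hsp_threshold
        "gapped_threshold": float(settings.get("gapped_threshold", hsp)),
        # gap penalties default to multiples of the worst substitution score
        "gap_open_penalty": float(settings.get("gap_open_penalty", 3.25 * worst_substitution)),
        "gap_extend_penalty": float(settings.get("gap_extend_penalty", 0.24375 * worst_substitution)),
        "entropy": settings.get("entropy", "on") == "on",
    }
```

Take an illustrative worst_substitution of 125 and a file that sets only hsp_threshold=2500. The result is gapped_threshold=2500, gap_open_penalty=406.25, and gap_extend_penalty of about 30.47.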
@@ -3887,7 +3887,7 @@ Segment File
This list is either produced internally by LASTZ as a result of the
gap-free extension stage (see Overview), or read from
a user-supplied file via the
-‑‑segments option. The latter
+--segments option. The latter
causes LASTZ to skip the indexing, seeding, and gap-free extension stages and
begin with the chaining stage (or the next specified stage, if chaining is not
requested).
@@ -3965,9 +3965,9 @@ LAV (alignment output)
(same specification at PSU)
-The option ‑‑format=lav+text adds
+The option --format=lav+text adds
textual output for each alignment block (in the same
-format as the ‑‑format=text option), intermixed with the LAV
+format as the --format=text option), intermixed with the LAV
format. Such files are unlikely to be recognized by any LAV-reading program.
@@ -3980,7 +3980,7 @@ AXT (alignment output)
UCSC AXT specification
-The option ‑‑format=axt+ reports
+The option --format=axt+ reports
additional statistics with each block, in the form of comments. The exact
content of these comment lines may change in future releases of LASTZ.
@@ -3996,11 +3996,11 @@ MAF (alignment output)
UCSC MAF specification
-The option ‑‑format=maf+ reports
+The option --format=maf+ reports
additional statistics with each block, in the form of comments. The exact
content of these comment lines may change in future releases of LASTZ.
-The option ‑‑format=maf- suppresses
+The option --format=maf- suppresses
the MAF header and any comments. This makes it suitable for concatenating
output from multiple runs.
SAM (alignment output)
-end of a query, ‑‑format=sam uses
-"hard clipping", while ‑‑format=softsam
+end of a query, --format=sam uses
+"hard clipping", while --format=softsam
uses "soft clipping". See the section on "clipped alignment" in the SAM
specification for an explanation of what this means.
-The options ‑‑format=sam- and
-‑‑format=softsam- suppress the SAM
+The options --format=sam- and
+--format=softsam- suppress the SAM
header lines. This makes them suitable for concatenating output from multiple
runs.
@@ -4048,10 +4048,10 @@ CIGAR (alignment output)
and as an
extended cigar string
in SAMtools. For
-‑‑format=cigar, LASTZ implements
+--format=cigar, LASTZ implements
Exonerate CIGAR. LASTZ implements other CIGAR variants for
-‑‑format=sam
-and as fields for ‑‑format=general.
+--format=sam
+and as fields for --format=general.
CIGAR (alignment output)
H runs to describe clipping operations for short sequences.
LASTZ implements combinations of these variants where appropriate; details
are described in
-‑‑format=general:cigar,
-‑‑format=general:cigarx
-and ‑‑format=sam.
+--format=general:cigar,
+--format=general:cigarx
+and --format=sam.
CIGAR (alignment output)
-For ‑‑format=cigar, the alignment would be described by this line:
+For --format=cigar, the alignment would be described by this line:
cigar: query 3 56 + target <start> <end> <strand> <score> M 24 I 3 M 7 D 2 M 19
-For ‑‑format=general:cigar, the
+For --format=general:cigar, the
alignment path would be described by this field:
24M3I7M2D19M
-For ‑‑format=general:cigarx, the
+For --format=general:cigarx, the
alignment path would be described by this field:
16=X7=3I7=2DX18=
-For ‑‑format=sam, the alignment path would
+For --format=sam, the alignment path would
be described by this field:
3H24M3I7M2D19M5H
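The three encodings of the example alignment can be checked against each other in a small Python sketch. It parses a CIGAR string into (operation, length) pairs, rewrites it in the space-separated style of the ‑‑format=cigar line, and totals the bases each operation consumes. The function names are hypothetical. Only the operations that appear in these examples are handled. A missing run length is read as 1, as the cigarx example implies ("X", "2DX").

```python
import re
from typing import List, Tuple

# An operation is an optional run length plus an op letter; a missing length
# means 1. Only M, I, D, H, = and X are handled (N, S and P are omitted).
_OP = re.compile(r"(\d*)([MIDH=X])")

def parse_cigar(cigar: str) -> List[Tuple[str, int]]:
    ops, pos = [], 0
    for m in _OP.finditer(cigar):
        if m.start() != pos:  # reject characters the regex skipped over
            raise ValueError(f"cannot parse CIGAR {cigar!r} at offset {pos}")
        ops.append((m.group(2), int(m.group(1) or 1)))
        pos = m.end()
    if pos != len(cigar):
        raise ValueError(f"cannot parse CIGAR {cigar!r} at offset {pos}")
    return ops

def to_exonerate_ops(cigar: str) -> str:
    """Rewrite e.g. 24M3I7M2D19M in the space-separated 'M 24 I 3 ...' style."""
    return " ".join(f"{op} {n}" for op, n in parse_cigar(cigar))

def aligned_spans(cigar: str) -> Tuple[int, int, int]:
    """Return (query bases aligned, target bases aligned, hard-clipped bases)."""
    query = target = clipped = 0
    for op, n in parse_cigar(cigar):
        if op in ("M", "=", "X"):  # consumes both sequences
            query += n
            target += n
        elif op == "I":            # consumes query only
            query += n
        elif op == "D":            # consumes target only
            target += n
        else:                      # "H": clipped from the query, not aligned
            clipped += n
    return query, target, clipped
```

Both 24M3I7M2D19M and 16=X7=3I7=2DX18= account for 53 aligned query bases and 52 target bases. This matches the query interval 3 to 56 in the Exonerate-style line. The SAM form adds 3 + 5 = 8 hard-clipped bases.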
@@ -4228,7 +4228,7 @@ Differences (alignment output)
perfect match for that block (i.e., no differences).
-Sample output for ‑‑format=differences.
+Sample output for --format=differences.
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11)(12) (13) (14)
chr22 14485783 14485784 + 49691432 EAYGRGI02GQ0SL 167 167 + 303 A - TGAGA... TGAGA...
@@ -4244,7 +4244,7 @@ Differences (alignment output)
-‑‑format=general:name1,zstart1,end1,strand1,size1,name2,zstart2+,end2+,strand2,size2,text1,text2.
+--format=general:name1,zstart1,end1,strand1,size1,name2,zstart2+,end2+,strand2,size2,text1,text2.
chr22 14485616 14485920 + 49691432 EAYGRGI02GQ0SL 0 303 + 303 TGAGA... TGAGA...
chr22 14731668 14731964 + 49691432 EAYGRGI01EAV19 0 297 - 298 CTTCT... CTTCT...
@@ -4320,12 +4320,12 @@ General Output (alignment output)
- ‑‑format=general[:<fields>]
+ --format=general[:<fields>]
where <fields> is a comma-separated list of field names in
any desired order, with no spaces. For example
- ‑‑format=general:nmismatch,name1,strand1,start1,end1,name2,strand2,start2,end2
+ --format=general:nmismatch,name1,strand1,start1,end1,name2,strand2,start2,end2
will report each aligned interval pair and the number of mismatches in the
alignment of that pair, like this:
@@ -4356,7 +4356,7 @@ General Output (alignment output)
coverage.
-The option ‑‑format=mapping is a shortcut for ‑‑format=general
+The option --format=mapping is a shortcut for --format=general
with the following fields:
name1, zstart1, end1,
name2, strand2, zstart2+,
@@ -4366,8 +4366,8 @@ General Output (alignment output)
#. The options
-‑‑format=general-[:<fields>]
-and ‑‑format=mapping- suppress column headers. This makes
+--format=general-[:<fields>]
+and --format=mapping- suppress column headers. This makes
them suitable for concatenating output from multiple runs.
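General-format output is also easy to post-process in a script. The following Python sketch turns it into one dictionary per alignment and filters on a numeric field such as nmismatch. It assumes tab-separated columns and a header line that begins with #. For header-less ‑‑format=general- output, the caller supplies the field names. The helpers read_general and at_most are hypothetical.

```python
from typing import Dict, List, Optional

def read_general(text: str, fields: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Parse tab-separated --format=general output into one dict per alignment.

    Assumes the header (when present) is the first line and starts with '#'.
    For header-less output (--format=general-), pass the field list yourself.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if fields is None:
        if not lines or not lines[0].startswith("#"):
            raise ValueError("no '#' header line; pass fields= explicitly")
        fields = lines[0].lstrip("#").split("\t")
        lines = lines[1:]
    return [dict(zip(fields, line.split("\t"))) for line in lines]

def at_most(rows: List[Dict[str, str]], field: str, limit: int) -> List[Dict[str, str]]:
    """Keep rows whose integer-valued field is <= limit (e.g. nmismatch)."""
    return [row for row in rows if int(row[field]) <= limit]
```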
Non-ACGT Characters, Splicing, and Separation
Xs or Ns are used to mask out
regions that should not be aligned. However, it is inappropriate when the
sequences contain Ns to represent ambiguous bases. To handle this
-case, LASTZ provides the ‑‑ambiguous=n
+case, LASTZ provides the --ambiguous=n
option, which causes substitutions with N to be scored as zero.
-Additionally, the ‑‑ambiguous=iupac option
+Additionally, the --ambiguous=iupac option
causes the other IUPAC-IUB ambiguity codes (B, D, H, K, M, R, S, V,
W, and Y) to be treated the same as an ambiguous
N.
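The effect of the ambiguity options on a single aligned pair can be sketched in Python. The +1/−1 ACGT scores are hypothetical unit scores. The −100 stand-in for masked N/X characters is an assumption taken from the changelog's mention of fill_score. The function name substitution_score is hypothetical, and inputs are assumed to be uppercase single characters.

```python
ACGT = "ACGT"
# Hypothetical unit scores: +1 for a match, -1 for a mismatch.
UNIT_MATRIX = {(a, b): (1 if a == b else -1) for a in ACGT for b in ACGT}
FILL_SCORE = -100  # assumed stand-in score for masked N/X characters
IUPAC_OTHER = set("BDHKMRSVWY")

def substitution_score(a: str, b: str, ambiguous: str = "") -> int:
    """Score one aligned pair; ambiguous is "", "n" or "iupac"
    (mirroring --ambiguous=n and --ambiguous=iupac)."""
    zero = set()
    if ambiguous in ("n", "iupac"):
        zero.add("N")
    if ambiguous == "iupac":
        zero |= IUPAC_OTHER
    if a in zero or b in zero:
        return 0  # an ambiguous base neither helps nor hurts the alignment
    if a in ("N", "X") or b in ("N", "X"):
        return FILL_SCORE  # masking characters are heavily penalized
    if (a, b) not in UNIT_MATRIX:
        raise ValueError(f"unsupported character pair {a!r}/{b!r}")  # rejected
    return UNIT_MATRIX[(a, b)]
```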
@@ -5168,25 +5168,25 @@ General seed patterns:
--seed=<pattern>
-The default seed is ‑‑seed=1110100110010101111, which is the same
+The default seed is --seed=1110100110010101111, which is the same
12-of-19 seed used as the default in BLASTZ.
Half-weight seed patterns:
If a seed pattern consists of only 0s and Ts, it is
implemented internally as a half-weight seed, which uses much less memory
(the same amount as a normal seed pattern half as long). Additionally,
-‑‑seed=half<length> can be used as shorthand to specify a
+--seed=half<length> can be used as shorthand to specify a
space-free half-weight seed (i.e., all Ts).
Single, double, or no transitions:
By default, one match position (a 1 in a spaced seed, or any
position in an N-mer match) is allowed to be a transition instead of a true
-match. ‑‑notransition disables this. Alternatively,
-‑‑transition=2 allows any two match positions to be
+match. --notransition disables this. Alternatively,
+--transition=2 allows any two match positions to be
transitions.
Filtering on transversions and matches:
-The ‑‑filter option imposes additional requirements on the number
+The --filter option imposes additional requirements on the number
of transversions and matches in a valid seed. This is especially useful in
conjunction with half-weight patterns. For example,
@@ -5213,7 +5213,7 @@
Applicable seeding options are
-Twin hit seeds:
<minsep> but not more than <maxsep>.
If <minsep> is omitted, zero is used (which means the
twin seeds may be adjacent but not overlap). Negative values can
-be used; for example ‑‑twins=‑5..10
+be used; for example --twins=-5..10
means the twins can overlap
by as much as 5 bases or can have as much as 10 bases between them.
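The spaced-seed rules described above can be modeled in a small Python sketch. It uses the default 12-of-19 pattern with a configurable number of allowed transitions (A↔G, C↔T). Positions marked 1 must match or be a transition, and positions marked 0 are ignored. Half-weight (T) patterns and ‑‑filter are not modeled. The function name seed_hit is hypothetical. Words are assumed to be uppercase, and only ACGT is allowed at seed positions.

```python
DEFAULT_SEED = "1110100110010101111"  # the 12-of-19 default quoted above
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def seed_hit(pattern: str, target_word: str, query_word: str, max_transitions: int = 1) -> bool:
    """Does a target/query word pair hit under a spaced seed of 1s and 0s?

    1 = position must match (or be a transition, up to max_transitions);
    0 = don't care.
    """
    if set(pattern) - {"0", "1"}:
        raise ValueError("this sketch only handles 0/1 patterns")
    if not len(pattern) == len(target_word) == len(query_word):
        raise ValueError("pattern and words must have equal length")
    transitions = 0
    for p, a, b in zip(pattern, target_word, query_word):
        if p == "0":
            continue                 # spaced position: ignored
        if a not in "ACGT" or b not in "ACGT":
            return False             # assumed: only ACGT can take part in a seed
        if a == b:
            continue
        if (a, b) not in TRANSITIONS:
            return False             # a transversion breaks the seed
        transitions += 1
        if transitions > max_transitions:
            return False
    return True
```

Setting max_transitions to 0, 1, or 2 corresponds to ‑‑notransition, the default, and ‑‑transition=2, respectively.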
@@ -5231,14 +5231,14 @@ Any-or-None Alignment
want to know is whether it aligned or not.
-The ‑‑anyornone option is designed
+The --anyornone option is designed
for such cases, and can significantly improve alignment speed. Once any
qualifying alignment has been found, processing for the current query is
halted. The alignment is reported to the output, and then we immediately begin
processing the next query. A qualifying alignment is one that would normally
be output given the other parameter settings; for example it satisfies the
-scoring thresholds (‑‑hspthresh
-and/or ‑‑gappedthresh) and any
+scoring thresholds (--hspthresh
+and/or --gappedthresh) and any
back-end filters.
Y-drop Mismatch Shadow
-suppose we are using ‑‑match=1,5,
-‑‑gap=6,1,
-‑‑identity=97, and
-‑‑coverage=95. The entire
+suppose we are using --match=1,5,
+--gap=6,1,
+--identity=97, and
+--coverage=95. The entire
alignment as shown has 97.9% identity (46/47) and 100% coverage. However, the
first five bases (AGAAC vs. AGAAG) have a negative
score: four matches at +1 each and one mismatch at −5 gives a score
@@ -5291,7 +5291,7 @@ Y-drop Mismatch Shadow
we don't want to, and we will see a bias against mismatches near the ends of
reads. (Note that this anomaly arises because the alignment is terminated
abruptly by the end of the sequence rather than normally by a low-scoring
-region; also the ‑‑coverage option is more commonly used with
+region; also the --coverage option is more commonly used with
short reads than with longer sequences.)
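The trimming in the example above can be reproduced with a simplified Python sketch. It treats the alignment as gap-free columns scored +1 for a match and −5 for a mismatch (from ‑‑match=1,5). It then drops the leading columns whose removal leaves the highest-scoring remainder, which mimics pulling the end back to the highest-scoring location. Only the left end is modeled. The function name best_left_trim is hypothetical.

```python
from typing import List, Tuple

def best_left_trim(column_scores: List[int]) -> Tuple[int, int]:
    """Choose how many leading columns to drop so that the remaining
    alignment scores as high as possible.

    Returns (columns_dropped, score_of_what_remains)."""
    total = sum(column_scores)
    best_k, prefix, min_prefix = 0, 0, 0
    for k, score in enumerate(column_scores, start=1):
        prefix += score
        if prefix < min_prefix:  # a more negative prefix is worth cutting
            best_k, min_prefix = k, prefix
    return best_k, total - min_prefix
```

For AGAAC vs. AGAAG followed by 42 matches, this drops the first five columns and keeps a score of 42 instead of 41. Assuming those 47 gap-free columns cover the whole read, only 42/47 (about 89.4%) of the read remains aligned. That falls below ‑‑coverage=95. With ‑‑noytrim, such a prefix would instead be kept, subject to the ‑‑ydrop limit.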
Y-drop Mismatch Shadow
-‑‑noytrim option when aligning short
+--noytrim option when aligning short
reads. This causes LASTZ to refrain from trimming such alignments back to the
highest-scoring location. Specifically, if the
gapped extension process encounters the end of the
sequence, it will keep that as the end of the alignment. In this case a
negatively-scoring prefix or suffix will be kept as long as it does not score
-worse than the ‑‑ydrop value.
+worse than the --ydrop value.
Using Target Capsule Files
lastz <target> --writecapsule=<capsule_file> [<seeding_options>]
-‑‑seed,
-‑‑step,
-‑‑maxwordcount,
-and ‑‑word.
+--seed,
+--step,
+--maxwordcount,
+and --word.
Using Target Capsule Files
No additional effort on the part of the user is required to handle sharing of
the capsule data between separate runs. Nearly all options are allowed;
however the seeding options
-‑‑seed,
-‑‑step,
-‑‑maxwordcount,
-and ‑‑word
+--seed,
+--step,
+--maxwordcount,
+and --word
are not allowed, since these (or their byproducts) are already stored in the
-capsule file. Further, ‑‑masking
+capsule file. Further, --masking
is not allowed, because it would require modifying both the target sequence and
the target seed word position table, which are contained in the capsule.
@@ -5411,7 +5411,7 @@ Using Target Capsule Files
the same file; each instance will have its own virtual addresses for the
capsule data, but the physical memory is shared. There is no requirement for
more than one instance to actually use the capsule simultaneously. Running
-a single copy of lastz with ‑‑targetcapsule will work
+a single copy of lastz with --targetcapsule will work
fine, and in fact there may be a small speed improvement compared to running
the same alignment without a capsule.
@@ -5449,14 +5449,14 @@ Inferring Score Sets
-the ‑‑infer or
-‑‑inferonly options. (The latter
+the --infer or
+--inferonly options. (The latter
will stop after inferring the parameters, without performing the final
alignment.) Settings for the inference process can be specified in a
control file included with these options.
-The ‑‑infscores option causes the
+The --infscores option causes the
inferred scoring parameters to be written out to a separate file. If no
<output_file> is specified, it is written to the header
of the alignment output file, as a comment. As a last resort, if no alignment
@@ -5530,17 +5530,17 @@ Filtering With Shell Commands
-‑‑identity,
-‑‑continuity,
-‑‑coverage,
-‑‑filter=nmatch,
-‑‑filter=nmismatch,
-‑‑filter=ngap and
-‑‑filter=cgap),
+--identity,
+--continuity,
+--coverage,
+--filter=nmatch,
+--filter=nmismatch,
+--filter=ngap and
+--filter=cgap),
sometimes these
are not sufficient for the task at hand. But in many cases it is still possible
to perform the desired filtering by using the
-‑‑format=general option in conjunction
+--format=general option in conjunction
with a simple
awk,
perl, or
@@ -5565,7 +5565,7 @@ Filtering With Shell Commands
-AXT field    field for ‑‑format=general
+AXT field    field for --format=general
Alignment number    (none)
Chromosome (primary organism)    name1
@@ -5643,25 +5643,25 @@ Alignment start (primary organism) start1Self-Masking a Sequence
target sequence, and overlapping 200bp fragments of the critter as the
queries.
-The ‑‑masking=3 option enables
+The --masking=3 option enables
dynamic masking, which will mark any reference base appearing in 3 or more
alignments. Since the fragments overlap by a factor of two, we expect every
base will appear in two trivial alignments. Any more than that would be caused
by a duplication elsewhere.
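The "two trivial alignments per base" reasoning can be checked with a simplified, static Python sketch. It generates 200bp fragments that overlap by half, counts how many intervals cover each reference position, and marks positions reached by 3 or more. LASTZ masks dynamically while it processes queries, so this after-the-fact count is only an approximation. The fragment generator and all function names are hypothetical. The first and last 100 bases are covered by only one fragment in this scheme.

```python
from typing import Iterable, Iterator, List, Tuple

def overlapping_fragments(seq_len: int, frag_len: int = 200, step: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) fragments that overlap by frag_len - step."""
    start = 0
    while True:
        end = min(start + frag_len, seq_len)
        yield start, end
        if end >= seq_len:
            break
        start += step

def coverage(seq_len: int, intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Count how many intervals cover each reference position."""
    counts = [0] * seq_len
    for start, end in intervals:
        for i in range(start, end):
            counts[i] += 1
    return counts

def masked_positions(counts: List[int], threshold: int = 3) -> List[int]:
    """Positions that a --masking=<threshold> style count rule would mark."""
    return [i for i, c in enumerate(counts) if c >= threshold]
```

Trivial alignments alone never reach the threshold. A single extra alignment from a duplicated region pushes the positions it covers to 3, and those positions get masked.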
-The ‑‑progress+masking
+The --progress+masking
option causes lastz to give you a progress report after every 10 thousand
fragments. These reports come to the console (stderr) and look like this:
(16.933s) processing query 50,001: critter_21299501, masked 8,920,893/51,304,566 (17.4%)
-The ‑‑format=none option inhibits the
+The --format=none option inhibits the
normal alignment output and
-‑‑format=outputmasking+:soft
+--format=outputmasking+:soft
tells lastz to write the final masked intervals to a file.
-The final line (‑‑notransition
+The final line (--notransition
in this example) is whatever alignment scoring parameters you want to use.
What is appropriate will depend on the level of divergence you want to allow in
the masked duplications.
@@ -5715,7 +5715,7 @@ Differences from BLASTZ
The handling of bounding alignments in the DP matrix is different in LASTZ than
in BLASTZ. This is discussed in
Bounding Alignments in the DP Matrix. The
-‑‑allgappedbounds option can be
+--allgappedbounds option can be
used to revert to the bounding criteria used in BLASTZ.
Differences from BLASTZ
Y) in fasta sequences but was unclear about how these were scored.
Since we feel the user should be aware of how these bases are treated, LASTZ
rejects them by default. The
-‑‑ambiguous=iupac option permits them
+--ambiguous=iupac option permits them
but treats them the same as an ambiguous N. This is discussed in
Non-ACGT Characters.
@@ -5767,7 +5767,7 @@ Bounding Alignments in the DP Matrix
The correction for this is to only use alignments as bounds if they satisfy the
score threshold. This corrected behavior is now the default in LASTZ (as of
release 1.02.00). The
-‑‑allgappedbounds option can be
+--allgappedbounds option can be
used to revert to the bounding criteria used in BLASTZ.
@@ -5794,7 +5794,7 @@ Change History
@@ -5807,54 +5807,54 @@ 1.0.5 Aug/2/2008
Fixed a bug that in some cases caused a bus error when interpolated
-alignments (e.g. ‑‑inner=…) were used with multiple
+alignments (e.g. --inner=…) were used with multiple
queries.
Change History
1.0.21 Sep/9/2008
-Fixed a bug involving the default value for ‑‑gappedthresh
-(a.k.a. L) when ‑‑exact is used. The bug caused the
+Fixed a bug involving the default value for --gappedthresh
+(a.k.a. L) when --exact is used. The bug caused the
gapped threshold to be inordinately low, allowing undesirable alignment blocks
to make it to the output file.
Fixed a bug whereby Xs and Ns were treated as desirable substitutions when
-unit scores (e.g. ‑‑match=…) were used.
+unit scores (e.g. --match=…) were used.
-Re-implemented ‑‑twins=…. The previous implementation
+Re-implemented --twins=…. The previous implementation
improperly truncated the left-extension of HSPs. The new implementation is
slower and uses more memory.
-Added ‑‑census=<file>. The census counts the number of
+Added --census=<file>. The census counts the number of
times each base in the target sequence is part of an alignment block.
-Previously, ‑‑census produced a census only if the output format
+Previously, --census produced a census only if the output format
was LAV (the census is a special stanza in a LAV file). Otherwise the option
was ignored. Now, if a file is specified a census is written to that file.
The format of lines in the census is
<name> <position> <count>.
The position is one-based, and the count is limited to 255.
-In situtations where 255 is too limiting, ‑‑census16=<file>
-or ‑‑census32=<file> can be used, with limits of about
+In situations where 255 is too limiting, --census16=<file>
+or --census32=<file> can be used, with limits of about
65 thousand and 4 billion, respectively. Note that these will respectively
double and quadruple the amount of memory used for the census. The default
census uses one byte per target sequence location.
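The census described above can be modeled as a per-base counter that saturates at the limit of its width. The following is a minimal Python sketch, assuming single-space separators. It writes every position, including zero counts, because the text does not say whether zero-count positions are omitted. The Census class is hypothetical. A Python list stands in for the 1-, 2-, or 4-byte counters, and bytes_per_base only records the memory implication.

```python
from typing import Iterator

class Census:
    """Per-base count of alignment blocks on one target sequence, using a
    saturating counter of 8, 16 or 32 bits."""

    def __init__(self, name: str, length: int, bits: int = 8):
        if bits not in (8, 16, 32):
            raise ValueError("bits must be 8, 16 or 32")
        self.name = name
        self.limit = (1 << bits) - 1     # 255, 65535 or 4294967295
        self.bytes_per_base = bits // 8  # memory per target location
        self.counts = [0] * length

    def add_block(self, zstart: int, end: int) -> None:
        """Count one alignment block covering zero-based half-open [zstart, end)."""
        for i in range(zstart, end):
            if self.counts[i] < self.limit:  # saturate instead of overflowing
                self.counts[i] += 1

    def lines(self) -> Iterator[str]:
        """Yield '<name> <position> <count>' lines with one-based positions."""
        for i, count in enumerate(self.counts):
            yield f"{self.name} {i + 1} {count}"
```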
-Added ‑‑format=<differences>, to support Galaxy. All
+Added --format=<differences>, to support Galaxy. All
differences (gaps and runs of mismatches) are reported, one per line.
-Added ‑‑anchors=<file> (eventually this was renamed to
-‑‑segments=<file>), giving the user the ability to bypass
+Added --anchors=<file> (eventually this was renamed to
+--segments=<file>), giving the user the ability to bypass
the seeding and gap-free extension stages.
@@ -5884,7 +5884,7 @@
Changed default gap penalties for unit scores (e.g.
- ‑‑match=…) to be relative to mismatch score (instead of
+--match=…) to be relative to mismatch score (instead of
match score).
Change History
@@ -5899,15 +5899,15 @@
-Changed defaults for xdrop and ydrop when ‑‑match scoring is
+Changed defaults for xdrop and ydrop when --match scoring is
used.
Change History
-Added ‑‑maxwordcount.
+Added --maxwordcount.
-Added ‑‑notrivial.
+Added --notrivial.
@@ -5934,7 +5934,7 @@
-Corrected problem with ‑‑subset action, which wasn't using
+Corrected problem with --subset action, which wasn't using
mangled sequence names.
Change History
@@ -5952,7 +5952,7 @@
-Added ‑‑format=rdotplot option.
+Added --format=rdotplot option.
Change History
-Added support for ‑‑format=cigar.
+Added support for --format=cigar.
@@ -5961,8 +5961,8 @@ Change History
@@ -5984,13 +5984,13 @@
-Corrected the behavior of ‑‑exact regarding lowercase and
-non-ACGT characters. ‑‑exact now considers, e.g., a lowercase A
+Corrected the behavior of --exact regarding lowercase and
+non-ACGT characters. --exact now considers, e.g., a lowercase A
to be a match for an uppercase A. Further, any non-ACGT characters now stop
the match.
Change History
-Added the ‑‑output option. In some batch systems, it is
+Added the --output option. In some batch systems, it is
difficult to redirect stdout into a file, so this option allows
the user to do it directly.
-Removed ‑‑quantum and ‑‑code options, replacing
+Removed --quantum and --code options, replacing
them with the quantum and quantum=<code_file>
sequence specifier actions. This is in preparation for allowing a quantum
target sequence.
@@ -6005,12 +6005,12 @@ Change History
extension was able to skip the boundary between sequences (this problem was
introduced in 1.1.25). Second, when the exact match should have extended to
the end of the sequence, it was being cut short by 1 bp (on either end). The
-latter problem was only evident for ‑‑nogapped; a gapped entension
+latter problem was only evident for --nogapped; a gapped extension
recovered the additional bases.
-Fixed several problems with ‑‑segment=<file>. First, if
+Fixed several problems with --segment=<file>. First, if
the file contained more than 4,000 segments, on some platforms the program would
segfault. Second, if a sequence subrange was being used, the limit test
comparing the segment interval to the subrange was incorrect. Third (if the
@@ -6019,28 +6019,28 @@ Change History
-Added ‑‑noytrim to prevent y-drop mismatch shadow, improving
+Added --noytrim to prevent y-drop mismatch shadow, improving
LASTZ’s ability to align short reads.
Set the default gapped extension score threshold to inherit the lowest HSP score in the
-case where ‑‑hspthresh=top<basecount> or
-‑‑hspthresh=top<percentage>% is used but
-‑‑gappedthresh=<score> is not (and gapped extension is
+case where --hspthresh=top<basecount> or
+--hspthresh=top<percentage>% is used but
+--gappedthresh=<score> is not (and gapped extension is
performed). Previously this case was trapped by a low level routine and the
alignment was halted.
Fixed a problem with the start2+ field of
-‑‑format=general. The position was left blank for alignments on
+--format=general. The position was left blank for alignments on
the + strand.
-Fixed a problem in which ‑‑writecapsule was rejected if
-‑‑seed=match<length> was used.
+Fixed a problem in which --writecapsule was rejected if
+--seed=match<length> was used.
@@ -6061,7 +6061,7 @@ Change History
@@ -6073,28 +6073,28 @@
-Changed how ‑‑format=cigar reports alignments on the negative
+Changed how --format=cigar reports alignments on the negative
strand. Apparently there is no complete spec for CIGAR format. Matching what
I see output by exonerate for certain cases is the best I can do.
Change History
-Added cigar field for ‑‑format=general.
+Added cigar field for --format=general.
-Added shingle field for ‑‑format=general.
+Added shingle field for --format=general.
-Added the ‑‑rdotplot=<file> option.
+Added the --rdotplot=<file> option.
-The ‑‑notrivial option now works with the multiple
+The --notrivial option now works with the multiple
sequence specifier action.
-Added ‑‑markend.
+Added --markend.
-Added ‑‑nameparse=darkspace.
+Added --nameparse=darkspace.
@@ -6112,13 +6112,13 @@ Change History
-Fixed a problem with the combination of ‑‑recoverseeds and ‑‑exact.
+Fixed a problem with the combination of --recoverseeds and --exact.
Recovered seeds were cut short by one base on the left end.
-Added ‑‑format=segments option. This was later replaced by
-‑‑writesegments.
+Added --format=segments option. This was later replaced by
+--writesegments.
@@ -6138,21 +6138,21 @@ Change History
1.02.00 Jan/12/2010
Relaxed the rejection of some output formats, which was too aggressive.
-Specifically, runs with ‑‑tableonly were rejected because of
+Specifically, runs with --tableonly were rejected because of
output format, even though no output would be generated in that format.
-Added the ability to set the ‑‑maxwordcount option as a
-percentage. Also, ‑‑maxwordcount=<limit> now allows
+Added the ability to set the --maxwordcount option as a
+percentage. Also, --maxwordcount=<limit> now allows
<limit> to be 1. Previously it was not allowed to be less
than 2.
The scoring matrix used during x-drop extension now reflects the use
-of ‑‑ambiguous=n. Previously, this matrix was not affected by
-‑‑ambiguous=n,
+of --ambiguous=n. Previously, this matrix was not affected by
+--ambiguous=n,
and N-vs-N matches and N-vs-other matches were scored as -100 (more
specifically, as fill_score) during gap-free extension. This
caused LASTZ to miss some HSPs, usually those containing an N-vs-N match, since
@@ -6167,28 +6167,28 @@ Change History
-Added ‑‑softmask=<mask_file> file action to permit
+Added --softmask=<mask_file> file action to permit
soft masking of specified intervals. Also added masking of the
interval complements —
-‑‑xmask=keep:<mask_file>,
-‑‑nmask=keep:<mask_file>, and
-‑‑softmask=keep:<mask_file>. These make it easier to
+--xmask=keep:<mask_file>,
+--nmask=keep:<mask_file>, and
+--softmask=keep:<mask_file>. These make it easier to
restrict alignment to several specified intervals of a sequence.
-Enabled the use of ‑‑filter=[<transv>,]<matches>
-for non-halfweight seeds. Previously, ‑‑filter had only been
+Enabled the use of --filter=[<transv>,]<matches>
+for non-halfweight seeds. Previously, --filter had only been
tested for half-weight seeds, but was erroneously prohibited for
all seeds (instead of just prohibiting non-halfweight seeds). Further, it
-was not properly implemented for seed-only output (‑‑nogfextend
-‑‑nogapped). These have all been corrected, and ‑‑filter
+was not properly implemented for seed-only output (--nogfextend
+--nogapped). These have all been corrected, and --filter
is now available for all seed types.
-Also corrected the behavior of ‑‑filter regarding lowercase and
-non-ACGT characters. ‑‑filter now considers, e.g., a lowercase
+Also corrected the behavior of --filter regarding lowercase and
+non-ACGT characters. --filter now considers, e.g., a lowercase
a to be a match for an uppercase A. Further, for the
-purposes of ‑‑filter, any non-ACGT characters are considered to be
+purposes of --filter, any non-ACGT characters are considered to be
transversions.
<transv> field is absent.
@@ -6204,12 +6204,12 @@ Change History
Currently this only affects the handling of file paths. To activate it,
the user must add -DcompileForWindows to the definition of
definedForAll in
-.../lastz‑distrib‑X.XX.XX/src/Makefile.
+.../lastz-distrib-X.XX.XX/src/Makefile.
-Fixed chaining of seed hits. Previously, if ‑‑nogfextend and
-‑‑chain were used together, nothing was output. This was due to
+Fixed chaining of seed hits. Previously, if --nogfextend and
+--chain were used together, nothing was output. This was due to
the fact that unextended seeds had no scores, and the chaining algorithm only
reports chains with positive score. This has been corrected by calculating
scores (as the sum of substitution scores) over anchor segments whenever (a)
@@ -6217,18 +6217,18 @@ Change History
for later processing.
-when either ‑‑nogfextend or ‑‑exact is used. Gapped
+when either --nogfextend or --exact is used. Gapped
extension processes the anchors highest score first. Since
-‑‑nogfextend left all scores zero, the actual order in which gapped
+--nogfextend left all scores zero, the actual order in which gapped
extension was performed in that case was dependent on how the sort routine (the
-C runtime routine qsort) deals with ties. For ‑‑exact, the score
+C runtime routine qsort) deals with ties. For --exact, the score
was the length of the match. This has been changed to the segment’s
substitution score.
-Changed ‑‑format=segments to
-‑‑writesegments=<file>.
+Changed --format=segments to
+--writesegments=<file>.
@@ -6267,11 +6267,11 @@ Change History
-Added the ‑‑anyornone option.
+Added the --anyornone option.
-Added ‑‑allgappedbounds.
+Added --allgappedbounds.
@@ -6298,25 +6298,25 @@ Change History
-Added ‑‑progress=[<N>].
+Added --progress=[<N>].
This existed as an unadvertised option in earlier versions of the program, as
-‑‑debug=queryprogress=<N>. It has now been promoted to a
+--debug=queryprogress=<N>. It has now been promoted to a
first class option.
-Added ‑‑ambiguous=iupac and changed ‑‑ambiguousn to
-‑‑ambiguous=n. the former is still supported, but not advertized.
+Added --ambiguous=iupac and changed --ambiguousn to
+--ambiguous=n. The old --ambiguousn form is still supported, but not advertised.
-Column headers for ‑‑format=general now match the command-line
+Column headers for --format=general now match the command-line
keywords. Previously, all related keywords shared the same column header.
For example, keywords start2, zstart2,
start2+ and zstart2+ all produced the same column
header, start2, in the output file.
-Also added ‑‑format=general-.
+Also added --format=general-.
@@ -6329,12 +6329,12 @@ Change History
Added nmatch, nmismatch, ngap,
cgap and cigarx fields for
-‑‑format=general.
+--format=general.
@@ -6342,19 +6342,19 @@
-Added ‑‑format=mapping, a shortcut for typical fields for
-‑‑format=general for mapping reads.
+Added --format=mapping, a shortcut for typical fields for
+--format=general for mapping reads.
Change History
1.02.11 Aug/21/2010
-Fixed the cigarx field for ‑‑format=general, so
+Fixed the cigarx field for --format=general, so
that a run length of 1 is omitted for indels.
-Fixed the behavior of ‑‑recoverseeds, which was failing to
+Fixed the behavior of --recoverseeds, which was failing to
recover many HSPs when seed density was high. This was due to left extension
being blocked by other seeds on that same hash-equivalent diagonal. Left
-extension is now unblocked when ‑‑recoverseeds is enabled.
+extension is now unblocked when --recoverseeds is enabled.
@@ -6381,7 +6381,7 @@
-Changed/corrected how the ‑‑segment option handles wildcard names
+Changed/corrected how the --segment option handles wildcard names
when the multiple action is used. To support this, the
rewind command was added to the segments file format.
Change History
-Fixed the implementation of ‑‑self with regard to mirror-image
+Fixed the implementation of --self with regard to mirror-image
pairs. Previously, alignments were internally restricted to be above the main
diagonal in the ungapped stage only. The mirrored twins were created prior to
the gapped stage, and the gapped stage operated on the full set of anchors.
@@ -6397,7 +6397,7 @@ Change History
1.02.16 Nov/2/2010
-Fixed a problem with ‑‑self, introduced in 1.02.11. The problem
+Fixed a problem with --self, introduced in 1.02.11. The problem
manifested itself on 64-bit CPUs, with an error message indicating it was
attempting to allocate 17 billion bytes for edit_script_copy. This has been
corrected.
@@ -6420,14 +6420,14 @@ Change History
-Added ‑‑format=blastn.
+Added --format=blastn.
Added idfrac, id%, blastid%,
covfrac, cov%, confrac,
con%, ncolumn, and npair fields for
-‑‑format=general.
+--format=general.
@@ -6447,25 +6447,25 @@ Change History
is any useful reason to set gap extension to zero.
-Added ‑‑format=rdotplot+score and
-‑‑rdotplot+score=<file>.
+Added --format=rdotplot+score and
+--rdotplot+score=<file>.
-Improved ‑‑masking=<count> so that it can allow a count
+Improved --masking=<count> so that it can allow a count
threshold greater than 254.
-Fixed a problem with ‑‑scores=<scoring_file>. When the
+Fixed a problem with --scores=<scoring_file>. When the
<scoring_file> defined score values for N,
those scores were not honored during the ungapped seed extension stage.
-Fixed problems with ‑‑ambiguous=n and
-‑‑ambiguous=iupac. These were
+Fixed problems with --ambiguous=n and
+--ambiguous=iupac. These were
incorrectly penalizing substitutions between non-ambiguous nucleotides
(A, C, G, or T) and ambiguous ones (N, B, D, H, K, M, R, S,
V, W, or Y). This has been corrected to honor the original
@@ -6477,7 +6477,7 @@ Change History
@@ -6485,7 +6485,7 @@
-Added ‑‑queryhsplimit=<n>.
+Added --queryhsplimit=<n>.
Change History
@@ -6493,7 +6493,7 @@
1.02.27 Jan/31/2011
-Added ‑‑outputmasking=<file>.
+Added --outputmasking=<file>.
Change History
1.02.37 Mar/31/2011
-Added ‑‑outputmasking:soft=<file>.
+Added --outputmasking:soft=<file>.
@@ -6511,7 +6511,7 @@ Change History
Changed the behavior of
- ‑‑queryhsplimit=<n> to
+--queryhsplimit=<n> to
better match user expectations. Previously the limit was applied separately
for each strand of the query. Moreover, HSPs discovered before the limit was
reached were still passed downstream for further processing.
@@ -6523,20 +6523,20 @@ Change History
Fixed a bug involving the ngap and cgap fields for
-‑‑format=general. These fields were only reported correctly if
+--format=general. These fields were only reported correctly if
the continuity or ncolumn fields were also requested.
Otherwise, the value reported represented the contents of uninitialized memory.
@@ -6545,15 +6545,15 @@
Added filtering options
- ‑‑filter=nmismatch:0..<max>,
-‑‑filter=ngap:0..<max>,
-and ‑‑filter=cgap:0..<max>.
+--filter=nmismatch:0..<max>,
+--filter=ngap:0..<max>,
+and --filter=cgap:0..<max>.
-‑‑filter=nmatch:<min>.
-The older option, ‑‑matchcount=<min> is of course still
+--filter=nmatch:<min>.
+The older option, --matchcount=<min> is of course still
recognized.
Change History
1.02.40 Apr/7/2011
-Added ‑‑outputmasking+=<file>
-and ‑‑outputmasking+:soft=<file>.
+Added --outputmasking+=<file>
+and --outputmasking+:soft=<file>.
@@ -6571,7 +6571,7 @@
Added
- ‑‑progress+masking=[<N>].
+--progress+masking=[<N>].
This existed as an unadvertized option in earlier versions of the program, as
-‑‑debug=queryprogress+masking=<N>. It has now been promoted
+--debug=queryprogress+masking=<N>. It has now been promoted
to a first class option.
Change History
was used to compute coverage. The denominator
used was the length of the subrange instead of the entire sequence. This
adversely affected both the
-‑‑coverage filter and the
+--coverage filter and the
coverage output field. This has been corrected
to use the length of the entire sequence.
@@ -6590,7 +6590,7 @@ Change History
-Added ‑‑format=general fields
+Added --format=general fields
nucs1,
nucs2
(the entire target or query nucleotide sequence),
@@ -6600,7 +6600,7 @@ Change History
-Fixed a minor problem with the ‑‑format=general fields
+Fixed a minor problem with the --format=general fields
cov% and con%. Those fields were being written with
an extra tab character preceding them. This had a detrimental effect on
downstream parsers that required tabs as separators (parsers that interpreted
@@ -6608,26 +6608,26 @@ Change History
-Added ‑‑readgroup=<tags>,
-allowing the specification of tags for SAM's ‑RG header line.
+Added --readgroup=<tags>,
+allowing the specification of tags for SAM's @RG header line.
Added
- ‑‑allocate:target=<bytes>
+--allocate:target=<bytes>
and
-‑‑allocate:query=<bytes>.
+--allocate:query=<bytes>.
These allow the user to predict the amount of memory needed to store target
or query sequence data, which in some instances can resolve memory overuse
(it saves LASTZ from incrementally predicting the amount of memory needed).
-‑‑allocate:traceback=<bytes>
-is now renamed (from ‑‑traceback=<bytes>).
+--allocate:traceback=<bytes>
+is the new name for the former --traceback=<bytes> option.
-Added ‑‑include=<file>,
+Added --include=<file>,
allowing command-line arguments to be read from a text file.
@@ -6646,8 +6646,8 @@ Change History
1.03.02 Jul/19/2011
-Fixed a bug in ‑‑format=axt and
-‑‑format=axt+, which caused every
+Fixed a bug in --format=axt and
+--format=axt+, which caused every
alignment to be reported twice. The bug had been introduced in version
1.02.28 (not present in 1.02.27, present in 1.02.37).