From 09daf093cbeeaacaa51b992d6c29375b077f8c60 Mon Sep 17 00:00:00 2001
From: Dan Bolser
Date: Tue, 31 Mar 2015 14:25:54 +0100
Subject: [PATCH] removing confusing html character codes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is motivated by the fact that ‑ is actually a different symbol from -
(the minus sign on my kbd). This causes problems for simple find (Ctrl-f) in
Chrome, as searching for --match isn't the same as searching for ‑‑match!
This could additionally cause problems when examples are copied and pasted
from the html page, e.g. "ls ‑l" gives "ls: cannot access ‑l: No such file or
directory", hahahaha. Since I was getting rid of 98% of the codes anyway, I
went ahead and removed the other 2% for consistency as well. I imagine they
just got used by a copy paste from a text editor on a mac or so...
---
 README.lastz.html | 698 +++++++++++++++++++++++-----------------------
 1 file changed, 349 insertions(+), 349 deletions(-)

diff --git a/README.lastz.html b/README.lastz.html
index 586a204..897c895 100644
--- a/README.lastz.html
+++ b/README.lastz.html
@@ -245,14 +245,14 @@
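The mismatch described in the commit message can be reproduced in a few lines. This is only a sketch: it assumes the removed character codes were numeric entities such as `&#8209;` (U+2011, NON-BREAKING HYPHEN), and `normalize_hyphens` is a hypothetical helper, not part of LASTZ or of this patch.

```python
import html

NON_BREAKING_HYPHEN = "\u2011"  # what the entity &#8209; decodes to (assumed source of the problem)
HYPHEN_MINUS = "-"              # U+002D, the character typed on most keyboards


def normalize_hyphens(text: str) -> str:
    """Decode HTML entities, then map U+2011 to the ASCII hyphen-minus."""
    return html.unescape(text).replace(NON_BREAKING_HYPHEN, HYPHEN_MINUS)


# The rendered page shows two non-breaking hyphens followed by "match".
rendered = NON_BREAKING_HYPHEN * 2 + "match"

# The two strings look alike but compare unequal, so a text search misses it.
print(rendered == "--match")

# After normalization the option can be searched for and pasted into a shell.
print(normalize_hyphens("&#8209;&#8209;match"))
```

The same comparison explains the `ls` failure quoted above: the shell receives a one-character-different argument and treats it as a file name rather than an option.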

Installation

If you have received the distribution as a packed archive, unpack it by whatever means are appropriate for your computer. The result should be -a directory <somepath>/lastz‑distrib‑X.XX.XX that contains +a directory <somepath>/lastz-distrib-X.XX.XX that contains a src subdirectory (and some others). You may find it convenient -to remove the revision number (‑X.XX.XX) from the directory name. +to remove the revision number (-X.XX.XX) from the directory name.

Before building or installing any of the programs, you will need to tell the installer where to put the executable, either by setting the shell variable -$LASTZ_INSTALL, or by editing the make‑include.mak +$LASTZ_INSTALL, or by editing the make-include.mak file to set the definition of installDir. Also, be sure to add the directory you choose to your $PATH. @@ -328,7 +328,7 @@

Overview of Processing Stages and Terminology

The usual flow is as follows (though most of these steps are optional, -and some settings like ‑‑anyornone +and some settings like --anyornone may affect the processing order). We first read the target sequence(s) into memory, and use that to build a seed word position table that will allow us to quickly map any word in the target to @@ -393,19 +393,19 @@

Comparing a Human Chromosome and a Chicken Chromosome

This runs in about two and a half minutes on a 2-GHz workstation, requiring only 400 Mb of RAM. Figure 1(a) shows the results, plotted using the -‑‑format=rdotplot output option and +--format=rdotplot output option and the R statistical package. (When in MAF format, LASTZ output can be browsed with the GMAJ interactive viewer for multiple alignments, available from the Miller Lab at Penn State.)

-Using ‑‑notransition lowers +Using --notransition lowers seeding sensitivity and reduces runtime (by a factor of about 10 in this case). -‑‑step=20 also lowers seeding +--step=20 also lowers seeding sensitivity, reducing runtime and also reducing memory consumption (by a factor of about 3.3 in this case). -‑‑nogapped eliminates the +--nogapped eliminates the computation of gapped alignments. The complete alignment process using default settings (shown in Figure 1(b)) uses 1.3 Gb of RAM and takes 4.5 hours on a machine running at 2.83 GHz. @@ -470,32 +470,32 @@

Aligning Shotgun Reads to a Human Chromosome

same as any other part of the chromosome, in order to accurately assess the uniqueness of the read mappings. Since we know the two species are close, we want to reduce sensitivity. Using -‑‑step=10, we will only be looking for +--step=10, we will only be looking for seeds at every 10th base. Instead of the default seed pattern, we use -‑‑seed=match12 and -‑‑notransition so our +--seed=match12 and +--notransition so our seeds will be exact matches of 12 bases. Instead of the default x-drop extension method we use -‑‑exact=20 so that a 20-base +--exact=20 so that a 20-base exact match is required to qualify as an HSP. Because we are aligning short reads, we specify -‑‑noytrim so the alignment ends will +--noytrim so the alignment ends will not be trimmed back to the highest scoring locations during gapped extension.

We replace the default score set, which is for more distant species, with the -stricter ‑‑match=1,5. This scores +stricter --match=1,5. This scores matching bases as +1 and mismatches as −5. We also use -‑‑ambiguous=n so that Ns +--ambiguous=n so that Ns will be scored appropriately. We are only interested in alignments that involve nearly an entire read, and since the species are close we don't want alignments with low identity; -therefore we use ‑‑coverage=90 and -‑‑identity=95. +therefore we use --coverage=90 and +--identity=95.
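As a rough illustration, the seeding, extension, scoring and filtering options discussed so far can be collected into one command line. The file names `chr21.fa` and `reads.fa` are hypothetical placeholders, and the argument order (target, then query, then options) is an assumption based on the command-line syntax described elsewhere in this README.

```python
import shlex

# Hypothetical input files; substitute your own target and query.
target = "chr21.fa"
query = "reads.fa"

# Options taken from the two paragraphs above, in the order they are introduced.
options = [
    "--step=10",       # look for seeds at every 10th base
    "--seed=match12",  # 12-base seed pattern
    "--notransition",  # seeds must be exact matches
    "--exact=20",      # a 20-base exact match is required for an HSP
    "--noytrim",       # do not trim alignment ends during gapped extension
    "--match=1,5",     # +1 for a match, -5 for a mismatch
    "--ambiguous=n",   # score Ns appropriately
    "--coverage=90",   # alignment must involve nearly the entire read
    "--identity=95",   # discard low-identity alignments
]

command = ["lastz", target, query, *options]
print(shlex.join(command))
```

Building the command as a list keeps each option a separate argument, which avoids the quoting and look-alike-hyphen problems that arise when options are pasted from a rendered page.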

For output, we are only interested in where the reads align, so we use the -‑‑format=general option and specify +--format=general option and specify that we want the position on the chromosome (name1, start1, length1) and the read name and orientation (name2, strand2). This creates a tab-delimited @@ -644,20 +644,20 @@

Aligning a Sequence With Itself

it is aligning a sequence to itself, and performs the full computation on both copies (Figure 3(a)).

-

  • Specify the ‑‑notrivial +
  • Specify the --notrivial option. This performs the full computation on both copies, but doesn't report the trivial self-alignment block along the main diagonal (Figure 3(b)).

    -

  • Specify the ‑‑self option in place +
  • Specify the --self option in place of the query sequence. LASTZ will save work by computing with only one block of each mirror-image pair, though it still reports both copies in the output by reconstructing the second copy from the first. It also invokes -‑‑notrivial automatically to omit the trivial self-alignment block +--notrivial automatically to omit the trivial self-alignment block along the main diagonal. This gives the same output as the previous method, but runs faster (Figure 3(c)).

    -

  • Specify ‑‑self in place of the -query, and also add the ‑‑nomirror +
  • Specify --self in place of the +query, and also add the --nomirror option. In this case LASTZ reports only one copy of each mirror-image pair, as well as omitting the trivial block (Figure 3(d)). @@ -756,18 +756,18 @@

    Command-line Syntax

    and they also can specify pre-processing actions such as selecting a subsequence from the file (see Sequence Specifiers for details). With certain options such as -‑‑self the <query> +--self the <query> is not needed; otherwise if it is left unspecified the query sequences are read from stdin (though this does not work with random-access formats like 2Bit). As a special case, the <target> is -omitted when the ‑‑targetcapsule +omitted when the --targetcapsule option is used, since the target sequence is embedded within the capsule file.

    -For options, the general format is ‑‑<keyword> or -‑‑<keyword>=<value>, but for BLASTZ compatibility +For options, the general format is --<keyword> or +--<keyword>=<value>, but for BLASTZ compatibility some options also have an alternative syntax <letter>=<number>. (Be careful when copying options from the tables below, as some of the hyphens @@ -842,7 +842,7 @@

    Where to Look

    Inhibit the re-creation of mirror-image alignments. Output consists of only one copy of each meaningful alignment block in a self-alignment. This option -is only applicable when the ‑‑self +is only applicable when the --self option is used. @@ -864,7 +864,7 @@

    Where to Look

    --queryhsplimit=nowarn:<n> -Same as ‑‑queryhsplimit=<n> but warnings for queries that +Same as --queryhsplimit=<n> but warnings for queries that exceed the limit are withheld. @@ -889,7 +889,7 @@

    Where to Look

    By default both strands are searched, and the target is assumed to be different from the query.

    -If ‑‑self is used, the default is to +If --self is used, the default is to re-create the redundant mirror-image alignment blocks in the output. @@ -911,7 +911,7 @@

    Scoring

    Read the substitution scores and gap penalties (and possibly other options) from a scoring file. This option cannot be used in -conjunction with ‑‑match or +conjunction with --match or inference. @@ -931,15 +931,15 @@

    Scoring

    thresholds are called "dropoff", as raising them actually brings in *more* alignments, not fewer. -->

    -Note that specifying ‑‑match changes the defaults for some of +Note that specifying --match changes the defaults for some of the other options (e.g. the scoring penalties for gaps, and various extension thresholds), as described in their respective sections. The regular defaults are chosen for compatibility with BLASTZ, but since BLASTZ doesn't support -‑‑match, LASTZ infers that you are not expecting BLASTZ +--match, LASTZ infers that you are not expecting BLASTZ compatibility for this run, so it is free to use improved defaults.

    This option cannot be used in conjunction with -‑‑scores or +--scores or inference. @@ -957,7 +957,7 @@

    Scoring

    being performed, and cannot be used in conjunction with inference. These values specified on the command line override any corresponding values from a file provided with -‑‑scores. +--scores. @@ -1021,9 +1021,9 @@

    Scoring

    This feature is somewhat experimental, and special builds of LASTZ are required to enable it. Please see Inferring Score Sets for more information. Inference cannot be used in conjunction with -‑‑scores, -‑‑match, or -‑‑gap. +--scores, +--match, or +--gap. @@ -1032,7 +1032,7 @@

    Scoring

    Infer substitution scores and/or gap penalties, but don't perform the final -alignment (requires ‑‑infscores). +alignment (requires --infscores). @@ -1042,7 +1042,7 @@

    Scoring

    Save the inferred scoring parameters to the specified file (or to stdout), in the same format expected -by ‑‑scores. +by --scores. @@ -1057,15 +1057,15 @@

    Scoring

           A     C     G     T
-    A    91  ‑114   ‑31  ‑123
-    C  ‑114   100  ‑125   ‑31
-    G   ‑31  ‑125   100  ‑114
-    T  ‑123   ‑31  ‑114    91
+    A    91  -114   -31  -123
+    C  -114   100  -125   -31
+    G   -31  -125   100  -114
+    T  -123   -31  -114    91

    Default gap penalties are determined as follows. If -‑‑match is +--match is specified, the open penalty is 3.25 times the mismatch penalty, and the extend penalty is 0.24375 times the mismatch penalty. (These are the same ratios as BLASTZ’s defaults.) Both penalties are rounded up to the nearest integer. @@ -1123,8 +1123,8 @@
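The rule for the `--match` case can be written out as a short calculation. This is a sketch of the arithmetic stated above, not LASTZ's own code; `default_gap_penalties` is a hypothetical name, and exact fractions are used so that rounding up is not disturbed by floating-point error.

```python
import math
from fractions import Fraction

OPEN_RATIO = Fraction(13, 4)      # 3.25 times the mismatch penalty
EXTEND_RATIO = Fraction(39, 160)  # 0.24375 times the mismatch penalty


def default_gap_penalties(mismatch_penalty: int) -> tuple[int, int]:
    """Return (open, extend), each rounded up to the nearest integer."""
    gap_open = math.ceil(OPEN_RATIO * mismatch_penalty)
    gap_extend = math.ceil(EXTEND_RATIO * mismatch_penalty)
    return gap_open, gap_extend


# With a mismatch penalty of 5: 3.25 * 5 = 16.25 and 0.24375 * 5 = 1.21875.
print(default_gap_penalties(5))
```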

    Indexing

    runs. The actual count is dependent on sequence length and composition as well as the step offset and seed pattern. For example, Figure 4 shows the variation among human chromosomes in hg18 for -‑‑seed=match13, ‑‑step=15, and -‑‑maxwordcount=90%. The gray bars show the percentage of +--seed=match13, --step=15, and +--maxwordcount=90%. The gray bars show the percentage of seed word positions kept (the red line shows the ideal 90%). The blue numbers show the equivalent count, which varies greatly.

    @@ -1154,7 +1154,7 @@

    Indexing

    or less, one byte is used; if it is 65,534 or less, two bytes are used.

    The resulting masked intervals can be written to a file with the -‑‑outputmasking=<file> +--outputmasking=<file> option. @@ -1167,11 +1167,11 @@

    Indexing

    are read from the specified file. When this option is used, the normal target specifier is omitted from the command line, and the following options are not allowed: -‑‑step, -‑‑maxwordcount, -‑‑masking, -‑‑seed, -‑‑word. +--step, +--maxwordcount, +--masking, +--seed, +--word. @@ -1325,7 +1325,7 @@

    Seeding

    Require two nearby seeds on the same diagonal, separated by a number of bases in the given range. See the Seed Patterns section for more information. This option cannot be used in conjunction with -‑‑recoverseeds. +--recoverseeds. @@ -1345,7 +1345,7 @@

    Seeding

    considerably and cost more memory, and usually does not improve the results significantly. See the Gap-free Extension stage for more information. This option cannot be used in conjunction with -‑‑twins. +--twins. @@ -1500,12 +1500,12 @@

    Finding HSPs (Gap-free Extension)

    By default seeds are extended to HSPs using x-drop extension, with entropy adjustment.

    -If ‑‑match scoring is used, the +If --match scoring is used, the default x-drop termination threshold is 10 times the square root of the mismatch penalty, rounded up to the nearest integer. Otherwise the default is 10 times the A-vs.-A substitution score.

    -If ‑‑match scoring is used, the +If --match scoring is used, the default HSP score threshold is 30 times the match reward (equivalent to the score of a 30-bp exact match). Otherwise the default is 3000. @@ -1639,14 +1639,14 @@
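The two `--match` defaults above reduce to simple arithmetic. The sketch below uses hypothetical function names and covers only the `--match` case; integer square roots are used so that the rounding up is exact.

```python
import math


def default_xdrop(mismatch_penalty: int) -> int:
    """10 * sqrt(mismatch penalty), rounded up, computed as ceil(sqrt(100 * penalty))."""
    n = 100 * mismatch_penalty
    root = math.isqrt(n)
    return root if root * root == n else root + 1


def default_hsp_threshold(match_reward: int) -> int:
    """30 times the match reward, i.e. the score of a 30-bp exact match."""
    return 30 * match_reward


# For a mismatch penalty of 5, 10 * sqrt(5) is about 22.36, which rounds up.
print(default_xdrop(5), default_hsp_threshold(1))
```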

    Gapped Extension

    By default gapped extension is performed, and alignment ends are trimmed to the locations giving the maximum score.

    -If ‑‑match scoring is used, the +If --match scoring is used, the default y-drop threshold is twice the x-drop threshold (or if x-drop extension was not performed, twice what the default x-drop threshold would have been); otherwise it is the score of a 300-bp gap.

    The default for the gapped score threshold is to use the same value as the HSP threshold (which is settable via -‑‑hspthresh). If the HSP +--hspthresh). If the HSP threshold was adaptive, then the lowest-scoring HSP that was kept is used for this default. If x-drop extension was not performed, the value used is whatever the default HSP threshold would have been. @@ -1711,7 +1711,7 @@

    Back-end Filtering

    of matched bases in the alignment. This option is not valid with quantum DNA.

    -For backwards compatibility, ‑‑matchcount=<min> has the +For backwards compatibility, --matchcount=<min> has the same meaning. @@ -1758,7 +1758,7 @@

    Back-end Filtering

    Do not output a trivial self-alignment block if the target and query sequences -are identical. Note that using ‑‑self +are identical. Note that using --self automatically enables this option. @@ -1844,7 +1844,7 @@

    Output

    or general-[:<fields>].

    -‑‑format=none can be used when no alignment output is desired. +--format=none can be used when no alignment output is desired. @@ -1855,7 +1855,7 @@

    Output

    Create an additional output file suitable for plotting the alignment blocks with the R statistical package. The output file is the same as would be produced by -‑‑format=rdotplot, but this option +--format=rdotplot, but this option allows you to create the dotplot file without having to run the alignment twice. @@ -1865,7 +1865,7 @@

    Output

    Used in conjunction with the SAM file format, allowing -the specification of tags for SAM's ‑RG header line. +the specification of tags for SAM's -RG header line. <tags> is a tab-delimited list of <tag>:<value> items. See the SAM specification for details about which tags are required. LASTZ does not validate whether the @@ -1939,7 +1939,7 @@

    Output

    Used in conjunction with the -‑‑masking=<count> option. +--masking=<count> option. The masked target intervals, resulting from alignment with all queries, are written to a file in sequence masking file format. The file is suitable @@ -1948,9 +1948,9 @@

    Output

    xmask, and nmask sequence specifier actions.

    In contrast with -‑‑outputmasking:soft=<file>, +--outputmasking:soft=<file>, only those intervals created by the -‑‑masking=<count> option +--masking=<count> option are reported. @@ -1960,7 +1960,7 @@

    Output

    The same as -‑‑outputmasking=<file>, +--outputmasking=<file>, except that masked intervals are written to a file in three-field sequence masking file format, which includes sequence names. The file is not suitable for later use as @@ -1981,10 +1981,10 @@

    Output

    xmask, and nmask sequence specifier actions.

    In contrast with -‑‑outputmasking=<file>, +--outputmasking=<file>, all masked intervals in the target sequence are reported, regardless of whether they were created by the -‑‑masking=<count> option +--masking=<count> option or were in the sequence as it was originally input. @@ -1994,7 +1994,7 @@

    Output

    The same as -‑‑outputmasking:soft=<file>, +--outputmasking:soft=<file>, except that masked intervals are written to a file in three-field sequence masking file format, which includes sequence names. The file is not suitable for later use as @@ -2039,7 +2039,7 @@

    Output

    Write out alignments as segments, in the same format -used for input by the ‑‑segments +used for input by the --segments option. These anchor segments can then be used to anchor alignments in a subsequent run of LASTZ. This can be useful if you want to filter HSPs in some way before performing gapped extension, for example filtering them by @@ -2095,7 +2095,7 @@

    Housekeeping

    Read arguments from a text file. The arguments are parsed the same as they would be from the command-line, with the exception that they may appear on -multiple lines in the file. ‑‑include can be used in conjunction +multiple lines in the file. --include can be used in conjunction with other command line arguments.

    Note that any shell-performed substitutions that would be performed on the @@ -2111,10 +2111,10 @@

    Housekeeping

    the gapped extension stage. <bytes> may contain a K or M unit suffix if desired (indicating a multiplier of 1,024 or 1,048,576, respectively). For example, -‑‑allocate:traceback=80.0M is the same as -‑‑allocate:traceback=83886080. +--allocate:traceback=80.0M is the same as +--allocate:traceback=83886080.
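    The equivalence in this example can be checked with a short calculation. The sketch below is not LASTZ's parser; `parse_bytes` is a hypothetical helper, and, consistent with the example, it takes `M` as 1,048,576 and `K` as 1,024.

```python
def parse_bytes(value: str) -> int:
    """Convert a byte count such as '80.0M' or '512K' to an integer number of bytes."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = value[-1]
    if suffix in multipliers:
        # The numeric part may be fractional, as in the 80.0M example.
        return int(float(value[:-1]) * multipliers[suffix])
    return int(value)


# 80.0 * 1,048,576 = 83,886,080, matching the two spellings shown above.
print(parse_bytes("80.0M") == parse_bytes("83886080"))
```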

    -For backwards compatibility, ‑‑traceback=<bytes> is also +For backwards compatibility, --traceback=<bytes> is also accepted. @@ -2142,7 +2142,7 @@

    Housekeeping

    Predict the amount of memory (in RAM) that will be needed for query sequence data. See -‑‑allocate:target for further +--allocate:target for further details.

    The memory needed for a sequence is L+1, where @@ -2182,20 +2182,20 @@

    Shortcuts for Yasra

    provide canned sets of option settings that work well for aligning an assembled reference sequence (as the target) with a set of shotgun reads (as the query). They are selected based on the expected level of identity between the sequences. -For example, ‑‑yasra90 should be used when we expect 90% identity. -The ‑‑yasraXXshort options are appropriate when the reads are very +For example, --yasra90 should be used when we expect 90% identity. +The --yasraXXshort options are appropriate when the reads are very short (less than 50 bp).

     Option         Equivalent
-    --yasra98      T=2 Z=20 ‑‑match=1,6 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=98 ‑‑ambiguousn ‑‑noytrim
-    --yasra95      T=2 Z=20 ‑‑match=1,5 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=95 ‑‑ambiguousn ‑‑noytrim
-    --yasra90      T=2 Z=20 ‑‑match=1,5 O=6 E=1 Y=20 K=22 L=30 ‑‑identity=90 ‑‑ambiguousn ‑‑noytrim
-    --yasra85      T=2      ‑‑match=1,2 O=4 E=1 Y=20 K=22 L=30 ‑‑identity=85 ‑‑ambiguousn ‑‑noytrim
-    --yasra75      T=2      ‑‑match=1,1 O=3 E=1 Y=20 K=22 L=30 ‑‑identity=75 ‑‑ambiguousn ‑‑noytrim
-    --yasra95short T=2      ‑‑match=1,7 O=6 E=1 Y=14 K=10 L=14 ‑‑identity=95 ‑‑ambiguousn ‑‑noytrim
-    --yasra85short T=2      ‑‑match=1,3 O=4 E=1 Y=14 K=11 L=14 ‑‑identity=85 ‑‑ambiguousn ‑‑noytrim
+    --yasra98      T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98 --ambiguousn --noytrim
+    --yasra95      T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95 --ambiguousn --noytrim
+    --yasra90      T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90 --ambiguousn --noytrim
+    --yasra85      T=2      --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85 --ambiguousn --noytrim
+    --yasra75      T=2      --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75 --ambiguousn --noytrim
+    --yasra95short T=2      --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95 --ambiguousn --noytrim
+    --yasra85short T=2      --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85 --ambiguousn --noytrim

    @@ -2203,20 +2203,20 @@

    Shortcuts for Yasra

    is done as an improvement, so most users will want to use the shortcuts shown above. However, in order to support backward compatibility for users who want to reproduce previous results, all previous versions of the shortcuts are -included. The syntax is ‑‑<shortcut>:<version>, where +included. The syntax is --<shortcut>:<version>, where <version> is the LASTZ version number that contained the shortcut.

     Option                   LASTZ version       Equivalent
-    --yasra98:<version>      1.02.45 or earlier  T=2 Z=20 ‑‑match=1,6 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=98
-    --yasra95:<version>      1.02.45 or earlier  T=2 Z=20 ‑‑match=1,5 O=8 E=1 Y=20 K=22 L=30 ‑‑identity=95
-    --yasra90:<version>      1.02.45 or earlier  T=2 Z=20 ‑‑match=1,5 O=6 E=1 Y=20 K=22 L=30 ‑‑identity=90
-    --yasra85:<version>      1.02.45 or earlier  T=2      ‑‑match=1,2 O=4 E=1 Y=20 K=22 L=30 ‑‑identity=85
-    --yasra75:<version>      1.02.45 or earlier  T=2      ‑‑match=1,1 O=3 E=1 Y=20 K=22 L=30 ‑‑identity=75
-    --yasra95short:<version> 1.02.45 or earlier  T=2      ‑‑match=1,7 O=6 E=1 Y=14 K=10 L=14 ‑‑identity=95
-    --yasra85short:<version> 1.02.45 or earlier  T=2      ‑‑match=1,3 O=4 E=1 Y=14 K=11 L=14 ‑‑identity=85
+    --yasra98:<version>      1.02.45 or earlier  T=2 Z=20 --match=1,6 O=8 E=1 Y=20 K=22 L=30 --identity=98
+    --yasra95:<version>      1.02.45 or earlier  T=2 Z=20 --match=1,5 O=8 E=1 Y=20 K=22 L=30 --identity=95
+    --yasra90:<version>      1.02.45 or earlier  T=2 Z=20 --match=1,5 O=6 E=1 Y=20 K=22 L=30 --identity=90
+    --yasra85:<version>      1.02.45 or earlier  T=2      --match=1,2 O=4 E=1 Y=20 K=22 L=30 --identity=85
+    --yasra75:<version>      1.02.45 or earlier  T=2      --match=1,1 O=3 E=1 Y=20 K=22 L=30 --identity=75
+    --yasra95short:<version> 1.02.45 or earlier  T=2      --match=1,7 O=6 E=1 Y=14 K=10 L=14 --identity=95
+    --yasra85short:<version> 1.02.45 or earlier  T=2      --match=1,3 O=4 E=1 Y=14 K=11 L=14 --identity=85
    @@ -2323,7 +2323,7 @@

    Sequence Specifiers

    both <start> and <end> are required.

    -A “zoom factor” can also be included, using the syntax +A "zoom factor" can also be included, using the syntax <start>..<end>+<zoom>%. The specified interval is expanded on each end by <zoom> percent. This is useful when you know, for example, the location of a gene, and would like to include @@ -2343,7 +2343,7 @@

    Sequence Specifiers

    used. However, this can lead to non-obvious interactions with other features such as strand reporting, sequence masking, and segment files, so it should be used with care. Usually it is simpler to use the -‑‑strand options instead. +--strand options instead.

    Note that subrange positions are always measured from the start of the sequence provided in the file (i.e., counting along the @@ -2617,7 +2617,7 @@

    Sequence Specifiers

    sequence to be used instead of the sequence itself. Again, this should be used with care, as it can lead to murky interactions with other features. In BLASTZ it was needed for searching only the minus strand, but LASTZ provides -a ‑‑strand option for that. +a --strand option for that. @@ -2677,10 +2677,10 @@

    Indexing Target Seed Words

    This table is one of the major space requirements of the program. Both the memory and time required for seeding can be decreased by using sparse spacing. -The ‑‑step option sets a +The --step option sets a step size: instead of examining every position, seed words are stored only for multiples of the step size. Large step sizes (say, -‑‑step=100) incur a loss of sensitivity, at least at the seeding +--step=100) incur a loss of sensitivity, at least at the seeding stage. However, to discover any gapped alignment block we only need to discover one seed (of many) in that alignment, so the actual sensitivity loss is small in most cases. Section 6.2 of [Harris 2007] @@ -2700,10 +2700,10 @@

    Indexing Target Seed Words

    bases are left out of the seed word position table and skipped during seeding, respectively, so they do not participate in the seeding stage.
  • If repeat locations are not known, the option -‑‑maxwordcount can be used to remove +--maxwordcount can be used to remove frequently occurring target seed words from the position table before query processing begins. -
  • Dynamic masking (‑‑masking) can +
  • Dynamic masking (--masking) can be used to mask target positions that have occurred in too many alignments; however this only affects subsequent query sequences. @@ -2725,7 +2725,7 @@

    Seeding

    To locate seeds, the query sequence is parsed into seed words the same way the target is (except that -‑‑step does not apply to the query; +--step does not apply to the query; we look at every seed word). Each packed seed word is used as an index into the target seed word position table to find the target positions that have a seed match for this @@ -2743,7 +2743,7 @@

    Quantum Seeding:

    is first converted to a quantum seeding ball of those DNA words that are most similar to it. Similarity is determined by the scoring matrix; all words with a combined substitution score above the quantum seeding threshold -(set by the ‑‑ball option) are +(set by the --ball option) are considered to be in the ball. Then each word in the ball is looked up in the target seed word position table as usual, with all such hits considered to be seed matches for the q-word. @@ -2782,7 +2782,7 @@

    Gap-free Extension

    exact match, M-mismatch, or x-drop.

    -Exact match extension (‑‑exact) simply +Exact match extension (--exact) simply extends the seed until a mismatch is found. If the resulting length is enough, the extended seed is kept as an HSP for further processing. Exact match extension is most useful when the target and query are expected to be very @@ -2790,17 +2790,17 @@

    Gap-free Extension

    M-mismatch extension -(‑‑<M>mismatch) extends the +(--<M>mismatch) extends the seed to find the longest interval that includes the entire seed and contains no more than M mismatches. If the resulting length is enough, the extended seed is kept as an HSP for further processing. M-mismatch extension is most useful when the approximate divergence between the target and query is known, and HSPs of a known length are desired. It provides a way to specify both length and identity thresholds together, -with more flexibility than ‑‑exact. +with more flexibility than --exact.

    -In x-drop extension (‑‑xdrop), as we +In x-drop extension (--xdrop), as we extend in each direction we track the cumulative score for the extended match according to the substitution scoring matrix. The extension is stopped when the score drops off by more than the given x-drop threshold; that is, when the @@ -2810,10 +2810,10 @@

    Gap-free Extension

    worse than −<dropoff> is encountered.) The extension is then trimmed back to the peak point. If the combined score of the seed plus both extensions meets the threshold set by the -‑‑hspthresh option, it qualifies +--hspthresh option, it qualifies as an HSP and is kept for further processing. Matches that do not meet the score threshold are discarded. -The ‑‑entropy options control +The --entropy options control whether or not the scores are adjusted for nucleotide entropy when they are compared to the threshold. @@ -2823,9 +2823,9 @@
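    The x-drop rule can be illustrated with a deliberately simplified sketch. It extends in one direction only, uses a flat match reward and mismatch penalty instead of a full substitution matrix, and ignores entropy adjustment; `xdrop_extend` and its parameters are hypothetical names, not LASTZ internals.

```python
def xdrop_extend(target, query, t_start, q_start, xdrop, match=1, mismatch=5):
    """Extend rightward from (t_start, q_start); return (length, score) at the peak.

    The running score is tracked base by base. Extension stops once the score
    has dropped more than `xdrop` below the best score seen so far, and the
    result is trimmed back to the position where that best score occurred.
    """
    score = best_score = 0
    best_len = 0
    offset = 0
    while t_start + offset < len(target) and q_start + offset < len(query):
        same = target[t_start + offset] == query[q_start + offset]
        score += match if same else -mismatch
        offset += 1
        if score > best_score:
            best_score, best_len = score, offset   # new peak
        elif best_score - score > xdrop:
            break                                  # dropped off too far
    return best_len, best_score


# Eight matching bases followed by mismatches: the extension is trimmed to the peak.
print(xdrop_extend("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, xdrop=10))
```

    In a full implementation the same procedure runs leftward as well, and the combined score of the seed plus both extensions is then compared against the HSP score threshold.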

    Adaptive Score Threshold:

    HSP score threshold — set it too high and hardly anything will align, but too low and the program will be swamped and not finish. LASTZ’s adaptive scoring options -(‑‑hspthresh=top<basecount> +(--hspthresh=top<basecount> and -‑‑hspthresh=top<percentage>%) +--hspthresh=top<percentage>%) allow you to set the threshold indirectly to align the desired amount of the target (as an approximate number of bases or as a percentage, respectively). This way you can set it for, say, 10% (which will run quickly regardless of the @@ -2848,7 +2848,7 @@

    Diagonal Hashing:

    LASTZ hashes diagonals to 16-bit values and tracks extensions only by the hash value. While this saves space, it results in a minuscule loss of sensitivity: LASTZ may miss some seeds due to hash collisions. Using -‑‑recoverseeds will prevent losing +--recoverseeds will prevent losing these seeds, but will slow the program significantly. Moreover, since most true alignments contain many HSPs, with many seeds in each HSP, the vast majority of lost seeds have no effect on the final results. @@ -2878,11 +2878,11 @@
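    How a 16-bit hash can make two different diagonals indistinguishable is easy to demonstrate. The text does not give LASTZ's actual hash function, so the sketch below assumes the simplest possible one (keeping the low 16 bits of the diagonal); both function names are hypothetical.

```python
def diagonal(target_pos: int, query_pos: int) -> int:
    """A seed hit lies on the diagonal given by the difference of its positions."""
    return target_pos - query_pos


def diagonal_hash(diag: int) -> int:
    """Assumed hash: keep only the low 16 bits of the diagonal."""
    return diag & 0xFFFF


# Two seed hits on different diagonals, exactly 65,536 apart.
first = diagonal(70_000, 0)
second = diagonal(4_464, 0)

# Different diagonals, same 16-bit value: an extension recorded for one
# can cause a seed on the other to be skipped.
print(first != second, diagonal_hash(first) == diagonal_hash(second))
```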

    HSP Chaining

    processed in separate pipelines, it will not necessarily cause inversions to be discarded.) If LASTZ’s implementation of chaining is not suitable, it is possible to substitute another chaining program by first running LASTZ with the -‑‑nogapped and -‑‑writesegments +--nogapped and +--writesegments options to get the HSPs, running a separate chaining program to filter them, and then running the final stages of LASTZ on that output via the -‑‑segments option. +--segments option.

    Figure 5(a) shows an alignment without chaining, while 5(b) shows the same @@ -2956,7 +2956,7 @@

    Gapped Extension

    first). Gapped extension is performed independently in both directions from the anchor point, and the two resulting alignments are joined at the anchor. If the total score meets the threshold -specified by the ‑‑gappedthresh +specified by the --gappedthresh option, the joined alignment is kept and passed to the next stage; otherwise it is discarded. If the extension from one anchor happens to go through one or more other anchors, the redundant anchors are dropped from the list. @@ -2978,14 +2978,14 @@

    Gapped Extension

    the DP matrix examined is reduced by disallowing low-scoring regions (see [Zhang 1998]): wherever the alignment score drops below the peak score seen so far by more than the threshold specified in the -‑‑ydrop option, the DP matrix is +--ydrop option, the DP matrix is truncated and no further cells are computed along that row or column. By default the extension is then trimmed back to the location of the peak score; thus the extension normally ends when all remaining sub-alignment possibilities (paths in the DP matrix) begin with sections that score worse than −<dropoff>. However for alignments where the extension reaches the end of the sequence, you can suppress this -trimming by specifying the ‑‑noytrim +trimming by specifying the --noytrim option, which is recommended when aligning short reads.

    @@ -3013,13 +3013,13 @@

    Back-end Filtering

    Whatever alignment blocks have made it through the above gauntlet are then subjected to identity, continuity, coverage and match count filtering (as specified by the -‑‑identity, -‑‑continuity, -‑‑coverage, -‑‑filter=nmatch, -‑‑filter=nmismatch, -‑‑filter=ngap, and -‑‑filter=cgap options, +--identity, +--continuity, +--coverage, +--filter=nmatch, +--filter=nmismatch, +--filter=ngap, and +--filter=cgap options, respectively). Blocks that do not meet the specified range for each feature are discarded. @@ -3033,7 +3033,7 @@

    Back-end Filtering

    Characters that differ only in upper vs. lower case are counted as matches. Columns containing gaps or non-ACGT characters play no part in this computation, and it is independent of the settings for -‑‑ambiguous=n and +--ambiguous=n and bad_score. Identity cannot be determined for alignments with quantum DNA, because of the potential ambiguity of the symbols. @@ -3097,7 +3097,7 @@
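    The counting rules in this paragraph can be sketched as follows. It is assumed here that identity is the fraction of counted columns that match, and that the two rows of an alignment block are given as equal-length strings with `-` for gaps; `identity` is a hypothetical helper, not LASTZ's implementation.

```python
def identity(target_row: str, query_row: str):
    """Fraction of matching columns, ignoring gaps and non-ACGT characters.

    Case is ignored, so 'a' aligned with 'A' counts as a match.
    Returns None if no column can be counted.
    """
    matches = mismatches = 0
    for t, q in zip(target_row.upper(), query_row.upper()):
        if t not in "ACGT" or q not in "ACGT":
            continue  # gap or non-ACGT column plays no part
        if t == q:
            matches += 1
        else:
            mismatches += 1
    counted = matches + mismatches
    return matches / counted if counted else None


# Nine columns: one gap and one N/N column are skipped, leaving 6 matches and 1 mismatch.
print(identity("ACGT-acgN", "ACGTAaggN"))
```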

    Interpolation

    Once the above stages have been performed, it is not uncommon to have regions left over in which no alignment has been found. In the interpolation stage -(activated by the ‑‑inner option) we +(activated by the --inner option) we repeat the seeding through gapped extension stages in these leftover regions, at a presumably higher sensitivity. Using such high sensitivity from the outset would be computationally prohibitive (due to the excessive number of @@ -3155,7 +3155,7 @@

    Alignment Output

    The alignment blocks found by the preceding pipeline of stages are written to stdout (or to a file specified with the -‑‑output option) in the requested +--output option) in the requested format. These may be seeds, gap-free HSPs, or gapped local alignments, depending on which stages were performed. There is no particular order to the alignment @@ -3182,7 +3182,7 @@

    File Formats

    sequences contain a series of A, C, G, T, and N characters in upper or lower case. Lower case indicates repeat-masked bases, while Ns represent -unknown bases if the ‑‑ambiguous=n +unknown bases if the --ambiguous=n option is specified. (By default, a run of Ns or Xs is used to separate sequences that have been catenated together for processing, but this is now deprecated; see @@ -3222,7 +3222,7 @@

    FASTA (sequence input)

    as a splicing character. However, LASTZ does not currently support IUPAC-IUB ambiguity codes other than N (such as R, W, etc.), -beyond the treatment afforded by ‑‑ambiguous=iupac. +beyond the treatment afforded by --ambiguous=iupac.

    A special case, non-conforming to the official standard, is made to allow a special user-specified separator character. @@ -3250,7 +3250,7 @@

    FASTQ (sequence input)

    format, prohibiting line-wrapping within DNA or quality sequences.

    Each sequence consists of four lines. The first line begins with a - followed by the name of the sequence. The second line contains +- followed by the name of the sequence. The second line contains nucleotide characters. The third line begins with a +, optionally followed by the name of the sequence (which, if present, must match that of the first line). The fourth line contains quality characters. @@ -3454,9 +3454,9 @@

    Sequence Masking File

    nmask actions in a sequence specifier. It can also be created by using the -‑‑outputmasking=<file> +--outputmasking=<file> or -‑‑outputmasking:soft=<file> +--outputmasking:soft=<file> options. It consists of one interval per line, without sequence names. Lines beginning with a # are @@ -3473,7 +3473,7 @@

    Sequence Masking File

    Note that the masking intervals are counted along the forward strand, even if we are only aligning to the reverse complement of the query specifier (i.e. for -‑‑strand=minus). +--strand=minus).

    Here is an example. If the target sequence is hg18.chr1, this would mask the @@ -3499,9 +3499,9 @@

    Sequence Masking File, Three Fields

    format.

    This file is created by using the -‑‑outputmasking+=<file> +--outputmasking+=<file> or -‑‑outputmasking+:soft=<file> +--outputmasking+:soft=<file> options. It consists of one interval per line, with sequence names.

    @@ -3514,7 +3514,7 @@

    Sequence Masking File, Three Fields

    Note that the masking intervals are counted along the forward strand, even if we are only aligning to the reverse complement of the query specifier (i.e. for -‑‑strand=minus). +--strand=minus).
    @@ -3522,7 +3522,7 @@

    Scoring File

    -This file is used with the ‑‑scores +This file is used with the --scores option to specify a set of (mostly) scoring-related parameters en masse. The score set consists of a substitution matrix and other settings. The other settings come first and are individually explained in the @@ -3613,7 +3613,7 @@

    Scoring File

    This is used as a default for all cells of the scoring matrix that are not otherwise set (either by the user or by LASTZ’s defaults). This is the score used for Ns (unless -‑‑ambiguous=n is specified on the +--ambiguous=n is specified on the command line).

    The default value is −100. There is no corresponding command-line option. @@ -3625,7 +3625,7 @@

    Scoring File

    <penalty> This is identical to the <open> field of the -‑‑gap command line option. +--gap command line option. @@ -3634,7 +3634,7 @@

    Scoring File

    <penalty> This is identical to the <extend> field of the -‑‑gap command line option. +--gap command line option. @@ -3643,7 +3643,7 @@

    Scoring File

    <offset> This is identical to the -‑‑step command line option. +--step command line option. @@ -3651,8 +3651,8 @@

    Scoring File

    seed <strategy> -This corresponds to the ‑‑seed and -‑‑transition command line options. +This corresponds to the --seed and +--transition command line options. <strategy> must be one of the following, with no spaces:
    12of19,transition
    12of19,notransition @@ -3667,7 +3667,7 @@

    Scoring File

    <percentage>% This is identical to the -‑‑ball command line option. +--ball command line option. @@ -3676,7 +3676,7 @@

    Scoring File

    <dropoff> This is identical to the -‑‑xdrop command line option. +--xdrop command line option. @@ -3685,10 +3685,10 @@

    Scoring File

    <score> This is identical to the -‑‑hspthresh command line option, +--hspthresh command line option, except that it does not currently support the -‑‑hspthresh=top<basecount> or -‑‑hspthresh=top<percentage>% variants. +--hspthresh=top<basecount> or +--hspthresh=top<percentage>% variants. @@ -3697,7 +3697,7 @@

    Scoring File

    <dropoff> This is identical to the -‑‑ydrop command line option. +--ydrop command line option. @@ -3706,7 +3706,7 @@

    Scoring File

    <score> This is identical to the -‑‑gappedthresh command line option. +--gappedthresh command line option. @@ -3718,7 +3718,7 @@

    Inference Control File

    When LASTZ is asked to infer substitution scores and/or gap penalties from the -input sequences (e.g. via the ‑‑infer +input sequences (e.g. via the --infer option), this file is used to set parameters that control the inference process. @@ -3777,8 +3777,8 @@

    Inference Control File

    hsp_threshold and gapped_threshold correspond to -the command line ‑‑hspthresh and -‑‑gappedthresh options. +the command line --hspthresh and +--gappedthresh options. The defaults are hsp_threshold=3000 and gapped_threshold=hsp_threshold. @@ -3793,14 +3793,14 @@

    Inference Control File

    gap_open_penalty and gap_extend_penalty correspond to the command line -‑‑gap=[<open>,]<extend> +--gap=[<open>,]<extend> option. These are used for the first iteration of gap-scoring inference. The defaults are gap_open_penalty=3.25*worst_substitution and gap_extend_penalty=0.24375*worst_substitution.

    step corresponds to the command line -‑‑step option. A large step, e.g. +--step option. A large step, e.g. step=100, could potentially speed up the inference process. Ideally, this would base the inference on a sample of only one percent of the whole. However, the sample actually ends up larger than that and is biased @@ -3812,7 +3812,7 @@

    Inference Control File

    entropy corresponds to the command line -‑‑entropy option. Legal values are +--entropy option. Legal values are on or off. If on, sequence entropy is incorporated when filtering HSPs. The default is entropy=on. @@ -3887,7 +3887,7 @@

    Segment File

    This list is either produced internally by LASTZ as a result of the gap-free extension stage (see Overview), or read from a user-supplied file via the -‑‑segments option. The latter +--segments option. The latter causes LASTZ to skip the indexing, seeding, and gap-free extension stages and begin with the chaining stage (or the next specified stage, if chaining is not requested). @@ -3965,9 +3965,9 @@

    LAV (alignment output)

    (same specification at PSU)

    -The option ‑‑format=lav+text adds +The option --format=lav+text adds textual output for each alignment block (in the same -format as the ‑‑format=text option), intermixed with the LAV +format as the --format=text option), intermixed with the LAV format. Such files are unlikely to be recognized by any LAV-reading program. @@ -3980,7 +3980,7 @@

    AXT (alignment output)

    UCSC AXT specification

    -The option ‑‑format=axt+ reports +The option --format=axt+ reports additional statistics with each block, in the form of comments. The exact content of these comment lines may change in future releases of LASTZ. @@ -3996,11 +3996,11 @@

    MAF (alignment output)

    UCSC MAF specification

    -The option ‑‑format=maf+ reports +The option --format=maf+ reports additional statistics with each block, in the form of comments. The exact content of these comment lines may change in future releases of LASTZ.

    -The option ‑‑format=maf- suppresses +The option --format=maf- suppresses the MAF header and any comments. This makes it suitable for concatenating output from multiple runs.

    @@ -4023,14 +4023,14 @@

    SAM (alignment output)

    For SAM files, LASTZ assumes that the target sequence is the reference and that query sequence(s) are short reads. For alignments that don't reach the -end of a query, ‑‑format=sam uses -"hard clipping", while ‑‑format=softsam +end of a query, --format=sam uses +"hard clipping", while --format=softsam uses "soft clipping". See the section on "clipped alignment" in the SAM specification for an explanation of what this means.

    -The options ‑‑format=sam- and -‑‑format=softsam- suppress the SAM +The options --format=sam- and +--format=softsam- suppress the SAM header lines. This makes them suitable for concatenating output from multiple runs. @@ -4048,10 +4048,10 @@

    CIGAR (alignment output)

    and as an extended cigar string in SAMtools. For -‑‑format=cigar, LASTZ implements +--format=cigar, LASTZ implements Exonerate CIGAR. LASTZ implements other CIGAR variants for -‑‑format=sam -and as fields for ‑‑format=general. +--format=sam +and as fields for --format=general.

    Exonerate CIGAR @@ -4076,9 +4076,9 @@

    CIGAR (alignment output)

    H runs to describe clipping operations for short sequences. LASTZ implements combinations of these variants where appropriate; details are described in -‑‑format=general:cigar, -‑‑format=general:cigarx -and ‑‑format=sam. +--format=general:cigar, +--format=general:cigarx +and --format=sam.

    @@ -4096,27 +4096,27 @@

    CIGAR (alignment output)

    -For ‑‑format=cigar, the alignment would be described by this line: +For --format=cigar, the alignment would be described by this line:

         cigar: query 3 56 + target <start> <end> <strand> <score> M 24 I 3 M 7 D 2 M 19
     

    -For ‑‑format=general:cigar, the +For --format=general:cigar, the alignment path would be described by this field:

         24M3I7M2D19M
     

    -For ‑‑format=general:cigarx, the +For --format=general:cigarx, the alignment path would be described by this field:

         16=X7=3I7=2DX18=
     

    -For ‑‑format=sam, the alignment path would +For --format=sam, the alignment path would be described by this field:

         3H24M3I7M2D19M5H
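
The four CIGAR variants above describe the same alignment path, so they can be checked against each other mechanically. The following is a minimal Python sketch, not part of LASTZ: the function names `parse_cigar`, `spans`, and `from_exonerate` are hypothetical. It assumes that a missing run length means 1 (as in `X` in the `cigarx` example), and that `I` consumes query bases while `D` consumes target bases, which is consistent with the query interval 3 to 56 (53 bases) in the sample `cigar:` line.

```python
import re

# An optional run length followed by one operation character.
_CIGAR_OP = re.compile(r"(\d*)([MIDNSHPX=])")


def parse_cigar(text):
    """Split a compact CIGAR string into (count, op) pairs.

    A missing count is read as 1 (assumption based on the cigarx example).
    """
    ops = []
    pos = 0
    for m in _CIGAR_OP.finditer(text):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}: {text!r}")
        pos = m.end()
        count = int(m.group(1)) if m.group(1) else 1
        ops.append((count, m.group(2)))
    if pos != len(text):
        raise ValueError(f"unexpected text at offset {pos}: {text!r}")
    return ops


def spans(ops):
    """Return (query_bases, target_bases) covered by the aligned path.

    Clipping (H, S) is ignored here; only M, =, X, I and D are counted.
    """
    query = sum(n for n, op in ops if op in "M=XI")
    target = sum(n for n, op in ops if op in "M=XD")
    return query, target


def from_exonerate(fields):
    """Convert Exonerate-style 'M 24 I 3 ...' tokens to compact form."""
    tokens = fields.split()
    return "".join(f"{n}{op}" for op, n in zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    for field in ("24M3I7M2D19M", "16=X7=3I7=2DX18=", "3H24M3I7M2D19M5H"):
        print(field, spans(parse_cigar(field)))
    print(from_exonerate("M 24 I 3 M 7 D 2 M 19"))
```

Under these assumptions all three compact strings cover 53 query bases and 52 target bases, and the Exonerate operations collapse to the `general:cigar` field shown above.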
    @@ -4228,7 +4228,7 @@ 

    Differences (alignment output)

    perfect match for that block (i.e., no differences).

    -Sample output for ‑‑format=differences. +Sample output for --format=differences.

          (1)     (2)      (3)  (4)   (5)         (6)       (7) (8) (9) (10) (11)(12)  (13)     (14)
         chr22 14485783 14485784 + 49691432  EAYGRGI02GQ0SL 167 167  +  303   A   -   TGAGA... TGAGA...
    @@ -4244,7 +4244,7 @@ 

    Differences (alignment output)

    Sample output for -‑‑format=general:name1,zstart1,end1,strand1,size1,name2,zstart2+,end2+,strand2,size2,text1,text2. +--format=general:name1,zstart1,end1,strand1,size1,name2,zstart2+,end2+,strand2,size2,text1,text2.

         chr22 14485616 14485920 + 49691432  EAYGRGI02GQ0SL 0   303  +  303   TGAGA... TGAGA...
         chr22 14731668 14731964 + 49691432  EAYGRGI01EAV19 0   297  -  298   CTTCT... CTTCT...
    @@ -4320,12 +4320,12 @@ 

    General Output (alignment output)

    The syntax for this option is:

    -    ‑‑format=general[:<fields>]
    +    --format=general[:<fields>]
     
    where <fields> is a comma-separated list of field names in any desired order, with no spaces. For example
    -    ‑‑format=general:nmismatch,name1,strand1,start1,end1,name2,strand2,start2,end2
    +    --format=general:nmismatch,name1,strand1,start1,end1,name2,strand2,start2,end2
     
    will report each aligned interval pair and the number of mismatches in the alignment of that pair, like this: @@ -4356,7 +4356,7 @@

    General Output (alignment output)

    coverage

    -The option ‑‑format=mapping is a shortcut for ‑‑format=general +The option --format=mapping is a shortcut for --format=general with the following fields:  name1, zstart1, end1, name2, strand2, zstart2+, @@ -4366,8 +4366,8 @@

    General Output (alignment output)

    Field names are normally included as column headers in the first row of the output, preceded by a #. The options -‑‑format=general-[:<fields>] -and ‑‑format=mapping- suppress column headers. This makes +--format=general-[:<fields>] +and --format=mapping- suppress column headers. This makes them suitable for concatenating output from multiple runs.
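
Because the header row names the columns, downstream scripts can read the general format without hard-coding field positions. The sketch below is illustrative only: the function name `read_general` is hypothetical, and it assumes that columns are tab-separated and that the only lines starting with `#` are header rows. The sample row reuses values from the example output shown earlier.

```python
import io


def read_general(stream):
    """Yield one dict per alignment row, keyed by the header's field names.

    Assumes tab-separated columns and a header row that starts with '#'.
    A header may appear more than once if several outputs were concatenated.
    """
    header = None
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        cells = line.split("\t")
        if cells[0].startswith("#"):
            # Strip the leading '#' from the first column name.
            header = [cells[0][1:]] + cells[1:]
            continue
        if header is None:
            raise ValueError("no header row; was the output written without column headers?")
        yield dict(zip(header, cells))


if __name__ == "__main__":
    sample = (
        "#name1\tzstart1\tend1\tname2\n"
        "chr22\t14485616\t14485920\tEAYGRGI02GQ0SL\n"
    )
    for row in read_general(io.StringIO(sample)):
        print(row["name1"], int(row["end1"]) - int(row["zstart1"]))
```

Output written with the header-suppressing variants would need the field list supplied separately, since there is no header row to read it from.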

    @@ -4926,9 +4926,9 @@

    Non-ACGT Characters, Splicing, and Separation

    Xs or Ns are used to mask out regions that should not be aligned. However, it is inappropriate when the sequences contain Ns to represent ambiguous bases. To handle this -case, LASTZ provides the ‑‑ambiguous=n +case, LASTZ provides the --ambiguous=n option, which causes substitutions with N to be scored as zero. -Additionally, the ‑‑ambiguous=iupac option +Additionally, the --ambiguous=iupac option causes the other IUPAC-IUB ambiguity codes (B, D, H, K, M, R, S, V, W, and Y) to be treated the same as an ambiguous N. @@ -5168,25 +5168,25 @@

    General seed patterns:

         --seed=<pattern>
     
    -The default seed is ‑‑seed=1110100110010101111, which is the same +The default seed is --seed=1110100110010101111, which is the same 12-of-19 seed used as the default in BLASTZ.

    Half-weight seed patterns:

    If a seed pattern consists of only 0s and Ts, it is implemented internally as a half-weight seed, which uses much less memory (the same amount as a normal seed pattern half as long). Additionally, -‑‑seed=half<length> can be used as shorthand to specify a +--seed=half<length> can be used as shorthand to specify a space-free half-weight seed (i.e., all Ts).

    Single, double, or no transitions:

    By default, one match position (a 1 in a spaced seed, or any position in an N-mer match) is allowed to be a transition instead of a true -match. ‑‑notransition disables this. Alternatively, -‑‑transition=2 allows any two match positions to be +match. --notransition disables this. Alternatively, +--transition=2 allows any two match positions to be transitions.

    Filtering on transversions and matches:

    -The ‑‑filter option imposes additional requirements on the number +The --filter option imposes additional requirements on the number of transversions and matches in a valid seed. This is especially useful in conjunction with half-weight patterns. For example,
    @@ -5213,7 +5213,7 @@ 

    Twin hit seeds:

    <minsep> but not more than <maxsep>. If <minsep> is omitted, zero is used (which means the twin seeds may be adjacent but not overlap). Negative values can -be used; for example ‑‑twins=‑5..10 +be used; for example --twins=-5..10 means the twins can overlap by as much as 5 bases or can have as much as 10 bases between them. @@ -5231,14 +5231,14 @@

    Any-or-None Alignment

    want to know is whether it aligned or not.

    -The ‑‑anyornone option is designed +The --anyornone option is designed for such cases, and can significantly improve alignment speed. Once any qualifying alignment has been found, processing for the current query is halted. The alignment is reported to the output, and then we immediately begin processing the next query. A qualifying alignment is one that would normally be output given the other parameter settings; for example it satisfies the -scoring thresholds (‑‑hspthresh -and/or ‑‑gappedthresh) and any +scoring thresholds (--hspthresh +and/or --gappedthresh) and any back-end filters.

    @@ -5277,10 +5277,10 @@

    Y-drop Mismatch Shadow

    Consider the following alignment of a 50-base query to a chromosome target, and -suppose we are using ‑‑match=1,5, -‑‑gap=6,1, -‑‑identity=97, and -‑‑coverage=95. The entire +suppose we are using --match=1,5, +--gap=6,1, +--identity=97, and +--coverage=95. The entire alignment as shown has 97.9% identity (46/47) and 100% coverage. However, the first five bases (AGAAC vs. AGAAG) have a negative score: four matches at +1 each and one mismatch at −5 gives a score @@ -5291,7 +5291,7 @@

    Y-drop Mismatch Shadow

    we don't want to, and we will see a bias against mismatches near the ends of reads. (Note that this anomaly arises because the alignment is terminated abruptly by the end of the sequence rather than normally by a low-scoring -region; also the ‑‑coverage option is more commonly used with +region; also the --coverage option is more commonly used with short reads than with longer sequences.)
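
The arithmetic behind the negative-scoring prefix can be reproduced in a few lines. This is a rough Python sketch under stated assumptions, not LASTZ's scoring code: the function name `ungapped_score` is hypothetical, it handles only gap-free segments, and it treats upper and lower case as equal. The reward and penalty correspond to the match=1,5 setting in the example.

```python
def ungapped_score(a, b, match=1, mismatch=5):
    """Score two equal-length, gap-free segments with unit scores.

    Each matching column adds `match`; each mismatching column subtracts
    `mismatch`. Case is ignored (an assumption made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(
        match if x.upper() == y.upper() else -mismatch
        for x, y in zip(a, b)
    )


if __name__ == "__main__":
    # The first five bases from the example: four matches and one mismatch.
    print(ungapped_score("AGAAC", "AGAAG"))  # 4 * 1 - 5 = -1
```

Since the prefix scores below zero, the peak score lies to the right of it, which is why trimming back to the peak removes those bases.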

    @@ -5305,13 +5305,13 @@

    Y-drop Mismatch Shadow

    To avoid this behavior, use the -‑‑noytrim option when aligning short +--noytrim option when aligning short reads. This causes LASTZ to refrain from trimming such alignments back to the highest-scoring location. Specifically, if the gapped extension process encounters the end of the sequence, it will keep that as the end of the alignment. In this case a negatively-scoring prefix or suffix will be kept as long as it does not score -worse than the ‑‑ydrop value. +worse than the --ydrop value.

    @@ -5382,10 +5382,10 @@

    Using Target Capsule Files

    lastz <target> --writecapsule=<capsule_file> [<seeding_options>]
    Applicable seeding options are -‑‑seed, -‑‑step, -‑‑maxwordcount, -and ‑‑word. +--seed, +--step, +--maxwordcount, +and --word.

    To use the capsule file, run LASTZ like this: @@ -5395,12 +5395,12 @@

    Using Target Capsule Files

    No additional effort on the part of the user is required to handle sharing of the capsule data between separate runs. Nearly all options are allowed; however the seeding options -‑‑seed, -‑‑step, -‑‑maxwordcount, -and ‑‑word +--seed, +--step, +--maxwordcount, +and --word are not allowed, since these (or their byproducts) are already stored in the -capsule file. Further, ‑‑masking +capsule file. Further, --masking is not allowed, because it would require modifying both the target sequence and the target seed word position table, which are contained in the capsule. @@ -5411,7 +5411,7 @@

    Using Target Capsule Files

    the same file; each instance will have its own virtual addresses for the capsule data, but the physical memory is shared. There is no requirement for more than one instance to actually use the capsule simultaneously. Running -a single copy of lastz with ‑‑targetcapsule will work +a single copy of lastz with --targetcapsule will work fine, and in fact there may be a small speed improvement compared to running the same alignment without a capsule. @@ -5449,14 +5449,14 @@

    Inferring Score Sets

    To have LASTZ infer scoring parameters, use a suitably enabled build of LASTZ (see below), and specify -the ‑‑infer or -‑‑inferonly options. (The latter +the --infer or +--inferonly options. (The latter will stop after inferring the parameters, without performing the final alignment.) Settings for the inference process can be specified in a control file included with these options.

    -The ‑‑infscores option causes the +The --infscores option causes the inferred scoring parameters to be written out to a separate file. If no <output_file> is specified, it is written to the header of the alignment output file, as a comment. As a last resort, if no alignment @@ -5530,17 +5530,17 @@

    Filtering With Shell Commands

    Though LASTZ provides several filtering options (e.g. -‑‑identity, -‑‑continuity, -‑‑coverage, -‑‑filter=nmatch, -‑‑filter=nmismatch, -‑‑filter=ngap and -‑‑filter=cgap), +--identity, +--continuity, +--coverage, +--filter=nmatch, +--filter=nmismatch, +--filter=ngap and +--filter=cgap), sometimes these are not sufficient for the task at hand. But in many cases it is still possible to perform the desired filtering by using the -‑‑format=general option in conjunction +--format=general option in conjunction with a simple awk, perl, or @@ -5565,7 +5565,7 @@

    Filtering With Shell Commands

    - + @@ -5643,25 +5643,25 @@

    Self-Masking a Sequence

    target sequence, and overlapping 200bp fragments of the critter as the queries.

    -The ‑‑masking=3 option enables +The --masking=3 option enables dynamic masking, which will mark any reference base appearing in 3 or more alignments. Since the fragments overlap by a factor of two, we expect every base will appear in two trivial alignments. Any more than that would be caused by a duplication elsewhere.

    -The ‑‑progress+masking +The --progress+masking option causes lastz to give you a progress report after every 10 thousand fragments. These reports come to the console (stderr) and look like this:

     	(16.933s) processing query 50,001: critter_21299501, masked 8,920,893/51,304,566 (17.4%)
     

    -The ‑‑format=none option inhibits the +The --format=none option inhibits the normal alignment output and -‑‑format=outputmasking+:soft +--format=outputmasking+:soft tells lastz to write the final masked intervals to a file.

    -The final line (‑‑notransition +The final line (--notransition in this example) is whatever alignment scoring parameters you want to use. What is appropriate will depend on the level of divergence you want to allow in the masked duplications. @@ -5715,7 +5715,7 @@

    Differences from BLASTZ

    The handling of bounding alignments in the DP matrix is different in LASTZ than in BLASTZ. This is discussed in Bounding Alignments in the DP Matrix. The -‑‑allgappedbounds option can be +--allgappedbounds option can be used to revert to the bounding criteria used in BLASTZ.

    @@ -5726,7 +5726,7 @@

    Differences from BLASTZ

    Y) in fasta sequences but was unclear about how these were scored. Since we feel the user should be aware of how these bases are treated, LASTZ rejects them by default. The -‑‑ambiguous=iupac option permits them +--ambiguous=iupac option permits them but treats them the same as an ambiguous N. This is discussed in Non-ACGT Characters. @@ -5767,7 +5767,7 @@

    Bounding Alignments in the DP Matrix

    The correction for this is to only use alignments as bounds if they satisfy the score threshold. This corrected behavior is now the default in LASTZ (as of release 1.02.00). The -‑‑allgappedbounds option can be +--allgappedbounds option can be used to revert to the bounding criteria used in BLASTZ. @@ -5794,7 +5794,7 @@

    Change History

    @@ -5807,54 +5807,54 @@

    Change History

    @@ -5884,7 +5884,7 @@

    Change History

    @@ -5899,15 +5899,15 @@

    Change History

    @@ -5934,7 +5934,7 @@

    Change History

    @@ -5952,7 +5952,7 @@

    Change History

    @@ -5984,13 +5984,13 @@

    Change History

    @@ -6073,28 +6073,28 @@

    Change History

    @@ -6342,19 +6342,19 @@

    Change History

    @@ -6381,7 +6381,7 @@

    Change History

    @@ -6485,7 +6485,7 @@

    Change History

    @@ -6493,7 +6493,7 @@

    Change History

    @@ -6545,15 +6545,15 @@

    Change History

    @@ -6571,7 +6571,7 @@

    Change History

    was used to compute coverage. The denominator used was the length of the subrange instead of the entire sequence. This adversely affected both the -‑‑coverage filter and the +--coverage filter and the coverage output field. This has been corrected to use the length of the entire sequence. @@ -6590,7 +6590,7 @@

    Change History

    AXT field field for ‑‑format=general
    AXT field field for --format=general
    Alignment number (none)
    Chromosome (primary organism) name1
    Alignment start (primary organism) start1
    1.0.5Aug/2/2008 Fixed a bug that in some cases caused a bus error when interpolated -alignments (e.g. ‑‑inner=…) were used with multiple +alignments (e.g. --inner=…) were used with multiple queries.
    1.0.21Sep/9/2008 -Fixed a bug involving the default value for ‑‑gappedthresh -(a.k.a. L) when ‑‑exact is used. The bug caused the +Fixed a bug involving the default value for --gappedthresh +(a.k.a. L) when --exact is used. The bug caused the gapped threshold to be inordinately low, allowing undesirable alignment blocks to make it to the output file.
    Fixed a bug whereby Xs and Ns were treated as desirable substitutions when -unit scores (e.g. ‑‑match=…) were used. +unit scores (e.g. --match=…) were used.
    -Re-implemented ‑‑twins=…. The previous implementation +Re-implemented --twins=…. The previous implementation improperly truncated the left-extension of HSPs. The new implementation is slower and uses more memory.
    -Added ‑‑census=<file>. The census counts the number of +Added --census=<file>. The census counts the number of times each base in the target sequence is part of an alignment block. -Previously, ‑‑census produced a census only if the output format +Previously, --census produced a census only if the output format was LAV (the census is a special stanza in a LAV file). Otherwise the option was ignored. Now, if a file is specified a census is written to that file. The format of lines in the census is <name> <position> <count>. The position is one-based, and the count is limited to 255.

    -In situations where 255 is too limiting, ‑‑census16=<file> -or ‑‑census32=<file> can be used, with limits of about +In situations where 255 is too limiting, --census16=<file> +or --census32=<file> can be used, with limits of about 65 thousand and 4 billion, respectively. Note that these will respectively double and quadruple the amount of memory used for the census. The default census uses one byte per target sequence location.

    -Added ‑‑format=<differences>, to support Galaxy. All +Added --format=<differences>, to support Galaxy. All differences (gaps and runs of mismatches) are reported, one per line.
    -Added ‑‑anchors=<file> (eventually this was renamed to -‑‑segments=<file>), giving the user the ability to bypass +Added --anchors=<file> (eventually this was renamed to +--segments=<file>), giving the user the ability to bypass the seeding and gap-free extension stages.
    Changed default gap penalties for unit scores (e.g. -‑‑match=…) to be relative to mismatch score (instead of +--match=…) to be relative to mismatch score (instead of match score).
    -Changed defaults for xdrop and ydrop when ‑‑match scoring is +Changed defaults for xdrop and ydrop when --match scoring is used.
    -Added ‑‑maxwordcount. +Added --maxwordcount.
    -Added ‑‑notrivial. +Added --notrivial.
    -Corrected problem with ‑‑subset action, which wasn't using +Corrected problem with --subset action, which wasn't using mangled sequence names.
    -Added ‑‑format=rdotplot option. +Added --format=rdotplot option.
    -Added support for ‑‑format=cigar. +Added support for --format=cigar.
    @@ -5961,8 +5961,8 @@

    Change History

    -Corrected the behavior of ‑‑exact regarding lowercase and -non-ACGT characters. ‑‑exact now considers, e.g., a lowercase A +Corrected the behavior of --exact regarding lowercase and +non-ACGT characters. --exact now considers, e.g., a lowercase A to be a match for an uppercase A. Further, any non-ACGT characters now stop the match.
    -Added the ‑‑output option. In some batch systems, it is +Added the --output option. In some batch systems, it is difficult to redirect stdout into a file, so this option allows the user to do it directly.
    -Removed ‑‑quantum and ‑‑code options, replacing +Removed --quantum and --code options, replacing them with the quantum and quantum=<code_file> sequence specifier actions. This is in preparation for allowing a quantum target sequence. @@ -6005,12 +6005,12 @@

    Change History

    extension was able to skip the boundary between sequences (this problem was introduced in 1.1.25). Second, when the exact match should have extended to the end of the sequence, it was being cut short by 1 bp (on either end). The -latter problem was only evident for ‑‑nogapped; a gapped extension +latter problem was only evident for --nogapped; a gapped extension recovered the additional bases.
    -Fixed several problems with ‑‑segment=<file>. First, if +Fixed several problems with --segment=<file>. First, if the file contained more than 4,000 segments, on some platforms the program would segfault. Second, if a sequence subrange was being used, the limit test comparing the segment interval to the subrange was incorrect. Third (if the @@ -6019,28 +6019,28 @@

    Change History

    -Added ‑‑noytrim to prevent y-drop mismatch shadow, improving +Added --noytrim to prevent y-drop mismatch shadow, improving LASTZ’s ability to align short reads.
    Set the default gapped extension score threshold to inherit the lowest HSP score in the -case where ‑‑hspthresh=top<basecount> or -‑‑hspthresh=top<percentage>% is used but -‑‑gappedthresh=<score> is not (and gapped extension is +case where --hspthresh=top<basecount> or +--hspthresh=top<percentage>% is used but +--gappedthresh=<score> is not (and gapped extension is performed). Previously this case was trapped by a low level routine and the alignment was halted.
    Fixed a problem with the start2+ field of -‑‑format=general. The position was left blank for alignments on +--format=general. The position was left blank for alignments on the + strand.
    -Fixed a problem in which ‑‑writecapsule was rejected if -‑‑seed=match<length> was used. +Fixed a problem in which --writecapsule was rejected if +--seed=match<length> was used.
    @@ -6061,7 +6061,7 @@

    Change History

    -Changed how ‑‑format=cigar reports alignments on the negative +Changed how --format=cigar reports alignments on the negative strand. Apparently there is no complete spec for CIGAR format. Matching what I see output by exonerate for certain cases is the best I can do.
    -Added cigar field for ‑‑format=general. +Added cigar field for --format=general.
    -Added shingle field for ‑‑format=general. +Added shingle field for --format=general.
    -Added the ‑‑rdotplot=<file> option. +Added the --rdotplot=<file> option.
    -The ‑‑notrivial option now works with the multiple +The --notrivial option now works with the multiple sequence specifier action.
    -Added ‑‑markend. +Added --markend.
    -Added ‑‑nameparse=darkspace. +Added --nameparse=darkspace.
    @@ -6112,13 +6112,13 @@

    Change History

    -Fixed a problem with the combination of ‑‑recoverseeds and ‑‑exact. +Fixed a problem with the combination of --recoverseeds and --exact. Recovered seeds were cut short by one base on the left end.
    -Added ‑‑format=segments option. This was later replaced by -‑‑writesegments. +Added --format=segments option. This was later replaced by +--writesegments.
    @@ -6138,21 +6138,21 @@

    Change History

    1.02.00Jan/12/2010 Relaxed the rejection of some output formats, which was too aggressive. -Specifically, runs with ‑‑tableonly were rejected because of +Specifically, runs with --tableonly were rejected because of output format, even though no output would be generated in that format.
    -Added the ability to set the ‑‑maxwordcount option as a -percentage. Also, ‑‑maxwordcount=<limit> now allows +Added the ability to set the --maxwordcount option as a +percentage. Also, --maxwordcount=<limit> now allows <limit> to be 1. Previously it was not allowed to be less than 2.
    The scoring matrix used during x-drop extension now reflects the use -of ‑‑ambiguous=n. Previously, this matrix was not affected by -‑‑ambiguous=n, +of --ambiguous=n. Previously, this matrix was not affected by +--ambiguous=n, and N-vs-N matches and N-vs-other matches were scored as -100 (more specifically, as fill_score) during gap-free extension. This caused LASTZ to miss some HSPs, usually those containing an N-vs-N match, since @@ -6167,28 +6167,28 @@

    Change History

    -Added ‑‑softmask=<mask_file> file action to permit +Added --softmask=<mask_file> file action to permit soft masking of specified intervals. Also added masking of the interval complements — -‑‑xmask=keep:<mask_file>, -‑‑nmask=keep:<mask_file>, and -‑‑softmask=keep:<mask_file>. These make it easier to +--xmask=keep:<mask_file>, +--nmask=keep:<mask_file>, and +--softmask=keep:<mask_file>. These make it easier to restrict alignment to several specified intervals of a sequence.
    -Enabled the use of ‑‑filter=[<transv>,]<matches> -for non-halfweight seeds. Previously, ‑‑filter had only been +Enabled the use of --filter=[<transv>,]<matches> +for non-halfweight seeds. Previously, --filter had only been tested for half-weight seeds, but was erroneously prohibited for all seeds (instead of just prohibiting non-halfweight seeds). Further, it -was not properly implemented for seed-only output (‑‑nogfextend -‑‑nogapped). These have all been corrected, and ‑‑filter +was not properly implemented for seed-only output (--nogfextend +--nogapped). These have all been corrected, and --filter is now available for all seed types.

    -Also corrected the behavior of ‑‑filter regarding lowercase and -non-ACGT characters. ‑‑filter now considers, e.g., a lowercase +Also corrected the behavior of --filter regarding lowercase and +non-ACGT characters. --filter now considers, e.g., a lowercase a to be a match for an uppercase A. Further, for the -purposes of ‑‑filter, any non-ACGT characters are considered to be +purposes of --filter, any non-ACGT characters are considered to be transversions.

    Also changed the behavior when the <transv> field is absent. @@ -6204,12 +6204,12 @@

    Change History

    Currently this only affects the handling of file paths. To activate it, the user must add -DcompileForWindows to the definition of definedForAll in -.../lastz‑distrib‑X.XX.XX/src/Makefile. +.../lastz-distrib-X.XX.XX/src/Makefile.
    -Fixed chaining of seed hits. Previously, if ‑‑nogfextend and -‑‑chain were used together, nothing was output. This was due to +Fixed chaining of seed hits. Previously, if --nogfextend and +--chain were used together, nothing was output. This was due to the fact that unextended seeds had no scores, and the chaining algorithm only reports chains with positive score. This has been corrected by calculating scores (as the sum of substitution scores) over anchor segments whenever (a) @@ -6217,18 +6217,18 @@

    Change History

    for later processing.

    This change may also affect (for the better) the results of gapped extension -when either ‑‑nogfextend or ‑‑exact is used. Gapped +when either --nogfextend or --exact is used. Gapped extension processes the anchors highest score first. Since -‑‑nogfextend left all scores zero, the actual order in which gapped +--nogfextend left all scores zero, the actual order in which gapped extension was performed in that case was dependent on how the sort routine (the -C runtime routine qsort) deals with ties. For ‑‑exact, the score +C runtime routine qsort) deals with ties. For --exact, the score was the length of the match. This has been changed to the segment’s substitution score.

    -Changed ‑‑format=segments to -‑‑writesegments=<file>. +Changed --format=segments to +--writesegments=<file>.
    @@ -6267,11 +6267,11 @@

    Change History

    -Added the ‑‑anyornone option. +Added the --anyornone option.
    -Added ‑‑allgappedbounds. +Added --allgappedbounds.
    @@ -6298,25 +6298,25 @@

    Change History

    -Added ‑‑progress=[<N>]. +Added --progress=[<N>]. This existed as an unadvertized option in earlier versions of the program, as -‑‑debug=queryprogress=<N>. It has now been promoted to a +--debug=queryprogress=<N>. It has now been promoted to a first class option.
    -Added ‑‑ambiguous=iupac and changed ‑‑ambiguousn to -‑‑ambiguous=n. the former is still supported, but not advertized. +Added --ambiguous=iupac and changed --ambiguousn to +--ambiguous=n. the former is still supported, but not advertized.
    -Column headers for ‑‑format=general now match the command-line +Column headers for --format=general now match the command-line keywords. Previously, all related keywords shared the same column header. For example, keywords start2, zstart2, start2+ and zstart2+ all produced the same column header, start2, in the output file.

    -Also added ‑‑format=general-. +Also added --format=general-.

    @@ -6329,12 +6329,12 @@

    Change History

    Added nmatch, nmismatch, ngap, cgap and cigarx fields for -‑‑format=general. +--format=general.
    -Added ‑‑format=mapping, a shortcut for typical fields for -‑‑format=general for mapping reads. +Added --format=mapping, a shortcut for typical fields for +--format=general for mapping reads.
    1.02.11 Aug/21/2010 -Fixed the cigarx field for ‑‑format=general, so +Fixed the cigarx field for --format=general, so that a run length of 1 is omitted for indels.
    -Fixed the behavior of ‑‑recoverseeds, which was failing to +Fixed the behavior of --recoverseeds, which was failing to recover many HSPs when seed denisty was high. This was due to left extension being blocked by other seeds on that same hash-equivalent diagonal. Left -extension is now unblocked when ‑‑recoverseeds is enabled. +extension is now unblocked when --recoverseeds is enabled.
    -Changed/corrected how the ‑‑segment option handles wildcard names +Changed/corrected how the --segment option handles wildcard names when the multiple action in used. To support this, the rewind command was added to the segments file format.
    -Fixed the implementation of ‑‑self with regard to mirror-image +Fixed the implementation of --self with regard to mirror-image pairs. Previously, alignments were internally restricted to be above the main diagonal in the ungapped stage only. The mirrored twins were created prior to the gapped stage, and the gapped stage operated on the full set of anchors. @@ -6397,7 +6397,7 @@

    Change History

    1.02.16 Nov/2/2010 -Fixed a problem with ‑‑self, introduced in 1.02.11. The problem +Fixed a problem with --self, introduced in 1.02.11. The problem manifested itself on 64-bit CPUs, with an error message indicating it was attempting to allocate 17 billion bytes for edit_script_copy. This has been corrected. @@ -6420,14 +6420,14 @@

    Change History

    -Added ‑‑format=blastn. +Added --format=blastn.
    Added idfrac, id%, blastid%, covfrac, cov%, confrac, con%, ncolumn, and npair fields for -‑‑format=general. +--format=general.
    @@ -6447,25 +6447,25 @@

    Change History

    is any useful reason to set gap extension to zero.
    -Added ‑‑format=rdotplot+score and -‑‑rdotplot+score=<file>. +Added --format=rdotplot+score and +--rdotplot+score=<file>.
    -Improved ‑‑masking=<count> so that it can allow a count +Improved --masking=<count> so that it can allow a count threshold greater than 254.
    -Fixed a problem with ‑‑scores=<scoring_file>. When the +Fixed a problem with --scores=<scoring_file>. When the <scoring_file> defined score values for N, those scores were not honored during the ungapped seed extension stage.
    -Fixed problems with ‑‑ambiguous=n and -‑‑ambiguous=iupac. These were +Fixed problems with --ambiguous=n and +--ambiguous=iupac. These were incorrectly penalizing substitutions between non-ambiguous nucleotides (A, C, G, or T) and ambiguous ones (N, B, D, H, K, M, R, S, V, W, or Y). This has been corrected to honor the original @@ -6477,7 +6477,7 @@

    Change History

    -Added ‑‑queryhsplimit=<n>. +Added --queryhsplimit=<n>.
    1.02.27 Jan/31/2011 -Added ‑‑outputmasking=<file>. +Added --outputmasking=<file>.
    1.02.37 Mar/31/2011 -Added ‑‑outputmasking:soft=<file>. +Added --outputmasking:soft=<file>.
    @@ -6511,7 +6511,7 @@

    Change History

    Changed the behavior of -‑‑queryhsplimit=<n> to +--queryhsplimit=<n> to better match user expectations. Previously the limit was applied separately for each strand of the query. Moreover, HSPs discovered before the limit was reached were still passed downstream for further processing. @@ -6523,20 +6523,20 @@

    Change History

    Fixed a bug involving the ngap and cgap fields for -‑‑format=general. These fields were only reported correctly if +--format=general. These fields were only reported correctly if the continuity or ncolumn fields were also requested. Otherwise, the value reported represented the contents of unitialized memory.
    Added filtering options -‑‑filter=nmismatch:0..<max>, -‑‑filter=ngap:0..<max>, -and ‑‑filter=cgap:0..<max>. +--filter=nmismatch:0..<max>, +--filter=ngap:0..<max>, +and --filter=cgap:0..<max>.

    Also changed the option name for match count filtering to -‑‑filter=nmatch:<min>. -The older option, ‑‑matchcount=<min> is of course still +--filter=nmatch:<min>. +The older option, --matchcount=<min> is of course still recognized.

    1.02.40 Apr/7/2011 -Added ‑‑outputmasking+=<file> -and ‑‑outputmasking+:soft=<file>. +Added --outputmasking+=<file> +and --outputmasking+:soft=<file>.
    Added -‑‑progress+masking=[<N>]. +--progress+masking=[<N>]. This existed as an unadvertized option in earlier versions of the program, as -‑‑debug=queryprogress+masking=<N>. It has now been promoted +--debug=queryprogress+masking=<N>. It has now been promoted to a first class option.
    -Added ‑‑format=general fields +Added --format=general fields nucs1, nucs2 (the entire target or query nucleotides sequence), @@ -6600,7 +6600,7 @@

    Change History

    -Fixed a minor problem with the ‑‑format=general fields +Fixed a minor problem with the --format=general fields cov% and con%. Those fields were being written with an extra tab character preceeding them. This had a detrimental affect on downstream parsers that required tabs as separators (parsers that interpreted @@ -6608,26 +6608,26 @@

    Change History

    -Added ‑‑readgroup=<tags>, -allowing the specification of tags for SAM's ‑RG header line. +Added --readgroup=<tags>, +allowing the specification of tags for SAM's -RG header line.
    Added -‑‑allocate:target=<bytes> +--allocate:target=<bytes> and -‑‑allocate:query=<bytes>. +--allocate:query=<bytes>. These allow the user to predict the amount of memory needed to store target or query sequence data, which in some instances can resolve memory overuse (it saves LASTZ from incrementally predicting the amount of memory needed).

    For consistency, -‑‑allocate:traceback=<bytes> -is now renamed (from ‑‑traceback=<bytes>). +--allocate:traceback=<bytes> +is now renamed (from --traceback=<bytes>).

    -Added ‑‑include=<file>, +Added --include=<file>, allowing command-line arguemnts to be read from a text file.
    @@ -6646,8 +6646,8 @@

    Change History

    1.03.02 Jul/19/2011 -Fixed a bug in ‑‑format=axt and -‑‑format=axt+, which caused every +Fixed a bug in --format=axt and +--format=axt+, which caused every alignment to be reported twice. The bug had been introduced in version 1.02.28 (not present in 1.02.27, present in 1.02.37).