# Getting Started with Prioritizr

## Package overview

The _prioritizr R_ package contains several main types of functions. These functions are used to:

* create a new conservation planning [problem](https://prioritizr.net/reference/problem.html) by specifying the planning units, the features of conservation interest (e.g. species, ecosystems), and the management zones.
* add an [objective](https://prioritizr.net/reference/objectives.html) to a conservation planning problem.
* add [targets](https://prioritizr.net/reference/targets.html) to a problem to specify how much of each feature is desired or required to be conserved in the solutions.
* add [constraints](https://prioritizr.net/reference/constraints.html) to a conservation planning problem to ensure that solutions exhibit specific properties (e.g. select specific planning units for protection).
* add [penalties](https://prioritizr.net/reference/penalties.html) to a problem to penalize solutions according to a specific metric (e.g. connectivity).
* add [decisions](https://prioritizr.net/reference/decisions.html) to a problem to specify the nature of the decisions in the problem.
* add methods to generate a [portfolio](https://prioritizr.net/reference/portfolios.html) of solutions.
* add a [solver](https://prioritizr.net/reference/solvers.html) to a conservation problem to specify which software should be used to generate solutions and customize the optimization process.
* [solve](https://prioritizr.net/reference/solve.html) a conservation problem.
* evaluate a solution by computing [summary](https://prioritizr.net/reference/summaries.html) statistics.
* evaluate the relative [importance](https://prioritizr.net/reference/importance.html) (irreplaceability) of planning units selected in a solution.

## Package workflow

The general workflow when using the _prioritizr R_ package starts with creating a new conservation planning `problem` object using data. Specifically, the `problem` object should be constructed using data that specify:

* the planning units,
* biodiversity features,
* management zones (if applicable), and
* costs.

After creating a new `problem` object, it can be customized to build a precise representation of the conservation planning problem required by adding:

* objectives, and
* penalties and constraints.

The problem is then solved to obtain a solution.
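
As a rough sketch of this workflow, the steps could be chained as follows. It assumes the built-in `sim_pu_raster` and `sim_features` data sets that are introduced later in this chapter. The target (10 %) and penalty (0.01) values are arbitrary choices for illustration, and the final `solve` call is left commented out because it needs an optimization solver to be installed.

```{r, warning = FALSE}
# load the package and the built-in example data
library(prioritizr)
data(sim_pu_raster, sim_features)

# 1. create the problem from planning unit (cost) and feature data
# 2. add an objective, 3. add targets, 4. add a penalty, 5. add decisions
workflow_problem <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_boundary_penalties(penalty = 0.01) %>%
  add_binary_decisions()

# inspect the assembled problem
print(workflow_problem)

# 6. solve the problem (requires an installed solver, so not run here)
# workflow_solution <- solve(workflow_problem)
```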

### Objectives
All conservation planning problems require an objective. An objective specifies the property that is used to compare different feasible solutions. Simply put, the objective is the property of the solution that should be maximized or minimized during the optimization process. For instance:

* with the minimum set objective (specified using `add_min_set_objective`), we are seeking to minimize the cost of the solution (similar to _Marxan_), and
* with the maximum coverage objective (specified using `add_max_cover_objective`), we are seeking to maximize the number of different features represented in the solution.

Many objectives require `targets` (e.g. the minimum set objective). Targets are a specialized set of constraints that relate to the total quantity of each feature secured in the solution (e.g. amount of suitable habitat or number of individuals). For example:

* with the minimum set objective (`add_min_set_objective`), targets are used to ensure that solutions secure a sufficient quantity of each feature, and
* with the maximum features objective (`add_max_features_objective`), targets are used to assess whether a feature has been adequately conserved by a candidate solution.

Targets can be expressed:

* numerically as the total amount required for a given feature (using `add_absolute_targets`), or
* as a proportion of the total amount found in the planning units (using `add_relative_targets`).

**Note:** Not all objectives require targets, and a warning will be thrown if an attempt is made to add targets to a problem with an objective that does not use them.
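
To make the relationship between relative and absolute targets concrete, here is a small base _R_ sketch. All of the numbers and the feature names (`species_a` and so on) are made up for illustration; nothing here calls the _prioritizr_ package.

```{r, warning = FALSE}
# hypothetical total amount of each feature across all planning units
feature_totals <- c(species_a = 80, species_b = 20, species_c = 50)

# relative targets: the proportion of each feature's total that is required
relative_targets <- c(species_a = 0.1, species_b = 0.5, species_c = 0.3)

# the equivalent absolute targets are the totals scaled by the proportions
absolute_targets <- feature_totals * relative_targets
print(absolute_targets)

# hypothetical amount of each feature held by a candidate solution
amount_held <- c(species_a = 12, species_b = 9, species_c = 18)

# a target is met when the amount held is at least the absolute target
targets_met <- amount_held >= absolute_targets
print(targets_met)
```

In this made-up case the candidate solution falls short for `species_b` only, so it would not be a feasible solution under the minimum set objective.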

### Penalties and Constraints

Constraints and penalties can be added to a conservation planning problem to ensure that solutions exhibit a specific property or to penalize solutions which don't exhibit a specific property (respectively).

* **Constraints** are used to rule out potential solutions that don't exhibit a specific property. For instance, constraints can be used to ensure that specific planning units are selected in the solution for prioritization (using `add_locked_in_constraints`) or not selected in the solution for prioritization (using `add_locked_out_constraints`).
* **Penalties** are combined with the objective of a problem using a penalty factor, so that the overall objective becomes to minimize (or maximize) the primary objective function together with the penalty function. For example, penalties can be added to a problem to penalize solutions that are excessively fragmented (using `add_boundary_penalties`). These penalties have a `penalty` argument that specifies the relative importance of having spatially clustered solutions. When the argument to `penalty` is high, solutions that are less fragmented are valued more highly, even if they cost more. When the argument to `penalty` is low, fragmentation has little influence and cheaper solutions are preferred even if they are more fragmented.
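
The trade-off controlled by the `penalty` argument can be illustrated with a toy calculation in base _R_. This is only a sketch of the idea (cost plus penalty times total boundary length), not the exact formulation used inside the package, and the two candidate solutions and their values are hypothetical.

```{r, warning = FALSE}
# two hypothetical candidate solutions that both meet all targets
candidates <- data.frame(
  name = c("compact", "fragmented"),
  cost = c(120, 100),
  boundary_length = c(40, 90),
  stringsAsFactors = FALSE
)

# combined objective: primary objective (cost) plus the penalty factor
# multiplied by the penalized quantity (total boundary length)
combined_objective <- function(penalty) {
  candidates$cost + penalty * candidates$boundary_length
}

# find which candidate has the lowest combined objective value
# under a low penalty and under a high penalty
preferred_low <- candidates$name[which.min(combined_objective(0.01))]
preferred_high <- candidates$name[which.min(combined_objective(1))]
print(c(low = preferred_low, high = preferred_high))
```

With the low penalty the cheaper but more fragmented candidate has the lower combined value, whereas with the high penalty the more expensive compact candidate is preferred.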

After building a conservation problem, it can then be solved to obtain a solution (or portfolio of solutions if desired). The solution is returned in the same format as the planning unit data used to construct the problem. This means that if raster or shapefile / vector data was used when initializing the problem, then the solution will also be in raster or shapefile / vector data.

## Usage

Here we will provide an introduction to using the _prioritizr R_ package to build and solve a conservation planning problem.

First, we will load the _prioritizr_ package.

```{r, results = "hide", message = FALSE, warning = FALSE}
# load package
library(prioritizr)

# set default options for printing tabular data
options(tibble.width = Inf)
```

### Simulated Data

Now we will load some built-in data sets that are distributed with the _prioritizr R_ package. This package contains several different planning unit data sets. To provide a comprehensive overview of the different ways that we can initialize a conservation planning problem, we will load each of them.

First, we will load the raster planning unit data (`sim_pu_raster`). Here, the planning units are represented as a raster (i.e. a `RasterLayer` object) and each pixel corresponds to the spatial extent of each planning unit. The pixel values correspond to the acquisition costs of each planning unit.

```{r, warning = FALSE}
# load raster planning unit data
data(sim_pu_raster)

# print description of the data
print(sim_pu_raster)

# plot the data
plot(sim_pu_raster)
```

Secondly, we will load one of the spatial vector planning unit data sets (`sim_pu_polygons`). Here, each polygon (i.e. feature using ArcGIS terminology) corresponds to a different planning unit. This data set has an attribute table that contains additional information about each polygon. Namely, the `cost` field (column) in the attribute table contains the acquisition cost for each planning unit.

```{r, warning = FALSE}
# load polygon planning unit data
data(sim_pu_polygons)

# print first six rows of attribute table
head(sim_pu_polygons@data)

# plot the planning units
spplot(sim_pu_polygons, zcol = "cost")
```

Thirdly, we will load some planning unit data stored in tabular format (i.e. `data.frame` format). Each row in the planning unit table must correspond to a different planning unit. The table must also have an "id" column to provide a unique integer identifier for each planning unit, and it must also have a column that indicates the cost of each planning unit.

```{r, warning = FALSE}
# specify file path for planning unit data
pu_path <- system.file("extdata/input/pu.dat", package = "prioritizr")

# load in the tabular planning unit data
# note that we use the data.table::fread function, as opposed to the read.csv
# function, because it is much faster
pu_dat <- data.table::fread(pu_path, data.table = FALSE)

# preview first six rows of the tabular planning unit data
# note that it has some extra columns other than id and cost as per the
# Marxan format
head(pu_dat)
```

Finally, we will load data showing the spatial distribution of the conservation features. Our conservation features (`sim_features`) are represented as a stack of raster objects (i.e. a `RasterStack` object) where each layer corresponds to a different feature (e.g. a multi-band GeoTIFF where each band corresponds to a different feature). The pixel values in each layer correspond to the amount of suitable habitat available in a given planning unit. Note that our planning unit raster layer and our conservation feature stack have exactly the same spatial properties (i.e. resolution, extent, coordinate reference system) so their pixels line up perfectly.

```{r, fig.width = 4, fig.height = 3, warning = FALSE}
# load feature data
data(sim_features)

# plot the distribution of suitable habitat for each feature
plot(sim_features, main = paste("Feature", seq_len(nlayers(sim_features))),
     nr = 2, box = FALSE, axes = FALSE)
```
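
If you want to confirm that two raster data sets line up before building a problem, one option is the `compareRaster` function from the _raster_ package. The sketch below assumes that the _raster_ package is installed and that the data are `Raster*` objects, as they are in this chapter.

```{r, warning = FALSE}
# load the package and the built-in example data
library(prioritizr)
data(sim_pu_raster, sim_features)

# check that extent, dimensions, coordinate reference system, and
# resolution all match; return FALSE instead of throwing an error if not
aligned <- raster::compareRaster(sim_pu_raster, sim_features,
                                 res = TRUE, stopiffalse = FALSE)
print(aligned)
```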

### Initialize a problem

After loading our planning unit and feature data, we will now try initializing some conservation planning problems. There are a lot of different ways to initialize a conservation planning problem, so here we will just showcase a few of the more commonly used methods. For an exhaustive description of all the ways you can initialize a conservation problem, see the help file for the `problem` function (which you can open using the code `?problem`).

First off, we will initialize a conservation planning problem using the raster data.

```{r, warning = FALSE}
# create problem
p1 <- problem(sim_pu_raster, sim_features)

# print problem
print(p1)

# print number of planning units
number_of_planning_units(p1)

# print number of features
number_of_features(p1)
```

Generally, we recommend initializing problems using raster data where possible. This is because the `problem` function needs to calculate the amount of each feature in each planning unit. When both the planning unit and feature data are provided in raster format with the same spatial resolution, extent, and coordinate system, the `problem` function does not need to do any geo-processing behind the scenes.

Sometimes we can't use raster planning unit data because our planning units aren't equal-sized grid cells. So, below is an example showing how we can initialize a conservation planning problem using planning units that are formatted as spatial vector data. Note that if we had pre-computed the amount of each feature in each planning unit and stored the data in the attribute table, we could pass in the names of the columns as an argument to the `problem` function.

```{r, warning = FALSE}
# create problem with spatial vector data
# note that we have to specify which column in the attribute table contains
# the cost data
p2 <- problem(sim_pu_polygons, sim_features, cost_column = "cost")

# print problem
print(p2)
```
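
The pre-computed case mentioned above might look like the following sketch. The `habitat_a` and `habitat_b` columns are hypothetical: they do not exist in `sim_pu_polygons`, so here they are filled with random values purely to show the call.

```{r, warning = FALSE}
# load the package and the polygon planning unit data
library(prioritizr)
data(sim_pu_polygons)

# copy the planning units and add two hypothetical columns that stand in
# for pre-computed feature amounts (random values, for illustration only)
set.seed(500)
pu_precomputed <- sim_pu_polygons
n_pu <- nrow(pu_precomputed@data)
pu_precomputed$habitat_a <- runif(n_pu)
pu_precomputed$habitat_b <- runif(n_pu)

# create the problem by naming the feature columns in the attribute table
p_precomputed <- problem(pu_precomputed,
                         features = c("habitat_a", "habitat_b"),
                         cost_column = "cost")

# print problem
print(p_precomputed)
```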

We can also initialize a conservation planning problem using tabular planning unit data (i.e. a `data.frame`). Since the tabular planning unit data does not contain any spatial information, we also have to provide the feature data in tabular format (i.e. a `data.frame`) and data showing the amount of each feature in each planning unit in tabular format (i.e. a `data.frame`). The feature data must have an "id" column containing a unique integer identifier for each feature, and the planning unit by feature data must contain the following three columns: "pu" corresponding to the planning unit identifiers, "species" corresponding to the feature identifiers, and "amount" showing the amount of a given feature in a given planning unit.

```{r, warning = FALSE}
# set file path for feature data
spec_path <- system.file("extdata/input/spec.dat", package = "prioritizr")

# load in feature data
spec_dat <- data.table::fread(spec_path, data.table = FALSE)

# print first six rows of the data
# note that it contains extra columns
head(spec_dat)

# set file path for planning unit vs. feature data
puvspr_path <- system.file("extdata/input/puvspr.dat", package = "prioritizr")

# load in planning unit vs feature data
puvspr_dat <- data.table::fread(puvspr_path, data.table = FALSE)

# print first six rows of the data
head(puvspr_dat)

# create problem
p3 <- problem(pu_dat, spec_dat, cost_column = "cost", rij = puvspr_dat)

# print problem
print(p3)
```
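
To see what these three tables look like at their simplest, the sketch below builds a tiny hypothetical data set by hand in base _R_ and checks that the identifiers are consistent. The values and feature names are invented. We also include a `name` column in the feature table, which we believe `problem` expects for tabular feature data.

```{r, warning = FALSE}
# three hypothetical planning units with an id and a cost
toy_pu <- data.frame(id = 1:3, cost = c(10, 4, 7))

# two hypothetical features with an id and a name
toy_features <- data.frame(id = 1:2, name = c("frog", "orchid"),
                           stringsAsFactors = FALSE)

# amount of each feature in each planning unit, in long format
# (combinations that are not listed are treated here as zero)
toy_rij <- data.frame(pu = c(1, 2, 2, 3),
                      species = c(1, 1, 2, 2),
                      amount = c(5, 2, 1, 6))

# every identifier in the long table should exist in the other two tables
ids_consistent <- all(toy_rij$pu %in% toy_pu$id) &&
  all(toy_rij$species %in% toy_features$id)
print(ids_consistent)

# total amount of each feature across all planning units
toy_totals <- tapply(toy_rij$amount, toy_rij$species, sum)
print(toy_totals)
```

Tables with this layout could then be passed to `problem` in the same way as `pu_dat`, `spec_dat`, and `puvspr_dat` above.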

For more information on initializing problems, please see the help page for the `problem` function (which you can open using the code `?problem`). Now that we have initialized a conservation planning problem, we will show you how you can customize it to suit the exact needs of your conservation planning scenario.

Although we initialized the conservation planning problems using several different methods, moving forward, we will only use raster-based planning unit data to keep things simple.

### Add an objective

The next step is to add an objective to the problem. A problem objective is used to specify **the primary goal of the problem** (i.e. the quantity that is to be maximized or minimized). All conservation planning problems involve minimizing or maximizing some kind of objective. For instance, we might require a solution that conserves enough habitat for each species while minimizing the overall cost of the reserve network. Alternatively, we might require a solution that maximizes the number of conserved species while ensuring that the cost of the reserve network does not exceed the budget.

Please note that objectives are added in the same way regardless of the type of data used to initialize the problem.

The _prioritizr R_ package supports a variety of different objective functions.

* __Minimum set objective__: Minimize the cost of the solution whilst ensuring that all targets are met [@r11]. This objective is similar to that used in _Marxan_ [@r3]. For example, we can add a minimum set objective to a problem using the following code.
```{r, warning = FALSE}
# create a new problem that has the minimum set objective
p3 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective()

# print the problem
print(p3)
```

* __Maximum cover objective__: Represent at least one instance of as many features as possible within a given budget [@r12].
```{r, warning = FALSE}
# create a new problem that has the maximum coverage objective and a budget
# of 5000
p4 <- problem(sim_pu_raster, sim_features) %>%
  add_max_cover_objective(5000)

# print the problem
print(p4)
```

* __Maximum features objective__: Fulfill as many targets as possible while ensuring that the cost of the solution does not exceed a budget [inspired by @r10]. This objective is similar to the maximum cover objective except that we have the option of later specifying targets for each feature. In practice, this objective is more useful than the maximum cover objective because features often require a certain amount of area for them to persist and simply capturing a single instance of habitat for each feature is generally unlikely to enhance their long-term persistence.
```{r, warning = FALSE}
# create a new problem that has the maximum features objective and a budget
# of 5000
p5 <- problem(sim_pu_raster, sim_features) %>%
  add_max_features_objective(budget = 5000)

# print the problem
print(p5)
```

* __Minimum shortfall objective__: Minimize the shortfall for as many targets as possible while ensuring that the cost of the solution does not exceed a budget. In practice, this objective is useful when there is a large amount of left-over budget when using the maximum features objective and the remaining funds need to be allocated to places that will enhance the representation of features with unmet targets.
```{r, warning = FALSE}
# create a new problem that has the minimum shortfall objective and a budget
# of 5000
p6 <- problem(sim_pu_raster, sim_features) %>%
  add_min_shortfall_objective(budget = 5000)

# print the problem
print(p6)
```

* __Maximum utility objective__: Secure as much of the features as possible without exceeding a budget. This objective is functionally equivalent to selecting the planning units with the greatest amounts of each feature (e.g. species richness). Generally, we don't encourage the use of this objective because it will only rarely identify complementary solutions---solutions which adequately conserve a range of different features---except perhaps to explore trade-offs or provide a baseline solution with which to compare other solutions.
```{r, warning = FALSE}
# create a new problem that has the maximum utility objective and a budget
# of 5000
p9 <- problem(sim_pu_raster, sim_features) %>%
      add_max_utility_objective(budget = 5000)

# print the problem
print(p9)
```
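
To build some intuition for what the minimum set objective asks for, the following base _R_ sketch solves a tiny hypothetical problem by brute force: it lists every possible selection of four planning units, keeps those that meet all targets, and picks the cheapest. This is not how _prioritizr_ works internally (it formulates the problem for an optimization solver), and enumeration like this is only practical for a handful of planning units.

```{r, warning = FALSE}
# hypothetical costs for four planning units
toy_cost <- c(3, 2, 4, 1)

# hypothetical feature amounts (rows = features, columns = planning units)
toy_amount <- rbind(
  feature_1 = c(2, 0, 1, 1),
  feature_2 = c(0, 3, 1, 0)
)

# absolute targets for each feature
toy_target <- c(feature_1 = 2, feature_2 = 2)

# enumerate all 2^4 possible selections (1 = selected, 0 = not selected)
selections <- as.matrix(expand.grid(rep(list(0:1), length(toy_cost))))

# a selection is feasible if the amount held meets every target
feasible <- apply(selections, 1, function(s) {
  all(toy_amount %*% s >= toy_target)
})

# total cost of each selection
total_cost <- as.vector(selections %*% toy_cost)

# the minimum set solution is the cheapest feasible selection
best <- which(feasible)[which.min(total_cost[feasible])]
best_selection <- unname(selections[best, ])
best_cost <- total_cost[best]
print(best_selection)
print(best_cost)
```

The budget-limited objectives above reverse this logic: the cost becomes a constraint and the quantity being optimized is some measure of feature representation.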

### Add targets

Most conservation planning problems require targets. Targets are used to specify the minimum amount or proportion of a feature's distribution that needs to be protected in the solution. For example, we may want to develop a reserve network that will secure 20% of the distribution for each feature for minimal cost.

There are four ways to specify targets in the _prioritizr R_ package:

* __Absolute targets__: Targets are expressed as the total amount of each feature in the study area that needs to be secured. For example, if we had binary feature data that showed the absence or presence of suitable habitat across the study area, we could set an absolute target as 5 to mean that we require 5 planning units with suitable habitat in the solution.
```{r, warning = FALSE}
# create a problem with targets which specify that the solution must conserve
# a sum total of 3 units of suitable habitat for each feature
p10 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_absolute_targets(3)

# print problem
print(p10)
```

* __Relative targets__: Targets are set as a proportion (between 0 and 1) of the total amount of each feature in the study area. For example, if we had binary feature data and the feature occupied a total of 20 planning units in the study area, we could set a relative target of 50 % to specify that the solution must secure 10 planning units for the feature. We could alternatively specify an absolute target of 10 to achieve the same result, but sometimes proportions are easier to work with.
```{r, warning = FALSE}
# create a problem with the minimum set objective and relative targets of 10 %
# for each feature
p11 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1)

# print problem
print(p11)

# create a problem with targets which specify that we need 10 % of the habitat
# for the first feature, 15 % for the second feature, 20 % for the third
# feature, 25 % for the fourth feature, and 30 % for the fifth feature
targets <- c(0.1, 0.15, 0.2, 0.25, 0.3)
p12 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(targets)

# print problem
print(p12)
```

* __Log-linear targets__: Targets are expressed using scaling factors and log-linear interpolation. This method for specifying targets is commonly used for global prioritization analyses [@r15].
```{r, warning = FALSE}
# create problem with added log-linear targets
p13 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_loglinear_targets(10, 0.9, 100, 0.2)

# print problem
print(p13)
```

* __Manual targets__: Targets are manually specified. This is only really recommended for advanced users or problems that involve multiple management zones. See the [zones vignette](zones.html) for more information on these targets.

As with the functions for specifying the objective of a problem, if we try adding multiple sets of targets to a problem, only the most recently added set of targets is used.
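
The idea behind log-linear targets can be sketched in base _R_ as follows. We read the four arguments in the example above as a lower amount threshold (10) with its target (90 %) and an upper amount threshold (100) with its target (20 %). The helper `loglinear_target_sketch` is hypothetical and written only for this illustration; it is not the package's own implementation.

```{r, warning = FALSE}
# hypothetical helper: interpolate the target proportion linearly on the
# log scale of the feature's total amount, holding it constant outside
# the two thresholds (rule = 2 in approx)
loglinear_target_sketch <- function(total, lower_amount, lower_target,
                                    upper_amount, upper_target) {
  stats::approx(
    x = log(c(lower_amount, upper_amount)),
    y = c(lower_target, upper_target),
    xout = log(total),
    rule = 2
  )$y
}

# hypothetical total amounts for five features
toy_feature_totals <- c(5, 10, sqrt(1000), 100, 500)

# target proportion for each feature
sketch_targets <- loglinear_target_sketch(toy_feature_totals,
                                          10, 0.9, 100, 0.2)
print(round(sketch_targets, 2))
```

Features with small total amounts receive a high proportional target, features with large total amounts receive a low one, and features in between are scaled smoothly.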

### Add constraints

A constraint can be added to a conservation planning problem to ensure that all solutions exhibit a specific property. For example, they can be used to make sure that all solutions select a specific planning unit or that all selected planning units in the solution follow a certain configuration.

The following constraints can be added to conservation planning problems in the _prioritizr R_ package.

* __Locked in constraints__: Add constraints to ensure that certain planning units are prioritized in the solution. For example, it may be desirable to lock in planning units that are inside existing protected areas so that the solution fills in the gaps in the existing reserve network.
```{r, warning = FALSE}
# create problem with constraints which specify that the first planning unit
# must be selected in the solution
p14 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_locked_in_constraints(1)

# print problem
print(p14)
```

* __Locked out constraints__: Add constraints to ensure that certain planning units are not prioritized in the solution. For example, it may be useful to lock out planning units that have been degraded and are not suitable for conserving species.
```{r, warning = FALSE}
# create problem with constraints which specify that the second planning unit
# must not be selected in the solution
p15 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_locked_out_constraints(2)

# print problem
print(p15)
```

* __Neighbor constraints__: Add constraints to a conservation problem to ensure that all selected planning units have at least a certain number of neighbors.
```{r, warning = FALSE}
# create problem with constraints which specify that all selected planning units
# in the solution must have at least 1 neighbor
p16 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_neighbor_constraints(1)

# print problem
print(p16)
```

* __Contiguity constraints__: Add constraints to a conservation problem to ensure that all selected planning units are spatially connected to each other and form a spatially contiguous unit.
```{r, warning = FALSE}
# create problem with constraints which specify that all selected planning units
# in the solution must form a single contiguous unit
p17 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_contiguity_constraints()

# print problem
print(p17)
```

* __Feature contiguity constraints__: Add constraints to ensure that each feature is represented in a contiguous unit of dispersible habitat. These constraints are a more advanced version of those implemented in the `add_contiguity_constraints` function, because they ensure that each feature is represented in a contiguous unit and not that the entire solution should form a contiguous unit.
```{r, warning = FALSE}
# create problem with constraints which specify that the planning units used
# to conserve each feature must form a contiguous unit
p18 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_feature_contiguity_constraints()

# print problem
print(p18)
```

* __Mandatory allocation constraints__: Add constraints to ensure that every planning unit is allocated to a management zone in the solution. Please note that this function can only be used with problems that contain multiple zones. For more information on problems with multiple zones and an example using this function, see the Management Zones vignette.

The `add_locked_in_constraints` and `add_locked_out_constraints` functions are incredibly useful for real-world conservation planning exercises, so it's worth pointing out that there are several ways we can specify which planning units should be locked in or out of the solutions. If we use raster planning unit data, we can also use raster data to specify which planning units should be locked in or locked out.

```{r, fig.width = 6.5, fig.height = 2.5, warning = FALSE}
# load data to lock in or lock out planning units
data(sim_locked_in_raster)
data(sim_locked_out_raster)

# plot the locked data
plot(stack(sim_locked_in_raster, sim_locked_out_raster),
     main = c("Locked In", "Locked Out"))

# create a problem using raster planning unit data and use the locked raster
# data to lock in some planning units and lock out some other planning units
p19 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_locked_in_constraints(sim_locked_in_raster) %>%
       add_locked_out_constraints(sim_locked_out_raster)

# print problem
print(p19)
```

If our planning unit data are in a spatial vector format (similar to the `sim_pu_polygons` data) or a tabular format (similar to `pu_dat`), we can use the field names in the data to refer to which planning units should be locked in and / or out. For example, the `sim_pu_polygons` object has `TRUE` / `FALSE` values in the "locked_in" field which indicate which planning units should be selected in the solution. We could use the data in this field to specify that those planning units with `TRUE` values should be locked in using the following methods.

```{r, warning = FALSE}
# preview first six rows of the attribute table for sim_pu_polygons
head(sim_pu_polygons@data)

# specify locked in data using the field name
p20 <- problem(sim_pu_polygons, sim_features, cost_column = "cost") %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_locked_in_constraints("locked_in")

# print problem
print(p20)

# specify locked in data using the values in the field
p21 <- problem(sim_pu_polygons, sim_features, cost_column = "cost") %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_locked_in_constraints(which(sim_pu_polygons$locked_in))

# print problem
print(p21)
```
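
When lock information comes from separate sources, it can be worth checking that no planning unit is marked as both locked in and locked out before adding the constraints, since such a unit could not satisfy both. The base _R_ sketch below uses a hypothetical table of six planning units to show the check.

```{r, warning = FALSE}
# hypothetical lock status for six planning units
lock_status <- data.frame(
  id = 1:6,
  locked_in = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  locked_out = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# convert the logical columns to planning unit indices
locked_in_ids <- which(lock_status$locked_in)
locked_out_ids <- which(lock_status$locked_out)

# planning units that appear in both sets would be contradictory
conflicts <- intersect(locked_in_ids, locked_out_ids)
print(conflicts)
```

An empty result means the two sets of constraints do not contradict each other.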

### Add penalties

We can also add penalties to a problem to favor or penalize solutions according to a secondary objective.

Unlike the constraint functions, these functions add extra terms to the objective function of the optimization problem to penalize solutions that do not exhibit specific characteristics. For example, penalties can be added to a problem to avoid highly fragmented solutions at the expense of accepting slightly more expensive solutions. All penalty functions have a `penalty` argument that controls the relative importance of the secondary penalty function compared to the primary objective function. It is worth noting that `penalty` values that are extremely low or extremely high relative to the main objective function can cause problems or take a very long time to solve, so when trying out a range of different penalty values it can be helpful to limit the solver to run for a set period of time.

The _prioritizr R_ package currently offers the following methods for adding penalties to a conservation planning problem.

* __Boundary penalties__:  Add penalties to penalize solutions that are excessively fragmented. These penalties are similar to those used in _Marxan_ [@r3; @r1].
```{r, warning = FALSE}
# create problem with penalties that penalize fragmented solutions with a
# penalty factor of 0.01
p22 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_boundary_penalties(penalty = 0.01)

# print problem
print(p22)
```

* __Connectivity penalties__: Add penalties to favor solutions that select combinations of planning units with high connectivity between them. These penalties are similar to those used in _Marxan with Zones_ [@r2; @r1]. This function supports both symmetric and asymmetric connectivities among planning units.
```{r, warning = FALSE}
# create problem with penalties that favor combinations of planning units with
# high connectivity, here we will use only the first four layers in
# sim_features for the features and we will use the fifth layer in sim_features
# to represent the connectivity data, where the connectivity_matrix function
# will create a matrix showing the average strength of connectivity between
# adjacent planning units using the data in the fifth layer of sim_features
p23 <- problem(sim_pu_raster, sim_features[[1:4]]) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_connectivity_penalties(
         penalty = 5,
         data = connectivity_matrix(sim_pu_raster, sim_features[[5]]))

# print problem
print(p23)
```

* __Linear penalties__: Add penalties to penalize solutions that select planning units according to a certain variable (e.g. anthropogenic pressure).
```{r, warning = FALSE}
# create data for penalizing planning units
pen_raster <- simulate_cost(sim_pu_raster)

# create problem with penalties that penalize solutions that select
# planning units with high values in the pen_raster object,
# here we will use a penalty value of 5 to indicate the trade-off (scaling)
# between the penalty values (in the pen_raster) and the main objective
# (i.e. the cost of the solution)
p24 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_linear_penalties(penalty = 5, data = pen_raster)

# print problem
print(p24)
```
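
As a rough illustration of what fragmentation means for the boundary penalties described above, the sketch below counts the exposed edges of a selection on a small grid. The `exposed_boundary()` helper is hypothetical. It assumes square cells with unit-length edges and counts edges along the outside of the grid in full, whereas the package's own calculations can treat such edges differently (e.g. via the `edge_factor` argument).

```{r, warning = FALSE}
# hypothetical helper: count the cell edges that separate a selected cell
# (1) from an unselected cell (0) or from the outside of the grid
exposed_boundary <- function(sel) {
  # surround the grid with unselected cells so outer edges are counted
  padded <- matrix(0, nrow(sel) + 2, ncol(sel) + 2)
  padded[2:(nrow(sel) + 1), 2:(ncol(sel) + 1)] <- sel
  # count changes between vertically and horizontally adjacent cells
  sum(abs(diff(padded))) + sum(abs(diff(t(padded))))
}

# a compact selection: four cells arranged in a 2 x 2 block
compact <- matrix(0, 4, 4)
compact[2:3, 2:3] <- 1

# a fragmented selection: four cells that do not share any edges
fragmented <- diag(4)

# both selections contain four cells, but differ in exposed boundary
print(c(compact = exposed_boundary(compact),
        fragmented = exposed_boundary(fragmented)))
```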

### Add the decision types

Conservation planning problems involve making decisions on how planning units will be managed.

These decisions are then associated with management actions (e.g. turning a planning unit into a protected area). The type of decision describes how the action is applied to planning units. For instance, the default decision-type is a binary decision type, meaning that we are either selecting or not selecting planning units for management.

The _prioritizr R_ package currently offers the following types of decisions for customizing problems.

* __Binary decisions__: Add a binary decision to a conservation planning problem. This is the classic decision of either prioritizing or not prioritizing a planning unit. Typically, this decision has the assumed action of buying the planning unit to include in a protected area network. If no decision is added to a problem object, then this decision class will be used by default.
```{r, warning = FALSE}
# add binary decisions to a problem
p25 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions()

# print problem
print(p25)
```
* __Proportion decisions__: Add a proportion decision to a problem. This is a relaxed decision where a part of a planning unit can be prioritized, as opposed to the default of the entire planning unit. Typically, this decision has the assumed action of buying a fraction of a planning unit to include in a protected area network. Generally, problems can be solved much faster with proportion-type decisions than binary-type decisions, so they can be very useful when commercial solvers are not available.
```{r, warning = FALSE}
# add proportion decisions to a problem
p26 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_proportion_decisions()

# print problem
print(p26)
```
* __Semi-continuous decisions__: Add a semi-continuous decision to a problem. This decision is similar to proportion decisions except that it has an upper bound parameter. By default, the decision can range from prioritizing none (0%) to all (100%) of a planning unit. However, an upper bound can be specified to ensure that at most only a fraction (e.g. 80%) of a planning unit can be purchased. This type of decision may be useful when it is not practical to conserve the entire area indicated by a planning unit.
```{r, warning = FALSE}
# add semi-continuous decisions to a problem, where we can only manage at most
# 50 % of the area encompassed by a planning unit
p27 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_semicontinuous_decisions(0.5)

# print problem
print(p27)
```
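
The difference between these decision types can be sketched with a toy minimum set problem that has a single feature. The data, the `fractional_cost()` helper, and the solution methods below (enumeration for binary decisions, and a greedy fill that is only valid because there is a single feature) are assumptions made for illustration; they are not how the _prioritizr R_ package solves problems.

```{r, warning = FALSE}
# toy data (made-up values): cost and feature amount for four planning units
cost <- c(4, 3, 6, 5)
amount <- c(2, 1, 4, 2)
target <- 5

# binary decisions: enumerate every combination of selected planning units
# and keep the cheapest combination that meets the target
combos <- as.matrix(expand.grid(rep(list(0:1), length(cost))))
feasible <- drop(combos %*% amount) >= target
binary_cost <- min(drop(combos %*% cost)[feasible])

# proportion-type decisions: with a single feature, filling planning units
# in order of cost per unit amount (up to an upper bound) is optimal
fractional_cost <- function(cost, amount, target, upper = 1) {
  x <- numeric(length(cost))
  remaining <- target
  for (i in order(cost / amount)) {
    x[i] <- min(upper, remaining / amount[i])
    remaining <- remaining - x[i] * amount[i]
    if (remaining <= 0) break
  }
  if (remaining > 1e-9) return(Inf) # target cannot be met
  sum(cost * x)
}
proportion_cost <- fractional_cost(cost, amount, target)
semicontinuous_cost <- fractional_cost(cost, amount, target, upper = 0.8)

# print the cost of the cheapest solution under each decision type
print(c(binary = binary_cost, proportion = proportion_cost,
        semicontinuous = semicontinuous_cost))
```

Because proportion decisions relax the binary requirement, the cheapest proportion-type solution can never cost more than the cheapest binary solution for the same problem.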

### Add a solver

Next, after specifying the mathematical formulation that underpins your conservation planning problem, you can specify how the problem should be solved. If you do not specify this information, the _prioritizr R_ package will automatically use the best solver currently installed on your system with some reasonable defaults. __We strongly recommend installing the [Gurobi software suite and the _gurobi_ _R_ package](https://www.gurobi.com/) to solve problems, and for more information on this topic please refer to the [Gurobi Installation Guide](gurobi_installation.html)__.

Currently, the _prioritizr R_ package supports the following solvers.

* __*Gurobi* solver__: [_Gurobi_](https://www.gurobi.com/) is a state-of-the-art commercial optimization software. It is by far the fastest of the solvers that can be used to solve conservation problems. However, it is not freely available. That said, special licenses are available to academics at no cost.
```{r, warning = FALSE}
# create a problem and specify that Gurobi should be used to solve the problem
# and specify an optimality gap of zero to obtain the optimal solution
p28 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions() %>%
       add_gurobi_solver(gap = 0)

# print problem
print(p28)
```
* __*IBM CPLEX* solver__: [_IBM CPLEX_](https://www.ibm.com/analytics/cplex-optimizer) is a commercial optimization software. It is faster than the open source solvers available for generating prioritizations (see below). However, it is not freely available. Similar to the _Gurobi_ software, special licenses are available to academics at no cost.
```{r, warning = FALSE}
# create a problem and specify that IBM CPLEX should be used to solve the
# problem and specify an optimality gap of zero to obtain the optimal solution
# p29 <- problem(sim_pu_raster, sim_features) %>%
#        add_min_set_objective() %>%
#        add_relative_targets(0.1) %>%
#        add_binary_decisions() %>%
#        add_cplex_solver(gap = 0)
#
# # print problem
# print(p29)
```
* __*Rsymphony* solver__: [_SYMPHONY_](https://projects.coin-or.org/SYMPHONY) is an open-source integer programming solver that is part of the Computational Infrastructure for Operations Research (COIN-OR) project, an initiative to promote development of open-source tools for operations research. The _Rsymphony R_ package provides an interface to COIN-OR and is available on The Comprehensive R Archive Network (CRAN).
```{r, warning = FALSE}
# create a problem and specify that Rsymphony should be used to solve the
# problem and specify an optimality gap of zero to obtain the optimal solution
# p30 <- problem(sim_pu_raster, sim_features) %>%
#        add_min_set_objective() %>%
#        add_relative_targets(0.1) %>%
#        add_binary_decisions() %>%
#        add_rsymphony_solver(gap = 0)
#
# # print problem
# print(p30)
```
* __*lpsymphony* solver__: The _lpsymphony R_ package provides a different interface to the COIN-OR software suite. This package may be easier to install on Windows and macOS operating systems than the _Rsymphony R_ package. Unlike the _Rsymphony R_ package, the _lpsymphony R_ package is distributed through [Bioconductor](http://bioconductor.org/packages/release/bioc/html/lpsymphony.html).
```{r, warning = FALSE}
# create a problem and specify that lpsymphony should be used to solve the
# problem and specify an optimality gap of zero to obtain the optimal solution
p31 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions() %>%
       add_lpsymphony_solver(gap = 0)

# print problem
print(p31)
```
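
The `gap` argument used in the solver examples above is a relative optimality gap. As a minimal sketch, assuming a common definition based on the objective value of the best solution found and the solver's best bound on the optimal objective value (exact definitions vary between solvers), the gap could be computed as follows. The `relative_gap()` helper and the numbers are hypothetical.

```{r, warning = FALSE}
# hypothetical helper: relative difference between the objective value of
# the best solution found so far and the best known bound on the optimum
relative_gap <- function(objective, bound) {
  abs(objective - bound) / abs(objective)
}

# made-up values for a minimization problem: the best solution found costs
# 105, and the solver has proven that no solution can cost less than 100
gap <- relative_gap(objective = 105, bound = 100)
print(gap)

# this solution would be accepted with a gap of 10%, but the solver would
# need to keep searching (or tighten the bound) with a gap of zero
accepted_at_10_percent <- gap <= 0.1
accepted_at_zero <- gap <= 0
print(c(gap_0.1 = accepted_at_10_percent, gap_0 = accepted_at_zero))
```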

### Add a portfolio

Many conservation planning exercises require a portfolio of solutions. For example, real-world exercises can involve presenting decision makers with a range of near-optimal decisions. Additionally, the number of times that different planning units are selected in different solutions can provide insight into their relative importance.

The following methods are available for generating a portfolio of solutions.

* __Extra portfolio__: Generate a portfolio of solutions by storing feasible solutions found during the optimization process. Note that this method requires that the _Gurobi_ optimization software is used to generate solutions.
```{r, warning = FALSE}
# create a problem and specify that a portfolio should be created using
# extra solutions found while solving the problem
p32 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions() %>%
       add_extra_portfolio()

# print problem
print(p32)
```
* __Top portfolio__: Generate a portfolio of solutions by finding a pre-specified number of solutions that are closest to optimality (i.e. the top solutions). Note that this method requires that the _Gurobi_ optimization software is used to generate solutions.
```{r, warning = FALSE}
# create a problem and specify that a portfolio should be created using
# the top five solutions
p33 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions() %>%
       add_top_portfolio(number_solutions = 5)

# print problem
print(p33)
```
* __Gap portfolio__: Generate a portfolio of solutions by finding a certain number of solutions that are all within a pre-specified optimality gap. This method is especially useful for generating multiple solutions that can be used to calculate selection frequencies (similar to _Marxan_). Note that this method requires that the _Gurobi_ optimization software is used to generate solutions.
```{r, warning = FALSE}
# create a problem and specify that a portfolio should be created by
# finding five solutions within 20% of optimality
p34 <- problem(sim_pu_raster, sim_features) %>%
       add_min_set_objective() %>%
       add_relative_targets(0.1) %>%
       add_binary_decisions() %>%
       add_gap_portfolio(number_solutions = 5, pool_gap = 0.2)

# print problem
print(p34)
```
* __Cuts portfolio__: Generate a portfolio of distinct solutions within a pre-specified optimality gap. This method is only recommended if the _Gurobi_ optimization solver is not available.
```{r, warning = FALSE}
# create a problem and specify that a portfolio containing 10 solutions
# should be created using Bender's cuts
p35 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_binary_decisions() %>%
  add_cuts_portfolio(number_solutions = 10)

# print problem
print(p35)
```
* __Shuffle portfolio__: Generate a portfolio of solutions by randomly reordering the data prior to attempting to solve the problem. If the _Gurobi_ optimization solver is not available, this method is the fastest method for generating a set number of solutions within a specified distance from optimality.
```{r, warning = FALSE}
# create a problem and specify a portfolio should be created that contains
# 10 solutions and that any duplicate solutions should not be removed
p36 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_binary_decisions() %>%
  add_shuffle_portfolio(number_solutions = 10, remove_duplicates = FALSE)

# print problem
print(p36)
```
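
As noted at the start of this section, the number of times that planning units are selected across the solutions in a portfolio can provide insight into their relative importance. The sketch below computes selection frequencies from a hypothetical portfolio stored as a matrix. The matrix layout and values are assumptions made for illustration, and do not reflect the format in which the _prioritizr R_ package returns portfolios.

```{r, warning = FALSE}
# hypothetical portfolio (made-up values): rows are solutions, columns are
# planning units, and values indicate if a unit is selected (1) or not (0)
portfolio <- rbind(
  solution_1 = c(1, 0, 1, 1, 0),
  solution_2 = c(1, 1, 0, 1, 0),
  solution_3 = c(1, 0, 1, 0, 0),
  solution_4 = c(1, 1, 1, 0, 0)
)
colnames(portfolio) <- paste0("pu_", seq_len(ncol(portfolio)))

# remove any duplicated solutions so they are not counted twice
portfolio <- portfolio[!duplicated(portfolio), , drop = FALSE]

# calculate the proportion of solutions that select each planning unit
selection_frequency <- colMeans(portfolio)

# print selection frequencies
print(selection_frequency)
```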

### Solve the problem

Finally, after formulating our conservation planning problem and specifying how the problem should be solved, we can use the `solve` function to obtain a solution. Note that the solver will typically print out some information describing the size of the problem and report its progress when searching for a suitable solution.

```{r, fig.height = h, fig.width = w, warning = FALSE}
# formulate the problem
p37 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_boundary_penalties(penalty = 500, edge_factor = 0.5) %>%
  add_binary_decisions()

# solve the problem (using the default solver)
s37 <- solve(p37)

# plot solution
plot(s37, col = c("grey90", "darkgreen"), main = "Solution",
     xlim = c(-0.1, 1.1), ylim = c(-0.1, 1.1))
```

We can plot this solution because the planning unit input data are spatially referenced in a raster format. The output format will always match the planning unit data used to initialize the problem. For example, the solution to a problem with planning units in a spatial vector (shapefile) format would also be in a spatial vector format. Similarly, if the planning units were in a tabular format (i.e. `data.frame`), the solution would also be returned in a tabular format.

We can also extract attributes from the solution that describe the quality of the solution and the optimization process.

```{r, warning = FALSE}
# extract the objective (numerical value being minimized or maximized)
print(attr(s37, "objective"))

# extract time spent solving solution
print(attr(s37, "runtime"))

# extract state message from the solver that describes why this specific
# solution was returned
print(attr(s37, "status"))
```

### Evaluate the performance of a solution

After obtaining a solution to a conservation planning problem, it can be useful to calculate various summary statistics to understand its performance. The following functions are available to summarize a solution:

* Calculate the number of planning units selected within a solution.
```{r, warning = FALSE}
# calculate statistic
eval_n_summary(p37, s37)
```
* Calculate the total cost of a solution.
```{r, warning = FALSE}
# calculate statistic
eval_cost_summary(p37, s37)
```

* Calculate how well features are represented by a solution. This function can be used for problems that are built using targets and those that are not built using targets.
```{r, warning = FALSE}
# calculate statistics
eval_feature_representation_summary(p37, s37)
```

* Calculate how well feature representation targets are met by a solution. This function can only be used with problems containing targets.
```{r, warning = FALSE}
# calculate statistics
eval_target_coverage_summary(p37, s37, include_zone = FALSE, include_sense = FALSE)
```
* Calculate the exposed boundary length (perimeter) associated with a solution.
```{r, warning = FALSE}
# calculate statistic
eval_boundary_summary(p37, s37)
```

* Calculate the connectivity held within a solution.
```{r, warning = FALSE}
# calculate statistic
# here we use the raster data for the first feature as an example
# to parametrize pair-wise connectivity between different planning units
eval_connectivity_summary(
  p37, s37, data = connectivity_matrix(sim_pu_raster, sim_features[[1]]))
```
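
To clarify what some of these statistics measure, the sketch below calculates the number of selected planning units, the total cost, and the feature representation by hand for a toy solution. All of the data and object names are made up for illustration; the `eval_*` functions shown above should be used for real problems.

```{r, warning = FALSE}
# toy data (made-up values): costs for four planning units, and the amount
# of two features in each planning unit
cost <- c(2, 3, 1, 4)
amounts <- cbind(feature_a = c(1, 0, 2, 1), feature_b = c(0, 3, 1, 0))

# toy solution that selects the first and third planning units
solution <- c(1, 0, 1, 0)

# number of selected planning units and their total cost
n_selected <- sum(solution)
total_cost <- sum(cost * solution)

# amount of each feature held by the solution, in absolute terms and
# relative to the total amount of each feature across all planning units
absolute_held <- colSums(amounts * solution)
relative_held <- absolute_held / colSums(amounts)

# check if a relative target of 30% is met for each feature
target_met <- relative_held >= 0.3

# print results
print(c(n = n_selected, cost = total_cost))
print(data.frame(feature = colnames(amounts), absolute_held = absolute_held,
                 relative_held = relative_held, target_met = target_met))
```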

### Importance (irreplaceability)

Conservation plans can take a long time to implement. Since funding availability and habitat quality can decline over time, **it is critical that the most important places in a prioritization are scheduled for protection as early as possible**. For instance:

* some planning units in a solution might contain many rare species which do not occur in any other planning units.

* some planning units might offer an especially high return on investment that reduces costs considerably.

As a consequence, conservation planners often need information on which planning units selected in a prioritization are most important to the overall success of the prioritization. To achieve this, conservation planners can use importance (irreplaceability) scores for each planning unit selected in a solution.

The _prioritizr R_ package offers multiple methods for assessing importance. These include scores calculated based on:

* __Replacement Costs__: [`eval_replacement_importance()`; @r31] quantify the change in the objective function (e.g. additional costs required to meet feature targets) of the optimal solution if a given planning unit in a solution cannot be acquired. They can:
  1. account for the cost of different planning units,
  2. account for multiple management zones,
  3. apply to any objective function, and
  4. identify truly irreplaceable planning units (denoted with infinite values).

* __Ferrier Scores__: [`eval_ferrier_importance()`; @r34] quantify the importance of planning units for meeting feature targets. They can only be applied to conservation problems with a minimum set objective and a single zone (i.e. the classic _Marxan_-type problem). Furthermore---unlike the replacement cost scores---the Ferrier scores provide a score for each feature within each planning unit, providing insight into why certain planning units are more important than other planning units.

* __Rarity Weighted Richness Scores__: [`eval_rare_richness_importance()`; @r33] are simply a measure of biological diversity. They do not account for planning costs, multiple management zones, objective functions, or feature targets (or weightings). They merely describe the spatial patterns of biodiversity, and do not account for many of the factors needed to quantify the importance of a planning unit for achieving conservation goals.

We recommend using replacement cost scores for small and moderate sized problems (e.g. less than 30,000 planning units) when it is feasible to do so. It can take a very long time to compute replacement cost scores, and so it is simply not feasible to compute these scores for particularly large problems. For moderate and large sized problems (e.g. more than 30,000 planning units),  we recommend using the rarity weighted richness scores. Beware, it has been known for decades that such static measures of biodiversity lead to poor conservation plans [@r17]. Although the Ferrier method is also provided, we do not recommend using this method until it has been verified by a statistical expert.
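
The idea behind replacement cost scores can be sketched with a toy minimum set problem that is small enough to solve by enumeration. The data and the `solve_min_set()` helper are hypothetical, and the brute-force search is only a stand-in for the solver; it also shows why these scores become slow to compute for large problems, since the problem is re-solved once for every selected planning unit.

```{r, warning = FALSE}
# toy data (made-up values): four planning units and two features
cost <- c(2, 3, 1, 4)
amounts <- cbind(feature_a = c(1, 0, 2, 1), feature_b = c(0, 3, 1, 0))
targets <- c(feature_a = 2, feature_b = 3)

# hypothetical helper: find the cheapest combination of planning units that
# meets all targets by enumeration, optionally locking out some units
solve_min_set <- function(locked_out = integer(0)) {
  combos <- as.matrix(expand.grid(rep(list(0:1), length(cost))))
  allowed <- rep(TRUE, nrow(combos))
  for (i in locked_out) allowed <- allowed & combos[, i] == 0
  held <- combos %*% amounts
  feasible <- allowed & apply(sweep(held, 2, targets, ">="), 1, all)
  if (!any(feasible)) return(list(cost = Inf, solution = NULL))
  costs <- drop(combos %*% cost)
  best <- which(feasible)[which.min(costs[feasible])]
  list(cost = costs[best], solution = unname(combos[best, ]))
}

# find the optimal solution and the planning units that it selects
baseline <- solve_min_set()
selected <- which(baseline$solution == 1)

# replacement cost: increase in the optimal cost when a selected planning
# unit cannot be acquired (infinite if the targets can no longer be met)
replacement_cost <- sapply(selected, function(i) {
  solve_min_set(locked_out = i)$cost - baseline$cost
})
names(replacement_cost) <- paste0("pu_", selected)

# print replacement costs
print(replacement_cost)
```

In this toy example, the second planning unit has an infinite replacement cost because the target for `feature_b` cannot be met without it, whereas the third planning unit can be replaced at an additional cost of 5.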

Below we will generate a solution, and then calculate importance scores for the planning units selected in the solution using the three different methods.

```{r, fig.height = h, fig.width = w, warning = FALSE}
# formulate the problem
p38 <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_binary_decisions()

# solve the problem
s38 <- solve(p38)

# plot solution
plot(s38, col = c("grey90", "darkgreen"), main = "Solution",
     xlim = c(-0.1, 1.1), ylim = c(-0.1, 1.1))

# calculate replacement cost scores and make the solver quiet
rc38 <- p38 %>%
  add_default_solver(gap = 0, verbose = FALSE) %>%
  eval_replacement_importance(s38)

# plot replacement cost scores
plot(rc38, main = "replacement cost")
```

```{r, fig.height = h, fig.width = w, warning = FALSE}
# calculate Ferrier scores and extract total score
fs38 <- eval_ferrier_importance(p38, s38)[["total"]]

# plot Ferrier scores
plot(fs38, main = "Ferrier scores")
```

```{r, fig.height = h, fig.width = w, warning = FALSE}
# calculate rarity weighted richness scores
rwr38 <- eval_rare_richness_importance(p38, s38)

# plot rarity weighted richness scores
plot(rwr38, main = "rarity weighted richness")
```

Although rarity weighted richness scores can approximate scores derived from the other two methods in certain conservation planning exercises, we can see that the rarity weighted richness scores provide completely different results in this case.
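
One reason for such differences is that rarity weighted richness scores depend only on the feature data. The sketch below uses a simplified variant of the score (the sum of each feature's share of its total amount that occurs in a planning unit), which is an assumption made for illustration and may differ from the exact formula used by `eval_rare_richness_importance()`. The data are made up.

```{r, warning = FALSE}
# toy data (made-up values): amount of two features in four planning units
amounts <- cbind(feature_a = c(1, 0, 2, 1), feature_b = c(0, 3, 1, 0))
rownames(amounts) <- paste0("pu_", seq_len(nrow(amounts)))

# share of the total amount of each feature found in each planning unit
shares <- sweep(amounts, 2, colSums(amounts), "/")

# simplified rarity weighted richness: sum of the shares across features
rwr <- rowSums(shares)

# print scores
print(rwr)
```

Note that planning unit costs, targets, and the objective function never enter this calculation, so the scores stay the same no matter how the conservation planning problem is formulated.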