Survey Design: Optimization and Sampling

library(spdgt.sight)

Overview

This vignette covers survey design using the SpeedGoat sightability API. The workflow consists of two main steps:

Optimize - Determine optimal sample allocation across strata
Sample - Select subunits using GRTS or random sampling

Authentication is handled automatically - you will be prompted to log in when needed.

Prerequisites

Before designing a survey, identify your species, survey type, and analysis unit:

# Look up required IDs
species_id <- lkup_species_id("Mule Deer")
survey_type_id <- lkup_survey_type_id("Sightability", species_id = species_id)
dau_id <- lkup_dau_id("North Converse 755", species_id = species_id)

# Check available strata
strata <- lkup_strata(
  species_id = species_id,
  survey_type_name = "Sightability"
)
print(strata)

Step 1: Optimize Sample Allocation

Use sight_optimize_design() to determine the optimal allocation of survey samples across strata using Neyman allocation.
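
As a rough illustration of what Neyman allocation does (a textbook sketch, not the package's internal code), each stratum h receives a share of the total sample proportional to N_h * S_h, where N_h is the number of subunits in the stratum and S_h is its standard deviation. The stratum sizes, standard deviations, and the name strata_demo below are made up for the example.

```r
# Hypothetical strata: number of subunits (N_h) and standard deviation (S_h)
strata_demo <- data.frame(
  stratum_id = 1:3,
  n_subunits = c(50, 30, 20),
  pop_sd     = c(2, 5, 10)
)

n_total <- 30  # Total number of subunits we can afford to survey

# Neyman allocation: n_h = n * N_h * S_h / sum(N_h * S_h)
weight <- strata_demo$n_subunits * strata_demo$pop_sd
strata_demo$sample_proportion <- weight / sum(weight)
strata_demo$n_raw <- n_total * strata_demo$sample_proportion

print(strata_demo)
```

The raw allocations are fractional, so a real design still has to round them and cap each one at the number of subunits available in its stratum. Note that the small but highly variable stratum receives the largest share.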

Fixed Sample Size

Optimize allocation for a fixed total sample size:

# Using names
optimization <- sight_optimize_design(
  method = "fixed",
  value = 100,                           # Total sample size
  species = "Mule Deer",
  survey_type = "Sightability",
  analysis_unit = "North Converse 755",
  bio_year = 2024
)

# Using IDs (species_id required for historical variance calculation)
optimization <- sight_optimize_design_id(
  method = "fixed",
  value = 100,
  species_id = species_id,
  survey_type_id = survey_type_id,
  analysis_unit_id = dau_id,
  bio_year = 2024
)

Precision Target

Optimize allocation to achieve a target precision (error bound):

optimization <- sight_optimize_design(
  method = "precision",
  value = 500,                           # Error bound in animals
  species = "Mule Deer",
  survey_type = "Sightability",
  analysis_unit = "North Converse 755",
  bio_year = 2024
)
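
To see how an error bound maps to a sample size, here is a textbook sketch for a stratified estimate of a total under Neyman allocation. It ignores sightability and model variance, and the stratum inputs, the error bound, and the 1.645 multiplier (roughly a 90% interval) are assumptions chosen for illustration, not values the API is known to use.

```r
# Hypothetical strata: subunit counts (N_h) and SDs of animals per subunit (S_h)
n_subunits <- c(50, 30, 20)
pop_sd     <- c(2, 5, 10)

error_bound <- 100  # Desired half-width of the interval, in animals
z <- 1.645          # Assumed multiplier (about a 90% interval)

# Target variance of the estimated total
target_var <- (error_bound / z)^2

# Under Neyman allocation with a finite population correction,
#   Var(total) = (sum N_h S_h)^2 / n - sum N_h S_h^2
# Solving for n gives the required total sample size:
n_required <- sum(n_subunits * pop_sd)^2 /
  (target_var + sum(n_subunits * pop_sd^2))

ceiling(n_required)
```

A tighter error bound shrinks target_var and pushes the required sample size up, toward the total number of subunits.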

Using Previous Model Variance

For more accurate optimization, provide variance estimates from a previous model fit:

optimization <- sight_optimize_design_id(
  method = "fixed",
  value = 100,
  species_id = species_id,
  survey_type_id = survey_type_id,
  analysis_unit_id = dau_id,
  bio_year = 2024,
  samp_var = 15000,    # Sampling variance from previous model
  sight_var = 2000,    # Sightability variance
  mod_var = 800        # Model variance
)
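
In sightability estimation, the variance of an abundance estimate is commonly split into sampling, sightability, and model components that add up to the total. Assuming the three arguments above are additive in that way (an assumption about the API, not something stated here), a quick check shows which component dominates:

```r
# Variance components from the call above
var_components <- c(samp_var = 15000, sight_var = 2000, mod_var = 800)

# Total variance, assuming the components are additive
total_var <- sum(var_components)

# Share of the total contributed by each component
round(var_components / total_var, 3)

# Standard error implied by the total variance
sqrt(total_var)
```

When the sampling component dominates, as in this example, surveying more subunits is usually the most direct way to improve precision.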

Understanding the Output

The optimization result includes stratum-level allocations:

# View the optimization results
print(optimization)

# The output includes:
# - stratum_id: Stratum identifier
# - n_samples: Number of samples allocated
# - sample_proportion: Proportion of total samples
# - pop_sd: Historical population standard deviation
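
Before moving on to sampling, it can help to run a few sanity checks on the allocation. The sketch below uses a hand-built data frame (optimization_demo, hypothetical values) with the columns listed above so that it runs without an API call. Whether real results sum exactly to the requested total after rounding is an assumption to verify against your own output.

```r
# Hypothetical stand-in for a result from sight_optimize_design()
optimization_demo <- data.frame(
  stratum_id        = c(1L, 2L, 3L),
  n_samples         = c(20L, 30L, 50L),
  sample_proportion = c(0.2, 0.3, 0.5),
  pop_sd            = c(2.1, 4.8, 9.7)
)

requested <- 100  # The `value` passed with method = "fixed"

checks <- c(
  # Allocated samples add up to the requested total
  total_matches = sum(optimization_demo$n_samples) == requested,
  # Proportions of the total sample add up to one
  proportions_sum_to_one =
    isTRUE(all.equal(sum(optimization_demo$sample_proportion), 1)),
  # Every stratum receives at least one sample
  no_empty_strata = all(optimization_demo$n_samples > 0)
)

print(checks)
```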

Step 2: Sample Subunits

Use sight_sample_subunits() to select subunits using GRTS (Generalized Random Tessellation Stratified, a spatially balanced design) or random sampling. The function takes a payload containing the design data.

How Sampling Works

The sampling workflow is designed for integration with the SpeedGoat website:

User stratifies subunits in the website interface
Website sends the payload to the sightability API
API performs GRTS or random sampling
Results are returned for review and database save
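
To make the random option in the third step concrete, here is a local sketch of stratified random selection in base R. It is not the API's implementation: the ceiling() rounding rule, the seed, and the names frame_demo and props_demo are assumptions for illustration, and GRTS is not shown because it needs subunit geometry.

```r
set.seed(42)  # For a reproducible draw

# Hypothetical sampling frame: six subunits in two strata
frame_demo <- data.frame(
  subunit_id = 1:6,
  stratum_id = rep(1:2, each = 3)
)

# Proportion of each stratum to sample, named by stratum ID
props_demo <- c("1" = 0.5, "2" = 0.5)

frame_demo$is_selected <- FALSE
for (s in names(props_demo)) {
  idx <- which(frame_demo$stratum_id == as.integer(s))
  # Assumed rounding rule: round the stratum sample size up
  n_pick <- ceiling(props_demo[[s]] * length(idx))
  # sample.int() avoids the sample(x) pitfall when a stratum has one subunit
  picked <- idx[sample.int(length(idx), n_pick)]
  frame_demo$is_selected[picked] <- TRUE
}

print(frame_demo)
```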

Programmatic Sampling

The payload is a nested list describing the sampling frame: which subunits exist, which stratum each belongs to, and what proportion of each stratum to sample. You can build this list yourself for random sampling:

# Build a payload for random sampling
# Each design entry needs subunit_id and stratum_id.
# Each proportion entry needs stratum_id, proportion, and a nested
# stratum list with id and can_sample.
payload <- list(
  method = "random",
  designs = list(
    list(subunit_id = 1L, stratum_id = 1L),
    list(subunit_id = 2L, stratum_id = 1L),
    list(subunit_id = 3L, stratum_id = 1L),
    list(subunit_id = 4L, stratum_id = 2L),
    list(subunit_id = 5L, stratum_id = 2L),
    list(subunit_id = 6L, stratum_id = 2L)
  ),
  proportions = list(
    list(
      stratum_id = 1L,
      proportion = 0.5,
      stratum = list(id = 1L, can_sample = TRUE)
    ),
    list(
      stratum_id = 2L,
      proportion = 0.5,
      stratum = list(id = 2L, can_sample = TRUE)
    )
  )
)

samples <- sight_sample_subunits(method = "random", payload = payload)

GRTS sampling additionally requires subunit geometry for spatial balancing. In practice, GRTS payloads are retrieved from the database where subunit geometries are stored.
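
For larger sampling frames, writing the random-sampling payload by hand becomes tedious. The sketch below generates the same nested structure from a data frame. build_random_payload() is a hypothetical helper, not part of spdgt.sight, and it assumes every stratum can be sampled (can_sample = TRUE).

```r
# Hypothetical helper: build a random-sampling payload from a data frame
# with subunit_id and stratum_id columns, plus a vector of proportions
# named by stratum ID.
build_random_payload <- function(frame, proportions) {
  # One design entry per subunit
  designs <- lapply(seq_len(nrow(frame)), function(i) {
    list(
      subunit_id = as.integer(frame$subunit_id[i]),
      stratum_id = as.integer(frame$stratum_id[i])
    )
  })
  # One proportion entry per stratum, with the nested stratum list
  props <- lapply(names(proportions), function(s) {
    id <- as.integer(s)
    list(
      stratum_id = id,
      proportion = proportions[[s]],
      stratum = list(id = id, can_sample = TRUE)
    )
  })
  list(method = "random", designs = designs, proportions = props)
}

frame <- data.frame(subunit_id = 1:6, stratum_id = rep(1:2, each = 3))
payload_demo <- build_random_payload(frame, c("1" = 0.5, "2" = 0.5))
str(payload_demo, max.level = 2)
```

The resulting payload_demo has the same shape as the hand-written payload above and could be passed to sight_sample_subunits() in the same way.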

Understanding the Output

The sampling result has one row per subunit:

print(samples)

# Columns:
# - subunit_id: Subunit identifier
# - stratum_id: Stratum the subunit belongs to
# - is_selected: Whether the subunit was selected in the sample
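
A quick way to review a sampling result is to compare the realized sampling fraction in each stratum with the proportion you requested. The sketch below uses a hand-built data frame (samples_demo, hypothetical values) with the columns listed above so that it runs without an API call.

```r
# Hypothetical stand-in for a result from sight_sample_subunits()
samples_demo <- data.frame(
  subunit_id  = 1:6,
  stratum_id  = rep(1:2, each = 3),
  is_selected = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Realized sampling fraction per stratum (mean of a logical = proportion TRUE)
realized <- aggregate(is_selected ~ stratum_id, data = samples_demo, FUN = mean)
names(realized)[2] <- "realized_proportion"
print(realized)

# IDs of the subunits to fly
selected_ids <- samples_demo$subunit_id[samples_demo$is_selected]
print(selected_ids)
```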

Stratum Standard Deviation

For Neyman allocation, the API uses historical observation data to calculate stratum-level standard deviations. You can also calculate this directly:

# Calculate SD from historical entries
sd_values <- sight_calc_stratum_sd(
  species = "Mule Deer",
  analysis_unit = "North Converse 755",
  survey_type = "Sightability"       # Filter to same survey type
)

print(sd_values)
# Returns stratum_id and pop_sd columns
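
Conceptually, a stratum-level standard deviation is the spread of counts among subunits within each stratum. The sketch below shows that idea on made-up data. The total_count column and the name entries_demo are hypothetical, and the package's actual calculation may differ (for example, in how it combines years or survey types).

```r
# Hypothetical historical counts, one row per surveyed subunit
entries_demo <- data.frame(
  stratum_id  = c(1, 1, 1, 2, 2, 2),
  total_count = c(4, 6, 8, 10, 30, 50)
)

# Sample standard deviation of counts within each stratum
sd_demo <- aggregate(total_count ~ stratum_id, data = entries_demo, FUN = sd)
names(sd_demo)[2] <- "pop_sd"

print(sd_demo)
```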

Complete Optimization Workflow

Here’s a complete example of the optimization workflow:

library(spdgt.sight)

# 1. Set up parameters
species <- "Mule Deer"
survey_type <- "Sightability"
analysis_unit <- "North Converse 755"
bio_year <- 2024

# 2. Optimize sample allocation for 100 samples
optimization <- sight_optimize_design(
  method = "fixed",
  value = 100,
  species = species,
  survey_type = survey_type,
  analysis_unit = analysis_unit,
  bio_year = bio_year
)

cat("Optimization Results:\n")         # cat() prints the label without the [1] prefix and quotes
print(optimization)

# 3. View stratum allocations
optimization |>
  dplyr::select(stratum_id, n_samples, sample_proportion, pop_sd)

Tips and Best Practices

Review historical data - Use sight_read_entries() to check what data is available
Filter by survey type - Use consistent survey types when calculating stratum SD (don’t mix sightability and composition surveys)
Review optimization - Check that allocations make sense before sampling
Use GRTS for spatial balance - Prefer GRTS over random for more even spatial coverage; use random for less taxi time
Cache IDs - Look up IDs once and reuse them for efficiency

Next Steps

After completing your survey, see vignette("model-fitting") to fit a sightability model and estimate abundance.