Run the embedding pipeline from a triplet data list: run_embeddings_from_list

A convenience wrapper around run_embeddings for triplet data that is already loaded into R. It takes a named list in the format returned by get.combined instead of reading the data from CSV files.

Usage

run_embeddings_from_list(
  triplet_list,
  output_dir,
  d = 5L,
  max_epochs = 50000L,
  tolerance = 1e-04,
  tol_window = 10000L,
  seed = 222L,
  device = NULL
)

Arguments

triplet_list: A named list of data frames, one per participant, as returned by get.combined. Each data frame must contain the columns worker_id, Center, Left, Right, Answer, sampleAlg, and sampleSet.
output_dir: Path to the directory where output CSV files will be saved. Created automatically if it does not already exist.
d: Number of embedding dimensions. Default 5.
max_epochs: Maximum number of training epochs. Default 50000.
tolerance: Loss tolerance for early stopping. Default 1e-4.
tol_window: Epochs without improvement before early stopping triggers. Default 10000.
seed: Integer random seed for reproducibility. Default 222.
device: PyTorch device string, or NULL (default) to auto-select: CUDA GPU if available, then Apple MPS, then CPU. Pass "cpu" to force CPU even on a GPU machine.
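
To make the expected shape of triplet_list concrete, here is a minimal sketch that builds a toy input and checks it for the required columns. The item names, worker IDs, the "random" value of sampleAlg, and the idea that Answer holds the chosen item's name are all illustrative assumptions; only the column names and the "check", "train", and "test" values come from this page.

```r
# Hypothetical toy data: two participants, four made-up item names.
# Column names follow the Arguments section; the cell values are assumptions.
make_participant <- function(id) {
  data.frame(
    worker_id = id,
    Center    = c("apple", "bird", "car", "apple"),
    Left      = c("bird", "car", "dog", "car"),
    Right     = c("car", "dog", "apple", "dog"),
    Answer    = c("bird", "dog", "apple", "car"),  # assumed: name of the chosen item
    sampleAlg = c("random", "random", "random", "check"),
    sampleSet = c("train", "train", "test", "train"),
    stringsAsFactors = FALSE
  )
}

toy_triplets <- list(
  w01 = make_participant("w01"),
  w02 = make_participant("w02")
)

# Check every participant's data frame for the required columns.
required_cols <- c("worker_id", "Center", "Left", "Right",
                   "Answer", "sampleAlg", "sampleSet")
has_all_cols <- vapply(
  toy_triplets,
  function(df) all(required_cols %in% names(df)),
  logical(1)
)
stopifnot(all(has_all_cols))
```

A check like this before calling run_embeddings_from_list can surface a missing column early, before any training starts.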

Value

A named list with three elements:

individual: Named list of numeric matrices, one per participant. Each matrix has one row per item (with item names as row names) and d columns (dim_0, dim_1, …).
group: Numeric matrix of the group-level embedding, with item names as row names and d columns.
history: Data frame with one row per worker (plus "group") containing training diagnostics: worker_id, lowest_loss, epoch, counter_from_last_update, n_train_triplets, n_test_triplets.
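
The lowest_loss, epoch, and counter_from_last_update columns suggest a patience-style early-stopping rule governed by tolerance and tol_window. The sketch below shows one plausible version of such a rule in R. It is an assumption made for illustration, not the package's Python training loop: the function early_stop_sketch, the reading of epoch as the epoch at which the lowest loss was recorded, and the exact comparison against tolerance are all hypothetical.

```r
# Hypothetical patience-style early stopping; not the package's actual code.
# An epoch counts as an improvement only if it lowers the best loss by more
# than `tolerance`; training stops after `tol_window` epochs without one.
early_stop_sketch <- function(losses, tolerance = 1e-4, tol_window = 3L) {
  stopifnot(length(losses) > 0)
  lowest_loss <- Inf
  best_epoch  <- 0L
  counter     <- 0L
  for (epoch in seq_along(losses)) {
    if (lowest_loss - losses[epoch] > tolerance) {
      lowest_loss <- losses[epoch]
      best_epoch  <- epoch
      counter     <- 0L
    } else {
      counter <- counter + 1L
    }
    if (counter >= tol_window) break
  }
  list(
    lowest_loss              = lowest_loss,
    epoch                    = best_epoch,
    counter_from_last_update = counter,
    stopped_at               = epoch
  )
}

# Made-up loss curve: improvements after epoch 3 are smaller than the tolerance.
res <- early_stop_sketch(c(1.0, 0.5, 0.4, 0.39999, 0.39998, 0.39997, 0.3))
```

With this toy loss curve the loop stops at epoch 6, so the later drop to 0.3 is never seen. That is the usual trade-off with a short window, and under this reading a large tol_window relative to max_epochs makes early stopping less likely to trigger.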

Details

The function maps item names to zero-based integer indices that are consistent across participants (assigned in alphabetical order of item name), writes temporary CSV files, calls the Python embedding pipeline, and returns the results in the standard tripletTools format.

Item indexing

All unique item names appearing in the Center, Left, and Right columns across all participants are collected and sorted alphabetically. Each item's zero-based index in this sorted list is used as the integer index for the Python model. The same ordering is applied to all participants so that indices are consistent across workers.
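
The indexing scheme described above can be sketched in a few lines of R. This is an illustration under assumptions, not the package's internal code: the toy data and the helper to_index are hypothetical, and sort(method = "radix") is used here only to get a locale-independent ordering, which may differ from the package's own sort for mixed-case names.

```r
# Hypothetical toy input with only the three item columns.
toy <- list(
  w01 = data.frame(Center = "dog",  Left = "apple", Right = "car",
                   stringsAsFactors = FALSE),
  w02 = data.frame(Center = "bird", Left = "dog",   Right = "apple",
                   stringsAsFactors = FALSE)
)

# Collect every item name seen in Center, Left, or Right across participants.
all_items <- unlist(
  lapply(toy, function(df) c(df$Center, df$Left, df$Right)),
  use.names = FALSE
)

# Sort the unique names and assign zero-based indices.
items      <- sort(unique(all_items), method = "radix")
item_index <- setNames(seq_along(items) - 1L, items)

# Apply the same lookup table to every participant.
to_index <- function(df) {
  for (col in c("Center", "Left", "Right")) {
    df[[col]] <- unname(item_index[df[[col]]])
  }
  df
}
indexed <- lapply(toy, to_index)
```

Because the lookup table is built once from the pooled item names, an item gets the same index for every participant even when a given participant never saw it.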

Filtering

Trials with sampleAlg == "check" are attention checks that do not reflect genuine similarity judgments, so they are excluded before the embedding is fitted. The sampleSet column, which marks each trial as "train" or "test", must be present and is passed through unchanged.
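
As a rough illustration of this filtering step, the sketch below drops attention-check trials from a toy data frame and counts the remaining train and test trials. The data and the "random" value of sampleAlg are made up, and the assumption that the counts reported in history are taken after filtering is not confirmed by this page.

```r
# Hypothetical trials for one participant; values are illustrative only.
toy <- data.frame(
  Center    = c("apple", "bird", "car"),
  sampleAlg = c("random", "check", "random"),
  sampleSet = c("train", "train", "test"),
  stringsAsFactors = FALSE
)

# Drop attention-check trials; sampleSet is left untouched.
kept <- toy[toy$sampleAlg != "check", , drop = FALSE]

# Count what remains in each split.
n_train <- sum(kept$sampleSet == "train")
n_test  <- sum(kept$sampleSet == "test")
```

Running the same filter on your own data before fitting shows how many trials each participant actually contributes to training and testing.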

Examples

if (FALSE) { # \dontrun{
results <- run_embeddings_from_list(
  triplet_list = icon_triplets,
  output_dir   = "embeddings_output",
  d            = 3L,
  max_epochs   = 50000L
)

# Group embedding
head(results$group)

# First participant's individual embedding
head(results$individual[[1]])

# Training diagnostics
results$history
} # }