Reading Triplet Data
This package contains a set of functions to aid analysis of data from triadic comparisons or triplet tasks. See the tripletTools Overview vignette for demonstrations of the various tools.
This vignette describes how to read new data into R in a format that works with the package, and describes how the raw data file should be structured.
Data file structure and naming conventions
The functions make use of two kinds of data files: triplet data files, which contain information about each trial of a triadic comparison task, and embedding files, which contain the coordinates of each item in an embedding computed from triplet judgment data.
Triplet data files
Triplet data files are .csv files generated by the
software used to collect triplet judgment data. The first row should be
a header specifying column names. Each subsequent row then records key
information for each trial of a triplet experiment. Typically data from
all participants in a given study are included in a single triplet data
file.
The triplet data file must be a .csv file and must
contain columns with the following names:
- worker_id: Arbitrary identifier for each participant
- rt: Response time for the trial
- Center, Left, Right: Strings indicating the items appearing in the center (target item), left side (option 1) and right side (option 2).
- Answer: String indicating which option the participant chose.
- sampleAlg: Algorithm used to sample the item: either random, validation, or check.
- sampleSet: Indicates whether the triplet was used to fit the embedding (train) or not (test).
The data file can also contain any other fields. Often the data will include an integer encoding of the triplet information with the following column names:
- head, winner, loser: Integer indices for each item appearing in the triplet
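Before reading a new file, it can help to confirm that the required columns listed above are present. The following is a minimal sketch in base R and is not part of the package: the helper name `check_triplet_columns` and the temporary toy file are hypothetical, and the toy row reuses values from the package's example data.

```r
# Hypothetical helper (not part of tripletTools): return the names of any
# required columns that are missing from a triplet data file.
check_triplet_columns <- function(fname) {
  required <- c("worker_id", "rt", "Center", "Left", "Right",
                "Answer", "sampleAlg", "sampleSet")
  # Read the header and one data row; only the column names are needed
  cols <- names(read.csv(fname, nrows = 1, check.names = FALSE))
  setdiff(required, cols)
}

# Toy one-row file for illustration only
toy <- data.frame(worker_id = "3n7ggxph", rt = 3096,
                  Center = "pnhns", Left = "pncnb", Right = "pdcos",
                  Answer = "pncnb", sampleAlg = "random", sampleSet = "train")
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

missing_cols <- check_triplet_columns(tmp)
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

Extra columns such as head, winner, and loser are ignored by this check, consistent with the rule that the file may contain other fields.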
Data in this format can be read into the current session using the
function get_combined(fname) where fname is
the path to the data file. This function returns a named list, where
each element includes the triplet judgment data from a single subject,
and elements are named by the subject identifier. This package includes
an example dataset in this format, icon_triplets:
head(icon_triplets[[1]])
#> head winner loser worker_id rt Center Left Right Answer sampleAlg
#> 1 29 24 19 3n7ggxph 3096 pnhns pncnb pdcos pncnb random
#> 2 14 0 24 3n7ggxph 1100 fnmyb fdfob pncnb fdfob random
#> 3 30 19 24 3n7ggxph 2616 pnhob pncnb pdcos pdcos random
#> 4 17 12 13 3n7ggxph 2629 pdcns fnmow fnmob fnmob validation
#> 5 29 9 8 3n7ggxph 2011 pnhns fnfow fnfob fnfow random
#> 6 25 23 12 3n7ggxph 1498 pncns fnmob pdhos pdhos random
#> sampleSet
#> 1 train
#> 2 train
#> 3 train
#> 4 train
#> 5 train
#> 6 train
Here you can see the triplet judgment data for each trial for the first participant in the experiment. Participants viewed the trials in the same order they are listed in the data frame.
The data from each participant is a separate element in the list, and
the elements are labeled by the worker_id label in the raw
data file. You can see all the subject labels as follows:
names(icon_triplets)
#> [1] "3n7ggxph" "b5wma4no" "d8mmm1qn" "jn7bbjc0" "pbby694o" "sc2xbd6w"
To learn more about this dataset, try
help(icon_triplets).
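The named-list structure described above can be illustrated with base R alone. The sketch below is not the package's reader and may differ from what the package does internally. It only shows how a combined table divides into one named element per worker_id. The first two toy rows reuse values from the example data, while the rows for the second participant are made up for illustration.

```r
# Toy combined table: rows for "3n7ggxph" reuse values from the example data;
# rows for "b5wma4no" are invented for illustration.
combined <- data.frame(
  worker_id = c("3n7ggxph", "3n7ggxph", "b5wma4no", "b5wma4no"),
  rt        = c(3096, 1100, 2000, 1500),
  Center    = c("pnhns", "fnmyb", "pnhns", "fnmyb"),
  Left      = c("pncnb", "fdfob", "pncnb", "fdfob"),
  Right     = c("pdcos", "pncnb", "pdcos", "pncnb"),
  Answer    = c("pncnb", "fdfob", "pdcos", "pncnb")
)

# split() returns a named list with one data frame per participant
by_subject <- split(combined, combined$worker_id)

names(by_subject)          # participant identifiers
sapply(by_subject, nrow)   # number of trials per participant
```

Because each element is an ordinary data frame, per-participant summaries can be computed with sapply or lapply over the list.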
Embedding data
Embedding data are .csv files containing embedding
coordinates for each stimulus item in the study. Depending on the study,
there may be a single group embedding computed from a group of
participants, or individual embeddings computed separately for
each participant, or both.
In either case the .csv file must contain columns with
the following labels:
- item: A string indicating the label for the item.
- dim_0 - dim_k: One column for each dimension of the embedding, numbered beginning with zero, containing a numeric value that indicates the item’s location on the corresponding dimension of the embedding.
If a separate embedding was computed for each participant, then all
embeddings should appear within the same .csv file, and
this should also include the following column:
- worker_id: The random participant identifier, which should be the same as the identifier used in the triplet dataset.
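The column layout above can be checked with a few lines of base R. This is a rough sketch under stated assumptions rather than a package function: the toy table and its values (rounded from the package's example embedding) are for illustration only.

```r
# Toy embedding table in the expected layout (illustrative values)
emb <- data.frame(
  item  = c("fdfob", "fdfow", "fdfyb"),
  dim_0 = c(0.641, 0.550, 0.291),
  dim_1 = c(0.971, 0.956, 0.687),
  dim_2 = c(-0.934, -0.913, -0.636)
)

# Columns named dim_0, dim_1, ... hold the coordinates
dim_cols <- grep("^dim_[0-9]+$", names(emb), value = TRUE)
dim_idx  <- sort(as.integer(sub("^dim_", "", dim_cols)))

# There must be an item column, and the dimension columns must be numeric
# and numbered consecutively, beginning with zero
stopifnot("item" %in% names(emb),
          length(dim_idx) > 0,
          identical(dim_idx, seq_along(dim_idx) - 1L),
          all(sapply(emb[dim_cols], is.numeric)))
```

For a file holding individual embeddings, the same check could be extended with `"worker_id" %in% names(emb)`.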
To read in a group-based single embedding file you can just use standard R:
grpemb <- read.csv("filename.csv", row.names = "item", header = TRUE)
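To show what this call produces, the sketch below writes a small toy embedding to a temporary file and reads it back the same way. The file and its values are illustrative assumptions, not a real group embedding.

```r
# Write a toy group embedding file (illustrative values only)
emb <- data.frame(
  item  = c("fdfob", "fdfow"),
  dim_0 = c(0.641, 0.550),
  dim_1 = c(0.971, 0.956),
  dim_2 = c(-0.934, -0.913)
)
tmp <- tempfile(fileext = ".csv")
write.csv(emb, tmp, row.names = FALSE)

# row.names = "item" moves the item labels out of the table and into the row names
grpemb <- read.csv(tmp, row.names = "item", header = TRUE)

rownames(grpemb)     # item labels
grpemb["fdfob", ]    # coordinates for a single item
as.matrix(grpemb)    # numeric matrix of coordinates, one row per item
```

With the labels stored as row names, every remaining column is numeric, so the result converts cleanly to a coordinate matrix.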
For studies with separate embeddings computed for each participant,
use the get.combined function to read the data, setting the
eflag flag to TRUE to indicate these are
embeddings:
indemb <- get.combined("filename.csv", eflag = TRUE)
As with triplet data, this will create a named list where each
element contains the embedding information computed for one participant.
The elements are labeled by the participant id (worker_id).
The icon_emb_ind object contains a list of the kind
returned by this function:
head(icon_emb_ind[[1]])
#> dim_0 dim_1 dim_2
#> fdfob 0.6411938 0.9710717 -0.9336048
#> fdfow 0.5504593 0.9558654 -0.9130039
#> fdfyb 0.2907846 0.6866032 -0.6360701
#> fdfyw 0.5820549 0.9266087 -0.8992642
#> fdmob 0.5776460 1.0081034 -0.8230091
#> fdmow 0.7911357 0.4666237 -0.4759873
The row names indicate the stimulus identity and the entries indicate the coordinates of the stimulus along the first (dim_0), second (dim_1), and third (dim_2) dimensions of the embedding.