Title: | Zooarchaeological Analysis with Log-Ratios |
---|---|
Description: | Includes functions and reference data to generate and manipulate log-ratios (also known as log size index (LSI) values) from measurements obtained on zooarchaeological material. Log ratios are used to compare the relative (rather than the absolute) dimensions of animals from archaeological contexts (Meadow 1999, ISBN: 9783896463883). zoolog is also able to seamlessly integrate data and references with heterogeneous nomenclature, which is internally managed by a zoolog thesaurus. A preliminary version of the zoolog methods was first used by Trentacoste, Nieto-Espinet, and Valenzuela-Lamas (2018) <doi:10.1371/journal.pone.0208109>. |
Authors: | Jose M Pozo [aut, cre] , Angela Trentacoste [aut] , Ariadna Nieto-Espinet [aut] , Silvia Guimarães Chiarelli [aut] , Silvia Valenzuela-Lamas [aut] |
Maintainer: | Jose M Pozo <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1.001 |
Built: | 2024-11-11 03:29:14 UTC |
Source: | https://github.com/josempozo/zoolog |
Function to build a reference dataframe by selecting one source for each taxon from the specimens available in the references' database.
AssembleReference( combination, ref.db = referencesDatabase, thesaurus = zoologThesaurus$taxon )
combination |
A dataframe or named list. Each (column) name identifies a taxon. Each column or list element must contain a single character value identifying one of the sources included in the references' database. |
ref.db |
A reference database. This is a named list of named lists of dataframes. The first level is named by taxon and the second level by reference source. Each dataframe includes the reference for the corresponding taxon and source. The default, referencesDatabase, is the database provided with zoolog. |
thesaurus |
A thesaurus for taxa. |
A reference dataframe.
## `referenceSets` includes a series of predefined reference compositions.
referenceSets
## The package variable `reference` is actually built from them.
## We can rebuild any of them:
referenceCombi <- AssembleReference(referenceSets["Combi", ])
## Define an alternative reference combining the references' database
## differently:
refComb <- list(cattle = "Nieto", sheep = "Davis", Goat = "Clutton",
                pig = "Albarella", redDeer = "Basel")
userReference <- AssembleReference(refComb)
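The selection that AssembleReference performs can be pictured as a lookup in a two-level named list (taxon, then source), followed by row-binding the chosen dataframes. The base-R sketch below is only an illustration under stated assumptions. `assembleSketch` and `toyDatabase` are hypothetical names, and the toy values are invented. Unlike AssembleReference, the sketch matches taxon names literally instead of through the thesaurus.

```r
## Hypothetical sketch, not zoolog's implementation: pick one source per
## taxon from a database shaped as database[[taxon]][[source]] and bind rows.
assembleSketch <- function(combination, database) {
  combination <- as.list(combination)
  pieces <- lapply(names(combination), function(taxon) {
    src <- as.character(combination[[taxon]])
    # Skip taxa without a selected source (empty or NA cells)
    if (length(src) != 1 || is.na(src) || src == "") return(NULL)
    database[[taxon]][[src]]
  })
  do.call(rbind, pieces)  # NULL pieces are dropped by rbind
}

## Toy database with invented values (millimetres), for illustration only.
toyDatabase <- list(
  "Bos taurus" = list(
    SourceA = data.frame(TAX = "Bos taurus", EL = "humerus",
                         Measure = c("GL", "Bd"), Standard = c(100, 50))
  ),
  "Ovis aries" = list(
    SourceB = data.frame(TAX = "Ovis aries", EL = "humerus",
                         Measure = c("GL", "Bd"), Standard = c(60, 30))
  )
)

toyReference <- assembleSketch(
  list("Bos taurus" = "SourceA", "Ovis aries" = "SourceB", "Sus scrofa" = NA),
  toyDatabase
)
toyReference
```

Because the taxon without a selected source is skipped, the sketch behaves like the empty cells of a referenceSets row.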
This function condenses the calculated log-ratio values into a reduced number of features. It groups the log-ratio values and selects or calculates one feature value per group. By default each group represents a single dimension, i.e. Length, Width and Depth. Only one feature is extracted per group.
Currently, two built-in methods are available: priority (default) and average.
CondenseLogs( data, grouping = list(Length = c("GL", "GLl", "GLm", "HTC"), Width = c("BT", "Bd", "Bp", "SD", "Bfd", "Bfp"), Depth = c("Dd", "DD", "BG", "Dp")), method = "priority" )
data |
A dataframe with the input measurements. |
grouping |
A named list of character vectors, one per selected group. Each vector lists the group's measurements in order of priority. By default the groups are Length, Width and Depth, with the prioritization given below. |
method |
Character string indicating which method to use for extracting the condensed features. Currently accepted methods: "priority" (default) and "average". A user-defined function can also be given (see below). |
This operation is motivated by two circumstances. First, not all measurements are available for every bone specimen, which hinders their direct comparison and statistical analysis. Second, several measurements can be strongly correlated (e.g. SD and Bd both represent bone width). Treating them as independent would therefore over-represent bone remains that have more measurements per axis. Condensing each group of measurements into a single feature (e.g. one measure per axis) mitigates both problems.
An important property of log-ratios with respect to a reference is that they make different measures comparable. For instance, suppose a bone is scaled with respect to the reference so that its width homogeneously doubles. Then all width-related measures (BT, Bd, Bp, SD, ...) give the same log-ratio, log10(2) ≈ 0.301. In contrast, the absolute measures are not directly comparable.
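A quick numerical check of this property uses invented reference widths (the numbers are arbitrary). A specimen that is exactly twice the reference in every width gives the same decimal log-ratio, log10(2), for each measure. The absolute differences from the reference, by contrast, all differ.

```r
## Invented reference widths in millimetres (illustration only).
refWidths <- c(BT = 30, Bd = 40, Bp = 45, SD = 20)
## A specimen uniformly scaled by a factor of 2 with respect to the reference.
specimen <- 2 * refWidths

absoluteDiff <- specimen - refWidths        # 30, 40, 45, 20: not comparable
logRatios <- log10(specimen / refWidths)    # all equal to log10(2)
round(logRatios, 3)                         # 0.301 for every measure
```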
The measurement names in the grouping list are given without the logPrefix, but the selection is made from the corresponding log-ratio columns.
The default method, "priority", selects the first available measure log-ratio in each group. The method "average" extracts the mean per group, ignoring the measures that are not available.
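The two built-in rules can be illustrated on a small matrix of log-ratios whose columns are already ordered by priority. This is a hypothetical base-R sketch: `condenseSketch` is not a zoolog function, and the log-ratio values are invented. CondenseLogs itself works on the log-prefixed columns of a dataframe.

```r
## Hypothetical sketch of the two built-in rules, not zoolog's code.
## `logs` is a numeric matrix of log-ratios whose columns are already
## ordered by priority; NA marks a measure that is not available.
condenseSketch <- function(logs, method = c("priority", "average")) {
  method <- match.arg(method)
  apply(logs, 1, function(row) {
    available <- row[!is.na(row)]
    if (length(available) == 0) return(NA_real_)   # nothing to condense
    if (method == "priority") available[[1]] else mean(available)
  })
}

## Invented width log-ratios for three specimens (columns BT, Bd, Bp, SD).
widthLogs <- rbind(
  c(BT = NA,   Bd = 0.10, Bp = 0.12, SD = NA),
  c(BT = 0.05, Bd = 0.07, Bp = NA,   SD = 0.09),
  c(BT = NA,   Bd = NA,   Bp = NA,   SD = NA)
)
condenseSketch(widthLogs, "priority")  # 0.10, 0.05 and NA
condenseSketch(widthLogs, "average")   # about 0.11, 0.07 and NA
```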
The default groups and priorities are:
For lengths, the order of priority is: GL, GLl, GLm, HTC.
For widths, the order of priority is: BT, Bd, Bp, SD, Bfd, Bfp.
For depths, the order of priority is: Dd, DD, BG, Dp.
This order maximises the robustness and reliability of the measurements, as priority is given to the most abundant, most replicable, and least age-dependent measurements.
This method was first used in: Trentacoste, A., Nieto-Espinet, A., & Valenzuela-Lamas, S. (2018). Pre-Roman improvements to agricultural production: Evidence from livestock husbandry in late prehistoric Italy. PLoS ONE, 13(12), e0208109.
Alternatively, a user-defined method can be provided as a function with a single argument (a data.frame). That data.frame is assumed to have as columns the measure log-ratios determined by the grouping.
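As an illustration, a per-row median could be supplied as such a user-defined method. This sketch rests on an assumption not stated above: that the function is expected to return one value per row of the data.frame it receives. `medianMethod` is a hypothetical name, and the call in the final comment should be checked against the package documentation.

```r
## Hypothetical user-defined condensing method: per-row median of the
## available log-ratios (NA when none is available). Assumes one value
## per row must be returned.
medianMethod <- function(logsDF) {
  apply(as.matrix(logsDF), 1, function(row) {
    if (all(is.na(row))) NA_real_ else stats::median(row, na.rm = TRUE)
  })
}

## Toy input with invented log-ratios, in the layout described above.
toyLogs <- data.frame(logBd = c(0.10, NA, NA),
                      logBp = c(0.30, 0.02, NA),
                      logSD = c(NA, 0.04, NA))
unname(medianMethod(toyLogs))   # about 0.20, 0.03 and NA
## Possible use (assumption): CondenseLogs(dataExampleWithLogs, method = medianMethod)
```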
A dataframe including the input dataframe and additional columns, one for each extracted condensed feature, with the corresponding name given in grouping.
## Read an example dataset:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## For illustration purposes we keep only a subset of cases to make
## the example run sufficiently fast.
## Skip this step if you want to process the full example dataset.
dataExample <- dataExample[1:1000, ]
## Compute the log-ratios and select the cases with available log-ratios:
dataExampleWithLogs <- RemoveNACases(LogRatios(dataExample))
## Show the first lines (excluding some columns for visibility):
head(dataExampleWithLogs)[, -c(6:20, 32:63)]
## Extract the default condensed features with the default "priority" method:
dataExampleWithSummary <- CondenseLogs(dataExampleWithLogs)
head(dataExampleWithSummary)[, -c(6:20, 32:63)]
## Extract only width with the "average" method:
dataExampleWithSummary2 <- CondenseLogs(dataExampleWithLogs,
                                        grouping = list(Width = c("BT", "Bd", "Bp", "SD")),
                                        method = "average")
head(dataExampleWithSummary2)[, -c(6:20, 32:63)]
The dataset provided as an example originates from (Valenzuela-Lamas 2008). The dataset is written in Catalan, with the exception of some headings to facilitate understanding of its contents.
The dataset is provided in the zoolog extdata folder as a file in semicolon-separated values format, compressed with gzip to reduce its size: dataValenzuelaLamas2008.csv.gz
The file is provided in UTF-8 encoding. The encoding matters because the dataset contains accents and special characters that need to be displayed correctly. It can be read directly with utils::read.csv2, provided that the correct encoding is set (see examples below).
Every row of the data.frame refers to one individual bone fragment unless otherwise stated in the Observations field ("Observacions").
All the measurements are expressed in millimetres and were obtained with a manual calliper.
The main headings in the database are:
The faunal remains from three Iron Age archaeological sites were recorded (ALP = Alorda Park, TFC = Turó de la Font de la Canya, OLD = Olèrdola).
A correlative number for each fragment.
Refers to the Stratigraphic Unit (SU in English).
Refers to the species.
Refers to the skeletal element.
Refers to the preserved part in the vertical axis (distal, proximal, diaphysis, etc.).
Bone laterality: right (d) or left (e).
Refers to the preserved part in relation to the circumference (c), or to fragments broken vertically, transversally or obliquely (sto).
Refers to fractures produced during field excavation or lab work.
Refers to anthropic and post-depositional alterations.
Refers to degree of bone alteration in a scale from 0 (no alteration) to 4 (diaphysis completely altered).
Degree of fusion: s = fused, ns = unfused, ec = fusion visible. Tooth wear is also recorded here, following Gardeisen (1997).
Sex: male (masc) / female (fem).
Refers to butchery marks. It may also include other observations.
Observations.
Refers to the number of silo structure (e.g. SJ8) or the room (e.g. AB) from which the material originates.
Absolute chronology: terminus post quem.
Absolute chronology: terminus ante quem.
Chronological phasing.
Box number that contains the item.
The nomenclature follows Von den Driesch (1976).
Gardeisen A (1997).
“Exploitation des prélèvements et fichiers de spécialité (PRL, FAUNE, OS).”
Lattara, 10, 251–278.
Valenzuela-Lamas S (2008).
Alimentació i ramaderia al Penedès durant la protohistòria (segles VII-III aC).
Societat Catalana d'Arqueologia (Premi d'Arqueologia - Memorial Josep Barberà i Farràs, 5a edició).
http://www.scarqueologia.com/?page_id=10.
Von den Driesch A (1976).
A guide to the measurement of animal bones from archaeological sites: as developed by the Institut für Palaeoanatomie, Domestikationsforschung und Geschichte der Tiermedizin of the University of Munich, volume 1.
Peabody Museum Press.
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
Function to check whether an element belongs to a category according to a thesaurus. It is similar to %in% and is.element, returning a logical vector that indicates whether each element of a given vector is included in a given set. However, InCategory checks for equality taking into account the equivalences defined in the given thesaurus.
InCategory(x, category, thesaurus)
x |
Character vector to be checked for its inclusion in the category. |
category |
Character vector identifying the categories in which the inclusion of x is checked. |
thesaurus |
A thesaurus object. |
A logical vector of the same length as x. Each value answers the question: does the corresponding element in x belong to any of the thesaurus categories identified by category?
InCategory(c("sheep", "cattle", "goat", "red deer"), c("ovis", "capra"), zoologThesaurus$taxon)
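The idea behind InCategory can be sketched with a simplified thesaurus, represented here as a plain named list of synonym vectors. This representation, `toyThesaurus` and `inCategorySketch` are hypothetical and do not reproduce the structure of zoologThesaurus. The sketch only shows how membership is tested through equivalent names instead of exact string equality.

```r
## Hypothetical toy thesaurus: each element groups equivalent names.
toyThesaurus <- list(
  ovis  = c("ovis", "sheep", "Ovis aries"),
  capra = c("capra", "goat", "Capra hircus"),
  bos   = c("bos", "cattle", "Bos taurus")
)

## Sketch: x belongs to `category` if it is equivalent (in the thesaurus)
## to any of the category names. Comparison ignores case and spaces at ends.
inCategorySketch <- function(x, category, thesaurus) {
  norm <- function(s) tolower(trimws(s))
  hit <- vapply(thesaurus,
                function(terms) any(norm(terms) %in% norm(category)),
                logical(1))
  accepted <- norm(unlist(thesaurus[hit]))
  norm(x) %in% c(accepted, norm(category))
}

inCategorySketch(c("sheep", "cattle", "goat", "red deer"),
                 c("ovis", "capra"), toyThesaurus)   # TRUE FALSE TRUE FALSE
```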
Function to compute the (base 10) log-ratios of the measurements relative to standard reference values. The default reference and several alternative references are provided with the package, but users can also supply their own references.
LogRatios( data, ref = reference$Combi, identifiers = c("Taxon", "Element"), refMeasuresName = "Measure", refValuesName = "Standard", thesaurusSet = zoologThesaurus, taxonomy = zoologTaxonomy, joinCategories = NULL, mergedMeasures = NULL, useGenusIfUnambiguous = TRUE )
data |
A dataframe with the input measurements. |
ref |
A dataframe including the measurement values used as references. The default is reference$Combi. |
identifiers |
A vector of column names in both data and ref identifying the bone type (by default, taxon and skeletal element). |
refMeasuresName |
The column name in ref identifying the measurement type. |
refValuesName |
The column name in ref containing the reference values. |
thesaurusSet |
A thesaurus allowing datasets with different nomenclatures to be merged. By default, zoologThesaurus. |
taxonomy |
A taxonomy allowing the automatic matching of data and reference cases that share the same genus (or higher taxonomic rank) but belong to different species. By default, zoologTaxonomy. |
joinCategories |
A list of named character vectors. Each vector is named
by a category in the reference and includes a set of categories in the data
for which to compute the log ratios with respect to that reference.
When |
mergedMeasures |
A list of character vectors or a single character vector. Each vector identifies a set of measures that the data records merged into a single column, named after any of them. This practice only makes sense if at most one of the measures applies to each bone element. |
useGenusIfUnambiguous |
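Boolean. If TRUE (default), data and reference are matched by genus when the reference includes only one species of that genus. |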
Each log ratio is defined as the decimal logarithm of the ratio of the variable of interest to a corresponding reference value.
The identifiers are expected to determine corresponding columns in both data and reference. Each combination of values in these columns identifies the type of bone, which by default is determined by a taxon and a bone element. For any case in the data, the log-ratios are computed with respect to the reference values for the same bone type. If the reference does not include that bone type, the corresponding log-ratios are set to NA.
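The computation described above can be sketched in base R. For each measure, the sketch looks up the reference value of the same bone type (here, the same taxon and element) and takes the decimal logarithm of the ratio. Bone types absent from the reference yield NA. The toy data, the values, and `logRatioSketch` are hypothetical. Unlike LogRatios, the sketch does no thesaurus or taxonomy matching. The "log" prefix mirrors the one mentioned in the examples.

```r
## Hypothetical toy reference and data (values invented, in millimetres).
ref <- data.frame(Taxon = c("cattle", "cattle", "sheep"),
                  Element = c("humerus", "humerus", "humerus"),
                  Measure = c("GL", "Bd", "Bd"),
                  Standard = c(300, 80, 30))
dat <- data.frame(Taxon = c("cattle", "sheep", "pig"),
                  Element = c("humerus", "humerus", "humerus"),
                  GL = c(270, NA, 210),
                  Bd = c(88, 27, 40))

## Sketch of the log-ratio rule: log10(measure / reference value of the
## same bone type); bone types missing from the reference give NA.
logRatioSketch <- function(data, ref, measures, prefix = "log") {
  dataKey <- paste(data$Taxon, data$Element, sep = "|")
  for (m in measures) {
    refM <- ref[ref$Measure == m, ]
    std <- refM$Standard[match(dataKey, paste(refM$Taxon, refM$Element, sep = "|"))]
    data[[paste0(prefix, m)]] <- log10(data[[m]] / std)
  }
  data
}

out <- logRatioSketch(dat, ref, c("GL", "Bd"))
out[, c("Taxon", "logGL", "logBd")]
```

The pig row gets NA in both new columns because the toy reference has no pig humerus.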
The taxonomy allows data and reference to be matched by genus instead of by species. This is the default behaviour with useGenusIfUnambiguous = TRUE, unless there is some ambiguity, i.e. the reference includes more than one species of the same genus. For instance, reference$Combi includes a reference for Sus scrofa. If the data includes cases of Sus domesticus, their log-ratios will be computed with respect to the provided reference for Sus scrofa. A warning is given to inform the user of this assumption and to point out that it can be prevented by setting useGenusIfUnambiguous = FALSE.
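The genus rule can be sketched as follows, under a strong simplification: the genus is read as the first word of a binomial name, whereas zoolog relies on zoologTaxonomy. `matchByGenus` and `genusOf` are hypothetical helpers. An exact species match is used when available. Otherwise the genus is used only if the reference contains a single species of that genus, and NA marks the ambiguous case.

```r
## Hypothetical helper: genus taken as the first word of a binomial name.
genusOf <- function(species) sub(" .*$", "", species)

## Sketch of useGenusIfUnambiguous = TRUE: exact species match first,
## otherwise the unique reference species of the same genus, else NA.
matchByGenus <- function(dataTaxa, refTaxa) {
  refGenus <- genusOf(refTaxa)
  vapply(dataTaxa, function(tx) {
    if (tx %in% refTaxa) return(tx)
    candidates <- refTaxa[refGenus == genusOf(tx)]
    if (length(candidates) == 1) candidates else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

refTaxa <- c("Bos taurus", "Sus scrofa", "Ovis aries", "Ovis orientalis")
matchByGenus(c("Sus domesticus", "Bos taurus", "Ovis musimon"), refTaxa)
## "Sus scrofa" "Bos taurus" NA  (two Ovis species: ambiguous)
```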
For some applications it can be useful to group a set of bone types into the same reference category to compute the log-ratios. The parameter joinCategories allows this grouping. joinCategories must be a named list of character vectors, each including the set of data categories to be mapped to the reference category given by its name.
This can be applied to group different species into a single reference species. For instance, sheep, goat, and doubtful cases between both (sheep/goat) can be grouped and matched to the same reference for sheep by setting joinCategories = list(sheep = c("sheep", "goat", "oc")).
Indeed, zoologTaxonomy can be used for that purpose through the function SubtaxonomySet, as in joinCategories = list(sheep = SubtaxonomySet("Caprini")).
Similarly, joinCategories can be applied to group different bone elements into a single reference (see the example below for undetermined phalanges).
Note that the joinCategories option does not remove the distinction between the different bone types in the data. It only indicates that the log-ratios of all of them must be computed from the same reference.
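The mapping implied by joinCategories can be sketched as a lookup from data categories to reference categories, with unlisted categories mapped to themselves. `toRefCategory` is a hypothetical helper, not part of zoolog. Consistent with the note above, the sketch returns the reference category to use for each case and leaves the original category untouched.

```r
## Hypothetical helper: map each data category to the reference category
## under which its log-ratios are computed; unlisted categories map to
## themselves. The input vector itself is not modified.
toRefCategory <- function(x, joinCategories) {
  lookup <- rep(names(joinCategories), lengths(joinCategories))
  names(lookup) <- unlist(joinCategories, use.names = FALSE)
  mapped <- unname(lookup[x])
  ifelse(is.na(mapped), x, mapped)
}

joinCategories <- list(sheep = c("sheep", "goat", "oc"))
taxa <- c("goat", "oc", "cattle", "sheep")
toRefCategory(taxa, joinCategories)  # "sheep" "sheep" "cattle" "sheep"
```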
Using the taxonomy, cases identified only to higher taxonomic ranks are also automatically detected. For instance, if some partially identified cases have been recorded as "Ovis/Capra", this is recognized as denoting the tribe Caprini, which includes several possible species. A warning then informs the user that these cases were detected. It also mentions the option of using any of the corresponding species in the reference through the argument joinCategories (unless this has already been done).
Some measures are, for most common taxa, restricted to a subset of bones. For instance, for Bos, Ovis, Capra, and Sus, the measure GLl is only relevant for the astragalus, while GL is not applicable to it. Thus, there cannot be any ambiguity between both measures, since they can be distinguished by the bone element. This is why some users have simplified datasets in which a single column records either GL or GLl. The optional parameter mergedMeasures facilitates the processing of this type of simplified dataset. For the example above, mergedMeasures = list(c("GL", "GLl")) automatically selects, for each bone element, the corresponding measure present in the reference.
Observe that if mergedMeasures is set to measures that are not mutually exclusive, the behaviour is unpredictable.
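For the GL/GLl example, the selection can be sketched as follows. The skeletal element determines, through the reference, which of the merged measures applies to the value stored in the single GL column. The data and reference values are invented. The code is a hypothetical illustration of the idea, not the LogRatios implementation.

```r
## Hypothetical reference: GL applies to the femur, GLl to the astragalus.
ref <- data.frame(Element = c("femur", "astragalus"),
                  Measure = c("GL", "GLl"),
                  Standard = c(200, 30))
## Simplified data: one column "GL" records GL or GLl depending on the bone.
dat <- data.frame(Element = c("femur", "astragalus"), GL = c(180, 33))

merged <- c("GL", "GLl")                       # the mutually exclusive set
refM <- ref[ref$Measure %in% merged, ]
idx <- match(dat$Element, refM$Element)        # reference row per element
dat$measureUsed <- refM$Measure[idx]
dat$logMerged <- log10(dat$GL / refM$Standard[idx])
dat
```

If an element had reference values for both merged measures, match() would silently pick the first one. This illustrates why merging measures that are not mutually exclusive gives unpredictable results.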
A dataframe including the input dataframe and additional columns, one for each extracted log-ratio for each relevant measurement in the reference. The names of the added columns are constructed by prefixing each measurement with the internal variable logPrefix.
If the input dataframe includes additional S3 classes (such as "tbl_df"), they are also passed to the output.
## Read an example dataset:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## For illustration purposes we keep only a subset of cases to make
## the example run sufficiently fast.
## Skip this step if you want to process the full example dataset.
dataExample <- dataExample[1:400, ]
## Show the first lines (excluding some columns for visibility):
head(dataExample)[, -c(6:20, 32:64)]
## Compute the log-ratios with respect to the default reference in the
## package zoolog:
dataExampleWithLogs <- LogRatios(dataExample)
## The output data frame includes new columns with the log-ratios of the
## measurements present in both data and reference, with a "log" prefix:
head(dataExampleWithLogs)[, -c(6:20, 32:64)]
## Compute the log-ratios with respect to a different reference:
dataExampleWithLogs2 <- LogRatios(dataExample, ref = reference$Basel)
head(dataExampleWithLogs2)[, -c(6:20, 32:64)]
## Define an alternative reference combining the references' database
## differently:
refComb <- list(cattle = "Nieto", sheep = "Davis", Goat = "Clutton",
                pig = "Albarella", redDeer = "Basel")
userReference <- AssembleReference(refComb)
## Compute the log-ratios with respect to this alternative reference:
dataExampleWithLogs3 <- LogRatios(dataExample, ref = userReference)
## We may want to include the first and second phalanges without
## anterior-posterior identification ("phal 1" and "phal 2") by computing
## their log-ratios with respect to the reference of the corresponding
## anterior phalanges ("phal 1 ant" and "phal 2 ant", respectively).
## For this we use the optional argument joinCategories:
categoriesPhalAnt <- list('phal 1 ant' = c("phal 1 ant", "phal 1"),
                          'phal 2 ant' = c("phal 2 ant", "phal 2"))
dataExampleWithLogs4 <- LogRatios(dataExample, joinCategories = categoriesPhalAnt)
head(dataExampleWithLogs4)[, -c(6:20, 32:64)]
Several osteometrical references are provided in zoolog to enable researchers to use the one of their choice. The user can also use their own osteometrical reference if preferred.
reference referenceSets referencesDatabase
Each reference is a data.frame including 4 columns:
TAX: The taxon to which each reference bone belongs.
EL: The skeletal element.
Measure: The type of measurement taken on the bone.
Standard: The value of the measurement taken on the bone. All the measurements are expressed in millimetres.
referenceSets: An object of class data.frame with 4 rows and 15 columns.
referencesDatabase: An object of class list of length 15.
Currently, the references include reference values for the main domesticates and their agriotypes (Bos, Ovis, Capra, Sus), and other less frequent species, such as red deer and donkey, drawn from the following publications and resources:
Bos taurus. Female cow dated to the Early Bronze Age (Minferri, Catalonia), in Nieto-Espinet (2018).
Bos taurus. Inv.nr. 2426 (Hinterwälder; female; 17 years old; live weight: 340 kg; withers height: 113 cm), from Stopp and Deschler-Erb (2018).
Bos taurus. Standard values from means of cattle measures from Period II (Late Iron Age to Romano-British transition) of Elms Farm, Heybridge (Johnstone and Albarella 2002).
Bos primigenius. Female aurochs from Degerbøl and Fredskild (1970). Non-standard measures converted to more standard ones (Von den Driesch 1976).
Bos primigenius. Female aurochs from Steppan (2001). Same specimen as in Degerbøl and Fredskild (1970), but with new and more standard measures (Von den Driesch 1976). Mean measurements from left and right bones when available.
Ovis aries. Mean values of measurements from a group of adult female Shetland sheep skeletons from a single flock (Davis 1996).
Ovis aries. Mean measurements from a group of male Soay sheep of known age (Clutton-Brock et al. 1990).
Ovis musimon. Inv.nr. 2266 (male; adult), from Stopp and Deschler-Erb (2018).
Ovis orientalis. Field Museum of Chicago catalogue number: FMC 57951 (female; western Iran) from Uerpmann and Uerpmann (1994).
Capra hircus. Inv.nr. 1597 (male; adult), from Stopp and Deschler-Erb (2018).
Capra hircus. Mean measurements from a group of goats of unknown age and sex (Clutton-Brock et al. 1990).
Capra aegagrus. Measurements based on female and male Capra aegagrus, Natural History Museum in London number: BMNH 651 M and L2 (Taurus Mountains in southern Turkey) from Uerpmann and Uerpmann (1994).
Sus domesticus. Mean measurements from a group of Late Neolithic pigs from Durrington Walls, England (Albarella and Payne 2005).
Sus scrofa. Inv.nr. 1446 (male; 2-3 years old; live weight: 120 kg) from Stopp and Deschler-Erb (2018).
Sus scrofa. Averaged left and right measurements of a female wild boar from near Elaziğ, Turkey. Museum of Comparative Zoology, Harvard University, specimen #51621 (Hongo and Meadow 2000).
Sus scrofa. Measurements based on a sample of modern wild boar, Sus scrofa libycus (male and female; Kizilcahamam, Turkey), from Payne and Bull (1988), Appendix 2.
Cervus elaphus. Inv.nr. 2271 (male; adult) from Stopp and Deschler-Erb (2018).
Dama mesopotamica. Adult female modern specimen from Israel (id #1047), curated in Archaeozoology Laboratory at the University of Haifa (Harding and Marom 2021).
Gazella gazella. Adult female modern specimen from Israel (id #1037), curated in Archaeozoology Laboratory at the University of Haifa (Harding and Marom 2021).
Equus asinus. Adult male modern specimen from Israel (id #1076), curated in Archaeozoology Laboratory at the University of Haifa (Harding and Marom 2021).
Equus caballus. Three-year-old Icelandic mare (female; all bones fused) that died in 1961 (Johnstone 2004). Skeleton held at the Zoologische Staatssammlung Munich in Germany. Specimen ID 1961/29.
Oryctolagus cuniculus. Adult male European rabbit from Audley End, Essex, UK, curated in the reference collection of the University of Nottingham Archaeology Department (ID RS139) (Ameen 2021).
Canis lupus. Hungarian Agricultural Museum: Specimen 73.4 (small mature female; probably local origin) from Russell (1993).
The zoolog variable referencesDatabase collects all these references. It is structured as a named list of named lists, following the hierarchy described above:
str(referencesDatabase, max.level = 2)
#> List of 15
#>  $ Bos taurus :List of 3
#>   ..$ Nieto :'data.frame': 68 obs. of 4 variables:
#>   ..$ Basel :'data.frame': 50 obs. of 4 variables:
#>   ..$ Johnstone:'data.frame': 24 obs. of 4 variables:
#>  $ Bos primigenius :List of 2
#>   ..$ Degerbol:'data.frame': 50 obs. of 4 variables:
#>   ..$ Steppan :'data.frame': 84 obs. of 4 variables:
#>  $ Ovis aries :List of 2
#>   ..$ Davis :'data.frame': 23 obs. of 4 variables:
#>   ..$ Clutton:'data.frame': 71 obs. of 4 variables:
#>  $ Ovis orientalis :List of 2
#>   ..$ Basel :'data.frame': 36 obs. of 4 variables:
#>   ..$ Uerpmann:'data.frame': 50 obs. of 4 variables:
#>  $ Capra hircus :List of 2
#>   ..$ Basel :'data.frame': 35 obs. of 4 variables:
#>   ..$ Clutton:'data.frame': 60 obs. of 4 variables:
#>  $ Capra aegagrus :List of 1
#>   ..$ Uerpmann:'data.frame': 50 obs. of 4 variables:
#>  $ Sus domesticus :List of 1
#>   ..$ Albarella:'data.frame': 42 obs. of 4 variables:
#>  $ Sus scrofa :List of 3
#>   ..$ Basel:'data.frame': 41 obs. of 4 variables:
#>   ..$ Hongo:'data.frame': 96 obs. of 4 variables:
#>   ..$ Payne:'data.frame': 33 obs. of 4 variables:
#>  $ Cervus elaphus :List of 1
#>   ..$ Basel:'data.frame': 14 obs. of 4 variables:
#>  $ Dama mesopotamica :List of 1
#>   ..$ Haifa:'data.frame': 60 obs. of 4 variables:
#>  $ Gazella gazella :List of 1
#>   ..$ Haifa:'data.frame': 63 obs. of 4 variables:
#>  $ Equus asinus :List of 1
#>   ..$ Haifa:'data.frame': 48 obs. of 4 variables:
#>  $ Equus caballus :List of 1
#>   ..$ Johnstone:'data.frame': 75 obs. of 4 variables:
#>  $ Oryctolagus cuniculus:List of 1
#>   ..$ Nottingham:'data.frame': 58 obs. of 4 variables:
#>  $ Canis lupus :List of 1
#>   ..$ Russell:'data.frame': 77 obs. of 4 variables:
The references' database is organized per taxon. However, in general the
zooarchaeological data to be analysed includes several taxa. Thus, the
reference dataframe should include one reference standard for each relevant
taxon.
The zoolog variable referenceSets defines four possible references:
referenceSets
Reference set | Bos taurus | Bos primigenius | Ovis aries | Ovis orientalis | Capra hircus | Capra aegagrus | Sus domesticus | Sus scrofa | Cervus elaphus | Dama mesopotamica | Gazella gazella | Equus asinus | Equus caballus | Oryctolagus cuniculus | Canis lupus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NietoDavisAlbarella | Nieto | | Davis | | | | Albarella | | | | | | | | |
Basel | Basel | | | Basel | Basel | | | Basel | Basel | | | | | | |
Combi | Nieto | | Clutton | | Clutton | | | Basel | Basel | Haifa | Haifa | Haifa | Johnstone | Nottingham | Russell |
Groningen | | Degerbol | | Uerpmann | | Uerpmann | | Hongo | | | | | | | |
Each row defines a reference set consisting of a reference source for each taxon (column). The function AssembleReference builds the reference set by taking the selected taxon-specific references from referencesDatabase.
The zoolog variable reference is a named list including the references defined by referenceSets:
str(reference)
#> List of 4
#>  $ NietoDavisAlbarella:'data.frame': 133 obs. of 4 variables:
#>   ..$ TAX : Factor w/ 3 levels "bota","ovar",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ EL : Factor w/ 27 levels "AS","CAL","FE",..: 4 4 4 4 4 4 4 4 4 11 ...
#>   ..$ Measure : Factor w/ 26 levels "BFd","BFp","BT",..: 8 9 5 7 13 4 3 12 6 8 ...
#>   ..$ Standard: num [1:133] 259 234 78.3 90.2 29 ...
#>  $ Basel :'data.frame': 176 obs. of 4 variables:
#>   ..$ TAX : Factor w/ 5 levels "BOTA","Ovis orientalis",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ EL : Factor w/ 28 levels "Astragalus","Calcaneus",..: 14 14 14 14 5 5 5 13 13 13 ...
#>   ..$ Measure : Factor w/ 26 levels "BFd","BFp","BG",..: 21 13 18 3 5 4 19 6 19 5 ...
#>   ..$ Standard: num [1:176] 65.9 83 66.9 58.1 95.3 ...
#>  $ Combi :'data.frame': 635 obs. of 4 variables:
#>   ..$ TAX : Factor w/ 11 levels "bota","OVAR",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ EL : Factor w/ 69 levels "AS","CAL","FE",..: 4 4 4 4 4 4 4 4 4 11 ...
#>   ..$ Measure : Factor w/ 83 levels "BFd","BFp","BT",..: 8 9 5 7 13 4 3 12 6 8 ...
#>   ..$ Standard: num [1:635] 259 234 78.3 90.2 29 ...
#>  $ Groningen :'data.frame': 246 obs. of 4 variables:
#>   ..$ TAX : Factor w/ 4 levels "Bos primigenius",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ EL : Factor w/ 23 levels "Astragalus","Calcaneus",..: 13 13 13 5 5 5 5 5 12 12 ...
#>   ..$ Measure : Factor w/ 45 levels "BFp","BG","BT",..: 14 12 2 8 9 4 3 13 8 5 ...
#>   ..$ Standard: num [1:246] 69 70 60 359 309 97 89 46 320 100 ...
reference$Combi includes the most comprehensive reference for each species, so that more measurements can be considered. It is the default reference for computing the log-ratios.
If desired, the user can define their own combinations or can also use their own references, which must be a dataframe with the format described above.
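A user-defined reference only needs the four-column layout described above. The sketch below builds a tiny one with invented values. The column names mirror those shown by str(reference). Whether the taxon and element labels are recognised depends on zoologThesaurus, so they should be checked against it. The commented call uses the documented ref argument of LogRatios.

```r
## Hypothetical user reference with invented values (millimetres),
## following the four-column layout: taxon, element, measure, value.
myReference <- data.frame(
  TAX      = c("Bos taurus", "Bos taurus", "Ovis aries"),
  EL       = c("humerus", "humerus", "humerus"),
  Measure  = c("GL", "Bd", "Bd"),
  Standard = c(300, 80, 30)
)
str(myReference)
## It could then be used as: LogRatios(dataExample, ref = myReference)
```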
referencesDatabase, referenceSets, and reference are exported variables automatically loaded in memory. In addition, the zoolog extdata folder provides the semicolon-separated (csv) files from which they are generated:
referenceSets.csv: Defines referenceSets.
referencesDatabase.csv: Defines the structure of referencesDatabase.
A csv file for each taxon-specific reference, as named in referencesDatabase.csv.
utils::read.csv2(system.file("extdata", "referencesDatabase.csv", package = "zoolog"))
#> Genus Taxon Source
#> 1 Cattle - *Bos* Bos taurus Nieto
#> 2 Cattle - *Bos* Bos taurus Basel
#> 3 Cattle - *Bos* Bos taurus Johnstone
#> 4 Cattle - *Bos* Bos primigenius Degerbol
#> 5 Cattle - *Bos* Bos primigenius Steppan
#> 6 Sheep - *Ovis* Ovis aries Davis
#> 7 Sheep - *Ovis* Ovis aries Clutton
#> 8 Sheep - *Ovis* Ovis orientalis Basel
#> 9 Sheep - *Ovis* Ovis orientalis Uerpmann
#> 10 Goat - *Capra* Capra hircus Basel
#> 11 Goat - *Capra* Capra hircus Clutton
#> 12 Goat - *Capra* Capra aegagrus Uerpmann
#> 13 Pig - *Sus* Sus domesticus Albarella
#> 14 Pig - *Sus* Sus scrofa Basel
#> 15 Pig - *Sus* Sus scrofa Hongo
#> 16 Pig - *Sus* Sus scrofa Payne
#> 17 Red deer - *Cervus* Cervus elaphus Basel
#> 18 Fallow deer - *Dama* Dama mesopotamica Haifa
#> 19 Gazelle - *Gazella* Gazella gazella Haifa
#> 20 Equid - *Equus* Equus asinus Haifa
#> 21 Equid - *Equus* Equus caballus Johnstone
#> 22 European rabbit - *Oryctolagus* Oryctolagus cuniculus Nottingham
#> 23 Canid - *Canis* Canis lupus Russell
#> Filename
#> 1 referenceCattle_Nieto.csv
#> 2 referenceCattle_Basel.csv
#> 3 referenceCattle_Johnstone.csv
#> 4 referenceCattle_Degerbol.csv
#> 5 referenceCattle_Steppan.csv
#> 6 referenceSheep_Davis.csv
#> 7 referenceSheep_Clutton.csv
#> 8 referenceSheep_Basel.csv
#> 9 referenceSheep_Uerpmann.csv
#> 10 referenceGoat_Basel.csv
#> 11 referenceGoat_Clutton.csv
#> 12 referenceGoat_Uerpmann.csv
#> 13 referencePig_Albarella.csv
#> 14 referencePig_Basel.csv
#> 15 referencePig_Hongo.csv
#> 16 referencePig_Payne.csv
#> 17 referenceRedDeer_Basel.csv
#> 18 referenceDama_Haifa.csv
#> 19 referenceGazelle_Haifa.csv
#> 20 referenceEquid_Haifa.csv
#> 21 referenceEquid_Johnstone.csv
#> 22 referenceRabbit_Nottingham.csv
#> 23 referenceCanid_Russell.csv
We are grateful to Barbara Stopp and Sabine Deschler-Erb (University of Basel, Switzerland) for providing the Basel references for cattle, sheep, goat, wild boar, and red deer (Stopp and Deschler-Erb 2018), together with the permission to publish them as part of zoolog.
We also thank Francesca Slim and Dimitris Filioglou (University of Groningen) for providing the references for aurochs, mouflon, wild goat, and wild boar (Degerbøl and Fredskild 1970; Uerpmann and Uerpmann 1994; Hongo and Meadow 2000) in the Groningen set.
We thank Claudia Minniti (University of Salento) for providing Johnstone's reference for cattle (Johnstone and Albarella 2002).
We are also grateful to Sierra Harding and Nimrod Marom (University of Haifa) for providing the Haifa standard measurements for donkey, mountain gazelle, and Persian fallow deer (Harding and Marom 2021).
We thank Carly Ameen and Helene Benkert (University of Exeter) for providing references for horse (Johnstone 2004) and European rabbit (Ameen 2021).
We thank Mikolaj Lisowski (University of York) for pointing out the existence of the improved reference for Bos primigenius (Steppan 2001) and for providing its source.
Albarella U, Payne S (2005).
“Neolithic pigs from Durrington Walls, Wiltshire, England: a biometrical database.”
Journal of Archaeological Science, 32(4), 589–599.
Ameen C (2021).
“Measurements from an adult male specimen from Audley End, Essex, UK, in the reference collection at the University of Nottingham Archaeology Department under ID RS139.”
Personal communication, including permission to publish them as part of the package zoolog.
Clutton-Brock J, Dennis-Bryan K, Armitage PL, Jewell PA (1990).
“Osteology of the Soay sheep.”
Bulletin of the British Museum, Natural History. Zoology, 56(1), 1–56.
Davis SJ (1996).
“Measurements of a group of adult female Shetland sheep skeletons from a single flock: a baseline for zooarchaeologists.”
Journal of Archaeological Science, 23(4), 593–612.
Degerbøl M, Fredskild B (1970).
The Urus (Bos Primigenius Bojanus) and Neolithic Domesticated Cattle (Bos Taurus Domesticus Linné) in Denmark: Zoological and Palynological Investigations, Biologiske Skrifter, 17:1.
København: Munksgaard.
Harding S, Marom N (2021).
“Measurements compiled for the Zooarchaeology of Southern Phoenicia (ZSP) Project, from the reference collection in the Leon Recanati Institute for Maritime Studies (RIMS, Department of Maritime Civilizations, University of Haifa, Israel).”
Personal communication, including permission to publish them as part of the package zoolog.
Hongo H, Meadow RH (2000).
“Faunal remains from Prepottery Neolithic levels at Çayönü, southeastern Turkey: a preliminary report focusing on pigs (Sus sp.).”
In Archaeozoology of the Near East IVA: Proceedings of the Fourth International Symposium on the Archaeozoology of Southwestern Asia and Adjacent Areas, 121–139. Groningen: ARC Publications.
Johnstone C, Albarella U (2002).
“The Late Iron Age and Romano-British Mammal and Bird Bone Assemblage from Elms Farm, Heybridge, Essex (Site Code: Hyef93-95).”
Technical Report 45/2002, tab. 16, p. 70, Centre for Archaeology.
Johnstone CJ (2004).
A biometric study of equids in the Roman world.
Ph.D. thesis, University of York.
Nieto-Espinet A (2018).
“Element measure standard biometrical data from a cow dated to the Early Bronze Age (Minferri, Catalonia).”
doi:10.13140/RG.2.2.13512.78081.
Payne S, Bull G (1988).
“Components of variation in measurements of pig bones and teeth, and the use of measurements to distinguish wild from domestic pig remains.”
Archaeozoologia, 2(1), 27–66.
Russell N (1993).
Hunting, Herding and Feasting: human use of animals in Neolithic Southeast Europe.
Ph.D. thesis, University of California, Berkeley.
Steppan K (2001).
“Ur oder Hausrind? Die Variabilität der Wildtieranteile in linearbandkeramischen Tierknochenkomplexen.”
In Arbogast R, Jeunesse C, Schibler J (eds.), Rôle et statut de la chasse dans le Néolithique ancien danubien (5500 - 4900 av. J.-C.) /Rolle und Bedeutung der Jagd während des Frühneolithikums Mitteleuropas (Linearbandkeramik 5500 - 4900 v.Chr.). Premières rencontres danubiennes, Strasbourg 20 et 21 novembre 1996, Actes de la première table-ronde. Internationale Archäologie: Arbeitsgemeinschaft, Symposium, Tagung, Kongress Band 1, 171–186.
Stopp B, Deschler-Erb S (2018).
“Measurements compiled from the reference collection in the Integrative Prähistorische und Naturwissenschaftliche Archäologie (IPNA, University of Basel, Switzerland).”
Personal communication, including permission to publish them as part of the package zoolog.
Uerpmann M, Uerpmann H (1994).
“Animal bone finds from excavation 520 at Qala’at al-Bahrain.”
In Højlund F, Andersen HH (eds.), Qala'at al-Bahrain, Volume 1: The Northern City Wall and the Islamic Fortress, 417–444.
Jutland Archaeological Society.
Von den Driesch A (1976).
A guide to the measurement of animal bones from archaeological sites: as developed by the Institut für Palaeoanatomie, Domestikationsforschung und Geschichte der Tiermedizin of the University of Munich, volume 1.
Peabody Museum Press.
Function to remove the table rows for which all measurements of interest are not available (NA). A particular list of measurement names can be explicitly provided or selected by a common initial pattern. The default setting removes the rows with no log-ratio available.
RemoveNACases(data, measureNames = NULL, prefix = logPrefix)
data |
A dataframe with the input measurements. |
measureNames |
A vector of characters with the list of measurements
to be considered for missing values. If NULL (default), the measurements are selected by prefix. |
prefix |
A character string with the initial pattern used to select the
list of measurements when measureNames is NULL. The default is given by the internal variable
logPrefix. |
A dataframe with the same columns as the input dataframe but removing the rows with missing values for all measurements in the list.
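The selection rule described above (explicit measurement names, otherwise every column starting with a prefix, and dropping a row only when all selected columns are NA) can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not zoolog's code; in particular the default `prefix = "log"` is an assumed stand-in for the package's internal `logPrefix`, whose actual value is not given here.

```r
# Hypothetical sketch of the RemoveNACases selection rule (not zoolog's code).
dropAllNA <- function(data, measureNames = NULL, prefix = "log") {
  if (is.null(measureNames)) {
    # Select every column whose name starts with the prefix
    measureNames <- names(data)[startsWith(names(data), prefix)]
  }
  # Only consider measurement columns actually present in the data
  measureNames <- intersect(measureNames, names(data))
  if (length(measureNames) == 0) return(data[0, , drop = FALSE])
  # Keep a row if at least one selected measurement is available
  keep <- rowSums(!is.na(data[, measureNames, drop = FALSE])) > 0
  data[keep, , drop = FALSE]
}

df <- data.frame(taxon = c("a", "b", "c"),
                 logGL = c(0.01, NA, NA),
                 logBd = c(NA, NA, -0.02),
                 GL    = c(NA, 30, NA),
                 stringsAsFactors = FALSE)
dropAllNA(df)                       # keeps rows "a" and "c"
dropAllNA(df, measureNames = "GL")  # keeps row "b" only
```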
## Read an example dataset:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## We can observe the first lines (excluding some columns for visibility):
head(dataExample)[, -c(6:20,32:64)]
## Remove the cases not including any measurement present in the reference.
refMeasureNames <- unique(reference$Combi$Measure)
refMeasureNames
dataExamplePruned <- RemoveNACases(dataExample, measureNames = refMeasureNames)
## The first lines of the output data frame show at least one available
## measurement value in the selected list:
head(dataExamplePruned)[, -c(6:20,32:64)]
## If we compute the log-ratios first,
dataExampleWithLogs <- LogRatios(dataExample)
## the cases not including any log-ratio can be removed with the
## default logPrefix:
dataExampleWithLogsPruned <- RemoveNACases(dataExampleWithLogs)
head(dataExampleWithLogsPruned)[, -c(6:20,32:64)]
Functions to map the user provided nomenclature into a standard one as defined in a thesaurus.
StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)
StandardizeDataSet(data, thesaurusSet = zoologThesaurus)
x |
Character vector. |
thesaurus |
A thesaurus object. |
mark.unknown |
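Logical. If TRUE, names not included in the thesaurus are set to NA; if FALSE (default), they are kept unchanged. |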
data |
A data frame. |
thesaurusSet |
A thesaurus set. |
StandardizeNomenclature standardizes a character vector according to a given thesaurus.
StandardizeDataSet standardizes the column names and values of a data frame according to a thesaurus set.
StandardizeNomenclature returns a vector of the same length as the input vector x. The names present in the thesaurus are set to their corresponding category. The names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.
StandardizeDataSet returns a data frame with the same structure as the input data, but with its nomenclature standardized according to a thesaurus set that includes appropriate thesauri for its column names and for the values of a set of columns.
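The lookup described above (each thesaurus column is a category, each entry a recognised name, and unknown names kept or set to NA) can be sketched in base R as follows. This is a minimal, hypothetical sketch, assuming a thesaurus stored as a data frame padded with NA; it handles case-insensitivity only and omits the accent and punctuation handling that zoolog thesauri also support. `standardizeSketch` is not part of zoolog.

```r
# Toy thesaurus: each column is a category; entries are accepted names.
thesaurus <- data.frame(
  "bos taurus"     = c("bos taurus", "cattle", "bota"),
  "sus domesticus" = c("sus domesticus", "pig", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical sketch of the name-to-category mapping (not zoolog's code).
standardizeSketch <- function(x, thesaurus, mark.unknown = FALSE,
                              caseSensitive = FALSE) {
  norm <- if (caseSensitive) identity else tolower
  # Long lookup table: one row per (name, category) pair, NAs dropped
  lookup <- utils::stack(lapply(thesaurus, function(col) col[!is.na(col)]))
  idx <- match(norm(x), norm(lookup$values))
  out <- as.character(lookup$ind)[idx]  # category name, or NA if unknown
  if (!mark.unknown) out[is.na(idx)] <- x[is.na(idx)]
  out
}

standardizeSketch(c("bota", "giraffe", "PIG", "cattle"), thesaurus)
# "bos taurus" "giraffe" "sus domesticus" "bos taurus"
standardizeSketch(c("bota", "giraffe"), thesaurus, mark.unknown = TRUE)
# "bos taurus" NA
```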
zoologThesaurus for a description of the thesaurus and thesaurus set structure, ThesaurusReaderWriter, ThesaurusManagement
## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
## Standardize a heterogeneous vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"), thesaurus)
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"), thesaurus,
                        mark.unknown = TRUE)
## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"), thesaurus)
## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[,1:5])
## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])
Functions to obtain the subtaxonomy or the set of taxa included in a particular taxonomic group, according to zoologTaxonomy by default.
Subtaxonomy(taxon, taxonomy = zoologTaxonomy, thesaurus = zoologThesaurus$taxon)
SubtaxonomySet(taxon, taxonomy = zoologTaxonomy, thesaurus = zoologThesaurus$taxon)
GetSpeciesIn(taxon, taxonomy = zoologTaxonomy, thesaurus = zoologThesaurus$taxon)
taxon |
A name of any of the taxa, at any rank included in the taxonomy (from species to family in the zoolog taxonomy). |
taxonomy |
A taxonomy from which to extract the subtaxonomy.
By default zoologTaxonomy. |
thesaurus |
A thesaurus allowing datasets with different nomenclatures
to be merged. By default zoologThesaurus$taxon. |
Subtaxonomy returns a data.frame with the same structure as the input taxonomy but with only the species (rows) included in the queried taxon, and the taxonomic ranks (columns) up to its level.
SubtaxonomySet returns a character vector including a unique copy (set) of all the taxa, at any taxonomic rank, under the queried taxon. It is equivalent to Subtaxonomy but returns a set instead of a data frame.
GetSpeciesIn returns a character vector with the species included in the queried taxon.
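The three return values described above can be illustrated with a base-R sketch over a subset of the zoologTaxonomy rows listed further below. The helpers `subtaxonomySketch`, `subtaxonomySetSketch`, and `getSpeciesInSketch` are hypothetical, and the sketch ignores the thesaurus step that the real functions use to standardize the queried name. Whether the queried taxon itself belongs to the returned set is an assumption of this sketch (here it does, because its own rank column is kept).

```r
# Subset of the zoolog taxonomy table (columns ordered from species to family)
taxonomy <- data.frame(
  Species   = c("Bos taurus", "Ovis aries", "Ovis orientalis",
                "Capra hircus", "Capra aegagrus", "Sus domesticus", "Sus scrofa"),
  Genus     = c("Bos", "Ovis", "Ovis", "Capra", "Capra", "Sus", "Sus"),
  Tribe     = c("Bovini", "Caprini", "Caprini", "Caprini", "Caprini", "Suini", "Suini"),
  Subfamily = c("Bovinae", "Caprinae", "Caprinae", "Caprinae", "Caprinae", "Suinae", "Suinae"),
  Family    = c("Bovidae", "Bovidae", "Bovidae", "Bovidae", "Bovidae", "Suidae", "Suidae"),
  stringsAsFactors = FALSE
)

# Rows under the taxon, columns from Species up to the taxon's own rank
subtaxonomySketch <- function(taxon, taxonomy) {
  hit <- vapply(taxonomy, function(col) taxon %in% col, logical(1))
  if (!any(hit)) stop("Taxon not found: ", taxon)
  rank <- which(hit)[1]
  taxonomy[taxonomy[[rank]] == taxon, seq_len(rank), drop = FALSE]
}
# All distinct names at any rank within the subtaxonomy
subtaxonomySetSketch <- function(taxon, taxonomy) {
  unique(unlist(subtaxonomySketch(taxon, taxonomy), use.names = FALSE))
}
# Only the species column
getSpeciesInSketch <- function(taxon, taxonomy) {
  subtaxonomySketch(taxon, taxonomy)$Species
}

getSpeciesInSketch("Sus", taxonomy)        # "Sus domesticus" "Sus scrofa"
subtaxonomySetSketch("Caprini", taxonomy)
```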
## Get species of genus Sus:
GetSpeciesIn("Sus")
## Get species of family Bovidae:
GetSpeciesIn("Bovidae")
## Get the subtaxonomy of the tribe Caprini:
Subtaxonomy("Caprini")
## Use SubtaxonomySet to join categories for computing log-ratios.
## For this, we read an example dataset:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile, na.strings = "", encoding = "UTF-8")
## We illustrate with a subset of cases to make the example run
## sufficiently fast:
dataExample <- dataExample[1:1000, ]
## Compute the log-ratios joining all taxa from tribe Caprini
## to use the reference of Ovis aries:
categoriesCaprini <- list('Ovis aries' = SubtaxonomySet("Caprini"))
dataExampleWithLogs <- LogRatios(dataExample,
                                 joinCategories = categoriesCaprini)
Functions to modify and check thesauri.
NewThesaurus(caseSensitive = FALSE, accentSensitive = FALSE, punctuationSensitive = FALSE)
AddToThesaurus(thesaurus, newName, category = NULL)
RemoveRepeatedNames(thesaurus)
ThesaurusAmbiguity(thesaurus)
caseSensitive, accentSensitive, punctuationSensitive |
Logical. They set whether the thesaurus is case-, accent-, and punctuation-sensitive, respectively. All default to FALSE. |
thesaurus |
A thesaurus object. |
newName |
Character vector or (named) list of character vectors with new names to be added to the thesaurus. |
category |
Character vector identifying the classes where the new names should be included. |
In the function AddToThesaurus, the categories to which new names are added can be specified either as the names of a named list given as argument newName, or explicitly in the argument category. See the examples below illustrating both alternatives.
From version 1.2.0, AddToThesaurus directly removes repeated names in the resulting thesaurus.
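Both ways of naming categories described above (a named list, or a vector of names plus a category argument recycled over them), together with the automatic removal of repeated names, can be sketched in base R. This is a hypothetical helper, not zoolog's AddToThesaurus: it assumes a thesaurus stored as a data frame padded with NA, compares names exactly (ignoring the thesaurus sensitivity settings), and does not check for ambiguity.

```r
# Hypothetical sketch of adding names to a thesaurus (not zoolog's code).
addToThesaurusSketch <- function(thesaurus, newName, category = NULL) {
  if (!is.list(newName)) {
    # Vector form: recycle `category` over the names, then group by category
    stopifnot(!is.null(category))
    category <- rep_len(category, length(newName))
    newName <- split(as.character(newName),
                     factor(category, levels = unique(category)))
  }
  # Current categories as a list of non-missing names
  cols <- lapply(thesaurus, function(col) col[!is.na(col)])
  for (categ in names(newName)) {
    # Creates the category if absent; unique() drops exact repeats
    cols[[categ]] <- unique(c(cols[[categ]], newName[[categ]]))
  }
  # Pad shorter categories with NA so all columns have equal length
  n <- max(lengths(cols))
  padded <- lapply(cols, function(col) c(col, rep(NA_character_, n - length(col))))
  data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)
}

th <- addToThesaurusSketch(data.frame(), c("scarlet", "ruby"), "red")
th <- addToThesaurusSketch(th, list(blue = c("azure", "navy"), red = "ruby"))
th  # "ruby" is not duplicated in category "red"
```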
NewThesaurus returns an empty thesaurus, which can then be populated by AddToThesaurus.
AddToThesaurus returns the input thesaurus complemented with new names in the categories identified. If any of the categories is not present in the input thesaurus, new categories are added as required.
RemoveRepeatedNames returns the input thesaurus pruned of redundant names in each category. The redundancy is evaluated in agreement with the case and accent sensitivity of the thesaurus.
ThesaurusAmbiguity returns FALSE if no ambiguity is present. When any ambiguity is found, it returns TRUE with an attribute errmessage listing the names present in more than one category and the categories involved. This is used internally by ReadThesaurus and AddToThesaurus to generate an error if they attempt to read or generate an ambiguous thesaurus.
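The ambiguity check described above amounts to finding names that occur in more than one category after applying the thesaurus sensitivity rules. The following base-R sketch assumes a case-insensitive comparison by default and returns TRUE with an `errmessage` attribute as described; the exact wording of zoolog's message is not reproduced, and `ambiguitySketch` is a hypothetical helper.

```r
# Hypothetical sketch of an ambiguity check (not zoolog's code).
ambiguitySketch <- function(thesaurus, caseSensitive = FALSE) {
  norm <- if (caseSensitive) identity else tolower
  # One entry per (normalized name, category); repeats within a category ignored
  cols <- lapply(thesaurus, function(col) unique(norm(col[!is.na(col)])))
  lookup <- utils::stack(cols)
  dupNames <- unique(lookup$values[duplicated(lookup$values)])
  if (length(dupNames) == 0) return(FALSE)
  msg <- vapply(dupNames, function(nm) {
    cats <- unique(as.character(lookup$ind[lookup$values == nm]))
    paste0("'", nm, "' in: ", paste(cats, collapse = ", "))
  }, character(1))
  structure(TRUE, errmessage = paste(msg, collapse = "; "))
}

th <- data.frame(red  = c("scarlet", "ruby"),
                 blue = c("azure", "Scarlet"),
                 stringsAsFactors = FALSE)
res <- ambiguitySketch(th)
attr(res, "errmessage")                   # "'scarlet' in: red, blue"
ambiguitySketch(th, caseSensitive = TRUE) # FALSE
```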
zoologThesaurus for a description of the thesaurus and thesaurus set structure, ReadThesaurus, WriteThesaurus, StandardizeNomenclature
## Load an example thesaurus:
thesaurus <- ReadThesaurus(system.file("extdata", "taxonThesaurus.csv",
                                       package = "zoolog"))
## with categories
names(thesaurus) # "bos taurus" "ovis aries" "sus domesticus"
## Add names to several categories:
thesaurusExtended <- AddToThesaurus(thesaurus, c("Kuh", "Schwein"),
                                    c("bos taurus", "sus domesticus"))
## This adds the name "Kuh" to the category "bos taurus" and
## the name "Schwein" to the category "sus domesticus".
## Generate a new thesaurus and populate it with two categories
## ("red" and "blue"):
thesaurusNew <- NewThesaurus()
thesaurusNew <- AddToThesaurus(thesaurusNew,
                               c("scarlet", "vermilion", "ruby", "cherry",
                                 "carmine", "wine"),
                               "red")
thesaurusNew
thesaurusNew <- AddToThesaurus(thesaurusNew,
                               c("sky blue", "azure", "sapphire",
                                 "cerulean", "navy"),
                               "blue")
thesaurusNew
## Categories and names can also be included as a named list:
thesaurusNew <- AddToThesaurus(thesaurusNew, list(
  blue = c("lapis lazuli", "indigo", "cyan"),
  brown = c("hazel", "chocolate-coloured", "brunette", "mousy", "beige"))
)
thesaurusNew
## Attempt to generate an ambiguous thesaurus:
try(AddToThesaurus(thesaurusNew, "scarlet", "blue"))
## From version 1.2.0, AddToThesaurus directly removes repeated names:
AddToThesaurus(thesaurusNew, c("scarlet", "ruby"), "red")
## Remove repeated names in the same category:
## If we included any repetitions
thesaurusNew[8:9,1] <- c("scarlet", "ruby")
thesaurusNew
## they can be removed with
RemoveRepeatedNames(thesaurusNew)
Functions to read and write thesauri and thesaurus sets.
ReadThesaurus(file, caseSensitive = FALSE, accentSensitive = FALSE, punctuationSensitive = FALSE)
ReadThesaurusSet(file)
WriteThesaurus(thesaurus, file)
WriteThesaurusSet(thesaurusSet, file)
file |
Name of a file. |
caseSensitive, accentSensitive, punctuationSensitive |
Logical. They set whether the thesaurus is case-, accent-, and punctuation-sensitive, respectively. All default to FALSE. |
thesaurus |
A thesaurus object. |
thesaurusSet |
A thesaurus set. |
WriteThesaurus and WriteThesaurusSet create or overwrite the corresponding files. No value is returned.
ReadThesaurus and ReadThesaurusSet return the read thesaurus or thesaurus set, respectively.
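Since thesaurus files are semicolon-separated (see zoologThesaurus), a plain thesaurus data frame can be round-tripped with base R's `write.csv2` and `read.csv2`. This is a minimal sketch, assuming that padding NAs are stored as empty cells and that category names (which may contain spaces) must not be mangled on reading; it is not the implementation of ReadThesaurus or WriteThesaurus. The sensitivity attributes are not stored in such a file and would have to be attached again after reading.

```r
thesaurus <- data.frame(
  "bos taurus"     = c("bos taurus", "cattle", "bota"),
  "sus domesticus" = c("sus domesticus", "pig", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

path <- file.path(tempdir(), "thesaurusSketch.csv")
# Semicolon-separated, with empty cells for the padding NAs
utils::write.csv2(thesaurus, path, row.names = FALSE, na = "")

# Empty cells become NA again; check.names = FALSE keeps "bos taurus" intact
readBack <- utils::read.csv2(path, na.strings = "", check.names = FALSE,
                             stringsAsFactors = FALSE)
all.equal(readBack, thesaurus)
```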
zoologThesaurus for a description of the thesaurus and thesaurus set structure, ThesaurusManagement, StandardizeNomenclature
## Read a thesaurus for taxa:
thesaurusFile <- system.file("extdata", "taxonThesaurus.csv",
                             package = "zoolog")
thesaurus <- ReadThesaurus(thesaurusFile)
## The attributes of the thesaurus include the fields 'caseSensitive',
## 'accentSensitive', and 'punctuationSensitive', all FALSE by default.
attributes(thesaurus)
## Any of them can be set by the user if desired:
thesaurus2 <- ReadThesaurus(thesaurusFile, accentSensitive = TRUE)
attributes(thesaurus2)
## Write the thesaurus to a file:
fileExample <- file.path(tempdir(), "thesaurusExample.csv")
WriteThesaurus(thesaurus, fileExample)
## Replace tempdir() with your preferred local path if you want to easily
## examine the written file.
## Read a thesaurus set:
thesaurusSetFile <- system.file("extdata", "zoologThesaurusSet.csv",
                                package = "zoolog")
thesaurusSet <- ReadThesaurusSet(thesaurusSetFile)
## The attributes of the thesaurus set include information on the constituent
## thesauri: names, source file names, and their mode of application on datasets.
attributes(thesaurusSet)
## The attributes of each thesaurus are also set by 'ReadThesaurusSet'.
attributes(thesaurusSet$measure)
## Write the thesaurus set to a file:
fileSetExample <- file.path(tempdir(), "thesaurusSetExample.csv")
WriteThesaurusSet(thesaurusSet, fileSetExample)
## It writes the thesaurus-set main data frame and each of the included
## thesaurus files.
## Again, replace tempdir() with your preferred local path if you want to
## easily examine the written files.
The taxonomic hierarchy for all taxa included in the osteometrical references of the package zoolog.
This allows users to group taxa by any taxonomic rank from species to family. See Subtaxonomy.
zoologTaxonomy
The taxonomy is given as a data.frame with columns for Species, Genus, Tribe, Subfamily, and Family. Each row lists the information for one species:
Species | Genus | Tribe | Subfamily | Family |
Bos taurus | Bos | Bovini | Bovinae | Bovidae |
Bos primigenius | Bos | Bovini | Bovinae | Bovidae |
Ovis aries | Ovis | Caprini | Caprinae | Bovidae |
Ovis orientalis | Ovis | Caprini | Caprinae | Bovidae |
Capra hircus | Capra | Caprini | Caprinae | Bovidae |
Capra aegagrus | Capra | Caprini | Caprinae | Bovidae |
Gazella gazella | Gazella | Antilopini | Antilopinae | Bovidae |
Sus domesticus | Sus | Suini | Suinae | Suidae |
Sus scrofa | Sus | Suini | Suinae | Suidae |
Cervus elaphus | Cervus | Cervini | Cervinae | Cervidae |
Dama mesopotamica | Dama | Cervini | Cervinae | Cervidae |
Equus asinus | Equus | Equini | Equinae | Equidae |
Equus caballus | Equus | Equini | Equinae | Equidae |
Oryctolagus cuniculus | Oryctolagus | | | Leporidae |
Canis familiaris | Canis | Canini | Caninae | Canidae |
Canis lupus | Canis | Canini | Caninae | Canidae |
zoologTaxonomy is an exported variable automatically loaded in memory. In addition, the csv source file zoologTaxonomy.csv generating it is included in the zoolog extdata folder.
The thesaurus set defined for the package zoolog.
This is used to make the methods robust to the different nomenclatures used in datasets created by different authors. Users can also use other thesaurus sets, or modify the provided thesaurus set (see ThesaurusManagement and ThesaurusReaderWriter).
zoologThesaurus
A thesaurus set is a list of thesauri with additional attributes:
Character vector with the name of each thesaurus.
Logical vector indicating whether each thesaurus should be applied to the column names of the data frame.
Logical vector indicating whether each thesaurus should be applied to the values in the corresponding column of the data frame.
Character vector with the source file of each thesaurus.
The examples below show the list of the four thesauri included in the provided zoologThesaurus.
Each thesaurus is a data frame, also with additional attributes. Each column of the data frame is a category of names with equivalent meaning in the intended application. The column name identifies the category and is used as the standard when applying StandardizeNomenclature.
The names in each column (category) must not be included in any other column, since this would make the thesaurus ambiguous (see ThesaurusAmbiguity).
Each thesaurus has the following attributes:
names: The standard name for the categories.
class: "data.frame"
row.names: Irrelevant.
caseSensitive: Logical indicating whether the names in the thesaurus should be considered case-sensitive.
accentSensitive: Logical indicating whether the names in the thesaurus should be differentiated by the presence of accent marks.
punctuationSensitive: Logical indicating whether the names in the thesaurus should be differentiated by the presence of punctuation marks.
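The effect of the sensitivity attributes listed above can be pictured as a normalization applied to names before they are compared. The following base-R sketch attaches the attributes to a toy thesaurus and applies case folding and punctuation stripping when the corresponding flag is FALSE. `normalizeName` is a hypothetical helper, and accent folding is deliberately omitted because portable transliteration in base R (for example via `iconv(..., "ASCII//TRANSLIT")`) is platform-dependent; zoolog's actual normalization rules may differ.

```r
# Toy thesaurus with the three sensitivity attributes attached
thesaurus <- data.frame(red = c("scarlet", "blood-red"), stringsAsFactors = FALSE)
attr(thesaurus, "caseSensitive") <- FALSE
attr(thesaurus, "accentSensitive") <- FALSE
attr(thesaurus, "punctuationSensitive") <- FALSE

# Hypothetical normalization driven by the attributes (not zoolog's code)
normalizeName <- function(x, thesaurus) {
  if (!isTRUE(attr(thesaurus, "caseSensitive"))) x <- tolower(x)
  if (!isTRUE(attr(thesaurus, "punctuationSensitive"))) {
    x <- gsub("[[:punct:]]", "", x)  # drop punctuation characters
  }
  # Accent folding omitted: see the lead-in
  x
}

normalizeName(c("BLOOD-RED", "Scarlet!"), thesaurus)  # "bloodred" "scarlet"
normalizeName(c("BLOOD-RED", "Scarlet!"), thesaurus) %in%
  normalizeName(thesaurus$red, thesaurus)             # TRUE TRUE
```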
The examples below show the content and characteristics of the first thesaurus in zoologThesaurus.
zoologThesaurus is an exported variable automatically loaded in memory. In addition, the source files generating it are included in the zoolog extdata folder. There is one file for the main structure of the thesaurus set and one file for each included thesaurus. All of them are in semicolon-separated format, so they can be examined in any text editor or imported into any spreadsheet application. The files are:
zoologThesaurusSet.csv
Defines the main structure of the thesaurus set. It has a row for each thesaurus and seven columns (ThesaurusName, FileName, CaseSensitive, AccentSensitive, PunctuationSensitive, ApplyToColNames, and ApplyToColValues), whose meaning coincides with the description above. Note that the case, accent, and punctuation sensitivity of each thesaurus is stored here rather than in the individual thesaurus files.
identifierThesaurus.csv
Thesaurus for the identifiers used in LogRatios to identify the bone types and the measure names in the data and the references. It has four columns: Taxon, Element, Measure, and Standard.
taxonThesaurus.csv
Thesaurus for the taxa. There is one column for each category of taxon considered.
elementThesaurus.csv
Thesaurus for the skeletal elements. One column for each category.
measureThesaurus.csv
Thesaurus for the measure names. One column for each category.
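How the ApplyToColNames and ApplyToColValues flags combine when standardizing a data set can be sketched in base R: thesauri flagged for column names rename the columns first, and thesauri flagged for values are then applied to the matching column. Everything in this sketch is hypothetical: the helper names, the toy thesauri, the use of named logical vectors for the flags, and the convention that the thesaurus named "taxon" applies to the column named "Taxon". It is not the implementation of StandardizeDataSet.

```r
# Case-insensitive name lookup; unknown names are kept unchanged
lookupName <- function(x, thesaurus) {
  lookup <- utils::stack(lapply(thesaurus, function(col) col[!is.na(col)]))
  idx <- match(tolower(x), tolower(lookup$values))
  ifelse(is.na(idx), x, as.character(lookup$ind)[idx])
}

# Toy thesaurus set (hypothetical content)
identifierTh <- data.frame(Taxon   = c("taxon", "species", "especie"),
                           Measure = c("measure", "medida", NA),
                           stringsAsFactors = FALSE)
taxonTh <- data.frame("bos taurus" = c("bos taurus", "cattle", "bota"),
                      check.names = FALSE, stringsAsFactors = FALSE)
thesaurusSet <- list(identifier = identifierTh, taxon = taxonTh)
# Hypothetical flags mirroring the ApplyToColNames / ApplyToColValues columns
applyToColNames  <- c(identifier = TRUE,  taxon = FALSE)
applyToColValues <- c(identifier = FALSE, taxon = TRUE)

standardizeDataSetSketch <- function(data, set, toNames, toValues) {
  # 1) Standardize column names
  for (th in names(set)[toNames[names(set)]]) {
    names(data) <- lookupName(names(data), set[[th]])
  }
  # 2) Standardize values of the column matching each thesaurus name
  for (th in names(set)[toValues[names(set)]]) {
    cols <- names(data)[tolower(names(data)) == tolower(th)]
    for (cn in cols) data[[cn]] <- lookupName(data[[cn]], set[[th]])
  }
  data
}

df <- data.frame(Especie = c("Bota", "cattle", "giraffe"),
                 Medida  = c("GL", "Bd", "GL"), stringsAsFactors = FALSE)
out <- standardizeDataSetSketch(df, thesaurusSet, applyToColNames, applyToColValues)
out
```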
## List of thesaurus names and characteristics in the thesaurus set:
attributes(zoologThesaurus)
## Content of the first thesaurus:
zoologThesaurus$identifier
attributes(zoologThesaurus$identifier)