Facundo Muñoz committed
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(topomatch)
```
# topomatch

Helper functions for matching toponyms from different sources, which may
be written in slightly different ways. The package lets you inspect the
matches and act accordingly.

```{r example-match}
countries1 <- spData::world$name_long
countries2 <- unique(maps::world.cities$country.etc)

(country_matches <- topomatch(countries1, countries2))

```

Some toponyms are not matched correctly and need manual fixes.
Write the fixes in a named vector.
If a toponym has no correct match, give it an `NA`.

```{r example-fixes}
## Inspect the competing candidates for the unmatched countries
(bm <- best_matches(country_matches)[unmatched(country_matches)])

cnames_fixes <- setNames(
  c("Congo Democratic Republic", NA, "Laos", "Korea North",
    "Korea South", NA),
  names(bm)
)

## Also fix the toponyms that were matched incorrectly by similarity
cnames_fixes <- c(
  cnames_fixes,
  "United States" = "USA",
  "French Southern and Antarctic Lands" = "France",
  "Côte d'Ivoire" = "Ivory Coast",
  "United Kingdom" = "UK",
  "Antarctica" = NA,
  "Northern Cyprus" = "Cyprus",
  "Somaliland" = "Somalia",
  "South Sudan" = "Sudan"
)
```

Now you can `transcribe` the original toponyms to the
matched terms.

```{r example-transcribe}
translate <- transcribe(country_matches, fixes = cnames_fixes)

translate(c("United Kingdom", "Kosovo"))

## "Translate" all of the original toponyms
countries1_trans <- translate(countries1)

## Only those "fixed" as NA are not found in the second list
countries1[!countries1_trans %in% countries2]
```


## Method

Wraps the local-global alignment algorithm borrowed from the Bioconductor
package `Biostrings`. It works better than global alignment and requires
less fine-tuning, although it is considerably slower. For background, see
https://ro-che.info/articles/2016-12-11-local-alignment.
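
To give a rough feel for why a whole-string (global) comparison can mislead
when one toponym carries extra words, here is a minimal sketch using only
base R. It is an assumption-laden illustration: `utils::adist()` with
`partial = TRUE` (approximate substring matching) stands in for the
`Biostrings` alignment, and it is not what `topomatch` runs internally. The
names `x`, `candidates`, `global` and `partial` are made up for this example.

```r
## Illustration only: base R's utils::adist() stands in for the Biostrings
## alignment that topomatch wraps. This is NOT the package's implementation.
x <- "Korea"
candidates <- c("Korea South", "Kenya")

## Global comparison: all of x must be edited into all of each candidate,
## so the extra " South" is penalised as six insertions.
global <- drop(utils::adist(x, candidates))

## Partial comparison: x only has to match some substring of each candidate.
partial <- drop(utils::adist(x, candidates, partial = TRUE))

candidates[which.min(global)]   # picks "Kenya"
candidates[which.min(partial)]  # picks "Korea South"
```

The global distance prefers the unrelated but similarly sized "Kenya", while
the partial distance finds "Korea" inside "Korea South" at zero cost.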

## Installation

```r
remotes::install_gitlab("umr-astre/topomatch", host = "forgemia.inra.fr")
```

<!-- You can install the released version of topomatch from [CRAN](https://CRAN.R-project.org) with: -->

<!-- ``` r -->
<!-- install.packages("topomatch") -->
<!-- ``` -->