It’s October, time for spooky Twitter names! If you’re on this social
media platform, you might have noticed some of your friends switching
their names to something spooky and punny. Last year I was “Maelstrom
Salmon”, which I find scary but is arguably not that funny. Anyhow, what
if you want to switch your name but have no inspiration? In this post,
we shall explore how R can help with that, using web scraping, phonetic
spelling and string distance algorithms, and the magic of randomness!

My strategy for creating a spooky name is to replace each part of a
name (e.g. first and last names) with the phonetically closest
Halloween-related word. The technical challenges were therefore
to find a list of Halloween-related words, and to measure phonetic
string distance.

The otherwise very useful package rcorpora doesn’t have a
“spooky” category, so I decided to scrape this webpage of Halloween
words for kids.
It was one of the first search results, and being for kids it couldn’t
be offensive. Like the last two times I webscraped, I used the very
cool polite package!

There were two less useful sections on the page, with French and Spanish
words, that I very inelegantly removed after looking up their div ID in
the page source.

# introduce myself
session 
## No encoding supplied: defaulting to UTF-8.
# remove the two sections I don't want
French 
## [1] "All Hallows' Eve\n\t" "lantern\n\t"          "prank\n\t"           
## [4] "costume\n\t"          "make-believe\n\t"     "sweets\n\t"
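The removal step can be sketched on a toy page. This is only an illustration under assumptions: the HTML below, the `span` markup and the div IDs `french` and `spanish` are made up, since the real page structure is not shown here. On the live site, `page` would instead come from `polite::scrape(polite::bow(url))`.

```r
library(xml2)

# Toy stand-in for the scraped page (hypothetical structure and IDs)
html <- '<html><body>
  <div id="english"><span>All Hallows\' Eve</span><span>lantern</span></div>
  <div id="french"><span>citrouille</span></div>
  <div id="spanish"><span>calabaza</span></div>
</body></html>'
page <- read_html(html)

# Drop the unwanted sections, located by their div ID
for (id in c("french", "spanish")) {
  node <- xml_find_first(page, sprintf("//div[@id='%s']", id))
  xml_remove(node)
}

# Extract the text of the remaining word nodes
raw_words <- xml_text(xml_find_all(page, "//span"))
raw_words
```

`xml_remove()` modifies the document in place, which is why no reassignment of `page` is needed.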

I cleaned the words a bit. Some entries were phrases rather than single
words, so I split them. I hesitated between this solution and keeping
only one-word phrases… I’m still not sure it was the best decision!

words  0])
words 
## [1] "all"     "hallows" "eve"     "lantern" "prank"   "costume"
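A cleaning step along those lines might look like the sketch below, in base R only. The helper name `clean_words()` and the exact rules (trim whitespace, lowercase, split on spaces, strip leading or trailing punctuation, deduplicate) are assumptions for illustration, not necessarily the code used for this post.

```r
# Hypothetical raw input, mimicking entries with trailing whitespace
raw_words <- c("All Hallows' Eve\n\t", "lantern\n\t", "prank\n\t", "lantern\n\t")

clean_words <- function(x) {
  x <- tolower(trimws(x))                            # drop "\n\t" and lowercase
  x <- unlist(strsplit(x, "[[:space:]]+"))           # split phrases into words
  x <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", x)    # strip edge punctuation
  unique(x[nchar(x) > 0])                            # drop empties and duplicates
}

words_clean <- clean_words(raw_words)
words_clean
```

Splitting phrases makes more candidates available for matching, at the cost of keeping filler words that would then need to be filtered out by hand.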

I obtained 258 words.

Once I had this vector of Halloween-related words, all I needed was a
way to compute a phonetic distance between each of them and a name. It
turned out to be more complicated than I thought! The soundex method of
the stringdist package returned only 0 or 1, so I set out to search for
another package supporting phonetic comparison. I found the
phonics package, which has a paper in JOSS and implements more
algorithms translating strings to phonetic “codes”. I ended up using
the phonics::nysiis() function, corresponding to the New York State
Identification and Intelligence System phonetic algorithm. That package
also has an algorithm suitable for German, via phonics::cologne(), and
one for French, via phonics::statcan(), but I haven’t explored that
further.

phonics::nysiis("pain")
## [1] "PAN"
phonics::nysiis("pane")
## [1] "PAN"
phonics::nysiis("train")
## [1] "TRAN"

I was at a loss as to how best to compare output from phonics::nysiis()
and very hackily used… stringdist::stringdist() with its default method!
Below is the function finding the closest match for any name part.

_word 
## [1] "bones"

So spookify_word() samples a word among the closest matches… that
aren’t necessarily very close to the original name. But that’s fine!
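To make the matching step concrete, here is a rough sketch of the idea on a toy vocabulary: encode everything with `phonics::nysiis()`, compute string distances between the codes with the default `stringdist::stringdist()` method, and keep the words at minimal distance. The toy word list and variable names are assumptions for illustration.

```r
library(phonics)
library(stringdist)

# Toy vocabulary standing in for the scraped Halloween words
halloween <- c("bones", "ghost", "witch", "coffin")

codes  <- nysiis(halloween)   # phonetic codes of the candidates
target <- nysiis("Jones")     # phonetic code of the name part

# Default stringdist method, applied to the codes rather than the words
distances <- stringdist(target, codes)

# All candidates tied for the smallest distance; one would be sampled
closest <- halloween[which(distances == min(distances, na.rm = TRUE))]

data.frame(word = halloween, code = codes, distance = distances)
closest
```

Using `which()` with `na.rm = TRUE` is a defensive choice here, in case some candidate cannot be encoded and yields `NA`.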

I finally created a function that takes a whole name, say “Maëlle Salmon”
or “Ada Colau Ballano”, tokenizes it into words thanks to the
tokenizers package, gets the closest match for each part, and then
collapses everything together, capitalizing first letters thanks to
snakecase… before announcing the result with the cool cowsay
package! I made sure to only use animals that are associated with
Halloween.
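The whole pipeline could be sketched roughly as follows. This is an approximation under assumptions, not the exact function from this post: the argument order `(name, seed, words)` follows the call shown later, the compact `spookify_word()` helper is included only so the block runs on its own, and the result is announced with `message()` instead of cowsay.

```r
library(phonics)
library(stringdist)
library(tokenizers)
library(snakecase)

# Nearest phonetic match, sampled among ties (sketch)
spookify_word <- function(word, words) {
  distances <- stringdist(nysiis(word), nysiis(words))
  closest <- words[which(distances == min(distances, na.rm = TRUE))]
  closest[sample.int(length(closest), 1)]
}

spookify <- function(name, seed, words) {
  set.seed(seed)                               # reproducible sampling
  parts <- tokenize_words(name)[[1]]           # lowercase tokens, punctuation stripped
  spooky <- vapply(parts, spookify_word, character(1),
                   words = words, USE.NAMES = FALSE)
  new_name <- paste(to_upper_camel_case(spooky), collapse = " ")
  # The post announces the result with cowsay; message() keeps this sketch light
  message("Byebye ", name, "\nYou are now ", new_name)
  invisible(new_name)
}

# Toy vocabulary; names with accents should be transliterated first
toy_words <- c("bones", "ghost", "witch", "coffin", "trick", "rotten")
result <- spookify("Tevin Conn", seed = 42, words = toy_words)
```

Setting the seed inside the function means the same name always maps to the same spooky name for a given seed, which is handy when sharing results.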

spookify 

And because I didn’t want to make fun of anyone other than me, I created
7 fake names using the charlatan package and ran spookify on them!

set.seed(42)
names 
## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ----- 
## Byebye Tyrik Graham PhD
## You are now Trick Grinning Giant 
##  ------ 
##        
##      
##               |
##               |
##               |
##              __
##           | /   |
##          _\  //_/
##           .'/()'.
##            \  //  [nosig]
## 

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
##  ----- 
## Byebye Jeremy Rippin
## You are now Horrify Rotten 
##  ------ 
##        
##        
##       
##        /___/
##        {o}{o}|
##         v  /|
##        |     
##         ___/_/       [ab] 
##           | |

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
##  ----- 
## Byebye Margarett Purdy
## You are now Boogers Party 
##  ------ 
##        
##        
##       
##        /___/
##        {o}{o}|
##         v  /|
##        |     
##         ___/_/       [ab] 
##           | |

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ----- 
## Byebye Miss Tilla Funk DDS
## You are now Ooze Vil Fangs Dead 
##  ------ 
##        
##      
##                   ___
##                ___)__|_
##           .-*'          '*-,
##          /      /|   |     
##         ;      /_|   |_     ;
##         ;   |           /|  ;
##         ;   | ''--...--'' |  ;
##            ''---.....--''  /
##           ''*-.,_______,.-*'  [nosig]
## 

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ----- 
## Byebye Wilton Carroll IV
## You are now Cartoon Cruella Eve 
##  ------ 
##        
##      
##               |
##               |
##               |
##              __
##           | /   |
##          _\  //_/
##           .'/()'.
##            \  //  [nosig]
## 

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ----- 
## Byebye Tevin Conn
## You are now Coffin Bones 
##  ------ 
##        
##      
##      .-.
##     (o o)
##     | O 
##         
##       `~~~' [nosig]
## 

## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ------------- 
## Byebye Dr. Ballard Langworth III
## You are now Dish Blood Cemetery Dish 
##  -------------- 
##                  
##                  
##                 
##         __.--'     .__./     /'--.__
##     _.-'       '.__.'    '.__.'       '-._
##   .'                                      '.
##  /                                          
## |                                            |
## |                                            |
##           .---.              .---.         /
##   '._    .'     '.''.    .''.'     '.    _.'
##      '-./              /           .-'
##                       ''mrf

Note that I kept the message I got from cowsay in the output. It’s not
too bad; I think cowsay is not designed for use in R Markdown.

I found the results not fantastic, but not too bad either! I especially
liked “Boogers Party”!

In conclusion, in this post I showed how to build a corpus of
Halloween-related words by responsibly webscraping, and how to more or
less hackily compare strings based on their pronunciation. I obviously
tried the function on my own name (without the accent, useless in this
case) and wasn’t too happy…

spookify("Maelle Salmon", 42, words)
## Colors cannot be applied in this environment :( Try using a terminal or RStudio.

## 
## 
##  ------------- 
## Byebye Maelle Salmon
## You are now Mucus Bulging 
##  -------------- 
##                  
##                  
##                 
##         __.--'     .__./     /'--.__
##     _.-'       '.__.'    '.__.'       '-._
##   .'                                      '.
##  /                                          
## |                                            |
## |                                            |
##           .---.              .---.         /
##   '._    .'     '.''.    .''.'     '.    _.'
##      '-./              /           .-'
##                       ''mrf

But then, at least, it is scary.



Source link https://www.r-bloggers.com/spookify-halloween-name-generation-in-r/
