#Rank in League of Legends
###Christopher Lee
###CMSC 320

##Introduction

###What is League of Legends?

In the multiplayer online battle arena game League of Legends (abbreviated LoL), players compete in teams of 5 on a battleground known as the Summoner’s Rift, where victory is achieved by destroying the enemy team’s nexus, the core of a team’s base. Just as the skills of a basketball player can be analyzed and compared quantitatively, so can the skills of LoL players. Winning in the Rift requires team coordination, game sense (referred to as “macro”), matchup understanding (referred to as “micro”), and a bit of luck.

Here’s a link from Riot (League’s company) to explain more on the basics of the game: https://na.leagueoflegends.com/en-us/how-to-play/

###Why League of Legends?

One thing that makes LoL unique and particularly interesting for data analysis is its ranking system. This ranking system is split into 9 tiers (the worst players are in a tier called “Iron” and the best in a tier called “Challenger”), each with 4 divisions (e.g., Bronze 4, Bronze 1), where a lower division number means a higher rank. To advance in tiers and divisions, a player must consistently win games. The current highest-rated player, a professional named Zven, sits at a 64% win rate with over 300 games played (211 wins and 117 losses). In contrast, a below-average player would see a win rate of around 40%.

The goal of many players is to sharpen their macro and micro skills in order to improve their game performance and ascend in rank.

Here’s an article that goes into detail on how important certain aspects of the game can be for overall improvement: https://mobalytics.gg/blog/path-to-improvement-in-league-legends/

###Prerequisite Vocabulary:

Champions: The “vessels” through which you play the game. There are over 140 champions, each with a unique set of abilities to use against other champions.

Gold: The currency of the game.

CS: “Creep score”. To make your champion stronger, you need to obtain gold by killing monsters called creeps; CS counts the creeps you have killed.

Items: What you spend your gold on. Items give stat bonuses that make your champion stronger, and are an integral part of a successful game.

Kills/Deaths/Assists: Another way to earn (or lose) gold is to kill another champion in battle. If, e.g., two players on one team kill an opponent, one player gets a kill, the other gets an assist, and the player killed gets a death.

Wards: Since the Summoner’s Rift is a massive map, it’s advantageous to have vision across all parts of it. Once a player places a ward somewhere on the map, that player’s team obtains vision in a small radius around the ward for a short period of time.

Here is some more fun vocabulary if anyone’s interested: https://mobalytics.gg/blog/league-of-legends-terms/

###Project Motivation:

In the LoL community, there’s a lot of speculation as to what players should focus on to improve their performance and increase their rank. The typical answers are: increase CS, die less, and place more wards (referred to as “warding”). These are the fundamental goals of a lot of players, and are perfected by the best. But are these actually the important aspects of the game that players should focus on?

This project investigates the importance of CS, Kills/Deaths/Assists, and Kill Participation (KP).

##Our Process

First, we’ll scrape information from the website OP.GG, an open-source data & game tracker frequently used by many players, and certainly pivotal for top players.

Here’s their website: https://na.op.gg

Next, we’ll tidy up the data, perform exploratory analysis and hypothesis testing, and finally create a predictor model with multiple regression.


First, we’ll load in some libraries.

#first lets import some libraries
library(rvest)
## Loading required package: xml2
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()         masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag()            masks stats::lag()
## x purrr::pluck()          masks rvest::pluck()
library(stringr)
library(multcompView)
library(broom)

##Obtaining Data

The objective for this section is to obtain data to operate on. In this code, we’ll randomly select 150 players for our data.

First, we want to scrape the total number of users so that our code is reproducible.

#next let's grab the total number of pages

#data url
total_url <- "https://na.op.gg/ranking/ladder/"

#let's scrape
total_pages <- total_url %>%
  read_html() %>%
  html_node(".ranking-pagination__desc") %>%
  html_text()

#we have our desired text
total_pages
## [1] "#1 ~ #100\n\t\t\t\t\t\t\t\t\t\t\t\t/ Total 1,599,994 Summoners\n\t\t\t\t\t"
#now let's use regex to grab the total number of summoners
total_summoners <- total_pages %>%
  str_extract("Total (\\d|,)+") %>%
  str_remove_all(',') %>%
  str_split(' ') 

#cast the total number of  summoners to an integer
total_summoners <- as.integer(total_summoners[[1]][2])
total_summoners
## [1] 1599994
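The extraction step can also be checked offline, without hitting the site. Here is a minimal sketch that re-parses a string shaped like the one printed above, using only base R. `parse_total` is a hypothetical helper name, not part of the original code.

```r
# Hypothetical helper: pull the summoner count out of the pagination text.
parse_total <- function(desc) {
  # Grab the "Total 1,599,994" fragment (empty result if it is absent)
  hit <- regmatches(desc, regexpr("Total [0-9,]+", desc))
  # Drop everything that is not a digit, then convert to an integer
  as.integer(gsub("[^0-9]", "", hit))
}

# Text modeled on the scraped output shown above
sample_desc <- "#1 ~ #100\n\t\t/ Total 1,599,994 Summoners\n\t"
parse_total(sample_desc)
```

If the page layout changes and the pattern no longer matches, this version returns a zero-length integer, which is easy to test for before continuing.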

Since there are 100 users per page, we need to divide the total number of users by 100. We also do some slight adjusting, since, e.g., 1001 users means there are 11 pages, where the 11th page has one user.

#since there are 100 players per page, divide by 100 to get the
#total number of pages
if (total_summoners %% 100 == 0) {
  #if the total number of summoners is a multiple of 100,
  #the division is exact
  total_pages <- total_summoners / 100
} else {
  #else we have to account for an "extra", partially filled page
  total_pages <- (total_summoners %/% 100) + 1
}

total_pages
## [1] 16000
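The page count is just a ceiling division, so the two branches can be collapsed into one expression. A small sketch (the helper name `pages_needed` is ours, for illustration only):

```r
# Hypothetical helper: number of pages needed to list n players.
pages_needed <- function(n, per_page = 100) {
  ceiling(n / per_page)
}

pages_needed(1001)     # ten full pages plus one more for the last player
pages_needed(1000)     # exactly ten full pages
pages_needed(1599994)  # the total scraped above
```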

Now, let’s randomly sample 50 total pages from all pages. From these pages, we’ll choose our players.

#now we want to randomly sample players from 
#all possible pages from the url below
pages_url <- "https://na.op.gg/ranking/ladder/page="

#generate 50 random pages to randomly select players from
#Without replacement
rand_pages <- sample(1:total_pages, 50)
rand_pages
##  [1] 14832  1872 14547  8842  1691 14490  4446  7997  9381  8291  2354  7572
## [13] 14085  7045  5415 11308  3590  2581  9498  6118  6705  8413 13100 12604
## [25] 13338 13802  9954  4626  3564  2429 13489 15734  4581 15099  7108 10021
## [37]  1032  6120 10589  3282  2073   908  3078  1634   249  2646  2759  5659
## [49]  3074 14172
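Since reproducibility is one of our goals, it may help to fix the random seed before sampling. The sketch below assumes a page count of 16000 (the value printed earlier) and an arbitrary seed chosen only for illustration.

```r
total_pages_demo <- 16000  # assumed page count, taken from the output above

set.seed(320)  # arbitrary seed, for illustration only
pages_a <- sample(seq_len(total_pages_demo), 50)

set.seed(320)  # resetting the seed reproduces the same draw
pages_b <- sample(seq_len(total_pages_demo), 50)

identical(pages_a, pages_b)    # same seed, same pages
anyDuplicated(pages_a) == 0    # sample() draws without replacement by default
```

Note that a seed only fixes which pages are chosen. The ladder itself changes over time, so the players found on those pages will still differ between runs.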

Here, we’ll scrape 3 randomly chosen usernames from each of our selected pages.

#create a list of player usernames 
usernames <- list()
i <- 1

for (page in rand_pages) {
  #add the page number to the pages url
  temp_url <- str_c(pages_url, page)
  
  #scrape the table of players from the current page 
  table_df <- temp_url %>%
    read_html() %>%
    html_node(".LadderRankingLayout") %>%
    html_node("table") %>%
    html_table()
  
  #generate 3 random row numbers to pick 3 usernames
  n <- sample(1:100, 3)
  
  for (j in n) {
    #add the username to the list of usernames
    usernames[[i]] <- table_df[j, 2]
    i <- i + 1 
  }

}

#150 randomly sampled usernames
usernames %>% head(10)
## [[1]]
## [1] "maigaren"
## 
## [[2]]
## [1] "1Chopper kun"
## 
## [[3]]
## [1] "Kyokeikeikashi"
## 
## [[4]]
## [1] "NoobItDown"
## 
## [[5]]
## [1] "theNotSoRUSSIAN"
## 
## [[6]]
## [1] "InflatedM"
## 
## [[7]]
## [1] "MrGrimm94"
## 
## [[8]]
## [1] "Dretep12"
## 
## [[9]]
## [1] "ßotÐiff"
## 
## [[10]]
## [1] "0mGkæx"

Let’s create a data frame with our selected attributes for our data.

#prep the data frame we'll be adding scraped data to
vect <- unlist(usernames)
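#stringsAsFactors = FALSE keeps Username and rank as plain strings,
#so scraped values can be assigned into them later
df <- data.frame("Username" = vect, "Kills" = 0, "Deaths" = 0, "Assists" = 0,
                 "CSmin" = 0, "CStotal" = 0, "KP" = 0, "rank" = "",
                 stringsAsFactors = FALSE)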
df %>%
  head(10)

To make the next process easier, we create some functions that scrape specific data.

The first function, get_info(), scrapes the part of a player’s page that holds their most recent game, which contains the nodes we’re interested in.

After that, get_kda() returns a list with a player’s Kills/Deaths/Assists from their most recent game.

#get recent game info html 
get_info <- function(name) {
  #now that we have our players, let's 
  #take some stats from their most recent ranked game
  user <- name
  user_url <- "https://na.op.gg/summoner/userName="

  temp_user_url <- str_c(user_url, user)

  #grab some general info from the most recently played game
  temp_user_url %>%
    read_html() %>%
    html_node(".GameItemWrap")  %>%
    html_node(".Content")
  # returns game_info 

}
  

get_kda <-  function(game_html) {

  #We want to focus on 2 classes:
  #KDA, STATS
  
  #grab the KDA of the player
  game_html %>%
    html_node(".KDA") %>%
    html_text() %>%
    str_remove_all("\t") %>%
    str_remove_all(" ") %>%
    str_remove_all("/") %>%
    str_split("\n")
}
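To make the string cleanup easier to follow, here is a rough offline sketch of the same idea: pulling three counts out of scraped KDA text. The input string is made up, and the assumption that kills, deaths, and assists are the first three numbers in the text is ours. The real page layout may differ.

```r
# Hypothetical helper: extract kills, deaths, assists from scraped KDA text.
# Assumes the first three integers in the text are K, D, A in that order.
parse_kda <- function(kda_text) {
  nums <- as.integer(regmatches(kda_text, gregexpr("[0-9]+", kda_text))[[1]])
  stopifnot(length(nums) >= 3)
  c(kills = nums[1], deaths = nums[2], assists = nums[3])
}

# Made-up text in the assumed layout, not real OP.GG output
demo_kda <- parse_kda("\n\t\t7 /\n\t\t3 /\n\t\t12\n")
demo_kda
```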

The next two functions scrape the total CS and CS per minute from a player’s most recent game, and then the kill participation (KP) from the same game.

#grab the stats for a particular player
get_cs <- function(game_html){
  #use the html passed in, rather than a global variable
  stats <- game_html %>%
    html_node(".Stats") %>%
    html_text() %>%
    str_remove_all("\t") %>%
    str_split("\n") %>%
    unlist()
  
  #now, let's take the stat info we want with some regex
  
  #grab some cs stats: total CS first, then CS per minute
  str_remove_all(stats[6], "\\)") %>%
    str_remove_all("\\(") %>%
    str_split(" ") %>%
    unlist()
}

get_kp <- function(game_html) {
  #use the html passed in, rather than a global variable
  stats <- game_html %>%
    html_node(".Stats") %>%
    html_text() %>%
    str_remove_all("\t") %>%
    str_split("\n") %>%
    unlist()
  #grab the kill participation (KP)
  str_remove(stats[9], "\\%") %>%
    str_split(" ") %>%
    unlist()
}

Finally, we scrape the rank of the player.

#now lets scrape  their rank
get_rank <- function(name) {
  user <- name
  user_url <- "https://na.op.gg/summoner/userName="

  temp_user_url <- str_c(user_url, user)
  
  temp_user_url %>%
    read_html() %>%
    html_node(".TierRankInfo") %>%
    html_node(".TierRank") %>%
    html_text()

}

We have a corner case in which the user has no games listed on their op.gg. In this case, we’ll just fill out NA for missing data. We return TRUE here if there are no results (no data) to scrape from this player.

check_results <- function(name) {
  user <- name
  user_url <- "https://na.op.gg/summoner/userName="
  
  temp_user_url <- str_c(user_url, user)
  
  checker <- temp_user_url %>%
    read_html() %>%
    html_text()
  
  res <- checker %>%
    str_extract("There are no results recorded.")
  
  if (is.na(res)) {
    FALSE
  } else {
    #TRUE means that there are no results; cant use
    TRUE
  }
}
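The check itself boils down to a literal substring test. A minimal sketch on plain strings (no scraping involved; `has_no_results` is a hypothetical name):

```r
# Hypothetical helper: TRUE when the page text contains the "no results" notice.
has_no_results <- function(page_text) {
  # fixed = TRUE matches the sentence literally, so "." is not a regex wildcard
  grepl("There are no results recorded.", page_text, fixed = TRUE)
}

has_no_results("... There are no results recorded. ...")
has_no_results("Ranked Solo: Gold 4")
```

Returning the logical value directly avoids the if/else, and the literal match means the trailing period has to be a real period.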

In this code chunk, we loop through the list of usernames and call each of the aforementioned functions to scrape data for a particular user.

One thing to note: OP.GG encodes spaces in usernames as “%20” in its URLs. That is, if a username is “Cats Better Than Dogs”, we need to scrape from the url: “https://na.op.gg/summoner/userName=Cats%20Better%20Than%20Dogs”.
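Base R can do this kind of percent-encoding for us. Here is a small sketch using `utils::URLencode`; the wrapper name `encode_name` is just for illustration.

```r
# utils::URLencode ships with base R and percent-encodes unsafe characters
encode_name <- function(name) {
  utils::URLencode(name)
}

user_url <- "https://na.op.gg/summoner/userName="
paste0(user_url, encode_name("Cats Better Than Dogs"))
```

Unlike a plain space replacement, this also encodes other characters that are not URL-safe. Whether OP.GG accepts names with special characters in that encoded form is an assumption we have not verified.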

After data is scraped for each username, we add the data to our data frame.

#here we will call the functions used to scrape stats for all players
#and then add it to  our data frame. 

i <- 1 #iteration through the data frame when  adding content

user_list <- unlist(usernames)

#let's loop through our usernames and call get_info
for (player_username in user_list) {
  #op gg handles spaces in urls with '%20'
  player_username <- str_replace_all(player_username, " ", "%20")

  #check if there is data to scrape
  if (check_results(player_username) == TRUE) {
    #if there's nothing to scrape, use NA
    df[i,] = NA
  } else {  
    
    game_info <- get_info(player_username)  #scrape the most recent game information
    kda <- get_kda(game_info) #grab the kda of the player
    #skip the first two entries; use  entries 3-5
    kills <- kda[[1]][3]
    deaths <- kda[[1]][4]
    assists <- kda[[1]][5]
    
    cs_stats <- get_cs(game_info)
    CStotal <- cs_stats[1]
    CSmin <- cs_stats[2]
    
    kp <- get_kp(game_info)
    kp <- kp[2]
    
    rank <- get_rank(player_username)
    
    #now that we have all our data for a player, add to our frame
    df[i, 2] <- kills 
    df[i, 3] <- deaths
    df[i, 4] <- assists
    df[i, 5] <- CSmin
    df[i, 6] <- CStotal
    df[i, 7] <- kp
    df[i, 8] <- rank
  }
  #increment df index
  i <- i + 1
}

#after a few minutes, we should have our frame
df %>%
  head(10)
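A long scraping loop like this can stop partway through if a single request fails. One possible safeguard, sketched below with stand-in functions instead of real requests, is to wrap each scraping call so that a failure yields NA rather than an error. `safely_scrape` and the two stand-ins are hypothetical names.

```r
# Hypothetical wrapper: run a scraping function, returning NA if it fails.
safely_scrape <- function(scrape_fn, ...) {
  tryCatch(
    scrape_fn(...),
    error = function(e) {
      message("Scrape failed: ", conditionMessage(e))
      NA
    }
  )
}

# Stand-in functions so the sketch runs without network access
always_fails <- function(name) stop("could not reach page for ", name)
always_works <- function(name) paste("stats for", name)

safely_scrape(always_fails, "maigaren")
safely_scrape(always_works, "maigaren")
```

In the real loop, something like `safely_scrape(get_info, player_username)` could take the place of the direct call, and a short pause such as `Sys.sleep(1)` between players would keep the request rate modest.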

##Tidying Our Data

Now that we’ve successfully scraped all our data, it’s time to do some tidying before we do some processing.

We start by adding the list of usernames back into the Username column of our data frame. From there, we separate the tier from the division, since we’re only interested in differences between tiers. Finally, we change some data types and factorize the tiers.

#let's put the user names back in, and we have our data frame
df[1] <- user_list

#now we have to tidy up rank
#we wont focus on divisions; we'll stick to tiers. 

#get rid of the number in rank
temp_df <- df %>%
  mutate(temp_tier=str_split(df$rank, " "), tier="") 

i <- 1
for (arr in temp_df$temp_tier) {
  temp_df[i, 10] <- arr[1]
  i <- i + 1
}

# change the datatypes of the numeric attributes
# and select the attributes you want to keep
almost_tidy_df <- temp_df %>%
  mutate(Kills=as.integer(Kills), Deaths=as.integer(Deaths), Assists=as.integer(Assists),
         CSmin=as.double(CSmin), CStotal=as.integer(CStotal), KP=as.integer(KP)) %>%
  select(-rank)

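#last thing we want to do is to order the tiers; that is,
#we need to express that, e.g., Gold is a higher tier than Silver.
#listing the levels from lowest to highest does this; any value
#not in this list (assumed not to occur here) would become NA

tidy_data <- almost_tidy_df %>%
  mutate(factor_tier=factor(tier, levels=c("Iron", "Bronze", "Silver", "Gold",
                                           "Platinum", "Diamond", "Master",
                                           "Grandmaster", "Challenger")))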
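The tier/division split can also be done without a loop. Here is a minimal sketch on made-up rank strings in the “Tier Division” form described in the introduction; how OP.GG prints ranks that have no division is an assumption here.

```r
# Made-up rank strings, not scraped data
ranks <- c("Gold 4", "Bronze 1", "Silver 2", "Challenger")

# Split each string on the space, then take the pieces we need
parts <- strsplit(ranks, " ", fixed = TRUE)
tier <- vapply(parts, function(p) p[1], character(1))
division <- as.integer(vapply(parts, function(p) p[2], character(1)))

data.frame(rank = ranks, tier = tier, division = division)
```

A rank with no number after the tier name simply ends up with NA as its division.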

##Exploratory Data Analysis

#let's look at a box plot of CS per minute for each tier
tidy_data %>%
  ggplot(mapping = aes(x=factor_tier, y=CSmin)) +
  geom_boxplot() + 
  labs(title="Distributions of CS per minute For Different Ranks",
      x="Tier",
      y="CS per minute")

#let's look at a box plot of kills + assists for each tier
tidy_data %>%
  ggplot(mapping = aes(x=factor_tier, y=((Kills+Assists)))) +
  geom_boxplot() + 
  labs(title="Distributions of Kills + Assists in a Game For Different Ranks",
      x="Tier",
      y="Kills and Assists")

#let's look at a box plot of deaths for each tier
tidy_data %>%
  ggplot(mapping = aes(x=factor_tier, y=(Deaths))) +
  geom_boxplot() + 
  labs(title="Distributions of Deaths in a Game For Different Ranks",
      x="Tier",
      y="Deaths")

#let's look at a box plot of kill participation for each tier
tidy_data %>%
  ggplot(mapping = aes(x=factor_tier, y=(KP))) +
  geom_boxplot() + 
  labs(title="Distributions of Kill Participation in a Game For Different Ranks",
      x="Tier",
      y="Kill Participation (Percent)")

#it looks like csmin, kills+assists, kp could all explain rank.
#let's test the significance of them

#what we want to do now is find averages of all the above
#attributes for each rank and test if there's a 
#statistically significant difference. 


#let's use ANOVA to perform a multi-comparison test
#and follow up with the Tukey Method.

#let's focus on a few contiguous tiers
anova_data <- tidy_data %>%
  filter(tier=="Gold" | tier=="Silver" | tier=="Bronze")
anova_data %>%
  head(10)
model <- lm(anova_data$CSmin ~ anova_data$tier)
ANOVA = aov(model)

TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)

plot(TUKEY , las=1 , col="brown")

model <- lm(anova_data$Deaths ~ anova_data$tier)
ANOVA = aov(model)

TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)

plot(TUKEY , las=1 , col="brown")

model <- lm(anova_data$KP ~ anova_data$tier)
ANOVA = aov(model)

TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)

plot(TUKEY , las=1 , col="brown")
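Besides the plots, the Tukey results can be read as a table of pairwise differences with adjusted p-values. The sketch below runs the same ANOVA-then-Tukey sequence on simulated data. All numbers are made up and say nothing about the real players.

```r
set.seed(1)  # arbitrary seed so the simulated data is repeatable

# Simulated CS per minute for three tiers (made-up means, 20 players each)
sim <- data.frame(
  tier  = factor(rep(c("Bronze", "Silver", "Gold"), each = 20),
                 levels = c("Bronze", "Silver", "Gold")),
  CSmin = c(rnorm(20, mean = 4.5), rnorm(20, mean = 5.0), rnorm(20, mean = 6.5))
)

fit   <- aov(CSmin ~ tier, data = sim)
tukey <- TukeyHSD(fit, "tier", conf.level = 0.95)

# One row per pair of tiers: difference, confidence bounds, adjusted p-value
tukey$tier

# Pairs whose adjusted p-value falls below 0.05
significant <- tukey$tier[, "p adj"] < 0.05
significant
```

Passing `data = sim` to `aov()` keeps the factor name short (`tier`), so the result is indexed as `tukey$tier` rather than by a name containing `$`.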

rank_fit <- lm(KP~CSmin*Deaths*Kills*Assists*tier, data=anova_data)
rank_fit_stats <- rank_fit %>%
  tidy()
rank_fit_stats %>% knitr::kable()
| term | estimate | std.error | statistic | p.value |
|:-----|---------:|----------:|----------:|--------:|
| (Intercept) | 187.6149885 | 168.4087030 | 1.1140457 | 0.2684371 |
| CSmin | -24.0296854 | 29.3871912 | -0.8176925 | 0.4158462 |
| Deaths | -27.3493446 | 25.2646715 | -1.0825134 | 0.2821222 |
| Kills | -47.3697282 | 45.4550497 | -1.0421225 | 0.3003450 |
| Assists | -26.5202838 | 21.1870442 | -1.2517217 | 0.2141457 |
| tierGold | -163.2207725 | 173.1922085 | -0.9424256 | 0.3486774 |
| tierSilver | -186.0442359 | 168.9125787 | -1.1014232 | 0.2738586 |
| CSmin:Deaths | 4.1672392 | 4.4762036 | 0.9309762 | 0.3545338 |
| CSmin:Kills | 8.9760868 | 7.9736349 | 1.1257208 | 0.2634898 |
| Deaths:Kills | 8.7260945 | 7.2751034 | 1.1994461 | 0.2337269 |
| CSmin:Assists | 5.2379584 | 4.0228559 | 1.3020497 | 0.1964585 |
| Deaths:Assists | 5.3525996 | 3.6325216 | 1.4735217 | 0.1443481 |
| Kills:Assists | 9.2592344 | 6.2916087 | 1.4716800 | 0.1448440 |
| CSmin:tierGold | 20.4079933 | 30.9546191 | 0.6592875 | 0.5115146 |
| CSmin:tierSilver | 26.9441410 | 29.5130523 | 0.9129568 | 0.3638782 |
| Deaths:tierGold | 34.5104026 | 26.4710290 | 1.3037046 | 0.1958960 |
| Deaths:tierSilver | 28.1711488 | 25.4012958 | 1.1090438 | 0.2705764 |
| Kills:tierGold | 53.6870441 | 45.7219487 | 1.1742073 | 0.2436303 |
| Kills:tierSilver | 50.8091531 | 46.0231905 | 1.1039902 | 0.2727499 |
| Assists:tierGold | 29.6923595 | 21.6209253 | 1.3733159 | 0.1733088 |
| Assists:tierSilver | 30.4587302 | 21.3301413 | 1.4279666 | 0.1570089 |
| CSmin:Deaths:Kills | -1.5479563 | 1.3279005 | -1.1657171 | 0.2470282 |
| CSmin:Deaths:Assists | -1.0139605 | 0.6974456 | -1.4538203 | 0.1497223 |
| CSmin:Kills:Assists | -1.8603759 | 1.2467778 | -1.4921471 | 0.1394069 |
| Deaths:Kills:Assists | -1.8072120 | 1.1254415 | -1.6057804 | 0.1120752 |
| CSmin:Deaths:tierGold | -4.9541454 | 4.8366444 | -1.0242939 | 0.3086373 |
| CSmin:Deaths:tierSilver | -3.8917650 | 4.5365676 | -0.8578655 | 0.3934078 |
| CSmin:Kills:tierGold | -8.8496509 | 8.0229087 | -1.1030477 | 0.2731566 |
| CSmin:Kills:tierSilver | -9.1076791 | 8.0541316 | -1.1308083 | 0.2613541 |
| Deaths:Kills:tierGold | -10.0937236 | 7.3119894 | -1.3804347 | 0.1711153 |
| Deaths:Kills:tierSilver | -8.9371809 | 7.3171299 | -1.2214053 | 0.2253496 |
| CSmin:Assists:tierGold | -5.3884447 | 4.1555018 | -1.2967013 | 0.1982847 |
| CSmin:Assists:tierSilver | -5.4953345 | 4.0596483 | -1.3536479 | 0.1794799 |
| Deaths:Assists:tierGold | -5.9917401 | 3.7339990 | -1.6046443 | 0.1123255 |
| Deaths:Assists:tierSilver | -5.4830902 | 3.6452594 | -1.5041701 | 0.1362883 |
| Kills:Assists:tierGold | -9.5675929 | 6.3064938 | -1.5171018 | 0.1329956 |
| Kills:Assists:tierSilver | -9.5511782 | 6.3217476 | -1.5108446 | 0.1345809 |
| CSmin:Deaths:Kills:Assists | 0.3532898 | 0.2278623 | 1.5504533 | 0.1247924 |
| CSmin:Deaths:Kills:tierGold | 1.6911244 | 1.3348687 | 1.2668844 | 0.2086980 |
| CSmin:Deaths:Kills:tierSilver | 1.5328813 | 1.3358551 | 1.1474907 | 0.2544366 |
| CSmin:Deaths:Assists:tierGold | 1.1095381 | 0.7257982 | 1.5287143 | 0.1300924 |
| CSmin:Deaths:Assists:tierSilver | 0.9895571 | 0.7019765 | 1.4096728 | 0.1623281 |
| CSmin:Kills:Assists:tierGold | 1.8604594 | 1.2490844 | 1.4894585 | 0.1401119 |
| CSmin:Kills:Assists:tierSilver | 1.8910587 | 1.2508113 | 1.5118657 | 0.1343212 |
| Deaths:Kills:Assists:tierGold | 1.8958073 | 1.1274658 | 1.6814765 | 0.0963850 |
| Deaths:Kills:Assists:tierSilver | 1.8271949 | 1.1273387 | 1.6208038 | 0.1088078 |
| CSmin:Deaths:Kills:Assists:tierGold | -0.3653061 | 0.2282580 | -1.6004085 | 0.1132625 |
| CSmin:Deaths:Kills:Assists:tierSilver | -0.3526437 | 0.2282084 | -1.5452708 | 0.1260401 |
rank_fit %>%
  augment() %>%
  ggplot(aes(x=.fitted, y=.resid, color=tier)) + 
  geom_point()
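The coefficient table has 48 rows because the `*` operator in the formula expands to every main effect plus every possible interaction. The sketch below, on simulated data with made-up values, counts the coefficients a formula implies before fitting it. It may be useful when weighing the full interaction model against a simpler additive one.

```r
set.seed(2)  # arbitrary seed so the simulated data is repeatable
n <- 150

# Simulated stand-in for anova_data; all values are made up
sim <- data.frame(
  CSmin   = runif(n, 3, 8),
  Deaths  = rpois(n, 5),
  Kills   = rpois(n, 5),
  Assists = rpois(n, 7),
  tier    = factor(sample(c("Bronze", "Silver", "Gold"), n, replace = TRUE),
                   levels = c("Bronze", "Silver", "Gold"))
)

# Each column of the model matrix corresponds to one fitted coefficient
full_terms <- ncol(model.matrix(~ CSmin * Deaths * Kills * Assists * tier,
                                data = sim))
additive_terms <- ncol(model.matrix(~ CSmin + Deaths + Kills + Assists + tier,
                                    data = sim))

c(full = full_terms, additive = additive_terms)
```

With only 150 players, and fewer after filtering to three tiers, spreading the data across this many coefficients leaves little information for each one, which is one plausible reason for the large standard errors in the table.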