#Rank in League of Legends
###Christopher Lee
###CMSC 320
##Introduction
###What is League of Legends?
In the multiplayer online battle arena game League of Legends (abbreviated LoL), players compete in teams of 5 on a battleground known as Summoner’s Rift, where victory is achieved by destroying the enemy team’s nexus, the core of a team’s base. Just as the skills of a basketball player can be analyzed and compared quantitatively, so can those of LoL players. Winning in the Rift requires team coordination, game sense (referred to as “macro”), matchup understanding (referred to as “micro”), and a bit of luck.
Here’s a link from Riot (League’s company) to explain more on the basics of the game: https://na.leagueoflegends.com/en-us/how-to-play/
###Why League of Legends?
One thing that makes LoL unique and particularly interesting for data analysis is its ranking system. The system is split into 9 tiers (the worst players are in a tier called “Iron” and the best in a tier called “Challenger”), most of which are split into 4 divisions (e.g., Bronze 4, Bronze 1), where a lower division number is the higher rank. To advance in tiers and divisions, a player must consistently win games. At the time of writing, the highest-rated player, a professional named Zven, sits at a 64% win rate over more than 300 games played (211 wins and 117 losses). In contrast, a below-average player would see a win rate of around 40%.
The goal of many players is to sharpen their macro and micro skills in order to improve their game performance and ascend in rank.
Here’s an article that goes into detail on how important certain aspects of the game can be for overall improvement: https://mobalytics.gg/blog/path-to-improvement-in-league-legends/
###Prerequisite Vocabulary:
Champions: These are the “vessels” through which you play the game. There are over 140 champions, and each has a unique set of abilities it can use to fight other champions.
Gold: The currency for the game.
CS: Short for “creep score”, the number of monsters (called creeps) a player has killed. Killing creeps earns the gold you need to make your champion stronger.
Items: Items are what you spend your gold on. Items give stat bonuses that make your champion stronger, and they are an integral part of a successful game.
Kills/Deaths/Assists: Another way to earn (or lose) gold is to kill another champion in battle. If, e.g., two players on one team kill an opponent, one player would get a kill, the other an assist, and the player killed would get a death.
Wards: Since Summoner’s Rift is a massive map, it’s advantageous to have vision across all parts of the map. Once a player places a ward somewhere on the map, that player’s team obtains vision in a small radius surrounding the ward for a short period of time.
Here is some more fun vocabulary if anyone’s interested: https://mobalytics.gg/blog/league-of-legends-terms/
###Project Motivation:
In the LoL community, there’s a lot of speculation as to what players should focus on to improve their performance and increase their rank. The typical answers are: increase CS, die less, and place more wards (referred to as “warding”). These are the fundamental goals of many players, and they are perfected by the best. But are these actually the important aspects of the game that players should focus on?
This project investigates the importance of CS, Kills/Deaths/Assists, and Kill Participation (KP).
##Our Process
First, we’ll scrape information from the website OP.GG, a publicly available stats and game tracker used by many players, and certainly pivotal for top players.
Here’s their website: https://na.op.gg
Next, we’ll tidy up the data, perform exploratory analysis and hypothesis testing, and finally create a predictor model with multiple regression.
First, we’ll load in some libraries.
#first let's import some libraries
library(rvest)
## Loading required package: xml2
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
## x purrr::pluck() masks rvest::pluck()
library(stringr)
library(multcompView)
library(broom)
##Obtaining Data
The objective for this section is to obtain data to operate on. In this code, we’ll randomly select 150 players for our data.
First, we want to scrape the total number of users so that our code is reproducible.
#next let's grab the total number of pages
#data url
total_url <- "https://na.op.gg/ranking/ladder/"
#let's scrape
total_pages <- total_url %>%
read_html() %>%
html_node(".ranking-pagination__desc") %>%
html_text()
#we have our desired text
total_pages
## [1] "#1 ~ #100\n\t\t\t\t\t\t\t\t\t\t\t\t/ Total 1,599,994 Summoners\n\t\t\t\t\t"
#now let's use regex to grab the total number of summoners
total_summoners <- total_pages %>%
str_extract("Total (\\d|,)+") %>%
str_remove_all(',') %>%
str_split(' ')
#cast the total number of summoners to an integer
total_summoners <- as.integer(total_summoners[[1]][2])
total_summoners
## [1] 1599994
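As a cross-check that needs no network access, the same extraction can be sketched in base R on the text shown above. The variable names `pagination_text`, `total_match`, and `total_count` are illustrative only and are not part of the original pipeline.

```r
# Sample text copied from the scraped output shown above
pagination_text <- "#1 ~ #100\n\t\t\t\t\t\t\t\t\t\t\t\t/ Total 1,599,994 Summoners\n\t\t\t\t\t"

# Pull out the "Total 1,599,994" fragment
total_match <- regmatches(pagination_text,
                          regexpr("Total [0-9,]+", pagination_text))

# Drop everything that is not a digit, then convert to an integer
total_count <- as.integer(gsub("[^0-9]", "", total_match))
total_count
```

Stripping every non-digit character in one step avoids having to split the string and index into the result.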
Since there are 100 users per page, we divide the total number of users by 100 and round up. For example, 1001 users means there are 11 pages, where the 11th page has a single user.
#there are 100 players per page, so use integer division to get the
#total number of pages
total_pages <- 0
if (total_summoners %% 100 == 0) {
  #if the total number of summoners is a multiple of 100,
  #the division is exact
  total_pages <- total_summoners %/% 100
} else {
  #else we have to account for an "extra", partially filled page of players
  total_pages <- (total_summoners %/% 100) + 1
}
total_pages
## [1] 16000
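The page count is a ceiling division, which can also be written without a branch. The helper below is a minimal sketch; the name `page_count` and its `per_page` argument are assumptions made for illustration.

```r
# Ceiling division using only integer arithmetic:
# adding (per_page - 1) before dividing rounds any partial page up
page_count <- function(n_players, per_page = 100) {
  (n_players + per_page - 1) %/% per_page
}

page_count(1599994)  # the total scraped above
page_count(1001)     # one player spills onto an extra page
page_count(1000)     # exact multiple, no extra page
```

For the three calls above this gives 16000, 11, and 10 pages respectively.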
Now, let’s randomly sample 50 pages from all available pages. From these pages, we’ll choose our players.
#now we want to randomly sample players from
#all possible pages from the url below
pages_url <- "https://na.op.gg/ranking/ladder/page="
#generate 50 random pages to randomly select players from
#(sampled without replacement)
rand_pages <- sample(1:total_pages, 50)
rand_pages
## [1] 14832 1872 14547 8842 1691 14490 4446 7997 9381 8291 2354 7572
## [13] 14085 7045 5415 11308 3590 2581 9498 6118 6705 8413 13100 12604
## [25] 13338 13802 9954 4626 3564 2429 13489 15734 4581 15099 7108 10021
## [37] 1032 6120 10589 3282 2073 908 3078 1634 249 2646 2759 5659
## [49] 3074 14172
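Because `sample()` draws different pages on every run, the selection above will not repeat exactly. If a repeatable draw is wanted, one option is to fix the random seed first. This is a sketch only: the seed value 320 is arbitrary, and `total_pages_demo` and `rand_pages_demo` are stand-in names so the snippet runs on its own.

```r
# Stand-in for the page count computed earlier
total_pages_demo <- 16000

# Fixing the seed makes the random draw repeatable across runs
set.seed(320)
rand_pages_demo <- sample(seq_len(total_pages_demo), 50)

# sample() draws without replacement by default, so no page repeats
length(rand_pages_demo)
anyDuplicated(rand_pages_demo)
```

Note that a fixed seed only makes the page numbers repeatable. The players listed on each page still change as the ladder updates.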
Here, we’ll randomly scrape 3 player usernames from each of our selected pages, for 150 usernames in total.
#create a list of player usernames
usernames <- list()
i <- 1
for (page in rand_pages) {
#add the page number to the pages url
temp_url <- str_c(pages_url, page)
#scrape the table of players from the current page
table_df <- temp_url %>%
read_html() %>%
html_node(".LadderRankingLayout") %>%
html_node("table") %>%
html_table()
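#pick 3 random row numbers to select 3 usernames
#(sample from the rows actually present, since the last page can hold fewer than 100 players)
n <- sample(seq_len(nrow(table_df)), 3)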
for (j in n) {
#add the username to the list of usernames
usernames[[i]] <- table_df[j, 2]
i <- i + 1
}
}
#150 randomly sampled usernames
usernames %>% head(10)
## [[1]]
## [1] "maigaren"
##
## [[2]]
## [1] "1Chopper kun"
##
## [[3]]
## [1] "Kyokeikeikashi"
##
## [[4]]
## [1] "NoobItDown"
##
## [[5]]
## [1] "theNotSoRUSSIAN"
##
## [[6]]
## [1] "InflatedM"
##
## [[7]]
## [1] "MrGrimm94"
##
## [[8]]
## [1] "Dretep12"
##
## [[9]]
## [1] "ßotÐiff"
##
## [[10]]
## [1] "0mGkæx"
Let’s create a data frame with our selected attributes for our data.
#prep the data frame we'll be adding scraped data to
#(keep text columns as character so scraped values can be assigned later)
vect <- unlist(usernames)
df <- data.frame("Username" = c(vect), "Kills" = 0, "Deaths" = 0, "Assists" = 0, "CSmin" = 0, "CStotal" = 0, "KP" = 0, "rank" = "", stringsAsFactors = FALSE)
df %>%
head(10)
To make the next process easier, we create some functions that scrape specific data.
The first function, get_info(), scrapes most of the html page, where the nodes of the information we’re interested in lie.
After that, get_kda returns a list with a player’s Kills/Deaths/Assists from their most recent game.
#get recent game info html
get_info <- function(name) {
#now that we have our players, let's
#take some stats from their most recent ranked game
user <- name
user_url <- "https://na.op.gg/summoner/userName="
temp_user_url <- str_c(user_url, user)
#grab some general info from the most recently played game
temp_user_url %>%
read_html() %>%
html_node(".GameItemWrap") %>%
html_node(".Content")
# returns game_info
}
get_kda <- function(game_html) {
#We want to focus on 2 classes:
#KDA, STATS
#grab the KDA of the player
game_html %>%
html_node(".KDA") %>%
html_text() %>%
str_remove_all("\t") %>%
str_remove_all(" ") %>%
str_remove_all("/") %>%
str_split("\n")
}
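To see the parsing idea in isolation, here is a small offline sketch in base R. The input string `kda_text` is hypothetical: it only imitates the whitespace-heavy "kills / deaths / assists" text a KDA node might contain, and the real page text may differ. Instead of removing characters and splitting on newlines, it extracts the runs of digits directly.

```r
# Hypothetical text standing in for html_text() on a KDA node
kda_text <- "\n\t\t7 /\n\t\t2 /\n\t\t11\n"

# Find every run of digits in the text
kda_numbers <- regmatches(kda_text, gregexpr("[0-9]+", kda_text))[[1]]

# Keep the first three numbers and label them
kda <- setNames(as.integer(kda_numbers[1:3]),
                c("kills", "deaths", "assists"))
kda
```

Extracting digits this way does not depend on which position each value lands in after splitting, although it still assumes the first three numbers in the text are kills, deaths, and assists, in that order.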
The next two functions scrape the CS per minute and total CS from a player’s most recent game, and then the kill participation (KP) from the same game.
#grab the CS stats for a particular player
get_cs <- function(game_html) {
  #use the html passed in as an argument rather than a global variable
  stats <- game_html %>%
    html_node(".Stats") %>%
    html_text() %>%
    str_remove_all("\t") %>%
    str_split("\n") %>%
    unlist()
  #now, let's take the stat info we want with some regex
  #grab some cs stats: the total CS first, the CS per minute second
  str_remove_all(stats[6], "\\)") %>%
    str_remove_all("\\(") %>%
    str_split(" ") %>%
    unlist()
}
#grab the kill participation for a particular player
get_kp <- function(game_html) {
  #use the html passed in as an argument rather than a global variable
  stats <- game_html %>%
    html_node(".Stats") %>%
    html_text() %>%
    str_remove_all("\t") %>%
    str_split("\n") %>%
    unlist()
  #grab the kill participation (KP)
  str_remove(stats[9], "\\%") %>%
    str_split(" ") %>%
    unlist()
}
Finally, we scrape the rank of the player.
#now lets scrape their rank
get_rank <- function(name) {
user <- name
user_url <- "https://na.op.gg/summoner/userName="
temp_user_url <- str_c(user_url, user)
temp_user_url %>%
read_html() %>%
html_node(".TierRankInfo") %>%
html_node(".TierRank") %>%
html_text()
}
We have a corner case in which the user has no games listed on their OP.GG page. In this case, we’ll just fill in NA for the missing data. The function below returns TRUE if there are no results (no data) to scrape for this player.
check_results <- function(name) {
user <- name
user_url <- "https://na.op.gg/summoner/userName="
temp_user_url <- str_c(user_url, user)
checker <- temp_user_url %>%
read_html() %>%
html_text()
res <- checker %>%
str_extract("There are no results recorded.")
if (is.na(res)) {
FALSE
} else {
#TRUE means that there are no results, so this player can't be used
TRUE
}
}
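The text check at the heart of `check_results()` can be tried offline. The sketch below is a base R variant that works on page text that has already been downloaded. The helper name `has_no_results` and both sample strings are made up for illustration.

```r
# Returns TRUE when the page text contains the "no results" message
has_no_results <- function(page_text) {
  # fixed = TRUE matches the message literally, so the trailing "."
  # is not treated as a regex wildcard
  grepl("There are no results recorded.", page_text, fixed = TRUE)
}

# Hypothetical page texts for the two cases
empty_profile  <- "Ranked Solo ... There are no results recorded. ..."
active_profile <- "Ranked Solo ... Victory 25m 13s ..."

has_no_results(empty_profile)
has_no_results(active_profile)
```

Because `grepl()` already returns a logical value, no `if`/`else` is needed to turn the match into TRUE or FALSE.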
In this code chunk, we loop through the list of usernames and call each of the aforementioned functions to scrape data for a particular user.
One thing to note: Op.GG handles spaces in names for their url with “%20”. That is, if a username is “Cats Better Than Dogs”, we need to scrape from the url: “https://na.op.gg/summoner/userName=Cats%20Better%20Than%20Dogs”.
After data is scraped for each username, we add the data to our data frame.
#here we will call the functions used to scrape stats for all players
#and then add it to our data frame.
i <- 1 #iteration through the data frame when adding content
user_list <- unlist(usernames)
#let's loop through our usernames and call get_info
for (player_username in user_list) {
#op gg handles spaces in urls with '%20'
player_username <- str_replace_all(player_username, " ", "%20")
#check if there is data to scrape
if (check_results(player_username) == TRUE) {
#if there's nothing to scrape, use NA
df[i,] = NA
} else {
game_info <- get_info(player_username) #scrape the most recent game information
kda <- get_kda(game_info) #grab the kda of the player
#skip the first two entries; use entries 3-5
kills <- kda[[1]][3]
deaths <- kda[[1]][4]
assists <- kda[[1]][5]
cs_stats <- get_cs(game_info)
CStotal <- cs_stats[1]
CSmin <- cs_stats[2]
kp <- get_kp(game_info)
kp <- kp[2]
rank <- get_rank(player_username)
#now that we have all our data for a player, add to our frame
df[i, 2] <- kills
df[i, 3] <- deaths
df[i, 4] <- assists
df[i, 5] <- CSmin
df[i, 6] <- CStotal
df[i, 7] <- kp
df[i, 8] <- rank
}
#increment df index
i <- i + 1
}
#after a few minutes, we should have our frame
df %>%
head(10)
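The loop above replaces spaces with “%20” by hand. Base R's `utils::URLencode()` performs the same percent-encoding and also covers other characters that are not valid in a URL. The sketch below uses the example username from the text, and `profile_url` is an illustrative name.

```r
# Example username from the text above
player <- "Cats Better Than Dogs"

# Percent-encode characters that are not valid in a URL (spaces become %20)
encoded <- utils::URLencode(player)
encoded

# Build the profile URL in the same form used by the scraping functions
profile_url <- paste0("https://na.op.gg/summoner/userName=", encoded)
profile_url
```

This may matter for usernames such as “ßotÐiff” in the sample above, which contain non-ASCII characters. How such names need to be encoded for OP.GG is not verified here.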
##Tidying Our Data
Now that we’ve successfully scraped all our data, it’s time to do some tidying before we do some processing.
We start by adding the list of usernames back into the Username column of our data frame. From there, we separate the tier and the division, since we’re only interested in differences between tiers. Finally, we change some data types and factorize the tiers.
#let's put the user names back in, and we have our data frame
df[1] <- user_list
#now we have to tidy up rank
#we wont focus on divisions; we'll stick to tiers.
#get rid of the number in rank
temp_df <- df %>%
mutate(temp_tier=str_split(df$rank, " "), tier="")
i <- 1
for (arr in temp_df$temp_tier) {
temp_df[i, 10] <- arr[1]
i <- i + 1
}
# change the datatypes of the numeric attributes
# and select the attributes you want to keep
almost_tidy_df <- temp_df %>%
mutate(Kills=as.integer(Kills), Deaths=as.integer(Deaths), Assists=as.integer(Assists),
CSmin=as.double(CSmin), CStotal=as.integer(CStotal), KP=as.integer(KP)) %>%
select(-rank)
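#last thing we want to do is convert tier to a factor so it can be used
#as a grouping variable. note that as.factor() orders its levels alphabetically,
#so by itself it does not express that, e.g., Gold is a higher tier than Silver.
tidy_data <- almost_tidy_df %>%
mutate(factor_tier=as.factor(tier))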
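If the tiers should sort from lowest to highest rather than alphabetically, one option is an ordered factor with explicit levels. This is a sketch on made-up values, not part of the original pipeline. It assumes the nine tier names below match the scraped strings exactly, and any other value (for example an unranked player) would become NA.

```r
# The nine tiers from lowest to highest
tier_levels <- c("Iron", "Bronze", "Silver", "Gold", "Platinum",
                 "Diamond", "Master", "Grandmaster", "Challenger")

# Made-up tier values standing in for the scraped column
demo_tiers <- c("Gold", "Silver", "Bronze", "Gold", "Challenger")

# An ordered factor encodes that, e.g., Gold is above Silver
demo_factor <- factor(demo_tiers, levels = tier_levels, ordered = TRUE)

demo_factor[1] > demo_factor[2]  # compares Gold with Silver
sort(unique(demo_factor))        # sorted by rank, not alphabetically
```

With levels set this way, boxplots grouped by tier are drawn in rank order along the x-axis.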
##Exploratory Data Analysis
#let's look at a boxplot of CS per minute for each tier
tidy_data %>%
ggplot(mapping = aes(x=factor_tier, y=CSmin)) +
geom_boxplot() +
labs(title="Distributions of CS per minute For Different Ranks",
x="Tier",
y="CS per minute")
#let's look at a boxplot of kills + assists for each tier
tidy_data %>%
ggplot(mapping = aes(x=factor_tier, y=((Kills+Assists)))) +
geom_boxplot() +
labs(title="Distributions of Kills + Assists in a Game For Different Ranks",
x="Tier",
y="Kills and Assists")
#let's look at a boxplot of deaths for each tier
tidy_data %>%
ggplot(mapping = aes(x=factor_tier, y=(Deaths))) +
geom_boxplot() +
labs(title="Distributions of Deaths in a Game For Different Ranks",
x="Tier",
y="Deaths")
#let's look at a boxplot of kill participation for each tier
tidy_data %>%
ggplot(mapping = aes(x=factor_tier, y=(KP))) +
geom_boxplot() +
labs(title="Distributions of Kill Participation in a Game For Different Ranks",
x="Tier",
y="Kill Participation (Percent)")
#it looks like csmin, kills+assists, and kp could all explain rank.
#let's test their significance
#what we want to do now is find averages of all the above
#attributes for each rank and test if there's a
#statistically significant difference.
#let's use ANOVA to perform a multi-comparison test
#and follow up with the Tukey Method.
#let's focus on a few contiguous tiers
anova_data <- tidy_data %>%
filter(tier=="Gold" | tier=="Silver" | tier=="Bronze")
anova_data %>%
head(10)
model <- lm(anova_data$CSmin ~ anova_data$tier)
ANOVA = aov(model)
TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)
plot(TUKEY , las=1 , col="brown")
model <- lm(anova_data$Deaths ~ anova_data$tier)
ANOVA = aov(model)
TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)
plot(TUKEY , las=1 , col="brown")
model <- lm(anova_data$KP ~ anova_data$tier)
ANOVA = aov(model)
TUKEY <- TukeyHSD(x=ANOVA, 'anova_data$tier', conf.level=0.95)
plot(TUKEY , las=1 , col="brown")
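The three tests above repeat the same steps for different response variables, so they could be wrapped in a small helper. The sketch below runs on simulated data so it is self-contained. The data frame `sim`, the helper `tukey_by_tier`, and all simulated values are invented for illustration and say nothing about the real results.

```r
# Simulated stand-in for anova_data: 20 players in each of three tiers
set.seed(1)
sim <- data.frame(
  tier   = factor(rep(c("Bronze", "Silver", "Gold"), each = 20)),
  CSmin  = c(rnorm(20, 4.5, 1), rnorm(20, 5.0, 1), rnorm(20, 5.5, 1)),
  Deaths = rpois(60, 6)
)

# One-way ANOVA of a response on tier, followed by Tukey's HSD
tukey_by_tier <- function(data, response) {
  fit <- aov(reformulate("tier", response = response), data = data)
  TukeyHSD(fit, "tier", conf.level = 0.95)
}

# Run the same test for several response variables
results <- lapply(c(CSmin = "CSmin", Deaths = "Deaths"),
                  tukey_by_tier, data = sim)

# Each result holds one row per pair of tiers
results$CSmin$tier
```

Passing `data =` together with a formula, instead of `anova_data$CSmin ~ anova_data$tier`, also gives the comparison the shorter label `tier`, and `plot(results$CSmin, las = 1)` would draw the same kind of interval plot as above.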
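#fit a multiple regression of KP on CSmin, Deaths, Kills, Assists, and tier,
#including every interaction between them
rank_fit <- lm(KP~CSmin*Deaths*Kills*Assists*tier, data=anova_data)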
rank_fit_stats <- rank_fit %>%
tidy()
rank_fit_stats %>% knitr::kable()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 187.6149885 | 168.4087030 | 1.1140457 | 0.2684371 |
CSmin | -24.0296854 | 29.3871912 | -0.8176925 | 0.4158462 |
Deaths | -27.3493446 | 25.2646715 | -1.0825134 | 0.2821222 |
Kills | -47.3697282 | 45.4550497 | -1.0421225 | 0.3003450 |
Assists | -26.5202838 | 21.1870442 | -1.2517217 | 0.2141457 |
tierGold | -163.2207725 | 173.1922085 | -0.9424256 | 0.3486774 |
tierSilver | -186.0442359 | 168.9125787 | -1.1014232 | 0.2738586 |
CSmin:Deaths | 4.1672392 | 4.4762036 | 0.9309762 | 0.3545338 |
CSmin:Kills | 8.9760868 | 7.9736349 | 1.1257208 | 0.2634898 |
Deaths:Kills | 8.7260945 | 7.2751034 | 1.1994461 | 0.2337269 |
CSmin:Assists | 5.2379584 | 4.0228559 | 1.3020497 | 0.1964585 |
Deaths:Assists | 5.3525996 | 3.6325216 | 1.4735217 | 0.1443481 |
Kills:Assists | 9.2592344 | 6.2916087 | 1.4716800 | 0.1448440 |
CSmin:tierGold | 20.4079933 | 30.9546191 | 0.6592875 | 0.5115146 |
CSmin:tierSilver | 26.9441410 | 29.5130523 | 0.9129568 | 0.3638782 |
Deaths:tierGold | 34.5104026 | 26.4710290 | 1.3037046 | 0.1958960 |
Deaths:tierSilver | 28.1711488 | 25.4012958 | 1.1090438 | 0.2705764 |
Kills:tierGold | 53.6870441 | 45.7219487 | 1.1742073 | 0.2436303 |
Kills:tierSilver | 50.8091531 | 46.0231905 | 1.1039902 | 0.2727499 |
Assists:tierGold | 29.6923595 | 21.6209253 | 1.3733159 | 0.1733088 |
Assists:tierSilver | 30.4587302 | 21.3301413 | 1.4279666 | 0.1570089 |
CSmin:Deaths:Kills | -1.5479563 | 1.3279005 | -1.1657171 | 0.2470282 |
CSmin:Deaths:Assists | -1.0139605 | 0.6974456 | -1.4538203 | 0.1497223 |
CSmin:Kills:Assists | -1.8603759 | 1.2467778 | -1.4921471 | 0.1394069 |
Deaths:Kills:Assists | -1.8072120 | 1.1254415 | -1.6057804 | 0.1120752 |
CSmin:Deaths:tierGold | -4.9541454 | 4.8366444 | -1.0242939 | 0.3086373 |
CSmin:Deaths:tierSilver | -3.8917650 | 4.5365676 | -0.8578655 | 0.3934078 |
CSmin:Kills:tierGold | -8.8496509 | 8.0229087 | -1.1030477 | 0.2731566 |
CSmin:Kills:tierSilver | -9.1076791 | 8.0541316 | -1.1308083 | 0.2613541 |
Deaths:Kills:tierGold | -10.0937236 | 7.3119894 | -1.3804347 | 0.1711153 |
Deaths:Kills:tierSilver | -8.9371809 | 7.3171299 | -1.2214053 | 0.2253496 |
CSmin:Assists:tierGold | -5.3884447 | 4.1555018 | -1.2967013 | 0.1982847 |
CSmin:Assists:tierSilver | -5.4953345 | 4.0596483 | -1.3536479 | 0.1794799 |
Deaths:Assists:tierGold | -5.9917401 | 3.7339990 | -1.6046443 | 0.1123255 |
Deaths:Assists:tierSilver | -5.4830902 | 3.6452594 | -1.5041701 | 0.1362883 |
Kills:Assists:tierGold | -9.5675929 | 6.3064938 | -1.5171018 | 0.1329956 |
Kills:Assists:tierSilver | -9.5511782 | 6.3217476 | -1.5108446 | 0.1345809 |
CSmin:Deaths:Kills:Assists | 0.3532898 | 0.2278623 | 1.5504533 | 0.1247924 |
CSmin:Deaths:Kills:tierGold | 1.6911244 | 1.3348687 | 1.2668844 | 0.2086980 |
CSmin:Deaths:Kills:tierSilver | 1.5328813 | 1.3358551 | 1.1474907 | 0.2544366 |
CSmin:Deaths:Assists:tierGold | 1.1095381 | 0.7257982 | 1.5287143 | 0.1300924 |
CSmin:Deaths:Assists:tierSilver | 0.9895571 | 0.7019765 | 1.4096728 | 0.1623281 |
CSmin:Kills:Assists:tierGold | 1.8604594 | 1.2490844 | 1.4894585 | 0.1401119 |
CSmin:Kills:Assists:tierSilver | 1.8910587 | 1.2508113 | 1.5118657 | 0.1343212 |
Deaths:Kills:Assists:tierGold | 1.8958073 | 1.1274658 | 1.6814765 | 0.0963850 |
Deaths:Kills:Assists:tierSilver | 1.8271949 | 1.1273387 | 1.6208038 | 0.1088078 |
CSmin:Deaths:Kills:Assists:tierGold | -0.3653061 | 0.2282580 | -1.6004085 | 0.1132625 |
CSmin:Deaths:Kills:Assists:tierSilver | -0.3526437 | 0.2282084 | -1.5452708 | 0.1260401 |
rank_fit %>%
augment() %>%
ggplot(aes(x=.fitted, y=.resid, color=tier)) +
geom_point()