<div dir="ltr">Gentlemen, good morning!<div><br></div><div>I don't know whether there is still interest in this topic, but I have picked up the idea of data mining <a href="http://whoscored.com">whoscored.com</a> again and would like to share a solution.</div>
<div><br></div><div>In the code below I applied it only to Fluminense, but I have already generated the indices needed to build a loop over the other teams (the teams object, or its teamsID column).</div><div><br></div><div><div><font face="courier new, monospace">### &lt;code r&gt;</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"># setwd(choose.dir())</font></div><div><font face="courier new, monospace">setwd("C:/LAB/RBAS/dataMining")</font></div>
<div><font face="courier new, monospace">sapply(c("RCurl", "XML", "RJSONIO"), require, character.only=TRUE)</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"># browseURL("<a href="http://www.whoscored.com/Teams/1232">http://www.whoscored.com/Teams/1232</a>")</font></div>
<div><font face="courier new, monospace">myURL <- "<a href="http://www.whoscored.com/Teams/1232">http://www.whoscored.com/Teams/1232</a>"</font></div><div><font face="courier new, monospace">htmRaw <- getURL(myURL)</font></div>
<div><font face="courier new, monospace">htmLin <- readLines(txtCon <- textConnection(htmRaw)); close(txtCon)</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">### Teams (IDs for future looping)</font></div>
<div><font face="courier new, monospace">pageTree <- htmlTreeParse(htmLin, error=function(...){}, useInternalNodes = TRUE)</font></div><div><font face="courier new, monospace">teamsNames <- as.character(xpathApply(pageTree, "//*/select[@id='teams']//option", xmlValue))</font></div>
<div><font face="courier new, monospace">teamsID <- xpathApply(pageTree, "//*/select[@id='teams']//option")</font></div><div><font face="courier new, monospace">teamsID <- sapply(teamsID, xmlGetAttr, 'value')</font></div>
<div><font face="courier new, monospace">teamsID <- as.integer(gsub("^.*Teams\\/(.*)", "\\1", teamsID))</font></div><div><font face="courier new, monospace">teams <- data.frame(teamsID, teamsNames, stringsAsFactors=FALSE); teams</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">### Info about a specific team (Fluminense)</font></div><div><font face="courier new, monospace">sLin <- grep("DataStore.prime\\(\\'stage-player-stat\\'", htmLin)</font></div>
<div><font face="courier new, monospace">sDat <- htmLin[sLin]</font></div><div><font face="courier new, monospace">dJSON <- gsub("^.*DataStore.*\\[(.*)\\]\\);", "[\\1]", sDat)</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">convertJSONDate = function(x) {</font></div><div><font face="courier new, monospace"> if(grepl("/?(new )?Date\\(", x)) {</font></div>
<div><font face="courier new, monospace"> val = gsub(".*Date\\(([0-9]+)\\).*", "\\1", x)</font></div><div><font face="courier new, monospace"> as.Date(structure((as.numeric(val)/1000), class = c("POSIXct", "POSIXt")))</font></div>
<div><font face="courier new, monospace"> } else x }</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">myList <- fromJSON(dJSON, nullValue=NA, stringFun=convertJSONDate)</font></div>
<div><font face="courier new, monospace">length(myList)</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">myListVars <- as.vector(sapply(myList[1], names)); myListVars</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">fullDF <- data.frame(t(sapply(myList, as.vector)), stringsAsFactors=FALSE)</font></div><div><font face="courier new, monospace"><br>
</font></div><div><font face="courier new, monospace">shortListVars <- c("TeamRegionCode","Name","PositionShort","Age","Height","Weight","GameStarted","Goals","Assists","Yellow","Red","TotalShots","TotalPasses", "AccuratePasses","AerialWon","ManOfTheMatch","Rating")</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">head(fullDF[shortListVars])</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">### &lt;code&gt;</font></div>
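<div><br></div><div>The loop over the other teams mentioned above could be sketched roughly as follows. This is only a minimal sketch, not tested against the live site: the helper names teamURL, extractPlayerJSON and scrapeTeams, and the fetch and pause arguments, are hypothetical and do not appear in the code above. It assumes every team page follows the same http://www.whoscored.com/Teams/ID pattern and embeds its data in the same DataStore.prime('stage-player-stat', ...) call as the Fluminense page.</div><div><br></div>

```r
# Hypothetical helpers for looping over several teams (names are assumptions).

# Build the team page URL from a numeric team ID.
teamURL <- function(id) paste0("http://www.whoscored.com/Teams/", id)

# Pull the JSON array passed to DataStore.prime('stage-player-stat', ...)
# out of the page lines; returns NA if the call is not found.
extractPlayerJSON <- function(htmLin) {
  sLin <- grep("DataStore.prime('stage-player-stat'", htmLin, fixed = TRUE)
  if (length(sLin) == 0) return(NA_character_)
  gsub("^.*DataStore.*\\[(.*)\\]\\);.*$", "[\\1]", htmLin[sLin[1]])
}

# Fetch each team page and return a named list of JSON strings.
# `fetch` is injectable so the loop can be tried offline with a fake page;
# `pause` (seconds) is a courtesy delay between requests.
scrapeTeams <- function(ids, fetch = function(u) RCurl::getURL(u), pause = 1) {
  out <- lapply(ids, function(id) {
    htmRaw <- tryCatch(fetch(teamURL(id)), error = function(e) NA_character_)
    if (is.na(htmRaw)) return(NA_character_)   # skip teams that fail to download
    Sys.sleep(pause)
    extractPlayerJSON(strsplit(htmRaw, "\r?\n")[[1]])
  })
  names(out) <- ids
  out
}
```

<div><br></div><div>With the objects from the code above, this would be called as scrapeTeams(teams$teamsID), and each non-NA element could then be parsed with fromJSON(x, nullValue=NA, stringFun=convertJSONDate) exactly as was done for Fluminense.</div>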
</div><div><br></div><div><br></div><div class="gmail_extra"><br clear="all"><div><div dir="ltr">Éder Comunello &lt;<a href="mailto:comunello.eder@gmail.com" target="_blank">comunello.eder@gmail.com</a>&gt; <br>
Dourados, MS - [22 16.5'S, 54 49'W]<br></div></div>
<br><br></div></div>