<div dir="ltr">Hello all,<div><br></div><div><br></div><div>I'm studying Hadoop/MapReduce in R and found k-means code on the RevolutionR page. While trying to understand it, a question came up.</div><div><br></div><div>There is a function declaration for the map phase in which the first input parameter is "." (a single dot).</div><div>Does anyone know what it means?</div><div>This function is defined and used inside another function (kmeans.mr). My guess was that the "." means the inner function inherits the parameters of the "parent" function.</div><div><br></div><div>This is how the function is declared:</div><div><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre"> </span><span class="" style="font-weight:bold;color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre">function</span><span class="" style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre">(</span><span class="" style="color:rgb(0,153,153);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre">.</span><span class="" style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre">,</span><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre"> P</span><span class="" style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre">)</span><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre"> </span><br 
clear="all"><div><br></div><div><br></div><div>### full code ####</div><div><br></div><div><div># Copyright 2011 Revolution Analytics</div><div># </div><div># Licensed under the Apache License, Version 2.0 (the "License");</div><div># you may not use this file except in compliance with the License.</div><div># You may obtain a copy of the License at</div><div># </div><div># <a href="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</a></div><div># </div><div># Unless required by applicable law or agreed to in writing, software</div><div># distributed under the License is distributed on an "AS IS" BASIS,</div><div># WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.</div><div># See the License for the specific language governing permissions and</div><div># limitations under the License.</div><div>Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.7.0.jar")</div><div>library(rmr2)</div><div>library(rhdfs)</div><div>hdfs.init()</div><div># P: table of values (the data points, one per row)</div><div>## @knitr kmeans-signature</div><div>kmeans.mr = function(P, num.clusters, num.iter, combine, in.memory.combine) {</div><div>## @knitr kmeans-dist.fun</div><div># C: centroids, one per row</div><div>dist.fun = function(C, P) {apply(C, 1, function(x) colSums((t(P) - x)^2))}</div><div>## @knitr kmeans.map</div><div># map: assign each point to its nearest centroid (a random cluster on the first pass, when C is NULL)</div><div><b><font color="#cc0000">kmeans.map = function(., P) {</font></b></div><div> nearest = {if(is.null(C)) sample(1:num.clusters, nrow(P), replace = TRUE)</div><div> else {D = dist.fun(C, P)</div><div> nearest = max.col(-D)}}</div><div>if(!(combine || in.memory.combine))</div><div>keyval(nearest, P)</div><div>else {keyval(nearest, cbind(1, P))}}</div><div>## @knitr kmeans.reduce</div><div># reduce: average the points of each cluster (or sum them, with a leading count column, when combining)</div><div>kmeans.reduce = {</div><div>if (!(combine || in.memory.combine) ) </div><div>function(., P) </div><div>t(as.matrix(apply(P, 2, mean)))</div><div>else </div><div>function(k, P) 
</div><div>keyval(</div><div>k, </div><div>t(as.matrix(apply(P, 2, sum))))}</div><div>## @knitr kmeans-main-1 </div><div>C = NULL</div><div>for(i in 1:num.iter ) {</div><div>C = </div><div>values(</div><div>from.dfs(</div><div>mapreduce(</div><div>P, </div><div>map = kmeans.map,</div><div>reduce = kmeans.reduce)))</div><div>if(combine || in.memory.combine)</div><div>C = C[, -1]/C[, 1]</div><div>## @knitr end</div><div># points(C, col = i + 1, pch = 19)</div><div>## @knitr kmeans-main-2</div><div># if some clusters ended up empty, add random linear combinations of the existing centroids</div><div>if(nrow(C) &lt; num.clusters) {</div><div>C = </div><div>rbind(</div><div>C,</div><div>matrix(</div><div>rnorm(</div><div>(num.clusters - </div><div>nrow(C)) * nrow(C)), </div><div>ncol = nrow(C)) %*% C) }}</div><div>C}</div><div>## @knitr end</div><div><br></div><div>## sample runs</div><div>## </div><div><br></div><div>out = list()</div><div><br></div><div>for(be in c("local", "hadoop")) {</div><div>rmr.options(backend = be)</div><div>set.seed(0)</div><div>## @knitr kmeans-data</div><div>P = do.call(rbind, rep(list(matrix(rnorm(10, sd = 10), ncol = 2)), 20)) + matrix(rnorm(200), ncol = 2)</div><div>## @knitr end</div><div># x11()</div><div># plot(P)</div><div># points(P)</div><div>out[[be]] = </div><div>## @knitr kmeans-run </div><div>kmeans.mr(to.dfs(P), num.clusters = 12, </div><div>num.iter = 5,</div><div>combine = FALSE,</div><div>in.memory.combine = FALSE)</div><div>## @knitr end</div><div>}</div><div><br></div><div># would love to take this step, but kmeans is randomized in a way that makes it hard to be completely reproducible</div><div># stopifnot(rmr2:::cmp(out[['hadoop']], out[['local']]))</div><div><br></div></div>-- <br><div dir="ltr"><i>Vinicius Brito Rocha.</i><br><br><br></div>
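<div><br></div><div>A minimal sketch in plain R (no Hadoop needed) of what a declaration like "function(., P)" does, offered as an illustration of base R behaviour rather than as the rmr2 authors' explanation. "." is an ordinary, if unusual, R name, so the function simply has two formal arguments. In rmr2 the map and reduce functions are called with a key and a value; in kmeans.map the key is never used in the body, and calling it "." appears to be a way of signalling that. Variables such as C, num.clusters and combine do not arrive through "."; they are found by lexical scoping, because kmeans.map is defined inside kmeans.mr. The names f, outer.fun and inner.fun below are hypothetical.</div>

```r
# "." is a syntactically valid R name, so function(., P) declares
# two formal arguments: one called "." and one called "P".
f = function(., P) P * 2
names(formals(f))   # "." "P"

# The first argument is accepted and then ignored, just as kmeans.map
# ignores the key it receives.
f("any key", 1:3)   # 2 4 6

# A function defined inside another one sees the enclosing function's
# local variables through lexical scoping, whatever its own arguments
# are called. Here inner.fun reads C and num.clusters from outer.fun.
outer.fun = function(num.clusters) {
  C = NULL
  inner.fun = function(., P) {
    if (is.null(C)) rep(num.clusters, nrow(P)) else C
  }
  inner.fun(NULL, matrix(0, nrow = 3, ncol = 2))
}
outer.fun(4)   # 4 4 4
```

<div>The same reasoning applies to the non-combining kmeans.reduce = function(., P), whose unused first argument is the reduce key.</div>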
</div></div>
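<div><br></div><div>To see what one pass of kmeans.map and kmeans.reduce computes, here is a sketch of a single k-means iteration on small in-memory data, assuming the non-combining case (combine = FALSE, in.memory.combine = FALSE). It reuses the same distance computation as dist.fun; the helper name kmeans.step and the toy data are hypothetical, and grouping with split() merely stands in for the shuffle Hadoop performs between map and reduce.</div>

```r
# Hypothetical helper, not part of rmr2: one k-means iteration on
# in-memory data, mirroring kmeans.map and kmeans.reduce for the
# non-combining case.

# Squared Euclidean distance from every point (row of P) to every
# centroid (row of C); one row per point, one column per centroid.
dist.fun = function(C, P) {
  apply(C, 1, function(x) colSums((t(P) - x)^2))
}

kmeans.step = function(C, P) {
  # "map": index of the closest centroid for each point
  nearest = max.col(-dist.fun(C, P))
  # "shuffle": group the row indices of P by assigned centroid
  groups = split(seq_len(nrow(P)), nearest)
  # "reduce": each new centroid is the mean of its group
  do.call(rbind, lapply(groups, function(idx) colMeans(P[idx, , drop = FALSE])))
}

# Toy data: two obvious groups of two points each.
P = rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
C = rbind(c(1, 1), c(9, 9))
kmeans.step(C, P)   # rows: (0, 0.5) and (10, 10.5)
```

<div>Unlike the loop in kmeans.mr, this sketch does not refill clusters that end up with no points; it simply returns fewer rows, which is the situation the kmeans-main-2 block handles.</div>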