[R-br] Erro em função out_rem

Alisson Lucrecio alissonluc em gmail.com
Quarta Maio 21 10:51:25 BRT 2014


Caro Roberto,

Usei o operador %in% como sugerido, o NA é proprosital, assim no passo
seguinte cálculo a média por "Tempo" e substituo o NA por ela com a
seguinte função

tab1 <- ddply(tab1, "Tempo", function(dfm){
  if(any(is.na(dfm))){
    k <- which(is.na(dfm), arr.ind=TRUE)
    dfm[k] <- colMeans(dfm, na.rm=TRUE)[k[,2]]
  }
  return(dfm)
})

Além disso posso cálcular o valor de dados substituidos e conferir se estão
dentro dos 5 %.

Muito obrigado por sua ajuda.


On Wed, May 21, 2014 at 10:19 AM, Robert Iquiapaza <rbali em ufmg.br> wrote:

>   Allison,
> Você esta substituindo por NA não pela média. Mas o alerta deve-se a um
> erro na sua função quando usa which(x == outliers), x e outliers têm
> cumprimento diferente e a comparação produz resultados inesperados, use o
> operador %in%:
>
> tab2 <- apply(tab1[,-1],2,out_rem)
>
> out_rem1 <-function(x) {
>   outliers <- boxplot(x, plot = FALSE)$out
>   if (length(outliers) != 0){
>     x[which(x %in% outliers)] = NA  # mean(x,na.rm=T)
>   }
>   return(x)
> }
>
>
> tab3 <- apply(tab1[,-1],2,out_rem1)
>
> tab1[,-1]==tab3
>
> apply(tab2,2,function(x)sum(is.na(x))) #alguns outliers não são
> identificados corretamente
> apply(tab3,2,function(x)sum(is.na(x))) # ok
>
> Sds
>
>  *From:* Alisson Lucrecio <alissonluc em gmail.com>
> *Sent:* Wednesday, May 21, 2014 8:46 AM
> *To:* r-br <r-br em listas.c3sl.ufpr.br>
> *Subject:* [R-br] Erro em função out_rem
>
>
> Caros Colegas,
>
> Bom dia.
>
> Eu estou tentanto criar uma função para encontrar os outlier em uma data frame e substituir pela média, mas esta acontecendo a seguinte mensagem de erro descrito abaixo. Alguém saberia como solucionar esse problema?
>
> Obrigado.
>
>
> > str(tab1)
> 'data.frame':	40 obs. of  13 variables:
>  $ Tempo    : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ Fo       : int  58 84 69 67 90 85 77 86 85 76 ...
>  $ Fm       : int  240 427 290 331 424 373 351 375 393 302 ...
>  $ Fv.Fm    : num  0.758 0.803 0.762 0.798 0.788 0.772 0.781 0.771 0.784 0.748 ...
>  $ ETR      : num  20 22.4 22.3 23.9 20.1 20.7 18 23.9 27.5 24.2 ...
>  $ Clorofila: num  58.3 67.8 49.8 74.8 59.6 63.6 52.2 56.6 54.4 58.5 ...
>  $ MS       : num  0.57 0.69 0.71 0.81 0.48 1.2 0.55 0.68 0.55 0.3 ...
>  $ Umid     : num  58.1 62.5 63 58 63.6 ...
>  $ Zn       : num  33.5 100.5 93.9 127.4 535 ...
>  $ Cu       : num  27.7 27.7 27.7 27.7 27.6 ...
>  $ Fe       : num  1714 857 1660 1145 1141 ...
>  $ Mn       : num  612 1434 1068 1541 716 ...
>  $ Ca       : num  0.158 0.493 0.118 0.355 0.374 ...
>
> > dput(tab1)
> structure(list(Tempo = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2,
> 2, 2, 2, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 10, 10,
> 10, 10, 10, 12, 12, 12, 12, 12), Fo = c(58L, 84L, 69L, 67L, 90L,
> 85L, 77L, 86L, 85L, 76L, 91L, 84L, 81L, 180L, 82L, 126L, 132L,
> 128L, 136L, 122L, 201L, 247L, 184L, 227L, 221L, 268L, 304L, 284L,
> 311L, 318L, 249L, 258L, 286L, 243L, 275L, 240L, 241L, 250L, 241L,
> 228L), Fm = c(240L, 427L, 290L, 331L, 424L, 373L, 351L, 375L,
> 393L, 302L, 231L, 217L, 207L, 349L, 137L, 233L, 232L, 202L, 239L,
> 197L, 446L, 533L, 418L, 337L, 492L, 463L, 396L, 429L, 430L, 455L,
> 481L, 469L, 599L, 524L, 460L, 244L, 543L, 418L, 398L, 474L),
>     Fv.Fm = c(0.758, 0.803, 0.762, 0.798, 0.788, 0.772, 0.781,
>     0.771, 0.784, 0.748, 0.602, 0.613, 0.609, 0.484, 0.401, 0.459,
>     0.431, 0.366, 0.431, 0.381, 0.549, 0.537, 0.56, 0.326, 0.551,
>     0.421, 0.232, 0.338, 0.277, 0.301, 0.482, 0.45, 0.523, 0.536,
>     0.402, 0.016, 0.556, 0.402, 0.394, 0.519), ETR = c(20, 22.4,
>     22.3, 23.9, 20.1, 20.7, 18, 23.9, 27.5, 24.2, 14.7, 16.3,
>     17.2, 6.1, 10.5, 6.5, 7.4, 4.8, 9.7, 7.1, 8, 12.1, 9, 5.5,
>     5.8, 7.3, 2.9, 4.9, 4.1, 4.6, 5.4, 6.9, 3.9, 6, 4.9, 0.3,
>     6.3, 4.1, 3.5, 8.2), Clorofila = c(58.3, 67.8, 49.8, 74.8,
>     59.6, 63.6, 52.2, 56.6, 54.4, 58.5, 58.1, 64.8, 46.4, 49.2,
>     43.7, 60.1, 48.1, 66, 50.4, 53, 56.2, 61, 50.8, 45.3, 56.6,
>     46, 45.9, 43, 46.1, 37.3, 57.6, 58.8, 46.7, 48.6, 41.9, 71,
>     42.6, 45.4, 44.2, 52.5), MS = c(0.57, 0.69, 0.71, 0.81, 0.48,
>     1.2, 0.55, 0.68, 0.55, 0.3, 0.52, 0.88, 0.46, 0.25, 0.29,
>     0.54, 0.48, 0.62, 0.26, 0.38, 0.46, 0.39, 0.4, 0.39, 0.37,
>     0.37, 0.74, 0.33, 0.5, 0.47, 0.54, 0.7, 0.38, 0.36, 0.17,
>     0.96, 0.54, 0.61, 0.41, 0.48), Umid = c(58.09, 62.5, 63.02,
>     58.03, 63.64, 52.94, 63.58, 63.83, 60.71, 64.29, 60.9, 51.65,
>     56.6, 51.92, 50.85, 53.04, 54.29, 40.38, 51.85, 53.66, 57.01,
>     55.17, 61.54, 63.21, 59.78, 53.75, 56.98, 61.63, 55.36, 60.83,
>     56.45, 58.08, 62.75, 58.14, 59.52, 22.58, 62.5, 55.15, 55.43,
>     60.33), Zn = c(33.4872691, 100.4618073, 93.905, 127.4425,
>     534.995015, 120.1941264, 140.4361914, 180.9215784, 46.88217673,
>     113.9705147, 95.33502538, 141.1397796, 113.7999002, 53.84846964,
>     53.79448622, 86.93668993, 87.11038961, 120.4339152, 63.88095238,
>     173.9600998, 153.9645709, 80.36944583, 93.905, 46.81206381,
>     100.5622189, 136.1928934, 107.5350701, 53.55289421, 80.77270447,
>     47.07017544, 93.81118881, 107.2663668, 80.40959041, 107.2663668,
>     134.15, 142.9980013, 107.1057884, 80.49, 46.9525, 221.3475
>     ), Cu = c(27.65851223, 27.65851223, 27.7, 27.7, 27.61714855,
>     27.57590841, 27.61714855, 27.67232767, 27.65851223, 27.68615692,
>     28.12182741, 27.75551102, 55.28942116, 83.39187155, 27.76942356,
>     27.61714855, 27.67232767, 27.63092269, 52.76190476, 27.63092269,
>     27.64471058, 27.65851223, 27.7, 27.61714855, 27.68615692,
>     28.12182741, 55.51102204, 27.64471058, 55.59458103, 27.76942356,
>     27.67232767, 27.68615692, 27.67232767, 27.68615692, 27.7,
>     36.90872751, 110.5788423, 55.4, 27.7, 27.7), Fe = c(1714.328507,
>     857.1642536, 1659.67, 1144.6, 1141.176471, 569.7361872, 912.9411765,
>     1257.802198, 1657.184224, 3317.681159, 2614.568528, 2408.476954,
>     5026.187625, 5168.790768, 688.481203, 1940, 1257.802198,
>     1712.618454, 981.0857143, 1084.658354, 799.6207585, 3142.935597,
>     2918.73, 1141.176471, 1029.625187, 2033.553299, 8716.392786,
>     2741.556886, 5743.100853, 4245.634085, 1372.147852, 1487.236382,
>     1029.110889, 2002.048976, 3548.26, 762.5582945, 12108.54291,
>     2174.74, 2174.74, 1945.82), Mn = c(611.9321018, 1433.669496,
>     1068.11, 1540.88, 715.7627119, 1621.134893, 1536.271186,
>     717.1928072, 1625.991013, 840.05997, 1457.685279, 789.5290581,
>     1258.203593, 755.5745108, 737.2631579, 1309.322034, 787.1628372,
>     1275.042394, 333.5238095, 1327.441397, 1048.502994, 1608.507239,
>     752.93, 611.0169492, 1137.581209, 871.0558376, 1140.430862,
>     664.0518962, 1071.861515, 473.9548872, 1311.938062, 1347.596202,
>     437.3126873, 542.5387306, 595.34, 2053.137908, 2097.005988,
>     437.75, 1068.11, 1383.29), Ca = c(0.15765352, 0.492667249,
>     0.1184175, 0.3552525, 0.373867149, 0.432252364, 0.35418993,
>     0.374614136, 0.433547179, 0.216990255, 0.280515228, 0.435067635,
>     0.334846557, 0.297083542, 0.197857143, 0.806765952, 0.492913337,
>     0.452801746, 0.300742857, 0.472488778, 0.630299401, 0.394133799,
>     0.21709875, 0.295158275, 0.611517991, 0.30055203, 0.37574023,
>     0.315149701, 0.455528098, 0.336357143, 0.473196803, 0.49315967,
>     0.177448801, 0.374801349, 0.13815375, 0.631139241, 0.590905689,
>     0.15789, 0.5131425, 0.53287875)), .Names = c("Tempo", "Fo",
> "Fm", "Fv.Fm", "ETR", "Clorofila", "MS", "Umid", "Zn", "Cu",
> "Fe", "Mn", "Ca"), row.names = c(NA, -40L), class = "data.frame")
>
> > out_rem <-function(x) {
>   outliers <- boxplot(x, plot = FALSE)$out
>   if (length(outliers) != 0){
>     x[which(x == outliers)] = NA
>   }
>   return(x)
> }
>
> > tab1[,-1] <- apply(tab1[,-1],2,out_rem)
>
> Warning message:
> In x == outliers :
>   longer object length is not a multiple of shorter object length
>
>
>
> --
> Alisson Lucrecio da Costa
>
>  ------------------------------
> _______________________________________________
> R-br mailing list
> R-br em listas.c3sl.ufpr.br
> https://listas.inf.ufpr.br/cgi-bin/mailman/listinfo/r-br
> Leia o guia de postagem (http://www.leg.ufpr.br/r-br-guia) e forneça
> código mínimo reproduzível.
>
>
> _______________________________________________
> R-br mailing list
> R-br em listas.c3sl.ufpr.br
> https://listas.inf.ufpr.br/cgi-bin/mailman/listinfo/r-br
> Leia o guia de postagem (http://www.leg.ufpr.br/r-br-guia) e forneça
> código mínimo reproduzível.
>



-- 
Alisson Lucrecio da Costa
-------------- Próxima Parte ----------
Um anexo em HTML foi limpo...
URL: <http://listas.inf.ufpr.br/pipermail/r-br/attachments/20140521/a91ea361/attachment.html>


Mais detalhes sobre a lista de discussão R-br