[R-br] [OFF-TOPIC] regex

Cleber N. Borges klebyn at yahoo.com.br
Mon Sep 30 09:36:51 BRT 2013


Hi,

I'm not sure my solution is a good one, but it gets the job done:

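text <- "your abstract here"  # one abstract per string

# keep only what comes before the last "(C)"
cat(gsub("(.*)(\\(C\\))(.*)", "\\1", text, perl = TRUE))

# show the part that was cut off: "(C)" and everything after it
cat(gsub("(.*)(\\(C\\))(.*)", "\\2\\3", text, perl = TRUE))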
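
The same idea can be applied to several abstracts at once. This is only a minimal sketch: the two shortened sample strings stand in for the file contents, and `abstracts.txt` is a hypothetical file name. Unlike the greedy pattern above, which cuts at the last "(C)" on a line, this variant cuts at the first one and also drops the space before it.

```r
# Shortened sample abstracts standing in for the file contents; with a real
# file, readLines("abstracts.txt") (hypothetical name) would return one
# abstract per element, since each abstract sits on its own line.
abstracts <- c(
  "We examine the productive efficiency of 70 Indian commercial banks. (C) 1997 Elsevier Science B.V",
  "This empirical study evaluates 44 major U.S. airports. (C) 2000 Elsevier Science B.V. All rights reserved"
)

# Remove optional whitespace, the literal "(C)", and everything after it
# up to the end of each string.
cleaned <- sub("\\s*\\(C\\).*$", "", abstracts, perl = TRUE)

cat(cleaned, sep = "\n")
```

The result could then be written back with writeLines(cleaned, "abstracts_clean.txt"), where the output file name is again just an assumption.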


maybe this can help you in some way

cleber



On 30/09/2013 09:09, Roney Fraga Souza wrote:
> Dear all,
>
> I have a txt file with abstracts of scientific articles; here is an example:
>
> We examine the productive efficiency of 70 Indian commercial banks during the early stages (1986-1991) of the ongoing period of liberalization. We use data envelopment analysis to calculate radial technical efficiency scores. We then use stochastic frontier analysis to attribute variation in the calculated efficiency scores to three sources: a temporal component, an ownership component, and a random noise component. We find publicly-owned Indian banks to have been the most efficient, followed by foreign-owned banks and privately-owned Indian banks. We also find a temporal improvement in the performance of foreign-owned banks, virtually no trend in the performance of privately-owned Indian banks, and a temporal decline in the performance of publicly-owned Indian banks. We attempt to explain these patterns in terms of the government's evolving regulatory policies. (C) 1997 Elsevier Science B.V
> Recently, considerable attention has been focused on the performance of various airlines and air carriers in terms of efficiency. Although it is obvious that air carriers use airports, few studies have focused on airport operational efficiency. This empirical study evaluates the operational efficiencies of 44 major U.S. airports using data envelopment analysis and some of its recent developments. Various airport characteristics are evaluated to determine their relationship to an airport's efficiency. Efficiency measures are based on four resource input measures including airport operational costs, number of airport employees, gates and runways, and five output measures including operational revenue, passenger flow, commercial and general aviation movement, and total cargo transportation. The results of this study have operational as well as public policy implications. (C) 2000 Elsevier Science B.V. All rights reserved
>
> Given that each paragraph is on a single line, that is, a line break occurs only at the end of each abstract, I need to delete everything from '(C)' up to the line break. In the first paragraph I need to remove '(C) 1997 Elsevier Science B.V', and in the second paragraph '(C) 2000 Elsevier Science B.V. All rights reserved'.
>
> I use a MacBook; any suggestion using R or the terminal is welcome.
>
> Regards,
> Roney
> _______________________________________________
> R-br mailing list
> R-br at listas.c3sl.ufpr.br
> https://listas.inf.ufpr.br/cgi-bin/mailman/listinfo/r-br
> Read the posting guide (http://www.leg.ufpr.br/r-br-guia) and provide minimal reproducible code.
>



