[R] Transform a data.frame with "; " sep column and another one into a new one with the same two columns but with repetitions

Fri Jul 4 17:00:44 CEST 2014

On 04-07-2014 15:15, arun wrote:
>
> Hi,
> Try:
> dat1 <- read.table(text="'1 > TC' 'WC'
> '2 > 0'  'Instruments & Instrumentation; Nuclear Science & Technology;Physics, Particles & Fields; Spectroscopy'
> '3 > 0' 'Nanoscience & Nanotechnology; Materials Science,Multidisciplinary; Physics, Applied'
> '4 > 2'    'Physics, Nuclear; Physics, Particles & Fields'
> '5 > 0'    'Chemistry, Inorganic & Nuclear'
> '6 > 2'    'Chemistry, Physical; Materials Science, Multidisciplinary;Metallurgy & Metallurgical Engineering'",sep="",header=F, stringsAsFactors=F)
>
> library(data.table)
> Using `cSplit()` from
> https://gist.github.com/mrdwab/11380733
>
> cSplit(dat1, "V2", ";", "long")
>          V1                                     V2
>   1: 1 > TC                                     WC
>   2:  2 > 0          Instruments & Instrumentation
>   3:  2 > 0           Nuclear Science & Technology
>   4:  2 > 0            Physics, Particles & Fields
>   5:  2 > 0                           Spectroscopy
>   6:  3 > 0           Nanoscience & Nanotechnology
>   7:  3 > 0    Materials Science,Multidisciplinary
>   8:  3 > 0                       Physics, Applied
>   9:  4 > 2                       Physics, Nuclear
>  10:  4 > 2            Physics, Particles & Fields
>  11:  5 > 0         Chemistry, Inorganic & Nuclear
>  12:  6 > 2                    Chemistry, Physical
>  13:  6 > 2   Materials Science, Multidisciplinary
>  14:  6 > 2 Metallurgy & Metallurgical Engineering
>
>
>
> A.K.
>
>
> On Friday, July 4, 2014 9:53 AM, João Azevedo Patrício <joao.patricio  gmx.pt> wrote:
> Hi,
>
> I've been trying to solve this issue but with no success.
>
> I have some data like this:
>
> 1 > TC    WC
> 2 > 0    Instruments & Instrumentation; Nuclear Science & Technology;
> Physics, Particles & Fields; Spectroscopy
> 3 > 0    Nanoscience & Nanotechnology; Materials Science,
> Multidisciplinary; Physics, Applied
> 4 > 2    Physics, Nuclear; Physics, Particles & Fields
> 5 > 0    Chemistry, Inorganic & Nuclear
> 6 > 2    Chemistry, Physical; Materials Science, Multidisciplinary;
> Metallurgy & Metallurgical Engineering
>
> And I need to have this:
>
> 1 > TC    WC
> 2 > 0    Instruments & Instrumentation
> 2 > 0    Nuclear Science & Technology
> 2 > 0    Physics, Particles & Fields
> 2 > 0    Spectroscopy
> 3 > 0    Nanoscience & Nanotechnology
> 3 > 0    Materials Science, Multidisciplinary
> 3 > 0    Physics, Applied
> 4 > 2    Physics, Nuclear
> 4 > 2    Physics, Particles & Fields
> 5 > 0    Chemistry, Inorganic & Nuclear
> 6 > 2    Chemistry, Physical
> 6 > 2    Materials Science, Multidisciplinary
> 6 > 2    Metallurgy & Metallurgical Engineering
>
> This means repeating the row for each element in WC while keeping the same
> value in TC. The goal is to check how many TC (sum) there are by WC,
> when WC holds several categories.
>
> I've tried to separate the column using strsplit but then I cannot keep
> track of TC.
>
> thanks in advance.
Thanks, this is simply fantastic!
After that I just have to do an aggregate by WC and it gives me the sum of
TC by WC.

thanks!

My code looks like this:

## read the citations and Web of Science categories file
isi <- read.table("filename", header = TRUE, sep = ";")
## split WC on ";" into one row per category
isisplit <- cSplit(isi, "WC", ";", "long")
## table of WC categories with the summed citations (TC) for each
wccitations <- aggregate(isisplit$TC, by = list(Category = isisplit$WC), FUN = sum)
## table with the number of publications per WC category
wcproduction <- table(isisplit$WC)
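
For anyone who prefers to stay in base R, the same reshaping can probably be sketched with strsplit() and rep(), which also shows one way to keep track of TC after splitting. This is only a minimal sketch, not how cSplit() works internally. It uses the example rows from the question, and `parts` is a helper name introduced here for illustration.

```r
# Example rows from the question: TC = times cited,
# WC = ";"-separated Web of Science categories.
isi <- data.frame(
  TC = c(0, 0, 2, 0, 2),
  WC = c("Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
         "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
         "Physics, Nuclear; Physics, Particles & Fields",
         "Chemistry, Inorganic & Nuclear",
         "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"),
  stringsAsFactors = FALSE
)

# Split each WC string on ";" and trim the surrounding whitespace.
# 'parts' is a list with one character vector per original row.
parts <- lapply(strsplit(isi$WC, ";", fixed = TRUE), trimws)

# Repeat each TC once per category so every category keeps its TC value.
isisplit <- data.frame(
  TC = rep(isi$TC, times = lengths(parts)),
  WC = unlist(parts),
  stringsAsFactors = FALSE
)

# Sum of citations and number of publications per category.
wccitations  <- aggregate(TC ~ WC, data = isisplit, FUN = sum)
wcproduction <- table(isisplit$WC)
```

The key step is rep(isi$TC, times = lengths(parts)): it repeats each TC value as many times as its row has categories, so the two columns stay aligned after unlist().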

-- 
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"