[R] Transform a data.frame with a "; "-separated column and another column into a new one with the same two columns but with repetitions
João Azevedo Patrício
joao.patricio at gmx.pt
Fri Jul 4 17:00:44 CEST 2014
On 04-07-2014 15:15, arun wrote:
>
> Hi,
> Try:
> dat1 <- read.table(text="'1 > TC' 'WC'
> '2 > 0' 'Instruments & Instrumentation; Nuclear Science & Technology;Physics, Particles & Fields; Spectroscopy'
> '3 > 0' 'Nanoscience & Nanotechnology; Materials Science,Multidisciplinary; Physics, Applied'
> '4 > 2' 'Physics, Nuclear; Physics, Particles & Fields'
> '5 > 0' 'Chemistry, Inorganic & Nuclear'
> '6 > 2' 'Chemistry, Physical; Materials Science, Multidisciplinary;Metallurgy & Metallurgical Engineering'",sep="",header=FALSE, stringsAsFactors=FALSE)
>
> library(data.table)
> Using `cSplit()` from
> https://gist.github.com/mrdwab/11380733
>
> cSplit(dat1, "V2", ";", "long")
> V1 V2
> 1: 1 > TC WC
> 2: 2 > 0 Instruments & Instrumentation
> 3: 2 > 0 Nuclear Science & Technology
> 4: 2 > 0 Physics, Particles & Fields
> 5: 2 > 0 Spectroscopy
> 6: 3 > 0 Nanoscience & Nanotechnology
> 7: 3 > 0 Materials Science,Multidisciplinary
> 8: 3 > 0 Physics, Applied
> 9: 4 > 2 Physics, Nuclear
> 10: 4 > 2 Physics, Particles & Fields
> 11: 5 > 0 Chemistry, Inorganic & Nuclear
> 12: 6 > 2 Chemistry, Physical
> 13: 6 > 2 Materials Science, Multidisciplinary
> 14: 6 > 2 Metallurgy & Metallurgical Engineering
>
>
>
> A.K.
>
>
> On Friday, July 4, 2014 9:53 AM, João Azevedo Patrício <joao.patricio at gmx.pt> wrote:
> Hi,
>
> I've been trying to solve this issue but with no success.
>
> I have some data like this:
>
> 1 > TC WC
> 2 > 0 Instruments & Instrumentation; Nuclear Science & Technology;
> Physics, Particles & Fields; Spectroscopy
> 3 > 0 Nanoscience & Nanotechnology; Materials Science,
> Multidisciplinary; Physics, Applied
> 4 > 2 Physics, Nuclear; Physics, Particles & Fields
> 5 > 0 Chemistry, Inorganic & Nuclear
> 6 > 2 Chemistry, Physical; Materials Science, Multidisciplinary;
> Metallurgy & Metallurgical Engineering
>
> And I need to have this:
>
> 1 > TC WC
> 2 > 0 Instruments & Instrumentation
> 2 > 0 Nuclear Science & Technology
> 2 > 0 Physics, Particles & Fields
> 2 > 0 Spectroscopy
> 3 > 0 Nanoscience & Nanotechnology
> 3 > 0 Materials Science, Multidisciplinary
> 3 > 0 Physics, Applied
> 4 > 2 Physics, Nuclear
> 4 > 2 Physics, Particles & Fields
> 5 > 0 Chemistry, Inorganic & Nuclear
> 6 > 2 Chemistry, Physical
> 6 > 2 Materials Science, Multidisciplinary
> 6 > 2 Metallurgy & Metallurgical Engineering
>
> This means repeating the row for each element in WC while keeping the same
> value in TC. The goal is to compute the sum of TC for each WC
> when a record has multiple WC values.
>
> I've tried to separate the column using strsplit, but then I cannot keep
> track of TC.
>
> thanks in advance.
Thanks, this is simply fantastic!
After that I just have to aggregate by WC, which gives me the total of
TC by WC.
Thanks!
My code looks like this:
## read the citations and Web of Science categories file
isi <- read.table("filename", header = TRUE, sep = ";")
## preview the split by WC
cSplit(isi, "WC", ";", "long")
## store the result with one row per WC category
isisplit <- cSplit(isi, "WC", ";", "long")
## table of WC categories with the citation sum for each
wccitations <- aggregate(isisplit$TC, by = list(Category = isisplit$WC), FUN = sum)
## table with the number of publications per WC category
wcproduction <- table(isisplit$WC)
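
For anyone who cannot source the gist, the same steps can probably be done in base R alone. The following is only a sketch under assumptions: the toy data frame is modelled on the example in this thread (the names isi, isisplit, wccitations and wcproduction are reused purely for illustration), and it swaps cSplit() for strsplit() plus rep(), which is what keeps each TC value paired with its categories. It relies on lengths() and trimws(), which need R 3.2.0 or later.

```r
# Toy data modelled on the example in this thread (illustrative values only)
isi <- data.frame(
  TC = c(0, 0, 2, 0, 2),
  WC = c(
    "Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
    "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
    "Physics, Nuclear; Physics, Particles & Fields",
    "Chemistry, Inorganic & Nuclear",
    "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"
  ),
  stringsAsFactors = FALSE
)

# Split each WC string on ";" (tolerating spaces around the separator);
# the result is a list with one character vector per original row
parts <- strsplit(isi$WC, "\\s*;\\s*")

# Repeat each TC once per category found in its row, so that the
# TC/WC pairing survives the flattening done by unlist()
isisplit <- data.frame(
  TC = rep(isi$TC, times = lengths(parts)),
  WC = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)

# Sum of citations per category
wccitations <- aggregate(TC ~ WC, data = isisplit, FUN = sum)

# Number of publications per category
wcproduction <- table(isisplit$WC)

print(wccitations)
print(wcproduction)
```

The key point is rep(isi$TC, times = lengths(parts)): strsplit() alone loses the link to TC, while repeating TC by the number of pieces in each row restores it.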
--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee
"Take 2 seconds to think before you act"