[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Crombie, Burnette N bcrombie at utk.edu
Fri Dec 19 14:52:03 CET 2014


That is the solution I had tried first (yes, it's nice!), but it doesn't provide columns for the other PViol.Type values that aren't necessarily in my dataset.  That's where my problem is.  I'm closer to the cure, though, and I think I've come up with a solution that I'll try as soon as I have time.  I'll update everyone then. -- BNC
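
For readers following along, here is one possible way to keep the violation types that never occur in the data. It is a minimal base-R sketch, not the solution BNC later arrived at. It reuses the sample data and the 14 type names from the quoted posts and follows the factor-with-explicit-levels idea suggested further down the thread. The names all.types, viol, counts, flags and wide are made up for this illustration.

```r
# Sample data copied from the original post in this thread.
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# All 14 violation types ("CaseID" is left out, since it is not a type).
all.types <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
               "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other", "AS.Age",
               "BW.WHMIS_BackWages", "HS.Hours", "OA.HazOccupationAg",
               "ON.HazOccupationNonAg", "R3.Reg3AgeOccupation",
               "RK.Records_CL", "V.Other")

# Declaring every level up front makes table() emit a column for each type,
# including the types that never occur in the data.
viol <- factor(Primary.Viol.Type, levels = all.types)
counts <- unclass(table(CaseID, viol))   # plain integer matrix of counts

# Turn counts into flags: 1 where the type was recorded, NA otherwise.
flags <- ifelse(counts > 0, 1L, NA_integer_)

# One row per unique CaseID; CaseID column plus the 14 type columns.
wide <- data.frame(CaseID = rownames(flags), flags,
                   row.names = NULL, stringsAsFactors = FALSE)
print(wide)
```

On a real import the same idea would apply to the data frame column (for example df$Primary.Viol.Type, with df standing in for whatever the imported object is called) rather than to a free-standing vector.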

-----Original Message-----
From: John Kane [mailto:jrkrideau at inbox.com] 
Sent: Friday, December 19, 2014 8:44 AM
To: Sven E. Templer; Chel Hee Lee
Cc: R Help List; Crombie, Burnette N
Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Very pretty. 
I could have saved myself about 1/2 hour of mucking about if I had thought of "length".

John Kane
Kingston ON Canada
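
A note on what "length" does in the dcast() call quoted below: as the aggregate function it simply counts the rows for each CaseID / Primary.Viol.Type pair. The following is a rough base-R equivalent, offered only as an illustration and not as anything posted in the thread. It uses xtabs() on the same sample vectors; the name counts is made up here.

```r
# Sample data copied from the quoted post.
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# xtabs() tallies the rows for each CaseID / violation-type pair, which is
# the same count that length() yields per cell in dcast().
counts <- xtabs(~ CaseID + Primary.Viol.Type)
print(counts)
```

Like the dcast() result, this only has columns for the types that actually occur in the data.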


> -----Original Message-----
> From: sven.templer at gmail.com
> Sent: Fri, 19 Dec 2014 10:13:55 +0100
> To: chl948 at mail.usask.ca
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df 
> then adjust col1 data display
> 
> Another solution:
> 
> CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285", 
> "1007183", "1008833", "1015315", "1015322", "1015285") 
> Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours", 
> "RK.Records_CL", "OT.Overtime", "OT.Overtime", "OT.Overtime", 
> "V.Poster_Other",
> "V.Poster_Other")
> 
> library(reshape2)
> dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type,
> length)
> 
> # result:
> 
> Using Primary.Viol.Type as value column: use value.var to override.
>    CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
> 1 1005317      0        1           0             0              0
> 2 1007183      0        0           1             0              0
> 3 1008833      0        0           1             0              0
> 4 1012281      0        1           0             0              0
> 5 1015285      1        1           0             1              1
> 6 1015315      0        0           1             0              0
> 7 1015322      0        0           0             0              1
> 
> 
> best, s.
> 
> On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
>> Please take a look at my code again.  The error message says that 
>> object 'Primary.Viol.Type' was not found.  Did you ever create the
>> object 'Primary.Viol.Type'?  It will work if you replace
>> 'Primary.Viol.Type' with
>> 'PViol.Type.Per.Case.Original$Primary.Viol.Type' where 'factor()' 
>> is used.  I hope this helps.
>> 
>> Chel Hee Lee
>> 
>> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>> 
>>> Chel, your solution is fantastic on the dataset I submitted in my 
>>> question but it is not working when I import my real dataset into R.  
>>> Do I need to vectorize the columns in my real dataset after 
>>> importing?  I tried a few things (###) but not making progress:
>>> 
>>> MERGE_PViol.Detail.Per.Case <-
>>> read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>>> stringsAsFactors=TRUE)
>>> 
>>> ### select only certain columns
>>> PViol.Type.Per.Case.Original <-
>>> MERGE_PViol.Detail.Per.Case[,c("CaseID",
>>> "Primary.Viol.Type")]
>>> 
>>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original <- read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original$X <- NULL
>>> ### PViol.Type.Per.Case.Original[] <- lapply(PViol.Type.Per.Case.Original, as.character)
>>> 
>>> PViol.Type <- c("CaseID",
>>>                  "BW.BackWages",
>>>                  "LD.Liquid_Damages",
>>>                  "MW.Minimum_Wage",
>>>                  "OT.Overtime",
>>>                  "RK.Records_FLSA",
>>>                  "V.Poster_Other",
>>>                  "AS.Age",
>>>                  "BW.WHMIS_BackWages",
>>>                  "HS.Hours",
>>>                  "OA.HazOccupationAg",
>>>                  "ON.HazOccupationNonAg",
>>>                  "R3.Reg3AgeOccupation",
>>>                  "RK.Records_CL",
>>>                  "V.Other")
>>> 
>>> PViol.Type.Per.Case.Original$Primary.Viol.Type <- 
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>> 
>>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels = PViol.Type) :
>>> ###   object 'Primary.Viol.Type' not found
>>> 
>>> tmp <- split(PViol.Type.Per.Case.Original,
>>>              PViol.Type.Per.Case.Original$CaseID)
>>> ans <- ifelse(do.call(rbind, lapply(tmp,
>>>   function(x) table(x$Primary.Viol.Type))), 1, NA)
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Crombie, Burnette N
>>> Sent: Thursday, December 18, 2014 3:01 PM
>>> To: 'Chel Hee Lee'
>>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df 
>>> then adjust col1 data display
>>> 
>>> Thanks for taking the time to review this, Chel.  I've got to step 
>>> away from my desk, but will reply more substantially as soon as 
>>> possible. -- BNC
>>> 
>>> -----Original Message-----
>>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>>> Sent: Thursday, December 18, 2014 2:43 PM
>>> To: Jeff Newmiller; Crombie, Burnette N
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df 
>>> then adjust col1 data display
>>> 
>>> I like the approach presented by Jeff Newmiller as shown in the 
>>> previous post (I really like his way).  As he suggested, it would be 
>>> good to start with 'factor' since you have all values of 
>>> 'Primary.Viol.Type'.
>>> You may try to use 'split()' function for creating table that you 
>>> wish to build.  Please see the below (I hope this helps):
>>> 
>>>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>>   +   factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>>   >
>>>   > tmp <- split(PViol.Type.Per.Case.Original,
>>>   +   PViol.Type.Per.Case.Original$CaseID)
>>>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>>   +   table(x$Primary.Viol.Type))), 1, NA)
>>>   > ans
>>>         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>>> 1005317     NA           NA                NA              NA          NA
>>> 1007183     NA           NA                NA              NA           1
>>> 1008833     NA           NA                NA              NA           1
>>> 1012281     NA           NA                NA              NA          NA
>>> 1015285     NA           NA                NA              NA          NA
>>> 1015315     NA           NA                NA              NA           1
>>> 1015322     NA           NA                NA              NA          NA
>>>         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>>> 1005317              NA             NA     NA                 NA        1
>>> 1007183              NA             NA     NA                 NA       NA
>>> 1008833              NA             NA     NA                 NA       NA
>>> 1012281              NA             NA     NA                 NA        1
>>> 1015285              NA              1      1                 NA        1
>>> 1015315              NA             NA     NA                 NA       NA
>>> 1015322              NA              1     NA                 NA       NA
>>>         OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>>> 1005317                 NA                    NA                   NA
>>> 1007183                 NA                    NA                   NA
>>> 1008833                 NA                    NA                   NA
>>> 1012281                 NA                    NA                   NA
>>> 1015285                 NA                    NA                   NA
>>> 1015315                 NA                    NA                   NA
>>> 1015322                 NA                    NA                   NA
>>>         RK.Records_CL V.Other
>>> 1005317            NA      NA
>>> 1007183            NA      NA
>>> 1008833            NA      NA
>>> 1012281            NA      NA
>>> 1015285             1      NA
>>> 1015315            NA      NA
>>> 1015322            NA      NA
>>>   >
>>> 
>>> Chel Hee Lee
>>> 
>>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>> 
>>>> No guarantees on "best"... but one way using base R could be:
>>>> 
>>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>>> PViol.Type <- c( "BW.BackWages"
>>>>                  , "LD.Liquid_Damages"
>>>>                  , "MW.Minimum_Wage"
>>>>                  , "OT.Overtime"
>>>>                  , "RK.Records_FLSA"
>>>>                  , "V.Poster_Other"
>>>>                  , "AS.Age"
>>>>                  , "BW.WHMIS_BackWages"
>>>>                  , "HS.Hours"
>>>>                  , "OA.HazOccupationAg"
>>>>                  , "ON.HazOccupationNonAg"
>>>>                  , "R3.Reg3AgeOccupation"
>>>>                  , "RK.Records_CL"
>>>>                  , "V.Other" )
>>>> 
>>>> # explicitly specifying all levels to the factor ensures a complete
>>>> # set of column outputs regardless of what is in the input
>>>> PViol.Type.Per.Case.Original <-
>>>>       data.frame( CaseID
>>>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>>>                                           , levels=PViol.Type ) )
>>>> 
>>>> tmp <- table( PViol.Type.Per.Case.Original )
>>>> ans <- data.frame( CaseID=rownames( tmp )
>>>>                    , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>>                    )
>>>> 
>>>> 
>>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>> 
>>>>> # I have a dataframe that contains 2 columns:
>>>>> CaseID  <- c('1015285',
>>>>> '1005317',
>>>>> '1012281',
>>>>> '1015285',
>>>>> '1015285',
>>>>> '1007183',
>>>>> '1008833',
>>>>> '1015315',
>>>>> '1015322',
>>>>> '1015285')
>>>>> 
>>>>> Primary.Viol.Type <- c('AS.Age',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'RK.Records_CL',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'V.Poster_Other',
>>>>> 'V.Poster_Other')
>>>>> 
>>>>> PViol.Type.Per.Case.Original <- 
>>>>> data.frame(CaseID,Primary.Viol.Type)
>>>>> 
>>>>> # CaseID's can be repeated because there can be up to 14
>>>>> # Primary.Viol.Type's per CaseID.
>>>>> 
>>>>> # I want to transform this dataframe into one that has 15 columns,
>>>>> # where the first column is CaseID, and the rest are the 14 primary
>>>>> # viol. types.  The CaseID column will contain a list of the unique
>>>>> # CaseID's (no replicates) and for each of their rows, there will be
>>>>> # a '1' under a column corresponding to a primary violation type
>>>>> # recorded for that CaseID.  So, technically, there could be zero to
>>>>> # 14 '1's in a CaseID's row.
>>>>> 
>>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>>> # under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>>> # but have "NA" under the rest of the columns.
>>>>> 
>>>>> PViol.Type <- c("CaseID",
>>>>>                 "BW.BackWages",
>>>>>            "LD.Liquid_Damages",
>>>>>            "MW.Minimum_Wage",
>>>>>            "OT.Overtime",
>>>>>            "RK.Records_FLSA",
>>>>>            "V.Poster_Other",
>>>>>            "AS.Age",
>>>>>            "BW.WHMIS_BackWages",
>>>>>            "HS.Hours",
>>>>>            "OA.HazOccupationAg",
>>>>>            "ON.HazOccupationNonAg",
>>>>>            "R3.Reg3AgeOccupation",
>>>>>            "RK.Records_CL",
>>>>>            "V.Other")
>>>>> 
>>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>> 
>>>>> # What is the best way to do this in R?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>> 
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                         Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
