[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display
John Kane
jrkrideau at inbox.com
Fri Dec 19 14:44:12 CET 2014
Very pretty.
I could have saved myself about 1/2 hour of mucking about if I had thought of "length".
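For anyone without reshape2 installed, the same "count the rows with length" idea can be sketched in base R. This is only an illustration under my own assumptions, not code from the thread: it uses tapply() on the example vectors quoted below, and the object name `counts` is made up here.

```r
# Example data copied from the quoted message below.
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# Count the rows for each CaseID/violation pair with length();
# 'counts' is a hypothetical name used only in this sketch.
# tapply() leaves NA for combinations that never occur.
counts <- tapply(Primary.Viol.Type,
                 list(CaseID, Primary.Viol.Type),
                 length)

# Replace NA with 0 so the layout matches the zero-filled dcast() result.
counts[is.na(counts)] <- 0
counts
```

The result is a matrix with CaseID as row names rather than a data frame with a CaseID column, so it is not a drop-in replacement for dcast().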
John Kane
Kingston ON Canada
> -----Original Message-----
> From: sven.templer at gmail.com
> Sent: Fri, 19 Dec 2014 10:13:55 +0100
> To: chl948 at mail.usask.ca
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
> adjust col1 data display
>
> Another solution:
>
> CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
> "1007183",
> "1008833", "1015315", "1015322", "1015285")
> Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
> "RK.Records_CL",
> "OT.Overtime", "OT.Overtime", "OT.Overtime", "V.Poster_Other",
> "V.Poster_Other")
>
> library(reshape2)
> dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type,
> length)
>
> # result:
>
> Using Primary.Viol.Type as value column: use value.var to override.
>    CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
> 1 1005317      0        1           0             0              0
> 2 1007183      0        0           1             0              0
> 3 1008833      0        0           1             0              0
> 4 1012281      0        1           0             0              0
> 5 1015285      1        1           0             1              1
> 6 1015315      0        0           1             0              0
> 7 1015322      0        0           0             0              1
>
>
> best, s.
>
> On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
>> Please take a look at my code again. The error message says that object
>> 'Primary.Viol.Type' was not found. Did you ever create the object
>> 'Primary.Viol.Type'? It will work if you replace 'Primary.Viol.Type' with
>> 'PViol.Type.Per.Case.Original$Primary.Viol.Type' in the 'factor()' call.
>> I hope this helps.
>>
>> Chel Hee Lee
>>
>> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>>
>>> Chel, your solution is fantastic on the dataset I submitted in my question
>>> but it is not working when I import my real dataset into R. Do I need to
>>> vectorize the columns in my real dataset after importing? I tried a few
>>> things (###) but am not making progress:
>>>
>>> MERGE_PViol.Detail.Per.Case <-
>>> read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>>> stringsAsFactors=TRUE)
>>>
>>> ### select only certain columns
>>> PViol.Type.Per.Case.Original <-
>>> MERGE_PViol.Detail.Per.Case[,c("CaseID",
>>> "Primary.Viol.Type")]
>>>
>>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original <- read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original$X <- NULL
>>> ### PViol.Type.Per.Case.Original[] <- lapply(PViol.Type.Per.Case.Original, as.character)
>>>
>>> PViol.Type <- c("CaseID",
>>> "BW.BackWages",
>>> "LD.Liquid_Damages",
>>> "MW.Minimum_Wage",
>>> "OT.Overtime",
>>> "RK.Records_FLSA",
>>> "V.Poster_Other",
>>> "AS.Age",
>>> "BW.WHMIS_BackWages",
>>> "HS.Hours",
>>> "OA.HazOccupationAg",
>>> "ON.HazOccupationNonAg",
>>> "R3.Reg3AgeOccupation",
>>> "RK.Records_CL",
>>> "V.Other")
>>>
>>> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>>
>>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
>>> ###   PViol.Type) : object 'Primary.Viol.Type' not found
>>>
>>> tmp <-
>>> split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
>>> ans <- ifelse(do.call(rbind, lapply(tmp,
>>> function(x)table(x$Primary.Viol.Type))), 1, NA)
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Crombie, Burnette N
>>> Sent: Thursday, December 18, 2014 3:01 PM
>>> To: 'Chel Hee Lee'
>>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df
>>> then
>>> adjust col1 data display
>>>
>>> Thanks for taking the time to review this, Chel. I've got to step away
>>> from my desk, but will reply more substantially as soon as possible. --
>>> BNC
>>>
>>> -----Original Message-----
>>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>>> Sent: Thursday, December 18, 2014 2:43 PM
>>> To: Jeff Newmiller; Crombie, Burnette N
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df
>>> then
>>> adjust col1 data display
>>>
>>> I like the approach presented by Jeff Newmiller in the previous post (I
>>> really like his way). As he suggested, it would be good to start with
>>> 'factor' since you have all the values of 'Primary.Viol.Type'. You may
>>> try the 'split()' function to create the table that you wish to build.
>>> Please see below (I hope this helps):
>>>
>>> > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>> +   factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>> >
>>> > tmp <- split(PViol.Type.Per.Case.Original,
>>> +   PViol.Type.Per.Case.Original$CaseID)
>>> > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>> +   table(x$Primary.Viol.Type))), 1, NA)
>>> > ans
>>>         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>>> 1005317     NA           NA                NA              NA          NA
>>> 1007183     NA           NA                NA              NA           1
>>> 1008833     NA           NA                NA              NA           1
>>> 1012281     NA           NA                NA              NA          NA
>>> 1015285     NA           NA                NA              NA          NA
>>> 1015315     NA           NA                NA              NA           1
>>> 1015322     NA           NA                NA              NA          NA
>>>         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>>> 1005317              NA             NA     NA                 NA        1
>>> 1007183              NA             NA     NA                 NA       NA
>>> 1008833              NA             NA     NA                 NA       NA
>>> 1012281              NA             NA     NA                 NA        1
>>> 1015285              NA              1      1                 NA        1
>>> 1015315              NA             NA     NA                 NA       NA
>>> 1015322              NA              1     NA                 NA       NA
>>>         OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>>> 1005317                 NA                    NA                   NA
>>> 1007183                 NA                    NA                   NA
>>> 1008833                 NA                    NA                   NA
>>> 1012281                 NA                    NA                   NA
>>> 1015285                 NA                    NA                   NA
>>> 1015315                 NA                    NA                   NA
>>> 1015322                 NA                    NA                   NA
>>>         RK.Records_CL V.Other
>>> 1005317            NA      NA
>>> 1007183            NA      NA
>>> 1008833            NA      NA
>>> 1012281            NA      NA
>>> 1015285             1      NA
>>> 1015315            NA      NA
>>> 1015322            NA      NA
>>> >
>>>
>>> Chel Hee Lee
>>>
>>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>>
>>>> No guarantees on "best"... but one way using base R could be:
>>>>
>>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>>> PViol.Type <- c( "BW.BackWages"
>>>> , "LD.Liquid_Damages"
>>>> , "MW.Minimum_Wage"
>>>> , "OT.Overtime"
>>>> , "RK.Records_FLSA"
>>>> , "V.Poster_Other"
>>>> , "AS.Age"
>>>> , "BW.WHMIS_BackWages"
>>>> , "HS.Hours"
>>>> , "OA.HazOccupationAg"
>>>> , "ON.HazOccupationNonAg"
>>>> , "R3.Reg3AgeOccupation"
>>>> , "RK.Records_CL"
>>>> , "V.Other" )
>>>>
>>>> # explicitly specifying all levels to the factor ensures a complete
>>>> # set of column outputs regardless of what is in the input
>>>> PViol.Type.Per.Case.Original <-
>>>> data.frame( CaseID
>>>> , Primary.Viol.Type=factor( Primary.Viol.Type
>>>> , levels=PViol.Type ) )
>>>>
>>>> tmp <- table( PViol.Type.Per.Case.Original )
>>>> ans <- data.frame( CaseID=rownames( tmp )
>>>>                  , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>>                  )
>>>>
>>>>
>>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>>
>>>>> # I have a dataframe that contains 2 columns:
>>>>> CaseID <- c('1015285',
>>>>> '1005317',
>>>>> '1012281',
>>>>> '1015285',
>>>>> '1015285',
>>>>> '1007183',
>>>>> '1008833',
>>>>> '1015315',
>>>>> '1015322',
>>>>> '1015285')
>>>>>
>>>>> Primary.Viol.Type <- c('AS.Age',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'RK.Records_CL',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'V.Poster_Other',
>>>>> 'V.Poster_Other')
>>>>>
>>>>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>>>>
>>>>> # CaseID's can be repeated because there can be up to 14
>>>>> Primary.Viol.Type's per CaseID.
>>>>>
>>>>> # I want to transform this dataframe into one that has 15 columns,
>>>>> where the first column is CaseID, and the rest are the 14 primary
>>>>> viol. types. The CaseID column will contain a list of the unique
>>>>> CaseID's (no replicates) and for each of their rows, there will be a
>>>>> '1' under a column corresponding to a primary violation type recorded
>>>>> for that CaseID. So, technically, there could be zero to 14 "1's" in a
>>>>> CaseID's row.
>>>>>
>>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>>> under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>>> but have "NA" under the rest of the columns.
>>>>>
>>>>> PViol.Type <- c("CaseID",
>>>>> "BW.BackWages",
>>>>> "LD.Liquid_Damages",
>>>>> "MW.Minimum_Wage",
>>>>> "OT.Overtime",
>>>>> "RK.Records_FLSA",
>>>>> "V.Poster_Other",
>>>>> "AS.Age",
>>>>> "BW.WHMIS_BackWages",
>>>>> "HS.Hours",
>>>>> "OA.HazOccupationAg",
>>>>> "ON.HazOccupationNonAg",
>>>>> "R3.Reg3AgeOccupation",
>>>>> "RK.Records_CL",
>>>>> "V.Other")
>>>>>
>>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>>
>>>>> # What is the best way to do this in R?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>>
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                       Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>>>
>>
>>
>
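Pulling the two corrections from the thread together, as a rough sketch only: refer to the column through the data frame (the fix Chel Hee Lee describes for the "object not found" error) and leave "CaseID" out of the factor levels (Jeff Newmiller's note). The data frame `df` below is a stand-in I made up for the imported CSV, which is not shown in the thread, and `tab`, `present` and `ans` are likewise illustrative names.

```r
# Stand-in for the imported data; in the thread this comes from read.csv().
df <- data.frame(
  CaseID = c("1015285", "1005317", "1012281", "1015285", "1015285",
             "1007183", "1008833", "1015315", "1015322", "1015285"),
  Primary.Viol.Type = c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                        "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                        "OT.Overtime", "V.Poster_Other", "V.Poster_Other"),
  stringsAsFactors = TRUE
)

# The 14 violation types; "CaseID" is deliberately not one of the levels.
PViol.Type <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
                "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other",
                "AS.Age", "BW.WHMIS_BackWages", "HS.Hours",
                "OA.HazOccupationAg", "ON.HazOccupationNonAg",
                "R3.Reg3AgeOccupation", "RK.Records_CL", "V.Other")

# Refer to the column through the data frame. A bare Primary.Viol.Type
# only works if a vector of that name exists in the workspace.
df$Primary.Viol.Type <- factor(df$Primary.Viol.Type, levels = PViol.Type)

# Cross-tabulate; unused levels are kept, so all 14 columns appear.
tab <- table(df$CaseID, df$Primary.Viol.Type)

# Drop the "table" class, then mark presence as 1 and absence as NA.
present <- unclass(tab) > 0
ans <- ifelse(present, 1, NA)
ans
```

Here `ans` is a matrix with the CaseID values as row names; wrapping it in data.frame() with a CaseID column is a separate step if a data frame is needed.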