[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

John Kane jrkrideau at inbox.com
Fri Dec 19 14:44:12 CET 2014


Very pretty. 
I could have saved myself about 1/2 hour of mucking about if I had thought of "length".
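
For the archive, my understanding of why "length" does the job: dcast() hands the aggregation function the values that fall into each CaseID x type cell, so length simply counts rows per cell. Below is a minimal base-R sketch of the same count, assuming the CaseID and Primary.Viol.Type vectors from Sven's post; the names counts, wide and flags are only illustrative and do not come from the thread.

```r
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# table() counts the rows in each CaseID x type cell
counts <- table(CaseID, Primary.Viol.Type)

# as.data.frame.matrix() keeps the wide layout (one column per type)
wide <- data.frame(CaseID = rownames(counts),
                   as.data.frame.matrix(counts),
                   row.names = NULL)

# the original question asked for 1/NA flags rather than counts
# (flags is an illustrative name, not from the thread)
flags <- wide
flags[-1] <- lapply(wide[-1], function(x) ifelse(x > 0, 1, NA))
```

Only types that actually occur in the data get a column this way; to get all 14 columns, Primary.Viol.Type would first have to be made a factor with the full set of levels, as Jeff suggested.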

John Kane
Kingston ON Canada


> -----Original Message-----
> From: sven.templer at gmail.com
> Sent: Fri, 19 Dec 2014 10:13:55 +0100
> To: chl948 at mail.usask.ca
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
> adjust col1 data display
> 
> Another solution:
> 
> CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
> "1007183",
> "1008833", "1015315", "1015322", "1015285")
> Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
> "RK.Records_CL",
> "OT.Overtime", "OT.Overtime", "OT.Overtime", "V.Poster_Other",
> "V.Poster_Other")
> 
> library(reshape2)
> dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type,
> length)
> 
> # result:
> 
> Using Primary.Viol.Type as value column: use value.var to override.
>    CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
> 1 1005317      0        1           0             0              0
> 2 1007183      0        0           1             0              0
> 3 1008833      0        0           1             0              0
> 4 1012281      0        1           0             0              0
> 5 1015285      1        1           0             1              1
> 6 1015315      0        0           1             0              0
> 7 1015322      0        0           0             0              1
> 
> 
> best, s.
> 
> On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
>> Please take a look at my code again.  The error message says that object
>> 'Primary.Viol.Type' not found.  Have you ever created the object
>> 'Primary.Viol.Type'?   It will work if you replace 'Primary.Viol.Type'
>> by 'PViol.Type.Per.Case.Original$Primary.Viol.Type' where 'factor()' is
>> used.  I hope this helps.
>> 
>> Chel Hee Lee
>> 
>> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>> 
>>> Chel, your solution is fantastic on the dataset I submitted in my
>>> question but it is not working when I import my real dataset into R.
>>> Do I need to vectorize the columns in my real dataset after importing?
>>> I tried a few things (###) but not making progress:
>>> 
>>> MERGE_PViol.Detail.Per.Case <-
>>> read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>>> stringsAsFactors=TRUE)
>>> 
>>> ### select only certain columns
>>> PViol.Type.Per.Case.Original <-
>>> MERGE_PViol.Detail.Per.Case[,c("CaseID",
>>> "Primary.Viol.Type")]
>>> 
>>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original <-
>>> ###   read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original$X <- NULL
>>> ### PViol.Type.Per.Case.Original[] <-
>>> ###   lapply(PViol.Type.Per.Case.Original, as.character)
>>> 
>>> PViol.Type <- c("CaseID",
>>>                  "BW.BackWages",
>>>                  "LD.Liquid_Damages",
>>>                  "MW.Minimum_Wage",
>>>                  "OT.Overtime",
>>>                  "RK.Records_FLSA",
>>>                  "V.Poster_Other",
>>>                  "AS.Age",
>>>                  "BW.WHMIS_BackWages",
>>>                  "HS.Hours",
>>>                  "OA.HazOccupationAg",
>>>                  "ON.HazOccupationNonAg",
>>>                  "R3.Reg3AgeOccupation",
>>>                  "RK.Records_CL",
>>>                  "V.Other")
>>> 
>>> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>> 
>>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
>>> ### PViol.Type) :  object 'Primary.Viol.Type' not found
>>> 
>>> tmp <-
>>> split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
>>> ans <- ifelse(do.call(rbind, lapply(tmp,
>>> function(x)table(x$Primary.Viol.Type))), 1, NA)
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Crombie, Burnette N
>>> Sent: Thursday, December 18, 2014 3:01 PM
>>> To: 'Chel Hee Lee'
>>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df
>>> then
>>> adjust col1 data display
>>> 
>>> Thanks for taking the time to review this, Chel.  I've got to step away
>>> from my desk, but will reply more substantially as soon as possible. --
>>> BNC
>>> 
>>> -----Original Message-----
>>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>>> Sent: Thursday, December 18, 2014 2:43 PM
>>> To: Jeff Newmiller; Crombie, Burnette N
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df
>>> then
>>> adjust col1 data display
>>> 
>>> I like the approach presented by Jeff Newmiller as shown in the
>>> previous post (I really like his way).  As he suggested, it would be
>>> good to start with 'factor' since you have all values of
>>> 'Primary.Viol.Type'.  You may try to use the 'split()' function for
>>> creating the table that you wish to build.  Please see below (I hope
>>> this helps):
>>> 
>>>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>>   +   factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>>   > tmp <- split(PViol.Type.Per.Case.Original,
>>>   +              PViol.Type.Per.Case.Original$CaseID)
>>>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>>   +   table(x$Primary.Viol.Type))), 1, NA)
>>>   > ans
>>>         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>>> 1005317     NA           NA                NA              NA          NA
>>> 1007183     NA           NA                NA              NA           1
>>> 1008833     NA           NA                NA              NA           1
>>> 1012281     NA           NA                NA              NA          NA
>>> 1015285     NA           NA                NA              NA          NA
>>> 1015315     NA           NA                NA              NA           1
>>> 1015322     NA           NA                NA              NA          NA
>>>         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>>> 1005317              NA             NA     NA                 NA        1
>>> 1007183              NA             NA     NA                 NA       NA
>>> 1008833              NA             NA     NA                 NA       NA
>>> 1012281              NA             NA     NA                 NA        1
>>> 1015285              NA              1      1                 NA        1
>>> 1015315              NA             NA     NA                 NA       NA
>>> 1015322              NA              1     NA                 NA       NA
>>>           OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>>> 1005317                 NA                    NA                   NA
>>> 1007183                 NA                    NA                   NA
>>> 1008833                 NA                    NA                   NA
>>> 1012281                 NA                    NA                   NA
>>> 1015285                 NA                    NA                   NA
>>> 1015315                 NA                    NA                   NA
>>> 1015322                 NA                    NA                   NA
>>>           RK.Records_CL V.Other
>>> 1005317            NA      NA
>>> 1007183            NA      NA
>>> 1008833            NA      NA
>>> 1012281            NA      NA
>>> 1015285             1      NA
>>> 1015315            NA      NA
>>> 1015322            NA      NA
>>>   >
>>> 
>>> Chel Hee Lee
>>> 
>>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>> 
>>>> No guarantees on "best"... but one way using base R could be:
>>>> 
>>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>>> PViol.Type <- c( "BW.BackWages"
>>>>                  , "LD.Liquid_Damages"
>>>>                  , "MW.Minimum_Wage"
>>>>                  , "OT.Overtime"
>>>>                  , "RK.Records_FLSA"
>>>>                  , "V.Poster_Other"
>>>>                  , "AS.Age"
>>>>                  , "BW.WHMIS_BackWages"
>>>>                  , "HS.Hours"
>>>>                  , "OA.HazOccupationAg"
>>>>                  , "ON.HazOccupationNonAg"
>>>>                  , "R3.Reg3AgeOccupation"
>>>>                  , "RK.Records_CL"
>>>>                  , "V.Other" )
>>>> 
>>>> # explicitly specifying all levels to the factor ensures a complete
>>>> # set of column outputs regardless of what is in the input
>>>> PViol.Type.Per.Case.Original <-
>>>>       data.frame( CaseID
>>>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>>>                                           , levels=PViol.Type ) )
>>>> 
>>>> tmp <- table( PViol.Type.Per.Case.Original )
>>>> ans <- data.frame( CaseID=rownames( tmp )
>>>>                  , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>>                  )
>>>> 
>>>> 
>>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>> 
>>>>> # I have a dataframe that contains 2 columns:
>>>>> CaseID  <- c('1015285',
>>>>> '1005317',
>>>>> '1012281',
>>>>> '1015285',
>>>>> '1015285',
>>>>> '1007183',
>>>>> '1008833',
>>>>> '1015315',
>>>>> '1015322',
>>>>> '1015285')
>>>>> 
>>>>> Primary.Viol.Type <- c('AS.Age',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'RK.Records_CL',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'V.Poster_Other',
>>>>> 'V.Poster_Other')
>>>>> 
>>>>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>>>> 
>>>>> # CaseID's can be repeated because there can be up to 14
>>>>> Primary.Viol.Type's per CaseID.
>>>>> 
>>>>> # I want to transform this dataframe into one that has 15 columns,
>>>>> where the first column is CaseID, and the rest are the 14 primary
>>>>> viol. types.  The CaseID column will contain a list of the unique
>>>>> CaseID's (no replicates) and
>>>>> for each of their rows, there will be a '1' under a column
>>>>> corresponding to a primary violation type recorded for that CaseID.
>>>>> So, technically, there could be zero to 14 '1's in a CaseID's row.
>>>>> 
>>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>>> under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>>> but have "NA" under the rest of the columns.
>>>>> 
>>>>> PViol.Type <- c("CaseID",
>>>>>                 "BW.BackWages",
>>>>>            "LD.Liquid_Damages",
>>>>>            "MW.Minimum_Wage",
>>>>>            "OT.Overtime",
>>>>>            "RK.Records_FLSA",
>>>>>            "V.Poster_Other",
>>>>>            "AS.Age",
>>>>>            "BW.WHMIS_BackWages",
>>>>>            "HS.Hours",
>>>>>            "OA.HazOccupationAg",
>>>>>            "ON.HazOccupationNonAg",
>>>>>            "R3.Reg3AgeOccupation",
>>>>>            "RK.Records_CL",
>>>>>            "V.Other")
>>>>> 
>>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>> 
>>>>> # What is the best way to do this in R?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>> 
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                         Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
