[R] Unable to read csv files with comma in values

Sun Apr 7 18:55:54 CEST 2019

(Note: This follows an earlier mistaken reply just to Duncan)

Multiple "amens!" to Duncan's comments...

However:

Here is a start at my interpretation of how to do what you want. Note first
that your "example" listed 4 fields in the line, but you showed only 3. I
modified your example for 3 text fields, only one of which has brackets
([...]) in it I assume. Here is a little example of how to use regex's to
replace the commas within the brackets by "-", which would presumably then
allow you to easily convert the text into a data frame e.g. using
textConnection() and read.csv. Obviously, if this is not what you meant,
read no further.

##Example
txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
        "Mary, [Stats, Designer, R], t14")

wh <- grep("\\[.+\\]",txt)  ## which records need to be modified?
fixup <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ## bracketed
expressions, changing "," to "-"

## Unfortunately, the "replacement" argument in sub() is not vectorized, se
we need a loop:

for(i in wh) txt[wh[i]] <- sub("\\[.+\\]",fixup[i],txt[wh[i]]) ## replace
original bracketed text with fixed up bracketed text

> txt
[1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
[2] "Jan, DBA, R101"
[3] "Mary, [HadoopAnalyst-DBA-Developer], t14"

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 06/04/2019 10:03 a.m., Amit Govil wrote:
> > Hi,
> >
> > I have a bunch of csv files to read in R. I'm unable to read them
> correctly
> > because in some of the files, there is a column ("Role") which has comma
> in
> > the values.
> >
> > Sample data:
> >
> > User, Role, Rule, GAPId
> > Sam, [HadoopAnalyst, DBA, Developer], R46443
> >
> > I'm trying to play with the below code but it doesnt work:
>
> Since you didn't give a reproducible example, you should at least say
> what "doesn't work" means.
>
> But here's some general advice:  if you want to debug code, don't write
> huge expressions like the chain of functions below, put things in
> temporary variables and make sure you get what you were expecting at
> each stage.
>
> Instead of
> >
> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
> >
> > tbl <- sapply(files, function(f) {
> >    gsub('\\[|\\]', '"', readLines(f)) %>%
> >      read.csv(text = ., check.names = FALSE)
> > }) %>%
> >    bind_rows(.id = "id") %>%
> >    select(id, User, Rule) %>%
> >    distinct()
>
> try
>
>
> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>
> tmp1 <- sapply(files, function(f) {
>    gsub('\\[|\\]', '"', readLines(f)) %>%
>      read.csv(text = ., check.names = FALSE)
> })
>
> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>
> tmp3 <- tmp2 %>% select(id, User, Rule)
>
> tbl <- tmp3 %>% distinct()
>
> (You don't need pipes here, but it will make it easier to put the giant
> expression back together at the end.)
>
> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
> wrong.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]