[R] Working with data frames
William Dunlap
wdunlap at tibco.com
Thu Dec 11 18:58:04 CET 2014
Sun Shine wrote:
> with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE,
> xlab='Area') axis(side=2) axis(side=1, at=seq_along(levels(MHP.def$Names)),
> lab=levels(MHP.def$Names))})
> Error: unexpected symbol in "with(MHP.def, {plot(as.integer(MHP.def$Names),
> MHP.def$cH.E, axes=FALSE, xlab='Area') axis"
> This may have something to do with the period between cH and E or perhaps
> from the $ to access data from a column?
When you see a syntax error message, the error is usually towards the
end of the quoted text. In your case, you are missing a newline or
semicolon between "'Area')" and the subsequent "axis" (and likewise
between "axis(side=2)" and the second "axis").
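A minimal sketch of the corrected call. It assumes a hypothetical two-row stand-in for MHP.def: the area names MHad and LBNW come from the original question, but the cH.E values are invented. Each call inside the braces sits on its own line; separating the calls with ";" would work equally well. Because the file was read with stringsAsFactors=FALSE, Names is first turned back into a factor: on a character vector, as.integer() returns NA and levels() returns NULL. Inside with(), columns can be referred to by bare name, so the MHP.def$ prefixes are unnecessary.

```r
# Hypothetical stand-in for MHP.def: area names from the thread,
# cH.E values invented for illustration only.
MHP.def <- data.frame(Names = c("MHad", "LBNW"),
                      cH.E  = c(12.5, 30.1),
                      stringsAsFactors = FALSE)

# as.integer() and levels() below need a factor, not a character vector.
MHP.def$Names <- factor(MHP.def$Names)

# Inside with(), columns are visible by name. One call per line
# (or calls separated by ";") avoids the "unexpected symbol" error.
with(MHP.def, {
  plot(as.integer(Names), cH.E, axes = FALSE, xlab = "Area")
  axis(side = 2)                                    # usual y axis
  axis(side = 1, at = seq_along(levels(Names)),     # one tick per level,
       labels = levels(Names))                      # labelled with its name
  box()                                             # frame dropped by axes = FALSE
})
```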
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Dec 11, 2014 at 9:05 AM, Sun Shine <phaedrusv at gmail.com> wrote:
> Hello William, Ivan and Jim
>
> I appreciate your replies.
>
> I did suppress the factors using stringsAsFactors=FALSE and in that way
> was able to progress some more on getting a sense of the data set, so
> thanks for that suggestion. I had previously overlooked it.
>
> Also, thanks William: I never understood what those thick line segments
> were, but now I do. That had been about the best I could get by that point and still
> not with the names on the x axis.
>
> Unfortunately using William's suggestion of 'with' gave me errors:
>
> > with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE,
> xlab='Area') axis(side=2) axis(side=1, at=seq_along(levels(MHP.def$Names)),
> lab=levels(MHP.def$Names))})
>
> Error: unexpected symbol in "with(MHP.def,
> {plot(as.integer(MHP.def$Names), MHP.def$cH.E, axes=FALSE, xlab='Area')
> axis"
>
> This may have something to do with the period between cH and E or perhaps
> from the $ to access data from a column?
>
> I have now installed ggplot2 and with the help of the graphics cookbook
> will see if I can make some headway like this, at least for now. I think
> William's suggestion about learning to work with factors is fundamentally
> sound and something I will need to get my head around. For now though, I
> think I'll stick to exploring ggplot2 so that I can visualise this data set
> more easily.
>
> Thanks again.
>
> Best
>
> Sun
>
>
> On 11/12/14 16:06, William Dunlap wrote:
>
> Here is a reproducible example:
> > d <- read.csv(text="Name,Age\nBob,2\nXavier,25\nAdam,1")
> > str(d)
> 'data.frame': 3 obs. of 2 variables:
> $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
> $ Age : int 2 25 1
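Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so on a current R the call above gives Name as a character column. A sketch that reproduces the str() output by asking for factors explicitly, and that exposes the integer codes stored behind the factor, which are likely the "vector of numbers" described in the original question:

```r
# stringsAsFactors = TRUE reproduces the pre-4.0.0 default behaviour.
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)
str(d)

levels(d$Name)        # "Adam" "Bob" "Xavier": the sorted level labels
as.integer(d$Name)    # 2 3 1: each row's position in levels(d$Name)
as.character(d$Name)  # "Bob" "Xavier" "Adam": the original text
```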
>
> Do you get something similar? If not, show us what you have (you
> could trim it down to a few columns).
>
> Let's try some plots.
> > plot(d$Age)
> This shows a plot of d$Age (on y axis) vs "Index", where Index is
> 1:length(d$Age). The points are at (1,2), (2,25), and (3,1). You gave
> plot() no information about what should be on the x axis so it gave
> you the index numbers.
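To see where the index comes from, one can call xy.coords(), the helper that plot.default() uses to build its coordinates (it is also the function named in the warnings further down). Given a single vector, it uses seq_along() of that vector as x. A short sketch:

```r
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1")

# With only one vector, the index becomes the x coordinate.
xy <- xy.coords(d$Age)
xy$x  # 1 2 3
xy$y  # 2 25 1

# So plot(d$Age) draws the same points as:
plot(seq_along(d$Age), d$Age, xlab = "Index", ylab = "d$Age")
```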
>
> Now asking for d$Name on the x axis and d$Age on the y.
> > plot(d$Name, d$Age)
> This puts the names, in alphabetical order, on the x axis. The y axis
> ranges from about 0 to 25 and neither axis is labelled. There are
> thick horizontal line segments where you expect the points to
> be. These are degenerate boxplots: when you ask to plot a
> 'factor' variable on the x axis and numbers on the y, you get such
> a plot.
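When x is a factor and y is numeric, plot() dispatches to plot.factor(), which calls boxplot(y ~ x). The boxplot statistics can be computed without drawing anything, which shows why each box collapses to a single line: every group holds exactly one value, so all five summary statistics coincide. A sketch using the same small data frame:

```r
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)

# Same statistics plot(d$Name, d$Age) would draw, without plotting.
bp <- boxplot(Age ~ Name, data = d, plot = FALSE)
bp$n      # 1 1 1: one observation per level
bp$stats  # whiskers, hinges and median are equal within each column
```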
>
> Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
> (or as.is=TRUE) to your call to read.csv. Let's try that
> > d2 <- read.csv(stringsAsFactors=FALSE,
> text="Name,Age\nBob,2\nXavier,25\nAdam,1")
> > plot(d2$Name, d2$Age)
> Error in plot.window(...) : need finite 'xlim' values
> In addition: Warning messages:
> 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
> 2: In min(x) : no non-missing arguments to min; returning Inf
> 3: In max(x) : no non-missing arguments to max; returning -Inf
> You get no plot at all.
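The failure comes from coercion: xy.coords(), called by plot.default(), converts x to numeric, and text such as "Bob" becomes NA, leaving no finite values from which to compute xlim. A sketch showing the NAs and the fix of converting the column back to a factor:

```r
d2 <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
               stringsAsFactors = FALSE)

# What plotting tries to do with a character x: every element becomes NA.
x <- suppressWarnings(as.numeric(d2$Name))
x  # NA NA NA, hence "need finite 'xlim' values"

# Converting to a factor restores integer codes plus labels.
d2$Name <- factor(d2$Name)
as.integer(d2$Name)  # 2 3 1
```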
>
> You can get closer to what I think you want with:
> with(d, {
> plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
> axis(side=2) # draw the usual y axis
> axis(side=1, at=seq_along(levels(Name)), labels=levels(Name)) # label x ticks with the levels
> })
> If you want the names in a different order on the x axis, then reconstruct
> the factor object d$Name with a different order of levels. E.g.,
> d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
> and replot.
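Changing the level order changes only the integer codes, not the data, and it is those codes that as.integer(Name) puts on the x axis. A sketch of the explicit reordering above, plus stats::reorder(), which sorts the levels by a summary (by default the mean) of another variable; here negating Age puts the oldest first:

```r
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)

# Explicit level order, as in the example above.
d$Name <- factor(d$Name, levels = c("Xavier", "Bob", "Adam"))
as.integer(d$Name)  # 2 1 3: Bob, Xavier, Adam under the new order

# Data-driven order: levels sorted by mean(-Age), i.e. oldest first.
by_age <- reorder(d$Name, -d$Age)
levels(by_age)      # "Xavier" "Bob" "Adam"
```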
>
> There are various plotting packages, e.g., ggplot2, that can make this
> sort of thing easier, but I think the recommendation not to use factors
> is wrong. You do need to learn how to use them to your advantage.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Dec 11, 2014 at 5:00 AM, Sun Shine <phaedrusv at gmail.com> wrote:
>
>> Hello
>>
>> I am struggling with data frames and would appreciate some help please.
>>
>> I have a data set of 13 observations and 80 variables. The first column
>> is the names of different political area boundaries (e.g. MHad, LBNW, etc),
>> the first row is a vector of variable names concerning various census data
>> (e.g. age.T, hse.Unk, etc.). The first cell [1,1] is blank.
>>
>> I have loaded this via read.csv('path.to/data.set.csv'), and now want to
>> run some analyses on
>> this data frame. If I want to get a list of the names of the political
>> areas (i.e. the first column), the result is a vector of numbers which
>> appear to correlate with the factors, but I don't get the text names, just
>> the corresponding number. So, if I want to plot something basic, like the
>> area that uses the most gas for central heating, for example:
>>
>> > plot(data.set$ch.Gas)
>>
>> The result is the y-axis gives the gas usage for the areas, but the
>> x-axis gives only the numbers of the areas, not the names of the areas
>> (which is preferred).
>>
>> So, two questions:
>>
>> (1) Have I set up my csv file correctly to be read as a data frame, with
>> the variable names in the first row of the remaining columns and each
>> political area's values in its own row, under the column for the
>> corresponding variable? So far, looking through tutorials and books seems
>> to suggest yes, but at this point I'm no longer sure.
>>
>> (2) How can I access the names of the political areas when plotting so
>> that these are given on the x-axis instead of the numbers?
>>
>> Thanks for any help.
>>
>> Cheers
>> Sun
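A sketch addressing both questions, assuming a hypothetical two-area excerpt of the file (area and variable names taken from the question, numbers invented). With a blank top-left cell, read.csv() names the first column "X", because make.names("") gives "X". Passing row.names = 1 instead turns the area names into row names, which can then label the plot. The barplot() with names.arg is a swapped-in illustration; the with()/axis() approach from the replies works too.

```r
# Hypothetical CSV excerpt: blank first header cell, area names in
# column 1, census variables in the remaining columns (values invented).
csv <- ",age.T,ch.Gas\nMHad,120,45\nLBNW,98,60\n"

plain <- read.csv(text = csv)
names(plain)        # "X" "age.T" "ch.Gas": the blank header became "X"

# Use the first column as row names instead of as a data column.
data.set <- read.csv(text = csv, row.names = 1)
rownames(data.set)  # "MHad" "LBNW"

# Bars labelled by area name rather than by row number.
barplot(data.set$ch.Gas, names.arg = rownames(data.set),
        xlab = "Area", ylab = "ch.Gas")
```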
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
More information about the R-help
mailing list