[R] Dataset suggestion sought
hadley wickham
h.wickham at gmail.com
Thu Jun 18 23:51:34 CEST 2009
> In revising my book Regression Modeling Strategies for a second edition, I
> am seeking a dataset for exemplifying multiple regression using least
> squares. Ideally the dataset would have 5-40 variables and 40-10000
> independent observations, and would generate significant interest for a wide
> variety of readers. For example, the topic could be political science,
> society, human suffering, sports, psychology, economics, entertainment,
> history, etc. The dataset needs to be publicly available.
I have a few datasets that might be of interest:
* Movie rankings from IMDb, https://github.com/hadley/data-movies/tree
* Prices of 50,000 round-cut diamonds (included in ggplot2)
* Baby name popularity for the top 1000 names in the whole USA,
  1880-2008, and the top 100 names per state, 1960-2008,
  https://github.com/hadley/data-baby-names/tree
* EPA fuel economy measurements for all cars tested in the US,
  https://github.com/hadley/data-fuel-economy/tree
* Many datasets about the US housing crisis (work in progress),
  https://github.com/hadley/data-housing-crisis
* 500,000 house sales in the Bay Area, https://github.com/hadley/sfhousing/tree
If any of those sound of interest, I can provide more details.
Hadley
--
http://had.co.nz/