[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
Richard O'Keefe
r@oknz @end|ng |rom gm@||@com
Thu Jul 28 08:41:34 CEST 2022
I'm retired, and I had an hour on my hands while tea cooked and my
granddaughter did her homework, and I just *love* showing off how helpful I
am.
Good news: someone finally looked at your data.
(That would be me.)
Bad news: it's going to be a lot of work to do what you want to, and YOU
SHOULDN'T EVEN TRY because it won't make any sense, as we would all have
known at once had you been clear about the structure of your data in the
beginning.
You have two files.
    dacnet_yield_update till 2019.csv
is a straightforward "data" file with structure
    crop           factor(arhar,bajra,cotton,gram,maize,moong,mustard,
                          potato,rice,soyabean,sugarcane,urad,wheat),
    season         factor(kharif,rabi),
    state.id       integer(1201..1235),
    state.name     factor,   # 34 states
    district.id    integer(15001..15648),
    district.name  factor,
    year           integer(1998..2017),
    yield          decimal(0.001 .. 314.736, precision=3)
The one problem is that the file name is misleading. It says "till 2019",
but includes no data for 2019 or 2018.
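In R, a check along these lines makes the mismatch between the file name and the contents visible. This is only a sketch: the small data frame is an invented stand-in for the real file (only the column names follow the structure above), and the commented-out read.csv call assumes the file sits in the working directory.

```r
# Toy stand-in for the .csv file; the rows are invented for illustration,
# only the column names follow the structure described above.
yields <- data.frame(
  crop        = c("rice", "rice", "wheat"),
  season      = c("kharif", "kharif", "rabi"),
  state.id    = c(1201L, 1201L, 1202L),
  district.id = c(15001L, 15001L, 15002L),
  year        = c(1998L, 2017L, 2017L),
  yield       = c(1.234, 2.345, 3.456),
  stringsAsFactors = TRUE
)
# With the real file (assumed location):
# yields <- read.csv("dacnet_yield_update till 2019.csv")

str(yields)                          # column types at a glance

year_range  <- range(yields$year)    # first and last year actually present
claimed_end <- 2019L                 # what the file name promises
has_claimed <- claimed_end %in% yields$year
```

If has_claimed comes out FALSE, the file name promises more than the file delivers.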
Ah, but the other file! That's not a "data" file intended for machine use
at all. It's a "display" file intended for human beings to look at and go
"wow, gosh, lookit them numbahs". It's the kind of thing that gets
included as an appendix in an official report and seems perversely
designed to impede the development of insight as much as possible.
Amongst other difficulties:
- the same column contains state names, district names, crop names, and
assorted junk;
- state names are not coded the same way in the two files;
- district names are in UPPERCASE in the .xls file and have numbers
prefixed to them for no apparent reason;
- crop names are not coded the same way in the two files;
- yields are not coded the same way in the two files (3 digit precision in
one, 2 digit precision in the other) and I have some doubt as to whether
they are measuring the same thing;
- above all, years appear to be CALENDAR years in the .csv file (e.g.,
2017) but FINANCIAL years in the .xls file (e.g., 2018-19).
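Two of these coding differences are mechanical and could be handled with small helpers like the ones below. This is a sketch under assumptions: clean_district and fy_start are hypothetical names, the label formats ("12. NAME" or "12 NAME", and "2018-19") are guesses at what the .xls file contains, and the district names in the example are placeholders.

```r
# Hypothetical helpers; the label formats in the spreadsheet are assumed.

# "1. PUNE" or "23 NASHIK" -> "pune", "nashik"
clean_district <- function(x) {
  # drop a leading number, an optional full stop, and surrounding blanks
  x <- sub("^[[:space:]]*[0-9]+\\.?[[:space:]]*", "", x)
  tolower(trimws(x))
}

# "2018-19" -> 2018L (the calendar year in which the financial year starts)
fy_start <- function(x) {
  as.integer(sub("^([0-9]{4})-[0-9]{2}$", "\\1", x))
}

districts <- clean_district(c("1. PUNE", "23 NASHIK"))
starts    <- fy_start(c("2018-19", "2019-20"))
```

Note that fy_start only extracts a label. It does nothing to settle whether a financial-year figure is comparable with a calendar-year figure.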
Now I could wrangle the .xls file into something closer to the .csv file
easily enough. I'd do it by converting the .xls to .csv, then writing a
script in AWK. *BUT* my uncertainty that "yield" means the same thing in
the two files and my certainty that "year" does NOT mean the same thing
make it unrewarding to do so.
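For what it is worth, the "insert rows after particular rows" step itself is the small part. Assuming, hypothetically, that both sources had already been brought to the same columns, codings and definitions, appending and then sorting by the keys puts the new years directly after the existing rows for each district and crop. The rows below are invented for illustration.

```r
# Invented rows; both frames are ASSUMED to share column names and codings,
# which is exactly what the real files do not do.
old <- data.frame(
  district.id = c(15001L, 15001L, 15002L, 15002L),
  crop        = c("rice", "rice", "rice", "rice"),
  year        = c(2016L, 2017L, 2016L, 2017L),
  yield       = c(2.1, 2.2, 1.8, 1.9)
)
new <- data.frame(
  district.id = c(15001L, 15001L),
  crop        = c("rice", "rice"),
  year        = c(2018L, 2019L),
  yield       = c(2.3, 2.4)
)

# Append, then sort by the grouping keys and year; no positional
# "insert after row n" bookkeeping is needed.
combined <- rbind(old, new)
combined <- combined[order(combined$district.id, combined$crop, combined$year), ]
rownames(combined) <- NULL
```

Districts with no new data (15002 here) simply keep their existing rows.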
The .xls file is the end product of some process that derived it from data
better structured for computation. It seems like a better use of your time
to go and look for the original data.
It also seems like a good use of your time to make certain you know what
the fields of the .csv file actually mean. ARE those calendar-year yields, or
just part of a financial year? Why are only the yields of interest and not
the area planted? How are the yields computed?
On Wed, 27 Jul 2022 at 14:31, Ranjeet Kumar Jha <ranjeetjhaiitkgp using gmail.com>
wrote:
> Hello Everyone,
>
> I have dataset in a particular format in "dacnet_yield_update till
> 2019.xlsx" file, where I need to insert the data of rows 2018-2019 and
> 2019-2020 for the districts those data are available in "Kharif crops
> yield_18-19.xlsx". I need to insert these two rows of data belonging to
> every district, if data is available in a later excel file, just after the
> particular crop group data for the particular district.
>
> I have put the data file in the given link.
>
> https://drive.google.com/drive/u/0/folders/1dNmGTI8_c9PK1QqmfIjnpbyzuiCXgxFC
>
> Please help solving this problem.
>
> Regards and Thanks,
> Ranjeet
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>