[R] Query about calculating the monthly average of daily data columns

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Sun Oct 20 18:38:21 CEST 2019


Hello,

Here are two other ways using aggregate.
The difference is in the way to create a MONTH grouping column.
The second way is base R only.


df1$MONTH <- zoo::as.yearmon(as.Date(as.character(df1$DATE), '%Y%m%d'))
aggregate(Spread ~ PERMNO + MONTH, df1, mean)

df1$MONTH <- df1$DATE %/% 100
aggregate(Spread ~ PERMNO + MONTH, df1, mean)


Hope this helps,

Rui Barradas


Às 17:17 de 20/10/19, jim holtman escreveu:
> Does this do what you want:
> 
>> library(tidyverse)
> 
>> input <- read_delim("PERMNO DATE Spread
> + 111 19940103 0.025464308
> + 111 19940104 0.064424296
> + 111 19940105 0.018579337
> + 111 19940106 0.018872211
>   ..." ... [TRUNCATED]
> 
>> # drop last two digits to get the month
>> monthly <- input %>%
> +   group_by(PERMNO, month = DATE %/% 100) %>%
> +   summarise(avg = mean(Spread))
>> monthly
> # A tibble: 12 x 3
> # Groups:   PERMNO [3]
>     PERMNO  month      avg
>      <dbl>  <dbl>    <dbl>
>   1    111 199401 0.0416
>   2    111 199402 0.0508
>   3    111 199403 0.0567
>   4    111 199404 0.0466
>   5    112 199401 0.000533
>   6    112 199402 0.000593
>   7    112 199403 0.000471
>   8    112 199404 0.000587
>   9    113 199401 0.000692
> 10    113 199402 0.000591
> 11    113 199403 0.000677
> 12    113 199404 0.000555
>>
> 
> 
> Jim Holtman
> *Data Munger Guru*
> 
> 
> *What is the problem that you are trying to solve?Tell me what you want to
> do, not how you want to do it.*
> 
> 
> On Sun, Oct 20, 2019 at 5:10 AM Subhamitra Patra <subhamitra.patra using gmail.com>
> wrote:
> 
>> Dear Sir,
>>
>> Thank you very much for your suggestions.
>>
>> Due to certain inconveniences, I was unable to work on your suggestions.
>>
>> Today I worked on both suggestions and got the result that I really wanted
>> that monthly averages for each country.
>>
>> Here, I am asking one more query (just for learning purpose) that if my
>> country name and its respective variable is in the panel format, and I want
>> to take the monthly average for each country, how the code will be
>> arranged. For your convenience, I am providing a small data sample below.
>>
>> PERMNO DATE Spread
>> 111 19940103 0.025464308
>> 111 19940104 0.064424296
>> 111 19940105 0.018579337
>> 111 19940106 0.018872211
>> 111 19940107 0.065279782
>> 111 19940110 0.063485905
>> 111 19940111 0.018355453
>> 111 19940112 0.064135683
>> 111 19940113 0.063519987
>> 111 19940114 0.018277351
>> 111 19940117 0.018628417
>> 111 19940118 0.065630229
>> 111 19940119 0.018713152
>> 111 19940120 0.019119037
>> 111 19940121 0.068342043
>> 111 19940124 0.020843244
>> 111 19940125 0.019954211
>> 111 19940126 0.018980321
>> 111 19940127 0.066827165
>> 111 19940128 0.067459235
>> 111 19940131 0.068682559
>> 111 19940201 0.02081465
>> 111 19940202 0.068236091
>> 111 19940203 0.068821406
>> 111 19940204 0.020075648
>> 111 19940207 0.066070584
>> 111 19940208 0.066068837
>> 111 19940209 0.019077072
>> 111 19940210 0.065894875
>> 111 19940211 0.018847478
>> 111 19940214 0.065040844
>> 111 19940215 0.01880332
>> 111 19940216 0.018836199
>> 111 19940217 0.066888865
>> 111 19940218 0.067116793
>> 111 19940221 0.068809742
>> 111 19940222 0.068230213
>> 111 19940223 0.069502855
>> 111 19940224 0.070383523
>> 111 19940225 0.020430811
>> 111 19940228 0.067087257
>> 111 19940301 0.066776479
>> 111 19940302 0.019959031
>> 111 19940303 0.066596469
>> 111 19940304 0.019131334
>> 111 19940307 0.019312528
>> 111 19940308 0.067349909
>> 111 19940309 0.068916431
>> 111 19940310 0.068620043
>> 111 19940311 0.070494844
>> 111 19940314 0.071056842
>> 111 19940315 0.071042517
>> 111 19940316 0.072401771
>> 111 19940317 0.071940001
>> 111 19940318 0.07352884
>> 111 19940321 0.072671688
>> 111 19940322 0.072652595
>> 111 19940323 0.021352138
>> 111 19940324 0.069933727
>> 111 19940325 0.068717467
>> 111 19940328 0.020470748
>> 111 19940329 0.020003748
>> 111 19940330 0.065833717
>> 111 19940331 0.065268388
>> 111 19940401 0.018762356
>> 111 19940404 0.064914179
>> 111 19940405 0.064706743
>> 111 19940406 0.018764175
>> 111 19940407 0.06524806
>> 111 19940408 0.018593449
>> 111 19940411 0.064913949
>> 111 19940412 0.01872089
>> 111 19940413 0.018729328
>> 111 19940414 0.018978773
>> 111 19940415 0.065477137
>> 111 19940418 0.064614365
>> 111 19940419 0.064184148
>> 111 19940420 0.018553192
>> 111 19940421 0.066872771
>> 111 19940422 0.06680782
>> 111 19940425 0.067467961
>> 111 19940426 0.02014297
>> 111 19940427 0.062464016
>> 111 19940428 0.062357052
>> 112 19940429 0.000233993
>> 112 19940103 0.000815264
>> 112 19940104 0.000238165
>> 112 19940105 0.000813632
>> 112 19940106 0.000236915
>> 112 19940107 0.000809102
>> 112 19940110 0.000801642
>> 112 19940111 0.000797932
>> 112 19940112 0.000795251
>> 112 19940113 0.000795186
>> 112 19940114 0.000231359
>> 112 19940117 0.000232134
>> 112 19940118 0.000233718
>> 112 19940119 0.000233993
>> 112 19940120 0.000234694
>> 112 19940121 0.000235753
>> 112 19940124 0.000808653
>> 112 19940125 0.000235604
>> 112 19940126 0.000805068
>> 112 19940127 0.000802337
>> 112 19940128 0.000801768
>> 112 19940131 0.000233517
>> 112 19940201 0.000797431
>> 112 19940202 0.000233338
>> 112 19940203 0.000233826
>> 112 19940204 0.000799519
>> 112 19940207 0.000798105
>> 112 19940208 0.000792245
>> 112 19940209 0.000231113
>> 112 19940210 0.000233413
>> 112 19940211 0.000798168
>> 112 19940214 0.000233282
>> 112 19940215 0.000797848
>> 112 19940216 0.000785165
>> 112 19940217 0.000228426
>> 112 19940218 0.000786783
>> 112 19940221 0.00078343
>> 112 19940222 0.000781459
>> 112 19940223 0.000776264
>> 112 19940224 0.000226399
>> 112 19940225 0.000779066
>> 112 19940228 0.000773603
>> 112 19940301 0.000226487
>> 112 19940302 0.000775233
>> 112 19940303 0.000227017
>> 112 19940304 0.000227854
>> 112 19940307 0.000782814
>> 112 19940308 0.000229164
>> 112 19940309 0.000787033
>> 112 19940310 0.000784049
>> 112 19940311 0.000228984
>> 112 19940314 0.00078697
>> 112 19940315 0.000782567
>> 112 19940316 0.000228516
>> 112 19940317 0.000786347
>> 112 19940318 0.000229236
>> 112 19940321 0.000230107
>> 112 19940322 0.000792689
>> 112 19940323 0.000787284
>> 112 19940324 0.000787221
>> 112 19940325 0.000227978
>> 112 19940328 0.000228713
>> 112 19940329 0.000228894
>> 112 19940330 0.000229255
>> 112 19940331 0.000231003
>> 112 19940401 0.000796567
>> 112 19940404 0.000790668
>> 112 19940405 0.00078195
>> 112 19940406 0.000780475
>> 112 19940407 0.000228355
>> 112 19940408 0.000781723
>> 112 19940411 0.000775741
>> 112 19940412 0.000226647
>> 112 19940413 0.000778876
>> 112 19940414 0.000777336
>> 112 19940415 0.000775253
>> 112 19940418 0.000226362
>> 112 19940419 0.000779554
>> 112 19940420 0.000774824
>> 112 19940421 0.000225582
>> 112 19940422 0.000225724
>> 112 19940425 0.000773361
>> 112 19940426 0.0002256
>> 112 19940427 0.000776416
>> 113 19940428 0.000280542
>> 113 19940429 0.000964148
>> 113 19940103 0.000962654
>> 113 19940104 0.000281768
>> 113 19940105 0.000962219
>> 113 19940106 0.000961965
>> 113 19940107 0.000958602
>> 113 19940110 0.000280056
>> 113 19940111 0.000956348
>> 113 19940112 0.000952171
>> 113 19940113 0.000948176
>> 113 19940114 0.000275607
>> 113 19940117 0.000275773
>> 113 19940118 0.000276738
>> 113 19940119 0.000947068
>> 113 19940120 0.000940959
>> 113 19940121 0.000275224
>> 113 19940124 0.000948489
>> 113 19940125 0.000940076
>> 113 19940126 0.0009309
>> 113 19940127 0.000269955
>> 113 19940128 0.000270328
>> 113 19940131 0.000924234
>> 113 19940201 0.000924038
>> 113 19940202 0.000269088
>> 113 19940203 0.000270247
>> 113 19940204 0.000270562
>> 113 19940207 0.00092656
>> 113 19940208 0.000921819
>> 113 19940209 0.000920361
>> 113 19940210 0.000268958
>> 113 19940211 0.000924758
>> 113 19940214 0.000266768
>> 113 19940215 0.000911325
>> 113 19940216 0.000909294
>> 113 19940217 0.000905887
>> 113 19940218 0.000262919
>> 113 19940221 0.000262978
>> 113 19940222 0.000263189
>> 113 19940223 0.000904439
>> 113 19940224 0.000263512
>> 113 19940225 0.000906184
>> 113 19940228 0.000265198
>> 113 19940301 0.000906126
>> 113 19940302 0.000264357
>> 113 19940303 0.000265392
>> 113 19940304 0.000912495
>> 113 19940307 0.000910641
>> 113 19940308 0.000266143
>> 113 19940309 0.000910113
>> 113 19940310 0.000909277
>> 113 19940311 0.000905056
>> 113 19940314 0.00090285
>> 113 19940315 0.000898831
>> 113 19940316 0.000896118
>> 113 19940317 0.000261294
>> 113 19940318 0.000892563
>> 113 19940321 0.000890852
>> 113 19940322 0.00088639
>> 113 19940323 0.000258509
>> 113 19940324 0.000260286
>> 113 19940325 0.000889354
>> 113 19940328 0.000888373
>> 113 19940329 0.000885049
>> 113 19940330 0.000259116
>> 113 19940331 0.000259474
>> 113 19940401 0.000260316
>> 113 19940404 0.000897493
>> 113 19940405 0.000894592
>> 113 19940406 0.000260435
>> 113 19940407 0.000260989
>> 113 19940408 0.000262061
>> 113 19940411 0.000262262
>> 113 19940412 0.000263604
>> 113 19940413 0.000908682
>> 113 19940414 0.000265348
>> 113 19940415 0.000265637
>> 113 19940418 0.00026862
>> 113 19940419 0.000918882
>> 113 19940420 0.000909904
>> 113 19940421 0.000901725
>> 113 19940422 0.000900062
>> 113 19940425 0.000893547
>> 113 19940426 0.000260899
>> Here, the 1st column is the name of the countries panel which I identified
>> in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is
>> the daily variable for each country for 4 months of 1994. I need to take
>> the monthly average of spread variable for each country (i.e. noted as 111,
>> 112, and 113) in the above example. In short, my monthly spread variable
>> should be sorted on the basis of both PERMNO (i.e. country identifier), and
>> months of the particular year.
>>
>> Please educate me that in this data format, how the average code can be
>> written?
>>
>> Thank you very much.
>>
>>
>>
>> [image: Mailtrack]
>> <
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>>
>> Sender
>> notified by
>> Mailtrack
>> <
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>>
>> 10/20/19,
>> 05:31:23 PM
>>
>> On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>>
>>> Original email did not come through (some problems with formating).
>>>
>>> Hi
>>>
>>> No, on contrary. I **am** suggesting to change date column to real date
>>> asi it is easy to handle with appropriate functions.
>>>
>>> Here are some fake data
>>>
>>>> str(spdat)
>>> 'data.frame':   260 obs. of  3 variables:
>>> $ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03"
>>> "1995-01-04" ...
>>> $ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
>>> $ countryB: num  9.4 10.76 11.24 8.26 10.71 ..
>>>
>>>> head(spdat)
>>>         dates    coutryA  countryB
>>> 1 1995-01-01  0.1875060  9.402851
>>> 2 1995-01-02  0.4045193 10.755112
>>> 3 1995-01-03 -0.1073904 11.243663
>>> 4 1995-01-04 -0.5959683  8.256424
>>> 5 1995-01-05 -0.5293772 10.705431
>>> 6 1995-01-06 -0.2228029 10.171461
>>>
>>> First I melt it
>>> spdat.m <- melt(spdat, id.var="dates")
>>>
>>>> head(spdat.m)
>>>         dates variable      value
>>> 1 1995-01-01  coutryA  0.1875060
>>> 2 1995-01-02  coutryA  0.4045193
>>> 3 1995-01-03  coutryA -0.1073904
>>> 4 1995-01-04  coutryA -0.5959683
>>> 5 1995-01-05  coutryA -0.5293772
>>> 6 1995-01-06  coutryA -0.2228029
>>>
>>> I do aggregation
>>>
>>>> spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates,
>>> "%m.%Y"), spdat.m$variable), mean)
>>>
>>> And now I use dcast  to get required result.
>>>
>>>> dcast(spdat.ag, Group.1~Group.2)
>>> Using x as value column: use value.var to override.
>>>     Group.1      coutryA  countryB
>>> 1  01.1995  0.098688137 10.177696
>>> 2  02.1995  0.352264682  9.609261
>>> 3  03.1995  0.155521876 10.043503
>>> 4  04.1995 -0.166092393 10.129844
>>> 5  05.1995  0.164665188 10.308275
>>> 6  06.1995  0.260633585 10.210129
>>> 7  07.1995  0.003671979 10.549016
>>> 8  08.1995  0.045295990 10.087435
>>> 9  09.1995 -0.145488206  9.689876
>>> 10 10.1995 -0.225645950  9.743744
>>> 11 11.1995  0.030273383 10.025435
>>> 12 12.1995  0.043557468 10.105626
>>>
>>> Cheers
>>> Petr
>>>
>>> Here are the data.
>>>
>>>> dput(spdat)
>>> spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134,
>> 9135,
>>> 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
>>> 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
>>> 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
>>> 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
>>> 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
>>> 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
>>> 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
>>> 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
>>> 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
>>> 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
>>> 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
>>> 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
>>> 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
>>> 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
>>> 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
>>> 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
>>> 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
>>> 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
>>> 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
>>> 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
>>> 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
>>> 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
>>> 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
>>> 9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
>>> 0.404519257417805, -0.107390371811605, -0.595968278805544,
>>> -0.529377240936012,
>>> -0.222802921207767, 0.413182392872818, 0.689673026532298,
>>> -1.2768723266992,
>>> -0.506308625809406, 0.113859233745174, -0.0963423819877653,
>>> 0.323987304768398,
>>> 1.63846917270538, 0.893233423250338, 0.297732439150487,
>> 0.949323101836486,
>>> -0.599518074708052, 0.366372319197032, -2.25734971953878,
>>> -0.190971733204918,
>>> -0.0874143568874351, 1.46699645184047, 0.00702170238687361,
>>> 0.11221346278474,
>>> -0.8060359607624, 0.340842350476532, 0.798838328074708,
>> 0.449214745851041,
>>> -0.664972890558734, 0.521830282184173, -1.35020467264521,
>>> -0.95240631225826,
>>> 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
>>> 1.48489932847779, 0.529222943794807, 0.0995675049147771,
>>> 0.477770516727839,
>>> 1.64567253670186, -0.0212651530684566, 0.558952796713992,
>>> 0.0409979382929057,
>>> 0.428675380654606, 0.0919422583362682, -0.819694497340459,
>>> 1.23998830450888,
>>> 0.607498144489643, -1.27724580163097, 1.41634774644371,
>>> -0.579094515769707,
>>> 2.02039606694223, 0.0740478208705996, -1.69826944583929,
>>> -0.321482399813063,
>>> -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
>>> 1.31264724137396, -0.0473627194710677, 0.141362267796145,
>>> 0.329709761206515,
>>> 0.518454586458572, -1.39489985851779, -0.388303591187678,
>>> -0.668922704543522,
>>> 0.0735115674875065, 1.30737242978235, 0.198503397980751,
>>> 0.257831448122427,
>>> -1.31173539205588, -1.45147941969116, 0.359725782295977,
>>> 0.612882118056585,
>>> -0.0733768753346202, -0.508349204402508, 1.35776663767231,
>>> 0.997807735669086,
>>> -1.41717534266382, -0.894170593324238, -0.68578120845151,
>>> -0.211509378018794,
>>> 0.436738904337909, -1.46932152770435, 0.0817388759874159,
>>> -0.0389350881653141,
>>> 0.709198476466861, -0.963669144724435, -0.548607422521798,
>>> -0.896886885575286,
>>> 0.322231150840934, 1.37327611339939, 0.0310213133870952,
>>> 0.796577750757324,
>>> -0.2010067423637, -0.241723752424226, 1.37547329580654,
>> -1.15382202538982,
>>> 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
>>> -0.536393730924719, -0.45845011727266, 1.10226256157127,
>>> -0.385596991265563,
>>> 3.20218061566932, -1.25865250042183, -0.13613128784276,
>> 0.483329357746514,
>>> -0.597187329618306, 0.710977603908319, -1.07945708269043,
>>> -0.477626236401394,
>>> 1.51034914684104, 2.35886426985999, -0.0250526828683629,
>>> -0.29439443478131,
>>> 0.665774016744828, 0.464027472251246, 0.226658374792016,
>>> -0.802597030454373,
>>> 0.825517059805602, -1.11293193130819, -1.27677400513873,
>> 1.60776237113347,
>>> 1.12490009531342, 0.95767047134623, 0.0475745549797055,
>>> -0.0591587460876868,
>>> -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
>>> 0.764367674339969, 1.49261525602638, 0.549570728337346,
>> -1.29658399741794,
>>> -1.6289903797869, 0.00573336252135834, 0.0300702149640632,
>>> 0.440810830115721,
>>> 0.663568666361326, -0.126685900835146, -0.00221628368438927,
>>> 0.815321995886579, -0.499280888368945, -0.271814047751667,
>>> -0.071025546459042,
>>> 1.73165491816826, -0.0294770299043331, 0.833605607221529,
>>> -0.670108794857159,
>>> -0.303323318026829, 1.29039844459134, -0.818806702120603,
>>> -0.445515595649677,
>>> -0.0128796557666887, 0.320923705586147, 0.230597275812536,
>>> -1.54009153212366,
>>> -0.294702981688559, 0.581209734391958, 0.121384768986639,
>>> 0.502914098451111,
>>> -1.59018268505718, -0.635101104166451, 1.48005776676403,
>>> -0.25631761189957,
>>> 0.171947814411552, 0.444646195980014, 0.172655758440111,
>>> -0.00432159794094836,
>>> -0.549321974240026, 0.585055026451421, -1.22813371480849,
>>> 0.846807540195381,
>>> 0.319629441352597, 0.393525732059709, -1.40275675444594,
>> 1.11062585584811,
>>> 0.214809571213853, -0.636432711800391, -0.283087127251573,
>>> -1.46385553207618,
>>> 0.436928676930225, -1.34231945433777, 0.451281957595763,
>>> -0.523155001924496,
>>> -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
>>> 0.346147428691405, -0.464527560160041, 0.337233933370495,
>>> 1.11331396366389,
>>> -1.00060600083316, -0.734784444487169, 1.40476315358621,
>> 1.01671092179193,
>>> -0.0144306250829694, -0.923555930346906, -1.02275966525015,
>>> 0.619422010219383,
>>> 0.603484309754755, -0.774553813657576, 0.0932792545556387,
>>> -0.651884521428279,
>>> -0.61965612647073, -1.22104834441579, -1.31439612639271,
>>> -2.87707752518163,
>>> -0.0343801084491906, -0.640678302378492, -1.38653452986558,
>>> 0.884963139028743,
>>> -0.657454283462004, 0.462842665244993, -0.20881674837534,
>> 0.6345884135548,
>>> 0.707165108434729, -0.162090928425892, -0.998662309785188,
>>> 1.3130254639318,
>>> 0.191890764940071, -0.0493619237876962, -0.55183232511689,
>>> 0.470263932874487,
>>> -0.217088645692971, 0.231550037620628, -0.530406537266415,
>>> -0.616522469083808,
>>> 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
>>> 0.766584887163714, -0.259803384094296, -0.402463714097741,
>>> -0.0229799209735185,
>>> -0.259677990559218, -1.41529707261105, 0.191362852138627,
>>> 1.54483266684747,
>>> -1.17947655378489, -0.426265411073274, 0.723010460481118,
>>> 1.37405142869537,
>>> -0.374771207936141, 0.0513905365832423, -0.369432731236118,
>>> -0.945441984794364,
>>> 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
>>> ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
>>> 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
>>> 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
>>> 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
>>> 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
>>> 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
>>> 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
>>> 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
>>> 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
>>> 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
>>> 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
>>> 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
>>> 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
>>> 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
>>> 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
>>> 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
>>> 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
>>> 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
>>> 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
>>> 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
>>> 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
>>> 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
>>> 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
>>> 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
>>> 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
>>> 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
>>> 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
>>> 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
>>> 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
>>> 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
>>> 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
>>> 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
>>> 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
>>> 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
>>> 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
>>> 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
>>> 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
>>> 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
>>> 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
>>> 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
>>> 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
>>> 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
>>> 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
>>> 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
>>> 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
>>> 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
>>> 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
>>> 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
>>> 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
>>> 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
>>> 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
>>> 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
>>> 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
>>> 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
>>> 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
>>> 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
>>> 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
>>> 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
>>> 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
>>> 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
>>> 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
>>> 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
>>> 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
>>> 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
>>> 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
>>> 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")
>>>
>>>
>>>
>>>
>>>
>>> From: Subhamitra Patra <mailto:subhamitra.patra using gmail.com>
>>> Sent: Friday, September 13, 2019 3:59 PM
>>> To: PIKAL Petr <mailto:petr.pikal using precheza.cz>; r-help mailing list
>>> <mailto:r-help using r-project.org>
>>> Subject: Re: [R] Query about calculating the monthly average of daily
>> data
>>> columns
>>>
>>> Dear PIKAL,
>>>
>>> Thank you very much for your suggestion.
>>>
>>> I tried your previous suggested code and getting the average value for
>>> each month for both country A, and B. But in your recent email, you are
>>> suggesting not to change the date column to real date. If I am going
>>> through your recently suggested code, i.e.
>>>
>>>   "aggregate(value column, list(format(date column, "%m.%Y"), country
>>> column), mean)"
>>>
>>> I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"),
>>> country), mean) : object 'value' not found".
>>>
>>> Here, my query "may I need to define the date column, country column, and
>>> value column separately?"
>>>
>>> Further, I need something the average value result like below in the data
>>> frame
>>>
>>> Month       Country A   Country B
>>> Jan 1994    26.66         35.78
>>> Feb 1994    26.13         29.14
>>>
>>> so that it will be easy for me to export to excel, and to use for the
>>> further calculations.
>>>
>>> Please suggest me in this regard.
>>>
>>> Thank you.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>> Sender notified by
>>>
>>>
>> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
>>> 09/13/19, 07:22:53 PM
>>>
>>>
>>>
>>> On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:
>> petr.pikal using precheza.cz>
>>> wrote:
>>> Hi
>>>
>>> I am almost 100% sure that you would spare yourself much trouble if you
>>> changed your date column to real date
>>>
>>> ?as.Date
>>>
>>> reshape your wide format to long one
>>> library(reshape2)
>>> ?melt
>>>
>>> to get 3 column data.frame with one date column, one country column and
>>> one value column
>>>
>>> use ?aggregate and ?format to get summary value
>>>
>>> something like
>>> aggregate(value column, list(format(date column, "%m.%Y"), country
>>> column), mean)
>>>
>>> But if you insist to scratch your left ear with right hand accross your
>>> head, you could continue your way.
>>>
>>> Cheers
>>> Petr
>>>
>>>> -----Original Message-----
>>>> From: R-help <mailto:r-help-bounces using r-project.org> On Behalf Of
>>> Subhamitra
>>>> Patra
>>>> Sent: Friday, September 13, 2019 3:20 PM
>>>> To: Jim Lemon <mailto:drjimlemon using gmail.com>; r-help mailing list
>>> <r-help using r-
>>>> http://project.org>
>>>> Subject: Re: [R] Query about calculating the monthly average of daily
>>> data
>>>> columns
>>>>
>>>> Dear Sir,
>>>>
>>>> Yes, I understood the logic. But, still, I have a few queries that I
>>> mentioned
>>>> below your answers.
>>>>
>>>> "# if you only have to get the monthly averages, it can be done this
>> way
>>>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
>>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>>>>>
>>>>> B. Here, I need to define the no. of months, and years separately,
>>> right?
>>>>> or else what 2, and 3 (in bold) indicates?
>>>>>
>>>>
>>>> To get the grouping variable of sequential months that you want, you
>> only
>>>> need the month and year values of the dates in the first column. First
>> I
>>> used
>>>> the "strsplit" function to split the date field at the hyphens, then
>> used
>>>> "sapply" to extract ("[") the second (month) and *third (year)* parts
>> as
>>> two
>>>> new columns. Because you have more than one year of data, you need the
>>>> year values or you will group all Januarys, all Februarys and so on.
>>>> Notice how I pass both of the new columns as a list (a data frame is a
>>> type of
>>>> list) in the call to get the mean of each month.
>>>>
>>>> 1. Here, as per my understanding, the "3" indicates the 3rd year,
>> right?
>>>> But, you showed an average for 2 months of the same year. Then, what
>> "3"
>>>> in the  spdat$year object indicate?
>>>>
>>>>
>>>> C. From this part, I got the exact average values of both January and
>>>>> February of 1994 for country A, and B. But, in code, I have a query
>>>>> that I need to define  spdat$returnA, and  spdat$returnB separately
>>>>> before writing this code, right? Like this, I need to define for each
>>>>> 84 countries separately with their respective number of months, and
>>>>> years before writing this code, right?
>>>>>
>>>>
>>>> I don't think so. Because I don't know what your data looks like, I am
>>>> guessing that for each row, it has columns for each of the 84
>> countries.
>>> I
>>>> don't know what these columns are named, either. Maybe:
>>>>
>>>> date             Australia   Belarus   ...    Zambia
>>>> 01/01/1994   20             21                 22
>>>> ...
>>>>
>>>> Here, due to my misunderstanding about the code, I was wrong. But, what
>>>> data structure you guessed, it is absolutely right that for each row, I
>>> have
>>>> columns for each of the 84 countries. So, I think, I need to define the
>>> date
>>>> column with no. of months, and years once for all the countries.
>>>> Therefore, I got my answer to the first and third question in the
>>> previous
>>>> email (what you suggested) that I no need to define the column of each
>>>> country, as the date, and no. of observations are same for all
>> countries.
>>>> But, the no. of days are different for each month, and similarly, for
>>> each
>>>> year. So, I think I need to define date for each year separately.
>>> Hence, I have
>>>> given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
>>> have
>>>> written the following code. Please correct me in case I am wrong.
>>>>
>>>>   spdat<-data.frame(
>>>>
>>>>
>>>
>> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
>>>> ep(1,21),rep(2,20),
>>>> rep(3,23), rep(4,21),
>>>>
>>>
>> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
>>>> ,22)
>>>> ),rep(1994,260)
>>>>   dates1=
>>>>
>>>
>> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
>>>> 2),rep(2,20),
>>>> rep(3,23), rep(4,20),
>>>>
>>>
>> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
>>>> ,21)
>>>> ),rep(1995,259) ,sep="-")
>>>>
>>>> Concerning the exporting of structure of the dataset to excel, I will
>>> have
>>>> 12*84 matrix. But, please suggest me the way to proceed for the large
>>>> sample. I have mentioned below what I understood from your code. Please
>>>> correct me if I am wrong.
>>>> 1. I need to define the date for each year as the no. of days in each
>>> month
>>>> are different for each year (as mentioned in my above code). For
>>> instance, in
>>>> my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>>>> 2. Need to define the date column as character.
>>>> 3. Need to define the monthly average for each month, and year. So, now
>>>> code will be as follows.
>>>>
>>>
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
>>>>    %%%%As I need all months average sequentially.
>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>>>>
>>>> Here, this meaning of "3", I am really unable to get.
>>>>
>>>> 4. Need to define each country with each month and year as mentioned in
>>>> the last part of your code.
>>>>
>>>> Please suggest me in this regard.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [image: Mailtrack]
>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
>>>> mpaign=signaturevirality5&>
>>>> Sender
>>>> notified by
>>>> Mailtrack
>>>> <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
>>>> mpaign=signaturevirality5&>
>>>> 09/13/19,
>>>> 06:41:41 PM
>>>>
>>>> On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:drjimlemon using gmail.com
>>>
>>> wrote:
>>>>
>>>>> Hi Subhamitra,
>>>>> I'll try to write my answers adjacent to your questions below.
>>>>>
>>>>> On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
>>>>> mailto:subhamitra.patra using gmail.com> wrote:
>>>>>
>>>>>> Dear Sir,
>>>>>>
>>>>>> Thank you very much for your suggestion.
>>>>>>
>>>>>> Yes, your suggested code worked. But, actually, I have data from 3rd
>>>>>> January 1994 to 3rd August 2017 for very large (i.e. for 84
>>>>>> countries) sample. From this, I have given the example of the years
>>>>>> up to 2000. Before applying the same code for the long 24 years, I
>>>>>> want to learn the logic behind the code. Actually, some part of the
>>>>>> code is not understandable to me which I mentioned in the bold
>> letter
>>> as
>>>> follows.
>>>>>>
>>>>>> "spdat<-data.frame(
>>>>>>
>>>   dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
>>>>>>    returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
>>>>>>
>>>>>> A. Here, I need to define the no. of days in a month, and the no. of
>>>>>> countries name separately, right? But, what is meant by 15:50, and
>>>>>> 10:45 in return A, and B respectively?
>>>>>>
>>>>>
>>>>> To paraphrase Donald Trump, this is FAKE DATA! I have no idea what
>> the
>>>>> real values of return are, so I made them up using the "sample"
>>> function.
>>>>> However, this is not meant to mislead anyone, just to show how
>>>>> whatever numbers are in your data can be used in calculations. The
>>>>> colon (":") operator creates a sequence of numbers starting with the
>>>>> one to the left and ending with the one to the right.
>>>>>
>>>>>>
>>>>>> "# if you only have to get the monthly averages, it can be done this
>>>>>> way
>>>>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
>>>>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
>>>>>>
>>>>>> B. Here, I need to define the no. of months, and years separately,
>>> right?
>>>>>> or else what 2, and 3 (in bold) indicates?
>>>>>>
>>>>>
>>>>> To get the grouping variable of sequential months that you want, you
>>>>> only need the month and year values of the dates in the first column.
>>>>> First I used the "strsplit" function to split the date field at the
>>>>> hyphens, then used "sapply" to extract ("[") the second (month) and
>>>>> third (year) parts as two new columns. Because you have more than one
>>>>> year of data, you need the year values or you will group all
>> Januarys,
>>>>> all Februarys and so on. Notice how I pass both of the new columns as
>>>>> a list (a data frame is a type of
>>>>> list) in the call to get the mean of each month.
>>>>>
>>>>>>
>>>>>> "# get the averages by month and year - is this correct?
>>>>>> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
>>>>>> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
>>>>>>
>>>>>> C. From this part, I got the exact average values of both January
>> and
>>>>>> February of 1994 for country A, and B. But, in code, I have a query
>>>>>> that I need to define  spdat$returnA, and  spdat$returnB separately
>>>>>> before writing this code, right? Like this, I need to define for
>> each
>>>>>> 84 countries separately with their respective number of months, and
>>>>>> years before writing this code, right?
>>>>>>
>>>>>
>>>>> I don't think so. Because I don't know what your data looks like, I
>> am
>>>>> guessing that for each row, it has columns for each of the 84
>>>>> countries. I don't know what these columns are named, either. Maybe:
>>>>>
>>>>> date             Australia   Belarus   ...    Zambia
>>>>> 01/01/1994   20             21                 22
>>>>> ...
>>>>>
>>>>>
>>>>>> Yes, after obtaining the monthly average for each country's data, I
>>>>>> need to use them for further calculations. So, I want to export the
>>>>>> result to excel. But, until understanding the code, I think I
>> willn't
>>>>>> able to apply for the entire sample, and cannot be able to discuss
>>>>>> the format of the resulted column to export to excel.
>>>>>>
>>>>>
>>>>> Say that we perform the grouped mean calculation for the first two
>>>>> country columns like this:
>>>>> monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
>>>>> monmeans
>>>>>      Australia  Belarus
>>>>> [1,]  29.70000 30.43333
>>>>> [2,]  34.17857 27.39286
>>>>>
>>>>> We are presented with a 2x2 matrix of monthly means in just the
>> format
>>>>> someone might use for importing into Excel. The first row is January
>>>>> 1994, the second February 1994 and so on. By expanding the columns to
>>>>> include all the countries in your data, You should have the result
>> you
>>> want.
>>>>>
>>>>> Jim
>>>>>
>>>>
>>>>
>>>> --
>>>> *Best Regards,*
>>>> *Subhamitra Patra*
>>>> *Phd. Research Scholar*
>>>> *Department of Humanities and Social Sciences* *Indian Institute of
>>>> Technology, Kharagpur*
>>>> *INDIA*
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> mailto:R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>> see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>>> guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
>>> partnerů PRECHEZA a.s. jsou zveřejněny na:
>>> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
>>> about processing and protection of business partner’s personal data are
>>> available on website:
>>> https://www.precheza.cz/en/personal-data-protection-principles/
>>> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
>>> důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
>>> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
>>> documents attached to it may be confidential and are subject to the
>> legally
>>> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Subhamitra Patra
>>> Phd. Research Scholar
>>> Department of Humanities and Social Sciences
>>> Indian Institute of Technology, Kharagpur
>>> INDIA
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> *Best Regards,*
>> *Subhamitra Patra*
>> *Phd. Research Scholar*
>> *Department of Humanities and Social Sciences*
>> *Indian Institute of Technology, Kharagpur*
>> *INDIA*
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list