[R] strange strsplit gsub problem - is this a bug or a string length limitation?

Marc Schwartz marc_schwartz at me.com
Fri Jul 10 14:58:28 CEST 2009


On Jul 10, 2009, at 7:18 AM, tradenet wrote:

>
> I was working with the rmetrics portfolioBacktesting function and dug into
> the code to try to find why my formula with 113 items, i.e. A1 thru A113,
> was being truncated and I only get 85 items, not 113.
>
> Is it due to a string length limitation in R or is it a bug in the strsplit
> or gsub functions, or in my string?
>
> I'd very much appreciate any suggestions
>
>
> ============Input script:
>
> backtestFormula<- 
> SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
> benchmarkName = as.character(backtestFormula)[2]
> print(as.character(backtestFormula)[3])
> print(benchmarkName)
> assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]), "\\+")[[1]]
>    nAssets = length(assetsNames)
> print(nAssets)
> list(assetsNames)
>
> ===============output:
>
>
>> backtestFormula<- 
>> SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
>
>> benchmarkName = as.character(backtestFormula)[2]
>
>> print(benchmarkName)
> [1] "SPX"
>
>> print(as.character(backtestFormula)[3])
> [1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 +  
> A13 +
> A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24 +  
> A25 + A26
> + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 + A36 + A37 +  
> A38 +
> A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47 + A48 + A49 +  
> A50 + A51
> + A52 + A53 + A54 + A55 + A56 + A57 + A58 + A59 + A60 + A61 + A62 +  
> A63 +
> A64 + A65 + A66 + A67 + A68 + A69 + A70 + A71 + A72 + A73 + A74 +  
> A75 + A76
> + A77 + A78 + A79 + A80 + A81 + A82 + A83 + A84 + A85 + "
>
>> assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]), "\\+")[[1]]
>
>> print(nAssets)
> [1] 85
>
>> nAssets = length(assetsNames)
>
>> print(nAssets)
> [1] 85
>
>> list(assetsNames)
> [[1]]
> [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10"  
> "A11" "A12"
> "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20" "A21" "A22" "A23"  
> "A24"
> "A25" "A26" "A27" "A28" "A29" "A30" "A31" "A32" "A33"
> [34] "A34" "A35" "A36" "A37" "A38" "A39" "A40" "A41" "A42" "A43"  
> "A44" "A45"
> "A46" "A47" "A48" "A49" "A50" "A51" "A52" "A53" "A54" "A55" "A56"  
> "A57"
> "A58" "A59" "A60" "A61" "A62" "A63" "A64" "A65" "A66"
> [67] "A67" "A68" "A69" "A70" "A71" "A72" "A73" "A74" "A75" "A76"  
> "A77" "A78"
> "A79" "A80" "A81" "A82" "A83" "A84" "A85"



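You appear to be bumping up against the 500 character length limit of
as.character() when used with R language objects such as formulas. The
string you printed stops just after "A85 +", which is about 500
characters in, so strsplit() only ever sees 85 names.

Review the Note in ?as.character:

   "as.character truncates components of language objects to 500
characters (was about 70 before 1.3.1)."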



It is not a string length limitation or a bug in strsplit():

 > paste("A", 1:113, sep = "", collapse = " + ")
[1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 +  
A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24  
+ A25 + A26 + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 +  
A36 + A37 + A38 + A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47  
+ A48 + A49 + A50 + A51 + A52 + A53 + A54 + A55 + A56 + A57 + A58 +  
A59 + A60 + A61 + A62 + A63 + A64 + A65 + A66 + A67 + A68 + A69 + A70  
+ A71 + A72 + A73 + A74 + A75 + A76 + A77 + A78 + A79 + A80 + A81 +  
A82 + A83 + A84 + A85 + A86 + A87 + A88 + A89 + A90 + A91 + A92 + A93  
+ A94 + A95 + A96 + A97 + A98 + A99 + A100 + A101 + A102 + A103 + A104  
+ A105 + A106 + A107 + A108 + A109 + A110 + A111 + A112 + A113"


 > nchar(paste("A", 1:113, sep = "", collapse = " + "))
[1] 680


 > strsplit(paste("A", 1:113, sep = "", collapse = " + "), " \\+ ")[[1]]
   [1] "A1"   "A2"   "A3"   "A4"   "A5"   "A6"   "A7"   "A8"   "A9"
  [10] "A10"  "A11"  "A12"  "A13"  "A14"  "A15"  "A16"  "A17"  "A18"
  [19] "A19"  "A20"  "A21"  "A22"  "A23"  "A24"  "A25"  "A26"  "A27"
  [28] "A28"  "A29"  "A30"  "A31"  "A32"  "A33"  "A34"  "A35"  "A36"
  [37] "A37"  "A38"  "A39"  "A40"  "A41"  "A42"  "A43"  "A44"  "A45"
  [46] "A46"  "A47"  "A48"  "A49"  "A50"  "A51"  "A52"  "A53"  "A54"
  [55] "A55"  "A56"  "A57"  "A58"  "A59"  "A60"  "A61"  "A62"  "A63"
  [64] "A64"  "A65"  "A66"  "A67"  "A68"  "A69"  "A70"  "A71"  "A72"
  [73] "A73"  "A74"  "A75"  "A76"  "A77"  "A78"  "A79"  "A80"  "A81"
  [82] "A82"  "A83"  "A84"  "A85"  "A86"  "A87"  "A88"  "A89"  "A90"
  [91] "A91"  "A92"  "A93"  "A94"  "A95"  "A96"  "A97"  "A98"  "A99"
[100] "A100" "A101" "A102" "A103" "A104" "A105" "A106" "A107" "A108"
[109] "A109" "A110" "A111" "A112" "A113"
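
Given the as.character() limit described above, one possible workaround is to
take the names from the formula object itself rather than from its character
representation. The following is only a sketch: the formula is rebuilt here
from paste() for illustration, it has not been tried against the actual
portfolioBacktesting code, and whether as.character() truncates at all may
depend on your R version.

```r
# Illustrative reconstruction of the formula SPX ~ A1 + ... + A113
# (built from a string here only to keep the example short)
rhs <- paste("A", 1:113, sep = "", collapse = " + ")
backtestFormula <- as.formula(paste("SPX ~", rhs))

# A two-sided formula has three elements: `~`, the left side, the right side.
# all.vars() walks the language object directly, so no deparsing is involved.
benchmarkName <- all.vars(backtestFormula[[2]])
assetsNames   <- all.vars(backtestFormula[[3]])

# Alternative: term labels from the terms object
termLabels <- attr(terms(backtestFormula), "term.labels")

nAssets <- length(assetsNames)
print(benchmarkName)
print(nAssets)
```

Note that all.vars() returns variable names while "term.labels" returns the
terms as written, so the two differ if the right-hand side contains
something like log(A1) or an interaction.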

HTH,

Marc Schwartz



