[R] RW 0.64.2 substring() string truncation?

T.E.Diaz tediaz at gwis2.circ.gwu.edu
Tue Aug 3 15:25:28 CEST 1999

Thanks to Prof Ripley and Peter Dalgaard for the
"#define MAXELTSIZE 8192" clarification to my earlier post (attached below).

As to the need for a string of nchar()>8192, I am using it to 
store the alphanumeric names of FromNodes and ToNodes in a large 
generalized network. The optimization routine is implemented in C 
(actually translated from Fortran 77 using F2C, with some 
modifications), but the network representation is constructed in R.

(1) My first impulse was to use a pair of contiguous memory 
blocks to store the FromNodes and ToNodes and to pass their addresses 
to the C function through the .C() call, which then writes the 
solution SolFromNodes and SolToNodes into another pair of contiguous 
blocks whose addresses are also passed in the .C() call. These four 
blocks are represented in R as four character "vectors", each of 
length()=1 (node names are of fixed length). Inside the C function, 
I use pointer arithmetic on each single string to access subsets of 
characters (the individual nodes).
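
A minimal sketch of the pointer arithmetic in (1), not the actual 
routine: it assumes fixed-width names packed back to back in one 
string, and relies on .C() handing a character vector to C as a 
char **, so a length()=1 vector arrives as from[0]. The names 
get_node and copy_nodes are hypothetical, and the "optimizer" here 
merely copies the input to the solution buffer.

```c
#include <stddef.h>
#include <string.h>

/* Copy the i-th fixed-width node name out of a packed buffer.
   out must have room for width + 1 bytes (name plus terminator). */
void get_node(const char *packed, int width, int i, char *out)
{
    memcpy(out, packed + (size_t)i * (size_t)width, (size_t)width);
    out[width] = '\0';
}

/* Hypothetical .C()-style entry point. from[0] and sol[0] are the
   single packed strings; *n is the node count, *width the name length.
   Stand-in for the real optimizer: copy every name to the solution. */
void copy_nodes(char **from, int *n, int *width, char **sol)
{
    memcpy(sol[0], from[0], (size_t)(*n) * (size_t)(*width));
}
```

With 6-character names, node i starts at byte offset 6*i, so no 
per-node pointer is stored anywhere; the price is that the whole 
network lives in one R string and runs into MAXELTSIZE.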

(2) As suggested below by Prof Ripley, to overcome the 8192 
limitation, I can also use a vector representation of FromNodes, for 
example, with each element representing a single node. Inside the C 
function, I would then employ "pointer to character pointers" 
arithmetic to access individual nodes. 
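
A comparable sketch for (2), again under assumptions: each node is 
its own NUL-terminated element of the char ** that .C() passes for 
a character vector, and each solution element already has room for 
the name written into it. The names find_node and copy_node_vector 
are hypothetical.

```c
#include <string.h>

/* Return the index of name among n node names, or -1 if absent. */
int find_node(char **nodes, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(nodes[i], name) == 0)
            return i;
    return -1;
}

/* Hypothetical .C()-style entry point: from[i] is the i-th node name.
   Assumes each sol[i] is at least as long as from[i].
   Stand-in for the real optimizer: copy every name to the solution. */
void copy_node_vector(char **from, int *n, char **sol)
{
    for (int i = 0; i < *n; i++)
        strcpy(sol[i], from[i]);
}
```

Here nodes[i] replaces the offset computation of (1) with one extra 
pointer dereference per access, and no single string comes near the 
8192 limit.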

Solution (1) is actually closer to the "array of characters"
representation of FromNodes (etc ...) in the original Fortran77 code,
which I was guessing is the more efficient implementation in terms of 
processing time (I shall know better after experimenting with (2)). 
We are dealing here with as many as 10,000 nodes, each 6 characters 
long. The optimization routine will be run inside a simulation 
function, so a gain of even a fraction of a second in a single 
replicate would be nice. 

George Washington University
Washington, DC

From:          Prof Brian D Ripley <ripley at stats.ox.ac.uk>
> On Tue, 3 Aug 1999, T.E.Diaz wrote:
> > Can somebody tell me what exactly is going on below. Basically, I am 
> > running into some kind of "string truncation" problem when I try 
> > to get a substring starting past the 8192nd character (see sample 
> > session below). There doesn't appear to be any problem creating the 
> > string, and nchar() reports the correct size as constructed.
> substr/substring has a buffer size limit of 8192. Indeed, the include file
> says
> Defn.h:#define MAXELTSIZE 8192 /* The largest string size */
> One day this limit may be removed, but for now at least we could document
> it.  I am not at all clear why one would want to use a single string longer
> than 8192 chars: is it possible in your applications to use a vector of
> shorter strings instead?

From:          Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
> Possibly - ;) - related to this:
> src/include/Defn.h:#define MAXELTSIZE 8192 /* The largest string size */
> There are a couple of fixed-size arrays in the code. We'll want to
> eradicate them at some point but it's pretty painful to do. Do you
> have a serious application for text strings of more than 8k length?
