[R] [FORGED] Newbie Question on R versus Matlab/Octave versus C
Alan Feuerbacher
alanf00 using comcast.net
Wed Jan 30 17:16:52 CET 2019
On 1/29/2019 11:50 PM, Jeff Newmiller wrote:
Thanks very much for providing these coding examples! I think this is a
good way to learn some R.
Alan
> On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
>
>> On 1/28/2019 7:51 PM, Jeff Newmiller wrote:
>>> If you forge on with your preconceptions of how such a simulation
>>> should be implemented then you will be able to reproduce your failure
>>> just as spectacularly using R as you did using Octave.
>>
>> I think I've come to the same conclusion. :-)
>>
>>> It is crucial to employ vectorization of your algorithms if you want
>>> good performance with either Octave or R. That vectorization may
>>> either be over time or over separate simulations.
>>
>> Please explain further, if you don't mind. My background is not in
>> programming, but in analog microchip circuit design (I'm now retired).
>> Thus I'm a user of circuit simulators, not a programmer of them. Also,
>> I'm running this stuff on my home computers, either Linux or Windows
>> machines.
>>
>>> I am running simulations of a million cases of power plant
>>> performance over 25 years in about a minute. I know someone who used
>>> R to simulate a CFD river flow problem in a class in a few minutes,
>>> while others using Fortran or Matlab were struggling to get
>>> comparable runs completed in many hours. I believe the difference was
>>> in how the data were structured and manipulated more than the
>>> language that was being used. I think R's strong capabilities for
>>> presenting results make it advantageous over Octave, though.
>>
>> After my failed attempt at using Octave, I realized that most likely
>> the main contributing factor was that I was not able to figure out an
>> efficient data structure to model one person. But C lent itself
>> perfectly to my idea of how to go about programming my simulation. So
>> here's a simplified pseudocode sort of example of what I did:
>
> Don't model one person... model an array of people.
>
>> To model a single reproducing woman I used this C construct:
>>
>> typedef struct woman {
>>     int isAlive;
>>     int isPregnant;
>>     double age;
>>     . . .
>> } WOMAN;
>
> # e.g.
> Nwomen <- 100
> women <- data.frame( isAlive = rep( TRUE, Nwomen )
>                    , isPregnant = rep( FALSE, Nwomen )
>                    , age = rep( 20, Nwomen )
>                    )
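
A minimal sketch of how a data frame like this stands in for the array of structs: each column is a vector, so a single expression updates every woman at once. The one-day time step (dt) and the 15 to 45 age range are illustrative assumptions, not values from this thread.

```r
Nwomen <- 100
women <- data.frame(
  isAlive    = rep(TRUE, Nwomen),
  isPregnant = rep(FALSE, Nwomen),
  age        = rep(20, Nwomen)
)

# Age every living woman by one time step in a single statement
# (dt = one day expressed in years; a hypothetical choice)
dt <- 1 / 365
women$age[women$isAlive] <- women$age[women$isAlive] + dt

# A logical vector selects a subset of rows without an explicit loop;
# the 15-45 age range is an assumed fertility window
fertile <- women$isAlive & !women$isPregnant &
  women$age >= 15 & women$age <= 45
sum(fertile)  # number of women currently eligible to conceive
```

The C version touches one struct per loop iteration; here the loop over women is implicit in the vector operations.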
>
>> Then I allocated memory for a big array of these things, using the C
>> malloc() function, which gave me the equivalent of this statement:
>>
>> WOMAN women[NWOMEN]; /* An array of NWOMEN woman-structs */
>>
>> After some initialization I set up two loops:
>>
>> for (j = 0; j < numberOfYears; j++) {
>>     for (i = 0; i < numberOfWomen; i++) {
>>         updateWomen();
>>     }
>> }
>
> for ( j in seq.int( numberOfYears ) ) {
> # let vectorized data storage automatically handle the other for loop
> women <- updateWomen( women )
> }
>
>> The function updateWomen() figures out things like whether the woman
>> becomes pregnant or gives birth on a given day, dies, etc.
>
> You can use your "fixed size" allocation strategy with flags indicating
> whether specific rows are in use, or you can work only with valid rows
> and add rows as needed for children. It is best to compute a logical
> vector that identifies all of the birthing mothers as a subset of the
> data frame, build a set of children rows using the birthing mothers'
> rows as input, and then rbind the new rows to the updated women data
> frame as appropriate. The clearest approach for individual decision
> calculations is the vectorized "ifelse" function, though under certain
> circumstances putting an indexed subset on the left side of an
> assignment can modify memory "in place". (The functional-programming
> restriction against this is probably a foreign idea to a
> dyed-in-the-wool C programmer, but R usually prevents you from
> modifying the variable that was input to a function, automatically
> making a local copy of the input as needed in order to prevent such
> backwash into the caller's context.)
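
A rough sketch of what a vectorized updateWomen() along these lines might look like, not a definitive implementation. The per-step probabilities (pDeath, pConceive, pBirth) and the step length dt are hypothetical, and for simplicity every child is added as a new row of the same data frame.

```r
# Hypothetical per-step parameters, for illustration only
pDeath    <- 0.01
pConceive <- 0.10
pBirth    <- 0.50   # chance that a pregnant woman gives birth this step
dt        <- 1      # length of one step, in years

updateWomen <- function(women) {
  n <- nrow(women)

  # One random draw per row replaces the inner loop over women
  women$isAlive <- women$isAlive & runif(n) >= pDeath
  women$age <- ifelse(women$isAlive, women$age + dt, women$age)

  # Logical vectors identifying birthing mothers and new conceptions
  birthing  <- women$isAlive & women$isPregnant & runif(n) < pBirth
  conceives <- women$isAlive & !women$isPregnant & runif(n) < pConceive

  # ifelse() makes the per-woman decision for all rows at once
  women$isPregnant <- ifelse(birthing, FALSE,
                             ifelse(conceives, TRUE, women$isPregnant))

  # Build the children rows from the birthing mothers and append them
  mothers <- women[birthing, ]
  children <- data.frame(
    isAlive    = rep(TRUE, nrow(mothers)),
    isPregnant = rep(FALSE, nrow(mothers)),
    age        = rep(0, nrow(mothers))
  )
  rbind(women, children)
}

set.seed(1)
start <- data.frame(isAlive = rep(TRUE, 100),
                    isPregnant = rep(FALSE, 100),
                    age = rep(20, 100))
women <- start
for (j in seq_len(10)) {
  women <- updateWomen(women)
}
nrow(start)  # unchanged: the function worked on its own copy
nrow(women)  # the original rows plus any children born
```

Because the function returns the updated data frame instead of modifying its argument, the caller has to assign the result back, as the loop does.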
>
>> I added other refinements that are not relevant here, such as random
>> variations of various parameters, using the GNU Scientific Library
>> random number generator functions.
>
> R has quite sophisticated random number generation by default.
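
A brief sketch of the built-in generators, in place of the GNU Scientific Library calls. The sample size and distribution parameters below are made up for illustration.

```r
RNGkind()      # reports the generators in use ("Mersenne-Twister" by default)
set.seed(42)   # fix the seed so the run is reproducible

n <- 1000
u <- runif(n)                                  # uniform draws on (0, 1)
dies <- rbinom(n, size = 1, prob = 0.01) == 1  # one Bernoulli trial per person
lifespan <- rnorm(n, mean = 75, sd = 10)       # normal variation of a parameter

# Re-seeding reproduces the same stream
set.seed(42)
u2 <- runif(n)
identical(u, u2)
```

Each call returns a whole vector of draws, which fits the one-draw-per-row style of a vectorized update.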
>
>> If you can suggest a data construct in R or Octave that does something
>> like this, and uses your idea of vectorization, I'd like to hear it.
>> I'd like to implement it and compare results with my C implementation.
>>
>>> If your problems truly need a compiled language, the Rcpp package
>>> lets you mix C++ with R quite easily and then you get the best of
>>> both worlds. (C and Fortran are supported, but they are a bit more
>>> finicky to set up than C++).
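
A small sketch of mixing C++ into R with Rcpp, assuming the Rcpp package and a working C++ toolchain are installed. The function sumAges is a made-up example, not something from this thread.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R under the same name
cppFunction('
double sumAges(NumericVector age) {
  double total = 0;
  for (int i = 0; i < age.size(); i++) {
    total += age[i];   // explicit loop, compiled to native code
  }
  return total;
}')

ages <- c(20, 30, 40)
sumAges(ages)   # same result as sum(ages)
```

Only the inner loops that cannot be vectorized would need this treatment; the rest of the simulation could stay in plain R.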
>>
>> I don't know the answer to that, but perhaps you can help decide.
>>
>> Alan
>>
>>
>>> On January 28, 2019 4:00:07 PM PST, Alan Feuerbacher
>>> <alanf00 using comcast.net> wrote:
>>>> On 1/28/2019 4:20 PM, Rolf Turner wrote:
>>>>>
>>>>> On 1/29/19 10:05 AM, Alan Feuerbacher wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently learned of the existence of R through a physicist friend
>>>>>> who uses it in his research. I've used Octave for a decade, and C for
>>>>>> 35 years, but would like to learn R. These all have advantages and
>>>>>> disadvantages for certain tasks, but as I'm new to R I hardly know how
>>>>>> to evaluate them. Any suggestions?
>>>>>
>>>>> * C is fast, but with a syntax that is (to my mind) virtually
>>>>> incomprehensible. (You probably think differently about this.)
>>>>
>>>> I've been doing it long enough that I have little problem with it,
>>>> except for pointers. :-)
>>>>
>>>>> * In C, you essentially have to roll your own for all tasks; in R,
>>>>> practically anything (well ...) that you want to do has already
>>>>> been programmed up. CRAN is a wonderful resource, and there's more
>>>>> on github.
>>>>>
>>>>> * The syntax of R meshes beautifully with *my* thought patterns; YMMV.
>>>>>
>>>>> * Why not just bog in and try R out? It's free, it's readily available,
>>>>> and there are a number of good online tutorials.
>>>>
>>>> I just installed R on my Linux Fedora system, so I'll do that.
>>>>
>>>> I wonder if you'd care to comment on my little project that prompted
>>>> this? As part of another project, I wanted to model population growth
>>>> starting from a handful of starting individuals. This is exponential in
>>>> the long run, of course, but I wanted to see how a few basic parameters
>>>> affected the outcome. Using Octave, I modeled a single person as a
>>>> "cell", which in Octave has a good deal of overhead. The program
>>>> basically looped over the entire population, and updated each person
>>>> according to the parameters, which included random statistical
>>>> variations. So with a total population of, say, 10,000 and an
>>>> update time of 1 day, the program had to execute 10,000 x 365 update
>>>> operations for each year of growth. For large populations, say 100,000,
>>>> the program did not return even after 24 hours of run time.
>>>>
>>>> So I switched to C, and used its "struct" declaration and an array of
>>>> structs to model the population. This allowed the program to complete in
>>>> under a minute as opposed to 24 hours+. So in line with your comments, C
>>>> is far more efficient than Octave.
>>>>
>>>> How do you think R would fare in this simulation?
>>>>
>>>> Alan
>>>>
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil using dcn.davis.ca.us>  Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------