[R] "Best" way to merge 300+ .5MB dataframes?
David Winsemius
dwinsemius at comcast.net
Tue Aug 12 08:07:13 CEST 2014
On Aug 11, 2014, at 8:01 PM, John McKown wrote:
> On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams <tea3rd at gmail.com> wrote:
>> Grant,
>>
>> Assuming all your filenames are something like file1.txt, file2.txt,
>> file3.txt, ... and using the Mac OSX Terminal app (after you cd to
>> the directory where your files are located)...
>>
>> This will strip off the 1st lines, that is, your header lines:
>>
>> for file in *.txt;do
>> sed -i '' '1d' "${file}";
>> done
>>
>> Then, do this:
>>
>> cat *.txt > newfilename.txt
>>
>> Doing both should only take a few seconds, depending on your file sizes.
>>
>> Cheers!
>> Tom
>>
>
> Using sed hadn't occurred to me. I guess I'm just "awk-ward" <grin/>.
> A slightly different way would be:
>
> for file in *.txt;do
> sed '1d' ${file}
> done >newfilename.txt
>
> that way the original files are not modified. But it strips out the
> header on the 1st file as well. Not a big deal, but the read.table
> will need to be changed to accommodate that. Also, it creates an
> otherwise unnecessary intermediate file "newfilename.txt". To get the
> 1st file's header, the script could:
>
> head -1 file1.txt >newfilename.txt
> for file in *.txt;do
> sed '1d' ${file}
> done >>newfilename.txt
>
> I really like having multiple answers to a given problem. Especially
> since I have a poorly implemented version of "awk" on one of my
> systems. It is the vendor's "awk" and conforms exactly to the POSIX
> definition with no additions. So I don't have the FNR built-in
> variable. Your implementation would work well on that system. Well, if
> there were a version of R for it. It is a branded UNIX system which
> was designed to be totally __and only__ POSIX compliant, with few
> (maybe no) extensions at all. IOW, it stinks. No, it can't be
> replaced. It is the z/OS system from IBM which is EBCDIC based and
> runs on the "big iron" mainframe, system z.
>
> --
On the Mac the awk equivalent is gawk. Within R you would use `system()`, possibly using `paste0()` to construct the command string to send.
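
A minimal sketch of the awk route mentioned in the quoted message, assuming the inputs match file*.txt and each one starts with a single header line. The scratch directory, the sample data, and the output name combined.csv are made up for illustration. It relies on the FNR variable, so it needs an awk that provides it:

```shell
# Work in a scratch directory with three small, made-up input files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,value\n1,a\n' > file1.txt
printf 'id,value\n2,b\n' > file2.txt
printf 'id,value\n3,c\n' > file3.txt

# NR counts lines across all inputs; FNR restarts at 1 for each file.
# NR == 1 keeps the header of the first file only;
# FNR > 1 keeps every non-header line of every file.
# The output name does not match file*.txt, so it is never read as input.
awk 'NR == 1 || FNR > 1' file*.txt > combined.csv

cat combined.csv
```

The original files are left untouched and only one header line ends up in the result. From R, the same awk command line could be passed as a single string to `system()`.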
--
David Winsemius
Alameda, CA, USA