[BioC] Rsamtools Package Dereferences Symbolic Links

Martin Morgan mtmorgan at fhcrc.org
Wed Oct 17 18:54:47 CEST 2012


Thanks all for your contributions.

There are two things going on, both fixed in the devel version 1.11.2 (until 
then, the best work-around is probably, as Lucas has discovered, to arrange it 
so that the sym links and index files are not in the same directory).

- samtools expects the index file to be named _without_ the .bai extension; 
Rsamtools now tries to tolerate the case where the user (e.g., Lucas and Vince) 
specifies the index with .bai, by checking for and stripping .bai prior to 
opening the file.

- samtools doesn't do tilde expansion, i.e., ~/myfile.bam does not work. 
Rsamtools tried to help the user out by using path.expand (which does tilde 
expansion) and, for good measure, normalizePath, which would replace something 
like ../bams/mybam.bam with the full path to mybam.bam, as well as dereferencing 
symlinks. But samtools seems to know about ./ and ../ and seems to be happy with 
symlinks, so we no longer do normalizePath.
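
As a rough illustration of the difference between the two base R functions 
(this is only a sketch, not the Rsamtools code; it assumes a POSIX-like system 
where file.symlink() works, and the directory and file names are made up to 
mirror Lucas's example):

```r
# Sketch only: base R, no Rsamtools needed. File names are illustrative.
root <- file.path(normalizePath(tempdir()), "symlink_demo")
readonly <- file.path(root, "readonly_dir")
writable <- file.path(root, "writable_dir")
dir.create(readonly, recursive = TRUE, showWarnings = FALSE)
dir.create(writable, recursive = TRUE, showWarnings = FALSE)

target <- file.path(readonly, "original.bam")
link   <- file.path(writable, "link_to_original.bam")
file.create(target)                      # empty stand-in for a BAM file
if (!file.exists(link)) file.symlink(target, link)

# path.expand() only expands a leading "~"; the symlink is left alone.
expanded <- path.expand(link)

# normalizePath() also resolves the symlink to the file it points at.
normalized <- normalizePath(link)

print(basename(dirname(expanded)))    # directory of the path as given
print(basename(dirname(normalized)))  # directory after dereferencing
```

On such a system the first path should stay in writable_dir and the second 
should end up in readonly_dir, which is the change Lucas reported.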

This also allows indexing in the directory in which the symlink occurs, rather 
than in the de-referenced directory.
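
For the first point, the .bai handling can be sketched roughly as follows. 
Both helper functions are hypothetical and written only for illustration; they 
are not the Rsamtools implementation. The second helper encodes the behaviour 
Lucas described, namely that ".bai" is appended to whatever index name is given.

```r
# Hypothetical helper: drop one trailing ".bai" so that the name has the
# form the thread says samtools expects.
strip_bai <- function(index) sub("\\.bai$", "", index)

# Assumption taken from the thread: the file actually tried is <index>.bai.
index_file_tried <- function(index) paste0(index, ".bai")

with_ext    <- "~/R_tests/writable_dir/link_to_original.bam.bai"
without_ext <- "~/R_tests/writable_dir/link_to_original.bam"

# Without stripping, a name that already ends in ".bai" gets it twice.
print(index_file_tried(with_ext))

# With stripping, both ways of writing the index name give the same file.
print(index_file_tried(strip_bai(with_ext)))
print(index_file_tried(strip_bai(without_ext)))
```

Stripping at most one trailing ".bai" leaves a name without the extension 
unchanged, so either way of writing the index should be accepted.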

I'd be interested in knowing if this causes alternative errors.

Martin

On 10/16/2012 10:10 AM, Lucas Swanson wrote:
> I can try to clarify with a specific example.
>
> My original (unindexed) bam file is in a directory to which I do not have write access:
> ~/R_tests/readonly_dir/original.bam
>
> So I create a symbolic link to it in a directory to which I do have write access:
> ~/R_tests/writable_dir/link_to_original.bam -> ~/R_tests/readonly_dir/original.bam
>
> And then I create an index for it in the writable directory:
> ~/R_tests/writable_dir/link_to_original.bam.bai
>
> Then I start up R and get the following:
> $ R --vanilla
>
> R version 2.14.2 (2012-02-29)
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>    Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(Rsamtools)
> Loading required package: IRanges
>
> Attaching package: ‘IRanges’
>
> The following object(s) are masked from ‘package:base’:
>
>      cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
>      pmin, pmin.int, rbind, rep.int, setdiff, table, union
>
> Loading required package: GenomicRanges
> Loading required package: Biostrings
>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam.bai")
>> my.bam
> class: BamFile
> path: /home/lswanson/R_tests/readonly_dir/original.bam
> index: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
> isOpen: FALSE
>> open(my.bam)
> Error in open.BamFile(my.bam) : failed to load BAM index
>    file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
> In addition: Warning message:
> In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.
>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam")
>> my.bam
> class: BamFile
> path: /home/lswanson/R_tests/readonly_dir/original.bam
> index: /home/lswanson/R_tests/readonly_dir/original.bam
> isOpen: FALSE
>> open(my.bam)
> Error in open.BamFile(my.bam) : failed to load BAM index
>    file: /home/lswanson/R_tests/readonly_dir/original.bam
> In addition: Warning message:
> In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.
>
>
> Note that, even though I used the path "~/R_tests/writable_dir/link_to_original.bam" (note WRITABLE_DIR) when I created the BamFile object, when I view the details of the BamFile object, it has changed the path to "/home/lswanson/R_tests/readonly_dir/original.bam" (note READONLY_DIR).
>
> This change of the (symbolic link) writable_dir path that I pass to the BamFile constructor to the (target) readonly_dir path is what I mean by the dereferencing of the symbolic link path.
>
> Note that, if I change the name of the index file to:
> ~/R_tests/writable_dir/renamed_index.bam.bai
>
> I can get it to work, by excluding the ".bai" from the end of the "index" path I use in the BamFile constructor:
>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/renamed_index.bam")
>> my.bam
> class: BamFile
> path: /home/lswanson/R_tests/readonly_dir/original.bam
> index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
> isOpen: FALSE
>> open(my.bam)
>> my.bam
> class: BamFile
> path: /home/lswanson/R_tests/readonly_dir/original.bam
> index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
> isOpen: TRUE
>
> The behaviour that I expect is that when I make a command like:
>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam")
> Then the "path" attribute of my.bam should be "/home/lswanson/R_tests/writable_dir/link_to_original.bam", NOT "/home/lswanson/R_tests/readonly_dir/original.bam"
>
> It looks like another aspect of the problem is that the "index" parameter of the BamFile constructor is not actually expecting the full path to the index file, since it is automatically adding ".bai" to whatever path it is given (even if the path already ends with ".bai"). And the error message is a little misleading, since when it says:
> Error in open.BamFile(my.bam) : failed to load BAM index
>    file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
> The path that it is actually trying to open as the index file is:
> /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai.bai
> Which, of course, does not exist.
>
> ~Lucas
>
> ________________________________________
> From: Vincent Carey [stvjc at channing.harvard.edu]
> Sent: Tuesday, October 16, 2012 8:19 AM
> To: Cook, Malcolm
> Cc: Lucas Swanson; bioconductor at r-project.org
> Subject: Re: [BioC] Rsamtools Package Dereferences Symbolic Links
>
> I did not find these notes to be particularly clear.  I knew that BamFile allows both file and index to be specified separately.
>
> In the following, ex1.bam is a symbolic link in current working folder, and ex1.bam.bai is the physical index file.
>
>> X = BamFile("ex1.bam", index="./ex1.bam.bai")
>> X
> class: BamFile
> path: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/ex1.bam
> index: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FO.../ex1.bam.bai
> isOpen: FALSE
> yieldSize: NA
>> open(X)
> Error in open.BamFile(X) : failed to load BAM index
>    file: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FOO/ex1.bam.bai
> In addition: Warning message:
> In open.BamFile(X) : [bam_index_load] fail to load BAM index.
>
> I suppose the "dereferencing" refers to the fact that FO... is not present in the path report on X
>
> I was surprised that the error was thrown.
>
>> sessionInfo()
> R Under development (unstable) (2012-10-07 r60889)
> Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     tools     methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.10.1     Biostrings_2.26.1    GenomicRanges_1.10.1
> [4] IRanges_1.16.2       BiocGenerics_0.4.0   BiocInstaller_1.8.2
> [7] weaver_1.24.0        codetools_0.2-8      digest_0.5.2
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-4.1  parallel_2.16.0 stats4_2.16.0   zlibbioc_1.4.0
>
>
> On Tue, Oct 16, 2012 at 11:00 AM, Cook, Malcolm <MEC at stowers.org> wrote:
> +1
>
> I too have noticed this.  Further, consistent with this, if you use the
> Rsamtools package to create indices for symlinks pointing to bam files,
> the indices are created in the target directory.
>
> I think if there is a code change to address this issue by allowing
> control over whether links are dereferenced, the DEFAULT should be NOT to
> dereference like this.
>
> --Malcolm Cook
>
>
> On 10/15/12 9:24 PM, "Lucas Swanson" <lswanson at bcgsc.ca> wrote:
>
>> Hello,
>>
>> I am attempting to use your Rsamtools Bioconductor package.
>> Unfortunately, I am having a bit of trouble. You see, my BAM files are in
>> a directory to which I do not have write access, and are too large for me
>> to copy to my own directory. So I created symbolic links in my own
>> directory, pointing to the BAM files, and then indexed them in my own
>> directory. However, when I try to use these symbolic links the Rsamtools
>> package dereferences the links, and looks for the indexes in the original
>> directory (to which I do not have write access), rather than in my own
>> directory.
>>
>> Is there any way to prevent Rsamtools from dereferencing symbolic links?
>> (That is, not replacing paths to symbolic links with paths to the target
>> of the link)
>>
>> ~Thank you,
>> Lucas Swanson
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


