[R] need help in trying out sparklyr - spark_connect will not work on local copy of Spark
Taylor, Ronald C
Ronald.Taylor at pnnl.gov
Thu Feb 2 22:26:19 CET 2017
> So this makes me wonder if you do not have a proper installation of java for one of those other packages.
David,
You were right. I was using Java 1.6 instead of Java 1.7 or later. Mea culpa. I am now up and running, and looking to do many things with R and Spark. Thank you.
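[Editor's note: for anyone hitting the same symptom, a minimal sketch of checking that the JVM is new enough for Spark 1.6, which requires Java 7+. The version-string parsing assumes the classic "1.x.0_nn" format that `java -version` printed in that era; the function name is illustrative, not part of sparklyr.]

```shell
# Hedged sketch: decide whether a Java version string is at least 1.7.
# Version strings of that era look like "1.6.0_45" or "1.8.0_121".
java_at_least_7() {
  ver="$1"
  major="${ver%%.*}"      # "1" for classic 1.x version strings
  rest="${ver#*.}"
  minor="${rest%%.*}"     # "6", "7", "8", ...
  [ "$major" -gt 1 ] || [ "$minor" -ge 7 ]
}

java_at_least_7 "1.6.0_45"  && echo "ok" || echo "too old"   # prints "too old"
java_at_least_7 "1.8.0_121" && echo "ok" || echo "too old"   # prints "ok"
```

In practice you would feed it the real version, e.g. something like `java -version 2>&1 | head -n1`, and upgrade (or repoint JAVA_HOME) if the check fails.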
Ron
Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568, email: ronald.taylor at pnnl.gov
web page: http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Wednesday, February 01, 2017 4:40 PM
To: Taylor, Ronald C
Cc: r-help at r-project.org; ronald.taylor24 (ronald.taylor24 at gmail.com)
Subject: Re: [R] need help in trying out sparklyr - spark_connect will not work on local copy of Spark
> On Feb 1, 2017, at 3:23 PM, Taylor, Ronald C <Ronald.Taylor at pnnl.gov> wrote:
>
> Hello R-help list,
>
> I am a new list member. My first question: I was trying out sparklyr (in R ver 3.3.2) on my Red Hat Linux workstation, following the instructions at spark.rstudio.com for downloading and using a local copy of Spark. The Spark download appears to work. However, when I issue spark_connect() to get started, I get the error messages that you see below.
>
> I cannot find any guidance on how to fix this. Quite frustrating. Can somebody give me a bit of help? Does something need to be added to the PATH env var in my .mycshrc file, for example? Is there a closed-port problem? Has anybody run into this type of error message? Do I need to do something additional to start up the local copy of Spark that is not mentioned in the RStudio online documentation?
>
> - Ron
>
> %%%%%%%%%%%%%%%%%%%%
>
> Here is the spark_install() output (apparently successful) and then the error message from spark_connect():
>
>> spark_install(version = "1.6.2")
> Installing Spark 1.6.2 for Hadoop 2.6 or later.
> Downloading from:
> - 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
> Installing to:
> - '~/.cache/spark/spark-1.6.2-bin-hadoop2.6'
> trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
> Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
> ==================================================
> downloaded 265.2 MB
> Installation complete.
>
>> sc <- spark_connect(master = "local")
> Error in force(code) :
>   Failed while connecting to sparklyr to port (8880) for sessionid (3689): Gateway in port (8880) did not respond.
> Path: /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
> Parameters: --class, sparklyr.Backend, --jars, '/usr/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/usr/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/usr/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/usr/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 3689
>
> ---- Output Log ----
> /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-class: line 86: /usr/local/bin/bin/java: No such file or directory
So this makes me wonder if you do not have a proper installation of java for one of those other packages. This seems off-topic for r-help, although possibly on-topic for the R-SIG-DB list:
https://stat.ethz.ch/mailman/listinfo/r-sig-db
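[Editor's note: the doubled "bin/bin" in the quoted log line is the telltale. A minimal sketch of why it appears, assuming the usual spark-class behavior of appending /bin/java to JAVA_HOME; the paths below are illustrative, not taken from the original script.]

```shell
# Spark's bin/spark-class builds its java command roughly as
# "${JAVA_HOME}/bin/java" (paraphrased). If JAVA_HOME points at a bin
# directory instead of the JDK root, the path doubles up.
JAVA_HOME="/usr/local/bin"              # misconfigured value implied by the error
RUNNER="${JAVA_HOME}/bin/java"
echo "$RUNNER"                          # prints /usr/local/bin/bin/java

JAVA_HOME="/usr/lib/jvm/java-1.7.0"     # hypothetical JDK root; adjust to your system
echo "${JAVA_HOME}/bin/java"            # prints /usr/lib/jvm/java-1.7.0/bin/java
```

So the fix is twofold: install a Java 7+ JDK, and make sure JAVA_HOME names its root directory, not a bin subdirectory.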
There's also a "Report a bug" link:
https://github.com/rstudio/sparklyr/issues
--
David.
>
> ---- Error Log ----
>
>>
>
> %%%%%%%%%%%%%%%%%%
>
> And here is the entire screen output of my R session, from the R invocation on:
>
> sidney115% R
> R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
> Copyright (C) 2016 The R Foundation for Statistical Computing
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
>> library(sparklyr)
>> ls(pos = "package:sparklyr")
> [1] "%>%"
> [2] "compile_package_jars"
> [3] "connection_config"
> [4] "connection_is_open"
> [5] "copy_to"
> [6] "ensure_scalar_boolean"
> [7] "ensure_scalar_character"
> [8] "ensure_scalar_double"
> [9] "ensure_scalar_integer"
> [10] "find_scalac"
> [11] "ft_binarizer"
> [12] "ft_bucketizer"
> [13] "ft_discrete_cosine_transform"
> [14] "ft_elementwise_product"
> [15] "ft_index_to_string"
> [16] "ft_one_hot_encoder"
> [17] "ft_quantile_discretizer"
> [18] "ft_regex_tokenizer"
> [19] "ft_sql_transformer"
> [20] "ft_string_indexer"
> [21] "ft_tokenizer"
> [22] "ft_vector_assembler"
> [23] "hive_context"
> [24] "invoke"
> [25] "invoke_method"
> [26] "invoke_new"
> [27] "invoke_static"
> [28] "java_context"
> [29] "livy_available_versions"
> [30] "livy_config"
> [31] "livy_home_dir"
> [32] "livy_install"
> [33] "livy_install_dir"
> [34] "livy_installed_versions"
> [35] "livy_service_start"
> [36] "livy_service_stop"
> [37] "ml_als_factorization"
> [38] "ml_binary_classification_eval"
> [39] "ml_classification_eval"
> [40] "ml_create_dummy_variables"
> [41] "ml_decision_tree"
> [42] "ml_generalized_linear_regression"
> [43] "ml_gradient_boosted_trees"
> [44] "ml_kmeans"
> [45] "ml_lda"
> [46] "ml_linear_regression"
> [47] "ml_load"
> [48] "ml_logistic_regression"
> [49] "ml_model"
> [50] "ml_multilayer_perceptron"
> [51] "ml_naive_bayes"
> [52] "ml_one_vs_rest"
> [53] "ml_options"
> [54] "ml_pca"
> [55] "ml_prepare_dataframe"
> [56] "ml_prepare_features"
> [57] "ml_prepare_response_features_intercept"
> [58] "ml_random_forest"
> [59] "ml_save"
> [60] "ml_survival_regression"
> [61] "ml_tree_feature_importance"
> [62] "na.replace"
> [63] "print_jobj"
> [64] "register_extension"
> [65] "registered_extensions"
> [66] "sdf_copy_to"
> [67] "sdf_import"
> [68] "sdf_load_parquet"
> [69] "sdf_load_table"
> [70] "sdf_mutate"
> [71] "sdf_mutate_"
> [72] "sdf_partition"
> [73] "sdf_persist"
> [74] "sdf_predict"
> [75] "sdf_quantile"
> [76] "sdf_read_column"
> [77] "sdf_register"
> [78] "sdf_sample"
> [79] "sdf_save_parquet"
> [80] "sdf_save_table"
> [81] "sdf_schema"
> [82] "sdf_sort"
> [83] "sdf_with_unique_id"
> [84] "spark_available_versions"
> [85] "spark_compilation_spec"
> [86] "spark_compile"
> [87] "spark_config"
> [88] "spark_connect"
> [89] "spark_connection"
> [90] "spark_connection_is_open"
> [91] "spark_context"
> [92] "spark_dataframe"
> [93] "spark_default_compilation_spec"
> [94] "spark_dependency"
> [95] "spark_disconnect"
> [96] "spark_disconnect_all"
> [97] "spark_home_dir"
> [98] "spark_install"
> [99] "spark_install_dir"
> [100] "spark_install_tar"
> [101] "spark_installed_versions"
> [102] "spark_jobj"
> [103] "spark_load_table"
> [104] "spark_log"
> [105] "spark_read_csv"
> [106] "spark_read_json"
> [107] "spark_read_parquet"
> [108] "spark_save_table"
> [109] "spark_session"
> [110] "spark_uninstall"
> [111] "spark_version"
> [112] "spark_version_from_home"
> [113] "spark_web"
> [114] "spark_write_csv"
> [115] "spark_write_json"
> [116] "spark_write_parquet"
> [117] "tbl_cache"
> [118] "tbl_uncache"
>
>>
>> spark_install(version = "1.6.2")
> Installing Spark 1.6.2 for Hadoop 2.6 or later.
> Downloading from:
> - 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
> Installing to:
> - '~/.cache/spark/spark-1.6.2-bin-hadoop2.6'
> trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
> Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
> ==================================================
> downloaded 265.2 MB
> Installation complete.
>
>> sc <- spark_connect(master = "local")
> Error in force(code) :
>   Failed while connecting to sparklyr to port (8880) for sessionid (3689): Gateway in port (8880) did not respond.
> Path: /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
> Parameters: --class, sparklyr.Backend, --jars, '/usr/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/usr/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/usr/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/usr/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 3689
>
> ---- Output Log ----
> /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-class: line 86: /usr/local/bin/bin/java: No such file or directory
> ---- Error Log ----
>
>>
>
> %%%%%%%%%%%%%%%%%%
>
David Winsemius
Alameda, CA, USA