[R-es] Problems with H2O

Carlos Ortega cof at qualityexcellence.es
Sun Aug 20 01:23:57 CEST 2017


Hi,

Have you tried forcing H2O, when you start it, to work with the maximum
number of cores?

h2o.init(nthreads = -1)

From the help for "h2o.init":

nthreads

(Optional) Number of threads in the thread pool. This relates very closely
to the number of CPUs used. -1 means use all CPUs on the host (Default). A
positive integer specifies the number of CPUs directly. This value is only
used when R starts H2O.
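
As a minimal sketch of the setting described above, assuming the h2o package and a Java runtime are installed: the helper name start_h2o_all_cores is hypothetical, and the call to parallel::detectCores() is only there to have a number to compare against the "allowed cores" line that H2O prints.

```r
# Cores reported by the operating system (can be NA on some platforms).
n_cores <- parallel::detectCores()
print(n_cores)

# Hypothetical helper: start (or connect to) a local H2O instance and ask
# for every CPU on the host, then print the cluster summary.
start_h2o_all_cores <- function() {
  # nthreads = -1 means "use all CPUs"; it only matters when R starts H2O.
  h2o::h2o.init(nthreads = -1)
  # The summary includes the "H2O cluster allowed cores" line.
  h2o::h2o.clusterInfo()
}

# Usage, once h2o is installed:
# start_h2o_all_cores()
```

If the allowed cores already match n_cores, the thread pool is probably not the limiting factor and the cause is more likely in how the work is submitted.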

Another thing I notice in your code, although you do not ask about it, is
that you write files with "fwrite" when h2o has its own function,
"h2o.exportFile()", which parallelizes the writing.

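A rough sketch of the two ways of writing a result to disk, assuming a running cluster and an existing H2OFrame. The helper name write_h2o_frame and its from_h2o argument are made up for illustration. The path given to h2o.exportFile() is resolved on the machine where H2O runs, which is the same machine only in a local setup.

```r
# Hypothetical helper: write an H2OFrame to CSV either from the H2O side
# or from the R side.
write_h2o_frame <- function(fr, path, from_h2o = TRUE) {
  if (from_h2o) {
    # Written by the H2O cluster itself; force = TRUE overwrites an
    # existing file instead of failing with "already exists".
    h2o::h2o.exportFile(fr, path, force = TRUE)
  } else {
    # fwrite() expects a list or data.frame, so the data has to be pulled
    # into R first; passing the H2OFrame directly fails.
    data.table::fwrite(as.data.frame(fr), path)
  }
  invisible(path)
}

# Usage, with a hypothetical frame and path:
# write_h2o_frame(h2o.distance(uno2, dos2[1, ]), "/tmp/k1.csv")
```

The R-side branch moves the whole frame from H2O into R before writing, so it is likely to be the slower of the two for large results.
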
Regards,
Carlos Ortega
www.qualityexcellence.es



2017-08-19 21:26 GMT+02:00 Jesús Para Fernández <j.para.fernandez at hotmail.com>:

> Hi,
>
> I am using H2O locally and also on an Amazon EC2 instance, but I must have
> something misconfigured.
>
> To start it, I do the following:
>
> conexion<-h2o.init()
>
> This starts the cluster with the maximum number of cores and memory allowed.
>
> Once that is done, I want to compute the distance between two data.frames:
>
> uno<-data.frame(matrix(rnorm(300000),ncol=10))
> dos<-data.frame(matrix(rnorm(500),ncol=10))
> uno<-as.h2o(uno)
> dos<-as.h2o(dos)
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
> # iterate over every row of dos, one distance column per row
> for(i in seq_len(nrow(dos))){
>   matriz[,i]<-h2o.distance(uno,dos[i,])
> }
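
A small base-R illustration, independent of H2O, of a pitfall that is easy to hit in a loop like this one: a header of the form for(i in nrow(x)) iterates over the single value nrow(x), not over 1..nrow(x). The variable names below are invented for the example.

```r
n <- 5

# for (i in n) visits only the single value n.
visited_single <- c()
for (i in n) visited_single <- c(visited_single, i)

# seq_len(n) yields 1, 2, ..., n (and nothing at all when n is 0).
visited_all <- c()
for (i in seq_len(n)) visited_all <- c(visited_all, i)

print(visited_single)  # [1] 5
print(visited_all)     # [1] 1 2 3 4 5
```

seq_len(n) is also safer than 1:n, since 1:0 counts down through 1 and 0 instead of producing an empty sequence.
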
>
>
>
> When I do this and watch htop, I see that of the 4 cores on my PC, or the
> 16 on the Amazon EC2 instance, only one is used; what is more, on EC2 it is
> taking longer to run than on the PC.
>
> So I think it is not parallelizing properly. Has this happened to anyone?
>
>
> If I run h2o.clusterStatus(), it reports that everything is OK.
>
> R version 3.4.1 (2017-06-30) -- "Single Candle"
> Copyright (C) 2017 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> [Workspace loaded from ~/.RData]
>
> > library(h2o)
>
> ----------------------------------------------------------------------
>
> Your next step is to start H2O:
>     > h2o.init()
>
> For H2O package documentation, ask for help:
>     > ??h2o
>
> After starting H2O, you can use the Web UI at http://localhost:54321
> For more information visit http://docs.h2o.ai
>
> ----------------------------------------------------------------------
>
>
> Attaching package: ‘h2o’
>
> The following objects are masked from ‘package:stats’:
>
>     cor, sd, var
>
> The following objects are masked from ‘package:base’:
>
>     ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, %in%,
>     is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc
>
> > h2o.init()
>
> H2O is not running yet, starting it now...
>
> Note:  In case of errors look at the following log files:
>     /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out
>     /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err
>
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>
> Starting H2O JVM and connecting: .. Connection successful!
>
> R is connected to the H2O cluster:
>     H2O cluster uptime:         1 seconds 905 milliseconds
>     H2O cluster version:        3.10.5.3
>     H2O cluster version age:    1 month and 20 days
>     H2O cluster name:           H2O_started_from_R_jesus_rqh095
>     H2O cluster total nodes:    1
>     H2O cluster total memory:   1.71 GB
>     H2O cluster total cores:    4
>     H2O cluster allowed cores:  4
>     H2O cluster healthy:        TRUE
>     H2O Connection ip:          localhost
>     H2O Connection port:        54321
>     H2O Connection proxy:       NA
>     H2O Internal Security:      FALSE
>     R Version:                  R version 3.4.1 (2017-06-30)
>
> > gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells  679385 36.3    1168576 62.5   940480 50.3
> Vcells 1138497  8.7    1920143 14.7  1532430 11.7
> > rm(list=ls())
> > datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
> > uno<-datos[datos$InspectionReport == "ACCEPTED",]
> > dos<-datos[datos$InspectionReport != "ACCEPTED",]
> > uno$InspectionReport<-NULL
> > dos$InspectionReport<-NULL
> > uno2<-as.h2o(uno)
>   |=======================================================================================| 100%
> > dos2<-as.h2o(dos)
>   |=======================================================================================| 100%
> > h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
> Error in !missing(row) && !(base::is.character(row)) :
>   object 'i' not found
> > i<-1
> > h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
>   |=======================================================================================| 100%
> > t=Sys.time()
> > for(i in 1:10){
> +  h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
> +
> + }
>
> ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Frames/RTMP_sid_8e47_4/export)
>
> water.exceptions.H2OIllegalArgumentException
>  [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
>  [2] "    water.fvec.Frame.export(Frame.java:1370)"
>  [3] "    water.api.FramesHandler.export(FramesHandler.java:258)"
>  [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
>  [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
>  [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
>  [7] "    java.lang.reflect.Method.invoke(Method.java:498)"
>  [8] "    water.api.Handler.handle(Handler.java:63)"
>  [9] "    water.api.RequestServer.serve(RequestServer.java:448)"
> [10] "    water.api.RequestServer.doGeneric(RequestServer.java:297)"
> [11] "    water.api.RequestServer.doPost(RequestServer.java:223)"
> [12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
> [13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
> [14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
> [15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
> [16] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
> [17] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
> [18] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
> [19] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
> [20] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
> [21] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
> [22] "    water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:183)"
> [23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
> [24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
> [25] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"
> [26] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
> [27] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
> [28] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
> [29] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
> [30] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
> [31] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
> [32] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
> [33] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
> [34] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
> [35] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
> [36] "    java.lang.Thread.run(Thread.java:748)"
>
> Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  :
>
>
> ERROR MESSAGE:
>
> Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!
>
> >
> > print(Sys.time()-t)
> Time difference of 0.218374 secs
> > ?h2o.exportFile
> > t=Sys.time()
> > for(i in 1:10){
> +  h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
> +
> + }
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
>   |=======================================================================================| 100%
> >
> > print(Sys.time()-t)
> Time difference of 11.31977 secs
> > matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
>   |=======================================================================================| 100%
> > t=Sys.time()
> > for(i in 1:10){
> +
> +  matriz[,j]<-h2o.distance(uno2,dos2[i,])
> + }
> Error in !allCol && is.na(col) : object 'j' not found
> >
> > print(Sys.time()-t)
> Time difference of 0.006168127 secs
> > t=Sys.time()
> > for(i in 1:10){
> +
> +  matriz[,i]<-h2o.distance(uno2,dos2[i,])
> + }
> >
> > print(Sys.time()-t)
> Time difference of 30.33803 secs
> > 10/30
> [1] 0.3333333
> > 30/10*nrow(dos)
> [1] 16068
> > 30/10*nrow(dos)
> [1] 16068
> > 30/10*nrow(dos)/3600
> [1] 4.463333
> > library(data.table)
> data.table 1.10.4
>   The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
>   Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
>   Release notes, videos and slides: http://r-datatable.com
>
> Attaching package: ‘data.table’
>
> The following objects are masked from ‘package:h2o’:
>
>     hour, month, week, year
>
> > t=Sys.time()
> > for(i in 1:10){
> + fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
> +
> + }
> Error: is.list(x) is not TRUE
> > print(Sys.time()-t)
> Time difference of 0.1015148 secs
> > ?fwrite
> > matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
>   |=======================================================================================| 100%
> > t=Sys.time()
> > for(i in 1:10){
> +
> +  matriz[,i]<-h2o.distance(uno2,dos2[i,])
> + }
> >
> > print(Sys.time()-t)
> Time difference of 28.89684 secs
> > 30/10
> [1] 3
> > 30/10*nrow(dos)
> [1] 16068
> > 30/10*nrow(dos)/3600
> [1] 4.463333
> > t=Sys.time()
> > for(i in 1:50){
> +
> +  matriz[,i]<-h2o.distance(uno2,dos2[i,])
> + }
> >
> > print(Sys.time()-t)
> Time difference of 2.506209 mins
> > 2*60+30
> [1] 150
> > 150/50
> [1] 3
> > h2o.cluster_sizes()
> Error in .model.parts(object) :
>   argument "object" is missing, with no default
> > h2o.clusterStatus()
> Version: 3.10.5.3
> Cluster name: H2O_started_from_R_jesus_rqh095
> Cluster size: 1
> Cluster is locked
>
>                         h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
> 1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
>   pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
> 1        0        0 6571425792 20121124864 8553    16417           0       38           0
> > ?h2o.init()
> > h2o.clusterIsUp()
> [1] TRUE
> > h2o.clusterInfo()
> R is connected to the H2O cluster:
>     H2O cluster uptime:         1 hours 31 minutes
>     H2O cluster version:        3.10.5.3
>     H2O cluster version age:    1 month and 20 days
>     H2O cluster name:           H2O_started_from_R_jesus_rqh095
>     H2O cluster total nodes:    1
>     H2O cluster total memory:   1.05 GB
>     H2O cluster total cores:    4
>     H2O cluster allowed cores:  4
>     H2O cluster healthy:        TRUE
>     H2O Connection ip:          localhost
>     H2O Connection port:        54321
>     H2O Connection proxy:       NA
>     H2O Internal Security:      FALSE
>     R Version:                  R version 3.4.1 (2017-06-30)
> > h2o.cluster_sizes(dos2)
> Error in .model.parts(object) :
>   trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object
> > h2o.clusterInfo()
> R is connected to the H2O cluster:
>     H2O cluster uptime:         1 hours 37 minutes
>     H2O cluster version:        3.10.5.3
>     H2O cluster version age:    1 month and 20 days
>     H2O cluster name:           H2O_started_from_R_jesus_rqh095
>     H2O cluster total nodes:    1
>     H2O cluster total memory:   1.05 GB
>     H2O cluster total cores:    4
>     H2O cluster allowed cores:  4
>     H2O cluster healthy:        TRUE
>     H2O Connection ip:          localhost
>     H2O Connection port:        54321
>     H2O Connection proxy:       NA
>     H2O Internal Security:      FALSE
>     R Version:                  R version 3.4.1 (2017-06-30)
>
>
>
> Thanks
>
> Jesus


