example1.Rmd
This site shows the basic approach for accessing handling and aggregating ARTMO Databases using ARTMO_R
This part of the code is necessary to set up the working environment. All necessary Functions have been wrapped in a package downloadable directly from the Github repository and included in the dev branch of the repository The package can be downloaded and installed as follows
library("devtools")
devtools::install_github("mattia6690/ARTMO_R",ref="dev")
library("ArtmoR")
In a first step the connection credentials are necessary. Below you can see the standard MySQL credentials as described in the ARTMO manual.
user <- "root"
host <- "localhost"
password <- "123456"
The connection itself can be done in two ways. First to the overall MySQL without specification of a database in order to explore the available databases with connect.raw
. Each Database is treated in its own and cannot be used together. If you wish to analyze all the databases at the same time you may need an additional for loop or some other iterative functionalities (e.g. purrr). This is discouraged due to the processing time necessary. To display the Databases available in MYSQL use the getDBs
parameter
con.raw <- connect.artmo(user, password, host,getDB=TRUE)
By selecting the Database the same function is able to connect solemly to the database specified.
database <- "artmo_1"
con <- connect.artmo(user, password, host, database)
summary(con)
## Length Class Mode
## 1 MariaDBConnection S4
Once connected db.tables
helps to explore the tables stored in the MySQL back-end for the specific database. All the available tables are displayed. The ones without entries information have a count of 0 and will be ignored in the next steps
## Name Rows Avg_row_length Data_length
## 1 aux_prospect4_t2 1000 81 81920
## 2 aux_sail4_t2 1000 180 180224
## 3 master 10 3296 32960
## 4 prospect4_t1 20 21748 434976
## 5 prospect4_t2 6090 33680 205111200
## 6 s_prospect4_t2 1000 65 65536
## 7 s_sail4_t2 1000 65 65536
## 8 sail4_t1 40 2745 109800
## 9 sail4_t2 81000 240 19480000
## 10 sensor 16 36784 588660
## 11 test_cf_alfa 33 15 495
## 12 test_cf_algoritmo 33 42 1400
## 13 test_cf_beta 33 15 495
## 14 test_cf_class 7 847740 5934184
## 15 test_cf_noise 126 9048056 1140055080
## 16 test_cf_parameter 7 1296 9076
## 17 test_cf_train 125 1116 139500
## 18 test_inv_cf_ppal 8 498 3988
## 19 test_mla_algoritmo 7 284 1988
## 20 test_mla_class 3 280 840
## 21 test_mla_noise1 7 13 91
## 22 test_mla_noise2 31 13 403
## 23 test_mla_parameter 3 268 804
## 24 test_mla_ppal 3 2312 6936
## 25 test_mla_result 31 873528 27079392
## 26 test_mla_train1 31 13 403
## 27 test_mla_train2 31 13 403
## 28 vis_eq_t1 1 260 260
## 29 vis_eq_t2 6 42 252
## 30 vis_eq_t3 13 285 3705
## 31 vis_t1 4 260 1040
## 32 vis_t2 11 85 936
## 33 vis_t3 24 534 12816
Furthermore the most important part when dealing with MySQL - or basically with any SQL - code is to link the various tables one to the other. Due to the complexity of ARTMO each approach generates numerous tables, comprise several different data types and are linked in a sometimes inconsistent way to each other. Depending on the workflows more than 10 tables have to be recursively liked one to the other. The getLinks
function resolves tackles this issue and finds the connection between tables
## # A tibble: 17 x 8
## Database Table IDs Count Table_Type Table_Name Rank result
## <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
## 1 artmo_1 test_inv_cf_ppal ID_T1 8 test_cf ppal 1 0
## 2 artmo_1 test_cf_class ID_T2 7 test_cf class 2 0
## 3 artmo_1 test_cf_paramet~ ID_T3 7 test_cf parameter 3 0
## 4 artmo_1 test_cf_algorit~ ID_T4 33 test_cf algoritmo 4 0
## 5 artmo_1 test_cf_alfa ID_T5 33 test_cf alfa 5 0
## 6 artmo_1 test_cf_beta ID_T6 33 test_cf beta 6 0
## 7 artmo_1 test_cf_noise ID_T7 126 test_cf noise 7 0
## 8 artmo_1 test_cf_train ID_T8 125 test_cf train 8 1
## 9 artmo_1 test_mla_ppal ID_T1 3 test_mla ppal 1 0
## 10 artmo_1 test_mla_class ID_T2 3 test_mla class 2 0
## 11 artmo_1 test_mla_parame~ ID_T3 3 test_mla parameter 3 0
## 12 artmo_1 test_mla_algori~ ID_T4 7 test_mla algoritmo 4 0
## 13 artmo_1 test_mla_noise1 ID_T5 7 test_mla noise1 5 0
## 14 artmo_1 test_mla_noise2 ID_T6 31 test_mla noise2 6 0
## 15 artmo_1 test_mla_train1 ID_T7 31 test_mla train1 7 0
## 16 artmo_1 test_mla_train2 ID_T8 31 test_mla train2 8 0
## 17 artmo_1 test_mla_result ID_T9 31 test_mla result 9 1
As you can see above each table contains one master ID that allows the link between the databases. Now all the data from each of the table has to be retrieved from the MySQL database. Hereby no information is written to the HD but everything remains in the RAM, so be aware that huge databases may cause Memory Issues.
dbtabs <- getTabs(con,dblinks)
## Warning in rep(., each = vsalidas): first element used of 'each' argument
## Warning in cbind(Spec_id, vsalidas.rep, spectral): number of rows of result
## is not a multiple of vector length (arg 1)
print(dbtabs)
## # A tibble: 17 x 8
## Database Table_Type Table_Name Table Rank Count result MyTables
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <list>
## 1 artmo_1 test_cf ppal test_inv_~ 1 8 0 <tibble [8~
## 2 artmo_1 test_cf class test_cf_c~ 2 7 0 <tibble [7~
## 3 artmo_1 test_cf parameter test_cf_p~ 3 7 0 <tibble [7~
## 4 artmo_1 test_cf algoritmo test_cf_a~ 4 33 0 <tibble [3~
## 5 artmo_1 test_cf alfa test_cf_a~ 5 33 0 <tibble [3~
## 6 artmo_1 test_cf beta test_cf_b~ 6 33 0 <tibble [3~
## 7 artmo_1 test_cf noise test_cf_n~ 7 126 0 <tibble [1~
## 8 artmo_1 test_cf train test_cf_t~ 8 125 1 <tibble [1~
## 9 artmo_1 test_mla ppal test_mla_~ 1 3 0 <tibble [3~
## 10 artmo_1 test_mla class test_mla_~ 2 3 0 <tibble [3~
## 11 artmo_1 test_mla parameter test_mla_~ 3 3 0 <tibble [3~
## 12 artmo_1 test_mla algoritmo test_mla_~ 4 7 0 <tibble [7~
## 13 artmo_1 test_mla noise1 test_mla_~ 5 7 0 <tibble [7~
## 14 artmo_1 test_mla noise2 test_mla_~ 6 31 0 <tibble [3~
## 15 artmo_1 test_mla train1 test_mla_~ 7 31 0 <tibble [3~
## 16 artmo_1 test_mla train2 test_mla_~ 8 31 0 <tibble [3~
## 17 artmo_1 test_mla result test_mla_~ 9 31 1 <tibble [3~
The data now is nested depending on the Table Type (which approach has been used). The MyTable columns contain all data retrieved in the MySQL database stored in lists of tibbles. The function dplyr::unnest()
allows to reduce their dimensionality and explore in detail the single tables for each of the approaches. If we want to unravel the Table concerning the results of the Database we can use
library("magrittr")
library("dplyr")
library("tidyr")
table<-dbtabs %>% filter(Table_Name=="train") %>% select(MyTables) %>% unnest
print(table)
## # A tibble: 125 x 16
## ID_T8 ID_T7 ID_T3 ID_TRAIN train ME RMSE RELRMSE MAE R R2
## <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10 10 1 1 1 0.101 1.77 70.6 1.22 0.423 0.179
## 2 20 20 1 1 1 -0.434 2.20 87.6 1.61 0.439 0.193
## 3 30 30 1 1 1 1.27 1.98 78.9 1.48 0.561 0.315
## 4 40 40 1 1 1 0.0467 1.86 74.3 1.30 0.425 0.180
## 5 50 50 2 1 1 0.222 1.66 66.2 1.19 0.562 0.316
## 6 64 64 2 1 1 -0.0518 2.21 88.0 1.63 0.321 0.103
## 7 84 84 2 1 1 0.349 1.63 65.0 1.17 0.539 0.291
## 8 115 116 7 1 1 1.1 3.19 127. 2.54 0.158 0.025
## 9 9 9 1 1 1 0.121 1.79 71.3 1.23 0.418 0.174
## 10 19 19 1 1 1 -0.369 2.13 84.9 1.54 0.429 0.184
## # ... with 115 more rows, and 5 more variables: NRMSE <dbl>, TS <dbl>,
## # NSE <dbl>, tictoc <dbl>, Results <list>
The nesting may has happened on multiple levels in the tibbles. This means that some of the lists contain lists. This level of complexity s necessary to reduce the amount of processing time as well as the RAM resources. The dbJoin
function helps to aggregate all the information based on the Table_Type column. The MyTables column is therefore recursively joined based on the unique identifiers
## # A tibble: 2 x 3
## Database Table_Type Metrics
## <chr> <chr> <list>
## 1 artmo_1 test_cf <tibble [129 x 35]>
## 2 artmo_1 test_mla <tibble [31 x 35]>
Actually, this section is up to the next Part of this Website. Nonetheless here is the most common way to analyze one of the Table_Type aggregation. If the dplyr::unnest
is used on single columns the MySQL result can be analyzed.
NOTE: This won’t work for the complete tibble since they have different column names. It is recommended to either standardize or process them one by one.
statistics<- dbjoin[1,] %>% unnest
print(statistics)
## # A tibble: 129 x 37
## Database Table_Type Model Database1 Project PY_ID Date name_class
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 2 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 3 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 4 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 5 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 6 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 7 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 8 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 9 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## 10 artmo_1 test_cf Mode~ artmo_1 Mattia1 1 2018~ Full_image
## # ... with 119 more rows, and 29 more variables: derivar <int>,
## # size_data <S3: integer64>, estadistico <int>, Spectra <list>,
## # LUT <list>, Name1 <chr>, Name2 <chr>, clave <chr>, param_user <list>,
## # algorithm <chr>, normalizar <int>, sim <S3: integer64>, alfa <dbl>,
## # beta <dbl>, noise <dbl>, train <dbl>, ME <dbl>, RMSE <dbl>,
## # RELRMSE <dbl>, MAE <dbl>, R <dbl>, R2 <dbl>, NRMSE <dbl>, TS <dbl>,
## # NSE <dbl>, tictoc <dbl>, Results <list>, Store.ID <int>,
## # Table_Type1 <chr>
Even though I would personally not recommend to save the resulting nested tibble the best possibility is by creating an RDS file. This is simply done with the saveRDS
function. The big advantage of RDS is that the format is capable to handle the nesting as used in the presented tibbles. However, the dbjoin Element could become an extremely large object when saved. Consider saving parts of the dbJoin object such as by model.
In general, it is recommended to build a folder structure before going in the deep analysis. This is very useful when the Output of the Model (such as Spatial Rasters) have to be connected to the Model output later.It is mandatory to manually connect the model output to the table since ARTMO provides no logical link in the MySQL tables within its table structure.
dir<-"C:/ARTMO"
models<-buildpath(dbjoin,dir,foldersetup = T)
models
## # A tibble: 11 x 3
## Database Table_Type Dir
## <chr> <chr> <chr>
## 1 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Model1/
## 2 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Model2/
## 3 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/ModelTia/
## 4 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Mattab/
## 5 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Table_1107/
## 6 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Tabnew/
## 7 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/TABLEAGAIN/
## 8 artmo_1 test_cf C:/ARTMO/artmo_1/test_cf/Table_1210/
## 9 artmo_1 test_mla C:/ARTMO/artmo_1/test_mla/MLRA1/
## 10 artmo_1 test_mla C:/ARTMO/artmo_1/test_mla/MLRA2/
## 11 artmo_1 test_mla C:/ARTMO/artmo_1/test_mla/MLRA3/