This site shows the basic approach for accessing, handling and aggregating ARTMO databases using ARTMO_R.

Essentials

This part of the code sets up the working environment. All necessary functions have been wrapped in a package that can be downloaded directly from the dev branch of the GitHub repository. The package is downloaded and installed as follows:

library("devtools")
devtools::install_github("mattia6690/ARTMO_R",ref="dev")
library("ArtmoR")

Connect

As a first step, the connection credentials are required. Below you can see the standard MySQL credentials as described in the ARTMO manual.

user      <- "root"
host      <- "localhost"
password  <- "123456"

The connection itself can be established in two ways. The first connects to the MySQL server without specifying a database, in order to explore the databases that are available. Each database is treated on its own and several databases cannot be used together; if you wish to analyze all databases at the same time you need an additional for loop or some other iterative functionality (e.g. purrr), which is discouraged due to the processing time required (a sketch of such an iteration follows the next code chunk). To display the databases available in MySQL, use the getDB argument:

con.raw <- connect.artmo(user, password, host, getDB = TRUE)
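
If you really need to loop over all available databases, one possibility is purrr. The sketch below is only illustrative: it assumes that con.raw holds the database names in a column called Database (adapt this to whatever connect.artmo() actually returns) and uses the db.tables() listing introduced further down.

library("purrr")
# Assumption: con.raw$Database contains the names of the available databases.
dbnames <- con.raw$Database
db.list <- purrr::map(dbnames, function(db) {
  con.db <- connect.artmo(user, password, host, db)  # connect to one database at a time
  db.tables(con.db)                                  # any per-database processing
})
names(db.list) <- dbnames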

Select

By specifying a database, the same function connects solely to the database indicated.

database  <- "artmo_1"
con       <- connect.artmo(user, password, host, database)
summary(con)
##            Length             Class              Mode 
##                 1 MariaDBConnection                S4
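
This is not needed for the remainder of this walkthrough, but since the returned object is a regular DBI connection (class MariaDBConnection), it can be closed with the generic DBI call once the analysis is finished.

DBI::dbDisconnect(con)  # close the connection when you are done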

Once connected, db.tables helps to explore the tables stored in the MySQL back-end for the specific database. All available tables are displayed; those without entries have a row count of 0 and will be ignored in the next steps.

db.tabs <- db.tables(con)
print(db.tabs)
##                  Name  Rows Avg_row_length Data_length
## 1    aux_prospect4_t2  1000             81       81920
## 2        aux_sail4_t2  1000            180      180224
## 3              master    10           3296       32960
## 4        prospect4_t1    20          21748      434976
## 5        prospect4_t2  6090          33680   205111200
## 6      s_prospect4_t2  1000             65       65536
## 7          s_sail4_t2  1000             65       65536
## 8            sail4_t1    40           2745      109800
## 9            sail4_t2 81000            240    19480000
## 10             sensor    16          36784      588660
## 11       test_cf_alfa    33             15         495
## 12  test_cf_algoritmo    33             42        1400
## 13       test_cf_beta    33             15         495
## 14      test_cf_class     7         847740     5934184
## 15      test_cf_noise   126        9048056  1140055080
## 16  test_cf_parameter     7           1296        9076
## 17      test_cf_train   125           1116      139500
## 18   test_inv_cf_ppal     8            498        3988
## 19 test_mla_algoritmo     7            284        1988
## 20     test_mla_class     3            280         840
## 21    test_mla_noise1     7             13          91
## 22    test_mla_noise2    31             13         403
## 23 test_mla_parameter     3            268         804
## 24      test_mla_ppal     3           2312        6936
## 25    test_mla_result    31         873528    27079392
## 26    test_mla_train1    31             13         403
## 27    test_mla_train2    31             13         403
## 28          vis_eq_t1     1            260         260
## 29          vis_eq_t2     6             42         252
## 30          vis_eq_t3    13            285        3705
## 31             vis_t1     4            260        1040
## 32             vis_t2    11             85         936
## 33             vis_t3    24            534       12816
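
If your database does contain empty tables (a row count of 0), you can drop them yourself before continuing. A minimal sketch, assuming db.tables() returns a data frame with the Rows column shown above:

db.tabs <- subset(db.tabs, Rows > 0)  # keep only tables that actually contain data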

Search and Store

Furthermore, the most important part when dealing with MySQL - or basically with any SQL back-end - is to link the various tables to one another. Due to the complexity of ARTMO, each approach generates numerous tables, which comprise several different data types and are linked to each other in a sometimes inconsistent way. Depending on the workflow, more than 10 tables have to be recursively linked to one another. The getLinks function tackles this issue and finds the connections between the tables.

dblinks  <- getLinks(con)
print(dblinks)
## # A tibble: 17 x 8
##    Database Table            IDs   Count Table_Type Table_Name  Rank result
##    <chr>    <chr>            <chr> <dbl> <chr>      <chr>      <dbl>  <dbl>
##  1 artmo_1  test_inv_cf_ppal ID_T1     8 test_cf    ppal           1      0
##  2 artmo_1  test_cf_class    ID_T2     7 test_cf    class          2      0
##  3 artmo_1  test_cf_paramet~ ID_T3     7 test_cf    parameter      3      0
##  4 artmo_1  test_cf_algorit~ ID_T4    33 test_cf    algoritmo      4      0
##  5 artmo_1  test_cf_alfa     ID_T5    33 test_cf    alfa           5      0
##  6 artmo_1  test_cf_beta     ID_T6    33 test_cf    beta           6      0
##  7 artmo_1  test_cf_noise    ID_T7   126 test_cf    noise          7      0
##  8 artmo_1  test_cf_train    ID_T8   125 test_cf    train          8      1
##  9 artmo_1  test_mla_ppal    ID_T1     3 test_mla   ppal           1      0
## 10 artmo_1  test_mla_class   ID_T2     3 test_mla   class          2      0
## 11 artmo_1  test_mla_parame~ ID_T3     3 test_mla   parameter      3      0
## 12 artmo_1  test_mla_algori~ ID_T4     7 test_mla   algoritmo      4      0
## 13 artmo_1  test_mla_noise1  ID_T5     7 test_mla   noise1         5      0
## 14 artmo_1  test_mla_noise2  ID_T6    31 test_mla   noise2         6      0
## 15 artmo_1  test_mla_train1  ID_T7    31 test_mla   train1         7      0
## 16 artmo_1  test_mla_train2  ID_T8    31 test_mla   train2         8      0
## 17 artmo_1  test_mla_result  ID_T9    31 test_mla   result         9      1

As you can see above, each table contains one master ID that allows linking between the tables. Now all the data from each of the tables has to be retrieved from the MySQL database. No information is written to disk; everything remains in RAM, so be aware that huge databases may cause memory issues.

dbtabs   <- getTabs(con, dblinks)
## Warning in rep(., each = vsalidas): first element used of 'each' argument
## Warning in cbind(Spec_id, vsalidas.rep, spectral): number of rows of result
## is not a multiple of vector length (arg 1)
print(dbtabs)
## # A tibble: 17 x 8
##    Database Table_Type Table_Name Table       Rank Count result MyTables   
##    <chr>    <chr>      <chr>      <chr>      <dbl> <dbl>  <dbl> <list>     
##  1 artmo_1  test_cf    ppal       test_inv_~     1     8      0 <tibble [8~
##  2 artmo_1  test_cf    class      test_cf_c~     2     7      0 <tibble [7~
##  3 artmo_1  test_cf    parameter  test_cf_p~     3     7      0 <tibble [7~
##  4 artmo_1  test_cf    algoritmo  test_cf_a~     4    33      0 <tibble [3~
##  5 artmo_1  test_cf    alfa       test_cf_a~     5    33      0 <tibble [3~
##  6 artmo_1  test_cf    beta       test_cf_b~     6    33      0 <tibble [3~
##  7 artmo_1  test_cf    noise      test_cf_n~     7   126      0 <tibble [1~
##  8 artmo_1  test_cf    train      test_cf_t~     8   125      1 <tibble [1~
##  9 artmo_1  test_mla   ppal       test_mla_~     1     3      0 <tibble [3~
## 10 artmo_1  test_mla   class      test_mla_~     2     3      0 <tibble [3~
## 11 artmo_1  test_mla   parameter  test_mla_~     3     3      0 <tibble [3~
## 12 artmo_1  test_mla   algoritmo  test_mla_~     4     7      0 <tibble [7~
## 13 artmo_1  test_mla   noise1     test_mla_~     5     7      0 <tibble [7~
## 14 artmo_1  test_mla   noise2     test_mla_~     6    31      0 <tibble [3~
## 15 artmo_1  test_mla   train1     test_mla_~     7    31      0 <tibble [3~
## 16 artmo_1  test_mla   train2     test_mla_~     8    31      0 <tibble [3~
## 17 artmo_1  test_mla   result     test_mla_~     9    31      1 <tibble [3~
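
Since everything is kept in RAM, it can be worth checking the size of the retrieved object before continuing. A quick check with base R:

format(object.size(dbtabs), units = "MB")  # approximate in-memory size of the nested tibble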

The data is now nested depending on the Table_Type (i.e. which approach has been used). The MyTables column contains all data retrieved from the MySQL database, stored as lists of tibbles. The function tidyr::unnest() reduces their dimensionality and lets us explore the single tables of each approach in detail. If we want to unravel, for example, the table containing the training results, we can use:

library("magrittr")
library("dplyr")
library("tidyr")
table <- dbtabs %>% filter(Table_Name == "train") %>% select(MyTables) %>% unnest()
print(table)
## # A tibble: 125 x 16
##    ID_T8 ID_T7 ID_T3 ID_TRAIN train      ME  RMSE RELRMSE   MAE     R    R2
##    <int> <int> <int>    <int> <dbl>   <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
##  1    10    10     1        1     1  0.101   1.77    70.6  1.22 0.423 0.179
##  2    20    20     1        1     1 -0.434   2.20    87.6  1.61 0.439 0.193
##  3    30    30     1        1     1  1.27    1.98    78.9  1.48 0.561 0.315
##  4    40    40     1        1     1  0.0467  1.86    74.3  1.30 0.425 0.180
##  5    50    50     2        1     1  0.222   1.66    66.2  1.19 0.562 0.316
##  6    64    64     2        1     1 -0.0518  2.21    88.0  1.63 0.321 0.103
##  7    84    84     2        1     1  0.349   1.63    65.0  1.17 0.539 0.291
##  8   115   116     7        1     1  1.1     3.19   127.   2.54 0.158 0.025
##  9     9     9     1        1     1  0.121   1.79    71.3  1.23 0.418 0.174
## 10    19    19     1        1     1 -0.369   2.13    84.9  1.54 0.429 0.184
## # ... with 115 more rows, and 5 more variables: NRMSE <dbl>, TS <dbl>,
## #   NSE <dbl>, tictoc <dbl>, Results <list>
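
The same pattern works for any other table listed in dbtabs, for example the result table of the machine-learning approach (test_mla):

mla.result <- dbtabs %>%
  filter(Table_Type == "test_mla", Table_Name == "result") %>%
  select(MyTables) %>%
  unnest()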

Aggregate

The nesting may have happened on multiple levels, meaning that some of the lists contain lists themselves. This level of complexity is necessary to reduce both the processing time and the RAM resources needed. The doJoin function helps to aggregate all the information based on the Table_Type column. The MyTables column is therefore recursively joined based on the unique identifiers.

dbjoin   <- doJoin(dbtabs, removeid = TRUE)
print(dbjoin)
## # A tibble: 2 x 3
##   Database Table_Type Metrics            
##   <chr>    <chr>      <list>             
## 1 artmo_1  test_cf    <tibble [129 x 35]>
## 2 artmo_1  test_mla   <tibble [31 x 35]>

Exploration

Strictly speaking, this section belongs to the next part of this website. Nonetheless, here is the most common way to analyze one of the Table_Type aggregations: if tidyr::unnest() is applied to a single row of the joined table, the MySQL result can be analyzed.

NOTE: This won’t work for the complete tibble at once since the nested tables have different column names. It is recommended to either standardize them or process them one by one (see the sketch after the output below).

statistics <- dbjoin[1, ] %>% unnest()
print(statistics)
## # A tibble: 129 x 37
##    Database Table_Type Model Database1 Project PY_ID Date  name_class
##    <chr>    <chr>      <chr> <chr>     <chr>   <chr> <chr> <chr>     
##  1 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  2 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  3 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  4 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  5 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  6 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  7 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  8 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
##  9 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
## 10 artmo_1  test_cf    Mode~ artmo_1   Mattia1 1     2018~ Full_image
## # ... with 119 more rows, and 29 more variables: derivar <int>,
## #   size_data <S3: integer64>, estadistico <int>, Spectra <list>,
## #   LUT <list>, Name1 <chr>, Name2 <chr>, clave <chr>, param_user <list>,
## #   algorithm <chr>, normalizar <int>, sim <S3: integer64>, alfa <dbl>,
## #   beta <dbl>, noise <dbl>, train <dbl>, ME <dbl>, RMSE <dbl>,
## #   RELRMSE <dbl>, MAE <dbl>, R <dbl>, R2 <dbl>, NRMSE <dbl>, TS <dbl>,
## #   NSE <dbl>, tictoc <dbl>, Results <list>, Store.ID <int>,
## #   Table_Type1 <chr>
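
To process all rows one by one, as suggested in the note above, a simple loop over the rows of dbjoin is enough. A minimal sketch:

# Unnest each Table_Type separately and keep the results in a named list.
statlist <- lapply(seq_len(nrow(dbjoin)), function(i) dbjoin[i, ] %>% unnest())
names(statlist) <- dbjoin$Table_Type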

Working Environment

Even though I would personally not recommend saving the resulting nested tibble, the best option is to create an RDS file. This is simply done with the saveRDS function. The big advantage of RDS is that the format can handle the nesting used in the presented tibbles. However, the dbjoin element can become an extremely large object when saved; consider saving only parts of it, for example one model at a time.
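
A minimal sketch of writing and re-reading such an object (file names are only examples):

saveRDS(dbjoin, file = "dbjoin.rds")                        # full nested tibble (can get very large)
saveRDS(dbjoin$Metrics[[1]], file = "test_cf_metrics.rds")  # or only one part, e.g. the test_cf metrics
dbjoin2 <- readRDS("dbjoin.rds")                            # restore the object in a later session
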
In general, it is recommended to build a folder structure before going into the in-depth analysis. This is very useful when the outputs of the models (such as spatial rasters) have to be connected to the model results later. It is mandatory to connect the model output to the tables manually, since ARTMO provides no logical link to it within its MySQL table structure.

dir<-"C:/ARTMO"
models<-buildpath(dbjoin,dir,foldersetup = T)
models
## # A tibble: 11 x 3
##    Database Table_Type Dir                                 
##    <chr>    <chr>      <chr>                               
##  1 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Model1/    
##  2 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Model2/    
##  3 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/ModelTia/  
##  4 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Mattab/    
##  5 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Table_1107/
##  6 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Tabnew/    
##  7 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/TABLEAGAIN/
##  8 artmo_1  test_cf    C:/ARTMO/artmo_1/test_cf/Table_1210/
##  9 artmo_1  test_mla   C:/ARTMO/artmo_1/test_mla/MLRA1/    
## 10 artmo_1  test_mla   C:/ARTMO/artmo_1/test_mla/MLRA2/    
## 11 artmo_1  test_mla   C:/ARTMO/artmo_1/test_mla/MLRA3/
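
With the folders in place, the aggregated metrics can for instance be stored next to the corresponding model directories. A minimal sketch, assuming the <dir>/<Database>/<Table_Type>/ layout shown above:

for (i in seq_len(nrow(dbjoin))) {
  outdir <- file.path(dir, dbjoin$Database[i], dbjoin$Table_Type[i])  # e.g. C:/ARTMO/artmo_1/test_cf
  saveRDS(dbjoin$Metrics[[i]], file.path(outdir, "metrics.rds"))      # one metrics file per Table_Type
}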