Computes distances via dist and saves them as a file-backed matrix (FBM)
using the bigstatsr package, or connects to an existing FBM backup file on disk.
bigdist(mat, file, method = "euclidean", type = "float")
mat | Numeric matrix. When missing, attempts to connect to an existing backup file. See the 'file' argument. |
---|---|
file | (string) Name of the backing file to be created, or of an existing backup file. Do not include the trailing ".bk". See details for the backup file format. |
method | (string or function) See the method argument of dist. |
type | (string, default: 'float') Storage type of the FBM. See the FBM documentation in bigstatsr. |
An object of class 'bigdist'.
The bigdist class is a list whose 'fbm' element holds the FBM connection. The backup file name has the form <somename>_<size>_<type>.bk, where size is the number of observations and type is the storage type, such as 'double' or 'float'.
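The file name convention can be illustrated with a small base R sketch. The helper `parse_bigdist_filename` is hypothetical (it is not part of disto); it only splits a path according to the <somename>_<size>_<type>.bk pattern described above.

```r
# Hypothetical helper (not part of disto): split a backup file name of the
# form <somename>_<size>_<type>.bk into its three parts.
parse_bigdist_filename <- function(path) {
  # Drop the directory and the ".bk" extension.
  stem <- sub("\\.bk$", "", basename(path))
  # <somename> may itself contain underscores, so anchor on the last two fields.
  parts <- regmatches(stem, regexec("^(.*)_([0-9]+)_([A-Za-z ]+)$", stem))[[1]]
  if (length(parts) != 4L) {
    stop("file name is not of the form <somename>_<size>_<type>.bk")
  }
  list(name = parts[2], size = as.integer(parts[3]), type = parts[4])
}

bk_path <- "/tmp/RtmpNBdlQr/temp_ex1_100_float.bk"
info <- parse_bigdist_filename(bk_path)
str(info)

# Value for the 'file' argument when reconnecting: the path without ".bk".
file_arg <- sub("\\.bk$", "", bk_path)
```

Anchoring the pattern on the last two fields keeps a name such as "temp_ex1" intact even though it contains an underscore.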
The bigstatsr package stores matrices on disk and allows efficient
computation on them. The disto package provides a unified frontend to read
parts of distance matrices and apply functions over rows/columns. For
efficient operations, write C++ functions to talk to bigstatsr's FBM.
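As a rough illustration of applying a function over the columns of an on-disk distance matrix, the sketch below calls bigstatsr directly rather than disto's frontend or C++. It assumes bigstatsr is installed, builds the FBM from an in-memory distance matrix purely for demonstration, and stores it as 'double' (not the 'float' default of bigdist) to keep the comparison with base R exact.

```r
# Sketch only: needs the 'bigstatsr' package; skipped if it is not installed.
if (requireNamespace("bigstatsr", quietly = TRUE)) {
  set.seed(1)
  amat <- matrix(rnorm(1e3), ncol = 10)

  # Full symmetric distance matrix, computed in memory for this small example.
  dmat <- as.matrix(dist(amat))

  # Copy it into a file-backed matrix (backing file in a temporary location).
  fbm <- bigstatsr::as_FBM(dmat, type = "double")

  # Process the columns block by block and concatenate the per-block results.
  col_means <- bigstatsr::big_apply(
    fbm,
    a.FUN = function(X, ind) colMeans(X[, ind, drop = FALSE]),
    a.combine = "c"
  )

  head(col_means)
}
```

Only the columns in the current block are read into memory at a time, which is the point of working against the FBM instead of a regular matrix.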
The distance computation and the writing to the FBM may be parallelized by setting a future backend.
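A minimal sketch of setting a backend before calling bigdist follows. It assumes the future and disto packages are installed; the choice of `multisession` with two workers and the file stem "par_ex" are illustrative, not recommendations from the package.

```r
# Sketch only: needs the 'future' and 'disto' packages; skipped otherwise.
if (requireNamespace("future", quietly = TRUE) &&
    requireNamespace("disto", quietly = TRUE)) {
  # Select a parallel backend and remember the previous plan.
  old_plan <- future::plan(future::multisession, workers = 2)

  set.seed(1)
  amat <- matrix(rnorm(1e3), ncol = 10)

  # tempfile() gives a fresh path, so no existing backup file is overwritten.
  bd <- disto::bigdist(mat = amat, file = tempfile("par_ex"))

  # Restore the previous backend.
  future::plan(old_plan)
}
```

Restoring the earlier plan keeps the backend change from leaking into later code in the same session.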
# basics of 'bigdist'

# create a random matrix
set.seed(1)
amat <- matrix(rnorm(1e3), ncol = 10)
td <- tempdir()

# create a bigdist object with FBM (file-backed matrix) on disk
temp <- bigdist(mat = amat, file = file.path(td, "temp_ex1"))
temp
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 100 rows and 100 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
temp$fbm$backingfile
#> [1] "/tmp/RtmpNBdlQr/temp_ex1_100_float.bk"
temp$fbm[1, 2]
#> [1] 4.631341

# connect to FBM on disk as a bigdist object
temp2 <- bigdist(file = file.path(td, "temp_ex1_100_float"))
temp2
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 100 rows and 100 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
temp2$fbm[1, 2]
#> [1] 4.631341
#> [1] 100
#>          [,1]
#> [1,] 4.631341
#>          [,1]     [,2]
#> [1,] 3.976406 3.531089
#> [2,] 5.591309 4.661480
#> [1] 3.976406 4.661480
#> [1]   2 100
#> [1] 100   2
#> [1] 3.531089 4.131712 4.124086 3.900174 3.730360
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 100 rows and 100 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
#>      [,1]
#> [1,]   10
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 100 rows and 100 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
#> [1] 11 12
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 100 rows and 100 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
#> [1] 51 52 53 54 55

# subset a bigdist object
temp_subset <- bigdist_subset(temp, index = 21:30, file = file.path(td, "temp_ex2"))
temp_subset
#> $fbm
#> A Filebacked Big Matrix of type 'float' with 10 rows and 10 columns.
#>
#> attr(,"class")
#> [1] "bigdist"
temp_subset$fbm$backingfile
#> [1] "/tmp/RtmpNBdlQr/temp_ex2_10_float.bk"

# convert a dist object (in memory) to a bigdist object
temp3 <- as_bigdist(dist(mtcars), file = file.path(td, "temp_ex3"))
temp3
#> $fbm
#> A Filebacked Big Matrix of type 'double' with 32 rows and 32 columns.
#>
#> attr(,"class")
#> [1] "bigdist"