Title: | Compare Output and Run Time |
---|---|
Description: | Quickly run experiments to compare the run time and output of code blocks. The function mbc() can make fast comparisons of code, and will calculate statistics comparing the resulting outputs. It can be used to compare model fits to the same data or see which function runs faster. The R6 class ffexp$new() runs a function using all possible combinations of selected inputs. This is useful for comparing the effect of different parameter values. It can also run in parallel and automatically save intermediate results, which is very useful for long computations. |
Authors: | Collin Erickson [aut, cre] |
Maintainer: | Collin Erickson <[email protected]> |
License: | GPL-3 |
Version: | 0.2.4.9000 |
Built: | 2025-02-06 02:55:55 UTC |
Source: | https://github.com/collinerickson/comparer |
A class for easily creating and evaluating full factorial experiments.
e1 <- ffexp$new(eval_func=, )
e1$run_all()
e1$plot_run_times()
e1$save_self()
eval_func
The function called to evaluate each design point.
...
Factors and their levels to be evaluated at.
save_output
Should the output be saved?
parallel
If TRUE, function evaluations are done in parallel.
parallel_cores
Number of cores to be used in parallel. If "detect", parallel::detectCores() is used to determine the number. "detect-1" may be used so that the computer isn't running at full capacity, which can slow down other tasks.
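A minimal sketch (not taken from the package's own examples) showing how these arguments fit together; the factor n and the evaluation function are made up for illustration:
# Hypothetical setup: evaluate mean(rnorm(n)) for two sample sizes,
# in parallel on all but one core.
e1 <- ffexp$new(
  n = c(100, 1000),
  eval_func = function(n) {mean(rnorm(n))},
  parallel = TRUE,
  parallel_cores = "detect-1"
)
e1$run_all()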
$new()
Initialize an experiment. The preprocessing is done, but no function evaluations are run.
$run_all()
Run all factor combinations.
$run_one()
Run a single factor combination.
$add_result_of_one()
Add the result of an evaluation to the data set. Not meant to be called manually.
$plot_run_times()
Plot the run times. Especially useful when they have been run in parallel.
$save_self()
Save ffexp R6 object.
$recover_parallel_temp_save()
If you ran the experiment in parallel with parallel_temp_save=TRUE and it crashed partway through, call this to recover the runs that were completed. Runs that were stopped mid-execution are not recoverable.
outrawdf
Raw data frame of output.
outcleandf
Clean output in data frame.
rungrid
Matrix specifying which inputs will be run for each experiment.
nvars
Number of variables
allvars
All variables
varlist
Character vector of objects to pass to a parallel cluster.
arglist
List of values for each argument
number_runs
Total number of runs
completed_runs
Logical vector of whether each run has been completed.
eval_func
The function that is called for each experiment trial.
outlist
A list of the output from each run.
save_output
Logical of whether the output should be saved.
parallel
Logical whether experiment runs should be run in parallel. Allows for massive speedup.
parallel_cores
How many cores to use when running in parallel. Can be an integer, or 'detect' will detect how many cores are available, or 'detect-1' will do one less than that.
parallel_cluster
The parallel cluster being used.
folder_path
The path to the folder where output will be saved.
verbose
How much should be printed when running. 0 is none, 2 is average.
extract_output_to_df
A function to extract the raw output into a data frame. E.g., if the output is a list, but you want a single item to show up in the output data frame.
hashvalue
A value used to make sure inputs match when reloading.
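As a hedged sketch, the fields above can be inspected directly after running; e1 refers to the hypothetical object from the sketch earlier on this page:
e1$completed_runs   # logical vector: which rows have finished
e1$rungrid          # integer grid of the input combinations
e1$outcleandf       # cleaned output as a data frame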
new()
Create an 'ffexp' object.
ffexp$new(
  ...,
  eval_func,
  save_output = FALSE,
  parallel = FALSE,
  parallel_cores = "detect",
  folder_path,
  varlist = NULL,
  verbose = 2,
  extract_output_to_df = NULL
)
...
Input arguments for the experiment
eval_func
The function to be run. It must take named arguments matching the names of ...
save_output
Should output be saved to file?
parallel
Should a parallel cluster be used?
parallel_cores
When running in parallel, how many cores should be used. This is really the number of cluster workers created rather than the number of physical cores, so it can be larger than the number of cores the computer has, but performance will suffer if it is. Can be set to 'detect' to detect how many cores are available and use that, or 'detect-1' to use one fewer than that.
folder_path
Where the data and files should be stored. If not given, a folder in the existing directory will be created.
varlist
Character vector of names of objects that need to be passed to the parallel environment.
verbose
How much should be printed when running. 0 is none, 2 is average.
extract_output_to_df
A function to extract the raw output into a data frame. E.g., if the output is a list, but you want a single item to show up in the output data frame.
run_all()
Run an experiment. The user can choose to run all rows, or just specified ones, if it should be run in parallel, and what files should be saved.
ffexp$run_all(
  to_run = NULL,
  random_n = NULL,
  redo = FALSE,
  run_order,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  outfile,
  warn_repeat = TRUE
)
to_run
Which rows should be run? If NULL, then all that haven't been run yet.
random_n
Randomly selects n trials among those not yet completed and runs them.
redo
Should already completed rows be run again?
run_order
In what order should the rows be run? Options: random, in_order, and reverse.
save_output
Should the output be saved?
parallel
Should it be run in parallel?
parallel_cores
When running in parallel, how many cores should be used. This is really the number of cluster workers created rather than the number of physical cores, so it can be larger than the number of cores the computer has, but performance will suffer if it is. Can be set to 'detect' to detect how many cores are available and use that, or 'detect-1' to use one fewer than that.
parallel_temp_save
Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.
write_start_files
Should start files be written?
write_error_files
Should error files be written for rows that fail?
delete_parallel_temp_save_after
If using parallel temp save files, should they be deleted afterwards?
varlist
A character vector of names of variables to be passed to the parallel cluster.
verbose
How much should be printed when running. 0 is none, 2 is average.
outfile
Where should the master output file be saved when running in parallel?
warn_repeat
Should warnings be given when repeating already completed rows?
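For illustration only (reusing the hypothetical e1 from the sketch above), specific rows can be targeted like this:
e1$run_all(to_run = 1:2, redo = TRUE)   # rerun rows 1 and 2 even if already completed
e1$run_all(random_n = 1)                # run one randomly chosen unfinished row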
run_for_time()
Run the experiment for a given amount of time rather than for a specified number of trials. It runs 'batch_size' trials between checks of the elapsed time; this only needs to be more than 1 when running in parallel. The current batch is always completed before stopping, so the total run time can exceed the given time limit.
ffexp$run_for_time(
  sec,
  batch_size,
  show_time_in_bar = FALSE,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  warn_repeat = TRUE
)
sec
Number of seconds to run for
batch_size
Number of trials to run between checking the time elapsed.
show_time_in_bar
The progress bar can show either the number of runs completed or the time elapsed.
save_output
Should the output be saved?
parallel
Should it be run in parallel?
parallel_cores
When running in parallel, how many cores should be used. This is really the number of cluster workers created rather than the number of physical cores, so it can be larger than the number of cores the computer has, but performance will suffer if it is. Can be set to 'detect' to detect how many cores are available and use that, or 'detect-1' to use one fewer than that.
parallel_temp_save
Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.
write_start_files
Should start files be written?
write_error_files
Should error files be written for rows that fail?
delete_parallel_temp_save_after
If using parallel temp save files, should they be deleted afterwards?
varlist
A character vector of names of variables to be passed to the parallel cluster.
verbose
How much should be printed when running. 0 is none, 2 is average.
warn_repeat
Should warnings be given when repeating already completed rows?
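A minimal sketch, again with the hypothetical e1: run whatever rows fit into roughly ten seconds, checking the clock after every two trials.
e1$run_for_time(sec = 10, batch_size = 2)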
run_superbatch()
Run the experiment in super-batches. This allows for better progress visualization and intermediate saving when running in parallel.
ffexp$run_superbatch(
  nsb,
  redo = FALSE,
  run_order,
  save_output = self$save_output,
  parallel = self$parallel,
  parallel_cores = self$parallel_cores,
  parallel_temp_save = save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  delete_parallel_temp_save_after = FALSE,
  varlist = self$varlist,
  verbose = self$verbose,
  warn_repeat = TRUE
)
nsb
Number of super batches
redo
Should already completed rows be run again?
run_order
In what order should the rows be run? Options: random, in_order, and reverse.
save_output
Should the output be saved?
parallel
Should it be run in parallel?
parallel_cores
When running in parallel, how many cores should be used. This is really the number of cluster workers created rather than the number of physical cores, so it can be larger than the number of cores the computer has, but performance will suffer if it is. Can be set to 'detect' to detect how many cores are available and use that, or 'detect-1' to use one fewer than that.
parallel_temp_save
Should temp files be written when running in parallel? Prevents losing results if it crashes partway through.
write_start_files
Should start files be written?
write_error_files
Should error files be written for rows that fail?
delete_parallel_temp_save_after
If using parallel temp save files, should they be deleted afterwards?
varlist
A character vector of names of variables to be passed to the parallel cluster.
verbose
How much should be printed when running. 0 is none, 2 is average.
warn_repeat
Should warnings be given when repeating already completed rows?
outfile
Where should the master output file be saved when running in parallel?
run_one()
Run a single row of the experiment. You can specify which one to run. Generally this should not be called by users; use 'run_all' instead.
ffexp$run_one(
  irow = NULL,
  save_output = self$save_output,
  write_start_files = save_output,
  write_error_files = save_output,
  warn_repeat = TRUE,
  is_parallel = FALSE,
  return_list_result_of_one = FALSE,
  verbose = self$verbose,
  force_this_as_output
)
irow
Which row should be run?
save_output
Should the output be saved?
write_start_files
Should a file be written when starting the experiment?
write_error_files
Should a file be written if there is an error?
warn_repeat
Should a warning be given if repeating a row?
is_parallel
Is this being run in parallel?
return_list_result_of_one
Should the list result of this single run be returned?
verbose
How much should be printed when running. 0 is none, 2 is average.
force_this_as_output
Value to use instead of evaluating function.
add_result_of_one()
Add the result of a single experiment to the object. This shouldn't be used by users.
ffexp$add_result_of_one(
  output,
  systime,
  irow,
  row_grid,
  row_df,
  start_time,
  end_time,
  save_output,
  hashvalue
)
output
The output of the experiment.
systime
The time it took to run
irow
The row of inputs used.
row_grid
The corresponding row in the run grid.
row_df
The corresponding row data frame.
start_time
The start time of the experiment.
end_time
The end time of the experiment.
save_output
Should the output be saved?
hashvalue
Not used.
plot_run_times()
Plot the run times of each trial.
ffexp$plot_run_times()
plot_pairs()
Plot pairs of inputs and outputs. Helps see correlations and distributions.
ffexp$plot_pairs()
plot()
Calling 'plot' on an 'ffexp' object calls 'plot_pairs()'
ffexp$plot()
calculate_effects()
Calculate the effects of each variable as if this was an experiment using a linear model.
ffexp$calculate_effects()
calculate_effects2()
Calculate the effects of each variable as if this was an experiment using a linear model.
ffexp$calculate_effects2()
save_self()
Save this R6 object
ffexp$save_self(verbose = self$verbose)
verbose
How much should be printed when running. 0 is none, 2 is average.
create_save_folder_if_nonexistent()
Create the save folder if it doesn't already exist.
ffexp$create_save_folder_if_nonexistent()
rename_save_folder()
Rename the save folder
ffexp$rename_save_folder(new_folder_path, new_folder_name)
new_folder_path
New path for the save folder
new_folder_name
If you want the new save folder to be in the current directory, you can use this instead of 'new_folder_path' and just give the folder name.
delete_save_folder_if_empty()
Delete the save folder if it is empty. Used to prevent leaving behind empty folders.
ffexp$delete_save_folder_if_empty(verbose = self$verbose)
verbose
How much should be printed when running. 0 is none, 2 is average.
recover_parallel_temp_save()
Running this loads the information saved to files if 'save_parallel_temp_save=TRUE' was used when running. Useful when running long jobs in parallel so that you don't lose all results if it crashes before finishing.
ffexp$recover_parallel_temp_save(delete_after = FALSE, only_reload_new = FALSE)
delete_after
Should the temp files be deleted after they are recovered? If TRUE, make sure you save the ffexp object after running this function so you don't lose the data.
only_reload_new
Will only reload output from runs that don't show as completed yet. Can make it much faster if there are many saved files, but most have already been loaded to this object.
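A hedged sketch of the recovery workflow, assuming e1 is an ffexp object created with the same inputs as the crashed run and that parallel_temp_save=TRUE was used:
e1$recover_parallel_temp_save(delete_after = FALSE)
e1$completed_runs   # shows which rows were recovered
e1$run_all()        # finish the remaining rows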
rungrid2()
Display the input rows of the experiment. rungrid only gives the integer levels; this gives the actual values.
ffexp$rungrid2(rows = 1:nrow(self$rungrid))
rows
Which rows to display the inputs for? On big experiments, specifying the rows can be much faster.
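For example, using the cc object from the Examples at the end of this page:
cc$rungrid2()           # all rows with their actual input values
cc$rungrid2(rows = 1)   # only the first row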
add_variable()
Add a variable to the experiment. You must specify the value of the variable for all existing rows, and then also the values of the variable which haven't been run yet.
ffexp$add_variable(name, existing_value, new_values, suppressMessage = FALSE)
name
Name of the variable being added.
existing_value
The value of the new variable assigned to all rows that already exist in the experiment.
new_values
The values of the new variable that have not been run yet. This should not include 'existing_value', the value already assigned to the existing rows.
suppressMessage
Should the message be suppressed? The message tells the user a new variable was added and it is being returned in a new object. Default FALSE.
add_level()
Add a level to one of the arguments. This returns a new object. The existing object is not changed.
ffexp$add_level(arg_name, new_values, suppressMessage = FALSE)
arg_name
Which existing argument is a level being added to?
new_values
The value of the new levels to be added to 'arg_name'.
suppressMessage
Should the message be suppressed? The message tells the user a new level was added and it is being returned in a new object. Default FALSE.
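A hedged sketch, reusing the cc object from the Examples at the end of this page; a new object is returned and the original is unchanged:
cc2 <- cc$add_level('a', 4)   # add level 4 to argument 'a'
cc2$completed_runs            # the added rows show as not yet run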
remove_results()
Remove results of completed trials. They will be rerun next time $run_all() is called.
ffexp$remove_results(to_remove)
to_remove
Indexes of trials to remove
print()
Printing the object shows some summary information.
ffexp$print()
set_parallel_cores()
Set the number of parallel cores to be used when running in parallel. Needed in case the user sets "detect".
ffexp$set_parallel_cores(parallel_cores)
parallel_cores
When running in parallel, how many cores should be used. This is really the number of cluster workers created rather than the number of physical cores, so it can be larger than the number of cores the computer has, but performance will suffer if it is. Can be set to 'detect' to detect how many cores are available and use that, or 'detect-1' to use one fewer than that.
stop_cluster()
Stop the parallel cluster.
ffexp$stop_cluster()
finalize()
Cleanup after deleting object.
ffexp$finalize()
clone()
The objects of this class are cloneable with this method.
ffexp$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Two factors, both with two levels.
# The evaluation function simply prints out the combination.
cc <- ffexp$new(a=1:2, b=c("A","B"),
                eval_func=function(...) {c(...)})
# View the factor settings it will run (each row).
cc$rungrid
# Evaluate all four settings
cc$run_all()

cc <- ffexp$new(a=1:3, b=2, cd=data.frame(c=3:4, d=5:6),
                eval_func=function(...) {list(...)})
Hyperparameter optimization
hype(
  eval_func,
  ...,
  X0 = NULL,
  Z0 = NULL,
  n_lhs,
  extract_output_func,
  verbose = 1,
  model = "GauPro",
  covtype = "matern5_2",
  nugget.estim = TRUE
)
eval_func
The function we evaluate.
...
Pass in hyperparameters, such as par_unif(), as unnamed arguments.
X0
A data frame of initial points to include. They must have the same names as the hyperparameters. If Z0 is also passed, it should match the points in X0. If Z0 is not passed, then X0 will be the first points evaluated.
Z0
A vector whose values are the result of applying 'eval_func' to each row of X0.
n_lhs
The number of random points to start with. They are selected using a Latin hypercube sample.
extract_output_func
A function that takes in the output from 'eval_func' and returns the value we are trying to minimize.
verbose
How much should be printed? 0 is none, 1 is standard, 2 is more, 5+ is a lot.
model
What kind of model to use.
covtype
The covariance function to use for the Gaussian process model.
nugget.estim
Whether a nugget should be estimated when fitting the Gaussian process model.
# Have df output, but only use one value from it
h1 <- hype(
  eval_func = function(a, b) {data.frame(c=a^2+b^2, d=1:2)},
  extract_output_func = function(odf) {odf$c[1]},
  a = par_unif('a', -1, 2),
  b = par_unif('b', -10, 10),
  n_lhs = 10
)
h1$run_all()
h1$add_EI(n = 1)
h1$run_all()
#system.time(h1$run_EI_for_time(sec=3, batch_size = 1))
#system.time(h1$run_EI_for_time(sec=3, batch_size = 3))
h1$plotorder()
h1$plotX()
Compare the run time and output of various code chunks
mbc(
  ...,
  times = 5,
  input,
  inputi,
  evaluator,
  post,
  target,
  targetin,
  metric = "rmse",
  paired,
  kfold
)
...
Functions to run
times
Number of times to run
input
Object to be passed as input to each function
inputi
Function to be called with the replicate number, then passed to each function.
evaluator
An expression that the ... expressions will be inserted into as "." for evaluation.
post
Function or expression (using ".") to post-process results.
target
Values the functions are expected to (approximately) return.
targetin
Values that will be given to the result of the run to produce output.
metric
c("rmse", "t", "mis90", "sr27") Metric used to compare output values to target. mis90 is the mean interval score for 90% confidence, see Gneiting and Raftery (2007). sr27 is the scoring rule given in Equation 27 of Gneiting and Raftery (2007).
paired
Should the results be paired for comparison?
kfold
First element should be the number of elements that are being split into groups. If the number of folds is different from 'times', then the second argument is the number of folds. Use 'ki' in 'inputi' and 'targeti' to select elements in the current fold.
Data frame of comparison results
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378.
# Compare distribution of mean for different sample sizes
mbc(mean(rnorm(1e2)),
    mean(rnorm(1e4)),
    times=20)
# Compare mean and median on same data
mbc(mean(x),
    median(x),
    inputi={x=rexp(1e2)})
# input given, no post
mbc({Sys.sleep(rexp(1, 30)); mean(x)},
    {Sys.sleep(rexp(1, 5)); median(x)},
    inputi={x=runif(100)})
# input given with post
mbc(mean={Sys.sleep(rexp(1, 30)); mean(x)},
    med={Sys.sleep(rexp(1, 5)); median(x)},
    inputi={x=runif(100)},
    post=function(x){c(x+1, x^2)})
# input given with post, 10 times
mbc(mean={Sys.sleep(rexp(1, 30)); mean(x)+runif(1)},
    med={Sys.sleep(rexp(1, 50)); median(x)+runif(1)},
    inputi={x=runif(100)},
    post=function(x){c(x+1, x^2)},
    times=10)
# Name one function and post
mbc({mean(x)+runif(1)},
    a1={median(x)+runif(1)},
    inputi={x=runif(100)},
    post=function(x){c(rr=x+1, gg=x^2)},
    times=10)
# No input
m1 <- mbc(mean={x <- runif(100); Sys.sleep(rexp(1, 30)); mean(x)},
          med={x <- runif(100); Sys.sleep(rexp(1, 50)); median(x)})
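As an additional hedged sketch (not from the package's examples), the target argument can be used to compare estimators against a known value; the default "rmse" metric then scores each chunk's output against target:
# Both estimators should return values near 0.
mbc(mean = mean(x),
    med = median(x),
    inputi = {x <- rnorm(100)},
    target = 0,
    times = 10)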
Parameter with uniform distribution for hyperparameter optimization
par_discretenum(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
Values, discrete numeric
p1 <- par_discretenum('x1', 0:2)
class(p1)
print(p1)
Parameter with uniform distribution over integer range for hyperparameter optimization
par_integer(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
p1 <- par_integer('x1', 3, 8)
class(p1)
print(p1)
table(p1$generate(runif(1000)))
Hyperparameter on log10 scale
par_log10(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
p1 <- par_log10('x1', 1e-4, 1e4)
class(p1)
print(p1)
Hyperparameter of discrete (factor) variable
par_ordered(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
Vector of values
p1 <- par_ordered('x1', c('a', 'b', 'c'))
class(p1)
print(p1)
Parameter with uniform distribution for hyperparameter optimization
par_unif(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
Returns an R6 class generated by R6_par_unif.
p1 <- par_unif('x1', 1, 10)
class(p1)
print(p1)
Hyperparameter of discrete (factor) variable
par_unordered(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
Vector of values
p1 <- par_unordered('x1', c('a', 'b', 'c'))
class(p1)
print(p1)
Plot mbc class
## S3 method for class 'mbc'
plot(x, ...)
x
Object of class mbc
...
Additional parameters
None
m1 <- mbc(mn = {Sys.sleep(rexp(1, 30)); mean(x)},
          med = {Sys.sleep(rexp(1, 5)); median(x)},
          input = runif(100))
plot(m1)
Print mbc class
## S3 method for class 'mbc'
print(x, ...)
x
Object of class mbc
...
Additional parameters
None
m1 <- mbc({Sys.sleep(rexp(1, 30)); mean(x)},
          {Sys.sleep(rexp(1, 5)); median(x)},
          input = runif(100))
print(m1)
Hyperparameter optimization
X
Data frame of inputs that have been evaluated or will be evaluated next.
Z
Output at X
runtime
The time it took to evaluate each row of X
parnames
Names of the parameters
parlowerraw
Lower bounds for each parameter on raw scale
parupperraw
Upper bounds for each parameter on raw scale
parlowertrans
Lower bounds for each parameter on transformed scale
paruppertrans
Upper bounds for each parameter on transformed scale
parlist
List of all parameters
modlist
A list with details about the model. The user shouldn't ever edit this directly.
ffexp
An ffexp R6 object used to run the experiment and store the results.
eval_func
The function we evaluate.
extract_output_func
A function that takes in the output from 'eval_func' and returns the value we are trying to minimize.
par_all_cts
Are all the parameters continuous?
verbose
How much should be printed? 0 is none, 1 is standard, 2 is more, 5+ is a lot
mod
Gaussian process model used to predict what the output will be.
new()
Create hype R6 object.
R6_hype$new(
  eval_func,
  ...,
  X0 = NULL,
  Z0 = NULL,
  n_lhs,
  extract_output_func,
  verbose = 1,
  model = "GauPro",
  covtype = "matern5_2",
  nugget.estim = TRUE
)
eval_func
The function used to evaluate new points.
...
Hyperparameters to optimize over.
X0
Data frame of initial points to run, or points already evaluated. If already evaluated, give the outputs in "Z0".
Z0
Evaluated outputs at "X0".
n_lhs
The number that should initially be run using a maximin Latin hypercube.
extract_output_func
A function that takes in the output from 'eval_func' and returns the value we are trying to minimize.
verbose
How much should be printed? 0 is none, 1 is standard, 2 is more, 5+ is a lot
model
What package to fit the Gaussian process model with. Either "GauPro" or "DiceKriging"/"DK".
covtype
Covariance/correlation/kernel function for the GP model.
nugget.estim
Should the nugget be estimated when fitting the GP model?
add_data()
Add data to the experiment results.
R6_hype$add_data(X, Z)
X
Data frame with names matching the input parameters
Z
Output at rows of X matching the experiment output.
add_X()
Add new inputs to run. This allows the user to specify what they want run next.
R6_hype$add_X(X)
X
Data frame with names matching the input parameters.
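A hedged sketch using the h1 object from the Examples at the end of this page: request a specific point to be evaluated next, then run it.
h1$add_X(data.frame(a = 0, b = 0))   # column names match the hyperparameters
h1$run_all()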
add_LHS()
Add new input points using a maximin Latin hypercube. Latin hypercubes usually give better spacing than randomly picked points.
R6_hype$add_LHS(n, just_return_df = FALSE)
n
Number of points to add.
just_return_df
Instead of adding to experiment, should it just return the new set of values?
convert_trans_to_raw()
Convert parameters from transformed scale to raw scale.
R6_hype$convert_trans_to_raw(Xtrans)
Xtrans
Parameters on the transformed scale
convert_raw_to_trans()
Convert parameters from raw scale to transformed scale.
R6_hype$convert_raw_to_trans(Xraw)
Xraw
Parameters on the raw scale
change_par_bounds()
Change lower/upper bounds of a parameter
R6_hype$change_par_bounds(parname, lower, upper)
parname
Name of the parameter
lower
New lower bound. Leave empty if not changing.
upper
New upper bound. Leave empty if not changing.
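For example (hedged, again using the hypothetical h1), the upper bound of parameter 'a' could be widened without touching its lower bound:
h1$change_par_bounds('a', upper = 5)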
add_EI()
Add new inputs to run using the expected improvement (EI) criterion.
R6_hype$add_EI(
  n,
  covtype = NULL,
  nugget.estim = NULL,
  model = NULL,
  eps,
  just_return = FALSE,
  calculate_at
)
n
Number of points to add.
covtype
Covariance function to use for the Gaussian process model.
nugget.estim
Should a nugget be estimated?
model
Which package should be used to fit the model and calculate the EI? Use "DK" for DiceKriging or "GauPro" for GauPro.
eps
Exploration parameter. The minimum amount of improvement you care about.
just_return
Just return the EI info, don't actually add the points to the design.
calculate_at
Calculate the EI at a specific point.
fit_mod()
Fit model to the data collected so far
R6_hype$fit_mod(covtype = NULL, nugget.estim = NULL, model = NULL)
covtype
Covariance function to use for the Gaussian process model.
nugget.estim
Should a nugget be estimated?
model
Which package should be used to fit the model and calculate the EI? Use "DK" for DiceKriging or "GauPro" for GauPro.
run_all()
Run all unevaluated input points.
R6_hype$run_all(...)
...
Passed into 'ffexp$run_all'. Can set 'parallel=TRUE' to evaluate multiple points simultaneously, as long as all needed variables have been passed to 'varlist'.
run_EI_for_time()
Add points using the expected improvement (EI) criterion, evaluate them, and repeat until a specified amount of time has passed.
R6_hype$run_EI_for_time(
  sec,
  batch_size,
  covtype = "matern5_2",
  nugget.estim = TRUE,
  verbose = 0,
  model = "GauPro",
  eps = 0,
  ...
)
sec
Number of seconds to run for. It will go over this time limit, finish the current iteration, then stop.
batch_size
Number of points to run at once.
covtype
Covariance function to use for the Gaussian process model.
nugget.estim
Should a nugget be estimated?
verbose
Verbose parameter to pass to ffexp$
model
Which package should be used to fit the model and calculate the EI? Use "DK" for DiceKriging or "GauPro" for GauPro.
eps
Exploration parameter. The minimum amount of improvement you care about.
...
Passed into 'ffexp$run_all'.
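A hedged sketch with the h1 object from the Examples: alternate between proposing one expected-improvement point and evaluating it until roughly two seconds have passed.
h1$run_EI_for_time(sec = 2, batch_size = 1)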
plot()
Make a plot to summarize the experiment.
R6_hype$plot()
pairs()
Plot pairs of inputs and output
R6_hype$pairs()
plotorder()
Plot the output of the points evaluated in order.
R6_hype$plotorder()
plotX()
Plot the output as a function of each input.
R6_hype$plotX(
  addlines = TRUE,
  addEIlines = TRUE,
  covtype = NULL,
  nugget.estim = NULL,
  model = NULL
)
addlines
Should prediction mean and 95% interval be plotted?
addEIlines
Should expected improvement lines be plotted?
covtype
Covariance function to use for the Gaussian process model.
nugget.estim
Should a nugget be estimated?
model
Which package should be used to fit the model and calculate the EI? Use "DK" for DiceKriging or "GauPro" for GauPro.
plotXorder()
Plot each input in the order they were chosen. Colored by quality.
R6_hype$plotXorder()
plotinteractions()
Plot the 2D plots from inputs to the output. All other variables are held at their values for the best input.
R6_hype$plotinteractions(covtype = "matern5_2", nugget.estim = TRUE)
covtype
Covariance function to use for the Gaussian process model.
nugget.estim
Should a nugget be estimated?
print()
Print details of the object.
R6_hype$print(...)
...
not used
best_params()
Returns the best parameters evaluated so far.
R6_hype$best_params()
update_mod_userspeclist()
Updates the specifications for the GP model.
R6_hype$update_mod_userspeclist(
  model = NULL,
  covtype = NULL,
  nugget.estim = NULL
)
model
What package to fit the Gaussian process model with. Either "GauPro" or "DiceKriging"/"DK".
covtype
Covariance/correlation/kernel function for the GP model.
nugget.estim
Should the nugget be estimated when fitting the GP model?
clone()
The objects of this class are cloneable with this method.
R6_hype$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Have df output, but only use one value from it
h1 <- hype(
  eval_func = function(a, b) {data.frame(c=a^2+b^2, d=1:2)},
  extract_output_func = function(odf) {odf$c[1]},
  a = par_unif('a', -1, 2),
  b = par_unif('b', -10, 10),
  n_lhs = 10
)
h1$run_all()
h1$add_EI(n = 1)
h1$run_all()
#system.time(h1$run_EI_for_time(sec=3, batch_size = 1))
#system.time(h1$run_EI_for_time(sec=3, batch_size = 3))
h1$plotorder()
h1$plotX()
R6 object for discrete numeric
Parameter with uniform distribution for hyperparameter optimization
comparer::par_hype
-> par_discretenum
name
Name of the parameter, must match the input to 'eval_func'.
values
Values, discrete numeric
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
fromraw()
Function to convert from raw scale to transformed scale
R6_par_discretenum$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_discretenum$toraw(x)
x
Value of transformed scale
generate()
Generate values in the raw space based on quantiles.
R6_par_discretenum$generate(q)
q
In [0,1].
getseq()
Get a sequence, uniform on the transformed scale
R6_par_discretenum$getseq(n)
n
Number of points. Ignored for discrete.
isvalid()
Check if input is valid for parameter
R6_par_discretenum$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_discretenum$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_discretenum$new(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
Numeric values, must be in ascending order
print()
Print details of the object.
R6_par_discretenum$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_discretenum$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- R6_par_discretenum$new('x1', 0:2)
class(p1)
print(p1)
Parameter for hyperparameter optimization
partrans
The transformation type.
getseq()
Get a sequence, uniform on the transformed scale
R6_par_hype$getseq(n)
n
Number of points. Ignored for discrete.
clone()
The objects of this class are cloneable with this method.
R6_par_hype$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- R6_par_hype$new()
class(p1)
print(p1)
Parameter with uniform distribution over integer range for hyperparameter optimization
comparer::par_hype
-> par_integer
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
fromraw()
Function to convert from raw scale to transformed scale
R6_par_integer$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_integer$toraw(x)
x
Value of transformed scale
generate()
Generate values in the raw space based on quantiles.
R6_par_integer$generate(q)
q
In [0,1].
getseq()
Get a sequence, uniform on the transformed scale
R6_par_integer$getseq(n)
n
Number of points. Ignored for discrete.
isvalid()
Check if input is valid for parameter
R6_par_integer$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_integer$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_integer$new(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
print()
Print details of the object.
R6_par_integer$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_integer$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- R6_par_integer$new('x1', 0, 2)
class(p1)
print(p1)
R6 class for hyperparameter on log10 scale
comparer::par_hype
-> par_log10
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
fromraw()
Function to convert from raw scale to transformed scale
R6_par_log10$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_log10$toraw(x)
x
Value of transformed scale
generate()
Generate values in the raw space based on quantiles.
R6_par_log10$generate(q)
q
In [0,1].
isvalid()
Check if input is valid for parameter
R6_par_log10$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_log10$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_log10$new(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
print()
Print details of the object.
R6_par_log10$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_log10$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- par_log10('x1', 1e-4, 1e4)
class(p1)
print(p1)
R6 class for hyperparameter of discrete (factor) variable
comparer::par_hype
-> par_ordered
name
Name of the parameter, must match the input to 'eval_func'.
values
Vector of values
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
lower
Lower bound of the parameter
upper
Upper bound of the parameter
fromraw()
Function to convert from raw scale to transformed scale
R6_par_ordered$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_ordered$toraw(x)
x
Value of transformed scale
fromint()
Convert from integer index to actual value
R6_par_ordered$fromint(x)
x
Integer index
toint()
Convert from value to integer index
R6_par_ordered$toint(x)
x
Value
generate()
Generate values in the raw space based on quantiles.
R6_par_ordered$generate(q)
q
In [0,1].
getseq()
Get a sequence, uniform on the transformed scale
R6_par_ordered$getseq(n)
n
Number of points. Ignored for discrete.
isvalid()
Check if input is valid for parameter
R6_par_ordered$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_ordered$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_ordered$new(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
The values the variable can take on.
print()
Print details of the object.
R6_par_ordered$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_ordered$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- par_ordered('x1', c('a', 'b', 'c'))
class(p1)
print(p1)
R6 class for Uniform parameter
Parameter with uniform distribution for hyperparameter optimization
comparer::par_hype
-> par_unif
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
fromraw()
Function to convert from raw scale to transformed scale
R6_par_unif$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_unif$toraw(x)
x
Value of transformed scale
generate()
Generate values in the raw space based on quantiles.
R6_par_unif$generate(q)
q
In [0,1].
isvalid()
Check if input is valid for parameter
R6_par_unif$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_unif$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_unif$new(name, lower, upper)
name
Name of the parameter, must match the input to 'eval_func'.
lower
Lower bound of the parameter
upper
Upper bound of the parameter
print()
Print details of the object.
R6_par_unif$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_unif$clone(deep = FALSE)
deep
Whether to make a deep clone.
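Unlike the other parameter classes on this page, no example block is given for R6_par_unif; here is a minimal hedged sketch that uses only the methods documented above:
p1 <- R6_par_unif$new('x1', 1, 10)
p1$generate(c(0, 0.5, 1))   # quantiles mapped onto the raw [1, 10] range
p1$isvalid(5)               # check whether 5 is a valid value for this parameter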
R6 class for hyperparameter of discrete (factor) variable
comparer::par_hype
-> par_unordered
name
Name of the parameter, must match the input to 'eval_func'.
values
Vector of values
ggtrans
Transformation for ggplot, see ggplot2::scale_x_continuous()
lower
Lower bound of the parameter
upper
Upper bound of the parameter
fromraw()
Function to convert from raw scale to transformed scale
R6_par_unordered$fromraw(x)
x
Value of raw scale
toraw()
Function to convert from transformed scale to raw scale
R6_par_unordered$toraw(x)
x
Value of transformed scale
fromint()
Convert from integer index to actual value
R6_par_unordered$fromint(x)
x
Integer index
toint()
Convert from value to integer index
R6_par_unordered$toint(x)
x
Value
generate()
Generate values in the raw space based on quantiles.
R6_par_unordered$generate(q)
q
In [0,1].
getseq()
Get a sequence, uniform on the transformed scale
R6_par_unordered$getseq(n)
n
Number of points. Ignored for discrete.
isvalid()
Check if input is valid for parameter
R6_par_unordered$isvalid(x)
x
Parameter value
convert_to_mopar()
Convert this to a parameter for the mixopt R package.
R6_par_unordered$convert_to_mopar(raw_scale = FALSE)
raw_scale
Should it be on the raw scale?
new()
Create a hyperparameter with uniform distribution
R6_par_unordered$new(name, values)
name
Name of the parameter, must match the input to 'eval_func'.
values
The values the variable can take on.
print()
Print details of the object.
R6_par_unordered$print(...)
...
not used
clone()
The objects of this class are cloneable with this method.
R6_par_unordered$clone(deep = FALSE)
deep
Whether to make a deep clone.
p1 <- par_unordered('x1', c('a', 'b', 'c'))
class(p1)
print(p1)