candle.ckpt_pytorch_utils.CandleCkptPyTorch
- class candle.ckpt_pytorch_utils.CandleCkptPyTorch(gParams, logger='DEFAULT', verbose=True)
PyTorch Callback for CANDLE-compliant Benchmarks to use for checkpointing. Creates a JSON file alongside the weights and optimizer checkpoints that includes important metadata, particularly for restarting and tracking complex workflows.
- __init__(gParams, logger='DEFAULT', verbose=True)
- Parameters:
logger (Logger) – The logger to use. May be None to disable or “DEFAULT” to use the default.
verbose (boolean) – If True, enables more verbose logging. Passed to helper_utils.set_up_logger(verbose) for this logger.
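The three accepted forms of the logger argument (None, "DEFAULT", or a Logger instance) can be illustrated with a small standalone sketch. The helper name resolve_logger, the logger name "ckpt_sketch", and the mapping of verbose to the DEBUG level are all assumptions made for illustration; the class itself delegates to helper_utils.set_up_logger(verbose), whose behavior is not described here.

```python
import logging


def resolve_logger(logger="DEFAULT", verbose=True):
    """Hypothetical helper: turn the documented `logger` argument into
    either a logging.Logger or None (logging disabled)."""
    if logger is None:
        # None disables logging entirely.
        return None
    if isinstance(logger, str) and logger == "DEFAULT":
        # Assumption: the default logger is a named stdlib logger whose
        # level depends on `verbose`. The real default may differ.
        resolved = logging.getLogger("ckpt_sketch")
        resolved.setLevel(logging.DEBUG if verbose else logging.INFO)
        return resolved
    # Anything else is taken to be a caller-supplied Logger.
    return logger


quiet = resolve_logger(None)
default = resolve_logger("DEFAULT", verbose=False)
custom = resolve_logger(logging.getLogger("my_benchmark"))
```

Using a string sentinel for the default keeps None free to mean "no logging at all", which is presumably why the signature defaults to 'DEFAULT' rather than None.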
Methods
__init__(gParams[, logger, verbose]) – logger (Logger): The logger to use.
build_model(model_file)
checksum(dir_work) – Simple checksum dispatch. dir_work: A pathlib.Path.
checksum_file(filename) – Read file, compute checksum, return it as a string.
ckpt_epoch(epoch, metric_value) – The PyTorch training loop should call this each epoch.
clean(epoch_now) – Clean old epoch directories.
debug(message)
delete(epoch)
disabled(key) – Is this parameter set to False?
enabled(key) – Is this parameter set to True?
info(message)
keep(epoch, epoch_now, kept) – kept: Number of epochs already kept. Returns True if we are keeping this epoch, else False.
on_train_end([logs])
param(key, dflt[, type_, allowed]) – Pull key from parameters with type checks and conversions.
param_allowed(key, value, allowed) – Check that the value is in the list of allowed values. If allowed is None, there is no check, simply success.
param_type_check(key, value, type_) – Check that value is convertible to the given type.
param_type_check_bool(key, value)
param_type_check_float(key, value, type_)
param_type_check_int(key, value, type_)
relpath(p) – If Path p is relative to CWD, relativize it and return it.
report_final()
report_initial() – Simply report that we are ready to run.
restart(model[, verbose]) – Possibly restarts model from CheckpointCallback according to given settings and the ckpt-info.json.
restart_json(directory)
save_check(epoch, direction, metric_value) – Make sure we want to save this epoch based on the model metrics in given logs. Also updates epoch_best if appropriate. epoch: The current epoch (just completed). direction: either "+" (metric_value should increase) or "-" (should decrease). metric_value: The current ckpt metric value.
save_check_best(epoch, direction, metric_value)
scan_params(gParams) – Simply translate gParameters into instance fields.
set_model(model) – model: A dict with the model {'model':model, 'optimizer':optimizer}.
symlink(src, dst) – Like os.symlink, but overwrites dst and logs.
write_json(jsonfile, epoch)
write_model(dir_work, epoch) – Do the I/O, report stats. dir_work: A pathlib.Path.
write_model_backend(model, epoch)
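The direction argument of save_check is the least obvious part of the interface, so here is a minimal standalone sketch of what "+" and "-" might mean in practice. It does not import CANDLE; the function names is_improvement and best_epoch are hypothetical, and the choices to number epochs from 1 and to require a strict improvement are assumptions, not documented behavior of the class.

```python
def is_improvement(direction, metric_value, best_so_far):
    """Return True if metric_value beats best_so_far under `direction`.

    "+" means larger values are better (e.g. accuracy),
    "-" means smaller values are better (e.g. loss).
    """
    if best_so_far is None:
        # Nothing recorded yet, so the first value is the best by default.
        return True
    if direction == "+":
        return metric_value > best_so_far
    if direction == "-":
        return metric_value < best_so_far
    raise ValueError(f"direction must be '+' or '-', got {direction!r}")


def best_epoch(metrics, direction):
    """Scan per-epoch metric values and return (epoch_best, best_value).

    Assumption: epochs are numbered from 1.
    """
    epoch_best, best_value = None, None
    for epoch, value in enumerate(metrics, start=1):
        if is_improvement(direction, value, best_value):
            epoch_best, best_value = epoch, value
    return epoch_best, best_value


val_loss = [0.90, 0.70, 0.75, 0.65]
print(best_epoch(val_loss, "-"))  # (4, 0.65)
```

With a loss-like metric and direction "-", epoch 3 is passed over because 0.75 is worse than the 0.70 recorded at epoch 2, and epoch 4 becomes the new best. In a real training loop the same metric value would be handed to ckpt_epoch(epoch, metric_value) once per epoch.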