candle.ckpt_pytorch_utils.CandleCkptPyTorch

class candle.ckpt_pytorch_utils.CandleCkptPyTorch(gParams, logger='DEFAULT', verbose=True)

PyTorch Callback for CANDLE-compliant Benchmarks to use for checkpointing. Creates a JSON file alongside the weights and optimizer checkpoints that includes important metadata, particularly for restarting and tracking complex workflows.

__init__(gParams, logger='DEFAULT', verbose=True)
Parameters:
  • logger (Logger) – The logger to use. May be None to disable or “DEFAULT” to use the default.

  • verbose (boolean) – If True, use more verbose logging. Passed to helper_utils.set_up_logger(verbose) for this logger.
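
A minimal construction sketch (the ckpt_* keys shown in gParams are illustrative assumptions, not a definitive list):

    from candle.ckpt_pytorch_utils import CandleCkptPyTorch

    # Hypothetical hyperparameter dict; the exact ckpt_* keys your
    # workflow needs may differ.
    gParams = {"ckpt_save_best": True, "ckpt_save_interval": 1}

    ckpt = CandleCkptPyTorch(gParams)  # uses the default logger
    quiet = CandleCkptPyTorch(gParams, logger=None, verbose=False)  # logging off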

Methods

__init__(gParams[, logger, verbose])

logger (Logger): The logger to use.

build_model(model_file)

checksum(dir_work)

Simple checksum dispatch. dir_work: A pathlib.Path

checksum_file(filename)

Read file, compute checksum, return it as a string.
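
A rough standalone equivalent (the choice of hash algorithm here is an assumption):

    import hashlib

    def checksum_file_sketch(filename):
        # Stream the file in 1 MB chunks and return the digest as a hex string.
        h = hashlib.md5()
        with open(filename, "rb") as fp:
            for chunk in iter(lambda: fp.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()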

ckpt_epoch(epoch, metric_value)

The PyTorch training loop should call this each epoch
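
A minimal per-epoch sketch, continuing the construction example above (train_one_epoch and evaluate are placeholders for your own loop and metric):

    for epoch in range(initial_epoch, epochs):
        train_one_epoch(model, optimizer)  # placeholder: one pass over the data
        val_loss = evaluate(model)         # placeholder: the checkpoint metric
        ckpt.ckpt_epoch(epoch, val_loss)   # may save and/or clean this epoch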

clean(epoch_now)

Clean old epoch directories

debug(message)

delete(epoch)

disabled(key)

Is this parameter set to False?

enabled(key)

Is this parameter set to True?

info(message)

keep(epoch, epoch_now, kept)

kept: Number of epochs already kept. Returns True if we are keeping this epoch, else False.
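
An illustrative policy of this shape (the real rule is driven by the ckpt_* settings; keep_limit and epoch_best here are assumed inputs):

    def keep_sketch(epoch, epoch_now, kept, epoch_best, keep_limit):
        # Hypothetical rule: always keep the newest and best epochs,
        # otherwise keep only while under the configured limit.
        if epoch in (epoch_now, epoch_best):
            return True
        return kept < keep_limit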

on_train_end([logs])

param(key, dflt[, type_, allowed])

Pull key from parameters with type checks and conversions
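
A hypothetical call, continuing the sketches above (the key name and default are illustrative):

    # Pull an integer-valued setting from the parameters, with a fallback
    # default and a type check/conversion.
    interval = ckpt.param("ckpt_save_interval", 1, type_=int)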

param_allowed(key, value, allowed)

Check that the value is in the list of allowed values. If allowed is None, there is no check, simply success.
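
A hypothetical check (the key, value, and allowed list are illustrative):

    # Succeeds: "best" is in the allowed list. Passing allowed=None
    # would skip the check entirely.
    ckpt.param_allowed("ckpt_mode", "best", ["off", "best", "all"])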

param_type_check(key, value, type_)

Check that value is convertible to the given type.

param_type_check_bool(key, value)

param_type_check_float(key, value, type_)

param_type_check_int(key, value, type_)

relpath(p)

If Path p is relative to CWD, relativize it and return it.
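
A rough standalone equivalent using pathlib (the exact behavior is an assumption):

    from pathlib import Path

    def relpath_sketch(p: Path) -> Path:
        # If p is under the current working directory, return it relative
        # to CWD; otherwise return it unchanged.
        try:
            return p.relative_to(Path.cwd())
        except ValueError:
            return p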

report_final()

report_initial()

Simply report that we are ready to run

restart(model[, verbose])

Possibly restarts the model from a saved checkpoint according to the given settings and the ckpt-info.json
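
A typical restart sketch (reading an "epoch" field from the returned JSON dict is an assumption about its contents):

    import torch

    net = build_net()  # placeholder: construct your PyTorch model
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

    ckpt.set_model({"model": net, "optimizer": optimizer})
    initial_epoch = 0
    J = ckpt.restart(net)  # loads saved state if a checkpoint exists
    if J is not None:
        initial_epoch = J["epoch"]  # assumed field in ckpt-info.json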

restart_json(directory)

save_check(epoch, direction, metric_value)

Make sure we want to save this epoch based on the model metrics in the given logs. Also updates epoch_best if appropriate. epoch: The current epoch (just completed). direction: either "+" (metric_value should increase) or "-" (should decrease). metric_value: The current ckpt metric value.
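
For example, with a loss-like metric that should decrease (a direct call shown only to illustrate the direction argument; that ckpt_epoch normally drives this internally is an assumption):

    # "-": lower metric_value is better (e.g., validation loss);
    # "+": higher is better (e.g., accuracy).
    ckpt.save_check(epoch, "-", val_loss)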

save_check_best(epoch, direction, metric_value)

scan_params(gParams)

Simply translate gParameters into instance fields

set_model(model)

model: A dict with the model {'model':model, 'optimizer':optimizer}

symlink(src, dst)

Like os.symlink, but overwrites dst and logs
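
A rough standalone equivalent, minus the logging:

    import os

    def symlink_overwrite(src, dst):
        # Replace any existing file or link at dst, then point dst at src.
        if os.path.lexists(dst):
            os.remove(dst)
        os.symlink(src, dst)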

write_json(jsonfile, epoch)

write_model(dir_work, epoch)

Do the I/O, report stats. dir_work: A pathlib.Path

write_model_backend(model, epoch)