miniml.optim

Optimization algorithms for MiniML models.

ScipyOptimizer

Bases: MiniMLOptimizer

Optimizer that wraps scipy.optimize.minimize and supports the following methods:

  • 'Nelder-Mead'
  • 'Powell'
  • 'CG'
  • 'BFGS'
  • 'L-BFGS-B'
  • 'Newton-CG'
  • 'trust-ncg'
  • 'trust-krylov'
  • 'trust-constr'
  • 'dogleg'
  • 'trust-exact'
  • 'COBYLA'

__init__(method='L-BFGS-B', options={}, tol=None)

Initialize the ScipyOptimizer.

Parameters:

  • method (str): The optimization method to use. Defaults to 'L-BFGS-B'.
  • options (dict): Options to pass to scipy.optimize.minimize. Defaults to {}.
  • tol (float | None): Tolerance for termination. Defaults to None.
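
A minimal construction sketch (the argument values are arbitrary examples, and how the optimizer is passed to a MiniML model's fitting routine is an assumption not shown on this page):

```python
from miniml.optim import ScipyOptimizer

# Wrap scipy.optimize.minimize with an L-BFGS-B backend.
optimizer = ScipyOptimizer(
    method="L-BFGS-B",
    options={"maxiter": 500},  # forwarded to scipy.optimize.minimize
    tol=1e-6,                  # termination tolerance
)
# The optimizer would then be handed to a MiniML model's fitting call;
# the exact argument name used there is not documented here.
```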

AdamOptimizer

Bases: AdamBaseOptimizer

Adaptive Moment Estimation (Adam) optimizer.

This implements classical Adam. Optional weight_decay is applied via L2 regularization inside the gradient (coupled weight decay).

__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)

Initialize the classical Adam optimizer.

Note

Weight decay here is a way to implement L2 regularization, which is redundant with the reg_lambda argument of MiniML models if per-parameter regularization has been set. Be careful not to mix the two approaches unintentionally: if you use weight_decay > 0, pass reg_lambda=0 in the fitting call.

Parameters:

  • alpha (float): Learning rate. Defaults to 0.001.
  • beta_1 (float): Exponential decay rate for the first moment estimates. Defaults to 0.9.
  • beta_2 (float): Exponential decay rate for the second moment estimates. Defaults to 0.999.
  • eps (float): Small constant for numerical stability. Defaults to 1e-8.
  • weight_decay (float): L2 weight decay coefficient applied inside the gradient (coupled). Defaults to 0.0.
  • ortho_grad (bool): If True, project gradients to be orthogonal to the parameters at each step. Defaults to False.
  • tol (float): Tolerance for the stopping criterion based on the norm of the first moment. Defaults to 0.0.
  • maxiter (int): Maximum number of iterations. Defaults to 1000.
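
A minimal construction sketch illustrating the note above (argument values are arbitrary examples):

```python
from miniml.optim import AdamOptimizer

# Classical Adam with coupled (in-gradient) L2 weight decay.
adam = AdamOptimizer(
    alpha=1e-3,
    beta_1=0.9,
    beta_2=0.999,
    eps=1e-8,
    weight_decay=0.01,  # coupled L2 term added to the gradient
    tol=1e-6,           # stop when the first-moment norm falls below this
    maxiter=2000,
)
# Per the note above: with weight_decay > 0, pass reg_lambda=0 in the
# model's fitting call so the L2 penalty is not applied twice.
```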

AdamWOptimizer

Bases: AdamBaseOptimizer

AdamW optimizer with decoupled weight decay.
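
For orientation, the sketch below contrasts a decoupled (AdamW-style) update step with the coupled variant used by AdamOptimizer above. It follows the standard Adam/AdamW update rules; the actual internals of miniml.optim may differ in detail.

```python
import numpy as np

def adam_like_step(theta, grad, m, v, t, alpha=1e-3, beta_1=0.9,
                   beta_2=0.999, eps=1e-8, weight_decay=0.0,
                   decoupled=False):
    """One schematic Adam/AdamW step on a parameter vector `theta`."""
    if weight_decay > 0.0 and not decoupled:
        # Coupled weight decay: fold the L2 term into the gradient (classical Adam).
        grad = grad + weight_decay * theta
    m = beta_1 * m + (1.0 - beta_1) * grad        # first moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad**2     # second moment estimate
    m_hat = m / (1.0 - beta_1**t)                 # bias correction
    v_hat = v / (1.0 - beta_2**t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    if weight_decay > 0.0 and decoupled:
        # Decoupled weight decay: shrink the parameters directly (AdamW).
        theta = theta - alpha * weight_decay * theta
    return theta, m, v
```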

__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)

Initialize the AdamW optimizer with decoupled weight decay.

Note

Weight decay here implements decoupled regularization, which is redundant with the reg_lambda argument of MiniML models if per-parameter regularization has been set. Be careful not to mix the two approaches unintentionally: if you use weight_decay > 0, you may want to pass reg_lambda=0 in the fitting call.

Parameters:

  • alpha (float): Learning rate. Defaults to 0.001.
  • beta_1 (float): Exponential decay rate for the first moment estimates. Defaults to 0.9.
  • beta_2 (float): Exponential decay rate for the second moment estimates. Defaults to 0.999.
  • eps (float): Small constant for numerical stability. Defaults to 1e-8.
  • weight_decay (float): L2 weight decay coefficient, applied in a decoupled AdamW fashion. Defaults to 0.0.
  • ortho_grad (bool): If True, project gradients to be orthogonal to the parameters at each step. Defaults to False.
  • tol (float): Tolerance for the stopping criterion based on the norm of the first moment. Defaults to 0.0.
  • maxiter (int): Maximum number of iterations. Defaults to 1000.
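
A minimal construction sketch (argument values are arbitrary examples):

```python
from miniml.optim import AdamWOptimizer

# AdamW: weight decay applied directly to the parameters, not to the gradient.
adamw = AdamWOptimizer(
    alpha=1e-3,
    weight_decay=0.01,  # decoupled decay coefficient
    ortho_grad=False,
    tol=1e-6,
    maxiter=2000,
)
# As noted above, consider passing reg_lambda=0 in the fitting call to avoid
# combining decoupled decay with a separate L2 penalty unintentionally.
```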