miniml.optim
Optimization algorithms for MiniML models.
ScipyOptimizer
Bases: MiniMLOptimizer
Optimizer that wraps scipy.optimize.minimize and supports the following methods:
- 'Nelder-Mead'
- 'Powell'
- 'CG'
- 'BFGS'
- 'L-BFGS-B'
- 'Newton-CG'
- 'trust-ncg'
- 'trust-krylov'
- 'trust-constr'
- 'dogleg'
- 'trust-exact'
- 'COBYLA'
__init__(method='L-BFGS-B', options={}, tol=None)
Initialize the ScipyOptimizer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method | str | The optimization method to use. Defaults to 'L-BFGS-B'. | 'L-BFGS-B' |
| options | dict | Options to pass to scipy.optimize.minimize. Defaults to {}. | {} |
| tol | float \| None | Tolerance for termination. Defaults to None. | None |
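A minimal construction sketch. ScipyOptimizer and its arguments follow the signature above; the model, X, y, and the fit keyword are placeholders, since this page does not document the fitting API.

```python
from miniml.optim import ScipyOptimizer

# BFGS with a tighter tolerance and a cap on scipy iterations.
opt = ScipyOptimizer(
    method="BFGS",
    options={"maxiter": 500},  # forwarded to scipy.optimize.minimize
    tol=1e-6,                  # termination tolerance
)

# Hypothetical fitting call -- adapt to your MiniML model's actual API:
# model.fit(X, y, optimizer=opt)
```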
AdamOptimizer
Bases: AdamBaseOptimizer
Adaptive Moment Estimation (Adam) optimizer.
This implements the classical Adam algorithm. An optional weight_decay is applied as L2 regularisation inside the gradient (coupled weight decay).
References
- Diederik P. Kingma and Jimmy Ba. "Adam: A Method for Stochastic Optimization." ICLR 2015.
__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)
Initialize the classical Adam optimizer.
Note
Weight decay here implements L2 regularization and is therefore redundant with the reg_lambda option of MiniML models when per-parameter regularization has been set. Take care not to mix the two approaches unintentionally: if you set weight_decay > 0, pass reg_lambda=0 in the fitting call.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| alpha | float | Learning rate. Defaults to 0.001. | 0.001 |
| beta_1 | float | Exponential decay rate for the first moment estimates. Defaults to 0.9. | 0.9 |
| beta_2 | float | Exponential decay rate for the second moment estimates. Defaults to 0.999. | 0.999 |
| eps | float | Small constant for numerical stability. Defaults to 1e-8. | 1e-08 |
| weight_decay | float | L2 weight decay coefficient applied inside the gradient (coupled). Defaults to 0.0. | 0.0 |
| ortho_grad | bool | If True, project gradients to be orthogonal to the parameters at each step. Defaults to False. | False |
| tol | float | Tolerance for stopping criterion based on the norm of the first moment. Defaults to 0.0. | 0.0 |
| maxiter | int | Maximum number of iterations. Defaults to 1000. | 1000 |
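A minimal sketch of the classical Adam optimizer, using only the arguments documented above. The commented fit call and the reg_lambda keyword mirror the note on weight decay, but the fit signature itself is an assumption.

```python
from miniml.optim import AdamOptimizer

opt = AdamOptimizer(
    alpha=1e-3,         # learning rate
    beta_1=0.9,         # first-moment decay
    beta_2=0.999,       # second-moment decay
    weight_decay=1e-4,  # coupled L2 decay, folded into the gradient
    tol=1e-6,           # stop once the first-moment norm drops below this
    maxiter=5000,
)

# weight_decay already acts as L2 regularisation, so switch off the model's
# own penalty to avoid double regularisation (hypothetical call):
# model.fit(X, y, optimizer=opt, reg_lambda=0.0)
```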
AdamWOptimizer
Bases: AdamBaseOptimizer
AdamW optimizer with decoupled weight decay.
__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)
Initialize the AdamW optimizer with decoupled weight decay.
Note
Weight decay here implements decoupled regularization and overlaps with the reg_lambda option of MiniML models when per-parameter regularization has been set. Take care not to mix the two approaches unintentionally: if you set weight_decay > 0, you may want to pass reg_lambda=0 in the fitting call.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| alpha | float | Learning rate. Defaults to 0.001. | 0.001 |
| beta_1 | float | Exponential decay rate for the first moment estimates. Defaults to 0.9. | 0.9 |
| beta_2 | float | Exponential decay rate for the second moment estimates. Defaults to 0.999. | 0.999 |
| eps | float | Small constant for numerical stability. Defaults to 1e-8. | 1e-08 |
| weight_decay | float | L2 weight decay coefficient, applied in a decoupled AdamW fashion. Defaults to 0.0. | 0.0 |
| ortho_grad | bool | If True, project gradients to be orthogonal to the parameters at each step. Defaults to False. | False |
| tol | float | Tolerance for stopping criterion based on the norm of the first moment. Defaults to 0.0. | 0.0 |
| maxiter | int | Maximum number of iterations. Defaults to 1000. | 1000 |
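A minimal sketch of AdamW with decoupled weight decay. Constructor arguments come from the signature above; the fitting call is hypothetical and should be adapted to your model's actual API.

```python
from miniml.optim import AdamWOptimizer

opt = AdamWOptimizer(
    alpha=1e-3,
    weight_decay=1e-2,  # decoupled decay, applied to the parameters directly
    ortho_grad=True,    # project each gradient orthogonal to the parameters
    maxiter=2000,
)

# Hypothetical fitting call -- with weight_decay > 0 you may want reg_lambda=0:
# model.fit(X, y, optimizer=opt)
```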