miniml.optim
Optimization algorithms for MiniML models.
ScipyOptimizer
Bases: MiniMLOptimizer
Optimizer that wraps scipy.optimize.minimize and supports the following methods:
- 'Nelder-Mead'
- 'Powell'
- 'CG'
- 'BFGS'
- 'L-BFGS-B'
- 'Newton-CG'
- 'trust-ncg'
- 'trust-krylov'
- 'trust-constr'
- 'dogleg'
- 'trust-exact'
- 'COBYLA'
__init__(method='L-BFGS-B', options={}, tol=None)
Initialize the ScipyOptimizer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method | str | The optimization method to use. Defaults to 'L-BFGS-B'. | 'L-BFGS-B' |
| options | dict | Options to pass to scipy.optimize.minimize. Defaults to {}. | {} |
| tol | float \| None | Tolerance for termination. Defaults to None. | None |
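A minimal construction sketch. ScipyOptimizer and its arguments follow the signature above; the model, X, y, and the fit keyword are placeholders, since this page does not document the fitting API.

```python
from miniml.optim import ScipyOptimizer

# BFGS with a tighter tolerance and a cap on scipy iterations.
opt = ScipyOptimizer(
    method="BFGS",
    options={"maxiter": 500},  # forwarded to scipy.optimize.minimize
    tol=1e-6,                  # termination tolerance
)

# Hypothetical fitting call -- adapt to your MiniML model's actual API:
# model.fit(X, y, optimizer=opt)
```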
AdamOptimizer
Bases: AdamBaseOptimizer
Adaptive Moment Estimation (Adam) optimizer.
This implements the classical Adam algorithm. An optional weight_decay is applied as L2 regularisation inside the gradient (coupled weight decay).
References
- Diederik P. Kingma and Jimmy Ba. "Adam: A Method for Stochastic Optimization." ICLR 2015.
__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)
Initialize the classical Adam optimizer.
Note
Weight decay here implements L2 regularization and is therefore redundant with the reg_lambda option of MiniML models when per-parameter regularization has been set. Take care not to mix the two approaches unintentionally: if you set weight_decay > 0, pass reg_lambda=0 in the fitting call.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| alpha | float | Learning rate. Defaults to 0.001. | 0.001 |
| beta_1 | float | Exponential decay rate for the first moment estimates. Defaults to 0.9. | 0.9 |
| beta_2 | float | Exponential decay rate for the second moment estimates. Defaults to 0.999. | 0.999 |
| eps | float | Small constant for numerical stability. Defaults to 1e-8. | 1e-08 |
| weight_decay | float | L2 weight decay coefficient applied inside the gradient (coupled). Defaults to 0.0. | 0.0 |
| ortho_grad | bool | If True, project gradients to be orthogonal to the parameters at each step. Defaults to False. | False |
| tol | float | Tolerance for stopping criterion based on the norm of the first moment. Defaults to 0.0. | 0.0 |
| maxiter | int | Maximum number of iterations. Defaults to 1000. | 1000 |
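A minimal sketch of the classical Adam optimizer, using only the arguments documented above. The commented fit call and the reg_lambda keyword mirror the note on weight decay, but the fit signature itself is an assumption.

```python
from miniml.optim import AdamOptimizer

opt = AdamOptimizer(
    alpha=1e-3,         # learning rate
    beta_1=0.9,         # first-moment decay
    beta_2=0.999,       # second-moment decay
    weight_decay=1e-4,  # coupled L2 decay, folded into the gradient
    tol=1e-6,           # stop once the first-moment norm drops below this
    maxiter=5000,
)

# weight_decay already acts as L2 regularisation, so switch off the model's
# own penalty to avoid double regularisation (hypothetical call):
# model.fit(X, y, optimizer=opt, reg_lambda=0.0)
```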
AdamWOptimizer
Bases: AdamBaseOptimizer
AdamW optimizer with decoupled weight decay.
__init__(alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-08, weight_decay=0.0, ortho_grad=False, tol=0.0, maxiter=1000)
Initialize the AdamW optimizer with decoupled weight decay.
Note
Weight decay here implements decoupled regularization and overlaps with the reg_lambda option of MiniML models when per-parameter regularization has been set. Take care not to mix the two approaches unintentionally: if you set weight_decay > 0, you may want to pass reg_lambda=0 in the fitting call.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| alpha | float | Learning rate. Defaults to 0.001. | 0.001 |
| beta_1 | float | Exponential decay rate for the first moment estimates. Defaults to 0.9. | 0.9 |
| beta_2 | float | Exponential decay rate for the second moment estimates. Defaults to 0.999. | 0.999 |
| eps | float | Small constant for numerical stability. Defaults to 1e-8. | 1e-08 |
| weight_decay | float | L2 weight decay coefficient, applied in a decoupled AdamW fashion. Defaults to 0.0. | 0.0 |
| ortho_grad | bool | If True, project gradients to be orthogonal to the parameters at each step. Defaults to False. | False |
| tol | float | Tolerance for stopping criterion based on the norm of the first moment. Defaults to 0.0. | 0.0 |
| maxiter | int | Maximum number of iterations. Defaults to 1000. | 1000 |
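A minimal sketch of AdamW with decoupled weight decay. Constructor arguments come from the signature above; the fitting call is hypothetical and should be adapted to your model's actual API.

```python
from miniml.optim import AdamWOptimizer

opt = AdamWOptimizer(
    alpha=1e-3,
    weight_decay=1e-2,  # decoupled decay, applied to the parameters directly
    ortho_grad=True,    # project each gradient orthogonal to the parameters
    maxiter=2000,
)

# Hypothetical fitting call -- with weight_decay > 0 you may want reg_lambda=0:
# model.fit(X, y, optimizer=opt)
```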