Hyperparameters
Latest revision as of 17:03, 22 May 2025
Chosen values are in bold.
* Input data is per pixel, not per patch, for both training and inference.
* Number of timeslots to sub-sample when creating a d-pixel
** 16
** 25
** '''40'''
* Representation dimension
** 64
** '''128'''
** 256
* Representation length (numeric precision) for each dimension
** FP8
** INT8
** Float16
** Bfloat16
** '''32 bits'''. However, we will need to look at the distribution of representation values in each dimension to see whether the precision can be reduced, and Matryoshka may change this.
* Projector size
** 0
** 256
** 512
** '''1024'''
* Loss function
** Barlow Twins (parameter lambda = 0.005)
** '''MMCR (parameters alpha = 0.005, lambda = 0.005)'''
* Learning rate
** '''0.0001'''
* Encoder type
** MLP
** ResNet50
** '''Transformer'''
*** '''8 attention heads'''
*** '''Q, K, V same dimension as representation dimension = 128'''
*** '''3 layers'''
* Number of augmentation pairs to use for each pixel
** Training
*** '''1'''
*** 2
** Inference
*** 1
*** 10
**** majority vote
**** '''average'''
* Downstream classifier
** '''MLP with 3 layers'''
** Random Forest
** XGBoost
** Linear regression
** Logistic regression
* Seasonal masking
** Yes
** No
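
As a rough illustration, the chosen values from the list above can be collected in a single configuration object, and the two ways of combining 10 inference augmentations can be compared side by side. This is only a sketch: every class, field, and function name below is hypothetical and not taken from the project code. It also assumes that "average" means averaging per-class scores across augmentations before picking a class, which this page does not confirm (the average could equally be taken over representations). Seasonal masking is left out because the list marks no chosen value for it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ChosenHyperparameters:
    """Chosen values from the list above; all field names are hypothetical."""
    timeslots_per_d_pixel: int = 40
    representation_dim: int = 128
    representation_bits: int = 32        # may be reduced later, per the note above
    projector_size: int = 1024
    loss: str = "MMCR"
    mmcr_alpha: float = 0.005
    mmcr_lambda: float = 0.005
    learning_rate: float = 1e-4
    encoder: str = "transformer"
    attention_heads: int = 8
    qkv_dim: int = 128                   # same as representation_dim
    encoder_layers: int = 3
    train_augmentation_pairs: int = 1
    inference_aggregation: str = "average"
    downstream_classifier: str = "MLP with 3 layers"


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_then_argmax(score_rows):
    """Average per-class scores over augmentations, then pick the top class.

    Assumption: each row holds one augmentation's per-class scores.
    """
    n = len(score_rows)
    means = [sum(col) / n for col in zip(*score_rows)]
    return max(range(len(means)), key=means.__getitem__)


if __name__ == "__main__":
    cfg = ChosenHyperparameters()
    # The representation dimension must split evenly across attention heads.
    assert cfg.representation_dim % cfg.attention_heads == 0

    # Three augmentations of one pixel, two classes (made-up scores).
    scores = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
    votes = [average_then_argmax([row]) for row in scores]
    print("majority vote:", majority_vote(votes))
    print("average:", average_then_argmax(scores))
```

The two aggregation rules can disagree when one augmentation is much more confident than the others, which is one reason the choice between them is listed as a hyperparameter.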