Difference between revisions of "Hyperparameters"
Revision as of 12:02, 22 May 2025
- **Pixel** not patch
- How many timeslots to sub-sample when creating d-pixel
  - 16 timeslots
  - 25 timeslots
  - 40 timeslots
- Representation dimension
  - 64, **128**, or 256
- Representation length for each dimension
  - ~~FP8~~
  - ~~INT8~~
  - ~~Float16~~
  - ~~Bfloat16~~
  - **32 bits**
  - look at the distribution of representations for each dimension to see if they can be reduced
  - Matryoshka may change things
- Projector size
  - 0, 256, 512, **1024**
- Loss function
  - Barlow Twins (parameter lambda = 0.005)
  - **MMCR (parameters alpha = 0.005, lambda = 0.005)**
- Learning rate
  - **0.0001**
  - others
- Encoder type (each with its own parameters)
  - MLP
  - ResNet
  - **Transformer**
    - **8 attention heads**
    - Q, K, V same dimension as representation dimension = 128
    - **3 layers**
- How many augmentation pairs to use for each pixel
  - Training
    - **1**, 2
  - Testing (number of inferences for downstream task)
    - **1**
    - 10 (prioritise this)
      - majority vote
      - **average**
- Downstream classifier
  - **MLP**
    - Number of layers
      - **3**
  - Random Forest
  - XGBoost
  - Linear regression
  - Logistic regression
- **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data: base length
- Augmentations
  - masking of season or some blocks
  - FFT on the pixels
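
The list above can be collected into one configuration object. This is a rough sketch rather than the project's actual code. It assumes the bolded entries are the currently selected values. Every class, field and function name in it (`Hyperparameters`, `n_timeslots`, `aggregate`, and so on) is hypothetical. No timeslot count is bolded, so that field has no default. The `aggregate` helper illustrates the two test-time options, majority vote and average, for combining several inferences of the same pixel.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hyperparameters:
    """Hypothetical container; defaults mirror the bolded entries in the list."""
    n_timeslots: int                 # 16, 25 or 40; none is bolded, so no default
    representation_dim: int = 128    # 64, 128 or 256
    bits_per_dimension: int = 32     # FP8, INT8, Float16, Bfloat16 are struck out
    projector_size: int = 1024       # 0, 256, 512 or 1024
    loss: str = "mmcr"               # "barlow_twins" or "mmcr"
    loss_alpha: float = 0.005        # listed for MMCR only
    loss_lambda: float = 0.005
    learning_rate: float = 1e-4
    encoder: str = "transformer"     # "mlp", "resnet" or "transformer"
    attention_heads: int = 8
    encoder_layers: int = 3
    train_augmentation_pairs: int = 1
    test_inferences: int = 1         # 10 is marked "prioritise this"
    test_aggregation: str = "average"
    classifier: str = "mlp"
    classifier_layers: int = 3

    def __post_init__(self) -> None:
        # Only the values that appear in the list are accepted.
        if self.n_timeslots not in (16, 25, 40):
            raise ValueError("n_timeslots must be 16, 25 or 40")
        if self.representation_dim not in (64, 128, 256):
            raise ValueError("representation_dim must be 64, 128 or 256")
        if self.test_aggregation not in ("average", "majority_vote"):
            raise ValueError("unknown test_aggregation")


def aggregate(probabilities: Sequence[Sequence[float]], method: str = "average") -> int:
    """Combine per-inference class probabilities into one predicted class index."""
    if not probabilities:
        raise ValueError("need at least one inference")
    classes = range(len(probabilities[0]))
    if method == "average":
        # Average the probability of each class over all inferences, then pick the best.
        mean = [sum(p[c] for p in probabilities) / len(probabilities) for c in classes]
        return max(classes, key=mean.__getitem__)
    if method == "majority_vote":
        # Each inference votes for its own most probable class.
        votes = Counter(max(classes, key=p.__getitem__) for p in probabilities)
        return votes.most_common(1)[0][0]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    cfg = Hyperparameters(n_timeslots=25, test_inferences=10)
    # Made-up probabilities for three inferences of one pixel, two classes.
    probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    print(cfg.test_aggregation, aggregate(probs, cfg.test_aggregation))
    print("majority_vote", aggregate(probs, "majority_vote"))
```

The two aggregation methods can disagree: one confident inference can outweigh two marginal ones under averaging but not under voting, which is one reason to compare them when running 10 inferences.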