Hyperparameters
- **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data:
  - base length
  - augmentations
  - masking of season or some blocks
  - FFT on the pixels
  - 16 timeslot sub-sample
  - 25 timeslot sub-sample
- Representation dimension
  - 64, **128**, or 256
- ~~Representation length for each dimension~~
  - ~~FP8~~
  - ~~INT8~~
  - ~~Float16~~
  - ~~Bfloat16~~
  - **32 bits**
  - look at the distribution of representations for each dimension to see if they can be reduced
  - Matryoshka may change things
- Projector size
  - 0, 256, 512, **1024**
- Loss function
  - Barlow Twins (parameter lambda = 0.005)
  - **MMCR (parameters alpha = 0.005, lambda = 0.005)**
- Learning rate
  - **0.0001**
  - others
  - chosen by Frank
  - depends on the data size
- Encoder type (each with its own parameters)
  - MLP
  - ResNet
  - **Transformer**
    - **8 attention heads**
    - Q, K, V same dimension as representation dimension = 128
    - **3 layers**
- How many augmentation pairs to use for each pixel
  - Training
    - **1**, 2
  - Testing (number of inferences for downstream task)
    - **1**
    - 10 (prioritise this)
    - majority vote
    - **average**
- Downstream classifier
  - **MLP**
    - Number of layers: **3**
  - Random Forest
  - XGBoost
  - Linear regression
  - Logistic regression
- **Pixel** not patch
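
The bolded choices above could be collected into a single configuration object. The following is only a minimal sketch: the class name, field names, and the `aggregate_predictions` helper are hypothetical and not taken from the project's code. It also shows how the two test-time options (average versus majority vote) can disagree when several inferences are run per pixel.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Bolded values from the list above. All field names are hypothetical."""
    representation_dim: int = 128        # alternatives listed: 64, 256
    representation_bits: int = 32        # lower precisions are struck out
    projector_size: int = 1024           # alternatives listed: 0, 256, 512
    loss: str = "mmcr"                   # alternative listed: "barlow_twins"
    mmcr_alpha: float = 0.005
    mmcr_lambda: float = 0.005
    learning_rate: float = 1e-4
    encoder: str = "transformer"         # alternatives listed: "mlp", "resnet"
    attention_heads: int = 8
    qkv_dim: int = 128                   # same as representation_dim
    encoder_layers: int = 3
    train_augmentation_pairs: int = 1    # alternative listed: 2
    test_inferences: int = 1             # alternative listed: 10
    test_aggregation: str = "average"    # alternative listed: "majority_vote"
    downstream_classifier: str = "mlp"
    classifier_layers: int = 3

    def __post_init__(self):
        # Reject aggregation modes that the list above does not mention.
        if self.test_aggregation not in ("average", "majority_vote"):
            raise ValueError(f"unknown aggregation: {self.test_aggregation}")


def aggregate_predictions(probabilities, method="average"):
    """Combine per-inference class probabilities for one pixel into a class index.

    `probabilities` is a list with one entry per inference; each entry is a
    list of class probabilities. This helper is illustrative only.
    """
    if not probabilities:
        raise ValueError("need at least one inference")
    n_classes = len(probabilities[0])
    if method == "average":
        # Average the probabilities per class, then take the arg-max.
        means = [sum(p[c] for p in probabilities) / len(probabilities)
                 for c in range(n_classes)]
        return max(range(n_classes), key=means.__getitem__)
    if method == "majority_vote":
        # Each inference votes for its own arg-max class.
        votes = Counter(max(range(n_classes), key=p.__getitem__)
                        for p in probabilities)
        return votes.most_common(1)[0][0]
    raise ValueError(f"unknown aggregation: {method}")
```

With three inferences such as `[[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]`, averaging favours class 0 because one inference is very confident, while majority vote favours class 1. That difference only matters when more than one inference is run at test time.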