Hyperparameters
- **Pixel** not patch
- How many timeslots to sub-sample when creating d-pixel
  - 16 timeslots
  - 25 timeslots
  - 40 timeslots
- Representation dimension
  - 64, **128**, or 256
- Representation length for each dimension
  - ~~FP8~~
  - ~~INT8~~
  - ~~Float16~~
  - ~~Bfloat16~~
  - **32 bits**
  - look at the distribution of representations for each dimension to see if they can be reduced
  - Matryoshka may change things
- Projector size
  - 0, 256, 512, **1024**
- Loss function
  - Barlow Twins (parameter lambda = 0.005)
  - **MMCR (parameters alpha = 0.005, lambda = 0.005)**
- Learning rate
  - **0.0001**
  - others
- Encoder type (each with its own parameters)
  - MLP
  - ResNet
  - **Transformer**
    - **8 attention heads**
    - Q, K, V same dimension as representation dimension = 128
    - **3 layers**
- How many augmentation pairs to use for each pixel
  - Training
    - **1**, 2
  - Testing (number of inferences for downstream task)
    - **1**
    - 10 (prioritise this)
      - majority vote
      - **average**
- Downstream classifier
- **MLP** - Number of layers - **3** - Random Forest - XGBoost - Linear regression - Logistic regression
- **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data: base length
- Augmentations
  - masking of season or some blocks
  - FFT on the pixels
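The bolded choices in the list can be collected into a single configuration object. The sketch below is one possible way to do that in Python; the class name `PixelSSLConfig` and all field names are hypothetical, and only the values come from the list. No timeslot count is marked as chosen, so that field is left unset.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class PixelSSLConfig:
    """Hypothetical container; field names are illustrative, values are the bolded choices."""

    input_unit: str = "pixel"              # pixel, not patch
    timeslots: Optional[int] = None        # 16, 25 or 40; none is marked as chosen
    representation_dim: int = 128
    representation_bits: int = 32
    projector_size: int = 1024
    loss: str = "mmcr"
    loss_alpha: float = 0.005
    loss_lambda: float = 0.005
    learning_rate: float = 1e-4
    encoder: str = "transformer"
    attention_heads: int = 8
    qkv_dim: int = 128                     # same as representation_dim
    encoder_layers: int = 3
    train_augmentation_pairs: int = 1
    test_inferences: int = 1
    test_aggregation: str = "average"
    downstream_classifier: str = "mlp"
    downstream_layers: int = 3


config = PixelSSLConfig()
config_dict = asdict(config)  # plain dict, e.g. for logging a run
```

Freezing the dataclass keeps a run's settings immutable; a variant can be derived with `dataclasses.replace(config, test_inferences=10)`.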
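The timeslot sub-sampling option can be illustrated with a short NumPy sketch. The list does not say how the timeslots are picked, so this assumes uniform random sampling without replacement, with the chosen timeslots kept in chronological order. The function name `subsample_timeslots` and the `(timeslots, bands)` array layout are assumptions made for the example.

```python
import numpy as np


def subsample_timeslots(series, n_timeslots, rng):
    """Pick n_timeslots rows from a (T, C) pixel time series.

    Assumption: uniform sampling without replacement, order preserved.
    Returns the sub-sampled series and the selected time indices.
    """
    total = series.shape[0]
    if n_timeslots > total:
        raise ValueError("cannot sample more timeslots than are available")
    idx = np.sort(rng.choice(total, size=n_timeslots, replace=False))
    return series[idx], idx


# Example: one pixel with 60 timeslots and 10 bands (made-up sizes).
rng = np.random.default_rng(0)
series = np.arange(60 * 10, dtype=float).reshape(60, 10)
sub, idx = subsample_timeslots(series, 16, rng)
```

Calling the function twice on the same pixel with different random draws gives two different views of that pixel, which is one way an augmentation pair could be formed.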
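For reference, the Barlow Twins option with lambda = 0.005 can be sketched as follows. This is a minimal NumPy version of the commonly published form of the loss (standardise each embedding dimension over the batch, build the cross-correlation matrix, push the diagonal towards 1 and the off-diagonal towards 0). It is not taken from this project's code, and MMCR is not sketched here.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lam=0.005, eps=1e-12):
    """Barlow Twins loss for two (N, D) batches of projector outputs."""
    n = z_a.shape[0]
    # Standardise every dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # D x D cross-correlation matrix between the two views.
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)           # invariance term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy-reduction term
    return float(on_diag + lam * off_diag)


# Two identical views with decorrelated dimensions give a loss near zero.
z = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
loss_identical = barlow_twins_loss(z, z)

# Unrelated views are penalised mainly through the diagonal term.
rng = np.random.default_rng(0)
loss_unrelated = barlow_twins_loss(rng.normal(size=(256, 8)), rng.normal(size=(256, 8)))
```

With a small lambda such as 0.005, the invariance term dominates unless the projector output is large, since the number of off-diagonal entries grows with the square of the projector size.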
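The two test-time aggregation options (majority vote and average) can give different answers for the same pixel. The sketch below assumes that each inference yields a vector of class probabilities from the downstream classifier and that "average" means averaging those probabilities; averaging the representations before classification would be another reading. The function names are hypothetical.

```python
import numpy as np


def aggregate_average(probs):
    """probs: (n_inferences, n_classes). Predict from the mean probability."""
    return int(np.argmax(probs.mean(axis=0)))


def aggregate_majority(probs):
    """probs: (n_inferences, n_classes). Predict the most frequent per-inference class."""
    votes = np.argmax(probs, axis=1)
    counts = np.bincount(votes, minlength=probs.shape[1])
    return int(np.argmax(counts))  # ties resolve to the lowest class index


# Three inferences for one pixel, two classes (made-up numbers).
probs = np.array([
    [0.45, 0.55],
    [0.45, 0.55],
    [0.95, 0.05],
])
pred_average = aggregate_average(probs)    # one confident inference outweighs two weak ones
pred_majority = aggregate_majority(probs)  # two of three inferences vote for class 1
```

Averaging keeps the confidence of each inference, while majority voting discards it, which is why the two can disagree as in this example.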