Hyperparameters

From tessera
Revision as of 11:49, 22 May 2025 by Keshav
   - **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data: base length
       - augmentations
           - masking of season or some blocks
           - FFT on the pixels
   - 16 timeslot sub-sample
   - 25 timeslot sub-sample
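
The 16- and 25-timeslot sub-sampling options above could be sketched as follows. This is a minimal illustration, not the project's code: it assumes each pixel comes with a list of cloud-free observations in temporal order, and that two independent draws from the same pixel form an augmentation pair. The function name `subsample_timeslots` and the placeholder series are hypothetical.

```python
import random

def subsample_timeslots(observations, length, rng):
    """Pick `length` observations at random, preserving temporal order."""
    if len(observations) < length:
        raise ValueError("not enough cloud-free observations for this length")
    keep = sorted(rng.sample(range(len(observations)), length))
    return [observations[i] for i in keep]

# Hypothetical pixel with 40 cloud-free acquisitions (values are placeholders).
pixel_series = list(range(40))
rng = random.Random(0)

# Two independent draws from the same pixel (assumed to act as an augmentation pair).
view_a = subsample_timeslots(pixel_series, 16, rng)
view_b = subsample_timeslots(pixel_series, 16, rng)
```

Raising an error when a pixel has too few observations is one possible policy; padding or skipping the pixel would be alternatives, and the notes do not say which is intended.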

- Representation dimension

   - 64, **128**, or 256

- ~~Representation length for each dimension~~

   - ~~FP8~~
   - ~~INT8~~
   - ~~Float16~~
   - ~~Bfloat16~~
   - **32 bits**
       - look at the distribution of representations for each dimension to see if they can be reduced
       - Matryoshka may change things
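
One way to "look at the distribution of representations for each dimension" is to report the value range per dimension together with the error introduced by rounding to a narrower format. The sketch below uses Float16 as the candidate format via the standard-library `struct` format code `e`. The helper names and the tiny placeholder array are hypothetical, and which error level counts as acceptable is left open.

```python
import struct

def float16_roundtrip(x):
    """Round a Python float to IEEE 754 half precision and back.

    struct raises OverflowError if x is outside the float16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

def per_dimension_report(representations):
    """Return (min, max, worst float16 round-trip error) for each dimension."""
    report = []
    for column in zip(*representations):
        worst = max(abs(v - float16_roundtrip(v)) for v in column)
        report.append((min(column), max(column), worst))
    return report

# Placeholder representations: 3 pixels x 2 dimensions (not real data).
reps = [[0.5, -1.25], [0.1, 2.0], [-0.75, 0.3]]
report = per_dimension_report(reps)
```

A dimension with a narrow range and a small worst-case error would be a candidate for reduced precision; the effect on the downstream task would still need to be measured.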

- Projector size

   - 0, 256, 512, **1024**

- Loss function

   - Barlow Twins (parameter lambda = 0.005)
   - **MMCR (parameters alpha=0.005, lambda=0.005)**
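
To make the role of lambda concrete, here is a minimal pure-Python sketch of the Barlow Twins objective (the non-selected option above; MMCR is not shown). It follows the commonly published form of the loss: standardize each dimension over the batch, build the cross-correlation matrix between the two views, push the diagonal towards 1 and weight the squared off-diagonal terms by lambda. The function names are hypothetical, and a real training run would use a tensor library rather than Python lists.

```python
from statistics import fmean, pstdev

def standardize(batch):
    """Zero-mean, unit-variance normalisation of each dimension over the batch."""
    columns = []
    for column in zip(*batch):
        mean, std = fmean(column), pstdev(column)
        std = std if std > 0 else 1.0  # guard against constant dimensions
        columns.append([(v - mean) / std for v in column])
    return columns  # one list per dimension, each running over the batch

def barlow_twins_loss(z_a, z_b, lam=0.005):
    """On-diagonal term plus lam times the off-diagonal redundancy term."""
    cols_a, cols_b = standardize(z_a), standardize(z_b)
    n = len(z_a)
    on_diag = off_diag = 0.0
    for i, a in enumerate(cols_a):
        for j, b in enumerate(cols_b):
            c_ij = sum(x * y for x, y in zip(a, b)) / n  # cross-correlation entry
            if i == j:
                on_diag += (1.0 - c_ij) ** 2
            else:
                off_diag += c_ij ** 2
    return on_diag + lam * off_diag

# Toy batch of 4 samples x 2 decorrelated dimensions (placeholder values).
z = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
loss_identical_views = barlow_twins_loss(z, z)
```

With identical views and decorrelated dimensions, as in the toy batch, both terms vanish, which is the configuration the loss rewards.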

- Learning rate

   - **0.0001**
   - others, chosen by Frank depending on the data size

- Encoder type (each with its own parameters)

   - MLP
   - ResNet
   - **Transformer**
       - **8 attention heads**
       - Q, K, V same dimension as representation dimension = 128
       - **3 layers**
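
The selected Transformer settings can be captured in a small configuration object. This is a sketch only: the class name `TransformerEncoderConfig` and its fields are hypothetical, and the per-head width assumes the standard multi-head convention in which the 128-wide Q, K, V projections are split evenly across heads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerEncoderConfig:
    representation_dim: int = 128  # Q, K, V width, equal to the representation dimension
    num_heads: int = 8
    num_layers: int = 3

    def __post_init__(self):
        # An even split across heads requires divisibility.
        if self.representation_dim % self.num_heads != 0:
            raise ValueError("representation_dim must be divisible by num_heads")

    @property
    def head_dim(self):
        """Width of each attention head under an even split (assumption)."""
        return self.representation_dim // self.num_heads

config = TransformerEncoderConfig()
```

Under that assumption each of the 8 heads works on 16 dimensions, so the other candidate representation dimensions (64 and 256) also divide evenly.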

- How many augmentation pairs to use for each pixel

   - Training
       - **1**, 2
   - Testing (number of inferences for downstream task)
       - **1**
       - 10 (prioritise this)
           - majority vote
           - **average**
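
The two ways of combining multiple test-time inferences can disagree, which is why the choice matters. The sketch below assumes each inference yields a vector of class probabilities from the downstream classifier; the function names and the three placeholder inferences (rather than 10, for brevity) are hypothetical.

```python
from collections import Counter

def average_vote(probabilities):
    """Average class probabilities over inferences, then take the argmax."""
    n = len(probabilities)
    mean = [sum(col) / n for col in zip(*probabilities)]
    return max(range(len(mean)), key=mean.__getitem__)

def majority_vote(probabilities):
    """Take the argmax of each inference, then the most common class."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return Counter(labels).most_common(1)[0][0]

# Placeholder outputs for 3 inferences over 2 classes (not real results).
runs = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
by_majority = majority_vote(runs)  # two weak votes for class 0
by_average = average_vote(runs)    # one confident vote for class 1 dominates
```

Averaging keeps the confidence of each inference, while majority voting discards it; an alternative reading of "average" is averaging the representations before classification, which this sketch does not cover.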

- Downstream classifier

   - **MLP**
       - Number of layers
           - **3**
   - Random Forest
   - XGBoost
   - Linear regression
   - Logistic regression

- Input unit: **pixel**, not patch