Hyperparameters

Revision as of 12:02, 22 May 2025

- **Pixel**, not patch

- How many timeslots to sub-sample when creating a d-pixel (see the sketch after this list)

   - 16 timeslots
   - 25 timeslots
   - 40 timeslots
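
A minimal sketch of this sub-sampling step, assuming each pixel's time series is stored as a `[timeslots, bands]` array; the function name, shapes, and uniform random sampling are illustrative assumptions, not the tessera implementation:

```python
import numpy as np

def subsample_timeslots(pixel_series, n_slots=16, rng=None):
    """Draw a fixed number of timeslots from one pixel's time series
    (shape [T, n_bands]) to build the d-pixel input; purely illustrative."""
    rng = rng if rng is not None else np.random.default_rng()
    T = pixel_series.shape[0]
    # Sample without replacement, then keep the chosen slots in temporal order.
    idx = np.sort(rng.choice(T, size=min(n_slots, T), replace=False))
    return pixel_series[idx]

# e.g. a year of 10-band observations sub-sampled to 16 timeslots
d_pixel = subsample_timeslots(np.random.rand(73, 10), n_slots=16)  # (16, 10)
```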

- Representation dimension

   - 64, **128**, or 256

- Representation length for each dimension

   - ~~FP8~~
   - ~~INT8~~
   - ~~Float16~~
   - ~~Bfloat16~~
   - **32 bits**
       - look at the distribution of representations for each dimension to see whether their precision can be reduced (sketched below)
       - Matryoshka representation learning may change this
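
A rough sketch of that per-dimension inspection, assuming an `[N, 128]` float32 array of representations; the helper name and the float16 range check are illustrative assumptions, not a tessera utility:

```python
import numpy as np

def per_dimension_stats(reps):
    """Summarise each representation dimension (reps: [N, D] float32) so the
    value distributions can be inspected before trying a lower-precision
    format; thresholds and names are illustrative only."""
    return {
        "min": reps.min(axis=0),
        "max": reps.max(axis=0),
        "std": reps.std(axis=0),
        # crude check: does the per-dimension range stay well inside the
        # float16 representable range (~6.5e4)?
        "fits_float16": np.abs(reps).max(axis=0) < 6.0e4,
    }

stats = per_dimension_stats(np.random.randn(10_000, 128).astype(np.float32))
print(stats["fits_float16"].mean())  # fraction of dimensions safe for float16
```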

- Projector size

   - 0, 256, 512, **1024**
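
A sketch of what a projector head could look like for these sizes, assuming PyTorch and 128-d representations; the two-layer MLP layout and the `make_projector` name are assumptions, not the tessera code, and a projector size of 0 is read here as "no projector":

```python
import torch.nn as nn

def make_projector(rep_dim=128, proj_dim=1024):
    """Projector head mapping the representation into the space seen by the
    SSL loss (proj_dim in {0, 256, 512, 1024}); layout is an assumption."""
    if proj_dim == 0:
        return nn.Identity()  # size 0: the loss acts on the representation itself
    return nn.Sequential(
        nn.Linear(rep_dim, proj_dim),
        nn.BatchNorm1d(proj_dim),
        nn.ReLU(inplace=True),
        nn.Linear(proj_dim, proj_dim),
    )
```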

- Loss function

   - Barlow Twins (parameter lambda = 0.005)
   - **MMCR (parameters alpha=0.005, lambda=0.005)**
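
For reference, a standard PyTorch sketch of the Barlow Twins objective with the lambda = 0.005 setting listed above; the selected MMCR loss is not reproduced here, and this is not the tessera training code:

```python
import torch

def barlow_twins_loss(z_a, z_b, lam=0.005, eps=1e-6):
    """Barlow Twins objective on two projected views z_a, z_b of shape [N, D];
    lam weights the off-diagonal (redundancy-reduction) term."""
    n = z_a.shape[0]
    # normalise every dimension across the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    c = (z_a.T @ z_b) / n                        # D x D cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag
```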

- Learning rate

   - **0.0001**
   - others

- Encoder type (each with its own parameters)

   - MLP
   - ResNet
   - **Transformer**
       - **8 attention heads**
       - Q, K, V same dimension as representation dimension = 128
       - **3 layers**
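
A PyTorch sketch of the bold encoder choice, assuming the d-pixel has already been projected to the 128-d model width and that a representation is obtained by averaging over timeslots; the feed-forward width and the pooling are assumptions, and the Adam optimizer with the bold learning rate of 0.0001 is included for completeness:

```python
import torch
import torch.nn as nn

rep_dim = 128
encoder_layer = nn.TransformerEncoderLayer(
    d_model=rep_dim,       # Q, K, V dimension = representation dimension
    nhead=8,               # 8 attention heads
    dim_feedforward=256,   # assumption: feed-forward width is not given on this page
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)  # 3 layers

# d-pixel batch: (batch, timeslots, rep_dim) after an input projection
x = torch.randn(32, 16, rep_dim)
reps = encoder(x).mean(dim=1)        # pool over timeslots -> (32, 128)

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)  # learning rate 0.0001
```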

- How many augmentation pairs to use for each pixel

   - Training
       - **1**, 2
   - Testing (number of inferences for downstream task)
       - **1**
       - 10 (prioritise this)
           - majority vote
           - **average**
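
A sketch of the 10-inference test-time option, assuming a trained `encoder` and an `augment` function that produces one view of a pixel batch (both placeholders); averaging the representations is the bold choice, whereas majority voting would instead be applied to the downstream classifier's predictions:

```python
import torch

@torch.no_grad()
def infer_with_views(encoder, batch, augment, n_views=10):
    """Run the encoder on n_views augmented copies of the same pixel batch
    and average the resulting representations; purely illustrative."""
    views = [encoder(augment(batch)) for _ in range(n_views)]
    return torch.stack(views).mean(dim=0)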

- Downstream classifier

   - **MLP**
       - Number of layers
           - **3**
   - Random Forest
   - XGBoost
   - Linear regression
   - Logistic regression
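
A sketch of the bold downstream choice, a 3-layer MLP on top of frozen 128-d representations; the hidden width and number of classes are placeholders, not tessera settings:

```python
import torch.nn as nn

def make_downstream_mlp(rep_dim=128, n_classes=10, hidden=256):
    """Three-layer MLP classifier applied to frozen representations."""
    return nn.Sequential(
        nn.Linear(rep_dim, hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, hidden),  nn.ReLU(inplace=True),
        nn.Linear(hidden, n_classes),
    )
```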


   - **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data: base length
       - augmentations (sketched after this list)
           - masking of season or some blocks
           - FFT on the pixels
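
A minimal NumPy sketch of the two augmentations listed above: masking one contiguous (seasonal) block of timeslots, and taking an FFT view of the pixel time series. The block length, zero-filling, and use of the magnitude spectrum are illustrative assumptions, not the tessera implementation:

```python
import numpy as np

def mask_season(pixel_series, block_len=16, rng=None):
    """Zero out one contiguous block of timeslots (e.g. a season);
    block length and zero-filling are assumptions."""
    rng = rng if rng is not None else np.random.default_rng()
    T = pixel_series.shape[0]
    start = int(rng.integers(0, max(T - block_len, 1)))
    out = pixel_series.copy()
    out[start:start + block_len] = 0.0
    return out

def fft_view(pixel_series):
    """Magnitude of the FFT along the time axis as an alternative view
    of the same pixel."""
    return np.abs(np.fft.rfft(pixel_series, axis=0))
```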