Hyperparameters

From tessera
Latest revision as of 17:03, 22 May 2025

Chosen values are in bold.

  • '''Pixel''' not patch input data for training and inference.
  • How many timeslots to sub-sample when creating d-pixel
    • 16
    • 25
    • '''40'''
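As an illustration of the sub-sampling step, a minimal NumPy sketch that draws the chosen 40 timeslots per pixel without replacement. The helper name and the pool of 300 cloud-free observations are assumptions for illustration, not from the page:

```python
import numpy as np

def subsample_timeslots(valid_timeslots, k=40, rng=None):
    """Randomly pick k timeslots (without replacement) from the
    cloud-free timeslots available for one pixel.
    Hypothetical helper: the real pipeline's sampling may differ."""
    rng = np.random.default_rng() if rng is None else rng
    valid_timeslots = np.asarray(valid_timeslots)
    if len(valid_timeslots) < k:
        raise ValueError(f"need at least {k} valid timeslots, "
                         f"got {len(valid_timeslots)}")
    picked = rng.choice(valid_timeslots, size=k, replace=False)
    return np.sort(picked)  # keep temporal order

# e.g. a pixel with 300 cloud-free observations in the year
slots = subsample_timeslots(np.arange(300), k=40)
```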
  • Representation dimension
    • 64
    • '''128'''
    • 256
  • Representation length for each dimension
    • FP8
    • INT8
    • Float16
    • Bfloat16
    • '''32 bits'''. However, we will need to look at the distribution of representations for each dimension to see if they can be reduced, and Matryoshka may change things.
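One quick check before reducing precision is to downcast the stored representations and measure the round-trip error, sketched here for float16 only, since base NumPy has no FP8 or bfloat16 (ml_dtypes or a DL framework would be needed for those). The array sizes are illustrative:

```python
import numpy as np

# Hypothetical check: how much is lost if the 32-bit representations
# are stored at lower precision (float16 shown; FP8/INT8/bfloat16
# would need e.g. ml_dtypes or framework support).
rng = np.random.default_rng(0)
reps = rng.normal(0.0, 1.0, size=(1000, 128)).astype(np.float32)

# cast down and back up, then measure the worst-case error
reps16 = reps.astype(np.float16).astype(np.float32)
max_err = np.abs(reps - reps16).max()
```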
  • Projector size
    • 0
    • 256
    • 512
    • '''1024'''
  • Loss function
    • Barlow Twins (parameter lambda = 0.005)
    • '''MMCR (parameters alpha = 0.005, lambda = 0.005)'''
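Of the two candidates, Barlow Twins is the easier to sketch; a minimal NumPy version with the listed lambda = 0.005 is below (the chosen MMCR loss is more involved and assumed to be implemented in the training framework):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=0.005):
    """Barlow Twins objective on two batches of embeddings
    (minimal NumPy sketch; real training would use a DL framework).
    lam = 0.005 as listed above."""
    n, d = z1.shape
    # standardise each embedding dimension over the batch
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-9)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-9)
    c = (z1.T @ z2) / n                          # d x d cross-correlation
    invariance = ((np.diag(c) - 1.0) ** 2).sum() # pull diagonal to 1
    off_diag = c - np.diag(np.diag(c))
    redundancy = (off_diag ** 2).sum()           # push off-diagonal to 0
    return invariance + lam * redundancy
```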
  • Learning rate
    • '''0.0001'''
  • Encoder type
    • MLP
    • ResNet50
    • '''Transformer'''
      • '''8 attention heads'''
      • '''Q, K, V same dimension as representation dimension = 128'''
      • '''3 layers'''
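The transformer choices above (8 heads, Q/K/V at the 128-dimensional representation size) can be sketched as a single self-attention step in NumPy; the real 3-layer encoder is assumed to add the usual residual and feed-forward blocks around this:

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, n_heads=8):
    """One multi-head self-attention step matching the listed choices:
    8 heads, Q/K/V at the representation dimension (128).
    Minimal sketch, not the project's actual encoder code."""
    seq, d = x.shape
    dh = d // n_heads                                 # 16 dims per head
    q = (x @ w_q).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, seq, seq)
    scores -= scores.max(-1, keepdims=True)           # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)               # softmax over keys
    out = attn @ v                                    # (heads, seq, dh)
    return out.transpose(1, 0, 2).reshape(seq, d)

rng = np.random.default_rng(0)
d = 128
x = rng.normal(size=(40, d))                # 40 sub-sampled timeslots
w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
y = multi_head_self_attention(x, *w)
```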
  • Number of augmentation pairs to use for each pixel
    • Training
      • '''1'''
      • 2
    • Inferencing
      • 1
      • 10
        • majority vote
        • '''average'''
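A sketch of the two aggregation options for the 10-inference case; the example predictions are made up:

```python
import numpy as np

# At inference, 10 augmented views of a pixel can be combined either by
# a majority vote over the downstream predictions, or by averaging the
# representations before classifying once. Illustrative sketch only.
def majority_vote(labels):
    labels = np.asarray(labels)
    return int(np.bincount(labels).argmax())

def average_representation(reps):
    return np.asarray(reps).mean(axis=0)

preds = [2, 2, 1, 2, 0, 2, 1, 2, 2, 1]   # 10 per-view class predictions
vote = majority_vote(preds)               # -> 2 (six votes out of ten)
```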
  • Downstream classifier
    • '''MLP with 3 layers'''
    • Random Forest
    • XGBoost
    • Linear regression
    • Logistic regression
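A minimal forward pass for the chosen 3-layer MLP head on frozen representations; the hidden widths and 5-class output are illustrative assumptions, since the page fixes only the depth:

```python
import numpy as np

def mlp3_forward(x, params):
    """Forward pass of a 3-layer MLP head on frozen representations
    (sketch; layer widths are illustrative, not from the page)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = np.maximum(x @ w1 + b1, 0.0)            # ReLU
    h = np.maximum(h @ w2 + b2, 0.0)
    logits = h @ w3 + b3
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)         # class probabilities

rng = np.random.default_rng(0)
dims = [128, 64, 64, 5]                         # 128-d reps -> 5 classes (assumed)
params = [(rng.normal(size=(a, b)) * 0.1, np.zeros(b))
          for a, b in zip(dims[:-1], dims[1:])]
probs = mlp3_forward(rng.normal(size=(32, 128)), params)
```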
  • Seasonal masking
    • Yes
    • No