Difference between revisions of "Hyperparameters"

From tessera
Revision as of 16:58, 22 May 2025

Chosen values are in bold.

  • **Pixel** (not patch) input data for training and inference.
  • How many timeslots to sub-sample when creating d-pixel
    1. 16
    2. 25
    3. **40**
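As a concrete sketch of the chosen sub-sampling (40 timeslots per d-pixel), the following draws 40 of a pixel's available acquisitions without replacement while keeping temporal order. The function name, array shapes, and band count are illustrative assumptions, not part of the table.

```python
import numpy as np

def subsample_dpixel(pixel_series: np.ndarray, n_slots: int = 40, rng=None) -> np.ndarray:
    """Sub-sample `n_slots` timeslots (chosen value: 40) from one pixel's
    time series of shape (T, bands), preserving temporal order."""
    rng = np.random.default_rng(rng)
    T = pixel_series.shape[0]
    idx = np.sort(rng.choice(T, size=n_slots, replace=False))  # distinct, ordered slots
    return pixel_series[idx]

series = np.random.rand(120, 10)   # e.g. 120 acquisitions, 10 spectral bands (assumed)
d_pixel = subsample_dpixel(series, n_slots=40, rng=0)
print(d_pixel.shape)  # (40, 10)
```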
  • Representation dimension
    1. 64
    2. **128**
    3. 256
  • Representation length (numeric precision) for each dimension
    1. FP8
    2. INT8
    3. Float16
    4. Bfloat16
    5. **32 bits**. However, we will need to look at the distribution of representations for each dimension to see if they can be reduced, and Matryoshka may change things.
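The storage/precision trade-off behind this choice can be inspected directly: cast the 32-bit representations to a narrower type and measure bytes saved versus rounding error. This is a sketch only — numpy has no native FP8 or bfloat16, so just float16 is compared here.

```python
import numpy as np

# Representations at the chosen precision (32 bits); batch/dim sizes are illustrative.
reps = np.random.default_rng(0).standard_normal((4, 128)).astype(np.float32)

for dtype in (np.float16, np.float32):   # FP8/INT8/bfloat16 need special handling, not shown
    cast = reps.astype(dtype)
    err = np.abs(cast.astype(np.float32) - reps).max()   # worst-case rounding error
    print(dtype.__name__, cast.nbytes, f"max_err={err:.2e}")
```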
  • Projector size
    1. 0
    2. 256
    3. 512
    4. **1024**
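A minimal sketch of the chosen projector (128-dimensional representations projected to 1024 before the loss); a size of 0 would mean applying the loss to the representations directly. The linear-plus-ReLU form and random initialisation are assumptions for illustration.

```python
import numpy as np

def make_projector(in_dim=128, proj_dim=1024, rng=None):
    """Random-initialised projector head: linear map + ReLU (a sketch, not the
    exact architecture). Chosen sizes from the table: 128 -> 1024."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((in_dim, proj_dim)) / np.sqrt(in_dim)  # scaled init
    return lambda z: np.maximum(z @ W, 0.0)

project = make_projector(rng=0)
z = np.random.rand(8, 128)   # batch of 8 representations
print(project(z).shape)      # (8, 1024)
```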
  • Loss function
    1. Barlow Twins (parameter lambda = 0.005)
    2. **MMCR (parameters alpha=0.005, lambda=0.005)**
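For reference, the Barlow Twins alternative with the table's lambda = 0.005 can be sketched as below: standardise each embedding dimension across the batch, form the cross-correlation matrix of the two augmented views, and penalise off-diagonal terms. The chosen MMCR loss (a nuclear-norm-based objective) is not shown here.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=0.005):
    """Barlow Twins loss (lambda = 0.005 as in the table).
    z1, z2: (batch, dim) embeddings of the two augmented views."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)   # per-dimension standardisation
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n                             # cross-correlation matrix
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)     # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy-reduction term
    return on_diag + lam * off_diag

z = np.random.default_rng(0).standard_normal((64, 128))
same = barlow_twins_loss(z, z)        # perfectly aligned views
flipped = barlow_twins_loss(z, -z)    # anti-aligned views
print(same < flipped)                 # True
```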
  • Learning rate
    1. **0.0001**
  • Encoder type
    1. MLP
    2. ResNet50
    3. **Transformer**
      1. **8 attention heads**
      2. Q, K, V with the same dimension as the representation dimension (128)
      3. **3 layers**
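The chosen encoder shape (dim 128, 8 heads, 3 layers, Q/K/V at the representation dimension) can be sketched in a few lines of numpy; this forward pass uses random untrained weights and omits layer norm and feed-forward sublayers, so it shows the shapes and head split only, not the full architecture.

```python
import numpy as np

D, H, LAYERS = 128, 8, 3          # chosen values: dim 128, 8 heads, 3 layers

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(x, rng):
    """One multi-head self-attention layer; Q, K, V share the representation dim."""
    Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    def heads(t):                                   # (T, D) -> (H, T, D/H)
        return t.reshape(t.shape[0], H, D // H).transpose(1, 0, 2)
    qh, kh, vh = heads(q), heads(k), heads(v)
    att = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(D // H))
    out = (att @ vh).transpose(1, 0, 2).reshape(-1, D)  # merge heads back
    return x + out @ Wo                                 # residual connection

def encode(x, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(LAYERS):
        x = attention_layer(x, rng)
    return x

tokens = np.random.default_rng(1).standard_normal((40, D))  # 40 timeslots as tokens
print(encode(tokens).shape)  # (40, 128)
```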
  • Number of augmentation pairs to use for each pixel
    1. Training
      1. **1**
      2. 2
    2. Inferencing
      1. 1
      2. 10
        1. majority vote
        2. **average**
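A sketch of the chosen inference strategy: run the encoder on several augmented views and average the representations before classifying (the alternative being a majority vote over per-view predictions). The stand-in `encode` and `classify` functions here are toy assumptions, not the real models.

```python
import numpy as np

def predict_averaged(encode, classify, views):
    """Combine several inference passes by averaging representations
    (the chosen strategy), then classify once."""
    reps = np.stack([encode(v) for v in views])   # (n_views, dim)
    return classify(reps.mean(axis=0))

# Toy stand-ins for the real encoder/classifier, for illustration only.
encode = lambda v: v * 2.0
classify = lambda r: int(r.sum() > 0)
views = [np.array([0.1, -0.2]), np.array([0.3, 0.0])]
print(predict_averaged(encode, classify, views))  # 1
```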
  • Downstream classifier
    1. **MLP with 3 layers**
    2. Random Forest
    3. XGBoost
    4. Linear regression
    5. Logistic regression
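The chosen downstream head (an MLP with 3 layers) could look like the forward pass below. Only the depth comes from the table; the hidden sizes, class count, and random initialisation are illustrative assumptions, and training is omitted.

```python
import numpy as np

def make_mlp(dims=(128, 64, 32, 2), rng=None):
    """3-layer MLP head: two ReLU hidden layers plus a linear output layer.
    Hidden sizes and the 2-class output are assumptions for illustration."""
    rng = np.random.default_rng(rng)
    Ws = [rng.standard_normal((a, b)) / np.sqrt(a) for a, b in zip(dims, dims[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)   # ReLU hidden layers
        return x @ Ws[-1]                # logits
    return forward

mlp = make_mlp(rng=0)
logits = mlp(np.random.rand(4, 128))     # batch of 4 representations
print(logits.shape)  # (4, 2)
```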
  • Seasonal masking
    1. Yes
    2. No
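If seasonal masking is used, one simple realisation is to zero out a contiguous block of timeslots corresponding to one season. The block layout (four equal seasons over the 40 chosen timeslots) is an assumption for illustration.

```python
import numpy as np

def mask_season(d_pixel, season, slots_per_season=10):
    """Zero out one season's block of timeslots as an augmentation
    (a sketch; the real season boundaries are an assumption)."""
    out = d_pixel.copy()
    out[season * slots_per_season:(season + 1) * slots_per_season] = 0.0
    return out

x = np.ones((40, 10))                 # 40 timeslots, 10 bands (assumed)
masked = mask_season(x, season=1)     # mask the second season
print(masked[10:20].sum(), masked.sum())  # 0.0 300.0
```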