Hyperparameters

From tessera
Revision as of 16:59, 22 May 2025 by Keshav (talk | contribs)

Chosen values are in bold.

  • Per-pixel, not per-patch, input data for training and inference.
  • Number of timeslots to sub-sample when creating a d-pixel
    1. 16
    2. 25
    3. 40
  • Representation dimension
    1. 64
    2. 128
    3. 256
  • Representation length for each dimension
    1. FP8
    2. INT8
    3. Float16
    4. Bfloat16
    5. 32 bits. However, we will need to look at the distribution of representation values in each dimension to see whether the precision can be reduced, and Matryoshka may change things.
  • Projector size
    1. 0
    2. 256
    3. 512
    4. 1024
  • Loss function
    1. Barlow Twins (parameter lambda = 0.005)
    2. MMCR (parameters alpha=0.005, lambda=0.005)
  • Learning rate
    1. 0.0001
  • Encoder type
    1. MLP
    2. ResNet50
    3. Transformer
      1. 8 attention heads
      2. Q, K, V same dimension as representation dimension = 128
      3. 3 layers
  • Number of augmentation pairs to use for each pixel
    1. Training
      1. 1
      2. 2
    2. Inference
      1. 1
      2. 10
        1. majority vote
        2. average
  • Downstream classifier
    1. MLP with 3 layers
    2. Random Forest
    3. XGBoost
    4. Linear regression
    5. Logistic regression
  • Seasonal masking
    1. Yes
    2. No
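
To make the role of the lambda parameter in the Barlow Twins option under Loss function concrete, here is a minimal sketch of the standard Barlow Twins objective. It is not taken from the project's code: the function names `standardize` and `barlow_twins_loss` are hypothetical, plain Python lists are used only to keep the example self-contained, and the default `lam=0.005` mirrors the value listed above. The loss pushes the cross-correlation matrix between two augmented views towards the identity: diagonal entries towards 1 (the two views agree) and off-diagonal entries towards 0 (dimensions are not redundant), with `lam` weighting the off-diagonal term.

```python
import math


def standardize(batch):
    """Standardize each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            # Small epsilon avoids division by zero for constant dimensions.
            out[i][j] = (col[i] - mean) / (std + 1e-12)
    return out


def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins loss for two batches of embeddings (N rows of D floats each).

    z_a[k] and z_b[k] are assumed to be embeddings of two augmented views
    of the same sample (here, the same pixel).
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    on_diag = 0.0   # invariance term: sum_i (1 - C_ii)^2
    off_diag = 0.0  # redundancy-reduction term: sum_{i != j} C_ij^2
    for i in range(d):
        for j in range(d):
            # Entry (i, j) of the D x D cross-correlation matrix.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2
            else:
                off_diag += c_ij ** 2
    return on_diag + lam * off_diag


# Identical views with two uncorrelated dimensions.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_decorrelated = barlow_twins_loss(decorrelated, decorrelated)

# Identical views whose two dimensions carry the same signal.
redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
loss_redundant = barlow_twins_loss(redundant, redundant)
```

In this toy example `loss_decorrelated` is approximately 0, while `loss_redundant` is approximately 0.01: the two off-diagonal correlations are each about 1 and are weighted by `lam`. A real training loop would compute the same quantity on framework tensors rather than Python lists.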