Hyperparameters

From tessera

Chosen values are in bold.

  • Input data for training and inference: individual pixels, not patches.
  • Number of timeslots to sub-sample when creating a d-pixel
    • 16
    • 25
    • 40
  • Representation dimension
    • 64
    • 128
    • 256
  • Representation precision (bit width) for each dimension
    • FP8
    • INT8
    • Float16
    • Bfloat16
    • 32 bits. However, we will need to look at the distribution of representation values in each dimension to see whether the precision can be reduced, and Matryoshka may change things.
  • Projector size
    • 0
    • 256
    • 512
    • 1024
  • Loss function
    • Barlow Twins (parameter lambda = 0.005)
    • MMCR (parameters alpha=0.005, lambda=0.005)
  • Learning rate
    • 0.0001
  • Encoder type
    • MLP
    • ResNet50
    • Transformer
      • 8 attention heads
      • Q, K, V dimension equal to the representation dimension (128)
      • 3 layers
  • Number of augmentation pairs to use for each pixel
    • Training
      • 1
      • 2
    • Inference
      • 1
      • 10
        • majority vote
        • average
  • Downstream classifier
    • MLP with 3 layers
    • Random Forest
    • XGBoost
    • Linear regression
    • Logistic regression
  • Seasonal masking
    • Yes
    • No
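
To make the Barlow Twins entry under "Loss function" concrete, here is a minimal sketch of that loss in plain Python, using the lambda = 0.005 value listed above. It is an illustration of the published loss, not the tessera training code: the names `standardize` and `barlow_twins_loss`, the toy batch size (64) and the toy representation dimension (8) are assumptions made for this example only.

```python
import random
from statistics import fmean, pstdev


def standardize(batch):
    """Scale each feature (column) of an N x D batch to zero mean, unit variance."""
    cols = list(zip(*batch))
    means = [fmean(c) for c in cols]
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in batch]


def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins loss for two augmented views, each an N x D list of lists.

    The D x D cross-correlation matrix between the views is pushed towards
    the identity: diagonal terms towards 1, off-diagonal terms towards 0,
    with the off-diagonal part weighted by lam.
    """
    n, d = len(z_a), len(z_a[0])
    z_a, z_b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(z_a[k][i] * z_b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance term
            else:
                loss += lam * c_ij ** 2     # redundancy-reduction term
    return loss


# Toy data (assumed sizes): 64 pixels, 8-dimensional representations.
rng = random.Random(0)
view_a = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(64)]
view_b = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(64)]

loss_same = barlow_twins_loss(view_a, view_a)  # identical views
loss_diff = barlow_twins_loss(view_a, view_b)  # unrelated views
```

Identical views should give a loss close to zero, while unrelated views are penalised mainly through the diagonal term. A real training run would compute the same quantity on encoder outputs with a tensor library rather than nested lists.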