Hyperparameters

From tessera

Revision as of 15:59, 22 May 2025 by Keshav (talk | contribs)

Chosen values are in bold.

Input data for training and inference are individual pixels, not patches.

How many timeslots to sub-sample when creating d-pixel
1. 16
2. 25
3. 40
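
As a rough illustration of the sub-sampling step, the sketch below draws a fixed number of timeslots from one pixel's time series and keeps them in temporal order. The helper name `subsample_timeslots`, the data layout (a list of per-date observations), and the fallback of sampling with replacement when too few dates exist are assumptions for illustration, not the project's actual pipeline.

```python
import random

def subsample_timeslots(observations, num_timeslots, seed=None):
    """Pick num_timeslots observations from one pixel's time series.

    Hypothetical helper: `observations` is a list of per-date records
    for a single pixel, assumed to be already filtered to valid dates.
    """
    rng = random.Random(seed)
    n = len(observations)
    if n == 0:
        raise ValueError("pixel has no valid observations")
    if n >= num_timeslots:
        # Enough dates: sample without replacement.
        idx = rng.sample(range(n), num_timeslots)
    else:
        # Too few dates: sample with replacement (an assumption).
        idx = [rng.randrange(n) for _ in range(num_timeslots)]
    idx.sort()  # keep the chosen timeslots in temporal order
    return [observations[i] for i in idx]

series = list(range(60))  # stand-in for 60 acquisition dates
sample = subsample_timeslots(series, 40, seed=0)
```

Changing the second argument to 16 or 25 gives the other candidate settings listed above.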

Representation dimension
1. 64
2. 128
3. 256

Representation length for each dimension
1. FP8
2. INT8
3. Float16
4. Bfloat16
5. 32 bits

However, we will need to look at the distribution of representation values in each dimension to see whether the precision can be reduced, and Matryoshka representations may change this choice.
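
To make the trade-off concrete, the sketch below computes the storage cost per pixel for each candidate format and shows one simple way to test whether a dimension survives reduction to 8 bits. The symmetric scaling scheme and the names `bytes_per_pixel` and `quantize_int8` are assumptions chosen for illustration; the page does not say which quantization method would be used.

```python
# Bit widths of the candidate formats listed above.
BITS = {"FP8": 8, "INT8": 8, "Float16": 16, "Bfloat16": 16, "32 bits": 32}
DIMENSIONS = (64, 128, 256)

def bytes_per_pixel(dimension, fmt):
    """Storage for one pixel's representation, in bytes."""
    return dimension * BITS[fmt] // 8

def quantize_int8(values):
    """Symmetric INT8 quantization of the values seen in one dimension.

    Assumed scheme: one scale per dimension, codes in [-127, 127].
    """
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

for fmt in BITS:
    print(fmt, [bytes_per_pixel(d, fmt) for d in DIMENSIONS])

values = [0.3, -1.0, 0.26]  # made-up values for one dimension
codes, scale = quantize_int8(values)
errors = [abs(a - b) for a, b in zip(values, dequantize(codes, scale))]
```

Comparing `errors` against the spread of real values in each dimension is one way to judge whether the lower precision is acceptable.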

Projector size
1. 0
2. 256
3. 512
4. 1024

Loss function
1. Barlow Twins (parameter lambda = 0.005)
2. MMCR (parameters alpha=0.005, lambda=0.005)
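
For orientation, here is a minimal pure-Python sketch of the Barlow Twins objective with lambda = 0.005: the cross-correlation matrix between two augmented views is pushed towards the identity, with lambda weighting the off-diagonal terms. It assumes each view is an N x D list of projector outputs and uses the population standard deviation for normalization; it is not the project's training code, and MMCR is not covered.

```python
import math

def standardize(batch, eps=1e-12):
    """Give every column of an N x D batch zero mean and unit variance."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for b in range(n):
            out[b][j] = (col[b] - mean) / (std + eps)
    return out

def barlow_twins_loss(z1, z2, lam=0.005):
    """Sum of (1 - C_ii)^2 plus lam times the sum of C_ij^2 for i != j."""
    a, b = standardize(z1), standardize(z2)
    n, d = len(a), len(a[0])
    on_diag = off_diag = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view 1
            # and dimension j of view 2, averaged over the batch.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2
            else:
                off_diag += c_ij ** 2
    return on_diag + lam * off_diag

view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
flipped = [[-v for v in row] for row in view]
loss_same = barlow_twins_loss(view, view)
loss_flipped = barlow_twins_loss(view, flipped)
```

Identical views with uncorrelated dimensions give a loss near zero, while sign-flipped views are penalized heavily on the diagonal.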

Learning rate
1. 0.0001

Encoder type
1. MLP
2. ResNet50
3. Transformer
  1. 8 attention heads
  2. Q, K, V same dimension as representation dimension = 128
  3. 3 layers
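
The Transformer settings above can be captured in a small configuration object. The sketch below assumes the standard multi-head split, where the 128-wide Q, K, V projections are divided evenly across the 8 heads; the class name `TransformerEncoderConfig` and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerEncoderConfig:
    num_heads: int = 8
    model_dim: int = 128  # Q, K, V width, equal to the representation dimension
    num_layers: int = 3

    def __post_init__(self):
        # The usual multi-head split needs an even division across heads.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("model_dim must be divisible by num_heads")

    @property
    def head_dim(self):
        """Width of each attention head under the assumed even split."""
        return self.model_dim // self.num_heads

config = TransformerEncoderConfig()
```

Under that assumption each head works on 128 / 8 = 16 dimensions.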

Number of augmentation pairs to use for each pixel
1. Training
  1. 1
  2. 2
2. Inference
  1. 1
  2. 10
    1. majority vote
    2. average
Downstream classifier
1. MLP with 3 layers
2. Random Forest
3. XGBoost
4. Linear regression
5. Logistic regression

Seasonal masking
1. Yes
2. No

Retrieved from ‘https://svr-sk818-web.cl.cam.ac.uk/tessera/index.php?title=Hyperparameters&oldid=72’