Hyperparameters

From tessera


Chosen values are in bold.

Input data for both training and inference is per-pixel rather than patch-based.

Number of timeslots to sub-sample when creating a d-pixel
- 16
- 25
- 40
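A minimal sketch of the sub-sampling step. It assumes a d-pixel is built from one pixel's chronologically ordered observations and that timeslots are drawn uniformly at random without replacement, falling back to sampling with replacement when a pixel has too few observations. The helper `subsample_timeslots` is hypothetical, and tessera's actual sampling scheme may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_timeslots(observations: Sequence[T], n_slots: int,
                        rng: random.Random) -> List[T]:
    """Hypothetical helper: pick n_slots observations in chronological order.

    Assumes `observations` is already sorted by acquisition date. Draws
    without replacement when enough observations exist, otherwise with
    replacement (an assumption, not taken from the source).
    """
    n = len(observations)
    if n == 0:
        raise ValueError("pixel has no valid observations")
    if n >= n_slots:
        idx = sorted(rng.sample(range(n), n_slots))
    else:
        idx = sorted(rng.choices(range(n), k=n_slots))
    return [observations[i] for i in idx]


rng = random.Random(0)
series = list(range(73))  # e.g. 73 observation indices for one pixel
for n_slots in (16, 25, 40):
    d_pixel = subsample_timeslots(series, n_slots, rng)
    print(n_slots, len(d_pixel), d_pixel[:5])
```

Sorting the sampled indices keeps the d-pixel in time order, which matters if the encoder (for example the Transformer option) uses timeslot position.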

Representation dimension
- 64
- 128
- 256

Representation precision (bits per value) for each dimension
- FP8
- INT8
- Float16
- Bfloat16
- 32 bits

However, we will need to examine the distribution of representation values in each dimension to see whether the precision can be reduced; Matryoshka representation learning may also change this.
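
The per-dimension analysis mentioned above could start from something like this sketch. It assumes representations are available as a NumPy array of shape (pixels, dimension) and applies symmetric per-dimension INT8 quantization, which is one possible scheme rather than the one tessera will necessarily use, then compares the error with a Float16 cast. FP8 and Bfloat16 have no native NumPy dtype, so they are omitted. Storage is simply dimension × bits / 8: 128 × 8 / 8 = 128 bytes per pixel at INT8 versus 512 bytes at 32 bits.

```python
import numpy as np


def quantize_int8_per_dim(z: np.ndarray):
    """Symmetric INT8 quantization with one scale per representation dimension.

    z has shape (n_pixels, dim). Each column is scaled by its own max |value|,
    so dimensions with very different ranges keep their resolution.
    """
    scale = np.abs(z).max(axis=0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # all-zero columns: avoid divide-by-zero
    q = np.clip(np.round(z / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale


def bytes_per_pixel(dim: int, bits: int) -> int:
    """Storage for one pixel's representation, ignoring the per-dimension scales."""
    return dim * bits // 8


rng = np.random.default_rng(0)
z = rng.normal(size=(10_000, 128)).astype(np.float32)  # stand-in for real representations
q, scale = quantize_int8_per_dim(z)
err = np.abs(dequantize(q, scale) - z).max(axis=0)
f16_err = np.abs(z.astype(np.float16).astype(np.float32) - z).max()
print("worst INT8 error / scale:", float((err / scale).max()))  # at most ~0.5 (rounding)
print("worst Float16 error:", float(f16_err))
for bits in (8, 16, 32):
    print(bits, "bits ->", bytes_per_pixel(128, bits), "bytes per pixel")
```

Per-dimension scales are what make the distribution of each dimension matter: a dimension with a few large outliers gets a coarse scale and loses resolution for its typical values.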

Projector size
- 0
- 256
- 512
- 1024
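A projector is commonly a small MLP placed between the encoder and the self-supervised loss and discarded after training; a size of 0 presumably means no projector, so the loss acts directly on the representation. The sketch assumes a two-layer MLP with ReLU and no batch normalization, and uses random weights only to show the shapes. `make_projector` is a hypothetical name.

```python
import numpy as np


def make_projector(rep_dim: int, proj_size: int, rng: np.random.Generator):
    """Hypothetical projector: Linear -> ReLU -> Linear, or identity when size is 0.

    With proj_size == 0 the loss is applied directly to the encoder output.
    """
    if proj_size == 0:
        return lambda z: z
    w1 = rng.normal(scale=1.0 / np.sqrt(rep_dim), size=(rep_dim, proj_size))
    w2 = rng.normal(scale=1.0 / np.sqrt(proj_size), size=(proj_size, proj_size))

    def project(z: np.ndarray) -> np.ndarray:
        hidden = np.maximum(z @ w1, 0.0)  # ReLU
        return hidden @ w2

    return project


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 128))  # batch of 4 representations, dimension 128
for size in (0, 256, 512, 1024):
    print(size, make_projector(128, size, rng)(z).shape)
```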

Loss function
- Barlow Twins (parameter lambda = 0.005)
- MMCR, Maximum Manifold Capacity Representations (parameters alpha = 0.005, lambda = 0.005)
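For reference, a NumPy sketch of both losses with the listed parameter values. The Barlow Twins version follows the published formulation: an invariance term on the diagonal of the cross-correlation matrix plus lambda times the off-diagonal redundancy term. The MMCR version maximises the nuclear norm of the per-sample centroid matrix; its lambda-weighted per-sample term follows our understanding of the public MMCR reference code, and alpha is not represented because its role is not described on this page. Both work on plain arrays, so they show the arithmetic only, without gradients.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray, lam: float = 0.005) -> float:
    """Barlow Twins loss for two views of a batch, each of shape (N, D)."""
    n = z_a.shape[0]
    # Standardise each dimension over the batch.
    a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)
    c = a.T @ b / n  # (D, D) cross-correlation matrix
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)  # invariance term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy-reduction term
    return float(on_diag + lam * off_diag)


def mmcr_loss(z: np.ndarray, lam: float = 0.005) -> float:
    """MMCR-style loss for z of shape (N, n_views, D).

    Embeddings are projected onto the unit sphere and averaged per sample
    into a centroid; the nuclear norm of the (N, D) centroid matrix is
    maximised. The lam-weighted per-sample term is our reading of the MMCR
    reference code; the page's alpha parameter is not modelled here.
    """
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    centroids = z.mean(axis=1)  # (N, D)
    global_nuc = np.linalg.svd(centroids, compute_uv=False).sum()
    local_nuc = np.linalg.svd(z, compute_uv=False).sum()  # batched over N samples
    return float(lam * local_nuc / z.shape[0] - global_nuc)


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
print(barlow_twins_loss(z, z))  # small: only the weighted off-diagonal term remains
print(barlow_twins_loss(z, rng.normal(size=(256, 8))))  # close to D = 8
print(mmcr_loss(rng.normal(size=(32, 2, 8))))
```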

Learning rate
- 0.0001

Encoder type
- MLP
- ResNet50
- Transformer
  - 8 attention heads
  - Q, K and V dimension equal to the representation dimension (128)
  - 3 layers
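With Q, K and V all 128-dimensional and 8 heads, each head attends over 128 / 8 = 16 dimensions. The NumPy sketch below illustrates that split for a stack of 3 layers. It is deliberately simplified (random weights, residual attention only, no layer normalization or feed-forward sublayer), and treating each sub-sampled timeslot as a token is an assumption rather than something stated on this page.

```python
import numpy as np

D_MODEL = 128  # representation dimension, also used for Q, K and V
N_HEADS = 8
N_LAYERS = 3
HEAD_DIM = D_MODEL // N_HEADS  # 128 / 8 = 16 dimensions per head


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv, wo):
    """Multi-head self-attention over x of shape (T, D_MODEL)."""
    t = x.shape[0]

    def split(m):  # (T, D_MODEL) -> (N_HEADS, T, HEAD_DIM)
        return m.reshape(t, N_HEADS, HEAD_DIM).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)  # (N_HEADS, T, T)
    heads = softmax(scores) @ v  # (N_HEADS, T, HEAD_DIM)
    return heads.transpose(1, 0, 2).reshape(t, D_MODEL) @ wo  # concatenate heads


def encoder(x, layers):
    # Simplified: residual attention only; no layer norm or feed-forward block.
    for wq, wk, wv, wo in layers:
        x = x + self_attention(x, wq, wk, wv, wo)
    return x


rng = np.random.default_rng(0)
layers = [tuple(rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, D_MODEL))
                for _ in range(4)) for _ in range(N_LAYERS)]
tokens = rng.normal(size=(40, D_MODEL))  # e.g. 40 sub-sampled timeslots, already embedded
print(HEAD_DIM, encoder(tokens, layers).shape)  # 16 (40, 128)
```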

Number of augmentation pairs to use for each pixel
- Training
  - 1
  - 2
- Inference
  - 1
  - 10, aggregated by
    - majority vote
    - average
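When 10 augmented views are used at inference, their outputs must be combined. A stdlib-only sketch of the two listed options, assuming majority vote is applied to per-view class labels and averaging to per-view vectors (representations or class probabilities; the page does not say which). Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_views(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized per-view vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


view_labels = [2, 2, 1, 2, 0, 2, 1, 2, 2, 1]  # predicted class for each of 10 views
print(majority_vote(view_labels))  # 2
print(average_views([[0.2, 0.8], [0.6, 0.4]]))
```

Averaging keeps the full per-view information until the end, whereas majority vote discards confidence; with only 1 view the two options coincide.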

Downstream classifier
- MLP with 3 layers
- Random Forest
- XGBoost
- Linear regression
- Logistic regression
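Downstream models are trained on frozen representations. As an illustration of the simplest listed option, here is a NumPy sketch of a binary logistic-regression probe fitted by full-batch gradient descent. The learning rate, epoch count and synthetic data are arbitrary choices for the example; in practice a library implementation (for example scikit-learn) would normally be used.

```python
import numpy as np


def train_logistic_probe(z, y, lr=0.1, epochs=500):
    """Fit w, b by gradient descent on the mean binary cross-entropy.

    z: frozen representations, shape (N, D); y: labels in {0, 1}, shape (N,).
    """
    n, d = z.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))  # predicted P(y = 1)
        err = p - y  # gradient of the loss w.r.t. the logits
        w -= lr * (z.T @ err) / n
        b -= lr * err.mean()
    return w, b


def predict(z, w, b):
    return (z @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
z = rng.normal(size=(1000, 128))
z[:, 0] += 3.0 * (2 * y - 1)  # make one dimension informative (synthetic data)
w, b = train_logistic_probe(z, y)
print("train accuracy:", (predict(z, w, b) == y).mean())
```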

Seasonal masking
- Yes
- No
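The page does not define seasonal masking. One plausible reading, used here purely as an assumption, is an augmentation that hides all observations from one randomly chosen season so the encoder cannot rely on any single season being present. The sketch uses Northern Hemisphere meteorological seasons expressed as day-of-year ranges for a non-leap year; `SEASONS` and `seasonal_mask` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical season boundaries as inclusive day-of-year ranges (non-leap year;
# 366 is included so the last day of a leap year still falls in winter).
SEASONS: Dict[str, List[Tuple[int, int]]] = {
    "winter": [(1, 59), (335, 366)],  # Dec-Feb wraps around the year end
    "spring": [(60, 151)],  # Mar-May
    "summer": [(152, 243)],  # Jun-Aug
    "autumn": [(244, 334)],  # Sep-Nov
}


def seasonal_mask(days_of_year: Sequence[int], season: str) -> List[bool]:
    """Return True for observations to keep, False for those in `season`."""
    ranges = SEASONS[season]
    return [not any(lo <= d <= hi for lo, hi in ranges) for d in days_of_year]


rng = random.Random(0)
doys = [10, 45, 100, 160, 200, 260, 300, 350]  # acquisition days of one pixel
season = rng.choice(list(SEASONS))
keep = seasonal_mask(doys, season)
print(season, [d for d, k in zip(doys, keep) if k])
```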

Retrieved from ‘https://svr-sk818-web.cl.cam.ac.uk/tessera/index.php?title=Hyperparameters&oldid=78’