Difference between revisions of "Hyperparameters"
Revision as of 15:58, 22 May 2025
Chosen values are in bold.
* '''Pixel''', not patch, input data for training and inference.
* Number of timeslots to sub-sample when creating a d-pixel
*# 16
*# 25
*# '''40'''
* Representation dimension
*# 64
*# '''128'''
*# 256
* Representation length (numeric precision) for each dimension
*# FP8
*# INT8
*# Float16
*# Bfloat16
*# '''32 bits'''. However, we will need to look at the distribution of representation values in each dimension to see whether the precision can be reduced, and Matryoshka may change things.
* Projector size
*# 0
*# 256
*# 512
*# '''1024'''
* Loss function
*# Barlow Twins (parameter lambda = 0.005)
*# '''MMCR (parameters alpha = 0.005, lambda = 0.005)'''
* Learning rate
*# '''0.0001'''
* Encoder type
*# MLP
*# ResNet50
*# '''Transformer'''
*## '''8 attention heads'''
*## '''Q, K, V same dimension as representation dimension = 128'''
*## '''3 layers'''
* Number of augmentation pairs to use for each pixel
*# Training
*## '''1'''
*## 2
*# Inference
*## 1
*## 10
*### majority vote
*### '''average'''
* Downstream classifier
*# '''MLP with 3 layers'''
*# Random Forest
*# XGBoost
*# Linear regression
*# Logistic regression
* Seasonal masking
*# Yes
*# No
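
The chosen values listed above could be collected in a single configuration object, which also makes the storage cost of each precision option easy to compare. The following is only a minimal sketch: the class name, field names and the helper method are hypothetical and do not come from any project code; only the numeric values are taken from the list.

```python
from dataclasses import dataclass


# Hypothetical container: the names are illustrative assumptions,
# the values are the chosen (bold) entries from the list above.
@dataclass(frozen=True)
class ChosenHyperparameters:
    input_unit: str = "pixel"
    timeslots_per_d_pixel: int = 40
    representation_dim: int = 128
    bits_per_dimension: int = 32
    projector_size: int = 1024
    loss: str = "MMCR"
    mmcr_alpha: float = 0.005
    mmcr_lambda: float = 0.005
    learning_rate: float = 1e-4
    encoder: str = "transformer"
    attention_heads: int = 8
    qkv_dim: int = 128
    encoder_layers: int = 3
    train_augmentation_pairs: int = 1
    downstream_classifier: str = "mlp_3_layers"

    def bytes_per_representation(self, bits=None):
        """Storage for one representation vector at the given bit width."""
        bits = self.bits_per_dimension if bits is None else bits
        return self.representation_dim * bits // 8


cfg = ChosenHyperparameters()

# Compare the precision options from the list for a 128-dimensional vector.
for label, bits in [("32 bits", 32), ("Float16 / Bfloat16", 16), ("FP8 / INT8", 8)]:
    print(f"{label}: {cfg.bytes_per_representation(bits)} bytes per representation")
```

At the chosen 32 bits, a 128-dimensional representation takes 512 bytes; the 16-bit options would halve that and the 8-bit options would quarter it, which is the saving at stake in the note about reducing precision.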
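
The two inference options for combining several augmentations of the same pixel, majority vote and average, can give different answers. The sketch below assumes that each augmentation yields a vector of class probabilities from the downstream classifier and that "average" means averaging those vectors; the list above does not say what is averaged, so this is an assumption, and the function names and numbers are made up for illustration.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(prob_vectors):
    """Pick each augmentation's top class, then return the most frequent one."""
    votes = [argmax(p) for p in prob_vectors]
    return Counter(votes).most_common(1)[0][0]


def average_then_argmax(prob_vectors):
    """Average the probability vectors element-wise, then pick the top class."""
    n = len(prob_vectors)
    mean = [sum(column) / n for column in zip(*prob_vectors)]
    return argmax(mean)


# Illustrative (made-up) outputs for 10 augmentations of one pixel, 2 classes:
# six weak predictions for class 0 and four confident predictions for class 1.
preds = [[0.55, 0.45]] * 6 + [[0.10, 0.90]] * 4

print("majority vote:", majority_vote(preds))
print("average:", average_then_argmax(preds))
```

In this constructed case the majority vote picks class 0 because six of the ten augmentations lean that way, while averaging picks class 1 because it takes the confidence of each prediction into account.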