Hyperparameters

Chosen values are in bold.

* Pixel (not patch) input data for training and inference.
* How many timeslots to sub-sample when creating a d-pixel (sub-sampling is sketched after this list)
** 16
** 25
** '''40'''
* Representation dimension
** 64
** '''128'''
** 256
* Representation length (bits) for each dimension
** FP8
** INT8
** Float16
** Bfloat16
** '''32 bits'''. However, we will need to look at the distribution of representations for each dimension to see whether they can be reduced, and Matryoshka may change things.
* Projector size (the projector appears in the encoder sketch after this list)
** 0
** 256
** 512
** '''1024'''
* Loss function (Barlow Twins is sketched after this list)
** Barlow Twins (parameter lambda = 0.005)
** '''MMCR (parameters alpha = 0.005, lambda = 0.005)'''
* Learning rate
** '''0.0001'''
* Encoder type (sketched after this list)
** MLP
** ResNet50
** '''Transformer'''
*** '''8 attention heads'''
*** '''Q, K, V same dimension as the representation dimension (128)'''
*** '''3 layers'''
* Number of augmentation pairs to use for each pixel (inference-time averaging is sketched after this list)
** Training
*** '''1'''
*** 2
** Inference
*** 1
*** 10
**** majority vote
**** '''average'''
* Downstream classifier (sketched after this list)
** '''MLP with 3 layers'''
** Random Forest
** XGBoost
** Linear regression
** Logistic regression
* Seasonal masking
** Yes
** No
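
The timeslot sub-sampling step can be illustrated with a short sketch. This is a minimal illustration only: it assumes the d-pixel is built from a per-pixel time series of shape (timeslots, bands) and that timeslots are drawn uniformly at random without replacement. The function name, the band count, and the sampling strategy are assumptions, not the project's actual code.

<syntaxhighlight lang="python">
import numpy as np

def subsample_timeslots(pixel_series, n_keep=40, rng=None):
    """Keep n_keep timeslots from a per-pixel time series.

    pixel_series: array of shape (num_timeslots, num_bands).
    n_keep: number of timeslots to keep (chosen value: 40).
    """
    rng = rng or np.random.default_rng()
    num_timeslots = pixel_series.shape[0]
    # Sample without replacement and keep temporal order so the
    # d-pixel preserves the seasonal progression.
    idx = np.sort(rng.choice(num_timeslots,
                             size=min(n_keep, num_timeslots),
                             replace=False))
    return pixel_series[idx]

# Example: 73 composites over a year with 10 bands per timeslot (placeholder sizes).
d_pixel = subsample_timeslots(np.random.rand(73, 10), n_keep=40)
print(d_pixel.shape)  # (40, 10)
</syntaxhighlight>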
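For the chosen encoder, the following is a minimal PyTorch sketch of a Transformer with the chosen settings (representation dimension 128, 8 attention heads, 3 layers) plus the 1024-unit projector. The class name, the per-timeslot band count, the mean pooling over timeslots, the omission of positional encoding, and the use of Adam are assumptions for illustration; only the learning rate of 0.0001 and the dimensions come from the list above.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class DPixelEncoder(nn.Module):
    """Transformer over the timeslots of a d-pixel (illustrative sketch only)."""

    def __init__(self, num_bands=10, d_model=128, n_heads=8, n_layers=3, proj_dim=1024):
        super().__init__()
        self.embed = nn.Linear(num_bands, d_model)  # per-timeslot band embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Projector used only by the self-supervised loss; downstream tasks
        # consume the 128-d representation, not the 1024-d projection.
        self.projector = nn.Sequential(
            nn.Linear(d_model, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim)
        )

    def forward(self, x):
        # x: (batch, timeslots, bands), e.g. (B, 40, 10)
        h = self.encoder(self.embed(x))  # (B, 40, 128)
        rep = h.mean(dim=1)              # pool over timeslots -> (B, 128)
        return rep, self.projector(rep)

encoder = DPixelEncoder()
optimiser = torch.optim.Adam(encoder.parameters(), lr=1e-4)  # chosen learning rate 0.0001
rep, proj = encoder(torch.randn(4, 40, 10))
print(rep.shape, proj.shape)  # torch.Size([4, 128]) torch.Size([4, 1024])
</syntaxhighlight>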
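Of the two loss options, Barlow Twins has the more compact formulation, so it is the one sketched here with the listed lambda = 0.005. The chosen loss is MMCR, which instead works with nuclear norms over sets of augmented embeddings and is not reproduced here. The standardisation epsilon and the batch/projector sizes in the example call are assumptions.

<syntaxhighlight lang="python">
import torch

def barlow_twins_loss(z1, z2, lam=0.005):
    """Barlow Twins objective for two batches of projector outputs, shape (batch, proj_dim)."""
    n = z1.shape[0]
    # Standardise each projector dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                             # (proj_dim, proj_dim) cross-correlation
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()               # pull matching dimensions to correlation 1
    off_diag = (c - torch.diag(diag)).pow(2).sum()  # decorrelate the rest
    return on_diag + lam * off_diag

loss = barlow_twins_loss(torch.randn(256, 1024), torch.randn(256, 1024))
</syntaxhighlight>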
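At inference time the chosen strategy is to aggregate several augmented views of each pixel by averaging. A hedged sketch follows, assuming the encoder above, a caller-supplied augmentation function (both hypothetical names), and that the aggregation is over representations rather than over downstream predictions.

<syntaxhighlight lang="python">
import torch

@torch.no_grad()
def infer_representation(encoder, d_pixel, augment, n_aug=10):
    """Average the 128-d representations of n_aug augmented views of one d-pixel.

    d_pixel: tensor of shape (timeslots, bands).
    augment: hypothetical augmentation, e.g. re-sampling timeslots or adding noise.
    """
    views = torch.stack([augment(d_pixel) for _ in range(n_aug)])  # (n_aug, T, bands)
    reps, _ = encoder(views)                                       # (n_aug, 128)
    return reps.mean(dim=0)                                        # chosen: average
</syntaxhighlight>

The majority-vote alternative listed above would instead be applied to the downstream classifier's per-view predictions rather than to the representations themselves.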
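The chosen downstream classifier is an MLP with 3 layers over the frozen 128-d representations. A minimal sketch, in which the hidden width and the number of classes are placeholders:

<syntaxhighlight lang="python">
import torch.nn as nn

def make_classifier(rep_dim=128, hidden=256, n_classes=10):
    """3-layer MLP over frozen representations (hidden width and class count are placeholders)."""
    return nn.Sequential(
        nn.Linear(rep_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_classes),
    )
</syntaxhighlight>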