Hyperparameters
Latest revision as of 17:03, 22 May 2025
Chosen values are in bold.

- **Pixel** (not patch) input data for training and inference.
- How many timeslots to sub-sample when creating a d-pixel
  - 16
  - 25
  - **40**
- Representation dimension
  - 64
  - **128**
  - 256
- Numeric precision (bits) for each representation dimension
  - FP8
  - INT8
  - Float16
  - Bfloat16
  - **32 bits** (however, we will need to look at the distribution of representations for each dimension to see whether they can be reduced, and Matryoshka may change things)
- Projector size
  - 0
  - 256
  - 512
  - **1024**
- Loss function
  - Barlow Twins (parameter lambda = 0.005)
  - **MMCR (parameters alpha = 0.005, lambda = 0.005)**
- Learning rate
  - **0.0001**
- Encoder type
  - MLP
  - ResNet50
  - **Transformer**
    - **8 attention heads**
    - **Q, K, V same dimension as representation dimension = 128**
    - **3 layers**
- Number of augmentation pairs to use for each pixel
  - Training
    - **1**
    - 2
  - Inference
    - 1
    - 10
      - majority vote
      - **average**
- Downstream classifier
  - **MLP with 3 layers**
  - Random Forest
  - XGBoost
  - Linear regression
  - Logistic regression
- Seasonal masking
  - Yes
  - No
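The Barlow Twins objective listed above (lambda = 0.005) can be sketched in NumPy; this is a minimal illustration of the standard formulation, not the project's actual training code, and the batch size and epsilon are assumptions:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins loss: push the cross-correlation matrix of two
    augmented views toward the identity matrix.

    z_a, z_b: (batch, dim) embeddings of two augmentations of the same pixels.
    lam: weight on the off-diagonal (redundancy-reduction) term; 0.005 above.
    """
    n, _ = z_a.shape
    # Normalize each representation dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    # Cross-correlation matrix, shape (dim, dim).
    c = z_a.T @ z_b / n
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return on_diag + lam * off_diag
```

Two identical views give a near-zero invariance term, while anti-correlated views are penalized heavily, which is the behavior the loss is chosen for.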
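The chosen Transformer encoder (representation dimension 128, 8 attention heads, 3 layers) implies a 16-dimensional subspace per head. A quick shape sanity check and rough parameter count; the 4x feed-forward expansion is a common default assumed here, not something stated above:

```python
d_model, n_heads, n_layers = 128, 8, 3
d_head = d_model // n_heads  # 16 dims per head; Q, K, V all project from d_model = 128
assert d_model % n_heads == 0  # heads must evenly split the representation

# Per encoder layer (biases and layer norms ignored for brevity):
attn_params = 4 * d_model * d_model        # Wq, Wk, Wv, Wo projections
ffn_params = 2 * d_model * (4 * d_model)   # assumed 4x feed-forward expansion
per_layer = attn_params + ffn_params
total = n_layers * per_layer               # weight count for the 3-layer stack
```

Under these assumptions the encoder is well under a million weights, small enough that the 128-dim representation, not the encoder, dominates storage concerns.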
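At inference time the chosen setting embeds 10 augmentations of each pixel and averages them, rather than majority-voting downstream predictions. A minimal NumPy sketch of both aggregation options; the function names and toy shapes are illustrative assumptions:

```python
import numpy as np

def aggregate_average(reps):
    """Average the per-augmentation embeddings into a single
    representation before classification (the chosen option).

    reps: (n_augmentations, dim) array of embeddings for one pixel.
    """
    return reps.mean(axis=0)

def aggregate_majority(preds):
    """Majority vote over per-augmentation class predictions
    (the alternative option).

    preds: (n_augmentations,) array of integer class labels.
    """
    return np.bincount(preds).argmax()
```

Averaging keeps the decision in representation space and needs only one classifier pass per pixel; majority voting requires classifying each augmentation separately.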