Hyperparameters

Revision as of 12:02, 22 May 2025

- **Pixel**, not patch

- How many timeslots to sub-sample when creating a d-pixel (see the sketch after this list)

   - 16 timeslots
   - 25 timeslots
   - 40 timeslots
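
A minimal sketch of this sub-sampling step, assuming each pixel's time series is stored as a `[timeslots, bands]` array; the function name, shapes, and uniform random sampling are illustrative assumptions, not the tessera implementation:

```python
import numpy as np

def subsample_timeslots(pixel_series, n_slots=16, rng=None):
    """Draw a fixed number of timeslots from one pixel's time series
    (shape [T, n_bands]) to build the d-pixel input; purely illustrative."""
    rng = rng if rng is not None else np.random.default_rng()
    T = pixel_series.shape[0]
    # Sample without replacement, then keep the chosen slots in temporal order.
    idx = np.sort(rng.choice(T, size=min(n_slots, T), replace=False))
    return pixel_series[idx]

# e.g. a year of 10-band observations sub-sampled to 16 timeslots
d_pixel = subsample_timeslots(np.random.rand(73, 10), n_slots=16)  # (16, 10)
```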

- Representation dimension

   - 64, **128**, or 256

- Representation length for each dimension

   - ~~FP8~~
   - ~~INT8~~
   - ~~Float16~~
   - ~~Bfloat16~~
   - **32 bits**
       - look at the distribution of representations for each dimension to see whether their precision can be reduced (sketched below)
       - Matryoshka representation learning may change this
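
A rough sketch of that per-dimension inspection, assuming an `[N, 128]` float32 array of representations; the helper name and the float16 range check are illustrative assumptions, not a tessera utility:

```python
import numpy as np

def per_dimension_stats(reps):
    """Summarise each representation dimension (reps: [N, D] float32) so the
    value distributions can be inspected before trying a lower-precision
    format; thresholds and names are illustrative only."""
    return {
        "min": reps.min(axis=0),
        "max": reps.max(axis=0),
        "std": reps.std(axis=0),
        # crude check: does the per-dimension range stay well inside the
        # float16 representable range (~6.5e4)?
        "fits_float16": np.abs(reps).max(axis=0) < 6.0e4,
    }

stats = per_dimension_stats(np.random.randn(10_000, 128).astype(np.float32))
print(stats["fits_float16"].mean())  # fraction of dimensions safe for float16
```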

- Projector size

   - 0, 256, 512, **1024**
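
A sketch of what a projector head could look like for these sizes, assuming PyTorch and 128-d representations; the two-layer MLP layout and the `make_projector` name are assumptions, not the tessera code, and a projector size of 0 is read here as "no projector":

```python
import torch.nn as nn

def make_projector(rep_dim=128, proj_dim=1024):
    """Projector head mapping the representation into the space seen by the
    SSL loss (proj_dim in {0, 256, 512, 1024}); layout is an assumption."""
    if proj_dim == 0:
        return nn.Identity()  # size 0: the loss acts on the representation itself
    return nn.Sequential(
        nn.Linear(rep_dim, proj_dim),
        nn.BatchNorm1d(proj_dim),
        nn.ReLU(inplace=True),
        nn.Linear(proj_dim, proj_dim),
    )
```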

- Loss function

   - Barlow Twins (parameter lambda = 0.005)
   - **MMCR (parameters alpha=0.005, lambda=0.005)**
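
For reference, a standard PyTorch sketch of the Barlow Twins objective with the lambda = 0.005 setting listed above; the selected MMCR loss is not reproduced here, and this is not the tessera training code:

```python
import torch

def barlow_twins_loss(z_a, z_b, lam=0.005, eps=1e-6):
    """Barlow Twins objective on two projected views z_a, z_b of shape [N, D];
    lam weights the off-diagonal (redundancy-reduction) term."""
    n = z_a.shape[0]
    # normalise every dimension across the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    c = (z_a.T @ z_b) / n                        # D x D cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag
```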

- Learning rate

   - **0.0001**
   - others

- Encoder type (each with its own parameters)

   - MLP
   - ResNet
   - **Transformer**
       - **8 attention heads**
       - Q, K, V same dimension as representation dimension = 128
       - **3 layers**
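
A PyTorch sketch of the bold encoder choice, assuming the d-pixel has already been projected to the 128-d model width and that a representation is obtained by averaging over timeslots; the feed-forward width and the pooling are assumptions, and the Adam optimizer with the bold learning rate of 0.0001 is included for completeness:

```python
import torch
import torch.nn as nn

rep_dim = 128
encoder_layer = nn.TransformerEncoderLayer(
    d_model=rep_dim,       # Q, K, V dimension = representation dimension
    nhead=8,               # 8 attention heads
    dim_feedforward=256,   # assumption: feed-forward width is not given on this page
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)  # 3 layers

# d-pixel batch: (batch, timeslots, rep_dim) after an input projection
x = torch.randn(32, 16, rep_dim)
reps = encoder(x).mean(dim=1)        # pool over timeslots -> (32, 128)

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)  # learning rate 0.0001
```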

- How many augmentation pairs to use for each pixel

   - Training
       - **1**, 2
   - Testing (number of inferences for downstream task)
       - **1**
       - 10 (prioritise this)
           - majority vote
           - **average**
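
A sketch of the 10-inference test-time option, assuming a trained `encoder` and an `augment` function that produces one view of a pixel batch (both placeholders); averaging the representations is the bold choice, whereas majority voting would instead be applied to the downstream classifier's predictions:

```python
import torch

@torch.no_grad()
def infer_with_views(encoder, batch, augment, n_views=10):
    """Run the encoder on n_views augmented copies of the same pixel batch
    and average the resulting representations; purely illustrative."""
    views = [encoder(augment(batch)) for _ in range(n_views)]
    return torch.stack(views).mean(dim=0)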

- Downstream classifier

   - **MLP**
       - Number of layers
           - **3**
   - Random Forest
   - XGBoost
   - Linear regression
   - Logistic regression
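
A sketch of the bold downstream choice, a 3-layer MLP on top of frozen 128-d representations; the hidden width and number of classes are placeholders, not tessera settings:

```python
import torch.nn as nn

def make_downstream_mlp(rep_dim=128, n_classes=10, hidden=256):
    """Three-layer MLP classifier applied to frozen representations."""
    return nn.Sequential(
        nn.Linear(rep_dim, hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, hidden),  nn.ReLU(inplace=True),
        nn.Linear(hidden, n_classes),
    )
```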


   - **Choose the size of the fixed-length representations** based on the distribution of the number of cloudy days in the training data: base length
       - augmentations (sketched after this list)
           - masking of season or some blocks
           - FFT on the pixels
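
A minimal NumPy sketch of the two augmentations listed above: masking one contiguous (seasonal) block of timeslots, and taking an FFT view of the pixel time series. The block length, zero-filling, and use of the magnitude spectrum are illustrative assumptions, not the tessera implementation:

```python
import numpy as np

def mask_season(pixel_series, block_len=16, rng=None):
    """Zero out one contiguous block of timeslots (e.g. a season);
    block length and zero-filling are assumptions."""
    rng = rng if rng is not None else np.random.default_rng()
    T = pixel_series.shape[0]
    start = int(rng.integers(0, max(T - block_len, 1)))
    out = pixel_series.copy()
    out[start:start + block_len] = 0.0
    return out

def fft_view(pixel_series):
    """Magnitude of the FFT along the time axis as an alternative view
    of the same pixel."""
    return np.abs(np.fft.rfft(pixel_series, axis=0))
```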