# Pre-process Data

The Pre-process Data step ensures that the data is in a tool-ingestible format before the [Data Import](/project-setup/build-jobs/2.-data-import.md) step. The dataset must be segregated into folders and stored in an Amazon S3 bucket.

1. [Format Data](#id-1.-format-point-cloud-data) - Point cloud formatted to `.las`
2. [Segregate Data](#id-2.-segregate-data-in-folder-s) - Folder structure
3. [Store Data](#id-3.-data-storage) - Amazon S3 Bucket

## Format Point Cloud Data

* The tool supports LiDAR and RADAR datasets in `.las` format only.
* The source point cloud datasets can be in formats such as `.bin`, `.pcd`, `.json`, or `.txt`; these need to be converted to `.las`.
  * Data in `.bin` format is first converted to `.pcd` and then to `.las`.
    * Read about `.bin` to `.pcd` conversion for the nuScenes dataset [here](https://forum.nuscenes.org/t/how-do-i-convert-nuscenes-lidar-data-to-pcd-file/785/2).
    * CloudCompare can be used to convert `.pcd` to `.las` using the following command:

{% hint style="info" %}
An example of how to convert a `.pcd` file to `.las`:

<pre><code><strong>cloudcompare.CloudCompare -SILENT -O &#x3C;filename>.pcd -C_EXPORT_FMT LAS -SAVE_CLOUDS FILE &#x3C;filename>.las
</strong></code></pre>

{% endhint %}
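
When a dataset contains many `.pcd` files, the command above can be scripted. The following is a minimal sketch, assuming CloudCompare is installed under the same executable name as in the example; `build_convert_command` and `convert_folder` are hypothetical helper names, not part of the tool.

```python
import subprocess
from pathlib import Path

# Executable name taken from the command above; adjust it for your installation.
CLOUDCOMPARE = "cloudcompare.CloudCompare"


def build_convert_command(pcd_path: Path) -> list[str]:
    """Return the CloudCompare command that converts one .pcd file to .las."""
    las_path = pcd_path.with_suffix(".las")
    return [
        CLOUDCOMPARE, "-SILENT",
        "-O", str(pcd_path),
        "-C_EXPORT_FMT", "LAS",
        "-SAVE_CLOUDS", "FILE", str(las_path),
    ]


def convert_folder(folder: Path) -> None:
    """Convert every .pcd file in `folder`, writing each .las next to its source."""
    for pcd_path in sorted(folder.glob("*.pcd")):
        subprocess.run(build_convert_command(pcd_path), check=True)
```

Calling `convert_folder(Path("Folder_1/LiDAR"))` would run one conversion per file and stop at the first failure because of `check=True`.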

## Segregate Data into Folders

The way the data is segregated determines how tasks load on the tool for the labeling experts. The terminology below is used throughout this document to describe how to segregate data correctly.

### Data Terminology

#### **Task**

A task is defined as the labeling work performed on one frame that loads on the annotation tool.&#x20;

#### **Frame**

A frame is a visual dataset that loads on the annotation tool. It contains image data along with its respective sensor data (LiDAR, RADAR, etc.).

#### **Batch**

A batch (or sequence) is the set of frames that load on the annotation tool for a single expert. The size of a batch can vary between 1 and n.

* Submission happens for a Batch.

### Data Reflection on Tool

To understand segregation better, consider the following example:

*There are 100 frames in a sequence that need to be annotated. A single batch should load at most 10 frames. This limit is set through the* [*batch limit*](/project-setup/build-jobs/2.-data-import.md#data-grouping) *at the time of* [*importing data*](/project-setup/build-jobs/2.-data-import.md)*.*
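
The arithmetic of this example can be sketched in a few lines. This is only an illustration of how a batch limit divides a sequence; the actual grouping is performed by the tool during data import, and `split_into_batches` is a hypothetical name.

```python
def split_into_batches(frames, batch_limit):
    """Group frames, in order, into batches of at most `batch_limit` frames."""
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    return [frames[i:i + batch_limit] for i in range(0, len(frames), batch_limit)]


frames = [f"frame_{n}" for n in range(1, 101)]   # 100 frames in the sequence
batches = split_into_batches(frames, batch_limit=10)
print(len(batches), len(batches[0]))             # 10 10
```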

Here is the representation of this batch on the tool.

* There is a batch of 10 frames (BLUE).
* Each frame has 1 point cloud (ORANGE) and 3 camera images (GREEN) linked to it.
* Each camera image is synchronized with the corresponding point cloud (ORANGE) when calibration details are available.
* The LiDAR point cloud may include pre-existing labeled data. (PINK)

<figure><img src="https://lh7-us.googleusercontent.com/docsz/AD_4nXdSn2vl2l2crAa1JgQ1ZT_XXXLRpeGV9mrMjLgS7ZZ40zm3GxVWSRBNN1L8JKTKpxWfFIGOfEN256ROkdERu2CkFajE3N-kBpMz6hABqUBTq8uamKZ2lfx_muxEkFkr3-3XDECdp9xu_BwUBYH0DwgmCEin?key=lGhAQLv5xC3Ciuwf6zBTGQ" alt=""><figcaption></figcaption></figure>

Following the above example, the data folder needs to be reorganized in the following format:

* Respective Camera Data - Green folders
* LiDAR data - Orange folder
* Calibration - Red folder (preferable to have calibration to sync the camera with point cloud for quicker reference)
* Pre-labelled annotation data - Pink folder (when pre-labelled data is available)
* Velocity/ego vehicle data - White folder (optional based on output required)

### Preparing Data Folder

* The data can be stored all together in one folder or split across multiple folders.
  * Option 1: Prepare one folder with the data for all 100 frames.
    * Ideally, all frames belonging to a single sequence should be stored together.
  * Option 2: Prepare multiple folders to divide the data for the 100 frames.
    * This is useful when the frames in a batch are not sequenced.
* The folder name should not contain any spaces, e.g. `Folder_1`.

<div align="center"><figure><img src="https://lh7-us.googleusercontent.com/docsz/AD_4nXd3x8rGkB-FTlzRcpmdQf1i97UvTkbIR8ah06tWL9rkVFJa9rn_3s6miDMnL1PggYRq-VOzxwI1pVE_m98-kbq1BFbwHVVU9btFhbfTRKx2TYvOmMW28ec6AdP4Ngf1_MWq8_k61NGMplCiYEqW7HpDc0nJ?key=lGhAQLv5xC3Ciuwf6zBTGQ" alt="" width="375"><figcaption></figcaption></figure></div>

#### Step 1: Create Camera Folders

* Create an image folder for each camera sensor in `Folder_1`. For example, `Camera 1`, `Camera 2`, `Camera 3` ... `Camera n`.
* The camera folder name is shown with the respective camera images on the annotation tool. Name the camera folders appropriately to give context to the subject expert.
* Image files should be in `.jpeg` or `.png` format.
* Each camera folder should contain all the images belonging to that camera sensor across all the frames stored in `Folder_1`.
* Image files in the camera folders should have identical names if they were captured at the same instant. For example, for `Frame 1`, the image in each of `Camera 1`, `Camera 2`, and `Camera 3` is saved as `xyz_timestamp1.jpeg`.

#### Step 2: Create LiDAR Folder

* Create a folder containing files of the point cloud data across all the frames in `Folder_1`.
* The point cloud folder can be named arbitrarily. For consistency, consider naming it **LiDAR** or **PCT.**
* The point cloud data must be in **.las** format, as this is the supported format for the annotation tool. Refer to the guide on [Format LiDAR Data](#a-format-point-cloud-data) for more details.
* The folder should include all the point cloud files corresponding to all frames, organized within `Folder_1`
* The point cloud file in the LiDAR folder must have the same name as the images from the cameras associated with that frame. For instance, if the image files for all three cameras (stored in their respective camera folders, i.e. `Camera 1`, `Camera 2`, and `Camera 3`) are named `xyz_timestamp1.jpeg` for Frame 1, then the point cloud file for that frame in the LiDAR folder must be named `xyz_timestamp1.las`.
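
The naming rule above can be checked before upload. The sketch below is a hypothetical helper (not part of the tool) that assumes the folder layout described in Steps 1 and 2 and reports, for every `.las` file, which camera folders lack an image with the same base name.

```python
from pathlib import Path

IMAGE_EXTENSIONS = (".jpeg", ".png")  # image formats named in Step 1


def find_unmatched_frames(root, camera_folders, lidar_folder="LiDAR"):
    """Return {frame_name: [camera folders missing an image]} for each .las file."""
    root = Path(root)
    unmatched = {}
    for las_file in sorted((root / lidar_folder).glob("*.las")):
        missing = [
            camera
            for camera in camera_folders
            if not any(
                (root / camera / f"{las_file.stem}{ext}").exists()
                for ext in IMAGE_EXTENSIONS
            )
        ]
        if missing:
            unmatched[las_file.stem] = missing
    return unmatched
```

An empty result from `find_unmatched_frames("Folder_1", ["Camera 1", "Camera 2", "Camera 3"])` means every point cloud has a same-named image in each camera folder.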

#### Step 3: Create Calibration Folder

* When camera sensors and point cloud files are available, a calibration file may also be present. If it exists, create a dedicated folder to store all calibration data in **.json** format.
* This folder must be named `calibration`.
* If the calibration is identical for all camera sensors, store a single calibration file in it, named `calibration.json`.
* If the calibration data differs between camera sensors, then either:
  * Prepare one separate file for each camera sensor and save it under the `calibration` folder, OR
  * Prepare one `.json` file with a separate block for each camera calibration.
* Compute the calibration matrix by multiplying the `camera_intrinsic` matrix by the inverse of the `camera_extrinsic` matrix:

```
calibration_matrix = camera_intrinsic * inverse(camera_extrinsic)
```
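
A minimal NumPy sketch of this formula follows. It assumes a 3x3 intrinsic matrix that is embedded in a 4x4 identity so it can be multiplied with a 4x4 extrinsic matrix; these shapes are an assumption, as this guide does not state them. The result is flattened in column-major order, the order used for `elements` in the calibration file. `calibration_elements` is a hypothetical name.

```python
import numpy as np


def calibration_elements(camera_intrinsic, camera_extrinsic):
    """Return the 16 column-major elements of intrinsic * inverse(extrinsic)."""
    intrinsic_4x4 = np.eye(4)
    intrinsic_4x4[:3, :3] = np.asarray(camera_intrinsic, dtype=float)  # assumed 3x3
    extrinsic = np.asarray(camera_extrinsic, dtype=float)              # assumed 4x4
    calibration_matrix = intrinsic_4x4 @ np.linalg.inv(extrinsic)
    return calibration_matrix.flatten(order="F").tolist()  # "F" = column-major
```

With an identity extrinsic, the output is simply the padded intrinsic matrix listed column by column, which is a quick way to confirm the element order.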

<details>

<summary>JSON format for the calibration file:</summary>

<pre class="language-json"><code class="lang-json"><strong>{
</strong>   "matrices": [
       {
           "fromWorld": {
               "elements": [ // All elements are listed in column-major order
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0,
                   0
               ]
           },
           "name": "LIDAR_TOP"
       },
       {
           "fromWorld": {
               "elements": [
                   839.3693313296216,
                   480.50874358343594,
                   0.9999117986552755,
                   0,
                   -1244.2586937950568,
                   20.203096824064712,
                   0.010155927635204103,
                   0,
                   -8.2467494447129,
                   -1248.651045792533,
                   0.008558740785910282,
                   0,
                   -1427.154718970285,
                   1039.0897354143844,
                   -1.7346966604269405,
                   1
               ]
           },
           "name": "CAM_FRONT"
       }
   ]
}
</code></pre>

</details>
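
The single-file option with one block per sensor can be generated programmatically. The sketch below mirrors the layout shown above; `build_calibration` is a hypothetical helper, and the identity matrix is only a placeholder for real calibration values.

```python
import json


def build_calibration(matrices):
    """Map {sensor_name: 16 column-major elements} to the calibration layout."""
    entries = []
    for name, elements in matrices.items():
        if len(elements) != 16:
            raise ValueError(f"{name}: expected 16 elements, got {len(elements)}")
        entries.append({"fromWorld": {"elements": list(elements)}, "name": name})
    return {"matrices": entries}


identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]  # placeholder values
calibration = build_calibration({"LIDAR_TOP": identity, "CAM_FRONT": identity})
print(json.dumps(calibration, indent=2))  # save this output as calibration.json
```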

#### Step 4: Create a Folder for Pre-Labelled Annotations

1. Prepare the [data folder structure](#preparing-data-folder).
2. Store the pre-labelled annotations in the `lidar_annotation` folder.

For example, if you have 100 frames, the corresponding files should be named `1.json`, `2.json`, `3.json`, and so on up to `100.json`.

{% hint style="info" %}
Pre-labelled annotations are stored as JSON files, one JSON file per frame.
{% endhint %}

The following JSON schema describes the pre-labelled files. It supports cuboids, 2D bounding boxes, and 3D polylines.

<details>

<summary>JSON schema for the pre-labelled file:</summary>

<pre class="language-json" data-line-numbers><code class="lang-json">{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Annotation File",
  "type": "object",
  "properties": {
    "annotations": {
      "type": "array",
      "items": {
        "type": "object",
        "allOf": [
          {
            "if": {
              "properties": {
                "object_type": {
                  "const": "rectangle"
                }
              }
            },
            "then": {
              "required": [
                "object_type",
                "geometry"
              ],
              "properties": {
                "object_type": {
                  "const": "rectangle"
                },
                "class": {
                  "type": "string"
                },
                "identity": {
                  "type": [
                    "integer",
                    "string"
                  ]
                },
                "reference_folder": {
                  "type": "string"
                },
                "geometry": {
                  "type": "object",
                  "properties": {
                    "coordinates": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "x": {
                            "type": "number"
                          },
                          "y": {
                            "type": "number"
                          }
                        },
                        "required": [
                          "x",
                          "y"
                        ]
                      },
                      "minItems": 4,
                      "maxItems": 4
                    }
                  },
                  "required": [
                    "coordinates"
                  ]
                },
                "taxonomy_attribute": {
                  "type": "object"
                },
                "isGeometryKeyFrame": {
                  "type": "boolean"
                },
                "origin": {
                  "type": "string"
                },
                "prelabel": {
                  "type": "object",
                  "properties": {
                    "modelName": {
                      "type": "string"
                    },
                    "modelVersion": {
                      "type": "string"
                    },
                    "confidenceScore": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          },
          {
            "if": {
              "properties": {
                "object_type": {
                  "const": "cuboid"
                }
              }
            },
            "then": {
              "required": [
                "object_type",
                "geometry"
              ],
              "properties": {
                "object_type": {
                  "const": "cuboid"
                },
                "id": {
                  "type": "string"
                },
                "class": {
                  "type": "string"
                },
                "identity": {
                  "type": [
                    "integer",
                    "string"
                  ]
                },
                "classId": {
                  "type": [
                    "string",
                    "integer"
                  ]
                },
                "geometry": {
                  "type": "object",
                  "properties": {
                    "position": {
                      "type": "object",
                      "properties": {
                        "x": {
                          "type": "number"
                        },
                        "y": {
                          "type": "number"
                        },
                        "z": {
                          "type": "number"
                        }
                      },
                      "required": [
                        "x",
                        "y",
                        "z"
                      ]
                    },
                    "rotation": {
                      "type": "object",
                      "properties": {
                        "x": {
                          "type": "number"
                        },
                        "y": {
                          "type": "number"
                        },
                        "z": {
                          "type": "number"
                        }
                      },
                      "required": [
                        "x",
                        "y",
                        "z"
                      ]
                    },
                    "boxSize": {
                      "type": "object",
                      "properties": {
                        "x": {
                          "type": "number"
                        },
                        "y": {
                          "type": "number"
                        },
                        "z": {
                          "type": "number"
                        }
                      },
                      "required": [
                        "x",
                        "y",
                        "z"
                      ]
                    }
                  },
                  "required": [
                    "position",
                    "rotation",
                    "boxSize"
                  ]
                },
                "taxonomy_attribute": {
                  "type": "object"
                },
                "isGeometryKeyFrame": {
                  "type": "boolean"
                },
                "isAttributeKeyFrame": {
                  "type": "boolean"
                },
                "origin": {
                  "type": "string"
                },
                "prelabel": {
                  "type": "object",
                  "properties": {
                    "modelName": {
                      "type": "string"
                    },
                    "modelVersion": {
                      "type": "string"
                    },
                    "confidenceScore": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          },
          {
            "if": {
              "properties": {
                "object_type": {
                  "const": "polyline"
                }
              }
            },
            "then": {
              "required": [
                "object_type",
                "geometry"
              ],
              "properties": {
                "object_type": {
                  "const": "polyline"
                },
                "id": {
                  "type": "string"
                },
                "identity": {
                  "type": [
                    "integer",
                    "string"
                  ]
                },
                "geometry": {
                  "type": "object",
                  "properties": {
                    "points": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "position": {
                            "type": "object",
                            "properties": {
                              "x": {
                                "type": "number"
                              },
                              "y": {
                                "type": "number"
                              },
                              "z": {
                                "type": "number"
                              }
                            },
                            "required": [
                              "x",
                              "y",
                              "z"
                            ]
                          }
                        },
                        "required": [
                          "position"
                        ]
                      }
                    },
                    "thickness": {
                      "type": "number"
                    }
                  },
                  "required": [
                    "points",
                    "thickness"
                  ]
                },
                "taxonomy_attribute": {
                  "type": "object"
                },
                "origin": {
                  "type": "string"
                },
                "prelabel": {
                  "type": "object",
                  "properties": {
                    "modelName": {
                      "type": "string"
                    },
                    "modelVersion": {
                      "type": "string"
                    },
                    "confidenceScore": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          }
        ]
      }
    }
  },
  "required": [
    "annotations"
  ]
<strong>}
</strong></code></pre>

</details>
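
Before uploading, the required geometry fields from the schema can be checked with a few lines of standard-library Python. This is a simplified sketch that covers only the `required` geometry keys and the four-corner rule for rectangles, not the full schema; `check_annotation` is a hypothetical name.

```python
# Required geometry keys per object_type, copied from the schema above.
REQUIRED_GEOMETRY_KEYS = {
    "rectangle": {"coordinates"},
    "cuboid": {"position", "rotation", "boxSize"},
    "polyline": {"points", "thickness"},
}


def check_annotation(annotation):
    """Return a list of problems found in one annotation (empty if none)."""
    object_type = annotation.get("object_type")
    if object_type not in REQUIRED_GEOMETRY_KEYS:
        return [f"unsupported or missing object_type: {object_type!r}"]
    geometry = annotation.get("geometry")
    if not isinstance(geometry, dict):
        return ["missing geometry object"]
    missing = REQUIRED_GEOMETRY_KEYS[object_type] - geometry.keys()
    problems = [
        f"geometry.{key} is required for {object_type}" for key in sorted(missing)
    ]
    # The schema fixes rectangles at exactly four corner points.
    if object_type == "rectangle" and "coordinates" in geometry:
        if len(geometry["coordinates"]) != 4:
            problems.append("rectangle needs exactly 4 coordinates")
    return problems
```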

**Sample snippets for Cuboid, 2D Bbox, and 3D Polyline**

{% tabs %}
{% tab title="Cuboid" %}
{% code lineNumbers="true" %}

```json
{
  "annotations": [
    {
      "id" : "10e3547e-0ffc-11f0-beff-c9af1eb6b655", // Optional - UUIDv4 ID for all annotations across sequence
      "class": "car",
      "object_type": "cuboid",
      "taxonomy_attribute": {},
      "geometry": {
        "position": { // Position in metres
          "x": 5.992325288342432,
          "y": 4.904602559666202,
          "z": 1.5813289166406617
        },
        "rotation": { // Rotation in Euler angles (in radian) with ZYX order 
          "x": 0,
          "y": 0,
          "z": 0
        },
        "boxSize": { 
          "x": 4.837695807772452,
          "y": 4.519224086476998,
          "z": 2.4204508776208957
        }
      },
      "identity": 1, // Identity must start from 1
      "isGeometryKeyFrame": true, // Geometry key frame remains untouched with interpolation 
      "origin": "Customer", // With the 'Customer' label, the labeller is able to identify prelabels on the point cloud tool, and the origin of the label
      "prelabel": {
        "modelName": "Pvrcnn",// Optional - To help collect statistics, model name
        "modelVersion": "v1",// Optional - To help collect statistics, model version
        "confidenceScore": "0.8" // Optional - Confidence score of annotation
      }
    }
  ]
}
```

{% endcode %}
{% endtab %}

{% tab title="2D Bbox" %}

```json
{
  "annotations": [
    {
      "class": "Vehicle",
      "object_type": "rectangle",
      "identity": 3,
      "reference_folder": "rearward_left_bottom_medium_ID060",
      "geometry": {
        "coordinates": [
          {
            "x": 855.318048,
            "y": 2158.155112
          },
          {
            "x": 855.318048,
            "y": 861.2488400000001
          },
          {
            "x": 2784.83608,
            "y": 861.2488400000001
          },
          {
            "x": 2784.83608,
            "y": 2158.155112
          }
        ]
      },
      "taxonomy_attribute": {},
      "isGeometryKeyFrame": true, // Geometry key frame remains untouched with interpolation 
      "origin": "Customer", // With the 'Customer' label, the labeller is able to identify prelabels on the point cloud tool, and the origin of the label
      "prelabel": {
        "modelName": "Pvrcnn",// Optional - To help collect statistics, model name
        "modelVersion": "v1",// Optional - To help collect statistics, model version
        "confidenceScore": "0.8"// Optional - Confidence score of annotation
      }
    }
  ]
}
```

{% endtab %}

{% tab title="3D Polyline" %}
{% code lineNumbers="true" %}

```json
{
  "annotations": [
  
    {
      "id" : "10e2447e-0ffc-11f0-b0dd-c9af1eb6b655", // Optional - UUIDv4 ID for all annotations across sequence
      "class": "lane",
      "object_type": "polyline",
      "taxonomy_attribute": {},
      "geometry": {

        "points": [
          {
            "position": {
              "x": -5.910736083984375,
              "y": 0.7063865661621094,
              "z": -0.001922607421875
            }
          },
          {
            "position": {
              "x": 4.194019317626953,
              "y": -2.124612808227539,
              "z": 0.3410773277282715
            }
          },
          {
            "position": {
              "x": -3.438983917236328,
              "y": -7.768611907958984,
              "z": 2.552077293395996
            }
          },
          {
            "position": {
              "x": -11.68572998046875,
              "y": -6.034612655639648,
              "z": 5.980077266693115
            }
          },
          {
            "position": {
              "x": -10.565982818603516,
              "y": -5.695611953735352,
              "z": 3.379077434539795
            }
          },
          {
            "position": {
              "x": -7.793731689453125,
              "y": -2.306612014770508,
              "z": 0.15207719802856445
            }
          },
          {
            "position": {
              "x": -17.27798080444336,
              "y": -7.60261344909668,
              "z": 2.255077362060547
            }
          }
        ],
        "thickness": 0
      },
      "identity": 1,   // Identity must start from 1
      "origin": "Customer", // With the 'Customer' label, the labeller is able to identify prelabels on the point cloud tool, and the origin of the label
      "prelabel": {
        "modelName": "Pvrcnn",// Optional - To help collect statistics, model name
        "modelVersion": "v1",// Optional - To help collect statistics, model version
        "confidenceScore": "0.8"// Optional - Confidence score of annotation
      }
    }
  ]
}


```

{% endcode %}
{% endtab %}
{% endtabs %}
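
The samples above use `//` comments for explanation, which strict JSON parsers reject. Generating the files programmatically avoids that issue. The following is a minimal sketch for one cuboid, assuming the field names shown in the Cuboid tab; `make_cuboid` is a hypothetical helper and the numeric values are illustrative.

```python
import json


def make_cuboid(identity, label, position, rotation, box_size):
    """Build one cuboid annotation; each vector is an (x, y, z) tuple."""
    def xyz(values):
        x, y, z = values
        return {"x": x, "y": y, "z": z}

    return {
        "class": label,
        "object_type": "cuboid",
        "taxonomy_attribute": {},
        "geometry": {
            "position": xyz(position),   # metres
            "rotation": xyz(rotation),   # Euler angles in radians
            "boxSize": xyz(box_size),
        },
        "identity": identity,            # identities start from 1
        "isGeometryKeyFrame": True,
        "origin": "Customer",
    }


frame = {
    "annotations": [
        make_cuboid(1, "car", (5.99, 4.90, 1.58), (0, 0, 0), (4.84, 4.52, 2.42))
    ]
}
print(json.dumps(frame, indent=2))  # plain JSON, one such document per frame
```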

#### Step 5: Create a Folder for Ego Data

Ego pose data is used to enable features such as vehicle velocity and [merged point cloud](/annotation-tool/key-features/merged-point-cloud.md). To calculate the reference velocity of objects around the ego vehicle, the ego data for each frame should be provided with the dataset.

* Create a folder within `Folder_1` containing files of the ego data for each frame.
* This folder must be named `ego_data`.
* The ego data file in the `ego_data` folder must have the same name as the point cloud file corresponding to that frame. For example, if the point cloud file for `Frame 1` is named `xyz_timestamp.las` in the LiDAR folder, then the ego data file should be named `xyz_timestamp.json` in the `ego_data` folder.
* To capture the **velocity** of objects around the ego vehicle, each file within the folder must include the `timestamp_epoch_ns` information.
  * `timestamp_epoch_ns` is the timestamp at which the frame was captured.
  * It is represented as a Unix epoch timestamp in nanoseconds (ns).
* To facilitate the **merged point cloud** functionality, the ego data for each frame is either calculated using the vanilla ICP registration algorithm or provided with the dataset, in which case it needs to be placed in the `ego_data` folder.

<details>

<summary>Snippet of the ego data</summary>

<pre class="language-json"><code class="lang-json"><strong>{
</strong> "ego": {
   "timestamp_epoch_ns": 50904429,
   "utmHeading_deg": 0.0,
   "utmX_m": 0.0,
   "utmY_m": 0.0,
   "utmZ_m": 0.0
 }
}
</code></pre>

</details>
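
A sketch of producing one ego data file per frame is shown below. It assumes the field names from the snippet above and the file naming rule from this step; `ego_record` and `ego_path` are hypothetical helper names.

```python
import json
from pathlib import Path


def ego_record(timestamp_epoch_ns, heading_deg=0.0, x_m=0.0, y_m=0.0, z_m=0.0):
    """Build the content of one ego data file, using the keys from the snippet."""
    return {
        "ego": {
            "timestamp_epoch_ns": int(timestamp_epoch_ns),  # Unix epoch, nanoseconds
            "utmHeading_deg": heading_deg,
            "utmX_m": x_m,
            "utmY_m": y_m,
            "utmZ_m": z_m,
        }
    }


def ego_path(root, point_cloud_file):
    """Return ego_data/<same base name as the point cloud file>.json under root."""
    return Path(root) / "ego_data" / (Path(point_cloud_file).stem + ".json")


record = ego_record(50904429)
target = ego_path("Folder_1", "xyz_timestamp.las")
print(target, json.dumps(record))
```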

* This file should include the `utmHeading_deg`, `utmX_m`, `utmY_m`, and `utmZ_m` information.
  * Translation (x, y, z):
    * `prev_utmX_m`: The distance the object has moved along the x-axis (in meters) with respect to the 1st frame.
    * `prev_utmY_m`: The distance the object has moved along the y-axis (in meters) with respect to the 1st frame.
    * `prev_utmZ_m`: The distance the object has moved along the z-axis (in meters) with respect to the 1st frame.
  * Rotation (yaw, pitch, roll):
    * `prev_utmHeading_deg`: The angle of rotation around the yaw axis (in degrees) with respect to the 1st frame.
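
If the source poses are absolute, they can be re-expressed with respect to the first frame. The sketch below makes two assumptions that this guide does not confirm: that "with respect to the 1st frame" means a plain offset along the UTM axes, and that the heading difference should be wrapped to the range [-180, 180). `relative_to_first` is a hypothetical name, and the output uses the key names from the ego data snippet.

```python
def relative_to_first(poses):
    """Offset each pose by the first frame's pose (assumed interpretation)."""
    first = poses[0]
    relative = []
    for pose in poses:
        heading = pose["utmHeading_deg"] - first["utmHeading_deg"]
        heading = (heading + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        relative.append({
            "utmX_m": pose["utmX_m"] - first["utmX_m"],
            "utmY_m": pose["utmY_m"] - first["utmY_m"],
            "utmZ_m": pose["utmZ_m"] - first["utmZ_m"],
            "utmHeading_deg": heading,
        })
    return relative


poses = [
    {"utmX_m": 100.0, "utmY_m": 200.0, "utmZ_m": 10.0, "utmHeading_deg": 350.0},
    {"utmX_m": 103.0, "utmY_m": 204.0, "utmZ_m": 10.0, "utmHeading_deg": 10.0},
]
print(relative_to_first(poses)[1])
```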


