Pre-process Data
The Pre-process Data step helps ensure that the data is ready to be uploaded in a tool-ingestible format for the Data Import step. The dataset must be segregated into folders and stored in an Amazon S3 bucket:
Format Data - Point cloud formatted to .las
Segregate Data - Folder structure
Store Data - Amazon S3 Bucket
Format Point Cloud Data
The tool supports LiDAR & RaDar datasets in .las format only. The point cloud datasets can be in formats such as .bin, .pcd, .json, or .txt. Data in .bin format will first be converted to .pcd and then further converted to .las. Read about .bin to .pcd conversion for the NuScenes dataset here. CloudCompare can be used to convert .pcd to .las using the following command:
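The exact invocation depends on the CloudCompare version installed; a typical headless conversion in CloudCompare's command-line mode looks like this (input.pcd is a placeholder path):

CloudCompare -SILENT -O input.pcd -C_EXPORT_FMT LAS -SAVE_CLOUDS

Alternatively, for scripted pipelines, the same conversion can be sketched in Python, assuming the open3d and laspy libraries are available (paths, offsets and scales below are illustrative placeholders):

import numpy as np
import open3d as o3d
import laspy

# Read the .pcd point cloud and extract its XYZ coordinates.
pcd = o3d.io.read_point_cloud("input.pcd")  # placeholder path
points = np.asarray(pcd.points)

# Build the .las file; point format, offsets and scales are illustrative.
header = laspy.LasHeader(point_format=3, version="1.2")
header.offsets = points.min(axis=0)
header.scales = np.array([0.001, 0.001, 0.001])
las = laspy.LasData(header)
las.x, las.y, las.z = points[:, 0], points[:, 1], points[:, 2]
las.write("output.las")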
Segregate Data into Folders
The way the data is segregated impacts the visibility of tasks that load on the tool for the labeling experts. Below is the terminology used frequently in this document to describe how to segregate data correctly.
Data Terminology
Task
A task is defined as the labeling work performed on one frame that loads on the annotation tool.
Frame
A frame is a visual dataset that loads on the annotation tool, consisting of image data along with its respective sensor data (LiDAR, RaDar, etc.)
Batch
A batch (or sequence) is the collective set of multiple frames that loads on the annotation tool for a single expert. The size of a batch can vary between 1 and n.
Submission happens for a Batch.
Data Reflection on Tool
To understand segregation better, consider the following example:
There are 100 frames in a sequence which need to be annotated. The desired number of frames that a single batch should load is 10 at most. This limit on the number of frames in a batch that loads on the tool is set via the batch limit at the time of importing data.
Here is the representation of this batch on the tool.
There is a batch of 10 frames (BLUE).
Each frame has 1 point cloud (ORANGE) and 3 camera images (GREEN) linked to it.
Each camera is synced with the point cloud (ORANGE); the images will be synchronized with the corresponding point cloud based on the availability of calibration details.
The LiDAR point cloud may include pre-existing labeled data. (PINK)
Using the above example, the Data Folder will need to be reorganized in the following format:
Respective Camera Data - Green folders
LiDAR data - Orange folder
Calibration - Red folder (preferable to have calibration to sync the camera with point cloud for quicker reference)
Pre-labelled annotation data - Pink folder (when pre-labelled data is available)
Velocity/ Ego vehicle data - White folder (optional based on output required)
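Putting these together, an illustrative directory tree for a single sequence folder is shown below. All file and folder names are placeholders following the conventions described in the steps that follow; the pre-label folder name in particular is hypothetical, since only the calibration and ego_data folder names are fixed by the tool.

Folder_1/
├── Camera_1/
│   ├── xyz_timestamp1.jpeg
│   └── xyz_timestamp2.jpeg
├── Camera_2/
│   └── ...
├── Camera_3/
│   └── ...
├── LiDAR/
│   ├── xyz_timestamp1.las
│   └── xyz_timestamp2.las
├── calibration/
│   └── calibration.json
├── pre_labels/            (optional; hypothetical name)
└── ego_data/
    ├── xyz_timestamp1.json
    └── xyz_timestamp2.json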
Preparing Data Folder
The data can be stored all together in one folder or across multiple folders.
Option 1: Prepare 1 folder with all 100 frames data
Ideally all frames belonging to a single sequence should be stored together.
Option 2: Prepare multiple folders to divide the 100 frames' data.
This is useful when the frames in a batch are not sequenced.
The folder name should not contain any spaces, e.g. Folder_1
Step 1: Create Camera Folders
Create image folders for each camera sensor respectively in Folder_1. For example, Camera 1, Camera 2, Camera 3 ... Camera n.
The Camera folder name will reflect on the respective camera images fetched on the annotation tool. Hence, ensure the camera folders are named appropriately to help provide context to the subject expert.
The image files should be in .jpeg or .png format. Each camera folder should contain all the images belonging to that camera sensor across all the frames stored in Folder_1.
The image files in the Camera folders should have identical names if they were captured at the same instance. For example, in Frame 1, Camera 1, Camera 2 and Camera 3 will all have the image filename saved as xyz_timestamp1.jpeg.
Step 2: Create LiDAR Folder
Create a folder containing files of the point cloud data across all the frames in Folder_1. The point cloud folder can be named arbitrarily; for consistency, consider naming it LiDAR or PCT.
The point cloud data must be in .las format, as this is the supported format for the annotation tool. Refer to the guide on Format LiDAR Data for more details.
The folder should include all the point cloud files corresponding to all frames, organized within Folder_1.
The point cloud file in the LiDAR folder must have the same name as the images from the cameras associated with that frame. For instance, if the image files for all three cameras (stored in their respective camera folders, i.e. Camera 1, Camera 2 and Camera 3) are named xyz_timestamp1.jpeg for Frame 1, then the point cloud file for that frame in the LiDAR folder must be named xyz_timestamp1.las.
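Because a single mismatched filename breaks the frame linkage between images and point clouds, it can be worth sanity-checking the naming convention before upload. The following is a minimal sketch under the layout assumed above (the folder names are placeholders for your actual camera and point cloud folders):

import os

SEQUENCE_DIR = "Folder_1"                           # placeholder sequence folder
CAMERA_DIRS = ["Camera_1", "Camera_2", "Camera_3"]  # placeholder camera folders
LIDAR_DIR = "LiDAR"                                 # arbitrary name, per Step 2

def frame_names(subdir, extensions):
    # Collect file basenames (without extension) for one sensor folder.
    folder = os.path.join(SEQUENCE_DIR, subdir)
    return {os.path.splitext(name)[0]
            for name in os.listdir(folder)
            if name.lower().endswith(extensions)}

lidar_frames = frame_names(LIDAR_DIR, (".las",))
for cam in CAMERA_DIRS:
    image_frames = frame_names(cam, (".jpeg", ".png"))
    missing = sorted(lidar_frames - image_frames)  # frames with no image
    extra = sorted(image_frames - lidar_frames)    # images with no point cloud
    if missing or extra:
        print(f"{cam}: missing={missing} extra={extra}")
    else:
        print(f"{cam}: OK ({len(image_frames)} frames)")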
Step 3: Create Calibration Folder
When camera sensors and point cloud files are available, a calibration file may also be present. If it exists, create a dedicated folder to store all calibration data in .json format.
This folder must be named calibration. If the calibration is identical for all camera sensors, then store one calibration file in it called calibration.json.
If the calibration data varies across camera sensors, then either:
Prepare one separate file for each camera sensor and save it under the calibration folder, OR
Prepare one .json file with a separate block for each camera calibration.
Compute the calibration matrix by multiplying the cam_intrinsic matrix with the inverse of camera_extrinsic, as sketched below.
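Here is a minimal sketch of that computation, assuming a calibration.json in which each camera block carries a 3x3 cam_intrinsic matrix and a 4x4 camera_extrinsic matrix; the key names and shapes are assumptions rather than a confirmed schema, and padding the intrinsic matrix to 3x4 is one common convention:

import json
import numpy as np

with open("calibration/calibration.json") as f:
    calib = json.load(f)  # assumed: one block per camera

for camera, params in calib.items():
    K = np.array(params["cam_intrinsic"])     # 3x3 intrinsic (assumed key)
    E = np.array(params["camera_extrinsic"])  # 4x4 extrinsic (assumed key)
    K_h = np.hstack([K, np.zeros((3, 1))])    # pad intrinsic to 3x4
    P = K_h @ np.linalg.inv(E)                # 3x4 calibration matrix
    print(camera, P)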
Step 4: Create a folder for Pre-Labelled Annotations
The pre-labelled dataset needs to be in iMerit's Annotation Tool output format. Get in touch with iMerit in case of labeling requirements for pre-labelled datasets.
Step 5: Create a folder for ego data
Ego pose data is used to enable features such as vehicle velocity, merged point cloud, etc. To calculate the reference velocity of objects around the ego vehicle, the ego data information for each frame should be provided with the dataset.
Create a folder within Folder_1 containing files of the ego data for each frame. This folder must be named ego_data.
The ego data files in the ego_data folder must have the same name as the point cloud file corresponding to that frame. For example, if the point cloud file for Frame 1 is named xyz_timestamp.las in the LiDAR folder, then the ego data file should also be named xyz_timestamp.json in the ego_data folder.
To capture the velocity of objects around the ego vehicle, each file within the folder must include the "timestamp_epoch_ns" information.
timestamp_epoch_ns is the timestamp at which each frame is captured. It is represented as a Unix epoch timestamp in nanoseconds (ns).
To facilitate the merge point cloud functionality, the ego data information for each frame is either calculated using the ICP Vanilla registration algorithm or provided with the dataset, in which case it needs to be placed in the ego_data folder.
This file should include the "utmHeading_deg", "utmX_m", "utmY_m" and "utmZ_m" information.
Translation (x, y, z):
prev_utmX_m: The distance the object has moved along the x-axis (in meters) with respect to the 1st frame.
prev_utmY_m: The distance the object has moved along the y-axis (in meters) with respect to the 1st frame.
prev_utmZ_m: The distance the object has moved along the z-axis (in meters) with respect to the 1st frame.
Rotation (yaw, pitch, roll):
prev_utmHeading_deg: The angle of rotation around the yaw axis (in degrees) with respect to the 1st frame.
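For illustration, a single ego data file (e.g. xyz_timestamp.json) combining the fields above might look like the following; the values are made-up placeholders and the exact schema should be confirmed with iMerit:

{
  "timestamp_epoch_ns": 1533151603547590400,
  "utmHeading_deg": 92.4,
  "utmX_m": 600245.12,
  "utmY_m": 4142683.55,
  "utmZ_m": 12.07,
  "prev_utmHeading_deg": 0.8,
  "prev_utmX_m": 1.52,
  "prev_utmY_m": 0.04,
  "prev_utmZ_m": 0.0
}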
Data Storage
By default, iMerit provides Cross-account IAM roles for programmatic access to upload customer data into a predesignated S3 bucket.
These steps are for a customer trying to upload data.
Create an IAM role or user in the customer AWS account (role_1).
Give role_1 permission to download (GetObject) and upload (PutObject) objects to and from the predefined S3 bucket.
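As an illustrative sketch only (the bucket name is a placeholder; the actual bucket and role details are provided by iMerit), the permission policy attached to role_1 could look like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::predesignated-bucket/*"
    }
  ]
}

Objects can then be uploaded programmatically, for example via the AWS CLI or an SDK such as boto3.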