Pre-process Data
The Pre-process Data step helps ensure that the data is ready to be uploaded in a format the tool can ingest. The dataset must be segregated into folders and stored in an Amazon S3 bucket:
- Point cloud formatted to .las
- Folder structure
- Amazon S3 Bucket
The tool supports LiDAR and RaDar datasets in .las format only.
The point cloud datasets can be in formats such as .bin, .pcd, .json, or .txt.
Data in .bin format will first be converted to .pcd and then further converted to .las.
Read about .bin to .pcd conversion for the NuScenes dataset.
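As a rough illustration (not the official converter), the snippet below shows how a nuScenes-style .bin sweep could be rewritten as a .pcd file using NumPy and Open3D; the assumed point layout (x, y, z, intensity, ring as float32) and the file names are assumptions.

```python
# Illustrative sketch only: convert a nuScenes-style .bin LiDAR sweep to .pcd.
# Assumes each point is stored as five float32 values: x, y, z, intensity, ring.
import numpy as np
import open3d as o3d

points = np.fromfile("sample_sweep.bin", dtype=np.float32).reshape(-1, 5)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points[:, :3])  # keep only x, y, z

o3d.io.write_point_cloud("sample_sweep.pcd", pcd)
```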
CloudCompare can be used to convert .pcd to .las from its command line.
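A typical headless invocation, following CloudCompare's documented command-line flags, might look like this (file names are placeholders; verify the flags against your CloudCompare version):

```
CloudCompare -SILENT -O input.pcd -C_EXPORT_FMT LAS -SAVE_CLOUDS
```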
The way the data is segregated impacts the visibility of tasks that load on the tool for the labeling experts. Below is the terminology that will be frequently used in this document to segregate data correctly.
A task is defined as the labeling work performed on one frame that loads on the annotation tool.
A frame is the visual dataset that loads on the annotation tool, consisting of image data along with its respective sensor data (LiDAR, RaDar, etc.).
A batch (or sequence) is the collective set of multiple frames that load on the annotation tool for a single expert. The size of a batch can vary between 1 and n.
Submission happens for a Batch.
To understand segregation better, consider the following example: there are 100 frames in a sequence which need to be annotated, and a single batch should load at most 10 frames. This limit on the number of frames in a batch that loads on the tool is configured when the batch is set up.
Here is the representation of this batch on the tool.
There is a batch of 10 frames (BLUE).
Each frame has 1 point cloud (ORANGE) and 3 camera images (GREEN) linked to it.
Each camera is synced with the point cloud (ORANGE). The images will be synchronized with the corresponding point cloud based on the availability of calibration details.
The LiDAR point cloud may include pre-existing labeled data (PINK).
Based on the above example, the data folder will need to be reorganized in the following format:
Respective camera data - Green folders
LiDAR data - Orange folder
Calibration - Red folder (preferable to include calibration to sync the cameras with the point cloud for quicker reference)
Pre-labelled annotation data - Pink folder (when pre-labelled data is available)
Velocity/ego vehicle data - White folder (optional, based on the output required)
The data can be stored all together in one folder or across multiple folders.
Option 1: Prepare 1 folder with the data for all 100 frames.
Ideally all frames belonging to a single sequence should be stored together.
Option 2: Prepare multiple folders to divide the 100 frames' data.
This is useful when the frames in a batch are not sequenced.
The folder name should not contain any spaces, e.g., Folder_1.
Create image folders for each camera sensor in Folder_1. For example, Camera 1, Camera 2, Camera 3 ... Camera n.
The camera folder name will be reflected on the respective camera images fetched on the annotation tool. Hence, ensure the camera folders are named appropriately to help provide context to the subject expert.
The image files should be in .jpeg or .png format.
Each camera folder should contain all the images belonging to that camera sensor across all the frames stored in Folder_1.
The image files in the camera folders should have identical names if they were captured at the same instance. For example, in Frame 1, Camera 1, Camera 2 and Camera 3 will all have the image filename saved as xyz_timestamp1.jpeg.
Create a folder containing files of the point cloud data across all the frames in Folder_1.
The point cloud folder can be named arbitrarily. For consistency, consider naming it LiDAR or PCT.
The folder should include all the point cloud files corresponding to all frames, organized within Folder_1.
The point cloud file in the LiDAR folder must have the same name as the images from the cameras associated with that frame. For instance, if the image files for all three cameras (stored in their respective camera folders, i.e. Camera 1, Camera 2 and Camera 3) are named xyz_timestamp1.jpeg for Frame 1, then the point cloud file for that frame in the LiDAR folder must be named xyz_timestamp1.las.
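Putting the naming rules above together, the layout for the first frame of Folder_1 might look like this (folder and file names are illustrative):

```
Folder_1/
├── Camera 1/
│   └── xyz_timestamp1.jpeg
├── Camera 2/
│   └── xyz_timestamp1.jpeg
├── Camera 3/
│   └── xyz_timestamp1.jpeg
└── LiDAR/
    └── xyz_timestamp1.las
```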
When camera sensors and point cloud files are available, a calibration file may also be present. If it exists, create a dedicated folder to store all calibration data in .json format.
This folder must be named calibration.
If the calibration is identical for all camera sensors, then store one calibration file in it called calibration.json.
If the calibration data varies across camera sensors, then either:
Prepare one separate file for each camera sensor and save it under the calibration folder, OR
Prepare one .json file with a separate block for each camera calibration (see the illustrative example below).
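As an illustration of the second option only (the exact schema is not defined in this section, so apart from cam_intrinsic and camera_extrinsic the key names and values below are assumptions), a combined calibration.json with one block per camera might look like:

```json
{
  "Camera 1": {
    "cam_intrinsic": [[1266.4, 0.0, 816.3], [0.0, 1266.4, 491.5], [0.0, 0.0, 1.0]],
    "camera_extrinsic": [[1.0, 0.0, 0.0, 0.2], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.5], [0.0, 0.0, 0.0, 1.0]]
  },
  "Camera 2": {
    "cam_intrinsic": [[1260.8, 0.0, 807.9], [0.0, 1260.8, 495.3], [0.0, 0.0, 1.0]],
    "camera_extrinsic": [[1.0, 0.0, 0.0, -0.5], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.5], [0.0, 0.0, 0.0, 1.0]]
  }
}
```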
Compute the calibration matrix by multiplying the cam_intrinsic matrix with the inverse of camera_extrinsic.
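A minimal sketch of that computation with NumPy, assuming cam_intrinsic is a 3x3 matrix and camera_extrinsic is a 4x4 pose matrix as in the illustrative file above (the intrinsics are padded to 3x4 so the shapes are compatible):

```python
# Hedged sketch: build the per-camera calibration (projection) matrix
# as cam_intrinsic x inverse(camera_extrinsic), as described above.
import json
import numpy as np

with open("calibration/calibration.json") as f:   # path follows the layout above
    cam = json.load(f)["Camera 1"]                # key name is an assumption

K = np.asarray(cam["cam_intrinsic"])              # 3x3 intrinsic matrix
extrinsic = np.asarray(cam["camera_extrinsic"])   # 4x4 extrinsic/pose matrix

K_3x4 = np.hstack([K, np.zeros((3, 1))])          # pad intrinsics to 3x4: [K | 0]
calibration_matrix = K_3x4 @ np.linalg.inv(extrinsic)
print(calibration_matrix)                         # 3x4 calibration matrix
```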
Store the pre-labelled annotations in the lidar_annotation folder.
For example, if you have 100 frames, the corresponding files should be named as: 1.json, 2.json, 3.json, and so on up to 100.json.
The JSON schema needed to create pre-labelled files supports cuboid and 3D polyline annotations.
Sample/snippet of a 3D polyline and a cuboid:
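The exact schema is provided by the tool and is not reproduced here; purely as an illustration of the idea, a per-frame file (for example 1.json) with one cuboid and one 3D polyline could be structured roughly as follows (all field names and values are assumptions):

```json
{
  "annotations": [
    {
      "type": "cuboid",
      "label": "car",
      "position": {"x": 12.4, "y": -3.1, "z": 0.9},
      "dimensions": {"length": 4.5, "width": 1.9, "height": 1.6},
      "rotation": {"yaw": 1.57, "pitch": 0.0, "roll": 0.0}
    },
    {
      "type": "3d_polyline",
      "label": "lane_marking",
      "points": [
        {"x": 0.0, "y": 2.0, "z": 0.0},
        {"x": 5.0, "y": 2.1, "z": 0.0},
        {"x": 10.0, "y": 2.2, "z": 0.1}
      ]
    }
  ]
}
```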
Create a folder within Folder_1 containing files of the ego data for each frame. Ego pose data is used to enable features such as vehicle velocity.
This folder must be named ego_data.
The ego data files in the ego_data folder must have the same name as the point cloud file corresponding to that frame. For example, if the point cloud file for Frame 1 is named xyz_timestamp.las in the LiDAR folder, then the ego data file should also be named xyz_timestamp.json in the ego_data folder.
To capture the velocity of objects around the ego vehicle, each file within the folder must include the "timestamp_epoch_ns" information.
timestamp_epoch_ns is the timestamp at which each frame is captured. It is represented as a Unix epoch timestamp in nanoseconds (ns).
To facilitate merge point cloud functionality, the ego data information for each frame is either calculated using the ICP Vanilla registration algorithm or provided with the dataset, and needs to be placed in the ego_data folder.
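When the ego poses are not supplied with the dataset, a vanilla (point-to-point) ICP registration between consecutive frames is one way to estimate them. The sketch below uses laspy and Open3D only as an example; the library choice, file names, and the 1.0 m correspondence distance are assumptions.

```python
# Illustrative sketch: estimate frame-to-frame ego motion with vanilla
# (point-to-point) ICP. The resulting 4x4 transform can be converted into
# the translation/heading fields stored in the ego_data files.
import numpy as np
import laspy
import open3d as o3d

def load_las(path):
    las = laspy.read(path)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.vstack((las.x, las.y, las.z)).T)
    return pcd

source = load_las("LiDAR/xyz_timestamp2.las")  # current frame
target = load_las("LiDAR/xyz_timestamp1.las")  # reference (1st) frame

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=1.0,  # metres; tune for your data
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)  # 4x4 rigid transform of the current frame w.r.t. the reference
```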
Each ego data file should include the utmHeading_deg, utmX_m, utmY_m, and utmZ_m information.
Translation (x, y, z):
prev_utmX_m: The distance the object has moved along the x-axis (in meters) with respect to the 1st frame.
prev_utmY_m: The distance the object has moved along the y-axis (in meters) with respect to the 1st frame.
prev_utmZ_m: The distance the object has moved along the z-axis (in meters) with respect to the 1st frame.
Rotation (yaw, pitch, roll):
prev_utmHeading_deg: The angle of rotation around the yaw axis (in degrees) with respect to the 1st frame.
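Combining the fields above, an ego data file (for example xyz_timestamp.json in the ego_data folder) might look like the following; the numeric values are placeholders, and which fields are present depends on your dataset and the features you need:

```json
{
  "timestamp_epoch_ns": 1533151603547590000,
  "utmHeading_deg": 92.4,
  "utmX_m": 600345.12,
  "utmY_m": 4130214.88,
  "utmZ_m": 35.6,
  "prev_utmHeading_deg": 1.8,
  "prev_utmX_m": 10.42,
  "prev_utmY_m": 0.31,
  "prev_utmZ_m": 0.02
}
```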