Pre-process Data
The Pre-process Data step helps ensure that the data is ready to be uploaded in a tool-ingestible format for the Data Import step. The dataset must be segregated into a folder structure and stored in an Amazon S3 bucket.
Format Data - Point cloud formatted to .las
Segregate Data - Folder structure
Store Data - Amazon S3 Bucket
Format Point Cloud Data
The tool supports LiDAR and RaDAR datasets in .las format only.
The source point cloud datasets can be in formats such as .bin, .pcd, .json, or .txt.
Data in .bin format will first be converted to .pcd and then further converted to .las.
Read about .bin to .pcd conversion for the nuScenes dataset here.
CloudCompare can be used to convert .pcd to .las using the following command:
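The exact invocation depends on your CloudCompare installation; a typical headless command (a sketch, assuming CloudCompare's command-line mode is available and the input file is pointcloud.pcd) is:

CloudCompare -SILENT -O pointcloud.pcd -C_EXPORT_FMT LAS -SAVE_CLOUDS

This opens the .pcd file, sets the cloud export format to LAS, and saves the converted cloud alongside the input file.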
Segregate Data into Folders
The way the data is segregated impacts the visibility of tasks that load on the tool for the labeling experts. Below is the terminology used frequently in this document to describe how to segregate data correctly.
Data Terminology
Task
A task is defined as the labeling work performed on one frame that loads on the annotation tool.
Frame
A frame is a visual dataset that loads on the annotation tool, containing image data along with its respective sensor data (LiDAR, RaDAR, etc.).
Batch
A batch (or sequence) is the collective set of multiple frames that loads on the annotation tool for a single expert. The size of a batch can vary between 1 and n.
Submission happens for a Batch.
Data Reflection on Tool
To understand segregation better, consider the following example:
There are 100 frames in a sequence which need to be annotated. The desired maximum number of frames that a single batch should load is 10. This limit on the frames in a batch is set via the batch limit at the time of importing data.
Here is the representation of this batch on the tool.
There is a batch of 10 frames (BLUE).
Each frame has 1 point cloud (ORANGE) and 3 camera images (GREEN) linked to it.
Each camera is synced with the point cloud (ORANGE). The images will be synchronized with the corresponding point cloud based on the availability of calibration details.
The LiDAR point cloud may include pre-existing labeled data. (PINK)
Using the above example, the Data Folder will need to be reorganized in the following format (an example layout is sketched after this list):
Respective Camera Data - Green folders
LiDAR data - Orange folder
Calibration - Red folder (preferable to have calibration to sync the camera with the point cloud for quicker reference)
Pre-labelled annotation data - Pink folder (when pre-labelled data is available)
Velocity/ Ego vehicle data - White folder (optional based on output required)
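For illustration only, a fully prepared data folder could look like the layout below. The camera folder names and file names are placeholders, while calibration, lidar_annotation, and ego_data are the folder names required by the steps that follow.

Folder_1/
    Camera 1/
        xyz_timestamp1.jpeg
        xyz_timestamp2.jpeg
    Camera 2/
        xyz_timestamp1.jpeg
        xyz_timestamp2.jpeg
    Camera 3/
        xyz_timestamp1.jpeg
        xyz_timestamp2.jpeg
    LiDAR/
        xyz_timestamp1.las
        xyz_timestamp2.las
    calibration/
        calibration.json
    lidar_annotation/
        1.json
        2.json
    ego_data/
        xyz_timestamp1.json
        xyz_timestamp2.json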
Preparing Data Folder
The data can be stored all together in one folder or in multiple folders.
Option 1: Prepare 1 folder with all 100 frames' data
Ideally all frames belonging to a single sequence should be stored together.
Option 2: Prepare multiple folders to divide the 100 frames' data.
This is useful when the frames in a batch are not sequenced.
The folder name should not have any spaces, e.g.:
Folder_1
Step 1: Create Camera Folders
Create image folders for each camera sensor respectively in Folder_1. For example, Camera 1, Camera 2, Camera 3 ... Camera n. The camera folder name will reflect on the respective camera images fetched on the annotation tool. Hence, ensure the camera folders are named appropriately to help provide context to the subject expert.
The image files should be in .jpeg or .png format.
Each camera folder should contain all the images belonging to that camera sensor across all the frames stored in Folder_1.
The image files in the camera folders should have identical names if they were captured at the same instance. For example, in Frame 1, Camera 1, Camera 2 and Camera 3 will all have the image filename saved as xyz_timestamp1.jpeg.
Step 2: Create LiDAR Folder
Create a folder containing the point cloud files for all the frames in Folder_1.
The point cloud folder can be named arbitrarily. For consistency, consider naming it LiDAR or PCT.
The point cloud data must be in .las format, as this is the supported format for the annotation tool. Refer to the guide on Format LiDAR Data for more details.
The folder should include all the point cloud files corresponding to all frames, organized within Folder_1.
The point cloud file in the LiDAR folder must have the same name as the images from the cameras associated with that frame. For instance, if the image files for all three cameras (stored in their respective camera folders, i.e. Camera 1, Camera 2 and Camera 3) are named xyz_timestamp1.jpeg for Frame 1, then the point cloud file for that frame in the LiDAR folder must be named xyz_timestamp1.las.
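A small validation sketch may help confirm the naming rule from Steps 1 and 2. The snippet below is a hypothetical helper (not part of the tool) that assumes Folder_1 contains the camera folders and a LiDAR folder as described, and reports any frame whose image is missing from a camera folder.

import os

folder = "Folder_1"
camera_dirs = ["Camera 1", "Camera 2", "Camera 3"]  # adjust to your camera sensors
lidar_dir = "LiDAR"

# Base names (without extension) of every point cloud file
lidar_names = {
    os.path.splitext(name)[0]
    for name in os.listdir(os.path.join(folder, lidar_dir))
    if name.endswith(".las")
}

# Each camera folder must contain an image with the same base name for every frame
for cam in camera_dirs:
    image_names = {
        os.path.splitext(name)[0]
        for name in os.listdir(os.path.join(folder, cam))
        if name.lower().endswith((".jpeg", ".png"))
    }
    missing = lidar_names - image_names
    if missing:
        print(f"{cam} is missing images for frames: {sorted(missing)}")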
Step 3: Create Calibration Folder
When camera sensors and point cloud files are available, a calibration file may also be present. If it exists, create a dedicated folder to store all calibration data in .json format.
This folder must be named calibration.
If the calibration is identical for all camera sensors, then store one calibration file in it called calibration.json.
If the calibration data varies across camera sensors, then either:
Prepare one separate file for each camera sensor and save it under the calibration folder, OR
Prepare one .json file with a separate block for each camera calibration.
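For illustration only (the exact calibration schema should be confirmed for your dataset), a single .json file with one block per camera could be structured along these lines, each block holding that camera's intrinsic and extrinsic matrices:

{
    "Camera 1": {
        "camera_intrinsic": [...],
        "camera_extrinsic": [...]
    },
    "Camera 2": {
        "camera_intrinsic": [...],
        "camera_extrinsic": [...]
    }
}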
Compute the calibration matrix by multiplying the camera_intrinsic matrix with the inverse of the camera_extrinsic matrix:
calibration_matrix = camera_intrinsic * inverse(camera_extrinsic)
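A minimal NumPy sketch of this computation, assuming both matrices are supplied as 4x4 homogeneous matrices (adapt the shapes to however your calibration files store them):

import numpy as np

# Hypothetical 4x4 homogeneous matrices loaded from the calibration file
camera_intrinsic = np.eye(4)   # replace with the camera's intrinsic matrix
camera_extrinsic = np.eye(4)   # replace with the camera's extrinsic matrix

# calibration_matrix = camera_intrinsic * inverse(camera_extrinsic)
calibration_matrix = camera_intrinsic @ np.linalg.inv(camera_extrinsic)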
Step 4: Create a Folder for Pre-Labelled Annotations
Prepare the data folder structure.
Store the pre-labelled annotations in the lidar_annotation folder.
For example, if you have 100 frames, the corresponding files should be named 1.json, 2.json, 3.json, and so on up to 100.json.
The JSON schema needed to create pre-labelled files supports cuboid, 2D bounding box, and 3D polyline annotations.
Sample snippets for cuboid, 2D bounding box, and 3D polyline are shown below.
Cuboid sample:
{
"annotations": [
{
"id" : "10e3547e-0ffc-11f0-beff-c9af1eb6b655", // Optional - UUIDv4 ID for all annotations across sequence
"class": "car",
"object_type": "cuboid",
"taxonomy_attribute": {},
"geometry": {
"position": { // Position in metres
"x": 5.992325288342432,
"y": 4.904602559666202,
"z": 1.5813289166406617
},
"rotation": { // Rotation in Euler angles (in radian) with ZYX order
"x": 0,
"y": 0,
"z": 0
},
"boxSize": {
"x": 4.837695807772452,
"y": 4.519224086476998,
"z": 2.4204508776208957
}
},
"identity": 1, // Identity must start from 1
"isGeometryKeyFrame": true, // Geometry key frame remains untouched with interpolation
"origin": "Customer",// With 'Customer' label, labeller is able identify prelabels on point cloud tool, and origin of label
"prelabel": {
"modelName": "Pvrcnn",// Optional - To help collect statistics, model name
"modelVersion": "v1",// Optional - To help collect statistics, model version
"confidenceScore": "0.8"// Option - Confidence score of annotation
}
}
]
}
2D bounding box (rectangle) sample:
{
"annotations": [
{
"class": "Vehicle",
"object_type": "rectangle",
"identity": 3,
"reference_folder": "rearward_left_bottom_medium_ID060",
"geometry": {
"coordinates": [
{
"x": 855.318048,
"y": 2158.155112
},
{
"x": 855.318048,
"y": 861.2488400000001
},
{
"x": 2784.83608,
"y": 861.2488400000001
},
{
"x": 2784.83608,
"y": 2158.155112
}
]
},
"taxonomy_attribute": {},
"isGeometryKeyFrame": true, // Geometry key frame remains untouched with interpolation
"origin": "Customer",// With 'Customer' label, labeller is able identify prelabels on point cloud tool, and origin of label
"prelabel": {
"modelName": "Pvrcnn",// Optional - To help collect statistics, model name
"modelVersion": "v1",// Optional - To help collect statistics, model version
"confidenceScore": "0.8"// Optional - Confidence score of annotation
}
}
]
}
3D polyline sample:
{
"annotations": [
{
"id" : "10e2447e-0ffc-11f0-b0dd-c9af1eb6b655", // Optional - UUIDv4 ID for all annotations across sequence
"class": "lane",
"object_type": "polyline",
"taxonomy_attribute": {},
"geometry": {
"points": [
{
"position": {
"x": -5.910736083984375,
"y": 0.7063865661621094,
"z": -0.001922607421875
}
},
{
"position": {
"x": 4.194019317626953,
"y": -2.124612808227539,
"z": 0.3410773277282715
}
},
{
"position": {
"x": -3.438983917236328,
"y": -7.768611907958984,
"z": 2.552077293395996
}
},
{
"position": {
"x": -11.68572998046875,
"y": -6.034612655639648,
"z": 5.980077266693115
}
},
{
"position": {
"x": -10.565982818603516,
"y": -5.695611953735352,
"z": 3.379077434539795
}
},
{
"position": {
"x": -7.793731689453125,
"y": -2.306612014770508,
"z": 0.15207719802856445
}
},
{
"position": {
"x": -17.27798080444336,
"y": -7.60261344909668,
"z": 2.255077362060547
}
}
],
"thickness": 0
},
"identity": 1, // Identity must start from 1
"origin": "Customer",// With 'Customer' label, labeller is able identify prelabels on point cloud tool, and origin of label
"prelabel": {
"modelName": "Pvrcnn",// Optional - To help collect statistics, model name
"modelVersion": "v1",// Optional - To help collect statistics, model version
"confidenceScore": "0.8"// Optional - Confidence score of annotation
}
}
]
}
Step 5: Create a Folder for Ego Data
Ego pose data is used to enable features such as vehicle velocity and merged point cloud. To calculate the reference velocity of objects around the ego vehicle, the ego data information for each frame should be provided with the dataset.
Create a folder within Folder_1 containing files of the ego data for each frame.
This folder must be named ego_data.
The ego data files in the ego_data folder must have the same name as the point cloud file corresponding to that frame. For example, if the point cloud file for Frame 1 is named xyz_timestamp.las in the LiDAR folder, then the ego data file should also be named xyz_timestamp.json in the ego_data folder.
To capture the velocity of objects around the ego vehicle, each file within the folder must include the "timestamp_epoch_ns" information.
timestamp_epoch_ns is the timestamp at which each frame is captured. It is represented as a Unix epoch timestamp in nanoseconds (ns).
To facilitate the merged point cloud functionality, the ego pose information for each frame will either be calculated using the ICP Vanilla registration algorithm or, if provided with the dataset, needs to be placed in the ego_data folder.
This file should include the "utmHeading_deg", "utmX_m", "utmY_m" and "utmZ_m" information.
Translation (x, y, z):
prev_utmX_m: The distance the object has moved along the x-axis (in meters) with respect to the 1st frame.
prev_utmY_m: The distance the object has moved along the y-axis (in meters) with respect to the 1st frame.
prev_utmZ_m: The distance the object has moved along the z-axis (in meters) with respect to the 1st frame.
Rotation (yaw, pitch, roll):
prev_utmHeading_deg: The angle of rotation around the yaw axis (in degrees) with respect to the 1st frame.
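Pulling the fields above together, a single ego_data file (e.g. xyz_timestamp.json) might look like the sketch below; the field set and values are illustrative placeholders and should be matched to your dataset and the tool's expected schema.

{
    "timestamp_epoch_ns": 1650000000000000000,
    "utmHeading_deg": 0.0,
    "utmX_m": 0.0,
    "utmY_m": 0.0,
    "utmZ_m": 0.0,
    "prev_utmHeading_deg": 0.0,
    "prev_utmX_m": 0.0,
    "prev_utmY_m": 0.0,
    "prev_utmZ_m": 0.0
}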