It’s 2021, and we’re already gearing up for the new era of full self-driving cars. As automotive companies compete to achieve Level 5 autonomy, high-quality, diverse training datasets have become vital to the development process. The players closest to the finish line are the ones who’ve broken the garbage-in-garbage-out cycle with their labeled training data, because machine learning models are only as good as the data used to create them.
The most reliable way to acquire good training data is to create it: labeled datasets designed for specific autonomy problems help ML teams achieve better success rates on those predefined problems. But not everyone can afford to invest heavily in creating special-purpose training datasets. That’s why we’ve curated a list of the top 12 open datasets that can get you started immediately.
Top 12 open-source autonomous driving datasets
The Waymo Open Dataset, released at CVPR 2019, is one of the most popular high-resolution sensor datasets for autonomous driving projects. It features 1,950 segments (20-second sequences) of driving data collected at 10 Hz with multi-sensor data from 1 mid-range lidar, 4 short-range lidars, and 5 front and side cameras. There are approximately 12.6M 3D bounding box labels on the lidar data and 11.8M 2D bounding box labels on the camera data, along with tracking IDs. It also provides synchronized lidar and camera data, lidar-to-camera projections, sensor calibrations, and vehicle poses that broaden the data’s usability across use cases. The dataset offers high diversity by factoring in variables such as weather, pedestrians, cyclists, night-time and daylight driving, construction, and downtown and suburban conditions.
Find out more about the scalability, diversity, object detection, and tracking baselines of the dataset here.
The Argoverse dataset claims to be the first large-scale autonomous driving dataset with highly curated data and high-definition maps, drawn from over 1,000 hours of driving. It contains two HD maps with geometric and semantic metadata such as lane centerlines, lane direction, and drivable area. The data is collected from roof-mounted sensors: 2 lidar sensors (10 Hz), 7 ring cameras (30 Hz), and 2 front-facing stereo cameras (5 Hz). Intrinsic and extrinsic calibration data is provided for the lidar sensors and all nine cameras. The Argoverse 3D Tracking Dataset includes 3D tracking annotations for 113 segments (15-30 second sequences) with more than 11K tracked objects in drivable areas, supporting more accurate 3D tracking in autonomous driving projects. The Argoverse Motion Forecasting Dataset includes more than 300K segments (5-second sequences) from 320 driving hours for building better motion forecasting models. The datasets also contain challenging segments, including vehicles at intersections, vehicles taking left or right turns, and vehicles changing lanes.
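For motion forecasting segments like Argoverse’s 5-second sequences, a common sanity baseline is constant-velocity extrapolation: fit a velocity to the observed portion of a track and project it forward. The sketch below uses a toy 10 Hz track and invented numbers; it is not part of the Argoverse API.

```python
import numpy as np

def constant_velocity_forecast(observed_xy, num_future, dt=0.1):
    """Extrapolate a 2D agent track assuming constant velocity.

    observed_xy: Tx2 array of observed agent centroids, sampled every dt seconds.
    Returns a num_future x 2 array of predicted future positions.
    """
    # Average velocity over the whole observation window.
    velocity = (observed_xy[-1] - observed_xy[0]) / ((len(observed_xy) - 1) * dt)
    steps = np.arange(1, num_future + 1).reshape(-1, 1) * dt
    return observed_xy[-1] + steps * velocity

# 2 s of observation at 10 Hz: an agent moving +1 m per 0.1 s along x.
obs = np.stack([np.arange(20) * 1.0, np.zeros(20)], axis=1)
pred = constant_velocity_forecast(obs, num_future=30)  # forecast a 3 s horizon
print(pred[0], pred[-1])
```

Simple as it is, this baseline is a useful floor: a learned forecaster that cannot beat constant velocity on straight-road segments usually has a data or training problem.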
Lyft’s large-scale dataset provides sensor inputs and maps for perception models, plus detected traffic agents for motion prediction and trajectory planning, to accelerate the development of Level 5 self-driving cars. The data is collected from three synchronized lidars (1 roof-mounted and 2 on the bumper) that produce more than 215K points at 10 Hz, six 360° cameras, and one long-focal camera pointing upwards; all cameras are synchronized with the lidars. The Prediction Dataset contains 170K road scenes from over 1,000 hours of movement of traffic agents such as cars, pedestrians, and cyclists, plus close to 15K HD semantic maps. The maps feature semantic elements like lane segments, pedestrian crosswalks, stop signs, parking zones, and speed bumps. The Perception Dataset contains 1.3M 3D annotations and 350+ scenes that come in 60-90 minute sequences. The datasets also include real-world scenarios like intersections and multi-lane traffic.
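HD semantic maps like Lyft’s encode elements such as crosswalks and lane segments as polygons, and a frequent primitive when using them is testing whether an agent’s position falls inside a map element. Below is the standard ray-casting point-in-polygon test applied to a hypothetical crosswalk polygon — illustrative only, not the Lyft SDK.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside a polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from (x, y)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips inside/outside
    return inside

# A hypothetical rectangular crosswalk polygon in map coordinates (metres).
crosswalk = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(point_in_polygon(2.0, 1.0, crosswalk))   # a pedestrian on the crosswalk
print(point_in_polygon(5.0, 1.0, crosswalk))   # a pedestrian outside it
```

Real map SDKs typically wrap queries like this in a spatial index so that thousands of agents can be checked against thousands of map elements per frame.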
UC Berkeley’s Artificial Intelligence Research lab (BAIR) open-sourced its driving database, Berkeley DeepDrive (BDD100K). The dataset contains 100K videos, each a 40-second HD sequence at 30 fps. It covers recordings from different parts of the day (dawn/dusk, daytime, and nighttime), different seasons and weather (clear sky, partly cloudy, overcast, rainy, snowy, foggy), and different scenes such as residential areas, highways, parking lots, city streets, gas stations, and tunnels. The dataset also includes localization, timestamps, and IMU data, and provides road objects, lane markings, instance-level semantic masks, and drivable area information. In total it covers more than 1 million cars, 300K street signs, and 130K pedestrians, with data from multiple cities and scenarios.
The nuScenes dataset, developed by Motional (formerly nuTonomy), contains 1,000 scenes from 15 hours of driving data, with 20-second sequences. This large-scale dataset provides data from the entire sensor suite: six cameras, one lidar, five radars, localization via GPS, and IMU data. It includes 1.4M camera images, 390K lidar sweeps, 1.4M radar sweeps, and 1.4M object bounding boxes in 40K keyframes. nuScenes-lidarseg adds 1.4 billion annotated points across 40,000 point clouds, with 850 scenes for training and validation and 150 scenes for testing. The dataset showcases good diversity, with data from two cities, left- versus right-hand traffic, interesting driving manoeuvres, common traffic situations, and unexpected behaviours. It ensures data alignment between sensors by calibrating the extrinsics and intrinsics of every sensor, and provides synchronised sensor data to achieve cross-modality alignment between the lidar and camera sensors.
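Cross-modality alignment of the kind nuScenes provides ultimately comes down to associating measurements by timestamp. A minimal sketch (not the nuScenes devkit, and with made-up sensor rates) that pairs each lidar sweep with its nearest camera frame:

```python
import numpy as np

def nearest_frame(lidar_ts, camera_ts):
    """For each lidar timestamp, return the index of the closest camera timestamp.

    Both inputs are 1-D arrays of timestamps in microseconds; camera_ts is sorted.
    """
    idx = np.searchsorted(camera_ts, lidar_ts)          # candidate insertion points
    idx = np.clip(idx, 1, len(camera_ts) - 1)
    left, right = camera_ts[idx - 1], camera_ts[idx]
    # Choose whichever neighbouring camera frame is closer in time.
    return np.where(lidar_ts - left <= right - lidar_ts, idx - 1, idx)

# Lidar at 20 Hz (50 ms period), camera at ~12 Hz (~83 ms), both in microseconds.
lidar_ts = np.arange(0, 1_000_000, 50_000)
camera_ts = np.arange(0, 1_000_000, 83_333)
pairs = nearest_frame(lidar_ts, camera_ts)
print(pairs[:5])
```

Production pipelines add a maximum tolerance (discarding pairs whose gap exceeds, say, half a frame period) and interpolate ego poses between frames, but the core association is this nearest-timestamp lookup.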
The PandaSet released by Hesai & Scale covers a large number of urban driving situations. It includes 48K camera images, 16K lidar sweeps, and 100+ scenes of 8 seconds each. The data includes 28 annotation classes with class attributes for activity, visibility, location, and pose, plus 37 semantic segmentation labels. It was collected using the sensor suite of a full self-driving car: a mechanical lidar, a solid-state lidar, 1 long-focus and 5 wide-angle cameras, and on-board GPS/IMU. The dataset contains complex urban driving scenarios, including steep hills, construction, dense traffic and pedestrians, and a variety of times of day and lighting conditions across morning, afternoon, dusk, and evening. PandaSet scenes are selected from 2 routes in Silicon Valley: (1) San Francisco; and (2) El Camino Real from Palo Alto to San Mateo. The dataset also includes camera and lidar extrinsics, camera intrinsic calibration, and IMU extrinsics.
Cityscapes is a large-scale dataset for autonomous driving that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with dense semantic segmentation annotations for 5K frames and a larger set of 20K weakly annotated frames. Cityscapes supports semantic understanding of urban street scenes via dense semantic segmentation, panoptic labels, and instance segmentation for vehicles and people. It contains 30 classes, including road, sidewalk, parking, pole, pole group, sky, rider, and person. The dataset is quite diverse, with scenes from 50 cities, several seasons (spring, summer, fall), daytime data, and good/medium weather conditions. It also includes useful metadata such as GPS coordinates, ego-motion data from vehicle odometry, and outside temperature from a vehicle sensor.
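Dense semantic segmentation benchmarks like Cityscapes are typically scored with mean intersection-over-union (mIoU) over the annotated classes. A minimal NumPy sketch of the metric, using a toy two-class label map rather than real Cityscapes annotations:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for dense semantic segmentation.

    pred, target: integer label maps of identical shape.
    Classes absent from both maps are ignored in the mean.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class not present in either map
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x4 label maps with two classes (0 = road, 1 = sidewalk).
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred   = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 1]])
print(mean_iou(pred, target, num_classes=2))
```

Because each class contributes equally to the mean, mIoU punishes models that ignore rare classes like "rider" or "pole group" far more than plain pixel accuracy does.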
In 2018, Baidu released ApolloScape. At the time, ApolloScape’s data volume was 10 times greater than any other open-source autonomous driving dataset, including KITTI and Cityscapes.
It includes a trajectory dataset and a 3D perception lidar object detection and tracking dataset, comprising about 100K image frames, 80K lidar point clouds, and 1,000 km of trajectories for urban traffic.
This data can be utilised for perception, simulation scenes, road networks, etc., and enables autonomous driving vehicles to be trained in more complex environments, weather, and traffic conditions. It was collected under various lighting conditions and traffic densities in Beijing, China. ApolloScape also defines 26 semantic classes (e.g. cars, bicycles, pedestrians, buildings, streetlights) with pixel-level semantic segmentation.
The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured over a period of more than a year. It contains video, LiDAR, and RADAR data for urban traffic, along with GPS, vehicle data, and code. The dataset includes a fairly diverse mix of weather, traffic, and pedestrian conditions, along with longer-term changes such as construction and roadworks.
The Boxy Dataset by Bosch is a large vehicle detection dataset with almost two million annotated vehicles for training and evaluating object detection methods for self-driving cars on freeways. It has 200,000 images with 1.99 million annotated vehicles at 5-megapixel resolution. The data covers sunshine, rain, dusk, and night, as well as clear freeways, heavy traffic, and traffic jams.
Google open-sourced its largest human-made and natural landmarks dataset to advance instance-level landmark recognition. The dataset was released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges in 2018. It contains more than 2 million images depicting 30 thousand unique landmarks from across the world, a number of classes ~30x larger than what is available in commonly used datasets.
In 2019, Google released Landmarks V2, a larger landmark recognition dataset than the previous version. The dataset includes over 5 million images (2x that of the first release) of more than 200 thousand different landmarks (an increase of 7x). Due to the difference in scale, this dataset is much more diverse.
The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset was released in 2012. It offers 2D, 3D, and bird’s-eye-view object detection datasets, 2D object tracking and multi-object tracking and segmentation datasets, road/lane detection evaluation datasets, both pixel-level and instance-level semantic datasets, and raw datasets. The data is diverse, spanning city streets, residential areas, and campus areas. A standard station wagon was equipped with two high-resolution color and grayscale video cameras, a Velodyne laser scanner, and a GPS localisation system, and the datasets were captured by driving around the mid-size city of Karlsruhe, in rural areas, and on highways. Up to 15 cars and 30 pedestrians are visible per image. Besides providing all data in raw format, KITTI also extracts benchmarks for each task and provides evaluation metrics for each benchmark.
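KITTI’s detection benchmarks, like most 2D detection evaluations, match detections to ground truth by intersection-over-union overlap, requiring a minimum overlap per class before a detection counts as correct. A minimal sketch of 2D box IoU, with made-up boxes:

```python
def box_iou(a, b):
    """IoU of two axis-aligned 2D boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes don't overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted halfway off a ground-truth box.
gt = (0.0, 0.0, 10.0, 10.0)
det = (5.0, 0.0, 15.0, 10.0)
print(box_iou(gt, det))  # intersection 50 over union 150
```

Sweeping the detector’s confidence threshold while counting IoU-matched true positives is what produces the precision-recall curves behind the average-precision numbers these benchmarks report.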
Note: The images in this blog are not owned by Playment. They have been sourced from Google for visual representation of these datasets.