Karlsruhe Dataset
Author: Andreas Geiger, Institute of Measurement and Control Systems
Karlsruhe Institute of Technology
Please send any feedback to: geiger@kit.edu
Labeled Objects (Cars + Pedestrians)
This section contains two datasets with roughly 1000 images each and object bounding boxes for cars and pedestrians (about 10000 bounding boxes in total). In addition, it contains the orientation of each object, discretized into 8 classes for cars and 4 classes for pedestrians. The .zip files provided below come with a MATLAB-based label viewer. Labels are saved as MATLAB matrices (.mat files), where each row is an object in the image and the columns are the position, size, class and orientation of the object. You can use this dataset to train your preferred object detector. We have had very good experience with the cascaded part-based latent SVM (L-SVM) by Ross Girshick and Pedro Felzenszwalb, which we modified to fix the latent orientations to the ones given by the labels. We have been able to detect very small objects by upsampling the original images by up to a factor of three.
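As a rough illustration of working with the labels outside MATLAB, the following Python sketch turns label rows into records, rescales a box for an image upsampled by a factor such as three, and maps an orientation class to the center angle of its bin. The column order (x, y, w, h, class, orientation), the 1-based class numbering and the uniform split of the full circle into 8 or 4 bins are assumptions made for illustration only; the bundled label viewer is the reference for the actual layout. Loading the .mat file itself (for example with scipy.io.loadmat) is not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class Label:
    # ASSUMPTION: hypothetical column order; check the bundled label viewer.
    x: float       # box position, horizontal (pixels)
    y: float       # box position, vertical (pixels)
    w: float       # box width (pixels)
    h: float       # box height (pixels)
    cls: int       # object class (car or pedestrian)
    orient: int    # discrete orientation class

def parse_rows(rows):
    """Convert a label matrix (one object per row) into Label records."""
    return [Label(float(r[0]), float(r[1]), float(r[2]), float(r[3]),
                  int(r[4]), int(r[5])) for r in rows]

def scale_box(label, factor):
    """Box (x, y, w, h) for an image upsampled by `factor`, e.g. 3.
    Plain multiplication; half-pixel offsets are ignored."""
    return (label.x * factor, label.y * factor,
            label.w * factor, label.h * factor)

def bin_center_angle(orient, num_bins, first_bin=1):
    """Center angle (radians) of an orientation class.
    ASSUMPTION: classes are numbered from `first_bin` and split the full
    circle uniformly (num_bins = 8 for cars, 4 for pedestrians)."""
    k = orient - first_bin
    if not 0 <= k < num_bins:
        raise ValueError("orientation class out of range")
    return 2 * math.pi * (k + 0.5) / num_bins
```

Detections found on the upsampled image can be mapped back to the original resolution by dividing by the same factor.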
Downloads
- objects_2011_a.zip: 775 images with car and pedestrian labels. This dataset additionally contains 1155 negative images (no objects) for learning a discriminative object detector, as well as a flag indicating which objects are clearly visible.
- objects_2011_b.zip: 1020 images with car labels.
References
This dataset has been used for learning the part-based object detector in:
LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger11,
author = {Andreas Geiger and Christian Wojek and Raquel Urtasun},
title = {Joint 3D Estimation of Objects and Scene Layout},
booktitle = {Neural Information Processing Systems (NIPS)},
year = {2011},
month = {December},
address = {Granada, Spain}
}
Stereo Video Sequences + (rough) GPS Vehicle Poses
This section contains high-quality stereo sequences recorded from a moving vehicle in Karlsruhe. The sequences were captured with Point Grey Flea2 FireWire cameras and are saved as rectified images in *.png format. Odometry from an OXTS RT 3000 GPS/IMU system is provided as (rough) ground truth in a separate text file. The calibration files contain the parameters of the system before and after rectification. You only need to consider the matrices P1_roi (left camera) and P2_roi (right camera), which are the 3x4 projection matrices (P=K[R|t]) of the visible region of interest after rectification, stored in row-major order, i.e., the first 4 entries form the first row, etc. The (intrinsic) calibration matrices K1 and K2 of the left and right camera are equal and are given by the first 3x3 submatrix of P1_roi and P2_roi, respectively. The left camera is the reference camera. Hence, the baseline is given by base = -P2_roi(1,4)/P2_roi(1,1), using 1-based (row, column) indices. Reprojection to 3D can be performed via
X = (u-cu)*base/d
Y = (v-cv)*base/d
Z = f*base/d
where (u,v) is a 2D point in the image coordinate system, (cu,cv) is the principal point of the camera, f is the focal length, base is the baseline, d is the disparity and (X,Y,Z) is a 3D point in the camera coordinate system.
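The calibration and reprojection steps can be sketched in Python as follows, assuming P1_roi and P2_roi have already been read from the calibration file as lists of 12 numbers in row-major order (parsing the file is not shown, and the helper names are hypothetical). With 0-based indexing, P2_roi(1,4) and P2_roi(1,1) become P2[0][3] and P2[0][0]; as in the formulas above, a single focal length f is assumed for both image directions.

```python
def to_3x4(values):
    """Reshape 12 row-major numbers into a 3x4 matrix (list of rows)."""
    if len(values) != 12:
        raise ValueError("expected 12 entries for a 3x4 projection matrix")
    return [list(values[i:i + 4]) for i in range(0, 12, 4)]

def stereo_params(p1_roi, p2_roi):
    """Return (f, cu, cv, base) from the rectified projection matrices."""
    P1, P2 = to_3x4(p1_roi), to_3x4(p2_roi)
    f = P1[0][0]                 # focal length, from K (left 3x3 block)
    cu, cv = P1[0][2], P1[1][2]  # principal point
    # base = -P2_roi(1,4) / P2_roi(1,1) in 1-based indexing
    base = -P2[0][3] / P2[0][0]
    return f, cu, cv, base

def reproject(u, v, d, f, cu, cv, base):
    """3D point (X, Y, Z) in the left camera frame for pixel (u, v)
    with disparity d > 0."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return ((u - cu) * base / d, (v - cv) * base / d, f * base / d)
```

For example, with synthetic values f = 600, (cu, cv) = (640, 190) and base = 0.5, the pixel (700, 250) with disparity 30 reprojects to (1.0, 1.0, 10.0), i.e. a point 10 units in front of the left camera (meters, if the calibration uses meters).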
The file insdata.txt contains, for each frame (first row = frame 0), the output of the GPS/IMU system. The columns are: timestamp, lat, lon, alt, x, y, z, roll, pitch, yaw. Here, x, y, z are metric coordinates (z is the vehicle height above sea level) and can be used for evaluating visual odometry. Make sure to align the rotation of both coordinate systems, and be aware that the GPS/IMU system is not as accurate as its specification (0.05 meters) suggests. The camera is located roughly 1.6 meters in front of, 0.6 meters above and 0.05 meters to the left of the GPS/IMU system. The pitch angle of the camera with respect to the GPS/IMU is -0.08 radians, or about -4.6 degrees (the camera faces slightly downwards).
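A minimal Python sketch for reading insdata.txt and computing the distance driven between consecutive frames. It assumes that all ten columns are numeric and separated by whitespace or commas; the function names are hypothetical. Because the length of a translation does not depend on how the coordinate axes are rotated, per-frame step lengths give a quick sanity check against visual odometry before the rotations of both systems have been aligned.

```python
import math

# Column order as documented for insdata.txt.
COLUMNS = ("timestamp", "lat", "lon", "alt",
           "x", "y", "z", "roll", "pitch", "yaw")

def parse_insdata(lines):
    """Return one dict per frame; list index i is frame i (first row = frame 0).
    ASSUMPTION: all columns are numeric and separated by whitespace or commas."""
    frames = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # ignore empty lines, e.g. a trailing newline
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        frames.append(dict(zip(COLUMNS, map(float, fields))))
    return frames

def step_lengths(frames):
    """Metric distance between consecutive frames, from x, y, z."""
    return [math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))
            for a, b in zip(frames, frames[1:])]
```

Usage, assuming the file is in the working directory: `with open("insdata.txt") as fh: steps = step_lengths(parse_insdata(fh))`. Keep in mind that these distances inherit the limited accuracy of the GPS/IMU system mentioned above.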
Note: We have more sequences, about 350 in total (overview archive: thumbnails.tar). If you are interested in any of them, please send me an email.
Additional note: Unfortunately, our recording program had a bug at the time these sequences were recorded, so roll and pitch are currently not valid. We have fixed the bug, so future sequences will be fine. All other variables are correct. Thanks to Diego Rodriguez for reporting this issue!
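To compare GPS/IMU positions with camera-based odometry, they can be shifted to the camera using the offsets given above. The Python sketch below uses only yaw, since roll and pitch of these recordings are not valid, and therefore treats the vehicle as level. The axis conventions (vehicle x forward, y left, z up; yaw in radians, counter-clockwise about z, zero along +x) are assumptions and should be checked against the data before relying on the result.

```python
import math

# Lever arm from the GPS/IMU to the camera as stated above:
# 1.6 m forward, 0.05 m to the left, 0.6 m up.
# ASSUMPTION: vehicle axes are x forward, y left, z up.
CAM_OFFSET = (1.6, 0.05, 0.6)

def camera_position(x, y, z, yaw, offset=CAM_OFFSET):
    """Approximate camera position in the metric x, y, z frame of insdata.txt.

    Only yaw is used because roll and pitch of these recordings are invalid,
    so the vehicle is treated as level (roll = pitch = 0).
    ASSUMPTION: yaw is in radians, counter-clockwise about z, 0 along +x.
    """
    dx, dy, dz = offset
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the horizontal part of the lever arm by yaw; add the height directly.
    return (x + c * dx - s * dy, y + s * dx + c * dy, z + dz)
```

For a vehicle at the origin with yaw 0, `camera_position(0, 0, 0, 0)` returns `(1.6, 0.05, 0.6)`. Ignoring roll and pitch introduces small errors on slopes, which is usually acceptable given the GPS/IMU accuracy.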
Name: 2009_09_08_drive_0010.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 1424 frames (0.7 Gigabytes)
Framerate: 10 frames per second
Name: 2009_09_08_drive_0012.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 2579 frames (1.4 Gigabytes)
Framerate: 10 frames per second
Name: 2009_09_08_drive_0015.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 1022 frames (0.5 Gigabytes)
Framerate: 10 frames per second
Name: 2009_09_08_drive_0016.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 1206 frames (0.7 Gigabytes)
Framerate: 10 frames per second
Name: 2009_09_08_drive_0019.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 1249 frames (0.7 Gigabytes)
Framerate: 10 frames per second
Name: 2009_09_08_drive_0021.zip
Calibration: 2009_09_08_calib.txt
Resolution: 1344x391 pixels
Length: 1514 frames (0.9 Gigabytes)
Framerate: 10 frames per second
Name: 2009_12_14_drive_0051.zip
Calibration: 2009_12_14_calib.txt
Resolution: 1267x387 pixels
Length: 953 frames (0.5 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_04_drive_0032.zip
Calibration: 2010_03_04_calib.txt
Resolution: 1348x374 pixels
Length: 602 frames (0.4 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_04_drive_0033.zip
Calibration: 2010_03_04_calib.txt
Resolution: 1348x374 pixels
Length: 400 frames (0.2 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_04_drive_0041.zip
Calibration: 2010_03_04_calib.txt
Resolution: 1348x374 pixels
Length: 449 frames (0.2 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_04_drive_0042.zip
Calibration: 2010_03_04_calib.txt
Resolution: 1348x374 pixels
Length: 355 frames (0.2 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_05_drive_0017.zip
Calibration: 2010_03_05_calib.txt
Resolution: 1351x374 pixels
Length: 759 frames (0.3 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_05_drive_0023.zip
Calibration: 2010_03_05_calib.txt
Resolution: 1351x374 pixels
Length: 819 frames (0.4 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0019.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 373 frames (0.2 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0020.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 546 frames (0.4 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0023.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 112 frames (0.1 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0051.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 485 frames (0.3 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0081.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 341 frames (0.2 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_09_drive_0082.zip
Calibration: 2010_03_09_calib.txt
Resolution: 1344x372 pixels
Length: 502 frames (0.3 Gigabytes)
Framerate: 10 frames per second
Name: 2010_03_17_drive_0046.zip
Calibration: 2010_03_17_calib.txt
Resolution: 1365x369 pixels
Length: 967 frames (0.5 Gigabytes)
Framerate: 10 frames per second
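In the list above, every sequence uses the calibration file of its recording date, and all sequences run at 10 frames per second. The calibration file name and the duration of a sequence can therefore be derived as in this small Python sketch; the helper names are hypothetical and the naming pattern is inferred from the list, not from an official specification.

```python
import re

FRAME_RATE = 10.0  # frames per second, identical for all sequences listed

def calib_file(sequence_zip):
    """Calibration file for a sequence, following the naming pattern above:
    YYYY_MM_DD_drive_NNNN.zip -> YYYY_MM_DD_calib.txt."""
    m = re.match(r"(\d{4}_\d{2}_\d{2})_drive_\d{4}\.zip$", sequence_zip)
    if m is None:
        raise ValueError(f"unexpected sequence name: {sequence_zip}")
    return f"{m.group(1)}_calib.txt"

def duration_seconds(num_frames, fps=FRAME_RATE):
    """Length of a sequence in seconds."""
    return num_frames / fps
```

For example, 2009_09_08_drive_0012.zip (2579 frames) lasts about 258 seconds and uses 2009_09_08_calib.txt.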
Using this dataset in your publication
If you find this dataset useful or use it or the accompanying software in your research, we would be happy if you cited the following related publications:
LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Geiger10,
author = {Andreas Geiger and Martin Roser and Raquel Urtasun},
title = {Efficient Large-Scale Stereo Matching},
booktitle = {Asian Conference on Computer Vision},
year = {2010},
month = {November},
address = {Queenstown, New Zealand}
}
LATEX BIBTEX CITATION ENTRY:
@INPROCEEDINGS{Kitt10,
author = {Bernd Kitt and Andreas Geiger and Henning Lategahn},
title = {Visual Odometry based on Stereo Image Sequences with RANSAC-based Outlier Rejection Scheme},
booktitle = {IEEE Intelligent Vehicles Symposium},
year = {2010},
month = {June},
address = {San Diego, USA}
}