Okutama-Action is a public video dataset for concurrent human action detection from an aerial viewpoint, created with support from the Prendinger Lab at Japan’s National Institute of Informatics. It contains 43 fully annotated video sequences, each about one minute long, covering 12 action classes. The dataset focuses on simulating multi-person, multi-action, dynamically changing scenarios in drone-captured aerial footage.
Functionally, it can be used for pedestrian detection and spatio-temporal action detection, while multi-person tracking is noted as still under development. The dataset is fairly challenging: up to 9 actors continuously perform multiple actions in a video, with as many as 10 concurrent actions/actors appearing at once. It also includes real-world issues such as dynamic action transitions, multi-label actors, significant changes in scale and aspect ratio, sudden camera movement, and more. The data is provided as 1280x720 frames as well as 4K video versions. Annotation fields include Track ID, bounding box, frame number, lost, occluded, generated/interpolated, Person label, and action columns. The page also provides three types of label files—MultiActionLabels, SingleActionLabels, and SingleActionTrackingLabels—making it easier to use for different tasks.
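The annotation fields listed above can be read with a few lines of Python. The sketch below is not an official loader. It assumes a whitespace-separated layout in the order the fields are listed: track ID, four box coordinates, frame number, three 0/1 flags, a quoted label, then zero or more quoted action names. The column order, the (xmin, ymin, xmax, ymax) box convention, and the names `Annotation` and `parse_line` are assumptions for illustration, so check them against the label files you actually download.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    track_id: int
    box: Tuple[int, int, int, int]  # assumed order: (xmin, ymin, xmax, ymax)
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str
    actions: List[str]


def parse_line(line: str) -> Annotation:
    """Parse one annotation line in the assumed column layout."""
    # shlex.split honours double quotes, so a multi-word name stays one token.
    parts = shlex.split(line)
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(parts)}")
    nums = [int(p) for p in parts[:9]]
    return Annotation(
        track_id=nums[0],
        box=(nums[1], nums[2], nums[3], nums[4]),
        frame=nums[5],
        lost=bool(nums[6]),
        occluded=bool(nums[7]),
        generated=bool(nums[8]),
        label=parts[9],
        actions=parts[10:],  # empty list when no action is labelled
    )


# Made-up line in the assumed layout, not copied from the dataset.
example = parse_line('3 100 200 160 320 42 0 1 0 "Person" "Walking" "Carrying"')
print(example.frame, example.actions)
```

Keeping the actions as a list lets the same parser handle both multi-label and single-label files, provided they share this column layout.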
The project provides a final trained Caffe model, but it does not mention support for PyTorch, TensorFlow, or other modern frameworks, nor does it provide an API/SDK. It is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0, so it is free for non-commercial research use. For commercial use, users need to contact the authors for further discussion. As a result, it is more of a research data asset than a complete developer platform or commercial SaaS tool.
Its strengths are its distinctive data setting, aerial drone viewpoint, clear annotation structure, high resolution, and inclusion of real-world challenges such as occlusion, viewpoint changes, and concurrent actions. It is well suited for evaluating algorithm robustness. The drawbacks are also clear: the page was last updated in 2018, so maintenance activity is uncertain; the model is based on Caffe, which is now a relatively dated ecosystem; and it lacks data-loading scripts, example training code, APIs, and documentation for integration with modern frameworks.
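Because no loading scripts ship with the dataset, users need to write their own glue code. As a rough illustration (not part of the project), the sketch below groups already-parsed records by frame and skips those flagged as lost, giving the per-frame list of boxes that most detection pipelines expect. The tuple layout `(frame, track_id, box, lost)` and the function name `boxes_by_frame` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]
Record = Tuple[int, int, Box, bool]  # hypothetical: (frame, track_id, box, lost)


def boxes_by_frame(records: Iterable[Record]) -> Dict[int, List[Tuple[int, Box]]]:
    """Map frame number -> list of (track_id, box), skipping lost records."""
    frames: Dict[int, List[Tuple[int, Box]]] = defaultdict(list)
    for frame, track_id, box, lost in records:
        if not lost:
            frames[frame].append((track_id, box))
    return dict(frames)


# Made-up records for illustration only.
records = [
    (0, 1, (10, 10, 50, 90), False),
    (0, 2, (200, 40, 240, 120), False),
    (1, 1, (12, 10, 52, 90), False),
    (1, 2, (0, 0, 0, 0), True),  # flagged lost, so it is dropped
]
grouped = boxes_by_frame(records)
print({frame: len(items) for frame, items in grouped.items()})
```

The resulting dictionary can be wrapped in whatever dataset abstraction a given framework uses. The grouping itself does not depend on any framework.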
It is suitable for computer vision researchers, drone vision teams, and developers working on action detection or multi-object tracking algorithms, especially for academic paper reproduction and benchmarking. The download links are hosted on AWS and Dropbox, so access from mainland China may be unstable or partially restricted. For large-file downloads, a reliable network connection is recommended. Alternative or complementary datasets include AVA, Kinetics, UCF101, VisDrone, and MOTChallenge.
⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the official okutama-action.org site.
okutama-action.org is listed as a Dev Tools provider of unknown type. TG4G tracks its product information, gives it an overall rating of 7.0/10, and rates its China accessibility as "direct-connect friendly".