Vintage Film AI Training Dataset
License a large-scale dataset of vintage 8mm and Super 8 archival footage for computer vision, generative AI, and temporal analysis research. Over 217,560 digitized clips with structured metadata spanning the 1930s–1980s.
Key Stats
Dataset Contents
Each clip record includes the following structured metadata fields:
- Title — descriptive clip title
- Description — detailed narrative description
- Keywords — searchable keyword tags
- Shot Year — estimated year the footage was captured
- Shot Decade — decade classification (e.g. 1950s, 1960s)
- Duration — clip length in milliseconds
- Format Tags — film format (8mm, Super 8)
- Location — city, region, and country
- Geo Tokens — tokenized geographic identifiers
- Copyright — rights and ownership information
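As a rough illustration, a single clip record with the fields listed above could be modeled in Python as follows. The key names (`title`, `shot_year`, `duration_ms`, and so on), the nested location layout, and the sample values are assumptions made for this sketch; the delivered schema may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipRecord:
    """One clip's metadata. Field names are hypothetical, not the delivered schema."""
    title: str
    description: str
    keywords: list[str]
    shot_year: Optional[int]      # estimated, so it may be missing
    shot_decade: Optional[str]    # e.g. "1950s"
    duration_ms: int              # clip length in milliseconds
    format_tags: list[str]        # e.g. ["8mm"] or ["Super 8"]
    location: dict[str, str]      # assumed keys: city, region, country
    geo_tokens: list[str]
    copyright: str

    @property
    def duration_seconds(self) -> float:
        # Convert the millisecond duration to seconds.
        return self.duration_ms / 1000.0


def parse_clip(raw: dict) -> ClipRecord:
    """Build a ClipRecord from a decoded JSON object, tolerating missing optional fields."""
    return ClipRecord(
        title=raw["title"],
        description=raw.get("description", ""),
        keywords=list(raw.get("keywords", [])),
        shot_year=raw.get("shot_year"),
        shot_decade=raw.get("shot_decade"),
        duration_ms=int(raw["duration_ms"]),
        format_tags=list(raw.get("format_tags", [])),
        location=dict(raw.get("location", {})),
        geo_tokens=list(raw.get("geo_tokens", [])),
        copyright=raw.get("copyright", ""),
    )


# Invented sample record, for illustration only.
sample = parse_clip({
    "title": "Street scene with tram",
    "keywords": ["street", "tram"],
    "shot_year": 1956,
    "shot_decade": "1950s",
    "duration_ms": 12500,
    "format_tags": ["8mm"],
    "location": {"city": "Lisbon", "region": "Lisbon", "country": "Portugal"},
})
print(sample.duration_seconds)  # 12.5
```

Treating the shot year and decade as optional reflects that the year is an estimate; whether those fields can actually be empty in the delivered data is not stated here.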
Metadata coverage across the full dataset:
Film Format Breakdown
All footage in the collection was originally shot on analog film stock:
Decade Coverage
Distribution of clips by the decade in which they were shot:
Geographic Coverage
Footage spans 129 countries and 1,435+ cities. Top 15 countries by clip count:
Still Images
In addition to video clips, the dataset includes approximately 800,000 still frame extractions. These high-resolution images are derived from key frames across the archive and can be used independently for image classification, object detection, and visual similarity research.
Use Cases
Computer Vision
Train object detection, scene classification, and activity recognition models on authentic mid-century imagery.
Generative AI
Fine-tune video and image generation models to produce realistic vintage aesthetics including film grain, color shifts, and period artifacts.
Temporal Analysis
Study how fashion, architecture, and urban landscapes change across decades within the same geographic locations.
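One way to set up this kind of comparison is to group clips by city and decade, then keep only the cities that appear in more than one decade. The sketch below assumes flat records with hypothetical `city`, `shot_decade`, and `title` keys, and uses invented sample data.

```python
from collections import defaultdict


def group_by_city_and_decade(clips):
    """Map city -> decade -> list of clip titles, skipping records missing either value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for clip in clips:
        city = clip.get("city")
        decade = clip.get("shot_decade")
        if city and decade:
            grouped[city][decade].append(clip["title"])
    return grouped


def cities_spanning(grouped, min_decades=2):
    """Cities with footage in at least `min_decades` distinct decades, sorted by name."""
    return sorted(
        city for city, by_decade in grouped.items() if len(by_decade) >= min_decades
    )


# Invented records, for illustration only.
clips = [
    {"title": "Market street", "city": "Paris", "shot_decade": "1940s"},
    {"title": "River boats", "city": "Paris", "shot_decade": "1960s"},
    {"title": "Harbor view", "city": "Oslo", "shot_decade": "1950s"},
    {"title": "Undated reel", "city": "Oslo", "shot_decade": None},
]

grouped = group_by_city_and_decade(clips)
print(cities_spanning(grouped))  # ['Paris']
```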
Multimodal Research
Pair rich textual metadata with visual content for vision-language model training and cross-modal retrieval.
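A minimal sketch of that pairing: flatten each metadata record into a caption string and store it alongside the path of the matching frame. The record keys, the caption template, and the file path are all assumptions for illustration, not a prescribed format.

```python
def build_caption(record: dict, max_keywords: int = 5) -> str:
    """Flatten one metadata record into a single caption string."""
    parts = [record["title"].strip()]
    # Join whichever of city/country are present.
    place = ", ".join(p for p in (record.get("city"), record.get("country")) if p)
    if place:
        parts.append(f"filmed in {place}")
    if record.get("shot_decade"):
        parts.append(f"in the {record['shot_decade']}")
    if record.get("format_tags"):
        parts.append("on " + "/".join(record["format_tags"]) + " film")
    caption = " ".join(parts)
    keywords = record.get("keywords", [])[:max_keywords]
    if keywords:
        caption += ". Keywords: " + ", ".join(keywords)
    return caption


# Invented record and hypothetical frame path, for illustration only.
record = {
    "title": "Children playing in a park",
    "city": "Paris",
    "country": "France",
    "shot_decade": "1940s",
    "format_tags": ["8mm"],
    "keywords": ["park", "children", "play"],
}
pairs = [("frames/clip_0001.jpg", build_caption(record))]
print(pairs[0][1])
```

Capping the number of keywords keeps captions short; the right limit depends on the text encoder being trained.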
Cultural Preservation
Develop AI tools for automated restoration, colorization, and cataloging of historical film archives.
Case Study: Paris, France 1947

To test the dataset's potential for generative AI, we trained a model on a narrow slice of the collection: Paris, France, circa 1947, using approximately 2,500 still images extracted from the archive. The generated images were nearly indistinguishable from authentic archival footage, capturing the film grain, color palette, and period atmosphere.
Browse the source collection, "1947 Paris France", to see the original training material. The trained model is open source and available for download on CivitAI, so you can try it yourself.
Frequently Asked Questions
Video clips are available as MP4 files. Metadata can be delivered as CSV, JSON, or via API access. Still frame extractions are available as high-resolution JPEG or PNG files.
Custom subsets are available, filtered by decade, geography, format, or keyword. Contact us to discuss your specific requirements.
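For exploring delivered metadata locally, the same kind of filtering can be approximated with a small helper. This is only a sketch: the record keys (`shot_decade`, `country`, `format_tags`, `keywords`) and the sample data are assumptions, and it does not describe how subsets are actually prepared.

```python
def filter_clips(clips, decade=None, country=None, film_format=None, keyword=None):
    """Return the clips that match every criterion that is not None."""
    def matches(clip):
        if decade is not None and clip.get("shot_decade") != decade:
            return False
        if country is not None and clip.get("country") != country:
            return False
        if film_format is not None and film_format not in clip.get("format_tags", []):
            return False
        if keyword is not None:
            # Case-insensitive match against the clip's keyword tags.
            wanted = keyword.lower()
            if wanted not in (k.lower() for k in clip.get("keywords", [])):
                return False
        return True

    return [clip for clip in clips if matches(clip)]


# Invented records, for illustration only.
catalog = [
    {"title": "Beach day", "shot_decade": "1960s", "country": "Italy",
     "format_tags": ["Super 8"], "keywords": ["Beach", "family"]},
    {"title": "Parade", "shot_decade": "1950s", "country": "Italy",
     "format_tags": ["8mm"], "keywords": ["parade", "street"]},
    {"title": "Ski trip", "shot_decade": "1960s", "country": "Austria",
     "format_tags": ["Super 8"], "keywords": ["snow", "ski"]},
]

subset = filter_clips(catalog, decade="1960s", film_format="Super 8", keyword="beach")
print([clip["title"] for clip in subset])  # ['Beach day']
```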
We offer both research/academic and commercial licenses. Research licenses include usage for non-commercial AI training and academic publications. Commercial licenses cover production use in AI products and services.
All metadata fields are structured and machine-readable. Keywords are tokenized, locations are normalized, and temporal data follows consistent formatting for easy integration with ML pipelines.
Clips are digitized from original 8mm and Super 8 film stock. Resolution varies by source but typically ranges from 480p to 720p, preserving authentic film grain and period characteristics.
License This Dataset
Interested in licensing the Stockfilm dataset for AI training, academic research, or commercial applications? Contact us to discuss pricing, delivery formats, and custom subsets.