AI & Machine Learning

AI Training Dataset

License a large-scale dataset of vintage archival footage for computer vision, generative AI, and temporal analysis research. More than 217,560 digitized clips with structured metadata, spanning 1800–2099.

217,560 clips · ~800,000 still frames · 396+ hours · 126 countries

Key Stats

217,560+ Video Clips
~800K Still Images
396+ Hours of Footage
126 Countries
920+ Cities
1800–2099 Temporal Range
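
For planning storage or sampling, the headline figures can be turned into rough per-clip averages. The sketch below uses only the numbers quoted above; because "396+" and "~800K" are approximate, the results are estimates rather than published statistics.

```python
# Headline figures copied from the stats above. "396+" hours and "~800K"
# stills are approximate, so the derived values are rough estimates only.
CLIPS = 217_560
HOURS = 396
STILLS = 800_000

avg_clip_seconds = HOURS * 3600 / CLIPS   # total seconds divided by clip count
stills_per_clip = STILLS / CLIPS          # extracted frames per clip, on average

print(f"average clip length: about {avg_clip_seconds:.1f} s")
print(f"still frames per clip: about {stills_per_clip:.1f}")
```

On these figures a clip averages roughly 6.6 seconds and contributes between three and four still frames.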

Dataset Contents

Each clip record includes the following structured metadata fields:

Format Breakdown

Clips are tagged by original film format:

Format      Clips
HomeMovie   194,249
8mm         159,495
16mm        155,609
Super8      1,146
35mm        157
VHS         58
DV          10
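
The format counts add up to more than the total number of clips, which suggests that a single clip can carry several format tags (for example both HomeMovie and 8mm). This is an inference from the totals, not something stated on the page. The check below is a minimal sketch that assumes the Super8 row reads 1,146 clips.

```python
# Tag counts as listed in the format table. Reading the Super8 row as
# "Super8: 1,146" is an assumption about how the flattened row splits.
FORMAT_TAG_COUNTS = {
    "HomeMovie": 194_249,
    "8mm": 159_495,
    "16mm": 155_609,
    "Super8": 1_146,
    "35mm": 157,
    "VHS": 58,
    "DV": 10,
}
TOTAL_CLIPS = 217_560  # headline clip count

total_tags = sum(FORMAT_TAG_COUNTS.values())
tags_per_clip = total_tags / TOTAL_CLIPS

print(f"{total_tags:,} format tags across {TOTAL_CLIPS:,} clips")
print(f"about {tags_per_clip:.2f} format tags per clip on average")
```

If tags do overlap in this way, filtering on one format will not produce a subset disjoint from the others, which matters when building train and test splits by format.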

Decade Coverage

Distribution of clips by the decade in which they were shot:

Decade   Clips
1800s    10
1810s    11
1820s    10
1830s    10
1840s    10
1850s    10
1860s    11
1870s    11
1880s    12
1890s    8
1900s    10
1910s    11
1920s    10
1930s    7,867
1940s    10,573
1950s    34,294
1960s    68,156
1970s    32,001
1980s    7,554
1990s    13
2000s    12
2010s    23,566
2020s    31,665
2030s    10
2040s    10
2050s    10
2060s    4
2070s    4
2080s    10
2090s    10
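
To size a custom subset by period, the decade counts can be summed over a range. The sketch below uses the figures from the table; the helper name `clips_between` is hypothetical and not part of any delivered tooling.

```python
# Clip counts per decade, keyed by the decade's first year (from the table).
DECADE_COUNTS = {
    1800: 10, 1810: 11, 1820: 10, 1830: 10, 1840: 10,
    1850: 10, 1860: 11, 1870: 11, 1880: 12, 1890: 8,
    1900: 10, 1910: 11, 1920: 10, 1930: 7_867, 1940: 10_573,
    1950: 34_294, 1960: 68_156, 1970: 32_001, 1980: 7_554, 1990: 13,
    2000: 12, 2010: 23_566, 2020: 31_665, 2030: 10, 2040: 10,
    2050: 10, 2060: 4, 2070: 4, 2080: 10, 2090: 10,
}


def clips_between(counts, first_decade, last_decade):
    """Sum clip counts for decades in the inclusive range (hypothetical helper)."""
    return sum(n for decade, n in counts.items()
               if first_decade <= decade <= last_decade)


total = sum(DECADE_COUNTS.values())
core = clips_between(DECADE_COUNTS, 1930, 1980)

print(f"{core:,} of {total:,} decade-tagged clips ({core / total:.1%}) "
      "date from the 1930s through the 1980s")
```

On these figures about 74% of decade-tagged clips fall in the 1930s through the 1980s. The decade counts sum to 215,893, slightly below the 217,560 headline total; the page does not explain the difference.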

Geographic Coverage

Footage spans 126 countries and 920+ cities. Top 15 countries by clip count:

Country         Clips
United States   37,737
Mexico          1,327
Canada          1,042
France          838
Italy           488
India           428
Russia          409
Japan           392
England         367
Denmark         366
Kenya           339
Spain           276
Greece          262
Germany         258
Cuba            226

Still Images

In addition to video clips, the dataset includes approximately 800,000 still frame extractions. These high-resolution images are derived from key frames across the archive and can be used independently for image classification, object detection, and visual similarity research.

Use Cases

Case Study: Paris, France 1947

Image: AI-generated image from a model trained on 1947 Paris, France archival footage

To test the dataset's potential for generative AI, we trained a model on a single place and period: Paris, France, circa 1947, using approximately 2,500 still images extracted from the collection. In our assessment, the generated images closely resemble authentic archival footage, reproducing its film grain, color palette, and period atmosphere.

Browse the source collection, 1947 Paris France, to see the original training material. The trained model is open source and available for download on CivitAI.

License This Dataset

Interested in licensing the Stockfilm dataset for AI training, academic research, or commercial applications? Contact us to discuss pricing, delivery formats, and custom subsets.
