
Vintage Film AI Training Dataset

Name: Stockfilm Vintage Archival Footage Dataset
Creator: Stockfilm
Published: 2025-01-01
License: https://stockfilm.com/licensing-guide

License a large-scale dataset of vintage 8mm and Super 8 archival footage for computer vision, generative AI, and temporal analysis research: 162,072 digitized clips with structured metadata spanning the 1930s–1980s.

162,072 clips · ~800,000 still frames · 299+ hours · 121 countries

Key Stats

162,072+ Video Clips
~800K Still Images
299+ Hours of Footage
121 Countries
935+ Cities
1930s–1980s Temporal Range

Dataset Contents

Each clip record includes the following structured metadata fields:

Title — descriptive clip title
Description — detailed narrative description
Keywords — searchable keyword tags
Shot Year — estimated year the footage was captured
Shot Decade — decade classification (e.g. 1950s, 1960s)
Duration — clip length in milliseconds
Format Tags — film format (8mm, Super 8)
Location — city, region, and country
Geo Tokens — tokenized geographic identifiers
Copyright — rights and ownership information
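To show how these fields fit together, here is a minimal Python sketch of a single clip record. The snake_case field names, the types, the split of Location into city/region/country, and the `decade_label` helper are all assumptions for illustration, not Stockfilm's delivery schema. The only details taken from the list above are that duration is stored in milliseconds and that the decade is a label such as "1950s".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClipRecord:
    # Hypothetical field names; the delivered CSV/JSON schema may differ.
    title: str
    description: str
    keywords: list[str]
    shot_year: int | None      # estimated year; may be unknown
    shot_decade: str           # decade label, e.g. "1950s"
    duration_ms: int           # clip length in milliseconds
    format_tags: list[str]     # e.g. ["8mm"] or ["Super 8"]
    city: str = ""
    region: str = ""
    country: str = ""
    geo_tokens: list[str] = field(default_factory=list)
    copyright: str = ""

    @property
    def duration_seconds(self) -> float:
        # Convert the millisecond duration to seconds.
        return self.duration_ms / 1000.0


def decade_label(year: int) -> str:
    # Map an estimated shot year to a decade label, e.g. 1957 -> "1950s".
    return f"{(year // 10) * 10}s"
```

For example, a record built with `shot_year=1957`, `shot_decade=decade_label(1957)`, and `duration_ms=6500` has the decade label "1950s" and a `duration_seconds` of 6.5. Deriving the decade from the year in one helper keeps the two fields consistent if you regenerate labels yourself.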


Still Images

In addition to video clips, the dataset includes approximately 800,000 still frame extractions. These high-resolution images are derived from key frames across the archive and can be used independently for image classification, object detection, and visual similarity research.

Use Cases

Computer Vision

Train object detection, scene classification, and activity recognition models on authentic mid-century imagery.

Generative AI

Fine-tune video and image generation models to produce realistic vintage aesthetics including film grain, color shifts, and period artifacts.

Temporal Analysis

Study how fashion, architecture, and urban landscapes change over the decades, compared across the same geographic locations.

Multimodal Research

Pair rich textual metadata with visual content for vision-language model training and cross-modal retrieval.

Cultural Preservation

Develop AI tools for automated restoration, colorization, and cataloging of historical film archives.

Case Study: Paris, France 1947

AI-generated image from a model trained on 1947 Paris, France archival footage

To test the dataset for generative AI, we trained a model on roughly 2,500 still images pulled from a single location and era: Paris circa 1947. The output was hard to tell apart from real archival frames. It picked up the film grain, the color palette, and the period look.

Browse the 1947 Paris France source collection to see the original training material. The trained model is open source and available for download on CivitAI, so you can try it yourself and see the results firsthand.

Access & Pricing

Standard Stockfilm clip licenses do not include AI-training rights. Dataset licenses are separate agreements that name the permitted uses, which keeps the chain of title clean for your legal review.

Evaluation sample

Free. 100 clips with complete metadata records, matched to your use case. Requests are reviewed before delivery.

Research license

From $4,900. Subsets up to 25,000 clips for non-commercial training and publication. Metadata delivered as CSV or JSON.

Commercial training license

From $0.95 per clip. Volume-tiered. Filter by decade, geography, subject, or content flags. Full-corpus and refresh terms are negotiated directly.

Live counts behind every claim on this page are published in the machine-readable datasheet, generated from the same catalog database that fulfills orders.

Frequently Asked Questions

What formats is the dataset available in?

Video clips are available as MP4 files. Metadata can be delivered as CSV, JSON, or via API access. Still frame extractions are available as high-resolution JPEG or PNG files.

Can I license a subset of the dataset?

Yes. We offer custom subsets filtered by decade, geography, format, or keyword. Contact us to discuss your specific requirements.

What are the licensing terms?

Research licenses cover non-commercial training and academic publication. Commercial licenses cover production use in AI products and services. Either way, training rights are granted explicitly in the dataset agreement — the standard per-clip license sold on this site does not include them.

Is the metadata machine-readable?

All metadata fields are structured and machine-readable. Keywords are tokenized, locations are normalized, and temporal data follows consistent formatting for easy integration with ML pipelines.
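As a rough illustration of that pipeline integration, the sketch below loads a JSON metadata export and selects clips by decade, film format, and optionally country, using only the Python standard library. It assumes the export is a JSON array of objects with the hypothetical keys `shot_decade`, `format_tags`, and `country`, and the file name `clips.json` is likewise hypothetical; check the actual delivery schema before relying on these names.

```python
from __future__ import annotations

import json
from typing import Iterable


def load_metadata(path: str) -> list[dict]:
    # Read a JSON array of clip records from disk (hypothetical export layout).
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def select_clips(
    records: Iterable[dict],
    decade: str,
    fmt: str,
    country: str | None = None,
) -> list[dict]:
    """Return records matching a decade label and film format, optionally one country.

    The keys "shot_decade", "format_tags", and "country" are assumptions.
    """
    selected = []
    for rec in records:
        if rec.get("shot_decade") != decade:
            continue
        if fmt not in rec.get("format_tags", []):
            continue
        if country is not None and rec.get("country") != country:
            continue
        selected.append(rec)
    return selected
```

With an in-memory sample such as `[{"shot_decade": "1940s", "format_tags": ["8mm"], "country": "France"}, {"shot_decade": "1960s", "format_tags": ["Super 8"], "country": "Japan"}]`, calling `select_clips(sample, "1940s", "8mm")` returns only the first record; against a real export you would pass `load_metadata("clips.json")` instead. Plain dictionaries keep the filter independent of any particular dataframe library.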

What resolution are the clips?

Source film is scanned at 4K, which is what the per-clip licenses on this site deliver. Dataset deliveries can be 4K masters, HD mezzanines, or lower-resolution analysis proxies — most teams take HD or proxies to keep storage manageable. The grain, color shifts, and period artifacts survive at every resolution.

License This Dataset

Start with the free 100-clip evaluation sample, or go straight to scoping a subset. Tell us the use case and we respond with a concrete proposal: clip counts, delivery format, and price.

Request the Evaluation Sample
Download the Datasheet (JSON)