Video allows us to look back at times gone by. For heritage institutions, moving images present a new challenge for engaging interaction. Until now, video has mostly been used unmodified, as a citation, or chopped up into a remix. But far more information is encoded deep within a video than what we see on the surface. If it were possible to extract this high-level information, what would become possible for the creative and heritage sectors alike? With this project, SCORE!, we plan to examine exactly that.

SCORE! aims to investigate a new way of matching sound to archival videos. Recent advances in deep learning have allowed us to take raw data, such as images and video, and reduce it to latent representations. A latent representation is a point in a low-dimensional space—the latent space—from which the data can be recovered. The most important high-level variations in the data, such as facial expression or painting style, are mapped to directions in the latent space. Moving the point around in the latent space corresponds to, for instance, turning a photograph of a frowning person into a smiling one, or turning a photograph of a man into one of a woman. We will create latent representations of video and audio, and learn a mapping from the former to the latter. This will give us an automatically generated audio track that matches high-level events in the video.
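To make the mapping concrete, the sketch below shows one way such a pipeline could be wired up in PyTorch: a video encoder that reduces a clip to a latent point, a small network that maps that point into the audio latent space, and an audio decoder that recovers a waveform. All module names, layer sizes, and the choice of framework are illustrative assumptions, not the project's actual architecture.

```python
# Minimal sketch of the video-to-audio latent mapping described above.
# Every name and size here (VideoEncoder, LatentMapper, AudioDecoder,
# LATENT_DIM, ...) is an assumption for illustration only.
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed size of both latent spaces


class VideoEncoder(nn.Module):
    """Reduces a short stack of video frames to one latent point."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # collapse time and space
            nn.Flatten(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, video):  # video: (batch, channels, frames, H, W)
        return self.net(video)


class LatentMapper(nn.Module):
    """Maps a point in the video latent space to the audio latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, z_video):
        return self.net(z_video)


class AudioDecoder(nn.Module):
    """Recovers a short waveform from an audio latent point."""
    def __init__(self, samples=16384):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, samples),
            nn.Tanh(),  # waveform samples in [-1, 1]
        )

    def forward(self, z_audio):
        return self.net(z_audio)


# One (dummy) 16-frame clip -> video latent -> audio latent -> waveform.
video = torch.randn(1, 3, 16, 64, 64)
z_video = VideoEncoder()(video)
z_audio = LatentMapper()(z_video)
waveform = AudioDecoder()(z_audio)
print(waveform.shape)  # torch.Size([1, 16384])

# High-level edits correspond to moving the point along a direction, e.g.:
# z_edited = z_video + alpha * smile_direction  (both names hypothetical)
```

In practice the encoder and decoder would each be trained as part of an autoencoder on video and audio respectively, and the mapper trained on aligned clips; the sketch only shows how the pieces would fit together.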


Project number


Main applicant

Dr. P. Bloem

Affiliated with

Vrije Universiteit Amsterdam


Duration

15/02/2018 to 14/02/2019