Automatic Foley Machine

This project uses YOLOv5 to detect animals in a video without audio and then plays matching sounds, fetched through the freesound.org API, whenever the detected animals appear on screen. To create an interactive audio-visual experience, the sounds can be adjusted with knobs that control effects such as reverb, filtering, and pitch shifting.

The website is served by a Node.js web server, which also stores the uploaded video temporarily on its disk. The file is uploaded via a dropzone. After receiving the file, the web server sends a POST request to the Python server, which runs inference with a pre-trained object detection model based on YOLOv5.
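As a rough illustration of the detection step, the sketch below filters per-frame detections down to animals. It assumes the model reports labels from the COCO dataset (which the stock YOLOv5 checkpoints are trained on). The `Detection` type, the `keep_animals` helper, and the 0.5 confidence threshold are hypothetical names and values chosen for this example, not taken from the project.

```python
from dataclasses import dataclass

# The ten animal classes in the COCO label set.
# Assumption: the project's model uses these labels.
COCO_ANIMALS = {
    "bird", "cat", "dog", "horse", "sheep",
    "cow", "elephant", "bear", "zebra", "giraffe",
}


@dataclass(frozen=True)
class Detection:
    """One detected object in one video frame (hypothetical structure)."""
    frame: int          # index of the frame in the video
    label: str          # class name reported by the detector
    confidence: float   # detector score between 0 and 1


def keep_animals(detections, min_confidence=0.5):
    """Return only confident animal detections, preserving their order."""
    return [
        d for d in detections
        if d.label in COCO_ANIMALS and d.confidence >= min_confidence
    ]


if __name__ == "__main__":
    raw = [
        Detection(frame=0, label="dog", confidence=0.91),
        Detection(frame=0, label="person", confidence=0.88),
        Detection(frame=1, label="cat", confidence=0.32),
    ]
    for d in keep_animals(raw):
        print(d.frame, d.label)
```

Filtering on the server keeps the response small and means the web server only has to look up sounds for labels that are actually animals.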

After processing and analyzing the individual frames, the Python server sends a JSON object back to the Node.js web server containing all detected animals and the frame numbers in which they appear. With this information, the web server makes multiple requests to the freesound.org API. Each response consists of a list of results matching the searched keywords, with links to the specific audio files. These links are integrated into the website to create all required audio elements, which stream the sound directly from the freesound servers.
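The JSON object described above could be assembled roughly as follows. This is a minimal sketch, not the project's actual schema: the field names (`fps`, `animals`, `label`, `frames`, `first_seen_s`) and the helpers `build_payload` and `search_keywords` are assumptions made for illustration. Including the frame rate lets the receiver convert a frame number into a playback time in seconds.

```python
import json
from collections import defaultdict


def build_payload(frame_labels, fps):
    """Group (frame_number, label) pairs by animal.

    frame_labels: iterable of (int, str) pairs, one per detection.
    fps: frame rate of the video, used to turn frames into seconds.
    """
    frames_by_animal = defaultdict(list)
    for frame, label in frame_labels:
        frames_by_animal[label].append(frame)

    return {
        "fps": fps,
        "animals": [
            {
                "label": label,
                "frames": sorted(frames),
                # Time of first appearance, so a sound can start on cue.
                "first_seen_s": round(min(frames) / fps, 3),
            }
            for label, frames in sorted(frames_by_animal.items())
        ],
    }


def search_keywords(payload):
    """Labels the web server would use as sound search keywords."""
    return [animal["label"] for animal in payload["animals"]]


if __name__ == "__main__":
    payload = build_payload([(0, "dog"), (1, "dog"), (30, "cat")], fps=30.0)
    print(json.dumps(payload, indent=2))
    print(search_keywords(payload))
```

Grouping by label means each animal needs only one sound lookup, however many frames it appears in.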

During playback, the controls manipulate the audio in real time using the Web Audio API.

Technologies Used: Python, Node.js, YOLOv5, Web Audio API, freesound.org API, AWS

My Responsibilities:

Building the Python backend

Customizing YOLOv5 to our needs

Deployment on AWS

Automatic Foley Machine

Given a video without audio, the automatic foley machine identifies the animals in it and generates the corresponding sounds. The sounds can be adjusted separately for each animal with knobs that control effects such as reverb, filtering, and pitch shifting.

Description