Video-Audio Signal Processing

This repository contains three algorithms, each addressing one of the following problems:

Recognize faces in video clips.
Recognize voices in audio recordings.
Separate the speech of three speakers talking simultaneously in a video, based on the given visual and audio information about the speakers.

This project is implemented in Python with the following packages: Face Recognition, Resemblyzer, and SpeechBrain.
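
A common way to recognize voices with an embedding model such as Resemblyzer is to compare a fixed-length embedding of the query utterance against embeddings of known speakers. The sketch below illustrates only that matching step, assuming the embeddings have already been computed. The names `cosine_similarity` and `identify_speaker` and the toy vectors are hypothetical and chosen for illustration; they are not taken from this repository's code.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identify_speaker(query: Sequence[float],
                     enrolled: Dict[str, Sequence[float]]) -> str:
    """Return the enrolled name whose embedding is most similar to the query."""
    if not enrolled:
        raise ValueError("no enrolled speakers")
    return max(enrolled, key=lambda name: cosine_similarity(query, enrolled[name]))


# Toy 3-dimensional vectors stand in for real speaker embeddings
# (hypothetical values; a real encoder produces much longer vectors).
enrolled = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
    "speaker_c": [0.0, 0.0, 1.0],
}

print(identify_speaker([0.1, 0.9, 0.2], enrolled))  # prints "speaker_b"
```

In practice a similarity threshold is often added so that a query matching no enrolled speaker well can be rejected rather than assigned to the nearest one.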

Run the code

Clone this project:

git clone https://github.com/Lukeli0425/VASP.git

Install the required packages:

pip install -r requirements.txt
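
After installation, a quick check that the three main packages can be imported may save debugging time later. The sketch below assumes their import names are `face_recognition`, `resemblyzer`, and `speechbrain`; the helper `missing_modules` is a hypothetical name, and `requirements.txt` may list further dependencies that this check does not cover.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names of the three packages named in this README.
REQUIRED = ["face_recognition", "resemblyzer", "speechbrain"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```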
