Face tracking & lip activity detection for real-time video processing (AI-powered)
$1500-3000 USD
Paid on delivery
Requirement:
Computer vision software that processes multiple video feeds, finds the person who is currently talking, and then selects and magnifies that person.
Problem:
A "hybrid meeting" takes place. This means 10 people are sitting in a conference room at one table and you are taking part over Zoom.
For you, this is a bad experience. You feel (and are) excluded: all the other meeting members are sitting in the same room, chatting, looking into each other's eyes and enjoying all the advantages that real-life meetings have. You, on the other end, only see all 10 people together. The one camera in the meeting room gives you only one angle. You mostly see them from the side. You see them only very small, since they all have to fit into one video. Some of the attendees you don't see at all, because someone else is sitting in front of them and the camera does not always have a clear view. Sometimes people turn and talk to someone sitting across the table, so you can only see the back of their head.
For the 10 people, you are ruining their meeting: every time they talk to you, they have to make sure that you can see them and turn to the screen and camera, possibly neglecting the main audience, everyone else in the room.
Solution:
The meeting room is equipped with multiple cameras. Each camera produces a video that contains one or more people.
All videos should be processed in real time. The person who is currently speaking should be selected, cropped out and resized to fill the frame, and the result passed on as input to video conferencing software (Zoom, Teams, Skype, whatever).
When no speaker is clearly detected, the output video should be the video with the highest number of people in it, cropped so that all faces are still visible and resized to fill the frame.
When only one person speaks at a time (which is usually the case if the meeting is going well), this should result in a much better meeting experience for everyone.
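A minimal sketch of how this selection rule could look, assuming the group view is meant as the fallback for when no speaker is clearly detected. The names `Face`, `Feed`, `choose_output`, the `speaking_score` field and the 0.6 threshold are illustrative assumptions, not part of the requirement:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Face:
    box: Box
    speaking_score: float  # hypothetical value in [0, 1] from lip activity detection


@dataclass
class Feed:
    name: str
    faces: List[Face]


def union_box(boxes: List[Box]) -> Box:
    """Smallest box that contains every given box."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def choose_output(feeds: List[Feed], threshold: float = 0.6) -> Optional[Tuple[str, Box]]:
    """Return (feed name, region to crop), or None if no face is visible anywhere."""
    candidates = [
        (face.speaking_score, feed.name, face.box)
        for feed in feeds
        for face in feed.faces
    ]
    if not candidates:
        return None
    best_score, best_feed, best_box = max(candidates, key=lambda c: c[0])
    if best_score >= threshold:
        # A speaker is clearly detected: show only that person.
        return best_feed, best_box
    # Assumed fallback: the feed showing the most people, with all faces in view.
    widest = max(feeds, key=lambda f: len(f.faces))
    return widest.name, union_box([face.box for face in widest.faces])
```

The returned region is the raw face area; it would still need padding and resizing before being sent to the conferencing software.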
Input:
Multiple video feeds of people in a conference room holding a meeting.
Output:
The one person who is currently speaking, selected, cropped out from the rest of the video and resized to fill the frame.
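The "crop and resize to fill the frame" step comes down to a bit of geometry. A rough sketch, assuming a 16:9 output of 1280x720 and a padding of 50% around the detected face (both are assumptions, as is the function name `fill_crop`):

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def fill_crop(box: Box, frame_w: int, frame_h: int,
              out_w: int = 1280, out_h: int = 720, margin: float = 0.5) -> Box:
    """Expand `box` to the output aspect ratio and keep it inside the frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Pad the face box so the crop shows more than just the face itself.
    w = (x2 - x1) * (1 + margin)
    h = (y2 - y1) * (1 + margin)
    aspect = out_w / out_h
    # Grow the shorter dimension until the crop matches the output aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # Shrink uniformly if the crop would be larger than the source frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    # Shift the crop so that it lies fully inside the frame.
    left = min(max(cx - w / 2, 0), frame_w - w)
    top = min(max(cy - h / 2, 0), frame_h - h)
    return round(left), round(top), round(left + w), round(top + h)
```

For a 100x100 face at (900, 400) in a 1920x1080 frame this gives the region (817, 375, 1083, 525). That region could then be cut out of the frame and scaled to the output size, for example with OpenCV's `cv2.resize`.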
Methods:
- Face tracking
- lip activity detection
- scoring system to select the most likely speaker from their most head-on viewing angle
- post processing to crop and resize the speaker
- re-evaluating in real time to select different speaker
- optional: Also analyze sound input to detect change of speaker
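To make the methods above more concrete, here is a rough sketch of the lip activity, scoring and re-evaluation steps. It is only one possible approach and assumes that a separate face/landmark tracker already delivers, per frame and per tracked face, a mouth-open ratio (lip gap divided by mouth width) and a head yaw angle. All class names, parameters and default values are hypothetical:

```python
import math
from collections import deque
from statistics import pstdev
from typing import Deque, Dict, Optional


class LipActivity:
    """Sliding-window estimate of how much one tracked mouth is moving."""

    def __init__(self, window: int = 15):
        self.openness: Deque[float] = deque(maxlen=window)

    def update(self, mouth_open_ratio: float) -> float:
        # mouth_open_ratio is an assumed per-frame input from a landmark detector.
        self.openness.append(mouth_open_ratio)
        if len(self.openness) < 2:
            return 0.0
        # A talking mouth opens and closes, so its openness varies over time;
        # a mouth that stays closed (or stays open) has near-zero deviation.
        return pstdev(self.openness)


def speaker_score(lip_activity: float, yaw_degrees: float) -> float:
    """Weight lip activity by how head-on the camera sees the face."""
    frontalness = max(0.0, math.cos(math.radians(yaw_degrees)))
    return lip_activity * frontalness


class SpeakerSelector:
    """Switch speaker only after a challenger has led for `hold_frames` frames."""

    def __init__(self, min_score: float = 0.02, hold_frames: int = 10):
        self.min_score = min_score
        self.hold_frames = hold_frames
        self.current: Optional[str] = None
        self._challenger: Optional[str] = None
        self._lead = 0

    def update(self, scores: Dict[str, float]) -> Optional[str]:
        """Take {track id: score} for one frame and return the selected track id."""
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] < self.min_score:
            best = None  # nobody is clearly speaking in this frame
        if best == self.current:
            self._challenger, self._lead = None, 0
            return self.current
        if best == self._challenger:
            self._lead += 1
        else:
            self._challenger, self._lead = best, 1
        if self._lead >= self.hold_frames:
            self.current, self._challenger, self._lead = best, None, 0
        return self.current
```

The hold time is there so that the output does not jump between people on every cough or short interjection. A return value of `None` means that no speaker is currently selected.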
Software suggestions:
- Face detection / tracking: [login to view URL]
- Lip Movement Detector: [login to view URL]
- Lip Tracking: [login to view URL]
Hardware Suggestions:
- Coral USB Accelerator Edge TPU as a coprocessor to accelerate inferencing for machine learning models:
[login to view URL]
Project ID: #27886793
About the project
19 freelancers have bid an average of $2317 on this job
Hi, I hope you are doing fine. I have done many image processing and video processing projects in matlab, python, JAVA, etc. My PhD thesis was also visual analysis of human motion. I have also published several journal…
Hi bro, I am Ahmed. I have a Master's degree in Computer Science from Europe. I can immediately and perfectly do your face tracking and lips activity detection task. I am a Machine Learning expert. Please contact me to…
hello sir, i am highly interested in your project. i have gone through your requirements and i believe i can be a valuable asset for your project. i am an expert in deep learning and computer vision in python. previou…
HI, I am experienced developer in the field of AI, ML, deep learning and Time series analysis (Algo Trading) and have been working as data scientist from last two years. I can handle task by using Python( Tensorflow, k…
Hello, **Ready to start now to get your work done ASAP ** I review your job details. I am interested in your project and can be great fit for this project. I would like to discuss more about your Project. I will sw…
Hello, I hope you are doing well! Yes, I have worked on the Face tracking & lip activity detection real time video. Summary: Senior Unity 3D developer with over 6 years of experience in building AR/VR apps, mobile-bas…
******** YOU WILL GET PERFECT RESULT IN TIME ************ Thank you for your posting! I am an image processing expert with full experiences using machine learning such as tensorflow, caffe, DARKNET, OPENALPR, etc and…
Hi, I have 10+ years of experience in Image processing using OpenCV and ML. My main programming languages are C#, C++ and Java script. I think this project is very suitable for me and I am sure to give good results. B…