We expect that our dataset of over 2.8 million videos contains a number of duplicates, and we require a system to detect them. This job posting is intended for an experienced developer capable of building software that can process datasets of this scale in a relatively short amount of time (depending on the underlying hardware).
We require a system that can fingerprint our entire dataset (2.8 million videos and growing) and quickly identify duplicate or derivative video content. With roughly 6,000 videos uploaded to our site per day, the system must fingerprint new uploads fast enough to keep up with them.
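To make the throughput requirement concrete, here is a rough back-of-the-envelope sketch. It assumes the machine runs around the clock and that any capacity not spent on new uploads goes to the existing backlog; the function names are illustrative only. At about 6,000 uploads per day, one video arrives every 14.4 seconds on average. At a capacity of 20,000 videos per day the 2.8 million backlog would clear in about 200 days, and at 10,000 per day in about 700 days.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_video(videos_per_day: float) -> float:
    """Average time budget per video if processing runs 24 hours a day."""
    return SECONDS_PER_DAY / videos_per_day


def backlog_days(backlog: int, capacity_per_day: float, uploads_per_day: float) -> float:
    """Days needed to clear the backlog while also keeping up with new uploads.

    Assumption: capacity left over after handling new uploads is spent
    entirely on the backlog.
    """
    spare = capacity_per_day - uploads_per_day
    if spare <= 0:
        raise ValueError("capacity must exceed the upload rate to make progress")
    return backlog / spare


if __name__ == "__main__":
    print(f"{seconds_per_video(6_000):.1f} s per video at 6,000 uploads/day")
    for capacity in (10_000, 20_000):
        days = backlog_days(2_800_000, capacity, 6_000)
        print(f"{capacity} videos/day: backlog cleared in {days:.0f} days")
```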
A "duplicate" for this system is a video that is either exactly the same as another video or a derivative of it, i.e. a clip of a longer video.
To recap, the developed system will need to be able to:
- Fingerprint our existing video dataset
- Process at least 10,000 to 20,000 new videos per day (depending on hardware, of course)
- Match two or more identical videos
- Match derivative videos, i.e. a short clip of a longer video
- Match identical videos across different resolutions
- Run within a Linux system (we use Ubuntu)
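As a rough illustration of the matching requirements above, here is a minimal sketch of one possible approach, not a prescribed design. It assumes frames have already been sampled at a fixed rate and downscaled to 8x8 grayscale grids by some decoder (which is what makes the comparison independent of the original resolution). Each frame is reduced to a 64-bit average hash, a video's fingerprint is the sequence of those hashes, and a clip is located by sliding its fingerprint along the longer one. All names and thresholds here are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption: each frame is an 8x8 grid of grayscale values (0-255),
# produced upstream by decoding and downscaling the video.
Frame = Sequence[Sequence[int]]


def average_hash(frame: Frame) -> int:
    """64-bit hash: one bit per pixel, set when the pixel is above the mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def fingerprint(frames: Sequence[Frame]) -> List[int]:
    """A video fingerprint is the ordered list of its frame hashes."""
    return [average_hash(f) for f in frames]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_clip(clip_fp: Sequence[int], full_fp: Sequence[int],
              max_bit_diff: int = 5, min_match_ratio: float = 0.9) -> Optional[int]:
    """Return the first frame offset where clip_fp matches full_fp, else None.

    Two frames match when their hashes differ in at most max_bit_diff bits;
    a window matches when at least min_match_ratio of its frames match.
    An exact duplicate is simply a match at offset 0 with equal lengths.
    """
    n = len(clip_fp)
    if n == 0 or n > len(full_fp):
        return None
    for offset in range(len(full_fp) - n + 1):
        window = full_fp[offset:offset + n]
        matches = sum(1 for a, b in zip(clip_fp, window)
                      if hamming(a, b) <= max_bit_diff)
        if matches / n >= min_match_ratio:
            return offset
    return None
```

The linear scan is only workable for comparing a single pair of videos. At 2.8 million videos, a real system would need an index over the hashes so that each new upload is compared against a small set of candidates rather than the whole dataset.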
Please provide examples of relevant work in your cover letter.
8 freelancers are bidding an average of $1223 for this job
Hi there, we have developed a similar website [login to view URL] that handles videos on AWS. We can also help you filter duplicate videos. Please send a message so we can discuss the complete project.