There is a FUSE-based filesystem called httpFS.
Search Google for httpfs.
I need you to enhance its functionality.
There is no need for you to understand how filesystems work; the enhancement is mainly about sockets and connections, so you need to know how to use sockets in C++.
There are currently two tasks to do:
Firstly, the httpfs filesystem reconnects for every 4 KB block of data. Your goal is to modify the code so it reuses the previously created connection (keep-alive), and to add a read-ahead mechanism: when the filesystem reads 4 KB, the connection actually reads e.g. 40 KB and caches it in RAM, so subsequent reads from consecutive positions get their data right away.
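A minimal sketch of what that could look like, assuming a hypothetical helper http_range_get() that issues a single HTTP range request over an already-open keep-alive socket (the real httpfs code will differ):

```cpp
#include <sys/types.h>   // off_t, ssize_t
#include <cstddef>
#include <cstring>       // memcpy
#include <vector>

constexpr size_t BLOCK_SIZE = 4096;
constexpr size_t READAHEAD  = 10 * BLOCK_SIZE;  // fetch 40 KB per request

// Assumed helper: sends one HTTP range request on the already-open
// keep-alive socket, stores the payload in dst, returns bytes received.
ssize_t http_range_get(int sock, off_t offset, size_t len, char *dst);

struct ReadAheadCache {
    off_t start = -1;               // file offset of the cached window
    std::vector<char> buf;          // cached bytes (up to READAHEAD)

    bool covers(off_t off, size_t len) const {
        return start >= 0 && off >= start &&
               off + (off_t)len <= start + (off_t)buf.size();
    }
};

// Serve a 4 KB read() from the cache when possible; otherwise fetch a
// 40 KB window over the persistent connection and cache it in RAM.
ssize_t cached_read(int sock, ReadAheadCache &cache,
                    char *out, size_t size, off_t offset)
{
    if (!cache.covers(offset, size)) {
        cache.buf.resize(READAHEAD);
        ssize_t got = http_range_get(sock, offset, READAHEAD, cache.buf.data());
        if (got <= 0)
            return -1;              // caller should reconnect and retry
        cache.buf.resize((size_t)got);
        cache.start = offset;
        if ((size_t)got < size)
            size = (size_t)got;     // short read near the end of the file
    }
    memcpy(out, cache.buf.data() + (offset - cache.start), size);
    return (ssize_t)size;
}
```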
Secondly, the httpfs filesystem currently takes one argument: a remote URL for a single file, for example a URL to an ISO image. I need you to implement mirror support, as explained below:
When the FUSE filesystem is mounted, it appears to already contain the single file from the URL, fully downloaded - but the file is not actually downloaded (it can be several gigabytes). Instead, when any application accesses the file, it downloads only the actually read() parts (4 KB blocks) on the fly, instead of downloading the whole file. This is already implemented and works.
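For context, each such on-demand block fetch presumably boils down to an HTTP/1.1 range request along these lines (a sketch; host and path are placeholders that the real code parses from the user's URL):

```cpp
#include <sys/types.h>  // off_t
#include <cstddef>
#include <cstdio>       // snprintf
#include <string>

// Build the request for one block; host and path come from the URL the
// user passed on the command line.
std::string build_range_request(const std::string &host,
                                const std::string &path,
                                off_t offset, size_t len)
{
    char buf[512];
    snprintf(buf, sizeof buf,
             "GET %s HTTP/1.1\r\n"
             "Host: %s\r\n"
             "Range: bytes=%lld-%lld\r\n"   // inclusive byte range
             "Connection: keep-alive\r\n"   // lets task 1 reuse the socket
             "\r\n",
             path.c_str(), host.c_str(),
             (long long)offset, (long long)(offset + len - 1));
    return buf;
}
```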
I need you to enhance the code so that the user provides three URLs: the URL of the ISO image as before, a second URL pointing to a list of alternate URLs (mirrors) of the same file, and a third URL pointing to MD5 checksums for all 4 KB blocks of the ISO file. With this, the FUSE filesystem will be able to download arbitrary parts from arbitrary mirrors and check whether any given 4 KB block has a valid checksum.
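Verifying a block could look roughly like this, assuming the checksum list has already been parsed into one 16-byte digest per 4 KB block (that layout is an assumption) and using OpenSSL's MD5() (link with -lcrypto):

```cpp
#include <openssl/md5.h>  // MD5(), MD5_DIGEST_LENGTH
#include <array>
#include <cstring>
#include <vector>

using Md5Digest = std::array<unsigned char, MD5_DIGEST_LENGTH>;

// checksums[i] holds the expected digest of 4 KB block i (assumed layout).
bool block_is_valid(const std::vector<Md5Digest> &checksums,
                    size_t block_index,
                    const unsigned char *data, size_t len)
{
    if (block_index >= checksums.size())
        return false;
    unsigned char digest[MD5_DIGEST_LENGTH];
    MD5(data, len, digest);   // len is 4096 except for the final block
    return memcmp(digest, checksums[block_index].data(),
                  MD5_DIGEST_LENGTH) == 0;
}
```

On a mismatch, the block would be re-requested from another mirror and the failing URL blacklisted, per the requirements below.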
You must keep in mind that:
- the code must re-download the list of mirrors periodically, because it may change every few minutes, so the mirror list needs to be kept up to date in memory
- the list of MD5 checksums will never be updated, so once you download it, there is no need to re-download it
- any URL in the mirror list may become inaccessible after some time; once it is not accessible, the code must remember the URL in some sort of blacklist and never try it again
- it must keep several connections open at the same time, in a connection pool. It will probably never read from different connections at the same time, but it should keep them open and randomly pick which one it is going to read from (see the sketch after this list)
- it must properly handle connection timeouts, to easily detect that any given URL is no longer accessible. Also, if an MD5 checksum does not verify, it should consider that URL broken and blacklist it
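A rough sketch of how the pool, blacklist, and timeouts could fit together; open_http_connection() is a hypothetical helper, and all names are illustrative:

```cpp
#include <sys/socket.h>  // setsockopt
#include <sys/time.h>    // timeval
#include <unistd.h>      // close
#include <cstdlib>       // rand
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumed helper: resolves the URL's host, connects, returns a socket.
int open_http_connection(const std::string &url);

struct MirrorPool {
    std::vector<std::string>   mirrors;    // re-downloaded periodically
    std::set<std::string>      blacklist;  // permanently failed URLs
    std::map<std::string, int> sockets;    // URL -> open keep-alive socket

    // Pick a random non-blacklisted mirror, connecting lazily on first use.
    int pick(std::string &url_out) {
        std::vector<std::string> usable;
        for (const std::string &u : mirrors)
            if (!blacklist.count(u))
                usable.push_back(u);
        if (usable.empty())
            return -1;
        url_out = usable[rand() % usable.size()];
        auto it = sockets.find(url_out);
        if (it != sockets.end())
            return it->second;
        int fd = open_http_connection(url_out);
        if (fd >= 0) {
            timeval tv{5, 0};   // 5 s receive timeout: dead URLs fail fast
            setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
            sockets[url_out] = fd;
        }
        return fd;
    }

    // Called on a timeout or a failed MD5 check: drop the connection and
    // never try this URL again.
    void blacklist_url(const std::string &url) {
        auto it = sockets.find(url);
        if (it != sockets.end()) { close(it->second); sockets.erase(it); }
        blacklist.insert(url);
    }
};
```

The periodic mirror-list refresh would simply replace `mirrors` in memory; the blacklist survives refreshes, so a dead URL stays dead.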
You do not need to worry about parallel processing because, as far as I am aware, FUSE filesystems do not support parallel filesystem access.
11 freelancers are bidding on average $526 for this job
Hi, I have 11 years of experience and I understand your requirements. I work exclusively in the distributed domain. Your project is interesting and I would like to work on it. Thanks, Anurag
Hi, I am a software developer with 15 years of experience programming in C and C++ under UNIX and Linux. I will be happy to work on this project. Hope to hear from you soon, Boris