The project is to scrape all companies in the UK from [login to view URL] (you can also try using their API, [login to view URL]), or your own [login to view URL]. Companies should be saved, along with any additional data, in a database with unified field names and values.
Information should be similar to this sample file.
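To illustrate what "unified field names" could mean in practice, here is a minimal C++ sketch that maps differently spelled source fields onto one canonical name. The alias table and the unified names (`company_name`, `company_number`, `registered_address`) are assumptions made up for this example; the real names should follow the sample file.

```cpp
#include <cctype>
#include <map>
#include <string>

// A company record as a flat map of field name -> value.
using Record = std::map<std::string, std::string>;

// Lower-case a key and collapse each run of non-alphanumeric
// characters into a single '_' (leading/trailing runs are dropped).
std::string normalize_key(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalnum(c)) {
            out += static_cast<char>(std::tolower(c));
        } else if (!out.empty() && out.back() != '_') {
            out += '_';
        }
    }
    while (!out.empty() && out.back() == '_') out.pop_back();
    return out;
}

// Hypothetical alias table: normalized source name -> unified name.
// These entries are illustrative assumptions, not taken from the sample file.
const std::map<std::string, std::string>& alias_table() {
    static const std::map<std::string, std::string> aliases = {
        {"companyname", "company_name"},
        {"name", "company_name"},
        {"company_no", "company_number"},
        {"companynumber", "company_number"},
        {"registered_office_address", "registered_address"},
    };
    return aliases;
}

// Return the unified name for a raw source field name.
// Unknown fields keep their normalized form.
std::string unify_field(const std::string& raw) {
    const std::string key = normalize_key(raw);
    const auto it = alias_table().find(key);
    return it == alias_table().end() ? key : it->second;
}

// Rewrite every key of a record to its unified name.
// If two source keys map to the same unified name, the first one wins.
Record unify_record(const Record& source) {
    Record out;
    for (const auto& kv : source) {
        out.emplace(unify_field(kv.first), kv.second);
    }
    return out;
}
```

Calling `unify_record` on a record with the keys `CompanyName` and `Company No` would yield the keys `company_name` and `company_number`, so rows from the website, the API and the parsed documents can share one database schema.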
IDE: Visual Studio 2017 Enterprise, C++ with libcurl (as a static library)
Part of the job is to scrape the documents attached to each company's record (PDFs, which need to be OCR'ed, and .IXBRL files, which are a kind of HTML file) and create a database of all the unified data.
You can use ABBYY OCR for the OCR part. [login to view URL]
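Because the .IXBRL files are HTML-like, their tagged values can be read without OCR. The following is a rough sketch of that idea, assuming facts appear as `ix:nonFraction` or `ix:nonNumeric` elements with a `name` attribute and plain-text content. The struct and function names are hypothetical, and a regular expression is a simplification: facts with nested markup are skipped, so a proper XML/HTML parser would be the safer choice for production use.

```cpp
#include <regex>
#include <string>
#include <vector>

// One tagged fact from an iXBRL document (hypothetical helper type).
struct IxFact {
    std::string name;   // value of the name="..." attribute
    std::string value;  // text content, trimmed, not yet parsed as a number
};

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Collect ix:nonFraction / ix:nonNumeric facts whose content is plain text.
// Assumption: attributes use double quotes and the content has no nested tags.
std::vector<IxFact> extract_ix_facts(const std::string& html) {
    static const std::regex fact_re(
        R"re(<ix:(nonFraction|nonNumeric)[^>]*\sname="([^"]+)"[^>]*>([^<]*)</ix:(?:nonFraction|nonNumeric)>)re");

    std::vector<IxFact> facts;
    const auto begin = std::sregex_iterator(html.begin(), html.end(), fact_re);
    const auto end = std::sregex_iterator();
    for (auto it = begin; it != end; ++it) {
        IxFact fact;
        fact.name = (*it)[2].str();         // group 2: name attribute
        fact.value = trim((*it)[3].str());  // group 3: element text
        facts.push_back(fact);
    }
    return facts;
}
```

Each extracted `name` can then be passed through the same field-name unification as the scraped data before being written to the company's database record.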