I am looking for a freelancer to create an OCR dataset for segmenting Chinese and English text in arbitrary images. The dataset should be generated with Python and be compatible with PyTorch. Much of the code already exists; see the "What already exists?" section.
The tasks to be completed are:
1. Extract sentence fragments from a dialogue text file and a dictionary database
2. Add each sentence to an image, with the entire sentence inside a bounding box (placed randomly, without overlap)
3. Generate a fixed dataset inside the `dataset/` directory
4. Create a PyTorch dataset with random Chinese and English sentences for semantic segmentation
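To make task 2 more concrete, here is a minimal sketch of random placement without overlap using rejection sampling. It is only an illustration, not code from the existing repo: the names `Box`, `boxes_overlap`, and `place_boxes` are hypothetical, and the sketch assumes the pixel width and height of each rendered sentence are already known (in practice they would come from whatever library renders the text).

```python
import random
from typing import List, Optional, Tuple

# Hypothetical box type: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_boxes(
    image_size: Tuple[int, int],
    box_sizes: List[Tuple[int, int]],
    rng: random.Random,
    max_tries: int = 100,
) -> List[Optional[Box]]:
    """Place each (width, height) box at a random position inside the image.

    Rejection sampling: draw a random top-left corner, keep it only if the
    box does not overlap any box placed so far. A box that does not fit in
    the image, or cannot be placed within max_tries draws, yields None.
    """
    img_w, img_h = image_size
    placed: List[Box] = []
    result: List[Optional[Box]] = []
    for w, h in box_sizes:
        box: Optional[Box] = None
        if w <= img_w and h <= img_h:
            for _ in range(max_tries):
                left = rng.randint(0, img_w - w)
                top = rng.randint(0, img_h - h)
                candidate = (left, top, left + w, top + h)
                if not any(boxes_overlap(candidate, p) for p in placed):
                    box = candidate
                    placed.append(candidate)
                    break
        result.append(box)
    return result


# Example: three sentence boxes on a 640x480 image, seeded for repeatability.
boxes = place_boxes((640, 480), [(200, 40), (150, 30), (300, 50)], random.Random(0))
```

Each returned box can serve both as the position at which to draw the sentence and as the region to fill in the segmentation mask. Seeding the random generator keeps the fixed dataset in task 3 reproducible.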
What already exists?
- [login to view URL] (synthetic single character dataset)
- [login to view URL] (Chinese dictionary)
- different dialogue txt files for English and Chinese
- [login to view URL]
I have created a private GitHub repo for this project. You can get access to it for further details before the project begins. If you are interested in this project please start your bid with "OCR PROJECT". (There are many bots bidding.)
4 freelancers are bidding an average of $259 for this job
Hi, I'm a native Chinese speaker and machine learning engineer with 4 years of working experience. I've been involved in a Chinese OCR project with a leading tech company. For further discussion, please contact me via inmail.