Generate a synthetic dataset for Chinese and English (Python)


I am looking for a freelancer to create an OCR dataset for segmenting Chinese and English text in arbitrary images. The dataset should be generated with Python. Much of the code already exists; see the "What already exists?" section. Lastly, the dataset should be compatible with PyTorch.

Tasks that need to be completed are:

1. extract sentence fragments from a dialogue text file and a dictionary database

2. add each sentence to an image, with the entire sentence inside one bounding box (placed randomly, without overlap)

3. generate a fixed dataset inside the `dataset/` directory

4. create a PyTorch dataset with random Chinese and English sentences for semantic segmentation
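Task 2 (random placement without overlap) and the segmentation labels needed for task 4 could be sketched roughly as below. This is a minimal illustration under assumptions, not the code from the existing repo: the function names, the class ids (0 background, 1 Chinese, 2 English), and the rejection-sampling strategy are all hypothetical. The box sizes are passed in directly; in a real pipeline they would presumably come from measuring the rendered sentence with the chosen font.

```python
import random

# Assumed class ids for the segmentation mask (not specified in the posting).
BACKGROUND, CHINESE, ENGLISH = 0, 1, 2


def boxes_overlap(a, b):
    """Return True if two half-open (x0, y0, x1, y1) boxes share any pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_boxes(image_size, box_sizes, rng, max_tries=200):
    """Rejection-sample a non-overlapping position for each (w, h) box.

    A box that does not fit, or cannot be placed within max_tries
    attempts, is reported as None instead of being forced in.
    """
    width, height = image_size
    placed = []
    for w, h in box_sizes:
        box = None
        if w <= width and h <= height:
            for _ in range(max_tries):
                x0 = rng.randint(0, width - w)
                y0 = rng.randint(0, height - h)
                candidate = (x0, y0, x0 + w, y0 + h)
                if not any(boxes_overlap(candidate, p) for p in placed if p):
                    box = candidate
                    break
        placed.append(box)
    return placed


def build_mask(image_size, boxes, labels):
    """Build a height x width mask (list of rows) with one class id per pixel."""
    width, height = image_size
    mask = [[BACKGROUND] * width for _ in range(height)]
    for box, label in zip(boxes, labels):
        if box is None:
            continue
        x0, y0, x1, y1 = box
        for y in range(y0, y1):
            mask[y][x0:x1] = [label] * (x1 - x0)
    return mask


# Example: three sentence boxes on a 256 x 128 image.
rng = random.Random(0)
sizes = [(120, 24), (80, 24), (60, 30)]
labels = [CHINESE, ENGLISH, CHINESE]
boxes = place_boxes((256, 128), sizes, rng)
mask = build_mask((256, 128), boxes, labels)
```

Seeding the random generator makes the layout reproducible, which suits the fixed dataset in task 3; drawing fresh layouts per sample would suit the random PyTorch dataset in task 4, where the mask could be converted to a tensor of class ids.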

What already exists?

- [login to view URL] (synthetic single character dataset)

- [login to view URL] (Chinese dictionary)

- different dialogue txt files for English and Chinese

- [login to view URL]

I have created a private GitHub repo for this project. You can get access to it for further details before the project begins. If you are interested in this project, please start your bid with "OCR PROJECT". (There are many bots bidding.)

Skills: Python, Data Processing, OCR, Data Science, Image Processing

About the employer:
(2 reviews) Lübbecke, Germany

Project ID: #29879822

4 freelancers have bid an average of $259 for this job


***"OCR PROJECT"*** Greetings of the day. I appreciate you posting this kind of job. I understand your requirements and I want to help you out with smart solutions. I'm an expert Python developer with 5+ years of experi...

$140 USD in 15 days
(11 reviews)

Hi! I am an expert in Photoshop & Lightroom at [login to view URL] with 10+ years of experience in photo editing and retouching, with the top rating (highest review score, 5/5). I saw the task Generate a synthetic dat...

$140 USD in 2 days
(1 review)

"OCR PROJECT" Hi, I am an OCR expert. I can use Python for OCR and computer vision. I can use OpenCV or a deep learning model, such as a CNN or RNN. The important thing is to prepare a good dataset. If we prepare datas...

$200 USD in 7 days
(1 review)

Hi, I'm a native Chinese speaker and machine learning engineer with 4 years of working experience. I've been involved in a Chinese OCR project with a leading tech company. For further discussion, please contact me via InMail.

$556 USD in 14 days
(0 reviews)