
Document parsing and text mining in Python

$15-25 USD / hour

Cancelled
Posted about 9 years ago

Programmer for Tree Parsing/Text Mining

Job Summary

Seeking an experienced programmer for engagement in long-term freelance work. Strong tree parsing skills are essential. A background in NLP and experience with NLTK is preferred but not required. Pay is commensurate with experience and is hourly-based. As part of our hiring process, we ask that interested candidates successfully complete the tasks below to demonstrate basic competency.

Project Background

The SEC stores various text files they receive from companies on their Edgar website. The files typically contain detailed discussions of companies’ performance as well as financial data summarizing their performance. Attached is a random sample of 15 full .txt files from 5 different years with a file type of “10-K” from Edgar. You will find files which embed HTML, SGML, or XBRL code, in addition to tables, special characters, images, and other embedded files, such as PDF, etc.

Tasks

1. Extract the following sections from the 10-K using a tree parser: Management Discussion and Analysis (MD&A), Risk Factors, and Notes to the Financial Statements.
2. Flatten each section extracted to raw text. That is, remove all code, tables, images, or embedded files.
3. Write the raw text of each section to a separate .txt file. The filename for the raw text file should be that of its parent with a suffix for each section appended (e.g., “*[login to view URL]”, “*[login to view URL]”, and “*[login to view URL]”).
4. Discuss any outstanding issues, questions, or concerns regarding the steps above. For example, discuss weaknesses in your approach to identifying section and sentence boundaries.

Apply

For full consideration, please upload your resume, output, and responses by April 15, 2015. We are an equal opportunity employer. Work permits or visas are not required.
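The extraction, flattening, and file-writing tasks in the posting could be sketched roughly as below. This is only an illustration under several assumptions, not the client's expected solution: it uses the standard library's event-based `html.parser` instead of a true tree parser such as lxml or BeautifulSoup, the heading patterns in `SECTION_PATTERNS` are guesses at how 10-K headings are commonly worded, and the output suffixes (`_risk_factors`, `_mda`, `_notes`) are hypothetical because the posting's own filename examples are hidden behind "[login to view URL]".

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Assumed heading patterns; real filings vary a lot in wording and layout.
SECTION_PATTERNS = {
    "risk_factors": re.compile(r"^\s*item\s+1a\.?\s*risk\s+factors", re.I),
    "mda": re.compile(r"^\s*item\s+7\.?\s*management", re.I),
    "notes": re.compile(
        r"^\s*notes\s+to\s+(the\s+)?(consolidated\s+)?financial\s+statements", re.I
    ),
}
# Any other "Item N" heading is treated as the end of the current section.
ANY_ITEM = re.compile(r"^\s*item\s+\d+[a-z]?\b", re.I)


class TextFlattener(HTMLParser):
    """Collects text, dropping everything inside tables, scripts, and styles."""

    SKIP = {"table", "script", "style"}
    BLOCK = {"p", "div", "br", "tr", "li", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block-level tags become line breaks

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def flatten(markup):
    """Return the non-empty text lines of the markup, whitespace-normalized."""
    parser = TextFlattener()
    parser.feed(markup)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return [line for line in lines if line]


def split_sections(lines):
    """Group lines under the most recent matching section heading."""
    sections, current = {}, None
    for line in lines:
        name = next((n for n, p in SECTION_PATTERNS.items() if p.match(line)), None)
        if name is not None:
            current = name
            sections[current] = [line]  # a later heading match replaces an earlier one
        elif ANY_ITEM.match(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def write_sections(source_path, out_dir):
    """Write each found section to <parent stem>_<section>.txt (hypothetical naming)."""
    source = Path(source_path)
    markup = source.read_text(encoding="utf-8", errors="replace")
    written = []
    for name, text in split_sections(flatten(markup)).items():
        target = Path(out_dir) / f"{source.stem}_{name}.txt"
        target.write_text(text + "\n", encoding="utf-8")
        written.append(target)
    return written
```

The weaknesses the posting asks candidates to discuss show up directly in a sketch like this: headings are detected purely by line-initial regular expressions, so a cross-reference that starts a line with "Item 7. Management" would restart the section, filings that split a heading across several tags would be missed, and plain-text filings without markup rely entirely on line layout.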
Project ID: 7393520

About the project

8 proposals
Remote project
Active 9 years ago

8 freelancers are bidding an average of $22 USD/hour for this project
A proposal has not yet been provided
$17 USD in 3 days
5.0 (43 reviews)
4.8
Hi! I am a professional C/C++/C#/Java/Python developer. I can do this project to the highest quality! Best regards, Szymszetinsl
$21 USD in 30 days
5.0 (2 reviews)
3.3
Hi, I can parse text from many types of files, including .txt, csv, pdf, doc, docx, png, jpeg, psd, rst, etc. I am ready to do the task. I could not see the link to the text files. Could you give me a link to them?
$17 USD in 20 days
5.0 (4 reviews)
3.2
Hi, I am a graduate research student doing research on network programming languages. My work on NPLs involves representing network topologies in graphs such as tree data structures and running different algorithms on those data structures. I also have a deep understanding of NLP, as I have worked on lexicons, parsers, and regular grammars. Besides, I have 4 years of experience in software development. I can deliver the result with the quality you expect. I haven't found any attachment. Please provide the files. I shall upload the resume, output, and responses soon after receiving the files. Thanks, Shahbaz
$22 USD in 20 days
0.0 (0 reviews)
0.0
Hello, I'm a freelance Python developer and I am very interested in being your developer for the job "Document parsing and text mining in Python". I have worked on projects that required parsing files, and I have worked with pdf, doc, csv, docx, and odf formats. I have also worked on two projects that involved data mining, using libraries such as Numpy, Scipy, NLTK, Scrapy, Gensim, Requests, and Matplotlib. It is worth mentioning that I performed some natural language processing on the data, as well as semantic matching. Please refer to my portfolio for previous projects I have handled. I'm looking forward to hearing from you. Regards, Aurlus I. Wedava
$16 USD in 40 days
0.0 (0 reviews)
0.0
I can set this up in Python.
- networkx for a graph object to trace extractions.
- PDF to text is no problem.
Based in Toronto. However, I'm afraid I can't commit to a skills demo without a milestone or compensation.
$27 USD in 5 days
0.0 (0 reviews)
2.3
Extensive experience in NLP, text mining, contextual extraction, and sentiment analysis, using a combination of advanced tools written by me and commercial software. I could also double my hours per week if needed. I also have several ready-to-use classification taxonomies in different subject domains from past projects. All my code currently runs as client-server Python software: it sends text to a server and receives a clean version, facts, and categories. I could also do data mining on text, statistical, or social graph information. P.S. If needed, I could bring in a small team (they would be lucky to take part in an interesting project) to solve advanced statistical tasks using textual and numeric information, for example parsing pharma data and checking whether symptoms of seasonal illness correlate with prices or something else, such as retail customer segmentation, telecom/banking message analysis, or credit scoring models. P.P.S. By the way, the attached data file is not currently available for testing; maybe it was deleted by the system. Could you please attach that file so I can test my skills?
$23 USD in 20 days
0.0 (0 reviews)
0.0

About the client

United States
0.0
0
Member since Mar 29, 2015

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)