Completed

Extract questions from PDF deposition transcript

I need a Python script that extracts only the questions from a PDF of a deposition transcript and saves them as a numbered list in a Word document. The PDF can range from 5 pages to over 1,000 pages in length.

The text will look almost exactly like the sample below (except that it will be in a PDF). I have attached a sample PDF to this proposal:

1. A. Yes.

2. Q. And you're familiar with Navistar; right?

3. A. Yes, ma'am.

4. Q. And you've been familiar with Navistar for decades?

5. A. Yeah. All these old things sound bad but, yes,

6. decades.

7. Q. And this engine that was in the Nolans' Excursion that

8. was manufactured by Navistar; right?

9. A. Correct.

The above would be scraped from the PDF, the line numbering removed, and the result saved in a Word document as:

Q. And you're familiar with Navistar; right?

Q. And you've been familiar with Navistar for decades?

Q. And this engine that was in the Nolans' Excursion that

was manufactured by Navistar; right?
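As a rough illustration of the transformation above, the following Python sketch strips the leading line number from each line and keeps only the lines that belong to a question. The regular expressions and the function names `strip_line_number` and `question_lines` are assumptions made for this example, not part of the request; a real transcript may need different patterns, and lines from other speakers (objections, for instance) are not handled here.

```python
import re

# Assumed layout: an optional line number (1-2 digits), an optional period,
# whitespace, then the transcript text.
LINE_NUMBER = re.compile(r"^\s*\d{1,2}\.?\s+")
# Assumed speaker markers: "Q." or "Q:" for questions, "A." or "A:" for answers.
SPEAKER = re.compile(r"^(Q|A)[.:](\s|$)")


def strip_line_number(line):
    """Remove the leading transcript line number, if there is one."""
    return LINE_NUMBER.sub("", line, count=1).strip()


def question_lines(lines):
    """Return only the lines that belong to questions, with numbering removed."""
    kept = []
    in_question = False
    for raw in lines:
        text = strip_line_number(raw)
        if not text:
            continue  # skip blank lines
        marker = SPEAKER.match(text)
        if marker:
            # A new speaker marker switches between question and answer mode.
            in_question = marker.group(1) == "Q"
        if in_question:
            # Unmarked lines are treated as continuations of the current speaker.
            kept.append(text)
    return kept
```

Run on the nine sample lines above, this keeps the three questions, including the continuation line of the third one, and drops every answer line.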

ISSUES

1. Each page of the transcript has its lines numbered from 1 to 25 on the left-hand side. Any attempt to scrape from "Q:" until reaching "?" will grab the line numbers as well. These will need to be removed from the output.

2. Some questions which are near the end of one page will continue onto the next page. At the bottom of each page of the PDF is a footer and a page number. At the top of the next page is a header. NONE of that should be saved. I will leave it up to the coder to decide the best method to deal with this.

I will note that it may be easier to convert the PDF to some other type first (.txt) which will strip the header/footer, but leave the page and line numbers which have to be dealt with. It will be up to you how to deal with this issue.

3. The question text will need to be reformatted to remove all hard-coded returns at the end of lines EXCEPT those that follow the ending "?". Below is an example of before and after:

10. Q: Okay. Yesterday, as you recall, you were in

11. here for a different case, Jones versus Ford, and you

12. have a binder that looks almost identical today?

The lines above, if scraped or converted to .txt format, will retain the shortened text structure which needs to be revised. The above should be reformatted and saved as:

Q: Okay. Yesterday, as you recall, you were in here for a different case, Jones versus Ford, and you have a binder that looks almost identical today?

and NOT saved as:

Q: Okay. Yesterday, as you recall, you were in

here for a different case, Jones versus Ford, and you

have a binder that looks almost identical today?

4. Processing of the PDF/Document should stop when the words "ERRATA SHEET" are encountered. That is the end of the file. At times there may be more pages beyond this - they should be ignored and all work saved once the above phrase is found. This exact phrase (including being in caps) only appears once per document. There may be times where this phrase is not found and the document simply ends. Either scenario should be accounted for.
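The issues above could be approached roughly as follows. This is only a sketch under stated assumptions: the page text has already been extracted from the PDF into one string per page, every body line begins with a line number from 1 to 25 followed by text, and headers, footers, and bare page numbers do not have that shape. The names `body_lines` and `reflow` are hypothetical helpers for this example, and `reflow` expects answer lines to have been filtered out already.

```python
import re

# Assumed shape of a body line: line number, optional period, then text.
NUMBERED = re.compile(r"^\s*(\d{1,2})\.?\s+(\S.*)$")
# Assumed question markers: "Q." or "Q:".
QUESTION_START = re.compile(r"^Q[.:]\s")
END_MARKER = "ERRATA SHEET"


def body_lines(pages):
    """Yield transcript text lines from page strings, without page furniture.

    Stops at the first line containing END_MARKER. If the marker never
    appears, it simply runs to the end of the document.
    """
    for page in pages:
        for raw in page.splitlines():
            if END_MARKER in raw:
                return
            match = NUMBERED.match(raw)
            # Lines without a 1-25 line number are treated as header/footer.
            if match and 1 <= int(match.group(1)) <= 25:
                yield match.group(2).strip()


def reflow(question_lines):
    """Join wrapped question lines so that each question is one paragraph."""
    questions = []
    for line in question_lines:
        if QUESTION_START.match(line) or not questions:
            questions.append(line)  # a new question starts here
        else:
            questions[-1] = f"{questions[-1]} {line}"  # continuation line
    return questions
```

Because `body_lines` filters on the line number rather than on page position, a question that runs across a page break is rejoined without the footer, page number, or next header in between. A footer that happens to start with a small number followed by text would slip through, so the pattern would need checking against the actual sample PDF.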

Requirements:

1. A simple method to tell the program where the PDF is located. I would prefer NOT to have to type in the exact directory location; instead I want to navigate to the file and have the program take the location from that. Drag and drop is fine, navigation by user is fine - whatever is easier.

2. Save the output in the same directory as the input file. File should be saved in Word format. (.doc or .docx fine) Use same filename as input file with the addition of _questions to the filename. If original file is "[login to view URL]" the saved file should be "[login to view URL]" or "deposition_questions.docx." Obviously, if drag and drop is implemented, output directory can simply be the "documents" folder located at: "C:\Users\Username\Documents" with username required by the user to be entered.
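One possible way to meet these two requirements is sketched below, assuming the standard-library `tkinter` file dialog for navigation and the third-party `python-docx` package for writing the Word file. The function names `pick_pdf`, `output_path`, and `save_questions` are illustrative only, and the "List Number" paragraph style is assumed to be present because it ships with python-docx's default template.

```python
from pathlib import Path


def pick_pdf():
    """Open a file dialog and return the chosen PDF path, or None if cancelled."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window; only the dialog is shown
    chosen = filedialog.askopenfilename(
        title="Select deposition transcript",
        filetypes=[("PDF files", "*.pdf")],
    )
    root.destroy()
    return Path(chosen) if chosen else None


def output_path(pdf_path):
    """Same folder and base name as the input, with _questions.docx appended."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(f"{pdf_path.stem}_questions.docx")


def save_questions(questions, docx_path):
    """Write one numbered paragraph per question (needs python-docx)."""
    from docx import Document  # third-party: pip install python-docx

    document = Document()
    for question in questions:
        document.add_paragraph(question, style="List Number")
    document.save(str(docx_path))
```

Since the dialog returns the full path of the selected file, the output can always be written next to the input, so the user never has to type a directory or username.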

The program will probably grow to more than one phase; this is just a starting point.

Skills: Data Extraction, Data Processing, Data Scraping, Python, Software Architecture


About the employer:
(23 reviews) Los Angeles, United States

Project ID: #19595485

Awarded to:

visionbhind2020

1. I am a regular programmer who has written more than 500 Python scripts. I have 3 years of experience in Python.

$35 USD in 1 day
(0 reviews)
0.0

21 freelancers have bid on this job

yesikov1224

[login to view URL] Credit is my motto. I am an expert in web scraping. I can do your job with BS4, Perl script, Scrapy, or the Selenium framework for Python. You can judge my skill from my profile. I can do any project in your demand comp...

$150 USD in 7 days
(32 reviews)
5.5
RushService

Feel free to contact me about Extract questions from PDF deposition transcript. Shoot me a message to discuss further details. We provide comments, images, videos, demos and live sessions in order to help the cli...

$150 USD in 3 days
(49 reviews)
5.9
hamzaisb

I have previously extracted data from scanned and digital PDFs in Python. I have 3+ years of experience working in Python and have done similar tasks. I have questions to as...

$250 USD in 3 days
(37 reviews)
5.8
ferozstk

Hello, After reading your project details I believe I'm suitable for this project, as I'm an expert in it with more than 7 years of experience. Please feel free to contact me. I am looking forward to hearing from you...

$60 USD in 3 days
(27 reviews)
5.3
meeshal1994

Hello, I am a PHP developer and a scraping expert. I know that you want a script in Python, but I have done this many times with PHP. I have developed a script which decodes PDF pages in a manner that nothing is distu...

$400 USD in 8 days
(16 reviews)
4.8
susanna2018

Hi, Sir!! I am a Python expert and a full-time full-stack developer. Your project is very simple for me. I want to help you with my Python skills. Especially Data Science, Data Ana...

$155 USD in 3 days
(21 reviews)
5.0
xinglong717

Hi, How are you today? I have just read your job description carefully and I am very interested in your job. I am a senior Excel, VBA, C, C++, C#, Python developer with 10 years of experience and I have developed many progr...

$140 USD in 7 days
(28 reviews)
4.9
adeelpirzada

Hi, I have done scraping on almost half of the World Wide Web, including eCommerce giants (Amazon, eBay, craigslist), news feeds, social media websites, and APIs. I develop my own scrapers and bots based on requirements w...

$125 USD in 7 days
(16 reviews)
5.2
albertpopov46

Dear sir, I am a full-time freelancer. I have read your description carefully. I am a Python expert with about 7 years of experience, so I think that I can do your project in your time. If you hire me, I'll pro...

$199 USD in 7 days
(11 reviews)
4.3
KongHyongRan

Hi. I am really interested in your project because it's the most appropriate project I can complete perfectly. I have been working as a professional software developer for years and have gained rich experience in it...

$155 USD in 3 days
(9 reviews)
3.7
Logo199

I have a total of 7 years of working experience in process mapping, process documentation, data entry, and data analysis, and am an expert in MS Office applications including MS Word, MS PowerPoint, MS Excel, Photoshop, Adobe...

$50 USD in 1 day
(9 reviews)
3.5
AlexanderPGR

Very interested! Hi, dear! Feel free to contact me for these kinds of projects. I will provide you the best result on time. I am waiting for your message to have a detailed discussion. Thanks.

$150 USD in 7 days
(8 reviews)
3.2
bktk

Hi Dear Sir, after reading the project description, I am able to write a proposal for this project. I am an expert in all the skills mentioned here and all those required for this project. Sir I...

$30 USD in 3 days
(11 reviews)
2.8
alkajain2906

Q. And you're familiar with Navistar; right? Yes Q. And you've been familiar with Navistar for decades? Q. And this engine that was in the Nolans' Excursion that was manufactured by Navistar; right? Greetings...

$155 USD in 3 days
(1 review)
2.4
anshsparkle

HELLO I CAN START RIGHT NOW - I AM EXPERT IN Python Data Processing and I BET YOU CANNOT FIND BETTER FREELANCER THAN ME ... PLEASE MESSAGE ME AND LET'S DISCUSS THE THINGS. THANKS. Please Reply

$140 USD in 7 days
(1 review)
1.0
cooddooc

Hi, please schedule a time when we can talk. I work until all my clients are satisfied and do the work perfectly and clearly. I can work full-time, 12+ hours/day, and can work in your timezone. Also will keep goo...

$140 USD in 7 days
(1 review)
0.6
HarleyJohnson

I Will Do Data Entry, Data Analysis, Data Mining, Internet Research. I specialize in: Offline and Online Data Entry; Data Mining; Data Analysis; Copy Paste Task; Data Capturing From Any Website; G...

$250 USD in 5 days
(0 reviews)
2.2
hayteekeys5

Hi, I have great Python experience and the skill to help you develop a Python script that can extract only the questions from a PDF of a deposition transcript and then save them into a word document format which is numbere...

$30 USD in 1 day
(0 reviews)
0.0
fercontador952

I have done something similar, but extracting dates. The logic should be: open the PDF, read it, and filter at the same time.

$100 USD in 3 days
(0 reviews)
0.0
stackcru

We can take care of the development and design of your mobile app, and it will be done without any hurdle and with good quality. Quality is always a top priority for us. The app will be developed in React Native. F...

$155 USD in 15 days
(0 reviews)
0.0