
Closed
Paid on delivery
I have an existing Python script whose sole purpose is data extraction: it turns PDF documents into CSV files. Right now the results are incomplete and the code is a little fragile. I need it tightened up so every piece of data in each PDF is captured and written to a clean, well-structured CSV.

What I already have
• A working but imperfect Python script
• Sample PDFs that show the range of layouts the tool must handle
• A sample CSV that illustrates the column order I expect

What needs to improve
• Reliable parsing across multiple pages and varied table structures
• Accurate capture of every field, not just the obvious text blocks
• Clear, readable code with comments so future tweaks are simple
• A straightforward command-line call such as python [login to view URL] [login to view URL] [login to view URL]

Useful libraries are entirely up to you (pdfplumber, PyPDF2, tabula-py, Camelot, pandas, or a combo), so long as the final script runs on standard Python 3 and requires only pip-installable packages.

Deliverables
1. Updated script (single .py file or a small module)
2. [login to view URL] listing any external dependencies
3. One example CSV generated from my sample PDFs to prove full data coverage
4. Brief README with run instructions

Acceptance Criteria
• Running the script on my test PDFs produces a CSV that matches the source data exactly, column for column and row for row.
• No hard-coded file paths; everything is parameterised.
• Code executes without warnings or errors on Python 3.10 under Windows and Linux.

Please keep the focus on robust extraction, the project's primary goal, so I can drop new PDFs in and get accurate CSVs every time.
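For orientation, here is a minimal sketch of the kind of extraction loop the brief describes. It assumes pdfplumber is the chosen library and that the data sits in tables that `page.extract_tables()` can detect; free-text blocks outside tables would need extra handling (for example `page.extract_text()`) and are not covered. The file name `pdf_to_csv.py`, the `-o/--output` and `--header` options, and every function name are hypothetical, since the client's actual script and command-line arguments are not shown in the listing.

```python
"""pdf_to_csv.py: hypothetical sketch of a pdfplumber-based PDF-to-CSV extractor."""
import argparse
import csv
import sys
from pathlib import Path


def clean_cell(value):
    """Turn None into '' and collapse line breaks/extra spaces inside a cell."""
    if value is None:
        return ""
    return " ".join(str(value).split())


def extract_rows(pdf_path):
    """Yield cleaned rows from every table on every page of one PDF."""
    # Third-party (pip install pdfplumber); imported here so the CLI can show help without it.
    import pdfplumber

    with pdfplumber.open(str(pdf_path)) as pdf:
        for page in pdf.pages:  # walk all pages, not just the first
            for table in page.extract_tables():
                for row in table:
                    cells = [clean_cell(c) for c in row]
                    if any(cells):  # skip rows that are entirely empty
                        yield cells


def write_csv(rows, out_path, header=None):
    """Write rows as CSV. With a header, drop repeated header rows and
    pad/trim every row to the header width so columns stay aligned."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if header:
            writer.writerow(header)
        for row in rows:
            if header:
                if row == header:  # table header repeated on a continuation page
                    continue
                row = (row + [""] * len(header))[: len(header)]
            writer.writerow(row)


def build_parser():
    """Command-line interface: every path is an argument, nothing is hard-coded."""
    parser = argparse.ArgumentParser(description="Extract PDF tables into a single CSV file.")
    parser.add_argument("pdfs", nargs="+", type=Path, help="input PDF file(s)")
    parser.add_argument("-o", "--output", type=Path, required=True, help="output CSV path")
    parser.add_argument("--header", help="comma-separated column names, in the expected order")
    return parser


def main(argv=None):
    parser = build_parser()
    argv = sys.argv[1:] if argv is None else argv
    if not argv:  # no arguments: print usage instead of an argparse error
        parser.print_help()
        return
    args = parser.parse_args(argv)
    header = [h.strip() for h in args.header.split(",")] if args.header else None
    rows = (row for pdf in args.pdfs for row in extract_rows(pdf))
    write_csv(rows, args.output, header)


if __name__ == "__main__":
    main()
```

A call might look like `python pdf_to_csv.py a.pdf b.pdf -o out.csv --header "Col1,Col2,Col3"`, with the header taken from the sample CSV's column order. Padding or trimming each row to the header width is one simple way to keep the column order fixed. Whether it is right for every layout depends on the real sample PDFs.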
Project ID: 40301625
10 proposals
Remote project
Active 16 hours ago
10 freelancers are bidding on average ₹1,134 INR for this job

Hello, I have several years of experience with Python coding and processing PDF files. I have also completed several similar projects recently. Could you share a sample of the PDF files to process? Thanks
₹1,047 INR in 1 day
8.1

Hi, I can improve your existing Python PDF-to-CSV extraction script to ensure accurate parsing across multiple pages and varied table structures. I’ll make the code more robust, readable, and parameterized, with proper error handling and CLI usage. Thanks, Anshuman
₹1,200 INR in 2 days
6.4

I did a similar project a week ago. I am sure you will give me more projects after this. I am interested in doing this project too and ready to complete it within the timeline. Kindly check my profile to see all ratings and reviews given by clients. Hoping to hear from you soon. Payment after completion.
₹1,500 INR in 3 days
0.0

I will transform your fragile PDF extractor into a resilient, production-ready pipeline. Instead of just "fixing" the current script, I will implement a schema-validated extraction process that ensures 100% data integrity for every CSV row. You will receive: 1. A robust, non-fragile Python script. 2. Complete extraction of all missing fields. 3. Zero-loss data conversion guarantee. Ready to deliver the final functional script within 24 hours.
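The bid does not spell out what "schema-validated extraction" means. One plausible reading, sketched here under assumptions: after extraction, the CSV header is compared with the expected column order from the client's sample CSV, and each row is checked for field count and simple per-column rules. The column names `Date`, `Description`, and `Amount`, the regex rules, and the `validate_rows` helper are hypothetical placeholders, not anything from the listing.

```python
import re
from typing import Callable, Dict, List, Sequence

# A check receives one cell's text and returns True if it is acceptable.
Check = Callable[[str], bool]


def validate_rows(header: Sequence[str], rows: Sequence[Sequence[str]],
                  expected: Sequence[str], checks: Dict[str, Check]) -> List[str]:
    """Return human-readable problems; an empty list means every row passed."""
    if list(header) != list(expected):
        return [f"header mismatch: expected {list(expected)}, got {list(header)}"]
    errors = []
    for line_no, row in enumerate(rows, start=2):  # CSV line 1 is the header
        if len(row) != len(expected):
            errors.append(f"line {line_no}: expected {len(expected)} fields, got {len(row)}")
            continue
        for name, value in zip(expected, row):
            check = checks.get(name)  # columns without a rule are not checked
            if check is not None and not check(value):
                errors.append(f"line {line_no}: bad {name!r} value {value!r}")
    return errors


# Hypothetical schema: the real column names would come from the client's sample CSV.
EXPECTED = ["Date", "Description", "Amount"]
CHECKS: Dict[str, Check] = {
    "Date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
    "Amount": lambda v: re.fullmatch(r"-?\d+(\.\d+)?", v) is not None,
}
```

Reading a generated CSV with `csv.reader`, passing its first row as `header` and the remaining rows as `rows`, yields a pass/fail report for each run. Such a check catches shifted or missing columns. It cannot prove that every value matches the PDF, which still needs comparison against the source.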
₹1,000 INR in 1 day
0.0

Matunga East, India
Member since Mar 2, 2023