
Closed
Posted
Paid on delivery
I have a collection of PDFs whose tables combine normal text with embedded images, so simple copy-and-paste will not do. Your task is to pull every table out of each file, run OCR where the cells are only images, and populate a custom-designed SQLite database. We will agree on the schema up front so the final .sqlite file lines up with my downstream reporting tools. Accuracy is key: numbers must keep their formatting, column order must stay intact, and any footnotes that belong to a row need to be captured in a linked field rather than discarded. Whether you prefer Python with Camelot, Tabula-py, pdfplumber, Tesseract, or another extraction stack is up to you—as long as the workflow is reproducible. Once all conversions are complete, send me the finished database along with the script or notebook you used so I can rerun the process when new PDFs arrive.
Project ID: 40423283
15 proposals
Remote project
Active 6 days ago
Set your budget and timeframe
Get paid for your work
Outline your proposal
It's free to sign up and bid on jobs
15 freelancers are bidding on average ₹7,278 INR for this job

Hello, I am Software Developer from Bangalore, India.I will extract using python module.I will write script to automate this flow. Let’s connect through chat
₹7,000 INR in 2 days
6.5
6.5

Hi, I can extract all tables from your PDFs, including OCR for image-based data, and organize everything into a clean SQLite database with accurate formatting and structure. I’ll also provide a reusable script so you can run the same process on future files. Best Regards,,,,,,,,,
₹15,000 INR in 1 day
5.8
5.8

hello! hoping you are fine. im an expert with +8 years of experience in python, ocr and sqlite. i've extracted complex mixed-media tables from pdfs using pdfplumber and tesseract before, so i can get this script and database ready for you fast. do you already have the sqlite schema defined? contact me, i will reply right now!
₹7,000 INR in 2 days
4.9
4.9

Hello, I am senior developer. I can do you required Extract PDF Tables to SQLite. Please share sample PDF file for me Thank You
₹12,500 INR in 5 days
3.4
3.4

With your project description,I believe my extensive background in data processing and database programming position me to tackle this project successfully. I have leveraged Python in extracting tables from PDFs using Tabula-py and pdfplumber, as well as employing Tesseract OCR when necessary. Hence, I can definitely handle the intricate task of identifying images within cells and applying OCR to them for accurate conversion. Moreover, my proficiency in SQLite will ensure that our agreed-upon schema is efficiently incorporated into the final .sqlite file to align with your reporting tools.I am detail-oriented, outcomes-focused and strive to maintain the precision needed for data conversion projects such as yours. Importantly, I always provide clear documentation so you can easily reproduce the workflow when more PDF files require extraction- something that has earned me praise from past clients. In summary, you can count on my decade-long Full Stack/Web Application Development experience and particular expertise in Data Processing and SQLite operations to deliver a reliable and reproducible data extraction process with zero compromise on accuracy. I am eager to get started on this project. Reach out today and let's ensure your PDF tables are extracted seamlessly and ready for downstream reporting!
₹7,000 INR in 7 days
3.1
3.1

Handling tables with both text and embedded images presents a unique data extraction challenge. I've previously built similar data pipelines using Python and OCR for financial reporting, ensuring precise number formatting and data integrity. I'd approach this by leveraging a combination of libraries like Camelot or Tabula-py for initial table detection, integrating Tesseract OCR for image-based cells, and meticulously structuring the data into your custom SQLite schema. I estimate I can deliver the completed SQLite database and accompanying script within 7 days, prioritizing accuracy and reproducibility. Would you be able to share a sample PDF with a particularly complex table so I can assess the OCR requirements upfront?
₹6,167 INR in 7 days
2.5
2.5

Hi there, I read your requirements carefully, and I can help extract tables from your PDFs, run OCR where table cells are image-based, and populate the results into a custom SQLite database. I’ll first review a sample PDF and finalize the SQLite schema with you, then build a reproducible Python workflow using tools like pdfplumber/Camelot/Tabula for text-based tables and Tesseract/PaddleOCR where image-based cells need OCR. I’ll make sure column order, number formatting, row structure, and linked footnotes are preserved as accurately as possible. Deliverables will include: Final .sqlite database Extraction/OCR script or notebook Agreed schema documentation Accuracy check notes for extracted tables I’ll keep the workflow reusable so you can run the same process again when new PDFs arrive. Cost: ₹10,000 || Timeline: 1-2 days Payment and timeline details can be discussed further to align with your expectations. I’d be happy to help build a clean and reliable PDF-to-SQLite extraction process. Best regards, Oluwatobi Okedairo
₹10,000 INR in 1 day
1.8
1.8

Hello, I can build a reliable, fully reproducible workflow to extract all tables from your PDFs and populate a structured SQLite database with high accuracy. Approach 1. Schema Definition - Review sample PDFs and reporting requirements - Finalize SQLite schema (tables, relations, footnote linkage, formatting rules) 2. Table Extraction - Use Python-based pipeline combining pdfplumber / Camelot / Tabula-py for structured tables - Detect mixed-content cells automatically - Preserve column order and numeric formatting exactly as in source files 3. OCR Processing - Apply Tesseract OCR for image-based cells - Preprocess images (contrast, thresholding, segmentation) for higher accuracy - Map OCR output back to correct rows and columns 4. Footnote Handling - Detect and extract footnotes - Store them in linked relational fields instead of flattening or losing context 5. Database Population - Automated validation checks - Insert cleaned data into custom-designed SQLite database - Maintain consistent structure for downstream reporting tools 6. Reproducible Workflow - Deliver complete Python script or Jupyter Notebook - Configurable folder-based processing for future PDFs - Clear documentation for rerunning the pipeline Deliverables - Final .sqlite database - Extraction + OCR automation script - Setup instructions and usage guide The focus will be precision, repeatability, and long-term scalability rather than one-time conversion.
₹4,000 INR in 3 days
1.6
1.6

I can help you complete this quickly and cleanly by using Python with Camelot and Tesseract to extract tables from PDFs, then populate a custom SQLite database, ensuring accuracy and reproducibility, and deliver a stable database along with the script, so you can rerun the process when new PDFs arrive, happy to discuss further over DM.
₹1,500 INR in 1 day
1.4
1.4

Hi, I can build a reproducible Python workflow to extract PDF tables, OCR image-based cells, and load the cleaned results into a custom SQLite database. I understand the challenge here is preserving table structure, not just pulling rough text. I’ll review sample PDFs first, agree on the SQLite schema, then use the best extraction mix such as pdfplumber, Camelot/Tabula, Tesseract OCR, and custom cleanup logic depending on whether each table is digital, scanned, or mixed. I’ll make sure column order, number formatting, row relationships, and footnotes are preserved correctly, with footnotes stored in linked fields instead of being lost. The final output will be a `.sqlite` database aligned with your reporting tools, plus the script or notebook used so you can rerun it for future PDFs. I can also include validation checks for row counts, missing OCR cells, numeric formats, and extraction confidence so the process is reliable. Best regards Ankit
₹5,000 INR in 1 day
1.0
1.0

You need tables extracted from PDFs into a structured database, but manual data entry kills timelines and introduces errors—especially when image-based cells need OCR. I've handled exactly this workflow multiple times. I'll build an automated Python pipeline using pdfplumber for text extraction, Tesseract for OCR on image cells, and a custom SQLite schema tailored to your table structure. The approach: parse your PDF collection, validate OCR accuracy on sample cells, populate the database, then document the entire workflow so you can rerun it on new PDFs without my involvement. You'll get three deliverables: the finished SQLite database, a fully reproducible Python script, and clear setup instructions. I'm based in Uruguay and deliver fast without cutting corners—I'll test thoroughly before handoff to ensure data integrity. Typical turnaround on projects like this is 3-5 days. Ready to move forward? Share a sample PDF or two so I can test the OCR quality on your specific table types and give you an exact timeline. Let's get this done.
₹6,500 INR in 5 days
0.0
0.0

Mixed text-and-image PDF tables are exactly the kind of pipeline I build. I will use Camelot for the text-mode tables, run the image cells through Tesseract (with deskew + threshold preprocessing so OCR holds up on noisy scans), and write everything into the SQLite schema we agree on up front. Footnotes get captured in a separate linked table keyed to the row they belong to, so nothing gets dropped. Numbers stay typed correctly (no leading-zero loss, no scientific-notation surprises). I will run a sample batch first, send you the output for sign-off on column mapping, then process the rest. Deliverables: the .sqlite file, the Python pipeline, a small validation report flagging any rows where OCR confidence was low, and a README. 3 days from go-ahead. Quick clarifier before I start: roughly how many PDFs in the batch, and do you have a sample I could test the OCR threshold against? Turner
₹6,500 INR in 3 days
0.0
0.0

Delhi, India
Member since Apr 30, 2025
$250-750 USD
₹100-400 INR / hour
₹12500-37500 INR
$250-750 USD
$15-25 USD / hour
₹600-1500 INR
₹100-400 INR / hour
₹750-1250 INR / hour
€100-300 EUR
$250-750 USD
$10-30 USD
₹12500-37500 INR
₹75000-150000 INR
$30-250 NZD
₹1500-12500 INR
₹12500-37500 INR
$30-250 AUD
$10-20 USD
₹750-1250 INR / hour
$30-250 USD