
Closed
Posted
Paid on delivery
I need a production-ready search stack that starts with an ETL flow pulling exclusively from our internal PostgreSQL databases. The pipeline must ingest and transform 38,000+ B2B category records and 5,000–10,000 company profiles, then run cleaning, vectorization, and enrichment steps so every record is categorized and stored in a pgvector-enabled schema. Once the data is in place, a separate microservice should expose a REST API that supports hybrid search: dense vectors (OpenAI text-embedding-3-small) combined with BM25 and blended with RRF scoring. Results have to work equally well in Hungarian and English; huspacy, spaCy, and OpenAI are the preferred tools for language handling and any fallback generation.
I expect the codebase in Python 3.10+, organised as two deployable units:
• ETL package that connects to the existing tables, performs the vector and category enrichment, and writes into PostgreSQL/pgvector with idempotent reruns.
• FastAPI microservice offering endpoints for single-query search and batch queries, with Docker files and a short README explaining environment variables and health checks.
Acceptance will be based on end-to-end tests: I run the ETL, hit /search with a Hungarian and an English query, and receive ranked results that include both BM25 and vector hits blended by RRF. Detailed RFQ attached.
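For readers unfamiliar with the RRF blending named in the brief, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not part of the RFQ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings maps a source name (e.g. "bm25", "vector") to a list of IDs
    ordered best-first. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for source, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
            sources[doc_id].append(source)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return [{"id": d, "rrf_score": scores[d], "sources": sources[d]} for d in fused]


# Hypothetical hits from the lexical and the dense retriever.
bm25_hits = ["c12", "c7", "c3"]
vector_hits = ["c7", "c44", "c12"]
results = rrf_fuse({"bm25": bm25_hits, "vector": vector_hits})
for row in results:
    print(row["id"], round(row["rrf_score"], 5), row["sources"])
```

Keeping the `sources` list on each result is one way to show, in the acceptance test, that the final ranking contains both BM25 and vector hits.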
Project ID: 40179591
96 proposals
Remote project
Active 18 days ago
96 freelancers are bidding an average of $7,842 USD for this project

Hello, I will deliver a production-ready bilingual search stack starting from PostgreSQL data. The ETL will ingest 38,000+ category records and 5,000–10,000 company profiles, then clean, vectorize, enrich, and write to a pgvector-enabled schema with idempotent reruns. The Python 3.10+ codebase will have two deployable units: an ETL package and a FastAPI microservice (with Dockerfiles and a README explaining env vars and health checks). The API will offer single and batch queries using hybrid scoring: OpenAI embeddings (text-embedding-3-small) blended with BM25 via RRF, with language handling for Hungarian and English through huspacy and spaCy, plus fallback generation if needed. End-to-end tests will validate results for Hungarian and English queries. I’ll provide a concise deployment guide and testing plan.
1) Which PostgreSQL version are you running, and is pgvector already available in the target environment?
2) Please share the exact schema and data quality details for the 38k category records and 5k-10k company profiles (fields, types, nulls, constraints).
3) Are OpenAI embeddings for all records acceptable, or do you want model-specific tuning? Are there any rate-limit or cost constraints?
4) What are your uptime targets, environment (cloud, k8s vs Docker), and CI/CD expectations?
5) What latency and throughput do you expect for the /search and batch endpoints, and are there any quota limits on API usage?
Best regards,
$10,000 USD in 16 days
8.6

We have successfully completed similar projects, and we're confident in delivering an exceptional Bilingual AI Search Pipeline tailored to your needs. We understand the project requires a robust ETL process to handle and transform extensive B2B records, followed by a microservice to execute hybrid searches using OpenAI embeddings and BM25. Our experience in AI-first product development, particularly with LLMs and RAG systems, aligns perfectly with your requirements. Leveraging our expertise in Python, PostgreSQL, and FastAPI, we will build an efficient, scalable, and maintainable solution. Our customer-centric approach ensures the pipeline integrates seamlessly with your existing architecture while maintaining high performance and accuracy in both Hungarian and English searches. Our portfolio includes implementing complex APIs and data processing pipelines like the one you need, ensuring we deliver uncompromised quality. We look forward to discussing how our top-rated, AI-enabled strategies can drive your project forward. Q: Are there specific constraints or existing dependencies we should consider in the ETL process? Q: Would you like to schedule a call to discuss potential challenges and our approach in more detail? Let's create an intelligent, impactful solution together. We are excited about the possibility of collaborating on this project.
$10,000 USD in 44 days
7.7

Hello, I read your requirements carefully and understand the project scope for the Bilingual AI Search Pipeline (ETL + Search API), and I will execute it in clear, controlled stages. I have 10+ years of experience working with Python, PostgreSQL, NLP, search systems, and AI-driven data pipelines, including multi-language processing and production-grade APIs. I would start with the database schema and index implementation, followed by the bilingual ETL pipeline (language detection, translation, AI enrichment, vector + BM25 generation), and then build the search microservice with language-aware RRF scoring and cross-language fallback, exactly as specified. All components will be Dockerized, tested per the acceptance criteria, and documented for easy deployment and validation. I will provide 2 years of free ongoing support and complete source code. We will work with an agile methodology and will assist you from zero to publishing on stores / production deployment. I am available in your time zone, will provide weekly progress updates, and will work until the deliverables and quality targets are met. I eagerly await your positive response. Thanks
$5,000 USD in 7 days
6.6

Hi, this is Elias from Miami. I read your requirements and I get the outcome: a Python 3.10+ codebase split into two deployable units where ETL pulls only from your internal Postgres, cleans + enriches 38k+ categories and 5–10k companies, generates OpenAI text-embedding-3-small vectors, and writes into a pgvector schema with idempotent reruns. Then a FastAPI service exposes hybrid search (BM25 + dense vectors) blended with RRF, working well in Hungarian + English using huSpaCy/spaCy + OpenAI.
How I’d implement it:
* ETL: incremental + idempotent (upsert + content hash), batching, retryable OpenAI calls, enrichment + categorization, writes vectors + normalized text + language fields
* Retrieval: BM25 + vector query, then RRF merge with tunable weights, returning a single ranked list including source scores
* Service: /search, /batch, /health, Dockerfiles, env-driven config, and an end-to-end test script that proves HU + EN queries work
Q1: Do you require true BM25 inside Postgres (extension like ParadeDB/pg_search), or is “BM25-style lexical” via tsvector acceptable if we match ranking behavior closely?
Q2: What fields should be embedded and searched (name, description, tags, categories), and should HU/EN be embedded separately or combined?
Q3: Do you want ETL to run from scratch only, or also support CDC-like incremental updates (e.g., nightly + on-demand)?
Regards, Elias
$7,500 USD in 7 days
6.4
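The "upsert + content hash" idea in the proposal above could be sketched roughly as follows: hash the normalized text of each record and re-embed only rows whose hash differs from what is already stored. The helper names (`content_hash`, `plan_upserts`), the row shape, and the normalization rules are assumptions for illustration. In a real ETL the resulting rows would presumably be written with a PostgreSQL `INSERT ... ON CONFLICT ... DO UPDATE` statement.

```python
import hashlib
import unicodedata


def content_hash(text: str) -> str:
    """Stable SHA-256 fingerprint of the text after light normalization."""
    normalized = unicodedata.normalize("NFC", text)
    # Collapse whitespace and ignore case so cosmetic edits do not trigger re-embedding.
    normalized = " ".join(normalized.split()).casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def plan_upserts(source_rows, stored_hashes):
    """Return only the rows that are new or whose content changed since the last run.

    stored_hashes maps record ID -> hash written by the previous run (hypothetical
    shape; in practice it would be read from the target table).
    """
    todo = []
    for row in source_rows:
        digest = content_hash(row["text"])
        if stored_hashes.get(row["id"]) != digest:
            todo.append({**row, "content_hash": digest})
    return todo


# Hypothetical source rows: record 2 is unchanged apart from letter case.
rows = [
    {"id": 1, "text": "Építőipari  gépek"},
    {"id": 2, "text": "Office supplies"},
]
stored = {2: content_hash("office supplies")}
pending = plan_upserts(rows, stored)
print([r["id"] for r in pending])
```

Because unchanged rows are skipped, rerunning the ETL on the same data issues no new embedding calls and writes nothing, which is one way to get the idempotent reruns the brief asks for.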

Hi, We’ve built similar AI-driven search solutions that combine multiple data sources and use advanced NLP techniques for language processing. One of our products, Descripio, uses a hybrid search approach to extract product descriptions from Amazon and eBay, enriching them with LLM-generated data to improve search results. For your project, we can leverage our expertise in Python, FastAPI, and Azure/AWS to create a robust, production-ready solution. We also have strong in-house developers for front-end work, ensuring you get a fully integrated product without the hassle of managing multiple resources. Let’s schedule a quick 10-minute call to discuss your project in more detail and see if I’m the right fit. I usually respond within 10 minutes. Best regards, Adil
$7,333.33 USD in 21 days
5.9

As a bilingual software engineer with a strong background in data processing and Python, I am uniquely qualified to tackle your Bilingual AI Search Pipeline project. Having previously worked on diverse projects such as full-stack development, IoT systems design, and cybersecurity solutions, I bring an extensive skill set that directly aligns with your needs. In addition to my language capabilities in English and Arabic, which allow me to understand the nuances of working in multiple languages, my expertise with data processing tools such as spaCy, huspacy, and PostgreSQL makes me an ideal candidate for constructing a pipeline that efficiently transforms and categorizes your B2B category records and company profiles. I'm also familiar with OpenAI text-embedding-3-small and the pgvector-enabled schema. My commitment to quality and meticulous attention to detail ensures that I will deliver the cleanest, most enriched dataset possible in an idempotent manner. Moreover, my experience developing robust REST APIs using FastAPI coupled with my knowledge of Docker will streamline the process of exposing the hybrid search functionality you desire.
$8,333.33 USD in 12 days
5.7

⭐⭐⭐⭐⭐ Hello Valuable Client, Your requirements align closely with what CnELIndia and Raman Ladhani specialize in delivering—production-ready, multilingual search systems built on solid data engineering foundations. We will start by designing a robust Python 3.10+ ETL package that connects securely to your internal PostgreSQL databases, ingests all 38,000+ B2B categories and 5,000–10,000 company profiles, and performs deterministic cleaning, normalization, and enrichment. Using spaCy and huSpaCy, we will handle Hungarian and English language processing, followed by OpenAI text-embedding-3-small vectorization. The enriched data will be stored in a pgvector-enabled schema with full idempotency to support safe reruns. In parallel, we will deliver a Dockerized FastAPI microservice exposing REST endpoints for single and batch search. The service will combine BM25 and dense vector search, blended using Reciprocal Rank Fusion for consistent hybrid relevance. Clear health checks, environment configuration, and end-to-end tests will ensure acceptance criteria are met. CnELIndia provides the engineering rigor, while Raman Ladhani leads architecture, search relevance, and delivery—ensuring the project is completed smoothly, on time, and ready for production.
$7,500 USD in 7 days
5.8

Hello, I understand you’re looking for a production-ready bilingual AI search pipeline that ingests internal PostgreSQL data, performs cleaning, enrichment, vectorization, and exposes a FastAPI microservice for hybrid search. I have extensive experience building ETL pipelines, vector search with pgvector, and REST APIs using FastAPI and Docker. I can design an idempotent ETL flow handling 38,000+ B2B category records and 5,000–10,000 company profiles, performing language-aware preprocessing in English and Hungarian using spaCy, huspacy, and OpenAI embeddings (text-embedding-3-small). The solution will include two deployable Python 3.10+ units: an ETL package that pulls from existing tables, enriches data, computes dense embeddings, and writes to pgvector safely, and a FastAPI microservice providing single-query and batch-search endpoints. Search results will blend dense vector similarity with BM25 using Reciprocal Rank Fusion (RRF) for accurate multilingual retrieval. Dockerized deployments and a concise README with environment variables and health checks will ensure smooth setup and maintenance. I prioritize clean code, bilingual accuracy, and reliable end-to-end search. Acceptance testing will confirm both Hungarian and English queries return properly ranked results with BM25 and vector hits combined. The project will be fully tested, documented, and production-ready, meeting all RFQ requirements and supporting future scalability. Thanks, Asif
$10,000 USD in 11 days
5.6

Hello, I have hands-on experience with projects like this. I have 11+ years of proven experience building AI search systems, ETL pipelines, and multilingual data platforms, and I confidently understand the depth and rigor of your bilingual AI search requirements. The goal is to deliver a scalable, high-accuracy, bilingual AI search infrastructure that combines robust ETL processing with fast, language-aware hybrid search.
-->> Bilingual ETL (Hungarian & English) with hashing, translation & NLP
-->> pgvector + BM25 hybrid search with RRF scoring
-->> FastAPI-based search microservice with auto language detection
-->> Dockerized, test-covered, production-ready stack
I follow clean architecture, strict data quality controls, optimized AI usage, and an agile, test-driven workflow. Let’s connect in chat, as I have a few technical questions around scaling, index tuning, and translation validation before proceeding. Thanks, Julian
$5,100 USD in 7 days
5.9

Hi there, Could you clarify if you already have existing data structures in place for the ETL pipeline? I propose to build a bilingual AI search infrastructure using Python and PostgreSQL, ensuring seamless processing of over 38,000 B2B categories and 5,000-10,000 company profiles. I’ll implement a robust ETL pipeline with language detection, along with a REST API that allows hybrid search capabilities using vector embedding and BM25 scoring. Utilizing OpenAI’s models, I’ll ensure high-quality translations and embeddings to keep technical terms intact. I’ve successfully built similar infrastructures before and I’m confident in delivering this project within your specified timeline. Let’s discuss how I can help further! Best, Badar Madni
$8,000 USD in 30 days
5.1

Hi, I have delivered enterprise Java backend systems remotely, focusing on clean architecture, performance, and reliable API development. I can contribute immediately as a senior Java developer, owning feature delivery end to end while keeping code maintainable and well tested. My experience includes building Spring Boot microservices, designing RESTful APIs, integrating with SQL databases, and optimizing transaction heavy workflows. I work comfortably with modern Java practices, dependency injection, layered architecture, and automated testing with JUnit and Mockito. I also have strong Git based collaboration habits, clear pull request communication, and familiarity with CI/CD pipelines. I am proactive in debugging complex production issues, improving legacy code through refactoring, and delivering stable releases in agile sprint environments. I am ready for long term remote collaboration and can work with a professional, reliable cadence. Best Alex
$8,000 USD in 30 days
5.4

I’m very interested in building your production-ready bilingual search pipeline that ingests, cleans, enriches, and indexes your PostgreSQL B2B and company datasets. I specialize in Python-based ETL, vector search, and API development, and I have experience implementing pgvector-enabled architectures with hybrid search combining dense embeddings and BM25, including RRF scoring for ranking.
For your project, I propose delivering:
ETL package: Connects to your internal PostgreSQL tables, performs vectorization (OpenAI text-embedding-3-small), categorization, and enrichment, and writes to a pgvector-enabled schema with idempotent reruns.
FastAPI microservice: Offers REST endpoints for single and batch queries, supporting Hungarian and English, integrating spaCy/huspacy for language handling, and blending vector + BM25 results via RRF scoring.
Deployment-ready code: Includes Docker files, environment configuration, and a concise README with health checks and usage instructions.
I can provide a robust, end-to-end solution that meets your acceptance tests, ensures reproducibility, and is fully production-ready. I have delivered similar AI search pipelines handling multilingual data, vector search, and hybrid retrieval for enterprise clients. I would be excited to bring this project to life and ensure a scalable, high-performance solution for your team. Best regards, Jiayin
$7,500 USD in 7 days
4.8

Hello, I’m excited about the opportunity to build your production-ready hybrid search stack on PostgreSQL + pgvector. With strong experience in Python ETL pipelines, vector search, and FastAPI microservices, I can deliver a clean two-unit codebase that ingests your internal Postgres data, performs idempotent enrichment/vectorization, and exposes a reliable hybrid search API with BM25 + embeddings blended via RRF. I will implement the ETL to pull and transform your 38k+ categories and 5k–10k company profiles, run cleaning and multilingual handling (Hungarian/English via huSpaCy/spaCy), generate embeddings with text-embedding-3-small, and write into a pgvector-enabled schema designed for reruns and traceability. You can expect a FastAPI service with single and batch search endpoints, Dockerized deployment, clear env/health checks, and end-to-end validation so Hungarian and English queries return ranked results that include both BM25 and dense vector hits. Best regards, Juan
$5,000 USD in 7 days
4.9

Hi, I'm excited to propose for your production-ready search stack project. With over a decade of experience in the field, I can deliver on time and within budget. For this project, I will develop an ETL flow using Python 3.10+ that connects directly to your internal PostgreSQL databases. The pipeline will ingest and transform approximately 38,000 B2B category records and between 5,000 to 10,000 company profiles, performing cleaning, vectorization, and enrichment steps before storing the data in a pgvector-enabled schema. Additionally, I’ll create a FastAPI microservice with endpoints for single-query search and batch queries. The service will support hybrid search using dense vectors (OpenAI text-embedding-3-small), BM25, and blended RRF scoring. Language handling will be managed by huspacy, spaCy, and OpenAI, ensuring results are effective in both Hungarian and English. The codebase will consist of two deployable units: the ETL package for data ingestion and transformation, and a FastAPI microservice with Docker files and a README that explains environment variables and health checks. Acceptance criteria include end-to-end tests where I run the ETL, hit /search with both Hungarian and English queries, and receive ranked results combining BM25 and vector hits. You can check my portfolio on my profile for references. Best regards, Reed
$7,550 USD in 40 days
4.6

Hi, I reviewed your brief and it’s a clean, buildable scope: Postgres-only ETL → pgvector enrichment → FastAPI hybrid search with BM25 + embeddings fused via RRF, working equally in Hungarian and English.
How I’ll execute:
ETL package (Python 3.10+): pull from existing tables, normalize/clean key text fields, language-detect HU/EN, enrich categories, generate OpenAI text-embedding-3-small embeddings in batched jobs with retry + backoff, then idempotent upserts into a pgvector-enabled schema (migrations + rerun-safe checkpoints).
Hybrid retrieval: BM25/lexical (Postgres FTS or BM25 extension if you prefer) + dense vector similarity, then RRF fusion with transparent scoring so you can see BM25 hits + vector hits in the final rank list.
Hungarian + English: huSpaCy/spaCy analyzers for normalization; safe fallbacks when language models miss; optional query expansion only when needed.
FastAPI service: /search and batch endpoint, health checks, env-driven config, Dockerfiles, and an end-to-end test that proves HU + EN queries return blended results.
A couple quick questions to lock scope:
Preferred lexical layer: Postgres FTS only, or are you open to a BM25 extension?
Which text fields should be embedded (name, description, tags, categories)?
Any latency target for /search (p95)?
Deepanshu
$6,000 USD in 7 days
4.2
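The "batched jobs with retry + backoff" step mentioned in the proposal above might look roughly like the sketch below. The embedding call is passed in as `embed_fn` so that no particular client API is assumed; `embed_with_retry`, `flaky_embed`, the batch size, and the delay values are hypothetical. Real code would likely catch only rate-limit and timeout errors instead of every exception.

```python
import time


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_with_retry(embed_fn, texts, batch_size=100, max_attempts=5,
                     base_delay=1.0, sleep=time.sleep):
    """Embed texts batch by batch, retrying a failed batch with exponential backoff.

    embed_fn takes a list of strings and returns one vector per string.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:  # assumption: narrow this to transient errors in practice
                if attempt == max_attempts:
                    raise
                # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    return vectors


# Demo with a fake embedder that fails once, then succeeds.
calls = {"n": 0}


def flaky_embed(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated rate limit")
    return [[float(len(text))] for text in batch]


delays = []  # record requested delays instead of actually sleeping
vectors = embed_with_retry(flaky_embed, ["alma", "apple", "pear"],
                           batch_size=2, sleep=delays.append)
print(vectors, delays)
```

Injecting `sleep` keeps the sketch testable without real waiting; a production version would probably also add jitter to the delay.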

Hi, I can deliver a production-ready Python search stack exactly as described. I’ve built similar ETL + search systems using PostgreSQL + pgvector, OpenAI embeddings, BM25, and RRF blending, with clean separation between data pipelines and FastAPI services.
$8,000 USD in 30 days
3.9

Hello, I have reviewed the details of your project. I will use Python 3.10+ and FastAPI, organizing the work in two deployable units. First, the ETL package will connect to your internal PostgreSQL databases, pull the 38,000+ B2B category records and 5,000–10,000 company profiles, and run cleaning, vectorization, and enrichment. Vectors will be generated using OpenAI text-embedding-3-small and stored in a pgvector-enabled schema. Data transformation will include category mapping, deduplication, and normalization to ensure consistent results on reruns. Idempotency will be handled so repeated runs do not duplicate or corrupt data. Second, the FastAPI microservice will expose endpoints for single-query and batch searches. Hybrid search will combine dense vectors with BM25 scoring and RRF fusion. huspacy will handle Hungarian text and spaCy will handle English text, with fallback generation via OpenAI for missing embeddings. Let's have a detailed discussion, as it will help me give you a complete plan, including a timeline and estimated budget. I will share my portfolio in chat. I look forward to hearing from you. Thanks. Best regards, Mughira
$7,500 USD in 7 days
4.1

Hi, I’m Karthik, a Full-Stack & AI Engineer with 10+ years of experience building production-grade ETL pipelines, vector search systems, and API-driven microservices. I can deliver your bilingual (Hungarian + English) AI search stack exactly as specified, ready for end-to-end acceptance testing.
Proposed Implementation
• Python 3.10+ codebase, cleanly structured and documented
• ETL pipeline pulling only from internal PostgreSQL
• Ingest 38k+ categories & 5k–10k company profiles
• Data cleaning, enrichment, categorisation
• Vectorisation using OpenAI text-embedding-3-small
• Storage in PostgreSQL with pgvector, supporting idempotent re-runs
Search Microservice
• FastAPI REST service
• Hybrid search: BM25 + dense vectors + RRF blending
• Equal-quality results in HU & EN using huspacy, spaCy, OpenAI
• Endpoints for single and batch search
• Dockerised with health checks and clear README
Quality & Acceptance
• Deterministic hybrid ranking (BM25 + vector hits)
• End-to-end tested ETL → /search workflow
• Production-ready deployment artefacts
I’ve built multilingual NLP pipelines, vector search systems, and scalable ETL stacks for B2B platforms. Ready to start immediately and deliver a fully compliant, test-passing solution.
$9,990 USD in 7 days
5.3

⭐ Hello there, My availability is immediate. I read your project post on Python Developer for Bilingual AI Search Pipeline. I am an experienced full-stack Python developer with skill sets in:
- Python, Django, Flask, FastAPI, Jupyter Notebook, Selenium, Data Visualization, ETL
- React, JavaScript, jQuery, TypeScript, NextJS, React Native
- NodeJS, ExpressJS
- Web App Development, Data Science, Web/API Scraping
- API Development, Authentication, Authorization
- SQLAlchemy, PostgreSQL, MySQL, SQLite, SQL Server, Datasets
- Web hosting, Docker, Azure, AWS, GCP, Digital Ocean, GoDaddy
- Python Libraries: NumPy, pandas, scikit-learn, tensorflow, etc.
Please send a message so we can quickly discuss your project and proceed further. I am looking forward to hearing from you. Thanks
$8,500 USD in 30 days
4.2

Hi Eszter, I can build a Python 3.10+ ETL pipeline to ingest your PostgreSQL tables, clean, enrich, and vectorize 38k+ category records and 5k–10k company profiles, storing them in a pgvector-enabled schema with idempotent reruns. For search, I’ll deliver a FastAPI microservice supporting hybrid search with OpenAI embeddings, BM25, and RRF blending, handling Hungarian and English via huspacy/spaCy and fallback strategies. Both ETL and API will be fully Dockerized with health checks and documented environment variables. End-to-end tests with multilingual queries will validate correct ranking. Could you clarify your primary goal (fast relevance, multilingual coverage, or enrichment depth) so I can optimize the pipeline accordingly? Best, Zaman
$8,500 USD in 30 days
3.4

Budapest, Hungary
Joined Jan 6, 2026