Looking for a Python developer to help me finish a search engine with tf-idf, cosine similarity and a query, WITHOUT libraries such as sklearn

I am looking for a Python developer, preferably an expert in NLP, to help me finish a search engine for one of my college courses. The first part of the code, which is an inverted index, is already done.

Please DO NOT change any parts of the pre-existing code, except for the parts instructed. It is important to keep the posting lists as they are - DO NOT shorten them.

As I only have a limited number of characters, I have added a file that contains a more detailed job description with examples, as well as a screenshot of what the result should look like.

Please read the instructions carefully first and have a look at the screenshot before bidding. It is of great importance to follow the instructions (e.g. NOT using libraries for certain parts).

This task should not be too much trouble for a skilled developer.

Here is the rough outline of what needs to be done:

- The tokens need to be stemmed using snowballstemmer for German. This MUST be done in a separate function; do not stem in the same function in which the tokens are counted. I have noted in the code where to add this part.

Stemming also has to be applied to the queries. So, for example, if you type "eating" into a query (both inverted index AND cosine similarity), anything starting with "eat" should be printed out.
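A minimal sketch of what the separate stemming function could look like. The function names, the lowercasing step and the whitespace split for queries are assumptions for illustration, not part of the existing code. The stemmer is passed in as an argument so that the same function can be used for the corpus and for both queries.

```python
def make_german_stemmer():
    """Create a Snowball stemmer for German (needs `pip install snowballstemmer`)."""
    import snowballstemmer  # imported lazily, only when a real stemmer is requested
    return snowballstemmer.stemmer("german")


def stem_tokens(tokens, stemmer):
    """Return the stemmed form of every token, keeping the original order.

    `stemmer` is any object with a stemWord(str) -> str method,
    for example the one returned by make_german_stemmer().
    Lowercasing before stemming is an assumption.
    """
    return [stemmer.stemWord(token.lower()) for token in tokens]


def stem_query(query, stemmer):
    """Split a raw query string on whitespace and stem it like the corpus."""
    return stem_tokens(query.split(), stemmer)
```

With the real stemmer this would be called as `stem_query("vater sohn", make_german_stemmer())`. Counting the tokens stays in its own function and simply receives the output of `stem_tokens`.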

- tf-idf needs to be calculated. MOST IMPORTANTLY: you CANNOT use any libraries for this. So DO NOT use sklearn, tfidfvectorizer or anything like that.

Each part (tf, idf, tfidf) needs to be calculated in a separate function. I have noted where to add these in the code as well.

If you use a library like tfidfvectorizer, or anything else that does the same, I cannot accept the code.
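To illustrate the three separate functions, here is a rough sketch that uses only the standard library. It assumes raw counts for tf and log(N / df) with the natural logarithm for idf; the detailed description may ask for a different variant. The `docs` layout (document ID mapped to a list of stemmed tokens) and the sample tokens are hypothetical and not taken from the existing inverted index.

```python
import math
from collections import Counter


def compute_tf(tokens):
    """Raw term frequency: how often each term occurs in one document."""
    return dict(Counter(tokens))


def compute_idf(docs):
    """Inverse document frequency, log(N / df), for every term in the corpus.

    `docs` maps a document ID to its list of (stemmed) tokens.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def compute_tfidf(tf, idf):
    """Weight one document's term frequencies by the corpus-wide idf."""
    return {term: freq * idf[term] for term, freq in tf.items()}


# Hypothetical three-document corpus, already tokenized and stemmed.
docs = {0: ["vat", "sohn", "vat"], 1: ["sohn", "haus"], 2: ["haus"]}
idf = compute_idf(docs)
tfidf = {doc_id: compute_tfidf(compute_tf(tokens), idf)
         for doc_id, tokens in docs.items()}
```

A term that occurs in every document gets an idf of 0 under this formula, so it does not contribute to the ranking.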

- Cosine similarity has to be calculated; this also MUST be done in a function, with NO libraries (no sklearn, etc.).

It has to be calculated based on whatever is typed into the query, compared to the texts in the corpus.
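As a sketch of the cosine similarity function without any external library: both the query and a document are represented here as dictionaries from term to weight, which is an assumed representation and not necessarily the one used in the existing code.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors (dicts of term -> weight)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * vec_b[term] for term, weight in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)
```

The result is 0 when the query and the document share no terms, and approaches 1 the more the two weight vectors point in the same direction.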

This query has to be accessed using the main function by typing in "2" in the menu. (menu already implemented; please find the corresponding part in the main function to add the query)

The user should be able to search for words and then see the cosine similarity, tf, idf, and the final tf-idf for the Top N (e.g. Top 10) ranked document names AND document IDs for each result (please view the screenshot for this).

After choosing the option for tf-idf in the menu (menu already implemented, tf-idf is chosen by entering "2"), the overall top 10 results (or any other number) for tf-idf should be printed out first, without a query (no cosine similarity here, as it is used for queries only).

It should look something like this:

Documents: [id: name (|d|)]

0: text1, 1: text2, 2: text3,....

dictionary: [term: idf | (doc: tf), (doc: tf), (doc: tf),...]
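The layout above could be produced with two small formatting helpers such as the following. This is only a sketch: the helper names, the three decimal places for idf and the inclusion of the document length in parentheses are assumptions, and the screenshot remains the reference for the exact format.

```python
def format_documents(doc_names, doc_lengths):
    """Build the 'id: name (|d|)' overview line for all documents."""
    return ", ".join(
        f"{doc_id}: {name} ({doc_lengths[doc_id]})"
        for doc_id, name in sorted(doc_names.items())
    )


def format_dictionary_entry(term, idf, postings):
    """Build one 'term: idf | (doc: tf), (doc: tf), ...' dictionary line.

    `postings` maps a document ID to the term frequency in that document.
    """
    pairs = ", ".join(f"({doc_id}: {tf})" for doc_id, tf in sorted(postings.items()))
    return f"{term}: {idf:.3f} | {pairs}"
```

The posting lists are printed in full here and are not shortened.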

Then it should ask the user to type something into a query. The result should look something like this (using cosine similarity):

Query: food

Top 3 containing the queried word(s):

filename1 (file ID, tf | idf)

filename2 (file ID, tf | idf)

filename3 (file ID, tf | idf)

(please view the screenshot for details, you will understand what I mean)

The user should be able to type in more than just one word, but the texts don't have to contain every single one of the words typed in to appear in the results.

The added screenshot, a commented screenshot, and the more detailed project description will give you more details. Please consult these if you need more information. I have also provided some of the texts I am working with.
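One possible way to get the Top N for a multi-word query is to score only the documents that share at least one term with the query, by walking the posting lists of the query terms. The sketch below assumes tf-idf weights have already been computed; the `index` layout (term mapped to document ID and weight), `doc_norms` and `doc_names` are hypothetical structures and not the ones in the existing code.

```python
import math
from collections import defaultdict


def rank_documents(query_weights, index, doc_norms, doc_names, top_n=10):
    """Return the top_n (name, doc_id, score) tuples by cosine similarity.

    query_weights: term -> tf-idf weight of the (stemmed) query
    index:         term -> {doc_id: tf-idf weight}
    doc_norms:     doc_id -> Euclidean length of the document vector
    doc_names:     doc_id -> file name
    """
    dots = defaultdict(float)
    for term, q_weight in query_weights.items():
        # Terms missing from the index are skipped, so a document does not
        # need to contain every query word to be scored.
        for doc_id, d_weight in index.get(term, {}).items():
            dots[doc_id] += q_weight * d_weight

    query_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    if query_norm == 0:
        return []

    scored = [
        (doc_names[doc_id], doc_id, dot / (query_norm * doc_norms[doc_id]))
        for doc_id, dot in dots.items()
        if doc_norms[doc_id] > 0
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]
```

Because only the posting lists of the query terms are visited and the document norms are computed once in advance, this should stay reasonably fast for a few thousand texts.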

Please note that the code has to be as simple as possible, nothing too hard/fancy. And it should be quite fast as I have to go through almost 4000 texts.

To test the query with the texts I provided, I recommend searching for "vater sohn" and checking whether cosine similarity works.

Skills: Python, Machine Learning (ML), Natural Language, Information Systems Architecture, Vectorization


About the employer:
(2 reviews) Birkenfeld, Germany

Project ID: #26972532

3 freelancers are bidding an average of €190 for this job


Thanks for your posting! I am a machine learning and I am familiar with programming languages such as Python/R/Matlab/Java/C++/Android. I have enriched experiences with machine learning jobs before and I am very inte...

€30 EUR in 3 days
(10 reviews)

Hi, I can help you in your college course! I have been working in Python for quite a lot of time and I can definitely take care of your search engine requirement for you. Here is some work which I have done before 1. ...

€50 EUR in 7 days
(0 reviews)

I have 3+ years of experience as a Python programmer and have worked on several Machine Learning projects mainly targeting the domain of Computer Vision and Digital Image Processing. Get effective Python programming / ...

€490 EUR in 7 days
(0 reviews)