We’d like to merge a main database with three (possibly four) other databases, matching on company name, for each year from 2005 to 2017.
The names are not identical across all databases.
For example: JPMorgan & company vs. J.P. Morgan investment.
In some cases there is more than one match per company in a given year (for example, one parent company and a few subsidiaries). In other cases there are no matches at all.
If a database has empty years for a company, please fill them in. For example, if database #2 has data for a specific company in 2007 and 2009 but nothing in 2008, copy the 2007 data into 2008, in that database only. Do the same in all such cases.
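To make the gap-filling rule concrete, here is a minimal Python sketch. It assumes each company's records in one database are held as a dict keyed by year; the function name `fill_gap_years` and the sample values are hypothetical. It also assumes that the latest earlier year is carried forward to the end of the range and that years before the first observation stay empty, which the description above does not spell out.

```python
def fill_gap_years(by_year, first_year=2005, last_year=2017):
    """Carry the latest earlier observation forward into empty years.

    by_year: dict mapping year -> record for ONE company in ONE database.
    Years before the first observation are left out (assumption).
    """
    filled = {}
    last_seen = None
    for year in range(first_year, last_year + 1):
        if by_year.get(year) is not None:
            last_seen = by_year[year]      # real data for this year
        if last_seen is not None:
            filled[year] = last_seen       # real or carried-forward data
    return filled


# Hypothetical records for one company in database #2.
db2_company = {
    2007: {"id": "B-17", "name": "J.P. Morgan investment"},
    2009: {"id": "B-17", "name": "JPMorgan Chase"},
}
filled = fill_gap_years(db2_company, 2005, 2010)
# 2008 now repeats the 2007 record; 2005 and 2006 remain absent.
```

The fill is applied per database and per company before any matching, so a carried-forward year never borrows data from another database.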
Each database contains the name of the company and its ID number (a different ID number in each database). The goal is to create a merged database that contains the following columns each year. We attach a theoretical example below:
Company name (as it appears) in main database
Company ID number in main database
Company ID number in database #2
Company ID number in database #3
Company ID number in database #4
(attached example of the desired target file).
Some firms have different names under the same ID in the same year (due to mergers or other reasons). Although the matching is based on the company’s name, we need the ID numbers. If two different names appear in the same database under the same ID number in the same year, treat them as one company with the common ID.
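One possible way to apply this rule is to group the names by (year, ID) before matching, so every alias of a company points to the same ID. The sketch below assumes rows arrive as `(year, company_id, name)` tuples; that layout, the function name `group_names_by_id`, and the sample IDs are illustrative assumptions, not the actual file format.

```python
from collections import defaultdict


def group_names_by_id(rows):
    """Collect all distinct names seen under the same (year, ID).

    rows: iterable of (year, company_id, name) tuples from ONE database.
    Returns a dict {(year, company_id): [name, ...]} in first-seen order.
    """
    grouped = defaultdict(list)
    for year, company_id, name in rows:
        names = grouped[(year, company_id)]
        if name not in names:          # skip exact duplicates
            names.append(name)
    return dict(grouped)


# Hypothetical sample rows.
rows = [
    (2007, "A-1", "JPMorgan & company"),
    (2007, "A-1", "JPMorgan Chase & Co"),
    (2007, "A-2", "Goldman Sachs"),
    (2008, "A-1", "JPMorgan Chase & Co"),
]
aliases = group_names_by_id(rows)
# (2007, "A-1") holds both names and counts as a single company.
```

When matching against another database, any alias in the list can be compared, and a hit on either one maps back to the shared ID.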
1. Sample from one of the source DBs.
2. Example of the desired output file after merging all the DBs.
The challenge in this project is the heuristic/similarity matching of the firms’ names (fuzzy search, spaCy similarity, cosine similarity, in Python, VB, an Excel macro, or any other method).
Project budget: $100.
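As a rough illustration of the fuzzy-matching step, the sketch below uses only Python's standard-library `difflib.SequenceMatcher`. The stop-word list, the 0.85 threshold, and the helper names `normalize`, `similarity`, and `find_matches` are assumptions chosen for the example; a real run would need them tuned on the actual data, and other similarity measures may well work better.

```python
import re
from difflib import SequenceMatcher

# Generic words that carry little identity (assumed list, extend as needed).
STOPWORDS = {"company", "co", "corp", "corporation", "inc", "ltd", "llc",
             "investment", "investments", "group", "holdings"}


def normalize(name):
    """Lowercase, drop dots, turn other punctuation into spaces, remove stop words."""
    text = name.lower().replace(".", "")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(t for t in text.split() if t not in STOPWORDS)


def similarity(a, b):
    """Similarity in [0, 1] between two normalized company names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(name, candidates, threshold=0.85):
    """Return every (candidate, score) at or above the threshold, best first.

    The result may hold several entries (parent and subsidiaries) or none.
    """
    scored = [(c, similarity(name, c)) for c in candidates]
    hits = [(c, s) for c, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


candidates = ["J.P. Morgan investment", "Goldman Sachs Group"]
matches = find_matches("JPMorgan & company", candidates)
no_matches = find_matches("Acme Widgets", candidates)
```

Returning a list instead of a single best hit keeps the two cases from the description visible: several matches in one year, or none at all. Borderline scores are probably best reviewed by hand.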
37 freelancers are bidding an average of $114 for this job
Hi there, I can assist in cleaning up your database and getting it into the output format you want. How many records are there in each database? Regards Reuben
Hello, I have prepared a sample for your review and comment. I can't automate a process but I have a solution for this. Could you please share all your files? Please drop a PM if you’re interested. Best regards, Hoang