Clustering from an email corpus

There is a corpus of emails exchanged among different senders and receivers within an organization. A server stores all these emails along with their timestamps and sender and recipient names.

When a query string is submitted, the system should compute similarity measures between the entities (individuals in the organization, i.e. senders or receivers) by analyzing the context (the query string) against the email corpus.

The corpus is modeled as an undirected graph whose nodes represent entities within the organization. A link between two nodes represents email correspondence between the two individuals.

The links will be assigned weights proportional to the similarity measures. Each similarity measure quantifies how closely the query being searched (the context) matches the content of the emails exchanged between the two individuals.
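As a rough illustration of the edge weighting described above, here is a minimal Java sketch. The posting does not specify which similarity measure to use, so this example assumes cosine similarity on raw term counts, averaged over all emails exchanged between a pair. The class name `EdgeWeightSketch` and its method names are hypothetical, chosen for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeWeightSketch {

    // Lowercase the text and count alphanumeric tokens (raw term frequency).
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean norm of a sparse term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two term-count vectors; 0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Assumed edge weight: mean similarity of the query to each email
    // exchanged between the two individuals (0 if they exchanged none).
    static double edgeWeight(String query, List<String> emailsBetweenPair) {
        if (emailsBetweenPair.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> q = termCounts(query);
        double total = 0.0;
        for (String body : emailsBetweenPair) {
            total += cosine(q, termCounts(body));
        }
        return total / emailsBetweenPair.size();
    }

    public static void main(String[] args) {
        List<String> emails = List.of(
                "Please review the budget report",
                "Lunch on Friday?");
        System.out.println(edgeWeight("budget report", emails));
    }
}
```

A real prototype would likely replace raw counts with TF-IDF weights or another measure; averaging over emails is only one of several reasonable ways to aggregate per-email scores into a single link weight.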

In addition, entities need to be clustered (grouped) based on the query context against the email corpus.

This is a machine learning and AI problem. Spectral clustering can also be applied. Anyone with an extensive background in AI, data mining, and machine learning is welcome to post a bid. A report on the technique and the underlying statistics/mathematics needs to be developed, along with prototype code. More existing code and a report will be provided to work with.
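To make the spectral clustering idea concrete, the following Java sketch splits the entities into two groups using the sign of an approximate Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). It is a sketch under stated assumptions, not the required method: it assumes a symmetric, non-negative weight matrix with a zero diagonal, uses the unnormalized Laplacian L = D - W, produces only a two-way split, and finds the eigenvector by shifted power iteration rather than a linear algebra library. The class name `SpectralBipartitionSketch` is hypothetical.

```java
public class SpectralBipartitionSketch {

    // Subtract the mean (removes the constant-eigenvector component),
    // then scale to unit length if the vector is non-zero.
    static void removeMeanAndNormalize(double[] v) {
        double mean = 0.0;
        for (double x : v) mean += x;
        mean /= v.length;
        double norm = 0.0;
        for (int i = 0; i < v.length; i++) {
            v[i] -= mean;
            norm += v[i] * v[i];
        }
        norm = Math.sqrt(norm);
        if (norm > 0.0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
    }

    // Returns a 0/1 label per node from the sign of the approximate Fiedler vector.
    static int[] bipartition(double[][] w, int iterations) {
        int n = w.length;
        double[] degree = new double[n];
        double maxDegree = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) degree[i] += w[i][j];
            maxDegree = Math.max(maxDegree, degree[i]);
        }
        // Eigenvalues of L lie in [0, 2 * maxDegree], so M = c*I - L is
        // positive definite and its dominant non-constant eigenvector is
        // the Fiedler vector of L.
        double c = 2.0 * maxDegree + 1.0;

        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = i + 1; // deterministic start vector
        for (int it = 0; it < iterations; it++) {
            removeMeanAndNormalize(v);
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                // (c*I - L) v at row i = (c - d_i) * v_i + sum_j w_ij * v_j
                double sum = (c - degree[i]) * v[i];
                for (int j = 0; j < n; j++) sum += w[i][j] * v[j];
                next[i] = sum;
            }
            v = next;
        }
        removeMeanAndNormalize(v);

        int[] labels = new int[n];
        for (int i = 0; i < n; i++) labels[i] = v[i] >= 0.0 ? 1 : 0;
        return labels;
    }

    public static void main(String[] args) {
        // Two strongly linked pairs (0-1 and 2-3) joined by a weak link (1-2).
        double[][] w = {
                {0.0, 1.0, 0.0, 0.0},
                {1.0, 0.0, 0.1, 0.0},
                {0.0, 0.1, 0.0, 1.0},
                {0.0, 0.0, 1.0, 0.0}
        };
        System.out.println(java.util.Arrays.toString(bipartition(w, 200)));
    }
}
```

In the example graph, nodes 0 and 1 end up in one group and nodes 2 and 3 in the other. A fuller prototype would typically use a normalized Laplacian, take several eigenvectors, and run k-means on them to obtain more than two clusters.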

## Platform

Java, C# (.NET)

