Extraction of templates from heterogeneous web pages using cluster techniques

Most of the information on the WWW is in the form of unstructured text, which makes it hard to query. Template extraction techniques are used to make queries easier and their results more accurate. The techniques that existing systems use to extract data are inefficient and suffer from delays, reduced accuracy, and duplicate data. The proposed system uses a hypergraph technique to extract templates from a large number of web documents generated from heterogeneous templates, with the aim of making web search more efficient in cost, performance, and response time. This paper presents the system analysis and requirements. The design is to be implemented in future work and remains open to changes as required while a prototype of the proposed system is developed.

