Main Page

Finding, retrieving, querying and integrating Hidden-Web data

Our goal in this project is to simplify the process of finding Web content that is hidden behind form interfaces and to enable its exploration. We have developed a new technique for retrieving data behind keyword-based form interfaces that recovers a high percentage of the hidden content in a completely automatic fashion. Another problem we have been exploring is that of finding online Web databases. Given the dynamic nature of the Web, where data sources are constantly changing, it is crucial to discover these resources automatically. However, considering the number of documents on the Web (Google already indexes over 8 billion documents), automatically finding the tens, hundreds or even thousands of forms that are relevant to an integration task is like looking for a few needles in a haystack. We have developed a new Web crawling strategy that is specialized for finding search/query forms and that achieves very high efficiency.

