Main Page
This is the wiki of the WebDB group in the School of Computing at the University of Utah.
The WebDB group is led by Professor Juliana Freire. Our research spans a number of topics at the intersection of databases, the Web, information retrieval, and machine learning. Our work has been funded by grants and contracts from the Department of Energy, the National Science Foundation, the Army Research Office, an IBM Faculty Award, and the University of Utah Seed Grant program.
- Projects internal
Finding, retrieving, querying and integrating Hidden-Web data
Our goal in this project is to build infrastructure that simplifies the process of locating and organizing Web content hidden behind form interfaces. We have developed a set of techniques and software components, including:
- A hidden-Web crawler that automatically retrieves data behind keyword-based interfaces;
- A focused crawler that efficiently locates sparse concepts on the Web, and which we have used to locate online Web databases and services;
- A classifier ensemble that is able to determine the domain of Web forms with high accuracy;
- A clustering strategy for organizing a large set of Web forms; and
- A learning-based approach for automatically extracting labels from Web forms.
Many of these components have been used to build DeepPeep, a search engine specialized in Web forms. You can download the forms from the current version of DeepPeep here.
