Summer Internship, Harvard University, Cambridge MA

Creating a taxonomy of scholarship at Harvard


Harvard Faculty Finder (HFF) (http://facultyfinder.harvard.edu) is a new website that provides a search and browse interface to all Harvard faculty. It links together databases across Harvard University, including its Human Resources database, the Harvard OnLine Library Information System (HOLLIS), Thomson Reuters Web of Science (WoS), the Harvard Course Catalog, and many others.

To improve the HFF web interface and make HFF a more useful tool for data mining and analytics, we seek a student intern to assist us in an ongoing project to create a taxonomy for HFF. Specific tasks include:

(1) Matching terms across the different taxonomies that already exist for some of HFF's source databases. For example, journal articles in WoS are classified by subject area (e.g., "Chemistry, Organic"), while books in HOLLIS are assigned Library of Congress call numbers (e.g., "QD241-441 Organic Chemistry").

(2) Reviewing computationally generated taxonomies. Using data mining algorithms applied to the content within HFF, we automatically created preliminary discipline-specific taxonomies for Harvard faculty. Manual review is needed to flag inappropriate concepts and keywords and to compare these taxonomies with similar ones developed by certain departments at Harvard.

(3) Identifying errors in HFF's name disambiguation algorithms. HFF automatically attempts to match publications and other content to the correct faculty member. Taxonomies can help uncover possibly incorrect matches, such as an article in a medical journal being matched to a professor in the Music department. Conversely, unexpected matches that turn out to be correct can be used to improve the taxonomies.


To apply, please forward a resume and cover letter to amy_brand@harvard.edu.

 
