The Gatherer retrieves information resources using a variety of standard access methods (FTP, Gopher, HTTP, News, and local files), and then summarizes those resources in type-specific ways to generate structured indexing information. For example, a Gatherer can retrieve a technical report from an FTP archive and then summarize it by extracting the author, title, and abstract from the paper. Harvest Brokers or other search services can then retrieve the indexing information from the Gatherer and use it to build a searchable index that is available through a WWW interface.
The structured indexing information that the Gatherer collects is represented as a list of attribute-value pairs using the Summary Object Interchange Format (SOIF; see Appendix B). Several example Gatherers are provided with the Harvest software distribution (see Appendix C).
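
As a rough illustration of what a list of attribute-value pairs might look like, the following Python sketch summarizes a small text report and prints the result in a layout modeled on the Summary Object Interchange Format. It is not part of the Harvest distribution. The function names summarize_report and to_soif_like are hypothetical, the rule that the first line is the title and the second is the author is an assumption made only for this example, and the exact record syntax should be checked against Appendix B.

```python
# Illustrative sketch only; not Harvest code. All names here are hypothetical.

def summarize_report(text):
    """Toy type-specific summarizer.

    Assumption for this example: the first non-blank line is the title,
    the second is the author, and the remaining lines form the abstract.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return [
        ("Title", lines[0]),
        ("Author", lines[1]),
        ("Abstract", " ".join(lines[2:])),
    ]


def to_soif_like(url, pairs, template_type="FILE"):
    """Render attribute-value pairs in a SOIF-like layout.

    Assumed layout (verify against Appendix B): a header line with the
    template type and URL, one line per attribute giving its name, the
    size of the value in bytes, and the value, then a closing brace.
    """
    out = ["@%s { %s" % (template_type, url)]
    for name, value in pairs:
        size = len(value.encode("utf-8"))
        out.append("%s{%d}:\t%s" % (name, size, value))
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    report = "A Title\nJane Doe\nFirst line.\nSecond line.\n"
    pairs = summarize_report(report)
    print(to_soif_like("ftp://example.org/report.txt", pairs))
```

A Broker or other search service would consume records of this kind rather than the original documents, which is why the summary keeps only the attributes that are useful for indexing.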