

This is the README file for a script called get-mbooks.pl.

get-mbooks.pl is an OAI harvester. It makes a connection to the OAI
data provider at the University of Michigan [1] and requests the set
of public domain Google Books (mbooks:pd) using the marc21 (MARCXML)
metadata schema. As the metadata is downloaded, it is converted into
MARC records in communications format by the MARC::File::SAX handler.
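
The request described above can be sketched at the protocol level.
This is only an illustration, assuming the harvester issues a standard
OAI-PMH ListRecords request; the helper name oai_list_records_url is
hypothetical and is not part of get-mbooks.pl.

```shell
# Base URL of the data provider, taken from footnote [1].
BASE_URL='http://quod.lib.umich.edu/cgi/o/oai/oai'

# oai_list_records_url PREFIX SET
# Print the OAI-PMH ListRecords URL for a metadata prefix and a set.
# (Hypothetical helper for illustration, not part of get-mbooks.pl.)
oai_list_records_url() {
  printf '%s?verb=ListRecords&metadataPrefix=%s&set=%s\n' "$BASE_URL" "$1" "$2"
}

# The prefix and set named in this README.
oai_list_records_url marc21 mbooks:pd
```

A provider normally answers such a request in batches, and each
further batch is requested with the resumptionToken found in the
previous response. The harvester module takes care of that for you.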

The magic of this script lies in MARC::File::SAX. The version used
here is a hack written by Ed Summers against the MARC::File::SAX found
on CPAN. It converts the metadata sent from the provider into "real"
MARC. You will need this hacked version of the module in your Perl
path; a copy is saved in the lib directory of this distribution.

To get get-mbooks.pl to work you will first need Perl. Describing how to
install Perl is beyond the scope of this README. Next you will need the
necessary modules. Installing them is most easily done with cpan, which
you will need to run as root. As root, run cpan and, when prompted,
install Net::OAI::Harvester:

  $ sudo cpan
  cpan> install Net::OAI::Harvester

You will also need the various MARC::Record modules:

  $ sudo cpan
  cpan> install MARC::Record
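
Before going further it may help to confirm that Perl can actually
load the modules just installed. The following is a minimal sketch,
not part of this distribution; the helper name check_module is
hypothetical.

```shell
# check_module NAME
# Report whether Perl can load the named module.
# (Hypothetical helper for illustration, not part of this distribution.)
check_module() {
  if perl -M"$1" -e 1 2>/dev/null; then
    echo "ok      $1"
  else
    echo "missing $1"
  fi
}

# The two modules this README asks you to install from CPAN.
for module in Net::OAI::Harvester MARC::Record; do
  check_module "$module"
done
```

Any module reported as missing can be installed with cpan as
described above.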

When you get this far, and assuming the hacked version of
MARC::File::SAX is saved in the distribution's lib directory, all you
need to do next is run the program:

  $ ./get-mbooks.pl

Downloading the data is not a quick process, and progress will be
echoed in the terminal. At any time after you have gotten some
records you can quit the program (ctrl-c) and use the Perl script
marcdump to see what you have gotten (marcdump <file>).

Fun with OAI, Google Books, and MARC.

[1] http://quod.lib.umich.edu/cgi/o/oai/oai

-- 
Eric Lease Morgan
May 26, 2008 --Memorial Day Observed

