Using the ANC with Xaira

Setting up Xaira and the ANC is a relatively straightforward task. It may look like there are a large number of steps involved, but each step is essentially the same.

These instructions only describe how to set up an elementary index for the ANC. Xaira supports much more complex indexing capabilities. Please refer to the Xaira documentation for more details.

Requirements

It is assumed that you have unpacked the ANC files somewhere onto your hard drive. We will refer to this location as the ANC home directory from now on. At a minimum you will need the "merged" directory and the XML files respStmt.xml and publicationStmt.xml in the ANC home directory.
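
The requirements above can be checked with a few lines of Python before starting. This is only a minimal sketch and not part of the ANC or Xaira distributions: the function name missing_requirements is hypothetical, and the script looks only for the three names mentioned in the paragraph above.

```python
from pathlib import Path

# Names taken from the Requirements paragraph above.
REQUIRED_DIRS = ["merged"]
REQUIRED_FILES = ["respStmt.xml", "publicationStmt.xml"]


def missing_requirements(anc_home):
    """Return the required entries that are absent from anc_home."""
    home = Path(anc_home)
    missing = [d for d in REQUIRED_DIRS if not (home / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (home / f).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    # The ANC home directory is given on the command line (default: current directory).
    home = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = missing_requirements(home)
    if problems:
        print("Missing from", home + ":", ", ".join(problems))
    else:
        print("ANC home directory looks complete.")
```

An empty result means the "merged" directory and both XML files were found. It says nothing about whether the contents of "merged" are complete.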

It is also assumed that you have the latest available build of Xaira installed (v1.10 as of this writing).

Steps

  1. Download the ANCXaira.zip file. The zip archive contains the following files:
    • BulkXsltW.exe : a program for preprocessing the ANC files.
    • anc.xsl : an XSLT style sheet used to preprocess the ANC.
    • bib.xsl.txt : a text file containing the XSL to be pasted into Xaira when creating the bibliography.
    • nyt-headers-fixed.zip : A zip file containing 14 updated/repaired headers for the NY Times files.
    Unzip the ANCXaira.zip file to your ANC home directory. Unzip the nyt-headers-fixed.zip into the ANC home directory as well. The structure of the files in the nyt-headers-fixed.zip file is the same as the directory structure of the merged\nytimes directory, so the headers will unzip to the proper locations if unpacked into the ANC home directory.
  2. Create a new directory called "texts" in your ANC home directory.
  3. Start the BulkXsltW program. If the BulkXsltW program is in the ANC home directory you can accept the default values and simply click the "Transform" button. Otherwise enter the following:
    • For the input directory browse to the ANC home directory and select the "merged" directory.
    • For the XSL file select the "anc.xsl" file you installed above.
    • For the output directory select the "texts" directory you created above.
    • Leave the rest of the fields as they are.
    • Click the "Transform" button.
    • Wait. (It takes approximately 15 minutes on a 1.5 GHz machine.)
    Quit the BulkXslt program when it completes.

    Note: The BulkXslt program is a Java application bundled as an executable and requires that you have Java 1.4 or later installed on your system. Since Xaira is a Windows application I have only provided a Windows executable. I can provide executables for other platforms, a Java Jar file, or the Java source code if needed.
  4. Start Xaira's index tools program.
    1. Select "New" from the "File" menu. You should see the message "(new) New Corpus" in Xaira's window.
    2. Select "Parameter file..." from the Tools menu
      1. Enter anything you want as the name.
      2. Click the Browse button for the "Root" and select your ANC home directory.
      3. Click the Default button.
      4. Click the "Advanced" button.
        • Make sure the "XML Validation" check box is not checked.
        • Click OK
      5. Click Ok.
    3. Select "File list..." from the "Tools" menu. Click the "Generate" button. There should be 11405 files in the new file list. Click Ok.
    4. Select "Make Header" from the "Tools" menu. Click "Ok" if you are asked if you want to make a new header. Go for coffee.
    5. It's probably a good idea to save your work at this point. Select "File -> Save" or click the "Save" icon on the toolbar.
    6. Select "Make Bibliography". Replace the contents of the window with the following:

      <!-- Replace the select path by a path from the root of the document to the bibliography. -->
      <xsl:template match="/" xmlns:x="http://www.xces.org/schema/2003">
        <xaira:bibliography>
          <xsl:apply-templates select="//x:monogr" mode="xces"/>
        </xaira:bibliography>
      </xsl:template>

      <xsl:template match="*" mode="xces">
        <xsl:element name="{local-name()}">
          <xsl:apply-templates select="@*" mode="xces"/>
          <xsl:apply-templates mode="xces"/>
        </xsl:element>
      </xsl:template>

      <xsl:template match="@*|text()" mode="xces">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- Make sure nothing else produces output -->
      <xsl:template match="text()"/>

      Click the "Ok" button and go for another coffee.
      Note: You can cut and paste the above from the bib.xsl.txt file.
    7. Save the corpus again.
    8. Select "Special tags..." from the "Tools" menu. Select "Word break" from the combo box and "tok" in the Tags list. Click Ok.
    9. Select "Additional keys..." from the Tools menu and add the following two keys:
      1. For part of speech
        • Name : POS
        • Description : Part of speech
        • Element : tok
        • Attribute : msd
        • Proc : Use value
      2. For lemmata
        • Name : Lemma
        • Description : Lemma
        • Element : tok
        • Attribute : base
        • Proc : Use value
        • Lemma scheme : checked
      Click Ok to close the Additional keys dialog.
    10. Select "Indexer -> Run" from the "Tools" menu. Wait. Try to time this so you can select Run, turn off the lights, and go home for the night.
    11. Select "XCorpus file..." from the "Tools" menu. Click the OK button.
  5. Close IndexTools.
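
The file handling in steps 1 and 2 (unpacking the two archives and creating the "texts" directory) can also be scripted. The following is a rough sketch under stated assumptions, not a supported tool: it assumes ANCXaira.zip has already been downloaded into the ANC home directory and that its contents sit at the top level of the archive. The function name prepare_anc_home is hypothetical.

```python
import zipfile
from pathlib import Path


def prepare_anc_home(anc_home, archives=("ANCXaira.zip", "nyt-headers-fixed.zip")):
    """Unpack the archives found in anc_home and create the "texts" directory.

    Archives are processed in the order given, so nyt-headers-fixed.zip
    (which is shipped inside ANCXaira.zip) exists by the time it is needed.
    Archives that are not present are skipped.
    Returns the names of the archives that were actually unpacked.
    """
    home = Path(anc_home)
    unpacked = []
    for name in archives:
        archive = home / name
        if archive.is_file():
            with zipfile.ZipFile(archive) as zf:
                # Existing files with the same name are overwritten, which is
                # what replaces the NY Times headers with the repaired ones.
                zf.extractall(home)
            unpacked.append(name)
    # Step 2: the output directory for the transformed files.
    (home / "texts").mkdir(exist_ok=True)
    return unpacked
```

Calling prepare_anc_home with the path of the ANC home directory returns the list of archives it unpacked. If the list is shorter than expected, the corresponding zip file was not found there.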

You should now be able to open and query the ANC with Xaira.
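
If the file list generated in step 4.3 did not contain the expected 11405 files, it may help to count what the transform actually wrote. The sketch below rests on two assumptions that these instructions do not confirm: that the transformed files in "texts" keep the .xml extension, and that the figure in step 4.3 refers to those files. The names EXPECTED_FILES and count_xml_files are hypothetical.

```python
from pathlib import Path

EXPECTED_FILES = 11405  # file count quoted in step 4.3


def count_xml_files(directory):
    """Count the *.xml files under directory, including subdirectories."""
    return sum(1 for p in Path(directory).rglob("*.xml") if p.is_file())


if __name__ == "__main__":
    import sys

    # Pass the path of the "texts" directory on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else "texts"
    found = count_xml_files(target)
    print(found, "XML files found;", EXPECTED_FILES, "expected")
```

A lower count suggests that the transform in step 3 stopped early or was pointed at the wrong input directory.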


Acknowledgements

The ANC acknowledges the following, who have provided software and/or software support for ANC development:
