(:
    file:    dt08_189_documentation.xq
    date:    07-Aug-2009
    author:  Gary Lewis
    purpose: Documentation of the process used to create XML data suitable
             to address the question: Is higher education countercyclical?

    1.  Enrollment data is from Table 189 in the 2008 Digest of Education
        Statistics.

    2.  The url for this table is:

            http://nces.ed.gov/programs/digest/d08/tables/dt08_189.asp?referrer=list

    3.  From an open browser pointing to this url, do:

            View > Page Source

        and then:

            File > Save Page As [accept the default dt08_189.asp].

    4.  Edit dt08_189.asp to find the encoding used. It appears as:

    5.  At OS prompt, run this tidy command to clean up the html and
        produce valid xhtml:

            tidy --error-file dt08_189.err --output-file dt08_189.xml --output-xhtml yes --add-xml-decl yes --quote-nbsp no --char-encoding latin1 dt08_189.asp

        Note: latin1 encoding refers to iso-8859-1 in tidy.

    6.  Examine dt08_189.err to make certain it is benign.

    7.  Edit dt08_189.xml to remove the namespace xmlns from the tag.

    8.  Manually create dt08_189_metadata.xml with column headers in
        Table 189. A sample of this xml:

            ...

    9.  Run mfred_1series_observations.xq to produce the unemployment XML
        as fred_UNRATE_observations.xml:

            zorba -o fred_UNRATE_observations.xml -f -q mfred_1series_observations.xq -e series_id:="UNRATE" -z indent=yes

        Note: mfred_1series_observations.xq is available at
        http://garymlewis.com/instchg/public/xquery.

    10. The dates for business cycles are available at:

            http://research.stlouisfed.org/fred2/help-faq/#graph_recessions

        Cut and paste the dates into a text editor. Then modify and save
        the file as fred_us_recessions.xml. A sample of this xml:

            ...

    11. Run dt08_189_enrl.xq as follows to produce the enrollment XML data
        needed for analysis:

            zorba -o dt08_189_enrl.xml -f -q dt08_189_enrl.xq -z indent=yes

    12. Run dt08_189_econ.xq as follows to produce the XML with
        unemployment rates and recession periods for analysis:

            zorba -o dt08_189_econ.xml -f -q dt08_189_econ.xq -z indent=yes

    13. Run dt08_189_final.xq as follows to produce the merged XML with
        enrollment, unemployment, and recession data:

            zorba -o dt08_189_final.xml -f -q dt08_189_final.xq -z indent=yes

    14. Use dt08_189_final.xml and the statistics program R to produce the
        final graphs.
:)
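(:
    Steps 11 to 13 repeat the same zorba invocation with a different base
    name, so they can be wrapped in a small shell helper. What follows is
    a minimal sketch, not part of the original process. The function names
    run_query and run_all and the ZORBA variable are hypothetical,
    introduced here for illustration only. The zorba flags are copied
    unchanged from steps 11 to 13.

```shell
#!/bin/sh
# Sketch only: run_query, run_all and ZORBA are illustrative names,
# not part of the original scripts.
ZORBA="${ZORBA:-zorba}"   # set ZORBA="echo zorba" to preview the commands

# Run <name>.xq and write <name>.xml, using the flags from steps 11 to 13.
run_query() {
    $ZORBA -o "$1.xml" -f -q "$1.xq" -z indent=yes
}

# Run the three queries in the documented order and stop at the first
# failure. Assumption: dt08_189_final.xq depends on the two earlier outputs.
run_all() {
    for name in dt08_189_enrl dt08_189_econ dt08_189_final; do
        run_query "$name" || return 1
    done
}
```

    After loading these definitions into a shell, calling run_all would
    run the three queries in order. Setting ZORBA="echo zorba" first
    prints the commands without executing them, which is a cheap way to
    check the file names before a real run.
:)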