<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE gcapaper PUBLIC "-//GCA//DTD GCAPAP-X DTD 20021024 Vers 6.2//EN" "../gcapaper.dtd">
<gcapaper id="04-01-03" area="CM" level="HT" prestime="11:00" presdate="Thursday, 8 May" casestudy="1">
   <front>
      <title>Maintaining accessible websites with Microsoft Word and XML</title>
      <keyword>Accessibility</keyword><keyword>Authoring</keyword><keyword>Case Studies</keyword><keyword>Conformance</keyword><keyword>Content conversion</keyword><keyword>E-Government</keyword><keyword>Legacy Data Conversion</keyword><keyword>Markup</keyword><keyword>Validation</keyword><keyword>Document Creation</keyword><keyword>WWW</keyword><keyword>XSLT</keyword><keyword>Content Management</keyword><keyword>Dublin Core</keyword><author>
         <fname>Eoin</fname>
         <surname>Campbell</surname>
         <address>
            <affil>XML Workshop Ltd.</affil>
            <country>Ireland</country>
         </address>
         <bio>
            <para>Eoin Campbell has been working with SGML and XML technology for 13 years, and co-founded XML Workshop Ltd. in 1998. He has spoken at a number of XML Europe conferences over the past 5 years.</para>
         <para>Recently, Eoin has designed a number of products and services that apply XML technology to improve website maintainability and accessibility.</para></bio>
      </author>
   <abstract><para>A key objective of the eEurope Action Plan, one of the strategy documents driving the Information Society at government level, is that public sector websites should be accessible to people with disabilities. Unfortunately, a recent study of Irish websites (<a href="http://eaccess.rince.ie/"/>) found that 94% failed to meet even the most basic Level 1 WAI accessibility guidelines, and the picture across Europe is unlikely to be much better. The goodwill may exist to make websites accessible, but the budget and the tools often may not. Expensive CMSs can be configured to generate WAI-compliant pages, but the most common tools used to create and maintain websites are still HTML and plain-text editors. These editors generally lack support for enforcing accessibility guidelines, although the latest editions of some popular editors are beginning to address this issue.</para>
         <para>This talk discusses how XML can be used to help authors using Microsoft Word to create HTML pages that are fully WAI-compliant, thus lowering the cost and simplifying the task of making websites accessible to all citizens.</para>
         <para>Much of the information on public sector web pages starts life inside a Word document. When the document is finally approved for publication, the process of getting it into HTML format begins. This process is generally manual, slow and error-prone, and results in an inaccessible web page. However, if styles are used properly in the document, then the conversion process can be completely automated, and generate fully WAI-compliant output. </para>
         <para>The use of structured mark-up is not limited to XML documents. Any word-processor that supports styles, such as Title, List, or Heading, can be used to create structured content. Such documents can then be converted into XML format while retaining the structure. Once the content is in XML, it is easy to convert it into HTML, automatically applying any required graphic design and navigation information as well.</para>
         <para>Teaching authors to use styles properly in Word does not require any investment in tools or systems. There are many low-cost Word to XML converters available that can convert these documents into XML and HTML.</para>
         <para>Apart from the obvious benefit of improved accessibility, other benefits include speed of publication, consistency of appearance, and low cost. This approach also allows pages to be created, reviewed and edited offline, unlike many typical Web Content Management Systems, which force all content to be created and edited online in a web form.</para>
      </abstract></front>
   <body>
      
      <para>This paper describes a process for making your website content comply with the accessibility requirements defined in the W3C Web Accessibility Initiative (<acronym.grp><acronym>WAI</acronym><expansion>Web Accessibility Initiative</expansion></acronym.grp>) Guidelines using Microsoft Word as the content editor, and XML as the enabling technology. To implement the process, you need one of the many low-cost Word to XML converters currently available. In the near future, Word 2003 will have the capability to save as XML directly, and no third-party software will be required.</para>
         <para>The paper starts with the larger picture of what accessibility is about, and adopts a very broad definition of what true accessibility entails. It then outlines the general architecture of a Word and XML based website content maintenance solution, and discusses a number of techniques for authoring accessible content in Word. A case study of one Irish public sector website developed to WAI Level 3 compliance follows. Finally, the far from trivial task of migrating current web pages into the maintenance system is considered, and a partially automated approach suggested.</para>
         <section>
         <title>Defining the goal: what is an Accessible page?</title>
         <para>Accessibility is often considered in a narrow sense as the art of making web pages readable by people who are blind or visually impaired. The WAI Guidelines, which categorise each Guideline according to a Priority Level, include only the following requirements for Level 1, the minimum level of accessibility.</para><randlist>
               <li>
                  <para>Include text equivalents for non-text elements, e.g.
<code>img/alt</code> attributes (1.1).</para>
               </li>
               <li>
                  <para>Avoid using only colour to convey information
(2.1).</para>
               </li>
               <li>
                  <para>Identify changes in the text language
(4.1).</para>
               </li>
               <li>
                  <para>Identify row and column headers in data tables
(5.1).</para>
               </li>
               <li>
                  <para>Organize content logically, so it can be read even
without style sheets (6.1).</para>
               </li>
               <li>
                  <para>Avoid causing the screen to flicker
(7.1).</para>
               </li>
               <li>
                  <para>Use clear and simple language (14.1).</para>
               </li>
            </randlist><para>This is a very unambitious target for accessibility, and it would be a pity to adopt WAI Level 1 (also called 'A' compliance) as an acceptable standard. In Ireland, the government has chosen WAI Level 2 ('AA' compliance) as the minimum standard, and this is a reasonable compromise between the needs of readers and cost of compliance. </para><para>True accessibility has a much broader and more ambitious scope than simply compliance with WAI Guidelines, however. Accessibility is about making web pages <i>readable</i>, <i>usable</i> and <i>navigable</i> for everybody, not just people with permanent or temporary physical impairments.</para><randlist>
               <li>
                  <para>
                     <b>Readability</b> involves
making sure that the language used on a website or a page is understandable by
its audience, and that spelling and grammar are correct. It also involves
ensuring that pages can be read properly, regardless of the browser or platform
being used by the reader.</para>
               </li>
               <li>
                  <para>
                     <b>Usability</b> means that
the website offers a convenient and efficient browsing experience to the reader,
and allows them to achieve the primary goal for which they are visiting the site
in the first place. This may be to complete a transaction, or more often, to
simply access a particular piece of information they require.</para>
               </li>
               <li>
                  <para>
                     <b>Navigability</b> means
that the page and the website can be easily traversed, the reader is aware of
where they are and how they got there, and information is organised according to
a logical system that readers can follow.</para>
               </li>
            </randlist><para>Websites should exhibit these characteristics for all readers, regardless of any impairment, physical or environmental, they may have, and regardless of their browsing platform.</para><para>Based on the broadest view of accessibility, we consider that an accessible web page must meet the following criteria.</para><randlist>
               <li>
                  <para>Complies with technical criteria of Level 3 of the
W3C WAI Guidelines</para>
               </li>
               <li>
                  <para>Validates against the HTML 4.01 Strict
DTD</para>
               </li>
               <li>
                  <para>Contains Dublin Core Metadata</para>
               </li>
               <li>
                  <para>Uses <acronym.grp><acronym>CSS</acronym><expansion>Cascading Style Sheets</expansion></acronym.grp> stylesheets for presentation</para>
               </li>
            </randlist><para>These criteria may seem quite demanding, but in fact it is possible to achieve them quite efficiently, simply using Microsoft Word as the authoring tool.</para><para>In addition to the formal technical standards mentioned above, it is also highly desirable that pages use best practice techniques for maximising accessibility, and gradually these techniques are being identified and disseminated in books (see &quot;Building Accessible Websites&quot; <bibref refloc="jclark"/> by Joe Clark and &quot;Constructing Accessible Web Sites&quot; <bibref refloc="jthatcher"/> by Jim Thatcher <i>et al.</i>) and online resources (see <a href="http://www.diveintoaccessibility.org/"/>).</para><subsec1>
            <title>Testing Accessibility</title>
            <para>There is a widespread view that any page that passes the basic Bobby (<a href="http://bobby.watchfire.com/"/>) test is accessible. This is not true. Bobby, or any other automated application, can only examine and report on the HTML <i>mark-up</i> used in a page. Tests can detect the use of inaccessible mark-up, but cannot detect the absence of accessible mark-up. The use of structural mark-up is a requirement for accessibility, so the following mark-up, which presents a heading as a bold paragraph instead of using a heading element such as <code>h1</code>, is inaccessible, but will not be detected by Bobby, or any other testing tool.</para>
            <code.block>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;</code.block>
            <para>You can test a web page and prove that it is <i>not</i> accessible, but you can't prove that it <i>is</i> accessible, without human inspection. Even some mark-up issues that can be tested, such as the use of access keys as keyboard shortcuts, are not actually tested by Bobby, so such tools are quite unsatisfactory at present.</para>
            <para>We believe that accessibility can only be achieved by implementing a publishing process that outputs accessible pages by default. Testing is then merely a Quality Assurance activity to ensure that the production process is in control.</para>
         </subsec1>
      </section>
      <section>
         <title>System architecture</title>
         <para>Many systems have been developed for publishing information on websites, from plain text editing in Notepad at the low end, to Web Content Management Systems (<acronym.grp><acronym>WCMS</acronym><expansion>Web Content Management Systems</expansion></acronym.grp>) at the high end. Most solutions ignore the reality that the information being published is initially generated in a word-processor, most often Microsoft Word. On all but the most expensive CMSs, integration of Word is limited to copy-and-paste facilities for text only, if you're lucky.</para>
         <para>Microsoft CMS 2002, for example, offers close integration of Word XP with the CMS, and it can automatically import Word documents and store them in XML in the database, but at over $40,000 per server license, it may have a limited market.</para>
         <para>If the goal is to <i>economically</i> and <i>efficiently</i> maintain accessible web pages, then much lower cost alternatives must be considered. It is clear that given the ubiquity of Microsoft Word, there are only two categories of tool which qualify: </para>
         <randlist>
            <li>
               <para>Word to HTML converters, and </para>
            </li>
            <li>
               <para>Word to XML converters that can also convert the XML
into HTML.</para>
            </li>
         </randlist>
         <para>We reviewed a limited number of Word to HTML converters on the market, but found none that attempt to generate accessible web pages. The main feature of these tools is that they can split long Word documents into smaller HTML chunks, and apply a common HTML template to every page. This is a big improvement on Word's own Save As HTML option, but nonetheless far from the standard we wish to achieve. In general they simply reproduce all the formatting found in Word, using HTML mark-up, and using the same poor-quality techniques to render content in the same fashion as Word itself. Examples include the use of mark-up tags to apply visual layout, typically <code>blockquote</code> to achieve indentation.</para>
         <para>The process of creating a Word document that can be published as an accessible web page has three stages, each of which is discussed below. </para>
         <seqlist>
            <li>
               <para>Author the content using Microsoft Word, applying
suitable mark-up,</para>
            </li>
            <li>
               <para>convert the Word document into XML, and</para>
            </li>
            <li>
               <para>convert the XML document into HTML.</para>
            </li>
         </seqlist>
         <figure><title>Word to XML to HTML conversion process</title><graphic href="04-01-03-fig01.gif" width="590px" height="341px"/></figure><subsec1>
            <title>Microsoft Word authoring</title>
            <para>It is not enough to convert Word content into XML and then HTML. Unless the original Word document has been carefully marked up using the available built-in styles, the quality of the output will still be poor. Therefore the author needs assistance and support to create structured content. Of the currently available XML converters, only eXportXML and <acronym.grp><acronym>YAWC</acronym><expansion>Yet Another Word Converter</expansion></acronym.grp> include this capability out of the box, by including Word templates. However, it is quite easy to develop your own templates that provide menus, toolbars and shortcut keys to simplify marking up documents.</para>
            <para>The following keyboard shortcuts can be assigned to insert common Word styles. Making these bindings explicit and visible, by providing toolbar and menu equivalents as well, greatly improves the ease-of-use for authors. </para>
            <table border="1" summary="Keyboard shortcuts for common styles">
               <caption>
                  Keyboard shortcuts for common styles
               </caption>
               <colgroup span="1">
                  <col width="50.191571%" span="1"/>
                  <col width="49.808429%" span="1"/>
               </colgroup>
					<thead>
                  <tr valign="top">
                     <td>To insert the Word style… </td>
                     <td>Use the keyboard shortcut…</td>
                  </tr>
					</thead>
               <tbody>
					   <tr valign="top">
                     <td>Title</td>
                     <td>&lt;Ctrl&gt;+T</td>
                  </tr>
                  <tr valign="top">
                     <td>Heading 1</td>
                     <td>&lt;Ctrl&gt;+1</td>
                  </tr>
                  <tr valign="top">
                     <td>Heading 2</td>
                     <td>&lt;Ctrl&gt;+2</td>
                  </tr>
                  <tr valign="top">
                     <td>Heading 3</td>
                     <td>&lt;Ctrl&gt;+3</td>
                  </tr>
                  <tr valign="top">
                     <td>List Bullet</td>
                     <td>&lt;Ctrl&gt;+8</td>
                  </tr>
                  <tr valign="top">
                     <td>List Number</td>
                     <td>&lt;Ctrl&gt;+9</td>
                  </tr>
                  <tr valign="top">
                     <td>Normal</td>
                     <td>&lt;Ctrl&gt;+0</td>
                  </tr>
               </tbody>
            </table>
            <para>The styles listed above are all built into Word, so there is no need to create them. There are a number of commonly occurring mark-up constructs in HTML that have no equivalent in Word, and these should be added as user-defined styles. The paragraph styles Table Title and Table Summary are essential for accessibility (mapping to the <code>caption</code> element and <code>table/summary</code> attribute), while other styles to map to the <code>blockquote</code> and <code>pre</code> elements may be required in some instances. The character styles Abbreviation and Acronym should be included if you wish to aim for Level 3 Accessibility.</para>
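            <para>As a minimal sketch of how the two table styles might be carried through to HTML, the XSLT 1.0 stylesheet below assumes a hypothetical intermediate XML vocabulary in which each Word paragraph becomes a <code>p</code> element whose <code>class</code> attribute holds the style name (<code>TableTitle</code>, <code>TableSummary</code>), and in which the title and summary paragraphs immediately precede the table, in that order. Real converters name and place these elements differently, so the match patterns would need to be adapted.</para>
            <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match=&quot;table&quot;&gt;
    &lt;!-- Summary paragraph directly before the table, if any --&gt;
    &lt;xsl:variable name=&quot;summary&quot;
        select=&quot;preceding-sibling::*[1][@class='TableSummary']&quot;/&gt;
    &lt;!-- Title paragraph directly before the summary, or before the table --&gt;
    &lt;xsl:variable name=&quot;title&quot;
        select=&quot;($summary | .)/preceding-sibling::*[1][@class='TableTitle']&quot;/&gt;
    &lt;table&gt;
      &lt;xsl:copy-of select=&quot;@*&quot;/&gt;
      &lt;xsl:if test=&quot;$summary&quot;&gt;
        &lt;xsl:attribute name=&quot;summary&quot;&gt;
          &lt;xsl:value-of select=&quot;normalize-space($summary)&quot;/&gt;
        &lt;/xsl:attribute&gt;
      &lt;/xsl:if&gt;
      &lt;xsl:if test=&quot;$title&quot;&gt;
        &lt;caption&gt;&lt;xsl:value-of select=&quot;normalize-space($title)&quot;/&gt;&lt;/caption&gt;
      &lt;/xsl:if&gt;
      &lt;xsl:apply-templates select=&quot;node()&quot;/&gt;
    &lt;/table&gt;
  &lt;/xsl:template&gt;

  &lt;!-- The helper paragraphs are consumed above, so drop them from the flow --&gt;
  &lt;xsl:template match=&quot;p[@class='TableTitle' or @class='TableSummary']&quot;/&gt;

&lt;/xsl:stylesheet&gt;</code.block>
            <para>A table with neither paragraph in front of it is copied without a <code>caption</code> or <code>summary</code>, which a later quality check could flag for the author.</para>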
            <subsec2>
               <title>Images</title>
               <para>When images are required, authors can use screenshots and bitmapped images, even if the tools to create web-compatible images are unavailable. This is because most Word to XML converters will automatically convert images to web-compatible formats such as GIF, JPEG or PNG. Users of Word 97 on Windows 98 will find this particularly useful, because they have no other way of converting images into web-compatible formats: Microsoft Paint supports only BMP on Windows 98, although it supports GIF and JPEG on Windows 2000.</para>
               <para>When inserting images, authors must be encouraged or forced to fill in a caption, so that when the page is converted, the resulting <code>img/alt</code> attribute can be automatically filled in with sensible text. Word 2000 can be configured to automatically add a Caption style immediately after an image is inserted, to prompt the author to fill in text.</para>
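               <para>The caption-to-<code>alt</code> step could look something like the following XSLT 1.0 sketch. It assumes, hypothetically, that the converter emits each image as an <code>img</code> element with a <code>src</code> attribute, immediately followed by a sibling <code>p</code> element with <code>class=&quot;Caption&quot;</code>; the element and attribute names are illustrative only.</para>
               <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Use the caption paragraph that directly follows the image as alt text --&gt;
  &lt;xsl:template match=&quot;img&quot;&gt;
    &lt;img src=&quot;{@src}&quot;
         alt=&quot;{normalize-space(following-sibling::*[1][@class='Caption'])}&quot;/&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>
               <para>If no caption follows the image, the <code>alt</code> attribute comes out empty, so a check for empty values is a simple way to find images whose captions were never filled in.</para>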
            </subsec2>
            <subsec2>
               <title>Metadata</title>
               <para>The use of metadata is a requirement for accessibility, and the standard for web page metadata is the Dublin Core Metadata Element Set (<a href="http://www.dublincore.org/"/>). Metadata doesn't improve accessibility directly, but it does improve findability, as search engines can index the metadata on a site, enabling information to be found more easily.</para>
               <para>It is quite easy to create a dialog box using Word's VBA editor to enable authors to fill in metadata fields. Many of the fields are fixed for a particular website, and others can be automatically deduced from the document itself. For example, the DC.Title field can be extracted from the document title, and the DC.Date.modified field from the date of conversion. Even the value of the DC.Identifier field can be at least partially if not fully deduced by the conversion software. The figure below shows how a metadata dialog box can look.</para>
               
               <figure><title>Dublin Core Metadata dialog box</title><graphic width="372px" height="401px" href="04-01-03-fig02.jpg"/></figure><para>You should configure the search engine on your website to use this metadata to improve the quality of search results. The Open Source search engine Swish-E can be configured in this way. The date fields are particularly useful for sorting results chronologically, and the Type field, if used, can also assist users in limiting results to relevant pages, e.g. Press Releases.</para>
               <para>Unfortunately, the major search engines such as Google and Atomz do not index Dublin Core metadata elements, so using this metadata doesn't help users find your website among all the others. However, as part of the XML to HTML conversion process, you could duplicate metadata terms so that the metadata used by Internet search engines is also included.</para>
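               <para>One way to do this duplication is sketched below in XSLT 1.0. It assumes the Dublin Core terms are already stored as <code>meta</code> elements with <code>name</code> and <code>content</code> attributes, and it copies DC.Description and DC.Subject into the conventional <code>description</code> and <code>keywords</code> meta elements. Which names a given search engine actually reads is an assumption to verify for your own site.</para>
               <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Keep the Dublin Core element and add a conventional equivalent --&gt;
  &lt;xsl:template match=&quot;meta[@name='DC.Description']&quot;&gt;
    &lt;xsl:copy-of select=&quot;.&quot;/&gt;
    &lt;meta name=&quot;description&quot; content=&quot;{@content}&quot;/&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match=&quot;meta[@name='DC.Subject']&quot;&gt;
    &lt;xsl:copy-of select=&quot;.&quot;/&gt;
    &lt;meta name=&quot;keywords&quot; content=&quot;{@content}&quot;/&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>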
            </subsec2>
         </subsec1>
         <subsec1>
            <title>Word to XML conversion</title>
            <para>There are many low-cost Word to XML converters available in the market, and a number of them support a two-stage conversion process, first from Word into XML, and secondly from XML into HTML, using the XSLT stylesheet language. Because XML information is structured, it is easily manipulated, so converting XML into accessible HTML is achievable. You do have to write the XSLT stylesheet yourself, but this is a once-off task, and allows you to clean up the initial XML mark-up when converting to HTML. The following stand-alone tools support this two-stage process by default.</para>
            <randlist>
               <li>
                  <para>UpCast,
Infinity Loop
	(<a href="http://www.infinity-loop.de/"/>)
    (Germany)</para>
               </li>
               <li>
                  <para>eXportXML,
Schultz
	(<a href="http://www.schultz.dk/"/>)
    (Denmark)</para>
               </li>
               <li>
                  <para>YAWC Pro,
XML Workshop Ltd.
	(<a href="http://www.xmlw.ie/"/>)
    (Ireland)</para>
               </li>
               <li>
                  <para>W2XML, docsoft
	(<a href="http://www.docsoft.com/"/>)
    (US)</para>
               </li>
            </randlist>
            <para>However, there are many other XML converters available, and most of them could be integrated into a system that achieves the same result of a two-stage process, with a little bit of work.</para>
            <para>A more recent development is the real-time online Word document converter, which uses the Application Service Provider (ASP) model to offer a service to clients who may prefer not to install and support conversion software on multiple PCs.</para>
            <randlist>
               <li>
                  <para>In New Zealand,
3months.com
	(<a href="http://www.3months.com/"/>)
    offer a service to government
departments to convert Word documents into multiple web pages.</para>
               </li>
               <li>
                  <para>In the US,
Metaverse
	(<a href="http://www.metaverse.cc/"/>)
    offer a Word to XML conversion
service, which can be integrated as part of a larger CMS.</para>
               </li>
               <li>
                  <para>In Ireland, my company,
XML Workshop Ltd.
	(<a href="http://www.xmlw.ie/"/>), offers an online Word to
HTML conversion and publishing service at
YAWC Online
	(<a href="http://www.yawconline.com/"/>).</para><para>This service implements the process described in this paper.</para>
               </li>
            </randlist>
            <para>Given the universal nature of the output standards, in principle a single service could handle all countries, but in practice, there are some national variations in the area of metadata usage, so custom solutions for each jurisdiction are still desirable.</para>
            <para>When converting Word into XML, the question arises of which XML vocabulary to use. There are plenty of candidates, including DocBook, XHTML 1.0, whatever default XML output is supported by the chosen conversion tool, or even a schema or DTD you define yourself.</para>
            <para>We recommend that you do not use a schema particular to a single tool, for two reasons. Firstly, it ties you unnecessarily to a single vendor: if you want to change to a different tool later, you will need to rewrite your conversion code. Secondly, the typical schema of such tools is presentation- rather than structure-oriented, and is really just RTF with angle brackets instead of curly braces. The mark-up is much more detailed than you need. For example, font size, colour and family are all marked up in XML, and you don't need or want this level of detail.</para>
            <para>Since HTML 4.01 Strict is the target, the closest XML-compatible equivalent is XHTML 1.0 Strict. We defined a subset of this DTD by stripping out the bits we didn't need or like (forms, character entity declarations) and adding in bits we wanted, such as Dublin Core metadata elements and hierarchically nested sections. We have called the result WebDoc (<a href="http://www.xmlw.ie/webdoc/"/>), and made it publicly available on our website. This DTD turns out to be quite like the draft XHTML 2.0 schema being worked on by the W3C.</para>
            <para>XML documents can be maintained for each Word document, and can be very useful. For example, if you would like to change the graphic design of an entire site, it is quite easy to do so by changing the main HTML template used for the design, and then regenerating all of the web pages from the XML copies of each file.</para>
            <para>If you generate an XML file that will be stored and possibly re-used, then in fact you will generally need to carry out two XSLT transformations rather than one. The first transformation converts the initial XML output from your chosen Word converter into XML according to your preferred DTD. The second transformation converts from your DTD into HTML.</para>
            <para>The first transformation can also do useful things with the metadata, for example resolving automated references. Rather than force authors to explicitly enter values for each metadata field, you can automatically assign some values from the document itself, or the date of conversion. The DC.Title field can be extracted from the document title, the DC.Date.modified field can be taken from the date of conversion, and the DC.Identifier field can be taken from the file name, and possibly the directory path.</para>
            <para>You can insert a placeholder keyword (e.g. '{auto}') into metadata fields that can be automatically derived, when creating your Word template. A big advantage of this approach is that many authors copy existing documents to create new ones. Using keywords means that this can be done safely, and the author doesn't have to remember to edit the metadata. The less manually entered metadata there is, the easier for authors, and the less room for error and omission.</para>
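            <para>The placeholder substitution could be implemented in the first transformation along the following lines. This XSLT 1.0 sketch assumes metadata is held in <code>meta</code> elements, that the document title is a <code>p</code> element with <code>class=&quot;Title&quot;</code>, and that the calling software passes in two parameters, here given the hypothetical names <code>conversion-date</code> and <code>page-url</code> (XSLT 1.0 has no built-in way to obtain the current date or the file name).</para>
            <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Values supplied by the conversion software (hypothetical names) --&gt;
  &lt;xsl:param name=&quot;conversion-date&quot;/&gt;
  &lt;xsl:param name=&quot;page-url&quot;/&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Replace the placeholder with a value derived at conversion time --&gt;
  &lt;xsl:template match=&quot;meta[@content='{auto}']&quot;&gt;
    &lt;meta name=&quot;{@name}&quot;&gt;
      &lt;xsl:attribute name=&quot;content&quot;&gt;
        &lt;xsl:choose&gt;
          &lt;xsl:when test=&quot;@name='DC.Title'&quot;&gt;
            &lt;xsl:value-of select=&quot;normalize-space((//p[@class='Title'])[1])&quot;/&gt;
          &lt;/xsl:when&gt;
          &lt;xsl:when test=&quot;@name='DC.Date.modified'&quot;&gt;
            &lt;xsl:value-of select=&quot;$conversion-date&quot;/&gt;
          &lt;/xsl:when&gt;
          &lt;xsl:when test=&quot;@name='DC.Identifier'&quot;&gt;
            &lt;xsl:value-of select=&quot;$page-url&quot;/&gt;
          &lt;/xsl:when&gt;
          &lt;!-- Unknown field: leave the placeholder visible for review --&gt;
          &lt;xsl:otherwise&gt;&lt;xsl:value-of select=&quot;@content&quot;/&gt;&lt;/xsl:otherwise&gt;
        &lt;/xsl:choose&gt;
      &lt;/xsl:attribute&gt;
    &lt;/meta&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>
            <para>Fields that the author has filled in explicitly do not match the placeholder pattern, so they pass through untouched.</para>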
         </subsec1>
         <subsec1>
            <title>XML to HTML conversion</title>
            <para>Accessibility guidelines suggest that the latest version of any W3C Recommendation should be used when deciding which mark-up language to use. This would mean using XHTML 1.0 for your web pages. However, we recommend HTML 4.01 Strict instead, because it is compatible with all browsers and versions. Amazingly, even Microsoft Internet Explorer 6.0 does not fully support XHTML 1.0, so there is no real benefit to be gained from generating XML- rather than SGML-compatible HTML.</para>
            <para>All XML to HTML conversion should be done using XSLT. XSLT is the most widely used language for transforming XML into HTML, and XSLT processors are supported in most browsers, web servers, and relational databases, in addition to standalone tools.</para>
            <para>In converting the XML generated from Word into HTML, a number of different tasks must be carried out. Firstly, an appropriate XHTML template, containing the target website's look and feel, must be wrapped around the content of the page. The template must be in XHTML rather than HTML, because an XSLT processor can only read well-formed XML.</para>
            <para>The template can be designed in any of three ways. The simplest approach is to use absolute paths for all images, hyperlinks, stylesheets etc., allowing a single HTML template to be used for the whole website. However, this approach has a major disadvantage, which is that the resulting web pages will probably not look or behave properly on a local copy of the website, since all the links are absolute.</para>
            <para>A second approach is to use relative paths, and maintain a different template for each level in the site, but this makes template maintenance more complex.</para>
            <para>A third approach is to use a single HTML template with relative paths, but dynamically calculate the depth of the page from the value of the DC.Identifier metadata field, which contains the complete URL of the page. The XSLT transformation can then prefix each link with the appropriate number of &quot;../&quot; path segments. For example, a page with the URL www.domain.gov.ie/press/2003/02/20030211.htm is at level 3 in the directory hierarchy (count the directories between the domain name and the file name, here press, 2003 and 02), and so each link in the template should point 3 levels up. The home page link should contain &quot;../../../index.htm&quot;. On level 1, the home page link would contain &quot;../index.htm&quot;.</para>
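            <para>The depth calculation can be sketched in XSLT 1.0 with a small recursive named template. The stylesheet below assumes, hypothetically, that the source document is XHTML-like with the page URL stored in <code>/html/head/meta[@name='DC.Identifier']/@content</code>; the template name <code>up</code> and the variable names are illustrative. It emits one &quot;../&quot; for each directory in the path and, to keep the example short, outputs only the home page link.</para>
            <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
  &lt;xsl:output method=&quot;html&quot;/&gt;

  &lt;!-- Complete URL of the page, with or without a leading scheme --&gt;
  &lt;xsl:variable name=&quot;url&quot;
      select=&quot;/html/head/meta[@name='DC.Identifier']/@content&quot;/&gt;

  &lt;!-- Drop a scheme such as http:// if one is present --&gt;
  &lt;xsl:variable name=&quot;no-scheme&quot;&gt;
    &lt;xsl:choose&gt;
      &lt;xsl:when test=&quot;contains($url, '://')&quot;&gt;
        &lt;xsl:value-of select=&quot;substring-after($url, '://')&quot;/&gt;
      &lt;/xsl:when&gt;
      &lt;xsl:otherwise&gt;&lt;xsl:value-of select=&quot;$url&quot;/&gt;&lt;/xsl:otherwise&gt;
    &lt;/xsl:choose&gt;
  &lt;/xsl:variable&gt;

  &lt;!-- Path after the domain name, e.g. press/2003/02/20030211.htm --&gt;
  &lt;xsl:variable name=&quot;path&quot; select=&quot;substring-after($no-scheme, '/')&quot;/&gt;

  &lt;!-- Emit one ../ for each directory that precedes the file name --&gt;
  &lt;xsl:template name=&quot;up&quot;&gt;
    &lt;xsl:param name=&quot;path&quot;/&gt;
    &lt;xsl:if test=&quot;contains($path, '/')&quot;&gt;
      &lt;xsl:text&gt;../&lt;/xsl:text&gt;
      &lt;xsl:call-template name=&quot;up&quot;&gt;
        &lt;xsl:with-param name=&quot;path&quot; select=&quot;substring-after($path, '/')&quot;/&gt;
      &lt;/xsl:call-template&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Prefix that leads from this page back to the site root --&gt;
  &lt;xsl:variable name=&quot;root&quot;&gt;
    &lt;xsl:call-template name=&quot;up&quot;&gt;
      &lt;xsl:with-param name=&quot;path&quot; select=&quot;$path&quot;/&gt;
    &lt;/xsl:call-template&gt;
  &lt;/xsl:variable&gt;

  &lt;xsl:template match=&quot;/&quot;&gt;
    &lt;a href=&quot;{$root}index.htm&quot;&gt;Home&lt;/a&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>
            <para>For the example URL above, the prefix works out as &quot;../../../&quot;, giving a home page link of &quot;../../../index.htm&quot;. In a full stylesheet the same <code>root</code> variable would be prefixed to every image, stylesheet and navigation link in the template.</para>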
            <para>The benefit of this more complex code is that each generated HTML page will contain links that work both locally on the file system, and remotely on the live website. This is important not just for navigation purposes, but also for appearance, as any images and stylesheets used in the design are also referenced by links. A navigable local copy of the website can be maintained as a set of HTML files, which is useful if you don't have a permanent Internet connection, want to minimise bandwidth usage, run tests, or don't want to install a local webserver just in order to navigate a local copy of the site.</para>
            <para>A second task in the conversion process is to handle metadata. This can be easily transposed from the XML file, assuming it has been properly stored. </para>
            <para>The value of the <code>head/title</code> element must be filled in. This is an important value, because it is stored in the browser history list, and also used when pages are bookmarked. Inserting a default value for this element makes navigation much more difficult for everybody. It is fairly easy to add the document title to this field, and it is even better if you can add an orientation indicator of where on the website the page belongs, e.g. the Press or Publications area. This can be done in a number of ways. For example, you could examine the DC.Type field for clues, or keep other metadata in the document specifically for this purpose.</para>
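            <para>As one possible illustration of the DC.Type approach, the XSLT 1.0 sketch below prefixes the page title with the value of the DC.Type field when one is present. It assumes a hypothetical XHTML-like source in which the document title is already in <code>head/title</code> and the metadata is in sibling <code>meta</code> elements; the separator and the ordering of the two parts are a matter of taste.</para>
            <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Build the title from the DC.Type value (if any) and the document title --&gt;
  &lt;xsl:template match=&quot;head/title&quot;&gt;
    &lt;xsl:variable name=&quot;type&quot; select=&quot;../meta[@name='DC.Type']/@content&quot;/&gt;
    &lt;title&gt;
      &lt;xsl:if test=&quot;string($type)&quot;&gt;
        &lt;xsl:value-of select=&quot;$type&quot;/&gt;
        &lt;xsl:text&gt;: &lt;/xsl:text&gt;
      &lt;/xsl:if&gt;
      &lt;xsl:value-of select=&quot;normalize-space(.)&quot;/&gt;
    &lt;/title&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>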
            <para>Another task is providing explanations of acronyms and abbreviations, using the title attribute of these elements (e.g. <code>&lt;acronym title=&quot;Extensible Markup Language&quot;&gt;XML&lt;/acronym&gt;</code>). We believe the best way to achieve this is to maintain a separate file for mapping acronyms to their full expansion. (It is quite difficult to embed the expansion into the Word file, for example using a bookmark, index marker, or comment text.) When converting into HTML, this file can be consulted automatically for each acronym in the document. If the acronym is found, the title attribute can be filled in with its expanded form, and included in the final HTML.</para>
            <para>This is a bit of extra work, but worth it if your site makes frequent use of acronyms. The file used to contain the mappings is really just a Glossary, and should probably be maintained as a web page in its own right, anyway.</para>
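            <para>The lookup can be done with the standard XSLT <code>document()</code> function. The sketch below assumes a hypothetical mapping file called <code>glossary.xml</code>, stored beside the stylesheet, whose structure (described in the comment) is our own invention for this example; it also assumes acronyms are already marked up as <code>acronym</code> elements in the source.</para>
            <code.block>&lt;xsl:stylesheet version=&quot;1.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

  &lt;!-- Assumed structure of glossary.xml:
       glossary / entry / (term, expansion), for example
       term = XML, expansion = Extensible Markup Language --&gt;
  &lt;xsl:variable name=&quot;glossary&quot;
      select=&quot;document('glossary.xml')/glossary&quot;/&gt;

  &lt;!-- Copy everything not handled below unchanged --&gt;
  &lt;xsl:template match=&quot;@*|node()&quot;&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;@*|node()&quot;/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Add a title attribute when the acronym is listed in the glossary --&gt;
  &lt;xsl:template match=&quot;acronym&quot;&gt;
    &lt;xsl:variable name=&quot;term&quot; select=&quot;normalize-space(.)&quot;/&gt;
    &lt;xsl:variable name=&quot;expansion&quot;
        select=&quot;$glossary/entry[term = $term]/expansion&quot;/&gt;
    &lt;acronym&gt;
      &lt;xsl:if test=&quot;$expansion&quot;&gt;
        &lt;xsl:attribute name=&quot;title&quot;&gt;
          &lt;xsl:value-of select=&quot;normalize-space($expansion)&quot;/&gt;
        &lt;/xsl:attribute&gt;
      &lt;/xsl:if&gt;
      &lt;xsl:apply-templates/&gt;
    &lt;/acronym&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code.block>
            <para>Acronyms missing from the file are output without a title attribute, so they remain easy to find and add to the glossary later.</para>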
            <para>If you do use acronyms, you should also define a visual cue for the HTML presentation, so that readers can recognise the acronym text. The default presentation of HTML does not distinguish acronyms or abbreviations visually at all. A dotted underline seems appropriate; since the CSS2 text-decoration property has no dotted value, a bottom border can be used instead (in CSS: <code>acronym { border-bottom: 1px dotted; }</code>). Some browsers will display the expansion as popup text when you move the mouse over the acronym, and others display it in the status bar. Screen readers such as JAWS or Home Page Reader will speak the expansion text.</para>
         </subsec1>
      </section>
      <section>
         <title>Case study: Dublin City Council Disability Services</title>
         <para>Dublin City Council
	(<a href="http://www.dublincity.ie/"/>)
    developed a handbook containing details of services available for people with disabilities. These services range from wheelchair-accessible libraries to postal voting to sign-language services. They wished to publish the handbook online on their website, and make it as accessible as possible, including complying with Level 3 of the WAI Accessibility Guidelines.</para>
         <para>Achieving this goal involved three different components: HTML template page design, additional information, and content mark-up.</para>
         <randlist>
            <li>
               <para>
                  <b>HTML template:</b> we
designed a special HTML template that maximised the navigability of the subsite
for people with disabilities, by including extra navigation aids, such as links
to skip past navigation links to the content, and shortcut keys to support
mouse-free navigation. A left navigation bar with links to each page on the site
was also added. The readability of the subsite was enhanced through use of
strong colour contrasts and relative font sizes for text, so that it could be
easily read by those with low vision, and magnified within the
browser.</para>
            </li>
            <li>
               <para>
                  <b>Additional
information:</b> Accessibility requires a site map page
with a comprehensive listing of all areas of the site. We created this page
based on the different sections and subsections of the website, and included
hyperlinks to link directly to each subsection for speedy navigation. We also
added an accessibility statement, which provides details about the accessibility
features of the site, including shortcut keys, standards compliance, the visual
design, and metadata.</para>
            </li>
            <li>
               <para>
                  <b>Content mark-up:</b> the
text content of the entire subsite was marked up using Microsoft Word. The
initial text was created as a single 20-page file, so this was broken into a
number of files, to keep each page short, minimise download times, and speed
navigation. A hyperlinked mini table of contents was added to the top of each
page, to provide a quick overview of the contents, and enable readers to jump
directly to their area of interest. Changes in the language (some text was in
Irish) were also marked up, as this is important for screen readers. Identifying
the language of text avoids screen readers attempting to pronounce text in a
language they don't support.</para>
            </li>
         </randlist>
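         <para>To illustrate the mini table of contents and the language mark-up mentioned in the last item, here is a small Python sketch. It is not the code used for the subsite: the function name and the input format (a list of anchor, text and language tuples) are assumptions made for the example.</para>
         <code.block><![CDATA[
```python
from html import escape


def mini_toc(headings):
    """Build a linked list of in-page headings.

    headings: sequence of (anchor_id, text, lang) tuples, where lang is None
    when the heading is in the main language of the page.
    """
    items = []
    for anchor_id, text, lang in headings:
        # Mark a change of language so that screen readers can switch voice.
        lang_attr = ' lang="%s"' % escape(lang, quote=True) if lang else ""
        items.append(
            '<li><a href="#%s"%s>%s</a></li>'
            % (escape(anchor_id, quote=True), lang_attr, escape(text))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```
]]></code.block>
         <para>For example, a heading in Irish would be passed with the language code <code>ga</code>, and the resulting link carries a matching <code>lang</code> attribute.</para>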
         <para>After the mark-up stage was completed, each page was converted into HTML using an online Word to HTML conversion service called YAWC Online
	(<a href="http://www.yawconline.com/"/>),
    which has been developed and recently launched by our company. Setting up a customisation for a particular website on YAWC Online simply involves uploading the XHTML template that is to be wrapped around each web page.</para>
         <para>The HTML generated by YAWC Online complies with Level 3 of the WAI Accessibility Guidelines without any post-conversion manual tweaking, assuming you have designed a suitable template. YAWC Online can also be configured to publish the converted pages, if you supply FTP login and password details.</para><figure><title>Screenshot of DCC Disability Services subsite</title><graphic width="550px" height="400px" href="04-01-03-fig03.jpg"/></figure>
         <para>The Disability Services subsite is now maintained directly by the staff of the Social Inclusion Unit of the Council, using the Microsoft Word files we supplied. They can update the site as necessary, while still maintaining maximum accessibility, without any knowledge of either HTML or accessibility. The subsite is online at DCC Disability Services
	(<a href="http://www.dublincity.ie/disability/"/>).</para>
         <para>Other public sector websites in Ireland using Word and XML to maintain accessible web pages include Teagasc
	(<a href="http://www.teagasc.ie/"/>),
    the Irish Agriculture and Food Development Authority, and the Department of Enterprise, Trade and Employment
	(<a href="http://www.entemp.ie/"/>).
    These sites are far larger (circa 2,000 pages each) than the Disability Services subsite, but are nonetheless maintained using Microsoft Word, and are generally WAI Level 2 compliant.</para>
         <para>The basic approach of authoring in Word, with conversion to XML and accessible HTML, has also been applied to other areas, such as the creation of online learning materials for third-level educational institutions.</para>
      </section>
      <section>
         <title>Migrating existing web pages into Word</title>
         <para>If you are persuaded by the advantages of maintaining web pages in Word, the first major issue is how to get your current pages into Word, so that you can start maintaining them there. You can simply open them in Word 2000 and save them in Word's native binary format (.doc), but this is not the most efficient approach, because the paragraph-level mark-up will be mostly incorrect. For example, the <code>h1</code> element is mapped to an H1 style in Word, rather than the built-in Heading 1 or Title styles.</para>
         <para>Luckily, Word 2000 can read a richer form of HTML, in which the native Word styles are defined as class attributes for each element. The following table shows some examples, and these should work in all language editions of Word.</para>
         <table border="1" summary="Optimum HTML markup for import into Word 2000" title="Optimum HTML markup for import into Word 2000">
            <caption>
               Optimum HTML mark-up for import into Word 2000
            </caption>
            <colgroup span="1">
               <col width="36.827195%" span="1"/>
               <col width="43.342776%" span="1"/>
               <col width="19.546742%" span="1"/>
            </colgroup>
           <thead>
                <tr valign="top">
                  <td>Normal HTML markup</td>
                  <td>Word 2000 HTML markup</td>
                  <td>Word style</td>
               </tr>
           </thead>
           <tbody>
               <tr valign="top">
                  <td><code.block>&lt;p&gt;</code.block></td>
                  <td><code.block>&lt;p class=&quot;MsoNormal&quot;&gt;</code.block></td>
                  <td>Normal</td>
               </tr>
               <tr valign="top">
                  <td><code.block>&lt;h1&gt;</code.block></td>
                  <td><code.block>&lt;h1 class=&quot;MsoHeading1&quot;&gt;</code.block></td>
                  <td>Heading 1</td>
               </tr>
               <tr valign="top">
                  <td><code.block>&lt;ul&gt;&lt;li&gt;</code.block></td>
                  <td><code.block>&lt;p class=&quot;MsoListBullet&quot;&gt;</code.block></td>
                  <td>List Bullet</td>
               </tr>
               <tr valign="top">
                  <td><code.block>&lt;ol&gt;&lt;li&gt;</code.block></td>
                  <td><code.block>&lt;p class=&quot;MsoListNumber&quot;&gt;</code.block></td>
                  <td>List Number</td>
               </tr>
            </tbody>
         </table>
         <para>By pre-processing HTML pages to apply the additional mark-up recognised by Word 2000, it is possible to greatly reduce the amount of manual mark-up required to migrate existing web pages into Microsoft Word. Much of the styling required for each document can be applied automatically, and many of the metadata fields filled in. If your website contains a large number of pages, then any automation will save a significant amount of effort and money. By default, all navigation links, logos, etc., will also be imported, but in many cases these can be removed automatically, provided the markup is consistent.</para>
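         <para>The pre-processing step can be sketched in Python using the mappings from the table above. This is a simplified illustration, not our converter: it assumes the input is already well-formed and namespace-free (for example after a tidy-up pass), that lists are not nested, and that list items contain only inline content. The function name is hypothetical.</para>
         <code.block><![CDATA[
```python
import xml.etree.ElementTree as ET

# Class names taken from the table above.
BLOCK_CLASSES = {"p": "MsoNormal", "h1": "MsoHeading1"}
LIST_CLASSES = {"ul": "MsoListBullet", "ol": "MsoListNumber"}


def add_word_classes(xhtml):
    """Return the markup with Word 2000 class attributes applied."""
    root = ET.fromstring(xhtml)

    # Pass 1: add the Word class to ordinary paragraphs and headings.
    for element in root.iter():
        if element.tag in BLOCK_CLASSES:
            element.set("class", BLOCK_CLASSES[element.tag])

    # Pass 2: replace each list with one styled paragraph per list item.
    for parent in list(root.iter()):
        rebuilt = []
        for child in list(parent):
            if child.tag in LIST_CLASSES:
                for item in child.findall("li"):
                    item.tag = "p"
                    item.set("class", LIST_CLASSES[child.tag])
                    rebuilt.append(item)
            else:
                rebuilt.append(child)
        parent[:] = rebuilt

    return ET.tostring(root, encoding="unicode")
```
]]></code.block>
         <para>The two passes are kept separate so that a list item converted to a paragraph keeps its list style instead of being reset to Normal.</para>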
         <para>We have developed an online demonstration of this process, based on an XSLT stylesheet that converts an HTML page into Word 2000-compatible HTML. It first uses HTMLTidy to convert the page into well-formed XML, and also attempts to strip out navigation and other standard text from the page, leaving just the text content. You can test our Online HTML to Word Converter here
	(<a href="http://www.xmlw.ie/services/html2word/index.htm"/>).</para>
      </section>
      <section>
         <title>Conclusion</title>
         <para>This paper describes a process that allows you to maintain accessible web pages simply and efficiently using Microsoft Word and XML. Implementing this process in your organisation would not cost a lot of money, and would allow you to dramatically improve the quality of your web pages. For smaller websites, where a full Web CMS is overkill, it is an ideal low-cost solution to the maintenance problem. For many larger websites, it is still a very attractive solution, if the information is mainly textual in nature. The process is not disruptive, in that you can leave existing web pages alone, and use Word to maintain only new pages. The process can also co-exist with other content maintenance systems that may be in use, such as dynamic database-backed areas of the site.</para>
         <para>If you don't have the resources to develop and support the software required for this process, an online Word to HTML conversion service, YAWC Online
	(<a href="http://www.yawconline.com/"/>),
    is available at a low cost. It implements the system described, and generates very accessible HTML.</para>
      </section>
   </body>
<rear><bibliog><bibitem id="jclark"><bib>JC 2002</bib><pub>Clark, Joe 2002. <i>Building Accessible Websites</i>, New Riders Publishing, Indianapolis, USA. ISBN 0-7357-1150-X</pub></bibitem><bibitem id="jthatcher"><bib>JT 2002</bib><pub>Thatcher, Jim, et al. 2002. <i>Constructing Accessible Web Sites</i>, glasshaus, Birmingham, UK. ISBN 1-904151-00-0</pub></bibitem></bibliog></rear></gcapaper>
