Using Microsoft Word to write content for the web – background

Understand your objectives

When writing content for a website, it can make sense to turn to Microsoft Word to write the original copy because it is a very capable word processing program, but you need to be aware of the differences between a Word document and a web page.

Your objective is to produce text with formatting that is not too complex. Your Word document will need to be converted into ‘clean’ HTML (Hypertext Markup Language – the code used for web pages), and the conversion works best if the Word document is not too complicated. After this, your web designer – or you, if you have the skills – will almost certainly have to adjust the HTML so that it displays correctly on the variety of different-sized devices used to browse web pages.

Mapping Word content to HTML

Web page structure

One of the objectives of using semantic HTML in web pages is to make it easier for search engines to parse your website for indexing. Another is to simplify the work browsers have to do to render your pages.

In broad terms, modern web pages are structured into headings, lists, tables, columns and paragraphs and these can be contained in articles, sections, nav (ie navigation/menu sections), headers and footers. There are other elements, but that’s a good starter pack.

The headings have levels h1, h2, h3, h4, h5 and h6, though we would only recommend using h1-h4 in normal usage.

Word equivalents

Word too has headings, lists, tables, columns and paragraphs and they can be mapped to their HTML equivalents with some minor constraints.
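As a rough illustration of this mapping, the sketch below wraps paragraphs in HTML tags according to their Word style name. The style names are Word's built-in ones; the STYLE_TO_TAG table and the function names are hypothetical and are not taken from any particular converter. Lists, tables and columns need more structure than a one-to-one lookup and are left out.

```python
from html import escape

# Hypothetical lookup from Word's built-in style names to HTML tags.
# Only h1-h4 are mapped, in line with the recommendation above.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Heading 3": "h3",
    "Heading 4": "h4",
    "Normal": "p",
}


def paragraph_to_html(style_name: str, text: str) -> str:
    """Wrap one paragraph's text in the tag mapped to its Word style."""
    tag = STYLE_TO_TAG.get(style_name, "p")  # unknown styles fall back to <p>
    return f"<{tag}>{escape(text)}</{tag}>"


def document_to_html(paragraphs) -> str:
    """Convert an iterable of (style_name, text) pairs to HTML, one element per line."""
    return "\n".join(paragraph_to_html(style, text) for style, text in paragraphs)


if __name__ == "__main__":
    print(document_to_html([
        ("Heading 1", "Tables & data"),
        ("Normal", "A short introduction."),
    ]))
```

Falling back to a plain paragraph for unrecognised styles keeps the output simple, which is one reason custom styles in the Word document tend to need manual attention after conversion.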

There is no out-of-the-box equivalent of the article, section and nav containers, though it may be possible to adopt a convention within the text to mark these up.

Word’s headers and footers are used in a different way and so you should expect no mapping between these and the web page.

Converting documents from Word to HTML

It’s possible, though tedious, to convert Word documents to HTML manually. It was this tedium that prompted us to create our own Word-to-HTML converter. The CazMiranda converter aims to produce ‘clean’, semantic HTML suitable for mobile-friendly websites, not a pixel-accurate rendition of the Word document.

In CazMiranda, the upload process is started from the web page editor; other systems may vary. A menu button labelled Import from Word kicks off the process. That said, most of our clients just mail us the Word document and we take care of publishing it.

When converting documents, any embedded assets like pictures have to be put in a convenient location on the website where they can be cropped, scaled and optimised.

The CazMiranda converter can be extended to meet particular requirements: if we receive a number of documents with particular formatting, we try to create a generalised approach to converting them. If it’s just a one-off, manual conversion is usually more economical.

Word-to-HTML conversion limitations

Saving a page as HTML within Word does not produce good results

As mentioned above, when creating web pages, designers structure the page so that search engines can index it easily. If you just use Word’s ability to save as a web page, the results will not have that semantic structure and are extremely unlikely to render well on a smartphone.

The results are also incredibly verbose because Word is trying to deliver a pixel-accurate version of the Word document. This should not be your objective – you’re just after semantically structured content.

Rigid formatting is not appropriate for a web page

Web pages are continuous and not paginated. This is a fundamental conceptual difference worthy of a bit of cogitation because things like page breaks are going to be ignored by the converter.

In the early days of the web, graphic designers tried to make their pictures pixel-accurate because that’s the way they had done things with print media. This echoes what the built-in Word converter is trying to do. But the web’s not like that: it didn’t work then and it doesn’t work now, because of the variety of screen resolutions and browsers. You have to think ‘flow’.

These days the content of a web page has to deal with the extremes of a smartphone and 8K/UHD TV. Web designers are capable of producing websites that can react differently according to the size of display – a technique known as responsive design. This means that you should not attempt any position-dependent formatting like sidebar boxes. Just put sidebars in the flow for the time being and the web designer can then re-arrange them after conversion.

There is no equivalent to tabs on the web

If you rely on tabs to get the formatting you want in Word, you may run into difficulty upon conversion to HTML. Tabs do not exist in HTML – they can be simulated with span tags, but this can quickly get messy.

Returns are not a good way of vertical spacing

In Word there are ‘hard’ returns, which convert to HTML paragraph tags, and ‘soft’ returns, which convert to the ‘break’ tag.
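The distinction can be sketched in a few lines. This is a minimal illustration, not how any particular converter works: it assumes the document text has been extracted as a plain string in which a hard return appears as a newline and a soft return as a vertical tab. The function name returns_to_html is hypothetical.

```python
from html import escape

HARD_RETURN = "\n"  # assumption: paragraph mark in the extracted text
SOFT_RETURN = "\v"  # assumption: manual line break (Shift+Enter) in the extracted text


def returns_to_html(text: str) -> str:
    """Turn hard returns into <p> elements and soft returns into <br> tags."""
    blocks = []
    for paragraph in text.split(HARD_RETURN):
        # Each soft return becomes a line break inside the same paragraph.
        lines = [escape(line) for line in paragraph.split(SOFT_RETURN)]
        blocks.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(returns_to_html("Line one\vLine two\nNext paragraph"))
```

Note that two hard returns in a row produce an empty paragraph element in this sketch, because the text between them is an empty string.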

Converting multiple carriage returns will create multiple (empty) paragraph tags, and empty HTML tags are not a good idea. Some content management systems will strip out empty paragraphs during the conversion process.
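A clean-up step of this kind might look like the sketch below. It is an assumption about how such stripping could be done, not a description of any specific content management system. The pattern treats a paragraph as empty if it contains only whitespace, non-breaking space entities or break tags, and the name strip_empty_paragraphs is hypothetical. A regular expression is adequate for simple converter output; a real HTML parser would be more robust for arbitrary markup.

```python
import re

# A <p> element (with or without attributes) that contains nothing but
# whitespace, &nbsp; entities or <br> tags, plus any whitespace after it.
EMPTY_P = re.compile(
    r"<p(?:\s[^>]*)?>(?:\s|&nbsp;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def strip_empty_paragraphs(html: str) -> str:
    """Remove paragraph elements that carry no visible content."""
    return EMPTY_P.sub("", html)


if __name__ == "__main__":
    sample = "<p>First</p>\n<p></p>\n<p>&nbsp;</p>\n<p>Second</p>"
    print(strip_empty_paragraphs(sample))
```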

Vertical spacing on the web is achieved using the CSS margin and padding properties applied to the appropriate HTML elements, and these may have to be added after the Word document has been imported.

Tables of data

There is not just the technical issue of how to fit a table of data into a website. There is also the problem of how the user will read it within the website’s constraints (designers frequently set a maximum page width), and the further issue of small screens.

A table with two or three columns should not present any problems, but when you start getting lots of columns it’s going to be necessary to have some kind of scrolling mechanism within a ‘viewport’. Given that tables are there for comparing and contrasting data, if the user can’t see the entire table they are going to have to try to remember hidden values.

If you expect to have a lot of tables of data on your website, you should discuss this with the website designer to make sure that they can be accommodated.

Pictures embedded in Word may not be the right size for web use

Quite reasonably, Word embeds any image within its file at the imported resolution and then scales it on the printed page according to internal defaults or a custom size that you have applied. The downside is that the image may not be immediately appropriate for web use because it’s the wrong scale or resolution. Usually it’s too large, and it’s as well to remember that some people with mobile devices are paying by the megabyte on their phone contracts. Add the fact that large files take longer to download, and the user experience is not going to be good.

Ideally your converter and/or content management system will extract the files to somewhere where you can go through them, optimising them for your website.
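Part of that optimisation is simple arithmetic: working out the pixel dimensions an oversized image should be scaled down to while keeping its proportions. The sketch below shows only that calculation. The function name fit_to_width and the 1200-pixel default are illustrative assumptions, not recommendations from any tool; the right maximum depends on your site's layout.

```python
def fit_to_width(width_px: int, height_px: int, max_width_px: int = 1200) -> tuple[int, int]:
    """Return (width, height) scaled down to fit max_width_px, keeping the aspect ratio.

    Images already narrower than the limit are returned unchanged, since
    scaling up would only add file size without adding detail.
    """
    if width_px <= 0 or height_px <= 0 or max_width_px <= 0:
        raise ValueError("dimensions must be positive")
    if width_px <= max_width_px:
        return width_px, height_px
    # Multiply before dividing to keep the rounding error small.
    new_height = max(1, round(height_px * max_width_px / width_px))
    return max_width_px, new_height


if __name__ == "__main__":
    # A hypothetical 4000 x 3000 camera photo embedded in a Word document.
    print(fit_to_width(4000, 3000))
```

The actual resizing and compression would then be done in an image editor or by the content management system.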

We have seen instances where people embed text in an image. This rarely converts well and alternatives should be considered. Although sometimes the text is merely for labelling parts of an image, it has also been used by a designer to get their font of choice. These days it’s much easier to get a ‘nice’ font on your website using web fonts.

References

For more information about web page structure see the relevant page in our multi-page SEO guide.