Free conversion of docx to many popular formats.

A newbie's guide to .docx

.docx is the default file format of Microsoft Word 2007, the latest version of Word and part of Office 2007. All documents in the new Office family are based on an open, standardized specification called Office Open XML.

A single .docx file is actually a collection of many files, stored in an archive (a zip file). Let's dive into an example .docx file and see what's inside.

A sample word document with a bit of formatting

Here is an ordinary document (in Norwegian, sorry): a little bit of text and some formatting. You can work with this document just like any other document created with earlier versions of Word. However, if you are interested in the internal representation, read on...

Step 1: Rename your document to .zip

A .docx file as it appears in Windows Explorer 

Locate the document in Windows Explorer, right-click it and change the file extension to .zip, as shown here:

A .docx file renamed to .zip in Windows Explorer 

Step 2: Extract zip-file to a new folder

Once the file is renamed to .zip you can use it like any other zip file. Obviously, we want to look inside, and this is where the magic happens. Extract it to the current folder and a number of files and directories appear, like this:

Contents of a .docx file extracted from a zip-file 
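If you would rather not rename anything, the same look inside can be had from a script. The following is only a minimal sketch using Python's standard zipfile module; the helper names and the tiny stand-in package built in memory are made up for illustration, and a real .docx will list more files than this.

```python
import io
import zipfile


def list_package_parts(docx_file):
    """Return the names of all files stored inside a .docx package.

    docx_file may be a path or a file-like object; the .docx extension
    does not need to be changed to .zip first.
    """
    with zipfile.ZipFile(docx_file) as package:
        return package.namelist()


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_package_parts(build_demo_package()):
        print(name)
```

To try it on a real document, pass the path of your own .docx file to list_package_parts instead of the demo package.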

Step 3: Explore the various files and folders

At the root level we have three folders, "_rels", "docProps" and "word". In addition we have a file called [Content_Types].xml. The [Content_Types].xml file describes the contents of the zip package and is used internally by Word as a table of contents for further processing. The _rels folder holds a map of all the relationships within the package: which files it contains and how they relate to each other.

Folder: _rels

In a minimal document it holds a single file, .rels, which is an XML file.
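To give an idea of how such a relationship map can be used, here is a rough sketch that looks up which file in the package is the main document. The sample .rels content below is an assumption of what a minimal file typically contains, not a copy of the one from our sample document, and find_main_document is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the relationship files in the package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Relationship type that marks the main document part.
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)

# Assumed, simplified .rels content for illustration only.
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
      Target="word/document.xml"/>
  <Relationship Id="rId2"
      Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
      Target="docProps/core.xml"/>
</Relationships>"""


def find_main_document(rels_xml):
    """Return the Target of the officeDocument relationship, or None."""
    root = ET.fromstring(rels_xml)
    for relationship in root.findall(f"{{{RELS_NS}}}Relationship"):
        if relationship.get("Type") == OFFICE_DOCUMENT:
            return relationship.get("Target")
    return None


if __name__ == "__main__":
    print(find_main_document(SAMPLE_RELS))
```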

Folder: docProps

docProps folder in Windows Explorer

The docProps folder contains at least app.xml and core.xml. These files hold meta-information about the document, such as its creator and when it was last opened, saved, edited and so forth. They also hold the word count, number of paragraphs etc.
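As a small illustration of reading these statistics, the sketch below pulls the page, word and paragraph counts out of an app.xml. The sample XML is hypothetical and heavily trimmed, and its numbers are invented; read_statistics is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by docProps/app.xml (extended properties).
APP_NS = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "extended-properties"
)

# Hypothetical app.xml content; the numbers are made up for illustration.
SAMPLE_APP_XML = """<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
  <Pages>1</Pages>
  <Words>9</Words>
  <Paragraphs>2</Paragraphs>
</Properties>"""


def read_statistics(app_xml):
    """Return the counters found in app.xml as a dict of integers."""
    root = ET.fromstring(app_xml)
    statistics = {}
    for name in ("Pages", "Words", "Paragraphs"):
        element = root.find(f"{{{APP_NS}}}{name}")
        # Skip counters that are missing or empty in this file.
        if element is not None and element.text:
            statistics[name] = int(element.text)
    return statistics


if __name__ == "__main__":
    print(read_statistics(SAMPLE_APP_XML))
```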

Folder: word

word folder in Windows Explorer

Moving on to the word folder, we get to the actual content of the Word document. In the folder structure above you can see a number of XML files. The most important XML file in the entire zip package is document.xml. Why? Because this is where the content as you know it is stored. Let's look at it from our Hello World example:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<w:document
    xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
  <w:body>
    <w:p w:rsidR="00B45928" w:rsidRDefault="004D007F" w:rsidP="004D007F">
      <w:pPr>
        <w:pStyle w:val="IntenseQuote" />
      </w:pPr>
      <w:r>
        <w:t>Heisann, dette er en test!</w:t>
      </w:r>
    </w:p>
    <w:p w:rsidR="004D007F" w:rsidRDefault="004D007F" w:rsidP="004D007F" />
    <w:p w:rsidR="004D007F" w:rsidRPr="004D007F" w:rsidRDefault="004D007F" w:rsidP="004D007F">
      <w:r>
        <w:t>Jommen sa jeg smør....</w:t>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="004D007F" w:rsidRPr="004D007F" w:rsidSect="00B45928">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1417" w:right="1417" w:bottom="1417" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
      <w:cols w:space="708" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>

Here you will recognize our text from the first screenshot. Inside these special XML files our content sits in plain text... *phew* So, if you are really desperate and need the actual text from a document, this is the place to look. But I recommend that you use this online conversion tool, or even better, purchase Microsoft Office 2007 and start creating .docx files yourself.
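For the really desperate, here is a minimal sketch of pulling that plain text out with Python's standard library. It assumes the text lives in w:t elements inside w:p paragraphs, as in the document.xml shown above, and ignores everything else (styles, tables, headers and so on). The function names are made up for illustration, and the sample XML is a trimmed-down version of our sample document.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace bound to the "w" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed-down version of the sample document.xml shown above.
SAMPLE_DOCUMENT_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Heisann, dette er en test!</w:t></w:r></w:p>
    <w:p/>
    <w:p><w:r><w:t>Jommen sa jeg smør....</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraphs_from_xml(document_xml):
    """Return one string per w:p paragraph, joining its w:t text runs."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def paragraphs_from_docx(docx_file):
    """Read word/document.xml straight from the package and extract text."""
    with zipfile.ZipFile(docx_file) as package:
        return paragraphs_from_xml(package.read("word/document.xml"))


if __name__ == "__main__":
    # Build a stand-in package in memory so the sketch runs without a file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", SAMPLE_DOCUMENT_XML)
    buffer.seek(0)
    for line in paragraphs_from_docx(buffer):
        print(line)
```

Empty paragraphs come back as empty strings, so the blank line between the two sentences of the sample document is preserved.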
