OpenOffice is a must-have piece of software in my arsenal. It is free and easy to use. Recently, though, I ran into a weird problem with it. I was working on converting a CSV file to a Word file, and the file that came out of the conversion would not open.
OpenOffice failed to open the file and showed the message below:
Read-Error format error discovered in the file in sub-document content.xml at 219,64 (row,column)
Honestly, I had never faced this problem before. I wondered whether my conversion program had a bug, but after spending some time on it I concluded that the program was working fine and that the problem was with OpenOffice. So I installed LibreOffice and tried opening the file there as well. LibreOffice reported the same error.
After that I mailed the file to a friend who has Microsoft Office, and the file opened successfully there. This confirmed my suspicion that the problem was specific to OpenOffice. LibreOffice is based on OpenOffice, so it behaves the same way. After spending some time with the file and analyzing the location reported by OpenOffice, I was able to solve the problem.
There is not much about this problem on the Internet either, so I am reproducing all the steps here. Later in the tutorial I share some observations that might save you a lot of time, because they let you modify your file directly without following the steps below.
- Extract the file using 7-Zip. I prefer 7-Zip, but other extraction programs may also work. It is better to extract into a new folder so that you can see exactly which files the extraction produced.
- Inside the extracted folder you will see the files styles.xml, mimetype, meta.xml and content.xml, along with a META-INF folder.
- Open content.xml in a text editor. I recommend Notepad++, an excellent multi-purpose text editor. I have written a separate post on its search and replace capabilities, including how it can search and replace text across multiple files in a directory.
- The problem with Notepad and WordPad is that they do not show line numbers, which makes it hard to find the cause of the error. Notepad++ shows line numbers. The error message gives a row and a column, and the row corresponds to the line number, so Notepad++ makes it easy to locate the offending line.
- On that line, go to the reported column to see the character causing the problem. You need to delete that character from your OpenOffice document or use a different character in its place.
- Make the change and save the file, then try opening it again.
- It should now open. If it does not, the error will point to a different row and column.
- Repeat the same procedure for each new row and column that is reported.
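The manual steps above can also be sketched in a few lines of Python. This is only a rough illustration, not what I used at the time: the function name `find_xml_error` is made up for this example, and it relies on Python's own XML parser, whose row and column may not match the numbers OpenOffice prints exactly (Python reports a 1-based row and a 0-based column).

```python
import zipfile
import xml.etree.ElementTree as ET


def find_xml_error(odt_path, member="content.xml"):
    """Return (row, column, line_text) for the first XML error, or None.

    Hypothetical helper: reads `member` straight out of the document
    archive, so no separate extraction step is needed.
    """
    with zipfile.ZipFile(odt_path) as archive:
        data = archive.read(member)
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple from the parser;
        # the line is 1-based, the column is 0-based.
        row, col = err.position
        lines = data.splitlines()
        line_text = lines[row - 1].decode("utf-8", errors="replace")
        return row, col, line_text
    return None
```

Calling `find_xml_error("report.odt")` (the file name is just a placeholder) returns the row, the column and the text of the offending line, which is the same line you would otherwise look up by hand in Notepad++.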
I faced this problem many times while working on the project. The characters causing it were & (ampersand) and < (less-than sign). So if you are facing a read error in the sub-document content.xml, you can skip the procedure above and do the following instead.
First evaluate the impact on your document of replacing & and < with their meaning in words, for example "and" and "less than". If the impact is small, replace these special characters throughout the document. After making the changes, open the document again to check that it works.
I have not faced the problem with any other characters. These were the only two. I do not know why these characters cause the problem. Perhaps this is a task for the excellent people behind OpenOffice, who could make the appropriate changes to handle this scenario as well.
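One likely explanation, which I did not verify against my own files: & and < are reserved markup characters in XML, so inside content.xml they have to be written as `&amp;` and `&lt;`. If your conversion program builds the XML text itself, escaping each value before writing it should avoid the error without changing what the reader sees. Below is a minimal sketch using Python's standard `xml.sax.saxutils.escape`. The function `row_to_paragraph` and the way it joins cells are assumptions for illustration only, not how my converter actually worked.

```python
from xml.sax.saxutils import escape


def row_to_paragraph(cells):
    """Build one content.xml paragraph from the cells of a CSV row.

    Hypothetical helper: joins the cells with ", " and escapes
    &, < and > so the result is well-formed XML text.
    """
    text = ", ".join(cells)
    # <text:p> is the paragraph element in OpenDocument content.xml.
    return "<text:p>" + escape(text) + "</text:p>"
```

For example, `row_to_paragraph(["Tom & Jerry", "a < b"])` returns `<text:p>Tom &amp; Jerry, a &lt; b</text:p>`, which the office suite displays as the original characters.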