Reading Office Document Properties
By Stjepan Pejic
Microsoft Office applications use OLE Structured Storage technology to subdivide a document file into multiple streams of data. While structured storage files can serve a wide array of purposes, by far their most familiar use is as the underlying file format for Office documents. Many of the streams within an OLE Structured Storage document use proprietary formats unique to each application. However, Microsoft has also defined standard streams that hold document properties such as the author, description, date and time of last edit, and so on. These document properties can be viewed from any application that supports this standard format.
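Every OLE structured storage (compound) file begins with the same eight-byte signature, D0 CF 11 E0 A1 B1 1A E1, so a program that scans a folder for Office documents can cheaply skip files that cannot contain property streams. The Python sketch below is an illustration of that check and is not part of StorageTools; the function names `is_structured_storage` and `file_is_structured_storage` are hypothetical.

```python
# Every OLE structured storage (compound) file starts with this 8-byte signature.
OLE_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def is_structured_storage(header: bytes) -> bool:
    """Return True if `header` (the first bytes of a file) has the compound file signature."""
    return header[:8] == OLE_SIGNATURE


def file_is_structured_storage(path: str) -> bool:
    """Read only the first 8 bytes of the file at `path` and test them."""
    with open(path, "rb") as f:
        return is_structured_storage(f.read(8))
```

Checking only the signature is deliberately cheap; it says nothing about which streams the file contains, which is what the rest of this article is about.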
Desaware's StorageTools allows Visual Basic programmers to read and write the property information in Office documents. This lets you create programs that search, categorize, view, and even modify the properties of documents ranging from Excel spreadsheets to PowerPoint slides to Word documents. This article demonstrates how StorageTools makes this possible. The sample code uses the demo version of the Storage ATL library, which is included with the sample code for this article.
Most of the property information describing an Office document is contained in the SummaryInformation and DocumentSummaryInformation streams. This is the data that you set with the File Properties menu in most Office applications, or in the Summary, Statistics and Custom tabs in the Explorer file properties dialog box. This information is easy to retrieve with StorageTools. After adding a reference to the StorageTools component to your application, you open the file and read in the elements of the SummaryInformation stream as shown here:
Dim dwStorage1 As New dwStorage
Dim RootStorage As dStorage

' file and flags are placeholders for the document path and the open flags
Set RootStorage = dwStorage1.OpenStorageFile(file, flags)

' Load the SummaryInformation stream from the file
' (Exit Sub assumes this code runs inside a Sub procedure)
If (RootStorage.siOpenSummaryInfo() = False) Then
    Debug.Print "This file has no Summary Info"
    Exit Sub
End If
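StorageTools hides the layout of these streams, but it may help to know what a call such as siOpenSummaryInfo is working with. Per the published OLE property set format, the SummaryInformation data lives in a stream named "\005SummaryInformation" (the name starts with character code 5), and the stream opens with a 28-byte header followed by a table of format identifiers (FMTIDs) and section offsets. The Python sketch below lists those sections; it assumes the raw stream bytes were already read by other means, the function name `read_section_table` is hypothetical, and it is not a description of how StorageTools is implemented.

```python
import struct
import uuid
from typing import List, Tuple

# Well-known format identifiers (FMTIDs) of the standard property sets.
SUMMARY_INFO_FMTID = uuid.UUID("F29F85E0-4FF9-1068-AB91-08002B27B3D9")
DOC_SUMMARY_INFO_FMTID = uuid.UUID("D5CDD502-2E9C-101B-9397-08002B2CF9AE")

# The standard stream names begin with the control character 0x05.
SUMMARY_INFO_STREAM = "\x05SummaryInformation"
DOC_SUMMARY_INFO_STREAM = "\x05DocumentSummaryInformation"


def read_section_table(data: bytes) -> List[Tuple[uuid.UUID, int]]:
    """Return (FMTID, offset) pairs from the header of a property set stream.

    Header layout (little-endian): byte order (2 bytes), version (2),
    system identifier (4), CLSID (16), section count (4), then a 16-byte
    FMTID and a 4-byte section offset for each section.
    """
    (byte_order,) = struct.unpack_from("<H", data, 0)
    if byte_order != 0xFFFE:
        raise ValueError("not a property set stream (bad byte-order mark)")
    (count,) = struct.unpack_from("<I", data, 24)
    sections = []
    for i in range(count):
        base = 28 + i * 20
        # FMTIDs are stored as GUIDs with little-endian leading fields.
        fmtid = uuid.UUID(bytes_le=data[base:base + 16])
        (offset,) = struct.unpack_from("<I", data, base + 16)
        sections.append((fmtid, offset))
    return sections
```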
All the fixed properties of the SummaryInformation and DocumentSummaryInformation streams can be read with method calls on the RootStorage object. For example, to read the Author property, you would use the following code:
Dim sAuthor As String

' Get the name of the document's author
sAuthor = RootStorage.siGetAuthor()
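At the format level, reading the Author means finding property ID 4 (PIDSI_AUTHOR) in the SummaryInformation section and decoding its value, a code-page string (type VT_LPSTR, 0x001E). The Python sketch below shows that lookup; it assumes `section` holds the bytes of one section starting at the offset taken from the stream header, and it assumes the Windows-1252 code page instead of honoring the section's code page property (ID 1) as real code should. The function name `read_string_property` is hypothetical.

```python
import struct
from typing import Optional

PIDSI_AUTHOR = 4   # Author property ID in the SummaryInformation section
VT_LPSTR = 0x001E  # type tag for a code-page string


def read_string_property(section: bytes, prop_id: int,
                         encoding: str = "cp1252") -> Optional[str]:
    """Return a VT_LPSTR property from one property set section, or None.

    The encoding is an assumption; real code should read the section's
    code page property (ID 1) and decode accordingly.
    """
    # Section header: total size, number of properties, then (ID, offset) pairs.
    _size, count = struct.unpack_from("<II", section, 0)
    for i in range(count):
        pid, offset = struct.unpack_from("<II", section, 8 + i * 8)
        if pid != prop_id:
            continue
        # A typed value starts with a 2-byte type tag and 2 bytes of padding.
        (vt,) = struct.unpack_from("<H", section, offset)
        if vt != VT_LPSTR:
            raise ValueError(f"property {pid} has type {vt:#06x}, not VT_LPSTR")
        # Byte count (including the terminating null), then the characters.
        (length,) = struct.unpack_from("<I", section, offset + 4)
        raw = section[offset + 8:offset + 8 + length]
        return raw.rstrip(b"\x00").decode(encoding)
    return None
```

Offsets in the property table are relative to the start of the section, which is why the sketch works on a section slice rather than on the whole stream.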
There are separate functions to read and write each property in the SummaryInformation and DocumentSummaryInformation streams. There are also helper functions for common tasks, such as incrementing the revision number or recording the current date and time as the time the document was printed.
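Helpers like these hide two details of the stored data. Date properties such as the last-printed time (ID 11, PIDSI_LASTPRINTED) are FILETIME values, 64-bit counts of 100-nanosecond intervals since January 1, 1601 (UTC), while the revision number (ID 9, PIDSI_REVNUMBER) is stored as a string. The Python sketch below shows the conversions such helpers need; the function names are hypothetical, and `next_revision` assumes the string holds a plain decimal number.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since this moment.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int) -> datetime:
    """Convert FILETIME ticks to an aware UTC datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def datetime_to_filetime(dt: datetime) -> int:
    """Convert an aware datetime to FILETIME ticks (naive datetimes raise TypeError)."""
    delta = dt - FILETIME_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def next_revision(rev: str) -> str:
    """Increment a revision-number string, assuming it holds a decimal integer."""
    return str(int(rev.strip() or "0") + 1)
```

A "record the print time" helper would store something like `datetime_to_filetime(datetime.now(timezone.utc))` as the last-printed value.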
There are also method calls for reading and writing the Custom properties in a structured storage file. To get the name and data of the first custom property, the following code can be used:
Dim vData As Variant
Dim sName As String

sName = RootStorage.dsiUserDirectory(0)  ' the name
vData = RootStorage.dsiUserGet(0)        ' the data
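In the underlying format, custom properties sit in a second section of the DocumentSummaryInformation stream (FMTID D5CDD505-2E9C-101B-9397-08002B2CF9AE). Their values are stored under numeric property IDs, and the names shown on the Custom tab come from a dictionary stored as property ID 0 that maps IDs to names. The Python sketch below decodes such a dictionary; it assumes a single-byte (non-Unicode) code page, in which names are not individually padded, the function name `read_dictionary` is hypothetical, and it says nothing about how StorageTools orders the entries behind dsiUserDirectory.

```python
import struct
import uuid
from typing import Dict

# FMTID of the user-defined (custom) property section.
USER_DEFINED_FMTID = uuid.UUID("D5CDD505-2E9C-101B-9397-08002B2CF9AE")
PID_DICTIONARY = 0  # property ID 0 holds the ID-to-name dictionary


def read_dictionary(section: bytes, encoding: str = "cp1252") -> Dict[int, str]:
    """Return {property ID: name} from a section's dictionary (ANSI code page assumed)."""
    _size, count = struct.unpack_from("<II", section, 0)
    for i in range(count):
        pid, offset = struct.unpack_from("<II", section, 8 + i * 8)
        if pid != PID_DICTIONARY:
            continue
        # Unlike other properties, the dictionary has no type tag.
        (entries,) = struct.unpack_from("<I", section, offset)
        pos = offset + 4
        names = {}
        for _ in range(entries):
            # Each entry: property ID, name length in characters (including null), name.
            entry_pid, length = struct.unpack_from("<II", section, pos)
            pos += 8
            names[entry_pid] = section[pos:pos + length].rstrip(b"\x00").decode(encoding)
            pos += length  # single-byte names are not padded individually
        return names
    return {}
```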
These summary information streams hold general information that pertains to any Office document (except for some PowerPoint-specific items in the DocumentSummaryInformation stream). With access to this information, you can create programs that search through Office documents for specific authors, for files that were printed on a certain date, and so on. Naturally, StorageTools allows you to modify this information as well.
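Once the properties have been extracted, the search itself is ordinary filtering. In the Python sketch below, `DocProps` and `find_documents` are hypothetical names, and the records are assumed to have been filled in beforehand, for example from values returned by calls such as siGetAuthor.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional


@dataclass
class DocProps:
    """Hypothetical record of properties already read from one document."""
    path: str
    author: str
    last_printed: Optional[datetime]


def find_documents(docs: Iterable[DocProps],
                   author: Optional[str] = None,
                   printed_on: Optional[date] = None) -> List[str]:
    """Return paths matching every criterion given; None means 'any'."""
    matches = []
    for doc in docs:
        # Author comparison ignores case.
        if author is not None and doc.author.casefold() != author.casefold():
            continue
        # Documents never printed cannot match a print-date query.
        if printed_on is not None and (
                doc.last_printed is None or doc.last_printed.date() != printed_on):
            continue
        matches.append(doc.path)
    return matches
```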
Only a small portion of the data associated with a document is stored in the standard property streams. The document content, formatting information, macros, embedded objects, and other application-specific information are contained in the other streams and storages of Office files. While StorageTools does not directly parse the data in Office files, it does let you access that data if you know where the information is stored. A number of articles on the Microsoft Developer Network (MSDN) CD-ROM and the Microsoft web site show you how to do things like peek into .xls files to see which version of Excel wrote the file. These articles are written for C programmers, but they are not difficult to port to Visual Basic, because the StorageTools control closely matches the OLE structured storage methods. Some examples can be found here:
To show some of the things you can do with this information, we have created a sample program called "StorageInfo". StorageInfo reads the property information from the SummaryInformation and DocumentSummaryInformation streams and displays it in a form similar to the Explorer file properties dialog. If you use it to open an Excel file, it also uses StorageTools' ability to read any stream to determine whether the file contains macros and which version of Excel created it. This sample program only reads the items; it does not let the user edit the information presented. For an example of saving changes back to disk, refer to the StorageBrowser program that comes with the full StorageTools product demo or with the StorageTools package itself.
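The Excel checks that StorageInfo performs can be approximated as follows. This Python sketch relies on general facts about the binary .xls format rather than on StorageInfo's actual code: Excel 97 and later keep VBA macros in a storage named _VBA_PROJECT_CUR, and the workbook stream (named Workbook in Excel 97 and later, Book in Excel 5.0/95) begins with a BOF record (type 0x0809) whose first field is the BIFF version, 0x0600 for BIFF8 and 0x0500 for BIFF5. It assumes the caller already has the file's top-level stream and storage names and the bytes of the workbook stream; all function names are hypothetical, and the macro check only detects VBA projects saved in the Excel 97 format.

```python
import struct
from typing import Iterable, Optional

BOF_RECORD = 0x0809  # BOF record type used by BIFF5 and BIFF8
BIFF_VERSIONS = {
    0x0500: "BIFF5 (Excel 5.0/95)",
    0x0600: "BIFF8 (Excel 97 or later)",
}


def workbook_stream_name(names: Iterable[str]) -> Optional[str]:
    """Pick the workbook stream from the file's top-level names."""
    present = set(names)
    for candidate in ("Workbook", "Book"):  # Excel 97+ name first, then Excel 5.0/95
        if candidate in present:
            return candidate
    return None


def biff_version(workbook_stream: bytes) -> Optional[str]:
    """Describe the BIFF version from the BOF record at the start of the stream."""
    if len(workbook_stream) < 6:
        return None
    # Record header: 2-byte type, 2-byte length; the BOF body starts with the version.
    record_type, _length, version = struct.unpack_from("<HHH", workbook_stream, 0)
    if record_type != BOF_RECORD:
        return None
    return BIFF_VERSIONS.get(version, f"unknown BIFF version {version:#06x}")


def has_vba_project(names: Iterable[str]) -> bool:
    """True if the top-level names include the Excel 97+ VBA project storage."""
    return "_VBA_PROJECT_CUR" in set(names)
```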
You can download the sample code for this article from Desaware's FTP site at ftp.desaware.com.