Apr 27

Download Source Code: OfficeProperties.zip - 61.15KB

With Primary Interop Assemblies

Now open our OfficeProperties solution, which is written in managed C# code for .NET. We'll continue to extract and show, in the Debug window, the summary and custom properties of the three Office documents included with the project for demo purposes: test.doc, test.xls and test.ppt are Word, Excel and PowerPoint documents with empty content, but with some summary and custom properties set. An incomplete solution to this issue is offered on Microsoft's site, in the article "How To Use Automation to Get and to Set Office Document Properties with Visual C# .NET".

private static object missing = Missing.Value;

public static void ShowWordProperties(object filename)
{
    // Launch Word (in hidden mode) and open the document
    Word.Application app = new Word.Application();

    object readOnly = true;
    object visible = false;
    Word._Document doc = app.Documents.Open(ref filename,
        ref missing, ref readOnly, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref visible, ref missing,
        ref missing, ref missing);
    WriteLine(">>> " + filename.ToString());

    // Show all builtin and custom document properties
    ShowBuiltinProperties(doc.BuiltInDocumentProperties);
    ShowCustomProperties(doc.CustomDocumentProperties);

    // Cleanup
    object save = false;
    doc.Close(ref save, ref missing, ref missing);
    app.Quit(ref save, ref missing, ref missing);
}

public static void ShowExcelProperties(string filename)
{
    // Launch Excel (in hidden mode) and open the document
    Excel.Application app = new Excel.Application();
    Excel.Workbook doc = app.Workbooks.Open(filename,
        missing, true, missing, missing, missing, missing,
        missing, missing, missing, missing, missing,
        missing, missing, missing);
    WriteLine(">>> " + filename);

    // Show all builtin and custom document properties
    ShowBuiltinProperties(doc.BuiltinDocumentProperties);
    ShowCustomProperties(doc.CustomDocumentProperties);

    // Cleanup
    doc.Close(false, missing, missing);
    app.Quit();
}

public static void ShowPowerPointProperties(string filename)
{
    // Launch PowerPoint (in hidden mode) and open the document
    PowerPoint.Application app = new PowerPoint.Application();
    PowerPoint.Presentation doc = app.Presentations.Open(
        filename, MsoTriState.msoTrue, MsoTriState.msoTrue,
        MsoTriState.msoFalse);
    WriteLine(">>> " + filename);

    // Show all builtin and custom document properties
    ShowBuiltinProperties(doc.BuiltInDocumentProperties);
    ShowCustomProperties(doc.CustomDocumentProperties);
    
    // Cleanup
    doc.Close();
    app.Quit();
}
Custom Properties of a Microsoft PowerPoint document

A first striking remark is the number of Missing.Value parameters we have to specify, one for each optional argument of the Open methods. This is because C#, before version 4.0 introduced optional and named arguments, requires all method parameter values to be specified.
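
For readers on C# 4.0 or later, the difference can be sketched without Office installed. The Open method below is a hypothetical stand-in with a few optional parameters, not the real interop signature; it only illustrates how named arguments replace the run of Missing.Value placeholders.

```csharp
using System;
using System.Reflection;

public static class OptionalArgsDemo
{
    // Hypothetical stand-in for an Office-style Open method;
    // the parameter list is an assumption, not the real interop signature
    public static string Open(string fileName,
        object confirmConversions = null,
        object readOnly = null,
        object addToRecentFiles = null,
        object visible = null)
    {
        return string.Format("{0} readOnly={1} visible={2}",
            fileName, readOnly ?? "(default)", visible ?? "(default)");
    }

    public static void Main()
    {
        // Old style: every position is filled, with a placeholder
        // for each argument we do not care about
        object missing = Missing.Value;
        Console.WriteLine(Open("test.doc", missing, true, missing, false));

        // C# 4.0 style: name only the arguments we actually set
        Console.WriteLine(Open("test.doc", readOnly: true, visible: false));
    }
}
```

Both calls pass the same readOnly and visible values; the second one simply leaves the other parameters at their defaults.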

A second remark is that we need late-bound method invocations for each value we read; early binding is not possible. The collections are VB-like collections, different from the System.Collections namespace classes. Again, we had to use Reflection to reach the Item, Value and Name properties within each collection.
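
The late-bound access pattern can be tried on any .NET object. The sketch below mirrors the GetPropertyValue helper from the solution, but runs it against a List<string> as a stand-in for an Office collection, which is an assumption made only so the example needs no Office installation.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class LateBindingDemo
{
    // Read a property (optionally an indexed one) by name at run-time
    public static object GetPropertyValue(object source,
        string name, params object[] parameters)
    {
        return source.GetType().InvokeMember(name,
            BindingFlags.Public | BindingFlags.Instance
                | BindingFlags.GetProperty,
            null, source, parameters);
    }

    public static void Main()
    {
        // Stand-in for an Office collection. Note that List<T> is
        // zero-based, while the Office collections are one-based.
        object props = new List<string> { "Title", "Author" };

        int count = Convert.ToInt32(GetPropertyValue(props, "Count"));
        for (int i = 0; i < count; i++)
        {
            // "Item" is the name of the indexer property
            object item = GetPropertyValue(props, "Item", i);
            Console.WriteLine(item);
        }
    }
}
```

The compiler knows nothing about Count or Item here; both names are resolved by Reflection when the call is made, which is what late binding means in this context.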

The most serious limitation was that we were not able to enumerate the built-in properties collection the natural way. It doesn't matter that, for this collection, we also have pre-defined get/set property methods for each summary property. We should still be able to enumerate it and automatically discover all its items at run-time.

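What we finally did was use the WdBuiltInProperty enum type from the Word object library, which contains a constant for each built-in property. Looping over its values gives us an index for every built-in property, and the same indexes work for Excel and PowerPoint documents.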

// Enumerate builtin properties collection
// of any kind of Office document
public static void ShowBuiltinProperties(object builtinProps)
{
    WriteLine("Builtin Properties:");
    Type etype = typeof(Word.WdBuiltInProperty);
    foreach (int i in Enum.GetValues(etype))
    {
        object item = GetPropertyValue(builtinProps, "Item", i);

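        // Reading Value throws when the host application does not
        // define a value for a built-in property, so skip those
        object val = null;
        try { val = GetPropertyValue(item, "Value"); }
        catch { continue; }

        // Strip the "wdProperty" prefix (10 characters) from the name
        string name = Enum.GetName(etype, i).Substring(10);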

        WriteProperty(name, val);
    }
}

// Enumerate custom properties collection
// of any kind of Office document
public static void ShowCustomProperties(object customProps)
{
    WriteLine("Custom Properties:");
    int count = Convert.ToInt32(
        GetPropertyValue(customProps, "Count"));
    for (int i = 1; i <= count; i++)
    {
        object item = GetPropertyValue(customProps, "Item", i);
        object val = GetPropertyValue(item, "Value");
        string name = GetPropertyValue(item, "Name").ToString();

        WriteProperty(name, val);
    }
}

public static object GetPropertyValue(object source,
    string name, params object[] parameters)
{
    return source.GetType().InvokeMember(name,
        BindingFlags.Default | BindingFlags.GetProperty,
        null, source, parameters);
}

Fortunately, the custom properties collection is not only an enumeration, but also an indexed collection with a Count property. Instead of enumerating all custom properties, we were able to traverse the collection like an array, with indexes running from 1 to Count.
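
The name handling used for the built-in properties can also be tried in isolation. The sketch below assumes a small stand-in enum that mimics three WdBuiltInProperty constants (the real enum lives in the Word interop assembly), and strips the "wdProperty" prefix a little more defensively than a fixed Substring(10).

```csharp
using System;
using System.Collections.Generic;

public static class EnumNamesDemo
{
    // Hypothetical stand-in for Word.WdBuiltInProperty, reduced to
    // three constants for illustration
    public enum FakeBuiltInProperty
    {
        wdPropertyTitle = 1,
        wdPropertySubject = 2,
        wdPropertyAuthor = 3
    }

    // Return the enum's names with the "wdProperty" prefix removed
    public static List<string> GetShortNames(Type enumType)
    {
        const string prefix = "wdProperty";
        List<string> names = new List<string>();
        foreach (int i in Enum.GetValues(enumType))
        {
            string full = Enum.GetName(enumType, i);
            names.Add(full.StartsWith(prefix, StringComparison.Ordinal)
                ? full.Substring(prefix.Length)
                : full);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in GetShortNames(typeof(FakeBuiltInProperty)))
            Console.WriteLine(name);
    }
}
```

Checking for the prefix before cutting avoids mangling a name should an enum constant ever not start with "wdProperty".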

Microsoft created custom Interop wrappers for its Office components. The so-called Redistributable Primary Interop Assemblies (PIAs) can be downloaded from their web site, but you still need to have Office previously installed on your computer.

For Office XP, the PIAs are available as a download from Microsoft's site. Most PIAs come as Microsoft.Office.Interop.{product} DLLs, and similar Interop assemblies can be generated in Visual Studio or Express when you simply reference the Office object libraries. The first Office reference you add will also create a reference to Microsoft.Office.Core. You also get references to each product (Word, Excel, PowerPoint) and to VBIDE, which is in fact the VBA support.

Briefly stated, there are a lot of external components and too many applications to reference and load. And the execution times are comparable to those of a VB6 application, so pretty slow.

With DSOFile Component

The best way, to date, to access Office document properties from managed .NET code is probably through the DSOFile component. This is a short name for the Microsoft Developer Support OLE File Property Reader, a COM component implemented in C++ by the support team at Microsoft.

As with any COM component implemented the old-fashioned way, you have no choice but to register it manually first. After you download dsofile.dll, from our project or from Microsoft's site (which comes with full C++ source code), don't forget to run regsvr32.exe with the full path where the DLL was saved. Then make sure an Interop assembly was properly created when you referenced dsofile.dll from the project. No other reference, to Office components or PIAs, is required.

DSOFile has been specifically created and optimized to access only summary and custom document properties. It accesses these properties directly through the IPropertyStorage COM interface, bypassing the Office application. It makes sense: as long as these properties are stored through a standard COM interface, why would we need to load a full Office application first?!

dsofile.dll Object Model

The object model exposed by DSOFile is clean and self-explanatory. And, as our OfficePropertiesDso class code below proves, it is indeed a pleasure to finally have a simple, fast and natural way to enumerate those values. We are still using a COM component written in unmanaged code, but this is better than the previous solutions:

public static void GetDocumentProperties(string filename)
{
    OleDocumentPropertiesClass doc
        = new OleDocumentPropertiesClass();
    doc.Open(filename, true,
        dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);

    WriteLine("==================== " + filename);
    WriteLine("File Properties:");
    PropertyInfo[] props
        = typeof(OleDocumentPropertiesClass).GetProperties();
    foreach (PropertyInfo prop in props)
        WriteProperty(prop, doc);

    WriteLine("Summary Properties:");
    SummaryProperties summ_props = doc.SummaryProperties;
    props = typeof(SummaryProperties).GetProperties();
    foreach (PropertyInfo prop in props)
        WriteProperty(prop, summ_props);

    WriteLine("Custom Properties:");
    CustomProperties cust_props = doc.CustomProperties;
    foreach (CustomProperty cust_prop in cust_props)
        WriteProperty(cust_prop.Name, cust_prop.get_Value());

    doc.Close(false);
}
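
The WriteProperty helpers called above are not part of this listing. A minimal sketch of what they might look like follows; the FormatProperty name, the output format and the FakeSummary stand-in class are all assumptions made for illustration, and the strings are returned rather than written to the Debug window so the sketch runs without DSOFile.

```csharp
using System;
using System.Reflection;

public static class WritePropertyDemo
{
    // Hypothetical formatter for a name/value pair
    public static string FormatProperty(string name, object value)
    {
        return string.Format("  {0} = {1}", name, value ?? "(null)");
    }

    // Hypothetical formatter that reads the value through Reflection;
    // returns null for indexed properties, which cannot be read
    // without an index argument
    public static string FormatProperty(PropertyInfo prop, object source)
    {
        if (prop.GetIndexParameters().Length > 0)
            return null;

        object value;
        try { value = prop.GetValue(source, null); }
        catch (TargetInvocationException)
        {
            // The getter itself failed; report that instead of aborting
            value = "(unavailable)";
        }
        return FormatProperty(prop.Name, value);
    }

    // Stand-in for a COM wrapper class such as SummaryProperties
    public class FakeSummary
    {
        public string Title { get { return "Demo"; } }
        public string Author
        {
            get { throw new InvalidOperationException(); }
        }
    }

    public static void Main()
    {
        FakeSummary summary = new FakeSummary();
        foreach (PropertyInfo prop in typeof(FakeSummary).GetProperties())
        {
            string line = FormatProperty(prop, summary);
            if (line != null)
                Console.WriteLine(line);
        }
    }
}
```

Catching the failure per property keeps one unreadable value from stopping the whole enumeration, which matters when every public property of a wrapper class is read blindly.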

1 Comment

1. Michael from N.J. Says:
I was having a lot of trouble with this. Looked at tens of topics on the issue and finally found your text.
Well done! :) It looks like you covered most possible ways to deal with this. I didn't know about WdBuiltInProperty; it looks like something I could use.
About DSOFile... I really didn't move to .NET to have to deal with registration of COM components. Why oh why the MS guys do not provide alternative .NET assemblies?!
Best Wishes,
Michael
 
