1
votes

I'm trying to extract data from multiple tables in a Word document. When I try to convert the data in the tables to text, I get an error. The ConvertToText method has two optional parameters (how to separate the data, and a boolean). Here is my current code:

#!/usr/bin/perl
#OLEWord.pl

#Use strict and print warnings
use strict;use warnings;
#Using OLE + OLE constants for Variants and OLE enumeration for Enumerations
use Win32::OLE qw(in);
use Win32::OLE::Const 'Microsoft Word';
use Win32::OLE::Variant;

my $var1 = Win32::OLE::Variant->new(VT_BOOL, 'true');

$Win32::OLE::Warn = 3;

#set the file to be opened
my $file = 'C:\work\SCL_International Financial New Fund Setup Questionnaire V1.6.docx';

#Create a new instance of Win32::OLE for the Word application, die if could not open the application
my $MSWord = Win32::OLE->GetActiveObject('Word.Application') || Win32::OLE->new('Word.Application','Quit');

#Set the screen to Visible, so that you can see what is going on
$MSWord->{'Visible'} = 1;
$MSWord->{'DisplayAlerts'} = 0; #Suppress alerts, such as 'Save As....'

#open the request file or die and print warning message
my $Doc = $MSWord->{'Documents'}->Open($file) or die "Could not open ", $file, " Error:", Win32::OLE->LastError();

#$MSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx', 
                            #FileFormat => wdFormatDocument});

my $tables = $MSWord->ActiveDocument->{'Tables'};

for my $table (in $tables){
   my $tableText = $table->ConverToText(wdSeparateByParagraphs,$var1);
   print "Table: ", $tableText, "\n";
}


$MSWord->ActiveDocument->Close;
$MSWord->Quit;

and I'm getting this error:

Bareword "VT_BOOL" not allowed while "strict subs" in use at OLEWord.pl line 31
Bareword "true" not allowed while "strict subs" in use at OLEWord.pl line 31
Execution of OLEWord.pl aborted due to compilation errors.

4 Answers

3
votes

When names like VT_BOOL are not defined as constants, Perl treats them as barewords. Others have already explained what barewords are.

The root cause of your problem is that the constants exported by the Win32::OLE::Variant module are missing. Add:

use Win32::OLE::Variant;

to your script to remove the first error. The second error is a similar problem: true is not defined either. Replace it with 1, or define the constant yourself with:

use constant true => 1;
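
To see the difference a definition makes, here is a minimal sketch that runs on any platform. It assumes Win32::OLE::Variant is not loaded, so VT_BOOL stays undefined; try_snippet is a hypothetical helper that compiles a piece of code under strict and returns the error text.

```perl
use strict;
use warnings;

# Defining the constant turns the bareword "true" into a real sub call.
use constant true => 1;

# Hypothetical helper: compile and run a snippet under strict,
# returning the error text ('' on success).
sub try_snippet {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? '' : $@;
}

# VT_BOOL is undefined here because no module exporting it is loaded.
my $err_undefined = try_snippet('my $type = VT_BOOL');
# "true" is a constant in this package, so this compiles.
my $err_defined   = try_snippet('my $flag = true');

print "VT_BOOL: ", ($err_undefined ? "failed: $err_undefined" : "ok\n");
print "true:    ", ($err_defined   ? "failed: $err_defined"   : "ok\n");
```

The first snippet fails with the same "Bareword ... not allowed" message as in the question, while the second compiles because the constant exists.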

Edit: Here is an example of extracting the table text:

my $tables = $MSWord->ActiveDocument->{'Tables'};
for my $table (in $tables){
   my $tableText = $table->ConvertToText({ Separator => wdSeparateByTabs });
   print "Table: ", $tableText->Text(), "\n";
}

Your code had a typo in the method name (ConverToText). Also, the method returns a Range object, so you have to call its Text method to get the actual text.
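
Once you have the text, you will probably want to split it back into rows and cells. This is a rough sketch only: the sample string is made up, and it assumes that with wdSeparateByTabs each table row comes back as one paragraph (ending in "\r") with tabs between the cells.

```perl
use strict;
use warnings;

# Hypothetical sample of what $tableText->Text() might return for a 2x3 table
my $text = "Name\tQty\tPrice\rApple\t3\t1.20\r";

# One row per paragraph mark, one cell per tab; -1 keeps trailing empty cells
my @rows = map  { [ split /\t/, $_, -1 ] }
           grep { length }
           split /\r/, $text;

for my $row (@rows) {
    print join(' | ', @$row), "\n";
}
```

Cells that themselves contain tabs or paragraph marks would break this split, which is one reason to read cells individually instead.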

2
votes

A 'Bareword' error is caused by a syntax error in your code. A 'runaway multi-line' usually pinpoints where the start of the error is, and usually means that a line has not been completed, often because of mismatched brackets or quote marks.

As has been pointed out by several SO-ers, that doesn't look like Perl! The Perl interpreter is balking on a syntax error because it doesn't speak that particular language! Source

Without "use strict" you will not get this error. (But you should keep it enabled if you want good code.)
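
A small sketch of why strict is worth keeping. It assumes nothing defines VT_BOOL in the file, which is the situation in the question when the exporting module is not loaded.

```perl
use warnings;
# Deliberately no "use strict", to show what it would have caught.

# VT_BOOL is not defined anywhere in this file, so Perl falls back to
# treating the bareword as the plain string 'VT_BOOL'.
my $type = VT_BOOL;

print "type = '$type'\n";
print "looks like a number? ", ($type =~ /^\d+$/ ? "yes" : "no"), "\n";
```

The script compiles, but $type holds a string rather than the numeric constant the OLE call expects, so the problem is only hidden until later.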

Read about barewords so that you understand what they are and can correct this kind of error yourself.

Here are some links for reading about barewords: 1. perl.com 2. alumnus

1
votes

Removing "use strict" will remove the "Bareword" errors.

0
votes

Extract all the tables in the doc into a single xls file:

     sub doParseDoc {

           my $msg     = '' ; 
           my $ret     = 1 ; # assume failure at the beginning ...

           $msg        = 'START --- doParseDoc' ; 
           $objLogger->LogDebugMsg( $msg );
           $msg        = 'using the following DocFile: "' . $DocFile . '"' ; 
           $objLogger->LogInfoMsg( $msg );
           #-----------------------------------------------------------------------
           #Using OLE + OLE constants for Variants and OLE enumeration for Enumerations


           # Create a new Excel workbook
           my $objWorkBook = Spreadsheet::WriteExcel->new("$DocFile" . '.xls');

           # Add a worksheet
           my $objWorkSheet = $objWorkBook->add_worksheet();


           my $var1 = Win32::OLE::Variant->new(VT_BOOL, 'true');

           Win32::OLE->Option(Warn => \&Carp::croak);
           use constant true => 1;

           # at this point you should have the Word application opened in UI with
           # the DocFile
           # build the MS Word object during run-time 
           # use || here: with "or" the fallback result would never be assigned
           my $objMSWord = Win32::OLE->GetActiveObject('Word.Application')
                             || Win32::OLE->new('Word.Application', 'Quit');  

           # build the doc object during run-time 
           my $objDoc   = $objMSWord->Documents->Open($DocFile)
                 or die "Could not open ", $DocFile, " Error:", Win32::OLE->LastError();

           #Set the screen to Visible, so that you can see what is going on
           $objMSWord->{'Visible'} = 1;
           # try NOT printing directly to the file


            #$objMSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx', 
                                        #FileFormat => wdFormatDocument});

           my $tables        = $objMSWord->ActiveDocument->Tables();
           my $tableText     = '' ;   
           my $xlsRow        = 1 ; 

           for my $table (in $tables){
              # extract the table text as a single string
              #$tableText = $table->ConvertToText({ Separator => 'wdSeparateByTabs' });
              # cheated those properties from here: 
              # https://msdn.microsoft.com/en-us/library/aa537149(v=office.11).aspx#officewordautomatingtablesdata_populateatablewithdata
              my $RowsCount = $table->{'Rows'}->{'Count'} ; 
              my $ColsCount = $table->{'Columns'}->{'Count'} ; 

              # discard tables that do not have exactly 5 columns
              next unless ( $ColsCount == 5 ) ;

              $msg           = "Rows Count: $RowsCount " ; 
              $msg           .= "Cols Count: $ColsCount " ; 
              $objLogger->LogDebugMsg ( $msg ) ; 

              #my $tableRange = $table->ConvertToText({ Separator => '##' });
              # OBS !!! simple print WILL print to your doc file use Select ?!
              #$objLogger->LogDebugMsg ( $tableRange . "\n" );
              # Word's Cell(row, column) indices start at 1, not 0
              # (start $row at 2 instead to skip a header row)
              foreach my $row ( 1..$RowsCount ) {
                 foreach my $col ( 1..$ColsCount ) {

                    # nope ... $table->cell($row,$col)->->{'WrapText'} = 1 ; 
                    # nope $table->cell($row,$col)->{'WordWrap'} = 1  ;
                    # so so $table->cell($row,$col)->WordWrap() ; 

                    my $txt = ''; 
                    # well some 1% of the values are so nasty that we really give up on them ... 
                    eval {
                       $txt = $table->cell($row,$col)->range->{'Text'}; 
                       #replace all the ctrl chars by space
                       $txt =~ s/\r/ /g   ; 
                       $txt =~ s/[^\040-\176]/ /g  ; 
                       # perform some cleansing - ColName<primary key>=> ColName
                       #$txt =~ s#^(.[a-zA-Z_0-9]*)(\<.*)#$1#g ; 

                       # this will most probably break your cmd ... 
                       # $objLogger->LogDebugMsg ( "row: $row , col: $col with txt: $txt \n" ) ; 
                       1 ; # make the eval return true on success, whatever the last regex returned
                    } or $txt = 'N/A' ; 

                    # Write a formatted and unformatted string, row and column notation.
                    $objWorkSheet->write($xlsRow, $col, $txt);

                 } #eof foreach col

                 # we just want to dump all the tables into the one sheet
                 $xlsRow++ ; 
               } #eof foreach row
               sleep 1 ; 
           }  #eof foreach table

           # close the opened in the UI document
           $objMSWord->ActiveDocument->Close;

           # OBS !!! now we are able to print 
           $objLogger->LogDebugMsg ( $tableText . "\n" );

           # exit the whole Word application
           $objMSWord->Quit;

           $ret = 0 ; # reached the end without dying, so report success
           return ( $ret , $msg ) ; 
     }
     #eof sub doParseDoc
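
The cleanup step inside the eval can be tried out without Word. The following is only a sketch: clean_cell_text is a hypothetical helper name, the sample string assumes the cell text ends with "\r" plus an end-of-cell marker (\x07), and the trailing-space trim is an extra step that the sub above does not do.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the regexes used in doParseDoc
sub clean_cell_text {
    my ($txt) = @_;
    return 'N/A' unless defined $txt;
    $txt =~ s/\r/ /g;               # paragraph marks become spaces
    $txt =~ s/[^\040-\176]/ /g;     # anything outside printable ASCII becomes a space
    $txt =~ s/\s+$//;               # extra step (not in the original): trim trailing spaces
    return $txt;
}

# Made-up cell content ending with "\r" and an end-of-cell marker
my $raw = "Fund name\r\x07";
print '[', clean_cell_text($raw), "]\n";   # prints [Fund name]
```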