
I believe I have a general Perl problem rather than an LWP::UserAgent problem; however, it's somewhat complex.

The task is to write a test-script that does a SWORD deposit. I create tests by first writing code to prove the thing works, then add in the Test::More wrappers to make it a test.

BACKGROUND

A SWORD deposit is simply an HTTP POST request with a set of defined headers, where the body is the thing to be ingested. This all works fine: I can perform the actions through curl, and I've written scripts to do this, but only within a larger application environment (that'll be EPrints).

CODE

My problem, I believe, comes when I try to attach the contents of the file on the disk.

#!/home/cpan/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
##use WWW::Mechanize;
use File::Slurp;
use MIME::Base64;

my $auth     = 'username:password';
my $domain   = 'devel.example.com';

my $ua = LWP::UserAgent->new();

my $basedir = "./test_files";

my $package  = 'http://opendepot.org/europePMC/2.0';
my $filename = "$basedir/PMC165035.zip";
my $mime     = 'application/zip';
print "filename: $filename\n";
my $deposit_url = $domain . '/sword-app/deposit/archive';

my $file = read_file( $filename, { binmode => ':raw' } );

# Set up the SWORD deposit
my $autho = "Basic " . MIME::Base64::encode( $auth, '' );

my %headers = (
  'X-Packaging'         => $package,
  'X-No-Op'             => 'false',
  'X-Verbose'           => 'true',
  'Content-Disposition' => "filename=$filename",
  'Content-Type'        => $mime,
  'User-Agent'          => 'Broker Test Harness',
  'Authorization'       => $autho,
); 
my $r = $ua->post( $deposit_url, %headers, Content => $file );

# DEBUG TEST
write_file('foo.zip', $file);

my $ret = $r->decoded_content;
print "Content: $ret\n";
if ( $r->is_success ) { print "Deposited $package successfully" }

WHAT WORKS, WHAT DOESN'T

This code is lifted pretty much directly from working code I have - the only difference is that the working code gets the content for $file via an object-call within EPrints.

I know the file exists on the disk: if I do an ls -l on the filename printed, I can see the file, and it's readable.

In the code above, there is a line write_file('foo.zip', $file); - that writes a file which unzip -l foo.zip happily tells me has 3 files in it.

The line print "Content: $ret\n"; should print an Atom response, but for me it prints nothing. The access log reports an error 500, but there is nothing at all in the error log.

The help

What I need to know is how I get the actual contents of the .zip file into the content part of the LWP::UserAgent post request...

(I'm going to spend much time now trying to dig into EPrints, to track where the error 500 is coming from and why nothing appears in the log file, but that's probably going to be down to an issue with what's been posted.)

There is no protocol (http://) at the start of $deposit_url – Borodin
You also need binmode => ':raw' in your call to write_file – Borodin
It is best to use unzip -t foo.zip to verify your zip file. Alternatively, if you're just copying one zip file to another then why not use File::Copy? – Borodin

1 Answer


The solution lies in realizing what LWP POST is doing.

my $filename = "$basedir/PMC165035.zip";
my $file = read_file( $filename, { binmode => ':raw' } );
my %headers = (
  'X-Packaging'         => $package,
  'X-No-Op'             => 'false',
  'X-Verbose'           => 'true',
  'Content-Disposition' => "filename=$filename",
  'Content-Type'        => $mime,
  'User-Agent'          => 'Broker Test Harness',
  'Authorization'       => $autho,
); 

All of this works by setting $filename to something like /home/services/foo/testing/test_files/PMC165035.zip, and passing this full path to the server (example.com) in the Content-Disposition header.

The problem is that the server is looking for a filename, not a filename with a path. The service dumps the content into its temporary upload location and then looks for ~~~temp_location/home/services/foo/testing/test_files/PMC165035.zip, which it can't find!

The solution is to read in the file from its full path, but ensure that the filename given in the headers is just the filename, with no path.
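
As a minimal sketch of that fix (not the exact code from the broker test harness), File::Basename can strip the directory part before the header is built. The path and placeholder body below are illustrative assumptions, and the http:// prefix on the URL follows Borodin's comment. The request is only constructed here, not sent, so the header can be inspected:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use HTTP::Request::Common qw(POST);

# Illustrative path, in the same shape as the one described above.
my $path = '/home/services/foo/testing/test_files/PMC165035.zip';

# Keep only the last path component for the header.
my $name = basename($path);

# Placeholder body; in the real script this is the slurped zip content.
my $body = 'zip bytes would go here';

# Build (but do not send) the request. $ua->post takes the same arguments.
my $req = POST(
    'http://devel.example.com/sword-app/deposit/archive',
    'Content-Type'        => 'application/zip',
    'Content-Disposition' => "filename=$name",
    Content               => $body,
);

print $req->header('Content-Disposition'), "\n";
```

In the original script, the equivalent change would be to keep read_file( $filename, ... ) as it is and use basename($filename) only in the 'Content-Disposition' header.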