I'm using a subprocess call to untar a file from the command line. I need to stream the output of that call into a temp file so I can read the contents of the "+CONTENTS" entry within the tgz file.

My failed output is:

./streamContents.py
rsh: ftp: No address associated with hostname
tar (child): ftp://myftpserver.com/pkgsrc/doxygen_pkgs/test.tgz: Cannot open: Input/output error
tar (child): Error is not recoverable: exiting now

gzip: stdin: unexpected end of file
tar: Child returned status 2
tar: Error exit delayed from previous errors
Traceback (most recent call last):
  File "./streamContents.py", line 29, in <module>
    stream = proc.stdout.read(8196)
AttributeError: 'int' object has no attribute 'stdout'

#!/usr/bin/python

from io import BytesIO
import urllib2
import tarfile
import ftplib
import socket
import threading
import subprocess

tarfile_url = "ftp://myftpserver.com/pkgsrc/doxygen_pkgs/test.tgz"

try:
    ftpstream = urllib2.urlopen(tarfile_url)
except URLerror, e:
    print "URL timeout"
except socket.timeout:
    print "Socket timeout"


# BytesIO creates an in-memory temporary file.
tmpfile = BytesIO()
last_size = 0
tfile_extract = ""

while True:
    proc = subprocess.call(['tar','-xzvf', tarfile_url], stdout=subprocess.PIPE)
    # Download a piece of the file from the ftp connection
    stream = proc.stdout.read(8196)
    if not stream: break
    tmpfile.write(bytes(stream))
    # Seeking back to the beginning of the temporary file.
    tmpfile.seek(0)
    # r|gz forbids seeking backward; r:gz allows seeking backward
    try:
       tfile = tarfile.open(fileobj=tmpfile, mode="r:gz")
       print tfile.extractfile("+CONTENTS")
       tfile_extract_text = tfile_extract.read()
       print tfile_extract.tell()
       tfile.close()
       if tfile_extract.tell() > 0 and tfile_extract.tell() == last_size:
          print tfile_extract_text
          break
       else:
          last_size = tfile_extract.tell()
    except Exception:
       tfile.close()
       pass


tfile_extract_text = tfile_extract.read()
print tfile_extract_text

# When you're done:
tfile.close()
tmpfile.close()
Why are you repeatedly calling tar in the loop? - Padraic Cunningham
Sorry, I didn't catch that, I see that is part of my problem. Initially I was trying to stream directly from the tar file. The tarfile module would not let me stream directly from it because it needs to build the index prior to it letting me stream. - digitalbyte
Also, running tar on the ftp URL seems wrong. You need to save the file to disk and run tar on the local file. - vikramls
I have stopped opening tarfile_url directly and now pass the ftpstream variable instead: proc = subprocess.call(['tar','-xzvf', ftpstream], stdout=subprocess.PIPE). So, if I follow what @vikramls is saying correctly, I'm still trying to untar the file on the FTP server, right? - digitalbyte
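
A side note on the AttributeError in the traceback: subprocess.call waits for the command to finish and returns its exit status as an int, so there is no stdout attribute to read from. A minimal sketch of the difference, written for Python 3 and using a hypothetical child command (a Python one-liner) in place of tar so it runs anywhere:

```python
import subprocess
import sys

# Hypothetical stand-in for the tar command: a child process that prints one line.
cmd = [sys.executable, "-c", "print('hello from child')"]

# subprocess.call runs the command to completion and returns its exit status (an int).
status = subprocess.call(cmd, stdout=subprocess.DEVNULL)

# subprocess.Popen returns a process object; with stdout=PIPE its output can be read.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output, _ = proc.communicate()  # reads stdout until EOF and waits for the child to exit
```

This only explains the traceback; tar itself still cannot open an ftp:// URL as a local archive, as the comments point out.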

1 Answer

Expanding on my comment above: you need to download the tar file with urllib2 into a temporary file (created with tempfile), and then open that temporary file with tarfile.

Here's some code to get started:

import urllib2
import tarfile
from tempfile import TemporaryFile

f_url = 'url_of_your_tar_archive'
ftpstream = urllib2.urlopen(f_url)
tmpfile = TemporaryFile()

# Download contents of tar to a temporary file
while True:
    s = ftpstream.read(16384)
    if not s:
        break
    tmpfile.write(s)
ftpstream.close()

# Access the temporary file to extract the file you need
tmpfile.seek(0)
tfile = tarfile.open(fileobj=tmpfile, mode='r:gz')
print tfile.getnames()
contents = tfile.extractfile("+CONTENTS").read()
print contents
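
For anyone on Python 3, the same idea can be sketched roughly as below. This is an illustrative variant, not the code above: read_member and make_demo_archive are hypothetical helper names, and the demo builds a tiny archive in memory (with made-up contents) instead of contacting an FTP server.

```python
import io
import shutil
import tarfile
import tempfile


def read_member(stream, member_name, chunk_size=16384):
    """Copy a gzipped tar stream to a temp file and return one member's bytes."""
    with tempfile.TemporaryFile() as tmp:
        # Download step: copy the stream to disk in chunks.
        shutil.copyfileobj(stream, tmp, chunk_size)
        # Rewind so tarfile reads from the start of the archive.
        tmp.seek(0)
        with tarfile.open(fileobj=tmp, mode="r:gz") as archive:
            member = archive.extractfile(member_name)
            if member is None:
                # extractfile returns None for entries that are not regular files.
                raise ValueError("%s is not a regular file" % member_name)
            return member.read()


def make_demo_archive():
    """Build a small .tgz in memory containing a made-up "+CONTENTS" entry."""
    data = b"@name demo-1.0\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="+CONTENTS")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


contents = read_member(make_demo_archive(), "+CONTENTS")
print(contents.decode())
```

With a real archive, the file-like object returned by urllib.request.urlopen(url) should be usable as the stream argument, assuming the server is reachable and the URL points at a gzipped tar file.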