The Problem

I would like to pass the 'value' variable from my dictionary (created from a simple CSV file) into a sed call made with subprocess in Python. The problem is that I get this error:

sed: -e expression #1, char 1: unknown command: `''

When I run the following script:

import sys
import subprocess

speciesdictfile = open("speciesfiletest.csv",'r')

file = sys.argv[1]

dict = {}

for line in speciesdictfile:
    fields = line.split(',')
    dict[fields[0]] = fields[1]


for line in file:
    for key, value in dict.items():
        if file == key:
            subprocess.call(["sed", "'s/>/>" + value + "_/g'", file])

and when I try this instead:

subprocess.call(['sed', 's/>/>' + value + '_/g', file])

I get the following error:

sed: -e expression #1, char 30: unterminated `s' command

Example input

Dictionary CSV file:

file,Species
GCF_000006175.1_ASM617v2_genomic.faa,Methanococcus voltae
GCF_000006805.1_ASM680v1_genomic.faa,Halobacterium sp.

The file I want to search and replace in, for example with a filename of GCF_000006175.1_ASM617v2_genomic.faa:

>NZ_LT985082.1_1_1
EQVWKSIKKYMAYYLFDTIEFMEKLFEKEFYRIVNRDSYYKNWISKFIMIN*
>NZ_LT985082.1_2_1
MKFNISKLWNPTGFFISFFMSFLMPIMFAVPFGYIPIDIFLYQQLIRWPVAYFIVTLIVI
PISLYLAKSFFTFPPTDRFFNPVTFFISLQMSFIMPFLLGYGFGSMSLNILFLMWPMRWV
VAYFMVNFAIRPLSISLARIVFNVEPQHLIIKF*

Desired output

A working sed command that replaces the '>' on each header line with '>' followed by the value variable (with its spaces replaced by underscores) and an underscore, such as this:

>Methanococcus_voltae_NZ_LT985082.1_1_1
EQVWKSIKKYMAYYLFDTIEFMEKLFEKEFYRIVNRDSYYKNWISKFIMIN*
>Methanococcus_voltae_NZ_LT985082.1_2_1
MKFNISKLWNPTGFFISFFMSFLMPIMFAVPFGYIPIDIFLYQQLIRWPVAYFIVTLIVI
PISLYLAKSFFTFPPTDRFFNPVTFFISLQMSFIMPFLLGYGFGSMSLNILFLMWPMRWV
VAYFMVNFAIRPLSISLARIVFNVEPQHLIIKF*
Comment from Tom Lubenow: try shell=True on the subprocess.call()

1 Answer

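The problem was that each value read from the CSV file still ended with a newline character. That newline ended up inside the sed expression and produced the unterminated `s' command error. I solved it with: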

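import sys
import subprocess

file = sys.argv[1]

dict = {}

# Map each filename in the CSV to its species name.
with open("speciesfiletest.csv", 'r') as speciesdictfile:
    for line in speciesdictfile:
        fields = line.rstrip().split(',')
        dict[fields[0]] = fields[1]

# Run sed once if the file named on the command line is in the CSV.
# (Looping over `file` would iterate over the characters of the filename.)
if file in dict:
    value = dict[file]
    subprocess.call("sed -e 's/>/>" + value + "_/g' " + file, shell=True)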

The line

fields = line.rstrip().split(',')

This stops the newline characters being stored in the dictionary, which allows the values to be used in the subprocess.call sed command.
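For comparison, here is a minimal sketch of the same idea without shell=True. It assumes the first error in the question came from the extra single quotes: when the arguments are passed as a list, no shell strips those quotes, so sed receives them as part of the expression. The helper names load_species, sed_args and relabel are made up for illustration, and the sketch also replaces spaces with underscores to match the desired output. It has not been checked against species names containing '/' or '&', which sed would treat specially.

```python
import subprocess


def load_species(lines):
    """Map filename -> species label from CSV lines (hypothetical helper)."""
    species = {}
    for line in lines:
        # rstrip() removes the trailing newline that broke the sed expression.
        fields = line.rstrip().split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Replace spaces so the label matches the desired output format.
        species[fields[0]] = fields[1].replace(' ', '_')
    return species


def sed_args(label, path):
    """Build the argument list for sed (hypothetical helper)."""
    # No inner quotes: without a shell, each list item reaches sed verbatim.
    return ["sed", "-e", "s/>/>" + label + "_/g", path]


def relabel(path, species):
    """Run sed once on path if it is listed in the CSV; return sed's exit code."""
    if path in species:
        return subprocess.call(sed_args(species[path], path))
    return None
```

It could be used as relabel(sys.argv[1], load_species(open("speciesfiletest.csv"))), which prints the relabelled sequences to standard output just as the shell=True version does.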