This is my first attempt at writing something with argparse and I am really lost. The goal of this script is to read file.sdf and then write it back out as file2.sdf.

This is my script:

import argparse
import rdkit as rdkit
from rdkit.Chem import PandasTools
import pandas as pd

parser = argparse.ArgumentParser(description='This is a work in progress')
parser.add_argument("-i", "--input", help="path to input sdf file")
parser.add_argument("-o", "--output", help="path to output sdf file")
args = parser.parse_args()

df = rdkit.Chem.PandasTools.LoadSDF(r"args.input") 
PandasTools.WriteSDF(df, r"args.output", properties=list(df.columns))

When I run this script like this:

python script.py --input file.sdf --output file2.sdf

I get this error:

File "C:\Users\lkv206\Anaconda3\envs\rdkit\lib\site-packages\rdkit\Chem\PandasTools.py", line 456, in LoadSDF
    f = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'args.input'

If I open and run a Jupyter notebook with this code:

import rdkit as rdkit
from rdkit.Chem import PandasTools
import pandas as pd

df = rdkit.Chem.PandasTools.LoadSDF(r"file.sdf")
PandasTools.WriteSDF(df, r"file2.sdf", properties=list(df.columns))

it successfully gives me the desired output, file2.sdf.

So it seems like the code works without argparse, but I can't get it to work with argparse. I'm guessing I did something wrong in

parser.add_argument

or how I called it later.

I was working off this tutorial: https://www.youtube.com/watch?v=cdblJqEUDNo&ab_channel=JohnnyMetz and can't understand where I went wrong.

Do you understand the difference between args.input and r"args.input"? - chepner
Why did you think you need to quote args.input? - Barmar
@chepner No. I needed the quotes in the Jupyter notebook, so I figured I'd need them here as well. In the notebook version I originally had just ("file.sdf"), but that didn't work and gave me this error: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape. Somewhere I saw a similar issue that was solved with (r"file.sdf"). I'm not sure why that works either, something about string literals? I figured args.input was a string variable that just inserted the path, so I would still need to quote it and add the r. - DK_chemistry
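
To make the distinction raised in the comments concrete, here is a minimal sketch. The r prefix only changes how backslashes inside a string literal are read (which is what avoids the \U escape error on Windows paths); it does not look up a variable. The path below is hypothetical, and argparse.Namespace is used only to stand in for the object that parse_args() returns.

```python
import argparse

# The r prefix keeps backslashes literal, so these two spellings are the same string.
windows_path_raw = r"C:\Users\me\file.sdf"        # hypothetical path
windows_path_escaped = "C:\\Users\\me\\file.sdf"  # same path with escaped backslashes
same_path = windows_path_raw == windows_path_escaped

# Stand-in for the result of parser.parse_args() (assumed values).
args = argparse.Namespace(input="file.sdf", output="file2.sdf")

# Quoting the expression gives the text of the expression, not its value.
literal_text = r"args.input"   # a 10-character string that happens to look like code
attribute_value = args.input   # the path the user actually passed

print(same_path)
print(repr(literal_text))
print(repr(attribute_value))
```

So quotes (with or without r) are only needed when typing a path directly into the source; a variable or attribute that already holds the path is passed as-is.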

1 Answer

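args is a Namespace object, and args.input is already a string holding the path you passed on the command line. Writing r"args.input" passes the literal text args.input as the filename, which is why the error says No such file or directory: 'args.input'. Drop the quotes and pass the attributes directly: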

import argparse

from rdkit.Chem import PandasTools

parser = argparse.ArgumentParser(description='This is a work in progress')
parser.add_argument("-i", "--input", help="path to input sdf file")
parser.add_argument("-o", "--output", help="path to output sdf file")
args = parser.parse_args()

# args.input and args.output are plain strings; pass them without quotes
df = PandasTools.LoadSDF(args.input)
PandasTools.WriteSDF(df, args.output, properties=list(df.columns))
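
If it helps to check what argparse hands back without involving rdkit, the following sketch parses an explicit argument list in place of the real command line. The build_parser helper and the required=True setting are my own additions for illustration, not part of the original script; required=True just makes argparse exit with a usage message when a path is missing, instead of leaving the attribute as None.

```python
import argparse


def build_parser():
    """Build the same two-option parser as the script (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="This is a work in progress")
    # required=True is an assumption added here so a missing path fails early
    parser.add_argument("-i", "--input", required=True, help="path to input sdf file")
    parser.add_argument("-o", "--output", required=True, help="path to output sdf file")
    return parser


# An explicit list stands in for: python script.py --input file.sdf --output file2.sdf
args = build_parser().parse_args(["--input", "file.sdf", "--output", "file2.sdf"])

# Each option becomes an attribute holding an ordinary str
print(type(args.input).__name__, args.input, args.output)
```

Printing the attributes like this before calling LoadSDF is a quick way to confirm that the paths arrive as expected.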