I've only been using Python for a few days and am writing a program that takes a CSV file and a TXT file from the command line. Each row of the CSV file contains a name followed by numbers, each of which is the longest run of a particular type of DNA (for example: John, 26, 45, 23). The TXT file contains a DNA sequence, which is basically a string of characters. I need to read the CSV file into a list of lists and the TXT file into a variable. A function I'm given returns an int for the longest run of a given strand. I call that function three times to get the three ints to compare. Currently, when I run the program below, it just sits for a while and then says: killed.

import csv
import sys


def main():

    # TODO: Check for command-line usage
    if len(sys.argv) != 3:
        print("Missing Files!")
        sys.exit(1)
    # TODO: Read database file into a variable
    dnalist = []
    with open(sys.argv[1], "r", newline = ''):
        dnalist = list(csv.reader(sys.argv[1]))
        for row_list in dnalist:
            dnalist.append(row_list)

    # TODO: Read DNA sequence file into a variable
    with open(sys.argv[2], "r") as f:
        sequence = f.read()

    # TODO: Find longest match of each STR in DNA sequence
    chdnalist = []
    subsequence = "AGAT"
    chdnalist[0] = longest_match(sequence, subsequence)
    subsequence = "AATG"
    chdnalist[1] = longest_match(sequence, subsequence)
    subsequence = "TATC"
    chdnalist[2] = longest_match(sequence, subsequence)

    # TODO: Check database for matching profiles
    for i in range (0, len(dnalist)):
        if dnalist[i][1] == chdnalist[0] and dnalist[i][2] == chdnalist[1] and dnalist[i][3] == chdnalist[2]:
            print (dnalist[i][0])
            return
    print ("No Match")
    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
When Python exits and prints KILLED, it generally means your program ran out of memory for whatever it was attempting to compute. - h0r53
How large is that text file? - Matiiss
The TXT file is 161 characters. The CSV file I'm currently using is 4 lines, and the bigger one, which I haven't tried yet, is 24. - Scout_vet
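
Given how small the input files are, the memory exhaustion most likely comes from the loop that calls dnalist.append(row_list) while iterating over dnalist itself: the list grows on every pass, so the loop never finishes. Separately, csv.reader(sys.argv[1]) is handed the filename string instead of the open file, so it iterates over the characters of the name and not the rows of the file. Below is a minimal sketch of one way to read and compare the data. The helper names read_database and find_match are hypothetical and not part of the original program, and the sketch assumes every column after the name holds an integer count.

```python
import csv


def read_database(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, "r", newline="") as f:
        # Pass the open file object, not the filename string, to csv.reader.
        # list() already collects every row, so no extra append loop is needed.
        return list(csv.reader(f))


def find_match(rows, counts):
    """Return the first name whose counts equal `counts`, or None if none match."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        try:
            # csv.reader yields strings, so convert before comparing with ints
            row_counts = [int(value) for value in row[1:]]
        except ValueError:
            continue  # skip a header row or any row with non-numeric counts
        if row_counts == counts:
            return row[0]
    return None
```

For example, find_match([["John", "26", "45", "23"]], [26, 45, 23]) returns "John". In main(), the three results would also need to be collected with chdnalist.append(longest_match(sequence, subsequence)), because assigning to chdnalist[0] on an empty list raises IndexError.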