
I'm new to Python, and here is my question: Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line, check to see if the word is already in the list and, if not, append it to the list. When the program completes, sort and print the resulting words in alphabetical order.

This is the file:

But soft what light through yonder window breaks

It is the east and Juliet is the sun

Arise fair sun and kill the envious moon

Who is already sick and pale with grief

Desired Output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

This is my code:

fname = raw_input("Enter file name: ")
fh = open(fname)

lst = list()

#loop through the text to get the lines
for line in fh:

    line = line.rstrip()

    #loop through the line to get the words       
    for word in line:
        words = line.split()

        #if a word is not in the empty list, append it       
        if not word in lst: lst.append(word)

lst.sort()
print lst

My output:

[' ', 'A', 'B', 'I', 'J', 'W', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w', 'y']

Could you tell me what is wrong? Instead of whole words, I only get single characters (with a space at the beginning of the list).

Note: I would like a solution that uses only these instructions, not more advanced ones, so that I can keep to my learning sequence.

Thank you

When you write for word in line:, Python loops over what it considers the "elements" of the string line; that is, it loops through the string character by character. So word is always a single character. (Note that there is some confusion here anyway: you define words but never use it.) - Mees de Vries
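To see this concretely, here is a small sketch (assuming Python 3) that runs the question's inner loop over an in-memory copy of the four lines instead of reading romeo.txt. Because for word in line walks the string one character at a time, the list it builds is a list of distinct characters, which reproduces the output shown in the question.

```python
# The four lines of romeo.txt, held in memory for illustration (no file needed)
lines = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

lst = []
for line in lines:
    for word in line:          # iterating over a string yields single characters
        if word not in lst:
            lst.append(word)
lst.sort()
print(lst)                     # distinct characters, starting with ' '

# Iterating over line.split() instead yields whole words
print("But soft what".split())  # ['But', 'soft', 'what']
```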

4 Answers


You should call split() on the line and loop over the list it returns, not over the line itself.

words = []
for line in fh:
    result = line.split()  # this returns a list of words
    for word in result:
        # check if it is already in your list of words
        if word not in words:
            words.append(word)
words.sort()

Let's take the code line by line:

for line in fh:
    line = line.rstrip()

So now our first line contains "But soft what light through yonder window breaks", and it's a string.

for word in line:

Ah, but now we've said: "Let's iterate over line (a string) and let word be each part of it as we go through the loop." But line is a complete string! When you iterate over a string like this, you get one character at a time, which is what you're seeing in your results.

Instead, drop that for loop, split the line as you were already doing, and loop over the resulting list of words:

for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
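Putting that loop together with the rest of the question's program might look like the sketch below. It assumes Python 3 (so input() and print() would replace the question's Python 2 raw_input and print lst), and the function name build_word_list is hypothetical, introduced only so the logic can be reused and tested.

```python
def build_word_list(fname):
    """Return the sorted list of distinct words in the file fname."""
    lst = []
    with open(fname) as fh:           # the file is closed automatically afterwards
        for line in fh:
            words = line.split()      # a list of words, not characters
            for word in words:
                if word not in lst:   # append only words not seen yet
                    lst.append(word)
    lst.sort()
    return lst
```

Usage, mirroring the question: print(build_word_list(input("Enter file name: "))). With romeo.txt this should print the desired output listed in the question.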

Try this:

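with open('test.txt') as f:
    words = []
    for line in f:
        # split() with no arguments also drops the trailing newline,
        # so blank lines simply contribute no words
        words.extend(line.split())

    # set() removes duplicates; sorted() returns them as a sorted list
    print(sorted(set(words)))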

Output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
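Two details of this answer may be worth spelling out. First, words.extend(line.split()) adds each word of the line as its own element, whereas append would add the whole list as a single element. Second, every capitalized word comes before every lowercase one because Python compares strings by the code points of their characters. A minimal illustration with made-up sample strings:

```python
# extend() adds each word of the split line as a separate element ...
words = []
words.extend("But soft".split())
print(words)   # ['But', 'soft']

# ... whereas append() adds the whole list as one element
nested = []
nested.append("But soft".split())
print(nested)  # [['But', 'soft']]

# Uppercase ASCII letters have smaller code points than lowercase ones,
# so sorted() places 'Arise' and 'But' before 'already' and 'and'
print(ord('A'), ord('Z'), ord('a'), ord('z'))      # 65 90 97 122
print(sorted(['already', 'Arise', 'But', 'and']))  # ['Arise', 'But', 'already', 'and']
```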

The one small thing you are missing is this line:

for word in line.split():    # split() gives you a list of words separated by whitespace

By doing this, you turn the line into a list of words.

Right now you are looping over a plain string, so word is a single character. Try printing word in your original loop, and again after switching to line.split(); you will get a better idea of the difference.
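One detail about split() that may help here: called with no argument, it splits on any run of whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace, so the rstrip() in the question is not strictly needed. Passing an explicit ' ' separator behaves differently. A quick check on a made-up line:

```python
line = "But soft  what\n"   # note the double space and the trailing newline

# No argument: runs of whitespace count as one separator, newline is dropped
print(line.split())         # ['But', 'soft', 'what']

# Explicit ' ': every single space splits, leaving an empty string and the '\n'
print(line.split(' '))      # ['But', 'soft', '', 'what\n']
```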

Check out this link: how to use split