
I am using lxml to fetch the text inside tags, and I am doing it this way:

    xpaths_for_questions_lxml = []
    for tag in self.tree.iter():
        try:
            if tag.text and utils.is_question(tag.text.strip()):
                xpaths_for_questions_lxml.append(self.tree.getpath(tag))
        except Exception:
            # Log the full traceback, then re-raise the original exception
            self.logger.debug(traceback.format_exc())
            raise

utils.is_question returns True if the statement contains a question mark.
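
For anyone trying to reproduce this, a minimal stand-in for the helper could look like the sketch below. The real utils.is_question is not shown here, so this body is only an assumption based on the description above.

```python
def is_question(text: str) -> bool:
    """Hypothetical stand-in for utils.is_question (assumed behavior):
    treat any string that contains a question mark as a question."""
    return '?' in text
```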

But when the tag is a label, tag.text is empty: it shows no text even though the label tag does contain text on the actual web page.

Why does the label tag not show any text content? Is anything additional needed to fetch text from label tags?

EDIT1: My question is: I am iterating through all the children in the DOM tree, so why does the text inside the label not show up?

Which element exactly do you need to handle? - Andersson
Any question on the page. - Satyaaditya

1 Answer


If you want to get the questions, you can try:

import requests
from lxml import html

r = requests.get('https://www.amctheatres.com/faqs/movie-info')
source = html.fromstring(r.text)
questions = source.xpath('//label[@itemprop="text"]/text()')

or

questions = [label.text_content() for label in source.xpath('//label[@itemprop="text"]')]

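Note that label.text_content() should be used instead of label.text, because each label node contains more than one child text node: label.text returns only the text that comes before the label's first child element, while text_content() returns all of the text inside the label.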

print(questions)
#['Does the runtime shown for each movie include trailers?', 'Where can I find MPAA movie ratings information?', 'What does advertised showtime mean?', 'What movies are playing right now at AMC?', 'What movies are coming soon to AMC?', 'How can I find movie times at AMC?']
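
To see why tag.text comes back empty, here is a small self-contained sketch. The markup is made up for illustration (it is not copied from the AMC page), and is_question is a hypothetical stand-in for the helper in the question. When the visible text follows a child element, lxml stores it on that child's .tail, so label.text is None. The last part shows one possible way to adapt the loop from the question: check each element's own text nodes via the XPath text() step instead of tag.text.

```python
from lxml import html

# Made-up markup: the label's text comes AFTER a child <span>.
doc = html.fromstring(
    '<html><body><div><label itemprop="text">'
    '<span class="icon"></span>Where can I find ratings?'
    '</label></div></body></html>'
)
label = doc.xpath('//label[@itemprop="text"]')[0]

direct_text = label.text          # only the text before the first child
full_text = label.text_content()  # all text inside the label
print(repr(direct_text), repr(full_text))


def is_question(text):
    # Hypothetical stand-in for utils.is_question from the question.
    return '?' in text


tree = doc.getroottree()
question_paths = []
for el in tree.iter():
    if not isinstance(el.tag, str):
        continue  # skip comments and processing instructions
    # text() selects the element's own text nodes: el.text plus the
    # tails of its children, but not text nested deeper down.
    own_text = ''.join(el.xpath('text()')).strip()
    if own_text and is_question(own_text):
        question_paths.append(tree.getpath(el))
print(question_paths)
```

Using the element's own text nodes rather than text_content() inside the loop avoids also matching every ancestor (div, body, html) of the label, since text_content() includes the text of all descendants.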