I am trying to do web scraping with Python. When I try to create a DataFrame to store the extracted information, I get "ValueError: If using all scalar values, you must pass an index". I have already checked other related posts on this site and tried passing an index, as in pd.DataFrame({'trade_name': trade_name}, index=[0]), but that did not solve the problem. Please help.

import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://www.medindia.net/doctors/drug_information/abacavir.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
drug = soup.find(class_='mi-container__fluid')
print(drug)

# whole page contain drug content
items = drug.find_all(class_='report-content drug-widget')
print(items)

# extract drug information from drug content into individual variable
trade_name = print(items[0].find(class_='drug-content').get_text())
function = print(items[1].find(class_='drug-content').get_text())
Contraindications = print(items[2].find(class_='drug-content').get_text())
Dosage = print(items[3].find(class_='drug-content').get_text())
how_to_use = print(items[4].find(class_='drug-content').get_text())
warnings = print(items[5].find(class_='drug-content').get_text())
storage = print(items[7].find(class_='drug-content').get_text())


drug_stuff = pd.DataFrame(
        {
                'trade_name':trade_name,
                'function': function,
                'Contraindications': Contraindications,
                'Dosage': Dosage,
                'how_to_use':how_to_use,
                'warnings':warnings,
                'storage':storage,
                
        })


print(drug_stuff)

print() always returns None, so trade_name = print(...) works like trade_name = None. Remove print() to assign the value to the variable: trade_name = items[0].find(class_='drug-content').get_text() - furas
I have removed print(), but it still shows the same error when I try to create the DataFrame from the variables. - Shawn Teh
To create a DataFrame you have to use lists of elements, even if you have only one element: 'trade_name': [trade_name], ..., 'storage': [storage], and then it works without the error. - furas
Thanks. That solved my problem. Now I need to know how to remove the newlines in my data. - Shawn Teh
Maybe get_text(strip=True). Alternatively, text = text.replace("\n", ""). - furas

1 Answer

First: print() always returns None, so trade_name = print(...) works like trade_name = None and the variable holds nothing useful. Remove print() to assign the value to the variable:

 trade_name = items[0].find(class_='drug-content').get_text()
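
As a minimal sketch of the point above (the string "example text" and the variable names are placeholders, not values from the page), assigning the result of print() stores None, while assigning the expression itself stores the text:

```python
# print() writes its argument to stdout and returns None,
# so the variable on the left ends up holding None.
value_from_print = print("example text")

# Assigning the expression directly keeps the actual string.
value_direct = "example text"

print(value_from_print is None)  # the first variable is None
print(value_direct)
```

If the output is still needed for debugging, assign first and call print(value_direct) on a separate line.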

To create the DataFrame you have to use lists of elements, even if you have only one element:

'trade_name': [trade_name], ..., 'storage': [storage],

and then it works without the error.

If you use the values without [], pandas treats each string as a single scalar value, not as a column of values. When every value in the dict is a scalar, pandas cannot tell how many rows the DataFrame should have, so it raises "ValueError: If using all scalar values, you must pass an index". None is a scalar too, which is why the same error appears both with and without print().
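
A small sketch of this behaviour, using placeholder strings instead of the scraped values (the dict name row and its contents are assumptions for illustration). It shows the error with plain strings and three common ways to get a one-row DataFrame:

```python
import pandas as pd

# Placeholder values standing in for the scraped text.
row = {"trade_name": "example name", "storage": "example storage"}

# Every value is a scalar, so pandas cannot infer the number of rows.
try:
    pd.DataFrame(row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: wrap each value in a one-element list (the approach in this answer).
df_lists = pd.DataFrame({key: [value] for key, value in row.items()})

# Option 2: pass a list of dicts, one dict per row.
df_records = pd.DataFrame([row])

# Option 3: keep the scalars and supply an explicit index.
df_index = pd.DataFrame(row, index=[0])

print(df_lists)
```

The list-of-dicts form may be convenient later if several drug pages are scraped, because each page can append one dict to the list.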


BTW: you can use get_text(strip=True) to remove leading and trailing whitespace (spaces, tabs, newlines) from the text.
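
The following sketch shows the effect on a small hand-written HTML fragment (the fragment is made up for illustration and is not the markup of the real page). Note that strip=True strips each text piece and joins the pieces with an empty separator, so passing a separator can be useful when the element contains several child tags:

```python
from bs4 import BeautifulSoup

# Hand-written fragment; the real page markup may differ.
html = '<div class="drug-content">\n  <p> first line </p>\n  <p> second line </p>\n</div>'
content = BeautifulSoup(html, "html.parser").find(class_="drug-content")

raw = content.get_text()                    # keeps newlines and padding
stripped = content.get_text(strip=True)     # pieces stripped, joined with ""
spaced = content.get_text(" ", strip=True)  # pieces stripped, joined with " "

print(repr(raw))
print(repr(stripped))
print(repr(spaced))
```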


import pandas as pd
import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://www.medindia.net/doctors/drug_information/abacavir.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
drug = soup.find(class_='mi-container__fluid')
#print(drug)

# all widgets on the page that contain drug content
items = drug.find_all(class_='report-content drug-widget')
#print(items)

# extract the drug information from each widget into individual variables
trade_name = items[0].find(class_='drug-content').get_text(strip=True)
function = items[1].find(class_='drug-content').get_text(strip=True)
contraindications = items[2].find(class_='drug-content').get_text(strip=True)
dosage = items[3].find(class_='drug-content').get_text(strip=True)
how_to_use = items[4].find(class_='drug-content').get_text(strip=True)
warnings = items[5].find(class_='drug-content').get_text(strip=True)
storage = items[7].find(class_='drug-content').get_text(strip=True)


drug_stuff = pd.DataFrame({
    'trade_name': [trade_name],
    'function': [function],
    'contraindications': [contraindications],
    'dosage': [dosage],
    'how_to_use': [how_to_use],
    'warnings': [warnings],
    'storage': [storage],
})

print(drug_stuff)