0
votes

Hi, I am trying to run my first PySpark code in a Jupyter notebook. I got the error "'SparkContext' object has no attribute 'prallelize'". Could you please help me out?

The code is shown below:

import findspark
findspark.init()
findspark.find()
import pyspark
findspark.find()

gives me the result: C:\Users\Owner\spark-3.0.0-bin-hadoop2.7\spark-3.0.0-bin-hadoop2.7

from pyspark import SparkContext,SparkConf
from pyspark.sql import SparkSession
conf = pyspark.SparkConf().setAppName('SparkApp').setMaster('local')
sc = pyspark.SparkContext(conf=conf)
spark = SparkSession(sc)


myRDD = sc.prallelize([('Ross',19),('Joey',18),('Rachel',16),('Pheobe',18),('Chandler',17),('Monica',20),('Ram',25),('Hari',10)])

The above code gave me the error shown below: AttributeError: 'SparkContext' object has no attribute 'prallelize'


2 Answers

3
votes

You can try:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()
sc = spark.sparkContext
rdd_names = sc.parallelize([(1, "Joe"), (2, "Thomas"), (3, "Michael"), (4, "Sean")])
1
votes

It's a typo; it should be:

myRDD = sc.parallelize([('Ross',19),('Joey',18),('Rachel',16),('Pheobe',18),('Chandler',17),('Monica',20),('Ram',25),('Hari',10)])
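More generally, this kind of AttributeError just means the name you typed doesn't exist on the object, and you can spot the typo by listing the object's attributes with dir(). A minimal sketch of the pattern, using a stand-in class (the same check works against a real SparkContext):

```python
# Accessing a misspelled attribute raises AttributeError; dir() lists the
# names that actually exist, which makes the typo easy to spot.
class FakeContext:
    def parallelize(self, data):
        return list(data)

sc = FakeContext()

# Search the object's attributes for names close to what you typed.
matches = [name for name in dir(sc) if "par" in name]
print(matches)  # ['parallelize']

assert hasattr(sc, "parallelize")
assert not hasattr(sc, "prallelize")  # the misspelling from the question
```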