
I want to create spark dataframe by using PySpark and for that I ran this code in PyCharm:

from pyspark.sql import SparkSession
Spark_Session = SparkSession.builder \
    .enableHiveSupport() \
    .master("local") \
    .getOrCreate()

However, it returns these errors:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/01/08 10:17:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/08 10:18:14 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

How should I solve this problem?

Can you try to run it without enabling Hive support? – Sanchit Grover
These are warnings and (probably?) not that important; you should still be able to use Spark. For example, [here] is someone getting exactly the same errors when starting the Spark shell while still getting everything to run properly. – Shaido

1 Answer


Where are you running this? Is Hadoop installed? The warning Unable to load native-hadoop library for your platform... using builtin-java classes means Spark cannot find a native Hadoop library and is falling back to its built-in Java implementations; that is a warning, not a fatal error. Make sure the libraries you need are on the path, and you can inspect the running context in the Spark UI.

Try:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Spark Example") \
    .getOrCreate()

That should work.