I have a very large CSV file, so I used Spark and loaded it into a Spark DataFrame.
I need to extract the latitude and longitude from each row of the CSV in order to create a folium map.
With pandas I can solve my problem with a loop:
import folium
from folium.plugins import MarkerCluster

# locations is a pandas DataFrame; marker_cluster is a MarkerCluster added to a folium.Map
for index, row in locations.iterrows():
    folium.CircleMarker(location=(row["Pickup_latitude"], row["Pickup_longitude"]),
                        radius=20,
                        color="#0A8A9F",
                        fill=True).add_to(marker_cluster)
I found that, unlike a pandas DataFrame, a Spark DataFrame can't be iterated with a loop (see "how to loop through each row of dataFrame in pyspark").
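For reference, the closest I can get is to pull the rows back to the driver first. This is only a minimal sketch of what I understand that workaround to look like (the DataFrame name spark_df is my assumption), and it seems to defeat the purpose with a very large file:

import folium
from folium.plugins import MarkerCluster

# Assumption: spark_df is the Spark DataFrame loaded from the CSV
m = folium.Map()
marker_cluster = MarkerCluster().add_to(m)

# toLocalIterator() streams rows to the driver one partition at a time,
# which avoids collect()'s single big allocation but is still driver-side work
for row in spark_df.toLocalIterator():
    folium.CircleMarker(location=(row["Pickup_latitude"], row["Pickup_longitude"]),
                        radius=20,
                        color="#0A8A9F",
                        fill=True).add_to(marker_cluster)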
So I thought I could work around the problem by splitting the big data into Hive tables and then iterating over them.
Is it possible to split the huge Spark DataFrame into Hive tables and then iterate over the rows with a loop?
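In case it helps to see what I mean, here is a rough sketch of the idea (the table name pickups and the pmod/hash bucketing scheme are my own assumptions for slicing the data, not something I have tested):

# Assumption: spark is a SparkSession with Hive support; m and marker_cluster
# are the folium objects from the sketch above
spark_df.write.mode("overwrite").saveAsTable("pickups")

# Read back one slice at a time, convert the small slice to pandas, and loop over it.
# The 0..9 bucketing via pmod(hash(...), 10) is just an illustrative way to chunk rows.
for bucket in range(10):
    chunk = spark.sql(
        f"SELECT Pickup_latitude, Pickup_longitude FROM pickups "
        f"WHERE pmod(hash(Pickup_latitude, Pickup_longitude), 10) = {bucket}"
    ).toPandas()
    for _, row in chunk.iterrows():
        folium.CircleMarker(location=(row["Pickup_latitude"], row["Pickup_longitude"]),
                            radius=20,
                            color="#0A8A9F",
                            fill=True).add_to(marker_cluster)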