Tuesday, 4 August 2015

Spark + Python - how to set the system environment variables?

I'm on spark-1.4.1. How can I set the system environment variables for Python?

For instance, in R,

Sys.setenv(SPARK_HOME = "C:/Apache/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

What about in Python?
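For comparison, the usual Python counterpart to `Sys.setenv` is the `os.environ` mapping, and the counterpart to `.libPaths()` is `sys.path`. A minimal sketch, reusing the Windows path from the R example above (the exact install path is an assumption):

```python
import os
import sys

# Equivalent of Sys.setenv(SPARK_HOME = ...): set the variable
# before any pyspark import so the launcher can find Spark.
# The path is taken from the R example; adjust to your install.
os.environ["SPARK_HOME"] = "C:/Apache/spark-1.4.1"

# Rough equivalent of .libPaths(...): prepend PySpark's Python
# sources to the module search path so `import pyspark` works.
sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python"))

print(os.environ["SPARK_HOME"])
```

Note that `os.environ` only affects the current process and its children, so it must run before the `SparkContext` is created.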

import os
import sys

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="PythonSQL")
sqlContext = SQLContext(sc)

# Build the path to the example JSON file (requires SPARK_HOME
# to be set in the environment).
# ref: http://ift.tt/1ImpBqH
if len(sys.argv) < 2:
    path = "file://" + \
        os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
else:
    path = sys.argv[1]

# Create the DataFrame
df = sqlContext.jsonFile(path)

# Show the content of the DataFrame
df.show()

I get this error,

df is not defined.


Any ideas?
