How to create a DataFrame in PySpark


First, switch to the spark user and start the pyspark shell:
su - spark
pyspark

You can use the code below to create a sample DataFrame in PySpark:
from pyspark.sql import Row
department1 = Row(id='123456', name='Computer Science')
department2 = Row(id='789012', name='Mechanical Engineering')
department3 = Row(id='345678', name='Theater and Drama')
department4 = Row(id='901234', name='Indoor Recreation')
Employee = Row("firstName", "lastName", "email", "salary")
employee1 = Employee('michael', 'armbrust', 'no-reply@berkeley.edu', 100000)
employee2 = Employee('xiangrui', 'meng', 'no-reply@stanford.edu', 120000)
employee3 = Employee('matei', None, 'no-reply@waterloo.edu', 140000)
employee4 = Employee(None, 'wendell', 'no-reply@berkeley.edu', 160000)
departmentWithEmployees1 = Row(department=department1, employees=[employee1, employee2])
departmentWithEmployees2 = Row(department=department2, employees=[employee3, employee4])
departmentWithEmployees3 = Row(department=department3, employees=[employee1, employee4])
departmentWithEmployees4 = Row(department=department4, employees=[employee2, employee3])
departmentsWithEmployeesSeq1 = [departmentWithEmployees1, departmentWithEmployees2]
df1 = spark.createDataFrame(departmentsWithEmployeesSeq1)
df1.show()
+--------------------+--------------------+
|          department|           employees|
+--------------------+--------------------+
|[123456, Computer...|[[michael, armbru...|
|[789012, Mechanic...|[[matei,, no-repl...|
+--------------------+--------------------+
