To convert the type of a column in Apache Spark, use cast; Spark has no convert function.

Cast Function Overview

Example:

from pyspark.sql.functions import col

# Change the column "age" to IntegerType
df = df.withColumn("age", col("age").cast("int"))

Common Types You Can Cast To:

  • StringType ("string"): Converts the column to a string.
  • IntegerType ("int", "integer"): Converts the column to an integer.
  • LongType ("long"): Converts the column to a long integer.
  • FloatType ("float"): Converts the column to a float.
  • DoubleType ("double"): Converts the column to a double.
  • BooleanType ("boolean"): Converts the column to a boolean (True/False).
  • DateType ("date"): Converts the column to a date type.
  • TimestampType ("timestamp"): Converts the column to a timestamp (date and time).

Full Example:

from pyspark.sql.functions import col

# Convert different columns to different types
df = df.withColumn("age", col("age").cast("int")) \
       .withColumn("price", col("price").cast("double")) \
       .withColumn("created_at", col("created_at").cast("timestamp"))

List of Valid Data Types:

  • "string"
  • "int" or "integer"
  • "long"
  • "float"
  • "double"
  • "boolean"
  • "date"
  • "timestamp"
  • "binary"