[Spark] Spark SQL Data Type Conversion - Alibaba Cloud Developer Community

2022-06-11

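Preface

Data type conversion comes up in every language and framework. It looks simple, but becoming familiar with all of the data types takes some time and practice.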

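Spark SQL data types

Numeric types

ByteType: a 1-byte signed integer. The range is -128 to 127.

ShortType: a 2-byte signed integer. The range is -32768 to 32767.

IntegerType: a 4-byte signed integer. The range is -2147483648 to 2147483647.

LongType: an 8-byte signed integer. The range is -9223372036854775808 to 9223372036854775807.

FloatType: a 4-byte single-precision floating-point number.

DoubleType: an 8-byte double-precision floating-point number.

DecimalType: an arbitrary-precision decimal number, backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale.

String type

StringType: a character string value.

Binary type

BinaryType: a byte sequence value.

Boolean type

BooleanType: a boolean value.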
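
The DecimalType description above mentions an unscaled value and a scale. A minimal sketch of what those two parts look like, using only java.math.BigDecimal (the value 123.45 is an arbitrary example, and the snippet is meant for a Scala REPL, spark-shell, or a Scala script):

```scala
import java.math.BigDecimal

// An arbitrary example value with two digits after the decimal point.
val d = new BigDecimal("123.45")

// The unscaled value is an arbitrary-precision integer (java.math.BigInteger).
val unscaled = d.unscaledValue()

// The scale is a 32-bit Int: the number of digits to the right of the decimal point.
val scale = d.scale()

// The number is unscaled * 10^(-scale), so it can be rebuilt from the two parts.
val rebuilt = new BigDecimal(unscaled, scale)

println(s"unscaled=$unscaled, scale=$scale, rebuilt=$rebuilt")
```

In Spark SQL the same idea appears as DecimalType(precision, scale), where precision is the total number of digits and scale the number of digits after the decimal point.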

Datetime types

TimestampType: a value comprising the fields year, month, day, hour, minute, and second.

DateType: a value comprising the fields year, month, and day.
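
As a small sketch of how these two types show up in practice, the snippet below casts a string column to both of them. The column names raw, ts, and day and the sample value are made up for illustration, and the session is created locally so the snippet can run in spark-shell or a Scala script with Spark on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("datetime-cast-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// One hypothetical row holding a timestamp written as text.
val events = Seq("2022-06-11 08:30:00").toDF("raw")

// Cast the same string once to TimestampType and once to DateType.
val typed = events.select(
  $"raw".cast("timestamp").as("ts"),
  $"raw".cast("date").as("day")
)

// Print (columnName, typeName) pairs, as in the rest of this article.
typed.dtypes.foreach(println)
```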

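Complex types

ArrayType(elementType, containsNull): a sequence of elements of type elementType. containsNull indicates whether the elements of the array may be null.

MapType(keyType, valueType, valueContainsNull): a set of key-value pairs. keyType is the data type of the keys and valueType is the data type of the values. valueContainsNull indicates whether the values of the map may be null.

StructType(fields): a value whose structure is a sequence of StructFields (fields).

StructField(name, dataType, nullable): one field of a StructType. name is the field's name, dataType is the field's data type, and nullable indicates whether the field's value may be null.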
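
The complex types above compose into a schema. The following is a minimal sketch that assumes a made-up user record; the field names (tags, scores, address, city, zip) are hypothetical and chosen only to exercise each constructor:

```scala
import org.apache.spark.sql.types._

// A hypothetical schema combining ArrayType, MapType, and a nested StructType.
val userSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  // An array of strings whose elements may not be null.
  StructField("tags", ArrayType(StringType, containsNull = false), nullable = true),
  // A map from string keys to double values; the values may be null.
  StructField("scores", MapType(StringType, DoubleType, valueContainsNull = true), nullable = true),
  // A nested struct with its own fields.
  StructField("address", StructType(Seq(
    StructField("city", StringType, nullable = true),
    StructField("zip", StringType, nullable = true)
  )), nullable = true)
))

// Print the schema as an indented tree.
println(userSchema.treeString)
```

Building the schema needs only the spark-sql types on the classpath, not a running SparkSession.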

Spark SQL data types compared with Scala data types
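
The correspondence, as listed in the Spark SQL documentation, can be sketched by building one Row whose Scala values line up with a schema field by field. The column names and sample values here are placeholders, and the snippet assumes spark-shell or a Scala script with Spark on the classpath:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("type-mapping-sketch")
  .master("local[*]")
  .getOrCreate()

// Each field's Spark SQL type; the matching Scala value type is noted on the right.
val schema = StructType(Seq(
  StructField("b", ByteType),                             // Byte
  StructField("s", ShortType),                            // Short
  StructField("i", IntegerType),                          // Int
  StructField("l", LongType),                             // Long
  StructField("f", FloatType),                            // Float
  StructField("d", DoubleType),                           // Double
  StructField("dec", DecimalType(10, 2)),                 // java.math.BigDecimal
  StructField("str", StringType),                         // String
  StructField("bin", BinaryType),                         // Array[Byte]
  StructField("flag", BooleanType),                       // Boolean
  StructField("ts", TimestampType),                       // java.sql.Timestamp
  StructField("day", DateType),                           // java.sql.Date
  StructField("arr", ArrayType(IntegerType)),             // scala.collection.Seq
  StructField("m", MapType(StringType, IntegerType)),     // scala.collection.Map
  StructField("st", StructType(Seq(StructField("x", StringType)))) // Row
))

// One row whose values follow the mapping in the comments above.
val row = Row(
  1.toByte, 2.toShort, 3, 4L, 1.5f, 2.5d,
  new java.math.BigDecimal("9.99"), "text", Array[Byte](1, 2), true,
  java.sql.Timestamp.valueOf("2022-06-11 08:30:00"),
  java.sql.Date.valueOf("2022-06-11"),
  Seq(1, 2), Map("k" -> 1), Row("nested")
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(row)), schema)
df.printSchema()
```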

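Spark SQL data type conversion examples

In one sentence: call the cast method of the Column class.

How to obtain a Column

I have covered this in an earlier post: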

df("columnName")            // On a specific `df` DataFrame.
col("columnName")           // A generic column not yet associated with a DataFrame.
col("columnName.field")     // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName"               // Scala short hand for a named column.

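Preparing the test data

Save the following lines as ./data/user, the path that the code in this article reads: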

1,tom,23
2,jack,24
3,lily,18
4,lucy,19

Spark entry point

import org.apache.spark.sql.SparkSession

// Local session that uses all available cores.
val spark = SparkSession
  .builder()
  .appName("test")
  .master("local[*]")
  .getOrCreate()

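Checking the default data types

import spark.implicits._ // needed for map on Dataset[String] and for toDF

spark.read
  .textFile("./data/user")        // Dataset[String], one element per line
  .map(_.split(","))              // split each line into its fields
  .map(x => (x(0), x(1), x(2)))   // (id, name, age), all still Strings
  .toDF("id", "name", "age")
  .dtypes
  .foreach(println)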

Result:

(id,StringType)
(name,StringType)
(age,StringType)

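This shows that every column defaults to StringType, because each field is a String produced by splitting a line of text.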

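Casting the numeric columns to IntegerType

import spark.implicits._ // also enables the $"col" syntax

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  // cast id and age from StringType to IntegerType; name stays a string
  .select($"id".cast("int"), $"name", $"age".cast("int"))
  .dtypes
  .foreach(println)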

Result:

(id,IntegerType)
(name,StringType)
(age,IntegerType)
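
One thing worth knowing when casting strings to numbers is what happens to values that cannot be parsed. The sketch below uses in-memory rows instead of the ./data/user file, with a deliberately bad age value ("n/a") that is not part of the original test data. It explicitly sets spark.sql.ansi.enabled to false, which is an assumption about the desired behavior: in that mode an unparseable string becomes null, whereas with ANSI mode enabled the same cast raises an error instead.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cast-null-sketch")
  .master("local[*]")
  // Assumption: non-ANSI casting, where invalid input yields null instead of an error.
  .config("spark.sql.ansi.enabled", "false")
  .getOrCreate()
import spark.implicits._

// In-memory stand-in for the test data, plus one hypothetical malformed age.
val users = Seq(
  ("1", "tom", "23"),
  ("2", "jack", "n/a")
).toDF("id", "name", "age")

// withColumn replaces the existing age column with its casted version.
val casted = users.withColumn("age", $"age".cast("int"))
casted.show()

// Count the rows whose age could not be parsed.
val badRows = casted.filter($"age".isNull).count()
println(s"rows with unparseable age: $badRows")
```

Checking for nulls after a cast is a cheap way to detect dirty input before it flows into later aggregations.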

The two overloads of the Column class's cast method

The first

def cast(to: String): Column

Casts the column to a different data type, using the canonical string representation of the type. The supported types are:

string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.

// Casts colA to integer.
df.select(df("colA").cast("int"))
Since: 1.3.0

The second

def cast(to: DataType): Column

Casts the column to a different data type.

// Casts colA to IntegerType.
import org.apache.spark.sql.types.IntegerType
df.select(df("colA").cast(IntegerType))
// equivalent to
df.select(df("colA").cast("int"))
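
To compare the two overloads side by side, here is a minimal sketch on made-up data (the price column and its value are hypothetical). It also uses a parameterized type string, "decimal(10,2)", on the assumption that the string overload accepts the same type syntax as Spark SQL DDL, alongside the equivalent DecimalType(10, 2) object:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DecimalType, LongType}

val spark = SparkSession.builder()
  .appName("cast-overloads-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical input where every column starts out as a string.
val raw = Seq(("1", "19.90")).toDF("id", "price")

// Overload 1: the target type is given as a string.
val byName = raw.select($"id".cast("long"), $"price".cast("decimal(10,2)"))

// Overload 2: the target type is given as a DataType object.
val byType = raw.select($"id".cast(LongType), $"price".cast(DecimalType(10, 2)))

byName.printSchema()
byType.printSchema()

// Both variants are expected to produce the same schema.
val sameSchema = byName.schema == byType.schema
println(s"schemas match: $sameSchema")
```

The DataType overload is checked by the compiler, while a typo in the string form only surfaces at run time, which may matter when choosing between them.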
