[Spark][Python][DataFrame][Write] An example of writing a DataFrame


$ hdfs dfs -cat people.json

{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}

 

$ pyspark

sqlContext = HiveContext(sc)

peopleDF = sqlContext.read.json("people.json")

peopleDF.write.format("parquet").mode("append").partitionBy("age").saveAsTable("people")

 


17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 65.5 KB, free 338.2 KB)
17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.4 KB, free 359.6 KB)
17/10/07 00:58:18 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:59616 (size: 21.4 KB, free: 208.8 MB)
17/10/07 00:58:18 INFO spark.SparkContext: Created broadcast 2 from saveAsTable at NativeMethodAccessorImpl.java:-2
17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 251.1 KB, free 610.7 KB)
17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 21.6 KB, free 632.4 KB)
17/10/07 00:58:18 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:59616 (size: 21.6 KB, free: 208.7 MB)
17/10/07 00:58:18 INFO spark.SparkContext: Created broadcast 3 from saveAsTable at NativeMethodAccessorImpl.java:-2
17/10/07 00:58:19 INFO parquet.ParquetRelation: Using default output committer for Parquet: parquet.hadoop.ParquetOutputCommitter
17/10/07 00:58:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/10/07 00:58:19 INFO datasources.DynamicPartitionWriterContainer: Using user defined output committer class parquet.hadoop.ParquetOutputCommitter
17/10/07 00:58:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/10/07 00:58:19 INFO mapred.FileInputFormat: Total input paths to process : 1
17/10/07 00:58:19 INFO spark.SparkContext: Starting job: saveAsTable at NativeMethodAccessorImpl.java:-2
17/10/07 00:58:19 INFO scheduler.DAGScheduler: Got job 1 (saveAsTable at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/07 00:58:19 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (saveAsTable at NativeMethodAccessorImpl.java:-2)
17/10/07 00:58:19 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/10/07 00:58:19 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/07 00:58:19 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at saveAsTable at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/07 00:58:19 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 72.7 KB, free 705.0 KB)
17/10/07 00:58:20 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 26.4 KB, free 731.4 KB)
17/10/07 00:58:20 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:59616 (size: 26.4 KB, free: 208.7 MB)
17/10/07 00:58:20 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
17/10/07 00:58:20 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at saveAsTable at NativeMethodAccessorImpl.java:-2)
17/10/07 00:58:20 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/07 00:58:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)
17/10/07 00:58:20 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/07 00:58:20 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179
17/10/07 00:58:20 INFO codegen.GenerateUnsafeProjection: Code generated in 314.888218 ms
17/10/07 00:58:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/10/07 00:58:20 INFO datasources.DynamicPartitionWriterContainer: Using user defined output committer class parquet.hadoop.ParquetOutputCommitter
17/10/07 00:58:20 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/10/07 00:58:20 INFO codegen.GenerateUnsafeProjection: Code generated in 46.978197 ms
17/10/07 00:58:20 INFO codegen.GenerateUnsafeProjection: Code generated in 64.665839 ms
17/10/07 00:58:21 INFO codegen.GenerateUnsafeProjection: Code generated in 94.259071 ms
17/10/07 00:58:21 INFO codec.CodecConfig: Compression: GZIP
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Dictionary is on
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Validation is off
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
17/10/07 00:58:21 INFO hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
17/10/07 00:58:21 INFO parquet.CatalystWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
"type" : "struct",
"fields" : [ {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcode",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcoe",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
and corresponding Parquet message type:
message spark_schema {
optional binary name (UTF8);
optional binary pcode (UTF8);
optional binary pcoe (UTF8);
}


17/10/07 00:58:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
17/10/07 00:58:21 INFO datasources.DynamicPartitionWriterContainer: Maximum partitions reached, falling back on sorting.
17/10/07 00:58:21 INFO codegen.GenerateUnsafeProjection: Code generated in 34.281133 ms
17/10/07 00:58:21 INFO codegen.GenerateOrdering: Code generated in 85.573905 ms
17/10/07 00:58:21 INFO datasources.DynamicPartitionWriterContainer: Sorting complete. Writing out partition files one at a time.
17/10/07 00:58:21 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 54
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-hadoop-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-pig-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/parquet-format-2.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/hive-jdbc-1.1.0-cdh5.7.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
17/10/07 00:58:21 INFO hadoop.ColumnChunkPageWriteStore: written 80B for [name] BINARY: 2 values, 26B raw, 43B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:21 INFO hadoop.ColumnChunkPageWriteStore: written 73B for [pcode] BINARY: 2 values, 24B raw, 38B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:21 INFO hadoop.ColumnChunkPageWriteStore: written 47B for [pcoe] BINARY: 2 values, 6B raw, 26B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO codec.CodecConfig: Compression: GZIP
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Dictionary is on
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Validation is off
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
17/10/07 00:58:22 INFO parquet.CatalystWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
"type" : "struct",
"fields" : [ {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcode",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcoe",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
and corresponding Parquet message type:
message spark_schema {
optional binary name (UTF8);
optional binary pcode (UTF8);
optional binary pcoe (UTF8);
}


17/10/07 00:58:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
17/10/07 00:58:22 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 26
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 68B for [name] BINARY: 1 values, 15B raw, 33B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 47B for [pcode] BINARY: 1 values, 6B raw, 26B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 68B for [pcoe] BINARY: 1 values, 15B raw, 33B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO codec.CodecConfig: Compression: GZIP
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Dictionary is on
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Validation is off
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
17/10/07 00:58:22 INFO parquet.CatalystWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
"type" : "struct",
"fields" : [ {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcode",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcoe",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
and corresponding Parquet message type:
message spark_schema {
optional binary name (UTF8);
optional binary pcode (UTF8);
optional binary pcoe (UTF8);
}


17/10/07 00:58:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
17/10/07 00:58:22 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 28
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 74B for [name] BINARY: 1 values, 17B raw, 35B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 68B for [pcode] BINARY: 1 values, 15B raw, 33B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 47B for [pcoe] BINARY: 1 values, 6B raw, 26B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:59616 in memory (size: 21.4 KB, free: 208.7 MB)
17/10/07 00:58:22 INFO codec.CodecConfig: Compression: GZIP
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Dictionary is on
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Validation is off
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
17/10/07 00:58:22 INFO hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
17/10/07 00:58:22 INFO parquet.CatalystWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
"type" : "struct",
"fields" : [ {
"name" : "name",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcode",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "pcoe",
"type" : "string",
"nullable" : true,
"metadata" : { }
} ]
}
and corresponding Parquet message type:
message spark_schema {
optional binary name (UTF8);
optional binary pcode (UTF8);
optional binary pcoe (UTF8);
}


17/10/07 00:58:22 INFO compress.CodecPool: Got brand-new compressor [.gz]
17/10/07 00:58:22 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 13
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 68B for [name] BINARY: 1 values, 15B raw, 33B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 47B for [pcode] BINARY: 1 values, 6B raw, 26B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO hadoop.ColumnChunkPageWriteStore: written 47B for [pcoe] BINARY: 1 values, 6B raw, 26B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
17/10/07 00:58:22 INFO output.FileOutputCommitter: Saved output of task 'attempt_201710070058_0001_m_000000_0' to hdfs://localhost:8020/user/hive/warehouse/people/_temporary/0/task_201710070058_0001_m_000000
17/10/07 00:58:22 INFO mapred.SparkHadoopMapRedUtil: attempt_201710070058_0001_m_000000_0: Committed
17/10/07 00:58:22 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 2057 bytes result sent to driver
17/10/07 00:58:22 INFO scheduler.DAGScheduler: ResultStage 1 (saveAsTable at NativeMethodAccessorImpl.java:-2) finished in 2.797 s
17/10/07 00:58:22 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2797 ms on localhost (1/1)
17/10/07 00:58:22 INFO scheduler.DAGScheduler: Job 1 finished: saveAsTable at NativeMethodAccessorImpl.java:-2, took 3.236619 s
17/10/07 00:58:22 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/10/07 00:58:23 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
17/10/07 00:58:23 INFO datasources.DynamicPartitionWriterContainer: Job job_201710070058_0000 committed.
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=19 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=30 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=46 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=__HIVE_DEFAULT_PARTITION__ on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=19 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=30 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=46 on driver
17/10/07 00:58:23 INFO parquet.ParquetRelation: Listing hdfs://localhost:8020/user/hive/warehouse/people/age=__HIVE_DEFAULT_PARTITION__ on driver
17/10/07 00:58:24 WARN hive.HiveContext$$anon$2: Persisting partitioned data source relation `people` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s): 
hdfs://localhost:8020/user/hive/warehouse/people
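
The "Listing ... age=19" lines in the log above reflect how partitionBy("age") lays out the table: one subdirectory per distinct value of the partition column, with rows whose age is missing going to age=__HIVE_DEFAULT_PARTITION__. The partition column is not stored inside the Parquet files themselves, which is why the Catalyst schema printed in the log lists only name, pcode and pcoe. Below is a minimal plain-Python sketch of that grouping. It only mimics the directory layout and is not Spark's implementation; the helper names partition_dir and layout are hypothetical.

```python
import json
from collections import defaultdict

# The five records from people.json, copied from the listing at the top.
RAW = """\
{"name":"Alice","pcode":"94304"}
{"name":"Brayden","age":30,"pcode":"94304"}
{"name":"Carla","age":19,"pcoe":"10036"}
{"name":"Diana","age":46}
{"name":"Etienne","pcode":"94104"}
"""

# Directory suffix used for rows whose partition value is null (seen in the log).
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_dir(record, column):
    """Hypothetical helper: directory name that one record would land in."""
    value = record.get(column)
    return "%s=%s" % (column, DEFAULT_PARTITION if value is None else value)


def layout(records, column):
    """Group records by partition directory, dropping the partition column."""
    dirs = defaultdict(list)
    for rec in records:
        data = {k: v for k, v in rec.items() if k != column}
        dirs[partition_dir(rec, column)].append(data)
    return dict(dirs)


people = [json.loads(line) for line in RAW.splitlines()]
table = layout(people, "age")
for name in sorted(table):
    print(name, [row["name"] for row in table[name]])
```

When the table is read back, the age value of each row is reconstructed from the directory name, and the partition column is placed after the data columns.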


 

[training@localhost ~]$ hive

hive> 
> show tables like 'people';
OK
people
Time taken: 5.046 seconds, Fetched: 1 row(s)
hive>

 

sqlContext = HiveContext(sc)
newPeopleDF = sqlContext.read.table("people")

newPeopleDF.limit(5).show()

+-------+-----+-----+----+
|   name|pcode| pcoe| age|
+-------+-----+-----+----+
|Brayden|94304| null|  30|
|  Diana| null| null|  46|
|  Carla| null|10036|  19|
|  Alice|94304| null|null|
|Etienne|94104| null|null|
+-------+-----+-----+----+

As you can see, the DataFrame read from JSON was indeed written to a Parquet-format table named people.

The table was then read back through HiveContext, and its data was retrieved and displayed.
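
The null cells and the extra pcoe column in the result follow from the JSON reader building a single schema for the whole file: the columns are the union of the keys seen across all records (Carla's record spells its key "pcoe"), and a record that lacks a key gets null for it. Below is a rough plain-Python sketch of that idea. It is not Spark's actual inference code, which also infers column types; infer_columns and to_rows are hypothetical names, and the columns are sorted by name here only to get a stable order.

```python
import json

# The five lines of people.json, copied from the listing at the top.
LINES = [
    '{"name":"Alice","pcode":"94304"}',
    '{"name":"Brayden","age":30,"pcode":"94304"}',
    '{"name":"Carla","age":19,"pcoe":"10036"}',
    '{"name":"Diana","age":46}',
    '{"name":"Etienne","pcode":"94104"}',
]


def infer_columns(records):
    """Hypothetical helper: union of all keys across records, sorted by name."""
    return sorted({key for rec in records for key in rec})


def to_rows(records, columns):
    """Hypothetical helper: one tuple per record, None where a key is missing."""
    return [tuple(rec.get(col) for col in columns) for rec in records]


records = [json.loads(line) for line in LINES]
columns = infer_columns(records)
rows = to_rows(records, columns)

print(columns)
for row in rows:
    print(row)
```

Because "pcoe" is treated as just another key, it becomes its own column with a value only for Carla, while her pcode is null.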






This article is reposted from the cnblogs blog 健哥的数据花园; original link: http://www.cnblogs.com/gaojian/p/dataframe_write.html. To republish it, please contact the original author.
