hands-on-data-analysis, Unit 2, Sections 2 and 3

Summary: data merging with concat (horizontal concatenation)

Section 2: Data restructuring

Before anything else, import the basic libraries:

# Import the basic libraries
import numpy as np
import pandas as pd

2.1. Data merging: horizontal concatenation with concat

Official documentation:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

Merging and joining DataFrames in pandas (merge, join, concat)

Given two tables, text_left_up and text_right_up, combine them horizontally into a single table (that is, place their columns side by side).

text_left_up:

   PassengerId  Survived  Pclass  Name
0            1         0       3  Braund, Mr. Owen Harris
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...
2            3         1       3  Heikkinen, Miss. Laina
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)
4            5         0       3  Allen, Mr. William Henry

text_right_up:

   Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0  male    22.0    1.0    0.0  A/5 21171          7.2500  NaN    S
1  female  38.0    1.0    0.0  PC 17599          71.2833  C85    C
2  female  26.0    0.0    0.0  STON/O2. 3101282   7.9250  NaN    S
3  female  35.0    1.0    0.0  113803            53.1000  C123   S
4  male    35.0    0.0    0.0  373450             8.0500  NaN    S

# Concatenate along columns (axis=1); rows are aligned on the index
list_up = [text_left_up, text_right_up]
result_up = pd.concat(list_up, axis=1)
result_up.head()

Output:

   PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0          1.0       0.0     3.0  Braund, Mr. Owen Harris                            male    22.0    1.0    0.0  A/5 21171          7.2500  NaN    S
1          2.0       1.0     1.0  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0    1.0    0.0  PC 17599          71.2833  C85    C
2          3.0       1.0     3.0  Heikkinen, Miss. Laina                             female  26.0    0.0    0.0  STON/O2. 3101282   7.9250  NaN    S
3          4.0       1.0     1.0  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0    1.0    0.0  113803            53.1000  C123   S
4          5.0       0.0     3.0  Allen, Mr. William Henry                           male    35.0    0.0    0.0  373450             8.0500  NaN    S

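It is like placing a table of Xiao Ming's student ID next to a table of his grades. pd.concat(axis=1) lines the rows up by index label, so row 0 of one table is paired with row 0 of the other.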
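Below is a minimal sketch of the analogy using two small hypothetical tables (`ids`, `grades` and `shifted` are invented for illustration). It also shows what happens when the indexes do not line up. concat pairs rows by index label, not by position, so unmatched labels produce NaN-filled rows and integer columns are upcast to float. This is one plausible explanation for why PassengerId, Survived and Pclass appear as 1.0, 0.0 and 3.0 in the output above, though it has not been checked against the original CSV files.

```python
import pandas as pd

# Hypothetical toy tables for the analogy: one holds student IDs, the other grades.
ids = pd.DataFrame({"name": ["Xiao Ming", "Xiao Hong"], "student_id": [101, 102]})
grades = pd.DataFrame({"math": [90, 85], "english": [88, 92]})

# axis=1 puts the columns side by side; rows are paired by index label (0 with 0, 1 with 1).
combined = pd.concat([ids, grades], axis=1)
print(combined)

# Same grades, but with a (hypothetical) shifted index 1, 2 instead of 0, 1.
shifted = pd.DataFrame({"math": [90, 85], "english": [88, 92]}, index=[1, 2])

# concat takes the union of the indexes (0, 1, 2). Labels present in only one table
# get NaN in the other table's columns, so the integer columns become float64.
misaligned = pd.concat([ids, shifted], axis=1)
print(misaligned)
print(misaligned.dtypes)
```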

2.2. Data merging: vertical concatenation with concat

Official documentation:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

Merging and joining DataFrames in pandas (merge, join, concat)

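Merge train-left-down and train-right-down (loaded here as text_left_down and text_right_down) horizontally into one table, and save it as result_down. Then concatenate result_up from the previous step and result_down vertically into result.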

text_left_down contains:

   PassengerId  Survived  Pclass  Name
0          440         0       2  Kvillner, Mr. Johan Henrik Johannesson
1          441         1       2  Hart, Mrs. Benjamin (Esther Ada Bloomfield)
2          442         0       3  Hampe, Mr. Leon
3          443         0       3  Petterson, Mr. Johan Emil
4          444         1       2  Reynaldo, Ms. Encarnacion

text_right_down contains:

   Sex      Age  SibSp  Parch  Ticket        Fare  Cabin  Embarked
0  male    31.0      0      0  C.A. 18723    10.500  NaN    S
1  female  45.0      1      1  F.C.C. 13529  26.250  NaN    S
2  male    20.0      0      0  345769         9.500  NaN    S
3  male    25.0      1      0  347076         7.775  NaN    S
4  female  28.0      0      0  230434        13.000  NaN    S
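
# Join the two lower halves side by side, then stack the upper and lower results
list_down = [text_left_down, text_right_down]
result_down = pd.concat(list_down, axis=1)    # axis=1: horizontal
result = pd.concat([result_up, result_down])  # default axis=0: vertical
result.head()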

The merged table:

   PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0          1.0       0.0     3.0  Braund, Mr. Owen Harris                            male    22.0    1.0    0.0  A/5 21171          7.2500  NaN    S
1          2.0       1.0     1.0  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0    1.0    0.0  PC 17599          71.2833  C85    C
2          3.0       1.0     3.0  Heikkinen, Miss. Laina                             female  26.0    0.0    0.0  STON/O2. 3101282   7.9250  NaN    S
3          4.0       1.0     1.0  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0    1.0    0.0  113803            53.1000  C123   S
4          5.0       0.0     3.0  Allen, Mr. William Henry                           male    35.0    0.0    0.0  373450             8.0500  NaN    S
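Two details of this step are easy to miss. The sketch below illustrates them with hypothetical miniature stand-ins for result_up and result_down. The values are taken from the tables above, but the frames themselves are toys. First, result.head() only shows the first five rows, and all of these come from result_up. result.tail() shows rows from result_down. Second, concat keeps each input's original index, so labels repeat. ignore_index=True rebuilds a clean 0..n-1 index.

```python
import pandas as pd

# Hypothetical miniature stand-ins for result_up and result_down, both indexed from 0.
result_up = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
result_down = pd.DataFrame({"PassengerId": [440, 441], "Survived": [0, 1]})

# Default axis=0 stacks rows; columns are matched by name, index labels are kept as-is.
result = pd.concat([result_up, result_down])
print(result.index.tolist())  # [0, 1, 0, 1]: labels repeat

# tail() shows the rows that came from result_down.
print(result.tail(2))

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
result_clean = pd.concat([result_up, result_down], ignore_index=True)
print(result_clean.index.tolist())  # [0, 1, 2, 3]
```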

2.3. Data merging: join

Official documentation:
pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

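As the official documentation shows, join is fairly flexible.

It adds the columns of another DataFrame to the calling one and matches rows either on the index or on a key column (the on parameter). If you pass a list of DataFrames, join combines all of them by index in a single call.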
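A minimal sketch of the two index-based forms described above. The frames `left`, `right` and `fares` are hypothetical and share the row index 0, 1. Their values come from the tables earlier in this post.

```python
import pandas as pd

# Hypothetical frames that share the same row index (0, 1); values taken from the tables above.
left = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})
fares = pd.DataFrame({"Fare": [7.25, 71.2833]})

# Join on the index: rows with the same index label are combined side by side.
joined = left.join(right)

# Passing a list joins several DataFrames by index in a single call.
# (Column names must not overlap here; suffixes are not supported for lists.)
joined_all = left.join([right, fares])
print(joined_all)
```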

Signature:

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
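This sketch illustrates the on, how, lsuffix and rsuffix parameters from the signature above. All data is hypothetical. In particular, the "deck" column and the `class_info` lookup table are invented for illustration and are not part of the Titanic files.

```python
import pandas as pd

# Hypothetical data: passengers with a Pclass key column, plus a lookup table indexed by class.
passengers = pd.DataFrame({"PassengerId": [1, 2, 3], "Pclass": [3, 1, 2]})
# "deck" is an invented column, used only to illustrate the lookup.
class_info = pd.DataFrame({"deck": ["F", "B"]}, index=pd.Index([1, 3], name="Pclass"))

# on="Pclass": match the caller's Pclass column against class_info's index.
# how="left" (the default) keeps every passenger; class 2 has no match, so deck is NaN.
with_deck = passengers.join(class_info, on="Pclass")

# how="inner" keeps only passengers whose class appears in class_info.
inner = passengers.join(class_info, on="Pclass", how="inner")

# Overlapping column names need lsuffix/rsuffix; without them join raises ValueError.
a = pd.DataFrame({"Age": [22.0, 38.0]})
b = pd.DataFrame({"Age": [23.0, 39.0]})
both_ages = a.join(b, lsuffix="_left", rsuffix="_right")
print(both_ages.columns.tolist())  # ['Age_left', 'Age_right']
```

Note that `on` names a column of the caller, while the other frame is always matched on its index. To join two columns against each other, pd.merge with left_on/right_on is the usual tool.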

2.4. Comparing concat and join

A comparison of concat, join and related functions:

Merge, join, concatenate and compare — pandas 1.4.2 documentation (pydata.org)
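Here is a small sketch of how the three approaches relate. `left` and `right` are hypothetical halves with identical indexes, similar to text_left_up and text_right_up. When the indexes match exactly, concat(axis=1), join and an index-on-index merge produce the same table. They differ in their default join type, and only concat also stacks rows.

```python
import pandas as pd

# Hypothetical left/right halves with identical indexes, like text_left_up and text_right_up.
left = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
right = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

# Three ways to put the columns side by side:
by_concat = pd.concat([left, right], axis=1)  # aligns on index, outer join by default
by_join = left.join(right)                    # joins on index, how="left" by default
by_merge = pd.merge(left, right, left_index=True, right_index=True)  # how="inner" by default

# With identical, unique indexes the three results hold the same data.
print(by_concat.equals(by_join), by_join.equals(by_merge))

# Stacking rows (axis=0) is something only concat does among the three.
stacked = pd.concat([left, left], ignore_index=True)
print(stacked.shape)
```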

Section 3: The GroupBy interface

Official documentation:

pandas.DataFrame.groupby — pandas 1.4.2 documentation (pydata.org)
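Below is a minimal groupby sketch that follows the split-apply-combine pattern. The frame `df` is hypothetical. It contains four rows copied from the Titanic tables earlier in this post (Braund, Cumings, Heikkinen, Allen).

```python
import pandas as pd

# Hypothetical sample: four rows copied from the Titanic tables shown earlier.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
    "Fare": [7.25, 71.2833, 7.925, 8.05],
})

# groupby splits the rows by Sex, applies an aggregation to each group,
# and combines the per-group results into a Series indexed by the group keys.
mean_fare = df.groupby("Sex")["Fare"].mean()
survivors = df.groupby("Sex")["Survived"].sum()
print(mean_fare)
print(survivors)

# Several aggregations at once with agg: one column per aggregated field.
summary = df.groupby("Sex").agg({"Fare": "mean", "Survived": "sum"})
print(summary)
```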
