[雪峰磁针石博客]使用pandas数据分析工具处理excel

简介: pandas有强大的excel数据处理和导入处理功能,本文简单介绍pandas在csv和excel等格式方面处理的应用及绘制图表等功能。 pandas处理excel依赖xlutils, OpenPyXL, XlsxWriter等库。

pandas有强大的excel数据处理和导入处理功能,本文简单介绍pandas在csv和excel等格式方面处理的应用及绘制图表等功能。

pandas处理excel依赖xlutils, OpenPyXL, XlsxWriter等库。

pandas数据读取概述

读写文本

Function Description
read_csv Load delimited data from a file, URL, or file-like object; use comma as default delimiter
read_table Load delimited data from a file, URL, or file-like object; use tab ('\t') as default delimiter
read_fwf Read data in fixed-width column format (i.e., no delimiters)
read_clipboard Version of Read_table that Reads data from the clipboard; useful for converting tables from web pages
read_excel Read tabular data from an Excel XLS or XLSX file
read_hdf Read HDF5 files written by pandas
read_html Read all tables found in the given HTML document
read_json Read data from a JSON (JavaScript Object Notation) string representation
read_msgpack Read pandas data encoded using the MessagePack binary format
read_pickle Read an arbitrary object stored in Python pickle format
read_sas Read a SAS dataset stored in one of the SAS system’s custom storage formats
read_sql Read the results of a SQL query (using SQLAlchemy) as a pandas DataFrame
read_stata Read a dataset from Stata file format
read_feather Read the Feather binary file format

参数主要涉及索引、类型推理和数据转换、日期时间处理、迭代、脏数据。

ex1.csv的内容如下:


a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

读取:


In [9]: df = pd.read_csv('examples/ex1.csv')
In [10]: df
Out[10]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

还可以改用read_table读取


In [11]: pd.read_table('examples/ex1.csv', sep=',')
Out[11]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

ex2.csv的内容如下:


1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

可以使用header=None表示没有列名,也可以用names自行指定列名,还可以使用index_col将列作为索引。


In [13]: pd.read_csv('examples/ex2.csv', header=None)
Out[13]:
0 1 2 3 4
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

In [14]: pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])
Out[14]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
6.1

In [15]: names = ['a', 'b', 'c', 'd', 'message']
In [16]: pd.read_csv('examples/ex2.csv', names=names, index_col='message')
Out[16]:
a b c d
message
hello 1 2 3 4
world 5 6 7 8
foo 9 10 11 12

csv_mindex.csv的内容:


key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16

建立层级索引:


In [18]: parsed = pd.read_csv('examples/csv_mindex.csv',
....: index_col=['key1', 'key2'])
In [19]: parsed
Out[19]:
value1 value2
key1 key2
one a 1 2
b 3 4
c 5 6
d 7 8
two a 9 10
b 11 12
c 13 14
d 15 16

用正则表达式处理混合的分隔符:


In [20]: list(open('examples/ex3.txt'))
Out[20]:
[' A B C\n',
'aaa -0.264438 -1.026059 -0.619500\n',
'bbb 0.927272 0.302904 -0.032399\n',
'ccc -0.264273 -0.386314 -0.217601\n',
'ddd -0.871858 -0.348382 1.100491\n']

In [21]: result = pd.read_table('examples/ex3.txt', sep='\s+')
In [22]: result
Out[22]:
A B C
aaa -0.264438 -1.026059 -0.619500
bbb 0.927272 0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382 1.100491

ex4.csv的内容:


# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

skiprows可以忽略行


In [24]: pd.read_csv('examples/ex4.csv', skiprows=[0, 2, 3])
Out[24]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

ex5.csv的内容:


something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo

可以指定哪些值为缺失值,甚至可以针对行指定缺失值。


In [26]: result = pd.read_csv('examples/ex5.csv')
In [27]: result
Out[27]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 two 5 6 NaN 8 world
2 three 9 10 11.0 12 foo
In [28]: pd.isnull(result)
Out[28]:
something a b c d message
0 False False False False False True
1 False False False True False False
2 False False False False False False

In [29]: result = pd.read_csv('examples/ex5.csv', na_values=['NULL'])
In [30]: result
Out[30]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 two 5 6 NaN 8 world
2 three 9 10 11.0 12 foo

In [31]: sentinels = {
    'message': ['foo', 'NA'], 'something': ['two']}
In [32]: pd.read_csv('examples/ex5.csv', na_values=sentinels)
Out[32]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 NaN 5 6 NaN 8 world
2 three 9 10 11.0 12 NaN

pandas.read_csv和pandas.read_table的常用参数如下:

Argument Description
path String indicating filesystem location, URL, or file-like object
sep or delimiter Character sequence or regular expression to use to split fields in each row
header Row number to use as column names; defaults to 0 (first row), but should be None if there is no header row。
index_col Column numbers or names to use as the row index in the result; can be a single name/number or alist of them for a hierarchical index
names List of column names for result, combine with header=None
skiprows Number of rows at beginning of file to ignore or list of row numbers (starting from 0) to skip.
na_values Sequence of values to replace with NA.
comment Character(s) to split comments off the end of lines.
parse_dates Attempt to parse data to datetime; False by default. If True, will attempt to parse all columns.Otherwise can specify a list of column numbers or name to parse. If element of list is tuple or list, willcombine multiple columns together and parse to date (e.g., if date/time split across two columns).
keep_date_col If joining columns to parse date, keep the joined columns; False by default.
converters Dict containing column number of name mapping to functions (e.g., {'foo': f} would apply the function f to all values in the 'foo' column).
dayfirst When parsing potentially ambiguous dates, treat as international format (e.g., 7/6/2012 -> June 7,2012); False by default.
date_parser Function to use to parse dates.
nrows Number of rows to read from beginning of file.
iterator Return a TextParser object for reading file piecemeal.
chunksize For iteration, size of file chunks.
skip_footer Number of lines to ignore at end of file.
verbose Print various parser output information, like the number of missing values placed in non-numericcolumns.
encoding Text encoding for Unicode (e.g., 'utf-8' for UTF-8 encoded text).
squeeze If the parsed data only contains one column, return a Series.
thousands Separator for thousands (e.g., ',' or '.').

参考资料:

更多参考:https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

CSV

使用pandas读写csv

pandas_parsing_and_write.py


import pandas as pd

input_file = r"supplier_data.csv"
output_file = r"output_files\1output.csv"

data_frame = pd.read_csv(input_file)
print(data_frame)
data_frame.to_csv(output_file, index=False)

当然也可以用python实现:

1csv_simple_parsing_and_write.py


input_file = r"supplier_data.csv"
output_file = r"output_files\1output.csv"

with open(input_file, newline='') as filereader:
    with open(output_file, 'w', newline='') as filewriter:
        for row in filereader:
            filewriter.write(row)

2csv_reader_parsing_and_write.py


import csv

input_file = r"supplier_data.csv"
output_file = r"output_files\2output.csv"

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file, delimiter=',')
        filewriter = csv.writer(csv_out_file, delimiter=',')
        for row_list in filereader:
            filewriter.writerow(row_list)

过滤特定行

  • 选择供应商名字包含Z或者Cost大于600的行

pandas_value_meets_condition.py


import pandas as pd

input_file = r"supplier_data.csv"
output_file = r"output_files\3output.csv"

data_frame = pd.read_csv(input_file)

data_frame['Cost'] = data_frame['Cost'].str.strip('$').astype(float)
data_frame_value_meets_condition = data_frame.loc[(data_frame['Supplier Name']\
.str.contains('Z')) | (data_frame['Cost'] > 600.0), :]

data_frame_value_meets_condition.to_csv(output_file, index=False)

注意pandas的strip连里面的内容都可以清除, 有点类似replace的功能。

  • 选择符合一个集合的数据:

选择日期为'1/20/14', '1/30/14'的行


import pandas as pd

input_file = r"supplier_data.csv"
output_file = r"output_files\4output.csv"

data_frame = pd.read_csv(input_file)

important_dates = ['1/20/14', '1/30/14']
data_frame_value_in_set = data_frame.loc[data_frame['Purchase Date']\
.isin(important_dates), :]

data_frame_value_in_set.to_csv(output_file, index=False)
  • 用正则表达式选择数据

pandas_value_matches_pattern.py


import pandas as pd

input_file = r"supplier_data.csv"
output_file = r"output_files\4output.csv"

data_frame = pd.read_csv(input_file)
data_frame_value_matches_pattern = data_frame.ix[data_frame['Invoice Number']\
.str.startswith("001-"), :]

data_frame_value_matches_pattern.to_csv(output_file, index=False)

过滤特定列

  • 选择0,3列

pandas_column_by_index.py


import pandas as pd
import sys

input_file = r"supplier_data.csv"
output_file = r"output_files\6output.csv"

data_frame = pd.read_csv(input_file)
data_frame_column_by_index = data_frame.iloc[:, [0, 3]]
data_frame_column_by_index.to_csv(output_file, index=False)

pandas_column_by_index.py


import pandas as pd

input_file = r"supplier_data.csv"
output_file = r"output_files\7output.csv"

data_frame = pd.read_csv(input_file)
data_frame_column_by_name = data_frame.loc[
    :, ['Invoice Number', 'Purchase Date']]
data_frame_column_by_name.to_csv(output_file, index=False)

pandas_select_contiguous_rows.py


import pandas as pd

input_file = r"supplier_data_unnecessary_header_footer.csv"
output_file = r"output_files\11output.csv"

data_frame = pd.read_csv(input_file, header=None)
data_frame = data_frame.drop([0,1,2,16,17,18])
data_frame.columns = data_frame.iloc[0]
data_frame = data_frame.reindex(data_frame.index.drop(3))
data_frame.to_csv(output_file, index=False)

添加行头

pandas_add_header_row.py


import pandas as pd

input_file = r"supplier_data_no_header_row.csv"
output_file = r"output_files\11output.csv"
header_list = ['Supplier Name', 'Invoice Number', \
'Part Number', 'Cost', 'Purchase Date']
data_frame = pd.read_csv(input_file, header=None, names=header_list)
data_frame.to_csv(output_file, index=False)

合并多个文件

pandas_concat_rows_from_multiple_files.py


import pandas as pd
import glob
import os

input_path = r"D:\code\foundations-for-analytics-with-python\csv"
output_file = r"output_files\12output.csv"

all_files = glob.glob(os.path.join(input_path,'sales_*'))
all_data_frames = []
for file in all_files:
    data_frame = pd.read_csv(file, index_col=None)
    all_data_frames.append(data_frame)
data_frame_concat = pd.concat(all_data_frames, axis=0, ignore_index=True)
data_frame_concat.to_csv(output_file, index = False)

求和和求平均值

pandas_sum_average_from_multiple_files.py


import pandas as pd
import glob
import os

input_path = r"D:\code\foundations-for-analytics-with-python\csv"
output_file = r"output_files\12output.csv"

all_files = glob.glob(os.path.join(input_path,'sales_*'))
all_data_frames = []
for input_file in all_files:
    print(input_file)
    data_frame = pd.read_csv(input_file, index_col=None)
    
    print(data_frame)
    
    sales = pd.DataFrame([float(str(value).strip('$').replace(',','')) 
      for value in data_frame.loc[:, 'Sale Amount']])
    
    total_cost = sales.sum()
    average_cost = sales.mean()

    data = {
    'file_name': os.path.basename(input_file),
            'total_sales': total_cost,
            'average_sales': average_cost}

    all_data_frames.append(pd.DataFrame(
        data, columns=['file_name', 'total_sales', 'average_sales']))

data_frames_concat = pd.concat(all_data_frames, axis=0, ignore_index=True)
data_frames_concat.to_csv(output_file, index = False)

XLS

使用pandas读写xls

pandas_parsing_and_write_keep_dates.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"
data_frame = pd.read_excel(input_file, sheetname='january_2013')

writer = pd.ExcelWriter(output_file)
data_frame.to_excel(writer, sheet_name='jan_13_output', index=False)
writer.save()

过滤特定行

  • 销售额大于1400的记录

pandas_value_meets_condition.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, 'january_2013', index_col=None)
data_frame_value_meets_condition = \
    data_frame[data_frame['Sale Amount'].astype(float) > 1400.0]

writer = pd.ExcelWriter(output_file)
data_frame_value_meets_condition.to_excel(
    writer, sheet_name='jan_13_output', index=False)
writer.save()
  • 指定日期的

pandas_value_in_set.py


import string

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, 'january_2013', index_col=None)

important_dates = ['01/24/2013','01/31/2013']
data_frame_value_in_set = data_frame[data_frame['Purchase Date'].isin(important_dates)]

writer = pd.ExcelWriter(output_file)
data_frame_value_in_set.to_excel(writer, sheet_name='jan_13_output', index=False)
writer.save()
  • 其他条件

startswith , endswith , match和search等。

pandas_value_matches_pattern.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, 'january_2013', index_col=None)

data_frame_value_matches_pattern = data_frame[
    data_frame['Customer Name'].str.startswith("J")]

writer = pd.ExcelWriter(output_file)
data_frame_value_matches_pattern.to_excel(
    writer, sheet_name='jan_13_output', index=False)
writer.save()

选取特定列

  • iloc基于index选取第2和第5列

import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, 'january_2013', index_col=None)

data_frame_column_by_index = data_frame.iloc[:, [1, 4]]

writer = pd.ExcelWriter(output_file)
data_frame_column_by_index.to_excel(
    writer, sheet_name='jan_13_output', index=False)
writer.save()
  • loc基于列名选取第2和第5列

pandas_column_by_name.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, 'january_2013', index_col=None)

data_frame_column_by_name = data_frame.loc[:, ['Customer ID', 'Purchase Date']]

writer = pd.ExcelWriter(output_file)
data_frame_column_by_name.to_excel(
    writer, sheet_name='jan_13_output', index=False)
writer.save()

操作所有sheet

  • 选取销售额大于2000的行

pandas_value_meets_condition_all_worksheets.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, sheetname=None, index_col=None)

row_output = []
for worksheet_name, data in data_frame.items():
    row_output.append(data[data['Sale Amount'].replace('$', '').
                           replace(',', '').astype(float) > 2000.0])
filtered_rows = pd.concat(row_output, axis=0, ignore_index=True)

writer = pd.ExcelWriter(output_file)
filtered_rows.to_excel(writer, sheet_name='sale_amount_gt2000', index=False)
writer.save()

  • loc基于列名选取所有sheet的第2和第5列

pandas_value_meets_condition_all_worksheets.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

data_frame = pd.read_excel(input_file, sheet_name=None, index_col=None)

column_output = []
for worksheet_name, data in data_frame.items():
    column_output.append(data.loc[:, ['Customer Name', 'Sale Amount']])
selected_columns = pd.concat(column_output, axis=0, ignore_index=True)

writer = pd.ExcelWriter(output_file)
selected_columns.to_excel(
        writer, sheet_name='selected_columns_all_worksheets', index=False)
writer.save()

操作部分sheet

  • 选取销售额大于2000的行

pandas_value_meets_condition_set_of_worksheets.py


import pandas as pd

input_file = "sales_2013.xlsx"
output_file = "pandas_output.xls"

my_sheets = [0,1]
threshold = 1900.0

data_frame = pd.read_excel(input_file, sheetname=my_sheets, index_col=None)

row_list = []
for worksheet_name, data in data_frame.items():
    row_list.append(data[data['Sale Amount'].replace('$', '').
                         replace(',', '').astype(float) > threshold])
filtered_rows = pd.concat(row_list, axis=0, ignore_index=True)

writer = pd.ExcelWriter(output_file)
filtered_rows.to_excel(writer, sheet_name='set_of_worksheets', index=False)
writer.save()

处理多个excel

  • 连接concat

pandas_concat_data_from_multiple_workbooks.py


import pandas as pd
import glob
import os

input_path = "/media/andrew/6446FA2346F9F5A0/code/foundations-for-analytics-\
with-python/excel"
output_file = "pandas_output.xls"

all_workbooks = glob.glob(os.path.join(input_path,'*.xls*'))
data_frames = []
for workbook in all_workbooks:
    all_worksheets = pd.read_excel(
            workbook, sheet_name=None, index_col=None)
    for worksheet_name, data in all_worksheets.items():
        data_frames.append(data)
all_data_concatenated = pd.concat(data_frames, axis=0, ignore_index=True)

writer = pd.ExcelWriter(output_file)
all_data_concatenated.to_excel(
        writer, sheet_name='all_data_all_workbooks', index=False)
writer.save()
  • 求和

pandas_sum_average_multiple_workbooks.py


import pandas as pd
import glob
import os

input_path = "/media/andrew/6446FA2346F9F5A0/code/foundations-for-analytics-\
with-python/excel"
output_file = "pandas_output.xls"

all_workbooks = glob.glob(os.path.join(input_path,'*.xls*'))
data_frames = []
for workbook in all_workbooks:
    all_worksheets = pd.read_excel(workbook, sheetname=None, index_col=None)
    workbook_total_sales = []
    workbook_number_of_sales = []
    worksheet_data_frames = []
    worksheets_data_frame = None
    workbook_data_frame = None
    for worksheet_name, data in all_worksheets.items():
        total_sales = pd.DataFrame(
            [float(str(value).strip('$').replace(',','')) for value in 
             data.ix[:, 'Sale Amount']]).sum()
        number_of_sales = len(data.loc[:, 'Sale Amount'])
        average_sales = pd.DataFrame(total_sales / number_of_sales)

        workbook_total_sales.append(total_sales)
        workbook_number_of_sales.append(number_of_sales)

        data = {
    'workbook': os.path.basename(workbook),
                'worksheet': worksheet_name,
                'worksheet_total': total_sales,
                'worksheet_average': average_sales}

        worksheet_data_frames.append(
            pd.DataFrame(data, 
                         columns=['workbook', 'worksheet', 'worksheet_total', 
                                  'worksheet_average']))
    worksheets_data_frame = pd.concat(
        worksheet_data_frames, axis=0, ignore_index=True)

    workbook_total = pd.DataFrame(workbook_total_sales).sum()
    workbook_total_number_of_sales = pd.DataFrame(
        workbook_number_of_sales).sum()
    workbook_average = pd.DataFrame(
        workbook_total / workbook_total_number_of_sales)

    workbook_stats = {
    'workbook': os.path.basename(workbook),
                      'workbook_total': workbook_total,
                      'workbook_average': workbook_average}

    workbook_stats = pd.DataFrame(workbook_stats, 
                                  columns=['workbook', 'workbook_total',
                                           'workbook_average'])
    workbook_data_frame = pd.merge(
        worksheets_data_frame, workbook_stats, on='workbook', how='left')
    data_frames.append(workbook_data_frame)

all_data_concatenated = pd.concat(data_frames, axis=0, ignore_index=True)

writer = pd.ExcelWriter(output_file)
all_data_concatenated.to_excel(
    writer, sheet_name='sums_and_averages', index=False)
writer.save()

使用excel绘制图表


import pandas as pd
import random

# Some sample data to plot.
cat_1 = ['y1', 'y2', 'y3', 'y4']
index_1 = range(0, 21, 1)
multi_iter1 = {
    'index': index_1}
for cat in cat_1:
    multi_iter1[cat] = [random.randint(10, 100) for x in index_1]

# Create a Pandas dataframe from the data.
index_2 = multi_iter1.pop('index')
df = pd.DataFrame(multi_iter1, index=index_2)
df = df.reindex(columns=sorted(df.columns))

# Create a Pandas Excel writer using XlsxWriter as the engine.
excel_file = 'legend.xlsx'
sheet_name = 'Sheet1'

writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name)

# Access the XlsxWriter workbook and worksheet objects from the dataframe.
workbook = writer.book
worksheet = writer.sheets[sheet_name]

# Create a chart object.
chart = workbook.add_chart({
    'type': 'line'})

# Configure the series of the chart from the dataframe data.
for i in range(len(cat_1)):
    col = i + 1
    chart.add_series({
        'name':       ['Sheet1', 0, col],
        'categories': ['Sheet1', 1, 0, 21, 0],
        'values':     ['Sheet1', 1, col, 21, col],
    })

# Configure the chart axes.
chart.set_x_axis({
    'name': 'Index'})
chart.set_y_axis({
    'name': 'Value', 'major_gridlines': {
    'visible': False}})

# Insert the chart into the worksheet.
worksheet.insert_chart('G2', chart)

# Close the Pandas Excel writer and output the Excel file.
writer.save()

讨论 钉钉免费群21745728 qq群144081101 567351477

参考资料:http://pandas-xlsxwriter-charts.readthedocs.io/

相关文章
|
1月前
|
数据采集 存储 数据挖掘
Python数据分析:Pandas库的高效数据处理技巧
【10月更文挑战第27天】在数据分析领域,Python的Pandas库因其强大的数据处理能力而备受青睐。本文介绍了Pandas在数据导入、清洗、转换、聚合、时间序列分析和数据合并等方面的高效技巧,帮助数据分析师快速处理复杂数据集,提高工作效率。
68 0
|
2月前
|
人工智能 Python
读取excel工具:openpyxl | AI应用开发
`openpyxl` 是一个 Python 库,专门用于读写 Excel 2010 xlsx/xlsm/xltx/xltm 文件。它是处理 Excel 文件的强大工具,可以让你在不需要安装 Excel 软件的情况下,对 Excel 文件进行创建、修改、读取和写入操作【10月更文挑战第3天】
85 0
|
9天前
|
监控 数据可视化 数据挖掘
数据看板制作工具评测:这6款工具能如何提升企业的数据分析效率?
本文介绍了6款数据看板制作工具,包括板栗看板、Tableau、Power BI、Qlik Sense、Google Data Studio和Looker,从功能、适用场景等方面进行了详细对比,旨在帮助企业选择最合适的工具以实现高效的数据可视化和管理决策。
|
14天前
|
数据挖掘 关系型数据库 Serverless
利用数据分析工具评估特定业务场景下扩缩容操作对性能的影响
通过以上数据分析工具的运用,可以深入挖掘数据背后的信息,准确评估特定业务场景下扩缩容操作对 PolarDB Serverless 性能的影响。同时,这些分析结果还可以为后续的优化和决策提供有力的支持,确保业务系统在不断变化的环境中保持良好的性能表现。
24 2
|
18天前
|
数据可视化 数据挖掘 项目管理
打破协作壁垒,Excel多人协同编辑工具带来翻天覆地的变化!
在现代办公中,团队协作和信息共享至关重要。Excel的多人协同编辑功能显著提升了工作效率,避免了版本冲突和重复劳动。市场上的Google Sheets、Airtable、板栗看板和Zoho Sheet等工具也提供了类似功能。以其清晰的界面和强大的数据分析能力,特别适合项目管理和进度追踪,帮助团队高效协作,达成目标。
|
1月前
|
数据处理 Python
在数据科学领域,Pandas和NumPy是每位数据科学家和分析师的必备工具
在数据科学领域,Pandas和NumPy是每位数据科学家和分析师的必备工具。本文通过问题解答形式,深入探讨Pandas与NumPy的高级操作技巧,如复杂数据筛选、分组聚合、数组优化及协同工作,结合实战演练,助你提升数据处理能力和工作效率。
40 5
|
1月前
|
SQL 数据可视化 数据挖掘
想让Excel表格设计更美观?试试这几款好用工具!
Excel表格设计在项目管理和数据分析中至关重要。本文推荐四款辅助工具:板栗看板、Excel自动图表助手、Think-Cell Chart 和 Power BI,分别在任务管理、图表生成、数据可视化等方面表现突出,帮助你设计出更专业、美观的表格。
54 2
|
1月前
|
机器学习/深度学习 数据采集 数据挖掘
解锁 Python 数据分析新境界:Pandas 与 NumPy 高级技巧深度剖析
Pandas 和 NumPy 是 Python 中不可或缺的数据处理和分析工具。本文通过实际案例深入剖析了 Pandas 的数据清洗、NumPy 的数组运算、结合两者进行数据分析和特征工程,以及 Pandas 的时间序列处理功能。这些高级技巧能够帮助我们更高效、准确地处理和分析数据,为决策提供支持。
40 2
|
1月前
|
存储 数据挖掘 数据处理
Python数据分析:Pandas库的高效数据处理技巧
【10月更文挑战第26天】Python 是数据分析领域的热门语言,Pandas 库以其高效的数据处理功能成为数据科学家的利器。本文介绍 Pandas 在数据读取、筛选、分组、转换和合并等方面的高效技巧,并通过示例代码展示其实际应用。
40 2
|
1月前
|
数据处理
在Excel中,通配符是一种强大的工具
【10月更文挑战第23天】在Excel中,通配符是一种强大的工具
22 4