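I know that Python is dynamically typed and has no keywords for declaring a return type, such as void and int in Java and C. I also know that we can use type hints to tell users what type they can expect a function to return.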
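As a small illustration of the point about type hints: annotations are stored on the function but are not enforced when it runs. The function names below are made up for this example.

```python
from typing import get_type_hints


def count_rows(rows: list) -> int:
    """Annotated to return int, and does so."""
    return len(rows)


def broken_count(rows: list) -> int:
    """Annotated to return int, but actually returns a str."""
    return str(len(rows))


# The annotation is available for inspection...
print(get_type_hints(broken_count)["return"])

# ...but Python does not check it at call time, so this runs without error.
result = broken_count([1, 2, 3])
print(type(result).__name__)
```

A static checker such as mypy would report the mismatch in broken_count, but only if someone runs the checker; nothing stops the code from executing.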
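I am trying to implement a Python class that reads a configuration file (for example, a JSON file) specifying which data transformation methods should be applied to a pandas dataframe. The configuration file looks like this: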
[
  {
    "input_folder_path": "./input/budget/",
    "input_file_name_or_pattern": "Global Budget Roll-up_9.16.19.xlsx",
    "sheet_name_of_excel_file": "Budget Roll-Up",
    "output_folder_path": "./output/budget/",
    "output_file_name_prefix": "transformed_budget_",
    "__comment__": "(Optional) File with Python class that houses data transformation functions, which will be imported and used in the transform process. If not provided, then the code will use default class in the 'transform_function.py' file.",
    "transform_functions_file": "./transform_functions/budget_transform_functions.py",
    "row_number_of_column_headers": 0,
    "row_number_where_data_starts": 1,
    "number_of_rows_to_skip_from_the_bottom_of_the_file": 0,
    "__comment__": "(Required) List of the functions and their parameters.",
    "__comment__": "These functions must be defined either in transform_functions.py or individual transformation file such as .\\transform_function\\budget_transform_functions.py",
    "functions_to_apply": [
      {
        "__function_comment__": "Drop empty columns in Budget roll up Excel file. No parameters required.",
        "function_name": "drop_unnamed_columns"
      },
      {
        "__function_comment__": "By the time we run this function, there should be only 13 columns total remaining in the raw data frame.",
        "function_name": "assert_number_of_columns_equals",
        "function_args": [13]
      },
      {
        "__function_comment__": "Map raw channel names 'Ecommerce' and 'ecommerce' to 'E-Commerce'.",
        "transform_function_name": "standardize_to_ecommerce",
        "transform_function_args": [["Ecommerce", "ecommerce"]]
      }
    ]
  }
]
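Two details of this config matter when it is loaded: the "__comment__" key is repeated within one object, and the function entries use two key spellings ("function_name" and "transform_function_name"). How transform_utils deals with either is not shown in the question, so the following is only a sketch, with a hypothetical helper get_name_and_args and a shortened copy of the config.

```python
import json

# Shortened copy of the config, keeping the repeated key and both spellings.
CONFIG_TEXT = """
[
  {
    "__comment__": "first comment",
    "__comment__": "second comment",
    "functions_to_apply": [
      {"function_name": "drop_unnamed_columns"},
      {"function_name": "assert_number_of_columns_equals", "function_args": [13]},
      {"transform_function_name": "standardize_to_ecommerce",
       "transform_function_args": [["Ecommerce", "ecommerce"]]}
    ]
  }
]
"""


def get_name_and_args(entry):
    """Hypothetical helper: accept either key spelling used in the config."""
    name = entry.get("function_name") or entry.get("transform_function_name")
    if name is None:
        raise KeyError(f"No function name in entry: {entry!r}")
    args = entry.get("function_args") or entry.get("transform_function_args") or []
    return name, args


config = json.loads(CONFIG_TEXT)[0]

# Python's json module keeps only the last value when a key is repeated.
print(config["__comment__"])

calls = [get_name_and_args(entry) for entry in config["functions_to_apply"]]
for name, args in calls:
    print(name, args)
```

Because repeated keys silently collapse, comments stored this way are only useful to someone reading the file, not to the program.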
In main.py, I have something like the following:
import argparse
import os

import pandas as pd

# Project modules referenced below.
import transform_errors
import transform_utils

if __name__ == '__main__':
    # 1. Process arguments passed into the program
    parser = argparse.ArgumentParser(description=transform_utils.DESC,
                                     formatter_class=argparse.RawTextHelpFormatter,
                                     usage=argparse.SUPPRESS)
    parser.add_argument('-c', required=True, type=str,
                        help=transform_utils.HELP)
    args = parser.parse_args()
    # 2. Load JSON configuration file
    if (not args.c) or (not os.path.exists(args.c)):
        raise transform_errors.ConfigFileError()
    # 3. Iterate through each transform procedure in config file
    for config in transform_utils.load_config(args.c):
        output_file_prefix = transform_utils.get_output_file_path_with_name_prefix(config)
        custom_transform_funcs_module = transform_utils.load_custom_functions(config)
        row_idx_where_data_starts = transform_utils.get_row_index_where_data_starts(config)
        footer_rows_to_skip = transform_utils.get_number_of_rows_to_skip_from_bottom(config)
        for input_file in transform_utils.get_input_files(config):
            print("Processing file:", input_file)
            col_headers_from_input_file = transform_utils.get_raw_column_headers(input_file, config)
            if transform_utils.is_excel(input_file):
                sheet = transform_utils.get_sheet(config)
                print("Skipping this many rows (including header row) from the top of the file:", row_idx_where_data_starts)
                cur_df = pd.read_excel(input_file,
                                       sheet_name=sheet,
                                       skiprows=row_idx_where_data_starts,
                                       skipfooter=footer_rows_to_skip,
                                       header=None,
                                       names=col_headers_from_input_file
                                       )
                custom_funcs_instance = custom_transform_funcs_module.TaskSpecificTransformFunctions()
                # Apply each configured transform in order, feeding the result into the next one
                for func_and_params in transform_utils.get_functions_to_apply(config):
                    print("=>Invoking transform function:", func_and_params)
                    func_args = transform_utils.get_transform_function_args(func_and_params)
                    func_kwargs = transform_utils.get_transform_function_kwargs(func_and_params)
                    cur_df = getattr(custom_funcs_instance,
                                     transform_utils.get_transform_function_name(
                                         func_and_params))(cur_df, *func_args, **func_kwargs)
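Since every transform is invoked from one place in this loop, that call site is itself a possible spot to enforce the return type. The sketch below is an assumption, not part of the original code: apply_checked and DemoTransforms are hypothetical names, and a stand-in Frame class replaces pd.DataFrame so the snippet runs without pandas.

```python
class Frame:
    """Stand-in for pandas.DataFrame so the sketch has no dependencies."""

    def __init__(self, columns):
        self.columns = list(columns)


class DemoTransforms:
    """Hypothetical transform class with one good and one bad method."""

    def drop_unnamed_columns(self, df):
        return Frame(c for c in df.columns if "Unnamed" not in c)

    def forgets_to_return(self, df):
        renamed = Frame(c.upper() for c in df.columns)  # built, but never returned


def apply_checked(instance, func_name, df, *args, expected_type=Frame, **kwargs):
    """Look up a method by name, call it, and verify the type of its result."""
    result = getattr(instance, func_name)(df, *args, **kwargs)
    if not isinstance(result, expected_type):
        raise TypeError(
            f"{func_name} returned {type(result).__name__}, "
            f"expected {expected_type.__name__}"
        )
    return result


funcs = DemoTransforms()
cur_df = Frame(["a", "Unnamed: 1", "b"])
cur_df = apply_checked(funcs, "drop_unnamed_columns", cur_df)
print(cur_df.columns)

try:
    apply_checked(funcs, "forgets_to_return", cur_df)
except TypeError as err:
    print("Rejected:", err)
```

In the real loop, expected_type would presumably be pd.DataFrame. This approach needs no cooperation from the transform classes, but it only catches a bad return value when the function is actually run.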
In the budget_transform_functions.py file, I have:
class TaskSpecificTransformFunctions(TransformFunctions):
    def drop_unnamed_columns(self, df):
        """
        Drop columns that have 'Unnamed' as column header, which is a usual
        occurrence for some Excel/CSV raw data files with empty but hidden columns.
        Args:
            df: Raw dataframe to transform.
        Returns:
            Dataframe whose 'Unnamed' columns are dropped.
        """
        return df.loc[:, ~df.columns.str.contains(r'Unnamed')]

    def assert_number_of_columns_equals(self, df, num_of_cols_expected):
        """
        Assert that the total number of columns in the dataframe
        is equal to num_of_cols_expected (int).
        Args:
            df: Raw dataframe to transform.
            num_of_cols_expected: Number of columns expected (int).
        Returns:
            The original dataframe is returned if the assertion is successful.
        Raises:
            ColumnCountError: If the number of columns found
            does not equal what is expected.
        """
        if df.shape[1] != num_of_cols_expected:
            raise transform_errors.ColumnCountError(
                ' '.join(["Expected column count of:", str(num_of_cols_expected),
                          "but found:", str(df.shape[1]), "in the current dataframe."])
            )
        else:
            print("Successfully checked that the current dataframe has:", num_of_cols_expected, "columns.")
        return df
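As you can see, I need future implementers of a future_transform_functions.py file to be aware that the functions in TaskSpecificTransformFunctions must always return a pandas dataframe. I know that in Java you can create an interface, and whoever implements it has to abide by the return type of each method in that interface. I am wondering whether Python has a similar construct (or a workaround that achieves something similar).
I hope this lengthy question makes sense, and I am hoping that someone with more Python experience than me can teach me something about this. Thank you very much in advance for your answers and suggestions!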
Question source: stackoverflow
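On the Java interface comparison: the closest standard-library construct is probably an abstract base class from the abc module. It forces subclasses to define the listed methods, but it does not check what those methods return. The sketch below uses hypothetical class names and a stand-in Frame class in place of pd.DataFrame.

```python
from abc import ABC, abstractmethod


class Frame:
    """Stand-in for pandas.DataFrame so the sketch has no dependencies."""


class TransformInterface(ABC):
    """Hypothetical interface: concrete subclasses must define clean()."""

    @abstractmethod
    def clean(self, df: Frame) -> Frame:
        ...


class Complete(TransformInterface):
    def clean(self, df: Frame) -> Frame:
        return df


class Incomplete(TransformInterface):
    pass  # does not override clean()


class WrongReturn(TransformInterface):
    def clean(self, df: Frame) -> Frame:
        return None  # contradicts the annotation, yet Python allows it


Complete().clean(Frame())  # fine

try:
    Incomplete()  # abstract method missing, so instantiation fails
except TypeError as err:
    print("Cannot instantiate:", err)

# Runs without error: the return annotation is not enforced at runtime.
print(WrongReturn().clean(Frame()))
```

So an abstract base class covers the "must implement these methods" half of a Java interface. The "must return this type" half needs either a static checker such as mypy run over the annotations, or a runtime check.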
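One way to check a function's return type, at least at runtime, is to wrap the function in another function that checks the return type. To automate this for subclasses, there is __init_subclass__. It can be used as follows (it still needs polishing and handling of special cases):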
import pandas as pd

def wrapCheck(f):
    def checkedCall(*args, **kwargs):
        r = f(*args, **kwargs)
        if not isinstance(r, pd.DataFrame):
            raise Exception(f"Bad return value of {f.__name__}: {r!r}")
        return r
    return checkedCall

class TransformFunctions:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for k, v in cls.__dict__.items():
            if callable(v):
                setattr(cls, k, wrapCheck(v))

class TryTransform(TransformFunctions):
    def createDf(self):
        return pd.DataFrame(data={"a":[1,2,3], "b":[4,5,6]})
    def noDf(self, a, b):
        return a + b

tt = TryTransform()
print(tt.createDf()) # Works
print(tt.noDf(2, 2)) # Fails with exception
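As for the polishing mentioned above, one possible refinement (a sketch under my own assumptions, not the answerer's code) is to preserve each method's name and docstring with functools.wraps, raise TypeError, and wrap only public plain functions so that helpers starting with an underscore are left alone. The names check_returns, CheckedTransforms and Demo are hypothetical, and Frame stands in for pd.DataFrame so the snippet runs without pandas.

```python
import functools
import inspect


class Frame:
    """Stand-in for pandas.DataFrame so the sketch has no dependencies."""


def check_returns(expected_type):
    """Build a decorator that verifies the wrapped function's return type."""
    def decorator(func):
        @functools.wraps(func)  # keep __name__ and __doc__ of the original
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not isinstance(result, expected_type):
                raise TypeError(
                    f"{func.__qualname__} returned {type(result).__name__}, "
                    f"expected {expected_type.__name__}"
                )
            return result
        return wrapper
    return decorator


class CheckedTransforms:
    """Base class: wraps public methods defined directly on each subclass."""
    expected_type = Frame

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Copy the items first so the class can be modified while looping.
        for name, attr in list(vars(cls).items()):
            # Only plain functions; skip private and dunder names.
            if inspect.isfunction(attr) and not name.startswith("_"):
                setattr(cls, name, check_returns(cls.expected_type)(attr))


class Demo(CheckedTransforms):
    def good(self, df):
        """Returns a Frame."""
        return df

    def bad(self, df):
        return 42

    def _helper(self):
        return "not checked"


demo = Demo()
frame = Frame()
print(demo.good(frame) is frame)
try:
    demo.bad(frame)
except TypeError as err:
    print("Rejected:", err)
```

Static methods, class methods and properties are deliberately not wrapped here, because inspect.isfunction does not match their descriptor objects; handling them would need extra cases.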
Answer source: stackoverflow