
Python: Is there a way to require that functions in a class return a specific data type?

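I understand that Python is not statically typed and has no keywords for declaring an enforced return type, such as void and int in Java and C. I also know that type hints can tell users that a function is expected to return something of a particular type.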
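
To illustrate that limitation, here is a minimal sketch (the function `row_count` is hypothetical, not part of my project). The return hint promises an `int`, yet the interpreter runs the function without complaint when it returns a `str`; only a static checker such as mypy would flag the mismatch.

```python
def row_count(rows: list) -> int:
    # The annotation promises an int, but nothing checks it at run time.
    return str(len(rows))


result = row_count([1, 2, 3])
print(type(result).__name__)      # the call succeeds and yields a str
print(row_count.__annotations__)  # the hints are only stored as metadata
```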

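I am trying to implement a Python class that reads a configuration file (for example, a JSON file) specifying which data transformation methods should be applied to a pandas DataFrame. The configuration file looks like this: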

[
  {
    "input_folder_path": "./input/budget/",
    "input_file_name_or_pattern": "Global Budget Roll-up_9.16.19.xlsx",
    "sheet_name_of_excel_file": "Budget Roll-Up",
    "output_folder_path": "./output/budget/",
    "output_file_name_prefix": "transformed_budget_",

    "__comment__": "(Optional) File with Python class that houses data transformation functions, which will be imported and used in the transform process. If not provided, then the code will use default class in the 'transform_function.py' file.",
    "transform_functions_file": "./transform_functions/budget_transform_functions.py",

    "row_number_of_column_headers": 0,
    "row_number_where_data_starts": 1,
    "number_of_rows_to_skip_from_the_bottom_of_the_file": 0,

    "__comment__": "(Required) List of the functions and their parameters.",
    "__comment__": "These functions must be defined either in transform_functions.py or individual transformation file such as .\\transform_function\\budget_transform_functions.py",
    "functions_to_apply": [
      {
        "__function_comment__": "Drop empty columns in Budget roll up Excel file. No parameters required.",
        "function_name": "drop_unnamed_columns"
      },
      {
        "__function_comment__": "By the time we run this function, there should be only 13 columns total remaining in the raw data frame.",
        "function_name": "assert_number_of_columns_equals",
        "function_args": [13]
      },
      {
        "__function_comment__": "Map raw channel names 'Ecommerce' and 'ecommerce' to 'E-Commerce'.",
        "function_name": "standardize_to_ecommerce",
        "function_args": [["Ecommerce", "ecommerce"]]
      }
    ]
  }
]

In main.py, I have something like the following:

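import argparse
import os

import pandas as pd

# Project-local modules from the question (their contents are not shown)
import transform_errors
import transform_utils

if __name__ == '__main__':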
    # 1. Process arguments passed into the program
    parser = argparse.ArgumentParser(description=transform_utils.DESC,
                                     formatter_class=argparse.RawTextHelpFormatter,
                                     usage=argparse.SUPPRESS)
    parser.add_argument('-c', required=True, type=str,
                        help=transform_utils.HELP)
    args = parser.parse_args()

    # 2. Load JSON configuration file
    if (not args.c) or (not os.path.exists(args.c)):
        raise transform_errors.ConfigFileError()

    # 3. Iterate through each transform procedure in config file
    for config in transform_utils.load_config(args.c):
        output_file_prefix = transform_utils.get_output_file_path_with_name_prefix(config)
        custom_transform_funcs_module = transform_utils.load_custom_functions(config)

        row_idx_where_data_starts = transform_utils.get_row_index_where_data_starts(config)
        footer_rows_to_skip = transform_utils.get_number_of_rows_to_skip_from_bottom(config)

        for input_file in transform_utils.get_input_files(config):
            print("Processing file:", input_file)
            col_headers_from_input_file = transform_utils.get_raw_column_headers(input_file, config)

            if transform_utils.is_excel(input_file):
                sheet = transform_utils.get_sheet(config)
                print("Skipping this many rows (including header row) from the top of the file:", row_idx_where_data_starts)
                cur_df = pd.read_excel(input_file,
                                       sheet_name=sheet,
                                       skiprows=row_idx_where_data_starts,
                                       skipfooter=footer_rows_to_skip,
                                       header=None,
                                       names=col_headers_from_input_file
                                       )
                custom_funcs_instance = custom_transform_funcs_module.TaskSpecificTransformFunctions()

                for func_and_params in transform_utils.get_functions_to_apply(config):
                    print("=>Invoking transform function:", func_and_params)
                    func_args = transform_utils.get_transform_function_args(func_and_params)
                    func_kwargs = transform_utils.get_transform_function_kwargs(func_and_params)
                    cur_df = getattr(custom_funcs_instance,
                                     transform_utils.get_transform_function_name(
                                         func_and_params))(cur_df, *func_args, **func_kwargs)
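
Since every transform is dispatched from this one loop, the return value could also be checked right at the call site. The following is only a sketch under assumptions: `apply_transforms`, `DemoTransforms`, and the `(name, args, kwargs)` tuples are hypothetical stand-ins for the config-driven loop above, and plain lists stand in for dataframes so the example has no dependencies. In the real loop, `expected_type` would be `pd.DataFrame`.

```python
def apply_transforms(instance, steps, value, expected_type):
    """Call each named method on `instance` and verify what it returns.

    `steps` is a list of (method_name, args, kwargs) tuples, a simplified,
    hypothetical stand-in for the entries under "functions_to_apply".
    """
    for name, args, kwargs in steps:
        value = getattr(instance, name)(value, *args, **kwargs)
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{name} returned {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
    return value


class DemoTransforms:
    # Stand-in methods; lists play the role of dataframes here.
    def drop_negatives(self, rows):
        return [r for r in rows if r >= 0]

    def forgot_to_return(self, rows):
        rows.sort()  # returns None by mistake


ok = apply_transforms(DemoTransforms(), [("drop_negatives", [], {})], [3, -1, 2], list)

try:
    apply_transforms(DemoTransforms(), [("forgot_to_return", [], {})], [3, 1], list)
except TypeError as exc:
    error_message = str(exc)
```

Checking in the loop keeps the transform classes untouched, but it only protects calls that go through this loop.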

In the budget_transform_functions.py file, I have:

class TaskSpecificTransformFunctions(TransformFunctions):
    def drop_unnamed_columns(self, df):
        """
        Drop columns that have 'Unnamed' as column header, which is a usual
        occurrence for some Excel/CSV raw data files with empty but hidden columns.
        Args:
            df: Raw dataframe to transform.

        Returns:
            Dataframe whose 'Unnamed' columns are dropped.
        """
        return df.loc[:, ~df.columns.str.contains(r'Unnamed')]

    def assert_number_of_columns_equals(self, df, num_of_cols_expected):
        """
        Assert that the total number of columns in the dataframe
        is equal to num_of_cols_expected (int).

        Args:
            df: Raw dataframe to transform.
            num_of_cols_expected: Number of columns expected (int).

        Returns:
            The original dataframe is returned if the assertion is successful.

        Raises:
            ColumnCountError: If the number of columns found
            does not equal what is expected.
        """
        if df.shape[1] != num_of_cols_expected:
            raise transform_errors.ColumnCountError(
                ' '.join(["Expected column count of:", str(num_of_cols_expected),
                          "but found:", str(df.shape[1]), "in the current dataframe."])
            )
        else:
            print("Successfully checked that the current dataframe has:", num_of_cols_expected, "columns.")

        return df

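As you can see, I need whoever writes a transform-functions file like budget_transform_functions.py in the future to guarantee that the functions in TaskSpecificTransformFunctions always return a pandas DataFrame. I know that in Java you can create an interface, and anyone who implements it must honor the return type of every method in that interface. I would like to know whether Python has a similar construct (or a workaround that achieves something similar).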
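
For comparison, the closest standard-library analogue to a Java interface that I am aware of is an abstract base class from the `abc` module. The sketch below uses hypothetical names (`TransformInterface`, `Incomplete`, `WrongReturn`) and plain lists in place of dataframes. It suggests that an ABC forces subclasses to define a method, but does not enforce the annotated return type.

```python
from abc import ABC, abstractmethod


class TransformInterface(ABC):
    @abstractmethod
    def transform(self, rows: list) -> list:
        """Subclasses must implement this; the return hint is not enforced."""


class Incomplete(TransformInterface):
    pass  # does not implement transform()


class WrongReturn(TransformInterface):
    def transform(self, rows: list) -> list:
        return None  # accepted at run time despite the annotation


try:
    Incomplete()
    instantiated = True
except TypeError:
    instantiated = False  # a class with unimplemented abstract methods cannot be instantiated

wrong = WrongReturn().transform([1, 2])
```

An ABC also fixes the method names in advance, which may not fit a design where the names come from a configuration file.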

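I hope this lengthy question makes sense, and I hope someone with more Python experience than I have can teach me something about this. Thank you very much in advance for your answers and suggestions!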

Source of question: Stack Overflow

Asked by is大龙 on 2020-03-24 21:39:34
1 answer
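  • One way to check a function's return type, at least at run time, is to wrap the function in another function that checks the return value. To apply this to subclasses automatically, there is __init_subclass__. It can be used as follows (it still needs polishing and handling of special cases):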

    import pandas as pd
    
    def wrapCheck(f):
        def checkedCall(*args, **kwargs):
            r = f(*args, **kwargs)
            if not isinstance(r, pd.DataFrame):
                raise Exception(f"Bad return value of {f.__name__}: {r!r}")
    
            return r
    
        return checkedCall
    
    
    class TransformFunctions:
    
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            for k, v in cls.__dict__.items():
                if callable(v):
                    setattr(cls, k, wrapCheck(v))
    
    
    
    class TryTransform(TransformFunctions):
    
        def createDf(self):
            return pd.DataFrame(data={"a":[1,2,3], "b":[4,5,6]})
    
    
        def noDf(self, a, b):
            return a + b
    
    
    tt = TryTransform()
    
    print(tt.createDf())   # Works
    print(tt.noDf(2, 2))   # Fails with exception
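
One possible refinement of the wrapper idea, offered as a sketch rather than a finished solution: read the expected type from each method's own return annotation instead of hard-coding `pd.DataFrame`, keep the original function metadata with `functools.wraps`, and wrap only plain functions. The names `enforce_return_hint`, `CheckedBase`, and `Demo` are hypothetical, lists stand in for dataframes so the example has no dependencies, and parameterized hints such as `list[int]` are out of scope here.

```python
import functools
import inspect
import typing


def enforce_return_hint(func):
    """Wrap `func` so its result is checked against its return annotation."""
    expected = typing.get_type_hints(func).get("return")
    if not isinstance(expected, type):
        return func  # no return hint, or not a plain class: leave unwrapped

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if not isinstance(result, expected):
            raise TypeError(
                f"{func.__name__} returned {type(result).__name__}, "
                f"expected {expected.__name__}"
            )
        return result

    return wrapper


class CheckedBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, attr in list(vars(cls).items()):
            # Only plain functions: skips dunders, staticmethods, nested classes.
            if inspect.isfunction(attr) and not name.startswith("__"):
                setattr(cls, name, enforce_return_hint(attr))


class Demo(CheckedBase):
    def good(self, rows) -> list:
        return list(rows)

    def bad(self, rows) -> list:
        return None  # violates the hint; raises TypeError when called

    def unchecked(self, rows):
        return None  # no return hint, so it is left unwrapped
```

With this variant, a transform method annotated `-> pd.DataFrame` would be checked automatically, while helper methods without a return hint are left alone.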
    

    Source of answer: Stack Overflow

    2020-03-24 21:39:42