
DataWorks: csv and text files both parse fine and the package is installed, so why can't I parse Excel?

In DataWorks I uploaded the package, but I can't parse Excel files. csv and text both work, and I have installed the package as well.
It just insists on adding a tar. I tried both options and the result is the same. I can now extract the archive and import the package, but parsing the Excel file raises an error:

File "", line 18, in
    excelData = pd.read_excel('a.xlsx')
  File "/home/tops/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/home/tops/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 824, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/tops/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/home/tops/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/home/tops/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 36, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/home/admin/alisatasknode/taskinfo/20230908/datastudio/13/48/05/a7vn7osd4tfj85n6fqwy793d/packages/xlrd/__init__.py", line 166, in open_workbook
    file_format = inspect_format(filename, file_contents)
  File "/home/admin/alisatasknode/taskinfo/20230908/datastudio/13/48/05/a7vn7osd4tfj85n6fqwy793d/packages/xlrd/__init__.py", line 67, in inspect_format
    zf = zipfile.ZipFile(timemachine.BYTES_IO(content) if content else path)
  File "/home/tops/lib/python3.7/zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "/home/tops/lib/python3.7/zipfile.py", line 1307, in _RealGetContents
    fp.seek(self.start_dir, 0)
OSError: [Errno 22] Invalid argument

Everything is fine on this side, but once the file is downloaded into the sandbox environment it has an extra tar.
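The traceback ends inside zipfile, which xlrd uses to inspect the workbook, so one thing worth checking is whether the file that reaches the sandbox is still a well-formed ZIP container (an .xlsx file is a ZIP archive). The sketch below is only a diagnostic aid, not a fix. The helper name describe_file is made up for illustration, and it assumes the file is readable from the task's working directory.

```python
import os
import tarfile
import zipfile


def describe_file(path):
    """Collect basic facts that help tell a real .xlsx from a wrapped or mislabeled file.

    Hypothetical helper for illustration; not part of DataWorks, pandas, or xlrd.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)  # a ZIP-based .xlsx normally starts with b"PK"
    return {
        "size": os.path.getsize(path),
        "head": head,
        "is_zip": zipfile.is_zipfile(path),  # True for a well-formed ZIP/.xlsx
        "is_tar": tarfile.is_tarfile(path),  # True if the bytes are actually a tar archive
    }


# Example use inside the task (the file name is taken from the thread):
# print(describe_file("a.xlsx"))
```

If is_zip comes back False while is_tar is True, the bytes on disk are probably not the workbook that was uploaded, which would be consistent with the "extra tar" observation, though the thread does not confirm the cause.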
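Running an operating system command from Python suggests that, inside the container, this file simply has an extra tar added. Create a PyODPS 3 task and try this:

import os
import sys

##@resource_reference{"db_conf.xlsx"}
##@resource_reference{"xlrd.gz"}
##@resource_reference{"a.xlsx"}

os.system('ls -trl')

All the other files are normal; only this tar file is not. DataWorks probably recognizes it as a tar-format file and then renames it.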
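The same check can be done without shelling out. The following is a minimal sketch that lists files in the working directory whose names start with the expected resource name, so a renamed copy shows up next to its size. The function name find_resource_candidates is hypothetical, and matching by prefix is an assumption: the thread only says an extra tar appears, not exactly how the name changes.

```python
import os


def find_resource_candidates(expected_name, directory="."):
    """Return (name, size) pairs for files whose name starts with expected_name.

    Hypothetical helper; prefix matching is an assumption about how the
    platform might rename a resource, not documented DataWorks behavior.
    """
    matches = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.startswith(expected_name):
            matches.append((entry.name, entry.stat().st_size))
    return sorted(matches)


# Example use inside the task (resource name taken from the thread):
# print(find_resource_candidates("xlrd.gz"))
```

If the only match carries a different name from the one the script passes to its reader or extractor, pointing the script at the actual name is a reasonable next experiment.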

Asked by 真的很搞笑, 2023-09-12 16:53:55
1 answer
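  • Did you select File as the resource type?
    I tried it and could not reproduce the problem. Could the Python output have been processed somehow? Run list resources; and take a look.
    Create an ODPS SQL task to run it. As for the Excel parsing problem, if it is an issue with the PyODPS script... This answer was compiled from the DingTalk group "DataWorks交流群(答疑@机器人)".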

    2023-09-12 22:33:55
