Installing a Dependency Library for Function Compute

本文涉及的产品
函数计算FC,每月15万CU 3个月
Serverless 应用引擎免费试用套餐包,4320000 CU,有效期3个月
简介: In common programming practice, projects, libraries, and system environments must be installed and configured in synergy.

In common programming practice, projects, libraries, and system environments must be installed and configured in synergy. Alibaba Cloud Function Compute runs in a prefabricated runtime environment, which pursues higher concurrency and security by sacrificing flexibility. Because the system and code directories are read-only during runtime, dependency libraries need to be pre-installed to the code but not system directory. The installation tool of the new Function Compute platform cannot yet address these changes. This article explains how to use existing tools to install the dependency library to a project with minimal manual intervention.

Two types of dependent packages are required for developing Function Compute, one of which is the DEB software package installed by the APT package manager. The other is the packages installed by a specific language environment manager (such as Maven or pip). The following analyzes the package managers in different language environments.

Installation Directory of a Package Manager

Currently, Function Compute supports the Java, Python, and Node.js environments. The package managers for these three languages are Maven, pip, and NPM respectively. The following describes the installation directory of each of these package managers.

maven

Maven is the package manager for Java. Maven downloads the dependencies declared in the project file pom.xml from the central repository or a private repository to the $M2_HOME/repository directory. The default value of M2_HOME is $HOME/.m2. All Java projects on a development machine share the JAR packages under this local repository directory. During the mvn package phase, all the dependent JAR packages are packaged into the final deliverable. Therefore, Java projects run without depending on any files in the $M2_HOME/repository directory.

pip

Currently, pip is the most popular and recommended Python package manager. Before understanding how to install the installation package to a local directory, you need to learn more about the Python package manager. To help improve your understanding, the following first describes the development history of the Python package manager.

Before 2004, setup.py was recommended for Python installation. To use it, download any module and use the setup.py file supplied with the module.

python setup.py install

setup.py is developed from Distutils. Released in 2000 as part of the Python standard repository, Distutils is used to build and install Python modules.

Therefore, you can also use setup.py to release a Python module or

python setup.py sdist

even package the module into an RPM or EXE file.

python setup.py bdist_rpm
python setup.py bdist_wininst

Similarly to MakeFile, setup.py can be used for building and installation. However, the building and installation processes are integrated, thus you always have to build a module when installing the module, which wastes some resources. In 2004, the Python community released setuptools, which contains the easy_install tool. After that, Python began to support the EGG format and introduced the online repository PyPi, which respectively correspond to the JAR format and the Maven repository from the JAVA community.

The online module repository PyPi offers two key advantages:

  1. It only requires the installation of the pre-compiled EGG package, improving efficiency.
  2. It automatically downloads dependent packages from PyPi and installs them.

Since its release in 2008, pip has gradually replaced easy_install to become the de facto standard Python package manager. As it is compatible with the EGG format, pip prefers the Wheel format and supports installation of the module from a code version repository (for example, GitHub).

The following describes the directory structure of a Python module. The directories of installation files in both EGG and Wheel formats are classified into five types: purelib, platlib, headers, scripts, and data.

Directory Installation location Purpose
purelib $prefix/lib/pythonX.Y/site-packages Pure Python implementation library
platlib $exec-prefix/lib/pythonX.Y/site-packages Platform-related DLL
headers $prefix/include/pythonX.Yabiflags/distname C header files
script $prefix/bin Executable files
data $prefix Data files, such as .conf configuration files and SQL initialization files

$prefix and $exec-prefix are Python compiler parameters, which can be retrieved from sys.prefix and sys.exec_prefix. Their defaults on Linux are both /usr/local.

npm

NPM is the package manager for Node.js. After the npm install command is run, dependent packages are downloaded to the node_modules sub-directory under the current directory. Node.js runtime dependencies are all included in the current directory. However, some Node.js libraries depend on the local environment that was built when the module was installed. The locally dependent libraries cannot run if the build environment (such as Windows) is different from the runtime environment (such as Linux). Additionally, if development libraries and runtime libraries are installed during building, DDLs that were locally installed by the operating system package manager (such as apt-get) may not exist in the container under the runtime environment.

Troubleshooting Problems

Next, you will see how to resolve problems that occur when dependent libraries of Function Compute are installed.

Dependencies Installed to the Global System Directory

Maven and pip install dependent packages to a system directory other than the project directory. When a project is built, Maven packages all external dependencies to the final deliverable. Therefore, projects managed by Maven are free from dependency-related problems during runtime. For JAVA projects not managed by Maven, it is also a common practice to place dependent JAR packages into the current directory or its sub-directory and package them to the final deliverable. In this way, dependency-related problems are prevented during Java runtime. However, such problems occur in pip-managed Python environments. pip installs dependencies to the system directory, but the production environment (except the /tmp directory) of Function Compute is read-only and the prefabricated environment cannot be built.

Native Dependencies

Common Python and Node.js library files depend on the native environment of the system. The DDLs in the compilation and runtime environments must be installed, resulting in poor portability in both cases.

When Function Compute runs on Debian or Ubuntu, the APT package is used to manage system installation programs and libraries. By default, these programs and libraries are installed to a system directory, for example, /usr/bin, /usr/lib, /usr/local/bin, or /usr/local/lib. Therefore, native dependencies also need to be installed to a local directory.

Recommended Solutions

Several intuitive solutions are described as follows:

  1. Ensure that the development system for dependency installation is consistent with the production execution system. Use fcli sbox to install the dependencies.
  2. Place all the dependency files in a local directory. Copy the modules, executable files, and .ddl or .so files from pip to the current directory.

However, in practice, it is difficult to place the dependency files into the current directory.

  1. Library files installed by pip and apt-get are scattered to different directories. This means that you must be familiar with different package managers in order to find these files.
  2. Library files have transitive dependencies. When a library is installed, other libraries on which the library depends are also installed. This makes it very tedious to manually retrieve these dependencies.

In this case, how can we manually install dependencies to the current directory with minimal manual intervention? The following describes some methods used by the pip and APT package managers and compares their pros and cons.

Installation of Dependencies to the Current Directory

Python

Method 1: Use the --install-option parameter

pip install --install-option="--install-lib=$(pwd)" PyMySQL

When --install-option is used, parameters are passed to setup.py. However, neither the .egg nor the .whl files contains the setup.py file. Therefore, using --install-option triggers the installation procedure based on the source code package while setup.py triggers the module building process.

--install-option supports the following options:

File type Option
Python modules --install-purelib
extension modules --install-platlib
all modules --install-lib
scripts --install-scripts
data --install-data
C headers --install-headers

When --install-lib is used, the values of --install-purelib and --install-platlib are overwritten.

In addition, --install-option="--prefix=$(pwd)" supports installation to the current directory, but a sub-directory named lib/python2.7/site-packages will be created under the current directory.

Advantages:

  1. You can install the module to a local directory, such as purelib.

Disadvantages:

  1. This method is inapplicable to modules that do not contain source code packages.
  2. A system is built without making full use of the Wheel package.
  3. To fully install the module, many more parameters need to be configured, which is tedious.

Method 2: Use the --target or -t parameter

pip install --target=$(pwd) PyMySQL

--target is a parameter newly provided by pip. When this parameter is used, the module is directly installed to the current directory without creating the sub-directory named lib/python2.7/site-packages. This method is easy to use and is applicable for modules with a few dependencies.

Method 3: Use PYTHONUSERBASE in conjunction with --user

PYTHONUSERBASE=$(pwd) pip install --user PyMySQL

When --user is used, the module is installed to the site.USER_BASE directory. The default value of this directory is ~/.local for Linux, ~/Library/Python/X.Y for MacOS, and %APPDATA%\Python for Windows. The environment variable PYTHONUSERBASE can be used to change the value of site.USER_BASE.

Similar to --prefix=, when --user is used, the sub-directory named lib/python2.7/site-packages is created.

Method 4: Use virtualenv

pip install virtualenv
virtualenv path/to/my/virtual-env
source path/to/my/virtual-env/bin/activate
pip install PyMySQL

virutalenv is a recommended method from the Python community because it does not contaminate the global environment. When virtualenv is used, both desired modules (such as PyMySQL) and package managers (such as setuptools, pip, and wheel) are saved to a local directory. Although these modules increase the size of the package, they are not used during runtime.

apt-get

DDLs and executable files installed by apt-get also need to be installed to a local directory. After trying the chroot and apt-get -o RootDir=$(pwd) methods recommended on the Internet, we have discarded them because they are defective. Starting on a foundation of the preceding methods, we have made improvements and designed a method that uses apt-get to download DEB packages and uses dpkg to install these packages.

apt-get install -d -o=dir::cache=$(pwd) libx11-6 libx11-xcb1 libxcb1
for f in $(ls ./archives/*.deb)
do 
    dpkg -x $pwd/archives/$f $pwd
done

Running Method

Java loads jar and class files by setting classpaths. nodejs automatically loads the packages under node_modules in the current directory. Here, these common operations are omitted.

Python

Python loads the module file from the directory list that sys.path points to.

> import sys
> print '\n'.join(sys.path)

/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/usr/lib/python2.7/lib-old
/usr/lib/python2.7/lib-dynload
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages

By default, sys.path includes the current directory. Therefore, the second method can ignore setting sys.path because the module is installed in the current directory when you use the -- target or -t parameter.

You can use sys.path.append(dir) when the program starts because sys.path is an editable array. To improve the portability of the program, you can also use the PYTHONPATH environment variable.

export PYTHONPATH=$PYTHONPATH:$(pwd)/lib/python2.7/site-packages

apt-get

Ensure that executable files and DDLs installed using apt-get are available in the directory list set by the PATH and LD_LIBRARY_PATH environment variables.

PATH

The PATH variable indicates a list of directories that the system uses to search for executable programs. Add bin or sbin directories such as bin, usr/bin, and usr/local/bin to the PATH variable.

export PATH=$(pwd)/bin:$(pwd)/usr/bin:$(pwd)/usr/local/bin:$PATH

Note that the preceding content is applicable to Bash. For Java, Python, and node.js, make adjustments accordingly when modifying the PATH environment variable of the current process.

LD_LIBRARY_PATH

Similar to PATH, LD_LIBRARY_PATH is a directory list in which you can search for DDLs. Typically, the system places dynamic links in the /lib, /usr/lib, and /usr/local/lib directories. Some modules are placed into the sub-directories of these directories, such as /usr/lib/x86_64-linux-gnu. Typically, these sub-directories are recorded in the files under /etc/ld.so.conf.d/.

cat /etc/ld.so.conf.d/x86_64-linux-gnu.conf
# Multiarch support
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

Therefore, the so files in the directories declared in the files under $(pwd)/etc/ld.so.conf.d/ also must be retrievable from the directory list set by the LD_LIBRARY_PATH environment variable.

Note that modifications to the LD_LIBRARY_PATH environment variable during runtime may not take effect, which is true, at least, for Python. In the LD_LIBRARY_PATH variable, the /code/lib directory has been preset. Therefore, you can soft-link all the dependent so files to the /code/ lib directory.

Conclusion

This document explains how to run the pip and apt-get commands to install libraries in the local directory and set environment variables during runtime so that the program can find the installed local library files.

The four methods provided by Python are applicable to any common scenario. Despite the slight differences described above, you can choose an appropriate method based on your needs.

apt-get is another method. Compared with other methods, this method reduces the package size because it does not require to install the deb package that is already installed in the system. To further reduce the size, you can delete unnecessary files that have been installed, such as the user manual.

This document is part of the technology accumulation process for customizing better tools. On this basis, we will provide better tools in the future to simplify development.

References

  1. How does python find packages?
  2. Pip User Guide
  3. python-lambda-local
  4. python-lambda
  5. Guide to Python package management tools
  6. Running apt-get for another partition/directory?
相关实践学习
如何快速连接云数据库RDS MySQL
本场景介绍如何通过阿里云数据管理服务DMS快速连接云数据库RDS MySQL,然后进行数据表的CRUD操作。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助     相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
数据采集 Serverless API
在函数计算(Function Compute,FC)中部署Stable Diffusion(SD)
在函数计算(Function Compute,FC)中部署Stable Diffusion(SD)
330 2
|
4月前
|
存储 Serverless 数据库
Function Compute
【9月更文挑战第19天】
29 1
|
8月前
|
运维 监控 JavaScript
【阿里云云原生专栏】Serverless架构下的应用部署与运维:阿里云Function Compute深度探索
【5月更文挑战第21天】阿里云Function Compute是事件驱动的无服务器计算服务,让用户无需关注基础设施,专注业务逻辑。本文详述了在FC上部署应用的步骤,包括创建函数、编写代码和部署,并介绍了运维功能:监控告警、日志管理、版本管理和授权管理,提供高效低成本的计算服务。
325 6
|
8月前
|
运维 监控 Dubbo
SAE(Serverless App Engine)和FC(Function Compute)
【1月更文挑战第18天】【1月更文挑战第89篇】SAE(Serverless App Engine)和FC(Function Compute)
230 1
|
8月前
|
存储 Serverless
在阿里云函数计算(Function Compute)中,上传模型的步骤如下
在阿里云函数计算(Function Compute)中,上传模型的步骤如下
330 2
|
监控 前端开发 Serverless
阿里云函数计算(Function Compute,FC)是一种事件驱动的计算服务
阿里云函数计算(Function Compute,FC)是一种事件驱动的计算服务
425 1
|
运维 JavaScript Serverless
Function Compute
函数计算(Function Compute)是云计算领域的一种服务模型,由云服务提供商(例如阿里云、AWS、Google Cloud 等)提供。它是一种无服务器计算服务,允许开发者编写和部署函数,以响应事件触发,而无需管理底层的服务器和基础设施。函数计算提供了弹性的计算资源分配、按需计费、自动扩缩容等特性,使开发者能够聚焦于编写函数逻辑而不必担心底层的运维工作。
310 2
|
Serverless
函数计算(Function Compute)部署失败可能有多种原因
函数计算(Function Compute)部署失败可能有多种原因
175 2
|
弹性计算 监控 负载均衡
阿里云函数计算(Function Compute):快速高效的事件驱动计算
阿里云函数计算(Function Compute)是一种事件驱动计算服务,能够在阿里云上运行代码,且只按照实际使用时间付费。它无需管理服务器和基础架构,并可以与其他阿里云产品以及第三方服务集成,为用户提供了快速、高效、低成本、弹性的云计算能力。
|
数据采集 消息中间件 监控
Function Compute构建高弹性大数据采集系统
解决问题: 1.利用服务器自建数据采集系统成本高,弹性不足。 2.利用服务器自建数据采集系统运维复杂,成本高。
Function Compute构建高弹性大数据采集系统