Guidelines for Function Compute Development - Crawler

The article Guidelines for Function Compute Development - Use Fun Local for Local Running and Debugging briefly describes how to use Fun Local to run and debug functions locally. It does not, however, show how much Fun Local can improve the efficiency of Function Compute development.

This article uses a simple crawler function as an example (for the code, see the Function Compute console template) to demonstrate how to develop a serverless crawler application that scales automatically and is billed by the number of invocations.

Procedure

We develop the crawler application in several steps and run the function after each step to verify the result.

1. Create a Fun project

Create a directory named image-crawler as the root directory of the project. In this directory, create a file named template.yml with the following content:

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  localdemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
    image-crawler:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        CodeUri: code/
        Description: 'Hello world with python2.7!'
        Runtime: python2.7

For more information about the serverless application model defined by Fun, see the Fun documentation.

After the preceding settings, the project directory structure is as follows:

.
└── template.yml

2. Write the Hello World function code

In the root directory, create a directory named code. In the code directory, create a file named index.py containing the Hello World function:

def handler(event, context):
    return 'hello world!'

In the root directory, run the following command:

fun local invoke image-crawler

The function runs successfully.

After the preceding settings, the project directory structure is as follows:

.
├── code
│   └── index.py
└── template.yml
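If you would rather script this setup than create the files by hand, the following is a minimal sketch in Python 3. The scaffold helper is hypothetical (it is not part of Fun); it simply writes the template.yml and code/index.py contents shown in steps 1 and 2.

```python
import os
import tempfile

# File contents copied from steps 1 and 2 of this article.
TEMPLATE_YML = """\
ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  localdemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
    image-crawler:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        CodeUri: code/
        Description: 'Hello world with python2.7!'
        Runtime: python2.7
"""

INDEX_PY = """\
def handler(event, context):
    return 'hello world!'
"""


def scaffold(root):
    """Create image-crawler/template.yml and image-crawler/code/index.py under root."""
    project = os.path.join(root, "image-crawler")
    code_dir = os.path.join(project, "code")
    os.makedirs(code_dir, exist_ok=True)
    with open(os.path.join(project, "template.yml"), "w") as f:
        f.write(TEMPLATE_YML)
    with open(os.path.join(code_dir, "index.py"), "w") as f:
        f.write(INDEX_PY)
    return project


if __name__ == "__main__":
    # Build the project in a throwaway directory and list the files it created.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold(tmp)
        for dirpath, _, files in os.walk(project):
            for name in sorted(files):
                print(os.path.relpath(os.path.join(dirpath, name), project))
```

Calling scaffold(".") from the parent directory produces the same layout as the tree above.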

3. Run the function through a trigger event

Modify the code from step 2 so that it writes the trigger event to the log:

import logging

logger = logging.getLogger()

def handler(event, context):
    logger.info("event: " + event)
    return 'hello world!'

Run the function again, this time passing it a trigger event. The event is printed in the log, which shows that the function receives the trigger event correctly.

For more information about Fun Local, see the Fun Local documentation.
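Fun Local runs the function in a Docker container that mimics the online runtime, which is what you want for realistic checks. For very quick unit-level checks of the handler logic, you can also call the handler directly with a stand-in context. The sketch below is Python 3, and both invoke_locally and the stub context are hypothetical helpers; the stub only fills in the credentials attributes used later in this article.

```python
import json
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()


def handler(event, context):
    # Same logic as step 3; decode first in case the event arrives as bytes
    if isinstance(event, bytes):
        event = event.decode("utf-8")
    logger.info("event: " + event)
    return 'hello world!'


def invoke_locally(event_str):
    """Call handler directly with a hypothetical stub context (no Docker, no Fun)."""
    context = SimpleNamespace(
        credentials=SimpleNamespace(
            access_key_id="", access_key_secret="", security_token=""
        )
    )
    return handler(event_str.encode("utf-8"), context)


if __name__ == "__main__":
    print(invoke_locally(json.dumps({"key": "value"})))
```

This does not replace fun local invoke: it skips the container, the runtime's real context object, and any environment variables from template.yml.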

4. Fetch the webpage source code

Next, write code that fetches the content of the webpage:

import logging
import json
import urllib

logger = logging.getLogger()

def handler(event, context):
    logger.info("event: " + event)
    # The trigger event is a JSON string such as {"url": "https://..."}
    evt = json.loads(event)
    url = evt['url']

    html = get_html(url)

    logger.info("html content length: " + str(len(html)))
    return 'Done!'

def get_html(url):
    # python2.7 runtime: urlopen lives in urllib (Python 3 moved it to urllib.request)
    page = urllib.urlopen(url)
    html = page.read()
    return html

Because the logic is simple, we use the urllib library directly to read the page content.

Run the function with an event that contains a url field. The log shows the length of the fetched HTML content.
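The code above targets the python2.7 runtime, where urlopen lives in the urllib module. If you port the function to a Python 3 runtime, urlopen moves to urllib.request and read() returns bytes. A minimal Python 3 sketch of the same helper, which also decodes the body using the charset the server declares (falling back to UTF-8 is an assumption that suits most pages):

```python
from urllib.request import urlopen


def get_html(url, timeout=10):
    """Fetch url and return its body as text (Python 3 counterpart of get_html above)."""
    with urlopen(url, timeout=timeout) as page:
        body = page.read()
        # Use the charset declared in the Content-Type header, else assume UTF-8
        charset = page.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL exercises the function without any network access
    print(get_html("data:text/html,hello%20world"))
```

For example, get_html("data:text/html,hello%20world") returns "hello world", which makes the helper easy to try offline before pointing it at a real page.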

5. Parse the images on the webpage

Here, we parse the JPG images on the webpage with a regular expression. This step is fiddly because the regular expression usually needs several rounds of small adjustments. To get it right quickly, we use the local debugging method provided by Fun Local. For more information about local debugging, see Guidelines for Function Compute Development - Use Fun Local for Local Running and Debugging.

First, set a breakpoint on the following line:

logger.info("html content length: " + str(len(html)))

Then start the function in debug mode. Once VS Code is attached, the function runs until it stops at the breakpoint.

Expand Locals in the debugger's Variables view to see the local variables, including html, which holds the page source we fetched. Copy the value of html, analyze it, and then design a regular expression.

Write a simple regular expression, for example, http:\/\/[^\s,"]*\.jpg.

How can we quickly check whether the regular expression is correct? The Watch view in VS Code is handy for this.

Add a Watch expression with the following value (index.py must already import re for the expression to evaluate):

re.findall(re.compile(r'http:\/\/[^\s,"]*\.jpg'), html)

Press Enter. The Watch view shows the list of matched image URLs.

We can keep adjusting the regular expression and re-evaluating it until it matches exactly the URLs we want.
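If you want to iterate on the pattern outside the debugger as well, you can paste the copied html value into a small script and compare several candidate patterns side by side. A minimal Python 3 sketch; SAMPLE_HTML is made-up stand-in data, and the second pattern is only an illustrative variant that also accepts https URLs (the article's pattern matches http URLs only):

```python
import re

# Made-up stand-in for the html value copied from the debugger; paste real page source here.
SAMPLE_HTML = (
    '<img src="http://img.example.com/a.jpg">'
    '"thumbURL":"http://img.example.com/b.jpg",'
    '<img src="https://img.example.com/c.jpg">'
    '<img src="http://img.example.com/d.png">'
)

# Candidate patterns: the first is the article's, the second a hypothetical variant.
CANDIDATES = [
    r'http:\/\/[^\s,"]*\.jpg',
    r'https?://[^\s,"]*\.jpg',
]


def try_patterns(html, patterns):
    """Return {pattern: matches} so several regular expressions can be compared at once."""
    return {p: re.findall(p, html) for p in patterns}


if __name__ == "__main__":
    for pattern, matches in try_patterns(SAMPLE_HTML, CANDIDATES).items():
        print(pattern, len(matches), matches)
```

With this sample, the first pattern finds the two http .jpg URLs, the variant also picks up the https one, and neither matches the .png image.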

Add the working image-parsing logic to the code:

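import re

# Match http:// URLs ending in .jpg, stopping at whitespace, commas, and double quotes
reg = r'http:\/\/[^\s,"]*\.jpg'
imgre = re.compile(reg)

def get_img(html):
    # Return every matching image URL in the page source, in order of appearance
    return re.findall(imgre, html)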

Call it from the handler function:

def handler(event, context):
    logger.info("event: " + event)
    evt = json.loads(event)
    url = evt['url']
  
    html = get_html(url)
  
    img_list = get_img(html)
    logger.info(img_list)
  
    return 'Done!'

Once the code is written, run it locally to verify the result:

echo '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}' \
    | fun local invoke image-crawler

The values in img_list are printed to the console.

6. Upload the images to OSS

We store the parsed images in an OSS bucket.

First, use environment variables to configure the OSS endpoint and bucket name.

Add the environment variables to the function's Properties in the template (the OSS bucket must be created in advance):

EnvironmentVariables:
    OSSEndpoint: oss-cn-hangzhou.aliyuncs.com
    BucketName: fun-local-test

Then read the two environment variables in the function:

import os

endpoint = os.environ['OSSEndpoint']
bucket_name = os.environ['BucketName']
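os.environ[...] raises a bare KeyError when a variable is missing, for example if the EnvironmentVariables block was left out of template.yml. A minimal Python 3 sketch of a hypothetical helper that fails with a more descriptive message instead:

```python
import os


def load_oss_config(environ=os.environ):
    """Return (endpoint, bucket_name), failing clearly if either variable is missing."""
    missing = [name for name in ("OSSEndpoint", "BucketName") if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variable(s): %s; "
            "set them under EnvironmentVariables in template.yml" % ", ".join(missing)
        )
    return environ["OSSEndpoint"], environ["BucketName"]


if __name__ == "__main__":
    print(load_oss_config({"OSSEndpoint": "oss-cn-hangzhou.aliyuncs.com",
                           "BucketName": "fun-local-test"}))
```

Passing the mapping explicitly, rather than always reading os.environ, keeps the helper easy to test.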

When Fun Local runs a function, it provides an additional variable indicating that the function is running locally. This lets the code behave differently in the two environments; for example, it could connect to ApsaraDB for RDS when running online and to a local MySQL database when running locally.
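A minimal Python 3 sketch of reading that indicator, assuming Fun Local exposes it as an environment variable named local (check this against the Fun version you use). Calling bool() on the raw string would treat any non-empty value, even "false", as true, so this hypothetical helper compares against an explicit set of truthy spellings instead:

```python
import os

# Spellings treated as "running locally"; anything else, including unset, means online.
_TRUTHY = {"1", "true", "yes", "on"}


def is_running_locally(environ=os.environ, name="local"):
    """Return True if the (assumed) local-run indicator variable is set to a truthy value."""
    return environ.get(name, "").strip().lower() in _TRUTHY


if __name__ == "__main__":
    print(is_running_locally())
```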

Here, we use the indicator variable to create the OSS client in one of two ways. Online, context.credentials holds a temporary AccessKey obtained through AssumeRole, which must be used together with its security token; local runs have no such restriction. OSS provides an authentication method for each case:

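import os
import oss2

# Assumed: Fun Local sets the environment variable "local" when it runs the function locally
local = bool(os.getenv('local', ''))

# Inside handler(event, context):
creds = context.credentials

if local:
    # Local run: a plain AccessKey pair, no security token
    auth = oss2.Auth(creds.access_key_id,
                     creds.access_key_secret)
else:
    # Online run: temporary STS credentials obtained through AssumeRole
    auth = oss2.StsAuth(creds.access_key_id,
                        creds.access_key_secret,
                        creds.security_token)

bucket = oss2.Bucket(auth, endpoint, bucket_name)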

Iterate over the image URLs and upload each image to the OSS bucket:

import datetime
import urllib

count = 0
for item in img_list:
    count += 1
    logger.info(item)
    # Download each image (python2.7 urllib)
    pic = urllib.urlopen(item)
    # Store each image in the OSS bucket, keyed by the microsecond field of the current time;
    # use .jpg because the regular expression only matches JPG URLs
    bucket.put_object(str(datetime.datetime.now().microsecond) + '.jpg', pic)
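Keying objects by datetime.datetime.now().microsecond is simple, but that field repeats every second, so two images can end up with the same key and one overwrites the other. One alternative, sketched below in Python 3 under the assumption that reusing a key for the same URL is acceptable, is to derive the key from a hash of the image URL. The object_key_for helper and the images/ prefix are hypothetical choices for illustration:

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def object_key_for(url, prefix="images/"):
    """Derive a stable OSS object key from an image URL.

    The same URL always maps to the same key, distinct URLs get distinct keys
    (barring a 64-bit hash collision), and the file extension from the URL path
    is kept, defaulting to ".jpg" when the path has none.
    """
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower() or ".jpg"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    return prefix + digest + ext


if __name__ == "__main__":
    print(object_key_for("http://img.example.com/a.jpg"))
```

Inside the loop, the upload would then become bucket.put_object(object_key_for(item), pic). A side effect of this design is deduplication: crawling the same image twice overwrites one object instead of creating two.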

Run the function locally:

echo '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}' \
    | fun local invoke image-crawler

The logs show the images being parsed one by one and uploaded to the OSS bucket.

Log on to the OSS console to see the uploaded images.

Deployment

After local development, we deploy the service online so that it can be invoked. Fun simplifies the whole process, which would otherwise involve logging on to the console, creating the service and function, configuring environment variables, and creating roles.

Local and online runs differ in how Function Compute is authorized to access OSS. To authorize Function Compute to access OSS online, add the following Policies setting to the service's Properties in template.yml (see the Fun documentation for more information about Policies):

Policies: AliyunOSSFullAccess

The complete template.yml file now looks like this:

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  localdemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
      Policies: AliyunOSSFullAccess
    image-crawler:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        CodeUri: code/
        Description: 'Hello world with python2.7!'
        Runtime: python2.7
        EnvironmentVariables:
          OSSEndpoint: oss-cn-hangzhou.aliyuncs.com
          BucketName: fun-local-test

Next, run fun deploy. The logs show that the deployment succeeded.

Verification

Verify the deployment on the Function Compute console

Log on to the Function Compute console. The service, function, code, and environment variables are all in place.

Enter the JSON used for local verification as the trigger event, and then run the function.

The result is the same as for the local run.

Verify the deployment by running fcli commands

For fcli usage, see the fcli documentation.

In a terminal, run the following command to list the functions in the localdemo service:

fcli function list --service-name localdemo

The output shows that image-crawler has been created:

{
  "Functions": [
    "image-crawler",
    "java8",
    "nodejs6",
    "nodejs8",
    "php72",
    "python27",
    "python3"
  ],
  "NextToken": null
}
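To make this check scriptable, for example in a CI job, you could capture the command's output and parse it. A minimal Python 3 sketch, assuming the output is JSON shaped like the listing above; function_exists is a hypothetical helper:

```python
import json


def function_exists(listing_json, name):
    """Return True if name appears in the "Functions" array of a function-list result."""
    data = json.loads(listing_json)
    return name in (data.get("Functions") or [])


if __name__ == "__main__":
    sample = '{"Functions": ["image-crawler", "python3"], "NextToken": null}'
    print(function_exists(sample, "image-crawler"))
```

This only inspects one page of results; if NextToken is not null, the listing is paginated and further pages would need to be fetched before concluding that a function is absent.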

Run the following command to call the function:

fcli function invoke --service-name localdemo \
    --function-name image-crawler \
    --event-str '{"url": "https://image.baidu.com/search/index?tn=baiduimage&word=%E5%A3%81%E7%BA%B8"}'

The invocation succeeds and returns the same result as the console and Fun Local.

Conclusion

This completes the development process. The source code for this article is hosted in a GitHub repository.

This article shows how to use the local running and debugging capabilities of Fun Local to develop a function locally, running it repeatedly to get quick feedback while iterating on the code.

By running the fun deploy command, we can deploy the function developed locally to the cloud and obtain the expected results without any modification to the code.

The method described here is only one way to develop functions in Function Compute. Our aim is to show developers that developing functions in Function Compute can be a smooth and enjoyable process. We hope you have fun with Fun.

Note

This article was translated from the Chinese article 《开发函数计算的正确姿势 —— 爬虫》 ("The Right Way to Develop Function Compute Functions: Crawler").
