How to Build an Image Processing Service using Unique Alibaba Function Compute Features

Summary:

This article showcases how you can use several unique features of Alibaba Cloud Function Compute to build an image processing web service.

Introducing Function Compute

Function Compute is an Alibaba Cloud serverless platform that allows engineers to develop an internet-scale service with just a few lines of code. It seamlessly handles resource management, auto scaling, and load balancing, so developers can focus on their business logic without worrying about managing the underlying infrastructure, making it easy to build applications that respond quickly to new information. Internally, we use container technology and proprietary distributed scheduling algorithms to run our users' code on elastically scaled resources. Since its inception a little over a year ago, we have developed many cutting-edge technologies aimed at providing our users with high scalability, reliability, and performance.

In this guide, we walk you through a step-by-step tutorial that showcases some of its innovative features. If this is your first time using Function Compute, you can read the quick start guide to familiarize yourself with basic serverless concepts.

Using a Network File System

The first feature that we introduce allows developers to write functions that read from and write to a network-attached file system such as Alibaba Cloud NAS.

Motivation

The serverless nature of the platform means that user code can run on a different instance each time it is invoked. This implies that functions cannot rely on their local file system to store intermediate results. Developers instead have to rely on another cloud service such as Object Storage Service to share processed results between functions or invocations. This is not ideal: dealing with another distributed service adds development overhead and extra complexity in the code to handle various edge cases.

To solve this problem, we developed the NAS access feature. NAS is another Alibaba Cloud service that offers a highly scalable, reliable, and available distributed file system that supports standard file access protocols. We mount the remote NAS file system onto the resource on which the user code is running, which effectively creates a "local" file system for the function code to use.
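Once mounted, the function uses nothing but standard file I/O against the Local Mount Path. The sketch below illustrates the idea; the real mount path (e.g. /mnt/crawler) depends on your NAS configuration, so a temporary directory stands in for it here to keep the snippet runnable anywhere:

```python
import os
import tempfile

# In a real function this would be the configured Local Mount Path,
# e.g. "/mnt/crawler" (hypothetical); a temp dir stands in for the NAS
# mount so the sketch runs without a NAS volume attached.
mount_dir = tempfile.mkdtemp()
history_path = os.path.join(mount_dir, "crawl_history.txt")

# One invocation appends its progress; NAS makes this durable.
with open(history_path, "a") as f:
    f.write("https://example.com\n")

# A later invocation, possibly on a different instance, reads it back.
with open(history_path) as f:
    visited = [line.strip() for line in f]

print(visited)
```

Because the mount behaves like a local directory, no SDK calls or special APIs are needed to share state between invocations.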

Image Crawling Example

This demo section shows you how to create a serverless web crawler that downloads all the images reachable from a seed webpage. This is quite a challenging problem to run on a serverless platform, as it is not possible to crawl all the websites in one function invocation given the time constraints. However, with the NAS access feature it becomes straightforward, since we can use the NAS file system to share data between function runs. Below is a step-by-step tutorial. We assume that you understand the concept of a VPC and know how to create a NAS mount point in a VPC; otherwise, read the basic NAS tutorial before proceeding to the steps below.

Create a service with NAS configuration

  1. Log on to the Function Compute console.
  2. Select the target region in which your NAS is located.
  3. Create a service that uses a pre-created NAS file system.
  4. Enter the Service Name and Description.
  5. Enable Advanced Settings.
  6. Complete the VPC Configs fields, making sure that you select the VPC in which the NAS mount point is located.

After the VPC Configs are complete, the NAS Config fields appear.

  1. Complete the NAS Config fields as described below.
    [Screenshot: NAS Config]
    • The UserId and GroupId fields specify the uid/gid under which the function runs. They determine the owner of all the files created on the NAS file system. You can pick any user/group id for this demo, as they are shared among all functions in this service.
    • The NAS Mount Point drop-down menu lists all the valid NAS mount points that are accessible from the chosen VPC.
    • The Remote Path is a directory on the NAS file system; it does not need to be the root directory. Choose the directory in which you want to store the images.
    • The Local Mount Path is the local directory through which the function accesses the remote directory. Remember what you choose here, as the function code refers to it.
  2. Complete the Log Service configuration with your desired Logstore destination.
  3. Make sure that you configure your role to grant Function Compute access to your VPC and Logstore.
  4. Click OK.
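For reference, the settings above correspond roughly to a service-level NAS configuration like the sketch below. The field names mirror the console labels; the mount point address and directories are hypothetical placeholders:

```json
{
  "nasConfig": {
    "userId": 1000,
    "groupId": 1000,
    "mountPoints": [
      {
        "serverAddr": "xxxx.cn-shanghai.nas.aliyuncs.com:/crawler",
        "mountDir": "/mnt/crawler"
      }
    ]
  }
}
```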

Create a function that starts every five minutes

Now that we have a service with NAS access, it's time to write the crawler. Since the crawler function has to run many times before it can finish, we use a time trigger to invoke it every 5 minutes.

  1. Log on to the Function Compute console and select the service you just created.
  2. Create a function for the service by clicking the plus sign.
    Create Function.jpg
  3. Function Compute provides various function templates to help you quickly build an application. Select the empty function template for this demo and click Next; you can play with the other templates when you have time.
  4. On the next page, select time trigger in the drop-down menu. Fill in the trigger name, set the invoke interval to 5 minutes, leave the event empty for now, and click Next.
    [Screenshot: Time Trigger]
  5. Fill in the function name and make sure to select java8 as the runtime. Also fill in the function handler, set the memory to 2048 MB and the timeout to 300 seconds, and click Next.
    [Screenshot: Java function settings]
  6. Click Next and make sure the preview looks good before clicking Create.
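Behind the console steps, the timer trigger boils down to a small configuration. A rough sketch is shown below; the trigger name is hypothetical, and the exact field names may differ slightly between console and API versions:

```json
{
  "triggerName": "crawler-timer",
  "triggerType": "timer",
  "triggerConfig": {
    "cronExpression": "@every 5m",
    "enable": true,
    "payload": ""
  }
}
```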

Write the crawler in Java

Now you should see the function code page and it's time to write the crawler. The handler logic is pretty straightforward as shown below.

  • Parse the time trigger event to get the crawler config.
  • Create the image crawler based on the config. The crawler uses a Java HTML parser (jsoup) to parse HTML pages and identify images and links.
  • Read the already-visited and not-yet-visited web page lists from the NAS file system (only if the function is running in a new environment).
  • Continue the traversal of the web pages and use the crawler to download any new images along the way.
  • Save the newly found web pages to the NAS file system.

Here is an excerpt of the Java code. Note that we read and write files on the NAS file system exactly the same way as on a local file system.

public class ImageCrawlerHandler implements PojoRequestHandler<TimedCrawlerConfig, CrawlingResult> {
private String nextUrl() {
    String nextUrl;
    do {
        nextUrl = pagesToVisit.isEmpty() ? "" : pagesToVisit.remove(0);
    } while (pagesVisited.contains(nextUrl) );
    return nextUrl;
}

private void initializePages(String rootDir) throws IOException {
    if (this.rootDir.equalsIgnoreCase(rootDir)) {
        return;
    }
    try {
        new BufferedReader(new FileReader(rootDir + CRAWL_HISTORY)).lines()
            .forEach(l -> pagesVisited.add(l));
        new BufferedReader(new FileReader(rootDir + CRAWL_WORKITEM)).lines()
            .forEach(l -> pagesToVisit.add(l));
    } catch (FileNotFoundException e) {
        logger.info(e.toString());
    }
    this.rootDir = rootDir;
}

private void saveHistory(String rootDir, String justVisitedPage, HashSet<String> newPages)
    throws IOException {
    //append crawl history to the end of the file
    try (PrintWriter pvfw = new PrintWriter(
        new BufferedWriter(new FileWriter(rootDir + CRAWL_HISTORY, true)));
    ) {
        pvfw.println(justVisitedPage);
    }
    //append to be crawled workitems to the end of the file
    try (PrintWriter ptfw = new PrintWriter(
        new BufferedWriter(new FileWriter(rootDir + CRAWL_WORKITEM, true)));
    ) {
        newPages.stream().forEach(p -> ptfw.println(p));
    }
}

@Override
public CrawlingResult handleRequest(TimedCrawlerConfig timedCrawlerConfig, Context context) {
    CrawlingResult crawlingResult = new CrawlingResult();
    this.logger = context.getLogger();
    CrawlerConfig crawlerConfig = null;
    try {
        crawlerConfig = JSON_MAPPER.readerFor(CrawlerConfig.class)
            .readValue(timedCrawlerConfig.payload);
    } catch (IOException e) {
        // ...
    }
    ImageCrawler crawler = new ImageCrawler(
        crawlerConfig.rootDir, crawlerConfig.cutoffSize, crawlerConfig.debug, logger);
    int pagesCrawled = 0;
    try {
        initializePages(crawlerConfig.rootDir);
        if (pagesToVisit.isEmpty()) {
            pagesToVisit.add(crawlerConfig.url);
        }
        while (pagesCrawled < crawlerConfig.numberOfPages) {
            String currentUrl = nextUrl();
            if (currentUrl.isEmpty()) {
                break;
            }
            HashSet<String> newPages = crawler.crawl(currentUrl);
            newPages.stream().forEach(p -> {
                if (!pagesVisited.contains(p)) {
                    pagesToVisit.add(p);
                }
            });
            pagesCrawled++;
            pagesVisited.add(currentUrl);
            saveHistory(crawlerConfig.rootDir, currentUrl, newPages);
        }
        // calculate the total size of the images
        // ...
    } catch (Exception e) {
        crawlingResult.errorStack = e.toString();
    }

    crawlingResult.totalCrawlCount = pagesVisited.size();
    return crawlingResult;
}
}
public class ImageCrawler {
...
public HashSet<String> crawl(String url) {
    links.clear();
    try {
        Connection connection = Jsoup.connect(url).userAgent(USER_AGENT);
        Document htmlDocument = connection.get();
        Elements media = htmlDocument.select("[src]");
        for (Element src : media) {
            if (src.tagName().equals("img")) {
                downloadImage(src.attr("abs:src"));
            }
        }
        Elements linksOnPage = htmlDocument.select("a[href]");
        for (Element link : linksOnPage) {
            logDebug("Plan to crawl `" + link.absUrl("href") + "`");
            this.links.add(link.absUrl("href"));
        }

    } catch (IOException ioe) {
       ...
    }
    return links;
}
}

For the sake of simplicity, we have omitted some details and other helper classes. You can get all the code from the awesome-fc GitHub repo if you would like to run it and download images from your favorite websites.

Run the crawler

Now that we have written the code, we need to run it. Here are the steps.

  • We use Maven for dependency and build management. After you sync with the repo (assuming you have Maven installed), run the following command to create the jar file ready to upload.
mvn clean package
  • Select the Code tab on the function page. Upload the jar file created in the previous step (the one whose name ends with dependencies) through the console.
  • Select the Triggers tab on the function page. Click the time trigger link and enter the event in JSON format. The JSON event is deserialized into the crawler config and passed to the function. Click OK.
    [Screenshot: Timer event]
  • The time trigger invokes the crawler function every five minutes. Each time, the handler picks up the list of URLs that still need to be visited and starts from the first one.
  • You can select the Log tab to search for the crawler execution log.
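For illustration, the event payload that gets deserialized into the crawler config might look like the following. The field names come from the CrawlerConfig fields used in the code excerpt above; all values are hypothetical:

```json
{
  "url": "https://example.com",
  "rootDir": "/mnt/crawler",
  "numberOfPages": 10,
  "cutoffSize": 10240,
  "debug": false
}
```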

Create a Serverless Service

The second feature that we introduce allows anyone to send an HTTP request to trigger a function execution directly.

Motivation

Now that we have a file system filled with images downloaded from the web, we want a way to serve those images through a web service. The traditional approach is to mount the NAS volume on a VM and start a web server on it. This both wastes resources when the service is lightly used and fails to scale when traffic is heavy. Instead, you can write a serverless function that reads the images stored on the NAS file system and serves them through an HTTP endpoint. This way, you enjoy the instant scalability that Function Compute provides while still paying only for actual usage.

Image Processing Service Example

This demo shows how to write an Image Processing Service.

Create a Function with HTTP Trigger

  1. Log on to the Function Compute console and select the same service as the crawler function.
  2. Create a function for the service by clicking the plus sign.
  3. Select the empty function template with the python2.7 runtime and click Next.
  4. Select HTTP trigger in the drop-down menu, make sure that both the GET and POST invoke methods are enabled, and click Next.
    [Screenshot: HTTP Trigger]
  5. Finish the rest of the steps and click OK.
  6. Get the files from the same GitHub repo and upload the directory to the function.
    [Screenshot: Upload directory]

Image Processing Using Python

Function Compute's python runtime comes with many built-in modules that one can use. In this example, we use both OpenCV and Wand for image transformations.

Use the HTTP trigger in Python

Even with an image processing function, we still need to set up a web endpoint to serve requests. Normally, one needs another service such as API Gateway to handle HTTP requests. In this demo, we use the Function Compute HTTP trigger feature, which allows an HTTP request to trigger a function execution directly. With the HTTP trigger, the headers, path, and query of the HTTP request are all passed to the function handler directly, and the function can return HTML content dynamically.

With these two features, the handler code is surprisingly straightforward and here is a high-level breakdown.

  • Get the HTTP path and query from the environ variable.
  • Use the HTTP path to locate the image on the NAS file system.
  • Apply different image processing techniques based on the query action.
  • Insert the transformed image into the pre-built HTML file and return it.

Here is an excerpt of the handler logic. Note that wand loads the image stored on NAS just like a normal file on the local system.

import base64
import logging
import urlparse  # python2.7 runtime

import cv2
from wand.image import Image

TEMPLATE = open('/code/index.html').read()
NASROOT = '/mnt/crawler'
face_cascade = cv2.CascadeClassifier('/usr/share/opencv/lbpcascades/lbpcascade_frontalface.xml')

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    path = environ.get('PATH_INFO', "/")
    fileName = NASROOT + path

    try:
        query_string = environ['QUERY_STRING']
        logger.info(query_string)
    except KeyError:
        query_string = ""

    # parse the query string, e.g. "action=rotate&angle=90", into a dict
    query_dict = dict((k, v[0]) for k, v in urlparse.parse_qs(query_string).items())
    action = query_dict.get('action', '')

    if (action == "show"):
        with Image(filename=fileName) as fc_img:
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))

    elif (action == "facedetect"):
        img = cv2.imread(fileName)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.03, 5)
        for (x, y, w, h) in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 1)
        cv2.imwrite("/tmp/dst.png", img)
        with open("/tmp/dst.png", "rb") as img_obj:
            with Image(file=img_obj) as fc_img:
                img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    elif (action == "rotate"):
        assert 'angle' in query_dict
        angle = query_dict['angle']
        logger.info("Rotate " + angle)
        with Image(filename=fileName) as fc_img:
            fc_img.rotate(float(angle))
            img_enc = base64.b64encode(fc_img.make_blob(format='png'))
    else:
        # demo: other mixed operations omitted
        pass

    status = '200 OK'
    response_headers = [('Content-type', 'text/html')]
    start_response(status, response_headers)
    return [TEMPLATE.replace('{fc-py}', img_enc)]
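The query-string handling in the handler can be sketched in isolation. The snippet below mirrors that parsing logic with a hypothetical query string, written so it runs under both Python 2.7 (the FC runtime here) and Python 3:

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2.7, as in the FC runtime

def parse_action(query_string):
    # parse_qs returns lists of values; flatten them to single values
    query_dict = dict((k, v[0]) for k, v in parse_qs(query_string).items())
    return query_dict.get('action', ''), query_dict

action, params = parse_action("action=rotate&angle=90")
print(action, params)
```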

Conclusions

  • You can read the blog to get a more general idea of what Function Compute can do.
  • You can also read the official NAS tutorial and other Function Compute documentation to learn about more exciting new features.
  • Please give us feedback or suggestions in our official Function Compute forum or the official Alibaba Cloud Slack channel.