neo@MacBook-Pro ~/Documents/crawler % scrapy
Scrapy 1.4.0 - project: crawler

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
neo@MacBook-Pro ~/Documents % scrapy startproject crawler
New Scrapy project 'crawler', using template directory '/usr/local/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /Users/neo/Documents/crawler

You can start your first spider with:
    cd crawler
    scrapy genspider example example.com
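The generated project typically has the following layout (the exact set of files may vary slightly between Scrapy versions):

    crawler/
        scrapy.cfg            # deployment / project configuration
        crawler/              # the project's Python module
            __init__.py
            items.py          # item definitions
            middlewares.py    # spider / downloader middlewares
            pipelines.py      # item pipelines
            settings.py       # project settings
            spiders/
                __init__.py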
neo@MacBook-Pro ~/Documents/crawler % scrapy genspider netkiller netkiller.cn
Created spider 'netkiller' using template 'basic' in module:
  crawler.spiders.netkiller
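The 'basic' template creates a skeleton spider in crawler/spiders/netkiller.py. It looks roughly like this (a sketch of the generated file; the parse() body is left empty for you to fill in):

    # crawler/spiders/netkiller.py
    import scrapy


    class NetkillerSpider(scrapy.Spider):
        name = 'netkiller'
        allowed_domains = ['netkiller.cn']
        start_urls = ['http://netkiller.cn/']

        def parse(self, response):
            # Parse the downloaded page and yield items or follow-up requests here.
            pass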
neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller
Write the crawl results to a JSON file:
neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller -o output.json
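For output.json to contain anything, parse() has to yield items. A minimal sketch, assuming we only want the text and href of each link on the start page (the CSS selectors are illustrative, not part of the original article):

    import scrapy


    class NetkillerSpider(scrapy.Spider):
        name = 'netkiller'
        allowed_domains = ['netkiller.cn']
        start_urls = ['http://netkiller.cn/']

        def parse(self, response):
            # Each yielded dict becomes one record in output.json
            # when the spider is run with -o output.json.
            for link in response.css('a'):
                yield {
                    'text': link.css('::text').extract_first(),
                    'href': link.css('::attr(href)').extract_first(),
                }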
Original source: Netkiller Series Notes
Author: 陈景峯
To republish this article, please contact the author and be sure to credit the original source, the author, and include this notice.