A Simple C# Crawler

2017-10-26


using System;  
using System.Collections.Generic;  
using System.IO;  
using System.Linq;  
using System.Net;  
using System.Text;  
using System.Text.RegularExpressions;  
using System.Threading.Tasks;  
  
namespace _2015._5._23通过WebClient类发起请求并下载html  
{  
    class Program  
    {  
        static void Main(string[] args)  
        {  
            #region Scrape email addresses from a web page
            //string url = "http://zhidao.baidu.com/link?url=cvF0de2o9gkmk3zW2jY23TLEUs6wX-79E1DQVZG7qaBhEVT_xlh6TO7p0W4qwuAZ_InLymC_-mJBBcpdbzTeq_";  
            //WebClient wc = new WebClient();  
            //wc.Encoding = Encoding.UTF8;  
            //string str = wc.DownloadString(url);  
            //MatchCollection matchs=  Regex.Matches(str,@"\w+@([-\w])+([\.\w])+",RegexOptions.ECMAScript);  
            //foreach (Match item in matchs)  
            //{  
            //    Console.WriteLine(item.Value);  
            //}  
            //Console.WriteLine(matchs.Count);  
            #endregion   
 
            #region Scrape images from a web page

            //WebClient wc = new WebClient();
            //wc.Encoding = Encoding.UTF8;
            //// Download the page's HTML source
            //string html = wc.DownloadString("http://dongxi.douban.com/?dcs=top-nav&dcm=douban");
            //// [^>]* keeps each match inside a single <img> tag
            //MatchCollection matches = Regex.Matches(html, "<img[^>]*src=\"(.+?)\"[^>]*>");
            //foreach (Match item in matches)
            //{
            //    // Download the image to the target folder (c:\mv must already exist)
            //    wc.DownloadFile(item.Groups[1].Value, @"c:\mv\" + Path.GetFileName(item.Groups[1].Value));
            //}
            //Console.WriteLine(matches.Count);
 
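            #endregion

            #region Scrape some information

            // WebClient is IDisposable, so release it with a using block
            using (WebClient wc = new WebClient())
            {
                wc.Encoding = Encoding.UTF8;
                string html = wc.DownloadString("http://www.lagou.com/");

                // [^>]* and the lazy (.*?) keep each match within a single <a> element
                MatchCollection matches = Regex.Matches(html, "<a[^>]*jobs[^>]*>(.*?)</a>");
                foreach (Match item in matches)
                {
                    Console.WriteLine(item.Groups[1].Value);
                }
                Console.WriteLine(matches.Count);
            }
            Console.ReadKey();

            #endregion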
        }  
    }  
}
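
The three regions above all follow the same pattern: download the HTML, then pull values out with a regular expression. As a rough sketch, the extraction step can be separated from the download so it can be tried on an in-memory string without any network access. The names `HtmlExtractors`, `ExtractEmails`, `ExtractImageUrls`, `ExtractJobLinkTexts`, and `ExtractorDemo`, as well as the sample HTML and the `example.com` addresses, are hypothetical and not part of the original listing. The image helper also resolves relative `src` values against a base address, which is an assumption about what a real page may require.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class: regex extraction only, no downloading.
public static class HtmlExtractors
{
    // Same email pattern as the first region of the listing.
    private static readonly Regex EmailPattern =
        new Regex(@"\w+@([-\w])+([\.\w])+", RegexOptions.ECMAScript);

    // [^>]*? stays inside one <img> tag; group 1 is the src attribute value.
    private static readonly Regex ImgPattern =
        new Regex("<img[^>]*?\\bsrc=\"(.+?)\"", RegexOptions.IgnoreCase);

    // Anchors whose tag contains "jobs"; the lazy group 1 is the link text.
    private static readonly Regex JobLinkPattern =
        new Regex("<a\\b[^>]*jobs[^>]*>(.*?)</a>", RegexOptions.IgnoreCase);

    public static List<string> ExtractEmails(string html)
    {
        return EmailPattern.Matches(html).Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    // Resolves each src against baseUri so relative paths become absolute URLs.
    public static List<string> ExtractImageUrls(string html, Uri baseUri)
    {
        return ImgPattern.Matches(html).Cast<Match>()
            .Select(m => new Uri(baseUri, m.Groups[1].Value).AbsoluteUri)
            .ToList();
    }

    public static List<string> ExtractJobLinkTexts(string html)
    {
        return JobLinkPattern.Matches(html).Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}

public static class ExtractorDemo
{
    public static void Main()
    {
        // Hypothetical sample page; in the listing this string comes from wc.DownloadString(url).
        string html =
            "<p>Contact: alice@example.com</p>" +
            "<img class=\"pic\" src=\"/images/a.png\" alt=\"a\">" +
            "<a href=\"/jobs/1.html\">C# Developer</a><a href=\"/about\">About</a>";

        Uri baseUri = new Uri("http://example.com/shop/");

        foreach (string email in HtmlExtractors.ExtractEmails(html))
            Console.WriteLine(email);
        foreach (string url in HtmlExtractors.ExtractImageUrls(html, baseUri))
            Console.WriteLine(url);
        foreach (string text in HtmlExtractors.ExtractJobLinkTexts(html))
            Console.WriteLine(text);
    }
}
```

Because the helpers only take a string, they do not depend on how the HTML was fetched. On newer .NET versions `WebClient` is marked obsolete in favor of `HttpClient`, so the download step could be swapped without touching the extraction code. Regular expressions are a quick fit for a small crawler like this one, but they are fragile on irregular markup, so a proper HTML parser may be a better choice for anything larger.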

