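Everyone is probably familiar with Double Color Ball (双色球). I keep buying tickets even though I never win. I had long thought about using the Double Color Ball trend chart to build a database of past draws. I never worked out what I would do with that database, but I have always been curious whether a past winning combination has ever come up again. I have been blogging about Entity Framework lately, which is why it appears in the title of this post; in fact, Entity Framework accounts for only a small part of the program below.
Here is how I obtained the data.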
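The Double Color Ball trend chart is at: http://zx.caipiao.163.com/trend/ssq_basic.html
Once opened, the page shows the most recent 30 draws by default, as in the figure below:
Querying by draw number yields a link of the following form (the same pattern the code further down requests):
http://zx.caipiao.163.com/trend/ssq_basic.html?beginPeriod={0}&endPeriod={1}
It is easy to see that beginPeriod is the first draw number and endPeriod is the last. With these two parameters, data for any range of draws can be retrieved. Querying this way shows that the earliest data NetEase Lottery (网易彩票) provides is draw 2004009.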
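As a small illustration of the two parameters, here is a minimal sketch of a helper that builds the query URL for a range of draws. The class and method names (TrendUrl, Build) are hypothetical and not part of the original program; only the base address and the parameter names beginPeriod and endPeriod come from the page itself.

```csharp
using System;

public static class TrendUrl
{
    // Base address of the trend chart, as given in this post.
    private const string BaseUrl = "http://zx.caipiao.163.com/trend/ssq_basic.html";

    // Builds the query URL for an inclusive range of draw numbers.
    // TrendUrl and Build are illustrative names, not part of the original program.
    public static string Build(string beginPeriod, string endPeriod)
    {
        return string.Format("{0}?beginPeriod={1}&endPeriod={2}",
            BaseUrl,
            Uri.EscapeDataString(beginPeriod),
            Uri.EscapeDataString(endPeriod));
    }
}
```

Calling TrendUrl.Build("2004009", "2004020") would request everything from the earliest available draw up to draw 2004020 (an end value picked only for illustration).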
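Next, the HTML structure of the trend chart.
In Google Chrome, press Ctrl+Shift+I, or in Firefox use Firebug, to inspect the HTML structure.
The figure below shows the HTML structure of the trend chart. The chart data sits in the table whose id is chartsTable. Looking further, the data that actually matters is inside the <tbody></tbody> tags.
The following code retrieves the content between <tbody> and </tbody>: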
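// Requires: using System; using System.IO; using System.Net;

/// <summary>
/// Downloads the trend page and returns the Double Color Ball data rows.
/// </summary>
/// <param name="startQH">First draw number</param>
/// <param name="endQH">Last draw number</param>
/// <returns>The page content from the opening tbody tag up to its closing tag</returns>
private string GetOriginData(string startQH, string endQH)
{
    string path = string.Format("http://zx.caipiao.163.com/trend/ssq_basic.html?beginPeriod={0}&endPeriod={1}", startQH, endQH);
    WebRequest wp = WebRequest.Create(path);
    string content;
    // The using blocks release the response and the reader even if reading throws
    using (WebResponse response = wp.GetResponse())
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        content = sr.ReadToEnd();
    }
    int startIndex = content.IndexOf("<tbody id=\"cpdata\">", StringComparison.Ordinal);
    if (startIndex < 0)
    {
        throw new InvalidOperationException("The cpdata tbody was not found in the page.");
    }
    // Look for the closing tag only after the opening tag, so an earlier </tbody> is never picked up
    int endIndex = content.IndexOf("</tbody>", startIndex, StringComparison.Ordinal);
    if (endIndex < 0)
    {
        throw new InvalidOperationException("The closing </tbody> tag was not found in the page.");
    }
    // Normalize the <tr> variants to a plain <tr> and drop line breaks, which simplifies the parsing step
    content = content.Substring(startIndex, endIndex - startIndex).Replace("<tr class=\"bg_doe\" >", "<tr>").Replace("<tr >", "<tr>").Replace("\r\n", "");
    return content;
}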
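The string-slicing step in the listing above can also be tried without any network access. The following is a minimal sketch of that step as a standalone function; TbodyExtractor and Extract are hypothetical names, and unlike GetOriginData this version returns only the inner content, without the opening tag. It searches for the closing tag starting from the opening tag, which matters if the page contains another </tbody> earlier in the markup.

```csharp
using System;

public static class TbodyExtractor
{
    // Returns the markup between openTag and the first </tbody> that follows it.
    // TbodyExtractor and Extract are illustrative names, not part of the original program.
    public static string Extract(string html, string openTag)
    {
        int start = html.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0)
        {
            throw new InvalidOperationException("Opening tag not found: " + openTag);
        }

        // Start searching after the opening tag so an earlier </tbody> is ignored.
        int contentStart = start + openTag.Length;
        int end = html.IndexOf("</tbody>", contentStart, StringComparison.Ordinal);
        if (end < 0)
        {
            throw new InvalidOperationException("Closing </tbody> not found.");
        }

        return html.Substring(contentStart, end - contentStart);
    }
}
```

Keeping the slicing separate from the download makes it possible to check the parsing against a saved copy of the page.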
The content of <tbody></tbody> consists of <tr></tr> and <td></td> elements. The code that parses <tr> and <td> follows. It is commented, so I will not explain it further.
// Requires: using System.Text.RegularExpressions;

/// <summary>
/// Walks through every tr row and stores one WinNo per row.
/// </summary>
/// <param name="wnRepo">Repository that receives the parsed draws</param>
/// <param name="content">Content between the tbody tags</param>
private void ResolveTr(IRepository<WinNo> wnRepo, string content)
{
    Regex regex = new Regex("<tr>");
    // Find every <tr> in the content between the tbody tags
    MatchCollection matches = regex.Matches(content);
    foreach (Match item in matches)
    {
        WinNo wn = new WinNo();
        string trContent;
        // Call NextMatch once and reuse the result; each call searches the input again
        Match next = item.NextMatch();
        if (next.Success)
        {
            // A row runs from this <tr> up to the next one
            trContent = content.Substring(item.Index, next.Index - item.Index);
        }
        else
        {
            // The last <tr> runs to the end of the content
            trContent = content.Substring(item.Index);
        }
        ResolveTd(wn, trContent);
        wnRepo.Insert(wn);
    }
}
// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;

/// <summary>
/// Parses the td cells of one row to get the numbers of a single draw.
/// </summary>
/// <param name="wn">Entity to fill in</param>
/// <param name="trContent">Markup of one row</param>
private void ResolveTd(WinNo wn, string trContent)
{
    // Prefix of the cell that holds the draw number (开奖日期 means "draw date")
    string patternQiHao = "<td align=\"center\" title=\"开奖日期";
    Regex regex = new Regex(patternQiHao);
    Match qhMatch = regex.Match(trContent);
    if (!qhMatch.Success)
    {
        throw new FormatException("No draw number cell found in row.");
    }
    // Skip the prefix plus the 17 characters that follow it, then read the 7-character draw number
    wn.QiHao = trContent.Substring(qhMatch.Index + 17 + patternQiHao.Length, 7);
    // Pattern for the blue ball
    string patternChartBall02 = "<td class=\"chartBall02\">";
    regex = new Regex(patternChartBall02);
    Match bMatch = regex.Match(trContent);
    wn.B = Convert.ToInt32(trContent.Substring(bMatch.Index + patternChartBall02.Length, 2));
    // Holds the red ball numbers found in this row
    List<int> redBoxList = new List<int>();
    // Pattern for red balls rendered with the chartBall01 class
    string patternChartBall01 = "<td class=\"chartBall01\">";
    regex = new Regex(patternChartBall01);
    MatchCollection rMatches = regex.Matches(trContent);
    foreach (Match r in rMatches)
    {
        redBoxList.Add(Convert.ToInt32(trContent.Substring(r.Index + patternChartBall01.Length, 2)));
    }
    // Pattern for red balls rendered with the chartBall07 class
    string patternChartBall07 = "<td class=\"chartBall07\">";
    regex = new Regex(patternChartBall07);
    rMatches = regex.Matches(trContent);
    foreach (Match r in rMatches)
    {
        redBoxList.Add(Convert.ToInt32(trContent.Substring(r.Index + patternChartBall07.Length, 2)));
    }
    if (redBoxList.Count != 6)
    {
        throw new FormatException("Expected 6 red balls but found " + redBoxList.Count + ".");
    }
    // Sort the red balls so that R1..R6 are in ascending order
    redBoxList.Sort();
    wn.R1 = redBoxList[0];
    wn.R2 = redBoxList[1];
    wn.R3 = redBoxList[2];
    wn.R4 = redBoxList[3];
    wn.R5 = redBoxList[4];
    wn.R6 = redBoxList[5];
}
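The listing above locates each cell with a regular expression and then reads the number by offset with Substring. A capture group can do both in one step. The following is a minimal sketch of that alternative, not the approach the original program takes; BallParser, ParseReds and ParseBlue are hypothetical names, and only the class names chartBall01, chartBall07 and chartBall02 are taken from the page markup described in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BallParser
{
    // Red balls appear under either chartBall01 or chartBall07; the group captures the two digits.
    private static readonly Regex RedBall = new Regex("<td class=\"chartBall0[17]\">([0-9]{2})");
    // The blue ball uses chartBall02.
    private static readonly Regex BlueBall = new Regex("<td class=\"chartBall02\">([0-9]{2})");

    // Returns the red ball numbers found in one row, in ascending order.
    public static List<int> ParseReds(string trContent)
    {
        return RedBall.Matches(trContent)
            .Cast<Match>()
            .Select(m => int.Parse(m.Groups[1].Value))
            .OrderBy(n => n)
            .ToList();
    }

    // Returns the blue ball number, or throws if the row has no blue ball cell.
    public static int ParseBlue(string trContent)
    {
        Match m = BlueBall.Match(trContent);
        if (!m.Success)
        {
            throw new FormatException("No blue ball cell found in row.");
        }
        return int.Parse(m.Groups[1].Value);
    }
}
```

With capture groups the code no longer depends on the number starting exactly at the end of the matched prefix, and a row without a blue ball fails loudly instead of producing a wrong value.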
Next is the code that uses Entity Framework:
First, create a WinNo entity to represent one Double Color Ball draw:
public class WinNo
{
    /// <summary>
    /// Primary key
    /// </summary>
    public int ID { get; set; }
    /// <summary>
    /// Draw number
    /// </summary>
    public string QiHao { get; set; }

    /// <summary>
    /// First red ball number
    /// </summary>
    public int R1 { get; set; }
    /// <summary>
    /// Second red ball number
    /// </summary>
    public int R2 { get; set; }
    /// <summary>
    /// Third red ball number
    /// </summary>
    public int R3 { get; set; }
    /// <summary>
    /// Fourth red ball number
    /// </summary>
    public int R4 { get; set; }
    /// <summary>
    /// Fifth red ball number
    /// </summary>
    public int R5 { get; set; }
    /// <summary>
    /// Sixth red ball number
    /// </summary>
    public int R6 { get; set; }
    /// <summary>
    /// Blue ball number
    /// </summary>
    public int B { get; set; }
}
Second, the default configuration is sufficient.
Third, create a context, SSQContext, with the following code:
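// Requires: using System.Data.Entity; using System.Data.Entity.ModelConfiguration.Conventions;

public class SSQContext : DbContext
{
    // The initializer is set once per type, so a static constructor is enough
    static SSQContext()
    {
        // Uncomment to drop and recreate the database on every run:
        //Database.SetInitializer(new DropCreateDatabaseAlways<SSQContext>());
        // Passing null disables database initialization, so the existing database is used as-is
        Database.SetInitializer<SSQContext>(null);
    }

    public DbSet<WinNo> WinNos { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Keep the table name singular (WinNo) instead of a pluralized name
        modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
        base.OnModelCreating(modelBuilder);
    }
}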
Fourth, run the program. The result is shown in the figure below:
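Once the draws are stored, the question from the start of this post, whether a winning combination has ever repeated, comes down to grouping the draws by their numbers. The following is a minimal sketch that works on an in-memory list. RepeatFinder and FindRepeats are hypothetical names, and the WinNo class in the block is a trimmed copy of the entity so that the sketch compiles on its own. It assumes R1 to R6 are already in ascending order, which is how ResolveTd stores them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the WinNo entity so the sketch compiles on its own.
public class WinNo
{
    public string QiHao { get; set; }
    public int R1 { get; set; }
    public int R2 { get; set; }
    public int R3 { get; set; }
    public int R4 { get; set; }
    public int R5 { get; set; }
    public int R6 { get; set; }
    public int B { get; set; }
}

public static class RepeatFinder
{
    // Returns one list of draw numbers for every combination that occurs more than once.
    // Assumes R1..R6 are stored in ascending order, so equal combinations produce equal keys.
    public static List<List<string>> FindRepeats(IEnumerable<WinNo> draws)
    {
        return draws
            .GroupBy(d => string.Join(",",
                new[] { d.R1, d.R2, d.R3, d.R4, d.R5, d.R6, d.B }.Select(n => n.ToString())))
            .Where(g => g.Count() > 1)
            .Select(g => g.Select(d => d.QiHao).ToList())
            .ToList();
    }
}
```

With the context from this post, something like RepeatFinder.FindRepeats(context.WinNos.ToList()) would load every draw into memory first; for a table of this size that is probably acceptable, though I have not measured it.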