JavaScript Type Inference (2): Let's Start Training


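Preparing the training data

Next, we preprocess the type information collected in the previous part and convert it into data that can be used for training.

The code lives in GetTypes.js, which creates three related directories: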

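// Excerpt from GetTypes.js; assumes Node's built-in `fs` module is already imported in that file.
let root = "data/Repos-cleaned";
let outputDirGold = "data/outputs-gold/";       // hand-annotated types, used as the test set
let outputDirAll = "data/outputs-all/";         // all type information, used for training
let outputDirCheckJS = "data/outputs-checkjs";  // output for comparison with CheckJS
try {
    // mkdirSync throws if a directory already exists; the remaining calls are then skipped.
    fs.mkdirSync(outputDirGold);
    fs.mkdirSync(outputDirAll);
    fs.mkdirSync(outputDirCheckJS);
}
catch (err) {
    console.log(err);
}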

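The outputs-all data is used for training. outputs-gold holds the type annotations that developers wrote by hand; this valuable data serves as the test set. outputs-checkjs is used for comparison against the results of the CheckJS tool.

The final training data looks like this: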

let a = 0 ; let s = "s" ; console . log ( s ) ;    O $number$ O O O O $string$ O O O $Console$ O $void$ O $string$ O O
class Test { public value : number ; constructor ( v ) { this . value = v ; } } let t = new Test ( 0 ) ;    O $any$ O O $number$ O O O O O $number$ O O O O $number$ O $number$ O O O O $Test$ O O $any$ O O O O

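This is the same pairing of source tokens and type labels that we saw in the previous part.

The principle should be familiar by now, so we will not walk through the source code in detail.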
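To make the alignment concrete, here is a minimal Python sketch that reads one such line and pairs every token with its label. It assumes the two halves are separated by a tab (or a run of spaces) and that items within each half are separated by single spaces; `parse_example` is a hypothetical helper, not part of GetTypes.js.

```python
import re

def parse_example(line):
    """Split one training line into (token, label) pairs."""
    # Assumption: a tab or a run of 2+ spaces separates tokens from labels.
    source, target = re.split(r"\t|\s{2,}", line.strip(), maxsplit=1)
    tokens, labels = source.split(), target.split()
    if len(tokens) != len(labels):
        raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
    return list(zip(tokens, labels))

line = ('let a = 0 ; let s = "s" ; console . log ( s ) ;'
        "\tO $number$ O O O O $string$ O O O $Console$ O $void$ O $string$ O O")
pairs = parse_example(line)

# Keep only the positions that carry a type label other than "O".
typed = [(tok, lab) for tok, lab in pairs if lab != "O"]
print(typed)
```

For the first sample line this yields 17 aligned pairs, five of which carry a type: `a`, both occurrences of `s`, `console`, and `log`.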

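Splitting into training and test sets

Once the training data is ready, we can call lexer.py to split it into training, validation, and test sets.

Here is the split for the first 68 projects, as an example: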

File counts= 68
Processing 0: 0xProject__0x.js.json
Processing 1: 1backend__1backend.json
Processing 2: 2fd__graphdoc.json
Processing 3: 43081j__rar.js.json
Processing 4: 500tech__angular-tree-component.json
Processing 5: 5calls__5calls.json
Processing 6: 74th__vscode-vim.json
Processing 7: accounts-js__accounts.json
Processing 8: adriancarriger__angularfire2-offline.json
Processing 9: AFASSoftware__maquette.json
Processing 10: afrad__angular2-websocket.json
Processing 11: aggarwalankush__ionic-mosum.json
Processing 12: aggarwalankush__ionic-push-base.json
Processing 13: ahomu__Talkie.json
Processing 14: aikoven__typescript-fsa.json
Processing 15: aioutecism__amVim-for-VSCode.json
Processing 16: airbrake__airbrake-js.json
Processing 17: ajtoo__vscode-org-mode.json
Processing 18: akfish__node-vibrant.json
Processing 19: akserg__ng2-dnd.json
Processing 20: akserg__ng2-slim-loading-bar.json
Processing 21: akserg__ng2-toasty.json
Processing 22: alamgird__angular-next-starter-kit.json
Processing 23: Alberplz__angular2-color-picker.json
Processing 24: alefragnani__vscode-project-manager.json
Processing 25: alex3165__react-mapbox-gl.json
Processing 26: alexjlockwood__avocado.json
Processing 27: alexjlockwood__ShapeShifter.json
Processing 28: alexjoverm__tslint-config-prettier.json
Processing 29: alexjoverm__typescript-library-starter.json
Processing 30: AlexKhymenko__ngx-permissions.json
Processing 31: AlgusDark__bloomer.json
Processing 32: amcdnl__ngrx-actions.json
Processing 33: anandanand84__technicalindicators.json
Processing 34: andrei-markeev__ts2c.json
Processing 35: andrerpena__react-mde.json
Processing 36: andrucz__ionic2-rating.json
Processing 37: angular-redux__store.json
Processing 38: angular-ui__ui-router.json
Processing 39: angulartics__angulartics2.json
Processing 40: ant-design__ant-design-mobile.json
Processing 41: ant-design__ant-design.json
Processing 42: antivanov__js-crawler.json
Processing 43: APIs-guru__graphql-faker.json
Processing 44: APIs-guru__graphql-lodash.json
Processing 45: APIs-guru__graphql-voyager.json
Processing 46: appbaseio__mirage.json
Processing 47: arangodb__arangojs.json
Processing 48: argonjs__argon.json
Processing 49: arkon__ng-sidebar.json
Processing 50: artemsky__ng-snotify.json
Processing 51: artemsky__vue-snotify.json
Processing 52: artsy__emission.json
Processing 53: ascoders__gaea-editor.json
Processing 54: ascoders__react-native-image-viewer.json
Processing 55: ascoders__react-native-image-zoom.json
Processing 56: ashubham__webshot-factory.json
Processing 57: Asymmetrik__ngx-leaflet.json
Processing 58: atom-community__markdown-preview-plus.json
Processing 59: atom-haskell__ide-haskell.json
Processing 60: atom__atom-languageclient.json
Processing 61: aurelia__ux.json
Processing 62: aurelia__validation.json
Processing 63: auth0__angular2-jwt.json
Processing 64: avatsaev__angular-contacts-app-example.json
Processing 65: avatsaev__angular4-docker-example.json
Processing 66: aviabird__angularspree.json
Processing 67: Azure__kashti.json
Train projects: 54
Validation projects: 7
Test projects: 7
Train files: 2184
Validation files: 364
Test files: 187
Producing vocabularies
Size of source vocab: 3377
Size of target vocab: 707
Writing train/valid/test files
Overall tokens: 896479 train, 134374 valid and 60516 test

This finally produces three files: train.txt, valid.txt, and test.txt.

Let's take one line from them and look at its format:

<s> import 's' ; import { configure } from 's' ; import * as _UNKNOWN_ from 's' ; configure ( { adapter : new _UNKNOWN_ ( ) } ) ; </s>    O O O O O O $any$ O O O O O O O $any$ O O O $any$ O O $any$ O O $any$ O O O O O O

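Again, it is the processed source code paired with the token type table that we generated in the first part.

Two vocabulary files, source_wl and target_wl, are generated as well.
source_wl is the table of source tokens in use, for example: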

.
(
)
,
;
:
{
}
's'
"s"
=
this
0
[
]
const
from
=>
import
null
return
if
export
let
expect
<
>
new
?
function
string
<s>
</s>
public
as
private
!
false
true
===

The last entry of the file is _UNKNOWN_, which stands for unknown tokens.

target_wl is the table of types. Here are its first few lines:

O
$any$
$string$
$number$
$complex$
$void$
$boolean$
$any[]$
$string[]$
$number[]$
$Assertion$
$undefined$
${}$
$HTMLElement$
$Promise$
$ExpectStatic$
$Promise<any>$
$PromiseConstructor$
$Promise<void>$
$Element$
$this$
$ErrorConstructor$
$ZeroEx$
$Math$
$SignedOrder$
$Projection$
$JSON$
$JsApi$
$StockData$
$Console$
$VNode$
$T$

The first entry in the type table stands for unknown.
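How lexer.py builds these tables is not shown here, but the general idea can be sketched as follows. The frequency ordering and the `min_count` threshold are assumptions made for illustration; only the trailing `_UNKNOWN_` entry and the masking of rare tokens are visible in the files above.

```python
from collections import Counter

UNKNOWN = "_UNKNOWN_"

def build_vocab(sequences, min_count=2):
    """Tokens seen at least min_count times, most frequent first, then UNKNOWN."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = [tok for tok, n in counts.most_common() if n >= min_count]
    return vocab + [UNKNOWN]

def apply_vocab(seq, vocab):
    """Replace every out-of-vocabulary token with UNKNOWN."""
    known = set(vocab)
    return [tok if tok in known else UNKNOWN for tok in seq]

# A tiny hypothetical corpus of two tokenized lines.
corpus = [
    "import { configure } from 's' ;".split(),
    "import * as Adapter from 's' ;".split(),
]
vocab = build_vocab(corpus)
print(vocab)
print(apply_vocab(corpus[1], vocab))
```

In this toy corpus only `import`, `from`, `'s'`, and `;` occur twice, so every other token in the second line is replaced by `_UNKNOWN_`, which is the same effect seen in the train.txt sample.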

Besides these, test_projects.txt is also generated, for example:

43081j__rar.js.json
adriancarriger__angularfire2-offline.json
aikoven__typescript-fsa.json
alexjoverm__tslint-config-prettier.json
AlgusDark__bloomer.json
andrerpena__react-mde.json
arangodb__arangojs.json
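The listing shows that whole projects, rather than individual files, are assigned to the test set, and the log above reports 54/7/7 projects. A project-level split of that shape can be sketched as below. The 10% fractions, the rounding, and the fixed seed are assumptions chosen to reproduce those counts; they are not taken from lexer.py.

```python
import random

def split_projects(projects, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle project names and split them into train/valid/test lists."""
    shuffled = sorted(projects)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_frac)
    n_test = round(len(shuffled) * test_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

# Hypothetical project file names standing in for the 68 real ones.
projects = [f"project_{i}.json" for i in range(68)]
train, valid, test = split_projects(projects)
print(len(train), len(valid), len(test))  # prints: 54 7 7
```

Splitting by project keeps all files of one repository on the same side, so the test set measures how well the model handles code it has never seen.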

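Format conversion

Before processing the data with CNTK, we still need to convert the txt files into the CTF format that CNTK expects.

The conversion tool is available in the CNTK repository: https://github.com/microsoft/CNTK/blob/master/Scripts/txt2ctf.py

The commands are as follows, using Windows as the example. On other systems you can drop the full path and invoke python directly: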

& 'C:\Program Files\Python37\python.exe' txt2ctf.py --map data/source_wl data/target_wl --input data/train.txt --output data/train.ctf
& 'C:\Program Files\Python37\python.exe' txt2ctf.py --map data/source_wl data/target_wl --input data/valid.txt --output data/valid.ctf
& 'C:\Program Files\Python37\python.exe' txt2ctf.py --map data/source_wl data/target_wl --input data/test.txt --output data/test.ctf
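To give a feel for what the conversion does, the sketch below renders one sequence in a CTF-like layout: each token becomes a row tagged with the sequence id, and each stream holds a sparse one-hot entry `index:1`, where the index is the token's position in the corresponding vocabulary file. This is a simplified illustration and not the actual txt2ctf.py code; the stream names `S0`/`S1` and the exact whitespace are assumptions, and the tiny vocabularies are made up.

```python
def to_ctf_rows(seq_id, columns, vocabs):
    """Render one sequence as CTF-like rows, one row per token position."""
    streams = [col.split() for col in columns]
    rows = []
    for i in range(max(len(s) for s in streams)):
        parts = [str(seq_id)]
        for k, (tokens, vocab) in enumerate(zip(streams, vocabs)):
            if i < len(tokens):
                # Sparse one-hot encoding: "<vocabulary index>:1".
                parts.append(f"|S{k} {vocab[tokens[i]]}:1")
        rows.append(" ".join(parts))
    return rows

# Hypothetical miniature vocabularies; the real ones come from source_wl and target_wl.
source_vocab = {"<s>": 0, "let": 1, "a": 2, "</s>": 3}
target_vocab = {"O": 0, "$number$": 1}

rows = to_ctf_rows(0, ["<s> let a </s>", "O O $number$ O"], [source_vocab, target_vocab])
print("\n".join(rows))
```

The order of the files passed to `--map` matters for the same reason: the first vocabulary is applied to the first column of each input line and the second vocabulary to the second column.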

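Training

With everything in place, we can call infer.py to start training.
Remember to install Microsoft's CNTK framework first.

Here are my training command and its output: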

C:\Python\Python36\python.exe .\infer.py
Selected GPU[0] GeForce GTX 960M as the process wide default device.
-------------------------------------------------------------------
Build info:

                Built time: Apr 23 2019 21:50:08
                Last modified date: Tue Apr 23 17:37:55 2019
                Build type: Release
                Build target: GPU
                With ASGD: yes
                Math lib: mkl
                CUDA version: 10.0.0
                CUDNN version: 7.6.2
                Build Branch: HEAD
                Build SHA1: ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Training 4597857 parameters in 21 parameter tensors.
-------------------------------------------------------------------
Build info:

                Built time: Apr 23 2019 21:50:08
                Last modified date: Tue Apr 23 17:37:55 2019
                Build type: Release
                Build target: GPU
                With ASGD: yes
                Math lib: mkl
                CUDA version: 10.0.0
                CUDNN version: 7.6.2
                Build Branch: HEAD
                Build SHA1: ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Learning rate per 1 samples: 0.001
 Minibatch[   1-  10]: loss = 1.052736 * 42461, metric = 14.26% * 42461;
 Minibatch[  11-  20]: loss = 0.671728 * 46088, metric = 13.34% * 46088;
 Minibatch[  21-  30]: loss = 0.486434 * 42913, metric = 8.57% * 42913;
 Minibatch[  31-  40]: loss = 0.542112 * 45928, metric = 9.83% * 45928;
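Each progress line reports an average loss and a metric, both followed by the number of samples they were averaged over. A small sketch for summarizing such lines is shown below. It reads the metric as an error rate, which is the usual choice in CNTK classification setups but is an assumption here; `summarize` is a hypothetical helper.

```python
import re

# Matches e.g. "loss = 1.052736 * 42461, metric = 14.26% * 42461;"
PATTERN = re.compile(r"loss = ([\d.]+) \* (\d+), metric = ([\d.]+)% \* (\d+)")

def summarize(log_text):
    """Return the sample-weighted average loss and metric over all matched lines."""
    total_loss = total_metric = 0.0
    total_samples = 0
    for loss, n, metric, _ in PATTERN.findall(log_text):
        n = int(n)
        total_loss += float(loss) * n
        total_metric += float(metric) / 100 * n
        total_samples += n
    if total_samples == 0:
        raise ValueError("no progress lines found")
    return total_loss / total_samples, total_metric / total_samples

log = """
 Minibatch[   1-  10]: loss = 1.052736 * 42461, metric = 14.26% * 42461;
 Minibatch[  11-  20]: loss = 0.671728 * 46088, metric = 13.34% * 46088;
 Minibatch[  21-  30]: loss = 0.486434 * 42913, metric = 8.57% * 42913;
 Minibatch[  31-  40]: loss = 0.542112 * 45928, metric = 9.83% * 45928;
"""
avg_loss, avg_metric = summarize(log)
print(f"average loss {avg_loss:.4f}, average metric {avg_metric:.2%}")
```

Weighting by sample count matters because the minibatch groups differ in size, so a plain mean of the four loss values would be slightly off.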

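Evaluating the results

In evaluation.py, set the model_file variable to the CNTK model file produced by the previous training step, then run the script to evaluate the trained model.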

model_file = "models/model-1.cntk"
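What evaluation.py reports is not shown here. One plausible measure for this task, given only as a hedged sketch, is top-1 accuracy over the positions that carry a real type label, skipping positions labeled `O` so that the many untyped tokens do not inflate the score. `typed_accuracy` is a hypothetical helper.

```python
def typed_accuracy(gold, predicted):
    """Top-1 accuracy over positions whose gold label is not "O"."""
    scored = [(g, p) for g, p in zip(gold, predicted) if g != "O"]
    if not scored:
        return 0.0
    return sum(g == p for g, p in scored) / len(scored)

# Hypothetical gold labels and model predictions for one short sequence.
gold = "O $number$ O O O O $string$ O O O".split()
pred = "O $number$ O O O O $any$ O O O".split()
print(typed_accuracy(gold, pred))  # prints: 0.5
```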