Musings on Intelligent Front-End Development (1) - pix2code
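Ever since graphical user interfaces appeared, there have been engineers whose job is to build them, and the largest group of them evolved into today's front-end engineers. Whether you work on the Web, on mobile, or on desktop clients, laying out interfaces and slicing design mockups into assets is a major part of the job. Generating code directly from a design mockup is not only a front-end engineer's dream but also something many designers hope for.
In 2017, a paper titled "pix2code: Generating Code from a Graphical User Interface Screenshot" appeared and immediately drew wide attention.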
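What pix2code does
As the figure below shows, pix2code trains a deep neural network on pairs of screenshots and their corresponding DSL descriptions. Given a new screenshot, the trained network infers a new DSL description, and a code generator then turns that DSL into code for the target platform.
Let's look at examples on Android, iOS, and the Web in turn.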
Android example
The corresponding DSL is:
stack {
row {
btn, switch
}
row {
radio
}
row {
label, slider, label
}
row {
switch
}
row {
switch, switch, btn
}
row {
btn, btn
}
row {
check
}
}
footer {
btn-dashboard, btn-dashboard, btn-home
}
Web example
Next, a Web example:
The corresponding DSL is:
header {
btn-inactive, btn-inactive, btn-inactive, btn-active, btn-inactive
}
row {
single {
small-title, text, btn-red
}
}
iOS example
The corresponding DSL is:
stack {
row {
img, label
}
row {
label, slider, label
}
row {
label, switch
}
row {
label, btn-add
}
row {
img, label
}
row {
label, slider, label
}
row {
img, img, img
}
row {
label, btn-add
}
}
footer {
btn-search, btn-search, btn-download, btn-more
}
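The samples above all follow the same shape: an element name, optionally followed by braces that enclose comma-separated children. As a minimal sketch of how such text could be turned into a tree, the parser below is written for illustration only; `parse_dsl` and the `(name, children)` tuple layout are assumptions, not part of pix2code itself.

```python
import re


def parse_dsl(text):
    """Parse pix2code-style DSL text into a list of (name, children) tuples."""
    # Tokens are element names, braces, and commas; whitespace is ignored.
    tokens = re.findall(r"[A-Za-z0-9-]+|[{},]", text)
    pos = 0

    def parse_siblings():
        nonlocal pos
        nodes = []
        while pos < len(tokens) and tokens[pos] != "}":
            if tokens[pos] == ",":
                pos += 1                      # commas only separate siblings
                continue
            name = tokens[pos]
            if name == "{":
                raise ValueError("block without an element name")
            pos += 1
            children = []
            if pos < len(tokens) and tokens[pos] == "{":
                pos += 1                      # consume "{"
                children = parse_siblings()
                if pos >= len(tokens) or tokens[pos] != "}":
                    raise ValueError("unbalanced braces after " + name)
                pos += 1                      # consume "}"
            nodes.append((name, children))
        return nodes

    tree = parse_siblings()
    if pos != len(tokens):
        raise ValueError("unexpected closing brace")
    return tree


web_sample = """
header {
btn-inactive, btn-inactive, btn-inactive, btn-active, btn-inactive
}
row {
single {
small-title, text, btn-red
}
}
"""
tree = parse_dsl(web_sample)
print([name for name, _ in tree])
```

A tree in this form is convenient for the later code-generation step, because each node can be rendered independently of its siblings.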
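How pix2code works
pix2code extracts features from the screenshot with a convolutional network and, in parallel, encodes the DSL sequence with an LSTM recurrent network. The two outputs are then fed together into another recurrent network and trained jointly.
At inference time, only the screenshot is fed into the convolutional network while the DSL sequence starts out empty; the output is a DSL sequence.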
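To make the inference step more concrete, here is a minimal sketch of greedy token-by-token decoding. It runs against a toy stand-in for the trained network, so it needs no deep learning library. The special tokens `<START>`, `<END>` and the blank padding token, as well as the names `greedy_decode` and `toy_model`, are assumptions made for this illustration.

```python
START, END, PAD = "<START>", "<END>", " "   # assumed special tokens
CONTEXT_LENGTH = 48


def greedy_decode(predict_next, image, max_len=150):
    """Generate DSL tokens one at a time until END or max_len is reached."""
    # The context starts "empty": padding everywhere except a final START.
    context = [PAD] * (CONTEXT_LENGTH - 1) + [START]
    output = []
    for _ in range(max_len):
        probs = predict_next(image, context)   # dict: token -> probability
        token = max(probs, key=probs.get)      # greedy: most likely token
        if token == END:
            break
        output.append(token)
        context = context[1:] + [token]        # slide the context window
    return output


def toy_model(image, context):
    """Stand-in for the trained network: replays a fixed token script."""
    script = ["header", "{", "btn-active", "}", END]
    # Number of tokens generated so far (padding and START do not count).
    step = sum(1 for t in context if t not in (PAD, START))
    wanted = script[min(step, len(script) - 1)]
    return {tok: (1.0 if tok == wanted else 0.0) for tok in set(script)}


print(greedy_decode(toy_model, image=None))
```

In a real setup `predict_next` would wrap the trained model, and the image argument would be the preprocessed screenshot.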
Enough talk; let's go straight to the code.
Some parameters
First come some configuration parameters, such as the input size and the number of training epochs:
CONTEXT_LENGTH = 48
IMAGE_SIZE = 256
BATCH_SIZE = 64
EPOCHS = 10
STEPS_PER_EPOCH = 72000
Saving and loading the model
Trained network weights should not go to waste, so AModel provides save and load methods:
from keras.models import model_from_json


class AModel:
    def __init__(self, input_shape, output_size, output_path):
        self.model = None
        self.input_shape = input_shape
        self.output_size = output_size
        self.output_path = output_path
        self.name = ""

    def save(self):
        model_json = self.model.to_json()
        with open("{}/{}.json".format(self.output_path, self.name), "w") as json_file:
            json_file.write(model_json)
        self.model.save_weights("{}/{}.h5".format(self.output_path, self.name))

    def load(self, name=""):
        output_name = self.name if name == "" else name
        with open("{}/{}.json".format(self.output_path, output_name), "r") as json_file:
            loaded_model_json = json_file.read()
        self.model = model_from_json(loaded_model_json)
        self.model.load_weights("{}/{}.h5".format(self.output_path, output_name))
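Convolutional network
The convolutional part has 6 convolutional layers in three stages, followed by two fully connected layers of 1024 units each: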
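from keras.layers import Input, Dense, Dropout, RepeatVector, LSTM, concatenate, Conv2D, MaxPooling2D, Flatten
from keras.models import Sequential, Model
from keras.optimizers import RMSprop
# input_shape is the image shape passed to the model's constructor, e.g. (IMAGE_SIZE, IMAGE_SIZE, 3)
image_model = Sequential()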
image_model.add(Conv2D(32, (3, 3), padding='valid', activation='relu', input_shape=input_shape))
image_model.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Dropout(0.25))
image_model.add(Conv2D(64, (3, 3), padding='valid', activation='relu'))
image_model.add(Conv2D(64, (3, 3), padding='valid', activation='relu'))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Dropout(0.25))
image_model.add(Conv2D(128, (3, 3), padding='valid', activation='relu'))
image_model.add(Conv2D(128, (3, 3), padding='valid', activation='relu'))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Dropout(0.25))
image_model.add(Flatten())
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))
image_model.add(RepeatVector(CONTEXT_LENGTH))
visual_input = Input(shape=input_shape)
encoded_image = image_model(visual_input)
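It can help to trace the tensor sizes through this stack by hand. The sketch below does the arithmetic in plain Python, assuming a square 256x256 input (IMAGE_SIZE above), 3x3 'valid' convolutions with stride 1, and 2x2 pooling that drops any remainder. The helper names are made up for this illustration.

```python
def conv_valid(size, kernel=3):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def max_pool(size, window=2):
    """Output width of non-overlapping max pooling (remainder is dropped)."""
    return size // window


def image_branch_shapes(image_size=256, context_length=48, dense_units=1024):
    """Return the feature-map shape after each stage, the flattened size,
    and the shape produced by RepeatVector(context_length)."""
    size = image_size
    stages = []
    for filters in (32, 64, 128):
        size = conv_valid(conv_valid(size))   # two 3x3 convolutions
        size = max_pool(size)                 # one 2x2 pooling
        stages.append((size, size, filters))
    flattened = size * size * filters         # input width of the first Dense
    repeated = (context_length, dense_units)  # one image vector per time step
    return stages, flattened, repeated


stages, flattened, repeated = image_branch_shapes()
print(stages)      # [(126, 126, 32), (61, 61, 64), (28, 28, 128)]
print(flattened)   # 100352
print(repeated)    # (48, 1024)
```

The final RepeatVector copies the 1024-dimensional image vector once per time step, so the image features line up with a text sequence of CONTEXT_LENGTH steps.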
Text-processing network
The text branch uses two stacked LSTM layers:
language_model = Sequential()
language_model.add(LSTM(128, return_sequences=True, input_shape=(CONTEXT_LENGTH, output_size)))
language_model.add(LSTM(128, return_sequences=True))
textual_input = Input(shape=(CONTEXT_LENGTH, output_size))
encoded_text = language_model(textual_input)
Joining image and text
Once the image and the text have both been encoded, we concatenate them:
decoder = concatenate([encoded_image, encoded_text])
After concatenation, two more LSTM layers process the combined sequence, followed by a softmax output layer:
decoder = LSTM(512, return_sequences=True)(decoder)
decoder = LSTM(512, return_sequences=False)(decoder)
decoder = Dense(output_size, activation='softmax')(decoder)
With the whole network built, we can set it up for training:
self.model = Model(inputs=[visual_input, textual_input], outputs=decoder)
optimizer = RMSprop(lr=0.0001, clipvalue=1.0)
self.model.compile(loss='categorical_crossentropy', optimizer=optimizer)
During fit, the EPOCHS and BATCH_SIZE values from the parameters above are used:
def fit(self, images, partial_captions, next_words):
    self.model.fit([images, partial_captions], next_words, shuffle=False, epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=1)
    self.save()
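The `partial_captions` and `next_words` arguments suggest that each DSL sequence is cut into (context, next token) pairs. Below is a minimal sketch of one way to build such pairs with a sliding window. The padding token, the `<START>`/`<END>` markers, and the function name are assumptions for illustration, and a real pipeline would additionally one-hot encode the tokens and attach the screenshot to every pair.

```python
PAD = " "  # assumed placeholder token used to fill an empty context


def make_training_pairs(tokens, context_length):
    """Slide a fixed-size window over the tokens, yielding (context, next)."""
    padded = [PAD] * context_length + list(tokens)
    pairs = []
    for i in range(len(tokens)):
        context = padded[i:i + context_length]     # what the model sees
        next_word = padded[i + context_length]     # what it should predict
        pairs.append((context, next_word))
    return pairs


tokens = ["<START>", "header", "{", "btn-active", "}", "<END>"]
pairs = make_training_pairs(tokens, context_length=4)
for context, next_word in pairs:
    print(context, "->", next_word)
```

A short window of 4 is used here only to keep the printout readable; the parameters above set the real window to CONTEXT_LENGTH = 48.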
The inference code is as follows:
def predict(self, image, partial_caption):
    return self.model.predict([image, partial_caption], verbose=0)[0]

def predict_batch(self, images, partial_captions):
    return self.model.predict([images, partial_captions], verbose=1)
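Because the output layer is a softmax over `output_size` entries, `predict` returns one probability per vocabulary token, and the partial caption has to be supplied in a matching one-hot form. The sketch below illustrates that encoding and decoding in plain Python; the tiny vocabulary, the probability values, and the helper names are invented for this example.

```python
def one_hot(token, vocabulary):
    """Vector with a single 1.0 at the token's position in the vocabulary."""
    return [1.0 if t == token else 0.0 for t in vocabulary]


def encode_context(context, vocabulary):
    """Encode a list of tokens as a (len(context), len(vocabulary)) matrix."""
    return [one_hot(t, vocabulary) for t in context]


def decode_prediction(probabilities, vocabulary):
    """Pick the vocabulary token with the highest predicted probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return vocabulary[best]


vocabulary = [" ", "<START>", "<END>", "{", "}", "row", "btn"]
context = [" ", " ", "<START>", "row"]
encoded = encode_context(context, vocabulary)        # 4 rows, 7 columns

# A made-up softmax output in which "{" is the most likely next token.
probs = [0.01, 0.01, 0.02, 0.90, 0.02, 0.02, 0.02]
next_token = decode_prediction(probs, vocabulary)
print(next_token)
```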
Supported DSL
Finally, let's see which DSL components pix2code supports.
Android DSL
The Android platform supports 16 DSL tokens.
8 of them correspond to controls:
- stack
- row
- label
- btn
- slider
- check
- radio
- switch
There are 4 auxiliary tokens:
- opening-tag
- closing-tag
- body
- footer
There are also 4 buttons for the footer bar:
- btn-home
- btn-dashboard
- btn-notifications
- btn-search
The mapping to Android code is as follows:
{
"opening-tag": "{",
"closing-tag": "}",
"body": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<LinearLayout\n xmlns:android=\"http://schemas.android.com/apk/res/android\"\n xmlns:app=\"http://schemas.android.com/apk/res-auto\"\n xmlns:tools=\"http://schemas.android.com/tools\"\n android:id=\"@+id/container\"\n android:layout_width=\"match_parent\"\n android:layout_height=\"match_parent\"\n android:orientation=\"vertical\"\n tools:context=\"com.tonybeltramelli.android_gui.MainActivity\">\n {}\n</LinearLayout>\n",
"stack": "<FrameLayout android:id=\"@+id/content\" android:layout_width=\"match_parent\" android:layout_height=\"match_parent\" android:layout_weight=\"1\" android:padding=\"10dp\">\n <LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"match_parent\" android:orientation=\"vertical\">\n {}\n </LinearLayout>\n</FrameLayout>",
"row": "<LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"wrap_content\" android:orientation=\"horizontal\" android:paddingTop=\"10dp\" android:paddingBottom=\"10dp\" android:weightSum=\"1\">\n{}\n</LinearLayout>",
"label": "<TextView android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:text=\"[TEXT]\" android:textAppearance=\"@style/TextAppearance.AppCompat.Body2\"/>\n",
"btn": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:text=\"[TEXT]\"/>",
"slider": "<SeekBar android:id=\"@+id/[ID]\" style=\"@style/Widget.AppCompat.SeekBar.Discrete\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:layout_weight=\"0.9\" android:max=\"10\" android:progress=\"5\"/>",
"check": "<CheckBox android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
"radio": "<RadioButton android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
"switch": "<Switch android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
"footer": "<LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"wrap_content\" android:orientation=\"horizontal\" android:weightSum=\"1\">\n {}\n</LinearLayout>",
"btn-home": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_home_black_24dp\" android:text=\"\"/>",
"btn-dashboard": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_dashboard_black_24dp\" android:text=\"\"/>",
"btn-notifications": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_notifications_black_24dp\" android:text=\"\"/>",
"btn-search": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"?android:attr/actionModeWebSearchDrawable\" android:text=\"\"/>"
}
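The snippets in this mapping contain `[ID]` and `[TEXT]` placeholders that have to be filled in before the XML is usable. As a minimal sketch of that step, assuming sequential IDs and caller-supplied text (the `view_N` naming scheme and `fill_placeholders` are hypothetical, not taken from pix2code):

```python
import itertools


def fill_placeholders(snippet, id_counter, text="Label"):
    """Give every [ID] a fresh identifier and replace [TEXT] with text."""
    while "[ID]" in snippet:
        # Replace one occurrence at a time so each gets a unique ID.
        snippet = snippet.replace("[ID]", "view_{}".format(next(id_counter)), 1)
    return snippet.replace("[TEXT]", text)


# The "btn" entry from the mapping above.
btn_template = ('<Button android:id="@+id/[ID]" '
                'android:layout_width="wrap_content" '
                'android:layout_height="wrap_content" '
                'android:text="[TEXT]"/>')

ids = itertools.count(1)
first = fill_placeholders(btn_template, ids, text="OK")
second = fill_placeholders(btn_template, ids, text="Cancel")
print(first)
print(second)
```

Sharing one counter across all snippets keeps the generated IDs unique within a layout.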
iOS DSL
iOS has 15 DSL tokens.
6 common controls:
- stack
- row
- img
- label
- switch
- slider
3 auxiliary structures:
- opening-tag
- closing-tag
- body
6 special structures:
- btn-add
- footer
- btn-search
- btn-contact
- btn-download
- btn-more
{
"opening-tag": "{",
"closing-tag": "}",
"body": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<document type=\"com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB\" version=\"3.0\" toolsVersion=\"11201\" systemVersion=\"15G1217\" targetRuntime=\"iOS.CocoaTouch\" propertyAccessControl=\"none\" useAutolayout=\"YES\" useTraitCollections=\"YES\" colorMatched=\"YES\">\n <dependencies>\n <deployment identifier=\"iOS\"/>\n <plugIn identifier=\"com.apple.InterfaceBuilder.IBCocoaTouchPlugin\" version=\"11161\"/>\n <capability name=\"documents saved in the Xcode 8 format\" minToolsVersion=\"8.0\"/>\n </dependencies>\n <scenes>\n <!--View Controller-->\n <scene sceneID=\"qAw-JF-viq\">\n <objects>\n <viewController id=\"[ID]\" sceneMemberID=\"viewController\">\n <layoutGuides>\n <viewControllerLayoutGuide type=\"top\" id=\"[ID]\"/>\n <viewControllerLayoutGuide type=\"bottom\" id=\"[ID]\"/>\n </layoutGuides>\n <view key=\"view\" contentMode=\"center\" id=\"[ID]\">\n <rect key=\"frame\" x=\"0.0\" y=\"0.0\" width=\"375\" height=\"667\"/>\n <autoresizingMask key=\"autoresizingMask\" widthSizable=\"YES\" heightSizable=\"YES\"/>\n <subviews>\n {}\n </subviews>\n <color key=\"backgroundColor\" white=\"1\" alpha=\"1\" colorSpace=\"calibratedWhite\"/>\n </view>\n </viewController>\n <placeholder placeholderIdentifier=\"IBFirstResponder\" id=\"[ID]\" userLabel=\"First Responder\" sceneMemberID=\"firstResponder\"/>\n </objects>\n <point key=\"canvasLocation\" x=\"20\" y=\"95.802098950524751\"/>\n </scene>\n </scenes>\n</document>\n",
"stack": "<stackView opaque=\"NO\" contentMode=\"center\" fixedFrame=\"YES\" axis=\"vertical\" alignment=\"center\" spacing=\"10\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" minX=\"16\" minY=\"20\" width=\"343\" height=\"440\"/>\n <autoresizingMask key=\"autoresizingMask\" flexibleMaxX=\"YES\" flexibleMaxY=\"YES\"/>\n <subviews>\n {}\n </subviews>\n <color key=\"backgroundColor\" red=\"0.80000001190000003\" green=\"0.80000001190000003\" blue=\"0.80000001190000003\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</stackView>",
"row": "<view contentMode=\"center\" ambiguous=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"343\" height=\"65\"/>\n <subviews>\n <stackView opaque=\"NO\" contentMode=\"center\" fixedFrame=\"YES\" spacing=\"30\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" minX=\"8\" minY=\"6\" width=\"337\" height=\"52\"/>\n <autoresizingMask key=\"autoresizingMask\" flexibleMaxX=\"YES\" flexibleMaxY=\"YES\"/>\n <subviews>\n {}\n </subviews>\n </stackView>\n </subviews>\n <color key=\"backgroundColor\" red=\"0.9\" green=\"0.9\" blue=\"0.9\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</view>",
"img": "<imageView userInteractionEnabled=\"NO\" contentMode=\"scaleToFill\" horizontalHuggingPriority=\"251\" verticalHuggingPriority=\"251\" ambiguous=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"36\" height=\"36\"/>\n <color key=\"backgroundColor\" red=\"0.40000000600000002\" green=\"0.40000000600000002\" blue=\"1\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</imageView>",
"label": "<label opaque=\"NO\" userInteractionEnabled=\"NO\" contentMode=\"left\" horizontalHuggingPriority=\"251\" verticalHuggingPriority=\"251\" ambiguous=\"YES\" text=\"[TEXT]\" textAlignment=\"natural\" lineBreakMode=\"tailTruncation\" baselineAdjustment=\"alignBaselines\" adjustsFontSizeToFit=\"NO\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"255\" height=\"52\"/>\n <fontDescription key=\"fontDescription\" type=\"system\" pointSize=\"17\"/>\n <nil key=\"textColor\"/>\n <nil key=\"highlightedColor\"/>\n</label>",
"switch": "<switch opaque=\"NO\" contentMode=\"scaleToFill\" horizontalHuggingPriority=\"750\" verticalHuggingPriority=\"750\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" on=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"51\" height=\"31\"/>\n</switch>",
"slider": "<slider opaque=\"NO\" contentMode=\"scaleToFill\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" value=\"0.5\" minValue=\"0.0\" maxValue=\"1\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"142\" height=\"31\"/>\n</slider>",
"btn-add": "<button opaque=\"NO\" contentMode=\"scaleToFill\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" buttonType=\"contactAdd\" lineBreakMode=\"middleTruncation\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" width=\"22\" height=\"22\"/>\n</button>",
"footer": "<tabBar contentMode=\"scaleToFill\" fixedFrame=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n <frame key=\"frameInset\" height=\"49\"/>\n <autoresizingMask key=\"autoresizingMask\" widthSizable=\"YES\" flexibleMinY=\"YES\"/>\n <color key=\"backgroundColor\" white=\"0.0\" alpha=\"0.0\" colorSpace=\"calibratedWhite\"/>\n <items>\n {}\n </items>\n</tabBar>",
"btn-search": "<tabBarItem systemItem=\"search\" id=\"[ID]\"/>",
"btn-contact": "<tabBarItem systemItem=\"contacts\" id=\"[ID]\"/>",
"btn-download": "<tabBarItem systemItem=\"downloads\" id=\"[ID]\"/>",
"btn-more": "<tabBarItem systemItem=\"more\" id=\"[ID]\"/>"
}
Web DSL
The Web DSL has 16 tokens:
- opening-tag
- closing-tag
- body
- header
- btn-active
- btn-inactive
- row
- single
- double
- quadruple
- btn-green
- btn-orange
- btn-red
- big-title
- small-title
- text
{
"opening-tag": "{",
"closing-tag": "}",
"body": "<html>\n <header>\n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n <link rel=\"stylesheet\" href=\"https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css\" integrity=\"sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u\" crossorigin=\"anonymous\">\n<link rel=\"stylesheet\" href=\"https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap-theme.min.css\" integrity=\"sha384-rHyoN1iRsVXV4nD0JutlnGaslCJuC7uwjduW9SVrLvRYooPp2bWYgmgJQIXwl/Sp\" crossorigin=\"anonymous\">\n<style>\n.header{margin:20px 0}nav ul.nav-pills li{background-color:#333;border-radius:4px;margin-right:10px}.col-lg-3{width:24%;margin-right:1.333333%}.col-lg-6{width:49%;margin-right:2%}.col-lg-12,.col-lg-3,.col-lg-6{margin-bottom:20px;border-radius:6px;background-color:#f5f5f5;padding:20px}.row .col-lg-3:last-child,.row .col-lg-6:last-child{margin-right:0}footer{padding:20px 0;text-align:center;border-top:1px solid #bbb}\n</style>\n <title>Scaffold</title>\n </header>\n <body>\n <main class=\"container\">\n {}\n <footer class=\"footer\">\n <p>© Tony Beltramelli 2017</p>\n </footer>\n </main>\n <script src=\"js/jquery.min.js\"></script>\n <script src=\"js/bootstrap.min.js\"></script>\n </body>\n</html>\n",
"header": "<div class=\"header clearfix\">\n <nav>\n <ul class=\"nav nav-pills pull-left\">\n {}\n </ul>\n </nav>\n</div>\n",
"btn-active": "<li class=\"active\"><a href=\"#\">[]</a></li>\n",
"btn-inactive": "<li><a href=\"#\">[]</a></li>\n",
"row": "<div class=\"row\">{}</div>\n",
"single": "<div class=\"col-lg-12\">\n{}\n</div>\n",
"double": "<div class=\"col-lg-6\">\n{}\n</div>\n",
"quadruple": "<div class=\"col-lg-3\">\n{}\n</div>\n",
"btn-green": "<a class=\"btn btn-success\" href=\"#\" role=\"button\">[]</a>\n",
"btn-orange": "<a class=\"btn btn-warning\" href=\"#\" role=\"button\">[]</a>\n",
"btn-red": "<a class=\"btn btn-danger\" href=\"#\" role=\"button\">[]</a>",
"big-title": "<h2>[]</h2>",
"small-title": "<h4>[]</h4>",
"text": "<p>[]</p>\n"
}
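To show how a mapping like this can turn a DSL tree into markup, here is a minimal sketch of a recursive renderer. It uses a small excerpt of the Web mapping above, substitutes children for `{}` and a fixed string for `[]`. The `(name, children)` tree layout, the `render` function, and the fixed text are assumptions made for illustration.

```python
WEB_MAPPING = {  # a small excerpt of the Web mapping above
    "row": '<div class="row">{}</div>\n',
    "single": '<div class="col-lg-12">\n{}\n</div>\n',
    "small-title": "<h4>[]</h4>",
    "text": "<p>[]</p>\n",
    "btn-red": '<a class="btn btn-danger" href="#" role="button">[]</a>',
}


def render(node, mapping, text="Lorem ipsum"):
    """Render a (name, children) node by filling its template recursively."""
    name, children = node
    snippet = mapping[name].replace("[]", text)     # fill the text slot
    if children:
        inner = "".join(render(child, mapping, text) for child in children)
        snippet = snippet.replace("{}", inner)      # fill the content slot
    return snippet


# The "row" part of the Web example, written out as a tree by hand.
tree = ("row", [("single", [("small-title", []), ("text", []), ("btn-red", [])])])
html = render(tree, WEB_MAPPING, text="Hello")
print(html)
```

Keeping the templates in data rather than in code means the same renderer could, in principle, serve the Android and iOS mappings as well.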