Our college has partnered with iFLYTEK to set up the "Xiamen University–iFLYTEK Joint Laboratory for Minnan Speech and Language". The first-phase speech recognition demo may be nothing more than simple isolated-word recognition for Minnan (Hokkien). One ready-made demo is the Android program I wrote last year. I also plan to build a PC demo; its basic engine is already in place, with the UI and database still to be tuned. And since I have been learning Go lately, how could I not put it to use? Hence a web demo system: Go (Beego framework) on the backend, HTML5 on the frontend, and MongoDB as the database.
This post covers recording in the web frontend, streaming the audio over an HTML5 WebSocket to the backend, and saving it there.
Let's look at the code.

record.html:
```html
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>PONPON Chat by WebSockets</title>
<script type="text/javascript" src="/static/lib/recorder.js"></script>
<script type="text/javascript" src="/static/lib/jquery-1.10.1.min.js"></script>
<style type='text/css'>
</style>
</head>
<body>
<audio id="audio" controls autoplay></audio>
<form>
    <input type="button" id="record" value="Record">
    <input type="button" id="export" value="Send">
</form>
<div id="message"></div>
</body>
<script type='text/javascript'>
    // failure callback
    var onFail = function (e) {
        console.log('Rejected!', e);
    };
    // success callback
    var onSuccess = function (s) {
        var context = new webkitAudioContext();
        var mediaStreamSource = context.createMediaStreamSource(s);
        rec = new Recorder(mediaStreamSource);
    }

    //window.URL = URL || window.URL || window.webkitURL;
    navigator.getUserMedia = navigator.getUserMedia ||
        navigator.webkitGetUserMedia || navigator.mozGetUserMedia ||
        navigator.msGetUserMedia;

    var rec;
    var audio = document.querySelector('#audio');

    function startRecording() {
        if (navigator.getUserMedia) {
            // onSuccess / onFail are getUserMedia's success and failure callbacks
            navigator.getUserMedia({audio: true}, onSuccess, onFail);
        } else {
            console.log('navigator.getUserMedia not present');
        }
    }
    startRecording();

    //--------------------
    $('#record').click(function () {
        rec.record();
        var dd = ws.send("start");
        $("#message").text("Click export to stop recording");
        /*
          setInterval runs the function every 3000 ms: each period it
          calls rec.exportWAV, which sends the blob and clears the
          recording buffer, until clearInterval(intervalKey) stops it.
        */
        intervalKey = setInterval(function () {
            rec.exportWAV(function (blob) {
                rec.clear();
                ws.send(blob);
                //audio.src = URL.createObjectURL(blob);
            });
        }, 3000);
    });

    $('#export').click(function () {
        // first send the stop command
        rec.stop();
        ws.send("stop");
        clearInterval(intervalKey);
        $("#message").text("Sent to server!");
    });

    var ws = new WebSocket('ws://' + window.location.host + '/record/join');
    ws.onopen = function () {
        console.log("Opened connection to websocket");
    };
    ws.onclose = function () {
        console.log("Closed connection to websocket");
    }
    ws.onerror = function () {
        console.log("Cannot connect to websocket");
    }
    ws.onmessage = function (e) {
        audio.src = URL.createObjectURL(e.data);
    }
</script>
</html>
```
The key to this code is navigator.getUserMedia, which requests the client's media device. When the page loads, Chrome asks the user for permission to use the microphone. On success:
```javascript
// create the (webkit-prefixed) audio context
var context = new webkitAudioContext();
// create a media stream source from the microphone stream
var mediaStreamSource = context.createMediaStreamSource(s);
// recorder instance
rec = new Recorder(mediaStreamSource);
```
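As an aside, webkitAudioContext and the prefixed navigator.getUserMedia are old, WebKit-only entry points. For reference, a minimal sketch of the same setup against the standard promise-based API (assuming a browser that implements the unprefixed Media Capture spec):

```javascript
// Same setup with the standard Media Capture API (assumes a modern browser).
navigator.mediaDevices.getUserMedia({ audio: true })
    .then(function (stream) {
        var context = new AudioContext();
        var mediaStreamSource = context.createMediaStreamSource(stream);
        rec = new Recorder(mediaStreamSource);
    })
    .catch(function (e) {
        console.log('Rejected!', e);
    });
```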
To start recording we call rec.record(). Here is recorder.js:
```javascript
(function (window) {
    var WORKER_PATH = '/static/lib/recorderWorker.js';

    var Recorder = function (source, cfg) {
        var config = cfg || {};
        var bufferLen = config.bufferLen || 4096 * 2;
        this.context = source.context;
        /*
          The last two arguments of createJavaScriptNode() are the
          numbers of input and output channels: 1 = mono, 2 = stereo.
        */
        this.node = this.context.createJavaScriptNode(bufferLen, 2, 2);
        var worker = new Worker(config.workerPath || WORKER_PATH);
        worker.postMessage({
            command: 'init',
            config: {
                sampleRate: this.context.sampleRate
            }
        });
        var recording = false,
            currCallback;

        this.node.onaudioprocess = function (e) {
            if (!recording) return;
            worker.postMessage({
                command: 'record',
                buffer: [
                    // left channel samples
                    e.inputBuffer.getChannelData(0),
                    // right channel samples
                    e.inputBuffer.getChannelData(1)
                ]
            });
        }

        this.configure = function (cfg) {
            for (var prop in cfg) {
                if (cfg.hasOwnProperty(prop)) {
                    config[prop] = cfg[prop];
                }
            }
        }

        this.record = function () {
            recording = true;
        }

        this.stop = function () {
            recording = false;
        }

        this.clear = function () {
            worker.postMessage({ command: 'clear' });
        }

        this.getBuffer = function (cb) {
            currCallback = cb || config.callback;
            worker.postMessage({ command: 'getBuffer' })
        }

        this.exportWAV = function (cb, type) {
            currCallback = cb || config.callback;
            type = type || config.type || 'audio/wav';
            if (!currCallback) throw new Error('Callback not set');
            worker.postMessage({
                command: 'exportWAV',
                type: type
            });
        }

        worker.onmessage = function (e) {
            var blob = e.data;
            currCallback(blob);
        }

        source.connect(this.node);
        this.node.connect(this.context.destination); // this should not be necessary
    };

    Recorder.forceDownload = function (blob, filename) {
        var url = (window.URL || window.webkitURL).createObjectURL(blob);
        alert(url);
        var link = window.document.createElement('a');
        link.href = url;
        link.download = filename || 'output.wav';
        var click = document.createEvent("Event");
        click.initEvent("click", true, true);
        link.dispatchEvent(click);
    }

    window.Recorder = Recorder;
})(window);
```
Once recording starts, this.node.onaudioprocess fires for every audio buffer and pulls the samples out of the recording buffer. Note:
```javascript
worker.postMessage({
    command: 'record',
    buffer: [
        e.inputBuffer.getChannelData(0),
        e.inputBuffer.getChannelData(1)
    ]
});
```
buffer carries the samples of both channels from the recording device; each getChannelData call returns a Float32Array of samples in the range [-1, 1].

recorderWorker.js:
```javascript
var recLength = 0,
    recBuffersL = [],
    recBuffersR = [],
    sampleRate;

this.onmessage = function (e) {
    switch (e.data.command) {
        case 'init':
            init(e.data.config);
            break;
        case 'record':
            record(e.data.buffer);
            break;
        case 'exportWAV':
            exportWAV(e.data.type);
            break;
        case 'getBuffer':
            getBuffer();
            break;
        case 'clear':
            clear();
            break;
    }
};

function init(config) {
    sampleRate = config.sampleRate;
}

// append one buffer of samples per channel from the recording device
function record(inputBuffer) {
    recBuffersL.push(inputBuffer[0]);
    recBuffersR.push(inputBuffer[1]);
    recLength += inputBuffer[0].length;
}

// encode the recorded samples as WAV and post the Blob back
function exportWAV(type) {
    var bufferL = mergeBuffers(recBuffersL, recLength);
    var bufferR = mergeBuffers(recBuffersR, recLength);
    var interleaved = interleave(bufferL, bufferR);
    var dataview = encodeWAV(interleaved);
    var audioBlob = new Blob([dataview], {
        type: type
    });
    this.postMessage(audioBlob);
}

// post the merged raw buffers back to the main thread
function getBuffer() {
    var buffers = [];
    buffers.push(mergeBuffers(recBuffersL, recLength));
    buffers.push(mergeBuffers(recBuffersR, recLength));
    this.postMessage(buffers);
}

// reset the recording buffers
function clear(inputBuffer) {
    recLength = 0;
    recBuffersL = [];
    recBuffersR = [];
}

// concatenate the per-callback buffers into one Float32Array
function mergeBuffers(recBuffers, recLength) {
    var result = new Float32Array(recLength);
    var offset = 0;
    for (var i = 0; i < recBuffers.length; i++) {
        result.set(recBuffers[i], offset);
        offset += recBuffers[i].length;
    }
    return result;
}

// interleave the left and right channel samples (L R L R ...)
function interleave(inputL, inputR) {
    var length = inputL.length + inputR.length;
    var result = new Float32Array(length);
    var index = 0,
        inputIndex = 0;
    while (index < length) {
        result[index++] = inputL[inputIndex];
        result[index++] = inputR[inputIndex];
        inputIndex++;
    }
    return result;
}

// convert float samples to 16-bit signed PCM
function floatTo16BitPCM(output, offset, input) {
    for (var i = 0; i < input.length; i++, offset += 2) {
        var s = Math.max(-1, Math.min(1, input[i]));
        output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
}

function writeString(view, offset, string) {
    for (var i = 0; i < string.length; i++) {
        view.setUint8(offset + i, string.charCodeAt(i));
    }
}

// prepend the 44-byte wav header
function encodeWAV(samples) {
    var buffer = new ArrayBuffer(44 + samples.length * 2);
    var view = new DataView(buffer);
    /* RIFF identifier */
    writeString(view, 0, 'RIFF');
    /* file length */
    view.setUint32(4, 32 + samples.length * 2, true);
    /* RIFF type */
    writeString(view, 8, 'WAVE');
    /* format chunk identifier */
    writeString(view, 12, 'fmt ');
    /* format chunk length */
    view.setUint32(16, 16, true);
    /* sample format (raw) */
    view.setUint16(20, 1, true);
    /* channel count */
    view.setUint16(22, 2, true);
    /* sample rate */
    view.setUint32(24, sampleRate, true);
    /* byte rate (sample rate * block align) */
    view.setUint32(28, sampleRate * 4, true);
    /* block align (channel count * bytes per sample) */
    view.setUint16(32, 4, true);
    /* bits per sample */
    view.setUint16(34, 16, true);
    /* data chunk identifier */
    writeString(view, 36, 'data');
    /* data chunk length */
    view.setUint32(40, samples.length * 2, true);
    floatTo16BitPCM(view, 44, samples);
    return view;
}
```
At the moment this can only record 48000 Hz, 16-bit data. I tried adjusting the recording parameters to get the target format of 8000 Hz, 16-bit mono, but it failed: the output was still 48000 Hz, 16-bit. (The Web Audio context records at its own native sample rate, so the rate cannot simply be set here.) Since I barely know frontend JavaScript, I will look into solving the recording format problem later.
Addendum: for mono recording, change recorder.js to this.context.createJavaScriptNode(bufferLen, 1, 1) and drop all the right-channel handling in recorderWorker.js, and you're done.
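As for the sample rate, one plausible fix (a sketch of my own, not yet in the demo code) is to decimate inside the worker before encoding. The naive version below assumes the target rate divides the context rate evenly, e.g. 48000 Hz down to 8000 Hz; a proper resampler would low-pass filter first to avoid aliasing:

```javascript
// Hypothetical helper for recorderWorker.js: downsample by keeping
// every N-th sample. Assumes sampleRate is an integer multiple of
// targetRate (e.g. 48000 -> 8000); no anti-aliasing filter applied.
function downsample(buffer, sampleRate, targetRate) {
    if (targetRate >= sampleRate) return buffer;
    var ratio = sampleRate / targetRate;
    var result = new Float32Array(Math.floor(buffer.length / ratio));
    for (var i = 0; i < result.length; i++) {
        result[i] = buffer[Math.floor(i * ratio)];
    }
    return result;
}
```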
Now back to record.html:
```javascript
// on page load the client sends the websocket handshake request
var ws = new WebSocket('ws://' + window.location.host + '/record/join');
// handshake succeeded
ws.onopen = function () {
    console.log("Opened connection to websocket");
};
// connection closed
ws.onclose = function () {
    console.log("Closed connection to websocket");
}
// handshake failed
ws.onerror = function () {
    console.log("Cannot connect to websocket");
}
```
Every time the page is loaded or refreshed, the client sends a WebSocket handshake request to the server. Once the handshake succeeds, the ws.send(...) calls in the script are bound to the buttons, so the recorded data is sent with a click.
How does the Go/Beego backend handle the data? Define a Join method on the controller for this page. The code is rough and only a first cut at the functionality; channels and other refinements will come later:
```go
package controllers

import (
    "bufio"
    "net/http"
    "os"
    "path"
    "strings"

    "github.com/astaxie/beego"
    "github.com/garyburd/go-websocket/websocket"
)

type RecordController struct {
    beego.Controller
}

// mlogger is the project's logging helper (its definition is not shown here).

func (this *RecordController) Join() {
    // the requesting client's IP address
    remoteAddr := strings.Split(this.Ctx.Request.RemoteAddr, ":")[0]
    mlogger.i("Receiving Record Data From Host: " + remoteAddr)
    // upgrade the HTTP connection to a websocket
    ws, err := websocket.Upgrade(this.Ctx.ResponseWriter, this.Ctx.Request.Header, nil, 1024, 1024)
    if _, ok := err.(websocket.HandshakeError); ok {
        http.Error(this.Ctx.ResponseWriter, "Not a websocket handshake", 400)
        return
    } else if err != nil {
        beego.Error("Cannot setup WebSocket connection:", err)
        return
    }
    // use the client's IP address as the wav file name
    wavName := "record/" + remoteAddr + ".wav"
    os.MkdirAll(path.Dir(wavName), os.ModePerm)
    _, e := os.Stat(wavName)
    if e == nil {
        // remove any existing wav file
        os.Remove(wavName)
    }
    f, err := os.Create(wavName)
    mlogger.i("Host: " + remoteAddr + " creating file handler ...")
    defer f.Close()
    if err != nil {
        mlogger.e(err)
        return
    }
    w := bufio.NewWriter(f)
    for {
        // read the next chunk from the websocket
        _, p, err := ws.ReadMessage()
        if err != nil {
            mlogger.i("Host: " + remoteAddr + " disconnected ...")
            break
        }
        length := len(p)
        if length == 4 || length == 5 {
            // length == 4 means the page sent ws.send("stop")
            // length == 5 means the page sent ws.send("start")
            action := string(p)
            mlogger.i("Client's action: " + action + " recording !")
            if action == "stop" {
                goto SAVE
            } else {
                goto RESTART
            }
        }
        w.Write(p)
        continue
    SAVE:
        mlogger.i("Host: " + remoteAddr + " saving wav file ...")
        w.Flush()
        mlogger.i("Host: " + remoteAddr + " flushing writer ...")
        f.Close()
        mlogger.i("Host: " + remoteAddr + " closing the file handler ...")
        continue
    RESTART:
        os.Remove(wavName)
        f, err = os.Create(wavName)
        mlogger.i("Host: " + remoteAddr + " creating file handler ...")
        // defer f.Close()
        if err != nil {
            mlogger.e(err)
            return
        }
        w = bufio.NewWriter(f)
    }
    return
}
```
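A note on the length == 4 / length == 5 trick: a 4- or 5-byte audio chunk would be misread as a command. Browsers send ws.send("stop") as a text frame and ws.send(blob) as a binary frame, so checking the frame type is more robust. A sketch of the read loop under that approach, assuming the newer gorilla/websocket package, where ReadMessage's first return value is the frame type:

```go
// Sketch (gorilla/websocket): text frames carry commands, binary
// frames carry audio, so the frame type disambiguates the two.
for {
    messageType, p, err := ws.ReadMessage()
    if err != nil {
        break
    }
    if messageType == websocket.TextMessage {
        // text frames are control commands ("start" / "stop")
        switch string(p) {
        case "stop":
            // flush the writer and close the file
        case "start":
            // recreate the file and the writer
        }
        continue
    }
    // binary frame: raw audio bytes
    w.Write(p)
}
```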
The routing setup:
```go
beego.Router("/record", &controllers.RecordController{})
beego.Router("/record/join", &controllers.RecordController{}, "get:Join")
```
Addendum:

Note that in record.html:
```javascript
intervalKey = setInterval(function () {
    rec.exportWAV(function (blob) {
        rec.clear();
        ws.send(blob);
    });
}, 3000);
```
Inside setInterval, ws.send(blob) pushes a chunk to the server every 3 seconds, and encodeWAV in recorderWorker.js prepends a 44-byte wav header whose length fields only cover the samples recorded within the current interval. The result: the server ends up saving well over 3 seconds of audio, while the header it reads claims 3 seconds or less. Worse, writing a header into every chunk is wrong in itself: those embedded 44-byte headers are not valid audio data. encodeWAV in recorderWorker.js should therefore be changed to:
```javascript
// no header any more: send raw 16-bit PCM and let the server add it
function encodeWAV(samples) {
    var buffer = new ArrayBuffer(samples.length * 2);
    var view = new DataView(buffer);
    floatTo16BitPCM(view, 0, samples);
    return view;
}
```
This streams raw PCM straight to the server. In record.html, the Send button's click handler becomes:
```javascript
$('#export').click(function () {
    rec.stop();
    // intervalKey should be initialized (e.g. "var intervalKey = null;")
    // before the first recording, or this check will throw
    if (intervalKey == null) {
        $("#message").text("Record something before sending!");
        return;
    };
    // sampleRate and channels must be exposed on the page,
    // e.g. taken from the AudioContext and the recorder setup
    ws.send(sampleRate);
    ws.send(channels);
    console.log('sampleRate:' + sampleRate + ',channels:' + channels);
    ws.send("stop");
    rec.clear();
    clearInterval(intervalKey);
    $("#message").text("Sent to server!");
});
```
The server now also receives the data's sample rate and channel count. Correspondingly, in the Go Join method, add code to build the 44-byte wav header, write it in front of the buffered raw samples, and save the wav file:
```go
type wavHeader []byte

// SetHeader builds the 44-byte wav file header
func SetHeader(sampleRate int, channel int, length uint32) (header wavHeader) {
    header = make([]byte, 44)
    chunkSize := length + 36
    /* RIFF identifier and overall chunk size */
    header[0] = 'R'
    header[1] = 'I'
    header[2] = 'F'
    header[3] = 'F'
    header[4] = byte(chunkSize & 0xff)
    header[5] = byte((chunkSize >> 8) & 0xff)
    header[6] = byte((chunkSize >> 16) & 0xff)
    header[7] = byte((chunkSize >> 24) & 0xff)
    /* RIFF type */
    header[8] = 'W'
    header[9] = 'A'
    header[10] = 'V'
    header[11] = 'E'
    /* format chunk identifier and length (16) */
    header[12] = 'f'
    header[13] = 'm'
    header[14] = 't'
    header[15] = ' '
    header[16] = 16
    header[17] = 0
    header[18] = 0
    header[19] = 0
    /* sample format: 1 = PCM */
    header[20] = 1
    header[21] = 0
    /* channel count: 1 or 2 */
    header[22] = byte(channel & 0xff)
    header[23] = 0
    /* sample rate, little-endian (e.g. 8000 -> bytes 64, 31, 0, 0) */
    header[24] = byte(sampleRate & 0xff)
    header[25] = byte((sampleRate >> 8) & 0xff)
    header[26] = byte((sampleRate >> 16) & 0xff)
    header[27] = byte((sampleRate >> 24) & 0xff)
    /* byte rate = sampleRate * 2 * channel */
    header[28] = byte((sampleRate * 2 * channel) & 0xff)
    header[29] = byte((sampleRate * 2 * channel) >> 8 & 0xff)
    header[30] = byte((sampleRate * 2 * channel) >> 16 & 0xff)
    header[31] = byte((sampleRate * 2 * channel) >> 24 & 0xff)
    /* block align = channel * 2, i.e. 2 or 4 */
    header[32] = byte((channel * 2) & 0xff)
    header[33] = 0
    /* bits per sample */
    header[34] = 16
    header[35] = 0
    /* data chunk identifier and length */
    header[36] = 'd'
    header[37] = 'a'
    header[38] = 't'
    header[39] = 'a'
    header[40] = byte(length & 0xff)
    header[41] = byte((length >> 8) & 0xff)
    header[42] = byte((length >> 16) & 0xff)
    header[43] = byte((length >> 24) & 0xff)
    return
}
```
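To illustrate how this fits into the save step, here is a sketch, under the assumption that the raw PCM chunks are buffered in memory and that sampleRate and channels were parsed from the two extra messages (saveWav and its parameters are hypothetical names, not part of the real controller):

```go
// Sketch: write the 44-byte header followed by the raw PCM samples.
// pcm, sampleRate and channels are assumed to have been collected
// from the websocket messages before "stop" arrived.
func saveWav(wavName string, pcm []byte, sampleRate, channels int) error {
    f, err := os.Create(wavName)
    if err != nil {
        return err
    }
    defer f.Close()
    header := SetHeader(sampleRate, channels, uint32(len(pcm)))
    if _, err := f.Write(header); err != nil {
        return err
    }
    _, err = f.Write(pcm)
    return err
}
```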
If you need to trim the silent parts of the audio, see my GitHub: github.com/liuxp0827/waveIO. I haven't had time to upload the latest waveIO package; a small tweak to the delSilence function is all it takes.
For the complete code, see the attachment: http://down.51cto.com/data/1092540.
With that, the whole pipeline works: recording in the frontend, streaming the data over WebSocket, and reading and writing it to disk in the Beego backend.