Preparing the training, validation, and testing sets
It is important to split the data appropriately into training, validation, and testing sets (64%-16%-20%): the first two are used to optimize the model architecture, and the last one to evaluate model performance. The split happens at the filename level.
import tensorflow as tf

# function to create training, validation and testing sets
# adapted from https://colab.sandbox.google.com/notebooks/tpu.ipynb
# and https://codelabs.developers.google.com/codelabs/keras-flowers-data/#4
def create_train_validation_testing_sets(TFREC_PATTERN, VALIDATION_SPLIT=0.2, TESTING_SPLIT=0.2):
    """TFREC_PATTERN: string pattern for the TFREC bucket on GCS"""
    # see which accelerator is available
    try:  # detect TPUs
        tpu = None
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.experimental.TPUStrategy(tpu)
    except ValueError:  # detect GPUs
        strategy = tf.distribute.MirroredStrategy()  # for GPU or multi-GPU machines
    print("Number of accelerators: ", strategy.num_replicas_in_sync)

    # Configuration
    # adapted from https://codelabs.developers.google.com/codelabs/keras-flowers-data/#4
    if tpu:
        BATCH_SIZE = 16 * strategy.num_replicas_in_sync  # a TPU has 8 cores so this will be 128
    else:
        BATCH_SIZE = 32  # on Colab/GPU, a higher batch size does not help and sometimes does not fit on the GPU (OOM)

    # splitting data files between training and validation
    filenames = tf.io.gfile.glob(TFREC_PATTERN)
    testing_split = int(len(filenames) * TESTING_SPLIT)
    training_filenames = filenames[testing_split:]
    testing_filenames = filenames[:testing_split]
    validation_split = int(len(filenames) * VALIDATION_SPLIT)
    validation_filenames = training_filenames[:validation_split]
    training_filenames = training_filenames[validation_split:]
    validation_steps = int(3670 // len(filenames) * len(validation_filenames)) // BATCH_SIZE
    steps_per_epoch = int(3670 // len(filenames) * len(training_filenames)) // BATCH_SIZE
    return tpu, BATCH_SIZE, strategy, training_filenames, validation_filenames, testing_filenames, steps_per_epoch


# get the batched dataset, optimizing for I/O performance
# follow best practice for shuffling and repeating data
def get_batched_dataset(filenames, load_func, train=False):
    """filenames: filenames to load
    load_func: specific loading function to use
    train: Boolean, whether this is a training set"""
    dataset = load_func(filenames)
    dataset = dataset.cache()  # this dataset fits in RAM
    if train:
        # Best practices for Keras:
        # Training dataset: repeat then batch
        # Evaluation dataset: do not repeat
        dataset = dataset.repeat()
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)  # prefetch next batch while training (autotune prefetch buffer size)
    # should shuffle too but this dataset was well shuffled on disk already
    return dataset
    # source: Dataset performance guide: https://www.tensorflow.org/guide/performance/datasets


# instantiate the datasets
training_dataset_1d = get_batched_dataset(training_filenames_1d, load_dataset_1d, train=True)
validation_dataset_1d = get_batched_dataset(validation_filenames_1d, load_dataset_1d, train=False)
testing_dataset_1d = get_batched_dataset(testing_filenames_1d, load_dataset_1d, train=False)
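The dataset instantiation above relies on a few names defined elsewhere in the notebook (AUTO, BATCH_SIZE, load_dataset_1d, and the *_filenames_1d lists). The sketch below shows one way they might be produced; the GCS pattern and the TFRecord feature names are assumptions standing in for the project's own definitions.

import tensorflow as tf

AUTO = tf.data.experimental.AUTOTUNE  # let tf.data tune the prefetch buffer size

# hypothetical pattern for the raw-audio TFRecord shards on GCS
TFREC_PATTERN_1D = "gs://your-bucket/audio_1d/*.tfrec"

def load_dataset_1d(filenames):
    """Assumed loader: parse raw-audio TFRecords into (waveform, one-hot label) pairs.
    The feature names and shapes below are guesses; use the project's actual schema."""
    feature_spec = {
        "audio": tf.io.VarLenFeature(tf.float32),         # raw waveform samples
        "label": tf.io.FixedLenFeature([8], tf.float32),  # one-hot instrument label
    }
    def parse(example_proto):
        parsed = tf.io.parse_single_example(example_proto, feature_spec)
        audio = tf.expand_dims(tf.sparse.to_dense(parsed["audio"]), axis=-1)  # (window_size, 1) for Conv1D
        return audio, parsed["label"]
    records = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
    return records.map(parse, num_parallel_calls=AUTO)

# unpack everything the training step below needs
(tpu, BATCH_SIZE, strategy,
 training_filenames_1d, validation_filenames_1d, testing_filenames_1d,
 steps_per_epoch) = create_train_validation_testing_sets(TFREC_PATTERN_1D)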
Model and training
Finally, we can build and train the model using the Keras API. There is plenty of information online about how to build models with Keras, so I won't go into the details; here, 1D convolutional layers combined with pooling layers are used to extract features from the raw audio.
import tensorflow as tf
from tensorflow import keras
from datetime import datetime

# create a CNN model
with strategy.scope():
    # create the model
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu',
                               input_shape=[window_size, 1], name='conv1'),
        tf.keras.layers.MaxPooling1D(name='max1'),
        tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name='conv2'),
        tf.keras.layers.MaxPooling1D(name='max2'),
        tf.keras.layers.Flatten(name='flatten'),
        tf.keras.layers.Dense(100, activation='relu', name='dense1'),
        tf.keras.layers.Dropout(0.5, name='dropout2'),
        tf.keras.layers.Dense(20, activation='relu', name='dense2'),
        tf.keras.layers.Dropout(0.5, name='dropout3'),
        tf.keras.layers.Dense(8, name='dense3')
    ])

    # compile
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

model.summary()

# train the model
logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
EPOCHS = 100
raw_audio_history = model.fit(training_dataset_1d, steps_per_epoch=steps_per_epoch,
                              validation_data=validation_dataset_1d,
                              epochs=EPOCHS, callbacks=[tensorboard_callback])

# evaluate on the test data
model.evaluate(testing_dataset_1d)
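Since the final Dense layer emits raw logits (the loss was built with from_logits=True), a softmax is needed when turning predictions into class probabilities. Below is a minimal sketch of that step; the CLASS_NAMES list is a hypothetical placeholder for the eight instrument labels used in the project.

import numpy as np
import tensorflow as tf

# hypothetical label order; substitute the project's actual 8 instrument classes
CLASS_NAMES = ['class_0', 'class_1', 'class_2', 'class_3',
               'class_4', 'class_5', 'class_6', 'class_7']

# take one batch of test windows and convert logits into probabilities
for windows, labels in testing_dataset_1d.take(1):
    logits = model.predict(windows)                 # shape: (batch, 8), raw logits
    probs = tf.nn.softmax(logits, axis=-1).numpy()  # softmax because the model outputs logits
    predicted = np.argmax(probs, axis=-1)
    for i in range(min(5, len(predicted))):
        print(CLASS_NAMES[predicted[i]], "with probability", round(float(probs[i, predicted[i]]), 2))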
One last relevant piece of information: plotting the training and validation curves with TensorBoard.
%load_ext tensorboard
%tensorboard --logdir logs/scalars
Summary
In summary, benchmarking different machine learning approaches on the same task is very instructive. This project highlighted the importance of domain knowledge and feature engineering, as well as the power of standard, relatively simple machine learning techniques such as Naive Bayes. Overfitting was an issue because the number of features was large compared to the number of examples, but I believe future work can help mitigate this.
I was pleasantly surprised by the strong performance of transfer learning on spectrograms, and I think we could do even better by using more music-theory-based features. That said, deep learning on raw audio does show promise if more data is available from which to extract patterns. One can imagine an application where classification happens directly on the audio samples, with no feature engineering required.