Fast.ai 如何设置带标签的测试集 Test Data Set

fast.ai

fast.ai 是一个加州的创业公司,口号是 Making neural nets uncool again 。他们提供了一个针对深度学习的高级的调用接口。 通过调用 fast.ai 的程序,可以很方便的实现对于程序的设计和利用。这是非常好的。

如何设置 Test 数据集

我发现 fast.ai 的设计中的一个偏好,就是他们不设置 test data 而是只使用 validation data set 作为 CV 的测试。 这个当然没有问题,因为计算机视觉的项目在针对特别大的数据集时,都是默认将 Validation 集作为测试集的。但是对于很多的科研项目,因为你的数据量并不是特别多,所以很多时候你需要将 Validation 集和 Test 集分别开。

当然这个问题也是有解决方法的,具体就是在训练的时候正常训练不实用测试集。在最后评估的时候用 Test 集替换 Validation 集然后测试,具体参考官方文档 Add a test set

In the fastai framework test datasets have no labels - this is the unknown data to be predicted. If you want to validate your model on a test dataset with labels, you probably need to use it as a validation set, as in:


data_test = (ImageList.from_folder(path)
        .split_by_folder(train='train', valid='test')
        .label_from_folder()
        ...)

Another approach, where you do use a normal validation set, and then when the training is over, you just want to validate the test set w/ labels as a validation set, you can do this:


tfms = []
path = Path('data').resolve()
data = (ImageList.from_folder(path)
        .split_by_pct()
        .label_from_folder()
        .transform(tfms)
        .databunch()
        .normalize() ) 
learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5,1e-2)

# now replace the validation dataset entry with the test dataset as a new validation dataset: 
# everything is exactly the same, except replacing `split_by_pct` w/ `split_by_folder` 
# (or perhaps you were already using the latter, so simply switch to valid='test')
data_test = (ImageList.from_folder(path)
        .split_by_folder(train='train', valid='test')
        .label_from_folder()
        .transform(tfms)
        .databunch()
        .normalize()
       ) 
learn.validate(data_test.valid_dl)

Of course, your data block can be totally different, this is just an example.

官方文档的解释还是非常倾向的,当然这个确实藏得很深,不太容易发现。

Written on May 5, 2019