目标检测中的性能评价指标

mAP

Posted by WenlSun" on October 8, 2019

参考文章

对于目标检测问题,常用到的性能评价指标有:mAP(平均准确度均值,精度评价),速度指标(FPS,即每秒处理的图片数量,或者处理每张图片所需的时间,需要在同一硬件环境下比较)

mAP(平均准确度均值)

1. mAP的定义和相关概念

  • mAP:mean Average Precision,即各类别AP的平均值
  • AP:PR曲线下的面积
  • PR曲线:Precision-Recall曲线
  • Precision:$\frac{TP}{TP+FP}$
  • Recall:$\frac{TP}{TP+FN}$
  • TP:$IoU>0.5$ 的检测框,但同一个Ground Truth 只计算一次
  • FP:$IoU<=0.5$ 的检测框,或者检测到同一个Ground Truth的多余检测框的数量
  • FN:没有检测到的Ground Truth的数量。

一般来说mAP是针对整个数据集而言的,AP正对数据集中的某一个类别而言的,而Precision和Recall是针对单张图片的某一类别的。

2. 计算某一类别的AP

由上面的概念可知,当计算某一类别的AP时需要会画出这一类别的PR曲线,就需要先计算数据集中每张图片中这一类别的Precision和Recall。由公式:

  • $Precision=\frac{TP}{TP+FP}$
  • $Recall=\frac{TP}{TP+FN}$

我们只需要统计出相应的TP,FP和FN即可根据上述公式得出这一类别的Precision和Recall。

如何判断TP,FP和FN?

以单张图片为例,首先便利图片中的Ground Truth的目标,提取需要计算的某类别的Ground Truth 目标,之后读取通过检测器检测出来的这种类别的检测框(其他类别先不管),接着过滤掉置信度分数低于置信度阈值的框(也有的未设置信度阈值),将剩下的检测框按照置信度分数从高到低进行排序,最先判断置信度分数最高的检测框与gt box的iou是否大于设定的iou阈值,若大于设定的iou阈值,则将其判断为TP,并将该gt box 标记为已检测(后续的同一个gt的多余检测框都被标记为FP,这就是为什么先要按照置信度分数从高到低排序),若iou小于设定的阈值,则直接将其划分为FP。

1
2
3
4
5
6
7
8
9
10
        if ovmax > ovthresh:              # 若iou大于阈值
            if not R['difficult'][jmax]:  # 且要检测的gt对象非difficult类型
                if not R['det'][jmax]:    # 且gt对象暂未被检测
                    tp[d] = 1.            # 此检测框被标记为TP
                    R['det'][jmax] = 1    # 将此gt目标标记为已检测
                else:
                    fp[d] = 1.            # 若gt对象已被检测,那么此检测框为FP
        else:
            fp[d] = 1.                    # iou小于阈值,此检测框为FP
     

FN的计算很简单,一幅图片中共有多少个gt是知道的,TP我们可以统计出来,用gt的数量减去TP的数量就可以获得FN的数量。至此,我们获得了TP,FP和FN,根据上面的公式即可计算出Precision和Recall。

计算AP

AP的计算三种方式:

  • 在VOC2010以前,只需要选取当recall>=0,0.1,0.2,…,1共11个点时的Precision最大的值,然后求这11个Precision的平均值就可得到AP值。
  • 在VOC2010及以后,需要针对每一个不同的Recall值(包括0和1),选取其大于等于这些recall值时的Precision的最大值,然后计算PR曲线下面的面积作为AP值。
  • COCO数据集,设定多个IOU阈值(0.5-0.95,0.05为步长),在每一个iou阈值下都有某一类别的AP值,然后求不同iou阈值下AP的均值,就是所求的最终的某一类别的AP值。

最后,mAP就是所有类的AP值的均值。以下是完整代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
            # cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    detpath: Path to detections
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    #if not os.path.isdir(cachedir):
     #   os.mkdir(cachedir)
    #cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]
    #print('imagenames: ', imagenames)
    #if not os.path.isfile(cachefile):
        # load annots
    recs = {}
    for i, imagename in enumerate(imagenames):
        #print('parse_files name: ', annopath.format(imagename))
        recs[imagename] = parse_gt(annopath.format(imagename))
        #if i % 100 == 0:
         #   print ('Reading annotation for {:d}/{:d}'.format(
          #      i + 1, len(imagenames)) )
        # save
        #print ('Saving cached annotations to {:s}'.format(cachefile))
        #with open(cachefile, 'w') as f:
         #   cPickle.dump(recs, f)
    #else:
        # load
        #with open(cachefile, 'r') as f:
         #   recs = cPickle.load(f)

    # extract gt objects for this class
    class_recs = {}
    npos = 0
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        det = [False] * len(R)
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])

    #print('check confidence: ', confidence)

    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])

    # sort by confidence
    sorted_ind = np.argsort(-confidence)
    sorted_scores = np.sort(-confidence)

    #print('check sorted_scores: ', sorted_scores)
    #print('check sorted_ind: ', sorted_ind)
    BB = BB[sorted_ind, :]
    image_ids = [image_ids[x] for x in sorted_ind]
    #print('check imge_ids: ', image_ids)
    #print('imge_ids len:', len(image_ids))
    # go down dets and mark TPs and FPs
    nd = len(image_ids)
    tp = np.zeros(nd)
    fp = np.zeros(nd)
    for d in range(nd):
        R = class_recs[image_ids[d]]
        bb = BB[d, :].astype(float)
        ovmax = -np.inf
        BBGT = R['bbox'].astype(float)

        if BBGT.size > 0:
            # compute overlaps
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            ih = np.maximum(iymax - iymin + 1., 0.)
            inters = iw * ih

            # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

            overlaps = inters / uni
            # print(overlaps.shape)
            ovmax = np.max(overlaps)
            ## if there exist 2
            jmax = np.argmax(overlaps)

        if ovmax > ovthresh:
            if not R['difficult'][jmax]:
                if not R['det'][jmax]:
                    tp[d] = 1.
                    R['det'][jmax] = 1
                else:
                    fp[d] = 1.
                   # print('filename:', image_ids[d])
        else:
            fp[d] = 1.
        
    # compute precision recall

    print('npos num:', npos)
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    print('check fp:', fp[-1])
    print('check tp', tp[-1])

    rec = tp / float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)
    print('Recall: ', rec[-1])
    print('Prec: ', prec[-1])
    return rec, prec, ap

速度指标