Gravity Spy Notes

Gravity Spy’s leveling up and aggregation work as follows.

Confusion Matrix

After classifying on a gold standard subject a volunteer’s confusion matrix is estimated/updated. All gold standard subjects have a machine learning (ML) label. This matrix has one row and one column for each of the categories available to pick from in the front-end (22 at the top level). The column indicates the value the volunteer voted on and row indicates the value of the ML label.

As and example lets look at a case where there are only 5 categories to pick from.

\[\begin{split}\text{CM} = \overbrace{\begin{bmatrix} 5 & 1 & 1 & 0 & 0 \\ 2 & 4 & 1 & 0 & 0 \\ 1 & 0 & 6 & 0 & 1 \\ 0 & 2 & 0 & 6 & 2 \\ 0 & 1 & 2 & 3 & 3 \\ \end{bmatrix}}^{ \textstyle \text{Answer index} }\left.\rule{0cm}{1.2cm}\right\}\text{ML index}\end{split}\]

What this looks like in the code

Within the gravity_spy_user_reducer store this is structured as two nested dictionaries with the first key indicating the column and the second key indicating the row (category labels will be used as keys):

confusion_matrix = {
    '1': {'1': 5, '2': 2, '3': 1},
    '2': {'1': 1, '2': 4, '4': 2, '5': 1},
    '3': {'1': 1, '2': 1, '3': 6, '5': 2},
    '4': {'4': 6, '5': 3},
    '5': {'3': 1, '4': 2, '5': 3}
}

Volunteer skill

To get the volunteer skill the columns of the CM need to be normalized so they sum to 1 (i.e. divide each column by the number N of times a category was voted for).

In this example:

\[\text{N} = \begin{bmatrix} 8, 8, 10, 9, 6 \end{bmatrix}\]

\[\begin{split}\frac{\text{CM}}{\text{N}} = \text{CM}_{\text{Norm}} = \begin{bmatrix} 5/8 & 1/8 & 1/10 & 0 & 0 \\ 2/8 & 4/8 & 1/10 & 0 & 0 \\ 1/8 & 0 & 6/10 & 0 & 1/6 \\ 0 & 2/8 & 0 & 6/9 & 2/6 \\ 0 & 1/8 & 2/10 & 3/9 & 3/6 \\ \end{bmatrix}\end{split}\]

The diagonal of this normalized matrix gives how often the volunteer correctly identifies each of the categories. Once all of these values pass a given threshold (set in the reducer’s configuration) the volunteer is promoted to the next level.

\[\alpha = \text{diag}(\text{CM}_{\text{Norm}}) = \left[ \frac{5}{8}, \frac{4}{8}, \frac{6}{10}, \frac{6}{9}, \frac{3}{6} \right]\]

In this example this volunteer will not be promoted to the next level since not every value of \(\alpha\) is above \(0.7\).

What this looks like in the code

running_reducer/gravity_spy_user_reducer.py returns alpha, level_up, max_workflow_id, max_level, and normalized_confusion_matrix directly on the reducer, and max_level, column_normalization (N above), and confusion_matrix in the store. The alpha keys to check at each level and the threshold they need to pass are set with level_config and first_level key words. When a level up is triggered the level_up value will switch from False to True and the max_workflow_id will indicate the ID for the new workflow to unlock.

An example of the level_config is:

{
    'level_1': {
        'workflow_id': 1,
        'new_categories': [
            'BLIP',
            'WHISTLE'
        ],
        'threshold': 0.7,
        'next_level': 'level_2'
    },
    'level_2': {
        'workflow_id': 2
    }
}

and in this case first_level would be set to ‘level_1’.

Retiring subjects

When the volunteer above classifies a new subject the \(\text{CM}_{\text{Norm}}\) is used to determine how much their vote contributes towards retirement.

Let’s assume this volunteer voted for the 3rd category. Their contribution will be the 3rd column:

\[\text{W}_i = \left[\frac{1}{10}, \frac{1}{10}, \frac{6}{10}, 0, \frac{2}{10} \right]\]

This is averaged with the contributions from the other volunteers who voted on the subject and the ML score \(p^{ML}\)

\[\text{W} = \frac{\sum_{i=1}^{n}{\text{W}_i} + p^{ML}}{n + 1}\]

When the maximum value of W passes the threshold (currently set to 0.9) the image is retired. W is normalized so that all the values in the vector sum to 1.

When the maximum value of \(\text{W}\) passes the threshold (e.g. 0.9) the image is retired.

What this looks like in the code

running_reducer/gravity_spy_subject_reducer.py returns number_views, none_of_the_above_count, category_weights (W above), and max_category_weight. The store has the two counts above and a running sum category_weights_sum.

The subject retirement rule can use a combination of number_views and max_category_weight (e.g. when number_views >= 3 and max_category_weight > 0.9 retire the subject)

Moving subjects to the next level

If three or more volunteers vote for “None of the above” the subject is moved to the next level up.

What this looks like in the code

running_reducer/gravity_spy_subject_reducer.py returns none_of_the_above_count to use for this rule (e.g. when none_of_the_above_count >= 3 move subject to the next level).

Other notes

A volunteer’s classification is ignored if their CM column is all zeros for the answer they have given (i.e. they have never voted for a particular category on any gold standard subject). Additionally, if they classify a subject as “none of the above” the number_views counter is not incremented and the current category_weights is not changed. This also means classifications from non-logged in volunteers are ignored (although if you are not logged in you can not see past the level 1 workflow so not that big a deal).

The ML weights count as 1 view (treated the same as any of the volunteers), so until number_views >= 2 it has not been classified by a volunteer with a non-zero CM column.