Reducers

Question Reducer

This module provides functions to reduce the question task extracts from panoptes_aggregation.extractors.question_extractor.

panoptes_aggregation.reducers.question_reducer.process_data(data, pairs=False)

Process a list of extracted questions into Counter objects

Parameters:
Returns:

processed_data – A list of Counter objects, one for each extraction

Return type:

list

panoptes_aggregation.reducers.question_reducer.question_reducer(votes_list)

Reduce a list of Counter objects into a single dict

Parameters:votes_list (list) – A list of Counter objects from process_data()
Returns:reduction – A dictionary (formatted as a Counter) giving the vote count for each key
Return type:dict
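As a rough illustration of this reduction (a sketch only, not the library's code; the helper name reduce_votes and the sample votes are hypothetical), a list of Counter objects can be summed into a single dictionary of vote counts:

```python
from collections import Counter


def reduce_votes(votes_list):
    """Sum a list of Counter objects into one plain dict of vote counts."""
    total = Counter()
    for votes in votes_list:
        total.update(votes)  # adds the counts key by key
    return dict(total)


# Hypothetical votes from three classifications of one subject.
votes = [Counter({"yes": 1}), Counter({"no": 1}), Counter({"yes": 1})]
reduction = reduce_votes(votes)
print(reduction)
```

Each key in the result is an answer and each value is the number of classifications that chose it.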

Slider Reducer

This module provides functions to reduce the slider task extracts from panoptes_aggregation.extractors.slider_extractor.

panoptes_aggregation.reducers.slider_reducer.process_data(data, pairs=False)

Process a list of extracted slider values into a list

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.slider_extractor.slider_extractor()
Returns:processed_data – A list of slider values, one for each extraction
Return type:list

panoptes_aggregation.reducers.slider_reducer.slider_reducer(votes_list)

Reduce a list of slider values into a mean, median, and variance

Parameters:votes_list (list) – A list of slider values from process_data()
Returns:reduction – A dictionary giving the mean, median, and variance of the slider values
Return type:dict
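A minimal sketch of such a reduction using only the standard library. The helper name reduce_slider, the output key names, and the choice of population variance are assumptions made for illustration; the library may name its keys or normalise the variance differently.

```python
import statistics


def reduce_slider(values):
    """Summarise a list of slider values (key names are hypothetical)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population variance is an assumption of this sketch.
        "variance": statistics.pvariance(values),
    }


slider_stats = reduce_slider([2, 4, 4, 6])
print(slider_stats)
```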

Point Reducer

This module provides functions to cluster points extracted with panoptes_aggregation.extractors.point_extractor.

panoptes_aggregation.reducers.point_reducer.point_reducer(data_by_tool, **kwargs)

Cluster a list of points by tool using DBSCAN

This reducer is for use with panoptes_aggregation.extractors.point_extractor, which does not separate points by frame and does not support subtask reduction. Use panoptes_aggregation.extractors.point_extractor_by_frame and panoptes_aggregation.reducers.point_reducer_dbscan if there are multiple frames or subtasks.

Parameters:
Returns:

reduction – A dictionary with the following keys

  • tool*_points_x : A list of x positions for all points drawn with tool*
  • tool*_points_y : A list of y positions for all points drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all points drawn with tool*
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_x : The x position for each cluster found
  • tool*_clusters_y : The y position for each cluster found
  • tool*_clusters_var_x : The x variance of points in each cluster found
  • tool*_clusters_var_y : The y variance of points in each cluster found
  • tool*_clusters_var_x_y : The x-y covariance of points in each cluster found

Return type:

dict
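To make the per-cluster keys concrete, the sketch below summarises points that have already been labelled by a clustering step. It is not the library's implementation: the helper name summarize_clusters is hypothetical, the tool* prefix is dropped from the keys, the label -1 is treated as noise (the convention scikit-learn's DBSCAN uses), and population variance is assumed.

```python
from collections import defaultdict
from statistics import mean


def summarize_clusters(points, labels):
    """Summarise labelled (x, y) points per cluster; label -1 is skipped as noise."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:
            groups[label].append(point)

    summary = {key: [] for key in (
        "clusters_count", "clusters_x", "clusters_y",
        "clusters_var_x", "clusters_var_y", "clusters_var_x_y")}
    for label in sorted(groups):
        xs = [p[0] for p in groups[label]]
        ys = [p[1] for p in groups[label]]
        mx, my = mean(xs), mean(ys)
        n = len(xs)
        summary["clusters_count"].append(n)
        summary["clusters_x"].append(mx)
        summary["clusters_y"].append(my)
        # Population (co)variance; the library's normalisation may differ.
        summary["clusters_var_x"].append(sum((x - mx) ** 2 for x in xs) / n)
        summary["clusters_var_y"].append(sum((y - my) ** 2 for y in ys) / n)
        summary["clusters_var_x_y"].append(
            sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n)
    return summary


# Hypothetical points and cluster labels for a single tool.
points = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0), (12.0, 14.0), (50.0, 50.0)]
labels = [0, 0, 1, 1, -1]
cluster_summary = summarize_clusters(points, labels)
print(cluster_summary)
```

Every list in the summary has one entry per cluster, in the order of the sorted cluster labels.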

panoptes_aggregation.reducers.point_reducer.process_data(data)

Process a list of extractions into lists of x and y sorted by tool.

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.point_extractor.point_extractor()
Returns:processed_data – A dictionary keyed by tool, with a list of (x, y) tuples as each value
Return type:dict

Point Reducer DBSCAN

This module provides functions to cluster points extracted with panoptes_aggregation.extractors.point_extractor.

panoptes_aggregation.reducers.point_reducer_dbscan.point_reducer_dbscan(data_by_tool, **kwargs)

Cluster a list of points by tool using DBSCAN

Parameters:
Returns:

reduction – A dictionary with one key per subject frame. Each frame has the following keys

  • tool*_points_x : A list of x positions for all points drawn with tool*
  • tool*_points_y : A list of y positions for all points drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all points drawn with tool*
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_x : The x position for each cluster found
  • tool*_clusters_y : The y position for each cluster found
  • tool*_clusters_var_x : The x variance of points in each cluster found
  • tool*_clusters_var_y : The y variance of points in each cluster found
  • tool*_clusters_var_x_y : The x-y covariance of points in each cluster found

Return type:

dict

panoptes_aggregation.reducers.point_reducer_dbscan.process_data(data)

Process a list of extractions into lists of x and y sorted by tool

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.point_extractor.point_extractor()
Returns:processed_data – A dictionary keyed by tool, with a list of (x, y) tuples as each value
Return type:dict

Point Reducer HDBSCAN

This module provides functions to cluster points extracted with panoptes_aggregation.extractors.point_extractor.

panoptes_aggregation.reducers.point_reducer_hdbscan.point_reducer_hdbscan(data_by_tool, **kwargs)

Cluster a list of points by tool using HDBSCAN

Parameters:
Returns:

reduction – A dictionary with one key per subject frame. Each frame has the following keys

  • tool*_points_x : A list of x positions for all points drawn with tool*
  • tool*_points_y : A list of y positions for all points drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all points drawn with tool*
  • tool*_cluster_probabilities: A list of cluster probabilities for all points drawn with tool*
  • tool*_clusters_persistance: A measure of how persistent each cluster is (1.0 = stable, 0.0 = unstable)
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_x : The weighted x position for each cluster found
  • tool*_clusters_y : The weighted y position for each cluster found
  • tool*_clusters_var_x : The weighted x variance of points in each cluster found
  • tool*_clusters_var_y : The weighted y variance of points in each cluster found
  • tool*_clusters_var_x_y : The weighted x-y covariance of points in each cluster found

Return type:

dict

panoptes_aggregation.reducers.point_reducer_hdbscan.process_data(data)

Process a list of extractions into lists of x and y sorted by tool

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.point_extractor.point_extractor()
Returns:processed_data – A dictionary keyed by tool, with a list of (x, y) tuples as each value
Return type:dict

Rectangle Reducer

This module provides functions to cluster rectangles extracted with panoptes_aggregation.extractors.rectangle_extractor.

panoptes_aggregation.reducers.rectangle_reducer.process_data(data)

Process a list of extractions into lists of x and y sorted by frame and tool

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.rectangle_extractor.rectangle_extractor()
Returns:processed_data – A dictionary keyed by frame. Each value is a dictionary keyed by tool, with a list of (x, y, width, height) tuples as each value
Return type:dict

panoptes_aggregation.reducers.rectangle_reducer.rectangle_reducer(data_by_tool, **kwargs)

Cluster a list of rectangles by tool and frame

Parameters:
Returns:

reduction – A dictionary with the following keys for each frame

  • tool*_rec_x : A list of x positions for all rectangles drawn with tool*
  • tool*_rec_y : A list of y positions for all rectangles drawn with tool*
  • tool*_rec_width : A list of width values for all rectangles drawn with tool*
  • tool*_rec_height : A list of height values for all rectangles drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all rectangles drawn with tool*
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_x : The x position for each cluster found
  • tool*_clusters_y : The y position for each cluster found
  • tool*_clusters_width : The width value for each cluster found
  • tool*_clusters_height : The height value for each cluster found

Return type:

dict


Shape Reducer DBSCAN

This module provides functions to cluster shapes extracted with panoptes_aggregation.extractors.shape_extractor.

panoptes_aggregation.reducers.shape_reducer_dbscan.shape_reducer_dbscan(data_by_tool, **kwargs)

Cluster a shape by tool using DBSCAN

Parameters:
  • data_by_tool (dict) – A dictionary returned by process_data()
  • kwargs – See DBSCAN
Returns:

reduction – A dictionary with the following keys for each frame

  • tool*_<shape>_<param> : A list of all param values for the shape drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all shapes drawn with tool*
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_<param> : The param value for each cluster found

Return type:

dict


Shape Reducer HDBSCAN

This module provides functions to cluster shapes extracted with panoptes_aggregation.extractors.shape_extractor.

panoptes_aggregation.reducers.shape_reducer_hdbscan.shape_reducer_hdbscan(data_by_tool, **kwargs)

Cluster a shape by tool using HDBSCAN

Parameters:
  • data_by_tool (dict) – A dictionary returned by process_data()
  • kwargs – See HDBSCAN
Returns:

reduction – A dictionary with the following keys for each frame

  • tool*_<shape>_<param> : A list of all param values for the shape drawn with tool*
  • tool*_cluster_labels : A list of cluster labels for all shapes drawn with tool*
  • tool*_cluster_probabilities: A list of cluster probabilities for all points drawn with tool*
  • tool*_clusters_persistance: A measure of how persistent each cluster is (1.0 = stable, 0.0 = unstable)
  • tool*_clusters_count : The number of points in each cluster found
  • tool*_clusters_<param> : The param value for each cluster found

Return type:

dict


Survey Reducer

This module provides functions to reduce survey task extracts from panoptes_aggregation.extractors.survey_extractor.

panoptes_aggregation.reducers.survey_reducer.process_data(data)

Process a list of extracted survey data into a dictionary of sub-question answers organized by choice

Parameters:data (list) – A list of extractions created by panoptes_aggregation.extractors.survey_extractor.survey_extractor()
Returns:processed_data – A dictionary where the keys are the choice made and the values are a list of dicts containing Counters for each sub-question asked.
Return type:dict

panoptes_aggregation.reducers.survey_reducer.survey_reducer(data_in)

Reduce the survey task answers as a list of dicts (one for each choice marked)

Parameters:data_in (dict) – A dictionary created by process_data()
Returns:reduction – A list that has one element for each choice marked. Each element is a dict of the form
  • choice : The choice made
  • total_vote_count : The number of users that classified the subject
  • choice_count : The number of users that made this choice
  • answers_* : Counters for each answer to sub-question *
Return type:list

Polygon As Line Tool for Text Reducer

This module provides functions to reduce the polygon-text extractions from panoptes_aggregation.extractors.poly_line_text_extractor.

panoptes_aggregation.reducers.poly_line_text_reducer.poly_line_text_reducer(data_by_frame, **kwargs_dbscan)

Reduce the polygon-text answers as a list of lines of text.

Parameters:
Returns:

reduction – A dictionary with one key for each frame of the subject, each having a list as its value. Each item of the list represents one transcribed line of text and is a dictionary with the following keys:

  • clusters_x : the x position of each identified word
  • clusters_y : the y position of each identified word
  • clusters_text : A list of text at each cluster position
  • gutter_label : A label indicating what “gutter” cluster the line is from
  • line_slope: The slope of the line of text in degrees
  • slope_label : A label indicating what slope cluster the line is from
  • number_views : The number of users that transcribed the line of text
  • consensus_score : The average number of users whose text agreed for the line
    Note: if consensus_score is the same as number_views, every user agreed with each other

Note: the image coordinate system is left handed with y increasing downward.

Return type:

dict

panoptes_aggregation.reducers.poly_line_text_reducer.process_data(data_list, process_by_line=False)

Process a list of extractions into a dictionary of loc and text organized by frame

Parameters:data_list (list) – A list of extractions created by panoptes_aggregation.extractors.poly_line_text_extractor.poly_line_text_extractor()
Returns:processed_data – A dictionary with keys for each frame of the subject and values being dictionaries with x, y, text, and slope keys. x, y, and text are lists-of-lists, where each inner list is from a single annotation; slope is the list of slopes (in deg) for each of these inner lists.
Return type:dict

Text aggregation utilities

This module provides utility functions used in the polygon-as-line-text-reducer code from panoptes_aggregation.reducers.poly_line_text_reducer.

panoptes_aggregation.reducers.text_utils.align_words(word_line, xy_line, text_line, kwargs_cluster, kwargs_dbscan)

A function that takes the annotations for one line of text, aligns the words, and finds the end-points for the line.

Parameters:
  • word_line (np.array) – An nx1 array with the x-position of each dot in the rotated coordinate frame.
  • xy_line (np.array) – An nx2 array with the non-rotated (x, y) positions of each dot.
  • text_line (np.array) – An nx1 array with the text for each dot.
  • kwargs_cluster (dict) – A dictionary containing the eps_*, metric, and dot_freq keywords
  • kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords
Returns:

  • clusters_x (list) – A list with the start and end x-position of the line
  • clusters_y (list) – A list with the start and end y-position of the line
  • clusters_text (list) – A list-of-lists with the words transcribed at each dot cluster found. One list per cluster. Note: the empty strings that were added to each annotation are stripped before returning the words.

panoptes_aggregation.reducers.text_utils.angle_metric(t1, t2)

A metric for the distance between angles in the [-180, 180] range

Parameters:
  • t1 (float) – Theta one in degrees
  • t2 (float) – Theta two in degrees
Returns:

distance – The distance between the two input angles in degrees

Return type:

float
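Angles near the ends of the [-180, 180] range are close to each other even though their numeric difference is large, so a plain subtraction is not a suitable distance. A minimal sketch of a wrap-around distance follows; the helper name angle_distance is hypothetical and the library's exact definition may differ.

```python
def angle_distance(t1, t2):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(t1 - t2) % 360.0
    # Going the other way around the circle may be shorter.
    return min(diff, 360.0 - diff)


print(angle_distance(170, -170))
print(angle_distance(10, 30))
```

With this definition, 170 and -170 degrees are 20 degrees apart rather than 340.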

panoptes_aggregation.reducers.text_utils.avg_angle(theta)

A function that finds the average of an array of angles that are in the range [-180, 180].

Parameters:theta (array) – An array of angles that are in the range [-180, 180] degrees
Returns:average – The average angle
Return type:float
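An arithmetic mean fails for angles that straddle the ±180 boundary (the mean of 170 and -170 would be 0). One common approach, shown here as a sketch and not necessarily what the library does, averages the unit vectors of the angles. The helper name circular_mean_deg is hypothetical.

```python
import math


def circular_mean_deg(angles):
    """Average angles (in degrees) by averaging their unit vectors."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    # atan2 returns a value in [-pi, pi], i.e. [-180, 180] degrees.
    return math.degrees(math.atan2(sin_sum, cos_sum))


print(circular_mean_deg([10, 30]))
print(circular_mean_deg([170, -170]))
```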
panoptes_aggregation.reducers.text_utils.cluster_by_gutter(x_slope, y_slope, text_slope, kwargs_cluster, kwargs_dbscan)

A function to take the annotations for each frame of a subject and group them based on what side of the page gutter they are on.

Parameters:
  • x_slope (list) – A list-of-lists of the x values for each drawn dot. There is one item in the list for each annotation made by the user.
  • y_slope (list) – A list-of-lists of the y values for each drawn dot. There is one item in the list for each annotation made by the user.
  • text_slope (list) – A list-of-lists of the text for each drawn dot. There is one item in the list for each annotation made by the user.
  • kwargs_cluster (dict) – A dictionary containing the eps_*, metric, and dot_freq keywords
  • kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords
Returns:

frame_gutter – A list of the resulting extractions, one item per line of text found.

Return type:

list

panoptes_aggregation.reducers.text_utils.cluster_by_line(xy_rotate, xy_gutter, text_gutter, annotation_labels, kwargs_cluster, kwargs_dbscan)

A function to take the annotations for one slope_label and cluster them based on perpendicular distance (e.g. lines of text).

Parameters:
  • xy_rotate (np.array) – An array of shape nx2 containing the (x, y) positions of each dot drawn in the rotated coordinate frame.
  • xy_gutter (np.array) – An array of shape nx2 containing the (x, y) positions for each dot drawn.
  • text_gutter (np.array) – An array of shape nx1 containing the text for each dot drawn. Note: each annotation has an empty string added to the end so this array has the same shape as xy_slope.
  • annotation_labels (np.array) – An array of shape nx1 containing a unique label indicating what annotation each position/text came from. This information is used to ensure one annotation does not span multiple lines.
  • kwargs_cluster (dict) – A dictionary containing the eps_*, metric, and dot_freq keywords
  • kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords
Returns:

frame_lines – A list of reductions, one for each line. Each reduction is a dictionary containing the information for the line.

Return type:

list

panoptes_aggregation.reducers.text_utils.cluster_by_slope(x_frame, y_frame, text_frame, slope_frame, kwargs_cluster, kwargs_dbscan)

A function to take the annotations for one gutter_label and cluster them based on what slope the transcription is.

Parameters:
  • x_frame (list) – A list-of-lists of the x values for each drawn dot. There is one item in the list for each annotation made by the user.
  • y_frame (list) – A list-of-lists of the y values for each drawn dot. There is one item in the list for each annotation made by the user.
  • text_frame (list) – A list-of-lists of the text for each drawn dot. There is one item in the list for each annotation made by the user. The inner text lists are padded with an empty string at the end so there is the same number of words as there are dots.
  • slope_frame (list) – A list of the slopes (in deg) for each annotation
  • kwargs_cluster (dict) – A dictionary containing the eps_*, metric, and dot_freq keywords
  • kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords
Returns:

frame_slope – A list of the resulting extractions, one item per line of text found.

Return type:

list

panoptes_aggregation.reducers.text_utils.cluster_by_word(word_line, xy_line, text_line, annotation_labels, kwargs_cluster, kwargs_dbscan)

A function to take the annotations for one line of text and cluster them based on the words in the line.

Parameters:
  • word_line (np.array) – An nx1 array with the x-position of each dot in the rotated coordinate frame.
  • xy_line (np.array) – An nx2 array with the non-rotated (x, y) positions of each dot.
  • text_line (np.array) – An nx1 array with the text for each dot.
  • annotation_labels (np.array) – An nx1 array with a label indicating what annotation each word belongs to.
  • kwargs_cluster (dict) – A dictionary containing the eps_*, metric, and dot_freq keywords
  • kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords
Returns:

  • clusters_x (list) – A list with the x-position of each dot cluster found
  • clusters_y (list) – A list with the y-position of each dot cluster found
  • clusters_text (list) – A list-of-lists with the words transcribed at each dot cluster found. One list per cluster. Note: the empty strings that were added to each annotation are stripped before returning the words.

panoptes_aggregation.reducers.text_utils.consensus_score(clusters_text)

A function to take clustered text data and return the consensus score

Parameters:clusters_text (list) – A list-of-lists with length equal to the number of words in a line of text and each inner list contains the transcriptions for each word.
Returns:consensus_score – A value indicating the average number of users that agree on the line of text.
Return type:float
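One plausible reading of this score, sketched below, is: for each word position, count how many users gave the most common transcription, then average those counts over the line. This is an illustration under that assumption, not the library's code. The helper name consensus and the choice to return 0.0 for an empty line are hypothetical.

```python
from collections import Counter


def consensus(clusters_text):
    """Average, over word positions, of the count of the most common transcription."""
    top_counts = [
        Counter(words).most_common(1)[0][1]
        for words in clusters_text
        if words  # skip positions with no transcriptions
    ]
    if not top_counts:
        return 0.0  # assumption: an empty line scores zero
    return sum(top_counts) / len(top_counts)


# Hypothetical line of three words, each transcribed by three users.
line = [["the", "the", "the"], ["cat", "cat", "cot"], ["sat", "sat", "sat"]]
score = consensus(line)
print(score)
```

Here two of the three word positions have full agreement and one has two of three users agreeing, so the score falls just below the number of views.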
panoptes_aggregation.reducers.text_utils.gutter(lines_in, tol=0)

Cluster a list of input line segments by what side of the page gutter they are on.

Parameters:lines_in (list) – A list-of-lists containing one line segment per item. Each line segment should contain only the x-coordinate of each point on the line.
Returns:gutter_index – A numpy array containing the cluster label for each input line. This label indicates what side of the gutter(s) the input line segment is on.
Return type:array
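The idea can be sketched as interval merging: lines whose x-ranges overlap are chained into one group, and a horizontal gap between groups is treated as a gutter. The code below is an illustration only. The helper name gutter_labels, the left-to-right numbering of labels, and the handling of tol are assumptions, and it returns a plain list rather than a numpy array.

```python
def gutter_labels(lines_in, tol=0):
    """Label each line by the group of x-overlapping lines it belongs to."""
    spans = [(min(xs), max(xs)) for xs in lines_in]
    # Visit lines from left to right by their starting x-coordinate.
    order = sorted(range(len(spans)), key=lambda i: spans[i][0])
    labels = [0] * len(spans)
    current_label = -1
    current_end = None
    for i in order:
        start, end = spans[i]
        if current_end is None or start > current_end - tol:
            # No (sufficient) overlap with the current group: start a new one.
            current_label += 1
            current_end = end
        else:
            current_end = max(current_end, end)
        labels[i] = current_label
    return labels


# Hypothetical x-coordinates for four lines on a two-page spread.
lines = [[10, 40], [15, 45, 60], [200, 260], [210, 230]]
print(gutter_labels(lines))
```

Labels are returned in the same order as the input lines, so the result can be used to index the original annotations.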
panoptes_aggregation.reducers.text_utils.overlap(x, y, tol=0)

Check if two line segments overlap

Parameters:
  • x (list) – A list with the start and end point of the first line segment
  • y (list) – A list with the start and end point of the second line segment
  • tol (float) – The tolerance to consider lines overlapping. Default: 0. Positive values indicate small overlaps are not considered; negative values indicate small gaps are not considered.
Returns:

overlap – True if the two line segments overlap, False otherwise

Return type:

bool
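The tolerance behaviour described above can be sketched by measuring the shared length of the two segments and comparing it with tol. This is an illustration, not the library's code. The helper name segments_overlap is hypothetical, each segment is assumed to be ordered as [start, end], and treating touching end points as overlapping is an assumption.

```python
def segments_overlap(x, y, tol=0):
    """Return True if segments x and y share at least `tol` of their length."""
    # Positive when the segments overlap, negative when there is a gap.
    shared = min(x[1], y[1]) - max(x[0], y[0])
    return shared >= tol


print(segments_overlap([0, 5], [3, 8]))          # overlap of 2
print(segments_overlap([0, 5], [6, 8]))          # gap of 1
print(segments_overlap([0, 5], [6, 8], tol=-2))  # gap smaller than 2 is ignored
print(segments_overlap([0, 5], [3, 8], tol=3))   # overlap smaller than 3 is ignored
```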

panoptes_aggregation.reducers.text_utils.sort_labels(db_labels, data, reducer=<function mean>, descending=False)

A function that takes in the cluster labels for some data and returns a list of the unique labels, sorted by the original data.

Parameters:
  • db_labels (list) – A list of cluster labels, one label for each data point.
  • data (np.array) – The data the labels belong to
  • reducer (function (optional)) – The function used to combine the data for each label. Default: np.mean
  • descending (bool (optional)) – A flag indicating if the labels should be sorted in descending order. Default: False
Returns:

labels – A list of unique cluster labels sorted in either ascending or descending order.

Return type:

list
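A minimal sketch of this kind of sort, using statistics.mean in place of np.mean so that it runs with the standard library alone. The helper name sort_cluster_labels is hypothetical, and this sketch keeps every label it sees. Whether the library excludes any labels (such as a noise label) is not stated here.

```python
from statistics import mean


def sort_cluster_labels(db_labels, data, reducer=mean, descending=False):
    """Return the unique labels ordered by the reduced value of their data."""
    grouped = {}
    for label, value in zip(db_labels, data):
        grouped.setdefault(label, []).append(value)
    return sorted(
        grouped,
        key=lambda label: reducer(grouped[label]),
        reverse=descending,
    )


# Hypothetical labels and one-dimensional data.
db_labels = [0, 0, 1, 1, 2]
data = [5.0, 7.0, 1.0, 3.0, 10.0]
print(sort_cluster_labels(db_labels, data))
print(sort_cluster_labels(db_labels, data, descending=True))
```

Sorting labels this way lets clusters be reported in reading order, for example left to right by mean x-position.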

panoptes_aggregation.reducers.text_utils.tokenize(self, contents)

Tokenize only on space so angle bracket tags are not split
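As an illustration of the idea only (the standalone helper tokenize_on_space and the sample tag are hypothetical, and the library's method may differ in detail), splitting on spaces keeps a tagged word together as one token:

```python
def tokenize_on_space(contents):
    """Split text on spaces only, dropping empty tokens from repeated spaces."""
    return [token for token in contents.split(" ") if token]


tokens = tokenize_on_space("the <unclear>quick</unclear> fox")
print(tokens)
```

A tokenizer that also splits on punctuation would break the angle brackets away from the word they mark up.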


Shakespeare's World Variants Reducer

This module provides a function to reduce the variants data from extracts.

panoptes_aggregation.reducers.sw_variant_reducer.sw_variant_reducer(extracts)

Reduce all variants for a subject into one list

Parameters:extracts (list) – A list of extracts created by panoptes_aggregation.extractors.sw_variant_extractor.sw_variant_extractor()
Returns:reduction – A dictionary with at most one key, variants, holding the list of all variants in the subject
Return type:dict