Reducers
Question Reducer
This module provides functions to reduce the question task extracts from
panoptes_aggregation.extractors.question_extractor.
- panoptes_aggregation.reducers.question_reducer.question_reducer(data_list, pairs=False, track_user_ids=False, **kwargs)
- Reduce a list of extracted questions into a “counter” dict - Parameters:
- data_list (list) – A list of extractions created by - panoptes_aggregation.extractors.question_extractor.question_extractor()
- pairs (bool, optional) – Default False. How multiple choice questions are treated. When True the set of all choices is treated as a single answer 
- track_user_ids (bool, optional) – Default False. Set to True to also track the user_ids that gave each answer. 
 
- Returns:
- reduction – A dictionary (formatted as a Counter) giving the vote count for each key. If track_user_ids is True it will also contain a list of user_ids for each answer given.
- Return type:
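The counting described above can be sketched in a few lines. This is a minimal illustration rather than the library's implementation: the function name sketch_question_reducer, the extract shape (a dict mapping each selected choice to 1), and the "+" separator used when pairs is True are all assumptions made for the example.

```python
from collections import Counter


def sketch_question_reducer(data_list, pairs=False):
    """Hypothetical stand-in for question_reducer, for illustration only."""
    counts = Counter()
    for extract in data_list:
        if pairs:
            # Treat the full set of selected choices as one combined answer.
            counts["+".join(sorted(extract))] += 1
        else:
            # Count every selected choice independently.
            counts.update(extract)
    return dict(counts)


# Three volunteers; the second one selected two choices (assumed extract shape).
extracts = [{"a": 1}, {"a": 1, "b": 1}, {"b": 1}]
by_choice = sketch_question_reducer(extracts)
by_set = sketch_question_reducer(extracts, pairs=True)
```

With pairs left at False each choice is tallied separately; with pairs set to True the second volunteer contributes a single vote for the combined answer.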
 
Question Consensus Reducer
This module provides functions to reduce the question task extracts from
panoptes_aggregation.extractors.question_extractor.
- panoptes_aggregation.reducers.question_consensus_reducer.question_consensus_reducer(data_list, pairs=False, **kwargs)
- Reduce a list of extracted questions into a consensus description dict - Parameters:
- data_list (list) – A list of extractions created by - panoptes_aggregation.extractors.question_extractor.question_extractor()
- pairs (bool, optional) – Default False. How multiple choice questions are treated. When True the set of all choices is treated as a single answer 
 
- Returns:
- reduction – A dictionary with the following keys - most_likely : key with greatest number of classifications/votes
- num_votes : vote count for most likely key
- agreement : fraction of total votes held by most likely key. 
 
- Return type:
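As a rough sketch of how the three keys relate to a vote count, the example below derives them from a counter dict. The helper name sketch_consensus and the behaviour for an empty input are assumptions; only the key names come from the description above.

```python
from collections import Counter


def sketch_consensus(counts):
    """Derive consensus keys from a vote-count dict (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        # Assumed behaviour for an empty input: report zero votes only.
        return {"num_votes": 0}
    most_likely, num_votes = Counter(counts).most_common(1)[0]
    return {
        "most_likely": most_likely,
        "num_votes": num_votes,
        # Fraction of all votes held by the most likely key.
        "agreement": num_votes / total,
    }


consensus = sketch_consensus({"yes": 3, "no": 1})
```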
 
Slider Reducer
This module provides functions to reduce the slider task extracts from
panoptes_aggregation.extractors.slider_extractor.
- panoptes_aggregation.reducers.slider_reducer.process_data(data, pairs=False)
- Process a list of extracted slider values into a list - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.slider_extractor.slider_extractor()
- Returns:
- processed_data – A list of slider values, one for each extraction 
- Return type:
 
- panoptes_aggregation.reducers.slider_reducer.slider_reducer(votes_list)
- Reduce a list of slider values into a mean, median, and variance - Parameters:
- votes_list (list) – A list of slider values from - process_data()
- Returns:
- reduction – A dictionary giving the mean, median, and variance of the slider values 
- Return type:
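A minimal sketch of the statistics involved, using only the standard library. The function name, the output key names, and the choice of population variance are assumptions for illustration; the reducer's actual key names and normalisation may differ.

```python
import statistics


def sketch_slider_reducer(votes_list):
    """Summarise slider values (hypothetical helper and key names)."""
    return {
        "mean": statistics.mean(votes_list),
        "median": statistics.median(votes_list),
        # Population variance is assumed here (divide by N, not N - 1).
        "variance": statistics.pvariance(votes_list),
    }


summary = sketch_slider_reducer([2.0, 4.0, 4.0, 6.0])
```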
 
Point Reducer
This module provides functions to cluster points extracted with
panoptes_aggregation.extractors.point_extractor.
- panoptes_aggregation.reducers.point_reducer.point_reducer(data_by_tool, **kwargs)
- Cluster a list of points by tool using DBSCAN - This reducer is for use with panoptes_aggregation.extractors.point_extractor, which does not separate points by frame and does not support subtask reduction. Use panoptes_aggregation.extractors.point_extractor_by_frame and panoptes_aggregation.reducers.point_reducer_dbscan if there are multiple frames or subtasks. - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- kwargs – See DBSCAN
 
- Returns:
- reduction – A dictionary with the following keys - tool*_points_x : A list of x positions for all points drawn with tool*
- tool*_points_y : A list of y positions for all points drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all points drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_x : The x position for each cluster found 
- tool*_clusters_y : The y position for each cluster found 
- tool*_clusters_var_x : The x variance of points in each cluster found
- tool*_clusters_var_y : The y variance of points in each cluster found
- tool*_clusters_var_x_y : The x-y covariance of points in each cluster found
 
- Return type:
 
- panoptes_aggregation.reducers.point_reducer.process_data(data)
- Process a list of extractions into lists of x and y sorted by tool. - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.point_extractor.point_extractor()
- Returns:
- processed_data – A dictionary with each key being a tool with a list of (x, y) tuples as a value
- Return type:
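The grouping step can be pictured as follows. This is a sketch under stated assumptions: the extract shape (keys ending in _x and _y holding parallel lists) and the helper name sketch_process_points are hypothetical and chosen only to show how per-volunteer extracts collapse into one list of (x, y) tuples per tool.

```python
from collections import defaultdict


def sketch_process_points(data):
    """Group points from many extracts by tool (hypothetical helper)."""
    by_tool = defaultdict(list)
    for extract in data:
        for key, xs in extract.items():
            if not key.endswith("_x"):
                continue
            tool = key[:-2]  # strip the "_x" suffix to get the tool name
            ys = extract.get(tool + "_y", [])
            by_tool[tool].extend(zip(xs, ys))
    return dict(by_tool)


# Two volunteers; extract key names are assumed for the example.
extracts = [
    {"T0_tool0_x": [1.0, 5.0], "T0_tool0_y": [2.0, 6.0]},
    {"T0_tool0_x": [1.5], "T0_tool0_y": [2.5],
     "T0_tool1_x": [9.0], "T0_tool1_y": [9.5]},
]
points_by_tool = sketch_process_points(extracts)
```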
 
Point Reducer DBSCAN
This module provides functions to cluster points extracted with
panoptes_aggregation.extractors.point_extractor.
- panoptes_aggregation.reducers.point_reducer_dbscan.point_reducer_dbscan(data_by_tool, **kwargs)
- Cluster a list of points by tool using DBSCAN - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- kwargs – See DBSCAN 
 
- Returns:
- reduction – A dictionary with one key per subject frame. Each frame has the following keys - tool*_points_x : A list of x positions for all points drawn with tool* 
- tool*_points_y : A list of y positions for all points drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all points drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_x : The x position for each cluster found 
- tool*_clusters_y : The y position for each cluster found 
- tool*_clusters_var_x : The x variance of points in each cluster found 
- tool*_clusters_var_y : The y variance of points in each cluster found 
- tool*_clusters_var_x_y : The x-y covariance of points in each cluster found 
 
- Return type:
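To make the per-cluster keys concrete, the sketch below computes them from points that already carry cluster labels. It does not run DBSCAN itself. The helper name, the unprefixed key names, and the sample (n - 1) normalisation are assumptions; the label -1 for noise follows the scikit-learn DBSCAN convention.

```python
from collections import defaultdict


def sketch_cluster_summary(points, labels):
    """Summarise labelled (x, y) points per cluster (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:  # -1 is the scikit-learn label for noise points
            groups[label].append(point)

    keys = ("count", "x", "y", "var_x", "var_y", "var_x_y")
    summary = {"clusters_" + key: [] for key in keys}
    for label in sorted(groups):
        xs = [x for x, _ in groups[label]]
        ys = [y for _, y in groups[label]]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        dx = [x - mean_x for x in xs]
        dy = [y - mean_y for y in ys]
        # Sample normalisation (n - 1) is assumed; undefined for one point.
        norm = n - 1
        var_x = sum(d * d for d in dx) / norm if norm else None
        var_y = sum(d * d for d in dy) / norm if norm else None
        var_xy = sum(a * b for a, b in zip(dx, dy)) / norm if norm else None
        for key, value in zip(keys, (n, mean_x, mean_y, var_x, var_y, var_xy)):
            summary["clusters_" + key].append(value)
    return summary


points = [(0.0, 0.0), (2.0, 2.0), (1.0, 1.0), (50.0, 50.0)]
labels = [0, 0, 0, -1]  # the last point is treated as noise
summary = sketch_cluster_summary(points, labels)
```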
 
Point Reducer HDBSCAN
This module provides functions to cluster points extracted with
panoptes_aggregation.extractors.point_extractor.
- panoptes_aggregation.reducers.point_reducer_hdbscan.point_reducer_hdbscan(data_by_tool, **kwargs)
- Cluster a list of points by tool using HDBSCAN - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- kwargs – See HDBSCAN 
 
- Returns:
- reduction – A dictionary with one key per subject frame. Each frame has the following keys - tool*_points_x : A list of x positions for all points drawn with tool* 
- tool*_points_y : A list of y positions for all points drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all points drawn with tool* 
- tool*_cluster_probabilities: A list of cluster probabilities for all points drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_x : The weighted x position for each cluster found 
- tool*_clusters_y : The weighted y position for each cluster found 
- tool*_clusters_var_x : The weighted x variance of points in each cluster found 
- tool*_clusters_var_y : The weighted y variance of points in each cluster found 
- tool*_clusters_var_x_y : The weighted x-y covariance of points in each cluster found 
 
- Return type:
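The "weighted" quantities above can be read as statistics in which each point counts in proportion to its cluster probability. The sketch below shows one such weighting; the helper name and the exact normalisation (dividing by the sum of the probabilities) are assumptions and may not match the reducer's formula.

```python
def sketch_weighted_stats(values, probabilities):
    """Probability-weighted mean and variance (hypothetical helper)."""
    total = sum(probabilities)
    mean = sum(p * v for p, v in zip(probabilities, values)) / total
    # Assumed normalisation: divide the weighted squared deviations by
    # the sum of the weights.
    variance = sum(
        p * (v - mean) ** 2 for p, v in zip(probabilities, values)
    ) / total
    return mean, variance


# The third point has probability 0, so it does not move the result.
mean_x, var_x = sketch_weighted_stats([0.0, 2.0, 10.0], [1.0, 1.0, 0.0])
```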
 
Rectangle Reducer
This module provides functions to cluster rectangles extracted with
panoptes_aggregation.extractors.rectangle_extractor.
- panoptes_aggregation.reducers.rectangle_reducer.process_data(data)
- Process a list of extractions into lists of x and y sorted by frame and tool - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.rectangle_extractor.rectangle_extractor()
- Returns:
- processed_data – A dictionary with one key per frame. Each value is a dictionary with one key per tool, whose value is a list of (x, y, width, height) tuples
- Return type:
 
- panoptes_aggregation.reducers.rectangle_reducer.rectangle_reducer(data_by_tool, **kwargs)
- Cluster a list of rectangles by tool and frame - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- kwargs – See DBSCAN 
 
- Returns:
- reduction – A dictionary with the following keys for each frame - tool*_rec_x : A list of x positions for all rectangles drawn with tool* 
- tool*_rec_y : A list of y positions for all rectangles drawn with tool* 
- tool*_rec_width : A list of width values for all rectangles drawn with tool* 
- tool*_rec_height : A list of height values for all rectangles drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all rectangles drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_x : The x position for each cluster found 
- tool*_clusters_y : The y position for each cluster found 
- tool*_clusters_width : The width value for each cluster found 
- tool*_clusters_height : The height value for each cluster found 
 
- Return type:
 
Shape Reducer DBSCAN
This module provides functions to cluster shapes extracted with
panoptes_aggregation.extractors.shape_extractor.
- panoptes_aggregation.reducers.shape_reducer_dbscan.shape_reducer_dbscan(data_by_tool, **kwargs)
- Cluster a shape by tool using DBSCAN - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- metric_type (str) – Either “euclidean” to use a euclidean metric in the N-dimension shape parameter space or “IoU” for the intersection over union metric based on shape overlap. The IoU metric can only be used with the following shapes: - rectangle
- rotateRectangle 
- circle 
- ellipse 
 
- estimate_average (bool) – For the IoU metric, estimate the average using the most representative shape from the cluster. This is significantly faster to compute than the true average. True by default.
- kwargs – See DBSCAN 
 
- Returns:
- reduction – A dictionary with the following keys for each frame - tool*_<shape>_<param> : A list of all param for the shape drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all shapes drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_<param> : The param value for each cluster found 
 - If the “IoU” metric type is used there is also - tool*_clusters_sigma : The standard deviation of the average shape under the IoU metric 
 
- Return type:
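For intuition about the “IoU” metric type, the sketch below computes an intersection over union distance for two axis-aligned rectangles given as (x, y, width, height). It is an illustration only: the function name and the handling of zero-area input are assumptions, and the reducer's own metric also covers rotated rectangles, circles and ellipses.

```python
def sketch_rect_iou_distance(a, b):
    """Return 1 - IoU for two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    if union == 0:
        # Assumed convention for degenerate (zero-area) input.
        return 1.0
    return 1.0 - intersection / union


same = sketch_rect_iou_distance((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = sketch_rect_iou_distance((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0))
apart = sketch_rect_iou_distance((0.0, 0.0, 1.0, 1.0), (5.0, 5.0, 1.0, 1.0))
```

A distance of 0 means the shapes coincide and a distance of 1 means they do not overlap at all, which is what lets a density-based clusterer treat overlap as closeness.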
 
Shape Reducer OPTICS
This module provides functions to cluster shapes extracted with
panoptes_aggregation.extractors.shape_extractor.
- panoptes_aggregation.reducers.shape_reducer_optics.shape_reducer_optics(data_by_tool, **kwargs)
- Cluster a shape by tool using OPTICS - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- metric_type (str) – Either “euclidean” to use a euclidean metric in the N-dimension shape parameter space or “IoU” for the intersection over union metric based on shape overlap. The IoU metric can only be used with the following shapes: - rectangle
- rotateRectangle 
- circle 
- ellipse 
 
- estimate_average (bool) – For the IoU metric, estimate the average using the most representative shape from the cluster. This is significantly faster to compute than the true average. True by default.
- kwargs – See OPTICS 
 
- Returns:
- reduction – A dictionary with the following keys for each frame - tool*_<shape>_<param> : A list of all param for the shape drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all shapes drawn with tool* 
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_<param> : The param value for each cluster found 
 - If the “IoU” metric type is used there is also - tool*_clusters_sigma : The standard deviation of the average shape under the IoU metric 
 
- Return type:
 
Shape Reducer HDBSCAN
This module provides functions to cluster shapes extracted with
panoptes_aggregation.extractors.shape_extractor.
- panoptes_aggregation.reducers.shape_reducer_hdbscan.shape_reducer_hdbscan(data_by_tool, **kwargs)
- Cluster a shape by tool using HDBSCAN - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- metric_type (str) – Either “euclidean” to use a euclidean metric in the N-dimension shape parameter space or “IoU” for the intersection over union metric based on shape overlap. The IoU metric can only be used with the following shapes: - rectangle
- rotateRectangle 
- circle 
- ellipse 
 
- estimate_average (bool) – For the IoU metric, estimate the average using the most representative shape from the cluster. This is significantly faster to compute than the true average. True by default.
- kwargs – See HDBSCAN 
 
- Returns:
- reduction – A dictionary with the following keys for each frame - tool*_<shape>_<param> : A list of all param for the shape drawn with tool* 
- tool*_cluster_labels : A list of cluster labels for all shapes drawn with tool* 
- tool*_cluster_probabilities: A list of cluster probabilities for all shapes drawn with tool*
- tool*_clusters_count : The number of points in each cluster found 
- tool*_clusters_<param> : The param value for each cluster found 
 - If the “IoU” metric type is used there is also - tool*_clusters_sigma : The standard deviation of the average shape under the IoU metric 
 
- Return type:
 
Polygon/Freehand Tool Reducer Using DBSCAN
This module provides functions to reduce the polygon extractions from both
panoptes_aggregation.extractors.polygon_extractor and
panoptes_aggregation.extractors.bezier_extractor using the
algorithm DBSCAN.
All polygons are assumed to be closed. Any unclosed polygons will be closed.
- panoptes_aggregation.reducers.polygon_reducer.polygon_reducer(data_by_tool, **kwargs_dbscan)
- Cluster polygon/freehand/Bezier tool annotations using DBSCAN. - There is a choice in how each cluster is averaged into a single polygon, with the various choices listed below. - A custom “IoU” metric type is used to measure the distance between the polygons. - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- kwargs – 
- average_type : Must be either “union”, which returns the union of the cluster, “intersection” which returns the intersection of the cluster, “last”, which returns the last polygon to be created in the cluster, or “median”, which returns the polygon with the minimum total distance to the other polygons. Defaults to “median”. 
- created_at : A list of when the classifications were made. 
- collab : A boolean indicating whether the annotations column is included in the output. Defaults to False. 
 
 
- Returns:
- reduction – A dictionary with the following keys for each frame, task and tool: - tool*_cluster_labels : A list of cluster labels for polygons provided for this frame and tool 
- tool*_clusters_count : The number of points in each cluster found for this frame and tool 
- tool*_clusters_x : A list of the x values of each cluster 
- tool*_clusters_y : A list of the y values of each cluster 
- tool*_consensus : A list of the overall consensus of each cluster. A value of 1 is perfect agreement, a value of 0 is complete disagreement. This is found by subtracting `IoU_cluster_mean_distance` from 1
- annotations : Contains the consensus polygons in the original classification format, which is included in the output if collab is set to True. For use with the Zooniverse front-end. 
 
- Return type:
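The “median” average type and the consensus value can both be illustrated from a matrix of pairwise IoU distances. The sketch below is not the reducer's code: the helper name is hypothetical, the distance matrix is supplied by hand rather than computed from polygons, and taking the mean over distinct pairs is an assumed reading of `IoU_cluster_mean_distance`.

```python
def sketch_median_and_consensus(distances):
    """Pick the most central member and a consensus value (hypothetical)."""
    n = len(distances)
    # "median": the member with the minimum total distance to the others.
    totals = [sum(row) for row in distances]
    median_index = min(range(n), key=totals.__getitem__)
    # Assumed definition: mean distance over all distinct pairs.
    pairs = [distances[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_distance = sum(pairs) / len(pairs) if pairs else 0.0
    return median_index, 1.0 - mean_distance


# Symmetric pairwise distances for three polygons in one cluster.
distances = [
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.3],
    [0.4, 0.3, 0.0],
]
median_index, consensus = sketch_median_and_consensus(distances)
```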
 
- panoptes_aggregation.reducers.polygon_reducer.process_data(data)
- Process a list of extractions into a dictionary organized by frame, task and tool. - This also closes and simplifies the polygons. - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.polygon_extractor() or - panoptes_aggregation.extractors.bezier_extractor()
- Returns:
- data_by_tool – A dictionary with one key for each frame of the subject and each tool used for the classification. The value for each key is a dictionary with two keys X and data. X is a 2D array with each row mapping to the data held in data. The first column contains row indices and the second column is an index assigned to each user. data is a list of dictionaries, which contains the polygon data to be reduced. It is of the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. 
- Return type:
 
Polygon/Freehand Tool Reducer Using DBSCAN - Contours
This module is an extension of panoptes_aggregation.reducers.polygon_reducer
to provide the contours of intersection/overlap. These can be used to estimate the
cluster average and its uncertainty.
All polygons are assumed to be closed. Any unclosed polygons will be closed.
Note, this reduction is one cluster per row.
- panoptes_aggregation.reducers.polygon_reducer_contours.polygon_reducer_contours(data_by_tool, **kwargs_dbscan)
- Cluster polygon/freehand/Bezier tool annotations using DBSCAN, then find the contours of each cluster. - The contours are defined by the overlap/intersection of the polygons in the cluster. Each contour is the union of the regions where at least as many polygons overlap as the contour's position in the list. E.g. the second contour is the largest polygon/area on which at least two volunteers agree, the third is at least three volunteers etc. - A custom “IoU” metric type is used. - This reduction will take much longer than - panoptes_aggregation.reducers.polygon_reducer. As it returns a list rather than a dictionary, this may cause issues with any subsequent data processing with Caesar. - The default method for finding the contours is slow but accurate. However, the algorithm time per cluster increases approximately exponentially with the number of polygons in the cluster. Therefore, for clusters of many polygons, a more efficient but less accurate rasterisation-based approach is used. This approach can also be forced by setting the kwarg rasterisation to True. - Parameters:
- data_by_tool (dict) – A dictionary returned by - panoptes_aggregation.reducers.process_data
- average_type (str) – Either “union”, which returns the union of the cluster, “intersection” which returns the intersection of the cluster, “last”, which returns the last polygon to be annotated in the cluster or “median”, which returns the polygon with minimal IoU distance to the other polygons of the cluster.
- kwargs – - rasterisation/rasterization: String/boolean. If True the contours are found using rasterisation, if False intersections are used. Defaults to ‘auto’, which uses rasterisation if there are more than 9 polygons in the cluster.
- num_grid_points: An integer which defines the number of grid points per axis when rasterisation is True. A higher number results in more accuracy but also increases computational time. Defaults to 100. 
- smoothing: A string to choose the type of smoothing used for rasterisation (if used). If ‘minimal_sides’, the number of sides of the contour is minimised. If ‘rounded’, corners are rounded. If ‘no_smoothing’, no smoothing is done. Defaults to ‘minimal_sides’. 
 
 
- Returns:
- reduction – A list of dictionaries. Each dictionary has the following keys for each frame, task and tool: - tool*_cluster_labels : A list of cluster labels for polygons provided. This is for all of the clusters for this frame and tool
- tool*_cluster_label_for_contours : The index of the cluster whose contours are listed, corresponding to the labels in tool*_cluster_labels 
- tool*_number_of_contours : The number of contours of the cluster 
- tool*_contours_x : A list of the x values of each contour 
- tool*_contours_y : A list of the y values of each contour 
- tool*_consensus : A list of the overall consensus of each cluster. A value of 1 is perfect agreement, a value of 0 is complete disagreement. This is found by subtracting `IoU_cluster_mean_distance` from 1
 
- Return type:
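The idea behind the contour levels and the rasterisation option can be sketched with a simple grid count. The example below uses axis-aligned rectangles as stand-ins for polygons and reports the estimated area where at least k shapes overlap, rather than the contour outlines themselves. The helper name and the cell-centre sampling scheme are assumptions for illustration, not the reducer's algorithm.

```python
def sketch_agreement_areas(rects, num_grid_points=30):
    """Estimate the area covered by at least k rectangles, for each k."""
    x_min = min(x for x, _, _, _ in rects)
    y_min = min(y for _, y, _, _ in rects)
    x_max = max(x + w for x, _, w, _ in rects)
    y_max = max(y + h for _, y, _, h in rects)
    dx = (x_max - x_min) / num_grid_points
    dy = (y_max - y_min) / num_grid_points

    cells_at_level = [0] * len(rects)
    for i in range(num_grid_points):
        cx = x_min + (i + 0.5) * dx  # cell centre, x
        for j in range(num_grid_points):
            cy = y_min + (j + 0.5) * dy  # cell centre, y
            covering = sum(
                1 for x, y, w, h in rects
                if x <= cx <= x + w and y <= cy <= y + h
            )
            # A cell covered by c shapes belongs to levels 1..c.
            for level in range(covering):
                cells_at_level[level] += 1
    return [count * dx * dy for count in cells_at_level]


# Two 2x2 squares offset by 1 in x: union area 6, overlap area 2.
areas = sketch_agreement_areas([(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0)])
```

A larger num_grid_points gives a finer estimate at the cost of more cell checks, which mirrors the accuracy and runtime trade-off described for the num_grid_points keyword.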
 
Survey Reducer
This module provides functions to reduce survey task extracts from
panoptes_aggregation.extractors.survey_extractor.
- panoptes_aggregation.reducers.survey_reducer.process_data(data)
- Process a list of extracted survey data into a dictionary of sub-question answers organized by choice - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.survey_extractor.survey_extractor()
- Returns:
- processed_data – A dictionary where the keys are the choice made and the values are a list of dicts containing Counters for each sub-question asked. 
- Return type:
 
- panoptes_aggregation.reducers.survey_reducer.survey_reducer(data_in)
- Reduce the survey task answers as a list of dicts (one for each choice marked) - Parameters:
- data_in (dict) – A dictionary created by - process_data()
- Returns:
- reduction – A list that has one element for each choice marked. Each element is a dict of the form - choice : The choice made
- total_vote_count : The number of users that classified the subject 
- choice_count : The number of users that made this choice 
- answers_* : Counters for each answer to sub-question * 
 
- Return type:
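The shape of the reduction can be sketched as below. The helper name, the explicit total_vote_count argument, and the example input are assumptions; the real survey_reducer takes only the dictionary produced by process_data(), so how it obtains the total is not shown here.

```python
from collections import Counter


def sketch_survey_reducer(data_in, total_vote_count):
    """Build one dict per marked choice (hypothetical helper)."""
    reduction = []
    for choice, answer_dicts in data_in.items():
        entry = {
            "choice": choice,
            "total_vote_count": total_vote_count,
            # One entry in the list per volunteer who marked this choice.
            "choice_count": len(answer_dicts),
        }
        merged = {}
        for answers in answer_dicts:
            for question, counter in answers.items():
                merged.setdefault(question, Counter()).update(counter)
        for question, counter in merged.items():
            entry["answers_" + question] = dict(counter)
        reduction.append(entry)
    return reduction


# Assumed input shape: choice -> list of {sub-question: Counter}.
data_in = {
    "raccoon": [
        {"howmany": Counter({"1": 1})},
        {"howmany": Counter({"2": 1})},
        {"howmany": Counter({"1": 1})},
    ],
    "deer": [{"howmany": Counter({"1": 1})}],
}
reduction = sketch_survey_reducer(data_in, total_vote_count=4)
```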
 
Polygon As Line Tool for Text Reducer
This module provides functions to reduce the polygon-text extractions from
panoptes_aggregation.extractors.poly_line_text_extractor.
- panoptes_aggregation.reducers.poly_line_text_reducer.poly_line_text_reducer(data_by_frame, **kwargs_dbscan)
- Reduce the polygon-text answers as a list of lines of text. - Parameters:
- data_by_frame (dict) – A dictionary returned by - process_data()
- kwargs – 
- eps_slope : How close the angle of two lines need to be in order to be placed in the same angle cluster. 
- eps_line : How close vertically two lines need to be in order to be identified as the same line. 
- eps_word : How close horizontally the end points of a line need to be in order to be identified as a single point. 
- gutter_tol : How much neighboring columns can overlap horizontally and still be identified as multiple columns. 
- dot_freq : “line” if dots are drawn at the start and end point of a line, “word” if dots are drawn between each word. Note: “word” was proposed for a project but was never used and is unlikely ever to be. It will likely be deprecated in a future release.
- min_samples : For all clustering stages this is how many points need to be close together for a cluster to be identified. Set this to 1 for all annotations to be kept 
- min_word_count : The minimum number of times a word must be identified for it to be kept in the consensus text. 
- low_consensus_threshold : The minimum consensus score allowed to be considered “done” 
- minimum_views : A value that is passed along to the front-end to set when lines should turn grey (has no effect on aggregation)
 
 
- Returns:
- reduction – A dictionary with one key for each frame of the subject, each with a list as its value. Each item of the list represents one transcribed line of text and is a dictionary with these keys: - clusters_x : the x position of each identified word
- clusters_y : the y position of each identified word 
- clusters_text : A list of text at each cluster position 
- gutter_label : A label indicating what “gutter” cluster the line is from 
- line_slope: The slope of the line of text in degrees 
- slope_label : A label indicating what slope cluster the line is from 
- number_views : The number of users that transcribed the line of text 
- consensus_score : The average number of users whose text agreed for the line. Note, if consensus_score is the same as number_views every user agreed with each other
- low_consensus : True if the consensus_score is less than the threshold set by the low_consensus_threshold keyword 
 - For the entire subject the following is also returned:
- low_consensus_lines : The number of lines with low consensus
- transcribed_lines : The total number of lines transcribed on the subject
 - Note: the image coordinate system has y increasing downward.
- Return type:
 
- panoptes_aggregation.reducers.poly_line_text_reducer.process_data(data_list, process_by_line=False)
- Process a list of extractions into a dictionary of loc and text organized by frame - Parameters:
- data_list (list) – A list of extractions created by - panoptes_aggregation.extractors.poly_line_text_extractor.poly_line_text_extractor()
- Returns:
- processed_data – A dictionary with keys for each frame of the subject and values being dictionaries with x, y, text, and slope keys. x, y, and text are list-of-lists, each inner list is from a single annotation, slope is the list of slopes (in deg) for each of these inner lists.
- Return type:
 
Text aggregation utilities
This module provides utility functions used in the polygon-as-line-text-reducer code from
panoptes_aggregation.reducers.poly_line_text_reducer.
- panoptes_aggregation.reducers.text_utils.align_words(word_line, xy_line, text_line, kwargs_cluster, kwargs_dbscan)
- A function to take the annotations for one line of text, align the words, and find the end-points for the line. - Parameters:
- word_line (np.array) – An nx1 array with the x-position of each dot in the rotated coordinate frame. 
- xy_line (np.array) – An nx2 array with the non-rotated (x, y) positions of each dot. 
- text_line (np.array) – An nx1 array with the text for each dot. 
- gs_line (np.array) – An array of bools indicating if the annotation was made in gold standard mode 
- kwargs_cluster (dict) – A dictionary containing the eps_* and dot_freq keywords 
- kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords 
 
- Returns:
- clusters_x (list) – A list with the start and end x-position of the line 
- clusters_y (list) – A list with the start and end y-position of the line 
- clusters_text (list) – A list-of-lists with the words transcribed at each dot cluster found. One list per cluster. Note: the empty strings that were added to each annotation are stripped before returning the words.
 
 
- panoptes_aggregation.reducers.text_utils.angle_metric(t1, t2)
- A metric for the distance between angles in the [-180, 180] range - Parameters:
- t1 (float) – Theta one in degrees 
- t2 (float) – Theta two in degrees 
 
- Returns:
- distance – The distance between the two input angles in degrees 
- Return type:
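A wrap-around angular distance of this kind can be sketched as follows. The function name is hypothetical and the formula is one standard way to measure the separation of two angles in degrees; it is offered as an illustration, not as the library's exact code.

```python
def sketch_angle_metric(t1, t2):
    """Smallest separation between two angles in degrees (hypothetical)."""
    diff = abs(t1 - t2) % 360.0
    # Going the other way round the circle may be shorter.
    return min(diff, 360.0 - diff)


near_wrap = sketch_angle_metric(170.0, -170.0)
plain = sketch_angle_metric(10.0, 30.0)
opposite = sketch_angle_metric(0.0, 180.0)
```

The first call shows why a plain subtraction is not enough: 170 and -170 degrees differ by 340 numerically but are only 20 degrees apart on the circle.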
 
- panoptes_aggregation.reducers.text_utils.avg_angle(theta)
- A function that finds the average of an array of angles that are in the range [-180, 180]. - Parameters:
- theta (array) – An array of angles that are in the range [-180, 180] degrees 
- Returns:
- average – The average angle 
- Return type:
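One common way to average angles that may straddle the ±180 boundary is the circular mean, sketched below. Whether avg_angle uses exactly this formula is an assumption; the function name is hypothetical.

```python
import math


def sketch_avg_angle(theta):
    """Circular mean of angles given in degrees (hypothetical helper)."""
    # Average the unit vectors, then convert back to an angle.
    sin_sum = sum(math.sin(math.radians(t)) for t in theta)
    cos_sum = sum(math.cos(math.radians(t)) for t in theta)
    return math.degrees(math.atan2(sin_sum, cos_sum))


simple = sketch_avg_angle([10.0, 30.0])
wrapped = sketch_avg_angle([170.0, -170.0])
```

An arithmetic mean of 170 and -170 would give 0, pointing the opposite way; the circular mean stays at the ±180 boundary where both angles actually lie.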
 
- panoptes_aggregation.reducers.text_utils.cluster_by_gutter(x_slope, y_slope, text_slope, gs_slope, data_index_slope, ext_index_slope, kwargs_cluster, kwargs_dbscan)
- A function to take the annotations for each frame of a subject and group them based on what side of the page gutter they are on. - Parameters:
- x_slope (np.array) – A list-of-lists of the x values for each drawn dot. There is one item in the list for each annotation made by the user.
- y_slope (np.array) – A list-of-lists of the y values for each drawn dot. There is one item in the list for each annotation made by the user.
- text_slope (np.array) – A list-of-lists of the text for each drawn dot. There is one item in the list for each annotation made by the user.
- gs_slope (np.array) – A list of bools indicating if the annotation was made in gold standard mode
- data_index_slope (np.array) – A list of indices indicating what classification each annotation came from
- ext_index_slope (np.array) – A list of extractor indices used to map the reduction to the extract 
- kwargs_cluster (dict) – A dictionary containing the eps_* and dot_freq keywords 
- kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords 
 
- Returns:
- frame_gutter – A list of the resulting extractions, one item per line of text found. 
- Return type:
 
- panoptes_aggregation.reducers.text_utils.cluster_by_line(xy_rotate, xy_gutter, text_gutter, annotation_labels, gs_gutter, data_index_gutter, ext_index_gutter, kwargs_cluster, kwargs_dbscan)
- A function to take the annotations for one slope_label and cluster them based on perpendicular distance (e.g. lines of text). - Parameters:
- xy_rotate (np.array) – An array of shape nx2 containing the (x, y) positions of each dot drawn in the rotated coordinate frame.
- xy_gutter (np.array) – An array of shape nx2 containing the (x, y) positions for each dot drawn. 
- text_gutter (np.array) – An array of shape nx1 containing the text for each dot drawn. Note: each annotation has an empty string added to the end so this array has the same shape as xy_slope. 
- annotation_labels (np.array) – An array of shape nx1 containing a unique label indicating what annotation each position/text came from. This information is used to ensure one annotation does not span multiple lines. 
- gs_gutter (np.array) – An array of bools indicating if the annotation was made in gold standard mode 
- data_index_gutter (np.array) – An array of indices indicating what classification each annotation came from
- ext_index_gutter (np.array) – A list of extractor indices used to map the reduction to the extract 
- kwargs_cluster (dict) – A dictionary containing the eps_*, and dot_freq keywords 
- kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords 
 
- Returns:
- frame_lines – A list of reductions, one for each line. Each reduction is a dictionary containing the information for the line. 
- Return type:
 
- panoptes_aggregation.reducers.text_utils.cluster_by_slope(x_frame, y_frame, text_frame, slope_frame, gs_frame, data_index_frame, ext_index_frame, kwargs_cluster, kwargs_dbscan)
- A function to take the annotations for one gutter_label and cluster them based on what slope the transcription is. - Parameters:
- x_frame (np.array) – A list-of-lists of the x values for each drawn dot. There is one item in the list for each annotation made by the user.
- y_frame (np.array) – A list-of-lists of the y values for each drawn dot. There is one item in the list for each annotation made by the user.
- text_frame (np.array) – A list-of-lists of the text for each drawn dot. There is one item in the list for each annotation made by the user. The inner text lists are padded with an empty string at the end so there is the same number of words as there are dots.
- slope_frame (np.array) – A list of the slopes (in deg) for each annotation
- gs_frame (np.array) – A list of bools indicating if the annotation was made in gold standard mode
- data_index_frame (np.array) – A list of indices indicating what classification each annotation came from
- ext_index_frame (np.array) – A list of extractor indices used to map the reduction to the extract 
- kwargs_cluster (dict) – A dictionary containing the eps_* and dot_freq keywords 
- kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords 
 
- Returns:
- frame_slope – A list of the resulting extractions, one item per line of text found. 
- Return type:
 
- panoptes_aggregation.reducers.text_utils.cluster_by_word(word_line, xy_line, text_line, annotation_labels, kwargs_cluster, kwargs_dbscan)
- A function to take the annotations for one line of text and cluster them based on the words in the line. - Parameters:
- word_line (np.array) – An nx1 array with the x-position of each dot in the rotated coordinate frame. 
- xy_line (np.array) – An nx2 array with the non-rotated (x, y) positions of each dot. 
- text_line (np.array) – An nx1 array with the text for each dot. 
- annotation_labels (np.array) – An nx1 array with a label indicating what annotation each word belongs to.
- kwargs_cluster (dict) – A dictionary containing the eps_* and dot_freq keywords 
- kwargs_dbscan (dict) – A dictionary containing all the other DBSCAN keywords 
 
- Returns:
- clusters_x (list) – A list with the x-position of each dot cluster found 
- clusters_y (list) – A list with the y-position of each dot cluster found 
- clusters_text (list) – A list-of-lists with the words transcribed at each dot cluster found. One list per cluster. Note: the empty strings that were added to each annotation are stripped before returning the words.
 
 
- panoptes_aggregation.reducers.text_utils.consensus_score(clusters_text)
- A function to take clustered text data and return the consensus score - Parameters:
- clusters_text (list) – A list-of-lists with length equal to the number of words in a line of text and each inner list contains the transcriptions for each word. 
- Returns:
- consensus_score (float) – A value indicating the average number of users that agree on the line of text. 
- consensus_text (str) – A string with the consensus sentence 
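
The scoring described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name consensus_sketch is hypothetical, and it assumes that empty strings are ignored and that the score is the mean, over words, of the vote count of the most common transcription.

```python
from collections import Counter


def consensus_sketch(clusters_text):
    """Return (score, text) for one line of clustered transcriptions."""
    top_counts = []
    top_words = []
    for word_votes in clusters_text:
        # Assumption: empty strings mean "this user did not transcribe the word".
        votes = [w for w in word_votes if w]
        if not votes:
            continue
        word, count = Counter(votes).most_common(1)[0]
        top_words.append(word)
        top_counts.append(count)
    if not top_counts:
        return 0.0, ""
    # Average number of users agreeing with the most common word.
    score = sum(top_counts) / len(top_counts)
    return score, " ".join(top_words)


clusters_text = [["the", "the", "teh"], ["cat", "cat", "cat"], ["sat", "", "sat"]]
score, text = consensus_sketch(clusters_text)
print(round(score, 2), text)  # 2.33 the cat sat
```

A score equal to the number of users would mean that every user agreed on every word.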
 
 
- panoptes_aggregation.reducers.text_utils.gutter(lines_in, tol=0)
- Cluster list of input line segments by what side of the page gutter they are on. - Parameters:
- lines_in (list) – A list-of-lists containing one line segment per item. Each line segment should contain only the x-coordinate of each point on the line. 
- Returns:
- gutter_index – A numpy array containing the cluster label for each input line. This label indicates what side of the gutter(s) the input line segment is on. 
- Return type:
- array 
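
One way to picture the gutter grouping is to merge line segments whose horizontal extents overlap. The sketch below is an assumption about the approach, not the library's code: gutter_sketch is a hypothetical name, tol is treated as a minimum overlap, and a plain list is returned instead of a numpy array.

```python
def gutter_sketch(lines_in, tol=0):
    """Label each line segment by the group of x-ranges it overlaps with."""
    # Reduce every segment to its horizontal extent (min x, max x).
    extents = [(min(xs), max(xs)) for xs in lines_in]
    order = sorted(range(len(extents)), key=lambda i: extents[i][0])
    labels = [0] * len(extents)
    current_label = -1
    current_end = None
    for i in order:
        start, end = extents[i]
        # Start a new group when the overlap with the current group is at most tol.
        if current_end is None or current_end - start <= tol:
            current_label += 1
            current_end = end
        else:
            current_end = max(current_end, end)
        labels[i] = current_label
    return labels


# Three lines on the left page and two on the right page.
lines_in = [[10, 200], [20, 210], [400, 600], [15, 190], [410, 590]]
print(gutter_sketch(lines_in))  # [0, 0, 1, 0, 1]
```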
 
- panoptes_aggregation.reducers.text_utils.overlap(x, y, tol=0)
- Check if two line segments overlap - Parameters:
- x (list) – A list with the start and end point of the first line segment 
- y (list) – A list with the start and end point of the second line segment 
- tol (float) – The tolerance to consider lines overlapping. Default 0. Positive values indicate that small overlaps are not considered; negative values indicate that small gaps are not considered. 
 
- Returns:
- overlap – True if the two line segments overlap, False otherwise 
- Return type:
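
The effect of tol can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: segments_overlap is a hypothetical name, each segment is assumed to be given as [start, end] with start <= end, and segments that merely touch are treated as not overlapping when tol is 0.

```python
def segments_overlap(x, y, tol=0):
    # Length of the shared interval; negative when there is a gap.
    shared = min(x[1], y[1]) - max(x[0], y[0])
    return shared > tol


print(segments_overlap([0, 10], [8, 20]))           # True: 2 units shared
print(segments_overlap([0, 10], [8, 20], tol=5))    # False: overlap smaller than tol
print(segments_overlap([0, 10], [12, 20]))          # False: gap of 2 units
print(segments_overlap([0, 10], [12, 20], tol=-3))  # True: gap smaller than |tol|
```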
 
- panoptes_aggregation.reducers.text_utils.sort_labels(db_labels, data, reducer=<function mean>, descending=False)
- A function that takes in the cluster labels for some data and returns a list of the unique labels, sorted by the original data. - Parameters:
- db_labels (list) – A list of cluster labels, one label for each data point. 
- data (np.array) – The data the labels belong to 
- reducer (function (optional)) – The function used to combine the data for each label. Default: np.mean 
- descending (bool (optional)) – A flag indicating if the labels should be sorted in descending order. Default: False 
 
- Returns:
- labels – A list of unique cluster labels sorted in either ascending or descending order. 
- Return type:
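
A minimal pure-Python sketch of the sorting idea follows. The name sort_labels_sketch is hypothetical, statistics.mean stands in for np.mean, and the sketch keeps every label it sees (how the library treats a noise label such as -1 is not stated here).

```python
from statistics import mean


def sort_labels_sketch(db_labels, data, reducer=mean, descending=False):
    # Group the data values by their cluster label.
    grouped = {}
    for label, value in zip(db_labels, data):
        grouped.setdefault(label, []).append(value)
    # Order the unique labels by the reduced value of their data.
    return sorted(grouped, key=lambda label: reducer(grouped[label]), reverse=descending)


db_labels = [0, 0, 1, 1, 2]
data = [50.0, 52.0, 10.0, 12.0, 30.0]
print(sort_labels_sketch(db_labels, data))                   # [1, 2, 0]
print(sort_labels_sketch(db_labels, data, descending=True))  # [0, 2, 1]
```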
 
- panoptes_aggregation.reducers.text_utils.tokenize(self, contents)
- Tokenize only on space so angle bracket tags are not split 
Shakespeare's World Variants Reducer
This module provides a function to reduce the variants data from extracts.
- panoptes_aggregation.reducers.sw_variant_reducer.sw_variant_reducer(extracts)
- Reduce all variants for a subject into one list - Parameters:
- extracts (list) – A list of extracts created by - panoptes_aggregation.extractors.sw_variant_extractor.sw_variant_extractor()
- Returns:
- reduction – A dictionary with at most one key, variants, holding the list of all variants in the subject 
- Return type:
 
Dropdown Reducer
This module provides functions to reduce the dropdown task extracts from
panoptes_aggregation.extractors.dropdown_extractor.
- panoptes_aggregation.reducers.dropdown_reducer.dropdown_reducer(votes_list)
- Reduce a list-of-lists of Counter objects into one list of dicts - Parameters:
- votes_list (list) – A list-of-lists of Counter objects from - process_data()
- Returns:
- reduction – A dictionary with one key, value, that contains a list of dictionaries (one for each dropdown in the task) giving the vote count for each key 
- Return type:
 
- panoptes_aggregation.reducers.dropdown_reducer.process_data(data)
- Process a list of extracted dropdown answers into Counter objects - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.dropdown_extractor.dropdown_extractor()
- Returns:
- process_data – A list-of-lists of Counter objects. There is one element of the outer list for each classification made, and one element of the inner list for each dropdown list in the task. 
- Return type:
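
The two steps above (one Counter per dropdown per classification, then a sum per dropdown) can be sketched as follows. The function name dropdown_reduce_sketch and the example answers are hypothetical, and every classification is assumed to contain the same number of dropdowns.

```python
from collections import Counter


def dropdown_reduce_sketch(votes_list):
    # zip(*votes_list) regroups the Counters so each tuple holds one dropdown's votes.
    totals = [sum(per_dropdown, Counter()) for per_dropdown in zip(*votes_list)]
    return {"value": [dict(c) for c in totals]}


# Three classifications of a task with two dropdowns.
votes_list = [
    [Counter({"oak": 1}), Counter({"spring": 1})],
    [Counter({"oak": 1}), Counter({"summer": 1})],
    [Counter({"elm": 1}), Counter({"spring": 1})],
]
reduction = dropdown_reduce_sketch(votes_list)
print(reduction)
# {'value': [{'oak': 2, 'elm': 1}, {'spring': 2, 'summer': 1}]}
```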
 
TESS Column Reducer
This module provides functions to reduce the column task extracts for the TESS project.
Extracts are from panoptes_aggregation.extractors.shape_extractor.
- panoptes_aggregation.reducers.tess_reducer_column.process_data(data, **kwargs_extra_data)
- Process a list of extractions into lists of x and y sorted by tool - Parameters:
- data (list) – A list of extractions created by - panoptes_aggregation.extractors.shape_extractor.shape_extractor()
- Returns:
- processed_data – A dictionary with two keys - data: An Nx2 numpy array containing the center and width of each column drawn 
- index: A list of length N indicating the extract index for each drawn column 
 
- Return type:
 
- panoptes_aggregation.reducers.tess_reducer_column.tess_reducer_column(data_by_tool, **kwargs)
- Cluster TESS columns using DBSCAN - Parameters:
- data_by_tool (dict) – A dictionary returned by - process_data()
- user_id (keyword, list) – A list containing the user IDs for each extract 
- relevant_reduction (keyword, list) – A list containing the TESS user reduction for each extract - panoptes_aggregation.running_reducers.tess_user_reducer.tess_user_reducer()
- x (keyword, str) – Either “center” or “left” and indicates if the x value of the classification is the center or left side of the column 
- kwargs – See DBSCAN 
 
- Returns:
- reduction – A dictionary with the following keys - centers : A list with the center x position for all identified columns 
- widths : A list with the full width of all identified columns 
- counts : A list with the number of volunteers who identified each column 
- weighted_counts : A list with the weighted number of volunteers who identified each column 
- user_ids: A list of lists with the user_id for each volunteer who marked each column 
- max_weighted_counts: The largest likelihood of a transit for this subject 
 
- Return type:
 
TESS Gold Standard Reducer
This module provides functions to reduce the gold standard task extracts for the TESS project.
- panoptes_aggregation.reducers.tess_gold_standard_reducer.process_data(extracts)
- Process the feedback extracts - Parameters:
- extracts (list) – A list of extracts from Caesar’s pluck field extractor 
- Returns:
- success – A list-of-lists, one list for each classification with booleans indicating the volunteer’s success at finding each gold standard transit in a subject. 
- Return type:
 
- panoptes_aggregation.reducers.tess_gold_standard_reducer.tess_gold_standard_reducer(data)
- Calculate the difficulty of a gold standard TESS subject - Parameters:
- data (list) – The results of - process_data()
- Returns:
- output – A dictionary with one key, difficulty, that is a list with the fraction of volunteers who successfully found each gold standard transit in a subject. 
- Return type:
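
The difficulty calculation described above amounts to a per-transit average over classifications. A minimal sketch, assuming every inner list has one boolean per gold standard transit in the same order (difficulty_sketch is a hypothetical name):

```python
def difficulty_sketch(success):
    if not success:
        return {"difficulty": []}
    n_classifications = len(success)
    # zip(*success) walks over transits; sum() counts the True values.
    return {"difficulty": [sum(found) / n_classifications for found in zip(*success)]}


# Four classifications of a subject with two gold standard transits.
success = [[True, False], [True, True], [False, False], [True, False]]
print(difficulty_sketch(success))  # {'difficulty': [0.75, 0.25]}
```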
 
Utilities for polygon_reducer
This module provides utilities used to reduce the polygon extractions
for panoptes_aggregation.reducers.polygon_reducer.
- panoptes_aggregation.reducers.polygon_reducer_utils.IoU_cluster_mean_distance(distances_matrix)
- The mean IoU_metric_polygon distance between the polygons of the cluster. - Parameters:
- distances_matrix (numpy.ndarray) – A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. 
- Returns:
- distances_mean – The mean of the IoU_metric_polygon defined distance between the polygons of the cluster. 
- Return type:
 
- panoptes_aggregation.reducers.polygon_reducer_utils.IoU_distance_matrix_of_cluster(cdx, X, data)
- Find distance matrix using IoU_metric_polygon for a cluster. - The cdx argument is used to define the cluster out of the full X and data data sets, which may also contain other polygons not in the cluster. - Parameters:
- cdx (numpy.ndarray) – A 1D array of booleans, corresponding to the polygons in X and data which are in the cluster. True if in the cluster, False otherwise. 
- X (numpy.ndarray) – A 2D array with each row mapping to the data held in data. The first column contains row indices and the second column is an index assigned to each user. 
- data_cluster (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘time’: float, ‘gold_standard’: bool}. There is one element in this list for each member of the cluster. 
 
- Returns:
- distances_matrix – A symmetric-square array, with the off-diagonal elements containing the IoU distance between the cluster members. The diagonal elements are all zero. 
- Return type:
- numpy.ndarray 
 
- panoptes_aggregation.reducers.polygon_reducer_utils.IoU_metric_polygon(a, b, data_in=[])
- Find the Intersection over Union (IoU) distance between two polygons. This is based on the Jaccard metric. - To use this metric within the clustering code without having to precompute the full distance matrix, a and b are index mappings to the data contained in data_in. a and b also contain the user information that is used to help prevent self-clustering. The polygons used to calculate the IoU distance are contained in data_in, along with the timestamp of creation. - Parameters:
- a (list) – A two element list containing [index mapping to data, index mapping to user] 
- b (list) – A two element list containing [index mapping to data, index mapping to user] A list of the parameters for shape 2 (as defined by PFE) 
- data_in (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘time’: float, ‘gold_standard’: bool}. There is one element in this list for each classification made. The time should be a Unix timestamp float. 
 
- Returns:
- distance – The IoU distance between the two polygons. 0 means the polygons are the same, 1 means the polygons don’t overlap, values in the middle mean partial overlap. 
- Return type:
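
The distance itself is the Jaccard distance, 1 minus the ratio of intersection area to union area. The sketch below illustrates it with axis-aligned rectangles given as (x_min, y_min, x_max, y_max) so that it needs no geometry library. The rectangle representation and the names rect_area and iou_distance_sketch are assumptions for illustration; the function documented above works on shapely polygons and also uses the user information, which is left out here.

```python
def rect_area(r):
    # Negative widths or heights (no overlap) are clipped to zero.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def iou_distance_sketch(r1, r2):
    # Intersection rectangle; its area collapses to 0 when the inputs are disjoint.
    inter = (max(r1[0], r2[0]), max(r1[1], r2[1]), min(r1[2], r2[2]), min(r1[3], r2[3]))
    inter_area = rect_area(inter)
    union_area = rect_area(r1) + rect_area(r2) - inter_area
    if union_area == 0:
        return 1.0  # assumption: degenerate shapes are treated as non-overlapping
    return 1.0 - inter_area / union_area


print(iou_distance_sketch((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0, identical shapes
print(iou_distance_sketch((0, 0, 1, 1), (2, 2, 3, 3)))  # 1.0, no overlap
print(iou_distance_sketch((0, 0, 2, 2), (1, 0, 3, 2)))  # about 0.667, partial overlap
```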
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_intersection(data, **kwargs)
- Find the intersection of provided cluster data - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. 
- kwargs – - created_at : A list of when the classifications were made. Not used in this average. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. Not used in this average. 
 
 
- Returns:
- intersection_all – The shapely intersection of the shapely polygons in the cluster. 
- Return type:
- shapely.geometry.polygon.Polygon 
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_intersection_contours(data, **kwargs)
- Find contours of intersection as a list. Each item of the list will be the largest contour of i intersections, with the next item being the contour of i+1 intersections, etc. The intersection is where the polygons overlap. This is useful for plotting the uncertainty in the cluster. - The algorithm used is as follows. First find the largest simply-connected union polygon for the cluster and add it to the list intersection_contours. Next, every intersection of two polygons is found, and made into new shapely polygons. This makes a list of ‘level-2’ polygons. These polygons may overlap. Then, find the largest simply-connected union polygon of the level-2 polygons. This is the polygon of at least 2 intersections (i.e. area where at least 2 volunteers agree). Add it to list intersection_contours. - If there is more than one level-2 polygon, and they intersect, then the intersections of the level-2 polygons are found as a list. These are the level-3 polygons, as each polygon is made from at least three intersections. Then find the largest simply-connected union polygon of the level-3 polygons. This is the polygon of at least 3 intersections (i.e. area where at least 3 volunteers agree). Add it to list intersection_contours. - Continue this process until either 10 iterations have been done, or only one unique intersection polygon remains. - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. 
- kwargs – - created_at : A list of when the classifications were made. Not used in this average. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. Not used in this average. 
 
 
- Returns:
- intersection_contours – List of shapely objects. Each shape at position i in the list is the largest simply-connected contour of at least i intersections. 
- Return type:
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_intersection_contours_rasterisation(data, **kwargs)
- Find contours of intersection as a list. Each item of the list will be the largest contour of i intersections, with the next item being the contour of i+1 intersections, etc. The intersection is where the polygons overlap. This is useful for plotting the uncertainty in the cluster. - This approach uses rasterisation to find the contours. A square grid, with the number of grid points along each of the two axes given by num_grid_points, is placed over the cluster. Then the number of polygon intersections in each grid square is counted. Contours are then made from this 2D surface of intersection counts. - This function has the advantage of being more efficient than cluster_average_intersection_contours when the number of polygons in the cluster is large (approximately when greater than 8). Equally, if num_grid_points is small, say 10, then rasterisation is faster in most cases but gives poorer quality contours with increased risk of grid-spacing based artifacts. - The resulting contours are smoothed by default. - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. 
- kwargs – - num_grid_points: The number of grid points per axis. A larger number means greater resolution, but takes longer. Default is 100. 
- smoothing: A string to choose the type of smoothing used. If ‘minimal_sides’, the number of sides of the contour is minimised. If ‘rounded’, corners are rounded. If ‘no_smoothing’, no smoothing is done. Defaults to ‘minimal_sides’. 
- created_at : A list of when the classifications were made. Not used in this average. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. Not used in this average. 
 
 
- Returns:
- intersection_contours – List of shapely objects. Each shape at position i in the list is the largest simply-connected contour of at least i intersections. 
- Return type:
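
The counting step of the rasterisation approach can be sketched in pure Python. This is only an illustration under stated assumptions: axis-aligned rectangles (x_min, y_min, x_max, y_max) stand in for shapely polygons, the names overlap_counts and cells_at_level are hypothetical, and contour extraction and smoothing are left out.

```python
def overlap_counts(rects, num_grid_points=10):
    """Count how many rectangles cover the centre of each grid cell."""
    x_min = min(r[0] for r in rects)
    y_min = min(r[1] for r in rects)
    x_max = max(r[2] for r in rects)
    y_max = max(r[3] for r in rects)
    dx = (x_max - x_min) / num_grid_points
    dy = (y_max - y_min) / num_grid_points
    counts = []
    for j in range(num_grid_points):
        cy = y_min + (j + 0.5) * dy
        row = []
        for i in range(num_grid_points):
            cx = x_min + (i + 0.5) * dx
            row.append(sum(r[0] <= cx <= r[2] and r[1] <= cy <= r[3] for r in rects))
        counts.append(row)
    return counts


def cells_at_level(counts, level):
    """Number of grid cells covered by at least `level` shapes."""
    return sum(c >= level for row in counts for c in row)


rects = [(0, 0, 6, 6), (2, 2, 8, 8), (4, 4, 10, 10)]
counts = overlap_counts(rects, num_grid_points=10)
for level in (1, 2, 3):
    print(level, cells_at_level(counts, level))
# 1 76
# 2 28
# 3 4
```

The region where the count is at least i shrinks as i grows, which is the nesting that the returned list of contours describes.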
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_last(data, **kwargs)
- Find the last created polygon of provided cluster data - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. The time should be a Unix timestamp float. 
- kwargs – - created_at : A list of when the classifications were made. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. Not used in this average. 
 
 
- Returns:
- last – The last created shapely polygon in the cluster. 
- Return type:
- shapely.geometry.polygon.Polygon 
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_median(data, **kwargs)
- Find the ‘median’ of provided cluster data, i.e. the polygon of the cluster with the minimum total distance to the other polygons. - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. 
- kwargs – - created_at : A list of when the classifications were made. Not used in this average. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. 
 
 
- Returns:
- median – The ‘median’ polygon in the cluster. 
- Return type:
- shapely.geometry.polygon.Polygon 
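
Selecting the ‘median’ member from a distance matrix can be sketched as below. The function name median_index_sketch and the example distances are hypothetical; the sketch returns the index of the member rather than the polygon itself.

```python
def median_index_sketch(distance_matrix):
    # Total distance from each member to all others (the diagonal contributes 0).
    totals = [sum(row) for row in distance_matrix]
    return min(range(len(totals)), key=totals.__getitem__)


distance_matrix = [
    [0.0, 0.2, 0.7],
    [0.2, 0.0, 0.4],
    [0.7, 0.4, 0.0],
]
print(median_index_sketch(distance_matrix))  # 1
```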
 
- panoptes_aggregation.reducers.polygon_reducer_utils.cluster_average_union(data, **kwargs)
- Find the union of provided cluster data - Parameters:
- data (list) – A list of dicts that take the form {‘polygon’: shapely.geometry.polygon.Polygon, ‘gold_standard’: bool}. There is one element in this list for each classification made. 
- kwargs – - created_at : A list of when the classifications were made. Not used in this average. 
- distance_matrix : A symmetric-square array, with the off-diagonal elements containing the IoU_metric_polygon distance between the cluster members. The diagonal elements are all zero. This is found using IoU_distance_matrix_of_cluster. Not used in this average. 
 
 
- Returns:
- union_all – The shapely union of the shapely polygons in the cluster. 
- Return type:
- shapely.geometry.polygon.Polygon 
 
Utilities for optics_line_text_reducer
This module provides utilities used to reduce the polygon-text extractions
for panoptes_aggregation.reducers.optics_line_text_reducer.  It
assumes that all extracts are full lines of text in the document.
- panoptes_aggregation.reducers.optics_text_utils.cluster_of_one(X, data, user_ids, extract_index)
- Create “clusters of one” out of the data passed in. Lines of text identified as noise are kept around as clusters of one so they can be displayed in the front-end to the next user. - Parameters:
- X (list) – An nx2 list with each row containing [index mapping to data, index mapping to user] 
- data (list) – A list containing dictionaries with the original data that X maps to, of the form {‘x’: [start_x, end_x], ‘y’: [start_y, end_y], ‘text’: [‘text for line’], ‘gold_standard’: bool}. 
- user_ids (list) – A list of user_ids (The second column of X maps to this list) 
- extract_index (list) – A list of n values with the extract index for each of the rows in X 
 
- Returns:
- clusters – A list with n clusters each containing only one classification 
- Return type:
 
- panoptes_aggregation.reducers.optics_text_utils.get_min_samples(N)
- Get the min_samples attribute based on the number of users who have transcribed the subject. These values were found based on example data from ASM. - Parameters:
- N (integer) – The number of users who have seen the subject 
- Returns:
- min_samples – The value to use for the min_samples keyword in OPTICS 
- Return type:
- integer 
 
- panoptes_aggregation.reducers.optics_text_utils.metric(a, b, data_in=[])
- Calculate the distance between two drawn lines that have text associated with them. This distance is found by summing the Euclidean distance between the start points of each line, the Euclidean distance between the end points of each line, and the Levenshtein distance of the text for each line. The Levenshtein distance is computed after stripping text tags and consolidating whitespace. - To use this metric within the clustering code without having to precompute the full distance matrix, a and b are index mappings to the data contained in data_in. a and b also contain the user information that is used to help prevent self-clustering. - Parameters:
- a (list) – A two element list containing [index mapping to data, index mapping to user] 
- b (list) – A two element list containing [index mapping to data, index mapping to user] 
- data_in (list) – A list of dicts that take the form {‘x’: [start_x, end_x], ‘y’: [start_y, end_y], ‘text’: [‘text for line’], ‘gold_standard’: bool}. There is one element in this list for each classification made. 
 
- Returns:
- distance – The distance between a and b 
- Return type:
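
The sum described above can be sketched in pure Python. This is an illustration, not the library's code: the names levenshtein and line_distance_sketch are hypothetical, the text is assumed to be already cleaned of tags and extra whitespace, and the index mappings and user information are left out.

```python
from math import dist


def levenshtein(s, t):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (cs != ct),  # substitution
            ))
        previous = current
    return previous[-1]


def line_distance_sketch(line_a, line_b):
    # Euclidean distance between the two start points and the two end points.
    start = dist((line_a["x"][0], line_a["y"][0]), (line_b["x"][0], line_b["y"][0]))
    end = dist((line_a["x"][1], line_a["y"][1]), (line_b["x"][1], line_b["y"][1]))
    return start + end + levenshtein(line_a["text"][0], line_b["text"][0])


line_a = {"x": [0, 100], "y": [0, 0], "text": ["hello world"]}
line_b = {"x": [3, 100], "y": [4, 0], "text": ["hello word"]}
print(line_distance_sketch(line_a, line_b))  # 6.0 (5 for the start points, 0 for the end points, 1 for the text)
```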
 
- panoptes_aggregation.reducers.optics_text_utils.order_lines(frame_in, angle_eps=30, gutter_eps=150)
- Place the identified lines within a single frame in reading order - Parameters:
- frame (list) – A list of identified transcribed lines (one frame from panoptes_aggregation.reducers.optics_line_text_reducer.optics_line_text_reducer) 
- angle_eps (float) – The DBSCAN eps value to use for the slope clustering 
- gutter_eps (float) – The DBSCAN eps value to use for the column clustering 
 
- Returns:
- frame_ordered – The identified transcribed lines in reading order. The slope_label and gutter_label values are added to each line to indicate what cluster it belongs to. 
- Return type:
 
- panoptes_aggregation.reducers.optics_text_utils.remove_user_duplication(labels_, core_distances_, users)
- Make sure a user only shows up in a cluster at most once. If a user does show up more than once in a cluster, take the point with the smallest core distance; all others are assigned as noise (-1). - Parameters:
- labels_ (numpy.array) – A list containing the cluster labels for each data point 
- core_distances_ (numpy.array) – A list of core distances, one for each data point 
- users (numpy.array) – A list of indices that map to users, one for each data point 
 
- Returns:
- clean_labels_ – A list containing the new cluster labels. 
- Return type:
- numpy.array 
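
The de-duplication rule can be sketched with plain lists. The name remove_user_duplication_sketch is hypothetical, and the sketch returns a list rather than a numpy array.

```python
def remove_user_duplication_sketch(labels, core_distances, users):
    clean = list(labels)
    best = {}  # (cluster label, user) -> index of the point kept so far
    for i, (label, user) in enumerate(zip(labels, users)):
        if label == -1:
            continue  # already noise
        key = (label, user)
        if key not in best:
            best[key] = i
        elif core_distances[i] < core_distances[best[key]]:
            clean[best[key]] = -1  # demote the previously kept, more distant point
            best[key] = i
        else:
            clean[i] = -1
    return clean


labels = [0, 0, 0, 1, 1, -1]
core_distances = [0.5, 0.2, 0.3, 0.1, 0.4, 0.9]
users = [7, 7, 8, 7, 9, 7]
print(remove_user_duplication_sketch(labels, core_distances, users))
# [-1, 0, 0, 1, 1, -1]
```

User 7 appears twice in cluster 0, so only the point with the smaller core distance (0.2) keeps its label. The same user may still appear once in a different cluster.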
 
- panoptes_aggregation.reducers.optics_text_utils.strip_tags(s)
- Remove square bracket tags from text and consolidate whitespace - Parameters:
- s (string) – The input string 
- Returns:
- clean_s – The cleaned string 
- Return type:
- string 
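
A minimal sketch of this kind of cleaning is shown below. The regular expression, the name strip_tags_sketch, and the choice to keep the text enclosed by the tags are assumptions for illustration; the library's exact tag rules are not given here.

```python
import re


def strip_tags_sketch(s):
    # Drop anything of the form [tag], then collapse runs of whitespace.
    without_tags = re.sub(r"\[[^\]]*\]", "", s)
    return " ".join(without_tags.split())


print(strip_tags_sketch("The [unclear]quick[/unclear]   brown  fox"))
# The quick brown fox
```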
 
Line Tool with Text Subtask Reducer using OPTICS
This module provides functions to reduce the polygon-text extractions from
panoptes_aggregation.extractors.poly_line_text_extractor using the
density independent clustering algorithm OPTICS.  It is assumed that all
extracts are full lines of text in the document.
- panoptes_aggregation.reducers.optics_line_text_reducer.optics_line_text_reducer(data_by_frame, **kwargs_optics)
- Reduce the line-text extracts as a list of lines of text. - Parameters:
- data_by_frame (dict) – A dictionary returned by - process_data()
- kwargs – 
- min_samples : The smallest number of transcribed lines needed to form a cluster. auto will set this value based on the number of volunteers who transcribed on a page within a subject. 
- xi : Determines the minimum steepness on the reachability plot that constitutes a cluster boundary. 
- angle_eps : How close the angle of two lines need to be in order to be placed in the same angle cluster. Note: This will only change the order of the lines. 
- gutter_eps : How close the x position of the start of two lines need to be in order to be placed in the same column cluster. Note: This will only change the order of the lines. 
- min_line_length : The minimum length a transcribed line of text needs to be in order to be used in the reduction. 
- low_consensus_threshold : The minimum consensus score allowed to be considered “done”. 
- minimum_views : A value that is passed along to the front-end to set when lines should turn grey (has no effect on aggregation) 
 
 
- Returns:
- reduction – A dictionary with one key for each frame of the subject, each having a list as its value. Each item of the list represents one transcribed line of text and is a dictionary with these keys: - clusters_x : the x position of each identified word 
- clusters_y : the y position of each identified word 
- clusters_text : A list of lists containing the text at each cluster position. There is one list for each identified word, and each of those lists contains one item for each user that identified the cluster. If the user did not transcribe the word an empty string is used. 
- line_slope: The slope of the line of text in degrees 
- number_views : The number of users that transcribed the line of text 
- consensus_score : The average number of users whose text agreed for the line. Note: if consensus_score is the same as number_views, every user agreed with each other 
- user_ids: List of panoptes user ids in the same order as clusters_text 
- gold_standard: List of bools indicating whether each transcription was made in the front-end’s gold standard mode 
- slope_label: integer indicating what slope cluster the line belongs to 
- gutter_label: integer indicating what gutter cluster (i.e. column) the line belongs to 
- low_consensus : True if the consensus_score is less than the threshold set by the low_consensus_threshold keyword 
- For the entire subject the following is also returned:
- low_consensus_lines : The number of lines with low consensus 
- transcribed_lines : The total number of lines transcribed on the subject 
- Note: the image coordinate system has y increasing downward. 
- Return type:
 
- panoptes_aggregation.reducers.optics_line_text_reducer.process_data(data_list, min_line_length=0.0)
- Process a list of extractions into a dictionary organized by frame - Parameters:
- data_list (list) – A list of extractions created by - panoptes_aggregation.extractors.poly_line_text_extractor.poly_line_text_extractor()
- Returns:
- processed_data – A dictionary with one key for each frame of the subject. The value for each key is a dictionary with two keys X and data. X is a 2D array with each row mapping to the data held in data. The first column contains row indices and the second column is an index assigned to each user. data is a list of dictionaries of the form {‘x’: [start_x, end_x], ‘y’: [start_y, end_y], ‘text’: [‘text for line’], ‘gold_standard’: bool}. 
- Return type:
 
Text Tool Reducer
This module provides functions to reduce the panoptes text tool into an alignment table.
- panoptes_aggregation.reducers.text_reducer.process_data(data_list)
- Flatten list of extracts into a list of strings. Empty strings are not returned 
- panoptes_aggregation.reducers.text_reducer.text_reducer(data_in, **kwargs)
- Reduce a list of text into an alignment table - Parameters:
- data (list) – A list of strings to be aligned 
- Returns:
- reduction – A dictionary with the following keys: - aligned_text: A list of lists containing the aligned text. There is one list for each identified word, and each of those lists contains one item for each user that entered text. If the user did not transcribe a word an empty string is used. 
- number_views: Number of volunteers who entered non-blank text 
- consensus_score: The average number of users whose text agreed. Note: if consensus_score is the same as number_views, every user agreed with each other 
 
- Return type:
 
First N True Reducer
This module is designed to reduce boolean-valued extracts e.g.
panoptes_aggregation.extractors.all_tasks_empty_extractor.
It returns true if and only if the first N extracts are True.
- panoptes_aggregation.reducers.first_n_true_reducer.first_n_true_reducer(data_list, n=0, **kwargs)
- Reduce a list of boolean values to a single boolean value. - Parameters:
- data_list (list) – A list of dicts containing a “result” key which should correspond with a boolean value. 
- n (int) – The first n results in data_list must be True. 
 
- Returns:
- reduction – reduction[“result”] is True if the first n results in data_list are True. Otherwise False. 
- Return type:
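
The rule above can be sketched as follows. The name first_n_true_sketch is hypothetical, and two behaviours are assumptions of the sketch rather than documented facts: a list with fewer than n extracts gives False, and n=0 gives True because there is nothing to check.

```python
def first_n_true_sketch(data_list, n=0):
    first = data_list[:n]
    # Assumption: fewer than n extracts means the condition is not met.
    result = len(first) == n and all(extract["result"] for extract in first)
    return {"result": result}


data_list = [{"result": True}, {"result": True}, {"result": False}]
print(first_n_true_sketch(data_list, n=2))  # {'result': True}
print(first_n_true_sketch(data_list, n=3))  # {'result': False}
print(first_n_true_sketch(data_list, n=4))  # {'result': False}
```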