straditize.binary module¶

A module to read in and digitize the pollen diagram

Disclaimer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

Classes

`BarDataReader`(*args, **kwargs)	A DataReader for digitizing bar pollen diagrams
`DataReader`(image[, ax, extent, plot, …])	A class to read in and digitize the data files of the pollen diagram
`LineDataReader`(image[, ax, extent, plot, …])	A data reader for digitizing line diagrams
`RoundedBarDataReader`(*args, **kwargs)	A bar data reader that can be used for rounded bars

Functions

`groupby_arr`(arr)	Groupby a boolean array
`only_parent`(func)	Call the given func only from the parent reader

class straditize.binary.BarDataReader(*args, **kwargs)[source]¶

Bases: straditize.binary.DataReader

A DataReader for digitizing bar pollen diagrams

Compared to the base DataReader class, this reader implements a different strategy for digitizing and finding the samples. When digitizing the full diagram, we try to find the distinct bars using the get_bars() method. These bars might have to be split manually if they are not easy to distinguish. One key element for distinguishing two adjacent bars is the specified tolerance.

The base class works for rectangular bars. If you require rounded bars, use the RoundedBarDataReader.

Parameters: tolerance (int) – If x0 is the value in a pixel row y and x1 the value in the next pixel row y+1, then the two pixel rows are considered as belonging to different bars if abs(x1 - x0) > tolerance (see the get_bars() method and the tolerance attribute)
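The tolerance rule can be illustrated with a minimal sketch in plain Python. It is not the get_bars() implementation: the function name `split_rows_into_bars` is hypothetical, and `values` is assumed to hold one digitized x-value per pixel row.

```python
def split_rows_into_bars(values, tolerance=2):
    """Group consecutive pixel rows into bars (illustrative only).

    A new bar starts whenever the value changes by more than
    `tolerance` from one pixel row to the next.
    """
    if not values:
        return []
    bars = []
    current = [0]
    for y in range(1, len(values)):
        if abs(values[y] - values[y - 1]) > tolerance:
            # jump larger than the tolerance: close the bar, start a new one
            bars.append(current)
            current = [y]
        else:
            current.append(y)
    bars.append(current)
    return bars


rows = [10, 11, 10, 25, 26, 25, 3, 3]
print(split_rows_into_bars(rows))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

With the default tolerance of 2, the small fluctuations between 10 and 11 stay within one bar, while the jumps to 25 and to 3 each start a new bar.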

Methods

`create_grouper`(ds, columns, *args, **kwargs)	Create the grouper that plots the results
`digitize`([do_split, inplace])	Reimplemented to ignore the rows between the bars
`find_potential_samples`(col[, min_len, …])	Find the bars in the column
`from_dataset`(ds, *args, **kwargs)	Create a new `DataReader` from a `xarray.Dataset`
`get_bars`(arr[, do_split])	Find the distinct bars in an array
`shift_vertical`(pixels)	Shift the columns vertically.
`to_dataset`([ds])	All the necessary data as a `xarray.Dataset`

Attributes

`min_fract`	The minimum fraction of overlap for two bars to be considered as the
`nc_meta`	dict() -> new empty dictionary
`samples_at_boundaries`	There should not be samples at the boundaries because the first
`tolerance`	Tolerance to distinguish bars.

create_grouper(ds, columns, *args, **kwargs)[source]¶

Create the grouper that plots the results

Parameters

ds (xarray.Dataset) – The dataset with the data
columns (list of int) – The numbers of the columns for which the grouper should be created
fig (matplotlib.figure.Figure) – The matplotlib figure to plot on
x0 (float) – The left boundary of the larger Bbox of the stratigraphic diagram
y0 (int) – The upper boundary of the larger Bbox of the stratigraphic diagram
width (float) – The width of the final axes between 0 and 1
height (float) – The height of the final axis between 0 and 1
ax0 (matplotlib.axes.Axes) – The larger matplotlib axes whose bounding box shall be used.
transformed (bool) – If True, y-axes and x-axes have been translated (see the px2data_x() and px2data_y() methods)
colnames (list of str) – The column names to use in the plot
**kwargs – any other keyword argument that is passed to the psy_strat.stratplot.StratGroup.from_dataset() method

Returns

The grouper that visualizes the given columns in the fig

Return type

psy_strat.stratplot.StratGroup

digitize(do_split=False, inplace=True)[source]¶

Reimplemented to ignore the rows between the bars

Parameters

do_split (bool) – If True and a bar is 1.7 times longer than the mean, it is split into two.
inplace (bool) – If True (default), the full_df attribute is updated. Otherwise a DataFrame is returned

find_potential_samples(col, min_len=None, max_len=None, filter_func=None)[source]¶

Find the bars in the column

This method gets the bars in the given col and returns the distinct indices

Parameters

col (int) – The column for which to find the extrema
min_len (int) – The minimum length of one extremum. If the width of the interval where we found an extremum is smaller than that, the extremum is ignored. If None, this parameter does not have an effect (i.e. min_len=1).
max_len (int) – The maximum length of one extremum. If the width of the interval where we found an extremum is greater than that, the extremum is ignored. If None, this parameter does not have an effect.
filter_func (function) – A function to filter the extrema. It must accept one argument which is a list of integers representing the indices of the extremum in a

Returns

list of list of int of shape (N, 2) – The list of N extremum locations. Each tuple in this list represents an interval a where one extremum might be located
list of list of int – The excluded extremum locations that are ignored because we could not find a change of sign in the slope.

See also

find_samples()

classmethod from_dataset(ds, *args, **kwargs)[source]¶

Create a new DataReader from a xarray.Dataset

Parameters

ds (xarray.Dataset) – The dataset that has been stored with the to_dataset() method
*args,**kwargs – Any other arguments passed to the DataReader constructor

Returns

The reader recreated from ds

Return type

DataReader

get_bars(arr, do_split=False)[source]¶

Find the distinct bars in an array

Parameters

arr (np.ndarray) – The array to find the bars in
do_split (bool) – If True and a bar is 1.7 times longer than the mean, it is split into two.

Returns

list of list of ints – The list of the distinct positions of the bars
list of floats – The heights for each of the bars
list of list of ints – The indices of bars that are longer than 1.7 times the mean of the other bars and should be split. If do_split is True, they have already been split
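The 1.7 criterion for overly long bars can be sketched as follows. This is an illustration under stated assumptions, not the library code: `find_long_bars` is a hypothetical helper, each bar is assumed to be a list of pixel-row indices, and "the mean of the other bars" is taken literally as the mean length of all bars except the one being tested.

```python
def find_long_bars(bars, factor=1.7):
    """Return the positions of bars longer than `factor` times the
    mean length of the other bars (illustrative only)."""
    lengths = [len(bar) for bar in bars]
    long_bars = []
    for i, length in enumerate(lengths):
        others = lengths[:i] + lengths[i + 1:]
        # a single bar has nothing to be compared with
        if others and length > factor * (sum(others) / len(others)):
            long_bars.append(i)
    return long_bars


bars = [list(range(0, 3)), list(range(3, 6)),
        list(range(6, 13)), list(range(13, 16))]
print(find_long_bars(bars))  # [2]
```

The third bar spans 7 rows while the others span 3 each, so it exceeds 1.7 times their mean and is a candidate for splitting.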

max_len = None¶

min_fract = 0.9¶: The minimum fraction of overlap for two bars to be considered as the same sample (see unique_bars())
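How an overlap fraction might be compared against min_fract can be sketched like this. The exact definition used by unique_bars() is not given here, so the sketch assumes that the overlap is measured relative to the shorter of the two bars and that each bar is a half-open `(start, end)` interval of pixel rows. Both function names are hypothetical.

```python
def overlap_fraction(bar_a, bar_b):
    """Fraction of the shorter bar that is covered by the other bar.

    Assumption: bars are half-open (start, end) pixel-row intervals.
    """
    overlap = min(bar_a[1], bar_b[1]) - max(bar_a[0], bar_b[0])
    if overlap <= 0:
        return 0.0
    shorter = min(bar_a[1] - bar_a[0], bar_b[1] - bar_b[0])
    return overlap / shorter


def same_sample(bar_a, bar_b, min_fract=0.9):
    """True if the two bars overlap enough to count as one sample."""
    return overlap_fraction(bar_a, bar_b) >= min_fract


print(same_sample((10, 20), (11, 20)))  # True
print(same_sample((10, 20), (15, 25)))  # False
```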

min_len = None¶

nc_meta = {'bars{reader}_bars': {'dims': ('bars{reader}_bar', 'limit'), 'long_name': 'Boundaries of bars', 'units': 'px'}, 'bars{reader}_full_data_orig': {'dims': ('ydata', 'bars{reader}_column'), 'long_name': 'Full digitized data ignoring bars', 'units': 'px'}, 'bars{reader}_max_len': {'dims': (), 'long_name': 'Maximum length of a bar'}, 'bars{reader}_min_fract': {'dims': (), 'long_name': 'Minimum fraction for overlap estimation'}, 'bars{reader}_min_len': {'dims': (), 'long_name': 'Minimum length of a bar'}, 'bars{reader}_nbars': {'dims': 'bars{reader}_column', 'long_name': 'number of bars per column'}, 'bars{reader}_nsplit': {'dims': 'bars{reader}_column', 'long_name': 'number of the splitted bars'}, 'bars{reader}_splitted': {'dims': ('bar_split', 'limit'), 'long_name': 'Boundaries of bars to split', 'units': 'px'}, 'bars{reader}_tolerance': {'dims': (), 'long_name': 'bar distinguishing tolerance'}, 'binary': {'dims': ('reader', 'ydata', 'xdata'), 'long_name': 'Binary images for data readers'}, 'col_map': {'dims': 'column', 'long_name': 'Mapping from column to reader', 'units': 'reader_index'}, 'column_ends': {'dims': 'column', 'long_name': 'Ends of the columns', 'units': 'px'}, 'column_starts': {'dims': 'column', 'long_name': 'Start of the columns', 'units': 'px'}, 'exag_col_map': {'dims': 'column', 'long_name': 'Mapping from column to exaggerated reader', 'units': 'reader_index'}, 'full_data': {'dims': ('ydata', 'column'), 'long_name': 'Full digitized data', 'units': 'px'}, 'hline': {'long_name': 'Horizontal line location', 'units': 'px'}, 'is_exaggerated': {'dims': 'reader', 'long_name': 'Exaggeration factor'}, 'occurences': {'comments': 'The locations where the only an occurence of a taxa is highlighted without value', 'dims': ('occurence', 'xy'), 'long_name': 'taxa occurences'}, 'reader': {'dims': 'reader', 'long_name': 'index of the reader'}, 'reader_cls': {'dims': 'reader', 'long_name': 'The name of the class constructor'}, 'reader_image': {'dims': 
('reader', 'ydata', 'xdata', 'rgba'), 'long_name': 'RGBA images for data readers', 'units': 'color'}, 'reader_mod': {'dims': 'reader', 'long_name': 'The module of the reader class'}, 'rough_locs': {'dims': ('sample', 'column', 'limit'), 'long_name': 'Rough locations for samples'}, 'sample': {'long_name': 'Sample location', 'units': 'px'}, 'samples': {'dims': ('sample', 'column'), 'long_name': 'Sample data', 'units': 'px'}, 'shifted': {'dims': 'column', 'long_name': 'Vertical shift per column', 'units': 'px'}, 'vline': {'long_name': 'Vertical line location', 'units': 'px'}, 'xaxis_translation': {'dims': ('reader', 'px_data', 'limit'), 'long_name': 'Pixel to data mapping for x-axis'}}¶

samples_at_boundaries = False¶: There should not be samples at the boundaries because the first sample is in the middle of the first bar

shift_vertical(pixels)[source]¶

Shift the columns vertically.

Parameters: pixels (list of floats) – The y-value for each column for which to shift the values. Note that these values have to be greater than or equal to 0

to_dataset(ds=None)[source]¶

All the necessary data as a xarray.Dataset

Parameters: ds (xarray.Dataset) – The dataset in which to insert the data. If None, a new one will be created
Returns: Either the given ds or a new xarray.Dataset instance
Return type: xarray.Dataset

tolerance = 2¶: Tolerance to distinguish bars. If x0 is the value in a pixel row y and x1 the value in the next pixel row y+1, then the two pixel rows are considered as belonging to different bars if abs(x1 - x0) > tolerance

class straditize.binary.DataReader(image, ax=None, extent=None, plot=True, children=[], parent=None, magni=None, plot_background=False, binary=None)[source]¶

Bases: straditize.label_selection.LabelSelection

A class to read in and digitize the data files of the pollen diagram

The source image is stored in the image attribute, and its binary array is stored in the binary attribute. A labeled version, created by the skimage.morphology.label() function, is stored in the labels attribute and can be regenerated using the reset_labels() method.
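To make the idea of a connectivity-based labeled array concrete, the following dependency-free sketch labels connected regions of a small binary grid. It is not how straditize or scikit-image compute labels internally; `label_binary` is a hypothetical helper, and diagonal neighbours are assumed to be connected (8-connectivity in 2D).

```python
from collections import deque


def label_binary(binary):
    """Label connected regions of truthy cells (illustrative only).

    Returns the labeled grid and the number of labels; 0 is background.
    """
    n_rows = len(binary)
    n_cols = len(binary[0]) if n_rows else 0
    labels = [[0] * n_cols for _ in range(n_rows)]
    num_labels = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not binary[r][c] or labels[r][c]:
                continue
            # unlabeled data pixel: start a new region and flood-fill it
            num_labels += 1
            labels[r][c] = num_labels
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = num_labels
                            queue.append((ny, nx))
    return labels, num_labels


grid = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1]]
labels, num_labels = label_binary(grid)
print(labels)      # [[1, 0, 0], [0, 1, 0], [0, 0, 0], [2, 0, 3]]
print(num_labels)  # 3
```

The largest label equals the number of regions, which corresponds to what the num_labels attribute describes.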

Subclasses of this class should reimplement the digitize() method that digitizes the diagram, and the find_potential_samples() method.

There is always one parent reader stored in the parent attribute. This is the reader that is accessible through the straditize.straditizer.Straditizer.data_reader attribute and holds the references to other readers in its children attribute

Parameters

image (PIL.Image.Image) – The image of the diagram
ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
extent (list) – List of four numbers specifying the extent of the image in its source. This extent will be used for the call of matplotlib.pyplot.imshow()
children (list of DataReader) – Child readers for other columns in case the newly created instance is the parent reader
parent (DataReader) – The parent reader.
magni (straditize.magnifier.Magnifier) – The magnifier for the given ax
plot_background (bool) – If True (and plot is True), a white, opaque area is plotted below the plot_im
binary (None) – The binary version of the given image. If not provided, the to_binary_pil() method is used with the given image

Methods

`add_samples`(samples[, rough_locs])	Add samples to the found ones
`close`()
`color_labels`([categorize])	The labels of the colored array
`create_exaggerations_reader`(factor[, cls])	Create a new exaggerations reader for this reader
`create_grouper`(ds, columns, fig, x0, y0, …)	Create the grouper that plots the results
`create_variable`(ds, vname, data, **kwargs)	Insert the data into a variable in an `xr.Dataset`
`digitize`([use_sum, inplace])	Digitize the binary image to create the full dataframe
`digitize_exaggerated`([fraction, absolute, …])	Merge the exaggerated values into the original digitized result
`disable_label_selection`(*args, **kwargs)	Disable the label selection
`draw_figure`()	Draw the matplotlib `fig` and the `magni` figure
`end_column_selection`()	End the column selection and remove the artists
`estimated_column_starts`([threshold])	The estimated column starts as `numpy.ndarray`.
`find_potential_samples`(col[, min_len, …])	Find potential samples in an array
`find_samples`([min_fract, pixel_tol])	Find the samples in the diagram
`found_extrema_per_row`()	Calculate how many columns have a potential sample in each pixel row
`from_dataset`(ds, *args, **kwargs)	Create a new `DataReader` from a `xarray.Dataset`
`get_bbox_for_cols`(columns, x0, y0, width, height)	Get the boundary boxes for the columns of this reader in the results
`get_binary_for_col`(col)	Get the binary array for a specific column
`get_cross_column_features`([min_px])	Get features that are contained in two or more columns
`get_disconnected_parts`([fromlast, from0, …])	Identify parts in the `binary` data that are not connected
`get_labeled_array`()	Create a connectivity-based labeled array of the `binary` data
`get_occurences`()	Extract the positions of the occurences from the selection
`get_parts_at_column_ends`([npixels])	Identify parts in the `binary` data that touch the next column
`get_reader_for_col`(col)	Get the reader for a specific column
`get_surrounding_slopes`(indices, arr)
`image_array`()	The RGBA values of the colored image
`is_obstacle`(indices, arr)	Check whether the found extremum is only an obstacle of the picture
`mark_as_exaggerations`(mask)	Mask the given array as exaggerated
`merge_close_samples`(locs[, rough_locs, …])
`merged_binaries`()	Get the binary data from all children and merge them into one array
`merged_labels`()	Get the labeled binary data from all children merged into one array
`new_child_for_cols`(columns, cls[, plot])	Create a new child reader for specific columns
`plot_background`([ax])	Plot a white layer below the `plot_im`
`plot_color_image`([ax])	Plot the colored `image` on a matplotlib axes
`plot_full_df`([ax])	Plot the lines for the digitized diagram
`plot_image`([ax])	Plot the `binary` data image on a matplotlib axes
`plot_other_potential_samples`([tol, …])	Plot potential samples that are not yet in the `samples`
`plot_potential_samples`([excluded, ax, plot_kws])	Plot the ranges for potential samples
`plot_results`(df[, ax, fig, transformed])	Plot the reconstructed diagram
`plot_sample_hlines`([ax])	Plot one horizontal line per sample in the `sample_locs`
`plot_samples`([ax])	Plot the diagram as lines reconstructed from the samples
`px2data_x`(coord)	Transform the pixel coordinates into data coordinates
`recognize_hlines`([fraction, min_lw, max_lw, …])	Recognize horizontal lines in the plot and subtract them
`recognize_vlines`([fraction, min_lw, max_lw, …])	Recognize vertical lines in the plot and subtract them
`recognize_xaxes`([fraction, min_lw, max_lw, …])	Recognize (and potentially remove) x-axes at bottom and top
`recognize_yaxes`([fraction, min_lw, max_lw, …])	Find (and potentially remove) y-axes in the image
`remove_in_children`(arr, amask)	Update the child reader images after having removed binary data
`remove_plots`()	Remove all plotted artists by this reader
`reset_column_starts`()	Reset the column starts, `full_df`, `shifted`
`reset_image`(image[, binary])	Reset the image for this straditizer
`reset_labels`()	Reset the `labels` array
`reset_samples`()	Reset the samples
`resize_axes`(grouper, bounds)	Resize the axes based on column boundaries
`set_as_parent`()	Set this instance as the parent reader
`set_hline_locs_from_selection`([selection])	Save the locations of horizontal lines
`set_vline_locs_from_selection`([selection])	Save the locations of vertical lines
`shift_vertical`(pixels[, draw])	Shift the columns vertically.
`show_cross_column_features`([min_px, remove])	Highlight and maybe remove cross column features
`show_disconnected_parts`([fromlast, from0, …])	Highlight or remove disconnected parts
`show_parts_at_column_ends`([npixels, remove])	Highlight or remove features that touch the column ends
`show_small_parts`([n, remove])	Highlight and potentially remove small features in the image
`start_column_selection`([use_all])	Enable the user to select columns
`to_binary_pil`(image[, threshold])	Convert an image to a binary
`to_dataset`([ds])	All the necessary data as a `xarray.Dataset`
`to_grey_pil`(image[, threshold])	Convert an image to a greyscale image
`unique_bars`([min_fract, asdict])	Estimate the unique bars
`update_image`(arr, amask)	Update the image after having removed binary data
`update_rgba_image`(arr, mask)	Update the RGBA image from the given 3D-array

Attributes

`all_column_bounds`	The boundaries for the data columns
`all_column_ends`	1D numpy array with the ends for all columns (including child readers)
`all_column_starts`	1D numpy array with the starts for all columns (including child readers)
`ax`	The matplotlib axes where the `plot_im` is plotted on
`background`	White rectangle that represents the background of the binary image.
`binary`	A 2D numpy array representing the binary version of the `image`
`children`	Child readers for specific columns.
`column_bounds`	The boundaries for the data columns
`column_ends`	1D numpy array with the ends for each column of this reader
`column_starts`	1D numpy array with the starts for each column of this reader
`columns`	The indices of the columns that are handled by this reader
`exaggerated_reader`	The reader that represents the exaggerations
`extent`	The extent of the `plot_im`
`fig`	The matplotlib figure of the `ax`
`full_df`	The full `pandas.DataFrame` of the digitized image
`hline_locs`	`list` of floats. The indexes of horizontal lines
`image`	PIL.Image.Image of the diagram part with mode RGBA
`is_exaggerated`	Exaggeration factor that is not 0 if this reader represents exaggeration
`iter_all_readers`	Iter through the `parent` reader and its `children`
`label_arrs`	Built-in mutable sequence.
`labels`	A connectivity-based labeled version of the `binary` data
`magni`	the `straditize.magnifier.Magnifier` for the `ax`
`magni_background`	White rectangle that represents the background of the binary image in the magnifier.
`magni_plot_im`	magnified `plot_im`
`min_fract`	The minimum fraction of overlap for two bars to be considered as the
`nc_meta`	A mapping from variable name to meta information
`non_exaggerated_reader`	The reader that represents the exaggerations
`num_labels`	The maximum label in the `labels` array
`occurences`	A set of tuples marking the position of an occurence
`occurences_dict`	A mapping from column number to a numpy array with the indices of
`occurences_value`	The value that is given to the occurences in the measurements
`parent`	Parent reader for this instance.
`plot_im`	the matplotlib image artist
`rough_locs`	The `pandas.DataFrame` with rough locations for the samples.
`sample_locs`	The `pandas.DataFrame` with locations and values of the
`samples_at_boundaries`	a boolean flag that shall indicate if we assume that the first and last
`shifted`	The number of pixels the columns have been shifted
`strat_plot_identifier`	str(object=’’) -> str
`vline_locs`	`list` of floats. The indexes of vertical lines
`xaxis_px`	The x indices in column pixel coordinates that are used for x-axes

add_samples(samples, rough_locs=None)[source]¶

Add samples to the found ones

Parameters

samples (series, 1d-array or DataFrame) –
The samples. If it is a series, we assume that the index represents the y-value of the sample and the value the x-position (see xcolumns). In case of a 1d-array, we assume that the data represents the y-values of the samples. In case of a DataFrame, we assume that the columns correspond to columns in the full_df attribute and are True where we have a sample.

Note that the y-values must be in image coordinates (see extent attribute).
rough_locs (DataFrame) – The rough locations of the new samples (see the rough_locs attribute)

property all_column_bounds¶: The boundaries for the data columns

property all_column_ends¶

1D numpy array with the ends for all columns (including child readers)

See also

all_column_starts: The starts for all columns
all_column_bounds: The (start, end)-tuple for all of the columns
column_ends: The ends for this specific reader

property all_column_starts¶

1D numpy array with the starts for all columns (including child readers)

See also

all_column_ends: The ends for all columns
all_column_bounds: The (start, end)-tuple for all of the columns
column_starts: The starts for this specific reader

ax = None¶: The matplotlib axes where the plot_im is plotted on

background = None¶: White rectangle that represents the background of the binary image. This is only plotted by the parent reader

binary = None¶: A 2D numpy array representing the binary version of the image

children = []¶: Child readers for specific columns. Is not empty if and only if the parent attribute is this instance

close()[source]¶

color_labels(categorize=1)[source]¶: The labels of the colored array

property column_bounds¶: The boundaries for the data columns

property column_ends¶

1D numpy array with the ends for each column of this reader

See also

column_starts: The starts for each column
column_bounds: The (start, end)-tuple for each of the columns
all_column_ends: The ends for all columns, including child readers

property column_starts¶

1D numpy array with the starts for each column of this reader

See also

column_ends: The ends for each column
column_bounds: The (start, end)-tuple for each of the columns
all_column_starts: The starts for all columns, including child readers

property columns¶: The indices of the columns that are handled by this reader

create_exaggerations_reader(factor, cls=None)[source]¶

Create a new exaggerations reader for this reader

Parameters

factor (float) – The exaggeration factor
cls (type) – The DataReader subclass

Returns

The new exaggerated reader

Return type

instance of cls

create_grouper(ds, columns, fig, x0, y0, width, height, ax0=None, transformed=True, colnames=None, **kwargs)[source]¶

Create the grouper that plots the results

Parameters

ds (xarray.Dataset) – The dataset with the data
columns (list of int) – The numbers of the columns for which the grouper should be created
fig (matplotlib.figure.Figure) – The matplotlib figure to plot on
x0 (float) – The left boundary of the larger Bbox of the stratigraphic diagram
y0 (int) – The upper boundary of the larger Bbox of the stratigraphic diagram
width (float) – The width of the final axes between 0 and 1
height (float) – The height of the final axis between 0 and 1
ax0 (matplotlib.axes.Axes) – The larger matplotlib axes whose bounding box shall be used.
transformed (bool) – If True, y-axes and x-axes have been translated (see the px2data_x() and px2data_y() methods)
colnames (list of str) – The column names to use in the plot
**kwargs – any other keyword argument that is passed to the psy_strat.stratplot.StratGroup.from_dataset() method

Returns

The grouper that visualizes the given columns in the fig

Return type

psy_strat.stratplot.StratGroup

create_variable(ds, vname, data, **kwargs)[source]¶

Insert the data into a variable in an xr.Dataset

Parameters

ds (xarray.Dataset) – The destination dataset
vname (str) – The name of the variable in the nc_meta mapping. This name might include {reader} which will then be replaced by the number of the reader in the iter_all_readers attribute
data (np.ndarray) – The numpy array to store in the variable specified by vname
**kwargs – A mapping from dimension to slicer that should be used to slice the dataset

Returns

The resolved vname that has been used in the dataset

Return type

str
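The `{reader}` placeholder in variable names such as `'bars{reader}_tolerance'` can be resolved with ordinary string formatting. The helper below is a hypothetical sketch of that substitution, assuming it behaves like Python's `str.format`; it does not touch a dataset and is not the create_variable() implementation.

```python
def resolve_vname(vname, reader_index):
    """Replace an optional '{reader}' placeholder by the reader number."""
    return vname.format(reader=reader_index)


print(resolve_vname('bars{reader}_tolerance', 1))  # bars1_tolerance
print(resolve_vname('full_data', 1))               # full_data
```

Names without a placeholder pass through unchanged, so the same call works for reader-specific and shared variables.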

digitize(use_sum=False, inplace=True)[source]¶

Digitize the binary image to create the full dataframe

Parameters

use_sum (bool) – If True, the sum of cells that are not background is used for each column; otherwise, for each row, the value of the cell with the maximal distance to the column start is used
inplace (bool) – If True (default), the full_df attribute is updated. Otherwise a DataFrame is returned

Returns

The digitization result if inplace is False, otherwise None

Return type

None or pandas.DataFrame
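The difference between the two use_sum modes can be shown on a single pixel row of one column. This is a minimal sketch, not the library code: `digitize_row` is a hypothetical helper, `row` is assumed to hold the binary cells of one column starting at the column start, and the distance of a cell is assumed to be its index plus one.

```python
def digitize_row(row, use_sum=False):
    """Digitize one pixel row of one column (illustrative only)."""
    if use_sum:
        # count every cell that is not background
        return sum(1 for cell in row if cell)
    # otherwise take the data cell that is farthest from the column start
    filled = [i for i, cell in enumerate(row) if cell]
    return filled[-1] + 1 if filled else 0


row = [1, 1, 0, 1, 0]
print(digitize_row(row, use_sum=True))  # 3
print(digitize_row(row))                # 4
```

The two modes differ only when a row has gaps: summing ignores the gap, while the maximal distance includes it.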

digitize_exaggerated(fraction=0.05, absolute=8, inplace=True, return_mask=False)[source]¶

Merge the exaggerated values into the original digitized result

Parameters

fraction (float between 0 and 1) – The fraction under which the exaggerated data should be used. Set this to 0 to ignore it.
absolute (int) – The absolute value under which the exaggerated data should be used. Set this to 0 to ignore it.
inplace (bool) – If True (default), the full_df attribute is updated. Otherwise a DataFrame is returned
return_mask (bool) – If True, a boolean 2D array is returned indicating where the exaggerations have been used

Returns

pandas.DataFrame or None – If inplace is False, the digitized result. Otherwise, if return_mask is True, the mask where the exaggerated results have been used. Otherwise None
pandas.DataFrame, optionally – If inplace is False and return_mask is True, a pandas.DataFrame containing the boolean mask where the exaggerated results have been used. Otherwise, this is skipped

disable_label_selection(*args, **kwargs)[source]¶

Disable the label selection

This will disconnect the pick_event and remove the selection images

Parameters: remove (bool) – Whether to remove the selection image from the plot. If None, the _remove attribute is used

See also

enable_label_selection(), remove_selected_labels()

draw_figure()[source]¶: Draw the matplotlib fig and the magni figure

end_column_selection()[source]¶: End the column selection and remove the artists

estimated_column_starts(threshold=None)[source]¶

The estimated column starts as numpy.ndarray.

We assume a new column starts at pixel column $i$ if

the previous pixel column $i-1$ did not contain any data ($D(i-1) = 0$)
the amount of data points doubled compared to $i-1$ ($D(i) \geq 2\cdot D(i-1)$)
the amount of data points steadily increases within the next few columns to a value twice as large as the previous column ($D(i+n) \geq 2\cdot D(i-1)$ with $n>0$ and $D(i+j) \geq D(i)$ for all $0 < j \leq n$)

Each potential column start must also be covered by a given threshold.

Parameters: threshold (float between 0 and 1) – The fraction that has to be covered to assume a valid column start. By default, 0.1 (i.e. 10 percent)
Returns: The starts for each column
Return type: np.ndarray
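The three conditions above can be sketched in plain Python. This is an illustration under several assumptions, not the library's implementation: `counts[i]` stands for $D(i)$, the conditions are treated as alternatives, `max_lookahead` is a hypothetical stand-in for "the next few columns", the third condition is only tried when column $i$ already holds more data than column $i-1$, and the threshold check is left out.

```python
def estimate_column_starts(counts, max_lookahead=3):
    """Estimate column starts from data counts per pixel column."""
    starts = []
    for i, d in enumerate(counts):
        if d == 0:
            continue  # an empty pixel column cannot start a data column
        prev = counts[i - 1] if i > 0 else 0
        if prev == 0 or d >= 2 * prev:
            # condition 1 (previous column empty) or 2 (count doubled)
            starts.append(i)
        elif d > prev:
            # condition 3: counts stay at or above D(i) and reach
            # twice D(i-1) within the next few columns
            for n in range(1, max_lookahead + 1):
                if i + n >= len(counts) or counts[i + n] < d:
                    break
                if counts[i + n] >= 2 * prev:
                    starts.append(i)
                    break
    return starts


print(estimate_column_starts([0, 0, 4, 4, 4, 9, 9, 0, 3, 3]))  # [2, 5, 8]
print(estimate_column_starts([0, 4, 4, 5, 6, 8, 8]))           # [1, 3]
```

In the second call, pixel column 3 qualifies through the third condition: the count rises from 4 to 5 and reaches 8 two columns later without dropping below 5.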

property exaggerated_reader¶: The reader that represents the exaggerations

property extent¶: The extent of the plot_im

property fig¶: The matplotlib figure of the ax

find_potential_samples(col, min_len=None, max_len=None, filter_func=None)[source]¶

Find potential samples in an array

This method finds extrema in an array and returns the indices where each extremum might be. The algorithm filters out obstacles by first going over the array and making sure that there is a change of sign in the slope at the found extremum; if there is none, the extremum is ignored and flattened out.

Parameters

col (int) – The column for which to find the extrema
min_len (int) – The minimum length of one extremum. If the width of the interval where we found an extremum is smaller than that, the extremum is ignored. If None, this parameter does not have an effect (i.e. min_len=1).
max_len (int) – The maximum length of one extremum. If the width of the interval where we found an extremum is greater than that, the extremum is ignored. If None, this parameter does not have an effect.
filter_func (function) – A function to filter the extrema. It must accept one argument which is a list of integers representing the indices of the extremum in a

Returns

list of list of int of shape (N, 2) – The list of N extremum locations. Each tuple in this list represents an interval a where one extremum might be located
list of list of int – The excluded extremum locations that are ignored because we could not find a change of sign in the slope.

See also

find_samples()
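The idea of locating extrema through a change of sign in the slope, combined with the min_len and max_len filters, can be sketched as follows. This is a simplified illustration rather than the method's actual algorithm: `find_extrema_intervals` is a hypothetical helper, it works on a plain list of digitized values, returns half-open `[start, end)` intervals, and neither flattens obstacles nor reports excluded locations.

```python
def find_extrema_intervals(values, min_len=None, max_len=None):
    """Find plateaus whose neighbouring slopes differ in sign."""
    intervals = []
    n = len(values)
    i = 1
    while i < n - 1:
        # extend j to the last index of the plateau that starts at i
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        if j + 1 < n:
            slope_before = values[i] - values[i - 1]
            slope_after = values[j + 1] - values[j]
            width = j + 1 - i
            too_short = min_len is not None and width < min_len
            too_long = max_len is not None and width > max_len
            if slope_before * slope_after < 0 and not (too_short or too_long):
                intervals.append([i, j + 1])
        i = j + 1
    return intervals


print(find_extrema_intervals([0, 1, 3, 3, 2, 1, 1, 2]))  # [[2, 4], [5, 7]]
print(find_extrema_intervals([0, 2, 1, 1, 1, 3], min_len=2))  # [[2, 5]]
```

The first call finds a maximum on rows 2 to 3 and a minimum on rows 5 to 6. In the second call the one-row maximum at row 1 is dropped because it is narrower than min_len.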

find_samples(min_fract=None, pixel_tol=5, *args, **kwargs)[source]¶

Find the samples in the diagram

This function finds the samples using the find_potential_samples() function. It combines the found extrema from all columns and estimates the exact location using an interpolation of the slope

Parameters

min_fract (float) – The minimum fraction between 0 and 1 that two bars have to overlap such that they are considered as representing the same sample. If None, the min_fract attribute is used
min_len (int) – The minimum length of one extremum. If the width of the interval where we found an extremum is smaller than that, the extremum is ignored. If None, this parameter does not have an effect (i.e. min_len=1).
max_len (int) – The maximum length of one extremum. If the width of the interval where we found an extremum is greater than that, the extremum is ignored. If None, this parameter does not have an effect.
filter_func (function) – A function to filter the extrema. It must accept one argument which is a list of integers representing the indices of the extremum in a

Returns

pandas.DataFrame – The x- and y-locations of the samples. The index is the y-location, the columns are the columns in the full_df.
pandas.DataFrame – The rough locations of the samples. The index is the y-location of the columns, the values are lists of the potential sample locations.

found_extrema_per_row()[source]¶

Calculate how many columns have a potential sample in each pixel row

Returns: A series with one entry per pixel row. The values are the number of columns in the diagram that have a potential sample noted in the rough_locs
Return type: pandas.Series
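The counting can be illustrated without pandas. In this sketch, `rough_locs` is assumed to be a plain dict that maps each column number to a list of half-open `(start, end)` pixel-row intervals, all lying within `n_rows`; both this layout and the function name `count_columns_per_row` are hypothetical simplifications of the rough_locs attribute.

```python
def count_columns_per_row(rough_locs, n_rows):
    """Count, for every pixel row, the columns with a potential sample."""
    counts = [0] * n_rows
    for intervals in rough_locs.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        # each column contributes at most once per pixel row
        for row in covered:
            counts[row] += 1
    return counts


rough_locs = {0: [(1, 3)], 1: [(2, 4)]}
print(count_columns_per_row(rough_locs, 5))  # [0, 1, 2, 1, 0]
```

Rows where many columns agree, such as row 2 here, are the most likely sample locations.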

classmethod from_dataset(ds, *args, **kwargs)[source]¶

Create a new DataReader from a xarray.Dataset

Parameters

ds (xarray.Dataset) – The dataset that has been stored with the to_dataset() method
*args,**kwargs – Any other arguments passed to the DataReader constructor

Returns

The reader recreated from ds

Return type

DataReader

property full_df¶: The full pandas.DataFrame of the digitized image

get_bbox_for_cols(columns, x0, y0, width, height)[source]¶

Get the boundary boxes for the columns of this reader in the results plot

This method is used by the plot_results() method to get the Bbox for a psy_strat.stratplot.StratGroup grouper

Parameters

columns (list of int) – The column numbers to use
x0 (float) – The left boundary of the larger Bbox of the stratigraphic diagram
y0 (int) – The upper boundary of the larger Bbox of the stratigraphic diagram
width (float) – The width of the final axes between 0 and 1
height (float) – The height of the final axis between 0 and 1

Returns

The boundary box for the given columns in the matplotlib figure

Return type

matplotlib.transforms.Bbox

See also

plot_results()

get_binary_for_col(col)[source]¶: Get the binary array for a specific column

get_cross_column_features(min_px=50)[source]¶

Get features that are contained in two or more columns

Parameters: min_px (int) – The number of pixels that have to be contained in each column
Returns: The 2D boolean mask with the same shape as the binary array that is True if a data pixel is considered to belong to a cross column feature
Return type: np.ndarray of dtype bool

get_disconnected_parts(fromlast=5, from0=10, cross_column=False)[source]¶

Identify parts in the binary data that are not connected

Parameters

fromlast (int) – A pixel x1 > x0 is considered as disconnected if x1 - x0 >= fromlast. If this is 0, it is ignored and only from0 is considered.
from0 (int) – A pixel is considered as disconnected if it is more than from0 pixels away from the column start. If this is 0, it is ignored and only fromlast is considered
cross_column (bool) – If False, disconnected features are only marked in the column where the disconnection has been detected. Otherwise the entire feature is marked

Returns

The 2D boolean mask with the same shape as the binary array that is True where a data pixel is considered to be disconnected

Return type

np.ndarray of dtype bool

get_labeled_array()[source]¶: Create a connectivity-based labeled array of the binary data

get_occurences()[source]¶: Extract the positions of the occurrences from the selection

get_parts_at_column_ends(npixels=2)[source]¶

Identify parts in the binary data that touch the next column

Parameters: npixels (int) – If a data pixel is less than npixels away from the column end, it is considered to be at the column end and marked
Returns: A boolean mask with the same shape as the binary data that is True where a pixel is considered to be at the column end
Return type: np.ndarray of dtype bool
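
The following is a minimal sketch of the idea behind this mask, not the library's implementation. The function name mark_column_ends and its column_starts/column_ends arguments are hypothetical, the column ends are assumed to be exclusive pixel indices, and the distance is assumed to be measured to the last pixel of each column.

```python
import numpy as np


def mark_column_ends(binary, column_starts, column_ends, npixels=2):
    """Return a mask of data pixels within ``npixels`` of a column end.

    Assumption of this sketch: ``column_ends`` are exclusive indices.
    """
    mask = np.zeros(binary.shape, dtype=bool)
    for start, end in zip(column_starts, column_ends):
        # the last `npixels` pixel columns of this diagram column
        first = max(start, end - npixels)
        mask[:, first:end] = binary[:, first:end] > 0
    return mask


binary = np.zeros((3, 10), dtype=int)
binary[0, 0:5] = 1   # reaches the end of the first column (pixels 0..4)
binary[1, 0:2] = 1   # stays far away from the column end
binary[2, 5:8] = 1   # second column (pixels 5..9), does not reach its end
mask = mark_column_ends(binary, [0, 5], [5, 10], npixels=2)
```

Only the two right-most data pixels of the first row are marked here, because they are the only data pixels that lie in the last two pixel columns of a diagram column.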

get_reader_for_col(col)[source]¶

Get the reader for a specific column

Parameters: col (int) – The column of interest
Returns: Either the reader or None if no reader could be found
Return type: DataReader or None

get_surrounding_slopes(indices, arr)[source]¶

hline_locs = None¶: list of floats. The indexes of horizontal lines

image = None¶: PIL.Image.Image of the diagram part with mode RGBA

image_array()[source]¶: The RGBA values of the colored image

is_exaggerated = 0¶: Exaggeration factor that is not 0 if this reader represents exaggeration plots

is_obstacle(indices, arr)[source]¶: Check whether the found extremum is only an obstacle of the picture

property iter_all_readers¶: Iterate through the parent reader and its children

label_arrs = ['binary', 'labels', 'image_array']¶

labels = None¶: A connectivity-based labeled version of the binary data

magni = None¶: the straditize.magnifier.Magnifier for the ax

magni_background = None¶: White rectangle that represents the background of the binary image in the magnifier. This is only plotted by the parent reader

magni_color_plot_im = None¶

magni_plot_im = None¶: magnified plot_im

mark_as_exaggerations(mask)[source]¶

Mask the given array as exaggerated

Parameters: mask (2D np.ndarray of dtype bool) – A mask with the same shape as the binary array that is True if a cell should be interpreted as the visualization of an exaggeration

merge_close_samples(locs, rough_locs=None, pixel_tol=5)[source]¶

merged_binaries()[source]¶

Get the binary data from all children and merge them into one array

Returns: The binary image with the same shape as the binary data
Return type: np.ndarray of dtype int

merged_labels()[source]¶

Get the labeled binary data from all children merged into one array

Returns: The labeled binary image with the same shape as the label data
Return type: np.ndarray of dtype int

min_fract = 0.9¶: The minimum fraction of overlap for two bars to be considered as the same sample (see unique_bars())

nc_meta = {'binary': {'dims': ('reader', 'ydata', 'xdata'), 'long_name': 'Binary images for data readers'}, 'col_map': {'dims': 'column', 'long_name': 'Mapping from column to reader', 'units': 'reader_index'}, 'column_ends': {'dims': 'column', 'long_name': 'Ends of the columns', 'units': 'px'}, 'column_starts': {'dims': 'column', 'long_name': 'Start of the columns', 'units': 'px'}, 'exag_col_map': {'dims': 'column', 'long_name': 'Mapping from column to exaggerated reader', 'units': 'reader_index'}, 'full_data': {'dims': ('ydata', 'column'), 'long_name': 'Full digitized data', 'units': 'px'}, 'hline': {'long_name': 'Horizontal line location', 'units': 'px'}, 'is_exaggerated': {'dims': 'reader', 'long_name': 'Exaggeration factor'}, 'occurences': {'comments': 'The locations where the only an occurence of a taxa is highlighted without value', 'dims': ('occurence', 'xy'), 'long_name': 'taxa occurences'}, 'reader': {'dims': 'reader', 'long_name': 'index of the reader'}, 'reader_cls': {'dims': 'reader', 'long_name': 'The name of the class constructor'}, 'reader_image': {'dims': ('reader', 'ydata', 'xdata', 'rgba'), 'long_name': 'RGBA images for data readers', 'units': 'color'}, 'reader_mod': {'dims': 'reader', 'long_name': 'The module of the reader class'}, 'rough_locs': {'dims': ('sample', 'column', 'limit'), 'long_name': 'Rough locations for samples'}, 'sample': {'long_name': 'Sample location', 'units': 'px'}, 'samples': {'dims': ('sample', 'column'), 'long_name': 'Sample data', 'units': 'px'}, 'shifted': {'dims': 'column', 'long_name': 'Vertical shift per column', 'units': 'px'}, 'vline': {'long_name': 'Vertical line location', 'units': 'px'}, 'xaxis_translation': {'dims': ('reader', 'px_data', 'limit'), 'long_name': 'Pixel to data mapping for x-axis'}}¶: A mapping from variable name to meta information

new_child_for_cols(columns, cls, plot=True)[source]¶

Create a new child reader for specific columns

Parameters

columns (list of int) – The columns for the new reader
cls (type) – The DataReader subclass
plot (bool) – Plot the binary image

Returns

The new reader for the specified columns

Return type

instance of cls

property non_exaggerated_reader¶: The reader that represents the non-exaggerated counterpart of the exaggerations

property num_labels¶: The maximum label in the labels array

property occurences¶

A set of tuples marking the position of an occurrence

An occurrence, motivated by pollen diagrams, just highlights the existence at a certain point without giving the exact value. In pollen diagrams, these are usually taxa that were found but have a percentage of less than 0.5 %.

This set of tuples (x, y) contains the coordinates of the occurrences. The first value in each tuple is the y-value, the second the x-value.

See also

occurences_dict: A mapping from column number to occurrences

property occurences_dict¶: A mapping from column number to a numpy array with the indices of an occurrence

occurences_value = -9999¶: The value that is given to the occurrences in the measurements
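
Because occurrences are stored with a sentinel value rather than a measurement, downstream code usually has to separate them from real values. The sketch below illustrates one way to do that with plain NumPy. The helper split_occurrences is hypothetical and not part of straditize, and only the sentinel -9999 is taken from the documented default.

```python
import numpy as np

# mirrors the documented default of the occurences_value attribute
OCCURENCES_VALUE = -9999


def split_occurrences(values, marker=OCCURENCES_VALUE):
    """Replace the occurrence marker by NaN and report where it was found."""
    values = np.asarray(values, dtype=float)
    is_occurrence = values == marker
    cleaned = np.where(is_occurrence, np.nan, values)
    return cleaned, is_occurrence


cleaned, flags = split_occurrences([12.5, -9999, 0.0, 3.0])
```

Keeping the boolean flags next to the cleaned values preserves the information that a taxon was present, even though no percentage is available for it.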

parent = None¶: Parent reader for this instance. Might be the instance itself

plot_background(ax=None, **kwargs)[source]¶

Plot a white layer below the plot_im

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on. If not given, the ax attribute is used
**kwargs – Any other keyword that is given to the matplotlib.pyplot.imshow() function

plot_color_image(ax=None, **kwargs)[source]¶

Plot the colored image on a matplotlib axes

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on. If not given, the ax attribute is used
**kwargs – Any other keyword that is given to the matplotlib.pyplot.imshow() function

plot_full_df(ax=None, *args, **kwargs)[source]¶

Plot the lines for the digitized diagram

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
*args,**kwargs – Any other argument and keyword argument that is passed to the matplotlib.pyplot.plot() function

plot_im = None¶: the matplotlib image artist

plot_image(ax=None, **kwargs)[source]¶

Plot the binary data image on a matplotlib axes

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on. If not given, the ax attribute is used and (if this is None, too) a new figure is created
**kwargs – Any other keyword that is given to the matplotlib.pyplot.imshow() function

plot_other_potential_samples(tol=1, already_found=None, *args, **kwargs)[source]¶

Plot potential samples that are not yet in the samples attribute

Parameters

tol (int) – The pixel tolerance for a sample. If the distance between a potential sample and all already existing samples is greater than tol, the potential sample will be plotted
already_found (np.ndarray) – The pixel rows of samples that have already been found. If not specified, the index of the sample_locs is used
excluded (bool) – If True, plot the excluded samples instead of the included samples (see the return values in find_potential_samples())
ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
plot_kws (dict) – Any other keyword argument that is passed to the matplotlib.pyplot.plot() function. By default, this is equal to {'marker': '+'}
min_len (int) – The minimum length of one extremum. If the width of the interval where we found an extremum is smaller than that, the extremum is ignored. If None, this parameter does not have an effect (i.e. min_len=1).
max_len (int) – The maximum length of one extremum. If the width of the interval where we found an extremum is greater than that, the extremum is ignored. If None, this parameter does not have an effect.
filter_func (function) – A function to filter the extrema. It must accept one argument which is a list of integers representing the indices of the extremum in a

plot_potential_samples(excluded=False, ax=None, plot_kws={}, *args, **kwargs)[source]¶

Plot the ranges for potential samples

This method plots the rough locations of potential samples (see find_potential_samples())

Parameters

excluded (bool) – If True, plot the excluded samples instead of the included samples (see the return values in find_potential_samples())
ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
plot_kws (dict) – Any other keyword argument that is passed to the matplotlib.pyplot.plot() function. By default, this is equal to {'marker': '+'}
min_len (int) – The minimum length of one extremum. If the width of the interval where we found an extremum is smaller than that, the extremum is ignored. If None, this parameter does not have an effect (i.e. min_len=1).
max_len (int) – The maximum length of one extremum. If the width of the interval where we found an extremum is greater than that, the extremum is ignored. If None, this parameter does not have an effect.
filter_func (function) – A function to filter the extrema. It must accept one argument which is a list of integers representing the indices of the extremum in a

plot_results(df, ax=None, fig=None, transformed=True)[source]¶

Plot the reconstructed diagram

This method plots the reconstructed diagram using the psy-strat module.

Parameters

df (pandas.DataFrame) – The data to plot. E.g. the sample_locs or the straditize.straditizer.Straditizer.final_df data
ax (matplotlib.axes.Axes) – The axes to plot on. If None, a new one is created inside the given fig
fig (matplotlib.figure.Figure) – The matplotlib figure to plot on. If not given, the current figure (see matplotlib.pyplot.gcf()) is used
transformed (bool) – If True, y-axes and x-axes have been translated (see the px2data_x() and px2data_y() methods)

Returns

psyplot.project.Project – The newly created psyplot project with the plotters
list of psy_strat.stratplot.StratGroup instances – The groupers for the different columns

plot_sample_hlines(ax=None, **kwargs)[source]¶

Plot one horizontal line per sample in the sample_locs

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
**kwargs – Any other keyword argument that is passed to the matplotlib.pyplot.hlines() function

plot_samples(ax=None, *args, **kwargs)[source]¶

Plot the diagram as lines reconstructed from the samples

Parameters

ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
*args,**kwargs – Any other argument and keyword argument that is passed to the matplotlib.pyplot.plot() function

px2data_x(coord)[source]¶

Transform the pixel coordinates into data coordinates

Parameters: coord (1D np.ndarray) – The coordinate values in pixels
Returns: The numpy array starting from 0 with transformed coordinates
Return type: np.ndarray

Notes

Since the x-axes for stratigraphic plots are usually interrupted, the return values here are relative and therefore always start from 0

recognize_hlines(fraction=0.3, min_lw=1, max_lw=None, remove=False, **kwargs)[source]¶

Recognize horizontal lines in the plot and subtract them

This method removes horizontal lines in the data diagram, i.e. rows whose non-background cells cover at least the specified fraction of the row.

Parameters

fraction (float) – The fraction (between 0 and 1) that has to be covered to recognize a horizontal line
min_lw (int) – The minimum line width for a line
max_lw (int) – The maximum line width for a line or None if it should be ignored
remove (bool) – If True, they will be removed immediately, otherwise they are displayed using the enable_label_selection() method and can be removed through the remove_selected_labels() method

Other Parameters

``**kwargs`` – Additional keywords are passed to the enable_label_selection() method in case remove is False

Notes

This method has to be called before the digitize() method!
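
The row-coverage criterion described above can be sketched with plain NumPy as follows. This is an illustration under stated assumptions rather than the library's code: the function find_hline_rows is hypothetical, and the line width is assumed to be the number of consecutive rows that satisfy the fraction criterion.

```python
import numpy as np


def find_hline_rows(binary, fraction=0.3, min_lw=1, max_lw=None):
    """Return the indices of rows that look like horizontal lines."""
    # fraction of non-background cells per pixel row
    covered = (binary > 0).mean(axis=1) >= fraction
    rows = np.flatnonzero(covered)
    if rows.size == 0:
        return rows
    # split into runs of consecutive rows; the run length is the line width
    runs = np.split(rows, np.flatnonzero(np.diff(rows) > 1) + 1)
    keep = [
        run for run in runs
        if len(run) >= min_lw and (max_lw is None or len(run) <= max_lw)
    ]
    return np.concatenate(keep) if keep else np.array([], dtype=int)


binary = np.zeros((6, 10), dtype=int)
binary[1, :] = 1       # a one-pixel wide line across the full row
binary[3:5, :4] = 1    # a two-pixel wide feature covering 40 % of each row
binary[5, :2] = 1      # covers only 20 % of the row and is ignored
thin = find_hline_rows(binary, fraction=0.3, max_lw=1)
both = find_hline_rows(binary, fraction=0.3)
```

With max_lw=1 only the single-row line is reported, whereas leaving max_lw at None also reports the two-row feature.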

recognize_vlines(fraction=0.3, min_lw=1, max_lw=None, remove=False, **kwargs)[source]¶

Recognize vertical lines in the plot and subtract them

This method removes vertical lines in the data diagram, i.e. columns whose non-background cells cover at least the specified fraction of the column.

Parameters

fraction (float) – The fraction (between 0 and 1) that has to be covered to recognize a vertical line
min_lw (int) – The minimum line width for a line
max_lw (int) – The maximum line width for a line or None if it should be ignored
remove (bool) – If True, they will be removed immediately, otherwise they are displayed using the enable_label_selection() method and can be removed through the remove_selected_labels() method

Other Parameters

``**kwargs`` – Additional keywords are passed to the enable_label_selection() method in case remove is False

Notes

This method should be called before the column starts are set

recognize_xaxes(fraction=0.3, min_lw=1, max_lw=None, remove=False, **kwargs)[source]¶

Recognize (and potentially remove) x-axes at bottom and top

Parameters

fraction (float) – The fraction (between 0 and 1) that has to be covered to recognize an x-axis
min_lw (int) – The minimum line width of an axis
max_lw (int) – The maximum line width of an axis. If not specified, it will be ignored
remove (bool) – If True, they will be removed immediately, otherwise they are displayed using the enable_label_selection() method and can be removed through the remove_selected_labels() method

recognize_yaxes(fraction=0.3, min_lw=0, max_lw=None, remove=False)[source]¶

Find (and potentially remove) y-axes in the image

Parameters

fraction (float) – The fraction (between 0 and 1) that has to be covered to recognize a y-axis
min_lw (int) – The minimum line width of an axis
max_lw (int) – The maximum line width of an axis. If not specified, the median of the axes widths is taken
remove (bool) – If True, they will be removed immediately, otherwise they are displayed using the enable_label_selection() method and can be removed through the remove_selected_labels() method

remove_in_children(arr, amask)[source]¶

Update the child reader images after having removed binary data

Calls the update_image() and update_rgba_image() methods for all children

remove_plots()[source]¶: Remove all plotted artists by this reader

reset_column_starts()[source]¶: Reset the column starts, full_df, shifted and occurences

reset_image(image, binary=False)[source]¶

Reset the image for this straditizer

Parameters

image (PIL.Image.Image) – The new image
binary (bool) – If True, then the image is considered as the binary image and the image attribute is not touched

reset_labels()[source]¶: Reset the labels array

reset_samples()[source]¶: Reset the samples

resize_axes(grouper, bounds)[source]¶

Resize the axes based on column boundaries

This method sets the x-limits for the different columns to the given bounds and resizes the axes

Parameters

grouper (psy_strat.stratplot.StratGroup) – The grouper that manages the plot
bounds (np.ndarray of shape (N, 2)) – The boundaries for the columns handled by the grouper

property rough_locs¶

The pandas.DataFrame with rough locations for the samples. It has one row per sample in the sample_locs dataframe and ncols * 2 columns, where ncols is the number of columns in the sample_locs.

If the potential sample sample_locs.iloc[i, col] ranges from pixel row j to k (see the find_potential_samples() method), the cell at rough_locs.iloc[i, col * 2] holds the first y-pixel (j) and the cell at rough_locs.iloc[i, col * 2 + 1] holds the last y-pixel plus one (k) of the range in which this sample might be located
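
The column layout described above can be illustrated with a small NumPy array standing in for the DataFrame. The numbers and the helper rough_range are made up for this sketch, and the second entry of each pair is treated as an exclusive end, which is how the "(+1)" in the description is read here.

```python
import numpy as np

ncols = 2
# one row per sample; per column a pair (first row, last row + 1)
rough_locs = np.array([
    [10, 14, 11, 13],
    [40, 45, 41, 44],
])


def rough_range(rough_locs, i, col):
    """Return the pixel rows in which sample ``i`` of ``col`` may lie."""
    first = rough_locs[i, col * 2]
    last_plus_one = rough_locs[i, col * 2 + 1]
    return range(first, last_plus_one)


rows = rough_range(rough_locs, 1, 1)
```

For the second sample in the second column this yields the pixel rows 41, 42 and 43.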

property sample_locs¶: The pandas.DataFrame with locations and values of the samples

samples_at_boundaries = True¶: A boolean flag indicating whether the first and last rows are assumed to be samples if they contain non-zero values

set_as_parent()[source]¶: Set this instance as the parent reader

set_hline_locs_from_selection(selection=None)[source]¶

Save the locations of horizontal lines

This method stores every pixel row in which at least 30% is selected in the hline_locs attribute. The digitize method will interpolate at these indices.
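
As a rough illustration of the two steps mentioned here, the sketch below first picks the rows in which at least 30% of the pixels are selected and then replaces the values at these rows by interpolated ones. Both helper functions are hypothetical, and the use of linear interpolation between the neighbouring rows is an assumption of this sketch, not a statement about what digitize() does internally.

```python
import numpy as np


def hline_rows_from_selection(selection, fraction=0.3):
    """Rows in which at least ``fraction`` of the pixels are selected."""
    return np.flatnonzero(selection.mean(axis=1) >= fraction)


def interpolate_rows(values, rows):
    """Linearly interpolate ``values`` at ``rows`` from the other rows."""
    values = np.asarray(values, dtype=float)
    good = np.setdiff1d(np.arange(len(values)), rows)
    out = values.copy()
    out[rows] = np.interp(rows, good, values[good])
    return out


selection = np.zeros((5, 10), dtype=bool)
selection[2, :4] = True            # 40 % of the third row is selected
hlines = hline_rows_from_selection(selection)
column = [1.0, 2.0, 9.0, 4.0, 5.0]  # the value in row 2 is distorted
fixed = interpolate_rows(column, hlines)
```

The distorted value in the third row is replaced by the mean of its two neighbours, while all other rows are left untouched.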

set_vline_locs_from_selection(selection=None)[source]¶

Save the locations of vertical lines

This method stores every pixel column in which at least 30% is selected in the vline_locs attribute.

shift_vertical(pixels, draw=True)[source]¶

Shift the columns vertically.

Parameters

pixels (list of floats) – The y-value for each column for which to shift the values. Note that these values have to be greater than or equal to 0
draw (bool) – If True, the ax is drawn at the end

shifted = None¶: The number of pixels the columns have been shifted

show_cross_column_features(min_px=50, remove=False, **kwargs)[source]¶

Highlight and maybe remove cross column features

Parameters

min_px (int) – The number of pixels that have to be contained in each column
remove (bool) – If True, remove the data in the binary array, etc. If False, the enable_label_selection() method is invoked and the user can select the features to remove
select_all (bool) – If True and remove is False, all labels in arr will be selected and the given selection is ignored
selection (np.ndarray of dtype bool) – A boolean mask with the same shape as arr that is True where a pixel should be selected. If remove is True, only this mask will be used.
img (matplotlib image) – The image for the selection. If not provided, a new image is created
set_picker (bool) – If True, connect the matplotlib pick_event to the pick_label() method

show_disconnected_parts(fromlast=5, from0=10, remove=False, **kwargs)[source]¶

Highlight or remove disconnected parts

Parameters

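fromlast (int) – A pixel x1 > x0 is considered disconnected if x1 - x0 >= fromlast. If this is 0, it is ignored and only from0 is considered.
from0 (int) – A pixel is considered disconnected if it is more than from0 pixels away from the column start. If this is 0, it is ignored and only fromlast is considered.
remove (bool) – If True, remove the data in the binary array, etc. If False, the enable_label_selection() method is invoked and the user can select the features to remove
select_all (bool) – If True and remove is False, all labels in arr will be selected and the given selection is ignored
selection (np.ndarray of dtype bool) – A boolean mask with the same shape as arr that is True where a pixel should be selected. If remove is True, only this mask will be used.
img (matplotlib image) – The image for the selection. If not provided, a new image is created
set_picker (bool) – If True, connect the matplotlib pick_event to the pick_label() method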

show_parts_at_column_ends(npixels=2, remove=False, **kwargs)[source]¶

Highlight or remove features that touch the column ends

Parameters

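npixels (int) – If a data pixel is less than npixels away from the column end, it is considered to be at the column end and marked
remove (bool) – If True, remove the data in the binary array, etc. If False, the enable_label_selection() method is invoked and the user can select the features to remove
select_all (bool) – If True and remove is False, all labels in arr will be selected and the given selection is ignored
selection (np.ndarray of dtype bool) – A boolean mask with the same shape as arr that is True where a pixel should be selected. If remove is True, only this mask will be used.
img (matplotlib image) – The image for the selection. If not provided, a new image is created
set_picker (bool) – If True, connect the matplotlib pick_event to the pick_label() method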

show_small_parts(n=10, remove=False, **kwargs)[source]¶

Highlight and potentially remove small features in the image

Parameters

n (int) – The maximal size of a feature to be considered as small
remove (bool) – If True, remove the data in the binary array, etc. If False, the enable_label_selection() method is invoked and the user can select the features to remove
select_all (bool) – If True and remove is False, all labels in arr will be selected and the given selection is ignored
selection (np.ndarray of dtype bool) – A boolean mask with the same shape as arr that is True where a pixel should be selected. If remove is True, only this mask will be used.
img (matplotlib image) – The image for the selection. If not provided, a new image is created
set_picker (bool) – If True, connect the matplotlib pick_event to the pick_label() method

See also

skimage.morphology.remove_small_objects()
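
To make the notion of a small feature concrete, the following sketch labels connected features with a simple breadth-first search and masks those with at most n pixels. It is a stand-alone illustration, not the method used by the library: the function small_feature_mask is hypothetical and 4-connectivity is an assumption of this sketch.

```python
from collections import deque

import numpy as np


def small_feature_mask(binary, n=10):
    """Mask 4-connected features that consist of at most ``n`` pixels."""
    data = np.asarray(binary) > 0
    seen = np.zeros(data.shape, dtype=bool)
    small = np.zeros(data.shape, dtype=bool)
    nrows, ncols = data.shape
    for r0, c0 in zip(*np.nonzero(data)):
        if seen[r0, c0]:
            continue
        # the breadth-first search collects one connected feature
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < nrows and 0 <= cc < ncols
                        and data[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(pixels) <= n:
            rows, cols = zip(*pixels)
            small[list(rows), list(cols)] = True
    return small


binary = np.zeros((5, 8), dtype=int)
binary[0:4, 0:3] = 1   # 12 connected pixels, larger than n
binary[4, 6:8] = 1     # 2 connected pixels, a small feature
mask = small_feature_mask(binary, n=10)
```

The resulting mask could then be inspected or used to set the corresponding pixels to 0, which corresponds to the choice between highlighting and removing that the remove parameter describes.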

start_column_selection(use_all=False)[source]¶

Enable the user to select columns

Parameters: use_all (bool) – If True, all columns can be selected. Otherwise only the columns in the columns attribute can be selected

strat_plot_identifier = 'percentages'¶

static to_binary_pil(image, threshold=690)[source]¶

Convert an image to a binary

Parameters

image (PIL.Image.Image) – The RGBA image file
threshold (float) – If the multiplied RGB values in a cell are above the threshold, the cell is regarded as background and will be set to 0

Returns

The binary image of integer type

Return type

np.ndarray of ndim 2

to_dataset(ds=None)[source]¶

All the necessary data as an xarray.Dataset

Parameters: ds (xarray.Dataset) – The dataset in which to insert the data. If None, a new one will be created
Returns: Either the given ds or a new xarray.Dataset instance
Return type: xarray.Dataset

static to_grey_pil(image, threshold=690)[source]¶

Convert an image to a greyscale image

Parameters

image (PIL.Image.Image) – The RGBA image file
threshold (float) – If the multiplied RGB values in a cell are above the threshold, the cell is regarded as background and will be set to 0

Returns

The greyscale image of integer type

Return type

np.ndarray of ndim 2

unique_bars(min_fract=None, asdict=True, *args, **kwargs)[source]¶

Estimate the unique bars

This method puts the overlapping bars of the different columns together

Parameters

min_fract (float) – The minimum fraction between 0 and 1 that two bars have to overlap such that they are considered as representing the same sample. If None, the min_fract attribute is used
asdict (bool) – If True, dictionaries are returned

Returns

A list of the bar locations. If asdict is True (default), each item in the returned list is a dictionary whose keys are the column indices and whose values are the indices for the corresponding column. Otherwise, a list of _Bar objects is returned

Return type

list
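
The overlap criterion can be sketched as follows. The helpers overlap_fraction and same_sample are hypothetical, bars are represented as (start, stop) pixel-row ranges with an exclusive stop, and measuring the overlap relative to the shorter of the two bars is an assumption of this sketch, since the description above does not say which length serves as the reference.

```python
def overlap_fraction(bar_a, bar_b):
    """Fraction of the shorter bar that is covered by the other one.

    Both bars must be non-empty ``(start, stop)`` ranges.
    """
    overlap = min(bar_a[1], bar_b[1]) - max(bar_a[0], bar_b[0])
    shortest = min(bar_a[1] - bar_a[0], bar_b[1] - bar_b[0])
    return max(overlap, 0) / shortest


def same_sample(bar_a, bar_b, min_fract=0.9):
    """True if the two bars overlap by at least ``min_fract``."""
    return overlap_fraction(bar_a, bar_b) >= min_fract


close = same_sample((10, 20), (11, 20))   # 9 of 9 rows overlap
apart = same_sample((10, 20), (16, 26))   # 4 of 10 rows overlap
```

Two bars in different columns that start one pixel row apart are therefore treated as the same sample, while bars that share less than half of their rows are not.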

update_image(arr, amask)[source]¶

Update the image after having removed binary data

This method is in the remove_callbacks mapping and is called after a pixel has been removed from the binary data. It mainly just calls the reset_labels() method and updates the plot

update_rgba_image(arr, mask)[source]¶

Update the RGBA image from the given 3D-array

This method is in the remove_callbacks mapping and is called after a pixel has been removed from the binary data. It updates the image attribute

Parameters

arr (3D np.ndarray of dtype float) – The image array
mask (boolean mask of the same shape as arr) – The mask of features that shall be set to 0 in arr

vline_locs = None¶: list of floats. The indexes of vertical lines

xaxis_data = None¶

property xaxis_px¶: The x indices in column pixel coordinates that are used for x-axes translations

class straditize.binary.LineDataReader(image, ax=None, extent=None, plot=True, children=[], parent=None, magni=None, plot_background=False, binary=None)[source]¶

Bases: straditize.binary.DataReader

A data reader for digitizing line diagrams

This class does not behave significantly differently from the base DataReader class, but might be improved with more specific features in the future

Parameters

image (PIL.Image.Image) – The image of the diagram
ax (matplotlib.axes.Axes) – The matplotlib axes to plot on
extent (list) – List of four numbers specifying the extent of the image in its source. This extent will be used for the call of matplotlib.pyplot.imshow()
children (list of DataReader) – Child readers for other columns in case the newly created instance is the parent reader
parent (DataReader) – The parent reader.
magni (straditize.magnifier.Magnifier) – The magnifier for the given ax
plot_background (bool) – If True (and plot is True), a white, opaque area is plotted below the plot_im
binary (None) – The binary version of the given image. If not provided, the to_binary_pil() method is used with the given image

Attributes

strat_plot_identifier


strat_plot_identifier = 'default'¶

class straditize.binary.RoundedBarDataReader(*args, **kwargs)[source]¶

Bases: straditize.binary.BarDataReader

A bar data reader that can be used for rounded bars

Parameters: tolerance (int) – If x0 is the value in a pixel row y and x1 the value in the next pixel row y+1, then the two pixel rows are considered as belonging to different bars if abs(x1 - x0) > tolerance (see the get_bars() method and the tolerance attribute)

Attributes

tolerance


tolerance = 10¶
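
The tolerance rule can be illustrated on a one-dimensional array with one value per pixel row. The function split_into_bars below is a hypothetical sketch and not the get_bars() method. It only applies the documented criterion abs(x1 - x0) > tolerance to consecutive rows.

```python
import numpy as np


def split_into_bars(values, tolerance=10):
    """Return ``(start, stop)`` row ranges, one per bar (stop exclusive)."""
    values = np.asarray(values)
    # a new bar starts where the value jumps by more than the tolerance
    breaks = np.flatnonzero(np.abs(np.diff(values)) > tolerance) + 1
    edges = np.r_[0, breaks, len(values)]
    return list(zip(edges[:-1].tolist(), edges[1:].tolist()))


widths = [50, 52, 51, 20, 21, 19, 80, 80]
bars = split_into_bars(widths, tolerance=10)
```

Small variations of one or two pixels, as they occur along the edge of a rounded bar, stay within one bar, while the jumps from 51 to 20 and from 19 to 80 start new bars.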

straditize.binary.groupby_arr(arr)[source]¶

Groupby a boolean array

Parameters

arr (np.ndarray of ndim 1 of dtype bool) – An array that can be converted to a numeric array

Returns

keys (np.ndarray) – The keys in the array
starts (np.ndarray) – The index of the first element that corresponds to the key in keys
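
A grouping of this kind can be written in a few lines of NumPy. The function group_runs below is a hypothetical re-implementation for illustration only. It returns the value of each run of equal consecutive elements together with the index at which the run starts, and may differ from groupby_arr in details such as additional return values.

```python
import numpy as np


def group_runs(arr):
    """Return the key and the start index of each run of equal values."""
    arr = np.asarray(arr, dtype=bool)
    if arr.size == 0:
        return arr, np.array([], dtype=int)
    # positions where the value differs from its predecessor
    change = np.flatnonzero(arr[1:] != arr[:-1]) + 1
    starts = np.r_[0, change]
    return arr[starts], starts


keys, starts = group_runs([False, False, True, True, True, False])
```

Here the array consists of three runs (False, True, False) that start at the indices 0, 2 and 5.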

straditize.binary.only_parent(func)[source]¶: Call the given func only from the parent reader