Multipanel Categorical
Overview
This Python function draws multipanel plots in which one variable is categorical.
Function
PyCMLutil.plots.multi_panel_cat.multi_panel_cat_from_flat_data(
data_file_string = [],
excel_sheet = 'Sheet1',
pandas_data = [],
template_file_string=[],
output_image_file_string = [],
dpi = 300)
Parameters:
Key | Type | Comment |
---|---|---|
data_file_string | str, optional | Path to the data file (CSV or XLSX format). The default is []. |
excel_sheet | str, optional | Excel sheet where the data are stored. The default is Sheet1. |
pandas_data | Pandas DataFrame, optional | DataFrame containing the data. The default is []. |
template_file_string | str, optional | Path to the .json structure file. The default is []. |
output_image_file_string | str, optional | Path where the output plot is saved. The default is []. |
dpi | int, optional | Image resolution. The default is 300. |
Returns:
Key | Comment |
---|---|
figure | Handle to the produced pyplot figure |
ax | Handle to an array of the pyplot axes. |
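
As a rough illustration of how a call might look, the sketch below collects keyword arguments named in the signature above and passes them to the function. The file paths are placeholders, not files shipped with the package, and the assumption that the two return values come back as a `(figure, ax)` pair is inferred from the Returns table rather than confirmed.

```python
# Keyword arguments named in the signature above.
# All three paths are hypothetical placeholders.
plot_kwargs = {
    "data_file_string": "data/cardiac_data.xlsx",
    "excel_sheet": "Sheet1",
    "template_file_string": "templates/cat_template.json",
    "output_image_file_string": "output/cat_plot.png",
    "dpi": 300,
}


def make_plot(kwargs):
    """Call the plotting function with the given keyword arguments."""
    # Imported inside the function so the snippet can be loaded
    # even where PyCMLutil is not installed.
    from PyCMLutil.plots.multi_panel_cat import multi_panel_cat_from_flat_data

    # Assumption: the function returns the figure and axes handles together.
    return multi_panel_cat_from_flat_data(**kwargs)


# Usage (requires PyCMLutil and the files above to exist):
# fig, ax = make_plot(plot_kwargs)
```

Passing `pandas_data` instead of `data_file_string` should work the same way when the data are already in memory as a DataFrame.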
Data spreadsheet
Like the multipanel plot function for numerical data, the multipanel categorical plot function reads data from a two-dimensional table stored either in an Excel spreadsheet or in a Pandas DataFrame. The only difference is that one of the variables must be categorical.
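
To make the expected layout concrete, here is a minimal sketch of such a flat table, written as CSV text with the standard library. The column name `group` and all values are invented for illustration; `valvular_disorder`, `EF`, and `SVi` are borrowed from the template examples in this page.

```python
import csv
import io

# Hypothetical flat table: one row per observation, one column per variable.
# "group" and "valvular_disorder" are categorical; "EF" and "SVi" are numerical.
rows = [
    {"group": "control", "valvular_disorder": "AS", "EF": 61.0, "SVi": 42.5},
    {"group": "control", "valvular_disorder": "MR", "EF": 58.5, "SVi": 40.0},
    {"group": "patients", "valvular_disorder": "AS", "EF": 47.0, "SVi": 33.5},
    {"group": "patients", "valvular_disorder": "MR", "EF": 52.0, "SVi": 36.0},
]

# Write the table as CSV text with a header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Saving `csv_text` to a `.csv` file gives something that could be passed as `data_file_string`; the same columns in a DataFrame could be passed as `pandas_data`.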
Template files
The multipanel categorical plot function uses the same JSON format for the template file as the numerical one. The only differences are explained below.
At the moment, this function supports only three types of seaborn plots for categorical data, listed below. For each type, the user can set the following optional formatting parameters; otherwise, default values are used.
- **stripplot**

  The following parameters can be set by the user, based on the descriptions provided by seaborn:
  Parameters | Type | Comment |
  ---|---|---|
  marker | str, optional | Marker shape based on the markers list. |
  jitter | float, optional | Amount of jitter (only along the categorical axis) to apply. This can be useful when you have many points and they overlap, so that it is easier to see the distribution. You can specify the amount of jitter (half the width of the uniform random variable support), or just use True for a good default. |
  dodge | bool, optional | When using hue nesting, setting this to True will separate the strips for different hue levels along the categorical axis. Otherwise, the points for each level will be plotted on top of each other. |
  field_palette | palette name, optional | Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We highly recommend picking one of the qualitative colormaps offered by matplotlib. |
  marker_ec | str, optional | Edge color of markers. |
  marker_elw | str, optional | Edge line width of markers. |
  marker_size | str, optional | Marker size. |

  - To change the default value of any of these parameters for a single stripplot, set it within the `series` sub-section under the `panels` section of the template file, as explained for multipanel numerical plots.
    For instance, in the example below, `jitter` and `dodge` are set manually for the first plot, whereas the second plot uses the default values because no new values are defined.

        "series": [
            {
                "field": "LVESVi",
                "style": "strip",
                "field_label": "ESVi",
                "jitter": 0.5,
                "dodge": true
            },
            {
                "field": "LVEDVi",
                "style": "strip",
                "field_label": "EDVi"
            }
        ]
  - Setting parameters one plot at a time becomes tedious when a multipanel categorical figure contains many stripplots. Alternatively, the user can change some of the parameters described above globally by defining a section called `strip_formatting` in the template file. The changes defined there are applied to every stripplot in the figure.
    Default values for `strip_formatting` are:

        "strip_formatting": {
            "jitter": true,
            "dodge": true,
            "marker_ec": null,
            "marker_elw": 0.5,
            "marker_size": 8,
            "marker_list": ["o", "^", "s", "x", "*"]
        }
- **boxplot**

  The following parameters can be set by the user, based on the descriptions provided by seaborn:
  Parameters | Type | Comment |
  ---|---|---|
  color_saturation | float, optional | Proportion of the original saturation to draw colors at. Large patches often look better with slightly desaturated colors, but set this to 1 if you want the plot colors to perfectly match the input color spec. |
  dodge | bool, optional | When using hue nesting, setting this to True will shift the boxplot along the categorical axis. Otherwise, the points for each level will be plotted on top of each other. |
  field_palette | palette name, optional | Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We highly recommend picking one of the qualitative colormaps offered by matplotlib. |
  linewidth | float, optional | Width of the gray lines that frame the plot elements. |
  box_width | float, optional | Width of a full element when not using hue nesting, or width of all the elements for one level of the major grouping variable. |

  - As with the stripplot, the default values of these parameters can be changed where the boxplot is defined, under the `series` sub-section of the template file. For instance, in the example below, `field_palette` is set manually.

    ```javascript
    "series": [
        {
            "field": "SVi",
            "style": "box",
            "field_palette": "pastel"
        }
    ]
    ```

  - Again, this becomes cumbersome when a figure contains many **boxplots**. Alternatively, the user can change some of the parameters described above globally by defining a section called `box_formatting` in the **template** file. The changes defined there are applied to all **boxplots** in the figure. Default values for `box_formatting` are:

    ```javascript
    "box_formatting": {
        "color_saturation": 1,
        "dodge": true,
        "box_width": 0.75,
        "linewidth": 1
    }
    ```
- **pointplot**

  The following parameters can be set by the user, based on the descriptions provided by seaborn:
  Parameters | Type | Comment |
  ---|---|---|
  confidence_int | float or "sd" or None, optional | Size of confidence intervals to draw around estimated values. If "sd", skip bootstrapping and draw the standard deviation of the observations. If None, no bootstrapping will be performed, and error bars will not be drawn. |
  estimator | str, optional | Name of the statistical function to estimate within each categorical bin, e.g., "mean" implies numpy.mean. |
  field_palette | palette name, optional | Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We highly recommend picking one of the qualitative colormaps offered by matplotlib. |
  dodge | bool, optional | When using hue nesting, setting this to True will shift the pointplot along the categorical axis. Otherwise, the points for each level will be plotted on top of each other. |
  linestyle | str, optional | Line styles to use for each of the hue levels. |
  errwidth | float, optional | Thickness of error bar lines (and caps). |
  capsize | float, optional | Width of the "caps" on error bars. |
  markers | str, optional | Marker shape based on the markers list. |
  join | bool, optional | If True, lines will be drawn between point estimates at the same hue level. |

  - The default values of these parameters can be altered for each pointplot within a subplot/panel by specifying them in the `series` sub-section. For example, in the following panel, the first pointplot uses the mean as the estimator, while the second one uses the median.

        "series": [
            {
                "field": "SVi",
                "style": "point",
                "estimator": "mean"
            },
            {
                "field": "SVi",
                "style": "point",
                "estimator": "median"
            }
        ]
  - To adjust these parameters globally, define a section called `point_formatting` in the template file and assign the new values there.
    Default values for `point_formatting` are:

        "point_formatting": {
            "confidence_int": "sd",
            "estimator": "mean",
            "linestyle": "-",
            "dodge": true,
            "join": true,
            "errwidth": null,
            "capsize": null
        }
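
The `estimator` option for pointplots names a statistic that is computed within each category. The sketch below illustrates that idea on made-up data, using Python's `statistics` module as a stand-in for the numpy functions the table refers to; it is not how the library itself computes the estimates.

```python
import statistics
from collections import defaultdict

# Hypothetical observations as (category, value) pairs.
observations = [
    ("control", 40.0), ("control", 44.0), ("control", 48.0),
    ("patients", 30.0), ("patients", 32.0), ("patients", 46.0),
]


def estimate_per_category(pairs, estimator):
    """Apply the named statistic to the values of each category."""
    # statistics.<name> stands in for numpy.<name> to keep this stdlib-only.
    func = getattr(statistics, estimator)
    grouped = defaultdict(list)
    for category, value in pairs:
        grouped[category].append(value)
    return {category: func(values) for category, values in grouped.items()}


means = estimate_per_category(observations, "mean")
medians = estimate_per_category(observations, "median")
print(means)    # {'control': 44.0, 'patients': 36.0}
print(medians)  # {'control': 44.0, 'patients': 32.0}
```

The two estimators agree for the symmetric `control` values and differ for `patients`, where one high value pulls the mean above the median.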
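
Each plot type above has three sources of formatting values: built-in defaults, an optional global `*_formatting` section, and keys set on an individual `series` entry. The sketch below shows one way to reason about the resulting values for stripplots. The precedence used here (series entry over global section over defaults) is an assumption, since this page does not state what happens when both are set, and `effective_settings` is a hypothetical helper, not part of PyCMLutil. The `marker_size` override of 5 is likewise invented.

```python
# Defaults copied from the strip_formatting listing (marker_list omitted).
STRIP_DEFAULTS = {
    "jitter": True,
    "dodge": True,
    "marker_ec": None,
    "marker_elw": 0.5,
    "marker_size": 8,
}


def effective_settings(defaults, global_section, series):
    """Layer settings: defaults, then the global section, then the series entry."""
    settings = dict(defaults)
    settings.update(global_section)
    # Assumption: only keys that are formatting options are read from the series.
    settings.update({k: v for k, v in series.items() if k in defaults})
    return settings


# Simplified template fragment; in a real template, "series" sits inside a panel.
template = {
    "strip_formatting": {"marker_size": 5},
    "series": [
        {"field": "LVESVi", "style": "strip", "field_label": "ESVi",
         "jitter": 0.5, "dodge": True},
        {"field": "LVEDVi", "style": "strip", "field_label": "EDVi"},
    ],
}

for entry in template["series"]:
    print(entry["field"],
          effective_settings(STRIP_DEFAULTS, template["strip_formatting"], entry))
```

Under this assumption, the first series keeps its own `jitter` of 0.5, and both series pick up the global `marker_size`.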
Note
- The hue, the hue order, and the order of the categorical data can be defined in two ways:
  - They can be assigned globally to all subplots/panels by defining `global_hue`, `global_hue_order`, and `order`, respectively, in the `x_display` section of the template file as follows:

        "x_display": {
            "global_x_field": "global x-axis variable",
            "label": "global x-axis label",
            "order": ["value_1", "value_2"],
            "global_hue": "global hue variable",
            "global_hue_order": ["hue_value_1", "hue_value_2"]
        }
  - Or they can be defined for each subplot, independently of the other panels, by defining `hue`, `hue_order`, and `x_order` in that panel's data in the template file. For instance:

        {
            "column": 2,
            "hue": "valvular_disorder",
            "hue_order": ["AS", "MR"],
            "x_order": ["control", "patients"],
            "y_info": {
                "label": "Ejection\nfraction",
                "scaling_type": "close_fit",
                "series": [
                    {
                        "field": "EF",
                        "style": "box",
                        "field_palette": "Set2"
                    }
                ]
            }
        }
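
The two approaches can be combined in one template. The sketch below builds such a template as a Python dictionary and serializes it to JSON text. It uses only the keys shown in this page; the column names `group` and `sex` and their values are hypothetical, and any other sections a complete template may need (as described for the multipanel numerical plots) are left out.

```python
import json

template = {
    "x_display": {
        "global_x_field": "group",        # hypothetical column name
        "label": "Group",
        "order": ["control", "patients"],
        "global_hue": "sex",              # hypothetical column name
        "global_hue_order": ["F", "M"],
    },
    "panels": [
        {
            # No hue keys here, so the global settings are meant to apply.
            "column": 1,
            "y_info": {
                "label": "SVi",
                "series": [{"field": "SVi", "style": "box"}],
            },
        },
        {
            # Panel-level hue, hue_order, and x_order for this panel only.
            "column": 2,
            "hue": "valvular_disorder",
            "hue_order": ["AS", "MR"],
            "x_order": ["control", "patients"],
            "y_info": {
                "label": "Ejection\nfraction",
                "scaling_type": "close_fit",
                "series": [
                    {"field": "EF", "style": "box", "field_palette": "Set2"}
                ],
            },
        },
    ],
}

# json.dumps writes true/false/null and escapes the newline in the label.
template_text = json.dumps(template, indent=4)
print(template_text)
```

Writing `template_text` to a `.json` file gives something that could be passed as `template_file_string`.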
Now try the demos to get more familiar with the function.