Multipanel Categorical

Overview
Function
- Data spreadsheet
- Template files
Note

Overview

This Python function creates multipanel plots in which one of the variables is categorical.

Function

PyCMLutil.plots.multi_panel_cat.multi_panel_cat_from_flat_data(
        data_file_string=[],
        excel_sheet='Sheet1',
        pandas_data=[],
        template_file_string=[],
        output_image_file_string=[],
        dpi=300)

Parameters:

Key	Type	Comment
data_file_string	str, optional	Path to the data file, in either .csv or .xlsx format. The default is [].
excel_sheet	str, optional	Name of the Excel sheet where the data are stored. The default is 'Sheet1'.
pandas_data	Pandas DataFrame, optional	DataFrame containing the data. The default is [].
template_file_string	str, optional	Path to the .json template file. The default is [].
output_image_file_string	str, optional	Path where the output plot is saved. The default is [].
dpi	int, optional	Image resolution in dots per inch. The default is 300.

Returns:

Key	Comment
figure	Handle to the produced pyplot figure.
ax	Handle to an array of the pyplot axes.

Data spreadsheet

Like the multipanel plot function for numerical data, the multipanel categorical plot function reads data stored in a two-dimensional (flat) table. The table can be a spreadsheet file (.xlsx or .csv) or a Pandas DataFrame. The only difference is that one of the variables must be categorical.
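The sketch below illustrates the flat layout using only Python's standard library. It writes a small table with one row per observation and one column per variable. The column names are borrowed from the template examples on this page. "group" is a hypothetical name for the categorical x-axis variable. All values are made-up placeholders, not real measurements.

```python
import csv
import io

# Hypothetical flat ("long") table: one row per observation, one column per variable.
# "group" and "valvular_disorder" are categorical; "EF" and "SVi" are numerical.
# All values are made-up placeholders for illustration only.
rows = [
    {"group": "control", "valvular_disorder": "AS", "EF": 62.0, "SVi": 45.0},
    {"group": "control", "valvular_disorder": "MR", "EF": 58.0, "SVi": 47.5},
    {"group": "patients", "valvular_disorder": "AS", "EF": 49.0, "SVi": 38.2},
    {"group": "patients", "valvular_disorder": "MR", "EF": 44.0, "SVi": 36.9},
]


def to_flat_csv(records):
    """Serialize a list of dicts (all sharing the same keys) as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_flat_csv(rows)
print(csv_text)
```

You could save csv_text to a file (for example a hypothetical data.csv, opened with newline="") and pass its path as data_file_string. Alternatively, you could load the same rows into a Pandas DataFrame and pass them as pandas_data.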

Template files

As with the multipanel numerical plot function, the multipanel categorical plot function uses a JSON template file. Only the differences are explained below.

At the moment, this function supports only three types of seaborn plots for categorical data: stripplot(), boxplot(), and pointplot(). For each type, the user can set the optional formatting parameters listed below; otherwise, default values are used.
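In the template examples on this page, the three supported plot types are selected with the style values "strip", "box", and "point". A small pre-flight check can catch a misspelled style before the figure is built. The helper below is hypothetical and not part of PyCMLutil. It also assumes that the template's "panels" section is a list of panel entries shaped like the panel example in the Note section.

```python
# Style value -> seaborn function, as used in the template examples on this page.
SUPPORTED_STYLES = {
    "strip": "stripplot",
    "box": "boxplot",
    "point": "pointplot",
}


def unsupported_styles(template):
    """Return (panel_index, style) pairs whose style is not supported.

    Hypothetical helper: ASSUMES template["panels"] is a list of panel dicts,
    each carrying its series under "y_info" -> "series".
    """
    problems = []
    for index, panel in enumerate(template.get("panels", [])):
        for series in panel.get("y_info", {}).get("series", []):
            style = series.get("style")
            if style not in SUPPORTED_STYLES:
                problems.append((index, style))
    return problems


template = {
    "panels": [
        {"column": 1, "y_info": {"series": [{"field": "EF", "style": "box"}]}},
        {"column": 2, "y_info": {"series": [{"field": "SVi", "style": "violin"}]}},
    ]
}
print(unsupported_styles(template))  # [(1, 'violin')]
```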

stripplot():

The following parameters can be set by the user (descriptions adapted from the seaborn documentation):

Parameters	Type	Comment
marker	str, optional	Marker shape based on the markers list.
jitter	float or True, optional	Amount of jitter (only along the categorical axis) to apply. This can be useful when you have many points and they overlap, so that it is easier to see the distribution. You can specify the amount of jitter (half the width of the uniform random variable support), or just use True for a good default.
dodge	bool, optional	When using hue nesting, setting this to True will separate the strips for different hue levels along the categorical axis. Otherwise, the points for each level will be plotted on top of each other.
field_palette	palette name, optional	Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We recommend picking one of the qualitative colormaps offered by matplotlib.
marker_ec	str, optional	Edge color of the markers.
marker_elw	float, optional	Edge line width of the markers.
marker_size	float, optional	Marker size.

To change the default value of any of these parameters, set it within the series sub-section of the panels section in the template file, as explained for multipanel numerical plots.
For instance, in the example below, jitter and dodge are set manually for the first stripplot. The second stripplot uses the default values because it does not define them.

 "series":
         [
             {
                 "field": "LVESVi",
                 "style": "strip",
                 "field_label": "ESVi",
                 "jitter":0.5,
                 "dodge":true        
             },
             {
                 "field" : "LVEDVi",
                 "style" : "strip",
                 "field_label": "EDVi"
             }
         ]
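When many series differ only in their field, you can avoid hand-editing the JSON by building the list in Python. You can then serialize it with the standard json module, which also writes Python's True as JSON's true. This is only a sketch. The two fields and labels come from the example above, and applying jitter and dodge to every entry is an arbitrary choice for illustration.

```python
import json

# (field, label) pairs taken from the example above; extend as needed.
fields = [("LVESVi", "ESVi"), ("LVEDVi", "EDVi")]

# One stripplot series per field, each with the same manual jitter/dodge settings.
series = [
    {"field": field, "style": "strip", "field_label": label, "jitter": 0.5, "dodge": True}
    for field, label in fields
]

# json.dumps converts True -> true, so the text is valid JSON for the template file.
text = json.dumps({"series": series}, indent=4)
print(text)
```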

Setting parameters series by series can become tedious when a multipanel categorical figure contains a large number of stripplots. Alternatively, the user can change some of the parameters described above globally by defining a section called strip_formatting in the template file. Any value defined there is applied to every stripplot in the figure.

Default values for strip_formatting are:

 "strip_formatting":
     {
         "jitter": true,
         "dodge": true,
         "marker_ec": null,
         "marker_elw": 0.5,
         "marker_size": 8,
         "marker_list": ["o", "^", "s", "x", "*"]
     }
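This page does not spell out what happens when a parameter is set both on a series and in strip_formatting. One natural reading is a three-level precedence. Built-in defaults are overridden by strip_formatting, which is in turn overridden by values set on an individual series. The hypothetical helper below sketches that assumed precedence. It is not PyCMLutil's code, and the library may resolve conflicts differently.

```python
# Built-in defaults, copied from the strip_formatting listing above.
STRIP_DEFAULTS = {
    "jitter": True,
    "dodge": True,
    "marker_ec": None,
    "marker_elw": 0.5,
    "marker_size": 8,
    "marker_list": ["o", "^", "s", "x", "*"],
}

# Series keys that describe the data rather than the formatting.
NON_FORMATTING_KEYS = {"field", "style", "field_label"}


def resolve_strip_options(series, strip_formatting=None):
    """Return the effective stripplot options for one series.

    ASSUMED precedence (not confirmed by the documentation):
    per-series value > template-wide strip_formatting > built-in defaults.
    """
    options = dict(STRIP_DEFAULTS)
    options.update(strip_formatting or {})
    options.update({k: v for k, v in series.items() if k not in NON_FORMATTING_KEYS})
    return options


first = {"field": "LVESVi", "style": "strip", "field_label": "ESVi", "jitter": 0.5, "dodge": True}
second = {"field": "LVEDVi", "style": "strip", "field_label": "EDVi"}
template_wide = {"marker_size": 6}

print(resolve_strip_options(first, template_wide)["jitter"])        # 0.5
print(resolve_strip_options(second, template_wide)["jitter"])       # True
print(resolve_strip_options(second, template_wide)["marker_size"])  # 6
```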

boxplot():

The following parameters can be set by the user (descriptions adapted from the seaborn documentation):

Parameters	Type	Comment
color_saturation	float, optional	Proportion of the original saturation to draw colors at. Large patches often look better with slightly desaturated colors, but set this to 1 if you want the plot colors to perfectly match the input color spec.
dodge	bool, optional	When using hue nesting, setting this to True will shift the boxes for different hue levels along the categorical axis. Otherwise, the boxes for each level will be drawn on top of each other.
field_palette	palette name, optional	Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We recommend picking one of the qualitative colormaps offered by matplotlib.
linewidth	float, optional	Width of the gray lines that frame the plot elements.
box_width	float, optional	Width of a full element when not using hue nesting, or width of all the elements for one level of the major grouping variable.

As with stripplot, the user can change the default values of these parameters in the series sub-section of the template file, where the boxplot is defined. For instance, in the example below, field_palette is set manually by the user.

 "series":
     [
         {
             "field": "SVi",
             "style": "box",
             "field_palette": "pastel"
         }
     ]

Again, this can become tedious when you are dealing with a large number of boxplots. Alternatively, the user can change some of the parameters described above globally by defining a section called box_formatting in the template file. All changes defined there are applied to every boxplot within the figure.

Default values for box_formatting are:

 "box_formatting":
     {
         "color_saturation": 1,
         "dodge": true,
         "box_width": 0.75,
         "linewidth": 1
     }
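A misspelled key in a formatting section is easy to miss, so a quick check against the documented keys can help. The helper below is hypothetical and not part of PyCMLutil. The key sets are copied from the default listings for strip_formatting, box_formatting, and point_formatting on this page. The library may accept further keys that this page does not list, and the helper would flag those too.

```python
# Keys copied from the default listings for each formatting section on this page.
FORMATTING_KEYS = {
    "strip_formatting": {"jitter", "dodge", "marker_ec", "marker_elw", "marker_size", "marker_list"},
    "box_formatting": {"color_saturation", "dodge", "box_width", "linewidth"},
    "point_formatting": {"confidence_int", "estimator", "linestyle", "dodge", "join", "errwidth", "capsize"},
}


def unknown_formatting_keys(template):
    """Return {section: sorted unknown keys} for formatting sections with unexpected keys.

    Hypothetical helper: only keys shown in this page's default listings count as known.
    """
    report = {}
    for section, known in FORMATTING_KEYS.items():
        unknown = set(template.get(section, {})) - known
        if unknown:
            report[section] = sorted(unknown)
    return report


template = {"box_formatting": {"box_width": 0.6, "linewdith": 2}}  # note the typo
print(unknown_formatting_keys(template))  # {'box_formatting': ['linewdith']}
```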

pointplot():

The following parameters can be set by the user (descriptions adapted from the seaborn documentation):

Parameters	Type	Comment
confidence_int	float, "sd", or None, optional	Size of confidence intervals to draw around estimated values. If "sd", skip bootstrapping and draw the standard deviation of the observations. If None, no bootstrapping will be performed, and error bars will not be drawn.
estimator	str, optional	Name of the statistical function to estimate within each categorical bin, e.g., "mean" implies numpy.mean.
field_palette	palette name, optional	Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors. We recommend picking one of the qualitative colormaps offered by matplotlib.
dodge	bool, optional	When using hue nesting, setting this to True will shift the points for different hue levels along the categorical axis. Otherwise, the points for each level will be plotted on top of each other.
linestyle	str, optional	Line styles to use for each of the hue levels.
errwidth	float, optional	Thickness of error bar lines (and caps).
capsize	float, optional	Width of the “caps” on error bars.
markers	str, optional	Marker shape based on the markers list.
join	bool, optional	If True, lines will be drawn between point estimates at the same hue level.

The default values of these parameters can be changed for each pointplot within a subplot/panel by specifying them in the series sub-section. For example, in the following panel, the first pointplot uses the mean as its estimator, while the second uses the median.

 "series":
             [
                 {
                     "field": "SVi",
                     "style": "point",
                     "estimator": "mean"
                 },
                 {
                     "field": "SVi",
                     "style": "point",
                     "estimator": "median"
                 }
             ]
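The sketch below shows what the two estimator settings compute, using Python's statistics module. It groups made-up SVi values by a hypothetical categorical column and takes the mean and the median of each bin. It only illustrates an estimator applied within each categorical bin. The actual estimation and drawing are done by seaborn's pointplot, not by this code.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (category, SVi) observations; the values are placeholders.
observations = [
    ("control", 40.0), ("control", 44.0), ("control", 60.0),
    ("patients", 30.0), ("patients", 33.0), ("patients", 36.0),
]

# Group the numerical values by category: one bin per category.
bins = defaultdict(list)
for category, value in observations:
    bins[category].append(value)

# Apply each estimator within each bin, as the two series in the example request.
results = {
    name: {category: estimator(values) for category, values in bins.items()}
    for name, estimator in (("mean", mean), ("median", median))
}
print(results)
```

In the control bin, the 60.0 observation pulls the mean up to 48.0, while the median stays at 44.0. For the symmetric patients bin, both estimators give 33.0.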

To adjust these parameters globally, define a section called point_formatting in the template file and assign the new values there.

Default values for point_formatting are:

 "point_formatting":
     {
         "confidence_int": "sd",
         "estimator": "mean",
         "linestyle": "-",
         "dodge": true,
         "join": true,
         "errwidth": null,
         "capsize": null
     }
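The template is a JSON file, so an empty value must be written as null, because JSON has no None. A file containing a literal None fails to parse with a standard JSON parser. If you generate the template from Python, the standard json module handles this conversion (and True to true) automatically. The sketch below shows this with the point_formatting defaults.

```python
import json

# point_formatting defaults from this page, as a Python dict.
point_formatting = {
    "confidence_int": "sd",
    "estimator": "mean",
    "linestyle": "-",
    "dodge": True,
    "join": True,
    "errwidth": None,
    "capsize": None,
}

# json.dumps writes None as null and True as true.
text = json.dumps({"point_formatting": point_formatting}, indent=4)
print(text)

# Round trip: json.loads turns null back into None.
assert json.loads(text)["point_formatting"]["capsize"] is None
```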

Note

hue, hue_order, and the order of the categorical data can be defined in two ways:

They can be assigned globally to all subplots/panels by defining global_hue, global_hue_order, and order, respectively, in the x_display section of the template file, as follows:

   "x_display":
   {
       "global_x_field": "global x-axis variable",
       "label": "global x-axis label",
       "order": ["value_1","value_2"],
       "global_hue": "global hue variable",
       "global_hue_order": ["hue_value_1","hue_value_2"]
   }

Or they can be defined for each subplot independently of the other panels by defining hue, hue_order, and x_order in that panel's entry in the template file. For instance:

   {
       "column": 2,
       "hue": "valvular_disorder",
       "hue_order": ["AS","MR"],
       "x_order": ["control","patients"],
       "y_info":
       {
           "label":"Ejection\nfraction",
           "scaling_type": "close_fit",
           "series":
           [
               {
                   "field": "EF",
                   "style": "box",
                   "field_palette": "Set2"
               }
           ]
       }
   }
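This page does not say what happens when a panel defines hue, hue_order, or x_order and x_display also defines the global equivalents. The hypothetical helper below assumes that the panel-level value wins and that the global value is used otherwise. Treat it as an illustration of that assumption, not as a description of PyCMLutil's behavior.

```python
# Panel-level key -> corresponding global key in the x_display section (from this page).
KEY_PAIRS = {
    "hue": "global_hue",
    "hue_order": "global_hue_order",
    "x_order": "order",
}


def resolve_categorical_settings(panel, x_display):
    """Return hue, hue_order, and x_order for one panel.

    ASSUMPTION: a value defined on the panel takes precedence over the global
    value in x_display; keys defined in neither place resolve to None.
    """
    return {
        panel_key: panel.get(panel_key, x_display.get(global_key))
        for panel_key, global_key in KEY_PAIRS.items()
    }


x_display = {"global_hue": "valvular_disorder", "order": ["control", "patients"]}
panel = {"column": 2, "hue_order": ["AS", "MR"]}
print(resolve_categorical_settings(panel, x_display))
# {'hue': 'valvular_disorder', 'hue_order': ['AS', 'MR'], 'x_order': ['control', 'patients']}
```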

Now try the demos to become more familiar with this function.