API Reference¶
This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.
nltools.data: Data Types¶

class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, output_file=None, **kwargs)[source]¶
Brain_Data is a class to represent neuroimaging data in python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.
Parameters:  data – nibabel data instance or list of files
 Y – Pandas DataFrame of training labels
 X – Pandas DataFrame design matrix for running univariate models
 mask – binary nifti file to mask brain data
 output_file – name of nifti file to write out
 **kwargs – Additional keyword arguments to pass to the prediction algorithm
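Example (a minimal sketch; the file names and labels are hypothetical):
>>> import pandas as pd
>>> from nltools.data import Brain_Data
>>> data = Brain_Data(['beta_01.nii.gz', 'beta_02.nii.gz', 'beta_03.nii.gz'])
>>> data.Y = pd.DataFrame([1, 2, 3])  # one training label per image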

align(target, method='procrustes', n_features=None, axis=0, *args, **kwargs)[source]¶ Align Brain_Data instance to target object
Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or the Shared Response Model from Princeton (see nltools.external.srm). (See nltools.stats.align for aligning many data objects together.) The common model is the shared response model or centered target data. Transformed data can be back-projected to the original data using the transformation matrix.
Examples
 Hyperalign using procrustes transform:
 out = data.align(target, method='procrustes')
 Align using shared response model:
 out = data.align(target, method='probabilistic_srm', n_features=None)
 Project aligned data back into original space:
 original_data = np.dot(out['transformed'].data, out['transformation_matrix'].T)
Parameters:  target – (Brain_Data) object to align to
 method – (str) alignment method to use ['probabilistic_srm','deterministic_srm','procrustes']
 n_features – (int) number of features to align to common space. If None, the number of voxels is used
 axis – (int) axis to align on
Returns: (dict) a dictionary containing the transformed object, transformation matrix, and the shared response matrix
Return type: out

append(data, **kwargs)[source]¶ Append data to Brain_Data instance
Parameters:  data – (Brain_Data) Brain_Data instance to append
 kwargs – optional inputs to Design_Matrix append
Returns: (Brain_Data) new appended Brain_Data instance
Return type: out

apply_mask(mask)[source]¶ Mask Brain_Data instance
Parameters: mask – (Brain_Data or nifti object) mask to apply to Brain_Data object
Returns: (Brain_Data) masked Brain_Data object
Return type: masked

astype(dtype)[source]¶ Cast Brain_Data.data as type.
Parameters: dtype – datatype to convert to
Returns: Brain_Data instance with new datatype
Return type: Brain_Data

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=1, random_state=None, *args, **kwargs)[source]¶ Bootstrap a Brain_Data method.
Example usage:
 b = dat.bootstrap('mean', n_samples=5000)
 b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
 b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
Parameters:  function – (str) method to apply to data for each bootstrap
 n_samples – (int) number of samples to bootstrap with replacement
 save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: summarized studentized bootstrap output
Return type: output

decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)[source]¶ Decompose Brain_Data object
Parameters:  algorithm – (str) algorithm to perform decomposition ['pca','ica','nnmf','fa']
 axis – dimension to decompose ['voxels','images']
 n_components – (int) number of components. If None then retain as many as possible.
Returns: a dictionary of decomposition parameters
Return type: output
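Example (a minimal sketch assuming data is an existing Brain_Data instance):
>>> output = data.decompose(algorithm='pca', axis='images', n_components=5)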

detrend(method='linear')[source]¶ Remove linear trend from each voxel
Parameters: method – ('linear','constant', optional) type of detrending
Returns: (Brain_Data) detrended Brain_Data instance
Return type: out

distance(method='euclidean', **kwargs)[source]¶ Calculate distance between images within a Brain_Data() instance.
Parameters: method – (str) type of distance metric (can use any scikit-learn or scipy metric)
Returns: (Adjacency) a 2D distance matrix
Return type: dist

extract_roi(mask, method='mean')[source]¶ Extract activity from mask
Parameters:  mask – (nifti) nibabel mask; can be binary or numbered for different ROIs
 method – type of extraction method (default=mean) NOTE: only mean currently works!
Returns: mean within each ROI across images
Return type: out

filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)[source]¶ Apply a 5th-order Butterworth filter to data. Wraps nilearn functionality. Unlike the nilearn implementation, does not detrend or standardize by default, but this can be overridden using kwargs.
Parameters:  sampling_freq – sampling frequency in hertz (i.e. 1 / TR)
 high_pass – high-pass cutoff frequency
 low_pass – low-pass cutoff frequency
 kwargs – other keyword arguments to nilearn.signal.clean
Returns: filtered Brain_Data instance
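Example (a minimal sketch; the TR and cutoff values are hypothetical):
>>> tr = 2.0
>>> filtered = data.filter(sampling_freq=1 / tr, high_pass=1 / 128)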

find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶ Function to identify spikes from time-series data
Parameters:  global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
 diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
Returns: pandas dataframe with spikes as indicator variables

icc(icc_type='icc2')[source]¶ Calculate intraclass correlation coefficient for data within Brain_Data class
ICC formulas are based on: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420.
icc1: \(x_{ij} = \mu + \beta_j + w_{ij}\)
icc2/icc3: \(x_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ij}\)
Code modified from nipype algorithms.icc https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py
Parameters: icc_type – type of icc to calculate (icc1: voxel random effect, icc2: voxel and column random effect, icc3: voxel and column fixed effect)
Returns: (np.array) intraclass correlation coefficient
Return type: ICC

iplot(threshold=0, surface=False, anatomical=None, **kwargs)[source]¶ Create an interactive brain viewer for the current brain data instance.
Parameters:  threshold – (float/str) two-sided threshold to initialize the visualization; may be a percentile string; default 0
 surface – (bool) whether to create a surface-based plot; default False
 anatomical – nifti image or filename to overlay
 kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
Returns: interactive brain viewer widget

multivariate_similarity(images, method='ols')[source]¶ Predict spatial distribution of Brain_Data() instance from linear combination of other Brain_Data() instances or Nibabel images
Parameters:  self – Brain_Data instance of data to be applied
 images – Brain_Data instance of weight map
Returns: dictionary of regression statistics in Brain_Data instances {'beta','t','p','df','residual'}
Return type: out

plot(limit=5, anatomical=None, view='axial', threshold_upper=None, threshold_lower=None, **kwargs)[source]¶ Create a quick plot of self.data. Will plot each image separately
Parameters:  limit – (int) max number of images to return
 anatomical – (nifti, str) nifti image or file name to overlay
 view – (str) 'axial' for limit number of axial slices; 'glass' for orthoview glass brain; 'mni' for multi-slice view of mni brain; 'full' for both glass and mni views
 threshold_upper – (str/float) threshold if view is 'glass', 'mni', or 'full'
 threshold_lower – (str/float) threshold if view is 'glass', 'mni', or 'full'

predict(algorithm=None, cv_dict=None, plot=True, **kwargs)[source]¶ Run prediction
Parameters:  algorithm – Algorithm to use for prediction. Must be one of 'svm', 'svr', 'linear', 'logistic', 'lasso', 'ridge', 'ridgeClassifier', 'pcr', or 'lassopcr'
 cv_dict – Type of cross-validation to use. A dictionary of {'type': 'kfolds', 'n_folds': n}, {'type': 'kfolds', 'n_folds': n, 'stratified': Y}, {'type': 'kfolds', 'n_folds': n, 'subject_id': holdout}, or {'type': 'loso', 'subject_id': holdout}, where 'n' = number of folds and 'holdout' = vector of subject ids that corresponds to self.Y
 plot – Boolean indicating whether or not to create plots.
 **kwargs – Additional keyword arguments to pass to the prediction algorithm
Returns: a dictionary of prediction parameters
Return type: output
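Example (a minimal sketch assuming data.Y already holds continuous training labels):
>>> stats = data.predict(algorithm='ridge', cv_dict={'type': 'kfolds', 'n_folds': 5}, plot=False)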

predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)[source]¶ Perform multi-region prediction. This can be a searchlight analysis or a multi-ROI analysis if provided a Brain_Data instance with labeled non-overlapping ROIs.
Parameters:  algorithm (string) – algorithm to use for prediction. Must be one of 'svm', 'svr', 'linear', 'logistic', 'lasso', 'ridge', 'ridgeClassifier', 'pcr', or 'lassopcr'
 cv_dict – Type of cross-validation to use. Default is 3-fold. A dictionary of {'type': 'kfolds', 'n_folds': n}, {'type': 'kfolds', 'n_folds': n, 'stratified': Y}, {'type': 'kfolds', 'n_folds': n, 'subject_id': holdout}, or {'type': 'loso', 'subject_id': holdout}, where 'n' = number of folds and 'holdout' = vector of subject ids that corresponds to self.Y
 method (string) – one of 'searchlight' or 'roi'
 rois (string/nltools.Brain_Data) – nifti file path or Brain_Data instance containing non-overlapping regions-of-interest labeled by integers
 process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = 'searchlight'
 radius (float) – radius of searchlight in mm; default 2mm
 scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator's default scoring function
 n_jobs (int) – The number of CPUs to use for permutation; default 1 because this can be very memory intensive
 verbose (int) – whether parallelization progress should be printed; default 0
Returns: image of results
Return type: output

randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)[source]¶ Run mass-univariate regression at each voxel with inference performed via permutation testing, à la randomise in FSL. Operates just like .regress(), but intended to be used for second-level analyses.
Parameters:  n_permute (int) – number of permutations
 threshold_dict – (dict) a dictionary of threshold parameters {'unc':.001} or {'fdr':.05}
 return_mask – (bool) optionally return the thresholding mask
Returns: dictionary of maps for betas, t-stats, and p-values
Return type: out

regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)[source]¶ Extract brain connected components into separate regions.
Parameters:  min_region_size (int) – Minimum volume in mm3 for a region to be kept.
 extract_type (str) – Type of extraction method ['connected_components', 'local_regions']. If 'connected_components', each component/region in the image is extracted automatically by labelling each region based upon the presence of unique features in its respective region. If 'local_regions', each component/region is extracted based on its maximum peak value to define a seed marker, and then a random walker segmentation algorithm is used on these markers for region separation.
 smoothing_fwhm (scalar) – Smooth an image to extract sparser regions. Only works for extract_type 'local_regions'.
 is_mask (bool) – Whether the Brain_Data instance should be treated as a boolean mask; if so, calls connected_label_regions instead.
Returns: Brain_Data instance with extracted ROIs as data.

regress(mode='ols', **kwargs)[source]¶ Run a mass-univariate regression across voxels. Three types of regressions can be run: 1) standard OLS (default); 2) robust OLS (heteroscedasticity- and/or autocorrelation-robust errors), i.e. OLS with "sandwich estimators"; 3) ARMA (autoregressive and moving-average lags = 1 by default; experimental)
For more information see the help for nltools.stats.regress
ARMA notes: This experimental mode is similar to AFNI's 3dREMLFit but without spatial smoothing of voxel autocorrelation estimates. It can be very computationally intensive, so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, standard errors (and thus t-stats, dfs, and p-values) can differ greatly depending on how much autocorrelation explains the response in a voxel relative to other regressors in the design matrix.
Parameters:  mode (str) – kind of model to fit; must be one of 'ols' (default), 'robust', or 'arma'
 kwargs (dict) – keyword arguments to nltools.stats.regress
Returns: dictionary of regression statistics in Brain_Data instances {'beta','t','p','df','residual'}
Return type: out
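Example (a minimal sketch assuming dm is a Design_Matrix with one row per image):
>>> data.X = dm
>>> stats = data.regress()
>>> t_map = stats['t']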

scale(scale_val=100.0)[source]¶ Scale all values such that they are on the range [0, scale_val], via grand-mean scaling. This is NOT global scaling/intensity normalization. This is useful for ensuring that data are on a common scale (e.g. good for multiple runs, participants, etc.), and if the default value of 100 is used, can be interpreted as something akin to (but not exactly) "percent signal change." This is consistent with default behavior in AFNI and SPM. Change this value to 10000 to be consistent with FSL.
Parameters: scale_val – (int/float) what value to send the grand mean to; default 100

similarity(image, method='correlation')[source]¶ Calculate similarity of Brain_Data() instance with a single Brain_Data or Nibabel image
Parameters:  image – (Brain_Data, nifti) image to evaluate similarity against
 method – (str) type of similarity ['correlation','dot_product','cosine']
Returns: (list) a vector of pattern expression values
Return type: pexp
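Example (a minimal sketch; weight_map is a hypothetical Brain_Data pattern):
>>> pexp = data.similarity(weight_map, method='cosine')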

smooth(fwhm)[source]¶ Apply spatial smoothing using nilearn smooth_img()
Parameters: fwhm – (float) full width at half maximum of gaussian spatial filter
Returns: Brain_Data instance

standardize(axis=0, method='center')[source]¶ Standardize Brain_Data() instance.
Parameters:  axis – 0 for observations, 1 for features
 method – ['center','zscore']
Returns: Brain_Data instance

threshold(upper=None, lower=None, binarize=False, coerce_nan=True)[source]¶ Threshold Brain_Data instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
Parameters:  upper – (float or str) Upper cutoff for thresholding. If a string, will be interpreted as a percentile; can be None for one-sided thresholding.
 lower – (float or str) Lower cutoff for thresholding. If a string, will be interpreted as a percentile; can be None for one-sided thresholding.
 binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
 coerce_nan (bool) – coerce nan values to 0s; default True
Returns: thresholded Brain_Data object.
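Example (a minimal sketch continuing from the regress example above; the percentile strings are hypothetical cutoffs):
>>> thresholded = t_map.threshold(upper='97.5%', lower='2.5%')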

transform_pairwise()[source]¶ Transform Brain_Data instance into pairwise comparisons
Returns: Brain_Data instance transformed into pairwise comparisons
Return type: Brain_Data

ttest(threshold_dict=None, return_mask=False)[source]¶ Calculate one-sample t-test across each voxel (two-sided)
Parameters:  threshold_dict – (dict) a dictionary of threshold parameters {'unc':.001} or {'fdr':.05} or {'permutation':tcfe, n_permutation:5000}
 return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map
Returns: (dict) dictionary of statistics in Brain_Data instances {'t','p'}
Return type: out

upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)[source]¶ Upload data to Neurovault. Will add any columns in self.X to image metadata. Index will be used as image name.
Parameters:  access_token – (str, Required) Neurovault api access token
 collection_name – (str, Optional) name of new collection to create
 collection_id – (int, Optional) Neurovault collection_id if adding images to existing collection
 img_type – (str, Required) Neurovault map_type
 img_modality – (str, Required) Neurovault image modality
Returns: (pd.DataFrame) neurovault collection information
Return type: collection

class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=None, **kwargs)[source]¶
Adjacency is a class to represent adjacency matrices as a vector rather than a 2-dimensional matrix. This makes it easier to perform data manipulation and analyses.
Parameters:  data – pandas data instance or list of files
 matrix_type – (str) type of matrix. Possible values include: ['distance','similarity','directed','distance_flat', 'similarity_flat','directed_flat']
 Y – Pandas DataFrame of training labels
 **kwargs – Additional keyword arguments
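Example (a minimal sketch constructing a symmetric distance matrix from random data):
>>> import numpy as np
>>> import pandas as pd
>>> from nltools.data import Adjacency
>>> m = np.random.rand(10, 10)
>>> m = (m + m.T) / 2  # symmetrize
>>> np.fill_diagonal(m, 0)
>>> adj = Adjacency(pd.DataFrame(m), matrix_type='distance')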

append(data)[source]¶ Append data to Adjacency instance
Parameters: data – (Adjacency) Adjacency instance to append
Returns: (Adjacency) new appended Adjacency instance
Return type: out

bootstrap(function, n_samples=5000, save_weights=False, n_jobs=1, random_state=None, *args, **kwargs)[source]¶ Bootstrap an Adjacency method.
Example usage:
 b = dat.bootstrap('mean', n_samples=5000)
 b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
 b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
Parameters:  function – (str) method to apply to data for each bootstrap
 n_samples – (int) number of samples to bootstrap with replacement
 save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: summarized studentized bootstrap output
Return type: output

distance(method='correlation', **kwargs)[source]¶ Calculate distance between images within an Adjacency() instance.
Parameters: method – (str) type of distance metric (can use any scikit-learn or scipy metric)
Returns: (Adjacency) a 2D distance matrix
Return type: dist

distance_to_similarity(beta=1)[source]¶ Convert distance matrix to similarity matrix
Parameters: beta – (float) parameter to scale exponential function (default: 1)
Returns: (Adjacency) Adjacency object
Return type: out

mean(axis=0)[source]¶ Calculate mean of Adjacency
Parameters: axis – (int) calculate mean over features (0) or data (1). For data, it will be over the upper triangle.
Returns: float if single matrix, Adjacency if axis=0, np.array if axis=1 and multiple matrices
Return type: mean

plot_label_distance(labels=None, ax=None)[source]¶ Create a violin plot indicating within- and between-label distance
Parameters: labels (np.array) – numpy array of labels to plot
Returns: violin plot handles
Return type: f

plot_mds(n_components=2, metric=True, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)[source]¶ Plot multidimensional scaling
Parameters:  n_components – (int) number of dimensions to project (can be 2 or 3)
 metric – (bool) perform metric or non-metric dimensional scaling; default True
 labels_color – (str) list of colors for labels; if len(1) then make all the same color
 n_jobs – (int) number of parallel jobs
 view – (tuple) view for 3-dimensional plot; default (30, 20)
Returns: matplotlib figure
Return type: fig

plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶ Create a silhouette plot

regress(X, mode='ols', **kwargs)[source]¶ Run a regression on an Adjacency instance. You can decompose an Adjacency instance with another Adjacency instance. You can also decompose each pixel by passing a Design_Matrix instance.
Parameters:  X – design matrix; can be an Adjacency or Design_Matrix instance
 mode – type of regression (default: ols)
Returns: (dict) dictionary of stats outputs.
Return type: stats

similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', **kwargs)[source]¶ Calculate similarity between two Adjacency matrices. Default is to use spearman correlation and permutation test.
Parameters:  data – Adjacency data, or 1d array the same size as self.data
 perm_type – (str) '1d', '2d', 'jackknife', or None
 metric – (str) ['spearman','pearson','kendall']

social_relations_model(summarize_results=True, nan_replace=True)[source]¶ Estimate the social relations model from a matrix for a round-robin design
\(X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \epsilon_{ijl}\)
where \(X_{ij}\) is the score for person i rating person j, m is the group mean, \(\alpha_i\) is person i's actor effect, \(\beta_j\) is person j's partner effect, \(g_{ij}\) is the relationship effect, and \(\epsilon_{ijl}\) is the error in measure l for actor i and partner j.
This model is primarily concerned with partitioning the variance of the various effects.
Code is based on the implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006). Tests replicate examples presented in the book. Note that this method assumes that actor scores are rows (lower triangle), while partner scores are columns (upper triangle). The minimal sample size to estimate these effects is 4.
 Model assumptions:
 Social interactions are exclusively dyadic
 People are randomly sampled from the population
 No order effects
 The effects combine additively and relationships are linear
In the future we might update the formulas and standard errors based on Bond and Lashley, 1996.
Parameters:  self – (Adjacency) can be a single matrix or many matrices, one for each group
 summarize_results – (bool) will provide a formatted summary of model results
 nan_replace – (bool) will replace nan values with row and column means
Returns: (pd.Series/pd.DataFrame) all of the effects estimated using SRM
Return type: estimated effects

stats_label_distance(labels=None, n_permute=5000, n_jobs=1)[source]¶ Calculate permutation tests on within- and between-label distance.
Parameters:  labels (np.array) – numpy array of labels to plot
 n_permute (int) – number of permutations to run (default=5000)
Returns: dictionary of within- and between-group differences and p-values
Return type: dict

std(axis=0)[source]¶ Calculate standard deviation of Adjacency
Parameters: axis – (int) calculate std over features (0) or data (1). For data, it will be over the upper triangle.
Returns: float if single matrix, Adjacency if axis=0, np.array if axis=1 and multiple matrices
Return type: std

threshold(upper=None, lower=None, binarize=False)[source]¶ Threshold Adjacency instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.
Parameters:  upper – (float or str) Upper cutoff for thresholding. If a string, will be interpreted as a percentile; can be None for one-sided thresholding.
 lower – (float or str) Lower cutoff for thresholding. If a string, will be interpreted as a percentile; can be None for one-sided thresholding.
 binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False
Returns: thresholded Adjacency instance

ttest(permutation=False, **kwargs)[source]¶ Calculate t-test across samples.
Parameters: permutation – (bool) run t-test as a permutation test. Note: this can be very slow.
Returns: (dict) contains Adjacency instances of t values (or mean if running permutation) and an Adjacency instance of p values.
Return type: out

class nltools.data.Design_Matrix(*args, **kwargs)[source]¶
Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and intelligent, flexible appending (e.g. automatically keeping certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data's X attribute. It is essentially an enhanced pandas DataFrame, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.
Parameters:  sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging) use hertz = 1 / TR
 convolved (list, optional) – on what columns convolution has been performed; defaults to None
 polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None
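Example (a minimal sketch with hypothetical regressor names and a 2 s TR; Design_Matrix accepts standard pandas DataFrame arguments such as columns):
>>> import numpy as np
>>> from nltools.data import Design_Matrix
>>> dm = Design_Matrix(np.random.rand(100, 2), columns=['face', 'house'], sampling_freq=1 / 2.0)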

add_dct_basis(duration=180, drop=0)[source]¶ Adds unit-scaled cosine basis functions to Design_Matrix columns, based on an SPM-style discrete cosine transform, for use in high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly correlated.
Parameters:  duration (int) – length of filter in seconds
 drop (int) – index of which early/slow bases to drop, if any; will always drop the constant (i.e. intercept) like SPM. Unlike SPM, retains the first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of the index provided (e.g. 2 drops bases 1 and 2); default 0

add_poly(order=0, include_lower=True)[source]¶ Add nth-order Legendre polynomial terms as columns to design matrix. Good for adding a constant/intercept to the model (order = 0) and accounting for slow-frequency nuisance artifacts, e.g. linear, quadratic, etc. drifts. Care is recommended when using this with .add_dct_basis(), as some columns will be highly correlated.
Parameters:  order (int) – what order terms to add; 0 = constant/intercept (default), 1 = linear, 2 = quadratic, etc
 include_lower – (bool) whether to add lower-order terms if order > 0
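Example (a minimal sketch continuing from the Design_Matrix above):
>>> dm = dm.add_poly(order=2)            # intercept, linear, and quadratic trends
>>> dm = dm.add_dct_basis(duration=180)  # SPM-style high-pass filter basis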

append(dm, axis=0, keep_separate=True, unique_cols=[], fill_na=0, verbose=False)[source]¶ Method for concatenating another design matrix row- or column-wise. When concatenating row-wise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be kept separate by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatenation (when axis == 0). This will by default create new polynomial terms separately for each design matrix.
Parameters:  dm (Design_Matrix or list) – design matrix or list of design matrices to append
 axis (int) – 0 for row-wise (vertcat), 1 for column-wise (horzcat); default 0
 keep_separate (bool, optional) – whether to try to uniquify columns; defaults to True; only applies when axis == 0
 unique_cols (list, optional) – what additional columns to try to keep separated by uniquifying; only applies when axis == 0; defaults to None
 fill_na (str/int/float) – if provided, will fill NaNs with this value during row-wise appending (when axis == 0) if separate columns are desired; default 0
 verbose (bool) – print messages during append about how polynomials are going to be separated
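Example (a minimal sketch appending two hypothetical runs with separate intercepts):
>>> run1 = dm.add_poly(order=0)
>>> run2 = dm.add_poly(order=0)
>>> combined = run1.append(run2, axis=0)  # polynomial columns are kept separate per run by default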

clean(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]¶ Method to fill NaNs in Design_Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.
Parameters:  fill_na (str/int/float) – value to fill NaNs with; set to None to retain NaNs; default 0
 exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False
 thresh (float) – correlation threshold to use to drop redundant columns; default .95
 verbose (bool) – print what column names were dropped; default True

convolve(conv_func='hrf', columns=None)[source]¶ Perform convolution using an arbitrary function.
Parameters:  conv_func (ndarray or string) – either a 1d numpy array containing the output of a function that you want to convolve; a samples-by-kernel 2d array of several kernels to convolve; or the string 'hrf', which defaults to a Glover HRF function at the Design_Matrix's sampling_freq
 columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns
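Example (a minimal sketch; convolves all non-polynomial columns with the Glover HRF):
>>> dm_conv = dm.convolve(conv_func='hrf')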

downsample(target, **kwargs)[source]¶ Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that the returned object is a design matrix.
Parameters:  target (float) – desired frequency in hz
 kwargs – additional inputs to nltools.stats.downsample

heatmap(figsize=(8, 6), **kwargs)[source]¶ Visualize Design_Matrix SPM-style. Use .plot() for typical pandas plotting functionality. Can pass optional keyword args to seaborn heatmap.

replace_data(data, column_names=None)[source]¶ Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.
Parameters: column_names (list) – list of column names for new data

upsample(target, **kwargs)[source]¶ Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that the returned object is a design matrix.
Parameters:  target (float) – desired frequency in hz
 kwargs – additional inputs to nltools.stats.upsample

vif(exclude_polys=True)[source]¶ Compute variance inflation factor among columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).
Parameters: exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default True
Returns: list with length == number of columns - intercept
Return type: vifs (list)
nltools.analysis: Analysis Tools¶

class nltools.analysis.Roc(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]¶ Roc Class
The Roc class is based on Tor Wager's Matlab roc_plot.m function and allows a user to easily run different types of receiver operating characteristic curves. For example, one might be interested in single-interval or forced-choice classification.
Parameters:  input_values – nibabel data instance
 binary_outcome – vector of training labels
 threshold_type – ['optimal_overall', 'optimal_balanced', 'minimum_sdt_bias']
 **kwargs – Additional keyword arguments to pass to the prediction algorithm
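Example (a minimal sketch; scores and outcome are hypothetical continuous values and boolean labels):
>>> from nltools.analysis import Roc
>>> roc = Roc(input_values=scores, binary_outcome=outcome)
>>> roc.calculate()
>>> roc.plot()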

calculate(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]¶ Calculate receiver operating characteristic (ROC) plot for single-interval classification.
Parameters:  input_values – nibabel data instance
 binary_outcome – vector of training labels
 criterion_values – (optional) criterion values for calculating fpr & tpr
 threshold_type – ['optimal_overall', 'optimal_balanced', 'minimum_sdt_bias']
 forced_choice – index indicating position for each unique subject (default=None)
 balanced_acc – balanced accuracy for single-interval classification (bool). THIS IS NOT COMPLETELY IMPLEMENTED BECAUSE IT AFFECTS ACCURACY ESTIMATES, BUT NOT P-VALUES OR THE THRESHOLD AT WHICH TO EVALUATE SENS/SPEC
 **kwargs – Additional keyword arguments to pass to the prediction algorithm

plot(plot_method='gaussian', balanced_acc=False, **kwargs)[source]¶ Create ROC plot
Create a specific kind of ROC curve plot, based on input values along a continuous distribution and a binary outcome variable (logical)
Parameters:  plot_method – type of plot ['gaussian','observed']
 binary_outcome – vector of training labels
 **kwargs – Additional keyword arguments to pass to the prediction algorithm
Returns: fig
nltools.stats: Stats Tools¶

nltools.stats.pearson(x, y)[source]¶ Correlates row vector x with each row vector in 2D array y. From neurosynth.stats.py - author: Tal Yarkoni

nltools.stats.zscore(df)[source]¶ zscore every column in a pandas DataFrame or Series.
Parameters: df – (pd.DataFrame) Pandas DataFrame instance
Returns: (pd.DataFrame) z-scored pandas DataFrame or Series instance
Return type: z_data

nltools.stats.fdr(p, q=0.05)[source]¶ Determine FDR threshold given a p-value array and desired false discovery rate q. Written by Tal Yarkoni
Parameters:  p – (np.array) vector of p-values (only considers non-zero p-values)
 q – (float) false discovery rate level
Returns: (float) p-value threshold based on independence or positive dependence
Return type: fdr_p
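Example (a minimal sketch with hypothetical p-values):
>>> import numpy as np
>>> from nltools.stats import fdr
>>> p = np.random.uniform(size=1000)
>>> p_thresh = fdr(p, q=0.05)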

nltools.stats.holm_bonf(p, alpha=0.05)[source]¶ Compute corrected p-values based on the Holm-Bonferroni method, i.e. a step-down procedure applying iteratively less correction to the highest p-values. A bit more conservative than FDR, but much more powerful than vanilla Bonferroni.
Parameters:  p – (np.array) vector of p-values
 alpha – (float) alpha level
Returns: (float) p-value threshold based on the Bonferroni step-down procedure
Return type: bonf_p

nltools.stats.threshold(stat, p, thr=0.05, return_mask=False)[source]¶ Threshold test image by p-value from p image
Parameters:  stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
 p – (Brain_Data) Brain_Data instance of p-values
 thr – (float) p-value to threshold stat image
 return_mask – (bool) optionally return the thresholding mask; default False
Returns: Thresholded Brain_Data instance
Return type: out

nltools.stats.multi_threshold(t_map, p_map, thresh)[source]¶ Threshold test image by multiple p-values from p image
Parameters:  t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
 p_map – (Brain_Data) Brain_Data instance of p-values
 thresh – (list) list of p-values to threshold stat image
Returns: Thresholded Brain_Data instance
Return type: out

nltools.stats.winsorize(data, cutoff=None, replace_with_cutoff=True)[source]¶ Winsorize a Pandas DataFrame or Series with the largest/lowest value not considered an outlier
Parameters:  data – (pd.DataFrame, pd.Series) data to winsorize
 cutoff – (dict) a dictionary with keys {'std':[low,high]} or {'quantile':[low,high]}
 replace_with_cutoff – (bool) If True, replace outliers with cutoff. If False, replace outliers with closest existing values (default: True)
Returns: (pd.DataFrame, pd.Series) winsorized data
Return type: out

nltools.stats.trim(data, cutoff=None)[source]¶ Trim a Pandas DataFrame or Series by replacing outlier values with NaNs
Parameters:  data – (pd.DataFrame, pd.Series) data to trim
 cutoff – (dict) a dictionary with keys {'std':[low,high]} or {'quantile':[low,high]}
Returns: (pd.DataFrame, pd.Series) trimmed data
Return type: out

nltools.stats.calc_bpm(beat_interval, sampling_freq)[source]¶ Calculate instantaneous BPM from beat-to-beat interval
Parameters:  beat_interval – (int) number of samples in between each beat (typically the RR interval)
 sampling_freq – (float) sampling frequency in Hz
Returns: (float) beats per minute for time interval
Return type: bpm

nltools.stats.downsample(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]¶ Downsample pandas to a new target frequency or number of samples using averaging.
Parameters:  data – (pd.DataFrame, pd.Series) data to downsample
 sampling_freq – (float) sampling frequency of data in hertz
 target – (float) downsampling target
 target_type – type of target; can be [samples,seconds,hz]
 method – (str) type of downsample method ['mean','median']; default: mean
Returns: (pd.DataFrame, pd.Series) downsampled data
Return type: out
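Example (a minimal sketch downsampling a hypothetical 100 Hz signal to 10 Hz):
>>> import numpy as np
>>> import pandas as pd
>>> from nltools.stats import downsample
>>> ts = pd.DataFrame(np.random.randn(1000, 3))
>>> low = downsample(ts, sampling_freq=100, target=10, target_type='hz', method='mean')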

nltools.stats.upsample(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]¶ Upsample pandas to a new target frequency or number of samples using interpolation.
Parameters:  data – (pd.DataFrame, pd.Series) data to upsample (Note: will drop non-numeric columns from DataFrame)
 sampling_freq – sampling frequency of data in hertz
 target – (float) upsampling target
 target_type – (str) type of target; can be [samples,seconds,hz]
 method – (str) ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'], where 'zero', 'slinear', 'quadratic' and 'cubic' refer to a spline interpolation of zeroth, first, second or third order (default: linear)
Returns: upsampled pandas object

nltools.stats.one_sample_permutation(data, n_permute=5000, tail=2, n_jobs=1, random_state=None)[source]¶ One-sample permutation test using randomization.
Parameters:  data – (pd.DataFrame, pd.Series, np.array) data to permute
 n_permute – (int) number of permutations
 tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: (dict) dictionary of permutation results [‘mean’,’p’]
Return type: stats

nltools.stats.two_sample_permutation(data1, data2, n_permute=5000, tail=2, n_jobs=1, random_state=None)[source]¶ Independent-sample permutation test.
Parameters:  data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
 data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
 n_permute – (int) number of permutations
 tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: (dict) dictionary of permutation results [‘mean’,’p’]
Return type: stats

nltools.stats.correlation_permutation(data1, data2, n_permute=5000, metric='spearman', tail=2, n_jobs=1, random_state=None)[source]¶ Permute correlation.
Parameters:  data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
 data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
 n_permute – (int) number of permutations
 metric – (str) type of association metric ['spearman','pearson','kendall']
 tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: (dict) dictionary of permutation results [‘correlation’,’p’]
Return type: stats
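Example (a minimal sketch with hypothetical paired measurements):
>>> import numpy as np
>>> from nltools.stats import correlation_permutation
>>> x, y = np.random.randn(50), np.random.randn(50)
>>> stats = correlation_permutation(x, y, metric='spearman', n_permute=5000)
>>> stats['correlation'], stats['p']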

nltools.stats.matrix_permutation(data1, data2, n_permute=5000, metric='spearman', tail=2, n_jobs=1, random_state=None)[source]¶ Permute 2-dimensional matrix correlation (Mantel test).
Chen, G. et al. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. Neuroimage, 142, 248-259.
Parameters:  data1 – (pd.DataFrame, np.array) square matrix
 data2 – (pd.DataFrame, np.array) square matrix
 n_permute – (int) number of permutations
 metric – (str) type of association metric ['spearman','pearson','kendall']
 tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: (dict) dictionary of permutation results [‘correlation’,’p’]
Return type: stats

nltools.stats.jackknife_permutation(data1, data2, metric='spearman', p_value='permutation', n_jobs=1, n_permute=5000, tail=2, random_state=None)[source]¶ This function uses a randomization test on a jackknife of the absolute distance/similarity of each subject
Parameters:  data1 – (Adjacency, pd.DataFrame, np.array) square matrix
 data2 – (Adjacency, pd.DataFrame, np.array) square matrix
 metric – (str) type of association metric ['spearman','pearson','kendall']
 tail – (int) either 1 for one-tailed or 2 for two-tailed test (default: 2)
 p_value – ['ttest', 'permutation']
 n_permute – (int) number of permutations
 n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns: (dict) dictionary of permutation results [‘correlation’,’p’]
Return type: stats

nltools.stats.make_cosine_basis(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]¶ Create a series of cosine basis functions for a discrete cosine transform. Based on the implementation in spm_filter and spm_dctmtx, because scipy dct can only apply transforms but not return the basis functions. Like SPM, does not add a constant (i.e. intercept), but does retain the first basis (i.e. sigmoidal/linear drift)
Parameters:  nsamples (int) – number of observations (e.g. TRs)
 sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)
 filter_length (int) – length of filter in seconds
 unit_scale (true) – assure that the basis functions are on the normalized range [-1, 1]; default True
 drop (int) – index of which early/slow bases to drop, if any; default is to drop the constant (i.e. intercept) like SPM. Unlike SPM, retains the first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of the index provided (e.g. 2 drops bases 1 and 2)
Returns: nsamples x number of basis sets numpy array
Return type: out (ndarray)

nltools.stats.summarize_bootstrap(data, save_weights=False)[source]¶ Calculate summary of bootstrap samples
Parameters:  data – (Brain_Data) Brain_Data instance of samples
 save_weights – (bool) save bootstrap weights
Returns: (dict) dictionary of Brain_Data summary images
Return type: output

nltools.stats.regress(X, Y, mode='ols', stats='full', **kwargs)[source]¶ This is a flexible function to run several types of regression models provided X and Y numpy arrays. Y can be a 1d numpy array or a 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.
Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.
This function can compute regression in 3 ways: 1) Standard OLS. 2) OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:
 'hc0' - classic Huber-White estimator robust to heteroscedasticity (default)
 'hc3' - a variant on the Huber-White estimator that is slightly more conservative when sample sizes are small
 'hac' - an estimator robust to both heteroscedasticity and autocorrelation; the autocorrelation lag can be controlled with the 'nlags' keyword argument; default is 1
3) ARMA (autoregressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array, joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don't require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend='multiprocessing', to prevent crashes and hangs. Various levels of progress can be monitored using the 'disp' (statsmodels) and 'verbose' (joblib) keyword arguments with integer values > 0.
Examples
Standard OLS
>>> results = regress(X,Y,mode='ols')
Robust OLS with heteroscedasticity (hc0) robust standard errors
>>> results = regress(X,Y,mode='robust')
Robust OLS with heteroscedasticity and autocorrelation (with lag 2) robust standard errors
>>> results = regress(X,Y,mode='robust',robust_estimator='hac',nlags=2)
Autoregressive model with autoregressive and moving-average lags = 1
>>> results = regress(X,Y,mode='arma',order=(1,1))
Autoregressive model with autoregressive lag = 2, moving-average lag = 3, and multiprocessing instead of multithreading using 8 cores (this can use a lot of memory if input arrays are very large!).
>>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)
Parameters:  X (ndarray) – design matrix; assumes intercept is included
 Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately
 mode (str) – kind of model to fit; must be one of 'ols' (default), 'robust', or 'arma'
 robust_estimator (str, optional) – kind of robust estimator to use if mode = 'robust'; default 'hc0'
 nlags (int, optional) – autocorrelation lag correction if mode = 'robust' and robust_estimator = 'hac'; default 1
 order (tuple, optional) – autoregressive and moving-average orders for mode = 'arma'; default (1,1)
 kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel
Returns: b: coefficients; t: t-statistics (coef/sterr); p: p-values; df: degrees of freedom; res: residuals
Return type: b, t, p, df, res

nltools.stats.procrustes(data1, data2)[source]¶ Procrustes analysis, a similarity test for two data sets.
Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that:
 \(tr(AA^{T}) = 1\)
 Both sets of points are centered around the origin.
Procrustes ([1], [2]) then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize \(M^{2}=\sum(data1-data2)^{2}\), the sum of the squares of the pointwise differences between the two input datasets. This function was not designed to handle datasets with different numbers of datapoints (rows). If two data sets have different dimensionality (different number of columns), this function will add columns of zeros to the smaller of the two.
Parameters:  data1 – array_like. Matrix; n rows represent points in k (columns) space. data1 is the reference data: after it is standardized, the data from data2 will be transformed to fit the pattern in data1 (must have > 1 unique points).
 data2 – array_like. n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 (must have > 1 unique points).
Returns:
 mtx1 : array_like. A standardized version of data1.
 mtx2 : array_like. The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).
 disparity : float. \(M^{2}\) as defined above.
 R : (N, N) ndarray. The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.
 scale : float. Sum of the singular values of dot(data1.T, data2).
Return type: mtx1, mtx2, disparity, R, scale

nltools.stats.procrustes_distance(mat1, mat2, n_permute=5000, tail=2, n_jobs=1, random_state=None)[source]¶ Use procrustes superposition to perform a similarity test between two matrices. Matrices need to match in size on their first dimension only; the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).
Parameters:  mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2
 mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1
 n_permute (int) – number of permutation iterations to perform
 tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2
 n_jobs (int) – The number of CPUs to use for permutation; default 1
Returns: similarity (float): similarity between matrices, bounded between 0 and 1; pval (float): permuted p-value
Return type: similarity, pval

nltools.stats.align(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]¶ Align subject data into a common response model.
Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or the Shared Response Model from Princeton (see nltools.external.srm). (See nltools.data.Brain_Data.align for aligning a single Brain_Data object to another.) The common model is the shared response model or centered target data. Transformed data can be back-projected to the original data using the transformation matrix.
Examples
 Hyperalign using procrustes transform:
 out = align(data, method='procrustes')
 Align using shared response model:
 out = align(data, method='probabilistic_srm', n_features=None)
 Project aligned data back into original space:
 original_data = [np.dot(t.data, tm.T) for t, tm in zip(out['transformed'], out['transformation_matrix'])]
Parameters:  data – (list) a list of Brain_Data objects
 method – (str) alignment method to use ['probabilistic_srm','deterministic_srm','procrustes']
 n_features – (int) number of features to align to common space. If None, the number of voxels is used
 axis – (int) axis to align on
Returns: (dict) a dictionary containing a list of transformed subject matrices, a list of transformation matrices, the shared response matrix, and the inter-subject correlation of the shared responses
Return type: out

nltools.stats.find_spikes(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶ Function to identify spikes from fMRI time-series data
Parameters:  data – Brain_Data or nibabel instance
 global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.
 diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.
Returns: pandas dataframe with spikes as indicator variables

nltools.stats.correlation(data1, data2, metric='pearson')[source]¶ This function calculates the correlation between data1 and data2
Parameters:  data1 – (np.array) x
 data2 – (np.array) y
 metric – (str) type of correlation ['spearman', 'pearson', 'kendall']
Returns: r: (np.array) correlation; p: (float) p-value
Return type: r, p

nltools.stats.distance_correlation(x, y, bias_corrected=True, ttest=False)[source]¶ Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or non-linear). Arrays must match on their first dimension. It's almost always preferable to compute the bias-corrected version, which can also optionally perform a t-test. This t-test operates on a statistic that is approximately dcorr^2 and will also be returned.
Explanation: Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance, either using double-centering or U-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, U-centering can produce negative estimates, which are never significant.
Validated against the dcor and dcor.ttest functions in the 'energy' R package, and the dcor.distance_correlation, dcor.u_distance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.
Parameters:  x (ndarray) – 1d or 2d numpy array of observations by features
 y (ndarray) – 1d or 2d numpy array of observations by features
 bias_corrected (bool) – if False, use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases; otherwise use U-centering to correct this bias. Note this must be True if ttest=True; default True
 ttest (bool) – perform a t-test using the bias-corrected distance correlation; default False
Returns: dictionary of results (correlation, t, p, and df); optionally covariance, x variance, and y variance
Return type: results (dict)
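Example (a minimal sketch; x and y are hypothetical arrays matching on the first dimension):
>>> import numpy as np
>>> from nltools.stats import distance_correlation
>>> x = np.random.randn(100, 5)
>>> y = np.random.randn(100, 3)
>>> results = distance_correlation(x, y, bias_corrected=True, ttest=True)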

nltools.stats.transform_pairwise(X, y)[source]¶ Transforms data into pairs with balanced labels for ranking. Transforms an n-class ranking problem into a two-class classification problem. Subclasses implementing particular strategies for choosing pairs should override this method. In this method, all pairs are chosen, except for those that have the same target value. The output is an array of balanced classes, i.e. there are the same number of -1 as +1.
Reference: "Large Margin Rank Boundaries for Ordinal Regression", R. Herbrich, T. Graepel, K. Obermayer.
Authors: Fabian Pedregosa <fabian@fseoane.net>, Alexandre Gramfort <alexandre.gramfort@inria.fr>
Parameters:  X – (np.array), shape (n_samples, n_features). The data.
 y – (np.array), shape (n_samples,) or (n_samples, 2). Target labels. If it's a 2D array, the second column represents the grouping of samples, i.e., samples with different groups will not be considered.
Returns:
 X_trans: (np.array), shape (k, n_features). Data as pairs, where k = n_samples * (n_samples - 1) / 2 if grouping values were not passed. If grouping variables exist, then returns values computed for each group.
 y_trans: (np.array), shape (k,). Output class labels, where classes have values {-1, +1}. If y was shape (n_samples, 2), then returns (k, 2) with groups on the second dimension.
Return type: X_trans, y_trans
nltools.datasets: Dataset Tools¶
NeuroLearn datasets¶
Functions to help download datasets.

nltools.datasets.get_collection_image_metadata(collection=None, data_dir=None, limit=10)[source]¶ Get image metadata associated with collection
Args:
 collection: (int) collection id
 data_dir: (str) data directory
 limit: (int) number of images to increment
Returns: metadata: (pd.DataFrame) DataFrame with full image metadata from collection

nltools.datasets.download_collection(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]¶ Download images and metadata from a Neurovault collection
Args:
 collection: (int) collection id
 data_dir: (str) data directory
Returns: metadata: (pd.DataFrame) DataFrame with full image metadata from collection; files: (list) list of files of downloaded collection

nltools.datasets.fetch_emotion_ratings(data_dir=None, resume=True, verbose=1)[source]¶ Download and load emotion rating dataset from Neurovault
Parameters: data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None
Returns: (Brain_Data) Brain_Data object with downloaded data. X=metadata
Return type: out

nltools.datasets.fetch_pain(data_dir=None, resume=True, verbose=1)[source]¶ Download and load pain dataset from Neurovault
Parameters: data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None
Returns: (Brain_Data) Brain_Data object with downloaded data. X=metadata
Return type: out

nltools.datasets.fetch_localizer(subject_ids=None, get_anats=False, data_type='raw', data_dir=None, url=None, resume=True, verbose=1)[source]¶ Download and load Brainomics Localizer dataset (94 subjects). "The Functional Localizer is a simple and fast acquisition procedure based on a 5-minute functional magnetic resonance imaging (fMRI) sequence that can be run as easily and as systematically as an anatomical scan. This protocol captures the cerebral bases of auditory and visual perception, motor actions, reading, language comprehension and mental calculation at an individual level. Individual functional maps are reliable and quite precise. The procedure is described in more detail on the Functional Localizer page." This code is modified from fetch_localizer_contrasts from nilearn.datasets.funcs.py. (See http://brainomics.cea.fr/localizer/.) "Scientific results obtained using this dataset are described in Pinel et al., 2007" [1]
Notes: It is better to perform several small requests than a big one because the Brainomics server has no cache (a big request can lead to a timeout while the archive is generated on the remote server). For example, download n_subjects=np.array(1,10), then n_subjects=np.array(10,20), etc.
Parameters:  subject_ids – (list) List of subject IDs (e.g., ['S01','S02']). If None is given, all 94 subjects are used.
 get_anats – (boolean) Whether individual structural images should be fetched or not.
 data_type – (string) type of data to download. Valid values are ['raw','preprocessed']
 data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location.
 url – (string, optional) Override download URL. Used for testing only (or if you set up a mirror of the data).
 resume – (bool) Whether to resume download of a partly-downloaded file.
 verbose – (int) Verbosity level (0 means no message).
Returns: (Bunch) dictionary-like object; the attributes of interest are:
 'functional': string list. Paths to nifti contrast maps.
 'structural': string. Path to nifti files corresponding to the subjects' structural images.
Return type: data
References
Pinel, Philippe, et al. “Fast reproducible identification and large-scale databasing of individual functional cognitive networks.” BMC Neuroscience 8.1 (2007): 91.
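Example (a minimal sketch following the batching advice above; the subject ids are illustrative):
from nltools.datasets import fetch_localizer
data = fetch_localizer(subject_ids=['S01', 'S02'], get_anats=True, data_type='preprocessed')
print(data['functional'])  # paths to nifti contrast maps
print(data['structural'])  # paths to structural images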
nltools.cross_validation
: Cross-Validation Tools¶

class
nltools.cross_validation.
KFoldStratified
(n_splits=3, shuffle=False, random_state=None)[source]¶ K-Folds cross-validation iterator which stratifies continuous data (unlike the scikit-learn equivalent).
Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds while ensuring that the same subject is held out within each fold. Each fold is then used as a validation set once while the k - 1 remaining folds form the training set. Extension of KFold from the scikit-learn cross_validation module.
Parameters:  n_splits – int, default=3 Number of folds. Must be at least 2.
 shuffle – boolean, optional Whether to shuffle the data before splitting into batches.
 random_state – None, int or RandomState. Pseudo-random number generator state used for random sampling. If None, use the default numpy RNG for shuffling.

split
(X, y, groups=None)[source]¶ Generate indices to split data into training and test set.
Parameters:  X – array-like, shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
 y – array-like, shape (n_samples,). The target variable for supervised learning problems. Stratification is done based on the y labels.
 groups – (object) Always ignored, exists for compatibility.
Returns:  train – (ndarray) the training set indices for that split
 test – (ndarray) the testing set indices for that split
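Example (a minimal sketch; as noted above, np.zeros can stand in for X because only y drives the splits):
import numpy as np
import pandas as pd
from nltools.cross_validation import KFoldStratified
y = pd.DataFrame(np.random.randn(100))  # continuous outcome to stratify on
X = np.zeros(len(y))                    # placeholder for X
kf = KFoldStratified(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X, y):
    print(len(train_idx), len(test_idx))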

nltools.cross_validation.
set_cv
(Y=None, cv_dict=None, return_generator=True)[source]¶ Helper function to create a scikit-learn compatible cv object using common parameters for prediction analyses.
Parameters:  Y – (pd.DataFrame) Pandas Dataframe of Y labels
 cv_dict – (dict) Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout}
 return_generator (bool) – return a cv generator instead of an instance; default True
Returns: a scikit-learn model selection generator
Return type: cv
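Example (a minimal sketch of the stratified k-folds form of cv_dict):
import numpy as np
import pandas as pd
from nltools.cross_validation import set_cv
Y = pd.DataFrame(np.random.randn(50))
cv = set_cv(Y=Y, cv_dict={'type': 'kfolds', 'n_folds': 5, 'stratified': Y})
for train, test in cv:
    print(len(train), len(test))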

nltools.mask
: Mask Tools¶
NeuroLearn Mask Classes¶
Classes to represent masks

nltools.mask.
create_sphere
(coordinates, radius=5, mask=None)[source]¶ Generate a set of spheres in the brain mask space
Parameters:  coordinates – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
 radius – vector of radius. Will create multiple spheres if len(radius) > 1
 mask – (optional) binary mask image defining the space in which spheres are created
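Example (the MNI coordinates below are illustrative, not canonical seed regions):
from nltools.mask import create_sphere
sphere = create_sphere([0, -52, 26], radius=10)                          # single 10 mm sphere
spheres = create_sphere([[0, -52, 26], [-26, -70, -8]], radius=[10, 8])  # one radius per center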

nltools.mask.
expand_mask
(mask, custom_mask=None)[source]¶ expand a mask with multiple integers into separate binary masks
Parameters:  mask – nibabel or Brain_Data instance
 custom_mask – nibabel instance or string to file path; optional
Returns: Brain_Data instance of multiple binary masks
Return type: out

nltools.mask.
collapse_mask
(mask, auto_label=True, custom_mask=None)[source]¶ Collapse separate masks into one mask with multiple integers; overlapping areas are ignored.
Parameters:  mask – nibabel or Brain_Data instance
 custom_mask – nibabel instance or string to file path; optional
Returns:  Brain_Data instance of a mask with different integers indicating
different masks
Return type: out

nltools.mask.
roi_to_brain
(data, mask_x)[source]¶ This function converts a vector of values into a Brain_Data instance using an expanded binary mask of ROIs (see expand_mask). The values must correspond to ROI numbers.
This is useful for populating a parcellation scheme with a vector of values.
Parameters:  data – Pandas series or dataframe of ROI by observation
 mask_x – an expanded binary mask
Returns:  (Brain_Data) Brain_Data instance where each ROI is now populated
with a value
Return type: out
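Example combining expand_mask and roi_to_brain (the parcellation path is a placeholder):
import pandas as pd
from nltools.data import Brain_Data
from nltools.mask import expand_mask, roi_to_brain
atlas = Brain_Data('parcellation.nii.gz')   # integer-labeled parcellation (placeholder path)
mask_x = expand_mask(atlas)                 # one binary mask per ROI
roi_values = pd.Series(range(len(mask_x)))  # one value per ROI, ordered by ROI number
stat_map = roi_to_brain(roi_values, mask_x)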
nltools.file_reader
: File Reading¶
NeuroLearn File Reading Tools¶

nltools.file_reader.
onsets_to_dm
(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=[], fill_na=None, **kwargs)[source]¶ This function can assist in reading in one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples X Stimulus Classes. Onsets files must be organized with columns in one of the following 4 formats:
 ‘Stim, Onset’
 ‘Onset, Stim’
 ‘Stim, Onset, Duration’
 ‘Onset, Duration, Stim’
No other file organizations are currently supported
Parameters:  F (filepath/DataFrame/list) – path to file, pandas dataframe, or list of files or pandas dataframes
 sampling_freq (float) – sampling frequency in hertz; for TRs use (1 / TR)
 run_length (int) – number of TRs in the run these onsets came from
 sort (bool, optional) – whether to sort the columns of the resulting design matrix alphabetically; defaults to False
 add_poly (int, optional) – what order polynomial terms to add as new columns (e.g., 0 for intercept, 1 for linear trend and intercept, etc.); defaults to None
 header (str, optional) – None if missing header, otherwise pandas header keyword; defaults to ‘infer’
 keep_separate (bool) – whether to keep polynomial columns separate when reading a list of files and using the add_poly option
 unique_cols (list) – additional columns to keep separate across files (e.g., spikes)
 fill_na (str/int/float) – value to fill NaNs with when reading in a list of files
 kwargs – additional inputs to pandas.read_csv
Returns: (Design_Matrix) design matrix organized as samples X Stimulus Classes
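Example (a minimal sketch; 'onsets.csv' is a placeholder for a 2-3 column onsets file and the run parameters are illustrative):
from nltools.file_reader import onsets_to_dm
# With a 2 s TR, sampling_freq = 1 / 2.0; run_length is the number of TRs in the run
dm = onsets_to_dm('onsets.csv', sampling_freq=1 / 2.0, run_length=200, add_poly=1)
print(dm.shape)  # samples X stimulus classes (plus intercept and linear trend columns)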
nltools.utils
: Utilities¶
NeuroLearn Utilities¶
handy utilities.

nltools.utils.
get_anatomical
()[source]¶ Get nltools default anatomical image. DEPRECATED. See MNI_Template and resolve_mni_path from nltools.prefs

nltools.utils.
set_algorithm
(algorithm, *args, **kwargs)[source]¶ Setup the algorithm to use in subsequent prediction analyses.
Parameters:  algorithm – The prediction algorithm to use. Either a string or an (uninitialized) scikit-learn prediction object. If string, must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘lassopcr’, ‘lassoCV’, ‘ridge’, ‘ridgeCV’, ‘ridgeClassifier’, ‘randomforest’, or ‘randomforestClassifier’
 kwargs – Additional keyword arguments to pass onto the scikit-learn prediction object.
Returns: dictionary of settings for prediction
Return type: predictor_settings
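Example (keyword arguments are forwarded to the underlying scikit-learn estimator):
from nltools.utils import set_algorithm
predictor_settings = set_algorithm('ridge', alpha=0.5)  # alpha passes through to the sklearn Ridge object
print(predictor_settings)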

nltools.utils.
set_decomposition_algorithm
(algorithm, n_components=None, *args, **kwargs)[source]¶ Setup the algorithm to use in subsequent decomposition analyses.
Parameters:  algorithm – The decomposition algorithm to use. Either a string or an (uninitialized) scikit-learn decomposition object. If string, must be one of ‘pca’, ‘nnmf’, ‘ica’, ‘fa’
 kwargs – Additional keyword arguments to pass onto the scikit-learn decomposition object.
Returns: dictionary of settings for decomposition
Return type: predictor_settings
nltools.plotting
: Plotting Tools¶
NeuroLearn Plotting Tools¶
Numerous functions to plot data

nltools.plotting.
dist_from_hyperplane_plot
(stats_output)[source]¶ Plot SVM Classification Distance from Hyperplane
Parameters: stats_output – a pandas DataFrame with prediction output Returns: a seaborn plot of distance from hyperplane Return type: fig

nltools.plotting.
scatterplot
(stats_output)[source]¶ Plot Prediction Scatterplot
Parameters: stats_output – a pandas DataFrame with prediction output Returns: a seaborn scatterplot Return type: fig

nltools.plotting.
probability_plot
(stats_output)[source]¶ Plot Classification Probability
Parameters: stats_output – a pandas DataFrame with prediction output Returns: a seaborn scatterplot Return type: fig

nltools.plotting.
roc_plot
(fpr, tpr)[source]¶ Plot 1 - Specificity by Sensitivity
Parameters:  fpr – false positive rate from Roc.calculate
 tpr – true positive rate from Roc.calculate
Returns: Will return a matplotlib ROC plot
Return type: fig
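Example (the rates below are synthetic; in practice pass fpr/tpr from Roc.calculate):
import numpy as np
from nltools.plotting import roc_plot
fpr = np.linspace(0, 1, 20)
tpr = np.sqrt(fpr)        # toy curve above the chance diagonal
fig = roc_plot(fpr, tpr)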

nltools.plotting.
plot_stacked_adjacency
(adjacency1, adjacency2, normalize=True, **kwargs)[source]¶ Create stacked adjacency to illustrate similarity.
Parameters:  matrix1 – Adjacency instance 1
 matrix2 – Adjacency instance 2
 normalize – (boolean) Normalize matrices.
Returns: matplotlib figure

nltools.plotting.
plot_mean_label_distance
(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]¶ Create a violin plot indicating within and between label distance.
Parameters:  distance – pandas dataframe of distance
 labels – labels indicating columns and rows to group
 ax – matplotlib axis to plot on
 permutation_test – (bool) indicates whether to run a permutation test
 n_permute – (int) number of permutations to run
 fontsize – (int) fontsize for plot labels
Returns:  f – heatmap
 stats – (optional, if permutation_test=True) permutation results

nltools.plotting.
plot_between_label_distance
(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]¶ Create a heatmap indicating average between label distance
Parameters:  distance – (pandas dataframe) brain_distance matrix
 labels – (pandas dataframe) group labels
 ax – axis to plot (default=None)
 permutation_test – (boolean)
 n_permute – (int) number of samples for permutation test
 fontsize – (int) size of font for plot
Returns:  f – heatmap
 out – pandas dataframe of pairwise distance between conditions
 within_dist_out – average pairwise distance matrix
 mn_dist_out – (optional, if permutation_test=True) average difference in distance between conditions
 p_dist_out – (optional, if permutation_test=True) p-value for difference in distance between conditions

nltools.plotting.
plot_silhouette
(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶ Create a silhouette plot indicating between relative to within label distance
Parameters:  distance – (pandas dataframe) brain_distance matrix
 labels – (pandas dataframe) group labels
 ax – axis to plot (default=None)
 permutation_test – (boolean)
 n_permute – (int) number of samples for permutation test
 Optional keyword args:
 figsize – (list) dimensions of silhouette plot
 colors – (list) color triplets for silhouettes; length must equal the number of unique labels
Returns: silhouette plot
Return type: f

nltools.plotting.
plot_t_brain
(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]¶ Takes a brain data object and computes a one-sample t-test across its first axis. If a list is provided, will compute the difference between the brain data objects in the list (i.e., a paired-samples t-test).
Parameters:  objIn – (list/Brain_Data) if a list, will compute the difference map first
 how – (str) whether to plot a glass brain ‘glass’, 3-view multi-slice MNI ‘mni’, or both ‘full’
 thr – (str) method to use for multiple-comparisons correction: ‘unc’, ‘fdr’, or ‘tfce’
 alpha – (float) p-value threshold
 nperm – (int) number of permutations for tfce; default 1000
 cut_coords – (list) x,y,z coordinates of the brain slice to plot
 kwargs – optional args to nilearn plot functions (e.g., vmax)

nltools.plotting.
plot_brain
(objIn, how='full', thr_upper=None, thr_lower=None, **kwargs)[source]¶ More complete brain plotting of a Brain_Data instance.
Parameters:  objIn – (Brain_Data) object to plot
 how – (str) whether to plot a glass brain ‘glass’, 3-view multi-slice MNI ‘mni’, or both ‘full’
 thr_upper – (str/float) upper threshold for the image; can be a string percentage or a float in data units (see Brain_Data.threshold())
 thr_lower – (str/float) lower threshold for the image; can be a string percentage or a float in data units (see Brain_Data.threshold())
 kwargs – optional args to nilearn plot functions (e.g., vmax)
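Example (a minimal sketch; 'subject_betas.nii.gz' is a placeholder 4D file with one image per subject):
from nltools.data import Brain_Data
from nltools.plotting import plot_brain, plot_t_brain
data = Brain_Data('subject_betas.nii.gz')
plot_brain(data.mean(), how='mni', thr_upper='95%')     # show the top 5% of voxels
plot_t_brain(data, how='glass', thr='fdr', alpha=0.05)  # one-sample t-test, FDR-corrected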

nltools.plotting.
plot_interactive_brain
(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]¶ This function leverages nilearn’s JavaScript-based brain viewer to create interactive plotting functionality.
Parameters:  brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)
 threshold (float/str) – threshold to initialize the visualization; may be a percentile string; default 1e-06
 surface (bool) – whether to create a surfacebased plot; default False
 percentile_threshold (bool) – whether to interpret threshold values as percentiles
 kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
Returns: interactive brain viewer widget
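Example (the stats map path is a placeholder; the returned viewer renders inline in a Jupyter notebook):
from nltools.data import Brain_Data
from nltools.plotting import plot_interactive_brain
stats = Brain_Data('stats_map.nii.gz')
view = plot_interactive_brain(stats, threshold='95%')  # percentile-string threshold
view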
nltools.simulator
: Simulator Tools¶
NeuroLearn Simulator Tools¶
Tools to simulate multivariate data.

class
nltools.simulator.
Simulator
(brain_mask=None, output_dir=None)[source]¶ 
create_cov_data
(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]¶ create continuous simulated data with covariance
Parameters:  cor – amount of covariance between each voxel and Y variable
 cov – amount of covariance between voxels
 sigma – amount of noise to add
 mask – region where we will have activations
 reps – number of data repetitions
 n_sub – number of subjects to simulate
 output_dir – string path of directory to output data. If None, no data will be written

create_data
(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]¶ create simulated data with integers
Parameters:  levels – vector of intensities or class labels
 sigma – amount of noise to add
 radius – vector of radius. Will create multiple spheres if len(radius) > 1
 center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
 reps – number of data repetitions (useful for trials or subjects)
 output_dir – string path of directory to output data. If None, no data will be written
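Example (a minimal sketch; the sphere center is illustrative, and depending on the release the simulated data may be returned and/or stored on the simulator instance):
from nltools.simulator import Simulator
sim = Simulator()
# Two intensity levels, a 5 mm sphere of signal, 10 repetitions per level
dat = sim.create_data(levels=[1, 2], sigma=1, radius=5, center=[0, 0, 0], reps=10)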

create_ncov_data
(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]¶ create continuous simulated data with covariance
Parameters:  cor – amount of covariance between each voxel and Y variable (an int or a vector)
 cov – amount of covariance between voxels (an int or a matrix)
 sigma – amount of noise to add
 masks – region(s) where we will have activations (list if more than one)
 reps – number of data repetitions
 n_sub – number of subjects to simulate
 output_dir – string path of directory to output data. If None, no data will be written

gaussian
(mu, sigma, i_tot)[source]¶ create a 3D gaussian signal normalized to a given intensity
Parameters:  mu – average value of the gaussian signal (usually set to 0)
 sigma – standard deviation
 i_tot – sum total of activation (numerical integral over the gaussian returns this value)

n_spheres
(radius, center)[source]¶ generate a set of spheres in the brain mask space
Parameters:  radius – vector of radius. Will create multiple spheres if len(radius) > 1
 center – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

normal_noise
(mu, sigma)[source]¶ produce a normal noise distribution for all points in the brain mask
Parameters:  mu – average value of the gaussian signal (usually set to 0)
 sigma – standard deviation
