Graphs
- Slices Plot
The idea of 3D slices is that the surface is cut vertically into some number of parallel
slices. The response variable is always plotted on the vertical axis, Z, but the
surface is sliced at particular values of X or Y. These slices are then stacked on
top of each other and viewed in 2D.
Slicing a response surface is a great way to illustrate interactions among predictors.
If the shape of the slices changes with the position of the cutting plane, then the
model contains interactions. The model illustrated shows such interactions, in that the
shape of the curve depends on where the surface is sliced.
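The idea can be sketched numerically (a minimal Python illustration, not HyperNiche's implementation; the surface and its interaction term are invented for the example):

```python
# Sketch: slicing a response surface z = f(x, y) at fixed values of y.
# If the shape of z-vs-x changes from slice to slice, the predictors interact.
def surface(x, y):
    # hypothetical response with an x*y interaction term
    return 1.0 + 0.5 * x + 0.5 * y + 2.0 * x * y

def slice_at(y_fixed, xs):
    """Return the z values along x for one vertical slice of the surface."""
    return [surface(x, y_fixed) for x in xs]

xs = [i / 10 for i in range(11)]                      # x grid from 0 to 1
slices = {y: slice_at(y, xs) for y in (0.0, 0.5, 1.0)}

# Because of the x*y term, the slope of z vs x differs between slices:
slope = lambda z: (z[-1] - z[0]) / (xs[-1] - xs[0])
```

Stacking these slices in a single 2D plot makes the changing shape, and hence the interaction, easy to see.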
- Boxplots (simple 1-way or 2-way
grouping)
A boxplot provides a simple graphical representation of the central tendency and spread in
a variable. You can create a boxplot for a variable for all cases, or a series of
boxplots to compare groups of cases, as defined by a categorical variable in the predictor
matrix. HyperNiche allows you to build the boxplots either from percentiles (the
classic boxplot) or from standard deviations or standard errors. The diagram illustrated shows the main elements
of a boxplot.
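The two ways of defining the box elements can be sketched in Python (the exact percentile and standard-deviation semantics here are assumptions for illustration, not HyperNiche's internals):

```python
# Sketch: classic percentile-based boxplot elements vs. mean +/- SD elements.
import statistics

def percentile_box(data):
    q1, med, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {"low": min(data), "q1": q1, "median": med,
            "q3": q3, "high": max(data)}

def sd_box(data):
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return {"center": m, "lower": m - s, "upper": m + s}

data = [2, 4, 4, 5, 6, 7, 9, 12]
```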

Graph Enhancements

- Contour lines added. Contour plots
can use shading, lines, or both.
- Present multiple graph types on single graph
(scatterplot + fitted line).
- Increased to 32 colors and symbols.
- Hide symbols for particular groups: Groups | Hide Categories
- Last Graph repeats the previous graph.
- Text tool to place new labels anywhere on graphs.
- Print Preview with zoom.
- Save Graph as GIF with optional transparent background.
- Increased Scatterplot Matrix maximum from 10 to 20 variables.
- Added Gray Scale option for black and
white publications.

Analyses

- 2 1/2 times faster than previous version!
- Bootstrap resampling for confidence intervals or
quantiles for measures of fit.
Bootstrapping can be used to estimate confidence intervals for a statistic, usually a
measure of fit (e.g. r^2, logB, or AUC). The basic idea of bootstrap resampling is
to estimate the variability in a statistic of interest by repeatedly sampling the
data. The data are sampled with replacement. Commonly, people will take a sample of
the same size as the data set being sampled. For example, if you have N = 100, and you
repeatedly take a sample with replacement of 100 items, your statistic of interest will
vary with the content of each sample. The more the statistic varies
from sample to sample, the less reliable your estimate of that statistic is.
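The resampling loop can be sketched in plain Python (the mean stands in for a fit statistic such as r^2; the function and data are illustrative):

```python
# Sketch: percentile bootstrap confidence interval for a statistic.
import random
import statistics

def bootstrap_interval(data, stat, n_boot=2000, lo=2.5, hi=97.5, seed=42):
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample WITH replacement
        reps.append(stat(sample))
    reps.sort()
    # take the lo-th and hi-th percentiles of the resampled statistics
    return reps[int(n_boot * lo / 100)], reps[int(n_boot * hi / 100) - 1]

data = [3.1, 2.7, 3.8, 2.9, 3.3, 3.6, 2.5, 3.0, 3.4, 2.8]
low, high = bootstrap_interval(data, statistics.mean)
```

The wider the interval, the more the statistic varies from resample to resample, and the less reliable the point estimate.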
- Variability bands for predictions and graphs based on
bootstrap resampling.
In regression modeling, confidence intervals are used to indicate the uncertainty
associated with the estimate of the mean at a point on a regression curve for a specific
value of a predictor. Likewise, we would like to evaluate uncertainty for estimates
from models with two or more predictors.
Measures of uncertainty for nonparametric regression are known to be biased; Bowman and
Azzalini (1997, p. 75) refer to this bias as "inevitable." Although various methods
can be used to try to correct for the bias, they are difficult to implement. As an
alternative, Bowman and Azzalini suggested indicating the level of variability in
pointwise nonparametric regression estimates without attempting to correct for bias,
and calling these "variability bands" rather than confidence bands. Such variability
bands indicate pointwise confidence intervals for the estimated means, rather than
confidence intervals for specific values of the response variable that would be
observed at that point in the predictor space. The 5th and 95th percentile variability
bands indicate that 90% of the time, the mean of a set of new observations should fall
within the band.
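Pointwise variability bands can be sketched as follows (a toy Gaussian local-mean smoother stands in for an NPMR estimate; none of this is HyperNiche's code):

```python
# Sketch: refit a simple local-mean smoother on bootstrap resamples and
# take the 5th and 95th percentiles of the fitted value at a target point.
import math
import random

def local_mean(x0, xs, ys, sd=0.5):
    # Gaussian-weighted local mean (toy stand-in for an NPMR estimate)
    w = [math.exp(-0.5 * ((x - x0) / sd) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def variability_band(x0, xs, ys, n_boot=500, seed=1):
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        take = [rng.randrange(len(xs)) for _ in xs]   # resample (x, y) pairs
        fits.append(local_mean(x0, [xs[i] for i in take],
                                   [ys[i] for i in take]))
    fits.sort()
    return fits[int(0.05 * n_boot)], fits[int(0.95 * n_boot) - 1]

xs = [i / 4 for i in range(21)]            # 0.0 .. 5.0
ys = [math.sin(x) for x in xs]             # noise-free toy response
lo, hi = variability_band(2.0, xs, ys)
```

Repeating this at each point on a grid traces out the band around the fitted surface.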
- Validation of bootstrap samples against reserved data
set.
If you have reserved some of your data, you can test your bootstrap samples against the
reserved (validation) data. HyperNiche will calculate model fit based on the
ability of the model to predict data on which the model is NOT based. This provides
a measure of fit against an independent data set. By making this comparison with
bootstrap resampling, you can evaluate the consistency of your model. The best
models will reliably predict the response for a validation data set, regardless
of the subsample of the data that is used.
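The idea can be sketched as follows (synthetic data and a least-squares line stand in for a real data set and model; all names are illustrative):

```python
# Sketch: fit on one subset of rows, score predictions on reserved rows
# the model never saw.
import random
import statistics

def r_squared(obs, pred):
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - statistics.mean(obs)) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [2 * xi + 1 + rng.gauss(0, 1) for xi in x]

train = list(range(40))          # rows used to build the model
reserve = list(range(40, 60))    # rows held back for validation

# "Fit" a trivial model (least-squares line) on the training rows only
xt = [x[i] for i in train]; yt = [y[i] for i in train]
mx, my = statistics.mean(xt), statistics.mean(yt)
b = sum((a - c) * (d - my) for a, c, d in zip(xt, [mx] * 40, yt))
b /= sum((a - mx) ** 2 for a in xt)
a0 = my - b * mx

pred = [a0 + b * x[i] for i in reserve]
fit = r_squared([y[i] for i in reserve], pred)   # fit on independent rows
```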
- Fit Model | Evaluate All
Evaluates a list of models, one at a time, and tabulates results for easy comparison,
without attempting improvements. Use Evaluate All Models to re-examine
model statistics or generate special kinds of output not provided during Free Search
or Screening. Evaluate All is especially useful for comparing different
models for the same response variable, or for summarizing or comparing the set of best
models for a series of related response variables.

Data Management and Sampling

- Modify | Random Sample. Random and
stratified random sampling.
Use this option to select a simple random sample or stratified random sample of the rows
in your data. You specify the sample size. Sampling can be applied to either
matrix separately or both simultaneously. You can sample with or without
replacement, according to a checkbox in the Random Sample dialog.
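The two sampling schemes can be sketched in Python (the row structure and helper names are hypothetical):

```python
# Sketch: simple vs. stratified random sampling of rows, with or without
# replacement.
import random

def simple_sample(rows, k, replace=False, seed=0):
    rng = random.Random(seed)
    return rng.choices(rows, k=k) if replace else rng.sample(rows, k)

def stratified_sample(rows, stratum_of, k_per_stratum, seed=0):
    rng = random.Random(seed)
    by_stratum = {}
    for r in rows:                       # group rows by stratum label
        by_stratum.setdefault(stratum_of(r), []).append(r)
    out = []
    for members in by_stratum.values():  # sample within each stratum
        out.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return out

rows = [{"id": i, "habitat": "wet" if i % 2 else "dry"} for i in range(20)]
strat = stratified_sample(rows, lambda r: r["habitat"], 3)
```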
- Modify | Delete Rows or Columns filtered by
variables in matrix.
Use this option to remove a subset of rows from your data, according to the values of a
variable in either your response or predictor matrix. You specify the rule
by choosing a variable, a logical operator (=, <, <=, >=, <>), and a
value.
For example, say you have a variable specifying StandAge in the predictor matrix.
You wish to select a subset of your data that excludes the stands younger than 100
years. To do this, select StandAge in the picklist of variables, "<" in
the Logic box, and 100 in the "Compare Value" box. The consequence of this
is: "Delete rows with StandAge < 100"
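The rule in the example above can be sketched as follows (the operator set mirrors the dialog; the function itself is illustrative, not HyperNiche's code):

```python
# Sketch: "Delete rows with StandAge < 100".
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">=": operator.ge, "<>": operator.ne}

def delete_rows(rows, variable, op, value):
    """Drop every row whose `variable` satisfies `op value`."""
    test = OPS[op]
    return [r for r in rows if not test(r[variable], value)]

stands = [{"StandAge": 45}, {"StandAge": 130},
          {"StandAge": 99}, {"StandAge": 210}]
kept = delete_rows(stands, "StandAge", "<", 100)
```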
- Import/Export | Excel Simple Spreadsheet.
The header lines are added for you.
The fastest and most reliable way to get spreadsheet data into HyperNiche is to import it
from a "simple spreadsheet." This is the most common way to organize
spreadsheets of data: sample units are rows, variables are columns, and variable names are
column headers. We call this a "simple spreadsheet" because it lacks the
additional header lines that HyperNiche uses to document the contents of the spreadsheet.
The complete spreadsheet data format also has lines specifying the number and
content of the rows and columns, along with a row specifying variable type.
- Tools | Find Duplicate Names
Search for duplicate names in either or both matrices. Duplicate names can
cause problems with procedures in HyperNiche that use these names as unique identifiers.
For example, if you choose a particular column to delete in your predictor matrix,
and two columns have the same name, HyperNiche doesn't "know" which column
to delete, so it cannot proceed.
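A duplicate-name check over a list of row or column names can be sketched as:

```python
# Sketch: report every name that appears more than once.
from collections import Counter

def find_duplicates(names):
    counts = Counter(names)
    return sorted(n for n, c in counts.items() if c > 1)

col_names = ["StandAge", "Elev", "Slope", "Elev"]
dups = find_duplicates(col_names)
```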
- Tools | Go To Cell
Jump to a particular cell in a matrix. This is particularly useful for large
matrices that are difficult to navigate by scrolling. After you specify a cell,
HyperNiche scrolls in the matrix (if necessary) and outlines the selected cell.
- Option to view row numbers and column letters as in
Excel.
Select this option if you wish to display the column letters and row numbers that identify
the cells in your matrix, as in Excel. These are then displayed in addition to your
row and column names.
- Automatic detection and repair of bad data values in
matrices.
The Matrix Error dialog appears when you open or import a spreadsheet with obvious errors
in it. Errors can be empty cells (missing values) or disallowed values in the data
part of the matrix. For example, the data in the body of a matrix must all be
numeric. Including non-numeric characters will cause the Matrix Error dialog to
appear.
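The kind of scan described can be sketched as follows (illustrative only; the cell representation is an assumption):

```python
# Sketch: flag empty or non-numeric cells in the data body of a matrix.
def scan_matrix(cells):
    """Return (row, col) positions of empty or non-numeric cells."""
    bad = []
    for i, row in enumerate(cells):
        for j, value in enumerate(row):
            if value is None or str(value).strip() == "":
                bad.append((i, j))          # missing value
            else:
                try:
                    float(value)
                except ValueError:
                    bad.append((i, j))      # non-numeric entry
    return bad

cells = [["1.5", "2.0"], ["", "3.1"], ["4.2", "n/a"]]
errors = scan_matrix(cells)
```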

Additions to Existing Analyses

- Automated settings to control flexibility and parsimony
(overfitting controls)
All of the nonparametric modeling methods in HyperNiche share a variety of controls for
the flexibility, parsimony, and continuity of the response functions. Understanding
and using these controls is essential to effective modeling. In particular, your
choice of these options can make a big difference with special problems, such as small
data sets or clumped distributions along predictor variables.
The methods for controlling overfitting differ between Nonparametric Multiplicative
Regression (NPMR) and generalized linear models (GLMs). The most popular overfitting
controls for GLMs are the AIC (Akaike Information Criterion) and the BIC (Bayesian
Information Criterion) for model selection.
The AIC and BIC depend on the number of parameters in a model. Because NPMR
models do not have explicit parameters as such, AIC and BIC are not applicable to NPMR
models. Instead, use the controls on overfitting provided in HyperNiche (minimum
average neighborhood size, minimum data:predictor ratio, and the improvement criterion),
as explained below.
To help you select appropriate settings for the task at hand, and to control overfitting,
HyperNiche provides three automated settings ("Conservative," "Medium," and
"Aggressive") for the Overfitting Controls: the improvement criterion, the minimum
data:predictor ratio, and the minimum average neighborhood size. Use the conservative
settings if you wish to use relatively stiff curves, the aggressive settings if you seek
a more flexible fit to the data, and the medium settings if you wish something in
between. The default is "medium," a setting that is reasonable for many data sets.
Select custom if you wish to set each of the three controls manually. For more, see
the NPMR Introduction PDF.
- Smarter default for "Minimum neighborhood size for
estimate."
Formerly, for the purpose of graphing, the default was a constant (1). In version 2 the
default is 0.5 x N*, where N* is the average neighborhood size from the model-fitting
phase for the selected model.
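Following the NPMR literature, N* can be sketched as the mean, over target points, of the total Gaussian kernel weight contributed by the data (a one-predictor toy, not HyperNiche's code):

```python
# Sketch: average neighborhood size N* for a Gaussian-weight smoother.
import math

def weights(x0, xs, tolerance):
    # kernel weight of each data point relative to target point x0
    return [math.exp(-0.5 * ((x - x0) / tolerance) ** 2) for x in xs]

def average_neighborhood_size(xs, tolerance):
    # mean total weight across target points: each point counts fully (1.0)
    # toward its own neighborhood, distant points contribute less
    return sum(sum(weights(x0, xs, tolerance)) for x0 in xs) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
n_star = average_neighborhood_size(xs, tolerance=1.0)
min_for_estimate = 0.5 * n_star   # the version 2 default described above
```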