Graphs
- Slices Plot
The idea of 3D slices is that the surface is cut vertically into some number of parallel
slices. The response variable is always plotted on the vertical axis, Z, but the
surface is sliced at particular values of X or Y. These slices are then stacked on
top of each other and viewed in 2D.
Slicing a response surface is a great way to illustrate interactions among predictors.
If the shape of the slices changes with the position of the cutting plane, then the
model contains interactions. The model illustrated shows such interactions, in that the
shape of the curve depends on where the surface is sliced.
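The idea can be sketched numerically (a minimal Python illustration, not HyperNiche's implementation; the surface and its interaction term are invented for the example):

```python
# Sketch: slicing a response surface z = f(x, y) at fixed values of y.
# If the shape of z-vs-x changes from slice to slice, the predictors interact.
def surface(x, y):
    # hypothetical response with an x*y interaction term
    return 1.0 + 0.5 * x + 0.5 * y + 2.0 * x * y

def slice_at(y_fixed, xs):
    """Return the z values along x for one vertical slice of the surface."""
    return [surface(x, y_fixed) for x in xs]

xs = [i / 10 for i in range(11)]                      # x grid from 0 to 1
slices = {y: slice_at(y, xs) for y in (0.0, 0.5, 1.0)}

# Because of the x*y term, the slope of z vs x differs between slices:
slope = lambda z: (z[-1] - z[0]) / (xs[-1] - xs[0])
```

Stacking these slices in a single 2D plot makes the changing shape, and hence the interaction, easy to see.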
- Boxplots (simple 1-way or 2-way
grouping)
A boxplot provides a simple graphical representation of the central tendency and spread in
a variable. You can create a boxplot for a variable for all cases, or a series of
boxplots to compare groups of cases, as defined by a categorical variable in the predictor
matrix. HyperNiche allows you to build the boxplots either from percentiles (the
classic boxplot) or from standard deviations or standard errors. The diagram illustrated shows the main elements
of a boxplot.
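The two ways of defining the box elements can be sketched in Python (the exact percentile and standard-deviation semantics here are assumptions for illustration, not HyperNiche's internals):

```python
# Sketch: classic percentile-based boxplot elements vs. mean +/- SD elements.
import statistics

def percentile_box(data):
    q1, med, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {"low": min(data), "q1": q1, "median": med,
            "q3": q3, "high": max(data)}

def sd_box(data):
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return {"center": m, "lower": m - s, "upper": m + s}

data = [2, 4, 4, 5, 6, 7, 9, 12]
```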

Graph Enhancements

- Contour lines added. Contour plots
can use shading, lines, or both.
- Present multiple graph types on single graph
(scatterplot + fitted line).
- Increased to 32 colors and symbols.
- Hide symbols for particular groups: Groups | Hide Categories
- Last Graph repeats the previous graph.
- Text tool to place new labels anywhere on graphs.
- Print Preview with zoom.
- Save Graph as GIF with optional transparent background.
- Increased Scatterplot Matrix maximum from 10 to 20 variables.
- Added Gray Scale option for black and
white publications.

Analyses

- 2 1/2 times faster than previous version!
- Bootstrap resampling for confidence intervals or
quantiles for measures of fit.
Bootstrapping can be used to estimate confidence intervals for a statistic, usually a
measure of fit (e.g. r^2, logB, or AUC). The basic idea of bootstrap resampling is
to estimate the variability in a statistic of interest by repeatedly sampling the
data. The data are sampled with replacement. Commonly, people will take a sample of
the same size as the data set being sampled. For example, if you have N = 100, and you
repeatedly take a sample with replacement of 100 items, your statistic of interest will
vary with the content of each sample. The more the statistic varies
from sample to sample, the less reliable your estimate of that statistic is.
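The resampling loop can be sketched in plain Python (the mean stands in for a fit statistic such as r^2; the function and data are illustrative):

```python
# Sketch: percentile bootstrap confidence interval for a statistic.
import random
import statistics

def bootstrap_interval(data, stat, n_boot=2000, lo=2.5, hi=97.5, seed=42):
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample WITH replacement
        reps.append(stat(sample))
    reps.sort()
    # take the lo-th and hi-th percentiles of the resampled statistics
    return reps[int(n_boot * lo / 100)], reps[int(n_boot * hi / 100) - 1]

data = [3.1, 2.7, 3.8, 2.9, 3.3, 3.6, 2.5, 3.0, 3.4, 2.8]
low, high = bootstrap_interval(data, statistics.mean)
```

The wider the interval, the more the statistic varies from resample to resample, and the less reliable the point estimate.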
- Variability bands for predictions and graphs based on
bootstrap resampling.
In regression modeling, confidence intervals are used to indicate the uncertainty
associated with the estimate of the mean at a point on a regression curve for a specific
value of a predictor. Likewise, we would like to evaluate uncertainty for estimates
from models with two or more predictors.
Measures of uncertainty for nonparametric regression are known to be biased; Bowman and
Azzalini (1997, p. 75) refer to this bias as "inevitable." Although various methods
can be used to try to correct for the bias, they are difficult to implement. As an
alternative, Bowman and Azzalini suggested indicating the level of variability in
pointwise nonparametric regression estimates without attempting to correct for bias,
and calling these "variability bands" rather than confidence bands. Such variability
bands indicate pointwise confidence intervals for the estimated means, rather than
confidence intervals for specific values of the response variable that would be
observed at that point in the predictor space. The 5th and 95th percentile variability
bands indicate that 90% of the time, the mean of a set of new observations should fall
within the band.
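Pointwise variability bands can be sketched as follows (a toy Gaussian local-mean smoother stands in for an NPMR estimate; none of this is HyperNiche's code):

```python
# Sketch: refit a simple local-mean smoother on bootstrap resamples and
# take the 5th and 95th percentiles of the fitted value at a target point.
import math
import random

def local_mean(x0, xs, ys, sd=0.5):
    # Gaussian-weighted local mean (toy stand-in for an NPMR estimate)
    w = [math.exp(-0.5 * ((x - x0) / sd) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def variability_band(x0, xs, ys, n_boot=500, seed=1):
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        take = [rng.randrange(len(xs)) for _ in xs]   # resample (x, y) pairs
        fits.append(local_mean(x0, [xs[i] for i in take],
                                   [ys[i] for i in take]))
    fits.sort()
    return fits[int(0.05 * n_boot)], fits[int(0.95 * n_boot) - 1]

xs = [i / 4 for i in range(21)]            # 0.0 .. 5.0
ys = [math.sin(x) for x in xs]             # noise-free toy response
lo, hi = variability_band(2.0, xs, ys)
```

Repeating this at each point on a grid traces out the band around the fitted surface.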
- Validation of bootstrap samples against reserved data
set.
If you have reserved some of your data, you can test your bootstrap samples against the
reserved (validation) data. HyperNiche will calculate model fit based on the
ability of the model to predict data on which the model is NOT based. This provides
a measure of fit against an independent data set. By making this comparison with
bootstrap resampling, you can evaluate the consistency of your model. The best
models will reliably predict the response for a validation data set, regardless
of the subsample of the data that is used.
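The idea can be sketched as follows (synthetic data and a least-squares line stand in for a real data set and model; all names are illustrative):

```python
# Sketch: fit on one subset of rows, score predictions on reserved rows
# the model never saw.
import random
import statistics

def r_squared(obs, pred):
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - statistics.mean(obs)) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [2 * xi + 1 + rng.gauss(0, 1) for xi in x]

train = list(range(40))          # rows used to build the model
reserve = list(range(40, 60))    # rows held back for validation

# "Fit" a trivial model (least-squares line) on the training rows only
xt = [x[i] for i in train]; yt = [y[i] for i in train]
mx, my = statistics.mean(xt), statistics.mean(yt)
b = sum((a - c) * (d - my) for a, c, d in zip(xt, [mx] * 40, yt))
b /= sum((a - mx) ** 2 for a in xt)
a0 = my - b * mx

pred = [a0 + b * x[i] for i in reserve]
fit = r_squared([y[i] for i in reserve], pred)   # fit on independent rows
```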
- Fit Model | Evaluate All
Evaluates a list of models, one at a time, and tabulates results for easy comparison,
without attempting improvements. Use Evaluate All Models to re-examine
model statistics or generate special kinds of output not provided during Free Search
or Screening. Evaluate All is especially useful for comparing different
models for the same response variable, or for summarizing or comparing the set of best
models for a series of related response variables.

Data Management and Sampling

- Modify | Random Sample. Random and
stratified random sampling.
Use this option to select a simple random sample or stratified random sample of the rows
in your data. You specify the sample size. Sampling can be applied to either
matrix separately or both simultaneously. You can sample with or without
replacement, according to a checkbox in the Random Sample dialog.
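The two sampling schemes can be sketched in Python (the row structure and helper names are hypothetical):

```python
# Sketch: simple vs. stratified random sampling of rows, with or without
# replacement.
import random

def simple_sample(rows, k, replace=False, seed=0):
    rng = random.Random(seed)
    return rng.choices(rows, k=k) if replace else rng.sample(rows, k)

def stratified_sample(rows, stratum_of, k_per_stratum, seed=0):
    rng = random.Random(seed)
    by_stratum = {}
    for r in rows:                       # group rows by stratum label
        by_stratum.setdefault(stratum_of(r), []).append(r)
    out = []
    for members in by_stratum.values():  # sample within each stratum
        out.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return out

rows = [{"id": i, "habitat": "wet" if i % 2 else "dry"} for i in range(20)]
strat = stratified_sample(rows, lambda r: r["habitat"], 3)
```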
- Modify | Delete Rows or Columns filtered by
variables in matrix.
Use this option to remove a subset of rows from your data, according to the values of a
variable in either your response or predictor matrix. You specify the rule
by choosing a variable, a logical operator (=, <, <=, >=, <>), and a
value.
For example, say you have a variable specifying StandAge in the predictor matrix.
You wish to select a subset of your data that excludes the stands younger than 100
years. To do this, select StandAge in the picklist of variables, "<" in
the Logic box, and 100 in the "Compare Value" box. The consequence of this
is: "Delete rows with StandAge < 100"
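The rule in the example above can be sketched as follows (the operator set mirrors the dialog; the function itself is illustrative, not HyperNiche's code):

```python
# Sketch: "Delete rows with StandAge < 100".
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">=": operator.ge, "<>": operator.ne}

def delete_rows(rows, variable, op, value):
    """Drop every row whose `variable` satisfies `op value`."""
    test = OPS[op]
    return [r for r in rows if not test(r[variable], value)]

stands = [{"StandAge": 45}, {"StandAge": 130},
          {"StandAge": 99}, {"StandAge": 210}]
kept = delete_rows(stands, "StandAge", "<", 100)
```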
- Import/Export | Excel Simple Spreadsheet.
The header lines are added for you.
The fastest and most reliable way to get spreadsheet data into HyperNiche is to import it
from a "simple spreadsheet." This is the most common way to organize
spreadsheets of data: sample units are rows, variables are columns, and variable names are
column headers. We call this a "simple spreadsheet" because it lacks the
additional header lines that HyperNiche uses to document the contents of the spreadsheet.
The complete spreadsheet data format also has lines specifying the number and
content of the rows and columns, along with a row specifying variable type.
- Tools | Find Duplicate Names
Search for duplicate names in either or both matrices. Duplicate names can
cause problems with procedures in HyperNiche that use these names as unique identifiers.
For example, if you choose a particular column to delete in your predictor matrix,
and two columns have the same name, HyperNiche doesn't "know" which column
to delete, so it cannot proceed.
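A duplicate-name check over a list of row or column names can be sketched as:

```python
# Sketch: report every name that appears more than once.
from collections import Counter

def find_duplicates(names):
    counts = Counter(names)
    return sorted(n for n, c in counts.items() if c > 1)

col_names = ["StandAge", "Elev", "Slope", "Elev"]
dups = find_duplicates(col_names)
```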
- Tools | Go To Cell
Jump to a particular cell in a matrix. This is particularly useful for large
matrices that are difficult to navigate by scrolling. After you specify a cell,
HyperNiche scrolls in the matrix (if necessary) and outlines the selected cell.
- Option to view row numbers and column letters as in
Excel.
Select this option if you wish to display the column letters and row numbers that identify
the cells in your matrix, as in Excel. These are then displayed in addition to your
row and column names.
- Automatic detection and repair of bad data values in
matrices.
The Matrix Error dialog appears when you open or import a spreadsheet with obvious errors
in it. Errors can be empty cells (missing values) or disallowed values in the data
part of the matrix. For example, the data in the body of a matrix must all be
numeric. Including non-numeric characters will cause the Matrix Error dialog to
appear.
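The kind of scan described can be sketched as follows (illustrative only; the cell representation is an assumption):

```python
# Sketch: flag empty or non-numeric cells in the data body of a matrix.
def scan_matrix(cells):
    """Return (row, col) positions of empty or non-numeric cells."""
    bad = []
    for i, row in enumerate(cells):
        for j, value in enumerate(row):
            if value is None or str(value).strip() == "":
                bad.append((i, j))          # missing value
            else:
                try:
                    float(value)
                except ValueError:
                    bad.append((i, j))      # non-numeric entry
    return bad

cells = [["1.5", "2.0"], ["", "3.1"], ["4.2", "n/a"]]
errors = scan_matrix(cells)
```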

Additions to Existing Analyses

- Automated settings to control flexibility and parsimony
(overfitting controls)
All of the nonparametric modeling methods in HyperNiche share a variety of controls for
the flexibility, parsimony, and continuity of the response functions. Understanding
and using these controls is essential to effective modeling. In particular, your
choice of these options can make a big difference with special problems, such as small
data sets or clumped distributions along predictor variables.
The methods for controlling overfitting differ between Nonparametric Multiplicative
Regression (NPMR) and generalized linear models (GLMs). The most popular overfitting
controls for GLMs are the AIC (Akaike Information Criterion) and the BIC (Bayesian
Information Criterion) for model selection.
The AIC and BIC depend on the number of parameters in a model. Because NPMR
models do not have explicit parameters as such, AIC and BIC are not applicable to NPMR
models. Instead, use the controls on overfitting provided in HyperNiche (minimum
average neighborhood size, minimum data:predictor ratio, and the improvement criterion),
as explained below.
To help you select appropriate settings for the task at hand, and to control overfitting,
HyperNiche provides three automated settings ("Conservative," "Medium," and
"Aggressive") for the Overfitting Controls: the improvement criterion, the minimum
data:predictor ratio, and the minimum average neighborhood size. Use the conservative
settings if you wish to use relatively stiff curves, the aggressive settings if you seek
a more flexible fit to the data, and the medium settings if you wish something in
between. The default is "medium," a setting that is reasonable for many data sets.
Select custom if you wish to set each of the three controls manually. For more, see
the NPMR Introduction PDF.
- Smarter default for "Minimum neighborhood size for
estimate."
Formerly, for the purpose of graphing, the default was a constant (1). In version 2 the
default is 0.5 x N*, where N* is the average neighborhood size from the model-fitting
phase for the selected model.
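Following the NPMR literature, N* can be sketched as the mean, over target points, of the total Gaussian kernel weight contributed by the data (a one-predictor toy, not HyperNiche's code):

```python
# Sketch: average neighborhood size N* for a Gaussian-weight smoother.
import math

def weights(x0, xs, tolerance):
    # kernel weight of each data point relative to target point x0
    return [math.exp(-0.5 * ((x - x0) / tolerance) ** 2) for x in xs]

def average_neighborhood_size(xs, tolerance):
    # mean total weight across target points: each point counts fully (1.0)
    # toward its own neighborhood, distant points contribute less
    return sum(sum(weights(x0, xs, tolerance)) for x0 in xs) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
n_star = average_neighborhood_size(xs, tolerance=1.0)
min_for_estimate = 0.5 * n_star   # the version 2 default described above
```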