I have been working on this software, CMAT, since 1995, when my wife
went to New Orleans for work and I felt lonely in Raleigh, NC.
Here
is a short summary of the history, content, and features of CMAT.
Since September 28, 2016, a new version of CMAT has been available
for download from the web. It is probably the most bug-free
version of the last five releases. There may still be a
few minor things I could fix, perhaps at the end of this year, 2016.
Due to another, more urgent project, I have to take a break from coding CMAT
starting in October 2016. For a 73-year-old it will not be easy to
resume the CMAT coding in the summer of 2017.
There are a number of new
papers
available describing some outlier analyses of voting behavior
in the most recent elections in Germany, Austria, and Switzerland.
- Structural Equation Modeling
Some Guidelines
- with our automatic modeling function we create excellent
CFA (confirmatory factor analysis) models for your data
(at no charge if we are not able to find models with p larger than 0.01)
- structural equation modeling (SEM) for metric and ordinal data
(multiple sample analysis)
CFA model improvement algorithm
- Linear and Nonlinear Optimization:
using CMAT and Matlab (Optimization Toolbox)
- Linear and Nonlinear Statistics:
- dimension reduction and variable selection
- normalization and analysis of microarray data
(using Bioconductor)
- structural equation modeling (SEM), IRT, factor analysis, and PLS
- Data Mining with CMAT, R, and SAS Enterprise Miner (v. 9.1.3)
- Programming in SAS: DATA Step, IML, STAT, ETS, OR, and EM
(using SAS software version 9.1.3)
- Programming in CMAT, Matlab, R, C, and Fortran
- Release 1: 1996 (Copyright February 1997)
- Release 2: December 1999
- Release 3: December 2002
- Release 4: July 2007
- Release 5: January 2009
- Release 6: November 2011:
I'm now developing with MS Visual Studio 2010 (C/C++) and
Intel Parallel Studio (Fortran 90)
- Release 7: December 2013
- Release 8: December 2015
- Release 9: September 2016
I'm still looking for some people who want to work with me on this,
especially:
- Somebody to test the language and functions and to enter bug reports.
Some math and statistics background would be needed.
- Somebody to do some marketing. Knowledge of competing
math and statistics software (Octave, Matlab, and SAS IML) would be needed.
Wolfgang defines the WCU as the ratio between the number of people who actually
use a piece of software (i.e. would even be willing to pay for its use) and the
number of people who developed that software, i.e. WCU = users / developers.
Now, since CMAT has only one developer, the smallest WCU values
are 0 and 1, and compared with the high WCU values of SPSS and SAS
(not even thinking of Google and Facebook), CMAT seems to do rather badly.
However, quite a number of R packages don't do much better than CMAT :-)
Please read this license carefully before using the software.
By installing or using this software, you are agreeing to be
bound by the terms of this license. If you do not agree to
the terms of this license, either contact the author or
promptly remove the software.
- This software may not be distributed to third parties.
The free version may only be used for non-profit research and teaching.
- Using the software for commercial applications or for profit
needs a specific license agreement.
- Supplying any part of this software as part of another
software product requires the separate prior written agreement of the
CMAT managers, which may include financial terms.
- This software is copyrighted and may not be modified,
decompiled, reverse engineered, or disassembled.
- Due acknowledgment shall be made of the use of CMAT
in research reports or publications.
"everything free comes without guarantee".
Patience is expected. The author is grateful for responses by users such as bug
reports or proposals for improvement.
At this time there is only a Windows version. A Linux version would be more
appropriate and will be out shortly.
The software and manual of CMAT are offered “AS IS” and without warranties as to
performance or merchantability. The seller and/or redistributors may have made
statements about this software. Any such statements do not constitute warranties
and shall not be relied on by the user in deciding whether to use this program.
This program is offered without any express or implied warranties whatsoever.
Because of the diversity of conditions and hardware under which this program may
be used, no warranty of fitness for a particular purpose is offered.
The users of this software are advised to test the program thoroughly
before relying on it. The users must assume the entire risk of using the program;
any liability of seller, provider or manufacturer will be limited exclusively
to product replacement.
In no event shall the maker or distributor of CMAT be liable for any loss of profit
or any other commercial damage, including but not limited to special, incidental,
consequential or other damages in the use, installation and
application of CMAT.
There are two ways of installing CMAT: either you copy the
most important files from the download site or copy the complete
directory structure from a DVD distributed by the developer.
The files from this website represent the most recent but not
completely tested version, whereas the files on the DVD correspond
to an earlier, but in general more stable, release.
The download files are zipped (using the 7z software with the -tzip
option, so they can be unzipped with common ZIP software) but
need a password for unzipping, which can be obtained from the author
on request (email, LinkedIn).
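For example, with the cmat_com.zip archive described below downloaded
into C:/cmat, and with <password> standing for the password obtained from
the author, the 7z command line can be used like this (a sketch; any ZIP
tool that supports passwords works as well):
   7z x cmat_com.zip -oC:\cmat -p<password>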
The following directory structure is highly recommended:
- Download and unzip the file (about 6 MB):
cmat_util.zip
creating the directory C:/cmat_util, which contains the
gnuplot directory and the files for 7z, which is used
by the CMAT function system().
- Create directory C:/cmat.
- Download into cmat and unzip the file (about 16 MB)
cmat_com.zip
creating the subdirectory cmat/com
which contains the executable cmat.exe,
a number of DLLs, and the message files for the output.
- Download into cmat and unzip the file (about 62 MB)
cmat_test.zip
creating the following subdirectories:
- cmat/test with general test examples
- cmat/tgplt with test examples for plotting
- cmat/tmicro with test examples for micro array data
- cmat/tnlp with test examples for optimization (LP and NLP)
- cmat/tsem with test examples for structural equation
modeling, factor analysis, rotation, and IRT modeling
- Download into cmat and unzip the file (about 24 MB)
cmat_data.zip
creating the directory cmat/data which contains
the following subdirectories:
- mytst is almost empty and is a good place
for the user's own work
- tdata contains a large number of data sets, many of
them used by the test examples. All files have the extension .dat
- Download into cmat and unzip the file (about 430 MB!)
cmat_save.zip
creating the directory cmat/save, which contains a number of .dob files
generated with the obj2fil() function and readable by
the fil2obj() function. These files are used by some
applications with very large data sets. This directory is only
needed if you use the functions obj2fil() and fil2obj()
in your CMAT input.
- Download into cmat and unzip the file (about 14 MB)
cmat_doc.zip
creating the directory cmat/doc which contains a number
of pdf files documenting CMAT, including all Newsletters
(see also below). The files for this directory can also be
accessed directly on the wcmat website.
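After all archives are unzipped, the recommended layout described in the
steps above looks roughly like this (a sketch showing only the directories
mentioned there):
   C:/cmat_util          gnuplot and 7z utilities
   C:/cmat/com           cmat.exe, DLLs, message files
   C:/cmat/test          general test examples
   C:/cmat/tgplt         plotting test examples
   C:/cmat/tmicro        microarray test examples
   C:/cmat/tnlp          optimization test examples
   C:/cmat/tsem          SEM, factor analysis, and IRT test examples
   C:/cmat/data/mytst    place for the user's own work
   C:/cmat/data/tdata    data sets (.dat files)
   C:/cmat/save          .dob files (only needed for obj2fil()/fil2obj())
   C:/cmat/doc           PDF documentation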
When, e.g., Emacs is installed, CMAT can be used in command-line or
batch mode in Windows, preferably in C:/cmat/mytst.
Only when the Tcl/TK graphical interface is used for online documentation
must Adobe Reader be installed
in a directory named Reader (containing the files AcroRd32.exe, AcroRd32.dll, etc.),
preferably at the same hierarchy level as the cmat directory.
Then the help("string") function can be used to open the reference
manual at the specified term. You can move easily between the more than
400 terms (bookmarks) of the CMAT Reference Manual by clicking the
many hyperlinks. The manual can also be opened at a specific page
or for searching for a specified term.
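For example, assuming that "nlp" is one of the bookmark terms and that the
call is issued inside a compound statement, as in the test examples, a line
like
   { help("nlp"); }
would open the Reference Manual at that term.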
The Tcl/TK GUI can also be used to access the reference manual.
For graphical output (plotting) CMAT has an interface to the gnuplot
software. In CMAT you may connect to gnuplot either interactively or
in batch mode (running scripts).
Most Linux distributions include gnuplot. For Windows and other operating systems
gnuplot can be downloaded from the internet free of charge, see for example
here
(There is also a demo gallery.) For some "terminal" outputs of
gnuplot (like SVG, EPSLATEX, and PDF) you may have to download
some additional software (SVGViewer, graphixs, etc.).
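As a rough illustration, the following minimal gnuplot script uses the SVG
terminal mentioned above (the file name is just an example); it can be run
by gnuplot directly or handed to CMAT's batch interface (GPBATCH()):
   set terminal svg
   set output "sine.svg"
   plot sin(x) title "sin(x)"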
An excellent book about gnuplot is:
Philip K. Janert, "Gnuplot in Action",
Greenwich CT: Manning Publications Co., 2009
Toshihiko Kawano has his "not-so-frequently asked questions"
here
and here is something about
gnuplot tricks.
CMAT is a scripting language like Matlab or R or SAS/IML.
CMAT can be run either in batch mode or interactively with
command line input. In MS Windows this can be done either
- with one of the two available graphical interfaces,
one in Windows and one in Unix style,
- or by calling cmat.exe in a DOS command window,
or by using Emacs (and possibly BASH) in Unix style.
See remarks in the CMAT Tutorial document.
It is best to run CMAT (either interactively or in batch mode)
from the mytst, test, tnlp, or tsem directory.
The test, tnlp, and tsem directories contain a large number
of batch example files, all ending with the extension .inp,
together with the corresponding .log and .txt output files. When running
one of those examples, e.g.
cmat tode.inp
you should obtain the same results, but the log file will show
the actual date of the execution. Note that CMAT input must always
start with a { bracket opening a compound statement.
At least one statement (possibly an empty one) must be run before the
script is closed by a matching } bracket.
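Accordingly, the smallest valid input is a compound statement containing a
single (empty) statement. For example, a file minimal.inp (the name is just
an example) containing only
   { ; }
can be run in batch mode as
   cmat minimal.inp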
You will find more information about downloading and running CMAT
in Windows, Linux, and Unix at the download site.
Bugs are usually fixed when they are found; however, patience is expected.
If you are aware of any problems with CMAT, please contact the developer.
- There are some problems with the import() and export()
functions. They will probably be fixed during the next few months.
- There is a problem with the con (LPasL1) version of
the lp() function. The pcx and clp versions
can be used instead.
The following are PDF files which are available for download
at the download site. Only the Tutorial and the
Summary Manual can be downloaded from here.
Use the right mouse button to download the files.
-
Complete User's Manual (about 18 MB, close to 3000 pages):
- User Software License Agreement
- Introduction
- Installing and Running CMAT
- Restrictions and Comparisons
- Tutorial: Basic Elements of the CMAT Language
- Summary of Operators, Keywords, and Functions
- Reference Guide
- Some Details
- The Bibliography
-
CMAT Reference Manual (about 14 MB, more than 2000 pages):
- Reference Guide
- The Bibliography
-
CMAT Tutorial (about 200 pages):
- Introduction
- Tutorial: Basic Elements of the CMAT Language
- Summary of Operators, Keywords, and Functions
- The Bibliography
- CMAT Details and Examples:
- Details
- The Bibliography
-
CMAT Summary Manual (about 110 pages):
- Introduction
- Summary of Operators, Keywords, and Functions
- The Bibliography
Here are some
short guidelines
on how to use the hyperref package in LaTeX.
Please note that the developments reported in the most recently posted newsletter
may not yet be implemented in the software version posted on the net, or
may not be thoroughly tested and may not work with the software posted at the site.
- Scalars: (long) int, (double) real, (double) complex, string
- Vectors (dense and sparse) for all data types, even mixed
- Matrices (dense and sparse) for all data types, even mixed
and for some specific matrix types (diagonal, band, symmetric,
triangular)
- Tensors (dense and sparse) for all data types, even mixed
- Lists, where each entry can be scalar, vector, matrix, tensor,
list, or struct; entries are referred to by index
- Structures (since end of 2015), where each entry can be scalar,
vector, matrix, tensor, list, or struct; entries are referred
to by compound name
- KD Trees (not completely finished, internally only)
- An Important Language Extension:
This two-page paper describes a new form of
matrix literal
permitting the input of matrices containing string data without quotes.
(Note that, when using this unquoted form of string input, matrix literals
may not contain white space inside.)
- Data Objects in CMAT:
This paper sketches some aspects of the
data objects
implemented in CMAT.
- Tensor and List Operations in CMAT:
Many matrix operations have been extended to tensors. This
paper also sketches some more specific and additional operations for
tensors and data lists.
The download site also contains a number of technical reports
illustrating applications of CMAT:
- Semiannual Newsletters Starting 2003
The
CMAT Newsletters
report about the progress in the development and illustrate
some applications of CMAT.
- On the Use of Matrix Language:
This small
paper
illustrates the difference between educational and efficient programming.
Note that CMAT almost always knows when a matrix is symmetric and takes
advantage of this. Also, identity matrices are stored as diagonal matrices.
Sparsity in matrices and vectors is detected automatically.
Such examples are often found in statistics.
- CMAT Code for some Matlab Programs by
Olvi Mangasarian and Helen Zhang.
- Presentation at DAGStat Conference, Bielefeld, March 2007:
- Variable Selection Algorithm for Micro Array Data:
The analysis of gene expression data is currently a very challenging task.
However, this paper shows that CMAT can be used to find a very small number
of genes, out of the 22283 genes of an Affymetrix chip, that yields an
exact classification of two kinds of cancer.
- On the new
CFA model improvement algorithm
- On the difference of p values for
exact logistic regression
computed by SAS PROC LOGISTIC, elrm in R, and CMAT.
CMAT was first written for Unix and later for Windows.
- Versions for Mac, Linux, and Unix (should be easy since C code
is portable, and lex and yacc are native in Unix)
- Dynamic binding of C and Fortran user code.
- Extending preprocessor commands.
- Newsletter January 2003
- Faster matrix concatenation
- Reading and Writing of Matlab version 5 .mat Files
- ISOREG(): Isotone Regression (PAVA: Pool Adjacent Violators Algorithm,
Optimal Scaling)
- VARSEL(): Single and multiple response variable selection:
Forward, backward, and stepwise selection.
All subset combinations or randomly generated samples.
- Newsletter March 2003
- SIGN2(a,b) and SIGN4(a1,a2,a3,b) signum functions
- CANCOR(): Canonical Correlation Analysis
- MBURG(): Modified Burg algorithm for one- and two-dimensional time series
- Newsletter May 2003
- GLIM(): fixed some bugs and added some observationwise stat and ROC curve
- GLMIXD(): fixed lots of bugs, added type 1 and type 3 estimates,
and added some observationwise stat and ROC curve
- CDF23(): 2 and 3 dimensional quadrature of normal and t distribution
- PROMEP(): experimental design
- NOHARM(): factor analysis for dichotomous (0,1) data with robust
(nonnormal) GOF and ASEs
- FACTOR(): exploratory factor analysis with robust (nonnormal) GOF and ASEs
- Newsletter July 2003
- SEM(): robust asympt. standard errors and robust Satorra-Bentler Chisquare
Jackknife for identifying model outliers ("misfits")
- SVM(): automatic parameter tuning and two new methods (NSVM and FSM)
block, split, and random cross validation
Jackknife for identifying model outliers ("misfits")
- Newsletter September 2003
- POLYCHOR(): polychoric correlation matrices and their covariance matrix
- ODE(): two new algorithms
- TRI2VEC(): move triangular (lower or upper) part of matrix to vector
- GENEREAD(): reading microarray data
- Newsletter November 2003
- KSTEST(): new features for Kolmogorov-Smirnov test
- KSPROB(): compute prob of Kolmogorov CDF
- HOTELL(): compute classic and robust one-sample Hotelling's test
with confidence intervals
- PLS(): partial least squares (PLS) and principal components regression (PCR)
block, split, and random cross validation
- SVDTRIP(): compute singular triplets (U,D,V) for large and/or sparse matrices
- Newsletter January 2004
- SIR(): sliced inverse regression for dimension reduction
(including principal Hessian directions)
- Newsletter June 2004
- PLS(): randomization test for number of components added
- SVM(): new feature selection methods added
- GAROTTE(): feature selection algorithm by Breiman (1993)
- LARS(): group of feature selection methods including the following:
LARS: Least Angle Regression (Efron, Hastie, Johnstone & Tibshirani, 2002)
Lasso: Tibshirani (1996), Osborne, Presnell & Turlach (2000),
Forward Stagewise: (Efron, Hastie, Johnstone & Tibshirani, 2002)
Ridge Regression
Elastic Net (Zou & Hastie; 2003)
Univariate Soft Thresholding (Donoho et al., 1995)
- NLKPCA(): nonlinear kernel PCA (Schoelkopf; Rosipal & Trejo, 2001)
- NLKPLS(): nonlinear kernel PLS (Bennett & Embrechts; 2003)
- Newsletter December 2004
- REG(): Jackknifing for outlier (misfit) detection added
- FROTATE(): many new rotation methods added (Bernaards & Jennrich, 2004)
- LRFORW(): linear LS forward selection method for very many variables
(no need for storing the large X'X matrix)
- RANDISC(): some generators for discrete random variates
(Marsaglia, Tsang, & Wang, 2004)
- SCREETST(): methods for testing the significant number of
eigenvalues of X'X or covariance matrices
or singular values of rectangular matrices
- SCALPHA(): computes Cronbach's (1951) sample coefficient alpha
with asymptotic standard error and confidence interval
- SPLIT(): simple CART algorithm for binary response
with Chisquare split criterion
- VARCLUS(): variable cluster algorithm for very many variables
(similar to SAS PROC VARCLUS, but without the need to store
the large X'X or covariance matrix)
- Newsletter July 2005
- PCA(): implements eight different algorithms of principal
component analysis including asymptotic standard errors
and confidence intervals for unrotated and rotated
component loadings (normal theory analytic and bootstrap)
- FACTOR(): asymptotic standard errors and confidence intervals
for orthogonal and obliquely rotated factor solutions were added
(normal theory analytic and bootstrap)
- NOHARM(): entire suite of rotation algorithms are added;
asymptotic standard errors and confidence intervals
for orthogonal and obliquely rotated factor solutions were added
(normal theory analytic and bootstrap)
- CENTROID(): implements methods for classical centroid decomposition
- GENEREAD(): for comma (or otherwise) separated data set input
- HISTOGRM(): implements computation of histogram frequencies
- IMPUTE(): implements various methods for missing value imputation
- NNMF(): implements algorithm for nonnegative matrix factorization
- PERMUTE(): obtains permutations and combinations (all or stepwise)
- QUANTILE(): compute quantiles
- SIMDID(): compute some (not so great) similarity and distance measures
- SVDUPD(): rank-k update of the svd of a matrix
- Newsletter December 2005
- GLIM(): multinomial Logit model is added
- ANACOR(): correspondence analysis of frequency tables (Gifi, 1990)
- ANAPROF(): correspondence analysis of profile data (Gifi, 1990)
- PRINCALS(): principal component analysis for categorical data (Gifi, 1990)
- CNDCOV(): conditional covariance matrices (time series)
- Newsletter July 2006
- CANALS(): canonical correlation analysis of two sets
of variables (Gifi, 1990)
- CUCLCR(): cubic cluster criterion and R2 for
given cluster decomposition (Sarle, 1983)
- DEMREG(): univariate Deming regression
- DIXONR(): pdf, cdf and critical values for Dixon's r
- FICA(): (Fast) Independent Component Analysis
- HISTPLOT(): plotting a histogram
- HOMALS(): homogeneity analysis of categorical data (Gifi, 1990)
- ITA(): classical and inductive item tree analysis (Schrepp, 2006)
- OVERALS(): canonical correlation analysis of more than
two sets of variables (Gifi, 1990)
- PRIMALS(): one-dimensional homogeneity analysis of
categorical data (Gifi, 1990)
- SDD(): SemiDiscrete Decomposition (Kolda & O'Leary, 1999)
- SGMANOVA(): multivariate analysis of variance based on
spatial signs (robust estimation method)
- XYPLOT(): plotting an X-Y diagram
- ZOVERW(): probability and density of the ratio z/w of
normally distributed variables z and w (Marsaglia, 2006)
- Newsletter December 2006
- BYTE(): converts integers into characters using the ASCII table
- BRANKS(): computes tied and bivariate ranks
- COVLAG(): computes autocovariance estimates for a vector time series
- GEE(): generalized estimating equations (Liang and Zeger)
- RANKTIE(): averaging tie ranking of entries of a vector
- ROCCOMP(): test the equality of areas under ROC curve (DeLong et al, 1988)
- SCAD(): Smoothly Clipped Absolute Deviations (Fan & Li, 2002)
for LS regression, PH regression, and SVM
- SMSVM(): (structured) multicategory SVM (Lee, Lin, and Wahba, 2003)
- TOEPLITZ(): generates (block) Toeplitz matrix
- Newsletter July 2007
- Language extension for multidimensional arrays and
implementing tensor operations
- Language extension for lists of data objects
- CONST(): multidimensional extension of CONS()
- COVSHRK(): shrinking the covariance matrix for outliers in data
- DIM(): returning sizes of dimension of data objects (vectors, matrices, tensors)
- DIMLABEL(): assigning or pulling labels from tensor dimensions
- DIMNAME(): assigning or pulling names from tensor dimensions
- LOC(): returning index locations of specific data entries
- MAT2TEN(): create tensor from list of matrices
- RANDT(): multidimensional extension of RAND()
- TEN2MAT(): create list of matrices from tensor
- TEN2VEC(): move entries of tensor into data vector
- VEC2TEN(): move entries of data vector into tensor
- TENPERM(): reorder (permute) tensor dimensions
- TENTVEC(): multiply tensor with vector or list of vectors
- TENTMAT(): multiply tensor with matrix or list of matrices
- TENTTEN(): multiply tensor with tensor
- Extension to MAX() and MIN() functions
- Extension to SCAD function
- Extension to SMSVM() function (Lee, Lin, and Wahba, 2003)
- Extension to SVM() function
- Two types of index processing in matrices
- Fixing bugs for new release 4
- Newsletter December 2007
- DIM(): returns dimensionality of data object
- IRTML(): maximum likelihood IRT (item response theory)
- IRTMS(): Mokken scale IRT
- MDS(): multidimensional scaling and unfolding
- SETDIFF(): set difference of two data objects
- SETISECT(): set intersection of two data objects
- SETMEMBR(): returns binary membership of the entries of one object in another one
- SETUNION(): set union of two data objects
- SETXOR(): set XOR (eXclusive OR) of two data objects
- SIZE(): returns size in dimensions of data object
- SORTROW(): sorts rows of a data matrix w.r.t. a sorting key
- UNIQUE(): returns the unique entries of a data object
- VEC2TRI(): moves vector into compact triangular matrix
- Newsletter July 2008
- CODAPP(): applying the complete orthogonal decomposition
- HBADDTST(), HBANOVA(), HBBARTLETT(), HBCOVAR(), HBDISCRIM(),
HBLRG(), HBLTST(), HBRCMP()
- SELC():
- URD1OUT():
- Extension of MIN() and MAX() functions for more than two arguments
- Extension of IRTML(): input probability data and R1 measure
- Extensions for COD(), MDS(), and ODE() functions
- Extension of NLP() and NLE(): more return arguments
- Extension of NLP(): grid search
- Extension of NLP(): new conjugate gradient techniques: Birgin-Martinez
and scaled PR and FR
- Many test examples for NLP() in Part II
- Newsletter December 2008
- Extension of NLP(): UOBYQA and NEWUOA algorithms
- Extension of NLP(): Nonsmooth BT algorithms
- LOG2(): logarithm w.r.t. base 2
- ENCRYPT() and DECRYPT(): for encrypting files and directories
- LOCATN(): algorithms for the optimal location assignment problem
(greedy algorithm and Lagrangean relaxation)
- NLFIT(): data mining using stagewise nonlinear regression
using sets of activation and link functions
- NLFITPRD(): scoring a data set using the model from NLFIT()
- Some algorithms for normalizing microarray data
- Comparing the performance of random generators for normal distribution
- Application of the location assignment for matching the
performance of an index fund
- Testing CMAT for release 5 in January 2009
- Newsletter July 2009
- Fixed bugs in MRAND()
- Extension to the CMAT language:
Adding k-dimensional trees to the set of data objects:
- KDTCRT(): create k-dimensional tree from data matrix
- KDTNEA(): obtain nearest neighbor nodes of kD tree
- KDTRNG(): obtain nodes of kD tree inside ball with specified radius
- Extensions to AFFVSN():
- Extensions to GLIM(): Hosmer and Lemeshow Test
- Extensions to CLUSTER(): more and better returns, some plotting
- Extensions to TOEPLITZ(): Levinson, Trench, and Durbin algorithm added
for solving the Yule-Walker equations
- Extensions to UNIVAR(): many more location and scale measures
- AFFRMA(): Robust Multichip Average (Bolstad et al., 2003) (not finished yet)
- ARIMA(): is still in the works
- ARMCOV(): modified covariance method for linear time series prediction
- BURG(): compute moving average whitening filter using method by Burg (1968)
- LDP(): least distance programming (Lawson and Hanson, 1995)
- LOESS(): multivariate robust locally weighted regression (Cleveland, Grosse, and Shyu, 1992)
- LOWES(): univariate robust locally weighted regression (Cleveland, 1979)
- MEMPSD(): compute power spectrum of autoregressive filter
- POLYFIT(): fitting the polynomial model
- POLYVAL(): evaluating the polynomial model
- PPPD(): compute percentage points of Pearson distribution
- PWELCH(): compute power spectrum by periodogram method (Welch, 1967)
- SAMPLE(): equal and unequal probability sampling with or without replacement
- SORTP(): partial sorting for quantile
- STAND(): columnwise standardization of a numeric matrix w.r.t. location and scale
- TSLOCFOR(): forecasting zero or first order local model
- TSLOCTST(): error testing of zero or first order local model
- TSMEAS(): for a large variety of time series measurements
- TSTRANS(): for a variety of time series data transformations,
surrogates, and filters
- X11(): seasonal smoothing of monthly or quarterly time series data
- Some functions for combinatorics:
- COMBN(): generate all combinations of m elements taken n at the time
- DMNOM(): density of multinomial distribution
- HCUBE(): generate all points on hypercube lattice
- RMULT(): random generator for multinomial distribution (similar to MRAND())
- NSIMPLEX(): get number of points on (p,n) simplex
- XSIMPLEX(): generate all points on (p,n) simplex
- Illustrating Hosmer and Lemeshow Test
- Newsletter December 2009
- Extensions to NLP(): new option for nonlinear constraints
and new methods (e.g. Simulated Annealing, subgradient methods)
- Extensions to TSMEAS(): new functions added:
- Partial autocorrelations with robust asymptotic standard errors
- Ljung-Box test for serial correlation
- (Robust) LM test for serial correlation
- Newey-West covariance matrix
- OLS Regression with Newey-West (HAC) asymptotic standard errors
- (Augmented) Dickey-Fuller testing
- Granger causality testing (Likelihood-Ratio, LM, and Wald inference)
- Vector AR modeling (homo- and heteroskedastic, correlated and uncorrelated errors)
- Impulse response modeling for specified lead
(homo- and heteroskedastic, correlated and uncorrelated errors)
- Extensions to TSTRANS(): new transformations added:
- Box-Cox transform
- Lag and Log transform
- Baxter-King filtering
- Hodrick-Prescott filtering
- Extensions to KSTEST(): many new distributions added, additional returns
- BERKOW(): Berkowitz testing for time series or cross sectional data for distributions like KSTEST()
- ARMA(): ML estimation of the AutoRegressive Moving Average model
- ARMAFORE(): Forecasting using ARMA model estimates
- ARHETERO(): Heterogeneous AR model estimation
- GARCH(): ARCH, GARCH, TARCH, AVARCH, ZARCH, APARCH, EGARCH, AGARCH,
NAGARCH, IGARCH, FIGARCH model estimation
- JARBERA(): Jarque-Bera test for normal distribution
- MUCOMP(): comparing different hypotheses (model restrictions)
for linearly constrained ("confirmatory") ANOVA (Kuiper, Klugkist, & Hoijtink)
- SHAPWILK(): Shapiro-Wilk test for normal distribution
- Newsletter July 2010
- Extending val = MAX(a,...) to < val,ind > = MAX(a,...) and the same for MIN()
- Extending NNMF() for symmetric nonnegative matrix factorization, C=HH' and C=HSH'
- Extending NNMF() for (left, right, and bi-) orthogonal nonnegative matrix factorization, C=UH' and C=USH'
- Extending NNMF() for orthogonal symmetric nonnegative matrix factorization, C=HH' and C=HSH'
- Creating a large number of links for the Reference Manual
- The gnuplot ... gpend syntax for interactive gnuplot input
- BORUTA(): Variable selection algorithm (wrapper of Random Forest; Kursa & Rudnicki)
- GPBATCH(): Running gnuplot input scripts in batch mode
- HELP(): Opening the Reference Manual at specific bookmark terms
- < val,ind > = MAXN(a,n) and < val,ind > = MINN(a,n) for the n largest resp.
smallest values of a
- RANFOR(): Random Forest algorithm for Classification and Regression (Breiman)
- RAFPRD(): Scoring the Random Forest model for Classification and Regression
- PROPURS(): Projection Pursuit PCA (Friedman & Tukey)
- SURVCURV(): Survival curves: Adjusted for Cox PH and Aalen's Model (Zhang et al., 2007)
- SURVREG(): Survival regression: Cox proportional Hazards Model, Aalen's additive model, GLIM models
(extreme, logistic, Gaussian; Weibull, loglog, lognormal, exponential, Rayleigh)
- SYSTEM(): Execute shell commands, save output in string data
- SPAWN(): Execute child process (with or without reentry or running concurrently)
- ZIP7(): (Encrypted) Compressing and decompressing using the 7-zip program
- Newsletter December 2010
- Modifying LRFORW(): options matrix input argument
- Extending LOC(): for indices of missing values
- Extending NLREG(): permitting simple boundary, linear, and nonlinear constraints and specifying derivatives
- AUROC(): area under the ROC curve with asymptotic standard errors
- DELTA(): Delta method for computing asymptotic standard errors
- HBTTEST(): various forms of t test
- LRALLV(): all variables subsets regression (full enumeration and stochastic search)
- NOBLANKS(): removing leading and trailing blanks in string data
- ORDER(): hierarchical ranking of tied observations (similar to the R function)
- SMP(): stochastic matching pursuit and componentwise Gibbs sampler for variable selection (Chen et al.)
- SURVCURV(): Survival curves: common types: Kaplan-Meier, Fleming-Harrington, Tsiatis, Aalen, Kalbfleisch-Prentice, Greenwood etc.
- SURVFOR(): Survival forest (not finished yet)
- SURVPRD(): Survival regression test set scoring (prediction and residuals)
- Newsletter July 2011
- BIDIMREG(): Bidimensional Regression between 2-dimensional configurations (Tobler, 1994)
- CFA(): Confirmatory Factor Analysis (categorical data, robust, with automatic model search)
- DEG2RAD() and RAD2DEG(): conversion between degrees and radians
- INVUPD(): rank-1 update of the inverse of a pd matrix
- SOM(): Self-Organizing Maps, supervised Kohonen networks
- Newsletter December 2011
- Now the input of hexadecimal numbers is permitted (transformed into unsigned long integers)
- Extended REPLACE(): for multiple replacements in scalars, vectors, matrices, or tensors
- Optional second argument for SRAND() for initialization of specific uniform generators
- Added: uniform random generators to RAND() function:
- Mersenne-Twister (Matsumoto and Nishimura, 1998)
- Advanced Encryption Standard (AES) (Rijndael)
- GFSR4 (Ziff, 1998)
- RANLUX (Luescher, 1994)
- Worked over: MRAND() for multivariate random number generation
- New distributions for multivariate random number generation: t(mu,sigma,df), Pearson, Khintchine
- Worked over tests for univariate normality: Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk,
Jarque-Bera
- MVNTEST(): Various tests of multivariate normality: Mardia's tests of MV skewness and kurtosis,
Royston, Henze-Zirkler, Doornik-Hansen, Small, Mudholkar, etc.
- CPERM(): permutation of columns of a matrix
- RPERM(): permutation of rows of a matrix
- ASSOC(): data mining items (Agrawal, Imielinski, and Swami, 1993)
- RULES(): data mining items (Agrawal, Imielinski, and Swami, 1993)
- SEQU(): data mining items (Agrawal, Imielinski, and Swami, 1993)
- HANKEL(): create Hankel matrices
- CHNGTXT(): changes in string data (scalars, vectors, matrices, and tensors)
- CONVHULL(): convex hull of 2-, 3-, d-dimensional point configurations (Barber, Dobkin, and Huhdanpaa, 1996)
- DELAUNAY(): Delaunay triangulation (Barber, Dobkin, and Huhdanpaa, 1996)
- VORONIN(): Voronoi diagrams (Barber, Dobkin, and Huhdanpaa, 1996)
- Newsletter July 2012
- Worked over: ENCRYPT() and DECRYPT(): more methods (AES, SHA2,...) and vector input for names of files and directories
- CSVREAD(): Reading of CSV (Comma-Separated-Values) files
- ENCRYP2(), DECRYP2(): similar to ENCRYPT() and DECRYPT(), however treats string objects (not files or directories)
- RECUPAR(): Recursive partitioning (tree splitting similar to SAS Macro TREEDISC, but much faster)
- INSERT(): with arguments compatible with SAS/IML function
- MIXREGV(): implementation of Don Hedeker's Fortran program MIXREGLS
- REMOVE(): with arguments compatible with SAS/IML function
- Almost all string functions (str...) extended for vector, matrix, and tensor arguments
- Newsletter December 2012 and July 2013
- Changed to MS Visual C/C++ 2010 and Intel Parallel Studio XE 2013
- Worked over polychoric correlations for multiple sample applications
and the functions HBTTEST() and SORTROW()
- ENCRYPT(): added the SHA-3 method
- HORNER(): efficient evaluation of polynomials
- KDE(): 1- and 2-dimensional kernel density estimation
- MVSVM(): stepwise multivariate SVM (Thayanathan, 2005)
- PERMCOMB(): permutation and combination (stepwise)
- PREFNAME(): generate sets of prefix names
- RVM(): Relevance vector machine
- SDCSPM(): testing Srivastava's condition
- Application: Two-Phase Logistic Modeling
- Newsletter December 2013
- Worked over TTEST()
- Worked over linear regression: REG(), LRFORW(), and LRALL()
- Worked over general linear modeling: GLMOD(), e.g.
added multiple comparison techniques (Interfacing MULTCOMP())
- Worked over multivariate normal and t probabilities: CDFMVN()
- ICDFMV(): inverse CDF for multivariate normal and t distribution (given prob find quantile)
- MAHALANOBIS(): Mahalanobis distances (diagonal and full)
- MULTCOMP(): various parametric and nonparametric methods for
multiple comparison of means and medians of K>2 samples
- PADJUST(): adjusted multivariate probabilities (e.g. Bonferroni, Holm, Hochberg)
- PDFMV(): PDF for multivariate normal and t distribution
- WILCOX(): Wilcoxon rank sum test and signed rank test comparing two samples
- Newsletter July 2014
- Worked over LP(): interface with LPSOLVE,
now with integer constraints and sensitivity analysis
- Added to NLP(): LINCOA (LINearly Constrained Optimization Algorithm) by M.J.D. Powell
- BOUNDBOX(): compute smallest rectangular box surrounding a specified set of points
- HAMILTON(): find all or some Hamiltonian circuits in directional graphs
- KNAPSACK(): solving (approximately) the one- and multidimensional Knapsack problem
- LATLONG(): various computations with Latitude and Longitude data
- LOCAT1(): solves multifacility location problem
- LPASSIGN(): solving the linear assignment problem with LPSOLVE, LAPJV, or
the Hungarian method and solving the linear bottleneck assignment problem with BOTJV
- LPTRANSP(): solving the linear transport problem with LPSOLVE
- MAXEMPTY(): finds coordinates and volume of largest empty box parallel to (x,y) axes
and surrounded by a specified set of points
- MSTGRA(): Minimum Spanning Tree based on graph data (MSTREE was renamed to MSTDIS
for Minimum Spanning Tree based on distance data)
- SPLNET(): compute shortest path (length) between two points of network (graph) data
- TSP(): various methods for solving the symmetric and asymmetric Traveling Salesman Problem
(Interface to Linkern and Concorde)
- Newsletter December 2014
- FILESTAT(): returns vector of file statistics
- MPSFILE(): transforming MPS file to matrix notation of LP specification
and vice versa
- PRITFILE(): transfer text file rowwise into vector of strings
- CONTSIM(): Random Generation of contingency tables with specified row and column sums
- All LP algorithms worked over
- Interface to Clp (Coin Linear Programming) and Cbc (Coin Branch-and-Cut) in COIN-OR
Some "Exact Statistics" for small samples:
- XCTBINOM(): exact binomial test: p values and confidence intervals
- XCTBIP1(): P(alt) for exact binomial test
- XCTBIPOW(): Power for exact binomial test
- XCTBISSZ(): Sample size for exact binomial test
- XCTPOISS(): exact Poisson test: p values and confidence intervals
- XCTFISHR(): Fisher's exact test for 2 x 2 contingency tables
- XCTFIPOW(): Power for Fisher's exact test for 2 x 2 contingency tables
- XCTFISSZ(): Sample size for Fisher's exact test for 2 x 2 contingency tables
- XCTLOG(): MCMC Method for exact logistic regression (see elrm)
- XCTMCNEM(): McNemar's exact test for 2 x 2 contingency tables
- XCTSIMU(): MC method of Fisher's exact test for m x n contingency tables
- XCTHYBR(): Hybrid method for Fisher's exact test of m x n contingency tables
- Newsletter December 2015
- Extending data types to Structs; entries are referred to by a
compound name separated by a dot: struct_name.entry_name
- RANK() function extended for very large and sparse matrices
- Work on preprocessor for LP() function
- CLP interface for SVM() regression (Bi et al., 2002) and SMSVM() (Lee, Lin, and Wahba, 2003)
- RCCOUNT(): count the occurrence of numeric or string values in
rows and columns of a matrix
- SOUND(): playing sound (of specified frequency and duration)
on speakers
- Newsletter July 2016
- This is now the very carefully tested version 9, ready for
download and moved to the internet at the end of September 2016.
It is maybe the most error-free version since version 4.
- Work on printed output of tensors and lists
- Work on function SVM() for parameter tuning
- FNAMPID(): concatenate filename and actual process ID
- FREMOVE(): remove file with specified name
- FRENAME(): rename file
- ERROR(): print error message into log
- LSTLABEL(): assigning labels to entries of data lists
- LSTNAME(): assigning names to entries of data lists
- OUTLIER(): various methods for finding outliers in univariate data
- PID(): returns integer process ID
- SVMFSM(): SVM one-step feature selection (sparse L1 model fit)
- SVMSTW(): SVM stepwise feature selection (forward and backward, for only linear kernel)
- WARNING(): print warning message into log
- Rename objects with RENAME statement
Now, CMAT got some attention at
Dilbert
Algorithm, my delight
Running at the speed of light
What the genius, who the geek
Could forge thy objects ever sleek?
In what language were thee writ
That enabled every bit?
What the templates - how applied
And what the math down deep inside?
|
Thou wert born with MPI
And that enabled thee to fly
On Beowulf we set thee free
All cycles thou consumed with glee.
When the stars come out at night
And ask a sacrificial rite
Do we just sneer and charge ahead
With no fear and with no dread?
|
All this wonder; all this speed
And yet I wait here still in need
Alas the run's untimely halt
Said naught but that some seg did fault.
Algorithm, my delight
Running at the speed of light
What the genius, who the geek
Could forge thy objects ever sleek?
|
Contact information:
Back to Homepage