good and efficient linear algebra for real and complex data, sparse matrices and tensors, and mixed data types, in addition to BLAS, LAPACK, and ARPACK (for real and complex eigenvalues and eigenvectors and for singular values of large sparse matrices). I also developed many algorithms myself, such as quite powerful methods for nonnegative matrix factorization, iterative methods for the solution of large sparse linear systems, and total least squares (DTLS and PTLS by Van Huffel). Together with Robert Hartwig of NC State University I developed a modification of Aasen's algorithm for the solution of linear systems with large, sparse, positive semidefinite (symmetric and possibly singular) coefficient matrices, which works well in applications where iterative methods may have convergence problems.
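The nonnegative matrix factorization methods in CMAT are the author's own; purely as a generic illustration of the technique itself, here is a minimal Python sketch of the classic Lee-Seung multiplicative updates (the function name and defaults are hypothetical, and this is not CMAT's more powerful algorithm):

    import numpy as np

    def nmf_multiplicative(A, k, iters=500, eps=1e-9, seed=0):
        """Approximate a nonnegative A (m x n) by W @ H with W (m x k), H (k x n).

        Classic Lee-Seung multiplicative updates; a generic sketch only,
        not CMAT's algorithm.
        """
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, k))
        H = rng.random((k, n))
        for _ in range(iters):
            H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H, stays nonnegative
            W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W, stays nonnegative
        return W, H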
linear, quadratic, and general nonlinear optimization, unconstrained and constrained, for smooth and nonsmooth objective functions. In addition to many algorithms which I wrote myself, similar to those I developed for SAS PROC NLP (e.g. conjugate gradient, (limited-memory) quasi-Newton, Newton-Raphson, trust region, Levenberg-Marquardt), I included many algorithms given to me by other scientists: for example by Stephen Wright (PCx), many from Mike Powell at Cambridge University, especially his derivative-free methods (COBYLA, UOBYQA, BOBYQA, NEWUOA) and his method for linearly constrained problems (LINCOA), some by Kaj Madsen (who obtained his PhD with Mike Powell) from the Technical University of Denmark (linear L1, L-infinity, and minimax estimation), and some from Mustafa Pinar (nonlinear L1 and L-infinity estimation). Thanks also to Richard Brent. For nonsmooth problems I also implemented the reliable Nelder-Mead method and the unconstrained and boundary-constrained bundle trust region methods (developed by the late Prof. Zowe and his group from Bayreuth).
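As a small, hedged illustration of the kinds of derivative-free solvers named above (using SciPy's implementations, not CMAT's own):

    import numpy as np
    from scipy.optimize import minimize

    # Rosenbrock function: a standard smooth test problem.
    rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    x0 = np.array([-1.2, 1.0])

    # Derivative-free simplex search (Nelder-Mead).
    r1 = minimize(rosen, x0, method='Nelder-Mead')

    # Powell's COBYLA handles inequality constraints without derivatives;
    # here: stay inside the unit disk.
    cons = [{'type': 'ineq', 'fun': lambda x: 1.0 - x[0]**2 - x[1]**2}]
    r2 = minimize(rosen, x0, method='COBYLA', constraints=cons)

    print(r1.x, r2.x)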
some well-designed algorithms for basic statistical problems, especially for predictive modeling, including some that are not so easily available elsewhere. With the permission of the original author, Don Hedeker, I implemented his MIXOR, MIXNOM, and MIXREGV algorithms so that they work with my own optimization algorithms (but I was not able to make them significantly faster). The robust methods LMS, LTS, MVE, and MCD, which I had developed (with the help of Peter Rousseeuw) for SAS Institute, I reprogrammed for CMAT, and I added Hotelling's test (maybe I should add more). Logistic regression and general GLIMs, as well as LARS, SCAD, Huber, and very general nonlinear regression were built around my own optimization methods. Also included are some less well-known regression methods, such as orthogonal regression (errors in variables), bidimensional regression for the comparison of two-dimensional maps (for face recognition), univariate Deming regression, etc.
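Univariate Deming regression, mentioned above, has a simple closed form; the following is the standard textbook solution as a sketch (variable names are mine, not CMAT code), assuming the ratio delta of the y-error variance to the x-error variance is known:

    import numpy as np

    def deming(x, y, delta=1.0):
        """Univariate Deming regression y ~ b0 + b1*x.

        delta = var(error in y) / var(error in x); delta = 1 gives
        orthogonal regression. Standard closed-form solution (sketch);
        assumes the sample covariance of x and y is nonzero.
        """
        x, y = np.asarray(x, float), np.asarray(y, float)
        mx, my = x.mean(), y.mean()
        sxx = ((x - mx)**2).mean()
        syy = ((y - my)**2).mean()
        sxy = ((x - mx) * (y - my)).mean()
        b1 = (syy - delta * sxx
              + np.sqrt((syy - delta * sxx)**2 + 4 * delta * sxy**2)) / (2 * sxy)
        b0 = my - b1 * mx
        return b0, b1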
many methods for data mining, covering a large part of the functionality of SAS Enterprise Miner; a longer paper would be needed to describe them. The so-called R2 node in SAS EM is basically a stepwise forward linear regression algorithm which is severely restricted in the number p of variables by its O(p^2) memory requirement. My algorithm for this problem needs only O(p) memory and can therefore solve problems with hundreds of thousands of variables, such as those with microarray data. Hybrid neural networks and support vector machines (thanks to Olvi Mangasarian and Kristin Bennett) are similar to those I developed for SAS Institute.
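The O(p) memory claim rests on never forming the p x p cross-product matrix: each step needs only one pass over the candidate columns to score them against the current residual. A minimal sketch of that general idea (my own illustration, not the actual CMAT or SAS code):

    import numpy as np

    def forward_stepwise(X, y, max_terms=10):
        """Greedy forward selection scoring each candidate column against
        the current residual. Memory per step stays O(p) (one score per
        column); the p x p covariance matrix is never formed. Assumes
        the columns of X are standardized so the scores rank like
        correlations. Illustrative sketch only.
        """
        n, p = X.shape
        selected = []
        resid = y - y.mean()
        for _ in range(max_terms):
            # One O(n*p)-time, O(p)-memory pass over the columns.
            scores = np.abs(X.T @ resid)
            scores[selected] = -np.inf          # exclude already chosen columns
            selected.append(int(np.argmax(scores)))
            # Refit on the selected columns only (small least squares).
            Xs = X[:, selected]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            resid = y - Xs @ beta
        return selected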
almost every feature available in other programs like AMOS or LISREL is implemented, except the full-information ML treatment of missing values. Polychoric correlations, multiple-sample solutions, and robust (Satorra-Bentler) chi-square statistics and confidence intervals are available, with the help of Albert Maydeu-Olivares and the late Rod McDonald. In addition, I developed a very powerful algorithm for finding the zero patterns of CFA solutions with large p-values, which works for up to about 40 variables. Many rotation methods for factor loadings and principal components, including confidence intervals of the rotated loadings, are available (Jennrich and Ogasawara). There are also some well-designed functions for IRT and MDS (thanks to Jim Ramsay for MULTISCALE) in CMAT.
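Among the rotation methods, varimax is the best known; the following is the standard SVD-based varimax iteration as a textbook sketch (it does not include the confidence intervals of the rotated loadings that CMAT provides):

    import numpy as np

    def varimax(L, gamma=1.0, iters=100, tol=1e-8):
        """Orthogonal varimax rotation of a loading matrix L (p x k).

        Standard SVD-based iteration; returns the rotated loadings and
        the rotation matrix. Textbook sketch, not CMAT's implementation.
        """
        p, k = L.shape
        R = np.eye(k)
        d_old = 0.0
        for _ in range(iters):
            B = L @ R
            U, s, Vt = np.linalg.svd(
                L.T @ (B**3 - (gamma / p) * B @ np.diag((B**2).sum(axis=0))))
            R = U @ Vt
            d = s.sum()
            if d_old != 0.0 and d < d_old * (1 + tol):
                break                      # criterion no longer improving
            d_old = d
        return L @ R, R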
many methods for the encryption of string data (like passwords), of files, and of entire directories of files. The method I developed myself is based on an unpublished algorithm and permits the specification of long passwords, which should be stored externally (thanks to George Marsaglia).
a good selection of CDFs, PDFs, and random number generators: for the uniform distribution, methods by L'Ecuyer and Marsaglia, the Mersenne Twister, AES, RANLUX, etc., and many for other distributions, like Marsaglia's ziggurat method. I enjoyed working with George Marsaglia very much when reviewing his papers for JSS. For the CDFs of the multivariate normal and t distributions I added code developed by Alan Genz and Frank Bretz, which works well up to about 100 dimensions.
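For the multivariate normal CDF, SciPy ships a routine from the same Genz line of work, which can serve as a point of comparison (whether it matches the exact Genz-Bretz code in CMAT is an assumption on my part):

    import numpy as np
    from scipy.stats import multivariate_normal

    # Trivariate orthant probability P(X1 <= 0, X2 <= 0, X3 <= 0) with
    # an equicorrelated covariance matrix (rho = 0.5).
    rho = 0.5
    cov = np.full((3, 3), rho) + (1 - rho) * np.eye(3)
    p = multivariate_normal(mean=np.zeros(3), cov=cov).cdf(np.zeros(3))
    print(p)  # known result: 1/(n+1) = 0.25 for rho = 0.5, n = 3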
Run times in seconds for example scripts in cmat/test on three PCs (PC1, PC2, PC3; "???" and "-" mark unknown and not-run cases):

    Example (cmat/test)    PC1     PC2     PC3
    tborut2.inp           2812    4785    6605
    tlrallv2.inp          3418    6466   12874
    tlrallv3.inp          1853    2752    9917
    tlrallv4.inp          1450    2253    8027
    tmicro2.inp           2736    5627    3576
    tmixrpana.inp          345    1382     ???
    tmixrpan2.inp          342    1386    4839
    trand2.inp            1799    9443    4532
    tranfor2.inp          2389    3891   24425
    tsvm12.inp            1156    3283    7570
    tsvm13.inp            1542    2781       -
    tsvm23.inp            1162    5123       -
    tsvm24.inp           18349  100491       -
    tsmp3.inp            19262   81609       -