Timeline of operating systems

This article presents a timeline of events in the history of computer operating systems from 1951 to the current day. For a narrative explaining the overall developments, see the History of operating systems. == 20th Century == == 1940s == 1949 EDSAC was considered the first operating system developed by Maurice Wilkes and manufactured by the University of Cambridge == 1950s == 1951 LEO I 'Lyons Electronic Office' was the commercial development of EDSAC computing platform, supported by British firm J. Lyons and Co. 1953 DYSEAC - an early machine capable of distributing computing 1955 General Motors Operating System made for IBM 701 MIT's Tape Director operating system made for UNIVAC 1103 1956 GM-NAA I/O for IBM 704, based on General Motors Operating System 1957 Atlas Supervisor (Manchester University) (Atlas computer project start) BESYS (Bell Labs), for IBM 704, later IBM 7090 and IBM 7094 1958 University of Michigan Executive System (UMES), for IBM 704, 709, and 7090 1959 SHARE Operating System (SOS), based on GM-NAA I/O == 1960s == 1960 IBSYS (IBM for its 7090 and 7094) 1961 CTSS demonstration (MIT's Compatible Time-Sharing System for the IBM 7094) MCP (Burroughs Master Control Program) for B5000 1962 Atlas Supervisor (Manchester University) (Atlas computer commissioned) BBN Time-Sharing System GCOS (GE's General Comprehensive Operating System, originally GECOS, General Electric Comprehensive Operating Supervisor) 1963 ADMIRAL AN/FSQ-32, another early time-sharing system begun CTSS becomes operational (MIT's Compatible Time-Sharing System for the IBM 7094) JOSS, an interactive time-shared system that did not distinguish between operating system and language Titan Supervisor, early time-sharing system begun 1964 Berkeley Timesharing System (for Scientific Data Systems' SDS 940) Chippewa Operating System (for CDC 6600 supercomputer) Dartmouth Time-Sharing System (Dartmouth College's DTSS for GE computers) EXEC 8 (UNIVAC) KDF9 Timesharing Director (English Electric) – an early, fully hardware secured, fully pre-emptive process switching, multi-programming operating system for KDF9 (originally announced in 1960) OS/360 (IBM's primary OS for its S/360 series) (announced) PDP-6 Monitor (DEC) descendant renamed TOPS-10 in 1970 SCOPE (CDC 3000 series) 1965 BOS/360 (IBM's Basic Operating System) DECsys TOS/360 (IBM's Tape Operating System) Livermore Time Sharing System (LTSS) Multics (MIT, GE, Bell Labs for the GE-645) (announced) Pick operating system SIPROS 66 (Simultaneous Processing Operating System) THE multiprogramming system (Technische Hogeschool Eindhoven) development TSOS (later VMOS) (RCA) 1966 DOS/360 (IBM's Disk Operating System) GEORGE 1 & 2 for ICT 1900 series Mod 1 Mod 2 Mod 8 MS/8 (Richard F. Lary's DEC PDP-8 system) MSOS (Mass Storage Operating System) OS/360 (IBM's primary OS for its S/360 series) PCP and MFT (shipped) RAX Remote Users of Shared Hardware (RUSH), a time-sharing system developed by Allen-Babcock for the IBM 360/50 SODA for Elwro's Odra 1204 Universal Time-Sharing System (XDS Sigma series) 1967 CP-40, predecessor to CP-67 on modified IBM System/360 Model 40 CP-67 (IBM, also known as CP/CMS) Conversational Programming System (CPS), an IBM time-sharing system under OS/360 Michigan Terminal System (MTS) (time-sharing system for the IBM S/360-67 and successors) ITS (MIT's Incompatible Timesharing System for the DEC PDP-6 and PDP-10) OS/360 MVT ORVYL (Stanford University's time-sharing system for the IBM S/360-67) TSS/360 (IBM's Time-sharing System for the S/360-67, never officially released, canceled in 1969 and again in 1971) WAITS (SAIL, Stanford Artificial Intelligence Laboratory, time-sharing system for DEC PDP-6 and PDP-10, later TOPS-10) 1968 Airline Control Program (ACP) (IBM) B1 (NCR Century series) CALL/360, an IBM time-sharing system for System/360 HP Real-Time Executive (HP RTE) – Hewlett-Packard HP Time-Shared BASIC (HP TSB) – Hewlett-Packard (time-sharing system for the HP 2000) THE multiprogramming system (Eindhoven University of Technology) publication TSS/8 (DEC for the PDP-8) VP/CSS 1969 B2 (NCR Century series) B3 (NCR Century series) GEORGE 3 For ICL 1900 series MINIMOP Multics (MIT, GE, Bell Labs for the GE-645 and later the Honeywell 6180) (opened for paying customers in October) RC 4000 Multiprogramming System (RC) TENEX (Bolt, Beranek and Newman for DEC systems, later TOPS-20) Unics (later Unix) (AT&T, initially on DEC computers) Xerox Operating System == 1970s == 1970 DOS-11 (PDP-11) 1971 EMAS Kronos RSTS-11 2A-19 (First released version; PDP-11) RSX-15 OS/8 1972 B4 (NCR Century series) COS-300 Data General RDOS Edos MUSIC/SP OS/4 OS 1100 OS/2000 (Honeywell 2000-series) Operating System/Virtual Storage 1 (OS/VS1) Operating System/Virtual Storage 2 R1 (OS/VS2 SVS) PRIMOS (written in FORTRAN IV, that didn't have pointers, while later versions, around version 18, written in a version of PL/I, called PL/P) Virtual Machine/Basic System Extensions Program Product (BSEPP or VM/SE) Virtual Machine/System Extensions Program Product (SEPP or VM/BSE) Virtual Machine Facility/370 (VM/370), sometimes known as VM/CMS 1973 Эльбрус-1 (Elbrus-1) – Soviet computer – created using high-level language uЭль-76 (AL-76/ALGOL 68) Alto OS CP-V (Control Program V) RSX-11D RT-11 VME – implementation language S3 (ALGOL 68) 1974 ACOS-2 (NEC) ACOS-4 ACOS-6 CP/M DOS-11 V09-20C (Last stable release, June 1974) Hydra – capability-based, multiprocessing OS kernel MONECS Multi-Programming Executive (MPE) – Hewlett-Packard Operating System/Virtual Storage 2 R2 (MVS) OS/7 OS/16 OS/32 Sintran III 1975 BS2000 V2.0 (First released version) COS-350 ISIS NOS (Control Data Corporation) OS/3 (Univac) VS/9 (formerly RCA's TSOS, later named VMOS) Version 6 Unix XVM/DOS XVM/RSX 1976 Cambridge CAP computer – all operating system procedures written in ALGOL 68C, with some closely associated protected procedures in BCPL Cray Operating System DX10 FLEX TOPS-20 TX990/TXDS Tandem Nonstop OS v1 Thoth 1977 1BSD AMOS KERNAL OASIS operating system OS68 OS4000 RMX-80 System 88 (Exec) System Support Program (IBM System/34 and System/36) TRSDOS Virtual Memory System (VMS) V1.0 (Initial commercial release, October 25) VRX (Virtual Resource eXecutive) VS Virtual Memory Operating System 1978 2BSD Apple DOS Control Program Facility (IBM System/38) Cray Time Sharing System (CTSS) DPCX (IBM) DPPX (IBM) HDOS KSOS – secure OS design from Ford Aerospace KVM/370 – security retro-fit of IBM VM/370 Lisp machine (CADR) MVS/System Extensions (MVS/SE) OS4 (Naked Mini 4) PTDOS TRIPOS UCSD p-System (First released version) Z80-RIO 1979 Atari DOS 3BSD CP-6 Idris MP/M MVS/System Extensions R2 (MVS/SE2) NLTSS POS Sinclair BASIC Transaction Processing Facility (TPF) (IBM) UCLA Secure UNIX – an early secure UNIX OS based on security kernel UNIX/32V DOS/VSE Version 7 Unix == 1980s == 1980 86-DOS AOS/VS (Data General) Business Operating System CTOS DOSPLUS (TRS-80) MVS/System Product (MVS/SP) V1 NewDos/80 OS-9 RMX-86 RS-DOS SOS Virtual Machine/System Product (VM/SP) Xenix 1981 Acorn MOS Aegis SR1 (First Apollo/DOMAIN systems shipped on March 27) CP/M-86 DRX (Distributed Resource Executive) iMAX – OS for Intel's iAPX 432 capability machine MCS (Multi-user Control System) MS-DOS PC DOS Pilot (Xerox Star operating system) UNOS UTS V VERSAdos VRTX VSOS (Virtual Storage Operating System) Xinu first release 1982 Commodore DOS LDOS (By Logical Systems, Inc. – for the Radio Shack TRS-80 Models I, II & III) PCOS (Olivetti M20) pSOS QNX Stratus VOS Sun UNIX (later SunOS) 0.7 Ultrix Unix System III VAXELN 1983 Coherent DNIX EOS GNU (project start) Lisa Office System 7/7 LOCUS – UNIX compatible, high reliability, distributed OS MVS/System Product V2 (MVS/Extended Architecture, MVS/XA) Novell NetWare (S-Net) PERPOS ProDOS RTU (Real-Time Unix) STOP – TCSEC A1-class, secure OS for SCOMP hardware SunOS 1.0 VSE/System Package (VSE/SP) Version 1 1984 AMSDOS CTIX (Unix variant) DYNIX Mac OS (System 1.0) MSX-DOS NOS/VE PANOS PC/IX ROS Sinclair QDOS SINIX UNICOS Venix 2.0 Virtual Machine/Extended Architecture Migration Assistance (VM/XA MA) 1985 AmigaOS Atari TOS DG/UX DOS Plus Graphics Environment Manager Harmony MacOS 2 MIPS RISC/os Oberon – written in Oberon SunOS 2.0 Version 8 Unix Virtual Machine/Extended Architecture System Facility (VM/XA SF) Windows 1.0 Windows 1.01 Xenix 2.0 1986 AIX 1.0 Cronus distributed OS FlexOS GEMSOS – TCSEC A1-class, secure kernel for BLACKER VPN & GTNP GEOS Genera 7.0 HP-UX MacOS 3 SunOS 3.0 TR-DOS TRIX Version 9 Unix 1987 Arthur (much improved version came in 1989 under the name RISC OS) BS2000 V9.0 IRIX (3.0 is first SGI version) MacOS 4 MacOS 5 MDOS MINIX 1.0 OS/2 (1.0) PC-MOS/386 Topaz – semi-distributed OS for DEC Firefly workstation written in Modula-2+ and garbage collected VxWorks Windows 2.0 1988 A/UX (Apple Computer) AOS/VS II (Data General) CP/M rebranded as DR-DOS Flex machine – tagged, capability machine with OS and other software written

Cross-validation (statistics)

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

Arabic Speech Corpus

The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech aligned with recorded speech on the phoneme level. The annotations include word stress marks on the individual phonemes. The Arabic Speech Corpus was built as part of a doctoral project by Nawar Halabi at the University of Southampton funded by MicroLinkPC who own an exclusive license to commercialise the corpus, but the corpus is available for strictly non-commercial purposes through the official Arabic Speech Corpus website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. == Purpose == The corpus was mainly built for speech synthesis purposes, specifically Speech Synthesis, but the corpus has been used for building HMM based voices in Arabic. It was also used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The package contains the following: 1813 .wav files containing spoken utterances. 1813 .lab files containing text utterances. 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. phonetic-transcript.txt which has the form "[wav_filename]" "[Phoneme Sequence]" in every line. orthographic-transcript.txt which has the form "[wav_filename]" "[Orthographic Transcript]" in every line. Orthography is in Buckwalter Format which is friendlier where there is software that does not read Arabic script. It can be easily converted back to Arabic. There is an extra 18 minutes of fully annotated corpus (separate from above but with the same structure as above) which was used to evaluated the corpus (see PhD thesis). The corpus was also used to prove that using automatically extracted, orthography-based stress marks improve the quality of speech synthesis in MSA.

Junction tree algorithm

The junction tree algorithm (also known as 'Clique Tree') is a method used in machine learning to extract marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The graph is called a tree because it branches into different sections of data; nodes of variables are the branches. The basic premise is to eliminate cycles by clustering them into single nodes. Multiple extensive classes of queries can be compiled at the same time into larger structures of data. There are different algorithms to meet specific needs and for what needs to be calculated. Inference algorithms gather new developments in the data and calculate it based on the new information provided. == Junction tree algorithm == === Hugin algorithm === If the graph is directed then moralize it to make it un-directed. Introduce the evidence. Triangulate the graph to make it chordal. Construct a junction tree from the triangulated graph (we will call the vertices of the junction tree "supernodes"). Propagate the probabilities along the junction tree (via belief propagation) Note that this last step is inefficient for graphs of large treewidth. Computing the messages to pass between supernodes involves doing exact marginalization over the variables in both supernodes. Performing this algorithm for a graph with treewidth k will thus have at least one computation which takes time exponential in k. It is a message passing algorithm. The Hugin algorithm takes fewer computations to find a solution compared to Shafer-Shenoy. === Shafer-Shenoy algorithm === Computed recursively Multiple recursions of the Shafer-Shenoy algorithm results in Hugin algorithm Found by the message passing equation Separator potentials are not stored The Shafer-Shenoy algorithm is the sum product of a junction tree. It is used because it runs programs and queries more efficiently than the Hugin algorithm. The algorithm makes calculations for conditionals for belief functions possible. Joint distributions are needed to make local computations happen. === Underlying theory === The first step concerns only Bayesian networks, and is a procedure to turn a directed graph into an undirected one. We do this because it allows for the universal applicability of the algorithm, regardless of direction. The second step is setting variables to their observed value. This is usually needed when we want to calculate conditional probabilities, so we fix the value of the random variables we condition on. Those variables are also said to be clamped to their particular value. The third step is to ensure that graphs are made chordal if they aren't already chordal. This is the first essential step of the algorithm. It makes use of the following theorem: Theorem: For an undirected graph, G, the following properties are equivalent: Graph G is triangulated. The clique graph of G has a junction tree. There is an elimination ordering for G that does not lead to any added edges. Thus, by triangulating a graph, we make sure that the corresponding junction tree exists. A usual way to do this, is to decide an elimination order for its nodes, and then run the Variable elimination algorithm. The variable elimination algorithm states that the algorithm must be run each time there is a different query. This will result to adding more edges to the initial graph, in such a way that the output will be a chordal graph. All chordal graphs have a junction tree. The next step is to construct the junction tree. To do so, we use the graph from the previous step, and form its corresponding clique graph. Now the next theorem gives us a way to find a junction tree: Theorem: Given a triangulated graph, weight the edges of the clique graph by their cardinality, |A∩B|, of the intersection of the adjacent cliques A and B. Then any maximum-weight spanning tree of the clique graph is a junction tree. So, to construct a junction tree we just have to extract a maximum weight spanning tree out of the clique graph. This can be efficiently done by, for example, modifying Kruskal's algorithm. The last step is to apply belief propagation to the obtained junction tree. Usage: A junction tree graph is used to visualize the probabilities of the problem. The tree can become a binary tree to form the actual building of the tree. A specific use could be found in auto encoders, which combine the graph and a passing network on a large scale automatically. === Inference Algorithms === Loopy belief propagation: A different method of interpreting complex graphs. The loopy belief propagation is used when an approximate solution is needed instead of the exact solution. It is an approximate inference. Cutset conditioning: Used with smaller sets of variables. Cutset conditioning allows for simpler graphs that are easier to read but are not exact.

FastICA

FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let the X := ( x i j ) ∈ R N × M {\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}} denote the input data matrix, M {\displaystyle M} the number of columns corresponding with the number of samples of mixed signals and N {\displaystyle N} the number of rows corresponding with the number of independent source signals. The input data matrix X {\displaystyle \mathbf {X} } must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data X {\displaystyle \mathbf {X} } , that is, for each i = 1 , … , N {\displaystyle i=1,\ldots ,N} and j = 1 , … , M {\displaystyle j=1,\ldots ,M} . After centering, each row of X {\displaystyle \mathbf {X} } has an expected value of 0 {\displaystyle 0} . Whitening the data requires a linear transformation L : R N × M → R N × M {\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}} of the centered data so that the components of L ( X ) {\displaystyle \mathbf {L} (\mathbf {X} )} are uncorrelated and have variance one. More precisely, if X {\displaystyle \mathbf {X} } is a centered data matrix, the covariance of L x := L ( X ) {\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )} is the ( N × N ) {\displaystyle (N\times N)} -dimensional identity matrix, that is, A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data X {\displaystyle \mathbf {X} } , E { X X T } = E D E T {\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}} , where E {\displaystyle \mathbf {E} } is the matrix of eigenvectors and D {\displaystyle \mathbf {D} } is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by === Single component extraction === The iterative algorithm finds the direction for the weight vector w ∈ R N {\displaystyle \mathbf {w} \in \mathbb {R} ^{N}} that maximizes a measure of non-Gaussianity of the projection w T X {\displaystyle \mathbf {w} ^{T}\mathbf {X} } , with X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} denoting a prewhitened data matrix as described above. Note that w {\displaystyle \mathbf {w} } is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f ( u ) {\displaystyle f(u)} , its first derivative g ( u ) {\displaystyle g(u)} , and its second derivative g ′ ( u ) {\displaystyle g^{\prime }(u)} . Hyvärinen states that the functions are useful for general purposes, while may be highly robust. The steps for extracting the weight vector w {\displaystyle \mathbf {w} } for single component in FastICA are the following: Randomize the initial weight vector w {\displaystyle \mathbf {w} } Let w + ← E { X g ( w T X ) T } − E { g ′ ( w T X ) } w {\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} } , where E { . . . } {\displaystyle E\left\{...\right\}} means averaging over all column-vectors of matrix X {\displaystyle \mathbf {X} } Let w ← w + / ‖ w + ‖ {\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|} If not converged, go back to 2 === Multiple component extraction === The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, 1 M {\displaystyle \mathbf {1_{M}} } is a column vector of 1's of dimension M {\displaystyle M} . Algorithm FastICA Input: C {\displaystyle C} Number of desired components Input: X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} Prewhitened matrix, where each column represents an N {\displaystyle N} -dimensional sample, where C <= N {\displaystyle C<=N} Output: W ∈ R N × C {\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}} Un-mixing matrix where each column projects X {\displaystyle \mathbf {X} } onto independent component. Output: S ∈ R C × M {\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}} Independent components matrix, with M {\displaystyle M} columns representing a sample with C {\displaystyle C} dimensions. for p in 1 to C: w p ← {\displaystyle \mathbf {w_{p}} \leftarrow } Random vector of length N while w p {\displaystyle \mathbf {w_{p}} } changes w p ← 1 M X g ( w p T X ) T − 1 M g ′ ( w p T X ) 1 M w p {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1_{M}} \mathbf {w_{p}} } w p ← w p − ∑ j = 1 p − 1 ( w p T w j ) w j {\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} )\mathbf {w_{j}} } w p ← w p ‖ w p ‖ {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}} output W ← [ w 1 , … , w C ] {\displaystyle \mathbf {W} \leftarrow {\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}} output S ← W T X {\displaystyle \mathbf {S} \leftarrow \mathbf {W^{T}} \mathbf {X} }

Private cloud computing infrastructure

Private cloud computing infrastructure is a category of cloud computing that provides comparable benefits to public cloud systems, such as self-service and scalability, but it does so via a proprietary framework. In contrast to public clouds, which cater to multiple entities, a private cloud is specifically designed for the requirements and objectives of one organization. == Definition == A private cloud computing infrastructure constitutes a distinctive model of cloud computing that facilitates a secure and distinct cloud environment where only the intended client can function. It can either be physically housed in the organization's in-house data center or be managed by a third-party provider. In a private cloud, the infrastructure and services are always sustained on a private network, and both the hardware and software are devoted exclusively to a single organization. == History == The concept of private cloud infrastructure started to take shape around the mid-2000s, coinciding with the rise of other cloud computing forms. It came into existence as a solution to the shortcomings of public clouds, particularly concerns over data control, security, and network performance. IT departments began to mirror the automation and self-service features of the public cloud in their data centers. Over time, these services became more advanced, and private cloud technology has been refined to address businesses and organizations' diverse needs. == Architecture == Private cloud computing infrastructure generally involves a mix of hardware, network infrastructure, and virtualization software. The hardware, often referred to as a cloud server or cloud array, consists of a server rack or a collection of server racks containing the storage and processors that constitute the cloud. The virtualization software, such as Hyper-V, OpenStack, or VMWare, establishes and oversees virtual machines with which users interact. The network infrastructure connects the private cloud to users and may facilitate connectivity with other on-premises data centers or clouds. == Applications == Private cloud infrastructures are usually utilized by medium to large businesses and organizations that need robust control over their data, have extensive computing needs, or have specific regulatory or compliance obligations. This includes healthcare organizations, government agencies, financial institutions, and any business that needs to process and store large data volumes.

Optimal discriminant analysis and classification tree analysis

Optimal Discriminant Analysis (ODA) and the related classification tree analysis (CTA) are exact statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confirmatory hypothesis, optimal discriminant analysis (ODA) identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability. Optimal discriminant analysis may be applied to > 0 dimensions, with the one-dimensional case being referred to as UniODA and the multidimensional case being referred to as MultiODA. Optimal discriminant analysis is an alternative to ANOVA (analysis of variance) and regression analysis.