R packages by snoweye

pbdZMQ - Programming with Big Data -- Interface to 'ZeroMQ'

'ZeroMQ' is a well-known library for high-performance asynchronous messaging in scalable, distributed applications. This package provides high-level R wrapper functions that make 'ZeroMQ' easy to use. We mainly focus on interactive client/server programming frameworks. For convenience, a minimal 'ZeroMQ' library (4.2.2) is shipped with 'pbdZMQ', which can be used if no system installation of 'ZeroMQ' is available. A few wrapper functions compatible with 'rzmq' are also provided.

Last updated 5 months ago

zeromq3

9.90 score 17 stars 25 dependents 47 scripts 13k downloads

phyclust - Phylogenetic Clustering (Phyloclustering)

Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is designed in C for performance, interfaced with R for visualization, and incorporates other popular open source programs including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching for the best tree. See the phyclust website for more information, documentation and examples.

Last updated 1 year ago

8.33 score 9 stars 8 dependents 126 scripts 1.6k downloads

EMCluster - EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution

EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distributions with unstructured dispersion in both unsupervised and semi-supervised learning.

Last updated 5 months ago

openblas

7.22 score 18 stars 1 dependent 123 scripts 490 downloads
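As a rough illustration of the EM iteration that the EMCluster description refers to, the sketch below fits a univariate Gaussian mixture in plain Python. It is not EMCluster's code or API: the function name `em_gaussian_mixture_1d` is hypothetical, the data are simulated, the model is one-dimensional rather than multivariate with unstructured dispersion, and the starting values are a naive choice rather than any of the package's initialization methods.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture_1d(data, means, variances, weights, n_iter=200):
    """Hypothetical helper: plain EM for a 1-D mixture of K Gaussians."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
    return weights, means, variances


# Simulated data (an assumption for this demo): two well-separated groups.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(6.0, 1.0) for _ in range(200)]

# Naive starting values: the extremes of the data as initial means.
weights, means, variances = em_gaussian_mixture_1d(
    data, [min(data), max(data)], [1.0, 1.0], [0.5, 0.5]
)
print(weights, means, variances)
```

With groups this far apart, the fitted means should land close to the simulated centers; with overlapping groups the result depends much more on the starting values, which is why initialization methods matter.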

pbdMPI - R Interface to MPI for HPC Clusters (Programming with Big Data Project)

A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.

Last updated 4 months ago

openmpi

7.12 score 2 stars 3 dependents 181 scripts 739 downloads

QZ - Generalized Eigenvalues and QZ Decomposition

Generalized eigenvalues and eigenvectors use QZ decomposition (generalized Schur decomposition). The decomposition needs an N-by-N non-symmetric matrix A or paired matrices (A,B) with an eigenvalue reordering mechanism. The decomposition functions are mainly based on Fortran subroutines in complex*16 and double precision of the LAPACK library (version 3.10.0 or later).

Last updated 1 year ago

fortran openblas

5.20 score 2 stars 53 scripts 255 downloads
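To make the term "generalized eigenvalue" in the QZ entry concrete: for a matrix pair (A, B), the generalized eigenvalues are the values λ for which A x = λ B x has a nonzero solution x. The sketch below finds them for a 2-by-2 pair by solving det(A - λB) = 0 directly. This is only an illustration of the problem being solved; it is not the QZ algorithm, it does not call the QZ package or LAPACK, it assumes B is nonsingular, and the function names are hypothetical.

```python
import cmath


def det2(m):
    """Determinant of a 2-by-2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def generalized_eigenvalues_2x2(a, b):
    """Hypothetical helper: roots of det(A - lambda * B) = 0 for 2-by-2 A and B.

    Assumes det(B) != 0, so the characteristic polynomial is a true quadratic.
    """
    c2 = det2(b)
    c1 = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    c0 = det2(a)
    if c2 == 0:
        raise ValueError("B is singular; this sketch does not handle that case")
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return [(-c1 - disc) / (2.0 * c2), (-c1 + disc) / (2.0 * c2)]


def residual(a, b, lam):
    """det(A - lam * B), which should vanish at a generalized eigenvalue."""
    shifted = [[a[i][j] - lam * b[i][j] for j in range(2)] for i in range(2)]
    return det2(shifted)


# Example pair chosen for this demo (not taken from the package).
A = [[4.0, 1.0], [2.0, 3.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
eigs = generalized_eigenvalues_2x2(A, B)
for lam in eigs:
    print(lam, abs(residual(A, B, lam)))
```

For larger or ill-conditioned pairs, forming the characteristic polynomial is numerically poor, which is one reason library routines work through a generalized Schur (QZ) form instead.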

cubfits - Codon Usage Bias Fits

Estimating mutation and selection coefficients on synonymous codon usage bias based on models of ribosome overhead cost (ROC). Multinomial logistic regression and Markov Chain Monte Carlo are used to estimate and predict protein production rates with/without the presence of expressions and measurement errors. Workflows with examples for simulation, estimation and prediction processes are also provided with parallelization speedup. The whole framework is tested with the yeast genome and gene expression data of Yassour, et al. (2009) <doi:10.1073/pnas.0812841106>.

Last updated 3 years ago

4.83 score 7 stars 32 scripts 730 downloads

pbdSLAP - Programming with Big Data -- Scalable Linear Algebra Packages

Utilizing scalable linear algebra packages mainly including 'BLACS', 'PBLAS', and 'ScaLAPACK' in double precision via 'pbdMPI' based on 'ScaLAPACK' version 2.0.2.

Last updated 3 months ago

fortran openmpi

4.48 score 4 scripts 725 downloads

MixSim - Simulating Data to Study Performance of Clustering Algorithms

The utility of this package is in simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of 'MixSim', there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.

Last updated 7 months ago

openblas

4.48 score 1 star 3 dependents 83 scripts 543 downloads
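The MixSim entry defines pairwise overlap as the sum of two misclassification probabilities. A minimal sketch of that definition, for the simplest possible case only: two univariate Gaussian components with equal standard deviation and equal mixing proportions, where the decision boundary is the midpoint between the means. The function names are hypothetical and this is not MixSim's code, which handles general multivariate mixtures.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap_equal_sd(mu1, mu2, sd):
    """Hypothetical helper: overlap of two 1-D Gaussian components.

    Assumes equal standard deviations and equal mixing proportions, so a point
    is misclassified exactly when it falls past the midpoint of the two means.
    """
    half_gap = abs(mu1 - mu2) / (2.0 * sd)
    # Probability that a draw from component 1 is assigned to component 2.
    omega_2_given_1 = std_normal_cdf(-half_gap)
    # Probability that a draw from component 2 is assigned to component 1.
    omega_1_given_2 = std_normal_cdf(-half_gap)
    return omega_2_given_1 + omega_1_given_2


# Overlap shrinks as the means move apart (in units of the standard deviation).
for gap in (0.0, 1.0, 2.0, 4.0):
    print(gap, pairwise_overlap_equal_sd(0.0, gap, 1.0))
```

In this simplified setting the overlap is 1 when the components coincide and decreases toward 0 as they separate, which is the sense in which it controls clustering difficulty.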

MixfMRI - Mixture fMRI Clustering Analysis

Utilizing model-based clustering (unsupervised) for functional magnetic resonance imaging (fMRI) data. The developed methods (Chen and Maitra (2023) <doi:10.1002/hbm.26425>) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI data, where p-values indicate the significance level of activation in response to the stimulus of interest. The analyses mainly identify active voxels/signals associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in 'inst/workflow/') are also implemented with high performance techniques.

Last updated 4 months ago

4.26 score 2 stars 18 scripts 238 downloads

pmclust - Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs 'pbdMPI' to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation follows the single program multiple data programming model. The code can be executed through 'pbdMPI' and 'MPI' implementations such as 'OpenMPI' and 'MPICH'. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.

Last updated 1 year ago

3.70 score 5 stars 4 scripts 149 downloads
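One plausible reading of "expectation-gathering-maximization" in the pmclust entry is: each process runs the E-step on its own share of the data, the per-process summaries are combined across processes, and every process then performs the same M-step. The sketch below simulates that pattern in a single Python process, with list chunks standing in for MPI ranks and a plain sum standing in for a collective reduction. It is an assumption-laden illustration, not pmclust's algorithm or API: all function names are hypothetical, the model is univariate, and no MPI is involved.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def local_e_step(chunk, weights, means, variances):
    """Expectation on one rank's data: per-component sums of r, r*x, r*x^2."""
    k = len(means)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in chunk:
        dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def gather(per_rank_stats):
    """Gathering: element-wise sum over ranks (stand-in for an MPI reduction)."""
    k = len(per_rank_stats[0])
    return [[sum(s[j][m] for s in per_rank_stats) for m in range(3)] for j in range(k)]


def m_step(stats):
    """Maximization from the combined sufficient statistics."""
    n = sum(s[0] for s in stats)
    weights = [s[0] / n for s in stats]
    means = [s[1] / s[0] for s in stats]
    variances = [s[2] / s[0] - (s[1] / s[0]) ** 2 for s in stats]
    return weights, means, variances


def egm_iteration(chunks, weights, means, variances):
    """One simulated iteration: local E-steps, gather, shared M-step."""
    per_rank = [local_e_step(c, weights, means, variances) for c in chunks]
    return m_step(gather(per_rank))


# Toy data (made up for this demo), split across four simulated ranks.
data = [0.1, -0.4, 0.3, 5.8, 6.2, 6.1, -0.2, 5.9]
chunks = [data[0:2], data[2:4], data[4:6], data[6:8]]
print(egm_iteration(chunks, [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]))
```

Because only small per-component sums cross rank boundaries, the update matches what a single process would compute on the full data (up to floating-point rounding) while the data themselves stay distributed.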