CIPRES

ParallelStructure

ParallelStructure is an R package that runs the program Structure (Pritchard et al., 2000) in parallel. Structure is a tool for inferring population structure from genetic data. ParallelStructure uses R to distribute Structure runs across the cores of a multi-core processor.

Detailed information about ParallelStructure was provided by the author here.

Manual for ParallelStructure here.

Manual for Structure.

Structure Google Group

Input files: ParallelStructure accepts two input files. The first contains the data set and uses the same format as a Structure input file. The second is a joblist file, which lists the tasks to be performed; each line corresponds to one job. Whereas Structure requires a separate data file for each set of populations, ParallelStructure lets the user work from a single large input file containing every population that might need to be analyzed. Each job in the joblist file can include all of the populations or only a subset, which avoids preparing a different input file for each subset. For each job, the user defines the set of populations to include and specifies the Structure parameter K, the burn-in length, and the number of iterations.

The joblist format is shown below. Each line gives the job designator first (T1, T2, etc.), then the populations to be analyzed, separated by commas (1,2,3,4 in line 1), then the K value (3 in line 1), the burn-in (1000 in line 1), and the total number of iterations (10000 in line 1). If all populations in the data must be analyzed pairwise (all vs. all), the list of populations for that job can be replaced by "pairwise.matrix" (see job T11 in the example joblist).

T1 1,2,3,4 3 1000 10000
T2 1,2,3,4 4 1000 10000
T3 2,3,4,5 3 1000 10000
T4 2,3,4,5 4 1000 10000
T5 3,4,5 3 1000 10000
T6 3,4,5,6 4 1000 10000
T7 3,4,5,6 3 1000 10000
T8 4,5,6,7 3 1000 10000
T9 4,5,6,7 4 1000 10000
T10 5,6,7,8 3 1000 10000
T11 pairwise.matrix 2 1000 10000
T12 1,2,3,4,5,6,7,8 2 1000 10000
T13 1,2,3,4,5,6,7,8 3 1000 10000
T14 1,2,3,4,5,6,7,8 4 1000 10000
T15 1,2,3,4,5,6,7,8 5 1000 10000
T16 1,2,3,4,5,6,7,8 6 1000 10000
T17 1,2,3,4,5,6,7,8 7 1000 10000
T18 1,2,3,4,5,6,7,8 8 1000 10000
T19 3,4,5,6,7,8 3 1000 10000
T20 3,4,5,6,7,8 4 1000 10000
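
The five-field layout described above can be checked or generated with a short script before a job is submitted. The following is a minimal Python sketch, not part of ParallelStructure itself: the names `Job`, `parse_job`, `format_job`, `k_sweep`, and `all_pairs` are hypothetical helpers, and the sketch assumes that fields are separated by whitespace and that population IDs are integers, as in the example joblist. The `all_pairs` helper only illustrates what "all vs. all" means for a list of populations; it does not claim to reproduce how ParallelStructure expands "pairwise.matrix" internally.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

PAIRWISE = "pairwise.matrix"  # keyword used in the joblist for all-vs-all jobs


@dataclass
class Job:
    """One joblist line (hypothetical helper type, not a ParallelStructure API)."""
    job_id: str
    populations: Optional[List[int]]  # None stands for pairwise.matrix
    k: int
    burnin: int
    iterations: int


def parse_job(line: str) -> Job:
    """Parse one joblist line: ID, populations, K, burn-in, iterations."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    job_id, pops, k, burnin, iterations = fields
    # assumption: population IDs are integers, as in the example joblist
    populations = None if pops == PAIRWISE else [int(p) for p in pops.split(",")]
    return Job(job_id, populations, int(k), int(burnin), int(iterations))


def format_job(job: Job) -> str:
    """Write a Job back out as a joblist line."""
    if job.populations is None:
        pops = PAIRWISE
    else:
        pops = ",".join(str(p) for p in job.populations)
    return f"{job.job_id} {pops} {job.k} {job.burnin} {job.iterations}"


def k_sweep(populations: Sequence[int], k_values: Sequence[int],
            burnin: int, iterations: int, start: int = 1) -> List[str]:
    """Build one joblist line per K value for a fixed set of populations."""
    return [
        format_job(Job(f"T{start + i}", list(populations), k, burnin, iterations))
        for i, k in enumerate(k_values)
    ]


def all_pairs(populations: Sequence[int]) -> List[Tuple[int, int]]:
    """List every unordered pair of populations (illustration of all vs. all)."""
    return list(combinations(sorted(populations), 2))


if __name__ == "__main__":
    # Print joblist lines that test K = 2, 3, 4 on populations 1 to 4.
    for line in k_sweep([1, 2, 3, 4], [2, 3, 4], burnin=1000, iterations=10000):
        print(line)
```

Parsing each line before upload catches a missing field or a non-numeric K early, and `k_sweep` avoids hand-typing one line per K value when the same populations are tested across a range of K.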

Output files: ParallelStructure returns several sets of files. First, a set of PDF files, one for each analysis specified in the joblist file. Second, a set of results files, one pair for each analysis in the joblist file. If the parameters printqhat=1 or plot_output=1 are set, q files and graphs in PDF format are produced. ParallelStructure also writes one .csv file, called "results summary", to the working directory. This file contains a table that summarises, for each job listed in the joblist file, the main job parameters (job ID, K, number of iterations, and burn-in) and the result summary statistics (log likelihood of the data, mean and variance of the log likelihood, and mean value of alpha).
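
Because the results summary is a plain .csv table with one row per job, it can be inspected with a few lines of Python. The sketch below is an illustration only: it assumes the file has a header row, and the column names and values used in the demonstration are invented placeholders, so check the header of your own results summary file and pass the actual column name. The helpers `load_summary` and `rank_jobs` are hypothetical names.

```python
import csv
import io


def load_summary(csv_file):
    """Read a results summary table into a list of dicts, one per job.

    Assumption: the first row of the file holds the column names.
    """
    return list(csv.DictReader(csv_file))


def rank_jobs(rows, column):
    """Sort jobs by a numeric column, largest value first."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=True)


if __name__ == "__main__":
    # Placeholder table: the header names and numbers are invented for this demo
    # and are NOT taken from a real ParallelStructure run.
    demo = io.StringIO(
        "job,k,mean_lnl\n"
        "T12,2,-1200.5\n"
        "T13,3,-1100.0\n"
        "T14,4,-1150.0\n"
    )
    for row in rank_jobs(load_summary(demo), "mean_lnl"):
        print(row["job"], row["k"], row["mean_lnl"])
```

With a real file, replace the in-memory table with `open("pstructure_results_summary.csv", newline="")` and pass the name of the column you want to rank by. Ranking is only meaningful among jobs that analyze the same set of populations.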

The tables below show sample input files and the kinds of results returned by the CIPRES Science Gateway:

Input File Type	Sample File from a Test
data file	pstructure_example_data.txt
joblist file	pstructure_joblist1.txt

Sample Output File Type	File Name
PDF plot file	pstructure_job_T1.pdf
results file	pstructure_results_job_T1_f
q file	pstructure_results_job_T1_q
results summary file	pstructure_results_summary.csv

If you use ParallelStructure here, please cite:

Besnier, F., and Glover, K. A. (2013) ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers. PLoS ONE 8, e70651

Pritchard, J. K., Stephens, M., and Donnelly, P. (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics 155, 945

If there is a tool or a feature you need, please let us know.
