ParallelStructure is an R package that distributes parallel runs of the program Structure (Pritchard et al., 2000), a tool for the genetic analysis of population structure. ParallelStructure uses R to take advantage of multi-core processors by running independent Structure jobs at the same time.
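Because each Structure job is independent of the others, the jobs can be handed to a pool of workers and run concurrently. The Python sketch below shows only that general dispatch pattern. It is not ParallelStructure's own R code. `run_structure_job` is a hypothetical placeholder that returns a status string instead of launching the Structure executable.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def run_structure_job(job_id: str) -> str:
    """Hypothetical placeholder for one independent Structure run.

    A real version would launch the Structure executable for this job
    (for example with subprocess.run); here it only returns a status
    string so the dispatch pattern can be shown on its own.
    """
    return f"{job_id}: done"


def run_all(job_ids, n_workers=None):
    # Jobs share no state, so they can be distributed across workers.
    # Threads are enough when each worker just waits on an external process.
    n_workers = n_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in the same order as job_ids
        return list(pool.map(run_structure_job, job_ids))


if __name__ == "__main__":
    print(run_all(["T1", "T2", "T3"], n_workers=2))
```

Threads are used here on the assumption that each worker would mostly wait on a separate Structure process. If the work ran in Python itself, a process pool would be the more natural choice.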
Detailed information about ParallelStructure was provided by the author here.
Manual for ParallelStructure here.
Input files: ParallelStructure takes two input files. The first contains the data set and uses the same format as a Structure input file. The second is a joblist file that lists the tasks to be performed, one job per line, and identifies which populations in the data set each job analyzes. Whereas STRUCTURE requires a separate input file for each set of populations to be analyzed, ParallelStructure lets the user work from a single large input file containing every population that might need to be analyzed. A job in the joblist can include all of the populations or only a subset of them, which avoids creating a separate input file for each population subset. For each job, the user defines the set of populations to include and specifies the STRUCTURE parameters K, burnin, and number of iterations in the joblist file.
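Working from one large file means each job only needs the rows for its own populations. The sketch below illustrates that selection step. It assumes, hypothetically, a whitespace-separated Structure-format file with no header row, where column 1 is the individual label and column 2 is the integer population ID. The sample rows are made-up toy values, and the function name `subset_by_population` is illustrative only.

```python
def subset_by_population(lines, populations):
    """Keep only the rows whose population ID is in `populations`.

    Assumed layout (hypothetical): column 1 = individual label,
    column 2 = integer population ID, remaining columns = genotypes.
    """
    wanted = {int(p) for p in populations}
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if int(fields[1]) in wanted:
            kept.append(line)
    return kept


# Toy rows for illustration only (not real data)
data = [
    "ind1 1 120 122",
    "ind2 2 120 120",
    "ind3 5 118 122",
    "ind4 3 122 122",
]
print(subset_by_population(data, [1, 2, 3, 4]))
```

Filtering row by row also works when each individual spans two rows, provided that every row carries the population column.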
The joblist format is shown below. The job designator comes first (T1, T2, etc.), followed by the populations to be analyzed, separated by commas (1,2,3,4 in line 1), then the K value (3 in line 1), the burnin (1000 in line 1), and the total number of iterations (10000 in line 1). If all populations in the data must be analyzed pairwise (all vs. all), the list of populations for a job can be replaced by "pairwise.matrix" (see job T11 in the example joblist).
T1 1,2,3,4 3 1000 10000
T2 1,2,3,4 4 1000 10000
T3 2,3,4,5 3 1000 10000
T4 2,3,4,5 4 1000 10000
T5 3,4,5 3 1000 10000
T6 3,4,5,6 4 1000 10000
T7 3,4,5,6 3 1000 10000
T8 4,5,6,7 3 1000 10000
T9 4,5,6,7 4 1000 10000
T10 5,6,7,8 3 1000 10000
T11 pairwise.matrix 2 1000 10000
T12 1,2,3,4,5,6,7,8 2 1000 10000
T13 1,2,3,4,5,6,7,8 3 1000 10000
T14 1,2,3,4,5,6,7,8 4 1000 10000
T15 1,2,3,4,5,6,7,8 5 1000 10000
T16 1,2,3,4,5,6,7,8 6 1000 10000
T17 1,2,3,4,5,6,7,8 7 1000 10000
T18 1,2,3,4,5,6,7,8 8 1000 10000
T19 3,4,5,6,7,8 3 1000 10000
T20 3,4,5,6,7,8 4 1000 10000
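Each joblist line has five whitespace-separated fields. The sketch below parses one line into those fields and expands "pairwise.matrix" into every pair of populations. The `Job` class and both function names are hypothetical, and it is assumed that the caller supplies the full list of populations in the data set, because a "pairwise.matrix" line does not list them.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Union


@dataclass
class Job:
    job_id: str
    populations: Union[List[int], str]  # list of IDs, or "pairwise.matrix"
    k: int
    burnin: int
    iterations: int


def parse_joblist_line(line: str) -> Job:
    # Fields: job ID, populations, K, burnin, iterations
    job_id, pops, k, burnin, iterations = line.split()
    if pops == "pairwise.matrix":
        populations = pops
    else:
        populations = [int(p) for p in pops.split(",")]
    return Job(job_id, populations, int(k), int(burnin), int(iterations))


def pairwise_subsets(all_populations):
    # "pairwise.matrix" means every pair of populations (all vs. all)
    return [list(pair) for pair in combinations(sorted(all_populations), 2)]


job = parse_joblist_line("T1 1,2,3,4 3 1000 10000")
print(job)
print(pairwise_subsets([1, 2, 3]))
```

With the eight populations used in the example joblist, the T11 line would expand to 28 two-population analyses.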
Output files: ParallelStructure returns two sets of files. The first is a set of PDF files, one for each analysis specified in the joblist file. The second is a set of results files, a pair for each analysis in joblist.txt. The q files are produced when printqhat=1, and the graphs (in .pdf format) when plot_output=1. ParallelStructure also writes one .csv file, "results summary", to the working directory. For each job listed in the joblist file, this table summarises the main job parameters (job ID, K, burnin, and number of iterations) and the summary statistics of the run: the log likelihood of the data, the mean and variance of the log likelihood, and the mean value of alpha.
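The log likelihood of the data and the mean and variance of the log likelihood are related. Pritchard et al. (2000) approximate ln Pr(X|K) as the mean of the sampled ln likelihood values minus half their variance. The sketch below applies that approximation to a list of values. The function name is hypothetical, the input values are toy numbers, and the sample variance (n - 1 denominator) is an assumption, not necessarily the estimator STRUCTURE uses internally.

```python
import statistics


def estimated_ln_prob(ln_likelihoods):
    """Return (mean, variance, estimated ln Pr(X|K)).

    Approximation: ln Pr(X|K) ~ mean(lnL) - var(lnL) / 2.
    Using the sample variance (n - 1 denominator) is an assumption here.
    """
    mean = statistics.fmean(ln_likelihoods)
    var = statistics.variance(ln_likelihoods)
    return mean, var, mean - var / 2


print(estimated_ln_prob([-100.0, -102.0, -104.0]))
```

The variance penalty means that, among runs with similar mean ln likelihood, a run whose likelihood fluctuates more receives a lower estimate.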
The table below shows the kinds of files used and returned by the CIPRES Science Gateway:
Input File Type | Sample File from a Test |
Data file (Structure format) | pstructure_example_data.txt |
Joblist file | pstructure_joblist1.txt |
Sample Output File Type | File Name |
PDF plot | pstructure_job_T1.pdf |
Results file (_f) | pstructure_results_job_T1_f |
Q file (_q) | pstructure_results_job_T1_q |
Results summary file | pstructure_results_summary.csv |
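When many jobs run at once, it helps to check which outputs are missing. The sketch below infers a naming pattern from the sample CIPRES file names for job T1 and lists the files that would be expected for a job. The pattern, the `pstructure_` prefix for other jobs, and the function names are assumptions, so check them against a real run.

```python
def expected_outputs(job_id, prefix="pstructure_", printqhat=True, plot_output=True):
    """List the output files one job is expected to produce.

    Naming pattern inferred (assumption) from the sample files, e.g.
    pstructure_results_job_T1_f and pstructure_job_T1.pdf.
    """
    files = [f"{prefix}results_job_{job_id}_f"]
    if printqhat:
        files.append(f"{prefix}results_job_{job_id}_q")
    if plot_output:
        files.append(f"{prefix}job_{job_id}.pdf")
    return files


def missing_outputs(present, job_ids, **options):
    # `present` is any iterable of file names, e.g. os.listdir(results_dir)
    present = set(present)
    return [
        name
        for job_id in job_ids
        for name in expected_outputs(job_id, **options)
        if name not in present
    ]


print(missing_outputs(["pstructure_results_job_T1_f"], ["T1"]))
```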
If you use ParallelStructure here, please cite:
Besnier, F., and Glover, K. A. (2013) ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers. PLoS ONE 8, e70651
Pritchard, J. K., Stephens, M., and Donnelly, P. (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics 155, 945-959