ParallelStructure

ParallelStructure is an R package that distributes runs of the program Structure (Pritchard et al., 2000) across the cores of a multi-core processor. Structure is a tool for genetic analysis of population structure.

Detailed information about ParallelStructure was provided by the author here.

Manual for ParallelStructure here.

Manual for Structure.

Structure Google Group

Input files: ParallelStructure accepts two input files. The first contains the data set, in the same format as a Structure input file. The second is a joblist file, which lists the tasks to perform: each line corresponds to one job and identifies the populations in the data set to be analyzed. Whereas Structure requires a separate data set for each population subset, ParallelStructure lets the user work from one large input file containing all the populations that might need to be analyzed. Each job in the joblist file can include all populations or only a subset, which avoids making a different input file for each subset. For each job, the user specifies the populations to include, the Structure parameter K, the burn-in, and the number of iterations in the joblist file.
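As a rough illustration of how one data set can serve many jobs, the sketch below writes joblist lines for several K values over the same population subset. It assumes the whitespace-separated layout described in the next paragraph (job ID, populations, K, burn-in, iterations). The function name k_sweep_jobs and the T<number> job IDs are illustrative choices, not part of ParallelStructure.

```python
def k_sweep_jobs(populations, k_values, burnin, iterations, start_index=1):
    """Build joblist lines: one job per K value for the same population subset.

    Hypothetical helper; the line layout follows the joblist example
    in this document (job ID, populations, K, burn-in, iterations).
    """
    pops = ",".join(str(p) for p in populations)
    lines = []
    for offset, k in enumerate(k_values):
        job_id = f"T{start_index + offset}"  # job IDs numbered T1, T2, ...
        lines.append(f"{job_id} {pops} {k} {burnin} {iterations}")
    return lines


if __name__ == "__main__":
    # Populations 1-4, K = 3 and 4, burn-in 1000, 10000 iterations.
    for line in k_sweep_jobs([1, 2, 3, 4], [3, 4], 1000, 10000):
        print(line)
```

The printed lines can be saved to a text file and used as the joblist input.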

The joblist format is shown below. The job designator comes first (T1, T2, etc.), then the populations to be analyzed, separated by commas (1,2,3,4 in line 1), then the K value (3 in line 1), the burn-in (1000 in line 1), and the total iterations (10000 in line 1). If all populations in the data must be analyzed pairwise (all vs. all), the list of populations for the given job can be replaced by "pairwise.matrix" (see job T11 in the example joblist).
 

T1 1,2,3,4 3 1000 10000
T2 1,2,3,4 4 1000 10000
T3 2,3,4,5 3 1000 10000
T4 2,3,4,5 4 1000 10000
T5 3,4,5 3 1000 10000
T6 3,4,5,6 4 1000 10000
T7 3,4,5,6 3 1000 10000
T8 4,5,6,7 3 1000 10000
T9 4,5,6,7 4 1000 10000
T10 5,6,7,8 3 1000 10000
T11 pairwise.matrix 2 1000 10000
T12 1,2,3,4,5,6,7,8 2 1000 10000
T13 1,2,3,4,5,6,7,8 3 1000 10000
T14 1,2,3,4,5,6,7,8 4 1000 10000
T15 1,2,3,4,5,6,7,8 5 1000 10000
T16 1,2,3,4,5,6,7,8 6 1000 10000
T17 1,2,3,4,5,6,7,8 7 1000 10000
T18 1,2,3,4,5,6,7,8 8 1000 10000
T19 3,4,5,6,7,8 3 1000 10000
T20 3,4,5,6,7,8 4 1000 10000
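The joblist layout above can be checked before submission with a small parser. This is a minimal sketch rather than anything ParallelStructure itself provides: the names Job, parse_job, and population_sets are hypothetical, and the expansion of "pairwise.matrix" into one subset per unordered pair of populations is an assumption based on the "all vs. all" description.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Job:
    job_id: str
    populations: str  # raw field, e.g. "1,2,3,4" or "pairwise.matrix"
    k: int
    burnin: int
    iterations: int


def parse_job(line):
    """Split one joblist line into its five whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    job_id, pops, k, burnin, iterations = fields
    return Job(job_id, pops, int(k), int(burnin), int(iterations))


def population_sets(job, all_populations):
    """Return the population subsets a job covers.

    Assumption: "pairwise.matrix" means one subset per unordered pair
    of populations in the data set.
    """
    if job.populations == "pairwise.matrix":
        return [list(pair) for pair in combinations(sorted(all_populations), 2)]
    return [[int(p) for p in job.populations.split(",")]]


if __name__ == "__main__":
    job = parse_job("T11 pairwise.matrix 2 1000 10000")
    # With 8 populations, an all-vs-all job covers 8 * 7 / 2 pairs.
    print(len(population_sets(job, range(1, 9))))
```

Under the pairwise assumption, a data set with n populations yields n(n-1)/2 subsets for a single "pairwise.matrix" job, so such jobs can take much longer than the others.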

Output files: ParallelStructure returns a set of files: first, a set of PDF files, one for each analysis specified in the joblist file; second, a set of results files, a pair for each analysis in joblist.txt. If the parameters printqhat=1 or plot_output=1 are set, q files and graphs in .pdf format are produced. ParallelStructure also produces one .csv file called "results summary" in the working directory. This file contains a table that summarises, for each job listed in the joblist file, the main job parameters (job ID, K, number of iterations, and burn-in) and the result summary statistics (log likelihood of the data, mean and variance of the log likelihood, and mean value of alpha).

The table below shows the kinds of results returned by CIPRES Science Gateway:

Input File Type            Sample File from a Test
data file                  pstructure_example_data.txt
joblist file               pstructure_joblist1.txt

Sample Output File Type    File Name
PDF plot                   pstructure_job_T1.pdf
results file               pstructure_results_job_T1_f
q file                     pstructure_results_job_T1_q
results summary file       pstructure_results_summary.csv
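To check that a run returned everything, the sample file names above can be turned into a list of expected outputs per job. The sketch below infers the naming pattern from the single T1 sample in the table, so both the pattern and the "pstructure" prefix are assumptions. The function expected_outputs is hypothetical and not part of CIPRES or ParallelStructure.

```python
def expected_outputs(job_ids, prefix="pstructure"):
    """Map each job ID to the output file names suggested by the sample table.

    Assumption: every job follows the naming pattern of the T1 sample
    (PDF plot, _f results file, _q file), with one shared summary file.
    """
    per_job = {}
    for job_id in job_ids:
        per_job[job_id] = [
            f"{prefix}_job_{job_id}.pdf",
            f"{prefix}_results_job_{job_id}_f",
            f"{prefix}_results_job_{job_id}_q",
        ]
    summary = f"{prefix}_results_summary.csv"
    return per_job, summary


if __name__ == "__main__":
    per_job, summary = expected_outputs(["T1", "T2"])
    for job_id, names in per_job.items():
        print(job_id, names)
    print(summary)
```

Comparing this list against the files actually returned is a quick way to spot jobs that failed or were skipped.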
   

 

If you use ParallelStructure here, please cite:

Besnier, F., and Glover, K. A. (2013) ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers. PLoS ONE 8, e70651.

Pritchard, J. K., Stephens, M., and Donnelly, P. (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics 155, 945-959.

If there is a tool or a feature you need, please let us know.
