Crimap Instructions

last update: 16/10/2013

# Background #

The original [Crimap](http://compgen.rutgers.edu/old/multimap/crimap/) is out of date. It was first written when allozymes were the prevalent molecular markers. Consequently, it has problems dealing with the large pedigrees and large numbers of markers that are available for analysis today.

To deal with this we use the Crigen suite of programs, which prepares text files that let Crimap handle today's larger and more complicated datasets more efficiently.

An [improved version](http://www.genome.iastate.edu/tools/share/crimap/) of Crimap (2.504), produced by Jill Maddox and Ian Evans, is generally better than the old one. However, the Crigen suite of programs is not compatible with the improved Crimap.

Consequently, analysis starts with Crigen, then the old Crimap, then continues with Crigen (Autogroup) to produce chromosome-specific files, and then moves to the new Crimap which is better for the analysis.

[Michael D Grosz](http://www.aaccnet.org/membership/people/Pages/corpdetail.aspx?MID=102258) and Monsanto have been providing the Crigen code to researchers to serve the community, and have kindly granted permission to distribute the Crigen programs through this site.

[Jill Maddox](mailto:[email protected]) and Ian Evans, who are still developing the improved Crimap, provide it for free through the animal genome website. Downloading requires filling in a short form with information on how the program will be used. It is important to provide as much information as possible. This helps the developers make sure the program works properly with current experimental designs, and it helps with funding, which is easier to obtain if a large user base can be demonstrated.

It is best to compile Crimap on the particular machine or OS used, to make sure it uses the available memory efficiently. For example, a pre-compiled version may crash when it runs out of memory, while a version compiled on the same machine works fine. In practice, a pre-compiled version should work well enough. If it crashes, compile your own version, which may not crash.

In practice I start the analysis with Crimap 2.4 and Crigen on Ubuntu Linux, running in a virtual machine on MacOSX, because I could not compile the old programs on a modern machine. I then continue with Crimap 2.5, which can be compiled on MacOSX (see Installation Instructions).

All commands have more options than are shown here, but the following steps can be used as the backbone of the analysis. The full options are described in the three manuals that come with Crimap.

# Workflow #

## Prepare a .gen file ##

The general format of a .gen file is shown in the table.

ANIMAL_ID SIRE_ID DAM_ID Marker1 Marker2 Marker3
1 0 0 A C A T A A
2 0 0 A C A G A C
3 1 2 A C A G A C

Individuals must be named with numbers.
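A quick pre-check can catch non-numeric IDs before Crigen reads the file. The sketch below is not part of Crimap or Crigen: `check_numeric_ids` is a hypothetical helper, and it assumes a whitespace-separated file with one header line and the three ID columns first, as in the example above.

```shell
# Hypothetical helper: report any non-numeric value in the first three
# columns (animal, sire, dam) of a .gen-style table read from stdin.
# Assumes the first line is a header and is skipped.
check_numeric_ids() {
    awk 'NR > 1 {
        for (c = 1; c <= 3; c++)
            if ($c !~ /^[0-9]+$/) {
                printf "line %d, column %d: non-numeric ID \"%s\"\n", NR, c, $c
                bad = 1
            }
    }
    END { exit bad + 0 }'   # exit status 1 if any problem was found
}
```

It can be run as `check_numeric_ids < myData.gen`. No output and an exit status of 0 mean that every ID is numeric.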

Sex-linked markers will appear homozygous in one sex (for example in males in *Drosophila*). Edit their genotypes so they are heterozygous, with the second allele being one that is not present in females. For example, a locus that is scored as either A or T, and in males is either AA or TT, should be changed in males to AC or TC. Alternatively, the non-existent allele can be set to `-1`.
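For many individuals this recoding is tedious by hand. The following is a minimal sketch of how it could be scripted, not a tool that ships with Crimap. The function name `recode_sex_linked` is hypothetical, and it assumes a whitespace-separated table with one header line, three ID columns followed by two allele columns per marker, missing alleles coded `0`, and male IDs supplied as a comma-separated list.

```shell
# Hypothetical helper: make males look heterozygous at one sex-linked marker.
#   $1 = comma-separated male IDs (assumption: sex is known separately)
#   $2 = 1-based index of the sex-linked marker
#   $3 = dummy allele not present in females (for example C or -1)
# Reads the .gen-style table on stdin and writes the recoded table to stdout.
recode_sex_linked() {
    awk -v males="$1" -v k="$2" -v dummy="$3" '
        BEGIN {
            n = split(males, ids, ",")
            for (i = 1; i <= n; i++) male[ids[i]] = 1
        }
        NR == 1 { print; next }            # keep the header line unchanged
        {
            a = 2 + 2 * k; b = a + 1       # columns holding the two alleles
            # only homozygous, non-missing male genotypes are changed
            if (($1 in male) && $a == $b && $a != "0") $b = dummy
            print
        }'
}
```

If the male IDs are kept one per line in a file (here a hypothetical `males.txt`), a call could look like `recode_sex_linked "$(paste -s -d, males.txt)" 3 C < myData.gen > myData.recoded.gen`. Modified lines are written back with single spaces between fields.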

Full instructions on how to prepare a `.gen` file from GenomeStudio are in my [GenomeStudio workflow](http://parisveltsos.com/research/genomestudio.html#output).

## Generate a Twopoint file for Autogroup ##

Separate the pedigree into families of about 50 individuals for 3 generations. These files are easier for Crimap to work with.

./crigen -g myData.gen -o 10 -size 50 -gen 3

./crimap 10 prepare

At the prompts, answer no, no, no, then choose option 7 (twopoint) and answer yes, yes.

Note that segmentation errors can be reported during the 'prepare' command. I found that they were resolved by choosing Linux (LF) line endings for the `.gen` file. This is the default way text files are saved from BBEdit or TextMate.

![Save as window in TextMate. Choose 'LF (recommended)' for line encoding.][figLineEndingEncoding]

Alternatively, it is possible to convert old Mac (CR) line endings to Linux (LF) format in the terminal using

tr '\r' '\n' < in.txt > out.txt
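Files saved on Windows end each line with CR followed by LF, and replacing every CR with LF would then leave an empty line after each record. As a hedged alternative, the small filter below handles both cases. `to_lf` is a hypothetical name, and the approach is just one way of doing it with standard `awk`.

```shell
# Hypothetical helper: normalise CRLF (Windows) and CR (old Mac) line
# endings to LF. Reads stdin, writes stdout. LF-only input is unchanged.
to_lf() {
    awk '{
        sub(/\r$/, "")      # CRLF: drop the CR left at the end of the line
        gsub(/\r/, "\n")    # CR-only: remaining CRs become line breaks
        print
    }'
}
```

A call could look like `to_lf < in.txt > out.txt`.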

Next, run twopoint:

./crimap 10 twopoint > chr10.tpt

The output of this command differs between the new and old Crimap versions. The old version provides the sex-averaged lod score, while the new version provides the sex-specific lod scores. Up to this point, use the old version, because the Autogroup command (below) expects its input in the old format.

`chr10.tpt` shows all two-way linkages between markers. The lod score at the end of each line gives a feel for how tightly the two markers are linked. The file can also be consulted to find markers that are not linked, so that they can be assigned to different chromosomes.
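With many markers the file is long, so it can help to keep only the strongly linked pairs. The filter below is a rough sketch that relies only on the lod score being the last whitespace-separated field of a line, as in the old Crimap output described above. The exact column layout of the file is not assumed, `filter_by_lod` is a hypothetical name, and the threshold of 3 in the usage note is only an illustrative choice.

```shell
# Hypothetical helper: print the lines of a twopoint file whose last field
# is a number greater than or equal to the threshold given as $1.
# Lines whose last field is not numeric (headers, messages) are dropped.
filter_by_lod() {
    awk -v min="$1" '$NF ~ /^-?[0-9]+(\.[0-9]+)?$/ && $NF + 0 >= min + 0'
}
```

For example, `filter_by_lod 3 < chr10.tpt` would list the pairs with a lod score of at least 3.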

## Identify obvious errors

Investigate the individuals that cause many inheritance errors by cross-referencing to their scoring in GenomeStudio. If the errors cannot be reduced, mark the individuals as un-scored for the error-prone markers (manually in the genotyping file). If the same individual causes errors with many markers consider removing the individual from the analysis.

An easy way to identify pedigree errors from a genotyping file is to use [GenotypeChecker](http://bioinformatics.roslin.ac.uk/genotypechecker/) by the Roslin Institute. The `.gen` file that Crimap uses needs to be converted to work with GenotypeChecker.

**Update 131217** - A newer program, [VIPER](http://bioinformatics.roslin.ac.uk/viper/), has been developed by the Roslin Institute. It is a little different but gives more information about errors in a compact form. I have not tried it yet, but it is probably worth a look.

### Prepare files for Genotype Checker

The following instructions generate files for [Genotype Checker](http://bioinformatics.roslin.ac.uk/genotypechecker/resources.html) using R. They are adapted from a script provided by [Jon Slate](http://www.jon-slate.staff.shef.ac.uk). The paths assume you are working on a Mac.

The two input files and the associated R script are available [here](../resources/forAcademics/crimap/ped2GenotypeChecker.zip), and are also described in the text below.

#### Input files

We need two input files per chromosome:

A genotype file which looks like

Progeny Sire Dam Sex MarkerA.1 MarkerA.2 MarkerB.1 MarkerB.2
98 0 0 M A G A G
168 98 1 F G G G G
169 57 100 F A G A G
170 57 100 F G G A G
99 0 0 F A G A G

and a map file which looks like

marker position
MarkerA 0.000
MarkerB 43.987

#### In R

Read in the genotype file

genotype_file<-read.table("/Users/username/Desktop/genotype_file.txt", header=T)

Retain just the pedigree information

GenotypeCheckerPedFile <- genotype_file[,1:4]

Import map data and add a marker order column to them

example.map <- read.table('/Users/username/Desktop/exampleMap.txt', header=T, sep="\t")
maporder <- example.map

Store the number of markers in a variable

m <- (ncol(genotype_file)-4) * 0.5

Store marker order in a column

maporder$maporder <- seq (1,m,1)

Make the genotype file for genotype checker

GenotypeCheckerMarkerFile <- genotype_file[,c(1,5,6)]
GenotypeCheckerMarkerFile[,4] <- maporder[1,1]
GenotypeCheckerMarkerFile <- GenotypeCheckerMarkerFile [,c(1,4,2,3)]
colnames(GenotypeCheckerMarkerFile) <- c("ID","Locus","Allele1","Allele2")

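# seq_len(m)[-1] is empty when there is a single marker, so the loop is skipped (2:m would run backwards)
for (i in seq_len(m)[-1]) {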
GenotypeCheckerTemp <- genotype_file[,c(1,(2*i+3),(2*i+4))]
GenotypeCheckerTemp[,4] <- maporder[i,1]
GenotypeCheckerTemp <- GenotypeCheckerTemp[,c(1,4,2,3)]
colnames(GenotypeCheckerTemp) <- c("ID","Locus","Allele1","Allele2")
GenotypeCheckerMarkerFile <- rbind (GenotypeCheckerMarkerFile,GenotypeCheckerTemp)
}

Replace unknown genotypes (coded 0) with `?`, as required by GenotypeChecker

GenotypeCheckerMarkerFile$Allele1 <- gsub ("^0","?", GenotypeCheckerMarkerFile$Allele1)
GenotypeCheckerMarkerFile$Allele2 <- gsub ("^0","?", GenotypeCheckerMarkerFile$Allele2)

Export data

write.table (GenotypeCheckerMarkerFile, '/Users/username/Desktop/GenotypeCheckerMarkerFile', col.names=F, sep="\t",quote=F,row.names=F)
write.table (GenotypeCheckerPedFile, '/Users/username/Desktop/GenotypeCheckerPedFile', col.names=F, sep="\t",quote=F,row.names=F)
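Before loading the exported marker file into GenotypeChecker, it may be worth confirming its shape from the terminal. The check below is only a sketch based on the export above (tab-separated, no header, four columns: ID, locus, allele 1, allele 2). `check_marker_file` is a hypothetical name, and the check is not something GenotypeChecker itself provides.

```shell
# Hypothetical helper: verify that every line read from stdin has exactly
# four tab-separated fields (ID, Locus, Allele1, Allele2).
check_marker_file() {
    awk -F '\t' '
        NF != 4 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
        END { exit bad + 0 }   # exit status 1 if any line is malformed
    '
}
```

A call could look like `check_marker_file < ~/Desktop/GenotypeCheckerMarkerFile`. No output means that all lines have four fields.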

### Crimap pedigree error reporting ###

This section matters less once GenotypeChecker has been used, but I have kept it in case someone finds it helpful.

The crimap command above prints inheritance errors to the terminal window. If there are too many, they overflow the terminal's scrollback and cannot be recovered by scrolling up. The following steps capture them and turn them into a manageable text file, which can be used to identify particularly error-prone individuals. Such individuals may reflect pedigree errors or bad DNA preparations.

Run the `script` command in the Terminal. This records all Terminal text to a file, which is named `typescript` by default unless a file name such as `typescript.txt` is given as an argument. Run the commands as normal and then run `exit`. For example: