Synteny module

Provides functionality to construct a special synteny graph from a pangenome annotation (and the actual sequences).

readTransMap

 readTransMap (transMapFile, ATaccessionName='araport',
               similarityGroupColName='Orthogroup')

Reads a transmap file, which is essentially a TSV file. The first column contains the orthogroup (or similarity group) ID, and each remaining column corresponds to one accession and holds that accession's unique gene/interval IDs for the given orthogroup. If an accession has several genes in an orthogroup, they appear as a comma-separated list.
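To make the file layout concrete, here is a minimal sketch of parsing such a table with the standard library. It is not the module's implementation: the function name `parse_transmap_text`, the returned nested-dict shape, and the sample gene IDs are assumptions made for illustration only.

```python
import csv
import io


def parse_transmap_text(text, similarity_group_col="Orthogroup"):
    """Hypothetical helper: parse transmap-like TSV text into nested dicts.

    Returns {group_id: {accession: [gene_id, ...]}}. The real readTransMap
    may return a different structure.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        group_id = row[similarity_group_col]
        genes_by_accession = {}
        for accession, cell in row.items():
            if accession == similarity_group_col:
                continue
            # Several genes of one accession are stored as a comma-separated list.
            genes = [g.strip() for g in (cell or "").split(",") if g.strip()]
            genes_by_accession[accession] = genes
        table[group_id] = genes_by_accession
    return table


# Made-up example data: one orthogroup column and two accession columns.
example = (
    "Orthogroup\taraport\tTIAR10\n"
    "OG0000001\tAT1G01010\tg1,g2\n"
    "OG0000002\t\tg3\n"
)
transmap = parse_transmap_text(example)
```

An empty cell becomes an empty list here, which keeps lookups uniform for accessions that have no gene in a given orthogroup.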


generateOrder

 generateOrder (files, priorityAccession='TIAR10')

Generates a list of files and floats any file whose name contains the priority accession name (or any given string) to the top of the list.

It is no longer used, but is kept in case it is needed in the future.
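The "float to the top" behaviour can be sketched with a stable sort. This is an illustration under assumptions, not the code of generateOrder: the helper name `float_priority_first` and the file names are hypothetical, and the real function may order the remaining files differently.

```python
def float_priority_first(files, priority="TIAR10"):
    """Hypothetical helper: move files whose name contains `priority` to the front.

    sorted() is stable, so files keep their original relative order within
    the "priority" group and within the remaining group.
    """
    return sorted(files, key=lambda name: priority not in name)


# Made-up file names for illustration.
files = ["a_Col0.gff", "b_TIAR10.gff", "c_Ler.gff"]
ordered = float_priority_first(files)
```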


getIDs

 getIDs (iterator)

readGFF

 readGFF (gffFile:str)

Reads a GFF3 file into a dict structure. For large GFF files, random lookups are extremely slow, to the point of being unusable. Loading the file into a dict greatly speeds up the process (from hours to seconds).
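The idea of trading one pass over the file for constant-time lookups can be sketched as follows. This is a simplified illustration, not readGFF itself: the function name `index_gff3_lines`, the record layout, and the choice to key on the `ID` attribute are assumptions.

```python
def index_gff3_lines(lines):
    """Hypothetical helper: index GFF3 feature lines by their ID attribute.

    Returns {feature_id: record}. Features without an ID attribute are
    skipped in this sketch.
    """
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        feature_id = attributes.get("ID")
        if feature_id is None:
            continue
        index[feature_id] = {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        }
    return index


# Made-up GFF3 content for illustration.
gff_lines = [
    "##gff-version 3",
    "Chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc",
    "Chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=mrna1;Parent=gene1",
]
gff_index = index_gff3_lines(gff_lines)
```

After the single pass, each lookup such as `gff_index["mrna1"]` is a dict access rather than a scan of the file.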


addPangenomePositions

 addPangenomePositions (pangenomeFiles)

processAccessions

 processAccessions (annotationFiles, ATmap=None, pangenomeDict=None,
                    similarityIDKey=None, similarityIDAssignment=None,
                    pangenomeFiles=None, sequenceFilesDict=None,
                    seqidJoinSym='_', ATsplitSym=',')

This function processes only custom annotations for a group of accessions with similarity IDs. It also creates a separate ATmap and returns the last unmatchedID number for consistency. There is a separate function for processing the reference annotation.

sequenceFilesDict can be either None or a dict with accession IDs as keys and paths to FASTA files with sequences of annotated elements as values.

pangenomeFiles: list[str] or None. If a list of strings is provided, it must have the same length as annotationFiles and be in the same order, i.e. pangenomeFiles[i] should correspond to annotationFiles[i].

similarityIDAssignment: str. Can be either ‘gene’ or ‘mRNA’. Any other value raises an error.
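The constraints on pangenomeFiles and similarityIDAssignment described above can be sketched as a small pre-check. This is an assumption-laden illustration, not part of the module: the helper name `check_accession_arguments`, the use of `ValueError`, and the file names are hypothetical, and the error type raised by processAccessions is not stated here.

```python
def check_accession_arguments(annotation_files, pangenome_files=None,
                              similarity_id_assignment="gene"):
    """Hypothetical helper: validate arguments and pair files by position.

    Returns a list of (annotation_file, pangenome_file_or_None) tuples.
    """
    if similarity_id_assignment not in ("gene", "mRNA"):
        raise ValueError(
            "similarityIDAssignment must be 'gene' or 'mRNA', "
            f"got {similarity_id_assignment!r}"
        )
    if pangenome_files is None:
        return [(annotation, None) for annotation in annotation_files]
    if len(pangenome_files) != len(annotation_files):
        raise ValueError(
            "pangenomeFiles must have the same length as annotationFiles"
        )
    # pangenome_files[i] is assumed to correspond to annotation_files[i].
    return list(zip(annotation_files, pangenome_files))


# Made-up file names for illustration.
pairs = check_accession_arguments(
    ["acc1.gff", "acc2.gff"],
    ["acc1_pan.gff", "acc2_pan.gff"],
    similarity_id_assignment="mRNA",
)
```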


recordSegment

 recordSegment (name, segmentIDs, segmentIDToNumDict, sequence=None,
                gfaFile=None, segmentData=None)

recordAnnotation

 recordAnnotation (nodeID, accessionID, sequenceID, chrID, start, end, og,
                   atList, sequence, nodesMetadata, pstart=-1, pend=-1)

recordAltChr

 recordAltChr (nodeID, accessionID, chrID, start, end, nodesMetadata)

readSegmentIDs

 readSegmentIDs (path)

writeSegmentIDs

 writeSegmentIDs (path, segmentIDs)

writePath

 writePath (gfaFile, AccessionID, path, cigar, doCigars)