Synteny module
readTransMap
readTransMap (transMapFile, ATaccessionName='araport', similarityGroupColName='Orthogroup')
Read a transMap file. This is a TSV file whose first column holds the orthogroup (similarity group) ID and whose remaining columns each correspond to one accession, listing that accession's unique gene/interval IDs for the given orthogroup. If an accession has several genes in an orthogroup, they are given as a comma-separated list.
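The file layout described above can be parsed with the standard `csv` module. The sketch below is a hypothetical helper (`parse_trans_map`), not the module's `readTransMap`; it assumes a header row whose first column is named `Orthogroup` (the default of `similarityGroupColName`) and returns `{group_id: {accession: [gene_ids]}}`.

```python
import csv
import io


def parse_trans_map(handle, group_col="Orthogroup"):
    """Parse a transMap-style TSV into {group_id: {accession: [gene_ids]}}.

    Hypothetical helper for illustration; not the module's readTransMap.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    result = {}
    for row in reader:
        group_id = row.pop(group_col)
        genes_by_accession = {}
        for accession, cell in row.items():
            # Empty or missing cells mean the accession has no gene in this group
            if cell is None or not cell.strip():
                continue
            # Several genes in one group are stored as a comma-separated list
            genes_by_accession[accession] = [g.strip() for g in cell.split(",") if g.strip()]
        result[group_id] = genes_by_accession
    return result


example = (
    "Orthogroup\taraport\tTIAR10\n"
    "OG0000001\tAT1G01010\tg1,g2\n"
    "OG0000002\t\tg3\n"
)
trans_map = parse_trans_map(io.StringIO(example))
print(trans_map["OG0000001"])  # {'araport': ['AT1G01010'], 'TIAR10': ['g1', 'g2']}
```

Accessions without a gene in a group are simply left out of that group's inner dict rather than mapped to an empty list.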
generateOrder
generateOrder (files, priorityAccession='TIAR10')
Generate a list of files and float the file whose name contains a specific priority accession name (or any given string) to the top of the list.
This function is no longer used but is kept in case it is needed in the future.
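The "float to the top" behaviour can be sketched with a stable sort. This hypothetical `float_priority` assumes `files` is already a list of paths (it does not build the list itself) and keeps the remaining files in their original order.

```python
def float_priority(files, priority="TIAR10"):
    """Return files with those whose name contains `priority` moved to the front.

    Hypothetical re-implementation for illustration; the relative order of the
    other files is preserved because sorted() is stable.
    """
    # Key is False for matching files and False sorts before True
    return sorted(files, key=lambda f: priority not in f)


files = ["acc1.gff", "TIAR10.gff", "acc2.gff"]
print(float_priority(files))  # ['TIAR10.gff', 'acc1.gff', 'acc2.gff']
```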
getIDs
getIDs (iterator)
readGFF
readGFF (gffFile:str)
Read a GFF3 file into a dict structure. For large GFF files, random search is extremely slow and impractical; reading the file into a dict greatly speeds up the process (from hours to seconds).
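The speed-up comes from replacing repeated scans of the file with constant-time dict lookups. A minimal sketch, assuming features are keyed by their `ID` attribute (the actual layout of `readGFF`'s dict may differ; `read_gff_index` is a hypothetical name):

```python
import io


def read_gff_index(handle):
    """Index GFF3 features by their ID attribute for constant-time lookup.

    Hypothetical sketch; readGFF's actual dict layout may differ.
    """
    features = {}
    for line in handle:
        line = line.rstrip("\n")
        # Everything after ##FASTA is sequence data, not features
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments and directives such as ##gff-version 3
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # Column 9 holds key=value pairs separated by semicolons
        attributes = dict(
            item.strip().split("=", 1) for item in attrs.split(";") if "=" in item
        )
        feature_id = attributes.get("ID")
        if feature_id is None:
            continue  # features without an ID are not indexed in this sketch
        features[feature_id] = {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        }
    return features


gff = (
    "##gff-version 3\n"
    "Chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=ABC\n"
    "Chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=gene1.1;Parent=gene1\n"
)
idx = read_gff_index(io.StringIO(gff))
print(idx["gene1.1"]["attributes"]["Parent"])  # gene1
```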
addPangenomePositions
addPangenomePositions (pangenomeFiles)
processAccessions
processAccessions (annotationFiles, ATmap=None, pangenomeDict=None, similarityIDKey=None, similarityIDAssignment=None, pangenomeFiles=None, sequenceFilesDict=None, seqidJoinSym='_', ATsplitSym=',')
This function processes only custom annotations for a group of accessions with similarity IDs. It also creates a separate ATmap and returns the last unmatchedID number for consistency. Reference annotations are processed by a separate function.
sequenceFilesDict: None or dict. If a dict, its keys are accession IDs and its values are paths to FASTA files with the sequences of the annotated elements.
pangenomeFiles: list[str] or None. If a list of strings is provided, it must have the same length as annotationFiles and be in the same order, i.e. pangenomeFiles[i] corresponds to annotationFiles[i].
similarityIDAssignment: str. Either ‘gene’ or ‘mRNA’; any other value raises an error.
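The parameter constraints listed above can be expressed as upfront checks. This is a hypothetical helper (`check_process_args`); the exception type (`ValueError`) and messages are assumptions, since the documentation only says that "an error will be raised".

```python
def check_process_args(annotation_files, pangenome_files=None, similarity_id_assignment="gene"):
    """Validate arguments as described in the processAccessions docs.

    Hypothetical helper; the real function's checks and messages may differ.
    Returns (annotation file, pangenome file or None) pairs.
    """
    if similarity_id_assignment not in ("gene", "mRNA"):
        raise ValueError(
            f"similarityIDAssignment must be 'gene' or 'mRNA', got {similarity_id_assignment!r}"
        )
    if pangenome_files is None:
        # Without pangenome files every annotation file is paired with None
        return [(a, None) for a in annotation_files]
    if len(pangenome_files) != len(annotation_files):
        raise ValueError("pangenomeFiles must have one entry per annotation file")
    # pangenomeFiles[i] belongs to annotationFiles[i]
    return list(zip(annotation_files, pangenome_files))


print(check_process_args(["a.gff", "b.gff"], ["a.fa", "b.fa"]))
# [('a.gff', 'a.fa'), ('b.gff', 'b.fa')]
```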
recordSegment
recordSegment (name, segmentIDs, segmentIDToNumDict, sequence=None, gfaFile=None, segmentData=None)
recordAnnotation
recordAnnotation (nodeID, accessionID, sequenceID, chrID, start, end, og, atList, sequence, nodesMetadata, pstart=-1, pend=-1)
recordAltChr
recordAltChr (nodeID, accessionID, chrID, start, end, nodesMetadata)
addLink
addLink (links, prevPathSegment, name, forward)
links: mutable
prevPathSegment: mutable
generatePathsLinks
generatePathsLinks (genesAll, ATmap, accessionID, sequences, OGList, segmentIDs, nodesMetadata, segmentIDToNumDict, links, usCounter, chromosomeID=None, doUS=True, segmentData=None, gfaFile=None)
This function takes a list of genes in a specific format (genesAll) plus some extra data and generates a graph (a gene graph built from the annotations).
gfaFile: file handle to write segments to a GFA file
OGList: mutable
links: mutable
usCounter: mutable
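One simple way to build a gene graph of this kind is sketched below, under stated assumptions: each orthogroup is one node (segment), adjacent genes on the same chromosome give an edge (link), and gene orientation is ignored. The shared `segment_ids` dict and `links` set are mutated in place, mirroring the "mutable" arguments above, so several accessions can be added to one graph. `build_gene_graph` and its tuple input format are hypothetical, not the actual `generatePathsLinks` algorithm.

```python
import io


def build_gene_graph(genes, segment_ids, links, gfa_file=None):
    """Add one accession's genes to a shared gene graph (hypothetical sketch).

    genes: list of (chromosome, start, orthogroup) tuples.
    segment_ids: dict orthogroup -> numeric ID, extended in place.
    links: set of (from, to) orthogroup pairs, extended in place.
    Returns the accession's path as a list of segment names.
    """
    path = []
    prev = None  # (chromosome, segment name) of the previous gene
    for chrom, _start, og in sorted(genes):  # order by chromosome, then start
        if og not in segment_ids:
            segment_ids[og] = len(segment_ids) + 1
            if gfa_file is not None:
                # GFA 1.0 segment line; "*" means no sequence is stored
                gfa_file.write(f"S\t{og}\t*\n")
        # Only genes adjacent on the same chromosome are linked
        if prev is not None and prev[0] == chrom:
            links.add((prev[1], og))
        path.append(og)
        prev = (chrom, og)
    return path


segment_ids, links, gfa = {}, set(), io.StringIO()
p1 = build_gene_graph([("Chr1", 500, "OG2"), ("Chr1", 100, "OG1"), ("Chr2", 50, "OG3")], segment_ids, links, gfa)
p2 = build_gene_graph([("Chr1", 10, "OG1"), ("Chr1", 90, "OG3")], segment_ids, links, gfa)
print(p1, p2)  # ['OG1', 'OG2', 'OG3'] ['OG1', 'OG3']
```

Because segments are only written when an orthogroup is first seen, each node appears once in the GFA output even when several accessions pass through it.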
readSegmentIDs
readSegmentIDs (path)
writeSegmentIDs
writeSegmentIDs (path, segmentIDs)
writePath
writePath (gfaFile, AccessionID, path, cigar, doCigars)
writeLinks
writeLinks (gfaFile, links, doCigars=True)
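For reference, GFA 1.0 encodes paths as tab-separated `P` lines (name, comma-separated oriented segments, overlaps) and links as `L` lines (from, orientation, to, orientation, overlap). The sketch below assumes that `doCigars` toggles between writing CIGAR overlaps and the `*` placeholder; that reading of `doCigars`, the `0M` default, and the names `write_gfa_path`/`write_gfa_links` are assumptions, not the module's documented behaviour.

```python
import io


def write_gfa_path(gfa_file, name, segments, cigar="0M", do_cigars=True):
    """Write a GFA 1.0 P-line; segments is a list of (segment name, forward) pairs."""
    steps = ",".join(f"{seg}{'+' if fwd else '-'}" for seg, fwd in segments)
    # One overlap per junction between consecutive steps, else the "*" placeholder
    if do_cigars and len(segments) > 1:
        overlaps = ",".join([cigar] * (len(segments) - 1))
    else:
        overlaps = "*"
    gfa_file.write(f"P\t{name}\t{steps}\t{overlaps}\n")


def write_gfa_links(gfa_file, links, cigar="0M", do_cigars=True):
    """Write GFA 1.0 L-lines; links holds (from, from_fwd, to, to_fwd) tuples."""
    for src, src_fwd, dst, dst_fwd in sorted(links):  # sorted for stable output
        overlap = cigar if do_cigars else "*"
        gfa_file.write(
            f"L\t{src}\t{'+' if src_fwd else '-'}\t{dst}\t{'+' if dst_fwd else '-'}\t{overlap}\n"
        )


buf = io.StringIO()
write_gfa_links(buf, {("OG1", True, "OG2", True)})
write_gfa_path(buf, "TIAR10", [("OG1", True), ("OG2", False)])
print(buf.getvalue(), end="")
```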