Export module
Imports and templates
Functions intro
Notation and terminology
In the documentation, we refer to graph nucleotides, columns and components. Components contain columns, and columns contain nucleotides.
In the code, variable names and comments use slightly different notation: columns in the documentation are bins in the code and comments, whereas graph nucleotides in the documentation are called columns in the code and comments. This is for legacy reasons. Originally, there were no nucleotide numbers (columns) in the visualised graph structure, and components were split into bins (literally, equal-sized bins). This is no longer the case, but the old terminology remains.
Ideally, all variable names and comments should be changed to match the documentation notation, but I do not know when this will happen.
For various operational or legacy reasons, some data structures (usually lists/arrays) use 0-based indexing, whereas others (usually dicts) can be 0-based or 1-based. Here are the main structures with numerical indexing and their index bases:
- components: keys: 0-based, values: occupants: 0-based, binNumbers: 0-based
- componentToNode: keys: 0-based, values: 1-based
- nodeToComponent: keys: 0-based, values: 1-based
- newToOldInd and oldToNewInd: both index and values are 0-based numbers of components in previous and current zoomlayer.
- fromLinks: top level keys (from nodes): 1-based, bottom level keys (to nodes): 1-based, values (list of participants): 0-based
- toLinks: top level keys (to nodes): 1-based, bottom level keys (from nodes): 1-based, values (list of participants): 0-based
- fromComponentLinks: top level keys (from components): 1-based, bottom level keys (to components): 1-based, values (set of participants): 0-based
- toComponentLinks: top level keys (to components): 1-based, bottom level keys (from components): 1-based, values (set of participants): 0-based
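Because these bases differ, lookups that chain several structures need explicit offsets. Below is a minimal sketch with toy data. The values and the helper componentForNode are hypothetical. It assumes that node IDs arrive 1-based (as in fromLinks/toLinks), that nodeToComponent is keyed 0-based with 1-based values, and that components is indexed 0-based, as listed above:

```python
# Toy data (hypothetical): three nodes grouped into two components.
# nodeToComponent: keys are 0-based node indices, values are 1-based component numbers.
nodeToComponent = {0: 1, 1: 1, 2: 2}
# components: indexed 0-based.
components = [{"name": "comp1"}, {"name": "comp2"}]

def componentForNode(node1Based, nodeToComponent, components):
    """Return the component for a 1-based node ID (as used in fromLinks/toLinks)."""
    compNum1Based = nodeToComponent[node1Based - 1]  # 1-based node -> 0-based key
    return components[compNum1Based - 1]             # 1-based component -> 0-based index

print(componentForNode(3, nodeToComponent, components)["name"])  # comp2
```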
Generating base layer
This set of functions generates the data structures for the initial, lowest zoom level (nucleotide or minimum-unit resolution). The main orchestration function is baseLayerZoom.
Functions
outLeftRight
outLeftRight (nodeInversionInPath, leftFarLink, rightFarLink, reason, debug=False, inversionThreshold=0.5)
recordLinks
recordLinks (nodeIdx, nextNode, pathID, step, nodeInversionInPath, nonLinearCond, pathNodeArray, fromLinks, toLinks, debug=False, inversionThreshold=0.5)
checkForBreak
checkForBreak (nodeIdx, nodeLen, nodePathsIdx, nodeSeqInPath, uniqueNodePathsIDs, pathNodeCount, pathLengths, pathNodeArray, pathDirArray, occupancy, inversion, fromLinks, toLinks, nBins, maxLengthComponent, blockEdges, inversionThreshold=0.5, debug=False)
Function to check whether the component should be broken before (left) and/or after (right) it.
| | Type | Default | Details |
|---|---|---|---|
| nodeIdx | |||
| nodeLen | |||
| nodePathsIdx | |||
| nodeSeqInPath | |||
| uniqueNodePathsIDs | |||
| pathNodeCount | |||
| pathLengths | |||
| pathNodeArray | |||
| pathDirArray | |||
| occupancy | |||
| inversion | |||
| fromLinks | |||
| toLinks | |||
| nBins | |||
| maxLengthComponent | |||
| blockEdges | |||
| inversionThreshold | float | 0.5 | |
| debug | bool | False | |
| Returns | | | leftFarLink: bool. Shows whether there is a far link on the left that will require a component break. |
nodeStat
nodeStat (nodeIdx, pathNodeArray, nodeLengths)
Calculates information about a node as part of the overall graph.
finaliseComponentBase
finaliseComponentBase (component, components, componentNucleotides, matrix, occupants, nBins, componentLengths, nucleotides, zoomLevel, accessions, inversionThreshold=0.5)
processAnnotationInterval
processAnnotationInterval (posStart, posEnd, annotation, res)
combineAnnotation
combineAnnotation (combAnnotation)
updateEdges
updateEdges (accEdge, edgeAccessions, compNum)
Fills either accStarts or accEnds (the component on which each accession starts or ends, respectively). compNum is assumed to be 1-based.
Wrapper
The ‘positions’ key in metadata now contains either a single position (chr:posStart..posEnd) or two comma-separated positions. In the second case, one is a genomic position and the other is a pangenomic position.
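Here is a minimal sketch of reading such a value. It assumes that each entry has exactly the chr:posStart..posEnd form. The helper name parsePositions is hypothetical, and the helper does not try to decide which of two entries is genomic and which is pangenomic:

```python
import re

# Hypothetical pattern for one "chr:posStart..posEnd" entry.
_POS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")

def parsePositions(value):
    """Split a metadata 'positions' string into (chrom, start, end) tuples.

    Returns one tuple for a single position and two for the comma-separated form.
    """
    result = []
    for part in value.split(","):
        m = _POS_RE.match(part.strip())
        if m is None:
            raise ValueError(f"Unrecognised position: {part!r}")
        result.append((m["chrom"], int(m["start"]), int(m["end"])))
    return result

single = parsePositions("chr1:100..200")
double = parsePositions("chr1:100..200,chr1:5000..5100")
```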
baseLayerZoom
baseLayerZoom (graph, outputPath, outputName, pathNodeArray, pathDirArray, pathLengths, nodeLengths, pathNodeLengthsCum, maxLengthComponent, blockEdges, CPUS=32, inversionThreshold=0.5, isSeq=True, debug=False, debugTime=False)
Transfer from nodes to components (links and other structures)
This is one of the first steps of exporting a graph. The graph operates on nodes, which can be linearly connected to each other in all paths, whereas the exporter works with components. In almost all cases, a component has at least some non-linear links to other components on both sides. The only exception is a component that is too large and is split into several ones; the resulting components are then connected by 100% linear links. Also, the graph operates on paths as well as nodes, whereas the exporter works with components and the accession-specific links between them.
These functions (the main orchestrating one being nodeToComponentLinks) convert nodes and paths into components and links.
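The core idea of collapsing node-level links into component-level links can be sketched as follows. This is a simplified, hypothetical illustration, not the actual nodeToComponentLinks. It uses the index bases listed earlier. It ignores strands, and it skips links between nodes of the same component on the assumption that they are internal:

```python
from collections import defaultdict

def nodeLinksToComponentLinks(fromLinks, nodeToComponent):
    """Collapse node-level links into component-level links (simplified sketch).

    fromLinks: {fromNode (1-based): {toNode (1-based): [participant, ...]}}
    nodeToComponent: {node (0-based): component (1-based)}
    Returns {fromComp (1-based): {toComp (1-based): set(participants)}}.
    """
    compLinks = defaultdict(lambda: defaultdict(set))
    for fromNode, targets in fromLinks.items():
        fromComp = nodeToComponent[fromNode - 1]
        for toNode, participants in targets.items():
            toComp = nodeToComponent[toNode - 1]
            if fromComp == toComp:
                continue  # assumed internal link (simplification)
            compLinks[fromComp][toComp].update(participants)
    # Convert nested defaultdicts to plain dicts.
    return {f: dict(t) for f, t in compLinks.items()}
```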
splitforwardInversedNodeComp
splitforwardInversedNodeComp (pathList, component, isInverse)
fillLinksBase
fillLinksBase (nodeInComp, nodeToComponent, fromLinks, toLinks, fromComponentLinks, toComponentLinks, compNum, components, doLeft=True, doRight=True)
convertLink
convertLink (linkFrom, linkTo, translateDict, forwardLinks, isZoom)
recordUpdatedPairedLink
recordUpdatedPairedLink (firstLinkSet, secondLinkSet, firstLink, secondLink, substituteLink, pairedLinksConv)
convertRemovableComponents
convertRemovableComponents (translateDict, linkLengths, pairedLinks, interconnectedLinks, blockEdges, forwardLinks, isZoom=True)
translateDict should be a dict in the format {<old node/component id, 0-based>: <new component id, 1-based>}. pathNodeInv should be a dict of dicts with the following structure: {
This is done through fromLinks and toLinks and through the associated directions of the available accessions. For this, we need to loop through the strands and do it separately for each strand.
For paired links, a single node link may give several component links. In this case, the cross product of all first and second links is added to the converted paired links.
❗The substitute links should be added only to the paths that contained both the first and the second links in the first place. This should be controlled in the link removal routine.
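The cross product of converted paired links can be illustrated with itertools.product. This is a sketch only. The function name convertPairedLink is hypothetical, and so is the representation of component links as (fromComp, toComp) tuples:

```python
from itertools import product

def convertPairedLink(firstLinkConv, secondLinkConv):
    """Return every (firstComponentLink, secondComponentLink) combination.

    firstLinkConv / secondLinkConv: lists of component links that a single
    node link was converted into.
    """
    return list(product(firstLinkConv, secondLinkConv))

pairs = convertPairedLink([(1, 5), (2, 5)], [(7, 3)])
# pairs == [((1, 5), (7, 3)), ((2, 5), (7, 3))]
```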
nodeToComponentLinks
nodeToComponentLinks (components, componentToNode, nodeToComponent, fromLinks, toLinks, graph, fromComponentLinks, toComponentLinks, linkLengths=None, pairedLinks=None, interconnectedLinks=None, blockEdges=None, debug=False)
Identifying collapsible links and rearrangement blocks (currently works incorrectly; kept for compatibility).
To generate multiple zoom levels of the graph view, non-linear links that describe small rearrangements (too small to show at the given zoom level) should disappear, whereas links that describe larger blocks should persist. This makes larger rearrangements clearly visible at higher zoom levels.
To do this, each link should be associated with a size (of a rearrangement). When a zoom level is generated, links whose rearrangement cannot be shown at that zoom level can then be removed.
Some links are also associated with each other, and when they are removed new links (usually linear ones) should be reinstated to make larger rearrangements clearer.
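A minimal sketch of the size-based removal. It assumes that linkLengths maps each link to the size of the rearrangement it bounds, and that a link is removed once the zoom level exceeds that size. The function name splitLinksByZoom is hypothetical, and the reinstatement of linear links is not shown:

```python
def splitLinksByZoom(linkLengths, zoomLevel):
    """Partition links into those kept at zoomLevel and those to remove."""
    keep, remove = [], []
    for link, size in linkLengths.items():
        # A rearrangement smaller than the zoom level cannot be shown.
        (remove if size < zoomLevel else keep).append(link)
    return keep, remove

keep, remove = splitLinksByZoom({(1, 5): 3, (4, 9): 120, (10, 2): 40}, 32)
```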
At the moment, the process of identifying these sizes does not work well. It leaves too many non-linear links up to the very top zoom level, where all non-linear links suddenly disappear. The graph then jumps from over-complicated to almost trivial, without any rearrangements. To use a digital map analogy, it is as if most country roads persisted while you zoomed out until almost the whole Earth was in view. Then, at some point, the view would become just a blue/green ball with very rough boundaries of continents and oceans.
At the moment, all associated links go into a pool of so-called interconnected links. If one link gets associated with a specific size, all links in the pool get the same association, and then the maximum size is selected. Consider one link that describes a small rearrangement and also lies on the edge of a large one, and another link that is associated only with the large rearrangement. The latter link will also be associated with the size of the large rearrangement and will stay until the zoom level at which the large rearrangement is too small to show. That is incorrect.
I think each link should get its own size associations (with the maximum taken), and links should be cleared individually. However, if a link with a smaller size is paired with one with a larger size, the reinstated link should appear once the smaller link is removed.
Another alternative is simply to find contiguous blocks in each path. Each link pair (describing the start and end of a block) would then be treated as a pair of links to be cleared together, associated with the size of that block. Repeats within these blocks need to be controlled: if they occur, a single link can describe a whole rearrangement. An extra control for inversions is also needed. In particular, if the numbers outside the block do not form a range that fits the inverted node (e.g. 1+,4-,3+ or 3+,2-,5+), the block should be ignored at this step. This means there is a smaller rearrangement within a larger one.
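The inversion control can be sketched as a simple range test on the flanking node IDs. The helper name invertedNodeFitsRange is hypothetical. It assumes plain integer node IDs, with strands checked elsewhere:

```python
def invertedNodeFitsRange(leftNode, invertedNode, rightNode):
    """Check whether an inverted node lies strictly between its flanking nodes.

    1+,2-,3+ -> True; 1+,4-,3+ -> False; 3+,2-,5+ -> False.
    """
    lo, hi = min(leftNode, rightNode), max(leftNode, rightNode)
    return lo < invertedNode < hi
```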
Another alternative (described in the TODO) is to convert paths of nodes into paths of edges and operate on them. I guess this is not far from the approach in the previous paragraph.
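Converting a path of oriented nodes into a path of edges is a pairwise step. Here is a minimal sketch. The name pathToEdges and the (nodeID, strand) representation are hypothetical:

```python
def pathToEdges(path):
    """Convert a path of oriented nodes into a path of edges.

    path: list of (nodeID, strand) tuples, strand being '+' or '-'.
    Returns a list of ((fromNode, fromStrand), (toNode, toStrand)) edges.
    """
    return list(zip(path, path[1:]))

edges = pathToEdges([(1, "+"), (4, "-"), (3, "+")])
# [((1, '+'), (4, '-')), ((4, '-'), (3, '+'))]
```

Operating on edges makes each step of a path a single hashable item, so identical jumps in different paths can be compared directly.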
Identifying path breaks
findBreaksInPath
findBreaksInPath (combinedArray, nextNodeDict)
identifyPathBreaks
identifyPathBreaks (combinedNodeDirArray, pathLengths, pathNextNode)
Block processing
interweaveArrays
interweaveArrays (a, b)
extractGapsBlocks
extractGapsBlocks (block, path, nodeLengths, getComplex=False)
By default, this function splits a block by gaps (e.g. block [1,2,4,5,6,8] yields [1,2],[4,5,6],[8]).
If getComplex is set to True, the gaps are first filtered to remove nodes that the path does not pass through. After that, the edges are identified, and nodes not passed by the path are filtered out of them as well. Next, the longest block among the edges is found and combined with all the gaps, and the shortest result is the one returned.
E.g. block [1,2,4,5,8] will give edges [1,2],[4,5],[8] and gaps [3],[6,7].
If the path does not contain 6, the edges will be the same, but the gaps will be [3],[7].
If path does not contain 3, then edges will be [1,2,4,5],[8] and gaps [6,7]
The exact block that is returned depends on the sizes of the individual nodes.
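The default (getComplex=False) splitting can be sketched as follows. This hypothetical helper also returns the gaps, matching the edges/gaps terminology of the examples above. It assumes the block is a sorted list of unique node IDs and does not reproduce the path filtering of the complex mode:

```python
def splitBlockByGaps(block):
    """Split a sorted list of node IDs into contiguous runs ("edges") and gaps."""
    edges, gaps = [], []
    if not block:
        return edges, gaps
    current = [block[0]]
    for node in block[1:]:
        if node == current[-1] + 1:
            current.append(node)  # still contiguous
        else:
            # Record the missing node IDs between the runs, then start a new run.
            gaps.append(list(range(current[-1] + 1, node)))
            edges.append(current)
            current = [node]
    edges.append(current)
    return edges, gaps

edges, gaps = splitBlockByGaps([1, 2, 4, 5, 8])
# edges == [[1, 2], [4, 5], [8]], gaps == [[3], [6, 7]]
```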
checkSplitBlock
checkSplitBlock (block, gapList=None)
Not used at the moment.
Checks whether the block has any gaps and splits it into a list of blocks between the gaps (the alternatives would be to fill the gaps or leave the block as it is). At the moment, a gapped block is converted into a list of blocks between the gaps.
blockListToLengths
blockListToLengths (blockList, nodeLengths)
convertBlocksToLengths
convertBlocksToLengths (linksBlocks, nodeLengths)
Converts the blocks associated with each link to lengths and then selects the longest one (?)
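A minimal sketch of this conversion, using hypothetical names. It assumes that node IDs in blocks are 1-based, that nodeLengths is a 0-based list of node lengths, and that linksBlocks maps each link to a list of blocks:

```python
def blockLengths(blockList, nodeLengths):
    """Total length (in nucleotides) of each block in blockList."""
    return [sum(nodeLengths[node - 1] for node in block) for block in blockList]

def longestBlockPerLink(linksBlocks, nodeLengths):
    """Map each link to the length of its longest associated block (0 if none)."""
    return {link: max(blockLengths(blocks, nodeLengths), default=0)
            for link, blocks in linksBlocks.items()}
```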
Link processing
addToLinkPool
addToLinkPool (link1, link2, interconnectedLinks)
blockFromSingleLink
blockFromSingleLink (pathID, link, pathNodeInversionRate, pathNextNode)
Identifies the block from a single link. It is the block that the link bounds, i.e.:
- If the link is forward, it is the region inside the link plus any side that is inverted.
- If the link is backward, it is the region inside the link plus any side that is in the normal direction.
checkIndividualLink
checkIndividualLink (link, pathID, usedSecondInPairLink)
Checks whether this link is already the second link in a pair. If it is, it is not considered separately (returns True?). Otherwise, it should be considered, and a block should be generated (using blockFromSingleLink) and associated with this link.
processDoublePairedLinks
processDoublePairedLinks (leftLink, rightLink, pathID, doublePairedLinks, pairedLinks, interconnectedLinks, linksBlocks, pathNextNode)
processIndividualLink
processIndividualLink (link, pathID, pathNodeInversionRate, pathNextNode, usedSecondInPairLink)
recordLinkBlockAssociation
recordLinkBlockAssociation (link, blockList, linksBlocks)
findNextNode
findNextNode (node, combinedArray)
processPseudoPair
processPseudoPair (breakPos, returnPos, pathID, pathNodeArray, combinedNodeDirArray, pathNextNode, nodeLengths, usedSecondInPairPath, pairedLinks, linksBlocks)
processStartsEnds
processStartsEnds (mainLink, linkStarts, linkEnds, interconnectedLinks, forwardLinks)
Currently not in use.
TODO!!! Need to add checks for whether one link intersects the other or lies fully inside it.
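The check in this TODO could be based on a standard interval comparison. Here is a sketch. The name linkRelation is hypothetical, it treats links as inclusive (from, to) intervals over component positions, and it normalises backward links:

```python
def linkRelation(linkA, linkB):
    """Classify two links as 'disjoint', 'nested' (one inside the other) or 'crossing'.

    Each link is a (from, to) pair; backward links (from > to) are normalised.
    Shared endpoints count as overlap (inclusive intervals).
    """
    aLo, aHi = sorted(linkA)
    bLo, bHi = sorted(linkB)
    if aHi < bLo or bHi < aLo:
        return "disjoint"
    if (aLo <= bLo and bHi <= aHi) or (bLo <= aLo and aHi <= bHi):
        return "nested"
    return "crossing"
```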
postprocessLinksBlocks
postprocessLinksBlocks (linksBlocks, interconnectedLinks)
processPathBreaks
processPathBreaks (pathBreakCoordPairs, pathNodeArray, pathNextNode, combinedNodeDirArray, pathNodeInversionRate, pathLengths, nodeLengths, forwardLinks)
Rearrangement blocks
addBlockEdge
addBlockEdge (edge, size, blockEdges)
identifyRearrangementBlocks
identifyRearrangementBlocks (nodesStructure, nodeLengths)
blockEdges is a dict with the following structure:
Wrapper
getRemovableStructures
getRemovableStructures (graph=None, nodeLengths=None, pathLengths=None, pathNodeArray=None, pathDirArray=None, pathNextNode=None, forwardLinks=None, inversionThreshold=0.5)
getBlockEdges
getBlockEdges (graph=None, nodeLengths=None, pathLengths=None, pathNodeArray=None, pathDirArray=None, pathNextNode=None, forwardLinks=None, inversionThreshold=0.5)
Generating zoom layer
This set of functions (with nextLayerZoom as the main orchestration function) generates the next zoom level. It collapses columns, and then components, together after smaller non-linear links have been removed (by a different set of functions).
Finalising bin and component
addLink
addLink (fromComp, fromStrand, toComp, toStrand, pathList, fromComponentLinks, toComponentLinks)
import numpy as np

def getOccInvChange(binColLengths,binBlockLength,binOcc,binInv,prevOcc,prevInv,inversionThreshold=0.5):
    occChanged = False
    invChanged = False
    occ = {}
    inv = {}
    for pathID in binOcc:
        # Averaging occupancy (weighted by column length)
        occ[pathID] = sum([bl*bo for bl,bo in zip(binColLengths,binOcc[pathID])])/binBlockLength
        # Do comparison through floor and then abs difference > 0
        if np.abs(np.floor(occ[pathID]+0.5)-np.floor(prevOcc.get(pathID,occ[pathID])+0.5))>0 \
            and occ[pathID]>0.5 and prevOcc.get(pathID,occ[pathID])>0.5:
            occChanged = True
        prevOcc[pathID] = occ[pathID]
        # Averaging inversion (weighted by column length and occupancy)
        inv[pathID] = sum([bl*bo*bi for bl,bo,bi in zip(binColLengths,binOcc[pathID],binInv[pathID])])/(binBlockLength*occ[pathID])
        if (inv[pathID]-inversionThreshold)*(prevInv.get(pathID,inv[pathID])-inversionThreshold)<0 or \
            ((inv[pathID]-inversionThreshold)*(prevInv.get(pathID,inv[pathID])-inversionThreshold)==0 and \
            inv[pathID]*prevInv.get(pathID,inv[pathID])>inversionThreshold*inversionThreshold):
            # The second condition (after `or`) covers the case where one value equals
            # inversionThreshold and the other is greater than inversionThreshold.
            invChanged = True
        prevInv[pathID] = inv[pathID]
    return occChanged,invChanged,occ,inv,prevOcc,prevInv
getOccInv
getOccInv (binColLengths, binBlockLength, binOcc, binInv, inversionThreshold=0.5)
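The averaging in getOccInvChange above is a length-weighted mean. Here is a small worked sketch for a single path in a single bin. The name averageOccInv is hypothetical, and unlike the real signature, it computes binBlockLength as the sum of the column lengths (an assumption). Occupancy is weighted by column length. Inversion is weighted by column length times occupancy:

```python
def averageOccInv(binColLengths, binOcc, binInv):
    """Length-weighted occupancy and inversion of one path over one bin."""
    binBlockLength = sum(binColLengths)  # assumption: bin length = sum of column lengths
    occ = sum(l * o for l, o in zip(binColLengths, binOcc)) / binBlockLength
    if occ == 0:
        return 0.0, 0.0  # path absent from the bin; inversion set to 0 here (assumption)
    inv = sum(l * o * i for l, o, i in zip(binColLengths, binOcc, binInv)) / (binBlockLength * occ)
    return occ, inv

# Columns of length 2, 3 and 5; the path covers the first two, and only the first is inverted:
# occ = (2 + 3) / 10 = 0.5, inv = 2 / (10 * 0.5) = 0.4
occ, inv = averageOccInv([2, 3, 5], [1, 1, 0], [1, 0, 0])
```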
combineIntervals
combineIntervals (posPath)
recordBinZoom
recordBinZoom (occ, inv, binPosArray, nBins, nCols, binBlockLength, binBlockLengths, binColLengths, binColStart, binColStarts, binColEnd, binColEnds, matrix, inversionThreshold=0.5)
getAverageInv
getAverageInv (binBlockLengths, matrixPathArray)
finaliseComponentZoom
finaliseComponentZoom (component, components, componentLengths, nBins, nCols, occupants, binBlockLengths, binColStarts, binColEnds, matrix, starts, ends, forwardPaths, invertedPaths, compInvNum, compInvDen, inversionThreshold=0.5)
| | Type | Default | Details |
|---|---|---|---|
| component | |||
| components | |||
| componentLengths | | | |
| nBins | |||
| nCols | |||
| occupants | |||
| binBlockLengths | |||
| binColStarts | |||
| binColEnds | |||
| matrix | |||
| starts | |||
| ends | |||
| forwardPaths | |||
| invertedPaths | |||
| compInvNum | |||
| compInvDen | |||
| inversionThreshold | float | 0.5 | |
finaliseBinZoom
finaliseBinZoom (compNum, binOcc, binInv, binPosArray, nBins, nCols, binBlockLength, binBlockLengths, binColLengths, binColStart, binColStarts, binColEnd, binColEnds, matrix, newComponent, newComponents, newComponentLengths, newFromComponentLinks, newToComponentLinks, occupants, linkLengths, starts, ends, forwardPaths, invertedPaths, pathsToInversion, newToOldInd, oldToNewInd, inversionThreshold=0.5)
| | Type | Default | Details |
|---|---|---|---|
| compNum | |||
| binOcc | |||
| binInv | |||
| binPosArray | |||
| nBins | |||
| nCols | |||
| binBlockLength | |||
| binBlockLengths | |||
| binColLengths | |||
| binColStart | |||
| binColStarts | |||
| binColEnd | |||
| binColEnds | |||
| matrix | |||
| newComponent | |||
| newComponents | |||
| newComponentLengths | | | |
| newFromComponentLinks | |||
| newToComponentLinks | |||
| occupants | |||
| linkLengths | |||
| starts | |||
| ends | |||
| forwardPaths | |||
| invertedPaths | |||
| pathsToInversion | |||
| newToOldInd | |||
| oldToNewInd | |||
| inversionThreshold | float | 0.5 | |
Break component?
getMatrixPathElement
getMatrixPathElement (matrix, pathID)
checkChange
checkChange (compNum, components, zoomLevel, blockEdges)
joinComponents
joinComponents (leftComp, rightComp, maxLengthComponent, inversionThreshold=0.5)
!!! ⚠️ Currently not used
If the joining was successful, the function will return a joined component.
If the joining was not successful and was aborted, the function will return a list of the original components. The joining can be aborted for the following reasons:
- In one of the paths, the inversion is lower than the threshold in one component and higher in the other.
- The left component contains at least one end.
- The right component contains at least one start.
The function does not check links arriving at or leaving from the right side of the left component or the left side of the right component. It simply takes the left links of the left component and the right links of the right component and assigns them to the new component.
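The abort conditions listed above can be sketched as a standalone check. This is a hypothetical illustration, not joinComponents itself. It assumes a simplified component representation: per-path inversion rates as {pathID: rate} dicts, and collections of paths that end in the left component or start in the right one. Only paths present in both components are compared:

```python
def canJoinComponents(leftInv, rightInv, leftEnds, rightStarts, inversionThreshold=0.5):
    """Return True if two adjacent components may be joined (simplified sketch)."""
    if leftEnds or rightStarts:
        return False  # left contains an end or right contains a start
    for pathID in leftInv.keys() & rightInv.keys():
        # Opposite signs mean one side is below the threshold and the other above it.
        if (leftInv[pathID] - inversionThreshold) * (rightInv[pathID] - inversionThreshold) < 0:
            return False
    return True
```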
checkLinksZoom
checkLinksZoom (compNum, fromComponentLinks, toComponentLinks)
checkForBreaksZoom
checkForBreaksZoom (zoomLevel, compNum, components, fromComponentLinks, toComponentLinks, blockEdges)
Update links
splitPositiveNegative
splitPositiveNegative (compID, accs, components)
This function simply takes all accessions present in the component and splits them into forward and inverted ones.
| | Type | Details |
|---|---|---|
| compID | ||
| accs | ||
| components | ||
| Returns | | posAcc: list[int]. IDs of accessions that have forward direction in the given component. |
intersectAccLists
intersectAccLists (accList, dirDict)
updateLinks
updateLinks (newToOldInd, oldToNewInd, fromComponentLinks, toComponentLinks, linkLengths, pairedLinks, interconnectedLinks, blockEdges, accStarts, accEnds, components, compAccDir, newFromComponentLinks={}, newToComponentLinks={})
newToOldInd and oldToNewInd: both index and values are 0-based numbers of components in previous and current zoomlayer.
Main layer generation function + assistant function
isStartEnd
isStartEnd (compNum, components)
nextLayerZoom
nextLayerZoom (zoomLevel, components, componentLengths, fromComponentLinks, toComponentLinks, graph, accStarts, accEnds, maxLengthComponent, linkLengths, pairedLinks, interconnectedLinks, blockEdges, inversionThreshold=0.5, debug=False, debugTime=False)
| | Type | Default | Details |
|---|---|---|---|
| zoomLevel | |||
| components | |||
| componentLengths | | | |
| fromComponentLinks | |||
| toComponentLinks | |||
| graph | |||
| accStarts | |||
| accEnds | |||
| maxLengthComponent | |||
| linkLengths | |||
| pairedLinks | |||
| interconnectedLinks | |||
| blockEdges | |||
| inversionThreshold | float | 0.5 | |
| debug | bool | False | |
| debugTime | bool | False | |
Clear elements too small to show
This set of functions (with clearInvisible as the orchestrating function) uses the previously identified associations between non-linear links and sizes (numbers of nucleotides). If the next zoom level is larger than a link's associated size, the link is removed, and some linear links are reinstated instead.
After that, isolated blocks are identified and removed. An isolated block is a contiguous block of components (columns) that are connected only to each other and not to any component outside the block.
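The isolation condition amounts to "no link crosses the block boundary". Here is a minimal sketch. The name isIsolatedBlock is hypothetical, and it assumes the links are given as a flat iterable of (fromComp, toComp) pairs rather than as fromComponentLinks/toComponentLinks:

```python
def isIsolatedBlock(blockStart, blockEnd, links):
    """Check whether components blockStart..blockEnd (inclusive) link only to each other."""
    def inside(comp):
        return blockStart <= comp <= blockEnd

    for fromComp, toComp in links:
        if inside(fromComp) != inside(toComp):
            return False  # this link crosses the block boundary
    return True
```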
Removing links and rearrangement blocks associated to too small blocks
removeLink
removeLink (fromComponentLinks, toComponentLinks, linkList, remLinks, link, pairedLink=None, subLink=None, subLinks=None, remLinkAccessions=None)
This function removes the main link.
If paired and substitute links are provided, the paired link is checked. If it is neither removed nor already queued for removal, it is added to the queue.
After that, the accessions common to the start of the main link and the end of the paired link are found (separately for each strand), and the substitute link is established for all such accessions.
If the substitute link is not (k,k+1) but (k,k+p), then all links (k,k+1),(k+1,k+2),…,(k+p-1,k+p) are established in componentLinks.
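The last two steps (finding the common accessions and expanding (k,k+p) into consecutive links) can be sketched for one strand as follows. The function name substituteLinks and the returned {link: accessions} format are hypothetical:

```python
def substituteLinks(mainStartAccs, pairedEndAccs, k, kp):
    """Expand a substitute link (k, kp) into consecutive links for common accessions.

    mainStartAccs / pairedEndAccs: accessions (one strand) at the start of the
    main link and at the end of the paired link.
    Returns {(c, c + 1): commonAccessions} for c = k .. kp - 1.
    """
    common = set(mainStartAccs) & set(pairedEndAccs)
    if not common:
        return {}
    return {(c, c + 1): set(common) for c in range(k, kp)}

links = substituteLinks({0, 1, 2}, {1, 2, 3}, 3, 6)
# {(3, 4): {1, 2}, (4, 5): {1, 2}, (5, 6): {1, 2}}
```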
processCollapsibleBlocks
processCollapsibleBlocks (zoomLevel, linkLengths, pairedLinks, interconnectedLinks, fromComponentLinks, toComponentLinks)
clearRearrangementBlocks
clearRearrangementBlocks (zoomLevel, blockEdges)
Find isolated blocks
Identify empty edges
testStartEnd
testStartEnd (compNum, isLeft, components, accStarts, accEnds)
findEmptyEdges
findEmptyEdges (fromComponentLinks, toComponentLinks, accStarts, accEnds, components)
Identifies all empty edges by simply finding components that do not appear in toComponentLinks (left empty) or in fromComponentLinks (right empty).
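Here is a minimal sketch of this membership test. The name findEmptyEdgesSketch is hypothetical. It uses the 1-based top-level keys of fromComponentLinks/toComponentLinks and ignores accession starts and ends, which the real findEmptyEdges also receives:

```python
def findEmptyEdgesSketch(nComponents, fromComponentLinks, toComponentLinks):
    """Return (leftEmpty, rightEmpty) lists of 1-based component numbers.

    Left empty: no incoming links (not a key of toComponentLinks).
    Right empty: no outgoing links (not a key of fromComponentLinks).
    """
    comps = range(1, nComponents + 1)
    leftEmpty = [c for c in comps if c not in toComponentLinks]
    rightEmpty = [c for c in comps if c not in fromComponentLinks]
    return leftEmpty, rightEmpty
```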
Identify isolated blocks
checkExternalLinks
checkExternalLinks (blockStart, blockEnd, fromComponentLinks, toComponentLinks, components)
createNewBoundaries
createNewBoundaries (blockStart, blockEnd, externalLinksComps, leftEmptyList, rightEmptyList)
# Test for `createNewBoundaries`
import numpy as np
st = [2,5,6,8]
end = [2,3,4,6,8,9,10,11]
blocks = [[2,11],[2,3],[5,11],[8,11],[8,9],[8,11]]
blockSplits = [[[2,3],[5,11]],[[2,2]],[[6,6],[8,11]],[[8,9]],[[8,8]],[]]
externals = [[4],[3],[5,7],[10],[9],[8,9,10,11]]
for bl,blSpl,ext in zip(blocks,blockSplits,externals):
    blSplTT = createNewBoundaries(*bl,ext,st,end)
    assert blSpl == blSplTT,f'Expected {blSpl}, but got {blSplTT}'

# Another test for `createNewBoundaries`
leftEmptyList = [2056, 3080, 3081, 2092, 2099, 1593, 3643, 2627, 1116, 2653, 2655, 3168, 2658, 613, 1637, 1638, 106, 1654, 2695, 2192, 1169, 1686, 2714, 3757, 2233, 3781, 723, 1240, 224, 1761, 1762, 1766, 3323, 1804, 786, 2331, 802, 2850, 807, 811, 1839, 1841, 3396, 3397, 1863, 3400, 843, 3423, 1898, 1899, 882, 884, 3463, 402, 2451, 3478, 408, 3482, 934, 426, 1962, 3504, 3516, 3519, 3520, 451, 1994, 1995, 972, 2506, 463, 3024, 1493, 1494, 3542, 1525]
rightEmptyList = [402, 2451, 3478, 407, 280, 3482, 2848, 802, 934, 807, 426, 811, 2091, 2092, 3757, 1839, 3504, 1841, 3516, 3519, 3405, 463, 722, 1240, 1761, 1762, 1766, 1899]
blockStart = 3396
blockEnd = 3405
externalLinksComps = [3396, 3397, 3398, 3399, 3400, 3401, 3402, 3403, 3404, 3405]
createNewBoundaries(blockStart,blockEnd,externalLinksComps,leftEmptyList,rightEmptyList)
# Output: []
identifyIsolatedBlocks
identifyIsolatedBlocks (leftEmptyList, rightEmptyList, fromComponentLinks, toComponentLinks, components)
Removing Isolated Blocks
updateLinksRemoveComp
updateLinksRemoveComp (oldToNewInd, fromComponentLinks, toComponentLinks, linkLengths, pairedLinks, interconnectedLinks, blockEdges, accStarts, accEnds)
removeIsolatedBlocks
removeIsolatedBlocks (isolatedBlockList, components, componentLengths, fromComponentLinks, toComponentLinks, accStarts, accEnds, linkLengths, pairedLinks, interconnectedLinks, blockEdges)
Clearing small element wrapping function
clearInvisible
clearInvisible (zoomLevel, linkLengths, pairedLinks, interconnectedLinks, blockEdges, fromComponentLinks, toComponentLinks, accStarts, accEnds, components, componentLengths)
Exporting layer
These functions, the main one being exportLayer, export a prepared zoom level (cleaned and collapsed by other functions) into Pantograph Visualisation tool data structures (JSON chunk files).
createZoomLevelDir
createZoomLevelDir (outputPath, outputName, zoomLevel)
Creates a directory for zoom level chunks. The function takes care of using the correct directory separator.
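pathlib handles the platform-specific separator. Here is a minimal sketch. The name createZoomLevelDirSketch is hypothetical, and the layout <outputPath>/<outputName>/<zoomLevel> is an assumption based on the project description (zoom level n stored in directory n inside the case directory):

```python
from pathlib import Path

def createZoomLevelDirSketch(outputPath, outputName, zoomLevel):
    """Create <outputPath>/<outputName>/<zoomLevel> (if missing) and return its path."""
    zoomDir = Path(outputPath) / outputName / str(zoomLevel)
    zoomDir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return zoomDir
```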
finaliseChunk
finaliseChunk (rootStruct, zoomLevel, chunk, nucleotides, nBins, chunkNum, curCompCols, prevTotalCols, outputPath, outputName)
addLinksToComp
addLinksToComp (compNum, components, fromComponentLinks, toComponentLinks)
checkLinks
checkLinks (leftComp, rightComp)
searchIndicesPosRecord
searchIndicesPosRecord (redisConn, redisCaseID, zoomLevel, accessions, posMapping)
exportLayer
exportLayer (zoomLevel, components, componentNucleotides, fromComponentLinks, toComponentLinks, rootStruct, outputPath, outputName, maxLengthComponent, maxLengthChunk, inversionThreshold=0.5, redisConn=None, redisCaseID=None, accessions=None, debug=False)
Main exporter wrapper with its helper functions
This is the main orchestrating function, which exports a single graph to the Pantograph Visualisation tool, together with a couple of auxiliary functions.
compLinksToAccCompLinks
compLinksToAccCompLinks (compLinks, doCompDir=False)
recordZoomLevelForDebug
recordZoomLevelForDebug (zoomNodeToComponent, zoomComponentToNodes, zoomComponents, nodeToComponent, componentToNodes, components, zoomLevel)
Records the result of segmentation into dictionaries that hold results for all zoom levels. It is currently used only for debugging; in normal operation, the all-zoom-level dictionaries are neither created nor used.
| | Type | Details |
|---|---|---|
| zoomNodeToComponent | ||
| zoomComponentToNodes | ||
| zoomComponents | ||
| nodeToComponent | ||
| componentToNodes | ||
| components | ||
| zoomLevel | ||
| Returns | | Modified dictionaries, with zoom at the beginning of their names. Theoretically, |
searchIndicesGeneRecord
searchIndicesGeneRecord (redisConn, redisCaseID, geneMapping, genPosMapping, altChrGenPosMapping, genPosSearchMapping, pangenPosSearchMapping)
Recording prepared metadata structures into Redis DB
exportToPantograph
exportToPantograph (graph=None, inputPath=None, GenomeGraphParams={}, outputPath=None, outputName=None, outputSuffix=None, isSeq=True, nodeLengths=None, redisConn=None, zoomLevels=[1], fillZoomLevels=True, maxLengthComponent=100, maxLengthChunk=20, inversionThreshold=0.5, debug=False, returnDebugData=False)
This function is used by the exportProject function and should not normally be used on its own.
Project generation
exportProject
exportProject (projectID, projectName, caseDict, pathToIndex, pathToGraphs, redisHost=None, redisPort=6379, redisDB=0, suffix='', maxLengthComponent=100, maxLengthChunk=6, inversionThreshold=0.5, isSeq=True, zoomLevels=[1], fillZoomLevel=True)
This is the only function that should normally be used to export a set of graphs (e.g. a graph per chromosome) to Pantograph Visualisation tool as a project (or interconnected structure).
Exporting each graph creates a case directory with a bin2file.json file, which describes the case overall and each of its zoom levels. Each zoom level is split into multiple chunk JSON files, with zoom level n stored in directory n inside the case directory. Each JSON chunk file contains all the information required to visualise up to maxLengthChunk components at the given zoom level.
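The chunking rule (at most maxLengthChunk components per chunk file) reduces to slicing the ordered component list. Here is a minimal sketch. The name chunkComponents is hypothetical, and the real exporter also tracks columns and writes the JSON files:

```python
def chunkComponents(components, maxLengthChunk):
    """Split an ordered list of components into chunks of at most maxLengthChunk."""
    if maxLengthChunk < 1:
        raise ValueError("maxLengthChunk must be positive")
    return [components[i:i + maxLengthChunk]
            for i in range(0, len(components), maxLengthChunk)]

chunks = chunkComponents(list(range(45)), 20)
# chunk sizes: 20, 20, 5
```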
All case directories are in the project directory together with <projectID>_project.json, which simply maps case names to the corresponding directory names.
Finally, information about the project is recorded in the Pantograph Visualisation tool data index to make it discoverable by the tool.
No metadata is recorded in these files, as it would inflate them very quickly. Instead, a very simple (optional) API works alongside the main Pantograph Visualisation tool. It provides various metadata on request if the API is available and does nothing otherwise. This API uses a Redis DB with a special DB schema.
When graphs are exported, some metadata (annotations, genome and pangenome positions) can be recorded to the Redis DB. If a Redis DB is not available or recording metadata is not needed, the redisHost parameter should be omitted. Otherwise, redisHost should be set to the hostname (or IP address) of the Redis DB server.