Utils module

Provides utility functions and classes

Sequence operations

Sequence inversion


generateComplementDict

 generateComplementDict (seqType='DNA', isDict=True)

The function generateComplementDict generates a dictionary for complementing a DNA sequence. It can also be applied to RNA to identify inverted sequences.

seqType: str. Can currently be either ‘DNA’ or ‘RNA’. If ‘DNA’, the complements of the four known nucleotides (A, C, G, T) are provided. All other letters (B, D, H, U, N and any others) are translated to N.
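The DNA behaviour described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the names make_dna_complement_dict, complement and inverse are hypothetical, only the 'DNA' case is covered, and treating the inverse as the reversed complement is an assumption.

```python
from collections import defaultdict


def make_dna_complement_dict():
    """Hypothetical stand-in for the 'DNA' case described above."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # Any letter without an explicit complement (B, D, H, U, N, ...) maps to 'N'.
    return defaultdict(lambda: "N", pairs)


def complement(seq, table):
    """Complement every letter of seq using the lookup table."""
    return "".join(table[base] for base in seq)


def inverse(seq, table):
    """Assumption: the inverse of a sequence is its reversed complement."""
    return complement(seq, table)[::-1]


table = make_dna_complement_dict()
print(complement("ACGTUN", table))  # TGCANN
print(inverse("AACG", table))       # CGTT
```

Note that this sketch is case-sensitive, so lowercase letters would also be translated to N.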


complementSequence

 complementSequence (seq, complementDict='DNA')

reverseSequence

 reverseSequence (seq)

inverseSequence

 inverseSequence (seq, complementDict='DNA')

Other file operations


checkNodeLengthsFile

 checkNodeLengthsFile (GFAPath)

Path files operations


sortAccessions

 sortAccessions (sort, _paths)

pathFileToPathDict

 pathFileToPathDict (filePath, directional=True, sort=True, v2=True)

Reads path file (ASCII file) and translates it to path dictionary for GenGraph class constructor.

Each line of the path file holds one path in the following format: : <nodeID[+|-]>[,<nodeID[+|-]>]
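Parsing a single line of such a file might look like the sketch below. It rests on assumptions: the text before the colon is taken to be a path name, each node is a node ID followed by a + or - strand sign, and the returned (name, steps) structure is illustrative only. It is not the dictionary layout expected by the GenGraph constructor, and parse_path_line is a hypothetical name.

```python
def parse_path_line(line):
    """Split one path line into a name and a list of (nodeID, strand) steps.

    Assumption: the part before the first colon is the path name.
    """
    name, _, nodes_part = line.partition(":")
    steps = []
    for token in nodes_part.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        if token[-1] not in ("+", "-"):
            raise ValueError(f"missing strand sign in {token!r}")
        steps.append((token[:-1], token[-1]))
    return name.strip(), steps


print(parse_path_line("acc1: 1+,2-,3+"))
# ('acc1', [('1', '+'), ('2', '-'), ('3', '+')])
```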

Export parameters processing and validating


pathConvert

 pathConvert (inputPath, suffix='')

checkZoomLevels

 checkZoomLevels (zoomLevels)

Checks that each zoom level is a factor of the next one.
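The check can be illustrated with a short sketch. It assumes the levels are positive integers compared in ascending order and that the result is reported as a boolean; how the actual function reports a failure is not stated here, and check_zoom_levels_sketch is a hypothetical name.

```python
def check_zoom_levels_sketch(zoom_levels):
    """Return True if every level divides the next larger level."""
    levels = sorted(zoom_levels)  # assumption: order of the input is irrelevant
    return all(nxt % prev == 0 for prev, nxt in zip(levels, levels[1:]))


print(check_zoom_levels_sketch([1, 2, 4, 8]))  # True
print(check_zoom_levels_sketch([1, 2, 3]))     # False, 2 is not a factor of 3
```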


adjustZoomLevels

 adjustZoomLevels (zoomLevels)

If there is no zoom level 1, adds it to the list.
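A minimal sketch of this adjustment, assuming the levels are positive integers and that a sorted list without duplicates is an acceptable result (adjust_zoom_levels_sketch is a hypothetical name):

```python
def adjust_zoom_levels_sketch(zoom_levels):
    """Return the zoom levels in ascending order, with level 1 included."""
    levels = sorted(set(zoom_levels))
    if 1 not in levels:
        levels.insert(0, 1)  # 1 is the smallest positive level, so it goes first
    return levels


print(adjust_zoom_levels_sketch([2, 4]))     # [1, 2, 4]
print(adjust_zoom_levels_sketch([1, 2, 4]))  # [1, 2, 4]
```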

Utility classes

Numpy to JSON encoder


NpEncoder

 NpEncoder (skipkeys=False, ensure_ascii=True, check_circular=True,
            allow_nan=True, sort_keys=False, indent=None, separators=None,
            default=None)

Extensible JSON https://json.org encoder for Python data structures.

Supports the following objects and types by default:

Python        JSON
dict          object
list, tuple   array
str           string
int, float    number
True          true
False         false
None          null

To extend this to recognize other objects, subclass and implement a .default() method with another method that returns a serializable object for o if possible, otherwise it should call the superclass implementation (to raise TypeError).
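The extension mechanism can be sketched with the standard library alone. The class below is a hypothetical illustration, not the source of NpEncoder: it converts any object that exposes a tolist() method, which NumPy arrays and NumPy scalars do, and uses array.array as a stand-in so that the example runs without NumPy installed.

```python
import json
from array import array


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialise objects that offer tolist()."""

    def default(self, o):
        # NumPy arrays and scalars provide tolist(); so does array.array.
        if hasattr(o, "tolist"):
            return o.tolist()
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(o)


print(json.dumps({"values": array("i", [1, 2, 3])}, cls=ToListEncoder))
# {"values": [1, 2, 3]}
```

Passing the class through the cls argument of json.dumps keeps call sites unchanged apart from that one parameter.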

Bidirectional dict structure


bidict

 bidict (*args, **kwargs)

Here is a class for a bidirectional dict, inspired by Finding key from value in Python dictionary and modified to allow the following 2) and 3).

Note that:

  1. The inverse dictionary bd.inverse auto-updates itself when the standard dict bd is modified.
  2. The inverse dictionary entry bd.inverse[value] is always a list of keys such that value in bd[key] for each key.
  3. Unlike the bidict module from https://pypi.python.org/pypi/bidict, here two keys can have the same value, which is very important.
  4. After modification, values in the “forward” (not inverted) dict can be lists (or, in theory, any iterables, but only lists were tested).

To implement 4), a new method add was introduced. If d[key].append(value) is attempted, the link between the main and inverse dicts will be broken. Method add can accept both

Credit: Implemented as an answer to https://stackoverflow.com/questions/3318625/how-to-implement-an-efficient-bidirectional-hash-table by Basj (https://stackoverflow.com/users/1422096/basj).
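The core idea behind points 1 to 3 can be sketched as below. This is a simplified illustration with scalar values only; it does not cover the list-valued extension or the add method, and SimpleBidict is a hypothetical name rather than the class documented here.

```python
class SimpleBidict(dict):
    """Dict that maintains `inverse`, mapping each value to a list of its keys."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.inverse = {}
        for key, value in self.items():
            self.inverse.setdefault(value, []).append(key)

    def _unlink(self, key):
        # Remove key from the inverse entry of its current value.
        value = self[key]
        keys = self.inverse[value]
        keys.remove(key)
        if not keys:
            del self.inverse[value]

    def __setitem__(self, key, value):
        if key in self:
            self._unlink(key)
        super().__setitem__(key, value)
        self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self._unlink(key)
        super().__delitem__(key)


bd = SimpleBidict({"a": 1, "b": 1})
print(bd.inverse)  # {1: ['a', 'b']}
bd["a"] = 2
print(bd.inverse)  # {1: ['b'], 2: ['a']}
```

Only item assignment and deletion keep the inverse in sync in this sketch; dict methods such as update() or pop() bypass the overridden methods and would need the same treatment.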

Redis utility

DB cleaning and maintenance


resetDB

 resetDB (redisServer='redis', port=6379)

Resets the whole database. Be careful: it is impossible to restore the DB once it has been flushed.

Functions implementing secondary interval set in Redis database


iset_add

 iset_add (r, name, intervalMapping)

Adds members with intervals to an interval set. If the interval set does not exist, it will be created. Internally, two Redis Sorted Sets are created, one for the starts and one for the ends of the intervals. The other iset_ functions know how to work with them.

r: Redis object. Redis client.

name: string. Name of the interval set.

intervalMapping: dict. Dictionary with names of intervals as keys and tuples with the start and end of each interval as values.

Returns the number of added intervals. Internally, an equal number of elements is added to the two sorted sets; if the numbers of added elements are not equal, DataError is raised.
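The two-sorted-set layout can be sketched as follows. The sketch relies on the redis-py call zadd(name, mapping), which returns the number of newly added elements. The key suffixes ":start" and ":end", the function name iset_add_sketch, and the use of ValueError in place of DataError are assumptions made for illustration.

```python
def iset_add_sketch(r, name, interval_mapping):
    """Store intervals as two sorted sets scored by start and by end.

    r is expected to behave like a redis.Redis client.
    """
    starts = {member: start for member, (start, _end) in interval_mapping.items()}
    ends = {member: end for member, (_start, end) in interval_mapping.items()}
    # Key names are hypothetical; the real module may name its sets differently.
    added_starts = r.zadd(f"{name}:start", starts)
    added_ends = r.zadd(f"{name}:end", ends)
    if added_starts != added_ends:
        # The documented function raises DataError; ValueError keeps this
        # sketch free of a redis import.
        raise ValueError("start and end sorted sets are out of sync")
    return added_starts
```

A call such as iset_add_sketch(r, "genes", {"geneA": (100, 250)}) would then create or extend both sorted sets in one step.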


iset_get

 iset_get (r, name, member=None)

Returns either the whole interval set or specific member(s) with their intervals.

r: Redis object. Redis client.

name: string. Name of the interval set.

member: string, list, tuple or None. If None, the function returns all members with their respective intervals. If a string, it returns a single member with its interval. If a list or tuple, it returns all requested members with their respective intervals.

Returns a dictionary with member names as keys and tuples with interval starts and ends as values. For member names not found in the interval set, the value for the given key will be the tuple (None, None).


iset_score

 iset_score (r, name, start, end=None)

Returns all member names whose interval contains a given value or intersects with the given interval.

r: Redis object. Redis client.

name: string. Name of the interval set.

start: int. Query value or the start of the query interval.

end: int or None. If None, start is treated as a single query value. If int, then start is the start of the query interval and end is the end of the query interval.

Returns a list of members whose intervals either contain the query value or intersect with the query interval.
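The query logic can be illustrated without a Redis server by modelling the two sorted sets as plain dictionaries. The sketch assumes closed intervals (both endpoints included), which this documentation does not confirm, and iset_score_sketch is a hypothetical name.

```python
def iset_score_sketch(starts, ends, start, end=None):
    """Return members whose interval overlaps the query value or interval.

    starts and ends map member -> score and stand in for the two sorted sets.
    """
    if end is None:
        end = start  # a single value is treated as a zero-length interval
    # Members whose interval begins at or before the end of the query.
    began = {m for m, s in starts.items() if s <= end}
    # Members whose interval finishes at or after the start of the query.
    not_finished = {m for m, e in ends.items() if e >= start}
    # An interval overlaps the query exactly when both conditions hold.
    return sorted(began & not_finished)


starts = {"a": 1, "b": 3, "c": 10}
ends = {"a": 5, "b": 9, "c": 12}
print(iset_score_sketch(starts, ends, 4))      # ['a', 'b']
print(iset_score_sketch(starts, ends, 6, 10))  # ['b', 'c']
```

In Redis terms, each of the two set comprehensions corresponds to a score-range query on one sorted set, and the result is their intersection.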


iset_not_score

 iset_not_score (r, name, start, end=None)

Returns all intervals (member names only) that do not contain the query value or do not intersect with the query interval. This is the inversion of the iset_score() function.

r: Redis object. Redis client.

name: string. Name of the interval set.

start: int. Query value or the start of the query interval.

end: int or None. If None, start is treated as a single query value. If int, then start is the start of the query interval and end is the end of the query interval.

Returns a list of members whose intervals either do not contain the query value or do not intersect with the query interval.


iset_del

 iset_del (r, name, member=None)

Removes either the whole interval set or specific member(s) with their intervals.

r: Redis object. Redis client.

name: string. Name of the interval set.

member: string, list, tuple or None. If None, all members with their respective intervals are removed. If a string, a single member with its interval is removed. If a list or tuple, all requested members with their respective intervals are removed.

Returns the number of removed intervals. Internally, an equal number of elements is removed from the two sorted sets; if the numbers of removed elements are not equal, DataError is raised.