Utils module
Sequence operations
Sequence inversion
generateComplementDict
generateComplementDict (seqType='DNA', isDict=True)
The function generateComplementDict generates a dictionary for complementing a DNA sequence. It can also be applied to RNA to identify inverted sequences.
seqType: str. Can be either ‘DNA’ or ‘RNA’ at the moment. If ‘DNA’, the complements of the four known nucleotides (A, C, G, T) are provided. All other letters (B, D, H, U, N and any others) are translated to N.
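The mapping described above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: the names `make_complement_dict` and `complement` are hypothetical, and the RNA table (A, C, G, U) is an assumption by analogy with the DNA case, which is the only one the description spells out.

```python
def make_complement_dict(seq_type="DNA"):
    """Hypothetical stand-in for generateComplementDict (not the library code)."""
    if seq_type == "DNA":
        return {"A": "T", "C": "G", "G": "C", "T": "A"}
    if seq_type == "RNA":
        # Assumption: RNA is handled analogously, with U in place of T.
        return {"A": "U", "C": "G", "G": "C", "U": "A"}
    raise ValueError(f"Unsupported sequence type: {seq_type!r}")


def complement(seq, table):
    """Complement each letter; anything outside the table becomes 'N'."""
    return "".join(table.get(base, "N") for base in seq)


dna_table = make_complement_dict("DNA")
print(complement("ACGTUB", dna_table))  # TGCANN
```

Using `dict.get` with a default of `"N"` keeps the table limited to the four known nucleotides while still covering every other letter.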
complementSequence
complementSequence (seq, complementDict='DNA')
reverseSequence
reverseSequence (seq)
inverseSequence
inverseSequence (seq, complementDict='DNA')
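The three signatures above carry no descriptions, so the following is only a guess at how they relate, based on their names: reversing flips the order of letters, and inverting is assumed to mean the reverse complement. The function names and the inline DNA table are hypothetical.

```python
# Assumption: "inverse" means reverse complement; this is inferred from the
# function names only, not from documented behaviour.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_sketch(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def inverse_sketch(seq, table=DNA_COMPLEMENT):
    """Complement every letter while walking the sequence backwards."""
    return "".join(table.get(base, "N") for base in reversed(seq))


print(reverse_sketch("AACG"))  # GCAA
print(inverse_sketch("AACG"))  # CGTT
```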
Other file operations
checkNodeLengthsFile
checkNodeLengthsFile (GFAPath)
Path files operations
sortAccessions
sortAccessions (sort, _paths)
pathFileToPathDict
pathFileToPathDict (filePath, directional=True, sort=True, v2=True)
Reads a path file (ASCII file) and translates it into a path dictionary for the GenGraph class constructor.
The path file has one path per line in the following format:
Export parameters processing and validating
pathConvert
pathConvert (inputPath, suffix='')
checkZoomLevels
checkZoomLevels (zoomLevels)
Checks that each zoom level is a factor of the next one.
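The divisibility rule can be illustrated as follows. This is a sketch under assumptions: the function name is hypothetical, the levels are sorted before checking, and a boolean is returned, none of which is confirmed for checkZoomLevels itself.

```python
def zoom_levels_are_nested(zoom_levels):
    """Return True if every level divides the next larger level evenly."""
    levels = sorted(zoom_levels)  # assumption: order of the input is irrelevant
    return all(nxt % prev == 0 for prev, nxt in zip(levels, levels[1:]))


print(zoom_levels_are_nested([1, 2, 4, 8]))  # True
print(zoom_levels_are_nested([1, 2, 3]))     # False, 3 is not a multiple of 2
```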
adjustZoomLevels
adjustZoomLevels (zoomLevels)
If there is no zoom level 1, adds it to the list.
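A minimal sketch of this adjustment, assuming the missing level is placed at the front of a new list and the input is left untouched (the name `ensure_level_one` is hypothetical):

```python
def ensure_level_one(zoom_levels):
    """Return a copy of the levels that is guaranteed to contain level 1."""
    levels = list(zoom_levels)
    if 1 not in levels:
        levels.insert(0, 1)  # assumption: level 1 goes first
    return levels


print(ensure_level_one([2, 4]))  # [1, 2, 4]
print(ensure_level_one([1, 2]))  # [1, 2]
```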
Utility classes
Numpy to JSON encoder
NpEncoder
NpEncoder (skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Extensible JSON https://json.org encoder for Python data structures.
Supports the following objects and types by default:
| Python | JSON |
|---|---|
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |
To extend this to recognize other objects, subclass and implement a .default() method with another method that returns a serializable object for o if possible, otherwise it should call the superclass implementation (to raise TypeError).
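The extension mechanism described above can be sketched as follows. This is not the NpEncoder source: the class name `ToListEncoder` is hypothetical, and instead of importing NumPy it relies on duck typing, converting any object that exposes a `tolist()` method (NumPy arrays and scalars both do). The demo uses the standard-library `array.array`, which has the same method, so the example runs without NumPy installed.

```python
import json
from array import array


class ToListEncoder(json.JSONEncoder):
    """Serialize objects that offer tolist(); defer everything else."""

    def default(self, o):
        to_list = getattr(o, "tolist", None)
        if callable(to_list):
            return to_list()
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


payload = {"counts": array("i", [1, 2, 3])}
encoded = json.dumps(payload, cls=ToListEncoder)
print(encoded)  # {"counts": [1, 2, 3]}
```

Passing the class through the `cls` argument of `json.dumps` is enough; the encoder's `default()` is only consulted for objects the standard encoder cannot handle.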
Bidirectional dict structure
bidict
bidict (*args, **kwargs)
Here is a class for a bidirectional dict, inspired by Finding key from value in Python dictionary and modified to allow the following 2) and 3).
Note that:
1) The inverse dictionary bd.inverse auto-updates itself when the standard dict bd is modified.
2) The inverse dictionary entry bd.inverse[value] is always a list of keys such that value in bd[key] for each key.
3) Unlike the bidict module from https://pypi.python.org/pypi/bidict, here two keys can have the same value, which is very important.
4) After modification, values in the “forward” (not inverse) dict can be lists (or, in theory, any iterables, but only lists were tested).
For implementing 4), a new method add was introduced. If d[key].append(value) is attempted, the link between the main and inverse dicts will be broken. Method add can accept both
Credit: Implemented as an answer to https://stackoverflow.com/questions/3318625/how-to-implement-an-efficient-bidirectional-hash-table by Basj (https://stackoverflow.com/users/1422096/basj).
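The behaviour described in this section can be sketched with a small dict subclass. It is a simplified illustration written for this page, not the module's bidict: the class name `ListBidict` is hypothetical, forward values are assumed to always be lists of hashable items, and only item assignment, deletion, and `add` keep the inverse in sync (`dict.update` and `setdefault` are not covered).

```python
class ListBidict(dict):
    """Forward dict of key -> list of values with a maintained inverse."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.inverse = {}
        for key, values in dict(*args, **kwargs).items():
            self[key] = values  # route through __setitem__ to fill inverse

    def __setitem__(self, key, values):
        if key in self:
            self._unlink(key)
        values = list(values)
        super().__setitem__(key, values)
        for value in values:
            self.inverse.setdefault(value, []).append(key)

    def __delitem__(self, key):
        self._unlink(key)
        super().__delitem__(key)

    def add(self, key, value):
        """Append a value to key's list and record it in the inverse."""
        if key not in self:
            super().__setitem__(key, [])
        super().__getitem__(key).append(value)
        self.inverse.setdefault(value, []).append(key)

    def _unlink(self, key):
        """Remove key from every inverse entry it currently appears in."""
        for value in self[key]:
            keys = self.inverse[value]
            keys.remove(key)
            if not keys:
                del self.inverse[value]


bd = ListBidict({"a": [1, 2], "b": [2]})
print(bd.inverse)  # {1: ['a'], 2: ['a', 'b']}
bd.add("b", 3)
print(bd.inverse[3])  # ['b']
```

Calling `bd["b"].append(3)` directly would change the forward list without touching `bd.inverse`, which is why a dedicated `add` method is needed.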
Redis utility
DB cleaning and maintenance
resetDB
resetDB (redisServer='redis', port=6379)
Resets the whole database. Be careful: it is impossible to restore the DB once it has been flushed.
Functions implementing secondary interval set in Redis database
iset_add
iset_add (r, name, intervalMapping)
Adds members with intervals to an interval set. If the interval set does not exist, it will be created. Internally, two Redis Sorted Sets are created, one for the starts and one for the ends of the intervals. The other iset_ functions know how to work with them.
- r: Redis object. Redis client.
- name: string. Name of the interval set.
- intervalMapping: dict. Dictionary with names of intervals as keys and tuples with the start and end of each interval as values.
Returns the number of added intervals. Internally, an equal number of elements is added to the two sorted sets; if the numbers of added elements are not equal, DataError is raised.
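The two-sorted-set idea can be sketched as below. Several things here are assumptions made for illustration: the key names `<name>:start` and `<name>:end`, the function name `iset_add_sketch`, and the use of `ValueError` where the description mentions DataError. The `InMemorySortedSets` class is a tiny stand-in that mimics only the `zadd(name, mapping)` call of a Redis client, so the example runs without a server.

```python
class InMemorySortedSets:
    """Demo-only stand-in for the sorted-set part of a Redis client."""

    def __init__(self):
        self.sets = {}

    def zadd(self, name, mapping):
        """Store member -> score pairs; return how many members were new."""
        zset = self.sets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added


def iset_add_sketch(r, name, interval_mapping):
    """Split each (start, end) tuple across two sorted sets."""
    starts = {m: start for m, (start, end) in interval_mapping.items()}
    ends = {m: end for m, (start, end) in interval_mapping.items()}
    added_starts = r.zadd(f"{name}:start", starts)  # hypothetical key name
    added_ends = r.zadd(f"{name}:end", ends)        # hypothetical key name
    if added_starts != added_ends:
        raise ValueError("start and end sorted sets are out of sync")
    return added_starts


client = InMemorySortedSets()
print(iset_add_sketch(client, "genes", {"g1": (10, 20), "g2": (15, 30)}))  # 2
```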
iset_get
iset_get (r, name, member=None)
Returns either the whole interval set or specific member(s) with their intervals.
- r: Redis object. Redis client.
- name: string. Name of the interval set.
- member: string, list, tuple or None. If None, the function returns all members with their respective intervals. If string, it returns a single member with its interval; if list or tuple, it returns all requested members with their respective intervals.
Returns a dictionary with member names as keys and tuples with interval starts and ends as values. For member names not found in the interval set, the value for the given key will be the tuple (None, None).
iset_score
iset_score (r, name, start, end=None)
Returns all member names whose interval contains a given value or intersects with the given interval.
- r: Redis object. Redis client.
- name: string. Name of the interval set.
- start: int. Query value or the start of the query interval.
- end: int or None. If None, start is treated as a single query value. If int, then start is the start of the query interval and end is the end of the query interval.
Returns a list of members whose intervals either contain the query value or intersect with the query interval.
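The query logic can be illustrated in memory, with plain dicts standing in for the two sorted sets. This is a sketch under assumptions: intervals are treated as closed on both sides, the function name is hypothetical, and results are sorted for readability. A member matches when its start is not after the query end and its end is not before the query start. Against Redis, the two candidate sets could plausibly come from score-range queries on the start and end sorted sets.

```python
def iset_score_sketch(starts, ends, start, end=None):
    """Return members whose [start, end] interval overlaps the query."""
    if end is None:
        end = start  # a single value is a zero-length query interval
    began_in_time = {m for m, s in starts.items() if s <= end}
    ended_late_enough = {m for m, e in ends.items() if e >= start}
    return sorted(began_in_time & ended_late_enough)


starts = {"g1": 10, "g2": 15, "g3": 40}
ends = {"g1": 20, "g2": 30, "g3": 50}
print(iset_score_sketch(starts, ends, 12))      # ['g1']
print(iset_score_sketch(starts, ends, 18, 45))  # ['g1', 'g2', 'g3']
```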
iset_not_score
iset_not_score (r, name, start, end=None)
Returns all intervals (member names only) that do not contain the query value or do not intersect with the query interval. Inversion of the iset_score() function.
- r: Redis object. Redis client.
- name: string. Name of the interval set.
- start: int. Query value or the start of the query interval.
- end: int or None. If None, start is treated as a single query value. If int, then start is the start of the query interval and end is the end of the query interval.
Returns a list of members whose intervals either do not contain the query value or do not intersect with the query interval.
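The inverted condition follows from negating the overlap test: an interval misses the query when it starts after the query end or ends before the query start. The sketch below uses plain dicts in place of the two sorted sets; the function name and the closed-interval convention are assumptions.

```python
def iset_not_score_sketch(starts, ends, start, end=None):
    """Return members whose interval does not overlap the query."""
    if end is None:
        end = start  # a single value is a zero-length query interval
    return sorted(m for m in starts if starts[m] > end or ends[m] < start)


starts = {"g1": 10, "g2": 15, "g3": 40}
ends = {"g1": 20, "g2": 30, "g3": 50}
print(iset_not_score_sketch(starts, ends, 12))  # ['g2', 'g3']
```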
iset_del
iset_del (r, name, member=None)
Removes either the whole interval set or specific member(s) with their intervals.
- r: Redis object. Redis client.
- name: string. Name of the interval set.
- member: string, list, tuple or None. If None, the whole interval set is removed. If string, a single member with its interval is removed; if list or tuple, all requested members with their respective intervals are removed.
Returns the number of removed intervals. Internally, an equal number of elements is removed from the two sorted sets; if the numbers of removed elements are not equal, DataError is raised.