edgePy.data_import.mongodb package
Submodules
edgePy.data_import.mongodb.gene_functions module
The core Python code for generating data.
-
edgePy.data_import.mongodb.gene_functions.get_canonical_raw(result: Dict[str, Any]) → Optional[int]
An approximation of the raw count of reads.
Parameters: result – the entry from the data collection
Returns: the raw count (as an integer)
-
edgePy.data_import.mongodb.gene_functions.get_canonical_rpkm(result: Dict[str, Any]) → Optional[int]
Get the rpkm from the database for a given entry in the data collection.
Parameters: result – the entry in the data collection
Returns: the rpkm value
-
edgePy.data_import.mongodb.gene_functions.get_gene_list(mongo_reader: Any, database: str = 'ensembl_90_37') → Dict[str, str]
Get the list of genes from the mongo database, used to translate ENSG IDs to symbols.
Parameters: - mongo_reader – the mongo wrapper
- database – database name to use.
-
edgePy.data_import.mongodb.gene_functions.get_genelist_from_file(filename: str) → Optional[List]
Converts a gene list file into a list of genes. A simple function, but it can be expanded if needed.
Parameters: filename – gene list file name.
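The file format is not described here, so the following is only a minimal sketch of the idea, assuming one gene name per line. The helper name `read_gene_list` and the choice to return `None` for a missing file are assumptions made for illustration, not the package's implementation.

```python
import os
import tempfile
from typing import List, Optional


def read_gene_list(filename: str) -> Optional[List[str]]:
    """Hypothetical helper: read one gene name per line, skipping blank lines."""
    if not os.path.isfile(filename):
        return None  # assumed behaviour for a missing file
    with open(filename) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small example file, read it back, then remove it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("BRCA1\nTP53\n\nEGFR\n")
    path = tmp.name

genes = read_gene_list(path)
os.remove(path)
missing = read_gene_list(path)  # the file no longer exists
```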
-
edgePy.data_import.mongodb.gene_functions.get_sample_details(group_by: str, mongo_reader: Any, database: str) → Dict[Any, Dict[str, Any]]
Get details from the samples collection. Use this to decide which samples to query data for.
Parameters: - group_by – the name of the key to group samples by (Category-based key)
- mongo_reader – the mongo wrapper
- database – the database to use
Returns: details required for each sample available.
-
edgePy.data_import.mongodb.gene_functions.translate_genes(genes: Optional[List[str]], mongo_reader: Any, database: str = 'ensembl_90_37') → Tuple[List[str], Dict[str, str]]
Translates a list of genes into ENSG IDs and vice versa.
Parameters: - genes – list of genes to filter on.
- mongo_reader – the mongo connector
- database – the name of the database to use; “pytest” for unit testing (mocking)
Returns: a list of ENSG IDs and a dictionary of gene symbols
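To make the two-way translation concrete, here is a small sketch that works on an in-memory mapping instead of the database. The function name `translate`, the choice to skip unknown names, and the direction of the returned dictionary (ENSG ID to symbol) are all assumptions; the example IDs are illustrative values only.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in for the ENSG-to-symbol mapping that would come from the database.
MAPPING: Dict[str, str] = {
    "ENSG00000012048": "BRCA1",
    "ENSG00000141510": "TP53",
    "ENSG00000146648": "EGFR",
}


def translate(
    genes: Optional[List[str]], ensg_to_symbol: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: accept symbols or ENSG IDs, return IDs plus a lookup."""
    symbol_to_ensg = {symbol: ensg for ensg, symbol in ensg_to_symbol.items()}
    if genes is None:
        # No filter given: return everything (assumed behaviour).
        return list(ensg_to_symbol), dict(ensg_to_symbol)
    ensg_ids: List[str] = []
    for gene in genes:
        if gene in ensg_to_symbol:  # already an ENSG ID
            ensg_ids.append(gene)
        elif gene in symbol_to_ensg:  # a gene symbol
            ensg_ids.append(symbol_to_ensg[gene])
        # Names found in neither direction are skipped in this sketch.
    return ensg_ids, {ensg: ensg_to_symbol[ensg] for ensg in ensg_ids}


ids, symbols = translate(["TP53", "ENSG00000012048", "NOT_A_GENE"], MAPPING)
```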
edgePy.data_import.mongodb.mongo_import module
-
class edgePy.data_import.mongodb.mongo_import.ImportFromMongodb(host: str, port: int, mongo_key: Optional[str], mongo_value: Union[str, List, None], gene_list_file: Optional[str])
Bases: object
A utility for importing data from a proprietary mongodb database. Hopefully we’ll open this database up in the future; if not, we can re-engineer it from the examples given.
Parameters: - host – the name of the machine hosting the database
- port – the port number (usually 27017)
- mongo_key – a key in the samples collection to filter on
- mongo_value – accepted values in the samples collection to
- gene_list_file – a list of genes to filter the results on.
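How `mongo_key` and `mongo_value` are turned into a query is not shown here. One plausible reading, sketched below, is an exact match for a single value and MongoDB's `$in` operator for a list of accepted values. The helper `build_sample_query` and the key name `Project` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Union


def build_sample_query(
    mongo_key: Optional[str], mongo_value: Union[str, List, None]
) -> Dict[str, Any]:
    """Hypothetical helper: build a filter for the samples collection."""
    if mongo_key is None or mongo_value is None:
        return {}  # no filter: match every sample (assumed behaviour)
    if isinstance(mongo_value, list):
        # $in matches documents whose field equals any value in the list.
        return {mongo_key: {"$in": mongo_value}}
    return {mongo_key: mongo_value}


single = build_sample_query("Project", "A")
several = build_sample_query("Project", ["A", "B"])
unfiltered = build_sample_query(None, None)
```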
-
get_data_from_mongo(database: str, rpkm_flag: bool = False) → Tuple[List[str], Dict[collections.abc.Hashable, Any], List[str], Dict[collections.abc.Hashable, Any]]
Run the queries to get the samples from mongo, and then use that data to retrieve the counts.
Parameters: - database – name of the database to retrieve data from.
- rpkm_flag – if True, use the rpkm values from mongodb instead of the raw counts
Returns: the list of samples, the data itself, the gene list and the categories of the samples.
-
edgePy.data_import.mongodb.mongo_import.parse_arguments(parser: Any = None, ci_values: List[str] = None) → Any
Standard argparse wrapper for interpreting command line arguments.
Parameters: - parser – if there’s an existing parser, provide it; otherwise, this will create a new one.
- ci_values – use for testing purposes only.
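The pattern described by these two parameters can be sketched with the standard library alone. The function name `parse_arguments_sketch` and the `--host` and `--port` flags below are placeholders chosen for illustration; the real command line options are not listed here.

```python
import argparse
from typing import Any, List, Optional


def parse_arguments_sketch(
    parser: Optional[argparse.ArgumentParser] = None,
    ci_values: Optional[List[str]] = None,
) -> Any:
    """Reuse an existing parser if one is given, otherwise create a new one."""
    if parser is None:
        parser = argparse.ArgumentParser()
    # Hypothetical flags, for illustration only.
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=27017)
    # parse_args(None) reads sys.argv; passing a list bypasses the command
    # line, which is what makes ci_values useful in tests.
    return parser.parse_args(ci_values)


args = parse_arguments_sketch(ci_values=["--host", "db.example.org", "--port", "27018"])
```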
edgePy.data_import.mongodb.mongo_wrapper module
A simple library for wrapping around mongo collections and access issues.
-
class edgePy.data_import.mongodb.mongo_wrapper.MongoInserter(host: str, port: int, database: str, collection: str, connect: bool = True)
Bases: edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper
This class is a thin layer on the MongoWrapper class, which is itself a thin layer on the pymongo library. Use it when you want to insert data into a mongodb collection. It creates a buffer which is periodically flushed to Mongo.
Parameters: - host – the name of the machine hosting the database
- port – the port number (usually 27017)
- database – db name
- collection – collection name
- connect – whether to create a new session or attach to an existing one; set to false if this is being instantiated by a subprocess.
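The buffering behaviour can be illustrated without a running database. The sketch below is not the MongoInserter implementation: `BufferedInserter`, its `buffer_size` argument, and the `FakeCollection` stand-in are hypothetical. Only `insert_many` mirrors a real pymongo collection method.

```python
from typing import Any, Dict, List


class FakeCollection:
    """Stand-in for a pymongo collection; records each insert_many call."""

    def __init__(self) -> None:
        self.batches: List[List[Dict[str, Any]]] = []

    def insert_many(self, documents: List[Dict[str, Any]]) -> None:
        self.batches.append(list(documents))


class BufferedInserter:
    """Hypothetical sketch: collect documents and write them in batches."""

    def __init__(self, collection: Any, buffer_size: int = 3) -> None:
        self.collection = collection
        self.buffer_size = buffer_size
        self._buffer: List[Dict[str, Any]] = []

    def add(self, document: Dict[str, Any]) -> None:
        self._buffer.append(document)
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.collection.insert_many(self._buffer)
            self._buffer = []


collection = FakeCollection()
inserter = BufferedInserter(collection, buffer_size=3)
for i in range(7):
    inserter.add({"_id": i})
inserter.flush()  # write whatever is left in the buffer
batch_sizes = [len(batch) for batch in collection.batches]
```

A final explicit flush matters with any design like this one, because a partially filled buffer would otherwise never be written.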
-
class edgePy.data_import.mongodb.mongo_wrapper.MongoUpdater(host: str, port: int, database: str, collection: str, connect: bool = True)
Bases: edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper
This class is a thin layer on the MongoWrapper class, which is itself a thin layer on the pymongo library. Use it when you want to update data in a mongodb collection. It creates a buffer which is periodically flushed and written to Mongo.
Parameters: - host – the name of the machine hosting the database
- port – the port number (usually 27017)
- database – db name
- collection – collection name
- connect – whether to create a new session or attach to an existing one; set to false if this is being instantiated by a subprocess.
-
add(updatedict: Dict[collections.abc.Hashable, Any], setdict: Dict[collections.abc.Hashable, Any]) → None
Add a record to the buffer.
Parameters: - updatedict – the criteria for the update query
- setdict – the dictionary describing the new record - OR use {$set: {}} to update a particular key without replacing the existing record.
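The difference between passing a full replacement record and passing `{$set: {}}` is easy to get wrong, so here is a plain-Python imitation of how one matched record would change. `apply_update` is a hypothetical helper that mimics MongoDB's documented semantics on a dictionary; it makes no database calls and is not the MongoUpdater code.

```python
from typing import Any, Dict


def apply_update(record: Dict[str, Any], setdict: Dict[str, Any]) -> Dict[str, Any]:
    """Imitate the effect of an update on a single matched record."""
    if "$set" in setdict:
        # $set changes only the named keys and keeps the rest of the record.
        updated = dict(record)
        updated.update(setdict["$set"])
        return updated
    # A plain dictionary replaces the record; MongoDB keeps the original _id.
    replaced = dict(setdict)
    if "_id" in record:
        replaced.setdefault("_id", record["_id"])
    return replaced


record = {"_id": 1, "name": "sample_a", "count": 10}
with_set = apply_update(record, {"$set": {"count": 12}})
replaced = apply_update(record, {"count": 12})
```

With `$set` the `name` field survives; with a plain dictionary it is lost.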
-
class edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper(host: str, port: Union[str, int] = 27017, connect: bool = True, verbose: bool = False)
Bases: object
This class is a thin layer for interacting with the Mongo database using pymongo. Pymongo is an entirely reasonable way of working with Mongodb, but fails to provide some very common functions that are frequently used.
This class should be used for efficient retrieval of information from the database.
Parameters: - host – the name of the machine hosting the database
- port – the port number (usually 27017)
- connect – whether to create a new session or attach to an existing one; set to false if this is being instantiated by a subprocess.
- verbose – when set to false, suppresses output.
-
create_index(database: str, collection: str, key: str) → None
A tool for creating indexes on a given collection.
Parameters: - database – db name
- collection – collection name
- key – the field name to create the index on.
-
find_as_cursor(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, projection: Dict[collections.abc.Hashable, Any] = None) → Iterable
Do a find operation on a mongo collection and return the data as a cursor (the native MongoClient find return type).
Parameters: - database – db name
- collection – collection name
- query – a dictionary providing the criteria for the find command
- projection – a dictionary that gives the projection - the fields to return.
Returns: a cursor object, to be used as an iterator.
-
find_as_dict(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, field: str = '_id', projection: Dict[collections.abc.Hashable, Any] = None) → Iterable
Do a find operation on a mongo collection, but return the data as a dictionary.
Parameters: - database – db name
- collection – collection name
- query – a dictionary providing the criteria for the find command
- projection – a dictionary that gives the projection - the fields to return.
- field – the field in the projection for which the value will be used as the Hashable key of the dict.
Returns: a dictionary representation of the returned data.
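The role of `field` can be shown on a plain list of documents standing in for a cursor. This is a sketch of the idea only: `docs_to_dict` is a hypothetical helper, and it assumes that every document contains the chosen field and that later duplicates overwrite earlier ones.

```python
from typing import Any, Dict, Hashable, Iterable


def docs_to_dict(
    documents: Iterable[Dict[str, Any]], field: str = "_id"
) -> Dict[Hashable, Dict[str, Any]]:
    """Key each document by the value of one of its fields."""
    return {doc[field]: doc for doc in documents}


# A list of dictionaries stands in for the cursor returned by a find.
found = [
    {"_id": 1, "sample": "s1", "group": "control"},
    {"_id": 2, "sample": "s2", "group": "treated"},
]
by_id = docs_to_dict(found)
by_sample = docs_to_dict(found, field="sample")
```

If a projection is used, it must include the field named in `field`, otherwise the lookup in this sketch raises `KeyError`.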
-
find_as_list(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, projection: Dict[collections.abc.Hashable, Any] = None) → Iterable
Do a find operation on a mongo collection, but return the data as a list.
Parameters: - database – db name
- collection – collection name
- query – a dictionary providing the criteria for the find command
- projection – a dictionary that gives the projection - the fields to return.
Returns: a list representation of the returned data.
-
get_db(database: str, collection: str) → Any
This function simply hides the db name when using pytest-mongodb, where the database name should always be ‘pytest’.
Parameters: - database – database name
- collection – collection name
Returns: the collection object ready for use with .find() or similar.