edgePy.data_import.mongodb package

Submodules

edgePy.data_import.mongodb.gene_functions module

The core Python code for generating data.

edgePy.data_import.mongodb.gene_functions.get_canonical_raw(result: Dict[str, Any]) → Optional[int][source]

An approximation of the raw count of reads.

Parameters:result – the entry from the data collection
Returns:the raw count (as an integer)
edgePy.data_import.mongodb.gene_functions.get_canonical_rpkm(result: Dict[str, Any]) → Optional[int][source]

Get the rpkm from the database for a given entry in the data collection.

Parameters:result – the entry in the data collection
Returns:the rpkm value
edgePy.data_import.mongodb.gene_functions.get_gene_list(mongo_reader: Any, database: str = 'ensembl_90_37') → Dict[str, str][source]

Get the list of genes from the Mongo database, used to translate ENSG IDs to gene symbols.

Parameters:
  • mongo_reader – the mongo wrapper
  • database – database name to use.
edgePy.data_import.mongodb.gene_functions.get_genelist_from_file(filename: str) → Optional[List][source]

Converts a genelist file into a list of genes. Simple function, but can be expanded if needed.

Parameters:filename – gene list file name.
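
The file format is not specified here, so the following is only a minimal sketch of what such a reader might look like, assuming a plain-text file with one gene identifier per line. The name read_gene_list and the choice to return None for an empty file are hypothetical and are not taken from edgePy.

```python
from typing import List, Optional


def read_gene_list(filename: str) -> Optional[List[str]]:
    """Hypothetical stand-in: read one gene identifier per line.

    Assumes a plain-text file; blank lines are skipped and surrounding
    whitespace is stripped. Returns None when no genes are found.
    """
    genes: List[str] = []
    with open(filename) as handle:
        for line in handle:
            gene = line.strip()
            if gene:
                genes.append(gene)
    return genes or None
```

Returning None for an empty file mirrors the Optional[List] return annotation, but the real function may handle that case differently.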

edgePy.data_import.mongodb.gene_functions.get_sample_details(group_by: str, mongo_reader: Any, database: str) → Dict[Any, Dict[str, Any]][source]

Get details from the samples collection. Use this to decide which samples to query data for.

Parameters:
  • group_by – the name of the key to group samples by (Category-based key)
  • mongo_reader – the mongo wrapper
  • database – the database to use
Returns:

details required for each sample available.

edgePy.data_import.mongodb.gene_functions.translate_genes(genes: Optional[List[str]], mongo_reader: Any, database: str = 'ensembl_90_37') → Tuple[List[str], Dict[str, str]][source]

Translate a list of genes into ENSG symbols and vice versa.

Parameters:
  • genes – list of genes to filter on.
  • mongo_reader – the mongo connector
  • database – the name of the database to use. Use “pytest” for unit testing (mocking).
Returns:

a list of ENSG symbols and a dictionary of gene symbols
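
As an illustration of the translation step, the sketch below works on an in-memory mapping instead of a Mongo database. The function name translate_symbols, the way unknown genes are dropped, and the behaviour for a missing gene list are all assumptions made for this example; the two ENSG/symbol pairs are only sample data.

```python
from typing import Dict, List, Optional, Tuple


def translate_symbols(
    genes: Optional[List[str]], ensg_to_symbol: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical sketch: map gene symbols or ENSG IDs to ENSG IDs.

    Returns the matching ENSG IDs and an ENSG-to-symbol dictionary
    restricted to those IDs. Unknown genes are silently dropped.
    """
    # Build the reverse lookup so symbols can be translated to ENSG IDs.
    symbol_to_ensg = {symbol: ensg for ensg, symbol in ensg_to_symbol.items()}

    ensg_ids: List[str] = []
    for gene in genes or []:
        if gene in ensg_to_symbol:  # already an ENSG ID
            ensg_ids.append(gene)
        elif gene in symbol_to_ensg:  # a gene symbol
            ensg_ids.append(symbol_to_ensg[gene])

    return ensg_ids, {ensg: ensg_to_symbol[ensg] for ensg in ensg_ids}


# Sample data for illustration only.
mapping = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
ids, symbols = translate_symbols(["TP53", "ENSG00000012048", "UNKNOWN"], mapping)
```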

edgePy.data_import.mongodb.mongo_import module

class edgePy.data_import.mongodb.mongo_import.ImportFromMongodb(host: str, port: int, mongo_key: Optional[str], mongo_value: Union[str, List, None], gene_list_file: Optional[str])[source]

Bases: object

A utility for importing data from a proprietary MongoDB database. Hopefully we’ll open this database up in the future; if not, we can re-engineer it from the examples given.

Parameters:
  • host – the name of the machine hosting the database
  • port – the port number (usually 27017)
  • mongo_key – a key in the samples collection to filter on
  • mongo_value – accepted values in the samples collection to filter on
  • gene_list_file – a list of genes to filter the results on.
get_data_from_mongo(database: str, rpkm_flag: bool = False) → Tuple[List[str], Dict[collections.abc.Hashable, Any], List[str], Dict[collections.abc.Hashable, Any]][source]

Run the queries to get the samples from Mongo, then use that data to retrieve the counts.

Parameters:
  • database – name of the database to retrieve data from.
  • rpkm_flag – if true, take the RPKM values from MongoDB instead of the raw counts
Returns:

the list of samples, the data itself, the gene list and the categories of the samples.

translate_gene_list(database: str) → None[source]

If there was a list of genes provided, convert them to ENSG symbols.

Parameters:database – name of the database
edgePy.data_import.mongodb.mongo_import.parse_arguments(parser: Any = None, ci_values: List[str] = None) → Any[source]

Standard argparse wrapper for interpreting command line arguments.

Parameters:
  • parser – if there’s an existing parser, provide it; otherwise this will create a new one.
  • ci_values – use for testing purposes only.
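
The pattern described by these two parameters can be sketched with the standard argparse module. The function name parse_arguments_sketch and the --host and --port flags are hypothetical and chosen only for illustration; the real command-line options are not listed in this reference.

```python
import argparse
from typing import Any, List, Optional


def parse_arguments_sketch(
    parser: Optional[argparse.ArgumentParser] = None,
    ci_values: Optional[List[str]] = None,
) -> Any:
    """Hypothetical sketch of an argparse wrapper.

    Reuses an existing parser when one is supplied, otherwise creates one.
    When ci_values is given, it is parsed instead of sys.argv, which makes
    the function easy to call from tests.
    """
    if parser is None:
        parser = argparse.ArgumentParser()

    # Hypothetical flags, for illustration only.
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=27017)

    # parse_args(None) falls back to sys.argv[1:].
    return parser.parse_args(ci_values)


args = parse_arguments_sketch(ci_values=["--host", "db.example.org", "--port", "27018"])
```

Passing the argument list explicitly keeps tests independent of whatever arguments the test runner itself was started with.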

edgePy.data_import.mongodb.mongo_wrapper module

A simple library for wrapping around mongo collections and access issues.

class edgePy.data_import.mongodb.mongo_wrapper.MongoInserter(host: str, port: int, database: str, collection: str, connect: bool = True)[source]

Bases: edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper

This class is a thin layer on the MongoWrapper class, which is a thin layer on the pymongo library. It is used for instances where you want to insert data into a mongodb collection. It creates a buffer which is periodically flushed to Mongo.

Parameters:
  • host – the name of the machine hosting the database
  • port – the port number (usually 27017)
  • database – db name
  • collection – collection name
  • connect – whether to create a new session or attach to an existing session. Set to false if this is being instantiated by a subprocess.
add(record: Union[List[Any], Dict[collections.abc.Hashable, Any]]) → None[source]

Add a record to the buffer

Parameters:record – the record to add to the mongo inserter buffer
close() → None[source]

Close the MongoInserter - flush the buffer.

create_index_key(key: str) → None[source]

A tool for creating indexes on the collection.

flush() → None[source]

Flush out the buffer and write to mongo db.
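
The buffering behaviour described for this class can be sketched without a database. The class below is a hypothetical stand-in, not the edgePy implementation: the name BufferedInserter, the sink callable and the buffer_size parameter are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List


class BufferedInserter:
    """Hypothetical sketch of a write buffer that is flushed in batches."""

    def __init__(
        self,
        sink: Callable[[List[Dict[str, Any]]], None],
        buffer_size: int = 1000,
    ) -> None:
        self._sink = sink  # called with each batch of records
        self._buffer_size = buffer_size
        self._buffer: List[Dict[str, Any]] = []

    def add(self, record: Dict[str, Any]) -> None:
        """Queue a record and flush once the buffer is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._buffer_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered records to the sink and empty the buffer."""
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()

    def close(self) -> None:
        """Flush whatever is left so that no records are lost."""
        self.flush()


batches: List[List[Dict[str, Any]]] = []
inserter = BufferedInserter(batches.append, buffer_size=2)
for value in (1, 2, 3):
    inserter.add({"value": value})
inserter.close()
```

With pymongo, the sink could be the insert_many method of a collection, so that each flush becomes a single bulk write.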

class edgePy.data_import.mongodb.mongo_wrapper.MongoUpdater(host: str, port: int, database: str, collection: str, connect: bool = True)[source]

Bases: edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper

This class is a thin layer on the MongoWrapper class, which is a thin layer on the pymongo library. It is used for instances where you want to update data in a mongodb collection. It creates a buffer which is periodically flushed and written to mongo.

Parameters:
  • host – the name of the machine hosting the database
  • port – the port number (usually 27017)
  • database – db name
  • collection – collection name
  • connect – whether to create a new session or attach to an existing session. Set to false if this is being instantiated by a subprocess.
add(updatedict: Dict[collections.abc.Hashable, Any], setdict: Dict[collections.abc.Hashable, Any]) → None[source]

Add a record to the buffer

Parameters:
  • updatedict – the criteria for the update query
  • setdict – the dictionary describing the new record, or use {$set: {}} to update a particular key without replacing the existing record.
close() → None[source]

Close the MongoUpdater - flush the buffer.

flush() → None[source]

Flush out the buffer and write to mongo db.
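
The difference between passing a full replacement record and passing a {$set: {}} document as setdict can be illustrated with a small in-memory simulation. The helper apply_update is hypothetical and only mimics the MongoDB behaviour for top-level keys; it does not handle dotted field paths or other update operators.

```python
from typing import Any, Dict


def apply_update(document: Dict[str, Any], setdict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical in-memory simulation of replace versus $set."""
    if "$set" in setdict:
        # $set changes only the listed keys and keeps everything else.
        updated = dict(document)
        updated.update(setdict["$set"])
        return updated

    # A plain document replaces the whole record; the _id is preserved.
    replaced = dict(setdict)
    if "_id" in document:
        replaced["_id"] = document["_id"]
    return replaced


original = {"_id": 1, "sample": "A", "count": 10}
with_set = apply_update(original, {"$set": {"count": 12}})
replaced = apply_update(original, {"count": 12})
```

Here with_set keeps the sample key while replaced loses it, which is why {$set: {}} is the safer choice when only one key should change.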

class edgePy.data_import.mongodb.mongo_wrapper.MongoWrapper(host: str, port: Union[str, int] = 27017, connect: bool = True, verbose: bool = False)[source]

Bases: object

This class is for use as a thin layer for interacting with the Mongo database using pymongo. Pymongo is an entirely reasonable way of working with MongoDB, but fails to provide some very common functions that are frequently used.

This class should be used for efficient retrieval of information from the database.

Parameters:
  • host – the name of the machine hosting the database
  • port – the port number (usually 27017)
  • connect – whether to create a new session or attach to an existing session. Set to false if this is being instantiated by a subprocess.
  • verbose – suppresses output when set to false.
create_index(database: str, collection: str, key: str) → None[source]

A tool for creating indexes on a given collection.

Parameters:
  • database – db name
  • collection – collection name
  • key – the field name to create the index on.
find_as_cursor(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, projection: Dict[collections.abc.Hashable, Any] = None) → Iterable[source]

Do a find operation on a mongo collection and return the data as a cursor (the native MongoClient find return type).

Parameters:
  • database – db name
  • collection – collection name
  • query – a dictionary providing the criteria for the find command
  • projection – a dictionary that gives the projection - the fields to return.
Returns:

a cursor object, to be used as an iterator.

find_as_dict(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, field: str = '_id', projection: Dict[collections.abc.Hashable, Any] = None) → Iterable[source]

Do a find operation on a mongo collection, but return the data as a dictionary.

Parameters:
  • database – db name
  • collection – collection name
  • query – a dictionary providing the criteria for the find command
  • projection – a dictionary that gives the projection - the fields to return.
  • field – the field in the projection for which the value will be used as the Hashable key of the dict.
Returns:

a dictionary representation of the returned data.
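
The role of the field parameter can be sketched on a plain list of documents standing in for a cursor. The helper docs_to_dict is hypothetical; in particular, how the real method treats duplicate or missing key values is not documented here, so in this sketch later documents simply overwrite earlier ones.

```python
from typing import Any, Dict, Hashable, Iterable


def docs_to_dict(
    docs: Iterable[Dict[str, Any]], field: str = "_id"
) -> Dict[Hashable, Dict[str, Any]]:
    """Hypothetical sketch: key each document by the value of one field."""
    # Later documents overwrite earlier ones that share the same key value.
    return {doc[field]: doc for doc in docs}


# A list of dictionaries stands in for the cursor returned by a find.
documents = [
    {"_id": 1, "sample": "A", "count": 10},
    {"_id": 2, "sample": "B", "count": 7},
]
by_id = docs_to_dict(documents)
by_sample = docs_to_dict(documents, field="sample")
```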

find_as_list(database: str, collection: str, query: Dict[collections.abc.Hashable, Any] = None, projection: Dict[collections.abc.Hashable, Any] = None) → Iterable[source]

Do a find operation on a mongo collection, but return the data as a list.

Parameters:
  • database – db name
  • collection – collection name
  • query – a dictionary providing the criteria for the find command
  • projection – a dictionary that gives the projection - the fields to return.
Returns:

a list representation of the returned data.

get_db(database: str, collection: str) → Any[source]

This function simply hides the db name when using pytest-mongodb, where the database name should always be ‘pytest’.

Parameters:
  • database – database name
  • collection – collection name
Returns:

the collection object ready for use with .find() or similar.

insert(database: str, collection: str, data_list: List[Any]) → None[source]

Bulk insert of items into a mongodb collection.

Parameters:
  • database – db name
  • collection – collection name
  • data_list – a list of documents to insert into mongodb.

Module contents