ramjet.py_mapper

Utilities for TensorFlow’s Dataset class that allow CPU map functions to run with multiprocessing.

Module Contents

class PyMapper(map_function: Callable, number_of_parallel_calls: int)

A class for mapping a py_function over a TensorFlow dataset in parallel on the CPU.

__init__(self, map_function: Callable, number_of_parallel_calls: int)

static pool_worker_initializer()

Used to initialize each worker process.

send_to_map_pool(self, *example_elements)

Sends the tensor element to the pool for processing.

Parameters: example_elements – The elements to be processed by the pool, that is, the contents of a single example in the dataset. Often this is a single element.
Returns: The output of the map function on the element.
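
The hand-off described above can be sketched with the standard library alone. This is a minimal illustration, not PyMapper's actual implementation: ThreadPool stands in for a process pool (it exposes the same apply_async/get interface as multiprocessing.Pool), and normalize and send_to_pool are hypothetical names.

```python
from multiprocessing.pool import ThreadPool


def normalize(values, offset):
    """Hypothetical map function: shift each value of one example by an offset."""
    return [value - offset for value in values]


def send_to_pool(pool, map_function, *example_elements):
    """Submit one example's elements to the pool and block until the result is ready."""
    async_result = pool.apply_async(map_function, example_elements)
    return async_result.get()


# ThreadPool is a stand-in (an assumption of this sketch) so it runs anywhere;
# multiprocessing.Pool offers the same apply_async/get calls.
pool = ThreadPool(processes=2)
try:
    # One example made of two elements: a list of values and an offset.
    mapped = send_to_pool(pool, normalize, [3.0, 5.0], 1.0)
finally:
    pool.close()
    pool.join()
```

Each element of the example arrives as a separate positional argument of the map function, which is why the sketch unpacks example_elements into the call.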

map_to_dataset(self, dataset: tf.data.Dataset, output_types: Union[List[tf.dtypes.DType], tf.dtypes.DType] = tf.float32, output_shapes: Union[List[Tuple[int, ...]], Tuple[int, ...]] = None, flat_map: bool = False)

Maps the map function to the passed dataset.

Parameters:
  • dataset – The dataset to apply the map function to.
  • output_types – The TensorFlow types to convert the function's outputs to.
  • output_shapes – The shapes of the outputs of the dataset.
  • flat_map – Determines whether to flatten the first level of the output, similar to TensorFlow’s flat_map. Note that output_types should match the unflattened output.
Returns:

The mapped dataset.
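
To make the flat_map option concrete, the following pure-Python sketch contrasts a regular map with a map whose first output level is flattened. It uses neither TensorFlow nor PyMapper; split_into_windows, plain_map, and flattened_map are hypothetical names chosen for illustration.

```python
from itertools import chain


def split_into_windows(example):
    """Hypothetical map function: turn one example into several fixed-size windows."""
    window_size = 2
    return [example[start:start + window_size]
            for start in range(0, len(example), window_size)]


def plain_map(map_function, examples):
    """One output per input example; each output keeps its leading level."""
    return [map_function(example) for example in examples]


def flattened_map(map_function, examples):
    """Flatten the first level so every window becomes its own element."""
    return list(chain.from_iterable(map_function(example) for example in examples))


examples = [[1, 2, 3, 4], [5, 6]]
unflattened = plain_map(split_into_windows, examples)
flattened = flattened_map(split_into_windows, examples)
```

Here the two input examples yield two nested outputs without flattening and three separate elements with it.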

map_py_function_to_dataset(dataset: tf.data.Dataset, map_function: Callable, number_of_parallel_calls: int, output_types: Union[Tuple[tf.dtypes.DType, ...], tf.dtypes.DType] = tf.float32, output_shapes: Union[List[Tuple[int, ...]], Tuple[int, ...]] = None, flat_map: bool = False) → tf.data.Dataset

A one-line wrapper for mapping a Python function to a dataset in parallel.

Parameters:
  • dataset – The dataset whose elements the mapping function will be applied to.
  • map_function – The function to map to the dataset.
  • number_of_parallel_calls – The number of parallel calls of the mapping function.
  • output_types – The TensorFlow types to convert the function's outputs to.
  • output_shapes – The shapes to set on the outputs, making explicit to TensorFlow what the Python function returns.
  • flat_map – Determines whether to flatten the first level of the output, similar to TensorFlow’s flat_map. Note that output_types should match the unflattened output.
Returns:

The mapped dataset.
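
The relationship between a mapper class and its one-line wrapper might look like the sketch below. It is an assumption about the general pattern, not the module's code: ListMapper and map_function_to_items are hypothetical, a plain list stands in for the dataset, and ThreadPool stands in for the process pool.

```python
from multiprocessing.pool import ThreadPool


class ListMapper:
    """Hypothetical stand-in for a mapper class that owns the function and pool size."""

    def __init__(self, map_function, number_of_parallel_calls):
        self.map_function = map_function
        self.number_of_parallel_calls = number_of_parallel_calls

    def map_to_items(self, items):
        # Pool.map returns results in the same order as the inputs.
        with ThreadPool(processes=self.number_of_parallel_calls) as pool:
            return pool.map(self.map_function, items)


def map_function_to_items(items, map_function, number_of_parallel_calls):
    """One-line wrapper: build the mapper and apply it in a single call."""
    return ListMapper(map_function, number_of_parallel_calls).map_to_items(items)


def square(value):
    """Hypothetical map function."""
    return value * value


squares = map_function_to_items([1, 2, 3], square, number_of_parallel_calls=2)
```

The wrapper saves callers from constructing the mapper object themselves when they only need a single mapping.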