dl_data_pipeline.validator package
Submodules
dl_data_pipeline.validator.base_validator module
- class dl_data_pipeline.validator.base_validator.Validator
Bases: ABC
Abstract base class for data validation.
This class is a template for validators that check the integrity or correctness of data. Subclasses must override the validate method with their own validation rules; if the data does not meet those rules, validate should raise a ValidationError.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If the data fails to meet the validation criteria defined in the subclass implementation.
- abstract validate(data: Any) → None
Validate the data. Subclasses implement the actual check.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If the data fails validation.
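As a rough illustration of the subclassing contract described above, the sketch below defines a custom validator. Validator and ValidationError are redefined locally as stand-ins so the snippet runs without the package installed, and NonEmptyValidator is a hypothetical class that is not part of dl_data_pipeline.

```python
from abc import ABC, abstractmethod
from typing import Any


# Stand-ins for the package's ValidationError and Validator, redefined here
# only so the sketch is self-contained.
class ValidationError(Exception):
    pass


class Validator(ABC):
    @abstractmethod
    def validate(self, data: Any) -> None:
        ...


# Hypothetical subclass (not part of the package): rejects empty containers.
class NonEmptyValidator(Validator):
    def validate(self, data: Any) -> None:
        if len(data) == 0:
            raise ValidationError("data is empty")


NonEmptyValidator().validate([1, 2, 3])  # returns None, no exception

try:
    NonEmptyValidator().validate([])
except ValidationError as err:
    print(f"rejected: {err}")
```

Because validate is abstract, instantiating Validator directly raises a TypeError; only subclasses that override it can be created.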
dl_data_pipeline.validator.mean_var_validator module
- class dl_data_pipeline.validator.mean_var_validator.MeanVarValidator(target_mean: float | None = None, max_mean_gap: float = 1.0, target_var: float | None = None, max_var_gap: float = 1.0)
Bases: Validator
Validator class that checks whether the mean and variance of a dataset fall within specified ranges.
This class extends the Validator base class to provide validation based on statistical properties of the data. It checks if the mean and variance of the input data are within user-defined acceptable ranges.
- Parameters:
target_mean (float, optional) – The target mean value that the data should be close to. If None, mean validation is not performed.
max_mean_gap (float, optional) – The maximum allowed deviation of the mean from the target mean. Defaults to 1.0.
target_var (float, optional) – The target variance value that the data should be close to. If None, variance validation is not performed.
max_var_gap (float, optional) – The maximum allowed deviation of the variance from the target variance. Defaults to 1.0.
- Raises:
ValueError – If both target_mean and target_var are None.
ValidationError – If the mean or variance of the data falls outside the acceptable range defined by target_mean and max_mean_gap, or target_var and max_var_gap.
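Examples
>>> import numpy as np
>>> validator = MeanVarValidator(target_mean=0.0, max_mean_gap=0.1,
...                              target_var=1.0, max_var_gap=0.1)
>>> data = np.array([0.05, -0.02, 0.03])
>>> validator.validate(data)  # Passes only if both mean and variance are within the gaps; here the mean (0.02) is fine but the variance (about 0.001) is far from target_var=1.0, so a ValidationError is expected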
- validate(data: Any) → None
Check the mean and variance of the data against the configured targets.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If the mean or variance falls outside the acceptable range.
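The gap-based check described by the parameters above could look roughly like the following. This is a sketch, not the package's implementation: check_mean_var is a hypothetical helper, ValidationError is a local stand-in, the standard library's statistics module is used instead of NumPy to keep the snippet dependency-free, and the use of population variance is an assumption.

```python
from statistics import fmean, pvariance


class ValidationError(Exception):
    """Stand-in for the package's ValidationError."""


def check_mean_var(values, target_mean=None, max_mean_gap=1.0,
                   target_var=None, max_var_gap=1.0):
    """Hypothetical helper mirroring the documented parameters."""
    if target_mean is None and target_var is None:
        raise ValueError("set target_mean, target_var, or both")
    if target_mean is not None:
        mean = fmean(values)
        if abs(mean - target_mean) > max_mean_gap:
            raise ValidationError(f"mean {mean:.3f} too far from {target_mean}")
    if target_var is not None:
        # Population variance is an assumption; the package may use
        # a different estimator.
        var = pvariance(values)
        if abs(var - target_var) > max_var_gap:
            raise ValidationError(f"variance {var:.3f} too far from {target_var}")


# Mean is 5.0 and population variance is about 0.667, both within the gaps.
check_mean_var([4.0, 5.0, 6.0], target_mean=5.0, target_var=1.0, max_var_gap=0.5)
```

Passing None for one target skips that half of the check, which matches the documented behaviour of target_mean and target_var.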
dl_data_pipeline.validator.min_max_validator module
- class dl_data_pipeline.validator.min_max_validator.MinMaxValidator(min_value: float | None = None, max_value: float | None = None)
Bases: Validator
Validator class that checks whether the values in a dataset fall within specified minimum and maximum bounds.
This class extends the Validator base class to provide validation based on the minimum and maximum values of the data. It ensures that all values in the input data are within the defined range.
- Parameters:
min_value (float, optional) – The minimum allowable value for the data. If None, the minimum value is not validated.
max_value (float, optional) – The maximum allowable value for the data. If None, the maximum value is not validated.
- Raises:
ValueError – If both min_value and max_value are None.
ValidationError – If any value in the data is outside the range defined by min_value and max_value.
Examples
>>> import numpy as np
>>> validator = MinMaxValidator(min_value=0.0, max_value=1.0)
>>> data = np.array([0.5, 0.8, 1.2])
>>> validator.validate(data)  # Raises a ValidationError because 1.2 > 1.0
- validate(data: Any) → None
Check that every value in the data lies within the configured bounds.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If any value is below min_value or above max_value.
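One way to picture the bounds check is the sketch below. check_min_max is a hypothetical helper and ValidationError a local stand-in; treating the bounds as inclusive is an assumption, since the documentation does not say how values exactly equal to a bound are handled.

```python
class ValidationError(Exception):
    """Stand-in for the package's ValidationError."""


def check_min_max(values, min_value=None, max_value=None):
    """Hypothetical helper: every value must lie in [min_value, max_value]."""
    if min_value is None and max_value is None:
        raise ValueError("set min_value, max_value, or both")
    # Comparing only the extremes is enough to cover every value.
    lowest, highest = min(values), max(values)
    if min_value is not None and lowest < min_value:
        raise ValidationError(f"minimum {lowest} is below {min_value}")
    if max_value is not None and highest > max_value:
        raise ValidationError(f"maximum {highest} is above {max_value}")


# Passes: 1.0 equals max_value, and bounds are assumed inclusive here.
check_min_max([0.5, 0.8, 1.0], min_value=0.0, max_value=1.0)
```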
dl_data_pipeline.validator.shape_validator module
- class dl_data_pipeline.validator.shape_validator.ShapeValidator(accepted_shape: Any, shape_getter: str = 'shape')
Bases: Validator
Validator class that checks if the shape of the data matches the expected shape.
This class extends the Validator base class to ensure that the shape of the input data matches a specified shape. It uses a specified attribute getter method to retrieve the shape of the data.
- Parameters:
accepted_shape (Any) – The shape that the data should match. This can be any value that represents the expected shape (e.g., a tuple for multidimensional arrays).
shape_getter (str, optional) – The name of the attribute to be used for retrieving the shape of the data. Defaults to “shape”. This should be the name of an attribute that returns the shape of the data (e.g., ‘shape’ for NumPy arrays).
- Raises:
ValidationError – If the data does not have the specified shape or if the shape attribute is not present.
Examples
>>> import numpy as np
>>> validator = ShapeValidator((100, 200))
>>> data = np.zeros((100, 200))
>>> validator.validate(data)  # This will pass because the shape matches.
>>> validator = ShapeValidator((100, 200))
>>> data = np.zeros((100, 300))
>>> validator.validate(data)  # This will raise a ValidationError because the shape does not match.
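- validate(data: Any) → None
Check that the shape of the data, read from the shape_getter attribute, equals accepted_shape.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If the shape does not match or the shape attribute is missing.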
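The shape_getter parameter names an attribute to read, which suggests a lookup by name along the lines of the sketch below. check_shape is a hypothetical helper, ValidationError is a local stand-in, and SimpleNamespace objects stand in for array-like data so the snippet needs no third-party packages.

```python
from types import SimpleNamespace


class ValidationError(Exception):
    """Stand-in for the package's ValidationError."""


def check_shape(data, accepted_shape, shape_getter="shape"):
    """Hypothetical helper: compare a named attribute with accepted_shape."""
    # A missing attribute counts as a validation failure, as documented.
    if not hasattr(data, shape_getter):
        raise ValidationError(f"data has no attribute {shape_getter!r}")
    shape = getattr(data, shape_getter)
    if shape != accepted_shape:
        raise ValidationError(f"expected shape {accepted_shape}, got {shape}")


image = SimpleNamespace(shape=(100, 200))  # stand-in for an array-like object
check_shape(image, (100, 200))

frame = SimpleNamespace(size=(640, 480))   # shape exposed under another name
check_shape(frame, (640, 480), shape_getter="size")
```

Reading the attribute by name lets the same validator work with objects that expose their dimensions under something other than shape.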
dl_data_pipeline.validator.type_validator module
- class dl_data_pipeline.validator.type_validator.TypeValidator(*accepted_types: type)
Bases: Validator
Validator class that checks if the data is of one of the accepted types.
This class extends the Validator base class to ensure that the type of the input data matches one of the specified acceptable types.
- Parameters:
*accepted_types (type) – A variable number of type arguments representing the acceptable types for the data. The data must be an instance of at least one of these types.
- Raises:
ValidationError – If the data is not an instance of any of the accepted types.
Examples
>>> validator = TypeValidator(int, float)
>>> validator.validate(42)  # This will pass because 42 is an int.
>>> validator = TypeValidator(int, float)
>>> validator.validate("string")  # This will raise a ValidationError because "string" is not an int or float.
- validate(data: Any) → None
Check that the data is an instance of one of the accepted types.
- Parameters:
data (Any) – The data to be validated.
- Raises:
ValidationError – If the data is not an instance of any accepted type.
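Since the data must be "an instance of at least one" accepted type, the check plausibly reduces to a single isinstance call, as in this sketch. check_type is a hypothetical helper and ValidationError a local stand-in; whether the package uses isinstance internally is an assumption.

```python
class ValidationError(Exception):
    """Stand-in for the package's ValidationError."""


def check_type(data, *accepted_types):
    """Hypothetical helper: data must be an instance of an accepted type."""
    # isinstance accepts a tuple of types and also matches subclasses.
    if not isinstance(data, accepted_types):
        names = ", ".join(t.__name__ for t in accepted_types)
        raise ValidationError(
            f"expected one of ({names}), got {type(data).__name__}"
        )


check_type(42, int, float)
check_type(True, int)  # bool is a subclass of int, so this passes too
```

An instance-based check accepts subclasses, so a validator configured with int would also accept bool values under this assumption.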
dl_data_pipeline.validator.validation_error module
- exception dl_data_pipeline.validator.validation_error.ValidationError
Bases: Exception
Exception raised for errors in data validation.
This exception is used to indicate that data has failed a validation check. It is typically raised by validator classes when the data does not meet specified criteria.
- Inherits from:
Exception: Common base class for all non-exit exceptions.
Examples
>>> if not some_validation_check(data):
...     raise ValidationError("The data did not meet the validation criteria.")
- add_note()
Exception.add_note(note) – add a note to the exception
- args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
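On the calling side, ValidationError can be caught like any other exception. The sketch below runs several checks and collects the failure messages instead of stopping at the first one. ValidationError is a local stand-in, and run_checks, non_empty, and all_positive are hypothetical helpers introduced only for illustration.

```python
class ValidationError(Exception):
    """Stand-in for the package's ValidationError."""


def non_empty(batch):
    if len(batch) == 0:
        raise ValidationError("batch is empty")


def all_positive(batch):
    if not all(value > 0 for value in batch):
        raise ValidationError("batch contains non-positive values")


def run_checks(batch, checks):
    """Hypothetical helper: run every check and return the failure messages."""
    failures = []
    for check in checks:
        try:
            check(batch)
        except ValidationError as err:
            # Record the message and keep going with the remaining checks.
            failures.append(str(err))
    return failures


print(run_checks([1, -2, 3], [non_empty, all_positive]))
```

Collecting messages is one design choice among several; letting the first ValidationError propagate is simpler when a single failure should abort processing.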
Module contents
This module provides various classes for data validation, extending the base Validator class.
The validators included in this module are designed to check different properties of data to ensure it meets specific criteria. They are useful for ensuring data integrity and quality in various data processing and analysis applications.
Classes:
- TypeValidator: Checks if the data is of one of the specified types.
- ShapeValidator: Validates that the data has a specific shape.
- MinMaxValidator: Ensures that the minimum and maximum values of the data fall within specified bounds.
- MeanVarValidator: Validates that the mean and variance of the data are within acceptable ranges.
Exceptions:
- ValidationError: Raised when data does not meet the validation criteria defined by any of the validators.
- Usage Example:
>>> import numpy as np
>>> from dl_data_pipeline.validator import TypeValidator, ShapeValidator, MinMaxValidator, MeanVarValidator
>>> validator = TypeValidator(int, float)
>>> validator.validate(5)  # Passes as 5 is of type int
>>> validator.validate("string")  # Raises ValidationError as "string" is not of type int or float
>>> shape_validator = ShapeValidator((3, 3))
>>> shape_validator.validate(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # Passes as shape is (3, 3)
>>> shape_validator.validate(np.array([[1, 2, 3], [4, 5, 6]]))  # Raises ValidationError as shape is (2, 3)
>>> min_max_validator = MinMaxValidator(min_value=0, max_value=10)
>>> min_max_validator.validate(np.array([1, 5, 9]))  # Passes as min=1 and max=9 are within bounds
>>> min_max_validator.validate(np.array([-1, 5, 12]))  # Raises ValidationError as min=-1 and max=12 are out of bounds
>>> mean_var_validator = MeanVarValidator(target_mean=5.0, max_mean_gap=1.0, target_var=1.0, max_var_gap=0.5)
>>> mean_var_validator.validate(np.array([4, 5, 6]))  # Passes as the mean (5.0) and the variance are within acceptable ranges
>>> mean_var_validator.validate(np.array([1, 2, 3]))  # Raises ValidationError as the mean (2.0) is more than max_mean_gap away from target_mean