The Prediction_Data_validation class handles all validation for raw prediction data, ensuring files meet schema requirements before processing. Invalid files are automatically separated for review.
The system validates filenames using a regex defined in predictionDataValidation.py:80-95:
```python
def manualRegexCreation(self):
    """
    Method Name: manualRegexCreation
    Description: This method contains a manually defined regex based on the
                 "FileName" given in "Schema" file. This Regex is used to
                 validate the filename of the prediction data.
    Output: Regex pattern
    On Failure: None
    """
    regex = "['fraudDetection']+['_'']+[\\d_]+[\\d]+\\.csv"
    return regex
```
Creates Good_Raw/ and Bad_Raw/ directories under Prediction_Raw_Files_Validated/ for sorting files.
```python
path = os.path.join("Prediction_Raw_Files_Validated/", "Good_Raw/")
if not os.path.isdir(path):
    os.makedirs(path)
path = os.path.join("Prediction_Raw_Files_Validated/", "Bad_Raw/")
if not os.path.isdir(path):
    os.makedirs(path)
```
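As a side note, `os.makedirs` accepts `exist_ok=True` (since Python 3.2), which makes the call idempotent and removes the need for the `isdir` pre-check. A minimal equivalent sketch of the directory setup above:

```python
import os

# Equivalent, more compact form of the setup above:
# exist_ok=True makes makedirs a no-op if the folder already exists.
for sub in ("Good_Raw", "Bad_Raw"):
    os.makedirs(os.path.join("Prediction_Raw_Files_Validated", sub), exist_ok=True)
```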
2
Filename Validation
Validates each file against the regex pattern and timestamp-length requirements from predictionDataValidation.py:228-274. Files matching all criteria are copied to Good_Raw/; the rest go to Bad_Raw/.
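The check described above can be sketched as a small standalone function. This is an illustration, not the class's actual method: the helper name and the stamp-length parameters are assumptions, while the regex is the one shown earlier.

```python
import re

# Regex from manualRegexCreation (copied verbatim from the excerpt above)
REGEX = "['fraudDetection']+['_'']+[\\d_]+[\\d]+\\.csv"

def file_name_is_valid(filename, length_of_date_stamp=8, length_of_time_stamp=6):
    # Step 1: the whole name must match the schema regex
    if not re.match(REGEX, filename):
        return False
    # Step 2: split "fraudDetection_<date>_<time>.csv" on underscores
    # and verify the date/time stamp lengths
    parts = filename.split(".csv")[0].split("_")
    return (len(parts) == 3
            and len(parts[1]) == length_of_date_stamp
            and len(parts[2]) == length_of_time_stamp)

print(file_name_is_valid("fraudDetection_28011960_120210.csv"))  # True
print(file_name_is_valid("transactions_2020.csv"))               # False
```

A file passing both checks would be copied to Good_Raw/; a failure on either check sends it to Bad_Raw/.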
3
Column Count Validation
Verifies that each file in Good_Raw/ has exactly 38 columns, as implemented in predictionDataValidation.py:279-318:
```python
def validateColumnLength(self, NumberofColumns):
    try:
        f = open("Prediction_Logs/columnValidationLog.txt", 'a+')
        self.logger.log(f, "Column Length Validation Started!!")
        for file in listdir('Prediction_Raw_Files_Validated/Good_Raw/'):
            csv = pd.read_csv("Prediction_Raw_Files_Validated/Good_Raw/" + file)
            if csv.shape[1] == NumberofColumns:
                csv.to_csv("Prediction_Raw_Files_Validated/Good_Raw/" + file,
                           index=None, header=True)
            else:
                shutil.move("Prediction_Raw_Files_Validated/Good_Raw/" + file,
                            "Prediction_Raw_Files_Validated/Bad_Raw")
                self.logger.log(f, "Invalid Column Length for the file!! "
                                   "File moved to Bad Raw Folder :: %s" % file)
        self.logger.log(f, "Column Length Validation Completed!!")
    except Exception as e:
        self.logger.log(f, "Error Occured:: %s" % e)
        raise e
```
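The core of this check is `DataFrame.shape[1]`, which pandas reports as the column count. A toy illustration (the 3-column frame is made up for demonstration):

```python
import pandas as pd

EXPECTED_COLUMNS = 38  # the schema-defined count used above

# A deliberately undersized frame standing in for a parsed CSV file
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

is_valid = df.shape[1] == EXPECTED_COLUMNS
print(is_valid)  # False: a 3-column file would be moved to Bad_Raw
```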
4
Missing Values Check
Validates that no column has every one of its values missing, per predictionDataValidation.py:325-364:
```python
def validateMissingValuesInWholeColumn(self):
    try:
        f = open("Prediction_Logs/missingValuesInColumn.txt", 'a+')
        self.logger.log(f, "Missing Values Validation Started!!")
        for file in listdir('Prediction_Raw_Files_Validated/Good_Raw/'):
            csv = pd.read_csv("Prediction_Raw_Files_Validated/Good_Raw/" + file)
            count = 0
            for columns in csv:
                if (len(csv[columns]) - csv[columns].count()) == len(csv[columns]):
                    count += 1
                    shutil.move("Prediction_Raw_Files_Validated/Good_Raw/" + file,
                                "Prediction_Raw_Files_Validated/Bad_Raw")
                    self.logger.log(f, "Invalid Column Length for the file!! "
                                       "File moved to Bad Raw Folder :: %s" % file)
                    break
            if count == 0:
                csv.to_csv("Prediction_Raw_Files_Validated/Good_Raw/" + file,
                           index=None, header=True)
    except Exception as e:
        self.logger.log(f, "Error Occured:: %s" % e)
        raise e
```
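The per-column test above, `len(col) - col.count() == len(col)`, holds exactly when every value in the column is null, since `count()` excludes missing values; it is equivalent to pandas' `isnull().all()`. A small demonstration with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"amount": [10.0, 20.0], "notes": [None, None]})

# count() excludes NaN/None, so len - count == len means "entirely missing"
flagged = [c for c in df.columns
           if len(df[c]) - df[c].count() == len(df[c])]
print(flagged)  # ['notes']
```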
5
Archive Bad Files
Moves invalid files to archive with timestamp for review.
Files that fail any validation step are archived by predictionDataValidation.py:181-223:
```python
def moveBadFilesToArchiveBad(self):
    now = datetime.now()
    date = now.date()
    time = now.strftime("%H%M%S")
    try:
        path = "PredictionArchivedBadData"
        if not os.path.isdir(path):
            os.makedirs(path)
        source = 'Prediction_Raw_Files_Validated/Bad_Raw/'
        dest = 'PredictionArchivedBadData/BadData_' + str(date) + "_" + str(time)
        if not os.path.isdir(dest):
            os.makedirs(dest)
        files = os.listdir(source)
        for f in files:
            if f not in os.listdir(dest):
                shutil.move(source + f, dest)
        file = open("Prediction_Logs/GeneralLog.txt", 'a+')
        self.logger.log(file, "Bad files moved to archive")
        path = 'Prediction_Raw_Files_Validated/'
        if os.path.isdir(path + 'Bad_Raw/'):
            shutil.rmtree(path + 'Bad_Raw/')
            self.logger.log(file, "Bad Raw Data Folder Deleted successfully!!")
        file.close()
    except OSError as e:
        file = open("Prediction_Logs/GeneralLog.txt", 'a+')
        self.logger.log(file, "Error while moving bad files to archive:: %s" % e)
        file.close()
        raise OSError
Bad files are archived with a timestamp (e.g., PredictionArchivedBadData/BadData_2026-03-04_143022/) for later review and resubmission after correction.