Parser
Word Parser
Word Parser is the class that extracts the content of the document: text, headings, paragraphs, and tags. When a Word Parser object is created, it calls its own method to extract the content from the document.
This method iterates over the document and stores the tags (heading, text, para) in a tag list. It extracts tables by calling GetTables and headings by calling GetHeadings.
GetTables: Iterates over the document and returns its tables.
GetHeadings: Iterates over the document and returns its headings.
Word Parser then creates an object of the Document Parser class and passes the headings, tables, and tags to it.
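The extraction flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the document is assumed to be already loaded as a list of (tag, content) pairs, and the names SAMPLE_BLOCKS, extract_content, get_tables, and get_headings are hypothetical stand-ins. Keeping tables out of the tag list is also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a loaded Word document:
# each block is a (tag, content) pair in document order.
SAMPLE_BLOCKS = [
    ("heading", "Scope"),
    ("para", "This policy applies to all staff."),
    ("table", [["Question", "Answer"], ["Owner?", "IT"]]),
    ("heading", "Review"),
    ("para", "Reviewed yearly."),
]


@dataclass
class WordParser:
    blocks: list
    tags: list = field(default_factory=list)
    tables: list = field(default_factory=list)
    headings: list = field(default_factory=list)

    def __post_init__(self):
        # Extraction runs as soon as the object is created.
        self.extract_content()

    def extract_content(self):
        # Assumption: the tag list keeps every non-table block.
        self.tags = [(tag, content) for tag, content in self.blocks if tag != "table"]
        self.tables = self.get_tables()
        self.headings = self.get_headings()

    def get_tables(self):
        # Iterate over the document and return the tables.
        return [content for tag, content in self.blocks if tag == "table"]

    def get_headings(self):
        # Iterate over the document and return the headings.
        return [content for tag, content in self.blocks if tag == "heading"]


parser = WordParser(SAMPLE_BLOCKS)
print(parser.headings)
print(len(parser.tables))
```

In the real system the three collections would then be handed to Document Parser; that hand-off is omitted here to keep the sketch short.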
Document Parser
Document Parser is the class that structures the data received from Word Parser and serves the required content from the document. That content includes tables, paragraphs, headings, Q&A, Q&A sections, all questions, all answers, heading text, and questions found in text.
When a Document Parser object is created, its constructor calls the Make_Structure method.
Make_Structure iterates over the text and updates the class attributes Text, tlist, and headings.
The methods that return the required content are described below.
This function returns all the text in the document that sits under a given heading. It takes a heading as input, iterates over the headings in the document, and appends the text found under the matching heading to a list. It returns that list of text.
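A minimal sketch of this lookup, assuming the structured text is available as an ordered list of (tag, content) pairs. The function name get_text_under_heading and the sample data are hypothetical.

```python
def get_text_under_heading(tags, heading):
    """Collect the text that follows `heading` up to the next heading."""
    collected = []
    inside = False
    for tag, content in tags:
        if tag == "heading":
            # Start collecting at the requested heading, stop at any other.
            inside = content == heading
        elif inside:
            collected.append(content)
    return collected


tags = [
    ("heading", "Scope"),
    ("para", "Applies to all staff."),
    ("para", "Contractors included."),
    ("heading", "Review"),
    ("para", "Reviewed yearly."),
]
print(get_text_under_heading(tags, "Scope"))
```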
This function takes a list of text to search as an argument. It iterates over that list and matches each item against the corresponding text in self.Text (an attribute of Document Parser). On a successful match, it appends the heading of the matched text to heading to push. After the iteration completes, it converts heading to push to a set and back to a list (which removes duplicates), appends the result to the list of headings, and returns that list.
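The search described above might look like the sketch below. It assumes, purely for illustration, that self.Text can be viewed as a mapping from heading to the list of text under it, and that a match means a substring match. The result is sorted only to make the output deterministic; the description itself just converts the set back to a list.

```python
def get_headings_for_text(text_by_heading, texts_to_search):
    """Return the headings whose text contains any of the searched strings."""
    headings_to_push = []
    for wanted in texts_to_search:
        for heading, texts in text_by_heading.items():
            if any(wanted in text for text in texts):
                headings_to_push.append(heading)
    # set() drops duplicate headings; sorted() fixes the order.
    return sorted(set(headings_to_push))


text_by_heading = {
    "Scope": ["Applies to all staff."],
    "Review": ["Reviewed yearly by staff."],
    "Contacts": ["Email the help desk."],
}
print(get_headings_for_text(text_by_heading, ["staff", "yearly"]))
```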
This function takes a heading as input, iterates over the headings in the document, and appends each table under the given heading to df. It returns those tables as a list of data-frames, df.
This function iterates over all the tables and pre-processes them.
This function iterates over all the tables in the document and concatenates them into a new data-frame. It pre-processes the result, merges its columns, and returns it.
This function takes a table of type data-frame as an argument and iterates over it cell by cell. It removes garbage characters from each cell, merges the columns of each row, and appends the result to a new data-frame. It returns the new data-frame with merged columns.
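The column merge can be illustrated with a small sketch. To stay dependency-free it uses a list of rows as a stand-in for the data-frame, and it assumes that "garbage characters" means control characters and non-breaking spaces. Both choices, along with the names clean_cell and merge_columns, are assumptions.

```python
import re

# Assumption: garbage means control characters and non-breaking spaces.
GARBAGE = re.compile(r"[\x00-\x1f\x7f\u00a0]+")


def clean_cell(cell):
    """Replace garbage characters with a space and trim the cell."""
    return GARBAGE.sub(" ", str(cell)).strip()


def merge_columns(rows):
    """Clean every cell, then merge the cells of each row into one column."""
    merged = []
    for row in rows:
        cells = [clean_cell(cell) for cell in row]
        merged.append([" ".join(cell for cell in cells if cell)])
    return merged


table = [["Owner\u00a0name", "IT\x07"], ["", "Yes"]]
print(merge_columns(table))
```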
This function iterates over all the tables in the document and appends them to a new data-frame. It applies Merge Answers to the new data-frame and returns its first column as the answers.
Merge Answers: Merge Answers works like merge columns. The only difference is that it removes the first column (the questions), merges the remaining answer columns, and returns them.
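As a rough illustration of Merge Answers, the sketch below again uses a list of rows in place of a data-frame and treats the first cell of each row as the question. The function name merge_answers is hypothetical.

```python
def merge_answers(rows):
    """Drop the question column and merge the remaining cells of each row."""
    answers = []
    for row in rows:
        parts = [str(cell).strip() for cell in row[1:]]
        answers.append(" ".join(part for part in parts if part))
    return answers


qa_table = [
    ["Who owns it?", "IT", "Security team"],
    ["Reviewed?", "Yes", ""],
]
print(merge_answers(qa_table))
```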
This function takes a list of questions as an argument and returns the answers corresponding to those questions.
Rules Architecture
Create Rule
First, query the DB and extract the record of the rule that we want to use.
Check whether the record has any and_rules or or_rules. If it does, create a complex rule object (a rule composed of multiple rules); otherwise create a plain rule object.
In both cases the rule object is created on the basis of section_identifier and compliance_type.
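The branching above can be sketched as a small factory. The DB record is assumed to arrive as a plain dict, nested rules are assumed to be embedded as dicts of the same shape, and the class names Rule and ComplexRule as well as the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    section_identifier: str
    compliance_type: str


@dataclass
class ComplexRule(Rule):
    # A rule composed of multiple rules.
    and_rules: list = field(default_factory=list)
    or_rules: list = field(default_factory=list)


def create_rule(record):
    """Build a Rule or ComplexRule from a DB record (assumed to be a dict)."""
    and_rules = record.get("and_rules") or []
    or_rules = record.get("or_rules") or []
    base = {
        "section_identifier": record["section_identifier"],
        "compliance_type": record["compliance_type"],
    }
    if and_rules or or_rules:
        return ComplexRule(
            **base,
            and_rules=[create_rule(r) for r in and_rules],
            or_rules=[create_rule(r) for r in or_rules],
        )
    return Rule(**base)


simple = create_rule(
    {"section_identifier": "whole_document", "compliance_type": "string_based"}
)
print(type(simple).__name__)
```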
Train Model Flow Diagram
Query the DB and extract the records of the rules on which we want the system to be trained.
If the system has no labeled data, it shows the message "All models are trained."
Otherwise, the system gets the labeled data for all the rules on which it needs to be trained.
The system then creates a thread, starts training the model on those rules, and shows the status.
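A minimal sketch of this flow using the standard threading module. The labeled data is assumed to be a dict keyed by rule id, train_fn is a caller-supplied placeholder for the real training routine, and the "Training started." message is a made-up placeholder. Only "All models are trained." comes from the description above.

```python
import threading


def start_training(rule_ids, labeled_data, train_fn):
    """Train every rule that has labeled data on a background thread."""
    pending = {
        rule_id: labeled_data[rule_id]
        for rule_id in rule_ids
        if labeled_data.get(rule_id)
    }
    if not pending:
        return "All models are trained.", {}, None

    status = {rule_id: "queued" for rule_id in pending}

    def worker():
        for rule_id, samples in pending.items():
            status[rule_id] = "training"
            train_fn(rule_id, samples)
            status[rule_id] = "trained"

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # Hypothetical status message; the caller can poll `status` for progress.
    return "Training started.", status, thread


trained = []
message, status, thread = start_training(
    ["r1", "r2"],
    {"r1": [("some sentence", "yes")], "r2": []},
    lambda rule_id, samples: trained.append(rule_id),
)
thread.join()
print(message, status)
```

Returning the status dict alongside the thread lets the caller show progress without blocking until training finishes.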
Label Without Document Flow Diagram
Labeling data without a document means that, during labeling, the system accepts the data without validating it against a document.
Query the DB and extract the records of the rules against which we want the data to be labeled.
Create a rule object for each rule.
A UI form is shown to the user. The user enters the labeling information, the system labels the data against the rule, and then the system shows the response.
Label Data Flow Diagram
Labeling data in the document means that the system validates the labeled data against the document. If the data exists in the document, the system proceeds; otherwise it shows an error message.
Query the DB and extract the records of the rules against which we want the data to be labeled.
Create a rule object for each rule.
A UI form is shown to the user, who enters the labeling information and updates the status against all the rules. The system then validates the data. If validation succeeds, it inserts the record in the DB and shows the response "Data Labeled Successfully"; otherwise it shows an error response.
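The validate-then-insert step can be sketched as follows. It assumes that validation means every labeled sentence must appear verbatim in the document text, and it uses a plain list as a stand-in for the DB. The function name and the wording of the error message are hypothetical.

```python
def label_with_document(document_text, rule_id, sentences, db):
    """Validate labeled sentences against the document, then store them."""
    # Assumption: a sentence is valid if it occurs verbatim in the document.
    missing = [s for s in sentences if s not in document_text]
    if missing:
        return {"ok": False, "message": f"Not found in document: {missing}"}
    db.append({"rule_id": rule_id, "sentences": list(sentences)})
    return {"ok": True, "message": "Data Labeled Successfully"}


db = []
document_text = "The supplier must hold a valid licence. Audits run yearly."
print(label_with_document(document_text, "rule-1", ["Audits run yearly."], db))
print(label_with_document(document_text, "rule-1", ["No such sentence."], db))
```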
Compliance Flow Diagram
Query the DB and extract the records of the rules on which the system will decide compliance.
Create a rule object for each rule.
The system checks compliance on the basis of the rules and shows the compliance report to the user.
Feedback Flow Diagram
A rule object is created.
The system shows the feedback for the already created rule object.
If changes are needed, the user enters the feedback data.
The system checks compliance against the feedback data.
The rule in the DB is updated according to the feedback data.
The report is updated and then shown to the user.
Functions
Label Sentences
The user enters the relevant sentence and updates the status.
The system labels the data and validates the sentence (with document or without document).
Save Label Data
When the user clicks the update button, the system transforms the labeled data into JSON format, updates its section identifier value, and inserts the record in the DB.
Training Without Document on Section: Whole Document with Compliance Type: String Based
The user enters the relevant sentence.
If no sentence is entered, the system shows an error.
If a sentence exists, the system checks whether it is for compliance or non-compliance.
The system then inserts the record and shows the status.
Data received: {'form_data': {'compliance-decision': '1', 'question': '', 'section': '', 'relevantSentences': ['ahmed', 'abrar'], 'reason': 'Required information is available.', 'entities': [], 'intent': '', 'NLU_Sentences': []}, 'rule_id': '5da068ead01b52000100c91e', 'user_id': '3', 'client_id': '1', 'document_type_id': '1'}
form_data = {'compliance-decision': '1', 'question': '', 'section': '', 'relevantSentences': ['ahmed', 'abrar'], 'reason': 'Required information is available.', 'entities': [], 'intent': '', 'NLU_Sentences': []}
(form_data extracted from Data received)
compliance_decision = form_data['compliance-decision'] (1 or 0)
relevant_sentences = form_data['relevantSentences'] (['ahmed', 'abrar'])
Compliance Sentences: [<typess.Training_data object at 0x7ffa6920ce80>, <typess.Training_data object at 0x7ffa6920cef0>] ( [{'py/object': 'typess.Training_data', 'Label': 'yes', 'Sentence': 'ahmed'}, {'py/object': 'typess.Training_data', 'Label': 'yes', 'Sentence': 'abrar'}] )
data = {'NLU_Data': [], 'client_id': '1', 'compliance-decision': '1', 'compliance_data': [{'py/object': 'typess.Training_data', 'Label': 'yes', 'Sentence': 'ahmed'}, {'py/object': 'typess.Training_data', 'Label': 'yes', 'Sentence': 'abrar'}], 'doc_id': '', 'document_type_id': '1', 'file_name': '', 'is_included_in_training': '0', 'rule_id': '5da068ead01b52000100c91e', 'section_data': [], 'user_id': '3'}
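The transformation from the received payload to the stored record can be sketched as below. This is an illustration under assumptions, not the actual code: the mapping of decision '0' to the label 'no' is a guess (the trace only shows '1' giving 'yes'), the fixed values for doc_id, file_name, is_included_in_training, NLU_Data, and section_data are copied from the sample record, and the 'py/object' entries are written by hand here rather than produced by a serialization library.

```python
def build_training_record(received):
    """Turn the received payload into the record that is inserted."""
    form_data = received["form_data"]
    relevant_sentences = form_data["relevantSentences"]
    if not relevant_sentences:
        # Matches the step "if no sentence is entered, show an error".
        raise ValueError("No relevant sentence was provided.")

    compliance_decision = form_data["compliance-decision"]
    # Assumption: '1' means compliant ('yes'), anything else 'no'.
    label = "yes" if compliance_decision == "1" else "no"
    compliance_data = [
        {"py/object": "typess.Training_data", "Label": label, "Sentence": s}
        for s in relevant_sentences
    ]
    return {
        "NLU_Data": [],
        "client_id": received["client_id"],
        "compliance-decision": compliance_decision,
        "compliance_data": compliance_data,
        "doc_id": "",  # no document in this flow
        "document_type_id": received["document_type_id"],
        "file_name": "",
        "is_included_in_training": "0",
        "rule_id": received["rule_id"],
        "section_data": [],
        "user_id": received["user_id"],
    }


received = {
    "form_data": {
        "compliance-decision": "1",
        "question": "",
        "section": "",
        "relevantSentences": ["ahmed", "abrar"],
        "reason": "Required information is available.",
        "entities": [],
        "intent": "",
        "NLU_Sentences": [],
    },
    "rule_id": "5da068ead01b52000100c91e",
    "user_id": "3",
    "client_id": "1",
    "document_type_id": "1",
}
record = build_training_record(received)
print(record["compliance_data"])
```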