You are given 4 days to complete this assessment. For this assignment, you have access to the following Azure OpenAI resources:
- Document Intelligence for Optical Character Recognition (OCR)
- GPT-4o and GPT-4o Mini as Large Language Models (LLMs)
- ADA 002 for text embeddings
All required resources have already been deployed in Azure. There is no need to create additional resources for this assignment.
The necessary Azure credentials have been included in the email containing this assignment. Please refer to these credentials for accessing the pre-deployed resources.
The Git repository for this assignment contains two important folders:
-
phase1_data: This folder contains:
- 1 raw PDF file that you can use to create more examples if needed
- 3 filled documents for testing and development
-
phase2_data: This folder contains:
- HTML files that serve as the knowledge base for Part 2 of the home assignment
Develop a system that extracts information from ביטוח לאומי (National Insurance Institute) forms using OCR and Azure OpenAI.
- Use Azure Document Intelligence for OCR. Learn more about Document Intelligence layout
- Use Azure OpenAI to extract fields and generate JSON output.
- Create a simple UI to upload a PDF/JPG file and display the resulting JSON. You can use Streamlit or Gradio for the UI implementation.
- Handle forms filled in either Hebrew or English.
- For any fields not present or not extractable, use an empty string in the JSON output.
- Implement a method to validate the accuracy and completeness of the extracted data.
{
"lastName": "",
"firstName": "",
"idNumber": "",
"gender": "",
"dateOfBirth": {
"day": "",
"month": "",
"year": ""
},
"address": {
"street": "",
"houseNumber": "",
"entrance": "",
"apartment": "",
"city": "",
"postalCode": "",
"poBox": ""
},
"landlinePhone": "",
"mobilePhone": "",
"jobType": "",
"dateOfInjury": {
"day": "",
"month": "",
"year": ""
},
"timeOfInjury": "",
"accidentLocation": "",
"accidentAddress": "",
"accidentDescription": "",
"injuredBodyPart": "",
"signature": "",
"formFillingDate": {
"day": "",
"month": "",
"year": ""
},
"formReceiptDateAtClinic": {
"day": "",
"month": "",
"year": ""
},
"medicalInstitutionFields": {
"healthFundMember": "",
"natureOfAccident": "",
"medicalDiagnoses": ""
}
}Here is a translation of the fields in Hebrew:
{
"שם משפחה": "",
"שם פרטי": "",
"מספר זהות": "",
"מין": "",
"תאריך לידה": {
"יום": "",
"חודש": "",
"שנה": ""
},
"כתובת": {
"רחוב": "",
"מספר בית": "",
"כניסה": "",
"דירה": "",
"ישוב": "",
"מיקוד": "",
"תא דואר": ""
},
"טלפון קווי": "",
"טלפון נייד": "",
"סוג העבודה": "",
"תאריך הפגיעה": {
"יום": "",
"חודש": "",
"שנה": ""
},
"שעת הפגיעה": "",
"מקום התאונה": "",
"כתובת מקום התאונה": "",
"תיאור התאונה": "",
"האיבר שנפגע": "",
"חתימה": "",
"תאריך מילוי הטופס": {
"יום": "",
"חודש": "",
"שנה": ""
},
"תאריך קבלת הטופס בקופה": {
"יום": "",
"חודש": "",
"שנה": ""
},
"למילוי ע\"י המוסד הרפואי": {
"חבר בקופת חולים": "",
"מהות התאונה": "",
"אבחנות רפואיות": ""
}
}Develop a microservice-based chatbot system that answers questions about medical services for Israeli health funds (Maccabi, Meuhedet, and Clalit) based on user-specific information. The system should be capable of handling multiple users simultaneously without maintaining server-side user memory.
-
Microservice Architecture
- Implement the chatbot as a stateless microservice using FastAPI or Flask.
- Handle multiple concurrent users efficiently.
- Manage all user session data and conversation history client-side (frontend).
-
User Interface
- Develop a frontend using Gradio or Streamlit.
- Implement two main phases: User Information Collection and Q&A.
-
Azure OpenAI Integration
- Utilize the Azure OpenAI client library for Python.
- Implement separate prompts for the information collection and Q&A phases.
-
Data Handling
- Use provided HTML files provided in the 'phase2_data' folder as the knowledge base for answering questions.
-
Multi-language Support
- Implement support for Hebrew and English.
-
Error Handling and Logging
- Implement comprehensive error handling and validation.
- Create a logging system to track chatbot activities, errors, and interactions.
Collect the following user information:
- First and last name
- ID number (valid 9-digit number)
- Gender
- Age (between 0 and 120)
- HMO name (מכבי | מאוחדת | כללית)
- HMO card number (9-digit)
- Insurance membership tier (זהב | כסף | ארד)
- Provide a confirmation step for users to review and correct their information.
Note: This process should be managed exclusively through the LLM, avoiding any hardcoded question-answer logic or form-based filling in the UI
- Transition to answering questions based on the user's HMO and membership tier.
- Utilize the knowledge base from provided HTML files.
- Pass all necessary user information and conversation history with each request to maintain statelessness.
- Microservice Architecture Implementation
- Technical Proficiency (Azure OpenAI usage, data processing)
- Prompt Engineering and LLM Utilization
- Code Quality and Organization
- User Experience
- Performance and Scalability
- Documentation
- Innovation
- Logging and Monitoring Implementation
- Provide source code via GitHub.
- Include setup and run instructions.
Good luck! For any questions, feel free to contact me.