Final Project for EMGT 308 - Solutions Architecture and the Cloud
SmartDocs Architect is a cloud-hosted, AI-powered document intelligence system built on Retrieval-Augmented Generation (RAG).
Users upload documents, which are converted into vector embeddings. The system then retrieves the semantically relevant passages and answers questions using only the uploaded content.
The current MVP is deployed on AWS EC2 as a single-instance cloud workload.
Cloud Deployment (AWS EC2): http://54.226.223.159:8501/
This project demonstrates core EMGT 308 concepts:
- Cloud deployment on AWS EC2
- Workload hosting in cloud infrastructure
- Layered solution architecture
- Infrastructure and application separation
- Scalable architecture planning
Features:
- Upload PDF and TXT files
- Parse and chunk document text
- Generate embeddings using Gemini API
- Store vectors in ChromaDB
- Ask grounded questions
- Display retrieved source chunks
- Clear knowledge base
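The "parse and chunk" step above can be illustrated with a minimal sketch. It assumes fixed-size character windows with a small overlap; the function name `chunk_text` and the default sizes are hypothetical and are not taken from `app/ingest.py`, which may chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which tends to help retrieval quality at the cost of storing slightly more text.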
Current architecture (MVP):
| User |
| ↓ |
| Internet |
| ↓ |
| AWS EC2 Instance |
| ↓ |
| Streamlit Application |
| ↓ |
| Gemini API + ChromaDB |
Planned scalable architecture:
| User |
| ↓ |
| Application Load Balancer |
| ↓ |
| Auto Scaling Group |
| ↓ |
| Multiple EC2 Instances |
| ↓ |
| Amazon S3 |
Tech stack:
- Python
- Streamlit
- Gemini API
- ChromaDB
- PyPDF
- python-dotenv
- AWS EC2
Project structure:
SmartDocs Architect/
│
├── app.py
├── requirements.txt
├── README.md
├── .gitignore
├── .env.example
│
├── app/
│   ├── __init__.py
│   ├── ingest.py
│   ├── rag.py
│   ├── parsers.py
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
│
├── docs/
│ └── architecture-notes.md
│
├── screenshots/
└── chroma_db/
Workflow:
- Upload document
- Parse text
- Chunk content
- Generate embeddings
- Store vectors
- Ask question
- Retrieve relevant chunks
- Generate grounded answer
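The retrieval and grounding steps in this workflow can be sketched without any external services. In the example below, a bag-of-words count vector stands in for the Gemini embeddings and a plain Python list stands in for ChromaDB; both substitutions are assumptions made so the example runs offline, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical rather than taken from `app/rag.py`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    # Grounding: the model is told to answer from the retrieved text only.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, 1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


chunks = [
    "EC2 hosts the Streamlit application.",
    "ChromaDB stores the vectors.",
    "Bananas are yellow.",
]
question = "Which database stores the vectors?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In the real application the prompt would be sent to the language model, and the retrieved chunks would also be shown to the user as sources.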
Deployment:
- Ubuntu EC2 instance
- Security Group inbound rules:
  - SSH (TCP 22)
  - Streamlit (TCP 8501)
- Streamlit served externally
Mac / Linux:
python3 -m streamlit run app.py --server.port 8501 --server.address 0.0.0.0
Windows:
python -m streamlit run app.py --server.port 8501 --server.address 0.0.0.0
py -m streamlit run app.py --server.port 8501 --server.address 0.0.0.0
App URL: http://54.226.223.159:8501
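After starting Streamlit, one quick way to confirm that the port is reachable (for example, that the security group allows TCP 8501) is a plain TCP connection test. This is a small sketch using only the standard library; the helper name `port_is_open` is hypothetical and not part of the project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For instance, `port_is_open("54.226.223.159", 8501)` checks the deployed instance from a local machine. A `False` result only says the TCP connection failed; it does not distinguish a blocked security group from a stopped application.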
YouTube: https://youtu.be/MLLgwZVvE4Q
Current limitations:
- Single EC2 instance
- Local vector persistence
- No authentication
Future improvements:
- Elastic IP
- Load Balancer
- Auto Scaling
- Amazon S3 storage
Author: Wineel Wilson Dasari




