This repository contains an implementation of a multi-module inverted index system developed as part of a university course on Parallel Computing.
The project focuses on efficient text indexing and search over large document collections, with an emphasis on multithreading, scalability and performance evaluation.
The system is built around a custom implementation of an inverted index data structure and supports concurrent indexing and querying using Java multithreading tools.
The solution follows a client–server architecture and includes a dedicated module for performance comparison under different workloads and thread configurations.
The system consists of the following modules:
- invertedindex – core data structure and indexing logic
- server – handles client requests and manages indexing processes
- client – sends search and indexing requests
- api – defines the application-level communication protocol
- performanceComparison – evaluates execution time under varying parameters
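To make the core idea concrete, here is a minimal sketch of a thread-safe inverted index that maps each term to the set of documents containing it. It is not the project's custom implementation: it swaps in the JDK's `ConcurrentHashMap` for the custom structure, and the class name `InvertedIndexSketch`, its methods, and the tokenization rule are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a thread-safe inverted index: term -> set of document IDs. */
final class InvertedIndexSketch {
    private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();

    /** Lowercases the text, splits it on non-letter/non-digit runs, and records each term. */
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent
            // writers end up sharing a single posting set per term.
            postings.computeIfAbsent(token, t -> ConcurrentHashMap.newKeySet()).add(docId);
        }
    }

    /** Returns the IDs of the documents containing the term (empty if there are none). */
    Set<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.emptySet() : Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("review-1.txt", "A great movie, truly great.");
        index.addDocument("review-2.txt", "Not a great plot.");
        System.out.println(index.search("great"));
        System.out.println(index.search("plot"));
    }
}
```

Because both the map and the posting sets are concurrent collections, several indexing threads can call `addDocument` while other threads call `search`, without external locking.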
The system follows a client–server architecture with asynchronous request handling.
- The client connects to the server via network sockets and sends search or indexing requests.
- The server manages multiple client connections concurrently using a thread pool.
- If the inverted index is not yet built, the server initiates the indexing process and continuously reports progress back to the client.
- Communication between client and server is implemented via a custom application-level protocol defined in the api module.
This architecture enables scalable concurrent access to the inverted index while maintaining efficient resource utilization.
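The interaction described above can be sketched as a socket server that hands each accepted connection to a fixed thread pool. This is only an illustration under stated assumptions: the class `ThreadPoolServerSketch` and the line-based `SEARCH` / `INDEX` / `PROGRESS` / `DONE` messages are invented here and are not the protocol defined in the api module; the search and indexing work is replaced by placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical sketch: one accept loop, a fixed thread pool, a line-based text protocol. */
final class ThreadPoolServerSketch implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService pool;

    ThreadPoolServerSketch(int port, int workers) throws IOException {
        this.serverSocket = new ServerSocket(port); // port 0 lets the OS pick a free port
        this.pool = Executors.newFixedThreadPool(workers);
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    /** Accepts connections on a background thread; each client is served by a pool worker. */
    void start() {
        Thread acceptor = new Thread(() -> {
            while (true) {
                try {
                    Socket client = serverSocket.accept();
                    pool.submit(() -> handle(client));
                } catch (IOException | RejectedExecutionException e) {
                    return; // the socket was closed or the pool was shut down
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private void handle(Socket client) {
        try (Socket socket = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("SEARCH ")) {
                    // Placeholder: a real server would query the inverted index here.
                    out.println("RESULT " + line.substring("SEARCH ".length()).trim());
                } else if (line.equals("INDEX")) {
                    for (int percent = 25; percent <= 100; percent += 25) {
                        out.println("PROGRESS " + percent); // stand-in for real indexing progress
                    }
                    out.println("DONE");
                } else {
                    out.println("ERROR unknown command");
                }
            }
        } catch (IOException e) {
            // The client disconnected abruptly; try-with-resources has closed the socket.
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close(); // unblocks accept()
        pool.shutdown();
    }

    public static void main(String[] args) throws IOException {
        try (ThreadPoolServerSketch server = new ThreadPoolServerSketch(0, 4)) {
            server.start();
            try (Socket socket = new Socket("localhost", server.port());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println("SEARCH great");
                System.out.println(in.readLine());
            }
        }
    }
}
```

In this sketch a worker stays bound to its connection until the client disconnects, so a pool of N threads serves at most N clients at once; additional connections wait in the pool's queue.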
The system was tested using:
- Unit testing (core services and concurrent data structures)
- Integration testing (client–server interaction)
- Concurrency safety testing
- End-to-end scenarios
A dedicated performance analysis module was implemented to evaluate the impact of parallelization.
Figures: overall execution time vs. number of threads and input size; execution time vs. number of threads and input size.
- Parallel indexing is effective for medium and large datasets
- On medium-sized inputs, performance improves by up to 2×
- For small datasets, parallelization may be inefficient due to thread overhead
- The optimal number of threads equals the number of logical CPU cores
- Increasing threads beyond this limit does not improve performance and may degrade execution time
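A measurement of this kind can be sketched as follows. The code is not the performanceComparison module: the class `ThreadScalingSketch`, the token-counting workload, and the synthetic input are assumptions chosen to keep the example self-contained. It runs the same CPU-bound job with 1, 2, `cores`, and `2 × cores` threads, where `cores` comes from `Runtime.availableProcessors()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: time the same CPU-bound job with different thread counts. */
final class ThreadScalingSketch {

    /** Stand-in workload: counts whitespace-separated tokens in docs[from, to). */
    static long countTokens(List<String> docs, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += docs.get(i).split("\\s+").length;
        }
        return total;
    }

    /** Splits the documents into contiguous chunks, one per thread, and sums the partial counts. */
    static long countInParallel(List<String> docs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (docs.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < docs.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, docs.size());
                Future<Long> part = pool.submit(() -> countTokens(docs, from, to));
                parts.add(part);
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            docs.add("synthetic review number " + i + " with a handful of words");
        }
        int cores = Runtime.getRuntime().availableProcessors(); // logical cores
        for (int threads : new int[] {1, 2, cores, cores * 2}) {
            long start = System.nanoTime();
            long tokens = countInParallel(docs, threads);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + tokens + " tokens in " + millis + " ms");
        }
    }
}
```

The timings include thread-pool creation, which is part of the overhead that makes parallelization unattractive for small inputs. A single run like this is only indicative; a rigorous comparison would add warm-up iterations and repeat each configuration several times.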
Experiments were conducted using subsets of the IMDB movie reviews dataset with varying input sizes to ensure realistic workload conditions.
The developed system demonstrates that:
- Properly designed parallel data structures significantly improve scalability
- Multithreading must be applied selectively, depending on input size
- Custom concurrency control can outperform naive parallel implementations
Before running the project, ensure the following tools are installed:
- Java Development Kit (JDK) (version specified in pom.xml)
- Apache Maven
- Git
Verify installation:
java -version
mvn -version
git --version
Clone the repository from GitHub:
git clone https://github.com/ellyzaveta/course-work-pc.git
cd course-work-pc
The project is a multi-module Maven project. To build all modules and download dependencies, run:
mvn clean install
Each module contains its own configuration file located at:
<module-name>/src/main/resources/application.properties
Before running any module, make sure all required properties are properly configured.
Edit:
server/src/main/resources/application.properties
Configure the following properties:
- directory.path — path to the directory containing input text files
- server.port — port on which the server will run
Edit:
client/src/main/resources/application.properties
Configure:
- server.host — server host (e.g., localhost)
- server.port — must match the server port defined in the server configuration
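A quick way to catch a mismatch between the two files is a small standalone check like the one below. It is a sketch, not part of the project: the class `ConfigCheckSketch` is hypothetical, it reads the properties with plain `java.util.Properties` rather than through the application's own configuration loading, and the path and port values are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch: check that the client and server configurations are consistent. */
final class ConfigCheckSketch {

    /** Parses properties-file content given as a string. */
    static Properties parse(String content) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(content));
        return props;
    }

    /** Returns the trimmed value of a required property or fails with a descriptive message. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    /** True when both configurations name the same server port. */
    static boolean portsMatch(Properties server, Properties client) {
        return Integer.parseInt(require(server, "server.port"))
                == Integer.parseInt(require(client, "server.port"));
    }

    public static void main(String[] args) throws IOException {
        // Example values only; the directory path and port are placeholders.
        Properties server = parse("directory.path=/path/to/input/files\nserver.port=8080\n");
        Properties client = parse("server.host=localhost\nserver.port=8080\n");
        require(server, "directory.path");
        require(client, "server.host");
        System.out.println("Ports match: " + portsMatch(server, client));
    }
}
```

To check real files, the same `Properties.load` call accepts a `java.io.FileReader` opened on each `application.properties`.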
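Start the server (from the repository root):
cd server
mvn spring-boot:run
Then start the client in a separate terminal (again from the repository root):
cd client
mvn spring-boot:run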
Configuration
Edit:
performancecomparison/src/main/resources/application.properties
Configure:
- performance.testdata.paths — list of input file paths of different sizes
- server.port — port for the performance comparison server
Run Performance Comparison
cd performancecomparison
mvn spring-boot:run
After the application starts, open a browser and navigate to:
http://localhost:{port}
where {port} is the value specified in application.properties.
course-work-pc/
│
├── docs/                     # Documentation and results
│   ├── architecture/         # UML and architecture diagrams
│   └── results/              # Performance evaluation charts
│
├── invertedindex/            # Inverted index core module
├── server/                   # Server-side implementation
├── client/                   # Client application
├── api/                      # Application-level protocol
├── performanceComparison/    # Performance analysis module
│
└── README.md                 # Project overview and results

