Inverted Index

This repository contains an implementation of a multi-module inverted index system developed as part of a university course on Parallel Computing.

The project focuses on efficient text indexing and search over large document collections, with an emphasis on multithreading, scalability and performance evaluation.

Project Overview

The system is built around a custom implementation of an inverted index data structure and supports concurrent indexing and querying using Java multithreading tools.

The solution follows a client–server architecture and includes a dedicated module for performance comparison under different workloads and thread configurations.
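To make the core idea concrete, here is a minimal sketch of an inverted index that tolerates concurrent writers and readers. It is not the repository's implementation: the project describes a custom data structure, whereas this sketch leans on the JDK's `ConcurrentHashMap`. The class name `InvertedIndexSketch`, its method names, and the tokenization rule are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; names and tokenization are assumptions, not the project's API.
class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();

    // Splits the text on anything that is not a letter or digit and records each term.
    void addDocument(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a delimiter
            }
            // computeIfAbsent is atomic, so two threads never create competing sets for one term
            postings.computeIfAbsent(token, t -> ConcurrentHashMap.newKeySet()).add(docId);
        }
    }

    // Returns a read-only view of the matching document ids (empty if the term is unknown).
    Set<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

Several indexing threads can call `addDocument` at the same time while other threads call `search`; a hand-written structure with finer-grained locking, as the project describes, may behave differently under contention.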

Architecture

The system consists of the following modules:

invertedindex – core data structure and indexing logic
server – handles client requests and manages indexing processes
client – sends search and indexing requests
api – defines the application-level communication protocol
performancecomparison – evaluates execution time under varying parameters

Figure: System modules interaction

Client–Server Architecture

The system follows a client–server architecture with asynchronous request handling.

The client connects to the server via network sockets and sends search or indexing requests.
The server manages multiple client connections concurrently using a thread pool.
If the inverted index is not yet built, the server initiates the indexing process and continuously reports progress back to the client.
Communication between client and server is implemented via a custom application-level protocol defined in the api module.

This architecture enables scalable concurrent access to the inverted index while maintaining efficient resource utilization.
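The connection-handling pattern described above can be sketched as follows: one thread accepts sockets and a fixed thread pool serves them. This is a simplified illustration, not the project's server. The class name `PooledServerSketch`, the `poolSize` parameter, and the one-line `RESULT ...` reply are all hypothetical; the real protocol is defined in the api module and is not reproduced here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Illustrative sketch: an accept loop that hands each connection to a fixed pool of workers.
class PooledServerSketch implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final ExecutorService workers;

    // Port 0 asks the OS for any free port; poolSize is a hypothetical tuning parameter.
    PooledServerSketch(int port, int poolSize) throws IOException {
        this.serverSocket = new ServerSocket(port);
        this.workers = Executors.newFixedThreadPool(poolSize);
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Runs the accept loop on its own daemon thread so the caller is not blocked.
    void start() {
        Thread acceptor = new Thread(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept();
                    workers.execute(() -> handle(client));
                } catch (IOException | RejectedExecutionException e) {
                    break; // the socket was closed or the pool was shut down
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Placeholder protocol: read one request line and answer with a single line.
    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String request = in.readLine();
            if (request != null) {
                out.println("RESULT " + request);
            }
        } catch (IOException e) {
            // a broken connection should not take down the worker thread
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
        workers.shutdownNow();
    }
}
```

A fixed pool caps the number of connections served at once, so a burst of clients queues up instead of spawning an unbounded number of threads.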

Figure: Client–Server Architecture

Testing

The system was tested using:

Unit testing (core services and concurrent data structures)
Integration testing (client–server interaction)
Concurrency safety testing
End-to-end scenarios

Results

A dedicated performance analysis module was implemented to evaluate the impact of parallelization.

Figure: Overall execution time vs. number of threads and input size

Figure: Execution time vs. number of threads and input size

Key findings:

Parallel indexing is effective for medium and large datasets
On medium-sized inputs, parallel indexing runs up to 2× faster
For small datasets, parallelization can be inefficient because thread creation and coordination overhead outweighs the gain
In these experiments, the best execution time was reached when the number of threads equaled the number of logical CPU cores
Increasing the thread count beyond that point does not improve performance and may increase execution time
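A measurement of this kind can be approximated with a small harness such as the sketch below. It splits the documents into one slice per thread, builds the postings in parallel, and times the run for one thread, one thread per logical core, and twice that. It is not the project's performance module: `ParallelIndexingSketch`, the synthetic documents, and the chunking strategy are assumptions, and timings taken this way (no warm-up, single run) are only indicative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: index a document list with a configurable number of threads.
class ParallelIndexingSketch {
    // Builds term -> document positions, giving each thread one contiguous slice of the list.
    static Map<String, Set<Integer>> build(List<String> docs, int threads) throws InterruptedException {
        Map<String, Set<Integer>> postings = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (docs.size() + threads - 1) / threads; // ceiling division
        for (int start = 0; start < docs.size(); start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, docs.size());
            pool.execute(() -> {
                for (int id = from; id < to; id++) {
                    for (String term : docs.get(id).toLowerCase(Locale.ROOT).split("\\W+")) {
                        if (!term.isEmpty()) {
                            postings.computeIfAbsent(term, t -> ConcurrentHashMap.newKeySet()).add(id);
                        }
                    }
                }
            });
        }
        pool.shutdown();                          // no new tasks; submitted slices keep running
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all slices to finish
        return postings;
    }

    public static void main(String[] args) throws InterruptedException {
        // Synthetic stand-in for real review texts.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            docs.add("review " + i + " of a movie with plot " + (i % 97));
        }
        int cores = Runtime.getRuntime().availableProcessors(); // logical cores visible to the JVM
        for (int threads : new int[] {1, cores, cores * 2}) {
            long t0 = System.nanoTime();
            build(docs, threads);
            long millis = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf("threads=%d time=%d ms%n", threads, millis);
        }
    }
}
```

`Runtime.getRuntime().availableProcessors()` reports logical cores, which makes it a natural starting point for the thread count when reproducing the comparison.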

Dataset

Experiments were conducted using subsets of the IMDB movie reviews dataset with varying input sizes to ensure realistic workload conditions.

Conclusion

The developed system demonstrates that:

Properly designed parallel data structures significantly improve scalability
Multithreading must be applied selectively, depending on input size
Custom concurrency control can outperform naive parallel implementations

Running the Project

Prerequisites

Before running the project, ensure the following tools are installed:

Java Development Kit (JDK), in the version specified in pom.xml
Apache Maven
Git

Verify installation:

java -version
mvn -version
git --version

Getting the Project

Clone the repository from GitHub:

git clone https://github.com/ellyzaveta/course-work-pc.git
cd course-work-pc

Build the Project

The project is a multi-module Maven project. To build all modules and download dependencies, run:

mvn clean install

Configuration

Each module contains its own configuration file located at:

<module-name>/src/main/resources/application.properties

Before running any module, make sure all required properties are properly configured.

Client–Server Mode

Server Configuration

Edit:

server/src/main/resources/application.properties

Configure the following properties:

directory.path — path to the directory containing input text files
server.port — port on which the server will run

Client Configuration

Edit:

client/src/main/resources/application.properties

Configure:

server.host — server host (e.g., localhost)
server.port — must match the server port defined in the server configuration
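Because a mismatched port is an easy mistake to make, the requirement above can be checked with a few lines of Java. The sketch below parses two property texts with `java.util.Properties` and compares their `server.port` values. The class name `ConfigCheckSketch` and every value shown (the directory path, host, and port 8081) are hypothetical placeholders, and the modules themselves may read their configuration through a different mechanism.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative sketch: verify that client and server agree on server.port.
class ConfigCheckSketch {
    // Parses text in .properties format (key=value per line).
    static Properties parse(String text) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(text));
        return properties;
    }

    // True only when both configurations define server.port with the same value.
    static boolean portsMatch(Properties server, Properties client) {
        String serverPort = server.getProperty("server.port");
        return serverPort != null && serverPort.equals(client.getProperty("server.port"));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical values; substitute your own path, host and port.
        Properties server = parse("directory.path=/data/reviews\nserver.port=8081\n");
        Properties client = parse("server.host=localhost\nserver.port=8081\n");
        System.out.println("ports match: " + portsMatch(server, client));
    }
}
```

To check real files, replace the inline strings with the contents of the two application.properties files.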

Run the Server

cd server
mvn spring-boot:run

Run the Client (in a separate terminal)

cd client
mvn spring-boot:run

Performance Comparison Mode

Configuration

Edit:

performancecomparison/src/main/resources/application.properties

Configure:

performance.testdata.paths — list of input file paths of different sizes
server.port — port for the performance comparison server

Run Performance Comparison

cd performancecomparison
mvn spring-boot:run

After the application starts, open a browser and navigate to:

http://localhost:{port}

where {port} is the value specified in application.properties.

Repository Structure

course-work-pc/ 
│
├── docs/                         # Documentation and results
│   ├── architecture/             # UML and architecture diagrams
│   └── results/                  # Performance evaluation charts
│
├── invertedindex/                # Inverted index core module
├── server/                       # Server-side implementation
├── client/                       # Client application
├── api/                          # Application-level protocol
├── performancecomparison/        # Performance analysis module
│
└── README.md                     # Project overview and results
