# CAPIO: Cross Application Programmable IO

CAPIO is a middleware aimed at injecting streaming capabilities into workflow steps
without changing the application codebase. It has been proven to work with C/C++ binaries, Fortran, Java, Python, and
Bash.

[![codecov](https://codecov.io/gh/High-Performance-IO/capio/graph/badge.svg?token=6ATRB5VJO3)](https://codecov.io/gh/High-Performance-IO/capio) ![CI-Tests](https://github.com/High-Performance-IO/capio/actions/workflows/ci-tests.yaml/badge.svg) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://raw.githubusercontent.com/High-Performance-IO/capio/master/LICENSE)

> [!TIP]
> CAPIO is now multi-backend and dynamic by nature: you do not need MPI to benefit from the in-memory I/O improvements!
> Use an MTCL-provided backend if you want in-memory I/O, or fall back to the file system backend (the default) if
> you just want to coordinate I/O operations between workflow steps.

Compatible with:

- ![Architecture](https://img.shields.io/badge/Architecture-x86__64_/_amd64-50C878.svg)
- ![Architecture](https://img.shields.io/badge/Architecture-RISC--V_(riscv64)-50C878.svg)
- ![Architecture](https://img.shields.io/badge/Architecture-ARM64_coming_soon-red.svg)

---
## Automatic install with Spack

CAPIO is available on Spack! To install it automatically, add the High Performance IO
repository to Spack and then install CAPIO:

```bash
spack repo add https://github.com/High-Performance-IO/hpio-spack.git
spack install capio
```

> [!WARNING]
> This installation method requires Spack >= v1.0.0.

## 🔧 Manual Build and Install

### Dependencies

**Required (to be installed manually):**

- `cmake >= 3.15`
- a compiler supporting `C++20` (or newer)
- `pthreads`

**Fetched and compiled automatically during the CMake configuration phase:**

- [syscall_intercept](https://github.com/pmem/syscall_intercept) - Intercepts and handles Linux system calls
- [Taywee/args](https://github.com/Taywee/args) - Parses user input arguments
- [simdjson/simdjson](https://github.com/simdjson/simdjson) - Fast JSON parsing
- [MTCL](https://github.com/ParaGroup/MTCL) - Provides abstractions over multiple communication backends

### Compile CAPIO

```bash
git clone https://github.com/High-Performance-IO/capio.git capio && cd capio
mkdir build && cd build
cmake ..
cmake --build . -j$(nproc)
sudo cmake --install .
```

To enable logging support, pass `-DCAPIO_LOG=TRUE` during the CMake configuration phase.

---

## 🧑‍💻 Using CAPIO in Your Code

Good news! You **don't need to modify your application code**. Just follow these steps:

### 1. Create a Configuration File *(optional but recommended)*

Write a CAPIO-CL configuration file to inject streaming into your workflow. Refer to
the [CAPIO-CL Docs](https://capio.hpc4ai.it/docs/coord-language/) for details.
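
A minimal CAPIO-CL configuration, adapted from an earlier version of this README: `file0.dat` is committed `on_close` (reads block until the writer closes the file), `file1.dat` additionally declares `no_update` mode (data is never modified once written, enabling append-style streaming), and `file2.dat` uses `on_termination` (no streaming, the default):

```json
{
  "name": "my_workflow",
  "IO_Graph": [
    {
      "name": "writer",
      "output_stream": ["file0.dat", "file1.dat", "file2.dat"],
      "streaming": [
        { "name": ["file0.dat"], "committed": "on_close" },
        { "name": ["file1.dat"], "committed": "on_close", "mode": "no_update" },
        { "name": ["file2.dat"], "committed": "on_termination" }
      ]
    },
    {
      "name": "reader",
      "input_stream": ["file0.dat", "file1.dat", "file2.dat"]
    }
  ]
}
```

The workflow `"name"` field is the value your steps must export as `CAPIO_WORKFLOW_NAME`, and the step names (`writer`, `reader`) correspond to `CAPIO_APP_NAME`.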

### 2. Launch the Workflow with CAPIO

To launch your workflow with CAPIO, you can follow one of two routes:

#### A) Use `capiorun` for simplified operations

You can simplify the execution of workflow steps with CAPIO using the `capiorun` utility. See the
[`capiorun` documentation](capio-run/readme.md) for usage and examples. `capiorun` provides an easier way to manage
daemon startup and environment preparation, so that you do not need to set up the environment manually.

#### B) Manually launch CAPIO

Launch the CAPIO daemons: start one daemon per node. Optionally set `CAPIO_DIR` to define the CAPIO mount point:

```bash
[CAPIO_DIR=your_capiodir] capio_server -c conf.json
```

> [!CAUTION]
> If `CAPIO_DIR` is not set, it defaults to the current working directory of `capio_server`.

You can now start your application. Set the required environment variables and remember to set `LD_PRELOAD` to the
`libcapio_posix.so` intercepting library:

```bash
CAPIO_DIR=your_capiodir \
CAPIO_WORKFLOW_NAME=wfname \
CAPIO_APP_NAME=appname \
LD_PRELOAD=libcapio_posix.so \
./your_app <args>

killall -USR1 capio_server
```

> [!CAUTION]
> If `CAPIO_APP_NAME` and `CAPIO_WORKFLOW_NAME` are not set (or are set but do not match the values present in the
> CAPIO-CL configuration file), CAPIO will not be able to operate correctly!
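
To make "no code changes" concrete, here is a stand-in producer/consumer pair doing only ordinary file I/O (the file and record names are illustrative, not from the CAPIO docs). Under CAPIO with `"committed": "on_close"`, the consumer could be launched concurrently and its reads would simply block until the producer closes the file:

```shell
#!/bin/sh
# Producer step: an unmodified program writing a file with ordinary I/O.
# When launched with LD_PRELOAD=libcapio_posix.so, these calls are intercepted.
printf 'record1\nrecord2\nrecord3\n' > file0.dat

# Consumer step: reads the same file with ordinary I/O. With the on_close
# semantics, this read would wait for the producer to close file0.dat
# instead of racing with it.
wc -l < file0.dat

rm -f file0.dat
```

Run sequentially without CAPIO, the pair behaves like any shell pipeline over files; CAPIO only changes *when* the consumer's reads are allowed to complete, not what the programs do.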

> [!TIP]
> To gracefully shut down the capio_server instance, just send it the SIGUSR1 signal:
> the capio_server process will then automatically clean up and terminate itself.

---

## ⚙️ Environment Variables

### 🌍 Global

| Variable                | Description                                          |
|-------------------------|------------------------------------------------------|
| `CAPIO_DIR`             | Shared mount point for server and application        |
| `CAPIO_LOG_LEVEL`       | Logging level (requires `-DCAPIO_LOG=TRUE`)          |
| `CAPIO_LOG_PREFIX`      | Log file name prefix (default: `posix_thread_`)      |
| `CAPIO_LOG_DIR`         | Directory for log files (default: `capio_logs`)      |
| `CAPIO_CACHE_LINE_SIZE` | Size of a single CAPIO cache line (default: 256 KB)  |

### 🖥️ Server-Only

| Variable             | Description                                                                  |
|----------------------|------------------------------------------------------------------------------|
| `CAPIO_METADATA_DIR` | Directory for metadata files. Defaults to `CAPIO_DIR`; must be accessible.   |

### 🐧 POSIX-Only (Mandatory)

> ⚠️ These variables are required by CAPIO-POSIX. Without them, your application will not behave as specified in the CAPIO-CL configuration file.

| Variable               | Description                                          |
|------------------------|------------------------------------------------------|
| `CAPIO_WORKFLOW_NAME`  | Must match the `"name"` field in your configuration  |
| `CAPIO_APP_NAME`       | Name of the step within your workflow                |

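Putting the tables together, a launch environment might look like the sketch below. All values are illustrative assumptions (the cache line size is assumed to be expressed in bytes; check your build's defaults):

```shell
#!/bin/sh
# Illustrative values only; substitute your own paths and names.
export CAPIO_DIR=/tmp/capio_mount             # shared mount point for server and app
export CAPIO_LOG_LEVEL=1                      # effective only in -DCAPIO_LOG=TRUE builds
export CAPIO_CACHE_LINE_SIZE=$((512 * 1024))  # assumed bytes: 512 KB instead of the 256 KB default
export CAPIO_WORKFLOW_NAME=my_workflow        # must match "name" in the CAPIO-CL file
export CAPIO_APP_NAME=writer                  # this step's name within the workflow

# Sanity check: list the CAPIO variables visible to child processes.
env | grep '^CAPIO_' | sort
```

Because the variables are exported, they are inherited by any application started afterwards from the same shell.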
---

## 📘 Extended documentation

Documentation and examples are available on the official site:

👉 [https://capio.hpc4ai.it/docs](https://capio.hpc4ai.it/docs)

---

## 🐞 Report Bugs & Get Help

- [Create an issue](https://github.com/High-Performance-IO/capio/issues/new)
- [Official Documentation](https://capio.hpc4ai.it/docs)

---

## 👥 CAPIO Team

Made with ❤️ by:

- Marco Edoardo Santimaria – <marcoedoardo.santimaria@unito.it> (Designer & Maintainer)
- Iacopo Colonnelli – <iacopo.colonnelli@unito.it> (Workflow Support & Maintainer)
- Massimo Torquati – <massimo.torquati@unipi.it> (Designer)
- Marco Aldinucci – <marco.aldinucci@unito.it> (Designer)

**Former Members:**

- Alberto Riccardo Martinelli – <albertoriccardo.martinelli@unito.it> (Designer & Maintainer)
---
## 📜 Publications

[![CAPIO](https://img.shields.io/badge/CAPIO-10.1109/HiPC58850.2023.00031-red)](https://dx.doi.org/10.1109/HiPC58850.2023.00031)
[![CAPIO-CL](https://img.shields.io/badge/CAPIO--CL-10.1007%2Fs10766--025--00789--0-green?style=flat&logo=readthedocs)](https://doi.org/10.1007/s10766-025-00789-0)