Rom1-J/TXT2CSV

TXT2CSV


A script that converts large .txt files (or files in any other text format) to .csv using a regular expression.


Installation

From Sources

$ git clone https://github.com/Rom1-J/TXT2CSV
$ cd TXT2CSV
$ make build  # assumes Go is already installed on your system

The executable is then available in the dist/ directory.

From Builds


Usage

$ ./txt2csv -h
# Usage of ./main:
#   -input string
#         Input file
#   -output string
#         Output file (default "stdout")
#   -regex string
#         Regex to use
#   -threads int
#         Number of threads to use (default 12)

Examples

$ time ./txt2csv -input=extra/example/input.txt -regex="(?P<uuid_a>(?:[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12})):(?P<random>(?:\w|\s|\:)+):(?P<uuid_b>(?:[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}))" -threads=48 -output=extra/example/result.csv
# CSV header: [uuid_a random uuid_b garbage]
# Regex: (?P<uuid_a>(?:[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12})):(?P<random>(?:\w|\s|\:)+):(?P<uuid_b>(?:[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}))
# Threads: 48
# Done!
# ./txt2csv -input=extra/example/input.txt  -threads=48 -output=extra/example/result.csv  0.06s user 0.00s system 450% cpu 0.015 total

$ cd extra/example
$ python verify.py
# Test passed!

Performance tests

Specs:

  • CPU: Intel i7-9750H (12) @ 4.500GHz
  • Disk: NVMe
  • Memory: 32GB

Sample:

  • Size: ~890MB
  • Lines: 15,271,670
  • Regex: (?P<value_a>.*):(?P<value_b>[\w.-]+@[\w.-]+):(?P<value_c>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<value_d>.*)

Runs:

  • 135.09s user 3.39s system 1080% cpu 12.820 total
  • 136.42s user 3.61s system 1087% cpu 12.879 total
  • 136.58s user 3.48s system 1083% cpu 12.927 total

License

FOSSA Status
