Open
97 commits
4f0748e
basic mappings for a pixplot; known bug
dale-wahl Oct 24, 2023
9ca73e0
Merge branch 'master' into cartagrapher
dale-wahl Oct 25, 2023
cae91bc
add in LOTS of necessary steps so it actually works
dale-wahl Oct 26, 2023
6f62b13
add the pixplot template as a base
dale-wahl Oct 26, 2023
d473454
fix up atlas overlapping images & increase res
dale-wahl Oct 26, 2023
7177441
create a "plots" endpoint that uses pixplot_template and theoretical …
dale-wahl Nov 1, 2023
8c38621
fix up some path issues in the js file
dale-wahl Nov 1, 2023
4459394
add plot as preview!
dale-wahl Nov 1, 2023
c0a042e
rename
dale-wahl Nov 2, 2023
97bd8c3
Update .gitignore
dale-wahl Nov 2, 2023
946a167
Update .gitignore
dale-wahl Nov 2, 2023
0ae3382
add pixplot_template images
dale-wahl Nov 2, 2023
372d1ac
Update .gitignore
dale-wahl Nov 2, 2023
fad263d
add additional images for pixplot
dale-wahl Nov 2, 2023
c01c6f4
build paths based on two different sources (assets and data)
dale-wahl Nov 2, 2023
3c8256e
Merge branch 'master' into cartographer
dale-wahl Nov 2, 2023
2cbda77
Merge branch 'master' into cartographer
dale-wahl Nov 7, 2023
3e47291
create preset that auto runs cartographer
dale-wahl Nov 7, 2023
dfae9d9
only run preset image downloader
dale-wahl Nov 7, 2023
9ea862c
remove debug log
dale-wahl Nov 7, 2023
701a480
serve archived files
dale-wahl Nov 29, 2023
1ce0caf
serve archived files via frontend; use those images in cartographer
dale-wahl Nov 30, 2023
d1184b9
fix umap thumbsize when actually grid
dale-wahl Nov 30, 2023
64e5a2c
fix up the cartographer page a bit
dale-wahl Nov 30, 2023
1ca28a2
Merge branch 'master' into cartographer
dale-wahl Nov 30, 2023
e1fd2bf
do NOT change that 128 thumbnail size
dale-wahl Dec 12, 2023
067506b
serve archive files via generator (as opposed to extracting and delet…
dale-wahl Dec 12, 2023
c4ad487
fix adding annotation labels to mapped results
dale-wahl Dec 12, 2023
7ae8315
preview accepts url params to increase number/size of preview
dale-wahl Dec 12, 2023
ab6de26
attach preset to download images (instead of cartographer)
dale-wahl Dec 12, 2023
a6347db
preview zip datasets w/ cartographer if exists
dale-wahl Dec 12, 2023
98c3279
preset does not copy results_file but updates its results_file; clean…
dale-wahl Dec 12, 2023
6cbbf5b
render something for zips when no cartographer exists
dale-wahl Dec 12, 2023
73aa724
Merge branch 'master' into cartographer
dale-wahl Dec 14, 2023
ef0e633
Merge branch 'master' into cartographer
dale-wahl Dec 14, 2023
d07914e
add tiktok and telegram presets to use cartographer
dale-wahl Dec 14, 2023
9cc6e9e
add metadata to plot!
dale-wahl Dec 15, 2023
dd7f26e
modify cartographer to use max amount
dale-wahl Dec 18, 2023
13367be
use collages
dale-wahl Dec 18, 2023
409c3ba
dataset updates: add get_children method, mod get_all_children, allow…
dale-wahl Dec 20, 2023
7b3902f
Merge branch 'master' into cartographer
dale-wahl Dec 20, 2023
9772f53
moved hash_similarity_network.py to video_hasher.py
dale-wahl Dec 20, 2023
562a53b
staticmethod to init a dataset w/o db
dale-wahl Dec 21, 2023
82c8187
prep cartographer to check for coordinate-maps
dale-wahl Dec 21, 2023
adf58ca
create coordinate-map datasets from sigma network preview - disabled
dale-wahl Dec 21, 2023
d7030cc
Merge branch 'master' into cartographer
dale-wahl Jan 9, 2024
89c87e6
allow text on categorical layout only
dale-wahl Jan 12, 2024
b68ed8c
cartographer: enable date layout; and almost categorical (hidden curr…
dale-wahl Jan 12, 2024
e312888
Merge branch 'master' into cartographer
dale-wahl Jan 30, 2024
8e8e0ee
get archived file handle file not found
dale-wahl Jan 30, 2024
8ed5a32
cartographer use archive zip instead of results subfolder
dale-wahl Jan 30, 2024
f8168bd
update to use get_children() dataset method
dale-wahl Jan 31, 2024
ee32662
time some routes in debug mode
dale-wahl Jan 31, 2024
98ede11
If button is hidden (say because you don't want to implement it yet),…
dale-wahl Feb 1, 2024
96c7586
Merge branch 'master' into cartographer
dale-wahl Feb 7, 2024
f903b34
cartographer: fix front sizes on layout change!!!
dale-wahl Feb 7, 2024
6e0a6a2
cartographer: increase character count to display more categories
dale-wahl Feb 7, 2024
0c86246
cartographer: category view works now!
dale-wahl Feb 7, 2024
9d9934f
cartographer: found that stupid floating zero
dale-wahl Feb 8, 2024
98dba8e
pixplot_template: move metadata to left of image view; fix thumbs in …
dale-wahl Feb 8, 2024
dd25db3
cartographer: tested a better categorical point_size
dale-wahl Feb 8, 2024
6229333
fix get_all_children method to allow non instantiated datasets
dale-wahl Feb 20, 2024
c25f8ec
Merge branch 'master' into cartographer
dale-wahl Feb 20, 2024
14fe1f0
remove dataset.get
dale-wahl Feb 20, 2024
4d62dcf
fix typo and remove time_this debug
dale-wahl Feb 20, 2024
5d16065
revert .env change (mistake)
dale-wahl Feb 20, 2024
8970f64
add cartographer for video scenes
dale-wahl Feb 20, 2024
cb370e6
Merge branch 'master' into cartographer
dale-wahl Feb 21, 2024
a04ee29
deactivate video_scene_frames to plot pipeline
dale-wahl Feb 21, 2024
b5ee46b
cartographer handle directories
dale-wahl Feb 21, 2024
e053fee
reenable video-scene-frames preset to plot
dale-wahl Feb 21, 2024
5685d31
remove video-scene-frames preset; breaks other preset's `is_compatible`
dale-wahl Feb 21, 2024
14af493
add ui_only parameter to DataSet.get_available_processors() and Basic…
dale-wahl Feb 29, 2024
77fa3d3
Merge branch 'display_in_ui' into cartographer
dale-wahl Feb 29, 2024
9743254
update image downloaders and presets to use display_in_ui instead of …
dale-wahl Feb 29, 2024
1b6c0c8
don't delete twice
dale-wahl Feb 29, 2024
7720631
preview zip files opens new window as opposed to iframe
dale-wahl Feb 29, 2024
7f68486
Merge branch 'master' into cartographer
dale-wahl May 8, 2024
cb477c9
fix up ui display changes
dale-wahl May 8, 2024
c6bdc04
fix is_hidden from tiktok video to image downloader
dale-wahl May 8, 2024
219f30b
Merge branch 'master' into cartographer
dale-wahl May 28, 2024
ad79e70
fix up max images (if 0, max would always use 0)
dale-wahl May 28, 2024
8bcc5b1
map umap optional!
dale-wahl May 28, 2024
35c1c6d
alphabetic is also optional
dale-wahl May 28, 2024
40c5d41
Merge branch 'master' into cartographer
dale-wahl Jun 16, 2025
966970c
merging is sooooo fun
dale-wahl Jun 16, 2025
3dc91c3
black nor ruff not god could have saved me from all these
dale-wahl Jun 16, 2025
3b676db
remove dup annotation function in dataset
dale-wahl Jun 17, 2025
fc66b76
search.py remove pipeline/next processor logic (already in BasicProce…
dale-wahl Jun 17, 2025
01400dd
tiktok downloader remove redundancies
dale-wahl Jun 18, 2025
cbf32bd
download_videos: children already instantiated
dale-wahl Jun 18, 2025
3589188
views_dataset: fix zip preview to conform w/ changes to csv.html
dale-wahl Jun 18, 2025
de60e5c
try to get Stijn's pixplot updates, fail a lot, get most of them. pro…
dale-wahl Jun 18, 2025
ccdcab4
Merge branch 'master' into cartographer
dale-wahl Jul 3, 2025
80a68f2
fixes to result-parameters for zip
dale-wahl Jul 3, 2025
0b07242
ruff stuff mostly
dale-wahl Jul 4, 2025
edf3040
more merge fixes!
dale-wahl Jul 4, 2025
18 changes: 10 additions & 8 deletions .gitignore
@@ -44,14 +44,6 @@ webtool/venv/
*.ipynb
venv/

# do not ignore interface images
!webtool/static/img/*.png
!webtool/static/img/*.gif
!webtool/static/img/*.jpg
!webtool/static/img/favicon/*.ico
!webtool/static/img/flags/*.png
!common/assets/github-screenshots/*.png

# generated by 4CAT
webtool/static/css/colours.css
webtool/static/img/favicon/favicon.ico
@@ -66,3 +58,13 @@ keys/
images/
sphinx-3.3.1/
sphinx/

# do not ignore interface images
!webtool/static/img/*.png
!webtool/static/img/*.gif
!webtool/static/img/*.jpg
!webtool/static/img/favicon/*.ico
!webtool/static/img/flags/*.png
!webtool/static/pixplot_template/assets/images/*
!webtool/static/pixplot_template/assets/images/icons/*
!common/assets/github-screenshots/*.png
33 changes: 20 additions & 13 deletions backend/lib/processor.py
@@ -662,7 +662,7 @@ def write_csv_items_and_finish(self, data):
self.dataset.update_status("Finished")
self.dataset.finish(len(data))

def write_archive_and_finish(self, files, num_items=None, compression=zipfile.ZIP_STORED, finish=True):
def write_archive_and_finish(self, filelist_or_folder, num_items=None, compression=zipfile.ZIP_STORED, finish=True):
"""
Archive a bunch of files into a zip archive and finish processing

@@ -676,21 +676,28 @@ def write_archive_and_finish(self, files, num_items=None, compression=zipfile.ZI
:param bool finish: Finish the dataset/job afterwards or not?
"""
is_folder = False
if issubclass(type(files), PurePath):
is_folder = files
if not files.exists() or not files.is_dir():
raise RuntimeError("Folder %s is not a folder that can be archived" % files)
if issubclass(type(filelist_or_folder), PurePath):
# folder with files
is_folder = filelist_or_folder
if not filelist_or_folder.exists() or not filelist_or_folder.is_dir():
raise RuntimeError("Folder %s is not a folder that can be archived" % filelist_or_folder)

files = files.glob("*")

# create zip of archive and delete temporary files and folder
# create zip of archive and delete temporary files and folder
self.dataset.update_status("Compressing results into archive")
done = 0
with zipfile.ZipFile(self.dataset.get_results_path(), "w", compression=compression) as zip:
for output_path in files:
zip.write(output_path, output_path.name)
output_path.unlink()
done += 1
with zipfile.ZipFile(self.dataset.get_results_path(), "w", compression=compression) as zipf:
if is_folder:
for root, dirs, files in os.walk(filelist_or_folder):
for file in files:
zipf.write(os.path.join(root, file),
os.path.relpath(os.path.join(root, file), filelist_or_folder))
done += 1
else:
# list of files
for output_path in filelist_or_folder:
zipf.write(output_path, output_path.name)
output_path.unlink()
done += 1

# delete temporary folder
if is_folder:
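
Review note: the folder branch in this hunk walks sub-directories with `os.walk` and stores each file under its path relative to the archived folder, so nested output keeps its layout inside the zip. The standalone sketch below illustrates that pattern only. `zip_folder` and the `staging` layout are hypothetical names for illustration and are not part of 4CAT.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive_path: Path, compression=zipfile.ZIP_STORED) -> int:
    """Archive every file below `folder`, keeping relative sub-paths (hypothetical helper)."""
    done = 0
    with zipfile.ZipFile(archive_path, "w", compression=compression) as zipf:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full_path = Path(root) / name
                # The archive name is relative to the archived folder, so
                # "sub/b.txt" on disk stays "sub/b.txt" inside the zip.
                zipf.write(full_path, full_path.relative_to(folder).as_posix())
                done += 1
    return done


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"
    (staging / "sub").mkdir(parents=True)
    (staging / "a.txt").write_text("a")
    (staging / "sub" / "b.txt").write_text("b")

    # The archive is written outside the staging folder, so it cannot include itself.
    archive = Path(tmp) / "result.zip"
    archived_count = zip_folder(staging, archive)
    with zipfile.ZipFile(archive) as zipf:
        archived_names = sorted(zipf.namelist())
```

Writing the archive outside the walked folder matters: if the zip lived inside `folder`, the walk could pick up the partially written archive.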
5 changes: 4 additions & 1 deletion backend/workers/cleanup_tempfiles.py
@@ -63,10 +63,13 @@ def work(self):
# if for whatever reason there are multiple hashes in the filename,
# the key would always be the last one
key = possible_keys.pop()

try:
dataset = DataSet(key=key, db=self.db)
except DataSetException:
# TODO: would another dataset use the same get_results_folder_path? And additional files?
if self.db.fetchone(f"select * from datasets where result_file = '{file.name}'") is not None:
# Another dataset is using this file
continue
# the dataset has been deleted since, but the result file still
# exists - should be safe to clean up
if file.name not in tracked_files:
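
Review note: the new lookup interpolates `file.name` straight into the SQL string, so a filename containing a quote character would break the query (and the same pattern appears in `DataSet.delete()` with an extra `key != ...` clause). A parameterized query avoids this. The sketch below uses the standard-library `sqlite3` module purely so that it runs on its own; the `result_file_in_use` helper and the two-column table are assumptions for illustration. PostgreSQL drivers such as psycopg2 use `%s` placeholders instead of `?`, and whether 4CAT's `fetchone` wrapper accepts query parameters is not shown in this diff.

```python
import sqlite3


def result_file_in_use(connection, filename, exclude_key=None):
    """Return True if any dataset row (other than `exclude_key`) references `filename`."""
    query = 'SELECT 1 FROM datasets WHERE result_file = ?'
    params = [filename]
    if exclude_key is not None:
        # Used when a dataset asks "does anyone besides me use this file?"
        query += ' AND "key" != ?'
        params.append(exclude_key)
    return connection.execute(query, params).fetchone() is not None


# Minimal stand-in for the datasets table (illustrative schema only).
connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE datasets ("key" TEXT PRIMARY KEY, result_file TEXT)')
connection.execute("INSERT INTO datasets VALUES (?, ?)", ("abc", "images-abc.zip"))
connection.execute("INSERT INTO datasets VALUES (?, ?)", ("def", "images-abc.zip"))

shared = result_file_in_use(connection, "images-abc.zip", exclude_key="abc")
# A quote in the filename is passed as data, not spliced into the SQL text.
tricky = result_file_in_use(connection, "it's-a-file.zip")
```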
29 changes: 15 additions & 14 deletions common/lib/dataset.py
@@ -891,20 +891,21 @@ def delete(self, commit=True, queue=None):
self.db.delete("datasets_owners", where={"key": self.key}, commit=commit)
self.db.delete("users_favourites", where={"key": self.key}, commit=commit)

# delete from drive
try:
if self.get_results_path().exists():
self.get_results_path().unlink()
if self.get_results_path().with_suffix(".log").exists():
self.get_results_path().with_suffix(".log").unlink()
if self.get_results_folder_path().exists():
shutil.rmtree(self.get_results_folder_path())

except FileNotFoundError:
# already deleted, apparently
pass
except PermissionError as e:
self.db.log.error(
# delete from drive if not used elsewhere
if self.db.fetchone(f"select * from datasets where result_file = '{self.get_results_path().name}' and key != '{self.key}'") is None:
try:
if self.get_results_path().exists():
self.get_results_path().unlink()
if self.get_results_path().with_suffix(".log").exists():
self.get_results_path().with_suffix(".log").unlink()
if self.get_results_folder_path().exists():
shutil.rmtree(self.get_results_folder_path())

except FileNotFoundError:
# already deleted, apparently
pass
except PermissionError as e:
self.db.log.error(
f"Could not delete all dataset {self.key} files; they may need to be deleted manually: {e}"
)

25 changes: 25 additions & 0 deletions common/lib/helpers.py
@@ -2,6 +2,7 @@
Miscellaneous helper functions for the 4CAT backend
"""
import subprocess
import zipfile
import imagehash
import hashlib
import requests
@@ -119,6 +120,7 @@ def sniff_encoding(file):

return "utf-8-sig" if maybe_bom == b"\xef\xbb\xbf" else "utf-8"


def sniff_csv_dialect(csv_input):
"""
Determine CSV dialect for an input stream
@@ -141,6 +143,13 @@ def sniff_csv_dialect(csv_input):
return dialect, has_header


def get_html_redirect_page(url):
"""
Returns a html string to redirect to PixPlot.
"""
return f"<head><meta http-equiv='refresh' charset='utf-8' content='0; URL={url}'></head>"


def get_git_branch():
"""
Get current git branch
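
Review note: `get_html_redirect_page` places the URL inside a single-quoted attribute without escaping it, and it puts `charset` on the same `<meta>` element as `http-equiv`, which as far as I know the HTML specification does not allow. A possible variant is sketched below. `build_redirect_page` is a hypothetical name, and the example URL is made up.

```python
from html import escape


def build_redirect_page(url: str) -> str:
    """Return a tiny HTML page that immediately redirects to `url` (hypothetical variant)."""
    # quote=True also escapes ' and ", so the URL cannot close the attribute early.
    safe_url = escape(url, quote=True)
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<meta http-equiv='refresh' content='0; URL={safe_url}'>"
        "</head></html>"
    )


redirect_html = build_redirect_page("http://localhost/result/plot/index.html?a=1&b='x'")
```

Browsers decode the character references when reading the attribute, so the redirect target is unchanged for ordinary URLs.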
@@ -1279,6 +1288,22 @@ def split_urls(url_string, allowed_schemes=None):
return potential_urls


def get_archived_file(archive_path, archived_file, temp_dir):
with zipfile.ZipFile(archive_path, "r") as archive_file:
archive_contents = sorted(archive_file.namelist())

if archived_file in archive_contents:
info = archive_file.getinfo(archived_file)
if info.is_dir():
raise IsADirectoryError("File is a directory")

archive_file.extract(archived_file, temp_dir)

return temp_dir.joinpath(archived_file)

else:
raise FileNotFoundError("File not found in archive")

def folder_size(path='.'):
"""
Get the size of a folder using os.scandir for efficiency
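
Review note: `get_archived_file` builds a sorted copy of `namelist()` only to test membership. `ZipFile.getinfo()` raises `KeyError` for a missing member, so the lookup and the existence check can be one step, and `ZipFile.extract()` already returns the path it created. The sketch below shows that variant under those assumptions; `extract_archived_file` is a hypothetical name and the archive contents are invented for the demonstration.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archived_file(archive_path: Path, archived_file: str, temp_dir: Path) -> Path:
    """Extract one member of a zip archive into `temp_dir` and return its path."""
    with zipfile.ZipFile(archive_path, "r") as archive:
        try:
            info = archive.getinfo(archived_file)
        except KeyError:
            raise FileNotFoundError(f"{archived_file} not found in archive") from None
        if info.is_dir():
            raise IsADirectoryError(f"{archived_file} is a directory")
        # extract() creates missing parent folders and returns the created path.
        return Path(archive.extract(info, temp_dir))


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    archive_path = tmp_path / "images.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("thumbs/cat.txt", "meow")

    extracted = extract_archived_file(archive_path, "thumbs/cat.txt", tmp_path / "out")
    extracted_text = extracted.read_text()

    try:
        extract_archived_file(archive_path, "missing.txt", tmp_path / "out")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```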
10 changes: 2 additions & 8 deletions processors/machine_learning/pix-plot.py
@@ -11,7 +11,7 @@
from werkzeug.utils import secure_filename

from common.lib.dmi_service_manager import DmiServiceManager, DsmOutOfMemory, DmiServiceManagerException
from common.lib.helpers import UserInput, ellipsiate
from common.lib.helpers import UserInput, get_html_redirect_page, ellipsiate
from backend.lib.processor import BasicProcessor

__author__ = "Dale Wahl"
@@ -229,7 +229,7 @@ def process(self):

# Results HTML file redirects to output_dir/index.html
plot_url = ('https://' if self.config.get("flask.https") else 'http://') + self.config.get("flask.server_name") + '/result/' + f"{os.path.relpath(self.dataset.get_results_folder_path(), self.dataset.folder)}/index.html"
html_file = self.get_html_page(plot_url)
html_file = get_html_redirect_page(plot_url)

# Write HTML file
with self.dataset.get_results_path().open("w", encoding="utf-8") as output_file:
@@ -387,12 +387,6 @@ def format_metadata(self, temp_path):
self.dataset.update_status("Metadata.csv created")
return metadata_file_path if rows_written != 0 else False

def get_html_page(self, url):
"""
Returns a html string to redirect to PixPlot.
"""
return f"<head><meta http-equiv='refresh' charset='utf-8' content='0; URL={url}'></head>"

def clean_filename(self, s):
"""
Given a string that points to a filename, return a clean filename
53 changes: 53 additions & 0 deletions processors/networks/coordinate_map.py
@@ -0,0 +1,53 @@
import json

from backend.lib.processor import BasicProcessor


__author__ = "Dale Wahl"
__credits__ = ["Dale Wahl"]
__maintainer__ = "Dale Wahl"
__email__ = "4cat@oilab.eu"

from common.lib.user_input import UserInput


class CoordinateMap(BasicProcessor):
"""
Wrapper DataSet for a JSON file with plot coordinates that can be used by the cartographer to plot images accordingly
"""
type = "coordinate-map" # job type ID
category = "Networks" # category
title = "Coordinate Map" # title displayed in UI
description = "Generate via network \"preview\" and export node coordinates" # description displayed in UI
extension = "json" # extension of result file, used internally and in UI

options= {
"coordinates": {
"type": UserInput.OPTION_TEXT_JSON,
"default": {},
"help": "JSON containing nodes and their coordinates",
"tooltip": "e.g. {'node_1': {'x': 0, 'y': 0}, 'node_2': {'x': 1, 'y': 1}}",
}
}

@classmethod
def is_compatible_with(cls, module=None, config=None):
"""
Currently can only be used by the sigma network visualizer; no 4CAT modules have the appropriate input
"""
# TODO: this needs to be able to run on network datasets, but requires input from sigma preview
# How do we hide from frontend, but still allow is_compatible_with to return True?
# return module.get_extension() == "gexf"
return False # TODO enable this, button in gexf.html when notifications work...

def process(self):
"""
This takes a JSON containing coordinates as input and saves it as a 4CAT Dataset. Designed to be used with
the sigma network visualizer.
"""
json_data = self.parameters.get("coordinates")

with open(self.dataset.get_results_path(), "w") as f:
f.write(json.dumps(json_data))

self.dataset.finish(len(json_data))
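
Review note: `process()` writes whatever arrives in `coordinates` without checking its shape, and the tooltip example uses single quotes, which is a Python dict literal rather than valid JSON. How `UserInput.OPTION_TEXT_JSON` parses its input is not shown in this diff, so the sketch below only illustrates one possible validation step with the standard `json` module. `parse_coordinates` is a hypothetical helper, not an existing 4CAT function.

```python
import json


def parse_coordinates(raw):
    """Validate a node -> {"x": ..., "y": ...} mapping given as a JSON string or dict."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object mapping node IDs to coordinates")

    coordinates = {}
    for node_id, position in data.items():
        if not isinstance(position, dict) or not {"x", "y"} <= position.keys():
            raise ValueError(f"Node {node_id!r} needs both 'x' and 'y'")
        coordinates[str(node_id)] = {"x": float(position["x"]), "y": float(position["y"])}
    return coordinates


# Valid JSON uses double quotes.
parsed = parse_coordinates('{"node_1": {"x": 0, "y": 0}, "node_2": {"x": 1, "y": 1}}')

# The single-quoted form from the tooltip is rejected by json.loads.
try:
    parse_coordinates("{'node_1': {'x': 0, 'y': 0}}")
    single_quotes_accepted = True
except json.JSONDecodeError:
    single_quotes_accepted = False
```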