4CAT Versions

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

v1.42

3 weeks ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

When updating a Docker-based 4CAT, the front-end interface may fail to restart properly, marked by an error message like 'Error upgrading front-end container' in the restart log. In this case please run an upgrade via Docker Desktop or the command line as indicated on this page.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

  • Fix a bug in the restart procedure that could result in the front-end container failing to restart and upgrade when running 4CAT via Docker (765f29e9232afdf284ab1667b0f371951e0bf2f4)
  • Fix a bug that could result in a processor crash when trying to filter datasets for a string on columns containing numeric values (537d76456e2826e8c4dd7026ec5b2d436370fad8); see the sketch after this list
  • Fix a bug that could result in a worker crash when importing CrowdTangle-formatted CSV files (91c3da176fad90ba16871fa8892fac5a0df13785)
  • Fix an issue with mapping Twitter data that could result in a crash (43c6ed646994111188bde66d5bcfe4ab602e8512)
  • Added the possibility to create notifications for all users with a certain tag in the Control Panel (c43e76daae3c2e6ecdb218ee749315b985eccca4)
  • Added a data source for importing TikTok comment data from Zeeschuimer (50a4434a37d71af6a9470c7fc4a236b043cbfb4d)
  • Updated the default 4CAT configuration to enable the import of Gab and TikTok Comment data (342a4037411e7ccaa50b25a4686434bec39e2568)
  • Updated Douyin item mapping to properly process items not assigned to a specific room (6918baeabc7a08b6a63495c5d38c86b2c88bca44, 1fd78b2362840299e80f5540c9fedc1be3b06da1)
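
The string-filter fix mentioned above addresses a common pitfall when dataset columns mix text and numbers. The sketch below shows the general idea of coercing every value to a string before matching; it is a minimal illustration, not 4CAT's actual implementation.

```python
# Minimal, hypothetical sketch: filter rows on a substring match, coercing
# values to strings first so that numeric columns do not raise a TypeError.
def filter_rows(rows, column, needle):
    """Yield rows whose value in `column` contains `needle` as a substring."""
    for row in rows:
        value = row.get(column, "")
        # str() handles ints, floats and None; `needle in 1500` would crash
        if needle in str(value):
            yield row

rows = [{"id": 1, "views": 1500}, {"id": 2, "views": "150"}, {"id": 3, "views": None}]
print(list(filter_rows(rows, "views", "150")))  # matches the first two rows
```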

v1.41

1 month ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or if you encounter issues when upgrading via the web UI.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

Processors

  • Fixed an issue with the image downloading processor where it would not properly follow links to images for 4chan datasets (e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed)
  • Fixed an issue with the TF-IDF processor where results would be off if fewer results than the requested top n results were available (44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27)
  • Fixed a rare crash that could occur when a processor would encounter a FileNotFound exception while a Slack webhook was configured for logging (131a0eca0ad514b1ee57803e5c560ab0e56de42d, #422)
  • Updated dataset filters to give filtered datasets a more context-sensitive name, based on the original's name as well as the filter type (3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3)
  • Updated the PixPlot processor to allow for a longer run time (2582538303e31470ed6bf8a01645f7b45af15e5d)
  • Added a dedicated processor for downloading Telegram videos, replacing the generic one for datasets from that data source (94c814b9cab2ae2be10d5c5d3f6cfe20898e349c, 3f15410af3a278f5644f41f49e25498a1fac3c76)
  • Added 'emoji' count option to 'Count values' processor, to count how often emoji occur in a dataset (bb50fc946fb6cdd8454969514bdc6d5ecf3f3530); a toy sketch follows after this list
  • Added 'Fetch URL metadata' processor, to fetch details about URLs mentioned in a dataset (a0baae17d8f11e4cae7cc261f8d406b1b1ce628a)
  • Added options to the Telegram image downloader to fetch link preview thumbnails (8a7da5317defdafb5bdbf74dcbeb68e464fa21f4)
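
As a toy illustration of what the new 'emoji' count option involves, the sketch below tallies emoji across a handful of texts. It assumes the third-party 'emoji' Python package and is not 4CAT's actual code.

```python
# Hypothetical sketch of counting emoji in a dataset; assumes `pip install emoji`.
from collections import Counter

import emoji

def count_emoji(texts):
    """Return a Counter mapping each emoji to how often it occurs in `texts`."""
    counts = Counter()
    for text in texts:
        # emoji_list() yields one entry per emoji occurrence found in the string
        counts.update(match["emoji"] for match in emoji.emoji_list(text))
    return counts

print(count_emoji(["great post 🔥🔥", "so true 😂"]).most_common(5))
```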

Data sources

  • Fixed an issue with Telegram datasets that made items not have unique IDs in certain situations (a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46)
  • Fixed an issue when mapping Instagram datasets where a crash could occur if the 'full_name' of a user could not be determined (fa3be93bafef17e95881207604efa1212d562d9e)
  • Fixed an issue with the Telegram data source where the 'max messages to fetch' setting would not be parsed correctly (d749237ec5c103b286ba8086904e405e232fc14c)
  • Updated the warnings shown to users when an imported dataset contains items that could not be imported properly or that were missing some data, so that these warnings are more accurate (db05ae5e565248e865e67b8ea60e6653357bb1f4, #418)
  • Added columns with reactions, link details and number of forwards to Telegram dataset CSV exports (e653e3d8fb9c01697d96316df6f7634454671191, e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2)
  • Added support for image galleries to the Douyin data source (876f4a4b6df51ec4b30a048c32191438b6778f90)
  • Added a 4CAT setting to control the number of entities that can be fetched at a time via the Telegram data source (cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae)

Web interface

  • Fixed an issue where the UI would not prompt for confirmation when deleting a configuration tag (39f2ec40faa3b8493bd5525279aeaeb2e4f586e0)
  • Fixed an issue where deleting a user: tag would delete all user: tags (9b4981d8c7358f31ed65d9f161d556e578389801)
  • Fixed an issue where the colour of the favicon would revert to pink in certain situations (073587efc581adca0608988573ac83ea8b0c93d0)
  • Fixed an issue where the 'Request access' link would be visible on the login page even if requesting access was disabled (28d733d56204231f4089660ff61282174aac7aed, 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f)
  • Fixed an issue where the control panel could be unresponsive when 4CAT's data folder was very large; disk usage is now calculated every few hours in the background (c8ad90b3436cff600320d3b2efdf6144240ea59d); a sketch of the idea follows after this list
  • Fixed an issue where the configuration tag priority order could be edited via the Settings page; use the Configuration Tags page instead (ae1c00fb3a521a2c3258b2597b04322d202c3ee7)
  • Updated the user filter on the user list page of the control panel to be case-insensitive (940bac72c7e53bec9e136867c13e2a0a355961a4)
  • Updated the layout of the control panel's Settings page to make it easier to navigate (d36254a188947fff507e8df59f793e98b3be1570)
  • Updated the 'Share' dialog on dataset pages to allow comma-separated multiple item entry (6d8cb067bc12f8be68749f74a7291e0849494225)
  • Updated some processors to hide/show certain options depending on the value of other options chosen (#397)
  • Updated the CSV preview in the web UI to make hyperlinks clickable (daa7291e813e62fed4600a4acb8430004836cb86)
  • Added links to a list of users with the tag to the 'Configuration tags' page in the control panel (9b4981d8c7358f31ed65d9f161d556e578389801)
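
The disk usage fix above boils down to moving an expensive folder walk off the request path. A rough sketch of that idea, assuming a cached figure that is refreshed periodically, follows; it is not 4CAT's actual implementation.

```python
# Hypothetical sketch: compute the size of a large data folder occasionally and
# serve the cached figure to the control panel instead of walking the folder on
# every page load.
import os
import time

CACHE_SECONDS = 3 * 3600  # refresh every few hours
_cache = {"size": None, "checked": 0.0}

def folder_size(path):
    """Total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may disappear while we are walking
    return total

def cached_folder_size(path):
    """Return a cached size, recomputing only when the cache has expired."""
    if _cache["size"] is None or time.time() - _cache["checked"] > CACHE_SECONDS:
        _cache["size"] = folder_size(path)
        _cache["checked"] = time.time()
    return _cache["size"]
```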

Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.40...v1.41

v1.40

2 months ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

When updating a Docker-based 4CAT, upgrading to this version may fail or appear not to have made any changes the first time. This is due to a bug that is fixed in this version. If this happens to you, follow the 'Docker - how to upgrade with command' instructions via the link above.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

  • Optimised Docker build (93ddb4b6bd4b4d17b959919f6ab944f8878b1895)
  • Fix an issue with migrating/upgrading to a new version inside a Docker-based 4CAT (aeb8090d64c250e3b7473ce21cfc9eaff088cb19)
  • Fix an issue where the video downloader could fail when a link redirected too often or a video lacked a content type header (97209cbaeb46014a0de8b1e86e304e2725a5f12c)
  • Fix an issue where Twitter datasets exported as CSV could have different columns depending on the date the dataset was imported into 4CAT (ce2b2d5674881d850470ff49bcad22f87e7a45a0)
  • Fix an issue where the 4CAT front-end log would not start properly when running 4CAT via gunicorn but not inside a Docker container (84168e945e2ecf963cfdac3409d60544b521f694)
  • Update LinkedIn item mapper to handle recently collected datasets (38a865ef6eb3f487a5525de47a44c9e7048b4073)
  • Update Douban capture module to properly collect comment like counts (e1211c73735374086ff9098a161f088c730b4e1c)
  • Add various explanatory tooltips to dataset result pages (c0aa4c75a40c0f1316d1440cf61039c7371803ec)
  • Make 4CAT more robust in how it maps content imported with Zeeschuimer into CSV files (#409); a general sketch of this kind of defensive mapping follows below
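
A general sketch of what 'more robust' mapping tends to look like: read fields with defaults and skip items that cannot be mapped, rather than letting a single malformed item abort the import. The field names below are invented for illustration; this is not 4CAT's actual mapping code.

```python
# Hypothetical sketch of defensive item mapping for captured data. The keys
# ("author", "text", "created_at", ...) are made-up examples.
def map_item(item):
    """Map one captured item to a flat dict of CSV-safe values."""
    author = item.get("author") or {}
    return {
        "id": item.get("id", ""),
        "author": author.get("username", ""),
        "body": item.get("text", ""),
        "timestamp": item.get("created_at", ""),
    }

def map_items(items):
    rows, skipped = [], 0
    for item in items:
        try:
            rows.append(map_item(item))
        except (AttributeError, TypeError):
            skipped += 1  # report these as a warning to the user instead of crashing
    return rows, skipped
```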

v1.39

3 months ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

  • Fix several issues that could occur when trying to import CSV files (#404, 9963cd868a1eb8c7c4015152fe0ca5b51241d685, 02acd865415dd6a1d5b9e9675bcaab050fe2cf13, 0049de694f010e9d08f5c27658663c9799b93b94, 151d498568ce328848a85034a5eddce9d30ef5b8, cdfe75c7eca937bf5977cb26ee3d09ff25fce923)
  • Fix an issue where the canonical host name for a Docker-based 4CAT front-end would not always be set correctly (#395, #403, 0d1dc05605d012ab2bb1d1ff25620ca6c82856b5)
  • Fix an issue where tags would linger in the database after they were no longer associated with any user, or would be stored in the wrong order of preference (c38a2158524bfdb4469889bdbdc28b85535306af, 79f58bf703a1c7b1d2e068c5a8107c2d16329c13)
  • Fix an issue where uploads from Zeeschuimer would fail for data sources whose ID starts with a number (e.g. 9gag)
  • Fix an issue where creating a user would crash the front-end if a user with the same e-mail already exists (14c1f9ce12c8104418fa20318e109132e0a60d92)
  • Fix a crash in the 'Post/topic matrix' processor (be6ea8c21a51334dc18b005dc942bca6482aa05c)
  • Fix an issue where the user-specific setting for the max amount of downloaded TikTok videos would be ignored (df2462fc46f63118227c6f4c10522972495de6d2)
  • Fix a crash when duplicating a dataset and copying the dataset's owners (a3fdbadfb64c32495f63535936475544002304b1)
  • Fix an issue where the TikTok metadata fetcher would fail with a 'proxy unavailable' error when trying to run multiple TikTok processors at the same time (04faf2306b660f8a5b81f7a2fb869a0b50df152e)
  • Fix an issue with the Telegram image downloader (23edf4473a4c7dcea21a17f9336182d75ded01b7)
  • Fix an issue where network processors could crash with a divide by zero error if run on an empty dataset (e10cb2e4b8e56cb9d3791b1a51f3de8569cc910e)
  • Add a clearer error message when trying to merge datasets that are not CSV or NDJSON (c5fbe02f59111050bc6c2be3d35cb6493e0eb93a)
  • Add an explicit edge weight to generated networks that is properly recognised by Gephi (c41587079a44842a91e4c9f5539bbefb9a037765)
  • Add the option to only capture the first frame of a video when extracting video frames (b3981c3ca4b5abbf672190ca0a6d46a5f3f9dc74)
  • Add various image processors (e.g. image wall) as supported for video frame datasets (588290ad92856698b4fad883743b9840cfb2886f)
  • Add improved compatibility between video hash processor and image classification processor so that video frames can be visually categorised (a4e6904fe21ed174ab6a6246a6008853d352309c)
  • Add data source options to explicitly define the Tumblr API key to use if no 4CAT-wide keys have been configured (fdbdca94e3103b61f11d3ebab116a407dd960cad)
  • Add a 4CAT setting that determines which proxy headers will be taken into account for URLs generated in the front-end (5e47dacbba62413ea566fe369a9b5f985824946b)
  • Update the TikTok import processor to cope with the new data formats provided by Zeeschuimer (f24828b4ee7644cd73094a7652f5ebfd324ec9b6)
  • Update the code that determines the place of a dataset in the queue to be more efficient (239726bde31b83e84ec53785d5cc4b5eab0b2ead)
  • Update dataset importer to stream files, which should prevent issues with very large data files (7a1c4b92f6ff5f5546f3941faa36a6651981fb83); see the sketch after this list
  • Update the type of the jobid column of the jobs table to BIGINT to avoid issues with long-running 4CAT servers (3414a964ebc19949897317c1af7600abdce26da6)
  • Update jobs table to no longer have a useless 'status' column (9f493b2787dbd885ebac2e92d7114566b23daf29)
  • Update processor presets so that they do not linger in an unfinished state if one of their components crashes, but finish with an error instead (b815c5406b342e3dad980ab22be193850a0d1396, ddf9aabddd8ad718a83cb232d2ad7cb534c6e0a4)
  • Update various image dataset processors to produce more compatible .metadata.json files (87ec4d0cb18efbe0bd112dfa40f0928064afabf5)
  • Updated version for the videohash library dependency (5f5e10f90beb32a943c6f30347cdc95258897923)
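
The streaming change mentioned above follows a familiar pattern: parse an uploaded NDJSON file line by line instead of loading it into memory in one go. A minimal sketch of that pattern (not 4CAT's actual importer):

```python
# Hypothetical sketch: stream items from an NDJSON file one line at a time so
# that memory use stays flat regardless of file size.
import json

def iterate_ndjson(path):
    """Yield one parsed item per non-empty line of an NDJSON file."""
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage: items are consumed one at a time
# for item in iterate_ndjson("dataset.ndjson"):
#     handle(item)
```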

v1.38

6 months ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fix for a bug introduced in the previous release, v1.37:

  • Fixed an issue that made the CSV upload data source never get past the 'please define your columns' stage when uploading CSVs with a custom format.

v1.37

6 months ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following features and fixes:

Processors

  • Added a 'top hashtags' processor for datasets that contain hashtags (this is a preset for the 'count values' processor)
  • Added more configuration options for the image wall processors to limit how large datasets can be
  • Fixed a bug that made the co-link processor crash when used with particularly small datasets
  • Fixed various issues with processing data via the DMI Service Manager

Data sources

  • Data parsers for data imported from Zeeschuimer have been updated to support data captured from the current versions of the supported platforms.
  • Added a data source that allows importing datasets from other 4CAT servers (#352, #375). This is not enabled by default but can be enabled in the Control Panel.
  • Fixed an issue where CSV files would erroneously be detected as having no header rows upon importing them (#392)
  • Fixed a number of issues with Telegram data parsing (#368, #371)

Deployment and configuration

  • 4CAT will now update Python libraries to their latest compatible version when running migrate.py or upgrading via the control panel.
  • Docker images are now published for both ARM and x64 processor architectures (#392)
  • Added a button to the 'Restart or upgrade' control panel page to restart only the front-end
  • Added the option to migrate to a development branch of 4CAT via the control panel's "Upgrade" page. This requires enabling the 'Can upgrade to development branch' privilege in user settings before it is available.
  • Fixed bugs with restarting the 4CAT front-end via the control panel when running via Apache, gunicorn or uwsgi
  • Fixed a bug where generated URLs could have the wrong scheme when running 4CAT behind a reverse proxy; see the general illustration after this list
  • Fixed a race condition that could cause the front-end container to crash on start-up when using 4CAT via Docker (#378)
  • Fixed a potential issue when installing 4CAT via Docker with the latest version of the Postgres image (#382)
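
For context on the reverse proxy scheme fix: a common way for a Flask/Werkzeug application to respect the scheme and host reported by a proxy is Werkzeug's ProxyFix middleware, shown below as a general illustration. This is not necessarily how 4CAT implements the fix.

```python
# General illustration: trust one hop of X-Forwarded-* headers so that
# externally generated URLs use the scheme and host the proxy reports.
from flask import Flask, url_for
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)

@app.route("/")
def index():
    # Returns an https:// URL when the proxy sends X-Forwarded-Proto: https
    return url_for("index", _external=True)
```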

Interface

  • Added a panel to the control panel which shows the active user tags for the currently logged in user
  • Added a page to the control panel that allows creating many users at once by uploading a CSV file with user data
  • Added a 'User Interface' category to the Settings panel to configure 4CAT's interface, for example to show in-line dataset previews and what to use as the 4CAT 'home page' (#380)
  • Added the option for users to receive an e-mail alert when their dataset is completed; this can be enabled via the 'Show email when complete' option in the 'User interface' settings in the control panel (#329, #385)
  • Added an indication of the precise place in the queue for queued datasets (#239)
  • Added the option to force a particular configuration tag by passing a specific HTTP header. This can be used to serve a different configuration of 4CAT depending on e.g. the domain name used, or other factors determined by the reverse proxy serving 4CAT (#380); a hypothetical sketch follows after this list
  • Fixed an issue with the manipulation of user tags via the control panel (#383, #384)
  • Fixed an issue with changing the ownership for many datasets at once via the 'Bulk dataset management' page
  • Fixed an issue that allowed the 'About' page to appear twice in the site navigation
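
The header-based configuration tag is easiest to picture with a small sketch: the reverse proxy sets a header per domain, and the front-end uses it to pick a configuration tag. The header name and tag lookup below are hypothetical; check the 4CAT settings for the header it actually expects.

```python
# Hypothetical sketch of selecting a configuration tag from a proxy-set header.
# The header name "X-4CAT-Config-Tag" is made up for illustration.
from flask import Flask, request

app = Flask(__name__)
KNOWN_TAGS = {"tenant-a", "tenant-b"}

def active_config_tag():
    """Return the configuration tag forced via the proxy header, if it is known."""
    tag = request.headers.get("X-4CAT-Config-Tag", "")
    return tag if tag in KNOWN_TAGS else None
```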

v1.36

10 months ago

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes for bugs introduced in the previous release, v1.35:

  • Fixed an issue where 4CAT would not be able to properly fetch the latest version from GitHub when upgrading
  • Fixed an issue where admin users were not able to manage user notifications
  • Fixed the date filter on the 'Dataset bulk management' page
  • Fixed an issue with the expiration worker that would not actually delete expired datasets

Additionally, this release incorporates the following features and fixes:

  • Fixed an issue where imported Twitter data with withheld quote tweets could not be parsed
  • Fixed an issue where imported CrowdTangle datasets with Facebook data could not be parsed if they had an empty 'Post Created ' column
  • Fixed an issue where imported Douyin datasets with incomplete post data could not be parsed
  • Fixed an issue where the tokeniser would crash when running it on a dataset without a 'body' column if not specifying a column to extract tokens from
  • The upgrade dialog in the control panel now links to the release notes of the latest available upgrade

v1.35

10 months ago

This release of 4CAT fixes the following bugs that were introduced in the previous release, v1.34:

  • Fixed an issue when upgrading from an older version where the datasets table would not correctly be upgraded to the new scheme
  • Fixed an issue where a dataset's parent key could be NULL or an empty string; it should now always be an empty string
  • Fixed an issue where filtering datasets by user would have no effect
  • Fixed an issue where datasets made with a filter processor would not have the same ownership as the original dataset
  • Fixed an issue where deleting a child dataset would redirect to the dataset overview page instead of the parent dataset result page
  • Fixed an issue where the CLIP image processor would not read the correct configuration values
  • Fixed permalinks to processor results
  • Fixed crash in the worker that deletes expired datasets and users
  • Fixed the link to Gephi Lite in network previews
  • Added POSTGRES_TAG to .env, allowing users to choose which Postgres database image they wish to use. The Postgres 15 Docker image is incompatible with Postgres 14; users may wish to set POSTGRES_TAG=14 to continue using version 14, or otherwise follow the Postgres instructions on how to upgrade. This should not be an issue if you are upgrading 4CAT through the web interface.

This release changes the way dataset expiration works, to avoid situations where it is ambiguous whether a dataset should 'expire' and be deleted automatically. Expiration can now only be configured per data source, not globally. To make this easier, controls have been added to the control panel to set the expiration time for multiple data sources at once, and to 'keep' or 'expire' datasets in bulk.

⚠️ If you are upgrading 4CAT and had datasets or data sources set to expire, 4CAT will automatically disable all expiration ⚠️ to avoid datasets expiring inadvertently. Please inspect the relevant settings and adjust as needed after upgrading. You can find the controls at the 'Data sources' tab in the 4CAT settings and the 'Dataset bulk management' page, both in the control panel.

Additionally, this release of 4CAT incorporates the following new features and fixes:

  • Added a page to the control panel that shows the latest 250 lines of the backend log files
  • Added a page to the control panel where datasets can be managed in bulk
  • Added a column link_url to the CSV export of LinkedIn datasets containing the link embedded in a post
  • Added a column is_withheld to the CSV export of Twitter datasets imported with Zeeschuimer indicating if a tweet was withheld (withheld tweets could previously crash data exports)
  • Added a user privilege which controls whether users can manipulate datasets they do not own (disabled by default, enabled for admins)
  • Added Explorer styling for Douyin datasets
  • Added a setting controlling how many images the PixPlot processor can process
  • Fixed an issue with the TikTok URLs downloader which would erroneously try to capture posts behind a login wall
  • Streamlined annotating data via the Explorer and subsequent usage of annotations
  • The co-link network processor will now no longer add redundant loops; see the illustration after this list
  • The 'data sources' setting in the control panel is now easier to manipulate and has more explanatory information
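
Assuming 'redundant loops' refers to self-loops (edges from a node to itself), the general operation looks like the networkx illustration below; it is shown for context and is not 4CAT's actual code.

```python
# General illustration: drop self-loop edges from a co-link style network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("example.com", "example.com"),  # self-loop, carries no co-link information
    ("example.com", "other.org"),
])

G.remove_edges_from(list(nx.selfloop_edges(G)))
print(list(G.edges()))  # [('example.com', 'other.org')]
```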

v1.34

10 months ago

⚠️ Docker 4CAT releases v1.30 to v1.33 have a bug where upgrading 4CAT would never complete due to issues fetching the latest version. Please follow these instructions for upgrading if you are running one of these versions. You can find your 4CAT version by going to the Control Panel and clicking the 'Restart or Upgrade' button. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following new features and fixes:

Processors

  • New processor to extract audio to files from downloaded videos
  • New processors to work with video files (require ffmpeg)
    • Processors to download videos generically (with yt-dlp) and specifically for TikTok (#343)
    • Processors to detect scenes in videos and capture frames from videos to images (with ffmpeg)
    • Processors to render captured frames to timelines
    • Processor to merge multiple videos into one 'video stack'
  • New machine learning-based processors, via support for containerised processors run through the in-progress DMI Service Manager
    • New processor to convert speech-to-text with Whisper AI
    • New processor to categorise images with OpenAI's CLIP model and visualise results
  • Fix an issue where the NDJSON to CSV processor would not include all NDJSON fields in the CSV output; see the sketch after this list
  • Fix an issue with the Word Embeddings processors that crashed when using certain prevalence thresholds (#353)
  • The tokeniser and CSV converter can now run on NDJSON files
  • The column filter processor can now ignore letter case
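
The NDJSON-to-CSV fix mentioned above comes down to one detail: different items can have different fields, so the CSV header has to be the union of all keys rather than just the keys of the first item. A rough sketch of that approach (not 4CAT's actual processor):

```python
# Hypothetical sketch: convert NDJSON to CSV, including every field that occurs
# anywhere in the file and leaving missing values empty.
import csv
import json

def ndjson_to_csv(ndjson_path, csv_path):
    # First pass: collect every field name, preserving first-seen order
    fieldnames = []
    with open(ndjson_path, encoding="utf-8") as infile:
        for line in infile:
            if line.strip():
                for key in json.loads(line):
                    if key not in fieldnames:
                        fieldnames.append(key)

    # Second pass: write rows; restval fills in fields an item does not have
    with open(ndjson_path, encoding="utf-8") as infile, \
         open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for line in infile:
            if line.strip():
                writer.writerow(json.loads(line))
```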

Data sources

  • Add support for importing Douyin and Twitter data from Zeeschuimer
  • Add a data source for VK/VKontakte, using the VK API (#358)
  • Fix an issue with the Tumblr data source that made it crash when it failed to connect to the API
  • Fix an issue where importing CSV files would crash if certain columns were not included or in the wrong format
  • Fix an issue with the Word Trees processor which would make it crash for certain datasets
  • Disabled Twitter and Reddit data sources by default as the relevant APIs are no longer functional

Deployment and configuration

  • Add --branch command line argument to migrate.py to allow migrating to a specific git repository branch
  • Add environment variable to allow configuring the database port used in Docker builds
  • Fix issue with failing Docker builds due to improperly included dependencies
  • Fix issue where data sources based on imports could not be disabled properly
  • Fix an issue where it was not checked if the user was logged in when exporting datasets to 4CAT
  • Fix an issue where upgrading 4CAT would never complete due to issues fetching the latest version from GitHub (#356)

Interface

  • Add a configuration option to toggle the availability of the 'Request account' page on the login page
  • Add 'Open with Gephi Lite' button to network previews
  • Add a separate control for the secondary interface colour in the 4CAT settings
  • Add an option to the 'Create dataset' page to allow the user to choose how to anonymise/pseudonymise a dataset
  • Add an icon to the dataset overview page indicating if a dataset is scheduled for deletion (#330)
  • Add controls to the dataset page that allow sharing datasets with other users (#311)
  • Add some statistics to the control panel front page and move notifications and user management to their own pages
  • Add the option to assign 'tags' to users, where each tag can override certain configuration settings, so you can configure some users to have different privileges than others (#331)
  • Add current version number to interface footer
  • Add initial support for imported Twitter and TikTok data to the Explorer.
  • Merge data source settings with general settings page in control panel
  • Overhaul of settings page in control panel
  • Fix an issue where options for data sources and processors that were a checkbox were not parsed properly (#336, #337)
  • The 4CAT favicon now automatically matches the interface colour (#364)
  • Upgrade Font Awesome to 6.4.0
  • The default name for datasets imported from Zeeschuimer is now more descriptive

v1.33

1 year ago

This hotfix release fixes the following bug in the previously released v1.32:

  • Somehow we ended up reintroducing the same bug that v1.30 needed a hotfix for. Sorry for the inconvenience!

Instructions for upgrading can be found on the wiki.