4CAT Versions

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

v1.34

10 months ago

⚠️ Docker 4CAT releases v1.30 to v1.33 have a bug where upgrading 4CAT would never complete due to issues fetching the latest version. Please follow these instructions for upgrading if you are running one of these versions. You can find your 4CAT version by going to the Control Panel and clicking the 'Restart or Upgrade' button. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following new features and fixes:

Processors

  • New processor to extract audio tracks from downloaded videos to separate files
  • New processors to work with video files (require ffmpeg)
    • Processors to download videos generically (with yt-dlp) and specifically for TikTok (#343)
    • Processors to detect scenes in videos and capture frames from videos to images (with ffmpeg)
    • Processors to render captured frames to timelines
    • Processor to merge multiple videos into one 'video stack'
  • New machine learning-based processors, run as containerised processors through the in-progress DMI Service Manager
    • New processor to convert speech-to-text with Whisper AI
    • New processor to categorise images with OpenAI's CLIP model and visualise results
  • Fix an issue where the NDJSON to CSV processor would not include all NDJSON fields in the CSV output
  • Fix an issue with the Word Embeddings processors that crashed when using certain prevalence thresholds (#353)
  • The tokeniser and CSV converter can now run on NDJSON files
  • The column filter processor can now ignore letter case
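
The NDJSON-to-CSV fix above boils down to collecting the union of fields across *all* items rather than only the first. A minimal sketch of that idea (the function name and two-pass approach are illustrative assumptions, not 4CAT's actual implementation):

```python
import csv
import io
import json

def ndjson_to_csv(ndjson_text: str) -> str:
    """Convert newline-delimited JSON to CSV, including every field
    that occurs in any item (not just those of the first item)."""
    items = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # First pass: collect the union of all field names, preserving first-seen order
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)
    # Second pass: write rows, leaving missing fields empty
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(items)
    return out.getvalue()
```

Reading the header from the first item only is what drops fields; the two-pass union avoids that at the cost of buffering the dataset.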

Data sources

  • Add support for importing Douyin and Twitter data from Zeeschuimer
  • Add a data source for VK/VKontakte, using the VK API (#358)
  • Fix an issue where the Tumblr data source would crash when failing to connect to the API
  • Fix an issue where importing CSV files would crash if certain columns were not included or in the wrong format
  • Fix an issue where the Word Trees processor would crash for certain datasets
  • Disabled Twitter and Reddit data sources by default as the relevant APIs are no longer functional

Deployment and configuration

  • Add --branch command line argument to migrate.py to allow migrating to a specific git repository branch
  • Add environment variable to allow configuring the database port used in Docker builds
  • Fix issue with failing Docker builds due to improperly included dependencies
  • Fix issue where data sources based on imports could not be disabled properly
  • Fix an issue where the user's login status was not checked when exporting datasets to 4CAT
  • Fix an issue where upgrading 4CAT would never complete due to issues fetching the latest version from GitHub (#356)

Interface

  • Add a configuration option to toggle the availability of the 'Request account' page on the login page
  • Add 'Open with Gephi Lite' button to network previews
  • Add a separate control for the secondary interface colour in the 4CAT settings
  • Add an option to the 'Create dataset' page to allow the user to choose how to anonymise/pseudonymise a dataset
  • Add an icon to the dataset overview page indicating if a dataset is scheduled for deletion (#330)
  • Add controls to the dataset page that allow sharing datasets with other users (#311)
  • Add some statistics to the control panel front page and move notifications and user management to their own pages
  • Add the option to assign 'tags' to users, where each tag can override certain configuration settings, so you can configure some users to have different privileges than others (#331)
  • Add current version number to interface footer
  • Add initial support for imported Twitter and TikTok data to the Explorer.
  • Merge data source settings with general settings page in control panel
  • Overhaul of settings page in control panel
  • Fix an issue where options for data sources and processors that were a checkbox were not parsed properly (#336, #337)
  • The 4CAT favicon now automatically matches the interface colour (#364)
  • Upgrade Font Awesome to 6.4.0
  • The default name for datasets imported from Zeeschuimer is now more descriptive
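
The anonymise/pseudonymise option on the 'Create dataset' page can be illustrated with a salted-hash approach: identical authors map to the same stable pseudonym without exposing the original value. The function name, column handling and hashing scheme below are assumptions for illustration, not 4CAT's actual code:

```python
import hashlib

def pseudonymise(rows, columns, salt="4cat"):
    """Replace values in the given columns with a stable salted hash,
    so the same author maps to the same pseudonym across the dataset
    without exposing the original value."""
    result = []
    for row in rows:
        row = dict(row)  # copy so the input rows stay untouched
        for column in columns:
            if row.get(column):
                digest = hashlib.sha256((salt + str(row[column])).encode("utf-8"))
                row[column] = digest.hexdigest()[:16]
        result.append(row)
    return result
```

Full anonymisation would instead blank the columns entirely; pseudonymisation preserves the ability to count posts per (pseudonymous) author.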

v1.33

1 year ago

This hotfix release fixes the following bug in the previously released v1.32:

  • Somehow we ended up reintroducing the same bug v1.30 needed a hotfix for. Sorry for the inconvenience!

Instructions for upgrading can be found on the wiki.

v1.32

1 year ago

This release of 4CAT incorporates the following new features and fixes:

Processors

  • Reworked TikTok video and image downloaders, which can now also download media for posts whose link in the dataset has 'expired'
  • Support for tokenisation of Chinese text in the Tokeniser and Word Tree processors
  • Added a processor for extensive normalisation of URLs (e.g. youtu.be -> youtube.com) for easier link analysis
  • The column filter processor can now be used on any dataset with (mapped) columns
  • The user-tag and co-tag network processors can now be used on any dataset with hashtags
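
The URL normalisation processor mentioned above maps URL variants that point to the same resource onto one canonical form, so links can be counted together. A sketch covering only the youtu.be example from the changelog plus basic host cleanup (4CAT's processor covers many more rules; the function name is an assumption):

```python
from urllib.parse import urlparse, parse_qs

def normalise_url(url: str) -> str:
    """Normalise URL variants that point to the same resource,
    e.g. youtu.be/<id> -> youtube.com/watch?v=<id>."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    if host == "youtu.be":
        # Short links carry the video ID in the path
        return "https://youtube.com/watch?v=" + parsed.path.lstrip("/")
    if host == "youtube.com":
        # Keep only the video ID, dropping tracking/timestamp parameters
        video_id = parse_qs(parsed.query).get("v", [""])[0]
        if video_id:
            return "https://youtube.com/watch?v=" + video_id
    # Fallback: lowercase host, strip query string
    return f"{parsed.scheme}://{host}{parsed.path}"
```

After normalisation, link-frequency analysis counts `youtu.be/abc` and `youtube.com/watch?v=abc` as the same URL.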

Data sources

  • Twitter data collected with Zeeschuimer can now be imported into 4CAT
  • CSV import can now parse Weibo data collected with Bazhuayu
  • CSV import has an option for automatically filling the ID columns so that datasets without per-item IDs can be imported
  • Fixed an issue where imported LinkedIn data could not be parsed properly
  • Fixed an issue where Reddit collection would crash for posts with malformed image metadata
  • Fixed an issue where datasets could be uploaded from Zeeschuimer even when the user was not logged in
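
The option to auto-fill ID columns lets CSVs without per-item IDs be imported by deriving a stable identifier for each row. One way this can work, hashing row content plus position (the hashing scheme and function name here are illustrative assumptions):

```python
import hashlib

def fill_ids(rows, id_field="id"):
    """Give every row a stable ID when the source CSV has none,
    by hashing the row's content together with its position, so
    identical rows at different positions still get distinct IDs."""
    for position, row in enumerate(rows):
        if not row.get(id_field):
            fingerprint = repr(sorted(row.items())) + str(position)
            row[id_field] = hashlib.md5(fingerprint.encode("utf-8")).hexdigest()
    return rows
```

Rows that already carry an ID are left untouched, so partially-filled ID columns also import cleanly.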

Interface & management

  • The main accent colour of the 4CAT interface can now be configured in the 4CAT settings and is randomised upon first accessing a 4CAT instance
  • The name of the 4CAT instance (displayed on top of the interface) is randomised upon first accessing a 4CAT instance
  • Updated jQuery dependency
  • The log file link is now always displayed on dataset result pages, even if the dataset has not finished yet or is empty
  • Clarify that the Docker version of 4CAT can, in fact, use HTTPS
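
Randomising the accent colour on first access amounts to picking a random hue at fixed saturation and lightness, which keeps the colour readable against the interface. A sketch of the idea (not 4CAT's actual code; the seeding parameter is an assumption for reproducibility):

```python
import colorsys
import random

def random_accent_colour(seed=None) -> str:
    """Pick a random but readable accent colour by choosing a random
    hue at fixed lightness/saturation, returned as a hex string."""
    rng = random.Random(seed)
    # colorsys uses HLS order: hue, lightness, saturation (all 0..1)
    r, g, b = colorsys.hls_to_rgb(rng.random(), 0.4, 0.85)
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
```

Varying only the hue means every generated colour has the same contrast against white text, which is why instances end up distinct but equally legible.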

Other

  • Fixed an issue where 4CAT would mysteriously start crashing because of a lack of 'flavour' of Path objects
  • Bumped versions of various dependencies

v1.31

1 year ago

This hotfix release fixes the following bugs in the previously released v1.30:

  • An issue that made it impossible to get out of the 'congratulations, you have updated!' dialog after updating
  • An issue with the configuration of the back-end Docker container that prevented restarts from the 4CAT web interface
  • Some small fixes and tweaks that were just too late for the previous release.

Instructions for upgrading can be found on the wiki.

v1.30

1 year ago

Snapshot of 4CAT as of January 2023. Many changes and fixes since the last official release, including:

New and updated processors

  • A processor for downloading videos and a number of processors to analyse the downloaded videos (#303)
  • A processor to merge multiple datasets into a new combined dataset (#301)
  • The datasets created with Filter processors now have the same type as the dataset that was filtered (#291, #292, #312)
  • An enhanced and more flexible processor to expand shortened URLs in a dataset (#312)
  • Processors to annotate downloaded images with Clarifai and visualise the results as a network.
  • A processor to 'refresh' TikTok data, which attempts to update expired thumbnail and video links (among other things).
  • The 'semantic frame extractor' processor has been removed.
  • The 'pyLDAvis' processor has been removed (the package it relied on is unmaintained and intermittently broke builds).

New and updated data sources

  • Four new data sources for which data can be imported from Zeeschuimer: 9gag, Imgur, Parler and LinkedIn (the Parler data source was reworked from the existing data source for that platform)
  • Uploading your own CSV data for 4CAT to analyse is much more flexible now and allows you to indicate the CSV column format yourself (#214, #297)
  • 4chan datasets now index whether a post was deleted or not and when creating a dataset it is possible to exclude deleted posts (#306, #309)
  • The Reddit data source has been updated to conform with changes to the Pushshift API (#327)
  • The 'parliament speeches' and 'The Guardian Climate Change comments' data sources have been removed.

Interface updates, 4CAT control & management

  • Switching between data sources on the 'Create dataset' page no longer shows erroneous "Invalid data source selected" popups (#314)
  • The 'All datasets' page is no longer available to non-admin users.
  • A separate control panel for toggling the availability of data sources and setting automatic expiration for datasets created with that data source (#310)
  • When 4CAT is first installed, it now optionally tells us (the developers) that it has been installed (#284, #308)
  • The port through which 4CAT should connect to a mail server can now be configured properly (#299, #302)
  • The 4cat_backend container now logs directly to the container entrypoint's output stream so it can be viewed with docker logs.

And many smaller fixes & updates. If you are running at least 4CAT 1.29, you can update your 4CAT instance via the 'Restart & upgrade' button in the Control Panel.

v1.29

1 year ago

Snapshot of 4CAT as of October 2022. Many changes and fixes since the last official release, including:

  • Restart and upgrade 4CAT via the web interface (#181, #287, #288)
  • Addition of several processors for Twitter datasets to increase inter-operability with DMI-TCAT
  • DMI-TCAT data source, which can interface with a DMI-TCAT instance to create datasets from tweets stored therein (#226)
  • LinkedIn data source, to be used together with Zeeschuimer
  • Fixes & improvements to Docker container set-up and build process (#269, #270, #290)
  • A number of processors have been updated to transparently filter NDJSON datasets instead of turning them into CSV datasets (#253, #282, #291, #292)
  • And many smaller fixes & updates

From this release onwards, 4CAT can be upgraded to the latest release via the Control Panel in the web interface.

v1.26

2 years ago

Many updates:

  • Configuration is now stored in the database and (mostly) editable via the web GUI
  • The Telegram datasource now collects more data and stores the 'raw' message objects as NDJSON
  • Dialogs in the web UI now use custom widgets instead of alert()
  • Twitter datasets will retrieve the expected number of tweets before capturing and ask for confirmation if it is high
  • Various fixes and tweaks to the Dockerfiles
  • New extended data source information pages with details about limitations, caveats, useful links, etc
  • And much more

v1.25

2 years ago

Snapshot of 4CAT as of 24 February 2022. Many changes and fixes since the last official release, including:

  • Explore and annotate your datasets interactively with the new Explorer (beta)
  • Datasets can be set to automatically get deleted after a set amount of time, and can be made private
  • Incremental refinement of the web interface
  • Twitter datasets can be exported to a DMI-TCAT instance
  • User accounts can now be deactivated (banned)
  • Many smaller fixes and new features

v1.21

2 years ago

Snapshot of 4CAT as of 28 September 2021. Many changes and fixes since the last official release, including:

  • User management via control panel
  • Improved Docker support
  • Improved 4chan data dump import helper scripts
  • Improved country code filtering for 4chan/pol/ datasets
  • More robust and versatile network analysis processors
  • Various new filter processors
  • Topic modeling processor
  • Support for non-academic Twitter API queries
  • Option to download NDJSON datasets as CSV
  • Support for hosting 4CAT with a non-root URL
  • And many more

v1.18a

3 years ago

A release to trigger publication on Zenodo.