A GitHub scanning tool that identifies hardcoded credentials while filtering out false positives through machine learning models :lock:
Restructure the project and repackage the tool in a more modern way. Introduce a src layout and a `pyproject.toml` while deprecating `setup.py`.
Version tracking is now delegated to a dynamic file and old package data (old folders for BoW models deprecated in v4.4) have been deleted.
A new badge has been introduced to show which Python versions are currently supported by Credential Digger (this is often a source of confusion, as we rely on external libraries that do not necessarily support the latest available Python versions). This badge will be updated only after the release of the PyPI package (it relies on metadata published there).
The minor version has been increased to better distinguish this release from the old package structure, even though a minor bump was arguably not strictly needed in this case.
- `get_discoveries_with_rules` method in both client and CLI
- `scan_file` method in server/UI
- `get_discoveries` to also return the matching rule of a discovery. An optional `with_rule` parameter has been added (defaults to `False`)

Main features:
- `hyperscan` version 0.2.0 (i.e., based on `libhyperscan5`) for Python 3.8 (was `hyperscan` 0.1.5 before, based on `libhyperscan4`).

Bugfixes details:
`git_username` authentication

We add an optional parameter `git_username` that can be set to authenticate in order to perform a scan.
While this parameter is not mandatory for GitHub (neither .com nor Enterprise), it is needed for some private git servers and for private Bitbucket repos.
If the `git_token` is not set, this parameter is ignored (since the tool cannot authenticate with a username without a token).
Conversely, if the `git_token` is set, the username used for authenticating the tool is either `git_username` (if set) or `oauth2` (the default value, which is the one adopted by GitHub).
The `git_username` parameter is supported not only in the Python library but also in the CLI and in the UI (with a new optional input field).
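The fallback logic described above can be sketched as follows. This is only an illustration of the rule "no token means no authentication, otherwise use `git_username` or `oauth2`", not the actual Credential Digger code: the helper names `resolve_git_credentials` and `build_clone_url` are hypothetical, and embedding the credentials in the HTTPS URL is an assumption about how they could be passed to git.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Default username taken from the release notes (the one adopted by GitHub).
DEFAULT_GIT_USERNAME = "oauth2"


def resolve_git_credentials(git_token=None, git_username=None):
    """Return (username, token), or None when no token is available.

    Hypothetical helper: a username without a token is ignored.
    """
    if not git_token:
        return None
    return (git_username or DEFAULT_GIT_USERNAME, git_token)


def build_clone_url(repo_url, git_token=None, git_username=None):
    """Embed the resolved credentials in an HTTPS repository URL.

    Hypothetical helper; assumes credentials are passed through the URL.
    """
    credentials = resolve_git_credentials(git_token, git_username)
    if credentials is None:
        return repo_url  # anonymous access, URL left untouched
    username, token = credentials
    parts = urlsplit(repo_url)
    # Percent-encode both values so special characters cannot break the URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, calling `build_clone_url` with only a token yields a URL authenticated as `oauth2`, while passing `git_username` without `git_token` returns the URL unchanged.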
- (`add_rule`)
- Make `export_discoveries` more efficient by not loading all the discoveries of a repo (unless needed)

With this release we restructure the ML models in order to improve their precision. Moreover, the new models will be directly integrated in the project, removing the painful download and linking steps needed for the former ones.
All the changes are transparent to the end user (i.e., no API or function definition changed), thus there was no need for a major upgrade to v5.
We decided to deprecate the fastText approach and shift to a regex to filter out false positive file paths. According to our tests, this keeps a good precision while decreasing the overhead.
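A minimal sketch of what a regex-based path filter can look like is given below. The pattern and the function name `is_false_positive_path` are illustrative assumptions, not the regex actually shipped with Credential Digger: here, paths under test, example, documentation or vendored folders, plus a few documentation and lock file extensions, are treated as false positives.

```python
import re

# Hypothetical pattern, for illustration only: folder names that commonly
# hold dummy credentials, plus a few documentation/lock file extensions.
FALSE_POSITIVE_PATH = re.compile(
    r"(^|/)(tests?|examples?|docs?|node_modules|vendor)(/|$)"
    r"|\.(md|rst|lock)$",
    re.IGNORECASE,
)


def is_false_positive_path(file_path):
    """Return True when the path matches the false positive pattern."""
    return FALSE_POSITIVE_PATH.search(file_path) is not None
```

A single compiled regex is evaluated once per file path, which is where the reduced overhead compared to a model inference comes from.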
We decided to deprecate the old fastText double-model (extractor+classifier) approach in order to shift to an NLP approach based on CodeBERT. Overall, it is slower but far more precise, even if it only works for passwords. Hence the change of name from SnippetModel to PasswordModel. Moreover, since the PasswordModel only works for passwords, we added a check in the Client to only run this model over password discoveries.
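The check mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Client implementation: discoveries are modelled as plain dictionaries with hypothetical `id`, `category` and `snippet` keys, and the model is any callable returning `True` when a snippet is a false positive.

```python
def classify_password_discoveries(discoveries, password_model,
                                  password_category="password"):
    """Run the model only on password discoveries.

    Hypothetical helper: returns the ids of the discoveries that the
    model flags as false positives. Other categories are never scored.
    """
    false_positive_ids = []
    for discovery in discoveries:
        if discovery["category"] != password_category:
            continue  # the model only works for passwords, skip the rest
        if password_model(discovery["snippet"]):
            false_positive_ids.append(discovery["id"])
    return false_positive_ids
```

Filtering before inference also limits the cost of the slower model, since it is only invoked on the subset of discoveries it can actually judge.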
- `download` function has been deprecated and models are managed automatically by Credential Digger
- `categories` enum in the Postgres db in order to drive the users to 4 main rule categories. Nevertheless, this enum is only enforced in new Postgres installations to make the transition smoother
- `scan_snapshot` from v4.3.1

Credits also go to the wonderful work from @melisande1
New features:
Minor improvements and fixes: