A collection of all the data I could extract from 1 billion credentials leaked on the internet.
Leaving the 20-year-old red team stuff behind. It works fine, and no one has bothered to check or replace it for decades.
You can check status.txt in this repository to keep track of the included dumps.
During my research, I've noticed a handful of high-entropy passwords (10 characters; uppercase, lowercase and digits) that were being reused. These passwords had really low occurrence rates, but they still showed up a lot more often than I was expecting.
Some noticeable things about these:
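I've filtered passwords that are 10 characters long and match (?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=^[A-Z][A-Za-z0-9]+[A-Z]$)(?!.*[a-z]{3})(?!.*[A-Z]{3}) (at least one digit, one lowercase and one uppercase letter; starts and ends with an uppercase letter; no run of three lowercase or three uppercase letters),
and which had an occurrence rate of less than 1.2 per 100 million.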
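A minimal Python sketch of this filter, assuming the input is a Counter mapping each password to how many times it was seen, and assuming the occurrence rate is that count divided by the total number of counted occurrences. The helper names (matches_mystery_shape, mystery_candidates) are hypothetical and not taken from this repository's tooling.

```python
import re
from collections import Counter

# The pattern quoted above. Its lookaheads are zero-width, so the
# 10-character length rule has to be checked separately.
MYSTERY_PATTERN = re.compile(
    r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])"  # at least one digit, lowercase, uppercase
    r"(?=^[A-Z][A-Za-z0-9]+[A-Z]$)"    # alphanumeric, starts and ends uppercase
    r"(?!.*[a-z]{3})(?!.*[A-Z]{3})"    # no run of 3 lowercase or 3 uppercase
)


def matches_mystery_shape(password: str) -> bool:
    """True if the password is 10 characters long and matches the pattern."""
    return len(password) == 10 and MYSTERY_PATTERN.match(password) is not None


def mystery_candidates(counts: Counter, max_rate_per_100m: float = 1.2) -> list:
    """Passwords with the mystery shape whose occurrence rate is below the threshold.

    Assumptions (hypothetical): rate = occurrences / total occurrences * 100 million,
    and passwords are already stripped of line endings.
    """
    total = sum(counts.values())
    result = []
    for password, seen in counts.items():
        rate_per_100m = seen / total * 100_000_000
        if rate_per_100m < max_rate_per_100m and matches_mystery_shape(password):
            result.append(password)
    return sorted(result)
```

For scale: at 1 billion counted occurrences, a threshold of 1.2 per 100 million means fewer than 12 occurrences.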
I've released this list of 39576 passwords as mystery-list.txt in this repository. I've since refiltered the data and got 763k passwords matching this pattern.
I have no idea what this uncovers or what it implies, but I suspect that a password manager out there is generating passwords with low entropy, causing repetitions across many users. All ideas about this are welcome and appreciated.
Please create an issue explaining what you want to learn, and if it's interesting I'll query the data and add the result!
257.669.588 were filtered out as either corrupt data (gibberish in an improper format) or test accounts.
168.919.919 passwords, and 393.386.953 usernames.
The most common password is 123456. It covers roughly 0.722% of all the passwords (around 7 million times per billion).
6.607% of all the passwords.
36.28%, and with the most common 10 million passwords the hit rate is at 54.00%.
9.4822 characters.
8.83% of the passwords are unique: they were only found once.
9.7965 characters.
7.082% of these passwords contain special characters; the rest match ^[a-zA-Z0-9]+$.
20.02% of these passwords are letters only, and 15.02% are lowercase only.
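A sketch of the character-class breakdown, assuming ASCII-only classes as in the ^[a-zA-Z0-9]+$ check: "special" means the password contains at least one character outside [a-zA-Z0-9], and the lowercase-only group is a subset of the letters-only group. The function name is hypothetical. It uses fullmatch instead of ^...$ because in Python $ also matches just before a trailing newline.

```python
import re

SPECIAL = re.compile(r"[^a-zA-Z0-9]")  # any character outside ASCII letters/digits
LETTERS = re.compile(r"[a-zA-Z]+")     # ASCII letters only
LOWER = re.compile(r"[a-z]+")          # ASCII lowercase only


def char_class_shares(passwords) -> dict:
    """Fractions of passwords with special characters, letters only, lowercase only."""
    pw = list(passwords)
    n = len(pw)
    return {
        "special": sum(1 for p in pw if SPECIAL.search(p)) / n,
        "letters_only": sum(1 for p in pw if LETTERS.fullmatch(p)) / n,
        "lowercase_only": sum(1 for p in pw if LOWER.fullmatch(p)) / n,
    }
```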
9.3694 characters.
I've partitioned my data by the top-level domains of the email providers (filter here: https://gist.github.com/FlameOfIgnis/9a1da894e8ae385a1ee58b8a734b8979).
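The gist linked above is the actual filter and is not reproduced here. As a hypothetical sketch of the general idea only, the snippet below buckets (email, password) pairs by the last label of the email domain; mapping those TLDs to languages (for example treating de as German) would be a separate, assumed step that is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def email_tld(address: str) -> Optional[str]:
    """Last label of the email domain, lowercased; None if the address is malformed."""
    _, at, domain = address.rpartition("@")
    if not at or "." not in domain:
        return None
    tld = domain.rsplit(".", 1)[1].lower()
    return tld or None


def partition_by_tld(credentials: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group passwords by the TLD of their email provider, skipping malformed emails."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for email, password in credentials:
        tld = email_tld(email)
        if tld is not None:
            buckets[tld].append(password)
    return dict(buckets)
```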
I'm only releasing short lists of the top 150 passwords for now. I'll eventually release the full lists, but I'll refrain from publicly releasing incomplete versions for now.
I'll update these lists for every billion credentials I process, so even though some of the language lists are not complete yet, they'll be in better shape soon.
I've had enough data for
accounts to generate 1M password lists.
In contrast to that, I had too little data to work with for
And I had a total of 0 Slovene-language accounts.
rockyou.txt contains 14.344.391 passwords.
Of the 14.344.391 (same as rockyou) most common passwords, 11.583.476 were not in rockyou.txt (a ratio of 80%).
It is very likely that around 8 of the passwords below are from test accounts or bad dumps that I failed to filter correctly:
123hfjdk147
1464688081
159753qq
2012comeer
6V21wbgad
<password>
Blink123
D1lakiss
Exigent
Groupd2013
Indya123
N0=Acc3ss
R9lw4j8khX
Status
Telechargement
aobo2010
baili123com
bhf
cme2012
demon1q2w3e
demon1q2w3e4r
demon1q2w3e4r5t
exigent
g13916055158
hg0209
lincogo1
lizottes
megaparol12345
minecraft
nks230kjs82
nonmember
nyq28Giz1Z
pk3x7w9W
rr123456rr
startfinding
youbye123
yuantuo2012
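The comparison against rockyou.txt above boils down to a set-membership check over the N most common passwords. A minimal sketch, assuming a Counter of password occurrences and a local copy of rockyou.txt. Reading the wordlist as latin-1 is an assumption (rockyou.txt is not clean UTF-8), and the counted passwords would need to be decoded the same way for the comparison to mean anything. The function names are hypothetical.

```python
from collections import Counter


def load_wordlist(path: str) -> set:
    """Read one password per line; latin-1 decodes any byte (an assumption)."""
    with open(path, encoding="latin-1") as f:
        return {line.rstrip("\r\n") for line in f}


def missing_from_wordlist(counts: Counter, wordlist: set, n: int):
    """How many of the n most common passwords are absent from wordlist, and that ratio."""
    top = [password for password, _ in counts.most_common(n)]
    missing = sum(1 for password in top if password not in wordlist)
    return missing, missing / len(top)
```

With the figures above, 11.583.476 / 14.344.391 is about 0.81, which is the roughly 80% ratio.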
I'll try to work in chunks of 1 billion credentials and update the repository regularly as the data is processed, until I run out of dumps.
Like the project? Help me throw more resources at it!
bc1quxwhewutde2ehjzqcflcgdjqwg34pmcdq3chcp