I was reading this ACLU blog post about how DreamHost was served with a warrant to hand over IP addresses of some 1.3 million visitors to a website they host, and it got me thinking: do websites really need to store IP addresses of their visitors?
There are a lot of VPN companies such as Private Internet Access that advertise far and wide that they explicitly chose not to keep any logs. The idea is that if the VPN provider is served with a warrant for user activity, they would have no data to hand over, because they never stored anything in the first place. Why don't websites do that?
Most websites log your IP address like crazy -- at the very least, your IP might show up in the server access logs as you navigate the site. Software like Apache httpd or nginx logs IPs by default in its access logs. Furthermore, most web apps explicitly store IP addresses alongside comments posted by visitors or in account logs. I'm guilty of this too: I log the IP address for each comment posted, but I realize now that I never actually do anything with that information, so I probably don't need to bother storing IP addresses at all.
At one point I thought storing IP addresses might be useful in case I had to ban a spammy user. But I never do that, because banning users by IP address is not effective: a lot of users have wildly dynamic IPs that change frequently, a whole other subset of users can change their own IP just by rebooting their cable modem, and if you keep a long-term ban on an IP ("you can never visit my site again!"), you'll just end up banning innocent users who have inherited the banned IP address.
Configure your web server software (Apache, nginx, etc.) to not store IPs in access logs. They do by default, so you'd have to explicitly configure them not to.
In your custom web app, think long and hard every time you consider putting an IP address into your database or application logs. The default answer should be "do NOT store them without a good reason."
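As a safety net for application logs, here is a minimal sketch of a Python `logging` filter that scrubs anything that looks like an IPv4 address before a record is written. The class name `RedactIPFilter`, the placeholder text and the regex are my own assumptions for illustration; the pattern only covers dotted IPv4 addresses, so IPv6 would need its own handling.

```python
import logging
import re

# Assumption: a simple pattern for dotted-quad IPv4 addresses only.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class RedactIPFilter(logging.Filter):
    """Hypothetical filter that replaces IPv4-looking text in log messages."""

    def filter(self, record):
        # Render the final message first so IPs passed as arguments are caught too.
        message = record.getMessage()
        record.msg = IPV4_RE.sub("[ip redacted]", message)
        record.args = ()
        return True  # keep the record, just without the address


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("webapp")
    log.addFilter(RedactIPFilter())
    log.info("comment posted from %s", "203.0.113.7")
```

This doesn't replace the "think long and hard" step; it just catches the cases where an address slips into a log message by accident.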
If you have a good reason to store the IP address, still consider the length of time that you need to store it and how to make bulk record warrants impractical. Ideally, you should only store them in memory (not written to the filesystem, where they might be left in an old log file or recovered by file undeletion tools) and get rid of them as soon as you no longer need them.
If you need to temporarily store IP addresses to rate limit failed login attempts and protect account passwords from being brute forced "online," you could store this information temporarily in Redis. A typical rate limiting scheme might be to prevent more than 5 failed login attempts over the span of 30 minutes, so in this example you wouldn't need to cache an IP any longer than 30 minutes or so.
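The rate limiting scheme above could be sketched roughly like this. To keep the example self-contained I'm using a plain in-memory dictionary as a stand-in for Redis; with Redis you'd presumably lean on key expiry instead of the manual purge. The class name `FailedLoginLimiter` and the injectable `clock` parameter are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5           # from the example: 5 failed attempts...
WINDOW_SECONDS = 30 * 60   # ...over the span of 30 minutes


class FailedLoginLimiter:
    """In-memory stand-in for a Redis-backed limiter (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def _purge(self, now):
        # Forget attempts older than the window, and forget the IP
        # entirely once it has no recent attempts left.
        for ip in list(self._failures):
            attempts = self._failures[ip]
            while attempts and now - attempts[0] >= WINDOW_SECONDS:
                attempts.popleft()
            if not attempts:
                del self._failures[ip]

    def record_failure(self, ip):
        now = self._clock()
        self._purge(now)
        self._failures[ip].append(now)

    def is_blocked(self, ip):
        self._purge(self._clock())
        return len(self._failures.get(ip, ())) >= MAX_FAILURES

    def tracked_ips(self):
        self._purge(self._clock())
        return set(self._failures)
```

The point of the design is that nothing outlives the window: once 30 minutes pass without a failed login, the address is gone from memory and there's nothing left to hand over.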
If you need to compare IP addresses (e.g. to tell whether the current request has an IP that matches a recently seen user), you might store hashes of the IP address rather than the IP address itself. This would serve the purpose of uniquely identifying users while obfuscating their IP address. If you're asked to hand over records, you would hand over the list of hashes, because you never stored the original IP.
But hashes can be brute forced (basically by guessing every possible IP address, and checking if the hash matches), and IP addresses are particularly predictable, so you'd wanna use a slow hashing algorithm.
Use a purposefully slow hashing algorithm like bcrypt or Argon2. These hashing algorithms are designed for securely storing passwords by making it so it takes a "long time" to compute even a single hash.
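Here's a rough sketch of hashing an IP so it can still be compared later. Since bcrypt and Argon2 aren't in Python's standard library, this uses PBKDF2 from `hashlib` as a stand-in slow hash; swap in the real thing if you have it. One wrinkle: bcrypt normally generates a random salt per hash, which means two hashes of the same IP wouldn't match, so this sketch assumes a single site-wide salt instead. `SITE_SALT`, `ITERATIONS`, `hash_ip` and `same_visitor` are all hypothetical names and values.

```python
import hashlib
import hmac
import ipaddress

# Assumption: one secret, site-wide salt so equal IPs give equal hashes.
SITE_SALT = b"replace-with-a-random-secret"
# Assumption: tune this so a single hash is noticeably slow on your server.
ITERATIONS = 200_000


def hash_ip(ip: str) -> str:
    """Return a slow, salted hash of an IP address as a hex string."""
    # Normalize first so different spellings of one address hash the same.
    canonical = ipaddress.ip_address(ip).compressed.encode("ascii")
    digest = hashlib.pbkdf2_hmac("sha256", canonical, SITE_SALT, ITERATIONS)
    return digest.hex()


def same_visitor(ip: str, stored_hash: str) -> bool:
    """Check whether the current request's IP matches a stored hash."""
    return hmac.compare_digest(hash_ip(ip), stored_hash)


if __name__ == "__main__":
    stored = hash_ip("203.0.113.7")
    print(same_visitor("203.0.113.7", stored))
    print(same_visitor("203.0.113.8", stored))
```

The trade-off with a shared salt is that anyone holding both the salt and the hashes can start guessing, so the slowness of the hash is what does the real work here.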
So at best, the site host could hand over a list of bcrypt hashes for the most recent hour-or-so of user activity, and let the requesting party have fun brute forcing the IP addresses that correspond to those hashes. There are only about 4.3 billion IPv4 addresses, so while it would be slow, they might eventually crack all of them. But for IPv6 addresses? They'd probably never brute force even a single bcrypt hash. That's akin to randomly guessing a 128-bit AES encryption key (something I've never heard of being done successfully), because IPv6 addresses are also 128-bit numbers! Have fun slowly passing those through a bcrypt algorithm one at a time.
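To put rough numbers on that, here's a back-of-the-envelope calculation. The cost of 0.1 seconds per hash and the worker counts are assumptions I picked for illustration, not measurements; the function name `years_to_exhaust` is hypothetical too.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(address_bits, seconds_per_hash, workers=1):
    """Years needed to hash every address in a 2**address_bits space."""
    total_addresses = 2 ** address_bits
    return total_addresses * seconds_per_hash / workers / SECONDS_PER_YEAR


# Assumption: each slow hash costs about 0.1 seconds of compute.
ipv4_years = years_to_exhaust(32, 0.1)
ipv4_days_parallel = years_to_exhaust(32, 0.1, workers=1000) * 365
ipv6_years = years_to_exhaust(128, 0.1)

if __name__ == "__main__":
    print(f"IPv4, one worker:     {ipv4_years:.1f} years")
    print(f"IPv4, 1000 workers:   {ipv4_days_parallel:.1f} days")
    print(f"IPv6, one worker:     {ipv6_years:.2e} years")
```

Under those assumptions the whole IPv4 space takes roughly 13.6 years on a single worker but only about 5 days spread over 1000 of them, which is why IPv4 hashes are "slow but crackable." The full IPv6 space stays far out of reach no matter how many workers you throw at it.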
If your site is actively being hammered by a spammer, you would be able to determine their IP address at the time because you store it temporarily in Redis or somewhere. So when you're under active attack, you can figure out WTF is going on and blackhole the IP address or whatever.
But two weeks later? When that spammer has given up and gone home? Your server should retain no trace at all of that IP address anywhere. No log files, no databases; it should have long since been purged from Redis and be nowhere to be seen in system RAM.
I'll update my Rophako CMS to not explicitly store IP addresses in its database files, and configure nginx to not log IPs into the log files.
I literally never check IP addresses so I don't expect that this will disrupt anything. Your use case might be different. If you do need to store addresses temporarily, still do so carefully as outlined above.
I think "not storing IPs of website visitors" could become a selling point for privacy-oriented websites in the future, just like it is for VPN providers now. My blog is neither of those things, but my blog has always been where I get to experiment with neat things like this, so I'll be trying this anyway.