Apps composed of open source pieces

Posted by Noah Petherbridge on Thursday, September 14 2017 @ 06:41:15 PM

The Internet is full of freely available source code. If you're a software engineer writing a new application, chances are that a lot of the code you're writing has already been written thousands of times by engineers who came before you.

Some engineers seem to believe you can compose an entire complicated app just by mixing and matching tiny pieces already written before you. Pull a session manager from here, a template engine from there... a login manager, a password manager, a database accessor... all of these being small off-the-shelf components that you're trying to duct tape together into one coherent app. The actual code you as a developer write is just the few lines needed to stitch these all together. You'll have a production-ready app running in just a few minutes!

Sure, but in my career as a software engineer I've learned that it's usually better to write all those pieces yourself so that they fit together perfectly how you want, not just "good enough."

This is a story of a particularly annoying Python module I was dealing with at work.

Flask-Login

Flask-Login is a third-party plugin for Flask (a Python web app framework) that makes it easy for you to handle the part where users can log in to your app. On its surface it sounds quite simple:

  • That User class you already have for SQLAlchemy? Just extend the UserMixin from Flask-Login to power it up even more.
  • When a user should be logged in, just call login_user(user) giving it your SQLAlchemy User model instance. logout_user() will log them out.
  • You have a global current_user object available to your entire app, which can query the login status and directly access your User model.

(Note: this blog post isn't meant to pick on Flask-Login in particular; it just serves as a really good example for me right now. The underlying problems with using off-the-shelf code are much larger and wider than this one module.)

Flask-Login, it says, only really cares that your user has some sort of unique ID, and besides that it stays out of your way.

At my current job, a Python app I was assigned to was already using Flask-Login, and I didn't see a good reason not to use it, so I left it as is. It was working well enough so far and seemed pretty simple. Then over the next several months, as product requirements shifted around and I had to start touching code that got uncomfortably close to Flask-Login, it started to show its limitations.

Redis Session Storage

By default, Flask stores user sessions directly in the user's own web browser -- inside the session cookie. You can set arbitrary JSON data on the user's session, and it gets encoded into the cookie. The session is signed, of course, with a cryptographic algorithm and a secret so that the end user can't tamper with their copy of the session.
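To make the idea of a signed cookie concrete, here's a minimal sketch using only the standard library. It is not Flask's actual implementation; the function names, the cookie layout and the SECRET_KEY value are all assumptions made up for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret for illustration; a real app would load this from config.
SECRET_KEY = b"change-me"


def encode_session(data: dict) -> str:
    """Serialize the session to JSON and append an HMAC signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_session(cookie: str) -> dict:
    """Return the session data, or raise ValueError if the cookie was altered."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("session cookie failed signature check")
    return json.loads(base64.urlsafe_b64decode(payload))
```

The user can read the payload (it's only base64), but they can't change it without the secret, because the signature would no longer match.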

This is the use case that Flask-Login was apparently designed for. Flask-Login only cares about an "ID" that it can find your users by, and it even lets you define your own function for how to match an ID with a user.

In a basic Flask-Login app, the ID would probably just be your database primary key, and Flask-Login would set session["user_id"] = 42 or so. All the places where user IDs appear would have the same values and it would all make sense.

The first changing requirement we had was to replace Flask's default session manager with one backed by a Redis cache. This isn't an uncommon feature in a Flask app: there's even a Server-side Sessions with Redis snippet linked in the official docs.

With a Redis session store, you make up a random "session ID" and put that in the user's cookie, rather than putting the actual session data there. Instead, the "session ID" is used by the server to look up the session details in Redis.
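Roughly, a server-side session store boils down to something like the sketch below. A plain dict stands in for Redis so the example runs on its own, and the helper names and the "session:" key prefix are assumptions, not taken from the Flask snippet or from the app described here.

```python
import json
import uuid

# Stand-in for a Redis connection: maps "session:<id>" keys to JSON strings.
FAKE_REDIS = {}


def new_session_id() -> str:
    """Make up a random, opaque session ID to put in the user's cookie."""
    return str(uuid.uuid4())


def save_session(session_id: str, data: dict) -> None:
    """Store the session data on the server, keyed by the session ID."""
    FAKE_REDIS["session:" + session_id] = json.dumps(data)


def load_session(session_id: str) -> dict:
    """Look up session data by ID; unknown IDs get an empty session."""
    raw = FAKE_REDIS.get("session:" + session_id)
    return json.loads(raw) if raw is not None else {}
```

The browser only ever sees the value returned by new_session_id(); everything else stays on the server.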

This didn't sit well with Flask-Login, however. It needs an ID to keep track of users by, and if the only thing you're sending to the browser is an opaque Session ID, then that's what Flask-Login has to use. I wrote my own "user loader" function for Flask-Login to take the user's Session ID, look it up in Redis and return the user from there.

But now this meant that Flask-Login thinks the "user_id" is something ugly like 287bbe58-03f5-46ea-bac8-5ac60c924b36 instead of 42. Ugh. And you see things in the session data like session["user_id"] and it's not a user ID at all, but a session ID! These are annoying, but I could live with these quirks.

And then...

Another Layer of Authentication

Shifting requirements again, and now our app needed to support the concept of "a user who's not really a user, but is unique and needs some sort of authentication-like protection of their data."

We'll call them pseudo-users. They're not real users because they don't have a User object in the database, nor a user ID, and they don't have a password or much else useful. Their workflow is that the pseudo-user hits one page on the server to "register" and is given a random token, and they use that token on other pages to "stay logged in," even though they don't have an account. The randomized token protects their data from other random hackers trying to get into it. Make sense?
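As a rough sketch of that token workflow (the names and the in-memory dict are hypothetical; the real app's storage isn't described here):

```python
import secrets

# Hypothetical in-memory store: token -> that pseudo-user's private data.
PSEUDO_USERS = {}


def register_pseudo_user() -> str:
    """Create a pseudo-user and hand back the random token that identifies them."""
    token = secrets.token_urlsafe(32)
    PSEUDO_USERS[token] = {}
    return token


def find_pseudo_user(token: str):
    """Return the pseudo-user's data for a known token, or None otherwise."""
    return PSEUDO_USERS.get(token)
```

There's no user ID and no password anywhere in this; possession of the token is the whole "authentication."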

Regardless, this was such a crazy esoteric idea that it was beyond Flask-Login's capability. Flask-Login needed to deal with users, damn it, and if you can't keep all your users in one place it won't work.

For this, I had to basically completely circumvent Flask-Login and store my own keys directly on the session object.

What if I want to edit another user's session?

More new requirements for our app! We needed a workflow to be possible where a sign-up process is started by one computer, and completed on a different one (so they have two different sessions), and afterward, both sessions should be logged in to the new account.

(The two sessions learned of each other through some one-time e-mail links and things. Suffice it to say, the second session knows the session ID of the first one, and it wants to make that first session be logged in as the new user account too.)

In a default Flask app that keeps your session completely inside the cookie, this kind of thing would just be impossible. The data is on the end users' browsers, and you can't do anything about it.

But remember we're using Redis, and we just give users session IDs, and we keep the data on the server, so if we know a different user's session ID, the app should be able to modify that other user's session instead of the current one.
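In principle, that's only a few lines once the sessions live on the server. Here's a hedged sketch, with a dict of dicts standing in for the Redis store; the function name is hypothetical, and the "login" and "uid" keys are borrowed from the session layout described at the end of this post.

```python
def log_in_other_session(store: dict, other_session_id: str, user_id: int) -> bool:
    """Mark a different session as logged in, given only its session ID.

    Returns False if no session with that ID exists in the store.
    """
    other = store.get(other_session_id)
    if other is None:
        return False
    other["login"] = True
    other["uid"] = user_id
    return True
```

For example, after the sign-up completes on the second computer, the app would call this with the first computer's session ID and the new account's ID.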

Flask-Login rears its ugly head again.

Because Flask-Login was designed for cookie-based sessions, it has all these extra security precautions built in, where it hashes the end user's IP address and web browser User Agent and stores that in the session too. The idea is that if a hacker were to steal your cookie, they wouldn't be able to log in as you because their IP address and User Agent would likely be different than yours.
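The general shape of that kind of check looks something like this. It's a sketch of the technique and not Flask-Login's actual code; the function names, the "_id" session key and the choice of SHA-512 are assumptions here.

```python
import hashlib


def session_fingerprint(ip_address: str, user_agent: str) -> str:
    """Hash the client's IP address and User Agent into one identifier."""
    raw = f"{ip_address}|{user_agent}".encode("utf-8")
    return hashlib.sha512(raw).hexdigest()


def session_is_valid(session: dict, ip_address: str, user_agent: str) -> bool:
    """Accept the session only if the request matches the stored fingerprint."""
    return session.get("_id") == session_fingerprint(ip_address, user_agent)
```

A session written on behalf of somebody else either lacks the fingerprint or carries the wrong one, so the next request from its real owner fails the check.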

We didn't need that complexity because our data is in Redis where a hacker can't get it. Protecting the user's Session ID isn't something we require for our purpose at this time, and if we need to later, I'd rather program my own solution to that, too.

This was making it difficult to modify another session without also causing it to be invalidated.

Hair-pulling Bugs

At this point, Flask-Login was already on thin ice with me, and it was one more fuck-up away from being gutted and replaced with custom code.

Today at work, one of my coworkers was having a bizarre bug wherein he wasn't able to stay logged in to the Flask app. He'd click the sign-in button and the page would immediately reload back to the login page.

The latest code on the master branch worked fine, but on his branch it was impossible to log in. His code didn't go anywhere near the authentication code. All he did was add a new SQLAlchemy table to the database and add a column to the User table, an is_active boolean.

I checked the diff between his branch and master and nothing about it looked like it should cause the login to fail. I did some poking at his code, and it turned out the server wasn't even sending a session cookie at all.

It turned out that Flask-Login's UserMixin class has already reserved the attribute is_active for its own purpose, and by re-using that name in our User class, we confused Flask-Login and it thought the user shouldn't be allowed to log in, and so it wouldn't send them a session cookie. It didn't emit any kind of error or anything.
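The failure mode is easy to reproduce in plain Python. In the sketch below, LoginMixin and login_user are hypothetical stand-ins and not Flask-Login's real code, and the class attribute set to None plays the role of a freshly added column that has no value yet for existing rows (an assumption about how the value ended up falsy).

```python
class LoginMixin:
    """Hypothetical stand-in for a login library's user mixin."""

    @property
    def is_active(self):
        # The library's default: every user counts as active.
        return True


class User(LoginMixin):
    id = 42
    # Stand-in for the new column: it shadows the mixin's property,
    # and existing rows have no value for it yet.
    is_active = None


def login_user(session: dict, user) -> bool:
    """Quietly refuse to log in a user who doesn't look active."""
    if not user.is_active:
        return False  # no exception, no log line, no session written
    session["user_id"] = user.id
    return True
```

Calling login_user({}, User()) returns False and leaves the session empty, which from the browser's side looks exactly like being bounced back to the login page.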

How I would not use Flask-Login

I've known about Flask-Login for many years, but had decided I wouldn't use it for any of my own code. This was my first experience actually trying to use it, and I was right to avoid it.

Here's what I would do instead (links go to the Python code that, at the time of writing, powers this very web blog):

And all throughout my code, I can check the session["login"] boolean to see if the user is logged in, and check session["uid"] to get their user ID. In other Flask apps I've also stuck these on the request-global g object for easier access, like g.user_id or g.logged_in.
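A minimal sketch of that approach might look like the following. It is not the code that powers this blog: the helper names and the "redirect:/login" return value are made up for illustration, and a plain dict stands in for Flask's session object.

```python
import functools


def log_in(session: dict, user_id: int) -> None:
    """Record on the session that this user is logged in."""
    session["login"] = True
    session["uid"] = user_id


def log_out(session: dict) -> None:
    """Clear the login state from the session."""
    session["login"] = False
    session.pop("uid", None)


def login_required(handler):
    """Wrap a handler that takes the session first; turn away anonymous callers."""
    @functools.wraps(handler)
    def wrapper(session, *args, **kwargs):
        if not session.get("login"):
            return "redirect:/login"  # hypothetical stand-in for a real redirect
        return handler(session, *args, **kwargs)
    return wrapper
```

Because it's just keys on a session, nothing here cares whether the session lives in a cookie or in Redis, or whether some other part of the app wrote to it.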

It's not difficult to write a user login system, and I'd always rather write my own than use an off-the-shelf library. Doing it myself gives me full control over how it works and lets me extend it in any direction I want to go.
