I made my first Twitter bot today. It was surprisingly easy! I was worried I'd have to get all up in OAuth's business (please God no), but I didn't. Instead, I managed to go from zero to bot in the length of Philip Glass's Koyaanisqatsi. Aw yiss.

-- xkcd/1646

Specs

  1. Auto-delete my old tweets from Twitter.
  2. Save them somewhere else.


Walkthrough

Let's get started!

import os
import tweepy
import argparse
import pandas as pd
from datetime import datetime, timedelta

So, I grabbed os for file management, tweepy as my Twitter API Python wrapper of choice (docs), argparse for command line arguments, pandas for lazy data munging, and datetime stuff for filename management and defining what an "old" tweet is.

First, let's connect! After registering my app on apps.twitter.com, I granted it Read-Write access and generated some user tokens. I put these in a local dotfile, because SECURITY; the script reads them back as environment variables. (Incidentally, whoa, check out this cool dotfile zoo...) I made a "hidden" function, _connect(), that'll grab the credentials from my environment, do the whole OAuth thing, and return a Tweepy Twitter API object. "Hidden" because it won't be accessible from the command line, but rather called from the other main functions (below).

def _connect():
    print("Connecting to Twitter...")
    consumer_secret = os.environ['TWITTER_CSECRET']
    consumer_key = os.environ['TWITTER_CKEY']
    access_token = os.environ['ACC_TOKEN']
    access_token_secret = os.environ['ACC_SECRET']
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)

At this point, I should really test stuff out... but... I was lazy and continued.
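
If I had tested, a cheap first check that needs no network would be making sure all four variables are actually set before building the auth handler. A minimal sketch; the helper name missing_credentials() is made up for illustration (it is not part of Tweepy), and the variable names are the ones _connect() reads above:

```python
import os

# The four environment variables that _connect() reads.
REQUIRED_VARS = ("TWITTER_CKEY", "TWITTER_CSECRET", "ACC_TOKEN", "ACC_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All credentials found.")
```

Once the variables are there, calling api.verify_credentials() on the object that _connect() returns should confirm that Twitter accepts them.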

Next, I wanted a function to archive all my tweets. I called this backfill(). No arguments, because - meh - I can't think of any. It just reads through my entire timeline and archives what's there into a CSV file named after today's date.

Word on the street is Twitter is limiting the tweets it shows you on the website. Via the API, you can grab a default of 20 (or a maximum of 200) tweets from the api.user_timeline() method. This StackOverflow post includes the for loop to paginate through and grab up to 3,200 tweets.

def backfill():
    api = _connect()
    all_my_tweets = []
    print("Downloading all tweets...")
    for status in tweepy.Cursor(api.user_timeline).items():
        all_my_tweets.append({'id': status.id,
                            'date': status.created_at.strftime('%Y-%m-%d'),
                            'text': status.text,
                            'retweets': status.retweet_count,
                            'favorites': status.favorite_count
                            })

    pd.DataFrame(all_my_tweets).to_csv(f"{datetime.now().strftime('%Y-%m-%d')}_backfill_twitter.csv")
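
tweepy.Cursor hides the pagination here. As I understand it, the idea is roughly: ask for a page, remember the oldest tweet ID on it, then ask for the next page of tweets older than that, until a page comes back empty. Below is a sketch of that idea only, not Tweepy's actual implementation. Both paginate() and make_fake_timeline() are hypothetical names, and fetch_page stands in for a call like api.user_timeline(count=..., max_id=...) so the example runs without Twitter:

```python
def paginate(fetch_page, count=200):
    """Collect items page by page, walking backwards through tweet IDs.

    fetch_page(count, max_id) is a stand-in for a timeline request. It must
    return a list of dicts with an 'id' key, newest first, all with
    id <= max_id (or no upper bound when max_id is None).
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(count, max_id)
        if not page:
            break
        collected.extend(page)
        # Next time, ask for tweets strictly older than the oldest one seen.
        max_id = page[-1]["id"] - 1
    return collected


def make_fake_timeline(n):
    """Build a fake fetch_page over n tweets with IDs n..1 (newest first)."""
    tweets = [{"id": i} for i in range(n, 0, -1)]

    def fetch_page(count, max_id):
        eligible = [t for t in tweets if max_id is None or t["id"] <= max_id]
        return eligible[:count]

    return fetch_page


if __name__ == "__main__":
    tweets = paginate(make_fake_timeline(450), count=200)
    print(len(tweets))  # three pages: 200 + 200 + 50
```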

After the backfill was done, and my precious old tweets were safe, I wrote the deleter function, delete_old_tweets():

def delete_old_tweets(date):
    api = _connect()

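    # Ugh filename: grab any archive in the current directory, ignoring its date
    twitter_backup = None
    for filename in os.listdir():
        if '_backfill_twitter' in filename:
            twitter_backup = filename

    if twitter_backup is None:
        print("No backup found. Run with -b first.")
        return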

    df = pd.read_csv(twitter_backup)
    df['date'] = pd.to_datetime(df['date'])
    print(f"{len(df)} tweets found.")

    # Strictly older than the cutoff, so tweets from the cutoff date itself are kept
    to_delete = df[df['date'] < datetime.strptime(date, '%Y-%m-%d')]['id'].values
    print(f"Deleting {len(to_delete)} tweets from before {date}.")

    for tweet_id in to_delete:
        try:
            api.destroy_status(tweet_id)
        except tweepy.error.TweepError as e:
            # Tweepy 3.x name; Tweepy 4.x renamed it to tweepy.errors.TweepyException
            print(e)

Finally, I prep it for command-line execution. I add two flags/arguments: a boolean for backfilling (-b), and a string date for the oldest tweet you want to keep (-d). The default will be anything older than 30 days. This means that, if I go quiet on Twitter and I let the bot run, it'll clean out my timeline entirely. Which is fine (but I might want to have it leave a little note there, then, "This timeline has been archived by a bot. Angela will return soon" etc etc).

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Blue bird killer.")
    parser.add_argument('-b', default=False, action='store_true', dest='backfill', help='Backup all tweets?')
    parser.add_argument('-d', default=None, dest='date', help='Delete tweets before this date ("YYYY-MM-DD") (Default is anything older than 30 days).')
    args = parser.parse_args()

    if args.backfill:
        backfill()

    if args.date:
        delete_date = args.date
    else:
        # delete_old_tweets() expects a 'YYYY-MM-DD' string, not a datetime
        delete_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')

    delete_old_tweets(delete_date)
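
One possible tidy-up here, sketched under my own assumptions: let argparse validate the date itself, and work out the cutoff in one small function. The names valid_date(), build_parser(), and resolve_cutoff() are hypothetical. Note that this version hands back a datetime rather than a string, so delete_old_tweets() would have to drop its own strptime call to use it.

```python
import argparse
from datetime import datetime, timedelta


def valid_date(text):
    """argparse type: accept only 'YYYY-MM-DD' and return a datetime."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a YYYY-MM-DD date: {text!r}")


def build_parser():
    """Same two flags as the script above, with -d validated up front."""
    parser = argparse.ArgumentParser(description="Blue bird killer.")
    parser.add_argument("-b", action="store_true", dest="backfill",
                        help="Backup all tweets?")
    parser.add_argument("-d", type=valid_date, default=None, dest="date",
                        help='Delete tweets before this date ("YYYY-MM-DD").')
    return parser


def resolve_cutoff(date, now=None, days=30):
    """Use the given date, or fall back to `days` days before `now`."""
    if date is not None:
        return date
    now = datetime.now() if now is None else now
    return now - timedelta(days=days)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_cutoff(args.date))
```

With this, a typo like -d 2018-1-32 fails at the command line with a usage message instead of somewhere inside the delete loop.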


Thoughts

My main sticking point was, as always: abstraction. I knew I wanted to: (1) archive before I delete, but (2) not archive the entire timeline every time, and (3) delete based on the Twitter object's ID.

Koyaanisqatsi was wrapping up, so I was rushing and I made a mess instead: I separated out deleting from archiving (backfill() isn't called in delete_old_tweets(), and I think that makes sense...), but I made a mess with the filenames and duplicate deleting. If you backfill on 2018-01-02 and want to delete tweets tomorrow, what filename do you choose? For now, I just choose any old archive filename, ignoring the date. Silly! Also, an archive CSV is a snapshot of your entire timeline on that date, so if an archive is used for deleting more than once, you'll be trying to delete tweets you already deleted! The try/except just prints the error for each of those and moves on, but it's still a pile of wasted API calls.
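
For the filename half of the mess, one small fix would be to always pick the newest archive. A minimal sketch, assuming every archive keeps the YYYY-MM-DD prefix that backfill() gives it; latest_backup() is a made-up helper name:

```python
import os


def latest_backup(filenames, marker="_backfill_twitter"):
    """Return the most recent archive name, or None if there is none.

    Archive names start with a YYYY-MM-DD date, so sorting them as
    strings also sorts them chronologically.
    """
    backups = [name for name in filenames if marker in name]
    return max(backups) if backups else None


if __name__ == "__main__":
    print(latest_backup(os.listdir()))
```

This does nothing about the duplicate deleting; it only makes the choice of archive predictable.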

Much better would be a living archive, one with a status field that gets toggled: deleted, live, whatever. I guess I could set up a MySQL database or something? Or write over the csv when I delete? I don't like either idea; one seems like overkill, the other seems brittle.
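
A possible middle ground between those two, sketched here with SQLite (my substitution, not MySQL) because it ships with Python and lives in a single file. The table layout and the function names are assumptions for illustration, not something the bot currently does:

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive; each tweet row carries a status field."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               date TEXT NOT NULL,
               text TEXT,
               status TEXT NOT NULL DEFAULT 'live'
           )"""
    )
    return conn


def archive_tweets(conn, tweets):
    """Add new tweets; rows already archived (same id) are left untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, date, text) VALUES (?, ?, ?)",
        [(t["id"], t["date"], t["text"]) for t in tweets],
    )
    conn.commit()


def live_ids_before(conn, cutoff):
    """IDs of tweets still live and dated before cutoff ('YYYY-MM-DD')."""
    rows = conn.execute(
        "SELECT id FROM tweets WHERE status = 'live' AND date < ? ORDER BY id",
        (cutoff,),
    )
    return [row[0] for row in rows]


def mark_deleted(conn, tweet_id):
    """Toggle a tweet's status once it has been deleted on Twitter."""
    conn.execute("UPDATE tweets SET status = 'deleted' WHERE id = ?", (tweet_id,))
    conn.commit()
```

The delete loop would then iterate over live_ids_before() and call mark_deleted() after each successful api.destroy_status(), so a second run never retries tweets that are already gone.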

TODO

I could set up a cron job to run this locally, a la my automatic desktops script. I could put it on some server somewhere; buy some space on an AWS EC2 instance. I could (finally) set up my little Raspberry Pi and have it always-on. Or I could just manually run the script as I remember to (WORST OPTION).

For now, though, I'm happy with having a functional bot that only took about a Glass album to write (apparently about an hour).

Favorite track from Koyaanisqatsi

Prophecies!

Readings