An AWS Solution for a small tweeting app

Problem

Suppose you need to develop an application that sends tweets on a regular basis, based on data in a database. Using AWS, two types of resources need to be accessed: (1) external resources on the internet, like the Twitter API, and (2) internal resources on the AWS cloud, like RDS.

Solution

Three architectures are worth considering; they differ in how the services in the AWS cloud access the internet:

All three options use Lambda for compute; costs assume all other services are on the free tier.

Option 1 (Storage: RDS)
Architecture: Read data from RDS. Define a Lambda function in a VPC with access to the internet (Internet Gateway plus a NAT Gateway). In the Lambda, read the data for the tweet and call the Twitter API.
PRO: Simpler architecture
CON: Higher cost (minimum $30 a month for the NAT Gateway)

Option 2 (Storage: RDS)
Architecture: Read data from RDS. Define two Lambda functions: the first, in a private VPC, reads the data from RDS; the second, not associated with any VPC, makes the Twitter API call.
PRO: Lower cost (minimum $10 a month for VPC Endpoints)
CON: More sophisticated architecture

Option 3 (Storage: S3)
Architecture: Don't use a database. Read data from text files stored in S3. Define two Lambda functions similar to option 2.
PRO: Even lower cost (around $1 a month)
CON: More sophisticated architecture, plus manual data handling

This document deals with option 2; the other options can be achieved by altering it. Here are the steps.

Set up VPC

  1. VPC: Set up one VPC (or use the default)
  2. Subnets: Set up three subnets. You will use one of them as the public subnet and the other two as private subnets. The private subnets are used here for RDS, which needs at least two subnets.
  3. Internet Gateway: Create one internet gateway
  4. Route Tables:
    1. Set up Private Route Table: One default route table is already set up and associated with the three subnets above. Consider this the private route table.
    2. Set up Public Route Table: Create another route table. Set its destination to the VPC range with Target=Local. Add another destination=0.0.0.0/0 and set Target=<internet gateway set up above>
      1. Amend Public Subnet: Choose one of the subnets and change its route table to the public route table just created
  5. VPC Endpoint: Create a new VPC Endpoint in the VPC above and associate it with the private subnets (see the boto3 sketch after this list).
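
If you prefer scripting these steps over clicking through the console, they can be sketched with boto3. This is a minimal sketch, not a drop-in script: the CIDR ranges, availability zones, and the choice of an execute-api interface endpoint (so the in-VPC Lambda can reach API Gateway without a NAT Gateway) are all assumptions.

import boto3

ec2 = boto3.client('ec2', region_name='ap-southeast-2')

# 1. VPC (CIDR range is an assumption)
vpc_id = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']['VpcId']

# 2. Three subnets: one public, two private (RDS needs at least two)
subnet_ids = [
    ec2.create_subnet(VpcId=vpc_id, CidrBlock=cidr, AvailabilityZone=az)['Subnet']['SubnetId']
    for cidr, az in [('10.0.0.0/24', 'ap-southeast-2a'),
                     ('10.0.1.0/24', 'ap-southeast-2b'),
                     ('10.0.2.0/24', 'ap-southeast-2c')]
]
public_subnet_id, private_subnet_ids = subnet_ids[0], subnet_ids[1:]

# 3. Internet gateway, attached to the VPC
igw_id = ec2.create_internet_gateway()['InternetGateway']['InternetGatewayId']
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# 4. Public route table: the local route to the VPC range is added automatically;
#    add 0.0.0.0/0 -> internet gateway, then associate it with the public subnet
rt_id = ec2.create_route_table(VpcId=vpc_id)['RouteTable']['RouteTableId']
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock='0.0.0.0/0', GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=public_subnet_id)

# 5. VPC endpoint on the private subnets (service name is an assumption: an
#    execute-api interface endpoint lets the in-VPC Lambda call API Gateway)
ec2.create_vpc_endpoint(VpcId=vpc_id,
                        VpcEndpointType='Interface',
                        ServiceName='com.amazonaws.ap-southeast-2.execute-api',
                        SubnetIds=private_subnet_ids)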

Set up RDS

  1. Create an RDS instance based on the db platform of your choice (MySQL, Postgres, etc.)
  2. Associate the private subnets (set up in the VPC section) with this RDS instance.
  3. In order to load data into RDS you have two choices (a data loading sketch follows this list):
    1. Keep RDS private:
      You may need to keep RDS private for security reasons, depending on the project's security guidelines. If this is a personal project, move on to the next option.
      You need to set up an EC2 instance, preferably with the AWS CLI already installed, in the same private subnets. Set up a NAT Gateway, then connect to the EC2 instance. You will then be able to use the CLI to load data into the db, e.g. from data previously uploaded to S3. However, this solution contradicts the purpose of the Option 2 architecture, because setting up a NAT Gateway incurs the additional cost associated with Option 1. It is possible, though, to set up the NAT Gateway temporarily and delete it later.
    2. Make RDS publicly available temporarily:
      The easy method is to temporarily make the private subnets public so RDS can be reached from a db client. To this end, add a route to the internet gateway in the private subnets' route table. Also make sure to adjust the VPC's security group to allow incoming connections from the IP address your db client software (e.g. MySQL Workbench) runs on.
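
Whichever route you take, you will need a table to hold the tweets. In the sketch below, the table name Messages and the column tweet_post are taken from the query the data gatherer Lambda runs later; everything else (host, credentials, db name, column sizes, sample row) is assumed for illustration.

import pymysql

# Placeholder connection details for the (temporarily reachable) RDS instance
conn = pymysql.connect(host='???.???.???.rds.amazonaws.com',
                       user='admin', passwd='********', db='tweets')

with conn.cursor() as cursor:
    # Table and column names match the query used by the data gatherer Lambda
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS Messages (
            id INT AUTO_INCREMENT PRIMARY KEY,
            tweet_post VARCHAR(280) NOT NULL
        )
    """)
    cursor.execute("INSERT INTO Messages (tweet_post) VALUES (%s)",
                   ("Hello from the tweeting app!",))
conn.commit()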

Set up Lambda

Two Lambdas are required here:

Data Gatherer Lambda

This Lambda connects to RDS, generates the tweet text, and calls the other Lambda that sends the tweet. It also returns the message and the result of the tweet publication in JSON.
Note: "keys" is a custom module containing connection string info. It refers to a keys.py file like the one below. This file also reads some of its info from the Lambda's environment variables.

import os

mysql_user = os.environ['mysql_user']
mysql_password = os.environ['mysql_password']
mysql_dbname = os.environ['mysql_dbname']
mysql_rds_host = '???.???.???.rds.amazonaws.com'

Create a new Lambda called "data_gatherer". Associate it with the VPC and private subnets defined above.
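
Creating the function can also be scripted. Below is a minimal boto3 sketch; the runtime, role ARN, zip file name, subnet IDs, and security group ID are all placeholders to substitute with your own.

import boto3

lam = boto3.client('lambda')

lam.create_function(
    FunctionName='data_gatherer',
    Runtime='python3.9',                                    # placeholder runtime
    Role='arn:aws:iam::123456789012:role/lambda-vpc-role',  # placeholder role ARN
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': open('data_gatherer.zip', 'rb').read()},
    Timeout=60,
    # Environment variables read by keys.py
    Environment={'Variables': {'mysql_user': '???',
                               'mysql_password': '???',
                               'mysql_dbname': '???'}},
    # The private subnets and a security group from the VPC set up earlier
    VpcConfig={'SubnetIds': ['subnet-???', 'subnet-???'],
               'SecurityGroupIds': ['sg-???']},
)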

1. Logging: You can use the logging.getLogger().info method to log debug info to CloudWatch

import json
import logging
import sys
import pymysql

import keys

import requests
import boto3

mylogger = logging.getLogger()
mylogger.setLevel(logging.INFO)

2. Connect to RDS:

# Connect at module level, so warm Lambda invocations reuse the connection
try:
    conn = pymysql.connect(host=keys.mysql_rds_host, user=keys.mysql_user,
                           passwd=keys.mysql_password, db=keys.mysql_dbname,
                           connect_timeout=20)
except pymysql.MySQLError as e:
    mylogger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    mylogger.error(e)
    sys.exit()

mylogger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

3. Read the data

def lambda_handler(event, context):
    cursor = conn.cursor()

    # Get today's tweet (raises IndexError if the table is empty)
    cursor.execute('select tweet_post from Messages')
    today_msg = cursor.fetchall()[0][0]

4. Call the Publisher Lambda

The URL https://???.execute-api.ap-southeast-2.amazonaws.com/???/??? below is the API Gateway endpoint for the Publisher Lambda. You don't have this URL yet; it is generated after the other Lambda's API Gateway is set up.

    if today_msg is None:
        # Nothing to tweet: return early instead of calling the publisher
        return {"statusCode": 404, "body": json.dumps({"message": "No tweet found"})}

    mypayload = dict(message=today_msg)
    r = requests.get('https://???.execute-api.ap-southeast-2.amazonaws.com/???/???', params=mypayload)
    mylogger.info(r.text)

5. Return generated message and API response in JSON

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": today_msg,
            "API response": r.text
            # "location": ip.text.replace("\n", "")
        }),
    }

6. Set up trigger

This Lambda is triggered on a schedule. You can use a CloudWatch Events rule as the trigger. The schedule expression uses cron syntax; for example, cron(0 14 * * ? *) will send a tweet every day at 14:00 (UTC time).
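
The same trigger can also be set up with boto3. A minimal sketch follows; the rule name, statement ID, and function ARN are placeholders.

import boto3

events = boto3.client('events')
lam = boto3.client('lambda')

# Rule firing every day at 14:00 UTC, matching the cron expression above
rule = events.put_rule(Name='daily-tweet',
                       ScheduleExpression='cron(0 14 * * ? *)')

# Allow CloudWatch Events to invoke the data_gatherer Lambda
lam.add_permission(FunctionName='data_gatherer',
                   StatementId='daily-tweet-event',
                   Action='lambda:InvokeFunction',
                   Principal='events.amazonaws.com',
                   SourceArn=rule['RuleArn'])

# Point the rule at the function (ARN is a placeholder)
events.put_targets(Rule='daily-tweet',
                   Targets=[{'Id': 'data_gatherer',
                             'Arn': 'arn:aws:lambda:ap-southeast-2:123456789012:function:data_gatherer'}])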

Publisher Lambda

This function is called from the "data gatherer" Lambda; it is the one that actually sends the tweet. Note: "keys" is a custom module containing API keys. The file keys.py looks like the one below, with authorisations obtained from https://developer.twitter.com/

YOUR_APP_KEY = '???'
YOUR_APP_SECRET = '???'
USER_OAUTH_TOKEN = '???'
USER_OAUTH_TOKEN_SECRET = '???'

1. Logging: You can use the logging.getLogger().info method to log debug info to CloudWatch

import json
import logging
import sys
import requests
from requests_oauthlib import OAuth1

import keys

logger = logging.getLogger()
logger.setLevel(logging.INFO)

2. Define the tweet sending function

def tweet(msg):
    URL = "https://api.twitter.com/1.1/statuses/update.json"
    auth = OAuth1(keys.YOUR_APP_KEY, keys.YOUR_APP_SECRET,
                  keys.USER_OAUTH_TOKEN, keys.USER_OAUTH_TOKEN_SECRET)
    PARAMS = {'status': msg}
    r = requests.post(URL, params=PARAMS, auth=auth)
    return r

3. Write the handler function

def lambda_handler(event, context):
    # Use .get so a missing query string doesn't raise KeyError
    today_msg = (event.get('queryStringParameters') or {}).get('message')

    if today_msg is None:
        # Return early so tweet_result is never referenced undefined
        return {"statusCode": 400, "body": json.dumps({"message": "No message supplied"})}

    tweet_result = tweet(today_msg)

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "Tweet Sent: " + today_msg,
            "API response": tweet_result.text
        }),
    }
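
For a quick local check before wiring up API Gateway, you can call the handler with a hand-made proxy event. A minimal sketch; only the field the handler actually reads is populated.

# Simulate the API Gateway proxy event the handler expects
event = {"queryStringParameters": {"message": "Hello from a local test"}}
print(lambda_handler(event, None))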

Set up API Gateway

One of the Lambdas is going to call the other one, so the second one needs an endpoint to be callable. To this end, an API Gateway needs to be set up. (A test call for the deployed endpoint follows the steps below.)

  1. Create a new API
  2. Create API method (GET or ANY in this example)
    1. Method Request: Authorisation=NONE
    2. Integration Request: Integration Type=Lambda Function, Use Lambda Proxy Integration=TRUE, Lambda Function=<Choose “publisher” Lambda name>
    3. Method Response: Use default settings
  3. Finally go to “publisher” Lambda
    1. Add API Gateway Trigger
    2. Choose API Gateway set up above
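
Once the API is deployed, the invoke URL shown in the API Gateway console can be tested directly. A hypothetical sketch; substitute your own invoke URL.

import requests

# Placeholder invoke URL from the API Gateway console
url = 'https://???.execute-api.ap-southeast-2.amazonaws.com/???/???'
r = requests.get(url, params={'message': 'Test tweet'})
print(r.status_code, r.text)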

Getting it running

If all the above steps are done correctly, the application will successfully start working based on the cron schedule set up in the "data_gatherer" Lambda's CloudWatch Events rule.

For debugging, CloudWatch Logs can be used.
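
For example, recent log lines can be pulled with boto3. A minimal sketch, assuming the default /aws/lambda/<function name> log group naming.

import boto3

logs = boto3.client('logs')

# Lambda writes to /aws/lambda/<function name> by default
resp = logs.filter_log_events(logGroupName='/aws/lambda/data_gatherer', limit=20)
for e in resp['events']:
    print(e['message'], end='')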