Game Tracker

Technical Paper — Database Design, Steam Integration, and Feed Architecture

What It Is

Game Tracker is a personal web application for tracking a game library, including playtime, achievements, and session history. It runs as a standalone Flask service on its own subdomain and keeps all data in a local SQLite database. Steam data syncs automatically: a background thread polls the Steam Web API every 30 minutes to update hours, and a scheduled job at 00:05 EST records a complete snapshot of the library each night.

The core design tension in this project was between freshness and correctness. Steam's API provides playtime data, but it doesn't expose a session log — only cumulative hours. Reconstructing "what was played when" from cumulative snapshots required careful schema design and a diffing strategy that turned out to be the most interesting technical problem in the whole system.

This document covers the database schema and the reasoning behind its structure, the Steam API integration and its constraints, the snapshot and diff mechanism that powers the activity feed, the live session detection algorithm, and the three-tier now-playing endpoint.

Database Schema

The database has five tables. Three store the primary library data; two handle the time-series snapshot system that makes the activity feed possible.

Core Library Tables

games is the primary table. Each row represents one game: title, platform, status (Playing / Completed / Backlog / Dropped / Wishlist), a 0–10 rating, cumulative hours played, and an optional Steam App ID. The steam_app_id column is the foreign key used to correlate local records with Steam API responses. is_favorite is a boolean integer used in the default sort order, surfacing starred games at the top.

CREATE TABLE games (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    title           TEXT NOT NULL,
    platform        TEXT,
    status          TEXT,          -- 'Playing', 'Completed', 'Backlog', 'Dropped', 'Wishlist'
    rating          INTEGER,
    hours_played    REAL,
    steam_app_id    INTEGER,
    cover_url       TEXT,
    completion_date TEXT,
    last_played_at  TEXT,
    is_favorite     INTEGER DEFAULT 0,
    created_at      TEXT DEFAULT (datetime('now'))
);

achievements is straightforward: each row is one achievement for one game, storing the title, description, unlock date, and icon URL fetched from the Steam API. The ON DELETE CASCADE on the foreign key means deleting a game automatically removes all its achievements — no orphan cleanup required.

steam_import_status exists solely to make bulk imports resumable. Importing a large Steam library (hundreds of games) requires one API call per game for achievements, each sleeping 1 second to respect Steam's rate limit. An import of 300 games takes 5+ minutes. If the server restarts mid-import, this table records which app IDs have already been processed so the import resumes from where it stopped rather than starting over or creating duplicates.

CREATE TABLE steam_import_status (
    steam_app_id          INTEGER UNIQUE,
    game_imported         INTEGER DEFAULT 0,
    achievements_imported INTEGER DEFAULT 0,
    error_message         TEXT    -- null on success; stores API error message on failure
);

Snapshot Tables

The snapshot system is what makes the activity feed possible. Steam exposes only cumulative playtime — there is no session log, no start/end timestamp, no "you played this game from X to Y" data. To reconstruct session history, the system takes a nightly snapshot of total hours per game and computes the difference between consecutive snapshots.

daily_snapshots records a daily summary: total library hours and total games with any playtime. The UNIQUE constraint on date means the nightly job can upsert — if the job runs twice on the same day (e.g. after a restart), it overwrites rather than creating a duplicate row.

daily_game_snapshots records the per-game breakdown. Each row is one game's cumulative hours on one date. The UNIQUE(date, game_id) composite constraint serves the same upsert purpose. This is the table that makes diff queries efficient: to find what was played on a given day, self-join this table on game_id with an offset of one day.

CREATE TABLE daily_game_snapshots (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    date         TEXT NOT NULL,
    game_id      INTEGER NOT NULL,
    game_title   TEXT NOT NULL,
    hours_played REAL NOT NULL,
    cover_url    TEXT,
    UNIQUE(date, game_id),
    FOREIGN KEY(game_id) REFERENCES games(id) ON DELETE CASCADE
);

The game_title and cover_url columns are denormalised into the snapshot table. This means the feed can be queried without joining back to games, and historical entries remain accurate even if a game's title or cover URL is later edited.

Steam API Integration

The Steam Web API is used for four distinct operations: fetching the owned game list, fetching achievement data for a specific app, fetching real-time player presence (for the now-playing endpoint), and fetching recently played games. Each operation hits a different endpoint and has different rate-limit characteristics.

Owned Games and Playtime Sync

The GetOwnedGames endpoint returns the complete library with cumulative playtime in minutes. This is what the background poller calls every 30 minutes. The response is a flat list — no pagination, no cursor — so the entire library is returned in one call. Updating local hours is a straightforward batch of UPDATE statements, one per game with a steam_app_id.

Achievement Import and Rate Limiting

Fetching achievements requires a separate call to GetPlayerAchievements per game, since Steam has no bulk achievement endpoint. For a library of hundreds of games, this means hundreds of sequential API calls. Steam rate-limits the achievement endpoint to approximately one request per second; exceeding this results in HTTP 429 responses that silently return empty data rather than an explicit error code in some cases.

The import loop enforces a mandatory 1-second sleep between achievement fetches. Games are sorted by playtime descending before import begins, so the most-played titles (and their achievements) are imported first in case the import is interrupted.

# sort most-played first so high-priority games import even if interrupted
steam_games.sort(key=lambda x: x.get('playtime_forever', 0), reverse=True)

for i, game in enumerate(steam_games):
    app_id = game['appid']

    # idempotent: skip if already imported
    status = import_status.get(app_id)
    if status and status['game_imported'] == 1:
        continue

    # ... insert game row, then fetch achievements ...
    conn.commit()   # per-game commit: a restart loses at most one game's work

    if i < len(steam_games) - 1:
        time.sleep(1)   # Steam rate limit on achievement endpoint

The per-game commit() inside the loop is deliberate. A single transaction spanning the entire import would mean a crash mid-way loses all inserted data. With a commit per game, the worst case is one game's work lost on a restart — and steam_import_status ensures the completed games are not re-imported.

Auto-Completion Detection

During any Steam sync, if the fetched achievement list shows 100% unlock rate, the game's status is automatically set to 'Completed' and the completion date is derived from the latest achievement unlock date. This means 100% games self-manage their status without manual intervention, and the completion date is accurate to the day the last achievement was earned rather than the day of the sync.

unlocked = sum(1 for a in achievements if a.get('achieved', 0))
total    = len(achievements)

if total > 0 and unlocked == total:
    dates = [a['unlock_date'] for a in achievements if a['unlock_date']]
    try:
        completion_date = max(
            datetime.strptime(d, '%Y-%m-%d') for d in dates
        ).strftime('%Y-%m-%d')
    except (ValueError, TypeError):
        completion_date = datetime.now().strftime('%Y-%m-%d')

    cur.execute(
        'UPDATE games SET status=?, completion_date=? WHERE id=?',
        ('Completed', completion_date, game_id)
    )

Snapshot System

The nightly snapshot job runs at 00:05 EST (05:05 UTC) via a daemon thread running the schedule library. It runs in three phases: first a Steam hours sync to ensure the snapshot captures today's play, then a check for new Steam games added since the last run, then the snapshot itself.

Recording a Snapshot

The snapshot procedure writes two things: a summary row into daily_snapshots (total hours across the whole library, count of games with any playtime), and a per-game row for every game with hours into daily_game_snapshots. Both use upsert logic — if a row for today already exists (e.g. the job ran twice), it's overwritten rather than duplicated.

def record_daily_snapshot(self):
    date_str = self.get_current_date_est().isoformat()
    cur.execute('SELECT SUM(hours_played) FROM games WHERE hours_played IS NOT NULL')
    total_hours = cur.fetchone()[0] or 0

    # upsert summary
    cur.execute('SELECT id FROM daily_snapshots WHERE date=?', (date_str,))
    if cur.fetchone():
        cur.execute('UPDATE daily_snapshots SET total_hours=?, games_played=? WHERE date=?',
                    (total_hours, games_count, date_str))
        cur.execute('DELETE FROM daily_game_snapshots WHERE date=?', (date_str,))
    else:
        cur.execute('INSERT INTO daily_snapshots (date, total_hours, games_played) VALUES (?,?,?)',
                    (date_str, total_hours, games_count))

    # per-game rows
    for game in cur.execute('SELECT id, title, hours_played, cover_url FROM games WHERE hours_played > 0'):
        cur.execute(
            'INSERT INTO daily_game_snapshots (date, game_id, game_title, hours_played, cover_url)'
            ' VALUES (?,?,?,?,?)',
            (date_str, game['id'], game['title'], game['hours_played'], game['cover_url'])
        )
    conn.commit()

Diffing Snapshots

To reconstruct a session on a given date, the diff query self-joins daily_game_snapshots on game_id, linking each date's row to the previous day's row. Any game whose hours increased — even fractionally — shows up as played that day. A threshold of 0.09 hours (about 5 minutes) filters out floating-point rounding noise from Steam's minutes-to-hours conversion.

# 30-day diff in a single query — no Python loop over dates
SELECT dgs1.date, dgs1.game_id, dgs1.game_title, dgs1.cover_url,
       dgs1.hours_played as total_hours,
       dgs1.hours_played - COALESCE(dgs2.hours_played, 0) as hours_added
FROM daily_game_snapshots dgs1
LEFT JOIN daily_game_snapshots dgs2
    ON dgs1.game_id = dgs2.game_id
   AND dgs2.date = date(dgs1.date, '-1 day')   -- SQLite date arithmetic
WHERE dgs1.date >= date('now', '-30 days')
  AND (dgs1.hours_played - COALESCE(dgs2.hours_played, 0)) > 0.09
ORDER BY dgs1.date DESC, hours_added DESC
LIMIT 100

The LEFT JOIN handles games with no previous-day entry — a game that appears in the snapshot for the first time (either newly imported or played for the first time since the tracker started) has a NULL join row, so COALESCE(..., 0) treats it as if it had zero hours the day before. This correctly counts a first-time play as the full hours total, not as missing data.

SQLite's date('now', '-1 day') date arithmetic handles month and year boundaries correctly, including leap years. There is no need to compute the previous date in Python; SQLite's date functions are used directly in the query.

Activity Feed

The activity feed endpoint combines three data sources — live sessions, historical sessions, and recent achievement unlocks — into a single chronological list. The interesting challenge is that "live" and "historical" sessions can overlap: a game played today will appear in both the live detection query and the historical diff for today's snapshot date. The feed must merge these without double-counting.

Live Session Detection

Live sessions are detected without waiting for the midnight snapshot. The query compares each game's current hours_played value against the most recent snapshot row. Any game whose current hours exceed the snapshot by more than 0.09 is considered actively in a session or recently played since the last snapshot.

# subquery finds the most recent snapshot date
SELECT g.id as game_id, g.title as game_title, g.cover_url,
       g.hours_played as current_hours,
       COALESCE(dgs.hours_played, 0) as snapshot_hours,
       g.hours_played - COALESCE(dgs.hours_played, 0) as hours_added
FROM games g
LEFT JOIN (
    SELECT game_id, hours_played
    FROM daily_game_snapshots
    WHERE date = (SELECT MAX(date) FROM daily_game_snapshots)
) dgs ON g.id = dgs.game_id
WHERE g.hours_played > COALESCE(dgs.hours_played, 0) + 0.09
ORDER BY hours_added DESC

The inner subquery anchors to the latest snapshot date globally — not per-game. This means if a game was never in a snapshot before (newly added), its dgs.hours_played is NULL and COALESCE treats it as zero, correctly marking all its hours as "added since last snapshot".

Merging and Deduplication

After running all three queries (live, historical, achievements), the Python layer performs deduplication. Live games are collected into a set by ID. When iterating historical sessions, any entry for today's date whose game ID is in the live set is skipped — that game is already represented in the feed as a live session. Achievement groups are assembled by (date, game_id) to batch multiple unlocks from the same game on the same day into a single feed item.

live_game_ids = {s['game_id'] for s in live_sessions}

for s in hist_sessions:
    # today's snapshot entry for a live game would double-count it
    if s['date'] == today and s['game_id'] in live_game_ids:
        continue
    feed.append({'type': 'session', 'live': False, **s})

# sort: live items first (they have today's date + live=True), then newest date
feed.sort(key=lambda x: (not x.get('live'), x['date']), reverse=True)

The sort key is a tuple: negated live flag first, then date string (ISO format, so lexicographic sort is chronological). Live items have live=True, which negates to False (0), sorting them before non-live items (True → 1). Within each group, newer dates sort higher with reverse=True.

Now Playing — Three-Tier Detection

The /api/now-playing endpoint is public (no auth required) and is used by the main portfolio site to show what's currently being played. It uses three detection strategies in priority order, falling back to the next if the previous produces no result.

# Strategy Source Latency
1 Real-time presence Steam GetPlayerSummaries ~200ms API call
2 Recent hours delta Steam GetRecentlyPlayedGames + DB last_played_at ~200ms API call
3 DB fallback last_played_at column in games table sub-millisecond

Tier 1: Real-Time Presence

GetPlayerSummaries returns a gameextrainfo field when the user is currently in a game on Steam. This is the same data shown on a Steam profile page. When present, it's the most accurate signal — it means the game is open right now. The app ID from the API is cross-referenced against the local database to attach the cover URL and total hours.

Tier 2: Recently Played Delta

Steam's GetRecentlyPlayedGames returns up to 10 recently played games with biweekly playtime. When tier 1 produces nothing (the user is set to Invisible, or not in a game), this endpoint is polled instead. The returned app IDs are fetched from the DB in a single IN query, and the game with the freshest last_played_at timestamp in the DB is chosen as the best candidate.

# batch DB lookup — one query for all returned app_ids
app_ids = [g['app_id'] for g in recently]
placeholders = ','.join('?' * len(app_ids))
cur.execute(
    f'SELECT id, steam_app_id, cover_url, hours_played, last_played_at FROM games WHERE steam_app_id IN ({placeholders})',
    app_ids
)
db_rows = {r['steam_app_id']: r for r in cur.fetchall()}

# pick whichever has the most recent last_played_at in the DB
best = recently[0]
for g in recently[1:]:
    row = db_rows.get(g['app_id'])
    if row and row['last_played_at'] > db_rows.get(best['app_id'], {}).get('last_played_at'):
        best = g

Tier 3: DB Fallback

If both Steam API tiers return nothing (API unavailable, no recent games in the 2-week window), the endpoint falls back to a simple ORDER BY last_played_at DESC LIMIT 1 query against the local database. This always returns something as long as any game has been played since the tracker started — the most recently touched entry.

The three-tier structure means the endpoint degrades gracefully: full API access gives real-time accuracy; API unavailability falls back to a best-effort recent guess; complete API failure still returns a reasonable answer from the DB. The playing: true flag in the response is only set when tier 1 fires.

Background Threads

Two daemon threads run alongside the Flask application process. Neither uses Flask's app context (all database access opens its own connection), so they are safe to start before the first request.

Nightly Scheduler

The scheduler thread runs a schedule loop waking every 60 seconds to check for pending jobs. One job is registered: the 05:05 UTC (00:05 EST) snapshot task. The job sequence is: sync Steam hours → check for new Steam games → record snapshot. This ordering ensures the snapshot captures today's play rather than yesterday's cached values.

def setup_daily_scheduler(tracker):
    def job():
        update_all_steam_hours_sync()   # fetch current hours from Steam
        sync_new_steam_games()          # import games added since last run
        tracker.record_daily_snapshot() # write snapshot rows

    schedule.every().day.at('05:05').do(job)

    def run_scheduler():
        while True:
            schedule.run_pending()
            time.sleep(60)

    threading.Thread(target=run_scheduler, daemon=True).start()

Steam Poller

The poller thread wakes every 30 minutes to sync all Steam hours. It starts with a 60-second delay so the Flask app finishes initialising before the first API call. Errors are logged but do not crash the thread — the next poll happens 30 minutes later regardless.

def setup_steam_poller(interval_minutes=30):
    def poll():
        time.sleep(60)            # startup delay
        while True:
            try:
                update_all_steam_hours_sync()
            except Exception as e:
                logger.error(f"Steam poll error: {e}")
            time.sleep(interval_minutes * 60)

    threading.Thread(target=poll, daemon=True, name='steam-poller').start()

Both threads are daemon threads, meaning they are automatically killed when the main process exits. There is no shutdown coordination or graceful drain — for a personal-use application that runs continuously, this is acceptable. A production service would use proper worker processes (Celery, RQ) with shutdown hooks.

Games API — Avoiding N+1 Queries

The main games list endpoint could naively be implemented as a query for all games, then a separate query per game to count its achievements. For a library of 300 games, that's 301 database round-trips. Instead, a single LEFT JOIN with conditional COUNTs returns every game alongside its achievement progress in one round-trip.

SELECT g.*,
       COUNT(CASE WHEN a.unlocked=1 THEN 1 END) AS unlocked_achievements,
       COUNT(a.id)                               AS total_achievements,
       CASE
         WHEN COUNT(a.id) > 0 THEN
           ROUND(COUNT(CASE WHEN a.unlocked=1 THEN 1 END) * 100.0
                 / COUNT(a.id), 1)
         ELSE 0
       END AS completion_percentage
FROM games g
LEFT JOIN achievements a ON g.id = a.game_id
GROUP BY g.id
ORDER BY g.is_favorite DESC, g.created_at DESC

The LEFT JOIN (rather than INNER JOIN) ensures games with no achievements still appear in the result — their achievement columns are NULL or zero. The CASE WHEN COUNT(a.id) > 0 guard prevents division-by-zero in the percentage calculation for those games.

The CASE WHEN a.unlocked=1 THEN 1 END expression inside COUNT is a standard SQL trick for conditional counting: it evaluates to 1 for unlocked achievements and NULL for locked ones, and COUNT ignores NULLs. This is equivalent to a WHERE clause but applied to a subset of rows without losing the overall GROUP BY aggregate.

Filtering and Sorting

The endpoint accepts query parameters for status filtering, platform filtering, and sort order. These are assembled dynamically to avoid generating invalid SQL while preventing injection by parameterising all user-supplied values:

conditions = ["1=1"]   # always-true base condition
params     = []

if status := request.args.get('status'):
    conditions.append("g.status = ?")
    params.append(status)

if platform := request.args.get('platform'):
    conditions.append("g.platform = ?")
    params.append(platform)

ALLOWED_SORTS = {'title', 'hours_played', 'rating', 'created_at'}
sort_col  = request.args.get('sort', 'created_at')
if sort_col not in ALLOWED_SORTS:
    sort_col = 'created_at'   # reject unknown column names entirely

where_clause = ' AND '.join(conditions)
query = f"""
    SELECT g.*, COUNT(CASE WHEN a.unlocked=1 THEN 1 END) AS unlocked_achievements ...
    FROM games g LEFT JOIN achievements a ON g.id = a.game_id
    WHERE {where_clause}
    GROUP BY g.id
    ORDER BY g.is_favorite DESC, {sort_col} DESC
"""
cur.execute(query, params)

The sort column is validated against an allowlist of known-safe column names rather than parameterised. SQLite (and SQL generally) doesn't support parameterised column names — only parameterised values. The allowlist prevents ORDER BY injection while keeping the query dynamic.

Data Integrity — Edge Cases in the Diff System

The snapshot diff system looks simple but has several edge cases that required deliberate handling.

Steam Minutes-to-Hours Rounding

Steam reports playtime in integer minutes. Converting to hours with round(minutes / 60, 1) introduces rounding artifacts: two snapshots with no actual play between them can show a difference of up to 0.1 hours (6 minutes) simply from how the rounding lands. The 0.09-hour threshold in the diff query accounts for this — changes smaller than ~5.4 minutes are filtered out as rounding noise rather than real play.

Games Added Mid-Archive

If a game is added to the local database after the tracker has been running for a while, its first snapshot entry will have no prior-day row to diff against. The LEFT JOIN ... COALESCE(dgs2.hours_played, 0) pattern handles this: the NULL from the missing join becomes zero, and the full current hours show as "added" for that day. This is technically not accurate (the hours were accumulated before the tracker was set up), but it's the correct behaviour for "when did the tracker first see this game".

Deleted Games

When a game is deleted from the games table, ON DELETE CASCADE removes its achievement rows and snapshot rows. This means deleted games disappear from the feed immediately. Historical dates that previously showed that game will look like that game was never played on those dates. This is acceptable for a personal tracker (deletions are intentional) but would need audit-log behaviour in a multi-user system.

Steam Hours Going Backwards

Rarely, Steam's API returns a lower playtime value than a previous call. This can happen due to server-side corrections, game resets, or VAC actions. The diff threshold handles the false-positive case (a small dip looks like no play). The local database is only updated with higher hour values to avoid writing retrograde data:

# only update if Steam reports more hours than we have locally
cur.execute(
    'SELECT hours_played FROM games WHERE steam_app_id = ?', (app_id,)
)
row = cur.fetchone()
if row and steam_hours > (row['hours_played'] or 0):
    cur.execute(
        'UPDATE games SET hours_played = ?, last_played_at = ? WHERE steam_app_id = ?',
        (steam_hours, datetime.utcnow().isoformat(), app_id)
    )

Design Decisions

Design Decisions

SQLite over PostgreSQL

The application runs on a single server with a single user. SQLite's file-based storage means zero configuration, no separate process, trivial backup (copy the file), and no connection pooling to manage. The write concurrency limitation — SQLite serialises all writes — is not a constraint here. The only concurrent writes are the background threads, which each open short-lived connections and commit quickly.

Snapshot Granularity

Snapshots are daily, not per-session. A per-session log would require the app to be open during play, which isn't guaranteed. The daily snapshot approach works passively — it records whatever cumulative state Steam reports at midnight, with no dependency on the app being active during a play session. The tradeoff is that a day with two separate play sessions for the same game is indistinguishable from one long session.

Denormalised Title and Cover in Snapshots

Storing game_title and cover_url directly in daily_game_snapshots avoids a JOIN on every feed query and preserves historical accuracy: if a game's title is corrected in the games table, old feed entries still show the title as it was at the time of the snapshot. The storage overhead is negligible — text columns in SQLite compress well and the snapshot table grows by at most a few hundred rows per day.

EST Timezone for Snapshot Date

The snapshot date is computed in EST (UTC−5) rather than UTC. This means "today" for snapshot purposes aligns with the calendar day as experienced by the user. A game played at 11pm local time shows up in that day's snapshot, not the next day's. The UTC equivalent job time (05:05) was chosen to fall well after midnight EST on any date, including daylight saving transitions.