Automating Word Count Display on a Static Blog with Python + Git Hooks
May 15, 2025 · 5 min read
Why I Wanted This
While working on my personal blog hosted on GitHub Pages, I had a simple idea: what if each post could show its word count automatically? I did not want to do this manually or hard-code anything — it felt like something a script should handle. More importantly, I wanted it to update automatically every time I made a commit.
This turned into a fun mini-project that taught me more about Git hooks, Python scripting, encoding quirks, and working around GitHub Pages' static limitations.
The Setup I Ended Up With
portfolio/
├── blogs/                      # My HTML blog posts
│   ├── post1.html
│   └── post2.html
├── data/
│   └── blog-word-count.json    # Auto-generated word count file
├── scripts/
│   └── pre-commit.ps1          # PowerShell script to update the word count
├── generate_word_counts.py     # Python script to scan all blog files
└── .git/
    └── hooks/
        └── pre-commit          # Git hook that triggers pre-commit.ps1
Writing the Python Word Counter
The heart of the setup was a Python script that loops through all HTML files in my blogs/ folder, extracts the visible
text using BeautifulSoup, counts the words, and writes the result to a JSON file. It is flexible enough to work for any
file name.
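For readers who want to see the shape of such a script, here is a minimal sketch. It is not my exact code: to stay dependency-free it swaps BeautifulSoup for the standard-library html.parser, and the names VisibleTextParser, count_words, and generate_word_counts are made up for illustration. It also assumes each JSON key is the file name without its extension (post1.html becomes "post1").

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html: str) -> int:
    """Count whitespace-separated tokens in the text of an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(re.findall(r"\S+", text))


def generate_word_counts(blog_dir: Path, output_file: Path) -> dict:
    """Scan every .html file in blog_dir and write {file stem: word count} as JSON."""
    counts = {
        path.stem: count_words(path.read_text(encoding="utf-8"))
        for path in sorted(blog_dir.glob("*.html"))
    }
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return counts
```

Calling generate_word_counts(Path("blogs"), Path("data/blog-word-count.json")) from the repository root would produce the JSON file for the layout shown earlier. The count is approximate: it includes text such as the page title, and inline tags in the middle of a word would split it in two.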
I ran into a surprising bug early on: I added emoji icons to the terminal output to make the logs friendlier. The Windows terminal encoding (cp1252) could not represent them, so print() raised UnicodeEncodeError and the script failed whenever it ran from the pre-commit hook. Lesson learned: keep hooks ASCII-safe. Once that was fixed, the script worked flawlessly.
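One way to make log output safe is sketched below. The helper names to_console_safe and log, and tags like [OK], are hypothetical and not taken from my script. The idea is to use plain ASCII tags and, as a safety net, replace any character the console encoding cannot represent instead of letting the write raise.

```python
import sys


def to_console_safe(message: str, encoding: str = "cp1252") -> str:
    """Replace characters the target encoding cannot represent with '?'."""
    return message.encode(encoding, errors="replace").decode(encoding)


def log(tag: str, message: str, stream=None) -> None:
    """Write a tagged line such as '[OK] done' without raising on odd characters."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    stream.write(to_console_safe(f"[{tag}] {message}", encoding) + "\n")
```

With this in place, log("OK", "Word counts updated") prints a plain-text line, and a stray emoji in a message degrades to a question mark rather than aborting the commit.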
Automating with PowerShell + Git Hook
Next, I wanted to automate running the script before every commit. I initially tried putting the logic directly in
.git/hooks/, but realized that the .git/ directory is not version-controlled, so hooks stored there are not included
when the repo is cloned or shared. I moved the logic to a tracked scripts/pre-commit.ps1 file and added a simple shell
shim in .git/hooks/pre-commit to call it:
#!/bin/sh
powershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"
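Because the shim itself lives in the untracked .git/hooks/ folder, every fresh clone needs it recreated. A small installer can do that. The sketch below is one possible approach, not something from my repo: install_hook and HOOK_SHIM are hypothetical names, and it assumes it is given the repository root.

```python
import stat
from pathlib import Path

# The same two-line shim shown above.
HOOK_SHIM = (
    "#!/bin/sh\n"
    'powershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"\n'
)


def install_hook(repo_root: Path) -> Path:
    """Write the pre-commit shim into .git/hooks and mark it executable."""
    hook_path = repo_root / ".git" / "hooks" / "pre-commit"
    hook_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="\n" keeps LF line endings so sh can read the shebang on Windows too.
    with open(hook_path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(HOOK_SHIM)
    # Add execute permission bits; this matters on macOS and Linux.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Running install_hook(Path(".")) once after cloning would put the shim in place.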
The PowerShell script looks like this:
Write-Host "Running generate_word_counts.py..."
$process = Start-Process -FilePath "python" -ArgumentList "generate_word_counts.py" -NoNewWindow -Wait -PassThru
if ($process.ExitCode -ne 0) {
    Write-Host "Python script failed. Aborting commit."
    exit 1
}
git add data/blog-word-count.json
Write-Host "Word count JSON updated and staged."
Once this was hooked up correctly, committing felt magical — Git ran my script, updated the JSON, and staged it.
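If PowerShell is not available, the same three steps (run the generator, abort on failure, stage the JSON) can be written in Python. This is an alternative sketch rather than what I actually use. run_pre_commit is a hypothetical name, the run parameter exists only so the function can be exercised without touching a real repo, and sys.executable stands in for the python command.

```python
import subprocess
import sys


def run_pre_commit(run=subprocess.run) -> int:
    """Regenerate the word counts, then stage the JSON; return a process exit code."""
    print("Running generate_word_counts.py...")
    result = run([sys.executable, "generate_word_counts.py"])
    if result.returncode != 0:
        print("Python script failed. Aborting commit.")
        return 1  # a non-zero exit from a pre-commit hook cancels the commit
    staged = run(["git", "add", "data/blog-word-count.json"])
    if staged.returncode != 0:
        print("Could not stage the word count file. Aborting commit.")
        return 1
    print("Word count JSON updated and staged.")
    return 0
```

A hook script would end with sys.exit(run_pre_commit()) so that Git sees the exit code.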
Displaying the Word Count on My Site
To actually use the data, I added this simple JS snippet to my HTML pages:
<p>This post has <span id="post1">...</span> words.</p>
<script>
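  fetch('data/blog-word-count.json')
    .then(res => res.json())
    .then(data => {
      // Each JSON key is expected to match the id of a <span> on the page.
      for (const key in data) {
        const el = document.getElementById(key);
        if (el) {
          // The surrounding <p> already ends with "words", so insert only the number.
          el.textContent = data[key];
        }
      }
    });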
</script>
Now each post fetches its word count from the JSON file when the page loads, so the number stays current with every deploy.
A Few Gotchas Along the Way
- Unicode crashes in terminal output: Using emojis in Python print() statements caused UnicodeEncodeError. The fix was replacing them with ASCII-safe tags.
- Git could not find my script: Initially used a relative path that assumed the hook was executing from a different location. Fixed by referencing scripts/pre-commit.ps1 directly.
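The path problem has a counterpart on the Python side. Git runs hooks from the root of the working tree, so relative paths work there, but the same script breaks if it is run by hand from another folder. One defensive option is to anchor paths to the script's own location. The helper name resolve_from_script below is hypothetical.

```python
from pathlib import Path


def resolve_from_script(script_file: str, *parts: str) -> Path:
    """Build a path relative to the folder containing script_file, not the current directory."""
    return Path(script_file).resolve().parent.joinpath(*parts)
```

Inside generate_word_counts.py this could be used as resolve_from_script(__file__, "data", "blog-word-count.json"), which points at the same file no matter where the command is launched from.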
What I Learned
- Git hooks are powerful, but picky about paths and encoding.
- Avoid emojis in anything that runs through Git hooks on Windows.
- You can still build dynamic features on static sites — if you pre-process the data.
- It is satisfying to automate something small and have it just work every time.
Further Reading
- Check out my GitHub for implementation examples