I spent some time today doing support for Freebird, Puppetry and Easy Diffusion. I identified a bug in Freebird (bone axis gizmos aren't scaling correctly in VR), got annoyed by how little documentation I've written for Puppetry's scripting API, and got reminded about how annoying it is for Easy Diffusion to force-download the poor-quality starter model (stock SD 1.4) during installation.
The majority of the day was spent using a local LLM to classify emails. I get a lot of repetitive emails for FindStarlink - people telling me whether they saw Starlink or not (using the predictions on the website). The first part of my reply is always a boilerplate "Glad you saw it" or "Sorry about that", followed by email-specific replies. I'd really like the system to auto-fill the first part of the email if it's a report about a Starlink sighting.
Is classification even necessary or correct?
Typing/pasting the first part of the reply tens of times a day is quite cumbersome. On occasion I've received over a hundred emails in a single day, and on one particularly peaky day, 1000. So user empathy is one aspect, but I'm also a single human being.
I could of course just remove the entire email aspect, and make it a faceless "Yes/No" button on the website (to confirm Starlink sightings). But I think it would reduce the accuracy of the sighting reports, and would also make the site a colder place.
I understand that using an email classifier to auto-insert a reply is also cold, but it's still me replying manually after reading the mail. It's more like an auto-complete on steroids, not an auto-responder.
Classifier details
The Llama 3.1 8B model was pretty accurate. I tested it against a dump of 500 emails that I've labeled in the past as "fail", "success", and "other".
The first message was sent with a "user" role: "Here is an email. It reports whether the sender saw the Starlink train of satellites.", followed by the email subject and contents (plaintext).
I also included three messages with a "system" role, with more hints about classifying the emails, and restricting the output to three labels only.
I also tried the Llama 3.2 1B model, but it was significantly less accurate.
Here's the rough script:
import requests

# Local server exposing an OpenAI-compatible chat completions endpoint
OPENAI_API_HOST = "http://localhost:1234"
MODEL_NAME = "Llama-3.1-8B-Lexi-Uncensored-V2-GGUF"
LABELS = ["success", "fail", "other"]

CLASSIFY_PROMPT = "Here is an email. It reports whether the sender saw the Starlink train of satellites."
SYSTEM_MESSAGE1 = {
    "role": "system",
    "content": 'Reply with "fail" if the user failed to see Starlink, reply with "success" if the user successfully saw Starlink. Otherwise reply with "other".',
}
SYSTEM_MESSAGE2 = {
    "role": "system",
    "content": 'Understand the sentiment of the email. Sometimes the sender will describe an unrelated topic or ask an unrelated question. Classify such email as "other". Sometimes the user will say that a previous viewing was amazing, or that they can confirm seeing it or that they saw it (the satellites) or that it was visible or that the timings were spot on or correct or that it works well, classify those as "success". Sometimes they will say did not see or don\'t see or was not visible or that it was a let down or generally a negative experience, classify those as "fail".',
}
SYSTEM_MESSAGE3 = {"role": "system", "content": 'reply only with "fail", "success" or "other"'}


def classify(text):
    """Classify an email (subject + plaintext contents) as one of LABELS."""
    # The user message goes first, followed by the three system hints
    message = {"role": "user", "content": f"{CLASSIFY_PROMPT}\n\n{text}"}
    response = requests.post(
        f"{OPENAI_API_HOST}/v1/chat/completions",
        json={"model": MODEL_NAME, "messages": [message, SYSTEM_MESSAGE1, SYSTEM_MESSAGE2, SYSTEM_MESSAGE3]},
    )
    if response.status_code != 200:
        raise RuntimeError(
            f"Unexpected response from server. Status code: {response.status_code}: {response.text}"
        )

    label = response.json()["choices"][0]["message"]["content"].lower().strip()
    if label not in LABELS:
        # Anything outside the three labels is logged and treated as "other"
        print("---")
        print(f"Unexpected label: {label} for {text}")
        print("---")
        label = "other"
    return label
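For reference, the accuracy check against the labeled dump could look roughly like this. It's a minimal sketch: the `evaluate` helper, the `(text, label)` pair format and the sample data are assumptions for illustration, not part of the script above. The classifier is passed in as a function, so `classify` (or a stub, as here) can be plugged in.

```python
from collections import Counter


def evaluate(classify_fn, labeled_emails):
    """Return (accuracy, confusion) for a list of (text, expected_label) pairs.

    `confusion` counts (expected, predicted) pairs, which shows which
    labels get mixed up with each other.
    """
    confusion = Counter()
    for text, expected in labeled_emails:
        predicted = classify_fn(text)
        confusion[(expected, predicted)] += 1

    total = sum(confusion.values())
    correct = sum(n for (expected, predicted), n in confusion.items() if expected == predicted)
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


if __name__ == "__main__":
    # Hypothetical stand-in data and classifier, only to show the call shape.
    sample = [
        ("Saw it last night, timings were spot on!", "success"),
        ("Waited for an hour, nothing was visible.", "fail"),
        ("Can you add my city to the list?", "other"),
    ]

    def keyword_stub(text):
        lowered = text.lower()
        if "saw it" in lowered:
            return "success"
        if "nothing" in lowered:
            return "fail"
        return "other"

    accuracy, confusion = evaluate(keyword_stub, sample)
    print(f"accuracy: {accuracy:.2%}")
    for (expected, predicted), count in sorted(confusion.items()):
        print(f"{expected:>8} -> {predicted:<8} {count}")
```

Looking at the confusion counts rather than only the overall accuracy makes it easier to see whether a smaller model is mostly confusing "other" with the two sighting labels, or "success" with "fail".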
Next step
Now I'd like to fetch the latest emails for the starlink email address, and then run the classifier, and have it save a draft reply for each email if it is classified as "success" or "fail". I'm okay with running this script on my PC, maybe once every morning.
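The label-to-boilerplate step can be sketched as a small lookup. The opening lines are the ones quoted earlier in this post (the trailing punctuation is assumed); the `draft_opening` and `build_draft` names, the optional greeting, and the choice to return `None` for "other" so that no draft is saved are all assumptions for illustration.

```python
# Boilerplate openings keyed by classifier label. "other" is deliberately
# absent: those emails get no pre-filled draft.
OPENINGS = {
    "success": "Glad you saw it!",
    "fail": "Sorry about that.",
}


def draft_opening(label):
    """Return the boilerplate opening for a label, or None if no draft is wanted."""
    return OPENINGS.get(label)


def build_draft(label, sender_name=None):
    """Build the pre-filled start of a reply, leaving room for the manual part."""
    opening = draft_opening(label)
    if opening is None:
        return None
    greeting = f"Hi {sender_name},\n\n" if sender_name else ""
    # Two trailing newlines: the email-specific reply is typed below the opening.
    return f"{greeting}{opening}\n\n"
```

Returning `None` instead of an empty string keeps "no draft" distinct from "empty draft", so the calling script can skip saving anything for emails classified as "other".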
I tried writing a plugin for Thunderbird, and to be honest I didn't enjoy the experience. The API and developer tooling are quite nice, but Thunderbird is really slow and flaky at firing the 'new email' event handler. And the UI would freeze occasionally. I spent an entire afternoon fighting the system, and didn't even get to link the local LLM to it.
Plus I'd like a simpler UI, which presents a card layout of all the emails that need my attention, and a simple textbox under it showing the proposed reply. If I agree, I can type any additional content and/or press Send. Or I can click a button to pick the correct classification, and do the same. Reducing my effort is important to me.
So I also wrote a simple script that talks to Gmail directly (using Google's Python library), and got it to fetch my emails. This could be expanded to build such a UI for myself, since I don't need a general-purpose email client (Gmail's web interface is enough for me).
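A rough sketch of what the Gmail side could look like, assuming an already-authorized `service` object built with `googleapiclient.discovery.build("gmail", "v1", credentials=...)`. This is not the script mentioned above: the helper names and the search query are hypothetical. It lists matching messages, pulls the plaintext body out of the message payload, and saves (without sending) a draft reply in the same thread.

```python
import base64
from email.message import EmailMessage


def extract_plaintext(payload):
    """Depth-first search of a Gmail message payload for the first text/plain part."""
    if payload.get("mimeType") == "text/plain":
        data = payload.get("body", {}).get("data", "")
        # Gmail returns bodies as URL-safe base64, sometimes without padding.
        padded = data + "=" * (-len(data) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for part in payload.get("parts", []):
        text = extract_plaintext(part)
        if text:
            return text
    return ""


def header(payload, name):
    """Return the value of a header (case-insensitive match), or an empty string."""
    for item in payload.get("headers", []):
        if item["name"].lower() == name.lower():
            return item["value"]
    return ""


def fetch_messages(service, query, limit=20):
    """Yield full message resources matching a Gmail search query."""
    listing = service.users().messages().list(userId="me", q=query, maxResults=limit).execute()
    for ref in listing.get("messages", []):
        yield service.users().messages().get(userId="me", id=ref["id"], format="full").execute()


def save_draft_reply(service, message, body_text):
    """Save (not send) a draft reply in the same thread as `message`."""
    payload = message["payload"]
    reply = EmailMessage()
    reply["To"] = header(payload, "Reply-To") or header(payload, "From")
    subject = header(payload, "Subject")
    reply["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    message_id = header(payload, "Message-ID")
    if message_id:
        # These headers let mail clients thread the reply correctly.
        reply["In-Reply-To"] = message_id
        reply["References"] = message_id
    reply.set_content(body_text)
    raw = base64.urlsafe_b64encode(reply.as_bytes()).decode("ascii")
    body = {"message": {"raw": raw, "threadId": message["threadId"]}}
    return service.users().drafts().create(userId="me", body=body).execute()
```

A morning run would then loop over something like `fetch_messages(service, "to:<starlink address> is:unread")` (query is a placeholder), pass the subject and `extract_plaintext(...)` output to the classifier, and call `save_draft_reply` only for "success" and "fail".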
In general, I'm surprised that we don't have programmable email clients, or email clients that classify emails for you. I'd expect Gmail to at least offer this as a feature, with all the talent and compute Google has. Maybe they're working on this?
For now, I'm probably going to park this for a while.