Literally just figured out how to get my browsing history piped into Grafana+Loki as a single source of truth, using a userscript that ignores CORS and just POSTs to Loki's API.
Apparently that's going away entirely now and there is no good way to stream browsing history anywhere if it's not tied to a browser maker's services (even if they are selfhosted a la Firefox Sync).
You can't even read from the browser's history database, because browsers for some reason lock the entire database even while running in WAL mode (my original plan was to do something similar to litestream and just attach to places.sqlite or the History file and push the URL to Loki, but that just doesn't work).
There are legitimate reasons to do this, but it really seems like it's just there to curb users' agency over their own data and devices.
> [...] to improve the isolation of data between different origins, cross-origin requests are no longer possible from content scripts, unless the destination website opts in via CORS.
If the server doesn't include your origin in its CORS headers, the request should fail, if my understanding is correct. If so, this would allow a website like YouTube to block SponsorBlock from loading its data. There might be ways to bypass this, but for me it's just a side project, so I haven't looked too deeply into it.
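To illustrate what the check boils down to: the browser compares the response's `Access-Control-Allow-Origin` header against the requesting page's origin. A rough sketch (`corsAllows` is my own name, not a real API, and this simplifies away preflights, credentials, and multiple headers):

```javascript
// Simplified model of the browser-side CORS check. Hypothetical
// helper for illustration only; real CORS also involves preflight
// requests and credentialed-request rules.
function corsAllows(allowOriginHeader, requestOrigin) {
    // No header at all means no cross-origin access.
    if (!allowOriginHeader) return false;
    // "*" opts in every origin; otherwise it must match exactly.
    return allowOriginHeader === "*" || allowOriginHeader === requestOrigin;
}

corsAllows(null, "https://a.example");                  // blocked
corsAllows("https://b.example", "https://a.example");   // blocked
corsAllows("*", "https://a.example");                   // allowed
```

So under the new rules a site that simply never sets the header would shut out any content script trying to fetch from it cross-origin.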
Loki and Grafana running locally on a server (Raspberry Pi 3B+), your preferred userscript extension, plus a script that POSTs to Loki's API [1].
The post function:
function post_to_loki() {
    const data = {"streams": [
        {
            "stream": {
                "job": "browser",
                // lowercase navigator: the global instance, not the Navigator class
                "user_agent": navigator.userAgent,
                "nodename": "desktop",
            },
            "values": [
                // Loki expects [nanosecond timestamp, log line], both as strings
                [String(Date.now() * 1000000), JSON.stringify({
                    "location": window.location.href,
                    "referrer": document.referrer
                })]
            ]
        }
    ]};
    GM_xmlhttpRequest({
        "url": "https://loki.example.lan/loki/api/v1/push",
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "data": JSON.stringify(data)
    });
}
Now whatever actually triggers the POST function is up to you (I ended up using [2]).
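For what it's worth, one simple trigger is polling for URL changes, since single-page apps like YouTube navigate without firing page loads. A sketch with the change detection split out so the timer wiring is one line (`makeURLChangeChecker` is my own name, not from any library):

```javascript
// Fires onChange once per distinct URL; repeated checks against the
// same URL are ignored. Hypothetical helper for illustration.
function makeURLChangeChecker(onChange) {
    let last = null;
    return function check(currentURL) {
        if (currentURL !== last) {
            last = currentURL;
            onChange(currentURL);
        }
    };
}

// In the userscript, wire it to a timer:
// const check = makeURLChangeChecker(() => post_to_loki());
// const timer = setInterval(() => check(window.location.href), 1000);
```

The poll interval is a trade-off: 1s is plenty for dashboards, and the checker guarantees each URL is only pushed once no matter how often the timer fires.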
In Grafana, I have a few dashboards specific to the facets I want to see.
I have one for Dev stuff which tracks the latest 100 URLs, a frequency count of websites, and a State Timeline of all the events. This filters specifically for Stack Overflow, MDN, any URL containing "docs", GitHub, and a few internal URLs.
In another dashboard I have a view of all the places I read online, with panels similar to Dev but with different filters.
Then I have a final dashboard which works solely on Google search queries; instead of grabbing the search page's URL, I grab the search term from the page I actually clicked on, which makes the searches much more interesting.
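The search-term trick works because Google passes the query along in the referrer (when the referrer policy lets the full URL through). A hedged sketch of pulling the `q` parameter out of `document.referrer`; `searchTermFromReferrer` is my own helper name:

```javascript
// Extract the Google query from a referrer URL, or return null if
// the referrer isn't a Google search results page. Only works when
// the referrer policy actually forwards the full URL.
function searchTermFromReferrer(referrer) {
    try {
        const u = new URL(referrer);
        if (u.hostname.endsWith("google.com") && u.pathname === "/search") {
            return u.searchParams.get("q"); // '+' decodes to a space
        }
    } catch (e) {
        // not a valid URL (e.g. an empty referrer)
    }
    return null;
}
```

For example, `searchTermFromReferrer("https://www.google.com/search?q=loki+push+api")` returns `"loki push api"`, while a non-Google referrer (or none at all) gives `null`.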
By chance, do you have a repo of the full extension you built? I'm playing around w/ it, but having issues w/ even getting the `pageURLCheckTimer` to do a console.log("asda") at a fixed interval.