
A more general take on the "service requires no personal data" vs. "service requires recognising returning subjects" vs. "some functionality of the service requires recognising returning subjects" problem in conjunction with FIM, prompted by off-list communications:
One of the open issues is that an SP simply cannot offer any personalisation based on federated authentication (i.e., ask the subject for more data when needed) *unless* the IDP already sent along an identifier that allows the SP to recognise the subject every time. I've made a note for this here https://docs.google.com/document/d/1pIaEXfw9ZWnXM4p6Dd2Lri7RFWKgr7ObKLEGfUy2... but this needs more words than are appropriate to add there.
At the point of sending a stable identifier along during login ("option 5.b" in the document) some people are already up in arms -- see some of the comments submitted on the RA21 paper[1] and the two articles that prompted the "myth busting" response[2]. The concern: if IDPs always send along identifiers (even opaque, pairwise/SP-specific ones) with every login in order to enable the SP to provide personalisation features, that necessarily also allows the SP to track every access from that subject to that SP while logged in. Whether such tracking actually happens or not, it becomes possible.
That would arguably lessen the subject's privacy compared to some other access models. E.g., if access only comes from/via an EZproxy/reverse proxy server, it's the proxy's IP address and HTTP User-Agent that the publisher sees, not the subject's. The presence of the proxy makes tracking of accessed content per subject more difficult, though certainly not impossible given modern and ever-improving de-anonymisation techniques. (I.e., the privacy-enhancing features or side effects of reverse proxies will only diminish over time due to the increasing complexity of web applications -- an arms race that proxies will not be able to keep up with, IMHO.) Note that I'm not saying that reverse proxying is overall the preferable model over FIM: its major drawback lies in the assumption of a certain access starting point during content discovery (also, no standards exist for proxy behaviour or configuration), and the privacy protection proxies can offer will be reduced going forward, as claimed above.
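To make "opaque, pairwise/SP-specific identifier" concrete, here is a minimal sketch of one way an IDP could derive such a value: a keyed hash over the SP's entityID and the local user ID. The function name, the secret, the user ID and the entityIDs are all made up for illustration, and this is not meant to describe how any particular IDP software does it. The point is only that the value is stable for one SP (so that SP can recognise, and hence also track, a returning subject) while two SPs cannot correlate their values.

```python
import hashlib
import hmac


def pairwise_id(idp_secret: bytes, local_user_id: str, sp_entity_id: str) -> str:
    """Derive an opaque identifier that is stable per (subject, SP) pair.

    Hypothetical helper: a keyed hash (HMAC-SHA256) over the SP's entityID
    and the IDP-local user ID, separated by a NUL byte.
    """
    message = f"{sp_entity_id}\x00{local_user_id}".encode("utf-8")
    return hmac.new(idp_secret, message, hashlib.sha256).hexdigest()


# All values below are invented for the example.
secret = b"hypothetical-idp-secret"
sp_a = "https://sp-a.example.org/shibboleth"
sp_b = "https://sp-b.example.org/shibboleth"

first_login_at_a = pairwise_id(secret, "jdoe", sp_a)
second_login_at_a = pairwise_id(secret, "jdoe", sp_a)
login_at_b = pairwise_id(secret, "jdoe", sp_b)

# Same subject, same SP: the SP sees the same value on every login.
print(first_login_at_a == second_login_at_a)
# Same subject, different SP: the values differ, so SPs cannot correlate.
print(first_login_at_a == login_at_b)
```

So the identifier gives SP A no way to learn who "jdoe" is or to compare notes with SP B, but it does let SP A link all of that subject's sessions at SP A together.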
There are a few ways to deal with this trade-off:
1. Accept the potential consequences of user tracking (while enabling easy access to all features at SPs, including those that require personalisation) by recommending that IDPs always send along an opaque service-specific identifier.
2. Prevent easy personalisation by recommending that no (stable) identifiers be sent along. Of course offering personalisation would still be possible, but it would require the subject to register a local account with the publisher, resulting in other problems:
2.1 Authorization: Since the subject is authorized to access licensed resources on behalf of the institution/library covering the license costs, a local login to the SP using SP-issued credentials breaks the connection the subject has with its institution/library. The signal whether the subject is authorized to access a given resource is therefore lost -- or at least frozen at the point of registration of the local account: changes in the affiliation the subject has with the institution/library cannot be reflected automatically when using the local account.
2.2 Ease of use: Having to register and maintain a local account with the SP -- with *every* SP whose personalisation features I need to rely on -- means having to manage Yet Another Username and Password. This often comes with heightened support costs and may also raise the barrier to getting legal access to the desired resources/features.
2.3 Security: There should be plenty of evidence that subjects do not manage credentials for their many web sites well (or securely) and that re-use of credentials across independent sites creates many security problems.
3. I guess one could also try to categorise SPs into two separate, non-overlapping sets: those that require personalisation and those that do not. I'm not sure how useful such a distinction would be going forward, when more personalisation may be added by platform providers (or expected by subjects using those platforms), or how (and by whom) the decision about which category a publisher/SP belongs in would be made.
4. Technically it's possible today to have the IDP ask the subject for consent before sending data along to an SP. But there is no widely deployed technology that can make that process sufficiently easy to use and understand here, IMO. (E.g., required attributes -- those granting access based on purely non-personal data signalling the institution's assertion that the subject should be allowed access -- should always be sent, but asking the subject for consent may lead to the subject not enabling release of that data, resulting in "Access Denied" errors at the SP, which may not be easy or obvious for the subject to fix. Similarly, optional identifying data would have to be consciously allowed by the subject -- but only if/when s/he intends to use personalisation at the given SP -- and there's no one there to inform the subject about the finer details of this decision at that point. Should the IDP then remember that choice? Or ask every time? Or ask when to ask again? Too much choice here is bad, as the UX can easily become overwhelming, resulting in bad decisions, etc.)
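The required/optional split described under option 4 can be sketched roughly as follows. This is only an illustration of the release logic, not of any existing consent module: the function name and the attribute names `affiliation` and `pairwise_id` are placeholders I made up, and the hard part (explaining the choice to the subject and deciding when to ask again) is exactly what the code does not solve.

```python
def attributes_to_release(required: dict, optional: dict, consented: set) -> dict:
    """Return the attributes an IDP would send to an SP (hypothetical helper).

    Required, non-personal attributes are always released, so that a subject
    cannot lock themselves out by declining consent. Optional identifying
    attributes are released only if the subject explicitly opted in.
    """
    released = dict(required)
    for name, value in optional.items():
        if name in consented:
            released[name] = value
    return released


# Placeholder attribute names and values, for illustration only.
required = {"affiliation": "member@example.edu"}
optional = {"pairwise_id": "opaque-value-for-this-sp"}

# Subject has not opted in to personalisation: access works, no identifier.
anonymous_access = attributes_to_release(required, optional, consented=set())

# Subject opted in at this SP: the identifier is sent as well.
personalised_access = attributes_to_release(
    required, optional, consented={"pairwise_id"}
)

print(anonymous_access)
print(personalised_access)
```

Note that the open questions from above (remember the choice? ask every time?) all hide in how the `consented` set gets populated and stored.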
So 4 is probably too hard to get deployed and to get done right everywhere (esp. if people already think releasing name and email by mistake is too easy as it is in current systems).
I'm not sure 3 provides enough benefit to justify the problems in getting it deployed. That would mostly leave options 1 or 2 for us to recommend:
* Accept tracking possibilities by publishers (and the fact that we might not prevent that even with reverse proxies going forward),
or
* Accept that no (easy, secure) personalisation will be possible and that personalisation offered still has problems with tying local accounts at the SP to institutionally licensed content.
TBH I don't know what listing the options in section 5 of the guidelines draft is intended to achieve -- that each SP is clearly sorted into one of those classes (I'm avoiding the term "category" here)? Or that each IDP picks their local preference? The aim of this activity should be a consistent, easily-understood and easily-deployed set of guidelines, right? So how many "options" would still be presented on equal terms next to each other in such a document?
Best regards,
-peter

[1] https://groups.niso.org/apps/group_public/document.php?document_id=21376
[2] https://scholarlykitchen.sspnet.org/2018/02/07/myth-busting-five-commonly-he...