
I have a proposal on the issue of "optional personalisation" but to get us all on the same page I'm trying to clarify a few things first:
* Jiri Pavlik jiri.pavlik@techlib.cz [2021-03-15 16:46]:
IMHO there are users who wish to have anonymous access and there are also users who wish to have a profile, use personalisation. So a solution there could be let users decide about releasing pairwise-id (eduPersonTargetedID) using CAR.
The main issue here is that changing all IDPs in existence to support user choice at login time does not scale. (Of course many IDPs already provide consent-based interfaces, but probably without offering fine-grained attribute release policies.) (Part of that issue is that whether an IDP offers such a UI or not is not generally observable from outside of the IDP org.[1])
Another, as Meshna points out, is that asking the subject whether 'FooTechnicalWhattheheck' should be released to some SP is not likely sufficient for them to determine the consequences of that decision: the IDP would need to provide more context on the specific use-cases this would enable or prevent at the SP, and the IDP is not well-positioned to provide that. (E.g. we don't have metadata on "why" something is needed.)
[ There's a third aspect, that of enabling optional personalisation for accessing institutionally licensed content using local account registrations at the SP, and the problems that come with it, but I'm explicitly NOT dealing with that one in this post. ]
Now, *both* of these aspects (too many IDPs to change, the IDP lacking the per-attribute context to speak authoritatively on its use at the SP) could be solved by the SP, though:
1. The SP could ask the subject on access whether or not to enable personalisation features based on the received pseudonymous identifier. The SP is perfectly positioned to provide the context and consequences to the subject in order for her to make an informed decision. (I.e., the SP knows what use it itself makes of the data and can explain it.)
2. In this field (access to institutionally licensed resources) there are significantly fewer SPs that would need to be changed than there are IDPs accessing those SPs. (I.e., this scales better than the "change all the IDPs in the world" approach.)
Note that in this model the IDPs always release the pseudonymous identifier (except for SPs that signal they never need it, of course). The SP just asks for the subject's permission before actually making use of it (for the reasons given above).
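To make that division of labour concrete, here's a minimal sketch of such SP-side consent gating -- plain Python, with invented names throughout (assertion, consent_store, the "pairwise-id" key and the consent dialogue are all placeholders, not any real SP software's API):

  def ask_subject_about_personalisation() -> bool:
      # Placeholder for the SP's own consent dialogue; the real thing would
      # explain which features the identifier enables and what it will NOT
      # be used for (UI not shown here).
      return False  # default to "no" until the subject says otherwise

  def handle_login(session: dict, assertion: dict, consent_store: dict) -> None:
      # Called after the SP has validated the SAML assertion. Access to
      # licensed content never depends on the outcome of the consent question.
      session["authorised"] = True

      pairwise_id = assertion.get("pairwise-id")  # always released by the IDP
      if pairwise_id is None:
          return

      if consent_store.get(pairwise_id) == "granted":
          session["user_key"] = pairwise_id  # personalisation stays enabled
          return

      # No recorded consent yet: the SP asks, and the SP can explain what the
      # identifier will (and won't) be used for -- context the IDP lacks.
      if ask_subject_about_personalisation():
          consent_store[pairwise_id] = "granted"
          session["user_key"] = pairwise_id
      else:
          # Declined: the identifier is dropped and nothing is persisted, so
          # the SP keeps no key it could recognise the subject by later.
          session["user_key"] = None

The point of the sketch is merely that authorisation and personalisation are decoupled: the identifier always arrives, but only the subject's "yes" at the SP turns it into a profile key.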
Of course that's precisely also the reason why such a model is unlikely to succeed (besides SPs' assumed reservations wrt having to add yet more UI clutter): people would have to *trust* the SP to Do The Right Thing™ (and not, say, track subjects using the provided identifier no matter what they chose). At least from what I've read on scholarlykitchen et al., (en-)trusting publishers with a persistent identifier (one that enables personalisation, but also tracking) seems to be completely and utterly out of the question. Case in point: Bernd O. on this list would rather remove the most obvious and direct method of accessing an SP -- open its home page and "log in" -- than release a pseudonymous identifier to an SP for personalisation purposes. (I.e., the reservation here is not even specific to /optional/ personalisation, which I'm discussing here, but to any kind of personalisation based on a privacy-preserving identifier provided by the institution):
* Bernd Oberknapp bo@ub.uni-freiburg.de [2021-03-13 14:40]:
If publishers would force us to send an eduPersonTargetedID just for personalization I would consider dropping Shibboleth for those publishers and using our EZproxy instead.
There can be no doubt that data minimisation (here: avoiding that data which also enables tracking even becomes available to the SP) is more effective than trying to prevent data misuse after it has been shared with the SP (whether by legal or technical measures). But we'd be overly naïve to think the SP couldn't perform similar kinds of tracking *without* a pseudonymous identifier being present and sent from the IDP (cf. browser fingerprinting, "de-anonymisation").
And of course scepticism here is more than appropriate in times of pervasive online (and increasingly offline/IRL) surveillance. But I wonder whether we couldn't find some compromise in this specific vertical sector, for these specific, very limited data items (an authorisation signal and a pseudonymous identifier), that would enable IDPs to en-*trust* SPs with that specific data and therefore allow moving the consent-gathering to the SP.
Let the IDP always release a pseudonymous identifier to SPs of a certain class/category where the SP:
* always requires recognising subjects across sessions, or
* can make use of the identifier for optional personalisation, based on SP-gathered user consent.
I.e., the requirement for "SP-gathered user consent before use" would become part of the entity category definition, which could theoretically also support later audits of an SP against its declaration of conformity. Would that help improve trust in such SPs sufficiently to provide them with a persistent (i.e., longer-lived than a single session) identifier?
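To illustrate what the IDP side of that could look like -- again only a sketch in Python, and the category URI below is invented for this mail, not a registered entity category:

  # Hypothetical entity category URI, invented for illustration only.
  SP_GATHERED_CONSENT_CATEGORY = "https://example.org/category/sp-gathered-consent"

  def attributes_to_release(sp_entity_categories: list, pairwise_id: str) -> dict:
      # Decide what to release based solely on the SP's entity categories as
      # published in federation-registered metadata -- a static rule, no UI.
      if SP_GATHERED_CONSENT_CATEGORY in sp_entity_categories:
          # SPs in this category have committed (auditably, against the
          # category definition) to ask the subject before *using* the
          # identifier for personalisation, so the IDP always releases it.
          return {"pairwise-id": pairwise_id}
      return {}

In a real deployment that decision would of course live in the IDP's attribute release/filter policy, keyed on the SP's entity attribute in metadata, rather than in application code -- but that's exactly the kind of static, no-interaction rule that scales to all IDPs.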
-peter
[1] The SAML 2.0 specification actually defines signalling for that in section 8.4 but I'm not aware of any implementations or deployments making use of that. https://www.oasis-open.org/committees/download.php/56776/sstc-saml-core-erra...
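For the curious, the signalling meant here is the Consent XML attribute that SAML 2.0 allows on protocol messages, with the identifier URIs defined in Core section 8.4. A rough sketch of what an SP could do with it if anyone actually set it (only the URIs below are from the spec; the rest is made up for illustration):

  _C = "urn:oasis:names:tc:SAML:2.0:consent:"
  SAML2_CONSENT_URIS = {
      _C + "unspecified":      "no statement about consent",
      _C + "obtained":         "consent obtained (unspecified how/when)",
      _C + "prior":            "consent obtained prior to the current action",
      _C + "current-implicit": "implicit consent during the current action",
      _C + "current-explicit": "explicit consent during the current action",
      _C + "unavailable":      "consent could not be obtained",
      _C + "inapplicable":     "issuer considers consent not required",
  }

  def describe_consent(response_xml_attrs: dict) -> str:
      # response_xml_attrs: the XML attributes of an already-parsed
      # <samlp:Response>; parsing and signature validation not shown.
      uri = response_xml_attrs.get("Consent", _C + "unspecified")
      return SAML2_CONSENT_URIS.get(uri, "unknown consent identifier")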