Re: [Fim4l] FIM4L guidelines

13 May 2019


      A more general take on the "service requires no personal data"
vs. "service requires to recognise returning subjects" vs. "some
functionality of the service requires recognising returning subjects"
problem in conjunction with FIM, prompted by off-list communications:
One of the open issues is that an SP simply cannot offer any
personalisation based on federated authentication (i.e., ask the
subject for more data when needed) *unless* the IDP already sent along
an identifier that allows the SP to recognise the subject every time.
I've made a note for this here
https://docs.google.com/document/d/1pIaEXfw9ZWnXM4p6Dd2Lri7RFWKgr7ObKLEGfUy2...
but this needs more words than are appropriate to add there.
At the point of sending a stable identifier along during login
("option 5.b" in the document) some people are already up in arms --
see some of the comments submitted on the RA21 paper[1] and the two
articles that prompted the "myth busting"-response[2] -- because if
IDPs now always send along (even opaque, pairwise/SP-specific)
identifiers with every login (in order to enable the SP to provide
personalisation features) that also necessarily allows an SP to track
every access from that subject to that SP while being logged in
(whether such tracking happens or not; it's then possible).
Which would arguably lessen the subject's privacy compared to some
other access models, e.g. if access only comes from/via an
EZproxy/reverse proxy server it's the proxy's IP address and HTTP User
Agent that the publisher sees, not the subject's. The presence of the
proxy makes tracking of accessed content per subject more difficult,
though certainly not impossible given modern and ever-improving
de-anonymisation techniques. (I.e., the privacy-enhancing features or
side-effects of reverse proxies will only diminish over time due to
the increasing complexity of web applications. An arms race that
proxies will not be able to keep up with, IMHO.)
Note that I'm not saying that reverse proxying is the overall
preferable model over FIM -- its major drawback lies in the
assumption of a certain access starting point during content discovery
(also no standards exist for their behaviour or configuration) -- and
the privacy protection proxies can offer will be reduced going
forward as claimed above.
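To make the "opaque, pairwise/SP-specific identifier" trade-off above
concrete, here is a minimal sketch in Python. It assumes the IDP derives
the identifier as a keyed hash over a local user id and the SP's
entityID; the function name, the secret and the entityIDs are all
hypothetical, and real IDP software may well derive or store such
identifiers differently.

```python
import hashlib
import hmac


def pairwise_id(idp_secret: bytes, local_user_id: str, sp_entity_id: str) -> str:
    """Derive an opaque identifier that is stable per (subject, SP) pair.

    Assumption: a keyed hash (HMAC-SHA256) over user id and SP entityID.
    """
    message = f"{local_user_id}!{sp_entity_id}".encode("utf-8")
    return hmac.new(idp_secret, message, hashlib.sha256).hexdigest()


# Hypothetical values, for illustration only.
secret = b"example-secret-known-only-to-the-idp"
sp_a = "https://publisher-a.example.org/sp"
sp_b = "https://publisher-b.example.org/sp"

first_login = pairwise_id(secret, "jdoe", sp_a)
second_login = pairwise_id(secret, "jdoe", sp_a)
other_sp = pairwise_id(secret, "jdoe", sp_b)

# Same SP: the value repeats on every login, which is what enables
# personalisation and, equally, per-subject tracking at that SP.
same_sp_is_linkable = first_login == second_login

# Different SP: the value differs, so two SPs cannot correlate the
# subject by comparing this identifier alone.
cross_sp_is_linkable = first_login == other_sp
```

The point of the sketch is only that stability per SP is the very
property that makes both personalisation and tracking possible; the
identifier being opaque does not change that.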
There are a few ways to deal with this:
1. Accept the potential consequences of user tracking (while enabling
easy access to all features at SPs, including those that require
personalisation) by recommending to always send along an opaque
service-specific identifier.
2. Prevent easy personalisation by recommending to avoid sending along
any (stable) identifiers.
Of course offering personalisation would still be possible but it will
require the subject to register a local account with the publisher,
resulting in other problems:
2.1 Authorization: The subject is authorized to access licensed
resources on behalf of the institution/library covering the license
costs, so a local login to the SP using SP-issued credentials breaks
the connection the subject has with its institution/library. The
signal whether the subject is authorized to access a given resource is
therefore lost -- or at least frozen at the point of registration of
the local account: changes in the subject's affiliation with the
institution/library cannot be reflected automatically when using the
local account.
2.2 Ease of use: Having to register and maintain a local account with
the SP -- with *every* SP where I need to rely on features that require
personalisation -- means having to manage Yet Another Username and
Password. This often comes with heightened support costs and may
also raise the barrier to getting legal access to the desired
resources/features due to having to manage another set of credentials.
2.3 Security: There should be plenty of evidence that subjects do not
manage credentials for the many web sites well (or securely) and that
re-use of credentials across independent sites creates many security
problems.
3. I guess one could also try to categorise SPs into two separate,
non-overlapping sets: Those that require personalisation and those
that do not.
Not sure how useful such a distinction would be going forward, when
more personalisation may be added by platform providers (or expected
by subjects using those platforms) or how (and by whom) the decision
what category a publisher/SP should be in would be made for each SP.
4. Technically it's possible today to have the IDP ask the subject to
consent before sending data along to an SP. But there is no technology
widely deployed that can make that process sufficiently easy to use
and understand here, IMO.
(E.g. required attributes -- those granting access based on purely
non-personal data signalling the institution's assertion that the
subject should be allowed access -- should always be sent, but asking
the subject for consent may lead to the subject not enabling release
of that data, resulting in "Access Denied" errors at the SP, which
may not be that easy/obvious for the subject to fix.
Similarly, optional identifying data would have to be consciously
allowed by the subject -- but only if/when s/he intends to use
personalisation at the given SP -- but there's no one there to inform
the subject about the finer details of this decision at this
point. Should the IDP then remember that choice? Or ask every time?
Or ask when to ask again? Too much choice here is bad as the UX can
easily be overwhelming, resulting in bad decisions, etc.)
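The release decisions behind options 1, 2 and 4 above can be sketched
roughly as follows. This is an illustration under stated assumptions,
not how any particular IDP software implements attribute release: the
policy names, attribute names and values are all hypothetical.

```python
from enum import Enum


class Policy(Enum):
    ALWAYS_SEND_ID = 1  # option 1: identifier accompanies every login
    NEVER_SEND_ID = 2   # option 2: no stable identifier is released
    ASK_SUBJECT = 4     # option 4: identifier only after subject consent


# Hypothetical attribute names: non-personal data needed for access,
# and the optional identifier needed for personalisation.
REQUIRED = ("affiliation",)
IDENTIFIER = "pairwise_id"


def release(attrs: dict, policy: Policy, consented: bool = False) -> dict:
    """Return the attributes the IDP would send to the SP."""
    # Required attributes are always sent and never put behind a consent
    # prompt, so a wrong click cannot cause "Access Denied" at the SP.
    released = {name: attrs[name] for name in REQUIRED}
    send_id = policy is Policy.ALWAYS_SEND_ID or (
        policy is Policy.ASK_SUBJECT and consented
    )
    if send_id:
        released[IDENTIFIER] = attrs[IDENTIFIER]
    return released


subject = {"affiliation": "member@library.example.org", "pairwise_id": "a1b2c3"}

option_1 = release(subject, Policy.ALWAYS_SEND_ID)
option_2 = release(subject, Policy.NEVER_SEND_ID)
option_4_declined = release(subject, Policy.ASK_SUBJECT, consented=False)
option_4_accepted = release(subject, Policy.ASK_SUBJECT, consented=True)
```

Note that the sketch leaves out exactly the hard parts discussed above:
how the consent question is presented, and whether and for how long the
subject's answer is remembered.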
So 4 is probably too hard to get deployed and to get done right
everywhere (esp. if people already think releasing name and email by
mistake is too easy as it is in current systems).
I'm not sure 3 provides enough benefit to justify the problems in
getting it deployed.
Which would mostly leave options 1 or 2 for us to recommend:
* Accept tracking possibilities by publishers (and the fact that we
  might not prevent that even with reverse proxies going forward),
or
* Accept that no (easy, secure) personalisation will be possible and
  that personalisation offered still has problems with tying local
  accounts at the SP to institutionally licensed content.
TBH I don't know what listing the options in section 5 of the
guidelines draft is intended to achieve -- that each SP is clearly
sorted into one of those classes (I'm avoiding the term "category"
here)? Or that each IDP picks their local preference?
The aim of this activity should be a consistent, easily-understood and
easily-deployed set of guidelines, right? So how many "options" would
still be presented on equal terms next to each other in such a
document?
Best regards,
-peter
[1] https://groups.niso.org/apps/group_public/document.php?document_id=21376
[2] https://scholarlykitchen.sspnet.org/2018/02/07/myth-busting-five-commonly-he...

Peter Schober