[Fim4l] Statistics issue use-case

Peter Schober peter.schober at univie.ac.at
Mon May 13 12:47:48 CEST 2019


* Bernd Oberknapp <bo at ub.uni-freiburg.de> [2019-05-11 12:38]:
> Regarding RA21, this is to some extent based on the fact that some
> publishers already have tried to enforce in contract negotiations,
> with reference to RA21, that libraries switch to SAML as the only
> authentication method and in some cases that they not only provide a
> persistent/targeted/pairwise ID but also personal data like names
> and email addresses.

So on the one hand libraries agree to such contract terms -- releasing
other people's personal data to such publishers for no reason and
without a legal basis (for the sake of the argument we'll have to
assume that the SP does not in fact need any of that data, otherwise
there'd be no problem to begin with) -- on the other hand libraries
here are campaigning and acting as if they were the last and only
defenders of privacy.

The argument (made earlier on this list, IIRC) that SAML shouldn't be
used because it's possible to misconfigure it is also interesting.
Web and e-mail servers can also be (and sometimes are) misconfigured,
sometimes resulting in leaking personal data. Still that doesn't stop
anyone from using the technology. Why should this be different for SAML?

Wanting some magic bullet that works consistently everywhere, every
time, is secure (per the current state of the art), is as privacy
preserving as possible but sufficiently flexible to cater to all
relevant use-cases, requires no client set-up whatsoever, does not
require subjects to change their content discovery strategies or tools
and CANNOT POSSIBLY BE MISCONFIGURED to "leak" personal data... is an
interesting set of requirements. I'd sure like to see any alternative
that satisfies those criteria.

> That's why many libraries, at least in Germany, wouldn't support any
> recommendation that promotes SAML as the only authentication method
> or doesn't include anonymous access via SAML.

A blanket recommendation to send more data than is (sometimes)
necessary would violate fundamental principles of data protection
(data minimisation) and would risk violating European data
protection law. (Though you might ask at what point you're trying to
be more catholic than the pope.)

So a recommendation would probably have to take into account the
differences between two types of service: those that cannot work at
all without recognising returning subjects (need a stable identifier
for the subject) and those that can (need as little data as
possible), while in both cases still fulfilling requirements to perform
access control as needed.
SAML Metadata is of course suitable to express this in as much detail
as needed on a per-service basis. But if an Entity Category is needed
(not that we know it is, yet) that would mean we'd need two different
categories, even for the same general use-case of
anonymous/pseudonymous access to licensed e-resources: one with the
ability to recognise returning subjects, one without.
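
To make the two cases concrete, here is a minimal Python sketch of
what an IdP-side release decision could look like. Everything in it is
an assumption for illustration: the category labels, the function
names and the HMAC-based identifier derivation are hypothetical and
not taken from any SAML specification or existing Entity Category.
Only eduPersonEntitlement and the common-lib-terms value are existing
conventions.

```python
import hashlib
import hmac

# Hypothetical labels for the two kinds of service; not registered category URIs.
ANONYMOUS = "anonymous"        # works without recognising returning subjects
PSEUDONYMOUS = "pseudonymous"  # needs a stable identifier for the subject

COMMON_LIB_TERMS = "urn:mace:dir:entitlement:common-lib-terms"


def pairwise_id(secret: bytes, local_user: str, sp_entity_id: str) -> str:
    """Derive an identifier that is stable per (subject, SP) pair but
    differs between SPs, so SPs cannot correlate subjects among themselves."""
    message = f"{local_user}!{sp_entity_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def attributes_to_release(category: str, secret: bytes,
                          local_user: str, sp_entity_id: str) -> dict:
    """Return the smallest attribute set that still allows access control."""
    # Both kinds of service get the entitlement needed for authorisation.
    released = {"eduPersonEntitlement": COMMON_LIB_TERMS}
    if category == PSEUDONYMOUS:
        # Only this kind of service additionally gets a stable identifier.
        released["pairwise_id"] = pairwise_id(secret, local_user, sp_entity_id)
    elif category != ANONYMOUS:
        raise ValueError(f"unknown category: {category}")
    return released
```

Note that names and e-mail addresses are released in neither branch,
and that a service mislabelled as pseudonymous would still only
receive an identifier that is useless at any other SP.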

Whether that added granularity is worth the added complexity (yet
another thing that could be misconfigured!) -- for the exact same use-case
-- is an open question.
Seems we're in for another contradiction: Libraries want stuff that
cannot be misconfigured, but they also need more than
"one-size-fits-all" to ensure the least amount of data is sent for
each of the common cases.

Unless we know that the "cannot work at all without recognising
returning subjects" case is not a current (and will not become a
common) requirement? Then a single category (or configuration
recommendation) would suffice, one that would not recommend releasing
a stable identifier for the subject.

Of course then we're still left with the problem of optional
personalisation (resulting in Yet Another Username and Password for
the subject, at each and every SP where that's required for the
desired features to work).

-peter


