Re: [Fim4l] Statistics issue use-case

10 May 2019


      Hi Peter,
Thanks for sharing your thoughts on this, valuable as usual. Allow me to
make my point a bit clearer:
I am still struck by the negative attitude of public libraries towards
RA21 and for-profit scientific publishers in general, which I noticed at
meetings and in individual conversations. There you really get the
feeling of two sides:
1.) The research side, which is publicly funded and creates the content
that in its view should be available to the public as open access.
Libraries of research and HE institutions count themselves on this side.
2.) The publishers, who take the content and its copyright via
contracts and sell it, so that the publicly funded libraries need to pay
for publicly funded content.
These two sides work together, and in the old world of printing actual
books it was a win-win situation, since the research side didn't need to
take care of editing, layout, the actual printing and the marketing.
One more publisher service was organising the quality review. The
actual review was mostly done by the research side.
In the new world of born-digital content provided on the web there is
less need for the publisher services. Even for printed content, the
editing and the layout are very often already done by the researcher.
Of course the publishers have moved on as well, providing databases and
such, and some now also provide open access services. But as commercial
organisations, often with shareholders who expect profits, they again
want to charge considerable amounts for basically providing
PDFs via a web server.
Being a member of the commercial side myself, I do know that a company
has to earn money and make at least some profit or it will die.
Nevertheless I also understand the ideas of the research side
about self-organised open access.
In the frame of the open access discourse it is difficult enough to still
promote access control (although there are of course a number of good
reasons for that).
Within FIM4L, and in contrast to RA21, the research libraries are IMO the
main stakeholders, and if we want to promote FIM in libraries we need to
take their mindset into account. That is what I have been saying from the
beginning of this activity. On the technical side, there is total
agreement between FIM4L and RA21, and I am very happy to see that we have
all 4 stakeholder groups united here to further FIM (libraries, NRENs,
publishers and IT service providers). But if we now write guidelines,
they should IMO primarily reflect the library interests.
There are still a lot of legal practices of collecting personal data, and
publishers have a lot of reasons to do so: getting to know their
customers better, creating statistics, collecting data for feeding
AI tools, and providing better services to the users through
personalisation features or features like "users that read this article
also read that article". These can all be seen as legitimate reasons,
but it should still be the decision of the IdP/library side, which is
responsible for the personal data of its users, how much data it wants or
is allowed to release. And I think in this case at least the two of us
might agree that less is better than more.
I for one would like to make a clear distinction between such commercial
SPs and publicly funded research infrastructure SPs, which are more
trusted in the research community and thus could or even should be
provided with more data, if they ask for it.
Of course you are right that research institutions and publishers have a
one-to-one relationship that is formed by contracts and that could
be configured individually in the release policies of the IdP. Entity
categories are nevertheless helpful for grouping SPs and for having
release policies for such groups instead of for single SPs.
Maybe you now have more sympathy for my proposal:
Coco and R&S -> more personal data
publisherCoCo -> less (but not zero) personal data
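To make the proposal a bit more concrete, here is a minimal sketch in
Python of how an IdP could group SPs by entity category, with settings
agreed for a single SP taking precedence. The R&S and CoCo v1 category
URIs are the published ones; the publisherCoCo URI, the attribute names,
the precedence order and the function attributes_to_release are purely
illustrative assumptions, not an agreed policy or the configuration of
any real IdP software.

```python
# Illustrative sketch only: the publisherCoCo URI and the attribute sets
# below are assumptions, not an agreed FIM4L or federation policy.
RS = "http://refeds.org/category/research-and-scholarship"
COCO = "http://www.geant.net/uri/dataprotection-code-of-conduct/v1"
PUBLISHER_COCO = "urn:example:publisher-coco"  # hypothetical category

# Ordered from most to least data. The first rule whose required
# categories are all declared by the SP decides what is released.
CATEGORY_RULES = [
    (frozenset({RS, COCO}), frozenset({"pairwise-id", "mail", "displayName"})),
    (frozenset({PUBLISHER_COCO}), frozenset({"pairwise-id"})),
]

# Assumption: SPs that match no rule get no personal data at all.
DEFAULT_RELEASE = frozenset()


def attributes_to_release(sp_entity_id, sp_categories, per_sp_overrides=None):
    """Return the attribute names an IdP would release to one SP."""
    overrides = per_sp_overrides or {}
    # A release set agreed for a single SP (e.g. by contract) wins.
    if sp_entity_id in overrides:
        return frozenset(overrides[sp_entity_id])
    declared = set(sp_categories)
    for required, released in CATEGORY_RULES:
        if required <= declared:
            return released
    return DEFAULT_RELEASE
```

With these assumed rules, a research infrastructure SP declaring both
R&S and CoCo would get the larger set, a publisher declaring only the
hypothetical publisherCoCo would get just the pairwise identifier, and
anything else would still have to ask the user.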
As to your remarks:
...
there are for-profit and non-profit library service operators in
common use today, so that division is not overly helpful in practice.
I wasn't talking about library service providers at all, but only about
libraries on the IdP side. On the IdP side I see no general difference
between for-profit and non-profit libraries: both have the same
obligations towards their users. Of course each library needs to decide
for itself, and for-profit ones might be more willing to provide
personal data to publishers, but this is not my point.
In contrast to RA21, I have so far not seen for-profit libraries here, but
of course they are most welcome to join.
...
And at least GDPR doesn't differentiate between those cases at all.
Yes, this is of course true. That is why I made a distinction above
between legal and legitimate. But the IdP side still might want to
differentiate between the two (research infrastructures and for-profit
publishers).
...
due to
that contractual relationship with institutions there's an established
communication channel that allows to tie in discussions about
technical integration, too; so if anything they're closer to the
institutions than their "same side" publicly funded research projects)
Yes, and especially in cases where there is no one-to-one contract, entity
categories are helpful.
But I understand: if release policies for publishers always need to be
agreed upon in contracts, you are right, we wouldn't need a category
publisherCoco. But in our guidelines we should still advise against signing
contracts that include the release of a lot of personal data without clearly
stating for what reason the individual attributes are needed on the SP
side and why they should not be requested from the user herself instead
of being released by the institution.
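As a small illustration of such a guideline check, the following Python
sketch lists the requested attributes for which a contract states no
purpose. The function name, the dictionary layout and the sample
purposes are assumptions made for this example only.

```python
def attributes_without_purpose(requested):
    """Return the sorted names of requested attributes with no stated purpose.

    `requested` maps an attribute name to the purpose text taken from the
    contract (hypothetical layout; None or blank text counts as missing).
    """
    return sorted(
        name for name, purpose in requested.items()
        if not (purpose or "").strip()
    )


# Example request a library might review before signing (made-up data).
request = {
    "pairwise-id": "recognise a returning user for saved searches",
    "mail": "",
    "displayName": None,
}
print(attributes_without_purpose(request))
```

Any attribute this reports would, under the guideline suggested above,
either need a stated reason or be asked from the user directly.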
...
I also don't share your blanket statement of distrust wrt publishers
who according to your first paragraph above are likely to misuse
personal data even if they claim accordance with CoCo (and therefore,
GDPR).
Again: not every legal usage might be in the interest of the library.
...
Not least because their business model isn't based on
monetising that personal data, AFAIK.
Well, what do you mean by monetising? I do not assume that they are
thinking of selling mail addresses to spammers, but the use cases I
mentioned above (collecting data for better services etc.) are most
probably legal ways of increasing profits, which I would also call
monetising. If they are not doing it now, they might want to in the future,
and I am quite sure they are at least thinking about it.
@all: sorry for this very long post, but that was my take on a "more
detailed discussion".
Cheers,
Peter
On 07.05.19 at 14:29, Peter Schober wrote:
...
Just came back from a short vacation, sorry for the delay:

Peter Gietz peter.gietz@daasi.de [2019-04-29 17:24]:

...
Yes, and I was part of that struggle too. The difference: we were (or at
least I was) talking about research infrastructures that often had
involvement of the IdP institution and that always belong to the same "side"
(=publicly funded research). In that case of course also personal data
should be sent (though so far not much has been) to the SP, and for such
cases IMO R&S and Coco were invented and/or pushed. When I now learn that
publishers promote Coco (and might want to activate the respective entity
category), personal data might then be sent to the other "side", where it
is, despite Coco, not certain that the data are used in a way the IdP wants
them to be used, and if it is Coco v1 it is only about European SPs anyway,
isn't it?
[...]
...
In an ideal world, all public research infrastructures would get email
address and such, if they activate R&S (to prove they belong to the research
community) and Coco (to prove that they adhere to EU privacy legislation).
Publishers would be able to get more than a targeted ID (pairwise subjectID
of course) if they activate - let's call it publisherCoco for now - (to prove
that they belong to the community of for-profit publishers and to prove that
they adhere to EU privacy legislation), like a persistent ID, but still no
email, etc., which they still would have to ask from the user. That was the
idea.
The above would need some serious unpacking and closer analysis in
order to draw any conclusions from it, IMO.
E.g. there are for-profit and non-profit library service operators in
common use today, so that division is not overly helpful in practice.
And at least GDPR doesn't differentiate between those cases at all.
(I.e., you don't get a "get out of jail free"-card if you're
non-profit and/or part of publicly funded research: The exact same
criteria and rules apply either way.)
I find your whole notion of "sides" questionable which in my
experience mirrors neither the "same side" situation (many research
projects are having impossibly hard times[1] getting institutions to
release needed attributes so that those institutions' own members can
perform the research they've been hired to do, as you allude to above)
nor the "other side" one (e.g. publishers pretty much always have a
contract with either the institution itself or an agent acting on
behalf of the institution, such as a local library consortium; due to
that contractual relationship with institutions there's an established
communication channel that allows to tie in discussions about
technical integration, too; so if anything they're closer to the
institutions than their "same side" publicly funded research projects).
I also don't share your blanket statement of distrust wrt publishers
who according to your first paragraph above are likely to misuse
personal data even if they claim accordance with CoCo (and therefore,
GDPR). Not least because their business model isn't based on
monetising that personal data, AFAIK.
I'm still open to hearing arguments about what exactly it is you want
to achieve and what you feel the problem/s is/are in this area but I
think that requires more detailed discussion.
Best regards,
-peter
[1] https://refeds.org/a/1154 to reference one well-known example
_______________________________________________
FIM4L mailing list
FIM4L@lists.daasi.de
http://lists.daasi.de/listinfo/fim4l
-- 

Peter Gietz, CEO

DAASI International GmbH
Europaplatz 3
D-72072 Tübingen
Germany

phone: +49 7071 407109-0
fax:   +49 7071 407109-9
email: peter.gietz@daasi.de
web:   www.daasi.de

Registered office: Tübingen
Commercial register: Amtsgericht Stuttgart, HRB 382175
Managing director: Peter Gietz