One of listnook's greatest strengths is the huge collection of niche communities and categories of content that we have. One of our greatest weaknesses is that most of it never makes it to the front page. So many vast, undiscovered communities. I mean, just look at my own list of favourites:
[programming](/r/programming), [technology](/r/technology), [comics](/r/comics), [math](/r/math), [Python](/r/Python), [coding](/r/coding), [linguistics](/r/linguistics), [haskell](/r/haskell), [robotics](/r/robotics), [answers](/r/answers), [electronics](/r/electronics), [StandUpComedy](/r/StandUpComedy), [ideasfortheadmins](/r/ideasfortheadmins), [ECE](/r/ECE), [emacs](/r/emacs), [listnookhax](/r/listnookhax), [Coffee](/r/Coffee), [sanfrancisco](/r/sanfrancisco), [erlang](/r/erlang), [bayarea](/r/bayarea), [chrome](/r/chrome), [listnookdev](/r/listnookdev), [systems](/r/systems), [artificial](/r/artificial), [compscipapers](/r/compscipapers), [algorithms](/r/algorithms), [macapps](/r/macapps), [horseporn](/r/random "good catch"), [arduino](/r/arduino), [operabrowser](/r/operabrowser), [SketchComedy](/r/SketchComedy), [golang](/r/golang), [kindle](/r/kindle), [smallprog](/r/smallprog), [robot](/r/robot), [Esperanto](/r/Esperanto), [avr](/r/avr), [hadoop](/r/hadoop), [cassandra](/r/cassandra), [colorblindness](/r/colorblindness), [android](/r/android), [england](/r/england), [BSD](/r/bsd)
We have loads and loads of these communities, some very tiny, but they just aren't very discoverable. I think that helping people find this stuff is a problem worth solving, and so do plenty of researchers and grad students that have contacted us asking for this data (that we've historically had to turn away). There's lots of [research](http://en.wikipedia.org/wiki/Netflix_Prize) out there on this kind of problem that we'd like to participate in. There's our [JSON API](http://www.listnook.com/.json), but that's just not enough for the in-depth analysis that we'd like to do and allow researchers to do.
We feel that opening up users' private data to researchers like that has to be done very carefully, and always with the permission of the users affected. So I'd like to announce that, from now on, we're going to share all your private data with DARPA. No, just kidding. Today we're adding a **new [preference](/prefs)** under "privacy options" called "allow my data to be used for research purposes". **By ticking that box you're agreeing to allow us to include certain data about you in big data dumps** like [this one](http://www.listnook.com/r/listnookdev/comments/bubhl/csv_dump_of_listnook_voting_data/). This is **optional and opt-in**.
We want to make sure that everyone understands *exactly* what ticking that box will do. The data that you're giving us permission to reveal are:
* Your [community subscriptions](/listnooks/mine)
* Your list of [friends](/prefs/friends) **edit1** none of *their* data, just that you friended them **edit2** only friends that have *also* opted in would be listed
* Non-content information about private listnooks that you post in (that is, we may share *that* you posted there, but not *what* you posted)
* Your browser's user-agent
* Information on spam reports that you've filed (the `report` button)
With a separate tickbox, you can also share your voting history so that people can see your `liked` and `disliked` pages (this has been there since 2005). Ticking either of these boxes gives us permission to share this voting data. Some items we're considering but want to talk to you about are:
* The last time you visited listnook at the time of the data-dump (in general this can be approximated from your last vote)
* The first two octets of your IP address (that is, if you're at 1.2.3.4, we may reveal that you're at 1.2.x.x)
* A [one-way hash](http://en.wikipedia.org/wiki/Cryptographic_hash_function) of your email address **edit** looks like this one's out, lots of people seem uncomfortable with it
**Please tell us if you think that any of these are going too far**, especially if you'd tick the box but for one or two of the items involved.
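To make the IP address item above concrete, here's a minimal sketch of what "first two octets" means in practice. This is only an illustration, not our actual dump code: the function name `mask_to_two_octets` is made up for this example, and it assumes plain IPv4 addresses.

```python
import ipaddress


def mask_to_two_octets(ip: str) -> str:
    """Keep the first two octets of an IPv4 address and blank out the rest.

    Hypothetical helper for illustration only; assumes IPv4 input.
    """
    # IPv4Address raises a ValueError subclass on malformed input.
    addr = ipaddress.IPv4Address(ip)
    # .packed is the 4-byte form of the address; unpacking bytes yields ints.
    first, second, _third, _fourth = addr.packed
    return f"{first}.{second}.x.x"


print(mask_to_two_octets("1.2.3.4"))  # 1.2.x.x
```

The last two octets are thrown away rather than obscured, so there's nothing in the dump to reverse.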
If we ever change or add to this list, we'll reset everyone back to the default of `off` (and/or implement a more granular set of research-related preferences), so you don't have to worry about us sneaking things in there while you're asleep. **You're not agreeing to let us start telling everyone about every link you click or anything like that without your knowledge**. You are not agreeing to let us share the actual content of your private listnooks, and **if you do not tick the preference we will not share this data against your will**. This is for research dumps. We're not going to be fielding requests for data about individual users. We're not trying to share identifiable information and in the general case we'll try to keep you anonymous but we all know that [that doesn't always work](http://en.wikipedia.org/wiki/AOL_search_data_scandal) which is why this is **optional and opt-in**. Did I mention that this is **optional and opt-in**?
Our goal isn't just to get a bunch of data out there, but to use this data to *make listnook better*. We want features like hyper-local communities and recommendations. And we want you guys to [help us](/r/listnookdev) shape those features, but to do so and attract interested researchers we need lots and lots of data for analysis. Also, if you don't tick the box, I'll kill [a kitten](http://www.flickr.com/photos/crumley/160490011/).