HPN » Heritage Health Prize

External Data

Anthony Goldbloom (Kaggle)
Competition Admin
Kaggle Admin
Posts 350
Thanks 67
Joined 20 Jan '10

Entrants are welcome to use other data to develop and test their algorithms and entries until 11:59:59 UTC on April 4, 2012, provided the data are (i) freely available to all other Entrants and (ii) published (or a link to the data provided) in the “External Data” topic on this Forum within one (1) week of an entry submission that uses the other data. Entrants may not use any data other than the Data Sets after 11:59:59 UTC on April 4, 2012 without prior approval.

 
#1 / Posted 14 months ago
Christopher Hefele
Posts 73
Thanks 41
Joined 1 Jul '10
Anthony, I'm confused by your statement above. The rules seem to emphasize NEW data: "Entrants may not use new external data in connection with the development of their entries after 11:59:59 UTC on April 4, 2012 without the prior written permission of Sponsor." In contrast, your post above seems to say we cannot use ANY external data after 4/4/2012 (that is, we would have to back the external data out of our models to continue competing). Can you clarify/confirm what happens after 4/4/2012 with regard to using external data? Can we use data sources that have been declared in the forums up to that point? Or is all external data forbidden after 4/4/2012?
 
#2 / Posted 14 months ago
Jeremy Howard (Kaggle)
Kaggle Admin
Posts 165
Thanks 58
Joined 13 Oct '10
Anthony's post is incorrect - the rules are correct. I emailed Anthony yesterday to ask him to update his post, but he has been a little preoccupied with other things! ;) External data is allowed after 4/4/2012, as long as it's been released before then.
Thanked by Christopher Hefele, James Hall, and Vikram Jha
 
#3 / Posted 14 months ago
factfiber
Posts 4
Joined 5 Apr '11
Does "freely available" imply anything about the format? Should it be CSV? Is, for instance, HCUPnet data (available via a query interface) allowed? I don't relish having to parse someone's post of HL7 EDI data....
 
#4 / Posted 14 months ago
Jeremy Howard (Kaggle)
Kaggle Admin
Posts 165
Thanks 58
Joined 13 Oct '10
That's a good question, Shaunc. If there are external data sets that are only usable in practice if additional information is provided (lookup tables, parsing algorithms, etc.), we will require all the information and data necessary to utilise the data set. It doesn't have to be CSV, but it does have to be a file in a format that can be utilised from standard programming and analysis tools without commercial libraries. If people have trouble getting files into a format that is readily sharable, they can contact us and we'll see if we can help. We certainly don't want technical problems to stop anyone from accessing data that could help get better answers!
 
#5 / Posted 14 months ago
Jeremy Howard (Kaggle)
Kaggle Admin
Posts 165
Thanks 58
Joined 13 Oct '10
Oh, sorry, I forgot to answer re HCUP data. My understanding is that downloading files from there requires purchasing a license (and it's not OK to use their online interface unless you can and do save the results and share them on the forum, since the rules require external data to be posted there). I'm not familiar with their licensing terms; if you are able to get hold of data that you can legally share here and that HPN can use for this purpose, please do. If you can't, but think a particular data set you're aware of would be really useful for HPN to purchase for this competition, please tell us the details and we'll pass them on to HPN to follow up. I expect they'll be happy to buy data for competitors if it helps get better solutions.
 
#6 / Posted 14 months ago
Uri Blass
Posts 252
Thanks 4
Joined 5 Aug '10

I wonder how you can prove that people do not use other data to develop and test their algorithms.

People may use other data and simply deny it, explaining the constants they use or the conditions they check as guesses. Even people who do not use other data guess, and a doctor with some experience may simply guess better based on her (or his) experience.

 
#7 / Posted 11 months ago
ATG
Posts 2
Joined 12 Dec '10

I am also curious to know what the answer to Uri's question would be, since we're not required to provide any model equation. Of course, that could be made a requirement for the top performers at a later stage. Admins, if there is any other way of checking for the use of external data, can you please let us know?

 
#8 / Posted 11 months ago
Uri Blass
Posts 252
Thanks 4
Joined 5 Aug '10

I do not use external data, but I am certainly guessing things by trial and error on the leaderboard because I have no other way to test.

The problem is that we have no set of people with data for all of years 1 to 4, so we have no way to test models that predict year 4 from years 1, 2 and 3.

We can try to predict year 3 from year 1 and year 2 data, but that is clearly a different problem from predicting year 4 from years 1, 2 and 3.

Data about different people (whom we do not need to predict) for years 1, 2, 3 and 4 would certainly help.

 
#9 / Posted 11 months ago
cnie
Posts 4
Joined 7 Apr '11

You can predict Y3 based on Y1 and Y2, and then predict Y4 based on Y2 and Y3.
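The suggestion above amounts to a sliding window: fit one model shape (two prior years in, next year out) on Y1, Y2 → Y3, then slide the window forward one year and apply it to Y2, Y3 → Y4. Below is a minimal sketch of building those windows, assuming a hypothetical layout where each member's history is a dict from year number to a single per-year value (the real HHP files have many columns per year). The helper names `feature_rows` and `training_pairs` are illustrative only, and the model-fitting step itself is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (not the actual HHP file format): member id -> {year: value},
# where value is a single per-year quantity such as days in hospital.
History = Dict[str, Dict[int, float]]


def feature_rows(history: History, target_year: int, lag: int = 2) -> Dict[str, List[float]]:
    """Features for predicting target_year: the member's values for the `lag` preceding years.

    Members without data for every year in the window are skipped.
    """
    window = list(range(target_year - lag, target_year))  # e.g. [1, 2] when target_year == 3
    return {
        member: [years[y] for y in window]
        for member, years in history.items()
        if all(y in years for y in window)
    }


def training_pairs(history: History, target_year: int, lag: int = 2) -> List[Tuple[List[float], float]]:
    """(features, label) pairs: the `lag` years before target_year -> value in target_year."""
    return [
        (features, history[member][target_year])
        for member, features in feature_rows(history, target_year, lag).items()
        if target_year in history[member]
    ]


if __name__ == "__main__":
    history: History = {
        "m1": {1: 0.0, 2: 1.0, 3: 2.0},
        "m2": {1: 3.0, 2: 0.0, 3: 1.0},
        "m3": {2: 5.0, 3: 4.0},  # no Y1: unusable for training, but can still be scored for Y4
    }
    train = training_pairs(history, target_year=3)  # fit on Y1, Y2 -> Y3
    to_score = feature_rows(history, target_year=4)  # then apply to Y2, Y3 -> Y4
    print(train)
    print(to_score)
```

Any regressor can then be fitted on `train` and scored on `to_score`. As Uri notes above, this still assumes the Y1, Y2 → Y3 relationship carries over to the later window.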

 
#10 / Posted 10 months ago
Justin Washtell
Posts 48
Thanks 15
Joined 26 Aug '10

@Jeremy. I just ran a number of queries on HCUPnet and downloaded the results in spreadsheet format. At no point was I presented with any terms to accept and I can find no mention of limitations on the use or redistribution of the data on the site. What I could find was...

"You can purchase many HCUP databases to do more detailed analyses not possible through HCUPnet".

and

"Many of the databases that are featured in HCUPnet can be purchased through the HCUP Central Distributor or from the States. If you find that HCUPnet does not answer all your questions, or you need more sophisticated statistics, then you may wish to purchase the databases and do your own analysis... You will need to complete an application form and sign a data use agreement before purchasing your data."

This suggests that the aggregate data available through the site is entirely free to use in any way. Are you in a position to confirm that this is so and that we can use it in this contest? Despite its limitations, it looks extremely useful!

 
#11 / Posted 9 months ago
Justin Washtell
Posts 48
Thanks 15
Joined 26 Aug '10

FYI, I have just submitted the following query to AHRQ, who manage HCUPnet:

"Hello. I cannot find on your websites any information concerning limitations on the use of the aggregate data that is freely available through the HCUPnet query interface. Can I ask you to clarify: Are there any limitations and, if so, where are these described? Or are any tables of figures produced by the free query interface essentially in the public domain?"

 
#12 / Posted 9 months ago
Jeremy Howard (Kaggle)
Kaggle Admin
Posts 165
Thanks 58
Joined 13 Oct '10

According to HCUPnet, "It is the responsibility of the user to contact and obtain the needed copyright permissions prior to reproducing materials in any form" (www.ahrq.gov/news/gdlcopyr.htm). So I think we should wait until you receive a reply to your query to HCUPnet, or you could contact the copyright owner of the data you wish to use directly.

Thanked by Justin Washtell and Ian11
 
#13 / Posted 9 months ago
Justin Washtell
Posts 48
Thanks 15
Joined 26 Aug '10

That link would seem to apply to "clinical practice guidelines" only, which I do not think have anything to do with the databases. At any rate, I received this today, in direct response to the email which I copied above...


"Dear Mr. Washtell:

Thank you for your e-mail and interest in HCUPnet. Your e-mail was forwarded to the HCUP User Support inbox. Information obtained through HCUPnet is considered public information and no special permission is required to publish the statistics. We do, however, request that you source the information with the appropriate citation. Recommendations for citing are located on the HCUP-US Website at hcup-us.ahrq.gov/tech_assist/citations.jsp.

If you have any additional questions, please contact User Support at this address.

Sincerely,

HCUP User Support"


Can I take it from this, then, that I/we can build models using this data as long as I/we post a copy of the data (and/or sufficient information for other users to generate exactly the same data through the HCUPnet interface) on here, along with the requisite citation of course?

I can forward the actual email I received to Kaggle/HPN if necessary.

 
#14 / Posted 9 months ago
Justin Washtell
Posts 48
Thanks 15
Joined 26 Aug '10

Further...


Dear Mr. Washtell:

I am responding to your inquiry on behalf of Randie Siegel, AHRQ's associate director for publishing and electronic dissemination. You want to know about limitations to publishing research based on HCUPnet data. The tables produced by HCUPnet are in the public domain, but source citation is greatly appreciated and encouraged. As far as I can tell, no data use agreement is needed.

There is a page on the HCUP Web site that addresses the issue of publication requirements, “Requirements for Publishing Results with HCUP Data” (www.hcup-us.ahrq.gov/db/publishing.jsp). I see from the description page about HCUPnet, that HCUPnet is programmed to automatically abide by the privacy rules set down for using any of the HCUP databases. Otherwise, the publishing requirements page links to a page of suggested citations (www.hcup-us.ahrq.gov/tech_assist/citations.jsp), with a section on citing HCUPnet:

Citing HCUPnet:

First list HCUPnet, then HCUP, followed by the appropriate data years, and then AHRQ and the related Web link. Lastly, include the date of access. Consider the following example:
HCUPnet. Healthcare Cost and Utilization Project (HCUP). 2006-2009. Agency for Healthcare Research and Quality, Rockville, MD. hcupnet.ahrq.gov/ Accessed May 5, 2003.

If you still have questions about using HCUPnet data, contact HCUP User Support via email (hcup@ahrq.gov).

Sincerely,


 
#15 / Posted 9 months ago