External Data
Anthony Goldbloom (Kaggle)
Posts 350 Competition Admin Kaggle Admin Thanks 67 Joined 20 Jan '10 |
Entrants are welcome to use other data to develop and test their algorithms and entries until 11:59:59 UTC on April 4, 2012 if the data are (i) freely available to all other Entrants and (ii) published (or a link to the data provided) in the “External Data” topic on this Forum within one (1) week of an entry submission using the other data. Entrants may not use any data other than the Data Sets after 11:59:59 UTC on April 4, 2012 without prior approval. |
#1
/ Posted
14 months ago
|
Christopher Hefele
Posts 73 Thanks 41 Joined 1 Jul '10 |
|
#2
/ Posted
14 months ago
|
Jeremy Howard (Kaggle)
Posts 165 Kaggle Admin Thanks 58 Joined 13 Oct '10 |
Thanked by Christopher Hefele, James Hall, and Vikram Jha
|
#3
/ Posted
14 months ago
|
factfiber
Posts 4 Joined 5 Apr '11 |
|
#4
/ Posted
14 months ago
|
Jeremy Howard (Kaggle)
Posts 165 Kaggle Admin Thanks 58 Joined 13 Oct '10 |
|
#5
/ Posted
14 months ago
|
Jeremy Howard (Kaggle)
Posts 165 Kaggle Admin Thanks 58 Joined 13 Oct '10 |
|
#6
/ Posted
14 months ago
|
Uri Blass
Posts 252 Thanks 4 Joined 5 Aug '10 |
I wonder how you can prove that people do not use other data to develop and test their algorithms. People may use other data but deny it, explaining the constants they use or the conditions they check as simple guesses. Even people who do not use other data guess, and a doctor with some experience may simply guess better based on her (or his) experience. |
#7
/ Posted
11 months ago
|
ATG
Posts 2 Joined 12 Dec '10 |
I am also curious to know what the answer to Uri's question would be, since we're not required to provide any model equation. Of course, it could be made a requirement for the top performers at a later stage. If there is any other way of checking the use of external data, admin, can you please let us know? |
#8
/ Posted
11 months ago
|
Uri Blass
Posts 252 Thanks 4 Joined 5 Aug '10 |
I do not use external data, but I am certainly guessing things by trial and error on the leaderboard because I have no way to test. The problem is that we have no set of people with data from year 1 to year 4, so we have no way to test models that predict year 4 based on years 1, 2, and 3. We can try to predict year 3 based on year 1 and year 2 data, but that is clearly a different problem from predicting year 4 based on years 1, 2, and 3. Data about different people (whom we do not need to predict) for years 1, 2, 3, and 4 certainly could help. |
#9
/ Posted
11 months ago
|
cnie
Posts 4 Joined 7 Apr '11 |
You can predict Y3 based on Y1 and Y2, and then predict Y4 based on Y2 and Y3. |
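The shifted-window idea in this reply can be sketched as follows. This is only an illustration under the assumption that a model takes the two preceding years as input; the function names `sliding_windows` and `split_by_known_target` are hypothetical and not part of the competition materials.

```python
def sliding_windows(years, width):
    """Pair each run of `width` consecutive years with the year that follows it."""
    return [(tuple(years[i:i + width]), years[i + width])
            for i in range(len(years) - width)]


def split_by_known_target(windows, known_years):
    """Separate windows whose target year is observed from those still to be predicted."""
    fit = [w for w in windows if w[1] in known_years]
    predict = [w for w in windows if w[1] not in known_years]
    return fit, predict


# Year labels as used in the thread; Y4 is the year whose outcome is unknown.
years = ["Y1", "Y2", "Y3", "Y4"]
fit_windows, predict_windows = split_by_known_target(
    sliding_windows(years, width=2), known_years={"Y1", "Y2", "Y3"}
)
print(fit_windows)      # windows usable for fitting and local validation
print(predict_windows)  # windows whose target is the unknown year
```

Note that a width of 2 means the Y4 prediction ignores Y1, which may be part of the trade-off raised in the previous post: the locally testable problem uses two years of history, while the competition offers three.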
#10
/ Posted
10 months ago
|
Justin Washtell
Posts 48 Thanks 15 Joined 26 Aug '10 |
@Jeremy. I just ran a number of queries on HCUPnet and downloaded the results in spreadsheet format. At no point was I presented with any terms to accept, and I can find no mention of limitations on the use or redistribution of the data on the site. What I could find was... "You can purchase many HCUP databases to do more detailed analyses not possible through HCUPnet" and "Many of the databases that are featured in HCUPnet can be purchased through the HCUP Central Distributor or from the States. If you find that HCUPnet does not answer all your questions, or you need more sophisticated statistics, then you may wish to purchase the databases and do your own analysis... You will need to complete an application form and sign a data use agreement before purchasing your data." This suggests that the aggregate data available through the site is entirely free to use in any way. Are you in a position to confirm that this is so and that we can use it in this contest? Despite its limitations, it looks extremely useful! |
#11
/ Posted
9 months ago
|
Justin Washtell
Posts 48 Thanks 15 Joined 26 Aug '10 |
FYI, I have just submitted the following query to AHRQ, who manage HCUPnet: "Hello. I cannot find on your websites any information concerning limitations on the use of the aggregate data that is freely available through the HCUPnet query interface. Can I ask you to clarify: Are there any limitations and, if so, where are these described? Or are any tables of figures produced by the free query interface essentially in the public domain?" |
#12
/ Posted
9 months ago
|
Jeremy Howard (Kaggle)
Posts 165 Kaggle Admin Thanks 58 Joined 13 Oct '10 |
According to HCUPnet, "It is the responsibility of the user to contact and obtain the needed copyright permissions prior to reproducing materials in any form" (www.ahrq.gov/news/gdlcopyr.htm). So I think we should wait for you to receive a reply to your query to HCUPnet, or directly contact the copyright owner of the data you wish to use.
Thanked by Justin Washtell and Ian11
|
#13
/ Posted
9 months ago
|
Justin Washtell
Posts 48 Thanks 15 Joined 26 Aug '10 |
That link would seem to apply to "clinical practice guidelines" only, which I do not think have anything to do with the databases. At any rate, I received this today, in direct response to the email which I copied above... "Dear Mr. Washtell: Thank you for your e-mail and interest in HCUPnet. Your e-mail was forwarded to the HCUP User Support inbox. Information obtained through HCUPnet is considered public information and no special permission is required to publish the statistics. We do, however, request that you source the information with the appropriate citation. Recommendations for citing are located on the HCUP-US Website at hcup-us.ahrq.gov/tech_assist/citations.jsp. If you have any additional questions, please contact User Support at this address. Sincerely, HCUP User Support" Can I take it from this, then, that I/we can build models using this data as long as I/we post a copy of the data (and/or sufficient information for other users to generate the exact same data through the HCUPnet interface) on here, along with the requisite citation of course? I can forward the actual email I received to Kaggle/HPN if necessary. |
#14
/ Posted
9 months ago
|
Justin Washtell
Posts 48 Thanks 15 Joined 26 Aug '10 |
Further... Dear Mr. Washtell: I am responding to your inquiry on behalf of Randie Siegel, AHRQ's associate director for publishing and electronic dissemination. You want to know about limitations to publishing research based on HCUPnet data. The tables produced by HCUPnet are in the public domain, but source citation is greatly appreciated and encouraged. As far as I can tell, no data use agreement is needed. There is a page on the HCUP Web site that addresses the issue of publication requirements, “Requirements for Publishing Results with HCUP Data” (www.hcup-us.ahrq.gov/db/publishing.jsp). I see from the description page about HCUPnet, that HCUPnet is programmed to automatically abide by the privacy rules set down for using any of the HCUP databases. Otherwise, the publishing requirements page links to a page of suggested citations (www.hcup-us.ahrq.gov/tech_assist/citations.jsp), with a section on citing HCUPnet: Citing HCUPnet: First list HCUPnet, then HCUP, followed by the appropriate data years, and then AHRQ and the related Web link. Lastly, include the date of access. Consider the following example: If you still have questions about using HCUPnet data, contact HCUP User Support via email (hcup@ahrq.gov). Sincerely, |
#15
/ Posted
9 months ago
|