-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathfeedback.txt
More file actions
248 lines (142 loc) · 15.8 KB
/
feedback.txt
File metadata and controls
248 lines (142 loc) · 15.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
The following is feedback from other Udacity students on the project:
Okay, viewing it with the "Open" link helps. Is there a way to select multiple occupations from the legend? When I click on one, it switches off the previous one...
Fang Lu23 Feb 2016
This is very interesting information. What I liked: The animation in the beginning although fast is kinda interesting to watch a second. The moving parts function properly.
A couple of questions just to understand the information presented. Are these loans for starting a small business, credit cards? I'm not familiar with why a person would take out a 20% loan. For the 1-25k bracket are the loans paycheck advance loans?
* What do you notice in the visualization?
Homemakers have a really high APR compared to other occupations. But most occupations closely follow the average
* What questions do you have about the data?
What is the size of data set? Is the data set inherently cut into these income brackets.
* What relationships do you notice?
Relationships are kind of hard to see. Possible to look at other relationships such as number of people who applies for loans in a bracket and the APR.
* What do you think is the main takeaway from this visualization?
All sorts of occupations take out loans. Homemakers have very high APR. The 25-50k bracket has the lowest APR in general (wondering if more people are in this bracket as well).
I have a couple of ideas:
The visualization is exploratory in nature in it's current state. I was wondering if you had any insights that you could communicate through your presentation of the data (more Explanatory in nature).
I couple of ways to present the data that might have some insights is to group the occupations by the Major Occupational groups and see if any difference can be see between them. I obtained the below grouping from a quick internet search.
'The Occupational Classification System manual was created for Bureau of Labor Statistics (BLS) field economists to help ensure correct occupational matches when collecting compensation data. Available to the public, this manual allows the user to lookup job descriptions for occupations found in the NCS bulletins and is used by field economists in the classification of thousands of occupations.
Major Occupational Groups (MOGs)
MOG A Professional, Technical and Related Occupations
MOG B Executive, Administrative, and Managerial Occupations
MOG C Sales Occupations
MOG D Administrative Support Occupations, Including Clerical
MOG E Precision Production, Craft, and Repair Occupations
MOG F Machine Operators, Assemblers, and Inspectors
MOG G Transportation and Material Moving Occupations
MOG H Handlers, Equipment Cleaners, Helpers, and Laborers
MOG K Service Occupations, Except Private Household
'
Source: http://www.job-analysis.net/G010.htm
Another grouping I could suggest is by education level, or even gender and race if the information is present.
Several cosmetic and usability suggestions:
- The title after your "Open" your graph could be centered more properly to the size of your graph instead of the entire browser window (nitpick)
- lighten the guidelines if you can, 20% line is very heavy for some reason and distracting
- with line plot, you can start the y-axis at 10% or even 15% to give you more room for separation between your points and lines as opposed to starting from 0 on a bar plot which relies partially on volume/size comparison.
- perhaps slightly larger points when you have more separation, kind of hard to mouse over properly at this point
- Tooltip background could be colored differently from the background of the plot so it shows up better
- Selection boxes for the occupations so multiple occupations can be added or removed with buttons that allow for select all or deselect all, or reverse selections.
Thanks for sharing! Hope this helps.
P.S. Can you possibly share how you obtained the data? Is there an API that Prosper is sharing? Thanks in advance.
Show less
Todd Trautman24 Feb 2016
* What do you notice in the visualization?
Despite similar pay, not every occupation will receive similar loan rates.
* What questions do you have about the data?
How many datapoints fall into each category? For example, are there that many students receiving $75k-$100k in salary? (And how do I get this gig??)
* What relationships do you notice?
Dip in the APR for $25k-$50k...what cause that?
* What do you think is the main takeaway from this visualization?
Different categories of jobs often see different APR despite similar pay.
* Is there something you don’t understand in the graphic?
No.
Suggestions for improvement
- lighten black horizontal lines to light grey.
- maybe show all lines as light grey, mouse-over colors a line and displays job category?
- For categorical data, curious why line plot rather than bar plot was chosen? Line plot typically implies continuous data, but salaries have been binned into categories.
- move axis labels outside the plot instead of inside
- Legend did not display well in Safari - maybe move to bottom of graph in a 5x10 matrix?
I found the plot very engaging and enjoyed exploring the data. It definitely engaged my curiosity. Really well done!
Show less
Doug Wight24 Feb 2016
+Fang Lu
Hi Fang,
Thanks very much for your comprehensive review - I am REALLY grateful for your time and effort. To answer some of your questions:
The data came from the Udacity Data Analyst Nanodegree program. They presumably sourced it from Prosper - the peer to peer lending site. As others have pointed out, I need to explain more about this in the intro, so that viewers have an understanding about what is going on.
This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.
One thing you have drawn out is the dip in the APR at the $25-50k bracket. This would be an interesting insight - but it's WRONG - it was my mistake in data preparation. I did some wrangling in R before I wrote the file to csv and then picked it up through d3. I simply renamed categories, rather than re-ordering them. Have now adjusted for this, and it is the more normal relationship. Will send the version 2 when I've done it. But a massive thanks for spotting this - I'm kicking myself that I didn't investigate it further myself, as it was staring me in the face.....
The job groupings is also very useful - I did amalgamate down from 66 down to 50 initially, primarily to avoid too much clutter, and also to increase the ability to generate meaningful averages (i.e. some occupations had too few data points on their own to be meaningful). But it may be cleaner to further amalgamate down to the 9 you mentioned.
Thanks for the cosmetic / usability suggestions - they all make sense and I will try to implement some of them. The only one I don't think I'll do is to start the y axis at 10% or 15%. Whilst this might make the graph look more volatile, I think it could be quite misleading without being zero based.
Thanks again for your comments - a great help. When I try version 2, will send to you.
Doug
Show less
Doug Wight24 Feb 2016
+Todd Trautman
Hi Todd,
thanks for the feedback. I posted in my response to Fang Liu, that I made a mistake in my initial datawrangling in R, in that I mixed up the category titles, instead of just re-ordering. Which is why there was the dip at $25-50k instead of the normal downward relationship of APR vs income level.
Regarding the number of datapoints - I think I will put that in - I could size the circles according to numbers in the sample. That might show how dominant the field 'other' is vs more unusual ones such as 'judge' and 'pilot'. I already tried to ensure that the averages were 'reasonable' by only allowing those datapoints with at least 8 loans to be averaged. In answer to your specific question, there are apparently 11 students on $75-$100k. I agree, it is unusual andI'm not one of them!
Thanks for the cosmetic suggestions. I will try the mouse over line function, but I just think it is too busy (too many lines). I might try further amalgamating occupations to allow this.
I did choose plot, as I felt it was easier to compare two data categories (e.g. Dentists vs the Average). I think plot can be justified vs bar, because there is a relationship between the APR of the loan and the income level - i.e. everything else being equal, you'd expect the APR required to decrease with the income level. What this analysis is trying to show is the second influence on the APR - the actual occupation - i.e. what lenders might perceive to be the 'riskiness' of the occupation and therefore the income backing the loan.
Thanks very much for the comments.
Doug
Show less
Doug Wight24 Feb 2016+1
2
1
+Fang Lu
Fang, if you want the source data - here is the link. Am sure Prosper would have an API too.
https://www.google.com/url?q=https://s3.amazonaws.com/udacity-hosted-downloads/ud651/prosperLoanData.csv&sa=D&ust=1456316000624000&usg=AFQjCNGrFyWCcmJNHOEIO2UAh5220iDgzQ
Show less
Doug Wight13:52
+Fang Lu +John Enyeart +Todd Trautman
Here is my second iteration.
http://bl.ocks.org/dcfwight/eacce00080e3161888ce I've taken on board your feedback - th
1) Corrected my data mistake
2) Sized the svg so it displays neatly on bl.ocks
3) Used one colour for each occupation, rather than the rainbow.
4) Added occupation buttons for easy clicking.
5) Made the animation speed more reasonable - still relatively fast so it spins through them all quickly.
6) Added the sample size circles, to give a third dimension perspective.
7) Added a visual aid for the circle sizes to quickly allow the viewer to assess the rough sample sizes (there is a tooltip which gives exact sizes as well, but I don't draw attention to that)
As with previous comments, I've kept it as a line graph, as I think that is relevant, and I've also kept the scale zero based, (rather than starting at, say, 15%). I toyed with the idea of adding a button to toggle between the two, but I felt the chart would get too cluttered then.
As always, any feedback gratefully received.
Thx, Doug.
Read more (29 lines)
John Enyeart15:25
It looks really good. A lot more intuitive and readable than the first iteration. By the way, what kind of loans are these? A quick google search shows that the average home loan interest rate in the United States is 4.37% for a 30-year loan, but the rates shown in your visualization look high enough to be credit cards or something else along those lines...
Show less
Fang Lu19:30
Hi Doug,
I like this second iteration. The update to the selection buttons makes it much easier to select and see occupations. The transparency on the other lines make it a bit easier to see and compare the current occupation to others.
A couple of minor bugs
------
I did notice that your updateButtonColors function was not firing onclick. I did some checking, I think that you are chaining multiple .onclick commands in your buttonGroups definition var buttonGroups= allButtons.selectAll("g.button"), I believe that it only takes the later .onclick commands overwrite the previous .onclick and you have 3 .onclick settings. I'm not sure myself, having only taken Udacity's Data Viz course on D3, but you might try to combine you updateButtonColors and update function in one .onclick and see if that fixes it for you.
Another possible minor bug is that after the animation is run, the first occupation you select does not show the sample size circles, but the second one and there after functions as expected.
------
With that aside, I think the visualization is much clearer. I personally would like to know the total sample size of each bracket on your tooltip and/or percentages of each occupation per bracket.
The sample size addition was a nice touch, there's more information to be inferred and intuitions checked.
Relationships I noticed: Other occupation holds seemingly a third to half of all the samples (I guess these might be self-employed?). Also if you want a lower APR, one might try not listing your occupation, as 'None-provided' occupation has a favorable rate against the average. Average APR dropped as the income brackets increased.
Thank you for sharing your code as well, it's really helpful to be able to see how certain aspects of the visualization is coded. I had used divs as buttons, but your use of rect I feel might be more appropriate as it contains it within one svg element.
I have also downloaded the prosper data, though haven't had a chance to look at it yet, thanks for providing the link.
Show less
Doug Wight23:05
+John Enyeart
Hi John,
Thx for the comments - you ask really good questions and I have to confess I don't know all the answers. I did think of exactly what you point out in the data exploration phase, and I don't really have answers.
First I just googled 'what is the average APR on Prosper', and the answer seemed to be 'from 6% to 36%, but with the average around 15%'. I got that from this (source)[http://www.lendingmemo.com/rates-fees-lending-club-prosper/], but I'm not sure how reliable that is....
I've added some R charts to show some of the exploration that I did on this (drive) [https://drive.google.com/folderview?id=0B4KHzGWLBby0MzZJaFc0RGpIcWM&usp=sharing]
It shows the histogram with APRs around the mean of c. 20% (exact mean from R is 21.8%!)
Also attached the bar chart of loan types - the dominant one is type '1' at c. 50%. Next is type '0' at c. 15% and type '7; at c. 8%. Now, I have the definitions - type 1 is 'debt consolidation' , type '0' is 'not available' and type '7' is 'other'!
The full list of loan categories is 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans.
Suffice to say that none of them are as safe as secured mortgage lending. And if 50% of them are debt consolidation, then perhaps you could say that this is replacing credit card debt, and therefore that is why the APR is higher? I don't know.
So I also looked at some box-plots to see if there was much variation between loan category. I thought about doing this analysis instead, but just thought the analysis by occupation might be more interesting. It looks line loan category '4' is viewed as the 'safest' by lenders - this is 'personal loan', but there are not that many in the sample.
So, I don't really have the answer - I don't know why this data-set is showing a mean APR of c. 20%, when the articles are saying c. 15% - maybe it's a timing thing. I can definitely explain why it's higher than the mortgage rate - these are unsecured loans. But I can't explain all the difference. For the purposes of the Udacity course, I was just going to analyse it by occupation
Show less
Doug Wight23:22
+Fang Lu
Hi Fang.
Thanks for taking the time to have a look - I'm a coding newbie, so appreciate you looking in detail. I've cleaned up the on.click function so that it is all encapsulated in one function. (I didn't realise that you shouldn't have more than one - makes sense though, to avoid conflicts).
I also realised I had defined updateButtonColors twice, so removed one.
Finally I saw why the first circles didn't appear when you clicked on an occupation. It was because at the end of the animation, I created a whole set of phantom circles - i.e. with opacity 0. Then, when you select a particular occupation, it enters the data for that occupation, and exit()remove() all others. However, because the circles were there for the selected occupation already (albeit with opacity 0), then it didn't enter them or update them. It's only when you click the second that it actually had something to enter and update.
Here is my version 3 - fixes that and tidies things up (a bit).