BIAS | ability | able | academic | acceptance | access | according | account | act | acting | action | activity | actually | ad | add | added | adding | addition | additional | additionally | address | administration | administrative | advantage | affect | affected | affordable | age | agency | agenda | agent | ago | agreement | agricultural | agriculture | ahead | ai | aid | aim | air | allow | allowed | allowing | allows | amazon | america | american | angeles | announced | announcement | annual | answer | ap | app | application | apply | approach | approved | area | art | article | ask | asked | asset | assistance | associated | association | attack | attempt | attorney | authority | available | average | avoid | away | bad | balance | bank | bankruptcy | based | basis | began | begin | believe | benefit | best | better | biden | big | biggest | billion | billionaire | bird | black | blagojevich | block | blocked | blue | board | body | tech | technology | tell | temporarily | temporary | term | tetfund | texas | thing | think | thought | thousand | threat | thursday | time | tip | title | today | told | took | tool | total | track | trade | transfer | transgender | treasury | treatment | tried | trillion | trump | try | trying | tuesday | tuition | turn | type | typically | ultimately | uncertainty | unclear | undergraduate | understand | union | united | university | update | usaid | use | used | user | using | valuable | value | vance | various | versus | vice | video | view | virginia | vote | voter | vought | want | wanted | war | washington | watch | water | way | wealth | website | wednesday | week | went | west | white | win | wing | wo | woman | won | word | work | worked | worker | workforce | working | world | worried | worth | wrote | year | yes | yield | york | young | yoy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Modeling - Preparation
Introduction
This section will focus on supervised machine learning. Specifically, classification with the following families will be used:
Supervised machine learning models require labeled data, meaning each record carries a known tag that the model can be trained against. When training the models, the data is split into disjoint training and testing sets: the models learn from the training set and are then evaluated on data they have not seen. This exposes overfitting and simulates applying the model to real-world data.
This is what the exploratory and unsupervised methods in the previous sections have been leading to. The idea is to begin with the NewsAPI data, labeled with political bias by news organization. Once acceptable models have been created, they will be applied to the Reddit data in an attempt to project political bias onto Reddit authors. The ultimate goal is to find positive and negative sentiment on the topic of student loan forgiveness; because that sentiment is roughly split along political bias, classifying by political bias should be a decent indicator of sentiment.
Data Preparation
To prepare for the modeling, the NewsAPI articles from organizations with a known political bias will be used. To increase the efficiency of the models, general (non-topic-specific) articles will be combined with the topic-specific articles. The Reddit data, which will have political bias projected onto it, will be content aggregated by author. Additionally, only authors with an acceptable amount of content will be used (roughly equivalent to the first quartile of article length).
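The Reddit preparation described above could be sketched roughly as follows, assuming the posts sit in a pandas DataFrame. The column names `author` and `text`, the helper `aggregate_by_author`, and the toy data are all hypothetical. Taking the threshold as the first quartile of article word counts is only one reading of "an acceptable amount of content".

```python
import pandas as pd


def aggregate_by_author(reddit: pd.DataFrame, min_words: float) -> pd.DataFrame:
    """Join each author's posts into one document and keep authors
    whose combined text has at least `min_words` words."""
    by_author = (
        reddit.groupby("author")["text"]
        .apply(" ".join)          # one concatenated document per author
        .reset_index()
    )
    by_author["n_words"] = by_author["text"].str.split().str.len()
    return by_author[by_author["n_words"] >= min_words].reset_index(drop=True)


# Toy stand-ins for the cleaned article and Reddit text (hypothetical data).
articles = pd.Series(
    ["one two three four", "one two", "one two three four five six", "one"]
)
# Assumed threshold: first quartile of article word counts.
threshold = articles.str.split().str.len().quantile(0.25)

reddit = pd.DataFrame(
    {
        "author": ["a", "a", "b", "c"],
        "text": [
            "loans are high",
            "forgive them",
            "ok",
            "tuition keeps rising every year",
        ],
    }
)
kept = aggregate_by_author(reddit, threshold)
```

In this toy run author `b` falls below the threshold and is dropped, while the two posts by author `a` are merged into a single document.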
NewsAPI
As a reminder, the political labels are:
- Left
- Lean Left
- Center
- Lean Right
- Right
There will be several aggregations of the labels used:
- 5 Labels (strictly all five)
- 3 Labels
  - Lean Left combined into Left
  - Center
  - Lean Right combined into Right
- 3 Labels Strict
  - Strictly Left
  - Strictly Center
  - Strictly Right
- 2 Labels
  - Lean Left combined into Left
  - Lean Right combined into Right
- 2 Labels Strict
  - Strictly Left
  - Strictly Right
Each of these aggregations will be transformed into a labeled, vectorized version limited to a 1000-word vocabulary.
The Reddit data will remain unlabeled, as this is where the models will be applied to project political bias onto authors. It will also be vectorized, but with no limit on the maximum word count.
Vectorizing
NewsAPI Data
After the text data is cleaned, stopwords are removed, and the text is lemmatized, CountVectorization is performed and the labels are reappended to the result. A sample of this data looks like:
From this vectorized version of the data, rows will be aggregated or dropped depending on the 5-, 3-, or 2-label strategy outlined above.
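One way to express the aggregate-or-drop strategies is sketched below. The helper `aggregate_labels`, its parameters, and the toy frame are hypothetical. It assumes that the strict variants drop the "Lean" rows outright and that the two-label variants drop Center, which is how the label lists above read.

```python
import pandas as pd

# Mapping used by the non-strict variants.
MERGE_LEAN = {"Lean Left": "Left", "Lean Right": "Right"}


def aggregate_labels(df, n_labels, strict=False, label_col="BIAS"):
    """Return a copy of df relabeled for the 5-, 3-, or 2-label strategy.
    strict=True drops 'Lean' rows instead of merging them."""
    if n_labels not in (5, 3, 2):
        raise ValueError("n_labels must be 5, 3, or 2")
    out = df.copy()
    if n_labels == 5:
        return out
    if strict:
        # Keep only rows that are strictly Left, Center, or Right.
        out = out[~out[label_col].isin(list(MERGE_LEAN))]
    else:
        # Fold Lean Left into Left and Lean Right into Right.
        out[label_col] = out[label_col].map(lambda b: MERGE_LEAN.get(b, b))
    if n_labels == 2:
        # Two-label variants have no Center class.
        out = out[out[label_col] != "Center"]
    return out


# Toy labeled document-term matrix (hypothetical data).
dtm = pd.DataFrame(
    {
        "BIAS": ["Left", "Lean Left", "Center", "Lean Right", "Right"],
        "loan": [1, 2, 3, 4, 5],
    }
)

five = aggregate_labels(dtm, 5)
three = aggregate_labels(dtm, 3)
three_strict = aggregate_labels(dtm, 3, strict=True)
two = aggregate_labels(dtm, 2)
two_strict = aggregate_labels(dtm, 2, strict=True)
```

The original row indices are deliberately left untouched after filtering, so each row can still be traced back to its source record.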
Reddit Data
After the text data is cleaned, stopwords are removed, and the text is lemmatized, CountVectorization is performed; no labels are appended. A sample of this data looks like:
aa | abandon | abandoned | ability | able | abortion | abraham | absolute | absolutely | absolve | abusing | academically | accelerated | accept | acceptable | acceptance | accepted | accepting | access | accessible | accident | accidentally | accommodation | accomplished | accomplishment | accordance | according | accordingly | account | accountability | accountable | accountant | accounting | accreditation | accrual | accrued | accrues | accruing | accumulate | accumulated | accumulating | accumulation | accurate | aced | achieve | achieves | act | action | active | actively | actual | actually | ad | adam | adapt | add | added | adding | addition | additional | additionally | address | addressed | addressing | adjudication | adjust | adjustable | adjusted | adjustment | admin | administration | administrative | administratively | administrator | admiral | admission | admit | admitting | adopt | adoption | adoreing | adoring | adult | advantage | advantageous | advice | advocacy | advocate | advocating | af | affair | affect | affected | afford | affordable | africa | ag | age | agency | agenda | week | weekend | weekly | weighing | weird | weirdly | welcome | welfare | wellwishes | went | west | western | whats | wheel | white | whopping | wich | widely | wife | wiggle | wild | wildly | willful | willfully | william | willing | win | wind | window | wing | winner | wiped | wiping | wire | wise | wisely | wiser | wish | wished | withdraw | withdrawn | witherspoon | wo | woman | won | wonder | wondered | wonderful | wont | word | wording | work | workaround | workarounds | worked | worker | workerslatest | workforce | working | world | worldnews | worried | worry | worse | worst | worth | worthiness | worthless | wouldn | wound | wow | wrinkle | write | writing | written | wrong | wrote | wtfhow | xfinity | yale | yap | yard | yea | yeah | year | yearly | yep | yes | yesterday | yield | york | youdid | young | younger | youngest | youth | youtube | yr | zero | zillion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training Testing Split
An important feature of the training and testing sets is that they are disjoint. Notice that the indices in the first few rows below differ between the training and testing sets, but within each set they match between the data and the labels. Keeping the sets disjoint gives a realistic simulation of unseen data, while the matching indices let the models learn by pairing each record with its label.
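The two index properties described here can be checked directly. The sketch below assumes scikit-learn's `train_test_split`, which the source does not name. The `test_size` and `random_state` values and the toy frame are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labeled document-term matrix (hypothetical data).
dtm = pd.DataFrame(
    {
        "BIAS": ["Left", "Right", "Left", "Right", "Left", "Right", "Left", "Right"],
        "loan": [3, 0, 2, 1, 4, 0, 1, 2],
        "tuition": [0, 2, 1, 3, 0, 1, 2, 0],
    }
)

X = dtm.drop(columns="BIAS")   # features only
y = dtm["BIAS"]                # labels only

# Assumed settings: 25% held out, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Disjoint: no row index appears in both sets.
disjoint = set(X_train.index).isdisjoint(X_test.index)

# Aligned: within each set, data and labels share the same indices.
aligned = X_train.index.equals(y_train.index) and X_test.index.equals(y_test.index)
```

Both `disjoint` and `aligned` should be true for any split produced this way, since `train_test_split` partitions the rows once and applies the same partition to the features and the labels.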
Five Labels
Training Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Testing Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Three Labels
Training Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Testing Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Strict Three Labels
Training Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Testing Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Two Labels
Training Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Testing Data and Labels Example
Data
[Interactive table not captured: vectorized counts, one column per vocabulary term (ability, able, academic, ... york, young, yoy).]
Label
[Interactive table not captured: BIAS label for each row.]
Strict Two Labels
Training Data and Labels Example
Data
ability | able | academic | acceptance | access | according | account | act | acting | action | activity | actually | ad | add | added | adding | addition | additional | additionally | address | administration | administrative | advantage | affect | affected | affordable | age | agency | agenda | agent | ago | agreement | agricultural | agriculture | ahead | ai | aid | aim | air | allow | allowed | allowing | allows | amazon | america | american | angeles | announced | announcement | annual | answer | ap | app | application | apply | approach | approved | area | art | article | ask | asked | asset | assistance | associated | association | attack | attempt | attorney | authority | available | average | avoid | away | bad | balance | bank | bankruptcy | based | basis | began | begin | believe | benefit | best | better | biden | big | biggest | billion | billionaire | bird | black | blagojevich | block | blocked | blue | board | body | book | tech | technology | tell | temporarily | temporary | term | tetfund | texas | thing | think | thought | thousand | threat | thursday | time | tip | title | today | told | took | tool | total | track | trade | transfer | transgender | treasury | treatment | tried | trillion | trump | try | trying | tuesday | tuition | turn | type | typically | ultimately | uncertainty | unclear | undergraduate | understand | union | united | university | update | usaid | use | used | user | using | valuable | value | vance | various | versus | vice | video | view | virginia | vote | voter | vought | want | wanted | war | washington | watch | water | way | wealth | website | wednesday | week | went | west | white | win | wing | wo | woman | won | word | work | worked | worker | workforce | working | world | worried | worth | wrote | year | yes | yield | york | young | yoy | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Label
BIAS | |
---|---|
Testing Data and Labels Example
Data
ability | able | academic | acceptance | access | according | account | act | acting | action | activity | actually | ad | add | added | adding | addition | additional | additionally | address | administration | administrative | advantage | affect | affected | affordable | age | agency | agenda | agent | ago | agreement | agricultural | agriculture | ahead | ai | aid | aim | air | allow | allowed | allowing | allows | amazon | america | american | angeles | announced | announcement | annual | answer | ap | app | application | apply | approach | approved | area | art | article | ask | asked | asset | assistance | associated | association | attack | attempt | attorney | authority | available | average | avoid | away | bad | balance | bank | bankruptcy | based | basis | began | begin | believe | benefit | best | better | biden | big | biggest | billion | billionaire | bird | black | blagojevich | block | blocked | blue | board | body | book | tech | technology | tell | temporarily | temporary | term | tetfund | texas | thing | think | thought | thousand | threat | thursday | time | tip | title | today | told | took | tool | total | track | trade | transfer | transgender | treasury | treatment | tried | trillion | trump | try | trying | tuesday | tuition | turn | type | typically | ultimately | uncertainty | unclear | undergraduate | understand | union | united | university | update | usaid | use | used | user | using | valuable | value | vance | various | versus | vice | video | view | virginia | vote | voter | vought | want | wanted | war | washington | watch | water | way | wealth | website | wednesday | week | went | west | white | win | wing | wo | woman | won | word | work | worked | worker | workforce | working | world | worried | worth | wrote | year | yes | yield | york | young | yoy | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Label
BIAS | |
---|---|
Balance of Labels
The introduction on this page discusses models “learning”. When a model is trained on a dataset with imbalanced labels, it may be biased toward predicting whichever labels are most prevalent in the training data, so it is good practice to examine the balance of the labels in each dataset. If the model performs poorly, the label balance is one place to intervene: either downsample the overrepresented labels by random removal or upsample the underrepresented labels by bootstrapping (sampling with replacement) until the labels come into better balance. Fortunately, this data isn’t heavily skewed, though it isn’t perfectly balanced either. The proportions are illustrated below, along with their total counts. Notice how the strict labels differ from the aggregated non-strict labels.
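As a rough illustration of the balance check and the two rebalancing options described above, here is a minimal sketch using only the Python standard library. The function names (`label_proportions`, `rebalance`), the placeholder labels `"A"`, `"B"`, `"C"`, and the toy data are all assumptions made for this example; they are not the labels or the code used for this project.

```python
import random
from collections import Counter


def label_proportions(labels):
    """Return {label: (count, proportion)} for a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def rebalance(rows, labels, mode="down", seed=0):
    """Resample (row, label) pairs so every label ends up with the same count.

    mode="down": randomly remove rows until each label matches the rarest one.
    mode="up":   bootstrap (sample with replacement) until each label
                 matches the most common one.
    """
    rng = random.Random(seed)

    # Group the rows by their label.
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)

    sizes = [len(group) for group in by_label.values()]
    target = min(sizes) if mode == "down" else max(sizes)

    out_rows, out_labels = [], []
    for label, group in by_label.items():
        if mode == "down":
            # Random removal: keep a sample drawn without replacement.
            chosen = rng.sample(group, target)
        else:
            # Keep every original row and bootstrap only the shortfall.
            chosen = group + rng.choices(group, k=target - len(group))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 6 rows labeled "A", 3 labeled "B", 1 labeled "C".
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"]
toy_rows = list(range(len(toy_labels)))

print(label_proportions(toy_labels))

down_rows, down_labels = rebalance(toy_rows, toy_labels, mode="down")
up_rows, up_labels = rebalance(toy_rows, toy_labels, mode="up")
print(Counter(down_labels))
print(Counter(up_labels))
```

If resampling like this is used, it would normally be applied to the training split only, so that the testing data still reflects the original label distribution.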
[Figures: label proportions and total counts for Five Labels, Three Labels, Strict Three Labels, Two Labels, and Strict Two Labels]
Applications
This data will be used throughout the remainder of the modeling sections.