Introduction

I’ve long been curious about legislative data—how bills become law and what drives the process from introduction to passage. More importantly, I wanted to explore ways to make this data accessible and transparent, helping people better understand how Congress works.

This project analyzes the 117th Congress using data scraped from Congress.gov. While my ultimate goals include geographic analysis and predictive modeling of voting patterns, I’m starting with fundamental exploratory analysis.

Today, I’ll walk through the data collection process and share key insights about legislative patterns, party dynamics, and policy areas.

Data Collection

My primary source is Congress.gov, maintained by the Library of Congress. I focused on the 117th Congress (2021-2023), collecting data on resolutions and joint resolutions while omitting amendments and concurrent resolutions.

Data collected:

| Bill Type | Introduced |
|---|---|
| House Resolution | 9,698 |
| House Joint Resolution | 106 |
| Senate Resolution | 5,357 |
| Senate Joint Resolution | 70 |
| Total | 15,231 |

Technical Implementation

Building the web crawler was straightforward thanks to Congress.gov’s well-organized structure. I used Python with BeautifulSoup and Selenium to handle the site’s dynamic content loading.

To respect the site’s bandwidth, I added 5-second delays between requests, resulting in a 3-day crawl time. The crawler and processed data will be available on GitHub for public use.

For each bill, I queried two pages:

  • All info page: https://www.congress.gov/bill/117th-congress/{bill_type}/{bill_id}/all-info
  • Text page: https://www.congress.gov/bill/117th-congress/{bill_type}/{bill_id}/text?format=txt
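
As a minimal sketch, the two URL templates above can be filled in with a small helper. The function name `bill_urls` and the example slug `"house-bill"` are illustrative assumptions, not names taken from the crawler itself.

```python
BASE_URL = "https://www.congress.gov/bill/117th-congress"


def bill_urls(bill_type: str, bill_id: int) -> dict[str, str]:
    """Return the all-info and text URLs for one bill."""
    root = f"{BASE_URL}/{bill_type}/{bill_id}"
    return {
        "all_info": f"{root}/all-info",
        "text": f"{root}/text?format=txt",
    }


# "house-bill" is an assumed example value for {bill_type}.
urls = bill_urls("house-bill", 1)
print(urls["all_info"])
print(urls["text"])
```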

The parsing process involved targeting specific HTML elements and implementing basic caching to avoid redundant requests.
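
The caching and throttling could look roughly like the sketch below. It is not the crawler's actual code: the class name `CachedFetcher`, the on-disk layout, and the injected `fetch_fn` are all assumptions made for illustration. The idea is simply that a page already on disk is never requested again, and the delay applies only to real requests.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


class CachedFetcher:
    """Fetch pages through an on-disk cache, pausing before each real request."""

    def __init__(self, fetch_fn: Callable[[str], str], cache_dir: str,
                 delay_seconds: float = 5.0) -> None:
        self.fetch_fn = fetch_fn  # hypothetical: any function mapping URL -> page text
        self.cache_dir = Path(cache_dir)
        self.delay_seconds = delay_seconds
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so it is always a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url: str) -> str:
        path = self._cache_path(url)
        if path.exists():
            # Cache hit: no request and no delay.
            return path.read_text(encoding="utf-8")
        time.sleep(self.delay_seconds)  # throttle only real requests
        html = self.fetch_fn(url)
        path.write_text(html, encoding="utf-8")
        return html
```

In a Selenium-based crawl, `fetch_fn` might wrap `driver.get(url)` followed by reading `driver.page_source`; passing the fetch function in also makes the cache easy to exercise without touching the network.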

Key Findings

This analysis focuses on high-level patterns rather than cross-feature relationships or bill text analysis. The insights are organized by the main variables tracked.

Legislative Outcomes

What matters most: which bills get introduced and which become law?

Each bill has a tracker status indicating its position in the legislative process. The eight possible statuses can be grouped into three meaningful categories:

  • Introduced: Bills introduced but never voted on
  • Stalled: Bills that saw votes but didn’t become law (since the 117th Congress ended, these effectively died)
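  • Law: Bills signed by the President

| Bill Type | Introduced | Stalled | Law |
|---|---|---|---|
| House Resolution | 8,977 | 523 | 198 |
| House Joint Resolution | 102 | 1 | 3 |
| Senate Resolution | 5,083 | 114 | 160 |
| Senate Joint Resolution | 57 | 9 | 4 |
| Total | 14,219 | 647 | 365 |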

Key insights:

  • Only 7% of introduced bills ever receive a vote
  • Of bills that receive votes, 36% become law
  • Overall, just 2% of introduced bills become law
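
These three percentages follow directly from the Total row of the table. Here is a short sketch of the arithmetic; the function and key names are illustrative, and "received a vote" is taken to mean Stalled plus Law, as in the category definitions above.

```python
def outcome_rates(introduced: int, stalled: int, law: int) -> dict[str, float]:
    """Compute vote and enactment shares from the three outcome counts."""
    total = introduced + stalled + law
    voted = stalled + law  # bills that moved beyond introduction
    return {
        "voted_share": voted / total,
        # Guard against groups in which no bill ever received a vote.
        "law_share_of_voted": law / voted if voted else 0.0,
        "law_share_overall": law / total,
    }


rates = outcome_rates(introduced=14_219, stalled=647, law=365)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
# voted_share: 6.6%
# law_share_of_voted: 36.1%
# law_share_overall: 2.4%
```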

The bill sponsor—the primary member who introduces legislation—provides insights into party and geographic patterns.

Party Breakdown

| Party | Introduced | Stalled | Law |
|---|---|---|---|
| Democrat | 8,271 | 437 | 235 |
| Republican | 5,883 | 210 | 130 |
| Independent | 65 | 0 | 0 |

Party comparison:

  • Democrats: 7.5% of bills moved beyond introduction; 2.6% became law
  • Republicans: 5.5% of bills moved beyond introduction; 2.1% became law
  • When bills do advance, Republicans have a slightly higher success rate (38% vs 35%)
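
For reference, here is the arithmetic behind the party comparison, using the counts from the party table, where each party's total is the sum of its Introduced, Stalled, and Law columns:

$$ \text{Democrats: } \frac{437 + 235}{8271 + 437 + 235} = \frac{672}{8943} \approx 7.5\%, \qquad \frac{235}{8943} \approx 2.6\%, \qquad \frac{235}{672} \approx 35.0\% $$

$$ \text{Republicans: } \frac{210 + 130}{5883 + 210 + 130} = \frac{340}{6223} \approx 5.5\%, \qquad \frac{130}{6223} \approx 2.1\%, \qquad \frac{130}{340} \approx 38.2\% $$

In each line, the first fraction is the share of bills that moved beyond introduction, the second is the share that became law, and the third is the share of advancing bills that became law.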

Geographic Distribution

Top 10 states by sponsor's home state, ranked separately for each outcome:

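| Rank | Introduced | Stalled | Law |
|---|---|---|---|
| 1 | CA: 1,350 | CA: 93 | CA: 34 |
| 2 | TX: 879 | NY: 44 | MI: 30 |
| 3 | NY: 784 | TX: 43 | TX: 25 |
| 4 | FL: 766 | MI: 28 | NY: 24 |
| 5 | IL: 660 | NJ: 28 | MN: 17 |
| 6 | PA: 521 | IL: 27 | IL: 16 |
| 7 | NJ: 478 | VA: 26 | OH: 11 |
| 8 | MI: 380 | FL: 24 | VA: 11 |
| 9 | OH: 377 | PA: 22 | FL: 11 |
| 10 | MA: 361 | OH: 19 | GA: 9 |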

Per-representative normalization reveals different patterns:

| Rank | Introduced | Stalled | Law |
|---|---|---|---|
| 1 | DC: 101.0 | DC: 7.0 | AK: 2.2 |
| 2 | NH: 47.5 | AK: 2.8 | NH: 2.0 |
| 3 | MT: 44.0 | IA: 2.3 | MT: 2.0 |
| 4 | OR: 41.0 | SD: 2.3 | MI: 1.9 |
| 5 | NV: 40.0 | NH: 2.2 | MN: 1.5 |
| 6 | DE: 38.7 | VA: 2.0 | HI: 1.5 |
| 7 | SD: 38.3 | NJ: 2.0 | CT: 1.3 |
| 8 | IA: 37.7 | PR: 2.0 | IA: 1.2 |
| 9 | RI: 36.5 | NV: 1.8 | OR: 1.1 |
| 10 | UT: 36.0 | MO: 1.8 | SD: 1.0 |
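
The normalization can be sketched as a simple division of each state's count by the size of its delegation. The exact denominator used for the table is not spelled out here, so the delegation sizes below are assumptions: one member for DC (its single delegate) and 55 for California (53 House seats plus 2 senators). The function name `per_member` is also illustrative.

```python
def per_member(counts: dict[str, int], members: dict[str, int]) -> dict[str, float]:
    """Divide each state's bill count by the size of its delegation."""
    return {
        state: round(count / members[state], 1)
        for state, count in counts.items()
        if members.get(state, 0) > 0  # skip states with no known delegation size
    }


# Introduced counts: CA from the raw table; DC implied by its normalized value of 101.0.
introduced = {"CA": 1350, "DC": 101}
# Assumed delegation sizes (see the note above the code).
delegation_size = {"CA": 55, "DC": 1}

ranked = sorted(per_member(introduced, delegation_size).items(),
                key=lambda item: item[1], reverse=True)
print(ranked)  # [('DC', 101.0), ('CA', 24.5)]
```

Under these assumptions California drops from first place to roughly 24.5 bills per member, which is consistent with its absence from the normalized top 10.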

Top Individual Sponsors

Top 10 sponsors, ranked separately for each outcome:

| Rank | Introduced | Stalled | Law |
|---|---|---|---|
| 1 | Sen. Rubio (R-FL): 186 | Sen. Peters (D-MI): 11 | Sen. Peters (D-MI): 19 |
| 2 | Sen. Klobuchar (D-MN): 143 | Sen. Cornyn (R-TX): 8 | Sen. Cornyn (R-TX): 15 |
| 3 | Sen. Lee (R-UT): 125 | Rep. Connolly (D-VA-11): 8 | Sen. Klobuchar (D-MN): 7 |
| 4 | Sen. Markey (D-MA): 118 | Rep. Takano (D-CA-41): 8 | Sen. Tester (D-MT): 6 |
| 5 | Sen. Casey (D-PA): 116 | Sen. Grassley (R-IA): 7 | Sen. Rubio (R-FL): 6 |
| 6 | Sen. Cortez Masto (D-NV): 109 | Del. Norton (D-DC): 7 | Rep. DeLauro (D-CT-3): 6 |
| 7 | Sen. Booker (D-NJ): 106 | Rep. Johnson (D-TX-30): 7 | Sen. Grassley (R-IA): 5 |
| 8 | Sen. Durbin (D-IL): 102 | Rep. Katko (R-NY-24): 7 | Sen. Ossoff (D-GA): 4 |
| 9 | Del. Norton (D-DC): 101 | Rep. Dean (D-PA-4): 6 | Sen. Murkowski (R-AK): 4 |
| 10 | Sen. Menendez (D-NJ): 99 | Rep. Wagner (R-MO-2): 6 | Sen. Padilla (D-CA): 4 |

Effectiveness score (laws enacted / total bills):

$$ \text{effectiveness} = \frac{\text{bills that became law}}{\text{total bills introduced}} $$

| Rank | Sponsor | Effectiveness Score |
|---|---|---|
| 1 | Rep. Pelosi (D-CA-12) | 0.500 |
| 2 | Rep. Mrvan (D-IN-1) | 0.444 |
| 3 | Rep. Yarmuth (D-KY-3) | 0.333 |
| 4 | Rep. Stivers (R-OH-15) | 0.250 |
| 5 | Rep. Graves (R-MO-6) | 0.222 |
| 6 | Rep. Jeffries (D-NY-8) | 0.200 |
| 7 | Rep. Neal (D-MA-1) | 0.200 |
| 8 | Rep. Palazzo (R-MS-4) | 0.200 |
| 9 | Sen. Peters (D-MI) | 0.186 |
| 10 | Rep. Fischbach (R-MN-7) | 0.176 |
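
A small sketch of the score in code follows. Sen. Peters' total of 102 bills is not listed above; it is inferred here as the only whole number consistent with 19 laws and a score of 0.186. The `min_bills` threshold is a hypothetical addition, not part of the ranking above, showing one way to keep very small sample sizes from dominating the list.

```python
from typing import Optional


def effectiveness(laws: int, total_introduced: int, min_bills: int = 1) -> Optional[float]:
    """Return laws / total bills, or None when the sponsor has too few bills to rank."""
    if total_introduced == 0 or total_introduced < min_bills:
        return None
    return round(laws / total_introduced, 3)


# Sen. Peters: 19 laws; the total of 102 is inferred from the published score.
print(effectiveness(19, 102))  # 0.186

# Hypothetical sponsor with 1 law out of 2 bills: high score, tiny sample.
print(effectiveness(1, 2))                # 0.5
print(effectiveness(1, 2, min_bills=10))  # None
```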

Policy Focus Areas

Each bill is assigned a primary policy area. Here are the most active areas by legislative outcome:

| Rank | Introduced | Stalled | Law |
|---|---|---|---|
| 1 | Health: 1,885 | Government Operations: 79 | Government Operations: 94 |
| 2 | Armed Forces: 1,114 | Armed Forces: 60 | Armed Forces: 69 |
| 3 | Taxation: 1,066 | International Affairs: 60 | Crime & Law Enforcement: 31 |
| 4 | Government Operations: 982 | Health: 56 | Health: 19 |
| 5 | International Affairs: 866 | Crime & Law Enforcement: 44 | Native Americans: 17 |
| 6 | Crime & Law Enforcement: 842 | Public Lands: 44 | International Affairs: 14 |
| 7 | Education: 663 | Science & Technology: 44 | Economics & Finance: 13 |
| 8 | Transportation: 663 | Commerce: 43 | Public Lands: 13 |
| 9 | Public Lands: 548 | Finance: 34 | Commerce: 13 |
| 10 | Finance: 547 | Emergency Management: 27 | Emergency Management: 11 |

Notable patterns: Health leads introductions by a wide margin (1,885), yet only 19 Health bills became law, about 1% of all Health bills. Government Operations (94 laws, about 8%) and Armed Forces (69 laws, about 6%) bills are far more likely to be enacted.
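
To compare policy areas on passage rate rather than raw volume, the counts can be turned into shares. The sketch below covers only the six areas that appear in all three columns of the table, since those are the only ones with complete counts; the names `policy_counts` and `law_share` are illustrative.

```python
policy_counts = {
    # area: (introduced, stalled, law), copied from the table above
    "Health": (1885, 56, 19),
    "Armed Forces": (1114, 60, 69),
    "Government Operations": (982, 79, 94),
    "International Affairs": (866, 60, 14),
    "Crime & Law Enforcement": (842, 44, 31),
    "Public Lands": (548, 44, 13),
}


def law_share(introduced: int, stalled: int, law: int) -> float:
    """Share of all bills in a policy area that became law."""
    return law / (introduced + stalled + law)


# Rank areas from highest to lowest share of bills enacted.
ranked = sorted(policy_counts, key=lambda area: law_share(*policy_counts[area]), reverse=True)
for area in ranked:
    print(f"{area}: {law_share(*policy_counts[area]):.1%}")
```

Sorted this way, Government Operations comes first and Health last among these six areas.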

Next Steps

This analysis reveals clear patterns in congressional activity: the vast majority of bills never receive a vote, bills sponsored by Democrats and Republicans advance at different rates, and some policy areas see far higher passage rates than others.

Future work will explore:

  • Committee dynamics and voting patterns
  • Geographic analysis of state-level interests
  • Bill text analysis using NLP techniques
  • Predictive modeling for bill outcomes

Update: I’ve now written about Congressional Bill Policy Area Classification, which uses machine learning to automatically categorize bills by policy area using 48K+ bills from three Congresses. See the complete Congressional Data Analysis series for the full learning path.

The complete dataset and analysis code will be made publicly available to encourage further research into legislative transparency.

Have thoughts or questions about this analysis? I’d love to hear from you!