Programming Examples of Data Analysis with Python

Overview
Details

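Overview

Python is one of the most widely used modern programming languages, valued for its readable syntax, its extensive library ecosystem, and its wide range of applications.

Many IT companies use Python across a variety of development work, including data analysis, machine learning, and web application development.

This article introduces examples of how companies use Python in practice.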

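Details

[Case 1]
Company: Company A
Work: Building a customer prediction model with machine learning

The following is part of the customer prediction model that Company A built with Python.

The code predicts which product a customer will use next, based on that customer's usage history.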

```python
import pandas as pd
from lightgbm import LGBMClassifier

# Load the data
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Feature engineering: flag each row's origin so the combined frame can be split again
train['flg'] = 1
test['flg'] = 0
all_data = pd.concat([train, test], axis=0, ignore_index=True)

# Encode categorical variables as integer codes (one shared mapping for train and test)
cat_cols = ['sex', 'job', 'marital', 'education']
for col in cat_cols:
    all_data[col] = all_data[col].astype('category').cat.codes

# Split back into training and test data; the test rows have no target, so drop y there
train_data = all_data[all_data['flg'] == 1].drop(columns='flg')
test_data = all_data[all_data['flg'] == 0].drop(columns=['flg', 'y']).copy()

# Build the model on the same feature columns for fit and predict (id and y excluded)
features = [c for c in train_data.columns if c not in ('id', 'y')]
lgb = LGBMClassifier(random_state=42)
lgb.fit(train_data[features], train_data['y'])
test_data['y_pred'] = lgb.predict(test_data[features])

# Write the results
submit = test_data[['id', 'y_pred']]
submit.to_csv('submit.csv', index=False)
```

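The code first uses pandas to load the training and test data, then concatenates them so that both go through the same feature engineering. The flg column records which file each row came from.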
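A minimal sketch of the combine-and-flag pattern, using two tiny in-memory DataFrames as hypothetical stand-ins for train.csv and test.csv. Because the test data has no target column, its y values become NaN after the concat, and the flg column is what allows the rows to be separated again.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv
train = pd.DataFrame({'id': [1, 2], 'job': ['admin', 'services'], 'y': [0, 1]})
test = pd.DataFrame({'id': [3], 'job': ['admin']})

# Mark the origin of each row, then stack the two frames with a fresh 0..n-1 index
train['flg'] = 1
test['flg'] = 0
all_data = pd.concat([train, test], axis=0, ignore_index=True)

# The test row has no target, so its 'y' is NaN in the combined frame
print(all_data)

# The flag recovers the original split
train_part = all_data[all_data['flg'] == 1]
test_part = all_data[all_data['flg'] == 0]
print(len(train_part), len(test_part))  # 2 1
```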

Next, it encodes the categorical variables as integer codes and uses flg to split the combined data back into training and test sets.
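Encoding on the combined data matters because `astype('category').cat.codes` assigns codes from whichever categories are present. The sketch below uses a hypothetical job column to contrast encoding each set separately with encoding the concatenated data.

```python
import pandas as pd

# Hypothetical job values in the training and test data
train_jobs = pd.Series(['admin', 'services', 'admin'])
test_jobs = pd.Series(['services', 'technician'])

# Encoded separately, each set builds its own category list, so codes disagree
sep_train = train_jobs.astype('category').cat.codes.tolist()
sep_test = test_jobs.astype('category').cat.codes.tolist()
print(sep_train, sep_test)  # [0, 1, 0] [0, 1]

# Encoded on the combined data, one mapping is shared by both sets
combined = pd.concat([train_jobs, test_jobs], ignore_index=True)
codes = combined.astype('category').cat.codes
train_codes = codes.iloc[:len(train_jobs)].tolist()
test_codes = codes.iloc[len(train_jobs):].tolist()
print(train_codes, test_codes)  # [0, 1, 0] [1, 2]
```

Encoded separately, 'services' becomes 1 in the training data but 0 in the test data; on the combined data it is 1 in both.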

Finally, it builds a model with LightGBM, predicts labels for the test data, and writes each row's id and prediction to submit.csv.
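When fitting and predicting, the model must receive the same feature columns in the same order. A sketch of one way to enforce this with an explicit feature list that excludes the identifier id and the target y; the frames and the age column are hypothetical.

```python
import pandas as pd

# Hypothetical frames after the split (flg already removed, test has no target)
train_data = pd.DataFrame({'id': [1, 2, 3], 'job': [0, 1, 0], 'age': [30, 45, 28], 'y': [0, 1, 0]})
test_data = pd.DataFrame({'id': [4, 5], 'job': [1, 0], 'age': [50, 33]})

# One explicit feature list, used for both fit and predict
features = [c for c in train_data.columns if c not in ('id', 'y')]
X_train = train_data[features]
X_test = test_data[features]

# Sanity check: same columns, same order
assert list(X_train.columns) == list(X_test.columns)
print(features)  # ['job', 'age']
```

With LightGBM, these would then be passed as `lgb.fit(X_train, train_data['y'])` and `lgb.predict(X_test)`.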

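[Case 2]
Company: Company B
Work: Information gathering via web scraping

The following is the code Company B used with Python to collect follower counts for influencers.

The code retrieves each influencer's follower count through the Twitter API and writes the data to a CSV file.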

```python
import tweepy
import pandas as pd

# Authentication keys (placeholders; replace with your own credentials)
consumer_key = 'xxxxx'
consumer_secret = 'xxxxx'
access_token = 'xxxxx'
access_token_secret = 'xxxxx'

# API authentication (tweepy 4.5+; wait_on_rate_limit_notify no longer exists in tweepy 4)
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret, access_token, access_token_secret
)
api = tweepy.API(auth, wait_on_rate_limit=True)

# User names of the target influencers
users = ['user1', 'user2', 'user3', 'user4', 'user5']

# Fetch follower counts
results = []
for user in users:
    try:
        user_info = api.get_user(screen_name=user)
        results.append((user, user_info.followers_count))
    except tweepy.TweepyException as e:
        # Report and skip accounts that cannot be fetched instead of silently passing
        print(f'Skipping {user}: {e}')

# Write the results
df = pd.DataFrame(results, columns=['username', 'follower_count'])
df.to_csv('follower_count.csv', index=False)
```

The code first sets the Twitter API authentication keys and authenticates with them.
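Hard-coding keys, as the placeholders in the code do, makes them easy to leak. One common alternative is to read them from environment variables. A minimal sketch; the variable names such as TWITTER_CONSUMER_KEY are hypothetical, and the helper load_credentials is not part of tweepy.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; adjust to your own setup
CREDENTIAL_VARS = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the four OAuth 1.0a values, failing loudly if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

Assuming tweepy 4.5 or later, the returned dict could be passed as `tweepy.OAuth1UserHandler(**load_credentials())`, since its keys match that constructor's parameter names.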

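Next, it lists the user names of the target influencers and fetches each one's follower count from the Twitter API, skipping any account that cannot be retrieved.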
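The fetch loop can be separated from the API call so that it can be tested offline. In this sketch, collect_follower_counts is a hypothetical helper that takes the lookup function and the exception types to skip as parameters, and a dictionary-backed stub stands in for the Twitter API.

```python
from typing import Callable, Iterable, List, Tuple, Type


def collect_follower_counts(
    users: Iterable[str],
    fetch: Callable[[str], int],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> List[Tuple[str, int]]:
    """Return (username, follower_count) pairs, skipping users whose lookup fails."""
    results = []
    for user in users:
        try:
            results.append((user, fetch(user)))
        except errors as e:
            # Report and skip instead of silently swallowing the error
            print(f'Skipping {user}: {e!r}')
    return results


# Hypothetical offline stub: user2 is missing, so the lookup raises KeyError
fake_counts = {'user1': 1200, 'user3': 560}


def fake_fetch(name: str) -> int:
    return fake_counts[name]


results = collect_follower_counts(['user1', 'user2', 'user3'], fake_fetch, errors=(KeyError,))
print(results)  # [('user1', 1200), ('user3', 560)]
```

With tweepy, fetch could be `lambda name: api.get_user(screen_name=name).followers_count` and errors could be `(tweepy.TweepyException,)`, so that only API errors are skipped rather than every exception.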

Finally, it uses pandas to write the results to a CSV file.
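The `index=False` argument to `to_csv` is what keeps the DataFrame's row index out of the file. A short sketch with hypothetical follower counts, writing to a string instead of a file to show the difference:

```python
import pandas as pd

results = [('user1', 1200), ('user3', 560)]  # hypothetical values
df = pd.DataFrame(results, columns=['username', 'follower_count'])

# With index=False, only the two data columns are written
csv_text = df.to_csv(index=False)
print(csv_text.splitlines())  # ['username,follower_count', 'user1,1200', 'user3,560']

# Without it, pandas prepends an unnamed index column
print(df.to_csv().splitlines()[0])  # ',username,follower_count'
```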