Using Python to fit the power-function coefficients of a retention curve and compute lifetime (LT)


Background reading, "LT estimation model for apps and LTV/ROI calculation": https://blog.csdn.net/ISIS7Protessional/article/details/114410214

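Input data: date, channel, retention day index (x_value), retention rate on that day (y_value, read into the column ret_rate by the script). [To fit the retention curve of the app as a whole, drop the channel field.]

Output data: date, channel, a, b, lt (the script writes lt180, lt270 and lt360). [Same as above.]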
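As an illustration of the input layout, the sketch below builds a tiny tab-separated sample with the four columns the script reads. There is no header row, because `read_data` supplies the column names itself. The date, channel name and retention rates are made-up values for demonstration only.

```python
import io

import pandas as pd

# Hypothetical sample rows: date, channel, day index, retention rate.
sample_tsv = (
    "2021-03-01\tchannel_a\t1\t0.40\n"
    "2021-03-01\tchannel_a\t2\t0.30\n"
    "2021-03-01\tchannel_a\t7\t0.18\n"
)

# Same column names and separator that read_data passes to pd.read_csv.
columns = ['minmax_timestamp_f', 'u_dic_media', 'x_value', 'ret_rate']
df = pd.read_csv(io.StringIO(sample_tsv), names=columns, sep='\t')
print(df)
```

Each (date, channel) pair needs at least two distinct day indexes, otherwise the slope of the log-log fit is undefined.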

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import pandas as pd
from scipy.optimize import curve_fit
import numpy as np
import math
import datetime


def lnhandle(x):
    return math.log(x)


def attenuation_coefficient2(fields_avg, result_average):
    # Least-squares line through (ln x, ln y): ln y = ln a + b * ln x
    x = np.array(list(map(lnhandle, fields_avg)))
    y = np.array(list(map(lnhandle, result_average)))
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    m1 = 0
    m2 = 0
    for xi, yi in zip(x, y):
        m1 += (xi - x_mean) * (yi - y_mean)
        m2 += (xi - x_mean) ** 2
    b = m1 / m2
    a = math.exp(y_mean - b * x_mean)
    return a, b


def read_data(filelink):
    rawData = pd.read_csv(filelink, names=['minmax_timestamp_f', 'u_dic_media', 'x_value', 'ret_rate'], sep='\t')
    return rawData


def zero_handle(x):
    # Replace a zero retention rate with a small value so that log() is defined
    v = 0.00001 if x == 0 else x
    return v


def handle_time(x):
    return datetime.datetime.strptime(x, '%Y-%m-%d').date()


def append_predict(x):
    columns = ['minmax_timestamp_f', 'u_dic_media', 'a', 'b', 'lt180', 'lt270', 'lt360']
    # Positional access relies on the grouping columns being present in each group
    minmax_timestamp_f = x.iloc[0, 0]
    u_dic_media = x.iloc[0, 1]
    a = x.iloc[0, 4]
    b = x.iloc[0, 5]
    lt180 = (((180) ** (1 - b)) * a - a) / (1 - b) + 1
    lt270 = (((270) ** (1 - b)) * a - a) / (1 - b) + 1
    lt360 = (((360) ** (1 - b)) * a - a) / (1 - b) + 1
    temp = {
        "minmax_timestamp_f": minmax_timestamp_f,
        "u_dic_media": u_dic_media,
        "a": a,
        "b": b,
        "lt180": lt180,
        "lt270": lt270,
        "lt360": lt360,
    }
    # Keep the group only if retention decays (b > 0).
    # DataFrame.append was removed in pandas 2.0, so build the frame directly.
    rows = [temp] if b > 0 else []
    return pd.DataFrame(rows, columns=columns)


def xy_data(x):
    x = x.sort_values(by='x_value')
    x_value = x['x_value'].values.tolist()
    y_value = x['ret_rate'].map(zero_handle).values.tolist()
    a, b = attenuation_coefficient2(x_value, y_value)
    x['a'] = a
    x['b'] = -b  # store the exponent as a positive decay coefficient
    x = append_predict(x)
    return x


def group_data(rawData):
    rawData = rawData.groupby(['minmax_timestamp_f', 'u_dic_media']).apply(xy_data)
    rawData = rawData[~(rawData['u_dic_media'].isnull())]
    return rawData


if __name__ == '__main__':
    # Read the input and output file paths from the command line
    filelink = sys.argv[1]
    print("filelink:", filelink)
    loadfile = sys.argv[2]
    print("loadfile:", loadfile)
    # rawData = read_data('x.txt')
    rawData = read_data(filelink)
    rawData = group_data(rawData)
    # Write the combined result to a text file
    # rawData.to_csv('y.txt', index=False, header=False, sep=',', float_format="%.3f")
    rawData.to_csv(loadfile, index=False, header=False, sep=',', float_format="%.3f")
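The lt180, lt270 and lt360 expressions in append_predict have the form of an integral of the fitted curve. The derivation below is a reconstruction under that assumption, not something the post states: $R(x) = a\,x^{-b}$ is the fitted retention on day $x$ (with $b$ the positive value stored by xy_data), and the constant 1 is taken to stand for the install day.

```latex
\mathrm{LT}(N) \;=\; 1 + \int_{1}^{N} a\,x^{-b}\,dx
             \;=\; 1 + \frac{a\,N^{1-b} - a}{1-b},
\qquad b \neq 1
```

For $b = 1$ the integral becomes $a \ln N$ instead, a case the closed form in the script does not cover.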


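Supplement: besides the method above (a linear fit on ln x and ln y, with the coefficients solved by least squares), the coefficients a and b can also be solved directly by non-linear least squares. The former gives the same result as Excel (see "EXCEL公式获取幂函数系数解析", a walkthrough of obtaining power-function coefficients with Excel formulas) and a larger R^2. The non-linear version is as follows: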

from scipy.optimize import curve_fit


def fit_func(x, a, b):
    return a * pow(x, b)


def attenuation_coefficient(fields_avg, result_average):
    x = fields_avg
    y = result_average
    # Non-linear least-squares fit
    popt, pcov = curve_fit(fit_func, x, y)
    # popt holds the fitted coefficients
    a = popt[0]
    b = popt[1]
    # yvals = fit_func(x, a, b)  # fitted y values
    return a, b
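To see the two approaches side by side, here is a minimal sketch on a synthetic, noise-free curve. The data points, the helper name `power_curve` and the starting guess `p0` are assumptions made for this illustration; `np.polyfit` stands in for the hand-written least-squares loop and should give the same line.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_curve(x, a, b):
    # Same model as fit_func: y = a * x**b
    return a * np.power(x, b)


# Made-up retention curve that follows y = 0.5 * x**-0.4 exactly.
days = np.array([1, 2, 3, 7, 14, 30], dtype=float)
rates = 0.5 * days ** -0.4

# Method 1: ordinary least squares on (ln x, ln y).
slope, intercept = np.polyfit(np.log(days), np.log(rates), 1)
a_log, b_log = np.exp(intercept), slope

# Method 2: non-linear least squares on the original scale.
(a_nl, b_nl), _ = curve_fit(power_curve, days, rates, p0=(1.0, -0.5))

print("log-log fit:   a=%.4f b=%.4f" % (a_log, b_log))
print("curve_fit:     a=%.4f b=%.4f" % (a_nl, b_nl))
```

On exact power-law data both methods recover the same coefficients. On noisy data they generally differ, because the log-log fit minimizes the error in ln y while curve_fit minimizes the error in y itself.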

