Predicting the success of entrepreneurial campaigns in crowdfunding: a spatio-temporal approach

Woods, Clinton; Yu, Han; Huang, Hong

doi:10.1186/s13731-020-00122-8

Research
Open access
Published: 31 July 2020


Journal of Innovation and Entrepreneurship volume 9, Article number: 13 (2020)


Abstract

As an alternative to traditional venture capital investment, crowdfunding has emerged as a novel method and potentially disruptive innovation for financing a variety of new entrepreneurial ventures without standard financial intermediaries. It is still unknown to scholars and to users of crowdfunding services whether crowdfunding efforts reinforce or contradict existing theories about the dynamics of successful entrepreneurial financing, as well as how crowdfunding mechanisms are generally distributed and used. This paper presents new results from dynamical spatio-temporal modeling of Kickstarter campaign data covering over ninety-nine thousand projects and about 1 billion USD in pledges from 2009 to 2017. The funding level, i.e., the percentage of a project’s goal actually raised from online communities, is the outcome of interest and is linked in the model to the dollars pledged and the backer count, which reflect signals of underlying project quality. The results support a dynamic impact of a Kickstarter campaign’s geographic location on its success, as well as associations between observed project traits and the success of the entrepreneurial effort in the presence of unmeasured spatio-temporal confounding. These results offer further insight into the empirical dynamics of online entrepreneurial financing, an emerging phenomenon, and into the role the spatio-temporal component plays both in the types of projects proposed and in the association between sociocultural traits of successful fundraising and underlying project quality.

Introduction

Financing is one of the most critical resources a new venture needs to succeed. Financing promotes creative ideas and stimulates entrepreneurs to gather resources, hire workers, and transform resources into goods and services for society’s consumption (Frank, 1998). Crowdfunding and its concepts have been around for a few hundred years as a way to raise funds for ventures that many people want or need. The idea behind crowdfunding is to gain support from a relatively large number of small investors in order to fund a large project, generating the capital needed to start or maintain a venture without requiring the backing of wealthy donors. Crowdfunding is mostly viewed from the entrepreneurial perspective, as a source of the startup capital needed to found a new business. Although this is a common reason for crowdfunding, there have been other historical uses, such as war bonds to help fund a nation’s military effort and the campaign to fund construction of the base of the Statue of Liberty in New York. Overall, crowdfunding’s greatest strengths may be its ability to help people with an entrepreneurial spirit become business owners by overcoming the barrier that stops many of them, a lack of available capital, and its ability to transform ordinary customers into business investors (Ordanini, Miceli, Pizzetti, & Parasuraman, 2011; Belleflamme, Lambert, & Schwienbacher, 2014; Mollick, 2014).

With the widespread use of the internet in recent years, crowdfunding has emerged as a novel method and potentially disruptive innovation for financing a variety of new entrepreneurial ventures without standard financial intermediaries. The internet gives people the connections they need to find investors who are willing to fund entrepreneurial efforts. Three large US-based companies, Kickstarter, Indiegogo, and GoFundMe, provide platforms that allow people to market their ideas to the world and process billions of dollars in campaigns each year. Websites such as Kickstarter [https://www.kickstarter.com/] offer easy-to-follow user interfaces and effective templates for presenting ideas, which help their users attract potential donors. The ideas presented in entrepreneurial crowdfunding range from art and music to technology and food, which demonstrates crowdfunding’s power to open up the business world to people with any set of skills (Schwienbacher & Larralde, 2010). Crowdfunding trends in products and ideas vary by location, with much of the activity taking place in the United States, Europe, and Australia. Crowdfunding has gained popularity over time, with an ever-increasing number of campaigns and investors participating; it was relatively popular from the start and has rapidly grown in prominence since then.

Although crowdfunding has become an alternative to traditional venture capital investment, it is still unknown to scholars and to users of crowdfunding services what makes a drive to obtain investors truly successful, and whether crowdfunding efforts reinforce or contradict existing theories about the dynamics of successful entrepreneurial financing and about the general distribution and use of crowdfunding mechanisms (Agrawal, Catalini, & Goldfarb, 2010; Burtch, Ghose, & Wattal, 2011; Mollick, 2014). One of the most difficult parts of entrepreneurship research is dealing with sociocultural facets of an elusive nature, such as preparedness, creativity, perseverance, and the capability of transforming old values into more appropriate ones as entrepreneurial life begins (Vuong, 2016). Because of the complexity arising from a diverse range of entrepreneurial goals and approaches (Schwienbacher & Larralde, 2010), a taxonomy of causes and effects in the dynamics of the entrepreneurship process would rarely be complete and effective, especially when sociocultural and spatio-temporal factors are considered at a large scale.

As crowdfunding becomes an increasingly popular form of alternative financing, many researchers have explored various methods to understand the dynamics behind it. Mollick (2014) took a holistic view and proposed that personal networks and underlying project quality, as well as geography, were the most important factors in determining the success of crowdfunding. Mitra and Gilbert (2014) took a different approach and focused on the language used in crowdfunding. By analyzing a large corpus of text from 45,000 projects, they found that phrases following certain principles, such as reciprocity, scarcity, and social identity, increased the chance of success. Vuong (2016) selected a group of factors seen as critical to understanding entrepreneurial efforts, based on the extant entrepreneurship literature, and found evidence supporting a relationship between sociocultural traits and entrepreneurship-related performance or traits after adjusting for geographical location.

Pinpointing the causes of the changes and trends that appear to happen randomly at any given time or location in an ever-changing environment eludes observers of entrepreneurial activity. If one could infer the causes of a successful crowdfunding effort at a given location and time, this could deepen the understanding of such ventures and help future entrepreneurs make the right plans when starting a campaign. Identifying these causes could also help predict the outcome of a given campaign, or of many campaigns, and perhaps reveal a trend in its early stages. Spatio-temporal variability is key to understanding all of this.

The dependence structure of the spatio-temporal component appears to be growing visibly more complex. Changing trends can be observed across times and locations: in many locations, the types of ideas that are most successful have changed over time, while other locations seem to be discovering online crowdfunding for the first time. Understanding the dynamics of crowdfunding across many times and locations can help entrepreneurs and small enterprises raise the essential capital from the crowd to support their projects or businesses. This paper is the first to explore the spatio-temporal pattern of campaign success in crowdfunding using a spatio-temporal statistical approach, for additional insight into the dynamics of crowdfunding.

The rest of the paper is organized as follows. We begin with the “Methods” section. Specifically, the “Data” section introduces the data, followed by the descriptive analytics in the “Descriptive patterns” section, which provide insight into what has happened in the past. The spatio-temporal model is then developed in the “Spatio-temporal model” section for an in-depth analysis of the dynamics of successful crowdfunding campaigns. After that, the results are presented in the “Results” section. Finally, the discussion and conclusions are given in the “Discussion” and “Conclusion” sections.

Methods

Data

In its original scraped form, the data consist of 104 CSV data sets spanning 2009 to 2017. When merged, the data sets contain 36 million observations, many of them duplicates, because the web scraping took place each month and projects run from inception to deadline over several months, so the same project appears in several monthly scrapes. Some duplicates may also have come from projects that were restated. The campaign IDs of the observations were used to remove the duplicates. Many values were also missing, so variables with too many missing observations were removed from the overall data set for this study. The result is a data set of 99,036 observations totaling 1,064,392,179 USD in pledges. The data include campaigns from all over the world, most of them originating in North America, chiefly the United States. This data set should be a good representation of Kickstarter campaigns, and possibly of any online crowdfunding platform that helps entrepreneurs raise startup capital.
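The deduplication and variable-screening steps above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the field names `id` and `scraped_at`, the rule of keeping the most recent monthly snapshot of each campaign, and the 50% missingness cutoff are our own assumptions, since the text does not say which snapshot was kept or what counted as "too incomplete".

```python
from typing import Any

Row = dict[str, Any]


def deduplicate_snapshots(rows: list[Row]) -> list[Row]:
    """Keep one record per campaign ID (assumed field "id").

    Assumption: when a campaign appears in several monthly scrapes, the
    snapshot with the latest "scraped_at" value (ISO "YYYY-MM" string) is kept.
    """
    latest: dict[Any, Row] = {}
    for row in rows:
        cid = row["id"]
        if cid not in latest or row["scraped_at"] > latest[cid]["scraped_at"]:
            latest[cid] = row
    return list(latest.values())


def drop_sparse_columns(rows: list[Row], max_missing: float = 0.5) -> list[Row]:
    """Drop variables whose share of missing (None or absent) values exceeds max_missing.

    The 0.5 default is a hypothetical cutoff, not taken from the paper.
    """
    if not rows:
        return rows
    columns = {key for row in rows for key in row}
    keep = {
        col
        for col in columns
        if sum(row.get(col) is None for row in rows) / len(rows) <= max_missing
    }
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


# Three monthly snapshots of two hypothetical campaigns.
scrapes = [
    {"id": 1, "scraped_at": "2016-01", "pledged": 100, "location_detail": None},
    {"id": 1, "scraped_at": "2016-02", "pledged": 250, "location_detail": None},
    {"id": 2, "scraped_at": "2016-02", "pledged": 0, "location_detail": None},
]
clean = drop_sparse_columns(deduplicate_snapshots(scrapes))
print(sorted((r["id"], r["pledged"]) for r in clean))
```

Keeping the latest snapshot is one reasonable reading of "remove the duplicates", because it reflects a campaign's final state after its deadline.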

Our statistical goals are to visualize, summarize, and infer the dynamical behavior of crowdfunding. The complex spatio-temporal data are a window onto the complex dynamics underlying crowdfunding, but extracting and describing that information is challenging. For an analytical understanding of crowdfunding, descriptive analytics (summarization and visualization) are used together with spatio-temporal modeling, because they suggest relationships that can be incorporated into spatio-temporal models for inference. The development of exploratory and diagnostic methods for spatio-temporal data is an important research topic and will remain so in the future.

Descriptive patterns

Before any statistical modeling of the dynamics underlying this complex data set, we describe how the data vary across locations and over time, together with the major variables that have appeared in the literature, beginning with worldwide campaigns. The goal of this part is to develop initial evidence about the nature of crowdfunding, an approach suited to a topic that is still evolving within the evolving field of entrepreneurship. Figure 1 shows a general map of the geographical distribution of Kickstarter campaign locations. English-speaking countries tend to use the Kickstarter platform at much higher rates than other nations. Table 1 shows which nations use this platform the most. The US accounts for a majority of the campaigns, while many other nations contribute just enough observations to offer some understanding of how successful Kickstarter campaigns arise across countries.

Table 1 Descriptive statistics of international distributions


The first bar plot, in Fig. 2, shows the number of campaigns in each status (canceled, failed, live, successful, and suspended) across the US cities observed, with the median pledge in USD superimposed. Some cities appear to be more popular than others for launching a campaign, and success varies across cities. A chi-squared test of the null hypothesis that status and city are independent gives χ² = 19,590 on 12,840 degrees of freedom (P < 0.0001), indicating that status is related to location. The influence of geographic location will therefore be considered for incorporation into the model for statistical inference. The second bar plot, in Fig. 3, shows campaign status (canceled, failed, live, successful, and suspended) by category, with the median pledge in USD shown as a line. Popularity varies noticeably across categories, and different categories also show different levels of success. A chi-squared test of the null hypothesis that status and category are independent gives χ² = 22,253 on 56 degrees of freedom (P < 0.0001), indicating that the status of a campaign is associated with its category. In both bar plots, some categories and geographic locations appear much more successful than their counterparts. Looking at the median pledges of live campaigns per city, some cities show more pledging activity at present than their historical popularity would suggest. A chi-squared test of the null hypothesis that category is independent of US state gives χ² = 7043 on 686 degrees of freedom (P < 0.00001), indicating that the category of a campaign depends on the US state in which it is located. This suggests that trends may be developing for both the cities and the categories of Kickstarter campaigns.
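The chi-squared tests of independence reported above compare the observed counts in a contingency table (for example, status by city) with the counts expected if the two variables were independent. The Python sketch below computes the Pearson statistic and its degrees of freedom, (r − 1)(c − 1), from a table of counts; the toy table is hypothetical. The degrees of freedom also offer a quick consistency check: the 56 degrees of freedom reported for status by category are consistent with 5 statuses and 15 categories, since (5 − 1)(15 − 1) = 56. A P value additionally requires the chi-squared distribution function, available for example through `scipy.stats.chi2_contingency`, which is not reproduced here.

```python
def chi_squared_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two cities, columns are (failed, successful).
stat, dof = chi_squared_independence([[10, 20], [30, 40]])
print(round(stat, 4), dof)
```

Every row and column must have a positive total; otherwise an expected count is zero and the statistic is undefined.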

The histogram in Fig. 4 shows campaign goals in red and the amounts actually pledged by donors in blue. Goals tend to ask for more money than campaigns actually raise, and goals lie in a narrower range than the more variable pledged amounts. Overall, campaign creators seem to set goals much higher than what they will actually receive. Many campaigns in the red zone of the histogram may fail, while campaigns in the purple zones may represent successes. Because both variables are approximately normal after a log transformation, a one-sided t test was applied: testing the null hypothesis that the goal in USD does not exceed the pledged USD actually obtained gives P < 0.0001, supporting the conclusion that goals exceed pledges.
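The t test above can be sketched as follows. The text does not say whether the test was paired or two-sample, nor how campaigns with zero pledges (whose logarithm is undefined) were handled, so this Python sketch assumes a paired test on `log1p`-transformed values; both choices are ours, and the amounts are hypothetical. A positive statistic means that goals tend to exceed pledges; the one-sided P value would come from a t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel(..., alternative="greater")`), which is not reproduced here.

```python
import math
import statistics


def paired_log_t(goals: list[float], pledged: list[float]) -> float:
    """Paired t statistic for mean(log1p(goal) - log1p(pledged)).

    Assumptions (ours): campaigns are paired with themselves, and log1p is
    used so that zero pledges stay finite.
    """
    if len(goals) != len(pledged) or len(goals) < 2:
        raise ValueError("need two equal-length samples with at least 2 campaigns")
    diffs = [math.log1p(g) - math.log1p(p) for g, p in zip(goals, pledged)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


# Hypothetical goals and pledges (USD) for four campaigns.
goals = [1000, 5000, 20000, 800]
pledged = [200, 5100, 3000, 0]
t_stat = paired_log_t(goals, pledged)
print(f"t = {t_stat:.2f} on {len(goals) - 1} degrees of freedom")
```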

The histogram in Fig. 5 displays the percentage of its goal that each project obtained by its end, on a log scale because of the strong right skew. The distribution has two modes: one formed by the failed and live campaigns, sitting at a low percentage of the goal, and one formed by the successful campaigns. On one hand, many failed projects do not even come close to reaching their goals. On the other hand, most successful projects raised only about the minimum needed to reach their goals, while some others raised well over 100% of their goals. This raises the question of how a campaign achieves more than the goal it set. These campaigns may form a further group of extremely successful campaigns that go well beyond a simple success.

Figure 6 displays a histogram of the number of backers by status, with successes in blue and failures in brown; other statuses, such as suspended, have counts too small to be visible. Successful projects have many more backers, whereas failed projects tend to stay near the low end of the backer count. This indicates a correlation between project success and backer count: projects with larger numbers of backers are more likely to be successfully funded. Note, however, that the failed and successful campaigns overlap in backer count, with some failures having many backers and some successes having very few.

The maps in Fig. 7 show the average number of backers in each US state for each year studied, beginning with 2009 in the top left and ending with 2017 in the bottom right. The number of backers grows every year, indicating the increasing popularity of the platform as a way to raise money for a project. Overall, the mean number of backers rose from 40.62 per project in 2009 to 154 per campaign in 2016, the last year for which the data include all 12 months. Each state’s backer count grows differently from its neighbors’ as overall popularity increases, suggesting local trends in different states and years. Montana’s projects in 2017, for example, appear on average to be very popular compared with those in many other states. A chi-squared test of the null hypothesis of independence between the mean backers of states and years gives P < 0.00001. Whereas the state maps in Fig. 7 show how the number of backers varies over time, Fig. 8 displays the variability in the mean US dollar amount pledged per county in the United States; the darker the color, the higher the mean amount pledged in that county. In some areas, campaigns raise considerably more money from pledges than in other parts of the US. The map suggests that campaigns cluster in concentrations of entrepreneurship where people come up with successful ideas. As the bar plot in Fig. 2 showed, certain cities see a larger portion of Kickstarter activity, and these may also be the areas that raise the most money on average. A further set of maps, too cluttered to include in this paper, shows the mean US dollar amount pledged per county from 2009 to 2017; these maps suggest that more and more counties raise more money as time goes on.
The high-pledge states and counties all appear to lie around more populated areas, which could be attributed to population density or to other latent variables that explain why smaller areas show lower mean spending on Kickstarter. Because Kickstarter is an internet platform, one might expect the mean amount raised per project to be similar across counties, but this is not the case. The descriptive statistics for US states in Table 2 show how the states differ numerically.

Table 2 US states descriptive statistics


The bar graph in Fig. 9 shows backers as a percentage of city population, aggregated over the whole period. Normalizing by population changes how popularity appears: for example, Los Angeles may have the most backers overall only because of its large population. San Francisco and Salt Lake City seem to have had an extremely high number of backers relative to their population sizes; even Salt Lake City, with a comparatively small population, has quite a few project backers. The same pattern appears in the money each city raised over the whole period: San Francisco still seems to reign supreme as the city that has raised the most money on Kickstarter, even though Los Angeles has the most projects, counting both successes and overall volume. What truly drives smaller cities to be more successful per capita than larger ones deserves deeper analysis, to understand what, beyond a large population, makes a location successful.
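The population adjustment behind Fig. 9 amounts to dividing each city's backer count by its population. The Python sketch below, with hypothetical city names and counts (not values from the data), shows how a ranking by raw backer counts can reverse once counts are expressed as a percentage of population. It also assumes, as our own simplification, that backers are attributed to the city where the campaign is located.

```python
def backers_percent_of_population(backers: int, population: int) -> float:
    """Backers as a percentage of the city population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * backers / population


# Hypothetical (backers, population) pairs; not taken from the data.
cities = {"City A": (50_000, 4_000_000), "City B": (20_000, 200_000)}

by_raw = sorted(cities, key=lambda c: cities[c][0], reverse=True)
by_share = sorted(cities, key=lambda c: backers_percent_of_population(*cities[c]),
                  reverse=True)
print(by_raw, by_share)
```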

Figure 10 is a word cloud of the most common words in the titles of successful campaigns; the larger the word, the more common its usage. Words found equally often in successful and unsuccessful campaigns were removed to focus on words that appear more often in successful campaigns. The top words include “record,” “debut,” and “tour,” which relate to music; as shown in Fig. 3, music is the most common category and has the most overall success. Many other words also appear in the more successful campaigns. Blurbs, the short paragraphs describing Kickstarter campaigns, are also included in the data sets, and the blurbs of successful campaigns often contain similar words. Blurbs may therefore help refine what makes a campaign successful beyond what the numerical data alone can show.

The blurbs and titles consist of sentences unique to the campaigns they are attached to. A factor was therefore created from attributes hidden in the blurbs and titles. For the first weighting of the factor, the main data set was split into two subsets: all failed campaigns and all successful campaigns. The tm, SnowballC, and wordcloud packages in R were used to prepare and separate the words used in blurbs and titles according to whether a campaign failed or succeeded. After this preparation, the words appearing most often in successful and in failed campaigns were counted, and the lists were trimmed to the 500 most common words for each group. The last step kept only the top words that appeared uniquely in successful campaigns, removing any word that appeared in both top-500 lists. For this weighting, the top 50 remaining unique words were extracted for blurbs and for titles separately.
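The word-selection step can be sketched in Python as below. This is only an analogue of the R workflow (tm, SnowballC, wordcloud): the tokenizer and the tiny stop-word list are our own simplifications, no stemming is applied, and the toy titles are hypothetical. The `pool` and `keep` parameters correspond to the top-500 and top-50 cutoffs described above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the R tm package ships a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "my", "our"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def unique_success_words(success_texts: list[str], failed_texts: list[str],
                         pool: int = 500, keep: int = 50) -> list[str]:
    """Top words of successful texts that are absent from the failed top-`pool` list."""
    success_counts = Counter(w for t in success_texts for w in tokenize(t))
    failed_counts = Counter(w for t in failed_texts for w in tokenize(t))
    top_success = [w for w, _ in success_counts.most_common(pool)]
    top_failed = {w for w, _ in failed_counts.most_common(pool)}
    # Drop any word that is in both top lists, then keep the first `keep` words.
    return [w for w in top_success if w not in top_failed][:keep]


# Hypothetical titles.
successful_titles = ["Debut album and tour", "The new album debut"]
failed_titles = ["A new app", "My new album"]
print(unique_success_words(successful_titles, failed_titles, pool=4, keep=2))
```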

To obtain the other weighting for the factor, a new subset of the data was created, consisting only of successful campaigns at or above the 90th percentile in number of backers or in dollars earned. The tm, SnowballC, and wordcloud packages in R were again used to prepare and separate the words used in blurbs and titles. Words associated with extremely successful campaigns were often not unique, suggesting similar marketing strategies. The most common words in this new subset were counted, and the top 50 for blurbs and for titles were extracted for weighting. This yielded two word sets for blurbs and two for titles, four new sets of words in total. For each set, the grepl function in base R was applied; grepl searches each string for a pattern of words or characters and returns TRUE or FALSE. This produced four new columns indicating whether a title or blurb contained one of the words in the corresponding set. The TRUE and FALSE values were recoded as 1 and 0 and used as weights for summation. Summing the four columns gave a new factor, named Twords, with levels 0, 1, 2, 3, and 4, as shown in Table 3.
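The construction of Twords can be sketched in Python as follows. Here `contains_any` plays the role of R's `grepl` with an alternation pattern; unlike a plain `grepl` call, it lowercases the text and matches whole words only, which is our choice. The set names and most of the words are hypothetical placeholders: only "debut", "tour", and "record" come from the word cloud in Fig. 10, and the real sets hold 50 words each.

```python
import re


def contains_any(text: str, words: set[str]) -> bool:
    """True if any word in `words` occurs in `text` (case-insensitive, whole words)."""
    if not words:
        return False
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in sorted(words)) + r")\b"
    return re.search(pattern, text.lower()) is not None


def twords(title: str, blurb: str, word_sets: dict[str, set[str]]) -> int:
    """Sum of four 0/1 indicators, giving a level between 0 and 4."""
    indicators = [
        contains_any(title, word_sets["title_unique_success"]),
        contains_any(blurb, word_sets["blurb_unique_success"]),
        contains_any(title, word_sets["title_top_decile"]),
        contains_any(blurb, word_sets["blurb_top_decile"]),
    ]
    return sum(int(flag) for flag in indicators)


# Hypothetical word sets (the real ones contain 50 words each).
word_sets = {
    "title_unique_success": {"debut", "tour", "record"},
    "blurb_unique_success": {"album", "vinyl"},
    "title_top_decile": {"game", "board"},
    "blurb_top_decile": {"stretch", "goals"},
}
print(twords("Debut record and tour", "Help us press our album on vinyl", word_sets))
```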

Table 3 Statistics of the new Twords variable


Beyond the major variables from the literature examined here, there may be many more relevant variables that need to be identified and screened to predict a project's behavior. Semiparametric structured modeling is natural for complex data containing variables of different types and formats, as well as high-dimensional technical variables. The spatio-temporal data include strings of words, such as the title and the text written about the campaign to entice donors to give the project a shot, along with many covariates relevant to the possible trends and evolving behavior of crowdfunding projects.

Spatio-temporal model

In the recent literature, standard machine learning algorithms have been the only major methods applied to study the elements of a successful crowdfunding campaign. Most standard machine learning algorithms assume that the observed values are independent realizations of the same random variable, replicated from the same model. The First Law of Geography states: “Everything is related to everything else, but near things [in space and time] are more related than distant things” (Tobler, 1970). When observed values are anchored in space and time, the assumption of independence is no longer realistic. The dependence structure that crowdfunding campaigns inherit from space and time is key to understanding the dynamics of crowdfunding and should be considered in the modeling.

To mitigate unobserved spatial confounding, evaluate the impact of a Kickstarter campaign's geographic location, and further understand the dynamics of online entrepreneurial crowdfunding efforts (Mollick, 2014; Vuong, 2016), we employ a hierarchical dynamical spatio-temporal framework (Cressie & Wikle, 2011) to develop a hurdle model for the Kickstarter data. The dynamical hierarchical hurdle model describes outcomes that are more strongly correlated when close in space and time than when collected further apart. The hurdle model is used to predict the funding level, the percentage of a project’s goal actually raised from online communities, from the dollars pledged and the backer count of a crowdfunding effort, which reflect signals of underlying project quality, together with the spatio-temporal component. The predicted funding level can readily be converted into an outcome of success or failure.

The data can be regarded as a realization of a dynamical process for the funding level, indexed by geographical locations and time points in a study region D (the USA in this paper) that resides in the product of the two-dimensional space R² (US states) and the one-dimensional time line R (years):

$$ Y\left(s,t\right)\equiv \left\{y\left(s,t\right),\left(s,t\right)\in D\subset {R}^2\times R\right\}, $$

where s is the geographical location in the study region D of the United States and t is the year from 2009 to 2017. The spatial component of the data was modeled at the level of US states.

Spatio-temporal components

The predictor η_ist for the spatio-temporal components of a Kickstarter project is represented as follows,

$$ {\eta}_{\mathrm{ist}}={b}_0+{b}_1{x}_i+{u}_s+{v}_s+{\gamma}_t+{\phi}_t+{\delta}_{st}, $$

(1)

where b₀ is the intercept, b₁ is the vector of linear fixed effects of the vector of observed covariates x_i, u_s is the spatially structured effect, v_s is the spatially unstructured effect, γ_t is the structured temporal process, ϕ_t is the independent temporal effect, and δ_st is the spatio-temporal interaction.
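To make the additive structure of equation (1) concrete, the Python sketch below assembles η_ist for one campaign from its components. All effect values, the state code, and the year are hypothetical numbers chosen for illustration; in the paper these effects are estimated, not fixed. Because logit(π₀) = η_ist in the hurdle part of the model, the sketch also maps η_ist to a probability with the inverse logit.

```python
import math
from dataclasses import dataclass


@dataclass
class Effects:
    """Components of the linear predictor in equation (1)."""
    b0: float                             # intercept
    b1: list[float]                       # fixed effects for covariates x_i
    u: dict[str, float]                   # structured spatial effect u_s
    v: dict[str, float]                   # unstructured spatial effect v_s
    gamma: dict[int, float]               # structured temporal effect (RW2)
    phi: dict[int, float]                 # unstructured temporal effect
    delta: dict[tuple[str, int], float]   # spatio-temporal interaction


def linear_predictor(e: Effects, x: list[float], s: str, t: int) -> float:
    """eta_ist = b0 + b1 . x_i + u_s + v_s + gamma_t + phi_t + delta_st."""
    if len(x) != len(e.b1):
        raise ValueError("covariate vector and coefficients differ in length")
    fixed = e.b0 + sum(b * xi for b, xi in zip(e.b1, x))
    return fixed + e.u[s] + e.v[s] + e.gamma[t] + e.phi[t] + e.delta[(s, t)]


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical effect values for one state ("MT") and one year (2017).
effects = Effects(b0=-0.5, b1=[0.25, 1.0], u={"MT": 0.25}, v={"MT": -0.125},
                  gamma={2017: 0.5}, phi={2017: 0.0}, delta={("MT", 2017): 0.125})
eta = linear_predictor(effects, x=[2.0, 0.5], s="MT", t=2017)
print(eta, inv_logit(eta))
```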

The structured spatial effect u_s follows the intrinsic conditional autoregressive (ICAR) model:

$$ {u}_s\mid {u}_{s^{\prime }},{s}^{\prime}\ne s,{\tau}_u\sim N\left(\frac{1}{n_s}\sum \limits_{s^{\prime}\sim s}{u}_{s^{\prime }},\frac{1}{n_s{\tau}_u}\right), $$

(2)

where n_s is the number of neighbors of state s, s ∼ s' indicates that states s and s' are neighbors, and the precision parameter τ_u is reparameterized as θ₁ = ln(τ_u), with the prior defined on θ₁. The term v_s is the spatially unstructured effect.
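Equation (2) says that, given its neighbors, each state's structured effect is centered on the average of the neighboring effects, with a variance that shrinks as the number of neighbors grows. The Python sketch below evaluates that conditional mean and variance for a hypothetical three-area map; the area labels, effect values, and τ_u are illustrative assumptions. An area with no neighbors has no conditional distribution under (2), so the sketch raises an error for it.

```python
def icar_conditional(s: str, u: dict[str, float],
                     neighbors: dict[str, list[str]],
                     tau_u: float) -> tuple[float, float]:
    """Conditional mean and variance of u_s given its neighbors, as in equation (2)."""
    nbrs = neighbors[s]
    n_s = len(nbrs)
    if n_s == 0:
        raise ValueError(f"area {s!r} has no neighbors")
    mean = sum(u[t] for t in nbrs) / n_s   # average of neighboring effects
    variance = 1.0 / (n_s * tau_u)         # precision is n_s * tau_u
    return mean, variance


# Hypothetical map: A borders B and C; B and C border only A.
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
u = {"A": 1.0, "B": 3.0, "C": -1.0}
print(icar_conditional("A", u, neighbors, tau_u=4.0))
```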

The structured temporal effect γ_t is represented as a random walk process of order two (RW2). The RW2 for the Gaussian vector γ = (γ₁, …, γ_T) is constructed by assuming independent second-order increments:

$$ {\gamma}_t\mid {\gamma}_{t-1},{\gamma}_{t-2}\sim N\left(2{\gamma}_{t-1}-{\gamma}_{t-2},{\tau}_{\gamma}^{-1}\right). $$

(3)

The precision parameter τ_γ is reparameterized as $ {\tau}_{\gamma }={e}^{\theta_2} $, with a prior on θ₂. The unstructured temporal component ϕ_t is represented with an exchangeable model, which simply defines ϕ = (ϕ₁, …, ϕ_T) to be a vector of independent Gaussian random variables with mean zero and precision τ_ϕ, i.e.,

$$ {\phi}_t\sim N\left(0,\frac{1}{s_i{\tau}_{\phi }}\right), $$

(4)

where s_i > 0 is a scalar.
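Equations (3) and (4) can be read generatively: each structured temporal effect extrapolates the previous two linearly and adds Gaussian noise with precision τ_γ, while the unstructured effect adds independent noise with precision s·τ_ϕ. The Python sketch below simulates both; the starting values, the nine time points, the precisions, and the scale s are hypothetical, and the draws only illustrate the prior structure, not fitted effects. Passing an infinite precision removes the noise, which makes the linear-extrapolation behavior of the RW2 easy to see.

```python
import math
import random


def simulate_rw2(gamma1: float, gamma2: float, T: int, tau_gamma: float,
                 rng: random.Random) -> list[float]:
    """gamma_t | gamma_{t-1}, gamma_{t-2} ~ N(2 gamma_{t-1} - gamma_{t-2}, 1/tau_gamma)."""
    sd = 1.0 / math.sqrt(tau_gamma)   # 0.0 when tau_gamma is infinite
    path = [gamma1, gamma2]
    while len(path) < T:
        path.append(2 * path[-1] - path[-2] + rng.gauss(0.0, sd))
    return path


def simulate_iid(T: int, tau_phi: float, scale: float,
                 rng: random.Random) -> list[float]:
    """phi_t ~ N(0, 1/(scale * tau_phi)), independently for each t."""
    sd = 1.0 / math.sqrt(scale * tau_phi)
    return [rng.gauss(0.0, sd) for _ in range(T)]


rng = random.Random(2017)
gamma = simulate_rw2(0.0, 1.0, T=9, tau_gamma=10.0, rng=rng)   # e.g. 2009..2017
phi = simulate_iid(T=9, tau_phi=100.0, scale=1.0, rng=rng)
temporal_effect = [g + p for g, p in zip(gamma, phi)]
print([round(x, 2) for x in temporal_effect])
```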

The term δ_st is the spatio-temporal interaction between the spatially and temporally structured effects u_s and γ_t, called a type IV interaction by Blangiardo and Cameletti (2015). With n areas (states), it can be represented for an RW2 by a structure matrix R_δ of rank (n − 1)(T − 2), written as the Kronecker product R_δ = R_u ⊗ R_γ. This specification assumes that the temporal dependency structure of each area is not independent of the other areas: areas also depend on the temporal patterns of their neighbors. We consider this type of interaction the most appropriate for these data because Kickstarter is an online platform, so the interactions are likely to be highly dependent.
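The rank statement can be checked numerically on a toy example. The Python sketch below builds the ICAR structure matrix R_u (neighbor counts on the diagonal, −1 for each pair of neighbors), the RW2 structure matrix R_γ = D₂ᵀD₂ with D₂ the second-difference matrix, and their Kronecker product, then computes exact ranks by Gaussian elimination over the rationals. The four-area chain and five time points are hypothetical. For a connected neighborhood graph R_u has rank n − 1 and R_γ has rank T − 2, and since the rank of a Kronecker product is the product of the ranks, R_δ has rank (n − 1)(T − 2).

```python
from fractions import Fraction

Matrix = list[list[int]]


def icar_structure(neighbors: dict[int, list[int]], n: int) -> Matrix:
    """R_u: number of neighbors on the diagonal, -1 for each neighboring pair."""
    r = [[0] * n for _ in range(n)]
    for s, nbrs in neighbors.items():
        r[s][s] = len(nbrs)
        for t in nbrs:
            r[s][t] = -1
    return r


def rw2_structure(T: int) -> Matrix:
    """R_gamma = D2^T D2, where D2 is the (T-2) x T second-difference matrix."""
    d2 = [[0] * T for _ in range(T - 2)]
    for i in range(T - 2):
        d2[i][i], d2[i][i + 1], d2[i][i + 2] = 1, -2, 1
    return [[sum(d2[k][i] * d2[k][j] for k in range(T - 2)) for j in range(T)]
            for i in range(T)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product a (x) b."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def rank(m: Matrix) -> int:
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


# Hypothetical chain of n = 4 areas (0-1-2-3) observed over T = 5 years.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
r_u = icar_structure(chain, n=4)
r_g = rw2_structure(T=5)
r_d = kron(r_u, r_g)
print(rank(r_u), rank(r_g), rank(r_d))   # prints 3 3 9
```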

Two linked hurdle mediation models

Of the 99,036 campaigns, 11,729 never received any backers at all and therefore have zero USD pledged. Two typical models can account for this potentially high occurrence of zeros among failed campaigns (Hu et al. 2011). The first is the zero-inflated model, which assumes that every project has some chance of obtaining a zero, so that zeros have two different origins: structural and sampling. In such a model, sampling zeros occur by chance, while structural zeros arise from some specific structure in the data. The second is the hurdle model, with a latent factor Z_i that takes the value 1 for an observation that passes the hurdle and is a positive count, and 0 for an observation that does not pass the hurdle and is a zero count. The hurdle model assumes that all zero observations come from a single structural source. Because a considerable number of failed campaigns obtained zero backers, it is plausible to treat a campaign with zero backers as a complete failure, an anomaly distinct from the other failures that gained at least one backer. In other words, the zero counts most likely come from one structural source and need to be studied separately, while the positive backer counts have a sampling origin, which motivates choosing the hurdle model over the zero-inflated model. For the hurdle model, the binary latent variable Z_i represents the origin of the data, with 1 representing a positive count and 0 a zero count:

$$ {Z}_i=\begin{cases}0, & \text{with probability } 1-{\pi}_0\\ 1, & \text{with probability } {\pi}_0\end{cases} $$

(5)

where π₀=Pr(Z_i = 1) and logit(π₀) = η_ist. Conditional on the binary latent variable Z_i, the hurdle models for $ {X}_{\mathrm{ist}}^{(1)} $ representing backer count and $ {X}_{\mathrm{ist}}^{(2)} $ USD pledged for project i can be specified as the finite mixture models

$$ p\left({x}_{\mathrm{ist}}^{(j)}\mid {Z}_i={z}_i\right)=\begin{cases}1-{\pi}_0, & {z}_i=0\\ {\pi}_0 f\left({x}_{\mathrm{ist}}^{(j)}\right), & {z}_i=1\end{cases}\qquad j=1,2 $$

(6)

where $ f\left({x}_{\mathrm{ist}}^{(j)}\right) $ is the probability density function for the positive values of $ {X}_{\mathrm{ist}}^{(j)} $. Most campaigns have a small number of backers, but the backer count $ {X}_{\mathrm{ist}}^{(1)} $ has a very long, gradual right tail. Because of this heavy right skew, a gamma distribution with predictor η_ist is a natural choice for $ {X}_{\mathrm{ist}}^{(1)} $, i.e., $ {X}_{\mathrm{ist}}^{(1)}\sim $ Gamma(sϑ, μ_ist), where s is a fixed scaling factor. The parameter ϑ is reparameterized as $ \vartheta ={e}^{\theta_3} $, with a log-gamma prior on θ₃.

The pledged USD $ {X}_{\mathrm{ist}}^{(2)} $ is specified with a lognormal distribution, i.e., log($ {X}_{\mathrm{ist}}^{(2)} $) ~ N(η_ist, τ₁⁻¹), after a transformation that subtracts each value from the maximum of the observations, with a log-gamma prior on θ₄ in the reparameterization $ {\tau}_1={e}^{\theta_4} $.
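Putting equations (5) and (6) together with the lognormal specification, the log-likelihood contribution of one campaign's pledged amount is log(1 − π₀) when nothing was pledged, and log π₀ plus the lognormal log-density otherwise. The Python sketch below evaluates it. We pass the hurdle predictor and the lognormal mean as separate arguments for generality (the text uses η_ist for both), treat τ as a precision, and omit the max-minus-value transformation; the numbers in the example are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def hurdle_lognormal_loglik(x: float, eta_hurdle: float, mu: float, tau: float) -> float:
    """Log-likelihood of one observation under a hurdle-lognormal model.

    P(X = 0) = 1 - pi0 with logit(pi0) = eta_hurdle; for x > 0 the density is
    pi0 * f(x), where log X ~ N(mu, 1/tau) and tau is a precision.
    """
    if x < 0:
        raise ValueError("pledged amounts cannot be negative")
    pi0 = inv_logit(eta_hurdle)
    if x == 0:
        return math.log1p(-pi0)   # the zero side of the hurdle
    log_x = math.log(x)
    # Lognormal log-density: normal log-density of log x plus the Jacobian -log x.
    log_f = 0.5 * math.log(tau / (2 * math.pi)) - 0.5 * tau * (log_x - mu) ** 2 - log_x
    return math.log(pi0) + log_f


# Hypothetical campaigns: one with no pledges, one with 1,500 USD pledged.
print(hurdle_lognormal_loglik(0.0, eta_hurdle=2.0, mu=7.0, tau=0.5))
print(hurdle_lognormal_loglik(1500.0, eta_hurdle=2.0, mu=7.0, tau=0.5))
```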

Funding level model

The funding level, the percentage of a project’s goal actually raised from online communities, is used as the outcome of interest and is modeled in association with the dollars pledged and the backer count, which reflect signals of underlying project quality. The percentage of the goal raised determines whether a campaign fails or succeeds, because the Kickstarter platform deems a campaign successful only if it is funded at 100% or more. The spatio-temporal component can play an important role both in the type of projects proposed and in the sociocultural traits of successful fundraising related to underlying quality. Twords, city population, length of the campaign, length of preparation time, the category a project falls in (such as music or technology), US county, US state, and the year in which a campaign takes place are used as predictors that provide additional covariate information on each campaign. Covariates were included based on the availability of the information for each Kickstarter campaign.
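The success rule above reduces to comparing the amount pledged with the goal. A minimal sketch, with hypothetical function names:

```python
def funding_level(pledged_usd: float, goal_usd: float) -> float:
    """Funding level: percentage of the goal actually raised."""
    if goal_usd <= 0:
        raise ValueError("goal must be positive")
    return 100.0 * pledged_usd / goal_usd


def is_successful(pledged_usd: float, goal_usd: float) -> bool:
    """Kickstarter deems a campaign successful only at >= 100% of its goal.

    Comparing pledged and goal directly is equivalent to
    funding_level(...) >= 100 but avoids floating-point rounding at the boundary.
    """
    return pledged_usd >= goal_usd
```

For example, a campaign that raises 1,500 USD on a 1,000 USD goal has a funding level of 150 and counts as a success.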

The funding level Y_ist is specified as a lognormal distribution

$$ \left.\log \left({Y}_{\mathrm{ist}}\right)\right|{l}_{\mathrm{ist}},{\tau}_2\sim N\left({l}_{\mathrm{ist}},{\tau}_2^{-1}\right) $$

(7)

conditional on the linear predictor l_ist, which includes the two model components for backer count and pledged USD

$$ {l}_{\mathrm{ist}}={\beta}_0+{g}_1\left({X}_{\mathrm{ist}}^{(1)}\right)+{g}_2\left({X}_{\mathrm{ist}}^{(2)}\right)+\beta {x}_i+{u}_s+{v}_s+{\gamma}_t+{\phi}_t+{\delta}_{st}. $$

(8)

A log-Gamma prior was used for θ₅, where $ {\tau}_2={e}^{\theta_5} $.
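Equation (8) is an additive sum, so evaluating it for one campaign amounts to looking up the right effects. The sketch below assumes the fitted effects are available as dictionaries keyed by location s, year t, and the pair (s, t). It uses `math.log1p` only as a placeholder for the unspecified functions g₁ and g₂, and the function name, state code, and effect values in the example are invented for illustration.

```python
import math
from typing import Callable, Mapping, Sequence, Tuple


def linear_predictor(
    beta0: float,
    beta: Sequence[float],
    x_i: Sequence[float],
    backers_hat: float,
    pledged_hat: float,
    s: str,
    t: int,
    u: Mapping[str, float],                  # structured spatial effect u_s
    v: Mapping[str, float],                  # unstructured spatial effect v_s
    gamma: Mapping[int, float],              # structured temporal effect gamma_t
    phi: Mapping[int, float],                # unstructured temporal effect phi_t
    delta: Mapping[Tuple[str, int], float],  # space-time interaction delta_st
    g1: Callable[[float], float] = math.log1p,  # placeholder for g_1 (hypothetical)
    g2: Callable[[float], float] = math.log1p,  # placeholder for g_2 (hypothetical)
) -> float:
    """l_ist = beta0 + g1(X1) + g2(X2) + beta x_i + u_s + v_s + gamma_t + phi_t + delta_st."""
    fixed = beta0 + sum(b * x for b, x in zip(beta, x_i))
    return (fixed + g1(backers_hat) + g2(pledged_hat)
            + u[s] + v[s] + gamma[t] + phi[t] + delta[(s, t)])


# Invented example values for one campaign in a hypothetical state "CO", year 2015.
l_ist = linear_predictor(
    beta0=1.0, beta=[0.5], x_i=[2.0], backers_hat=0.0, pledged_hat=0.0,
    s="CO", t=2015,
    u={"CO": 0.1}, v={"CO": -0.1}, gamma={2015: 0.2}, phi={2015: 0.0},
    delta={("CO", 2015): 0.3},
)
```

Under the lognormal model (7), exp(l_ist) is the median of the funding level Y_ist.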

Because the hierarchical model is richly parameterized to handle large spatio-temporal data sets (the so-called “big n problem”; see Banerjee, Carlin, & Gelfand, 2004, p. 387; Lasinio et al. 2013), and because the functional form of the posterior distribution is nonstandard and unknown in practice, simulation-based MCMC is not computationally feasible for Bayesian inference here. Integrated Nested Laplace Approximation (INLA), proposed by Rue and colleagues (Rue & Held, 2005; Rue, Martino, & Chopin, 2009), was therefore employed as a deterministic algorithm for approximate fully Bayesian inference, a valid and computationally efficient alternative to the simulation-based Markov chain Monte Carlo (MCMC) method. INLA builds on the Laplace approximation, which approximates the logarithm of the integrand with a second-order Taylor expansion around the mode and then computes the integral analytically. It provides accurate results in much shorter computing time than MCMC schemes, especially for latent Gaussian models with large-scale data (Rue et al., 2017; Bivand et al., 2015; Ferkingstad et al., 2017). Bayesian hierarchical modeling with latent Gaussian processes has proven very flexible in capturing complex stochastic behavior in hierarchical structures for high-dimensional spatial and spatio-temporal data (Opitz, 2017).
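INLA nests many Laplace approximations, but the basic step can be illustrated on its own. The sketch below applies Laplace's method to a one-dimensional integral with a known answer, ∫₀^∞ xⁿe⁻ˣ dx = n!, where it reproduces Stirling's formula. It is a toy illustration of the approximation described above, not an implementation of INLA; the function name is hypothetical.

```python
import math
from typing import Callable


def laplace_log_integral(
    h: Callable[[float], float],
    dh: Callable[[float], float],
    d2h: Callable[[float], float],
    x0: float,
    tol: float = 1e-12,
    max_iter: int = 100,
) -> float:
    """Approximate log of the integral of exp(h(x)) dx by Laplace's method.

    Newton's method locates the mode x*, then h is replaced by its
    second-order Taylor expansion there, which integrates in closed form:
        integral ~= exp(h(x*)) * sqrt(2*pi / -h''(x*)).
    """
    x = x0
    for _ in range(max_iter):
        step = dh(x) / d2h(x)  # Newton step toward h'(x) = 0
        x -= step
        if abs(step) < tol:
            break
    curvature = -d2h(x)
    if curvature <= 0:
        raise ValueError("h must have a strict local maximum at the mode")
    return h(x) + 0.5 * math.log(2.0 * math.pi / curvature)


# Example: integral of x^n e^(-x) over (0, inf) equals n!, with h(x) = n*log(x) - x.
n = 20
approx = laplace_log_integral(
    h=lambda x: n * math.log(x) - x,
    dh=lambda x: n / x - 1.0,
    d2h=lambda x: -n / x**2,
    x0=10.0,
)
exact = math.lgamma(n + 1)  # log(20!)
rel_error = math.exp(approx - exact) - 1.0
```

For n = 20 the relative error is about −0.4%, and it shrinks roughly like 1/(12n) as the integrand becomes more nearly Gaussian around its mode.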

We trained the model with a cross-fitting strategy (Chernozhukov et al., 2018), an efficient form of data splitting in which the data are divided into four samples of equal size so that the model components can be trained and tested at each step of the proposed multi-stage fitting procedure. The first subset was used to train the hurdle model component. The second subset was used to test the hurdle model component and then, once the hurdle component was fitted, to build predictions of the backer count and the pledged USD. The third subset was used to test the accuracy of the predicted backer count and pledged USD and to train the funding level model on these two predictions. The fourth subset was used to evaluate predictive ability using correlation, which indicates how accurately the method identifies a successful Kickstarter campaign. All components were trained under the assumption that they are both spatially and temporally dependent.
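The four-way split can be sketched as a random partition of row indices. The seed, the function name, and the role labels below are hypothetical; the roles paraphrase the description above.

```python
import random
from typing import List

# Role of each subset in the multi-stage procedure (labels are illustrative).
ROLES = (
    "train hurdle component",
    "test hurdle component; train backer count and pledged USD components",
    "test count predictions; train funding level model",
    "final evaluation",
)


def four_way_split(n: int, seed: int = 2020) -> List[List[int]]:
    """Randomly partition row indices 0..n-1 into four subsets whose sizes
    differ by at most one (exactly equal when n is divisible by 4)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    k = len(ROLES)
    return [idx[j::k] for j in range(k)]


folds = four_way_split(99_036)
```

With 99,036 campaigns, each subset holds 24,759 rows.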

The first random subset was used to train the model for Z_i, and the deviance information criterion (DIC) was used to assess model fit. The lowest DIC was 15,649. After zeros and ones were predicted for each Kickstarter campaign in the second subset, the projects predicted as Z_i = 1 were selected to train the subsequent backer count and pledged USD components.
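For reference, DIC combines the posterior mean deviance D̄, where D(θ) = −2 log p(y | θ), with a penalty for the effective number of parameters, p_D = D̄ − D(θ̄), giving DIC = D̄ + p_D; lower values indicate a better balance of fit and complexity. The sketch assumes posterior draws of the deviance and the deviance at the posterior mean are already available; the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def dic(deviance_draws: Sequence[float], deviance_at_posterior_mean: float) -> float:
    """Deviance information criterion: DIC = Dbar + pD, with pD = Dbar - D(theta_bar)."""
    d_bar = fmean(deviance_draws)                # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean     # effective number of parameters
    return d_bar + p_d
```

For example, deviance draws of 10, 12, and 14 with a deviance of 11 at the posterior mean give p_D = 1 and DIC = 13.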

The backer count component of the hurdle model is used to further investigate traits contributing to the success of campaigns that obtain positive backer counts. The model was trained on the second subset and tested on the third subset, where it predicted 42% of the actual values. The best model, which includes the type IV interaction component, had the smallest DIC (65,898.29); every model without the type IV interaction component had a DIC greater than 160,000.

The trained model for pledged USD predicted 41% of the actual values in the third subset. The best-fitting pledged USD model had a DIC of 83,180.84, compared with DICs larger than 150,000 for every model that did not include the type IV interaction of the spatial and temporal components.

The predictions $ {\hat{X}}_{\mathrm{ist}}^{(1)} $ and $ {\hat{X}}_{\mathrm{ist}}^{(2)} $ can now be added to the third subset for the funding level model; each captures more than 40% of the information that would be available if the backer count and pledged USD could be observed before a campaign launches. The third subset, augmented with these predicted variables, was used to train the funding level model, and the fourth subset was then used as the test data set.

Results

The model with type IV interaction has a DIC of 90,578.29 and achieves the best fit compared with the other models trained, whose DICs were near 200,000. Figure 11 shows a scatterplot divided by two red lines marking 100% of the goal for the actual funding level (horizontal line) and the predicted funding level (vertical line). Points in the top-right and bottom-left sections are campaigns correctly predicted as successes and failures, respectively. This spatio-temporal model with type IV interaction correctly classified campaigns as successes or failures 79% of the time, regardless of how close the predicted funding level was to the actual value. Because the main goal of most entrepreneurs is a successful campaign, predicting how far above 100% a campaign will be funded is a useful but secondary piece of information. Success rates also differ across states.
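The 79% figure is a classification accuracy with respect to the 100% line, i.e., the share of points in the top-right and bottom-left sections of Figure 11. A minimal sketch of that computation, with a hypothetical function name and invented example values:

```python
from typing import Sequence


def success_classification_accuracy(
    actual_pct: Sequence[float],
    predicted_pct: Sequence[float],
    threshold: float = 100.0,
) -> float:
    """Share of campaigns whose predicted and actual funding levels fall on the
    same side of the threshold (correct success or correct failure),
    ignoring how close the predicted level is to the actual one."""
    if len(actual_pct) != len(predicted_pct) or not actual_pct:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum((a >= threshold) == (p >= threshold)
               for a, p in zip(actual_pct, predicted_pct))
    return hits / len(actual_pct)
```

For example, actual levels [150, 50, 120, 80] and predicted levels [110, 70, 90, 130] give an accuracy of 0.5: one correct success, one correct failure, and two misclassifications.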

Figure 12 shows the posterior marginal distributions for the model hyperparameters (top and middle rows of panels) and the pattern of posterior means with 95% pointwise posterior intervals for all states (bottom panel). The posterior marginals of the model components are all concentrated away from zero. The results indicate that the geographic location of a Kickstarter plays a role in its success, and the effect of the location where a campaign began ranges from slight to drastic. The posterior variances of the hyperparameters are narrow relative to their overall range, and the same holds for the fixed effects. This narrow variance suggests that the model is most likely not overfit and that it explains the data in a general way that should help predict future campaigns.
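A 95% pointwise posterior interval for a state is the 2.5% and 97.5% quantiles of that state's posterior, computed state by state rather than jointly across states. The sketch below assumes each state's effect is available as a set of posterior draws (R-INLA reports marginals directly, so working from draws is an assumption here) and uses Python's `statistics.quantiles`; the function name is hypothetical.

```python
from statistics import fmean, quantiles
from typing import Dict, Sequence, Tuple


def pointwise_intervals(
    draws_by_state: Dict[str, Sequence[float]],
) -> Dict[str, Tuple[float, float, float]]:
    """Posterior mean and 95% pointwise interval for each state's effect.

    quantiles(..., n=40) returns cut points at k/40 for k = 1..39, so the
    first and last cut points are the 2.5% and 97.5% quantiles.
    """
    out = {}
    for state, draws in draws_by_state.items():
        cuts = quantiles(draws, n=40, method="inclusive")
        out[state] = (fmean(draws), cuts[0], cuts[-1])
    return out
```

An interval whose bounds are both above or both below zero is one way to read a state's effect as concentrated away from zero.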

Pledged USD has the strongest influence on the model, and the goal is the second strongest variable. The results also show how the categorical variables contribute: level four of the Twords factor has the largest positive effect. This is expected, because Twords was constructed from words that were common among successful campaigns. By category, comics, design, games, and technology had the largest positive effects, whereas crafts, fashion, and journalism had negative but smaller effects.

Discussion

The proposed dynamical spatio-temporal model produced promising results. Its predictive performance appears reasonably strong, with about 80% of campaigns correctly classified as successes or failures. In terms of prediction alone, the models perform well and can give entrepreneurs some insight into how their Kickstarter campaign may perform based on variables they know before launch. The improvement from the type IV interaction shows that a geographic location is affected not only by its neighbors but also by time at that location. This supports a dynamic effect of a campaign's geographic location on its success, at least for prediction.

For comparison, the random forest algorithm was applied to the data following the same steps. The data were split randomly into four subsets of equal size, and the hurdle model was fitted. The random forest algorithm was applied at each step, from predicting the backer count and the pledged USD to making the final prediction of the funding level reached. The random forest predicted 81% of the actual values, results very similar to those of the spatio-temporal model. Within the distribution of the observed data, however, the spatio-temporal model predicted a wider range of values than the random forest. The random forest was biased toward central values; it therefore underestimated the variance of outcomes and could not predict more extreme values. Another notable difference is that the random forest predicted failed campaigns more accurately, whereas the spatio-temporal model correctly predicted successes more often.

Conclusion

Crowdfunding is a novel method and potentially disruptive innovation for funding a variety of new entrepreneurial ventures, allowing individual founders of for-profit, cultural, or social projects to request funding from many individuals in online communities, often in return for future products or equity. Today, crowdfunding is becoming a major way for entrepreneurs to achieve their goals. From the entrepreneurial perspective, crowdfunding is mostly viewed as a source of financing, including startup capital, one of the most critical resources that new ventures need to succeed. It is still unknown to scholars and users of crowdfunding services what makes a fundraising drive truly successful and whether crowdfunding efforts reinforce or contradict existing theories about the dynamics of successful entrepreneurial financing and the general distribution and use of crowdfunding mechanisms.

This is the first study in the literature to use spatio-temporal modeling to understand the dynamics of a successful crowdfunding effort and the dynamic impact of geographic location. Spatio-temporal modeling can mitigate unobserved spatial confounding when estimating the effects of factors on successful entrepreneurial financing, evaluate the impact of the geographic location of a Kickstarter, and predict unknown values at unmeasured locations and future times. Our study uses crowdfunding data with explanatory variables collected at spatial locations from 2009 to 2017, the most recent year available. A distinctive feature of such data is that they are indexed in space and time, which allows the spatio-temporal model to explore, through the covariance function of a stochastic process, hidden dependence structure that standard machine learning methods do not address. The covariance function is essential for predicting values at an unobserved location or time. Modeling the covariance function appropriately may improve the efficiency of estimating the determinants of a successful crowdfunding campaign and offset the effects of unobserved sociocultural traits that may affect the determinant under investigation. The spatio-temporal model includes two components, a systematic component built from the available explanatory variables and a spatio-temporal correlation component, and captures how the two interact to produce reliable forecasts. Such models can therefore be used to produce maps and to identify regions (problem or success areas) where, for example, campaign performance exceeds a given threshold, which could be important to the success of a new project.

This paper presents new results obtained by investigating Kickstarter campaign data on over ninety-nine thousand projects, totaling about 1 billion USD in pledges from 2009 to 2017, through spatio-temporal modeling. The funding level is used as the outcome of interest and is modeled in association with the dollars pledged and the backer count, which reflect underlying signals of project quality. The spatio-temporal component plays an important role both in the type of projects proposed and in the sociocultural traits of successful fundraising related to underlying quality. The results support an impact of the geographic location of a Kickstarter on its success, as well as associations between the observed project traits and the success of the entrepreneurial effort in conjunction with the spatio-temporal component. These results offer further insight into the empirical dynamics of the emerging phenomenon of online entrepreneurial financing.

Future work should focus on the hurdle component of the model, because understanding why a Kickstarter fails outright with zero backers appears to be a major obstacle for entrepreneurs and deserves further analysis. The spatial and temporal components also warrant closer study, particularly how the fixed effects change over time at each location.

Availability of data and materials

The scraped data are available at https://webrobots.io/kickstarter-datasets/.

Abbreviations

USD: US dollar
INLA: Integrated Nested Laplace Approximation
DIC: Deviance information criterion
MCMC: Markov chain Monte Carlo

References

Agrawal, A., Catalini, C., & Goldfarb, A. (2010). The geography of crowdfunding. SSRN Electronic Journal.
Banerjee, S., Carlin, B. P., & Gelfand, A. E. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton: Chapman & Hall/CRC.
Belleflamme, P., Lambert, T., & Schwienbacher, A. (2014). Crowdfunding: Tapping the right crowd. Journal of Business Venturing, 29(5), 585–609.
Bivand, R., Gómez-Rubio, V., & Rue, H. (2015). Spatial data analysis with R-INLA with some extensions. Journal of Statistical Software, 63(20), 1–31.
Blangiardo, M., & Cameletti, M. (2015). Spatial and spatio-temporal Bayesian models with R-INLA. Chichester: John Wiley & Sons, Ltd.
Burtch, G., Ghose, A., & Wattal, S. (2011). An empirical examination of the antecedents and consequences of investment patterns in crowd-funded markets. SSRN Electronic Journal.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.
Cressie, N., & Wikle, C. K. (2011). Statistics for spatio-temporal data. John Wiley & Sons.
Ferkingstad, E., Held, L., & Rue, H. (2017). Fast and accurate Bayesian model criticism and conflict diagnostics using R-INLA. Stat, 6(1), 331–344.
Frank, M. W. (1998). Schumpeter on entrepreneurs and innovation: A reappraisal. Journal of the History of Economic Thought, 20(4), 505–516.
Hu, M. C., Pavlicova, M., & Nunes, E. V. (2011). Zero-inflated and hurdle models of count data with extra zeros: Examples from an HIV-risk reduction intervention trial. The American Journal of Drug and Alcohol Abuse, 37(5), 367–375.
Lasinio, G. J., Mastrantonio, G., & Pollice, A. (2013). Discussing the “big n problem”. Statistical Methods and Applications, 22(1), 97–112.
Mitra, T., & Gilbert, E. (2014). The language that gets people to give: Phrases that predict success on Kickstarter. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 49–61).
Mollick, E. (2014). The dynamics of crowdfunding: An exploratory study. Journal of Business Venturing, 29(1), 1–16.
Opitz, T. (2017). Latent Gaussian modeling and INLA: A review with focus on space-time applications. arXiv preprint arXiv:1708.02723.
Ordanini, A., Miceli, L., Pizzetti, M., & Parasuraman, A. (2011). Crowdfunding: Transforming customers through innovative service platforms. Journal of Service Management, 22(4), 443–470.
Rue, H., & Held, L. (2005). Gaussian Markov random fields: Theory and applications. Monographs on statistics & applied probability. Boca Raton, FL: Chapman and Hall/CRC.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392.
Rue, H., Riebler, A., Sorbye, S. H., Illian, J. B., Simpson, D. P., & Lindgren, F. K. (2017). Bayesian computing with INLA: A review. Annual Review of Statistics and Its Application, 4, 395–421.
Schwienbacher, A., & Larralde, B. (2010). Crowdfunding of small entrepreneurial ventures. SSRN Electronic Journal.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234–240.
Vuong, Q. H. (2016). Impacts of geographical locations and sociocultural traits on the Vietnamese entrepreneurship. SpringerPlus, 5(1), 1189. https://doi.org/10.1186/s40064-016-2850-9.


Acknowledgements

The authors would like to thank the reviewers who contributed significantly to the improvement of the article.

Funding

University of Northern Colorado Fund for Faculty Publications.

Author information

Authors and Affiliations

Department of Applied Statistics and Research Methods, University of Northern Colorado, Greeley, CO, 80639, USA
Clinton Woods & Han Yu
School of Information, University of South Florida, Tampa, FL, 33620, USA
Hong Huang


Contributions

The study was jointly conceived and designed by Dr. HY and Dr. HH. Dr. HY and Dr. HH revised the article critically for important intellectual content. CW performed the statistical computing under the supervision of Dr. HY. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Han Yu.

Ethics declarations

Competing interests

The authors declare that there are no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Woods, C., Yu, H. & Huang, H. Predicting the success of entrepreneurial campaigns in crowdfunding: a spatio-temporal approach. J Innov Entrep 9, 13 (2020). https://doi.org/10.1186/s13731-020-00122-8


Received: 07 December 2019
Accepted: 28 May 2020
Published: 31 July 2020
DOI: https://doi.org/10.1186/s13731-020-00122-8
