high_scalability high_scalability-2009 high_scalability-2009-606 knowledge-graph by maker-knowledge-mining
Introduction: (Please bear with me, I'm a new, passionate, confident and terrified programmer :D ) Background: I'm pre-launch and 1 year into the development of my application. My target is to be able to eventually handle millions of registered users with 5-10% of them concurrent. Up to this point I've used auto-increment to assign unique identifiers to rows. I am now considering switching to a non-sequential strategy. Oh, I'm using the LAMP configuration. My reasons for avoiding auto-increment: 1. Complicates replication when scaling horizontally. Risk of collision is significant (when running multiple masters). Note: I've read the other entries in this forum that relate to ID generation and there have been some great suggestions -- including a strategy that uses auto-increment in a way that avoids this pitfall... That said, I'm still nervous about it. 2. Potential bottleneck when retrieving/assigning IDs -- IDs assigned at the database. My reasons for being nervous about
sentIndex sentText sentNum sentScore
1 (Please bear with me, I'm a new, passionate, confident and terrified programmer :D ) Background: I'm pre-launch and 1 year into the development of my application. [sent-1, score-0.172]
2 My target is to be able to eventually handle millions of registered users with 5-10% of them concurrent. [sent-2, score-0.082]
3 Up to this point I've used auto-increment to assign unique identifiers to rows. [sent-3, score-0.292]
4 I am now considering switching to a non-sequential strategy. [sent-4, score-0.076]
5 Risk of collision is significant (when running multiple masters). [sent-8, score-0.117]
6 Note: I've read the other entries in this forum that relate to ID generation and there have been some great suggestions -- including a strategy that uses auto-increment in a way that avoids this pitfall. [sent-9, score-0.579]
7 Potential bottleneck when retrieving/assigning IDs -- IDs assigned at the database. [sent-14, score-0.082]
8 My reasons for being nervous about non-sequential IDs: 1. [sent-15, score-0.323]
9 To guarantee uniqueness, the IDs are going to be much larger -- potentially affecting performance significantly. My New Strategy: (I haven't started to implement this. [sent-16, score-0.188]
10 I'm waiting for someone smarter than me to steer me in the right direction) 1. [sent-19, score-0.221]
11 Generate a guaranteed-unique ID by concatenating the user ID (1-9 digits) and the UNIX timestamp (10 digits). [sent-20, score-0.159]
12 Convert the resulting 11-19 digit number to base 36. [sent-22, score-0.401]
13 The resulting string will be alphanumeric and 6-10 characters long. [sent-23, score-0.607]
14 This is, of course, much shorter (at least with regard to characters) than the standard GUID hash. [sent-24, score-0.223]
15 Pass the new identifier to a column in the database that is type CHAR() set to binary. [sent-26, score-0.203]
16 Is an 11-19 digit number (base 10) actually any larger (in terms of bytes) than its base-36 equivalent? [sent-34, score-0.335]
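The strategy in sentences 11-16 can be checked with a little arithmetic. What follows is only a minimal sketch in Python (the post targets LAMP/PHP, so the language is a stand-in), and the helper names `make_id`, `to_base36` and `sizes` are hypothetical, not something from the original post. It assumes the timestamp is always written as exactly 10 digits and that the base-36 alphabet is 0-9 followed by a-z.

```python
import string

# 0-9 followed by a-z: the 36 symbols of a lowercase base-36 alphabet (assumed).
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def make_id(user_id: int, unix_ts: int) -> int:
    """Concatenate a user id and a 10-digit UNIX timestamp (hypothetical helper)."""
    # Zero-padding the timestamp to a fixed 10 digits keeps the split point unambiguous.
    return int(f"{user_id}{unix_ts:010d}")


def sizes(n: int) -> tuple:
    """Return (decimal characters, base-36 characters, raw binary bytes)."""
    return len(str(n)), len(to_base36(n)), (n.bit_length() + 7) // 8


smallest = make_id(1, 1_000_000_000)            # 11 decimal digits
largest = make_id(999_999_999, 9_999_999_999)   # 19 decimal digits

for value in (smallest, largest):
    dec_chars, b36_chars, raw_bytes = sizes(value)
    print(value, to_base36(value), dec_chars, b36_chars, raw_bytes)
```

Under these assumptions the 11-digit extreme needs 7 base-36 characters and the 19-digit extreme needs 13, so the encoded string is 7-13 characters rather than the 6-10 quoted in the question. As text, base 36 is shorter than base 10 (7 vs. 11 and 13 vs. 19 characters), but the same numbers stored as binary integers take only 5 to 8 bytes, which is smaller than either string form. Two caveats: the largest 19-digit values exceed the signed 64-bit range, so an integer column would have to be unsigned, and two rows created by the same user within the same second would receive the same ID.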
wordName wordTfidf (topN-words)
[('ids', 0.332), ('digits', 0.27), ('digit', 0.24), ('characters', 0.218), ('nervous', 0.207), ('resulting', 0.161), ('id', 0.159), ('alphanumeric', 0.144), ('guid', 0.135), ('char', 0.135), ('steer', 0.135), ('strategy', 0.132), ('uniqueness', 0.129), ('identifier', 0.124), ('flawed', 0.117), ('collision', 0.117), ('graphic', 0.117), ('reasons', 0.116), ('supplying', 0.114), ('complicates', 0.114), ('identifiers', 0.114), ('regard', 0.114), ('relate', 0.111), ('potential', 0.11), ('shorter', 0.109), ('designer', 0.107), ('timestamp', 0.1), ('larger', 0.095), ('affecting', 0.093), ('unique', 0.092), ('convert', 0.09), ('appreciate', 0.089), ('masters', 0.088), ('unix', 0.088), ('valid', 0.086), ('bare', 0.086), ('avoids', 0.086), ('assign', 0.086), ('smarter', 0.086), ('confident', 0.086), ('entries', 0.085), ('string', 0.084), ('suggestions', 0.084), ('registered', 0.082), ('assigned', 0.082), ('forum', 0.081), ('column', 0.079), ('pass', 0.078), ('direction', 0.077), ('switching', 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question
2 0.16678427 346 high scalability-2008-06-28-ID generation schemes
Introduction: Hi, Generating unique IDs is a common requirement in many projects. Generally, this responsibility is given to the database layer, by using sequences or some other technique. This is a problem for horizontal scalability. What are the GUID generation schemes generally used in highly scalable web sites? I have seen Java's SecureRandom class used to generate GUIDs. What other methods are generally used? Thanks Unmesh
3 0.10870203 145 high scalability-2007-11-08-ID generator
Introduction: Hi, I would like feedback on an ID generator I just made. What positive and negative effects do you see with this? It's programmed in Java, but could just as easily be programmed in any other typical language. It's thread safe and does not use any synchronization. When testing it on my laptop, I was able to generate 10 million IDs within about 15 seconds, so it should be more than fast enough. Take a look at the attachment (I had to rename it from IdGen.java to IdGen.txt to attach it). IdGen.java
4 0.10145608 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
Introduction: Pinterest has been riding an exponential growth curve, doubling every month and a half. They've gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances. Stunning growth. So what's Pinterest's story? To tell their story we have our bards, Pinterest's Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest's architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices. This is a great talk. It's full of amazing details. It's also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended. Two of my favorite lessons from the talk: Arc
5 0.098446161 716 high scalability-2009-10-06-Building a Unique Data Warehouse
Introduction: There are many reasons to roll your own data storage solution on top of existing technologies. We've seen stories on HighScalability about custom databases for very large sets of individual data (like Twitter) and large amounts of binary data (like Facebook pictures). However, I recently ran into a unique type of problem. I was tasked with recording and storing bandwidth information for more than 20,000 servers and their associated networking equipment. This data needed to be accessed in real-time, with less than a 5 minute delay between the data being recorded and the data showing up on customer bandwidth graphs on our customer portal. After numerous false starts with off-the-shelf components and existing database clustering technology, we decided we must roll our own system. The real key to our problem (literally) was the ratio of the size of the key to the size of the actual data. Because the tracked metric was so small (a 64-bit counter) compared to the unique ide
6 0.09042871 1205 high scalability-2012-03-07-Scale Indefinitely on S3 With These Secrets of the S3 Masters
7 0.088963687 561 high scalability-2009-04-08-N+1+caching is ok?
8 0.088235214 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
9 0.083141886 337 high scalability-2008-05-31-memcached and Storage of Friend list
10 0.075661488 115 high scalability-2007-10-07-Using ThreadLocal to pass context information around in web applications
11 0.072206065 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010
12 0.071714677 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
13 0.069485962 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
14 0.066797301 1159 high scalability-2011-12-19-How Twitter Stores 250 Million Tweets a Day Using MySQL
15 0.066196144 1228 high scalability-2012-04-16-Instagram Architecture Update: What’s new with Instagram?
16 0.065761149 140 high scalability-2007-11-02-How WordPress.com Tracks 300 Servers Handling 10 Million Pageviews
17 0.065344386 1368 high scalability-2012-12-07-Stuff The Internet Says On Scalability For December 7, 2012
19 0.062643051 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
20 0.061397966 589 high scalability-2009-05-05-Drop ACID and Think About Data
topicId topicWeight
[(0, 0.098), (1, 0.044), (2, -0.01), (3, -0.029), (4, 0.009), (5, 0.004), (6, -0.002), (7, 0.006), (8, 0.002), (9, -0.028), (10, -0.009), (11, 0.048), (12, -0.005), (13, 0.001), (14, 0.0), (15, -0.01), (16, -0.017), (17, -0.018), (18, -0.012), (19, 0.002), (20, -0.01), (21, -0.027), (22, -0.017), (23, -0.001), (24, -0.011), (25, -0.039), (26, 0.031), (27, -0.011), (28, 0.004), (29, 0.012), (30, 0.017), (31, -0.018), (32, -0.007), (33, 0.019), (34, -0.011), (35, 0.004), (36, 0.016), (37, 0.024), (38, -0.023), (39, -0.039), (40, 0.002), (41, 0.026), (42, -0.008), (43, -0.014), (44, 0.002), (45, -0.017), (46, -0.031), (47, 0.007), (48, -0.04), (49, 0.017)]
simIndex simValue blogId blogTitle
same-blog 1 0.9462564 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question
2 0.71925563 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
Introduction: What's cool about starting a new project is you finally have a chance to do it right. You of course eventually mess everything up in your own way, but for that one moment the world has a perfect order, a rightness that feels satisfying and good. Arne Claassen, the CTO of notify.me, a brand new real time notification delivery service, is in this honeymoon period now. Arne has been gracious enough to share with us his philosophy of how to build a notification service. I think you'll find it fascinating because Arne goes into a lot of useful detail about how his system works. His main design philosophy is to minimize the bottlenecks that form around synchronous access, that is when some resource is requested and the requestor ties up more resources, waiting for a response. If the requested resource can’t be delivered in a timely manner, more and more requests pile up until the server can’t accept any new ones. Nobody gets what they want and you have an outage. Breaking synchronous op
3 0.71514654 1507 high scalability-2013-08-26-Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month
Introduction: Jeremy Edberg, the first paid employee at reddit, teaches us a lot about how to create a successful social site in a really good talk he gave at the RAMP conference. Watch it here at Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons. Jeremy uses a virtue and sin approach. Examples of the mistakes made in scaling reddit are shared and it turns out they did a lot of good stuff too. Somewhat of a shocker is that Jeremy is now a Reliability Architect at Netflix, so we get a little Netflix perspective thrown in for free. Some of the lessons that stood out most for me: Think of SSDs as cheap RAM, not expensive disk. When reddit moved from spinning disks to SSDs for the database the number of servers was reduced from 12 to 1 with a ton of headroom. SSDs are 4x more expensive but you get 16x the performance. Worth the cost. Give users a little bit of power, see what they do with it, and turn the good stuff into features. One of the biggest revelations
4 0.71502161 435 high scalability-2008-10-30-The case for functional decomposition
Introduction: Hi all, I'm a big fan of http://highscalability.com/ and have been looking in my current development to decompose my application along functional boundaries as a route to being able to scale out the server side, specifically the database layer. The problem comes when there are links between the data in different components, i.e. one component holds all the user data, but another component needs to reference a user as being an owner of some piece of data. I'm currently doing this by holding the primary key information for each side of the link (as you would if they all lived in a single database), but this link table needs to exist in both components to allow lookups to be done in either direction, i.e. 'get the things a specific user owns' and 'get the owners of this specific thing' would each use different components. The alternative to this would be to store the link data in only one of the components, but then the reverse lookups would require 2 calls instead of just one. M
5 0.713314 561 high scalability-2009-04-08-N+1+caching is ok?
Introduction: Hibernate and iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem being that if you wanted to retrieve a set of widgets from a table, one query would be used to retrieve all the ids of the matching widgets (select widget_id from widget where ...) and then for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it requires 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? i.e. If you run the first query to get your list of ids, and then for each widget you retrieve it from the cache. Surely in that case, N+1(+caching) is good? Assuming of course that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
6 0.70823097 1491 high scalability-2013-07-15-Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?
7 0.70668936 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
8 0.70481265 1148 high scalability-2011-11-29-DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
9 0.69294578 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
10 0.69179684 828 high scalability-2010-05-17-7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
11 0.68791777 337 high scalability-2008-05-31-memcached and Storage of Friend list
13 0.67741615 1573 high scalability-2014-01-06-How HipChat Stores and Indexes Billions of Messages Using ElasticSearch and Redis
14 0.67670697 1222 high scalability-2012-04-05-Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory
15 0.67561924 379 high scalability-2008-09-04-Database question for upcoming project
16 0.67418265 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
17 0.67382115 514 high scalability-2009-02-18-Numbers Everyone Should Know
18 0.67223036 116 high scalability-2007-10-08-Lessons from Pownce - The Early Years
19 0.66868067 639 high scalability-2009-06-27-Scaling Twitter: Making Twitter 10000 Percent Faster
20 0.66154891 1519 high scalability-2013-09-18-If You're Programming a Cell Phone Like a Server You're Doing it Wrong
topicId topicWeight
[(1, 0.09), (2, 0.159), (10, 0.06), (28, 0.317), (61, 0.14), (85, 0.065), (94, 0.076)]
simIndex simValue blogId blogTitle
same-blog 1 0.85333061 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question
2 0.81724924 562 high scalability-2009-04-10-Facebook's Aditya giving presentation on Facebook Architecture
Introduction: Facebook's engineering director Aditya talks about Facebook's architecture: how they use MySQL, PHP and memcache, and how they have modified these to suit their requirements.
3 0.78876907 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
Introduction: Kngine is a knowledge Web search engine designed to provide meaningful search results, such as semantic information about the keywords/concepts, answers to the user's questions, relations discovered between the keywords/concepts, and links between different kinds of data, such as movies, subtitles, photos, prices at sale stores, user reviews, and influenced stories. Goals: Kngine's long-term goal is to make all of humanity's systematic knowledge and experience accessible to everyone. I aim to collect and organize all objective data, and make it possible and easy to access. Our goal is to build, on the advances of Web search engines, the semantic web and data representation technologies, a new form of Web search engine that will unleash a revolution of new possibilities. Kngine tries to combine the power of Web search engines with the power of semantic search and data representation to provide meaningful search results compromising user needs. Status: Kngine starts as a research project in O
4 0.76370627 1294 high scalability-2012-08-01-Prismatic Update: Machine Learning on Documents and Users
Introduction: In an update to Prismatic Architecture - Using Machine Learning on Social Networks to Figure Out What You Should Read on the Web, Jason Wolfe, even in the face of deadening fatigue from long nights spent getting their iPhone app out, has gallantly agreed to talk a little more about Prismatic's approach to Machine Learning. Documents and users are two areas where Prismatic applies ML (machine learning): ML on Documents. Given an HTML document: learn how to extract the main text of the page (rather than the sidebar, footer, comments, etc), its title, author, best images, etc; determine features for relevance (e.g., what the article is about, topics, etc.). The setup for most of these tasks is pretty typical. Models are trained using big batch jobs on other machines that read data from s3, save the learned parameter files to s3, and then read (and periodically refresh) the models from s3 in the ingest pipeline. All of the data that flows out of the system can be
5 0.71947068 752 high scalability-2009-12-17-Oracle and IBM databases: Disk-based vs In-memory databases
Introduction: Current disk based RDBMS can run out of steam when processing large data. Can these problems be solved by migrating from a disk based RDBMS to an IMDB? Any limitations? To find out, I tested one of each from the two leading vendors who together hold 70% of the market share - Oracle's 11g and TimesTen 11g , and IBM's DB2 v9.5 and solidDB 6.3 . read more at BigDataMatters.com
7 0.66142589 903 high scalability-2010-09-17-Hot Scalability Links For Sep 17, 2010
8 0.65965128 1506 high scalability-2013-08-23-Stuff The Internet Says On Scalability For August 23, 2013
9 0.64519304 806 high scalability-2010-04-08-Hot Scalability Links for April 8, 2010
10 0.64151859 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
11 0.63537723 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
12 0.63228881 1261 high scalability-2012-06-08-Stuff The Internet Says On Scalability For June 8, 2012
14 0.58451325 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
15 0.58358836 6 high scalability-2007-07-11-Friendster Architecture
16 0.58281219 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
17 0.58125162 1031 high scalability-2011-04-28-PaaS on OpenStack - Run Applications on Any Cloud, Any Time Using Any Thing
18 0.57955563 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011
19 0.5792433 1439 high scalability-2013-04-12-Stuff The Internet Says On Scalability For April 12, 2013
20 0.57785028 269 high scalability-2008-03-08-Audiogalaxy.com Architecture