Easy, Real-Time Big Data Analysis Using Storm


Today, companies regularly generate terabytes of data in their daily operations. The sources include everything from data captured from network sensors, to the Web, social media, transactional business data, and data created in other business contexts. Given the volume of data being generated, real-time computation has become a major challenge faced by many organizations. A scalable real-time computation system that we have used effectively is the open-source Storm tool, which was developed at Twitter and is sometimes referred to as "real-time Hadoop." However, Storm is far simpler to use than Hadoop in that it does not require mastering an alternate universe of new technologies simply to handle big data jobs.

This article explains how to use Storm. The example project, called the "Speeding Alert System," analyzes real-time data and raises a trigger, storing the relevant data in a database, when the speed of a vehicle exceeds a predefined threshold.

Storm

Whereas Hadoop relies on batch processing, Storm is a real-time, distributed, fault-tolerant computation system. Like Hadoop, it can process huge amounts of data—but it does so in real time and with guaranteed reliability; that is, every message will be processed. Storm also offers features such as fault tolerance and distributed computation, which make it suitable for processing huge amounts of data on different machines. It has these features as well:

  • It has simple scalability. To scale, you simply add machines and change the parallelism settings of the topology. Storm's use of Zookeeper for cluster coordination makes it scale well to large cluster sizes.
  • It guarantees processing of every message.
  • Storm clusters are easy to manage.
  • Storm is fault tolerant: Once a topology is submitted, Storm runs the topology until it is killed or the cluster is shut down. Also, if there are faults during execution, reassignment of tasks is handled by Storm.
  • Topologies in Storm can be defined in any language, although typically Java is used.

To follow the rest of the article, you first need to install and set up Storm. The steps are straightforward:

  • Download the Storm archive from the official Storm website.
  • Unpack the archive, put its bin/ directory on your PATH, and make sure the bin/storm script is executable.

Storm Components

A Storm cluster consists mainly of a master node and worker nodes, with coordination handled by Zookeeper.

Master Node:

The master node runs a daemon, Nimbus, which is responsible for distributing code around the cluster, assigning tasks, and monitoring failures. It is similar to the JobTracker in Hadoop.

Worker Node:

The worker node runs a daemon, Supervisor, which listens for work assigned to its machine and starts or stops worker processes as needed. Each worker process executes a subset of a topology. The coordination between Nimbus and the supervisors is managed by a Zookeeper system or cluster.

Zookeeper:

Zookeeper is responsible for maintaining the coordination service between the supervisors and the master. The logic for a real-time application is packaged into a Storm "topology." A topology is a graph of spouts (data sources) and bolts (data operations) connected by stream groupings (coordination). Let's look at these terms in greater depth.

Spout:

In simple terms, a spout reads the data from a source for use in the topology. A spout can either be reliable or unreliable. A reliable spout makes sure to resend a tuple (which is an ordered list of data items) if Storm fails to process it. An unreliable spout does not track the tuple once it's emitted. The main method in a spout is nextTuple(). This method either emits a new tuple to the topology or it returns if there is nothing to emit.
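
To make this concrete, here is a minimal spout sketch. It is not part of the article's project (the class and field names are illustrative), and it assumes the pre-Apache backtype.storm packages used in the listings later in this article; it simply emits one random speed reading per call to nextTuple().

import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Illustrative spout: emits a random (vehicle_number, speed) tuple on every call to nextTuple().
public class RandomSpeedSpout extends BaseRichSpout
{
    private SpoutOutputCollector collector;
    private final Random random = new Random();

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector)
    {
        this.collector = collector;
    }

    public void nextTuple()
    {
        // In a real spout this data would come from a file, a queue, or an API.
        collector.emit(new Values("AB 123", 40 + random.nextInt(60)));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer)
    {
        declarer.declare(new Fields("vehicle_number", "speed"));
    }
}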

Bolt:

A bolt is responsible for all the processing that happens in a topology. Bolts can do anything from filtering to joins, aggregations, talking to files/databases, and so on. Bolts receive the data from a spout for processing, which may further emit tuples to another bolt in case of complex stream transformations. The main method in a bolt is execute(), which accepts a tuple as input. In both the spout and bolt, to emit the tuple to more than one stream, the streams can be declared and specified in declareStream().
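
On the bolt side, a similarly minimal sketch (again with illustrative names, assuming the same backtype.storm packages) filters the readings emitted by the spout above and only forwards those whose speed exceeds a fixed limit:

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Illustrative bolt: drops tuples at or below the limit and re-emits the rest downstream.
public class SpeedFilterBolt extends BaseBasicBolt
{
    private static final int SPEED_LIMIT = 80;

    public void execute(Tuple tuple, BasicOutputCollector collector)
    {
        int speed = tuple.getIntegerByField("speed");
        if (speed > SPEED_LIMIT)
        {
            collector.emit(new Values(tuple.getStringByField("vehicle_number"), speed));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer)
    {
        declarer.declare(new Fields("vehicle_number", "speed"));
    }
}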

Stream Groupings:

A stream grouping defines how a stream should be partitioned among the bolt's tasks. Storm provides built-in stream groupings: shuffle grouping, fields grouping, all grouping, global grouping, none grouping, direct grouping, and local/shuffle grouping. Custom groupings can also be added by implementing the CustomStreamGrouping interface.
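
As an illustration, the grouping is chosen when components are wired together with TopologyBuilder. The snippet below is a sketch that reuses the hypothetical spout and bolt from above (it would live inside whatever method builds the topology) and contrasts shuffle grouping with fields grouping:

// Requires backtype.storm.topology.TopologyBuilder and backtype.storm.tuple.Fields.
// Shuffle grouping: tuples are distributed randomly but evenly across the bolt's tasks.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("speed-spout", new RandomSpeedSpout(), 1);
builder.setBolt("filter", new SpeedFilterBolt(), 2)
       .shuffleGrouping("speed-spout");

// Fields grouping: tuples with the same vehicle_number always reach the same task,
// which is what any per-vehicle aggregation or time-window count requires.
builder.setBolt("per-vehicle", new SpeedFilterBolt(), 2)
       .fieldsGrouping("filter", new Fields("vehicle_number"));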

Implementation

For our use case, we designed a topology of a spout and bolts that processes a large volume of data (log files) and triggers an alarm when a specific value crosses a predefined threshold. Using a Storm topology, the log file is read line by line and the topology monitors the incoming data. In terms of Storm components, the spout reads the incoming data. It not only reads the data from existing files, but also monitors for new files. As soon as a file is modified, the spout reads the new entry, converts it to tuples (a format that can be read by a bolt), and emits the tuples to the bolt that performs the threshold analysis, which finds any record that has exceeded the threshold.

The next section explains the use case in detail.

Threshold Analysis

In this article, we concentrate mainly on two types of threshold analysis: instant threshold and time series threshold.

  • Instant threshold checks whether the value of a field has exceeded the threshold value at that instant and raises a trigger if the condition is satisfied. For example, raise a trigger if the speed of a vehicle exceeds 80 km/h.
  • Time series threshold checks whether the value of a field has exceeded the threshold value for a given time window and raises a trigger if the condition is satisfied. For example, raise a trigger if the speed of a vehicle exceeds 80 km/h more than once in the last five minutes. (A minimal sketch of both checks follows this list.)
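
The sketch below (plain Java with illustrative names; it is not the article's ThresholdCalculatorBolt, which appears later) contrasts the two kinds of check:

import java.util.List;

public class ThresholdChecks
{
    // Instant threshold: a single reading above the limit raises the trigger.
    public static boolean instantBreach(int speedKmh, int limitKmh)
    {
        return speedKmh > limitKmh;
    }

    // Time-series threshold: raise the trigger only if the limit was broken more than
    // maxOccurrences times within the last windowMillis milliseconds.
    public static boolean timeSeriesBreach(List<Long> breachTimestamps,
                                           long windowMillis, int maxOccurrences)
    {
        long cutoff = System.currentTimeMillis() - windowMillis;
        int recent = 0;
        for (long ts : breachTimestamps)
        {
            if (ts >= cutoff)
                recent++;
        }
        return recent > maxOccurrences;
    }
}

For the article's example, instantBreach(90, 80) triggers immediately, while the time-series check with a five-minute window triggers only after more than one breach inside that window.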

Listing One shows a log file of the type we'll use, which contains vehicle data information such as vehicle number, speed at which the vehicle is traveling, and location in which the information is captured.

 

Listing One: A log file with entries of vehicles passing through the checkpoint.

AB 123, 60, North city
BC 123, 70, South city
CD 234, 40, South city 
DE 123, 40, East city
EF 123, 90, South city
GH 123, 50, West city

A corresponding XML file is created, which contains the schema for the incoming data and is used for parsing the log file. The schema XML is shown in Listing Two.

The XML file and the log file are in a directory that is monitored by the spout constantly for real-time changes. The topology we use for this example is shown in Figure 1.

 
Figure 1: Topology created in Storm to process real-time data.

As shown in Figure 1, the FileListenerSpout accepts the input log file, reads the data line by line, and emits the data to the ThresholdCalculatorBolt for further threshold processing. Once the processing is done, the contents of the line for which the threshold is calculated are emitted to the DBWriterBolt, where they are persisted in the database (or an alert is raised). The detailed implementation for this process is explained next.

Spout Implementation

The spout takes the log file and the XML descriptor file as input. The XML file contains the schema corresponding to the log file. Let us consider an example log file, which has vehicle data such as the vehicle number, the speed at which the vehicle is traveling, and the location at which the information is captured. (See Figure 2.)

 
Figure 2: Flow of data from log files to Spout.

Listing Two shows the specific XML file for a tuple, which specifies the fields and the delimiter separating the fields in a log file. Both the XML file and the data are kept in a directory whose path is specified in the spout.

Listing Two: An XML file created for describing the log file.

<TUPLEINFO>
    <FIELDLIST>
        <FIELD>
            <COLUMNNAME>vehicle_number</COLUMNNAME>
            <COLUMNTYPE>string</COLUMNTYPE>
        </FIELD>
        <FIELD>
            <COLUMNNAME>speed</COLUMNNAME>
            <COLUMNTYPE>int</COLUMNTYPE>
        </FIELD>
        <FIELD>
            <COLUMNNAME>location</COLUMNNAME>
            <COLUMNTYPE>string</COLUMNTYPE>
        </FIELD>
    </FIELDLIST>
    <DELIMITER>,</DELIMITER>
</TUPLEINFO>

 

An instance of the spout is initialized with two constructor parameters: the directory path and a TupleInfo object. The TupleInfo object stores the information related to the log file, such as its fields, delimiter, and field types. This object is created by deserializing the XML file using XStream.
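
As a hedged sketch of how that deserialization might look (the element names mirror Listing Two; TupleInfo and Field are the article's classes, and their exact field names, fieldList, delimiter, columnName, and columnType, are assumptions made here):

import java.io.FileNotFoundException;
import java.io.FileReader;

import com.thoughtworks.xstream.XStream;

public class SchemaLoader
{
    // Maps the upper-case XML elements of Listing Two onto the (assumed) TupleInfo/Field classes.
    public static TupleInfo loadSchema(String xmlPath) throws FileNotFoundException
    {
        XStream xstream = new XStream();
        xstream.alias("TUPLEINFO", TupleInfo.class);
        xstream.alias("FIELD", Field.class);
        xstream.aliasField("FIELDLIST", TupleInfo.class, "fieldList");
        xstream.aliasField("DELIMITER", TupleInfo.class, "delimiter");
        xstream.aliasField("COLUMNNAME", Field.class, "columnName");
        xstream.aliasField("COLUMNTYPE", Field.class, "columnType");
        return (TupleInfo) xstream.fromXML(new FileReader(xmlPath));
    }
}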

Spout implementation steps are:

  • Listen to changes on individual log files. Monitor the directory for the addition of new log files.
  • Convert rows read by the spout to tuples after declaring fields for them.
  • Declare the grouping between spout and bolt, deciding the way in which tuples are given to bolt.

The code for Spout is shown in Listing Three.

Listing Three: Logic in the open(), nextTuple(), and declareOutputFields() methods of the spout.

public void open( Map conf, TopologyContext context, SpoutOutputCollector collector )
{
    _collector = collector;
    try
    {
        // The path of the log file to read is held in the instance field "file".
        fileReader = new BufferedReader(new FileReader(new File(file)));
    }
    catch (FileNotFoundException e)
    {
        System.exit(1);
    }
}

public void nextTuple()
{
    // Poll the log file every two seconds for newly appended lines.
    Utils.sleep(2000);
    String line = null;
    try
    {
        while ((line = fileReader.readLine()) != null)
        {
            String[] fields = null;
            // The pipe character is a regex metacharacter, so it must be escaped.
            if (tupleInfo.getDelimiter().equals("|"))
                fields = line.split("\\" + tupleInfo.getDelimiter());
            else
                fields = line.split(tupleInfo.getDelimiter());
            // Emit only rows that match the declared schema.
            if (tupleInfo.getFieldList().size() == fields.length)
                _collector.emit(new Values(fields));
        }
    }
    catch (IOException ex) { }
}

public void declareOutputFields(OutputFieldsDeclarer declarer)
{
    String[] fieldsArr = new String[tupleInfo.getFieldList().size()];
    for (int i = 0; i < tupleInfo.getFieldList().size(); i++)
    {
        fieldsArr[i] = tupleInfo.getFieldList().get(i).getColumnName();
    }
    // Declare the field names so the bolt can access tuple values by name.
    declarer.declare(new Fields(fieldsArr));
}

declareOutputFields() decides the format in which the tuple is emitted, so that the bolt can decode the tuple in the same fashion. The spout keeps listening for data added to the log file; as soon as data is appended, it reads and emits the data to the bolt for processing.

Bolt Implementation

The output of the spout is passed to a bolt for further processing. The topology for our use case consists of two bolts, as shown in Figure 3.

 
Figure 3: Flow of data from Spout to Bolt.

ThresholdCalculatorBolt

The tuples emitted by the spout are received by the ThresholdCalculatorBolt for threshold processing. It accepts several inputs for the threshold check:

  • Threshold value to check
  • Threshold column number to check
  • Threshold column data type
  • Threshold check operator
  • Threshold frequency of occurrence
  • Threshold time window

A class, shown in Listing Four, is defined to hold these values.

Listing Four: The ThresholdInfo class.

public class ThresholdInfo implements Serializable
{
    private String action;               // comparison operator to apply, e.g. "==", "!=", "<", ">"
    private String rule;
    private Object thresholdValue;       // value to compare against
    private int thresholdColNumber;      // 1-based column number of the field to check
    private Integer timeWindow;          // time window for the time-series check; null for instant checks
    private int frequencyOfOccurence;    // occurrences allowed before the trigger is raised
}

 

Based on the values provided in these fields, the threshold check is made in the execute() method, as shown in Listing Five. The code consists mostly of parsing and checking the incoming values.

Listing Five: Code for the threshold check.

public void execute(Tuple tuple, BasicOutputCollector collector)
{
    if(tuple!=null)
    {
        List<Object> inputTupleList = (List<Object>) tuple.getValues();
        int thresholdColNum = thresholdInfo.getThresholdColNumber();
        Object thresholdValue = thresholdInfo.getThresholdValue();
        String thresholdDataType =
            tupleInfo.getFieldList().get(thresholdColNum-1).getColumnType();
        Integer timeWindow = thresholdInfo.getTimeWindow();
        int frequency = thresholdInfo.getFrequencyOfOccurence();
 
        if(thresholdDataType.equalsIgnoreCase("string"))
        {
            String valueToCheck = inputTupleList.get(thresholdColNum-1).toString();
            String frequencyChkOp = thresholdInfo.getAction();
            if(timeWindow!=null)
            {
                long curTime = System.currentTimeMillis();
                long diffInMinutes = (curTime - startTime) / (1000 * 60);
                if(diffInMinutes>=timeWindow)
                {
                    if(frequencyChkOp.equals("=="))
                    {
                         if(valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                         {
                             count.incrementAndGet();
                             if(count.get() > frequency)
                                 splitAndEmit(inputTupleList,collector);
                         }
                    }
                    else if(frequencyChkOp.equals("!="))
                    {
                        if(!valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                        {
                             count.incrementAndGet();
                             if(count.get() > frequency)
                                 splitAndEmit(inputTupleList,collector);
                         }
                     }
                     else
                         System.out.println("Operator not supported");
                 }
             }
             else
             {
                 if(frequencyChkOp.equals("=="))
                 {
                     if(valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                     {
                         count.incrementAndGet();
                         if(count.get() > frequency)
                             splitAndEmit(inputTupleList,collector);   
                     }
                 }
                 else if(frequencyChkOp.equals("!="))
                 {
                      if(!valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                      {
                          count.incrementAndGet();
                          if(count.get() > frequency)
                              splitAndEmit(inputTupleList,collector);  
                      }
                  }
              }
           }
           else if(thresholdDataType.equalsIgnoreCase("int") ||
                   thresholdDataType.equalsIgnoreCase("double") ||
                   thresholdDataType.equalsIgnoreCase("float") ||
                   thresholdDataType.equalsIgnoreCase("long") ||
                   thresholdDataType.equalsIgnoreCase("short"))
           {
               String frequencyChkOp = thresholdInfo.getAction();
               if(timeWindow!=null)
               {
                    long valueToCheck =
                        Long.parseLong(inputTupleList.get(thresholdColNum-1).toString());
                    long curTime = System.currentTimeMillis();
                     long diffInMinutes = (curTime - startTime) / (1000 * 60);
                    System.out.println("Difference in minutes="+diffInMinutes);
                    if(diffInMinutes>=timeWindow)
                    {
                         if(frequencyChkOp.equals("<"))
                         {
                             if(valueToCheck < Double.parseDouble(thresholdValue.toString()))
                             {
                                  count.incrementAndGet();
                                  if(count.get() > frequency)
                                      splitAndEmit(inputTupleList,collector);
                             }
                         }
                         else if(frequencyChkOp.equals(">"))
                         {
                              if(valueToCheck > Double.parseDouble(thresholdValue.toString()))
                              {
                                  count.incrementAndGet();
                                  if(count.get() > frequency)
                                      splitAndEmit(inputTupleList,collector);
                              }
                          }
                          else if(frequencyChkOp.equals("=="))
                          {
                             if(valueToCheck == Double.parseDouble(thresholdValue.toString()))
                             {
                                 count.incrementAndGet();
                                 if(count.get() > frequency)
                                     splitAndEmit(inputTupleList,collector);
                              }
                          }
                          else if(frequencyChkOp.equals("!="))
                          {
   . . .
                          }
                     }
                }
           }
           else
               splitAndEmit(null,collector);
    }
    else
    {
        System.err.println("Emitting null in bolt");
        splitAndEmit(null,collector);
    }
}

 

The tuples emitted by the threshold bolt are passed to the next bolt, which in our case is the DBWriterBolt.

DBWriterBolt

The processed tuple has to be persisted for raising a trigger or for later use. DBWriterBolt does the job of persisting the tuples into the database. The table is created in prepare(), which is the first method invoked by the topology. Code for this method is given in Listing Six.

Listing Six: Code for the creation of tables.

public void prepare( Map StormConf, TopologyContext context )
{      
    try
    {
        Class.forName(dbClass);
    }
    catch (ClassNotFoundException e)
    {
        System.out.println("Driver not found");
        e.printStackTrace();
    }
 
    try
    {
       connection = DriverManager.getConnection(
           "jdbc:mysql://"+databaseIP+":"+databasePort+"/"+databaseName, userName, pwd);
       connection.prepareStatement("DROP TABLE IF EXISTS "+tableName).execute();
 
       StringBuilder createQuery = new StringBuilder(
           "CREATE TABLE IF NOT EXISTS "+tableName+"(");
       for(Field fields : tupleInfo.getFieldList())
       {
           if(fields.getColumnType().equalsIgnoreCase("String"))
               createQuery.append(fields.getColumnName()+" VARCHAR(500),");
           else
               createQuery.append(fields.getColumnName()+" "+fields.getColumnType()+",");
       }
       createQuery.append("thresholdTimeStamp timestamp)");
       connection.prepareStatement(createQuery.toString()).execute();
 
       // Insert Query
       StringBuilder insertQuery = new StringBuilder("INSERT INTO "+tableName+"(");
       for(Field fields : tupleInfo.getFieldList())
       {
            insertQuery.append(fields.getColumnName()+",");
       }
       insertQuery.append("thresholdTimeStamp").append(") values (");
       for(Field fields : tupleInfo.getFieldList())
       {
           insertQuery.append("?,");
       }
 
       insertQuery.append("?)");
       prepStatement = connection.prepareStatement(insertQuery.toString());
    }
    catch (SQLException e)
    {      
        e.printStackTrace();
    }      
}

Insertion of data is done in batches. The logic for insertion is provided in execute(), as shown in Listing Seven, and consists mostly of parsing the variety of possible input types.

Listing Seven: Code for insertion of data.

public void execute(Tuple tuple, BasicOutputCollector collector)
{
    batchExecuted=false;
    if(tuple!=null)
    {
       List<Object> inputTupleList = (List<Object>) tuple.getValues();
       int dbIndex=0;
       for(int i=0;i<tupleInfo.getFieldList().size();i++)
       {
           Field field = tupleInfo.getFieldList().get(i);
           try {
               dbIndex = i+1;
               if(field.getColumnType().equalsIgnoreCase("String"))            
                   prepStatement.setString(dbIndex, inputTupleList.get(i).toString());
               else if(field.getColumnType().equalsIgnoreCase("int"))
                   prepStatement.setInt(dbIndex,
                       Integer.parseInt(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("long"))
                   prepStatement.setLong(dbIndex,
                       Long.parseLong(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("float"))
                   prepStatement.setFloat(dbIndex,
                       Float.parseFloat(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("double"))
                   prepStatement.setDouble(dbIndex,
                       Double.parseDouble(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("short"))
                   prepStatement.setShort(dbIndex,
                       Short.parseShort(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("boolean"))
                   prepStatement.setBoolean(dbIndex,
                       Boolean.parseBoolean(inputTupleList.get(i).toString()));
               else if(field.getColumnType().equalsIgnoreCase("byte"))
                   prepStatement.setByte(dbIndex,
                       Byte.parseByte(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("Date"))
                {
                    Date dateToAdd = null;
                    if (!(inputTupleList.get(i) instanceof Date))
                    {
                        // Parse the textual date from the log file.
                        DateFormat df = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
                        try
                        {
                            dateToAdd = df.parse(inputTupleList.get(i).toString());
                        }
                        catch (ParseException e)
                        {
                            System.err.println("Data type not valid");
                        }
                    }
                    else
                    {
                        dateToAdd = (Date)inputTupleList.get(i);
                    }
                    if (dateToAdd != null)
                    {
                        java.sql.Date sqlDate = new java.sql.Date(dateToAdd.getTime());
                        prepStatement.setDate(dbIndex, sqlDate);
                    }
                }
            }
            catch (SQLException e)
            {
                e.printStackTrace();
            }
        }
    Date now = new Date();         
    try
    {
        prepStatement.setTimestamp(dbIndex+1, new java.sql.Timestamp(now.getTime()));
        prepStatement.addBatch();
        counter.incrementAndGet();
        if (counter.get()== batchSize)
        executeBatch();
    }
    catch (SQLException e1)
    {
        e1.printStackTrace();
    }          
   }
   else
   {
        long curTime = System.currentTimeMillis();
       long diffInSeconds = (curTime - startTime) / 1000;
       if(counter.get()<batchSize && diffInSeconds>batchTimeWindowInSeconds)
       {
            try {
                executeBatch();
                startTime = System.currentTimeMillis();
            }
            catch (SQLException e) {
                 e.printStackTrace();
            }
       }
   }
}
 
public void executeBatch() throws SQLException
{
    batchExecuted=true;
    prepStatement.executeBatch();
    counter = new AtomicInteger(0);
}

Once the spout and bolt are ready to be executed, a topology is built by the topology builder to execute it. The next section explains the execution steps.

Running and Testing the Topology in a Local Cluster

  • Define the topology using TopologyBuilder, which exposes the Java API for specifying a topology for Storm to execute.
  • Using StormSubmitter, we submit the topology to the cluster. It takes the name of the topology, the configuration, and the topology itself as input.
  • Submit the topology.

Listing Eight: Building and executing a topology.

public class StormMain
{
     public static void main(String[] args) throws AlreadyAliveException,
                                                   InvalidTopologyException,
                                                   InterruptedException
     {
          ParallelFileSpout parallelFileSpout = new ParallelFileSpout();
          ThresholdBolt thresholdBolt = new ThresholdBolt();
          DBWriterBolt dbWriterBolt = new DBWriterBolt();
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", parallelFileSpout, 1);
          builder.setBolt("thresholdBolt", thresholdBolt, 1).shuffleGrouping("spout");
          builder.setBolt("dbWriterBolt", dbWriterBolt, 1).shuffleGrouping("thresholdBolt");
          Config conf = new Config();
          if(args != null && args.length > 0)
          {
              // Submit to a remote cluster; the first argument is the topology name.
              conf.setNumWorkers(1);
              StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
          }
          else
          {
              // Run in an in-process local cluster for testing.
              conf.setDebug(true);
              conf.setMaxTaskParallelism(3);
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology("Threshold_Test", conf, builder.createTopology());
          }
     }
}

 

After the topology is built, it is submitted to the local cluster. Once the topology is submitted, it runs until it is explicitly killed or the cluster is shut down, without requiring any modifications. This is another big advantage of Storm.

This comparatively simple example shows the ease with which it's possible to set up and use Storm once you understand the basic concepts of topology, spout, and bolt. The code is straightforward and both scalability and speed are provided by Storm. So, if you're looking to handle big data and don't want to traverse the Hadoop universe, you might well find that using Storm is a simple and elegant solution.


Shruthi Kumar works as a technology analyst and Siddharth Patankar is a software engineer with the Cloud Center of Excellence at Infosys Labs.

 

 

 

 
